Lesson 2: Building and Training a CNN Model

Learning Goal

To become familiar with the MATLAB Deep Learning Toolbox and to construct a working CNN.

Pre-Reading

Review

What are the 2 aspects of Machine Learning?

It is important to understand, in general terms, what makes up learning before diving into the topics of Artificial Intelligence, Machine Learning, and Deep Learning. Learning can be summarized by a simple model:

Learning = Knowledge + Data

The general workflow follows these 5 steps:

Gathering data
Data pre-processing
Researching the best model for the project
Training and testing the model
Evaluation

What is over and under fitting?

The goal of any ML model is to generalize from the training data so that it can predict a probable outcome for new inputs. There is a sweet spot where the model captures the real patterns in the training data without following that data exactly. If the model follows the training data too closely, its predictions become specific to that data and it cannot cope with variation in new inputs; this is known as overfitting. If the model generalizes too much, it is too coarse to capture the patterns that matter, and accuracy drops on both the training data and new data; this is known as underfitting. The accompanying figure gives a visual representation of both cases.
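The effect can also be reproduced numerically. The following is a minimal sketch, not part of the lesson's project code: it fits polynomials of increasing degree to noisy data generated from a quadratic. The data, the noise level and the chosen degrees are all made-up illustration values.

```matlab
% Minimal sketch: fit polynomials of different degree to noisy quadratic data.
% Data, noise level and degrees are made-up illustration values.
rng(0);                                   % reproducible noise
xTrain = linspace(-1,1,12);
xTest  = linspace(-0.95,0.95,50);
truth  = @(x) 1 + 2*x - 3*x.^2;           % underlying pattern
yTrain = truth(xTrain) + 0.3*randn(size(xTrain));
yTest  = truth(xTest)  + 0.3*randn(size(xTest));

degrees  = [1 2 9];                       % too simple, about right, too flexible
trainErr = zeros(size(degrees));
testErr  = zeros(size(degrees));
for k = 1:numel(degrees)
    p = polyfit(xTrain, yTrain, degrees(k));
    trainErr(k) = mean((polyval(p,xTrain) - yTrain).^2);
    testErr(k)  = mean((polyval(p,xTest)  - yTest).^2);
end

disp(table(degrees.', trainErr.', testErr.', ...
     'VariableNames', {'Degree','TrainMSE','TestMSE'}));
```

The training error can only go down as the degree increases. What typically changes is the test error: a degree-1 fit tends to be poor on both sets (underfitting), while a degree-9 fit tends to have the lowest training error but a worse test error than the degree-2 fit (overfitting).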

What differentiates ML and DL?

Traditional ML:

Manually decide on a set of features and implement feature extraction
Select a model-building method (e.g. SVM) to learn a model

Deep Learning:

Use neural networks to automate the feature extraction
Use neural networks for model building

What is the general concept of a Neural Network?

The structure and name of neural networks (NNs) were inspired by the human brain and the way it learns and thinks, mimicking how neurons fire and communicate with one another. An artificial NN is made up of layers of nodes: an input layer, one or more hidden layers, and an output layer. Each node can be thought of as an artificial neuron. Each node connects to others and has an associated weight and threshold. If the weighted input to a node is above its threshold, the node activates and sends data to the next layer. If the node is not activated, it does not send data to the next layer.
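The activation rule for a single node can be sketched in a few lines. This is only an illustration with made-up inputs, weights and threshold; it is not how the Deep Learning Toolbox implements its layers.

```matlab
% Minimal sketch of one artificial neuron with a hard threshold.
% All numbers below are made-up illustration values.
inputs    = [0.9 0.2 0.4];   % signals arriving from the previous layer
weights   = [0.5 -0.3 0.8];  % one weight per incoming connection
threshold = 0.6;             % activation threshold of this node

weightedSum = sum(inputs .* weights);   % 0.45 - 0.06 + 0.32 = 0.71
isActive    = weightedSum > threshold;  % true, because 0.71 > 0.6

% An active node passes its value on; an inactive node passes nothing (0).
output = isActive * weightedSum;
fprintf('weighted sum = %.2f, active = %d, output = %.2f\n', ...
        weightedSum, isActive, output);
```

Networks trained in practice usually replace the hard threshold with a smooth or piecewise-linear activation such as ReLU, which is the activation used later in this lesson.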

What is a CNN?

In simplified terms, a CNN takes an input image, assigns importance to various aspects or objects in that image, and learns to differentiate one from another. The structure of a CNN was inspired by the organization of the human visual cortex: individual neurons respond to stimuli only in a restricted region of the image, and a collection of these regions overlaps to cover the entire image. In general, a CNN can be trained to capture the structure of an image better than a traditional feedforward net, where the data is simply passed forward and then assessed with a cost function.
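The idea of neurons that respond only to a restricted region can be illustrated with a single filter sliding over a small image. The sketch below uses base MATLAB's conv2 and made-up image and filter values; a trained CNN learns its filter values rather than having them written by hand.

```matlab
% Minimal sketch: one 3x3 filter sliding over a tiny 5x5 "image".
% The image and filter values are made up for illustration.
img = [0 0 1 1 1;
       0 0 1 1 1;
       0 0 1 1 1;
       0 0 1 1 1;
       0 0 1 1 1];           % dark left half, bright right half

kernel = [-1 0 1;
          -1 0 1;
          -1 0 1];           % responds to dark-to-bright vertical edges

% conv2 flips the kernel (true convolution). CNN layers compute a
% cross-correlation, so rotate the kernel by 180 degrees first.
response = conv2(img, rot90(kernel,2), 'valid');

disp(response)   % each entry depends only on one 3x3 patch of img
```

Each entry of response is computed from one 3x3 patch only. The entries are 3 where the patch straddles the edge and 0 where the patch is uniformly bright.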

Watch this helpful video, which clarifies everything discussed in this section and provides a great example of how a CNN works: https://www.youtube.com/watch?v=aircAruvnKk

Database

Getting Database File

Details on Data Source: The data consists of 48x48 pixel grayscale images of faces. The faces have been automatically registered so that the face is more or less centered and occupies about the same amount of space in each image. The task is to categorize each face, based on the emotion shown in the facial expression, into one of seven categories (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral).

Import the data Excel file from the following link: https://docs.google.com/spreadsheets/d/172ZCF12ufH1fADBiHSMoya5wh9Tsmn8P/edit?usp=sharing&ouid=109210790893542256917&rtpof=true&sd=true

Set-up Folders

The database code is set up to access a sub-folder named database within the current working folder of MATLAB. Inside this database folder there are 7 folders, one for each of the emotions we want to classify. The code sorts the images into these emotion folders.
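If the folders do not exist yet, they can be created from MATLAB instead of by hand. This is a minimal sketch that assumes the folder names listed here; they must match the folder names used in the sorting code exactly, including spelling.

```matlab
% Minimal sketch: create database/<emotion> folders in the current folder.
% Folder names are assumptions; they must match the sorting script exactly.
databaseFolder = 'database';
emotions = {'Angry','Disgust','Fear','Happy','Sad','Surprise','Neutral'};

for k = 1:numel(emotions)
    target = fullfile(databaseFolder, emotions{k});
    if ~exist(target, 'dir')
        mkdir(target);   % also creates databaseFolder if it is missing
    end
end
```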

Running Code

Notes:

You need to change the filelist path variable to match the path on your computer.
The database Excel sheet needs to be in the current working folder of MATLAB.

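% List everything in the Angry folder; change this path to match your computer.
filelist = dir(fullfile('/Users/user/Desktop/Emotion Project/database/Angry','*.*'));

% Only build the database if the folder is still (nearly) empty. The listing
% contains a few entries such as '.' and '..', hence the margin of 4.
if length(filelist) <= 4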
    datafile = 'icml_face_data.xlsx';
    data = readtable(datafile);

    dim = size(data);

    for j = 1:dim(1)
        im = data(j,3);
        im = table2array(im);
        im = cellfun(@str2num,im,'UniformOutput',false);
        im = cell2mat(im);
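        % Reshape the 2304-element row vector into a 48x48 image, row by row.
        Image = reshape(im,48,48).';

        % Rescale pixel values to [0,1]; a constant image maps to all zeros
        % instead of dividing by zero.
        maxx = max(Image(:));
        minx = min(Image(:));
        img  = (Image - minx) / max(maxx - minx, eps);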

        databaseFolder = 'database';

        emotion = data(j,1);
        emotion = table2array(emotion);

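        % Map the numeric label to the name of its emotion folder.
        switch(emotion)
            case 0
                fileFolder = 'Angry';
            case 1
                fileFolder = 'Disgust';
            case 2
                fileFolder = 'Fear';
            case 3
                fileFolder = 'Happy';
            case 4
                fileFolder = 'Sad';
            case 5
                fileFolder = 'Surprise';
            case 6
                fileFolder = 'Neutral';
        end

        % Create the emotion folder if it does not exist yet, then save the image.
        if ~exist(fullfile(databaseFolder,fileFolder),'dir')
            mkdir(fullfile(databaseFolder,fileFolder));
        end
        filename = strcat(num2str(j),'.png');
        filepath = fullfile(databaseFolder,fileFolder,filename);
        imwrite(img,filepath)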
    end
end

Double Check if Worked

It should take a couple of minutes for the code to build the database from the Excel sheet. Once this is done, you should be able to open the emotion folders and see the PNG images in them. If there are no images, or if they are in the wrong format, clear the emotion folders, edit the code, and run it again.

In-Depth MATLAB Deep Learning Toolbox

Loading Data

We currently have a bunch of images stored in our database folders. However, we need to convert them into a format that the CNN can be trained on. A useful function for this in the Deep Learning Toolbox workflow is:

imds = imageDatastore(ImgDatasetPath, ...
        'IncludeSubfolders',true, ...
        'LabelSource','foldernames');

This function takes the images stored in the database folder, labels them by the name of the emotion folder they are in, and stores them in a format we can use to train the CNN.

Splitting Data

As we covered last time, a neural network is trained on data with a known classification output. We also need to validate the accuracy of the network after it is trained. Therefore, we split the labeled data further into a training data set and a validation data set. There are many ways to do this with the DL toolbox; one function is:

[imdsTrain,imdsValidation] = splitEachLabel(imds,0.7,'randomized');

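This splits each label so that 70% of its images are used for training and the remaining 30% for validation. The 'randomized' option assigns images to the two sets at random instead of in file order. Shuffling the training data in each epoch is controlled separately, by the training options.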
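To see what the datastore and the split contain, it can help to try both calls on a small throwaway dataset first. The sketch below is an illustration only: it writes random-noise images into a temporary folder with two made-up class folders, builds a datastore, splits it, and counts the images per label. The names toyPath, toyImds, toyTrain and toyValidation are hypothetical.

```matlab
% Minimal sketch on a throwaway toy dataset (random noise images), only to
% show how labels and split sizes come out. Folder and class names are made up.
toyPath = tempname;                       % unique temporary folder
classes = {'Happy','Sad'};
for c = 1:numel(classes)
    mkdir(fullfile(toyPath, classes{c}));
    for k = 1:10
        imwrite(rand(48,48), fullfile(toyPath, classes{c}, sprintf('%d.png',k)));
    end
end

toyImds = imageDatastore(toyPath, ...
        'IncludeSubfolders',true, ...
        'LabelSource','foldernames');

[toyTrain,toyValidation] = splitEachLabel(toyImds,0.7,'randomized');

trainCounts      = countEachLabel(toyTrain);        % table: Label, Count
validationCounts = countEachLabel(toyValidation);
disp(trainCounts)
disp(validationCounts)

rmdir(toyPath,'s');                       % remove the temporary folder again
```

With 10 images per class, a 0.7 split should give 7 training and 3 validation images for each label. Running countEachLabel on the real imds is a quick way to check that every emotion folder was picked up and to spot classes with very few images.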

Define Network Architecture

Now we need to define the network architecture and the layers that go into the CNN. A simple CNN looks like this:

%define the layers of the network
    layers = [
        imageInputLayer(inputSize) %input layer

        %first neural layer
        convolution2dLayer(3,8,'Padding','same')
        batchNormalizationLayer
        reluLayer

        %pooling layer for first neural layer
        maxPooling2dLayer(2,'Stride',2)

         %output layers
        fullyConnectedLayer(numClasses) %multiplies the input by weight matrix and adds bias vector.
        softmaxLayer %applies softmax function to input.
        classificationLayer]; % classifies based on input(computes the cross-entropy loss for classification)

The input layer takes in the image and passes it on to the hidden layers. The convolution layer applies sliding convolutional filters to the 2-D input; here convolution2dLayer(3,8,...) uses 8 filters of size 3x3. The batch normalization layer normalizes a mini-batch of data across all observations for each channel independently. The ReLU layer is our activation function. Its output is passed to maxPooling2dLayer, which downsamples by dividing the input into rectangular pooling regions and computing the maximum of each region; this reduces the computation needed to train the network. Then come the output layers: the fully connected layer multiplies its input by a weight matrix and adds a bias vector, the softmax layer acts as the activation function of the output layer, and the classification layer computes the cross-entropy loss and classifies the input.

This is a very simple network. You define the number of filters (the "neurons" of the convolutional layer) yourself. Here we have a single block with only 8 filters, but you should stack several of these hidden-layer blocks on top of one another, with more filters, to get better results.
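A stacked version could look like the following. This is a minimal sketch, not a tuned architecture: the filter counts 8, 16 and 32 are illustrative assumptions, while inputSize and numClasses follow the data description (48x48 grayscale images, seven emotions).

```matlab
% Minimal sketch of a deeper variant; filter counts (8, 16, 32) are
% illustrative assumptions, not tuned values.
inputSize  = [48 48 1];   % 48x48 grayscale images, as in the data description
numClasses = 7;           % seven emotion categories

layers = [
    imageInputLayer(inputSize)

    % block 1: 8 filters of size 3x3
    convolution2dLayer(3,8,'Padding','same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2,'Stride',2)      % 48x48 -> 24x24

    % block 2: 16 filters
    convolution2dLayer(3,16,'Padding','same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2,'Stride',2)      % 24x24 -> 12x12

    % block 3: 32 filters (no pooling before the classifier)
    convolution2dLayer(3,32,'Padding','same')
    batchNormalizationLayer
    reluLayer

    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
```

Doubling the filter count while pooling halves the image size is a common pattern, but the right depth and width for this dataset have to be found by experiment.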

Train Network

We have our network architecture and now we need to train it. The first thing we need to do is set our parameters for training the network. Here is an example:

options = trainingOptions('sgdm', ...
        'InitialLearnRate',0.01, ...
        'MaxEpochs',3, ...
        'Shuffle','every-epoch', ...
        'ValidationData',imdsValidation, ...
        'ValidationFrequency',30, ...
        'Verbose',false, ...
        'Plots','training-progress');

Here we set the options for training a network using stochastic gradient descent with momentum ('sgdm'). The initial learning rate is set to 0.01. The maximum number of epochs for training is 3. The training data is shuffled before every epoch, so the network sees the images in a different order each time. The network is validated every 30 iterations (mini-batches, not individual images). Verbose is set to false so that training progress information is not displayed in the command window. Plots of the training progress are shown.
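To give a feel for what the momentum term does, here is a minimal sketch of the update rule on a one-dimensional toy loss. The loss function, the starting point and the iteration count are made up, and the momentum value 0.9 is an assumption (a commonly used value). In real training the gradient is estimated from a mini-batch of images rather than computed exactly.

```matlab
% Minimal sketch of gradient descent with momentum on a 1-D toy loss
% f(w) = (w - 3)^2. The loss, starting point and iteration count are made up;
% 0.9 is a commonly used momentum value and is an assumption here.
learnRate = 0.01;     % same value as 'InitialLearnRate' above
momentum  = 0.9;

w     = 0;            % current weight
wPrev = w;            % weight from the previous iteration
for iter = 1:500
    grad  = 2*(w - 3);                                   % df/dw
    wNext = w - learnRate*grad + momentum*(w - wPrev);   % step + momentum term
    wPrev = w;
    w     = wNext;
end
fprintf('w after 500 iterations: %.6f (minimum is at 3)\n', w);
```

The momentum term adds a fraction of the previous step to the current one, which smooths out noisy gradients and speeds up progress along directions where the gradient keeps pointing the same way.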

Then we simply use the following function to begin training the network:

net = trainNetwork(imdsTrain,layers,options);

The inputs are the training data, our network layers, and our training options.

Test Network

Finally, we need to test for the accuracy of our network. In MATLAB, this is easy to do:

YPred = classify(net,imdsValidation); %classify with network
YValidation = imdsValidation.Labels; %get true label values

%use above elements to determine accuracy
accuracy = sum(YPred == YValidation)/numel(YValidation); %get accuracy

We simply classify the validation data and then compare the predictions to the true labels.
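A single accuracy number can hide classes that the network gets mostly wrong. The sketch below shows one way to break the accuracy down per class. It uses made-up labels so that it runs on its own; with a trained network, YPred and YValidation would come from the code above.

```matlab
% Minimal sketch with made-up labels, to show overall and per-class accuracy.
YValidation = categorical({'Happy';'Happy';'Sad';'Sad';'Angry';'Angry'});
YPred       = categorical({'Happy';'Sad';  'Sad';'Sad';'Angry';'Happy'});

accuracy = sum(YPred == YValidation)/numel(YValidation);   % 4 of 6 correct

classNames = categories(YValidation);        % sorted: Angry, Happy, Sad
classAcc   = zeros(numel(classNames),1);
for k = 1:numel(classNames)
    isClass     = YValidation == classNames{k};            % images of this class
    classAcc(k) = sum(YPred(isClass) == YValidation(isClass))/sum(isClass);
end

disp(table(classNames, classAcc));
```

In this toy case the overall accuracy is 4/6, while the per-class values are 0.5 for Angry, 0.5 for Happy and 1 for Sad.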

Saving and Loading Network

It is important to save the network once trained and also to load an existing trained network using the following functions:

save net
load net

Here net is both the name of the trained network variable and the name of the file. save net writes the workspace variables, including net, to net.mat in the current working folder, and load net reads them back. I would recommend checking whether a network has already been trained and saved before training again, using a try/catch block in MATLAB, shown below:

try
    %code
catch exception
    %code
end
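Filling in the skeleton above, a minimal sketch could look like this. It assumes that imdsTrain, layers and options already exist as defined earlier in the lesson, and that the network is stored in net.mat; the variable names netFile and saved are hypothetical.

```matlab
% Minimal sketch: reuse a saved network if one exists, otherwise train one.
% Assumes imdsTrain, layers and options exist as defined earlier; the file
% name net.mat matches the save/load commands above.
netFile = 'net.mat';
try
    saved = load(netFile,'net');   % errors if the file does not exist
    net   = saved.net;             % errors if the file has no variable net
    disp('Loaded previously trained network.');
catch exception
    disp(['No saved network found (' exception.message '), training...']);
    net = trainNetwork(imdsTrain,layers,options);
    save(netFile,'net');           % store only the network variable
end
```

Delete net.mat whenever the layers or the training options change, otherwise the old network is loaded again instead of a new one being trained.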

Ways to improve on Model Performance

Some Issues with Neural Network

Sometimes neural networks fail to converge due to low dimensionality.
Even a small change in weights can lead to a significant change in output, and sometimes the results may be worse.
The gradient may become zero. In this case, weight optimization fails.
Data overfitting.
Time complexity is too high. Sometimes the algorithm runs for days even on a small data set.
We get the same output for every input when we predict.

How to improve performance

Increase hidden Layers
In theory, many functions are represented better at a higher level of abstraction, so more layers tend to give better results.
Change Activation function
Change Activation function in Output layer
Increase number of neurons
If an inadequate number of neurons are used, the network will be unable to model complex data, and the resulting fit will be poor. If too many neurons are used, the training time may become excessively long, and, worse, the network may overfit the data.
Weight initialization
When training a neural network, the initial weights are assigned randomly. Although the weights are updated during training, the network can sometimes converge to a local minimum. In multilayered architectures, random initial weights do not always perform well. We can instead supply better initial weights.
This can be done with pretrained networks, which are available as add-ons and can be imported and retrained on a similar problem.
More data
Normalizing/Scaling data
Change learning algorithm parameters
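As one concrete instance of the normalizing/scaling item, the sketch below applies z-score scaling so that every feature has zero mean and unit standard deviation. The data is made up, and this is only one of several common scaling schemes; the database code earlier in this lesson uses min-max scaling to [0,1] instead.

```matlab
% Minimal sketch of z-score scaling on made-up feature data.
% Rows are observations, columns are features.
X = [1 200;
     2 400;
     3 600;
     4 800];

mu    = mean(X,1);            % per-feature mean
sigma = std(X,0,1);           % per-feature standard deviation
Xnorm = (X - mu) ./ sigma;    % implicit expansion needs R2016b or newer

disp(mean(Xnorm,1))           % approximately 0 for every feature
disp(std(Xnorm,0,1))          % approximately 1 for every feature
```

Scaling matters because features on very different scales, such as the two columns here, would otherwise dominate the gradient updates unevenly.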