Finetuning a Pretrained Model

Here we are going to look at image classification. We are going to train an image classifier that looks at labeled photos from several different classes and predicts which class each photo belongs to. We will be using a very popular dataset called CIFAR-10. The classes represented in CIFAR-10 are: plane, car, bird, cat, deer, dog, frog, horse, ship, and truck. The workflow here is as follows:

1. Import and prepare the images
2. Set up a pretrained neural network
3. Finetune the neural network to work with the input data
4. Compile, fit, and evaluate model

Previously, we used ResNet50 as our image classifier. Here we will use VGG16 instead. ResNet50 is a more recent image classifier; however, VGG16 is a good first model to learn because of its simplicity.

First, we will make our imports. We want to import the CIFAR-10 dataset along with the VGG16 architecture. We will be working with the functional Keras API and using Adam as our optimizer.

In [15]:
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
from keras.layers import Input, Flatten, Dense
from keras.models import Model
from keras.optimizers import Adam
import numpy as np
from keras import utils
from keras.utils import np_utils

We need to specify how many classes we are predicting, so we will do that first. We also break the CIFAR-10 dataset into our training and test sets. It is important in machine learning to split your data into training and testing sets; later we will learn that it is important to have a validation set as well, but more on that later. The training and test labels need to be turned into categorical (one-hot) variables, and Keras has a nice function called to_categorical that does that for you. An image data generator is then created, which turns the images into batches formatted for the TensorFlow backend to read in.
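
If the one-hot encoding is new to you, here is a minimal sketch (my own illustration, not part of the notebook's training code) of what to_categorical does: each class index becomes a vector of length 10 with a 1 in that class's position.

from keras.utils import np_utils

labels = [0, 3, 9]                             # raw CIFAR-10 class indices
one_hot = np_utils.to_categorical(labels, 10)  # shape (3, 10)
print(one_hot)                                 # three rows, each with a single 1 in positions 0, 3, and 9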

Next, the VGG16 model will be created with the ImageNet weights preloaded. Those weights were trained on the entire ImageNet dataset, so they are a much better starting point than random weights for creating an image classifier. The include_top=False argument means that the last portion of the VGG16 classifier, three dense layers (also known as linear layers), is left out. We are doing this because in the next line we will set trainable to False on every remaining layer, which freezes the current model, which consists mostly of convolutional layers. Keras requires that the input shape always be stated explicitly, so we state it: each picture is 32x32 pixels and there are 3 color channels: red, green, and blue.

In [16]:
num_classes = 10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
y_train = np_utils.to_categorical(y_train, num_classes)
y_test = np_utils.to_categorical(y_test, num_classes)

datagen = ImageDataGenerator()

#Get back the convolutional part of a VGG network trained on ImageNet
model_vgg16_conv = VGG16(weights='imagenet', include_top=False)
for layer in model_vgg16_conv.layers: layer.trainable=False

#Create your own input format (here 32x32x3)
input = Input(shape=(32, 32, 3), name='image_input')

#Use the generated model 
output_vgg16_conv = model_vgg16_conv(input)
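
As a quick sanity check (this snippet is my own addition, not in the original notebook), you can list the layers of model_vgg16_conv and confirm that they are all frozen; every line should print False, so only the new dense layers added below will learn.

for layer in model_vgg16_conv.layers:
    print(layer.name, layer.trainable)   # expect False for every VGG16 layer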

Now we are going to add the fully connected layers onto the model. The original VGG16 model has three linear layers, with 4096 neurons in each of the first two and 1,000 in the final layer (one per ImageNet class). Here the final layer has 10 neurons instead, to match the number of categories we are trying to predict.

In [17]:
#Add the fully-connected layers 
x = Flatten(name='flatten')(output_vgg16_conv)
x = Dense(4096, activation='relu', name='fc1')(x)
x = Dense(4096, activation='relu', name='fc2')(x)
x = Dense(10, activation='softmax', name='predictions')(x)

Now, we will create the model with the output linear layers added.

In [18]:
#Create your own model 
my_model = Model(inputs=input, outputs=x)

Finally, we will compile the model and run it. A single complete pass through the dataset is called an epoch, and the number of epochs is the total number of passes the model makes before training stops. It is common for a machine learning model to look at every sample of the dataset during an epoch, although in some setups part of the data is held out for validating the results, a topic we will cover next. So you may be asking yourself: why not just choose a really high number of epochs and make a lot of passes over the data? You will learn that GPU time is expensive, especially on large datasets, and there is a point at which the model will not improve any further.

Next, we will choose our optimizer and set a learning rate. An optimizer is the procedure a model follows to improve its accuracy. In this model, since we set the convolutional weights to not be trainable, the linear layers are the only layers that are trainable. Think of a linear layer as y = mx + b. The slope (m) is the value that is adjusted to differentiate the 10 classes, and it is adjusted by stochastic gradient descent (Adam is a variant of it). Gradient descent takes the derivative of the loss function, which we specify as categorical cross-entropy. The derivative of the loss function tells us which direction to adjust the slope in order to minimize the categorical cross-entropy, which better separates the 10 categories.
This can be a bit confusing at first so lets try to simplify it:

  1. We want to correctly put as many of the images as possible into the proper categories. We measure this by calculating the categorical cross-entropy of every mini-batch run through the model (the mini-batch size is specified in batch_size).
  2. The model then takes the derivatives of the loss function with respect to the weights of the linear layers.
  3. The model uses those derivatives and Adam to change the slopes (m) of the linear layers, producing values that better differentiate the categories (see the small sketch after this list).
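
To make those two ideas concrete, here is a tiny numpy sketch of my own (the numbers are made up for illustration and are not from the notebook): first the categorical cross-entropy of a single prediction against a one-hot label, then a single gradient-descent step on one slope m.

import numpy as np

# 1. Categorical cross-entropy: -sum(true * log(predicted)).
#    It is small when the model puts high probability on the correct class.
y_true = np.array([0., 0., 1., 0., 0., 0., 0., 0., 0., 0.])                       # one-hot label, class 2
y_pred = np.array([0.05, 0.05, 0.60, 0.05, 0.05, 0.05, 0.05, 0.04, 0.03, 0.03])   # model probabilities
loss = -np.sum(y_true * np.log(y_pred))
print(loss)                                # about 0.51, i.e. -log(0.60)

# 2. One gradient-descent step on a single slope m: move m against the
#    derivative of the loss, scaled by the learning rate.
m, lr = 2.0, 0.0001
grad = 3.5                                 # pretend dLoss/dm came out to 3.5
m = m - lr * grad                          # Adam does this with adaptive per-weight step sizes
print(m)                                   # 1.99965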

The model is then compiled and fit. The score lets us look at the metrics for the model after training is complete.

In [19]:
#In the summary, the VGG16 layers are collapsed into a single entry; they are frozen, so only the new dense layers will be trained
#my_model.summary() <<run to see a summary of your model

epochs = 3

adam = Adam(lr=.0001)
my_model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])

#Note: steps_per_epoch=len(x_train) draws 50,000 batches of 256 images per epoch,
#which is many passes over the data; a single pass would use len(x_train) // 256 steps.
my_model.fit_generator(datagen.flow(x_train, y_train, batch_size=256),
                    steps_per_epoch=len(x_train), epochs=epochs)
score = my_model.evaluate(x_test, y_test, batch_size=256)
my_model.metrics_names , score
Epoch 1/3
50000/50000 [==============================] - 1962s - loss: 1.0876 - acc: 0.9269  
Epoch 2/3
50000/50000 [==============================] - 1949s - loss: 0.0069 - acc: 0.9985  
Epoch 3/3
50000/50000 [==============================] - 1948s - loss: 3.2399e-04 - acc: 1.0000  
10000/10000 [==============================] - 1s     
Out[19]:
(['loss', 'acc'], [3.7053978271484374, 0.64890000000000003])

64% accuracy. Ouch... We see that our model is not nearly as good on the test set as it was on the training set. Why is that? Overfitting. We will talk about that next.

Next:

Reducing Overfitting