Image Classifiers with TensorFlow 2.0
This project explores TensorFlow 2.0, using Keras to build image classifiers in two different ways.
Sample images from: CIFAR-10 dataset
The release of TensorFlow 2.0 brought a number of changes, including the integration of Keras as the official high-level API for TensorFlow. Keras allows models to be defined in three ways: sequentially, functionally, and through subclassing. This project builds two image classifiers using the first two of these methods: sequential and functional. Thanks to pyimagesearch for the guide on how to do this.
The first model is a simple shallow CNN. The second is a simplified version of GoogLeNet, a much more complex model whose branching structure justifies using the functional API to define it.
Both are trained and tested on the CIFAR-10 dataset: Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky, 2009.
I will also explore how to export data to TensorBoard using TensorFlow 2.0, which has moved away from using sessions.
Lastly, I will use an AWS EC2 instance. One note on this: AWS ran the first fit moderately faster than my CPU (2-3x), but it ran the second fit, on the MiniGoogleNet, far faster (a couple of minutes per epoch versus a couple of hours per epoch).
Note 2: I originally tried using tfds to load the data and TensorFlow dataset objects for the whole project; however, TensorFlow datasets don’t play nicely with the Keras preprocessing modules. Datasets are lazily loaded, so all preprocessing must be done through mapping functions rather than on the data directly, and these mapping functions must act on tensors natively. The Keras preprocessing modules cannot be used as mapping functions because they only work on NumPy arrays.
Import libraries and data
import tensorflow as tf
import tensorflow.keras as tk
%load_ext tensorboard
#import tensorflow_datasets as tfds
from tensorflow.keras.datasets import cifar10
import numpy as np
import datetime
from sklearn.metrics import classification_report
import tensorflow.keras.preprocessing.image as image
from tensorflow.keras.optimizers import SGD
from sklearn.preprocessing import LabelBinarizer
#data, info = tfds.load(name='cifar10', with_info=True)
(trainX, trainY), (testX, testY) = cifar10.load_data()
Prep data
lb = LabelBinarizer()
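# scale pixel values to [0, 1], using the same dtype for both splits
trainX, testX = trainX.astype('float32')/255.0, testX.astype('float32')/255.0
# one-hot encode the labels: fit on the training labels only, then reuse that encoding for the test labels
trainY, testY = lb.fit_transform(trainY), lb.transform(testY)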
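To make the label step concrete, here is a minimal pure-Python sketch of the kind of one-hot encoding that LabelBinarizer produces for multi-class labels. The helpers fit_classes and one_hot are hypothetical names used only for this illustration and are not part of scikit-learn; the toy labels are made up. The sketch assumes the class list is learned once from the training labels and then reused for the test labels, so that the column order is the same in both splits.

```python
def fit_classes(labels):
    """Return the sorted list of distinct class labels (the 'fit' step)."""
    return sorted(set(labels))

def one_hot(labels, classes):
    """Encode each label as a 0/1 row with a single 1 at its class index (the 'transform' step)."""
    index = {c: i for i, c in enumerate(classes)}
    return [[1 if index[y] == j else 0 for j in range(len(classes))] for y in labels]

# toy integer labels, made up for illustration
train_labels = [3, 0, 2, 3, 1]
test_labels = [1, 3]

classes = fit_classes(train_labels)            # learned from the training labels only
train_encoded = one_hot(train_labels, classes)
test_encoded = one_hot(test_labels, classes)   # same column order as the training split
print(test_encoded)
```

With ten CIFAR-10 classes each row would have ten columns, which is why the class count can later be read off as the second dimension of the encoded label array.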
def shallownet_sequential(height, width, depth, classes):
    model = tk.models.Sequential()
    input_shape = (height, width, depth)
    # add layers
    model.add(tk.layers.Conv2D(32, (3,3), padding='same', input_shape=input_shape))
    return model
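As a quick sanity check on the convolution layer shown above, the sketch below works out its output shape and parameter count for a 32x32x3 CIFAR-10 image. The helper conv2d_same_summary is a hypothetical name used only for this illustration; it assumes stride 1 and one bias per filter, which are the Keras Conv2D defaults.

```python
def conv2d_same_summary(height, width, depth, filters, kernel):
    """Output shape and trainable parameter count for a stride-1 'same' convolution."""
    # 'same' padding with stride 1 preserves the spatial size
    out_shape = (height, width, filters)
    # each filter has kernel * kernel * depth weights plus one bias
    params = (kernel * kernel * depth + 1) * filters
    return out_shape, params

shape, params = conv2d_same_summary(32, 32, 3, filters=32, kernel=3)
print(shape, params)  # (32, 32, 32) 896
```

The layer is small in parameters (896) but produces 32 * 32 * 32 = 32,768 activations per image, so any dense layer placed directly after it would hold most of the model's weights.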
Training the Shallownet
aug = image.ImageDataGenerator(
# hyperparameters
init_lr = 1e-2
batch_size = 128
num_epochs = 30
# create model, create optimizer, compile model
model = shallownet_sequential(32, 32, 3, testY.shape[1])
opt = SGD(learning_rate=init_lr, momentum=0.9, decay=init_lr/num_epochs)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
# create tensorboard callback, writing each run to its own timestamped directory
log_dir = 'logs/fit/' + datetime.datetime.now().strftime('%Y%m%d-%H%M%S')
tb_callback = tk.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
# train model
H = model.fit_generator(
    aug.flow(trainX, trainY, batch_size=batch_size),
    validation_data=(testX, testY),
    steps_per_epoch=len(trainX) // batch_size,
    epochs=num_epochs,
    callbacks=[tb_callback])
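The optimizer above is created with decay=init_lr/num_epochs. As a rough sketch of what that does to the learning rate, the snippet below assumes the time-based decay rule used by the Keras optimizers that ship with TensorFlow 2.0, where the rate is divided by (1 + decay * iteration) and the iteration counter advances once per batch rather than once per epoch. The function decayed_lr is a hypothetical helper written for this illustration, not a Keras API.

```python
def decayed_lr(init_lr, decay, iteration):
    """Time-based decay: the rate shrinks as 1 / (1 + decay * iteration)."""
    return init_lr / (1.0 + decay * iteration)

init_lr = 1e-2
num_epochs = 30
batch_size = 128
steps_per_epoch = 50000 // batch_size  # CIFAR-10 has 50,000 training images
decay = init_lr / num_epochs

# learning rate at the start of training and after 10, 20, and 30 epochs
for epoch in (0, 10, 20, 30):
    iteration = epoch * steps_per_epoch
    print(epoch, round(decayed_lr(init_lr, decay, iteration), 5))
```

Under these assumptions the rate falls to roughly a fifth of its initial value by the end of the 30 epochs, with most of the drop happening early.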
label_names = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]
predictions = model.predict(testX, batch_size=batch_size)
print(classification_report(testY.argmax(axis=1), predictions.argmax(axis=1), target_names=label_names))
              precision    recall  f1-score   support
    airplane       0.58      0.68      0.62      1000
  automobile       0.54      0.87      0.67      1000
        bird       0.58      0.27      0.37      1000
         cat       0.53      0.24      0.33      1000
        deer       0.60      0.40      0.48      1000
         dog       0.49      0.56      0.52      1000
        frog       0.55      0.82      0.66      1000
       horse       0.56      0.71      0.63      1000
        ship       0.74      0.60      0.66      1000
       truck       0.64      0.60      0.62      1000
    accuracy                           0.57     10000
   macro avg       0.58      0.57      0.55     10000
weighted avg       0.58      0.57      0.55     10000
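For readers less familiar with the columns in this report, the following pure-Python sketch shows how per-class precision, recall, and F1 are defined. The function per_class_metrics is a hypothetical helper written for this illustration, and the toy labels are made up; they are not taken from the CIFAR-10 run.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, treating it as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# toy labels, made up for illustration (not taken from the CIFAR-10 run)
y_true = ["cat", "cat", "cat", "dog", "dog", "ship"]
y_pred = ["cat", "dog", "dog", "dog", "dog", "ship"]

for label in ("cat", "dog", "ship"):
    p, r, f = per_class_metrics(y_true, y_pred, label)
    print(label, round(p, 2), round(r, 2), round(f, 2))
```

Read this way, the cat row in the report above (precision 0.53, recall 0.24) says that the shallow model misses most of the cat images. Because every class has 1,000 test images, the overall accuracy is also the mean of the recall column.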
MiniGoogleNet using Keras functional
# convolution module consists of a convolution layer and batch normalization and a relu activation
def conv_module(x, K, kX, kY, stride, chan_dim, padding='same'):
    x = tk.layers.Conv2D(K, (kX, kY), strides=stride, padding=padding)(x)
    x = tk.layers.BatchNormalization(axis=chan_dim)(x)
    x = tk.layers.Activation('relu')(x)
    return x
# each inception module contains a 1x1 convolution, a 3x3 convolution, and then concatenates those layers
def inception_module(x, numK1x1, numK3x3, chan_dim):
    conv_1x1 = conv_module(x, numK1x1, 1, 1, (1, 1), chan_dim)
    conv_3x3 = conv_module(x, numK3x3, 3, 3, (1, 1), chan_dim)
    x = tk.layers.concatenate([conv_1x1, conv_3x3], axis=chan_dim)
    return x
# each downsampling module contains a convolution and a maxpooling then concatenates them
def downsample_module(x, K, chan_dim):
    conv_3x3 = conv_module(x, K, 3, 3, (2,2), chan_dim, padding='valid')
    pool = tk.layers.MaxPooling2D((3,3), strides=(2,2))(x)
    x = tk.layers.concatenate([conv_3x3, pool], axis=chan_dim)
    return x
def minigooglenet_functional(height, width, depth, classes):
    input_shape = (height, width, depth)
    chan_dim = -1
    #input and first convolution module
    inputs = tk.layers.Input(shape=input_shape)
    x = conv_module(inputs, 96, 3, 3, (1,1), chan_dim)
    #two inception modules before a downsampling
    x = inception_module(x, 32, 32, chan_dim)
    x = inception_module(x, 32, 48, chan_dim)
    x = downsample_module(x, 80, chan_dim)
    #four inception modules before a downsampling
    x = inception_module(x, 112, 48, chan_dim)
    x = inception_module(x, 96, 64, chan_dim)
    x = inception_module(x, 80, 80, chan_dim)
    x = inception_module(x, 48, 96, chan_dim)
    x = downsample_module(x, 96, chan_dim)
    #two more inception modules and then an averaging pool and dropout
    x = inception_module(x, 176, 160, chan_dim)
    x = inception_module(x, 176, 160, chan_dim)
    x = tk.layers.AveragePooling2D((7,7))(x)
    x = tk.layers.Dropout(0.5)(x)
    #lastly flatten and apply softmax
    x = tk.layers.Flatten()(x)
    x = tk.layers.Dense(classes)(x)
    x = tk.layers.Activation('softmax')(x)
    #create model to return
    model = tk.models.Model(inputs, x, name='')
    return model
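To see why the final average pool uses a 7x7 window, it helps to follow the tensor shape through the network. The sketch below is a plain-Python trace, not Keras code: valid_out and trace_minigooglenet are hypothetical helpers written for this illustration. It assumes a 32x32 input, that 'same' convolutions with stride 1 keep the spatial size, and that the max pooling and average pooling layers use the Keras default 'valid' padding (with the average pool's stride defaulting to its window size).

```python
def valid_out(size, kernel, stride):
    """Spatial output size of a 'valid' convolution or pooling layer along one dimension."""
    return (size - kernel) // stride + 1

def trace_minigooglenet(size=32):
    """Follow (layer, spatial size, channels) through the layers listed above."""
    shapes = []
    channels = 96                                  # first conv module
    shapes.append(("conv", size, channels))
    for k1, k3 in [(32, 32), (32, 48)]:
        channels = k1 + k3                         # inception concatenates both branches
        shapes.append(("inception", size, channels))
    size = valid_out(size, 3, 2)                   # conv and max pool both shrink the same way
    channels = 80 + channels                       # conv filters + pooled input channels
    shapes.append(("downsample", size, channels))
    for k1, k3 in [(112, 48), (96, 64), (80, 80), (48, 96)]:
        channels = k1 + k3
        shapes.append(("inception", size, channels))
    size = valid_out(size, 3, 2)
    channels = 96 + channels
    shapes.append(("downsample", size, channels))
    for k1, k3 in [(176, 160), (176, 160)]:
        channels = k1 + k3
        shapes.append(("inception", size, channels))
    size = valid_out(size, 7, 7)                   # 7x7 average pooling
    shapes.append(("avgpool", size, channels))
    return shapes

for name, size, channels in trace_minigooglenet():
    print(f"{name:<10} {size}x{size}x{channels}")
```

Under these assumptions the two downsampling modules take the feature map from 32x32 to 15x15 and then to 7x7, so the 7x7 average pool reduces it to a single 336-value vector per image before the dropout and dense layers.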
Training MiniGoogleNet
# hyperparameters
init_lr = 1e-2
batch_size = 128
num_epochs = 30
# create model, create optimizer, compile model
model_mgn = minigooglenet_functional(32, 32, 3, testY.shape[1])
opt = SGD(learning_rate=init_lr, momentum=0.9, decay=init_lr/num_epochs)
model_mgn.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
# create tensorboard callback, writing each run to its own timestamped directory
log_dir = 'logs/fit/' + datetime.datetime.now().strftime('%Y%m%d-%H%M%S')
tb_callback = tk.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
# train model
H_mgn = model_mgn.fit_generator(
    aug.flow(trainX, trainY, batch_size=batch_size),
    validation_data=(testX, testY),
    steps_per_epoch=len(trainX) // batch_size,
    epochs=num_epochs,
    callbacks=[tb_callback])
label_names = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]
predictions = model_mgn.predict(testX, batch_size=batch_size)
print(classification_report(testY.argmax(axis=1), predictions.argmax(axis=1), target_names=label_names))
              precision    recall  f1-score   support
    airplane       0.80      0.89      0.84      1000
  automobile       0.86      0.97      0.91      1000
        bird       0.63      0.86      0.72      1000
         cat       0.80      0.62      0.69      1000
        deer       0.79      0.79      0.79      1000
         dog       0.94      0.53      0.68      1000
        frog       0.71      0.96      0.81      1000
       horse       0.96      0.71      0.82      1000
        ship       0.93      0.91      0.92      1000
       truck       0.91      0.89      0.90      1000
    accuracy                           0.81     10000
   macro avg       0.83      0.81      0.81     10000
weighted avg       0.83      0.81      0.81     10000
Checking out the Tensorboard
Below are the epoch accuracy and epoch loss on the training and test data for both models.
Red shows the training performance of the MiniGoogleNet, and light blue shows its validation performance.
Orange shows the training performance of the ShallowNet, and regular blue shows its validation performance. Note that this model performs better on the validation data than on the training data, which suggests it is underfit and that its complexity could be increased. Part of that gap may also come from the training metrics being computed on augmented images, while validation uses the unmodified test images. The MiniGoogleNet is indeed more complex, and it does perform better.
