This is the 7th post of a series exploring TensorFlow. The primary source of material used is the Udacity course "Intro to TensorFlow for Deep Learning" by TensorFlow. My objective is to document some of the things I learn along the way and perhaps interest you in a similar journey. In a follow-on series I intend to cover TensorFlow Lite using a Raspberry Pi.
Recap
The last post completed my coverage of how image classification works in TensorFlow. In the very first post I complained that "much of the published information for beginners is little more than recipes where arcane Linux commands requiring hours of error prone entry are made without discussion of purpose". I then stated my objectives to develop:
- some understanding of how deep learning works
- the ability to create models from scratch
- a feel for what can go wrong and how to recognize and correct it
The first six posts covered the first objective: how deep learning works and how TensorFlow is used. I am still working on the last objective. In this post I will outline my current "recipe" for creating models from scratch, which addresses the second. It should be noted that this recipe is based on the training in the Udacity course, supplemented by things I have found important. I strongly recommend the course as it is free and taught by experts. The basic steps are as follows:
- Gather Images
- Import Packages
- Data Loading
- Data Augmentation
- Create the CNN
- Compile the Model
- Train the Model
- Analyze the Output
Gather Images
The following is only covered lightly in the Udacity course. Gathering the images can easily be the most time consuming part of the exercise if they aren't already in a carefully collected and massaged set. In this case a dataset will be developed from "scratch" - that is I will take the photographs myself and prepare them. When collecting images keep the following in mind:
- Larger photos (i.e. more pixels) can give higher accuracy but there is a trade-off with speed. I haven't found much improvement in the images I have examined with more than 150 x 150 pixels, although that could vary depending on the subject. The photos used here were resized to 200 x 200 in Photoshop and then resized again to 150 x 150 in TensorFlow (a batch-resize sketch follows this list).
- Bigger datasets are better for generalized recognition - e.g. 1000 images or more for each class. However, I am interested in small datasets and what can be done to make them accurate even if they are less general. The classes in this set contain roughly 25 to 50 images each.
- Lighting, background clutter, partial view, centering, rotation, size, etc. can all be important in the training set, especially for generalized recognition. As an example, in this dataset I have attempted to "fool" the AI by varying the size of objects that have the same aspect ratios. For better accuracy this would be avoided.
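For reference, here is a minimal sketch of how the batch resizing could be scripted instead of being done in Photoshop. This is my own addition rather than part of the original workflow, and the folder names are placeholders.

```python
# Hypothetical helper: batch-resize raw photos to 200 x 200 with Pillow
# before uploading them. The folder names here are placeholders.
import os
from PIL import Image

src_dir = 'raw_photos'       # assumed folder of original photos
dst_dir = 'resized_photos'   # assumed output folder
os.makedirs(dst_dir, exist_ok=True)

for name in os.listdir(src_dir):
    if name.lower().endswith('.jpg'):
        img = Image.open(os.path.join(src_dir, name))
        img.resize((200, 200)).save(os.path.join(dst_dir, name))
```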
The dataset consists of images taken of various batteries that were on hand. They were taken with an older model Canon PowerShot S95 camera hand held in poor lighting but with a clean background. Camera angle was varied and the batteries rotated and turned to get varied views. This is not a good dataset for training for any number of reasons but I may build on it and in any case wanted to use my own set for educational purposes. The images used were placed in separate folders by class as follows:
AA Batteries
AAA Batteries
C Batteries
CR2 Batteries
D Batteries
9V Batteries
Note that since AA and AAA batteries have similar aspect ratios, and I have not kept the scale constant, we can expect the model to have trouble with them. The C and D batteries might have the same problem since their brands and markings are very similar. My prediction at the outset was that the model would not be very accurate.
Import Packages
I am using Google's Colab, which runs in a virtual machine in the cloud, to train models. This has the advantage of being quite fast regardless of the PC being used. TensorFlow is accessed via the Keras API using Python 3. To start, import the following packages:
```python
from __future__ import absolute_import, division, print_function, unicode_literals

import os
import numpy as np
import glob
import shutil
import matplotlib.pyplot as plt
```
The os package is used to read files and the directory structure, numpy performs the matrix operations, glob and shutil are used to find and move the image files, and matplotlib.pyplot is used to graph the output.
Next, TensorFlow and the Keras layers used to build the Convolutional Neural Network are loaded. We use the latest version of TensorFlow 2.x.
```python
try:
    # Use the %tensorflow_version magic if in Colab.
    %tensorflow_version 2.x
except Exception:
    pass

import tensorflow as tf

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, Dropout, MaxPooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator
```
Data Loading
The portion of the following which covers uploading files from your computer and extracting them is not covered in the Udacity training. First, zip all the files for uploading. The file structure used will be outlined further below. Then select the "Files" tab in Colab which will expose an upload option. Click on it and upload your dataset (in this case Battery.zip) as shown in the screenshot below.
You will get a reminder that uploaded files are deleted at the end of a session, which means they will have to be uploaded again every time a new session is started. Wait for the upload to finish before proceeding; the status is indicated at the lower left and this can take a while. Press refresh and the zip file will be visible. The dataset is then extracted with the following code.
```python
from zipfile import ZipFile

file_name = "Battery.zip"

!pwd

with ZipFile(file_name, 'r') as zip:
    zip.extractall()
    print('Dataset extracted')

!ls

base_dir = '/content/Battery'
```
When run, you should see that you are in the /content directory, that the dataset has been extracted, and that the following files / folders are present: Battery, Battery.zip, sample_data. Battery.zip is the file we uploaded and sample_data is a folder with Colab samples. The folder that we want is Battery and, as noted above, it contains 6 battery classes in folders:
- AA
- AAA
- C
- CR2
- D
- V9
Labels are created for the 6 classes in Python.
```python
classes = ['AA', 'AAA', 'C', 'CR2', 'D', 'V9']
```
The extracted dataset has the following directory structure:
Battery
|__ AA
|__ AAA
|__ C
|__ CR2
|__ D
|__ V9
In the code that follows we create a `train` and a `val` folder each containing 6 folders (one for each type of battery). It then moves the images from the original folders to these new folders such that 80% of the images go to the training set and 20% of the images go into the validation set, these being typical values. In the end our directory will have the following structure:
Battery
|__ AA
|__ AAA
|__ C
|__ CR2
|__ D
|__ V9
|__ train
|______ AA: [1.jpg, 2.jpg, 3.jpg ....]
|______ AAA: [1.jpg, 2.jpg, 3.jpg ....]
|______ C: [1.jpg, 2.jpg, 3.jpg ....]
|______ CR2: [1.jpg, 2.jpg, 3.jpg ....]
|______ D: [1.jpg, 2.jpg, 3.jpg ....]
|______ V9: [1.jpg, 2.jpg, 3.jpg ....]
|__ val
|______ AA: [1.jpg, 2.jpg, 3.jpg ....]
|______ AAA: [1.jpg, 2.jpg, 3.jpg ....]
|______ C: [1.jpg, 2.jpg, 3.jpg ....]
|______ CR2: [1.jpg, 2.jpg, 3.jpg ....]
|______ D: [1.jpg, 2.jpg, 3.jpg ....]
|______ V9: [1.jpg, 2.jpg, 3.jpg ....]
The original class folders are not deleted, but they are left empty. The code below also prints the total number of images we have for each type of battery.
```python
for cl in classes:
    img_path = os.path.join(base_dir, cl)
    images = glob.glob(img_path + '/*.jpg')
    print("{}: {} Images".format(cl, len(images)))

    # 80% of each class goes to train, the remainder to val
    num_train = int(round(len(images) * 0.8))
    train, val = images[:num_train], images[num_train:]

    for t in train:
        if not os.path.exists(os.path.join(base_dir, 'train', cl)):
            os.makedirs(os.path.join(base_dir, 'train', cl))
        shutil.move(t, os.path.join(base_dir, 'train', cl))

    for v in val:
        if not os.path.exists(os.path.join(base_dir, 'val', cl)):
            os.makedirs(os.path.join(base_dir, 'val', cl))
        shutil.move(v, os.path.join(base_dir, 'val', cl))
```
The following sets up the path for the training and validation sets.
```python
train_dir = os.path.join(base_dir, 'train')
print(train_dir)
val_dir = os.path.join(base_dir, 'val')
print(val_dir)

train_AA_dir = os.path.join(train_dir, 'AA')
print(train_AA_dir)
train_AAA_dir = os.path.join(train_dir, 'AAA')
print(train_AAA_dir)
train_C_dir = os.path.join(train_dir, 'C')
print(train_C_dir)
train_CR2_dir = os.path.join(train_dir, 'CR2')
print(train_CR2_dir)
train_D_dir = os.path.join(train_dir, 'D')
print(train_D_dir)
train_V9_dir = os.path.join(train_dir, 'V9')
print(train_V9_dir)
```
Data Augmentation
Overfitting is more likely to occur with a small number of samples, and data augmentation seeks to improve this by providing additional data, based on the originals, with random transformations applied. One of the transformations that can be applied is resizing the image. This is a user variable that can be tweaked to improve performance at the cost of speed. In the example code below the 200 x 200 pixel images in the input dataset are resized to 150 x 150, which seems a reasonable size for these images.
```python
batch_size = 128
IMG_SHAPE = 150
```
```python
image_gen_train = ImageDataGenerator(
    rescale=1./255,
    rotation_range=45,
    width_shift_range=.15,
    height_shift_range=.15,
    horizontal_flip=True,
    zoom_range=0.5
)

train_data_gen = image_gen_train.flow_from_directory(
    batch_size=batch_size,
    directory=train_dir,
    shuffle=True,
    target_size=(IMG_SHAPE, IMG_SHAPE),
    class_mode='sparse'
)
```
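As a quick sanity check (my own addition, not from the course code above), one batch can be pulled from the training generator and a few of the augmented images plotted to confirm the transforms look reasonable.

```python
# Pull one batch of augmented images and display the first five.
sample_images, _ = next(train_data_gen)

fig, axes = plt.subplots(1, 5, figsize=(15, 3))
for img, ax in zip(sample_images[:5], axes):
    ax.imshow(img)   # images are already rescaled to [0, 1]
    ax.axis('off')
plt.tight_layout()
plt.show()
```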
Be careful when applying transforms. For example, it might not be appropriate to flip images, and if the images will always be horizontal it might not be appropriate to rotate them. The training images should always be shuffled. A more conservative configuration is sketched below.
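As an illustration (my own addition, not the configuration used in this post), a generator for photos that should never be flipped or rotated might look like this:

```python
# Hypothetical conservative augmentation: no flips or rotation,
# only mild shifts and zoom; shuffling is still enabled.
image_gen_conservative = ImageDataGenerator(
    rescale=1./255,
    width_shift_range=0.05,
    height_shift_range=0.05,
    zoom_range=0.1
)

conservative_data_gen = image_gen_conservative.flow_from_directory(
    batch_size=batch_size,
    directory=train_dir,
    shuffle=True,
    target_size=(IMG_SHAPE, IMG_SHAPE),
    class_mode='sparse'
)
```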
The validation set is also rescaled and resized, but data augmentation is generally not applied to it.
```python
image_gen_val = ImageDataGenerator(rescale=1./255)

val_data_gen = image_gen_val.flow_from_directory(
    batch_size=batch_size,
    directory=val_dir,
    target_size=(IMG_SHAPE, IMG_SHAPE),
    class_mode='sparse'
)
```
Create the CNN
We are finally ready to define our convolutional neural network. The architectural decisions we will be making are the number of layers, the number of filters, and the size of the filters. They are quite important, and I am quite sure what I am presenting here is suboptimal. Typically the number of filters starts small in the early layers and grows as the features captured by the convolutional layers become more complex. According to one source, the number of filters in a layer should be chosen from values like 32, 64, 128, 256, 512 and so on; perhaps this is to better suit the GPU or TPU. In this case I have elected to use four convolutional layers with filter counts of 32, 64, 128, and 256. I tried some more complex models with more layers and filters - that actually made things worse.
Remember that filters have odd dimensions (3x3, 5x5, etc.) since they need to be centered on the pixel being convolved. A 3x3 filter is usual, although larger ones from 5x5 up to 7x7 may work better on larger images. Max pooling layers typically have a pool size of (2, 2) and are applied after each convolutional layer.
After flattening, I chose to set dropout to 0.05. I experimented with values up to 0.5, but training seemed poorly behaved (the validation loss jumped all over the place) at the higher settings.
The model was given a single hidden dense layer of 512 neurons with activation set to 'relu' as usual. The number of dense layers and the neurons in them are important, but I did not tweak them in this example. The number of output elements is always set to the number of classes and its activation to 'softmax'.
```python
model = Sequential()

model.add(Conv2D(32, 3, padding='same', activation='relu',
                 input_shape=(IMG_SHAPE, IMG_SHAPE, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, 3, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(128, 3, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(256, 3, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dropout(0.05))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.1))

# Set the output layer size to the number of categories
model.add(Dense(6, activation='softmax'))
```
A concise model summary is obtained with the following code.
```python
model.summary()
```
Note that over 11 million parameters are being trained. I have some concern about the model being overcomplicated.
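As a rough check on where those parameters come from (my own back-of-the-envelope calculation, not from the course material): with 'same' padding each convolution preserves the spatial size and each max pool halves it, so 150 shrinks to 75, 37, 18, and finally 9. Flattening 9 x 9 x 256 gives 20,736 values, and the 512-neuron dense layer alone therefore contributes roughly 10.6 million weights, which is the bulk of the total.

```python
# Back-of-the-envelope parameter count for the big dense layer, assuming
# 'same' padding and four MaxPooling2D layers that each halve (floor) the size.
spatial = IMG_SHAPE              # 150
for _ in range(4):               # four conv + pool blocks
    spatial //= 2                # 150 -> 75 -> 37 -> 18 -> 9
flat = spatial * spatial * 256   # 9 * 9 * 256 = 20,736
dense_params = flat * 512 + 512  # weights + biases, roughly 10.6 million
print(spatial, flat, dense_params)
```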
Compile the Model
The Adam optimizer is state of the art and is the usual choice. The loss function is set to sparse categorical cross-entropy, which matches the integer labels produced by class_mode='sparse' in the generators. Passing the metrics argument will report accuracy during each epoch.
```python
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```
Train the Model
I chose to use 120 epochs to train the model, which overfits but gives a good indication of how the model performs. fit_generator is used because an ImageDataGenerator is supplying the batches of training and validation data. (In more recent TensorFlow 2.x releases model.fit accepts these generators directly and fit_generator is deprecated.)
```python
epochs = 120

history = model.fit_generator(
    train_data_gen,
    steps_per_epoch=int(np.ceil(train_data_gen.n / float(batch_size))),
    epochs=epochs,
    validation_data=val_data_gen,
    validation_steps=int(np.ceil(val_data_gen.n / float(batch_size)))
)
```
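If the overfitting beyond about 100 epochs becomes a concern, one common option (not used in this post or the course material) is to add an EarlyStopping callback so training stops once the validation loss stops improving. A minimal sketch:

```python
# Optional: stop training when validation loss fails to improve for 10 epochs
# and roll the model back to its best weights.
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=10,
                           restore_best_weights=True)

history = model.fit_generator(
    train_data_gen,
    steps_per_epoch=int(np.ceil(train_data_gen.n / float(batch_size))),
    epochs=epochs,
    validation_data=val_data_gen,
    validation_steps=int(np.ceil(val_data_gen.n / float(batch_size))),
    callbacks=[early_stop]
)
```

Note that if training stops early, the plotting code below should use range(len(acc)) rather than range(epochs) so the axis lengths match.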
Analyze the Output
The following code plots the training and validation accuracy / loss graphs.
```python
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs_range = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
```
The moment of truth...
The training accuracy seems pretty well behaved and steadily climbs to about 85% at 100 epochs. After that things get a bit rugged in both the training accuracy and the training loss. The validation accuracy is quite jagged (perhaps due to the small dataset) but tracks the training curve and peaks around 80% at about epoch 100. It is possible to get 100% accuracy on the training set if the number of epochs is set high enough (i.e. the model can memorize the dataset).
I am actually quite pleased, and a bit surprised, with the accuracy achieved on such a small dataset that was purposely made difficult.
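To see whether the predicted confusions (AA vs AAA, C vs D) actually occur, the trained model can be run over the validation images and a simple confusion matrix tallied. This is my own addition, not part of the original post; a fresh, unshuffled generator is used so the prediction order matches the generator's labels.

```python
# Tally a simple confusion matrix over the validation set.
# shuffle=False keeps predictions aligned with val_check_gen.classes.
val_check_gen = image_gen_val.flow_from_directory(
    batch_size=batch_size,
    directory=val_dir,
    shuffle=False,
    target_size=(IMG_SHAPE, IMG_SHAPE),
    class_mode='sparse'
)

preds = np.argmax(model.predict(val_check_gen), axis=1)

confusion = np.zeros((len(classes), len(classes)), dtype=int)
for true_label, pred_label in zip(val_check_gen.classes, preds):
    confusion[true_label, pred_label] += 1

# Rows = true class, columns = predicted class; folder names are read in
# alphabetical order, which matches the classes list defined earlier.
print(confusion)
```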
Conclusion and Look Ahead
I wish this were easier, but it is not. It is a subject where I would like to be in a formal classroom setting with an instructor to answer questions. All of the above makes more sense after going through the Udacity training linked below, if you haven't already done that. Currently the model runs and gives reasonable output from what I can tell, but there are many aspects of its behavior I don't fully understand. I plan to post the full version on GitHub and provide a link when I am comfortable with it. I will also be modifying or changing to a new dataset.
Coming up I plan to look at TensorFlow Lite and moving the model to run on a Raspberry Pi 4. As always, comments and corrections are appreciated.
Useful Links
RoadTest of Raspberry Pi 4 doing Facial Recognition with OpenCV
Picasso Art Deluxe OpenCV Face Detection
Udacity Intro to TensorFlow for Deep Learning
A Beginning Journey in TensorFlow #1: Regression
A Beginning Journey in TensorFlow #2: Simple Image Recognition
A Beginning Journey in TensorFlow #3: ReLU Activation
A Beginning Journey in TensorFlow #4: Convolutional Neural Networks
A Beginning Journey in TensorFlow #5: Color Images
A Beginning Journey in TensorFlow #6: Image Augmentation and Dropout