This is the first in a series exploring TensorFlow. Since I have no training in machine learning, it will not consist of tutorials but will include links to the material I am using. I plan to use my Windows PC to train models and ultimately a Raspberry Pi to do object classification. The main purpose is to document the things I learn along the way and perhaps interest you in a similar journey.
Background
I have been exploring facial recognition using OpenCV, both in the Picasso Design Challenge and in a recent RoadTest of the Raspberry Pi 4, and have become interested in object classification in general. This is an advanced topic, and much of the published information for beginners is little more than recipes: arcane Linux commands requiring hours of error-prone entry, followed by Python scripts, all given with little discussion of purpose. Some of the examples don't even show how to train the model - the user loads it up and it magically identifies coffee cups and such.
My objective is not to become an expert in the field but rather to develop the following:
- some understanding of how deep learning works
- the ability to create models from scratch
- a feel for what can go wrong and how to recognize and correct it
The starting place will be the free course Intro to TensorFlow for Deep Learning on Udacity, put together by Google.
REMINDER: I am not an expert and will almost certainly get some terminology wrong and/or make other errors. I will, however, try to post links to well-regarded material, so use that for learning rather than my blog. Corrections in the comments are always welcome.
Regression Analysis
TensorFlow can do two types of deep learning: regression and classification. Ultimately I want to do classification - e.g. given a set of photos or a live stream, classify the objects inside. An example might be: given a photo containing a bunch of through-hole resistors, identify their values. Regression, on the other hand, returns a single value given one or more inputs. An example model might estimate the value of a home given location, floor area, number of bedrooms, number of bathrooms, etc.
Regardless of whether it is regression or classification, we use TensorFlow as follows:
- Gather pairs of input values and output values
- Develop a model
- Train the model so that the data is fitted
- Apply the resulting algorithm in an application
Let's take a look at how the neural networks in TensorFlow work to perform regression, using a diagram from the Udacity training material. There can be one or more inputs associated with a single output for each entry in the examples used to train the model. In between there are one or more hidden layers. In a dense, fully connected model, each node in a layer is connected to every node in the previous layer.
The value of a node is calculated as shown. For example, node a1 is calculated as a1 = x1*w11 + x2*w12 + x3*w13 + b1, and so on for all other nodes. A loss function is defined for the output, commonly the mean squared error for regression. TensorFlow then tunes the weights (w) and the bias (b) for each node to minimize that error. From what I can see it uses a hunt-and-peck algorithm: the "optimizer" specifies the method, for example gradient descent (take the derivative of the loss and see which way the slope is going so the algorithm can follow it down), and a learning rate specifies how far it pecks away from the current values. The best practice currently, according to the training material, is the "Adam" optimizer. I am not a math nerd so forgive me :-).
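To make that concrete, here is a minimal NumPy sketch of the calculation for a single node, followed by one gradient descent step. The inputs, weights, bias, target, and learning rate are made-up numbers for illustration, not values from the course:

import numpy as np

# Inputs to the node and its (made-up) weights and bias
x = np.array([1.0, 2.0, 3.0])     # x1, x2, x3
w = np.array([0.5, -0.2, 0.1])    # w11, w12, w13
b = 0.4                           # b1

# a1 = x1*w11 + x2*w12 + x3*w13 + b1
a1 = np.dot(x, w) + b
print(a1)                         # 0.8

# One gradient descent step toward a target value t,
# using the squared error loss (a1 - t)**2
t = 1.0
learning_rate = 0.1
error = a1 - t
w = w - learning_rate * 2 * error * x   # d/dw of (a1 - t)**2 is 2*error*x
b = b - learning_rate * 2 * error       # d/db of (a1 - t)**2 is 2*error

Repeating that small update over and over, across all of the training examples, is essentially what training does.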
The concept isn't that difficult, but the math is pretty intimidating for a large model. Fortunately we don't have to get too deep into that. Google provides an API to TensorFlow called Keras which simplifies things, and we will be using that along with some Python to solve problems. Given the relative simplicity of the concept and the scant code required to implement a model, I find it pretty amazing that neural networks work as well as they do.
The Simplest Regression Problem
If we reduce the diagram above down to the simplest possible network we get the following diagram taken from the Udacity training material:
Their example is a model to convert Celsius to Fahrenheit. This is of course trivial and exact, so it is not the type of problem where machine learning would normally be used, but it is instructive. The training set consists of Celsius values with their corresponding Fahrenheit values. There is one input layer, one hidden layer, and one output layer. Looking at the resulting weight and bias for the node in the hidden layer, we get a1 = x1*w11 + b1. After training the model, this is also the algorithm for predicting outputs. It is of course the equation for a straight line - where I went to school it was usually given as y = mx + b, where m is the slope and b is the intercept. And of course the equation for conversion between Celsius and Fahrenheit is:
Fahrenheit = 1.8 * Celsius + 32
The Udacity course gives a detailed explanation of how the model is developed, so I will not repeat it. In addition to video there is a working Jupyter notebook, hosted on Google Colab, that provides TensorFlow, NumPy, Matplotlib, etc. which you can play with. It does require some knowledge of Python, another area where I am not an expert.
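For reference, the core of that notebook boils down to something like the following sketch. It is based on my reading of the Udacity material; the exact data values and variable names here are mine:

import numpy as np
import tensorflow as tf

# Training pairs: Celsius inputs with their Fahrenheit outputs
celsius    = np.array([-40, -10,  0,  8, 15, 22,  38], dtype=float)
fahrenheit = np.array([-40,  14, 32, 46, 59, 72, 100], dtype=float)

# A single Dense node with one input, so a1 = x1*w11 + b1
layer0 = tf.keras.layers.Dense(units=1, input_shape=[1])
model = tf.keras.Sequential([layer0])
model.compile(loss='mean_squared_error',
              optimizer=tf.keras.optimizers.Adam(0.1))
model.fit(celsius, fahrenheit, epochs=500, verbose=False)

print(model.predict(np.array([[100.0]])))  # should be close to 212
print(layer0.get_weights())                # weight near 1.8, bias near 32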
Interestingly, the result from Colab with 500 epochs (epochs are passes through the dataset) is the following:
Fahrenheit = 1.8229 * Celsius + 29.0338
At first glance that might not look too good, but remember that the model has no idea what the values are when it starts. I expected it to be closer after 500 epochs, but the loss function is flattening and I did not achieve further improvement even after increasing the number of epochs to 2000. In the training, they increase the number of layers and nodes, which does provide some improvement, but that raises an issue to remember with these types of networks: for a complicated problem we have no idea what the underlying function is, there is often noise in the data, and we have no idea what the optimal number of nodes and layers is. It is a bit of an art to find that.
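For comparison, adding layers and nodes in Keras looks something like this. The layer sizes here are illustrative; I believe the course uses something similar, but check the notebook for the exact values:

import tensorflow as tf

# The same problem, but with two hidden layers of four nodes each
model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=4, input_shape=[1]),
    tf.keras.layers.Dense(units=4),
    tf.keras.layers.Dense(units=1)
])
model.compile(loss='mean_squared_error',
              optimizer=tf.keras.optimizers.Adam(0.1))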
Life Expectancy as a Function of Age
In order to test my knowledge, I developed my own Python model from scratch, based on the Udacity training material. It uses data on life expectancy for the year 2007 from the US National Center for Health Statistics.
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import logging

logger = tf.get_logger()
logger.setLevel(logging.ERROR)

# The training data is life expectancy for all sexes, ages, and races in the
# United States. Source: US National Center for Health Statistics,
# National Vital Statistics Report 2007
age = np.array([1, 2, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50,
                55, 60, 65, 70, 75, 80, 85, 90, 95, 100], dtype=float)
lifeExpectancy = np.array([77.9, 77.5, 73.6, 68.6, 63.7, 58.8, 54.1, 49.4,
                           44.6, 39.9, 35.4, 30.9, 26.7, 22.5, 18.6, 15.0,
                           11.7, 8.8, 6.5, 4.6, 3.2, 2.3], dtype=float)

# Build the layers
layer0 = tf.keras.layers.Dense(units=1, input_shape=[1])

# Assemble the layers into the model
lifeExpectancyModel = tf.keras.Sequential([layer0])

# Compile the model
lifeExpectancyModel.compile(loss='mean_squared_error',
                            optimizer=tf.keras.optimizers.Adam(0.1))

# Train the model
history = lifeExpectancyModel.fit(age, lifeExpectancy, epochs=2000, verbose=False)
print("Finished training the model")

# Display the training statistics
plt.figure()
plt.xlabel('Epoch Number')
plt.ylabel('Loss Magnitude')
plt.plot(history.history['loss'])

# The weights are the slope and intercept of a straight line
print("The weights are: {}".format(layer0.get_weights()))

# Create predicted values from the model and plot them against the actual data
model = lifeExpectancyModel.predict(age)
plt.figure()
plt.xlabel('Age')
plt.ylabel('Life Expectancy')
plt.plot(age, lifeExpectancy, 'o', age, model)
plt.show()
A couple of points about this exercise:
- The data is not linear but I will use a linear fit. It is also possible to fit a polynomial or other curve to the data.
- In real life, with such a simple set of data, we would probably use a table and interpolate (see the sketch below).
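As an aside, NumPy can do both the table interpolation and a polynomial fit in a line or two. This sketch reuses the age and lifeExpectancy arrays from the script above:

import numpy as np

age = np.array([1, 2, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50,
                55, 60, 65, 70, 75, 80, 85, 90, 95, 100], dtype=float)
lifeExpectancy = np.array([77.9, 77.5, 73.6, 68.6, 63.7, 58.8, 54.1, 49.4,
                           44.6, 39.9, 35.4, 30.9, 26.7, 22.5, 18.6, 15.0,
                           11.7, 8.8, 6.5, 4.6, 3.2, 2.3], dtype=float)

# Table lookup with linear interpolation between the 40 and 45 entries
print(np.interp(42.0, age, lifeExpectancy))   # 38.1

# Least-squares fit of a second-degree polynomial
coefficients = np.polyfit(age, lifeExpectancy, 2)
print(np.polyval(coefficients, 42.0))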
I ran 2000 epochs, but the mean squared error had settled by 1500.
And here is the resulting straight line regression in yellow plotted along with the blue data.
So, again, this is not a particularly useful example other than for understanding how to create a regression model. One thing to be aware of is that models like this can get stuck on local minima and thus not minimize the error. I need to look into that some more.
Conclusion
Next up, a look at classification using a training set with photos. Please check out the free Udacity training if you are interested in learning from the experts and in more detail. As always, comments and corrections are welcome.
Useful Links
RoadTest of Raspberry Pi 4 doing Facial Recognition with OpenCV
Picasso Art Deluxe OpenCV Face Detection
Udacity Intro to TensorFlow for Deep Learning
A Beginning Journey in TensorFlow #1: Regression
A Beginning Journey in TensorFlow #2: Simple Image Recognition
A Beginning Journey in TensorFlow #3: ReLU Activation
A Beginning Journey in TensorFlow #4: Convolutional Neural Networks
A Beginning Journey in TensorFlow #5: Color Images
A Beginning Journey in TensorFlow #6: Image Augmentation and Dropout