This is the third in a series exploring TensorFlow. The primary source of material used is the Udacity course "Intro to TensorFlow for Deep Learning" by TensorFlow. My objective is to document the things I learn along the way and perhaps interest you in a similar journey. I have had very little time to work on this project recently but hope to be more productive in the coming weeks. If I get a project working in time, I will make an entry in the Project14 Vision Thing Competition.
Recap
In the first post it was explained that TensorFlow can do two types of deep learning: Regression and Classification. The post focused on regression, and in particular on linear regression. We learned that a neural network is composed of layers of connected nodes. The inputs to a node are multiplied by weights and a bias is added. An optimizer called "Adam" was used to guide the gradient descent (or, as I think of it, iteration to try to minimize error) and select the weights.
Figure 1 Neural Network model from Udacity Intro to TensorFlow for Deep Learning
In the second post the model was extended to recognize and classify images. In regression, the model gives a single output. In classification, the model assesses the probability that an input, such as an image, is a member of each of the classes in the model. The model used determined the probability of an input being one of ten different articles of clothing. Several new concepts were introduced, including flattening of layers and activation functions.
In this post I will back up a bit and cover the activation function. This brief discussion is being added because it was barely covered in the Udacity training and is based on this article by Jason Brownlee: A Gentle Introduction to the Rectified Linear Unit (ReLU). In general I have found all the posts and material by Jason to be good.
Activation Functions and ReLU
The activation function transforms the summed and weighted input to a node as shown in Figure 1 above into the output for the node. There are a number of different activation functions available. In the past the hyperbolic tangent and sigmoid were popular. Remember for example that the hyperbolic tangent looks like this:
Figure 2 Hyperbolic Tangent
It introduces nonlinearity and has a readily calculated derivative, but it has been shown to be inferior in many instances to the simple Rectified Linear Activation, or ReLU. The simplicity of ReLU allows it to work faster and it is less likely to incur the vanishing gradient problem. Vanishing gradients occur when there are many layers and the derivatives of activation functions like the hyperbolic tangent become very small, so the gradients propagated back through the network approach zero and the weights effectively stop updating.
But why use an activation function at all? It was seen in the first post that the basic model finds weights and bias for linear equations. Many problems are nonlinear, and among other things ReLU gives the model a way to address nonlinearity. In its simplest form, ReLU returns 0 if the weighted sum is negative or zero, and the weighted sum itself if it is greater than zero. In graphical form it looks like this:
Figure 3 ReLU Output
Or, if you are more into code:

def relu(x):
    return x if x > 0 else 0
So, the derivative of the function is 1 for positive inputs and 0 for negative inputs. At exactly zero the derivative is undefined, but it is conventionally taken to be zero.
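To see why this helps with the vanishing gradient problem, here is a minimal sketch (my own, not from the course) that uses TensorFlow's GradientTape to compare the gradients of the hyperbolic tangent and ReLU for a large input; the value 5.0 is just an illustrative choice:

import tensorflow as tf

x = tf.constant(5.0)

# Record both activations so their gradients with respect to x can be compared
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    y_tanh = tf.math.tanh(x)
    y_relu = tf.nn.relu(x)

print(tape.gradient(y_tanh, x).numpy())  # roughly 0.0002 - nearly vanished
print(tape.gradient(y_relu, x).numpy())  # 1.0 - unchanged

When many layers multiply small gradients like the tanh value together, the updates reaching the early layers become vanishingly small; the constant gradient of 1 for positive inputs is what lets ReLU avoid this.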
ReLU introduces nonlinearity, but it might seem there is a loss of information due to its simple nature. Apparently selecting the correct number of layers and nodes allows the model to overcome that. ReLU has become the default activation function for neural networks. It does have limitations but is said to almost always give improved results over other methods.
Applying ReLU to Regression
In the training material ReLU is introduced as a method for classification of images, particularly where there are many layers. As an exercise I thought it might be good to also look back at the Life Expectancy as a function of Age regression done in the first blog post. Remember that this problem is not a particularly good application for machine learning, but we are using it as a learning exercise. The resulting straight-line regression looked like this:
Figure 4 Linear Regression Model of Life Expectancy Data
ReLU can be applied to the original linear regression model by modifying the layer description as follows:
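The exact code from that post is not repeated here, but the sketch below shows the kind of change involved, assuming a Keras Sequential model with a single Dense layer as in the first post; the two 64-unit hidden layers and the learning rate are illustrative choices, not the precise values used:

import tensorflow as tf

# Hidden Dense layers with ReLU activation are inserted ahead of the
# original single-unit output layer (layer sizes are illustrative)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=64, activation='relu', input_shape=[1]),
    tf.keras.layers.Dense(units=64, activation='relu'),
    tf.keras.layers.Dense(units=1)
])

model.compile(loss='mean_squared_error',
              optimizer=tf.keras.optimizers.Adam(0.1))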
The resulting plot with ReLU activation looks like this:
Figure 5 Regression Model of Life Expectancy Data using ReLU Activation
The non-linear impact is apparent, the fit is better, and the life expectancy is no longer as negative for higher ages. However, the resulting curve is not intuitive to me and it is clear that I need more experience (or experimentation) in the use of Activation Functions, especially with complicated models.
Applying ReLU to Image Categorization
In the second post ReLU was used as the activation function; training took about 2 minutes and reached 89% accuracy on the training dataset. The test dataset took 3 seconds to evaluate and had 87% accuracy. When ReLU was removed, training still took about 2 minutes but training accuracy was reduced to 85% and test accuracy to 83%. ReLU clearly improved the model with no adverse impact on speed.
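For reference, a minimal sketch of the kind of classifier used in that comparison is shown below; removing activation='relu' from the hidden layer gives the model without ReLU. The 128-unit hidden layer and the 28 x 28 input shape are assumptions based on the clothing (Fashion MNIST) setup, not code copied from the second post:

import tensorflow as tf

# Clothing classifier with a ReLU hidden layer; dropping the activation
# argument from the hidden Dense layer gives the comparison model
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])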
Conclusion
A simplified discussion of ReLU was followed by example applications. The regression example was contrived and not representative of a real application, but the image categorization example used a real dataset and demonstrated the improvements that can result - in this case the test accuracy with ReLU improved by 4 percentage points, from 83% to 87%. In general, ReLU is recommended for categorization using neural networks.
Please check out the free Udacity training if you are interested in learning from the experts. As always, comments and corrections are appreciated.
Useful Links
A Beginning Journey in TensorFlow #1: Regression
A Beginning Journey in TensorFlow #2: Simple Image Recognition
A Beginning Journey in TensorFlow #4: Convolutional Neural Networks
A Beginning Journey in TensorFlow #5: Color Images
A Beginning Journey in TensorFlow #6: Image Augmentation and Dropout
RoadTest of Raspberry Pi 4 doing Facial Detection