A Beginning Journey in TensorFlow #5: Color Images

fmilburn
19 Oct 2019

This is the 5th post in a series exploring TensorFlow.  The primary source of material used is the Udacity course "Intro to TensorFlow for Deep Learning" by TensorFlow.  My objective is to document some of the things I learn along the way and perhaps interest you in a similar journey.

 

Recap

 

Previous posts discussed regression, grey scale image classification, and convolutional neural networks.  The images used were low resolution, of uniform size, and each represented as a two dimensional matrix.

image

In this post the convolutional neural network concept will be expanded to cover images of different sizes and color images.

 

Images of Different Size

 

The neural networks used in this series require all input images to be the same size.  Previously the images were all low resolution grey scale images of 28 x 28 = 784 pixels, which were then flattened.  Images drawn from different sources are often of different sizes, so resizing is necessary.  In the example below, taken from the Udacity training, a large image is resized down to 150 x 150 pixels.

image

In a previous post on facial recognition I explored the impact of image size on classification accuracy.  In general, larger images should give better accuracy, but this must be traded off against speed.  Note that the image above has been "squished" in the horizontal dimension.  While this can be done, in this case it would be better to crop to a square image first and then resize.
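To illustrate the crop-then-resize idea, here is a minimal sketch in plain Python that works out the largest centered square in an image of a given width and height.  The function name center_crop_box is my own and is not from the Udacity course.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)          # the square is limited by the shorter edge
    left = (width - side) // 2         # trim equally from both sides
    top = (height - side) // 2
    return (left, top, left + side, top + side)


# A 300 x 200 landscape image loses 50 pixels from each side.
print(center_crop_box(300, 200))  # (50, 0, 250, 200)
```

A box in this (left, top, right, bottom) form is what an image library such as Pillow accepts in Image.crop, after which the square result can be resized to 150 x 150 without distortion.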

 

Color Image Representation

 

Color RGB images have 3 channels:  Red, Green, and Blue.  They can be represented as a 3 dimensional matrix (height x width x channels) as shown in the much simplified slide below taken from the Udacity course.

image
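As a rough illustration of this layout, the sketch below stores a tiny 2 x 2 RGB image as nested Python lists in height x width x channel order and pulls out the red channel.  The pixel values and the helper names shape and channel are made up for the example.

```python
# A tiny 2 x 2 RGB image stored as height x width x channels.
image = [
    [[255, 0, 0], [0, 255, 0]],        # row 0: red pixel, green pixel
    [[0, 0, 255], [255, 255, 255]],    # row 1: blue pixel, white pixel
]


def shape(img):
    """Return (height, width, channels) of a nested-list image."""
    return (len(img), len(img[0]), len(img[0][0]))


def channel(img, c):
    """Extract channel c (0 = red, 1 = green, 2 = blue) as a 2D matrix."""
    return [[pixel[c] for pixel in row] for row in img]


print(shape(image))       # (2, 2, 3)
print(channel(image, 0))  # [[255, 0], [0, 255]]
```

The 150 x 150 images used later in this post follow the same idea, which is why the model input shape is (150, 150, 3).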

 

Convolution

 

For a color image a 3D filter (aka kernel) can be defined in a similar manner to what was done for grey scale, but it is made up of one 2D filter for each channel.

image

The convolution is then applied channel by channel.  For each pixel position the results from the three channels are summed, and a bias, which like the filter weights is learned during training, is added to give the convolved output.  The edges can be padded as shown in the example below.

image
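The steps above can be sketched in plain Python.  This is only an illustration under my own assumptions: the image is stored channel first (a list of three 2D matrices), zero padding is used so the output is the same size as the input, and the function names pad and convolve_rgb are hypothetical.  A real model would use tf.keras.layers.Conv2D instead.

```python
def pad(channel, p=1):
    """Surround a 2D channel with p rows and columns of zeros."""
    width = len(channel[0]) + 2 * p
    rows = [[0] * width for _ in range(p)]
    for row in channel:
        rows.append([0] * p + list(row) + [0] * p)
    rows.extend([0] * width for _ in range(p))
    return rows


def convolve_rgb(image, kernel, bias=0):
    """image: 3 channels, each H x W.  kernel: 3 filters, each k x k (k odd).

    Returns one H x W output: per-channel results are summed, then the bias is added.
    """
    k = len(kernel[0])
    padded = [pad(ch, k // 2) for ch in image]
    h, w = len(image[0]), len(image[0][0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            total = bias
            for ch, filt in zip(padded, kernel):      # one 2D filter per channel
                for di in range(k):
                    for dj in range(k):
                        total += ch[i + di][j + dj] * filt[di][dj]
            row.append(total)
        out.append(row)
    return out


ones_image = [[[1, 1], [1, 1]] for _ in range(3)]      # 2 x 2 image, all channels 1
ones_kernel = [[[1] * 3 for _ in range(3)] for _ in range(3)]
print(convolve_rgb(ones_image, ones_kernel, bias=1))   # [[13, 13], [13, 13]]
```

Each output value is 13 because every 3 x 3 window covers the four real pixels in each of the three channels (4 x 3 = 12), the padded zeros contribute nothing, and the bias adds 1.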

It is common to apply more than one filter to the image.  In this case the convolved output is multi-dimensional with the depth equal to the number of filters.  In the example below the depth is 3.

image

During training, the filter values are updated to minimize the loss function.

 

Max pooling is also used and works in the same way as for grey scale, applied to each channel separately.  In the example below a 2x2 window is moved across with a stride of 2 which downsizes the 4x4x3 matrix to a 2x2x3 matrix.

image
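A small pure Python sketch of the pooling step follows.  It assumes the data is stored as height x width x channel nested lists, and the function name max_pool and the test values are my own.

```python
def max_pool(volume, size=2, stride=2):
    """volume: H x W x C nested lists.  Each channel is pooled independently."""
    h, w, c = len(volume), len(volume[0]), len(volume[0][0])
    out = []
    for i in range(0, h - size + 1, stride):
        row = []
        for j in range(0, w - size + 1, stride):
            # Keep the largest value in the window, separately for each channel.
            row.append([
                max(volume[i + di][j + dj][ch]
                    for di in range(size) for dj in range(size))
                for ch in range(c)
            ])
        out.append(row)
    return out


# 4 x 4 x 3 volume whose value at (row, col, channel) is row*4 + col + 100*channel.
volume = [[[r * 4 + col + 100 * ch for ch in range(3)]
           for col in range(4)] for r in range(4)]
pooled = max_pool(volume)
print(len(pooled), len(pooled[0]), len(pooled[0][0]))  # 2 2 3
print(pooled[0][0])                                    # [5, 105, 205]
```

The height and width are halved while the depth of 3 is unchanged, matching the 4x4x3 to 2x2x3 reduction described above.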

 

Validation

 

In the previous models a test dataset was used to determine how well the model worked after training was complete.  A validation dataset checks how well a model is doing as it completes each epoch.  Note that the model does not use the validation dataset to modify the weights and biases.

image

Validation helps us develop a model that is accurate and general without overfitting.  Since the model architecture and variables can be modified to better fit the validation dataset, it is still necessary to have a separate test dataset, because those modifications can bias the model toward the validation data.
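One way to picture the three datasets is the sketch below, which shuffles a list of items and splits it into training, validation, and test portions.  The function name split_dataset and the 70/20/10 proportions are assumptions for illustration only; the Colab tutorial supplies its own training and validation folders.

```python
import random


def split_dataset(items, val_fraction=0.2, test_fraction=0.1, seed=0):
    """Shuffle items and return (train, validation, test) lists with no overlap."""
    items = list(items)
    random.Random(seed).shuffle(items)       # fixed seed so the split is repeatable
    n_test = int(len(items) * test_fraction)
    n_val = int(len(items) * val_fraction)
    test = items[:n_test]                    # held back until the model is final
    val = items[n_test:n_test + n_val]       # checked after every epoch
    train = items[n_test + n_val:]           # the only data that updates the weights
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 70 20 10
```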

 

Developing a Model

 

The Colab tutorial describes how to build data input pipelines for the images but that will not be discussed here.  The model classifies images of dogs and cats and the dataset is broken down as follows:

  • Training: 1000 cat images
  • Training: 1000 dog images
  • Validation: 500 cat images
  • Validation: 500 dog images

 

The models in the training are now getting more complicated and lengthier, but conceptually they are similar.  To keep my posts a reasonable length I will bypass much of the code and direct you to the Udacity course instead.  The first 5 images in the training dataset are shown below:

image

Note the squished images, partially covered images, busy backgrounds and foregrounds, kitten, and unusual stance of the dog.  For general classification a large and varied dataset is required.

 

The model is shown below and should look familiar if you have been following this series:

import tensorflow as tf

model = tf.keras.models.Sequential([
    # Block 1: 32 filters of 3x3 on the 150 x 150 RGB input, then 2x2 max pooling
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),

    # Block 2: 64 filters
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),

    # Block 3: 128 filters
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),

    # Block 4: 128 filters
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),

    # Classifier: flatten, one hidden dense layer, then 2 classes (cat, dog)
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax')
])

There are four convolutional blocks, each followed by a max pooling layer.  Note that the convolutional layers start small with 32 filters, followed by 64, 128, and 128.  It is typical to have an increasing number of filters with each layer as the detail being captured builds.  Notice also that the sizes are powers of 2, which is also normal.  After flattening there is a dense layer with 512 neurons followed by the classification layer using softmax.

 

Compiling is done with the 'adam' optimizer as usual and the loss function is 'sparse_categorical_crossentropy'.  Below is a model summary.  Note that there are almost 3.5 million trainable parameters!

image
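The parameter count can be checked by hand.  The sketch below walks through the layers of the model above, assuming the Keras defaults of no padding and a stride of 1 for Conv2D, and a 2x2 pooling window that rounds down.  The function names are my own.

```python
def conv_out(size, kernel=3):
    """Output width/height of a convolution with no padding and stride 1."""
    return size - kernel + 1


def conv_params(in_channels, filters, kernel=3):
    """Weights (kernel x kernel x in_channels) plus one bias, per filter."""
    return (kernel * kernel * in_channels + 1) * filters


def count_parameters():
    size, channels, total = 150, 3, 0
    for filters in (32, 64, 128, 128):
        total += conv_params(channels, filters)
        size = conv_out(size) // 2          # 3x3 convolution, then 2x2 max pooling
        channels = filters
    flat = size * size * channels           # 7 * 7 * 128 = 6272 values after Flatten
    total += (flat + 1) * 512               # dense layer: weights plus biases
    total += (512 + 1) * 2                  # softmax layer with 2 outputs
    return total


print(count_parameters())  # 3453634
```

Nearly all of the parameters (about 3.2 million) sit in the first dense layer, since each of its 512 neurons connects to all 6272 flattened values.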

To demonstrate overtraining (overfitting), 100 epochs are used.  The resulting accuracy and loss obtained with each epoch are shown below for both the training and validation sets.

image

The training dataset reaches an accuracy of 1.0 (complete memorization) around epoch 20.  However, the validation dataset reaches a maximum of 75% and does not improve further.  In fact, the model was overtrained well before epoch 20.
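A simple way to see where further training stops helping is to look for the epoch with the best validation accuracy.  The sketch below does this in plain Python; the function name best_epoch and the accuracy values are made up for illustration and are not the results from this model.

```python
def best_epoch(val_accuracy):
    """Return (epoch, accuracy) for the highest validation accuracy.

    Epochs are counted from 1.  If there is a tie, the earliest epoch wins.
    """
    best = max(range(len(val_accuracy)), key=lambda i: val_accuracy[i])
    return best + 1, val_accuracy[best]


# Hypothetical validation accuracy per epoch, levelling off around 0.75.
history = [0.55, 0.62, 0.68, 0.72, 0.75, 0.74, 0.75, 0.73]
print(best_epoch(history))  # (5, 0.75)
```

Keras provides a callback, tf.keras.callbacks.EarlyStopping, that applies a similar idea during training by stopping once a monitored value such as the validation loss stops improving.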

 

Conclusions and Looking Ahead

 

In this post machine learning was extended to handle color and images of different sizes.  The concept of validation datasets to assess how well the model is progressing during training was introduced and overtraining was discussed.

 

It is obvious upon reflection that it is relatively easy to train a model to memorize but less so to make it general.  The ability to memorize without generality is less an issue though if the application will be for images that are very similar to the training set (at least that is my hypothesis).  For example if the items being categorized are always lit in a similar way to the training dataset, oriented in the same direction, the same size, have a clean background, etc. then accuracy should be nearer 100%.  The project I have in mind will take advantage of that.

 

In the next post the use of image augmentation and dropout to improve generalization will be explored.  Thanks for reading (those few of you who are still with me - it is rather dry material :-) and as always comments and corrections are welcome.

 

Useful Links

RoadTest of Raspberry Pi 4 doing Facial Recognition with OpenCV

Picasso Art Deluxe OpenCV Face Detection

Udacity Intro to TensorFlow for Deep Learning

A Beginning Journey in TensorFlow #1: Regression

A Beginning Journey in TensorFlow #2: Simple Image Recognition

A Beginning Journey in TensorFlow #3: ReLU Activation

A Beginning Journey in TensorFlow #4: Convolutional Neural Networks

A Beginning Journey in TensorFlow #6: Image Augmentation and Dropout


Top Comments

  • 14rhb
    14rhb over 5 years ago +2
    Hi Frank, I'm really enjoying this series you are putting together, all the photos and diagrams are exactly what my A I requires for this AI. Well done with all the progress you have made and I can see…
  • adnanoncevarlik
    adnanoncevarlik over 3 years ago

    Thanks for sharing and detailed explanation. :) 

  • Sean_Miller
    Sean_Miller over 5 years ago

    Another great contribution.

     

    It's neat how understanding one thing will allow for understanding of another years later.

     

    In the late 90's, I wrote a small BASIC program for the Amiga that would translate the IFF image file format to raw image data.  This allowed me to paste the data directly into C code for my game screens (Pitfall).  I learned about the bit plane approach to describe color:

     

    • If an image was just black and white, it had 1 bit plane.  It was a grid of pixels either on or off.
    • If the image had 4 colors, it stacked 2 bit planes.
    • 8 colors, 3 bit planes
    • … 2^(# of bitplanes)…

     

    A channel here reminds me of the bit planes back then.

     

    I wonder if the processing would improve if one would first apply a routine to take it to grey scale right off the whip for things where color isn't necessary for distinction?

     

    -Sean

  • genebren
    genebren over 5 years ago

    Frank,

     

    Another great blog in your series.  Keep up the good work!

     

    Gene

  • 14rhb
    14rhb over 5 years ago in reply to fmilburn

    Just the right level for me - keep up the great work.

  • fmilburn
    fmilburn over 5 years ago in reply to 14rhb

    Thanks!  I worry that there is too much detail to attract most viewers and not enough detail for those that want to learn the details.  Adding to that, I am still learning myself.  The Udacity course is quite good though and I am finding lots of other material on the internet where I have a question.
