Design for a Cause - Design Challenge
Audio4Vision #5 - Show and Tell: More Iterations and Finetuning

pranjalranjan299
12 Aug 2018

Welcome to our fifth blog post! This one will be a little short, as it's essentially an update on the previous post.

 

In that post we introduced the Show and Tell model and discussed its structure and capabilities. We also mentioned that we would increase the number of iterations and finetune the model a little.

 

First, the outputs from the older model:

[Images: sample captions generated by the 1M-iteration model]

 

This model was trained for 1 million iterations.

We then trained a new model, this time for 2 million iterations, and performed some finetuning. These are the results:

 

[Images: sample captions generated by the 2M-iteration, finetuned model]

 

As you can see, the prediction probabilities have increased, and in general a clearer picture is painted for our user.
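The "probabilities" here are the scores the beam search assigns to each candidate caption. As a rough illustration (with made-up captions and log-probabilities, not our actual model outputs), normalizing the beam scores shows how a better-trained model concentrates probability on a single caption:

```python
import math

def rank_captions(beam_results):
    """beam_results: list of (caption, log_probability) pairs, as a
    beam-search decoder reports them. Returns (caption, probability)
    pairs normalized over the beam, most probable first."""
    probs = [(cap, math.exp(lp)) for cap, lp in beam_results]
    total = sum(p for _, p in probs)
    return sorted(((cap, p / total) for cap, p in probs),
                  key=lambda pair: pair[1], reverse=True)

# Hypothetical beams for the same image from the two checkpoints:
old_model = [("a car is driving down the road", -4.1),
             ("a car parked near a tree", -4.3)]
new_model = [("a car parked near a tree", -3.2),
             ("a car is driving down the road", -4.8)]

best_old = rank_captions(old_model)[0]
best_new = rank_captions(new_model)[0]
```

Here the newer model puts a larger share of the beam's probability mass on its top caption, which is what "more confident" means in practice.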

That's all for this update. The next blog will be up soon with more exciting progress, so stay tuned!

 

Thank you for reading.


Top Comments

  • dixonselvan
    dixonselvan over 7 years ago +1
    Nice update pranjalranjan299 But I don't seem to find any improvement after increasing the number of iterations to 2M (except the first one where it correctly identifies car is parked near a tree). In…
  • aspork42
    aspork42 over 7 years ago

    Great update! I agree - there is a big difference between driving and being parked, but you have clearly explained what is happening and why that would be the case. Looking forward to seeing the project develop.

  • dixonselvan
    dixonselvan over 7 years ago in reply to pranjalranjan299

    That was an excellent explanation, now I understand things more clearly. Good luck on the development of the hybrid model.

  • pranjalranjan299
    pranjalranjan299 over 7 years ago in reply to genebren

I would say the second model has been trained longer, so it has overfit the dataset it was trained on. Hence it's giving out the caption of a similar image that exists in its original training data. There are a few solutions to differentiate between moving and stationary:

     

1) We can create another learning model which takes multiple frames of the same scene, so it detects the object at different coordinates across frames, assuming the camera was stationary when the frames were taken. If the coordinates are the same in all the frames, the object can be treated as stationary; otherwise, we can say it is moving. I think we can implement this after the first model is complete, because right now we are facing several obstacles just deploying it on the cloud and linking it to the MKR1000. So this additional feature can be added once we have a functioning unit capable of giving out fairly good descriptive captions.

     

2) The second solution could be using a sensor to detect movement in the camera's line of sight: the sensor monitors for a certain period while the image is taken, the caption is then generated, and the word "moving" or "stationary" is appended depending on the values of the sensor(s) that detect a change in velocity or movement in the vicinity. This can also be a viable solution; maybe we can use a PIR sensor for relatively near objects, but I doubt it will work for objects beyond a certain distance.
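The coordinate-comparison idea in solution 1 could be sketched roughly like this (hypothetical box format and pixel threshold; a real object detector would supply the coordinates):

```python
def is_moving(frame_boxes, tolerance=5):
    """frame_boxes: list of (x, y, w, h) bounding boxes for the same object
    detected in consecutive frames (camera assumed stationary).
    Returns True if the box centre shifts more than `tolerance` pixels
    between any two consecutive frames."""
    centres = [(x + w / 2, y + h / 2) for x, y, w, h in frame_boxes]
    for (x0, y0), (x1, y1) in zip(centres, centres[1:]):
        if abs(x1 - x0) > tolerance or abs(y1 - y0) > tolerance:
            return True
    return False

# Append the motion word to a generated caption:
caption = "a car on the street"
boxes = [(100, 50, 40, 20), (130, 52, 40, 20), (160, 51, 40, 20)]
caption += " (moving)" if is_moving(boxes) else " (stationary)"
```

The tolerance absorbs detector jitter, so a box that wobbles by a pixel or two between frames isn't reported as motion.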

     

    Thanks for the fantastic suggestion Gene, it will certainly improve our project's functioning.

  • pranjalranjan299
    pranjalranjan299 over 7 years ago in reply to dixonselvan

Well, the improvement is in the probabilities, which you might not observe directly in the caption of the same picture. But if there were a similar picture with two close probabilities for different predictions, the newly trained model would still give the correct caption, as it assigns it a higher probability. In simple words, the second model is more "confident" about the correct predictions.

As far as the second image is concerned, it's a clear example of overfitting, which is very common in image-based deep learning. As you increase the iterations, the model learns the captions of its original dataset more strongly, and hence gives slightly wrong captions in certain cases. We are going to use a model that is a hybrid of these two, so it can predict captions with higher probability while not suffering from overfitting.
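How the hybrid would combine the two checkpoints isn't specified here; one simple possibility is an ensemble that averages the two models' log-probabilities per caption (illustrated with made-up scores, not actual model outputs):

```python
def ensemble_caption(scores_a, scores_b):
    """scores_a, scores_b: dicts mapping caption -> log-probability from
    two separately trained checkpoints. Picks the caption with the highest
    average log-probability among captions both models proposed."""
    shared = set(scores_a) & set(scores_b)
    return max(shared, key=lambda c: (scores_a[c] + scores_b[c]) / 2)

# Made-up scores from the 1M- and 2M-iteration checkpoints:
model_1m = {"a car is driving down the road": -4.1,
            "a car parked near a tree": -4.3}
model_2m = {"a car is driving down the road": -4.8,
            "a car parked near a tree": -3.2}
```

Averaging rewards captions both models agree on, which is one way an ensemble can damp the overfit model's idiosyncratic predictions.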


  • genebren
    genebren over 7 years ago

    Interesting update.  I wonder on image 11.jpg, which caption is more correct (before or after).  Before says 'driving', while after says 'parked'.  There is a big difference between the two statements.  Also, for motion, wouldn't a series of images (i.e. change in position or potential velocity) be a valuable input for a correct caption?  Would your modeling allow for a caption based on differences between images?

     

    Good luck,

    Gene
