element14 Community
Design for a Cause - Design Challenge
Author: pranjalranjan299
Date Created: 4 Aug 2018 5:06 PM

Audio4Vision #4 - Show and Tell: Image Caption Generator Neural Network

pranjalranjan299
4 Aug 2018

Welcome to our fourth blog post! Today we'll be looking at a potential model for our use-case, the Show and Tell model.

The Show and Tell model is a neural image caption generator: a deep neural network that learns to describe the content of input images.

For example:

[Image: Show and Tell Examples]

The Show and Tell model can be broken down into two blocks: the encoder and the decoder. The encoder is a CNN that takes an image, applies convolutional operations to it, and outputs a vector representation of the input. This vector is then passed to the decoder, a natural language processing model that converts it into a sentence in a language of your choice (the original paper uses English).

[Image: Show and Tell model]

The encoder network uses the Inception v3 image recognition model, a deep convolutional neural network pre-trained on the ILSVRC-2012-CLS image classification dataset. It takes a 299x299x3 image as input and produces an 8x8x2048 feature map. As a classifier, it has 1000 output classes.
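
To see where the 8x8 spatial size comes from, the sketch below traces a 299-pixel side through the stride-2 and unpadded layers of the network. The layer list is our assumption, based on the commonly published Inception v3 layout rather than anything stated in this post, and the size-preserving Inception blocks are left out.

```python
# Spatial-size arithmetic for Inception v3 (a sketch, not the real network).
# ASSUMPTION: the layer list below follows the commonly published
# Inception v3 layout; Inception blocks that keep the size are omitted.

def out_size(n, kernel, stride, padding):
    """Side length after a conv or pooling layer applied to an n x n input."""
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (n - kernel) // stride + 1
    # "same" padding: only the stride shrinks the map (ceiling division).
    return -(-n // stride)

# (description, kernel, stride, padding)
LAYERS = [
    ("conv 3x3, stride 2", 3, 2, "valid"),
    ("conv 3x3", 3, 1, "valid"),
    ("conv 3x3, padded", 3, 1, "same"),
    ("max pool 3x3, stride 2", 3, 2, "valid"),
    ("conv 1x1", 1, 1, "valid"),
    ("conv 3x3", 3, 1, "valid"),
    ("max pool 3x3, stride 2", 3, 2, "valid"),
    ("first reduction block, stride 2", 3, 2, "valid"),
    ("second reduction block, stride 2", 3, 2, "valid"),
]

def trace_sizes(n=299):
    """Return the side length before the first layer and after every layer."""
    sizes = [n]
    for _description, kernel, stride, padding in LAYERS:
        sizes.append(out_size(sizes[-1], kernel, stride, padding))
    return sizes

sizes = trace_sizes(299)
print(" -> ".join(str(s) for s in sizes))

# With 2048 channels, the final feature map holds side * side * 2048 values.
feature_values = sizes[-1] * sizes[-1] * 2048
print("values in the final feature map:", feature_values)
```

Averaging each of the 2048 channels over the 8x8 grid would give a single 2048-element vector, which is the kind of vectorized representation an encoder can hand to the decoder.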

The decoder is a long short-term memory (LSTM) network. This type of network is commonly used for sequence modeling tasks such as language modeling and machine translation. In the Show and Tell model, the LSTM network is trained as a language model conditioned on the image encoding.
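
A minimal sketch of what "conditioned on the image encoding" can look like in practice: the image embedding is fed to the LSTM as its first input, and words are then generated one at a time. Everything here is an assumption for illustration: the toy vocabulary, the state size, the random (untrained) weights, the omitted bias terms, and the use of greedy decoding, which is a simplification; the repository may use a different search strategy such as beam search.

```python
import math
import random

# ASSUMPTIONS: toy vocabulary, tiny state size, random untrained weights,
# no bias terms. The generated words are therefore meaningless; only the
# data flow is of interest.
VOCAB = ["<start>", "<end>", "a", "dog", "on", "grass"]
HIDDEN = 8   # LSTM state size, also used as the embedding size here

rng = random.Random(0)   # fixed seed so the sketch is repeatable

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One weight matrix per gate, applied to the concatenation [input, hidden].
GATES = {name: rand_matrix(HIDDEN, 2 * HIDDEN) for name in ("i", "f", "o", "g")}
WORD_EMBED = rand_matrix(len(VOCAB), HIDDEN)   # one embedding row per word
OUT_PROJ = rand_matrix(len(VOCAB), HIDDEN)     # hidden state -> word scores

def lstm_step(x, h, c):
    """One LSTM step: returns the new hidden state and cell state."""
    z = x + h   # list concatenation of input and previous hidden state
    i = [sigmoid(v) for v in matvec(GATES["i"], z)]    # input gate
    f = [sigmoid(v) for v in matvec(GATES["f"], z)]    # forget gate
    o = [sigmoid(v) for v in matvec(GATES["o"], z)]    # output gate
    g = [math.tanh(v) for v in matvec(GATES["g"], z)]  # candidate values
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new

def greedy_caption(image_embedding, max_len=10):
    """Generate words by always picking the highest-scoring next word."""
    h = [0.0] * HIDDEN
    c = [0.0] * HIDDEN
    # Conditioning step: the image embedding is the first LSTM input.
    h, c = lstm_step(image_embedding, h, c)
    word = VOCAB.index("<start>")
    caption = []
    for _ in range(max_len):
        h, c = lstm_step(WORD_EMBED[word], h, c)
        scores = matvec(OUT_PROJ, h)
        word = max(range(len(VOCAB)), key=scores.__getitem__)
        if VOCAB[word] == "<end>":
            break
        caption.append(VOCAB[word])
    return caption

# Stand-in for the encoder output after projection to the embedding size.
image_embedding = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN)]
print(greedy_caption(image_embedding))
```

In a trained model the weights would come from training on image and caption pairs, so that the word scores at each step reflect both the image and the words generated so far.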

 

The Show and Tell repository can be found here. It has a deeper explanation of the model as well as instructions on how to download, train and run the model.

We trained the model on the COCO dataset for 1 million iterations before stopping. Here are the results on a few images:

[Images: four example results]

 

Next, we are going to try fine-tuning the model and increasing the number of training iterations. We are confident that the captioning can only improve from here, and will keep you posted.

 

Thank you for reading! The next post will be up in no time.

Top Comments

  • prashanth.nagendrappa
    prashanth.nagendrappa over 7 years ago

    Nice Update

  • genebren
    genebren over 7 years ago

    A nice update to your design challenge project.  This is moving along very nicely.  It will be interesting to see where your finetuning takes you.

     

    Gene

  • DAB
    DAB over 7 years ago

    Nice update.

     

    DAB
