Welcome to our fourth blog post! Today we'll be looking at a potential model for our use case: the Show and Tell model.
The Show and Tell model is a neural image caption generator: a deep neural network that learns to describe the content of input images.
The Show and Tell model can be broken down into two blocks: the encoder and the decoder. The encoder is a CNN that takes an image, performs convolutional operations on it, and outputs a vectorized representation of the input. This vector is then given to a natural language processing model, the decoder, which converts it into a sentence in the language of the training captions (English in the original paper).
The encoder is the Inception v3 image recognition model, a deep convolutional neural network pre-trained on the ILSVRC-2012-CLS image classification dataset (1000 classes). It takes a 299x299x3 image as input and outputs an 8x8x2048 feature map.
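To make the encoder step concrete, here is a minimal sketch using the Keras build of Inception v3 rather than the original im2txt code; the file name example.jpg and the use of tf.keras in general are illustrative assumptions, not part of the repository.

```python
# A minimal sketch of the encoder step, using the Keras build of Inception v3
# rather than the original im2txt code. "example.jpg" is a placeholder path.
import numpy as np
import tensorflow as tf

# Load Inception v3 without its 1000-class classification head;
# captioning only needs the convolutional feature map.
encoder = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet")

image = tf.keras.preprocessing.image.load_img("example.jpg", target_size=(299, 299))
x = tf.keras.preprocessing.image.img_to_array(image)
x = tf.keras.applications.inception_v3.preprocess_input(x)  # scale pixels to [-1, 1]
x = np.expand_dims(x, axis=0)                                # add a batch dimension

features = encoder.predict(x)
print(features.shape)  # (1, 8, 8, 2048)
```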
The decoder is a long short-term memory (LSTM) network. This type of network is commonly used for sequence modeling tasks such as language modeling and machine translation. In the Show and Tell model, the LSTM network is trained as a language model conditioned on the image encoding.
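Here is a rough sketch of the decoder side, again in Keras rather than the original im2txt code; the vocabulary size, embedding size, and LSTM size below are illustrative placeholders, not the values used in the repository.

```python
# A minimal sketch of an LSTM decoder conditioned on the image encoding.
# vocab_size, embed_dim and lstm_units are illustrative placeholders.
import tensorflow as tf

vocab_size = 12000   # placeholder: the real value comes from the caption vocabulary
embed_dim = 512      # placeholder word/image embedding size
lstm_units = 512     # placeholder LSTM state size

# Pooled image features from the encoder, projected into the word-embedding space.
image_features = tf.keras.Input(shape=(2048,), name="image_features")
image_embedding = tf.keras.layers.Dense(embed_dim)(image_features)

# The partial caption generated so far, as word ids.
caption_ids = tf.keras.Input(shape=(None,), dtype="int32", name="caption_ids")
word_embeddings = tf.keras.layers.Embedding(vocab_size, embed_dim)(caption_ids)

# Feed the image embedding to the LSTM as the first "word" of the sequence,
# so every later prediction is conditioned on the image.
image_as_step = tf.keras.layers.Reshape((1, embed_dim))(image_embedding)
sequence = tf.keras.layers.Concatenate(axis=1)([image_as_step, word_embeddings])
lstm_out = tf.keras.layers.LSTM(lstm_units, return_sequences=True)(sequence)

# A projection onto the vocabulary gives next-word logits at each step;
# a softmax over them is the next-word distribution.
next_word_logits = tf.keras.layers.Dense(vocab_size)(lstm_out)
decoder = tf.keras.Model([image_features, caption_ids], next_word_logits)
decoder.summary()
```

During training the network maximizes the probability of the ground-truth caption given the image, and at inference time a caption is generated one word at a time (the paper uses beam search).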
The Show and Tell repository can be found here. It contains a deeper explanation of the model as well as instructions on how to download, train, and run it.
We trained the model on the COCO dataset for 1 million iterations before stopping. Here are the results on a few images:
Now, we are going to try some fine-tuning as well as increasing the number of training iterations. We are confident that the captioning can only improve from here, and will keep you posted.
Thank you for reading! The next post will be up in no time.