Google creates a neural network capable of multitasking, called MultiModel.
A diagram of how Google's new neural network works (Photo via Google)
My immediate thought… Neural Network Raspberry Pi?
Multitasking is something we do every day, whether or not we realize it. While some of us are better at it than others, we all have the capability. Neural networks, however, don't. Normally, they're trained to do one task, whether that's adding animation to video games or translating languages. Give one another task, and it can no longer do its first job very well. Tech giant Google is looking to change this with its latest system, MultiModel.
Modeled after the human brain, the new system can handle eight tasks at once and pull them off pretty well. Among them: detecting objects in images, recognizing speech, translating between four pairs of languages, parsing grammar and syntax, and generating image captions. The system performed all of these tasks at the same time, which is impressive for a neural network.
So, how does it do it? The neural network from Google Brain, the company's deep-learning team, is made up of subnetworks that specialize in certain tasks relating to audio, images, or text. It also has a shared model equipped with an encoder, an input/output mixer, and a decoder. From this, the system learned to perform the eight tasks at the same time. During testing, the system didn't break any records and still showed some errors, but its performance was consistently high. It achieved an accuracy score of 86 percent, meaning its image-recognition abilities were only 9 percent worse than those of specialized algorithms. Still, it managed to match the abilities of the best algorithms in use five years ago.
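The basic idea of modality-specific subnetworks feeding one shared model can be sketched roughly like this. This is a minimal, illustrative toy in Python, not Google's actual code; the dimensions, modality names, and random projections are all made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # size of the shared representation (arbitrary for this sketch)

# Modality-specific "subnetworks": each maps its own kind of raw input
# into the same shared space. Here they are just random linear projections
# standing in for real image/audio/text encoders.
modality_nets = {
    "image": rng.standard_normal((64, D)),  # 64-dim stand-in image features
    "audio": rng.standard_normal((32, D)),  # 32-dim stand-in audio features
    "text":  rng.standard_normal((10, D)),  # 10-dim stand-in token embedding
}

def encode(modality, x):
    """Project a modality-specific input into the shared representation."""
    return x @ modality_nets[modality]

def shared_body(h):
    """Stand-in for the shared encoder/mixer/decoder: one function for all tasks."""
    return np.tanh(h)

# Every modality lands in the same D-dimensional space, so a single
# shared body can serve tasks from any of them.
image_h = shared_body(encode("image", rng.standard_normal(64)))
text_h = shared_body(encode("text", rng.standard_normal(10)))
print(image_h.shape, text_h.shape)  # both (16,)
```

The point of the sketch is only the shape of the design: inputs of very different sizes all arrive in one common representation, which is what lets one model serve eight tasks.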
While there’s still work to be done to improve the system, MultiModel is already showing its benefits. Normally, deep-learning systems need large amounts of training data to complete their tasks. Google's new system can instead learn from data gathered for a completely different task. For instance, the network’s ability to parse sentences for grammar improved when it was trained on a database of images, which has nothing to do with sentence parsing.
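One way to picture why training on one task can affect another is through shared parameters: a training step on task B moves weights that task A also uses. The toy below demonstrates only that coupling, with made-up numbers and a numerical gradient; it is not MultiModel's training code, and it doesn't claim task A improves, only that it is affected:

```python
import numpy as np

rng = np.random.default_rng(1)

# One shared weight matrix used by two otherwise unrelated tasks.
W_shared = rng.standard_normal((8, 8)) * 0.1

def task_loss(W, x, y):
    """Tiny least-squares objective standing in for a real task loss."""
    pred = np.tanh(x @ W).sum()
    return (pred - y) ** 2

def grad(W, x, y, eps=1e-5):
    """Numerical gradient of task_loss with respect to the shared weights."""
    g = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Wp, Wm = W.copy(), W.copy()
            Wp[i, j] += eps
            Wm[i, j] -= eps
            g[i, j] = (task_loss(Wp, x, y) - task_loss(Wm, x, y)) / (2 * eps)
    return g

x_a, y_a = rng.standard_normal(8), 1.0   # sample for "task A" (made up)
x_b, y_b = rng.standard_normal(8), -1.0  # sample for "task B" (made up)

before = task_loss(W_shared, x_a, y_a)
# Train only on task B; the shared weights still move...
for _ in range(50):
    W_shared -= 0.05 * grad(W_shared, x_b, y_b)
after = task_loss(W_shared, x_a, y_a)
# ...so task A's loss changes too, even though no task-A data was used.
print(before, after)
```

In a real multi-task system the hope, as the article describes, is that this coupling is often helpful rather than merely present.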
Not wanting to keep the system to themselves, Google released the MultiModel code as part of its TensorFlow open-source project. Now other engineers can experiment with the neural network and see what they can get it to do. The company hopes sharing the source code will help facilitate quicker research to improve the neural network.
Have a story tip? Message me at: cabe(at)element14(dot)com