A brief overview of Transformers
Models like Stable Diffusion and ChatGPT use Transformers (introduced in the 2017 paper Attention Is All You Need) to “learn” the meaning of words for tasks like question answering or prompt-based image generation. Some people even consider these models a step toward general intelligence because of their ability to understand and contextualize their training data.
The previous standard for natural language processing (NLP) models was the LSTM. These models can do things like text prediction by looking at the N previous words and choosing the most likely next word, based on training data they’ve seen in the past. Importantly, they also learned how much context to keep for their prediction and forgot the rest, because carrying long-range context through a sequential model is difficult and expensive.
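To make the “choose the most likely next word” idea concrete, here is a toy sketch. It uses simple n-gram counts rather than an actual LSTM (an LSTM learns this from data with a neural network), but the prediction step — look at the previous words, pick the most frequent follower — is the same idea:

```python
from collections import Counter, defaultdict

def train_ngram(corpus, n=2):
    """Count how often each word follows an (n-1)-word context."""
    counts = defaultdict(Counter)
    words = corpus.split()
    for i in range(len(words) - n + 1):
        context = tuple(words[i:i + n - 1])
        counts[context][words[i + n - 1]] += 1
    return counts

def predict_next(counts, context):
    """Greedily pick the most likely next word for a context."""
    followers = counts.get(tuple(context))
    return followers.most_common(1)[0][0] if followers else None

# "sat" follows "cat" twice in this toy corpus, so it wins.
counts = train_ngram("the cat sat on the mat and the cat sat down", n=2)
print(predict_next(counts, ["cat"]))  # sat
```

A real LSTM replaces the raw counts with learned hidden state, which is what lets it generalize to contexts it has never seen verbatim.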
LSTMs are able to do question answering (and power autocomplete on your phone) and sometimes pass a Turing test. However, they can get caught in a loop of endless repetition no matter how much data they are trained on.
Transformers are different from previous NLP models (such as LSTMs) because they use attention mechanisms to build contextual word embeddings: representations that capture what a word means in the context of the sentence around it, including its grammatical role.
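A minimal sketch of the core operation, scaled dot-product attention, shows how context gets mixed in. Each word’s output becomes a weighted blend of every word’s value vector, with weights set by how well the query matches each key. This is a simplified single-query version in plain Python (real implementations are batched matrix operations with learned projections):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    # Score each key by its dot product with the query, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # Output is the weight-blended mix of all value vectors.
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return out, weights

# The query matches the first key, so the first value dominates the output.
out, weights = attention([1.0, 0.0],
                         [[1.0, 0.0], [0.0, 1.0]],
                         [[10.0, 0.0], [0.0, 10.0]])
```

Because every word attends to every other word, the cost grows quadratically with sequence length — which is also why these models need so much compute and memory.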
Unlike previous models, they “understand” word meanings and context to such an extent that they are no longer dependent on sequential sentence structure to generate a good answer. However, storing these contextual meanings requires tons of memory, which is part of why these models are usually enormous.
What are Transformers' capabilities?
Because of their ability to understand contextual meaning, these models have transformed (no pun intended) the field of natural language processing. ChatGPT is able to pass the Turing test, get high scores on the LSAT, and help doctors make decisions (definitely watch this talk by Peter Lee from Microsoft Research on The Emergence of General AI for Medicine). Many people consider the emergence of Transformers to be the beginning of Artificial General Intelligence (AGI), although the goal posts for this achievement always seem to move past what is currently possible.
Despite the hype, Transformers still make mistakes. One downside to Transformers is their propensity for hallucination: they take creative liberties and confidently generate plausible-sounding but factually inaccurate responses to questions.
Side note: Interestingly, this is somewhat reminiscent of how human brains work. In human brains, the thalamus is responsible for filtering our thoughts and ideas. Less filtering means more creative genius and out-of-the-box thinking, but it’s also correlated with schizophrenia.
Hallucinations are especially dangerous when they are applied to high stakes applications like medical diagnosis.
What else can't ChatGPT do?
ChatGPT is also unable to reliably solve basic math equations, although it can be prompted to write code that solves the same equations. And while Transformers can emulate human emotion, they are largely unable to write funny jokes. ChatGPT works best in Western languages like English, but performs poorly in many other languages, especially those without a large online presence.
What are some use cases for running Transformers on edge devices?
You might want to run a Transformer at the edge to reduce communication with cloud servers. This could be important for sensitive applications, such as medical devices, where cloud connectivity might introduce security risks. It could also be useful in robotics or automotive applications where latency could be an issue.
Here are a few other use cases where offline Transformers could make sense:
- Real-time Language Translation: On-device Transformers can be used for real-time language translation, enabling users to translate text or speech directly on their smartphones or other edge devices without requiring an internet connection.
- Voice Assistants: Edge-based Transformers can power voice assistants, allowing users to interact with their devices and perform tasks without relying on cloud servers for processing speech recognition and natural language understanding.
- Text Summarization: Edge-based Transformers can be used for on-device text summarization, enabling users to quickly generate summaries of documents, articles, or emails without sending data to the cloud.
- Personalization: Edge-based Transformers can be used to provide personalized recommendations, content filtering, or other personalized experiences without transmitting user data to the cloud.
- Autonomous Systems: Edge-based Transformers can be used in autonomous systems, such as robots or drones, to process sensory data and make real-time decisions without relying on external servers.
- Energy Efficiency: Processing data locally on edge devices can reduce the energy consumption associated with data transmission to the cloud.
Can Transformers be run at the edge?
This depends on the Transformer.
Stable Diffusion's weights come in at less than 10GB, and the model can be easily downloaded and manipulated. It can also be quantized to run on much smaller devices. At the 2023 TinyML Summit, Qualcomm and the Korean startup SqueezeBits demonstrated quantized versions of Stable Diffusion running on microprocessors.
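Quantization shrinks a model by storing its weights at lower precision. As a rough sketch (a toy symmetric int8 scheme, not Qualcomm's or SqueezeBits' actual pipeline), each float weight is mapped to an 8-bit integer plus a shared scale factor, cutting storage by 4x versus fp32 at the cost of some rounding error:

```python
def quantize_int8(weights):
    """Map float weights to int8 values with one shared scale (symmetric)."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]  # each value fits in int8
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [x * scale for x in q]

w = [0.82, -1.27, 0.05, 0.33]
q, scale = quantize_int8(w)
approx = dequantize(q, scale)  # close to w, within one quantization step
```

Real pipelines quantize per-channel or per-layer and often fine-tune afterward to recover accuracy, but the storage math is the same.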
Meanwhile, I asked ChatGPT about its size and computational requirements. According to itself, ChatGPT has approximately 175 billion parameters, is around a terabyte in size, and requires several hundred gigabytes of RAM to run.
Other models, such as EleutherAI's GPT-J 6B (available on Hugging Face), have fewer parameters, but are still memory capacity and memory bandwidth bound, making them difficult to run on modern AI accelerators that are built for doing parallel computations with limited memory.
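A quick back-of-the-envelope calculation shows why parameter count dominates the memory budget: weight storage is simply parameters times bytes per parameter, so precision determines whether a ~6-billion-parameter model fits on a given device (activations and KV caches add more on top of this):

```python
def model_memory_gb(n_params, bytes_per_param):
    """Approximate weight storage in GiB (ignores activations and caches)."""
    return n_params * bytes_per_param / 1024**3

n_params = 6_000_000_000  # roughly GPT-J 6B's parameter count
for precision, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{precision}: {model_memory_gb(n_params, nbytes):.1f} GiB")
```

At fp32 the weights alone are over 20 GiB — beyond most edge accelerators — while int8 or int4 quantization brings them within reach of high-end embedded hardware.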
There is currently research being done to increase Transformers' inference efficiency, including modifying the attention mechanism to reduce its complexity from quadratic to linear in sequence length. The paper Efficient Transformers: A Survey (2022) explains the recent work in detail.
Despite the difficulties, I expect that we will see Transformers like ChatGPT running inference on microprocessors or even microcontrollers within a few years. In the meantime, ChatGPT can be reached from edge devices via its cloud API.
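Calling the cloud API from an edge device is just an HTTPS request. Here is a minimal sketch that builds a request to OpenAI's chat completions endpoint using only the Python standard library; the endpoint and payload shape follow OpenAI's public API docs, but the model name is one example and may change over time:

```python
import json
import urllib.request

def chatgpt_request(prompt, api_key, model="gpt-3.5-turbo"):
    """Build (but don't send) a request to OpenAI's chat completions API."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Actually sending it requires a valid API key and network access:
# resp = urllib.request.urlopen(chatgpt_request("Hello!", API_KEY))
# reply = json.load(resp)["choices"][0]["message"]["content"]
```

The trade-off is the one this section has been describing: the heavy model stays in the cloud, so the edge device needs connectivity, and every prompt leaves the device.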