DeepSeek is an open-source AI tool that’s changing the AI industry. (Image Credit: Solen Feyissa/Unsplash)
DeepSeek is a sophisticated AI tool and startup based in Hangzhou, Zhejiang, China. Engineer Liang Wenfeng, who also has an AI and quantitative finance background, founded this firm in May 2023, and it has received funds from the High-Flyer Hedge Fund. However, founding DeepSeek was a quick, surprising process rather than a slow one. It took one year for it to shift from a startup to a revolutionizing force.
This large language model (LLM) performs various tasks based on user input as it uses machine learning and deep learning—just like the AI models created by Google or OpenAI. It can assist with summarization, translation, and writing by producing human-like text. Developers may use it to help complete and debug coding projects.
Machine learning algorithms detect data patterns, which the system can learn from and improve after use. On the other hand, deep learning models, including neural networks, read and understand user input, including images, before processing those data structures. The AI tool runs on plenty of algorithms, such as recurrent neural networks (RNN), convolutional neural networks (CNN), and reinforcement learning. All of these work together, providing the machine with natural language processing, predictive analytics, and image recognition.
Since DeepSeek produces open-source, free usage models, developers can customize them based on their requirements. By doing so, western-based AI companies may face some challenges in keeping up as the most dominant AI tool available.
The company’s AI models represent a big leap forward in advancing AI technology. These focus on natural language processing and programming tasks, offering advanced solutions designed for various applications. Some examples of models the company released include:
DeepSeek has released various AI models since its introduction. (Image Credit: DeepSeek)
- DeepSeek Coder:
DeepSeek’s Coder became available in November 2023. The AI model came with a 16,000 token context window, has 33 billion parameters, and was trained on 2 trillion tokens (made of 87% source code and 13% natural language in English and Chinese). It also supports 86 programming languages.
- DeepSeek LLM:
Introduced in December 2023, DeepSeek LLM, with 67 billion parameters and trained on 2 trillion tokens, became the company’s first multi-functional LLM. This performs similarly to GPT-4 (natural language understanding), setting itself as a powerful competitor in the AI tool market. It’s also more cost-efficient while offering customization.
- DeepSeek-V2:
In May 2024, DeepSeek released the V2 model, which features a Mixture of Experts (MoE) architecture and multi-head latent attention. It has 236 billion parameters, with 21 billion in use (per token). DeepSeek trained this model on a multi-source corpus with 8.1 trillion tokens. This demonstrated significant inference efficiency and training improvements.
- DeepSeek-Coder-V2:
With a heavy focus on developers, Coder-V2 set its foot in the AI game in June 2024. The model has 236 billion parameters (21 billion active per token), supports 338 programming languages, and a 128,000-token context window (to handle long and complex inputs. Impressively, DeepSeek trained Coder-V2 on 10.2 trillion tokens, which includes 4.2 trillion tokens from the dataset. Meanwhile, 6 trillion tokens are for Coder-V2 (60% source code, 10% math corpus, and 30% natural language corpus).
- DeepSeek-V3:
DeepSeek unveiled its V3 AI model, trained on 14.8 trillion high-quality tokens, in December 2024. It uses advanced MoE and deploys FP8 mixed precision training. V3, which has 671 billion parameters (37 billion in use per token), established new performance standards associated with cost efficiency and language understanding. This enables it to match some of the best AI models, including GPT-4 and Claude 3.5 Sonnet.
The R1 version, released on January 20th, 2025, competes with the big names, including OpenAI and Meta, and is notable for its cost efficiency. OpenAI’s models run on costly AI chips, such as NVIDIA H100 GPU, costing approximately $30,000/unit. Meta spent around $10.5 billion on its AI hardware infrastructure. Meanwhile, the R1 model performs similarly without relying on heavy resources---achieved through sophisticated engineering and optimization technologies.
For instance, the R1 model heavily uses reinforcement learning for human-like reasoning. It works by simplifying tasks into natural, smaller step-by-step processes, which achieves more reliable outputs while relying on less computational demands. Proximal Policy Optimization (PPO) enables precise reasoning and logical consistency. As a result, the AI model is less prone to error propagation and guesswork during inference.
The model isn’t as power-intensive as others since it activates essential experts within the Mixture of Experts (MoE) architecture based on task complexity. This technique adjusts resource use, improving scalability and efficiency across different hardware configurations. Additionally, its native sparse attention (NSA) mechanism reduces computational costs by allocating computational resources to certain parts of the input sequence. NSA lowers memory and processing demands by compressing and prioritizing tokens, leading to high performance during training and inference.
DeepSeek deploys quantization techniques that use 8-bit numbers rather than 32-bit and mixed precision training (FP16 and FP32 calculations). These ensure the AI tool doesn’t use a lot of memory while speeding up computation and ensuring precision.
Other techniques like Multi-Head Latent Attention (MLA) reduce memory usage by compressing key-value cache sizes. This is achieved by distributing attention heads into a latent space, leading to larger batches while boosting token sizes (67k to 650k), which improves inference speed. The model even has multi-token prediction (MTP), which makes the AI tool capable of producing multiple tokens per step. It’s practical for complex reasoning or long text-generation tasks.
The AI tool runs on NVIDIA H800 chips. (Image Credit: Mariia Shalabaieva/Unsplash)
Approaches like expert parallelism focus computational tasks on GPUs, and data parallelism allocates data across GPUs, achieving attention processing. These methods ensure balanced load distribution while reducing redundancy for efficient hardware cluster scaling. Although DeepSeek’s R1 model is optimized to use NVIDIA H800, a less powerful GPU, it integrates innovative solutions for high performance. The AI company didn’t use the more advanced NVIDIA AI A100/H100 chips as the U.S. applied trade restrictions on them to prevent rapid AI advancements from global competitors.
DeepSeek is revolutionizing AI by combining cost efficiency and accessibility with advanced innovation. This means it doesn’t require cutting-edge hardware or financial investments, boosting competition for researchers and smaller companies focusing on AI. Plus, its open-source nature presents more collaborative opportunities, giving it the chance to increase global innovations.
However, DeepSeek still faces security concerns and controversies despite its advancements. Countries have prohibited the use of this AI tool due to possible exploits by the Chinese government, data privacy, and national security issues. Italy became the first country to ban DeepSeek after investigations, stating that it’s “contrary to what was found by the Authority.” South Korea’s Personal Information Protection Commission (PIPC) announced it temporarily removed DeepSeek downloads from the Google Play and Apple App Store due to concerns with South Korea’s Personal Information Protection Act. Australia’s Department of Home Affairs has completely banned it on government devices and systems due to national security risks. NASA and the U.S. Navy also banned DeepSeek as it comes with ethical and security concerns.
Wiz, a cybersecurity firm, found an unauthorized, open database on DeepSeek. This database stored over a million lines of sensitive data, such as backend details, system logs, API keys, and user chat logs. With this exploit, attackers could gain database access, control it, and elevate permissions in DeepSeek’s infrastructure. Since then, this exploit has been fixed, but it’s unknown if 3rd parties accessed the data.
Jan Czarnocki prompted DeepSeek to answer questions about Xingjiang.(Image Credit: Jan Czarnocki/LinkedIn)
In addition, DeepSeek has drawn up some scrutiny concerning its open-source availability, opening the doors to potential cyber-attacks and misinformation campaigns. In this case, researchers discovered that the chatbot shows bias toward the Chinese government, especially when asked about certain topics, including Xinjiang, COVID-19, the Tiananmen Square massacre of 1989, and Taiwan. While the AI tool avoids discussing the Tiananmen Square massacre, it fails to acknowledge China’s actions against the Uyghurs in Xinjiang as genocide.
Researchers also identified a flaw in DeepSeek. Users can use it to create malicious content, including malware step-by-step creation guides, keylogging instructions, or purchasing stolen credentials. Tenable researchers demonstrated that DeepSeek could produce a Python script for keyloggers and keystroke logging instructions. It even keeps the log file hidden. Despite the buggy code, researchers manually edited it to make it functional.
The KELA Cyber Team discovered that DeepSeek can be tricked into producing keyloggers, malware, and ransomware. (Image Credit: KELA Cyber Team)
DeepSeek was instructed to generate a C++ ransomware code that encrypts files and displays a ransom dialog. These also needed to be manually edited but still featured persistence mechanisms and file encryption routines. KELA’s Cyber Team showed DeepSeek’s ability to create infostealer malware scripts that pull sensitive data, including credit card numbers, usernames, and passwords. The AI tool produced malware distribution instructions and recommended buying stolen credentials from underground marketplaces.
The AI tool has restrictions to prevent that type of content creation. However, researchers found that inputting prompts, like claiming it’s for educational purposes, can generate harmful outputs
Have a story tip? Message me at: http://twitter.com/Cabe_Atwell