Debunking the DeepSeek Rumors: Five Misconceptions and the Truth

Crypto Labs
Feb 8, 2025


Since the Spring Festival, DeepSeek has stayed firmly in the spotlight, and with that attention have come plenty of misunderstandings and controversies. Some call it "the pride of domestic technology that beats OpenAI"; others dismiss it as "just a clever copy of foreign large-model homework". These disputes center on five questions:
Over-mythologizing versus mindless belittling: is DeepSeek a genuine underlying innovation, and is there any basis for the claim that it was distilled from ChatGPT?
Is DeepSeek's cost really only $5.5 million?
If DeepSeek can really be this efficient, have the huge AI capital expenditures of the global giants been wasted?
Does DeepSeek use PTX programming, and can it really bypass the dependence on Nvidia's CUDA?
DeepSeek is becoming popular worldwide, but will it gradually be banned abroad over compliance, geopolitics, and other issues?
I. Over-mythologizing and Mindless Belittling: Is DeepSeek Truly an Underlying Innovation?
Caoz, a veteran Internet practitioner, believes that DeepSeek's value in pushing the industry forward deserves recognition, but that it is still too early to talk about disruption. Professional evaluations show that DeepSeek has not surpassed ChatGPT on some key problems. For example, in a test that asked the models to write code simulating a ball bouncing inside a closed space, DeepSeek's program still lagged behind o3-mini in how faithfully it obeyed the physics. We should neither over-mythologize it nor mindlessly belittle it.
Currently, there are two extreme views of DeepSeek's technical achievements: one calls its breakthrough a "disruptive revolution"; the other holds that it is merely an imitation of foreign models, with some even speculating that its progress came from distilling OpenAI's models. Microsoft has claimed that DeepSeek distilled ChatGPT's outputs, and some people have used this to dismiss DeepSeek entirely.
In fact, both views are one-sided. More accurately, DeepSeek's breakthrough is an engineering-paradigm upgrade aimed at industrial pain points, opening up a new "less is more" path for AI reasoning. Its innovations sit at three levels:
First, it slims down the training architecture. For example, the GRPO algorithm drops the Critic model that traditional reinforcement learning requires (the "dual-engine" design), turning a complex algorithm into an implementable engineering solution.
Second, it adopts simple evaluation standards. In the code-generation scenario, for instance, compilation results and unit-test pass rates directly replace manual scoring. This rule-based system built on deterministic checks effectively sidesteps the problem of subjective bias in AI training (a minimal sketch combining these two ideas appears after these three points).
Finally, it strikes a delicate balance in data strategy. By combining the "Zero" mode of pure algorithmic self-evolution with the R1 mode, which needs only a few thousand manually annotated examples, it preserves the model's ability to evolve on its own while keeping the results interpretable to humans.
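To make the first two points concrete, here is a minimal sketch of the idea behind GRPO combined with a rule-based reward: several answers are sampled for the same prompt, each is scored by a deterministic check, and each answer's advantage is its reward relative to the group average, so no separate Critic (value) model is needed. The passes_unit_tests function is a hypothetical stand-in for a real checker; this illustrates the principle, not DeepSeek's actual training code.

```python
# Minimal illustration of GRPO-style group-relative advantages with a
# rule-based reward. `passes_unit_tests` is a hypothetical stand-in for a
# deterministic checker (compilation result, unit-test pass rate, etc.).
from statistics import mean, pstdev

def passes_unit_tests(answer: str) -> float:
    """Hypothetical rule-based reward: 1.0 if the answer passes the check."""
    return 1.0 if "return a + b" in answer else 0.0

def group_relative_advantages(sampled_answers):
    """Score a group of answers to one prompt and normalize within the group.

    Because the baseline is just the group mean (scaled by the group std),
    no learned Critic/value model is needed, unlike standard PPO.
    """
    rewards = [passes_unit_tests(a) for a in sampled_answers]
    baseline = mean(rewards)
    scale = pstdev(rewards) or 1.0  # avoid dividing by zero if all rewards match
    return [(r - baseline) / scale for r in rewards]

group = [
    "def add(a, b): return a + b",       # passes the check
    "def add(a, b): return a - b",       # fails the check
    "def add(a, b): return a + b  # ok", # passes the check
]
print(group_relative_advantages(group))  # positive for good answers, negative for bad
```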
However, these improvements neither break through the theoretical boundaries of deep learning nor overturn the technical paradigms of leading models such as OpenAI's o1/o3; rather, they solve industrial pain points through system-level optimization. DeepSeek is fully open-source and has documented these innovations in detail, so anyone in the world can read them in the released code and papers and use them to improve their own model training.
Tanishq Mathew Abraham, former research director at Stability AI, also highlighted three of DeepSeek's innovations in a recent blog post:

Multi-head Latent Attention: Large language models are usually built on the Transformer architecture and use the multi-head attention (MHA) mechanism. The DeepSeek team developed a variant of MHA that uses memory more efficiently while achieving better performance (a rough sketch of the memory saving appears after these three points).
GRPO with Verifiable Rewards: DeepSeek has shown that a very simple reinforcement learning (RL) process can achieve results comparable to those of leading reasoning models such as OpenAI's o1. More importantly, the team developed GRPO, a variant of the PPO reinforcement learning algorithm that is more efficient and performs better.
DualPipe: Training AI models across many GPUs raises a host of efficiency considerations. The DeepSeek team designed a new pipeline-scheduling method called DualPipe that overlaps computation with communication, significantly improving efficiency and speed.
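As a rough illustration of why the attention variant matters, the sketch below compares the per-token KV-cache footprint of vanilla multi-head attention with a latent-compressed cache in the spirit of MLA. All of the model dimensions are hypothetical, chosen only to show the order of magnitude of the saving, not DeepSeek's actual configuration.

```python
# Rough, illustrative comparison of per-token KV-cache size: standard
# multi-head attention (MHA) caches full K and V per layer, while an
# MLA-style design caches one small compressed latent per layer.
# All dimensions below are hypothetical.
def kv_cache_bytes_per_token(n_layers, dims_per_layer, bytes_per_elem=2):
    """Bytes of KV cache stored for a single token (fp16/bf16 by default)."""
    return n_layers * dims_per_layer * bytes_per_elem

n_layers, n_heads, head_dim = 60, 128, 128  # hypothetical model shape
latent_dim = 512                            # hypothetical compressed KV latent

mha_dims = 2 * n_heads * head_dim  # full K and V for every head
mla_dims = latent_dim              # one shared latent vector

print("MHA cache per token:       %.2f MB" % (kv_cache_bytes_per_token(n_layers, mha_dims) / 1e6))
print("MLA-style cache per token: %.2f MB" % (kv_cache_bytes_per_token(n_layers, mla_dims) / 1e6))
```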
In the traditional sense, "distillation" means training a student model on the teacher's token probabilities (logits). ChatGPT does not expose such data, so "distilling" ChatGPT in this sense is essentially impossible; from a technical standpoint, DeepSeek's achievements should not be called into question on these grounds. Moreover, since OpenAI has never made o1's chain-of-thought reasoning public, this result could hardly have been reached merely by "distilling" ChatGPT.
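For readers unfamiliar with the term, classic distillation looks roughly like the sketch below: the student is trained to match the teacher's full probability distribution over tokens, which requires access to the teacher's logits. A chat API that returns only generated text cannot provide this signal. The tensor shapes and temperature here are illustrative.

```python
# A minimal sketch of logit-level knowledge distillation (Hinton et al., 2015).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # batchmean KL, scaled by T^2 so gradients stay comparable across temperatures
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * t * t

# Example with random logits over a 32k-token vocabulary (illustrative only).
student = torch.randn(4, 32000)
teacher = torch.randn(4, 32000)
print(distillation_loss(student, teacher))
```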
Caoz believes that DeepSeek's training may have made partial use of some distilled corpus material, or run a little distillation-style verification, but that this would have had very little impact on the quality and value of the model as a whole. Besides, verifying and tuning one's own model against the outputs of leading models is common practice among large-model teams. Because it has to go through an online API, the amount of information obtainable this way is very limited and unlikely to be decisive: compared with the massive amount of data on the Internet, the corpus gathered by calling a leading model's API is a drop in the bucket. A reasonable guess is that such data is used more for strategy verification and analysis than for large-scale training. Every large model draws its training corpus from the Internet, and the leading models keep contributing new text back to it; in that sense, no leading model can escape being scraped and "distilled" by others, and there is no need to treat this as the key to success or failure. In the end, everyone is intertwined and iterates forward together.
From the perspective of market applications, DeepSeek has already begun to show its strength in several fields. In intelligent customer service, it can quickly and accurately understand user questions and give high-quality answers, greatly improving service efficiency. In content-creation assistance, it supplies creators with rich inspiration and material suggestions, making the creative process more efficient. These practical results further demonstrate the usefulness and value of DeepSeek's technical innovation.
II. Is the Cost of DeepSeek Only $5.5 Million?
The claim that the cost was $5.5 million is both right and wrong, because it never says which cost is meant. Tanishq Mathew Abraham gave an objective estimate: the figure first appeared in the DeepSeek-V3 paper, published a month before the DeepSeek-R1 paper. DeepSeek-V3 is the base model of DeepSeek-R1, which means R1 is essentially V3 plus additional reinforcement learning training. In that sense the figure is not quite accurate, because it excludes the cost of that extra reinforcement learning stage, though this additional cost is probably only a few hundred thousand dollars.

So, is the $5.5 million claimed in the DeepSeek-V3 paper itself accurate? Multiple analyses based on GPU prices, dataset size, and model scale arrive at similar estimates. It is worth noting that although DeepSeek V3/R1 has 671 billion parameters, it uses a mixture-of-experts architecture, so only about 37 billion parameters are active in any single call or forward pass, and that is the number the training-cost estimate is based on.
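For reference, the headline figure can be reproduced with simple arithmetic from the numbers reported in the DeepSeek-V3 paper: roughly 2.788 million H800 GPU-hours for the full training run, priced at an assumed rental rate of $2 per GPU-hour. The sketch below is just that arithmetic; the rental price is a market assumption, not a measured expense.

```python
# Back-of-the-envelope reproduction of the reported training cost.
h800_gpu_hours = 2.788e6      # total GPU-hours reported for DeepSeek-V3 training
rental_price_per_hour = 2.0   # assumed market rental price, USD per H800 GPU-hour

total_cost = h800_gpu_hours * rental_price_per_hour
print(f"Estimated final-run training cost: ${total_cost / 1e6:.2f}M")  # ~$5.58M
```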
It should also be noted that DeepSeek reports a cost estimated at current market prices. We do not know what its 2,048-GPU H800 cluster actually cost; buying GPUs in bulk is usually cheaper than buying them individually, so the real figure may be lower. The key point, though, is that this covers only the final training run. Before reaching that final run there are many small-scale experiments and ablation studies, all of which incur considerable costs that are not reflected in the reported number. On top of that come other expenses such as researcher salaries: according to SemiAnalysis, DeepSeek researcher salaries are rumored to reach $1 million, on par with top compensation at AGI-frontier labs such as OpenAI and Anthropic.
Some people point to these additional costs to deny DeepSeek's low cost and operational efficiency, which is quite unfair: other AI companies also spend heavily on personnel, and those salaries are likewise not counted in the cost of a model. SemiAnalysis, an independent research and analysis firm focused on semiconductors and artificial intelligence, also analyzed DeepSeek's AI TCO (total cost of ownership). Calculated over a four-year cycle, the total cost of its roughly 60,000 GPUs comes to about $2.57 billion, covering server capital expenditure (about $1.629 billion) and operating costs (about $0.944 billion).
Of course, nobody outside the company knows exactly how many GPUs DeepSeek really has or the mix of models; everything is an estimate. In short, once equipment, servers, operations, and everything else are included, the total is certainly far more than $5.5 million; even so, completing the final training run for roughly $5.5 million is already very efficient. Compared with other large models of similar scale, DeepSeek has indeed shown a distinct advantage in cost control, which also gives it a price edge in market competition and lets more enterprises and research institutions afford its technology and services.
III. Is the Huge Capital Expenditure on Computing Power Just a Huge Waste?
This view has spread widely but is rather one-sided. DeepSeek has indeed demonstrated advantages in training efficiency, and it has exposed possible inefficiencies in how some leading AI companies use their computing resources. Even Nvidia's short-lived share-price slump may owe something to how widely this misreading circulated.
But that does not make having more computing resources a bad thing. From the perspective of scaling laws, more compute has always meant better performance. This trend has held since the Transformer architecture appeared in 2017, and DeepSeek's models are also Transformer-based. The focus of AI development keeps shifting, from model size to dataset size and now to inference-time compute and synthetic data, but the core rule that more compute yields better performance has not changed. DeepSeek has found a more efficient path, yet the scaling laws still hold: more computing resources can still buy better results.
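To put the "more compute still helps" claim in concrete terms, here is a toy sketch of a Chinchilla-style scaling law, L(N, D) = E + A/N^alpha + B/D^beta, where N is the parameter count and D the number of training tokens. The constants are illustrative values in the spirit of published scaling-law fits, not numbers measured on DeepSeek's models.

```python
# Toy Chinchilla-style loss curve: loss keeps falling as parameters and data
# (and therefore training compute) grow, with diminishing returns.
# Constants are illustrative, not fit to any particular model.
def predicted_loss(n_params, n_tokens,
                   E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    return E + A / n_params**alpha + B / n_tokens**beta

for scale in (1, 2, 4, 8):
    n, d = scale * 37e9, scale * 1e12  # hypothetical active params and training tokens
    print(f"{scale:>2}x scale -> predicted loss {predicted_loss(n, d):.3f}")
```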

For example, in image recognition, more compute allows a model to handle higher-resolution and more complex images, improving both accuracy and speed. In machine translation, ample compute lets a model learn richer linguistic knowledge and produce more accurate, natural translations. For enterprises and research projects that chase extreme performance and large-scale deployment, sufficient computing resources remain essential. What DeepSeek offers the industry is a new way of thinking: how to train and apply models more efficiently under limited resources through innovative techniques and algorithms.
IV. Does DeepSeek Use PTX to Bypass the Dependence on NVIDIA CUDA?
DeepSeek's paper mentions the use of PTX (Parallel Thread Execution) programming. Through this kind of customized PTX optimization, DeepSeek's systems and models can better exploit the performance of the underlying hardware. In the paper's own words: "we employ customized PTX (Parallel Thread Execution) instructions and auto-tune the communication chunk size, which significantly reduces the use of the L2 cache and the interference to other SMs."
Two interpretations of this passage circulate online. One holds that it is an attempt to "bypass the CUDA monopoly"; the other holds that, because DeepSeek cannot obtain the most advanced chips, it was forced down to a lower level of the stack to work around the limited interconnect bandwidth of the H800 and improve cross-chip communication.
Dai Guohao, an associate professor at Shanghai Jiao Tong University, believes neither statement is accurate. First, PTX (Parallel Thread Execution) is a component inside the CUDA driver layer and still depends on the CUDA ecosystem, so the claim of using PTX to bypass the CUDA monopoly is simply wrong. CUDA is a higher-level interface that exposes a set of programming APIs to users, while PTX is normally hidden within the CUDA driver, so almost no deep-learning or large-model algorithm engineer ever touches this layer. The layer matters because PTX interacts directly with the underlying hardware, allowing it to be programmed and driven more precisely. In plain terms, DeepSeek's optimization is not a last resort forced by chip restrictions but a proactive optimization: whether the chip is an H800 or an H100, this method improves communication and interconnect efficiency.
Looking at technology trends, as AI hardware and software ecosystems continue to evolve, more PTX-like techniques for optimizing the underlying hardware are likely to emerge. DeepSeek's exploration here has built up valuable experience for the industry and offers a reference for other companies and research institutions, pushing the whole AI industry toward higher utilization of hardware resources.
V. Will DeepSeek be Banned Abroad?

After DeepSeek took off, the major overseas players Nvidia, Microsoft, Intel, AMD, and AWS all launched or integrated it, while at home Huawei, Tencent, Baidu, Alibaba, and ByteDance's Volcano Engine all support DeepSeek deployment. Yet some of the commentary online is overly emotional. On one side, people claim that foreign cloud giants deploying DeepSeek means "the foreigners have been won over". In reality, these deployments are mostly commercial decisions: as cloud vendors, supporting the most popular and capable models available lets them serve customers better, ride DeepSeek's wave of traffic, and perhaps convert some new users. They did rush to deploy DeepSeek at the height of its popularity, but claims that they are infatuated with it or have "been convinced" are exaggerations. Some even fabricated a story that, after DeepSeek was attacked, the Chinese technology community formed an "Avengers Alliance" to come to its defense.
On the other hand, some voices say that geopolitical and other real-world pressures will soon see DeepSeek gradually banned abroad. Caoz offers a fairly clear reading of this: the "DeepSeek" we talk about actually covers two products. One is the DeepSeek app that has become popular around the world; the other is the open-source repository on GitHub. The former can be regarded as a demo of the latter, a complete showcase of its capabilities, while the latter may grow into a thriving open-source ecosystem. What might be restricted is the DeepSeek app; what the giants are integrating and offering is deployments of the DeepSeek open-source software. These are two entirely different things.
Seen from the development of the open-source ecosystem, DeepSeek's repository has drawn the attention and participation of developers around the world, who build on it with secondary development, optimization, and innovation, constantly enriching its functions and application scenarios. The vitality and creativity of this open-source community will keep pushing DeepSeek's technology forward and give it a more secure footing in the global AI market. Even if the DeepSeek app is restricted in some regions, the influence of its open-source technology will not be easily diminished.
DeepSeek has entered the global AI arena as a "Chinese large model" and adopted the most permissive open-source license, the MIT License, which even allows commercial use. The discussion around it has long since outgrown technical innovation itself, but technological progress has never been a black-and-white dispute over right and wrong. Rather than over-praising or flatly dismissing it, we would do better to let time and the market test its true value. After all, in this AI marathon, the real race has only just begun. DeepSeek will face more opportunities and challenges ahead, and whether its technology can keep innovating and stand out in global competition remains to be seen.
