Can DeepSeek continue to be popular?

Author: Yu Yan, reporter of The Paper

A headhunter who recruits high-end technical talent for the large model field told The Paper that DeepSeek's hiring logic differs little from that of other large model companies. The core profile is "young and high-potential": born around 1998, with no more than five years of work experience, "smart, with a science or engineering background, young, and light on experience."

According to industry insiders, DeepSeek is lucky compared with other large model startups in China: it has no financing pressure, does not need to prove itself to investors, and does not have to balance model iteration against product optimization. As a commercial company that has invested heavily, however, it will sooner or later face the pressures and challenges that other model companies face today.

Which company was the most talked-about in China's large model circle in 2024? Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd. (hereinafter referred to as DeepSeek) is certainly a strong contender. DeepSeek first entered the public eye in the middle of last year as the initiator of the large model price war. Then, around the turn of the year, the release of the open source model DeepSeek-V3 and the reasoning model DeepSeek-R1 made it the center of discussion in the large model circle. On the one hand, people were surprised at its low training cost (DeepSeek-V3 is said to have cost only $5.576 million to train); on the other hand, they applauded its open source models and public technical reports. The release of DeepSeek-R1 has excited many scientists, developers and users, some of whom even regard it as a strong competitor to OpenAI's o1 and other reasoning models.

How can this low-key company produce well-performing large models at extremely low training costs? What did it do right to become so popular? And what challenges will it face if it wants to keep forging ahead in the large model circle?

Algorithm innovation has significantly reduced computing costs

"DeepSeek invested early, has accumulated a lot of experience, and has its own unique algorithms." An executive at a high-profile Chinese large model startup said he believes the core reason for DeepSeek's popularity is its algorithmic innovation. "Chinese companies lack computing power, so they pay more attention than OpenAI to saving on computing costs."

According to the information DeepSeek has released about DeepSeek-R1, the model uses reinforcement learning on a large scale in the post-training stage, which greatly improves its reasoning ability with only a small amount of labeled data. On tasks such as mathematics, code, and natural language reasoning, its performance is comparable to the official version of OpenAI o1.

[Figure: DeepSeek-R1 API pricing]

DeepSeek founder Liang Wenfeng has repeatedly emphasized that DeepSeek is committed to developing differentiated technology routes rather than copying OpenAI's model, and DeepSeek must come up with more effective ways to train its models.

"They used a series of engineering techniques to optimize the model architecture, such as the innovative use of model-mixing methods. The essential purpose is to cut costs through engineering so that the model can be profitable," a veteran of the technology industry told The Paper.

The information DeepSeek has disclosed shows that it has made significant progress on the Multi-head Latent Attention (MLA) mechanism and on its self-developed DeepSeekMoE (Mixture-of-Experts) architecture. These two designs make DeepSeek's models more cost-effective: they reduce the computing resources needed for training and so improve training efficiency. According to data from the research institution Epoch AI, DeepSeek's latest model is very efficient.

In terms of data, unlike OpenAI's "massive data feeding" approach, DeepSeek uses algorithms to summarize and classify data and feeds it to the large model only after selective processing, which improves training efficiency and lowers DeepSeek's costs. DeepSeek-V3 strikes a balance between high performance and low cost, opening up new possibilities for the development of large models.
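
For readers curious why a Mixture-of-Experts design can cut computation, the idea can be sketched in a few lines of Python. This is a minimal, generic top-k routing illustration and not DeepSeek's implementation: the function names, the four toy experts and the router scores are all hypothetical. A router scores every expert for each input, but only the k highest-scoring experts are actually run, so the compute per input grows with k rather than with the total number of experts.

```python
import math


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    # Renormalise so the weights of the selected experts sum to 1.
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_scores, k=2):
    """Run only the selected experts and combine their outputs by weight."""
    routes = route_top_k(router_scores, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    return output, len(routes)


# Hypothetical toy experts: each one simply scales its input.
experts = [lambda x, a=a: a * x for a in (1.0, 2.0, 3.0, 4.0)]

# Hypothetical router scores for one input; experts 1 and 3 score highest.
output, experts_used = moe_forward(1.0, experts, [0.1, 2.0, 0.3, 2.0], k=2)
print(output, experts_used)
```

In this toy run only two of the four experts are evaluated. A real model applies the same principle to far more experts and to neural network layers rather than scalar functions.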

"In the future, we may not need super-large GPU clusters," said Andrej Karpathy, a founding member of OpenAI, after DeepSeek released its cost-effective model.

Liu Zhiyuan, a tenured associate professor in the Department of Computer Science at Tsinghua University, told The Paper that DeepSeek's success is proof of China's competitive advantage: achieving more with less through the extremely efficient use of limited resources. The release of R1, he said, shows that the gap between China's AI capabilities and those of the United States has narrowed significantly. The Economist also said in its latest report: "DeepSeek is changing the technology industry with its low-cost training and innovation in model design."

Demis Hassabis, CEO and co-founder of Google DeepMind, said that while it is not entirely clear how much DeepSeek relies on Western systems for training data and open source models, the team's achievements are undeniably impressive. He acknowledged that China has very strong engineering and scaling capabilities, but he also pointed out that the West is still ahead and needs to consider how to maintain its lead in cutting-edge models.

Years of focus and accumulation

DeepSeek's innovations were not achieved overnight; they are the result of years of "incubation" and long-term planning. Liang Wenfeng is also the founder of Huanfang Quantitative (known in English as High-Flyer), a leading quantitative private equity firm. DeepSeek is believed to have made full use of the funds, data and GPUs that Huanfang Quantitative accumulated.

Liang Wenfeng graduated from Zhejiang University with bachelor's and master's degrees in information and electronic engineering. Since 2008, he has led his team in exploring fully automated quantitative trading using technologies such as machine learning. Huanfang Quantitative was established in 2015. The following year it launched its first AI model, and the first trading position generated by deep learning went live. In 2018, it made AI its main development direction. In 2020, Huanfang's AI supercomputer "Firefly No. 1", with a cumulative investment of more than 100 million yuan and a footprint equivalent to a basketball court, officially went into operation; its computing power is said to rival that of 40,000 personal computers. In 2021, Huanfang invested 1 billion yuan to build "Firefly No. 2", "equipped with 10,000 A100 GPU chips". At that time, no more than five companies in China had more than 10,000 GPUs, and apart from Huanfang Quantitative, the other four were all Internet giants.

In July 2023, DeepSeek was officially established and entered the field of artificial general intelligence. To date it has never received external financing.

"With relatively abundant funds and no pressure to raise money, it has focused on building models rather than products over the past few years. That has made DeepSeek simpler and more focused than other domestic large model companies, and has enabled it to make breakthroughs in engineering and algorithms," said the aforementioned executive of a Chinese large model startup.

In addition, as the large model industry became increasingly closed and OpenAI was nicknamed "CloseAI", DeepSeek's open source models and public technical reports won widespread praise from developers, allowing its technology brand to stand out quickly in the large model market at home and abroad.

A researcher told The Paper that DeepSeek's openness is remarkable, and that open-sourcing the V3 and R1 models has raised the bar for open source models on the market.

Success proves the power of young people

"The success of DeepSeek has also shown everyone the power of young people. In essence, the development of this generation of artificial intelligence requires younger minds," a person from a model company told The Paper.

Earlier, Jack Clark, former policy director of OpenAI and co-founder of Anthropic, said he believed DeepSeek had hired "a group of unfathomable geniuses." In response, Liang Wenfeng said in an interview with an independent media outlet that there were no unfathomable geniuses: the team consists of fresh graduates from top domestic universities, PhD interns in the fourth and fifth years of their programs, and some young people who graduated only a few years ago.

Existing media reports show that the most distinctive feature of the DeepSeek team is that its members come from prestigious universities and are young. Even the team leaders are mostly under 35. The team has fewer than 140 people, and almost all of its engineers and R&D staff come from top domestic universities such as Tsinghua University, Peking University, Sun Yat-sen University, and Beijing University of Posts and Telecommunications, and have not been working for long.

However, the aforementioned headhunter also said that a large model startup is, in essence, still a startup. It is not that such companies do not want to recruit top overseas AI talent; the reality is that few top overseas AI researchers are willing to come back.

A DeepSeek employee who asked to remain anonymous told The Paper that the company has a flat management structure and an atmosphere of free communication. Liang Wenfeng's whereabouts are unpredictable, and most of the time people communicate with him online.

The employee previously did large model research and development at a large domestic company, but felt like a cog in the machine who could not create value, and eventually chose to join DeepSeek. In his view, DeepSeek is currently more focused on the underlying model technology.

The working atmosphere at DeepSeek is entirely bottom-up, with a natural division of labor. There is no upper limit on the GPUs or people anyone can call on. "Everyone has their own ideas and does not need to be pushed. When they run into problems while exploring, they bring people together to discuss them," Liang Wenfeng said in an interview.

“It’s too early to assume that China has surpassed the United States in AI”

The US business outlet Business Insider made two points in its analysis. First, the newly released R1 shows that China can compete with some of the industry's top artificial intelligence models and keep pace with cutting-edge developments in Silicon Valley. Second, open-sourcing such advanced artificial intelligence may also pose a challenge to companies that try to make huge profits by selling the technology.

However, it may be too early to proclaim that "China's AI has surpassed the United States." Liu Zhiyuan has publicly warned against public opinion swinging from extreme pessimism to extreme optimism, with people believing that China has already overtaken its rivals and is far ahead in every respect; in his words, that is "not at all" the case. Liu Zhiyuan believes that new AGI technology is still evolving at an accelerating pace and that the future development path remains unclear. China is still catching up: the frontier is no longer out of reach, but China is still behind. "It is relatively easy to follow a path that others have already explored. How to open up new paths in the fog is a greater challenge."

"The competition is too fierce now and everyone is too anxious; no one expected DeepSeek to be the one that finally came out on top." People close to DeepSeek told The Paper that the industry is changing too fast to predict what will happen next. They can only wait and see what changes the coming Q3 brings.

Although Liang Wenfeng has previously said that DeepSeek only makes models and not products, it is almost impossible for a commercial company to do so indefinitely. On January 15, DeepSeek's official app was released. People close to DeepSeek told The Paper that commercialization is now on the company's agenda.

In the view of industry insiders, DeepSeek is lucky compared with other large model startups in China: it has no financing pressure, does not need to prove itself to investors, and does not have to balance model iteration against product optimization. As a commercial company that has invested heavily, however, it will sooner or later face the pressures and challenges that other model companies face today. "This breakout has served as successful marketing for DeepSeek on the eve of commercialization. But once it truly commercializes, it will have to be tested by the market, and it is hard to say whether it can keep riding the waves," said the aforementioned person from a model company.

What is certain is that DeepSeek will face more pressure and challenges in the future. The competition to build a general-purpose model has only just begun, and who wins will depend on sustained investment and continued technical iteration. Still, industry insiders also believe that "for the domestic model industry, it is a good thing to have a company like DeepSeek, with real technical strength, in the field."