Author: BadBot, IOBC Capital
Just last night, DeepSeek released an update to its V3 model on Hugging Face: DeepSeek-V3-0324, with 685 billion parameters and significant improvements in coding, UI design, and reasoning capabilities.
At the just-concluded 2025 GTC conference, Jensen Huang spoke highly of DeepSeek and stressed that the market had misread the situation: DeepSeek's efficient models will not reduce demand for NVIDIA chips, and future computing demand will only grow, not shrink.
As a star product built on algorithmic breakthroughs, how does DeepSeek relate to NVIDIA's supply of computing power? To answer that, I would like to first discuss what computing power and algorithms each mean for the development of the industry.
Symbiotic evolution of computing power and algorithms
In the field of AI, the increase in computing power provides the operating basis for more complex algorithms, enabling models to process larger amounts of data and learn more complex patterns; while algorithm optimization can more efficiently utilize computing power and improve the efficiency of computing resource utilization.
The symbiotic relationship between computing power and algorithms is reshaping the AI industry landscape:
Differentiation of technical routes: Companies such as OpenAI pursue the construction of super-large computing clusters, while DeepSeek and others focus on optimizing algorithm efficiency, forming different technical schools.
Industrial chain reconstruction: NVIDIA has become the leader in AI computing power through the CUDA ecosystem, while cloud service providers have lowered the deployment threshold through elastic computing power services.
Resource allocation adjustment: Companies' R&D priorities now seek a balance between investment in hardware infrastructure and the development of efficient algorithms.
The rise of the open source community: Open source models such as DeepSeek and LLaMA enable the sharing of algorithm innovation and computing power optimization results, accelerating technology iteration and diffusion.
DeepSeek’s Technological Innovation
DeepSeek's popularity is inseparable from its technological innovations. Below I explain them in plain language so that most readers can follow.
Model architecture optimization
DeepSeek combines a Transformer with a Mixture of Experts (MoE) architecture and introduces the Multi-Head Latent Attention (MLA) mechanism. The architecture works like a super team: the Transformer handles routine processing, while MoE acts as a group of experts, each with its own specialty. When a specific problem comes up, the expert best suited to it takes over, which greatly improves the model's efficiency and accuracy. The MLA mechanism lets the model focus more flexibly on the important details in the information it processes, further improving performance.
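To make the MoE idea more concrete, here is a minimal sketch of top-k expert routing in PyTorch. The dimensions, expert count, and gating details are made up for readability and are not DeepSeek's actual configuration; the point is only that a gating network scores each token and only the best-matching experts process it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Illustrative mixture-of-experts layer: route each token to its top-k experts.
    Dimensions and expert count are arbitrary, chosen only for readability."""
    def __init__(self, d_model=64, d_hidden=128, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)            # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                     # x: (tokens, d_model)
        scores = self.gate(x)                                 # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)        # keep only the best-scoring experts
        weights = F.softmax(weights, dim=-1)                  # normalize their contributions
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                      # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(TinyMoELayer()(tokens).shape)                           # torch.Size([10, 64])
```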
Innovation in training methods
DeepSeek proposed an FP8 mixed-precision training framework. The framework works like an intelligent resource allocator, dynamically selecting the appropriate numerical precision for different stages of the training process: it uses higher precision where accuracy is critical, and drops to lower precision where that is acceptable, thereby saving computing resources, increasing training speed, and reducing memory usage.
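As a rough illustration of the general mixed-precision idea (not DeepSeek's actual FP8 framework), the sketch below simulates per-tensor scaled 8-bit quantization in NumPy: master weights stay in full precision, the expensive matrix multiply runs on coarse low-precision copies, and the result is rescaled back.

```python
import numpy as np

def quantize_8bit(x, n_bits=8):
    """Per-tensor scaled quantization: map x onto a coarse signed-integer grid,
    keeping a single float scale so values can be dequantized afterwards."""
    max_int = 2 ** (n_bits - 1) - 1                     # 127 for 8 bits
    scale = max(float(np.abs(x).max()) / max_int, 1e-12)
    q = np.round(x / scale).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256)).astype(np.float32)      # full-precision "master" weights
x = rng.normal(size=(1, 256)).astype(np.float32)        # an activation vector

qW, sW = quantize_8bit(W)
qx, sx = quantize_8bit(x)

# The costly multiply runs on the low-precision copies; accumulation uses a wider type,
# and the per-tensor scales restore the original magnitude afterwards.
y_lowp = (qx.astype(np.int32) @ qW.astype(np.int32)) * (sx * sW)
y_full = x @ W
print("mean relative error:", float(np.abs(y_lowp - y_full).mean() / np.abs(y_full).mean()))
```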
Improved inference efficiency
In the inference phase, DeepSeek introduced Multi-Token Prediction (MTP) technology. Traditional decoding proceeds step by step, predicting only one token at a time. MTP predicts multiple tokens at once, which greatly speeds up inference and lowers its cost.
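The mechanics below differ from DeepSeek's actual MTP module, but this small, self-contained sketch (hypothetical toy "models" over a six-letter vocabulary) shows why proposing several tokens per step and then verifying them cuts the number of sequential decoding steps.

```python
import random
random.seed(0)

VOCAB = list("abcdef")

def main_model_next(context):
    """Stand-in for the full model: a deterministic next-token rule (purely hypothetical)."""
    return VOCAB[(len(context) * 2) % len(VOCAB)]

def draft_k_tokens(context, k):
    """Stand-in for a cheap multi-token predictor: guesses k tokens in one shot.
    It agrees with the main model most of the time, so the accept loop has work to do."""
    guesses, ctx = [], list(context)
    for _ in range(k):
        tok = main_model_next(ctx) if random.random() < 0.8 else random.choice(VOCAB)
        guesses.append(tok)
        ctx.append(tok)
    return guesses

def generate(n_tokens, k=4):
    out, steps = [], 0
    while len(out) < n_tokens:
        steps += 1
        draft = draft_k_tokens(out, k)
        accepted, ctx = [], list(out)
        for tok in draft:                       # keep the longest prefix the main model agrees with
            if main_model_next(ctx) == tok:
                accepted.append(tok)
                ctx.append(tok)
            else:
                break
        accepted.append(main_model_next(out + accepted))   # plus one token from the main model itself
        out.extend(accepted)
    return "".join(out[:n_tokens]), steps

text, steps = generate(32)
print(f"generated {len(text)} tokens in {steps} multi-token steps instead of 32 single-token steps")
```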
Reinforcement learning algorithm breakthrough
DeepSeek's reinforcement learning algorithm GRPO (Group Relative Policy Optimization) optimizes the model training process. Reinforcement learning is like equipping the model with a coach who guides it toward better behavior through rewards and penalties. Traditional reinforcement learning algorithms can consume a great deal of computing resources in this process, whereas DeepSeek's algorithm is more efficient: it cuts unnecessary computation while still improving model performance, striking a balance between performance and cost.
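Here is a minimal sketch of the group-relative advantage computation at the heart of GRPO, assuming one prompt, a group of sampled answers, and scalar rewards; the full objective also includes a clipped policy ratio and a KL penalty to a reference model, which are omitted here.

```python
import numpy as np

def group_relative_advantages(rewards):
    """GRPO-style advantages: each sampled answer is scored relative to the mean and
    standard deviation of its own group, so no separate value (critic) network is needed,
    which is where much of the compute saving comes from."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

# One prompt, a group of six sampled answers, hypothetical scalar rewards
# (for example 1.0 when the final answer is correct, 0.0 otherwise, 0.5 for partial credit).
rewards = [1.0, 0.0, 1.0, 1.0, 0.0, 0.5]
print(group_relative_advantages(rewards))   # above-average answers get positive advantages

# These advantages then weight the policy-gradient update for each answer's tokens;
# the clipped probability ratio and KL-to-reference terms of the full loss are omitted here.
```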
These innovations are not isolated technical points; together they form a complete technical system that reduces computing power requirements all the way from training to inference. Ordinary consumer-grade graphics cards can now run powerful AI models, greatly lowering the threshold for AI applications and enabling more developers and companies to take part in AI innovation.
Impact on Nvidia
Many people think that DeepSeek bypasses the CUDA layer and thereby frees itself from dependence on NVIDIA. In fact, DeepSeek optimizes its algorithms directly at NVIDIA's PTX (Parallel Thread Execution) layer. PTX is an intermediate representation that sits between high-level CUDA code and the actual GPU instructions, and working at this layer lets DeepSeek achieve finer-grained performance tuning.
This cuts both ways for NVIDIA. On the one hand, DeepSeek is in fact even more deeply tied to NVIDIA's hardware and CUDA ecosystem, and a lower threshold for AI applications may expand the overall market. On the other hand, DeepSeek's algorithmic optimizations may change the demand structure for high-end chips: some AI models that previously required GPUs such as the H100 may now run efficiently on the A100 or even on consumer-grade graphics cards.
What it means for China’s AI industry
DeepSeek's algorithm optimization offers China's AI industry a path to a technological breakthrough. Against the backdrop of restrictions on high-end chips, the "make up for hardware with software" approach reduces dependence on top-tier imported chips.
Upstream, efficient algorithms ease the pressure of computing power demand, allowing computing power providers to extend hardware life cycles and improve return on investment through software optimization. Downstream, the optimized open-source models lower the threshold for AI application development: many small and medium-sized enterprises can build competitive applications on DeepSeek's models without massive computing resources, which will give rise to more AI solutions in vertical fields.
Profound impact on Web3+AI
Decentralized AI Infrastructure
DeepSeek's algorithm optimization gives Web3 AI infrastructure new momentum. Its innovative architecture, efficient algorithms, and low computing power requirements make decentralized AI inference feasible. The MoE architecture is naturally suited to distributed deployment: different nodes can hold different expert networks, so no single node needs to store the complete model. This significantly reduces per-node storage and computing requirements while improving the model's flexibility and efficiency.
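A toy sketch of that placement idea follows; the node names and the round-robin assignment are hypothetical, and a real deployment would also handle replication, load balancing, and node failures. Each node stores only a subset of experts, and a token is sent only to the nodes hosting the experts it was routed to.

```python
# Hypothetical sketch: spread MoE experts across decentralized nodes so that no single
# node has to hold the full model. Round-robin placement is used purely for simplicity.
N_EXPERTS = 16
NODES = ["node-a", "node-b", "node-c", "node-d"]

placement = {e: NODES[e % len(NODES)] for e in range(N_EXPERTS)}

def nodes_for_token(routed_experts):
    """Given the experts the gate selected for a token, return the nodes to contact."""
    return sorted({placement[e] for e in routed_experts})

# A token routed to experts 3 and 10 only touches two nodes, not the whole network.
print(nodes_for_token([3, 10]))   # ['node-c', 'node-d']
```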
The FP8 training framework further reduces the demand for high-end computing resources, allowing more hardware to join the node network. This not only lowers the barrier to participating in decentralized AI computing, but also improves the computing power and efficiency of the entire network.
Multi-Agent System
Intelligent trading strategy optimization: a real-time market data analysis agent, a short-term price movement prediction agent, an on-chain trade execution agent, and a trade result supervision agent operate in coordination to help users pursue higher returns (a minimal sketch of this coordination pattern follows the list).
Automated smart contract execution: a smart contract monitoring agent, a smart contract execution agent, and an execution result supervision agent work together to automate more complex business logic.
Personalized portfolio management: AI helps users find the best staking or liquidity opportunities in real time based on their risk preferences, investment goals, and financial status.
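As a rough illustration of the coordination pattern in the first item above, here is a minimal sketch in which each hypothetical agent is just a function consuming the previous agent's output; none of the names or logic refer to an existing framework or strategy.

```python
from dataclasses import dataclass

@dataclass
class MarketSnapshot:
    price: float
    volume: float

# Hypothetical agents for the trading-strategy example; each is reduced to a plain function.
def analysis_agent(snapshot):
    return {"trend": "up" if snapshot.volume > 1_000 else "flat"}

def prediction_agent(analysis):
    return {"expected_move": 0.02 if analysis["trend"] == "up" else 0.0}

def execution_agent(prediction, move_threshold=0.01):
    if prediction["expected_move"] > move_threshold:
        return {"action": "buy", "status": "submitted"}
    return {"action": "hold", "status": "skipped"}

def supervision_agent(result):
    return {**result, "checked": True}

# The pipeline mirrors the list above: analyze -> predict -> execute -> supervise.
snapshot = MarketSnapshot(price=100.0, volume=1_500)
report = supervision_agent(execution_agent(prediction_agent(analysis_agent(snapshot))))
print(report)   # {'action': 'buy', 'status': 'submitted', 'checked': True}
```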
"We can only see a short time into the future, but it's enough to find that there is a lot of work to be done there." DeepSeek is finding breakthroughs through algorithm innovation under the constraints of computing power, opening up a differentiated development path for China's AI industry. Lowering the application threshold, promoting the integration of Web3 and AI, reducing dependence on high-end chips, and enabling financial innovation are reshaping the digital economy. In the future, AI development will no longer be just a computing power competition, but a competition for the coordinated optimization of computing power and algorithms. On this new track, innovators such as DeepSeek are redefining the rules of the game with Chinese wisdom.