Integration of DePIN and embodied intelligence: technical challenges and future prospects

Introduction: On February 27, Messari hosted a podcast on "Building Decentralized Physical AI" with guest Michael Cho, co-founder of FrodoBot Lab. They discussed the challenges and opportunities of decentralized physical infrastructure networks (DePIN) in robotics. Although the field is still in its infancy, it has great potential and could fundamentally change how AI robots operate in the real world. Unlike traditional AI, which relies on large amounts of Internet data, DePIN robotics faces more complex problems: data collection, hardware limitations, evaluation bottlenecks, and the sustainability of its economic models.

In today's article, we break down the key points of this discussion: what problems DePIN robotics has run into, what the main obstacles to scaling decentralized robots are, and why DePIN has advantages over centralized approaches. Finally, we explore the future of DePIN robotics and ask whether its "ChatGPT moment" is near.

Where are the bottlenecks in DePIN robotics?

When Michael Cho first started working on FrodoBot, the biggest headache was the cost of robotics. Commercial robots on the market were prohibitively expensive, which made it hard to bring AI applications into the real world. His initial answer was to build a low-cost autonomous robot for only $500 and compete on price against most existing projects.

But as he and his team went deeper into R&D, Michael realized that cost was not the real bottleneck. The challenges DePIN faces in robotics are far more complex than whether hardware is affordable. As FrodoBot Lab has continued to advance, multiple bottlenecks in DePIN robotics have surfaced, and large-scale deployment will require overcoming each of them.

Bottleneck 1: Data

Unlike large 'online' AI models that are trained on vast amounts of Internet data, embodied AI needs to interact with the real world to develop intelligence. The problem is that no large-scale infrastructure for gathering this kind of real-world interaction data exists yet, and there is no consensus on how to collect it. Data collection for embodied AI falls into three categories:

▎The first category is human operation data: the data generated when humans manually control robots. It is high quality, capturing both the video stream and the action labels - that is, what the human operator saw and how they reacted. This is the most effective way to train AI to imitate human behavior, but it is costly and labor-intensive (a minimal sketch of what such a record might look like follows this list).

▎The second category is synthetic (simulated) data, which works well for narrow, well-defined skills such as teaching a robot to walk over rugged terrain. For highly variable tasks such as cooking, however, simulation falls short. Imagine training a robot to fry an egg: the type of pan, the oil temperature, and slight changes in room conditions all affect the result, and a virtual environment can hardly cover every scenario.

▎The third category is video learning, in which the AI model learns by watching videos of the real world. This approach has potential, but it lacks the direct physical feedback from interaction that intelligence requires.
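To make the first category more concrete, here is a minimal sketch (in Python) of what a single human-operation record could look like. The field names, units, and structure are illustrative assumptions, not FrodoBot Lab's actual data format.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative sketch only: field names and units are assumptions,
# not FrodoBot Lab's actual data schema.

@dataclass
class TeleopFrame:
    """One synchronized observation/action pair from a human-operated robot."""
    timestamp_ms: int          # capture time of the camera frame
    image_path: str            # stored video frame (what the operator saw)
    linear_velocity: float     # commanded forward speed, m/s (what the operator did)
    angular_velocity: float    # commanded turn rate, rad/s

@dataclass
class TeleopEpisode:
    """A full human-driven session, usable for imitation learning."""
    robot_id: str
    operator_id: str
    task: str                  # e.g. "last-mile delivery"
    frames: List[TeleopFrame] = field(default_factory=list)
    success: bool = False      # did the operator complete the task?

# Example: record a couple of frames from one delivery run
episode = TeleopEpisode(robot_id="bot-042", operator_id="op-7", task="last-mile delivery")
episode.frames.append(TeleopFrame(0, "frames/000000.jpg", 0.8, 0.0))
episode.frames.append(TeleopFrame(100, "frames/000001.jpg", 0.8, 0.15))
episode.success = True
```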

Bottleneck 2: Level of autonomy

Michael mentioned that when he first tested FrodoBot in the real world, he mainly used the robot for last-mile delivery. On paper, the results were quite good - the robot successfully completed 90% of its delivery tasks. But in practice, a 10% failure rate is unacceptable: a robot that fails one delivery in ten cannot be commercialized. It is the same with autonomous driving - a driverless system can have 10,000 successful trips on record, yet a single failure is enough to destroy consumer confidence.

Therefore, for robots to be truly useful, the success rate must be close to 99.99% or higher. The problem is that each additional fraction of a percentage point of reliability demands exponentially more time and effort, and many people underestimate how hard this last step is.
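A rough back-of-the-envelope calculation, not taken from the podcast and assuming independent task failures, shows why the gap between 90% and 99.99% per-task reliability matters so much once tasks are chained together:

```python
# Rough illustration (assumption: task failures are independent), showing how
# per-task reliability compounds over a sequence of deliveries.

def run_success_probability(per_task_success: float, num_tasks: int) -> float:
    """Probability of completing num_tasks consecutive tasks without a single failure."""
    return per_task_success ** num_tasks

for p in (0.90, 0.99, 0.9999):
    print(f"per-task success {p:.4%}: "
          f"10 tasks in a row -> {run_success_probability(p, 10):.1%}, "
          f"1000 tasks in a row -> {run_success_probability(p, 1000):.1%}")

# per-task success 90.0000%: 10 tasks in a row -> 34.9%, 1000 tasks in a row -> 0.0%
# per-task success 99.0000%: 10 tasks in a row -> 90.4%, 1000 tasks in a row -> 0.0%
# per-task success 99.9900%: 10 tasks in a row -> 99.9%, 1000 tasks in a row -> 90.5%
```

At 90% per-task success, fewer than four out of ten ten-delivery runs finish without a failure; only at roughly 99.99% does a thousand-delivery run become dependable.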

Michael recalled that when he rode in Google's self-driving car prototype in 2015, he felt that fully autonomous driving was just around the corner. Ten years later, we are still debating when Level 5 full autonomy will arrive. Progress in robotics is not linear: each step forward is markedly harder than the last, and the final 1% of reliability may take years or even decades to achieve.

Bottleneck 3: Hardware - AI alone cannot solve the robot problem

Taking a step back: even with powerful AI models, existing robot hardware is not ready for true autonomy. The most easily overlooked gap is tactile sensing - the best current technology, such as Meta AI's research, is still far from the sensitivity of a human fingertip. Humans interact with the world through vision and touch, while robots know almost nothing about texture, grip, and pressure feedback.

There’s also the problem of occlusion — when an object is partially blocked, it’s hard for a robot to recognize and interact with it, whereas humans can intuitively understand an object even if they can’t see its full appearance.

In addition to perception issues, robotic actuators themselves have flaws. Most humanoid robots place actuators directly on the joints, making them bulky and potentially dangerous. In contrast, the human tendon structure allows for smoother, safer movements. This is why existing humanoid robots appear stiff and inflexible. Companies like Apptronik are developing more bio-inspired actuator designs, but these innovations will take time to mature.

Bottleneck 4: Capital - why is scaling hardware so difficult?

Unlike traditional AI models that rely solely on computing power, robotics requires deploying physical devices in the real world, which creates a huge capital challenge. Robots are expensive to build, so only the wealthiest companies can afford large-scale experiments. Even the most cost-effective humanoid robots today cost tens of thousands of dollars, which makes mass adoption unrealistic for now.

Bottleneck 5: Evaluation effectiveness

This is an "invisible" bottleneck. Large online AI models like ChatGPT can be evaluated almost instantly: within hours of a new language model's release, researchers and ordinary users around the world can form a reasonable picture of its performance. Evaluating physical AI, by contrast, requires real-world deployment, which takes time.

Tesla’s Full Self-Driving (FSD) software is a good example. If a Tesla logs 1 million miles without an accident, does that mean it has truly achieved Level 5 autonomy? What about 10 million miles? The problem with robotic intelligence is that the only way to validate it is to see where it ultimately fails, which means large-scale, long-term live deployment.
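One standard statistical heuristic, the "rule of three" (our addition, not something discussed in the podcast), hints at why failure-free mileage alone is weak evidence: if n independent trials show zero failures, the true failure rate can still be as high as roughly 3/n at 95% confidence. A minimal sketch, treating each mile as one trial purely for illustration:

```python
# Sketch using the statistical "rule of three": if n independent trials show zero
# failures, an approximate 95% upper bound on the true failure rate is 3 / n.
# Treating each mile as one independent "trial" is an arbitrary simplification.

def failure_rate_upper_bound(num_failure_free_trials: int) -> float:
    """Approximate 95% upper confidence bound on the per-trial failure probability."""
    return 3.0 / num_failure_free_trials

for miles in (1_000_000, 10_000_000):
    bound = failure_rate_upper_bound(miles)
    print(f"{miles:,} failure-free miles -> failure rate could still be "
          f"up to ~{bound:.2e} per mile (95% confidence)")

# 1,000,000 failure-free miles -> failure rate could still be up to ~3.00e-06 per mile (95% confidence)
# 10,000,000 failure-free miles -> failure rate could still be up to ~3.00e-07 per mile (95% confidence)
```

Even ten million clean miles only push the statistical bound down by one order of magnitude, which is why validating physical AI demands deployments far larger and longer than anything needed to evaluate a chatbot.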

Bottleneck 6: Human labor

Another underestimated challenge is that human labor remains indispensable to robotic AI development; AI alone is not enough. Robots need human operators to generate training data, maintenance teams to keep them running, and researchers and developers to keep improving the models. Unlike models that can be trained entirely in the cloud, robots require continuous human involvement - a major challenge that any DePIN network must address.

The future: when will the ChatGPT moment come for robotics?

Some believe the ChatGPT moment for robotics is coming; Michael is somewhat skeptical. Given the challenges of hardware, data, and evaluation, he believes general-purpose robotic AI is still a long way from mass adoption. The progress of DePIN robotics, however, does give cause for hope. Robotics development should be decentralized rather than controlled by a few large companies, because the scale and coordination of a decentralized network can spread the capital burden: instead of relying on one large company to pay for thousands of robots, a shared network can pool contributions from many individuals.

First, DePIN accelerates data collection and evaluation. Instead of waiting for one company to deploy a limited number of robots, a decentralized network can run many robots in parallel and collect data at a much larger scale. In a recent AI-versus-human robot competition in Abu Dhabi, for example, researchers from institutions such as DeepMind and UT Austin pitted their AI models against human players. Humans still had the upper hand, but the researchers were excited about the unique dataset collected from real-world robot interactions. This indirectly demonstrates the need for subnetworks that connect the various components of robotics, and the enthusiasm of the research community shows that even if full autonomy remains a long-term goal, DePIN robotics already delivers tangible value, from data collection and training to real-world deployment and verification.

On the other hand, AI-driven improvements to hardware design, such as using AI to optimize chips and materials engineering, may significantly shorten the timeline. As a concrete example, FrodoBot Lab has worked with other institutions to secure two NVIDIA H100 servers, each containing eight H100 GPUs. This gives researchers the computing power needed to process the real-world data collected from robot deployments and to train and optimize AI models on it; without such resources, even the most valuable datasets cannot be fully exploited. Through access to DePIN's decentralized computing infrastructure, a robotics network can let researchers around the world train and evaluate models without being limited by capital-intensive GPU ownership. If DePIN succeeds in crowdsourcing both data and hardware advances, the future of robotics may arrive sooner than expected.

Additionally, AI agents like Sam, a traveling KOL robot with its own meme token, point to new revenue models for decentralized robotics networks. Sam operates autonomously, livestreaming 24/7 across multiple cities while its meme token appreciates in value. This shows how DePIN-powered robots could sustain themselves financially through decentralized ownership and token incentives. In the future, such AI agents could even use tokens to pay human operators for assistance, rent additional robotic assets, or bid on real-world tasks, creating an economic cycle that benefits both AI development and DePIN participants.
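As a purely hypothetical sketch of that economic cycle (the account names, amounts, and in-memory ledger below are invented for illustration; a real network would settle these balances on-chain), the loop might look like this:

```python
# Purely hypothetical sketch of the token-incentive loop described above.
# Agent names, tasks, and reward amounts are invented for illustration only.

from collections import defaultdict

class TokenLedger:
    """Minimal in-memory ledger tracking token balances for agents and operators."""
    def __init__(self):
        self.balances = defaultdict(float)

    def deposit(self, account: str, amount: float) -> None:
        self.balances[account] += amount

    def pay(self, payer: str, payee: str, amount: float) -> bool:
        """Transfer tokens if the payer can afford it; return whether it succeeded."""
        if self.balances[payer] < amount:
            return False
        self.balances[payer] -= amount
        self.balances[payee] += amount
        return True

ledger = TokenLedger()
ledger.deposit("agent:sam", 100.0)          # revenue earned by the AI agent

# The agent pays a human operator who took over during a tricky crossing.
if ledger.pay("agent:sam", "operator:alice", 5.0):
    print("operator:alice paid for teleop assistance")

# The agent rents another robot for an hour from its owner.
ledger.pay("agent:sam", "owner:bob", 12.5)

print(dict(ledger.balances))
# {'agent:sam': 82.5, 'operator:alice': 5.0, 'owner:bob': 12.5}
```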

Final summary

The development of robotic AI depends not only on algorithms but also on hardware upgrades, data accumulation, financial support, and human participation. In the past, the robotics industry was held back by high costs and the dominance of large companies, which slowed innovation. A DePIN robotics network means that, with the power of decentralized networks, data collection, computing resources, and capital can be coordinated on a global scale. This not only accelerates AI training and hardware optimization but also lowers the barrier to entry, letting more researchers, entrepreneurs, and individual users participate. Our hope is that the robotics industry will no longer depend on a few technology giants but will instead be driven by a global community toward a truly open and sustainable technology ecosystem.