I woke up to find many friends asking me to take a look at #manus, which is said to be a truly general-purpose AI Agent, capable of thinking independently, planning and executing complex tasks, and delivering complete results. It sounds very cool, but beyond the many friends voicing anxiety about losing their jobs, what will it bring to the takeoff of web3 DeFai scenarios? Here are my thoughts:
1) About a month ago, OpenAI launched a similar product, Operator. With it, the AI can independently complete tasks in the browser, including restaurant reservations, shopping, ticket booking, and takeout ordering. Users can watch what it does and take over control at any time.
This agent did not attract much discussion, because it is a single-model-driven framework that works through tool calls. Once users feel that key decisions still need their intervention, they give up on the idea of relying on it to carry out tasks.
2) On the surface Manus does not look very different, except that it covers many more application scenarios, including screening resumes, researching stocks, and purchasing real estate. The real difference lies in the framework and execution system behind it. Manus is driven by a multimodal large model and innovatively adopts a multi-signature system.
In short, the AI imitates the PDCA cycle (plan-do-check-act) that humans follow when executing work, and the cycle is completed by multiple large models working together. Each model focuses on one specific link, which reduces the decision-making risk of a single model executing the whole task and improves execution efficiency. The so-called "multi-signature system" is in effect a decision verification mechanism for multi-model collaboration: it secures the reliability of decisions and execution by requiring joint confirmation from multiple specialized models.
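The joint-confirmation idea can be illustrated with a small sketch. This is not how Manus is implemented; it only shows the general shape of a threshold check in which an action goes ahead when enough independent reviewers sign off. The names `Verdict`, `multi_sign`, `planner`, `risk_checker`, and `compliance` are hypothetical, and the reviewers are trivial rule-based stand-ins for what would be separate specialized models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    reviewer: str
    approved: bool


# A reviewer is any callable that inspects a proposed action and returns a Verdict.
Reviewer = Callable[[str], Verdict]


def multi_sign(action: str, reviewers: List[Reviewer], threshold: int) -> bool:
    """Approve the action only if at least `threshold` reviewers sign off."""
    verdicts = [review(action) for review in reviewers]
    approvals = sum(1 for v in verdicts if v.approved)
    return approvals >= threshold


# Hypothetical stand-ins for specialized models (assumptions for illustration).
def planner(action: str) -> Verdict:
    return Verdict("planner", "swap" in action)


def risk_checker(action: str) -> Verdict:
    return Verdict("risk", "all funds" not in action)


def compliance(action: str) -> Verdict:
    return Verdict("compliance", True)


reviewers = [planner, risk_checker, compliance]
safe = multi_sign("swap 1 ETH for USDC", reviewers, threshold=3)
risky = multi_sign("swap all funds for MEME", reviewers, threshold=3)
```

With a threshold of 3, the first action is approved and the second is rejected because the risk reviewer withholds its signature. Lowering the threshold trades safety for throughput, which is the same trade-off a multi-signature wallet makes.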
3) In this comparison the advantages of Manus stand out clearly, and the series of operations shown in the video demo is genuinely impressive. But objectively speaking, Manus's iterative improvement on Operator is only a beginning and has not yet reached the level of a disruptive revolution.
The key lies in how complex the tasks are, and in how the large model's fault tolerance and success rate are defined once users enter non-standard prompts. Otherwise, could web3's DeFai scenarios be put into mature use right away on the strength of these innovations? Obviously not yet:
For example, in a DeFai scenario, for an agent to execute a trading decision, an Oracle-layer agent is needed to collect and verify on-chain data, integrate and analyze it, and monitor on-chain prices in real time to capture trading opportunities. This places heavy demands on real-time analysis. A trading opportunity that was valid a second ago may no longer exist by the time the Oracle model has passed it to the trade-execution agent (the arbitrage window has closed).
This exposes the biggest weakness of this type of multimodal large model when it makes execution decisions: how to connect to the Internet, access the chain, retrieve and analyze real-time data, identify trading opportunities in it, and then capture the trade. The ordinary Internet environment is comparatively forgiving. Order prices on many e-commerce websites do not change in real time, so they are unlikely to seriously upset the dynamic balance of the whole multimodal collaboration. On-chain, such challenges are present almost all the time.
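The arbitrage-window problem can be made concrete with a freshness check at the hand-off between agents. The sketch below is an assumption-laden illustration, not a trading system: `Quote`, `is_fresh`, `execute_if_fresh`, and the 0.5-second default are all hypothetical, and a real execution agent would also re-check the price on-chain before submitting.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    pair: str
    price: float
    observed_at: float  # seconds, on the same clock as `now`


def is_fresh(quote: Quote, now: float, max_age_s: float) -> bool:
    """A quote is usable only if it is no older than `max_age_s` seconds."""
    age = now - quote.observed_at
    return 0 <= age <= max_age_s


def execute_if_fresh(quote: Quote, now: float, max_age_s: float = 0.5) -> str:
    """Refuse to act on data that aged out while moving between agents."""
    if not is_fresh(quote, now, max_age_s):
        return "skipped: quote is stale"
    return f"submitted: {quote.pair} @ {quote.price}"


quote = Quote(pair="ETH/USDC", price=3000.0, observed_at=100.0)
on_time = execute_if_fresh(quote, now=100.2)  # handed over after 0.2 s
too_late = execute_if_fresh(quote, now=101.5)  # handed over after 1.5 s
```

The same quote is acted on when it arrives after 0.2 seconds and dropped when it arrives after 1.5 seconds. Every extra model in the chain adds hand-off latency, so the tighter the window, the less room there is for multi-step deliberation.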
4) So the emergence of Manus will indeed set off a wave of anxiety in the web2 world. After all, many highly repetitive clerical and information-processing jobs may face the risk of being replaced by AI. But let them be anxious.
We need to look objectively at what Manus does to advance web3's DeFai application scenarios:
We have to admit that it matters a great deal. After all, the "LLM OS" and "Less Structure, More Intelligence" concepts it put forward, and especially the multi-signature system, will give web3 a lot of inspiration for expanding the combination of DeFi and AI.
It also corrects a major misconception in most DeFai projects: do not rely on a single large model to achieve complex goals such as autonomous thinking and decision-making by AI Agents. That is simply not practical in financial scenarios.
Realizing the true DeFai vision requires solving a set of hard problems: the capability ceiling of any single AI model, atomicity guarantees for multimodal interactive collaboration, unified resource scheduling and control across multimodal systems, and system fault tolerance and fault-handling mechanisms.
For example: the Oracle-layer agent is responsible for collecting and analyzing on-chain data and monitoring prices, so that it forms a reliable data source;
The decision-layer agent analyzes the data fed by the Oracle layer, assesses risk, and formulates a set of decisions and action plans;
The execution-layer agent carries out the plans given by the decision layer while taking actual conditions into account, including gas fee optimization, cross-chain state, and transaction ordering conflicts.
Only when every agent in this chain is strong and a large overall system framework has been built will a true DeFai revolution begin.
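The three-layer split described above can be sketched as a simple pipeline. Everything here is hypothetical and heavily simplified: the function names, the two-venue price comparison, and the flat `gas_cost` figure are assumptions chosen only to show how each layer narrows the decision before anything is executed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MarketData:
    pair: str
    price_a: float  # price on hypothetical venue A
    price_b: float  # price on hypothetical venue B


@dataclass
class Plan:
    pair: str
    buy_at: float
    sell_at: float
    expected_profit: float


def oracle_agent(raw: dict) -> MarketData:
    """Oracle layer: validate raw inputs and turn them into a clean data source."""
    if raw["price_a"] <= 0 or raw["price_b"] <= 0:
        raise ValueError("prices must be positive")
    return MarketData(raw["pair"], raw["price_a"], raw["price_b"])


def decision_agent(data: MarketData, min_spread: float) -> Optional[Plan]:
    """Decision layer: act only when the spread justifies the risk."""
    low, high = sorted((data.price_a, data.price_b))
    spread = high - low
    if spread < min_spread:
        return None
    return Plan(data.pair, buy_at=low, sell_at=high, expected_profit=spread)


def execution_agent(plan: Plan, gas_cost: float) -> str:
    """Execution layer: account for real conditions such as gas before acting."""
    if plan.expected_profit <= gas_cost:
        return "aborted: gas cost exceeds expected profit"
    return f"executed: buy at {plan.buy_at}, sell at {plan.sell_at}"


def run_pipeline(raw: dict, min_spread: float, gas_cost: float) -> str:
    data = oracle_agent(raw)
    plan = decision_agent(data, min_spread)
    if plan is None:
        return "no action: spread too small"
    return execution_agent(plan, gas_cost)


raw = {"pair": "ETH/USDC", "price_a": 3000.0, "price_b": 3010.0}
result = run_pipeline(raw, min_spread=5.0, gas_cost=2.0)
```

Each layer can veto the trade on its own terms, so a failure or a weak judgment in one layer does not automatically become an on-chain transaction. The sketch leaves out the atomicity and scheduling problems mentioned earlier, which are exactly the parts that remain hard.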