In-depth interpretation of the Substrate network layer: libp2p helps decentralized peer-to-peer efficient communication

Blockchain nodes need efficient communication to interact smoothly. libp2p is a modular framework for peer-to-peer communication that makes message transmission in decentralized networks more flexible and secure. Substrate integrates libp2p so that developers can implement custom protocols easily, which simplifies communication between blockchain nodes and makes complex network interactions more intuitive and efficient.

This article was sponsored by PaperMoon. The six technical articles of the Substrate advanced course, written by teacher Kaichao, give you an in-depth understanding of the advantages of peer-to-peer communication, the architectural design of libp2p, and how Substrate applies these technologies to build decentralized blockchain networks, so that you can cope with complex network environments when developing custom blockchain applications.

The rise and significance of the peer-to-peer communication model

In the Web 2.0 era, most Internet applications use a TCP/IP-based "client-server" communication architecture: the client collects data and sends it to the server, which stores and processes it, and the client then obtains and uses the processed data. This model has supported the rapid development of the Internet for nearly three decades. While convenient for ordinary users, it has also caused various problems, such as:

  • Leaks of user privacy;

  • Sale of user data;

  • Unscrupulous advertisements published by service providers, causing irreparable losses to users who are unaware of the truth;

  • Arbitrary deletion of user-posted content without consultation;

  • Market monopolies and price fixing;

  • Exploitation of user psychology to occupy users' time without restraint.

The client-server communication model is as follows:

[Figure: the client-server communication model]

While looking for solutions to these problems, technology pioneers turned their attention to peer-to-peer communication. In the early days of the Internet, peer-to-peer communication was mainly used for file sharing, as in the music sharing service Napster and the file distribution protocol BitTorrent. Wider adoption of peer-to-peer services also requires governance mechanisms to deal with copyright issues and real-world regulation; these are outside the scope of this article.

The peer-to-peer communication model is as follows:

[Figure: the peer-to-peer communication model]

In a peer-to-peer network, all nodes are equal: any node can store and process data (acting as a server), and can also send data to other nodes for processing and obtain data processed by the network (acting as a client). Such a communication mechanism provides the following properties:

  • The network is open, and nodes can join and leave freely;

  • No single service node is relied upon, so the network service is more reliable and efficient;

  • The program code run by the nodes is publicly visible, so the rules are more transparent.

Depending on the data transmitted and the services provided, peer-to-peer applications cover different scenarios, including file storage and retrieval, data computing, content sharing, and data trading. Developing these applications may involve the following technical points:

  • Node identity: uniquely identifying nodes in the network, together with the address format;

  • Discovery: how to discover new nodes without a centralized coordination service;

  • Routing: a local node cannot store information about every node in the network, so the required nodes are found through a routing algorithm;

  • Transport protocols such as TCP, UDP, WebSocket, and QUIC;

  • Encryption and authentication, to ensure the reliability and security of messages;

  • NAT traversal, to reach nodes whose internal IP addresses behind NAT are otherwise inaccessible;

  • Multiplexing, to save resources;

  • Message subscription, to obtain updates efficiently without burdening the network;

  • Relay: when two nodes cannot reach each other directly, for example when both are behind NAT, messages must be forwarded through a relay node.

Not every peer-to-peer application needs all of the points listed above; most use only some of them. Even so, the wheel is reinvented a great deal. Some applications fork the relevant code of existing open source applications to avoid repeated development, but this inherits the technical debt of the original application and is hard to customize and extend.

Complex, ever-changing network topologies and a growing range of applications have made peer-to-peer applications extremely difficult to develop, promote, and popularize. It is not surprising that a highly modular framework for peer-to-peer communication has emerged: libp2p, which we introduce next.

libp2p: the birth of a modular peer-to-peer communication framework

libp2p is a framework for developing peer-to-peer applications. It originated in the decentralized file sharing service IPFS, whose network communication code was extracted and redesigned to form the current libp2p. The more mature implementations currently include js-libp2p, go-libp2p, and rust-libp2p, and a set of reference specifications has been defined. As long as implementations in different languages conform to the specifications, they can interoperate.

The core functions provided by libp2p include:

  • Establishing secure, multiplexed network connections between nodes;

  • Verifiable node identities and connectable addresses.

Secure and multiplexed connections

The underlying transports supported by libp2p include TCP/IP, UDP, WebSocket, and QUIC; how complete the support is varies between language implementations. Connections are secured by encrypting the transmitted content, and the identity of the remote node is verified as part of this.

To make better use of connections and to cope with complex network scenarios such as firewalls and NAT, the established underlying connections need to be multiplexed. A stream is an upper-layer logical connection that makes multiplexing possible; it can be bidirectional or unidirectional.
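The idea of multiplexing can be illustrated with a minimal sketch in plain Rust. It does not reproduce the yamux or mplex wire formats; the `Frame` type and the `demultiplex` function are hypothetical names introduced here. Frames belonging to several logical streams arrive interleaved on one connection and are regrouped by stream id:

```rust
use std::collections::BTreeMap;

/// A frame on the shared connection: a stream id plus payload bytes.
/// Hypothetical type, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Frame {
    pub stream_id: u32,
    pub payload: Vec<u8>,
}

/// Regroups interleaved frames into one byte buffer per logical stream,
/// preserving arrival order within each stream.
pub fn demultiplex(frames: &[Frame]) -> BTreeMap<u32, Vec<u8>> {
    let mut streams: BTreeMap<u32, Vec<u8>> = BTreeMap::new();
    for frame in frames {
        streams
            .entry(frame.stream_id)
            .or_default()
            .extend_from_slice(&frame.payload);
    }
    streams
}
```

Passing four frames that alternate between stream 1 and stream 2 yields two independent buffers, which is how a single underlying connection can carry several conversations at once.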

The QUIC protocol has security and multiplexing built in. For protocols without these features, libp2p can upgrade the original connection by adding the required security and multiplexing suites. The security suites include secio and Noise, and the multiplexing suites include yamux and mplex.

The connection upgrade process is as follows:

[Figure: the connection upgrade process]

A stream can carry a variety of application layer protocols, either built into libp2p or user-defined. These protocols define how nodes exchange information and what they exchange, for example:

  • ping, used to periodically check whether a node is online;

  • identify, used to exchange information between nodes, such as a node's public key and its addresses in the network;

  • kad-dht, a distributed hash table based on the Kademlia algorithm, used for routing between nodes.

Taking the identify protocol as an example, its protocol id (a path-like string) is /ipfs/id/1.0.0, and its messages are represented and serialized with Protocol Buffers.
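Before a protocol runs on a stream, both sides have to agree on its protocol id. The sketch below shows only the selection logic, under the assumption that each side already knows the other's supported ids; it does not reproduce the negotiation messages libp2p actually exchanges. The `negotiate` helper and the `/echo/...` ids are hypothetical.

```rust
/// Returns the first protocol id in the dialer's preference order that the
/// listener also supports, or `None` if the two sides share no protocol.
pub fn negotiate<'a>(
    dialer_prefs: &[&'a str],
    listener_supported: &[&str],
) -> Option<&'a str> {
    dialer_prefs
        .iter()
        .copied()
        .find(|proposed| listener_supported.iter().any(|s| *s == *proposed))
}
```

For example, a dialer that prefers `/echo/2.0.0` but also offers `/echo/1.0.0` ends up with `/echo/1.0.0` when the listener supports only the older version.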

Node identity

When a node starts, it needs a private key (which can be randomly generated). The key pair is mainly used to:

  • Perform a Diffie-Hellman key exchange, in which the two nodes exchange public keys to derive a key for encrypting and decrypting messages;

  • Hash the node's public key to generate the PeerId, the node's identity.

The public key algorithms supported by libp2p include RSA, Ed25519, and Secp256k1. The PeerId takes the form of a multihash, a format that supports multiple hash algorithms. After base58 encoding it looks like QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N.
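To make the encoding step concrete, here is a minimal sketch using only the Rust standard library. It assumes a 32-byte SHA-256 digest has already been computed elsewhere (the standard library has no SHA-256), and it leaves out how libp2p serializes the public key before hashing. The function names are hypothetical.

```rust
const ALPHABET: &[u8; 58] = b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Encodes bytes as base58 using the Bitcoin alphabet.
pub fn base58_encode(input: &[u8]) -> String {
    // Each leading zero byte is represented by a leading '1'.
    let zeros = input.iter().take_while(|&&b| b == 0).count();

    // Base-58 digits of the remaining number, least significant first.
    let mut digits: Vec<u8> = Vec::new();
    for &byte in &input[zeros..] {
        let mut carry = byte as u32;
        for digit in digits.iter_mut() {
            carry += (*digit as u32) << 8;
            *digit = (carry % 58) as u8;
            carry /= 58;
        }
        while carry > 0 {
            digits.push((carry % 58) as u8);
            carry /= 58;
        }
    }

    let mut out = String::with_capacity(zeros + digits.len());
    out.extend(std::iter::repeat('1').take(zeros));
    out.extend(digits.iter().rev().map(|&d| ALPHABET[d as usize] as char));
    out
}

/// Wraps a 32-byte SHA-256 digest as a multihash:
/// hash function code 0x12, digest length 0x20, then the digest itself.
pub fn sha256_multihash(digest: &[u8; 32]) -> Vec<u8> {
    let mut bytes = vec![0x12, 0x20];
    bytes.extend_from_slice(digest);
    bytes
}
```

Because every SHA-256 multihash starts with the bytes 0x12 0x20, its base58 form is 46 characters long and begins with "Qm", which matches the shape of the PeerId shown above.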

Combining a PeerId with a multiaddr (multi-level address) makes it possible to locate a node in the network and verify its identity. For example, for a node with IP address 7.7.7.7 that listens on port 4242 (assuming TCP as the transport) and has the PeerId above, the multiaddr is:

/ip4/7.7.7.7/tcp/4242/p2p/QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N
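As a rough illustration of how such an address is structured, the sketch below parses one specific textual shape, `/ip4/<address>/tcp/<port>/p2p/<peer id>`. That shape is an assumption made for this example; real multiaddrs support many more protocols and also have a binary form. `PeerAddress` and `parse_peer_address` are hypothetical names, not part of rust-libp2p.

```rust
use std::net::Ipv4Addr;

/// The parts of a `/ip4/<address>/tcp/<port>/p2p/<peer id>` address.
#[derive(Debug, PartialEq)]
pub struct PeerAddress {
    pub ip: Ipv4Addr,
    pub port: u16,
    pub peer_id: String,
}

/// Parses the textual form of this one address shape and returns `None`
/// for anything else, including an invalid IP address or port.
pub fn parse_peer_address(addr: &str) -> Option<PeerAddress> {
    let parts: Vec<&str> = addr.split('/').collect();
    match parts.as_slice() {
        ["", "ip4", ip, "tcp", port, "p2p", peer_id] if !peer_id.is_empty() => {
            Some(PeerAddress {
                ip: ip.parse().ok()?,
                port: port.parse().ok()?,
                peer_id: peer_id.to_string(),
            })
        }
        _ => None,
    }
}
```

Each `/name/value` pair adds one layer (network address, transport, peer identity), which is what makes the format composable.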

The above covers only some of the functions provided by libp2p. For others, such as message subscription, relay, and NAT traversal, please refer to the relevant documentation. Developing peer-to-peer applications with libp2p addresses most of the problems and technical points mentioned earlier, saves a lot of development time, and improves the maintainability and scalability of the system. Next, let's see how to use rust-libp2p to implement a simple custom application protocol.

Simple Application

Here we write a simple peer-to-peer application based on rust-libp2p that implements an echo function: one node sends a string, and the other node receives it and replies with the same characters. To do this we define a custom application layer protocol, EchoProtocol, and implement the UpgradeInfo trait provided by libp2p for it.

[Code listing: EchoProtocol and its UpgradeInfo implementation]

The protocol_info method returns the name and format of the protocol. Next, implement InboundUpgrade and OutboundUpgrade, both of which have UpgradeInfo as a supertrait.

[Code listing: InboundUpgrade and OutboundUpgrade implementations]

NegotiatedSubstream is the I/O stream that a negotiated protocol will use. When the remote node supports the current protocol, upgrade_inbound and upgrade_outbound are called to start the handshake on the listener side and the dialer side respectively.

After that, define the handler that processes connection requests. Here it is the structure EchoHandler, which stores the state used during processing.

[Code listing: the EchoHandler structure]

A custom event enumeration type is also required.

[Code listing: the event enumeration]

Then you can implement the ProtocolsHandler trait provided by libp2p::swarm.

[Code listing: the ProtocolsHandler implementation for EchoHandler]

When the node is a dialer, the handler must return ProtocolsHandlerEvent::OutboundSubstreamRequest containing an EchoProtocol instance when polled (ProtocolsHandler::poll()), which initiates negotiation of the protocol used on the connection. If the negotiation succeeds, ProtocolsHandler::inject_fully_negotiated_outbound is called, where we update the outbound state saved by the handler from None to Some(send_echo(stream).boxed()). Here send_echo receives the negotiated I/O stream and returns the stream if no error occurs.

[Code listing: send_echo]

Let's look at the implementation of ProtocolsHandler::poll. When outbound is Some and polling the future returned by send_echo yields Poll::Pending, the future is stored back with self.outbound = Some(send_echo_future) so that it is still available for the next poll. When the result is Poll::Ready, the corresponding event is returned.
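The "take, poll, store back if pending" pattern can be sketched with the standard library alone. This is not the rust-libp2p handler: `Handler`, `ReadyOnSecondPoll`, and `poll_once` are simplified stand-ins introduced here, and the future resolves to a `String` instead of an I/O stream.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A boxed future standing in for the one returned by `send_echo`.
type EchoFuture = Pin<Box<dyn Future<Output = String>>>;

/// Simplified handler state: at most one in-flight outbound echo.
pub struct Handler {
    pub outbound: Option<EchoFuture>,
}

impl Handler {
    /// Polls the in-flight future, if any. A pending future is stored back so
    /// the next poll can resume it; a finished one yields its result.
    pub fn poll(&mut self, cx: &mut Context<'_>) -> Poll<String> {
        if let Some(mut fut) = self.outbound.take() {
            match fut.as_mut().poll(cx) {
                Poll::Pending => self.outbound = Some(fut),
                Poll::Ready(echoed) => return Poll::Ready(echoed),
            }
        }
        Poll::Pending
    }
}

/// A future that is pending on its first poll and ready on the second.
pub struct ReadyOnSecondPoll {
    pub polled_once: bool,
}

impl Future for ReadyOnSecondPoll {
    type Output = String;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<String> {
        if self.polled_once {
            Poll::Ready("hello".to_string())
        } else {
            self.polled_once = true;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

/// A waker that does nothing, enough to drive polls by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls the handler once with a no-op waker; `None` means "still pending".
pub fn poll_once(handler: &mut Handler) -> Option<String> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    match handler.poll(&mut cx) {
        Poll::Ready(event) => Some(event),
        Poll::Pending => None,
    }
}
```

Taking the future out of the `Option` before polling avoids borrowing `self.outbound` across the match, and forgetting to store it back on `Poll::Pending` would silently drop the unfinished work.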

When the node is a listener and a new request stream appears on the connection, ProtocolsHandler::listen_protocol is called automatically and returns an instance of InboundUpgrade to negotiate the protocol used by the stream. After the negotiation succeeds, inject_fully_negotiated_inbound is called, and one of its parameters is the negotiated stream. In this method, the handler's inbound state is updated to Some(recv_echo(stream).boxed()). The implementation of recv_echo is as follows:

[Code listing: recv_echo]

Here the generic type S must satisfy the AsyncRead and AsyncWrite constraints provided by futures_io.
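The echo logic itself is small. The sketch below is a synchronous analogue built on std::io::Read and std::io::Write instead of the asynchronous AsyncRead and AsyncWrite traits, so it is not the article's recv_echo; the `max_len` parameter and the `MemoryStream` test double are assumptions added for illustration.

```rust
use std::io::{self, Read, Write};

/// Reads one message of up to `max_len` bytes from the stream and writes the
/// same bytes back. Synchronous stand-in for an asynchronous `recv_echo`.
pub fn recv_echo<S: Read + Write>(stream: &mut S, max_len: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; max_len];
    let n = stream.read(&mut buf)?;
    buf.truncate(n);
    stream.write_all(&buf)?;
    stream.flush()?;
    Ok(buf)
}

/// In-memory duplex stream for trying the function without a network:
/// reads come from `incoming`, writes are collected in `outgoing`.
pub struct MemoryStream {
    pub incoming: io::Cursor<Vec<u8>>,
    pub outgoing: Vec<u8>,
}

impl MemoryStream {
    pub fn new(incoming: &[u8]) -> Self {
        MemoryStream {
            incoming: io::Cursor::new(incoming.to_vec()),
            outgoing: Vec::new(),
        }
    }
}

impl Read for MemoryStream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.incoming.read(buf)
    }
}

impl Write for MemoryStream {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.outgoing.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

Keeping the function generic over the stream type is the same design choice as the generic S in the text: the echo logic does not care whether the bytes travel over TCP, a multiplexed substream, or an in-memory buffer.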

A peer-to-peer network is like a swarm: the overall behavior of the swarm emerges from the behavior of its individuals, and each individual's behavior is governed by a set of rules that can be combined. In rust-libp2p, such rules are defined by implementing the NetworkBehaviour trait. We first define a structure to hold the state of the rules.

[Code listing: the behaviour structure]

This structure contains the message events for communicating with the Swarm and the initial configuration the behaviour needs. Next, implement the NetworkBehaviour trait.

[Code listing: the NetworkBehaviour implementation]

When a connection is established or a node is dialed, new_handler is called and returns the handler we defined earlier, EchoHandler, as the background processor for that connection. The behaviour and the handler communicate by message passing: inject_event delivers the handler's messages to the behaviour, and when polled, the behaviour returns SendEvent to pass a message to the handler. With that, we have completed a simple peer-to-peer echo protocol. Now let's see how to use it in the main function.

[Code listing: the main function]

A brief description of the code:

  • Keypair::generate_ed25519 generates the key used to encrypt communication between nodes; its public key is used to derive the node's PeerId.

  • libp2p::build_development_transport builds a transport layer commonly used for development. It supports TCP/IP and WebSocket, uses the Noise protocol as the encryption layer, and uses the yamux and mplex multiplexing protocols.

  • The command-line arguments are parsed. If they contain the address of a node to dial, this node is the dialer (client), and the initial parameter init_echo of the constructed behaviour is set to true.

  • Using the transport layer, behaviour, and node id constructed above, Swarm::new(transport, behaviour, peer_id) constructs the swarm that models the network.

  • When the node is a dialer, it calls Swarm::dial_addr(&mut swarm, remote)? with the remote node's address, adding that node to the swarm's node pool.

  • The swarm is polled with swarm.poll_next_unpin(cx), and any message triggered by the behaviour is processed.

In summary, libp2p provides a high level of abstraction over peer-to-peer communication. When you first encounter these concepts they are easy to confuse, and it takes time to become familiar with the layers and the common protocols. rust-libp2p wraps the layers and protocols defined by libp2p in different interfaces, so developing a custom protocol requires a solid understanding of these abstract interfaces and of how they communicate with each other. In general, peer-to-peer communication is much harder to develop than traditional client-server communication. libp2p is designed to ease some of these pain points, but there is still a long way to go, and application developers need to understand the underlying mechanisms to develop application protocols well. At present, applications using libp2p include IPFS, Substrate/Polkadot, Libra, and Ethereum 2.0. Next, let's learn how Substrate uses libp2p.

Substrate network layer architecture, multi-protocol support and node discovery mechanism implementation

A blockchain network is composed of decentralized (peer-to-peer) nodes that exchange messages over network connections. Substrate is a general blockchain development framework. Its network layer uses rust-libp2p, which makes it easy to use and extend a series of communication protocols, such as:

  • The transport layer supports TCP/IP (address format /ip4/1.2.3.4/tcp/5), WebSocket (address format /ip4/1.2.3.4/tcp/5/ws), DNS (address format /dns/example.com/tcp/5 or /dns/example.com/tcp/5/ws), and the corresponding IPv6 formats;

  • The Noise encryption protocol is applied on top of the transport layer;

  • The multiplexing protocols Yamux and Mplex are supported; Mplex will be phased out gradually;

  • The libp2p standard ping protocol (/ipfs/ping/1.0.0) periodically checks whether the network connection between nodes is still alive. If the check fails, the connection is closed;

  • The libp2p standard identify protocol (/ipfs/id/1.0.0), through which nodes periodically exchange information about themselves;

  • The libp2p standard Kademlia protocol (/<protocol_id>/kad) performs Kademlia random walk queries, where protocol_id distinguishes different chains and is defined in the Substrate chain spec;

  • A custom sync protocol (/<protocol-id>/sync/2) synchronizes block information. The data format of requests and responses is defined in the api.v1.proto file;

  • A custom light protocol (/<protocol-id>/light/2) is used by light clients to synchronize on-chain state. The data format is defined in the light.v1.proto file;

  • A custom transactions protocol (/<protocol-id>/transactions/1) broadcasts the transactions a node has received. The format is the SCALE encoding of the transaction set;

  • A custom block announcement protocol (/<protocol-id>/block-announces/1). When a node produces or receives a block, it broadcasts the block to other nodes;

  • A custom gossip protocol (/paritytech/grandpa/1), which GRANDPA uses to notify other nodes of voting information;

  • The custom Substrate legacy protocol (/substrate/<protocol-id>/<version>), which will be deprecated soon. It can also synchronize and broadcast block information, handle light client requests, and carry gossip (used by GRANDPA).
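Since several of the protocol names above embed the chain's protocol id, two chains never speak each other's protocols by accident. A minimal sketch of building those names follows; `chain_protocol_names` is a hypothetical helper, not a Substrate API, and the id "dot" in the note below is only an example value.

```rust
/// Builds the per-chain protocol names that follow the patterns listed above.
pub fn chain_protocol_names(protocol_id: &str) -> Vec<String> {
    vec![
        format!("/{}/kad", protocol_id),
        format!("/{}/sync/2", protocol_id),
        format!("/{}/light/2", protocol_id),
        format!("/{}/transactions/1", protocol_id),
        format!("/{}/block-announces/1", protocol_id),
    ]
}
```

Calling `chain_protocol_names("dot")` produces names such as `/dot/kad` and `/dot/sync/2`, and a node using a different protocol id gets a disjoint set of names.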

Building on the transport and application layer protocols above, Substrate nodes can establish connections through three discovery mechanisms:

  • Bootstrap nodes, whose addresses and PeerIds are fixed. They are suited to cold-starting the network: a node that has just joined enters the network through a bootstrap node;

  • mDNS: a node broadcasts UDP packets on the local network, and if another node responds, a connection can be established;

  • Kademlia random walk: once a connection is established, the current node can send a FIND_NODE request to the remote node to obtain the remote node's view of the nodes in the network.
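Kademlia decides which nodes are "close" by XOR-ing their keys. The sketch below shows that metric and how a node might pick the closest known peers when answering a FIND_NODE request. It assumes 32-byte keys and is not taken from the rust-libp2p implementation; both function names are hypothetical.

```rust
/// XOR distance between two 32-byte keys. Comparing the resulting arrays
/// lexicographically compares the distances as big-endian integers.
pub fn xor_distance(a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = a[i] ^ b[i];
    }
    out
}

/// Returns up to `count` of the known keys closest to `target`, nearest first.
pub fn closest_peers(known: &[[u8; 32]], target: &[u8; 32], count: usize) -> Vec<[u8; 32]> {
    let mut sorted = known.to_vec();
    sorted.sort_by_key(|key| xor_distance(key, target));
    sorted.truncate(count);
    sorted
}
```

In a random walk, the target is a randomly chosen key, so repeated queries gradually reveal nodes spread across the whole key space.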

Together, the protocols above constitute Substrate's general network layer, which is used through the NetworkWorker and NetworkService structures. The node template uses them as follows:

[Code listing: NetworkWorker and NetworkService usage in the node template]

The decentralized communication model has opened up a new paradigm for Internet applications, but it has also brought considerable challenges. The libp2p specification has gradually eased the pain points developers face when building peer-to-peer applications. By building on libp2p, Substrate lets ordinary developers complete complex custom blockchain applications without paying too much attention to the underlying communication mechanism.
