As known to all, EOSIO has BFT-DPoS consensus algorithm and each block is produced at 500-millisecond interval. Such features are undoubtedly the core technical breakthrough for the blockchain system to support millions of users. This article will analyze the BFT-DPoS consensus algorithm in details and explain why it can achieve 500-millisecond block-production interval that is far beyond the performance of other blockchain systems.
I. BFT Protocol
BFT protocol refers to Byzantine fault tolerant protocol, which comes from “Byzantine Generals' Problem”, a problem put forward by Leslie Lamport in his paper “The Byzantine Generals Problem” published in 1982. Byzantium is the capital of ancient Eastern Roman Empire. Because of its vast territory, several generals guarding the border (just like multiple nodes in the system) need to send messages through messengers to reach certain unanimous decisions. However, possible traitors in the generals (just like errors in the system) will endeavor to send different messages to different generals in an attempt to interfere the achievement of consensus. Therefore, the Byzantine problem actually is: under this condition, how to make loyal generals act in concert.
This problem has made a direct impact on the development of blockchain. Essentially, blockchain is how a group of nodes that do not have mutual trust (some nodes will discard information, and some nodes will tamper information) reach effective consensus. Lamport's study on the problem states that "For Byzantine problem, if the total number of nodes is N and the number of traitors is F, the problem will only be solved when N is not less than 3F+1." That is to say, in a group of nodes when the number of malicious nodes is less than one-third of the total, such group of nodes could reach a consensus on a certain state through a certain protocol, and this is the BFT protocol.
Simply to say, BFT protocol is: first, select a node as main one from a group of nodes by rotation or random algorithm, and the selected node has the right to produce blocks. After packaging the transactions of a period into a block, the main node should sign the block with its private key and broadcasts the block to all nodes. Other nodes will verify and confirm the block. If a node confirms the block, it will sign and send the block to the main node. When the main node receives the block signed by at least two-thirds of all nodes, the verification for the block performed by all nodes should be regarded as completed and the block will become an irreversible one concatenated into the blockchain.
II. DPoS Algorithm
As to bitcoin consensus algorithm, PoW resource is consumed largely and the computing power gets more and more concentrated, so in 2014, the chief developer of Bitshares -- Daniel Larimer came up with the quick, safe, and less energy consumed Delegated Proof-of-Stake (DPoS) consensus mechanism. While minimizing the network cost, DPoS gives each shareholder certain right to vote for "Block Producers", and the Block Producers with the most votes will produce blocks equally in rotation.
There are two reasons that Daniel Larimer chose 101 Block Producers for Bitshare but chose 21 Block Producers for EOS. The first reason is that it’s difficult for users to fully know a large amount of Block Producers, so too many Block Producers will reduce users’ activeness to vote. The second reason is that the number of Block Producers being set as 21 could achieve more efficient consensus on the Byzantine issue with lower resource cost.
The main responsibilities of Block Producers include: to provide relevant computing resource and network resource to ensure normal operation of nodes; when having the right to produce blocks, the Block Producer should collect all transactions in the period, package the transactions into a block after verification, and broadcast the block to other Block Producers. Other Block Producers should verify and add the block into their own databases. In EOS technical white paper, it is described that the time interval of block production is defined as 3 seconds. In current network environment, when a Block Producer packages and broadcasts a block, it will take no more than 3 seconds for most other Block Producers to receive the block. Only after receiving a block broadcasted by last Block Producer, will the current Block Producer produce a new block, thus no block will be neglected. It requires the confirmation from more than two-thirds of the Block Producers to make a block irreversible. In DPoS, only when a Block Producer generates a new block does it mean the Block Producer confirms the last blockchain received. After a block is generated, only when it is followed by14 blocks, does it indicate that the block is irreversible, and the transaction in the block is irreversible as well. The entire confirmation process will take 45 seconds.
The DPoS consensus algorithm also has strong anti-forking capability, because the rate to add a block into a blockchain fork relies upon the proportion of the Block Producers that have the same consensus, in other words, the fork with more Block Producers will grow faster than the one with fewer Block Producers. At any time when seeing a longer effective chain, an honest Block Producer will switch from the current fork to the chain, and because the number of Block Producers is odd, there must be a longer chain at any time. When a Block Producer tries to produce blocks on two forks at the same time, EOS holders will vote the Producer out, and EOS community will punish the malicious Block Producers. Therefore, in general, it is difficult for EOS with DPoS to have any fork.
III. BFT-DPoS Consensus Mechanism
In the earliest EOS technical white paper, EOS mainly used the above DPoS mechanism to generate a block every 3 seconds, but now EOS adopts the BFT-DPoS consensus mechanism and achieves 500-millisecond interval for block production.
The process of the mechanism is: EOS holders vote to elect 21 Block Producers. Then the 21 Block Producers will negotiate to determine a sequence to have block producing right based on each one’s network resource When a Block Producer has block producing right, it could produce twelve new blocks continuously at a time interval of 500 milliseconds (500 milliseconds is the time interval of block production in the current network state that EOS team obtain through a large number of experimental tests.), then the next Block Producer will produce the next twelve blocks successively.
This method ensures that a Block Producer can generate blocks continuously at a time interval of 500 milliseconds, because the process that a Block Producer generates new blocks will not be affected by network condition. However, network latency will make other Block Producers difficult to confirm that the produced block is irreversible.
Therefore, EOS introduces the BFT protocol. After Block Producer A generates the first new block, A will sign the block and broadcast it to other Block Producers. Other Block Producers will verify, sign and return the block to Block Producer A. When Block Producer A receives the block signed by 14 different Block Producers, the block will become irreversible and be concatenated to the blockchain (the process of generating a new block in 500 milliseconds and the process of reaching BFT protocol consensus on a block are performed at the same time by Block Producers, which means the confirmation process does not affect the Block Producer generating a new block). Through a large number of experimental tests, EOS team found out that under the current network condition, the process that a Block Producer broadcasts and confirms a new block can be completed in 1 second. Therefore, it takes up to 1.5 seconds for each new block to become irreversible, which greatly reduces the latency of cross-blockchain communication. When a blockchain introduces the transaction state of another blockchain, the blockchain to be introduced must be irreversible. Two EOS blockchains can communicate back and forth within 3 seconds, while it takes Ethereum 9 minutes to perform similar communication and it takes Bitcoin more than 3 hours to do so.
Even though the process above could ensure that a Block Producer produces new blocks at the time interval of 500ms, the last several new blocks generated by the last Block Producer may be ignored by the current Block Producer due to network latency in the process of switching from one Block Producer to another. In order to solve this problem, EOS chooses to make Block Producers to produce blocks in certain sequence, e.g. New York (east coast of the USA), Chicago (middle area of the USA), Los Angeles (west coast of the USA), Tokyo (Japan), Shanghai (China). In this order, the last block generated by the current Block Producer will be broadcast to the next Block Producer with the minimum latency, so that the next Block Producer will not ignore the block generated by the current Block Producer. As to a Block Producer with randomly defined right of block production, under current network condition, the interval of block production should be only controlled to be not more than 3 seconds to ensure that the next Block Producer won’t ignore the blocks generated by the current Block Producer.
By the BFT-DPoS protocol above, the interval of EOS block production could be reduced from 3 seconds to 500 milliseconds, the latency of cross-chain communication could be greatly shortened, and the number of transactions that can be confirmed in unit time could be greatly increased. Such mechanism is undoubtedly a significantly factor for blockchain to support millions of users.