Protocol Overview
A quick primer for product teams: SKALE uses leaderless, asynchronous BFT consensus with BLS threshold signatures. Blocks finalize instantly once 2/3 of a chain’s validator committee signs, so there are no reorgs or multi-confirmation waits. Randomized rotation and data-availability proofs keep proposals honest while maintaining high throughput and zero-gas UX. [Graphic placeholder: Proposal → DA proof → BFT vote → BLS signature → instant finality] A SKALE chain is a Proof-of-Stake blockchain fully compatible with ETH mainnet. It can run ETH wallets, tools and dapps. As any Ethereum-compatible chain, SKALE chain includes a blockchain, which is a chain of transactions ordered into committed blocks, and a computing machine denoted as EVM. The set of variables stored in EVM is denoted as EVM state. Transaction Processing Flow: Transactions are processed sequentially through the EVM. Processed transactions (Tx1, Tx2, Tx3) flow through EVM processing, with each transaction being processed one at a time to update the EVM state. EVM processes committed blocks one transaction at a time. For each transaction it runs instructions (bytecode) specified by the transaction and changes EVM state.
Architecture overview
The purpose of SKALE chain is to order transactions into blocks and then process them by EVM. SKALE chain is composed of a fixed set of N network nodes that process user transactions in the following phases:- Accept and validate user transactions (submission phase)
- Broadcast transactions to peer nodes (broadcast phase)
- Store transactions into pending queues (pending queue phase)
- Create block proposal for each block number and broadcast it to peers, collecting 2/3 N data availability signatures and creating DA proofs (block proposal phase)
- Broadcast DA proofs to peers (DA broadcast phase)
- Run block consensus for each block proposal to select a winning proposal (block consensus phase)
- Sign the statement on which proposal won (block signature share) and broadcast it to other nodes. Wait until receipt of 2/3 of block signature shares and merge them into block signature (block signature phase)
- Commit the winning proposal if node has it, otherwise download it from other nodes and commit it. The winning proposal becomes a committed block (block finalization phase)
- Process the committed block through Ethereum Virtual Machine to transition to the new EVM state (EVM processing phase)
- Store committed blocks and EVM state (storage phase)
In addition to normal block processing, a node can receive blocks through block catchup mechanism.
SKALE node overview
Each node runs skaled, SKALE software blockchain agent. skaled is composed of:- Network API module accepts transactions and user requests
- Transaction validation module validates transactions on receipt
- Pending queue module holds transactions
- Transaction broadcast module broadcasts valid transactions to other nodes in the chain
- Proposal module creates block proposals for consensus
- Proposal broadcast module broadcasts block proposals to peers and collects DA proofs
- DA proof broadcast module broadcasts DA proofs to peers
- Consensus module selects the winning block proposal and turns it into a committed block, and then creates block signature by assembling signature shares
- Finalization module downloads winning proposals from other nodes, if a node does not have a copy of winning proposal by completion of block consensus
- EVM module processes the committed block
- Block storage module stores committed blocks, deleting old blocks if skaled runs out of block storage space (block rotation)
- State storage module stores EVM state. State information is never deleted automatically. Cleaning up the state is the responsibility of dapps
Security assumptions overview
SKALE is provably secure. This means one can prove two qualities of the blockchain:- Consistency - for any block number, committed blocks and EVM state are identical on each node. Note that due to network delays, some nodes may at a given moment have less committed blocks than others. Therefore, the consistency is eventual.
- Liveliness - the blockchain will always keep producing new committed blocks.
Node security assumptions
We assume that out of N nodes, t nodes at maximum are Byzantine (malicious), where Simply speaking, not more than 1/3 of nodes can be malicious. For instance, if N = 16, the maximum number of malicious nodes is 5. The identity of malicious nodes is not known. A malicious node will typically pretend being an honest node. A malicious node will attempt to break the consistency and liveliness of the network by sending malicious messages, or not sending any messages when it is supposed to send a message by a protocol. It is assumed that malicious nodes do not control network routers and links. This means, in particular, that malicious nodes cannot affect messages sent between honest nodes, such as corrupting or reordering them.Network security assumptions
The algorithms used by SKALE make assumptions about the properties of the underlying network. SKALE assumes that the network is asynchronous and reliable with eventual delivery guarantee. This means that:- Nodes are assumed to be connected by reliable communications links
- Links can be arbitrarily slow, but will eventually deliver messages
Protocol phases overview
Submission phase
During submission phase a user client (browser or mobile app) signs a transaction using user private wallet key and submits it either directly to one of core nodes or to a network proxy. A network proxy is a node that load balances incoming transactions to core nodes attempting to load them evenly, and avoiding transaction submissions to non-responsive nodes.Broadcast phase
During the broadcast phase, a node that received a transaction from user client will broadcast it to other core nodes.Pending queue phase
During the pending queue phase, a transaction received from user client or from transaction broadcast is validated and placed into the pending queue. During the validation, transaction signature and format are verified.The pending queue has fixed memory capacity. If the pending queue is full, adding a new transaction to the queue will cause some transactions to be dropped from the pending queue. Ethereum-compatible blockchains, including SKALE, drop transactions with the smallest gas price.
Block proposal phase
During the block proposal phase each SKALE node will form a block proposal. A block proposal is an ordered list of transactions. If all transactions in pending queue can be placed into proposal without reaching block gas limit, then all transactions will be placed into block proposal. Otherwise, transactions with higher gas price will be selected from the queue to create a block proposal that fits the block gas limit. Once a node created a proposal, it will broadcast compressed proposal to all its nodes. The compressed proposal includes only the transaction hash (fingerprint) of each transaction. The receiving node decompresses transactions by matching transaction hashes to transactions stored in its pending queue. In the event receiving node does not have a matching transaction in its pending queue, it will ask the sending node for the entire transaction. Once the receiving node receives the block proposal, it will sign a Data Availability Signature and pass it to the sending node. Once the sending node collects DA signatures from 2/3 of nodes, it will merge the signatures into a DA proof. The DA proof proves that the proposal has been widely distributed over the network.DA broadcast phase
Once a node obtains a DA proof for its block proposal, it will broadcast DA proof to other nodes.Block consensus phase
Once a node receives DA proofs from 2/3 of nodes, the node will start the block consensus phase. During block consensus phase, the node will vote1 if it received DA proof for a particular proposal, and vote 0 otherwise.
The nodes will then execute asynchronous binary consensus algorithm, also known as Byzantine Generals problem. See Byzantine fault tolerance for more information.
The particular binary consensus algorithm implemented in SKALE is specified in this paper.
Once the binary consensus completed, it guarantees that all honest nodes will reach consensus of 1 or 0. If honest nodes reach 1 it is guaranteed that 1 was initially voted by at least one honest node. That, in turn, guarantees that the block proposal is DA safe, or that it is widely distributed over the network.
If a block consensus phase outputs 1 for several proposals, the proposal with highest priority is selected. The priority changes from one block to another so that on average each node has similar probability to win.
Block signature phase
After block consensus decides on the winning block, each node will sign the statement specifying the winning proposal (block signature share) and broadcast it to other nodes. The node will then wait until receipt of 2/3 of block signature shares and merge the shares into block signature.Block finalization phase
On completion of block signature phase, all honest nodes will have the block signature but some of them may not have the block itself. This can happen due to a malicious proposer, that intentionally does not send its proposals to some of the all nodes in order to break the liveliness property of the blockchain. It can also happen due to proposer crashing, or due to slow network. Fortunately, DA proof requirement solves the problem. It is guaranteed, that block proposal that wins block consensus phase has DA proof, and is, therefore, widely distributed across the network. Therefore, during block finalization phase if a node does not happen to have the winning proposal, it will simply connect to other nodes to download it from them.2/3 of the nodes are guaranteed to have a copy of the proposal after DA proof phase.
EVM processing phase
After block finalization the block is present on the node. It will be then processed through Ethereum Virtual Machine to update EVM state.Storage phase
Committed block will now be stored in persistent storage, and EVM state will be updated in persistent storage. The node will move into block proposal phase for the next block.Achieving eventual delivery by retransmissions
Since real Internet sometimes drops messages on the way without delivering them, the eventual delivery guarantee is achieved in practice by retransmissions. The sending node will make multiple attempts to transfer a message to the receiving node, until the transfer is successful and is confirmed by the receiving node. Each sending node maintains a separate outgoing message queue for each receiving node. To schedule a message for delivery to a particular node, message is placed into the corresponding outgoing message queue. Each outgoing message queue is serviced by a separate program thread. The thread reads messages from the queue and attempts to transfer them to the destination node. If the destination node temporarily does not accept messages, the thread will keep initiating transfer attempts until the message is delivered. The destination node can, therefore, temporarily go offline without causing messages to be lost. Since there is a dedicated message sending thread for each destination node, messages are sent independently. Failure of a particular destination node to accept messages will not affect receipt of messages by other nodes. In the remainder of this document, anywhere where it is specified that a message is sent from node A to B, we mean reliable independent delivery as described above.Consensus state
Each node stores consensus state. For each round of consensus, consensus state includes the set of proposed blocks, as well as the state variables of the protocols used by the consensus round. The state is stored in non-volatile memory and preserved across reboots.Reboots and crashes
During a reboot, a node will temporarily become unavailable. After a reboot, messages destined to the node will be delivered to the node. Therefore, a reboot does not disrupt operation of asynchronous consensus. Since consensus protocol state is not lost during a reboot, a node reboot will be interpreted by its peers as a temporarily slowdown of network links connected to the node. A hard crash is an event where a node loses all or parts of the consensus state. For instance, a node can lose received block proposals or values of protocol variables. A hard crash can happen in case of a software bug or a hardware failure. It also can happen if a node stays offline for a very long time. In this case, the outgoing message queues of nodes sending messages to this node will overflow, and the nodes will start dropping older messages. This will lead to a loss of a protocol state.Default queue lifetime
This specification specifies one hour as a default lifetime of a message which has been placed into an outgoing queue. Messages older than one hour may be dropped from the message queues. A reboot, which took less than an hour is, therefore, guaranteed to be a normal reboot.Limited hard crashes
Hard crashes are permitted by the consensus protocol, as long as not too many nodes crash at the same time. Since a crashed node does not conform to the consensus protocol, it counts as a Byzantine node for the consensus round, in which the state was lost. Therefore, only a limited number of concurrent hard crashes can exist at a given moment in time. The sum of crashed nodes and Byzantine nodes cannot be more than t in the equation above. Then the crash is qualified as a limited hard crash. During a limited hard crash, other nodes continue block generation and consensus. The blockchain continues to grow. When a crashed node is back online, it will sync its blockchain with other nodes using a catchup procedure described in this document, and start participating in consensus.Widespread crashes
A widespread crash is a crash where the sum of crashed nodes and Byzantine nodes is more than t. During a widespread crash a large proportion of nodes or all nodes may lose the state for a particular round and consensus progress may stall. The blockchain, therefore, may lose its liveliness. Security of the blockchain will be preserved, since adding a new block to blockchain requires a supermajority threshold signature of nodes, as described later in this document. The simplest example of a widespread crash is when more than 1/3 of nodes are powered off. In this case, consensus will stall. When the nodes are back online, consensus will start working again. In real life, a widespread crash can happen due to a software bug affecting a large proportion of nodes. As an example, after a software update all nodes in an SKALE Chain may experience the same bug.Failure resolution protocol
In a case of a catastrophic failure a separate failure resolution protocol is used to restart consensus. First, nodes will detect a catastrophic failure by detecting absence of new block commits for a long time. Second, nodes will execute a failure recovery protocol that utilizes Ethereum main chain for coordination. Each node will stop consensus operation. The nodes will then sync their blockchains replicas, and agree on time to restart consensus. Finally, after a period of mandatory silence, nodes will start consensus at an agreed time point in the future.Blockchain architecture
Each node stores a sequence of blocks. Blocks are constructed from transactions submitted by users. The following properties are guaranteed:- Block sequence - each node stores a block sequence B_i that have positive block IDs ranging from 0 to HEAD
- Genesis block - every node has the same genesis block that has zero block id
- Liveliness - the blockchain on each node will continuously grow by appending newly committed blocks. If users do not submit transactions to the blockchain, empty blocks will be periodically committed. Periodic generation of empty blocks serves as a beacon to monitor liveliness of the blockchain
- Fork-free consistency - due to network propagation delays, blockchain lengths on two nodes A and B may be different. For a given block id, if both node A and node B possess a copy of a block, the two copies are guaranteed to be identical
Honest and Byzantine Nodes
An honest node is a node that behaves according to the rules described in this document. A Byzantine node can behave in arbitrary way, including doing nothing at all. The goal of a Byzantine node is to either violate the liveliness property of the protocol by preventing the blockchain from committing new blocks or violate the consistency property of the protocol by making two different nodes commit two different blocks having the same block ID. It is assumed that out of N total nodes, t nodes are Byzantine, where the following condition is satisfied: or The above condition is well known in the consensus theory. There is a proof that shows that secure asynchronous consensus is impossible for larger values of t. It is easy to show that if a security proof works for a certain number of Byzantine nodes, it will work for fewer Byzantine nodes. Indeed, an honest node can always be viewed as a Byzantine node that decided to behave honestly. Therefore, in proofs, we always assume that the system has the maximum allowed number of Byzantine nodes: In this case the number of honest nodes is: Note, that it is beneficial to select N in such a way that (N-1)/3 is divisible by 3. Otherwise an increase in N does not lead to an increase in the maximum allowed number of Byzantine nodes. As an example, for N = 17 we get t = 5, so an increase in N does not improve Byzantine tolerance. In this specification, we assume that N is always selected in such a way that (N-1) is divisible by 3. In this case, expressions simplify as follows:Mathematical properties of node voting
Consensus uses voting rounds. It is, therefore, important to prove some basic mathematical properties of voting. Typically, a node will vote by signing a value and transmitting it to other nodes. To count votes, a receiving node will count received signatures for a particular value v. The number of Byzantine nodes is less than a simple majority of honest nodes. This directly follows from the fact that: and, therefore, a simple majority of honest nodes is: We define supermajority as a vote of at least: nodes. A vote of all honest nodes is a supermajority. Proof: this comes from the fact that: If a particular message was signed by a supermajority vote, at least a simple majority of honest nodes signed this message. Even if all Byzantine nodes participate in a supermajority vote, the number of honest votes it needs to receive is: which is exactly the simple majority of honest nodes s. If honest nodes are required to never sign conflicting messages, two conflicting messages cannot be signed by a supermajority vote. Proof: let A and B be two conflicting messages. Since a particular honest node will sign either A or B, both A and B cannot get simple majority of honest nodes. Since a supermajority vote requires participation of a simple majority of honest nodes, both A and B cannot reach a supermajority, even if Byzantine nodes vote for both. A supermajority vote, is, therefore, an important conflict avoidance mechanism. If a message is signed by a supermajority vote, it is guaranteed that no conflicting messages exist. As an example, if a block is signed by a supermajority vote, it is guaranteed that no other block with the same block ID exists.Threshold signatures
Our protocol uses threshold signatures for supermajority voting. Each node is supposed to be in possession of BLS private key share PKS_I. Initial generation of key shares is performed using joint-Feldman Distributed Key Generation (DKG) algorithm that is described in this document. DKG algorithm is executed when an SKALE Chain is created. Nodes are able to collectively issue supermajority threshold signatures on messages, where the threshold value is equal to the supermajority vote: For instance for N = 16, the threshold value is 11. BLS threshold signatures are implemented as described in the paper by Boldyreva. BLS threshold signatures require a choice of elliptic curve and group pairing. We use elliptic curve (altBN256) and group pairing (optimal-Ate) implemented in Ethereum Constantinople release. To verify the signature, one uses BLS public key PK. This key is computed during the initial DKG algorithm execution. The key is stored in SKALE manager contract on Ethereum mainnet and is available to anyone.Transactions
Each user transaction T is assumed to be an Ethereum-compatible transaction, represented as a sequence of bytes.Block format: header and body
Each block is a byte string, which includes a header followed by a body.Block format: header
Block header is a JSON object that includes the following:- BLOCK_ID - integer id of the current block, starting from 0 and incremented by 1
- BLOCK PROPOSER - integer id of the node that proposed the block
- PREVIOUS BLOCK HASH - SHA-3 hash of the previous block
- CURRENT BLOCK HASH - the hash of the current block
- TRANSACTION COUNT - count of transactions in the current block
- TRANSACTION SIZES - an array of transaction sizes in the current block
- CURRENT BLOCK PROPOSER SIG - ECDSA signature of the proposer of the current block
- CURRENT BLOCK TSIG - BLS supermajority threshold signature of the current block
All integers in this spec are unsigned 64-bit integers unless specified otherwise.
Block format: body
BLOCK BODY is a concatenated transactions array of all transactions in the block.Block format: hash
Block hash is calculated by taking 256-bit Keccak hash of block header concatenated with block body, while omitting CURRENT BLOCK HASH, CURRENT BLOCK SIG, and CURRENT BLOCK TSIG from the header. The reason why these fields are omitted is because they are not known at the time block is hashed and signed.Throughout this spec we use SHA-3 as a secure hash algorithm.
Block verification
A node or a third party can verify the block by verifying a threshold signature on it and also verifying the previous block hash stored in the block. Since the threshold signature is a supermajority threshold signature and since any honest node will only sign a single block at a particular block ID, no two blocks with the same block ID can get a threshold signature. This provides security against forks.Block proposal format
A block starts as a block proposal. A block proposal has the same structure as a block, but has the threshold signature element unset. Nodes concurrently make proposals for a given block ID. A node can only make one block proposal for a given block ID. Once a block proposal is selected to become a block by consensus, it is signed by a supermajority of nodes. A signed proposal is then committed to the end of the chain on each node.Pending transactions queue
Each node will keep a pending transactions queue. The first node that receives a transaction will attempt to propagate it to all other nodes in the queue. A user client software may also directly submit the transaction to all nodes. When a node commits a block to its blockchain, it will remove the matching transactions from the transaction queue.Gas fees
Each transaction requires payment of a gas fee, compatible with ETH gas fee. The gas fee can be paid in native currency of the SKALE chain (sFUEL) or in Proof of Work. The gas price is adjusted after each committed block. It is decreased if the block has been underloaded, meaning that the number of transactions in the block is less than 70 percent of the maximum number of transactions per block, and is increased if the block has been overloaded.Compressed block proposal communication
Typically pending queues of all nodes will have similar sets of messages, with small differences due to network propagation times. When node A needs to send to node B a block proposal P, A does not need to send the actual transactions that compose P. A only needs to send transaction hashes, and then B will reconstruct the proposal from hashes by matching hashes to messages in its pending queue. In particular, for each transaction hash in the block proposal, the receiving node will match the hash to a transaction in its pending queue. Then, for transactions not found in the pending queue, the receiving node will send a request to the sending node. The sending node will then send the bodies of these transactions to the receiving node. After that the receiving node will then reconstruct the block proposal.Consensus data structures and operation
Blockchain
For a particular node, the blockchain consists of a range of committed blocks B_i starting from B_0 and ending with B_TIPID, where TIP_ID is the ID of the largest known committed block. Block ids are sequential positive integers. Blocks are stored in non-volatile storage.Consensus rounds
New blocks are created by running consensus rounds. Each round corresponds to a particular BLOCK_ID. At the beginning of a consensus round, each node makes a block proposal. When a consensus round completes for a particular block, one of block proposals wins and is signed using a supermajority signature, becoming a committed block. Due to a randomized nature of consensus, there is a small probability that consensus will agree on an empty block instead of agreeing on any of the proposed blocks. In this case, an empty block is pre-committed to a blockchain.Catchup agent
There are two ways, in which blockchain on a particular node grows and TIP_ID is incremented: Normal consensus operation: during normal consensus, a node constantly participates in consensus rounds, making block proposals and then committing the block after the consensus round commits. Catchup: a separate catchup agent is continuously running on a node. The catchup engine is continuously making random sync connections to other nodes. During a sync both nodes sync their blockchains and block proposal databases. If during catchup, node A discovers that node B has a larger value of TIP_ID, A will download the missing blocks range from B, and commit it to its chain after verifying supermajority threshold signatures on the received blocks.Both normal and catchup operation append blocks to the blockchain. The catchup procedure is intended to catchup after hard crashes.
Normal consensus operation
Block proposal creation trigger
A node is required to create a block proposal directly after its TIP_ID moves to a new value. TIP_ID will be incremented by 1 once a previous consensus round completes. TIP_ID will also move, if the catchup agent appends blocks to the blockchain.Block proposal creation algorithm
To create a block a node will:- Examine its pending queue
- If the total size of transactions in the pending queue TOTAL_SIZE is less or equal than MAX_BLOCK_SIZE, fill in a block proposal by taking all transactions from the queue
- Otherwise, fill in a block proposal of MAX_BLOCK_SIZE by taking transactions from oldest received to newest received
- Assemble transactions into a block proposal, ordering transactions by sha-3 hash from smallest value to largest value
- In case the pending queue is empty, the node will wait for BEACON_TIME and then, if the queue is still empty, make an empty block proposal containing no transactions
The node does not remove transactions from the pending queue at the time of proposal. The reason for this is that at the proposal time there is no guarantee that the proposal will be accepted.
Block proposal reliable communication algorithm
Once a node creates a block proposal it will communicate it to other nodes using the data availability protocol described below. The data availability protocol guarantees that if the protocol completes successfully, the message is transferred to the supermajority of nodes. The five-step protocol is described below:- Step 1: the sending node A sends the proposal P to all of its peers
- Step 2: each peer on receipt of P adds the proposal to its proposal storage database PD
- Step 3: the peer then sends a receipt back to A that contains a threshold signature share for P
- Step 4: A will wait until it collects signature shares from a supermajority of nodes (including itself). A will then create a supermajority signature S. This signature serves as a receipt that a supermajority of nodes are in possession of P
- Step 5: A will send the supermajority signature to each of the nodes
Pluggable Binary Byzantine Agreement
The consensus described above uses an Asynchronous Binary Byzantine Agreement (ABBA) protocol. We currently use ABBA from Mostéfaoui et al. Any other ABBA protocol P can be used, as long as it has the following properties:- Network model: P assumes asynchronous network messaging model described above
- Byzantine nodes: P assumes less than one third of Byzantine nodes, as described by the equation above
- Initial vote: P assumes that each node makes an initial vote
yes(1)orno(0) - Consensus vote: P terminates with a consensus vote of either
yesorno, where if the consensus vote isyes, it is guaranteed that at least one honest node voted yes
An ABBA protocol typically outputs a random number COMMON_COIN as a byproduct of its operation. We use this COMMON_COIN as a random number source.
Consensus round
A consensus round R is executed for each BLOCK_ID and has the following properties:- For each R nodes will execute N instances of ABBA
- Each ABBA_i corresponds to a vote on block proposal from the node i
- Each ABBA_i completes with a consensus vote of
yesorno - Once all ABBA_i complete, there is a vote vector v_i, which includes
yesornofor each proposal - If there is only one
yesvote, the corresponding block proposal P is committed to the blockchain - If there are multiple
yesvotes, P is pseudo-randomly picked from theyes-voted proposals using pseudo-random number R. The winning proposal index is the remainder of division of R by n_win, where n_win is the total number ofyesproposals - The random number R is the sum of all ABBA COMMON_COIN
- In the rare case when all votes are
no, an empty block is committed to the blockchain. The probability of an all-no vote is very small and decreases when N increases
yes vote unless at least one honest node initially votes yes, and from the fact that an honest node will not vote yes unless it has a data availability proof (threshold signature S).
Consensus round vote trigger
Each node A will vote for ABBAs in a consensus round R immediately after proposal phase completes, meaning that two processes complete:- A receives a supermajority of block proposals for this round, including data availability signatures
- A transmits its block proposal to a supermajority of nodes
yes for each block proposal that it received, and no for each block proposal that it did not receive.
Vote of each honest node will include:
yes votes and:
no votes.
This simply follows from the fact that node A votes immediately after receiving a supermajority of block proposals, and from the fact that A votes yes for each block proposal that it received.
Finalizing Winning Block Proposal
Once consensus completes on a particular node A and the winning block proposal, the node will execute the following algorithm to finalize the proposal and commit it to the chain:- A will check if it has received the winning proposal P
- If A has not received the proposal, it will download it from its peer nodes using the algorithm described later in this document. It is possible to do it because of the data availability guarantee
- A will then sign a signature share for P and send it to all other nodes
- A will then wait to receive signature shares from a supermajority of nodes, including itself
- Once A has received a supermajority of signature shares, it will combine them into a threshold signature
- A will then commit the P to the blockchain together with the threshold signature of P
- A sends a message to each peer i, requesting for chunk i
- A waits until it receives a supermajority - 1 of responses
- A then enumerates missing chunks
- A then randomly assigns each missing chunk to servers, and empty chunks to each server that did not get a missing chunk assigned, and sends the corresponding requests to each server
- A waits until receives supermajority - 1 of responses
- If A received all chunks, the algorithm is complete. Otherwise it goes back to step 3
FUTURE: we may implement more advanced algorithms based on erasure codes.
