The Sirius Consensus Protocol
The Sirius Consensus Protocol
L. Golab, W. Golab, C.Gorenflo, S. Gorbunov, S. Keshav, S. Rizvi, and B. Wong
University of Waterloo
1. Introduction
Consensus is widely viewed as the critical bottleneck in building highly-scalable blockchains. Typical consensus protocols cannot support transaction rates of more than about 3500 transacations/s, which is far less than that achieved by centralized database systems. This limits the scope of application of blockchains to some specialized niches.
The goal of the Sirius consensus protocol is to allow Byzantine Fault Tolerance and high transaction rates (our target is 1 million transactions per second) with sub-second latency in a globally-distributed permissioned setting. The protocol can be used as a part of a permissioned blockchain, such as Hyperledger Fabric or Parity Substrate. Due to its plugin design, existing blockchain distributed applications will be able to exploit the scale provided by Sirius with no modification to application code.
In this document, we describe the key ideas in our work and open research areas that we plan to address in the near future.
2. Key ideas
Sirius is an extension of the Canopus consensus protocol [1]. Canopus is a highly-scalable and globally-distributed consensus protocol, originally designed for key-value stores that achieves (with 21 servers) nearly 5 million serializable transactions per second. Since Canopus’s transaction rate increases with increasing numbers of servers, we believe that this rate can be improved by adding more servers.
The key intuition behind Canopus is to be topology-aware and force most communication to be local. Specifically, servers are grouped into super-leafs where all members of a super-leaf are preferably in the same datacenter rack and connected to the same switch. This permits low-latency communication between peers in the same super-leaf.
Consensus in Canopus is achieved over multiple rounds of communication. In each round, representatives in every super-leaf fetch progressively higher levels of a hierarchical computation from nodes in other super-leafs. They then share this with their peers. After log n rounds, where n is the total number of nodes (servers), all nodes learn the entire global state, in parallel.
Canopus achieves high scalability by restricting the number of long-distance messages, and pipelining communication over high-latency links. Moreover, all communication is in parallel and the highly-regular communication structure allows high performance despite node and communication failure.
3. Problems with Canopus
Despite its excellent performance, Canopus is not resilient to certain faults. Specifically, Canopus stalls (loses liveness) if an entire super-leaf fails or the network partitions. Moreover, if some nodes have Byzantine failures, Canopus may compute incorrect results. Thus, in these situations, Canopus does not preserve safety or liveness, which is undesirable.
4. Sirius
Sirius addresses the safety and liveness issues with Canopus. Like Canopus, it is topology aware. The set of network nodes is divided into a number of Byzantine groups where all the members in the same group are `close’ to each other (within, say 10ms latency).
Let n be the number of nodes in the smallest Byzantine group and f be the largest number of nodes in any group that can have Byzantine faults. Then, Sirius provides the following guarantees:
1. If in every Byzantine group n > 3f holds and the network eventually recovers from partitions then every non-failed node terminates with the correct global consensus.
2. If n > 3f but the network is permanently partitioned, then every non-failed node in the super-majority partition (i.e., the partition with more than 2/3 of the super-leafs) terminates with the correct global consensus; non-failed nodes in minority partitions stall,
preserving safety at the expense of liveness.
3. If n > 3f does not hold, then neither safety nor liveness can be guaranteed.
To provide these guarantees, Sirius provides mechanisms to recover from (a) super-leaf failures, (b) network partition, and (c) Byzantine failures. These solutions are each complex, and their combination is certainly challenging. Nevertheless, we believe that we found reasonable and pragmatic solutions to all these problems.
We are also designing a highly-scalable data ingestion and analytics platform, compatible with multiple permissioned blockchain implementations, that can be used to store and analyze blockchain transactions at rates of up to 1 million transactions per second. This analytic capability will allow managers and users of distributed applications to mine usage data.
6. Next steps
Our immediate next step is to finalize our design for Sirius and prove its correctness. We will implement Sirius in the context of a permissioned blockchain. We will then integrate this prototype into some sample distributed applications of interest to industry partners, testing our solutions first in pilots, and subsequently at scale. We expect that this will require us to update our design to take into account real-world constraints. Hence, in the future, our plan is to revise Sirius as necessary to make it fully usable in a real-world setting.
References
[1] Rizvi, Sajjad, Bernard Wong, and Srinivasan Keshav. "Canopus: A Scalable and Massively Parallel Consensus Protocol." Proceedings of the 13th International Conference on emerging Networking EXperiments and Technologies. ACM, 2017. A provisional US patent has been filed on the innovative aspects of this work.