PSWG 2022: Activities and Goals
Authors: Sam Yuan and Haris Javaid
This article summarizes offline discussions among PSWG members to document the current activities and long-term goals of PSWG. We welcome contributions of any kind to this article and to any of the projects mentioned here. The performance sandbox project described below is the focus of this year's activities. In the future, PSWG plans to discuss technological contributions in greater detail with other groups in the Hyperledger community, and to offer guidance on standard procedures and emerging best practices for evaluating blockchain performance.
Please join us in the Performance and Scale Working Group to get involved and to receive updates on our progress.
Introduction
Blockchain applications typically involve multiple layers of infrastructure, as depicted below. The lowest layer is the hardware infrastructure, which provides the basic compute, memory and network resources needed to deploy a distributed system. Typical examples include multi-core servers with acceleration cards (GPUs, FPGAs, etc.), large memories and hard disks. The next layer consists of automated tools that enable easy creation, deployment and management of the various nodes of a distributed system. A popular example is Kubernetes (k8s), which allows easy creation, deployment, scaling and management of containers (e.g. Docker containers) in a datacenter environment. The blockchain layer involves setting up the underlying blockchain system or platform, such as Hyperledger Fabric, and configuring it according to the needs of the blockchain application. The top layer involves creating and deploying the application on the blockchain platform, e.g. smart contracts, the application data model, the GUI, etc.
This article covers how we can measure and continuously observe the performance of a blockchain application and/or system deployed in a k8s environment, and introduces some performance improvement techniques.
Performance Monitoring and Observability
As shown in the figure below from the performance sandbox meetup, we have moved beyond traditional monitoring, which relies on metrics alone, and added features and concepts at the observability level, such as distributed tracing and logging, to understand what is happening across a distributed system.
Monitoring then adds human insight on top of the data collected through observability, for example via ChatOps/AIOps, to interact with the system.
Performance with k8s
Drawing on a CNCF paper, we can outline steps for a blockchain to adopt cloud-native deployment and to use observability to improve performance or handle any other performance-related issue.
Steps to adopt observability, for example:
- Make the blockchain deployable on k8s.
Many blockchain systems support Docker-based deployment, so the first step is to move your blockchain from a local container runtime to k8s.
- Make the blockchain deployable on k8s together with observability tools.
Two things need to be done in this step. First, implement observability APIs for your blockchain network, for example metrics and distributed tracing. Second, update your blockchain deployment YAML to support observability injection, for example with sidecar-based or operator-based observability tools such as the Prometheus Operator.
- Make performance testing tools deployable on k8s in a distributed way.
In short, both the blockchain system and the test harness should be deployed on k8s and integrated with observability.
- Display metrics on a monitoring system, for example Grafana.
Once the dashboards are set up, start correlating observability signals.
Best practices for adopting observability
Taking Hyperledger Fabric and transaction throughput as an example:
- Hyperledger Fabric on k8s (a short description of Fabric operators)
There are many projects that deploy Hyperledger Fabric on k8s, for example https://github.com/hyperledger-labs/fabric-operator. So far the performance sandbox still works with the shell-script-based k8s-test-network, but as a long-term plan we will migrate the performance sandbox to fabric-operator.
- Hyperledger Fabric with observability
Here we take Tape as an example. Tape is a simple traffic generator for Hyperledger Fabric; it takes three steps to go from zero to an integrated deployment on k8s with observability.
i. Distributed
To make the test harness support distributed deployment, split the traffic generator and the monitoring client so that they can run separately (split mode).
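A minimal sketch of the split: the traffic generator and the monitoring client below are independent functions that share nothing except a channel, so each could run in its own process or k8s pod. Here a `queue.Queue` stands in for the Fabric network (a submitted transaction later shows up as a commit event), and all names are hypothetical; this is not Tape's implementation.

```python
import queue
import threading
import time


def traffic_generator(n_tx, network, submitted):
    """Submit n_tx hypothetical transactions, recording when each was sent.

    A real generator would send proposals to peers and envelopes to the orderer.
    """
    for i in range(n_tx):
        tx_id = f"tx-{i}"
        submitted[tx_id] = time.monotonic()
        network.put(tx_id)
    network.put(None)  # sentinel: generation finished


def monitoring_client(network, committed):
    """Record when each transaction is observed as committed.

    A real monitoring client would subscribe to block events from peers.
    """
    while True:
        tx_id = network.get()
        if tx_id is None:
            break
        committed[tx_id] = time.monotonic()


def run_split_mode(n_tx):
    """Run both roles concurrently; each keeps its own timestamp map."""
    network = queue.Queue()
    submitted, committed = {}, {}
    observer = threading.Thread(target=monitoring_client, args=(network, committed))
    generator = threading.Thread(target=traffic_generator, args=(n_tx, network, submitted))
    observer.start()
    generator.start()
    generator.join()
    observer.join()
    return submitted, committed
```

Because each role keeps only its own records, the two maps can be collected from separate pods afterwards and joined by transaction ID.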
ii. Metric
The test harness can expose metrics for performance research. For example, to calculate latency, one option is to produce the latency report in the test harness itself instead of tracing metrics on each node.
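Producing the latency report in the harness can be as simple as joining two maps by transaction ID: submit times recorded by the traffic generator and commit times recorded by the monitoring client. The sketch below assumes both maps use the same clock and hold seconds; the report fields and the nearest-rank percentile method are our choices, not a Tape format.

```python
import math


def latency_report(submitted, committed):
    """Summarize end-to-end latency from two timestamp maps (tx_id -> seconds).

    Transactions missing from either map (e.g. never committed) are skipped.
    Percentiles use the nearest-rank method.
    """
    latencies = sorted(
        committed[tx] - submitted[tx] for tx in committed if tx in submitted
    )
    if not latencies:
        return {"count": 0}

    def percentile(p):
        rank = max(1, math.ceil(p / 100 * len(latencies)))
        return latencies[rank - 1]

    return {
        "count": len(latencies),
        "mean": sum(latencies) / len(latencies),
        "p50": percentile(50),
        "p99": percentile(99),
        "max": latencies[-1],
    }
```

For example, submit times `{"a": 0.0, "b": 1.0, "c": 2.0, "d": 3.0}` and commit times `{"a": 0.5, "b": 2.0, "c": 3.5, "d": 5.0}` give a mean of 1.25 s, p50 of 1.0 s and p99 of 2.0 s.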
iii. Tracing
Labels and gRPC options can be used to support distributed tracing across the test harness.
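One common way to carry trace context over gRPC is the W3C Trace Context `traceparent` header sent as call metadata. The sketch below builds and propagates such a header using only the standard library. Using W3C Trace Context here is our assumption, and the extra `tx-id` metadata key is a hypothetical label for linking a span to a Fabric transaction.

```python
import secrets


def _random_hex_id(nbytes):
    """Random lowercase hex ID; all-zero IDs are invalid in W3C Trace Context."""
    while True:
        value = secrets.token_hex(nbytes)
        if int(value, 16) != 0:
            return value


def new_traceparent(sampled=True):
    """Start a new trace: version 00, 16-byte trace-id, 8-byte parent-id, flags."""
    flags = "01" if sampled else "00"
    return f"00-{_random_hex_id(16)}-{_random_hex_id(8)}-{flags}"


def child_traceparent(parent):
    """Continue the same trace with a new span ID for the next hop."""
    version, trace_id, _parent_id, flags = parent.split("-")
    return f"{version}-{trace_id}-{_random_hex_id(8)}-{flags}"


def grpc_metadata(traceparent, tx_id):
    """Metadata for a gRPC call; keys must be lowercase.

    With grpcio this list would be passed as stub.Method(request, metadata=...).
    'tx-id' is a hypothetical label linking the span to a Fabric transaction.
    """
    return [("traceparent", traceparent), ("tx-id", tx_id)]
```

Every hop keeps the same trace ID and gets a fresh span ID, so a tracing backend can stitch the generator's, the peers' and the observer's spans into one trace.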
- Dashboard
Transaction throughput for Hyperledger Fabric is the rate at which valid transactions are committed by the peers in a defined time period. Note that this is not the rate at a single node, but across the entire SUT, i.e. transactions committed at all nodes of the network. This rate is expressed as transactions per second (TPS) at a given network size. In practice this means that, at a minimum, the peers of each organization participating in the channel must have committed the broadcast block.
Reference: https://www.hyperledger.org/learn/publications/blockchain-performance-metrics
Transaction Throughput = Total committed transactions / total time in seconds @ #committed nodes
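The formula above can be applied to raw commit events with a few lines of code. The sketch counts a transaction only once it has been committed on every node in the chosen set and divides by the length of the measurement window. Treating the window as starting at the first submission is our assumption, and the node names in the example are hypothetical.

```python
def transaction_throughput(commits, nodes, start_time):
    """Committed transactions / total time in seconds @ len(nodes) committed nodes.

    commits:    tx_id -> {node_name: commit timestamp in seconds}; only valid
                transactions are expected here.
    nodes:      the nodes a transaction must be committed on to count.
    start_time: start of the measurement window (assumption: first submission).
    """
    required = set(nodes)
    if not required:
        raise ValueError("nodes must not be empty")
    # A transaction is committed network-wide when the last required node commits it.
    network_commit_times = [
        max(per_node[n] for n in required)
        for per_node in commits.values()
        if required.issubset(per_node)
    ]
    if not network_commit_times:
        return 0.0
    total_time = max(network_commit_times) - start_time
    if total_time <= 0:
        raise ValueError("commits must happen after start_time")
    return len(network_commit_times) / total_time
```

With `nodes=["peer0.org1", "peer0.org2"]`, a transaction seen only by `peer0.org1` is not counted, which matches the "@ #committed nodes" qualifier in the formula.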
Our test data contains about 10k transactions in 1k blocks (10 transactions per block, following the block-cutting configuration).
From the ledger_blockchain_height metric we can compute the block height rate, i.e. the change in block height divided by time over the observed window. Because each block carries a known number of transactions, this indirectly shows transaction throughput based on block height. Alternatively, we can use the number of transactions in a specific channel directly, as the business-transactions-per-channel-and-chaincode panel shows, to see how transactions increase over time.
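The block-height approach is simple arithmetic: blocks per second over a window multiplied by transactions per block. The sketch assumes `(timestamp, height)` samples scraped for one channel from the peer's `ledger_blockchain_height` gauge; the 100-second window in the example is hypothetical and only illustrates the 10-transactions-per-block setting above.

```python
def block_height_rate(samples):
    """Blocks per second from (timestamp_seconds, block_height) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, h0), (t1, h1) = min(samples), max(samples)
    if t1 == t0:
        raise ValueError("samples must span a non-zero time window")
    return (h1 - h0) / (t1 - t0)


def estimated_tps(samples, tx_per_block):
    """Upper-bound TPS estimate: assumes every block is full and all its
    transactions are valid. Blocks cut by timeout, config blocks and invalid
    transactions all make the real throughput lower."""
    return block_height_rate(samples) * tx_per_block
```

For instance, if 1,000 blocks were committed over a 100-second window, the rate is 10 blocks/s, which at 10 transactions per block gives an estimate of about 100 TPS.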
Performance Improvements with Custom Hardware
There are various ways to improve the hardware infrastructure for better performance of a blockchain system and application. Here, we provide a brief description of one such work called Blockchain Machine, done by a PSWG member (Haris Javaid) and his team at Xilinx/AMD. The Blockchain Machine explores the use of network-attached hardware acceleration for Hyperledger Fabric to improve its performance beyond what is achievable by a software-only implementation on a multi-core server. It leverages FPGA accelerator cards (such as AMD Alveo), which are being increasingly adopted for accelerating cloud workloads and are also available from major public cloud providers such as AWS and Microsoft Azure.
The scalability and peak performance of Fabric are primarily limited by the bottlenecks in its block validation/commit phase. The validation phase is run by either an endorser peer (which also endorses transactions) or a validator peer. The Blockchain Machine is a hardware accelerator, coupled with a hardware-friendly communication protocol, that acts as a validator peer in a Hyperledger Fabric network. Hence, it targets a server with a network-attached FPGA card, in contrast to existing validator peers, which run Hyperledger Fabric software on just a multi-core server. The Blockchain Machine peer receives blocks from the orderer through the hardware-friendly protocol, and the block data is retrieved in the FPGA without any involvement of the host CPU. The extracted block and its transactions are then passed through an efficient block-level and transaction-level pipeline in the FPGA, which implements the bottleneck operations of the validation phase. Finally, Hyperledger Fabric software running on the host CPU reads the block validation results from the hardware and then commits the block to the disk-based ledger, just like a software-only validator peer. Overall, a Blockchain Machine peer is a hardware/software co-designed peer that leverages both CPUs and FPGA-based accelerator cards to deliver significantly better performance than CPUs alone in a multi-core server.
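For readers less familiar with what the validation phase checks, the sketch below models two of Fabric's per-transaction checks in plain Python: the endorsement policy result (reduced here to a precomputed boolean) and the MVCC read-set version check, applied in block order so that a write by an earlier valid transaction invalidates later readers of the same key. The validation code names follow Fabric's `TxValidationCode` values; the data layout is an assumption, and this is a software illustration only, not how the Blockchain Machine pipeline is built. For brevity it updates state in place, merging validation and commit, whereas Fabric validates the whole block before committing it.

```python
def validate_and_apply(block_num, txs, state):
    """Validate a block's transactions in order and apply the valid writes.

    state: dict key -> (value, version), where version is (block_num, tx_num).
    txs:   list of dicts with 'endorsed' (bool, result of the endorsement policy
           check), 'reads' {key: version seen at endorsement, None if absent}
           and 'writes' {key: new value}.
    Returns one validation code per transaction.
    """
    codes = []
    for tx_num, tx in enumerate(txs):
        if not tx["endorsed"]:
            codes.append("ENDORSEMENT_POLICY_FAILURE")
            continue
        # MVCC: every key read must still be at the version the endorser saw,
        # including updates made by earlier valid transactions in this block.
        stale = any(
            state.get(key, (None, None))[1] != version
            for key, version in tx["reads"].items()
        )
        if stale:
            codes.append("MVCC_READ_CONFLICT")
            continue
        for key, value in tx["writes"].items():
            state[key] = (value, (block_num, tx_num))
        codes.append("VALID")
    return codes
```

The sequential dependency in the MVCC loop is one reason validation is hard to parallelize in software, which is the kind of bottleneck a dedicated pipeline targets.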
For more technical details about Blockchain Machine, check out the following paper and the open-source repo:
- H. Javaid, J. Yang, N. Santoso, M. Upadhyay, S. Mohan, C. Hu, G. Brebner. 2022. Blockchain Machine: A Network-Attached Hardware Accelerator for Hyperledger Fabric. International Conference on Distributed Computing Systems (ICDCS).
- AMD. Fabric Machine Repo. Available at https://github.com/hyperledger-labs/fabric-machine
For a case study on how AMD used this technology to scale its supply chain, listen to the following talk:
- M. Kumaraswamy, H. Javaid. 2022. Accelerated Hyperledger Fabric for Supply Chain Applications in Semiconductor Industry. Hyperledger Global Forum.
How to Scale Performance Monitoring and Improvements
It all comes down to one question: how do we test a system with thousands of nodes?
One answer comes from a CNCF KubeCon talk recording.
The idea is to test scaling with a mock system. In detail:
Looking inside a blockchain node, we can split its features or modules into two groups.
Internal features
For these features, resource usage grows roughly linearly as the blockchain system scales up.
For example, consider an engine that runs a specific chaincode: in most blockchain systems, a node must be able to run chaincode to execute business logic.
Scaling this part up only increases the required resources linearly.
Unfortunately, our research environment has limited resources, so to do this research, or just to simulate a large network, we mock these features in our mock system, for example by making the chaincode engine always return success for any request.
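A minimal sketch of the mocking approach: the node depends on a small executor interface, and the mock implementation skips business logic and always reports success. The `ChaincodeExecutor` protocol, `MockExecutor` and `endorse` are hypothetical names; the status code 200 mirrors the `OK` status of Fabric's chaincode shim.

```python
from typing import Protocol, Sequence, Tuple


class ChaincodeExecutor(Protocol):
    def invoke(self, chaincode: str, args: Sequence[str]) -> Tuple[int, bytes]:
        """Run a chaincode function and return (status, payload)."""
        ...


class MockExecutor:
    """Always succeeds without executing any business logic.

    Each call does constant, trivial work, so simulating more nodes no longer
    multiplies the compute spent on chaincode execution.
    """

    OK = 200

    def __init__(self) -> None:
        self.calls = 0

    def invoke(self, chaincode: str, args: Sequence[str]) -> Tuple[int, bytes]:
        self.calls += 1
        return self.OK, b""


def endorse(executor: ChaincodeExecutor, proposals):
    """Hypothetical endorsement step: True for each proposal the executor accepts."""
    return [executor.invoke(cc, args)[0] == 200 for cc, args in proposals]
```

Because the node code only sees the interface, the same node can run with a real executor in small functional tests and with the mock in large-scale network simulations.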
External communication features
These are the key features affected as the size of the blockchain system grows.
For example, most blockchain systems rely on a p2p or gossip network.
As the blockchain network scales up, the performance of the p2p network increasingly determines the performance of the blockchain system.
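A back-of-the-envelope model shows why communication, rather than execution, dominates at scale. The sketch assumes idealized push gossip, in which every informed node forwards a block to `fanout` nodes that have not seen it yet, with no duplicates or losses, and compares it with one node sending the block to everyone directly. Real gossip (including Fabric's) picks targets at random, so it needs more rounds and messages than this lower bound; the fanout of 3 is an assumption.

```python
def push_gossip_rounds(n_nodes, fanout):
    """Rounds until all n_nodes have a block under idealized push gossip."""
    if n_nodes < 1 or fanout < 1:
        raise ValueError("n_nodes and fanout must be >= 1")
    informed, rounds = 1, 0
    while informed < n_nodes:
        informed += informed * fanout  # each informed node reaches `fanout` new nodes
        rounds += 1
    return rounds


def direct_broadcast_sends(n_nodes):
    """Messages a single sender must push itself to reach every other node."""
    return n_nodes - 1


for n in (10, 100, 1000, 10000):
    print(n, push_gossip_rounds(n, fanout=3), direct_broadcast_sends(n))
# prints:
# 10 2 9
# 100 4 99
# 1000 5 999
# 10000 7 9999
```

Gossip keeps the number of rounds logarithmic in network size, but every node still has to receive each block, so total network traffic per block grows at least linearly; this is the part a mock system must keep realistic.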