...
Background
...
Informed decisions require metrics
It is not hard to walk towards a goal on the horizon. But if you are blindfolded, that task becomes impossible. You will drift off course without something to guide you. The same goes for the iroha2 core team. We have a good idea of where we would like to go, but we will not get there unless we have a good guide, that is, good metrics: specifically, real-world measurements of a network's throughput, its transactions per second (TPS). With good data and metrics it is easy to make good engineering decisions. We want to improve iroha2's performance. Therefore we want performance data; we want TPS numbers. But it can't be just any data, it must be accurate. An optimization might benefit performance on a developer's own machine but be a regression in the real-world use case. If the data is inaccurate, our decisions will be too. Good metrics are therefore paramount.
Death by a thousand cuts
Currently the iroha2 codebase suffers from general slowness that is very hard to pin down to any particular place in the codebase. In April, Aleksandr Petrosyan spent 3 weeks trying to debug what he thought was a deadlock; it turned out iroha2 was simply so slow that it was hard to tell the difference. There is no obvious bottleneck we can address. Instead we are faced with needing lots of small improvements in many places. The measurable performance difference due to any specific change is negligible, but combined they are what will get us to 50x. Everything is slow, so any optimization increases TPS only slightly, because the rest of the codebase is still slow after the optimization. We can make things better, but it will require iterated small percentage improvements, most of which will not have a visible impact on TPS. Even though we cannot use metrics to decide which change is good, they are still useful for making sure we have not made things worse. If you know that you haven't introduced a performance regression, you can refactor and simplify with confidence. This will allow us to do the necessary optimization and simplification faster.
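To make the "everything is slow" argument concrete, here is a minimal sketch using Amdahl's law (a standard formula, named here for illustration; it does not appear in the original text). The function name `overall_speedup` and the profile of 20 equally slow components are hypothetical assumptions, not measured iroha2 data.

```python
def overall_speedup(time_fraction: float, local_speedup: float) -> float:
    """Amdahl's law: overall speedup when the part of the program that takes
    `time_fraction` of total runtime becomes `local_speedup` times faster."""
    if not 0.0 <= time_fraction <= 1.0:
        raise ValueError("time_fraction must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - time_fraction) + time_fraction / local_speedup)


# Hypothetical profile: runtime spread evenly over 20 components (5% each).
# Making a single component 10x faster barely moves the total...
one_component = overall_speedup(time_fraction=0.05, local_speedup=10.0)

# ...while making every component 10x faster gives the full 10x.
all_components = overall_speedup(time_fraction=1.0, local_speedup=10.0)
```

Under these assumed numbers, one optimized component yields an overall gain of under 5%, which is easy to lose in measurement noise. That is why a single change rarely shows up in TPS even when the sum of such changes is large.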
...
I will be regularly performing TPS benchmarks on a benchmark stand of four machines, created by DevOps. This will give the iroha2 core team consistent insight into how code changes are affecting performance. I will establish a baseline TPS for the LTS release. That way we can make sure all our codebase simplifications are improvements and not regressions. Some of this work can be handled by DevOps once the routine has been established.
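As a rough sketch of how a baseline comparison could work, the snippet below computes TPS from a run and flags a regression when the median of several runs falls below the baseline by more than a tolerance. The function names, the 5% tolerance, and the use of the median are all assumptions for illustration; the actual benchmark tooling is not described here.

```python
from statistics import median
from typing import Sequence


def tps(committed_transactions: int, elapsed_seconds: float) -> float:
    """Transactions per second for a single benchmark run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return committed_transactions / elapsed_seconds


def relative_change(baseline_tps: float, candidate_runs: Sequence[float]) -> float:
    """Relative difference between the median candidate TPS and the baseline.
    Negative values mean the candidate is slower than the baseline."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return (median(candidate_runs) - baseline_tps) / baseline_tps


def is_regression(
    baseline_tps: float,
    candidate_runs: Sequence[float],
    tolerance: float = 0.05,  # assumed noise margin, to be tuned on the real stand
) -> bool:
    """True if the candidate is slower than the baseline by more than `tolerance`."""
    return relative_change(baseline_tps, candidate_runs) < -tolerance
```

For example, with a hypothetical baseline of 20 TPS, runs of 18.0, 18.5 and 17.9 TPS would be flagged as a regression, while runs of 19.8, 20.1 and 20.3 TPS would not. Taking the median of repeated runs is one way to keep a single noisy run from triggering a false alarm.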
...
There are many more things such as these yet to be discovered. I am confident we can reach 50x, a shockingly good result compared to today. The question is simply how quickly we will get there. Either way, performance testing and profiling are essential.
Decisions
Sam H Smith will do performance testing on a 4-node benchmarking stand.
Alternatives
We could try to set up automated performance testing, but this is currently quite difficult.
We could have DevOps do some of this work.
Simply put, we need to start profiling. Otherwise we won't be able to improve performance.
Anton is working on scripts for us to deploy iroha2 on Kubernetes. This is tracked in issue https://app.zenhub.com/workspaces/iroha-v2-60ddb820813b9100181fc060/issues/hyperledger/iroha/2450.
Once this is done I can gather performance data regularly as mentioned above.
Alternatives
Concerns
There is a concern that time spent optimizing will not be fruitful, or that the feature requirements will change so that the optimization work is made redundant.
...
We have assumed iroha2 is at least 50-100x slower than it needs to be, that is, it is only using 1-2% of the machine's performance.
Risks
Additional Information
...