Below are a few highlights of the Hive-related programming issues worked on by the BlockTrades team since my last report.
Originally, I thought we would have released updates for a number of different tools by now, but everything took longer than expected. In some cases, the delays were for good reasons: we discovered several ways to dramatically improve HAF's performance, so we decided to delay the release to incorporate those changes, and we continued adding features to hived and HAF apps in the meantime. As for the "bad reasons", we've basically been overhauling the entire approach to deploying HAF and HAF apps to make it super simple, and we ran into lots of issues along the way as we worked out best practices for what I'm now calling HAF 2.0.
What is HAF 2.0?
HAF is a library and service for developing specialized Hive-based APIs and applications. It acts as a backend server for web-based applications that want to utilize the Hive blockchain network.
We started working on HAF more than two years ago and released the first production version about a year ago. Since then we've been making steady incremental improvements, and over the past year we've also been increasing the number of developers working on HAF and HAF apps, as HAF is the foundation for all our future apps, including our layer 2 smart contract processing engine.
HAF 2.0: Completely overhauling deployment and maintenance
With HAF 2.0, one key focus has been on easing deployment and maintenance of HAF servers and HAF apps. The recommended deployment method has completely changed with 2.0, as our goal is to create an ecosystem where any API node operator can quickly and easily install or uninstall any selection of HAF apps they want to support on their server(s).
Perhaps the best analogy (well, for programmers) is that we're building something like a packaging system for HAF and HAF apps. Another way to look at it is that we're making HAF apps available as appliances that can easily interact with each other.
Another deployment improvement has been to standardize on a ZFS dataset layout as a method of delivering "operation ready" snapshots of HAF servers that don't require a replay of the entire blockchain at setup.
This is also extremely beneficial during development of HAF apps: you can take a snapshot of your server at any point in time and later roll back to that same state, making it easy to recover from database corruption that occurs while developing your app, reproduce bugs, etc. I've found the ability to easily reproduce bugs and performance problems especially useful in my own work (in one case, I rolled back to the same snapshot about 20 times while analyzing a performance problem that occurred in a specific block range).
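As a concrete sketch of that snapshot/rollback workflow (the dataset and service names below are hypothetical; adjust them to your own ZFS layout), the cycle looks roughly like this:

```shell
# Stop the HAF stack first so the database files on disk are in a
# consistent state (service management shown via docker compose).
docker compose down

# Take a named, point-in-time snapshot of the dataset holding the HAF database.
zfs snapshot tank/haf_db@before-experiment

# ... develop your HAF app, possibly corrupting the database or hitting a bug ...

# Roll the dataset back to exactly the state captured in the snapshot,
# then restart the stack from that known-good state.
zfs rollback tank/haf_db@before-experiment
docker compose up -d
```

Because `zfs rollback` restores the dataset to the snapshot's exact state, the same replay or query can be re-run repeatedly against identical data, which is what makes bugs and performance problems reproducible.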
HAF 2.0 uses docker compose scripts to deploy the entire stack for an API node
HAF 2.0 also includes a full deployment stack with all the apps needed to run an API node. Previously, an API node operator not only had to set up a HAF server, they also had to deploy various other apps such as nginx, jussi, varnish, haproxy, and caddy in order to build an efficient server that caches API responses, re-routes traffic to the different services in the stack, and manages rate-limiting to prevent DDOS attacks. Varnish is a newcomer to our stack and provides caching for the new REST-based APIs offered by our new HAF apps such as balance_tracker and the block explorer. In practice, we've found these REST APIs offer better performance than the older json-rpc based APIs (which are still cached by jussi).
With HAF 2.0, you can deploy and manage all these services with docker compose scripts, all configured by a single .env file. More about this setup can be found at https://gitlab.syncad.com/hive/haf_api_node
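In rough outline, a deployment along these lines looks like the sketch below. The file names and steps here are illustrative assumptions, not the exact instructions from the haf_api_node repository; consult that repository's README and example environment file for the real procedure.

```shell
# Fetch the deployment scripts for the full API node stack.
git clone https://gitlab.syncad.com/hive/haf_api_node.git
cd haf_api_node

# All services in the stack are configured from a single .env file
# (example file name is hypothetical).
cp .env.example .env
$EDITOR .env    # set data directories, domain name, which HAF apps to run, etc.

# Bring up the entire stack (hived, HAF, the HAF apps, nginx, varnish, ...).
docker compose up -d

# Follow the logs to watch replay/sync progress.
docker compose logs -f
```

The design goal is that enabling or disabling an individual HAF app is just a matter of changing the .env configuration and re-running docker compose, rather than hand-configuring each service.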
HAF 2.0: improving performance and decreasing hardware requirements
In the past few weeks, we improved massive sync replay time by 33% and live sync update performance by 45%. Looking at benchmarks over the past three months, a full replay of HAF used to take around 30 hours to process the entire 80 million blocks of the Hive blockchain; it is now down to 17.5 hours (14 hours for the replay itself plus 3.5 hours for creating HAF indexes). At the same time, we recently cut CPU usage by 10x. This doesn't show up as a 10x speedup because the code is heavily multi-threaded, but it does mean HAF can run on cheaper computers with fewer cores and consumes less energy; alternatively, those extra cores can be used for other processes such as nginx, jussi, haproxy, varnish, or additional hived nodes.
Disk storage requirements have also been dramatically reduced: first by storing blockchain operations as binary data, and second by using lz4 compression via ZFS. Together, these methods cut database storage requirements by more than 2x.
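For example, the ZFS side of this is just a dataset property. The pool and dataset names below are hypothetical, and the 8K recordsize is a common (but optional) tuning to match PostgreSQL's 8 KB page size rather than something the source specifies:

```shell
# Create a dataset with lz4 compression for the HAF/PostgreSQL data directory.
zfs create -o compression=lz4 -o recordsize=8k tank/haf_db

# After loading data (e.g. a full replay), check the achieved compression ratio.
zfs get compressratio tank/haf_db
```

Since lz4 is very cheap to decompress, the space savings come with little CPU cost on reads.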
HAF servers also require much less memory to operate now. Databases almost always benefit from more memory, and HAF was originally targeted at servers with 64GB of RAM, but by keeping hived's shared_memory.bin statefile on an NVMe drive, we've found that a HAF server can operate quite comfortably as a production server with 32GB of RAM.
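One way to do this is to point hived's statefile directory at the NVMe drive instead of a ramdisk. This is a hedged sketch: the paths below are hypothetical, and you should verify the option name against your hived version's documentation before relying on it.

```shell
# In config.ini, direct the shared_memory.bin statefile to NVMe storage:
#   shared-file-dir = /mnt/nvme/hived
#
# Or equivalently on the command line:
hived --data-dir=/home/hived/datadir --shared-file-dir=/mnt/nvme/hived
```

With the statefile on fast NVMe rather than in RAM-backed storage, the memory that would have held it is freed up, which is what allows a production server to run with 32GB.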
It is worth mentioning we're not yet done with performance improvements for HAF: during our recent work we came up with a few more ideas for speedups, but we just ran out of time to fit those improvements into the upcoming release.
Other projects
We're also working on a host of other tools (some of our devs have already made posts about Clive, for example), and it has gotten to the point where I think it makes more sense for me to post about the projects I spend the most time on and leave posts about the other projects to the devs involved in them; otherwise my posts would start to get really long, given the number of devs working on different projects.
So I'll leave it to them (hopefully) to make posts after the holidays about some of the other projects such as the block explorer, Denser, Helpy, WAX, Clive, beekeeper, etc.
Final testing started for HAF 2.0
We started full replays of HAF servers with the latest code a couple of days ago. Although HAF itself only takes about 17 hours to replay, hivemind takes considerably longer (something on the order of 80 hours, I think), as we haven't had time to further optimize its performance yet.
We should have several servers fully replayed by Tuesday, at which time we'll start directing production traffic from api.hive.blog to some of these servers as a final test of performance under real world conditions.
Barring any problems, we'll tag the passing versions as final releases and set up downloadable snapshots for API node operators who don't want to wait on replays.
[UPDATE] Sharing one more benchmark from our new fastest system:
"speed king" s16 (AMD 7950X with 64GB DDR5 6400Mhz and 2x4T CT4000T700 nvmes) full replay with ramdisk reached livesync in 11.1h, built indexes in 3.2h, and ready to serve up data after 11.1+3.2=14.3 hours!