Below are a few highlights of the Hive-related programming issues worked on by the BlockTrades team since my last report.
For the past two months, we’ve been testing and improving the various apps that make up the stack for Hive API nodes, as well as the infrastructure that supports it.
The big announcement is that today we’ve released 1.27.5rc8 (release candidate 8), which, based on testing so far, will probably be re-tagged this week as the official release of the stack.
HAF API node
This repo contains docker compose scripts to easily install and manage a full Hive API node. It’s so simple to use that anyone who’s familiar with basic Linux system administration should be able to set up and operate their own Hive API node now. One of the best features of the new stack is that it easily allows API node operators to incrementally add support for new Hive APIs as new HAF apps get released.
Recent changes to the scripts include:
- Supports using either Drone or Jussi for reverse proxying and caching of JSON-based API calls. Drone is now the recommended choice.
- The ZFS snapshot script now has an option for “public snapshots” that avoids including logs in the snapshot (e.g. when you want to supply a snapshot to someone else).
- Fixed log rotation of postgres logs.
- Caddy now overrides Jussi/Drone handling of CORS.
- Fixed various issues with the assisted_startup.sh script.
- Fixed various configuration problems found while testing on our production node.
- The .env file now supports downloading pre-built docker images from gitlab, dockerhub, and registry.hive.blog.
- Added a bind-mount point for caddy logging.
- Fixes to healthchecks for btracker, hafbe, and hafah.
- Configured haproxy to log healthcheck failures.
- Fixes to the btracker/hafbe uninstall process.
- Simplified exposing internal ports between services when spreading node services across several servers.
We also updated the CI processes for hive, HAF, and the various HAF apps so that whenever we manually tag a gitlab repo with a new release or release candidate, the associated docker images built by CI are automatically given the same tag and pushed to dockerhub and registry.hive.blog.
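To make that flow a bit more concrete, here is a minimal sketch of the kind of retag-and-push step such a pipeline performs. This is not the actual CI code: the image name, registry paths, and credential handling below are assumptions for illustration only.

```python
import subprocess
import sys

# Illustrative sketch only -- the real CI jobs use their own image names,
# registry paths, and credential handling. Everything below is an assumption.
SOURCE_IMAGE = "registry.gitlab.syncad.com/hive/haf:latest-ci-build"  # hypothetical CI-built image
TARGET_REPOS = ["hiveio/haf", "registry.hive.blog/haf"]               # hypothetical destinations


def retag_and_push(release_tag: str) -> None:
    """Retag the CI-built image with the release tag and push it to each registry."""
    for repo in TARGET_REPOS:
        target = f"{repo}:{release_tag}"
        subprocess.run(["docker", "tag", SOURCE_IMAGE, target], check=True)
        subprocess.run(["docker", "push", target], check=True)


if __name__ == "__main__":
    retag_and_push(sys.argv[1])  # e.g. "1.27.5rc8"
```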
Last night I used the jussi traffic analyzer tool to analyze API call performance on api.hive.blog (despite the name, it also works for analyzing traffic when your node is configured to use Drone). We’re now running api.hive.blog off a single server (it used to be spread across the 4 servers we’d been using since Hive was first launched), and the average API call response time is about 2x better than it was back in November of last year, despite handling about 50% more traffic than our old configuration. Our CPU load is also significantly lower, so we have substantial headroom for more load.
Note: the new server hardware is also somewhat faster than our old servers (at least compared to any one of them individually) and has the same amount of disk storage as all 4 of the old servers combined (4x2TB = 8TB). There’s less available memory (the new server has 128GB, whereas the 4 old servers had 64GB each, for a total of 256GB), but we’ve also significantly lowered memory requirements over time, so this isn’t a problem. In fact, an API node should currently be able to handle quite a lot of traffic with only 64GB of RAM.
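For anyone curious what this kind of traffic analysis involves, here is a minimal sketch that computes the average response time per API method. It assumes a simple JSON-lines log with per-call method and timing fields; the real jussi/drone logs and the analyzer tool use their own formats.

```python
import json
from collections import defaultdict

# Minimal sketch under an assumed log format: one JSON object per line with
# "method" and "ms" fields. The real jussi/drone logs differ from this.
def average_response_times(log_path: str) -> dict[str, float]:
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    with open(log_path) as log:
        for line in log:
            entry = json.loads(line)
            totals[entry["method"]] += entry["ms"]
            counts[entry["method"]] += 1
    return {method: totals[method] / counts[method] for method in totals}


if __name__ == "__main__":
    for method, avg_ms in sorted(average_response_times("api_calls.log").items()):
        print(f"{method}: {avg_ms:.1f} ms avg")
```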
HAF (Hive Application Framework)
We fixed several issues where a HAF app’s state could get corrupted by a missed block during shutdowns of HAF or HAF apps. The balance tracker app was particularly useful in helping us identify these problems, because a single block of missed data would typically cause it to fail before too long with a divide-by-zero error. As part of these changes, we also simplified the usage of the HAF API for creating HAF-based apps.
We improved the stability of a number of CI tests and added more tests (e.g. tests for filtering of operations during HAF replays). We also found and fixed an interaction bug between the fc scheduling library used by hived and the “faketime” tool we use in CI testing that could result in busy-looping. Fixing this bug significantly reduced the load on our test runners and, more importantly, eliminated several intermittent test failures that occurred when tests consumed 100% of the CPU cores on our test runners.
We shrank the HAF docker image used by HAF API nodes from 3120 MB down to 611 MB to speed up downloads during installation (the 3120 MB version contains all the tools needed for development of HAF apps).
HAF Block Explorer and Balance Tracker APIs
We updated these apps to accommodate the changes to the HAF API.
We also added some new API calls and fixed bugs found in existing API calls while testing the Block Explorer UI.
And we added caching headers that provide varnish with hints about how long to cache API responses based on the API call type and the API parameters specified.
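As a rough sketch of the idea (the method names and TTL values below are made up for illustration, not the ones actually configured in the stack), the API layer can attach a Cache-Control header whose max-age depends on the call being served, and varnish then honors that hint:

```python
# Illustrative sketch only: the TTL assignments below are hypothetical,
# not the values actually used in the stack.
CACHE_TTLS = {
    "block_api.get_block": 31536000,                  # effectively immutable data: cache for a year
    "condenser_api.get_discussions_by_trending": 60,  # rankings change, but slowly
    "database_api.get_dynamic_global_properties": 1,  # changes every block
}
DEFAULT_TTL = 3  # conservative default for anything unlisted


def cache_control_header(method: str) -> tuple[str, str]:
    """Build the response header that tells varnish how long it may cache this reply."""
    ttl = CACHE_TTLS.get(method, DEFAULT_TTL)
    return ("Cache-Control", f"public, max-age={ttl}")


print(cache_control_header("block_api.get_block"))
# ('Cache-Control', 'public, max-age=31536000')
```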
Drone (replacement for Jussi)
A while back, @deathwing created and deployed a reverse-proxying program called Drone that could potentially replace Jussi. For the past couple of months we’ve been working on completing the various changes necessary to replace Jussi with Drone in the standard API node stack. We made a bunch of small changes, but the major ones were improvements to Drone’s caching system; at this point its caching is as flexible as Jussi’s and more performant.
Hivemind API (social media API)
We dramatically improved the performance of several slow hivemind queries that we discovered while testing the stack in a production environment. These improvements also reduce database load on API nodes. There are still a couple of queries that need improvement, but we can easily deploy further improvements later, as none of these changes require a replay of hivemind.
Hivemind was also updated to accommodate the changes to the HAF API. We improved the install and update code, including removing some obsolete code, and sped up restarting hivemind after a temporary shutdown.
API node update timeline
We expect API nodes to begin updating to the new rc8 stack over the next few weeks. We’re running rc8 now on api.hive.blog.
Near the end of this upcoming week, we will upload a ZFS snapshot containing already-replayed versions of HAF and the HAF apps such as hivemind. This will be particularly helpful for nodes with slower processors, but even on fast servers it is likely to be faster to download such a snapshot than to do a local replay. However, nodes that have already replayed rc7 should be able to upgrade to rc8 relatively painlessly, and that is the recommended procedure for such nodes (only hafbe, if previously installed, will need a replay).
What’s next?
Since I think rc8 will be the official release of the stack, we’re now starting to turn our attention to future development plans. Our plans still aren’t fully formed, but here’s a sample of things we’re already looking at:
- Begin next phase development of the HAF-based smart contract environment.
- Create a HAF app to support creation and maintenance of “lite” accounts at the 2nd layer.
- Create a “lite node” version of hived that doesn’t require local storage of a full block_log. This will also reduce storage needs of HAF apps and Hive API nodes.
- Further improvements to the “sql_serializer” plugin that takes blockchain data from hived and adds it to a HAF database.
- A long-overdue overhaul of Hivemind code to bring its architecture more in line with HAF-based coding practices. This should improve both replay time and API server performance, and make the code easier to maintain.
- Continue refining the new WAX library for developing lightweight Hive apps.
- Finish up the new command-line wallet (Clive).
- Finish up the GUI for the HAF-based block explorer.
- Finish up Denser (a replacement for the Condenser social media app).
- Finish extracting the reputation API from hivemind into a separate HAF app.
- Update and reorganize documentation of just about everything.