Our current setup features several RPC nodes behind a load balancer. While this works fine for most cases, our nodes are stateful: when we want to increase serving capacity during peak times by adding a new node, that node takes several hours to catch up with the latest state of the blockchain.
This quirk cripples our ability to maneuver with the freedom we would like. Scaling up takes so long that by the time the new node becomes available, the need for it might be gone. Scaling down is more painful than it has to be: we don't want to give up those extra serving resources, but at the same time we might be over-serving, making the whole system more expensive to run.
With that in mind, we have two main problems to solve. First, we need to make our nodes stateless so we can scale up and down based on demand, optimizing costs. Second, we need to utilize our resources more efficiently by allocating more than one node on the same server (i.e., EC2 instance).
If we want to make a node stateless, we have to shift from on-disk storage to remote storage, so that more than one node can connect to the same DB, removing the need to catch up every time a node restarts. There is an open PR adding TiKV support for storage; we could use it for our intended purpose.
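As a rough sketch of what this decoupling could look like (the `Storage` interface below is hypothetical and assumes nothing about the open PR's actual API), the node would talk to its state through an abstraction that can be backed by local disk today and a remote store such as TiKV later:

```go
package storage

import "context"

// Storage abstracts the node's state backend so the same node code can run
// against local disk or a remote store such as TiKV. This interface is
// illustrative; the actual PR may define the storage layer differently.
type Storage interface {
	Get(ctx context.Context, key []byte) ([]byte, error)
	Put(ctx context.Context, key, value []byte) error
	Delete(ctx context.Context, key []byte) error
}
```

With the backend behind an interface like this, any number of nodes could point at the same remote DB, and a restarted node would come up with no local catch-up phase.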
One open question remains: if two or more nodes are connected to the same DB, would that cause race conditions? If so, we could deploy in two stages. First, have a full node that catches up with blockchain updates and stores them in the remote DB. Second, have light RPC nodes, a lightweight version of a full node whose sole purpose is to connect to the DB, serve read calls, and forward write calls to the full RPC nodes, which will write them to the blockchain.
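A minimal sketch of that second stage, assuming a JSON-RPC-over-HTTP interface (the port, the upstream URL, and the `serveFromSharedDB` helper are illustrative, not part of any existing codebase):

```go
package main

import (
	"bytes"
	"encoding/json"
	"io"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// Methods that mutate state are forwarded to a full RPC node;
// everything else is answered straight from the shared remote DB.
var writeMethods = map[string]bool{
	"eth_sendRawTransaction": true,
	"eth_sendTransaction":    true,
}

func main() {
	// Hypothetical address of the full (writing) RPC node group.
	fullNode, err := url.Parse("http://full-rpc.internal:8545")
	if err != nil {
		log.Fatal(err)
	}
	forward := httputil.NewSingleHostReverseProxy(fullNode)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		// Restore the body so it can be re-read by the proxy below.
		r.Body = io.NopCloser(bytes.NewReader(body))

		var req struct {
			Method string `json:"method"`
		}
		if err := json.Unmarshal(body, &req); err != nil {
			http.Error(w, "invalid JSON-RPC request", http.StatusBadRequest)
			return
		}

		if writeMethods[req.Method] {
			forward.ServeHTTP(w, r) // write call: hand off to a full node
			return
		}
		serveFromSharedDB(w, req.Method, body) // read call: answer locally
	})

	log.Fatal(http.ListenAndServe(":8545", nil))
}

// serveFromSharedDB would look the answer up in the remote store (e.g. TiKV);
// stubbed out here since the storage layer is still an open question.
func serveFromSharedDB(w http.ResponseWriter, method string, rawReq []byte) {
	w.Header().Set("Content-Type", "application/json")
	w.Write([]byte(`{"jsonrpc":"2.0","id":1,"result":null}`))
}
```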
This way, we can scale the light RPCs up and down with ease while keeping their resource consumption somewhat limited, so we can allocate more RPC nodes on the same server; more on that below.
This architecture also lets us distribute resources based on hot domains or hot endpoints; in the example below, we create another set of light RPC nodes that only serve getLogs. Bear in mind this is a standard light RPC node that just happens to serve only getLogs requests. Each group scales independently of the others, and they are all served through the same load balancer (one way to do that is sketched right after this paragraph).
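For illustration, here is one way the single entry point could route by JSON-RPC method (the group hostnames and port are made up; in practice this logic could also live in an off-the-shelf L7 load balancer rather than in our own proxy):

```go
package main

import (
	"bytes"
	"encoding/json"
	"io"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// Hypothetical upstream groups behind one entry point: a dedicated pool
// for eth_getLogs and a general pool for everything else.
var (
	getLogsProxy = httputil.NewSingleHostReverseProxy(mustParse("http://rpc-getlogs.internal:8545"))
	generalProxy = httputil.NewSingleHostReverseProxy(mustParse("http://rpc-general.internal:8545"))
)

func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}

// pickProxy sends the hot endpoint to its own pool so each group
// can scale on its own load, independently of the others.
func pickProxy(body []byte) *httputil.ReverseProxy {
	var req struct {
		Method string `json:"method"`
	}
	if json.Unmarshal(body, &req) == nil && req.Method == "eth_getLogs" {
		return getLogsProxy
	}
	return generalProxy
}

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		// Restore the body so the chosen proxy can forward it unchanged.
		r.Body = io.NopCloser(bytes.NewReader(body))
		pickProxy(body).ServeHTTP(w, r)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```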
The same principle could be applied to more domains or endpoints: a general RPC group, a blockchain RPC group with all the blockchain-related endpoints, and so on.
Now that we have talked about grouping domains to further optimize the utilization of our resources, more specifically CPU and memory, it is time to articulate how we can reuse resources more efficiently. For the most part, our nodes won't be running at full speed, so keeping one server (EC2) per node (RPC) is a waste of resources. We can build a cluster that allocates more than one node on the same server; if we run out of space, the cluster can add instances to fit more nodes when the need arises and release them otherwise.
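To make the packing idea concrete, here is a toy first-fit placement sketch (the capacities and node sizes are made-up numbers); in practice an orchestrator such as Kubernetes or ECS would do this placement for us, this is only to show the intuition:

```go
package main

import "fmt"

// Node is an RPC process with a CPU/memory request (arbitrary units).
type Node struct {
	Name     string
	CPU, Mem int
}

// Server is an EC2 instance; CPU/Mem track how much is already used.
type Server struct {
	CPU, Mem int
	Nodes    []Node
}

// placeFirstFit puts each node on the first server with room left,
// adding a new server only when no existing one can host it.
func placeFirstFit(nodes []Node, cpuCap, memCap int) []Server {
	var servers []Server
	for _, n := range nodes {
		placed := false
		for i := range servers {
			if servers[i].CPU+n.CPU <= cpuCap && servers[i].Mem+n.Mem <= memCap {
				servers[i].Nodes = append(servers[i].Nodes, n)
				servers[i].CPU += n.CPU
				servers[i].Mem += n.Mem
				placed = true
				break
			}
		}
		if !placed { // scale out: one more instance
			servers = append(servers, Server{CPU: n.CPU, Mem: n.Mem, Nodes: []Node{n}})
		}
	}
	return servers
}

func main() {
	nodes := []Node{{"rpc-1", 2, 4}, {"rpc-2", 2, 4}, {"logs-1", 1, 2}}
	for i, s := range placeFirstFit(nodes, 4, 8) {
		fmt.Printf("server %d hosts %d nodes (cpu %d/4, mem %d/8)\n",
			i, len(s.Nodes), s.CPU, s.Mem)
	}
}
```

With these numbers, two RPC nodes share the first instance and a third node triggers a second instance, which is exactly the add-when-full, release-when-idle behavior we want from the cluster.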
Note in the image below how we can move nodes interchangeably between servers (EC2); this is the powerful part. We can treat each node as an entirely disposable computational unit (nothing personal): we can restart it, remove it, or relocate it to a different server if needed. Our RPC API already supports graceful shutdown, which means that when the cluster decides it is time to shut down an RPC node, that node stops accepting new requests but still finishes what it is already processing.
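For reference, this is roughly what graceful shutdown looks like with Go's standard library, not necessarily our exact implementation (the port and the 30-second drain window are illustrative): on a termination signal, the server stops accepting connections but drains in-flight requests before exiting.

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8545"}

	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("listen: %v", err)
		}
	}()

	// Wait for the cluster to ask us to stop (SIGTERM on most orchestrators).
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, os.Interrupt)
	<-stop

	// Stop accepting new requests but let in-flight ones finish,
	// up to an (illustrative) 30-second drain window.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("forced shutdown: %v", err)
	}
}
```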