We will use our recent outages as an opportunity to strengthen the network and improve our processes. We have a number of lessons learned and action items that will receive our attention in the coming days, weeks, and months. Below is an evolving list that we will update to track our progress as improvements are made.
Development
Short Term
- [ ] signing rate dropping
- [ ] view change algorithm fix
- [ ] 30s block insertion investigation
- [ ] further p2p tuning
- [ ] p2p peer blacklist and whitelist feature
- [ ] fast syncing switch
- [ ] staking dashboard failures due to gas adjustment
- [ ] continual adjustments to optimize RPC performance
Long Term
- [ ] libp2p upgrade to latest version
- [ ] p2p stress/load test (network RPC load test)
- [ ] replace gossipsub with our own p2p broadcasting
- [ ] global broadcasting vs in-shard communication design
- [ ] review areas for improvement in the change process
Operation
Monitoring
- [ ] additional lower-level p2p metrics on senders and messages