S2 - https://harmonyone.pagerduty.com/incidents/Q1WXHGZB39IFPC - 11 min down
S1 - https://harmonyone.pagerduty.com/incidents/Q039HJKM2PHRMH - 47 min down
Both incidents happened at the same time, at the epoch change. Both shards eventually recovered without any special intervention, simply by waiting for the validators to catch up their beacon shard DB.
During S1 troubleshooting, the internal validator was at beacon chain block 24739505 and the next epoch block was at 24739840. Network voting power was at 59%, which explains why we lost consensus. Once the validator reached the epoch block and we saw that consensus was back, voting power was at 98%. Besides the internal node, external nodes were definitely having the same issue as well.
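The numbers above can be checked with a short sketch. It only restates the figures from this incident: the 66.66% threshold, the two voting power readings, and the two block heights. The helper names `has_quorum` and `blocks_until` are hypothetical and not part of any Harmony tooling, and treating quorum as strictly more than two thirds of voting power is an assumption.

```python
# Assumption: consensus needs strictly more than 2/3 (66.66%) of voting power.
QUORUM_FRACTION = 2 / 3


def has_quorum(voting_power_pct: float) -> bool:
    """Return True if the given voting power (in percent) exceeds the quorum."""
    return voting_power_pct / 100 > QUORUM_FRACTION


def blocks_until(current_block: int, target_block: int) -> int:
    """Number of blocks a node still has to sync before reaching target_block."""
    return max(target_block - current_block, 0)


# Figures observed during S1 troubleshooting.
lag = blocks_until(24739505, 24739840)
print(f"blocks behind the epoch block: {lag}")
print(f"quorum at 59%: {has_quorum(59)}")
print(f"quorum at 98%: {has_quorum(98)}")
```

With the observed values, the internal validator was 335 blocks short of the epoch block, 59% is below quorum, and 98% is above it.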
Why did S1 and S2 lose consensus?
Why did shard voting power drop below 66.66%?
Why were validator nodes lagging behind?
beacon out of sync
are more important when a validator node is impacted. Change the watchdog alert severity for the explorer node from high to medium so that validator beacon out of sync alerts are easier to see. Watchdog issue: ‣