We're good again. We've made additional configuration changes that should reduce the likelihood of a recurrence.
We believe this issue is unrelated to the earlier issues, even though the end-user effect is the same. Essentially, GCP decided to migrate 50% of the cluster at the same time, which caused a large dip in throughput; the cluster then degraded and required manual intervention.
Posted Sep 30, 2018 - 10:57 PDT
We're having a slight struggle again with DMs and Friends Lists.