The root cause was on us. We were performing maintenance on our API cluster and unintentionally reduced the API cluster capacity too far, leading to latency. Unfortunately, this also caused our Envoy front proxy to become unhappy and get into an unrecoverable state.
We resolved the incident by doing a rolling restart of the Envoy proxy after we fixed the API cluster capacity to handle user traffic.
Posted Apr 03, 2020 - 01:36 PDT
Latency and message send rate should be back to normal. We are still monitoring and will resolve this incident once we're confident everything is 100%.
Posted Apr 03, 2020 - 00:51 PDT
We are still working to resolve the latency that this situation has caused. Engineers are online and working the problem and we hope to have service restored in a few minutes.
Posted Apr 03, 2020 - 00:43 PDT
This was related to a maintenance we were performing, we unintentionally overloaded our API cluster. We are working on recovering now.