The root cause here was a deployment of our API system, which contained a bug in some new code that was creating too many database queries against one of our Postgres backends. This query was in the path of sending messages on Discord, which resulted in highly elevated latency on our API cluster and in particular on sending messages.
We rolled back the code and reverted it, and tomorrow engineering will fix the issue and add some more safeguards around the particular part of the database library that caused the issue. (It is a system designed to basically coalesce queries so we can batch them to the database, but unfortunately in this case the batching was broken and we were generating many, many more queries than expected.)
Posted Jun 17, 2020 - 22:18 PDT
Monitoring
The rollback has been completed, and service should be mostly back to normal.
Posted Jun 17, 2020 - 21:55 PDT
Identified
We have identified an issue with a bad deployment causing too much load on a database. We are rolling this back now.