We still have one database machine that is intentionally offline at this point. We are waiting until we're past peak traffic before bringing it back online. We have also disabled some internal functionality that is not user-impacting in order to shed load from the databases.
We're going to mark this incident as resolved, as the user impact has ended and, barring another incident, we do not expect it to recur.
Posted Jan 12, 2019 - 13:08 PST
We are continuing to see impact on one of our MongoDB machines. This is what kicked off the problem, and it is causing the recovery to be slow. Google has confirmed this is a problem with their storage layer and they are escalating internally to try to achieve resolution.
Separately, we have identified some slowness being caused by a recent change we made to migrate data out of MongoDB and into Postgres. We have overloaded the Postgres setup and it is responding slowly. We are expanding that database now to provide more breathing room.
Posted Jan 12, 2019 - 12:03 PST
The service should be nearly recovered. We are still monitoring some additional latency causing message sends to be a little slower than usual, but this should clear up as the database caches warm up.
We will be investigating the root cause as well as working to understand why recovery took so long. We'll post another update when the incident is resolved, or if we have further information.
Posted Jan 12, 2019 - 11:53 PST
We have resolved the issue and are letting people back in slowly.
Posted Jan 12, 2019 - 11:15 PST
We are aware of the issue and trying to resolve it.
Posted Jan 12, 2019 - 10:56 PST
We are looking into why users cannot connect to Discord. Connected users should be fine.