Connectivity Issues to Discord
Incident Report for Discord
Postmortem

At Discord we take service uptime very seriously, and are constantly working with our hosts, bandwidth providers and vendors to ensure that things don't go down.

In this incident, a router misconfiguration caused Anycast traffic to be pulled to CloudFlare's SOF PoP, causing connectivity issues across Europe and North America. Approximately 20% of users who were connected at the time of the incident were unable to connect for up to seven minutes. Following that period, 5% of the users who had been disconnected experienced DNS resolution issues for up to 27 minutes after service was restored.

CloudFlare's System Reliability team identified the issue almost immediately, and was able to resolve it quickly. CloudFlare’s network team is taking steps to improve tooling to disable a carrier globally when manual intervention is required. Engineering work continues on methods to automatically route around major disruptive incidents.

Check out CloudFlare's blog on the matter: https://blog.cloudflare.com/a-post-mortem-on-this-mornings-incident/

Posted Jun 16, 2016 - 14:25 PDT

Resolved
We have isolated the issue that caused tonight's outage to a suspected route leak outside of our network, and have confirmed that it is now fully resolved. CloudFlare is contacting the affected upstream providers to gather more information. We will update this incident with a post-mortem once we have all the details.
Posted Jun 11, 2016 - 05:05 PDT
Monitoring
According to our metrics, as of 4:23 AM PDT it looks like just about everyone is able to connect again. We are still investigating the root cause of the five-minute disconnects and the lingering connection and DNS resolution errors for the remaining users.

If you are still unable to connect, tweet us at https://twitter.com/discordapp or e-mail support@discordapp.com so we can look into it further.
Posted Jun 11, 2016 - 04:38 PDT
Update
We have isolated the issue to a DNS resolution failure when connecting to our gateway. We are working with CloudFlare to resolve it. It looks like more users are gradually recovering.
Posted Jun 11, 2016 - 04:27 PDT
Identified
We're aware of connectivity issues on several ISPs that affected less than 20% of our users for a brief period of five minutes, from 3:51 to 3:57 AM PDT. We are investigating. It looks like most people have been able to reconnect so far; however, some users are still experiencing connectivity issues.
Posted Jun 11, 2016 - 04:14 PDT