Service looks stable and everyone is back online and chatting. We are still working on recovering the search APIs.
Posted Dec 07, 2019 - 14:35 PST
Traffic has been restored to its usual levels, and most users are back in. Our search API is currently down and our engineers are working on recovering it, but otherwise service looks stable.
We are following up with Google for a postmortem of the underlying issue, and are monitoring.
Posted Dec 07, 2019 - 14:01 PST
We've started opening the floodgates and are letting users back in. Recovery is proceeding as planned, and we will continue to ramp up traffic levels and monitor.
Posted Dec 07, 2019 - 13:36 PST
Google has updated their status page:
> Current data indicates that approximately 0.01% of PD-SSD volumes in us-east1-b are affected by this issue, from a peak of approximately 8.45%.
> At this time, the issue is contained, and we can confirm that this is only impacting us-east1-b.
We are beginning to work on service recovery on our end. Please stand by!
Posted Dec 07, 2019 - 13:06 PST
Google's Engineering team continues to work on mitigating the issue, and has posted the following update:
> Mitigation work is currently underway by our engineering team.
> We do not have an ETA for mitigation at this point.
We have all hands on deck awaiting the resolution of this issue to begin restoring service.
Posted Dec 07, 2019 - 11:58 PST
We continue to observe elevated IO latency. Google has finally gotten to updating their status page. We continue to await resolution.
Google is currently investigating an issue with SSD Persistent Disk in our region (what our database clusters store their data on). We are awaiting their resolution.
Given the IO starvation, we are expecting continued API latency, as most, if not all, of our datastores are currently degraded. We will post updates as we get them.
Posted Dec 07, 2019 - 10:54 PST
Multiple engineers are online, investigating the issue. We are noticing anomalously high iowait across most of our database clusters and other instances, leading us to believe that this is a Google Cloud issue. We are working on a mitigation; however, latency is expected to be high as we are currently starved for IO resources.
Posted Dec 07, 2019 - 10:37 PST
We're experiencing an elevated level of API errors and are currently looking into the issue.