Google blames a User ID Service error for last week’s outage

Google LLC said today that a simple “zero” error was responsible for knocking its global authentication system offline, preventing users from accessing Gmail, YouTube and its cloud services for more than an hour last week.

The company said one day after the Dec. 14 outage that its preliminary analysis had traced the incident to a problem with its automated storage quota management system. That problem, Google said, reduced the capacity of its central identity management system, blocking people’s access to services that require them to sign in.

The interruption lasted only about an hour, but it was noticed by millions of people around the world. It also affected thousands of companies that rely on Google Cloud Platform for computing resources. That is clearly bad for business, since the reliability and availability of cloud services are among the most important concerns of any enterprise.

Google’s full incident report, released Tuesday, shows that the problem was caused by what it calls a “zero” error generated by the legacy storage quota system it uses to automatically allocate storage for its authentication system.

“As part of an ongoing migration of the User ID Service to a new quota system, a change was made in October to register the User ID Service with the new quota system, but parts of the previous quota system were left in place that incorrectly reported the usage for the User ID Service as 0,” the report said. “As a result, the quota for the account database was reduced, which prevented the Paxos leader from writing. Shortly thereafter, most read operations became outdated, resulting in errors in authentication lookups.”
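The chain of events in that passage (a reduced quota, a leader that can no longer write, and reads that then become outdated) can be illustrated with a small sketch. It is a toy model under stated assumptions, not Google’s implementation: the `AccountStore` class, the byte counts and the 60-second staleness limit are all hypothetical, and the model uses a single store rather than a real Paxos group.

```python
from dataclasses import dataclass
from typing import List


class QuotaExceeded(Exception):
    """Raised when a write would push stored bytes past the allowed quota."""


class StaleData(Exception):
    """Raised when the newest committed data is too old to be trusted."""


@dataclass
class AccountStore:
    # Every field name and default below is hypothetical, for illustration only.
    quota_bytes: int
    used_bytes: int = 0
    last_write_time: float = 0.0
    max_staleness_s: float = 60.0

    def write(self, size_bytes: int, now: float) -> None:
        """Leader write: refused once usage would exceed the quota."""
        if self.used_bytes + size_bytes > self.quota_bytes:
            raise QuotaExceeded("write rejected: quota too small")
        self.used_bytes += size_bytes
        self.last_write_time = now

    def lookup(self, now: float) -> str:
        """Authentication lookup: refused when the data is outdated."""
        if now - self.last_write_time > self.max_staleness_s:
            raise StaleData("lookup rejected: data is outdated")
        return "ok"


def simulate() -> List[str]:
    """Replay the failure sequence and record what happens at each step."""
    store = AccountStore(quota_bytes=1_000)
    events = []

    store.write(400, now=0.0)
    events.append(store.lookup(now=10.0))  # data is fresh, lookup succeeds

    store.quota_bytes = 100  # quota is cut below the real usage of 400 bytes
    try:
        store.write(10, now=20.0)
    except QuotaExceeded:
        events.append("write blocked")

    try:
        store.lookup(now=120.0)  # last write was 120 s ago, limit is 60 s
    except StaleData:
        events.append("lookup failed")
    return events
```

Calling `simulate()` returns `["ok", "write blocked", "lookup failed"]`: the lookup works while writes keep the data fresh, then fails once the quota cut has blocked writes for longer than the staleness limit.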

The Google User ID Service maintains a unique identifier for each Google account and handles authentication credentials for OAuth tokens and cookies, which are used to log people into services without entering their password each time. This data is stored in a distributed cloud database that uses the Paxos protocol to coordinate updates by reaching consensus on which data values to commit.

“For security reasons, this service will reject requests when it detects outdated data,” Google said. “An existing grace period on enforcing quota restrictions delayed the impact, which eventually expired, triggering automated quota systems to reduce the quota allowed for the User ID Service and triggering this incident. There are existing safety checks to prevent many unintended quota changes, but at the time they did not cover the scenario of zero reported load for a single service.”
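To see why a check against unintended quota changes can miss a service that reports zero load, consider the following minimal sketch. The function names, the 1.5x headroom factor and both checks are assumptions made for illustration; the report quoted here does not describe how Google’s checks are implemented.

```python
from typing import List, Optional


def propose_quota(current_quota: int, reported_usage: int,
                  headroom: float = 1.5) -> int:
    """Hypothetical automated policy: size the quota from reported usage,
    never raising it above the current value."""
    return min(current_quota, int(reported_usage * headroom))


def check_quota_change(new_quota: int, reported_usage: int,
                       last_known_usage: Optional[int] = None) -> List[str]:
    """Return the reasons to block a change; an empty list means it is allowed."""
    problems = []
    # Assumed existing check: never lower the quota below usage.
    # It relies on reported_usage being accurate.
    if new_quota < reported_usage:
        problems.append("quota below reported usage")
    # Assumed additional guard: a service that used storage before and now
    # reports exactly zero is more likely misreporting than idle.
    if (last_known_usage is not None and last_known_usage > 0
            and reported_usage == 0):
        problems.append("usage reported as zero for an active service")
    return problems
```

With usage misreported as 0, `propose_quota(current_quota=1_000, reported_usage=0)` yields a quota of 0, and `check_quota_change(0, reported_usage=0)` returns an empty list because a quota of 0 is not below a reported usage of 0. Only when the earlier usage is supplied, as in `check_quota_change(0, reported_usage=0, last_known_usage=400)`, does the zero-usage guard block the change.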

Google’s report also covers the outage’s impact on Google Cloud Storage, Google Cloud Networking, Google Kubernetes Engine, Google Workspace (formerly G Suite) and Google’s cloud support services. It said that “all authenticated Google Workspace applications were down at the time of the incident.” In addition, about 4% of GKE control plane API requests failed, and almost all workloads managed by customers and by Google failed to report metrics to Cloud Monitoring.

The report concluded that most authenticated services on Google Cloud and Google Workspace saw an “increased error rate,” and that all services requiring users to sign in with a Google account were affected to varying degrees.
