On Thursday, June 12, 2025, beginning around 10:51 AM PDT (1:51 PM EDT), Google Cloud experienced a major global service disruption lasting approximately 7.5 hours, with full recovery confirmed by 6:18 PM PDT (9:18 PM EDT).
1. Root Cause: Bad Code + Missing Protection
- The incident traced back to a new Service Control feature deployed on May 29 to support enhanced quota-policy checks. The code lay dormant until a policy change on June 12 activated it unexpectedly.
- The new code path shipped without a feature flag, and it hit a null-pointer exception when processing a policy record containing blank fields, crashing the core Service Control binary globally.
- Attempts to restart the service across regions then triggered a "herd effect" that overwhelmed underlying infrastructure, notably in us-central1, prolonging recovery there.
- Google confirmed the failure stemmed from an invalid automated quota update propagated through the API management control plane.
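The failure mode above can be sketched in miniature. The snippet below is purely illustrative (Service Control is not written in Python, and every name here is hypothetical): an unguarded check crashes when a policy record arrives with a blank field, while a flag-gated, defensively coded version fails open to legacy behavior instead of taking down the binary.

```python
# Hypothetical sketch, not Google's actual code: how a record with a
# blank field crashes an unguarded code path, and how a feature flag
# plus error handling contains the failure.

FEATURE_FLAGS = {"enhanced_quota_checks": False}  # assumed flag registry

def check_quota_policy(policy: dict) -> bool:
    """Unsafe version: crashes on a policy record with a blank field."""
    # policy["limits"] may be None for a malformed record; iterating it
    # raises, analogous to the null-pointer dereference in the incident.
    return all(limit >= 0 for limit in policy["limits"])

def check_quota_policy_guarded(policy: dict) -> bool:
    """Guarded version: flag-gated and resilient to malformed records."""
    if not FEATURE_FLAGS["enhanced_quota_checks"]:
        return True  # new path disabled: keep legacy behavior
    limits = policy.get("limits")
    if limits is None:
        # Malformed policy: log and fail open instead of crashing.
        return True
    return all(limit >= 0 for limit in limits)

bad_policy = {"name": "example", "limits": None}  # blank/unpopulated field
```

Had the real code path been behind such a flag, a staged rollout could have exposed the crash before the policy change activated it everywhere at once.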
2. Timeline & Impact
| Time (PDT) | Event |
|---|---|
| 10:51 AM | Outage begins: Service Control crashes on a null-pointer exception. |
| ~10 minutes later | Root cause identified; engineers start mitigation. |
| ~40 minutes later | Initial recovery across most regions; us-central1 still lagging. |
| ~2 hours later | API bypass mitigations in place; most global regions restored. |
| 6:18 PM | Full restoration across all Google Cloud services confirmed. |
3. Ripple Effects Across Google and Third-Party Services
The outage rippled across the web:
- Google services like Search, Gmail, Drive, Meet, Calendar, Cloud Memorystore, BigQuery, Vertex AI, Nest, and more faced elevated error rates and downtime.
- Third-party platforms—Spotify, Discord, Snapchat, Twitch, GitHub, Replit, Elastic, LangChain, Character.ai, UPS, DoorDash, Etsy, Shopify, IKEA and others—saw partial or complete service outages.
- Cloudflare’s Workers KV service, reliant on Google Cloud’s storage backend, also failed—triggering cascading failures in distribution of assets, authentication, and service requests.
- Market reaction: Google’s stock dropped ~1%, Cloudflare lost nearly 5% during the outage.
Downdetector recorded tens of thousands of outage reports—for example, over 46,000 for Spotify and ~11,000 for Discord in the U.S. at peak.
4. Google’s Response & Lessons Learned
- In its incident report, Google acknowledged the lack of feature‑flag protection and robust error handling, stating that if protections had been in place, the issue would have been caught during staging.
- Google pledged operational improvements including:
- Stronger feature‑flagging and error‑handling in API management.
- Enhanced recovery resilience during herd‑effect scenarios.
- Improved incident communication, with faster alerts to customers—even during degraded monitoring conditions.
- Cloudflare is moving its KV storage off Google Cloud to its own R2 infrastructure to reduce future dependency.
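A standard guard against the herd effect mentioned above is randomized exponential backoff: instead of every restarting task retrying at the same instant, each waits a randomly jittered, exponentially growing delay. A minimal sketch, with hypothetical names and parameters:

```python
# Illustrative "full jitter" backoff: spreads retries out in time so
# recovering tasks do not hammer shared infrastructure simultaneously.
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Return a delay drawn uniformly from [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

Because each task draws its own random delay, retry traffic arrives as a smeared-out trickle rather than a synchronized wave, which is the behavior Google says it wants during recovery scenarios.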
5. Takeaways for Users & Businesses
- Cloud resilience matters: even top-tier providers can suffer cascading failures.
- Multi‑cloud or redundancy strategies can mitigate correlated outages.
- Monitor provider status pages actively—automated alerts may lag in major incidents.
- Prepare post‑mortems and operational improvements: every service outage is also a chance to learn and evolve.
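The redundancy strategy in the takeaways above can be as simple as an ordered failover: try the primary provider, fall back to a secondary so a correlated outage in one cloud degrades the service rather than halting it. A minimal sketch (all names hypothetical):

```python
# Illustrative multi-provider failover: each provider is a callable that
# returns a value or raises; we try them in order of preference.
def fetch_with_failover(key, providers):
    """Return the first successful result; raise only if all providers fail."""
    errors = []
    for fetch in providers:
        try:
            return fetch(key)
        except Exception as exc:  # provider outage, timeout, etc.
            errors.append(exc)
    raise RuntimeError(f"all providers failed: {errors}")
```

This is essentially the dependency Cloudflare lacked when Workers KV sat on a single storage backend, and the pattern its move to R2 is meant to restore control over.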
✅ In Summary
The June 12 Google Cloud outage was triggered by a faulty quota‑policy change that crashed Service Control and disrupted services worldwide. It lasted roughly seven and a half hours, affecting Google's own products and a host of third‑party apps and platforms. Recovery required disabling the offending code path and stabilizing overloaded infrastructure. Google and its partners are taking steps to improve code safety, infrastructure resilience, and transparency to prevent future incidents.
