Global Microsoft Outage Cripples Critical Services: What You Need to Know

July 19, 2024

We just witnessed one of the most significant global software-induced outages ever seen. Airports, hospitals, pharmacies, flight operators, train services, TV broadcasters, and supermarkets are all facing disruptions. Here's what we know so far:

Today, July 19th, 2024, a massive number of users worldwide encountered, or continue to face, the dreaded Windows Blue Screen of Death, disrupting work, public services, and overall productivity. The outage specifically affects Windows machines that use CrowdStrike for endpoint protection, which includes antivirus, firewall, intrusion detection, encryption, and application control.

CrowdStrike, which holds a sizable share of the Windows endpoint protection market, pushed a routine software update to Windows machines globally. Unfortunately, this update caused Windows systems to crash, rendering them inoperable.

Normally, buggy code can be reverted by pushing a previous, stable version. In this case, however, the affected machines cannot boot far enough to receive a corrected update, which complicates recovery. CrowdStrike has advised a manual fix: boot each machine into safe mode, delete a specific file, and reboot. This process is time-consuming and must be done individually on every affected machine.

What's baffling is how CrowdStrike released this update globally without a staged rollout, a standard practice in change and release management that limits how far a bad change can spread. No responsible cybersecurity company or SaaS vendor should bypass such precautions, especially for software that critical infrastructure depends on.

This incident can be likened to a "YOLO deployment," an approach where changes are pushed live without extensive testing, on the assumption that any problems can be quickly rolled back. For software that protects essential systems, that approach is irresponsible and dangerous.

The repercussions of this outage are likely to be significant, potentially affecting global GDP and severely damaging CrowdStrike's reputation. The key lesson is the importance of staged rollouts and canary deployments, especially for software used in critical infrastructure. Skipping these steps can lead to catastrophic failures, as this incident shows.
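
To make the idea of a staged (canary) rollout concrete, here is a minimal sketch in Python. It is illustrative only and does not describe CrowdStrike's actual release pipeline. The names `staged_rollout`, `RolloutResult`, `deploy`, and `is_healthy`, along with the stage fractions, are hypothetical and chosen for this example. The update goes to a small slice of hosts first, and the rollout stops as soon as a stage fails its health check.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class RolloutResult:
    completed: bool        # True only if every stage passed its health check
    updated: List[str]     # hosts that received the update, in order
    halted_at_stage: int   # index of the failing stage, or -1 if none failed


def staged_rollout(
    hosts: Sequence[str],
    stage_fractions: Sequence[float],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> RolloutResult:
    """Deploy to a growing fraction of hosts, halting on the first unhealthy stage.

    `deploy` and `is_healthy` are hypothetical hooks supplied by the caller;
    `stage_fractions` are cumulative (e.g. 0.01 means "1% of the fleet so far").
    """
    updated: List[str] = []
    for stage, fraction in enumerate(stage_fractions):
        # Cumulative number of hosts that should be updated after this stage.
        target = min(len(hosts), max(1, math.ceil(len(hosts) * fraction)))
        batch = list(hosts[len(updated):target])
        if not batch:
            continue
        for host in batch:
            deploy(host)
        updated.extend(batch)
        # Check only the hosts touched in this stage before widening the rollout.
        failures = sum(1 for host in batch if not is_healthy(host))
        if failures / len(batch) > max_failure_rate:
            return RolloutResult(False, updated, stage)
    return RolloutResult(True, updated, -1)


if __name__ == "__main__":
    fleet = [f"host-{i:03d}" for i in range(100)]
    # Simulate an update that crashes every machine it reaches.
    result = staged_rollout(
        fleet,
        stage_fractions=(0.01, 0.10, 0.50, 1.0),
        deploy=lambda host: None,
        is_healthy=lambda host: False,
    )
    print(result.completed, result.halted_at_stage, len(result.updated))
    # prints: False 0 1
```

In this simulation the faulty update reaches one machine out of a hundred before the rollout halts, rather than the whole fleet. A real pipeline would also wait between stages and rely on richer health signals, such as whether updated machines still boot and report in.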
