Crowdstruck: The CrowdStrike Incident of July 19, 2024.

On July 19, 2024, a faulty update from CrowdStrike, one of the most trusted names in endpoint protection, caused widespread outages at organizations around the world. The failures hit everyone from small businesses to major enterprises and left IT departments scrambling to restore services.

This blog post explores the root cause of the CrowdStrike incident, the impact it had on various sectors, and the steps organizations can take to protect themselves from similar issues in the future.

What Happened on July 19, 2024?

CrowdStrike is best known for its Falcon platform, a cloud-native cybersecurity product widely used by enterprises to protect against cyber threats. On the morning of July 19, 2024, however, a faulty configuration update was pushed to Windows systems running the Falcon Sensor software. The update was a routine change to the sensor’s threat-detection configuration, but it caused a severe malfunction instead.

The faulty configuration file caused the Falcon sensor to perform an “out-of-bounds memory read,” which crashed Windows. As a result, many machines showed a blue screen and then failed to boot properly, got stuck in boot loops, or dropped into recovery mode.

The problem affected Windows hosts, including machines running Windows 10 and Windows 11. Linux and macOS users were unaffected because the problematic file was exclusive to Windows configurations.

Root Cause of the Incident

The root cause was traced to a single faulty update distributed through CrowdStrike’s Falcon platform. This update, intended for Windows hosts, changed the configuration that controls how the sensor screens named pipes, a mechanism Windows processes use to communicate with each other. When the sensor processed the new configuration, it triggered an invalid page fault that crashed the operating system.

The malfunction wasn’t a cyberattack or a malicious breach; it was a software defect exposed by the update. CrowdStrike responded swiftly and rolled back the faulty update within hours, but the damage had already been done: millions of devices were affected.
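
To make the phrase “out-of-bounds read” concrete, here is a minimal Python sketch. It is not CrowdStrike’s code: the names `read_field` and `rule_fits_input` and the sample data are hypothetical and exist only for illustration. The idea is that code should confirm an index falls inside the data it was actually given before reading from it. Python raises an exception on a bad index, but low-level code running inside the operating system kernel has no such safety net, so a bad read there can take down the whole machine.

```python
from typing import Optional, Sequence


def read_field(values: Sequence[str], index: int) -> Optional[str]:
    """Return values[index], or None when the index is outside the data."""
    if 0 <= index < len(values):
        return values[index]
    return None


def rule_fits_input(highest_index_used: int, values: Sequence[str]) -> bool:
    """Check up front that a rule never refers past the end of its input."""
    return highest_index_used < len(values)


supplied = ["alpha", "beta", "gamma"]  # illustrative input with 3 fields

print(read_field(supplied, 1))       # index 1 is in range
print(read_field(supplied, 3))       # index 3 is out of range, handled safely
print(rule_fits_input(3, supplied))  # a rule that needs a 4th field is rejected
```

Rejecting a mismatched rule before it is ever evaluated, as `rule_fits_input` does, is the cheaper of the two checks, because the problem is caught at validation time rather than on every read.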

Who Was Affected?

The scope of the incident was vast, affecting companies across multiple industries and geographies. Major sectors that rely heavily on Windows-based systems, such as government institutions, airlines, and healthcare, were the hardest hit. Some of the most significant impacts include:

  • Airlines: Major airlines like Delta, United, and American Airlines were forced to ground flights as their systems went down. As a result, passengers experienced delays and cancellations globally.
  • Government Services: Local government systems in cities such as San Antonio experienced outages, affecting operations and causing significant delays.
  • Cloud Services: Virtual machines running on popular cloud platforms like Microsoft Azure and Google Compute Engine also faced reboots and failures.

At the height of the disruption, Microsoft estimated that around 8.5 million Windows devices were affected by the faulty update. CrowdStrike emphasized that most personal Windows PCs were not impacted, because Falcon Sensor is typically used by large organizations.

The Financial and Operational Impact

The financial repercussions were substantial. Some estimates suggest that the incident cost businesses billions of dollars in lost productivity and operational downtime. One study of the broader impact estimated that the top 500 U.S. companies, excluding Microsoft, suffered close to $5.4 billion in losses, with only a fraction of those losses covered by insurance.

Beyond the immediate financial toll, the operational impact was significant. In many cases organizations had to restore systems by hand, because an affected machine could not pick up the fix without being rebooted, and some needed more drastic measures such as deleting specific system files.

For organizations using disk encryption such as BitLocker, recovery was even more complicated. Technicians needed a recovery key to get into each encrypted machine, and in some cases those keys were stored on the very servers that had crashed.

Steps to Recover from the Incident

CrowdStrike quickly acknowledged the problem and rolled back the faulty update. However, the rollback alone didn’t instantly fix all affected systems. Organizations had to undertake several steps to restore operations:

  1. Reboot the Systems: Machines needed to be rebooted while connected to a network, often several times, so they could download the reverted configuration file.
  2. Manual Intervention: Some systems, particularly those with additional security features such as disk encryption, required manual intervention. This included booting into safe mode and manually deleting the faulty file from the CrowdStrike driver directory.
  3. Technical Support: IT departments worked tirelessly to resolve the issue machine by machine, a process that took large organizations days to complete.
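
The manual deletion step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an official remediation tool: the function name `remove_matching_files` is hypothetical, and the directory and file pattern below reflect widely reported remediation guidance rather than anything stated in this post, so confirm them against CrowdStrike’s official instructions before use. The function defaults to a dry run that only lists what it would delete.

```python
from pathlib import Path
from typing import List

# Assumptions for illustration; verify against the vendor's official guidance.
CROWDSTRIKE_DIR = r"C:\Windows\System32\drivers\CrowdStrike"
CHANNEL_FILE_PATTERN = "C-00000291*.sys"


def remove_matching_files(directory: Path, pattern: str, dry_run: bool = True) -> List[Path]:
    """List files in `directory` that match `pattern`; delete them unless dry_run."""
    matches = sorted(p for p in directory.glob(pattern) if p.is_file())
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

On an affected machine booted into safe mode, a technician might first call `remove_matching_files(Path(CROWDSTRIKE_DIR), CHANNEL_FILE_PATTERN)` to review the matches, then repeat the call with `dry_run=False`. Keeping the dry run as the default reduces the chance of deleting the wrong file on a system that is already in a fragile state.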

What Organizations Can Do to Prevent Similar Incidents

While the CrowdStrike incident was an unexpected technical failure rather than a targeted attack, it served as a critical reminder of the risks that even trusted cybersecurity solutions can introduce. Here are a few essential steps organizations can take to mitigate those risks in the future:

1. Implement Staggered Rollouts for Updates

One of the biggest takeaways from this incident is the risk of global, simultaneous updates. Organizations should consider implementing staggered rollouts when deploying critical updates. By applying updates to a subset of systems first, IT teams can monitor for potential issues before rolling them out across the entire network.
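
A staggered rollout can be sketched as a loop over “rings” of hosts, where each ring must pass a health check before the next one receives the update. The code below is a minimal sketch, not a production deployment tool: `staggered_rollout`, the `deploy` and `is_healthy` callbacks, and the host names are all hypothetical, and a real rollout would also wait for a soak period between rings.

```python
from typing import Callable, Dict, List, Sequence


def staggered_rollout(
    rings: Sequence[Sequence[str]],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> Dict[str, List[str]]:
    """Deploy ring by ring and stop once a ring exceeds the failure threshold."""
    updated: List[str] = []
    failed: List[str] = []
    skipped: List[str] = []
    halted = False
    for ring in rings:
        if halted:
            skipped.extend(ring)  # later rings never receive the update
            continue
        ring_failures = 0
        for host in ring:
            deploy(host)
            if is_healthy(host):
                updated.append(host)
            else:
                failed.append(host)
                ring_failures += 1
        if ring and ring_failures / len(ring) > max_failure_rate:
            halted = True
    return {"updated": updated, "failed": failed, "skipped": skipped}


# Example: one canary host fails its health check, so production is skipped.
rings = [["canary-1", "canary-2"], ["prod-1", "prod-2", "prod-3"]]
deployed: List[str] = []
result = staggered_rollout(rings, deployed.append, lambda host: host != "canary-2")
print(result)
```

In this example the failure is contained to the two canary hosts, and the three production hosts are never touched. That containment is the whole point of staging an update.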

2. Backup and Disaster Recovery Plans

Organizations must maintain robust backup and disaster recovery solutions. Regularly backing up important data and ensuring that systems are recoverable in case of an emergency is critical to minimizing downtime. Cloud-based solutions can help reduce reliance on vulnerable local systems.

3. Diversify System Platforms

While it’s impractical for most organizations to shift away from Windows entirely, diversifying the systems used across an organization can help reduce the overall impact of a single system failure. Incorporating Linux or macOS environments into a network may help spread out the risk in case of platform-specific incidents like this one.

4. Educate and Train IT Teams

Training IT teams on how to respond to both security incidents and technical failures is crucial. IT staff should be well-versed in troubleshooting and recovery techniques, including how to revert faulty updates and how to recover machines protected by disk encryption such as BitLocker. This can dramatically reduce downtime during future crises.

5. Vendor Management and Negotiations

Organizations should have clear terms and expectations laid out with their cybersecurity vendors. The CrowdStrike incident highlighted the importance of understanding vendor liability in case of service interruptions. Businesses should negotiate favorable terms that go beyond simple refunds, including more robust service-level agreements (SLAs) and compensation in case of incidents.


6. Stay Updated on Patch Management

Patch management should be a priority for any organization. A proper procedure for evaluating, testing, and deploying patches makes it far more likely that problems are discovered before they wreak havoc across a system. This involves using sandbox environments to test updates before applying them to mission-critical infrastructure.

7. Leverage Automation for Monitoring

Using automated tools to monitor system health and performance can help IT teams catch issues before they escalate. Automated monitoring can detect unusual behavior triggered by an update, such as a sudden rise in crashes, allowing swift corrective action before a malfunction becomes a company-wide issue.
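
One simple form of automated monitoring compares the crash rate among freshly updated hosts with a normal baseline and raises a flag when the difference is too large to be noise. The sketch below is illustrative only: `should_halt_update` and its thresholds (`min_hosts`, `multiplier`, `floor`) are hypothetical values chosen for the example, not recommendations from any vendor.

```python
def crash_rate(crashed: int, total: int) -> float:
    """Fraction of hosts that crashed; 0.0 when there are no hosts."""
    return crashed / total if total else 0.0


def should_halt_update(
    baseline_rate: float,
    crashed_after: int,
    hosts_updated: int,
    min_hosts: int = 20,
    multiplier: float = 3.0,
    floor: float = 0.01,
) -> bool:
    """Flag an update whose post-update crash rate clearly exceeds the baseline."""
    if hosts_updated < min_hosts:
        return False  # too few data points to judge
    observed = crash_rate(crashed_after, hosts_updated)
    # The floor keeps a near-zero baseline from making the check hypersensitive.
    threshold = max(baseline_rate * multiplier, floor)
    return observed > threshold


print(should_halt_update(0.002, crashed_after=15, hosts_updated=100))
print(should_halt_update(0.002, crashed_after=0, hosts_updated=100))
```

The first call reports that the update should be halted, because 15 crashes in 100 hosts is far above the threshold; the second call does not. A check like this is most useful when its result feeds directly into the deployment process, so that a bad update is paused automatically rather than after someone notices an alert.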

Conclusion

The CrowdStrike incident of July 19, 2024, was a stark reminder of the complexities and risks involved in managing cybersecurity solutions. While the faulty update was quickly identified and withdrawn, the widespread disruption that followed underscored the need for careful planning, robust system backups, and resilient disaster recovery strategies.

Organizations can take a few key lessons from this event to better prepare for the future. By adopting staggered rollouts, diversifying system platforms, and ensuring strong vendor relationships, businesses can minimize their risk of experiencing similar disruptions. As cybersecurity continues to evolve, it’s more important than ever for organizations to stay vigilant, adaptable, and proactive in safeguarding their critical systems.

Sources:

  1. CrowdStrike Incident Report, July 2024
  2. “Glitch from Texas-based company CrowdStrike,” Texas Public Radio, 2024
  3. CrowdStrike Fal.Con 2024 Conference Information
