Unless you were living in a cave, or having a spiritual tech detox session, you cannot have failed to notice the global IT outages that happened last Friday.

The incident has been attributed to a faulty piece of code in an update to a Crowdstrike security product called Falcon Sensor. The faulty update caused a serious instability in Microsoft Windows, causing it to shut down with a Critical System Error (commonly called a BSOD – Blue Screen of Death).

This error could not be auto-repaired by Windows, resulting in affected devices becoming stuck in a boot-loop. A boot-loop is where the system shuts down and attempts to reboot into a self-repair mode, but then cannot complete the repair and so shuts down once more. Rinse, and repeat.

8.5 million devices affected

Such is Crowdstrike's customer base that Microsoft have reported in excess of 8.5 million devices across the globe were affected by the incident. Many thousands of systems failed almost immediately, causing the global outage so many experienced.

The effects of the software crash began almost as soon as the code update was pushed into production at 04:09 UTC, with systems pulling down the update and becoming unstable. The glitch affected all manner of industries: flights were grounded at airports around the world, Sky News was unable to broadcast live, the NHS was partially paralysed with many hospitals and trusts unable to access patient notes, rail networks were hampered, and sea ports were temporarily unable to process freight.

Some retailers were unable to process payments because their EPOS systems were hit, and stock exchanges across the world were also temporarily affected. The effects were global, with services on every continent being taken offline.

Graphical depiction of services around the world failing due to the Crowdstrike glitch

News quickly spread

Social media became a focus, with people reporting their IT issues on most platforms. The r/crowdstrike subreddit started a mega-thread collecting reports of the problem.

The fix is not so quick

The fix to this massive outage is not one which can easily be implemented automatically, although Microsoft have released a semi-automated solution which might work for some affected customers.

The reason for this is that, because the affected machines are stuck in the boot loop, manual intervention is required to force them into safe mode, where only the minimum files required to run the machine are loaded.

Once safe mode has been accessed, the problematic file can be located and manually renamed or deleted, which then allows the device to boot as normal. You can imagine how long that will take when there are over 8.5 million devices to fix.
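For the technically minded, the workaround boils down to removing one driver file per machine. Below is a minimal sketch of that step in Python, assuming the publicly documented file pattern (C-00000291*.sys) and the default CrowdStrike driver directory, and assuming it is run from safe mode with administrator rights – not an official tool, just an illustration of how small the actual fix is.

```python
# Rough sketch of the manual workaround: remove the faulty channel file(s)
# matching C-00000291*.sys from the CrowdStrike driver directory.
# Assumes the default install path; must be run from safe mode as administrator.
from pathlib import Path

driver_dir = Path(r"C:\Windows\System32\drivers\CrowdStrike")

for channel_file in driver_dir.glob("C-00000291*.sys"):
    print(f"Removing {channel_file}")
    # Renaming the file (e.g. adding a .bak extension) works just as well
    # if you would rather keep a copy.
    channel_file.unlink()
```

In practice most people will do exactly the same thing by hand in File Explorer or from an elevated command prompt – the point is that it has to be done on each device, one at a time.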

When the problem file was identified and the information for the fix was released, some users reported that they couldn't enter the safe mode option because they needed to enter their Windows BitLocker recovery key – and those keys were stored on file servers that were themselves stuck in the boot-loop. Unless the BitLocker key is stored on a non-affected device, those machines will have to be either restored from a backup, or re-built from scratch.

For anyone lucky enough to have the BitLocker key, Microsoft have issued instructions on how to automate the repair via a USB stick – making the process slightly quicker than doing it manually, although it will still take some time to repair all affected devices.

The technical bit

Crowdstrike have been quick to issue a report into what exactly went wrong, which can be read here, but in brief the issue stemmed from a faulty file which Crowdstrike call a channel file.

This channel file controls how the Falcon software evaluates the execution of named pipes to ensure that the execution is not malicious. A named pipe is a communication stream between a client and a server process, and is a fundamental way for processes to communicate and share data. If these named pipes are abused by malware, all manner of bad things can happen; criminals will use named pipes to facilitate the exchange of data between a compromised device and a C2 (Command & Control) server.
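To make that a little more concrete, here is a minimal sketch of a Windows named pipe in Python – the kind of inter-process channel Falcon Sensor inspects. The pipe name demo_pipe is made up for illustration, and the example is Windows-only.

```python
# Minimal illustration of a Windows named pipe: one process creates a pipe
# under \\.\pipe\ and another connects to it by name and sends data.
import threading
from multiprocessing.connection import Listener, Client

PIPE_NAME = r"\\.\pipe\demo_pipe"   # named pipes live under \\.\pipe\ on Windows

def client():
    # "Client" side: open the pipe by name and send a message.
    with Client(PIPE_NAME, family="AF_PIPE") as conn:
        conn.send("hello over a named pipe")

# "Server" side: create the pipe, then accept one connection and read from it.
with Listener(PIPE_NAME, family="AF_PIPE") as listener:
    threading.Thread(target=client).start()
    with listener.accept() as conn:
        print("server received:", conn.recv())
```

Legitimate software uses pipes like this constantly; malware does too, which is exactly why Falcon watches them so closely.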

Falcon Sensor is supposed to check these named pipes and block unauthorised communications. Unfortunately, in this incident, the faulty channel file caused a memory instability, resulting in the system failure.

Criminals quick to exploit the opportunity

As is the case with many similar events, it doesn’t take long for criminals to try to leverage the issue to their advantage.

Reports are surfacing of some threat actors pushing out their own fake recovery tools which, if used, will download malware onto the devices of unsuspecting victims.

A malware campaign targeting BBVA Mexico bank customers offers a fake CrowdStrike Hotfix update that installs a remote access tool, the Remcos RAT.

The fake hotfix is being promoted through a phishing site, portalintranetgrupobbva.com, which pretends to be a BBVA intranet portal. The link actually redirects to a Dropbox account, where victims are given instructions to install the update to avoid errors when connecting to the company's internal network.

The zip archive contains the installer files for the malware, along with an instruction readme file in Spanish – instrucciones.txt – which says that it is a "Mandatory update to avoid connection and synchronization errors to the company's internal network".

The UK's National Cyber Security Centre (NCSC) has reported seeing a rise in phishing messages directly related to the Crowdstrike incident, and advises users to stay vigilant to ensure they don't fall foul of any attacks.