In this blog article, Justin Macklin, one of the consultants who work with us as part of NCF Consult, sets out what providers need to be aware of following the CrowdStrike outage.
Friday 19th July 2024 saw what Elon Musk described on X (Twitter) as the “biggest IT fail ever”. Microsoft estimates that a faulty CrowdStrike update affected 8.5 million PCs worldwide [1].
The impact of the incident was large, with both individual PCs and many online services affected. So much information has been published about the incident that it is difficult to identify the main issues for the care and support sector.
Questions include:
- Who are CrowdStrike and what happened?
- Why was I, or wasn’t I, affected?
- Will it happen again?
- What can I do to help protect my organisation?
Who are CrowdStrike and what happened?
CrowdStrike are a company selling security software designed to keep systems safe from external attacks such as viruses and other malware. One of their products, Falcon Sensor, received a faulty software update that crashed the Windows machines it was installed on.
This raises two legitimate questions: did CrowdStrike test the update, and why did Windows let the faulty update install?
The answer to the first question is easy: they simply didn’t test the update thoroughly enough. The answer to why Windows allowed this update to be installed is more complex. Windows (like other operating systems such as Linux or macOS) has a central protected core of code called the kernel. The kernel is the ‘inner sanctum’ that orchestrates and controls the system and allows it to communicate with the processor, memory, video system and external components such as printers. If the kernel crashes, then the famous ‘Blue Screen of Death’ appears (or a ‘kernel panic’ screen, black on Linux or pink on macOS). CrowdStrike Falcon runs in the kernel and downloads update files that are handled by the kernel, something that some people are saying is bad practice. In summary, a faulty Falcon update was downloaded, run by the kernel and caused the Blue Screen of Death.
For a more detailed, more technical explanation have a look at https://www.youtube.com/watch?v=wAzEJxOo1ts – it’s presented by a well-regarded, retired Windows Developer.
Why was I, or wasn’t I, affected?
Working in care and support, it is unlikely you were directly affected by Friday’s outage. However, as Digital Care Hub states, you may well have been indirectly impacted because some NHS systems were taken offline, including EMIS and GP Connect [2].
It’s possible that some of your other software suppliers were also affected, but it should be noted that the issue was typically limited to bigger organisations because CrowdStrike Falcon is a product aimed at large corporates. It is estimated that 60% of Fortune 500 companies were affected [3].
What made it particularly bad is that the fix requires a technician to be physically at the computer. Remote access doesn’t work because the computer won’t even start (boot). This is costing companies an enormous amount, which is why many organisations are removing the software altogether (including all Elon Musk’s businesses [4]).
Will it happen again?
It is unlikely that this issue will happen again in its current form. CrowdStrike have been affected badly and can be expected to do everything in their power to ensure it doesn’t happen again. It will also have sent shockwaves across other software vendors who write kernel-level code.
Some are saying that the EU has some responsibility for this, accusing it of forcing Microsoft to give third parties access to the kernel in a 2009 agreement [5]. It is worth noting that Microsoft is not being blamed for this incident [5], and has acted responsibly by providing technical resources to help repair affected systems.
However, the risk of a similar crash remains as long as the kernel can access and run uncertified and untested code. Therefore, all organisations should consider the following to help protect their business.
What can I do to protect my organisation?
At some point in the future it is possible that a non-malicious update or program will crash a system. Protecting an organisation from this risk involves the same steps that all businesses should be taking on top of running anti-virus software:
- Having a secure inventory of all computers and related administrator and security keys and passwords (e.g. BitLocker drive encryption keys). Many organisations don’t manage the recovery keys properly, meaning they can’t recover their systems and data easily, or at all.
- Keeping offline backups of critical data (again with good password and key management).
- Having a clear, up-to-date disaster recovery plan.
- Ensuring agreements with third party suppliers meet both data protection and data recovery requirements.
- Regular testing of all the above (especially relating to the third parties!)
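For organisations with some in-house IT support, part of the regular testing in the list above could be automated. The following is a minimal sketch in Python, not a definitive tool. It assumes a hypothetical inventory spreadsheet exported as CSV, and the column names (hostname, recovery_key_stored, last_backup_test) and the 90-day threshold are illustrative assumptions. Note that the inventory records only whether a recovery key is held, not the key itself, which should stay in a secure store.

```python
import csv
import io
from datetime import date, timedelta

# Hypothetical inventory layout: one row per computer. The column names are
# assumptions for illustration, not a standard format.
SAMPLE_INVENTORY = """hostname,recovery_key_stored,last_backup_test
reception-pc,yes,2024-07-01
office-laptop,no,2024-06-15
care-tablet,yes,2023-11-30
"""


def audit_inventory(csv_text, today, max_age_days=90):
    """Return a list of (hostname, problem) pairs that need attention."""
    problems = []
    # Any backup test older than this date counts as out of date.
    cutoff = today - timedelta(days=max_age_days)
    for row in csv.DictReader(io.StringIO(csv_text)):
        host = row["hostname"]
        # Flag machines with no recovery key (e.g. BitLocker) on record.
        if row["recovery_key_stored"].strip().lower() != "yes":
            problems.append((host, "no recovery key on record"))
        # Flag machines whose backup restore has not been tested recently.
        last_test = date.fromisoformat(row["last_backup_test"])
        if last_test < cutoff:
            problems.append((host, "backup restore not tested recently"))
    return problems


if __name__ == "__main__":
    for host, problem in audit_inventory(SAMPLE_INVENTORY, date(2024, 7, 25)):
        print(f"{host}: {problem}")
```

Run against the sample data, the sketch flags the laptop with no recovery key on record and the tablet whose last backup test is more than 90 days old. A real inventory would need its own column names and thresholds.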
Finally, it is widely acknowledged that the incident will lead to an increase in phishing attacks, as bad actors try to use fear, uncertainty and doubt (FUD) as a tactic to infiltrate systems. All staff should be reminded to be vigilant.
References
1. ‘Helping our customers through the CrowdStrike outage’, David Weston, Vice President, Enterprise and OS Security, Microsoft. https://blogs.microsoft.com/blog/2024/07/20/helping-our-customers-through-the-crowdstrike-outage/ retrieved 22 July 2024.
2. ‘Microsoft global IT outage – update from Digitising Social Care’, Digital Care Hub. https://www.digitalcarehub.co.uk/microsoft-global-it-outage-update-from-digitising-social-care/ retrieved 25 July 2024.
3. ‘CrowdStrike IT Outage Highlights Need For Tighter Operational Updates’, Will Townshend and Moor Insights and Strategy, Forbes. https://www.forbes.com/sites/moorinsights/2024/07/23/crowdstrike-it-outage-highlights-need-for-tighter-operational-updates/ retrieved 24 July 2024.
4. Elon Musk, X. https://x.com/elonmusk/status/1814336158505050523 retrieved 22 July 2024.
5. ‘EU gave CrowdStrike the keys to the Windows kernel, claims Microsoft’, Richard Speed, The Register. https://www.theregister.com/2024/07/22/windows_crowdstrike_kernel_eu/ retrieved 24 July 2024.