CrowdStrike and the Hard Truth about Software Reliability
Bill Gates once gave Microsoft a "one" on a scale of 10
Welcome to the Cloud Database Report. I’m John Foley, a long-time tech journalist, including 18 years at InformationWeek, who worked in strategic comms at Oracle, IBM, and MongoDB. I’m now a VP with Method Communications.
CrowdStrike’s undetected error, which led to the Windows OS crash felt around the world on July 19, is being called the largest IT failure ever. Estimated cost: $5.4 billion.
CrowdStrike has published a preliminary report on what went wrong. The basic explanation is that a CrowdStrike-issued content configuration update resulted in a Windows system crash, or BSOD (blue screen of death) in industry jargon. CrowdStrike blamed it on a “bug” in its Content Validator.
For more, see CrowdStrike’s Post Incident Report.
Gates rated Microsoft one on a scale of 10
No one knows more about software bugs than Bill Gates. In January 2002, as Microsoft grew in the enterprise—making software quality & security an urgent priority—Gates issued a memo on Trustworthy Computing, which served as a call to action across the company.
“As software has become ever more complex, interdependent and interconnected, our reputation as a company has in turn become more vulnerable,” Gates wrote. “Flaws in a single Microsoft product, service or policy not only affect the quality of our platform and services overall, but also our customers’ view of us as a company.”
It has taken longer, I suspect, than Gates imagined to get it right—the CrowdStrike glitch being just the latest and greatest example. Now, Microsoft is reportedly considering restricting kernel-level access to its OS.
In 2002, as Editor of InformationWeek magazine, I talked to Gates about his strategy to rethink software development at Microsoft. In that interview, Gates offered a blunt assessment of Microsoft’s software quality that still resonates to this day.
I asked the following question: “On a scale of one to 10, where one is unsatisfactory and 10 is high satisfaction, how would you rate the overall software quality of your company's products today?”
Gates’ answer: “I mean, is it as good as people want? One.”
Boom! It was a surprisingly candid response. And one that still rings true for Microsoft, CrowdStrike, and other tech vendors whose software exposes us to breaches, malware, system crashes, and a constant barrage of other threats.
Here’s a more complete excerpt of Gates’ reply:
“It's a very subjective number. I mean, is it as good as people want? One. Is it good compared to other people's software or what we were doing three or four years ago? I'll give us a nine. But it’s the most subjective question in the world, and, hey, the customer is always right. Are they going to rank us any worse on that number now than they did four years ago? No. Are we doing a dramatically better job on this stuff now than four years ago? Yes.” - Bill Gates
Here’s the full InformationWeek interview from 2002: “Q&A: Bill Gates on Trustworthy Computing”
Road to security hell paved with good intentions
In pursuit of its Trustworthy Computing initiative, Microsoft took steps to make things better: Patch Tuesday, security-oriented software development, bug bounties, etc.
There was progress, but the challenges remained. In 2005, a full three years into its Trustworthy era, Microsoft in one week released a dozen software patches, many deemed critical, to fix 17 vulnerabilities in Windows, IE, SharePoint, and Office. Here’s the creepy-crawly cover story I wrote about it. This article was included among InformationWeek’s most important magazine cover stories over 28 years.
The big lesson here is that, despite best intentions, software quality and best practices in software security and resilience haven’t gotten any easier. If anything, they keep getting harder for reasons that are all too obvious: more devices and platforms (wider software surface area and more points of entry), more sophisticated threats, and the complexity of so many components and layers of software working together. In the scheme of things, the CrowdStrike brouhaha was caused by an itsy-bitsy software configuration issue. And like that, the business world came to a stop.
No one should be surprised when these things happen. The tech industry has been grappling with software bugs since at least 1947, when Grace Hopper and team found a moth in a Harvard University computer.
And let’s be clear: This isn’t just a CrowdStrike or a Microsoft issue. The top five vendors/platforms with the most “distinct” vulnerabilities so far this year are Linux, Microsoft, Google, Adobe, and Apple, according to Security Scorecard. So this is everyone’s digital hill to climb.
Been there, done that
When it comes to software quality and IT resilience, history repeats itself. My Cloud Database Report post from last year recounts some of the boondoggles of the past. The article includes my ideas on how to lower the risks — through tech practices like architecture, automation, and multi-cloud.
With the rise of AI copilots and AI-driven observability, orchestration, and automation, we can hope for better days ahead. However, the IT tech stack seems to be getting more complex with new algorithms, vectors, and LLMs. So it remains to be seen whether AI eases the situation or only makes it worse.
Meanwhile, the finger pointing and fallout from the CrowdStrike debacle go on. As we watch how this plays out, we should all remember what Bill Gates said more than 20 years ago: The customer is always right.
And if you frame the question about software & system robustness as “Is it as good as people want?”, the answer is probably still just a one.