Gamla Stan - Stockholm, Sweden - July 2022

The Cloud is Darker and More Full of Terrors - Sec-T 2024

In September 2024, I returned to Stockholm to give a talk at Sec-T. The Slides are here, and the YouTube Video is here.

In the last year or so, talking to organizations of all sizes, shapes, and security budgets, it’s become clear there is a deeper problem than just “developers don’t know how to not make a bucket public”. How we as an industry use the public cloud is fundamentally unsafe. We wouldn’t give any random 16-year-old kid with a driver’s license a 787 to fly. Yet, with the public cloud, anyone with a credit card can sign up for one of the most technically complex systems the IT industry has ever created. Engineers fresh out of school are given access to enterprise cloud tenants and told to deploy their applications. At no point do the cloud providers take reasonable measures to ensure you are qualified to operate the cloud safely, nor are their default auto-pilot settings all that safe.

What is it about companies from Seattle that keep generating unsafe products?

Much of this is based on the research and data collection from breaches.cloud. With breaches.cloud, I aim to articulate the risk and “tell a story” to my developers when I ask them to prioritize security over feature development. This post will cover several well-known incidents and explain where the customer went wrong and why they probably did what they did. No one sets out to be breached, indicted, or run out of business.

Major Cloud Security Incidents

Code Spaces

In the first major cloud security incident, an attacker gained access to credentials for Code Spaces’ “Amazon EC2 control panel” and, after attempting to extort the company, deleted all “EBS snapshots, S3 buckets, all AMI’s, some EBS instances and several machine instances.”

Code Spaces never recovered from the attack and promptly ceased operations. I heard they never contacted AWS support during this incident.

After this point, the “Multi-Account Strategy” started to be a thing, even if it took AWS another three years to release AWS Organizations.
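To make the multi-account idea concrete, here is a minimal sketch (not Code Spaces’ setup, which predates Organizations) of the kind of guardrail a separate management account enables: a Service Control Policy that keeps compromised credentials in workload accounts from destroying backups. The policy name, action list, and OU ID are illustrative placeholders, not a complete backup strategy.

```python
import json

import boto3

# Illustrative SCP: workload accounts cannot delete snapshots, AMIs, buckets,
# or AWS Backup recovery points, no matter what credentials an attacker steals.
deny_backup_deletion = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyBackupDeletion",
            "Effect": "Deny",
            "Action": [
                "ec2:DeleteSnapshot",
                "ec2:DeregisterImage",
                "s3:DeleteBucket",
                "backup:DeleteRecoveryPoint",
            ],
            "Resource": "*",
        }
    ],
}

org = boto3.client("organizations")

policy = org.create_policy(
    Name="deny-backup-deletion",  # hypothetical policy name
    Description="Stolen workload credentials cannot destroy backups",
    Type="SERVICE_CONTROL_POLICY",
    Content=json.dumps(deny_backup_deletion),
)

# Attach the SCP to the OU that holds the workload accounts (placeholder OU ID).
org.attach_policy(
    PolicyId=policy["Policy"]["PolicySummary"]["Id"],
    TargetId="ou-xxxx-workloads",
)
```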

We nearly saw this exact same issue play out with the Australian pension fund UniSuper in GCP. Except in this case, the threat actor was GCP itself (via poorly documented pre-release code with a timed expiration). UniSuper was smart enough to have an off-cloud backup of the critical data.

Cisco WebEx

In September 2018, a former Cisco engineer deleted 456 virtual machines supporting Cisco’s WebEx, five months after he left Cisco. He was criminally charged because the FBI identified the IP address the attack came from as belonging to the suspect’s Google Cloud account. The engineer pled guilty, but no motivation was discussed. Without a documented motivation, we cannot rule out that this was a complete accident and that the engineer was prosecuted to cover up Cisco’s mistakes.

What does this tell us?

  • In 2018, Cisco was still using long-term access keys (a quick audit for stale keys is sketched after this list)
  • Cisco was using shared access keys, because the attack was attributed to an IP address rather than to the name under which the credentials were issued.
  • Did Cisco provide an adequate environment for the engineer to do his job, or was he forced to resort to a personal Google Cloud environment?
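The stale-key problem is easy to surface. Here is a rough audit sketch using boto3; the 90-day threshold is my own arbitrary assumption, not anything Cisco documented.

```python
from datetime import datetime, timedelta, timezone

import boto3

# Flag active long-term access keys older than a chosen threshold.
MAX_KEY_AGE = timedelta(days=90)  # arbitrary; set to your own credential policy

iam = boto3.client("iam")
now = datetime.now(timezone.utc)

for page in iam.get_paginator("list_users").paginate():
    for user in page["Users"]:
        keys = iam.list_access_keys(UserName=user["UserName"])["AccessKeyMetadata"]
        for key in keys:
            age = now - key["CreateDate"]
            if key["Status"] == "Active" and age > MAX_KEY_AGE:
                print(f"{user['UserName']}: {key['AccessKeyId']} is {age.days} days old")
```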

LastPass

In 2022, LastPass suffered two security incidents that eventually led to the exfiltration of customer vaults. This very targeted attack began with malware, which led to a GitHub compromise; that access was then leveraged in a second breach that started with a Sr. DevOps Engineer’s exploitable Plex server.

LastPass kept all the vaults in their data center but stored backups in the cloud. The interesting thing about this report is the references to the attacker getting “decryption keys needed to access the AWS S3 LastPass production backups”. Typically, AWS encryption is done via KMS, and with KMS, the actual keys never leave AWS data centers; they wouldn’t be found on a Sr. DevOps Engineer’s laptop on his home network. LastPass actually went beyond most AWS customers and self-encrypted the backup data before uploading it to S3, where it was encrypted again with KMS.
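For readers unfamiliar with the pattern, here is a minimal sketch of client-side envelope encryption before upload, roughly the model LastPass describes: request a data key from KMS, encrypt locally, and store only the wrapped copy of that key with the object. The function and its parameters are illustrative; a production system would use the AWS Encryption SDK rather than hand-rolled AES-GCM.

```python
import os

import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

kms = boto3.client("kms")
s3 = boto3.client("s3")


def encrypt_and_upload(plaintext: bytes, bucket: str, key: str, kms_key_id: str) -> None:
    # Ask KMS for a fresh data key; the plaintext copy never needs to be persisted.
    data_key = kms.generate_data_key(KeyId=kms_key_id, KeySpec="AES_256")

    # Encrypt locally before anything leaves the building.
    nonce = os.urandom(12)
    ciphertext = AESGCM(data_key["Plaintext"]).encrypt(nonce, plaintext, None)

    # Upload the ciphertext, keep only the KMS-wrapped data key alongside it, and
    # let S3 apply server-side KMS encryption on top.
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=nonce + ciphertext,
        Metadata={"wrapped-data-key": data_key["CiphertextBlob"].hex()},
        ServerSideEncryption="aws:kms",
    )
```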

Capital One

This is likely the most prominent breach directly related to being an AWS Customer. Paige Thompson discovered an SSRF or Proxy bypass vulnerability in an exposed Capital One web server, which she used to exfiltrate credentials and subsequently download credit application information on over 100 million people.
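To see why that pivot was so easy, here is a minimal sketch of what IMDSv1 hands back to any unauthenticated GET from an instance with an instance profile; an SSRF simply coerces the server into making these two requests on the attacker’s behalf and echoing back the result.

```python
import requests

# Runs on (or is proxied through) an EC2 instance with an attached IAM role.
IMDS = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"

role_name = requests.get(IMDS, timeout=2).text            # no auth required
creds = requests.get(IMDS + role_name, timeout=2).json()  # still no auth required

# creds now contains AccessKeyId, SecretAccessKey, and Token, usable from
# anywhere on the internet unless something is watching for credential misuse.
print(creds["AccessKeyId"])
```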

Capital One was well known in the Cloud Security community for several open-source tools, Cloud Custodian being the primary one. We were shocked when they were impacted, yet every element of this breach was already well understood.

In June 2012, AWS released IAM Roles for EC2, which delivers temporary credentials to instances through the Instance Metadata Service.

The idea of using an SSRF to pivot to the Instance Metadata Service (IMDS) has been known to the community since 2014; Prezi published a responsible disclosure write-up on this in October 2014.

By 2018, more IMDS/SSRF issues were surfacing, and researchers were asking AWS to mitigate this problem.

https://x.com/0xdabbad00/status/1034543560022876160

In December 2018, AWS opened the eu-north-1 (Stockholm) region. It should be noted that this region was enabled in all customer accounts, and customers did not have the option to disable it.

My attempt to enable GuardDuty when the region was announced

In January 2019, I reminded AWS that they were still missing critical capabilities we depend on.

In March 2019, the attacker found the “misconfigured WAF” and gained access to credentials. She then leveraged the eu-north-1 endpoints, where Capital One’s detection capability was missing.

In May 2019, GuardDuty finally became available in eu-north-1.

In July 2019, indictments were issued, and Capital One disclosed the breach.

In August 2019, Sen. Ron Wyden (D-OR) addressed a letter to Jeff Bezos, explicitly asking:

  1. Was an SSRF used?
  2. How many other incidents at AWS customers used SSRF to compromise the metadata service?
  3. How was AWS warning its customers about this potential threat?
  4. What was AWS’s response to a request from a Netflix engineer to add a header to IMDS to protect against SSRFs?

AWS’s response blamed the breach 100% on Capital One and denied any other incidents involving IMDS credential exfiltration. As for the advice AWS gives its customers, AWS CISO Steve Schmidt explicitly called out GuardDuty as one of the capabilities that protect against this (along with WAFs, least privilege, and the Well-Architected Review service available to enterprise customers). Of course, GuardDuty wasn’t universally available, and the creation of the eu-north-1 region exposed all AWS customers without providing them any ability to mitigate this threat.

In a follow-up letter to the FTC seeking enforcement, Senator Wyden wrote:

Amazon knew, or should have known, that AWS was vulnerable to SSRF attacks. Although Amazon’s competitors addressed the threat of SSRF attacks several years ago, Amazon continues to sell defective cloud computing services to businesses, government agencies, and to the general public. As such, Amazon shares some responsibility for the theft of data on 100 million Capital One customers.

On November 19th, 2019, IMDSv2 was released, allowing customers to require an auth token to be present in all calls to the metadata service. This feature was released 113 days after the Capital One disclosure.
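A minimal sketch of what changed: IMDSv2 requires a session token obtained via an HTTP PUT with a TTL header, which a simple GET-based SSRF or a 301 redirect cannot perform. The second half shows the API call that makes the token mandatory on an existing instance; the instance ID is a placeholder.

```python
import boto3
import requests

# IMDSv2 handshake: PUT for a session token, then present it on every read.
token = requests.put(
    "http://169.254.169.254/latest/api/token",
    headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    timeout=2,
).text

role_name = requests.get(
    "http://169.254.169.254/latest/meta-data/iam/security-credentials/",
    headers={"X-aws-ec2-metadata-token": token},
    timeout=2,
).text

# Enforce IMDSv2 on an existing instance so IMDSv1 calls are rejected outright.
boto3.client("ec2").modify_instance_metadata_options(
    InstanceId="i-0123456789abcdef0",   # placeholder
    HttpTokens="required",
    HttpPutResponseHopLimit=1,          # keeps the token from crossing an extra network hop
)
```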

UNC2903

In June 2021, Mandiant identified UNC2903 attempting to harvest and abuse credentials using the Amazon Instance Metadata Service (IMDS). This uncategorized threat actor began by scanning externally facing AWS infrastructure hosting Adminer, an open-source database management tool written in PHP. Adminer versions 4.0-4.7.9 are vulnerable to CVE-2021-21311, a server-side request forgery. UNC2903 hosted a pre-configured web server on a relay box with a 301 redirect script pointing back to the http://169.254.169.254/latest/meta-data/iam/security-credentials/ URL. The victim’s server followed the redirect and returned an error page that included the redirect output, which contained AWS API credentials. The threat actor then pivoted into the account and exfiltrated data from S3. Mandiant didn’t disclose the victim.
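A quick audit sketch, assuming boto3 credentials for a single region: list the instances that still accept IMDSv1, since UNC2903’s redirect trick only works where the session token remains optional.

```python
import boto3

ec2 = boto3.client("ec2")

# Walk every instance in the region and flag the ones not yet requiring IMDSv2.
for page in ec2.get_paginator("describe_instances").paginate():
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            if instance.get("MetadataOptions", {}).get("HttpTokens") != "required":
                print(f"{instance['InstanceId']} still allows IMDSv1")
```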

Microsoft - Storm-0558

In July of 2023, Microsoft disclosed a compromise of Exchange Online that targeted “25 organizations … including government agencies as well as related consumer accounts of individuals likely associated with these organizations.” The compromise stemmed from several validation flaws in the Microsoft-hosted Exchange Online and Azure AD services.

The threat actor obtained an expired Microsoft Account (MSA) consumer signing key through an as-yet-undetermined method. Leveraging that very sensitive key, the threat actor pivoted into enterprise Exchange environments due to a validation error on Microsoft’s part. Per Microsoft: “The method by which the actor acquired the key is a matter of ongoing investigation.” Why was the key expired? According to the CSRB, Microsoft attributed a 2021 outage to key rotation, so they simply stopped rotating keys.

Storm-0558 had access to some of these cloud-based mailboxes for at least six weeks, and during this time, the threat actor downloaded approximately 60,000 emails from the State Department alone. (CSRB Report: p1)

So Microsoft didn’t rotate an old key. The Chinese Government got hold of it. They used a key intended for Minecraft against a major government agency.

Meme of unknown origin

Microsoft - Midnight Blizzard

While Microsoft was cleaning up the mess made by China and responding to requests from the Cyber Safety Review Board, a new threat actor managed to “compromise a legacy non-production test tenant account” and, from there, pivoted into the Microsoft corporate tenant. The threat actor, identified as APT29, Midnight Blizzard, or Cozy Bear, is part of the Russian Foreign Intelligence Service (SVR). Their initial goal, as stated by Microsoft, was access to Microsoft’s internal corporate emails about Midnight Blizzard itself.

This incident is really an example of a Cloud Provider acting like a negligent customer.

The initial access was the compromise of a “legacy non-production test tenant account.” While we’d expect MFA to be in place, this was a test user, and Mandiant had already flagged APT29 for abusing the MFA self-enrollment process in Azure Active Directory to register their own MFA method on accounts before the legitimate user did.

Once inside the test tenant, “Midnight Blizzard leveraged their initial access to identify and compromise a legacy test OAuth application that had elevated access to the Microsoft corporate environment.”

So a sandbox environment, which is inevitably a mess of test accounts and users that never get cleaned up, could access the main production tenant.

Snowflake

What do our healthcare data, mortgage applications, call records, and how much we’re all willing to pay for Taylor Swift tickets have in common? They were all stored in a high-performance data warehouse application from Snowflake. Why is all this data for sale on the DarkWeb? Because the companies responsible for collecting and securing this information were victims of Infostealers. And how is it that highly sensitive personal data was exfiltrated by just using a compromised password? Well, it’s because these companies didn’t have MFA enforced on all their accounts.

Snowflake engaged Mandiant to confirm that Snowflake itself had not been compromised.

Specific to the customer side of the Shared Responsibility Model, Mandiant indicates that the victimized customers

…did not require multi-factor authentication, and in many cases, the credentials had not been rotated for as long as four years. Network allow lists were also not used to limit access to trusted locations.

Yes, in the summer of 2024, we still have massive data breaches due to single-factor authentication.

By a strict interpretation of Shared Responsibility, this is entirely on the customers of Snowflake.

But the bigger question is: How is it that Snowflake wasn’t enforcing MFA on its enterprise customer accounts? Shortly after these breaches became public, Snowflake finally gave its customers the option to enforce MFA on their users.

CrowdStrike

The largest cyber-disruption in the last few years was inflicted on the world by us, the infosec community. In July 2024, CrowdStrike pushed a bad content file to roughly 8.5 million Windows hosts, crashing its kernel driver and forcing thousands of companies to manually recover each system.

This was a fundamental failure of the vendor, its customers, Microsoft, and regulators.

  1. CrowdStrike should never have run code inside the kernel that parsed configuration content pushed from the cloud.
  2. CrowdStrike should never have pushed this content to all customer devices at once. Instead, it should have used a staged deployment to a small percentage of systems before a full rollout (a generic sketch follows this list).
  3. The customers should have done better due diligence and demanded a more resilient update and testing path.
  4. Microsoft should have done more to protect its kernel from stupid shit like what CrowdStrike was doing, but of course, it was also dependent on security revenue, which is why -
  5. Regulators forced MS to allow access to the kernel for antitrust reasons without considering the security implications of that action.
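As referenced in point 2, here is a generic illustration of ring-based rollout, not CrowdStrike’s actual pipeline: each host hashes into a stable bucket, and a content update only reaches a ring after the smaller rings before it have been cleared as healthy.

```python
import hashlib

# Cumulative ring sizes: canary, early adopters, broad, everyone.
ROLLOUT_RINGS = [0.001, 0.01, 0.10, 1.00]


def ring_for_host(host_id: str) -> int:
    """Map a host to a stable ring based on a hash of its identifier."""
    bucket = int(hashlib.sha256(host_id.encode()).hexdigest(), 16) % 10_000 / 10_000
    for ring, fraction in enumerate(ROLLOUT_RINGS):
        if bucket < fraction:
            return ring
    return len(ROLLOUT_RINGS) - 1


def should_receive_update(host_id: str, rings_cleared: int) -> bool:
    """Ship the update only to hosts whose ring has already been cleared."""
    return ring_for_host(host_id) <= rings_cleared


# With only the canary ring cleared, roughly 0.1% of hosts receive the content.
print(should_receive_update("host-0001", rings_cleared=0))
```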

Themes

These incidents fall into two categories:

  1. Blatant Customer Failures
    1. Cisco, for not disabling the engineer’s access when he departed, and for potentially not providing a proper environment for actual deployments
    2. Microsoft/Midnight Blizzard for giving test tenants access to their corporate tenant and for not cleaning up test users before they could be compromised via Self-Enrollment abuse.
    3. Whoever was the victim of UNC2903 for not enabling IMDSv2
    4. All the companies out there who expose their development environments to the internet
    5. Code Spaces, for not having offsite backups
    6. LastPass, for letting engineers use personal devices for sensitive operations.
  2. Blatant Cloud Provider Failures
    1. Capital One, and AWS’s failure to require a header for the metadata service
    2. Microsoft/Storm-0558, and the failure to rotate a signing key or to confirm that the signing key was being used against the correct environment.
    3. Snowflake, for not enforcing MFA on what were clearly production-level enterprise users
    4. CrowdStrike, for not staging the rollout of changes that had way more access to kernel space than needed

Closing thoughts

Just as we must ensure our own organizations follow cloud security best practices, we must also demand better from cloud providers and vendors. It’s unacceptable for them to skip MFA enforcement, to hide behind licensing agreements, terms of service, and the Shared Responsibility Model, or to charge extra for logging, monitoring, Single Sign-On, and Identity Federation.

At AWS, Security is Job Zero, but whose security are we talking about? Is it Amazon’s or the customer’s?

It’s clearly not job zero to have core security controls as a regional launch prerequisite. That’s for day 95.
It’s not job zero to have consistent API and IAM Actions.
It’s not job zero to ensure data and metadata actions are easily delineated.
It’s not job zero to ensure that all authenticated actions are logged and that those logs aren’t behind a very high paywall.
It’s not job zero to ensure that their software is compatible with the new security features that are released. (CloudWatch Logs didn’t support IMDSv2 for several years).
It’s not job zero to provide customers with free and useful cloud security posture monitoring.
It’s not job zero to set reasonable secure defaults for customers. Not until enough customers have high-profile security incidents, then it becomes Job Zero to protect AWS’s image.

So yes, Security is Job Zero, but not the technical security of AWS’s platform and customers. Job Zero is protecting the security of AWS’s primary position among the top hyperscale cloud providers.

One of AWS’s tenets is to work backward from what the customer needs. However, AWS creates these products in a vacuum. They don’t conduct user testing or consider how a customer will actually use the product. The warnings may be buried in hundreds of pages of documentation, but there are no safety mechanisms to prevent users from making mistakes, and the default settings aren’t designed to promote security.

The one cloud provider that was only briefly mentioned in this post has a different take on Shared Responsibility:

A shared responsibility model, promoted and used by legacy cloud providers to guide security and risk discussions, draws a line in the sand. Everything on one side of the line is the cloud provider’s responsibility and everything on the other side—the configuration, the applications, the data—that’s your responsibility.

This can create an unhealthy dynamic that can lead to finger-pointing, blame, and abdication of responsibilities. It often feels adversarial. (src)

Google promotes a “Shared Fate” model. They recognize that their ability to innovate and do business rests on how well their customers can secure their workloads.