It’s been quite a week with the disclosure of the Capital One breach, which impacted over 100 million Americans and 6 million Canadians.
Up until Friday, the major source of information was the complaint filed by the Department of Justice. That document, while highlighting key steps, understandably left out the technical details required for everyone to draw lessons. Thanks to Brian Krebs, we now have significant additional information that augments our understanding of the incident. This article briefly covers what we know so far and, more importantly, the key lessons that can be learned.
What we know:
A ModSecurity web application firewall deployment was misconfigured. The ModSecurity WAF can either be embedded in an Apache server or hosted on its own as a standalone service. In either case, on AWS it runs on an EC2 instance. The attacker tricked the misconfigured WAF into passing the requested resource (e.g. /latest/meta-data/iam/security-credentials/ISRM-WAF-Role) to the AWS Instance Metadata Service hosted at http://169.254.169.254. What the misconfiguration was, and how exactly the attacker was able to dictate the internal destination to ModSecurity, is not known at this point.
Acting as a proxy, the WAF made a request on behalf of the attacker to http://169.254.169.254/latest/meta-data/iam/security-credentials/ISRM-WAF-Role. When the instance metadata service received the request, it identified the calling EC2 instance, determined that such a data attribute existed in the respective metadata store, and returned its value. This is the expected behavior for interactions with the instance metadata service.
Once ModSecurity received the response from the metadata service, it duly returned it to the attacker. So the attacker received a neatly packaged JSON object with the access key and secret key.
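To make that concrete, the credentials endpoint of the instance metadata service returns a JSON document shaped roughly like the one below. The values here are fabricated placeholders; only the field names reflect the real service. A minimal Python sketch of what the attacker ended up holding:

```python
import json

# Fabricated example of the JSON document the EC2 instance metadata service
# returns for an attached IAM role (all values below are fake placeholders).
sample_response = """
{
  "Code": "Success",
  "Type": "AWS-HMAC",
  "AccessKeyId": "ASIAEXAMPLEKEYID",
  "SecretAccessKey": "exampleSecretKey00000000000000000000",
  "Token": "exampleSessionToken==",
  "Expiration": "2019-07-29T12:00:00Z"
}
"""

creds = json.loads(sample_response)

# Anyone holding these three values can sign AWS API calls as the role
# until the expiration timestamp.
access_key = creds["AccessKeyId"]
secret_key = creds["SecretAccessKey"]
session_token = creds["Token"]
print(access_key)
```

These temporary credentials are all that is needed to configure the AWS CLI or any SDK and act as the role from anywhere on the Internet.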
Now, in possession of the credentials, the attacker obtained a list of buckets using the ‘aws s3api list-buckets’ command. To copy out data, the attacker used the ‘aws s3 sync <source> <destination>’ command. The exfiltration modalities are not entirely clear beyond the use of a VPN and Tor. A key point to note is that S3 is a global service and each bucket in S3 has a unique URL that can be requested directly from the Internet (https://<bucketname>.s3.us-east-1.amazonaws.com/<prefix>/<filename>). Notice the amazonaws.com domain in the URL. So the attacker could have accessed and copied the data directly from the S3 buckets used by Capital One to another S3 bucket or a local disk under the attacker’s control.
Let us take a look at some key lessons to be learned from this.
When access controls are discussed in the context of an on-prem application, the discussion almost always begins and ends with user authentication and authorization through roles and privileges. When data is stored in flat files, directory permissions do come into play. However, because tech and ops teams are used to working in an on-prem setting, where access to production servers is tightly controlled through break-glass procedures and peer reviews, directory-level permissions hardly ever get attention.
In the context of AWS, however, access controls at the directory/bucket/resource level play a pivotal role. Take a look at Amazon’s access evaluation logic:
Resource-based policies are evaluated for an explicit ‘Allow’ (an explicit ‘Deny’ anywhere always takes precedence). If there is no explicit allow, the implicit deny kicks in and access is refused.
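That evaluation order can be sketched in a few lines of Python. This is a deliberately simplified model — real AWS evaluation also merges identity policies, SCPs, permission boundaries, session policies and condition keys — but it captures the core ordering: explicit deny, then explicit allow, then implicit deny. The ARNs are hypothetical.

```python
def evaluate_access(statements, principal, action, resource):
    """Simplified model of AWS resource-policy evaluation.

    statements: list of dicts with 'Effect', 'Principal', 'Action', 'Resource'.
    Models only the core order: explicit Deny > explicit Allow > implicit deny.
    """
    matches = [
        s for s in statements
        if s["Principal"] == principal
        and s["Action"] == action
        and s["Resource"] == resource
    ]
    if any(s["Effect"] == "Deny" for s in matches):
        return "Deny"   # an explicit Deny always wins
    if any(s["Effect"] == "Allow" for s in matches):
        return "Allow"  # an explicit Allow grants access
    return "Deny"       # otherwise the implicit deny applies


# Hypothetical policy: only one application role may read objects.
policy = [
    {"Effect": "Allow",
     "Principal": "arn:aws:iam::111122223333:role/app-reader",
     "Action": "s3:GetObject",
     "Resource": "arn:aws:s3:::example-bucket/*"},
]

print(evaluate_access(policy, "arn:aws:iam::111122223333:role/app-reader",
                      "s3:GetObject", "arn:aws:s3:::example-bucket/*"))
print(evaluate_access(policy, "arn:aws:iam::111122223333:role/ISRM-WAF-Role",
                      "s3:GetObject", "arn:aws:s3:::example-bucket/*"))
```

The second call falls through to the implicit deny: a role not explicitly allowed in the resource policy simply never matches a statement.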
In the Capital One incident, the buckets storing client data should have explicitly allowed ONLY the roles that need read access to files in the bucket. Not having these bucket-level resource policies, or leaving the buckets open, was another key factor in this data breach.
In the context of cloud, and specifically AWS, access control must be defined at the data, infrastructure and principal levels. Changes to IAM (principals) and infrastructure take place often and end up being relatively more prone to misconfiguration. Bucket policies, on the other hand, don’t change as often unless triggered by application changes.
The key takeaway: make it a standard operating procedure to define resource policies, not just for S3 buckets but for all services that support resource-level policies. See here for the full list of services with support for resource policies.
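A bucket policy along these lines is what that standard operating procedure would produce. The account ID, role name and bucket name below are invented for illustration; the JSON printed at the end is the shape you would attach to the bucket.

```python
import json

# Hypothetical bucket policy: only a named application role may read objects.
# Account ID, role name and bucket name are made up for illustration.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowOnlyAppReaderRole",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/app-reader"},
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-client-data-bucket/*"
        }
    ]
}

# Any principal not matched by a statement here falls through
# to the implicit deny.
print(json.dumps(bucket_policy, indent=2))
```

With a policy like this in place, a role such as ISRM-WAF-Role would never match an allow statement, regardless of how broad its identity-side permissions were.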
Despite ISRM-WAF-Role having such excessive privileges (700+ buckets listed), only credit card application data was compromised. One can safely infer that this does not represent all the sensitive data stored in Capital One’s AWS cloud and S3. Most likely this data was received and processed by a small set of applications. This implies that there was AWS account-level isolation that prevented the attacker from getting access to data from other applications. Resource policies may have played a role, but from what we have seen so far, it appears that account isolation helped contain the breach.
The key takeaway is this: as organizations adopt cloud, one must be fully cognizant of the fact that misconfigurations will happen even with robust processes and controls. Placing all applications and resources in a small set of accounts gravely aggregates risk. Account-level isolation is critical to limit the blast radius and lateral movement.
S3 Encryption and Encryption Context:
In its incident disclosure statement, Capital One explicitly states:
“Due to the particular circumstances of this incident, the unauthorized access also enabled the decrypting of data.”
From this statement, it is clear that encryption was turned on at the S3 level, using either the default setting (SSE-S3) or a customer-managed master key stored in KMS (SSE-KMS). The attacker was nevertheless able to retrieve data in the clear, because S3’s encryption and decryption happen transparently. This is the case even with SSE-KMS, as long as the requesting role has access to the key ARN in KMS. If ISRM-WAF-Role had access to a key used by workloads, an alarm should have gone off long before the breach. This also indicates a systemic gap in leveraging resource policies at Capital One.
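With SSE-KMS, that line can be drawn in the key policy itself. Below is a hedged sketch of such a policy — the account ID and role names are invented, and a production policy would also need statements for account administration — granting decrypt only to the workload role and only management actions to the key admin:

```python
import json

# Hypothetical KMS key policy: only the data-processing workload role may use
# the key; the key admin can manage it but not decrypt with it. All names
# are invented for illustration.
key_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowWorkloadUseOfKey",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/card-app-workload"},
            "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
            "Resource": "*"
        },
        {
            "Sid": "AllowKeyAdministrationOnly",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/key-admin"},
            "Action": ["kms:DescribeKey", "kms:EnableKeyRotation",
                       "kms:PutKeyPolicy", "kms:ScheduleKeyDeletion"],
            "Resource": "*"
        }
    ]
}

# A kms:Decrypt request from a role like ISRM-WAF-Role matches neither
# statement, so S3 could not transparently decrypt objects on its behalf.
print(json.dumps(key_policy, indent=2))
```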
In AWS, access to every resource must be granted explicitly, via roles and policies. This provides a robust framework for defining authorization controls. The onus is on AWS customers to define those controls appropriately.
Consider this scenario: several workloads are deployed on an EC2 instance, be it a standalone server, a cluster node or similar. To enable these workloads to talk to multiple services, the EC2 instance is provisioned with access to all of those services. This results in a situation where each workload has the access it needs as well as the entitlements the other workloads require. Whether the EC2 instance is compromised or a single workload is compromised, all of the data and services could be tainted. This pattern of provisioning entitlements is fairly commonplace. Defining granular authorization controls at the data, infrastructure and principal levels is essential for applications in the cloud.
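The aggregation described above is easy to see with sets. In this toy sketch (workload names and permission strings are invented), the shared instance role ends up carrying the union of every co-located workload’s needs, so each workload inherits entitlements it never asked for:

```python
# Toy model: an EC2 instance role's permissions become the union of everything
# any co-located workload needs. Workloads and permissions are invented.
workload_needs = {
    "billing-batch": {"s3:GetObject on billing-bucket"},
    "report-webapp": {"s3:GetObject on reports-bucket",
                      "sqs:SendMessage on report-queue"},
    "waf-proxy":     {"logs:PutLogEvents on waf-log-group"},
}

# A single shared instance role must carry the union of all needs...
instance_role_permissions = set().union(*workload_needs.values())

# ...so compromising any one workload exposes the excess entitlements too.
for name, needs in workload_needs.items():
    excess = instance_role_permissions - needs
    print(f"{name}: {len(excess)} permissions beyond its own needs")
```

Splitting the workloads across instances (or accounts), each with its own narrowly scoped role, shrinks every one of those excess sets to zero.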
Here are a few other things to consider & act on:
At a tactical level, in light of the Capital One incident, organizations would be wise to:
- Review all resource policies, for S3 buckets at a minimum, to ensure lockdown.
- Review the resource policies attached to every key in KMS to ensure only the workloads that need to use the key have access to it. Key admins, too, should not have decrypt access; they need only the key-management APIs, such as rotation.
- Review the roles and entitlements used by EC2s, in context of the workloads running on them and not just in isolation at infrastructure level.
- Review security automation, ops automation and CI/CD pipelines to determine how access to systems is obtained for code deployments, ops tasks and configuration changes. For key workflows, consider inserting a step that requires explicit user input (e.g. a one-time password required as input to execute a deployment script; post deployment, the password is automatically reset).
- Review use of the instance metadata service to see if storing credentials can be avoided, or limit the ability to call the metadata service URL using tools like ‘ip-lockdown’.
- Look at your organization’s AWS account strategy and determine potential data exposure if the incident were to take place at your organization.
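The one-time-password gate suggested in the review list above could be sketched as follows. The class and method names are illustrative only: a token is issued to the operator out-of-band, must be presented to run the deployment, and is burned after a single use, so automation alone cannot replay the step.

```python
import secrets

class DeploymentGate:
    """Illustrative one-time-password gate for a deployment step."""

    def __init__(self):
        self._token = None

    def issue_token(self):
        """Generate a fresh token to hand to the human operator out-of-band."""
        self._token = secrets.token_urlsafe(16)
        return self._token

    def run_deployment(self, provided_token, deploy_fn):
        """Execute deploy_fn only if the operator supplies the live token."""
        if self._token is None or provided_token != self._token:
            return False
        self._token = None   # burn the token: it cannot be replayed
        deploy_fn()
        return True


gate = DeploymentGate()
token = gate.issue_token()
deployed = []
print(gate.run_deployment(token, lambda: deployed.append("v1")))  # True
print(gate.run_deployment(token, lambda: deployed.append("v2")))  # False: burned
```

The design choice is that the secret lives outside the pipeline: even an attacker who fully controls the automation cannot trigger the gated step without the out-of-band token.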
At a strategic/near-term level, consider approaches that can enhance and mature an organization’s capabilities in the cloud, such as:
- Update Entitlement Reviews: For on-prem workloads, entitlement review processes for applications and infrastructure happen in silos. In the cloud, the two are closely intertwined, which makes it essential to rethink how entitlements are evaluated for cloud deployments. At a minimum, expand entitlement reviews to include a workload-centric view: at the workload level, inventory all resources used, their purpose and the relevant roles. Fortunately, cloud makes this easy to automate, and some Cloud Security Posture Management tools likely already do a good job here.
- Application Security Programs: Do your existing risk assessment programs take into consideration that, in the cloud, the concept of an application is not limited to business logic and data? An application in the cloud is data + business logic + infrastructure-as-code. Do your checklists, processes and evaluation methods account for all these aspects as part of the application (and not just leave them to infrastructure teams)? If not, these gray areas will open up gaps like the one we saw with Capital One’s bucket-level resource policies.
- Rethink the Concept of an Application: With cloud and infrastructure-as-code, applications and workloads have become more closely married to the infrastructure on which they run. The notion of an application as solely the business logic and the data it processes serves well on-prem. However, unless one starts to include certain aspects of infrastructure in the organization’s concept of an application, architectures, tools and processes suitable for on-prem will more likely than not be replicated in the cloud. This creates a technology debt that will remain unpaid until the next big tech revolution.
This article does not reflect on detective and preventive controls beyond what’s shared above, simply because too many pieces of the puzzle are still missing to develop actionable lessons. I have also not touched on insider threat, as it is too early to say whether ‘inside’ knowledge was essential to the hack. It surely made the attack easier and swifter, but it was not necessarily an enabling factor.
I am hopeful we’ll eventually learn more details and have an opportunity to learn further. While it’s rough weather for Capital One, I hope they do right by their customers and bounce back into their mode of pioneering innovation in the banking sector.