In-depth look at the State of Secrets Sprawl report
In just one month, the cybersecurity industry has seen two major incidents make headlines. In mid-September, Uber, the ride-hailing company, confirmed reports that an attacker had accessed several critical internal systems. Pivotal in the hack was a PowerShell script containing hardcoded admin credentials for logging in to the firm’s privileged access management platform. From there, the attacker was able to retrieve credentials, or secrets, for a number of the company’s most valuable IT systems.
A few weeks later, Toyota, the giant carmaker, revealed that it had accidentally exposed a credential allowing access to customer data in a public GitHub repository for nearly five years. Although the company gave assurances that the key had been invalidated, any security expert would agree that an exposure this long could mean multiple malicious actors had already acquired access.
These two high-impact incidents had a particular resonance at GitGuardian: since our inception, we have been warning about the dangerous state of secrets sprawl, both inside corporate perimeters and on public-facing services such as GitHub. Awareness is building around this issue, but secrets protection often remains underprioritized compared to other classes of code vulnerabilities.
Earlier this year, in our annual research report, the State of Secrets Sprawl 2022, we reported that more than 6 million secrets were pushed to GitHub in 2021 alone, double the figure for 2020. On average, 3 commits out of 1,000 contained a credential, fifty percent more than the previous year. While the majority of open source projects hosted on GitHub are personal repositories, it is important to understand that a professional developer can very easily and inadvertently publish code that gives access to corporate resources.
It happens regularly, just as in the Toyota case. And we know that this space is closely monitored and actively exploited: our research indicates that a publicly exposed secret can be exploited within days, or even hours.
Secrets sprawl is the term used to describe the phenomenon of secrets not only becoming publicly visible but, more generally, appearing in cleartext in various places: source code, build scripts, infrastructure-as-code, logs, etc. Unlike traditional credentials, secrets are meant to be distributed to developers, machines, applications, and infrastructure systems. Adding more of any of these inexorably increases the number of secrets used in the development cycle, as well as the probability of a leak.
To make matters worse, secrets are even more easily found inside company assets that are not publicly exposed, as in the Uber case. Cleartext secrets in code are akin to security debt, accumulating over time when nothing is done to prevent them from leaking into internal repositories or other digital infrastructure. When scanning for secrets, organisations soon realise that there are far more hardcoded secrets than their application security teams can possibly handle.
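To make "scanning for secrets" concrete, here is a minimal sketch of a pattern-based scan over a piece of text. It is a toy heuristic written for illustration, not GitGuardian's detection engine: the rule names, the two patterns, and the `scan_text` helper are all assumptions made for this example, and a real detector would use many more rules and validate what it finds.

```python
import re

# Illustrative rules only (assumptions for this sketch, not a production rule set).
PATTERNS = {
    # AWS access key IDs commonly start with "AKIA" followed by
    # 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # A quoted literal assigned to a name that suggests a credential.
    "generic_assignment": re.compile(
        r"(?i)(?:password|passwd|secret|api_key|token)\w*\s*[:=]\s*[\"']([^\"']{4,})[\"']"
    ),
}


def scan_text(text):
    """Return (line_number, rule_name) pairs for every line matching a rule."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, rule_name))
    return findings


if __name__ == "__main__":
    # Hypothetical script content; the key below is a documentation-style
    # placeholder, not a live credential.
    sample = (
        'user = "admin"\n'
        '$password = "hunter2-admin"\n'
        'aws_key = "AKIAIOSFODNN7EXAMPLE"\n'
    )
    for line_number, rule_name in scan_text(sample):
        print(f"line {line_number}: {rule_name}")
```

Even a naive scan like this tends to surface findings quickly in an old codebase, which is where the volume problem described above begins.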
In the same report, we highlighted the fact that, on average, in 2021, a typical company with 400 developers would discover 1,050 unique hardcoded secrets when scanning its entire codebase, each with 13 different occurrences. That’s 212 unique secrets per security engineer on average! Investigating, triaging, and revoking these represents a massive amount of work, let alone digging through months or years of older incidents. Assuming that the typical remediation process for one secret averages two hours, we can approximate that a 400-developer shop would need at least 2,100 man-hours to reach “zero secrets-in-code”. And this is only if the leakage of secrets stops completely.
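The arithmetic behind these figures can be spelled out in a few lines. This is only a back-of-the-envelope sketch using the numbers quoted above; the function name and the 8-hour working day are assumptions added for illustration.

```python
def remediation_estimate(unique_secrets, occurrences_per_secret, hours_per_secret):
    """Return (total occurrences, total remediation hours) for a codebase."""
    total_occurrences = unique_secrets * occurrences_per_secret
    total_hours = unique_secrets * hours_per_secret
    return total_occurrences, total_hours


if __name__ == "__main__":
    # Figures from the report: 1,050 unique secrets, 13 occurrences each,
    # and an assumed two hours of remediation per unique secret.
    occurrences, hours = remediation_estimate(1050, 13, 2)
    print(occurrences)  # 13650 occurrences to locate and clean up
    print(hours)        # 2100 hours of remediation work
    # Assumption for illustration: an 8-hour working day.
    print(hours / 8)    # 262.5 working days
```

Note that the estimate counts two hours per unique secret, not per occurrence, so it is a lower bound on the clean-up effort.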
The problem is acute. Cybersecurity teams are taking hard-coded secrets in source code, and the risks they bring, very seriously. More generally, it’s the paradigm around source code security that is shifting. Companies around the globe, even those whose core business is not digital, are realizing that source code is one of their most valuable assets. Protecting it is not only a matter of ensuring business continuity; it is now also a matter of reputation and intellectual property, not to mention the possible cost of legal proceedings in case of a breach.
One thing is for sure: this trend is not going away any time soon. The everything-as-code movement and the commoditisation of digital infrastructure and services mean that software, and therefore code and secrets, will continue to be ever more present in businesses’ day-to-day activities. On the other side, attackers’ campaigns now focus on retrieving software practitioners’ credentials, since software development systems are less protected and monitored than production ones. Targeting any link in the software supply chain is the best way today to gain privileged access to the overall IT infrastructure.
To enable application security to adapt to these new challenges, it is essential to provide visibility into vulnerabilities at the individual, team, and organisational levels. Securing source code means going beyond traditional practices and exploring new ways to involve developers, security engineers, and operations more closely in detection, remediation, and prevention.
We are developing tools not only to empower this collaboration but also to innovate around secrets, for example by using them as an intrusion detection system, and to support the blurring of boundaries between Application Security and Cloud Security with Infrastructure-as-Code (IaC).
Learning to live with secrets sprawl and putting the right tools and resources in front of this problem now is a strategy that will pay off in the future. The question is not “could one of our company secrets be leaked and exploited?” but “do we have everything in place to detect and remediate this situation within a reasonable time?”.
It’s time to take action!