StepSecurity Detects CI/CD Supply Chain Attack in Microsoft’s Open-Source Project Azure Karpenter Provider in Real-Time

This case study describes how StepSecurity Harden-Runner detected a CI/CD supply chain attack in real time in Microsoft’s open-source project Azure Karpenter Provider.

Security

Industry: Technology
Runners: GitHub-Hosted

Introduction

Summary of the incident

On August 31, 2024, an independent security researcher demonstrated a successful supply chain attack on Azure Karpenter Provider, an open-source project maintained by Microsoft. The attack was made possible by a vulnerable GitHub Actions workflow. The researcher exploited the vulnerability and gained access to the workflow's GITHUB_TOKEN in a job that had the "id-token: write" permission. That permission allows a job to request an OpenID Connect (OIDC) token, so the stolen credentials could be used to access the cloud resources that the workflow had access to.

Because Azure Karpenter Provider was already using StepSecurity Harden-Runner, the attack was detected in real time. Harden-Runner provides network egress control and CI/CD infrastructure security for GitHub-hosted and self-hosted environments.

StepSecurity reported the detection to the Microsoft Security Response Center (MSRC) within an hour of the exploit. We are proud to share that StepSecurity has been acknowledged in the MSRC acknowledgment portal for detecting and reporting this issue. The portal recognizes individuals and companies who have contributed to making Microsoft’s online services safer by privately disclosing and assisting in remediating security vulnerabilities.

For an executive summary, see the video below.

What was the vulnerability?

Summary of the vulnerability

A GitHub Actions workflow in the Microsoft Azure Karpenter Provider project was misconfigured to download and run untrusted code in CI/CD with elevated permissions. Any GitHub user could have exploited this vulnerability to steal the workflow’s CI/CD credentials.

The vulnerability was in the e2e.yml reusable workflow, which checks out code using an explicit ref provided via an input parameter, and runs in a privileged context.

[Image: The vulnerable e2e.yml checks out code using an explicit ref provided via an input parameter]
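The pattern described above can be sketched roughly as follows. This is a hypothetical illustration, not the project's actual e2e.yml: the input name `git_ref`, the job name, and the `make e2etests` step are assumptions made for the example.

```yaml
# Hypothetical sketch of the vulnerable pattern (not the project's real file).
name: E2E
on:
  workflow_call:
    inputs:
      git_ref:              # assumed input name
        type: string
        required: true

permissions:
  id-token: write           # job can request an OIDC token for cloud login
  contents: read

jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.git_ref }}   # caller-supplied ref; may point at untrusted code
      - run: make e2etests             # assumed build step; executes whatever was checked out
```

The risk is the combination: the checked-out code is chosen by an input the workflow does not validate, and it then runs in a job that holds elevated permissions.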

This ref comes from an artifact created in a previous workflow. The resolve-args.yml reusable workflow is responsible for extracting the ref from the artifact.

[Image: The ref comes from an artifact created in a previous workflow]

The ApprovalComment workflow stores the commit ID in the artifact. That commit is later the one checked out by the privileged e2e.yml workflow, which is what allows an attacker’s code to run there.

[Image: The ApprovalComment workflow stores the commit ID into the artifact]
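A minimal sketch of how a workflow might hand a commit ID to later workflows through an artifact is shown below. The job name, file path, and artifact name are hypothetical, the trigger is omitted, and the `github.event.pull_request.head.sha` expression assumes a pull-request-related event; none of this is taken from the project's actual workflow.

```yaml
# Hypothetical job fragment; the workflow's `on:` trigger is intentionally omitted.
jobs:
  record-commit:
    runs-on: ubuntu-latest
    steps:
      - name: Save the commit ID for downstream workflows
        run: |
          mkdir -p /tmp/artifacts
          # Assumes a pull-request-related event payload.
          echo "${{ github.event.pull_request.head.sha }}" > /tmp/artifacts/commit_sha
      - uses: actions/upload-artifact@v4
        with:
          name: artifacts          # assumed artifact name
          path: /tmp/artifacts
```

Whatever this job writes becomes input to the privileged workflow, so the artifact's contents should be treated as untrusted data by anything that consumes it.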

The privileged e2e.yml workflow is triggered automatically when the ApprovalComment workflow completes successfully, as defined in the E2EMatrixTrigger workflow.

[Image: The privileged e2e.yml workflow is automatically triggered]
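The chaining described here is typically expressed with the `workflow_run` trigger. The sketch below is an assumption about how such a trigger workflow could look; the job names, the `git_ref` input, and the `git_ref` output of resolve-args.yml are hypothetical.

```yaml
# Hypothetical sketch of a trigger workflow; not the project's real file.
name: E2EMatrixTrigger
on:
  workflow_run:
    workflows: [ApprovalComment]
    types: [completed]

jobs:
  resolve-args:
    # Only continue when the upstream workflow succeeded.
    if: github.event.workflow_run.conclusion == 'success'
    uses: ./.github/workflows/resolve-args.yml
  e2e:
    needs: resolve-args
    uses: ./.github/workflows/e2e.yml
    with:
      git_ref: ${{ needs.resolve-args.outputs.git_ref }}   # assumed output name
```

Workflows started by `workflow_run` execute in the context of the base repository, with access to its secrets and permissions, even when the upstream run came from a fork. That is why data passed from the upstream run needs to be validated before it is used.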

How was the vulnerability exploited?

The exploit begins when the researcher creates a pull request from their fork and modifies the ApprovalComment workflow to store a commit ID of their choice into the artifact.

[Image: The researcher creates a pull request from their fork]

In the malicious commit, which is part of the pull request, the researcher added a command that runs a malicious script hosted on a gist.

[Image: The researcher adds a command to execute their malicious script]

The modified ApprovalComment workflow saves the malicious commit information in the artifact. You can see this in this build log.

[Image: The modified ApprovalComment workflow saves the malicious commit information in the artifact]

Finally, the vulnerable workflow checks out and runs code from the researcher’s forked repository. The malicious script, inserted earlier, is executed, demonstrating arbitrary code execution in a privileged context, as seen in this build log.

[Image: The vulnerable workflow checks out and runs code from the researcher's forked repository]

The researcher’s exploit code (the poc.sh file from the gist) is designed to exfiltrate credentials from the privileged workflow. The YOUR_EXFIL variable is set to an external domain, where the extracted data will be sent.

[Image: The researcher's exploit code is designed to exfiltrate credentials from the privileged workflow]

What could have happened in a real malicious attack?

This could have caused a Codecov-style software supply chain attack. An adversary could have exfiltrated CI/CD secrets to access the cloud environments that this workflow had access to.

How did StepSecurity detect the attack?

All workflows in this repository had been configured to use Harden-Runner in audit mode since January 2024.

[Image: All workflows were using StepSecurity Harden-Runner]
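For readers unfamiliar with the setup, audit mode is enabled by adding Harden-Runner as the first step of a job. The fragment below is a minimal sketch: the job name and the version tags are assumptions, and only the `egress-policy` setting reflects what the case study describes.

```yaml
# Minimal sketch of a job with Harden-Runner in audit mode.
jobs:
  e2e:                        # assumed job name
    runs-on: ubuntu-latest
    steps:
      - name: Harden Runner
        uses: step-security/harden-runner@v2
        with:
          egress-policy: audit   # record outbound traffic without blocking it
      - uses: actions/checkout@v4
```

In audit mode nothing is blocked. Outbound calls are recorded, which is what builds the baseline that later runs are compared against.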

This is the egress traffic from a non-malicious run of the e2e.yml workflow. As expected, the only outbound calls are made to api.github.com in the commit-status/start step, which are marked as allowed.

https://app.stepsecurity.io/github/Azure/karpenter-provider-azure/actions/runs/10532706648?jobid=29187384291&tab=network-events

[Image: Egress traffic from a non-malicious run]

When the exploit occurred, Harden-Runner detected suspicious outbound traffic. As you can see, there are now anomalous calls to gist.githubusercontent.com and the exfiltration domain n6uw1ivl7vxwuf43sas4hjs17sdj19py.oastify.com.

These calls were flagged as anomalous because they had never been observed in previous runs and were not part of the baseline. The call to gist.githubusercontent.com came from the researcher’s malicious commit, which ran the script hosted on a gist, and the exfiltration domain was the destination for secrets taken from the workflow.

This is the StepSecurity Insights page for the malicious run:

https://app.stepsecurity.io/github/Azure/karpenter-provider-azure/actions/runs/10651161331?jobid=29523417280&tab=network-events

[Image: Anomaly detection during the malicious run]

Remediation

To remediate the vulnerability, the developers first removed the approval comment trigger from the workflow configuration. This was the trigger that allowed the exploit to be executed once the approval comment workflow completed.

[Image: The developers removed the approval comment trigger]

Additionally, the developers configured Harden-Runner to operate in block mode instead of audit mode. In block mode, outbound traffic to any destination that is not explicitly allowed is blocked, which prevents exfiltration attempts like the one seen in the exploit.

[Image: Harden-Runner deployed in block mode]
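A block-mode configuration might look like the sketch below. The allowed endpoint is taken from the baseline described earlier (calls to api.github.com); a real job would likely need additional endpoints, and the job name and version tag are assumptions.

```yaml
# Minimal sketch of a job with Harden-Runner in block mode.
jobs:
  e2e:                        # assumed job name
    runs-on: ubuntu-latest
    steps:
      - name: Harden Runner
        uses: step-security/harden-runner@v2
        with:
          egress-policy: block     # deny outbound traffic not listed below
          allowed-endpoints: >
            api.github.com:443
```

With a policy like this, a call to an unlisted destination such as the exfiltration domain seen in the exploit would be denied rather than only recorded.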

Conclusion

A security researcher demonstrated a supply chain attack on Azure Karpenter Provider, detected by StepSecurity Harden-Runner, highlighting the need to secure CI/CD pipelines from emerging threats. We commend Azure Karpenter Provider maintainers for their vigilance and use of StepSecurity to protect workflows. Kudos to the researcher for ethically testing vulnerabilities, raising CI/CD security awareness, and enhancing the open-source ecosystem.
