PyTorch Supply Chain Compromise

The vulnerability in PyTorch’s CI/CD pipeline highlights the critical need for securing self-hosted runners

Ashish Kurmi

December 9, 2024

In the summer of 2023, security researchers Adnan Khan and John Stawinski IV uncovered a critical vulnerability in PyTorch's continuous integration and deployment (CI/CD) pipeline. Their independent research showed how a malicious actor could exploit this flaw to carry out a supply chain attack and compromise the integrity of the PyTorch framework. Their original write-up is listed under References.

Understanding the Vulnerability

The core issue stemmed from the use of self-hosted runners in GitHub Actions workflows within a public repository. Unlike GitHub-hosted runners, which run each job in a fresh virtual machine, self-hosted runners are machines managed by the repository owner. Unless they are configured as ephemeral, they persist between jobs, so anything one job leaves behind can affect later ones. In PyTorch's case, certain workflows in the public repository ran automatically on pull requests from forks, and maintainer approval was required only for first-time contributors, so anyone with a previously merged contribution could run code on the self-hosted runners.

*Gato (the GitHub Attack Toolkit) analyzed workflow files and logs, uncovering a self-hosted runner named worker-rocm-amd-30*

*Draft pull request demonstrating how workflows triggered by pull_request executed successfully, highlighting potential exploitation through unverified contributions*
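The exposure described above can be checked mechanically. The following Python sketch flags jobs in a workflow that fork pull requests can trigger and that request a self-hosted runner. It assumes the workflow file has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`); the function names and the `self_hosted_labels` parameter are illustrative, not part of any GitHub tool. It recognizes self-hosted runners only by label, so jobs that target them through custom labels alone need those labels passed in explicitly.

```python
from typing import Any, Iterable

# Events through which a fork contributor can cause a workflow to run.
FORK_PR_TRIGGERS = {"pull_request", "pull_request_target"}


def _as_set(value: Any) -> set[str]:
    """Normalize a string, list, or mapping of names to a set of names."""
    if isinstance(value, str):
        return {value}
    if isinstance(value, (list, dict)):
        return set(value)  # for a dict, the keys are the names
    return set()


def risky_jobs(workflow: dict[Any, Any],
               self_hosted_labels: Iterable[str] = ("self-hosted",)) -> list[str]:
    """Return IDs of jobs that fork PRs can trigger on a self-hosted runner."""
    # PyYAML follows YAML 1.1 and loads a bare `on:` key as the boolean True.
    triggers = _as_set(workflow.get("on", workflow.get(True)))
    if not triggers & FORK_PR_TRIGGERS:
        return []
    wanted = set(self_hosted_labels)
    flagged = []
    for job_id, job in workflow.get("jobs", {}).items():
        runs_on = job.get("runs-on")
        if isinstance(runs_on, dict):  # runs-on: {group: ..., labels: [...]}
            runs_on = runs_on.get("labels")
        if _as_set(runs_on) & wanted:
            flagged.append(job_id)
    return flagged
```

Running `risky_jobs` over every file in `.github/workflows/` produces a list of candidate jobs. A hit is not proof of exploitability, since the repository's approval settings still gate fork PRs, but it marks the places where those settings are the only safeguard.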

Breaking Down the Attack on PyTorch

Step 1: Fixing a Typo

To execute the attack, the researchers first needed contributor status on the PyTorch repository. Rather than investing time in a significant code change, they found a minor typo in a markdown file and submitted a pull request (PR) to fix it. Once that PR was merged, they were no longer first-time contributors, so workflows on their subsequent fork PRs could run without waiting for maintainer approval.
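Why a single merged typo fix matters can be seen in a simplified model of GitHub's fork pull request approval setting, which offers three levels: require approval for first-time contributors who are new to GitHub, for all first-time contributors, or for all outside collaborators. The Python sketch below is a hedged approximation of those rules as GitHub documents them, not GitHub's implementation; the enum and parameter names are hypothetical.

```python
from enum import Enum


class ForkPrApproval(Enum):
    """Simplified model of GitHub's fork pull request approval levels."""
    NEW_TO_GITHUB = "first-time contributors who are new to GitHub"
    FIRST_TIME = "first-time contributors"
    ALL_OUTSIDE = "all outside collaborators"


def needs_approval(policy: ForkPrApproval, *, is_member: bool,
                   has_merged_contribution: bool, new_to_github: bool) -> bool:
    """Return True if a fork PR's workflows wait for maintainer approval."""
    if is_member:
        # Repository or organization members are never gated by this setting.
        return False
    if policy is ForkPrApproval.ALL_OUTSIDE:
        return True
    if policy is ForkPrApproval.FIRST_TIME:
        # One merged commit or PR is enough to lift the gate.
        return not has_merged_contribution
    # NEW_TO_GITHUB: gated only if both first-time here and new to GitHub.
    return not has_merged_contribution and new_to_github
```

Under the first-time-contributor level, `needs_approval(ForkPrApproval.FIRST_TIME, is_member=False, has_merged_contribution=True, new_to_github=False)` returns `False`, which is the position the researchers reached once their typo fix was merged. Only the outside-collaborator level would still have held their workflows for review.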

Step 2: Crafting the Payload

With contributor status secured, the researchers created a malicious payload designed to gain persistence on the self-hosted runners. Traditional Command and Control (C2) methods such as reverse shells were ruled out because endpoint detection and response (EDR) tools, firewalls, and packet inspection could detect or block them.

To bypass these challenges, they developed the "Runner on Runner" (RoR) technique:

They installed a second self-hosted GitHub runner on the target server and registered it to their own private GitHub organization.
Because the rogue runner communicates with GitHub over HTTPS just like the legitimate one, its C2 traffic blended in with normal runner traffic.
A script hosted in a public GitHub gist automated the registration of the malicious runner.


*Malicious workflow configuration designed to execute payloads on multiple self-hosted runners across different operating systems*

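The payload was delivered through a workflow in a draft PR. Because GitHub does not automatically request code owner reviews on draft pull requests, the workflow executed without notifying maintainers.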
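Because RoR relies on an extra runner registration rather than an unusual network channel, one place defenders can look for it is the runner configuration on the host itself. The actions runner stores its registration in a `.runner` JSON file in its installation directory. This Python sketch assumes that file contains a `gitHubUrl` field naming the repository or organization the runner is registered to; that field name is an assumption to confirm on your own runners. It flags any configuration registered outside the expected organization, and `example-org` is a placeholder.

```python
import json
from pathlib import Path


def load_runner_config(path: Path) -> dict:
    """Read a runner's `.runner` registration file."""
    # "utf-8-sig" also accepts a leading byte-order mark, if one is present.
    return json.loads(path.read_text(encoding="utf-8-sig"))


def _is_under(url: str, base: str) -> bool:
    """True if `url` equals `base` or lies below it (case-insensitive)."""
    # The trailing slash stops ".../example-org" from matching ".../example-org-evil".
    return (url.rstrip("/") + "/").lower().startswith(base.rstrip("/").lower() + "/")


def foreign_runners(configs: dict[str, dict], allowed_bases: list[str]) -> list[str]:
    """Return the paths of runner configs registered outside `allowed_bases`."""
    flagged = []
    for path, cfg in configs.items():
        url = cfg.get("gitHubUrl", "")  # ASSUMPTION: field name in `.runner` files
        if not any(_is_under(url, base) for base in allowed_bases):
            flagged.append(path)
    return flagged
```

A caller would collect configs with something like `{str(p): load_runner_config(p) for p in Path("/opt").glob("*/.runner")}`, where the `/opt` layout is hypothetical. An attacker with root can hide a runner anywhere, so a host-side check like this is best effort; ephemeral runners remove the persistence it is looking for.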

Step 3: Gaining C2 and Privilege Escalation

The researchers executed the RoR payload on three of PyTorch's self-hosted runners. From their C2 repository, they confirmed remote code execution (RCE) by running diagnostic commands such as pwd and sudo -l. The output of sudo -l showed that the runner user could run commands as root, confirming their full control over the target systems.

*Command execution on jenkins-worker-rocm-amd-34 confirming stable C2 and root access during the exploitation process*
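The significance of `sudo -l` is that it lists what the current user may run through sudo; a rule such as `(ALL) NOPASSWD: ALL` means any command can be run as root without a password. A minimal Python sketch of that check, useful for auditing your own runner hosts (for example on the output of `sudo -n -l`), is below. It recognizes only the common forms of a blanket rule and ignores tags such as `SETENV:`, so treat it as a heuristic rather than a sudoers parser; the function name is hypothetical.

```python
import re

# Matches blanket rules such as "(ALL) NOPASSWD: ALL" or "(ALL : ALL) NOPASSWD: ALL".
_NOPASSWD_ALL = re.compile(
    r"^\(\s*(?:ALL|root)(?:\s*:\s*(?:ALL|root))?\s*\)\s*NOPASSWD:\s*ALL$"
)


def grants_passwordless_root(sudo_l_output: str) -> bool:
    """Return True if `sudo -l` output allows any command as root without a password."""
    return any(_NOPASSWD_ALL.match(line.strip()) for line in sudo_l_output.splitlines())
```

A CI runner account rarely needs unrestricted passwordless sudo; if a build needs a few privileged commands, listing them individually in sudoers limits what a compromised job inherits.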

Step 4: Stealing Secrets

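In CI/CD environments, the GITHUB_TOKEN is a critical secret. By default, actions/checkout persists this token in the repository's .git/config file (as an HTTP extraheader) so that later git commands in the job can authenticate, and the file stays on disk while the workflow runs, where anyone controlling the runner can read it. The researchers exploited workflows that used actions/checkout to obtain these tokens.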

*Detailed GITHUB_TOKEN permissions on the compromised runner 'jenkins-worker-rocm-amd-30,' highlighting extensive write access across various repository resources*

They used the RoR setup to extract GITHUB_TOKENs with write permissions from an ongoing workflow.

*Extracting GITHUB_TOKEN authorization header from the runner configuration, showcasing how secrets were exposed during the exploitation process*
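The exposure comes from how actions/checkout authenticates git: with its default `persist-credentials: true`, it writes an `extraheader` entry into `.git/config` whose value is an HTTP Basic `AUTHORIZATION` header carrying base64-encoded `x-access-token:<token>`. The Python sketch below scans `.git/config` text for such persisted headers and returns only the decoded username, never the token, so it can be used to audit runner workspaces; the function name is hypothetical.

```python
import base64
import re

# Matches lines such as: extraheader = AUTHORIZATION: basic <base64 credentials>
_EXTRAHEADER = re.compile(
    r"^\s*extraheader\s*=\s*AUTHORIZATION:\s*basic\s+(\S+)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def persisted_credential_users(git_config_text: str) -> list[str]:
    """Return usernames of basic-auth headers persisted in a .git/config.

    Only the username half is returned; the token itself is never exposed.
    """
    users = []
    for encoded in _EXTRAHEADER.findall(git_config_text):
        try:
            decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
        except ValueError:  # covers binascii.Error and UnicodeDecodeError
            continue
        users.append(decoded.split(":", 1)[0])
    return users
```

Setting `persist-credentials: false` on the checkout step keeps the token off disk when later steps do not need to push or fetch with it.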


They then used these tokens to delete the logs of their malicious PR workflow runs, covering their tracks and avoiding detection.

Additionally, they targeted PyTorch’s sensitive repository secrets:

By triggering specific workflows, they extracted GitHub Personal Access Tokens (PATs) and encrypted them to evade detection.

These tokens granted administrative access to over 93 repositories within the PyTorch organization.

Step 5: Modifying Repository Releases

Using the stolen GITHUB_TOKEN, the researchers demonstrated the potential to tamper with PyTorch releases. They:

Modified release metadata to showcase their exploit.

Highlighted how malicious actors could replace legitimate binaries with backdoored versions, compromising end-users who downloaded these artifacts.

*Modified PyTorch 2.0.1 release note demonstrating an unauthorized change, showcasing the potential impact of compromised GITHUB_TOKEN permissions on public repositories*
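One defensive consequence is that downstream users should pin and verify the exact artifacts they install. A minimal Python sketch of SHA-256 verification follows; the function names are hypothetical. Note its limit in this scenario: an attacker who can edit a release can usually edit a checksum published alongside it, so the expected digest has to come from an independent source, such as a lock file recorded before the compromise (pip's `--require-hashes` mode works this way) or a signature verified against a separate trust root.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large wheels need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> bool:
    """Compare a file's digest with a pinned hex value (case and whitespace tolerant)."""
    return hmac.compare_digest(sha256_file(path), expected_sha256.strip().lower())
```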

Step 6: Gaining AWS Access

Expanding their attack scope, the researchers targeted AWS credentials used in PyTorch's CI/CD workflows. They:

Compromised aws-pytorch-uploader-secret-access-key and aws-access-key-id belonging to the pytorchbot user.

*Compromised AWS credentials reveal the identity of the PyTorch bot user, highlighting the attacker’s access to critical cloud resources*


Gained write access to PyTorch’s S3 buckets, which contained sensitive artifacts like production releases.

*Listing the contents of the compromised PyTorch S3 bucket, revealing access to sensitive artifacts and production assets*


Verified that backdooring these artifacts could compromise users installing PyTorch directly from the PyTorch website.
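This article does not name the command behind the identity screenshot, but a standard way to learn whose AWS keys one is holding is `aws sts get-caller-identity`, which prints JSON with `UserId`, `Account`, and `Arn` fields. Incident responders who find leaked keys run the same check to scope the damage. The Python sketch below parses that output; the account ID and user name in the example are placeholders, not PyTorch's, and the class and function names are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CallerIdentity:
    account: str
    principal_type: str  # e.g. "user" or "assumed-role"
    name: str


def parse_caller_identity(output: str) -> CallerIdentity:
    """Parse the JSON printed by `aws sts get-caller-identity`."""
    data = json.loads(output)
    # ARN layout: arn:partition:service:region:account-id:resource
    _, _, _service, _region, account, resource = data["Arn"].split(":", 5)
    principal_type, _, name = resource.partition("/")
    return CallerIdentity(account=account, principal_type=principal_type, name=name)
```

For an IAM user such as the bot account described above, `principal_type` comes back as `"user"`, which tells a responder the keys are long-lived credentials that must be rotated rather than short-lived role sessions.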

Implications and Risk

Arbitrary Code Execution: Malicious actors can exploit CI/CD workflows to execute arbitrary code on self-hosted runners, enabling the injection of backdoors or tampering with software artifacts.

Secrets and Token Theft: Exposure of sensitive credentials, such as GitHub PATs and AWS keys, grants attackers unauthorized access to repositories, cloud environments, and sensitive systems, posing significant security and data breach risks.

Compromised Software Releases: Attackers can modify or replace release artifacts, leading to the distribution of backdoored binaries that compromise end-users and downstream projects relying on PyTorch.

Unauthorized Workflow Execution: Exploiting lenient workflow approval settings allows attackers to bypass repository safeguards and trigger workflows with malicious payloads, increasing the risk of persistent threats.

Escalation to Organizational Repositories: Stolen tokens with administrative privileges can enable attackers to access and compromise additional repositories within an organization, further propagating the attack across the supply chain.

Preventing Future Attacks

StepSecurity provides essential protection for GitHub Actions workflows with three key capabilities:

Credential Exfiltration Prevention: StepSecurity’s Harden-Runner enhances GitHub Actions runners by implementing network egress control and runtime security. This ensures that sensitive data, such as GITHUB_TOKENs and other secrets, cannot be exfiltrated by malicious code. Harden-Runner would have detected the outbound traffic used to steal credentials in the PyTorch attack, securing tokens and other secrets. It has already proven effective in safeguarding projects like Google’s open-source Flank; see the StepSecurity-Flank case study for details.

Vulnerability Scanning and Forensic Capabilities: StepSecurity provides advanced forensic capabilities, enabling visibility into all process executions and outbound network calls across your CI/CD pipelines. This ensures organizations can detect anomalies, investigate malicious activity, and mitigate potential exploitation risks, enhancing overall pipeline security and resilience.

Restrict Permissions on GitHub Tokens: StepSecurity enables you to audit and manage GitHub tokens in your organization. It identifies tokens with excessive permissions, like those with read/write access, that pose potential security risks. By applying the principle of least privilege, StepSecurity reduces the attack surface significantly.
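Least privilege for the GITHUB_TOKEN is expressed through the workflow `permissions` key, which can be set at the workflow or job level to per-scope values (`read`, `write`, `none`) or to `read-all`/`write-all`. The following Python sketch is generic and is not StepSecurity's implementation. It audits a parsed workflow dict for write grants and for jobs that fall back to the repository's default token permissions; the function names and finding strings are hypothetical.

```python
from typing import Any

_MISSING = object()  # distinguishes "key absent" from an explicit empty value


def _write_scopes(perms: Any) -> list[str]:
    """Return the scopes a `permissions` value grants write access to."""
    if perms == "write-all":
        return ["*"]
    if isinstance(perms, dict):
        return sorted(name for name, level in perms.items() if level == "write")
    return []  # "read-all", {} and None grant no write access


def token_permission_findings(workflow: dict[str, Any]) -> list[str]:
    """List GITHUB_TOKEN write grants and jobs relying on default permissions."""
    findings = []
    top = workflow.get("permissions", _MISSING)
    if top is not _MISSING:
        findings += [f"workflow: {scope}: write" for scope in _write_scopes(top)]
    for job_id, job in workflow.get("jobs", {}).items():
        perms = job.get("permissions", _MISSING)
        if perms is _MISSING:
            if top is _MISSING:
                # Neither level sets permissions, so the repository default applies.
                findings.append(f"job {job_id}: uses the repository's default token permissions")
            continue  # otherwise the job inherits the workflow-level block
        findings += [f"job {job_id}: {scope}: write" for scope in _write_scopes(perms)]
    return findings
```

A common pattern is a read-only workflow-level block with `write` granted only to the specific job that needs it, which is exactly what this audit leaves unflagged except for that one job.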

Summary

This incident underscores the risks posed by misconfigured self-hosted runners in GitHub Actions workflows. While these workflows are essential to modern CI/CD pipelines, they can introduce vulnerabilities if not properly secured. Strengthening runner security, enforcing strict workflow permissions, and using tools like StepSecurity’s Harden-Runner are crucial steps in protecting against supply chain attacks and safeguarding critical development pipelines.

References

https://johnstawinski.com/2024/01/11/playing-with-fire-how-we-executed-a-critical-supply-chain-attack-on-pytorch/

https://www.stepsecurity.io/blog/analysis-of-backdoored-xz-utils-build-process-with-harden-runner