In the summer of 2023, security researchers Adnan Khan and John Stawinski IV uncovered a critical vulnerability in PyTorch's continuous integration and deployment (CI/CD) pipeline. Their independent research demonstrated how this flaw could enable malicious actors to execute a supply chain attack, potentially compromising the integrity of the PyTorch framework.
Understanding the Vulnerability
The core issue stemmed from the use of self-hosted runners in GitHub Actions workflows within a public repository. Unlike GitHub's own runners, self-hosted runners are managed by the repository owners and can introduce security risks if not properly configured. In PyTorch's case, certain workflows in the public repository were configured to run automatically on pull requests from external contributors, without adequate security checks.
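The risky combination described above (a pull request trigger together with a self-hosted runner) can be spotted with a simple check. The following is a minimal sketch, not a real auditing tool: it scans workflow text line by line instead of parsing YAML, the function name is_risky_workflow is hypothetical, and it only recognizes runners that carry the literal self-hosted label, so runners selected through custom labels alone would be missed.

```python
import re

# Event names that can fire for pull requests, including those from forks.
PR_TRIGGERS = ("pull_request", "pull_request_target")
TRIGGER_RE = re.compile(r"\b(?:" + "|".join(PR_TRIGGERS) + r")\b")


def is_risky_workflow(workflow_text: str) -> bool:
    """Heuristic: True if the text mentions a pull request trigger
    and the 'self-hosted' runner label (hypothetical helper)."""
    has_pr_trigger = False
    uses_self_hosted = False
    for raw_line in workflow_text.splitlines():
        # Drop YAML comments so commented-out settings are ignored.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        if TRIGGER_RE.search(line):
            has_pr_trigger = True
        if "self-hosted" in line:
            uses_self_hosted = True
    return has_pr_trigger and uses_self_hosted
```

A workflow flagged by a check like this is not necessarily exploitable, but it is worth reviewing alongside the repository's approval settings for pull requests from forks.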
Breaking Down the Attack on PyTorch
Step 1: Fixing a Typo
To execute the attack, the researchers first needed to become contributors to the PyTorch repository. Rather than invest time in a significant code change, they found a minor typo in a markdown file and submitted a pull request (PR) to fix it. Once the fix was merged, they counted as previous contributors, so workflows triggered by their later pull requests could run without waiting for maintainer approval.
Step 2: Crafting the Payload
With contributor access secured, the researchers created a malicious payload designed to gain persistence on the self-hosted runners. They ruled out traditional Command and Control (C2) methods such as reverse shells, because defenses like endpoint detection and response (EDR), firewalls, and packet inspection would likely detect or block them.
To bypass these challenges, they developed the "Runner on Runner" (RoR) technique:
- They installed another self-hosted GitHub runner on the target server, attaching it to their private GitHub organization.
- This method leveraged the existing runner's infrastructure, effectively camouflaging the attack as legitimate traffic.
- A script, stored in a public GitHub gist, automated the registration of the malicious runner.
The payload was embedded in a draft PR workflow, ensuring it executed without notifying maintainers.
Step 3: Gaining C2 and Privilege Escalation
The researchers executed the RoR payload on three of PyTorch's self-hosted runners. Using their C2 repository, they confirmed remote code execution (RCE) by running diagnostic commands like pwd and sudo -l. The latter confirmed they had root access on the runners, solidifying their control over the target systems.
Step 4: Stealing Secrets
In CI/CD environments, secrets such as the GITHUB_TOKEN are high-value targets because they authorize actions against the repository. The researchers exploited workflows that use actions/checkout, which by default persists the token in the checked-out repository's .git/config file for the duration of the job:
- They used the RoR setup to extract GITHUB_TOKENs with write permissions from an ongoing workflow.
- These tokens were utilized to delete logs of their malicious PR workflows, ensuring stealth and preventing detection.
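To make the exposure concrete, the sketch below checks git config text for a credential persisted by actions/checkout. It assumes the credential is stored as an http extraheader entry of the form "AUTHORIZATION: basic <base64>", which is how actions/checkout writes it by default. The function name find_persisted_credentials is hypothetical, and the token is masked in the result. The point it illustrates is that the value is only base64-encoded, not encrypted, so any process on the runner that can read the file can recover it.

```python
import base64
import re

# Matches the extraheader line that carries the persisted credential.
EXTRAHEADER_RE = re.compile(
    r"^\s*extraheader\s*=\s*AUTHORIZATION:\s*basic\s+(\S+)\s*$",
    re.IGNORECASE,
)


def find_persisted_credentials(git_config_text: str) -> list[str]:
    """Return 'user:****' entries for each persisted credential found
    (hypothetical audit helper; the token itself is masked)."""
    found = []
    for line in git_config_text.splitlines():
        match = EXTRAHEADER_RE.match(line)
        if not match:
            continue
        try:
            decoded = base64.b64decode(match.group(1), validate=True).decode("utf-8")
        except ValueError:
            # Not valid base64 or not valid UTF-8: skip this line.
            continue
        user, _, token = decoded.partition(":")
        if token:
            found.append(f"{user}:{'*' * len(token)}")
    return found
```

Setting the persist-credentials input of actions/checkout to false keeps the token out of .git/config, which removes this particular source of tokens for code running later on the same runner.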
Additionally, they targeted PyTorch’s sensitive repository secrets:
- By triggering specific workflows, they extracted GitHub Personal Access Tokens (PATs) and encrypted them to evade detection.
- These tokens granted administrative access to over 93 repositories within the PyTorch organization.
Step 5: Modifying Repository Releases
Using the stolen GITHUB_TOKEN, the researchers demonstrated the potential to tamper with PyTorch releases. They:
- Modified release metadata to showcase their exploit.
- Highlighted how malicious actors could replace legitimate binaries with backdoored versions, compromising end-users who downloaded these artifacts.
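One way consumers can reduce the impact of a replaced binary is to compare a downloaded artifact against a digest obtained from a separate, trusted channel. The sketch below shows that comparison with SHA-256. The function names are hypothetical, and the check is not something the source says PyTorch publishes or requires. It also only helps if the attacker cannot change the expected digest as well.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def artifact_matches(data: bytes, expected_sha256: str) -> bool:
    """Compare an artifact's digest with an expected hex digest
    (hypothetical helper; tolerates case and surrounding whitespace)."""
    expected = expected_sha256.strip().lower()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(sha256_hex(data), expected)
```

For large files, the same idea applies with the file read in chunks and fed to the hash object incrementally.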
Step 6: Gaining AWS Access
Expanding their attack scope, the researchers targeted AWS credentials used in PyTorch's CI/CD workflows. They:
- Compromised aws-pytorch-uploader-secret-access-key and aws-access-key-id belonging to the pytorchbot user.
- Gained write access to PyTorch’s S3 buckets, which contained sensitive artifacts like production releases.
- Verified that backdooring these artifacts could compromise users installing PyTorch directly from the PyTorch website.
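The damage a leaked CI key can do depends on what its policy allows. As a rough illustration, the sketch below flags statements in a standard AWS IAM policy document that allow S3 writes on a wildcard resource. The policies used with it are invented examples, the function name is hypothetical, and the set of actions checked is a small assumption-based sample, not a complete list. The source does not describe the actual policy attached to the pytorchbot user.

```python
import json

# Small, illustrative sample of actions that permit modifying S3 objects.
WRITE_ACTIONS = {"*", "s3:*", "s3:putobject", "s3:deleteobject"}


def _as_list(value):
    """IAM allows a single item or a list in several fields."""
    return value if isinstance(value, list) else [value]


def broad_s3_write_statements(policy_json: str) -> list[dict]:
    """Return Allow statements granting S3 write on a wildcard resource
    (hypothetical helper, heuristic only)."""
    policy = json.loads(policy_json)
    flagged = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = {a.lower() for a in _as_list(stmt.get("Action", []))}
        resources = _as_list(stmt.get("Resource", []))
        grants_write = bool(actions & WRITE_ACTIONS)
        # "*" or "arn:aws:s3:::*" covers every bucket in the account.
        wildcard = any(r == "*" or r.endswith(":::*") for r in resources)
        if grants_write and wildcard:
            flagged.append(stmt)
    return flagged
```

Scoping upload credentials to the specific bucket and prefix a job needs limits what an attacker can overwrite if the key leaks from a runner.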
Implications and Risk
- Arbitrary Code Execution: Malicious actors can exploit CI/CD workflows to execute arbitrary code on self-hosted runners, enabling the injection of backdoors or tampering with software artifacts.
- Secrets and Token Theft: Exposure of sensitive credentials, such as GitHub PATs and AWS keys, grants attackers unauthorized access to repositories, cloud environments, and sensitive systems, posing significant security and data breach risks.
- Compromised Software Releases: Attackers can modify or replace release artifacts, leading to the distribution of backdoored binaries that compromise end-users and downstream projects relying on PyTorch.
- Unauthorized Workflow Execution: Exploiting lenient workflow approval settings allows attackers to bypass repository safeguards and trigger workflows with malicious payloads, increasing the risk of persistent threats.
- Escalation to Organizational Repositories: Stolen tokens with administrative privileges can enable attackers to access and compromise additional repositories within an organization, further propagating the attack across the supply chain.
Preventing Future Attacks
StepSecurity provides essential protection for GitHub Actions workflows with three key capabilities:
- Credential Exfiltration Prevention: StepSecurity’s Harden-Runner enhances GitHub Actions runners by implementing network egress control and runtime security. This ensures that sensitive data, like GITHUB_TOKENs and secrets, cannot be exfiltrated by malicious code. Harden-Runner would have detected the outbound traffic used to steal credentials in the PyTorch attack, securing tokens and other secrets. It has already proven effective in safeguarding projects like Google’s open-source Flank, as described in the StepSecurity-Flank case study.
- Vulnerability Scanning and Forensic Capabilities: StepSecurity provides advanced forensic capabilities, enabling visibility into all process executions and outbound network calls across your CI/CD pipelines. This ensures organizations can detect anomalies, investigate malicious activity, and mitigate potential exploitation risks, enhancing overall pipeline security and resilience.
- Restrict Permissions on GitHub Tokens: StepSecurity enables you to audit and manage GitHub tokens in your organization. It identifies tokens with excessive permissions, like those with read/write access, that pose potential security risks. By applying the principle of least privilege, StepSecurity reduces the attack surface significantly.
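The least-privilege idea in the last item can be illustrated with a small check on workflow files. This is a minimal sketch under stated assumptions, not a description of how StepSecurity audits tokens: the function names are hypothetical, and the check is a line-based heuristic, not a YAML parser. It flags a workflow that either has no top-level permissions key, in which case the GITHUB_TOKEN falls back to the repository or organization default, or sets permissions to write-all.

```python
import re

# A top-level key starts in column 0 of the workflow file.
TOP_LEVEL_PERMISSIONS_RE = re.compile(r"^permissions\s*:")
WRITE_ALL_RE = re.compile(r"^\s*permissions\s*:\s*write-all\b")


def declares_top_level_permissions(workflow_text: str) -> bool:
    """True if the workflow sets permissions at the top level."""
    return any(TOP_LEVEL_PERMISSIONS_RE.match(line)
               for line in workflow_text.splitlines())


def needs_review(workflow_text: str) -> bool:
    """Flag workflows with no top-level permissions or with write-all
    at any level (hypothetical helper, heuristic only)."""
    grants_write_all = any(WRITE_ALL_RE.match(line)
                           for line in workflow_text.splitlines())
    return grants_write_all or not declares_top_level_permissions(workflow_text)
```

A workflow that passes this check can still grant more than it needs at the job level, so the result is a starting point for review, not a verdict.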
Summary
This incident underscores the risks posed by misconfigured self-hosted runners in GitHub Actions workflows. While these workflows are essential to modern CI/CD pipelines, they can introduce vulnerabilities if not properly secured. Strengthening runner security, enforcing strict workflow permissions, and using tools like StepSecurity’s Harden-Runner are crucial steps in protecting against supply chain attacks and safeguarding critical development pipelines.