CI Worker Threat Model

This document describes the security boundary for docstore's CI execution environment and the mitigations in place to limit the blast radius of malicious user code.

Trust boundary

The ci-worker binary is trusted. Everything that runs after it hands execution to BuildKit — user-defined build steps executing inside the Kata CLH microVM — is treated as adversarial.

ci-worker binary (trusted)
  ├── claims job from ci-scheduler (K8s SA proof)
  ├── fetches ci.yaml via request_token
  ├── obtains presigned archive URL via request_token
  └── hands off to BuildKit ← trust boundary
        └── user build steps (untrusted)
              ├── reads request_token from /run/secrets/
              └── has host network namespace (--oci-worker-net=host)

Credentials available inside the VM

| Credential | How obtained | Notes |
| --- | --- | --- |
| request_token | BuildKit secret mount at /run/secrets/docstore_oidc_request_token | Readable by any build step |
| OIDC token URL | BuildKit secret mount at /run/secrets/docstore_oidc_request_url | Needed to exchange request_token for JWT |
| GCP metadata server | Plain HTTP to 169.254.169.254 | See mitigations below |
| Docker daemon | tcp://localhost:2375, unauthenticated; DOCKER_HOST is set | Gives full container control within the VM |
| Cluster-internal network | --oci-worker-net=host gives build containers the VM's network namespace | Can reach cluster services |

What the request_token can do

The request_token is a short-lived opaque token bound to a single CI job. It is accepted by endpoints on the docstore server and the ci-scheduler. All docstore endpoints enforce that job.Repo matches the URL path repo:

| Server | Endpoint | Purpose |
| --- | --- | --- |
| docstore | POST /repos/{repo}/-/archive/presign | Get presigned source archive URL |
| docstore | POST /repos/{repo}/-/check/{name}/logs | Upload check run log content |
| docstore | GET /repos/{repo}/-/ci/config | Fetch .docstore/ci.yaml for the job's branch/sequence |
| docstore | POST /repos/{repo}/-/check | Report check run status |
| ci-scheduler | POST /jobs/{id}/heartbeat | Keep job alive (cluster-internal only) |
| ci-scheduler | POST /jobs/{id}/complete | Report job completion (cluster-internal only) |

The ci-scheduler endpoints are only reachable from within the cluster (ci-scheduler.docstore-ci.svc.cluster.local). Both validate the request_token and enforce that the token's job ID matches the URL {id}.
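The two binding checks described above can be sketched as follows. This is a minimal illustration, not the server's actual code: the `Job` type, the function names, and the example IDs are hypothetical, and token validation itself (expiry, lookup) is assumed to have already happened.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical view of the job a validated request_token is bound to."""
    id: str
    repo: str


def docstore_request_allowed(job: Job, url_repo: str) -> bool:
    """docstore endpoints: job.Repo must equal the repo in the URL path."""
    return job.repo == url_repo


def scheduler_request_allowed(job: Job, url_job_id: str) -> bool:
    """ci-scheduler endpoints: the token's job ID must equal the URL {id}."""
    return job.id == url_job_id


job = Job(id="job-123", repo="acme/repo-a")
print(docstore_request_allowed(job, "acme/repo-a"))  # True: own repo
print(docstore_request_allowed(job, "acme/repo-b"))  # False: another repo
print(scheduler_request_allowed(job, "job-456"))     # False: another job
```

Both checks are exact equality comparisons, so a stolen request_token is only useful against the job and repo it was issued for.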

The request_token can also be exchanged at the ci-oidc endpoint for a short-lived OIDC JWT. The audience determines what the JWT can access:

  • aud=ci-registry — authenticate to the BuildKit layer cache registry
  • aud=docstore — authenticate to the docstore API (see below)

OIDC JWT (aud=docstore) permissions

The OIDC JWT is validated by the docstore server. After validation, the request is checked against an allowlist before reaching the inner API mux:

  1. The URL path repo must match jobID.Repo — no cross-repo access.
  2. The endpoint must be permitted by the job's declared permissions.

Default permissions (no permissions: block in ci.yaml): checks: write only, which allows POST /repos/{own-repo}/-/check.
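As a rough sketch of how such an allowlist gate might behave under the default permissions: the only permission-to-endpoint mapping taken from this document is `checks: write` allowing `POST /repos/{own-repo}/-/check`. The table layout, function name, and path parsing below are assumptions for illustration and do not reflect the real implementation.

```python
# Hypothetical allowlist table. Only the checks entry is documented here;
# the endpoints behind other permissions are intentionally left out.
ALLOWLIST = {
    "checks": [("POST", "/-/check")],
}

# Applied when ci.yaml has no permissions: block.
DEFAULT_PERMISSIONS = {"checks": "write"}


def jwt_request_allowed(job_repo: str, permissions: dict,
                        method: str, path: str) -> bool:
    """Apply the two gate rules: repo match first, then permission check."""
    prefix = f"/repos/{job_repo}"
    # Rule 1: the URL path repo must be exactly the job's repo.
    if not path.startswith(prefix + "/-/"):
        return False
    endpoint = path[len(prefix):]
    # Rule 2: some declared write permission must allow this endpoint.
    return any(
        level == "write" and (method, endpoint) in ALLOWLIST.get(name, [])
        for name, level in permissions.items()
    )


print(jwt_request_allowed("acme/repo-a", DEFAULT_PERMISSIONS,
                          "POST", "/repos/acme/repo-a/-/check"))  # True
print(jwt_request_allowed("acme/repo-a", DEFAULT_PERMISSIONS,
                          "POST", "/repos/acme/repo-b/-/check"))  # False
```

Checking the repo prefix together with the `/-/` separator avoids treating `acme/repo-a-evil` as a match for `acme/repo-a`.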

Elevated permissions can be declared in .docstore/ci.yaml:

permissions:
  contents: write    # commit, branch, merge, rebase, purge
  proposals: write   # open proposals, post reviews/comments
  issues: write      # create/close/comment on issues
  releases: write    # create/delete releases
  ci: write          # trigger CI runs on own repo

Permissions are evaluated at job dispatch time, not at request time. For proposal (PR) jobs, permissions are read from the target branch (base branch) ci.yaml, not the source branch. A PR cannot grant itself elevated permissions — they only take effect after the permission change is reviewed and merged. See ci.md for details.

The on: trigger filter (which controls whether CI runs at all for a given proposal) is evaluated from the source branch ci.yaml. A PR author can therefore suppress or expand which base branches trigger CI on their branch, but this does not affect permissions, which remain base-branch-gated.
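The split between the two branches can be illustrated with a small sketch. It assumes both ci.yaml files have already been parsed into dictionaries; the function name and the example `on:` value are hypothetical.

```python
def effective_job_settings(base_ci: dict, source_ci: dict) -> dict:
    """Combine ci.yaml from both branches for a proposal job (illustrative)."""
    return {
        # Permissions always come from the base (target) branch, falling
        # back to the documented default when no permissions: block exists.
        "permissions": base_ci.get("permissions", {"checks": "write"}),
        # The on: trigger filter comes from the source branch.
        "on": source_ci.get("on"),
    }


# A PR that tries to grant itself contents: write.
base = {}  # base branch ci.yaml has no permissions: block
source = {"permissions": {"contents": "write"}, "on": {"hypothetical": "filter"}}

settings = effective_job_settings(base, source)
print(settings["permissions"])  # {'checks': 'write'}
```

The source branch's `permissions:` block is never read for the job, which is why the elevated grant only takes effect after it is merged into the base branch.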

GCP metadata server

The entrypoint-worker.sh script adds an iptables rule that drops outbound traffic to 169.254.169.254 before buildkitd and dockerd start:

iptables -I OUTPUT -d 169.254.169.254 -j DROP

This is not a hard security boundary. Because build steps run with host networking and the Docker daemon is unauthenticated, a sufficiently motivated build step can remove this rule and reach the metadata server:

docker run --net=host --cap-add=NET_ADMIN alpine \
  sh -c "iptables -F OUTPUT && curl -H 'Metadata-Flavor: Google' \
  http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"

The real security boundary is the ci-runner GCP service account's IAM grants, which are intentionally minimal (see below). The iptables rule is defense-in-depth.

ci-runner GCP service account permissions

The ci-runner@dlorenc-chainguard.iam.gserviceaccount.com SA is bound to the ci-worker Kubernetes service account via Workload Identity. Its grants are:

| Scope | Role | Rationale |
| --- | --- | --- |
| Project | roles/artifactregistry.reader | Pull the ci-worker container image |

No other project-level roles. No bucket-level grants.

Notably absent, and intentionally so:

  • No roles/cloudsql.client: ci-worker talks to ci-scheduler over HTTP; it never connects to the database directly.
  • No GCS access: log writes go through the docstore server's request_token-gated endpoint; ci-worker has no direct GCS dependency.

ci-registry cache access

The BuildKit layer cache registry uses a separate SA (ci-registry@dlorenc-chainguard.iam.gserviceaccount.com) with roles/storage.objectAdmin on the cache bucket. Access is scoped at two levels:

  1. Org-level: the OIDC JWT audience ci-registry is required.
  2. Repo-level: auth.go enforces exact repo equality — a token for acme/repo-a can only push/pull acme/repo-a:* refs, not acme/repo-b:*.
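The two scoping levels can be sketched roughly as below. This is not the code in auth.go: the function name is hypothetical, and refs are assumed to have the simple `repo:tag` shape used in the example above, with no registry host or port.

```python
def registry_ref_allowed(token_audience: str, token_repo: str, ref: str) -> bool:
    """Check a cache ref against the token's audience and repo (illustrative)."""
    # Level 1: the JWT must have been issued for the ci-registry audience.
    if token_audience != "ci-registry":
        return False
    # Level 2: exact repo equality, not a prefix match.
    repo, _, _tag = ref.partition(":")
    return repo == token_repo


print(registry_ref_allowed("ci-registry", "acme/repo-a", "acme/repo-a:cache"))  # True
print(registry_ref_allowed("ci-registry", "acme/repo-a", "acme/repo-b:cache"))  # False
print(registry_ref_allowed("docstore", "acme/repo-a", "acme/repo-a:cache"))     # False
```

Comparing the full repo string for equality, rather than testing a prefix, keeps a token for `acme/repo-a` from matching a sibling such as `acme/repo-a-fork`.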

K8s service account token

The K8s SA token for the ci-worker pod is used to claim jobs from ci-scheduler (k8sproof validation). The scheduler enforces one-claim-per-pod: once a pod has claimed a job, its SA token cannot be used to claim another. A malicious build step that steals the SA token and calls /claim will receive a rejection.
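A minimal in-memory sketch of the one-claim-per-pod rule follows. It assumes the scheduler has already validated the SA token and extracted a stable pod identifier from it; the class name, `pod_uid` parameter, and in-memory storage are hypothetical stand-ins for whatever the scheduler actually persists.

```python
class ClaimRegistry:
    """Tracks which pods have already claimed a job (illustrative only)."""

    def __init__(self) -> None:
        self._claimed_by_pod: dict[str, str] = {}

    def claim(self, pod_uid: str, job_id: str) -> bool:
        """Record the claim, or reject it if this pod has claimed before."""
        if pod_uid in self._claimed_by_pod:
            return False
        self._claimed_by_pod[pod_uid] = job_id
        return True


registry = ClaimRegistry()
print(registry.claim("pod-abc", "job-1"))  # True: first claim by this pod
print(registry.claim("pod-abc", "job-2"))  # False: replayed SA token is rejected
```

Because ci-worker claims its job before handing off to BuildKit, any later claim made with the same pod's token falls into the rejected branch.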

What is NOT reachable

  • Other tenants' request_tokens or source archives — separate Kata VMs, no state sharing between jobs
  • The OIDC JWT signing key — lives in GCP KMS, never touches the VM
  • Cross-repo API operations — enforced at the OIDC JWT allowlist gate
  • Other tenants' presigned archive URLs — job.Repo == URL repo enforced in the presign handler
  • Cross-org ci-registry operations — enforced in auth.go
  • Cloud SQL — ci-runner SA has no cloudsql.client grant
  • Other tenants' build logs — ci-runner SA has no GCS grants; log access goes through the docstore server which enforces repo-level authorization

Residual risks and future work

  • iptables bypass: a privileged build step with Docker daemon access can remove the metadata server block. Mitigated by minimal SA permissions. Long-term fix: run buildkitd/dockerd as a separate less-privileged process, or use a network policy at the Kata VM level.
  • Cluster-internal network: host networking gives build steps access to cluster services. The ci-scheduler and docstore server reject requests that lack a valid credential, so arbitrary cluster workloads cannot call them, but reachability itself is not restricted. This is worth hardening with NetworkPolicy.
  • Cache poisoning within the same org: repo-level scoping in ci-registry prevents cross-repo cache poisoning. Beyond that scoping, cache integrity relies on BuildKit's content-addressable layer verification.
