CI System
DocStore has a built-in CI system. When a commit is pushed to any branch, the server emits a com.docstore.commit.created webhook event. The CI system picks up the event, fetches the repo source, and runs the checks defined in .docstore/ci.yaml on main.
Architecture
docstore server
│ POST /subscriptions webhook
│ X-DocStore-Signature: sha256=<hmac>
▼
ci-scheduler (GKE, 1 replica)
│ INSERT ci_jobs (status=queued)
│ reap stale claimed jobs every 30s
│ POST /claim ← K8s SA token auth + pod provenance checks
│ POST /jobs/{id}/heartbeat ← request_token auth
│ POST /jobs/{id}/complete ← request_token auth
▼
ci-worker pods (GKE, KEDA ScaledJob, Kata CLH microVMs)
│ POST /claim with K8s SA token
│ receive job + request_token + oidc_token_url
│ exchange request_token → OIDC JWT
│ POST /archive/presign with request_token → presigned URL
│ fetch .docstore/ci.yaml (JWT auth)
│ run checks via BuildKit LLB (presigned source URL, no credentials)
│ upload logs to GCS
│ POST /repos/{repo}/-/check (JWT auth)
│ POST /jobs/{id}/complete (request_token)
└─ exit (KEDA creates a fresh pod for the next job)
Components
ci-scheduler (cmd/ci-scheduler)
A lightweight HTTP service that:
- Receives webhook deliveries from the docstore outbox at
POST /webhook. - Verifies the HMAC-SHA256 signature (
X-DocStore-Signature: sha256=<hex>). - Parses
com.docstore.commit.createdevents and inserts a row intoci_jobs. - Serves job status at
GET /run/{id}. - Proxies live logs at
GET /run/{id}/logs/{check}— either reverse-proxying to the worker pod (while the job is claimed) or redirecting to the GCS log URL (after completion). - Manual CI triggers are handled by the main docstore server at
POST /repos/:name/-/ci/run(IAP-protected, writer+ RBAC). - Runs a stale-job reaper every 30 seconds to reclaim jobs that have missed their heartbeat.
- Serves
POST /claim— validates K8s projected SA tokens with pod provenance checks, claims the next queued job, and returns arequest_token. - Serves
POST /jobs/{id}/heartbeat— request_token authenticated; updates the job heartbeat. - Serves
POST /jobs/{id}/complete— request_token authenticated; marks the job passed or failed. - Serves
GET /queue-depth— returns the count of queued jobs for the KEDA autoscaler.
Flags / environment variables:
| Flag | Env var | Description |
|---|---|---|
-port |
— | HTTP listen port (default 8080) |
-docstore-url |
DOCSTORE_URL |
Base URL of the docstore server |
-scheduler-url |
RUNNER_URL |
Public URL of this scheduler (used to auto-register the webhook subscription) |
-webhook-secret |
WEBHOOK_SECRET |
HMAC secret for webhook signature verification |
-oidc-token-url |
OIDC_TOKEN_URL |
URL of the CI OIDC token endpoint returned to workers on /claim |
| — | DATABASE_URL |
PostgreSQL DSN (required) |
| — | LOG_LEVEL |
Log level |
| — | LOG_FORMAT |
json (default) or text |
On startup, if DOCSTORE_URL, RUNNER_URL, and WEBHOOK_SECRET are all set, the scheduler self-registers a webhook subscription with the docstore server.
ci-worker (cmd/ci-worker)
A process that:
- Waits for buildkitd (
tcp://localhost:1234) and dockerd (tcp://localhost:2375) to be ready (up to 5 minutes each). - Calls
POST /claimon ci-scheduler with its Kubernetes projected service account token to claim a queued job. Returns 204 (no job) or a JSON body with the job,request_token, andoidc_token_url. - Exchanges the
request_tokenfor a short-lived OIDC JWT by callingoidc_token_url. - Sends a heartbeat to ci-scheduler via
POST /jobs/{id}/heartbeat(authenticated withrequest_token) every 30 seconds while the job runs. - Executes the job (see below).
- Reports final status to ci-scheduler via
POST /jobs/{id}/complete(authenticated withrequest_token). - Exits. KEDA creates a fresh pod for the next job.
Required environment variables:
| Variable | Description |
|---|---|
DOCSTORE_URL |
Base URL of the docstore server |
CI_SCHEDULER_URL |
URL of the ci-scheduler service |
LOG_STORE |
Log backend: gcs or local |
LOG_BUCKET |
GCS bucket for logs (required when LOG_STORE=gcs) |
BUILDKIT_ADDR |
buildkitd address (default tcp://localhost:1234) |
The Kubernetes projected service account token is auto-mounted at /var/run/secrets/kubernetes.io/serviceaccount/token and is read automatically by the worker. No explicit env var is required.
The worker serves live logs on port 8081 at GET /logs/{check}. The scheduler's log proxy endpoint reverse-proxies to this port while the job is running.
Entrypoint (entrypoint-worker.sh)
The Dockerfile.ci-worker sets this script as the container entrypoint. It:
- Fetches a GCP workload identity token from the instance metadata server and writes
~/.docker/config.jsonto authenticate buildkitd against GCR and Artifact Registry. - Creates loop device nodes (
/dev/loop0through/dev/loop7) manually — udev does not run inside Kata VMs. - Creates a 20 GiB sparse file, formats it as ext4, and mounts it at
/var/lib/buildkit. The Kata CLH guest rootfs is virtiofs, which does not support overlayfs upper directories; ext4 does. - Starts buildkitd at
tcp://localhost:1234with--oci-worker-net=hostand--oci-worker-snapshotter=overlayfs. - Starts dockerd at
tcp://127.0.0.1:2375. execsci-worker.
Kata Container isolation
ci-worker pods use runtimeClassName: kata-clh, which runs each pod inside a Kata Cloud-Hypervisor microVM. This provides a real Linux kernel per pod, so Docker and BuildKit run natively without privileged containers at the host level. The securityContext.privileged: true flag in the pod spec applies inside the VM only.
Authentication and Job Identity
Workers no longer authenticate with a shared database credential. The authentication chain is:
1. K8s token validation (POST /claim)
Worker pods present their Kubernetes projected service account token in Authorization: Bearer <token> to ci-scheduler's POST /claim endpoint. ci-scheduler validates it via the Kubernetes TokenReview API and verifies pod provenance:
- Token is valid and issued for
system:serviceaccount:docstore-ci:ci-worker - Pod was created within the last 4 hours (freshness check)
- Pod's owner chain: Pod →
batch/v1Job →keda.sh/v1alpha1ScaledJob namedci-worker - Each pod handles at most one job (enforced by job claiming semantics)
2. request_token
On successful claim, ci-scheduler generates a cryptographically random request_token (32 bytes, base64url-encoded). The SHA-256 hash is stored in the database; the plaintext is returned once to the worker. The request_token is used to:
- Authenticate
POST /jobs/{id}/heartbeat - Authenticate
POST /jobs/{id}/complete - Authenticate
POST /repos/{repo}/-/archive/presign - Exchange for an OIDC JWT
3. OIDC JWT exchange
The worker calls POST <oidc_token_url> (the ci-oidc Cloud Run internal service) with the request_token. The response is a signed JWT with these claims:
| Claim | Value |
|---|---|
iss |
https://oidc.docstore.dev |
sub |
repo:{repo}:branch:{branch}:check:{check_name} |
aud |
docstore |
jti |
unique UUID (written to ci_oidc_tokens audit table) |
job_id |
CI job UUID |
repo |
full repo name (e.g. acme/myrepo) |
org |
first path segment of repo |
branch |
branch name |
check_name |
check name (may be empty) |
ref_type |
post-submit, pre-submit, schedule, or manual |
sequence |
head sequence number |
exp |
iat + 1 hour |
JWTs are signed with RS256 using a Cloud KMS asymmetric key. The private key never leaves KMS.
4. API authentication
Workers include the JWT as Authorization: Bearer <token> on all docstore API calls (fetch config, post check results).
5. Job identity on check runs
When a worker posts check results via POST /repos/:repo/-/check, the docstore server validates the JWT via JobTokenMiddleware and records the identity as ci-job:{job_id} — the reporter on the check run.
Presigned Archive URLs
BuildKit cannot hold bearer tokens. To fetch the source archive securely without embedding credentials in the build context:
- Worker calls
POST /repos/{repo}/-/archive/presignwithAuthorization: Bearer {request_token}. - Server verifies the
request_tokenmatches the job and the repo in the token matches the request path. - Server generates an HMAC-SHA256 signed URL valid for 1 hour:
https://docstore.dev/repos/{repo}/-/archive?branch={branch}&at={seq}&expires={unix}&sig={hex} - The HMAC covers
repo,branch,sequence, andexpires(newline-separated) with a server-side secret. - Worker passes this URL as the BuildKit source — no credentials are needed to fetch it.
Job execution flow
For each job, the ci-worker:
-
Fetches CI config. Downloads
.docstore/ci.yamlfrom the branch under test at the job's head sequence viaGET /repos/{repo}/-/file/.docstore/ci.yaml, authenticated with the OIDC JWT. -
Gets a presigned archive URL. Calls
POST /repos/{repo}/-/archive/presignwith therequest_tokento obtain a time-limited, HMAC-signed URL for the source archive. No credentials are embedded in the URL itself. -
Marks checks pending. For each check in
ci.yaml, postsPOST /repos/{repo}/-/checkwithstatus=pending, authenticated with the OIDC JWT. -
Runs all checks in parallel via the executor (
internal/executor). The executor fetches the source archive via the presigned URL using BuildKit'sllb.HTTPsource. -
Uploads logs. Writes each check's log to a temp file (served live at
:8081) and uploads to GCS. -
Posts results. For each check result, posts
POST /repos/{repo}/-/checkwithstatus=passedorstatus=failedand the GCS log URL, authenticated with the OIDC JWT.
OIDC Service
The OIDC identity provider runs as two separate Cloud Run services using the same ci-oidc image:
ci-oidc (internal ingress — GKE only)
Deployed with --ingress=internal, reachable from GKE worker pods via VPC. Serves the full endpoint set:
POST /ci/token— acceptsrequest_token, issues a signed JWT for the jobGET /.well-known/openid-configuration— OIDC discovery documentGET /.well-known/jwks.json— RSA public key for JWT verificationGET /healthz
Required env vars: DATABASE_URL, KMS_KEY_VERSION
ci-oidc-public (public — oidc.docstore.dev)
Deployed with --ingress=all --allow-unauthenticated and PUBLIC_ONLY=true. Serves only the discovery endpoints (no database access, no token issuance):
GET /.well-known/openid-configurationGET /.well-known/jwks.jsonGET /healthz
Required env vars: KMS_KEY_VERSION only (no database access)
The CNAME oidc.docstore.dev → ghs.googlehosted.com is configured via Cloud Run domain mapping so that external systems (GCP WIF, AWS STS, HashiCorp Vault) can discover the JWKS and verify job JWTs.
Executor (internal/executor)
The executor translates a Config into a BuildKit LLB DAG and dispatches it to buildkitd.
For each check (run in parallel via goroutines):
- Resolves the image config from the registry to extract environment variables (e.g.
PATHingolang:1.25). - Injects all image ENV variables and
DOCKER_HOST=tcp://localhost:2375so check steps can use Docker. - Fetches the source archive from the presigned URL via BuildKit
llb.HTTP. - Chains the steps sequentially: each step runs
sh -c <step>in the image with/srcmounted. The/srcmount is passed forward between steps so file mutations persist across steps within the same check. - Marshals the LLB DAG and calls
client.Build. Log output is collected from theSolveStatuschannel. - Returns
CheckResult{Name, Status, Logs}.
.docstore/ci.yaml DSL
The CI configuration file lives at .docstore/ci.yaml on main. Example:
checks:
- name: ci/build
image: golang:1.25
steps:
- go build ./...
- name: ci/test
image: golang:1.25
steps:
- go test ./...
- name: ci/vet
image: golang:1.25
steps:
- go vet ./...
Fields
checks— List of checks to run in parallel.checks[].name— Check name reported to docstore (e.g.ci/build). Must be unique within the file.checks[].image— OCI image to use as the build environment. The image is pulled by buildkitd.checks[].steps— Shell commands run in sequence inside the image with the source mounted at/src. Each step is executed assh -c <step>. Steps within a check run sequentially; file mutations in/srcpersist between steps.
Notes
- The config is always read from the branch under test at its head sequence, not from
main. However,on:trigger filtering is evaluated againstmainat the time of enqueueing. - All checks run in parallel.
- A check fails if any step exits non-zero.
DOCKER_HOST=tcp://localhost:2375is injected automatically, so check steps can rundocker build,docker push, etc. The dockerd is running in the same Kata VM.DOCSTORE_OIDC_REQUEST_TOKENandDOCSTORE_OIDC_REQUEST_URLare injected into each check's environment so steps can themselves obtain OIDC tokens if needed.
ci_jobs table
The scheduler and workers coordinate through a PostgreSQL table (ci_jobs). Key columns:
| Column | Type | Description |
|---|---|---|
id |
UUID | Job ID |
repo |
text | Repo full name |
branch |
text | Branch name |
sequence |
int8 | Head sequence at time of event |
status |
text | queued, claimed, passed, failed |
worker_pod_name |
text | Pod name of the claiming worker (from K8s token claims) |
worker_pod_ip |
text | IP of the claiming worker (for log proxy) |
request_token_hash |
text | SHA-256 hash of the issued request_token |
last_heartbeat_at |
timestamptz | Updated every 30s by the worker |
log_url |
text | GCS URL for the first check's logs |
error_message |
text | Set if the job itself failed (not a check) |
A job is considered stale if its heartbeat is more than 90 seconds old and its status is claimed. The scheduler reaps stale jobs back to queued every 30 seconds.
Webhook signature verification
The docstore outbox signs webhook deliveries with HMAC-SHA256:
X-DocStore-Signature: sha256=<hex(hmac-sha256(body, secret))>
The scheduler verifies this signature before processing any event. Mismatched signatures are rejected with HTTP 400.
Raw markdown — machine-readable source for this page.