docs: add CD pipeline design spec (BuildKit + Flux GitOps)
docs/superpowers/specs/2026-04-20-cd-pipeline-design.md
# CD Pipeline Design

**Date:** 2026-04-20

**Status:** Approved for implementation
## Problem statement

The supervisor and future services on the koala k3s cluster have no automated deployment path after CI passes. Images are not built, the cluster is updated by hand, and there is no audit trail of what is running where.
## Goal

After a push to `main` passes CI, automatically build a container image, push it to the Gitea registry, and update the cluster via GitOps. The design must scale to many repos and services without per-repo kubeconfigs or secret sprawl.
## Success criteria

- [ ] A successful `main` push triggers an image build and push to `gitea.d-ma.be/<org>/<repo>:<git-sha>`
- [ ] The infra repo receives a commit updating the image tag for the deployed service
- [ ] Flux reconciles within 60s of the infra repo commit; the pod runs the new image
- [ ] Rollback is one commit to the infra repo reverting the tag
- [ ] Secrets (app secrets, registry pull) are SOPS-encrypted in the infra repo; no manual `kubectl create secret`
- [ ] Adding a new service requires only adding `apps/<service>/` to the infra repo and `cd.yml` to the app repo
- [ ] Zero changes to the k3s cluster networking or runner configuration
## Constraints

- The Gitea Actions self-hosted runner runs as a **systemd host process** on koala, not as a k8s pod, so it cannot use cluster DNS
- k3s uses containerd; there is no Docker daemon and no nerdctl on koala
- Flux is already running (core controllers only); image-reflector/image-automation are NOT installed and will NOT be added
- SOPS + age is the secret management standard; no plaintext Secrets in git
- All org-level Gitea secrets are shared across repos, so keep the set minimal
## Out of scope

- Multi-cluster promotion (koala only for now; the infra repo structure supports adding clusters later)
- Automated rollback on health-check failure (rollback is manual, via an infra repo commit)
- Build caching beyond BuildKit's local disk cache
- PR preview environments

---
## Architecture

```
App repo (supervisor, n8n, etc.)
  ↓ push to main
Gitea Actions: ci.yml (lint + test)
  ↓ passes
Gitea Actions: cd.yml
  ├─ 1. buildctl → BuildKit (unix socket on koala host)
  │     → pushes gitea.d-ma.be/<org>/<repo>:<git-sha>
  ├─ 2. Clone infra repo (SSH deploy key)
  │     → patch image tag in apps/<service>/deployment.yaml → <git-sha>
  │     → git commit + push
  └─ done

gitea.d-ma.be/mathias/infra (Flux source)
  ↓ Flux source-controller detects new commit (30s interval)
kustomize-controller
  └─ applies apps/<service>/kustomization.yaml → k3s namespace
  ↓
pod runs new image (pulls from gitea.d-ma.be with imagePullSecret)
```
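The 30s detection interval in the diagram is set on the Flux `GitRepository` that tracks the infra repo. Here is a minimal sketch. It assumes the bootstrap-created source is named `flux-system` and uses the SSH URL shown. Both are assumptions, not values read from the existing bootstrap manifests.

```yaml
# Hypothetical: the GitRepository in flux-system/ (name, URL and secret name assumed).
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 30s                 # poll the infra repo every 30 seconds
  url: ssh://git@gitea.d-ma.be/mathias/infra.git
  ref:
    branch: main
  secretRef:
    name: flux-system           # SSH credentials Secret created at bootstrap
```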
---

## Components
### 1. BuildKit: systemd service on koala

BuildKit runs as a rootless systemd service on the koala host, following the same pattern as the existing Gitea runner.

- Socket: `unix:///run/user/<uid>/buildkit/buildkitd.sock` (rootless) or `unix:///run/buildkit/buildkitd.sock` (root). The `cd.yml` example below uses the root path; substitute whichever path is confirmed during setup
- Cache: local disk under BuildKit's default state directory (`/var/lib/buildkit` as root, `~/.local/share/buildkit` rootless); persists across builds
- Access: `buildctl --addr <socket>` from the runner process on the same host. The runner's user must be able to open the socket; with rootless BuildKit, run buildkitd and the runner as the same user
- No k3s involvement in builds
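The most likely setup failure is a wrong socket path or a missing permission. A preflight step can catch this before any build starts. Below is a sketch of such a `cd.yml` step. The step name is hypothetical and the root socket path is assumed. `test -S` checks that the socket file exists, and `buildctl debug workers` succeeds only if buildctl can reach the daemon.

```yaml
# Hypothetical preflight step for cd.yml (root socket path assumed).
- name: Check BuildKit socket
  run: |
    test -S /run/buildkit/buildkitd.sock
    buildctl --addr unix:///run/buildkit/buildkitd.sock debug workers
```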
### 2. Gitea Actions: `cd.yml`

This is a separate workflow file. It runs on a `main` push and must deploy only after `ci.yml` succeeds.

```yaml
name: cd

on:
  push:
    branches: [main]

jobs:
  deploy:
    # `needs` only resolves jobs defined in the same workflow file; gating on the
    # separate ci.yml requires a workflow_run trigger instead (see implementation plan).
    needs: [ci]
    runs-on: [self-hosted, koala]
    env:
      IMAGE: gitea.d-ma.be/${{ github.repository }}:${{ github.sha }}
      SERVICE_NAME: supervisor  # set per repo; must match apps/<service>/ in the infra repo
    steps:
      - uses: actions/checkout@v4

      - name: Registry login
        # buildctl reads push credentials from ~/.docker/config.json;
        # assumes the secret holds a complete Docker config.json document.
        env:
          REGISTRY_AUTH: ${{ secrets.BUILDKIT_REGISTRY_AUTH }}
        run: |
          mkdir -p "$HOME/.docker"
          printf '%s' "$REGISTRY_AUTH" > "$HOME/.docker/config.json"
          chmod 600 "$HOME/.docker/config.json"

      - name: Build and push
        run: |
          buildctl --addr unix:///run/buildkit/buildkitd.sock \
            build \
            --frontend dockerfile.v0 \
            --local context=. \
            --local dockerfile=. \
            --output "type=image,name=$IMAGE,push=true"

      - name: Update infra repo
        env:
          INFRA_DEPLOY_KEY: ${{ secrets.INFRA_DEPLOY_KEY }}
        run: |
          # Fresh temp dir per run: the host runner's /tmp persists between jobs.
          work=$(mktemp -d)
          trap 'rm -rf "$work"' EXIT
          printf '%s\n' "$INFRA_DEPLOY_KEY" > "$work/deploy-key"
          chmod 600 "$work/deploy-key"
          export GIT_SSH_COMMAND="ssh -i $work/deploy-key -o IdentitiesOnly=yes -o StrictHostKeyChecking=accept-new"
          git clone git@gitea.d-ma.be:mathias/infra.git "$work/infra"
          cd "$work/infra"
          # Replace whatever tag is currently set, not just a literal IMAGE_TAG
          # placeholder, so every deploy after the first still updates the file.
          sed -i -E "s|(image: gitea\.d-ma\.be/${{ github.repository }}):[^[:space:]]*|\1:${{ github.sha }}|" \
            "apps/$SERVICE_NAME/deployment.yaml"
          git config user.email "cd-bot@d-ma.be"
          git config user.name "CD Bot"
          git add "apps/$SERVICE_NAME/deployment.yaml"
          # Re-running the same sha changes nothing; exit cleanly instead of failing the commit.
          git diff --cached --quiet && { echo "already at ${{ github.sha }}"; exit 0; }
          git commit -m "chore(deploy): $SERVICE_NAME → ${{ github.sha }}"
          # One rebase-and-retry if another deploy pushed first.
          git push || { git pull --rebase && git push; }
```

`SERVICE_NAME` is set per repo, either hardcoded in `cd.yml` (as above) or derived from the repo name.
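For the derived variant, the job-level `env` could read the repository name from the push event payload. This sketch assumes Gitea exposes `github.event.repository.name` the same way GitHub Actions does. The resulting value must match the directory name under `apps/` in the infra repo.

```yaml
# Hypothetical job-level env for cd.yml with a derived service name.
env:
  IMAGE: gitea.d-ma.be/${{ github.repository }}:${{ github.sha }}
  # "supervisor" for mathias/supervisor; must match apps/<service>/ in the infra repo
  SERVICE_NAME: ${{ github.event.repository.name }}
```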
### 3. Org-level Gitea secrets

Three secrets, set once and inherited by all repos:

| Secret | Purpose |
|--------|---------|
| `BUILDKIT_REGISTRY_AUTH` | Docker `config.json` contents with push credentials for `gitea.d-ma.be`, written to `~/.docker/config.json` on the runner, where buildctl looks up registry credentials |
| `INFRA_DEPLOY_KEY` | SSH private key with write access to `gitea.d-ma.be/mathias/infra` |
| `KUBECONFIG_KOALA` | (optional) kubeconfig for a scoped ServiceAccount, for manual `kubectl` steps if ever needed |
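If `KUBECONFIG_KOALA` is ever created, the ServiceAccount behind it can be limited to read-only access. This sketch uses hypothetical names: a `ci-readonly` ServiceAccount scoped to the `supervisor` namespace. It binds that account to the built-in `view` ClusterRole.

```yaml
# Hypothetical read-only ServiceAccount for KUBECONFIG_KOALA (names assumed).
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ci-readonly
  namespace: supervisor
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ci-readonly-view
  namespace: supervisor
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view            # built-in read-only role, granted only in this namespace
subjects:
  - kind: ServiceAccount
    name: ci-readonly
    namespace: supervisor
```

You can mint a token for the kubeconfig with `kubectl -n supervisor create token ci-readonly`. Its `--duration` flag controls the token lifetime.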
### 4. Infra repo structure

```
gitea.d-ma.be/mathias/infra
├── clusters/
│   └── koala/
│       └── kustomization.yaml    # lists each ../../apps/<service> explicitly
├── apps/
│   ├── supervisor/
│   │   ├── namespace.yaml
│   │   ├── deployment.yaml       # image: gitea.d-ma.be/mathias/supervisor:<git-sha>, updated by CD
│   │   ├── service.yaml
│   │   ├── secrets.enc.yaml      # SOPS-encrypted app secrets (ANTHROPIC_API_KEY, etc.)
│   │   └── kustomization.yaml
│   ├── n8n/
│   │   └── ...
│   └── imagepullsecret/
│       └── secret.enc.yaml       # SOPS-encrypted imagePullSecret for gitea.d-ma.be
└── flux-system/                  # existing Flux bootstrap manifests
```

To add a new service, add an `apps/<service>/` directory and list it in `clusters/koala/kustomization.yaml`. Kustomize does not expand globs in `resources`, so the list must be explicit.
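With an explicit list in `clusters/koala/kustomization.yaml` (Kustomize's `resources` field does not expand globs), registering a service takes one line. This sketch shows both kustomization files in one block, separated by `---`. It assumes the file names from the tree above and assumes that each manifest sets its own `metadata.namespace`.

```yaml
# clusters/koala/kustomization.yaml: one entry per service
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../apps/supervisor
  - ../../apps/n8n
---
# apps/supervisor/kustomization.yaml: the service's own manifests
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - deployment.yaml
  - service.yaml
  - secrets.enc.yaml
```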
### 5. SOPS + age for Flux

Flux decrypts SOPS-encrypted files at apply time using an age private key stored as a k8s Secret in the `flux-system` namespace. Setup:

1. Generate an age keypair: `age-keygen -o age.agekey` (prints the public key; the file holds the private key)
2. Store the private key. This is the one Secret created by hand, at bootstrap: `kubectl create secret generic sops-age --from-file=age.agekey -n flux-system`
3. Configure the Flux Kustomization with `decryption.provider: sops` and `decryption.secretRef.name: sops-age`
4. Encrypt secrets before committing. Encrypt only `data`/`stringData`, so that `kind` and `metadata` stay readable for Kustomize: `sops --encrypt --age <pubkey> --encrypted-regex '^(data|stringData)$' secret.yaml > secret.enc.yaml`

App secrets (e.g., `ANTHROPIC_API_KEY`) live as encrypted files in `apps/<service>/`, and the registry pull secret lives in `apps/imagepullsecret/`.
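Step 3 is configured on the Flux `Kustomization` object, not on the Kustomize `kustomization.yaml`. This sketch assumes a hypothetical Flux Kustomization named `apps` that applies `clusters/koala` from the `flux-system` source. If the bootstrap Kustomization already targets that path, add the `decryption` block to it instead of creating a second one.

The second document is an optional `.sops.yaml` at the repo root; its public key is a placeholder. With it in place, files under `apps/` can be encrypted without passing `--age` and `--encrypted-regex` each time.

```yaml
# Hypothetical Flux Kustomization (e.g. flux-system/apps.yaml), name assumed.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  path: ./clusters/koala
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  decryption:
    provider: sops
    secretRef:
      name: sops-age            # holds age.agekey from step 2
---
# .sops.yaml at the infra repo root (not applied to the cluster).
creation_rules:
  - path_regex: apps/.*\.yaml$
    encrypted_regex: ^(data|stringData)$
    age: <age-public-key>       # placeholder: the public key printed by age-keygen
```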
### 6. Image pull secret

Each app namespace needs a `kubernetes.io/dockerconfigjson` Secret to pull from `gitea.d-ma.be`. Each Deployment's `imagePullSecrets` must reference it. The Secret is SOPS-encrypted in `apps/imagepullsecret/`. A Secret belongs to exactly one namespace, so a copy must be applied to each app namespace, via the Kustomize `namespace` field or a shared Kustomize component.
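Before encryption, the pull secret is an ordinary `kubernetes.io/dockerconfigjson` Secret. This sketch uses placeholder credentials; the name `gitea-registry` and the `supervisor` namespace are assumptions. Encrypting it with `--encrypted-regex '^(data|stringData)$'` keeps `kind` and `metadata` readable for Kustomize:

```yaml
# Hypothetical plaintext input, encrypted with sops before commit (names assumed).
apiVersion: v1
kind: Secret
metadata:
  name: gitea-registry
  namespace: supervisor
type: kubernetes.io/dockerconfigjson
stringData:
  # placeholder credentials; kubelet accepts username/password entries in auths
  .dockerconfigjson: |
    {"auths": {"gitea.d-ma.be": {"username": "<user>", "password": "<token>"}}}
```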

---

## Data flow: supervisor deploy

1. Push to `supervisor` main → CI passes (lint/test/vet)
2. CD job builds and pushes image `gitea.d-ma.be/mathias/supervisor:abc1234`
3. CD job clones the infra repo, patches `apps/supervisor/deployment.yaml`, commits, and pushes
4. Flux source-controller detects the infra commit within 30s
5. kustomize-controller builds `apps/supervisor/kustomization.yaml`, decrypting `secrets.enc.yaml`
6. kustomize-controller applies the result, including the decrypted k8s Secret in the `supervisor` namespace
7. k3s pulls `gitea.d-ma.be/mathias/supervisor:abc1234` using the imagePullSecret
8. Pod starts with the new image; the previous pod terminates

Rollback: `git revert <tag-commit>` in the infra repo → Flux reconciles → previous image deployed.
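Steps 7 and 8 depend on two things: the Deployment must reference the pull secret, and its rollout strategy must keep the old pod until the new one is up. This is a sketch of `apps/supervisor/deployment.yaml`. It assumes a single replica and the hypothetical Secret names `gitea-registry` and `supervisor-secrets`. `maxUnavailable: 0` keeps the previous pod until the new one is Ready, which also covers the case where the new image fails to pull.

```yaml
# Hypothetical apps/supervisor/deployment.yaml (labels, names and replicas assumed).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: supervisor
  namespace: supervisor
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1               # start the new pod first...
      maxUnavailable: 0         # ...and keep the old one until the new one is Ready
  selector:
    matchLabels:
      app: supervisor
  template:
    metadata:
      labels:
        app: supervisor
    spec:
      imagePullSecrets:
        - name: gitea-registry  # the dockerconfigjson Secret
      containers:
        - name: supervisor
          # initial placeholder tag; the CD job replaces it with the git sha
          image: gitea.d-ma.be/mathias/supervisor:IMAGE_TAG
          envFrom:
            - secretRef:
                name: supervisor-secrets  # decrypted from secrets.enc.yaml
```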

---

## Error handling

| Scenario | Behaviour |
|----------|-----------|
| CI fails | `cd.yml` does not run (`needs: ci` gate) |
| BuildKit unreachable | `buildctl` exits non-zero → workflow fails; infra repo untouched |
| Image push fails | Workflow fails; infra repo untouched; cluster unchanged |
| Infra repo push conflict | Retry once with rebase; fail and alert if still conflicting |
| Flux reconcile error | notification-controller fires an alert (requires an `Alert` and a `Provider`); pods stay on the previous image |
| Pod image pull fails | `ImagePullBackOff`; the old pod keeps running during the rolling update; Flux marks the Kustomization not Ready only if health checking (`wait: true` or `healthChecks`) is enabled |
| SOPS decrypt fails | Kustomization fails; Flux reports an error; nothing is applied (no partial apply) |
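The alert in the "Flux reconcile error" row fires only if notification-controller has an `Alert` and a `Provider` configured. This sketch assumes Flux 2.2 or later (the `v1beta3` notification API) and a placeholder generic webhook endpoint. For the "Pod image pull fails" row to surface in Flux, the Flux Kustomization also needs `spec.wait: true` (or `spec.healthChecks`) together with a `spec.timeout`:

```yaml
# Hypothetical alerting for Kustomization errors (names and endpoint assumed).
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: ops-webhook
  namespace: flux-system
spec:
  type: generic
  address: https://hooks.example.invalid/flux   # placeholder endpoint
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: kustomization-errors
  namespace: flux-system
spec:
  providerRef:
    name: ops-webhook
  eventSeverity: error          # only forward error events
  eventSources:
    - kind: Kustomization
      name: '*'                 # every Kustomization in flux-system
```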

---

## Testing approach

1. **BuildKit smoke test**: run `buildctl build` with a trivial one-line Dockerfile; verify the image appears in the Gitea registry
2. **cd.yml dry run**: run the workflow on a test branch (temporarily added to `on.push.branches`); verify the infra repo commit contains the correct sha
3. **Flux reconcile test**: push an infra commit; verify `flux get kustomizations` shows `Ready` and the pod runs the new image sha
4. **Pull secret test**: remove the cached image from the node first (`k3s crictl rmi <image>`), then delete the pod; verify it restarts and pulls from the Gitea registry without `ImagePullBackOff`
5. **SOPS round-trip test**: encrypt a dummy secret, push it to the infra repo, and verify that Flux decrypts it and `kubectl get secret` shows the correct data
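For the SOPS round-trip test, a dummy Secret like the one below can be encrypted with the step-4 `sops` command and committed. The name and value are hypothetical. After Flux reconciles, `kubectl -n supervisor get secret sops-roundtrip-test -o jsonpath='{.data.canary}' | base64 -d` should print `hello-from-sops`:

```yaml
# Hypothetical dummy Secret for the round-trip test (name, namespace and value assumed).
apiVersion: v1
kind: Secret
metadata:
  name: sops-roundtrip-test
  namespace: supervisor
type: Opaque
stringData:
  canary: hello-from-sops       # only this value is encrypted by --encrypted-regex
```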
---

## Risks

| Risk | Mitigation |
|------|------------|
| BuildKit socket path varies by user/rootless mode | Confirm the path during setup; hardcode it in `cd.yml` |
| Concurrent infra repo pushes (multiple repos deploying simultaneously) | The rebase-and-retry on push handles this; unlikely at current scale |
| age private key lost | Back up the private key offline (e.g. a password manager), outside the cluster and the infra repo; document the recovery procedure |
| Registry storage fills up | Configure a Gitea package cleanup rule (keep the last 20 versions per image) |
| Gitea deploy key compromised | Rotate via the Gitea UI; the key grants access to the infra repo only |