docs: add CD pipeline implementation plan
docs/superpowers/plans/2026-04-20-cd-pipeline.md
@@ -0,0 +1,923 @@
|
|||||||
|
# CD Pipeline Implementation Plan
|
||||||
|
|
||||||
|
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||||
|
|
||||||
|
**Goal:** Build a GitOps CD pipeline that automatically builds a container image on `main` push and deploys it to k3s on koala via Flux.
|
||||||
|
|
||||||
|
**Architecture:** BuildKit runs as a systemd service on koala (same host as the Gitea runner); CD pushes images to the Gitea registry and commits image tag updates to the infra repo; Flux reconciles within 60s. App secrets (including ANTHROPIC_API_KEY) are SOPS-encrypted in the infra repo and decrypted by Flux at apply time.
|
||||||
|
|
||||||
|
**Tech Stack:** Go 1.26, Node.js 22 (for claude CLI), BuildKit (buildctl), Gitea Actions, Flux (kustomize-controller), SOPS + age, k3s/containerd
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Environment context
|
||||||
|
|
||||||
|
This plan spans five environments. Each task header notes which environment it runs in:
|
||||||
|
|
||||||
|
- **[this-repo]** — `/Users/mathias/Documents/local-dev/AI/supervisor` on flamingo
|
||||||
|
- **[koala-ssh]** — `ssh koala` (run commands via `ssh koala "..."`)
|
||||||
|
- **[infra-repo]** — `gitea.d-ma.be/mathias/infra` (clone to a temp dir, work there, push)
|
||||||
|
- **[gitea-ui]** — Gitea web UI at `https://gitea.d-ma.be`
|
||||||
|
- **[kubectl]** — kubectl from flamingo (home LAN)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## File map
|
||||||
|
|
||||||
|
**This repo (supervisor):**
|
||||||
|
- Create: `Dockerfile`
|
||||||
|
- Create: `.gitea/workflows/cd.yml`
|
||||||
|
|
||||||
|
**koala host:**
|
||||||
|
- Create: `/etc/systemd/system/buildkitd.service` (or user-level equivalent)
|
||||||
|
- Create: `/etc/buildkit/buildkitd.toml` (registry config; the push credentials live in `/root/.docker/config.json`, see Task 3)
|
||||||
|
|
||||||
|
**Infra repo (`gitea.d-ma.be/mathias/infra`):**
|
||||||
|
- Create: `apps/supervisor/namespace.yaml`
|
||||||
|
- Create: `apps/supervisor/deployment.yaml`
|
||||||
|
- Create: `apps/supervisor/service.yaml`
|
||||||
|
- Create: `apps/supervisor/secrets.enc.yaml` (SOPS-encrypted)
|
||||||
|
- Create: `apps/supervisor/kustomization.yaml`
|
||||||
|
- Create: `apps/imagepullsecret/secret.enc.yaml` (SOPS-encrypted)
|
||||||
|
- Create: `apps/imagepullsecret/kustomization.yaml`
|
||||||
|
- Modify: `clusters/koala/kustomization.yaml` (add supervisor + imagepullsecret)
|
||||||
|
- Modify: `flux-system/kustomization.yaml` or relevant Flux Kustomization CRD (add SOPS decryption)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 1: Dockerfile [this-repo]
|
||||||
|
|
||||||
|
The supervisor binary depends on the `claude` CLI as a subprocess. The image uses a multi-stage build: a Go builder stage compiles the binary, and the runtime stage is based on Node.js so that `npm install -g @anthropic-ai/claude-code` can provide the CLI. Config files are baked in. The `brain/` directory is a volume mount.
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Create: `Dockerfile`
|
||||||
|
|
||||||
|
- [ ] **Step 1: Verify no Dockerfile exists**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ls Dockerfile 2>/dev/null || echo "confirmed: no Dockerfile"
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: `confirmed: no Dockerfile`
|
||||||
|
|
||||||
|
- [ ] **Step 2: Create the Dockerfile**
|
||||||
|
|
||||||
|
```dockerfile
|
||||||
|
# syntax=docker/dockerfile:1
|
||||||
|
|
||||||
|
# ── Build stage ───────────────────────────────────────────────────────────────
|
||||||
|
FROM golang:1.26-bookworm AS builder
|
||||||
|
|
||||||
|
ARG VERSION=dev
|
||||||
|
WORKDIR /src
|
||||||
|
|
||||||
|
COPY go.mod go.sum ./
|
||||||
|
RUN go mod download
|
||||||
|
|
||||||
|
COPY . .
|
||||||
|
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
    go build -trimpath -ldflags="-s -w -X main.version=${VERSION}" \
    -o /out/supervisor ./cmd/supervisor
|
||||||
|
|
||||||
|
# ── Runtime stage ─────────────────────────────────────────────────────────────
|
||||||
|
# Node.js 22 slim — needed for claude CLI subprocess
|
||||||
|
FROM node:22-slim
|
||||||
|
|
||||||
|
# Install claude CLI (provides the `claude` binary the supervisor shells out to)
|
||||||
|
RUN npm install -g @anthropic-ai/claude-code \
    && claude --version \
    && echo "claude CLI installed"
|
||||||
|
|
||||||
|
# Copy supervisor binary
|
||||||
|
COPY --from=builder /out/supervisor /usr/local/bin/supervisor
|
||||||
|
|
||||||
|
# Bake in config (models.yaml + skill discipline files)
|
||||||
|
COPY config/ /app/config/
|
||||||
|
|
||||||
|
WORKDIR /app
|
||||||
|
|
||||||
|
# brain/ is writable state — mount a PersistentVolume here
|
||||||
|
VOLUME /app/brain
|
||||||
|
|
||||||
|
ENV SUPERVISOR_CONFIG_DIR=/app/config/supervisor
|
||||||
|
ENV SUPERVISOR_MODELS_FILE=/app/config/models.yaml
|
||||||
|
ENV SUPERVISOR_BRAIN_DIR=/app/brain
|
||||||
|
ENV SUPERVISOR_SESSIONS_DIR=/app/brain/sessions
|
||||||
|
ENV SUPERVISOR_PORT=3200
|
||||||
|
|
||||||
|
EXPOSE 3200
|
||||||
|
|
||||||
|
ENTRYPOINT ["/usr/local/bin/supervisor"]
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 3: Build locally to verify it compiles (no push)**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# buildctl must be available locally, OR use docker if available on flamingo
|
||||||
|
docker build --target builder -t supervisor-build-test . && echo "build stage OK"
|
||||||
|
# If no docker on flamingo, skip this step and verify at Task 3 on koala instead
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 4: Commit**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git add Dockerfile
|
||||||
|
git commit -m "feat: add multi-stage Dockerfile with claude CLI runtime"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 2: BuildKit systemd service on koala [koala-ssh]
|
||||||
|
|
||||||
|
Install `buildkitd` as a root-level systemd service on koala. The Gitea runner process runs as root (confirmed by PID/cgroup), so the root socket at `/run/buildkit/buildkitd.sock` is accessible to it.
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Create: `/etc/systemd/system/buildkitd.service` on koala
|
||||||
|
- Create: `/etc/buildkit/buildkitd.toml` on koala (registry auth)
|
||||||
|
|
||||||
|
- [ ] **Step 1: Check if buildkitd is already installed**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh koala "buildkitd --version 2>/dev/null || echo 'not installed'"
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 2: Install buildkitd on koala**
|
||||||
|
|
||||||
|
Download a pinned BuildKit release. koala is x86_64, so use the `linux-amd64` tarball:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh koala "
|
||||||
|
BUILDKIT_VERSION=v0.21.0
|
||||||
|
curl -sSL https://github.com/moby/buildkit/releases/download/\${BUILDKIT_VERSION}/buildkit-\${BUILDKIT_VERSION}.linux-amd64.tar.gz \
|
||||||
|
| tar -xz -C /usr/local/
|
||||||
|
buildkitd --version
|
||||||
|
"
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected output includes: `buildkitd github.com/moby/buildkit v0.21.0`
|
||||||
|
|
||||||
|
- [ ] **Step 3: Create buildkitd.toml with Gitea registry auth**
|
||||||
|
|
||||||
|
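The `[registry]` block tells buildkitd to reach `gitea.d-ma.be` over verified HTTPS; it does not hold credentials. The credentials come from the Docker config of the user running `buildctl` (`/root/.docker/config.json`, created in Task 3): the `buildctl` client reads that file and hands the credentials to buildkitd for each build.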
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh koala "
|
||||||
|
mkdir -p /etc/buildkit
|
||||||
|
cat > /etc/buildkit/buildkitd.toml << 'EOF'
|
||||||
|
[worker.containerd]
|
||||||
|
enabled = false
|
||||||
|
|
||||||
|
[worker.oci]
|
||||||
|
enabled = true
|
||||||
|
|
||||||
|
[registry.\"gitea.d-ma.be\"]
|
||||||
|
http = false
|
||||||
|
insecure = false
|
||||||
|
EOF
|
||||||
|
"
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 4: Create systemd unit**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh koala "
|
||||||
|
cat > /etc/systemd/system/buildkitd.service << 'EOF'
|
||||||
|
[Unit]
|
||||||
|
Description=BuildKit daemon
|
||||||
|
After=network.target
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=notify
|
||||||
|
ExecStart=/usr/local/bin/buildkitd --config /etc/buildkit/buildkitd.toml
|
||||||
|
Restart=on-failure
|
||||||
|
LimitNOFILE=1048576
|
||||||
|
LimitNPROC=1048576
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=multi-user.target
|
||||||
|
EOF
|
||||||
|
systemctl daemon-reload
|
||||||
|
systemctl enable buildkitd
|
||||||
|
systemctl start buildkitd
|
||||||
|
"
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 5: Verify the socket exists and is responsive**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh koala "
|
||||||
|
systemctl status buildkitd --no-pager
|
||||||
|
buildctl --addr unix:///run/buildkit/buildkitd.sock debug info
|
||||||
|
"
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: service `active (running)`, buildctl shows BuildKit version info.
|
||||||
|
|
||||||
|
- [ ] **Step 6: Smoke-test build with trivial Dockerfile**
|
||||||
|
|
||||||
|
```bash
ssh koala '
set -e
# buildctl reads the Dockerfile from a local directory, not from stdin
dir=$(mktemp -d)
printf "FROM alpine:3.21\nRUN echo hello\n" > "$dir/Dockerfile"
buildctl --addr unix:///run/buildkit/buildkitd.sock build \
  --frontend dockerfile.v0 \
  --local context="$dir" \
  --local dockerfile="$dir" \
  --output type=image,name=localhost/smoke-test:latest
rm -rf "$dir"
echo "smoke test OK"
'
```
|
||||||
|
|
||||||
|
Expected: `smoke test OK`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 3: Gitea registry push auth for buildctl [koala-ssh]
|
||||||
|
|
||||||
|
`buildctl` reads Docker-style credentials from `/root/.docker/config.json`. Create the credentials file so the runner can push to `gitea.d-ma.be`.
|
||||||
|
|
||||||
|
**Prerequisites:** A Gitea user token or password with `write:packages` scope for the `mathias` org. Create one in Gitea → User Settings → Applications → Generate Token (scopes: `write:packages`).
|
||||||
|
|
||||||
|
- [ ] **Step 1: Create Gitea access token**
|
||||||
|
|
||||||
|
In Gitea UI (`https://gitea.d-ma.be`) → top-right avatar → Settings → Applications → Generate New Token.
|
||||||
|
- Token name: `buildkit-push`
|
||||||
|
- Scopes: `write:packages` (container registry write)
|
||||||
|
- Copy the token — it won't be shown again.
|
||||||
|
|
||||||
|
- [ ] **Step 2: Write docker config.json on koala**
|
||||||
|
|
||||||
|
Replace `<TOKEN>` with the token from Step 1:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh koala "
|
||||||
|
mkdir -p /root/.docker
|
||||||
|
TOKEN=<TOKEN>
|
||||||
|
AUTH=\$(echo -n 'mathias:'\${TOKEN} | base64)
|
||||||
|
cat > /root/.docker/config.json << EOF
|
||||||
|
{
|
||||||
|
\"auths\": {
|
||||||
|
\"gitea.d-ma.be\": {
|
||||||
|
\"auth\": \"\${AUTH}\"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
EOF
|
||||||
|
chmod 600 /root/.docker/config.json
|
||||||
|
echo 'credentials written'
|
||||||
|
"
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 3: Verify push works**
|
||||||
|
|
||||||
|
```bash
ssh koala '
set -e
# buildctl reads the Dockerfile from a local directory, not from stdin
dir=$(mktemp -d)
echo "FROM alpine:3.21" > "$dir/Dockerfile"
buildctl --addr unix:///run/buildkit/buildkitd.sock build \
  --frontend dockerfile.v0 \
  --local context="$dir" \
  --local dockerfile="$dir" \
  --output type=image,name=gitea.d-ma.be/mathias/supervisor:push-test,push=true
rm -rf "$dir"
echo "push OK"
'
```
|
||||||
|
|
||||||
|
Expected: `push OK`. Verify in Gitea UI: `https://gitea.d-ma.be/mathias/supervisor/packages` should show a `push-test` tag.
|
||||||
|
|
||||||
|
- [ ] **Step 4: Delete the test image tag**
|
||||||
|
|
||||||
|
In Gitea UI → supervisor repo → Packages tab → delete the `push-test` tag.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 4: age keypair + Flux SOPS decryption [kubectl + flamingo]
|
||||||
|
|
||||||
|
Flux decrypts SOPS-encrypted secrets at apply time. It needs the age private key stored as a k8s Secret in the `flux-system` namespace.
|
||||||
|
|
||||||
|
- [ ] **Step 1: Verify age is installed**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
age --version || brew install age
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 2: Generate age keypair**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
age-keygen -o /tmp/supervisor-age.key
|
||||||
|
cat /tmp/supervisor-age.key
|
||||||
|
```
|
||||||
|
|
||||||
|
Output includes two lines:
|
||||||
|
```
|
||||||
|
# public key: age1xxxxxx...
|
||||||
|
AGE-SECRET-KEY-1xxxxxxx...
|
||||||
|
```
|
||||||
|
|
||||||
|
**Copy the public key** (the `age1...` value). You will need it in Task 6 to encrypt the secrets.
|
||||||
|
**Store the private key file securely** — back it up outside the cluster (e.g., 1Password or encrypted note).
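
The public key can also be read back from the key file instead of being copied by hand. The following is a minimal sketch, assuming the `# public key:` comment line shown above; `age_public_key` is a hypothetical helper name, not part of age.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the age public key recorded in a key file.
age_public_key() {
  # age-keygen writes a comment line of the form "# public key: age1..."
  sed -n 's/^# public key: //p' "$1"
}
```

Usage: `age_public_key /tmp/supervisor-age.key`. Run it before the temp key file is removed in Step 4; `age-keygen -y <file>` should print the same value.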
|
||||||
|
|
||||||
|
- [ ] **Step 3: Create the SOPS age secret in flux-system**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl create secret generic sops-age \
|
||||||
|
--from-file=age.agekey=/tmp/supervisor-age.key \
|
||||||
|
-n flux-system
|
||||||
|
kubectl get secret sops-age -n flux-system
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: secret exists with `age.agekey` key.
|
||||||
|
|
||||||
|
- [ ] **Step 4: Shred the temp key file**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
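# shred ships with GNU coreutils; stock macOS lacks it, so fall back to rm
shred -u /tmp/supervisor-age.key 2>/dev/null || rm -f /tmp/supervisor-age.key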
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 5: Check what Flux Kustomization CRDs exist in the infra repo**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone git@gitea.d-ma.be:mathias/infra.git /tmp/infra-sops-setup
|
||||||
|
ls /tmp/infra-sops-setup/flux-system/
|
||||||
|
```
|
||||||
|
|
||||||
|
Look for a `kustomization.yaml` or `gotk-sync.yaml` that defines the main Flux Kustomization resource pointing at the `clusters/koala/` path.
|
||||||
|
|
||||||
|
- [ ] **Step 6: Patch the Flux Kustomization to enable SOPS decryption**
|
||||||
|
|
||||||
|
Find the Kustomization resource that syncs `clusters/koala/`. It will look like:
|
||||||
|
|
||||||
|
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: flux-system
  namespace: flux-system
spec:
  path: ./clusters/koala
  ...
```
|
||||||
|
|
||||||
|
Add the `decryption` block under `spec`:

```yaml
  decryption:
    provider: sops
    secretRef:
      name: sops-age
```
|
||||||
|
|
||||||
|
Edit the file in `/tmp/infra-sops-setup/flux-system/` and commit:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd /tmp/infra-sops-setup
|
||||||
|
# Edit the relevant Kustomization yaml to add decryption block (shown above)
|
||||||
|
git add flux-system/
|
||||||
|
git commit -m "feat: enable SOPS decryption via age key in flux-system"
|
||||||
|
git push
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 7: Verify Flux picks up the change**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
flux reconcile source git flux-system
|
||||||
|
flux get kustomizations
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: `flux-system` Kustomization shows `Ready True` with no errors.
|
||||||
|
|
||||||
|
- [ ] **Step 8: Clean up temp clone**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
rm -rf /tmp/infra-sops-setup
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 5: Infra repo — supervisor app manifests [infra-repo]
|
||||||
|
|
||||||
|
Create the full k8s manifest set for the supervisor service in the infra repo. The deployment uses an `IMAGE_TAG` placeholder; the CD job patches this with the actual git sha before pushing.
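
The placeholder patch can be tried locally before the CD job exists. This is a sketch under assumptions: `patch_image_tag` is a hypothetical helper, and it writes through a temporary file rather than using `sed -i` so that it behaves the same with GNU and BSD sed.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the substitution the CD job performs.
patch_image_tag() {
  local file="$1" tag="$2"
  # Replace everything after the image name's colon, so the same command
  # works for the IMAGE_TAG placeholder and for a previously deployed sha.
  sed "s|gitea.d-ma.be/mathias/supervisor:.*|gitea.d-ma.be/mathias/supervisor:${tag}|" \
    "$file" > "${file}.tmp" && mv "${file}.tmp" "$file"
}
```

Usage: `patch_image_tag apps/supervisor/deployment.yaml "$(git rev-parse HEAD)"`. Because the pattern matches up to the end of the line, the image reference must be the last thing on its line.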
|
||||||
|
|
||||||
|
**Prerequisites:** age public key from Task 4 Step 2.
|
||||||
|
|
||||||
|
- [ ] **Step 1: Clone the infra repo**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone git@gitea.d-ma.be:mathias/infra.git /tmp/infra-supervisor
|
||||||
|
cd /tmp/infra-supervisor
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 2: Create namespace**
|
||||||
|
|
||||||
|
```bash
mkdir -p apps/supervisor
cat > apps/supervisor/namespace.yaml << 'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: supervisor
EOF
```
|
||||||
|
|
||||||
|
- [ ] **Step 3: Create deployment**
|
||||||
|
|
||||||
|
The `brain` volume is a `hostPath` on koala (simplest for a single-node service; add a PVC later if needed). The image uses `imagePullSecrets` to pull from the Gitea registry.
|
||||||
|
|
||||||
|
```bash
cat > apps/supervisor/deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: supervisor
  namespace: supervisor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: supervisor
  template:
    metadata:
      labels:
        app: supervisor
    spec:
      nodeSelector:
        kubernetes.io/hostname: koala
      imagePullSecrets:
        - name: gitea-registry
      containers:
        - name: supervisor
          image: gitea.d-ma.be/mathias/supervisor:IMAGE_TAG
          ports:
            - containerPort: 3200
          envFrom:
            - secretRef:
                name: supervisor-secrets
          env:
            - name: SUPERVISOR_PORT
              value: "3200"
            - name: LITELLM_BASE_URL
              value: "http://iguana:4000"
            - name: LLAMA_SWAP_URL
              value: "http://koala:8080"
            - name: INGEST_BASE_URL
              value: "http://localhost:3300"
          volumeMounts:
            - name: brain
              mountPath: /app/brain
      volumes:
        - name: brain
          hostPath:
            path: /var/lib/supervisor/brain
            type: DirectoryOrCreate
EOF
```
|
||||||
|
|
||||||
|
- [ ] **Step 4: Create service**
|
||||||
|
|
||||||
|
```bash
cat > apps/supervisor/service.yaml << 'EOF'
apiVersion: v1
kind: Service
metadata:
  name: supervisor
  namespace: supervisor
spec:
  selector:
    app: supervisor
  ports:
    - port: 3200
      targetPort: 3200
  type: ClusterIP
EOF
```
|
||||||
|
|
||||||
|
- [ ] **Step 5: Create kustomization.yaml for supervisor**
|
||||||
|
|
||||||
|
```bash
cat > apps/supervisor/kustomization.yaml << 'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - deployment.yaml
  - service.yaml
  - secrets.enc.yaml
EOF
```
|
||||||
|
|
||||||
|
- [ ] **Step 6: Ensure clusters/koala/kustomization.yaml exists and includes supervisor**
|
||||||
|
|
||||||
|
Check if the file exists:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cat clusters/koala/kustomization.yaml 2>/dev/null || echo "need to create"
|
||||||
|
```
|
||||||
|
|
||||||
|
If it exists, add supervisor and imagepullsecret resources. If it does not exist, create it:
|
||||||
|
|
||||||
|
```bash
cat > clusters/koala/kustomization.yaml << 'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../apps/imagepullsecret
  - ../../apps/supervisor
EOF
```
|
||||||
|
|
||||||
|
If it already exists, add the two resource lines (preserving existing entries).
|
||||||
|
|
||||||
|
- [ ] **Step 7: Commit (without secrets — those come in Task 6)**
|
||||||
|
|
||||||
|
```bash
cd /tmp/infra-supervisor
git add apps/supervisor/ clusters/koala/
git commit -m "feat(supervisor): add k8s manifests for supervisor service"
# Do not push yet: the kustomizations reference secrets.enc.yaml and
# apps/imagepullsecret, which only exist after Task 6. The push happens there.
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 6: SOPS-encrypted secrets in infra repo [infra-repo + flamingo]
|
||||||
|
|
||||||
|
Two encrypted secret files: the imagePullSecret for the Gitea container registry, and the supervisor app secrets (ANTHROPIC_API_KEY, LITELLM_API_KEY).
|
||||||
|
|
||||||
|
**Prerequisites:**
|
||||||
|
- age public key from Task 4 Step 2 (format: `age1xxxxx...`)
|
||||||
|
- `sops` installed (`brew install sops` if missing)
|
||||||
|
- Gitea registry token (same one used in Task 3, or create a read-only one for pulling)
|
||||||
|
|
||||||
|
- [ ] **Step 1: Verify sops is installed**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sops --version || brew install sops
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 2: Create .sops.yaml in infra repo root**
|
||||||
|
|
||||||
|
This tells sops which key to use for all files in the repo:
|
||||||
|
|
||||||
|
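```bash
cd /tmp/infra-supervisor
cat > .sops.yaml << 'EOF'
creation_rules:
  - encrypted_regex: ^(data|stringData)$
    age: age1REPLACE_WITH_YOUR_PUBLIC_KEY
EOF
# Insert the real public key into .sops.yaml before committing
git add .sops.yaml
git commit -m "chore: add sops config (age key)"
# Push is deferred to Step 5, once the encrypted files referenced by the kustomizations exist
```

Replace `age1REPLACE_WITH_YOUR_PUBLIC_KEY` with the actual age public key from Task 4 before committing. The `encrypted_regex` limits encryption to the `data` and `stringData` fields: Flux's kustomize-controller does not support Secrets whose `kind`, `apiVersion`, or `metadata` are encrypted.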
|
||||||
|
|
||||||
|
- [ ] **Step 3: Create and encrypt the imagePullSecret**
|
||||||
|
|
||||||
|
The imagePullSecret lives in the `supervisor` namespace, because a pod can only reference pull secrets from its own namespace. Its manifest is kept in a separate `imagepullsecret` app:
|
||||||
|
|
||||||
|
```bash
mkdir -p apps/imagepullsecret

# Create a registry pull token in Gitea: Settings → Applications → Generate Token
# Scopes: read:packages
# Use that token here (or reuse the buildkit-push token; read access is enough for pulling)
PULL_TOKEN=<gitea-read-packages-token>
PULL_AUTH=$(echo -n "mathias:${PULL_TOKEN}" | base64)

cat > /tmp/gitea-pull-secret.yaml << EOF
apiVersion: v1
kind: Secret
metadata:
  name: gitea-registry
  namespace: supervisor
type: kubernetes.io/dockerconfigjson
stringData:
  .dockerconfigjson: |
    {
      "auths": {
        "gitea.d-ma.be": {
          "auth": "${PULL_AUTH}"
        }
      }
    }
EOF

sops --encrypt /tmp/gitea-pull-secret.yaml > apps/imagepullsecret/secret.enc.yaml
rm /tmp/gitea-pull-secret.yaml

cat > apps/imagepullsecret/kustomization.yaml << 'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - secret.enc.yaml
EOF
```
|
||||||
|
|
||||||
|
Verify the encrypted file looks correct (should show `sops:` metadata at the bottom):
|
||||||
|
|
||||||
|
```bash
|
||||||
|
tail -20 apps/imagepullsecret/secret.enc.yaml
|
||||||
|
```
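
The visual check can be turned into a guard that refuses to continue when a file is still plaintext. This is a minimal sketch, assuming SOPS writes its metadata as a top-level `sops:` key in YAML output; `is_sops_encrypted` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical pre-commit guard: succeed only if the file carries SOPS metadata.
is_sops_encrypted() {
  # SOPS adds a top-level "sops:" mapping to the YAML files it encrypts.
  grep -q '^sops:' "$1"
}

# Example (run from the infra repo root):
#   is_sops_encrypted apps/imagepullsecret/secret.enc.yaml || echo "plaintext, do not commit"
```

The check only proves that SOPS touched the file; it does not verify which fields were encrypted, so the `tail` inspection remains useful.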
|
||||||
|
|
||||||
|
- [ ] **Step 4: Create and encrypt supervisor app secrets**
|
||||||
|
|
||||||
|
```bash
# ANTHROPIC_API_KEY: your Anthropic API key
# LITELLM_API_KEY: the key your LiteLLM instance expects (can be any string if it's local)
cat > /tmp/supervisor-secrets.yaml << 'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: supervisor-secrets
  namespace: supervisor
type: Opaque
stringData:
  ANTHROPIC_API_KEY: "REPLACE_WITH_REAL_KEY"
  LITELLM_API_KEY: "REPLACE_WITH_REAL_KEY"
EOF

# Edit /tmp/supervisor-secrets.yaml to insert real values, then:
sops --encrypt /tmp/supervisor-secrets.yaml > apps/supervisor/secrets.enc.yaml
rm /tmp/supervisor-secrets.yaml
```
|
||||||
|
|
||||||
|
Verify:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
tail -20 apps/supervisor/secrets.enc.yaml
|
||||||
|
# Should show encrypted values and sops metadata
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 5: Commit encrypted secrets**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd /tmp/infra-supervisor
|
||||||
|
git add apps/imagepullsecret/ apps/supervisor/secrets.enc.yaml .sops.yaml
|
||||||
|
git commit -m "feat: add SOPS-encrypted imagePullSecret and supervisor app secrets"
|
||||||
|
git push
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 6: Verify Flux reconciles and creates the secrets**
|
||||||
|
|
||||||
|
Wait ~60s then:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
flux reconcile kustomization flux-system --with-source
|
||||||
|
kubectl get secrets -n supervisor
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: `gitea-registry` and `supervisor-secrets` appear in the `supervisor` namespace.
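
Rather than waiting a fixed 60 seconds, the check can be polled. The following is a sketch only: `wait_for` is a hypothetical helper, and the timeout and interval values in the usage line are arbitrary assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry a command every INTERVAL seconds until it
# succeeds, giving up once TIMEOUT seconds have been spent waiting.
wait_for() {
  local timeout="$1" interval="$2"
  shift 2
  local waited=0
  until "$@"; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

Usage: `wait_for 120 5 kubectl get secret supervisor-secrets -n supervisor`. The function returns non-zero on timeout, so it can gate later steps in a script.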
|
||||||
|
|
||||||
|
- [ ] **Step 7: Clean up temp clone**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
rm -rf /tmp/infra-supervisor
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 7: Gitea org-level secrets [gitea-ui + flamingo]
|
||||||
|
|
||||||
|
Set the two secrets that all repos in the `mathias` org will inherit. These go in the Gitea org (not individual repos).
|
||||||
|
|
||||||
|
**Files:** No files — Gitea UI configuration.
|
||||||
|
|
||||||
|
- [ ] **Step 1: Generate SSH deploy key for infra repo**
|
||||||
|
|
||||||
|
On flamingo:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh-keygen -t ed25519 -C "cd-bot infra deploy key" -f /tmp/infra-deploy-key -N ""
|
||||||
|
cat /tmp/infra-deploy-key # private key → INFRA_DEPLOY_KEY secret
|
||||||
|
cat /tmp/infra-deploy-key.pub # public key → add to Gitea infra repo as deploy key
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 2: Add public key to infra repo as a deploy key (write access)**
|
||||||
|
|
||||||
|
In Gitea UI: `https://gitea.d-ma.be/mathias/infra` → Settings → Deploy Keys → Add Deploy Key.
|
||||||
|
- Title: `cd-bot`
|
||||||
|
- Key: paste content of `/tmp/infra-deploy-key.pub`
|
||||||
|
- Enable write access: ✓
|
||||||
|
|
||||||
|
- [ ] **Step 3: Set org-level secrets in Gitea**
|
||||||
|
|
||||||
|
In Gitea UI: `https://gitea.d-ma.be/org/mathias/settings/secrets` → Add Secret.
|
||||||
|
|
||||||
|
Set these two secrets:
|
||||||
|
|
||||||
|
| Secret name | Value |
|
||||||
|
|-------------|-------|
|
||||||
|
| `INFRA_DEPLOY_KEY` | content of `/tmp/infra-deploy-key` (private key, including `-----BEGIN...` lines) |
|
||||||
|
| `BUILDKIT_REGISTRY_AUTH` | same base64 auth string as used in Task 3 Step 2 (format: `mathias:<token>` base64-encoded) |
|
||||||
|
|
||||||
|
Note: `BUILDKIT_REGISTRY_AUTH` is optional. The `cd.yml` in Task 8 relies on `/root/.docker/config.json` on the runner host (Task 3) and does not read this secret. Set it only if you later want the workflow to write the credentials itself, which makes rotation easier.
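
The `mathias:<token>` base64 format from the table can be sketched as a small function. `docker_auth` is a hypothetical helper name; the `tr` call is a precaution for `base64` implementations that wrap long output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the Docker-style "auth" value for a user and token.
docker_auth() {
  # printf avoids encoding a trailing newline; tr strips any line wrapping
  # that the local base64 implementation may add.
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}
```

Usage: `docker_auth mathias "$TOKEN"`, with `TOKEN` holding the Gitea access token.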
|
||||||
|
|
||||||
|
- [ ] **Step 4: Clean up temp key files**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
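# shred ships with GNU coreutils; stock macOS lacks it, so fall back to rm
shred -u /tmp/infra-deploy-key /tmp/infra-deploy-key.pub 2>/dev/null || rm -f /tmp/infra-deploy-key /tmp/infra-deploy-key.pub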
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 5: Verify secrets appear in Gitea**
|
||||||
|
|
||||||
|
In Gitea UI: `https://gitea.d-ma.be/org/mathias/settings/secrets` — confirm both secrets are listed (values are hidden, only names shown).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 8: cd.yml workflow [this-repo]
|
||||||
|
|
||||||
|
Create the CD workflow that triggers after CI passes, builds the image with buildctl, and commits the updated tag to the infra repo.
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Create: `.gitea/workflows/cd.yml`
|
||||||
|
|
||||||
|
- [ ] **Step 1: Create cd.yml**
|
||||||
|
|
||||||
|
```bash
cat > .gitea/workflows/cd.yml << 'EOF'
name: cd

on:
  push:
    branches: [main]

jobs:
  deploy:
    name: Build and deploy
    # NOTE: 'needs' only resolves jobs defined in this same workflow file
    needs: [check]
    runs-on: self-hosted
    env:
      SERVICE: supervisor
      REGISTRY: gitea.d-ma.be
      IMAGE: gitea.d-ma.be/mathias/supervisor
      INFRA_REPO: git@gitea.d-ma.be:mathias/infra.git
      BUILDKIT_HOST: unix:///run/buildkit/buildkitd.sock
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Build and push image
        run: |
          IMAGE_TAG="${{ github.sha }}"
          echo "Building ${IMAGE}:${IMAGE_TAG}"
          buildctl --addr "${BUILDKIT_HOST}" build \
            --frontend dockerfile.v0 \
            --local context=. \
            --local dockerfile=. \
            --opt build-arg:VERSION="${IMAGE_TAG}" \
            --output "type=image,name=${IMAGE}:${IMAGE_TAG},push=true"
          echo "IMAGE_TAG=${IMAGE_TAG}" >> "$GITHUB_OUTPUT"
        id: build

      - name: Update infra repo
        run: |
          IMAGE_TAG="${{ github.sha }}"
          # Write SSH key for infra repo
          mkdir -p ~/.ssh
          echo "${{ secrets.INFRA_DEPLOY_KEY }}" > ~/.ssh/infra_deploy_key
          chmod 600 ~/.ssh/infra_deploy_key
          ssh-keyscan gitea.d-ma.be >> ~/.ssh/known_hosts 2>/dev/null

          # Clone infra repo (remove leftovers from a previously failed run first)
          rm -rf /tmp/infra-update
          GIT_SSH_COMMAND="ssh -i ~/.ssh/infra_deploy_key -o IdentitiesOnly=yes" \
            git clone "${INFRA_REPO}" /tmp/infra-update

          # Patch the image tag
          cd /tmp/infra-update
          sed -i "s|gitea.d-ma.be/mathias/supervisor:.*|gitea.d-ma.be/mathias/supervisor:${IMAGE_TAG}|" \
            "apps/${SERVICE}/deployment.yaml"

          # Commit and push
          git config user.email "cd-bot@d-ma.be"
          git config user.name "CD Bot"
          git add "apps/${SERVICE}/deployment.yaml"
          git commit -m "chore(deploy): ${SERVICE} → ${IMAGE_TAG}"
          GIT_SSH_COMMAND="ssh -i ~/.ssh/infra_deploy_key -o IdentitiesOnly=yes" \
            git push

          # Clean up
          rm -rf /tmp/infra-update
          rm ~/.ssh/infra_deploy_key
          echo "Infra repo updated: ${SERVICE} → ${IMAGE_TAG}"
EOF
```
|
||||||
|
|
||||||
|
- [ ] **Step 2: Verify the `needs` job name matches ci.yml**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
grep "^  [a-z].*:$" .gitea/workflows/ci.yml
|
||||||
|
```
|
||||||
|
|
||||||
|
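The output should list `check:` as the quality-gate job name. Note that `needs` only resolves jobs defined in the same workflow file: if `check` lives only in `ci.yml`, `needs: [check]` in `cd.yml` cannot reference it, and the `deploy` job has to move into `ci.yml` (or the CD workflow has to be triggered another way) for the CI gate to hold.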
|
||||||
|
|
||||||
|
- [ ] **Step 3: Commit**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git add .gitea/workflows/cd.yml
|
||||||
|
git commit -m "feat: add CD workflow (buildctl → Gitea registry → infra repo update)"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 9: End-to-end smoke test
|
||||||
|
|
||||||
|
Trigger the full pipeline and verify each stage.
|
||||||
|
|
||||||
|
- [ ] **Step 1: Push to main to trigger CI + CD**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git push origin main
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 2: Monitor CI job in Gitea**
|
||||||
|
|
||||||
|
Open `https://gitea.d-ma.be/mathias/supervisor/actions` — wait for the `ci` workflow `check` job to pass.
|
||||||
|
|
||||||
|
- [ ] **Step 3: Monitor CD job**
|
||||||
|
|
||||||
|
In the same actions view, the `cd` workflow should start after `ci` passes. Check the `Build and push image` step output for:
|
||||||
|
|
||||||
|
```
|
||||||
|
Building gitea.d-ma.be/mathias/supervisor:<sha>
|
||||||
|
```
|
||||||
|
|
||||||
|
And the `Update infra repo` step for:
|
||||||
|
|
||||||
|
```
|
||||||
|
Infra repo updated: supervisor → <sha>
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 4: Verify image in Gitea registry**
|
||||||
|
|
||||||
|
```
|
||||||
|
https://gitea.d-ma.be/mathias/supervisor/packages
|
||||||
|
```
|
||||||
|
|
||||||
|
Should show a new tag matching the commit sha.
|
||||||
|
|
||||||
|
- [ ] **Step 5: Verify infra repo commit**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone git@gitea.d-ma.be:mathias/infra.git /tmp/infra-verify
|
||||||
|
cd /tmp/infra-verify
|
||||||
|
git log --oneline -3
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: most recent commit message is `chore(deploy): supervisor → <sha>`.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
grep "image:" apps/supervisor/deployment.yaml
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: `image: gitea.d-ma.be/mathias/supervisor:<sha>`
|
||||||
|
|
||||||
|
- [ ] **Step 6: Verify Flux reconciles**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
flux get kustomizations
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: `flux-system` shows `Ready True` and `Applied revision: main@sha1:<infra-sha>`.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl get pods -n supervisor
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: supervisor pod is `Running` with the new image sha.
|
||||||
|
|
||||||
|
- [ ] **Step 7: Verify pod started correctly**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl logs -n supervisor deployment/supervisor --tail=20
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: supervisor startup logs (MCP server listening on port 3200, no errors).
|
||||||
|
|
||||||
|
- [ ] **Step 8: Clean up verify clone**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
rm -rf /tmp/infra-verify
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 10: Post-deploy — registry retention policy [gitea-ui]
|
||||||
|
|
||||||
|
Prevent the Gitea container registry from filling up by setting a tag retention policy.
|
||||||
|
|
||||||
|
- [ ] **Step 1: Set tag retention in Gitea**
|
||||||
|
|
||||||
|
In Gitea UI: `https://gitea.d-ma.be/mathias/supervisor` → Settings → Packages → Container Registry.
|
||||||
|
|
||||||
|
Set: Keep last **20** tags per image name.
|
||||||
|
|
||||||
|
If Gitea does not expose a UI retention policy, note this for manual cleanup and open a task to automate it (e.g., a weekly Actions job that deletes old image versions through the Gitea packages API).
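
For the automated fallback, the "keep last 20" selection itself is simple to sketch. This assumes the tags arrive on stdin sorted newest first; how they are listed and deleted through Gitea is deliberately left out, and `tags_to_prune` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read tags newest-first on stdin and print the ones
# that fall outside the first KEEP entries (the candidates for deletion).
tags_to_prune() {
  local keep="$1"
  # tail -n +N starts printing at line N, so skip the first KEEP lines.
  tail -n "+$((keep + 1))"
}
```

Usage: `list_tags_newest_first | tags_to_prune 20`, where `list_tags_newest_first` stands in for whatever produces the sorted tag list.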
|
||||||
|
|
||||||
|
- [ ] **Step 2: Verify existing test tags are cleaned up**
|
||||||
|
|
||||||
|
Manually delete any test tags pushed during Task 3 if not already done.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Self-review checklist (for plan author — not a task)
|
||||||
|
|
||||||
|
- [x] **Spec coverage:** BuildKit systemd ✓, cd.yml ✓, Flux SOPS ✓, infra repo structure ✓, imagePullSecret ✓, app secrets ✓, Gitea org secrets ✓, error handling (implicit in workflow failures) ✓, registry retention ✓, smoke test ✓
|
||||||
|
- [x] **Placeholders:** `REPLACE_WITH_YOUR_PUBLIC_KEY` and `REPLACE_WITH_REAL_KEY` are intentional — real values come from user's secrets; marked clearly
|
||||||
|
- [x] **Type consistency:** No shared types across tasks (infra-only plan)
|
||||||
|
- [x] **Known gaps:** `needs: [check]` assumes ci.yml job name is `check` — verified in Task 8 Step 2. The `sed` image tag patch assumes no other image line in deployment.yaml — the deployment template only has one `image:` line.
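
The single-`image:` assumption in the last item can be checked mechanically. This is a minimal sketch; `assert_single_image` is a hypothetical helper, and it only counts lines whose key is exactly `image:` (optionally preceded by a list dash).

```bash
#!/usr/bin/env bash
# Hypothetical guard: succeed only if the manifest has exactly one "image:" line.
assert_single_image() {
  local count
  # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
  # keeps that case from aborting scripts that run with "set -e".
  count=$(grep -c '^[[:space:]-]*image:' "$1" || true)
  [ "$count" -eq 1 ]
}
```

Usage: `assert_single_image apps/supervisor/deployment.yaml || echo "unexpected number of image lines"`.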
|
||||||