OCI-First GitOps Promotion with Flux, Kargo, and Renovate
This article describes a promotion architecture where the deployment definition lives in Git but is never deployed from Git. Instead, every merge builds a signed OCI artifact containing the deployment manifests, and that artifact — immutable, versioned, digest-pinned — is what promotes through dev → staging → prod and what Flux actually deploys.
Three tools, one responsibility each:
- Renovate updates the deployment definition in Git — new application image versions, third-party chart bumps, dependency updates. Its merged PRs are how change enters the pipeline.
- Kargo promotes the resulting OCI artifact across environments, running automated tests at each gate.
- Flux deploys the artifact into each cluster, verifying its signature before reconciling.
No human touches the routine path. Humans appear only at the optional prod approval gate and in emergencies.
Why deploy from OCI instead of Git?
The conventional GitOps setup points Flux directly at the Git repository holding the manifests. It works, but it has structural weaknesses that show up at promotion time:
Git branches and directories are mutable. “What exactly is running in staging” is answered by a commit SHA plus a directory path plus whatever Kustomize resolves at that commit — reconstructible, but not a single immutable unit. An OCI artifact is one digest. What’s running is sha256:abc..., full stop.
Promotion as Git diffs accumulates noise. Promoting by editing image tags inside YAML produces thousands of bot commits interleaved with human changes. Promoting by bumping one ref.tag pointer per environment keeps the audit trail readable: each promotion is exactly one one-line commit.
Signing Git content is awkward; signing OCI is solved. Cosign keyless signing, registry-side storage of signatures, and Flux’s native OCIRepository.verify give you an end-to-end verification chain with the same tooling already used for container images. Config and code ride the same security infrastructure.
Rollback becomes deterministic. “Point staging back at artifact 0.5.118” is one field change to a known-good immutable unit. No git revert archaeology across interleaved commits.
The cost: a CI step between merge and deployability, and a registry as a runtime dependency for config (it already is one for images). For most teams that trade is clearly favorable.
The architecture
flowchart TB
subgraph appRepo["app-myapp (Git)"]
src["source code<br/>Dockerfile"]
end
img["registry/myapp:1.4.3<br/>(signed image)"]
subgraph depRepo["deployment repo (Git)"]
apps["apps/myapp/<br/>base/ + overlays/{dev,staging,prod}"]
clusters["clusters/{dev,staging,prod}"]
plat["kargo/ flux-system/"]
end
renovate["Renovate PRs"]
kargoCommits["Kargo commits"]
artifact["registry/deployments/myapp:0.5.123<br/>(signed config artifact,<br/>image tag baked in)"]
subgraph kargo["Kargo Warehouse → Freight"]
stages["Stages: dev → staging → prod<br/>promotion = bump ref.tag in clusters/<env>/ (Git commit)<br/>verification = AnalysisTemplates"]
end
flux["Flux per cluster: pull OCI artifact,<br/>verify cosign signature, reconcile"]
admission["Admission controller re-verifies the<br/>workload image signature at pod start"]
src -->|"CI: build, sign"| img
renovate --> apps
kargoCommits --> clusters
depRepo -->|"CI on merge of apps/**: build, sign"| artifact
img -.->|"referenced inside"| artifact
artifact --> kargo
kargo --> flux
flux --> admission
The key insight: Git holds two different things with two different lifecycles.
- The deployment definition (apps/): the Kustomize source. Humans and Renovate edit it; every merge produces a new artifact. This content is never consumed by Flux directly.
- The environment pointers (clusters/): tiny manifests saying “dev runs artifact 0.5.123, prod runs 0.5.118.” Only Kargo writes here (plus humans in emergencies). Flux does consume this from Git; it is the bootstrap layer that tells each cluster which artifact to pull.
Renovate and Kargo never write to the same path. Renovate owns apps/ (and app repos); Kargo owns clusters/.
Repository and registry layout
One deployment repository:
deployment/
├── apps/
│   └── myapp/
│       ├── base/
│       │   ├── deployment.yaml      # image tag here ← Renovate bumps this
│       │   ├── service.yaml
│       │   ├── networkpolicy.yaml
│       │   └── kustomization.yaml
│       └── overlays/
│           ├── dev/                 # env-specific: replicas, hostnames...
│           ├── staging/
│           └── prod/
├── clusters/
│   ├── dev/
│   │   └── myapp.yaml               # OCIRepository ref.tag ← Kargo bumps this
│   ├── staging/
│   └── prod/
├── kargo/
│   ├── project.yaml
│   ├── warehouse.yaml
│   └── stages.yaml
├── flux-system/                     # bootstrap per cluster
├── .gitea/workflows/
│   └── build-deployment-artifact.yml
└── renovate.json
Registry layout:
registry.internal/
├── myapp:1.4.3 # application images (from app repos)
├── deployments/myapp:0.5.123 # deployment artifacts (from this repo)
└── ...
App repositories stay as they are: source code, Dockerfile, CI that builds, signs, and pushes the application image. They need no knowledge of the deployment repo.
Access boundaries. Developers write to app repos only. Renovate writes to apps/ (via PRs). Kargo writes to clusters/ (direct commit or PR). The platform team owns kargo/, flux-system/, and clusters/ structurally. CODEOWNERS enforces this:
* @org/platform-team
/apps/myapp/ @org/myapp-team
/clusters/ @org/platform-team
/kargo/ @org/platform-team
/flux-system/ @org/platform-team
Branch protection on main: required CODEOWNERS review, signed commits, required status checks (manifest lint, kustomize build dry-run), no force-push. The Renovate and Kargo bot identities are exempted from review on exactly the paths they own — their commits still pass status checks.
Step 1: Application images (app repo CI)
Standard build-sign-push. The only conventions that matter downstream:
- Semver tags, immutable. myapp:1.4.3, never re-pushed. Include a -rc. or -dev. pre-release suffix if you build from non-release branches and want Renovate to ignore them.
- Keyless cosign signing bound to the workflow’s OIDC identity.
- Registry push rights are CI-only. No human PATs can push tags. Each repo’s CI uses OIDC federation to the registry where supported; otherwise a scoped robot account per repo.
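As a rough sketch of these conventions, an app-repo workflow might build, push, and then sign the image by digest. The workflow name, trigger, runner label, and VERSION value are illustrative assumptions, as is the premise that the runner has cosign installed, is already logged in to the registry, and can obtain an OIDC token that cosign accepts for keyless signing.

```yaml
# Hypothetical app-repo workflow; names and trigger are assumptions.
name: build-sign-push
on:
  push:
    tags: ["v*"]
jobs:
  image:
    runs-on: ubuntu-latest
    env:
      IMAGE: registry.internal/myapp
      VERSION: 1.4.3   # assumption: in practice derived from the Git tag
    steps:
      - uses: actions/checkout@v4
      - name: Build and push the immutable semver tag
        run: |
          docker build -t "${IMAGE}:${VERSION}" .
          docker push "${IMAGE}:${VERSION}"
      - name: Sign by digest (keyless)
        run: |
          # RepoDigests is populated once the image has been pushed.
          # Signing the digest, not the tag, binds the signature to exact content.
          ref="$(docker inspect --format '{{index .RepoDigests 0}}' "${IMAGE}:${VERSION}")"
          cosign sign --yes "${ref}"
```

Signing by digest rather than by tag is a deliberate choice here: even if a tag were ever re-pushed, the signature would still refer only to the content that CI built.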
Step 2: Renovate brings the new version into the deployment definition
Renovate watches the registry and the deployment repo. When myapp:1.4.3 appears, it opens a PR bumping the tag in apps/myapp/base/deployment.yaml. This is the entry gate: a new application version exists in the world, but it enters the pipeline only when this PR merges.
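renovate.json in the deployment repo carries this configuration, including the 7-day soak that the self-upgrade caveat under Emergency procedures relies on.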
Renovate also keeps doing its usual inbound work — base images in app-repo Dockerfiles, language dependencies, CI action versions, third-party Helm charts. All of that flows through the same gate: PR → merge → new deployment artifact → promotion pipeline. A cert-manager chart bump gets exactly the same dev → staging → prod gating as your own code, because everything that reaches a cluster is an artifact that went through the stages.
The Renovate bot uses a dedicated identity with a token scoped to the repos it manages, sourced from a secret store, rotated on schedule. Never an org-admin token.
Step 3: CI builds the signed deployment artifact
On every merge to main touching apps/**, CI packages the deployment definition into an OCI artifact:
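A minimal sketch of such a workflow, assuming the flux and cosign CLIs and jq are available on the runner and that registry login has already happened. The version scheme (0.5 plus the run number) and the job layout are hypothetical; only the file name, the apps/** trigger, and the use of --revision come from the text.

```yaml
# .gitea/workflows/build-deployment-artifact.yml (sketch, not the original file)
name: build-deployment-artifact
on:
  push:
    branches: [main]
    paths: ["apps/**"]
jobs:
  artifact:
    runs-on: ubuntu-latest
    env:
      ARTIFACT: oci://registry.internal/deployments/myapp
      VERSION: 0.5.${{ github.run_number }}   # assumption: any monotonic semver scheme works
    steps:
      - uses: actions/checkout@v4
      - name: Push and sign the deployment artifact
        run: |
          # Package base/ plus all overlays; record source and revision as annotations.
          digest_url="$(flux push artifact "${ARTIFACT}:${VERSION}" \
            --path=./apps/myapp \
            --source="$(git config --get remote.origin.url)" \
            --revision="main@sha1:$(git rev-parse HEAD)" \
            --output json | jq -r '.repository + "@" + .digest')"
          # Keyless signature bound to this workflow's identity.
          cosign sign --yes "${digest_url}"
```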
Properties of the resulting artifact:
- It contains the full Kustomize tree: base/ plus all three overlays. Each environment’s Flux points at its overlay path inside the artifact. One artifact serves all environments; what differs per environment is which version of it they run.
- The application image tag is baked in. The artifact at 0.5.123 references myapp:1.4.3 immutably. There is no separate “image promotion”; promoting the artifact promotes the image.
- It’s signed with the deployment-build workflow’s identity, distinct from the app-image signing identity. Both signatures are verified downstream.
- The Git commit is recorded in the artifact annotations (--revision), so any running artifact traces back to its exact source commit.
For multiple services, build one such artifact per service (deployments/myapp, deployments/api, …) with path-filtered triggers, or one combined artifact if your services deploy as a unit.
Step 4: Flux consumes the artifact, per environment
Each cluster’s pointer manifest in clusters/<env>/:
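A sketch of what the dev pointer could look like, shown as one YAML stream for brevity. Resource names, the namespace, intervals, and the OIDC issuer and subject patterns are assumptions for illustration; the API version shown is the long-standing v1beta2, which newer Flux releases also serve as v1.

```yaml
# Sketch of the dev pointer manifests under clusters/dev/
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 1m
  url: oci://registry.internal/deployments/myapp
  ref:
    tag: 0.5.123            # the one field a promotion changes
  verify:
    provider: cosign
    matchOIDCIdentity:
      - issuer: "^https://gitea\\.internal$"                     # hypothetical issuer
        subject: "^https://gitea\\.internal/org/deployment/.*$"  # hypothetical workflow identity
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: OCIRepository
    name: myapp
  path: ./overlays/dev      # overlay path inside the artifact
  prune: true
  wait: true                # Ready then means applied and healthy
  timeout: 5m
```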
Notes:
- verify is mandatory, not optional. Flux refuses to reconcile an artifact not signed by the deployment-build workflow identity. A tampered or repushed artifact never reaches the cluster.
- wait: true plus health checks make the Kustomization’s Ready condition mean “deployed and healthy,” which Kargo’s verification depends on.
- Pin by tag, not semver range. The whole point is that promotion is an explicit, gated act. A semver range would auto-deploy every new artifact, bypassing the stages.
- Flux itself is installed minimally: source-controller, kustomize-controller, helm-controller, notification-controller. The image-automation controllers are not installed; nothing in this architecture uses them.
The clusters/ directory itself is synced by Flux from Git (the classic bootstrap GitRepository + Kustomization per cluster). Optionally enable GitRepository.spec.verify so Flux also checks commit signatures on the pointer plane — then even Kargo’s commits must be signed by a trusted key.
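For the optional commit verification, a bootstrap source might look roughly like the following. The repository URL and the Secret name are hypothetical; the Secret is assumed to hold the OpenPGP public keys of the trusted committers, including the Kargo bot.

```yaml
# Sketch: bootstrap GitRepository with commit signature verification enabled
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m
  url: https://gitea.internal/org/deployment.git   # hypothetical URL
  ref:
    branch: main
  verify:
    mode: HEAD                 # verify the signature of the commit at the branch head
    secretRef:
      name: trusted-signers    # hypothetical Secret with trusted public keys
```

With this in place, an unsigned or untrusted head commit causes the source to fail verification, so the pointer plane stops updating instead of applying an unverified change.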
Step 5: Kargo promotes the artifact through environments
Kargo runs in a dedicated management cluster — not in any workload cluster. Its Git credentials (write access to clusters/) are the highest-value secret in the stack: scope them to that path’s repo, source them from a secret store, rotate them, and audit the bot’s commit history.
Warehouse: watch the deployment artifact
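A minimal Warehouse sketch, assuming Kargo's image subscription can discover tags in the deployment artifact repository; whether it handles a given registry and artifact media type should be checked against the Kargo version in use. The namespace and interval are assumptions.

```yaml
# kargo/warehouse.yaml (sketch)
apiVersion: kargo.akuity.io/v1alpha1
kind: Warehouse
metadata:
  name: myapp
  namespace: myapp             # assumption: the Kargo Project namespace
spec:
  interval: 5m
  subscriptions:
    - image:
        repoURL: registry.internal/deployments/myapp
        imageSelectionStrategy: SemVer   # newest semver tag becomes new Freight
```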
One subscription. Because the application image is baked into the artifact, the Freight is the single deployment artifact digest: the indivisible promotable unit. (If several services must move atomically, add their deployment artifacts as further subscriptions; each Freight then bundles one version of every subscribed artifact, and the bundle promotes as one.)
Stages: dev auto, staging gated, prod PR-gated
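A sketch of the dev Stage, assuming a Kargo 1.x release with step-based promotion templates and expression support. The repository URL, commit message, and template name are hypothetical, and the yaml-update step assumes the target file contains only the OCIRepository document.

```yaml
# kargo/stages.yaml, dev only (sketch)
apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  name: dev
  namespace: myapp
spec:
  requestedFreight:
    - origin:
        kind: Warehouse
        name: myapp
      sources:
        direct: true           # dev takes Freight straight from the Warehouse
  promotionTemplate:
    spec:
      steps:
        - uses: git-clone
          config:
            repoURL: https://gitea.internal/org/deployment.git   # hypothetical
            checkout:
              - branch: main
                path: ./repo
        - uses: yaml-update
          config:
            path: ./repo/clusters/dev/myapp.yaml
            updates:
              - key: spec.ref.tag
                value: ${{ imageFrom("registry.internal/deployments/myapp").Tag }}
        - uses: git-commit
          config:
            path: ./repo
            message: "dev: promote deployments/myapp"
        - uses: git-push
          config:
            path: ./repo
  verification:
    analysisTemplates:
      - name: smoke-tests      # hypothetical template name
```

Under the same assumptions, staging and prod would differ mainly in requesting Freight through sources.stages (naming the upstream Stage) instead of direct: true, and prod would push to a branch and open a PR rather than committing to main.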
Auto-promotion policy:
With prod: true, the pipeline is fully hands-off: staging tests green → Kargo opens the prod PR → the Git platform auto-merges on green status checks → Flux reconciles prod. With prod: false, Freight becomes eligible when staging verifies, and a human triggers kargo promote --stage prod --freight <name> (or applies a Promotion YAML — everything in Kargo is CRDs, so the approval itself can be a reviewed Git commit). Either way the PR step keeps prod changes inside branch protection, CODEOWNERS, and the normal audit trail.
Verification: the tests at each gate
Tests live in AnalysisTemplate resources. The workhorse pattern is a Job running the suite against the just-deployed environment:
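A sketch of that Job pattern using the Argo Rollouts AnalysisTemplate format that Kargo reuses. The test image, its arguments, and the target URL are hypothetical.

```yaml
# Sketch: Job-based verification for the dev stage
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: smoke-tests
  namespace: myapp
spec:
  metrics:
    - name: smoke
      provider:
        job:
          spec:
            backoffLimit: 0            # a failed run fails the verification, no retries
            template:
              spec:
                restartPolicy: Never
                containers:
                  - name: tests
                    image: registry.internal/myapp-tests:1.4.3          # hypothetical test image
                    args: ["--base-url", "https://myapp.dev.internal"]  # hypothetical target
```

The Job's exit status is the verdict: a successful Job marks the metric successful, a failed one fails the analysis run and with it the Freight's verification at this stage.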
All listed templates must pass before Freight is marked verified at a stage — multiple entries are AND-gated. Two other useful providers:
- Prometheus soak gates: count: 30, interval: 1m sampling an error-rate query gives you “healthy for 30 minutes” before prod eligibility, with failureCondition to fail fast.
- Existing CI suites: a thin Job that dispatches a Gitea Actions workflow and polls for its conclusion lets test suites stay in CI while Kargo remains the source of truth for verification state.
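The soak gate could be sketched as follows. The Prometheus address, the metric and label names in the query, and the 1% threshold are assumptions chosen for illustration.

```yaml
# Sketch: 30 one-minute samples, failing on the first sample at or above 1% errors
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: soak-error-rate
  namespace: myapp
spec:
  metrics:
    - name: error-rate
      interval: 1m
      count: 30
      failureLimit: 0                      # a single failed sample fails the gate
      failureCondition: result[0] >= 0.01
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc:9090   # hypothetical address
          query: |
            sum(rate(http_requests_total{app="myapp",code=~"5.."}[5m]))
            /
            sum(rate(http_requests_total{app="myapp"}[5m]))
```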
Verification Jobs get their credentials from environment-specific External Secrets scoped to the verification namespace — never the application’s runtime secrets.
What changes per environment — and what doesn’t
The artifact contains all overlays; the environment pointer selects one. So:
- Shared config (base manifests, NetworkPolicies, sidecars, probes) lives in base/, is part of the artifact, and promotes: a NetworkPolicy edit produces a new artifact version that rides the same dev → staging → prod gates as a code change. No config shortcut to prod.
- Environment-specific config (replicas, hostnames, resource limits, log levels) lives in overlays/<env>/, is also part of the artifact, but only takes effect in its environment. Note the subtlety: editing the prod overlay still produces a new artifact that must travel through dev and staging first. Dev and staging won’t exercise the prod overlay’s content, but the artifact version carrying it gets verified anyway. This is a feature (prod overlay changes can’t skip the pipeline), at the cost of promotion latency for prod-only tweaks.
- Secrets are never in the artifact. The artifact carries ExternalSecret references; each environment’s operator resolves them against its own vault path. Values never touch Git or the registry.
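To make the split concrete, here is a sketch of a prod overlay next to a base ExternalSecret. The replica count, the externalsecret.yaml file name, the store name, and the secret keys are all hypothetical.

```yaml
# apps/myapp/overlays/prod/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
replicas:
  - name: myapp
    count: 3                 # environment-specific; only takes effect in prod
---
# apps/myapp/base/externalsecret.yaml (hypothetical file)
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: myapp-db
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault              # same name in every cluster, different vault path behind it
  target:
    name: myapp-db
  data:
    - secretKey: password
      remoteRef:
        key: myapp/db
        property: password
```

Only the reference travels in the artifact; the value is resolved inside each cluster by that cluster's own store.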
The security chain, end to end
Four independent verification layers, each failing closed:
| # | Layer | Mechanism | Defeats |
|---|---|---|---|
| 1 | App image signature | cosign keyless at build CI | tampered/injected application images |
| 2 | Deployment artifact signature | cosign keyless at deployment CI; Flux OCIRepository.verify | tampered manifests, repushed artifacts, rogue registry writes |
| 3 | Pointer commit signature (optional) | signed Kargo/Renovate commits; Flux GitRepository.verify on clusters/ | stolen Git credentials without the signing key |
| 4 | Admission control | Sigstore policy-controller / Kyverno verifying layer-1 signatures at pod start | anything that bypassed layers 1–3, including hand-edited pointers |
An attacker must compromise the CI OIDC identities (1, 2), the signing keys (3), and the admission policy (4) to land an unauthorized workload in prod. Each layer is independently operated and independently auditable.
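Layer 4 could be sketched with a Kyverno policy along these lines. The issuer, subject pattern, and Rekor URL are assumptions, and the field names should be checked against the Kyverno version installed; Sigstore policy-controller would express the same rule with its own ClusterImagePolicy resource.

```yaml
# Sketch: reject Pods whose myapp image lacks a keyless signature from the app-repo CI
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-myapp-image
spec:
  validationFailureAction: Enforce
  webhookTimeoutSeconds: 30
  rules:
    - name: require-ci-signature
      match:
        any:
          - resources:
              kinds: ["Pod"]
      verifyImages:
        - imageReferences:
            - "registry.internal/myapp:*"
          attestors:
            - entries:
                - keyless:
                    issuer: "https://gitea.internal"                   # hypothetical issuer
                    subject: "https://gitea.internal/org/app-myapp/*"  # hypothetical CI identity
                    rekor:
                      url: https://rekor.sigstore.dev
```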
Registry hardening completes the picture: authenticated pulls, CI-only pushes via per-workflow identities, immutable tags enforced where the registry supports it.
Emergency procedures
The routine path needs no humans; these are for when it does.
Pause promotion: stop Kargo from moving anything further.
Roll back an environment: point it at the last good artifact. Either through Kargo (preferred, keeps Freight state consistent), by promoting the last known-good Freight with kargo promote --stage prod --freight <name>,
or by hand: edit spec.ref.tag in clusters/prod/myapp.yaml to the known-good version, commit (signed, through the PR process), and pause the stage so Kargo doesn’t immediately re-promote the bad Freight. Mark the bad Freight failed so it never becomes eligible again.
Freeze everything: suspend Flux reconciliation while you think.
Existing workloads keep running; nothing new is applied.
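Declaratively, the freeze is one field on the Kustomization; the flux CLI's suspend kustomization command sets the same field imperatively. The resource name and namespace below are assumptions.

```yaml
# Sketch: suspended Kustomization (other spec fields omitted, apply as a patch)
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: myapp
  namespace: flux-system
spec:
  suspend: true        # reconciliation stops; already-applied objects are left untouched
```

If the Kustomization is itself managed from clusters/, the field has to be set in Git as well, otherwise the bootstrap layer may set it back on its next reconcile.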
Break-glass direct fix — if you must kubectl a hotfix: suspend the Kustomization first (or Flux will revert you within minutes), apply the fix, open a tracking issue, and reconcile Git/artifact state with the cluster within the same incident. Drift between the cluster and its declared artifact is the most common source of GitOps post-mortems. Note that admission control still applies — a hotfix image must carry a valid layer-1 signature, which is exactly the safety net working as intended.
The self-upgrade caveat. Kargo cannot promote its own upgrade (circular), and a broken Flux can’t fix Flux. Platform tooling (Flux, Kargo, Renovate) is updated by the platform team via reviewed PRs with the 7-day soak from the Renovate config above — manually, one environment at a time, outside the artifact pipeline.
Need to know
- Prod-only overlay tweaks take the long road. A replica-count change for prod still builds an artifact and walks through dev and staging verification before the prod pointer moves. Deliberate, but it adds latency to trivial changes.
- The registry is now availability-critical for deploys. Flux can’t reconcile a new artifact if the registry is down (running workloads are unaffected). Treat the registry like the tier-0 service it already was for images.
- Kargo-with-Flux lacks the direct-sync integration Kargo has with Argo CD. Promotion latency includes Flux’s pull interval (mitigate with short interval on OCIRepository, or notification-controller webhooks).
- Stateful workloads still need expand-contract discipline. The artifact promotes manifests; nobody rolls back a schema. Migrations run as Jobs ordered via dependsOn or Helm hooks, in backwards-compatible phases, tested against realistic data in staging.
- Cluster-scoped cross-cutting changes (CRD installs, Pod Security Standard bumps) don’t fit a per-service artifact and are promoted manually, one environment at a time.
None of these are dealbreakers; all of them are things to know before you commit.
Summary
Change enters through Renovate: a new app image, a chart bump, a dependency update — each becomes a reviewed (or auto-merged) PR against the deployment definition in Git. Every merge produces a signed, immutable OCI artifact containing the full deployment tree with the image version baked in. Kargo treats that artifact as Freight and walks it through dev → staging → prod, bumping one ref.tag pointer per environment and running the test suites as verification at each gate, with prod going through a PR. Flux pulls the artifact per environment, verifies its signature before reconciling, and admission control re-verifies the application image at pod start.
Git remains the auditable control plane — but what runs in your clusters is never “a directory at a commit.” It’s a digest that was signed twice, tested at every stage, and promoted on purpose.