| author | Paul Buetow <paul@buetow.org> | 2026-01-15 22:07:42 +0200 |
|---|---|---|
| committer | Paul Buetow <paul@buetow.org> | 2026-01-15 22:07:42 +0200 |
| commit | 4da8aedecb39664fa38697767ef91ed8a3adffad (patch) | |
| tree | 1e3f8c016625fd68d941669d4448a56863ddaf50 /f3s | |
| parent | 0c451d4dc6509b71443645d97bf91dd9cd2e2773 (diff) | |
cleanup
Diffstat (limited to 'f3s')
| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | f3s/tracing-demo/ARGO-ROLLOUTS-SUMMARY.md | 248 |
| -rw-r--r-- | f3s/tracing-demo/DEMO-SCRIPTS.md | 292 |
| -rw-r--r-- | f3s/tracing-demo/README-ROLLOUTS.md | 226 |
| -rw-r--r-- | f3s/tracing-demo/ROLLOUTS-CHECKLIST.md | 222 |
| -rw-r--r-- | f3s/tracing-demo/ROLLOUTS-DEMO.md | 444 |
| -rw-r--r-- | f3s/tracing-demo/ROLLOUTS-FILE-TREE.txt | 183 |
| -rw-r--r-- | f3s/tracing-demo/ROLLOUTS-SETUP.md | 373 |
7 files changed, 0 insertions, 1988 deletions
diff --git a/f3s/tracing-demo/ARGO-ROLLOUTS-SUMMARY.md b/f3s/tracing-demo/ARGO-ROLLOUTS-SUMMARY.md deleted file mode 100644 index 80adc23..0000000 --- a/f3s/tracing-demo/ARGO-ROLLOUTS-SUMMARY.md +++ /dev/null @@ -1,248 +0,0 @@ -# Argo Rollouts Implementation Summary - -## What Was Created - -### 1. Argo Rollouts Controller Installation -**Location**: `/home/paul/git/conf/f3s/argo-rollouts/` - -Files: -- `Justfile` - Installation automation -- `values.yaml` - Helm configuration -- `README.md` - Installation guide - -Deployment: -```bash -cd /home/paul/git/conf/f3s/argo-rollouts -just install -``` - -Also registered in ArgoCD: `/home/paul/git/conf/f3s/argocd-apps/cicd/argo-rollouts.yaml` - -### 2. Frontend Rollout Manifest -**Location**: `/home/paul/git/conf/f3s/tracing-demo/helm-chart/templates/frontend-rollout.yaml` - -**Replaces**: `frontend-deployment.yaml` (kept for reference) - -**Strategy**: Canary with 1-minute observation window -``` -Step 1: 33% traffic to new version (1 new pod, 3 old pods) -Step 2: Pause 1 minute (observation period) -Step 3: 100% traffic to new version (auto-promote) -``` - -**Why Frontend?** -- Has 2 replicas (good for canary demo) -- User-facing (can observe behavior easily) -- Generates traces (can monitor impact) -- Non-critical for cluster health - -### 3. Demo Documentation - -**`/home/paul/git/conf/f3s/tracing-demo/ROLLOUTS-DEMO.md`** -- Comprehensive walkthrough -- Real-time monitoring commands -- Troubleshooting guide -- Advanced patterns - -**`/home/paul/git/conf/f3s/ROLLOUTS-SETUP.md`** -- Quick setup instructions -- 5 demo scenarios (basic, manual, abort, prometheus, gitops) -- Expected output and timings -- Monitoring dashboard examples - -**`/home/paul/git/conf/f3s/tracing-demo/rollout-demo.sh`** -- Automated demo starter script -- Checks prerequisites -- Provides instructions - -### 4. 
Enhanced Justfile Commands -**Location**: `/home/paul/git/conf/f3s/tracing-demo/Justfile` - -New commands: -```bash -just rollout-watch # Watch progress in real-time -just rollout-status # Check current status -just rollout-info # Detailed information -just rollout-promote # Skip waiting, promote to 100% -just rollout-abort # Abort current rollout -just rollout-history # View past rollouts -just rollout-demo # Start demo script -``` - -### 5. Updated ArgoCD Application -**Location**: `/home/paul/git/conf/f3s/argocd-apps/services/tracing-demo.yaml` - -Added sync option: `RespectIgnoreDifferences=true` to gracefully handle migration from Deployment to Rollout. - -## Architecture - -``` -┌─────────────────────────────────────────┐ -│ Kubernetes Cluster │ -├─────────────────────────────────────────┤ -│ │ -│ ┌──────────────────┐ │ -│ │ ArgoCD (cicd) │ │ -│ └────────┬─────────┘ │ -│ │ │ -│ └──→ Git Repository │ -│ (conf.git) │ -│ │ -│ ┌──────────────────────────────────┐ │ -│ │ Argo Rollouts Controller (cicd) │ │ -│ │ - Manages Rollout resources │ │ -│ │ - Orchestrates canary │ │ -│ │ - Monitors replica sets │ │ -│ └──────────────────────────────────┘ │ -│ ▲ │ -│ │ watches │ -│ │ │ -│ ┌────────────────────────────────────┐ │ -│ │ tracing-demo-frontend Rollout │ │ -│ │ ┌──────────────┐ ┌──────────────┐│ │ -│ │ │ Stable RS │ │ Canary RS ││ │ -│ │ │ 3 replicas │ │ 1 replica ││ │ -│ │ └──────────────┘ └──────────────┘│ │ -│ │ │ │ -│ │ Endpoints: frontend-service │ │ -│ │ - Selects both RS (proportional) │ │ -│ │ - Routes traffic to 67%/33% │ │ -│ └────────────────────────────────────┘ │ -│ │ -│ ┌──────────────────┐ │ -│ │ Middleware │ ┌──────────────┐│ -│ │ Backend │ │ Deployment ││ -│ │ (unchanged) │ │ (unchanged) ││ -│ └──────────────────┘ └──────────────┘│ -│ │ -└─────────────────────────────────────────┘ - Monitoring (Prometheus/Grafana) -``` - -## Key Differences: Deployment vs Rollout - -| Aspect | Deployment | Rollout | -|--------|------------|---------| -| **Update 
Strategy** | RollingUpdate (all or nothing) | Canary, Blue-Green, A/B | -| **Traffic Split** | No built-in support | Native pod-level splitting | -| **Pause/Resume** | No | Yes (at canary steps) | -| **Automatic Rollback** | No (manual `rollout undo`) | Yes (if health checks fail) | -| **Visibility** | kubectl rollout status | kubectl argo rollouts get --watch | -| **Observability** | Basic pod counts | Detailed step information | - -## How It Works - -### Normal Deployment (Traditional) -``` -kubectl apply → All pods immediately scale up/down -Old pods: 2 → 0 -New pods: 0 → 2 -Users affected: ~5 seconds of traffic loss risk -``` - -### Canary Rollout (New) -``` -Git commit → ArgoCD detects → Argo Rollouts orchestrates - -Step 1 (50% traffic): - Stable: 2 pods → 1 pod (old version) - Canary: 0 pods → 1 pod (new version) - Users see: 50% old, 50% new for 0-2 minutes - -Step 2 (Pause): - Stable: 1 pod (old) - Canary: 1 pod (new) - Observe metrics, logs, error rates for 2 minutes - -Step 3 (100% traffic): - Stable: 1 → 0 pods (old version terminated) - Canary: 1 → 2 pods (new version scales up) - Users see: 100% new version - - Complete: Canary promoted to stable -``` - -## Demo Quick Start - -### 1. Install Everything -```bash -cd /home/paul/git/conf/f3s -# Sync with ArgoCD (auto or manual) -argocd app sync argo-rollouts -argocd app sync tracing-demo -``` - -### 2. Verify Setup -```bash -cd /home/paul/git/conf/f3s/tracing-demo -just rollout-status -# Should show: Rollout is healthy -``` - -### 3. Run Demo -```bash -# Terminal 1: Watch rollout -just rollout-watch - -# Terminal 2: Trigger rollout (modify git or patch) -kubectl patch rollout tracing-demo-frontend -n services \ - --type='json' \ - -p='[{"op":"replace","path":"/spec/template/spec/containers/0/image","value":"registry.lan.buetow.org:30001/tracing-demo-frontend:latest"}]' -``` - -### 4. 
Observe -- See canary step progress in Terminal 1 -- Optional: `just load-test` to generate traffic during rollout -- After ~4 minutes: Rollout complete, 100% traffic to new version - -## Files Summary - -| Path | Purpose | -|------|---------| -| `argo-rollouts/Justfile` | Install/upgrade/check Argo Rollouts | -| `argo-rollouts/values.yaml` | Helm configuration for controller | -| `argo-rollouts/README.md` | Installation and basic usage | -| `tracing-demo/helm-chart/templates/frontend-rollout.yaml` | Canary rollout definition | -| `tracing-demo/Justfile` | Added `just rollout-*` commands | -| `tracing-demo/ROLLOUTS-DEMO.md` | Detailed walkthrough | -| `tracing-demo/rollout-demo.sh` | Demo starter script | -| `argocd-apps/cicd/argo-rollouts.yaml` | ArgoCD Application for controller | -| `argocd-apps/services/tracing-demo.yaml` | Updated to work with Rollout | -| `ROLLOUTS-SETUP.md` | Complete setup guide with scenarios | -| `ARGO-ROLLOUTS-SUMMARY.md` | This file | - -## Next Steps - -1. **Install controller**: `cd argo-rollouts && just install` -2. **Wait for ArgoCD sync** or manually sync `argo-rollouts` and `tracing-demo` apps -3. **Verify**: `just rollout-status` shows healthy -4. **Run demo**: `just rollout-watch` + trigger in another terminal -5. 
**Explore**: Try abort, promote, or different canary durations - -## Important Notes - -- **No service mesh required**: Uses native Kubernetes service-based routing -- **Traffic splitting**: Proportional to pod counts (1 old, 1 new = 50/50) -- **Auto-promotion**: After 2 minutes, canary automatically promotes to 100% -- **Graceful**: ArgoCD correctly handles transition from Deployment → Rollout -- **Reversible**: Can abort and keep old version running - -## Limitations & Future Work - -**Current (Basic Canary)**: -- Simple replica-based traffic splitting -- No header-based routing -- No advanced health checks - -**To Add** (Optional): -- **Istio integration**: For precise % traffic splitting, header-based routing -- **Flagger**: Automated canary analysis with Prometheus thresholds -- **Linkerd**: For distributed tracing and observability -- **Longer observation**: Change `pause: duration: 2m` to `5m` or `10m` - -## Questions? - -See: -- `/home/paul/git/conf/f3s/ROLLOUTS-SETUP.md` - Complete setup & scenarios -- `/home/paul/git/conf/f3s/tracing-demo/ROLLOUTS-DEMO.md` - Detailed walkthrough -- `/home/paul/git/conf/f3s/argo-rollouts/README.md` - Controller-specific info diff --git a/f3s/tracing-demo/DEMO-SCRIPTS.md b/f3s/tracing-demo/DEMO-SCRIPTS.md deleted file mode 100644 index c869a2e..0000000 --- a/f3s/tracing-demo/DEMO-SCRIPTS.md +++ /dev/null @@ -1,292 +0,0 @@ -# Argo Rollouts Demo Scripts - -Automated scripts to demonstrate Argo Rollouts canary deployments. - -## Quick Start - -```bash -cd /home/paul/git/conf/f3s/tracing-demo - -# Interactive menu (easiest) -just demo-menu - -# Or run specific demos -just demo-canary # Full canary rollout (90s) -just demo-abort # Test abort/rollback -just demo-reset # Clean up between demos -``` - -## Scripts - -### demo-canary-rollout.sh - -Full automated canary deployment demo. - -**What it does:** -1. Checks prerequisites (controller, rollout, plugin) -2. Shows current state -3. Triggers rollout by adding env var -4. 
Monitors progress in real-time (~90 seconds) -5. Shows final state - -**Timeline:** -``` -0-15s: Canary pod launching (Step 0/3, SetWeight 33%) -15-60s: Observing canary (Step 1/3, paused) -60-90s: Auto-promoting (Step 2/3, SetWeight 100%) -~90s: Complete (Status Healthy) -``` - -**Run:** -```bash -./demo-canary-rollout.sh -# or -just demo-canary -``` - -**Expected output:** -``` -=== Checking Prerequisites === -ℹ Cluster: k3s -✓ Argo Rollouts controller running -✓ Rollout tracing-demo-frontend found -✓ kubectl argo rollouts plugin available - -=== Current Rollout State === -Healthy - -=== Triggering Canary Rollout === -✓ Rollout triggered (v=1768504739) - -=== Monitoring Rollout Progress === -[01:30s] Healthy | Step 3/3 | Weight 100% | Replicas: 3 (updated:3 ready:3) - -=== Demo Summary === -✓ Demo complete! -``` - -### demo-abort-rollout.sh - -Demonstrates aborting a rollout mid-canary. - -**What it does:** -1. Triggers a new canary rollout -2. Waits 20 seconds for canary pod to be ready -3. Aborts the rollout -4. Shows that old version continues running - -**Timeline:** -``` -0-5s: Canary pod launching -5-20s: Waiting for ready -20s: Abort issued -~20s+: Canary pods terminated, old pods continue -``` - -**Run:** -```bash -./demo-abort-rollout.sh -# or -just demo-abort -``` - -**Shows:** -- Canary starting normally -- Mid-rollout abort is safe -- Old pods never interrupted -- Zero downtime - -### demo-reset.sh - -Resets rollout to clean state between demos. - -**What it does:** -1. Aborts any in-progress rollout -2. Removes demo env vars -3. Waits for stabilization -4. Returns to clean state - -**Run:** -```bash -./demo-reset.sh -# or -just demo-reset -``` - -Use between demo runs to avoid env var accumulation. - -### demo-menu.sh - -Interactive menu for choosing demos. 
- -**Features:** -- Select demo scenario -- Check rollout status -- Watch live updates -- Exit cleanly - -**Run:** -```bash -./demo-menu.sh -# or -just demo-menu -``` - -**Options:** -``` -1) Run full canary rollout demo (~90s) -2) Abort rollout demo (~20s) -3) Reset rollout -4) Check status -5) Watch live (real-time) -0) Exit -``` - -## Usage Examples - -### First Time - Full Demo - -```bash -cd /home/paul/git/conf/f3s/tracing-demo -just demo-menu - -# Select option 1: Run full canary rollout demo -# Watch it progress from canary → observe → promote -``` - -### Test Abort Behavior - -```bash -just demo-menu - -# Select option 2: Abort rollout demo -# See canary start, then abort mid-rollout -``` - -### Run Full Sequence - -```bash -# Demo 1: Canary rollout -just demo-canary - -# Clean up -just demo-reset - -# Demo 2: Abort behavior -just demo-abort - -# Clean up -just demo-reset - -# Check final state -just rollout-status -``` - -### Watch Live (No Automation) - -```bash -# Start in one terminal -just demo-menu -# Select 5: Watch live - -# In another terminal, trigger manually -kubectl patch rollout tracing-demo-frontend -n services \ - --type='json' \ - -p='[{"op":"add","path":"/spec/template/spec/containers/0/env/-","value":{"name":"ROLLOUT_V","value":"'$(date +%s)'"}}]' -``` - -## Requirements - -- kubectl configured for f3s cluster -- Argo Rollouts controller installed (`cd argo-rollouts && just install`) -- kubectl argo rollouts plugin installed -- jq (for parsing JSON) - -## Troubleshooting - -### "Argo Rollouts controller not found" - -Install it: -```bash -cd /home/paul/git/conf/f3s/argo-rollouts -just install -``` - -### "Rollout not found" - -Apply the rollout: -```bash -kubectl apply -f helm-chart/templates/frontend-rollout.yaml -``` - -### "Plugin not installed" - -Install it: -```bash -curl -LO https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-linux-amd64 -sudo install -m 755 kubectl-argo-rollouts-linux-amd64 
/usr/local/bin/kubectl-argo-rollouts -``` - -### Rollout stuck / loops - -This shouldn't happen with ArgoCD ignoreDifferences configured. Check: -```bash -kubectl get application tracing-demo -n cicd -o yaml | grep -A 10 ignoreDifferences -``` - -If ArgoCD is reverting patches, disable auto-sync: -```bash -kubectl patch application tracing-demo -n cicd \ - --type='json' \ - -p='[{"op":"replace","path":"/spec/syncPolicy/automated","value":null}]' -``` - -Then re-enable after demo: -```bash -kubectl patch application tracing-demo -n cicd \ - --type='json' \ - -p='[{"op":"replace","path":"/spec/syncPolicy/automated","value":{"prune":true,"selfHeal":true}}]' -``` - -## Advanced - -### Manual Triggers (Without Scripts) - -If you want to trigger rollouts manually: - -```bash -# Trigger with env var (used by scripts) -kubectl patch rollout tracing-demo-frontend -n services \ - --type='json' \ - -p='[{"op":"add","path":"/spec/template/spec/containers/0/env/-","value":{"name":"ROLLOUT_V","value":"'$(date +%s)'"}}]' - -# Watch progress -kubectl argo rollouts get rollout tracing-demo-frontend -n services --watch - -# Promote early (skip waiting) -kubectl argo rollouts promote tracing-demo-frontend -n services - -# Abort rollout -kubectl argo rollouts abort tracing-demo-frontend -n services -``` - -### Modify Canary Settings - -Edit `/home/paul/git/conf/f3s/tracing-demo/helm-chart/templates/frontend-rollout.yaml` to change: -- `duration: 1m` → longer observation time -- `setWeight: 33` → different traffic percentage -- `replicas: 3` → more/fewer pods - -Then commit and apply: -```bash -git add -A && git commit -m "chore: adjust canary" -git push r0 master -kubectl annotate application tracing-demo -n cicd argocd.argoproj.io/refresh=normal --overwrite -``` - -## See Also - -- `ROLLOUTS-DEMO.md` - Technical details -- `ROLLOUTS-SETUP.md` - Setup guide with 5 scenarios -- `README-ROLLOUTS.md` - Quick reference -- `ARGO-ROLLOUTS-SUMMARY.md` - Architecture overview diff --git 
a/f3s/tracing-demo/README-ROLLOUTS.md b/f3s/tracing-demo/README-ROLLOUTS.md deleted file mode 100644 index b038bf9..0000000 --- a/f3s/tracing-demo/README-ROLLOUTS.md +++ /dev/null @@ -1,226 +0,0 @@ -# Argo Rollouts - Quick Reference - -Progressive delivery (canary deployments) for the f3s cluster. - -## TL;DR - Get Started in 5 Minutes - -```bash -# 1. Install controller -cd /home/paul/git/conf/f3s/argo-rollouts -just install - -# 2. Wait for ArgoCD sync (or force) -argocd app sync argo-rollouts -argocd app sync tracing-demo - -# 3. Verify setup -cd /home/paul/git/conf/f3s/tracing-demo -just rollout-status - -# 4. Run a demo (Terminal 1) -just rollout-watch - -# 5. Trigger in another terminal (Terminal 2) -kubectl patch rollout tracing-demo-frontend -n services \ - --type='json' \ - -p='[{"op":"add","path":"/spec/template/spec/containers/0/env/-","value":{"name":"ROLLOUT_V","value":"'$(date +%s)'"}}]' - -# 6. Watch progress in Terminal 1 (~90 seconds total) -``` - -Expected flow: -- 0-15 sec: **33% traffic** to canary (1 new pod, 3 old pods) -- 15-60 sec: **Monitor** (paused, observing canary health) -- 60+ sec: **Auto-promote to 100%** (scales all 3 pods to new version) -- ~90 sec: **Complete** (all 3 pods running new version) - -## Files Created - -### Setup & Installation -- `argo-rollouts/Justfile` - Install/manage controller -- `argo-rollouts/values.yaml` - Helm config -- `argocd-apps/cicd/argo-rollouts.yaml` - ArgoCD app - -### Demo App Configuration -- `tracing-demo/helm-chart/templates/frontend-rollout.yaml` - Canary definition -- `tracing-demo/Justfile` - New `just rollout-*` commands -- `tracing-demo/rollout-demo.sh` - Demo automation script - -### Documentation -- `ARGO-ROLLOUTS-SUMMARY.md` - **START HERE** - Full overview -- `ROLLOUTS-SETUP.md` - **DETAILED GUIDE** - 5 demo scenarios -- `ROLLOUTS-CHECKLIST.md` - **DEPLOYMENT CHECKLIST** - Step-by-step -- `tracing-demo/ROLLOUTS-DEMO.md` - Technical walkthrough -- `README-ROLLOUTS.md` - This file - -## 
Why Canary Deployments? - -**Old way (Deployment)**: -- 2 old pods → removed -- 2 new pods → created -- ~5 seconds of potential traffic loss -- No way to validate before 100% rollout - -**New way (Rollout with Canary)**: -- 3 old pods → 3 old + 1 new (33% traffic to canary) -- Observe for 1 minute -- If healthy → automatically promote all 3 pods to new version -- If unhealthy → abort, revert to 3 old pods -- Zero downtime, validated before full rollout - -## Common Commands - -```bash -cd /home/paul/git/conf/f3s/tracing-demo - -# Watch rollout progress (real-time) -just rollout-watch - -# Check current status -just rollout-status - -# Detailed info -just rollout-info - -# Abort and rollback (prevents auto-promotion) -just rollout-abort - -# View history -just rollout-history - -# Generate load during rollout -just load-test -``` - -## What Happens During Canary - -### Step 1: 33% Traffic (0-15 seconds) -``` -Frontend Service -├── Stable ReplicaSet (old version): 3 pods → receives 67% traffic -└── Canary ReplicaSet (new version): 1 pod → receives 33% traffic -``` - -Monitor during this phase: -- Error rates -- Response latency -- Logs and traces -- Prometheus metrics - -### Step 2: Pause (15-60 seconds) -``` -Service pauses traffic shift, monitoring canary health: -- Auto-promotion after 1 minute if healthy -- Or abort: kubectl argo rollouts abort ... 
to stop -``` - -### Step 3: 100% Traffic (60+ seconds) -``` -Frontend Service -├── Stable ReplicaSet (new version): 3 pods → receives 100% traffic -└── Canary ReplicaSet (old version): 0 pods → terminated -``` - -## Architecture - -``` -Git Commit (new image) - ↓ -Git Server (conf.git) - ↓ -ArgoCD detects change - ↓ -Updates Rollout resource - ↓ -Argo Rollouts Controller - ↓ - ├─→ Scales Canary ReplicaSet (1 new pod) - ├─→ Frontend Service routes 33/67 traffic - ├─→ Monitors health/metrics for 1 minute - └─→ Auto-promotes if healthy - ├─→ If healthy: Scale to 3 new, remove old - └─→ If abort: Remove canary, keep 3 old -``` - -## Demo Scenarios - -See `ROLLOUTS-SETUP.md` for complete walkthrough of: - -1. **Basic Canary** - Watch 50% → 100% progression -2. **Manual Promotion** - Skip waiting with `just rollout-promote` -3. **Abort/Rollback** - Fail canary and revert -4. **Prometheus Monitoring** - Track metrics during rollout -5. **GitOps Flow** - Commit code, watch auto-rollout - -## Monitoring - -### Command-line -```bash -# Real-time watch -kubectl argo rollouts get rollout tracing-demo-frontend -n services --watch - -# Check metrics -kubectl top pods -n services -l app=tracing-demo-frontend -``` - -### Grafana -https://grafana.f3s.buetow.org - -1. Explore → Tempo -2. Query: `{ resource.service.name = "frontend" }` -3. 
See traces from old and new versions - -### Prometheus -```bash -# Port-forward -kubectl port-forward -n monitoring svc/prometheus 9090:9090 -# Open http://localhost:9090 - -# Query pod status -kube_pod_status_phase{namespace="services", pod=~".*frontend.*"} -``` - -## Troubleshooting - -**Controller not running?** -```bash -kubectl get pods -n cicd -l app.kubernetes.io/name=argo-rollouts -kubectl logs -n cicd -l app.kubernetes.io/name=argo-rollouts -``` - -**Rollout stuck?** -```bash -kubectl describe rollout tracing-demo-frontend -n services -kubectl get pods -n services -l app=tracing-demo-frontend -``` - -**Need plugin?** -```bash -curl -LO https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-linux-amd64 -sudo install -m 755 kubectl-argo-rollouts-linux-amd64 /usr/local/bin/kubectl-argo-rollouts -``` - -## Next Steps - -1. Complete setup using `ROLLOUTS-CHECKLIST.md` -2. Run demo scenarios from `ROLLOUTS-SETUP.md` -3. Share with team -4. Optional: Add Istio for advanced traffic routing -5. Optional: Deploy Flagger for automated analysis -6. 
Migrate other services to Rollout - -## Key Resources - -| File | Purpose | -|------|---------| -| `ARGO-ROLLOUTS-SUMMARY.md` | Architecture & what was created | -| `ROLLOUTS-SETUP.md` | Complete setup & 5 demo scenarios | -| `ROLLOUTS-CHECKLIST.md` | Step-by-step deployment | -| `tracing-demo/ROLLOUTS-DEMO.md` | Technical details & troubleshooting | -| `argo-rollouts/README.md` | Controller installation guide | - -## Support - -- Argo Rollouts Docs: https://argoproj.github.io/argo-rollouts/ -- Canary Strategy: https://argoproj.github.io/argo-rollouts/features/canary/ -- Kubectl Plugin: https://argoproj.github.io/argo-rollouts/getting-started/#using-kubectl-with-argo-rollouts diff --git a/f3s/tracing-demo/ROLLOUTS-CHECKLIST.md b/f3s/tracing-demo/ROLLOUTS-CHECKLIST.md deleted file mode 100644 index b475f2d..0000000 --- a/f3s/tracing-demo/ROLLOUTS-CHECKLIST.md +++ /dev/null @@ -1,222 +0,0 @@ -# Argo Rollouts Deployment Checklist - -Quick checklist for deploying and testing Argo Rollouts with canary demo. 
- -## Installation - -- [ ] Read `ARGO-ROLLOUTS-SUMMARY.md` - understand what was created -- [ ] Ensure kubectl access to f3s cluster -- [ ] Ensure ArgoCD is running -- [ ] Navigate to `/home/paul/git/conf/f3s/argo-rollouts` -- [ ] Run `just install` -- [ ] Verify controller: `kubectl get pods -n cicd -l app.kubernetes.io/name=argo-rollouts` -- [ ] Verify CRD: `kubectl get crd | grep rollout` -- [ ] (Optional) Install plugin: - ```bash - curl -LO https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-linux-amd64 - chmod +x kubectl-argo-rollouts-linux-amd64 - sudo install -m 755 kubectl-argo-rollouts-linux-amd64 /usr/local/bin/kubectl-argo-rollouts - kubectl argo rollouts version - ``` - -## ArgoCD Integration - -- [ ] Push changes to git-server: - ```bash - cd /home/paul/git/conf/f3s - git add -A && git commit -m "feat: add Argo Rollouts" - git push r0 master - ``` -- [ ] Verify ArgoCD app: - ```bash - kubectl get application argo-rollouts -n cicd - argocd app get argo-rollouts - ``` -- [ ] Verify tracing-demo app: - ```bash - kubectl get application tracing-demo -n cicd - argocd app get tracing-demo - ``` - -## Rollout Verification - -- [ ] Check rollout exists: `kubectl get rollout tracing-demo-frontend -n services` -- [ ] Verify status: `kubectl describe rollout tracing-demo-frontend -n services` -- [ ] Expected: `Status: Healthy` with `3/3 replicas` in stable state -- [ ] Check pods: `kubectl get pods -n services -l app=tracing-demo-frontend` -- [ ] All 3 pods should be `Running` - -## Demo: Basic Canary Rollout - -**Expected: 0-15s: canary starting, 15-60s: observing, 60-90s: promoting** - -### Terminal 1: Watch Rollout -```bash -cd /home/paul/git/conf/f3s/tracing-demo -just rollout-watch -``` -- [ ] Command runs and connects to cluster -- [ ] Waiting for rollout to start - -### Terminal 2: Trigger Rollout -```bash -kubectl patch rollout tracing-demo-frontend -n services \ - --type='json' \ - 
-p='[{"op":"add","path":"/spec/template/spec/containers/0/env/-","value":{"name":"ROLLOUT_V","value":"'$(date +%s)'"}}]' -``` -- [ ] Patch command successful -- [ ] Terminal 1 shows change immediately - -### Terminal 1: Observe Progress -- [ ] See `Step: 0/3, SetWeight: 33` -- [ ] 1 canary pod becoming ready -- [ ] 3 stable pods still running -- [ ] After ~15 sec: canary pod ready -- [ ] After ~60 sec: auto-promotion starts -- [ ] After ~90 sec: all 3 pods running new version -- [ ] Status shows `Healthy` - -## Demo: Abort/Rollback - -**Expected: Stop rollout and keep old version running** - -### Terminal 1: Watch Rollout -```bash -just rollout-watch -``` - -### Terminal 2: Trigger Rollout -```bash -kubectl patch rollout tracing-demo-frontend -n services \ - --type='json' \ - -p='[{"op":"add","path":"/spec/template/spec/containers/0/env/-","value":{"name":"ROLLOUT_V2","value":"'$(date +%s)'"}}]' -``` - -### Terminal 3: Abort at Canary Step (after 20 seconds) -```bash -cd /home/paul/git/conf/f3s/tracing-demo -just rollout-abort -``` -- [ ] Abort command accepted -- [ ] Terminal 1 shows `Status: Aborted` -- [ ] Canary pods terminate -- [ ] Old 3 pods continue running -- [ ] Verify with: `just rollout-status` - -## Demo: Load Testing - -**Expected: Generate traffic while rollout happens** - -### Terminal 1: Watch Rollout -```bash -just rollout-watch -``` - -### Terminal 2: Start Load Test -```bash -just load-test & -``` -- [ ] Requests being sent - -### Terminal 3: Trigger Rollout -```bash -kubectl patch rollout tracing-demo-frontend -n services \ - --type='json' \ - -p='[{"op":"add","path":"/spec/template/spec/containers/0/env/-","value":{"name":"ROLLOUT_V3","value":"'$(date +%s)'"}}]' -``` -- [ ] Rollout progresses with active traffic -- [ ] Both old and new pods serve requests during canary phase - -## Monitoring - -- [ ] Check status: `kubectl argo rollouts status tracing-demo-frontend -n services` -- [ ] Detailed info: `kubectl argo rollouts describe rollout 
tracing-demo-frontend -n services` -- [ ] Pod details: `kubectl get pods -n services -l app=tracing-demo-frontend -o wide` -- [ ] View logs: `just logs-frontend` -- [ ] View history: `just rollout-history` - -## Grafana (Optional) - -- [ ] Open Grafana: https://grafana.f3s.buetow.org -- [ ] Navigate to Explore → Tempo datasource -- [ ] Query: `{ resource.service.name = "frontend" }` -- [ ] See traces from old and new versions during rollout - -## Integration with Git (GitOps) - -- [ ] Edit rollout config: - ```bash - nano /home/paul/git/conf/f3s/tracing-demo/helm-chart/templates/frontend-rollout.yaml - ``` -- [ ] Change any settings (e.g., duration, setWeight) -- [ ] Commit and push: - ```bash - git add -A && git commit -m "chore: adjust canary settings" - git push r0 master - ``` -- [ ] ArgoCD auto-syncs within 3 minutes (or force): - ```bash - kubectl annotate application tracing-demo -n cicd argocd.argoproj.io/refresh=normal --overwrite - ``` -- [ ] New settings take effect on next rollout trigger - -## Post-Demo - -- [ ] Abort any stuck rollouts: `just rollout-abort` -- [ ] Verify stable state: `just rollout-status` shows `Healthy` -- [ ] Review documentation: - - [ ] `ARGO-ROLLOUTS-SUMMARY.md` - architecture - - [ ] `ROLLOUTS-SETUP.md` - detailed scenarios - - [ ] `README-ROLLOUTS.md` - quick reference - - [ ] `tracing-demo/ROLLOUTS-DEMO.md` - technical details - -## Troubleshooting - -### Controller not running -```bash -kubectl get pods -n cicd -l app.kubernetes.io/name=argo-rollouts -kubectl logs -n cicd -l app.kubernetes.io/name=argo-rollouts -``` -- [ ] Pod running and ready - -### Rollout not deployed -```bash -kubectl get rollout tracing-demo-frontend -n services -kubectl describe rollout tracing-demo-frontend -n services -``` -- [ ] Check events section for errors - -### Canary pods in ImagePullBackoff -- [ ] Use env var patch instead (don't change image tag): - ```bash - kubectl patch rollout tracing-demo-frontend -n services \ - --type='json' \ - 
-p='[{"op":"add","path":"/spec/template/spec/containers/0/env/-","value":{"name":"ROLLOUT_V","value":"'$(date +%s)'"}}]' - ``` - -### Rollout stuck in Progressing -```bash -kubectl describe rollout tracing-demo-frontend -n services -kubectl get pods -n services -l app=tracing-demo-frontend -``` -- [ ] Check pod readiness probes -- [ ] Check pod resource requests/limits -- [ ] Check controller logs - -## Next Steps - -- [ ] Run through all demo scenarios multiple times -- [ ] Modify rollout settings and observe behavior -- [ ] Monitor with Prometheus/Grafana -- [ ] Extend to other services (middleware, backend) -- [ ] Optional: Install Istio for advanced traffic routing -- [ ] Optional: Deploy Flagger for automated analysis - ---- - -**Setup Complete When:** -- ✅ Controller running in `cicd` namespace -- ✅ Rollout deployed in `services` namespace -- ✅ One full demo executed (0-90 seconds) -- ✅ Can abort and retry -- ✅ Team trained on canary deployments diff --git a/f3s/tracing-demo/ROLLOUTS-DEMO.md b/f3s/tracing-demo/ROLLOUTS-DEMO.md deleted file mode 100644 index 775a581..0000000 --- a/f3s/tracing-demo/ROLLOUTS-DEMO.md +++ /dev/null @@ -1,444 +0,0 @@ -# Argo Rollouts Demo - Technical Details - -Detailed technical walkthrough of Argo Rollouts canary strategy for tracing-demo frontend. 
- -## Quick Demo (90 seconds) - -### Setup - -```bash -# Terminal 1: Watch the rollout -cd /home/paul/git/conf/f3s/tracing-demo -just rollout-watch - -# Terminal 2: Trigger the rollout (after Terminal 1 is watching) -kubectl patch rollout tracing-demo-frontend -n services \ - --type='json' \ - -p='[{"op":"add","path":"/spec/template/spec/containers/0/env/-","value":{"name":"ROLLOUT_V","value":"'$(date +%s)'"}}]' -``` - -### Execution Timeline - -**t=0-15s: Canary Launch** -```bash -# Terminal 1 shows: -Name: tracing-demo-frontend -Status: ◌ Progressing -Strategy: Canary - Step: 0/3 - SetWeight: 33 - ActualWeight: 0 -Images: (new) tracing-demo-frontend (canary) - (old) tracing-demo-frontend (stable) -Replicas: - Desired: 3 - Current: 4 # 3 stable + 1 canary being created - Updated: 1 - Ready: 3 # 3 stable pods ready, canary still starting -``` - -**t=15-60s: Canary Observation** -```bash -# After canary pod becomes ready (~15 seconds) -Status: ◌ Progressing - Step: 1/3 # Now in pause step - SetWeight: 33 - ActualWeight: 33 # Actual weight achieved -Replicas: - Desired: 3 - Current: 4 - Updated: 1 - Ready: 4 # All 4 pods (3 stable + 1 canary) ready - Available: 4 -``` - -Service routes traffic: -- **Old version**: 3 pods → ~67% traffic -- **New version**: 1 pod → ~33% traffic - -**t=60s: Auto-Promotion** -```bash -# After 1 minute pause duration -Status: ◌ Progressing - Step: 2/3 # Now promoting - SetWeight: 100 -Replicas: - Desired: 3 - Current: 4 - Updated: 3 # All 3 new pods created - Ready: 3 # 3 new pods ready - Available: 3 -``` - -Old pods terminate, new pods scale up. 
- -**t=90s: Complete** -```bash -Status: ✔ Healthy - Step: 3/3 # Complete - SetWeight: 100 - ActualWeight: 100 -Replicas: - Desired: 3 - Current: 3 - Updated: 3 - Ready: 3 - Available: 3 -Images: tracing-demo-frontend (stable, new version) -``` - -## Configuration Details - -Location: `/home/paul/git/conf/f3s/tracing-demo/helm-chart/templates/frontend-rollout.yaml` - -```yaml -apiVersion: argoproj.io/v1alpha1 -kind: Rollout -metadata: - name: tracing-demo-frontend - namespace: services -spec: - replicas: 3 # Total desired pods - strategy: - canary: - steps: - # Step 1: Send 33% traffic to new version - - setWeight: 33 - # Step 2: Wait 1 minute, then auto-promote - - pause: - duration: 1m - # Step 3: Promote to 100% traffic - - setWeight: 100 - - selector: - matchLabels: - app: tracing-demo-frontend - template: - # Same pod spec as Deployment - metadata: - labels: - app: tracing-demo-frontend - spec: - containers: - - name: frontend - image: registry.lan.buetow.org:30001/tracing-demo-frontend:latest - # ... rest of container spec -``` - -## ReplicaSet Behavior - -### During Canary (Step 1) - -**Stable ReplicaSet (revision 1)** -- Desired: 3 -- Current: 3 -- Ready: 3 -- Label: `app=tracing-demo-frontend` - -**Canary ReplicaSet (revision 2)** -- Desired: 1 -- Current: 1 -- Ready: 1 (after 10-15 seconds) -- Label: `app=tracing-demo-frontend` - -**Service Routing** -```yaml -selector: - app: tracing-demo-frontend # Selects ALL replicas (both RS) -``` - -Traffic split happens at pod replica level: -- 3 stable pods serve ~67% of requests -- 1 canary pod serves ~33% of requests - -### After Promotion (Step 3) - -**Old ReplicaSet (revision 1)** -- Desired: 0 -- Current: 0 -- Terminating... - -**New ReplicaSet (revision 2)** -- Desired: 3 -- Current: 3 -- Ready: 3 -- Label: `app=tracing-demo-frontend` - -Service now routes 100% to new version (3 pods). 
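The ReplicaSet Behavior section of the removed ROLLOUTS-DEMO.md describes replica-based splitting: the Service selects every pod of both ReplicaSets, so the canary's share of requests follows from pod counts alone. The sketch below works through that arithmetic. It assumes each ready pod receives an equal share of requests, and the helper name `canary_share` is hypothetical. It is not how Argo Rollouts computes its reported weight.

```bash
#!/usr/bin/env bash
# canary_share STABLE_PODS CANARY_PODS
# Prints the integer percentage of requests the canary pods would receive,
# ASSUMING every ready pod behind the Service gets an equal share.
canary_share() {
  local stable=$1 canary=$2
  local total=$((stable + canary))
  if [ "$total" -eq 0 ]; then
    echo 0
    return
  fi
  # Integer division, so the result is truncated (100 / 3 -> 33).
  echo $((canary * 100 / total))
}
```

Under this assumption, `canary_share 3 1` prints 25 and `canary_share 2 1` prints 33. With the 3 stable pods and 1 canary pod listed in that section, an equal per-pod split would therefore give the canary about 25% of requests rather than the ~33% quoted.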
-
-## Monitoring Canary
-
-### kubectl Commands
-
-Real-time progress:
-```bash
-kubectl argo rollouts get rollout tracing-demo-frontend -n services --watch
-```
-
-Detailed status (the Rollout is a custom resource, so plain `kubectl describe` works):
-```bash
-kubectl describe rollout tracing-demo-frontend -n services
-```
-
-History (one ReplicaSet per revision):
-```bash
-kubectl get replicasets -n services -l app=tracing-demo-frontend
-```
-
-### Pod Status
-
-Watch pods during rollout:
-```bash
-kubectl get pods -n services -l app=tracing-demo-frontend -w
-```
-
-See which revision each pod belongs to (Argo Rollouts labels pods with `rollouts-pod-template-hash`):
-```bash
-kubectl get pods -n services -l app=tracing-demo-frontend \
-  -o custom-columns='NAME:.metadata.name,READY:.status.containerStatuses[*].ready,HASH:.metadata.labels.rollouts-pod-template-hash'
-```
-
-### Logs
-
-All pods (old and new):
-```bash
-kubectl logs -n services -l app=tracing-demo-frontend -f
-```
-
-Just canary pod (during step 1):
-```bash
-# Find the newest pod (sort by creation time so the last item is the canary)
-CANARY=$(kubectl get pods -n services -l app=tracing-demo-frontend \
-  --sort-by=.metadata.creationTimestamp -o jsonpath='{.items[-1:].metadata.name}')
-kubectl logs -n services "$CANARY" -f
-```
-
-Just stable pods (old version):
-```bash
-kubectl logs -n services -l app=tracing-demo-frontend,rollouts-pod-template-hash=<old-hash> -f
-```
-
-### Events
-
-Check rollout events:
-```bash
-kubectl describe rollout tracing-demo-frontend -n services | grep -A 20 Events:
-```
-
-Check pod events:
-```bash
-kubectl describe pod -n services -l app=tracing-demo-frontend
-```
-
-## Health Checks During Canary
-
-Container has liveness and readiness probes:
-
-```yaml
-livenessProbe:
-  httpGet:
-    path: /health
-    port: 5000
-  initialDelaySeconds: 10
-  periodSeconds: 10
-
-readinessProbe:
-  httpGet:
-    path: /health
-    port: 5000
-  initialDelaySeconds: 5
-  periodSeconds: 5
-```
-
-Argo Rollouts waits for the readinessProbe to succeed before considering a pod "Ready". Only when the canary pod is Ready does setWeight 33 take effect.
-
-If readiness keeps failing, the canary pod stays at `0/1` Ready until the rollout times out or is aborted manually.
-
-## Traffic Flow
-
-### Without Service Mesh (Current)
-
-Kubernetes Service-based load balancing (requests spread roughly evenly across ready pods):
-
-1. Client sends request
-2. The Service resolves to every ready pod; `kubectl get endpoints frontend-service` returns:
-   ```
-   NAME               ENDPOINTS
-   frontend-service   10.42.1.100,10.42.1.101,10.42.1.102,10.42.2.1
-   ```
-3. Service load-balancer picks a pod randomly
-   - ~75% hit old pods (3 out of 4)
-   - ~25% hit new pod (1 out of 4)
-
-With 4 pods behind the Service, the effective canary share is therefore closer to 25% than to the configured 33%.
-
-### With Service Mesh (Istio/Linkerd)
-
-A service mesh would enable advanced routing:
-- Precise percentage-based splits, independent of pod count
-- Header-based routing (route by user ID, etc.)
-- Gradual step weights (5% → 10% → 25% → 50% → 100%)
-- Automatic rollback on error rate thresholds
-
-## Abort/Rollback
-
-### Abort Current Rollout
-
-```bash
-kubectl argo rollouts abort tracing-demo-frontend -n services
-```
-
-Effect:
-- Canary ReplicaSet scales to 0
-- Old Stable ReplicaSet remains at 3 pods
-- Status: `Degraded` with message "RolloutAborted"
-- Next rollout will use the next revision
-
-### Manual Rollback to Previous Revision
-
-```bash
-kubectl argo rollouts undo tracing-demo-frontend -n services
-```
-
-Or roll back to a specific revision:
-```bash
-kubectl argo rollouts undo tracing-demo-frontend -n services --to-revision=3
-```
-
-## Modifying Rollout Configuration
-
-### Change Pause Duration
-
-```bash
-kubectl patch rollout tracing-demo-frontend -n services \
-  --type='json' \
-  -p='[{"op":"replace","path":"/spec/strategy/canary/steps/1/pause/duration","value":"5m"}]'
-```
-
-### Change Weight
-
-```bash
-kubectl patch rollout tracing-demo-frontend -n services \
-  --type='json' \
-  -p='[{"op":"replace","path":"/spec/strategy/canary/steps/0/setWeight","value":50}]'
-```
-
-### Change Replicas
-
-```bash
-# Custom resources do not support the default strategic merge patch, so use a merge patch
-kubectl patch rollout tracing-demo-frontend -n services \
-  --type='merge' \
-  -p='{"spec":{"replicas":5}}'
-```
-
-Changes to the canary steps take effect on the next rollout; a replica count change is applied right away.
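The patch commands in "Modifying Rollout Configuration" differ only in the JSON Patch string, so the string can be built by a small function. This is a sketch under stated assumptions: `pause_patch` is a hypothetical helper name, and it only prints the patch; applying it still goes through the `kubectl patch ... --type='json'` call already shown.

```shell
# pause_patch STEP_INDEX DURATION: print a JSON Patch that replaces the pause
# duration of one canary step. Hypothetical helper; it builds a string only.
pause_patch() {
  local step duration
  step=$1
  duration=$2
  printf '[{"op":"replace","path":"/spec/strategy/canary/steps/%s/pause/duration","value":"%s"}]\n' \
    "$step" "$duration"
}

# Print the patch for step index 1 (the pause step in this rollout).
pause_patch 1 5m

# To apply it (needs cluster access):
#   kubectl patch rollout tracing-demo-frontend -n services --type='json' -p "$(pause_patch 1 5m)"
```

Printing the patch before applying it makes it easy to confirm that the step index really points at a `pause` step; a `replace` on a path that does not exist is rejected.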
-
-## Troubleshooting
-
-### Canary Pod Won't Start
-
-Check pod events:
-```bash
-kubectl describe pod -n services -l app=tracing-demo-frontend | tail -20
-```
-
-Common issues:
-- **ImagePullBackOff**: Image doesn't exist (trigger the rollout with the env var patch instead of an image change)
-- **Pending**: No resources available (check node capacity)
-- **CrashLoopBackOff**: Application error (check logs)
-
-### Readiness Probe Failing
-
-Canary pod stays in `0/1 Ready`:
-```bash
-kubectl get pods -n services -l app=tracing-demo-frontend
-# Shows: ... 0/1 Running ... (waiting for readiness)
-```
-
-Check probe:
-```bash
-curl http://CANARY_POD_IP:5000/health
-```
-
-Should return 200 OK.
-
-### Rollout Stuck in Progressing
-
-Check status message:
-```bash
-kubectl argo rollouts status tracing-demo-frontend -n services
-# Output: "Progressing - more replicas need to be updated"
-```
-
-Issue: Canary pod not becoming ready within timeout. Abort and retry:
-```bash
-kubectl argo rollouts abort tracing-demo-frontend -n services
-```
-
-### Auto-Promotion Not Happening
-
-Check if pause duration expired:
-```bash
-kubectl argo rollouts get rollout tracing-demo-frontend -n services
-
-# Look for: Step: 1/3 with pause duration elapsed
-```
-
-If stuck at step 1, manually promote:
-```bash
-kubectl argo rollouts promote tracing-demo-frontend -n services
-```
-
-Or abort and retry:
-```bash
-kubectl argo rollouts abort tracing-demo-frontend -n services
-```
-
-## Advanced Topics
-
-### Pre/Post-Promotion Hooks
-
-Canary steps have no script-hook field; checks before or after promotion are expressed as analysis steps (see the next section). The plain step list is:
-```yaml
-strategy:
-  canary:
-    steps:
-    - setWeight: 33
-    - pause:
-        duration: 1m
-    - setWeight: 100
-```
-
-### Analysis and Rollback
-
-Integrate with external metrics (Prometheus, Datadog) to auto-rollback if thresholds are violated. This requires an AnalysisTemplate, which is part of Argo Rollouts; Flagger is a separate tool and is not needed.
-
-### GitOps Workflow
-
-Changes to rollout config in git auto-sync via ArgoCD:
-
-1.
Edit `/home/paul/git/conf/f3s/tracing-demo/helm-chart/templates/frontend-rollout.yaml`
-2. Commit: `git commit -am "chore: adjust canary duration"`
-3. Push: `git push r0 master`
-4. ArgoCD syncs within 3 minutes
-5. Next rollout uses new config
-
-### Multiple Canary Steps
-
-Progressive rollout with multiple weight changes:
-```yaml
-steps:
-- setWeight: 10
-- pause: {duration: 2m}
-- setWeight: 25
-- pause: {duration: 2m}
-- setWeight: 50
-- pause: {duration: 2m}
-- setWeight: 100
-```
-
-Total time: ~6 minutes with gradual traffic increase.
-
-## References
-
-- [Argo Rollouts Canary Feature](https://argoproj.github.io/argo-rollouts/features/canary/)
-- [Rollout Specification](https://argoproj.github.io/argo-rollouts/features/spec/)
-- [kubectl-argo-rollouts Plug-in](https://argoproj.github.io/argo-rollouts/getting-started/#using-kubectl-with-argo-rollouts)
-- [Progressive Delivery Patterns](https://www.weave.works/blog/what-is-progressive-delivery/)
diff --git a/f3s/tracing-demo/ROLLOUTS-FILE-TREE.txt b/f3s/tracing-demo/ROLLOUTS-FILE-TREE.txt
deleted file mode 100644
index 6c85754..0000000
--- a/f3s/tracing-demo/ROLLOUTS-FILE-TREE.txt
+++ /dev/null
@@ -1,183 +0,0 @@
-/home/paul/git/conf/f3s/
-├── README-ROLLOUTS.md                  ← ENTRY POINT (quick reference)
-├── ARGO-ROLLOUTS-SUMMARY.md            ← Full architecture & overview
-├── ROLLOUTS-SETUP.md                   ← Detailed setup + 5 scenarios
-├── ROLLOUTS-CHECKLIST.md               ← Step-by-step deployment
-├── ROLLOUTS-FILE-TREE.txt              ← This file
-│
-├── argo-rollouts/                      ← NEW: Argo Rollouts Controller
-│   ├── Justfile                        ← Install/upgrade/uninstall
-│   ├── values.yaml                     ← Helm configuration
-│   └── README.md                       ← Controller-specific guide
-│
-├── argocd-apps/
-│   ├── cicd/
-│   │   ├── git-server.yaml
-│   │   └── argo-rollouts.yaml          ← NEW: Controller app
-│   │
-│   └── services/
-│       ├── tracing-demo.yaml           ← UPDATED: Deployment → Rollout
-│       └── ...
(other apps) -│ -├── tracing-demo/ -│ ├── README.md -│ ├── Justfile ← UPDATED: Added rollout commands -│ ├── ROLLOUTS-DEMO.md ← NEW: Technical walkthrough -│ ├── rollout-demo.sh ← NEW: Demo automation -│ │ -│ └── helm-chart/ -│ ├── Chart.yaml -│ └── templates/ -│ ├── frontend-rollout.yaml ← NEW: Canary rollout definition -│ ├── frontend-deployment.yaml ← KEPT: For reference -│ ├── middleware-deployment.yaml ← (unchanged) -│ ├── backend-deployment.yaml ← (unchanged) -│ ├── frontend-service.yaml -│ ├── middleware-service.yaml -│ ├── backend-service.yaml -│ └── ingress.yaml -│ -└── ... (other apps unchanged) - - -═══════════════════════════════════════════════════════════════════════════ - -INSTALLATION SUMMARY -═══════════════════════════════════════════════════════════════════════════ - -Step 1: Install Controller - cd /home/paul/git/conf/f3s/argo-rollouts - just install - -Step 2: Verify ArgoCD - argocd app sync argo-rollouts - argocd app sync tracing-demo - -Step 3: Watch Demo - cd /home/paul/git/conf/f3s/tracing-demo - just rollout-watch - -Step 4: Trigger Rollout (in another terminal) - kubectl patch rollout tracing-demo-frontend -n services \ - --type='json' \ - -p='[{"op":"replace","path":"/spec/template/spec/containers/0/image","value":"registry.lan.buetow.org:30001/tracing-demo-frontend:latest"}]' - -═══════════════════════════════════════════════════════════════════════════ - -DOCUMENTATION ROADMAP -═══════════════════════════════════════════════════════════════════════════ - -NEW TO ARGO ROLLOUTS? - 1. Read: README-ROLLOUTS.md (3 min) - 2. Read: ARGO-ROLLOUTS-SUMMARY.md (10 min) - 3. Follow: ROLLOUTS-CHECKLIST.md (step-by-step) - -WANT DETAILED GUIDE? - → ROLLOUTS-SETUP.md - - Complete setup instructions - - 5 demo scenarios with expected output - - Monitoring dashboards - - Advanced patterns - -DOING THE DEPLOYMENT? - → ROLLOUTS-CHECKLIST.md - - Pre-deployment checks - - Installation steps - - Verification - - Troubleshooting - -TROUBLESHOOTING? 
- → ROLLOUTS-SETUP.md → Troubleshooting section - → argo-rollouts/README.md - → tracing-demo/ROLLOUTS-DEMO.md - -═══════════════════════════════════════════════════════════════════════════ - -KEY FILES EXPLAINED -═══════════════════════════════════════════════════════════════════════════ - -argo-rollouts/Justfile - - Automates installation of Argo Rollouts controller - - Commands: install, upgrade, uninstall, status, logs - - Deploys to: cicd namespace - -argo-rollouts/values.yaml - - Helm chart configuration for Argo Rollouts - - Sets resource limits, metrics, replicas - -argocd-apps/cicd/argo-rollouts.yaml - - ArgoCD Application resource - - Manages controller installation via GitOps - - Auto-syncs when argo-rollouts/ changes in git - -tracing-demo/helm-chart/templates/frontend-rollout.yaml - - Replaces frontend-deployment.yaml - - Defines canary strategy: - * Step 1: 50% traffic - * Step 2: 2-minute pause - * Step 3: 100% promotion - - Keeps same pods, volumes, env vars as Deployment - -tracing-demo/Justfile (updated) - - New commands for rollout management - - just rollout-watch - - just rollout-status - - just rollout-promote - - just rollout-abort - - just rollout-history - -tracing-demo/rollout-demo.sh - - Automation script for demo - - Checks prerequisites - - Guides through demo workflow - - Can be extended for CI/CD - -═══════════════════════════════════════════════════════════════════════════ - -WHAT CHANGED IN EXISTING FILES -═══════════════════════════════════════════════════════════════════════════ - -tracing-demo/Justfile - [+] 8 new rollout commands - [-] No breaking changes to existing commands - -tracing-demo/helm-chart/templates/frontend-deployment.yaml - [~] Still exists (for reference, not deployed) - [→] Replaced by frontend-rollout.yaml in deployment - -argocd-apps/services/tracing-demo.yaml - [+] RespectIgnoreDifferences=true sync option - [-] No other changes (points to same Helm chart) - 
-═══════════════════════════════════════════════════════════════════════════ - -WHAT DID NOT CHANGE -═══════════════════════════════════════════════════════════════════════════ - -✓ Middleware & Backend services remain Deployments -✓ All service definitions (frontend, middleware, backend services) -✓ Ingress configuration -✓ All other apps (audiobookshelf, miniflux, etc.) -✓ ArgoCD configuration & installation -✓ Prometheus/Grafana setup - -═══════════════════════════════════════════════════════════════════════════ - -HOW TO NAVIGATE THIS -═══════════════════════════════════════════════════════════════════════════ - -If you want to... See... -──────────────────────────────────────────────────────────────────────────── -Understand what was created ARGO-ROLLOUTS-SUMMARY.md -Get started quickly README-ROLLOUTS.md -Deploy step-by-step ROLLOUTS-CHECKLIST.md -See detailed scenarios & examples ROLLOUTS-SETUP.md -Troubleshoot issues ROLLOUTS-SETUP.md (Troubleshooting section) -Learn technical details tracing-demo/ROLLOUTS-DEMO.md -Install the controller argo-rollouts/Justfile + argo-rollouts/README.md -See the rollout definition tracing-demo/helm-chart/templates/frontend-rollout.yaml -Run a demo tracing-demo/rollout-demo.sh or just rollout-watch -Monitor during rollout Prometheus/Grafana (see ROLLOUTS-SETUP.md) -Integrate with CI/CD See ROLLOUTS-SETUP.md section "GitOps Flow" - -═══════════════════════════════════════════════════════════════════════════ diff --git a/f3s/tracing-demo/ROLLOUTS-SETUP.md b/f3s/tracing-demo/ROLLOUTS-SETUP.md deleted file mode 100644 index 0ea965c..0000000 --- a/f3s/tracing-demo/ROLLOUTS-SETUP.md +++ /dev/null @@ -1,373 +0,0 @@ -# Argo Rollouts Setup and Demo Guide - -Complete setup and demonstration of Argo Rollouts with the tracing-demo application. Canary strategy: 33% traffic (1 pod) for 1 minute, then auto-promote to 100%. - -## Quick Setup - -### 1. 
Install Argo Rollouts Controller - -```bash -cd /home/paul/git/conf/f3s/argo-rollouts -just install -``` - -Verify installation: -```bash -kubectl get pods -n cicd -l app.kubernetes.io/name=argo-rollouts -kubectl get crd | grep rollout -``` - -### 2. Install kubectl Plugin (Optional but Recommended) - -```bash -curl -LO https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-linux-amd64 -chmod +x kubectl-argo-rollouts-linux-amd64 -sudo install -m 755 kubectl-argo-rollouts-linux-amd64 /usr/local/bin/kubectl-argo-rollouts -``` - -Verify: -```bash -kubectl argo rollouts version -``` - -### 3. Sync ArgoCD with New Applications - -```bash -argocd app sync argo-rollouts -argocd app sync tracing-demo -``` - -### 4. Verify Rollout is Deployed - -```bash -kubectl get rollout tracing-demo-frontend -n services -kubectl describe rollout tracing-demo-frontend -n services -``` - -Expected status: `Healthy` with `3/3 replicas` in stable state. - -## Quick Demo (90 seconds) - -### Terminal 1 - Watch Progress - -```bash -cd /home/paul/git/conf/f3s/tracing-demo -just rollout-watch -``` - -Or use the kubectl command directly: -```bash -kubectl argo rollouts get rollout tracing-demo-frontend -n services --watch -``` - -### Terminal 2 - Trigger Rollout - -Wait 10 seconds for Terminal 1 to start watching, then trigger: - -```bash -kubectl patch rollout tracing-demo-frontend -n services \ - --type='json' \ - -p='[{"op":"add","path":"/spec/template/spec/containers/0/env/-","value":{"name":"ROLLOUT_V","value":"'$(date +%s)'"}}]' -``` - -### Watch the Timeline - -**Terminal 1 will show:** - -``` -Step: 0/3 -SetWeight: 33 -Canary: 1 pod (new version) - starting -Stable: 3 pods (old version) - handling requests -``` - -→ After 15 seconds, canary pod becomes ready: - -``` -Step: 1/3 -SetWeight: 33 -Canary: 1 pod (new version) - ready, receiving 33% traffic -Stable: 3 pods (old version) - receiving 67% traffic -``` - -→ After ~60 seconds, auto-promotion begins: 
- -``` -Step: 2/3 -SetWeight: 100 -Canary scaling → Stable -``` - -→ After ~90 seconds, complete: - -``` -Status: Healthy -Replicas: 3/3 all running new version -``` - -## Demo Scenarios - -### Scenario 1: Observe the Full Rollout - -Just follow the "Quick Demo" above. Watch all three steps progress automatically over 90 seconds. - -### Scenario 2: Abort Rollout (Simulate Failure) - -**Terminal 1**: Watch the rollout -```bash -just rollout-watch -``` - -**Terminal 2**: Trigger rollout -```bash -kubectl patch rollout tracing-demo-frontend -n services \ - --type='json' \ - -p='[{"op":"add","path":"/spec/template/spec/containers/0/env/-","value":{"name":"ROLLOUT_V","value":"'$(date +%s)'"}}]' -``` - -**Terminal 3 (while at step 1)**: Abort the rollout -```bash -cd /home/paul/git/conf/f3s/tracing-demo -just rollout-abort -``` - -Result: -- Canary pods terminate -- Old 3 pods continue running -- Status shows "Aborted" - -Verify: -```bash -just rollout-status -``` - -### Scenario 3: Load Testing During Rollout - -**Terminal 1**: Watch rollout -```bash -just rollout-watch -``` - -**Terminal 2**: Start load test -```bash -just load-test & -``` - -**Terminal 3**: Trigger rollout -```bash -kubectl patch rollout tracing-demo-frontend -n services \ - --type='json' \ - -p='[{"op":"add","path":"/spec/template/spec/containers/0/env/-","value":{"name":"ROLLOUT_V","value":"'$(date +%s)'"}}]' -``` - -Load test will hit both old and new pods during the 1-minute canary window. - -### Scenario 4: Check Logs During Rollout - -**Terminal 1**: Watch rollout -```bash -just rollout-watch -``` - -**Terminal 2**: Trigger rollout -```bash -kubectl patch rollout tracing-demo-frontend -n services \ - --type='json' \ - -p='[{"op":"add","path":"/spec/template/spec/containers/0/env/-","value":{"name":"ROLLOUT_V","value":"'$(date +%s)'"}}]' -``` - -**Terminal 3**: Watch logs -```bash -kubectl logs -n services -l app=tracing-demo-frontend -f --tail=20 -``` - -See logs from both old and new pods. 
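Every scenario repeats the same trigger patch, so it can be wrapped once and reused. This is a minimal sketch; `trigger_patch` is a hypothetical helper name, and the fixed value in the example call is only there to make the output reproducible.

```shell
# trigger_patch [VALUE]: print the JSON Patch that appends a ROLLOUT_V env var
# to the first container. VALUE defaults to the current Unix time so each call
# changes the pod template and starts a new rollout.
# Hypothetical helper; it builds a string and does not contact the cluster.
trigger_patch() {
  local value
  value="${1:-$(date +%s)}"
  printf '[{"op":"add","path":"/spec/template/spec/containers/0/env/-","value":{"name":"ROLLOUT_V","value":"%s"}}]\n' \
    "$value"
}

# Example with a fixed value:
trigger_patch 1700000000

# To trigger a rollout (needs cluster access):
#   kubectl patch rollout tracing-demo-frontend -n services --type='json' -p "$(trigger_patch)"
```

One caveat that follows from JSON Patch semantics: an `add` to `/env/-` appends to an existing array, so the container must already define an `env` list, and each trigger appends another `ROLLOUT_V` entry rather than replacing the previous one.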
- -### Scenario 5: Monitor via Grafana Tempo (Distributed Tracing) - -**Terminal 1**: Watch rollout -```bash -just rollout-watch -``` - -**Terminal 2**: Trigger rollout -```bash -kubectl patch rollout tracing-demo-frontend -n services \ - --type='json' \ - -p='[{"op":"add","path":"/spec/template/spec/containers/0/env/-","value":{"name":"ROLLOUT_V","value":"'$(date +%s)'"}}]' -``` - -**Terminal 3**: Open Grafana -1. Navigate to https://grafana.f3s.buetow.org -2. Go to Explore → Select "Tempo" datasource -3. Query: `{ resource.service.name = "frontend" }` -4. See traces from both old and new versions during canary phase - -## Timeline Breakdown - -| Time | Event | Status | -|------|-------|--------| -| 0s | Trigger rollout | Rollout starts | -| 0-5s | Canary pod created | `Step 0/3: SetWeight 33` | -| 5-15s | Canary pod becoming ready | Still not ready | -| 15s | Canary pod ready | `Step 1/3: SetWeight 33, canary ready` | -| 15-60s | Observing canary | Requests split 67/33 (old/new) | -| 60s | Auto-promotion triggered | `Step 2/3: SetWeight 100` | -| 60-70s | Scaling new pods | Canary → Stable | -| 70-80s | Terminating old pods | Old pods scaling down | -| ~90s | Complete | `Status: Healthy, 3/3 replicas` | - -## Monitoring During Rollout - -### kubectl Commands - -Real-time status: -```bash -kubectl argo rollouts get rollout tracing-demo-frontend -n services --watch -``` - -Check specific details: -```bash -kubectl argo rollouts describe rollout tracing-demo-frontend -n services -kubectl argo rollouts history tracing-demo-frontend -n services -``` - -Pod status: -```bash -kubectl get pods -n services -l app=tracing-demo-frontend -o wide -``` - -### Prometheus Metrics - -```bash -# Port-forward Prometheus -kubectl port-forward -n monitoring svc/prometheus 9090:9090 -``` - -Then query: -```promql -# Pod counts during rollout -kube_replicaset_replicas{replicaset=~"tracing-demo-frontend.*"} - -# Pod status -kube_pod_status_phase{namespace="services", 
pod=~"tracing-demo-frontend.*"} - -# Pod age (shows which are old vs new) -time() - kube_pod_created{namespace="services", pod=~"tracing-demo-frontend.*"} -``` - -### Grafana Dashboards - -1. Open Grafana: https://grafana.f3s.buetow.org -2. Explore → Tempo datasource -3. Query: `{ resource.service.name = "frontend" }` -4. See traces from old and new versions -5. Notice latency/error differences during rollout - -## Rollout Configuration - -Located in: `/home/paul/git/conf/f3s/tracing-demo/helm-chart/templates/frontend-rollout.yaml` - -Key settings: -```yaml -replicas: 3 # 3 pods total -strategy: - canary: - steps: - - setWeight: 33 # Send 1 pod (33%) to canary - - pause: - duration: 1m # Wait 1 minute, then auto-promote - - setWeight: 100 # Promote all to new version -``` - -To modify pause duration: -```bash -# Edit the file -nano /home/paul/git/conf/f3s/tracing-demo/helm-chart/templates/frontend-rollout.yaml - -# Change duration: 1m to duration: 5m (for example) -# Then commit and push -git add -A && git commit -m "chore: extend canary pause to 5 minutes" -git push r0 master -``` - -ArgoCD will auto-sync the new rollout configuration. - -## Troubleshooting - -### Rollout shows "ErrImagePull" on canary pod - -This happens if using an image tag that doesn't exist. 
The env var patch approach forces a rollout without changing the image, so use:
-
-```bash
-kubectl patch rollout tracing-demo-frontend -n services \
-  --type='json' \
-  -p='[{"op":"add","path":"/spec/template/spec/containers/0/env/-","value":{"name":"ROLLOUT_V","value":"'$(date +%s)'"}}]'
-```
-
-### Rollout stuck in "Progressing"
-
-Check pod status:
-```bash
-kubectl describe rollout tracing-demo-frontend -n services
-kubectl get pods -n services -l app=tracing-demo-frontend
-```
-
-Check controller logs:
-```bash
-kubectl logs -n cicd -l app.kubernetes.io/name=argo-rollouts --tail=50
-```
-
-### Controller not running
-
-```bash
-kubectl get pods -n cicd -l app.kubernetes.io/name=argo-rollouts
-kubectl logs -n cicd -l app.kubernetes.io/name=argo-rollouts
-```
-
-### Auto-promotion not happening
-
-Verify pause duration is set:
-```bash
-kubectl get rollout tracing-demo-frontend -n services -o yaml | grep -A 5 "pause:"
-```
-
-## Advanced: Modify Canary Parameters
-
-### Increase observation time to 5 minutes
-
-```bash
-# Edit rollout YAML
-nano /home/paul/git/conf/f3s/tracing-demo/helm-chart/templates/frontend-rollout.yaml
-
-# Change:
-# - pause:
-#     duration: 1m
-# To:
-# - pause:
-#     duration: 5m
-
-git add -A && git commit -m "chore: extend canary pause to 5 minutes"
-git push r0 master
-```
-
-### Reduce traffic weight to canary (more conservative)
-
-```yaml
-steps:
-- setWeight: 10    # Only 10% traffic (0.3 pods' worth, so still one whole canary pod)
-- pause:
-    duration: 2m   # Observe longer
-- setWeight: 100
-```
-
-### Add health check analysis (requires an Argo Rollouts AnalysisTemplate)
-
-For automated rollback based on error rate thresholds, see `/home/paul/git/conf/f3s/ROLLOUTS-SETUP.md` → "Advanced: Custom Analysis" section.
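The "0.3 pods worth" remark for `setWeight: 10` is easier to reason about with the rounding spelled out. This is a rough sketch, assuming that without a traffic router the canary pod count is `replicas * weight / 100` rounded up; that rounding rule is an assumption made here and should be checked against the controller's documentation. `canary_pods` is a hypothetical helper name.

```shell
# canary_pods REPLICAS WEIGHT: number of canary pods for a given setWeight,
# computed as ceil(REPLICAS * WEIGHT / 100) using integer arithmetic.
# ASSUMPTION: the controller rounds the canary count up when no traffic
# router is configured. Hypothetical helper for illustration only.
canary_pods() {
  local replicas weight
  replicas=$1
  weight=$2
  # Adding 99 before the integer division by 100 rounds the result up.
  echo $(( (replicas * weight + 99) / 100 ))
}

for w in 10 33 50 100; do
  echo "setWeight $w -> $(canary_pods 3 "$w") canary pod(s) of 3 replicas"
done
```

Under that assumption, both `setWeight: 10` and `setWeight: 33` yield one canary pod with 3 replicas, so a lower weight only changes the real traffic share when the replica count is larger or a traffic router performs the split.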
-
-## References
-
-- [Argo Rollouts Canary Strategy](https://argoproj.github.io/argo-rollouts/features/canary/)
-- [Argo Rollouts Best Practices](https://argoproj.github.io/argo-rollouts/best-practices/)
-- [kubectl-argo-rollouts Plugin](https://argoproj.github.io/argo-rollouts/getting-started/#using-kubectl-with-argo-rollouts)
-- [Flagger for Automated Analysis](https://flagger.app/)
