Age  Commit message  Author
2026-01-09  Run cgit as root - required for nginx and spawn-fcgi  Paul Buetow
cgit image needs root to:
- Bind sockets with spawn-fcgi
- Run nginx master process
- Write to /var/run/nginx.pid

The initContainer already sets up cache dir with proper permissions.
2026-01-09  Set USE_CUSTOM_CONFIG=true to skip cgit template substitution  Paul Buetow
The cgit entrypoint tries to write to /etc/cgitrc which is mounted read-only from our ConfigMap. Set USE_CUSTOM_CONFIG=true to use our custom cgitrc directly without template substitution.
2026-01-09  Fix cgit permissions - use UID 101 (nginx user)  Paul Buetow
The cgit image runs as nginx user (UID 101), not www-data (UID 33).
- Update initContainer to chown cache to 101:1000
- Update cgit securityContext to runAsUser: 101
2026-01-09  Fix permissions using fsGroup and initContainer pattern  Paul Buetow
Follow webdav/filebrowser pattern for proper permission handling:
- Add fsGroup: 1000 at pod level for git repo access
- Add initContainer to chown emptyDir volumes
- Run git-server as root (required for sshd)
- Run cgit as user 33 (www-data)
- Restore cgit-cache emptyDir volume with proper ownership
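The fsGroup plus initContainer pattern from this commit could look roughly like the pod spec below. This is a minimal sketch, not the chart's actual template: the pod name, the initContainer name, the busybox image, and the placeholder container images are assumptions. The UIDs, the fsGroup value, and the cgit-cache emptyDir come from the commit message. It reflects this commit's state, in which cgit still ran as UID 33.

```yaml
# Sketch only: names and images are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: git-server                 # hypothetical name
spec:
  securityContext:
    fsGroup: 1000                  # pod-level group for git repo access
  initContainers:
    - name: fix-permissions        # hypothetical name
      image: busybox               # assumed image; any image with chown works
      command: ["sh", "-c", "chown 33:1000 /var/cache/cgit"]
      securityContext:
        runAsUser: 0               # chown needs root
      volumeMounts:
        - name: cgit-cache
          mountPath: /var/cache/cgit
  containers:
    - name: git-server
      image: example/git-server    # placeholder image
      securityContext:
        runAsUser: 0               # root, required for sshd
    - name: cgit
      image: example/cgit          # placeholder image
      securityContext:
        runAsUser: 33              # www-data, as of this commit
      volumeMounts:
        - name: cgit-cache
          mountPath: /var/cache/cgit
  volumes:
    - name: cgit-cache
      emptyDir: {}
```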
2026-01-09  Run containers as root and use emptyDir for writeable dirs  Paul Buetow
- Mount emptyDir for /etc/ssh to allow SSH host key generation
- Mount emptyDir for /var/cache/cgit to allow cache initialization
- Run both containers as root with proper capabilities
- Copy sshd_config at runtime from /tmp to /etc/ssh
- Add imagePullPolicy: Always to force image refresh
2026-01-09  Fix SSH host keys and container security  Paul Buetow
- Generate SSH host keys at runtime via entrypoint script
- Remove fsGroup security context to fix emptyDir permissions
- Allow cgit to initialize cache directory as root
2026-01-09  Fix sshd_config and cgit permissions  Paul Buetow
- Remove unsupported UsePAM option from sshd_config
- Run cgit as root to allow cache directory initialization
- Add CHOWN and DAC_OVERRIDE capabilities for cgit
2026-01-09  Fix git-server deployment  Paul Buetow
- Use registry.lan.buetow.org for deployment (internal DNS)
- Add emptyDir volume for cgit cache directory
- Add README.md with deployment and secret management instructions

This fixes image pull issues and cgit permission errors.
2026-01-09  Add self-hosted git server with SSH and cgit web UI  Paul Buetow
Deploy a self-hosted git repository solution to replace external Codeberg dependency.

Components:
- SSH git server: Alpine-based container with OpenSSH and git
- cgit web UI: Browse repositories at cgit.f3s.buetow.org
- Single pod design: git-server + cgit containers sharing storage

Infrastructure:
- Docker image in git-server/docker-image/ with Justfile build automation
- Helm chart in git-server/helm-chart/ for Kubernetes deployment
- 5Gi ReadWriteMany PVC for NFS-backed repository storage
- ClusterIP service for ArgoCD internal access
- NodePort 30022 for external SSH push access
- Traefik ingress for cgit web UI

ArgoCD Application manifest deployed to cicd namespace.

Note: SSH keys must be created as Kubernetes secrets manually, not in git.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-09  add cgit  Paul Buetow
2026-01-08  Enable prune for prometheus ArgoCD app, update README  Paul Buetow
2026-01-08  prune  Paul Buetow
2026-01-08  Remove invalid radicale scrape job (no metrics endpoint)  Paul Buetow
2026-01-08  Fix: disable kubeScheduler rules entirely  Paul Buetow
2026-01-08  Disable KubeProxyDown and KubeSchedulerDown alerts for k3s  Paul Buetow
2026-01-08  Add PrometheusHosts to gogios config for f3s cluster alerts  Paul Buetow
Amp-Thread-ID: https://ampcode.com/threads/T-019b9eec-b607-7271-9b75-f05255a60742
Co-authored-by: Amp <amp@ampcode.com>
2026-01-08  add agents  Paul Buetow
2026-01-08  Add Prometheus NodePort and alert query targets to Justfile  Paul Buetow
2026-01-08  Add NodePort service for Prometheus on port 30090  Paul Buetow
2026-01-08  rename  Paul Buetow
2026-01-08  Add volumeName to PVC for explicit binding  Paul Buetow
2026-01-08  Change apache PV/PVC to ReadWriteMany for multi-pod access  Paul Buetow
2026-01-08  Disable kube-proxy and kube-scheduler monitoring for k3s  Paul Buetow
K3s embeds kube-proxy and kube-scheduler functionality into the main k3s server process, unlike standard Kubernetes where they run as separate components.

This change disables monitoring for these components to prevent false-positive critical alerts:
- KubeProxyDown
- KubeSchedulerDown

These alerts were firing because kube-prometheus-stack expects standard Kubernetes architecture with separate kube-proxy and kube-scheduler pods/processes.

Cluster info:
- Running k3s v1.32.6+k3s1
- 3 control-plane nodes (r0, r1, r2)
- Components embedded in k3s binary

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
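In kube-prometheus-stack Helm values, a change like this is usually expressed along the following lines. This is a hedged sketch rather than the repository's actual values file. The `defaultRules.rules` key names are taken from recent chart versions and may differ in the version pinned here.

```yaml
# Sketch of kube-prometheus-stack values for k3s; verify key names
# against the values.yaml of the chart version in use.
kubeProxy:
  enabled: false          # no separate kube-proxy target on k3s
kubeScheduler:
  enabled: false          # scheduler is embedded in the k3s binary
defaultRules:
  rules:
    kubeProxy: false                # drops the KubeProxyDown rule
    kubeSchedulerAlerting: false    # drops the KubeSchedulerDown rule
    kubeSchedulerRecording: false   # drops scheduler recording rules
```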
2026-01-08  Add convenient port-forward targets for Prometheus monitoring  Paul Buetow
Added enhanced port-forward targets with helpful UI information:
- 'just alerts' - Quick access to Prometheus alerts view
- 'just alertmanager' - Quick access to Alertmanager UI
- Enhanced output showing all relevant URLs

All port-forward commands now display:
- Access URLs with direct links to specific views
- Clear instructions for stopping (Ctrl+C)

Usage:
  cd prometheus/
  just alerts                          # Opens Prometheus alerts (port 9090)
  just alertmanager                    # Opens Alertmanager (port 9093)
  just port-forward-prometheus [port]
  just port-forward-grafana [port]

After running, access:
- Prometheus Alerts: http://localhost:9090/alerts
- Alertmanager: http://localhost:9093

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-08  Configure Alertmanager routing for ArgoCD application alerts  Paul Buetow
Added Alertmanager configuration to:
- Route ArgoCD application alerts to dedicated 'argocd-alerts' receiver
- Group ArgoCD alerts by alertname, name (app name), and severity
- Faster alert grouping for ArgoCD (10s wait vs 30s default)
- Repeat ArgoCD alerts every 6 hours
- Suppress Watchdog test alerts
- Configure inhibit rules to prevent alert spam

Alerts are visible in:
- Prometheus UI: http://localhost:9090/alerts
- Alertmanager UI: http://localhost:9093
- Grafana dashboard: ArgoCD Applications - Health & Sync Status

This ensures critical application issues are properly routed and visible in the monitoring UI for immediate action.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
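An Alertmanager configuration with this routing might look roughly as follows. It is a sketch under assumptions: the `default` and `null` receiver names, the `ArgoCD.*` alert-name matcher, and the inhibit rule are illustrative. The group-by labels, the 10s and 30s waits, the 6h repeat interval, and the `argocd-alerts` receiver name come from the commit message.

```yaml
# Sketch only: matchers and the inhibit rule are assumptions.
route:
  receiver: default
  group_by: [alertname]
  group_wait: 30s                      # default wait
  routes:
    - receiver: argocd-alerts
      matchers:
        - 'alertname =~ "ArgoCD.*"'    # assumed naming of the ArgoCD alerts
      group_by: [alertname, name, severity]
      group_wait: 10s                  # faster grouping for ArgoCD
      repeat_interval: 6h
    - receiver: "null"
      matchers:
        - 'alertname = "Watchdog"'     # suppress the Watchdog test alert
inhibit_rules:
  - source_matchers:
      - 'severity = "critical"'
    target_matchers:
      - 'severity = "warning"'
    equal: [alertname, name]           # a critical alert mutes its warning twin
receivers:
  - name: default
  - name: argocd-alerts
  - name: "null"
```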
2026-01-08  Update CLAUDE.md instruction format  Paul Buetow
2026-01-08  Add Grafana dashboard for ArgoCD applications monitoring  Paul Buetow
Created comprehensive Grafana dashboard showing:
- Total applications count
- Healthy vs unhealthy applications
- Out-of-sync status
- Detailed table with all applications and their status
- Health status timeline graph
- Sync operations rate
- Active ArgoCD-related alerts

Dashboard will auto-load in Grafana via ConfigMap with label grafana_dashboard='1'

Access at: https://grafana.f3s.buetow.org → Dashboards → ArgoCD Applications

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-08  Add comprehensive ArgoCD application monitoring and alerts  Paul Buetow
This implements monitoring for ALL services deployed via ArgoCD by leveraging ArgoCD's native Prometheus metrics instead of scraping individual services.

Changes:
- Created ArgoCD application alerts for health and sync status monitoring
- Alert when applications are unhealthy (Degraded, Missing, Unknown, Suspended)
- Alert when applications are out of sync for >10 minutes
- Alert when sync operations are failing repeatedly
- Alert when applications are stuck in Progressing state
- Added recording rules for unhealthy/out-of-sync application counts
- Added radicale health monitoring via scrape config
- Added radicale to additional-scrape-configs for direct health checks
- Monitors radicale web interface availability

Benefits:
- Single monitoring solution for all 21 ArgoCD-managed applications
- Automatic monitoring for new applications added to ArgoCD
- Early detection of configuration drift and deployment issues
- Centralized alerting with actionable remediation steps

Monitored applications include: radicale, registry, alloy, grafana, loki, prometheus, tempo, anki-sync-server, audiobookshelf, filebrowser, immich, keybr, kobo-sync-server, miniflux, opodsync, and more.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
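Alerts of this kind are typically built on ArgoCD's `argocd_app_info` metric, which carries `name`, `health_status`, and `sync_status` labels. The PrometheusRule below is a minimal sketch, not the rules from this commit: the resource name, alert names, the 5m duration, the severities, and the recording rule name are assumptions. The 10-minute out-of-sync window and the list of unhealthy states come from the commit message.

```yaml
# Sketch only: names, severities, and the 5m window are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-application-alerts      # hypothetical name
  namespace: monitoring
spec:
  groups:
    - name: argocd-applications
      rules:
        - alert: ArgoCDAppUnhealthy    # hypothetical alert name
          expr: argocd_app_info{health_status=~"Degraded|Missing|Unknown|Suspended"} == 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "ArgoCD application {{ $labels.name }} is {{ $labels.health_status }}"
        - alert: ArgoCDAppOutOfSync    # hypothetical alert name
          expr: argocd_app_info{sync_status="OutOfSync"} == 1
          for: 10m                     # out of sync for more than 10 minutes
          labels:
            severity: warning
          annotations:
            summary: "ArgoCD application {{ $labels.name }} is out of sync"
        - record: argocd:apps_out_of_sync:count   # hypothetical rule name
          expr: count(argocd_app_info{sync_status="OutOfSync"}) or vector(0)
```

Because the rules select on labels rather than on application names, any application added to ArgoCD later is covered without further changes.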
2026-01-07  Reorganize argocd-apps by namespace for better structure  Paul Buetow
- Create subdirectories: monitoring/, services/, infra/, test/
- Move 6 monitoring apps to monitoring/
- Move 13 service apps to services/
- Move 1 infra app to infra/
- Move 1 test app to test/
- Add README.md documenting the structure and usage

This organization:
- Makes it easier to understand which apps belong to which namespace
- Allows applying apps by namespace: kubectl apply -f argocd-apps/monitoring/
- Supports namespace-scoped app-of-apps patterns
- Provides better clarity when browsing the repository

All 21 applications remain functional and validated with kubectl --dry-run.
2026-01-07  Migrate Grafana-Ingress to ArgoCD GitOps  Paul Buetow
- Created ArgoCD Application for grafana-ingress
- Simple custom Helm chart exposing Grafana via Traefik
- Updated Justfile with ArgoCD commands
- Status: Synced and Healthy
- Ingress working at https://grafana.f3s.buetow.org

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-07  Migrate Prometheus to ArgoCD GitOps  Paul Buetow
- Successfully migrated kube-prometheus-stack to ArgoCD
- Multi-source Application: upstream chart + manifests directory
- PostSync hook automatically restarts Grafana to reload datasources
- All recording rules applied (FreeBSD, OpenBSD, ZFS)
- All dashboards provisioned
- Grafana datasources configured (Prometheus, Loki, Tempo, Alertmanager)
- Updated Justfile with ArgoCD commands
- Status: Synced and Healthy
- Grafana restarted successfully by PostSync hook

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
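A multi-source ArgoCD Application combining an upstream chart with a manifests directory can be sketched as below. The git repository URL, the chart version, the `prometheus/manifests` path, and the sync policy details are placeholders or assumptions, not values taken from the repository. The `cicd` and `monitoring` namespaces are mentioned elsewhere in this log.

```yaml
# Sketch only: URLs, versions, and paths are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus
  namespace: cicd                      # namespace where ArgoCD runs
spec:
  project: default
  sources:
    # Source 1: the upstream Helm chart
    - repoURL: https://prometheus-community.github.io/helm-charts
      chart: kube-prometheus-stack
      targetRevision: "x.y.z"          # placeholder chart version
    # Source 2: additional manifests kept in the config repository
    - repoURL: https://example.org/your/repo.git   # placeholder URL
      targetRevision: HEAD
      path: prometheus/manifests       # assumed path
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    automated:
      prune: true                      # assumed; prune was enabled in a later commit
      selfHeal: true                   # assumed
```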
2026-01-07  Prepare Prometheus for ArgoCD GitOps migration  Paul Buetow
- Created manifests/ directory with all additional resources
- Added sync wave annotations for proper ordering
- Created PostSync hook for Grafana pod restart
- Converted additional-scrape-configs to Kubernetes Secret
- Organized: PVs (wave 0), Secrets/ConfigMaps (wave 1), PrometheusRules (wave 3), Dashboards (wave 4), Hook (wave 10)
- Created multi-source ArgoCD Application (upstream chart + manifests)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
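Sync waves and hooks are set through annotations on the individual resources. The two manifests below sketch a wave-4 dashboard ConfigMap and a wave-10 PostSync Job. Resource names, the kubectl image, the service account, the Grafana deployment name, and the use of `kubectl rollout restart` are assumptions. Only the wave numbers and the hook type come from the commit message.

```yaml
# Sketch only: names, image, and the restart command are assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-dashboard              # hypothetical name
  namespace: monitoring
  labels:
    grafana_dashboard: "1"             # label the Grafana sidecar watches
  annotations:
    argocd.argoproj.io/sync-wave: "4"  # dashboards are applied in wave 4
data:
  example.json: "{}"                   # placeholder dashboard JSON
---
apiVersion: batch/v1
kind: Job
metadata:
  name: grafana-restart                # hypothetical name
  namespace: monitoring
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
    argocd.argoproj.io/sync-wave: "10"
spec:
  template:
    spec:
      serviceAccountName: grafana-restart   # hypothetical; needs RBAC to patch deployments
      restartPolicy: Never
      containers:
        - name: kubectl
          image: bitnami/kubectl            # assumed image providing kubectl
          command:
            - kubectl
            - rollout
            - restart
            - deployment/prometheus-grafana # assumed deployment name
            - -n
            - monitoring
```

Lower waves are applied and become healthy before higher ones start, so PVs (wave 0) exist before the resources that depend on them.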
2026-01-07  Migrate Loki and Alloy to ArgoCD GitOps  Paul Buetow
- Created two ArgoCD Application manifests (loki and alloy)
- Updated Justfile with ArgoCD commands for both apps
- Loki: log aggregation (SingleBinary mode, 10Gi storage)
- Alloy: log collection DaemonSet + OTLP receiver for traces
- Both apps are Synced and Healthy
- Alloy forwards logs to Loki and traces to Tempo

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-07  Migrate Tempo to ArgoCD GitOps  Paul Buetow
- Created ArgoCD Application manifest for Tempo
- Updated Justfile with ArgoCD commands (sync, argocd-status, restart)
- Tested delete/re-deploy workflow
- Verified Tempo is Synced and Healthy
- OTLP receivers enabled on ports 4317 (gRPC) and 4318 (HTTP)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-07  Migrate remaining 4 apps: example-apache-volume-claim, registry, pushgateway, immich  Paul Buetow
Apps migrated in this commit:
- example-apache-volume-claim (test namespace, 2 replicas, 1 PVC)
- registry (infra namespace, Docker registry, 1 PVC)
- pushgateway (monitoring namespace, Prometheus metrics)
- immich (multi-component: server, postgres, valkey, ML)

Also:
- Deleted unused example-apache directory
- Updated all Justfiles with ArgoCD commands
- All apps synced and healthy

Progress: 16/22 active apps (73%)

Remaining apps (all in monitoring namespace):
- prometheus (kube-prometheus-stack)
- loki (umbrella chart)
- tempo
- grafana-ingress

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-07  Migrate 8 remaining simple service apps to ArgoCD  Paul Buetow
Apps migrated in this commit:
- anki-sync-server (custom images, 1 PVC)
- syncthing (file sync, 2 PVCs)
- audiobookshelf (3 PVCs)
- radicale (CalDAV/CardDAV)
- opodsync (podcast sync, 2-container pod)
- kobo-sync-server (eReader sync)
- filebrowser (3 PVCs)
- webdav (WebDAV server)

All apps:
- Created ArgoCD Application manifests
- Updated Justfiles with ArgoCD commands
- All synced successfully and healthy
- Zero downtime migrations

Also includes:
- Updated migration progress tracker (12/23 apps, 52%)
- Deleted freshrss directory (app no longer needed)

Progress: 12/23 apps (52%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-07  Migrate wallabag and keybr to ArgoCD GitOps  Paul Buetow
- Added ArgoCD Application manifests for wallabag and keybr
- Updated Justfiles to use ArgoCD commands (sync, argocd-status, restart)
- Removed Helm commands (install, upgrade, delete)
- Tested delete/re-deploy workflow for both apps
- All resources sync successfully, zero downtime

Apps migrated: 4/23 (17%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-07  Test GitOps: Scale frontend to 2 replicas  Paul Buetow
Testing ArgoCD auto-sync functionality by scaling the tracing-demo frontend deployment from 1 to 2 replicas. This validates the complete GitOps workflow: commit → push → auto-sync → deployment.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-07  Update  Paul Buetow
2026-01-07  Update AGENT.md with gogios monitoring configuration patterns  Paul Buetow
Document how gogios.json.tpl handles server-specific vs service domain checks:
- Dedicated bare hostname checks for server FQDNs
- Service domain checks with all prefix variants
- Why server hostnames must be skipped in @acme_hosts loop
- Impact of not skipping: 12 false critical alerts

Explains the same skip pattern used across httpd.conf.tpl, relayd.conf.tpl, and gogios.json.tpl for consistent handling of server-specific hostnames.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-07  Fix gogios monitoring to skip server hostname www/standby variants  Paul Buetow
Skip blowfish.buetow.org and fishfinger.buetow.org in the @acme_hosts loop that creates monitoring checks for www and standby prefix variants.

These server-specific hostnames:
- Don't have DNS records for www/standby prefixes
- Already have dedicated bare hostname checks (lines 29-46)
- Should only be monitored without prefix variants

This prevents 12 false critical alerts for non-existent:
- www.blowfish.buetow.org
- standby.blowfish.buetow.org
- www.fishfinger.buetow.org
- standby.fishfinger.buetow.org

Follows same pattern as httpd.conf.tpl and relayd.conf.tpl where server hostnames are skipped in shared configuration loops.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-06  Refactor AGENT.md to focus on infrastructure knowledge  Paul Buetow
Removed troubleshooting narrative and restructured to document the system architecture, configuration patterns, and operational knowledge.

Now covers:
- Architecture overview and component responsibilities
- Configuration array roles (@acme_hosts, @f3s_hosts, @prefixes)
- Template processing and variable scoping
- Routing configuration logic
- TLS certificate management in multi-server deployments
- Server block patterns and duplicate prevention
- Server-specific vs. shared host configuration
- Deployment process and testing procedures
- Monitoring system (Gogios) behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-06  Add AGENT.md with debugging learnings and architecture insights  Paul Buetow
Documents the investigation process, root cause analysis, and key learnings from debugging the blowfish/fishfinger 404 errors.

Includes:
- Architecture overview of relayd + httpd routing
- Template variable scoping and processing
- Common pitfalls with server-specific vs shared configuration
- TLS certificate management in multi-server deployments
- Debugging methodology and verification approaches

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-06  Fix 404 errors for blowfish/fishfinger index.txt URLs  Paul Buetow
Added blowfish.buetow.org and fishfinger.buetow.org to @acme_hosts array to ensure proper routing through relayd to localhost instead of falling through to f3s cluster backends.

Changes:
- Rexfile: Add blowfish.buetow.org and fishfinger.buetow.org to @acme_hosts
- httpd.conf.tpl: Skip current server hostname in @acme_hosts loop to avoid duplicate server blocks (already handled by dedicated "Current server's FQDN" block)
- relayd.conf.tpl: Skip both server hostnames in TLS keypair loop since each server only has its own certificate (not the other server's cert)

This ensures relayd routes these hostnames to localhost:8080 where httpd serves content from /htdocs/buetow.org/self including index.txt health checks.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-06  jo  Paul Buetow
2026-01-06  jo prompts  Paul Buetow
2026-01-06  add gogios.buetow.org  Paul Buetow
2026-01-03  Enable WebSocket support in relayd for audiobookshelf  Paul Buetow
- Add http websockets directive to relayd.conf.tpl to allow WebSocket upgrade connections
- Fix "Socket failed to connect" error in audiobookshelf web interface
- Also add immich helm chart configuration

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-02  add immic  Paul Buetow
2026-01-01  fix  Paul Buetow