| Age | Commit message | Author |
|
cgit image needs root to:
- Bind sockets with spawn-fcgi
- Run nginx master process
- Write to /var/run/nginx.pid
The initContainer already sets up the cache directory with the proper permissions.
|
|
The cgit entrypoint tries to write to /etc/cgitrc which is mounted
read-only from our ConfigMap. Set USE_CUSTOM_CONFIG=true to use our
custom cgitrc directly without template substitution.
|
|
The cgit image runs as nginx user (UID 101), not www-data (UID 33).
- Update initContainer to chown cache to 101:1000
- Update cgit securityContext to runAsUser: 101
|
|
Follow webdav/filebrowser pattern for proper permission handling:
- Add fsGroup: 1000 at pod level for git repo access
- Add initContainer to chown emptyDir volumes
- Run git-server as root (required for sshd)
- Run cgit as user 33 (www-data)
- Restore cgit-cache emptyDir volume with proper ownership
|
|
- Mount emptyDir for /etc/ssh to allow SSH host key generation
- Mount emptyDir for /var/cache/cgit to allow cache initialization
- Run both containers as root with proper capabilities
- Copy sshd_config at runtime from /tmp to /etc/ssh
- Add imagePullPolicy: Always to force image refresh
|
|
- Generate SSH host keys at runtime via entrypoint script
- Remove fsGroup security context to fix emptyDir permissions
- Allow cgit to initialize cache directory as root
|
|
- Remove unsupported UsePAM option from sshd_config
- Run cgit as root to allow cache directory initialization
- Add CHOWN and DAC_OVERRIDE capabilities for cgit
|
|
- Use registry.lan.buetow.org for deployment (internal DNS)
- Add emptyDir volume for cgit cache directory
- Add README.md with deployment and secret management instructions
This fixes image pull issues and cgit permission errors.
|
|
Deploy a self-hosted git repository solution to replace the external Codeberg dependency.
Components:
- SSH git server: Alpine-based container with OpenSSH and git
- cgit web UI: Browse repositories at cgit.f3s.buetow.org
- Single pod design: git-server + cgit containers sharing storage
Infrastructure:
- Docker image in git-server/docker-image/ with Justfile build automation
- Helm chart in git-server/helm-chart/ for Kubernetes deployment
- 5Gi ReadWriteMany PVC for NFS-backed repository storage
- ClusterIP service for ArgoCD internal access
- NodePort 30022 for external SSH push access
- Traefik ingress for cgit web UI
ArgoCD Application manifest deployed to cicd namespace.
Note: SSH keys must be created manually as Kubernetes secrets, not committed to git.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Amp-Thread-ID: https://ampcode.com/threads/T-019b9eec-b607-7271-9b75-f05255a60742
Co-authored-by: Amp <amp@ampcode.com>
|
|
K3s embeds kube-proxy and kube-scheduler functionality into the main
k3s server process, unlike standard Kubernetes where they run as
separate components.
This change disables monitoring for these components to prevent
false-positive critical alerts:
- KubeProxyDown
- KubeSchedulerDown
These alerts were firing because kube-prometheus-stack expects
standard Kubernetes architecture with separate kube-proxy and
kube-scheduler pods/processes.
Cluster info:
- Running k3s v1.32.6+k3s1
- 3 control-plane nodes (r0, r1, r2)
- Components embedded in k3s binary
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
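A minimal sketch of the override follows. It assumes the change is expressed through kube-prometheus-stack's `kubeProxy.enabled` and `kubeScheduler.enabled` chart values; the commit does not say which keys were touched. The values are built as a Python dict and printed as JSON. YAML parsers accept JSON, so the output can be saved and passed to `helm -f`.

```python
import json

# Assumed chart keys for the components k3s runs inside its single binary.
# Disabling them stops the chart from creating monitors for separate
# kube-proxy / kube-scheduler targets that do not exist on k3s.
K3S_EMBEDDED_COMPONENTS = ("kubeProxy", "kubeScheduler")


def k3s_values_override() -> dict:
    """Return a Helm values fragment that disables the embedded components."""
    return {name: {"enabled": False} for name in K3S_EMBEDDED_COMPONENTS}


if __name__ == "__main__":
    # JSON is valid YAML, so this can be written to a file and used with `helm -f`.
    print(json.dumps(k3s_values_override(), indent=2))
```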
|
|
Added port-forward targets that print the relevant UI URLs:
- 'just alerts' - Quick access to Prometheus alerts view
- 'just alertmanager' - Quick access to Alertmanager UI
- Enhanced output showing all relevant URLs
All port-forward commands now display:
- Access URLs with direct links to specific views
- Clear instructions for stopping (Ctrl+C)
Usage:
cd prometheus/
just alerts # Opens Prometheus alerts (port 9090)
just alertmanager # Opens Alertmanager (port 9093)
just port-forward-prometheus [port]
just port-forward-grafana [port]
After running, access:
- Prometheus Alerts: http://localhost:9090/alerts
- Alertmanager: http://localhost:9093
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Added Alertmanager configuration to:
- Route ArgoCD application alerts to dedicated 'argocd-alerts' receiver
- Group ArgoCD alerts by alertname, name (app name), and severity
- Faster initial grouping for ArgoCD alerts (10s group wait vs. the 30s default)
- Repeat ArgoCD alerts every 6 hours
- Suppress Watchdog test alerts
- Configure inhibit rules to prevent alert spam
Alerts are visible in:
- Prometheus UI: http://localhost:9090/alerts
- Alertmanager UI: http://localhost:9093
- Grafana dashboard: ArgoCD Applications - Health & Sync Status
This ensures critical application issues are properly routed and visible
in the monitoring UI for immediate action.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
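The routing described above could look roughly like the Alertmanager config built below. This is a sketch, not the committed file. Only the `argocd-alerts` receiver name, its grouping labels, and the 10s/6h timings come from the commit. The `default` and `null` receivers, the `alertname=~"ArgoCD.*"` matcher, and the inhibit rule are hypothetical placeholders.

```python
import json

# Assumption: ArgoCD alert rules share an "ArgoCD" alertname prefix.
ARGOCD_MATCHER = 'alertname=~"ArgoCD.*"'


def alertmanager_config() -> dict:
    """Build an Alertmanager config mirroring the routing in this commit."""
    return {
        "route": {
            "receiver": "default",  # hypothetical catch-all receiver
            "group_wait": "30s",    # Alertmanager's default group wait
            # Child routes are tried in order; the first match wins
            # unless a route sets "continue": true.
            "routes": [
                {
                    # Watchdog fires permanently by design; route it to a no-op receiver.
                    "matchers": ['alertname="Watchdog"'],
                    "receiver": "null",
                },
                {
                    "matchers": [ARGOCD_MATCHER],
                    "receiver": "argocd-alerts",
                    "group_by": ["alertname", "name", "severity"],
                    "group_wait": "10s",
                    "repeat_interval": "6h",
                },
            ],
        },
        "receivers": [{"name": "default"}, {"name": "null"}, {"name": "argocd-alerts"}],
        "inhibit_rules": [
            {
                # Hypothetical rule: a critical alert mutes the matching
                # warning for the same alertname and namespace.
                "source_matchers": ['severity="critical"'],
                "target_matchers": ['severity="warning"'],
                "equal": ["alertname", "namespace"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(alertmanager_config(), indent=2))
```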
|
|
Created a Grafana dashboard showing:
- Total applications count
- Healthy vs unhealthy applications
- Out-of-sync status
- Detailed table with all applications and their status
- Health status timeline graph
- Sync operations rate
- Active ArgoCD-related alerts
Dashboard will auto-load in Grafana via ConfigMap with label grafana_dashboard='1'
Access at: https://grafana.f3s.buetow.org → Dashboards → ArgoCD Applications
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
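The auto-load relies on Grafana's dashboard sidecar picking up ConfigMaps labelled `grafana_dashboard: "1"`, which is kube-prometheus-stack's default sidecar label. The sketch below wraps a dashboard model in such a ConfigMap. The ConfigMap name, the `monitoring` namespace, and the one-field dashboard model are illustrative assumptions.

```python
import json


def dashboard_configmap(name: str, namespace: str, dashboard: dict) -> dict:
    """Wrap a Grafana dashboard JSON model in a ConfigMap the sidecar can discover."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # The label the commit relies on for auto-loading.
            "labels": {"grafana_dashboard": "1"},
        },
        # ConfigMap data values must be strings, so the dashboard is serialized.
        "data": {f"{name}.json": json.dumps(dashboard)},
    }


if __name__ == "__main__":
    # Hypothetical minimal model; the real dashboard defines many panels.
    cm = dashboard_configmap(
        "argocd-applications",
        "monitoring",
        {"title": "ArgoCD Applications - Health & Sync Status", "panels": []},
    )
    print(json.dumps(cm, indent=2))
```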
|
|
This implements monitoring for ALL services deployed via ArgoCD by leveraging ArgoCD's native Prometheus metrics instead of scraping individual services.
Changes:
- Created ArgoCD application alerts for health and sync status monitoring:
  - Alert when applications are unhealthy (Degraded, Missing, Unknown, Suspended)
  - Alert when applications are out of sync for >10 minutes
  - Alert when sync operations are failing repeatedly
  - Alert when applications are stuck in Progressing state
- Added recording rules for unhealthy/out-of-sync application counts
- Added radicale health monitoring via scrape config:
  - Added radicale to additional-scrape-configs for direct health checks
  - Monitors radicale web interface availability
Benefits:
- Single monitoring solution for all 21 ArgoCD-managed applications
- Automatic monitoring for new applications added to ArgoCD
- Early detection of configuration drift and deployment issues
- Centralized alerting with actionable remediation steps
Monitored applications include: radicale, registry, alloy, grafana, loki,
prometheus, tempo, anki-sync-server, audiobookshelf, filebrowser, immich,
keybr, kobo-sync-server, miniflux, opodsync, and more.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
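The unhealthy and out-of-sync counts come from ArgoCD's `argocd_app_info` metric, whose `health_status` and `sync_status` labels describe each application. The sketch below reproduces the classification in plain Python. It corresponds roughly to `count(argocd_app_info{health_status=~"Degraded|Missing|Unknown|Suspended"})`, though the actual recording-rule expressions are not shown in the commit. The ">10 minutes" condition belongs in the alert rule's `for:` clause and is not modelled. The sample data is hypothetical.

```python
from dataclasses import dataclass

# Health states this commit treats as unhealthy.
UNHEALTHY = {"Degraded", "Missing", "Unknown", "Suspended"}


@dataclass(frozen=True)
class AppInfo:
    """One argocd_app_info sample, reduced to the labels used here."""
    name: str
    health_status: str
    sync_status: str


def unhealthy_apps(samples: list[AppInfo]) -> list[str]:
    """Names of applications in an unhealthy state."""
    return sorted(s.name for s in samples if s.health_status in UNHEALTHY)


def out_of_sync_apps(samples: list[AppInfo]) -> list[str]:
    """Names of applications whose live state differs from git."""
    return sorted(s.name for s in samples if s.sync_status == "OutOfSync")


if __name__ == "__main__":
    samples = [
        AppInfo("radicale", "Healthy", "Synced"),
        AppInfo("immich", "Degraded", "Synced"),
        AppInfo("miniflux", "Healthy", "OutOfSync"),
    ]
    print(len(unhealthy_apps(samples)), len(out_of_sync_apps(samples)))
```

Progressing is deliberately absent from `UNHEALTHY`; the commit handles "stuck in Progressing" as a separate alert.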
|
|
- Create subdirectories: monitoring/, services/, infra/, test/
- Move 6 monitoring apps to monitoring/
- Move 13 service apps to services/
- Move 1 infra app to infra/
- Move 1 test app to test/
- Add README.md documenting the structure and usage
This organization:
- Makes it easier to understand which apps belong to which namespace
- Allows applying apps by namespace: kubectl apply -f argocd-apps/monitoring/
- Supports namespace-scoped app-of-apps patterns
- Provides better clarity when browsing the repository
All 21 applications remain functional and validated with kubectl --dry-run.
|
|
- Created ArgoCD Application for grafana-ingress
- Simple custom Helm chart exposing Grafana via Traefik
- Updated Justfile with ArgoCD commands
- Status: Synced and Healthy
- Ingress working at https://grafana.f3s.buetow.org
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
- Successfully migrated kube-prometheus-stack to ArgoCD
- Multi-source Application: upstream chart + manifests directory
- PostSync hook automatically restarts Grafana to reload datasources
- All recording rules applied (FreeBSD, OpenBSD, ZFS)
- All dashboards provisioned
- Grafana datasources configured (Prometheus, Loki, Tempo, Alertmanager)
- Updated Justfile with ArgoCD commands
- Status: Synced and Healthy
- Grafana restarted successfully by PostSync hook
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
- Created manifests/ directory with all additional resources
- Added sync wave annotations for proper ordering
- Created PostSync hook for Grafana pod restart
- Converted additional-scrape-configs to Kubernetes Secret
- Organized: PVs (wave 0), Secrets/ConfigMaps (wave 1), PrometheusRules (wave 3), Dashboards (wave 4), Hook (wave 10)
- Created multi-source ArgoCD Application (upstream chart + manifests)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
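Argo CD applies resources phase by phase (PreSync, Sync, PostSync) and, within a phase, by ascending `argocd.argoproj.io/sync-wave`. Unannotated resources count as wave 0. The simplified sketch below sorts manifests that way. It ignores Argo CD's additional ordering by kind within a wave. The resource names in the demo are hypothetical stand-ins for the waves listed above.

```python
from typing import Optional

# Annotation keys Argo CD reads. Resources without a sync-wave default to wave 0.
WAVE = "argocd.argoproj.io/sync-wave"
HOOK = "argocd.argoproj.io/hook"
# Non-hook resources belong to the Sync phase.
PHASES = {"PreSync": 0, "Sync": 1, "PostSync": 2}


def sync_order(resources: list[dict]) -> list[str]:
    """Return resource names in apply order: phase first, then wave.

    Simplified: Argo CD also orders by kind within a wave; the name is used
    here only as a deterministic tie-breaker.
    """
    def key(res: dict) -> tuple[int, int, str]:
        meta = res["metadata"]
        ann = meta.get("annotations", {})
        phase = PHASES.get(ann.get(HOOK, "Sync"), PHASES["Sync"])
        return (phase, int(ann.get(WAVE, "0")), meta["name"])

    return [r["metadata"]["name"] for r in sorted(resources, key=key)]


def manifest(name: str, wave: Optional[int] = None, hook: Optional[str] = None) -> dict:
    """Build a minimal manifest stub carrying only the annotations used above."""
    ann = {}
    if wave is not None:
        ann[WAVE] = str(wave)  # annotation values are strings in Kubernetes
    if hook:
        ann[HOOK] = hook
    return {"metadata": {"name": name, "annotations": ann}}


if __name__ == "__main__":
    demo = [
        manifest("grafana-restart", wave=10, hook="PostSync"),
        manifest("argocd-dashboard", wave=4),
        manifest("freebsd-rules", wave=3),
        manifest("additional-scrape-configs", wave=1),
        manifest("prometheus-pv", wave=0),
    ]
    print(sync_order(demo))
```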
|
|
- Created two ArgoCD Application manifests (loki and alloy)
- Updated Justfile with ArgoCD commands for both apps
- Loki: log aggregation (SingleBinary mode, 10Gi storage)
- Alloy: log collection DaemonSet + OTLP receiver for traces
- Both apps are Synced and Healthy
- Alloy forwards logs to Loki and traces to Tempo
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
- Created ArgoCD Application manifest for Tempo
- Updated Justfile with ArgoCD commands (sync, argocd-status, restart)
- Tested delete/re-deploy workflow
- Verified Tempo is Synced and Healthy
- OTLP receivers enabled on ports 4317 (gRPC) and 4318 (HTTP)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
pushgateway, immich
Apps migrated in this commit:
- example-apache-volume-claim (test namespace, 2 replicas, 1 PVC)
- registry (infra namespace, Docker registry, 1 PVC)
- pushgateway (monitoring namespace, Prometheus metrics)
- immich (multi-component: server, postgres, valkey, ML)
Also:
- Deleted unused example-apache directory
- Updated all Justfiles with ArgoCD commands
- All apps synced and healthy
Progress: 16/22 active apps (73%)
Remaining apps (all in monitoring namespace):
- prometheus (kube-prometheus-stack)
- loki (umbrella chart)
- tempo
- grafana-ingress
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Apps migrated in this commit:
- anki-sync-server (custom images, 1 PVC)
- syncthing (file sync, 2 PVCs)
- audiobookshelf (3 PVCs)
- radicale (CalDAV/CardDAV)
- opodsync (podcast sync, 2-container pod)
- kobo-sync-server (eReader sync)
- filebrowser (3 PVCs)
- webdav (WebDAV server)
All apps:
- Created ArgoCD Application manifests
- Updated Justfiles with ArgoCD commands
- All synced successfully and healthy
- Zero downtime migrations
Also includes:
- Updated migration progress tracker (12/23 apps, 52%)
- Deleted freshrss directory (app no longer needed)
Progress: 12/23 apps (52%)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
- Added ArgoCD Application manifests for wallabag and keybr
- Updated Justfiles to use ArgoCD commands (sync, argocd-status, restart)
- Removed Helm commands (install, upgrade, delete)
- Tested delete/re-deploy workflow for both apps
- All resources sync successfully, zero downtime
Apps migrated: 4/23 (17%)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Testing ArgoCD auto-sync functionality by scaling the tracing-demo
frontend deployment from 1 to 2 replicas. This validates the complete
GitOps workflow: commit → push → auto-sync → deployment.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Document how gogios.json.tpl handles server-specific vs service domain checks:
- Dedicated bare hostname checks for server FQDNs
- Service domain checks with all prefix variants
- Why server hostnames must be skipped in @acme_hosts loop
- Impact of not skipping: 12 false critical alerts
Explains the same skip pattern used across httpd.conf.tpl, relayd.conf.tpl,
and gogios.json.tpl for consistent handling of server-specific hostnames.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Skip blowfish.buetow.org and fishfinger.buetow.org in the @acme_hosts loop
that creates monitoring checks for www and standby prefix variants.
These server-specific hostnames:
- Don't have DNS records for www/standby prefixes
- Already have dedicated bare hostname checks (lines 29-46)
- Should only be monitored without prefix variants
This prevents 12 false critical alerts for non-existent:
- www.blowfish.buetow.org
- standby.blowfish.buetow.org
- www.fishfinger.buetow.org
- standby.fishfinger.buetow.org
Follows same pattern as httpd.conf.tpl and relayd.conf.tpl where server
hostnames are skipped in shared configuration loops.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
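The skip logic can be illustrated outside the template. The Python below re-expresses the @acme_hosts loop for illustration; it is not gogios.json.tpl itself. Any hostname listed as a server FQDN produces no www/standby variants, because it already has a dedicated bare-hostname check. `example.org` is a hypothetical service domain.

```python
# Server FQDNs named in the commit; they only get bare-hostname checks.
SERVER_HOSTS = {"blowfish.buetow.org", "fishfinger.buetow.org"}
# Prefix variants generated by the shared loop.
PREFIXES = ["www", "standby"]


def prefixed_check_hosts(acme_hosts: list[str]) -> list[str]:
    """Hostnames for the prefix-variant checks, skipping server FQDNs."""
    hosts = []
    for host in acme_hosts:
        if host in SERVER_HOSTS:
            continue  # covered by the dedicated bare-hostname checks
        hosts.extend(f"{prefix}.{host}" for prefix in PREFIXES)
    return hosts


if __name__ == "__main__":
    print(prefixed_check_hosts(["example.org", "blowfish.buetow.org"]))
```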
|
|
Removed troubleshooting narrative and restructured to document the
system architecture, configuration patterns, and operational knowledge.
Now covers:
- Architecture overview and component responsibilities
- Configuration array roles (@acme_hosts, @f3s_hosts, @prefixes)
- Template processing and variable scoping
- Routing configuration logic
- TLS certificate management in multi-server deployments
- Server block patterns and duplicate prevention
- Server-specific vs. shared host configuration
- Deployment process and testing procedures
- Monitoring system (Gogios) behavior
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Documents the investigation process, root cause analysis, and key learnings
from debugging the blowfish/fishfinger 404 errors. Includes:
- Architecture overview of relayd + httpd routing
- Template variable scoping and processing
- Common pitfalls with server-specific vs shared configuration
- TLS certificate management in multi-server deployments
- Debugging methodology and verification approaches
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Added blowfish.buetow.org and fishfinger.buetow.org to @acme_hosts array
to ensure proper routing through relayd to localhost instead of falling
through to f3s cluster backends.
Changes:
- Rexfile: Add blowfish.buetow.org and fishfinger.buetow.org to @acme_hosts
- httpd.conf.tpl: Skip current server hostname in @acme_hosts loop to avoid
duplicate server blocks (already handled by dedicated "Current server's FQDN" block)
- relayd.conf.tpl: Skip both server hostnames in TLS keypair loop since each
server only has its own certificate (not the other server's cert)
This ensures relayd routes these hostnames to localhost:8080 where httpd
serves content from /htdocs/buetow.org/self including index.txt health checks.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
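The two template loops skip different sets of hosts. httpd.conf.tpl drops only the server it is rendered for; relayd.conf.tpl drops both server FQDNs from the shared TLS keypair loop. The Python below illustrates that difference and is not the Rex templates. The function names and `example.org` are hypothetical.

```python
# Server FQDNs named in the commit.
SERVER_HOSTS = ("blowfish.buetow.org", "fishfinger.buetow.org")


def httpd_loop_hosts(acme_hosts: list[str], current: str) -> list[str]:
    """Hosts that get a server block from the shared loop in httpd.conf.tpl.

    The current server's FQDN is skipped because its dedicated
    "Current server's FQDN" block already exists.
    """
    return [h for h in acme_hosts if h != current]


def relayd_keypair_hosts(acme_hosts: list[str]) -> list[str]:
    """Hosts whose TLS keypair the shared loop in relayd.conf.tpl loads.

    Both server FQDNs are skipped: each server holds only its own
    certificate, not the other server's.
    """
    return [h for h in acme_hosts if h not in SERVER_HOSTS]


if __name__ == "__main__":
    hosts = ["example.org", *SERVER_HOSTS]
    print(httpd_loop_hosts(hosts, "blowfish.buetow.org"))
    print(relayd_keypair_hosts(hosts))
```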
|
|
- Add http websockets directive to relayd.conf.tpl to allow WebSocket upgrade connections
- Fix "Socket failed to connect" error in audiobookshelf web interface
- Also add immich Helm chart configuration
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|