path: root/f3s/prometheus/Justfile
Age | Commit message | Author
2026-01-08 | Add Prometheus NodePort and alert query targets to Justfile | Paul Buetow
2026-01-08 | Add convenient port-forward targets for Prometheus monitoring | Paul Buetow
Added enhanced port-forward targets with helpful UI information:
- 'just alerts' - Quick access to Prometheus alerts view
- 'just alertmanager' - Quick access to Alertmanager UI
- Enhanced output showing all relevant URLs

All port-forward commands now display:
- Access URLs with direct links to specific views
- Clear instructions for stopping (Ctrl+C)

Usage:
  cd prometheus/
  just alerts          # Opens Prometheus alerts (port 9090)
  just alertmanager    # Opens Alertmanager (port 9093)
  just port-forward-prometheus [port]
  just port-forward-grafana [port]

After running, access:
- Prometheus Alerts: http://localhost:9090/alerts
- Alertmanager: http://localhost:9093

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-07 | Migrate Prometheus to ArgoCD GitOps | Paul Buetow
- Successfully migrated kube-prometheus-stack to ArgoCD
- Multi-source Application: upstream chart + manifests directory
- PostSync hook automatically restarts Grafana to reload datasources
- All recording rules applied (FreeBSD, OpenBSD, ZFS)
- All dashboards provisioned
- Grafana datasources configured (Prometheus, Loki, Tempo, Alertmanager)
- Updated Justfile with ArgoCD commands
- Status: Synced and Healthy
- Grafana restarted successfully by PostSync hook

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
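
For readers unfamiliar with the multi-source pattern mentioned in this commit, the following is a minimal sketch of what such an ArgoCD Application might look like. It is not copied from the repository: the Application name, both namespaces, the Git `repoURL`, the manifests `path`, and the `<chart-version>` placeholder are all assumptions. Only the chart name (kube-prometheus-stack) and the "upstream chart + manifests directory" shape come from the commit message.

```yaml
# Sketch only. Names, namespaces, the Git repoURL, the path and the
# <chart-version> placeholder are assumptions, not values from the commit.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus              # hypothetical
  namespace: argocd             # assumed ArgoCD install namespace
spec:
  project: default
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring       # assumed target namespace
  sources:
    # Source 1: the upstream Helm chart
    - repoURL: https://prometheus-community.github.io/helm-charts
      chart: kube-prometheus-stack
      targetRevision: "<chart-version>"   # pin the version actually deployed
    # Source 2: a directory of plain manifests (rules, dashboards, datasources)
    - repoURL: https://example.com/your/repo.git   # hypothetical
      targetRevision: HEAD
      path: f3s/prometheus/manifests               # assumed path
---
# A PostSync hook is an ordinary resource carrying the hook annotation.
# The Job below is illustrative: the image placeholder, service account and
# deployment name are hypothetical, and the account needs RBAC permission
# to patch the Grafana deployment.
apiVersion: batch/v1
kind: Job
metadata:
  name: grafana-restart         # hypothetical
  namespace: monitoring
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      serviceAccountName: grafana-restarter   # hypothetical
      restartPolicy: Never
      containers:
        - name: kubectl
          image: "<image-with-kubectl>"       # placeholder
          command: ["kubectl", "rollout", "restart",
                    "deployment/prometheus-grafana", "-n", "monitoring"]
```

Because the hook runs only after a successful sync, Grafana would be restarted once the datasource manifests from the second source are already applied.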
2025-12-28 | Fix Grafana datasource provisioning by switching to direct ConfigMap mounting | Paul Buetow
After extensive debugging (documented in problem.md), resolved the issue where Tempo and Loki datasources would not appear in Grafana despite correct configuration.

Root Cause:
- Sidecar-based provisioning with label discovery was not triggering the provisioner module
- Multi-step indirection (sidecar → watch → write → reload) had silent failures

Solution (following x-rag pattern):
- Disabled sidecar datasource provisioning
- Created unified grafana-datasources-all.yaml with all datasources
- Mount ConfigMap directly to /etc/grafana/provisioning/datasources/
- Grafana now reads datasources on startup via built-in provisioning

Changes:
- NEW: grafana-datasources-all.yaml - Unified datasource configuration (Prometheus, Alertmanager, Loki, Tempo)
- MODIFIED: persistence-values.yaml - Disabled sidecar, added extraVolumes/extraVolumeMounts
- MODIFIED: Justfile - Updated to use unified ConfigMap, removed patch script
- MODIFIED: README.md - Documented new provisioning approach
- NEW: problem.md - Complete debugging journey with 16 attempts documented
- DEPRECATED: loki-datasource.yaml, tempo-datasource.yaml, patch-datasources.sh (kept for history)

Result:
✅ All datasources now successfully provision on Grafana startup
✅ Tempo datasource (uid=tempo) appears in Grafana with traces-to-logs correlation
✅ Loki datasource (uid=loki) appears in Grafana
✅ Simple, maintainable approach without sidecar complexity

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
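
The direct-mount approach described in this commit can be sketched roughly as follows. This is an illustration under assumptions, not the contents of the actual files: the namespace, the service URLs and ports, the volume name, and the key inside the ConfigMap are hypothetical, and the exact Helm value keys can vary between chart versions. The datasource UIDs (`loki`, `tempo`), the ConfigMap file name, and the mount path are taken from the commit message.

```yaml
# grafana-datasources-all.yaml (sketch; namespace and URLs are assumptions)
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources-all
  namespace: monitoring                 # assumed
data:
  datasources.yaml: |
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        uid: loki
        access: proxy
        url: http://loki.monitoring.svc:3100     # assumed service address
      - name: Tempo
        type: tempo
        uid: tempo
        access: proxy
        url: http://tempo.monitoring.svc:3200    # assumed service address
        jsonData:
          tracesToLogsV2:
            datasourceUid: loki         # links traces to the Loki datasource
---
# persistence-values.yaml (sketch of the relevant Helm values only;
# the volume name is hypothetical)
grafana:
  sidecar:
    datasources:
      enabled: false                    # no label-discovery sidecar
  extraVolumes:
    - name: datasources-all
      configMap:
        name: grafana-datasources-all
  extraVolumeMounts:
    - name: datasources-all
      mountPath: /etc/grafana/provisioning/datasources/
      readOnly: true
```

With this layout Grafana reads the file through its built-in provisioning at startup, which is why a change to the ConfigMap takes effect only after a Grafana restart.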
2025-12-28 | Add comprehensive ZFS monitoring for FreeBSD servers | Paul Buetow
Implemented complete ZFS monitoring solution including ARC cache statistics, pool health/capacity metrics, dataset usage, and I/O throughput monitoring.

Changes:
- Add ZFS recording rules (9 calculated metrics for ARC hit rates, memory usage, etc.)
- Add comprehensive Grafana dashboard with 19 panels across 5 rows:
  * Pool Overview: capacity, health, size, free space, usage trends
  * I/O Throughput: read/write operations and bytes per second
  * Dataset Statistics: table showing all datasets with usage details
  * ARC Cache Statistics: hit rates, size, memory usage
  * ARC Breakdown: data vs metadata, MRU vs MFU with pie charts
- Update Justfile to deploy ZFS recording rules
- Add textfile collector script on FreeBSD servers (f0, f1, f2) for pool/dataset metrics

Metrics collected:
- Pool: size, allocated, free, capacity %, health status
- I/O: read/write operations and throughput (via zpool iostat)
- Dataset: used, available, referenced space per filesystem
- ARC: hit rate, size, memory usage, data/metadata breakdown

Fixes:
- Pool health panel properly displays ONLINE/DEGRADED/FAULTED status
- All stat panels have correct options configuration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
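
As an illustration of what one of the ARC recording rules might look like, here is a minimal sketch of a hit-ratio rule packaged as a PrometheusRule resource. Everything specific in it is an assumption: the resource and group names, the `release` label, the recorded series name, and in particular the source metric names `node_zfs_arcstats_hits_total` and `node_zfs_arcstats_misses_total`, which depend on the node_exporter version and platform. The commit message states only that ARC hit rates are among the nine calculated metrics.

```yaml
# Sketch only. Metric names, rule names and labels are assumptions;
# check the series actually exposed by node_exporter on the FreeBSD hosts.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: zfs-recording-rules        # hypothetical
  namespace: monitoring            # assumed
  labels:
    release: prometheus            # assumed; must match the operator's rule selector
spec:
  groups:
    - name: zfs.arc                # hypothetical group name
      rules:
        # ARC hit ratio over 5 minutes: hits / (hits + misses)
        - record: zfs:arc_hit_ratio:rate5m   # hypothetical series name
          expr: |
            rate(node_zfs_arcstats_hits_total[5m])
            /
            (
              rate(node_zfs_arcstats_hits_total[5m])
              + rate(node_zfs_arcstats_misses_total[5m])
            )
```

Precomputing the ratio this way lets dashboard panels query a single series instead of re-evaluating the rate expression on every refresh.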
2025-12-26 | jo | Paul Buetow
2025-12-06 | add openbsd node exporters | Paul Buetow
2025-12-06 | more on this | Paul Buetow
2025-10-24 | add persistent volumes to prometheus/grafana | Paul Buetow