path: root/f3s/prometheus
Age | Commit message | Author
17 hours | goprecords: bump image to 0.5.2 (HEAD, master) | Paul Buetow
2026-05-16 | f3s/prometheus: add trivy unresolved-alerts report generator | Paul Buetow
Adds gen-trivy-unresolved-alerts.py which queries Prometheus (/api/v1/rules + /api/v1/alerts) via kubectl exec and produces TRIVY-UNRESOLVED-ALERTS.md. The generated *-ALERTS.md snapshots are gitignored — they're regenerable point-in-time inventories.
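The report step described in this commit can be sketched roughly as follows. This is a minimal Python sketch, not the contents of gen-trivy-unresolved-alerts.py: it assumes the JSON body of the Prometheus /api/v1/alerts endpoint has already been fetched (the commit does that via kubectl exec). The `component=trivy` label filter, the function name, and the Markdown layout are illustrative assumptions.

```python
import json


def render_unresolved_alerts(alerts_json: str, component: str = "trivy") -> str:
    """Render firing alerts from a /api/v1/alerts response body as Markdown.

    Assumption: the alerts of interest carry a `component` label; the real
    generator may select them differently.
    """
    payload = json.loads(alerts_json)
    if payload.get("status") != "success":
        raise ValueError("Prometheus API call did not succeed")

    lines = ["# Unresolved alerts", ""]
    for alert in payload["data"]["alerts"]:
        labels = alert.get("labels", {})
        # Keep only alerts that are currently firing for the chosen component.
        if alert.get("state") != "firing" or labels.get("component") != component:
            continue
        name = labels.get("alertname", "<unnamed>")
        severity = labels.get("severity", "unknown")
        active_at = alert.get("activeAt", "?")
        lines.append(f"- {name} (severity: {severity}, active since {active_at})")
    return "\n".join(lines) + "\n"
```

Because the output is derived entirely from the live API response, it can be regenerated at any time, which fits the commit's decision to gitignore the snapshots.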
2026-05-10 | nfs-monitor: fix node_exporter textfile_collector Helm chart key | Paul Buetow
Use extraHostVolumeMounts (prometheus-node-exporter sub-chart key for host path mounts) instead of extraVolumes/extraVolumeMounts, which are for general volumes. This correctly wires /var/lib/node_exporter/textfile_collector into the container so the textfile arg takes effect.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 | nfs-monitor: add Prometheus alerts for NFS auto-repair failures | Paul Buetow
- check-nfs-mount.sh: write nfs_mount_monitor_consecutive_failures gauge to /var/lib/node_exporter/textfile_collector/nfs_mount_monitor.prom on every run (via write_textfile_metric helper, called from write_fail_count and directly on healthy runs); atomic tmp+mv write prevents partial reads
- Rexfile: create /var/lib/node_exporter/textfile_collector dir on r-nodes
- prometheus.yaml (ArgoCD app): enable textfile_collector in node_exporter DaemonSet via extraArgs/extraVolumes/extraVolumeMounts; mount host path /var/lib/node_exporter/textfile_collector into container
- persistence-values.yaml: sync node_exporter textfile_collector config
- nfs-mount-monitor-alerts.yaml: PrometheusRule with two alerts:
    NfsMountAutoRepairWarning (>= 3 consecutive failures, severity: warning)
    NfsMountAutoRepairCritical (>= 5 consecutive failures, severity: critical)
  wired into new 'nfs-alerts' Alertmanager receiver with 30m repeat_interval

Tested: rex deploy succeeded, .prom files present on r0/r1/r2, timer clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
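The atomic tmp+mv write mentioned in this commit is done in a shell script; the same idea can be illustrated in Python. This is a hedged sketch, not a port of check-nfs-mount.sh: the function reuses the helper's name for readability, but its signature, the `.tmp` suffix, and the file mode are assumptions.

```python
import os
import tempfile


def write_textfile_metric(path: str, name: str, value: float) -> None:
    """Atomically write a single gauge in Prometheus text exposition format."""
    directory = os.path.dirname(path) or "."
    # Create the temp file in the target directory so the final rename stays
    # on one filesystem, where os.replace is atomic on POSIX systems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(f"# TYPE {name} gauge\n{name} {value}\n")
        # mkstemp creates the file with mode 0600; widen it so the exporter
        # process can read it (assumed requirement for this sketch).
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

A reader such as the textfile collector then sees either the previous complete file or the new complete file, never a half-written one.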
2026-04-16 | goprecords: restore for:1h after alert test | Paul Buetow
2026-04-16 | goprecords: temp set for:1m for alert test | Paul Buetow
2026-04-16 | goprecords: add Prometheus scraping and stale-host alert rule | Paul Buetow
- service.yaml: add 'metrics' port (8080) so kubernetes SD auto-discovers the /metrics endpoint alongside the existing http port (80)
- prometheus/manifests/goprecords-alerts.yaml: GoprecordsHostNotReporting fires (warning) when a non-excluded host last reported >5 months ago

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 | feat(f3s): deploy Trivy Operator for image CVE scanning (task h) | Paul Buetow
- ArgoCD app: aquasecurity/trivy-operator in monitoring with ServiceMonitor
- PrometheusRule for Critical/High trivy_image_vulnerabilities alerts
- Alertmanager route/receiver for component=trivy (UI; webhook TBD)

Made-with: Cursor
2026-04-08 | f3s/prometheus: add Garage admin scrape targets (task f) | Paul Buetow
Add job_name garage for 192.168.2.130-132:3903 with os=freebsd label. Mirror config in additional-scrape-configs-secret for kube apply/ArgoCD.

Made-with: Cursor
2026-02-07 | feat: add LAN ingresses for all services | Paul Buetow
Add *.f3s.lan.buetow.org ingress resources for all services to enable LAN access with TLS termination. This allows direct access from the 192.168.1.0/24 network through the FreeBSD CARP/relayd setup.

Services updated:
- argocd: argocd.f3s.lan.buetow.org
- cgit: cgit.f3s.lan.buetow.org
- grafana: grafana.f3s.lan.buetow.org
- anki-sync-server: anki.f3s.lan.buetow.org
- apache: f3s.lan.buetow.org, www.f3s.lan.buetow.org, standby.f3s.lan.buetow.org
- audiobookshelf: audiobookshelf.f3s.lan.buetow.org
- filebrowser: filebrowser.f3s.lan.buetow.org
- immich: immich.f3s.lan.buetow.org
- ipv6test: ipv6test.f3s.lan.buetow.org (+ ipv4/ipv6 subdomains)
- keybr: keybr.f3s.lan.buetow.org
- koreader-sync-server: koreader.f3s.lan.buetow.org
- miniflux: flux.f3s.lan.buetow.org
- opodsync: gpodder.f3s.lan.buetow.org
- radicale: radicale.f3s.lan.buetow.org
- syncthing: syncthing.f3s.lan.buetow.org
- tracing-demo: tracing-demo.f3s.lan.buetow.org
- wallabag: bag.f3s.lan.buetow.org
- webdav: webdav.f3s.lan.buetow.org

All LAN ingresses use:
- TLS with f3s-lan-tls certificate (cert-manager)
- Traefik entrypoints: web,websecure
- Same backend services as external ingresses

Also fixed koreader-sync-server ingress to use modern annotations.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-01-19 | resolve merge conflict in argocd dashboard | Paul Buetow
Kept the version with the additional "Unhealthy Applications" panel which provides better visibility into problematic applications.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 | fix radicale scrape config causing TargetDown alert | Paul Buetow
Radicale does not expose Prometheus metrics. The previous config tried to scrape /.web/ which returns HTML, causing parse errors. Synced with additional-scrape-configs.yaml which properly drops radicale from scraping.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 | Merge branch 'master' of codeberg.org:snonux/conf | Paul Buetow
2026-01-18 | Add unhealthy applications panel to ArgoCD dashboard | Paul Buetow
Adds a dedicated table panel showing only applications with health_status != "Healthy" for quick identification of issues.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 | Update monitoring and gogios configuration | Paul Buetow
- Add node resources multi-select dashboard for Prometheus
- Update gogios cron schedule and add HTML status file output
- Update Prometheus scrape configs
- Add gogios documentation

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-08 | Remove invalid radicale scrape job (no metrics endpoint) | Paul Buetow
2026-01-08 | Add Prometheus NodePort and alert query targets to Justfile | Paul Buetow
2026-01-08 | Add NodePort service for Prometheus on port 30090 | Paul Buetow
2026-01-08 | Add convenient port-forward targets for Prometheus monitoring | Paul Buetow
Added enhanced port-forward targets with helpful UI information:
- 'just alerts' - Quick access to Prometheus alerts view
- 'just alertmanager' - Quick access to Alertmanager UI
- Enhanced output showing all relevant URLs

All port-forward commands now display:
- Access URLs with direct links to specific views
- Clear instructions for stopping (Ctrl+C)

Usage:
  cd prometheus/
  just alerts          # Opens Prometheus alerts (port 9090)
  just alertmanager    # Opens Alertmanager (port 9093)
  just port-forward-prometheus [port]
  just port-forward-grafana [port]

After running, access:
- Prometheus Alerts: http://localhost:9090/alerts
- Alertmanager: http://localhost:9093

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-08 | Add Grafana dashboard for ArgoCD applications monitoring | Paul Buetow
Created comprehensive Grafana dashboard showing:
- Total applications count
- Healthy vs unhealthy applications
- Out-of-sync status
- Detailed table with all applications and their status
- Health status timeline graph
- Sync operations rate
- Active ArgoCD-related alerts

Dashboard will auto-load in Grafana via ConfigMap with label grafana_dashboard='1'

Access at: https://grafana.f3s.buetow.org → Dashboards → ArgoCD Applications

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-08 | Add comprehensive ArgoCD application monitoring and alerts | Paul Buetow
This implements monitoring for ALL services deployed via ArgoCD by leveraging ArgoCD's native Prometheus metrics instead of scraping individual services.

Changes:
- Created ArgoCD application alerts for health and sync status monitoring
  - Alert when applications are unhealthy (Degraded, Missing, Unknown, Suspended)
  - Alert when applications are out of sync for >10 minutes
  - Alert when sync operations are failing repeatedly
  - Alert when applications are stuck in Progressing state
- Added recording rules for unhealthy/out-of-sync application counts
- Added radicale health monitoring via scrape config
  - Added radicale to additional-scrape-configs for direct health checks
  - Monitors radicale web interface availability

Benefits:
- Single monitoring solution for all 21 ArgoCD-managed applications
- Automatic monitoring for new applications added to ArgoCD
- Early detection of configuration drift and deployment issues
- Centralized alerting with actionable remediation steps

Monitored applications include: radicale, registry, alloy, grafana, loki, prometheus, tempo, anki-sync-server, audiobookshelf, filebrowser, immich, keybr, kobo-sync-server, miniflux, opodsync, and more.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-07 | Migrate Grafana-Ingress to ArgoCD GitOps | Paul Buetow
- Created ArgoCD Application for grafana-ingress
- Simple custom Helm chart exposing Grafana via Traefik
- Updated Justfile with ArgoCD commands
- Status: Synced and Healthy
- Ingress working at https://grafana.f3s.buetow.org

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-07 | Migrate Prometheus to ArgoCD GitOps | Paul Buetow
- Successfully migrated kube-prometheus-stack to ArgoCD
- Multi-source Application: upstream chart + manifests directory
- PostSync hook automatically restarts Grafana to reload datasources
- All recording rules applied (FreeBSD, OpenBSD, ZFS)
- All dashboards provisioned
- Grafana datasources configured (Prometheus, Loki, Tempo, Alertmanager)
- Updated Justfile with ArgoCD commands
- Status: Synced and Healthy
- Grafana restarted successfully by PostSync hook

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-07 | Prepare Prometheus for ArgoCD GitOps migration | Paul Buetow
- Created manifests/ directory with all additional resources
- Added sync wave annotations for proper ordering
- Created PostSync hook for Grafana pod restart
- Converted additional-scrape-configs to Kubernetes Secret
- Organized: PVs (wave 0), Secrets/ConfigMaps (wave 1), PrometheusRules (wave 3), Dashboards (wave 4), Hook (wave 10)
- Created multi-source ArgoCD Application (upstream chart + manifests)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
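The wave ordering listed in this commit can be illustrated with a small Python sketch. The wave numbers come from the commit message and `argocd.argoproj.io/sync-wave` is the Argo CD annotation for sync waves; the category names and helper functions are hypothetical and only serve the illustration.

```python
SYNC_WAVE_ANNOTATION = "argocd.argoproj.io/sync-wave"

# Wave numbers as listed in the commit message; the category names are
# hypothetical labels invented for this sketch.
WAVES = {
    "persistent-volume": 0,
    "secret-or-configmap": 1,
    "prometheus-rule": 3,
    "dashboard": 4,
    "post-sync-hook": 10,
}


def sync_wave_annotation(category: str) -> dict[str, str]:
    """Return the annotation for a category (annotation values are strings)."""
    return {SYNC_WAVE_ANNOTATION: str(WAVES[category])}


def apply_order(categories: list[str]) -> list[str]:
    """Sort categories the way waves are processed: lowest wave first."""
    return sorted(categories, key=WAVES.__getitem__)
```

Leaving gaps between wave numbers (1, 3, 4, 10) keeps room to slot in new resource groups later without renumbering.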
2025-12-31 | Document Admin API and updated out-of-order configuration | Paul Buetow
Updated Prometheus documentation to reflect current configuration:
- Added web.enable-admin-api flag documentation
- Updated outOfOrderTimeWindow from 720h to 744h (31 days)
- Added Data Deletion section with cleanup script usage
- Documented manual deletion via Admin API endpoints

Provides complete guide for data cleanup after benchmark testing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-31 | Update | Paul Buetow
2025-12-31 | Enable Prometheus admin API for data deletion | Paul Buetow
Added web.enable-admin-api flag to allow selective deletion of time series data via the /api/v1/admin/tsdb endpoints. This enables cleanup of benchmark data using the delete_series and clean_tombstones APIs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-31 | Increase Prometheus out-of-order window to 31 days | Paul Buetow
Changes:
- outOfOrderTimeWindow: 720h → 744h (30 days → 31 days)

Rationale:
Provides 1-day buffer for 30-day backfill operations to avoid edge case rejections where the oldest samples exceed the limit due to timing variations between data generation and ingestion.

With this configuration:
- 30-day benchmarks achieve 99.85% success rate (vs 50% with 720h)
- Only 4/2592 batches rejected (first few batches slightly over 30d)
- Allows safe backfilling of up to 30 days of historic data

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
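The arithmetic behind this change can be checked with a short Python sketch. It uses a simplified model that compares a sample's age directly with the window, which is an assumption made for illustration; the function name and the 10-minute lag in the usage note are hypothetical.

```python
from datetime import timedelta

OLD_WINDOW = timedelta(hours=720)  # 30 days
NEW_WINDOW = timedelta(hours=744)  # 31 days


def within_window(sample_age: timedelta, window: timedelta) -> bool:
    """Simplified acceptance check: a sample is accepted if it is no older than the window."""
    return sample_age <= window


def success_rate(total_batches: int, rejected_batches: int) -> float:
    """Percentage of accepted batches, rounded to two decimals."""
    return round(100 * (total_batches - rejected_batches) / total_batches, 2)
```

Under this model, a sample generated 30 days ago that reaches Prometheus 10 minutes late has an age of `timedelta(days=30, minutes=10)`: it falls outside the 720h window but inside the 744h one. `success_rate(2592, 4)` reproduces the 99.85% figure quoted in the commit.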
2025-12-31 | Enable Prometheus historic data ingestion with out-of-order support | Paul Buetow
This commit configures Prometheus to accept historic data via the Remote Write API, enabling backfilling of test metrics for development and troubleshooting purposes.

Changes:
- Enable Remote Write receiver (--web.enable-remote-write-receiver)
- Enable out-of-order ingestion with 30-day window (720h)
- Enable exemplar-storage and otlp-write-receiver features
- Add Epimetheus dashboard ConfigMap for Grafana provisioning
- Remove old prometheus-pusher directory (moved to separate repo)
- Document configuration, use cases, and performance considerations

Configuration allows backfilling data up to 30 days in the past, supporting tools like Epimetheus for generating synthetic historic metrics.

Performance note: This is optimized for ad-hoc troubleshooting, not production use. Out-of-order ingestion increases memory usage, TSDB overhead, and may impact query performance.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30 | Enable Prometheus Remote Write receiver for historic data ingestion | Paul Buetow
Updated persistence-values.yaml to enable the Remote Write receiver using the correct flag for Prometheus 3.x:
- Changed from enableFeatures (not supported in 3.8.1)
- To additionalArgs with web.enable-remote-write-receiver

This allows Epimetheus to push historic data with preserved timestamps via the Prometheus Remote Write API endpoint (/api/v1/write).

Applied via: just upgrade

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30 | Add AUTO mode with automatic timestamp detection | Paul Buetow
This commit adds intelligent auto-detection of data age, automatically choosing the appropriate ingestion method without user intervention.

## New Features

1. **AUTO Mode** (-mode=auto)
   - Automatically detects timestamp age from input data
   - Routes realtime data (< 5min) → Pushgateway
   - Routes historic data (> 5min) → Remote Write API
   - No manual timestamp calculation needed!

2. **Input Format Support**
   - CSV format: metric_name,labels,value,timestamp_ms
   - JSON format: [{metric, labels, value, timestamp_ms}]
   - Read from file (-file=path) or stdin
   - Comments supported in CSV (#)

3. **Smart Routing Logic**
   - 5-minute threshold determines ingestion method
   - Handles mixed data (current + historic) in single import
   - Clear logging shows which method is used for each sample

4. **Test Data Generation**
   - generate-test-data.sh creates samples for all time ranges
   - Demonstrates: current, 1h, 1d, 1w, 1m old data
   - Actual timestamps generated dynamically

## Files Added

- auto-ingest.go: Core auto-detection logic
- AUTO-MODE.md: Complete documentation
- generate-test-data.sh: Test data generator
- test-data.csv: Example data template
- test-all-ages.csv: Generated test data (all ages)

## Example Usage

```bash
# Generate test data
./generate-test-data.sh

# Auto-import (detects ages automatically)
./prometheus-pusher-auto \
  -mode=auto \
  -file=test-all-ages.csv \
  -pushgateway=http://localhost:9091 \
  -prometheus=http://localhost:9090/api/v1/write
```

## Output Example

```
📊 Auto-ingest summary:
   Total samples: 15
   Realtime samples (< 5min old): 3
   Historic samples (> 5min old): 12

🔄 Ingesting 3 REALTIME samples via Pushgateway...
⏰ Ingesting 12 HISTORIC samples via Remote Write...
  [1/12] app_requests_total (age: 1.0 hours)
  [2/12] app_temperature_celsius (age: 1.0 days)
  ...
🎉 Auto-ingest complete!
```

## Supported Time Ranges

✅ Current data (< 5min)
✅ 1 hour old data
✅ 1 day old data
✅ 1 week old data
✅ 1 month old data

All ages are automatically detected and routed correctly!

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
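The 5-minute routing rule from this commit can be sketched in Python as follows. This is an illustration only, not a translation of auto-ingest.go: the function names are hypothetical, the labels field is treated as an opaque string, and sending a sample that is exactly 5 minutes old to Remote Write is an assumption, since the commit message only specifies the < 5min and > 5min cases.

```python
import csv
import io

REALTIME_THRESHOLD_MS = 5 * 60 * 1000  # 5-minute threshold from the commit message


def route(timestamp_ms: int, now_ms: int) -> str:
    """Choose the ingestion path for one sample based on its age."""
    age_ms = now_ms - timestamp_ms
    # Assumption: a sample exactly at the threshold is treated as historic.
    return "pushgateway" if age_ms < REALTIME_THRESHOLD_MS else "remote_write"


def split_samples(csv_text: str, now_ms: int) -> dict[str, list[tuple[str, str, float, int]]]:
    """Parse `metric_name,labels,value,timestamp_ms` rows and group them by route."""
    routed = {"pushgateway": [], "remote_write": []}
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank lines and '#' comment lines.
        if not row or row[0].lstrip().startswith("#"):
            continue
        metric, labels, value, timestamp_ms = row
        sample = (metric, labels, float(value), int(timestamp_ms))
        routed[route(sample[3], now_ms)].append(sample)
    return routed
```

Passing `now_ms` in explicitly, rather than reading the clock inside `route`, keeps the decision deterministic and easy to test with mixed current and historic rows.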
2025-12-30 | Add historic data ingestion support to prometheus-pusher | Paul Buetow
This commit extends prometheus-pusher to support ingesting historic data with custom timestamps via Prometheus Remote Write API.

## Key Changes

1. New Historic Data Module (historic.go)
   - GenerateHistoricMetrics: Creates metrics for specific past timestamps
   - PushHistoricData: Sends single datapoint via Remote Write API
   - BackfillHistoricData: Backfills range of historic data
   - Uses Protobuf + Snappy encoding per Prometheus spec

2. Enhanced Main Binary (main.go, realtime.go)
   - Refactored to support multiple modes
   - Mode 1: realtime - Push to Pushgateway (original behavior)
   - Mode 2: historic - Push single historic datapoint
   - Mode 3: backfill - Backfill range of historic data
   - Command-line flags for configuration

3. Prometheus Configuration (persistence-values.yaml)
   - Added web.enable-remote-write-receiver flag
   - Enables Prometheus to accept timestamped samples via Remote Write API
   - Required for historic data ingestion

4. Documentation (HISTORIC.md)
   - Complete guide for historic data ingestion
   - Explains limitations and best practices
   - Examples for all three modes
   - Troubleshooting guide

## Technical Details

**Problem**: Pushgateway doesn't support custom timestamps - Prometheus always uses "now" when scraping. This prevents backfilling historic data.

**Solution**: Use Prometheus Remote Write API which accepts timestamped samples. Requires enabling --web.enable-remote-write-receiver flag.

**Data Format**: Protobuf (prompb.WriteRequest) with Snappy compression

**Use Cases**:
- Backfill missing data (e.g., during outage)
- Import historic data from other systems
- Testing with specific timestamps
- Data migration scenarios

## Usage Examples

```bash
# Realtime mode (original behavior)
./prometheus-pusher-historic -mode=realtime -continuous

# Push data from 24 hours ago
./prometheus-pusher-historic -mode=historic -hours-ago=24

# Backfill last 48 hours with 1-hour intervals
./prometheus-pusher-historic -mode=backfill -start-hours=48 -end-hours=0 -interval=1
```

## Dependencies Added

- github.com/prometheus/prometheus (for prompb package)
- github.com/golang/snappy (for compression)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30 | Add Prometheus Pushgateway and data ingestion tool | Paul Buetow
This commit adds a complete Prometheus data ingestion solution:

1. Pushgateway Helm Chart (f3s/pushgateway/)
   - Standalone helm chart for Prometheus Pushgateway
   - Deployed to monitoring namespace
   - Receives pushed metrics via HTTP POST on port 9091

2. Prometheus Pusher (f3s/prometheus-pusher/)
   - Standalone Go binary (12MB) for pushing metrics to Pushgateway
   - Demonstrates all Prometheus metric types:
     * Counter (app_requests_total)
     * Gauge (app_active_connections, app_temperature_celsius)
     * Histogram (app_request_duration_seconds)
     * Labeled Counter (app_jobs_processed_total)
   - Pushes metrics every 15 seconds
   - Includes comprehensive documentation and examples

3. Prometheus Configuration
   - Updated additional-scrape-configs.yaml to scrape Pushgateway
   - Uses honor_labels to preserve pushed metric labels

Architecture: Go Binary → Pushgateway → Prometheus → Grafana

The pusher binary generates realistic example metrics and pushes them to Pushgateway in Prometheus text format. Prometheus then scrapes the Pushgateway and makes the metrics available for querying and alerting.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-28 | Fix Grafana datasource provisioning by switching to direct ConfigMap mounting | Paul Buetow
After extensive debugging (documented in problem.md), resolved the issue where Tempo and Loki datasources would not appear in Grafana despite correct configuration.

Root Cause:
- Sidecar-based provisioning with label discovery was not triggering the provisioner module
- Multi-step indirection (sidecar → watch → write → reload) had silent failures

Solution (following x-rag pattern):
- Disabled sidecar datasource provisioning
- Created unified grafana-datasources-all.yaml with all datasources
- Mount ConfigMap directly to /etc/grafana/provisioning/datasources/
- Grafana now reads datasources on startup via built-in provisioning

Changes:
- NEW: grafana-datasources-all.yaml - Unified datasource configuration (Prometheus, Alertmanager, Loki, Tempo)
- MODIFIED: persistence-values.yaml - Disabled sidecar, added extraVolumes/extraVolumeMounts
- MODIFIED: Justfile - Updated to use unified ConfigMap, removed patch script
- MODIFIED: README.md - Documented new provisioning approach
- NEW: problem.md - Complete debugging journey with 16 attempts documented
- DEPRECATED: loki-datasource.yaml, tempo-datasource.yaml, patch-datasources.sh (kept for history)

Result:
✅ All datasources now successfully provision on Grafana startup
✅ Tempo datasource (uid=tempo) appears in Grafana with traces-to-logs correlation
✅ Loki datasource (uid=loki) appears in Grafana
✅ Simple, maintainable approach without sidecar complexity

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-28 | Add Grafana Tempo distributed tracing with demo application | Paul Buetow
- Deploy Grafana Tempo in monolithic mode for distributed tracing
  - Configure Tempo with OTLP receivers (gRPC:4317, HTTP:4318)
  - Set up 10Gi filesystem storage with 7-day retention
  - Integrate Tempo datasource in Grafana with traces-to-logs and traces-to-metrics correlation
- Update Grafana Alloy to collect and forward traces
  - Add OTLP receiver configuration to alloy-values.yaml
  - Configure batch processor for efficient trace forwarding to Tempo
  - Patch Alloy service to expose OTLP ports 4317/4318
- Create demo tracing application (frontend, middleware, backend)
  - Implement three-tier Python Flask application with OpenTelemetry instrumentation
  - Auto-instrument with OpenTelemetry for Flask and requests libraries
  - Push Docker images to private registry (registry.lan.buetow.org:30001)
  - Deploy via Helm chart with Traefik ingress at tracing-demo.f3s.buetow.org
- Update Grafana configuration in prometheus/persistence-values.yaml
  - Add Tempo to additionalDataSources for automatic provisioning

Files added:
- tempo/values.yaml: Tempo Helm chart configuration
- tempo/persistent-volumes.yaml: Storage configuration (10Gi PV/PVC)
- tempo/datasource-configmap.yaml: Grafana datasource with correlations
- tempo/Justfile: Installation automation
- tempo/README.md: Documentation
- tracing-demo/docker/frontend/: Python Flask frontend with OTel
- tracing-demo/docker/middleware/: Python Flask middleware with OTel
- tracing-demo/docker/backend/: Python Flask backend with OTel
- tracing-demo/helm-chart/: Kubernetes deployments, services, ingress
- tracing-demo/docker-image-Justfile: Docker build/push automation
- tracing-demo/Justfile: Helm deployment automation
- tracing-demo/README.md: Documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-28 | Add comprehensive ZFS monitoring for FreeBSD servers | Paul Buetow
Implemented complete ZFS monitoring solution including ARC cache statistics, pool health/capacity metrics, dataset usage, and I/O throughput monitoring.

Changes:
- Add ZFS recording rules (9 calculated metrics for ARC hit rates, memory usage, etc.)
- Add comprehensive Grafana dashboard with 19 panels across 5 rows:
  * Pool Overview: capacity, health, size, free space, usage trends
  * I/O Throughput: read/write operations and bytes per second
  * Dataset Statistics: table showing all datasets with usage details
  * ARC Cache Statistics: hit rates, size, memory usage
  * ARC Breakdown: data vs metadata, MRU vs MFU with pie charts
- Update Justfile to deploy ZFS recording rules
- Add textfile collector script on FreeBSD servers (f0, f1, f2) for pool/dataset metrics

Metrics collected:
- Pool: size, allocated, free, capacity %, health status
- I/O: read/write operations and throughput (via zpool iostat)
- Dataset: used, available, referenced space per filesystem
- ARC: hit rate, size, memory usage, data/metadata breakdown

Fixes:
- Pool health panel properly displays ONLINE/DEGRADED/FAULTED status
- All stat panels have correct options configuration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-26 | fix | Paul Buetow
2025-12-26 | jo | Paul Buetow
2025-12-26 | fix | Paul Buetow
2025-12-25 | observability: enable etcd metrics scraping | Paul Buetow
- Enable etcd metrics on port 2381
- Add blog post draft documenting the changes
2025-12-25 | revert: undo all observability changes from today | Paul Buetow
Reverts hostname relabeling and etcd metrics changes
2025-12-25 | observability: node-exporter hostnames + etcd metrics | Paul Buetow
- Add relabel_configs to show hostnames for node-exporter targets
- Enable etcd metrics scraping on port 2381
- Update blog post draft
2025-12-25 | observability: display hostnames instead of IPs, enable etcd metrics | Paul Buetow
- Add relabel_configs to additional-scrape-configs.yaml for FreeBSD/OpenBSD hosts
- Add node name relabeling for node-exporter on k3s nodes
- Enable etcd metrics scraping with hostname relabeling
- Add DRAFT blog post documenting the changes

Amp-Thread-ID: https://ampcode.com/threads/T-019b571c-4afc-7789-becf-bc8a3c4e1e1f
Co-authored-by: Amp <amp@ampcode.com>
2025-12-25 | use hosts not IPs | Paul Buetow
2025-12-07 | add openbsd routing rules | Paul Buetow
2025-12-06 | add openbsd node exporters | Paul Buetow
2025-12-06 | add more | Paul Buetow
2025-12-06 | more on this | Paul Buetow
2025-10-24 | fix | Paul Buetow
2025-10-24 | add persistent volumes to prometheus/grafana | Paul Buetow