conf - Configuration files for the automation of my personal infrastructure (servers, laptops, workstations, phones)!

Age	Commit message	Author
2025-12-31	Document Admin API and update out-of-order configuration	Paul Buetow
	Updated Prometheus documentation to reflect current configuration:
	- Added web.enable-admin-api flag documentation
	- Updated outOfOrderTimeWindow from 720h to 744h (31 days)
	- Added Data Deletion section with cleanup script usage
	- Documented manual deletion via Admin API endpoints
	Provides a complete guide for data cleanup after benchmark testing.
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-31	Update	Paul Buetow

2025-12-31	Enable Prometheus admin API for data deletion	Paul Buetow
	Added web.enable-admin-api flag to allow selective deletion of time series data via the /api/v1/admin/tsdb endpoints. This enables cleanup of benchmark data using the delete_series and clean_tombstones APIs.
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-31	Increase Prometheus out-of-order window to 31 days	Paul Buetow
	Changes:
	- outOfOrderTimeWindow: 720h → 744h (30 days → 31 days)
	Rationale: Provides a 1-day buffer for 30-day backfill operations to avoid edge-case rejections where the oldest samples exceed the limit due to timing variations between data generation and ingestion.
	With this configuration:
	- 30-day benchmarks achieve a 99.85% success rate (vs 50% with 720h)
	- Only 4/2592 batches rejected (the first few batches were slightly over 30d)
	- Allows safe backfilling of up to 30 days of historic data
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-31	Enable Prometheus historic data ingestion with out-of-order support	Paul Buetow
	This commit configures Prometheus to accept historic data via the Remote Write API, enabling backfilling of test metrics for development and troubleshooting purposes.
	Changes:
	- Enable Remote Write receiver (--web.enable-remote-write-receiver)
	- Enable out-of-order ingestion with 30-day window (720h)
	- Enable exemplar-storage and otlp-write-receiver features
	- Add Epimetheus dashboard ConfigMap for Grafana provisioning
	- Remove old prometheus-pusher directory (moved to separate repo)
	- Document configuration, use cases, and performance considerations
	Configuration allows backfilling data up to 30 days in the past, supporting tools like Epimetheus for generating synthetic historic metrics.
	Performance note: This is optimized for ad-hoc troubleshooting, not production use. Out-of-order ingestion increases memory usage and TSDB overhead and may impact query performance.
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30	Enable Prometheus Remote Write receiver for historic data ingestion	Paul Buetow
	Updated persistence-values.yaml to enable the Remote Write receiver using the correct flag for Prometheus 3.x:
	- Changed from enableFeatures (the remote-write-receiver feature flag is not supported in 3.8.1)
	- To additionalArgs with web.enable-remote-write-receiver
	This allows Epimetheus to push historic data with preserved timestamps via the Prometheus Remote Write API endpoint (/api/v1/write).
	Applied via: just upgrade
	🤖 Generated with Claude Code
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30	Consolidate all documentation into single comprehensive README.md	Paul Buetow
	- Merged content from 10 separate .md files into README.md
	- Removed: ANSWER.md, AUTO-MODE.md, DASHBOARD.md, HISTORIC.md, LIMITATIONS.md, QUERY_EXAMPLES.md, QUICK-START.md, SETUP-COMPLETE.md, SUMMARY.md, USAGE.md
	- README.md now includes:
	  * Quick start guide
	  * All operating modes (realtime, historic, backfill, auto)
	  * Data formats (CSV, JSON)
	  * Test metrics documentation
	  * Grafana dashboard setup
	  * Example queries and curl commands
	  * Time range limitations
	  * Troubleshooting guide
	  * Architecture diagram
	  * Best practices
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30	Rename test metrics with clear prefix and add Grafana dashboard	Paul Buetow
	Renamed all test metrics with "prometheus_pusher_test_" prefix to clearly indicate they are generated by the prometheus-pusher testing/demo functionality.
	Metric renaming:
	- app_requests_total → prometheus_pusher_test_requests_total
	- app_active_connections → prometheus_pusher_test_active_connections
	- app_temperature_celsius → prometheus_pusher_test_temperature_celsius
	- app_request_duration_seconds → prometheus_pusher_test_request_duration_seconds
	- app_jobs_processed_total → prometheus_pusher_test_jobs_processed_total
	Grafana Dashboard:
	- Created comprehensive dashboard with 8 panels
	- Request rate and total requests visualization
	- Active connections gauge (0-100 with thresholds)
	- Temperature gauge (0-50°C with thresholds)
	- Request duration percentiles (p50, p90, p99)
	- Average request duration stat
	- Jobs processed by type (bar gauge)
	- Jobs status breakdown table
	- Auto-refresh every 10s, 15-minute default time range
	Files added:
	- grafana-dashboard.json: Dashboard definition
	- deploy-dashboard.sh: Automated deployment script
	- DASHBOARD.md: Complete dashboard documentation
	Updated:
	- internal/metrics/generator.go: Renamed metric names
	- internal/ingester/remotewrite.go: Updated historic metric names
	- internal/ingester/remotewrite_test.go: Updated test expectations
	Tests updated and passing with 63.9% coverage maintained.
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
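Because the renaming is a uniform prefix swap, saved queries or dashboards that still reference the old names can be translated mechanically. The helper below is a hypothetical sketch, not code from the project; note that it would also rename any unrelated metric that happens to start with `app_`.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	oldPrefix = "app_"
	newPrefix = "prometheus_pusher_test_"
)

// renamedMetric translates a pre-rename demo metric name into its new
// prometheus_pusher_test_* form; other names are returned unchanged.
func renamedMetric(name string) string {
	if strings.HasPrefix(name, oldPrefix) {
		return newPrefix + strings.TrimPrefix(name, oldPrefix)
	}
	return name
}

func main() {
	for _, n := range []string{
		"app_requests_total",
		"app_request_duration_seconds_bucket", // histogram series keep their suffixes
		"up",
	} {
		fmt.Printf("%s -> %s\n", n, renamedMetric(n))
	}
}
```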
2025-12-30	Add comprehensive Prometheus query examples documentation	Paul Buetow
	Document demonstrates actual curl commands and their outputs for all metric types ingested by prometheus-pusher:
	- Counter metrics (app_requests_total)
	- Gauge metrics (app_temperature_celsius, app_active_connections)
	- Histogram metrics (app_request_duration_seconds with buckets, sum, count)
	- Labeled counter metrics (app_jobs_processed_total with multiple label combinations)
	Includes:
	- Complete curl commands
	- Actual JSON responses from Prometheus API
	- Explanations of each metric type
	- Additional query examples (filters, ranges, aggregations)
	Verifies data ingestion works correctly with real query results.
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
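The same queries can be issued programmatically. This is a minimal sketch assuming Prometheus at `http://localhost:9090`: it URL-encodes a PromQL expression for `GET /api/v1/query` and decodes the instant-vector response shape, where each `value` is a `[unix_seconds, "string_value"]` pair. The JSON body in `main` is hand-written in that format, not a captured result.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// queryResponse covers the instant-query ("vector") shape of /api/v1/query.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"` // [unix seconds (float), value (string)]
		} `json:"result"`
	} `json:"data"`
}

// instantQueryURL URL-encodes a PromQL expression for GET /api/v1/query.
func instantQueryURL(base, promql string) string {
	return base + "/api/v1/query?" + url.Values{"query": {promql}}.Encode()
}

// firstValue returns the value string of the first vector element.
func firstValue(body []byte) (string, error) {
	var r queryResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", fmt.Errorf("decode query response: %w", err)
	}
	if r.Status != "success" || r.Data.ResultType != "vector" || len(r.Data.Result) == 0 {
		return "", fmt.Errorf("no vector result (status %q)", r.Status)
	}
	v, ok := r.Data.Result[0].Value[1].(string)
	if !ok {
		return "", fmt.Errorf("unexpected value type %T", r.Data.Result[0].Value[1])
	}
	return v, nil
}

func main() {
	fmt.Println(instantQueryURL("http://localhost:9090", "rate(app_requests_total[5m])"))

	// Hand-written response in the documented format (not a captured result).
	body := []byte(`{"status":"success","data":{"resultType":"vector","result":[` +
		`{"metric":{"__name__":"app_requests_total"},"value":[1735600000.5,"42"]}]}}`)
	v, err := firstValue(body)
	fmt.Println(v, err)
}
```

Prometheus returns sample values as strings so that special values such as `NaN` and `+Inf` survive JSON encoding, which is why the sketch leaves parsing the number to the caller.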
2025-12-30	Add comprehensive unit tests with 63.9% coverage	Paul Buetow
	Implemented unit tests across all internal packages to achieve 63.9% test coverage, exceeding the 60% target.
	Test coverage by package:
	- internal/config: 100.0% (config validation, constants)
	- internal/metrics: 100.0% (Sample methods, Collectors, Simulate)
	- internal/parser: 92.3% (CSV/JSON parsing, format detection)
	- internal/ingester: 44.9% (auto routing, time series conversion)
	New test files:
	- internal/config/config_test.go: Config creation and constants
	- internal/metrics/sample_test.go: Sample type methods (Age, IsRecent)
	- internal/metrics/generator_test.go: Collectors and simulation
	- internal/parser/csv_test.go: CSV parsing with various inputs
	- internal/parser/json_test.go: JSON parsing and validation
	- internal/parser/parser_test.go: Parser factory and format handling
	- internal/ingester/auto_test.go: Auto mode routing logic
	- internal/ingester/remotewrite_test.go: Time series conversion
	- internal/ingester/pushgateway_test.go: Pushgateway ingester
	Tests cover:
	- Happy path and error cases
	- Context cancellation support
	- Edge cases (empty input, invalid formats)
	- Label parsing and timestamp handling
	- Metric type generation (counter, gauge, histogram)
	- Table-driven tests for comprehensive coverage
	All 50+ tests passing ✅
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30	Refactor prometheus-pusher following Go best practices	Paul Buetow
	Major refactoring to improve code organization and follow Go conventions:
	- Moved main entry point to cmd/prometheus-pusher/main.go
	- Organized code into internal packages (config, metrics, parser, ingester, version)
	- Implemented proper dependency injection (no package-level variables)
	- Added context.Context to all blocking operations
	- Used value semantics where feasible (Sample, Config, Ingesters)
	- Proper error wrapping with %w throughout
	- All functions under 50 lines, focused and single-purpose
	- Consistent ordering: constants, types, constructors, public, private
	- Added -version flag to display version from internal/version package
	Package structure:
	- cmd/prometheus-pusher: Main entry point with flag parsing and mode routing
	- internal/config: Configuration types and constants
	- internal/version: Version constant (0.0.0)
	- internal/metrics: Sample type and Collectors for metric generation
	- internal/parser: CSV/JSON parsers with context support
	- internal/ingester: Pushgateway, RemoteWrite, and Auto ingesters
	All modes tested and working: realtime, historic, backfill, auto
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30	Add documentation and tests for ingestion time range limits	Paul Buetow

2025-12-30	Add QUICK-START.md for single binary reference	Paul Buetow

2025-12-30	Consolidate into single prometheus-pusher binary	Paul Buetow
	Unified all functionality into one binary instead of multiple variants.
	## Changes
	- Removed: prometheus-pusher-auto, prometheus-pusher-historic
	- Single binary: prometheus-pusher (supports all modes)
	- Updated all documentation to reference single binary
	- Updated run.sh to use unified binary
	## Usage
	One binary, four modes:
	```bash
	# Realtime mode (default)
	./prometheus-pusher -mode=realtime -continuous
	# Historic mode (single datapoint)
	./prometheus-pusher -mode=historic -hours-ago=24
	# Backfill mode (range of datapoints)
	./prometheus-pusher -mode=backfill -start-hours=48 -end-hours=0 -interval=1
	# Auto mode (automatic timestamp detection)
	./prometheus-pusher -mode=auto -file=data.csv
	```
	All features accessible from one unified tool!
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
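One way a single binary can expose the four modes is a `flag.FlagSet` plus a validated `-mode` value. The sketch below reuses the flag names from the usage examples; the default values (other than `-mode=realtime`) and the `options` struct are assumptions, not the tool's actual definitions. Parsing into a dedicated FlagSet rather than the global one keeps the function testable without touching `os.Args`.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds the flags shown in the usage examples (defaults are assumed).
type options struct {
	mode       string
	continuous bool
	hoursAgo   int
	startHours int
	endHours   int
	interval   int
	file       string
}

// parseOptions parses args with its own FlagSet so it can be tested without os.Args.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("prometheus-pusher", flag.ContinueOnError)
	fs.StringVar(&o.mode, "mode", "realtime", "realtime, historic, backfill or auto")
	fs.BoolVar(&o.continuous, "continuous", false, "keep pushing in realtime mode")
	fs.IntVar(&o.hoursAgo, "hours-ago", 1, "age of the single historic datapoint")
	fs.IntVar(&o.startHours, "start-hours", 24, "backfill start, in hours ago")
	fs.IntVar(&o.endHours, "end-hours", 0, "backfill end, in hours ago")
	fs.IntVar(&o.interval, "interval", 1, "backfill step, in hours")
	fs.StringVar(&o.file, "file", "", "input file for auto mode (stdin if empty)")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	switch o.mode {
	case "realtime", "historic", "backfill", "auto":
		return o, nil
	default:
		return o, fmt.Errorf("unknown -mode %q", o.mode)
	}
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```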
2025-12-30	Add ANSWER.md documenting support for all time ranges	Paul Buetow

2025-12-30	Add AUTO mode with automatic timestamp detection	Paul Buetow
	This commit adds intelligent auto-detection of data age, automatically choosing the appropriate ingestion method without user intervention.
	## New Features
	1. AUTO Mode (-mode=auto)
	- Automatically detects timestamp age from input data
	- Routes realtime data (< 5min) → Pushgateway
	- Routes historic data (> 5min) → Remote Write API
	- No manual timestamp calculation needed!
	2. Input Format Support
	- CSV format: metric_name,labels,value,timestamp_ms
	- JSON format: [{metric, labels, value, timestamp_ms}]
	- Read from file (-file=path) or stdin
	- Comments supported in CSV (#)
	3. Smart Routing Logic
	- 5-minute threshold determines ingestion method
	- Handles mixed data (current + historic) in single import
	- Clear logging shows which method is used for each sample
	4. Test Data Generation
	- generate-test-data.sh creates samples for all time ranges
	- Demonstrates: current, 1 hour, 1 day, 1 week, and 1 month old data
	- Actual timestamps generated dynamically
	## Files Added
	- auto-ingest.go: Core auto-detection logic
	- AUTO-MODE.md: Complete documentation
	- generate-test-data.sh: Test data generator
	- test-data.csv: Example data template
	- test-all-ages.csv: Generated test data (all ages)
	## Example Usage
	```bash
	# Generate test data
	./generate-test-data.sh
	# Auto-import (detects ages automatically)
	./prometheus-pusher-auto \
	  -mode=auto \
	  -file=test-all-ages.csv \
	  -pushgateway=http://localhost:9091 \
	  -prometheus=http://localhost:9090/api/v1/write
	```
	## Output Example
	```
	📊 Auto-ingest summary:
	Total samples: 15
	Realtime samples (< 5min old): 3
	Historic samples (> 5min old): 12
	🔄 Ingesting 3 REALTIME samples via Pushgateway...
	⏰ Ingesting 12 HISTORIC samples via Remote Write...
	[1/12] app_requests_total (age: 1.0 hours)
	[2/12] app_temperature_celsius (age: 1.0 days)
	...
	🎉 Auto-ingest complete!
	```
	## Supported Time Ranges
	✅ Current data (< 5min)
	✅ 1 hour old data
	✅ 1 day old data
	✅ 1 week old data
	✅ 1 month old data
	All ages are automatically detected and routed correctly!
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30	Add historic data ingestion support to prometheus-pusher	Paul Buetow
	This commit extends prometheus-pusher to support ingesting historic data with custom timestamps via Prometheus Remote Write API.
	## Key Changes
	1. New Historic Data Module (historic.go)
	- GenerateHistoricMetrics: Creates metrics for specific past timestamps
	- PushHistoricData: Sends single datapoint via Remote Write API
	- BackfillHistoricData: Backfills range of historic data
	- Uses Protobuf + Snappy encoding per Prometheus spec
	2. Enhanced Main Binary (main.go, realtime.go)
	- Refactored to support multiple modes
	- Mode 1: realtime - Push to Pushgateway (original behavior)
	- Mode 2: historic - Push single historic datapoint
	- Mode 3: backfill - Backfill range of historic data
	- Command-line flags for configuration
	3. Prometheus Configuration (persistence-values.yaml)
	- Added web.enable-remote-write-receiver flag
	- Enables Prometheus to accept timestamped samples via Remote Write API
	- Required for historic data ingestion
	4. Documentation (HISTORIC.md)
	- Complete guide for historic data ingestion
	- Explains limitations and best practices
	- Examples for all three modes
	- Troubleshooting guide
	## Technical Details
	Problem: Pushgateway doesn't support custom timestamps; Prometheus always uses "now" when scraping. This prevents backfilling historic data.
	Solution: Use Prometheus Remote Write API which accepts timestamped samples. Requires enabling --web.enable-remote-write-receiver flag.
	Data Format: Protobuf (prompb.WriteRequest) with Snappy compression
	Use Cases:
	- Backfill missing data (e.g., during outage)
	- Import historic data from other systems
	- Testing with specific timestamps
	- Data migration scenarios
	## Usage Examples
	```bash
	# Realtime mode (original behavior)
	./prometheus-pusher-historic -mode=realtime -continuous
	# Push data from 24 hours ago
	./prometheus-pusher-historic -mode=historic -hours-ago=24
	# Backfill last 48 hours with 1-hour intervals
	./prometheus-pusher-historic -mode=backfill -start-hours=48 -end-hours=0 -interval=1
	```
	## Dependencies Added
	- github.com/prometheus/prometheus (for prompb package)
	- github.com/golang/snappy (for compression)
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30	Add Prometheus Pushgateway and data ingestion tool	Paul Buetow
	This commit adds a complete Prometheus data ingestion solution:
	1. Pushgateway Helm Chart (f3s/pushgateway/)
	- Standalone helm chart for Prometheus Pushgateway
	- Deployed to monitoring namespace
	- Receives pushed metrics via HTTP POST on port 9091
	2. Prometheus Pusher (f3s/prometheus-pusher/)
	- Standalone Go binary (12MB) for pushing metrics to Pushgateway
	- Demonstrates the counter, gauge, and histogram metric types:
	  * Counter (app_requests_total)
	  * Gauge (app_active_connections, app_temperature_celsius)
	  * Histogram (app_request_duration_seconds)
	  * Labeled Counter (app_jobs_processed_total)
	- Pushes metrics every 15 seconds
	- Includes comprehensive documentation and examples
	3. Prometheus Configuration
	- Updated additional-scrape-configs.yaml to scrape Pushgateway
	- Uses honor_labels to preserve pushed metric labels
	Architecture: Go Binary → Pushgateway → Prometheus → Grafana
	The pusher binary generates realistic example metrics and pushes them to Pushgateway in Prometheus text format. Prometheus then scrapes the Pushgateway and makes the metrics available for querying and alerting.
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30	Fix ArgoCD to preserve password on helm upgrade	Paul Buetow
	Remove fixed password from values.yaml so helm upgrade doesn't reset the admin password when users change it via UI.
	Changes:
	- Remove argocdServerAdminPassword from values.yaml
	- Leave password generation to ArgoCD default behavior
	- Update Justfile install message to show get-password command
	Behavior now:
	- helm install: Generates random password in argocd-initial-admin-secret
	- helm upgrade: Preserves existing password (does NOT reset)
	- helm uninstall: Deletes secret along with all resources
	- User password changes via UI are preserved
	Verified:
	- Password hash unchanged after helm upgrade ✅
	- Secret deleted on helm uninstall ✅
	- Login works before and after upgrade ✅
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30	Fix ArgoCD secret management - delete with helm uninstall	Paul Buetow
	Change admin password handling so it's properly managed by Helm and gets deleted when running helm uninstall, while using a fixed password instead of random generation.
	Changes:
	- Set fixed argocdServerAdminPassword in values.yaml
	- Remove configs.secret.createSecret: false (use Helm default: true)
	- Remove argocd-secret.yaml (Helm creates it now)
	- Update Justfile to not apply manual secret
	- Password: "argocd-admin-default" (bcrypt hash in values.yaml)
	Behavior:
	- helm install: Creates secret with fixed password
	- helm upgrade: Updates secret to fixed password (resets any UI changes)
	- helm uninstall: Deletes secret along with all resources
	- Secret has Helm annotations (managed by Helm)
	This is standard Helm behavior: the password in values.yaml is the source of truth. Users can change it via the UI, but helm operations will reset it to the configured value.
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30	Configure ArgoCD to preserve admin password across redeployments	Paul Buetow
	Ensure admin password persists through helm uninstall/install cycles by managing argocd-secret outside of Helm's control.
	Changes:
	- Set configs.secret.createSecret: false in values.yaml
	- Create argocd-secret.yaml with default admin password
	- Update Justfile to apply secret before helm install
	- Secret is now managed by kubectl, not Helm
	- Default password: "argocd-admin-default" (change after first login)
	Benefits:
	- Admin password survives helm uninstall/install
	- Password changes via UI/CLI are preserved
	- No random password regeneration on redeployments
	- Secret has no Helm annotations (not managed by Helm)
	The argocd-secret will persist across redeployments unless explicitly deleted. PVC and admin password are now both persistent.
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30	Configure persistent cache for ArgoCD repo-server	Paul Buetow
	Enable PVC mount for ArgoCD repo-server to cache Git repositories and generated manifests, improving performance for subsequent deployments.
	Changes:
	- Mount argocd-repo-server-pvc at /home/argocd/repo-cache
	- Set XDG_CACHE_HOME environment variable to use persistent cache
	- Avoid conflict with default /tmp mount used by ArgoCD
	This ensures Git repo clones and Helm charts are cached persistently across pod restarts, reducing network traffic and speeding up syncs.
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30	Add ArgoCD deployment to cicd namespace	Paul Buetow
	Deploy ArgoCD v3.2.3 for GitOps continuous delivery in the k3s cluster.
	Configuration:
	- New cicd namespace for CI/CD tooling
	- Non-HA single instance deployment (following cluster patterns)
	- Traefik ingress at argocd.f3s.buetow.org
	- Prometheus ServiceMonitor integration for metrics
	- 10Gi persistent volume for repo-server cache
	- Insecure mode with TLS termination at proxy
	Components deployed:
	- argocd-server (Web UI and API)
	- argocd-repo-server (Repository management)
	- argocd-application-controller (Application sync)
	- argocd-redis (State cache)
	- argocd-applicationset-controller (Multi-app management)
	Also adds argocd.f3s.buetow.org to frontends Rexfile for relayd proxy configuration.
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-30	Add automatic fallback for f3s hosts when cluster is down	Paul Buetow
	Configure OpenBSD relayd and httpd to serve a friendly fallback page when the f3s Kubernetes cluster is unreachable.
	Changes to relayd.conf.tpl:
	- Reorder relay forward statements: f3s first, localhost as backup
	- Remove protocol-level forward rules for f3s hosts to enable relay-level failover
	- Add explicit localhost routing for non-f3s hosts
	- Health checks on f3s table trigger automatic failover to localhost
	Changes to httpd.conf.tpl:
	- Add request rewrite directive to serve fallback page for ALL paths
	- Prevents 404 errors for deep links like /login?redirect=/files/
	- Ensures consistent fallback experience regardless of requested URL
	When all f3s nodes fail health checks, traffic automatically routes to localhost:8080 serving static fallback content from /var/www/htdocs/f3s_fallback.
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
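The failover decision can be modelled in a few lines, purely as an illustration of the behaviour described above and not of relayd's internals. In this sketch the `backend` type and node addresses are hypothetical; `127.0.0.1:8080` stands for the local fallback httpd. relayd itself spreads connections across all healthy hosts in a table according to its scheduling mode, whereas this model simply takes the first healthy one.

```go
package main

import "fmt"

// backend is a hypothetical relay target plus the outcome of its last health check.
type backend struct {
	addr    string
	healthy bool
}

// pickTarget prefers any f3s node that passes its health check and uses the
// local fallback httpd only when none does.
func pickTarget(f3s []backend, fallback string) string {
	for _, b := range f3s {
		if b.healthy {
			return b.addr
		}
	}
	return fallback
}

func main() {
	const fallback = "127.0.0.1:8080" // local httpd serving the static fallback page
	nodes := []backend{{"node-a:80", false}, {"node-b:80", true}}
	fmt.Println(pickTarget(nodes, fallback)) // node-b:80

	for i := range nodes {
		nodes[i].healthy = false // whole cluster down
	}
	fmt.Println(pickTarget(nodes, fallback)) // 127.0.0.1:8080
}
```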
2025-12-30	fallback page display	Paul Buetow

2025-12-28	Clean up Tempo datasource ConfigMap formatting	Paul Buetow
	Remove unnecessary quotes and comments from the Tempo datasource ConfigMap. This file is now deprecated in favor of the unified grafana-datasources-all.yaml approach, but it is kept cleaned up for historical reference.
	Changes:
	- Remove quotes from string values (datasourceUid, spanStartTimeShift, etc.)
	- Remove inline comments
	- Format tags array properly
	- Standardize YAML formatting
	Note: This ConfigMap is no longer used. Datasources are now provisioned via direct ConfigMap mounting using grafana-datasources-all.yaml.
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-28	Fix distributed tracing by excluding health checks from instrumentation	Paul Buetow
	Problem:
	- Only health check traces appeared in Tempo
	- API endpoint traces (/api/process) were not visible
	- Alloy OTLP receivers were not listening (needed restart)
	Root Causes:
	1. Health check endpoints were creating massive trace volume from Kubernetes probes
	2. Batch processor (100 spans) was filling with health checks before API traces could export
	3. Alloy DaemonSet needed restart to activate OTLP receivers after configuration update
	Solution:
	1. Restarted Alloy to activate OTLP gRPC (4317) and HTTP (4318) receivers
	2. Excluded /health endpoint from Flask auto-instrumentation in all three services:
	   - frontend: FlaskInstrumentor().instrument_app(app, excluded_urls="/health")
	   - middleware: FlaskInstrumentor().instrument_app(app, excluded_urls="/health")
	   - backend: FlaskInstrumentor().instrument_app(app, excluded_urls="/health")
	Result:
	✅ Distributed traces now visible in Tempo with full span chains
	✅ Single /api/process request creates 8 spans across 3 services:
	   - Frontend: GET /api/process, frontend-process, POST (200ms)
	   - Middleware: POST /api/transform, middleware-transform, GET (180ms)
	   - Backend: GET /api/data, backend-get-data (100ms)
	✅ Complete request flow traced: frontend → middleware → backend
	✅ Node graph will now show service dependencies
	✅ Traces-to-logs and traces-to-metrics correlation enabled
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-28	Fix Grafana datasource provisioning by switching to direct ConfigMap mounting	Paul Buetow
	After extensive debugging (documented in problem.md), resolved the issue where Tempo and Loki datasources would not appear in Grafana despite correct configuration.
	Root Cause:
	- Sidecar-based provisioning with label discovery was not triggering the provisioner module
	- Multi-step indirection (sidecar → watch → write → reload) had silent failures
	Solution (following x-rag pattern):
	- Disabled sidecar datasource provisioning
	- Created unified grafana-datasources-all.yaml with all datasources
	- Mount ConfigMap directly to /etc/grafana/provisioning/datasources/
	- Grafana now reads datasources on startup via built-in provisioning
	Changes:
	- NEW: grafana-datasources-all.yaml - Unified datasource configuration (Prometheus, Alertmanager, Loki, Tempo)
	- MODIFIED: persistence-values.yaml - Disabled sidecar, added extraVolumes/extraVolumeMounts
	- MODIFIED: Justfile - Updated to use unified ConfigMap, removed patch script
	- MODIFIED: README.md - Documented new provisioning approach
	- NEW: problem.md - Complete debugging journey with 16 attempts documented
	- DEPRECATED: loki-datasource.yaml, tempo-datasource.yaml, patch-datasources.sh (kept for history)
	Result:
	✅ All datasources now successfully provision on Grafana startup
	✅ Tempo datasource (uid=tempo) appears in Grafana with traces-to-logs correlation
	✅ Loki datasource (uid=loki) appears in Grafana
	✅ Simple, maintainable approach without sidecar complexity
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-28	Add Grafana Tempo distributed tracing with demo application	Paul Buetow
	- Deploy Grafana Tempo in monolithic mode for distributed tracing
	  - Configure Tempo with OTLP receivers (gRPC:4317, HTTP:4318)
	  - Set up 10Gi filesystem storage with 7-day retention
	  - Integrate Tempo datasource in Grafana with traces-to-logs and traces-to-metrics correlation
	- Update Grafana Alloy to collect and forward traces
	  - Add OTLP receiver configuration to alloy-values.yaml
	  - Configure batch processor for efficient trace forwarding to Tempo
	  - Patch Alloy service to expose OTLP ports 4317/4318
	- Create demo tracing application (frontend, middleware, backend)
	  - Implement three-tier Python Flask application with OpenTelemetry instrumentation
	  - Auto-instrument with OpenTelemetry for Flask and requests libraries
	  - Push Docker images to private registry (registry.lan.buetow.org:30001)
	  - Deploy via Helm chart with Traefik ingress at tracing-demo.f3s.buetow.org
	- Update Grafana configuration in prometheus/persistence-values.yaml
	  - Add Tempo to additionalDataSources for automatic provisioning
	Files added:
	- tempo/values.yaml: Tempo Helm chart configuration
	- tempo/persistent-volumes.yaml: Storage configuration (10Gi PV/PVC)
	- tempo/datasource-configmap.yaml: Grafana datasource with correlations
	- tempo/Justfile: Installation automation
	- tempo/README.md: Documentation
	- tracing-demo/docker/frontend/: Python Flask frontend with OTel
	- tracing-demo/docker/middleware/: Python Flask middleware with OTel
	- tracing-demo/docker/backend/: Python Flask backend with OTel
	- tracing-demo/helm-chart/: Kubernetes deployments, services, ingress
	- tracing-demo/docker-image-Justfile: Docker build/push automation
	- tracing-demo/Justfile: Helm deployment automation
	- tracing-demo/README.md: Documentation
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-28	Add comprehensive ZFS monitoring for FreeBSD servers	Paul Buetow
	Implemented complete ZFS monitoring solution including ARC cache statistics, pool health/capacity metrics, dataset usage, and I/O throughput monitoring.
	Changes:
	- Add ZFS recording rules (9 calculated metrics for ARC hit rates, memory usage, etc.)
	- Add comprehensive Grafana dashboard with 19 panels across 5 rows:
	  * Pool Overview: capacity, health, size, free space, usage trends
	  * I/O Throughput: read/write operations and bytes per second
	  * Dataset Statistics: table showing all datasets with usage details
	  * ARC Cache Statistics: hit rates, size, memory usage
	  * ARC Breakdown: data vs metadata, MRU vs MFU with pie charts
	- Update Justfile to deploy ZFS recording rules
	- Add textfile collector script on FreeBSD servers (f0, f1, f2) for pool/dataset metrics
	Metrics collected:
	- Pool: size, allocated, free, capacity %, health status
	- I/O: read/write operations and throughput (via zpool iostat)
	- Dataset: used, available, referenced space per filesystem
	- ARC: hit rate, size, memory usage, data/metadata breakdown
	Fixes:
	- Pool health panel properly displays ONLINE/DEGRADED/FAULTED status
	- All stat panels have correct options configuration
	🤖 Generated with [Claude Code](https://claude.com/claude-code)
	Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
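A textfile collector like the one described turns `zpool` output into lines that node_exporter can expose. The committed script is not shown here, so the Go sketch below is only an illustration under stated assumptions. It assumes the input comes from `zpool list -Hp -o name,size,alloc,free,health` (tab-separated, exact byte counts, no header), and every metric name it emits is hypothetical. Capacity is derived from alloc/size instead of parsing the `cap` column.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// poolMetrics converts assumed `zpool list -Hp -o name,size,alloc,free,health`
// output into Prometheus text-format lines (metric names are hypothetical).
// %q quoting matches Prometheus label quoting for plain ASCII pool names.
func poolMetrics(zpoolOut string) ([]string, error) {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(zpoolOut))
	for sc.Scan() {
		f := strings.Split(sc.Text(), "\t")
		if len(f) != 5 {
			continue // skip blank or unexpected lines
		}
		name, health := f[0], f[4]
		vals := make([]uint64, 3)
		for i, s := range f[1:4] {
			v, err := strconv.ParseUint(s, 10, 64)
			if err != nil {
				return nil, fmt.Errorf("pool %s: field %d: %w", name, i+1, err)
			}
			vals[i] = v
		}
		size, alloc, free := vals[0], vals[1], vals[2]
		capPct := 0.0
		if size > 0 {
			capPct = 100 * float64(alloc) / float64(size)
		}
		out = append(out,
			fmt.Sprintf("zfs_pool_size_bytes{pool=%q} %d", name, size),
			fmt.Sprintf("zfs_pool_allocated_bytes{pool=%q} %d", name, alloc),
			fmt.Sprintf("zfs_pool_free_bytes{pool=%q} %d", name, free),
			fmt.Sprintf("zfs_pool_capacity_percent{pool=%q} %.1f", name, capPct),
			fmt.Sprintf("zfs_pool_health{pool=%q,state=%q} 1", name, health),
		)
	}
	return out, sc.Err()
}

func main() {
	// Made-up input in the assumed -Hp format.
	sample := "zroot\t1000\t250\t750\tONLINE\nzdata\t4000\t3000\t1000\tDEGRADED\n"
	lines, err := poolMetrics(sample)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Exposing health as a `state` label with value 1 lets a Grafana stat panel display the ONLINE/DEGRADED/FAULTED string directly.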
2025-12-26	add webdav	Paul Buetow

2025-12-26	move	Paul Buetow

2025-12-26	fix	Paul Buetow

2025-12-26	jo	Paul Buetow

2025-12-26	fix	Paul Buetow

2025-12-26	delete filerise	Paul Buetow

2025-12-25	observability: enable etcd metrics scraping	Paul Buetow
	- Enable etcd metrics on port 2381
	- Add blog post draft documenting the changes
2025-12-25	revert: undo all observability changes from today	Paul Buetow
	Reverts hostname relabeling and etcd metrics changes
2025-12-25	observability: node-exporter hostnames + etcd metrics	Paul Buetow
	- Add relabel_configs to show hostnames for node-exporter targets
	- Enable etcd metrics scraping on port 2381
	- Update blog post draft
2025-12-25	observability: display hostnames instead of IPs, enable etcd metrics	Paul Buetow
	- Add relabel_configs to additional-scrape-configs.yaml for FreeBSD/OpenBSD hosts
	- Add node name relabeling for node-exporter on k3s nodes
	- Enable etcd metrics scraping with hostname relabeling
	- Add DRAFT blog post documenting the changes
	Amp-Thread-ID: https://ampcode.com/threads/T-019b571c-4afc-7789-becf-bc8a3c4e1e1f
	Co-authored-by: Amp <amp@ampcode.com>
2025-12-25	use hosts not IPs	Paul Buetow

2025-12-07	add openbsd routing rules	Paul Buetow

2025-12-06	add openbsd node exporters	Paul Buetow

2025-12-06	add more	Paul Buetow

2025-12-06	more on this	Paul Buetow

2025-12-05	Fix Loki to use NFS persistent volume	Paul Buetow

2025-12-05	Add Grafana Loki with Alloy for log collection	Paul Buetow

2025-12-05	Fix Loki URL in README	Paul Buetow

2025-12-05	Add Grafana Loki deployment	Paul Buetow

2025-12-05	Add keybr.com typing tutor deployment	Paul Buetow
	Amp-Thread-ID: https://ampcode.com/threads/T-ccf9cd44-5adf-4633-9f3d-d822f733af4d
	Co-authored-by: Amp <amp@ampcode.com>