Age | Commit message | Author
2025-12-28 | Clean up Tempo datasource ConfigMap formatting | Paul Buetow

Remove unnecessary quotes and comments from the Tempo datasource ConfigMap. This file is now deprecated in favor of the unified grafana-datasources-all.yaml approach, but keeping it cleaned up for historical reference.

Changes:
- Remove quotes from string values (datasourceUid, spanStartTimeShift, etc.)
- Remove inline comments
- Format tags array properly
- Standardize YAML formatting

Note: This ConfigMap is no longer used. Datasources are now provisioned via direct ConfigMap mounting using grafana-datasources-all.yaml.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-28 | Fix distributed tracing by excluding health checks from instrumentation | Paul Buetow

Problem:
- Only health check traces appeared in Tempo
- API endpoint traces (/api/process) were not visible
- Alloy OTLP receivers were not listening (needed restart)

Root Causes:
1. Health check endpoints were creating massive trace volume from Kubernetes probes
2. Batch processor (100 spans) was filling with health checks before API traces could export
3. Alloy DaemonSet needed restart to activate OTLP receivers after configuration update

Solution:
1. Restarted Alloy to activate OTLP gRPC (4317) and HTTP (4318) receivers
2. Excluded /health endpoint from Flask auto-instrumentation in all three services:
   - frontend: FlaskInstrumentor().instrument_app(app, excluded_urls="/health")
   - middleware: FlaskInstrumentor().instrument_app(app, excluded_urls="/health")
   - backend: FlaskInstrumentor().instrument_app(app, excluded_urls="/health")

Result:
✅ Distributed traces now visible in Tempo with full span chains
✅ Single /api/process request creates 8 spans across 3 services:
   - Frontend: GET /api/process, frontend-process, POST (200ms)
   - Middleware: POST /api/transform, middleware-transform, GET (180ms)
   - Backend: GET /api/data, backend-get-data (100ms)
✅ Complete request flow traced: frontend → middleware → backend
✅ Node graph will now show service dependencies
✅ Traces-to-logs and traces-to-metrics correlation enabled

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
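
The fix in this commit depends on how `excluded_urls` is matched. As far as the OpenTelemetry Flask instrumentation documents it, the value is a comma-separated list of regular expressions that are searched against the request URL. The sketch below reproduces that idea with the standard library only. It is not the library's own code, and `build_exclude_matcher` and the sample URLs are hypothetical names chosen for illustration.

```python
import re


def build_exclude_matcher(excluded_urls: str):
    """Return a predicate that tells whether a URL would be skipped.

    Assumption: the setting is a comma-separated list of regexes, and a URL
    is excluded when any of them matches anywhere in it (search, not full
    match).
    """
    patterns = [p.strip() for p in excluded_urls.split(",") if p.strip()]
    if not patterns:
        # Nothing configured: every URL is traced.
        return lambda url: False
    combined = re.compile("|".join(patterns))
    return lambda url: combined.search(url) is not None


is_excluded = build_exclude_matcher("/health")

# Hypothetical request URLs: one from a Kubernetes probe, one from a user.
for url in ("http://frontend:5000/health", "http://frontend:5000/api/process"):
    print(url, "->", "skipped" if is_excluded(url) else "traced")
```

Because the match is a search rather than a full match, a pattern such as `/health` would also skip `/healthz`. A tighter pattern is worth considering if a service has similarly named endpoints that should stay traced.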
2025-12-28 | Fix Grafana datasource provisioning by switching to direct ConfigMap mounting | Paul Buetow

After extensive debugging (documented in problem.md), resolved the issue where Tempo and Loki datasources would not appear in Grafana despite correct configuration.

Root Cause:
- Sidecar-based provisioning with label discovery was not triggering the provisioner module
- Multi-step indirection (sidecar → watch → write → reload) had silent failures

Solution (following x-rag pattern):
- Disabled sidecar datasource provisioning
- Created unified grafana-datasources-all.yaml with all datasources
- Mount ConfigMap directly to /etc/grafana/provisioning/datasources/
- Grafana now reads datasources on startup via built-in provisioning

Changes:
- NEW: grafana-datasources-all.yaml - Unified datasource configuration (Prometheus, Alertmanager, Loki, Tempo)
- MODIFIED: persistence-values.yaml - Disabled sidecar, added extraVolumes/extraVolumeMounts
- MODIFIED: Justfile - Updated to use unified ConfigMap, removed patch script
- MODIFIED: README.md - Documented new provisioning approach
- NEW: problem.md - Complete debugging journey with 16 attempts documented
- DEPRECATED: loki-datasource.yaml, tempo-datasource.yaml, patch-datasources.sh (kept for history)

Result:
✅ All datasources now successfully provision on Grafana startup
✅ Tempo datasource (uid=tempo) appears in Grafana with traces-to-logs correlation
✅ Loki datasource (uid=loki) appears in Grafana
✅ Simple, maintainable approach without sidecar complexity

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
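
To make the unified-file idea concrete, here is a rough sketch that assembles one provisioning document for all four datasources and rejects duplicate uids before it is written out. Only `uid=tempo` and `uid=loki` come from the commit message. The other uids, every URL, and the helper name `build_provisioning` are assumptions for illustration, and the real grafana-datasources-all.yaml may look different. The output is JSON, which YAML parsers generally accept as well.

```python
import json

# Hypothetical entries: the URLs and the prometheus/alertmanager uids are
# placeholders, not values taken from the repository.
DATASOURCES = [
    {"name": "Prometheus", "type": "prometheus", "uid": "prometheus",
     "url": "http://prometheus.example:9090"},
    {"name": "Alertmanager", "type": "alertmanager", "uid": "alertmanager",
     "url": "http://alertmanager.example:9093"},
    {"name": "Loki", "type": "loki", "uid": "loki",
     "url": "http://loki.example:3100"},
    {"name": "Tempo", "type": "tempo", "uid": "tempo",
     "url": "http://tempo.example:3200"},
]


def build_provisioning(datasources):
    """Build a single Grafana-style provisioning document.

    Raises ValueError on duplicate uids, since correlations such as
    traces-to-logs refer to datasources by uid.
    """
    uids = [ds["uid"] for ds in datasources]
    if len(uids) != len(set(uids)):
        raise ValueError("datasource uids must be unique")
    return {
        "apiVersion": 1,
        "datasources": [dict(ds, access="proxy") for ds in datasources],
    }


print(json.dumps(build_provisioning(DATASOURCES), indent=2))
```

Keeping everything in one document mirrors the commit's reasoning: Grafana's built-in provisioning reads one file at startup, so there is no watch or reload step left to fail silently.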
2025-12-28 | Add Grafana Tempo distributed tracing with demo application | Paul Buetow

- Deploy Grafana Tempo in monolithic mode for distributed tracing
  - Configure Tempo with OTLP receivers (gRPC:4317, HTTP:4318)
  - Set up 10Gi filesystem storage with 7-day retention
  - Integrate Tempo datasource in Grafana with traces-to-logs and traces-to-metrics correlation
- Update Grafana Alloy to collect and forward traces
  - Add OTLP receiver configuration to alloy-values.yaml
  - Configure batch processor for efficient trace forwarding to Tempo
  - Patch Alloy service to expose OTLP ports 4317/4318
- Create demo tracing application (frontend, middleware, backend)
  - Implement three-tier Python Flask application with OpenTelemetry instrumentation
  - Auto-instrument with OpenTelemetry for Flask and requests libraries
  - Push Docker images to private registry (registry.lan.buetow.org:30001)
  - Deploy via Helm chart with Traefik ingress at tracing-demo.f3s.buetow.org
- Update Grafana configuration in prometheus/persistence-values.yaml
  - Add Tempo to additionalDataSources for automatic provisioning

Files added:
- tempo/values.yaml: Tempo Helm chart configuration
- tempo/persistent-volumes.yaml: Storage configuration (10Gi PV/PVC)
- tempo/datasource-configmap.yaml: Grafana datasource with correlations
- tempo/Justfile: Installation automation
- tempo/README.md: Documentation
- tracing-demo/docker/frontend/: Python Flask frontend with OTel
- tracing-demo/docker/middleware/: Python Flask middleware with OTel
- tracing-demo/docker/backend/: Python Flask backend with OTel
- tracing-demo/helm-chart/: Kubernetes deployments, services, ingress
- tracing-demo/docker-image-Justfile: Docker build/push automation
- tracing-demo/Justfile: Helm deployment automation
- tracing-demo/README.md: Documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-28 | Add comprehensive ZFS monitoring for FreeBSD servers | Paul Buetow

Implemented complete ZFS monitoring solution including ARC cache statistics, pool health/capacity metrics, dataset usage, and I/O throughput monitoring.

Changes:
- Add ZFS recording rules (9 calculated metrics for ARC hit rates, memory usage, etc.)
- Add comprehensive Grafana dashboard with 19 panels across 5 rows:
  * Pool Overview: capacity, health, size, free space, usage trends
  * I/O Throughput: read/write operations and bytes per second
  * Dataset Statistics: table showing all datasets with usage details
  * ARC Cache Statistics: hit rates, size, memory usage
  * ARC Breakdown: data vs metadata, MRU vs MFU with pie charts
- Update Justfile to deploy ZFS recording rules
- Add textfile collector script on FreeBSD servers (f0, f1, f2) for pool/dataset metrics

Metrics collected:
- Pool: size, allocated, free, capacity %, health status
- I/O: read/write operations and throughput (via zpool iostat)
- Dataset: used, available, referenced space per filesystem
- ARC: hit rate, size, memory usage, data/metadata breakdown

Fixes:
- Pool health panel properly displays ONLINE/DEGRADED/FAULTED status
- All stat panels have correct options configuration

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
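
As an illustration of what a textfile collector for the pool metrics might emit, the sketch below turns tab-separated pool rows into Prometheus text-format lines and computes an ARC hit ratio the way a recording rule would. It assumes input shaped like `zpool list -Hp -o name,size,allocated,free,capacity,health`. The metric names, the sample row, and both function names are hypothetical, and the actual script on f0, f1, and f2 is not shown in this log.

```python
# Hypothetical sample row; a real script would capture the command output.
SAMPLE = "zroot\t1000000000\t250000000\t750000000\t25\tONLINE\n"


def render_pool_metrics(zpool_output: str) -> str:
    """Render one metric line per pool field in Prometheus text format."""
    lines = []
    for row in zpool_output.strip().splitlines():
        name, size, alloc, free, cap, health = row.split("\t")
        label = f'{{pool="{name}"}}'
        lines.append(f"zfs_pool_size_bytes{label} {int(size)}")
        lines.append(f"zfs_pool_allocated_bytes{label} {int(alloc)}")
        lines.append(f"zfs_pool_free_bytes{label} {int(free)}")
        # Tolerate a trailing percent sign in the capacity column.
        lines.append(f"zfs_pool_capacity_percent{label} {int(cap.rstrip('%'))}")
        # Health is exposed as a label so ONLINE/DEGRADED/FAULTED stay visible.
        lines.append(f'zfs_pool_health{{pool="{name}",state="{health}"}} 1')
    return "\n".join(lines) + "\n"


def arc_hit_ratio(hits: int, misses: int) -> float:
    """ARC hit ratio as hits / (hits + misses), or 0.0 with no traffic."""
    total = hits + misses
    return hits / total if total else 0.0


print(render_pool_metrics(SAMPLE), end="")
print("arc hit ratio:", arc_hit_ratio(900, 100))
```

Writing such output to a `.prom` file in the node exporter's textfile directory is the usual way to expose values that the exporter does not collect itself.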
2025-12-26 | add webdav | Paul Buetow
2025-12-26 | move | Paul Buetow
2025-12-26 | fix | Paul Buetow
2025-12-26 | jo | Paul Buetow
2025-12-26 | fix | Paul Buetow
2025-12-26 | delete filrerise | Paul Buetow
2025-12-25 | observability: enable etcd metrics scraping | Paul Buetow

- Enable etcd metrics on port 2381
- Add blog post draft documenting the changes

2025-12-25 | revert: undo all observability changes from today | Paul Buetow

Reverts hostname relabeling and etcd metrics changes

2025-12-25 | observability: node-exporter hostnames + etcd metrics | Paul Buetow

- Add relabel_configs to show hostnames for node-exporter targets
- Enable etcd metrics scraping on port 2381
- Update blog post draft

2025-12-25 | observability: display hostnames instead of IPs, enable etcd metrics | Paul Buetow

- Add relabel_configs to additional-scrape-configs.yaml for FreeBSD/OpenBSD hosts
- Add node name relabeling for node-exporter on k3s nodes
- Enable etcd metrics scraping with hostname relabeling
- Add DRAFT blog post documenting the changes

Amp-Thread-ID: https://ampcode.com/threads/T-019b571c-4afc-7789-becf-bc8a3c4e1e1f
Co-authored-by: Amp <amp@ampcode.com>
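
For readers unfamiliar with `relabel_configs`, the following sketch simulates what a Prometheus `replace` rule does to a target's label set, which is the mechanism behind showing hostnames instead of IPs. It models only the `replace` action with a fully anchored regex. The function name, the node name `r0`, and the address are hypothetical, and the rules in additional-scrape-configs.yaml are not reproduced here.

```python
import re


def apply_replace_rule(labels, source_labels, target_label,
                       regex="(.*)", replacement="$1", separator=";"):
    """Simulate a relabel rule with action 'replace'.

    The source label values are joined with the separator and matched
    against the anchored regex. On a match, the expanded replacement is
    written to the target label; otherwise the labels are unchanged.
    Not modelled: Prometheus drops the target label when the result is empty.
    """
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)
    # Prometheus writes groups as $1 or ${1}; Python's expand() wants \g<1>.
    template = re.sub(r"\$\{?(\w+)\}?", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


# Hypothetical discovered target for a node-exporter pod on a k3s node.
target = {
    "__address__": "192.168.1.120:9100",
    "__meta_kubernetes_node_name": "r0",
}

by_name = apply_replace_rule(target, ["__meta_kubernetes_node_name"], "instance")
by_addr = apply_replace_rule(target, ["__address__"], "instance", regex=r"(.+):\d+")

print(by_name["instance"])
print(by_addr["instance"])
```

The first rule copies the node name into `instance`, and the second only strips the port from the address. Dashboards show a hostname only when a name-bearing source label such as the node name is available.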
2025-12-25 | use hosts not IPs | Paul Buetow
2025-12-07 | add openbsd routing rules | Paul Buetow
2025-12-06 | add openbsd node exporters | Paul Buetow
2025-12-06 | add more | Paul Buetow
2025-12-06 | more on this | Paul Buetow
2025-12-05 | Fix Loki to use NFS persistent volume | Paul Buetow
2025-12-05 | Add Grafana Loki with Alloy for log collection | Paul Buetow
2025-12-05 | Fix Loki URL in README | Paul Buetow
2025-12-05 | Add Grafana Loki deployment | Paul Buetow
2025-12-05 | Add keybr.com typing tutor deployment | Paul Buetow

Amp-Thread-ID: https://ampcode.com/threads/T-ccf9cd44-5adf-4633-9f3d-d822f733af4d
Co-authored-by: Amp <amp@ampcode.com>

2025-12-05 | add keybr.com | Paul Buetow
2025-12-03 | add html | Paul Buetow
2025-12-03 | initial f3s fallback | Paul Buetow
2025-11-22 | add filebrowser | Paul Buetow
2025-11-21 | works now | Paul Buetow
2025-11-21 | initial filerise | Paul Buetow
2025-11-07 | Update | Paul Buetow
2025-11-04 | remove fotos.buetow.org | Paul Buetow
2025-11-02 | use www.* as alt name in certs | Paul Buetow
2025-10-27 | use new gogios | Paul Buetow
2025-10-27 | change to directory | Paul Buetow
2025-10-26 | odd bug workaround | Paul Buetow
2025-10-24 | fix | Paul Buetow
2025-10-24 | add persistent volumes to prometheus/grafana | Paul Buetow
2025-10-22 | add grafana ingress | Paul Buetow
2025-10-22 | add prometheus | Paul Buetow
2025-10-18 | added koreader-sync-server | Paul Buetow
2025-10-08 | also redirect stderr | Paul Buetow
2025-09-27 | more in this | Paul Buetow
2025-09-24 | Update | Paul Buetow
2025-09-23 | new foooodds | Paul Buetow
2025-09-14 | fix | Paul Buetow
2025-09-14 | add | Paul Buetow
2025-09-13 | rename to ex | Paul Buetow
2025-09-13 | add xxx mail domain | Paul Buetow