| Age | Commit message | Author |
|
Remove unnecessary quotes and comments from the Tempo datasource ConfigMap.
This file is now deprecated in favor of the unified grafana-datasources-all.yaml
approach, but keeping it cleaned up for historical reference.
Changes:
- Remove quotes from string values (datasourceUid, spanStartTimeShift, etc.)
- Remove inline comments
- Format tags array properly
- Standardize YAML formatting
Note: This ConfigMap is no longer used. Datasources are now provisioned via
direct ConfigMap mounting using grafana-datasources-all.yaml.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Problem:
- Only health check traces appeared in Tempo
- API endpoint traces (/api/process) were not visible
- Alloy OTLP receivers were not listening (needed restart)
Root Causes:
1. Health check endpoints were creating massive trace volume from Kubernetes probes
2. Batch processor (100 spans) was filling with health checks before API traces could export
3. Alloy DaemonSet needed restart to activate OTLP receivers after configuration update
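Root cause 3 can be checked without reading logs by probing the receiver ports directly. The following is a minimal sketch, assuming plain TCP reachability is a good enough signal that a receiver is up; `is_listening` is a hypothetical helper, not part of Alloy or OpenTelemetry.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (refused, timeout, unreachable)
        # when nothing accepts connections on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_listening(host, 4317)` and `is_listening(host, 4318)` against the Alloy service address (the host name depends on the cluster and is not given here) would show whether the gRPC and HTTP receivers accept connections.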
Solution:
1. Restarted Alloy to activate OTLP gRPC (4317) and HTTP (4318) receivers
2. Excluded /health endpoint from Flask auto-instrumentation in all three services:
   - frontend: FlaskInstrumentor().instrument_app(app, excluded_urls="/health")
   - middleware: FlaskInstrumentor().instrument_app(app, excluded_urls="/health")
   - backend: FlaskInstrumentor().instrument_app(app, excluded_urls="/health")
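The `excluded_urls` argument is documented as a comma-delimited list of regexes. The sketch below approximates that matching with the standard library only, to show which request URLs would be skipped; `make_exclude_check` is a hypothetical stand-in, not the library's own code, and the URLs are made up for illustration.

```python
import re


def make_exclude_check(excluded_urls: str):
    """Build a predicate from a comma-delimited list of regexes (simplified)."""
    patterns = [p.strip() for p in excluded_urls.split(",") if p.strip()]
    if not patterns:
        return lambda url: False
    # Patterns are OR-ed together and searched anywhere in the URL.
    regex = re.compile("|".join(patterns))
    return lambda url: regex.search(url) is not None


is_excluded = make_exclude_check("/health")
print(is_excluded("http://frontend:5000/health"))       # probe request
print(is_excluded("http://frontend:5000/api/process"))  # API request
```

Because the pattern is unanchored in this sketch, any URL containing `/health` would be excluded, so a stricter regex may be preferable if other routes share that prefix.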
Result:
✅ Distributed traces now visible in Tempo with full span chains
✅ Single /api/process request creates 8 spans across 3 services:
   - Frontend: GET /api/process, frontend-process, POST (200ms)
   - Middleware: POST /api/transform, middleware-transform, GET (180ms)
   - Backend: GET /api/data, backend-get-data (100ms)
✅ Complete request flow traced: frontend → middleware → backend
✅ Node graph will now show service dependencies
✅ Traces-to-logs and traces-to-metrics correlation enabled
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
After extensive debugging (documented in problem.md), resolved the issue where Tempo
and Loki datasources would not appear in Grafana despite correct configuration.
Root Cause:
- Sidecar-based provisioning with label discovery was not triggering the provisioner module
- Multi-step indirection (sidecar → watch → write → reload) had silent failures
Solution (following x-rag pattern):
- Disabled sidecar datasource provisioning
- Created unified grafana-datasources-all.yaml with all datasources
- Mounted the ConfigMap directly to /etc/grafana/provisioning/datasources/
- Grafana now reads datasources on startup via built-in provisioning
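For illustration, a provisioning file of the kind Grafana reads from that directory could be generated as follows. This is a sketch under assumptions: only `uid=tempo` and `uid=loki` come from this commit; the other uids and every URL are placeholders, and real files usually carry extra per-datasource settings (for example `jsonData` for the correlations). The output is JSON, which YAML parsers generally accept.

```python
import json

# (name, type, uid, url). URLs are hypothetical placeholders.
DATASOURCES = [
    ("Prometheus", "prometheus", "prometheus", "http://prometheus.example:9090"),
    ("Alertmanager", "alertmanager", "alertmanager", "http://alertmanager.example:9093"),
    ("Loki", "loki", "loki", "http://loki.example:3100"),
    ("Tempo", "tempo", "tempo", "http://tempo.example:3200"),
]


def build_provisioning(datasources):
    """Return a dict in Grafana's datasource provisioning layout."""
    return {
        "apiVersion": 1,
        "datasources": [
            {"name": name, "type": ds_type, "uid": uid, "access": "proxy", "url": url}
            for name, ds_type, uid, url in datasources
        ],
    }


doc = build_provisioning(DATASOURCES)
print(json.dumps(doc, indent=2))
```

Keeping all four entries in one file mirrors the single-ConfigMap approach described above: there is one artifact to mount and nothing to discover at runtime.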
Changes:
- NEW: grafana-datasources-all.yaml - Unified datasource configuration (Prometheus, Alertmanager, Loki, Tempo)
- MODIFIED: persistence-values.yaml - Disabled sidecar, added extraVolumes/extraVolumeMounts
- MODIFIED: Justfile - Updated to use unified ConfigMap, removed patch script
- MODIFIED: README.md - Documented new provisioning approach
- NEW: problem.md - Complete debugging journey with 16 attempts documented
- DEPRECATED: loki-datasource.yaml, tempo-datasource.yaml, patch-datasources.sh (kept for history)
Result:
✅ All datasources now successfully provision on Grafana startup
✅ Tempo datasource (uid=tempo) appears in Grafana with traces-to-logs correlation
✅ Loki datasource (uid=loki) appears in Grafana
✅ Simple, maintainable approach without sidecar complexity
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
- Deploy Grafana Tempo in monolithic mode for distributed tracing
  - Configure Tempo with OTLP receivers (gRPC:4317, HTTP:4318)
  - Set up 10Gi filesystem storage with 7-day retention
  - Integrate Tempo datasource in Grafana with traces-to-logs and traces-to-metrics correlation
- Update Grafana Alloy to collect and forward traces
  - Add OTLP receiver configuration to alloy-values.yaml
  - Configure batch processor for efficient trace forwarding to Tempo
  - Patch Alloy service to expose OTLP ports 4317/4318
- Create demo tracing application (frontend, middleware, backend)
  - Implement three-tier Python Flask application with OpenTelemetry instrumentation
  - Auto-instrument with OpenTelemetry for Flask and requests libraries
  - Push Docker images to private registry (registry.lan.buetow.org:30001)
  - Deploy via Helm chart with Traefik ingress at tracing-demo.f3s.buetow.org
- Update Grafana configuration in prometheus/persistence-values.yaml
  - Add Tempo to additionalDataSources for automatic provisioning
Files added:
- tempo/values.yaml: Tempo Helm chart configuration
- tempo/persistent-volumes.yaml: Storage configuration (10Gi PV/PVC)
- tempo/datasource-configmap.yaml: Grafana datasource with correlations
- tempo/Justfile: Installation automation
- tempo/README.md: Documentation
- tracing-demo/docker/frontend/: Python Flask frontend with OTel
- tracing-demo/docker/middleware/: Python Flask middleware with OTel
- tracing-demo/docker/backend/: Python Flask backend with OTel
- tracing-demo/helm-chart/: Kubernetes deployments, services, ingress
- tracing-demo/docker-image-Justfile: Docker build/push automation
- tracing-demo/Justfile: Helm deployment automation
- tracing-demo/README.md: Documentation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Implemented complete ZFS monitoring solution including ARC cache statistics,
pool health/capacity metrics, dataset usage, and I/O throughput monitoring.
Changes:
- Add ZFS recording rules (9 calculated metrics for ARC hit rates, memory usage, etc.)
- Add comprehensive Grafana dashboard with 19 panels across 5 rows:
  * Pool Overview: capacity, health, size, free space, usage trends
  * I/O Throughput: read/write operations and bytes per second
  * Dataset Statistics: table showing all datasets with usage details
  * ARC Cache Statistics: hit rates, size, memory usage
  * ARC Breakdown: data vs metadata, MRU vs MFU with pie charts
- Update Justfile to deploy ZFS recording rules
- Add textfile collector script on FreeBSD servers (f0, f1, f2) for pool/dataset metrics
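As an illustration of the kind of calculated metric the recording rules provide, an ARC hit rate can be derived from hit and miss counts. This is a sketch of the arithmetic only; the actual rule expressions are not shown in this commit and would normally operate on per-interval rates of the counters rather than raw totals.

```python
def arc_hit_rate(hits: float, misses: float) -> float:
    """Return the ARC hit rate as a percentage of all lookups."""
    total = hits + misses
    # Guard against division by zero when no lookups happened yet.
    return 100.0 * hits / total if total else 0.0


print(arc_hit_rate(900, 100))
```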
Metrics collected:
- Pool: size, allocated, free, capacity %, health status
- I/O: read/write operations and throughput (via zpool iostat)
- Dataset: used, available, referenced space per filesystem
- ARC: hit rate, size, memory usage, data/metadata breakdown
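The pool metrics listed above reach Prometheus through the node-exporter textfile collector, which reads plain-text metric lines from a file. A minimal sketch of rendering such lines, assuming hypothetical metric names and a hypothetical numeric encoding for the health status (neither is taken from the actual collector script):

```python
# Hypothetical encoding; unknown states map to -1.
HEALTH_CODES = {"ONLINE": 0, "DEGRADED": 1, "FAULTED": 2}


def render_pool_metrics(pool: str, size_bytes: int, allocated_bytes: int, health: str) -> str:
    """Render one pool's metrics in Prometheus text exposition format."""
    free_bytes = size_bytes - allocated_bytes
    capacity = 100 * allocated_bytes / size_bytes if size_bytes else 0.0
    label = f'{{pool="{pool}"}}'
    lines = [
        f"zfs_pool_size_bytes{label} {size_bytes}",
        f"zfs_pool_allocated_bytes{label} {allocated_bytes}",
        f"zfs_pool_free_bytes{label} {free_bytes}",
        f"zfs_pool_capacity_percent{label} {capacity}",
        f"zfs_pool_health{label} {HEALTH_CODES.get(health, -1)}",
    ]
    return "\n".join(lines) + "\n"


print(render_pool_metrics("zroot", 1000, 250, "ONLINE"), end="")
```

Encoding the health state as a number is one way to let a stat panel map values back to ONLINE/DEGRADED/FAULTED labels.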
Fixes:
- Pool health panel properly displays ONLINE/DEGRADED/FAULTED status
- All stat panels have correct options configuration
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
- Enable etcd metrics on port 2381
- Add blog post draft documenting the changes
|
|
Reverts hostname relabeling and etcd metrics changes
|
|
- Add relabel_configs to show hostnames for node-exporter targets
- Enable etcd metrics scraping on port 2381
- Update blog post draft
|
|
- Add relabel_configs to additional-scrape-configs.yaml for FreeBSD/OpenBSD hosts
- Add node name relabeling for node-exporter on k3s nodes
- Enable etcd metrics scraping with hostname relabeling
- Add DRAFT blog post documenting the changes
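To illustrate what a `replace` relabel rule does to a target's labels, here is a small simulation. It is a sketch under assumptions: the addresses, regexes, and replacement values are invented for the example and are not the rules from additional-scrape-configs.yaml; `relabel_replace` is a hypothetical helper that mimics the fully anchored regex match and `$1`-style references used by Prometheus.

```python
import re


def relabel_replace(labels, source_labels, regex, target_label, replacement, separator=";"):
    """Apply one 'replace' relabel step and return the new label set."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)  # relabel regexes match the whole value
    if match is None:
        return dict(labels)  # no match: labels stay unchanged
    # Translate $1 / ${1} references into Python's \g<1> syntax.
    template = re.sub(r"\$\{?(\w+)\}?", r"\\g<\1>", replacement)
    out = dict(labels)
    out[target_label] = match.expand(template)
    return out


target = {"__address__": "192.0.2.10:9100"}
print(relabel_replace(target, ["__address__"], r"192\.0\.2\.10:9100", "instance", "f0"))
print(relabel_replace({"__address__": "f1.example:9100"}, ["__address__"], r"(.+):\d+", "instance", "$1"))
```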
Amp-Thread-ID: https://ampcode.com/threads/T-019b571c-4afc-7789-becf-bc8a3c4e1e1f
Co-authored-by: Amp <amp@ampcode.com>
|
|
Amp-Thread-ID: https://ampcode.com/threads/T-ccf9cd44-5adf-4633-9f3d-d822f733af4d
Co-authored-by: Amp <amp@ampcode.com>
|
|