| Age | Commit message | Author |
|
Use extraHostVolumeMounts (the prometheus-node-exporter sub-chart key for
host path mounts) instead of extraVolumes/extraVolumeMounts, which are
for general volumes. This correctly wires
/var/lib/node_exporter/textfile_collector into the container so the
textfile arg takes effect.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
- check-nfs-mount.sh: write nfs_mount_monitor_consecutive_failures gauge
to /var/lib/node_exporter/textfile_collector/nfs_mount_monitor.prom on
every run (via write_textfile_metric helper, called from write_fail_count
and directly on healthy runs); atomic tmp+mv write prevents partial reads
- Rexfile: create /var/lib/node_exporter/textfile_collector dir on r-nodes
- prometheus.yaml (ArgoCD app): enable textfile_collector in node_exporter
DaemonSet via extraArgs/extraVolumes/extraVolumeMounts; mount host path
/var/lib/node_exporter/textfile_collector into container
- persistence-values.yaml: sync node_exporter textfile_collector config
- nfs-mount-monitor-alerts.yaml: PrometheusRule with two alerts,
NfsMountAutoRepairWarning (>= 3 consecutive failures, severity: warning) and
NfsMountAutoRepairCritical (>= 5 consecutive failures, severity: critical),
wired into new 'nfs-alerts' Alertmanager receiver with 30m repeat_interval
Tested: rex deploy succeeded, .prom files present on r0/r1/r2, timer clean.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
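The atomic tmp+mv write described above can be sketched as follows. This is a minimal sketch, not the actual check-nfs-mount.sh code: the function arguments (target directory and failure count), the temp-file naming, and the HELP text are assumptions made for illustration.

```bash
# Hypothetical sketch of write_textfile_metric; arguments and HELP text are assumed.
write_textfile_metric() {
    local dir="$1" count="$2"
    local metric="nfs_mount_monitor_consecutive_failures"
    local target="$dir/nfs_mount_monitor.prom"
    local tmp

    # Create the temp file in the same directory so mv is a rename on one
    # filesystem. The name does not end in .prom, so the collector skips it.
    tmp="$(mktemp "$dir/.nfs_mount_monitor.XXXXXX")" || return 1

    {
        echo "# HELP $metric Consecutive failed NFS mount checks."
        echo "# TYPE $metric gauge"
        echo "$metric $count"
    } > "$tmp"

    # mktemp creates the file with mode 0600; make it readable for node_exporter.
    chmod 644 "$tmp"
    mv -f "$tmp" "$target"
}
```

A healthy run would then call something like `write_textfile_metric /var/lib/node_exporter/textfile_collector 0`. Because the rename replaces the file in one step, a scrape sees either the old file or the new one, never a half-written one.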
|
|
Added web.enable-admin-api flag to allow selective deletion of time series data
via the /api/v1/admin/tsdb endpoints. This enables cleanup of benchmark data
using the delete_series and clean_tombstones APIs.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
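A cleanup using the two admin endpoints named above might look like the sketch below. The base URL, the function name, and the example selector are assumptions; only the endpoint paths and the `match[]` parameter come from the Prometheus admin API.

```bash
# Assumed base URL; adjust to wherever Prometheus is reachable.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Delete every series matching a selector, then remove the tombstones from disk.
delete_series() {
    local selector="$1"
    # -G moves the url-encoded match[] parameter into the query string,
    # -X POST keeps the method the admin API expects, -g disables URL globbing.
    curl -fsS -g -G -X POST "$PROM_URL/api/v1/admin/tsdb/delete_series" \
        --data-urlencode "match[]=$selector" || return 1
    curl -fsS -X POST "$PROM_URL/api/v1/admin/tsdb/clean_tombstones"
}
```

For example, `delete_series '{job="benchmark"}'` would drop all series carrying a hypothetical `job="benchmark"` label. delete_series only marks data as deleted; clean_tombstones is what frees the disk space.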
|
|
Changes:
- outOfOrderTimeWindow: 720h → 744h (30 days → 31 days)
Rationale:
Provides 1-day buffer for 30-day backfill operations to avoid edge
case rejections where the oldest samples exceed the limit due to
timing variations between data generation and ingestion.
With this configuration:
- 30-day benchmarks achieve 99.85% success rate (vs 50% with 720h)
- Only 4/2592 batches rejected (first few batches slightly over 30d)
- Allows safe backfilling of up to 30 days of historic data
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
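The arithmetic behind the 1-day buffer can be checked with a small sketch. It models the window as a plain age check against a reference time, which is a simplification: Prometheus evaluates the window relative to the TSDB max time, not wall-clock time. The function name and the 10-minute drift figure are illustrative assumptions.

```bash
# Succeeds if a sample's age fits in the window. Times are in seconds since epoch.
within_window() {
    local sample_ts="$1" ref_ts="$2" window_hours="$3"
    (( ref_ts - sample_ts <= window_hours * 3600 ))
}

now=1700000000                              # fixed reference time for a repeatable example
sample=$(( now - 30 * 24 * 3600 - 600 ))    # 30 days and 10 minutes old

within_window "$sample" "$now" 720 && echo "720h: accepted" || echo "720h: rejected"
within_window "$sample" "$now" 744 && echo "744h: accepted" || echo "744h: rejected"
```

A sample that is only minutes past the 30-day mark falls outside 720h (2,592,000 s) but well inside 744h (2,678,400 s).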
|
|
This commit configures Prometheus to accept historic data via the Remote
Write API, enabling backfilling of test metrics for development and
troubleshooting purposes.
Changes:
- Enable Remote Write receiver (--web.enable-remote-write-receiver)
- Enable out-of-order ingestion with 30-day window (720h)
- Enable exemplar-storage and otlp-write-receiver features
- Add Epimetheus dashboard ConfigMap for Grafana provisioning
- Remove old prometheus-pusher directory (moved to separate repo)
- Document configuration, use cases, and performance considerations
Configuration allows backfilling data up to 30 days in the past, supporting
tools like Epimetheus for generating synthetic historic metrics.
Performance note: This is optimized for ad-hoc troubleshooting, not
production use. Out-of-order ingestion increases memory usage, TSDB overhead,
and may impact query performance.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Updated persistence-values.yaml to enable the Remote Write receiver
using the correct flag for Prometheus 3.x:
- Changed from enableFeatures (not supported in 3.8.1)
- To additionalArgs with web.enable-remote-write-receiver
This allows Epimetheus to push historic data with preserved timestamps
via the Prometheus Remote Write API endpoint (/api/v1/write).
Applied via: just upgrade
🤖 Generated with Claude Code
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
This commit adds intelligent auto-detection of data age, automatically
choosing the appropriate ingestion method without user intervention.
## New Features
1. **AUTO Mode** (-mode=auto)
- Automatically detects timestamp age from input data
- Routes realtime data (< 5min) → Pushgateway
- Routes historic data (> 5min) → Remote Write API
- No manual timestamp calculation needed!
2. **Input Format Support**
- CSV format: metric_name,labels,value,timestamp_ms
- JSON format: [{metric, labels, value, timestamp_ms}]
- Read from file (-file=path) or stdin
- Comments supported in CSV (#)
3. **Smart Routing Logic**
- 5-minute threshold determines ingestion method
- Handles mixed data (current + historic) in single import
- Clear logging shows which method is used for each sample
4. **Test Data Generation**
- generate-test-data.sh creates samples for all time ranges
- Demonstrates: current, 1-hour, 1-day, 1-week, and 1-month-old data
- Actual timestamps generated dynamically
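The routing rule described above can be sketched in shell. This is not the auto-ingest.go implementation; the function names are hypothetical, the label syntax in the sample rows is made up, and treating a sample that is exactly 5 minutes old as historic is an assumption, since the text only covers the < 5min and > 5min cases.

```bash
THRESHOLD_MS=$(( 5 * 60 * 1000 ))   # 5-minute threshold from the description

# Print "realtime" or "historic" for one timestamp (ms since epoch).
classify_sample() {
    local ts_ms="$1" now_ms="$2"
    if (( now_ms - ts_ms < THRESHOLD_MS )); then
        echo realtime    # would go to Pushgateway
    else
        echo historic    # would go to the Remote Write API
    fi
}

# Read CSV rows (metric_name,labels,value,timestamp_ms) from stdin and
# print "<route> <metric_name>" for each sample.
route_csv() {
    local now_ms="$1" line
    while IFS= read -r line; do
        # Skip blank lines and '#' comments.
        [[ -z "$line" || "$line" == \#* ]] && continue
        # The timestamp is the last field and the metric name is the first.
        echo "$(classify_sample "${line##*,}" "$now_ms") ${line%%,*}"
    done
}
```

Taking the last comma-separated field as the timestamp keeps the sketch working even if the labels column itself contains commas.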
## Files Added
- auto-ingest.go: Core auto-detection logic
- AUTO-MODE.md: Complete documentation
- generate-test-data.sh: Test data generator
- test-data.csv: Example data template
- test-all-ages.csv: Generated test data (all ages)
## Example Usage
```bash
# Generate test data
./generate-test-data.sh
# Auto-import (detects ages automatically)
./prometheus-pusher-auto \
-mode=auto \
-file=test-all-ages.csv \
-pushgateway=http://localhost:9091 \
-prometheus=http://localhost:9090/api/v1/write
```
## Output Example
```
📊 Auto-ingest summary:
Total samples: 15
Realtime samples (< 5min old): 3
Historic samples (> 5min old): 12
🔄 Ingesting 3 REALTIME samples via Pushgateway...
⏰ Ingesting 12 HISTORIC samples via Remote Write...
[1/12] app_requests_total (age: 1.0 hours)
[2/12] app_temperature_celsius (age: 1.0 days)
...
🎉 Auto-ingest complete!
```
## Supported Time Ranges
✅ Current data (< 5min)
✅ 1 hour old data
✅ 1 day old data
✅ 1 week old data
✅ 1 month old data
All ages are automatically detected and routed correctly!
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
This commit extends prometheus-pusher to support ingesting historic data
with custom timestamps via Prometheus Remote Write API.
## Key Changes
1. New Historic Data Module (historic.go)
- GenerateHistoricMetrics: Creates metrics for specific past timestamps
- PushHistoricData: Sends single datapoint via Remote Write API
- BackfillHistoricData: Backfills range of historic data
- Uses Protobuf + Snappy encoding per Prometheus spec
2. Enhanced Main Binary (main.go, realtime.go)
- Refactored to support multiple modes
- Mode 1: realtime - Push to Pushgateway (original behavior)
- Mode 2: historic - Push single historic datapoint
- Mode 3: backfill - Backfill range of historic data
- Command-line flags for configuration
3. Prometheus Configuration (persistence-values.yaml)
- Added web.enable-remote-write-receiver flag
- Enables Prometheus to accept timestamped samples via Remote Write API
- Required for historic data ingestion
4. Documentation (HISTORIC.md)
- Complete guide for historic data ingestion
- Explains limitations and best practices
- Examples for all three modes
- Troubleshooting guide
## Technical Details
**Problem**: Pushgateway does not support custom timestamps. Prometheus
stamps every scraped sample with the scrape time, which prevents
backfilling historic data.
**Solution**: Use the Prometheus Remote Write API, which accepts timestamped
samples. This requires enabling the --web.enable-remote-write-receiver flag.
**Data Format**: Protobuf (prompb.WriteRequest) with Snappy compression
**Use Cases**:
- Backfill missing data (e.g., during outage)
- Import historic data from other systems
- Testing with specific timestamps
- Data migration scenarios
## Usage Examples
```bash
# Realtime mode (original behavior)
./prometheus-pusher-historic -mode=realtime -continuous
# Push data from 24 hours ago
./prometheus-pusher-historic -mode=historic -hours-ago=24
# Backfill last 48 hours with 1-hour intervals
./prometheus-pusher-historic -mode=backfill -start-hours=48 -end-hours=0 -interval=1
```
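How the backfill flags might translate into timestamps can be sketched as below. This is an assumption about the flag semantics, not the behaviour of BackfillHistoricData: the sketch treats -start-hours and -end-hours as inclusive offsets from now and -interval as a step in hours. The function name is hypothetical.

```bash
# Print one epoch timestamp (seconds) per backfill step, oldest first.
backfill_timestamps() {
    local now="$1" start_hours="$2" end_hours="$3" interval_hours="$4"
    local h
    # Guard against a zero or negative step, which would never terminate.
    (( interval_hours > 0 )) || return 1
    for (( h = start_hours; h >= end_hours; h -= interval_hours )); do
        echo $(( now - h * 3600 ))
    done
}
```

With the values from the last example (48, 0, 1), this yields 49 timestamps under the inclusive-range assumption, one per hour from 48 hours ago up to now.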
## Dependencies Added
- github.com/prometheus/prometheus (for prompb package)
- github.com/golang/snappy (for compression)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
After extensive debugging (documented in problem.md), resolved the issue where Tempo
and Loki datasources would not appear in Grafana despite correct configuration.
Root Cause:
- Sidecar-based provisioning with label discovery was not triggering the provisioner module
- Multi-step indirection (sidecar → watch → write → reload) had silent failures
Solution (following x-rag pattern):
- Disabled sidecar datasource provisioning
- Created unified grafana-datasources-all.yaml with all datasources
- Mounted the ConfigMap directly at /etc/grafana/provisioning/datasources/
- Grafana now reads datasources on startup via built-in provisioning
Changes:
- NEW: grafana-datasources-all.yaml - Unified datasource configuration (Prometheus, Alertmanager, Loki, Tempo)
- MODIFIED: persistence-values.yaml - Disabled sidecar, added extraVolumes/extraVolumeMounts
- MODIFIED: Justfile - Updated to use unified ConfigMap, removed patch script
- MODIFIED: README.md - Documented new provisioning approach
- NEW: problem.md - Complete debugging journey with 16 attempts documented
- DEPRECATED: loki-datasource.yaml, tempo-datasource.yaml, patch-datasources.sh (kept for history)
Result:
✅ All datasources now successfully provision on Grafana startup
✅ Tempo datasource (uid=tempo) appears in Grafana with traces-to-logs correlation
✅ Loki datasource (uid=loki) appears in Grafana
✅ Simple, maintainable approach without sidecar complexity
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
- Deploy Grafana Tempo in monolithic mode for distributed tracing
  - Configure Tempo with OTLP receivers (gRPC:4317, HTTP:4318)
  - Set up 10Gi filesystem storage with 7-day retention
  - Integrate Tempo datasource in Grafana with traces-to-logs and traces-to-metrics correlation
- Update Grafana Alloy to collect and forward traces
  - Add OTLP receiver configuration to alloy-values.yaml
  - Configure batch processor for efficient trace forwarding to Tempo
  - Patch Alloy service to expose OTLP ports 4317/4318
- Create demo tracing application (frontend, middleware, backend)
  - Implement three-tier Python Flask application with OpenTelemetry instrumentation
  - Auto-instrument with OpenTelemetry for Flask and requests libraries
  - Push Docker images to private registry (registry.lan.buetow.org:30001)
  - Deploy via Helm chart with Traefik ingress at tracing-demo.f3s.buetow.org
- Update Grafana configuration in prometheus/persistence-values.yaml
  - Add Tempo to additionalDataSources for automatic provisioning
Files added:
- tempo/values.yaml: Tempo Helm chart configuration
- tempo/persistent-volumes.yaml: Storage configuration (10Gi PV/PVC)
- tempo/datasource-configmap.yaml: Grafana datasource with correlations
- tempo/Justfile: Installation automation
- tempo/README.md: Documentation
- tracing-demo/docker/frontend/: Python Flask frontend with OTel
- tracing-demo/docker/middleware/: Python Flask middleware with OTel
- tracing-demo/docker/backend/: Python Flask backend with OTel
- tracing-demo/helm-chart/: Kubernetes deployments, services, ingress
- tracing-demo/docker-image-Justfile: Docker build/push automation
- tracing-demo/Justfile: Helm deployment automation
- tracing-demo/README.md: Documentation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
- Enable etcd metrics on port 2381
- Add blog post draft documenting the changes
|
|
Reverts hostname relabeling and etcd metrics changes
|
|
- Add relabel_configs to show hostnames for node-exporter targets
- Enable etcd metrics scraping on port 2381
- Update blog post draft
|
|
- Add relabel_configs to additional-scrape-configs.yaml for FreeBSD/OpenBSD hosts
- Add node name relabeling for node-exporter on k3s nodes
- Enable etcd metrics scraping with hostname relabeling
- Add DRAFT blog post documenting the changes
Amp-Thread-ID: https://ampcode.com/threads/T-019b571c-4afc-7789-becf-bc8a3c4e1e1f
Co-authored-by: Amp <amp@ampcode.com>
|