# Architecture

This document describes the high-level data flow of Epimetheus and when to use each ingestion path or backend.

## Data flow

```
┌─────────────────────────────────────────────────────────────────────────┐
│                            Epimetheus                                   │
│                     (Metrics Ingestion Tool)                            │
│                                                                         │
│  Modes:                                                                 │
│  • Realtime  - Current metrics (< 5 min old)                            │
│  • Historic  - Historic metrics (≥ 5 min old)                           │
│  • Backfill  - Range of historic data                                   │
│  • Auto      - Automatic routing based on timestamp age                 │
│  • Watch     - CSV file monitoring (Prometheus and/or ClickHouse)       │
└─────────────────────────────────────────────────────────────────────────┘
         │                                            │
         │ Realtime Data                              │ Historic Data
         │ (via HTTP POST)                            │ (via Remote Write API)
         │ Uses "now" timestamp                       │ Preserves timestamps
         ▼                                            ▼
┌─────────────────────┐                    ┌─────────────────────┐
│   Pushgateway       │                    │ Prometheus /        │
│   (Port 9091)       │                    │ VictoriaMetrics     │
│                     │                    │ (Remote Write)      │
│ • Buffers metrics   │                    │                     │
│ • Scraped by        │──── Scraped ─────▶ │ /api/v1/write       │
│   Prometheus        │    every 15-30s    │                     │
└─────────────────────┘                    └─────────────────────┘
                                                      │
                                                      │ Query API
                                                      ▼
                                           ┌─────────────────────┐
                                           │     Grafana         │
                                           │   Dashboards        │
                                           └─────────────────────┘
```

**Watch mode** can also write to **ClickHouse** (separate path; see [ClickHouse backend](../backends/clickhouse.md)).
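
The realtime path in the diagram (HTTP POST to a Pushgateway on port 9091, no explicit timestamp) can be sketched roughly as follows. This is an illustration, not Epimetheus's actual code: the function name `build_pushgateway_request`, the job name, and the sample metric are hypothetical, and label values are assumed not to need escaping. The `/metrics/job/<job>` path and the text exposition format are standard Pushgateway conventions.

```python
import urllib.request
from typing import Dict, Optional
from urllib.parse import quote


def build_pushgateway_request(base_url: str, job: str, name: str, value: float,
                              labels: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Build (but do not send) a POST carrying one sample in text exposition format."""
    label_str = ""
    if labels:
        # Assumption: label values contain no quotes, backslashes, or newlines.
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    # No timestamp is included: the realtime path uses "now".
    body = f"{name}{label_str} {value}\n".encode("utf-8")
    url = f"{base_url.rstrip('/')}/metrics/job/{quote(job, safe='')}"
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


# Hypothetical example values; nothing is sent over the network here.
req = build_pushgateway_request("http://localhost:9091", "epimetheus_demo",
                                "room_temperature_celsius", 21.5, {"room": "lab"})
print(req.full_url)
print(req.data.decode(), end="")
```

Passing the request to `urllib.request.urlopen(req)` would perform the push. Historic data cannot take this route because the Pushgateway does not accept client-supplied timestamps, which is why it goes through Remote Write instead.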

## Watch mode (CSV file watcher)

Watch mode polls CSV file(s), uses file modification time as the sample timestamp, and can push to Prometheus (Remote Write) and/or ClickHouse.

```
┌─────────────────┐     poll (1s)      ┌─────────────────────────────────────┐
│   CSV file(s)   │ ─────────────────▶ │  Epimetheus (watch mode)            │
│                 │                    │  • Parse tabular CSV                │
│  File mtime =   │                    │  • Numeric columns → metrics        │
│  sample time    │                    │  • String columns → labels          │
└─────────────────┘                    │  • Optional DNS resolution (IPs)    │
                                       └─────────────────────────────────────┘
                                                          │
                                               ┌──────────┴──────────┐
                                               ▼                     ▼
                                       ┌───────────────┐     ┌───────────────┐
                                       │  Prometheus   │     │  ClickHouse   │
                                       │  (optional)   │     │  (optional)   │
                                       │  Remote Write │     │  HTTP insert  │
                                       │  /api/v1/write│     │  (batched)    │
                                       └───────────────┘     └───────────────┘
```

At least one of `-prometheus` or `-clickhouse` must be set. See [Operating Modes](../guides/modes.md) and [ClickHouse backend](../backends/clickhouse.md).
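
The column mapping described above (numeric columns become metrics, string columns become labels, file mtime becomes the sample timestamp) might look roughly like this. It is a minimal sketch under stated assumptions: `rows_to_samples` and the `Sample` tuple are hypothetical names, cells are classified individually by whether they parse as a float, and DNS resolution is left out. The real parser may differ.

```python
import csv
import io
from typing import Dict, List, Tuple

# (metric name, labels, value, timestamp in milliseconds); hypothetical shape
Sample = Tuple[str, Dict[str, str], float, int]


def rows_to_samples(csv_text: str, mtime_seconds: float) -> List[Sample]:
    """Turn each CSV row into one sample per numeric column."""
    timestamp_ms = int(mtime_seconds * 1000)  # file mtime = sample time
    samples: List[Sample] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        labels: Dict[str, str] = {}
        values: Dict[str, float] = {}
        for column, cell in row.items():
            if column is None or cell is None:
                continue  # ragged row; skipped in this sketch
            try:
                values[column] = float(cell)   # numeric cell -> metric value
            except ValueError:
                labels[column] = cell          # anything else -> label
        for metric, value in values.items():
            samples.append((metric, dict(labels), value, timestamp_ms))
    return samples


demo = "host,region,cpu_percent,mem_bytes\nweb1,eu,12.5,2048\n"
for sample in rows_to_samples(demo, mtime_seconds=1700000000.0):
    print(sample)
```

For a file on disk, `os.path.getmtime(path)` would supply `mtime_seconds`. Because the timestamp comes from the file and not from the push time, these samples need a path that preserves timestamps (Remote Write or ClickHouse).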

## When to use Pushgateway vs Remote Write

**Use Pushgateway (realtime mode):**

- Short-lived batch jobs
- Service-level metrics
- Jobs behind firewalls
- Current/recent data (< 5 minutes old)

**Use Remote Write (historic, backfill, watch, or auto with old data):**

- Historic data import
- Backfilling gaps
- Data migration
- Data older than 5 minutes
- Watch mode (to preserve file mtime as timestamp)

**Use Auto mode:**

- Mixed current and historic data in one file
- Unknown timestamp ages
- General-purpose file import
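
The age-based routing that Auto mode performs can be sketched as below, using the 5-minute boundary from the mode list (under 5 minutes is realtime, 5 minutes or more is historic). The function name `route` and the return strings are hypothetical, and sending future-dated samples to the realtime path is an assumption of this sketch, not documented behavior.

```python
from datetime import datetime, timedelta, timezone

REALTIME_WINDOW = timedelta(minutes=5)


def route(sample_time: datetime, now: datetime) -> str:
    """Pick an ingestion path from the sample's age."""
    age = now - sample_time
    # Strictly younger than 5 minutes -> Pushgateway; otherwise Remote Write.
    return "pushgateway" if age < REALTIME_WINDOW else "remote_write"


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(route(now - timedelta(minutes=1), now))   # recent sample
print(route(now - timedelta(hours=3), now))     # old sample
```

Applying this per sample is what lets a single file with mixed current and historic rows be imported in one pass.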

## When to use which backend

- **Prometheus or VictoriaMetrics:** Set `-prometheus=` to the backend’s Remote Write URL. Historic, backfill, watch, and auto (for old data) write to it directly; realtime data reaches the same backend through the Pushgateway that Prometheus/VM scrapes.
- **ClickHouse:** Set `-clickhouse=` in watch mode for analytics/long-term storage. Can be used together with Prometheus or alone (with `-prometheus=` empty).
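
As a rough illustration of the backend rule for watch mode (either backend alone, or both together, but never neither), the check could be written as follows. `select_sinks` and its parameters are hypothetical names standing in for the values of `-prometheus=` and `-clickhouse=`; the real validation logic is not shown in this document.

```python
from typing import List


def select_sinks(prometheus_url: str = "", clickhouse_url: str = "") -> List[str]:
    """Return the enabled watch-mode backends; an empty URL means 'not set'."""
    sinks: List[str] = []
    if prometheus_url:
        sinks.append("prometheus")
    if clickhouse_url:
        sinks.append("clickhouse")
    if not sinks:
        raise ValueError("watch mode needs -prometheus and/or -clickhouse")
    return sinks


# Hypothetical URLs, for illustration only.
print(select_sinks(clickhouse_url="http://localhost:8123"))
print(select_sinks("http://localhost:9090/api/v1/write", "http://localhost:8123"))
```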

## Metric design (best practices)

- **Types:** Counter for cumulative values (requests, errors); Gauge for point-in-time values (temperature, connections); Histogram for distributions (latency).
- **Labels:** Use meaningful labels; avoid high-cardinality values (user IDs, raw timestamps); keep the number of label-value combinations reasonable (< 1000 per metric).
- **Naming:** Use descriptive names; include the unit in the name (e.g. `_celsius`, `_bytes`); use the `_total` suffix for counters.
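
These guidelines can be turned into a small pre-ingestion check. The sketch below is hypothetical (`lint_metric` is not part of Epimetheus): it validates the name against the classic Prometheus metric-name character set, checks the `_total` suffix on counters, and estimates the worst-case number of label combinations against the 1000-per-metric guideline above.

```python
import re
from math import prod
from typing import Dict, List

# Classic Prometheus metric-name character set.
NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")
MAX_COMBINATIONS = 1000  # guideline from the list above


def lint_metric(name: str, kind: str, label_values: Dict[str, List[str]]) -> List[str]:
    """Return a list of guideline violations (empty if none were found)."""
    problems: List[str] = []
    if not NAME_RE.match(name):
        problems.append("invalid characters in name")
    if kind == "counter" and not name.endswith("_total"):
        problems.append("counter name should end in _total")
    # Worst case: every combination of distinct label values occurs.
    combos = prod(len(set(values)) for values in label_values.values())
    if combos >= MAX_COMBINATIONS:
        problems.append(f"up to {combos} label combinations (limit {MAX_COMBINATIONS})")
    return problems


print(lint_metric("http_requests", "counter",
                  {"method": ["GET", "POST"], "status": ["200", "404", "500"]}))
print(lint_metric("http_requests_total", "counter",
                  {"user_id": [str(i) for i in range(5000)]}))
```

The product is an upper bound: real series counts are usually lower because not every label combination occurs, so a warning here is a prompt to look closer and not proof of a problem.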