summaryrefslogtreecommitdiff
path: root/docs/DOCS-RESTRUCTURE-PLAN.md
blob: c688993df49f61ae273278667a7e66c7e9d9f90f (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
# Documentation Restructure Plan

This plan addresses the current documentation sprawl and clarifies the **multiple ingestion backends** (Prometheus, ClickHouse, and future backends such as VictoriaMetrics) and **modes** (realtime, historic, backfill, auto, watch).

---

## 1. Current State Summary

### 1.1 Existing Markdown Files

| File        | Purpose                          | Issues |
|------------|-----------------------------------|--------|
| `README.md` | Single ~995-line doc: intro, modes, backends, setup, troubleshooting, macOS, cleanup | Too long; mixes audiences and backends; hard to maintain |
| `AGENT.md`  | Agent rules (Grafana dashboard guidelines + ref to `~/git/conf/snippets/go/go-projects.md`) | Fine as-is; not user docs |
| `CLAUDE.md` | One-line pointer to AGENT.md      | Fine as-is |

### 1.2 Broken or Missing References in README

- `CSV-FORMAT-FLEXIBILITY.md` – linked, **does not exist**
- `DNS-RESOLUTION-FEATURE.md` – linked, **does not exist**
- `DTAIL-METRICS-EXAMPLE.md` – linked, **does not exist**
- `MAGEFILE.md` – linked, **does not exist** (build logic lives in `Magefile.go`)

### 1.3 Ingestion Backends (from codebase)

| Backend    | Modes                    | Notes |
|-----------|---------------------------|--------|
| **Prometheus** | realtime (Pushgateway), historic/backfill/auto (Remote Write), watch (Remote Write) | Primary; Remote Write requires feature flag |
| **ClickHouse** | watch only               | Optional; can run with Prometheus or alone |

*VictoriaDB / VictoriaMetrics:* Not present in code today. Plan leaves room for a dedicated backend doc when added.

---

## 2. Goals

1. **Separate by ingestion backend** so Prometheus vs ClickHouse (and future backends) have clear, non-redundant docs.
2. **Split by audience and topic**: quick start vs reference vs operations (setup, troubleshooting, cleanup).
3. **Fix broken links**: either add the missing docs or replace links with in-README sections / new doc paths.
4. **Single source of truth** for each concept (e.g. “how watch mode works” and “how to configure Prometheus” in one place each).
5. **Easier maintenance**: smaller, focused files; clear naming; one `docs/` tree.

---

## 3. Proposed Directory Layout

```
epimetheus/
├── README.md                    # Short overview + quick start + doc index (slimmed)
├── AGENT.md                     # Unchanged
├── CLAUDE.md                    # Unchanged
├── docs/
│   ├── README.md                # Documentation index (nav + short descriptions)
│   │
│   ├── guides/                  # How-to and concepts
│   │   ├── quickstart.md        # Minimal path to first push (Prometheus or ClickHouse)
│   │   ├── modes.md             # All modes: realtime, historic, backfill, auto, watch
│   │   ├── data-formats.md      # CSV (epimetheus + tabular) and JSON
│   │   ├── csv-format-flexibility.md   # “Any CSV” + examples (replaces missing file)
│   │   ├── dns-resolution.md    # IP → hostname resolution (replaces missing file)
│   │   └── dtail-metrics-example.md    # Optional: dtail.csv walkthrough (replaces missing file)
│   │
│   ├── backends/                # One doc per ingestion backend
│   │   ├── prometheus.md        # Pushgateway + Remote Write, config, limits
│   │   ├── clickhouse.md        # Watch-only; schema; verify script
│   │   └── (future) victoriametrics.md  # When/if added
│   │
│   ├── operations/              # Setup, runbooks, platform-specific
│   │   ├── setup-prometheus.md  # Remote Write receiver, scrape config, retention
│   │   ├── setup-clickhouse.md  # Table creation, verify-clickhouse.sh
│   │   ├── troubleshooting.md   # Connection issues, “no metrics”, out-of-order, etc.
│   │   ├── cleanup.md           # Benchmark cleanup, Pushgateway delete, port-forwards
│   │   ├── macos-setup.md       # Brew, Prometheus args, Remote Write on macOS
│   │   └── kubernetes.md        # Port-forwards, Helm, ConfigMaps (from current README)
│   │
│   ├── reference/               # Reference material
│   │   ├── cli.md               # All flags by mode
│   │   ├── test-metrics.md      # epimetheus_test_* metrics and types
│   │   ├── grafana-dashboard.md # Panels, deploy options, datasource
│   │   ├── example-queries.md   # PromQL and curl examples
│   │   └── magefile.md          # Mage targets (replaces missing MAGEFILE.md)
│   │
│   └── design/                  # Optional, for contributors
│       └── architecture.md     # High-level data flow (current ASCII diagrams)
```

---

## 4. File-by-File Plan

### 4.1 Root `README.md` (slimmed)

- **Keep:** Project name, tagline, “Why Epimetheus”, **one** high-level architecture diagram (simplified).
- **Keep:** Very short “Overview” (1 paragraph) and **Quick Start** (3–5 steps pointing at `docs/guides/quickstart.md` for details).
- **Add:** **Documentation index** – bullet list with links to:
  - `docs/README.md`
  - `docs/guides/quickstart.md`, `docs/guides/modes.md`
  - `docs/backends/prometheus.md`, `docs/backends/clickhouse.md`
  - `docs/operations/setup-prometheus.md`, `docs/operations/troubleshooting.md`
  - `docs/reference/cli.md`, `docs/reference/magefile.md`
- **Move out of README into `docs/`:**
  - All mode details → `docs/guides/modes.md`
  - Backend-specific behaviour → `docs/backends/*.md`
  - Setup (Prometheus, ClickHouse, k8s, macOS) → `docs/operations/*.md`
  - Data formats → `docs/guides/data-formats.md` (+ csv-format-flexibility, dns-resolution, dtail example)
  - Test metrics, Grafana, example queries → `docs/reference/*.md`
  - Troubleshooting, cleanup → `docs/operations/*.md`
  - Time range / retention → `docs/backends/prometheus.md` and `docs/operations/setup-prometheus.md`
- **Fix links:** Remove links to `CSV-FORMAT-FLEXIBILITY.md`, `DNS-RESOLUTION-FEATURE.md`, `DTAIL-METRICS-EXAMPLE.md`, `MAGEFILE.md` from README; point to `docs/guides/...` and `docs/reference/magefile.md` instead.

**Target:** README under ~150–200 lines.

---

### 4.2 `docs/README.md` (new)

- Title: “Epimetheus Documentation”.
- Short intro (2–3 sentences).
- **Structured index** with sections:
  - **Guides:** quickstart, modes, data formats, CSV flexibility, DNS resolution, dtail example.
  - **Ingestion backends:** Prometheus, ClickHouse (and placeholder for Victoria* if desired).
  - **Operations:** setup (Prometheus, ClickHouse), troubleshooting, cleanup, macOS, Kubernetes.
  - **Reference:** CLI, test metrics, Grafana, example queries, Mage.
- Each entry: link + one-line description.

---

### 4.3 Guides

| Doc | Content | Source |
|-----|--------|--------|
| `guides/quickstart.md` | Minimal steps: build/run, push to Prometheus or ClickHouse, view (Prometheus UI or verify-clickhouse.sh). | Current README “Quick Start” + “Run in Realtime Mode” + one watch example. |
| `guides/modes.md` | Table: mode name, purpose, which backends, main flags. Then one subsection per mode (realtime, historic, backfill, auto, watch) with short description and example command. | Current README “Operating Modes”. |
| `guides/data-formats.md` | Epimetheus CSV (metric_name, labels, value, timestamp_ms), JSON format, optional timestamp. Link to csv-format-flexibility for tabular CSV. | Current README “Data Formats”. |
| `guides/csv-format-flexibility.md` | “Works with any CSV”: headers → metric names/labels, numeric vs string columns, sanitization, examples (web, food). | New content; replaces missing `CSV-FORMAT-FLEXIBILITY.md`. |
| `guides/dns-resolution.md` | Default `ip` resolution; `-resolve-ip-labels`; behaviour on failure. | New content; replaces missing `DNS-RESOLUTION-FEATURE.md`. |
| `guides/dtail-metrics-example.md` | Optional: step-by-step dtail.csv example. | New content; replaces missing `DTAIL-METRICS-EXAMPLE.md`; can be short. |

---

### 4.4 Backends

| Doc | Content | Source |
|-----|--------|--------|
| `backends/prometheus.md` | Pushgateway (realtime) vs Remote Write (historic/watch); URLs; time range and retention limits; out-of-order; link to setup-prometheus. | README Prometheus bits + “Time Range Limitations” + “Setup Requirements” (Remote Write). |
| `backends/clickhouse.md` | Watch-only; `-clickhouse`, `-clickhouse-table`; table schema (from code/comments); `verify-clickhouse.sh`; Prometheus + ClickHouse together. | README “ClickHouse Support” + verify-clickhouse.sh + internal/ingester/clickhouse.go. |

---

### 4.5 Operations

| Doc | Content | Source |
|-----|--------|--------|
| `operations/setup-prometheus.md` | Enable Remote Write receiver (and Admin API); scrape config for Pushgateway; retention; Prometheus 3.x syntax; verify commands. | Current README “Setup Requirements” (Prometheus). |
| `operations/setup-clickhouse.md` | Ensure table exists (e.g. from ingester); run verify script; optional Docker/systemd. | From README + scripts + code. |
| `operations/troubleshooting.md` | Pushgateway connection; metrics not in Prometheus; “Remote write receiver not enabled”; out-of-order errors; dashboard not in Grafana; ClickHouse connection. | Current README “Troubleshooting”. |
| `operations/cleanup.md` | Cleanup benchmark data script; manual Prometheus delete/tombstones; Pushgateway delete; stop port-forwards; uninstall Pushgateway. | Current README “Cleanup”. |
| `operations/macos-setup.md` | Brew install; prometheus.args (Remote Write, Admin API); verify; optional “temporary” run. | Current README “MacOS Setup”. |
| `operations/kubernetes.md` | Port-forwards (Pushgateway, Prometheus, Grafana); Helm/ConfigMap for dashboard; namespace. | Extracted from README examples. |

---

### 4.6 Reference

| Doc | Content | Source |
|-----|--------|--------|
| `reference/cli.md` | Table or list of all flags by mode (realtime, historic, backfill, auto, watch); default values. | From README + `cmd/epimetheus/main.go`. |
| `reference/test-metrics.md` | Each `epimetheus_test_*` metric: type, description, labels, use case. | Current README “Test Metrics”. |
| `reference/grafana-dashboard.md` | Panels list; deploy (ConfigMap, manual import, script); datasource; link to AGENT.md for panel guidelines. | Current README “Grafana Dashboard”. |
| `reference/example-queries.md` | PromQL and curl examples (basic, histogram, labeled counter). | Current README “Example Queries”. |
| `reference/magefile.md` | List of Mage targets (build, test, run, RunWatchClickHouse, cleanup, etc.) with one-line description and example. | From `Magefile.go`; replaces missing `MAGEFILE.md`. |

---

### 4.7 Design (optional)

| Doc | Content | Source |
|-----|--------|--------|
| `design/architecture.md` | High-level data flow; ASCII diagrams (current README); “when to use Pushgateway vs Remote Write” and “when to use which backend”. | Current README “Architecture” and “Best Practices”. |

---

## 5. Implementation Order

1. **Create `docs/` and index**
   - Create `docs/README.md` with the full index (links can target paths that don’t exist yet).
2. **Fix broken links and add missing content**
   - Add `docs/guides/csv-format-flexibility.md`, `docs/guides/dns-resolution.md`, `docs/guides/dtail-metrics-example.md`, `docs/reference/magefile.md` so all current README links resolve.
3. **Backend-centric docs**
   - Add `docs/backends/prometheus.md` and `docs/backends/clickhouse.md`; move/duplicate content from README.
4. **Operations**
   - Add `docs/operations/setup-prometheus.md`, `setup-clickhouse.md`, `troubleshooting.md`, `cleanup.md`, `macos-setup.md`, `kubernetes.md`; move content from README.
5. **Guides**
   - Add `docs/guides/quickstart.md`, `modes.md`, `data-formats.md`; move content from README.
6. **Reference**
   - Add `docs/reference/cli.md`, `test-metrics.md`, `grafana-dashboard.md`, `example-queries.md`; move content from README.
7. **Slim README**
   - Cut README down to overview, quick start, and doc index; replace old links with `docs/...` links.
8. **Optional**
   - Add `docs/design/architecture.md` and link from `docs/README.md`.

---

## 6. Cross-Cutting Conventions

- **Links:** Prefer relative links from repo root (e.g. `[Modes](docs/guides/modes.md)`) or from `docs/` (e.g. `[Prometheus](backends/prometheus.md)` inside docs).
- **Backend mentions:** In mode/CLI docs, use a short table or sentence: “Supported backends: Prometheus (all modes), ClickHouse (watch only).”
- **One diagram:** Keep one high-level diagram in README or `design/architecture.md`; avoid duplicating large ASCII art in multiple files.
- **CLI and defaults:** Single source of truth in `reference/cli.md`; guides and backend docs can quote the relevant subset.
- **Version/legal:** Keep “Version” and “License” in root README (or CONTRIBUTING.md if you add one).

---

## 7. Future: VictoriaMetrics / VictoriaDB

When adding a new backend (e.g. VictoriaMetrics, which speaks Prometheus Remote Write):

- Add `docs/backends/victoriametrics.md` (or `victoriadb.md`) with URL format, any extra flags, and differences from Prometheus.
- In `docs/README.md` and root README, add one line to the “Ingestion backends” section.
- In `docs/guides/modes.md` and `reference/cli.md`, extend the “which backends support which mode” table and flags.
- No need to duplicate full setup/troubleshooting if it matches Prometheus; link to `backends/prometheus.md` and note compatibility where relevant.

---

## 8. Checklist Before Calling Done

- [ ] All current README links resolve (no 404s).
- [ ] README is under ~200 lines and ends with doc index.
- [ ] `docs/README.md` lists every new doc with link and one-line description.
- [ ] Prometheus vs ClickHouse (and modes) are clearly separated in backends and guides.
- [ ] Setup, troubleshooting, and cleanup live under `docs/operations/`.
- [ ] Mage is documented in `docs/reference/magefile.md` and linked from root README.
- [ ] Optional: `docs/design/architecture.md` exists and is linked from index.

This plan gives you a single place to extend when you add VictoriaDB/VictoriaMetrics or another backend, and keeps the root README short while all detailed docs live under `docs/` with a clear structure by topic and backend.