author: Paul Buetow <paul@buetow.org>, 2026-05-06 09:35:55 +0300
committer: Paul Buetow <paul@buetow.org>, 2026-05-06 09:35:55 +0300
commit: fbb7c9a9ad8d03d5d095ac441a58b37537e0ab8d
tree: 2ccb042e90ca3ed99e13d9e7bf36948e7e362936 (README.md)
parent: 3b20f2c4d16c7b7f583e9ab2b51213e9ddc94fd5
add Dockerfile and Rocky Linux 9 build docs

Introduces a Docker-based build path so ior can be compiled on any Linux host
without a native Rocky 9 toolchain setup:

- Dockerfile: Rocky 9 minimal image with Go (version from ARG, default from
  go.mod), static libelf/libzstd built from source, libbpfgo at
  v0.9.2-libbpf-1.5.1, and mage; CMD runs mage generate + mage all against the
  repo root mounted as a volume.
- scripts/build-with-docker.sh: reads GO_VERSION from go.mod, passes it as
  --build-arg to docker build, mounts tracefs and BTF into the container,
  writes the binary to the repo root.
- Magefile.go: adds BuildDocker target that wraps the script.
- README.md: simplified to the two build paths (Docker + native) with links to
  docs/; removed GOTOOLCHAIN=auto throughout.
- docs/build-rocky-linux-9.md: full manual Rocky 9 steps, libbpfgo toolchain
  setup/rollback, compile-once-run-everywhere explanation, and timing
  semantics.
- docs/tui-reference.md: complete TUI hotkey reference, recording mode details,
  and the .ior.zst vs Parquet trade-off table.
- AGENTS.md: removed GOTOOLCHAIN=auto from all build commands.
- internal/c/generated_tracepoints.c: regenerated against the host kernel.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

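
The commit message says `scripts/build-with-docker.sh` reads the Go version from go.mod and passes it to `docker build` as a build argument. The script itself is not part of this README diff, so the following is only a minimal sketch of how that step could work, not the script's actual contents. The function name `go_version_from_mod` and the image tag `ior-build` are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch; NOT the real scripts/build-with-docker.sh.

# Print the version named by the `go` directive of a go.mod file.
# Lines such as `toolchain go1.26.2` are skipped because their first
# field is "toolchain", not "go".
go_version_from_mod() {
    awk '$1 == "go" { print $2; exit }' "$1"
}

# Possible use (the image tag "ior-build" is an assumption):
#   GO_VERSION=$(go_version_from_mod go.mod)
#   docker build --build-arg GO_VERSION="$GO_VERSION" -t ior-build .
```

Deriving the version from go.mod keeps the container toolchain in step with the module's declared Go version without a second place to update.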
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 356
1 files changed, 36 insertions, 320 deletions
diff --git a/README.md b/README.md
index f5fb268..f031fc6 100644
--- a/README.md
+++ b/README.md
@@ -25,357 +25,73 @@ The demo is fully reproducible: `mage installDemoTools` once, then `sudo -v && m
## Requirements
- Go 1.26 or newer (ior relies on cgo via libbpfgo).
+- Linux with a BTF-enabled kernel (`/sys/kernel/btf/vmlinux` present).
-## Local libbpfgo Toolchain
+## Build
-`ior` links against a locally built `libbpfgo` checkout. By default
-`Magefile.go` expects that checkout at `../libbpfgo` relative to this repo; set
-`LIBBPFGO=/absolute/path/to/libbpfgo` if you keep it elsewhere.
+### Docker build (recommended — no toolchain setup required)
-Pin that checkout to `v0.9.2-libbpf-1.5.1` and rebuild the static artifacts
-before running `mage` targets:
+Builds the static `ior` binary inside a Rocky Linux 9 container and writes it
+to the repo root. Requires only Docker and a Linux host with tracefs and BTF:
```shell
-git -C ../libbpfgo checkout v0.9.2-libbpf-1.5.1
-git -C ../libbpfgo submodule update --init --recursive
-make -C ../libbpfgo libbpfgo-static
+mage buildDocker
```
-Validated commands for this pin:
+On first run this takes ~15–20 minutes to build the image. Subsequent runs
+reuse the cached image and finish in under a minute. To skip the image build:
```shell
-env GOTOOLCHAIN=auto mage world
-env GOTOOLCHAIN=auto mage integrationTest
+./scripts/build-with-docker.sh --run
```
-Troubleshooting and rollback:
+### Native build
-- If builds fail with `bpf/bpf.h` missing, re-run the checkout, submodule sync,
- and `make libbpfgo-static` commands above, then retry `env GOTOOLCHAIN=auto mage world`.
-- Prefer Mage targets over raw `go test` for packages that import `libbpfgo`;
- Mage injects the required `CGO_CFLAGS`, `CGO_LDFLAGS`, and `LIBBPFGO` values.
-- To roll back to the previous wrapper state, repin `go.mod` to
- `github.com/aquasecurity/libbpfgo v0.6.0-libbpf-1.3.0.20240111220235-90dbffffbdab`,
- then reset the sibling checkout and rebuild:
+`ior` links against a locally built `libbpfgo`. Clone it as a sibling of this
+repo and build the static archive once:
```shell
-git -C ../libbpfgo checkout 90dbffffbdab
+git clone https://github.com/aquasecurity/libbpfgo ../libbpfgo
+git -C ../libbpfgo checkout v0.9.2-libbpf-1.5.1
git -C ../libbpfgo submodule update --init --recursive
make -C ../libbpfgo libbpfgo-static
```
-## Timing Semantics
-
-Each reported event pair has two timing counters:
-
-- `durationNs`: syscall runtime on the same thread (`exit(current) - enter(current)`).
-- `durationToPrevNs`: inter-syscall gap on the same thread (`enter(current) - exit(previous)`).
-
-Important details:
-
-- `durationToPrevNs` is tracked per `tid` (thread), not globally across all threads.
-- The first observed syscall pair for a thread has `durationToPrevNs = 0` because there is no prior exit timestamp.
-- `durationToPrevNs` is attributed to the current syscall pair (the one whose `enter` closes the gap).
-- There is no separate "idle" pseudo-event bucket; use the `durationToPrev` count field when aggregated flamegraph output should emphasize inter-syscall time.
-
-## Rocky Linux 9
-
-Verified on a fresh Rocky Linux 9.7 install (e.g. kernel `5.14.0-611.5.1.el9_7`,
-exact stamp not required). Runs on the **stock RHEL 9 kernel** — no kernel
-upgrade needed. One build-time caveat:
-
-- Rocky 9 ships neither `libelf.a` nor `libzstd.a` (no `*-static` packages). Both have
- to be built from source — the elfutils dance is the same as the Fedora section above;
- `libzstd.a` needs an extra `make` from the upstream tarball.
-
-> Historical note. Earlier versions of `ior` typed BPF tracepoint context as
-> `struct trace_event_raw_sys_enter`/`_exit` (the BTF-emitted alias). RHEL 9
-> backports an `rt`-tree patch that adds `preempt_lazy_count` to `struct
-> trace_entry`, which widens those aliases by 8 bytes and shifts the `args`/`ret`
-> offsets — but the actual context the kernel hands the program is still
-> `struct syscall_trace_enter`/`_exit`, where the offsets did not move. The
-> verifier saw the program reading past `max_ctx_offset` and rejected the
-> attach with `EACCES`. `ior` now uses `syscall_trace_*` directly (matching
-> the [bcc fix](https://github.com/iovisor/bcc/pull/4920) and inspektor-gadget),
-> so the stock kernel works with no workaround.
+Then build everything:
```shell
-# 1) Enable repos and install build dependencies (CRB ships static libs).
-sudo dnf config-manager --set-enabled crb
-sudo dnf install -y epel-release
-sudo dnf install -y gcc clang bpftool elfutils-libelf-devel zlib-static \
- glibc-static libzstd-devel git make cmake wget rpmdevtools strace bpftrace
-sudo dnf builddep -y elfutils
-
-# 2) Install Go 1.26 from go.dev (Rocky 9 ships only Go 1.25; ior needs 1.26+).
-cd /tmp
-wget -q https://go.dev/dl/go1.26.2.linux-amd64.tar.gz
-sudo tar -C /usr/local -xf go1.26.2.linux-amd64.tar.gz
-echo 'export PATH=/usr/local/go/bin:$HOME/go/bin:$PATH' | sudo tee /etc/profile.d/go.sh
-source /etc/profile.d/go.sh
-
-# 3) Build libelf.a from elfutils source (same trick as the Fedora section).
-mkdir -p ~/src && cd ~
-dnf download --source elfutils-libelf
-rpm -ivh elfutils-*.src.rpm
-tar -C ~/src -xjf rpmbuild/SOURCES/elfutils-*.tar.bz2
-cd ~/src/elfutils-*
-./configure --enable-deterministic-archives --disable-debuginfod --disable-libdebuginfod
-make -C lib -j$(nproc)
-make -C libelf -j$(nproc)
-sudo cp -v libelf/libelf.a /usr/lib64/
-
-# 4) Build libzstd.a from upstream (libzstd-devel does not ship the static archive).
-cd /tmp
-wget -q https://github.com/facebook/zstd/releases/download/v1.5.5/zstd-1.5.5.tar.gz
-tar xzf zstd-1.5.5.tar.gz
-make -C zstd-1.5.5/lib -j$(nproc) libzstd.a
-sudo cp -v zstd-1.5.5/lib/libzstd.a /usr/lib64/
-
-# 5) Clone ior + libbpfgo, pin libbpfgo, build the static archive, install mage.
-mkdir -p ~/git
-git clone https://codeberg.org/snonux/ior ~/git/ior
-git clone https://github.com/aquasecurity/libbpfgo ~/git/libbpfgo
-git -C ~/git/libbpfgo checkout v0.9.2-libbpf-1.5.1
-git -C ~/git/libbpfgo submodule update --init --recursive
-make -C ~/git/libbpfgo libbpfgo-static
-go install github.com/magefile/mage@latest
-
-# 6) Generate against the live kernel (the syscall-coverage audit is
-# kernel-specific; IOR_FORCE_GENERATE skips the strict diff against the
-# committed audit which was generated on a different kernel build).
-cd ~/git/ior
-env IOR_FORCE_GENERATE=1 GOTOOLCHAIN=auto mage generate
-env GOTOOLCHAIN=auto mage all
-
-# 7) Smoke test.
-sudo ./ior -plain -duration 5
+mage world
```
-If `./ior -plain -duration 5` prints `Probing for 5s` and a stream of CSV rows,
-the install is good.
+For Rocky Linux 9 specific steps (building static libelf/libzstd, installing Go
+1.26) see [docs/build-rocky-linux-9.md](./docs/build-rocky-linux-9.md).
## Compile once, run everywhere
-The full build dance above only has to happen on **one** machine. The resulting
-`ior` binary is portable across Linux hosts: `scp ior other-host:/usr/local/bin/`
-and run it there. Two reasons it works:
-
-- The Go binary is compiled with `-extldflags "-static"` and links libbpf,
- libelf, libzstd, and zlib as static archives. There is no runtime dependency
- on the build host's library versions (a couple of glibc resolver functions —
- `getpwnam_r` and friends — fall back to the target's libc, which is fine on
- any reasonable distro).
-- The BPF object inside the binary is built with libbpf's CO-RE
- (Compile-Once, Run-Everywhere) machinery. Field offsets are not baked into
- the bytecode; libbpf reads the target kernel's BTF
- (`/sys/kernel/btf/vmlinux`) at load time and patches the program for that
- kernel. As long as the target ships BTF — true on every Debian, Ubuntu,
- Fedora, Arch, RHEL, and now ElRepo `kernel-ml` build at the time of
- writing — the same `ior` binary runs without recompilation.
-
-So in practice: pick one Rocky 9 / Fedora box, do the build dance once, then
-distribute the 23 MB binary to wherever you want to trace. The build host needs
-all the dev tooling; the trace hosts need only a BTF-enabled kernel and `sudo`.
+Build on one machine, then `scp ior other-host:/usr/local/bin/` and run it
+anywhere. The binary is fully statically linked and uses libbpf CO-RE
+(Compile-Once, Run-Everywhere) to adapt field offsets to the target kernel's
+BTF at load time — no recompile per host or kernel version needed.
-For the eBPF + CO-RE explanation, see Part 2 of the I/O Riot NG blog series:
-[Unveiling I/O Riot NG — Part 2: under the hood](https://foo.zone/gemfeed/unveiling-ior-ng-part-2.html).
+See [docs/build-rocky-linux-9.md](./docs/build-rocky-linux-9.md) for the full
+explanation.
-## TUI Flamegraphs
+## TUI
-Flamegraphs are available only inside the TUI dashboard.
-Use `-fields` to change the stack order and `-count` to choose the metric.
-The default stack order is `comm,path,tracepoint` (bottom to top).
+Press **H** inside the dashboard to toggle the built-in help panel. Tabs are
+reachable with **tab/shift+tab** or number keys **1–6**. Full hotkey reference:
+[docs/tui-reference.md](./docs/tui-reference.md).
## Recording Modes
-`ior` has four distinct output flows. They are intentionally different:
-
-| Mode | How to use it | What it writes | Filter behavior |
-| --- | --- | --- | --- |
-| TUI dashboard | default startup | nothing continuously; data stays in memory unless you export | current TUI/global filters drive what you see |
-| TUI CSV snapshot export | press `e` in the dashboard | one `ior-stream-<timestamp>.csv` snapshot of the current filtered stream view | exports only the currently filtered in-memory rows |
-| Headless `.ior.zst` export | start with `-flamegraph -name <name>` | one aggregated native trace artifact written at shutdown | no TUI filter stack; this is the native trace/integration workflow |
-| Parquet recording | press `R` in the TUI, or start with `-parquet <file>` | a streaming Parquet file of traced syscall rows | TUI mode records rows that pass the active TUI filter; headless `-parquet` records all traced rows |
-
-Important distinction:
-
-- `.ior.zst` output is an aggregated native artifact, not a row-by-row event log.
-- CSV export is a point-in-time snapshot of the ring buffer.
-- Parquet recording is a streaming capture from start to stop.
-- The ring buffer is capped, so CSV export is not a replacement for Parquet recording or `.ior.zst` output.
-
-### Headless Native `.ior.zst` Output
-
-Use `-flamegraph` when you want the native `ior` trace artifact instead of a streaming row log:
-
-```shell
-sudo ./ior -flamegraph -name trace-run -duration 60
-```
-
-Native `.ior.zst` behavior:
-
-- writes one `*.ior.zst` file when the run ends
-- stores aggregated counters for repeated syscall/path/process combinations
-- is intended for `ior`'s native flamegraph and integration-style workflows
-- does not preserve one output row per traced syscall
-
-### TUI Parquet Recording
-
-Start a recording from the dashboard with `R`.
-
-- First `R`: open a filename prompt (`ior-recording-<timestamp>.parquet` by default).
-- `Enter`: start recording to that file.
-- Second `R`: stop and finalize the active Parquet file.
-- Recording stops automatically when you quit the TUI or reselect PID/TID/session scope.
-
-Lifecycle details:
-
-- TUI recording uses the active TUI global filter at emission time.
-- If a filter change restarts tracing, the recorder stays alive and continues writing matching rows after the restart.
-- The dashboard footer shows the active recording path or the last recording error.
-
-### Headless Parquet Recording
-
-Use `-parquet` to skip the TUI and stream traced syscall rows directly to a Parquet file:
-
-```shell
-sudo ./ior -parquet trace.parquet -duration 60
-```
-
-Headless Parquet mode behavior:
-
-- skips the TUI completely
-- records all traced rows
-- rejects content filters such as `-comm`, `-path`, `-pid`, and `-tid`
-- cannot be combined with `-plain`, `-flamegraph`, `--testflames`, or `--testliveflames`
+`ior` has four distinct output flows:
-Use headless mode when you want a full recording, and TUI mode when you want interactive filtering plus optional start/stop recording from the dashboard.
-
-### Choosing Between `.ior.zst` and Parquet
-
-Both formats are useful, but they solve different problems:
-
-| Question | Native `.ior.zst` | Parquet |
+| Mode | How to use it | What it writes |
| --- | --- | --- |
-| Data shape | aggregated counters | one row per traced syscall |
-| Write pattern | collect in memory, write one compressed artifact at the end | stream rows continuously while recording |
-| Best for | `ior`-native trace artifacts, flamegraph workflows, integration assertions | offline analysis in other tools, long captures, preserving per-event detail |
-| Relative write cost | usually lower because repeated events are folded together before file write | usually higher because each traced row is serialized |
-| Detail retained | loses original row order and per-event granularity | keeps per-event timing and syscall fields |
-
-Rule of thumb:
-
-- choose `.ior.zst` when you want the native `ior` artifact and do not need every traced syscall row preserved
-- choose Parquet when you want a full event stream for downstream analysis outside `ior`
-
-## TUI Navigation
-
-The TUI interface provides an in‑screen help panel (toggle with **H**) that lists all available keys. Use this help screen to discover navigation shortcuts.
-
-You can move between dashboard tabs:
-
-- **tab** – next dashboard tab
-- **shift+tab** – previous dashboard tab
-- **1** – Overview
-- **2** – Syscalls
-- **3** – Files
-- **4** – Processes
-- **5** – Latency+Gaps
-- **6** – Stream
-
-The bottom hint shows `press H for help` when the help is hidden.
-
-
-
-The TUI has two key scopes:
-
-- Global hotkeys: available from dashboard screens.
-- Dashboard hotkeys: behavior that depends on the active dashboard tab (especially `6:Stream`).
-
-Help visibility:
-
-- `H`: toggle bottom help sections on/off.
-- By default, help is hidden and the bottom hint shows `press H for help`.
-
-### Global Hotkeys
-
-- `tab`: next dashboard tab.
-- `shift+tab`: previous dashboard tab.
-- `1`: `Overview` tab.
-- `2`: `Syscalls` tab.
-- `3`: `Files` tab.
-- `4`: `Processes` tab.
-- `5`: `Latency+Gaps` tab.
-- `6`: `Stream` tab.
-- `7`: `Stream` tab (alias).
-- `e`: export filtered stream rows to CSV (`ior-stream-<timestamp>.csv`) in current working directory.
-- `R`: start or stop Parquet recording from the TUI dashboard.
-- `p`: re-open process selector (PID selection flow).
-- `t`: open TID selector flow.
-- `o`: open probe selection/toggling dialog.
-- `r`: refresh dashboard snapshot.
-- `q` or `ctrl+c`: quit.
-
-### Dashboard / Tab-Specific Hotkeys
-
-- `d` in `3:Files`: toggle directory-grouped files view.
-- `s` in sortable table tabs (`2:Syscalls`, `3:Files`, `4:Processes`): sort by the selected column using that table's default direction.
-- `S` in sortable table tabs (`2:Syscalls`, `3:Files`, `4:Processes`): reverse-sort by the selected column.
-- `j/k` or `up/down` in list-like tabs (`2:Syscalls`, `3:Files`, `4:Processes`): scroll list.
-
-`left/right` and `h/l` do not switch tabs. In `6:Stream` paused mode they move selected column.
-
-### 6:Stream Hotkeys and Behavior
-
-`6:Stream` has two modes:
-
-- Live mode (`paused=false`): rows update continuously.
-- Pause mode (`paused=true`): selection/cell/filter/search/export workflows are enabled.
-
-Core controls:
-
-- `space`: toggle live/pause.
-- `g`/`G`: jump to top/tail.
-- `c`: clear stream filters.
-- `f`: open advanced filter modal.
-- `j/k` or `up/down`: move selected row in pause mode; scroll in live mode.
-- `left/right` or `h/l`: move selected column in pause mode.
-
-#### Enter-Based Filter Stack (Pause Mode)
-
-In pause mode, `enter` on the selected cell pushes a new filter onto a stack and immediately re-filters the current ring buffer snapshot. Filters are stackable.
-
-- String columns use case-insensitive substring match:
-- `Comm` -> `comm~<value>`
-- `Syscall` -> `syscall~<value>`
-- `File` -> `file~<value>`
-- Numeric exact match:
-- `PID`, `TID`, `FD`, `Ret`, `Bytes`
-- Numeric threshold (`>=`):
-- `Latency` -> `latency>=selected_value`
-- `Gap` -> `gap>=selected_value`
-
-Undo:
-
-- `esc` in pause mode pops the most recent filter from the stack (LIFO).
-- Repeated `esc` keeps undoing until no stacked filters remain.
-
-#### Regex Search (Pause Mode)
-
-- `/`: open regex prompt and search forward.
-- `?`: open regex prompt and search backward.
-- Search checks all stream columns/fields and wraps around ring-buffer rows.
-- `n`: next match in the same direction as last `/` or `?`.
-- `N`: previous match (opposite direction).
-
-#### Stream CSV Export (Pause Mode)
-
-- `x`: quick export filtered stream rows to CSV (`ior-stream-<timestamp>.csv`).
-- `X`: export filtered stream rows to CSV with filename prompt.
-- `E`: open last stream-exported CSV in foreground editor (`EDITOR` -> `VISUAL` -> `SUDO_EDITOR` -> fallback `hx`, else `vi`).
-
-Export behavior:
+| TUI dashboard | default startup | nothing — data stays in memory until export |
+| TUI CSV snapshot | press `e` | `ior-stream-<timestamp>.csv` of filtered stream |
+| Headless `.ior.zst` | `-flamegraph -name <name>` | aggregated native trace artifact |
+| Parquet recording | press `R` in TUI, or `-parquet <file>` | streaming Parquet file |
-- `e` exports a fresh filtered stream snapshot using the current shared TUI filter, even outside paused mode.
-- `x`/`X` export the currently paused stream rows, preserving the stream tab's exact paused view.
+Full details and the `.ior.zst` vs Parquet trade-off:
+[docs/tui-reference.md](./docs/tui-reference.md).
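
The new README text requires a BTF-enabled kernel (`/sys/kernel/btf/vmlinux` present) and, for the Docker build, a host with tracefs. A hedged preflight sketch for checking both before a build could look like the following. The function `check_host` is hypothetical and not part of the ior repository, and the tracefs default of `/sys/kernel/tracing` is an assumption about where the host mounts it (some systems expose it under `/sys/kernel/debug/tracing` instead).

```shell
#!/bin/sh
# Hypothetical preflight check; not part of the ior repository.

# check_host [BTF_FILE] [TRACEFS_DIR]
# Returns 0 when the BTF file is readable and the tracefs directory
# exists; otherwise returns 1 and reports what is missing on stderr.
check_host() {
    btf=${1:-/sys/kernel/btf/vmlinux}
    tracefs=${2:-/sys/kernel/tracing}   # assumed default mount point
    status=0
    if [ ! -r "$btf" ]; then
        echo "missing BTF: $btf" >&2
        status=1
    fi
    if [ ! -d "$tracefs" ]; then
        echo "missing tracefs: $tracefs" >&2
        status=1
    fi
    return "$status"
}
```

One way to use it is `check_host && mage buildDocker`, so that a missing prerequisite is reported before the long first image build starts.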