| field | value | date |
|---|---|---|
| author | Paul Buetow <paul@buetow.org> | 2026-03-18 20:54:35 +0200 |
| committer | Paul Buetow <paul@buetow.org> | 2026-03-18 20:54:35 +0200 |
| commit | cd554b0af706b5f62b4e1bfde04091052b4aac61 | |
| tree | e6d02f1c2a1da27da17386e8832c2d4a3e699cdf /docs | |
| parent | b421b2232351049277ee4ad5b31367bb2b6779bb | |
cleanup
Diffstat (limited to 'docs')
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | docs/libbpfgo-upgrade-plan.md | 154 |
| -rw-r--r-- | docs/parquet-recording-perf-baseline.md | 113 |
| -rw-r--r-- | docs/tui-dashboard-table-sorting-plan.md | 336 |
| -rw-r--r-- | docs/tui-flamegraph-behavior.md | 46 |
| -rw-r--r-- | docs/tui-flamegraph-plan.md | 450 |
| -rw-r--r-- | docs/tui-global-filter-architecture.md | 160 |
6 files changed, 0 insertions, 1259 deletions
diff --git a/docs/libbpfgo-upgrade-plan.md b/docs/libbpfgo-upgrade-plan.md deleted file mode 100644 index 1969015..0000000 --- a/docs/libbpfgo-upgrade-plan.md +++ /dev/null @@ -1,154 +0,0 @@ -# libbpfgo Upgrade Plan - -## Goal - -Upgrade `ior` from `github.com/aquasecurity/libbpfgo v0.6.0-libbpf-1.3.0...` -to the latest tagged upstream release `v0.9.2-libbpf-1.5.1`, and align the -repo's Go module, local static-link toolchain checkout, build instructions, and -runtime validation on that same tag. - -## Current State - -- `go.mod` / `go.sum` now pin - `github.com/aquasecurity/libbpfgo v0.9.2-libbpf-1.5.1` -- `Magefile.go` defaults to the sibling checkout at `../libbpfgo` (local path: - `/home/paul/git/libbpfgo`) and emits rebuild guidance if static - artifacts are missing -- The local checkout is currently ahead of the latest tag: - `v0.9.2-libbpf-1.5.1-23-g9a319d2` -- `README.md`, `AGENTS.md`, and `integrationtests/README.md` now pin the tag, - sync the `libbpf` submodule, and document the rebuild or validation workflow -- Integration coverage now passes again after restoring the legacy - `-flamegraph` / `-name` compatibility path used by the harness to collect - `.ior.zst` artifacts - -## Upgrade Target - -- Upstream tag: `v0.9.2-libbpf-1.5.1` -- Local checkout to use for static headers/archive: - `/home/paul/git/libbpfgo` -- Repo-relative default checkout path used by `Magefile.go`: `../libbpfgo` -- Override path for local experiments: `LIBBPFGO=/absolute/path/to/libbpfgo` -- Do not target `libbpfgo` `main` as part of this upgrade unless a tagged - release blocker is found - -## Pinned Source of Truth - -- `go.mod` / `go.sum` pin `github.com/aquasecurity/libbpfgo - v0.9.2-libbpf-1.5.1` -- `README.md`, `AGENTS.md`, and `integrationtests/README.md` document the same - checkout, tag, validation commands, and `make libbpfgo-static` workflow -- `Magefile.go` fails with explicit rebuild guidance when the local - `libbpfgo` checkout is missing the static artifacts that 
`ior` expects -- `internal/ior.go` preserves the legacy `-flamegraph` / `-name` trace-output - path required by the integration harness while leaving TUI and `-plain` - behavior unchanged - -## Breaking-Change Watchpoints - -- `v0.8.0-libbpf-1.5` includes a `BPFProg` API alignment change -- `v0.9.1-libbpf-1.5.1` changes `AttachUprobe` / - `AttachURetprobe` signatures -- `libbpf` minimum version moves from `1.3.x` to `1.5.1` -- Static builds require `git submodule update --init --recursive` in the local - `libbpfgo` checkout before `make libbpfgo-static` - -`ior` appears to use a narrow subset of APIs: - -- module loading (`NewModuleFromFile`, `NewModuleFromBuffer`, `BPFLoadObject`) -- maps (`GetMap`, `SetMaxEntries`, `InitGlobalVariable`) -- ringbuf (`InitRingBuf`) -- program lookup and tracepoint attach (`GetProgram`, `AttachTracepoint`) - -The direct API-break risk is therefore expected to be low, but compile/runtime -validation is still required. - -## Implementation Workstreams - -1. Align the version source of truth - - Pin `go.mod` / `go.sum` to `v0.9.2-libbpf-1.5.1` - - Align the local checkout instructions in `README.md` - - Align `AGENTS.md` and `Magefile.go` guidance with the same tag and rebuild flow - - Ensure the local checkout is reset to the exact tag and rebuilt - -2. Rebuild the local static toolchain - - In `/home/paul/git/libbpfgo`: - - `git checkout v0.9.2-libbpf-1.5.1` - - `git submodule update --init --recursive` - - `make libbpfgo-static` - -3. Compile and fix `ior` - - Rebuild `ior` against the upgraded wrapper and static `libbpf` - - Fix any compile/API regressions in: - - `internal/ior.go` - - `internal/bpfsetup.go` - - `internal/bpfembed.go` - - any `probemanager` adapter code if signatures changed - -4. 
Validate behavior - - Run `env GOTOOLCHAIN=auto mage world` - - Run root-required `env GOTOOLCHAIN=auto mage integrationTest` - - Specifically verify: - - embedded `ior.bpf.o` loading still works - - tracepoint attach/detach still works - - ring buffer event ingestion still works - - static build/link flags still work with the rebuilt local checkout - -5. Finalize docs and rollback guidance - - Document the exact `libbpfgo` tag and rebuild commands - - Mention the local checkout path used by `Magefile.go` - - Add troubleshooting notes for submodule sync / static rebuild failures - - Record the rollback target: `go.mod` pseudo-version - `v0.6.0-libbpf-1.3.0.20240111220235-90dbffffbdab` plus local checkout - commit `90dbffffbdab` - -## Validation Result - -- `env GOTOOLCHAIN=auto mage world` passed after the pinning commit - `f28dab3` -- `env GOTOOLCHAIN=auto mage integrationTest` passed after compatibility fix - commit `28338f4` -- The embedded-object path is covered by - `env GOTOOLCHAIN=auto TEST_NAME=TestLoadBPFModuleUsesEmbeddedObjectByDefault mage testWithName` - -## Troubleshooting - -- Missing `bpf/bpf.h` or `libbpf` symbols usually means the sibling checkout is - not at `v0.9.2-libbpf-1.5.1` or was not rebuilt after a `git checkout`. -- Raw `go test` can still fail for packages that import `libbpfgo` because it - does not inherit the `CGO_CFLAGS`, `CGO_LDFLAGS`, and `LIBBPFGO` values that - `Magefile.go` sets up. Use Mage targets for validated flows. -- If integration tests fail immediately with unknown `-flamegraph` / - `-name` flags, rebuild `ior` from a checkout that includes commit `28338f4`. 
- -## Rollback - -If the tagged release proves insufficient, revert the `ior` side to -`github.com/aquasecurity/libbpfgo -v0.6.0-libbpf-1.3.0.20240111220235-90dbffffbdab`, reset the sibling checkout, -and rebuild: - -```bash -git -C /home/paul/git/libbpfgo checkout 90dbffffbdab -git -C /home/paul/git/libbpfgo submodule update --init --recursive -make -C /home/paul/git/libbpfgo libbpfgo-static -``` - -## Validation Commands - -- `GOTOOLCHAIN=auto mage test` -- `GOTOOLCHAIN=auto mage world` -- `GOTOOLCHAIN=auto mage integrationTest` - -## References - -- Repo files: - - `go.mod` - - `README.md` - - `AGENTS.md` - - `Magefile.go` - - `internal/ior.go` - - `internal/bpfsetup.go` - - `internal/bpfembed.go` -- Local toolchain checkout: - - `/home/paul/git/libbpfgo` diff --git a/docs/parquet-recording-perf-baseline.md b/docs/parquet-recording-perf-baseline.md deleted file mode 100644 index e1731a7..0000000 --- a/docs/parquet-recording-perf-baseline.md +++ /dev/null @@ -1,113 +0,0 @@ -# Parquet Recording Performance Baseline - -Captured on 2026-03-13 from the benchmark task using the current Parquet recording implementation. - -## Reproduction - -Run the pipeline benchmark profiler: - -```bash -env GOTOOLCHAIN=auto mage benchProf -``` - -This writes timestamped pipeline profiles under `bench-profiles/`. 
The baseline captured for this run was: - -- `bench-profiles/pipeline-20260313-054719-cpu.prof` -- `bench-profiles/pipeline-20260313-054719-mem.prof` -- `bench-profiles/pipeline-20260313-054719-block.prof` - -Useful follow-up commands: - -```bash -env GOTOOLCHAIN=auto go tool pprof -top bench-profiles/pipeline-20260313-054719-cpu.prof -env GOTOOLCHAIN=auto go tool pprof -top -sample_index=alloc_space bench-profiles/pipeline-20260313-054719-mem.prof -env GOTOOLCHAIN=auto go tool pprof -top -sample_index=inuse_space bench-profiles/pipeline-20260313-054719-mem.prof -env GOTOOLCHAIN=auto go tool pprof -top bench-profiles/pipeline-20260313-054719-block.prof -``` - -## Baseline Numbers - -`mage benchProf` recorded the parquet-specific pipeline benchmarks at: - -- `BenchmarkPipelineHeadlessParquetCapture`: `14.20 ms/op`, `2000 pairs/op`, `347159 B/op`, `7212 allocs/op` -- `BenchmarkPipelineTUIParquetRecording`: `19.13 ms/op`, `2000 pairs/op`, `994016 B/op`, `19873 allocs/op` - -Interpretation: - -- The TUI recording path is about 35% slower than the headless parquet path for the same synthetic stream. -- The TUI recording path allocates about 2.9x more memory per operation because it also exercises the stats engine, ring buffer, live trie, and stream fanout path. - -## CPU Findings - -Top CPU samples were still dominated by the core event-loop path rather than parquet serialization itself: - -- `(*eventLoop).processRawEvent` and `(*eventLoop).tracepointExited` were the heaviest cumulative runtime buckets. -- `file.NewFdWithPid` and `os.Readlink` remained a large cumulative cost in exit handling and fd/path materialization. -- Channel scheduling (`runtime.chansend`, `runtime.chanrecv`, `runtime.selectgo`) stayed visible, especially in the TUI fanout path. -- Parquet-specific work was present but secondary: `parquet.(*Recorder).runSession`, `parquet.(*Writer).Close`, parquet-go column flushing, and Zstd compression showed up as meaningful but not dominant contributors. 
- -## Allocation Findings - -Allocation-space profile highlights: - -- `benchmarkPipelineMix` still accounted for the single largest allocation bucket because it rebuilds the synthetic raw-event stream for each benchmark run. -- `os.Readlink`, `file.(*FdFile).Dup`, and `file.NewFdWithPid` remained major allocators in the traced event path. -- TUI-only structures added measurable cost: - - `tui/eventstream.NewRingBuffer` - - `parquet.newRecordingSession` - - `benchmarkPipelineTUIParquet` -- Parquet writer lifecycle allocations were visible but bounded: - - parquet-go column buffers - - Zstd encoder initialization - - recorder session queue allocation - -Retained in-use memory was modest and dominated by parquet-go writer buffers and Zstd encoder state during flush/close: - -- `parquet-go/internal/memory.newSlice` -- parquet column buffer construction -- Zstd encoder initialization blocks - -## Contention Findings - -The block profile did not show a recorder lock hotspot. It was dominated by channel waits: - -- `runtime.chanrecv2`: about 65.8% of blocked time -- `runtime.chanrecv1`: about 31.8% of blocked time - -Most blocked time came from long-lived background workers waiting on channels, especially comm resolver workers. That means the current parquet path does not yet show a major mutex-contention bottleneck; the bigger costs are work done per event and the extra TUI fanout/allocation load. - -## Optimization Targets - -These are the highest-value targets for the follow-up optimization task: - -- Reduce fd/path resolution overhead in the event loop, especially `Readlink`-driven work in `file.NewFdWithPid`. -- Lower TUI recording allocations by reusing stream fanout buffers and reducing ring-buffer/session setup churn. -- Revisit recorder/session and parquet writer setup costs if recordings are started frequently in short sessions. 
-- Only optimize parquet compression or flush behavior after confirming they dominate a focused headless profile; they are not currently the primary cost center. - -## Verified Follow-up Win - -After profiling, the first optimization pass removed the extra TUI `streamEvents` channel hop and pushed directly into the mutex-protected ring buffer. - -Re-run command: - -```bash -env GOTOOLCHAIN=auto mage benchProf -``` - -Optimized pipeline artifacts: - -- `bench-profiles/pipeline-20260313-055321-cpu.prof` -- `bench-profiles/pipeline-20260313-055321-mem.prof` -- `bench-profiles/pipeline-20260313-055321-block.prof` - -Benchmark comparison for the changed path: - -| Benchmark | Before | After | Change | -| --- | --- | --- | --- | -| `BenchmarkPipelineTUIParquetRecording` | `19.13 ms/op`, `994016 B/op`, `19873 allocs/op` | `16.51 ms/op`, `992334 B/op`, `19866 allocs/op` | about `13.7%` faster with a small allocation reduction | - -Notes: - -- `BenchmarkPipelineHeadlessParquetCapture` also moved between runs, but that path was not changed; treat that difference as benchmark noise rather than a verified optimization win. -- Post-change CPU samples still show the event loop and fd/path resolution dominating overall cost, so the next optimization pass should stay focused on those areas instead of tuning parquet compression first. diff --git a/docs/tui-dashboard-table-sorting-plan.md b/docs/tui-dashboard-table-sorting-plan.md deleted file mode 100644 index 0d4586e..0000000 --- a/docs/tui-dashboard-table-sorting-plan.md +++ /dev/null @@ -1,336 +0,0 @@ -# TUI Dashboard Table Sorting Plan - -## Overview - -Add column-driven sorting to the dashboard table views for: - -- `3:Syscalls` -- `4:Files` -- `5:Processes` - -This is a **table-view-only** feature. Bubble, treemap, and icicle modes keep -their existing ordering rules. - -The task wording says "sort by any row", but the current dashboard already -tracks both a selected row and a selected column. 
This plan therefore treats -`s` as **sort by the currently selected column/cell**. - -Pressing `s`: - -1. on a new selected column enables that column's sort order -2. again on the same selected column clears the custom sort and restores the - tab's current default ordering - -## Current Behavior - -The dashboard already has the key pieces needed for this feature: - -- `internal/tui/dashboard/model.go` - - stores row selection and selected column for Syscalls, Files, and Processes - - routes table navigation with `left/right` and `h/l` -- `internal/tui/dashboard/syscalls.go` - - renders the syscall table from `snap.Syscalls()` -- `internal/tui/dashboard/files.go` - - renders both the file table and the grouped-directory table -- `internal/tui/dashboard/processes.go` - - renders the process table - -The current default ordering comes from the snapshot producers: - -- Syscalls: `Count desc`, then `Name asc` -- Files: `Accesses desc`, then `Path asc` -- Grouped directories: `Accesses desc`, then `Directory asc` -- Processes: `Syscalls desc`, then `Bytes desc`, then `PID asc` - -That ordering should remain the baseline whenever no custom sort is active. - -## Design Goals - -- `s` sorts by the selected column in table mode. -- `s` on the same selected column toggles back to the default ranking. -- `Enter` continues to act on the row currently visible on screen after sorting. -- Sorting stays in the dashboard layer; `statsengine` snapshot semantics do not - change. -- Selection remains anchored to the same logical entity when sorting changes. -- Width changes do not corrupt sort state for the Syscalls tab. - -## UX Rules - -- `s` is active only for sortable dashboard tables: - - Syscalls table mode - - Files table mode - - Files directory-grouped table mode - - Processes table mode -- `s` does nothing in: - - Overview - - Latency+Gaps - - Stream - - Flame - - bubble/treemap/icicle modes -- Table footer hints should add `s:sort`. 
-- The footer should also show the active sort, for example: - - `sort: default` - - `sort: p95 desc` - - `sort: Path asc` -- Expanded help should mention `s` so the feature is discoverable. - -## State Model - -Add dashboard-local sort state per table shape. - -Example shape: - -```go -type tableSortState[K comparable] struct { - active bool - key K -} -``` - -Recommended fields on `dashboard.Model`: - -- `syscallsSort` -- `filesSort` -- `filesDirSort` -- `processesSort` - -`Files` needs **two** sort states because the tab has two different table -schemas: - -- file rows -- grouped directory rows - -Those states should persist independently when `d` toggles between files and -directories. - -## Logical Sort Keys - -Do **not** store the raw selected column index as the sort identifier. - -The Syscalls table changes shape by width: - -- narrow layout: `Syscall Count Rate/s Avg p95 p99 Bytes Errors` -- wide layout: `Syscall Count Rate/s Avg Min Max p50 p95 p99 Bytes Errors` - -If sort state stored only a column index, resizing from narrow to wide would -turn "sort by p95" into "sort by Min". The sort state must therefore use a -stable logical key enum, and map the current visible column index to that enum -at keypress time. - -Recommended enums: - -- `syscallSortKey` -- `fileSortKey` -- `fileDirSortKey` -- `processSortKey` - -## Column Ordering Rules - -Use a fixed natural direction per logical column. This avoids inventing a -three-state cycle and matches the task requirement of "sort" plus "toggle back". 
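The logical sort keys described above can be sketched as follows. This is a minimal illustration, not the dashboard's actual code: the constant names, the two column slices, and `syscallSortKeyForColumn` are hypothetical. Only the `syscallSortKey` type name and the narrow and wide column orders come from the plan.

```go
package main

import "fmt"

// syscallSortKey is a stable logical identifier for a syscall table column.
// The constant names below are illustrative assumptions.
type syscallSortKey int

const (
	sortBySyscall syscallSortKey = iota
	sortByCount
	sortByRate
	sortByAvg
	sortByMin
	sortByMax
	sortByP50
	sortByP95
	sortByP99
	sortByBytes
	sortByErrors
)

// Visible column order per layout, as listed in the plan.
var narrowSyscallColumns = []syscallSortKey{
	sortBySyscall, sortByCount, sortByRate, sortByAvg,
	sortByP95, sortByP99, sortByBytes, sortByErrors,
}

var wideSyscallColumns = []syscallSortKey{
	sortBySyscall, sortByCount, sortByRate, sortByAvg, sortByMin, sortByMax,
	sortByP50, sortByP95, sortByP99, sortByBytes, sortByErrors,
}

// syscallSortKeyForColumn maps a visible column index to its logical key.
// It reports false when the index is outside the current layout.
func syscallSortKeyForColumn(wide bool, col int) (syscallSortKey, bool) {
	cols := narrowSyscallColumns
	if wide {
		cols = wideSyscallColumns
	}
	if col < 0 || col >= len(cols) {
		return 0, false
	}
	return cols[col], true
}

func main() {
	// The same visible index means different columns in each layout.
	narrow, _ := syscallSortKeyForColumn(false, 4)
	wide, _ := syscallSortKeyForColumn(true, 4)
	fmt.Println(narrow == sortByP95, wide == sortByMin)
}
```

Because the stored state is the key rather than the index, a resize only changes which visible column is marked as sorted, not what the table is sorted by.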
- -### Syscalls - -- `Syscall`: `Name asc` -- `Count`: `Count desc` -- `Rate/s`: `RatePerSec desc` -- `Avg`: `LatencyMeanNs desc` -- `Min`: `LatencyMinNs desc` -- `Max`: `LatencyMaxNs desc` -- `p50`: `LatencyP50Ns desc` -- `p95`: `LatencyP95Ns desc` -- `p99`: `LatencyP99Ns desc` -- `Bytes`: `Bytes desc` -- `Errors`: `Errors desc` - -### Files - -- `Accesses`: `Accesses desc` -- `Read`: `BytesRead desc` -- `Write`: `BytesWritten desc` -- `Avg Latency`: `AvgLatencyNs desc` -- `Max Latency`: `MaxLatencyNs desc` -- `Path`: `Path asc` - -### Grouped Directories - -- `Accesses`: `Accesses desc` -- `Read`: `BytesRead desc` -- `Write`: `BytesWritten desc` -- `Avg Latency`: `AvgLatencyNs desc` -- `Max Latency`: `MaxLatencyNs desc` -- `Files`: `FileCount desc` -- `Directory`: `Dir asc` - -### Processes - -- `PID`: `PID asc` -- `Comm`: `Comm asc` -- `Syscalls`: `Syscalls desc` -- `Rate/s`: `RatePerSec desc` -- `Total Bytes`: `Bytes desc` -- `Avg Latency`: `AvgLatencyNs desc` - -## Comparator Rules - -For deterministic output, custom comparators should fall back to the existing -default ranking for that row type. - -Examples: - -- `p95 desc`, then syscall default order -- `Path asc`, then file default order -- `Comm asc`, then process default order - -This keeps ties stable and makes the "toggle back to default" behavior -predictable. - -## Selection Anchoring - -Changing sort order must not leave the cursor on the same numeric row index if -that index now points to a different entity. - -Before toggling sort: - -1. capture the currently selected logical entity key -2. recompute the sorted rows -3. restore the selected row to the same entity in the new order -4. 
if the entity no longer exists, clamp as today - -Recommended identity keys: - -- Syscalls: `Name` -- Files: `Path` -- Grouped directories: `Dir` -- Processes: `PID` - -This same anchor logic should run on refresh ticks while custom sorting is -active so the selected item does not drift unpredictably as live stats change. - -## Implementation Shape - -Keep the sorting logic in `internal/tui/dashboard`, not in `internal/statsengine`. - -Reason: - -- snapshot order is part of the existing aggregate ranking behavior -- only the table presentation needs alternate ordering -- bubble/treemap/icicle already have their own ordering rules - -Recommended implementation split: - -- `internal/tui/common/keys.go` - - add a `Sort` binding for `s` - - include it in dashboard help output -- `internal/tui/dashboard/model.go` - - add per-table sort state - - handle `s` - - ignore `s` outside sortable table modes - - preserve selection anchors when sort changes - - make `selectedSyscallFilter`, `selectedFileFilter`, and - `selectedProcessSnapshot` read from the same sorted rows used by rendering -- `internal/tui/dashboard/syscalls.go` - - add syscall sort key mapping from visible column index - - add sorted syscall row helper - - expose active sort label for footer hints -- `internal/tui/dashboard/files.go` - - add file and directory sort key helpers - - keep file and grouped-directory comparators separate -- `internal/tui/dashboard/processes.go` - - add process sort key helpers and sorted row helper -- `internal/tui/dashboard/table.go` - - extend footer hints/status rendering as needed for the active sort label - -## Rendering/Data Consistency - -The most important implementation rule is: - -**the rendered rows and the row-selection actions must use the exact same sorted -slice** - -Without this, the UI can show one row while `Enter` filters a different row. 
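One way to honor the shared-slice rule, together with the comparator fallback and selection anchoring described earlier, is sketched below. The `syscallRow` type, its fields, and both helper names are hypothetical stand-ins; the real snapshot types in `statsengine` may look different.

```go
package main

import (
	"fmt"
	"sort"
)

// syscallRow is a hypothetical stand-in for the snapshot row type.
type syscallRow struct {
	Name  string
	Count uint64
	P95Ns uint64
}

// defaultLess encodes the default ranking: Count desc, then Name asc.
func defaultLess(a, b syscallRow) bool {
	if a.Count != b.Count {
		return a.Count > b.Count
	}
	return a.Name < b.Name
}

// sortedSyscallRows returns a sorted copy of rows. When byP95 is set it
// orders by p95 desc and falls back to the default ranking on ties.
// Render code and selected-row actions would both call this one helper.
func sortedSyscallRows(rows []syscallRow, byP95 bool) []syscallRow {
	out := append([]syscallRow(nil), rows...)
	sort.SliceStable(out, func(i, j int) bool {
		if byP95 && out[i].P95Ns != out[j].P95Ns {
			return out[i].P95Ns > out[j].P95Ns
		}
		return defaultLess(out[i], out[j])
	})
	return out
}

// anchorSelection returns the index of the row named name, or a clamped
// fallback index when that entity is no longer present.
func anchorSelection(rows []syscallRow, name string, fallback int) int {
	for i, r := range rows {
		if r.Name == name {
			return i
		}
	}
	if fallback >= len(rows) {
		fallback = len(rows) - 1
	}
	if fallback < 0 {
		fallback = 0
	}
	return fallback
}

func main() {
	rows := []syscallRow{
		{Name: "read", Count: 90, P95Ns: 1200},
		{Name: "write", Count: 50, P95Ns: 4000},
		{Name: "openat", Count: 50, P95Ns: 4000},
	}
	sorted := sortedSyscallRows(rows, true)
	selected := anchorSelection(sorted, "read", 0)
	for i, r := range sorted {
		marker := " "
		if i == selected {
			marker = ">"
		}
		fmt.Println(marker, r.Name, r.Count, r.P95Ns)
	}
}
```

Returning a copy keeps the snapshot order untouched, which matches the goal of leaving `statsengine` semantics unchanged.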
- -The safest approach is to centralize each table's sorted typed rows in helper -functions and use those helpers in both: - -- render paths -- selected-row action paths - -## Files Tab Details - -The Files tab needs one extra rule beyond Syscalls and Processes: - -- in plain file mode, sorting operates on `[]statsengine.FileSnapshot` -- in directory-grouped mode, sorting operates on `[]DirSnapshot` - -The two modes should not share a single sort key because their columns differ. -Switching with `d` should preserve: - -- last file-table custom sort -- last directory-table custom sort - -## Interaction With Existing Features - -- `Enter` - - still filters the currently selected visible row -- `d` - - only changes Files table shape; custom sort state persists per mode -- `v` - - custom sort state persists, but only applies when returning to table mode -- `b` - - unaffected; bubble/treemap ordering remains metric-driven -- terminal resize - - sort state persists because it stores logical keys, not raw indices -- trace restart / filter apply - - sort state should remain as view state - -## Testing Plan - -Add focused tests in `internal/tui/dashboard` and `internal/tui/common`. - -### Model behavior - -- `s` on Syscalls enables a column sort. -- `s` on the same Syscalls column restores default sorting. -- `s` on Processes does nothing in non-table modes. -- `s` on Files uses file-mode sort state when `filesDirGrouped == false`. -- `s` on Files uses directory-mode sort state when `filesDirGrouped == true`. -- changing sort preserves the selected entity instead of only the row index. 
- -### Width-sensitive syscall behavior - -- sorting by `p95` in narrow mode survives a resize into wide mode -- sorting by `Syscall` or `Count` maps correctly in both layouts - -### Selection action consistency - -- `selectedSyscallFilter()` uses sorted syscall rows -- `selectedFileFilter()` uses sorted file or directory rows -- `selectedProcessSnapshot()` uses sorted process rows in table mode - -### Help/footer rendering - -- expanded help includes `s` -- table footer includes `s:sort` -- active sort label is visible in the table footer - -### Negative cases - -- `s` does nothing on Overview / Stream / Flame / Latency+Gaps -- `s` does nothing for bubble / treemap / icicle views - -## Recommended Delivery Order - -1. add key binding and sort state plumbing in `dashboard.Model` -2. implement sorted typed-row helpers per tab -3. switch render paths and selected-row actions to the shared helpers -4. add footer/help output -5. add regression tests for sort toggling, width changes, and selected-row - action consistency - -## Non-Goals - -- no change to snapshot generation order in `statsengine` -- no sortable Overview or Latency+Gaps tables -- no ascending/descending toggle cycle beyond "custom sort" vs "default" -- no behavior change for bubble/treemap/icicle ordering diff --git a/docs/tui-flamegraph-behavior.md b/docs/tui-flamegraph-behavior.md deleted file mode 100644 index cc9bb5d..0000000 --- a/docs/tui-flamegraph-behavior.md +++ /dev/null @@ -1,46 +0,0 @@ -# TUI Flamegraph Expected Behavior - -This document records the expected interaction and layout behavior for the TUI -flamegraph. It is intended as a stable reference for regressions and for tests -under `internal/tui/flamegraph/` and `internal/tui/dashboard/`. - -## Interaction - -- `space` toggles pause. `p` does not pause the flamegraph and remains reserved - for the global PID picker at the top-level TUI. -- `enter` and left-click zoom into the selected or clicked frame. 
-- Clicking an ancestor frame in the zoom lineage re-roots the view to that - ancestor. -- `u`, `backspace`, and `esc` undo one zoom step. -- Direct clicks into a deep descendant create a single undo step back to the - previous zoom root, not an implicit stack of every skipped ancestor. -- While paused, navigation and zoom must continue to work against the frozen - snapshot. - -## Layout - -- The selected frame must not render with underline or a horizontal highlight - line across the bar. -- The current zoom root must span the full flamegraph width. -- The children of the current zoom root must be normalized to the full viewport - width, even when the zoom root has self time or exclusive weight. -- Zooming from any direction must produce the same full-width result for the - newly selected zoom root. -- The zoom lineage rows shown above the zoomed subtree provide context, but they - must not steal horizontal space from the zoomed subtree. - -## Rendering - -- Rendering the dashboard view must not mutate persistent flamegraph state. -- Redundant same-size viewport updates must be no-ops. -- In paused mode, repeated renders must not reintroduce stale frame geometry or - leave artifacts from a previous layout on screen. - -## Regression Coverage - -These expectations are covered by tests in: - -- `internal/tui/flamegraph/renderer_test.go` -- `internal/tui/flamegraph/model_test.go` -- `internal/tui/flamegraph/stress_test.go` -- `internal/tui/dashboard/model_test.go` diff --git a/docs/tui-flamegraph-plan.md b/docs/tui-flamegraph-plan.md deleted file mode 100644 index 261f0fb..0000000 --- a/docs/tui-flamegraph-plan.md +++ /dev/null @@ -1,450 +0,0 @@ -# TUI Flamegraph Tab - Full Design Plan - -## Overview - -Add a **7th dashboard tab** (`7:Flame`) that renders a live, interactive flamegraph -directly in the terminal using lipgloss for layout/styling and **Charm Harmonica** -for smooth spring-based animations on both zoom transitions and live data refresh. 
-The tab consumes data from an embedded `LiveTrie` and -provides interactive flamegraph navigation directly in-terminal. - -## Architecture - -``` -BPF events -> eventLoop.printCb - | - +-> statsengine.Ingest() (existing tabs 1-5) - +-> eventstream.Push() (existing tab 6) - +-> LiveTrie.Ingest() (NEW: tab 7) -``` - -The `LiveTrie` is instantiated in the TUI startup path and published via -`runtimeBindings`, similar to how `SnapshotSource` and `RingBuffer` are already -wired. - -## New Files and Packages - -| File/Package | Purpose | -|---|---| -| `internal/tui/flamegraph/model.go` | Bubble Tea sub-model: state, Update, View | -| `internal/tui/flamegraph/renderer.go` | Converts LiveTrie snapshot -> terminal frame layout, renders with lipgloss | -| `internal/tui/flamegraph/animation.go` | Harmonica spring state for frame width interpolation and zoom transitions | -| `internal/tui/flamegraph/search.go` | Search/highlight: text input bubble, match filtering, highlight styling | -| `internal/tui/flamegraph/zoom.go` | Zoom stack management (zoom into subtree, undo zoom, reset zoom) | -| `internal/tui/flamegraph/controls.go` | Toolbar rendering (status line, field order, keybindings help) | - -## Detailed Design - -### 1. Data Wiring - -**Changes to existing files:** - -- `internal/tui/tui.go` -- Add `SetLiveTrie(*flamegraph.LiveTrie)` to - `TraceRuntimeBindings` interface and `runtimeBindings` struct. The trace starter - publishes the `LiveTrie` the same way it publishes the stats engine. -- `internal/ior.go` / trace setup -- When running in TUI mode, create a `LiveTrie` - alongside the stats engine. In the `eventLoop.printCb`, call - `liveTrie.Ingest(ep)` in addition to existing stats/stream ingestion. Publish via - `bindings.SetLiveTrie(lt)`. -- `internal/tui/dashboard/model.go` -- Add `liveTrie *flamegraph.LiveTrie` field, - `flamegraphModel flamegraphtui.Model` child model. Wire refresh tick to poll - `LiveTrie.Version()`. - -### 2. 
Tab Integration - -**Changes to `internal/tui/dashboard/tabs.go`:** - -```go -const ( - // ... existing tabs ... - TabFlame // new 7th tab -) - -var allTabs = []Tab{..., TabFlame} -``` - -Tab label: `"Flame"` (short: `"Flm"`). Key binding: `7`. - -**Changes to `internal/tui/dashboard/model.go`:** - -- Add `flamegraphModel` field of type `flamegraphtui.Model` -- In `Update()`, on `refreshTickMsg` or a dedicated `flameTickMsg` (200ms like - stream), poll `LiveTrie.Version()` and push snapshot data into the flamegraph - model -- In `handleKey()`, add `key.Matches(msg, m.keys.Seven)` -> `TabFlame` -- In `renderActiveTab()`, delegate to `flamegraphModel.View(width, height)` -- On `WindowSizeMsg`, propagate dimensions to `flamegraphModel.SetViewport(w, h)` - -- this triggers re-layout of all frames to fit new terminal size - -### 3. Flamegraph TUI Model (`internal/tui/flamegraph/model.go`) - -```go -type Model struct { - // Data - liveTrie *flamegraph.LiveTrie - lastVersion uint64 - snapshot *flamegraph.trieSnapshot // latest parsed snapshot - - // Layout - frames []tuiFrame // current rendered frames - targetFrames []tuiFrame // target frames (for animation lerp) - width, height int - - // Interaction - selectedIdx int // cursor/selected frame index - zoomStack []zoomState // zoom history for undo - zoomRoot *flamegraph.trieSnapshot // current zoom root (nil = full view) - - // Search - searchActive bool - searchInput textinput.Model // from bubbles/textinput - searchQuery string - matchIndices map[int]bool // frame indices matching search - - // Field ordering - fieldPresets [][]string - fieldIndex int - - // Animation - springs []frameSpring // per-frame Harmonica spring state - animTicker bool // whether animation tick is running - - // Flags - paused bool - isDark bool -} - -type tuiFrame struct { - Name string - Col int // column position (0-based, in terminal cells) - Row int // row from bottom - Width int // width in terminal cells - Total uint64 - Percent 
float64 - Fill lipgloss.Color - Depth int - Path string // full path for zoom identification -} -``` - -### 4. Rendering Strategy (`internal/tui/flamegraph/renderer.go`) - -Terminal flamegraphs use a **cell-based layout** rather than pixel coordinates: - -1. **BuildTerminalLayout(snapshot, width, height, zoomRoot)** converts trie snapshot - to `[]tuiFrame`: - - Width is terminal columns (not 1200px). Each frame width = - `floor(termWidth * (node.total / rootTotal))`. - - Height is terminal rows. Each frame is exactly **1 row tall** (not 16px). - - Rows grow bottom-to-top: root at the bottom, leaves at the top (classic - flamegraph orientation). If tree depth exceeds available rows, only show the - deepest `height-2` levels (toolbar + status take 2 rows). - - Frames narrower than 1 cell are culled (terminal equivalent of `minWidthPx`). - -2. **Frame rendering**: Each frame is a **colored block** of text: - - Use lipgloss background color fill with the existing `frameColor()` warm palette - - Frame text = truncated function/path name that fits within the frame width - - Selected frame gets a distinct border/highlight style (e.g., bold + inverted) - - Search-matched frames get a different highlight color (e.g., red background) - -3. **Compositing**: Use `lipgloss.Place()` or the new lipgloss v2 compositor/canvas - to layer frames at their (col, row) positions. Each row of the flamegraph is - assembled by joining frame cells horizontally with background fill for gaps. - -4. **Auto-resize**: On `WindowSizeMsg`, re-run `BuildTerminalLayout` with new - dimensions. All frame widths and row counts recalculate. Harmonica springs - animate from old positions/widths to new ones. - -### 5. Subtree Highlighting - -When the user selects (navigates to) a frame, the **entire subtree rooted at that -frame** is visually highlighted so the user can see exactly what would be zoomed -into on `enter`. 
- -**Visual states for any frame:** - -| State | Visual Treatment | -|---|---| -| **Selected frame** | Bold text + bright border/underline + slightly lightened background | -| **Selected subtree** (ancestors + descendants) | Full saturation, normal brightness -- "active" look | -| **Not in subtree** | **Dimmed**: reduced saturation / lower contrast background, muted text | -| **Search match** | Red/magenta background overlay (overrides dim but not selection) | - -Dimming the *non-subtree* frames makes the selected subtree "pop" naturally. - -**Ancestor vs Descendant Distinction:** - -| Relationship | Visual | -|---|---| -| **Selected frame** | Bold, inverted/bright border | -| **Descendants** | Full color, normal weight | -| **Ancestors** | Full color with subtle left-border indicator (breadcrumb trail) | -| **Unrelated** | Dimmed (lower contrast background, gray text) | - -**Subtree membership** computed via `Path` field (the `\x1f`-delimited ancestor -chain). A frame is in the subtree if: -- Its path is a **prefix** of the selected frame's path (ancestor), OR -- The selected frame's path is a **prefix** of its path (descendant), OR -- It **is** the selected frame - -O(n) scan over frames, recomputed each time selection moves. - -**Interaction with search**: Search matches outside subtree shown dimmed in match -color; inside subtree shown bright. Selected frame matching search uses selection -style. - -### 6. Animation with Harmonica (`internal/tui/flamegraph/animation.go`) - -```go -type frameSpring struct { - widthSpring harmonica.Spring - colSpring harmonica.Spring - currentW float64 - currentCol float64 - velocityW float64 - velocityCol float64 -} -``` - -**Two animation scenarios:** - -1. **Data refresh**: When `LiveTrie` version changes and new frame widths differ - from current, set new target widths. On each animation tick (~30fps = - `tea.Tick(33ms)`), call `spring.Update(current, velocity, target)` for each - frame's width and column. 
Render at interpolated values. Stop animation tick - when all frames reach target within epsilon. - -2. **Zoom transition**: When user zooms into a subtree, the target layout changes - (zoomed subtree expands to fill full width). Springs animate column positions - and widths from pre-zoom to post-zoom. Undo-zoom reverses this. - -**Spring configuration**: `harmonica.NewSpring(harmonica.FPS(30), 6.0, 1.0)` -- -critically damped for snappy transitions without oscillation. - -### 7. Keybindings - -| Key(s) | Action | -|---|---| -| `j` / `down` / `arrow-down` | Move selection to frame below (shallower depth) | -| `k` / `up` / `arrow-up` | Move selection to frame above (deeper / toward leaves) | -| `h` / `left` / `arrow-left` | Move selection to previous sibling at same depth | -| `l` / `right` / `arrow-right` | Move selection to next sibling at same depth | -| `enter` | Zoom into selected frame's subtree | -| `backspace` / `u` | Undo zoom (pop zoom stack) | -| `escape` (when zoomed) | Reset zoom to root | -| `/` | Open search input | -| `escape` (when searching) | Close search, clear highlights | -| `n` | Jump to next search match | -| `N` (shift+n) | Jump to previous search match | -| `p` | Toggle pause | -| `r` | Reset baseline | -| `o` | Cycle field order preset | -| `?` | Toggle flame-specific help overlay | - -Both vim-style (j/k/h/l) and regular cursor keys (arrow keys) are bound to the -same actions via `key.NewBinding(key.WithKeys("j", "down"))`. - -### 8. Search (`internal/tui/flamegraph/search.go`) - -- Uses `bubbles/textinput` for inline search input at the bottom of flame view -- On submit, iterate frames and mark matching indices (case-insensitive substring - match on frame name) -- Matching frames rendered with highlight color; non-matching frames dimmed -- `n`/`N` moves selection to next/previous match -- Show match count in status line (e.g. 
`3/12 matches`) - -### 8.1 Color Coding (Implemented) - -Flame frames now use semantic colors first, with hash-based fallback for unknown labels: - -| Category | Match rule | Color (RGBA / hex) | -|---|---|---| -| Read I/O | name contains `read`/`pread` | `78,132,201` (`#4E84C9`) | -| Write I/O | name contains `write`/`pwrite` | `222,122,58` (`#DE7A3A`) | -| Metadata I/O | name contains `open`, `close`, `stat`, `rename`, `link` | `196,168,72` (`#C4A848`) | -| Path-oriented nodes | starts with `/`, contains `/`, or `path:` | `88,156,84` (`#589C54`) | -| Process/thread labels | contains `pid` or `tid` | `67,151,149` (`#439795`) | -| Other syscall buckets | starts with `sys_` | `191,99,74` (`#BF634A`) | -| Fallback | anything else | deterministic hash palette | - -This keeps common I/O classes visually stable across refreshes while preserving -distinct colors for uncategorized frames. - -### 9. Zoom (`internal/tui/flamegraph/zoom.go`) - -- `zoomStack []zoomState` where `zoomState` holds the `path` string of the zoomed - node and the previous `selectedIdx` -- On zoom-in: push current state, find subtree node matching selected frame's path, - set as `zoomRoot`, rebuild layout with subtree as root -- On undo: pop stack, restore previous root -- On reset: clear stack, set `zoomRoot = nil` - -### 10. Field Order Cycling - -Preset cycle: -```go -fieldPresets = [][]string{ - {"comm", "path", "tracepoint"}, - {"path", "tracepoint", "comm"}, - {"tracepoint", "comm", "path"}, - {"pid", "path", "tracepoint"}, -} -``` - -Pressing `o` calls `LiveTrie.Reconfigure(nextPreset)` which resets the trie and -starts fresh accumulation. - -### 11. Toolbar / Status Line (`internal/tui/flamegraph/controls.go`) - -Rendered as a single line above the flamegraph area: - -``` -[LIVE] | o:order(comm>path>tp) | /:search | enter:zoom | u:undo | r:reset | p:pause -``` - -When paused: `[PAUSED]` in red. When searching: shows search input and match count. 
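A minimal sketch of how the preset cycle from section 10 could feed the toolbar line shown above. It renders plain text only (no red `[PAUSED]` styling), and `nextPreset`, `short`, and `toolbar` are hypothetical helper names; the `tp` abbreviation is taken from the toolbar mock-up.

```go
package main

import (
	"fmt"
	"strings"
)

// fieldPresets mirrors the preset cycle listed in the plan.
var fieldPresets = [][]string{
	{"comm", "path", "tracepoint"},
	{"path", "tracepoint", "comm"},
	{"tracepoint", "comm", "path"},
	{"pid", "path", "tracepoint"},
}

// nextPreset returns the index of the preset that follows i, wrapping around.
func nextPreset(i int) int {
	return (i + 1) % len(fieldPresets)
}

// short abbreviates field names for the toolbar, as in the mock-up ("tp").
func short(field string) string {
	if field == "tracepoint" {
		return "tp"
	}
	return field
}

// toolbar builds the single-line toolbar text for the given state.
func toolbar(paused bool, preset []string) string {
	state := "[LIVE]"
	if paused {
		state = "[PAUSED]"
	}
	names := make([]string, len(preset))
	for i, f := range preset {
		names[i] = short(f)
	}
	return state + " | o:order(" + strings.Join(names, ">") +
		") | /:search | enter:zoom | u:undo | r:reset | p:pause"
}

func main() {
	i := 0
	fmt.Println(toolbar(false, fieldPresets[i]))
	i = nextPreset(i) // what pressing `o` would select next
	fmt.Println(toolbar(true, fieldPresets[i]))
}
```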
- -Selected frame info line at the bottom: -``` -sys_read (1,234 calls, 45.2%) - /usr/bin/myapp > /dev/sda > sys_enter_read -``` - -### 12. Dependencies to Add - -- `github.com/charmbracelet/harmonica` -- spring animation -- `charm.land/bubbles/v2/textinput` -- search input (already transitive via bubbles v2) - -### 13. Changes to Existing Files (Summary) - -| File | Change | -|---|---| -| `internal/tui/dashboard/tabs.go` | Add `TabFlame`, update `allTabs`, `String()`, `tabLabel()` | -| `internal/tui/dashboard/model.go` | Add `flamegraphModel` field, wire refresh, handle key `7`, render in `renderActiveTab()` | -| `internal/tui/tui.go` | Add `SetLiveTrie()` to bindings interface, propagate to dashboard | -| `internal/tui/common/keys.go` | Add `Seven` key binding for tab 7 | -| `internal/ior.go` | Create `LiveTrie` in TUI mode, wire into eventLoop callback, publish via bindings | -| `internal/flags/flags.go` | Add `-fields` default propagation to TUI mode | -| `go.mod` | Add `github.com/charmbracelet/harmonica` dependency | - -### 14. Risks and Mitigations - -1. **Performance at high event rates**: The `LiveTrie.Ingest()` call adds overhead - to the hot path. Mitigation: TUI render is decoupled via version polling. - -2. **Terminal width too narrow**: Flamegraphs with many shallow frames may not - render meaningfully in 80-column terminals. Mitigation: cull frames below 1 cell, - show "terminal too narrow" message below ~60 columns. - -3. **Animation frame budget**: 30fps animation ticks in a terminal could cause - flicker on slow terminals. Mitigation: only run animation tick when springs are - active, stop when settled. - -4. **Color support**: Not all terminals support 24-bit color. Mitigation: lipgloss - v2 auto-downgrades. The warm flamegraph palette degrades gracefully to 256-color. - ---- - -## Benchmarking & Profiling Plan - -### Goals - -1. Quantify render performance at various terminal sizes and trie depths -2. 
Measure animation overhead of Harmonica spring ticks at 30fps with N springs -3. Detect regressions via baseline benchmarks running in CI alongside `mage bench` -4. Profile hot paths to identify allocations and CPU bottlenecks - -### Benchmark Suite - -New file: `internal/tui/flamegraph/bench_test.go` - -| Benchmark | What it measures | -|---|---| -| `BenchmarkBuildTerminalLayout` | trieSnapshot -> []tuiFrame at widths 80/120/200/300 and depths 10/50/100 | -| `BenchmarkRenderFrame` | Full View() render at 80x24, 120x40, 200x60 | -| `BenchmarkComputeSubtreeSet` | Subtree membership with 100/1000/5000 frames | -| `BenchmarkSearchHighlight` | Search match computation across N frames | -| `BenchmarkSpringUpdate` | harmonica spring.Update() across 100/500/2000 springs | -| `BenchmarkAnimationTick` | Full tick: update springs + rebuild render output | -| `BenchmarkZoomTransition` | Layout rebuild on zoom-in | -| `BenchmarkLiveTrieIngestAndSnapshot` | End-to-end: ingest N events + snapshot + layout | -| `BenchmarkResizeRelayout` | Layout rebuild at new terminal dimensions | - -### Benchmark Fixtures - -Synthetic trie generators in `internal/tui/flamegraph/testdata_test.go`: - -| Label | Depth | Breadth | Approximate frame count | -|---|---|---|---| -| `small` | 5 | 3 | ~120 | -| `medium` | 10 | 5 | ~2,500 | -| `large` | 15 | 8 | ~10,000+ | -| `deep` | 50 | 2 | ~100 (narrow but deep) | -| `wide` | 3 | 50 | ~5,000 (shallow but very wide) | - -### Performance Targets - -| Operation | Target | Rationale | -|---|---|---| -| `BuildTerminalLayout` (medium, 120-col) | < 500us | Well within one tick interval | -| `View()` full render (medium, 120x40) | < 2ms | 30fps = 33ms budget | -| `ComputeSubtreeSet` (1000 frames) | < 100us | Runs on every selection move | -| Single animation tick (500 springs) | < 1ms | 16ms frame budget headroom | -| `LiveTrie.Ingest` + `SnapshotJSON` | < 200us | Hot path performance | - -### Profiling Integration - -#### Built-in profiling flag - -When 
`-pprof` is set in TUI mode: -- Write `ior-tui-cpu.prof` during session -- Write `ior-tui-mem.prof` on quit -- Write `ior-tui-trace.out` for first 10 seconds - -#### Mage Targets - -| Target | Command | -|---|---| -| `mage benchFlame` | `go test ./internal/tui/flamegraph/ -bench=. -benchmem -count=5` | -| `mage benchFlameProf` | Same + `-cpuprofile` + `-memprofile` | -| `mage benchFlameCmp` | Compare against saved baseline via `benchstat` | - -### Allocation Targets - -| Hot path | Strategy | -|---|---| -| `BuildTerminalLayout` | Pre-allocate []tuiFrame, reuse across refreshes | -| `View()` render | strings.Builder with pre-estimated capacity, cache styles | -| `computeSubtreeSet` | Reuse map[int]bool (clear + repopulate) | -| Spring updates | Fixed-size []frameSpring, no per-tick allocs | - -Target: **zero allocs** in animation tick, **< 5 allocs/op** in full render. - -### Stress Tests - -New file: `internal/tui/flamegraph/stress_test.go` - -- **TestStressHighEventRate**: 100k events from 10 goroutines + concurrent render -- **TestStressRapidResize**: 100 WindowSizeMsg with random dimensions -- **TestStressZoomDuringRefresh**: Interleaved zoom/undo with data refresh ticks - -All run with `-race`. 
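The "reuse `map[int]bool` (clear + repopulate)" strategy from the allocation table can be sketched together with the path-prefix membership rule from the subtree highlighting section. This is an illustration under assumptions: the `frame` type is a pared-down stand-in for `tuiFrame`, the function signature is hypothetical, and the `clear` builtin requires Go 1.21 or newer. The prefix check is aligned to the `\x1f` delimiter so that `read` is not mistaken for an ancestor of `readv`.

```go
package main

import "fmt"

// sep is the path delimiter named in the plan.
const sep = '\x1f'

// frame is a hypothetical, pared-down stand-in for the plan's tuiFrame.
type frame struct {
	Name string
	Path string
}

// isAncestorOrSelf reports whether a equals b or is a delimiter-aligned
// prefix of b. It slices instead of concatenating, so it does not allocate.
func isAncestorOrSelf(a, b string) bool {
	if len(a) > len(b) || b[:len(a)] != a {
		return false
	}
	return len(a) == len(b) || b[len(a)] == sep
}

// computeSubtreeSet refills set with the indices of the selected frame,
// its ancestors, and its descendants. The caller keeps and reuses set.
func computeSubtreeSet(frames []frame, selected int, set map[int]bool) {
	clear(set) // drop stale entries but keep the map's storage
	sel := frames[selected].Path
	for i, f := range frames {
		if isAncestorOrSelf(f.Path, sel) || isAncestorOrSelf(sel, f.Path) {
			set[i] = true
		}
	}
}

func main() {
	frames := []frame{
		{"root", "root"},
		{"read", "root\x1fread"},
		{"readv", "root\x1freadv"},
		{"file", "root\x1fread\x1ffile"},
	}
	set := make(map[int]bool, len(frames))
	computeSubtreeSet(frames, 1, set) // select "read"
	fmt.Println(len(set), set[2])
}
```

Whether this reaches the stated allocation targets would still need to be confirmed with `-benchmem` in the proposed benchmark suite.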
- -### Profiling Workflow (Manual) - -```bash -# Run TUI with profiling -sudo ior -pprof -pid 1234 - -# Analyze CPU profile -go tool pprof -http=:8080 ior-tui-cpu.prof - -# Analyze allocations -go tool pprof -http=:8080 -alloc_space ior-tui-mem.prof - -# Analyze execution trace -go tool trace ior-tui-trace.out - -# Benchmark-specific profiling -mage benchFlameProf -go tool pprof -http=:8080 flame-cpu.prof -``` diff --git a/docs/tui-global-filter-architecture.md b/docs/tui-global-filter-architecture.md deleted file mode 100644 index 386ef75..0000000 --- a/docs/tui-global-filter-architecture.md +++ /dev/null @@ -1,160 +0,0 @@ -# TUI Global Filter Architecture - -## Overview - -Add one global filter flow for the TUI that is accessible from any dashboard -screen/tab and applies consistently across: - -- Flame -- Overview -- Syscalls -- Files -- Processes -- Latency+Gaps -- Stream - -The filter UI should reuse the current stream filter concepts, but the filter -state must move to the top-level TUI model so there is a single source of truth. - -## Goals - -- One shared filter modal opened from anywhere in the dashboard. -- One shared filter state owned by the top-level TUI model. -- Aggregate dashboards must only reflect matching live events. -- The stream tab must preserve its existing ring buffer across filter changes. -- Existing stream rows must be re-filtered locally after a filter change. -- String filters must remain substring-based. File path matching is explicitly a - partial substring match, not exact-only. - -## Supported Filter Fields - -The global filter supports the fields currently exposed by the stream filter -workflow, plus the existing runtime PID/TID controls: - -- `syscall` -- `comm` -- `file/path` -- `pid` -- `tid` -- `fd` -- `latency` -- `gap` -- `bytes` -- `retval` -- `errors only` - -## Matching Semantics - -- String fields use case-insensitive substring matching. 
-- `file/path` uses the same case-insensitive substring matching as the other - string fields. -- Numeric fields use the existing comparison operators (`=`, `!=`, `>`, `>=`, - `<`, `<=`). -- `errors only` keeps only events with negative return values / error-marked - events. - -## Architecture - -``` -BPF events -> eventLoop / print callback - | - +-> global runtime matcher - | - +-> statsengine.Ingest() (filtered live aggregates) - +-> liveTrie.Ingest() (filtered flamegraph) - +-> eventstream.Push() (filtered new stream rows) - -TUI state - top-level model - | - +-> owns shared global filter state - +-> owns global filter modal lifecycle - +-> restarts tracing when filter changes - +-> preserves current screen/tab - +-> asks Stream to re-filter buffered rows in place -``` - -## Runtime Behavior - -Applying a new global filter does all of the following: - -1. Preserve the current screen/tab. -2. Stop the active trace runtime. -3. Reset aggregate dashboard state and flamegraph baseline. -4. Restart tracing with the new global filter. -5. Keep the stream ring buffer contents intact. -6. Re-filter existing buffered stream rows locally so the stream updates - immediately. - -This means aggregate tabs only show post-change matching data, while Stream can -still show matching historical rows from before the restart. - -## Ownership and Data Flow - -### Top-level TUI model - -The top-level TUI model owns: - -- active global filter state -- global filter modal visibility -- filter apply/cancel/clear behavior -- trace restart lifecycle -- publication of filter state to child models that need local re-filtering - -### Stream model - -The stream model no longer owns the primary filter system. 
It must: - -- accept the shared global filter -- re-filter its retained `allEvents` slice on demand -- preserve the ring buffer across filter changes -- keep regex search as a separate feature -- drop the stream-local add/undo filter stack - -### Runtime / trace startup - -The TUI trace context currently carries only PID/TID. It must be expanded to -carry the full global filter payload. The trace startup path then uses that -payload to construct a runtime matcher before forwarding events into: - -- stats engine -- flamegraph live trie -- new stream events - -## Key Implementation Areas - -- `internal/tui/tui.go` - - own shared filter state - - open modal globally - - restart trace on apply - - preserve current screen/tab -- `internal/tui/dashboard/model.go` - - route global shortcut access cleanly across tabs - - expose active filter summary in dashboard rendering/help -- `internal/tui/eventstream/*` - - refactor modal for reuse - - keep stream history - - re-filter buffered rows in place - - remove stream-local filter stack behavior -- `internal/ior.go` - - plumb full filter payload through trace startup - - apply runtime matcher before aggregate/flame/live stream ingestion - -## UX Rules - -- `f` opens the global filter modal from any dashboard tab. -- `Enter` in the modal applies the filter. -- `Esc` closes the modal without applying. -- clear action resets to the unfiltered state. -- active filter summary is visible in dashboard status/help areas. -- stream regex search (`/`, `?`, `n`, `N`) remains separate from filtering. - -## Testing Requirements - -- context round-trip of the full global filter payload -- runtime matcher coverage for all supported fields -- stream ring buffer retention across filter changes -- local re-filtering of buffered stream rows -- file path substring matching coverage -- aggregate dashboards only reflecting matching live events after restart -- help/status rendering updates for the shared filter workflow |
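
The matching semantics described above could be sketched as follows. This is a minimal illustration, not ior's implementation: the `event`, `numCond`, and `globalFilter` types are hypothetical, only three string fields and one numeric field are modeled, and "errors only" is interpreted here as a negative return value.

```go
package main

import (
	"fmt"
	"strings"
)

// event is a hypothetical, trimmed-down event; ior's real type will differ.
type event struct {
	Syscall, Comm, Path string
	Retval              int64
}

// numCond is one numeric comparison, for example {">=", 4096}.
type numCond struct {
	Op  string // one of =, !=, >, >=, <, <=
	Val int64
}

// match evaluates the comparison; a nil condition means "no constraint".
func (c *numCond) match(v int64) bool {
	if c == nil {
		return true
	}
	switch c.Op {
	case "=":
		return v == c.Val
	case "!=":
		return v != c.Val
	case ">":
		return v > c.Val
	case ">=":
		return v >= c.Val
	case "<":
		return v < c.Val
	case "<=":
		return v <= c.Val
	}
	return false // unknown operator matches nothing
}

// globalFilter holds the shared filter state; empty strings match everything.
type globalFilter struct {
	Syscall, Comm, Path string
	Retval              *numCond
	ErrorsOnly          bool
}

// containsFold is a case-insensitive substring test.
func containsFold(s, sub string) bool {
	return strings.Contains(strings.ToLower(s), strings.ToLower(sub))
}

// Match reports whether e passes every configured constraint.
func (f globalFilter) Match(e event) bool {
	if f.ErrorsOnly && e.Retval >= 0 {
		return false
	}
	return containsFold(e.Syscall, f.Syscall) &&
		containsFold(e.Comm, f.Comm) &&
		containsFold(e.Path, f.Path) &&
		f.Retval.match(e.Retval)
}

func main() {
	f := globalFilter{Path: "/VAR/log", ErrorsOnly: true}
	fmt.Println(f.Match(event{Syscall: "openat", Path: "/var/log/syslog", Retval: -2}))
	fmt.Println(f.Match(event{Syscall: "openat", Path: "/var/log/syslog", Retval: 3}))
}
```

Because the same `Match` function would serve both the runtime matcher and the stream's local re-filtering of buffered rows, a single implementation of this kind keeps the two paths consistent.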
