| Age | Commit message (Collapse) | Author |
|
rt_sigreturn(2) restores the pre-signal execution context off the signal
stack frame and resumes the interrupted instruction; it never returns to
the instruction after the syscall. man sigreturn(2) states plainly that
"sigreturn() never returns", and tracing against /sys/kernel/tracing
confirms it: sys_enter_rt_sigreturn fires once per signal-handler return
while sys_exit_rt_sigreturn never fires.
The generator previously emitted a dead handle_sys_exit_rt_sigreturn (it
can never run) and recorded a per-tid syscall_enter_state_map entry on the
enter path that nothing would ever delete (no exit fires), leaking entries
in the bounded map on every signal-handler return.
Add rt_sigreturn to noreturnSyscalls so codegen suppresses the dead exit
handler and routes the enter handler through ior_on_noreturn_syscall_enter
(sampling decision only, no map write), exactly like exit/exit_group. The
enter null_event is still emitted, and the FamilySignals/KindNull
classification is unchanged. Regenerated the C/Go artifacts and the result
baseline accordingly, and generalized the related comments.
Lock-in tests: TestRtSigreturnIsNoreturn asserts rt_sigreturn is noreturn;
TestRtSigSiblingsAreNotNoreturn guards that the returning rt_sig* siblings
are not; TestGenerateExitNoreturnHandlers now also covers rt_sigreturn.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
|
clock_nanosleep with the TIMER_ABSTIME flag passes an ABSOLUTE wakeup
time in the request timespec, not a relative duration. The generated BPF
sleep handler computed requested_ns = tv_sec*1e9 + tv_nsec
unconditionally, so absolute sleeps exported a bogus multi-decade
"sleep duration" in CSV/parquet/stream.
generateExtraSleep now carries an optional flags-argument expression per
sleep syscall. For clock_nanosleep the generated handler checks
args[1] & TIMER_ABSTIME (value 1) and only computes the relative
duration when the flag is clear; absolute sleeps keep the existing -1
sentinel (same value used for null/unreadable timespec pointers).
nanosleep is always relative and stays unconditional (no flags arg).
- Regenerated internal/c/generated_tracepoints.c (mage generate idempotent).
- Added codegen tests asserting the TIMER_ABSTIME guard for clock_nanosleep
and its absence for nanosleep.
- Extended the ioworkload sleep scenario to issue an absolute clock_nanosleep
and the sleep parquet integration test to assert it is reported as -1.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
|
After p10 suppressed the sys_exit_exit/sys_exit_exit_group handlers, the
enter handlers for exit/exit_group still called ior_on_syscall_enter,
which writes a per-tid entry into syscall_enter_state_map. With the exit
handler gone, nothing ever bpf_map_delete_elem'd that entry, so stale
per-tid state accumulated in the bounded (32768) map on hosts churning
many distinct tids and could starve legitimate inserts.
Add ior_on_noreturn_syscall_enter in internal/c/filter.c: it only makes
the sampling decision (ior_should_emit_trace) and deliberately does NOT
record enter-state. The code generator now emits this hook for noreturn
enter handlers (detected via isNoreturnSyscall(syscallName(name))) so the
enter null_event is still emitted while the dead, unreclaimable map write
is skipped. Regenerated generated_tracepoints.c accordingly.
Extend TestGenerateExitNoreturnHandlers with a negative assertion (no
ior_on_syscall_enter for noreturn) and add
TestGenerateReturningSyscallEnterRecordsState as a positive contrast.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
|
epoll_create(size) was recording size (args[0]) as flags — hardcode to
0 since the syscall has no flags argument. pidfd_open(pid, flags) was
recording pid (args[0]) as flags — use args[1] instead.
Add test fixtures and codegen tests that verify the correct argument
indexes and reject the old wrong ones. Regenerate generated_tracepoints.c.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
|
Generated exit handlers now pass the explicit enter trace ID
(SYS_ENTER_X) to ior_on_syscall_exit instead of relying on the
implicit enter_id == exit_id + 1 arithmetic invariant. filter.c
compares directly against the passed enter ID.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
|
Replace the large switch in generateExtra with an extraEmitters
registry (map[TracepointKind]extraEmitter) and convert six inner
switch-on-name helpers to table-driven lookups:
- generateExtraMem -> memFieldOverrides table
- generateExtraEventfd -> eventfdFlagsExpr table
- generateExtraTwoFd -> twoFdOverrides + twoFdDefault
- generateExtraPoll -> pollOverrides + pollTimeoutBody(style)
- generateExtraSleep -> sleepTimespecPtr table
- generateExtraKeyctl -> keyctlOverrides table
Adding a new syscall kind or variant now requires only a table
entry instead of editing switch arms with raw C string literals.
Generated BPF C output is behaviorally equivalent; all existing
tests pass unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Split 22 production files across the codebase — event loop, TUI models,
probe manager, dashboard, export, flag parsing, code generation, and
ioworkload scenarios — so that no function body exceeds 50 lines. Each
extracted helper carries its own comment explaining its role.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
Introduce kindregistry.go with a kindMeta struct (structName, enterAccepted)
and a kindRegistry map keyed by TracepointKind. Replace the switch in
isEnterRejected (codegen.go) and the switch in eventStructName (bpfhandler.go)
with lookupKind registry lookups. Adding a new TracepointKind now only
requires a single registry entry — no switch statements need to be touched.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
|
The BPF handler generator emitted struct trace_event_raw_sys_enter/
trace_event_raw_sys_exit (the BTF-blessed aliases). RHEL 9 carries an
rt-tree backport that adds preempt_lazy_count to struct trace_entry,
which widens those aliases by 8 bytes and shifts args/ret. The actual
tracepoint context the kernel hands the program is still
syscall_trace_enter / syscall_trace_exit, where the offsets did not
move. Programs typed against the wider alias read past max_ctx_offset
and the verifier rejects the attach with EACCES.
Switching the generator to emit syscall_trace_enter/exit lines up with
the real context on RHEL 9 (and is identical on every other distro,
since the two structs only diverge there). Same fix bcc shipped in
iovisor/bcc#4920 and inspektor-gadget did in inspektor-gadget#2546.
Field accesses (ctx->args[N], ctx->ret) are unchanged.
Verified end-to-end on Rocky Linux 9.7 stock 5.14.0-611.5.1.el9_7
(no kernel-ml needed) and Fedora 6.19. README rewritten accordingly:
drops the elrepo kernel-ml step and the trailing 'permission denied'
troubleshooting paragraph; adds a historical note explaining why the
old workaround existed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
|
|
|
|
|
Amp-Thread-ID: https://ampcode.com/threads/T-019c7f4e-cc5f-76f1-aaf0-dd7cbaabbb18
Co-authored-by: Amp <amp@ampcode.com>
|