Phase 3–5: add throttle and verify filters/cache

- App config: completion_throttle_ms
- Server: throttle across all LLM calls
- Tests: add throttle test
- TODO: mark phases 3–5 done/verified

All unit tests pass.

author: Paul Buetow <paul@buetow.org> 2025-09-03 16:01:34 +0300
committer: Paul Buetow <paul@buetow.org> 2025-09-03 16:01:34 +0300
commit: 48a397193a1ff581011f1a22b49637cff521afb5 (patch)
tree: 6653006c798016e356e11c58ffef6efe121151a2
parent: ffe9ed5531b6e62706ea555c48964ea0e560b780 (diff)
1 files changed, 14 insertions, 3 deletions
diff --git a/TODO.md b/TODO.md
index 51cd5d1..3a4f39f 100644
--- a/TODO.md
+++ b/TODO.md
@@ -21,10 +21,21 @@ Status: Done — added `completion_debounce_ms` (default 200). Server waits unti
 no recent input activity for at least this duration before LLM calls (both chat
 and provider-native paths). Added unit test `TestCompletionDebounce_WaitsUntilQuiet`.
         
-Phase 3: Throttle on the server side: Beyond debouncing, implement request throttling to cap the maximum rate of LLM calls (e.g., one per 500 ms). This is especially useful when debounce alone isn’t enough under rapid editing
-    2
-    .
+Phase 3: Throttle on the server side: Beyond debouncing, implement request throttling to cap the maximum rate of LLM calls (e.g., one per 500 ms). This is especially useful when debounce alone isn’t enough under rapid editing.
+
+Status: Done — added `completion_throttle_ms` (default 0/disabled). Server
+serializes LLM calls to maintain a minimum spacing across both chat and
+provider-native completion paths. Added unit test
+`TestCompletionThrottle_SerializesCalls`.
 
 Phase 4: I think this is already implemented, verify: Filter incomplete triggers: Avoid sending requests for short or non-meaningful prefixes (e.g., less than 2–3 characters). This reduces noise and unnecessary LLM calls.
 
+Status: Verified — `prefixHeuristicAllows` enforces a minimal prefix length
+unless there is an inline prompt or structural trigger (., :, /, _, )). Manual
+invoke may be constrained by `manual_invoke_min_prefix` (default 0). Existing
+tests cover prefix handling.
+
 Phase 5: I think this is already implemented, verify: Server-side caching: Cache recent completions keyed by prefix and file context. This avoids recomputation for repeated or similar queries.
+
+Status: Verified — small LRU cache (~10) implemented (keyed by URI, position,
+left/right text, and context). Tests exist in `completion_cache_test.go`.