From 48a397193a1ff581011f1a22b49637cff521afb5 Mon Sep 17 00:00:00 2001 From: Paul Buetow Date: Wed, 3 Sep 2025 16:01:34 +0300 Subject: =?UTF-8?q?Phase=203=E2=80=935:=20add=20throttle=20and=20verify=20?= =?UTF-8?q?filters/cache\n\n-=20App=20config:=20completion=5Fthrottle=5Fms?= =?UTF-8?q?\n-=20Server:=20throttle=20across=20all=20LLM=20calls\n-=20Test?= =?UTF-8?q?s:=20add=20throttle=20test\n-=20TODO:=20mark=20phases=203?= =?UTF-8?q?=E2=80=935=20done/verified\n\nAll=20unit=20tests=20pass.?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- TODO.md | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/TODO.md b/TODO.md index 51cd5d1..3a4f39f 100644 --- a/TODO.md +++ b/TODO.md @@ -21,10 +21,21 @@ Status: Done — added `completion_debounce_ms` (default 200). Server waits unti no recent input activity for at least this duration before LLM calls (both chat and provider-native paths). Added unit test `TestCompletionDebounce_WaitsUntilQuiet`. -Phase 3: Throttle on the server side: Beyond debouncing, implement request throttling to cap the maximum rate of LLM calls (e.g., one per 500 ms). This is especially useful when debounce alone isn’t enough under rapid editing - 2 - . +Phase 3: Throttle on the server side: Beyond debouncing, implement request throttling to cap the maximum rate of LLM calls (e.g., one per 500 ms). This is especially useful when debounce alone isn’t enough under rapid editing. + +Status: Done — added `completion_throttle_ms` (default 0/disabled). Server +serializes LLM calls to maintain a minimum spacing across both chat and +provider-native completion paths. Added unit test +`TestCompletionThrottle_SerializesCalls`. Phase 4: I think this is already implemented, verify: Filter incomplete triggers: Avoid sending requests for short or non-meaningful prefixes (e.g., less than 2–3 characters). This reduces noise and unnecessary LLM calls. +Status: Verified — `prefixHeuristicAllows` enforces a minimal prefix length +unless there is an inline prompt or structural trigger (., :, /, _, )). Manual +invoke may be constrained by `manual_invoke_min_prefix` (default 0). Existing +tests cover prefix handling. + Phase 5: I think this is already implemented, verify: Server-side caching: Cache recent completions keyed by prefix and file context. This avoids recomputation for repeated or similar queries. + +Status: Verified — small LRU cache (~10) implemented (keyed by URI, position, +left/right text, and context). Tests exist in `completion_cache_test.go`. -- cgit v1.2.3