Diffstat (limited to 'REPORT.md')
-rw-r--r-- REPORT.md 124
1 files changed, 0 insertions, 124 deletions
diff --git a/REPORT.md b/REPORT.md
deleted file mode 100644
index f416edd..0000000
--- a/REPORT.md
+++ /dev/null
@@ -1,124 +0,0 @@
-# Unit Test Improvement Report
-
-This report outlines areas for improvement in the project's unit tests. While the existing tests provide good coverage, the following suggestions aim to enhance their realism, robustness, and maintainability.
-
-## 1. `internal/hexaicli/run_test.go`
-
-- **`TestRunChat_StreamAndNonStream`**: The fake client and streamer return very simplistic, hardcoded responses (`"Hi!"`, `"Yo"`).
-  - **Recommendation**: Enhance the fake client to return more realistic, multi-line, or structured code/text responses. This would better test the output handling and parsing logic. Consider adding cases for empty or malformed LLM responses.
-
-## 2. `internal/lsp/codeaction_test.go`
-
-- **`TestBuildRewriteCodeAction_LazyAndResolves`**: The `fakeLLM` returns a simple, hardcoded string (`"REWRITTEN"`).
-  - **Recommendation**: Test with more complex and realistic code transformations. For example, the fake LLM could return a multi-line code block, a function with a different signature, or even code with syntax errors to test how the client side handles such responses.
-
-- **`TestBuildDiagnosticsCodeAction_LazyAndResolves`**: Similar to the rewrite action, the `fakeLLM` returns a simple string (`"FIXED"`).
-  - **Recommendation**: The fake LLM should return a code snippet that actually addresses the provided diagnostic. This would make the test a more faithful representation of the feature's intended behavior.
-
-## 3. `internal/lsp/handlers_end_to_end_test.go`
-
-- **`TestDetectAndHandleChat_InsertsReply`**: The `fakeLLM` returns a single word (`"Hello"`).
-  - **Recommendation**: A more realistic test would involve a multi-word or multi-line response, which would better test the formatting and insertion logic (e.g., how newlines are handled).
-
-- **`TestHandleCodeActionResolve_Document`**: The `fakeLLM` returns a simple, hardcoded response.
-  - **Recommendation**: The fake LLM's response should be a more realistic documentation block for the given function. This would help verify that the documentation generation and insertion logic works as expected with real-world-like data.
-
-## 4. `internal/lsp/completion_prefix_strip_test.go`
-
-- **`TestTryLLMCompletion_ManualInvokeAfterWhitespace_Allows`**: The `fakeLLM` returns a very short, non-representative code snippet (`"() *CustData"`).
-  - **Recommendation**: Use a more complete and realistic code suggestion to test the completion logic, including how it handles longer suggestions and potential formatting.
-
-## 5. `internal/llm/*_http_test.go` (New Findings)
-
-- **`TestOpenAI_Chat_Success`** in `openai_http_test.go` and **`TestCopilot_EnsureSession_AndChat_Success`** in `copilot_http_test.go` use `httptest` to mock the backend services, which is great. However, the mocked responses are minimal (e.g., `{"choices":[{"message":{"content":"OK"}}]}`).
-  - **Recommendation**: Expand these tests to handle more complex and realistic payloads from the LLM providers. This includes multi-choice responses, responses with a `finish_reason` other than `stop`, and error objects in the response body. This will make the client code more robust.
-
-## 6. General Recommendations
-
-- **Table-Driven Tests**: Some test files contain multiple, repetitive test functions that could be consolidated into table-driven tests. This would improve readability and make it easier to add new test cases. Examples include `internal/lsp/handlers_test.go` and `internal/lsp/completion_prefix_strip_test.go`.
-
-- **More Realistic Mock Data**: Across the board, the mock data used in tests is often very simplistic. While this is acceptable for basic unit tests, creating a set of more realistic mock responses from the LLM would allow for more robust testing of the parsing, formatting, and error-handling logic. This could include:
-  - Multi-line code snippets.
-  - Code with complex syntax.
-  - Responses containing Markdown formatting.
-  - Malformed or incomplete JSON/code.
-  - Empty responses.
-
-By addressing these points, the test suite will be more robust and provide a higher degree of confidence in the application's behavior when interacting with a real LLM.
-
----
-
-## Plan and Status (living checklist)
-
-Legend: [ ] pending · [~] in progress · [x] done/partially done
-
-1) internal/hexaicli/run_test.go
-- [ ] Enhance fake client/streamer responses to multi-line/structured outputs in TestRunChat_StreamAndNonStream.
-- [ ] Add cases for empty/malformed LLM responses and ensure graceful handling.
-
-2) internal/lsp/codeaction_test.go and related e2e tests
-- [ ] Make fake LLM rewrite responses multi-line and structural (e.g., signature change) and validate insertion.
-- [ ] Make diagnostics-fix responses actually address a provided diagnostic; assert the fix is reflected in text edits.
-- [ ] Document-code action: return realistic docblocks (multi-line) and assert formatting/placement.
-
-3) internal/lsp/handlers_end_to_end_test.go
-- [x] Use multi-line replies in TestDetectAndHandleChat_InsertsReply; verify newline formatting and insertion contains both lines.
-- [x] Use more realistic documentation blocks in TestHandleCodeActionResolve_Document; verified multi-line insertion.
-
-4) internal/lsp/completion_prefix_strip_test.go
-- [ ] Replace short snippet ("() *CustData") with fuller realistic suggestions; add additional cases to exercise prefix/indent logic with longer outputs.
-
-5) internal/llm/*_http_test.go
-- [x] OpenAI success: basic chat completion via httptest.
-- [x] OpenAI stream: SSE delta accumulation in ChatStream.
-- [x] Copilot token + chat: ensureSession + /chat/completions success.
-- [x] Copilot CodeCompletion: SSE-style stream with multiple choices.
-- [x] Expand OpenAI mocked responses: multi-choice, different finish_reason, error objects; assert parsing.
-- [x] Expand Copilot mocked responses: multi-choice, error object in body; assert parsing and error propagation.
-- [x] Additional negative cases: OpenAI no-choices and empty delta; Copilot chat no-choices; SSE malformed and decode-error paths.
-
-6) General
-- [x] Convert repetitive tests to table-driven style where appropriate (e.g., completion prefix/strip; instruction markers; label/filter; code fences/inline spans; buildPrompts variants; computeTextEditAndFilter variants).
-- [ ] Introduce a shared set of realistic mock responses (multi-line code, Markdown, malformed JSON) and reuse across tests.
-
-## Progress (latest)
-
-- [x] Coverage gates and CI ergonomics
-  - Added `mage covercheck` with per-package totals and exceptions.
-
-- [x] Coverage raised to ≥80%:
-  - internal/lsp: ~81.2% (new e2e and helper tests)
-  - internal/llm: ~80.3% (OpenAI/Copilot HTTP + SSE + token + CodeCompletion)
-  - internal/hexaicli, internal/hexailsp, internal/appconfig, internal/logging all ≥90%
-
-- [x] Provider realism improvements (partial):
-  - OpenAI: added ChatStream SSE and success path via httptest.
-  - Copilot: added ensureSession (token) + chat success and Codex CodeCompletion SSE.
-  - Next: multi-choice and finish_reason variants; error-object coverage.
-
-## Status updates (since last run)
-
-- [~] 1) hexaicli: adjusted tests for environment variability; realism enhancements pending.
-- [x] 2) lsp code actions: document-code and diagnostics tests now use multi-line responses in fake LLM to better simulate real outputs.
-- [x] Added rewrite/diagnostics realism tests that validate multi-line replacements and exact range preservation.
-- [x] 3) lsp e2e chat/document: chat test now uses multi-line reply and validates insertion contains both lines; document resolve uses multi-line docblock.
-- [x] 4) lsp completion: manual-invoke test now uses a multi-line realistic function signature with body; still passes and exercises formatting.
-- [x] 5) llm providers: added OpenAI success + SSE stream and Copilot token+chat + Codex SSE tests; coverage ≥80%. Expanded with multi-choice and error-body cases.
-- [x] 6) General: introduced shared fixtures (internal/testutil) and added table-driven tests for code fences, inline spans, label selection, prefix stripping, instruction markers, buildPrompts, and computeTextEditAndFilter. Documented patterns in docs/testing.md.
-- [x] Added table-driven tests for instruction marker extraction and prefix stripping.
-
-## Next actions (prioritized)
-
-1. LSP realism
-- Implement multi-line rewrite/diagnostics/doc responses from fake LLM, assert proper NewText and ranges.
-- Expand chat reply test to multi-line; verify inserted formatting.
-
-2. Provider payload breadth
-- OpenAI: multi-choice responses, finish_reason != stop, error bodies; negative SSE chunks.
-- Copilot: multi-choice in chat, error body propagation in non-2xx; expand CodeCompletion SSE variants.
-
-3. Table-driven refactors
-- Convert repetitive cases (prefix stripping, instruction extraction, label selection) to table-driven style to ease adding new scenarios.
-
-4. Negative/malformed inputs
-- Add malformed/missing fields, empty model responses, and malformed SSE to assert robust error handling in clients and LSP handlers.
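
The removed report's two recurring suggestions, table-driven cases and a shared set of realistic fake-LLM replies, could be combined roughly as follows. This is a minimal sketch, not the project's code: the `fakeLLM` shape, the `extractCode` helper, and the `fixture` type are all hypothetical stand-ins, and the fixtures only cover the reply kinds the report lists (multi-line code, Markdown-fenced code, empty, malformed).

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// fence is built at run time so this listing contains no literal fence line.
var fence = strings.Repeat("`", 3)

// fakeLLM is a hypothetical test double that returns a canned reply.
type fakeLLM struct{ reply string }

// Chat ignores the prompt and returns the canned reply.
func (f fakeLLM) Chat(prompt string) (string, error) { return f.reply, nil }

// extractCode is a hypothetical stand-in for the response handling under
// test: it rejects empty replies and unwraps a single Markdown code fence.
func extractCode(reply string) (string, error) {
	s := strings.TrimSpace(reply)
	if s == "" {
		return "", errors.New("empty response")
	}
	if !strings.HasPrefix(s, fence) {
		return s, nil // plain reply: use it as-is
	}
	lines := strings.Split(s, "\n")
	if len(lines) < 2 || strings.TrimSpace(lines[len(lines)-1]) != fence {
		return "", errors.New("unterminated code fence")
	}
	return strings.Join(lines[1:len(lines)-1], "\n"), nil
}

// addFunc is a small but multi-line reply body shared by several fixtures.
const addFunc = "func Add(a, b int) int {\n\treturn a + b\n}"

// fixture pairs one canned LLM reply with the expected outcome.
type fixture struct {
	name    string
	reply   string
	want    string
	wantErr bool
}

// fixtures covers the reply shapes listed in the report.
func fixtures() []fixture {
	return []fixture{
		{name: "multi-line code", reply: addFunc, want: addFunc},
		{name: "markdown fenced", reply: fence + "go\n" + addFunc + "\n" + fence, want: addFunc},
		{name: "empty", reply: " \n", wantErr: true},
		{name: "unterminated fence", reply: fence + "go\nfunc Add(", wantErr: true},
	}
}

// runFixtures returns one message per fixture whose outcome differs from
// the expectation; an empty slice means every case passed.
func runFixtures() []string {
	var failures []string
	for _, fx := range fixtures() {
		reply, _ := fakeLLM{reply: fx.reply}.Chat("ignored prompt")
		got, err := extractCode(reply)
		if (err != nil) != fx.wantErr || got != fx.want {
			failures = append(failures, fmt.Sprintf("%s: got %q, err %v", fx.name, got, err))
		}
	}
	return failures
}

func main() {
	if failures := runFixtures(); len(failures) > 0 {
		fmt.Println(strings.Join(failures, "\n"))
		return
	}
	fmt.Println("all fixtures handled as expected")
}
```

In a real `_test.go` file the loop body would typically sit inside `t.Run(fx.name, ...)`, so adding a scenario means adding one row to the table.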
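
For the provider-payload items (multi-choice responses, a `finish_reason` other than `stop`, error objects, no choices, truncated bodies), the kind of payload set the report asks for might look like the sketch below. It is illustrative only: `chatResponse` and `parseChat` are hypothetical names, the struct models just the fields mentioned in the report plus an assumed `error.message` field, and the payload strings would be what a mocked `httptest` handler writes as its response body.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// chatResponse models only the fields discussed in the report. The
// "error.message" field is an assumption about the provider's error shape.
type chatResponse struct {
	Choices []struct {
		Message struct {
			Content string `json:"content"`
		} `json:"message"`
		FinishReason string `json:"finish_reason"`
	} `json:"choices"`
	Error *struct {
		Message string `json:"message"`
	} `json:"error"`
}

// parseChat is a hypothetical decoder: it returns the first choice's
// content and finish reason, or an error for undecodable bodies, error
// objects, and responses without choices.
func parseChat(body []byte) (string, string, error) {
	var cr chatResponse
	if err := json.Unmarshal(body, &cr); err != nil {
		return "", "", fmt.Errorf("decode response: %w", err)
	}
	if cr.Error != nil {
		return "", "", fmt.Errorf("provider error: %s", cr.Error.Message)
	}
	if len(cr.Choices) == 0 {
		return "", "", errors.New("no choices in response")
	}
	first := cr.Choices[0]
	return first.Message.Content, first.FinishReason, nil
}

// Sample payloads a mocked backend could serve.
const (
	multiChoice = `{"choices":[` +
		`{"message":{"content":"first"},"finish_reason":"length"},` +
		`{"message":{"content":"second"},"finish_reason":"stop"}]}`
	errorBody = `{"error":{"message":"rate limit exceeded"}}`
	noChoices = `{"choices":[]}`
	truncated = `{"choices":[{"message":`
)

func main() {
	for _, body := range []string{multiChoice, errorBody, noChoices, truncated} {
		content, finish, err := parseChat([]byte(body))
		fmt.Printf("content=%q finish=%q err=%v\n", content, finish, err)
	}
}
```

Whether a client should surface only the first choice, or treat `finish_reason: "length"` as a warning, is a design decision the report leaves open; the sketch simply returns both values so a test can assert on them.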