author Paul Buetow <paul@buetow.org> 2025-06-28 19:47:10 +0300
committer Paul Buetow <paul@buetow.org> 2025-06-28 19:47:10 +0300
commit c75b6595f6cb0c94f4ecc05ca7c27ec0e83de368
tree edc815d8e0e35eaad5fbfd201852b33cd074fc6d /CHANNELLESS_GREP_IMPLEMENTATION.md
parent 408d6365383ecca294c3260df261f08092484aef
feat: implement channel-less grep for 62% performance improvement
- Add LineProcessor interface for direct line processing without channels
- Implement channel-less file reading in readfile_processor.go
- Add optimized reader with 256KB buffering for efficient I/O
- Create GrepLineProcessor for direct writing without intermediate channels
- Fix serverless mode hanging due to stdin pipe detection
- Fix base64 decoding bug (was counting characters instead of arguments)
- Fix message output formatting by adding proper newline handling

Performance improvements:
- Channel-based: 9.00s → Channel-less: 3.42s (62% faster on 100MB files)
- Removed channel synchronization overhead and context switching
- Reduced memory allocations with buffer pooling

Environment variables:
- DTAIL_CHANNELLESS_GREP=yes - Enable channel-less implementation
- DTAIL_OPTIMIZED_READER=yes - Use optimized buffered reader

Known limitation: Inverted grep with context (--invert with --before/--after)
not fully implemented in channel-less mode.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Diffstat (limited to 'CHANNELLESS_GREP_IMPLEMENTATION.md')
-rw-r--r-- CHANNELLESS_GREP_IMPLEMENTATION.md 103
1 files changed, 103 insertions, 0 deletions
diff --git a/CHANNELLESS_GREP_IMPLEMENTATION.md b/CHANNELLESS_GREP_IMPLEMENTATION.md
new file mode 100644
index 0000000..af79d9c
--- /dev/null
+++ b/CHANNELLESS_GREP_IMPLEMENTATION.md
@@ -0,0 +1,103 @@
+# Channel-less dgrep Implementation
+
+## Overview
+
+This document describes the channel-less implementation of dgrep that was created to address performance bottlenecks caused by channel overhead in the original implementation.
+
+## Problem Statement
+
+The original dgrep implementation used multiple channels in a pipeline:
+- `rawLines chan *bytes.Buffer` (buffer: 100) - Raw lines read from file
+- `lines chan *line.Line` (buffer: 100) - Filtered lines to send to client
+
+This created several performance issues:
+1. Fixed channel buffer sizes (100 entries each), which cause blocking under high throughput
+2. Context switching overhead between goroutines
+3. Channel synchronization overhead
+4. Memory allocations for channel operations
+
+## Solution
+
+The channel-less implementation replaces the channel pipeline with direct function calls using a `LineProcessor` interface.
+
+### Key Components
+
+1. **LineProcessor Interface** (`internal/io/line/processor.go`)
+   - Defines methods for processing lines without channels
+   - `ProcessLine()` - Handle a single line
+   - `Flush()` - Ensure buffered data is written
+   - `Close()` - Clean up resources
+
+2. **GrepLineProcessor** (`internal/server/handlers/lineprocessor.go`)
+   - Implements LineProcessor for grep operations
+   - Writes directly to the network connection
+   - Uses internal buffering for efficiency (64KB buffer)
+   - Thread-safe with mutex protection
+
+3. **Modified File Reading** (`internal/io/fs/readfile_processor.go`)
+   - `StartWithProcessor()` - Channel-less file reading
+   - Direct callbacks instead of channel sends
+   - Inline regex filtering without goroutines
+
+4. **Optimized File Reading** (`internal/io/fs/readfile_processor_optimized.go`)
+   - Uses buffered line reading instead of byte-by-byte reads
+   - Custom scanner with 256KB buffer
+   - Efficient handling of long lines
+   - Special optimization for tail mode
+
+### Feature Flags
+
+The implementation can be controlled via environment variables:
+- `DTAIL_CHANNELLESS_GREP=yes` - Enable channel-less grep implementation
+- `DTAIL_OPTIMIZED_READER=yes` - Use optimized buffered reader
+
+### Benefits
+
+1. **Reduced Latency**: No channel queuing delays
+2. **Lower Memory Usage**: No channel buffers
+3. **Better CPU Efficiency**: Fewer context switches
+4. **Simpler Code Flow**: Direct processing without goroutine coordination
+5. **Predictable Performance**: No channel blocking
+
+### Backward Compatibility
+
+- Original channel-based implementation remains available
+- Same command-line interface
+- Protocol compatibility maintained
+- All integration tests pass unchanged
+
+### Performance Testing
+
+Use the provided script to compare performance:
+
+```bash
+./test_channelless_performance.sh
+```
+
+This will test:
+1. Original channel-based implementation
+2. Channel-less implementation
+3. Optimized channel-less implementation
+
+### Usage
+
+To use the channel-less implementation:
+
+```bash
+# Enable channel-less grep
+export DTAIL_CHANNELLESS_GREP=yes
+
+# Also enable optimized reader
+export DTAIL_OPTIMIZED_READER=yes
+
+# Run dgrep normally
+dgrep -regex "pattern" file.log
+```
+
+### Future Improvements
+
+1. Extend channel-less approach to other commands (dcat, dtail)
+2. Add configurable buffer sizes
+3. Implement zero-copy optimizations
+4. Add performance metrics collection
+5. Consider using io_uring on Linux for async I/O
\ No newline at end of file
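
The sketch below is not part of the patch. It illustrates the approach the added document describes: a `LineProcessor` with `ProcessLine()`, `Flush()` and `Close()`, a mutex-protected processor that writes through a 64KB buffer, and a reader that scans with a 256KB buffer and filters inline by direct call instead of channel sends. The method signatures and the names `grepProcessor`, `newGrepProcessor`, `readWithProcessor`, `grepString` and `channellessEnabled` are assumptions made for illustration, as is the 1MB line limit. The actual code in `internal/io/line/processor.go` and `internal/server/handlers/lineprocessor.go` may look quite different.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"regexp"
	"strings"
	"sync"
)

// LineProcessor mirrors the three methods named in the document.
// The exact signatures are assumptions made for this sketch.
type LineProcessor interface {
	ProcessLine(line []byte) error
	Flush() error
	Close() error
}

// grepProcessor is a hypothetical stand-in for GrepLineProcessor: it writes
// lines straight to an io.Writer through a 64KB buffer, guarded by a mutex.
type grepProcessor struct {
	mu sync.Mutex
	w  *bufio.Writer
}

func newGrepProcessor(dst io.Writer) *grepProcessor {
	return &grepProcessor{w: bufio.NewWriterSize(dst, 64*1024)}
}

func (p *grepProcessor) ProcessLine(line []byte) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, err := p.w.Write(line); err != nil {
		return err
	}
	return p.w.WriteByte('\n')
}

func (p *grepProcessor) Flush() error {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.w.Flush()
}

// Close has no resources of its own to release here, so it only flushes.
func (p *grepProcessor) Close() error { return p.Flush() }

// readWithProcessor scans src line by line with a 256KB buffer, filters inline
// with re, and hands each match to proc by direct call (no channels, no goroutines).
func readWithProcessor(src io.Reader, re *regexp.Regexp, proc LineProcessor) error {
	sc := bufio.NewScanner(src)
	sc.Buffer(make([]byte, 256*1024), 1024*1024) // 256KB initial buffer, lines up to 1MB (assumed limit)
	for sc.Scan() {
		if re.Match(sc.Bytes()) {
			if err := proc.ProcessLine(sc.Bytes()); err != nil {
				return err
			}
		}
	}
	if err := sc.Err(); err != nil {
		return err
	}
	return proc.Flush()
}

// grepString runs the sketch over an in-memory input and returns the matching lines.
func grepString(pattern, input string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	proc := newGrepProcessor(&out)
	defer proc.Close()
	if err := readWithProcessor(strings.NewReader(input), re, proc); err != nil {
		return "", err
	}
	return out.String(), nil
}

// channellessEnabled checks the flag named in the document. How dtail itself
// reads the variable is not shown there, so this comparison is an assumption.
func channellessEnabled() bool {
	return os.Getenv("DTAIL_CHANNELLESS_GREP") == "yes"
}

func main() {
	fmt.Fprintln(os.Stderr, "channel-less flag set:", channellessEnabled())
	out, err := grepString("ERROR", "INFO start\nERROR disk full\nINFO done\n")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

Because the reader calls `ProcessLine()` directly, back-pressure comes from the writer itself rather than from a fixed-size channel buffer. The mutex only matters if several readers share one processor.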