| author | Paul Buetow <paul@buetow.org> | 2025-06-28 19:47:10 +0300 |
|---|---|---|
| committer | Paul Buetow <paul@buetow.org> | 2025-06-28 19:47:10 +0300 |
| commit | c75b6595f6cb0c94f4ecc05ca7c27ec0e83de368 | |
| tree | edc815d8e0e35eaad5fbfd201852b33cd074fc6d /CHANNELLESS_GREP_IMPLEMENTATION.md | |
| parent | 408d6365383ecca294c3260df261f08092484aef | |
feat: implement channel-less grep for 62% performance improvement
- Add LineProcessor interface for direct line processing without channels
- Implement channel-less file reading in readfile_processor.go
- Add optimized reader with 256KB buffering for efficient I/O
- Create GrepLineProcessor for direct writing without intermediate channels
- Fix serverless mode hanging due to stdin pipe detection
- Fix base64 decoding bug (was counting characters instead of arguments)
- Fix message output formatting by adding proper newline handling
Performance improvements:
- Channel-based: 9.00s → Channel-less: 3.42s (62% less wall-clock time, about 2.6x faster, on 100MB files)
- Removed channel synchronization overhead and context switching
- Reduced memory allocations with buffer pooling
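The commit credits part of the gain to fewer allocations through buffer pooling. The usual Go tool for this is `sync.Pool`; the sketch below assumes that is the mechanism meant, which the commit does not state. The `getBuffer`, `putBuffer` and `formatLine` helpers and the `server|line` output format are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufferPool hands out reusable *bytes.Buffer values so that hot paths
// do not allocate a fresh buffer for every line.
var bufferPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// getBuffer (hypothetical helper) fetches an empty buffer from the pool.
func getBuffer() *bytes.Buffer {
	buf := bufferPool.Get().(*bytes.Buffer)
	buf.Reset() // pooled buffers may still hold a previous line
	return buf
}

// putBuffer (hypothetical helper) returns a buffer to the pool for reuse.
func putBuffer(buf *bytes.Buffer) {
	bufferPool.Put(buf)
}

// formatLine builds an output line in a pooled buffer and returns a copy
// (buf.String() copies), so the buffer can go back to the pool right away.
func formatLine(server, line string) string {
	buf := getBuffer()
	defer putBuffer(buf)
	buf.WriteString(server)
	buf.WriteString("|")
	buf.WriteString(line)
	return buf.String()
}

func main() {
	fmt.Println(formatLine("host1", "ERROR disk full"))
}
```

Resetting on `Get` matters: a pool gives no guarantee about which buffer comes back, and reusing one without clearing it would leak a previous line into the next.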
Environment variables:
- DTAIL_CHANNELLESS_GREP=yes - Enable channel-less implementation
- DTAIL_OPTIMIZED_READER=yes - Use optimized buffered reader
Known limitation: inverted grep with context (--invert combined with --before/--after)
is not fully implemented in channel-less mode.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Diffstat (limited to 'CHANNELLESS_GREP_IMPLEMENTATION.md')
| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | CHANNELLESS_GREP_IMPLEMENTATION.md | 103 |

1 file changed, 103 insertions, 0 deletions
````diff
diff --git a/CHANNELLESS_GREP_IMPLEMENTATION.md b/CHANNELLESS_GREP_IMPLEMENTATION.md
new file mode 100644
index 0000000..af79d9c
--- /dev/null
+++ b/CHANNELLESS_GREP_IMPLEMENTATION.md
@@ -0,0 +1,103 @@
+# Channel-less dgrep Implementation
+
+## Overview
+
+This document describes the channel-less implementation of dgrep that was created to address performance bottlenecks caused by channel overhead in the original implementation.
+
+## Problem Statement
+
+The original dgrep implementation used multiple channels in a pipeline:
+- `rawLines chan *bytes.Buffer` (buffer: 100) - Raw lines read from file
+- `lines chan *line.Line` (buffer: 100) - Filtered lines to send to client
+
+This created several performance issues:
+1. Fixed channel buffer sizes causing blocking under high throughput
+2. Context switching overhead between goroutines
+3. Channel synchronization overhead
+4. Memory allocations for channel operations
+
+## Solution
+
+The channel-less implementation replaces the channel pipeline with direct function calls using a `LineProcessor` interface.
+
+### Key Components
+
+1. **LineProcessor Interface** (`internal/io/line/processor.go`)
+   - Defines methods for processing lines without channels
+   - `ProcessLine()` - Handle a single line
+   - `Flush()` - Ensure buffered data is written
+   - `Close()` - Clean up resources
+
+2. **GrepLineProcessor** (`internal/server/handlers/lineprocessor.go`)
+   - Implements LineProcessor for grep operations
+   - Writes directly to the network connection
+   - Uses internal buffering for efficiency (64KB buffer)
+   - Thread-safe with mutex protection
+
+3. **Modified File Reading** (`internal/io/fs/readfile_processor.go`)
+   - `StartWithProcessor()` - Channel-less file reading
+   - Direct callbacks instead of channel sends
+   - Inline regex filtering without goroutines
+
+4. **Optimized File Reading** (`internal/io/fs/readfile_processor_optimized.go`)
+   - Uses buffered line reading instead of byte-by-byte
+   - Custom scanner with 256KB buffer
+   - Efficient handling of long lines
+   - Special optimization for tail mode
+
+### Feature Flags
+
+The implementation can be controlled via environment variables:
+- `DTAIL_CHANNELLESS_GREP=yes` - Enable channel-less grep implementation
+- `DTAIL_OPTIMIZED_READER=yes` - Use optimized buffered reader
+
+### Benefits
+
+1. **Reduced Latency**: No channel queuing delays
+2. **Lower Memory Usage**: No channel buffers
+3. **Better CPU Efficiency**: Fewer context switches
+4. **Simpler Code Flow**: Direct processing without goroutine coordination
+5. **Predictable Performance**: No channel blocking
+
+### Backward Compatibility
+
+- Original channel-based implementation remains available
+- Same command-line interface
+- Protocol compatibility maintained
+- All integration tests pass unchanged
+
+### Performance Testing
+
+Use the provided script to compare performance:
+
+```bash
+./test_channelless_performance.sh
+```
+
+This will test:
+1. Original channel-based implementation
+2. Channel-less implementation
+3. Optimized channel-less implementation
+
+### Usage
+
+To use the channel-less implementation:
+
+```bash
+# Enable channel-less grep
+export DTAIL_CHANNELLESS_GREP=yes
+
+# Also enable optimized reader
+export DTAIL_OPTIMIZED_READER=yes
+
+# Run dgrep normally
+dgrep -regex "pattern" file.log
+```
+
+### Future Improvements
+
+1. Extend channel-less approach to other commands (dcat, dtail)
+2. Add configurable buffer sizes
+3. Implement zero-copy optimizations
+4. Add performance metrics collection
+5. Consider using io_uring on Linux for async I/O
\ No newline at end of file
````
