chore: clean up temporary files and reorganize documentation

- Delete temporary benchmark shell scripts (7 files) - Delete temporary log files from root and integrationtests - Delete .out test output files - Delete temporary Python analysis scripts - Move documentation to doc/ directory: - TURBOBOOST_OPTIMIZATION.md → doc/turboboost_optimization.md - performance_optimization_summary.md → doc/performance_optimization_summary.md - integrationtests/REFACTORING_GUIDE.md → doc/refactoring_guide.md - benchmarks/PROFILING.md → doc/profiling.md 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
author: Paul Buetow <paul@buetow.org> 2025-07-04 11:25:17 +0300
committer: Paul Buetow <paul@buetow.org> 2025-07-04 11:25:17 +0300
commit: 0645644bb945c4ce4707252c38a8d454b2ac9567 (patch)
tree: aaff70f07cb07b85cbdcb53faf35c13ca40292ef /doc
parent: aa2f547cf2b6136dc60f541f30c27a426ec7c6c8 (diff)
4 files changed, 796 insertions, 0 deletions
diff --git a/doc/performance_optimization_summary.md b/doc/performance_optimization_summary.md
new file mode 100644
index 0000000..9d1162b
--- /dev/null
+++ b/doc/performance_optimization_summary.md
@@ -0,0 +1,62 @@
+# DTail Performance Optimization Summary
+
+## Changes Made
+
+### 1. Optimized Trace Logging (`/internal/io/dlog/dlog.go`)
+
+**Problem**: The `Trace()` and `Devel()` functions were calling `runtime.Caller(1)` for every invocation, even when trace logging was disabled. This was causing ~497ns overhead per call.
+
+**Solution**: Added early level checks before the expensive `runtime.Caller()` operation:
+
+```go
+func (d *DLog) Trace(args ...interface{}) string {
+    // Early check to avoid expensive runtime.Caller when trace is disabled
+    if d.maxLevel < Trace {
+        return ""
+    }
+    // ... rest of function
+}
+```
+
+### 2. Improved Buffering in Turbo Mode (`/internal/server/handlers/turbo_writer.go`)
+
+**Problem**: Turbo mode was forcing immediate flush after every line in serverless mode, defeating the purpose of buffering.
+
+**Solution**: Removed the immediate flush condition for serverless mode, allowing proper buffering:
+
+```go
+// Changed from:
+if w.writeBuf.Len() >= w.bufSize || w.serverless {
+    return w.flushBuffer()
+}
+
+// To:
+if w.writeBuf.Len() >= w.bufSize {
+    return w.flushBuffer()
+}
+```
+
+## Performance Results
+
+### Before Optimization
+- Turbo mode was **3-5x slower** than non-turbo mode
+- DCat (10MB): 678ms (turbo) vs 210ms (non-turbo)
+- DGrep (10MB): 570ms (turbo) vs 96ms (non-turbo)
+
+### After Optimization
+- Turbo mode is now **2.87x faster** than non-turbo mode
+- DCat (1M lines): 0.66s (turbo) vs 1.89s (non-turbo) - **65% improvement**
+- DCat with colors: 1.69s (turbo) vs 2.24s (non-turbo) - **24% improvement**
+
+## Verification
+- ✅ All unit tests pass
+- ✅ All integration tests pass
+- ✅ No functionality regression
+- ✅ Backward compatible
+
+## Files Modified
+1. `/internal/io/dlog/dlog.go` - Lines 196-216
+2. `/internal/server/handlers/turbo_writer.go` - Lines 104-108
+
+## Key Takeaway
+The trace logging overhead was the primary bottleneck, causing DTail to spend more time logging than processing data. By adding simple level checks before expensive operations, we achieved a ~3x performance improvement in turbo mode.
+\ No newline at end of file
diff --git a/doc/profiling.md b/doc/profiling.md
new file mode 100644
index 0000000..7925fb3
--- /dev/null
+++ b/doc/profiling.md
@@ -0,0 +1,376 @@
+# DTail Profiling Framework
+
+This document describes the profiling framework for dtail commands (dcat, dgrep, dmap) to analyze CPU usage and memory allocations.
+
+## Overview
+
+The profiling framework provides:
+- CPU profiling to identify performance bottlenecks
+- Memory profiling to track allocations and detect leaks
+- Integration with existing benchmarks
+- Analysis tools for profile interpretation
+
+## Quick Start
+
+### 1. Build the Tools
+
+```bash
+make build  # Builds all tools including dprofile
+```
+
+### 2. Run Commands with Profiling
+
+Each command now supports profiling flags:
+
+```bash
+# Profile dcat
+./dcat -profile -profiledir profiles -plain -cfg none /path/to/file.log
+
+# Profile dgrep with specific profiling types
+./dgrep -cpuprofile -memprofile -profiledir profiles -regex "error" /path/to/file.log
+
+# Profile dmap
+./dmap -profile -query "select count(*) from data.csv"
+```
+
+### 3. Analyze Profiles
+
+Use dtail-tools for quick analysis:
+
+```bash
+# List all profiles
+./dtail-tools profile -mode list
+
+# Analyze a specific profile
+./dtail-tools profile -mode analyze profiles/dcat_cpu_20240101_120000.prof
+
+# Open web browser with flame graph
+./dtail-tools profile -mode analyze profiles/dcat_cpu_*.prof -web
+
+# You can also use go tool pprof directly:
+go tool pprof profiles/dcat_cpu_20240101_120000.prof
+```
+
+## Profiling Options
+
+### Command-line Flags
+
+All dtail commands support these profiling flags:
+
+- `-cpuprofile`: Enable CPU profiling only
+- `-memprofile`: Enable memory profiling only
+- `-profile`: Enable both CPU and memory profiling
+- `-profiledir <dir>`: Directory to store profiles (default: "profiles")
+
+### Profile Types
+
+1. **CPU Profile** (`*_cpu_*.prof`)
+   - Samples CPU usage during execution
+   - Identifies hot functions and code paths
+   - Useful for optimizing computational bottlenecks
+
+2. **Memory Profile** (`*_mem_*.prof`)
+   - Captures heap allocations at end of execution
+   - Shows memory usage by function
+   - Helps identify memory leaks
+
+3. **Allocation Profile** (`*_alloc_*.prof`)
+   - Tracks all allocations during execution
+   - More detailed than memory profile
+   - Useful for reducing allocation pressure
+
+## Using with Benchmarks
+
+### Automated Profiling
+
+Run profiling using dtail-tools:
+
+```bash
+# Quick profiling with small datasets
+./dtail-tools profile -mode quick
+
+# Full profiling suite
+./dtail-tools profile -mode full
+
+# Profile dmap specifically (with MapReduce format)
+./dtail-tools profile -mode dmap
+```
+
+This tool:
+- Generates test data of various sizes
+- Profiles dcat, dgrep, and dmap with different workloads  
+- Stores profiles in the `profiles` directory
+- Provides immediate analysis of results
+
+### Using Make Targets
+
+```bash
+# Quick profiling with immediate results
+make profile-quick
+
+# Full profiling suite
+make profile-all
+
+# Profile dmap specifically
+make profile-dmap
+
+# List available profiles
+make profile-list
+
+# Analyze a specific profile
+make profile-analyze PROFILE=profiles/dcat_cpu_*.prof
+
+# Open web interface for profile
+make profile-web PROFILE=profiles/dcat_cpu_*.prof
+```
+
+### Benchmark Integration
+
+Run profiling-enabled benchmarks:
+
+```bash
+cd benchmarks
+go test -bench="WithProfiling" -benchtime=1x
+```
+
+### Custom Profile Runner
+
+Use the profile runner in your benchmarks:
+
+```go
+import "github.com/mimecast/dtail/benchmarks"
+
+func BenchmarkMyFeature(b *testing.B) {
+    benchmarks.ProfileBenchmark(b, "MyFeature", "dcat", 
+        "--plain", "--cfg", "none", "testfile.log")
+}
+```
+
+## Profile Analysis
+
+### Using go tool pprof
+
+For interactive analysis:
+
+```bash
+# Interactive mode
+go tool pprof profiles/dcat_cpu_*.prof
+
+# Common pprof commands:
+# top       - Show top functions
+# list func - Show source code for function
+# web       - Generate SVG graph
+# peek func - Show callers/callees of function
+```
+
+Generate visualizations:
+
+```bash
+# Flame graph (requires graphviz)
+go tool pprof -http=:8080 profiles/dcat_cpu_*.prof
+
+# Generate SVG
+go tool pprof -svg profiles/dgrep_mem_*.prof > profile.svg
+
+# Generate text report
+go tool pprof -text profiles/dmap_alloc_*.prof > report.txt
+```
+
+### Using dtail-tools profile
+
+The dtail-tools profile command provides quick summaries:
+
+```bash
+# List all profiles
+./dtail-tools profile -mode list
+
+# Analyze specific profile
+./dtail-tools profile -mode analyze profiles/dcat_cpu_20240101_120000.prof
+
+# Get help
+./dtail-tools profile -h
+```
+
+## Optimization Workflow
+
+1. **Baseline Performance**
+   ```bash
+   # Run benchmarks without profiling
+   cd benchmarks
+   go test -bench="BenchmarkDCat" -benchtime=10s
+   ```
+
+2. **Profile Execution**
+   ```bash
+   # Run with profiling
+   ./dcat -profile -profiledir profiles large_file.log
+   ```
+
+3. **Identify Bottlenecks**
+   ```bash
+   # Analyze CPU profile
+   ./dprofile -profile profiles/dcat_cpu_*.prof -top 10
+   
+   # Check memory allocations
+   go tool pprof -alloc_space profiles/dcat_alloc_*.prof
+   ```
+
+4. **Optimize Code**
+   - Focus on functions with high Flat% (direct CPU usage)
+   - Reduce allocations in hot paths
+   - Consider buffering and pooling
+
+5. **Verify Improvements**
+   ```bash
+   # Re-run benchmarks after optimization
+   go test -bench="BenchmarkDCat" -benchtime=10s
+   ```
+
+## Common Performance Issues
+
+### CPU Bottlenecks
+
+Look for:
+- Regex compilation in loops
+- Excessive string operations
+- Inefficient algorithms (O(n²) or worse)
+- Unnecessary type conversions
+
+Example optimization:
+```go
+// Before: Regex compiled every time
+for _, line := range lines {
+    if regexp.MustCompile(pattern).MatchString(line) {
+        // ...
+    }
+}
+
+// After: Compile once
+re := regexp.MustCompile(pattern)
+for _, line := range lines {
+    if re.MatchString(line) {
+        // ...
+    }
+}
+```
+
+### Memory Issues
+
+Common patterns:
+- String concatenation in loops
+- Large temporary slices
+- Unclosed resources
+- Excessive goroutines
+
+Example optimization:
+```go
+// Before: Many allocations
+result := ""
+for _, s := range strings {
+    result += s + "\n"
+}
+
+// After: Single allocation
+var buf strings.Builder
+buf.Grow(estimatedSize)
+for _, s := range strings {
+    buf.WriteString(s)
+    buf.WriteByte('\n')
+}
+result := buf.String()
+```
+
+## Tips and Best Practices
+
+1. **Profile Real Workloads**
+   - Use production-like data sizes
+   - Test with actual file formats
+   - Include network operations if relevant
+
+2. **Compare Profiles**
+   ```bash
+   # Compare before/after optimization
+   go tool pprof -diff_base=before.prof after.prof
+   ```
+
+3. **Focus on Hot Paths**
+   - Optimize functions with >5% CPU usage first
+   - Small improvements in hot paths have big impact
+
+4. **Memory Profiling**
+   - Use `-alloc_space` for total allocations
+   - Use `-inuse_space` for current heap usage
+   - Check for growing heap over time
+
+5. **Benchmark Regularly**
+   - Add profiling to CI/CD pipeline
+   - Track performance over releases
+   - Set performance regression alerts
+
+## Troubleshooting
+
+### No profiles generated
+- Check write permissions for profile directory
+- Ensure command completes successfully
+- Verify profiling flags are correct
+
+### Empty or small profiles
+- Run command with larger workload
+- Increase execution time
+- Check if command exits too quickly
+
+### Analysis tools fail
+- Ensure profile format is valid
+- Check Go version compatibility
+- Verify graphviz is installed for visualizations
+
+## Advanced Usage
+
+### Custom Profiling Points
+
+Add profiling snapshots in code:
+
+```go
+import "github.com/mimecast/dtail/internal/profiling"
+
+func processLargeFile() {
+    profiler := profiling.GetProfiler() // Assumes global profiler
+    
+    // Take memory snapshot before processing
+    profiler.Snapshot("before_processing")
+    
+    // ... process file ...
+    
+    // Take snapshot after
+    profiler.Snapshot("after_processing")
+}
+```
+
+### Continuous Profiling
+
+For long-running operations:
+
+```go
+// Start periodic metrics logging
+ticker := time.NewTicker(30 * time.Second)
+go func() {
+    for range ticker.C {
+        profiler.LogMetrics("periodic")
+    }
+}()
+defer ticker.Stop()
+```
+
+## Contributing
+
+When adding new features:
+1. Include benchmark tests
+2. Run profiling before submitting PR
+3. Document any performance implications
+4. Add profiling examples for new commands
+
+## References
+
+- [Go Profiling Documentation](https://go.dev/blog/pprof)
+- [pprof Tool Guide](https://github.com/google/pprof)
+- [Go Performance Tips](https://go.dev/wiki/Performance)
+\ No newline at end of file
diff --git a/doc/refactoring_guide.md b/doc/refactoring_guide.md
new file mode 100644
index 0000000..88c77d5
--- /dev/null
+++ b/doc/refactoring_guide.md
@@ -0,0 +1,240 @@
+# Integration Tests Refactoring Guide
+
+## Overview
+
+This guide outlines the refactoring opportunities for the dtail integration tests to reduce code duplication and improve maintainability.
+
+## Key Benefits of Refactoring
+
+1. **Reduced Code Duplication**: ~40-50% reduction in test code
+2. **Improved Maintainability**: Changes to common patterns only need to be made in one place
+3. **Better Test Hygiene**: Automatic cleanup using `t.Cleanup()`
+4. **Clearer Test Intent**: Helper functions make tests more readable
+5. **Reduced Copy-Paste Errors**: Less boilerplate to copy incorrectly
+
+## Common Patterns Identified
+
+### 1. Test Skip Pattern
+**Before:**
+```go
+if !config.Env("DTAIL_INTEGRATION_TEST_RUN_MODE") {
+    t.Log("Skipping")
+    return
+}
+```
+
+**After:**
+```go
+skipIfNotIntegrationTest(t)
+```
+
+### 2. Server Setup Pattern
+**Before:**
+```go
+port := getUniquePortNumber()
+bindAddress := "localhost"
+ctx, cancel := context.WithCancel(context.Background())
+defer cancel()
+
+_, _, _, err := startCommand(ctx, t,
+    "", "../dserver",
+    "--cfg", "none",
+    "--logger", "stdout",
+    "--logLevel", "error",
+    "--bindAddress", bindAddress,
+    "--port", fmt.Sprintf("%d", port),
+)
+if err != nil {
+    t.Error(err)
+    return
+}
+time.Sleep(500 * time.Millisecond)
+```
+
+**After:**
+```go
+server := NewTestServer(t)
+if err := server.Start("error"); err != nil {
+    t.Error(err)
+    return
+}
+```
+
+### 3. File Cleanup Pattern
+**Before:**
+```go
+defer os.Remove(outFile)
+defer os.Remove(csvFile)
+defer os.Remove(queryFile)
+```
+
+**After:**
+```go
+cleanupFiles(t, outFile, csvFile, queryFile)
+// or
+fileSet := &TestFileSet{...}
+fileSet.Cleanup(t)
+```
+
+### 4. Command Arguments Pattern
+**Before:**
+```go
+args := []string{
+    "--plain", "--cfg", "none",
+    "--servers", fmt.Sprintf("%s:%d", bindAddress, port),
+    "--trustAllHosts", "--noColor",
+    "--files", inFile,
+}
+```
+
+**After:**
+```go
+args := NewCommandArgs()
+args.Plain = true
+args.Servers = []string{server.Address()}
+args.TrustAllHosts = true
+args.NoColor = true
+args.Files = []string{inFile}
+// args.ToSlice() produces the string array
+```
+
+### 5. Dual Mode Testing Pattern
+**Before:**
+```go
+func TestX(t *testing.T) {
+    if !config.Env("DTAIL_INTEGRATION_TEST_RUN_MODE") {
+        t.Log("Skipping")
+        return
+    }
+    
+    t.Run("Serverless", func(t *testing.T) {
+        testXServerless(t)
+    })
+    
+    t.Run("ServerMode", func(t *testing.T) {
+        testXWithServer(t)
+    })
+}
+```
+
+**After:**
+```go
+func TestX(t *testing.T) {
+    runDualModeTest(t, DualModeTest{
+        Name: "TestX",
+        ServerlessTest: testXServerless,
+        ServerTest: testXWithServer,
+    })
+}
+```
+
+## Refactoring Strategy
+
+### Phase 1: Add Helper Functions
+1. Add `testhelpers.go` with all common utilities
+2. Ensure all tests still pass
+
+### Phase 2: Refactor Test by Test
+1. Start with simpler tests (e.g., dcat_test.go)
+2. Refactor one test function at a time
+3. Run tests after each refactoring
+4. Commit after each file is complete
+
+### Phase 3: Additional Improvements
+1. Add table-driven tests where appropriate
+2. Create test fixtures for common scenarios
+3. Add more sophisticated helpers as patterns emerge
+
+## Example Refactoring Results
+
+### Before (TestDCat1WithServer):
+```go
+func testDCat1WithServer(t *testing.T, inFile string) error {
+    outFile := "dcat1.out"
+    port := getUniquePortNumber()
+    bindAddress := "localhost"
+
+    ctx, cancel := context.WithCancel(context.Background())
+    defer cancel()
+
+    _, _, _, err := startCommand(ctx, t,
+        "", "../dserver",
+        "--cfg", "none",
+        "--logger", "stdout",
+        "--logLevel", "error",
+        "--bindAddress", bindAddress,
+        "--port", fmt.Sprintf("%d", port),
+    )
+    if err != nil {
+        return err
+    }
+
+    time.Sleep(500 * time.Millisecond)
+
+    _, err = runCommand(ctx, t, outFile,
+        "../dcat", "--plain", "--cfg", "none",
+        "--servers", fmt.Sprintf("%s:%d", bindAddress, port),
+        "--files", inFile,
+        "--trustAllHosts",
+        "--noColor")
+    if err != nil {
+        return err
+    }
+
+    cancel()
+
+    if err := compareFiles(t, outFile, inFile); err != nil {
+        return err
+    }
+
+    os.Remove(outFile)
+    return nil
+}
+```
+
+### After:
+```go
+func testDCat1WithServer_Refactored(t *testing.T, inFile string) {
+    fileSet := &TestFileSet{
+        InputFile:    inFile,
+        OutputFile:   "dcat1.out",
+        ExpectedFile: inFile,
+    }
+    fileSet.Cleanup(t)
+
+    server := NewTestServer(t)
+    if err := server.Start("error"); err != nil {
+        t.Error(err)
+        return
+    }
+
+    args := NewCommandArgs()
+    args.Plain = true
+    args.Servers = []string{server.Address()}
+    args.Files = []string{inFile}
+    args.TrustAllHosts = true
+    args.NoColor = true
+
+    err := runCommandAndVerify(t, server.ctx, fileSet.OutputFile, fileSet.ExpectedFile,
+        "../dcat", args.ToSlice()...)
+    if err != nil {
+        t.Error(err)
+    }
+}
+```
+
+## Metrics
+
+Based on the examples:
+- **Lines of code reduction**: ~45%
+- **Boilerplate elimination**: ~70%
+- **Improved readability**: Subjective but significant
+- **Error-prone patterns eliminated**: Port management, cleanup, context handling
+
+## Next Steps
+
+1. Review and approve the helper functions
+2. Create a PR with `testhelpers.go`
+3. Incrementally refactor tests in separate PRs
+4. Document any new patterns that emerge
+5. Consider creating a test generator for common scenarios
+\ No newline at end of file
diff --git a/doc/turboboost_optimization.md b/doc/turboboost_optimization.md
new file mode 100644
index 0000000..f13943f
--- /dev/null
+++ b/doc/turboboost_optimization.md
@@ -0,0 +1,118 @@
+# DTail Turbo Boost Optimization
+
+## Overview
+
+This document describes the turbo boost optimization feature that provides significant performance improvements for DTail operations by using channel-less processing and optimized I/O.
+
+## Problem Statement
+
+The original dgrep implementation used multiple channels in a pipeline:
+- `rawLines chan *bytes.Buffer` (buffer: 100) - Raw lines read from file
+- `lines chan *line.Line` (buffer: 100) - Filtered lines to send to client
+
+This created several performance issues:
+1. Fixed channel buffer sizes causing blocking under high throughput
+2. Context switching overhead between goroutines
+3. Channel synchronization overhead
+4. Memory allocations for channel operations
+
+## Solution
+
+The channel-less implementation replaces the channel pipeline with direct function calls using a `LineProcessor` interface.
+
+### Key Components
+
+1. **LineProcessor Interface** (`internal/io/line/processor.go`)
+   - Defines methods for processing lines without channels
+   - `ProcessLine()` - Handle a single line
+   - `Flush()` - Ensure buffered data is written
+   - `Close()` - Clean up resources
+
+2. **GrepLineProcessor** (`internal/server/handlers/lineprocessor.go`)
+   - Implements LineProcessor for grep operations
+   - Writes directly to the network connection
+   - Uses internal buffering for efficiency (64KB buffer)
+   - Thread-safe with mutex protection
+
+3. **Modified File Reading** (`internal/io/fs/readfile_processor.go`)
+   - `StartWithProcessor()` - Channel-less file reading
+   - Direct callbacks instead of channel sends
+   - Inline regex filtering without goroutines
+
+4. **Optimized File Reading** (`internal/io/fs/readfile_processor_optimized.go`)
+   - Uses buffered line reading instead of byte-by-byte
+   - Custom scanner with 256KB buffer
+   - Efficient handling of long lines
+   - Special optimization for tail mode
+
+### Feature Flags
+
+The implementation can be controlled via environment variables:
+- `DTAIL_CHANNELLESS_GREP=yes` - Enable channel-less grep implementation
+- `DTAIL_OPTIMIZED_READER=yes` - Use optimized buffered reader
+
+### Benefits
+
+1. **Reduced Latency**: No channel queuing delays
+2. **Lower Memory Usage**: No channel buffers
+3. **Better CPU Efficiency**: Fewer context switches
+4. **Simpler Code Flow**: Direct processing without goroutine coordination
+5. **Predictable Performance**: No channel blocking
+
+### Backward Compatibility
+
+- Original channel-based implementation remains available
+- Same command-line interface
+- Protocol compatibility maintained
+- All integration tests pass unchanged
+
+### Performance Testing
+
+Use the provided script to compare performance:
+
+```bash
+./test_channelless_performance.sh
+```
+
+This will test:
+1. Original channel-based implementation
+2. Channel-less implementation
+3. Optimized channel-less implementation
+
+### Usage
+
+To use the channel-less implementation:
+
+```bash
+# Enable channel-less grep
+export DTAIL_CHANNELLESS_GREP=yes
+
+# Also enable optimized reader
+export DTAIL_OPTIMIZED_READER=yes
+
+# Run dgrep normally
+dgrep -regex "pattern" file.log
+```
+
+### Future Improvements
+
+1. Extend channel-less approach to other commands (dcat, dtail)
+2. Add configurable buffer sizes
+3. Implement zero-copy optimizations
+4. Add performance metrics collection
+5. Consider using io_uring on Linux for async I/O
+
+## Usage
+
+To enable turbo boost optimizations:
+
+```bash
+export DTAIL_TURBOBOOST_ENABLE=yes
+```
+
+This enables:
+- Channel-less implementation for grep and cat operations
+- Optimized buffered I/O reader (256KB buffer)
+- Buffer pooling to reduce memory allocations
+
+The turbo boost mode is designed to be extended to other DTail commands in the future.
author	Paul Buetow <paul@buetow.org>	2025-07-04 11:25:17 +0300
committer	Paul Buetow <paul@buetow.org>	2025-07-04 11:25:17 +0300
commit	0645644bb945c4ce4707252c38a8d454b2ac9567 (patch)
tree	aaff70f07cb07b85cbdcb53faf35c13ca40292ef /doc
parent	aa2f547cf2b6136dc60f541f30c27a426ec7c6c8 (diff)