diff options
| author | Paul Buetow <paul@buetow.org> | 2025-07-04 11:25:17 +0300 |
|---|---|---|
| committer | Paul Buetow <paul@buetow.org> | 2025-07-04 11:25:17 +0300 |
| commit | 0645644bb945c4ce4707252c38a8d454b2ac9567 (patch) | |
| tree | aaff70f07cb07b85cbdcb53faf35c13ca40292ef /doc | |
| parent | aa2f547cf2b6136dc60f541f30c27a426ec7c6c8 (diff) | |
chore: clean up temporary files and reorganize documentation
- Delete temporary benchmark shell scripts (7 files)
- Delete temporary log files from root and integrationtests
- Delete .out test output files
- Delete temporary Python analysis scripts
- Move documentation to doc/ directory:
- TURBOBOOST_OPTIMIZATION.md ā doc/turboboost_optimization.md
- performance_optimization_summary.md ā doc/performance_optimization_summary.md
- integrationtests/REFACTORING_GUIDE.md ā doc/refactoring_guide.md
- benchmarks/PROFILING.md ā doc/profiling.md
š¤ Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Diffstat (limited to 'doc')
| -rw-r--r-- | doc/performance_optimization_summary.md | 62 | ||||
| -rw-r--r-- | doc/profiling.md | 376 | ||||
| -rw-r--r-- | doc/refactoring_guide.md | 240 | ||||
| -rw-r--r-- | doc/turboboost_optimization.md | 118 |
4 files changed, 796 insertions, 0 deletions
diff --git a/doc/performance_optimization_summary.md b/doc/performance_optimization_summary.md new file mode 100644 index 0000000..9d1162b --- /dev/null +++ b/doc/performance_optimization_summary.md @@ -0,0 +1,62 @@ +# DTail Performance Optimization Summary + +## Changes Made + +### 1. Optimized Trace Logging (`/internal/io/dlog/dlog.go`) + +**Problem**: The `Trace()` and `Devel()` functions were calling `runtime.Caller(1)` for every invocation, even when trace logging was disabled. This was causing ~497ns overhead per call. + +**Solution**: Added early level checks before the expensive `runtime.Caller()` operation: + +```go +func (d *DLog) Trace(args ...interface{}) string { + // Early check to avoid expensive runtime.Caller when trace is disabled + if d.maxLevel < Trace { + return "" + } + // ... rest of function +} +``` + +### 2. Improved Buffering in Turbo Mode (`/internal/server/handlers/turbo_writer.go`) + +**Problem**: Turbo mode was forcing immediate flush after every line in serverless mode, defeating the purpose of buffering. + +**Solution**: Removed the immediate flush condition for serverless mode, allowing proper buffering: + +```go +// Changed from: +if w.writeBuf.Len() >= w.bufSize || w.serverless { + return w.flushBuffer() +} + +// To: +if w.writeBuf.Len() >= w.bufSize { + return w.flushBuffer() +} +``` + +## Performance Results + +### Before Optimization +- Turbo mode was **3-5x slower** than non-turbo mode +- DCat (10MB): 678ms (turbo) vs 210ms (non-turbo) +- DGrep (10MB): 570ms (turbo) vs 96ms (non-turbo) + +### After Optimization +- Turbo mode is now **2.87x faster** than non-turbo mode +- DCat (1M lines): 0.66s (turbo) vs 1.89s (non-turbo) - **65% improvement** +- DCat with colors: 1.69s (turbo) vs 2.24s (non-turbo) - **24% improvement** + +## Verification +- ā
All unit tests pass +- ā
All integration tests pass +- ā
No functionality regression +- ā
Backward compatible + +## Files Modified +1. `/internal/io/dlog/dlog.go` - Lines 196-216 +2. `/internal/server/handlers/turbo_writer.go` - Lines 104-108 + +## Key Takeaway +The trace logging overhead was the primary bottleneck, causing DTail to spend more time logging than processing data. By adding simple level checks before expensive operations, we achieved a ~3x performance improvement in turbo mode.
\ No newline at end of file diff --git a/doc/profiling.md b/doc/profiling.md new file mode 100644 index 0000000..7925fb3 --- /dev/null +++ b/doc/profiling.md @@ -0,0 +1,376 @@ +# DTail Profiling Framework + +This document describes the profiling framework for dtail commands (dcat, dgrep, dmap) to analyze CPU usage and memory allocations. + +## Overview + +The profiling framework provides: +- CPU profiling to identify performance bottlenecks +- Memory profiling to track allocations and detect leaks +- Integration with existing benchmarks +- Analysis tools for profile interpretation + +## Quick Start + +### 1. Build the Tools + +```bash +make build # Builds all tools including dprofile +``` + +### 2. Run Commands with Profiling + +Each command now supports profiling flags: + +```bash +# Profile dcat +./dcat -profile -profiledir profiles -plain -cfg none /path/to/file.log + +# Profile dgrep with specific profiling types +./dgrep -cpuprofile -memprofile -profiledir profiles -regex "error" /path/to/file.log + +# Profile dmap +./dmap -profile -query "select count(*) from data.csv" +``` + +### 3. Analyze Profiles + +Use dtail-tools for quick analysis: + +```bash +# List all profiles +./dtail-tools profile -mode list + +# Analyze a specific profile +./dtail-tools profile -mode analyze profiles/dcat_cpu_20240101_120000.prof + +# Open web browser with flame graph +./dtail-tools profile -mode analyze profiles/dcat_cpu_*.prof -web + +# You can also use go tool pprof directly: +go tool pprof profiles/dcat_cpu_20240101_120000.prof +``` + +## Profiling Options + +### Command-line Flags + +All dtail commands support these profiling flags: + +- `-cpuprofile`: Enable CPU profiling only +- `-memprofile`: Enable memory profiling only +- `-profile`: Enable both CPU and memory profiling +- `-profiledir <dir>`: Directory to store profiles (default: "profiles") + +### Profile Types + +1. **CPU Profile** (`*_cpu_*.prof`) + - Samples CPU usage during execution + - Identifies hot functions and code paths + - Useful for optimizing computational bottlenecks + +2. **Memory Profile** (`*_mem_*.prof`) + - Captures heap allocations at end of execution + - Shows memory usage by function + - Helps identify memory leaks + +3. **Allocation Profile** (`*_alloc_*.prof`) + - Tracks all allocations during execution + - More detailed than memory profile + - Useful for reducing allocation pressure + +## Using with Benchmarks + +### Automated Profiling + +Run profiling using dtail-tools: + +```bash +# Quick profiling with small datasets +./dtail-tools profile -mode quick + +# Full profiling suite +./dtail-tools profile -mode full + +# Profile dmap specifically (with MapReduce format) +./dtail-tools profile -mode dmap +``` + +This tool: +- Generates test data of various sizes +- Profiles dcat, dgrep, and dmap with different workloads +- Stores profiles in the `profiles` directory +- Provides immediate analysis of results + +### Using Make Targets + +```bash +# Quick profiling with immediate results +make profile-quick + +# Full profiling suite +make profile-all + +# Profile dmap specifically +make profile-dmap + +# List available profiles +make profile-list + +# Analyze a specific profile +make profile-analyze PROFILE=profiles/dcat_cpu_*.prof + +# Open web interface for profile +make profile-web PROFILE=profiles/dcat_cpu_*.prof +``` + +### Benchmark Integration + +Run profiling-enabled benchmarks: + +```bash +cd benchmarks +go test -bench="WithProfiling" -benchtime=1x +``` + +### Custom Profile Runner + +Use the profile runner in your benchmarks: + +```go +import "github.com/mimecast/dtail/benchmarks" + +func BenchmarkMyFeature(b *testing.B) { + benchmarks.ProfileBenchmark(b, "MyFeature", "dcat", + "--plain", "--cfg", "none", "testfile.log") +} +``` + +## Profile Analysis + +### Using go tool pprof + +For interactive analysis: + +```bash +# Interactive mode +go tool pprof profiles/dcat_cpu_*.prof + +# Common pprof commands: +# top - Show top functions +# list func - Show source code for function +# web - Generate SVG graph +# peek func - Show callers/callees of function +``` + +Generate visualizations: + +```bash +# Flame graph (requires graphviz) +go tool pprof -http=:8080 profiles/dcat_cpu_*.prof + +# Generate SVG +go tool pprof -svg profiles/dgrep_mem_*.prof > profile.svg + +# Generate text report +go tool pprof -text profiles/dmap_alloc_*.prof > report.txt +``` + +### Using dtail-tools profile + +The dtail-tools profile command provides quick summaries: + +```bash +# List all profiles +./dtail-tools profile -mode list + +# Analyze specific profile +./dtail-tools profile -mode analyze profiles/dcat_cpu_20240101_120000.prof + +# Get help +./dtail-tools profile -h +``` + +## Optimization Workflow + +1. **Baseline Performance** + ```bash + # Run benchmarks without profiling + cd benchmarks + go test -bench="BenchmarkDCat" -benchtime=10s + ``` + +2. **Profile Execution** + ```bash + # Run with profiling + ./dcat -profile -profiledir profiles large_file.log + ``` + +3. **Identify Bottlenecks** + ```bash + # Analyze CPU profile + ./dprofile -profile profiles/dcat_cpu_*.prof -top 10 + + # Check memory allocations + go tool pprof -alloc_space profiles/dcat_alloc_*.prof + ``` + +4. **Optimize Code** + - Focus on functions with high Flat% (direct CPU usage) + - Reduce allocations in hot paths + - Consider buffering and pooling + +5. **Verify Improvements** + ```bash + # Re-run benchmarks after optimization + go test -bench="BenchmarkDCat" -benchtime=10s + ``` + +## Common Performance Issues + +### CPU Bottlenecks + +Look for: +- Regex compilation in loops +- Excessive string operations +- Inefficient algorithms (O(n²) or worse) +- Unnecessary type conversions + +Example optimization: +```go +// Before: Regex compiled every time +for _, line := range lines { + if regexp.MustCompile(pattern).MatchString(line) { + // ... + } +} + +// After: Compile once +re := regexp.MustCompile(pattern) +for _, line := range lines { + if re.MatchString(line) { + // ... + } +} +``` + +### Memory Issues + +Common patterns: +- String concatenation in loops +- Large temporary slices +- Unclosed resources +- Excessive goroutines + +Example optimization: +```go +// Before: Many allocations +result := "" +for _, s := range strings { + result += s + "\n" +} + +// After: Single allocation +var buf strings.Builder +buf.Grow(estimatedSize) +for _, s := range strings { + buf.WriteString(s) + buf.WriteByte('\n') +} +result := buf.String() +``` + +## Tips and Best Practices + +1. **Profile Real Workloads** + - Use production-like data sizes + - Test with actual file formats + - Include network operations if relevant + +2. **Compare Profiles** + ```bash + # Compare before/after optimization + go tool pprof -diff_base=before.prof after.prof + ``` + +3. **Focus on Hot Paths** + - Optimize functions with >5% CPU usage first + - Small improvements in hot paths have big impact + +4. **Memory Profiling** + - Use `-alloc_space` for total allocations + - Use `-inuse_space` for current heap usage + - Check for growing heap over time + +5. **Benchmark Regularly** + - Add profiling to CI/CD pipeline + - Track performance over releases + - Set performance regression alerts + +## Troubleshooting + +### No profiles generated +- Check write permissions for profile directory +- Ensure command completes successfully +- Verify profiling flags are correct + +### Empty or small profiles +- Run command with larger workload +- Increase execution time +- Check if command exits too quickly + +### Analysis tools fail +- Ensure profile format is valid +- Check Go version compatibility +- Verify graphviz is installed for visualizations + +## Advanced Usage + +### Custom Profiling Points + +Add profiling snapshots in code: + +```go +import "github.com/mimecast/dtail/internal/profiling" + +func processLargeFile() { + profiler := profiling.GetProfiler() // Assumes global profiler + + // Take memory snapshot before processing + profiler.Snapshot("before_processing") + + // ... process file ... + + // Take snapshot after + profiler.Snapshot("after_processing") +} +``` + +### Continuous Profiling + +For long-running operations: + +```go +// Start periodic metrics logging +ticker := time.NewTicker(30 * time.Second) +go func() { + for range ticker.C { + profiler.LogMetrics("periodic") + } +}() +defer ticker.Stop() +``` + +## Contributing + +When adding new features: +1. Include benchmark tests +2. Run profiling before submitting PR +3. Document any performance implications +4. Add profiling examples for new commands + +## References + +- [Go Profiling Documentation](https://go.dev/blog/pprof) +- [pprof Tool Guide](https://github.com/google/pprof) +- [Go Performance Tips](https://go.dev/wiki/Performance)
\ No newline at end of file diff --git a/doc/refactoring_guide.md b/doc/refactoring_guide.md new file mode 100644 index 0000000..88c77d5 --- /dev/null +++ b/doc/refactoring_guide.md @@ -0,0 +1,240 @@ +# Integration Tests Refactoring Guide + +## Overview + +This guide outlines the refactoring opportunities for the dtail integration tests to reduce code duplication and improve maintainability. + +## Key Benefits of Refactoring + +1. **Reduced Code Duplication**: ~40-50% reduction in test code +2. **Improved Maintainability**: Changes to common patterns only need to be made in one place +3. **Better Test Hygiene**: Automatic cleanup using `t.Cleanup()` +4. **Clearer Test Intent**: Helper functions make tests more readable +5. **Reduced Copy-Paste Errors**: Less boilerplate to copy incorrectly + +## Common Patterns Identified + +### 1. Test Skip Pattern +**Before:** +```go +if !config.Env("DTAIL_INTEGRATION_TEST_RUN_MODE") { + t.Log("Skipping") + return +} +``` + +**After:** +```go +skipIfNotIntegrationTest(t) +``` + +### 2. Server Setup Pattern +**Before:** +```go +port := getUniquePortNumber() +bindAddress := "localhost" +ctx, cancel := context.WithCancel(context.Background()) +defer cancel() + +_, _, _, err := startCommand(ctx, t, + "", "../dserver", + "--cfg", "none", + "--logger", "stdout", + "--logLevel", "error", + "--bindAddress", bindAddress, + "--port", fmt.Sprintf("%d", port), +) +if err != nil { + t.Error(err) + return +} +time.Sleep(500 * time.Millisecond) +``` + +**After:** +```go +server := NewTestServer(t) +if err := server.Start("error"); err != nil { + t.Error(err) + return +} +``` + +### 3. File Cleanup Pattern +**Before:** +```go +defer os.Remove(outFile) +defer os.Remove(csvFile) +defer os.Remove(queryFile) +``` + +**After:** +```go +cleanupFiles(t, outFile, csvFile, queryFile) +// or +fileSet := &TestFileSet{...} +fileSet.Cleanup(t) +``` + +### 4. Command Arguments Pattern +**Before:** +```go +args := []string{ + "--plain", "--cfg", "none", + "--servers", fmt.Sprintf("%s:%d", bindAddress, port), + "--trustAllHosts", "--noColor", + "--files", inFile, +} +``` + +**After:** +```go +args := NewCommandArgs() +args.Plain = true +args.Servers = []string{server.Address()} +args.TrustAllHosts = true +args.NoColor = true +args.Files = []string{inFile} +// args.ToSlice() produces the string array +``` + +### 5. Dual Mode Testing Pattern +**Before:** +```go +func TestX(t *testing.T) { + if !config.Env("DTAIL_INTEGRATION_TEST_RUN_MODE") { + t.Log("Skipping") + return + } + + t.Run("Serverless", func(t *testing.T) { + testXServerless(t) + }) + + t.Run("ServerMode", func(t *testing.T) { + testXWithServer(t) + }) +} +``` + +**After:** +```go +func TestX(t *testing.T) { + runDualModeTest(t, DualModeTest{ + Name: "TestX", + ServerlessTest: testXServerless, + ServerTest: testXWithServer, + }) +} +``` + +## Refactoring Strategy + +### Phase 1: Add Helper Functions +1. Add `testhelpers.go` with all common utilities +2. Ensure all tests still pass + +### Phase 2: Refactor Test by Test +1. Start with simpler tests (e.g., dcat_test.go) +2. Refactor one test function at a time +3. Run tests after each refactoring +4. Commit after each file is complete + +### Phase 3: Additional Improvements +1. Add table-driven tests where appropriate +2. Create test fixtures for common scenarios +3. Add more sophisticated helpers as patterns emerge + +## Example Refactoring Results + +### Before (TestDCat1WithServer): +```go +func testDCat1WithServer(t *testing.T, inFile string) error { + outFile := "dcat1.out" + port := getUniquePortNumber() + bindAddress := "localhost" + + ctx, cancel := context.WithCancel(context.Background()) + defer cancel() + + _, _, _, err := startCommand(ctx, t, + "", "../dserver", + "--cfg", "none", + "--logger", "stdout", + "--logLevel", "error", + "--bindAddress", bindAddress, + "--port", fmt.Sprintf("%d", port), + ) + if err != nil { + return err + } + + time.Sleep(500 * time.Millisecond) + + _, err = runCommand(ctx, t, outFile, + "../dcat", "--plain", "--cfg", "none", + "--servers", fmt.Sprintf("%s:%d", bindAddress, port), + "--files", inFile, + "--trustAllHosts", + "--noColor") + if err != nil { + return err + } + + cancel() + + if err := compareFiles(t, outFile, inFile); err != nil { + return err + } + + os.Remove(outFile) + return nil +} +``` + +### After: +```go +func testDCat1WithServer_Refactored(t *testing.T, inFile string) { + fileSet := &TestFileSet{ + InputFile: inFile, + OutputFile: "dcat1.out", + ExpectedFile: inFile, + } + fileSet.Cleanup(t) + + server := NewTestServer(t) + if err := server.Start("error"); err != nil { + t.Error(err) + return + } + + args := NewCommandArgs() + args.Plain = true + args.Servers = []string{server.Address()} + args.Files = []string{inFile} + args.TrustAllHosts = true + args.NoColor = true + + err := runCommandAndVerify(t, server.ctx, fileSet.OutputFile, fileSet.ExpectedFile, + "../dcat", args.ToSlice()...) + if err != nil { + t.Error(err) + } +} +``` + +## Metrics + +Based on the examples: +- **Lines of code reduction**: ~45% +- **Boilerplate elimination**: ~70% +- **Improved readability**: Subjective but significant +- **Error-prone patterns eliminated**: Port management, cleanup, context handling + +## Next Steps + +1. Review and approve the helper functions +2. Create a PR with `testhelpers.go` +3. Incrementally refactor tests in separate PRs +4. Document any new patterns that emerge +5. Consider creating a test generator for common scenarios
\ No newline at end of file diff --git a/doc/turboboost_optimization.md b/doc/turboboost_optimization.md new file mode 100644 index 0000000..f13943f --- /dev/null +++ b/doc/turboboost_optimization.md @@ -0,0 +1,118 @@ +# DTail Turbo Boost Optimization + +## Overview + +This document describes the turbo boost optimization feature that provides significant performance improvements for DTail operations by using channel-less processing and optimized I/O. + +## Problem Statement + +The original dgrep implementation used multiple channels in a pipeline: +- `rawLines chan *bytes.Buffer` (buffer: 100) - Raw lines read from file +- `lines chan *line.Line` (buffer: 100) - Filtered lines to send to client + +This created several performance issues: +1. Fixed channel buffer sizes causing blocking under high throughput +2. Context switching overhead between goroutines +3. Channel synchronization overhead +4. Memory allocations for channel operations + +## Solution + +The channel-less implementation replaces the channel pipeline with direct function calls using a `LineProcessor` interface. + +### Key Components + +1. **LineProcessor Interface** (`internal/io/line/processor.go`) + - Defines methods for processing lines without channels + - `ProcessLine()` - Handle a single line + - `Flush()` - Ensure buffered data is written + - `Close()` - Clean up resources + +2. **GrepLineProcessor** (`internal/server/handlers/lineprocessor.go`) + - Implements LineProcessor for grep operations + - Writes directly to the network connection + - Uses internal buffering for efficiency (64KB buffer) + - Thread-safe with mutex protection + +3. **Modified File Reading** (`internal/io/fs/readfile_processor.go`) + - `StartWithProcessor()` - Channel-less file reading + - Direct callbacks instead of channel sends + - Inline regex filtering without goroutines + +4. **Optimized File Reading** (`internal/io/fs/readfile_processor_optimized.go`) + - Uses buffered line reading instead of byte-by-byte + - Custom scanner with 256KB buffer + - Efficient handling of long lines + - Special optimization for tail mode + +### Feature Flags + +The implementation can be controlled via environment variables: +- `DTAIL_CHANNELLESS_GREP=yes` - Enable channel-less grep implementation +- `DTAIL_OPTIMIZED_READER=yes` - Use optimized buffered reader + +### Benefits + +1. **Reduced Latency**: No channel queuing delays +2. **Lower Memory Usage**: No channel buffers +3. **Better CPU Efficiency**: Fewer context switches +4. **Simpler Code Flow**: Direct processing without goroutine coordination +5. **Predictable Performance**: No channel blocking + +### Backward Compatibility + +- Original channel-based implementation remains available +- Same command-line interface +- Protocol compatibility maintained +- All integration tests pass unchanged + +### Performance Testing + +Use the provided script to compare performance: + +```bash +./test_channelless_performance.sh +``` + +This will test: +1. Original channel-based implementation +2. Channel-less implementation +3. Optimized channel-less implementation + +### Usage + +To use the channel-less implementation: + +```bash +# Enable channel-less grep +export DTAIL_CHANNELLESS_GREP=yes + +# Also enable optimized reader +export DTAIL_OPTIMIZED_READER=yes + +# Run dgrep normally +dgrep -regex "pattern" file.log +``` + +### Future Improvements + +1. Extend channel-less approach to other commands (dcat, dtail) +2. Add configurable buffer sizes +3. Implement zero-copy optimizations +4. Add performance metrics collection +5. Consider using io_uring on Linux for async I/O + +## Usage + +To enable turbo boost optimizations: + +```bash +export DTAIL_TURBOBOOST_ENABLE=yes +``` + +This enables: +- Channel-less implementation for grep and cat operations +- Optimized buffered I/O reader (256KB buffer) +- Buffer pooling to reduce memory allocations + +The turbo boost mode is designed to be extended to other DTail commands in the future. |
