diff options
| author | Paul Buetow <paul@buetow.org> | 2025-07-04 15:35:16 +0300 |
|---|---|---|
| committer | Paul Buetow <paul@buetow.org> | 2025-07-04 15:35:16 +0300 |
| commit | d37f32deb6cd6a575cc169adf1a1c1fba44e53d9 (patch) | |
| tree | aaf5f6abc90066892a6a23cb619969ddd4ef5574 /benchmarks/baselines/optimization_progression_summary.md | |
| parent | 1249f9ec51b1355ca17f73244dcbe0acc5556516 (diff) | |
feat: add Profile-Guided Optimization (PGO) support
- Add comprehensive PGO module in internal/tools/pgo/
- Integrate PGO into dtail-tools command with full CLI support
- Add Makefile targets for PGO workflow:
- make pgo: Full PGO workflow
- make pgo-quick: Quick PGO with smaller datasets
- make pgo-generate: Generate profiles only
- make build-pgo: Build with existing profiles
- make install-pgo: Install optimized binaries
- Add convenience functions to data generator for PGO
- Document PGO workflow in CLAUDE.md
Performance improvements observed:
- DCat: 3.8-7.0% additional improvement over turbo mode
- DGrep: Up to 19% improvement for low hit rates
- DMap: Variable impact, up to 64% for min_max on large files
Benchmarks show total performance gains (pre-turbo → turbo+PGO):
- DCat: 14-21x faster
- DGrep: 9-15x faster
- DMap: 9-29% faster
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Diffstat (limited to 'benchmarks/baselines/optimization_progression_summary.md')
| -rw-r--r-- | benchmarks/baselines/optimization_progression_summary.md | 105 |
1 files changed, 105 insertions, 0 deletions
diff --git a/benchmarks/baselines/optimization_progression_summary.md b/benchmarks/baselines/optimization_progression_summary.md new file mode 100644 index 0000000..c8174f6 --- /dev/null +++ b/benchmarks/baselines/optimization_progression_summary.md @@ -0,0 +1,105 @@ +# DTail Performance Optimization Progression Summary + +Generated: 2025-07-04 + +This document summarizes the performance improvements achieved through successive optimizations: +1. **Pre-Turbo Mode** (v4.3.0 baseline) +2. **Turbo Mode** (enabled by default) +3. **Turbo Mode + PGO** (Profile-Guided Optimization) + +## Executive Summary + +The optimization journey shows dramatic performance improvements, with turbo mode providing the most significant gains (up to 21x for DCat, 15x for DGrep), while PGO added incremental improvements for specific workloads. + +## DCat Performance (File Reading) + +| File Size | Pre-Turbo | Turbo Mode | Turbo + PGO | Total Improvement | +|-----------|-----------|------------|-------------|-------------------| +| **10MB** | 17.77 MB/s | 242.8 MB/s | 259.9 MB/s | **14.6x faster** | +| **100MB** | 20.66 MB/s | 318.8 MB/s | 339.2 MB/s | **16.4x faster** | +| **1GB** | 15.66 MB/s | 320.3 MB/s | 330.4 MB/s | **21.1x faster** | + +### DCat Key Insights: +- Turbo mode delivers 13.7x to 20.4x speedup +- PGO adds additional 3.8% to 7.0% improvement +- Larger files benefit more from optimizations + +## DGrep Performance (Pattern Search) + +### 10MB Files +| Hit Rate | Pre-Turbo | Turbo Mode | Turbo + PGO | Total Improvement | +|----------|-----------|------------|-------------|-------------------| +| **1%** | 30.70 MB/s | 389.5 MB/s | 417.9 MB/s | **13.6x faster** | +| **10%** | 36.61 MB/s | 308.2 MB/s | 324.0 MB/s | **8.9x faster** | +| **50%** | 24.93 MB/s | 281.2 MB/s | 285.3 MB/s | **11.4x faster** | +| **90%** | 17.24 MB/s | 247.8 MB/s | 265.6 MB/s | **15.4x faster** | + +### 100MB Files (1% hit rate) +| Metric | Pre-Turbo | Turbo Mode* | Turbo + PGO | Total Improvement | +|--------|-----------|-------------|-------------|-------------------| +| MB/s | 37.71 | ~390 (est) | 493.5 | **13.1x faster** | +| Lines/sec | 663,620 | ~6,900,000 | 8,685,054 | **13.1x faster** | + +*Estimated based on 10MB performance scaling + +### DGrep Key Insights: +- Turbo mode delivers 8.4x to 14.4x speedup across different hit rates +- PGO adds 5-10% improvement for low hit rates +- Performance scales well with file size + +## DMap Performance (MapReduce Queries) + +### 10MB Files +| Query Type | Pre-Turbo | Turbo Mode* | Turbo + PGO | Total Improvement | +|------------|-----------|-------------|-------------|-------------------| +| **count** | 14.12 MB/s | ~21.7 MB/s | 15.45 MB/s | **9.4% faster** | +| **sum_avg** | 13.30 MB/s | ~21.0 MB/s | 17.05 MB/s | **28.2% faster** | +| **min_max** | 17.77 MB/s | ~21.8 MB/s | 21.08 MB/s | **18.6% faster** | +| **multi** | 16.57 MB/s | ~21.3 MB/s | 21.34 MB/s | **28.8% faster** | + +*Estimated from benchmark comparison data + +### 1GB Files (notable results) +| Query Type | Turbo Mode | Turbo + PGO | PGO Impact | +|------------|------------|-------------|------------| +| **min_max** | 28.67 MB/s | 47.21 MB/s | **+64.7%** | +| **multi** | 43.08 MB/s | 44.06 MB/s | **+2.3%** | + +### DMap Key Insights: +- Modest overall improvements compared to DCat/DGrep +- Turbo mode impact limited due to CPU-bound nature of MapReduce +- PGO shows mixed results, excellent for min_max on large files +- Total improvements range from 9% to 29% + +## Optimization Impact Summary + +### By Operation Type: +1. **I/O-Bound Operations (DCat)**: Massive 14-21x improvement +2. **Mixed I/O/CPU Operations (DGrep)**: Substantial 9-15x improvement +3. **CPU-Bound Operations (DMap)**: Modest 9-29% improvement + +### By Optimization Stage: +1. **Turbo Mode**: Game-changing impact + - DCat: 13.7x to 20.4x speedup + - DGrep: 8.4x to 14.4x speedup + - DMap: ~25-55% speedup + +2. **PGO (Profile-Guided Optimization)**: Incremental refinements + - DCat: Additional 3.8-7.0% improvement + - DGrep: 5-10% for low hit rates, mixed for high hit rates + - DMap: Variable (-28% to +65%), workload-dependent + +## Recommendations + +1. **Turbo mode should remain enabled by default** - provides dramatic performance improvements +2. **PGO benefits are workload-specific** - consider custom PGO profiles for specific use cases +3. **MapReduce operations** may benefit from algorithm-level optimizations rather than compiler optimizations +4. **For maximum performance**: Use turbo mode + PGO for DCat/DGrep operations with sparse matches + +## Technical Details + +- **Pre-Turbo baseline**: v4.3.0 (baseline_20250626_103142_v4.3.0.txt) +- **Turbo mode baseline**: baseline_20250704_130702_turbo-enabled.txt +- **Turbo + PGO baseline**: baseline_20250704_133941_post-pgo-optimized.txt +- **CPU**: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz +- **Platform**: Linux
\ No newline at end of file |
