diff options
| author | Paul Buetow <paul@buetow.org> | 2025-06-25 23:10:24 +0300 |
|---|---|---|
| committer | Paul Buetow <paul@buetow.org> | 2025-06-25 23:10:24 +0300 |
| commit | 41ec9cf2942edc7be58d78e49a050131bb2faf8c (patch) | |
| tree | a3f9dbd423c120f76e629f06524381476e948e9a /benchmarks/README.md | |
| parent | 281360144171c98641f50e938c439915c9b2580a (diff) | |
Add comprehensive benchmarking framework for DTail
- Create benchmark framework to measure performance of dcat, dgrep, and dmap
- Generate test files of 10MB, 100MB, and 1GB with configurable patterns
- Support benchmarking with gzip and zstd compressed files
- Implement tool-specific benchmarks:
* DCat: Simple reading, multiple files, compressed files
* DGrep: Pattern matching, regex complexity, context lines, inverted grep
* DMap: Aggregations, group by operations, complex queries, time intervals
- Track performance metrics: throughput (MB/sec), lines/sec, memory usage
- Save results in multiple formats: JSON, CSV, and Markdown reports
- Add Makefile targets: benchmark, benchmark-quick, benchmark-full
- Support environment variables for configuration (sizes, timeouts, etc.)
- Automatically clean up temporary .tmp files after benchmarks
The framework provides consistent performance testing across the DTail toolset
and enables tracking performance regressions between commits.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Diffstat (limited to 'benchmarks/README.md')
| -rw-r--r-- | benchmarks/README.md | 140 |
1 files changed, 140 insertions, 0 deletions
diff --git a/benchmarks/README.md b/benchmarks/README.md new file mode 100644 index 0000000..0b030d4 --- /dev/null +++ b/benchmarks/README.md @@ -0,0 +1,140 @@ +# DTail Benchmarks + +This directory contains comprehensive benchmarks for the DTail toolset (dcat, dgrep, dmap). + +## Overview + +The benchmarking framework tests performance across: +- Different file sizes (10MB, 100MB, 1GB) +- Various compression formats (none, gzip, zstd) +- Different query patterns and complexities +- Server mode vs serverless operation + +## Prerequisites + +Before running benchmarks, ensure all DTail binaries are built: + +```bash +cd .. +make build +``` + +## Running Benchmarks + +### Quick Benchmarks (Small Files Only) +```bash +go test -bench=BenchmarkQuick ./benchmarks +``` + +### All Benchmarks +```bash +go test -bench=. ./benchmarks +``` + +### Specific Tool Benchmarks +```bash +# DCat benchmarks only +go test -bench=BenchmarkDCat ./benchmarks + +# DGrep benchmarks only +go test -bench=BenchmarkDGrep ./benchmarks + +# DMap benchmarks only +go test -bench=BenchmarkDMap ./benchmarks +``` + +### With Memory Profiling +```bash +go test -bench=. -benchmem ./benchmarks +``` + +### Custom Configuration +```bash +# Run with specific file sizes +DTAIL_BENCH_SIZES=small,medium go test -bench=. ./benchmarks + +# Keep temporary files for inspection +DTAIL_BENCH_KEEP_FILES=true go test -bench=. ./benchmarks + +# Set custom timeout +DTAIL_BENCH_TIMEOUT=30m go test -bench=. ./benchmarks +``` + +## Benchmark Categories + +### DCat Benchmarks +- **Simple**: Sequential file reading +- **Multiple Files**: Reading 10-100 files concurrently +- **Compressed**: Performance with gzip/zstd compression +- **Server Mode**: Client-server performance comparison + +### DGrep Benchmarks +- **Simple Pattern**: Basic string matching with varying hit rates +- **Regex Pattern**: Complex regex performance +- **Context Lines**: Impact of --before/--after flags +- **Inverted**: Performance of --invert grep +- **Compressed**: Grep on compressed files + +### DMap Benchmarks +- **Simple Aggregation**: Basic count, sum, avg operations +- **Group By Cardinality**: Performance with different group sizes +- **Complex Queries**: WHERE clauses and multiple conditions +- **Time Intervals**: Time-based grouping performance +- **Custom Functions**: Performance of maskdigits, md5sum, etc. + +## Output + +Benchmark results are saved in multiple formats: +- `benchmark_results/results_TIMESTAMP.json` - Machine-readable JSON +- `benchmark_results/results_TIMESTAMP.csv` - Spreadsheet-compatible CSV +- `benchmark_results/results_TIMESTAMP.md` - Human-readable Markdown report +- `benchmark_results/latest.json` - Most recent results for easy access + +## Interpreting Results + +Key metrics: +- **MB/sec**: Throughput in megabytes per second +- **lines/sec**: Lines processed per second +- **compression_ratio**: For compressed file benchmarks +- **matched_lines**: For grep benchmarks +- **approx_groups**: For MapReduce group by operations + +## Performance Tuning + +For accurate benchmarks: +1. Run on isolated hardware +2. Disable CPU frequency scaling +3. Close unnecessary applications +4. Run multiple times and average results + +## Continuous Integration + +The benchmarks can be integrated into CI/CD pipelines: + +```yaml +# Example GitHub Actions workflow +- name: Run Benchmarks + run: | + make build + go test -bench=BenchmarkQuick ./benchmarks +``` + +## Troubleshooting + +### "Command not found" errors +Ensure DTail binaries are built: `make build` + +### Disk space issues +Benchmarks create large temporary files. Ensure sufficient disk space (>2GB). + +### Timeout errors +Increase timeout: `DTAIL_BENCH_TIMEOUT=60m go test -bench=. ./benchmarks` + +## Contributing + +When adding new benchmarks: +1. Follow existing naming conventions +2. Include warmup runs +3. Report relevant metrics +4. Clean up temporary files +5. Document in this README
\ No newline at end of file |
