| author | Paul Buetow <paul@buetow.org> | 2025-07-03 17:58:06 +0300 |
|---|---|---|
| committer | Paul Buetow <paul@buetow.org> | 2025-07-03 17:58:06 +0300 |
| commit | 859be4593e4f7ef37ff2c91dc90f42e6930a3996 (patch) | |
| tree | a73597068c3e5f34017d4e348267f8051f3be614 /benchmark_dmap_final.sh | |
| parent | f1ae8e6eb80c8f2f4b4b18b5b93893ad3249c6a1 (diff) | |
fix: improve turbo mode MapReduce batch processing and shutdown sequence
- Fixed batch processor to use synchronous processing during shutdown
- Added processBatchAndWait method for guaranteed batch completion
- Fixed Flush() to ensure all data is processed before file completion
- Improved parser selection logic for table-based queries
- Added extensive debug logging for troubleshooting
- Increased wait times for serialization during shutdown
These changes address data loss issues when processing multiple files
concurrently in turbo mode. The batch processor now properly flushes
all remaining data when files complete and during shutdown.
Note: Integration tests are still failing due to SSH authentication issues
in the test environment, but the core turbo mode logic has been fixed.
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
Diffstat (limited to 'benchmark_dmap_final.sh')
| -rwxr-xr-x | benchmark_dmap_final.sh | 77 |
1 files changed, 77 insertions, 0 deletions
diff --git a/benchmark_dmap_final.sh b/benchmark_dmap_final.sh
new file mode 100755
index 0000000..45e6532
--- /dev/null
+++ b/benchmark_dmap_final.sh
@@ -0,0 +1,77 @@
+#!/bin/bash
+
+# Benchmark script for dmap turbo mode vs regular mode
+set -e
+
+echo "=== DTail dmap Benchmark: Regular vs Turbo Mode ==="
+echo "Setting up test environment..."
+
+# Kill any existing servers
+pkill -f "dserver.*port (2222|3333)" || true
+sleep 1
+
+# Create test data
+TEST_DATA="/tmp/dtail_benchmark_data.log"
+echo "Creating test data with 100,000 log lines..."
+> $TEST_DATA
+for i in {1..10000}; do
+    for server in server1 server2 server3 server4 server5 server6 server7 server8 server9 server10; do
+        echo "2023-12-27 10:00:00 $server component=TestApp level=INFO message=Test_$i goroutines=$((30 + $RANDOM % 20)) connections=$((100 + $RANDOM % 100)) requests=$((1000 + $RANDOM % 1000))" >> $TEST_DATA
+    done
+done
+
+# Start servers
+echo "Starting servers..."
+./dserver --cfg none --logLevel error --bindAddress localhost --port 2222 > /tmp/dserver_regular.log 2>&1 &
+DTAIL_TURBOBOOST_ENABLE=yes ./dserver --cfg none --logLevel error --bindAddress localhost --port 3333 > /tmp/dserver_turbo.log 2>&1 &
+sleep 2
+
+# Query to test
+QUERY='select count($server),$server,avg($goroutines),sum($connections),max($requests) from - group by $server order by count($server)'
+
+echo
+echo "Running benchmarks..."
+echo "Test data: 100,000 lines"
+echo "Query: Aggregating by server with multiple operations"
+echo
+
+# Regular mode benchmark
+echo "=== Regular Mode (port 2222) ==="
+time (
+    for i in {1..5}; do
+        ./dmap -servers localhost:2222 -files "$TEST_DATA" -query "$QUERY" -noColor -plain > /tmp/dmap_regular_$i.out 2>&1
+    done
+)
+REGULAR_LINES=$(wc -l < /tmp/dmap_regular_1.out)
+echo "Output lines: $REGULAR_LINES"
+echo "Sample output:"
+head -3 /tmp/dmap_regular_1.out
+
+echo
+echo "=== Turbo Mode (port 3333) ==="
+time (
+    for i in {1..5}; do
+        ./dmap -servers localhost:3333 -files "$TEST_DATA" -query "$QUERY" -noColor -plain > /tmp/dmap_turbo_$i.out 2>&1
+    done
+)
+TURBO_LINES=$(wc -l < /tmp/dmap_turbo_1.out)
+echo "Output lines: $TURBO_LINES"
+echo "Sample output:"
+head -3 /tmp/dmap_turbo_1.out
+
+# Verify outputs match
+echo
+echo "=== Verification ==="
+if diff /tmp/dmap_regular_1.out /tmp/dmap_turbo_1.out > /dev/null; then
+    echo "✓ Outputs match!"
+else
+    echo "✗ Outputs differ!"
+    echo "Differences:"
+    diff /tmp/dmap_regular_1.out /tmp/dmap_turbo_1.out | head -10
+fi
+
+# Cleanup
+pkill -f "dserver.*port (2222|3333)" || true
+
+echo
+echo "Benchmark complete!"
\ No newline at end of file
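The verification step in the script compares the first regular and turbo output files byte for byte. Every server in the generated test data contributes the same number of lines (10,000), so `order by count($server)` sorts rows that all tie. Whether dmap orders tied rows deterministically is not stated in this commit, so a plain `diff` could, under that assumption, report a mismatch even when both modes computed the same aggregates. A minimal sketch of an order-insensitive comparison follows; the function name `outputs_match_ignoring_order` is hypothetical and not part of the commit.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): succeed when two result
# files contain the same lines, regardless of the order of those lines.
outputs_match_ignoring_order() {
    # Refuse to compare files that cannot be read, so that two missing
    # files are never reported as a match.
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        return 2
    fi
    # LC_ALL=C makes the sort order byte-based and locale independent.
    [ "$(LC_ALL=C sort "$1")" = "$(LC_ALL=C sort "$2")" ]
}

# Usage with the file names from the benchmark script:
#   if outputs_match_ignoring_order /tmp/dmap_regular_1.out /tmp/dmap_turbo_1.out; then
#       echo "Outputs match (ignoring row order)"
#   fi
```

This only relaxes row order. Differences in the aggregated values themselves, which is the kind of data loss the commit message describes, would still be reported as a mismatch.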
