summaryrefslogtreecommitdiff
path: root/doc/pgo_commands_detail.md
blob: e874ba84b7240e3794fd273d28ef54f3590f2549 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
# PGO Command Execution Details

This document shows the exact commands executed during Profile-Guided Optimization (PGO) generation for DTail tools.

## Overview

When running `make pgo-generate` or `dtail-tools pgo`, the following commands are executed to generate performance profiles for each tool.

## Commands Executed

### 1. Building Baseline Binaries

```bash
go build -o pgo-build/dtail-baseline ./cmd/dtail
go build -o pgo-build/dcat-baseline ./cmd/dcat
go build -o pgo-build/dgrep-baseline ./cmd/dgrep
go build -o pgo-build/dmap-baseline ./cmd/dmap
go build -o pgo-build/dserver-baseline ./cmd/dserver
```

### 2. Profile Generation Commands

#### DTail Profile Generation
```bash
# Background log writer process
bash -c "for i in {1..200}; do 
  level=$((i % 4))
  case $level in 
    0) lvl=INFO;; 
    1) lvl=WARN;; 
    2) lvl=ERROR;; 
    3) lvl=DEBUG;; 
  esac
  echo \"[2025-07-04 15:00:00] $lvl - Test log line number $i with some additional text to process\" >> growing.log
  sleep 0.015
done"

# DTail command
pgo-build/dtail-baseline \
  -cfg none \
  -plain \
  -profile \
  -profiledir pgo-profiles/iter_dtail_TIMESTAMP \
  -regex "ERROR|WARN" \
  -shutdownAfter 3 \
  pgo-profiles/growing.log
```

#### DCat Profile Generation
```bash
pgo-build/dcat-baseline \
  -cfg none \
  -plain \
  -profile \
  -profiledir pgo-profiles/iter_dcat_TIMESTAMP \
  pgo-profiles/test.log
```

#### DGrep Profile Generation
```bash
pgo-build/dgrep-baseline \
  -cfg none \
  -plain \
  -profile \
  -profiledir pgo-profiles/iter_dgrep_TIMESTAMP \
  -regex "ERROR|WARN" \
  pgo-profiles/test.log
```

#### DMap Profile Generation
```bash
pgo-build/dmap-baseline \
  -cfg none \
  -plain \
  -profile \
  -profiledir pgo-profiles/iter_dmap_TIMESTAMP \
  -files pgo-profiles/test.csv \
  -query "select status, count(*) group by status"
```

#### DServer Profile Generation
```bash
# Start dserver with pprof endpoint
pgo-build/dserver-baseline \
  -cfg none \
  -pprof localhost:16060 \
  -port 12222

# Client commands to generate server load (run concurrently):
pgo-build/dcat-baseline \
  -cfg none \
  -server localhost:12222 \
  pgo-profiles/test.log

pgo-build/dgrep-baseline \
  -cfg none \
  -server localhost:12222 \
  -regex "ERROR|WARN" \
  pgo-profiles/test.log

pgo-build/dgrep-baseline \
  -cfg none \
  -server localhost:12222 \
  -regex "INFO.*action" \
  pgo-profiles/test.log

pgo-build/dmap-baseline \
  -cfg none \
  -server localhost:12222 \
  -files pgo-profiles/test.csv \
  -query "select status, count(*) group by status"

pgo-build/dmap-baseline \
  -cfg none \
  -server localhost:12222 \
  -files pgo-profiles/test.csv \
  -query "select department, avg(salary) group by department"

# Capture CPU profile via HTTP
curl http://localhost:16060/debug/pprof/profile?seconds=5 > dserver.pprof
```

### 3. Profile Merging

When multiple iterations are run, profiles are merged:

```bash
# Merge multiple profile iterations
go tool pprof -proto \
  pgo-profiles/dcat.pprof.0.pprof \
  pgo-profiles/dcat.pprof.1.pprof \
  > pgo-profiles/dcat.pprof
```

### 4. Building with PGO

```bash
# Build optimized binaries using profiles
go build -pgo=pgo-profiles/dcat.pprof -o pgo-build/dcat ./cmd/dcat
go build -pgo=pgo-profiles/dgrep.pprof -o pgo-build/dgrep ./cmd/dgrep
go build -pgo=pgo-profiles/dmap.pprof -o pgo-build/dmap ./cmd/dmap
go build -pgo=pgo-profiles/dtail.pprof -o pgo-build/dtail ./cmd/dtail
go build -pgo=pgo-profiles/dserver.pprof -o pgo-build/dserver ./cmd/dserver
```

### 5. Performance Comparison Commands

Quick benchmarks are run to compare baseline vs optimized:

```bash
# Baseline benchmark
pgo-build/dcat-baseline -cfg none -plain /tmp/pgo_bench.log

# Optimized benchmark
pgo-build/dcat -cfg none -plain /tmp/pgo_bench.log

# Similar commands for dgrep and dmap
pgo-build/dgrep-baseline -cfg none -plain -regex ERROR /tmp/pgo_bench.log
pgo-build/dmap-baseline -cfg none -plain -files /tmp/pgo_bench.csv -query "select count(*)"
```

## Test Data Generation

The PGO framework generates realistic test data:

### Log File (test.log)
- Contains timestamps, log levels (INFO, WARN, ERROR, DEBUG)
- Includes user actions, durations, and status information
- Default size: 1,000,000 lines (configurable with -datasize)

### CSV File (test.csv)
- Contains employee data with departments, salaries, status
- Used for MapReduce queries
- Default size: 100,000 rows (1/10 of log file size)

### Growing Log File (growing.log)
- Used specifically for dtail testing
- Simulates real-time log generation
- Writes ~200 lines over 3 seconds with mixed log levels

## Customization Options

### Adjust Test Data Size
```bash
dtail-tools pgo -datasize 5000000  # 5 million lines
```

### Run More Iterations
```bash
dtail-tools pgo -iterations 5  # Run 5 iterations per command
```

### Profile Specific Commands Only
```bash
dtail-tools pgo dcat dgrep  # Only optimize dcat and dgrep
```

### Verbose Output
```bash
dtail-tools pgo -v  # Show all command execution details
```

### Profile Generation Only
```bash
dtail-tools pgo -profileonly  # Skip building optimized binaries
```

## Notes

1. **Empty Profiles**: Some commands (like dtail) may generate empty profiles if they are I/O-bound. This is handled gracefully.

2. **DServer Profiling**: Uses HTTP pprof endpoint instead of command-line profiling to capture server-side performance data.

3. **Concurrent Execution**: Multiple client commands are run concurrently against dserver to generate realistic load patterns.

4. **Profile Quality**: The effectiveness of PGO depends on how well the test workload represents real-world usage patterns.