# Serverless Mode Large File Issue

## Summary
The serverless mode deadlock has only been partially resolved: files up to 10KB now work, but files of 100KB and larger still time out in serverless mode. The exact threshold lies somewhere between those two sizes.

## Current Status
- ✅ Files up to 10KB work correctly
- ❌ Files of 100KB and larger time out
- ❌ The 72MB test_data.log used in profiling examples still hangs

## Technical Details
The current fix uses a channel-based approach to prevent deadlocks:
- Separate goroutines for reading from client/server handlers
- Buffered channels (100 slots) for data transfer
- 32KB buffer size for read operations

However, this approach still has limitations with larger files, possibly due to:
1. Channel buffer exhaustion
2. Synchronization issues between read/write operations
3. EOF handling complexities
4. Memory pressure from buffering large amounts of data

## Workaround
For profiling large files, avoid serverless mode by specifying a dummy server:
```bash
./dcat -profile -profiledir profiles -plain -cfg none -servers dummy test_data.log
```

## Proposed Solutions

### Short-term
1. Increase channel buffer sizes dynamically based on file size
2. Implement backpressure handling
3. Add proper flow control between readers and writers

### Long-term
1. Redesign serverless mode to avoid bidirectional copying
2. Implement a proper streaming architecture
3. Consider using io.Pipe with proper goroutine management
4. Add file size detection and automatic mode switching

## Testing
Use the test_serverless.go script to verify fixes:
```go
// Test different file sizes
sizes := []struct {
    name string
    size int // bytes
}{
    {"tiny", 100},        // ✅ Works
    {"small", 1024},      // ✅ Works
    {"medium", 10240},    // ✅ Works
    {"large", 102400},    // ❌ Times out
    {"xlarge", 1048576},  // ❌ Times out
}
```

## Impact
- Profiling benchmarks work for test files up to 10KB
- Large file profiling requires non-serverless mode
- Integration tests may need adjustment if they use large files in serverless mode