# GitSyncer Architecture

## Overview

GitSyncer is a command-line tool that synchronizes Git repositories across multiple hosting platforms. Its modular architecture separates CLI handling, API clients, configuration management, and core synchronization logic.

## High-Level Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                         CLI Layer                             │
│                    (cmd/gitsyncer/main.go)                   │
└─────────────────────────┬───────────────────────────────────┘
                          │
┌─────────────────────────┴───────────────────────────────────┐
│                     CLI Handlers                             │
│              (internal/cli/handlers.go)                      │
│           (internal/cli/sync_handlers.go)                    │
└──────┬──────────────────┬──────────────────┬────────────────┘
       │                  │                  │
┌──────┴──────┐    ┌──────┴──────┐    ┌─────┴──────┐
│   Config    │    │  API Clients │    │    Sync    │
│  Manager    │    │              │    │   Engine   │
│(config.go)  │    │ - GitHub     │    │ (sync.go)  │
│             │    │ - Codeberg   │    │            │
└─────────────┘    └──────────────┘    └────────────┘
```

## Component Architecture

### 1. Entry Point (cmd/gitsyncer/main.go)

The `main` function is the application entry point. It:
- Parses command-line flags
- Routes to appropriate handlers based on flags
- Manages application lifecycle and exit codes

### 2. CLI Layer (internal/cli/)

The CLI layer is responsible for user interaction and consists of:

#### flags.go
- Defines all command-line flags
- Provides flag parsing logic
- Returns a structured `Flags` object
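
As a rough illustration of how flag parsing, a structured `Flags` value, and handler routing could fit together, here is a minimal sketch built on the standard `flag` package. The flag names (`-config`, `-version`, `-sync-all`), the struct fields, and the `run` helper are assumptions made for the example, not GitSyncer's actual flag set.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Flags is a hypothetical structured result of flag parsing.
// The field names are assumptions, not GitSyncer's real flags.
type Flags struct {
	ConfigPath  string
	ShowVersion bool
	SyncAll     bool
}

// parseFlags parses args (without the program name) into a Flags value.
func parseFlags(args []string) (*Flags, error) {
	fs := flag.NewFlagSet("gitsyncer", flag.ContinueOnError)
	f := &Flags{}
	fs.StringVar(&f.ConfigPath, "config", "", "path to the configuration file")
	fs.BoolVar(&f.ShowVersion, "version", false, "print version and exit")
	fs.BoolVar(&f.SyncAll, "sync-all", false, "sync all configured repositories")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return f, nil
}

// run routes to a (placeholder) handler and returns a process exit code.
func run(args []string) int {
	f, err := parseFlags(args)
	if err != nil {
		return 2 // usage error
	}
	switch {
	case f.ShowVersion:
		fmt.Println("gitsyncer (version placeholder)")
	case f.SyncAll:
		fmt.Println("would sync all repositories using", f.ConfigPath)
	default:
		fmt.Println("nothing to do")
	}
	return 0
}

func main() {
	os.Exit(run(os.Args[1:]))
}
```

Keeping `main` down to a single `os.Exit(run(...))` call makes the routing logic testable without spawning a process.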

#### handlers.go
- General command handlers (version, config, list operations)
- Configuration loading and validation
- Error presentation to users

#### sync_handlers.go
- Sync-specific operations
- Orchestrates API clients and sync engine
- Handles batch operations

### 3. Configuration Management (internal/config/)

- Loads JSON configuration files
- Validates configuration structure
- Provides helper methods for finding organizations
- Supports multiple configuration file locations
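
The loading, validation, and lookup steps might look roughly like the sketch below. The JSON schema (`organizations`, `host`, `name`) and the `FindOrganization` helper are hypothetical, chosen only to illustrate the pattern; the real configuration format may differ.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// Organization and Config use assumed field names for illustration.
type Organization struct {
	Host string `json:"host"`
	Name string `json:"name"`
}

type Config struct {
	Organizations []Organization `json:"organizations"`
}

// parseConfig decodes JSON and rejects structurally incomplete configs.
func parseConfig(data []byte) (*Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("parse config: %w", err)
	}
	if len(cfg.Organizations) == 0 {
		return nil, errors.New("config: at least one organization is required")
	}
	for i, org := range cfg.Organizations {
		if org.Host == "" || org.Name == "" {
			return nil, fmt.Errorf("config: organization %d needs host and name", i)
		}
	}
	return &cfg, nil
}

// FindOrganization returns the first organization on the given host
// (case-insensitive), or nil if there is none.
func (c *Config) FindOrganization(host string) *Organization {
	for i := range c.Organizations {
		if strings.EqualFold(c.Organizations[i].Host, host) {
			return &c.Organizations[i]
		}
	}
	return nil
}

func main() {
	raw := []byte(`{"organizations":[
		{"host":"github.com","name":"example"},
		{"host":"codeberg.org","name":"example"}]}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if org := cfg.FindOrganization("codeberg.org"); org != nil {
		fmt.Printf("%s/%s\n", org.Host, org.Name)
	}
}
```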

### 4. API Clients

#### GitHub Client (internal/github/)
- Authenticates using personal access tokens
- Creates repositories via GitHub API
- Lists public repositories with pagination
- Handles multiple token sources (config, env, file)

#### Codeberg Client (internal/codeberg/)
- Interacts with Codeberg's Gitea-compatible API
- Lists public repositories for users/organizations
- Supports pagination for large repository lists
- No authentication required for public operations
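
Both clients page through repository lists. A minimal sketch of such a loop follows; to stay runnable offline it uses an in-memory `fakeFetcher` in place of a real HTTP call, and the `pageFetcher` type and `listAll` helper are hypothetical names. The stop condition (a page shorter than `perPage`) is one common convention, assumed here rather than taken from the actual clients.

```go
package main

import "fmt"

// pageFetcher returns one page of repository names; pages start at 1.
// In a real client this would wrap an HTTP request to the hosting API.
type pageFetcher func(page, perPage int) ([]string, error)

// listAll requests pages until one comes back shorter than perPage.
func listAll(fetch pageFetcher, perPage int) ([]string, error) {
	if perPage < 1 {
		return nil, fmt.Errorf("perPage must be positive, got %d", perPage)
	}
	var all []string
	for page := 1; ; page++ {
		repos, err := fetch(page, perPage)
		if err != nil {
			return nil, fmt.Errorf("fetch page %d: %w", page, err)
		}
		all = append(all, repos...)
		if len(repos) < perPage {
			return all, nil // short (or empty) page: nothing left
		}
	}
}

// fakeFetcher serves pages from memory so the sketch runs offline.
func fakeFetcher(names []string) pageFetcher {
	return func(page, perPage int) ([]string, error) {
		start := (page - 1) * perPage
		if start >= len(names) {
			return nil, nil
		}
		end := start + perPage
		if end > len(names) {
			end = len(names)
		}
		return names[start:end], nil
	}
}

func main() {
	repos, err := listAll(fakeFetcher([]string{"a", "b", "c", "d", "e"}), 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(repos), "repositories:", repos)
}
```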

### 5. Sync Engine (internal/sync/)

The core synchronization logic is divided into several components:

#### sync.go - Main Orchestrator
- Coordinates the entire sync process
- Manages working directory
- Handles repository-level operations

#### repository_setup.go
- Clones repositories or ensures they exist
- Configures Git remotes
- Handles initial repository setup

#### branch_sync.go
- Manages branch-level synchronization
- Tracks which remotes have which branches
- Orchestrates merge and push operations
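
One simple way to track which remotes carry which branches is a map from branch name to remote names. The sketch below assumes refs arrive in the `remote/branch` form printed by `git branch -r` and that remote names contain no slash; `branchRemotes` is a hypothetical helper, not a function from the codebase.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// branchRemotes maps each branch name to the sorted list of remotes
// that have it. Input refs look like "origin/main" or "origin/feature/x".
func branchRemotes(refs []string) map[string][]string {
	out := make(map[string][]string)
	for _, ref := range refs {
		ref = strings.TrimSpace(ref)
		if strings.Contains(ref, "->") {
			continue // symbolic ref such as "origin/HEAD -> origin/main"
		}
		remote, branch, ok := strings.Cut(ref, "/")
		if !ok || remote == "" || branch == "" {
			continue // not in remote/branch form
		}
		out[branch] = append(out[branch], remote)
	}
	for _, remotes := range out {
		sort.Strings(remotes)
	}
	return out
}

func main() {
	m := branchRemotes([]string{
		"origin/HEAD -> origin/main",
		"origin/main",
		"codeberg/main",
		"origin/feature/x",
	})
	fmt.Println(m["main"], m["feature/x"])
}
```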

#### git_operations.go
- Low-level Git command wrappers
- Handles merge conflicts
- Manages stashing and checkout operations
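
A low-level Git wrapper typically shells out through `os/exec`. The sketch below shows one possible shape; `gitCommand` and `runGit` are hypothetical names, and running the example requires a `git` binary on `PATH`.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// gitCommand builds (but does not run) a git invocation for the
// repository located at dir.
func gitCommand(dir string, args ...string) *exec.Cmd {
	cmd := exec.Command("git", args...)
	cmd.Dir = dir
	return cmd
}

// runGit runs git and returns trimmed stdout. On failure, stderr is
// folded into the returned error so callers can show it to the user.
func runGit(dir string, args ...string) (string, error) {
	cmd := gitCommand(dir, args...)
	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("git %s: %w: %s",
			strings.Join(args, " "), err, strings.TrimSpace(stderr.String()))
	}
	return strings.TrimSpace(stdout.String()), nil
}

func main() {
	out, err := runGit(".", "--version")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Separating command construction from execution keeps the argument handling testable without touching a real repository.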

#### branch_filter.go
- Implements regex-based branch filtering
- Excludes branches based on patterns
- Provides filtering reports
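
Regex-based exclusion can be sketched with the standard `regexp` package. The `branchFilter` type and the example patterns below are assumptions for illustration; the excluded list it returns is the kind of data a filtering report could be built from.

```go
package main

import (
	"fmt"
	"regexp"
)

// branchFilter holds compiled exclusion patterns.
type branchFilter struct {
	patterns []*regexp.Regexp
}

// newBranchFilter compiles every pattern up front so that invalid
// expressions are reported before any sync work starts.
func newBranchFilter(exprs []string) (*branchFilter, error) {
	f := &branchFilter{}
	for _, e := range exprs {
		re, err := regexp.Compile(e)
		if err != nil {
			return nil, fmt.Errorf("invalid exclude pattern %q: %w", e, err)
		}
		f.patterns = append(f.patterns, re)
	}
	return f, nil
}

// excludes reports whether any pattern matches the branch name.
func (f *branchFilter) excludes(branch string) bool {
	for _, re := range f.patterns {
		if re.MatchString(branch) {
			return true
		}
	}
	return false
}

// apply splits branches into those to sync and those filtered out.
func (f *branchFilter) apply(branches []string) (kept, excluded []string) {
	for _, b := range branches {
		if f.excludes(b) {
			excluded = append(excluded, b)
		} else {
			kept = append(kept, b)
		}
	}
	return kept, excluded
}

func main() {
	f, err := newBranchFilter([]string{`^wip/`, `-tmp$`})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	kept, excluded := f.apply([]string{"main", "wip/x", "fix-tmp", "feature"})
	fmt.Println("kept:", kept, "excluded:", excluded)
}
```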

#### branch_analyzer.go
- Detects abandoned branches (6+ months inactive)
- Generates abandonment reports
- Analyzes branch activity
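
The six-month rule can be expressed as a cutoff date compared against each branch's last commit time. In this sketch the `branchActivity` type, `findAbandoned`, and the sample data are hypothetical, and "inactive" is assumed to mean "no commit since the cutoff".

```go
package main

import (
	"fmt"
	"time"
)

// branchActivity pairs a branch with the time of its latest commit.
type branchActivity struct {
	Name       string
	LastCommit time.Time
}

// findAbandoned returns branches whose last commit is more than six
// months before now.
func findAbandoned(branches []branchActivity, now time.Time) []string {
	cutoff := now.AddDate(0, -6, 0)
	var abandoned []string
	for _, b := range branches {
		if b.LastCommit.Before(cutoff) {
			abandoned = append(abandoned, b.Name)
		}
	}
	return abandoned
}

// exampleBranches returns fixed sample data and the reference time
// it was built against, so the output is deterministic.
func exampleBranches() ([]branchActivity, time.Time) {
	now := time.Date(2025, time.July, 1, 0, 0, 0, 0, time.UTC)
	return []branchActivity{
		{Name: "main", LastCommit: now.AddDate(0, 0, -3)},
		{Name: "old-feature", LastCommit: now.AddDate(0, -9, 0)},
		{Name: "recent-fix", LastCommit: now.AddDate(0, -5, 0)},
	}, now
}

func main() {
	branches, now := exampleBranches()
	for _, name := range findAbandoned(branches, now) {
		fmt.Println("abandoned:", name)
	}
}
```

Passing `now` in explicitly, rather than calling `time.Now()` inside the function, keeps the analysis reproducible.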

### 6. Version Management (internal/version/)

- Provides version information
- Supports build-time metadata injection
- Formats version strings for display
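
Build-time metadata injection in Go is usually done with the linker's `-X` flag, which overwrites package-level string variables. The variable names and output format below are assumptions; for a package such as `internal/version`, the `-X` argument would need that package's full import path instead of `main`.

```go
package main

import "fmt"

// Defaults used for plain `go build`. They can be overridden with, e.g.:
//
//	go build -ldflags "-X main.version=1.2.3 -X main.commit=abc1234"
var (
	version = "dev"
	commit  = "unknown"
)

// versionString formats the metadata for display.
func versionString() string {
	return fmt.Sprintf("gitsyncer %s (commit %s)", version, commit)
}

func main() {
	fmt.Println(versionString())
}
```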

## Data Flow

### Sync Operation Flow

1. **Configuration Loading**
   ```
   User → CLI → Config Loader → Config Validation
   ```

2. **Repository Discovery**
   ```
   Config → API Clients → Repository Lists → Filtering
   ```

3. **Synchronization Process**
   ```
   For each repository:
     └→ Setup/Clone Repository
     └→ Configure Remotes
     └→ Fetch All Remotes
     └→ Get All Branches
     └→ Filter Branches
     └→ For each branch:
         └→ Checkout/Create Branch
         └→ Merge from Remotes
         └→ Push to All Remotes
     └→ Analyze Abandoned Branches
     └→ Generate Reports
   ```
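
The per-repository steps above can be sketched as a loop over an abstract repository. The `gitRepo` interface, `syncRepo`, and the recording fake are hypothetical; the sketch also simplifies by merging from every remote, whereas a real implementation would first check which remotes actually have the branch.

```go
package main

import "fmt"

// gitRepo abstracts the Git operations the loop needs (assumed shape).
type gitRepo interface {
	FetchAll() error
	Branches() ([]string, error)
	Checkout(branch string) error
	MergeFrom(remote, branch string) error
	Push(remote, branch string) error
}

// syncRepo fetches, then for each kept branch checks it out, merges
// from every remote, and pushes the result back to every remote.
func syncRepo(repo gitRepo, remotes []string, keep func(string) bool) error {
	if err := repo.FetchAll(); err != nil {
		return fmt.Errorf("fetch: %w", err)
	}
	branches, err := repo.Branches()
	if err != nil {
		return fmt.Errorf("list branches: %w", err)
	}
	for _, b := range branches {
		if !keep(b) {
			continue
		}
		if err := repo.Checkout(b); err != nil {
			return fmt.Errorf("checkout %s: %w", b, err)
		}
		for _, r := range remotes {
			if err := repo.MergeFrom(r, b); err != nil {
				return fmt.Errorf("merge %s/%s: %w", r, b, err)
			}
		}
		for _, r := range remotes {
			if err := repo.Push(r, b); err != nil {
				return fmt.Errorf("push %s to %s: %w", b, r, err)
			}
		}
	}
	return nil
}

// recordingRepo is a fake that records calls instead of running git.
type recordingRepo struct {
	branches []string
	calls    []string
}

func (r *recordingRepo) FetchAll() error { r.calls = append(r.calls, "fetch --all"); return nil }
func (r *recordingRepo) Branches() ([]string, error) { return r.branches, nil }
func (r *recordingRepo) Checkout(b string) error {
	r.calls = append(r.calls, "checkout "+b)
	return nil
}
func (r *recordingRepo) MergeFrom(remote, b string) error {
	r.calls = append(r.calls, "merge "+remote+"/"+b)
	return nil
}
func (r *recordingRepo) Push(remote, b string) error {
	r.calls = append(r.calls, "push "+remote+" "+b)
	return nil
}

func main() {
	repo := &recordingRepo{branches: []string{"main", "wip/scratch"}}
	keep := func(b string) bool { return b != "wip/scratch" }
	if err := syncRepo(repo, []string{"github", "codeberg"}, keep); err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range repo.calls {
		fmt.Println(c)
	}
}
```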

## Design Principles

### 1. Modularity
Each package has a single, well-defined responsibility:
- CLI handling is separate from business logic
- API clients are independent and interchangeable
- Core sync logic is platform-agnostic

### 2. Configuration-Driven
- All behavior is controlled via configuration
- No hard-coded organization or repository names
- Flexible remote naming based on hosts

### 3. Error Handling
- Graceful degradation (missing repos don't stop sync)
- Clear error messages with actionable guidance
- Proper exit codes for scripting
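
Graceful degradation and exit codes could be combined along these lines. The sentinel `errRepoNotFound`, the `syncAll` helper, and the 0/1 exit code mapping are assumptions made for this sketch; `errors.Join` requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// errRepoNotFound marks a repository that is missing on a remote.
var errRepoNotFound = errors.New("repository not found")

// syncAll attempts every repository. Missing repositories are skipped,
// other failures are collected, and the loop never stops early.
func syncAll(repos []string, syncOne func(string) error) (skipped []string, failed error) {
	var errs []error
	for _, r := range repos {
		err := syncOne(r)
		switch {
		case err == nil:
			// synced successfully
		case errors.Is(err, errRepoNotFound):
			skipped = append(skipped, r)
		default:
			errs = append(errs, fmt.Errorf("%s: %w", r, err))
		}
	}
	return skipped, errors.Join(errs...) // nil when errs is empty
}

// exitCode maps the overall result to a process exit code.
func exitCode(err error) int {
	if err != nil {
		return 1
	}
	return 0
}

// demoSync is a stand-in for the real per-repository sync.
func demoSync(name string) error {
	switch name {
	case "missing":
		return errRepoNotFound
	case "broken":
		return errors.New("push rejected")
	}
	return nil
}

func main() {
	skipped, err := syncAll([]string{"ok", "missing", "broken"}, demoSync)
	fmt.Println("skipped:", skipped)
	if err != nil {
		fmt.Println("errors:", err)
	}
	fmt.Println("exit code:", exitCode(err))
}
```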

### 4. Extensibility
- New platforms can be added by implementing API clients
- Branch filtering is regex-based for flexibility
- Sync strategies can be extended
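
A platform abstraction might be as small as the interface below, with core logic written only against it. `Platform`, `memoryPlatform`, and `missingOn` are hypothetical names for this sketch; the in-memory implementation stands in for real GitHub or Codeberg clients.

```go
package main

import "fmt"

// Platform is an assumed minimal interface for a Git hosting service.
type Platform interface {
	Name() string
	ListRepositories(owner string) ([]string, error)
}

// memoryPlatform is a toy implementation backed by a map.
type memoryPlatform struct {
	name  string
	repos map[string][]string // owner -> repository names
}

func (p memoryPlatform) Name() string { return p.name }

func (p memoryPlatform) ListRepositories(owner string) ([]string, error) {
	return p.repos[owner], nil
}

// missingOn lists repositories that exist on src but not on dst.
// It depends only on the interface, so any platform pair works.
func missingOn(src, dst Platform, owner string) ([]string, error) {
	have, err := dst.ListRepositories(owner)
	if err != nil {
		return nil, fmt.Errorf("%s: %w", dst.Name(), err)
	}
	seen := make(map[string]bool, len(have))
	for _, r := range have {
		seen[r] = true
	}
	want, err := src.ListRepositories(owner)
	if err != nil {
		return nil, fmt.Errorf("%s: %w", src.Name(), err)
	}
	var missing []string
	for _, r := range want {
		if !seen[r] {
			missing = append(missing, r)
		}
	}
	return missing, nil
}

func main() {
	src := memoryPlatform{"codeberg", map[string][]string{"example": {"a", "b", "c"}}}
	dst := memoryPlatform{"github", map[string][]string{"example": {"b"}}}
	missing, err := missingOn(src, dst, "example")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("to create on", dst.Name()+":", missing)
}
```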

## Security Considerations

### Token Management
- GitHub tokens are never logged or displayed
- Multiple token sources for flexibility
- Tokens are loaded on-demand
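
On-demand token lookup across several sources could be sketched as follows. The precedence (config value, then environment variable, then file), the `resolveToken` helper, and the `GITHUB_TOKEN` variable name used in `main` are assumptions for the example. The token itself is never printed.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// resolveToken returns the first non-empty token from, in order:
// the config value, the named environment variable, the token file.
func resolveToken(configToken, envVar, filePath string) (string, error) {
	if t := strings.TrimSpace(configToken); t != "" {
		return t, nil
	}
	if envVar != "" {
		if t := strings.TrimSpace(os.Getenv(envVar)); t != "" {
			return t, nil
		}
	}
	if filePath != "" {
		data, err := os.ReadFile(filePath)
		switch {
		case err == nil:
			if t := strings.TrimSpace(string(data)); t != "" {
				return t, nil
			}
		case !errors.Is(err, os.ErrNotExist):
			// A missing file just means "try nothing else";
			// any other read error is worth reporting.
			return "", fmt.Errorf("read token file: %w", err)
		}
	}
	return "", errors.New("no token found in config, environment, or token file")
}

func main() {
	if _, err := resolveToken("", "GITHUB_TOKEN", ""); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("token loaded") // deliberately does not print the token
}
```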

### Git Operations
- All operations use standard Git commands
- No custom Git protocol implementation
- Respects Git's security model

## Performance Characteristics

### Scalability
- Handles multiple repositories in sequence
- Pagination support for large repository lists
- Efficient branch filtering

### Resource Usage
- Minimal memory footprint
- Disk usage proportional to repository sizes
- Network usage optimized with selective fetching

## Future Architecture Considerations

### Planned Enhancements
1. **Parallel Synchronization** - Sync multiple repos concurrently
2. **Webhook Support** - Trigger syncs on push events
3. **More Platforms** - GitLab, Bitbucket, Gitea support
4. **Conflict Resolution** - Automated conflict resolution strategies

### Extension Points
- Platform interface for new Git hosts
- Pluggable authentication mechanisms
- Custom sync strategies
- Hook system for pre/post sync actions