| author | Paul Buetow <paul@buetow.org> | 2026-01-06 23:25:39 +0200 |
|---|---|---|
| committer | Paul Buetow <paul@buetow.org> | 2026-01-06 23:25:39 +0200 |
| commit | 95b08c4015157a0035a3a39cea3fca8713a3417e | |
| tree | bae63d7867e25c522d80437bba547d80d82fa7d4 | |
| parent | 58e1d6fa34364ecec2d54525992280933a94f758 | |
Refactor AGENT.md to focus on infrastructure knowledge

Removed troubleshooting narrative and restructured to document the
system architecture, configuration patterns, and operational knowledge.

Now covers:
- Architecture overview and component responsibilities
- Configuration array roles (@acme_hosts, @f3s_hosts, @prefixes)
- Template processing and variable scoping
- Routing configuration logic
- TLS certificate management in multi-server deployments
- Server block patterns and duplicate prevention
- Server-specific vs. shared host configuration
- Deployment process and testing procedures
- Monitoring system (Gogios) behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

| -rw-r--r-- | frontends/AGENT.md | 262 |
1 files changed, 157 insertions, 105 deletions
diff --git a/frontends/AGENT.md b/frontends/AGENT.md
index 19b17ce..6f133b2 100644
--- a/frontends/AGENT.md
+++ b/frontends/AGENT.md
@@ -1,18 +1,6 @@
-# Agent Learning Notes: Debugging 404 Errors for blowfish/fishfinger URLs
+# Frontend Infrastructure Knowledge
 
-## Problem Summary
-
-URLs `https://blowfish.buetow.org/index.txt` and `https://fishfinger.buetow.org/index.txt` were returning 404 errors instead of serving the health check files.
-
-## Root Cause
-
-The hostnames `blowfish.buetow.org` and `fishfinger.buetow.org` were missing from the `@acme_hosts` array in the Rexfile. This caused:
-
-1. **No explicit routing rules in relayd**: Only hosts in `@acme_hosts` get explicit routing to `<localhost>` (httpd) in `relayd.conf.tpl:45-50`
-2. **Fall-through to f3s backends**: Without routing rules, requests fell through to the default f3s cluster backends
-3. **404 from f3s cluster**: The k3s cluster didn't know about these server hostnames, resulting in 404 errors
-
-## Architecture Understanding
+## Architecture Overview
 
 ### Request Flow
 ```
@@ -21,140 +9,204 @@ Internet → relayd (port 443) → routing decision → httpd (port 8080) or f3s
 
 ### Key Components
 
-1. **relayd.conf.tpl**: Reverse proxy that:
-   - Terminates TLS on port 443
-   - Routes requests based on Host header matching
-   - Has two backend pools: `<localhost>` (httpd) and `<f3s>` (k3s cluster)
-   - Falls back to f3s cluster when no explicit routing match
+**relayd** - Reverse proxy that:
+- Terminates TLS on port 443 (IPv4 and IPv6)
+- Routes requests based on Host header matching
+- Has two backend pools:
+  - `<localhost>` (127.0.0.1, ::1) - Routes to local httpd on port 8080
+  - `<f3s>` (192.168.2.120-122) - Routes to f3s k3s cluster on port 80
+- Falls back to f3s cluster when no explicit routing match exists
+
+**httpd** - OpenBSD httpd that:
+- Listens on port 8080 (behind relayd)
+- Listens on port 80 for ACME challenges and HTTP→HTTPS redirects
+- Serves static content for various domains
+- Has server-specific blocks for each server's own hostname
+
+**Rexfile** - Configuration management using Rex (Perl):
+- Defines configuration arrays (`@acme_hosts`, `@f3s_hosts`, etc.)
+- Templates use these arrays to generate httpd and relayd configs
+- Deploys to both blowfish and fishfinger servers in parallel
+- Each server receives templates processed with its own `$hostname` value
+
+## Configuration Arrays
+
+### @acme_hosts
+Controls which hosts get:
+- ACME certificate requests
+- HTTP port 80 server blocks for ACME challenges
+- Explicit routing rules in relayd to `<localhost>`
+
+**Critical**: Hosts NOT in `@acme_hosts` will fall through to f3s cluster backends in relayd.
 
-2. **httpd.conf.tpl**: OpenBSD httpd that:
-   - Listens on port 8080 (behind relayd)
-   - Serves static content for various domains
-   - Has a dedicated "Current server's FQDN" block for each server's own hostname
+### @f3s_hosts
+Hosts served by the f3s k3s cluster:
+- Get fallback page served by httpd
+- Special routing rules in relayd to f3s backends
 
-3. **Rexfile**: Configuration management using Rex (Perl):
-   - Defines `@acme_hosts` array controlling which hosts get ACME certs and routing rules
-   - Templates use this array to generate both httpd and relayd configs
-   - Deployed to both blowfish and fishfinger servers
+### @prefixes
+Array: `('', 'www.', 'standby.')`
 
-## Solution Implementation
+Used in loops to create hostname variants:
+- `foo.zone`
+- `www.foo.zone`
+- `standby.foo.zone`
+
+## Template Processing
+
+Rex processes `.tpl` files using embedded Perl:
 
-### 1. Added Hostnames to @acme_hosts (Rexfile:86)
 ```perl
-our @acme_hosts =
-  qw/.../gogios.buetow.org blowfish.buetow.org fishfinger.buetow.org/;
+<% ... -%>  # Perl code (- suppresses trailing newline)
+<%= $var %>  # Print variable value
 ```
 
-This ensures both servers are included in routing rules.
+Templates are processed **per-server** with different values:
+- `$hostname` = "blowfish" or "fishfinger"
+- `$domain` = "buetow.org"
+- `$hostname.$domain` = "blowfish.buetow.org" or "fishfinger.buetow.org"
+
+## Routing Configuration
+
+### Explicit Routing Rules (relayd.conf.tpl:45-50)
 
-### 2. Prevented Duplicate Server Blocks (httpd.conf.tpl:3-5)
 ```perl
 <% for my $host (@$acme_hosts) {
-  # Skip current server's hostname - handled by dedicated block below
-  next if $host eq "$hostname.$domain";
+  next if grep { $_ eq $host } @$f3s_hosts;
+  for my $prefix (@prefixes) { -%>
+match request header "Host" value "<%= $prefix.$host -%>" forward to <localhost>
 ```
 
-Each server has a dedicated block at lines 18-37 serving from `/htdocs/buetow.org/self`. Without this skip, adding them to `@acme_hosts` would create duplicate server blocks on port 80, causing httpd to fail.
+- Only hosts in `@acme_hosts` get explicit routing to `<localhost>`
+- Excludes f3s hosts (they have separate routing)
+- Creates rules for all prefixes ('', 'www.', 'standby.')
+
+### Routing Logic
+
+**Routing is explicit, not implicit**: Just because httpd has a server block doesn't mean relayd will route to it. The routing decision happens in relayd based on:
+
+1. Explicit Host header match → route to specified backend
+2. No match → fall through to default relay backends (f3s cluster first, then localhost)
+
+## TLS Certificate Management
+
+### Certificate Loading (relayd.conf.tpl:24-31)
 
-### 3. Prevented Missing TLS Certificates (relayd.conf.tpl:25-27)
 ```perl
-<% for my $host (@$acme_hosts) {
-  # Skip server hostnames - each server only has its own cert
-  next if $host eq 'blowfish.buetow.org' or $host eq 'fishfinger.buetow.org';
+http protocol "https" {
+  <% for my $host (@$acme_hosts) { -%>
+  tls keypair <%= $host %>
+  tls keypair standby.<%= $host %>
+  <% } -%>
+  tls keypair <%= $hostname.'.'.$domain -%>
 ```
 
-**Critical insight**: When deploying to blowfish, the config tries to load TLS certs for ALL hosts in `@acme_hosts`. But blowfish only has `blowfish.buetow.org.crt`, not `fishfinger.buetow.org.crt`. Similarly, fishfinger only has its own cert. The dedicated line `tls keypair <%= $hostname.'.'.$domain -%>` at line 31 loads the correct cert for each server.
+**Critical insight**: In multi-server deployments, each server only has its own TLS certificate.
 
-## Debugging Methodology
+- blowfish has: `blowfish.buetow.org.crt` (NOT fishfinger's cert)
+- fishfinger has: `fishfinger.buetow.org.crt` (NOT blowfish's cert)
 
-### 1. Test Actual Endpoints First
-```bash
-curl -s https://blowfish.buetow.org/index.txt # Test reality
-```
-vs. relying on monitoring dashboards which may show cached/stale data.
+When the template runs on blowfish, it tries to load certs for ALL hosts in `@acme_hosts`. If fishfinger.buetow.org is in the array, relayd will fail to start because that cert doesn't exist on blowfish.
 
-### 2. Check Configuration Syntax Before Deploy
-```bash
-ssh rex@server "doas httpd -n" # Test httpd config
-ssh rex@server "doas relayd -n" # Test relayd config
+**Solution pattern**: Skip server-specific hostnames in the loop, use dedicated keypair line:
+```perl
+<% for my $host (@$acme_hosts) {
+  next if $host eq 'blowfish.buetow.org' or $host eq 'fishfinger.buetow.org'; -%>
 ```
 
-### 3. Understand Monitoring Intervals
-Gogios TLS checks have `RunInterval: 3600` (1 hour). After fixing issues, old failures may persist until:
-- Next scheduled check
-- Manual force run: `gogios -cfg /etc/gogios.json -force`
+The line `tls keypair <%= $hostname.'.'.$domain -%>` loads the correct cert for each server.
 
-However, `-force` only updates the report timestamp, it doesn't override individual check intervals. True verification requires manual testing or waiting for interval expiry.
+## Server Block Management
 
-## Template Architecture Insights
+### httpd.conf.tpl Patterns
 
-### Variable Scoping
-- `$hostname`: Current server being deployed to (blowfish or fishfinger)
-- `$domain`: Domain suffix (buetow.org)
-- `@acme_hosts`: Global list of all hosts needing ACME certs and routing
-- `@f3s_hosts`: Hosts served by f3s k3s cluster
-- `@prefixes`: ('', 'www.', 'standby.') for creating hostname variants
+**ACME and redirect blocks (port 80)**:
+```perl
+<% for my $host (@$acme_hosts) {
+  next if $host eq "$hostname.$domain"; # Skip current server
+  for my $prefix (@prefixes) { -%>
+server "<%= $prefix.$host %>" {
+  listen on * port 80
+```
 
-### Template Processing
-Rex processes `.tpl` files using embedded Perl:
-- `<% ... -%>`: Perl code (suppress trailing newline with -)
-- `<%= $var %>`: Print variable
-- Templates are processed per-server with different `$hostname` values
+**Why skip current server**: Each server has a dedicated "Current server's FQDN" block:
 
-### Common Pitfall: Server-Specific vs. Shared Configuration
-When adding a hostname to a shared array like `@acme_hosts`, consider:
-1. Does each server have the required TLS certificates?
-2. Will this create duplicate server blocks?
-3. Is this hostname server-specific (like server FQDNs) or shared (like service domains)?
+```perl
+server "<%= "$hostname.$domain" %>" {
+  listen on * port 80
+  ...
+}
+```
+
+Without the skip, adding server hostnames to `@acme_hosts` creates duplicate server blocks, causing httpd to fail with "server defined twice" error.
 
-For server FQDNs (blowfish.buetow.org, fishfinger.buetow.org):
-- **Routing**: Needs to be in `@acme_hosts` for relayd routing rules
-- **Server blocks**: Skip in loops, use dedicated blocks instead
-- **TLS certs**: Skip in loops, use dedicated keypair line instead
+### Content Serving Blocks (port 8080)
 
-## Key Learnings
+Different patterns based on content type:
+- **Gemtexter sites**: Serve from `/htdocs/gemtexter/<host>`
+- **Server self**: Serve from `/htdocs/buetow.org/self`
+- **Special hosts**: Custom root paths (e.g., gogios, joern, dory)
+- **f3s fallback**: Rewrite all to `/index.html` for cluster-down message
 
-1. **Routing is explicit, not implicit**: Just because httpd has a server block doesn't mean relayd will route to it. Routing rules must be configured separately.
+## Server-Specific vs. Shared Configuration
 
-2. **Certificate management per server**: In a multi-server setup, each server only has its own certificate, not certificates for other servers in the pool.
+### Shared Hosts (Service Domains)
+Examples: foo.zone, irregular.ninja, f3s.buetow.org
 
-3. **Template loops need guards**: When iterating over shared arrays in templates that deploy to multiple servers, check if items need server-specific handling.
+- Same content/routing on both servers
+- Both servers have TLS certs
+- Include in `@acme_hosts` without guards
+- Create with prefix loops for www/standby variants
 
-4. **Monitoring vs. reality**: Always verify fixes by testing actual endpoints. Monitoring systems may show stale data due to caching intervals.
+### Server-Specific Hosts (Server FQDNs)
+Examples: blowfish.buetow.org, fishfinger.buetow.org
 
-5. **Configuration deployment is atomic**: Rex deploys templates and restarts services. Brief service interruptions during restarts can trigger monitoring alerts that resolve once services stabilize.
+- Different per server
+- Each server has ONLY its own cert
+- Include in `@acme_hosts` for routing
+- **Must skip in template loops**
+- Use dedicated server blocks and keypair lines
 
-## Files Modified
+### Pattern for Adding Server FQDNs
 
-1. `Rexfile` - Added blowfish/fishfinger to @acme_hosts
-2. `etc/httpd.conf.tpl` - Skip current hostname in @acme_hosts loop
-3. `etc/relayd.conf.tpl` - Skip server hostnames in TLS keypair loop
+1. **Routing**: Add to `@acme_hosts` (relayd needs routing rules)
+2. **ACME loop**: Skip with `next if $host eq "$hostname.$domain"`
+3. **TLS loop**: Skip with `next if $host eq 'blowfish.buetow.org' or $host eq 'fishfinger.buetow.org'`
+4. **Server blocks**: Use existing dedicated "Current server's FQDN" block
 
 ## Deployment Process
 
 ```bash
-rex httpd relayd # Deploy to both servers in parallel
+rex httpd relayd # Deploy to both servers
 ```
 
-Rex connects to both blowfish and fishfinger, generates configs with server-specific `$hostname` values, deploys files, and restarts services.
+Process:
+1. Rex connects to both blowfish and fishfinger in parallel
+2. For each server, processes templates with server-specific `$hostname`
+3. Generates `/etc/httpd.conf` and `/etc/relayd.conf`
+4. Writes files and restarts services via `on_change` handlers
+5. Each server gets identical config structure but different hostname values
 
-## Verification
+## Monitoring System (Gogios)
 
-```bash
-# Test endpoints
-curl -s https://blowfish.buetow.org/index.txt # Should return health check text
-curl -s https://fishfinger.buetow.org/index.txt # Should return health check text
+- Runs as user `_gogios`
+- Config: `/etc/gogios.json`
+- Output: `/var/www/htdocs/buetow.org/self/gogios/index.html`
+- Cron schedule: Every 5 minutes between 08:00-22:00
+- Check intervals: Independent from cron (e.g., TLS checks every 3600s)
 
-# Check service status
-ssh -p 2 rex@server "doas rcctl check httpd && doas rcctl check relayd"
-```
+**Important**: Check intervals (`RunInterval`) are independent from cron schedule. A check with 3600s interval won't re-run just because cron triggered, it runs only when interval expires.
 
-## Future Considerations
+## Configuration Testing
 
-When adding new server hostnames:
-1. Add to `@acme_hosts` for routing
-2. Add `next if $host eq "$hostname.$domain"` guards in template loops
-3. Ensure dedicated blocks exist for server-specific config
-4. Remember each server only has its own TLS certificate
-5. Test config syntax before deploying
-6. Verify endpoints after deployment, don't rely on monitoring
+Before deploying:
+```bash
+ssh rex@server "doas httpd -n" # Test httpd config syntax
+ssh rex@server "doas relayd -n" # Test relayd config syntax
+```
+
+After deploying:
+```bash
+ssh rex@server "doas rcctl check httpd"
+ssh rex@server "doas rcctl check relayd"
+```
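
The routing rules documented in the new AGENT.md can be checked by hand with a small shell sketch. It is not part of the commit, and it is not how relayd itself evaluates rules. The host lists are made-up sample values (the real arrays live in the Rexfile), `in_list` and `backend_for` are hypothetical helper names, and the fall-through case is simplified to report only `f3s`, although the text says the default relay tries the f3s cluster first and then localhost.

```shell
#!/bin/sh
# Sample data for illustration only; the real lists are defined in the Rexfile.
ACME_HOSTS="foo.zone f3s.buetow.org blowfish.buetow.org"
F3S_HOSTS="f3s.buetow.org"

# Succeeds if $1 appears in the space-separated list $2.
in_list() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints the backend pool the documented rules would select for Host header $1.
backend_for() {
    requested=$1
    for host in $ACME_HOSTS; do
        # f3s hosts are excluded from the explicit <localhost> rules.
        if in_list "$host" "$F3S_HOSTS"; then
            continue
        fi
        # One rule per prefix: '', 'www.', 'standby.'
        for prefix in "" "www." "standby."; do
            if [ "$requested" = "$prefix$host" ]; then
                echo "localhost"
                return 0
            fi
        done
    done
    # No explicit match: fall through to the default backends (simplified to f3s).
    echo "f3s"
}

backend_for "www.foo.zone"
backend_for "unknown.example"
```

With the sample lists, a server FQDN such as `blowfish.buetow.org` resolves to `localhost` only because it is listed in `ACME_HOSTS`. Removing it from that list reproduces the fall-through to the f3s cluster that the removed troubleshooting notes described.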
