summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorPaul Buetow <paul@buetow.org>2026-05-17 09:02:27 +0300
committerPaul Buetow <paul@buetow.org>2026-05-17 09:02:27 +0300
commit905d6179ab45df6438ba53273ab3b68a25f3ff28 (patch)
tree061de5b2fb44a120d3375749a769dc40e9834bcd
parentca37dfb4e3f5a8e4af00343fa262874da4e1080f (diff)
f3s: split storage/k3s-setup/observability references, correct thermal attribution
storage.md (942), k3s-setup.md (342), observability.md (273) split into per-topic sub-files under references/<topic>/ with short index files at the original paths. Drops the SSD TRIM Configuration section added in 8d94f79 and reframes the Thermal Troubleshooting cascade: the 2026-05-16 f0 incident was thermal alone, not the multi-cause cascade (autotrim, zrepl interval, encryption) the previous commit implied. Mitigations applied during diagnosis are not what fixed it. Other yesterday additions (zrepl DL-state recovery, CARP-when-ZFS- suspended, ZFS SUSPENDED runbook, nfs-mount-monitor improvements) are kept and routed to their respective sub-files. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
-rw-r--r--prompts/skills/f3s/SKILL.md6
-rw-r--r--prompts/skills/f3s/references/k3s-setup.md345
-rw-r--r--prompts/skills/f3s/references/k3s-setup/ingress.md120
-rw-r--r--prompts/skills/f3s/references/k3s-setup/install.md169
-rw-r--r--prompts/skills/f3s/references/k3s-setup/troubleshooting.md49
-rw-r--r--prompts/skills/f3s/references/observability.md264
-rw-r--r--prompts/skills/f3s/references/observability/freebsd.md111
-rw-r--r--prompts/skills/f3s/references/observability/stack.md158
-rw-r--r--prompts/skills/f3s/references/storage.md932
-rw-r--r--prompts/skills/f3s/references/storage/backups.md40
-rw-r--r--prompts/skills/f3s/references/storage/carp.md123
-rw-r--r--prompts/skills/f3s/references/storage/nfs-mount-monitor.md107
-rw-r--r--prompts/skills/f3s/references/storage/nfs.md162
-rw-r--r--prompts/skills/f3s/references/storage/troubleshooting.md147
-rw-r--r--prompts/skills/f3s/references/storage/zfs.md89
-rw-r--r--prompts/skills/f3s/references/storage/zrepl.md224
16 files changed, 1521 insertions, 1525 deletions
diff --git a/prompts/skills/f3s/SKILL.md b/prompts/skills/f3s/SKILL.md
index a3e7c06..858518a 100644
--- a/prompts/skills/f3s/SKILL.md
+++ b/prompts/skills/f3s/SKILL.md
@@ -24,9 +24,9 @@ Detailed reference documentation is in the `references/` subfolder:
- [f3 Rocky VM](references/f3-rocky-vm.md) — Plain Rocky Linux 9 VM on f3 (`rocky`, `192.168.1.123`), autostart policy, root SSH
- [Bootstrap Rocky bhyve VM](references/bootstrap-rocky-bhyve.md) — Runbook for creating a new plain Rocky Linux bhyve guest with unattended kickstart
- [WireGuard Mesh](references/wireguard.md) — Mesh topology, IP assignments, peer configs
-- [Storage](references/storage.md) — ZFS (zdata), CARP, NFS over stunnel, zrepl replication
-- [k3s Setup](references/k3s-setup.md) — HA k3s cluster, etcd, node IPs, kubeconfig, ArgoCD
-- [Observability](references/observability.md) — Prometheus, Grafana, Loki, Alloy, Tempo
+- [Storage](references/storage.md) — index into `references/storage/`: ZFS (zdata), zrepl, CARP, NFS over stunnel, nfs-mount-monitor, troubleshooting (incl. thermal), backups & local-path
+- [k3s Setup](references/k3s-setup.md) — index into `references/k3s-setup/`: install (bootstrap, kubeconfig, PVs, ArgoCD), ingress (OpenBSD/FreeBSD relayd, cert-manager), troubleshooting (etcd recovery)
+- [Observability](references/observability.md) — index into `references/observability/`: stack (Prometheus/Alloy/Loki/Tempo + alerting), FreeBSD monitoring (node_exporter + recording rules)
- [Immich](references/immich.md) — Photo server deployment, job queue stats, troubleshooting
- [Garage](references/garage.md) — Garage cluster, edge domain routing, S3 bucket/key workflow, troubleshooting
- [DTail / dserver](references/dtail.md) — dserver: Pis **arm64** vs r0–r2 **amd64**, r-VM **root** + `root.authorized_keys` cache, firewalld **2222**, systemd timers
diff --git a/prompts/skills/f3s/references/k3s-setup.md b/prompts/skills/f3s/references/k3s-setup.md
index 7d2239b..0fc7605 100644
--- a/prompts/skills/f3s/references/k3s-setup.md
+++ b/prompts/skills/f3s/references/k3s-setup.md
@@ -1,342 +1,11 @@
# k3s Setup
-## Overview
+3-node HA k3s cluster running on the Rocky Linux VMs r0/r1/r2 (one per
+FreeBSD bhyve host f0/f1/f2). All control-plane and etcd traffic flows
+over WireGuard.
-3-node HA k3s cluster running on Rocky Linux VMs (r0, r1, r2). All nodes act as both control-plane and etcd members (no separate worker nodes).
+## Sub-references
-- k3s version: **v1.32.6+k3s1** (as of Part 7)
-- etcd mode: **embedded HA** (`--cluster-init`)
-- All control-plane traffic goes over **WireGuard** (192.168.2.x IPs)
-
-## Prerequisites
-
-- All Rocky Linux VMs (r0, r1, r2) updated and running
-- WireGuard mesh fully configured (see wireguard.md)
-- NVMe disk emulation in place (see rocky-linux-vms.md) — critical for etcd performance
-
-## Installation
-
-### Generate shared token
-
-```sh
-# On Fedora laptop
-pwgen -n 32
-# Copy output to all r nodes:
-echo -n SECRET_TOKEN > ~/.k3s_token # on r0, r1, r2
-```
-
-### Bootstrap first node (r0)
-
-```sh
-[root@r0 ~]# curl -sfL https://get.k3s.io | K3S_TOKEN=$(cat ~/.k3s_token) \
- sh -s - server --cluster-init \
- --node-ip=192.168.2.120 \
- --advertise-address=192.168.2.120 \
- --tls-san=r0.wg0.wan.buetow.org
-```
-
-`--node-ip` and `--advertise-address` bind etcd to the WireGuard interface so all control-plane traffic is encrypted.
-
-### Join remaining nodes (r1, r2)
-
-```sh
-[root@r1 ~]# curl -sfL https://get.k3s.io | K3S_TOKEN=$(cat ~/.k3s_token) \
- sh -s - server --server https://r0.wg0.wan.buetow.org:6443 \
- --node-ip=192.168.2.121 \
- --advertise-address=192.168.2.121 \
- --tls-san=r1.wg0.wan.buetow.org
-
-[root@r2 ~]# curl -sfL https://get.k3s.io | K3S_TOKEN=$(cat ~/.k3s_token) \
- sh -s - server --server https://r0.wg0.wan.buetow.org:6443 \
- --node-ip=192.168.2.122 \
- --advertise-address=192.168.2.122 \
- --tls-san=r2.wg0.wan.buetow.org
-```
-
-### Verify cluster
-
-```sh
-kubectl get nodes
-# Expected: r0, r1, r2 all Ready with role control-plane,etcd,master
-```
-
-## kubeconfig
-
-```sh
-# Copy from any r node to laptop
-scp root@r0.lan.buetow.org:/etc/rancher/k3s/k3s.yaml ~/.kube/config
-# Edit: replace server address with r0.lan.buetow.org
-# (repeat with r1 or r2 if r0 is down)
-```
-
-## k3s config.yaml — expose etcd and controller-manager metrics
-
-For Prometheus to scrape etcd and controller-manager metrics, add to `/etc/rancher/k3s/config.yaml` on each r node:
-
-```sh
-cat >> /etc/rancher/k3s/config.yaml << 'EOF'
-kube-controller-manager-arg:
- - bind-address=0.0.0.0
-etcd-expose-metrics: true
-EOF
-systemctl restart k3s
-```
-
-Verify: `curl -s http://127.0.0.1:2381/metrics | grep etcd_server_has_leader`
-
-## Built-in Components
-
-| Component | Purpose |
-|-----------|---------|
-| CoreDNS | DNS for pods |
-| Traefik | Ingress controller |
-| local-path-provisioner | Local PVC storage |
-| metrics-server | Resource metrics |
-| svclb-traefik | ServiceLB for Traefik |
-
-### Scale Traefik to 2 replicas (faster failover)
-
-```sh
-kubectl -n kube-system scale deployment traefik --replicas=2
-```
-
-## NFS Persistent Volumes
-
-Persistent volumes use `hostPath` pointing to NFS-mounted paths:
-
-```
-/data/nfs/k3svolumes/<app>/
-```
-
-NFS is mounted on all r nodes at `/data/nfs/k3svolumes` via stunnel → CARP VIP → freeBSD NFS (see storage.md).
-
-Example PV:
-
-```yaml
-apiVersion: v1
-kind: PersistentVolume
-metadata:
- name: example-pv
-spec:
- capacity:
- storage: 1Gi
- accessModes:
- - ReadWriteOnce
- persistentVolumeReclaimPolicy: Retain
- hostPath:
- path: /data/nfs/k3svolumes/example-volume
- type: Directory
-```
-
-Create the directory on the NFS share before deploying: `mkdir /data/nfs/k3svolumes/<app>/`
-
-### NFS Mount Health Monitor (on r0, r1, r2)
-
-Each Rocky Linux node runs `/usr/local/bin/check-nfs-mount.sh` via cron (every minute) to detect and fix stale/missing NFS mounts. After a successful remount, the script also **force-deletes stuck pods** on the local node (status Unknown, Pending, or ContainerCreating) so Kubernetes reschedules them with the healthy mount.
-
-```sh
-# Cron entry (on all r-nodes, as root)
-* * * * * /usr/local/bin/check-nfs-mount.sh >> /var/log/nfs-mount-check.log 2>&1
-```
-
-The script:
-1. Checks if `/data/nfs/k3svolumes` is a mountpoint and responsive (2s timeout)
-2. If stale/missing: force-unmounts + remounts NFS
-3. After successful remount: uses `kubectl` to find and delete stuck pods on this node
-4. Uses a lock file (`/var/run/nfs-mount-check.lock`) to prevent concurrent runs
-
-**Important**: If NFS goes down cluster-wide, the root cause is usually on the FreeBSD NFS server side (f0/f1). Check CARP state, stunnel, nfsd, and `vfs.nfsd.nfs_privport` (see storage.md).
-
-## Deployment: GitOps with ArgoCD
-
-Config repository: `https://codeberg.org/snonux/conf` (directory: `f3s/`)
-
-ArgoCD app structure:
-```
-argocd-apps/
- monitoring/ # Prometheus, Grafana, Loki, etc.
- services/ # User-facing services
- infra/ # Infrastructure components
- test/ # Test deployments
-```
-
-**To view pre-ArgoCD state** (how things were in Part 7):
-```sh
-git clone https://codeberg.org/snonux/conf.git
-cd conf && git checkout 15a86f3 # last commit before ArgoCD migration
-cd f3s/
-```
-
-## Node IP Summary
-
-| Node | LAN IP | WireGuard IP | k3s API |
-|------|--------|-------------|---------|
-| r0 | 192.168.1.120 | 192.168.2.120 | r0.wg0.wan.buetow.org:6443 |
-| r1 | 192.168.1.121 | 192.168.2.121 | r1.wg0.wan.buetow.org:6443 |
-| r2 | 192.168.1.122 | 192.168.2.122 | r2.wg0.wan.buetow.org:6443 |
-
-## External Connectivity: OpenBSD relayd
-
-Default traffic flow for public k3s-backed services: `Internet → OpenBSD relayd (TLS, Let's Encrypt) → WireGuard → k3s Traefik :80 → Service`
-
-### relayd.conf on blowfish/fishfinger
-
-```
-table <f3s> {
- 192.168.2.120
- 192.168.2.121
- 192.168.2.122
-}
-
-http protocol "https" {
- tls keypair f3s.foo.zone
- # ... all f3s service TLS keypairs ...
- # Non-f3s hosts explicitly forwarded to localhost:
- match request header "Host" value "foo.zone" forward to <localhost>
- # f3s hosts have NO match rules — use relay-level failover
-}
-
-relay "https4" {
- listen on <PUBLIC_IP> port 443 tls
- protocol "https"
- forward to <f3s> port 80 check tcp # primary
- forward to <localhost> port 8080 # fallback when f3s down
-}
-```
-
-`f3s.buetow.org` is now a special case: it no longer points at the k3s/apache backend and is forwarded by OpenBSD `relayd` to `pi0` (`192.168.2.203`) and `pi1` (`192.168.2.204`) via a dedicated `<f3s_static>` backend table.
-
-When all k3s-backed f3s nodes are down, relayd falls back to `localhost:8080` (OpenBSD httpd serving a "Server turned off" page) for the hosts that still use the shared `<f3s>` backend.
-
-## LAN Ingress: FreeBSD relayd on CARP VIP
-
-For LAN access without going through internet gateways:
-`LAN → CARP VIP (192.168.1.138) → FreeBSD relayd → k3s Traefik :443 → Service`
-
-### FreeBSD relayd config (`/usr/local/etc/relayd.conf`)
-
-```
-table <k3s_nodes> { 192.168.1.120 192.168.1.121 192.168.1.122 }
-
-relay "lan_http" {
- listen on 192.168.1.138 port 80
- forward to <k3s_nodes> port 80 check tcp
-}
-
-relay "lan_https" {
- listen on 192.168.1.138 port 443
- forward to <k3s_nodes> port 443 check tcp
-}
-```
-
-Minimal `/etc/pf.conf` (PF required for relayd):
-
-```
-set skip on lo0
-pass in quick
-pass out quick
-```
-
-```sh
-doas pkg install -y relayd
-doas sysrc pf_enable=YES pflog_enable=YES relayd_enable=YES
-doas service pf start && doas service pflog start && doas service relayd start
-```
-
-Run on both f0 and f1. Only CARP MASTER responds to VIP traffic.
-
-### cert-manager for LAN TLS
-
-LAN services use `*.f3s.lan.foo.zone` with a self-signed CA managed by cert-manager:
-
-```sh
-cd conf/f3s/cert-manager && just install
-# Creates: selfsigned ClusterIssuer, CA cert, wildcard cert (f3s-lan-tls)
-```
-
-Copy secret to service namespace:
-```sh
-kubectl get secret f3s-lan-tls -n cert-manager -o yaml | \
- sed 's/namespace: cert-manager/namespace: services/' | \
- kubectl apply -f -
-```
-
-### LAN ingress pattern
-
-```yaml
-apiVersion: networking.k8s.io/v1
-kind: Ingress
-metadata:
- name: ingress-lan
- namespace: services
- annotations:
- spec.ingressClassName: traefik
- traefik.ingress.kubernetes.io/router.entrypoints: web,websecure
-spec:
- tls:
- - hosts:
- - myservice.f3s.lan.foo.zone
- secretName: f3s-lan-tls
- rules:
- - host: myservice.f3s.lan.foo.zone
- http:
- paths:
- - path: /
- pathType: Prefix
- backend:
- service:
- name: myservice
- port:
- number: 8080
-```
-
-## Etcd Raft Log Corruption Recovery
-
-**Symptom**: k3s crashes on startup with panic:
-```
-tocommit(XXXXXXX) is out of range [lastIndex(YYYYYYY)]
-```
-Caused by `kill -9` on the bhyve process mid-write (corrupts etcd WAL). k3s enters a crash loop and stops after ~2 minutes.
-
-**Recovery procedure** (example: r1 is corrupt):
-
-```sh
-# 1. Stop k3s on the affected node
-ssh root@r1.lan.buetow.org 'systemctl stop k3s'
-
-# 2. Download etcdctl on a healthy node (not bundled with k3s)
-ssh root@r0.lan.buetow.org
-curl -sL https://github.com/etcd-io/etcd/releases/download/v3.5.17/etcd-v3.5.17-linux-amd64.tar.gz \
- | tar -xz -C /tmp etcd-v3.5.17-linux-amd64/etcdctl
-mv /tmp/etcd-v3.5.17-linux-amd64/etcdctl /tmp/etcdctl
-
-# 3. Find and remove the corrupt member from the cluster
-ETCDCTL_API=3 /tmp/etcdctl \
- --endpoints=https://127.0.0.1:2379 \
- --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
- --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
- --key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
- member list
-# Find the member ID for r1, then:
-ETCDCTL_API=3 /tmp/etcdctl ... member remove <MEMBER_ID>
-
-# 4. Delete the corrupted etcd data on the affected node
-ssh root@r1.lan.buetow.org 'rm -rf /var/lib/rancher/k3s/server/db/etcd'
-
-# 5. Restart k3s — it rejoins as a fresh member
-ssh root@r1.lan.buetow.org 'systemctl start k3s'
-
-# 6. Verify
-kubectl get nodes # r1 should return to Ready
-```
-
-> **Prevention**: Always use `doas vm stop rocky` and wait for clean shutdown before stopping the bhyve host. Only use `kill -9` on the bhyve process as a last resort — it can corrupt the etcd WAL.
-
-## Useful Commands
-
-```sh
-kubectl get nodes # cluster status
-kubectl get pods --all-namespaces # all running pods
-kubectl get namespaces
-kubectl config set-context --current --namespace=<ns>
-```
+- [Install](k3s-setup/install.md) — bootstrap, kubeconfig, etcd/controller-manager metrics, built-in components, NFS PV pattern, ArgoCD, node IP summary, useful commands
+- [Ingress](k3s-setup/ingress.md) — OpenBSD `relayd` (internet) and FreeBSD `relayd` on CARP VIP (LAN), cert-manager wildcard, ingress pattern
+- [Troubleshooting](k3s-setup/troubleshooting.md) — etcd Raft log corruption recovery; cluster-wide NFS outage pointer
diff --git a/prompts/skills/f3s/references/k3s-setup/ingress.md b/prompts/skills/f3s/references/k3s-setup/ingress.md
new file mode 100644
index 0000000..6beb4ea
--- /dev/null
+++ b/prompts/skills/f3s/references/k3s-setup/ingress.md
@@ -0,0 +1,120 @@
+# k3s Ingress (Internet + LAN)
+
+Two ingress paths into the k3s cluster:
+- **Internet → OpenBSD relayd** (TLS termination on `blowfish`/`fishfinger`) → WireGuard → Traefik
+- **LAN → FreeBSD relayd on CARP VIP** → k3s Traefik
+
+## External Connectivity: OpenBSD relayd
+
+Default traffic flow for public k3s-backed services: `Internet → OpenBSD relayd (TLS, Let's Encrypt) → WireGuard → k3s Traefik :80 → Service`
+
+### relayd.conf on blowfish/fishfinger
+
+```
+table <f3s> {
+ 192.168.2.120
+ 192.168.2.121
+ 192.168.2.122
+}
+
+http protocol "https" {
+ tls keypair f3s.foo.zone
+ # ... all f3s service TLS keypairs ...
+ # Non-f3s hosts explicitly forwarded to localhost:
+ match request header "Host" value "foo.zone" forward to <localhost>
+ # f3s hosts have NO match rules — use relay-level failover
+}
+
+relay "https4" {
+ listen on <PUBLIC_IP> port 443 tls
+ protocol "https"
+ forward to <f3s> port 80 check tcp # primary
+ forward to <localhost> port 8080 # fallback when f3s down
+}
+```
+
+`f3s.buetow.org` is now a special case: it no longer points at the k3s/apache backend and is forwarded by OpenBSD `relayd` to `pi0` (`192.168.2.203`) and `pi1` (`192.168.2.204`) via a dedicated `<f3s_static>` backend table.
+
+When all k3s-backed f3s nodes are down, relayd falls back to `localhost:8080` (OpenBSD httpd serving a "Server turned off" page) for the hosts that still use the shared `<f3s>` backend.
+
+## LAN Ingress: FreeBSD relayd on CARP VIP
+
+For LAN access without going through internet gateways:
+`LAN → CARP VIP (192.168.1.138) → FreeBSD relayd → k3s Traefik :443 → Service`
+
+### FreeBSD relayd config (`/usr/local/etc/relayd.conf`)
+
+```
+table <k3s_nodes> { 192.168.1.120 192.168.1.121 192.168.1.122 }
+
+relay "lan_http" {
+ listen on 192.168.1.138 port 80
+ forward to <k3s_nodes> port 80 check tcp
+}
+
+relay "lan_https" {
+ listen on 192.168.1.138 port 443
+ forward to <k3s_nodes> port 443 check tcp
+}
+```
+
+Minimal `/etc/pf.conf` (PF required for relayd):
+
+```
+set skip on lo0
+pass in quick
+pass out quick
+```
+
+```sh
+doas pkg install -y relayd
+doas sysrc pf_enable=YES pflog_enable=YES relayd_enable=YES
+doas service pf start && doas service pflog start && doas service relayd start
+```
+
+Run on both f0 and f1. Only CARP MASTER responds to VIP traffic.
+
+### cert-manager for LAN TLS
+
+LAN services use `*.f3s.lan.foo.zone` with a self-signed CA managed by cert-manager:
+
+```sh
+cd conf/f3s/cert-manager && just install
+# Creates: selfsigned ClusterIssuer, CA cert, wildcard cert (f3s-lan-tls)
+```
+
+Copy secret to service namespace:
+```sh
+kubectl get secret f3s-lan-tls -n cert-manager -o yaml | \
+ sed 's/namespace: cert-manager/namespace: services/' | \
+ kubectl apply -f -
+```
+
+### LAN ingress pattern
+
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+ name: ingress-lan
+ namespace: services
+ annotations:
+ spec.ingressClassName: traefik
+ traefik.ingress.kubernetes.io/router.entrypoints: web,websecure
+spec:
+ tls:
+ - hosts:
+ - myservice.f3s.lan.foo.zone
+ secretName: f3s-lan-tls
+ rules:
+ - host: myservice.f3s.lan.foo.zone
+ http:
+ paths:
+ - path: /
+ pathType: Prefix
+ backend:
+ service:
+ name: myservice
+ port:
+ number: 8080
+```
diff --git a/prompts/skills/f3s/references/k3s-setup/install.md b/prompts/skills/f3s/references/k3s-setup/install.md
new file mode 100644
index 0000000..fae9899
--- /dev/null
+++ b/prompts/skills/f3s/references/k3s-setup/install.md
@@ -0,0 +1,169 @@
+# k3s Install
+
+3-node HA k3s cluster running on Rocky Linux VMs (r0, r1, r2). All nodes act as both control-plane and etcd members (no separate worker nodes).
+
+- k3s version: **v1.32.6+k3s1** (as of Part 7)
+- etcd mode: **embedded HA** (`--cluster-init`)
+- All control-plane traffic goes over **WireGuard** (192.168.2.x IPs)
+
+## Prerequisites
+
+- All Rocky Linux VMs (r0, r1, r2) updated and running
+- WireGuard mesh fully configured (see [wireguard.md](../wireguard.md))
+- NVMe disk emulation in place (see [rocky-linux-vms.md](../rocky-linux-vms.md)) — critical for etcd performance
+
+## Installation
+
+### Generate shared token
+
+```sh
+# On Fedora laptop
+pwgen -n 32
+# Copy output to all r nodes:
+echo -n SECRET_TOKEN > ~/.k3s_token # on r0, r1, r2
+```
+
+### Bootstrap first node (r0)
+
+```sh
+[root@r0 ~]# curl -sfL https://get.k3s.io | K3S_TOKEN=$(cat ~/.k3s_token) \
+ sh -s - server --cluster-init \
+ --node-ip=192.168.2.120 \
+ --advertise-address=192.168.2.120 \
+ --tls-san=r0.wg0.wan.buetow.org
+```
+
+`--node-ip` and `--advertise-address` bind etcd to the WireGuard interface so all control-plane traffic is encrypted.
+
+### Join remaining nodes (r1, r2)
+
+```sh
+[root@r1 ~]# curl -sfL https://get.k3s.io | K3S_TOKEN=$(cat ~/.k3s_token) \
+ sh -s - server --server https://r0.wg0.wan.buetow.org:6443 \
+ --node-ip=192.168.2.121 \
+ --advertise-address=192.168.2.121 \
+ --tls-san=r1.wg0.wan.buetow.org
+
+[root@r2 ~]# curl -sfL https://get.k3s.io | K3S_TOKEN=$(cat ~/.k3s_token) \
+ sh -s - server --server https://r0.wg0.wan.buetow.org:6443 \
+ --node-ip=192.168.2.122 \
+ --advertise-address=192.168.2.122 \
+ --tls-san=r2.wg0.wan.buetow.org
+```
+
+### Verify cluster
+
+```sh
+kubectl get nodes
+# Expected: r0, r1, r2 all Ready with role control-plane,etcd,master
+```
+
+## kubeconfig
+
+```sh
+# Copy from any r node to laptop
+scp root@r0.lan.buetow.org:/etc/rancher/k3s/k3s.yaml ~/.kube/config
+# Edit: replace server address with r0.lan.buetow.org
+# (repeat with r1 or r2 if r0 is down)
+```
+
+## k3s config.yaml — expose etcd and controller-manager metrics
+
+For Prometheus to scrape etcd and controller-manager metrics, add to `/etc/rancher/k3s/config.yaml` on each r node:
+
+```sh
+cat >> /etc/rancher/k3s/config.yaml << 'EOF'
+kube-controller-manager-arg:
+ - bind-address=0.0.0.0
+etcd-expose-metrics: true
+EOF
+systemctl restart k3s
+```
+
+Verify: `curl -s http://127.0.0.1:2381/metrics | grep etcd_server_has_leader`
+
+## Built-in Components
+
+| Component | Purpose |
+|-----------|---------|
+| CoreDNS | DNS for pods |
+| Traefik | Ingress controller |
+| local-path-provisioner | Local PVC storage |
+| metrics-server | Resource metrics |
+| svclb-traefik | ServiceLB for Traefik |
+
+### Scale Traefik to 2 replicas (faster failover)
+
+```sh
+kubectl -n kube-system scale deployment traefik --replicas=2
+```
+
+## NFS Persistent Volumes
+
+Persistent volumes use `hostPath` pointing to NFS-mounted paths:
+
+```
+/data/nfs/k3svolumes/<app>/
+```
+
+NFS is mounted on all r nodes at `/data/nfs/k3svolumes` via stunnel → CARP VIP →
+freeBSD NFS — see [storage/nfs.md](../storage/nfs.md). The
+[`nfs-mount-monitor`](../storage/nfs-mount-monitor.md) watchdog auto-repairs
+hung mounts and force-deletes stuck pods.
+
+Example PV:
+
+```yaml
+apiVersion: v1
+kind: PersistentVolume
+metadata:
+ name: example-pv
+spec:
+ capacity:
+ storage: 1Gi
+ accessModes:
+ - ReadWriteOnce
+ persistentVolumeReclaimPolicy: Retain
+ hostPath:
+ path: /data/nfs/k3svolumes/example-volume
+ type: Directory
+```
+
+Create the directory on the NFS share before deploying: `mkdir /data/nfs/k3svolumes/<app>/`
+
+## Deployment: GitOps with ArgoCD
+
+Config repository: `https://codeberg.org/snonux/conf` (directory: `f3s/`)
+
+ArgoCD app structure:
+```
+argocd-apps/
+ monitoring/ # Prometheus, Grafana, Loki, etc.
+ services/ # User-facing services
+ infra/ # Infrastructure components
+ test/ # Test deployments
+```
+
+**To view pre-ArgoCD state** (how things were in Part 7):
+```sh
+git clone https://codeberg.org/snonux/conf.git
+cd conf && git checkout 15a86f3 # last commit before ArgoCD migration
+cd f3s/
+```
+
+## Node IP Summary
+
+| Node | LAN IP | WireGuard IP | k3s API |
+|------|--------|-------------|---------|
+| r0 | 192.168.1.120 | 192.168.2.120 | r0.wg0.wan.buetow.org:6443 |
+| r1 | 192.168.1.121 | 192.168.2.121 | r1.wg0.wan.buetow.org:6443 |
+| r2 | 192.168.1.122 | 192.168.2.122 | r2.wg0.wan.buetow.org:6443 |
+
+## Useful Commands
+
+```sh
+kubectl get nodes # cluster status
+kubectl get pods --all-namespaces # all running pods
+kubectl get namespaces
+kubectl config set-context --current --namespace=<ns>
+```
diff --git a/prompts/skills/f3s/references/k3s-setup/troubleshooting.md b/prompts/skills/f3s/references/k3s-setup/troubleshooting.md
new file mode 100644
index 0000000..01c6029
--- /dev/null
+++ b/prompts/skills/f3s/references/k3s-setup/troubleshooting.md
@@ -0,0 +1,49 @@
+# k3s Troubleshooting
+
+## Etcd Raft Log Corruption Recovery
+
+**Symptom**: k3s crashes on startup with panic:
+```
+tocommit(XXXXXXX) is out of range [lastIndex(YYYYYYY)]
+```
+Caused by `kill -9` on the bhyve process mid-write (corrupts etcd WAL). k3s enters a crash loop and stops after ~2 minutes.
+
+**Recovery procedure** (example: r1 is corrupt):
+
+```sh
+# 1. Stop k3s on the affected node
+ssh root@r1.lan.buetow.org 'systemctl stop k3s'
+
+# 2. Download etcdctl on a healthy node (not bundled with k3s)
+ssh root@r0.lan.buetow.org
+curl -sL https://github.com/etcd-io/etcd/releases/download/v3.5.17/etcd-v3.5.17-linux-amd64.tar.gz \
+ | tar -xz -C /tmp etcd-v3.5.17-linux-amd64/etcdctl
+mv /tmp/etcd-v3.5.17-linux-amd64/etcdctl /tmp/etcdctl
+
+# 3. Find and remove the corrupt member from the cluster
+ETCDCTL_API=3 /tmp/etcdctl \
+ --endpoints=https://127.0.0.1:2379 \
+ --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
+ --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
+ --key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
+ member list
+# Find the member ID for r1, then:
+ETCDCTL_API=3 /tmp/etcdctl ... member remove <MEMBER_ID>
+
+# 4. Delete the corrupted etcd data on the affected node
+ssh root@r1.lan.buetow.org 'rm -rf /var/lib/rancher/k3s/server/db/etcd'
+
+# 5. Restart k3s — it rejoins as a fresh member
+ssh root@r1.lan.buetow.org 'systemctl start k3s'
+
+# 6. Verify
+kubectl get nodes # r1 should return to Ready
+```
+
+> **Prevention**: Always use `doas vm stop rocky` and wait for clean shutdown before stopping the bhyve host. Only use `kill -9` on the bhyve process as a last resort — it can corrupt the etcd WAL.
+
+## Cluster-wide NFS Outages
+
+If NFS goes down cluster-wide, the root cause is usually on the FreeBSD NFS
+server side (f0/f1). Check CARP state, stunnel, nfsd, and
+`vfs.nfsd.nfs_privport` — see [storage/troubleshooting.md](../storage/troubleshooting.md).
diff --git a/prompts/skills/f3s/references/observability.md b/prompts/skills/f3s/references/observability.md
index 4407b1b..b6d7c35 100644
--- a/prompts/skills/f3s/references/observability.md
+++ b/prompts/skills/f3s/references/observability.md
@@ -1,7 +1,5 @@
# Observability Stack
-## Overview
-
Observability stack deployed into the `monitoring` namespace of the k3s cluster.
**Current state (as of 2026-05-16)**: Prometheus + Alloy only. Grafana, Loki, and Tempo are **disabled** — their ArgoCD manifests are renamed to `.disabled` and their pods do not run.
@@ -22,230 +20,10 @@ Observability stack deployed into the `monitoring` namespace of the k3s cluster.
| **Loki** | Log aggregation (single-binary mode) | **Disabled** |
| **Tempo** | Distributed tracing backend | **Disabled** |
-## Deployment
-
-All components deployed via **ArgoCD** (GitOps). Manifests:
-```
-https://codeberg.org/snonux/conf/src/branch/master/f3s
-argocd-apps/monitoring/
-```
-
-Deployment tool: `just` (Justfile in each component directory).
-
-### Namespaces
-
-```sh
-kubectl create namespace monitoring
-```
-
-### Disabled component manifests
-
-These files exist in the repo but are renamed `.disabled` so ArgoCD ignores them:
-```
-f3s/argocd-apps/monitoring/loki.yaml.disabled
-f3s/argocd-apps/monitoring/tempo.yaml.disabled
-f3s/argocd-apps/monitoring/grafana-ingress.yaml.disabled
-```
-
-To re-enable, rename back to `.yaml` and ensure Grafana is using a non-NFS PVC (local-path).
-
-## Installing Prometheus
-
-Uses `kube-prometheus-stack` Helm chart with **Grafana subchart disabled** (`grafana.enabled: false`):
-
-```sh
-helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
-helm repo update
-
-# Create NFS storage directory first
-mkdir -p /data/nfs/k3svolumes/prometheus/data
-
-cd conf/f3s/prometheus && just install
-```
-
-### Enable etcd and controller-manager scraping
-
-Add to `persistence-values.yaml`:
-
-```yaml
-kubeEtcd:
- enabled: true
- endpoints: [192.168.2.120, 192.168.2.121, 192.168.2.122]
- service:
- port: 2381
- targetPort: 2381
-
-kubeControllerManager:
- enabled: true
- endpoints: [192.168.2.120, 192.168.2.121, 192.168.2.122]
- service:
- port: 10257
- targetPort: 10257
- serviceMonitor:
- enabled: true
- https: true
- insecureSkipVerify: true
-```
-
-Also requires k3s config changes on each r node — see k3s-setup.md.
-
-### Grafana credentials
-
-Default: `admin` / `prom-operator` — change immediately after first login.
-
-Grafana accessible at `grafana.f3s.foo.zone` via Traefik ingress.
-
-## Installing Alloy (minimal)
-
-Alloy is installed as part of the Loki Helm chart but runs with a minimal config (no log shipping):
-
-```sh
-cd conf/f3s/loki && just install
-# installs alloy only (loki itself is disabled via loki.yaml.disabled)
-```
-
-### Current Alloy config (`alloy-values.yaml`)
-
-Minimal — only emits Alloy's own operational logs:
-
-```
-logging {
- level = "info"
-}
-```
-
-To re-enable log shipping (once Loki is running again), restore the full `discovery.kubernetes` + `loki.source.kubernetes` + `loki.write` pipeline.
-
-## Installing Loki (disabled)
-
-```sh
-mkdir -p /data/nfs/k3svolumes/loki/data
-# Rename loki.yaml.disabled → loki.yaml first, then:
-cd conf/f3s/loki && just install
-```
+## Sub-references
-Loki URL (internal): `http://loki.monitoring.svc.cluster.local:3100`
-
-## Installing Tempo (disabled)
-
-```sh
-mkdir -p /data/nfs/k3svolumes/tempo/data
-# Rename tempo.yaml.disabled → tempo.yaml first, then:
-cd conf/f3s/tempo && just install
-```
-
-## Monitoring FreeBSD Hosts (f0, f1, f2)
-
-### Install node_exporter on FreeBSD
-
-```sh
-# On each FreeBSD host
-doas pkg install -y node_exporter
-doas sysrc node_exporter_enable=YES
-# Bind to WireGuard interface (f0=192.168.2.130, f1=192.168.2.131, f2=192.168.2.132)
-doas sysrc node_exporter_args='--web.listen-address=192.168.2.130:9100'
-doas service node_exporter start
-```
-
-### Prometheus scrape config for FreeBSD
-
-`additional-scrape-configs.yaml`:
-
-```yaml
-- job_name: 'node-exporter'
- static_configs:
- - targets:
- - '192.168.2.130:9100' # f0 via WireGuard
- - '192.168.2.131:9100' # f1 via WireGuard
- - '192.168.2.132:9100' # f2 via WireGuard
- labels:
- os: freebsd
-```
-
-```sh
-kubectl create secret generic additional-scrape-configs \
- --from-file=additional-scrape-configs.yaml -n monitoring
-```
-
-Add to `persistence-values.yaml`:
-
-```yaml
-prometheus:
- prometheusSpec:
- additionalScrapeConfigsSecret:
- enabled: true
- name: additional-scrape-configs
- key: additional-scrape-configs.yaml
-```
-
-### FreeBSD memory compatibility rules
-
-FreeBSD uses different metric names than Linux. PrometheusRule to create Linux-compatible metrics:
-
-```yaml
-apiVersion: monitoring.coreos.com/v1
-kind: PrometheusRule
-metadata:
- name: freebsd-memory-rules
- namespace: monitoring
- labels:
- release: prometheus
-spec:
- groups:
- - name: freebsd-memory
- rules:
- - record: node_memory_MemTotal_bytes
- expr: node_memory_size_bytes{os="freebsd"}
- - record: node_memory_MemAvailable_bytes
- expr: |
- node_memory_free_bytes{os="freebsd"}
- + node_memory_inactive_bytes{os="freebsd"}
- + node_memory_cache_bytes{os="freebsd"}
- - record: node_memory_MemFree_bytes
- expr: node_memory_free_bytes{os="freebsd"}
- - record: node_memory_Buffers_bytes
- expr: node_memory_buffer_bytes{os="freebsd"}
- - record: node_memory_Cached_bytes
- expr: node_memory_cache_bytes{os="freebsd"}
-```
-
-Note: Disk I/O metrics (`node_disk_*`) are not available on FreeBSD — use ZFS-specific dashboards instead.
-
-### ZFS monitoring recording rules
-
-```yaml
-apiVersion: monitoring.coreos.com/v1
-kind: PrometheusRule
-metadata:
- name: freebsd-zfs-rules
- namespace: monitoring
- labels:
- release: prometheus
-spec:
- groups:
- - name: freebsd-zfs-arc
- interval: 30s
- rules:
- - record: node_zfs_arc_hit_rate_percent
- expr: |
- 100 * (
- rate(node_zfs_arcstats_hits_total{os="freebsd"}[5m]) /
- (rate(node_zfs_arcstats_hits_total{os="freebsd"}[5m]) +
- rate(node_zfs_arcstats_misses_total{os="freebsd"}[5m]))
- )
- - record: node_zfs_arc_memory_usage_percent
- expr: |
- 100 * (
- node_zfs_arcstats_size_bytes{os="freebsd"} /
- node_zfs_arcstats_c_max_bytes{os="freebsd"}
- )
-```
-
-## Alerting
-
-Prometheus → Alertmanager → **Gogios** (custom lightweight monitoring tool running on OpenBSD gateway `blowfish`/`fishfinger`).
-
-Gogios scrapes Alertmanager at regular intervals and sends email notifications. Reaches Alertmanager via WireGuard mesh.
+- [Stack](observability/stack.md) — install Prometheus / Alloy / Loki / Tempo, alerting → Gogios, Prometheus TSDB recovery, LogQL queries, NFS storage paths
+- [FreeBSD Monitoring](observability/freebsd.md) — `node_exporter` on f-hosts, scrape config, memory & ZFS recording rules
## Monitoring Scope
@@ -255,39 +33,3 @@ Gogios scrapes Alertmanager at regular intervals and sends email notifications.
- Application performance metrics
- ~~Log aggregation from all pods (via Alloy → Loki)~~ — disabled
- ~~Distributed traces (via Alloy → Tempo)~~ — disabled
-
-## Prometheus TSDB Recovery
-
-If Prometheus fails to start with `opening storage failed: get segment range: segments are not sequential`, WAL segments are corrupt (can happen after a cluster blip leaving zero-byte WAL files).
-
-Full TSDB wipe (loses all historical data — confirm first):
-
-```sh
-# On the NFS server (f0 or CARP MASTER)
-rm -rf /data/nfs/k3svolumes/prometheus/data/prometheus-db/
-mkdir -p /data/nfs/k3svolumes/prometheus/data/prometheus-db
-chown 1000:1000 /data/nfs/k3svolumes/prometheus/data/prometheus-db
-# Prometheus will recreate the TSDB on next start
-```
-
-## Useful LogQL Queries
-
-```
-# All logs from services namespace
-{namespace="services"}
-
-# Filter by log content
-{namespace="services"} |= "error"
-
-# Parse JSON logs
-{namespace="services"} | json | level="error"
-```
-
-## NFS Storage Paths for Observability
-
-```
-/data/nfs/k3svolumes/prometheus/data # active
-/data/nfs/k3svolumes/grafana/data # exists but unused (grafana disabled)
-/data/nfs/k3svolumes/loki/data # exists but unused (loki disabled)
-/data/nfs/k3svolumes/tempo/data # exists but unused (tempo disabled)
-```
diff --git a/prompts/skills/f3s/references/observability/freebsd.md b/prompts/skills/f3s/references/observability/freebsd.md
new file mode 100644
index 0000000..3005355
--- /dev/null
+++ b/prompts/skills/f3s/references/observability/freebsd.md
@@ -0,0 +1,111 @@
+# Monitoring FreeBSD Hosts (f0, f1, f2)
+
+Scraping the FreeBSD bhyve hosts from in-cluster Prometheus. Includes the
+node_exporter setup, the additional scrape config, and the recording rules
+that bridge FreeBSD metric names into the Linux-style names Grafana
+dashboards expect.
+
+## Install node_exporter on FreeBSD
+
+```sh
+# On each FreeBSD host
+doas pkg install -y node_exporter
+doas sysrc node_exporter_enable=YES
+# Bind to WireGuard interface (f0=192.168.2.130, f1=192.168.2.131, f2=192.168.2.132)
+doas sysrc node_exporter_args='--web.listen-address=192.168.2.130:9100'
+doas service node_exporter start
+```
+
+## Prometheus scrape config for FreeBSD
+
+`additional-scrape-configs.yaml`:
+
+```yaml
+- job_name: 'node-exporter'
+ static_configs:
+ - targets:
+ - '192.168.2.130:9100' # f0 via WireGuard
+ - '192.168.2.131:9100' # f1 via WireGuard
+ - '192.168.2.132:9100' # f2 via WireGuard
+ labels:
+ os: freebsd
+```
+
+```sh
+kubectl create secret generic additional-scrape-configs \
+ --from-file=additional-scrape-configs.yaml -n monitoring
+```
+
+Add to `persistence-values.yaml`:
+
+```yaml
+prometheus:
+ prometheusSpec:
+ additionalScrapeConfigsSecret:
+ enabled: true
+ name: additional-scrape-configs
+ key: additional-scrape-configs.yaml
+```
+
+## FreeBSD memory compatibility rules
+
+FreeBSD uses different metric names than Linux. PrometheusRule to create Linux-compatible metrics:
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+ name: freebsd-memory-rules
+ namespace: monitoring
+ labels:
+ release: prometheus
+spec:
+ groups:
+ - name: freebsd-memory
+ rules:
+ - record: node_memory_MemTotal_bytes
+ expr: node_memory_size_bytes{os="freebsd"}
+ - record: node_memory_MemAvailable_bytes
+ expr: |
+ node_memory_free_bytes{os="freebsd"}
+ + node_memory_inactive_bytes{os="freebsd"}
+ + node_memory_cache_bytes{os="freebsd"}
+ - record: node_memory_MemFree_bytes
+ expr: node_memory_free_bytes{os="freebsd"}
+ - record: node_memory_Buffers_bytes
+ expr: node_memory_buffer_bytes{os="freebsd"}
+ - record: node_memory_Cached_bytes
+ expr: node_memory_cache_bytes{os="freebsd"}
+```
+
+Note: Disk I/O metrics (`node_disk_*`) are not available on FreeBSD — use ZFS-specific dashboards instead.
+
+## ZFS monitoring recording rules
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+ name: freebsd-zfs-rules
+ namespace: monitoring
+ labels:
+ release: prometheus
+spec:
+ groups:
+ - name: freebsd-zfs-arc
+ interval: 30s
+ rules:
+ - record: node_zfs_arc_hit_rate_percent
+ expr: |
+ 100 * (
+ rate(node_zfs_arcstats_hits_total{os="freebsd"}[5m]) /
+ (rate(node_zfs_arcstats_hits_total{os="freebsd"}[5m]) +
+ rate(node_zfs_arcstats_misses_total{os="freebsd"}[5m]))
+ )
+ - record: node_zfs_arc_memory_usage_percent
+ expr: |
+ 100 * (
+ node_zfs_arcstats_size_bytes{os="freebsd"} /
+ node_zfs_arcstats_c_max_bytes{os="freebsd"}
+ )
+```
diff --git a/prompts/skills/f3s/references/observability/stack.md b/prompts/skills/f3s/references/observability/stack.md
new file mode 100644
index 0000000..e752419
--- /dev/null
+++ b/prompts/skills/f3s/references/observability/stack.md
@@ -0,0 +1,158 @@
+# Observability Stack (k3s side)
+
+Install and operation of the in-cluster observability components: Prometheus,
+Alloy, Loki, Tempo, Alertmanager → Gogios.
+
+## Deployment
+
+All components deployed via **ArgoCD** (GitOps). Manifests:
+```
+https://codeberg.org/snonux/conf/src/branch/master/f3s
+argocd-apps/monitoring/
+```
+
+Deployment tool: `just` (Justfile in each component directory).
+
+### Namespaces
+
+```sh
+kubectl create namespace monitoring
+```
+
+### Disabled component manifests
+
+These files exist in the repo but are renamed `.disabled` so ArgoCD ignores them:
+```
+f3s/argocd-apps/monitoring/loki.yaml.disabled
+f3s/argocd-apps/monitoring/tempo.yaml.disabled
+f3s/argocd-apps/monitoring/grafana-ingress.yaml.disabled
+```
+
+To re-enable, rename back to `.yaml` and ensure Grafana is using a non-NFS PVC (local-path).
+
+## Installing Prometheus
+
+Uses `kube-prometheus-stack` Helm chart with **Grafana subchart disabled** (`grafana.enabled: false`):
+
+```sh
+helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+helm repo update
+
+# Create NFS storage directory first
+mkdir -p /data/nfs/k3svolumes/prometheus/data
+
+cd conf/f3s/prometheus && just install
+```
+
+### Enable etcd and controller-manager scraping
+
+Add to `persistence-values.yaml`:
+
+```yaml
+kubeEtcd:
+ enabled: true
+ endpoints: [192.168.2.120, 192.168.2.121, 192.168.2.122]
+ service:
+ port: 2381
+ targetPort: 2381
+
+kubeControllerManager:
+ enabled: true
+ endpoints: [192.168.2.120, 192.168.2.121, 192.168.2.122]
+ service:
+ port: 10257
+ targetPort: 10257
+ serviceMonitor:
+ enabled: true
+ https: true
+ insecureSkipVerify: true
+```
+
+Also requires k3s config changes on each r node — see [k3s-setup/install.md](../k3s-setup/install.md).
+
+### Grafana credentials
+
+Default: `admin` / `prom-operator` — change immediately after first login.
+
+Grafana accessible at `grafana.f3s.foo.zone` via Traefik ingress.
+
+## Installing Alloy (minimal)
+
+Alloy is installed as part of the Loki Helm chart but runs with a minimal config (no log shipping):
+
+```sh
+cd conf/f3s/loki && just install
+# installs alloy only (loki itself is disabled via loki.yaml.disabled)
+```
+
+### Current Alloy config (`alloy-values.yaml`)
+
+Minimal — only emits Alloy's own operational logs:
+
+```
+logging {
+ level = "info"
+}
+```
+
+To re-enable log shipping (once Loki is running again), restore the full `discovery.kubernetes` + `loki.source.kubernetes` + `loki.write` pipeline.
+
+## Installing Loki (disabled)
+
+```sh
+mkdir -p /data/nfs/k3svolumes/loki/data
+# Rename loki.yaml.disabled → loki.yaml first, then:
+cd conf/f3s/loki && just install
+```
+
+Loki URL (internal): `http://loki.monitoring.svc.cluster.local:3100`
+
+## Installing Tempo (disabled)
+
+```sh
+mkdir -p /data/nfs/k3svolumes/tempo/data
+# Rename tempo.yaml.disabled → tempo.yaml first, then:
+cd conf/f3s/tempo && just install
+```
+
+## Alerting
+
+Prometheus → Alertmanager → **Gogios** (custom lightweight monitoring tool running on OpenBSD gateway `blowfish`/`fishfinger`).
+
+Gogios scrapes Alertmanager at regular intervals and sends email notifications. Reaches Alertmanager via WireGuard mesh.
+
+## Prometheus TSDB Recovery
+
+If Prometheus fails to start with `opening storage failed: get segment range: segments are not sequential`, WAL segments are corrupt (can happen after a cluster blip leaving zero-byte WAL files).
+
+Full TSDB wipe (loses all historical data — confirm first):
+
+```sh
+# On the NFS server (f0 or CARP MASTER)
+rm -rf /data/nfs/k3svolumes/prometheus/data/prometheus-db/
+mkdir -p /data/nfs/k3svolumes/prometheus/data/prometheus-db
+chown 1000:1000 /data/nfs/k3svolumes/prometheus/data/prometheus-db
+# Prometheus will recreate the TSDB on next start
+```
+
+## Useful LogQL Queries
+
+```
+# All logs from services namespace
+{namespace="services"}
+
+# Filter by log content
+{namespace="services"} |= "error"
+
+# Parse JSON logs
+{namespace="services"} | json | level="error"
+```
+
+## NFS Storage Paths
+
+```
+/data/nfs/k3svolumes/prometheus/data # active
+/data/nfs/k3svolumes/grafana/data # exists but unused (grafana disabled)
+/data/nfs/k3svolumes/loki/data # exists but unused (loki disabled)
+/data/nfs/k3svolumes/tempo/data # exists but unused (tempo disabled)
+```
diff --git a/prompts/skills/f3s/references/storage.md b/prompts/skills/f3s/references/storage.md
index 0658dd0..a715358 100644
--- a/prompts/skills/f3s/references/storage.md
+++ b/prompts/skills/f3s/references/storage.md
@@ -1,932 +1,18 @@
# Storage
-## Architecture Overview
-
Persistent storage for k3s is served via **NFS over stunnel** from the FreeBSD hosts, backed by **ZFS** (`zdata` pool) with **CARP** for high availability and **zrepl** for continuous replication.
-Note: Original plan was HAST, replaced by **zrepl** (ZFS send/receive) — more reliable, avoids ZFS corruption during failover that HAST caused.
-
-## Physical Disks
-
-- **f0**: 512GB M.2 (OS/zroot) + Samsung SSD 870 EVO 1TB (zdata)
-- **f1**: 512GB M.2 (OS/zroot) + Crucial CT1000BX500SSD1 1TB (zdata)
-- **f2**: No second drive (no zdata pool)
-- **f3**: 512GB M.2 (OS/zroot); no zdata pool yet (planned)
-
-## ZFS: zdata Pool Setup
-
-On f0 and f1, create the zdata pool on the second SSD:
-
-```sh
-# Pool setup (f0 and f1 only)
-doas zpool create zdata ada1 # ada1 = second SSD
-```
-
-## SSD TRIM Configuration
-
-All f-hosts run on consumer SATA SSDs without power-loss protection
-(SanDisk Ultra 3D, Samsung 870 EVO, Crucial BX500). Without TRIM, the
-SSD controller can't reclaim freed pages and write amplification
-explodes — observed on f0 (2026-05-16) as txg sync times of 5-14
-seconds (should be <100 ms) and per-op latency of 374 ms (should be
-<5 ms on an SSD). The encrypted dataset makes this worse because
-AES-256-GCM ciphertext is full-entropy and the controller can't
-opportunistically reclaim space.
-
-Enable `autotrim` on every pool on every f-host (`zdata` and `zroot`
-on f0/f1/f2; `zroot` only on f3):
-
-```sh
-# Persisted in pool metadata — survives reboot
-for pool in $(zpool list -H -o name); do
- doas zpool set autotrim=on "$pool"
-done
-```
-
-After turning autotrim on for the first time (or on a pool that has
-never been trimmed), run a one-shot pool-wide TRIM to catch up on all
-the historical free space the controller has been managing blind:
-
-```sh
-for pool in $(zpool list -H -o name); do
- doas zpool trim "$pool" # async; monitor with `zpool status -t`
-done
-```
-
-Caveat: `zpool trim` runs at low ZFS priority. On a heavily-loaded
-disk (active rsync, frequent zrepl snapshots, bhyve VM under load) it
-can stall at 0% indefinitely because regular I/O never drains.
-Quietening the workload first (kill rsync, raise zrepl `interval` from
-`1m` to `15m`+, pause/cancel scrub) lets TRIM make progress; once
-caught up, autotrim keeps it steady-state in the background.
-
-Verify across the fleet:
-
-```sh
-for h in f0 f1 f2 f3; do
- printf '%-3s ' "$h"
- ssh "$h" "sh -c 'for p in \$(zpool list -H -o name); do \
- printf \"%s=%s \" \"\$p\" \"\$(zpool get -H -o value autotrim \$p)\"; \
- done; echo'"
-done
-```
-
-## ZFS Encryption Keys (USB Key Storage)
-
-Encryption keys are stored on USB flash drives (UFS-formatted, mounted at `/keys`).
-All four hosts (f0/f1/f2/f3) have USB keys at `/dev/da0` mounted at `/keys`, each holding
-all 8 key files as cross-host backups.
-
-```sh
-# Format and mount USB key (on each node)
-doas newfs /dev/da0
-echo '/dev/da0 /keys ufs rw 0 2' | doas tee -a /etc/fstab
-doas mkdir /keys
-doas mount /keys
-
-# Generate keys (on f0, then copy to f1, f2, f3)
-doas openssl rand -out /keys/f0.lan.buetow.org:bhyve.key 32
-doas openssl rand -out /keys/f1.lan.buetow.org:bhyve.key 32
-doas openssl rand -out /keys/f2.lan.buetow.org:bhyve.key 32
-doas openssl rand -out /keys/f3.lan.buetow.org:bhyve.key 32
-doas openssl rand -out /keys/f0.lan.buetow.org:zdata.key 32
-doas openssl rand -out /keys/f1.lan.buetow.org:zdata.key 32
-doas openssl rand -out /keys/f2.lan.buetow.org:zdata.key 32
-doas openssl rand -out /keys/f3.lan.buetow.org:zdata.key 32
-doas chown root /keys/* && doas chmod 400 /keys/*
-# Copy to f1, f2, f3 via tarball
-```
-
-## ZFS Encryption Setup
-
-```sh
-# On f0 - create encrypted zdata dataset
-doas zfs create -o encryption=on -o keyformat=raw \
- -o keylocation=file:///keys/f0.lan.buetow.org:zdata.key zdata/enc
-
-# Create the NFS data dataset (replicated to f1)
-doas zfs create zdata/enc/nfsdata
-doas zfs set mountpoint=/data/nfs zdata/enc/nfsdata
-doas mkdir -p /data/nfs/k3svolumes
-
-# Encrypt Bhyve VM dataset (zroot/bhyve)
-# Stop VMs first, rename old, create new encrypted, zfs send snapshot, then destroy old
-doas vm stop rocky
-doas zfs rename zroot/bhyve zroot/bhyve_old
-doas zfs set mountpoint=/mnt zroot/bhyve_old
-doas zfs snapshot zroot/bhyve_old/rocky@hamburger
-doas zfs create -o encryption=on -o keyformat=raw \
- -o keylocation=file:///keys/f0.lan.buetow.org:bhyve.key zroot/bhyve
-doas zfs send zroot/bhyve_old/rocky@hamburger | doas zfs recv zroot/bhyve/rocky
-# Copy vm-bhyve metadata: .config, .img, .templates, .iso
-doas zfs destroy -R zroot/bhyve_old
-```
-
-### Auto-load encryption keys on boot
-
-```sh
-# On f0
-doas sysrc zfskeys_enable=YES
-doas sysrc zfskeys_datasets="zdata/enc zdata/enc/nfsdata zroot/bhyve"
-
-# On f1
-doas sysrc zfskeys_enable=YES
-doas sysrc zfskeys_datasets="zdata/enc zroot/bhyve zdata/sink/f0/zdata/enc/nfsdata"
-
-# On f3 (bhyve VMs only, no zdata pool yet)
-doas sysrc zfskeys_enable=YES
-doas sysrc zfskeys_datasets="zroot/bhyve"
-doas zfs set keylocation=file:///keys/f0.lan.buetow.org:zdata.key \
- zdata/sink/f0/zdata/enc/nfsdata
-```
-
-## zrepl: Continuous ZFS Replication (f0 → f1)
-
-Install on both f0 and f1:
-```sh
-doas pkg install -y zrepl
-```
-
-### f0 configuration (`/usr/local/etc/zrepl/zrepl.yml`)
-
-```yaml
-global:
- logging:
- - type: stdout
- level: info
- format: human
-
-jobs:
- - name: f0_to_f1_nfsdata
- type: push
- connect:
- type: tcp
- address: "192.168.2.131:8888" # f1 WireGuard IP
- filesystems:
- "zdata/enc/nfsdata": true
- send:
- encrypted: true
- snapshotting:
- type: periodic
- prefix: zrepl_
- interval: 1m # every minute
- pruning:
- keep_sender:
- - type: last_n
- count: 10
- - type: grid
- grid: 24x1h | 14x1d | 6x30d
- regex: "^zrepl_.*"
- keep_receiver:
- - type: last_n
- count: 10
- - type: grid
- grid: 24x1h | 14x1d | 6x30d
- regex: "^zrepl_.*"
-
- # Note: f0_to_f1_freebsd job removed — the FreeBSD VM was migrated to f3.
- # It is now replicated from f3 → f2 (see f3 zrepl config below).
-```
-
-### f3 configuration (push: freebsd VM → f2)
-
-```yaml
-global:
- logging:
- - type: stdout
- level: info
- format: human
-
-jobs:
- - name: f3_to_f2_freebsd
- type: push
- connect:
- type: tcp
- address: "192.168.2.132:8888" # f2 WireGuard IP
- filesystems:
- "zroot/bhyve/freebsd": true # development FreeBSD VM
- send:
- encrypted: true
- snapshotting:
- type: periodic
- prefix: zrepl_
- interval: 10m
- pruning:
- keep_sender:
- - type: last_n
- count: 10
- - type: grid
- grid: 24x1h | 14x1d
- regex: "^zrepl_.*"
- keep_receiver:
- - type: last_n
- count: 10
- - type: grid
- grid: 24x1h | 14x1d
- regex: "^zrepl_.*"
-```
-
-### f2 configuration (sink for f3's freebsd VM)
-
-f2 has no second drive so the sink lives in `zroot/sink`:
-
-```sh
-doas zfs create zroot/sink
-```
-
-`/usr/local/etc/zrepl/zrepl.yml`:
-
-```yaml
-global:
- logging:
- - type: stdout
- level: info
- format: human
-
-jobs:
- - name: sink
- type: sink
- serve:
- type: tcp
- listen: "192.168.2.132:8888" # f2 WireGuard IP
- clients:
- "192.168.2.133": "f3"
- recv:
- placeholder:
- encryption: inherit
- root_fs: "zroot/sink"
-```
-
-Replicated path: `zroot/bhyve/freebsd` → `zroot/sink/f3/zroot/bhyve/freebsd`
-
-Important: do not let `zfs-periodic` snapshot zrepl-managed sender or receiver
-datasets. Snapshot creation should be owned by zrepl. On f2,
-`/etc/periodic.conf` disables `zfs-periodic` snapshot creation:
-
-```sh
-daily_zfs_snapshot_enable="NO"
-weekly_zfs_snapshot_enable="NO"
-monthly_zfs_snapshot_enable="NO"
-```
-
-The local zrepl `snap` job on f2 also explicitly excludes `zroot/sink<`.
-
-### f1 configuration (sink)
-
-```sh
-doas zfs create zdata/sink # receive dataset
-```
-
-`/usr/local/etc/zrepl/zrepl.yml`:
-
-```yaml
-global:
- logging:
- - type: stdout
- level: info
- format: human
-
-jobs:
- - name: sink
- type: sink
- serve:
- type: tcp
- listen: "192.168.2.131:8888"
- clients:
- "192.168.2.130": "f0"
- recv:
- placeholder:
- encryption: inherit
- root_fs: "zdata/sink"
-```
-
-### Enable and start
-
-```sh
-doas sysrc zrepl_enable=YES
-doas service zrepl start
-doas zrepl status # monitor replication
-```
-
-Replicated paths: `zdata/enc/nfsdata` → `zdata/sink/f0/zdata/enc/nfsdata`
-
-### Mount replica on f1 (read-only standby)
-
-```sh
-doas zfs load-key -L file:///keys/f0.lan.buetow.org:zdata.key \
- zdata/sink/f0/zdata/enc/nfsdata
-doas mkdir -p /data/nfs
-doas zfs set mountpoint=/data/nfs zdata/sink/f0/zdata/enc/nfsdata
-doas zfs mount zdata/sink/f0/zdata/enc/nfsdata
-doas zfs set readonly=on zdata/sink/f0/zdata/enc/nfsdata # prevent replication breakage
-```
-
-### Failover design: intentionally read-only replica
-
-The standby replica is read-only by design. Manual failover (not automatic) to prevent split-brain. To fix broken replication after accidental writes: `doas zfs rollback <snapshot>`.
-
-### zrepl troubleshooting
-
-```sh
-# Signal manual replication
-doas zrepl signal wakeup f0_to_f1_nfsdata
-
-# Fix "no common snapshot" — destroy and re-replicate
-doas zfs destroy -r zdata/sink/f0/zdata/enc/nfsdata
-
-# Test network connectivity
-nc -zv 192.168.2.131 8888
-
-# Monitor progress
-doas zrepl status --mode raw | grep BytesReplicated
-```
-
-**zrepl DL-state on f1 after mid-replication f0 reboot**: if f0 reboots while zrepl is
-actively replicating, f1's `[zfskern]` thread can enter **DL state** (disk + locked).
-Symptoms: `zpool list`, `zfs list`, `ls /data/nfs/` all hang indefinitely; `zfs set
-readonly=off` may return immediately (the kernel path differs). To recover on f1:
-
-```sh
-# Stop zrepl to release the replication lock
-doas service zrepl stop
-
-# Wait ~30–60 s for the kernel state to drain; then verify
-doas zpool list
-doas zfs list
-doas service zrepl start
-```
-
-If ZFS commands still hang after stopping zrepl, a reboot of f1 is required.
-The NFS data is still available on f0 so k3s is unaffected during f1 recovery.
-
-## CARP: High-Availability VIP
-
-CARP (Common Address Redundancy Protocol) provides **VIP 192.168.1.138** that floats between f0 (primary) and f1 (standby).
-
-### /etc/rc.conf configuration
-
-```sh
-# On f0 (default advskew=0, wins elections)
-ifconfig_re0_alias0="inet vhid 1 pass YOURPASSWORD alias 192.168.1.138/32"
-
-# On f1 (advskew=100, loses elections to f0)
-ifconfig_re0_alias0="inet vhid 1 advskew 100 pass YOURPASSWORD alias 192.168.1.138/32"
-```
-
-### Load CARP module
-
-```sh
-echo 'carp_load="YES"' | doas tee -a /boot/loader.conf
-# or immediately: doas kldload carp
-```
-
-### /etc/hosts for CARP VIP
-
-```
-192.168.1.138 f3s-storage-ha f3s-storage-ha.lan f3s-storage-ha.lan.buetow.org
-192.168.2.138 f3s-storage-ha.wg0 f3s-storage-ha.wg0.wan.buetow.org
-```
-
-### devd: CARP state change hook
-
-Add to `/etc/devd.conf` on f0 and f1:
-
-```
-notify 0 {
- match "system" "CARP";
- match "subsystem" "[0-9]+@[0-9a-z.]+";
- match "type" "(MASTER|BACKUP)";
- action "/usr/local/bin/carpcontrol.sh $subsystem $type";
-};
-```
-
-```sh
-doas service devd restart
-```
-
-### carpcontrol.sh — start/stop NFS+stunnel on failover
-
-```sh
-#!/bin/sh
-HOSTNAME=`hostname`
-
-if [ ! -f /data/nfs/nfs.DO_NOT_REMOVE ]; then
- logger '/data/nfs not mounted, mounting it now!'
- if [ "$HOSTNAME" = 'f0.lan.buetow.org' ]; then
- zfs load-key -L file:///keys/f0.lan.buetow.org:zdata.key zdata/enc/nfsdata
- zfs set mountpoint=/data/nfs zdata/enc/nfsdata
- else
- zfs load-key -L file:///keys/f0.lan.buetow.org:zdata.key zdata/sink/f0/zdata/enc/nfsdata
- zfs set mountpoint=/data/nfs zdata/sink/f0/zdata/enc/nfsdata
- zfs mount zdata/sink/f0/zdata/enc/nfsdata
- zfs set readonly=on zdata/sink/f0/zdata/enc/nfsdata
- fi
- service nfsd stop 2>&1
- service mountd stop 2>&1
-fi
-
-case "$2" in
- MASTER)
- logger "CARP state changed to MASTER, starting services"
- service rpcbind start >/dev/null 2>&1
- service mountd start >/dev/null 2>&1
- service nfsd start >/dev/null 2>&1
- service nfsuserd start >/dev/null 2>&1
- service stunnel restart >/dev/null 2>&1
- ;;
- BACKUP)
- logger "CARP state changed to BACKUP, stopping services"
- service stunnel stop >/dev/null 2>&1
- service nfsd stop >/dev/null 2>&1
- service mountd stop >/dev/null 2>&1
- service nfsuserd stop >/dev/null 2>&1
- ;;
-esac
-```
-
-Install: `doas chmod +x /usr/local/bin/carpcontrol.sh` (copy to f1 too)
-
-### CARP management script (`/usr/local/bin/carp`)
-
-```sh
-doas carp # show current state
-doas carp master # force MASTER (e.g. reclaim after maintenance)
-doas carp backup # force BACKUP (trigger failover to f1)
-doas carp auto-failback disable # prevent auto-failback (for maintenance)
-doas carp auto-failback enable # re-enable auto-failback
-```
-
-### CARP failover limitation when ZFS is suspended
-
-If f0's ZFS pool is SUSPENDED but f0's OS is still running, f0 remains CARP MASTER
-(it keeps sending CARP advertisements). Attempts to manually demote f0 via:
-
-```sh
-doas carp backup # may return exit=0 but has no effect
-doas ifconfig re0 vhid 1 state backup # may return exit=1 silently
-doas ifconfig re0 vhid 1 advskew 254 # may return exit=1 silently
-```
-
-…can all silently fail because the kernel has too many stuck IO threads blocking
-the ifconfig ioctl path. The CARP VIP will **not** float to f1 in this case.
-**Only a hard power cycle of f0 reliably triggers CARP failover.**
-
-### Auto-failback from f1 to f0
-
-Script `/usr/local/bin/carp-auto-failback.sh` runs every minute via cron on f0. Checks: currently BACKUP? `/data/nfs` mounted? Marker file exists? Failback not blocked? If all conditions met, promotes f0 to MASTER.
-
-```sh
-echo "* * * * * /usr/local/bin/carp-auto-failback.sh" | doas crontab -
-doas touch /data/nfs/nfs.DO_NOT_REMOVE # marker file required for auto-failback
-```
-
-Logs to `/var/log/carp-auto-failback.log`.
-
-## NFS Server Configuration (f0 and f1)
-
-```sh
-doas sysrc nfs_server_enable=YES
-doas sysrc nfsv4_server_enable=YES
-doas sysrc nfsuserd_enable=YES
-doas sysrc nfsuserd_flags="-domain lan.buetow.org"
-doas sysrc mountd_enable=YES
-doas sysrc rpcbind_enable=YES
-doas sysrc nfs_reserved_port_only=NO # Required for NFS over stunnel (unprivileged ports)
-
-doas mkdir -p /data/nfs/k3svolumes
-doas chmod 755 /data/nfs/k3svolumes
-```
-
-> **FreeBSD 15.0 note**: FreeBSD 15.0 sets `nfs_reserved_port_only=YES` by default in `/etc/defaults/rc.conf`. The nfsd rc script (`/etc/rc.d/nfsd`) checks this variable and explicitly runs `sysctl vfs.nfsd.nfs_privport=1` at startup, overriding any value set in `/etc/sysctl.conf` or `/boot/loader.conf`. This blocks NFS clients connecting via stunnel (unprivileged ports). Fix on **each f-host**:
-> ```sh
-> # The ONLY correct fix — setting sysctl.conf does NOT work
-> doas sysrc nfs_reserved_port_only=NO
-> # Apply immediately without reboot
-> doas sysctl vfs.nfsd.nfs_privport=0
-> # Remount on each r-host
-> mount -a
-> ```
-
-`/etc/exports` (stunnel clients appear as localhost):
-
-```
-V4: /data/nfs -sec=sys
-/data/nfs -alldirs -maproot=root -network 127.0.0.1 -mask 255.255.255.255
-```
-
-Start services:
-
-```sh
-doas service rpcbind start
-doas service mountd start
-doas service nfsd start
-doas service nfsuserd start
-```
-
-## stunnel: Encrypted NFS over TLS
-
-stunnel binds to the CARP VIP (192.168.1.138), so only the CARP MASTER accepts connections. Uses mutual TLS with client certificate authentication.
-
-### Create CA and certificates (on f0)
-
-```sh
-doas mkdir -p /usr/local/etc/stunnel/ca
-cd /usr/local/etc/stunnel/ca
-doas openssl genrsa -out ca-key.pem 4096
-doas openssl req -new -x509 -days 3650 -key ca-key.pem -out ca-cert.pem \
- -subj '/C=US/ST=State/L=City/O=F3S Storage/CN=F3S Stunnel CA'
-
-cd /usr/local/etc/stunnel
-doas openssl genrsa -out server-key.pem 4096
-doas openssl req -new -key server-key.pem -out server.csr \
- -subj '/C=US/ST=State/L=City/O=F3S Storage/CN=f3s-storage-ha.lan'
-doas openssl x509 -req -days 3650 -in server.csr -CA ca/ca-cert.pem \
- -CAkey ca/ca-key.pem -CAcreateserial -out server-cert.pem
-
-# Client certs for r0, r1, r2, earth
-for client in r0 r1 r2 earth; do
- openssl genrsa -out ca/${client}-key.pem 4096
- openssl req -new -key ca/${client}-key.pem -out ca/${client}.csr \
- -subj "/C=US/ST=State/L=City/O=F3S Storage/CN=${client}.lan.buetow.org"
- openssl x509 -req -days 3650 -in ca/${client}.csr -CA ca/ca-cert.pem \
- -CAkey ca/ca-key.pem -CAcreateserial -out ca/${client}-cert.pem
- cat ca/${client}-cert.pem ca/${client}-key.pem > ca/${client}-stunnel.pem
-done
-```
-
-### stunnel server config (`/usr/local/etc/stunnel/stunnel.conf`)
-
-```
-cert = /usr/local/etc/stunnel/server-cert.pem
-key = /usr/local/etc/stunnel/server-key.pem
-setuid = stunnel
-setgid = stunnel
-
-[nfs-tls]
-accept = 192.168.1.138:2323
-connect = 127.0.0.1:2049
-CAfile = /usr/local/etc/stunnel/ca/ca-cert.pem
-verify = 2
-requireCert = yes
-```
-
-```sh
-doas pkg install -y stunnel
-doas sysrc stunnel_enable=YES
-doas service stunnel start
-# Copy certs to f1 via tarball, configure identically
-```
-
-## NFS Client Configuration (Rocky Linux r0, r1, r2)
-
-```sh
-dnf install -y stunnel nfs-utils
-
-# Copy client cert and CA from f0
-scp f0:/usr/local/etc/stunnel/ca/r0-stunnel.pem /etc/stunnel/
-scp f0:/usr/local/etc/stunnel/ca/ca-cert.pem /etc/stunnel/
-```
-
-`/etc/stunnel/stunnel.conf` (r0 example):
-
-```
-cert = /etc/stunnel/r0-stunnel.pem
-CAfile = /etc/stunnel/ca-cert.pem
-client = yes
-verify = 2
-
-[nfs-ha]
-accept = 127.0.0.1:2323
-connect = 192.168.1.138:2323
-```
-
-```sh
-systemctl enable --now stunnel
-```
-
-### NFSv4 user mapping
-
-`/etc/idmapd.conf` on r0, r1, r2:
-
-```
-[General]
-Domain = lan.buetow.org
-```
-
-Fix inotify limit:
-
-```sh
-echo 'fs.inotify.max_user_instances = 512' > /etc/sysctl.d/99-inotify.conf
-sysctl -w fs.inotify.max_user_instances=512
-systemctl enable --now nfs-client.target nfs-idmapd
-```
-
-### Mount NFS
-
-```sh
-mkdir -p /data/nfs/k3svolumes
-mount -t nfs4 -o port=2323 127.0.0.1:/k3svolumes /data/nfs/k3svolumes
-```
-
-`/etc/fstab`:
-
-```
-127.0.0.1:/k3svolumes /data/nfs/k3svolumes nfs4 port=2323,_netdev,hard,timeo=600,retrans=3 0 0
-```
-
-NFS path structure on k3s nodes: `/data/nfs/k3svolumes/<app>/`
-
-## NFS Troubleshooting
-
-### All r-nodes show "access denied" when mounting NFS
-
-**Most likely cause**: `vfs.nfsd.nfs_privport=1` on the CARP MASTER. This happens after f-host reboots if `nfs_reserved_port_only` is not set to `NO` in rc.conf. The nfsd rc script (`/etc/rc.d/nfsd`) explicitly sets the sysctl based on this variable, overriding `/etc/sysctl.conf`. Fix: `doas sysrc nfs_reserved_port_only=NO` on both f0 and f1.
-
-### stunnel appears not running but port 2323 is bound
-
-`carpcontrol.sh` starts stunnel on CARP MASTER transition, but doesn't write a PID file. So `service stunnel status` reports "not running" even though stunnel is actually serving connections. Check with `doas sockstat -l | grep 2323`. If there's a stale stunnel process, kill it and restart: `doas kill <pid> && doas service stunnel start`.
-
-### Pods stuck in ContainerCreating/Unknown after NFS recovery
-
-After NFS is restored on the server side, the `nfs-mount-monitor` systemd timer on each r-node will auto-remount within ~10 seconds and force-delete stuck pods. If immediate recovery is needed: `mount /data/nfs/k3svolumes` on each r-node, then delete the stuck pods manually.
-
-**Note:** The monitor catches three failure modes: missing mountpoint, stat hang (reads unresponsive), and **silent write hang** (reads OK but writes block — the hardest case, e.g. stunnel-wrapped NFSv4 after a CARP failover). Watch the consecutive-failure counter via Prometheus (`nfs_mount_monitor_consecutive_failures`) — warning fires at ≥3, critical at ≥5. At 5 consecutive failures the node cordons itself and reboots.
-
-### ZFS pool SUSPENDED recovery
-
-**Symptoms**: `doas zpool status zdata` shows `state: SUSPENDED`. All IO to the pool is
-halted — ZFS suspends itself to prevent corruption when IO errors exceed the threshold.
-Commands like `zpool clear`, `zpool scrub`, `zpool offline`, and even `ls /data/nfs/` hang
-indefinitely because they wait for kernel IO that will never complete.
-
-**Known cause (2026-05-15)**: Samsung 870 EVO 1TB on f0 (ada1) hit 107 read errors and
-105M+ write errors during normal operation — likely thermal throttling or a momentary
-SATA connection loss. A previous resilver on 2026-01-27 suggests the drive has been
-marginal for months.
-
-**Recovery — hard power cycle only**:
-- Do NOT attempt `doas shutdown -r now` — if ZFS is suspended, the graceful shutdown hangs
- at ZFS pool export and may stay stuck for 30–60+ minutes.
-- Do NOT attempt `doas zpool clear zdata` — it hangs because ada1 is unresponsive.
-- Do NOT attempt `doas ifconfig re0 vhid 1 state backup` or `doas carp backup` to fail
- over to f1 first — these ifconfig ioctls can also be blocked when the kernel has too
- many stuck IO threads. They may return exit=1 silently.
-- **Hard power cycle** (pull power or hold the power button) resolves the issue in ~9 s
- (Rocky Linux VMs come up automatically, ZFS pool imports cleanly on next boot).
-
-**Post-recovery**:
-```sh
-# 1. Verify pool health
-doas zpool status zdata # should show ONLINE, 0 errors
-
-# 2. Check SMART for drive health
-doas smartctl -a /dev/ada1 | grep -iE '(temperature|reallocated|pending|uncorrectable|error)'
-
-# 3. Start a scrub to verify data integrity
-doas zpool scrub zdata
-doas zpool status zdata # monitor; "scrub repaired 0 in ..." means data intact
-
-# 4. Verify NFS is serving (stunnel listening on CARP VIP)
-doas sockstat -l | grep 2323
-```
-
-**After cluster recovery**:
-- Check for cordoned nodes: `kubectl get nodes` — if r0/r1/r2 show `SchedulingDisabled`,
- uncordon them (see nfs-mount-monitor escalation section above).
-- Reset fail counters on all r-nodes: `echo 0 > /var/lib/nfs-mount-monitor/fail-count`
-
-**Temperature monitoring** to detect thermal issues before they cause pool suspension:
-```sh
-# FreeBSD: load coretemp for CPU package temperature
-doas kldload coretemp
-sysctl -a | grep temperature # hw.acpi.thermal.*: and dev.cpu.*:
-# Persist across reboots
-echo 'coretemp_load="YES"' | doas tee -a /boot/loader.conf
-
-# SSD temperature (install smartmontools if absent)
-doas pkg install -y smartmontools
-doas smartctl -a /dev/ada1 | grep -i temperature # "194 Temperature_Celsius"
-```
-
-## Thermal Troubleshooting
-
-### Symptoms of thermal throttling on f-hosts
-
-- SSD I/O slowness (writes dropping from MB/s to KB/s)
-- ZFS txg sync times jumping from <100ms to 5-37 seconds
-- `zpool trim` stuck at 0% or paused indefinitely
-- rsync / zrepl jobs going into D-state (waiting on ZFS I/O)
-- High system CPU (80%+) from encryption overhead (ZFS native AES-256-GCM)
-
-### How to check temperatures
-
-- **coretemp (real per-core die temps)**: `kldload coretemp; sysctl dev.cpu | grep temperature`
- - Should now auto-load via `/boot/loader.conf` (`coretemp_load="YES"`)
-- **hw.acpi.thermal.tz0**: Often a constant lie (e.g. always 27.9°C) — do NOT rely on it
-- **SSD temperature**: `smartctl -a /dev/adaN` (requires smartmontools; may not be installed)
-- **Disk I/O performance**: `gstat -bp -I 1s -d` (FreeBSD gstat, not Linux iostat)
-- **ZFS txg sync times**: `zpool events | grep -i sync` or check via `zpool status -v`
-
-### Beelink S12 Pro specifics
-
-- Small enclosure with passive/minimal cooling — heat accumulates fast under sustained load
-- N100 CPU: normal idle ~40-55°C, warn >70°C idle, critical >85°C under load
-- NVMe sits close to CPU — both heat each other in the small chassis
-- Enclosure gets hot to the touch before temps fully register in software
-
-### Cascade failure pattern (2026-05-16 f0 incident)
-
-The following cascade was observed:
-
-1. Hot enclosure (NVMe physically very hot) → SSD thermal throttling
-2. Concurrent rsync + 1-min zrepl snapshots + paused scrub → high I/O demand
-3. autotrim=off (never trimmed) → SSD write amplification → further slowdown
-4. ZFS native AES-256-GCM encryption → high CPU per I/O → txg sync times 5-37s
-5. TRIM stuck at 0% for hours (couldn't make progress under continuous I/O load)
-6. rsync went into D-state waiting on ZFS → appeared "hung"
-
-**Root causes**: (a) autotrim=off (SSD never trimmed); (b) hot enclosure + thermal throttling;
-(c) zrepl snapshot interval too aggressive (1m).
-
-**Resolution**: Reseat/inspect drive + enclosure. After hardware fix, autotrim=on enabled,
-manual TRIM ran to completion at ~2.4 GB/s. See "SSD TRIM Configuration" section.
-
-### Remediation steps
-
-1. SSH in and check temps: `kldload coretemp && sysctl dev.cpu | grep temperature`
-2. If >80°C: stop heavy I/O workloads immediately (`service zrepl stop`, cancel scrubs)
-3. Physical: shut down, reseat NVMe, clean dust from vents, improve airflow
-4. After hardware fix: enable autotrim (`zpool set autotrim=on <pool>`) and run `zpool trim <pool>`
-5. Monitor trim progress: `zpool status | grep trim`
-6. Persist coretemp: ensure `/boot/loader.conf` has `coretemp_load="YES"` (see task 95)
-
-### Checklist for NFS outage on CARP MASTER (f0 or f1)
-
-```sh
-# 1. Check which host is CARP MASTER
-ssh paul@f0 'ifconfig re0 | grep carp'
-ssh paul@f1 'ifconfig re0 | grep carp'
-
-# 2. On the MASTER, verify:
-doas sysctl vfs.nfsd.nfs_privport # must be 0
-doas service nfsd status # must be running
-doas sockstat -l | grep 2323 # stunnel must be listening
-ls /data/nfs/nfs.DO_NOT_REMOVE # ZFS dataset must be mounted
-
-# 3. Fix if needed:
-doas sysrc nfs_reserved_port_only=NO # persist the fix
-doas sysctl vfs.nfsd.nfs_privport=0 # apply immediately
-doas service nfsd restart
-# For stunnel, kill stale process if needed, then:
-doas service stunnel start
-```
-
-## NFS Auto-Repair: nfs-mount-monitor
-
-A systemd timer+service pair on r0/r1/r2 checks the NFS mount every 10 seconds and automatically repairs it if stale or missing.
-
-### Repo location
-
-```
-f3s/r-nodes/nfs-mount-monitor/
- check-nfs-mount.sh # repair script → /usr/local/bin/
- nfs-mount-monitor.service # one-shot service → /etc/systemd/system/
- nfs-mount-monitor.timer # 10-second timer → /etc/systemd/system/
-f3s/r-nodes/Rexfile # Rex deploy task: nfs_mount_monitor
-```
-
-### Deploy
-
-```sh
-# From repo root — pushes to all three r-nodes and reloads systemd if anything changed
-rex -f f3s/r-nodes/Rexfile nfs_mount_monitor
-```
-
-### What it does
-
-Three probes run in sequence on every 10-second tick:
-
-1. **mountpoint probe** — detects completely missing mounts.
-2. **stat probe** (`timeout 2s stat`) — detects read hangs / stale cache misses.
-3. **write probe** (`timeout 5s sh -c "echo $$ > .healthcheck.<host> && rm -f ..."`) —
- detects the "reads OK, writes hang" failure mode. Stunnel-wrapped NFSv4 can enter
- a state where `stat` returns from cache but all writes block indefinitely; only this
- probe catches it.
-
-If any probe fails, `fix_mount` runs:
-
-1. `mount -o remount -f` (cheapest, no disruption if mount is merely stale)
-2. Kill D-state processes pinning the mount (`kill_pinning_processes` — SIGKILLs
- processes whose `wchan` starts with `nfs_` and whose cwd/fds point into the mountpoint)
-3. `umount -f` (force unmount)
-4. `umount -l` (lazy detach VFS node if `-f` failed)
-5. `systemctl restart stunnel` + 2s sleep (refresh the TLS transport)
-6. `mount -t nfs4 -o port=2323,soft,timeo=50,retrans=3` (explicit soft NFS mount — NOT
- `mount $MOUNT_POINT` which reads fstab's `hard` flag and enters uninterruptible D-state
- if the server is unreachable; SIGKILL cannot wake a D-state process on Linux;
- `soft,timeo=50,retrans=3` returns ETIMEDOUT after ~15 s so the fail counter can
- increment and eventually trigger the reboot escalation)
-
-A hard **60-second deadline** prevents `fix_mount` from outlasting its own timer interval.
-
-On successful repair, force-deletes pods on this node stuck in
-Unknown / Pending / ContainerCreating so the kubelet can reschedule them.
-
-**Consecutive-failure escalation**: each `fix_mount` failure increments a counter
-persisted to `/var/lib/nfs-mount-monitor/fail-count`. At `NFS_FAIL_THRESHOLD=5`
-consecutive failures (~50 s), the node cordons itself (`kubectl cordon`) and issues
-`systemctl reboot`. The cordon is stored in etcd and **persists across reboots** —
-after the underlying NFS issue is resolved, manually uncordon each affected node:
-```sh
-kubectl uncordon r0.lan.buetow.org
-kubectl uncordon r1.lan.buetow.org
-kubectl uncordon r2.lan.buetow.org
-```
-
-The counter is also exported to `/var/lib/node_exporter/textfile_collector/nfs_mount_monitor.prom`
-so Prometheus can alert on `nfs_mount_monitor_consecutive_failures` without parsing
-journal logs (warning ≥3, critical ≥5 — see
-`f3s/prometheus/manifests/nfs-mount-monitor-alerts.yaml`).
-
-Uses a lock file (`/var/run/nfs-mount-check.lock`) to prevent overlapping runs
-since the timer fires faster than the script's worst-case runtime. If the lock is
-older than **90 seconds** it was left by a run that was SIGKILLed before its EXIT
-trap could clean up (systemd kills with SIGKILL after its own timeout, bypassing
-`trap "rm -f $LOCK_FILE" EXIT`); the stale lock is removed and the run continues,
-preventing all health checks from being silently skipped forever.
-
-### Timer configuration
-
-| Parameter | Value | Reason |
-|-----------|-------|--------|
-| `OnBootSec` | 30s | Let network and NFS client start before first check |
-| `OnUnitActiveSec` | 10s | Check interval; each run is bounded by a 60-second deadline |
-| `AccuracySec` | 1s | Prevent systemd batching from delaying the 10 s interval |
-
-### Managing the monitor during an extended NFS outage
-
-During a prolonged NFS outage (e.g. while the storage host is being power-cycled or
-repaired), stop the timer on affected r-nodes to prevent the escalation counter from
-reaching the auto-reboot threshold prematurely:
-
-```sh
-# On each affected r-node (as root)
-systemctl stop nfs-mount-monitor.timer
-echo 0 > /var/lib/nfs-mount-monitor/fail-count # reset counter
-
-# After NFS is restored, restart and verify
-systemctl start nfs-mount-monitor.timer
-journalctl -u nfs-mount-monitor -f
-```
-
-Also reset the counter to 0 after uncordoning nodes (see escalation section above),
-because the old counter value would lower the effective threshold for the next outage.
-
-### Status and logs
-
-```sh
-systemctl status nfs-mount-monitor.timer
-journalctl -u nfs-mount-monitor -f
-```
-
-## AWS S3 Glacier Deep Archive Backups
-
-Encrypted incremental ZFS snapshots from `zdata` pool backed up daily to **AWS S3 Glacier Deep Archive** via cron. Scripts adapted from FreeBSD Home NAS setup. Also performs periodic zpool scrubbing.
-
-## Local-Path Storage for SQLite Workloads
-
-Some k3s workloads use `local-path` (k3s default storageClass) instead of NFS for
-their data volumes. This is appropriate when:
-
-- The application uses SQLite: NFS file-lock semantics cause `fcntl()` races on
- pod restarts, and `Recreate` strategy only reduces (not eliminates) the risk.
-- Cache-heavy workloads: NFS over stunnel adds TLS round-trip latency to every
- cache read. Navidrome's image/background cache init took ~19s over NFS; it
- takes ~25ms from local disk.
-
-**Trade-off**: a local-path PV lives on one specific node. If that node is down,
-the pod reschedules elsewhere but finds no data volume — it starts with an empty DB,
-losing play history, scrobble queue, etc. For a home server this is acceptable.
-The deployment must pin the pod to the same node via `nodeSelector` so the local
-PV is always reachable.
-
-### Workloads using local-path
-
-| App | Node | Path on node |
-|-----|------|--------------|
-| navidrome `/data` (DB + cache) | r1 | `/var/lib/rancher/k3s/storage/pvc-*_services_navidrome-data-pvc` |
+Note: original plan was HAST, replaced by **zrepl** (ZFS send/receive) — more reliable, avoids the ZFS corruption during failover that HAST caused.
-### Migrating NFS hostPath → local-path
+## Sub-references
-1. Disable ArgoCD auto-sync: `kubectl patch application <app> -n cicd --type=json -p='[{"op":"replace","path":"/spec/syncPolicy","value":{}}]'`
-2. Scale deployment to 0: `kubectl scale deployment <app> -n services --replicas=0`
-3. Delete old PVC and static PV.
-4. Create new PVC with `storageClassName: local-path`.
-5. Create a migration pod pinned to the target node that mounts both the NFS hostPath
- (source) and the new PVC (target); copy data with `cp -av /src/. /dst/`.
-6. Delete migration pod, apply updated deployment (with `nodeSelector`), scale back up.
-7. Re-enable ArgoCD auto-sync and push manifests to git; push to in-cluster git-server
- (`git push r0 master`) so ArgoCD picks up the new storageClass spec.
+- [ZFS Pools & Encryption](storage/zfs.md) — `zdata` pool, physical disks, USB-stored keys, encrypted datasets, boot-time key loading
+- [zrepl Replication](storage/zrepl.md) — `f0 → f1` nfsdata, `f3 → f2` freebsd VM, sink configs, troubleshooting, DL-state recovery
+- [CARP HA VIP](storage/carp.md) — VIP `192.168.1.138`, `carpcontrol.sh`, mgmt script, auto-failback, SUSPENDED-pool limitation
+- [NFS over stunnel](storage/nfs.md) — NFS server, mutual-TLS stunnel, Rocky client config, `/etc/fstab`
+- [nfs-mount-monitor](storage/nfs-mount-monitor.md) — systemd watchdog on r-nodes (mount/stat/write probes, fail counter, cordon-and-reboot escalation)
+- [Troubleshooting](storage/troubleshooting.md) — NFS issues, ZFS pool SUSPENDED recovery, **thermal** troubleshooting (Beelink S12 Pro)
+- [Backups & Local-Path](storage/backups.md) — S3 Glacier Deep Archive, when to use `local-path` instead of NFS
## Storage Summary
diff --git a/prompts/skills/f3s/references/storage/backups.md b/prompts/skills/f3s/references/storage/backups.md
new file mode 100644
index 0000000..99fd425
--- /dev/null
+++ b/prompts/skills/f3s/references/storage/backups.md
@@ -0,0 +1,40 @@
+# Backups and Local-Path Storage
+
+## AWS S3 Glacier Deep Archive Backups
+
+Encrypted incremental ZFS snapshots from `zdata` pool backed up daily to **AWS S3 Glacier Deep Archive** via cron. Scripts adapted from FreeBSD Home NAS setup. Also performs periodic zpool scrubbing.
+
+## Local-Path Storage for SQLite Workloads
+
+Some k3s workloads use `local-path` (k3s default storageClass) instead of NFS for
+their data volumes. This is appropriate when:
+
+- The application uses SQLite: NFS file-lock semantics cause `fcntl()` races on
+ pod restarts, and `Recreate` strategy only reduces (not eliminates) the risk.
+- Cache-heavy workloads: NFS over stunnel adds TLS round-trip latency to every
+ cache read. Navidrome's image/background cache init took ~19s over NFS; it
+ takes ~25ms from local disk.
+
+**Trade-off**: a local-path PV lives on one specific node. If that node is down,
+the pod reschedules elsewhere but finds no data volume — it starts with an empty DB,
+losing play history, scrobble queue, etc. For a home server this is acceptable.
+The deployment must pin the pod to the same node via `nodeSelector` so the local
+PV is always reachable.
+
+### Workloads using local-path
+
+| App | Node | Path on node |
+|-----|------|--------------|
+| navidrome `/data` (DB + cache) | r1 | `/var/lib/rancher/k3s/storage/pvc-*_services_navidrome-data-pvc` |
+
+### Migrating NFS hostPath → local-path
+
+1. Disable ArgoCD auto-sync: `kubectl patch application <app> -n cicd --type=json -p='[{"op":"replace","path":"/spec/syncPolicy","value":{}}]'`
+2. Scale deployment to 0: `kubectl scale deployment <app> -n services --replicas=0`
+3. Delete old PVC and static PV.
+4. Create new PVC with `storageClassName: local-path`.
+5. Create a migration pod pinned to the target node that mounts both the NFS hostPath
+ (source) and the new PVC (target); copy data with `cp -av /src/. /dst/`.
+6. Delete migration pod, apply updated deployment (with `nodeSelector`), scale back up.
+7. Re-enable ArgoCD auto-sync and push manifests to git; push to in-cluster git-server
+ (`git push r0 master`) so ArgoCD picks up the new storageClass spec.
diff --git a/prompts/skills/f3s/references/storage/carp.md b/prompts/skills/f3s/references/storage/carp.md
new file mode 100644
index 0000000..abf0685
--- /dev/null
+++ b/prompts/skills/f3s/references/storage/carp.md
@@ -0,0 +1,123 @@
+# CARP: High-Availability VIP
+
+CARP (Common Address Redundancy Protocol) provides **VIP 192.168.1.138** that floats between f0 (primary) and f1 (standby). The VIP is what NFS clients and the FreeBSD `relayd` ingress connect to, so only the current MASTER serves traffic.
+
+## /etc/rc.conf configuration
+
+```sh
+# On f0 (default advskew=0, wins elections)
+ifconfig_re0_alias0="inet vhid 1 pass YOURPASSWORD alias 192.168.1.138/32"
+
+# On f1 (advskew=100, loses elections to f0)
+ifconfig_re0_alias0="inet vhid 1 advskew 100 pass YOURPASSWORD alias 192.168.1.138/32"
+```
+
+## Load CARP module
+
+```sh
+echo 'carp_load="YES"' | doas tee -a /boot/loader.conf
+# or immediately: doas kldload carp
+```
+
+## /etc/hosts for CARP VIP
+
+```
+192.168.1.138 f3s-storage-ha f3s-storage-ha.lan f3s-storage-ha.lan.buetow.org
+192.168.2.138 f3s-storage-ha.wg0 f3s-storage-ha.wg0.wan.buetow.org
+```
+
+## devd: CARP state change hook
+
+Add to `/etc/devd.conf` on f0 and f1:
+
+```
+notify 0 {
+ match "system" "CARP";
+ match "subsystem" "[0-9]+@[0-9a-z.]+";
+ match "type" "(MASTER|BACKUP)";
+ action "/usr/local/bin/carpcontrol.sh $subsystem $type";
+};
+```
+
+```sh
+doas service devd restart
+```
+
+## carpcontrol.sh — start/stop NFS+stunnel on failover
+
+```sh
+#!/bin/sh
+HOSTNAME=`hostname`
+
+if [ ! -f /data/nfs/nfs.DO_NOT_REMOVE ]; then
+ logger '/data/nfs not mounted, mounting it now!'
+ if [ "$HOSTNAME" = 'f0.lan.buetow.org' ]; then
+ zfs load-key -L file:///keys/f0.lan.buetow.org:zdata.key zdata/enc/nfsdata
+ zfs set mountpoint=/data/nfs zdata/enc/nfsdata
+ else
+ zfs load-key -L file:///keys/f0.lan.buetow.org:zdata.key zdata/sink/f0/zdata/enc/nfsdata
+ zfs set mountpoint=/data/nfs zdata/sink/f0/zdata/enc/nfsdata
+ zfs mount zdata/sink/f0/zdata/enc/nfsdata
+ zfs set readonly=on zdata/sink/f0/zdata/enc/nfsdata
+ fi
+ service nfsd stop 2>&1
+ service mountd stop 2>&1
+fi
+
+case "$2" in
+ MASTER)
+ logger "CARP state changed to MASTER, starting services"
+ service rpcbind start >/dev/null 2>&1
+ service mountd start >/dev/null 2>&1
+ service nfsd start >/dev/null 2>&1
+ service nfsuserd start >/dev/null 2>&1
+ service stunnel restart >/dev/null 2>&1
+ ;;
+ BACKUP)
+ logger "CARP state changed to BACKUP, stopping services"
+ service stunnel stop >/dev/null 2>&1
+ service nfsd stop >/dev/null 2>&1
+ service mountd stop >/dev/null 2>&1
+ service nfsuserd stop >/dev/null 2>&1
+ ;;
+esac
+```
+
+Install: `doas chmod +x /usr/local/bin/carpcontrol.sh` (copy to f1 too)
+
+## CARP management script (`/usr/local/bin/carp`)
+
+```sh
+doas carp # show current state
+doas carp master # force MASTER (e.g. reclaim after maintenance)
+doas carp backup # force BACKUP (trigger failover to f1)
+doas carp auto-failback disable # prevent auto-failback (for maintenance)
+doas carp auto-failback enable # re-enable auto-failback
+```
+
+## CARP failover limitation when ZFS is suspended
+
+If f0's ZFS pool is SUSPENDED but f0's OS is still running, f0 remains CARP MASTER
+(it keeps sending CARP advertisements). Attempts to manually demote f0 via:
+
+```sh
+doas carp backup # may return exit=0 but has no effect
+doas ifconfig re0 vhid 1 state backup # may return exit=1 silently
+doas ifconfig re0 vhid 1 advskew 254 # may return exit=1 silently
+```
+
+…can all silently fail because the kernel has too many stuck IO threads blocking
+the ifconfig ioctl path. The CARP VIP will **not** float to f1 in this case.
+**Only a hard power cycle of f0 reliably triggers CARP failover.** See
+[troubleshooting.md](troubleshooting.md) for the full SUSPENDED-pool recovery runbook.
+
+## Auto-failback from f1 to f0
+
+Script `/usr/local/bin/carp-auto-failback.sh` runs every minute via cron on f0. Checks: currently BACKUP? `/data/nfs` mounted? Marker file exists? Failback not blocked? If all conditions met, promotes f0 to MASTER.
+
+```sh
+echo "* * * * * /usr/local/bin/carp-auto-failback.sh" | doas crontab -
+doas touch /data/nfs/nfs.DO_NOT_REMOVE # marker file required for auto-failback
+```
+
+Logs to `/var/log/carp-auto-failback.log`.
diff --git a/prompts/skills/f3s/references/storage/nfs-mount-monitor.md b/prompts/skills/f3s/references/storage/nfs-mount-monitor.md
new file mode 100644
index 0000000..a9b71e7
--- /dev/null
+++ b/prompts/skills/f3s/references/storage/nfs-mount-monitor.md
@@ -0,0 +1,107 @@
+# NFS Auto-Repair: nfs-mount-monitor
+
+A systemd timer+service pair on r0/r1/r2 checks the NFS mount every 10 seconds and automatically repairs it if stale or missing.
+
+## Repo location
+
+```
+f3s/r-nodes/nfs-mount-monitor/
+ check-nfs-mount.sh # repair script → /usr/local/bin/
+ nfs-mount-monitor.service # one-shot service → /etc/systemd/system/
+ nfs-mount-monitor.timer # 10-second timer → /etc/systemd/system/
+f3s/r-nodes/Rexfile # Rex deploy task: nfs_mount_monitor
+```
+
+## Deploy
+
+```sh
+# From repo root — pushes to all three r-nodes and reloads systemd if anything changed
+rex -f f3s/r-nodes/Rexfile nfs_mount_monitor
+```
+
+## What it does
+
+Three probes run in sequence on every 10-second tick:
+
+1. **mountpoint probe** — detects completely missing mounts.
+2. **stat probe** (`timeout 2s stat`) — detects read hangs / stale cache misses.
+3. **write probe** (`timeout 5s sh -c "echo $$ > .healthcheck.<host> && rm -f ..."`) —
+ detects the "reads OK, writes hang" failure mode. Stunnel-wrapped NFSv4 can enter
+ a state where `stat` returns from cache but all writes block indefinitely; only this
+ probe catches it.
+
+If any probe fails, `fix_mount` runs:
+
+1. `mount -o remount -f` (cheapest, no disruption if mount is merely stale)
+2. Kill D-state processes pinning the mount (`kill_pinning_processes` — SIGKILLs
+ processes whose `wchan` starts with `nfs_` and whose cwd/fds point into the mountpoint)
+3. `umount -f` (force unmount)
+4. `umount -l` (lazy detach VFS node if `-f` failed)
+5. `systemctl restart stunnel` + 2s sleep (refresh the TLS transport)
+6. `mount -t nfs4 -o port=2323,soft,timeo=50,retrans=3` (explicit soft NFS mount — NOT
+ `mount $MOUNT_POINT` which reads fstab's `hard` flag and enters uninterruptible D-state
+ if the server is unreachable; SIGKILL cannot wake a D-state process on Linux;
+ `soft,timeo=50,retrans=3` returns ETIMEDOUT after ~15 s so the fail counter can
+ increment and eventually trigger the reboot escalation)
+
+A hard **60-second deadline** prevents `fix_mount` from outlasting its own timer interval.
+
+On successful repair, force-deletes pods on this node stuck in
+Unknown / Pending / ContainerCreating so the kubelet can reschedule them.
+
+**Consecutive-failure escalation**: each `fix_mount` failure increments a counter
+persisted to `/var/lib/nfs-mount-monitor/fail-count`. At `NFS_FAIL_THRESHOLD=5`
+consecutive failures (~50 s), the node cordons itself (`kubectl cordon`) and issues
+`systemctl reboot`. The cordon is stored in etcd and **persists across reboots** —
+after the underlying NFS issue is resolved, manually uncordon each affected node:
+```sh
+kubectl uncordon r0.lan.buetow.org
+kubectl uncordon r1.lan.buetow.org
+kubectl uncordon r2.lan.buetow.org
+```
+
+The counter is also exported to `/var/lib/node_exporter/textfile_collector/nfs_mount_monitor.prom`
+so Prometheus can alert on `nfs_mount_monitor_consecutive_failures` without parsing
+journal logs (warning ≥3, critical ≥5 — see
+`f3s/prometheus/manifests/nfs-mount-monitor-alerts.yaml`).
+
+Uses a lock file (`/var/run/nfs-mount-check.lock`) to prevent overlapping runs
+since the timer fires faster than the script's worst-case runtime. If the lock is
+older than **90 seconds** it was left by a run that was SIGKILLed before its EXIT
+trap could clean up (systemd kills with SIGKILL after its own timeout, bypassing
+`trap "rm -f $LOCK_FILE" EXIT`); the stale lock is removed and the run continues,
+preventing all health checks from being silently skipped forever.
+
+## Timer configuration
+
+| Parameter | Value | Reason |
+|-----------|-------|--------|
+| `OnBootSec` | 30s | Let network and NFS client start before first check |
+| `OnUnitActiveSec` | 10s | Check interval; each run is bounded by a 60-second deadline |
+| `AccuracySec` | 1s | Prevent systemd batching from delaying the 10 s interval |
+
+## Managing the monitor during an extended NFS outage
+
+During a prolonged NFS outage (e.g. while the storage host is being power-cycled or
+repaired), stop the timer on affected r-nodes to prevent the escalation counter from
+reaching the auto-reboot threshold prematurely:
+
+```sh
+# On each affected r-node (as root)
+systemctl stop nfs-mount-monitor.timer
+echo 0 > /var/lib/nfs-mount-monitor/fail-count # reset counter
+
+# After NFS is restored, restart and verify
+systemctl start nfs-mount-monitor.timer
+journalctl -u nfs-mount-monitor -f
+```
+
+Also reset the counter to 0 after uncordoning nodes (see escalation section above),
+because the old counter value would lower the effective threshold for the next outage.
+
+## Status and logs
+
+```sh
+systemctl status nfs-mount-monitor.timer
+journalctl -u nfs-mount-monitor -f
+```
diff --git a/prompts/skills/f3s/references/storage/nfs.md b/prompts/skills/f3s/references/storage/nfs.md
new file mode 100644
index 0000000..5aa0743
--- /dev/null
+++ b/prompts/skills/f3s/references/storage/nfs.md
@@ -0,0 +1,162 @@
+# NFS over stunnel
+
+NFSv4 served from f0/f1 to the Rocky Linux k3s nodes (r0/r1/r2) over a
+TLS tunnel that terminates on the CARP VIP. NFS itself stays on localhost;
+stunnel handles transport encryption with mutual TLS.
+
+## NFS Server Configuration (f0 and f1)
+
+```sh
+doas sysrc nfs_server_enable=YES
+doas sysrc nfsv4_server_enable=YES
+doas sysrc nfsuserd_enable=YES
+doas sysrc nfsuserd_flags="-domain lan.buetow.org"
+doas sysrc mountd_enable=YES
+doas sysrc rpcbind_enable=YES
+doas sysrc nfs_reserved_port_only=NO # Required for NFS over stunnel (unprivileged ports)
+
+doas mkdir -p /data/nfs/k3svolumes
+doas chmod 755 /data/nfs/k3svolumes
+```
+
+> **FreeBSD 15.0 note**: FreeBSD 15.0 sets `nfs_reserved_port_only=YES` by default in `/etc/defaults/rc.conf`. The nfsd rc script (`/etc/rc.d/nfsd`) checks this variable and explicitly runs `sysctl vfs.nfsd.nfs_privport=1` at startup, overriding any value set in `/etc/sysctl.conf` or `/boot/loader.conf`. This blocks NFS clients connecting via stunnel (unprivileged ports). Fix on **each f-host**:
+> ```sh
+> # The ONLY correct fix — setting sysctl.conf does NOT work
+> doas sysrc nfs_reserved_port_only=NO
+> # Apply immediately without reboot
+> doas sysctl vfs.nfsd.nfs_privport=0
+> # Remount on each r-host
+> mount -a
+> ```
+
+`/etc/exports` (stunnel clients appear as localhost):
+
+```
+V4: /data/nfs -sec=sys
+/data/nfs -alldirs -maproot=root -network 127.0.0.1 -mask 255.255.255.255
+```
+
+Start services:
+
+```sh
+doas service rpcbind start
+doas service mountd start
+doas service nfsd start
+doas service nfsuserd start
+```
+
+## stunnel: Encrypted NFS over TLS
+
+stunnel binds to the CARP VIP (192.168.1.138), so only the CARP MASTER accepts connections. Uses mutual TLS with client certificate authentication.
+
+### Create CA and certificates (on f0)
+
+```sh
+doas mkdir -p /usr/local/etc/stunnel/ca
+cd /usr/local/etc/stunnel/ca
+doas openssl genrsa -out ca-key.pem 4096
+doas openssl req -new -x509 -days 3650 -key ca-key.pem -out ca-cert.pem \
+ -subj '/C=US/ST=State/L=City/O=F3S Storage/CN=F3S Stunnel CA'
+
+cd /usr/local/etc/stunnel
+doas openssl genrsa -out server-key.pem 4096
+doas openssl req -new -key server-key.pem -out server.csr \
+ -subj '/C=US/ST=State/L=City/O=F3S Storage/CN=f3s-storage-ha.lan'
+doas openssl x509 -req -days 3650 -in server.csr -CA ca/ca-cert.pem \
+ -CAkey ca/ca-key.pem -CAcreateserial -out server-cert.pem
+
+# Client certs for r0, r1, r2, earth
+for client in r0 r1 r2 earth; do
+ openssl genrsa -out ca/${client}-key.pem 4096
+ openssl req -new -key ca/${client}-key.pem -out ca/${client}.csr \
+ -subj "/C=US/ST=State/L=City/O=F3S Storage/CN=${client}.lan.buetow.org"
+ openssl x509 -req -days 3650 -in ca/${client}.csr -CA ca/ca-cert.pem \
+ -CAkey ca/ca-key.pem -CAcreateserial -out ca/${client}-cert.pem
+ cat ca/${client}-cert.pem ca/${client}-key.pem > ca/${client}-stunnel.pem
+done
+```
+
+### stunnel server config (`/usr/local/etc/stunnel/stunnel.conf`)
+
+```
+cert = /usr/local/etc/stunnel/server-cert.pem
+key = /usr/local/etc/stunnel/server-key.pem
+setuid = stunnel
+setgid = stunnel
+
+[nfs-tls]
+accept = 192.168.1.138:2323
+connect = 127.0.0.1:2049
+CAfile = /usr/local/etc/stunnel/ca/ca-cert.pem
+verify = 2
+requireCert = yes
+```
+
+```sh
+doas pkg install -y stunnel
+doas sysrc stunnel_enable=YES
+doas service stunnel start
+# Copy certs to f1 via tarball, configure identically
+```
+
+## NFS Client Configuration (Rocky Linux r0, r1, r2)
+
+```sh
+dnf install -y stunnel nfs-utils
+
+# Copy client cert and CA from f0
+scp f0:/usr/local/etc/stunnel/ca/r0-stunnel.pem /etc/stunnel/
+scp f0:/usr/local/etc/stunnel/ca/ca-cert.pem /etc/stunnel/
+```
+
+`/etc/stunnel/stunnel.conf` (r0 example):
+
+```
+cert = /etc/stunnel/r0-stunnel.pem
+CAfile = /etc/stunnel/ca-cert.pem
+client = yes
+verify = 2
+
+[nfs-ha]
+accept = 127.0.0.1:2323
+connect = 192.168.1.138:2323
+```
+
+```sh
+systemctl enable --now stunnel
+```
+
+### NFSv4 user mapping
+
+`/etc/idmapd.conf` on r0, r1, r2:
+
+```
+[General]
+Domain = lan.buetow.org
+```
+
+Fix inotify limit:
+
+```sh
+echo 'fs.inotify.max_user_instances = 512' > /etc/sysctl.d/99-inotify.conf
+sysctl -w fs.inotify.max_user_instances=512
+systemctl enable --now nfs-client.target nfs-idmapd
+```
+
+### Mount NFS
+
+```sh
+mkdir -p /data/nfs/k3svolumes
+mount -t nfs4 -o port=2323 127.0.0.1:/k3svolumes /data/nfs/k3svolumes
+```
+
+`/etc/fstab`:
+
+```
+127.0.0.1:/k3svolumes /data/nfs/k3svolumes nfs4 port=2323,_netdev,hard,timeo=600,retrans=3 0 0
+```
+
+NFS path structure on k3s nodes: `/data/nfs/k3svolumes/<app>/`
+
+The `nfs-mount-monitor` watchdog on each r-node detects and repairs stale or
+hung mounts automatically — see [nfs-mount-monitor.md](nfs-mount-monitor.md).
diff --git a/prompts/skills/f3s/references/storage/troubleshooting.md b/prompts/skills/f3s/references/storage/troubleshooting.md
new file mode 100644
index 0000000..87951fa
--- /dev/null
+++ b/prompts/skills/f3s/references/storage/troubleshooting.md
@@ -0,0 +1,147 @@
+# Storage Troubleshooting
+
+NFS issues, ZFS pool SUSPENDED recovery, and thermal problems on the
+Beelink S12 Pro mini-PCs.
+
+## NFS Troubleshooting
+
+### All r-nodes show "access denied" when mounting NFS
+
+**Most likely cause**: `vfs.nfsd.nfs_privport=1` on the CARP MASTER. This happens after f-host reboots if `nfs_reserved_port_only` is not set to `NO` in rc.conf. The nfsd rc script (`/etc/rc.d/nfsd`) explicitly sets the sysctl based on this variable, overriding `/etc/sysctl.conf`. Fix: `doas sysrc nfs_reserved_port_only=NO` on both f0 and f1.
+
+### stunnel appears not running but port 2323 is bound
+
+`carpcontrol.sh` starts stunnel on CARP MASTER transition, but doesn't write a PID file. So `service stunnel status` reports "not running" even though stunnel is actually serving connections. Check with `doas sockstat -l | grep 2323`. If there's a stale stunnel process, kill it and restart: `doas kill <pid> && doas service stunnel start`.
+
+### Pods stuck in ContainerCreating/Unknown after NFS recovery
+
+After NFS is restored on the server side, the `nfs-mount-monitor` systemd timer on each r-node will auto-remount within ~10 seconds and force-delete stuck pods. If immediate recovery is needed: `mount /data/nfs/k3svolumes` on each r-node, then delete the stuck pods manually.
+
+**Note:** The monitor catches three failure modes: missing mountpoint, stat hang (reads unresponsive), and **silent write hang** (reads OK but writes block — the hardest case, e.g. stunnel-wrapped NFSv4 after a CARP failover). Watch the consecutive-failure counter via Prometheus (`nfs_mount_monitor_consecutive_failures`) — warning fires at ≥3, critical at ≥5. At 5 consecutive failures the node cordons itself and reboots.
+
+### Checklist for NFS outage on CARP MASTER (f0 or f1)
+
+```sh
+# 1. Check which host is CARP MASTER
+ssh paul@f0 'ifconfig re0 | grep carp'
+ssh paul@f1 'ifconfig re0 | grep carp'
+
+# 2. On the MASTER, verify:
+doas sysctl vfs.nfsd.nfs_privport # must be 0
+doas service nfsd status # must be running
+doas sockstat -l | grep 2323 # stunnel must be listening
+ls /data/nfs/nfs.DO_NOT_REMOVE # ZFS dataset must be mounted
+
+# 3. Fix if needed:
+doas sysrc nfs_reserved_port_only=NO # persist the fix
+doas sysctl vfs.nfsd.nfs_privport=0 # apply immediately
+doas service nfsd restart
+# For stunnel, kill stale process if needed, then:
+doas service stunnel start
+```
+
+## ZFS pool SUSPENDED recovery
+
+**Symptoms**: `doas zpool status zdata` shows `state: SUSPENDED`. All IO to the pool is
+halted — ZFS suspends itself to prevent corruption when IO errors exceed the threshold.
+Commands like `zpool clear`, `zpool scrub`, `zpool offline`, and even `ls /data/nfs/` hang
+indefinitely because they wait for kernel IO that will never complete.
+
+**Known cause (2026-05-15)**: Samsung 870 EVO 1TB on f0 (ada1) hit 107 read errors and
+105M+ write errors during normal operation. Subsequent investigation pointed at
+**thermal throttling** in the small Beelink S12 Pro enclosure — see the Thermal
+section below.
+
+**Recovery — hard power cycle only**:
+- Do NOT attempt `doas shutdown -r now` — if ZFS is suspended, the graceful shutdown hangs
+ at ZFS pool export and may stay stuck for 30–60+ minutes.
+- Do NOT attempt `doas zpool clear zdata` — it hangs because ada1 is unresponsive.
+- Do NOT attempt `doas ifconfig re0 vhid 1 state backup` or `doas carp backup` to fail
+ over to f1 first — these ifconfig ioctls can also be blocked when the kernel has too
+ many stuck IO threads. They may return exit=1 silently.
+- **Hard power cycle** (pull power or hold the power button) resolves the issue in ~9 s
+ (Rocky Linux VMs come up automatically, ZFS pool imports cleanly on next boot).
+
+**Post-recovery**:
+```sh
+# 1. Verify pool health
+doas zpool status zdata # should show ONLINE, 0 errors
+
+# 2. Check SMART for drive health
+doas smartctl -a /dev/ada1 | grep -iE '(temperature|reallocated|pending|uncorrectable|error)'
+
+# 3. Start a scrub to verify data integrity
+doas zpool scrub zdata
+doas zpool status zdata # monitor; "scrub repaired 0 in ..." means data intact
+
+# 4. Verify NFS is serving (stunnel listening on CARP VIP)
+doas sockstat -l | grep 2323
+```
+
+**After cluster recovery**:
+- Check for cordoned nodes: `kubectl get nodes` — if r0/r1/r2 show `SchedulingDisabled`,
+ uncordon them (see `nfs-mount-monitor.md` escalation section).
+- Reset fail counters on all r-nodes: `echo 0 > /var/lib/nfs-mount-monitor/fail-count`
+
+## Thermal Troubleshooting
+
+The 2026-05-16 f0 incident — and the 2026-05-15 ZFS SUSPENDED above — both trace
+back to **thermal problems in the Beelink S12 Pro enclosure**, not to any
+software-side cause. The mitigations and side-investigations (zrepl interval,
+autotrim, encryption overhead) are not what fixed it; reseating the drive and
+improving cooling did.
+
+### Symptoms of thermal throttling on f-hosts
+
+- SSD I/O slowness (writes dropping from MB/s to KB/s)
+- ZFS txg sync times jumping from <100 ms to many seconds
+- rsync / zrepl jobs going into D-state (waiting on ZFS I/O)
+- SMART reporting elevated drive temperature
+
+### How to check temperatures
+
+- **coretemp (real per-core die temps)**: `kldload coretemp; sysctl dev.cpu | grep temperature`
+ - Persist via `/boot/loader.conf` (`coretemp_load="YES"`)
+- **hw.acpi.thermal.tz0**: often a constant lie (e.g. always 27.9 °C) — do NOT rely on it
+- **SSD temperature**: `smartctl -a /dev/adaN` (requires `smartmontools`; may not be installed)
+- **Disk I/O performance**: `gstat -bp -I 1s -d` (FreeBSD `gstat`, not Linux `iostat`)
+
+### Beelink S12 Pro specifics
+
+- Small enclosure with passive/minimal cooling — heat accumulates fast under sustained load
+- N100 CPU: normal idle ~40–55 °C; warn >70 °C idle; critical >85 °C under load
+- NVMe sits close to CPU — both heat each other in the small chassis
+- Enclosure gets hot to the touch before temps fully register in software
+
+### Cause and resolution (2026-05-16 f0)
+
+The cascade was thermal-only:
+
+1. Hot enclosure (NVMe physically very hot) → SSD/SATA thermal throttling
+2. Throttled disk → ZFS txg syncs balloon from <100 ms to multi-second
+3. rsync / zrepl block on ZFS → D-state, hung pods on r-nodes
+
+**Root cause**: hot enclosure / inadequate cooling. **Resolution**: shut down,
+reseat the drive, clean dust and improve airflow; the disk recovered immediately
+and ZFS txg sync times returned to normal.
+
+### Remediation steps
+
+1. SSH in and check temps: `kldload coretemp && sysctl dev.cpu | grep temperature`
+2. If >80 °C: stop heavy I/O workloads to prevent thermal-induced ZFS errors
+3. Physical: shut down, reseat NVMe, clean dust from vents, improve airflow
+4. Persist coretemp: ensure `/boot/loader.conf` has `coretemp_load="YES"`
+
+### Temperature monitoring
+
+```sh
+# FreeBSD: load coretemp for CPU package temperature
+doas kldload coretemp
+sysctl -a | grep temperature # hw.acpi.thermal.*: and dev.cpu.*:
+# Persist across reboots
+echo 'coretemp_load="YES"' | doas tee -a /boot/loader.conf
+
+# SSD temperature (install smartmontools if absent)
+doas pkg install -y smartmontools
+doas smartctl -a /dev/ada1 | grep -i temperature # "194 Temperature_Celsius"
+```
diff --git a/prompts/skills/f3s/references/storage/zfs.md b/prompts/skills/f3s/references/storage/zfs.md
new file mode 100644
index 0000000..12f354c
--- /dev/null
+++ b/prompts/skills/f3s/references/storage/zfs.md
@@ -0,0 +1,89 @@
+# ZFS Pools & Encryption
+
+Covers the `zdata` pool layout on f0/f1, encryption keys held on per-host
+USB sticks, and how to roll a new encrypted dataset (data and bhyve).
+
+## Physical Disks
+
+- **f0**: 512GB M.2 (OS/zroot) + Samsung SSD 870 EVO 1TB (zdata)
+- **f1**: 512GB M.2 (OS/zroot) + Crucial CT1000BX500SSD1 1TB (zdata)
+- **f2**: No second drive (no zdata pool)
+- **f3**: 512GB M.2 (OS/zroot); no zdata pool yet (planned)
+
+## zdata Pool Setup
+
+On f0 and f1, create the zdata pool on the second SSD:
+
+```sh
+# Pool setup (f0 and f1 only)
+doas zpool create zdata ada1 # ada1 = second SSD
+```
+
+## Encryption Keys (USB Key Storage)
+
+Encryption keys are stored on USB flash drives (UFS-formatted, mounted at `/keys`).
+All four hosts (f0/f1/f2/f3) have USB keys at `/dev/da0` mounted at `/keys`, each holding
+all 8 key files as cross-host backups.
+
+```sh
+# Format and mount USB key (on each node)
+doas newfs /dev/da0
+echo '/dev/da0 /keys ufs rw 0 2' | doas tee -a /etc/fstab
+doas mkdir /keys
+doas mount /keys
+
+# Generate keys (on f0, then copy to f1, f2, f3)
+doas openssl rand -out /keys/f0.lan.buetow.org:bhyve.key 32
+doas openssl rand -out /keys/f1.lan.buetow.org:bhyve.key 32
+doas openssl rand -out /keys/f2.lan.buetow.org:bhyve.key 32
+doas openssl rand -out /keys/f3.lan.buetow.org:bhyve.key 32
+doas openssl rand -out /keys/f0.lan.buetow.org:zdata.key 32
+doas openssl rand -out /keys/f1.lan.buetow.org:zdata.key 32
+doas openssl rand -out /keys/f2.lan.buetow.org:zdata.key 32
+doas openssl rand -out /keys/f3.lan.buetow.org:zdata.key 32
+doas chown root /keys/* && doas chmod 400 /keys/*
+# Copy to f1, f2, f3 via tarball
+```
+
+## Encryption Setup
+
+```sh
+# On f0 - create encrypted zdata dataset
+doas zfs create -o encryption=on -o keyformat=raw \
+ -o keylocation=file:///keys/f0.lan.buetow.org:zdata.key zdata/enc
+
+# Create the NFS data dataset (replicated to f1)
+doas zfs create zdata/enc/nfsdata
+doas zfs set mountpoint=/data/nfs zdata/enc/nfsdata
+doas mkdir -p /data/nfs/k3svolumes
+
+# Encrypt Bhyve VM dataset (zroot/bhyve)
+# Stop VMs first, rename old, create new encrypted, zfs send snapshot, then destroy old
+doas vm stop rocky
+doas zfs rename zroot/bhyve zroot/bhyve_old
+doas zfs set mountpoint=/mnt zroot/bhyve_old
+doas zfs snapshot zroot/bhyve_old/rocky@hamburger
+doas zfs create -o encryption=on -o keyformat=raw \
+ -o keylocation=file:///keys/f0.lan.buetow.org:bhyve.key zroot/bhyve
+doas zfs send zroot/bhyve_old/rocky@hamburger | doas zfs recv zroot/bhyve/rocky
+# Copy vm-bhyve metadata: .config, .img, .templates, .iso
+doas zfs destroy -R zroot/bhyve_old
+```
+
+### Auto-load encryption keys on boot
+
+```sh
+# On f0
+doas sysrc zfskeys_enable=YES
+doas sysrc zfskeys_datasets="zdata/enc zdata/enc/nfsdata zroot/bhyve"
+
+# On f1
+doas sysrc zfskeys_enable=YES
+doas sysrc zfskeys_datasets="zdata/enc zroot/bhyve zdata/sink/f0/zdata/enc/nfsdata"
+
+# On f3 (bhyve VMs only, no zdata pool yet)
+doas sysrc zfskeys_enable=YES
+doas sysrc zfskeys_datasets="zroot/bhyve"
+doas zfs set keylocation=file:///keys/f0.lan.buetow.org:zdata.key \
+ zdata/sink/f0/zdata/enc/nfsdata
+```
diff --git a/prompts/skills/f3s/references/storage/zrepl.md b/prompts/skills/f3s/references/storage/zrepl.md
new file mode 100644
index 0000000..fdeff9a
--- /dev/null
+++ b/prompts/skills/f3s/references/storage/zrepl.md
@@ -0,0 +1,224 @@
+# zrepl: Continuous ZFS Replication
+
+Continuous ZFS replication for the encrypted NFS dataset (f0 → f1) and the
+standalone FreeBSD dev VM (f3 → f2). Original plan was HAST, replaced by
+zrepl (`zfs send/recv`) — more reliable and avoids the HAST-induced ZFS
+corruption that hit us during failover testing.
+
+Install on the participating hosts:
+
+```sh
+doas pkg install -y zrepl
+```
+
+## f0 configuration (`/usr/local/etc/zrepl/zrepl.yml`)
+
+```yaml
+global:
+ logging:
+ - type: stdout
+ level: info
+ format: human
+
+jobs:
+ - name: f0_to_f1_nfsdata
+ type: push
+ connect:
+ type: tcp
+ address: "192.168.2.131:8888" # f1 WireGuard IP
+ filesystems:
+ "zdata/enc/nfsdata": true
+ send:
+ encrypted: true
+ snapshotting:
+ type: periodic
+ prefix: zrepl_
+ interval: 1m # every minute
+ pruning:
+ keep_sender:
+ - type: last_n
+ count: 10
+ - type: grid
+ grid: 24x1h | 14x1d | 6x30d
+ regex: "^zrepl_.*"
+ keep_receiver:
+ - type: last_n
+ count: 10
+ - type: grid
+ grid: 24x1h | 14x1d | 6x30d
+ regex: "^zrepl_.*"
+
+ # Note: f0_to_f1_freebsd job removed — the FreeBSD VM was migrated to f3.
+ # It is now replicated from f3 → f2 (see f3 zrepl config below).
+```
+
+## f3 configuration (push: freebsd VM → f2)
+
+```yaml
+global:
+ logging:
+ - type: stdout
+ level: info
+ format: human
+
+jobs:
+ - name: f3_to_f2_freebsd
+ type: push
+ connect:
+ type: tcp
+ address: "192.168.2.132:8888" # f2 WireGuard IP
+ filesystems:
+ "zroot/bhyve/freebsd": true # development FreeBSD VM
+ send:
+ encrypted: true
+ snapshotting:
+ type: periodic
+ prefix: zrepl_
+ interval: 10m
+ pruning:
+ keep_sender:
+ - type: last_n
+ count: 10
+ - type: grid
+ grid: 24x1h | 14x1d
+ regex: "^zrepl_.*"
+ keep_receiver:
+ - type: last_n
+ count: 10
+ - type: grid
+ grid: 24x1h | 14x1d
+ regex: "^zrepl_.*"
+```
+
+## f2 configuration (sink for f3's freebsd VM)
+
+f2 has no second drive so the sink lives in `zroot/sink`:
+
+```sh
+doas zfs create zroot/sink
+```
+
+`/usr/local/etc/zrepl/zrepl.yml`:
+
+```yaml
+global:
+ logging:
+ - type: stdout
+ level: info
+ format: human
+
+jobs:
+ - name: sink
+ type: sink
+ serve:
+ type: tcp
+ listen: "192.168.2.132:8888" # f2 WireGuard IP
+ clients:
+ "192.168.2.133": "f3"
+ recv:
+ placeholder:
+ encryption: inherit
+ root_fs: "zroot/sink"
+```
+
+Replicated path: `zroot/bhyve/freebsd` → `zroot/sink/f3/zroot/bhyve/freebsd`
+
+Important: do not let `zfs-periodic` snapshot zrepl-managed sender or receiver
+datasets. Snapshot creation should be owned by zrepl. On f2,
+`/etc/periodic.conf` disables `zfs-periodic` snapshot creation:
+
+```sh
+daily_zfs_snapshot_enable="NO"
+weekly_zfs_snapshot_enable="NO"
+monthly_zfs_snapshot_enable="NO"
+```
+
+The local zrepl `snap` job on f2 also explicitly excludes `zroot/sink<`.
+
+## f1 configuration (sink)
+
+```sh
+doas zfs create zdata/sink # receive dataset
+```
+
+`/usr/local/etc/zrepl/zrepl.yml`:
+
+```yaml
+global:
+ logging:
+ - type: stdout
+ level: info
+ format: human
+
+jobs:
+ - name: sink
+ type: sink
+ serve:
+ type: tcp
+ listen: "192.168.2.131:8888"
+ clients:
+ "192.168.2.130": "f0"
+ recv:
+ placeholder:
+ encryption: inherit
+ root_fs: "zdata/sink"
+```
+
+## Enable and start
+
+```sh
+doas sysrc zrepl_enable=YES
+doas service zrepl start
+doas zrepl status # monitor replication
+```
+
+Replicated paths: `zdata/enc/nfsdata` → `zdata/sink/f0/zdata/enc/nfsdata`
+
+## Mount replica on f1 (read-only standby)
+
+```sh
+doas zfs load-key -L file:///keys/f0.lan.buetow.org:zdata.key \
+ zdata/sink/f0/zdata/enc/nfsdata
+doas mkdir -p /data/nfs
+doas zfs set mountpoint=/data/nfs zdata/sink/f0/zdata/enc/nfsdata
+doas zfs mount zdata/sink/f0/zdata/enc/nfsdata
+doas zfs set readonly=on zdata/sink/f0/zdata/enc/nfsdata # prevent replication breakage
+```
+
+## Failover design: intentionally read-only replica
+
+The standby replica is read-only by design. Manual failover (not automatic) to prevent split-brain. To fix broken replication after accidental writes: `doas zfs rollback <snapshot>`.
+
+## Troubleshooting
+
+```sh
+# Signal manual replication
+doas zrepl signal wakeup f0_to_f1_nfsdata
+
+# Fix "no common snapshot" — destroy and re-replicate
+doas zfs destroy -r zdata/sink/f0/zdata/enc/nfsdata
+
+# Test network connectivity
+nc -zv 192.168.2.131 8888
+
+# Monitor progress
+doas zrepl status --mode raw | grep BytesReplicated
+```
+
+**zrepl DL-state on f1 after mid-replication f0 reboot**: if f0 reboots while zrepl is
+actively replicating, f1's `[zfskern]` thread can enter **DL state** (disk + locked).
+Symptoms: `zpool list`, `zfs list`, `ls /data/nfs/` all hang indefinitely; `zfs set
+readonly=off` may return immediately (the kernel path differs). To recover on f1:
+
+```sh
+# Stop zrepl to release the replication lock
+doas service zrepl stop
+
+# Wait ~30–60 s for the kernel state to drain; then verify
+doas zpool list
+doas zfs list
+doas service zrepl start
+```
+
+If ZFS commands still hang after stopping zrepl, a reboot of f1 is required.
+The NFS data is still available on f0 so k3s is unaffected during f1 recovery.