Update

author: Paul Buetow <paul@buetow.org> 2026-04-07 22:26:06 +0300
committer: Paul Buetow <paul@buetow.org> 2026-04-07 22:26:06 +0300
commit: b9108ae9119a699dbf5e1cdf2816bfe46a459c22 (patch)
tree: c168f66d6d91b4aebd1ed17c54063ccfbcefbbfe
parent: 26c1d0221a69d4ab3ea157b0e999e1a1f867b5ea (diff)
1 files changed, 17 insertions, 0 deletions
diff --git a/prompts/skills/f3s/references/immich.md b/prompts/skills/f3s/references/immich.md
index 848d4a9..543fb8d 100644
--- a/prompts/skills/f3s/references/immich.md
+++ b/prompts/skills/f3s/references/immich.md
@@ -42,6 +42,10 @@ diff ~/git/conf/f3s/immich/snapshots/immich-queues-<old>.txt ~/git/conf/f3s/immi
 
 Decreasing `waiting` and stable/zero `failed` means healthy progress.
 
+### Always Check for Progress
+
+When gathering new stats, **always compare against the most recent saved snapshot** (check `~/git/conf/f3s/immich/snapshots/`). If a queue's `waiting` count has not decreased since the last snapshot, the queue is likely stuck — investigate immediately (see "Stuck job queue" in Troubleshooting below).
+
 ## Job Control via API
 
 The API key is stored at `~/.immich_paul_key`. Use it to pause/resume jobs:
@@ -76,6 +80,19 @@ On the N100 (4-core) nodes, ML jobs compete for CPU. To speed up slow queues:
 
 ## Troubleshooting
 
+- **Stuck job queue**: If a queue has `waiting` jobs but no progress since the last snapshot:
+  1. Check ML pod logs for activity: `kubectl logs -n services deploy/immich-machine-learning --tail=30`. Look for "Shutting down due to inactivity" — this means jobs are not being dispatched.
+  2. Check server/microservices logs: `kubectl logs -n services deploy/immich-server --tail=30`. If there's no job processing output (only version checks and websocket events), the worker is stuck.
+  3. A stale `active` job in Valkey can block the entire queue. Clear it:
+     ```sh
+     kubectl exec -n services deploy/immich-valkey -- valkey-cli DEL "immich_bull:<queue>:active"
+     ```
+  4. If clearing the stale job doesn't help, **restart the server deployment** — this is the most reliable fix:
+     ```sh
+     kubectl rollout restart deploy/immich-server -n services
+     kubectl rollout status deploy/immich-server -n services --timeout=120s
+     ```
+  5. After restart, wait ~20 seconds, then verify via the API that `isActive: true` and `waiting` is decreasing.
 - **Postgres crash loop**: Usually caused by liveness probe killing postgres during WAL recovery. Check `kubectl describe pod` for probe failures and postgres logs for "database system was interrupted while in recovery". Fix by relaxing probe timeouts/thresholds and adding resource limits.
 - **Server crash loop**: Often caused by postgres being unavailable. Fix postgres first.
 - **ML errors**: "Machine learning repository not been setup" is transient — resolves once the ML pod health check passes.
author	Paul Buetow <paul@buetow.org>	2026-04-07 22:26:06 +0300
committer	Paul Buetow <paul@buetow.org>	2026-04-07 22:26:06 +0300
commit	b9108ae9119a699dbf5e1cdf2816bfe46a459c22 (patch)
tree	c168f66d6d91b4aebd1ed17c54063ccfbcefbbfe
parent	26c1d0221a69d4ab3ea157b0e999e1a1f867b5ea (diff)