f3s: Kubernetes with FreeBSD - Part 8: Observability

f3s: Kubernetes with FreeBSD - Part 8b: Distributed Tracing with Tempo

+For a preview of what distributed tracing with Tempo looks like in Grafana, check out the X-RAG blog post:
+
+X-RAG Observability Hackathon

f3s: Kubernetes with FreeBSD - Part 8: Observability
⇢ Introduction
⇢ Important Note: GitOps Migration
⇢ Persistent storage recap
⇢ The monitoring namespace
⇢ Installing Prometheus and Grafana
⇢ ⇢ Prerequisites
⇢ ⇢ Deploying with the Justfile
⇢ ⇢ Exposing Grafana via ingress
⇢ Installing Loki and Alloy
⇢ ⇢ Prerequisites
⇢ ⇢ Deploying Loki and Alloy
⇢ ⇢ Configuring Alloy
⇢ ⇢ Adding Loki as a Grafana data source
⇢ The complete monitoring stack
⇢ Using the observability stack
⇢ ⇢ Viewing metrics in Grafana
⇢ ⇢ Querying logs with LogQL
⇢ ⇢ Creating alerts
⇢ Monitoring external FreeBSD hosts
⇢ ⇢ Installing Node Exporter on FreeBSD
⇢ ⇢ Adding FreeBSD hosts to Prometheus
⇢ ⇢ FreeBSD memory metrics compatibility
⇢ ⇢ Disk I/O metrics limitation
⇢ ZFS Monitoring for FreeBSD Servers
⇢ ⇢ Node Exporter ZFS Collector
⇢ ⇢ Verifying ZFS Metrics
⇢ ⇢ ZFS Recording Rules
⇢ ⇢ Grafana Dashboards
f3s: Kubernetes with FreeBSD - Part 8b: Distributed Tracing with Tempo
⇢ Why Distributed Tracing?
⇢ Deploying Grafana Tempo
⇢ ⇢ Tempo Helm Values
⇢ ⇢ Persistent Volumes
⇢ ⇢ Grafana Datasource Provisioning
⇢ ⇢ Installation
⇢ Configuring Alloy for Trace Collection
⇢ Demo Tracing Application
⇢ ⇢ Architecture
⇢ ⇢ OpenTelemetry Instrumentation
⇢ ⇢ Deployment
⇢ ⇢ Verifying ZFS Metrics in Prometheus
⇢ ⇢ Key Metrics to Monitor
⇢ ⇢ ZFS Pool and Dataset Metrics via Textfile Collector
⇢ Monitoring external OpenBSD hosts
⇢ ⇢ Installing Node Exporter on OpenBSD
⇢ ⇢ Adding OpenBSD hosts to Prometheus
⇢ ⇢ OpenBSD memory metrics compatibility
⇢ Distributed Tracing with Grafana Tempo
⇢ ⇢ Why Distributed Tracing?
⇢ ⇢ Deploying Grafana Tempo
⇢ ⇢ ⇢ Configuration Strategy
⇢ ⇢ ⇢ Tempo Deployment Files
⇢ ⇢ ⇢ Installation
⇢ ⇢ Configuring Grafana Alloy for Trace Collection
⇢ ⇢ ⇢ OTLP Receiver Configuration
⇢ ⇢ ⇢ Upgrade Alloy
⇢ ⇢ Demo Tracing Application
⇢ ⇢ ⇢ Application Architecture
⇢ ⇢ Visualizing Traces in Grafana
⇢ ⇢ ⇢ Accessing Traces
⇢ ⇢ ⇢ Service Graph Visualization
⇢ ⇢ Correlation Between Observability Signals
⇢ ⇢ ⇢ Traces-to-Logs
⇢ ⇢ ⇢ Traces-to-Metrics
⇢ ⇢ ⇢ Logs-to-Traces
⇢ ⇢ Generating Traces for Testing
⇢ ⇢ Verifying the Complete Pipeline
⇢ ⇢ Practical Example: Viewing a Distributed Trace
⇢ ⇢ Storage and Retention
⇢ ⇢ Configuration Files
⇢ Summary
⇢ Visualizing Traces in Grafana
⇢ ⇢ Searching for Traces
⇢ ⇢ Service Graph
⇢ Practical Example: End-to-End Trace
⇢ Correlation Between Signals
⇢ Storage and Retention
⇢ Configuration Files

Introduction

Why Distributed Tracing?

-In this blog post, I set up a complete observability stack for the k3s cluster. Observability is crucial for understanding what's happening inside the cluster, whether it's tracking resource usage, debugging issues, or analysing application behaviour. The stack consists of five main components, all deployed into the monitoring namespace:
+In a microservices setup, a single user request can hop through multiple services. Tracing gives you:

Prometheus: time-series database for metrics collection and alerting
Grafana: visualisation and dashboarding frontend
Loki: log aggregation system (like Prometheus, but for logs)
Alloy: telemetry collector that ships logs and traces from all pods to Loki and Tempo
Tempo: distributed tracing backend for request flow analysis across microservices
Request tracking across service boundaries
Performance bottleneck identification
Service dependency visualization
Correlation with logs and metrics

-Together, these form the "PLG" stack (Prometheus, Loki, Grafana) extended with Tempo for distributed tracing, which is a popular open-source alternative to commercial observability platforms.
-
-All manifests for the f3s stack live in my configuration repository:
-
-codeberg.org/snonux/conf/f3s
-
-

Important Note: GitOps Migration

+Without it, you're basically guessing where time gets spent.

-**Note:** After publishing this blog post, the f3s cluster was migrated from imperative Helm deployments to declarative GitOps using ArgoCD. The Kubernetes manifests, Helm charts, and Justfiles in the repository have been reorganized for ArgoCD-based continuous deployment.
+

Deploying Grafana Tempo

-**To view the exact configuration as it existed when this blog post was written** (before the ArgoCD migration), check out the pre-ArgoCD revision:
+Tempo runs in monolithic mode — all components in one process, same pattern as Loki's SingleBinary deployment. Keeps things simple for a home lab.


$ git clone https://codeberg.org/snonux/conf.git
-$ cd conf
-$ git checkout 15a86f3  # Last commit before ArgoCD migration
-$ cd f3s/prometheus/
-

+The setup:

-**Current master branch** contains the ArgoCD-managed versions with:

Application manifests organized under argocd-apps/{monitoring,services,infra,test}/
Resources organized under prometheus/manifests/, loki/, etc.
Justfiles updated to trigger ArgoCD syncs instead of direct Helm commands
Filesystem backend using hostPath (10Gi at /data/nfs/k3svolumes/tempo/data)
7-day retention (168h)
OTLP receivers on gRPC (4317) and HTTP (4318)
Receivers bound to 0.0.0.0, since Tempo 2.7+ binds them to localhost only by default

-The deployment concepts and architecture remain the same—only the deployment method changed from imperative (helm install/upgrade) to declarative (GitOps with ArgoCD).
-
-

Persistent storage recap

-
-All observability components need persistent storage so that metrics and logs survive pod restarts. As covered in Part 6 of this series, the cluster uses NFS-backed persistent volumes:
+

Tempo Helm Values

-f3s: Kubernetes with FreeBSD - Part 6: Storage
+

+tempo:
+  retention: 168h
+  storage:
+    trace:
+      backend: local
+      local:
+        path: /var/tempo/traces
+      wal:
+        path: /var/tempo/wal
+  receivers:
+    otlp:
+      protocols:
+        grpc:
+          endpoint: 0.0.0.0:4317
+        http:
+          endpoint: 0.0.0.0:4318
+
+persistence:
+  enabled: true
+  size: 10Gi
+  storageClassName: ""
+
+resources:
+  limits:
+    cpu: 1000m
+    memory: 2Gi
+  requests:
+    cpu: 500m
+    memory: 1Gi
+

-The FreeBSD hosts (f0, f1) serve as master-standby NFS servers, exporting ZFS datasets that are replicated across hosts using zrepl. The Rocky Linux k3s nodes (r0, r1, r2) mount these exports at /data/nfs/k3svolumes. This directory contains subdirectories for each application that needs persistent storage—including Prometheus, Grafana, and Loki.
+

Persistent Volumes

-For example, the observability stack uses these paths on the NFS share:
+

+apiVersion: v1
+kind: PersistentVolume
+metadata:
+  name: tempo-data-pv
+spec:
+  capacity:
+    storage: 10Gi
+  accessModes:
+    - ReadWriteOnce
+  persistentVolumeReclaimPolicy: Retain
+  hostPath:
+    path: /data/nfs/k3svolumes/tempo/data
+---
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: tempo-data-pvc
+  namespace: monitoring
+spec:
+  storageClassName: ""
+  accessModes:
+    - ReadWriteOnce
+  resources:
+    requests:
+      storage: 10Gi
+

/data/nfs/k3svolumes/prometheus/data — Prometheus time-series database
/data/nfs/k3svolumes/grafana/data — Grafana configuration, dashboards, and plugins
/data/nfs/k3svolumes/loki/data — Loki log chunks and index
/data/nfs/k3svolumes/tempo/data — Tempo trace data and WAL

-Each path gets a corresponding PersistentVolume and PersistentVolumeClaim in Kubernetes, allowing pods to mount them as regular volumes. Because the underlying storage is ZFS with replication, we get snapshots and redundancy for free.
+

Grafana Datasource Provisioning

The monitoring namespace

+All Grafana datasources (Prometheus, Alertmanager, Loki, Tempo) are provisioned via a single ConfigMap mounted directly into the Grafana pod, so no sidecar discovery is needed.

-First, I created the monitoring namespace where all observability components will live:
+In grafana-datasources-all.yaml:


$ kubectl create namespace monitoring
-namespace/monitoring created
++apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: grafana-datasources-all
+  namespace: monitoring
+data:
+  datasources.yaml: |
+    apiVersion: 1
+    datasources:
+      - name: Prometheus
+        type: prometheus
+        uid: prometheus
+        url: http://prometheus-kube-prometheus-prometheus.monitoring:9090/
+        access: proxy
+        isDefault: true
+      - name: Alertmanager
+        type: alertmanager
+        uid: alertmanager
+        url: http://prometheus-kube-prometheus-alertmanager.monitoring:9093/
+      - name: Loki
+        type: loki
+        uid: loki
+        url: http://loki.monitoring.svc.cluster.local:3100
+      - name: Tempo
+        type: tempo
+        uid: tempo
+        url: http://tempo.monitoring.svc.cluster.local:3200
+        jsonData:
+          tracesToLogsV2:
+            datasourceUid: loki
+            spanStartTimeShift: -1h
+            spanEndTimeShift: 1h
+          tracesToMetrics:
+            datasourceUid: prometheus
+          serviceMap:
+            datasourceUid: prometheus
+          nodeGraph:
+            enabled: true
 
 

-Installing Prometheus and Grafana


-

-Prometheus and Grafana are deployed together using the kube-prometheus-stack Helm chart from the Prometheus community. This chart bundles Prometheus, Grafana, Alertmanager, and various exporters (Node Exporter, Kube State Metrics) into a single deployment. I'll explain what each component does in detail later when we look at the running pods.

+The Tempo datasource config links traces to Loki logs and Prometheus metrics — so you can jump between signals directly in Grafana.

 

-Prerequisites


+The kube-prometheus-stack Helm values disable sidecar-based discovery and mount this ConfigMap directly to /etc/grafana/provisioning/datasources/.

 

-Add the Prometheus Helm chart repository:

+Installation


 

-
-$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
-$ helm repo update
++cd /home/paul/git/conf/f3s/tempo
+just install
 
 

-Create the directories on the NFS server for persistent storage:

+Verify it's running:

 

-
-[root@r0 ~]# mkdir -p /data/nfs/k3svolumes/prometheus/data
-[root@r0 ~]# mkdir -p /data/nfs/k3svolumes/grafana/data
++kubectl get pods -n monitoring -l app.kubernetes.io/name=tempo
+kubectl exec -n monitoring <tempo-pod> -- wget -qO- http://localhost:3200/ready
 
 

-Deploying with the Justfile


-

-The configuration repository contains a Justfile that automates the deployment. just is a handy command runner—think of it as a simpler, more modern alternative to make. I use it throughout the f3s repository to wrap repetitive Helm and kubectl commands:

+Configuring Alloy for Trace Collection


 

-just - A handy way to save and run project-specific commands

-codeberg.org/snonux/conf/f3s/prometheus

+I updated the Alloy values to add OTLP receivers for traces alongside the existing log collection.

 

-To install everything:

+Added to the Alloy config:

 

-
-$ cd conf/f3s/prometheus
-$ just install
-kubectl apply -f persistent-volumes.yaml
-persistentvolume/prometheus-data-pv created
-persistentvolume/grafana-data-pv created
-persistentvolumeclaim/grafana-data-pvc created
-helm install prometheus prometheus-community/kube-prometheus-stack \
-    --namespace monitoring -f persistence-values.yaml
-NAME: prometheus
-LAST DEPLOYED: ...
-NAMESPACE: monitoring
-STATUS: deployed
++// OTLP receiver for traces via gRPC and HTTP
+otelcol.receiver.otlp "default" {
+  grpc {
+    endpoint = "0.0.0.0:4317"
+  }
+  http {
+    endpoint = "0.0.0.0:4318"
+  }
+  output {
+    traces = [otelcol.processor.batch.default.input]
+  }
+}
+
+// Batch processor — accumulates spans before forwarding to Tempo
+otelcol.processor.batch "default" {
+  timeout = "5s"
+  send_batch_size = 100
+  send_batch_max_size = 200
+  output {
+    traces = [otelcol.exporter.otlp.tempo.input]
+  }
+}
+
+// OTLP exporter to Tempo
+otelcol.exporter.otlp "tempo" {
+  client {
+    endpoint = "tempo.monitoring.svc.cluster.local:4317"
+    tls {
+      insecure = true
+    }
+    compression = "gzip"
+  }
+}
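To make the three batch settings above more concrete, here is a minimal Python sketch of size-and-timeout batching. It is a toy model written for illustration, not Alloy's actual implementation: the class name `BatchModel` and its methods are hypothetical, and the real processor handles timers and splitting in more detail.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BatchModel:
    """Toy model of size/timeout batching (hypothetical, for illustration only)."""
    timeout_s: float = 5.0
    send_batch_size: int = 100
    send_batch_max_size: int = 200
    _pending: list = field(default_factory=list)
    _first_at: Optional[float] = None

    def add(self, spans, now):
        """Queue spans arriving at time `now`; return any batches that get flushed."""
        if not self._pending:
            self._first_at = now
        self._pending.extend(spans)
        if len(self._pending) >= self.send_batch_size:
            return self._flush()
        return []

    def tick(self, now):
        """Flush whatever is pending once the timeout has elapsed."""
        if self._pending and now - self._first_at >= self.timeout_s:
            return self._flush()
        return []

    def _flush(self):
        # Split so that no outgoing batch exceeds send_batch_max_size.
        size = self.send_batch_max_size
        batches = [self._pending[i:i + size] for i in range(0, len(self._pending), size)]
        self._pending = []
        self._first_at = None
        return batches


model = BatchModel()
print([len(b) for b in model.add(range(250), now=0.0)])  # size threshold reached
print([len(b) for b in model.add(range(8), now=1.0)])    # below threshold, keeps waiting
print([len(b) for b in model.tick(now=6.0)])             # timeout elapsed, flushes the rest
```

In this model a burst of spans goes out immediately in chunks of at most 200, while a trickle of spans waits up to five seconds before being forwarded.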
 
 

-The persistence-values.yaml configures Prometheus and Grafana to use the NFS-backed persistent volumes I mentioned earlier, ensuring data survives pod restarts. It also enables scraping of etcd and kube-controller-manager metrics:

+Upgrade Alloy:

 

 -kubeEtcd:
-  enabled: true
-  endpoints:
-    - 192.168.2.120
-    - 192.168.2.121
-    - 192.168.2.122
-  service:
-    enabled: true
-    port: 2381
-    targetPort: 2381
-
-kubeControllerManager:
-  enabled: true
-  endpoints:
-    - 192.168.2.120
-    - 192.168.2.121
-    - 192.168.2.122
-  service:
-    enabled: true
-    port: 10257
-    targetPort: 10257
-  serviceMonitor:
-    enabled: true
-    https: true
-    insecureSkipVerify: true
+cd /home/paul/git/conf/f3s/loki
+just upgrade
 
 

-By default, k3s binds the controller-manager to localhost only and doesn't expose etcd metrics, so the "Kubernetes / Controller Manager" and "etcd" dashboards in Grafana will show no data. To fix both, add the following to /etc/rancher/k3s/config.yaml on each k3s server node:

+Demo Tracing Application


 

-
-[root@r0 ~]# cat >> /etc/rancher/k3s/config.yaml << 'EOF'
-kube-controller-manager-arg:
-  - bind-address=0.0.0.0
-etcd-expose-metrics: true
-EOF
-[root@r0 ~]# systemctl restart k3s
+To actually see traces, I built a three-tier Python app. Nothing fancy — just enough to generate real distributed traces.

+

+Architecture


+

++User -> Frontend (Flask:5000) -> Middleware (Flask:5001) -> Backend (Flask:5002)
+           |                          |                        |
+                    Alloy (OTLP:4317) -> Tempo -> Grafana
 
 

-Repeat for r1 and r2. After restarting all nodes, the controller-manager metrics endpoint will be accessible and etcd metrics are available on port 2381. Prometheus can now scrape both.

+
+Frontend: receives requests at /api/process, forwards to middleware
+Middleware: transforms data at /api/transform, calls backend
+Backend: returns data at /api/data, simulates a 100ms database query
+


+OpenTelemetry Instrumentation


 

-Verify etcd metrics are exposed:

+All three services use Python OpenTelemetry libraries:

+

+Dependencies:

+

++flask==3.0.0
+requests==2.31.0
+opentelemetry-distro==0.49b0
+opentelemetry-exporter-otlp==1.28.0
+opentelemetry-instrumentation-flask==0.49b0
+opentelemetry-instrumentation-requests==0.49b0
+
+

+Auto-instrumentation pattern (same across all services, just change the service name):

 

 
-[root@r0 ~]# curl -s http://127.0.0.1:2381/metrics | grep etcd_server_has_leader
-etcd_server_has_leader 1
+from flask import Flask
+from opentelemetry import trace
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import BatchSpanProcessor
+from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
+from opentelemetry.instrumentation.flask import FlaskInstrumentor
+from opentelemetry.instrumentation.requests import RequestsInstrumentor
+from opentelemetry.sdk.resources import Resource
+
+app = Flask(__name__)
+
+# Resource attributes identify this service in Tempo and Grafana
+resource = Resource(attributes={
+    "service.name": "frontend",
+    "service.namespace": "tracing-demo",
+    "service.version": "1.0.0"
+})
+
+provider = TracerProvider(resource=resource)
+
+# Send spans to Alloy's OTLP gRPC receiver (plaintext inside the cluster)
+otlp_exporter = OTLPSpanExporter(
+    endpoint="http://alloy.monitoring.svc.cluster.local:4317",
+    insecure=True
+)
+
+# Export spans in batches instead of one request per span
+processor = BatchSpanProcessor(otlp_exporter)
+provider.add_span_processor(processor)
+trace.set_tracer_provider(provider)
+
+# Create spans for incoming Flask requests and outgoing requests calls
+FlaskInstrumentor().instrument_app(app)
+RequestsInstrumentor().instrument()
 
 

-The full persistence-values.yaml and all other Prometheus configuration files are available on Codeberg:

-

-codeberg.org/snonux/conf/f3s/prometheus

-

-The persistent volume definitions bind to specific paths on the NFS share using hostPath volumes—the same pattern used for other services in Part 7:

+The auto-instrumentation creates spans for HTTP requests, propagates trace context via W3C headers, and links parent/child spans across services automatically.
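The W3C header in question is `traceparent`, which carries a version, a 32-character trace ID, a 16-character parent span ID, and a flags byte. The following is a rough sketch of what the instrumentation libraries do with it on each hop; the helper names are hypothetical, the span ID in the example is made up, and the real libraries handle more cases (such as `tracestate`).

```python
import re
import secrets

# version - trace ID (32 hex) - parent span ID (16 hex) - flags (2 hex)
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header):
    """Split a traceparent header into its four fields, or return None if invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # All-zero IDs are invalid according to the Trace Context specification.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {"version": version, "trace_id": trace_id, "parent_id": parent_id, "flags": flags}


def child_traceparent(header):
    """Build the header a service sends downstream: same trace ID, new span ID."""
    parsed = parse_traceparent(header)
    if parsed is None:
        raise ValueError("invalid traceparent header")
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    return f"{parsed['version']}-{parsed['trace_id']}-{new_span_id}-{parsed['flags']}"


# Example: the span ID here is invented for illustration.
incoming = "00-4be1151c0bdcd5625ac7e02b98d95bd5-00f067aa0ba902b7-01"
print(parse_traceparent(incoming))
print(child_traceparent(incoming))
```

Because the trace ID stays the same on every hop while the span ID changes, Tempo can later stitch the spans from all three services into one trace.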

 

-f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments

+Deployment


 

-Exposing Grafana via ingress


+The demo app has a Helm chart in the conf repo. Build, import the container images, and install:

 

-The chart also deploys an ingress for Grafana, making it accessible at grafana.f3s.foo.zone. The ingress configuration follows the same pattern as other services in the cluster—Traefik handles the routing internally, while the OpenBSD edge relays terminate TLS and forward traffic through WireGuard.

++cd /home/paul/git/conf/f3s/tracing-demo
+just build
+just import
+just install
+
 

-Once deployed, Grafana is accessible and comes pre-configured with Prometheus as a data source. You can verify the Prometheus service is running:

+Verify:

 

-
-$ kubectl get svc -n monitoring prometheus-kube-prometheus-prometheus
-NAME                                    TYPE        CLUSTER-IP      PORT(S)
-prometheus-kube-prometheus-prometheus   ClusterIP   10.43.152.163   9090/TCP,8080/TCP
++kubectl get pods -n services | grep tracing-demo
+kubectl get ingress -n services tracing-demo-ingress
 
 

-Grafana connects to Prometheus using the internal service URL http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090. The default Grafana credentials are admin/prom-operator, which should be changed immediately after first login.

+Access at:

 

-

+http://tracing-demo.f3s.foo.zone

 

-

+Visualizing Traces in Grafana


 

-

+Searching for Traces


 

-Installing Loki and Alloy


+In Grafana, go to Explore, select the Tempo datasource, and you can search by trace ID, service name, or tags.

 

-While Prometheus handles metrics, Loki handles logs. It's designed to be cost-effective and easy to operate—it doesn't index the contents of logs, only the metadata (labels), making it very efficient for storage.

+Some useful TraceQL queries:

 

-Alloy is Grafana's telemetry collector (the successor to Promtail). It runs as a DaemonSet on each node, tails container logs, and ships them to Loki.

+Find all traces from the demo app:

++{ resource.service.namespace = "tracing-demo" }
+
 

-Prerequisites


+Find slow requests (>200ms):

++{ duration > 200ms }
+
 

-Create the data directory on the NFS server:

+Find traces from a specific service:

++{ resource.service.name = "frontend" }
+
 

-
-[root@r0 ~]# mkdir -p /data/nfs/k3svolumes/loki/data
+Find errors:

++{ status = error }
 
 

-Deploying Loki and Alloy


+Traces from the demo app that contain a server error:

++{ resource.service.namespace = "tracing-demo" } && { span.http.status_code >= 500 }
+
 

-The Loki configuration also lives in the repository:

+Service Graph


 

-codeberg.org/snonux/conf/f3s/loki

+The service graph view shows the connections between services (Frontend to Middleware to Backend) along with request rates and latencies. Grafana builds it from metrics derived from the trace data and stored in Prometheus.

 

-To install:

+Practical Example: End-to-End Trace


+

+Here's what it looks like to generate and examine a trace.

+

+Generate a trace:

+

++curl -H "Host: tracing-demo.f3s.foo.zone" http://r0/api/process
+
+

+Response (HTTP 200):

 

 
-$ cd conf/f3s/loki
-$ just install
-helm repo add grafana https://grafana.github.io/helm-charts || true
-helm repo update
-kubectl apply -f persistent-volumes.yaml
-persistentvolume/loki-data-pv created
-persistentvolumeclaim/loki-data-pvc created
-helm install loki grafana/loki --namespace monitoring -f values.yaml
-NAME: loki
-LAST DEPLOYED: ...
-NAMESPACE: monitoring
-STATUS: deployed
-...
-helm install alloy grafana/alloy --namespace monitoring -f alloy-values.yaml
-NAME: alloy
-LAST DEPLOYED: ...
-NAMESPACE: monitoring
-STATUS: deployed
+{
+  "middleware_response": {
+    "backend_data": {
+      "data": {
+        "id": 12345,
+        "query_time_ms": 100.0,
+        "timestamp": "2025-12-28T18:35:01.064538",
+        "value": "Sample data from backend service"
+      },
+      "service": "backend"
+    },
+    "middleware_processed": true,
+    "original_data": {
+      "source": "GET request"
+    },
+    "transformation_time_ms": 50
+  },
+  "request_data": {
+    "source": "GET request"
+  },
+  "service": "frontend",
+  "status": "success"
+}
 
 

-Loki runs in single-binary mode with a single replica (loki-0), which is appropriate for a home lab cluster. This means there's only one Loki pod running at any time. If the node hosting Loki fails, Kubernetes will automatically reschedule the pod to another worker node—but there will be a brief downtime (typically under a minute) while this happens. For my home lab use case, this is perfectly acceptable.

-

-For full high-availability, you'd deploy Loki in microservices mode with separate read, write, and backend components, backed by object storage like S3 or MinIO instead of local filesystem storage. That's a more complex setup that I might explore in a future blog post—but for now, the single-binary mode with NFS-backed persistence strikes the right balance between simplicity and durability.

+After a few seconds (batch export delay), search for traces via the Tempo API:

 

-Configuring Alloy


++kubectl exec -n monitoring tempo-0 -- wget -qO- \
+  'http://localhost:3200/api/search?tags=service.namespace%3Dtracing-demo&limit=5' 2>/dev/null | \
+  python3 -m json.tool
+
 

-Alloy is configured via alloy-values.yaml to discover all pods in the cluster and forward their logs to Loki:

+Returns something like:

 

 
-discovery.kubernetes "pods" {
-  role = "pod"
+{
+  "traceID": "4be1151c0bdcd5625ac7e02b98d95bd5",
+  "rootServiceName": "frontend",
+  "rootTraceName": "GET /api/process",
+  "durationMs": 221
 }
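For scripting the same lookup, a small Python sketch can build the search URL and pull the interesting fields out of the response. The function names are hypothetical, and the code assumes the full response wraps the matches in a top-level `traces` list, which the excerpt above does not show.

```python
import json
from urllib.parse import urlencode


def build_search_url(base_url, tag_key, tag_value, limit=5):
    """Build a Tempo search URL like the one passed to wget above."""
    query = urlencode({"tags": f"{tag_key}={tag_value}", "limit": limit})
    return f"{base_url}/api/search?{query}"


def summarize(response_text):
    """Return (trace ID, root service, duration in ms) for each trace in a response."""
    payload = json.loads(response_text)
    # Assumption: results are wrapped in a top-level "traces" list.
    return [
        (t["traceID"], t["rootServiceName"], t["durationMs"])
        for t in payload.get("traces", [])
    ]


print(build_search_url("http://localhost:3200", "service.namespace", "tracing-demo"))

sample = json.dumps({"traces": [{
    "traceID": "4be1151c0bdcd5625ac7e02b98d95bd5",
    "rootServiceName": "frontend",
    "rootTraceName": "GET /api/process",
    "durationMs": 221,
}]})
print(summarize(sample))
```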
+
+

+The full trace has 8 spans across 3 services:

+

++Trace ID: 4be1151c0bdcd5625ac7e02b98d95bd5
 
-discovery.relabel "pods" {
-  targets = discovery.kubernetes.pods.targets
+Service: frontend
+  GET /api/process                 221.10ms  (HTTP server span)
+  frontend-process                 216.23ms  (business logic)
+  POST                             209.97ms  (HTTP client -> middleware)
 
-  rule {
-    source_labels = ["__meta_kubernetes_namespace"]
-    target_label  = "namespace"
-  }
+Service: middleware
+  POST /api/transform              186.02ms  (HTTP server span)
+  middleware-transform             180.96ms  (business logic)
+  GET                              127.52ms  (HTTP client -> backend)
 
-  rule {
-    source_labels = ["__meta_kubernetes_pod_name"]
-    target_label  = "pod"
-  }
-
-  rule {
-    source_labels = ["__meta_kubernetes_pod_container_name"]
-    target_label  = "container"
-  }
-
-  rule {
-    source_labels = ["__meta_kubernetes_pod_label_app"]
-    target_label  = "app"
-  }
-}
-
-loki.source.kubernetes "pods" {
-  targets    = discovery.relabel.pods.output
-  forward_to = [loki.write.default.receiver]
-}
-
-loki.write "default" {
-  endpoint {
-    url = "http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push"
-  }
-}
+Service: backend
+  GET /api/data                    103.93ms  (HTTP server span)
+  backend-get-data                 102.11ms  (business logic, 100ms sleep)
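
To see where the time is actually spent, it helps to subtract each span's children from its own duration. The sketch below does that for the eight spans listed above. The parent/child links are an assumption (each span is taken to be the child of the one listed before it), since the listing only shows durations.

```python
# (span name, duration in ms, parent span name), durations copied from the listing above.
# Assumption: the spans are strictly nested in the order they are listed.
SPANS = [
    ("GET /api/process", 221.10, None),
    ("frontend-process", 216.23, "GET /api/process"),
    ("POST", 209.97, "frontend-process"),
    ("POST /api/transform", 186.02, "POST"),
    ("middleware-transform", 180.96, "POST /api/transform"),
    ("GET", 127.52, "middleware-transform"),
    ("GET /api/data", 103.93, "GET"),
    ("backend-get-data", 102.11, "GET /api/data"),
]


def self_times(spans):
    """Self time = a span's duration minus the total duration of its direct children."""
    child_total = {}
    for _name, duration, parent in spans:
        if parent is not None:
            child_total[parent] = child_total.get(parent, 0.0) + duration
    return {
        name: round(duration - child_total.get(name, 0.0), 2)
        for name, duration, _parent in spans
    }


for name, ms in self_times(SPANS).items():
    print(f"{name:<24} {ms:>7.2f} ms")
```

Under that assumption, `middleware-transform` accounts for roughly 53 ms of its own, which lines up with the 50 ms `transformation_time_ms` in the response, and `backend-get-data` keeps its full 102 ms because of the simulated 100 ms query.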
 
 

-This configuration automatically labels each log line with the namespace, pod name, container name, and app label, making it easy to filter logs in Grafana.

-

-Adding Loki as a Grafana data source


-

-Loki doesn't have its own web UI—you query it through Grafana. First, verify the Loki service is running:

+In Grafana, paste the trace ID in the Tempo search box or use TraceQL:

 

-
-$ kubectl get svc -n monitoring loki
-NAME   TYPE        CLUSTER-IP    PORT(S)
-loki   ClusterIP   10.43.64.60   3100/TCP,9095/TCP
++{ resource.service.namespace = "tracing-demo" }
 
 

-To add Loki as a data source in Grafana:

-

-
-Navigate to Configuration → Data Sources
-Click "Add data source"
-Select "Loki"
-Set the URL to: http://loki.monitoring.svc.cluster.local:3100
-Click "Save & Test"
-


-Once configured, you can explore logs in Grafana's "Explore" view. I'll show some example queries in the "Using the observability stack" section below.

-

-

-

-The complete monitoring stack


-

-After deploying everything, here's what's running in the monitoring namespace:

+The waterfall view shows the complete request flow with timing:

 

-
-$ kubectl get pods -n monitoring
-NAME                                                     READY   STATUS    RESTARTS   AGE
-alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0          42d
-alloy-g5fgj                                              2/2     Running   0          29m
-alloy-nfw8w                                              2/2     Running   0          29m
-alloy-tg9vj                                              2/2     Running   0          29m
-loki-0                                                   2/2     Running   0          25m
-prometheus-grafana-868f9dc7cf-lg2vl                      3/3     Running   0          42d
-prometheus-kube-prometheus-operator-8d7bbc48c-p4sf4      1/1     Running   0          42d
-prometheus-kube-state-metrics-7c5fb9d798-hh2fx           1/1     Running   0          42d
-prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   0          42d
-prometheus-prometheus-node-exporter-2nsg9                1/1     Running   0          42d
-prometheus-prometheus-node-exporter-mqr25                1/1     Running   0          42d
-prometheus-prometheus-node-exporter-wp4ds                1/1     Running   0          42d
-tempo-0                                                  1/1     Running   0          1d
-
+

 

-Note: Tempo (tempo-0) is deployed later in this post in the "Distributed Tracing with Grafana Tempo" section. It is included in the pod listing here for completeness.

+More Tempo trace screenshots in the X-RAG blog post:

 

-And the services:

+X-RAG Observability Hackathon

 

-
-$ kubectl get svc -n monitoring
-NAME                                      TYPE        CLUSTER-IP      PORT(S)
-alertmanager-operated                     ClusterIP   None            9093/TCP,9094/TCP
-alloy                                     ClusterIP   10.43.74.14     12345/TCP
-loki                                      ClusterIP   10.43.64.60     3100/TCP,9095/TCP
-loki-headless                             ClusterIP   None            3100/TCP
-prometheus-grafana                        ClusterIP   10.43.46.82     80/TCP
-prometheus-kube-prometheus-alertmanager   ClusterIP   10.43.208.43    9093/TCP,8080/TCP
-prometheus-kube-prometheus-operator       ClusterIP   10.43.246.121   443/TCP
-prometheus-kube-prometheus-prometheus     ClusterIP   10.43.152.163   9090/TCP,8080/TCP
-prometheus-kube-state-metrics             ClusterIP   10.43.64.26     8080/TCP
-prometheus-prometheus-node-exporter       ClusterIP   10.43.127.242   9100/TCP
-tempo                                     ClusterIP   10.43.91.44     3200/TCP,4317/TCP,4318/TCP
-
+Correlation Between Signals


 

-Let me break down what each pod does:

+This is where the observability stack really comes together. Tempo integrates with Loki and Prometheus so you can jump between traces, logs, and metrics.

 

-
-alertmanager-prometheus-kube-prometheus-alertmanager-0: the Alertmanager instance that receives alerts from Prometheus, deduplicates them, groups related alerts together, and routes notifications to the appropriate receivers (email, Slack, PagerDuty, etc.). It runs as a StatefulSet with persistent storage for silences and notification state.
-


-
-alloy-g5fgj, alloy-nfw8w, alloy-tg9vj: three Alloy pods running as a DaemonSet, one on each k3s node. Each pod tails the container logs from its local node via the Kubernetes API and forwards them to Loki. This ensures log collection continues even if a node becomes isolated from the others.
-


-
-loki-0: the single Loki instance running in single-binary mode. It receives log streams from Alloy, stores them in chunks on the NFS-backed persistent volume, and serves queries from Grafana. The -0 suffix indicates it's a StatefulSet pod.
-


-
-prometheus-grafana-...: the Grafana web interface for visualising metrics and logs. It comes pre-configured with Prometheus as a data source and includes dozens of dashboards for Kubernetes monitoring. Dashboards, users, and settings are persisted to the NFS share.
-


-
-prometheus-kube-prometheus-operator-...: the Prometheus Operator that watches for custom resources (ServiceMonitor, PodMonitor, PrometheusRule) and automatically configures Prometheus to scrape new targets. This allows applications to declare their own monitoring requirements.
-


-
-prometheus-kube-state-metrics-...: generates metrics about the state of Kubernetes objects themselves: how many pods are running, pending, or failed; deployment replica counts; node conditions; PVC status; and more. Essential for cluster-level dashboards.
-


-
-prometheus-prometheus-kube-prometheus-prometheus-0: the Prometheus server that scrapes metrics from all configured targets (pods, services, nodes), stores them in a time-series database, evaluates alerting rules, and serves queries to Grafana.
-


-
-prometheus-prometheus-node-exporter-...: three Node Exporter pods running as a DaemonSet, one on each node. They expose hardware and OS-level metrics: CPU usage, memory, disk I/O, filesystem usage, network statistics, and more. These feed the "Node Exporter" dashboards in Grafana.
-


-
-tempo-0: the Grafana Tempo instance for distributed tracing. It receives trace data from Alloy via OTLP (OpenTelemetry Protocol), stores traces on the NFS-backed persistent volume, and serves queries to Grafana. Tempo is covered in detail in the "Distributed Tracing with Grafana Tempo" section later in this post.
-


-Using the observability stack


+Traces to logs: click on any span and select "Logs for this span." Loki filters by time range, service name, namespace, and pod. Super useful for figuring out what a service was doing during a specific request.

 

-Viewing metrics in Grafana


+Traces to metrics: from a trace view, the "Metrics" tab shows Prometheus data like request rate, error rate, and duration percentiles for the services involved.

 

-The kube-prometheus-stack comes with many pre-built dashboards. Some useful ones include:

+Logs to traces: in Loki, logs containing trace IDs are automatically linked. Click the trace ID and you jump straight to the full trace in Tempo.
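
The linking relies on the trace ID being recognisable in the log line. As a small illustration, this Python sketch extracts a trace ID with a regular expression. The `trace_id=<32 hex characters>` log format is an assumption made for the example; the demo app's real log format is not shown in this post.

```python
import re

# Assumption: the application writes "trace_id=<32 hex chars>" into each log line.
TRACE_ID_RE = re.compile(r"trace_id=([0-9a-f]{32})\b")


def extract_trace_id(log_line):
    """Return the trace ID found in a log line, or None if there is none."""
    match = TRACE_ID_RE.search(log_line)
    return match.group(1) if match else None


# Hypothetical log line for illustration.
line = "INFO frontend handled /api/process trace_id=4be1151c0bdcd5625ac7e02b98d95bd5"
print(extract_trace_id(line))
```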

 

-
-Kubernetes / Compute Resources / Cluster: overview of CPU and memory usage across the cluster
-Kubernetes / Compute Resources / Namespace (Pods): resource usage by namespace
-Node Exporter / Nodes: detailed host metrics like disk I/O, network, and CPU
-


-Querying logs with LogQL


+Storage and Retention


 

-In Grafana's Explore view, select Loki as the data source and try queries like:

+With 10Gi storage and 7-day retention, the system handles moderate trace volumes. Check usage:

 

 -# All logs from the services namespace
-{namespace="services"}
-
-# Logs from pods matching a pattern
-{pod=~"miniflux.*"}
-
-# Filter by log content
-{namespace="services"} |= "error"
-
-# Parse JSON logs and filter
-{namespace="services"} | json | level="error"
+kubectl exec -n monitoring <tempo-pod> -- df -h /var/tempo
 
 

-Creating alerts


-

-Prometheus supports alerting rules that can notify you when something goes wrong. The kube-prometheus-stack includes many default alerts for common issues like high CPU usage, pod crashes, and node problems. These can be customised via PrometheusRule CRDs.

-

-Monitoring external FreeBSD hosts


-

-The observability stack can also monitor servers outside the Kubernetes cluster. The FreeBSD hosts (f0, f1, f2) that serve NFS storage can be added to Prometheus using the Node Exporter.

-

-Installing Node Exporter on FreeBSD


-

-On each FreeBSD host, install the node_exporter package:

+If storage fills up, you can reduce retention to 72h, add sampling in Alloy, or increase the PV size.
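The usage check can be turned into a simple threshold test. This is only a sketch: it assumes the Tempo container ships a df that understands the POSIX -P flag, and the helper names and the 80% default are my own choices rather than anything from the repository.

```shell
# Print the use percentage (without the % sign) from `df -P` output on stdin.
# Assumes the standard POSIX layout: header line, then one filesystem line.
df_use_percent() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit 0) when usage is at or above the threshold.
# The default threshold of 80 is an arbitrary assumption.
storage_needs_attention() {
  used="$1"
  threshold="${2:-80}"
  [ "$used" -ge "$threshold" ]
}
```

For example, pipe the output of kubectl exec -n monitoring <tempo-pod> -- df -P /var/tempo into df_use_percent and pass the result to storage_needs_attention to decide whether it is time to shorten retention or grow the volume.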

 

-
-paul@f0:~ % doas pkg install -y node_exporter
-
+Configuration Files


 

-Enable the service to start at boot:

+All config files are on Codeberg:

 

-
-paul@f0:~ % doas sysrc node_exporter_enable=YES
-node_exporter_enable:  -> YES
-
+Tempo configuration

+Alloy configuration (updated for traces)

+Demo tracing application

 

-Configure node_exporter to listen on the WireGuard interface. This ensures metrics are only accessible through the secure tunnel, not the public network. Replace the IP with the host's WireGuard address:

+Other *BSD-related posts:

 

-
-paul@f0:~ % doas sysrc node_exporter_args='--web.listen-address=192.168.2.130:9100'
-node_exporter_args:  -> --web.listen-address=192.168.2.130:9100
-
+2026-04-02 f3s: Kubernetes with FreeBSD - Part 9: GitOps with ArgoCD

+2025-12-14 f3s: Kubernetes with FreeBSD - Part 8b: Distributed Tracing with Tempo (You are currently reading this)

+2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability

+2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments

+2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage

+2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network

+2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs

+2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts

+2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation

+2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage

+2024-04-01 KISS high-availability with OpenBSD

+2024-01-13 One reason why I love OpenBSD

+2022-10-30 Installing DTail on OpenBSD

+2022-07-30 Let's Encrypt with OpenBSD and Rex

+2016-04-09 Jails and ZFS with Puppet on FreeBSD

 

-Start the service:

+E-Mail your comments to paul@nospam.buetow.org

 

-
-paul@f0:~ % doas service node_exporter start
-Starting node_exporter.
-

+

f3s: Kubernetes with FreeBSD - Part 8: Observability

-Verify it's running:
+Published at 2025-12-06T23:58:24+02:00, last updated Mon 09 Mar 09:33:08 EET 2026

- -

paul@f0:~ % curl -s http://192.168.2.130:9100/metrics | head -3
-# HELP go_gc_duration_seconds A summary of the wall-time pause...
-# TYPE go_gc_duration_seconds summary
-go_gc_duration_seconds{quantile="0"} 0
-

Adding FreeBSD hosts to Prometheus

-Create a file additional-scrape-configs.yaml in the prometheus configuration directory:
+

-- job_name: 'node-exporter'
-  static_configs:
-    - targets:
-      - '192.168.2.130:9100'  # f0 via WireGuard
-      - '192.168.2.131:9100'  # f1 via WireGuard
-      - '192.168.2.132:9100'  # f2 via WireGuard
-      labels:
-        os: freebsd
-

f3s: Kubernetes with FreeBSD - Part 8: Observability
⇢ Introduction
⇢ Important Note: GitOps Migration
⇢ Persistent storage recap
⇢ The monitoring namespace
⇢ Installing Prometheus and Grafana
⇢ ⇢ Prerequisites
⇢ ⇢ Deploying with the Justfile
⇢ ⇢ Exposing Grafana via ingress
⇢ Installing Loki and Alloy
⇢ ⇢ Prerequisites
⇢ ⇢ Deploying Loki and Alloy
⇢ ⇢ Configuring Alloy
⇢ ⇢ Adding Loki as a Grafana data source
⇢ The complete monitoring stack
⇢ Using the observability stack
⇢ ⇢ Viewing metrics in Grafana
⇢ ⇢ Querying logs with LogQL
⇢ ⇢ Creating alerts
⇢ Monitoring external FreeBSD hosts
⇢ ⇢ Installing Node Exporter on FreeBSD
⇢ ⇢ Adding FreeBSD hosts to Prometheus
⇢ ⇢ FreeBSD memory metrics compatibility
⇢ ⇢ Disk I/O metrics limitation
⇢ ZFS Monitoring for FreeBSD Servers
⇢ ⇢ Node Exporter ZFS Collector
⇢ ⇢ Verifying ZFS Metrics
⇢ ⇢ ZFS Recording Rules
⇢ ⇢ Grafana Dashboards
⇢ ⇢ Deployment
⇢ ⇢ Verifying ZFS Metrics in Prometheus
⇢ ⇢ Key Metrics to Monitor
⇢ ⇢ ZFS Pool and Dataset Metrics via Textfile Collector
⇢ Monitoring external OpenBSD hosts
⇢ ⇢ Installing Node Exporter on OpenBSD
⇢ ⇢ Adding OpenBSD hosts to Prometheus
⇢ ⇢ OpenBSD memory metrics compatibility
⇢ Summary

Introduction

-The job_name must be node-exporter to match the existing dashboards. The os: freebsd label allows filtering these hosts separately if needed.
+In this blog post, I set up a complete observability stack for the k3s cluster. Observability is crucial for understanding what's happening inside the cluster, whether it's tracking resource usage, debugging issues, or analysing application behaviour. The stack consists of five main components, all deployed into the monitoring namespace:

-Create a Kubernetes secret from this file:
+

Prometheus: time-series database for metrics collection and alerting
Grafana: visualisation and dashboarding frontend
Loki: log aggregation system (like Prometheus, but for logs)
Alloy: telemetry collector that ships logs and traces from all pods to Loki and Tempo
Tempo: distributed tracing backend for request flow analysis across microservices

+Together, these form the "PLG" stack (Prometheus, Loki, Grafana), extended here with Tempo for distributed tracing. It is a popular open-source alternative to commercial observability platforms.
+
+All manifests for the f3s stack live in my configuration repository:
+
+codeberg.org/snonux/conf/f3s
+
+

Important Note: GitOps Migration

+
+**Note:** After publishing this blog post, the f3s cluster was migrated from imperative Helm deployments to declarative GitOps using ArgoCD. The Kubernetes manifests, Helm charts, and Justfiles in the repository have been reorganized for ArgoCD-based continuous deployment.
+
+**To view the exact configuration as it existed when this blog post was written** (before the ArgoCD migration), check out the pre-ArgoCD revision:

-

$ kubectl create secret generic additional-scrape-configs \
-    --from-file=additional-scrape-configs.yaml \
-    -n monitoring
+$ git clone https://codeberg.org/snonux/conf.git
+$ cd conf
+$ git checkout 15a86f3  # Last commit before ArgoCD migration
+$ cd f3s/prometheus/
 
 

-Update persistence-values.yaml to reference the secret:

+**Current master branch** contains the ArgoCD-managed versions with:

+
+Application manifests organized under argocd-apps/{monitoring,services,infra,test}/
+Resources organized under prometheus/manifests/, loki/, etc.
+Justfiles updated to trigger ArgoCD syncs instead of direct Helm commands
+


+The deployment concepts and architecture remain the same—only the deployment method changed from imperative (helm install/upgrade) to declarative (GitOps with ArgoCD). 

 

--prometheus:
-  prometheusSpec:
-    additionalScrapeConfigsSecret:
-      enabled: true
-      name: additional-scrape-configs
-      key: additional-scrape-configs.yaml
-
+Persistent storage recap


 

-Upgrade the Prometheus deployment:

+All observability components need persistent storage so that metrics and logs survive pod restarts. As covered in Part 6 of this series, the cluster uses NFS-backed persistent volumes:

+

+f3s: Kubernetes with FreeBSD - Part 6: Storage

+

+The FreeBSD hosts (f0, f1) serve as master-standby NFS servers, exporting ZFS datasets that are replicated across hosts using zrepl. The Rocky Linux k3s nodes (r0, r1, r2) mount these exports at /data/nfs/k3svolumes. This directory contains subdirectories for each application that needs persistent storage—including Prometheus, Grafana, and Loki.

+

+For example, the observability stack uses these paths on the NFS share:

+

+
+/data/nfs/k3svolumes/prometheus/data — Prometheus time-series database
+/data/nfs/k3svolumes/grafana/data — Grafana configuration, dashboards, and plugins
+/data/nfs/k3svolumes/loki/data — Loki log chunks and index
+/data/nfs/k3svolumes/tempo/data — Tempo trace data and WAL
+


+Each path gets a corresponding PersistentVolume and PersistentVolumeClaim in Kubernetes, allowing pods to mount them as regular volumes. Because the underlying storage is ZFS with replication, we get snapshots and redundancy for free.

+

+The monitoring namespace


+

+First, I created the monitoring namespace where all observability components will live:

 

 
-$ just upgrade
+$ kubectl create namespace monitoring
+namespace/monitoring created
 
 

-After a minute or so, the FreeBSD hosts appear in the Prometheus targets and in the Node Exporter dashboards in Grafana.

+Installing Prometheus and Grafana


 

-

+Prometheus and Grafana are deployed together using the kube-prometheus-stack Helm chart from the Prometheus community. This chart bundles Prometheus, Grafana, Alertmanager, and various exporters (Node Exporter, Kube State Metrics) into a single deployment. I'll explain what each component does in detail later when we look at the running pods.

 

-FreeBSD memory metrics compatibility


+Prerequisites


 

-The default Node Exporter dashboards are designed for Linux and expect metrics like node_memory_MemAvailable_bytes. FreeBSD uses different metric names (node_memory_size_bytes, node_memory_free_bytes, etc.), so memory panels will show "No data" out of the box.

+Add the Prometheus Helm chart repository:

 

-To fix this, I created a PrometheusRule that generates synthetic Linux-compatible metrics from the FreeBSD equivalents:

+
+$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+$ helm repo update
+
 

--apiVersion: monitoring.coreos.com/v1
-kind: PrometheusRule
-metadata:
-  name: freebsd-memory-rules
-  namespace: monitoring
-  labels:
-    release: prometheus
-spec:
-  groups:
-    - name: freebsd-memory
-      rules:
-        - record: node_memory_MemTotal_bytes
-          expr: node_memory_size_bytes{os="freebsd"}
-        - record: node_memory_MemAvailable_bytes
-          expr: |
-            node_memory_free_bytes{os="freebsd"}
-              + node_memory_inactive_bytes{os="freebsd"}
-              + node_memory_cache_bytes{os="freebsd"}
-        - record: node_memory_MemFree_bytes
-          expr: node_memory_free_bytes{os="freebsd"}
-        - record: node_memory_Buffers_bytes
-          expr: node_memory_buffer_bytes{os="freebsd"}
-        - record: node_memory_Cached_bytes
-          expr: node_memory_cache_bytes{os="freebsd"}
+Create the directories on the NFS share for persistent storage. I ran this on r0, but any k3s node that mounts the share works:

+

+
+[root@r0 ~]# mkdir -p /data/nfs/k3svolumes/prometheus/data
+[root@r0 ~]# mkdir -p /data/nfs/k3svolumes/grafana/data
 
 

-This file is saved as freebsd-recording-rules.yaml and applied as part of the Prometheus installation. The os="freebsd" label (set in the scrape config) ensures these rules only apply to FreeBSD hosts. After applying, the memory panels in the Node Exporter dashboards populate correctly for FreeBSD.

+Deploying with the Justfile


 

-freebsd-recording-rules.yaml on Codeberg

+The configuration repository contains a Justfile that automates the deployment. just is a handy command runner—think of it as a simpler, more modern alternative to make. I use it throughout the f3s repository to wrap repetitive Helm and kubectl commands:

 

-Disk I/O metrics limitation


+just - A handy way to save and run project-specific commands

+codeberg.org/snonux/conf/f3s/prometheus

 

-Unlike memory metrics, disk I/O metrics (node_disk_read_bytes_total, node_disk_written_bytes_total, etc.) are not available on FreeBSD. The Linux diskstats collector that provides these metrics doesn't have a FreeBSD equivalent in the node_exporter.

+To install everything:

 

-The disk I/O panels in the Node Exporter dashboards will show "No data" for FreeBSD hosts. FreeBSD does expose ZFS-specific metrics (node_zfs_arcstats_*) for ARC cache performance, and per-dataset I/O stats are available via sysctl kstat.zfs, but mapping these to the Linux-style metrics the dashboards expect is non-trivial. To address this, I created custom ZFS-specific dashboards, covered in the next section.

+
+$ cd conf/f3s/prometheus
+$ just install
+kubectl apply -f persistent-volumes.yaml
+persistentvolume/prometheus-data-pv created
+persistentvolume/grafana-data-pv created
+persistentvolumeclaim/grafana-data-pvc created
+helm install prometheus prometheus-community/kube-prometheus-stack \
+    --namespace monitoring -f persistence-values.yaml
+NAME: prometheus
+LAST DEPLOYED: ...
+NAMESPACE: monitoring
+STATUS: deployed
+
 

-ZFS Monitoring for FreeBSD Servers


+The persistence-values.yaml configures Prometheus and Grafana to use the NFS-backed persistent volumes I mentioned earlier, ensuring data survives pod restarts. It also enables scraping of etcd and kube-controller-manager metrics:

 

-The FreeBSD servers (f0, f1, f2) that provide NFS storage to the k3s cluster have ZFS filesystems. Monitoring ZFS performance is crucial for understanding storage performance and cache efficiency.

++kubeEtcd:
+  enabled: true
+  endpoints:
+    - 192.168.2.120
+    - 192.168.2.121
+    - 192.168.2.122
+  service:
+    enabled: true
+    port: 2381
+    targetPort: 2381
+
+kubeControllerManager:
+  enabled: true
+  endpoints:
+    - 192.168.2.120
+    - 192.168.2.121
+    - 192.168.2.122
+  service:
+    enabled: true
+    port: 10257
+    targetPort: 10257
+  serviceMonitor:
+    enabled: true
+    https: true
+    insecureSkipVerify: true
+
 

-Node Exporter ZFS Collector


+By default, k3s binds the controller-manager to localhost only and doesn't expose etcd metrics, so the "Kubernetes / Controller Manager" and "etcd" dashboards in Grafana will show no data. To fix both, add the following to /etc/rancher/k3s/config.yaml on each k3s server node:

 

-The node_exporter running on each FreeBSD server (v1.9.1) includes a built-in ZFS collector that exposes metrics via sysctls. The ZFS collector is enabled by default and provides:

+
+[root@r0 ~]# cat >> /etc/rancher/k3s/config.yaml << 'EOF'
+kube-controller-manager-arg:
+  - bind-address=0.0.0.0
+etcd-expose-metrics: true
+EOF
+[root@r0 ~]# systemctl restart k3s
+
 

-
-ARC (Adaptive Replacement Cache) statistics
-Cache hit/miss rates
-Memory usage and allocation
-MRU/MFU cache breakdown
-Data vs metadata distribution
-


-Verifying ZFS Metrics


+Repeat for r1 and r2. Once k3s has been restarted on all three nodes, the controller-manager metrics endpoint is reachable from other hosts and etcd serves its metrics on port 2381, so Prometheus can scrape both.

 

-On any FreeBSD server, check that ZFS metrics are being exposed:

+Verify etcd metrics are exposed:

 

--paul@f0:~ % curl -s http://localhost:9100/metrics | grep node_zfs_arcstats | wc -l
-      69
+
+[root@r0 ~]# curl -s http://127.0.0.1:2381/metrics | grep etcd_server_has_leader
+etcd_server_has_leader 1
 
 

-The metrics are automatically scraped by Prometheus through the existing static configuration in additional-scrape-configs.yaml which targets all FreeBSD servers on port 9100 with the os: freebsd label.

+The full persistence-values.yaml and all other Prometheus configuration files are available on Codeberg:

 

-ZFS Recording Rules


+codeberg.org/snonux/conf/f3s/prometheus

 

-Created recording rules for easier dashboard consumption in zfs-recording-rules.yaml:

+The persistent volume definitions bind to specific paths on the NFS share using hostPath volumes—the same pattern used for other services in Part 7:

 

--apiVersion: monitoring.coreos.com/v1
-kind: PrometheusRule
-metadata:
-  name: freebsd-zfs-rules
-  namespace: monitoring
-  labels:
-    release: prometheus
-spec:
-  groups:
-    - name: freebsd-zfs-arc
-      interval: 30s
-      rules:
-        - record: node_zfs_arc_hit_rate_percent
-          expr: |
-            100 * (
-              rate(node_zfs_arcstats_hits_total{os="freebsd"}[5m]) /
-              (rate(node_zfs_arcstats_hits_total{os="freebsd"}[5m]) +
-               rate(node_zfs_arcstats_misses_total{os="freebsd"}[5m]))
-            )
-          labels:
-            os: freebsd
-        - record: node_zfs_arc_memory_usage_percent
-          expr: |
-            100 * (
-              node_zfs_arcstats_size_bytes{os="freebsd"} /
-              node_zfs_arcstats_c_max_bytes{os="freebsd"}
-            )
-          labels:
-            os: freebsd
-        # Additional rules for metadata %, target %, MRU/MFU %, etc.
-
+f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
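As a rough illustration of that pattern, the following sketch renders a hostPath PersistentVolume manifest for a given name and path. The render_pv helper, the 10Gi default, the access mode, the reclaim policy, and the "manual" storage class name are all assumptions made for the example; the real definitions are in persistent-volumes.yaml in the repository.

```shell
# Hypothetical generator for a hostPath PersistentVolume manifest.
# Size, access mode, reclaim policy and storage class are assumed values.
render_pv() {
  name="$1"
  path="$2"
  size="${3:-10Gi}"
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ${name}
spec:
  capacity:
    storage: ${size}
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual
  hostPath:
    path: ${path}
    type: Directory
EOF
}
```

Something like render_pv prometheus-data-pv /data/nfs/k3svolumes/prometheus/data piped into kubectl apply --dry-run=client -f - validates the result without touching the cluster.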

 

-These recording rules calculate:

+Exposing Grafana via ingress


 

-
-ARC hit rate percentage
-ARC memory usage percentage (current vs maximum)
-ARC target percentage (target vs maximum)
-Metadata vs data percentages
-MRU vs MFU cache percentages
-Demand data and metadata hit rates
-


-Grafana Dashboards


+The chart also deploys an ingress for Grafana, making it accessible at grafana.f3s.foo.zone. The ingress configuration follows the same pattern as other services in the cluster—Traefik handles the routing internally, while the OpenBSD edge relays terminate TLS and forward traffic through WireGuard.

 

-Created two comprehensive ZFS monitoring dashboards (zfs-dashboards.yaml):

+Once deployed, Grafana is accessible and comes pre-configured with Prometheus as a data source. You can verify the Prometheus service is running:

 

-**Dashboard 1: FreeBSD ZFS (per-host detailed view)**

+
+$ kubectl get svc -n monitoring prometheus-kube-prometheus-prometheus
+NAME                                    TYPE        CLUSTER-IP      PORT(S)
+prometheus-kube-prometheus-prometheus   ClusterIP   10.43.152.163   9090/TCP,8080/TCP
+
 

-Includes variables to select:

+Grafana connects to Prometheus using the internal service URL http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090. The default Grafana credentials are admin/prom-operator, which should be changed immediately after first login.

 

-
-FreeBSD server (f0, f1, or f2)
-ZFS pool (zdata, zroot, or all)
-


-Pool Overview Row:

+

 

-
-Pool Capacity gauge (with thresholds: green <70%, yellow <85%, red >85%)
-Pool Health status (ONLINE/DEGRADED/FAULTED with color coding)
-Total Pool Size stat
-Free Space stat
-Pool Space Usage Over Time (stacked: used + free)
-Pool Capacity Trend time series
-


-Dataset Statistics Row:

+

 

-
-Table showing all datasets with columns: Pool, Dataset, Used, Available, Referenced
-Automatically filters by selected pool
-


-ARC Cache Statistics Row:

+

 

-
-ARC Hit Rate gauge (red <70%, yellow <90%, green >=90%)
-ARC Size time series (current, target, max)
-ARC Memory Usage percentage gauge
-ARC Hits vs Misses rate
-ARC Data vs Metadata stacked time series
-


-**Dashboard 2: FreeBSD ZFS Summary (cluster-wide overview)**

+Installing Loki and Alloy


 

-Cluster-Wide Pool Statistics Row:

+While Prometheus handles metrics, Loki handles logs. It's designed to be cost-effective and easy to operate—it doesn't index the contents of logs, only the metadata (labels), making it very efficient for storage.

 

-
-Total Storage Capacity across all servers
-Total Used space
-Total Free space
-Average Pool Capacity gauge
-Pool Health Status (worst case across cluster)
-Total Pool Space Usage Over Time
-Per-Pool Capacity time series (all pools on all hosts)
-


-Per-Host Pool Breakdown Row:

+Alloy is Grafana's telemetry collector (the successor to Promtail). It runs as a DaemonSet on each node, tails container logs, and ships them to Loki.

 

-
-Bar gauge showing capacity by host and pool
-Table with all pools: Host, Pool, Size, Used, Free, Capacity %, Health
-


-Cluster-Wide ARC Statistics Row:

+Prerequisites


 

-
-Average ARC Hit Rate gauge across all hosts
-ARC Hit Rate by Host time series
-Total ARC Size Across Cluster
-Total ARC Hits vs Misses (cluster-wide sum)
-ARC Size by Host
-


-Dashboard Visualization:

+Create the data directory on the NFS share, here from r0:

 

-

-

-

+
+[root@r0 ~]# mkdir -p /data/nfs/k3svolumes/loki/data
+
 

-Deployment


+Deploying Loki and Alloy


 

-Applied the resources to the cluster:

+The Loki configuration also lives in the repository:

 

--cd /home/paul/git/conf/f3s/prometheus
-kubectl apply -f zfs-recording-rules.yaml
-kubectl apply -f zfs-dashboards.yaml
-
+codeberg.org/snonux/conf/f3s/loki

 

-Updated Justfile to include ZFS recording rules in install and upgrade targets:

+To install:

 

--install:
-    kubectl apply -f persistent-volumes.yaml
-    kubectl create secret generic additional-scrape-configs --from-file=additional-scrape-configs.yaml -n monitoring --dry-run=client -o yaml | kubectl apply -f -
-    helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring -f persistence-values.yaml
-    kubectl apply -f freebsd-recording-rules.yaml
-    kubectl apply -f openbsd-recording-rules.yaml
-    kubectl apply -f zfs-recording-rules.yaml
-    just -f grafana-ingress/Justfile install
+
+$ cd conf/f3s/loki
+$ just install
+helm repo add grafana https://grafana.github.io/helm-charts || true
+helm repo update
+kubectl apply -f persistent-volumes.yaml
+persistentvolume/loki-data-pv created
+persistentvolumeclaim/loki-data-pvc created
+helm install loki grafana/loki --namespace monitoring -f values.yaml
+NAME: loki
+LAST DEPLOYED: ...
+NAMESPACE: monitoring
+STATUS: deployed
+...
+helm install alloy grafana/alloy --namespace monitoring -f alloy-values.yaml
+NAME: alloy
+LAST DEPLOYED: ...
+NAMESPACE: monitoring
+STATUS: deployed
 
 

-Verifying ZFS Metrics in Prometheus


+Loki runs in single-binary mode with a single replica (loki-0), which is appropriate for a home lab cluster. This means there's only one Loki pod running at any time. If the node hosting Loki fails, Kubernetes will automatically reschedule the pod to another worker node—but there will be a brief downtime (typically under a minute) while this happens. For my home lab use case, this is perfectly acceptable.

 

-Check that ZFS metrics are being collected:

+For full high-availability, you'd deploy Loki in microservices mode with separate read, write, and backend components, backed by object storage like S3 or MinIO instead of local filesystem storage. That's a more complex setup that I might explore in a future blog post—but for now, the single-binary mode with NFS-backed persistence strikes the right balance between simplicity and durability.

 

--kubectl exec -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 -c prometheus -- \
-  wget -qO- 'http://localhost:9090/api/v1/query?query=node_zfs_arcstats_size_bytes'
-
+Configuring Alloy


 

-Check recording rules are calculating correctly:

+Alloy is configured via alloy-values.yaml to discover all pods in the cluster and forward their logs to Loki:

 

--kubectl exec -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 -c prometheus -- \
-  wget -qO- 'http://localhost:9090/api/v1/query?query=node_zfs_arc_memory_usage_percent'
+
+discovery.kubernetes "pods" {
+  role = "pod"
+}
+
+discovery.relabel "pods" {
+  targets = discovery.kubernetes.pods.targets
+
+  rule {
+    source_labels = ["__meta_kubernetes_namespace"]
+    target_label  = "namespace"
+  }
+
+  rule {
+    source_labels = ["__meta_kubernetes_pod_name"]
+    target_label  = "pod"
+  }
+
+  rule {
+    source_labels = ["__meta_kubernetes_pod_container_name"]
+    target_label  = "container"
+  }
+
+  rule {
+    source_labels = ["__meta_kubernetes_pod_label_app"]
+    target_label  = "app"
+  }
+}
+
+loki.source.kubernetes "pods" {
+  targets    = discovery.relabel.pods.output
+  forward_to = [loki.write.default.receiver]
+}
+
+loki.write "default" {
+  endpoint {
+    url = "http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push"
+  }
+}
 
 

-Example output shows memory usage percentage for each FreeBSD server:

+This configuration automatically labels each log line with the namespace, pod name, container name, and app label, making it easy to filter logs in Grafana.
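Those labels are what a LogQL stream selector matches on. The sketch below builds such a selector from key=value arguments; logql_selector is a hypothetical helper written for this example, and it does no escaping of quotes inside values.

```shell
# Build a LogQL stream selector from key=value arguments.
# The labels used (namespace, pod, container, app) are the ones set by the
# relabel rules above; values containing double quotes are not escaped.
logql_selector() {
  sel=""
  for kv in "$@"; do
    key="${kv%%=*}"
    val="${kv#*=}"
    sel="${sel:+$sel, }${key}=\"${val}\""
  done
  printf '{%s}\n' "$sel"
}
```

The result can be pasted into Grafana's Explore view, or sent to Loki's query_range HTTP endpoint with curl -G and --data-urlencode "query=$(logql_selector namespace=services)".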

 

--"result":[
-  {"metric":{"instance":"192.168.2.130:9100","os":"freebsd"},"value":[...,"37.58"]},
-  {"metric":{"instance":"192.168.2.131:9100","os":"freebsd"},"value":[...,"12.85"]},
-  {"metric":{"instance":"192.168.2.132:9100","os":"freebsd"},"value":[...,"13.44"]}
-]
+Adding Loki as a Grafana data source


+

+Loki doesn't have its own web UI—you query it through Grafana. First, verify the Loki service is running:

+

+
+$ kubectl get svc -n monitoring loki
+NAME   TYPE        CLUSTER-IP    PORT(S)
+loki   ClusterIP   10.43.64.60   3100/TCP,9095/TCP
 
 

-Key Metrics to Monitor


+To add Loki as a data source in Grafana:

 

 
-ARC Hit Rate: Should typically be above 90% for optimal performance. Lower hit rates indicate the ARC cache is too small or workload has poor locality.
-ARC Memory Usage: Shows how much of the maximum ARC size is being used. If consistently at or near maximum, the ARC is effectively utilizing available memory.
-Data vs Metadata: Typically data should dominate, but workloads with many small files will show higher metadata percentages.
-MRU vs MFU: Most Recently Used vs Most Frequently Used cache. The ratio depends on workload characteristics.
-Pool Capacity: Monitor pool usage to ensure adequate free space. ZFS performance degrades when pools exceed 80% capacity.
-Pool Health: Should always show ONLINE (green). DEGRADED (yellow) indicates a disk issue requiring attention. FAULTED (red) requires immediate action.
-Dataset Usage: Track which datasets are consuming the most space to identify growth trends and plan capacity.
+Navigate to Configuration → Data Sources
+Click "Add data source"
+Select "Loki"
+Set the URL to: http://loki.monitoring.svc.cluster.local:3100
+Click "Save & Test"
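The same data source can, in principle, be created through Grafana's HTTP API instead of the UI. This is a sketch under assumptions: loki_datasource_json is a hypothetical helper, the data source name "Loki" is my choice, and the payload carries only the minimal fields.

```shell
# Hypothetical helper: print a minimal JSON payload for Grafana's
# POST /api/datasources endpoint. The name "Loki" is an assumed choice.
loki_datasource_json() {
  url="${1:-http://loki.monitoring.svc.cluster.local:3100}"
  printf '{"name":"Loki","type":"loki","access":"proxy","url":"%s"}\n' "$url"
}
```

The payload would be sent with an authenticated curl -X POST -H 'Content-Type: application/json' -d "$(loki_datasource_json)" against the /api/datasources path of the Grafana instance.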
 


-ZFS Pool and Dataset Metrics via Textfile Collector


+Once configured, you can explore logs in Grafana's "Explore" view. I'll show some example queries in the "Using the observability stack" section below.

 

-To complement the ARC statistics from node_exporter's built-in ZFS collector, I added pool capacity and dataset metrics using the textfile collector feature.

+

 

-Created a script at /usr/local/bin/zfs_pool_metrics.sh on each FreeBSD server:

+The complete monitoring stack


 

--#!/bin/sh
-# ZFS Pool and Dataset Metrics Collector for Prometheus
-
-OUTPUT_FILE="/var/tmp/node_exporter/zfs_pools.prom.$$"
-FINAL_FILE="/var/tmp/node_exporter/zfs_pools.prom"
-
-mkdir -p /var/tmp/node_exporter
-
-{
-    # Pool metrics
-    echo "# HELP zfs_pool_size_bytes Total size of ZFS pool"
-    echo "# TYPE zfs_pool_size_bytes gauge"
-    echo "# HELP zfs_pool_allocated_bytes Allocated space in ZFS pool"
-    echo "# TYPE zfs_pool_allocated_bytes gauge"
-    echo "# HELP zfs_pool_free_bytes Free space in ZFS pool"
-    echo "# TYPE zfs_pool_free_bytes gauge"
-    echo "# HELP zfs_pool_capacity_percent Capacity percentage"
-    echo "# TYPE zfs_pool_capacity_percent gauge"
-    echo "# HELP zfs_pool_health Pool health (0=ONLINE, 1=DEGRADED, 2=FAULTED)"
-    echo "# TYPE zfs_pool_health gauge"
-
-    zpool list -Hp -o name,size,allocated,free,capacity,health | \
-    while IFS=$'\t' read name size alloc free cap health; do
-        case "$health" in
-            ONLINE)   health_val=0 ;;
-            DEGRADED) health_val=1 ;;
-            FAULTED)  health_val=2 ;;
-            *)        health_val=6 ;;
-        esac
-        cap_num=$(echo "$cap" | sed 's/%//')
-
-        echo "zfs_pool_size_bytes{pool=\"$name\"} $size"
-        echo "zfs_pool_allocated_bytes{pool=\"$name\"} $alloc"
-        echo "zfs_pool_free_bytes{pool=\"$name\"} $free"
-        echo "zfs_pool_capacity_percent{pool=\"$name\"} $cap_num"
-        echo "zfs_pool_health{pool=\"$name\"} $health_val"
-    done
-
-    # Dataset metrics
-    echo "# HELP zfs_dataset_used_bytes Used space in dataset"
-    echo "# TYPE zfs_dataset_used_bytes gauge"
-    echo "# HELP zfs_dataset_available_bytes Available space"
-    echo "# TYPE zfs_dataset_available_bytes gauge"
-    echo "# HELP zfs_dataset_referenced_bytes Referenced space"
-    echo "# TYPE zfs_dataset_referenced_bytes gauge"
-
-    zfs list -Hp -t filesystem -o name,used,available,referenced | \
-    while IFS=$'\t' read name used avail ref; do
-        pool=$(echo "$name" | cut -d/ -f1)
-        echo "zfs_dataset_used_bytes{pool=\"$pool\",dataset=\"$name\"} $used"
-        echo "zfs_dataset_available_bytes{pool=\"$pool\",dataset=\"$name\"} $avail"
-        echo "zfs_dataset_referenced_bytes{pool=\"$pool\",dataset=\"$name\"} $ref"
-    done
-} > "$OUTPUT_FILE"
-
-mv "$OUTPUT_FILE" "$FINAL_FILE"
+After deploying everything, here's what's running in the monitoring namespace:

+

+
+$ kubectl get pods -n monitoring
+NAME                                                     READY   STATUS    RESTARTS   AGE
+alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0          42d
+alloy-g5fgj                                              2/2     Running   0          29m
+alloy-nfw8w                                              2/2     Running   0          29m
+alloy-tg9vj                                              2/2     Running   0          29m
+loki-0                                                   2/2     Running   0          25m
+prometheus-grafana-868f9dc7cf-lg2vl                      3/3     Running   0          42d
+prometheus-kube-prometheus-operator-8d7bbc48c-p4sf4      1/1     Running   0          42d
+prometheus-kube-state-metrics-7c5fb9d798-hh2fx           1/1     Running   0          42d
+prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   0          42d
+prometheus-prometheus-node-exporter-2nsg9                1/1     Running   0          42d
+prometheus-prometheus-node-exporter-mqr25                1/1     Running   0          42d
+prometheus-prometheus-node-exporter-wp4ds                1/1     Running   0          42d
+tempo-0                                                  1/1     Running   0          1d
 
 

-Deployed to all FreeBSD servers:

+Note: Tempo (tempo-0) is deployed later in this post in the "Distributed Tracing with Grafana Tempo" section. It is included in the pod listing here for completeness.

 

--for host in f0 f1 f2; do
-    scp /tmp/zfs_pool_metrics.sh paul@$host:/tmp/
-    ssh paul@$host 'doas mv /tmp/zfs_pool_metrics.sh /usr/local/bin/ && \
-                    doas chmod +x /usr/local/bin/zfs_pool_metrics.sh'
-done
+And the services:

+

+
+$ kubectl get svc -n monitoring
+NAME                                      TYPE        CLUSTER-IP      PORT(S)
+alertmanager-operated                     ClusterIP   None            9093/TCP,9094/TCP
+alloy                                     ClusterIP   10.43.74.14     12345/TCP
+loki                                      ClusterIP   10.43.64.60     3100/TCP,9095/TCP
+loki-headless                             ClusterIP   None            3100/TCP
+prometheus-grafana                        ClusterIP   10.43.46.82     80/TCP
+prometheus-kube-prometheus-alertmanager   ClusterIP   10.43.208.43    9093/TCP,8080/TCP
+prometheus-kube-prometheus-operator       ClusterIP   10.43.246.121   443/TCP
+prometheus-kube-prometheus-prometheus     ClusterIP   10.43.152.163   9090/TCP,8080/TCP
+prometheus-kube-state-metrics             ClusterIP   10.43.64.26     8080/TCP
+prometheus-prometheus-node-exporter       ClusterIP   10.43.127.242   9100/TCP
+tempo                                     ClusterIP   10.43.91.44     3200/TCP,4317/TCP,4318/TCP
 
 

-Set up cron jobs to run every minute:

+Let me break down what each pod does:

 

--for host in f0 f1 f2; do
-    ssh paul@$host 'echo "* * * * * /usr/local/bin/zfs_pool_metrics.sh >/dev/null 2>&1" | \
-                    doas crontab -'
-done
-
+
+alertmanager-prometheus-kube-prometheus-alertmanager-0: the Alertmanager instance that receives alerts from Prometheus, deduplicates them, groups related alerts together, and routes notifications to the appropriate receivers (email, Slack, PagerDuty, etc.). It runs as a StatefulSet with persistent storage for silences and notification state.
+


+
+alloy-g5fgj, alloy-nfw8w, alloy-tg9vj: three Alloy pods running as a DaemonSet, one on each k3s node. Each pod tails the container logs from its local node via the Kubernetes API and forwards them to Loki. This ensures log collection continues even if a node becomes isolated from the others.
+


+
+loki-0: the single Loki instance running in single-binary mode. It receives log streams from Alloy, stores them in chunks on the NFS-backed persistent volume, and serves queries from Grafana. The -0 suffix indicates it's a StatefulSet pod.
+


+
+prometheus-grafana-...: the Grafana web interface for visualising metrics and logs. It comes pre-configured with Prometheus as a data source and includes dozens of dashboards for Kubernetes monitoring. Dashboards, users, and settings are persisted to the NFS share.
+


+
+prometheus-kube-prometheus-operator-...: the Prometheus Operator that watches for custom resources (ServiceMonitor, PodMonitor, PrometheusRule) and automatically configures Prometheus to scrape new targets. This allows applications to declare their own monitoring requirements.
+


+
+prometheus-kube-state-metrics-...: generates metrics about the state of Kubernetes objects themselves: how many pods are running, pending, or failed; deployment replica counts; node conditions; PVC status; and more. Essential for cluster-level dashboards.
+


+
+prometheus-prometheus-kube-prometheus-prometheus-0: the Prometheus server that scrapes metrics from all configured targets (pods, services, nodes), stores them in a time-series database, evaluates alerting rules, and serves queries to Grafana.
+


+
+prometheus-prometheus-node-exporter-...: three Node Exporter pods running as a DaemonSet, one on each node. They expose hardware and OS-level metrics: CPU usage, memory, disk I/O, filesystem usage, network statistics, and more. These feed the "Node Exporter" dashboards in Grafana.
+


+
+tempo-0: the Grafana Tempo instance for distributed tracing. It receives trace data from Alloy via OTLP (OpenTelemetry Protocol), stores traces on the NFS-backed persistent volume, and serves queries to Grafana. Tempo is covered in detail in the "Distributed Tracing with Grafana Tempo" section later in this post.
+


+Using the observability stack


 

-The textfile collector (already configured with --collector.textfile.directory=/var/tmp/node_exporter) automatically picks up the metrics.

+Viewing metrics in Grafana


 

-Verify metrics are being exposed:

+The kube-prometheus-stack comes with many pre-built dashboards. Some useful ones include:

+

+
+Kubernetes / Compute Resources / Cluster: overview of CPU and memory usage across the cluster
+Kubernetes / Compute Resources / Namespace (Pods): resource usage by namespace
+Node Exporter / Nodes: detailed host metrics like disk I/O, network, and CPU
+


+Querying logs with LogQL


+

+In Grafana's Explore view, select Loki as the data source and try queries like:

 

-paul@f0:~ % curl -s http://localhost:9100/metrics | grep "^zfs_pool" | head -5
-zfs_pool_allocated_bytes{pool="zdata"} 6.47622733824e+11
-zfs_pool_allocated_bytes{pool="zroot"} 5.3338578944e+10
-zfs_pool_capacity_percent{pool="zdata"} 64
-zfs_pool_capacity_percent{pool="zroot"} 10
-zfs_pool_free_bytes{pool="zdata"} 3.48809678848e+11
+# All logs from the services namespace
+{namespace="services"}
+
+# Logs from pods matching a pattern
+{pod=~"miniflux.*"}
+
+# Filter by log content
+{namespace="services"} |= "error"
+
+# Parse JSON logs and filter
+{namespace="services"} | json | level="error"
 
 

-All ZFS-related configuration files are available on Codeberg:

+Creating alerts


 

-zfs-recording-rules.yaml on Codeberg

-zfs-dashboards.yaml on Codeberg

+Prometheus supports alerting rules that can notify you when something goes wrong. The kube-prometheus-stack includes many default alerts for common issues like high CPU usage, pod crashes, and node problems. These can be customised via PrometheusRule CRDs.

 

-Monitoring external OpenBSD hosts


+Monitoring external FreeBSD hosts


 

-The same approach works for OpenBSD hosts. I have two OpenBSD edge relay servers (blowfish, fishfinger) that handle TLS termination and forward traffic through WireGuard to the cluster. These can also be monitored with Node Exporter.

+The observability stack can also monitor servers outside the Kubernetes cluster. The FreeBSD hosts (f0, f1, f2) that serve NFS storage can be added to Prometheus using the Node Exporter.

 

-Installing Node Exporter on OpenBSD


+Installing Node Exporter on FreeBSD


 

-On each OpenBSD host, install the node_exporter package:

+On each FreeBSD host, install the node_exporter package:

 

 
-blowfish:~ $ doas pkg_add node_exporter
-quirks-7.103 signed on 2025-10-13T22:55:16Z
-The following new rcscripts were installed: /etc/rc.d/node_exporter
-See rcctl(8) for details.
+paul@f0:~ % doas pkg install -y node_exporter
 
 

 Enable the service to start at boot:

@@ -6202,7 +6244,8 @@ See rcctl(8) for
 by Lorenzo Bettini
 http://www.lorenzobettini.it
 http://www.gnu.org/software/src-highlite -->
-blowfish:~ $ doas rcctl enable node_exporter
+paul@f0:~ % doas sysrc node_exporter_enable=YES
+node_exporter_enable:  -> YES
 
 

 Configure node_exporter to listen on the WireGuard interface. This ensures metrics are only accessible through the secure tunnel, not the public network. Replace the IP with the host's WireGuard address:

@@ -6211,7 +6254,8 @@ http://www.gnu.org/software/src-highlite -->
 by Lorenzo Bettini
 http://www.lorenzobettini.it
 http://www.gnu.org/software/src-highlite -->
-blowfish:~ $ doas rcctl set node_exporter flags '--web.listen-address=192.168.2.110:9100'
+paul@f0:~ % doas sysrc node_exporter_args='--web.listen-address=192.168.2.130:9100'
+node_exporter_args:  -> --web.listen-address=192.168.2.130:9100
 
 

 Start the service:

@@ -6219,758 +6263,574 @@ http://www.gnu.org/software/src-highlite -->
 
-blowfish:~ $ doas rcctl start node_exporter
-node_exporter(ok)
-
-

-Verify it's running:

-

-
-blowfish:~ $ curl -s http://192.168.2.110:9100/metrics | head -3
-# HELP go_gc_duration_seconds A summary of the wall-time pause...
-# TYPE go_gc_duration_seconds summary
-go_gc_duration_seconds{quantile="0"} 0
-
-

-Repeat for the other OpenBSD host (fishfinger) with its respective WireGuard IP (192.168.2.111).

-

-Adding OpenBSD hosts to Prometheus


-

-Update additional-scrape-configs.yaml to include the OpenBSD targets:

-

-- job_name: 'node-exporter'
-  static_configs:
-    - targets:
-      - '192.168.2.130:9100'  # f0 via WireGuard
-      - '192.168.2.131:9100'  # f1 via WireGuard
-      - '192.168.2.132:9100'  # f2 via WireGuard
-      labels:
-        os: freebsd
-    - targets:
-      - '192.168.2.110:9100'  # blowfish via WireGuard
-      - '192.168.2.111:9100'  # fishfinger via WireGuard
-      labels:
-        os: openbsd
-
-

-The os: openbsd label allows filtering these hosts separately from FreeBSD and Linux nodes.

-

-OpenBSD memory metrics compatibility


-

-OpenBSD uses the same memory metric names as FreeBSD (node_memory_size_bytes, node_memory_free_bytes, etc.), so a similar PrometheusRule is needed to generate Linux-compatible metrics:

-

-apiVersion: monitoring.coreos.com/v1
-kind: PrometheusRule
-metadata:
-  name: openbsd-memory-rules
-  namespace: monitoring
-  labels:
-    release: prometheus
-spec:
-  groups:
-    - name: openbsd-memory
-      rules:
-        - record: node_memory_MemTotal_bytes
-          expr: node_memory_size_bytes{os="openbsd"}
-          labels:
-            os: openbsd
-        - record: node_memory_MemAvailable_bytes
-          expr: |
-            node_memory_free_bytes{os="openbsd"}
-              + node_memory_inactive_bytes{os="openbsd"}
-              + node_memory_cache_bytes{os="openbsd"}
-          labels:
-            os: openbsd
-        - record: node_memory_MemFree_bytes
-          expr: node_memory_free_bytes{os="openbsd"}
-          labels:
-            os: openbsd
-        - record: node_memory_Cached_bytes
-          expr: node_memory_cache_bytes{os="openbsd"}
-          labels:
-            os: openbsd
-
-

-This file is saved as openbsd-recording-rules.yaml and applied alongside the FreeBSD rules. Note that OpenBSD doesn't expose a buffer memory metric, so that rule is omitted.

-

-openbsd-recording-rules.yaml on Codeberg

-

-After running just upgrade, the OpenBSD hosts appear in Prometheus targets and the Node Exporter dashboards.

-

-Distributed Tracing with Grafana Tempo


-

-After implementing logs (Loki) and metrics (Prometheus), the final pillar of observability is distributed tracing. Grafana Tempo provides distributed tracing capabilities that help understand request flows across microservices.

-

-For a preview of what distributed tracing with Tempo looks like in Grafana, see the X-RAG blog post:

-

-X-RAG Observability Hackathon

-

-Why Distributed Tracing?


-

-In a microservices architecture, a single user request may traverse multiple services. Distributed tracing:

-

-
-Tracks requests across service boundaries
-Identifies performance bottlenecks
-Visualizes service dependencies
-Correlates with logs and metrics
-Helps debug complex distributed systems
-


-Deploying Grafana Tempo


-

-Tempo is deployed in monolithic mode, following the same pattern as Loki's SingleBinary deployment.

-

-#### Configuration Strategy

-

-**Deployment Mode:** Monolithic (all components in one process)

-
-Simpler operation than microservices mode
-Suitable for the cluster scale
-Consistent with Loki deployment pattern
-


-**Storage:** Filesystem backend using hostPath

-
-10Gi storage at /data/nfs/k3svolumes/tempo/data
-7-day retention (168h)
-Local storage is the only option for monolithic mode
-


-**OTLP Receivers:** Standard OpenTelemetry Protocol ports

-
-gRPC: 4317
-HTTP: 4318
-Bind to 0.0.0.0 to avoid Tempo 2.7+ localhost-only binding issue
-


-#### Tempo Deployment Files

-

-Created in /home/paul/git/conf/f3s/tempo/:

-

-**values.yaml** - Helm chart configuration:

-

-tempo:
-  retention: 168h
-  storage:
-    trace:
-      backend: local
-      local:
-        path: /var/tempo/traces
-      wal:
-        path: /var/tempo/wal
-  receivers:
-    otlp:
-      protocols:
-        grpc:
-          endpoint: 0.0.0.0:4317
-        http:
-          endpoint: 0.0.0.0:4318
-
-persistence:
-  enabled: true
-  size: 10Gi
-  storageClassName: ""
-
-resources:
-  limits:
-    cpu: 1000m
-    memory: 2Gi
-  requests:
-    cpu: 500m
-    memory: 1Gi
-
-

-**persistent-volumes.yaml** - Storage configuration:

-

-apiVersion: v1
-kind: PersistentVolume
-metadata:
-  name: tempo-data-pv
-spec:
-  capacity:
-    storage: 10Gi
-  accessModes:
-    - ReadWriteOnce
-  persistentVolumeReclaimPolicy: Retain
-  hostPath:
-    path: /data/nfs/k3svolumes/tempo/data
----
-apiVersion: v1
-kind: PersistentVolumeClaim
-metadata:
-  name: tempo-data-pvc
-  namespace: monitoring
-spec:
-  storageClassName: ""
-  accessModes:
-    - ReadWriteOnce
-  resources:
-    requests:
-      storage: 10Gi
+http://www.gnu.org/software/src-highlite -->
+paul@f0:~ % doas service node_exporter start
+Starting node_exporter.
+
+

+Verify it's running:

+

+
+paul@f0:~ % curl -s http://192.168.2.130:9100/metrics | head -3
+# HELP go_gc_duration_seconds A summary of the wall-time pause...
+# TYPE go_gc_duration_seconds summary
+go_gc_duration_seconds{quantile="0"} 0
 
 

-**Grafana Datasource Provisioning**

+Repeat for the other FreeBSD hosts (f1, f2) with their respective WireGuard IPs.
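The per-host steps can also be scripted. The following is a minimal sketch that only prints the commands for review instead of running them. It assumes the WireGuard addresses from the scrape configuration in this post; the helper name node_exporter_cmds is hypothetical.

```shell
#!/bin/sh
# Print (do not run) the node_exporter setup commands for one host.
# node_exporter_cmds is a hypothetical helper name.
node_exporter_cmds() {
    host=$1
    ip=$2
    echo "ssh paul@$host doas pkg install -y node_exporter"
    echo "ssh paul@$host doas sysrc node_exporter_enable=YES"
    echo "ssh paul@$host doas sysrc node_exporter_args='--web.listen-address=$ip:9100'"
    echo "ssh paul@$host doas service node_exporter start"
}

# WireGuard addresses as used in the scrape configuration.
node_exporter_cmds f1 192.168.2.131
node_exporter_cmds f2 192.168.2.132
```

Printing first keeps the listen address for each host visible before anything changes on the servers.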

 

-All Grafana datasources (Prometheus, Alertmanager, Loki, Tempo) are provisioned via a unified ConfigMap that is directly mounted to the Grafana pod. This approach ensures datasources are loaded on startup without requiring sidecar-based discovery.

+Adding FreeBSD hosts to Prometheus


 

-In /home/paul/git/conf/f3s/prometheus/grafana-datasources-all.yaml:

+Create a file additional-scrape-configs.yaml in the prometheus configuration directory:

 

-apiVersion: v1
-kind: ConfigMap
-metadata:
-  name: grafana-datasources-all
-  namespace: monitoring
-data:
-  datasources.yaml: |
-    apiVersion: 1
-    datasources:
-      - name: Prometheus
-        type: prometheus
-        uid: prometheus
-        url: http://prometheus-kube-prometheus-prometheus.monitoring:9090/
-        access: proxy
-        isDefault: true
-      - name: Alertmanager
-        type: alertmanager
-        uid: alertmanager
-        url: http://prometheus-kube-prometheus-alertmanager.monitoring:9093/
-      - name: Loki
-        type: loki
-        uid: loki
-        url: http://loki.monitoring.svc.cluster.local:3100
-      - name: Tempo
-        type: tempo
-        uid: tempo
-        url: http://tempo.monitoring.svc.cluster.local:3200
-        jsonData:
-          tracesToLogsV2:
-            datasourceUid: loki
-            spanStartTimeShift: -1h
-            spanEndTimeShift: 1h
-          tracesToMetrics:
-            datasourceUid: prometheus
-          serviceMap:
-            datasourceUid: prometheus
-          nodeGraph:
-            enabled: true
+- job_name: 'node-exporter'
+  static_configs:
+    - targets:
+      - '192.168.2.130:9100'  # f0 via WireGuard
+      - '192.168.2.131:9100'  # f1 via WireGuard
+      - '192.168.2.132:9100'  # f2 via WireGuard
+      labels:
+        os: freebsd
 
 

-The kube-prometheus-stack Helm values (persistence-values.yaml) are configured to:

-
-Disable sidecar-based datasource provisioning
-Mount grafana-datasources-all ConfigMap directly to /etc/grafana/provisioning/datasources/
-


-This direct mounting approach is simpler and more reliable than sidecar-based discovery.

+The job_name must be node-exporter to match the existing dashboards. The os: freebsd label allows filtering these hosts separately if needed.

 

-#### Installation

+Create a Kubernetes secret from this file:

 

-cd /home/paul/git/conf/f3s/tempo
-just install
+
+$ kubectl create secret generic additional-scrape-configs \
+    --from-file=additional-scrape-configs.yaml \
+    -n monitoring
 
 

-Verify Tempo is running:

+Update persistence-values.yaml to reference the secret:

 

-kubectl get pods -n monitoring -l app.kubernetes.io/name=tempo
-kubectl exec -n monitoring <tempo-pod> -- wget -qO- http://localhost:3200/ready
+prometheus:
+  prometheusSpec:
+    additionalScrapeConfigsSecret:
+      enabled: true
+      name: additional-scrape-configs
+      key: additional-scrape-configs.yaml
 
 

-Configuring Grafana Alloy for Trace Collection


+Upgrade the Prometheus deployment:

 

-Updated /home/paul/git/conf/f3s/loki/alloy-values.yaml to add OTLP receivers for traces while maintaining existing log collection.

+
+$ just upgrade
+
 

-#### OTLP Receiver Configuration

+After a minute or so, the FreeBSD hosts appear in the Prometheus targets and in the Node Exporter dashboards in Grafana.

 

-Added to Alloy configuration after the log collection pipeline:

+

 

-// OTLP receiver for traces via gRPC and HTTP
-otelcol.receiver.otlp "default" {
-  grpc {
-    endpoint = "0.0.0.0:4317"
-  }
-  http {
-    endpoint = "0.0.0.0:4318"
-  }
-  output {
-    traces = [otelcol.processor.batch.default.input]
-  }
-}
-
-// Batch processor for efficient trace forwarding
-otelcol.processor.batch "default" {
-  timeout = "5s"
-  send_batch_size = 100
-  send_batch_max_size = 200
-  output {
-    traces = [otelcol.exporter.otlp.tempo.input]
-  }
-}
-
-// OTLP exporter to send traces to Tempo
-otelcol.exporter.otlp "tempo" {
-  client {
-    endpoint = "tempo.monitoring.svc.cluster.local:4317"
-    tls {
-      insecure = true
-    }
-    compression = "gzip"
-  }
-}
-
+FreeBSD memory metrics compatibility


 

-The batch processor reduces network overhead by accumulating spans before forwarding to Tempo.

+The default Node Exporter dashboards are designed for Linux and expect metrics like node_memory_MemAvailable_bytes. FreeBSD uses different metric names (node_memory_size_bytes, node_memory_free_bytes, etc.), so memory panels will show "No data" out of the box.

 

-#### Upgrade Alloy

+To fix this, I created a PrometheusRule that generates synthetic Linux-compatible metrics from the FreeBSD equivalents:

 

-cd /home/paul/git/conf/f3s/loki
-just upgrade
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: freebsd-memory-rules
+  namespace: monitoring
+  labels:
+    release: prometheus
+spec:
+  groups:
+    - name: freebsd-memory
+      rules:
+        - record: node_memory_MemTotal_bytes
+          expr: node_memory_size_bytes{os="freebsd"}
+        - record: node_memory_MemAvailable_bytes
+          expr: |
+            node_memory_free_bytes{os="freebsd"}
+              + node_memory_inactive_bytes{os="freebsd"}
+              + node_memory_cache_bytes{os="freebsd"}
+        - record: node_memory_MemFree_bytes
+          expr: node_memory_free_bytes{os="freebsd"}
+        - record: node_memory_Buffers_bytes
+          expr: node_memory_buffer_bytes{os="freebsd"}
+        - record: node_memory_Cached_bytes
+          expr: node_memory_cache_bytes{os="freebsd"}
 
 

-Verify OTLP receivers are listening:

+This file is saved as freebsd-recording-rules.yaml and applied as part of the Prometheus installation. The os="freebsd" label (set in the scrape config) ensures these rules only apply to FreeBSD hosts. After applying, the memory panels in the Node Exporter dashboards populate correctly for FreeBSD.
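As a worked example of the MemAvailable rule, the sketch below adds up the three FreeBSD values in the same way. The byte counts are made-up sample values, not readings from f0, and mem_available is a hypothetical helper name.

```shell
#!/bin/sh
# Sum free, inactive and cache bytes, as the MemAvailable rule does.
# mem_available is a hypothetical helper; the inputs below are made-up values.
mem_available() {
    echo $(( $1 + $2 + $3 ))
}

# 2 GiB free + 1 GiB inactive + 512 MiB cache
mem_available 2147483648 1073741824 536870912
```

With these sample inputs the sum is 3758096384 bytes, or 3.5 GiB, which is the value the Linux-oriented memory panels would then read as available memory.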

 

-kubectl logs -n monitoring -l app.kubernetes.io/name=alloy | grep -i "otlp.*receiver"
-kubectl exec -n monitoring <alloy-pod> -- netstat -ln | grep -E ':(4317|4318)'
-
+freebsd-recording-rules.yaml on Codeberg

 

-Demo Tracing Application


+Disk I/O metrics limitation


 

-Created a three-tier Python application to demonstrate distributed tracing in action.

+Unlike memory metrics, disk I/O metrics (node_disk_read_bytes_total, node_disk_written_bytes_total, etc.) are not available on FreeBSD. The Linux diskstats collector that provides these metrics doesn't have a FreeBSD equivalent in the node_exporter.

 

-#### Application Architecture

+The disk I/O panels in the Node Exporter dashboards will show "No data" for FreeBSD hosts. FreeBSD does expose ZFS-specific metrics (node_zfs_arcstats_*) for ARC cache performance, and per-dataset I/O stats are available via sysctl kstat.zfs, but mapping these to the Linux-style metrics the dashboards expect is non-trivial. To address this, I created custom ZFS-specific dashboards, covered in the next section.

 

-User → Frontend (Flask:5000) → Middleware (Flask:5001) → Backend (Flask:5002)
-           ↓                          ↓                        ↓
-                    Alloy (OTLP:4317) → Tempo → Grafana
-
+ZFS Monitoring for FreeBSD Servers


 

-Frontend Service:

+The FreeBSD servers (f0, f1, f2) that provide NFS storage to the k3s cluster have ZFS filesystems. Monitoring ZFS is crucial for understanding storage performance and cache efficiency.

 

-
-Receives HTTP requests at /api/process
-Forwards to middleware service
-Creates parent span for the entire request
-


-Middleware Service:

+Node Exporter ZFS Collector


 

-
-Transforms data at /api/transform
-Calls backend service
-Creates child span linked to frontend
-


-Backend Service:

+The node_exporter running on each FreeBSD server (v1.9.1) includes a built-in ZFS collector that exposes metrics via sysctls. The ZFS collector is enabled by default and provides:

 

 
-Returns data at /api/data
-Simulates database query (100ms sleep)
-Creates leaf span in the trace
+ARC (Adaptive Replacement Cache) statistics
+Cache hit/miss rates
+Memory usage and allocation
+MRU/MFU cache breakdown
+Data vs metadata distribution
 


-OpenTelemetry Instrumentation:

+Verifying ZFS Metrics


 

-All services use Python OpenTelemetry libraries:

+On any FreeBSD server, check that ZFS metrics are being exposed:

 

-**Dependencies:**

-flask==3.0.0
-requests==2.31.0
-opentelemetry-distro==0.49b0
-opentelemetry-exporter-otlp==1.28.0
-opentelemetry-instrumentation-flask==0.49b0
-opentelemetry-instrumentation-requests==0.49b0
+paul@f0:~ % curl -s http://localhost:9100/metrics | grep node_zfs_arcstats | wc -l
+      69
 
 

-**Auto-instrumentation pattern** (used in all services):

+Prometheus scrapes these metrics automatically through the existing static configuration in additional-scrape-configs.yaml, which targets all FreeBSD servers on port 9100 with the os: freebsd label.

 

-
-from opentelemetry import trace
-from opentelemetry.sdk.trace import TracerProvider
-from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
-from opentelemetry.instrumentation.flask import FlaskInstrumentor
-from opentelemetry.instrumentation.requests import RequestsInstrumentor
-from opentelemetry.sdk.resources import Resource
-
-# Define service identity
-resource = Resource(attributes={
-    "service.name": "frontend",
-    "service.namespace": "tracing-demo",
-    "service.version": "1.0.0"
-})
-
-provider = TracerProvider(resource=resource)
-
-# Export to Alloy
-otlp_exporter = OTLPSpanExporter(
-    endpoint="http://alloy.monitoring.svc.cluster.local:4317",
-    insecure=True
-)
-
-processor = BatchSpanProcessor(otlp_exporter)
-provider.add_span_processor(processor)
-trace.set_tracer_provider(provider)
-
-# Auto-instrument Flask and requests
-FlaskInstrumentor().instrument_app(app)
-RequestsInstrumentor().instrument()
+ZFS Recording Rules


+

+I created recording rules in zfs-recording-rules.yaml to simplify the dashboard queries:

+

+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: freebsd-zfs-rules
+  namespace: monitoring
+  labels:
+    release: prometheus
+spec:
+  groups:
+    - name: freebsd-zfs-arc
+      interval: 30s
+      rules:
+        - record: node_zfs_arc_hit_rate_percent
+          expr: |
+            100 * (
+              rate(node_zfs_arcstats_hits_total{os="freebsd"}[5m]) /
+              (rate(node_zfs_arcstats_hits_total{os="freebsd"}[5m]) +
+               rate(node_zfs_arcstats_misses_total{os="freebsd"}[5m]))
+            )
+          labels:
+            os: freebsd
+        - record: node_zfs_arc_memory_usage_percent
+          expr: |
+            100 * (
+              node_zfs_arcstats_size_bytes{os="freebsd"} /
+              node_zfs_arcstats_c_max_bytes{os="freebsd"}
+            )
+          labels:
+            os: freebsd
+        # Additional rules for metadata %, target %, MRU/MFU %, etc.
 
 

-The auto-instrumentation automatically:

+These recording rules calculate:

+

 
-Creates spans for HTTP requests
-Propagates trace context via W3C Trace Context headers
-Links parent and child spans across service boundaries
+ARC hit rate percentage
+ARC memory usage percentage (current vs maximum)
+ARC target percentage (target vs maximum)
+Metadata vs data percentages
+MRU vs MFU cache percentages
+Demand data and metadata hit rates
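To make the hit-rate expression concrete, here is a small sketch that applies the same formula, 100 * hits / (hits + misses), to two per-second rates. The helper name arc_hit_rate and the input numbers are assumptions for illustration only.

```shell
#!/bin/sh
# arc_hit_rate HITS_PER_SEC MISSES_PER_SEC
# Hypothetical helper mirroring the hit-rate expression:
# 100 * hits / (hits + misses).
arc_hit_rate() {
    LC_ALL=C awk -v h="$1" -v m="$2" 'BEGIN {
        if (h + m == 0) { print "NaN"; exit }   # no ARC traffic in the window
        printf "%.2f\n", 100 * h / (h + m)
    }'
}

arc_hit_rate 950 50   # made-up rates
```

When there is no ARC traffic in the window, PromQL evaluates 0/0 to NaN, which the sketch mimics rather than reporting a misleading 0%.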
 


-Deployment:

-

-Created Helm chart in /home/paul/git/conf/f3s/tracing-demo/ with three separate deployments, services, and an ingress.

+Grafana Dashboards


 

-Build and deploy:

+I created two ZFS monitoring dashboards, both defined in zfs-dashboards.yaml:

 

-cd /home/paul/git/conf/f3s/tracing-demo
-just build
-just import
-just install
-
+**Dashboard 1: FreeBSD ZFS (per-host detailed view)**

 

-Verify deployment:

+Includes variables to select:

 

-kubectl get pods -n services | grep tracing-demo
-kubectl get ingress -n services tracing-demo-ingress
-
+
+FreeBSD server (f0, f1, or f2)
+ZFS pool (zdata, zroot, or all)
+


+Pool Overview Row:

 

-Access the application at:

+
+Pool Capacity gauge (with thresholds: green <70%, yellow <85%, red >=85%)
+Pool Health status (ONLINE/DEGRADED/FAULTED with color coding)
+Total Pool Size stat
+Free Space stat
+Pool Space Usage Over Time (stacked: used + free)
+Pool Capacity Trend time series
+
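The capacity thresholds of the gauge can be expressed as a small classification function. This is only a sketch of the mapping, not the dashboard definition itself; capacity_color is a hypothetical helper, and treating exactly 85% as red is an assumption.

```shell
#!/bin/sh
# capacity_color PERCENT
# Hypothetical helper: maps an integer pool capacity to the gauge colours
# of the Pool Capacity panel, treating exactly 85% as red (an assumption).
capacity_color() {
    if [ "$1" -lt 70 ]; then
        echo green
    elif [ "$1" -lt 85 ]; then
        echo yellow
    else
        echo red
    fi
}

capacity_color 64   # capacity of zdata in the example metrics of this post
```

The yellow band leaves room to react before a pool passes the 80% mark mentioned under the key metrics and then turns red.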


+Dataset Statistics Row:

 

-http://tracing-demo.f3s.buetow.org

+
+Table showing all datasets with columns: Pool, Dataset, Used, Available, Referenced
+Automatically filters by selected pool
+


+ARC Cache Statistics Row:

 

-Visualizing Traces in Grafana


+
+ARC Hit Rate gauge (red <70%, yellow <90%, green >=90%)
+ARC Size time series (current, target, max)
+ARC Memory Usage percentage gauge
+ARC Hits vs Misses rate
+ARC Data vs Metadata stacked time series
+


+**Dashboard 2: FreeBSD ZFS Summary (cluster-wide overview)**

 

-The Tempo datasource is automatically discovered by Grafana through the ConfigMap label.

+Cluster-Wide Pool Statistics Row:

 

-#### Accessing Traces

+
+Total Storage Capacity across all servers
+Total Used space
+Total Free space
+Average Pool Capacity gauge
+Pool Health Status (worst case across cluster)
+Total Pool Space Usage Over Time
+Per-Pool Capacity time series (all pools on all hosts)
+


+Per-Host Pool Breakdown Row:

 

-Navigate to Grafana → Explore → Select "Tempo" datasource

+
+Bar gauge showing capacity by host and pool
+Table with all pools: Host, Pool, Size, Used, Free, Capacity %, Health
+


+Cluster-Wide ARC Statistics Row:

 

-**Search Interface:**

 
-Search by Trace ID
-Search by service name
-Search by tags
+Average ARC Hit Rate gauge across all hosts
+ARC Hit Rate by Host time series
+Total ARC Size Across Cluster
+Total ARC Hits vs Misses (cluster-wide sum)
+ARC Size by Host
 


-**TraceQL Queries:**

+Dashboard Visualization:

 

-Find all traces from demo app:

-{ resource.service.namespace = "tracing-demo" }
-
+

+

+

 

-Find slow requests (>200ms):

-{ duration > 200ms }
-
+Deployment


 

-Find traces from specific service:

-{ resource.service.name = "frontend" }
-
+Applied the resources to the cluster:

 

-Find errors:

-{ status = error }
+cd /home/paul/git/conf/f3s/prometheus
+kubectl apply -f zfs-recording-rules.yaml
+kubectl apply -f zfs-dashboards.yaml
 
 

-Complex query - frontend traces calling middleware:

+I updated the Justfile to include the ZFS recording rules in the install and upgrade targets:

+

-{ resource.service.namespace = "tracing-demo" } && { span.http.status_code >= 500 }
+install:
+    kubectl apply -f persistent-volumes.yaml
+    kubectl create secret generic additional-scrape-configs --from-file=additional-scrape-configs.yaml -n monitoring --dry-run=client -o yaml | kubectl apply -f -
+    helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring -f persistence-values.yaml
+    kubectl apply -f freebsd-recording-rules.yaml
+    kubectl apply -f openbsd-recording-rules.yaml
+    kubectl apply -f zfs-recording-rules.yaml
+    just -f grafana-ingress/Justfile install
 
 

-#### Service Graph Visualization

-

-The service graph shows visual connections between services:

-

-1. Navigate to Explore → Tempo

-2. Enable "Service Graph" view

-3. Shows: Frontend → Middleware → Backend with request rates

-

-The service graph uses Prometheus metrics generated from trace data.

-

-Correlation Between Observability Signals


-

-Tempo integrates with Loki and Prometheus to provide unified observability.

-

-#### Traces-to-Logs

-

-Click on any span in a trace to see related logs:

-

-1. View trace in Grafana

-2. Click on a span

-3. Select "Logs for this span"

-4. Loki shows logs filtered by:

-   * Time range (span duration ± 1 hour)

-   * Service name

-   * Namespace

-   * Pod

-

-This helps correlate what the service was doing when the span was created.

-

-#### Traces-to-Metrics

-

-View Prometheus metrics for services in the trace:

-

-1. View trace in Grafana

-2. Select "Metrics" tab

-3. Shows metrics like:

-   * Request rate

-   * Error rate

-   * Duration percentiles

-

-#### Logs-to-Traces

-

-From logs, you can jump to related traces:

+Verifying ZFS Metrics in Prometheus


 

-1. In Loki, logs that contain trace IDs are automatically linked

-2. Click the trace ID to view the full trace

-3. See the complete request flow

+Check that ZFS metrics are being collected:

 

-Generating Traces for Testing


+kubectl exec -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 -c prometheus -- \
+  wget -qO- 'http://localhost:9090/api/v1/query?query=node_zfs_arcstats_size_bytes'
+
 

-Test the demo application:

+Check recording rules are calculating correctly:

 

-curl http://tracing-demo.f3s.buetow.org/api/process
+kubectl exec -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 -c prometheus -- \
+  wget -qO- 'http://localhost:9090/api/v1/query?query=node_zfs_arc_memory_usage_percent'
 
 

-Load test (generates 50 traces):

+Example output shows memory usage percentage for each FreeBSD server:

 

-cd /home/paul/git/conf/f3s/tracing-demo
-just load-test
+"result":[
+  {"metric":{"instance":"192.168.2.130:9100","os":"freebsd"},"value":[...,"37.58"]},
+  {"metric":{"instance":"192.168.2.131:9100","os":"freebsd"},"value":[...,"12.85"]},
+  {"metric":{"instance":"192.168.2.132:9100","os":"freebsd"},"value":[...,"13.44"]}
+]
 
 

-Each request creates a distributed trace spanning all three services.

+Key Metrics to Monitor


 

-Verifying the Complete Pipeline


+
+ARC Hit Rate: Should typically be above 90% for optimal performance. Lower hit rates suggest that the ARC is too small or that the workload has poor locality.
+ARC Memory Usage: Shows how much of the maximum ARC size is being used. If consistently at or near maximum, the ARC is effectively utilizing available memory.
+Data vs Metadata: Typically data should dominate, but workloads with many small files will show higher metadata percentages.
+MRU vs MFU: Most Recently Used vs Most Frequently Used cache. The ratio depends on workload characteristics.
+Pool Capacity: Monitor pool usage to ensure adequate free space. ZFS performance degrades when pools exceed 80% capacity.
+Pool Health: Should always show ONLINE (green). DEGRADED (yellow) indicates a disk issue requiring attention. FAULTED (red) requires immediate action.
+Dataset Usage: Track which datasets are consuming the most space to identify growth trends and plan capacity.
+


+ZFS Pool and Dataset Metrics via Textfile Collector


 

-Check the trace flow end-to-end:

+To complement the ARC statistics from node_exporter's built-in ZFS collector, I added pool capacity and dataset metrics using the textfile collector feature.

 

-**1. Application generates traces:**

-kubectl logs -n services -l app=tracing-demo-frontend | grep -i trace
-
+I created a script at /usr/local/bin/zfs_pool_metrics.sh on each FreeBSD server:

 

-**2. Alloy receives traces:**

-kubectl logs -n monitoring -l app.kubernetes.io/name=alloy | grep -i otlp
+#!/bin/sh
+# ZFS Pool and Dataset Metrics Collector for Prometheus
+
+OUTPUT_FILE="/var/tmp/node_exporter/zfs_pools.prom.$$"
+FINAL_FILE="/var/tmp/node_exporter/zfs_pools.prom"
+
+mkdir -p /var/tmp/node_exporter
+
+{
+    # Pool metrics
+    echo "# HELP zfs_pool_size_bytes Total size of ZFS pool"
+    echo "# TYPE zfs_pool_size_bytes gauge"
+    echo "# HELP zfs_pool_allocated_bytes Allocated space in ZFS pool"
+    echo "# TYPE zfs_pool_allocated_bytes gauge"
+    echo "# HELP zfs_pool_free_bytes Free space in ZFS pool"
+    echo "# TYPE zfs_pool_free_bytes gauge"
+    echo "# HELP zfs_pool_capacity_percent Capacity percentage"
+    echo "# TYPE zfs_pool_capacity_percent gauge"
+    echo "# HELP zfs_pool_health Pool health (0=ONLINE, 1=DEGRADED, 2=FAULTED, 6=other)"
+    echo "# TYPE zfs_pool_health gauge"
+
+    zpool list -Hp -o name,size,allocated,free,capacity,health | \
+    while IFS=$'\t' read -r name size alloc free cap health; do
+        case "$health" in
+            ONLINE)   health_val=0 ;;
+            DEGRADED) health_val=1 ;;
+            FAULTED)  health_val=2 ;;
+            *)        health_val=6 ;;
+        esac
+        cap_num=$(echo "$cap" | sed 's/%//')
+
+        echo "zfs_pool_size_bytes{pool=\"$name\"} $size"
+        echo "zfs_pool_allocated_bytes{pool=\"$name\"} $alloc"
+        echo "zfs_pool_free_bytes{pool=\"$name\"} $free"
+        echo "zfs_pool_capacity_percent{pool=\"$name\"} $cap_num"
+        echo "zfs_pool_health{pool=\"$name\"} $health_val"
+    done
+
+    # Dataset metrics
+    echo "# HELP zfs_dataset_used_bytes Used space in dataset"
+    echo "# TYPE zfs_dataset_used_bytes gauge"
+    echo "# HELP zfs_dataset_available_bytes Available space"
+    echo "# TYPE zfs_dataset_available_bytes gauge"
+    echo "# HELP zfs_dataset_referenced_bytes Referenced space"
+    echo "# TYPE zfs_dataset_referenced_bytes gauge"
+
+    zfs list -Hp -t filesystem -o name,used,available,referenced | \
+    while IFS=$'\t' read name used avail ref; do
+        pool=$(echo "$name" | cut -d/ -f1)
+        echo "zfs_dataset_used_bytes{pool=\"$pool\",dataset=\"$name\"} $used"
+        echo "zfs_dataset_available_bytes{pool=\"$pool\",dataset=\"$name\"} $avail"
+        echo "zfs_dataset_referenced_bytes{pool=\"$pool\",dataset=\"$name\"} $ref"
+    done
+} > "$OUTPUT_FILE"
+
+mv "$OUTPUT_FILE" "$FINAL_FILE"
 
 

-**3. Tempo stores traces:**

+I deployed it to all FreeBSD servers:

+

 -kubectl logs -n monitoring -l app.kubernetes.io/name=tempo | grep -i trace
+for host in f0 f1 f2; do
+    scp /tmp/zfs_pool_metrics.sh paul@$host:/tmp/
+    ssh paul@$host 'doas mv /tmp/zfs_pool_metrics.sh /usr/local/bin/ && \
+                    doas chmod +x /usr/local/bin/zfs_pool_metrics.sh'
+done
 
 

-**4. Grafana displays traces:**

-Navigate to Explore → Tempo → Search for traces

+I set up a cron job on each host to run the script every minute:

 

-Practical Example: Viewing a Distributed Trace


+for host in f0 f1 f2; do
+    ssh paul@$host 'echo "* * * * * /usr/local/bin/zfs_pool_metrics.sh >/dev/null 2>&1" | \
+                    doas crontab -'
+done
+
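+One caveat: crontab - replaces the whole crontab of the user it runs as, so any existing entries are lost. The following is a minimal sketch of a more careful variant, not what I deployed. The helper name add_cron_entry is hypothetical; it reads an existing crontab on stdin and appends the entry only if that exact line is missing.

```shell
#!/bin/sh
# Hypothetical helper: print the crontab read from stdin, followed by the
# given entry unless an identical line is already present.
add_cron_entry() {
    entry=$1
    awk -v e="$entry" '
        { print }                      # pass existing lines through unchanged
        $0 == e { seen = 1 }           # remember if the entry already exists
        END { if (!seen) print e }     # append it only once
    '
}

# Example usage on a FreeBSD host (run as a user allowed to use doas):
# entry='* * * * * /usr/local/bin/zfs_pool_metrics.sh >/dev/null 2>&1'
# doas crontab -l 2>/dev/null | add_cron_entry "$entry" | doas crontab -
```

+Running it a second time leaves the crontab unchanged, which makes the deployment loop safe to repeat.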
 

-Let's generate a trace and examine it in Grafana.

+The textfile collector (already configured with --collector.textfile.directory=/var/tmp/node_exporter) reads the .prom file on each scrape, so the new metrics show up without restarting node_exporter.
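+Because the cron job discards all output, a quick format check of the generated file can be handy. This is a minimal sketch and not part of my setup: the helper name check_prom_file is hypothetical, and it only checks that every non-comment line has the shape name{labels} value. It does not handle label values that contain a closing brace.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every non-comment, non-empty line
# of a textfile-collector file looks like: name{labels} value
check_prom_file() {
    awk '
        /^#/ || /^[ \t]*$/ { next }    # skip HELP/TYPE comments and blank lines
        !/^[a-zA-Z_:][a-zA-Z0-9_:]*(\{[^}]*\})? [-+0-9.eE]+$/ { bad++ }
        END { exit (bad > 0) }         # non-zero exit if any line was malformed
    ' "$1"
}

# Example usage on a FreeBSD host (path taken from the script):
# check_prom_file /var/tmp/node_exporter/zfs_pools.prom && echo "format ok"
```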

 

-**1. Generate a trace by calling the demo application:**

+Verify metrics are being exposed:

 

 -curl -H "Host: tracing-demo.f3s.buetow.org" http://r0/api/process
+paul@f0:~ % curl -s http://localhost:9100/metrics | grep "^zfs_pool" | head -5
+zfs_pool_allocated_bytes{pool="zdata"} 6.47622733824e+11
+zfs_pool_allocated_bytes{pool="zroot"} 5.3338578944e+10
+zfs_pool_capacity_percent{pool="zdata"} 64
+zfs_pool_capacity_percent{pool="zroot"} 10
+zfs_pool_free_bytes{pool="zdata"} 3.48809678848e+11
 
 

-**Response (HTTP 200):**

-

-
-{
-  "middleware_response": {
-    "backend_data": {
-      "data": {
-        "id": 12345,
-        "query_time_ms": 100.0,
-        "timestamp": "2025-12-28T18:35:01.064538",
-        "value": "Sample data from backend service"
-      },
-      "service": "backend"
-    },
-    "middleware_processed": true,
-    "original_data": {
-      "source": "GET request"
-    },
-    "transformation_time_ms": 50
-  },
-  "request_data": {
-    "source": "GET request"
-  },
-  "service": "frontend",
-  "status": "success"
-}
-
+All ZFS-related configuration files are available on Codeberg:

 

-**2. Find the trace in Tempo via API:**

+zfs-recording-rules.yaml on Codeberg

+zfs-dashboards.yaml on Codeberg

 

-After a few seconds (for batch export), search for recent traces:

+Monitoring external OpenBSD hosts


 

--kubectl exec -n monitoring tempo-0 -- wget -qO- \
-  'http://localhost:3200/api/search?tags=service.namespace%3Dtracing-demo&limit=5' 2>/dev/null | \
-  python3 -m json.tool
-
+The same approach works for OpenBSD hosts. I have two OpenBSD edge relay servers (blowfish, fishfinger) that handle TLS termination and forward traffic through WireGuard to the cluster. These can also be monitored with Node Exporter.

+

+Installing Node Exporter on OpenBSD


 

-Returns traces including:

+On each OpenBSD host, install the node_exporter package:

 

 
-{
-  "traceID": "4be1151c0bdcd5625ac7e02b98d95bd5",
-  "rootServiceName": "frontend",
-  "rootTraceName": "GET /api/process",
-  "durationMs": 221
-}
+blowfish:~ $ doas pkg_add node_exporter
+quirks-7.103 signed on 2025-10-13T22:55:16Z
+The following new rcscripts were installed: /etc/rc.d/node_exporter
+See rcctl(8) for details.
 
 

-**3. Fetch complete trace details:**

+Enable the service to start at boot:

 

--kubectl exec -n monitoring tempo-0 -- wget -qO- \
-  'http://localhost:3200/api/traces/4be1151c0bdcd5625ac7e02b98d95bd5' 2>/dev/null | \
-  python3 -m json.tool
+
+blowfish:~ $ doas rcctl enable node_exporter
 
 

-**Trace structure (8 spans across 3 services):**

+Configure node_exporter to listen on the WireGuard interface. This ensures metrics are only accessible through the secure tunnel, not the public network. Replace the IP with the host's WireGuard address:

 

--Trace ID: 4be1151c0bdcd5625ac7e02b98d95bd5
-Services: 3 (frontend, middleware, backend)
-
-Service: frontend
-  └─ GET /api/process                 221.10ms  (HTTP server span)
-  └─ frontend-process                 216.23ms  (custom business logic span)
-  └─ POST                             209.97ms  (HTTP client span to middleware)
-
-Service: middleware
-  └─ POST /api/transform              186.02ms  (HTTP server span)
-  └─ middleware-transform             180.96ms  (custom business logic span)
-  └─ GET                              127.52ms  (HTTP client span to backend)
-
-Service: backend
-  └─ GET /api/data                    103.93ms  (HTTP server span)
-  └─ backend-get-data                 102.11ms  (custom business logic span with 100ms sleep)
+
+blowfish:~ $ doas rcctl set node_exporter flags '--web.listen-address=192.168.2.110:9100'
 
 

-**4. View the trace in Grafana UI:**

-

-Navigate to: Grafana → Explore → Tempo datasource

-

-Search using TraceQL:

--{ resource.service.namespace = "tracing-demo" }
-
+Start the service:

 

-Or directly open the trace by pasting the trace ID in the search box:

--4be1151c0bdcd5625ac7e02b98d95bd5
+
+blowfish:~ $ doas rcctl start node_exporter
+node_exporter(ok)
 
 

-**5. Trace visualization:**

-

-The trace waterfall view in Grafana shows the complete request flow with timing:

-

-

-

-For additional examples of Tempo trace visualization, see also:

+Verify it's running:

 

-X-RAG Observability Hackathon (more Grafana Tempo screenshots)

+
+blowfish:~ $ curl -s http://192.168.2.110:9100/metrics | head -3
+# HELP go_gc_duration_seconds A summary of the wall-time pause...
+# TYPE go_gc_duration_seconds summary
+go_gc_duration_seconds{quantile="0"} 0
+
 

-The trace reveals the distributed request flow:

+Repeat for the other OpenBSD host (fishfinger) with its respective WireGuard IP (192.168.2.111).
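+The per-host steps can be summarised in a small dry-run sketch. It only prints the rcctl commands from this section for each host and does not execute anything. The function name node_exporter_cmds is hypothetical; the host names and WireGuard addresses are the ones given above.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the rcctl commands from this section
# for one OpenBSD host, given its WireGuard address.
node_exporter_cmds() {
    ip=$1
    echo "doas rcctl enable node_exporter"
    echo "doas rcctl set node_exporter flags '--web.listen-address=${ip}:9100'"
    echo "doas rcctl start node_exporter"
}

# Host/IP pairs as given in the text.
for pair in blowfish:192.168.2.110 fishfinger:192.168.2.111; do
    host=${pair%%:*}    # part before the colon
    ip=${pair#*:}       # part after the colon
    echo "# $host"
    node_exporter_cmds "$ip"
done
```

+The printed commands can be reviewed first and then run in a shell on the respective host.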

 

-
-Frontend (221ms): Receives GET /api/process, executes business logic, calls middleware
-Middleware (186ms): Receives POST /api/transform, transforms data, calls backend
-Backend (104ms): Receives GET /api/data, simulates database query with 100ms sleep
-Total request time: 221ms end-to-end
-Span propagation: W3C Trace Context headers automatically link all spans
-


-**6. Service graph visualization:**

+Adding OpenBSD hosts to Prometheus


 

-The service graph is automatically generated from traces and shows service dependencies. For examples of service graph visualization in Grafana, see the screenshots in the X-RAG Observability Hackathon blog post.

+Update additional-scrape-configs.yaml to include the OpenBSD targets:

 

-X-RAG Observability Hackathon (includes service graph screenshots)

+- job_name: 'node-exporter'
+  static_configs:
+    - targets:
+      - '192.168.2.130:9100'  # f0 via WireGuard
+      - '192.168.2.131:9100'  # f1 via WireGuard
+      - '192.168.2.132:9100'  # f2 via WireGuard
+      labels:
+        os: freebsd
+    - targets:
+      - '192.168.2.110:9100'  # blowfish via WireGuard
+      - '192.168.2.111:9100'  # fishfinger via WireGuard
+      labels:
+        os: openbsd
+
 

-This visualization helps identify:

+The os: openbsd label allows filtering these hosts separately from FreeBSD and Linux nodes.
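+As an illustration of that filtering, here is a small sketch that builds a PromQL selector on the os label and sends it to the Prometheus HTTP API. The function names are hypothetical, and the default PROM_URL is an assumption; it must point to wherever Prometheus is reachable, for example through a kubectl port-forward.

```shell
#!/bin/sh
# Build a PromQL selector for the node-exporter job, filtered by OS label.
os_selector() {
    printf 'up{job="node-exporter",os="%s"}' "$1"
}

# Query the Prometheus HTTP API for that selector.
# PROM_URL is an assumption; adjust it to the actual Prometheus address.
query_os() {
    curl -sG "${PROM_URL:-http://localhost:9090}/api/v1/query" \
        --data-urlencode "query=$(os_selector "$1")"
}

# Example usage:
# query_os openbsd
```

+The same selector works in the Grafana Explore view or in dashboard variables.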

 

-
-Request rates between services
-Average latency for each hop
-Error rates (if any)
-Service dependencies and communication patterns
-


-Storage and Retention


+OpenBSD memory metrics compatibility


 

-Monitor Tempo storage usage:

+OpenBSD uses the same memory metric names as FreeBSD (node_memory_size_bytes, node_memory_free_bytes, etc.), so a similar PrometheusRule is needed to generate Linux-compatible metrics:

 

 -kubectl exec -n monitoring <tempo-pod> -- df -h /var/tempo
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: openbsd-memory-rules
+  namespace: monitoring
+  labels:
+    release: prometheus
+spec:
+  groups:
+    - name: openbsd-memory
+      rules:
+        - record: node_memory_MemTotal_bytes
+          expr: node_memory_size_bytes{os="openbsd"}
+          labels:
+            os: openbsd
+        - record: node_memory_MemAvailable_bytes
+          expr: |
+            node_memory_free_bytes{os="openbsd"}
+              + node_memory_inactive_bytes{os="openbsd"}
+              + node_memory_cache_bytes{os="openbsd"}
+          labels:
+            os: openbsd
+        - record: node_memory_MemFree_bytes
+          expr: node_memory_free_bytes{os="openbsd"}
+          labels:
+            os: openbsd
+        - record: node_memory_Cached_bytes
+          expr: node_memory_cache_bytes{os="openbsd"}
+          labels:
+            os: openbsd
 
 

-With 10Gi storage and 7-day retention, the system handles moderate trace volumes. If storage fills up:

-

-
-Reduce retention to 72h (3 days)
-Implement sampling in Alloy
-Increase PV size
-


-Configuration Files


+This file is saved as openbsd-recording-rules.yaml and applied alongside the FreeBSD rules. Note that OpenBSD doesn't expose a buffer memory metric, so that rule is omitted.
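+As a worked example of the MemAvailable rule, the following sketch adds up the three input metrics. The byte values are made up for illustration and are not measurements from blowfish or fishfinger.

```shell
#!/bin/sh
# Illustrative values only; not taken from a real host.
free=536870912       # node_memory_free_bytes     (512 MiB)
inactive=268435456   # node_memory_inactive_bytes (256 MiB)
cache=134217728      # node_memory_cache_bytes    (128 MiB)

# Same sum as the node_memory_MemAvailable_bytes recording rule.
mem_available=$((free + inactive + cache))
echo "node_memory_MemAvailable_bytes = $mem_available"
```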

 

-All configuration files are available on Codeberg:

+openbsd-recording-rules.yaml on Codeberg

 

-Tempo configuration

-Alloy configuration (updated for traces)

-Demo tracing application

+After running just upgrade, the OpenBSD hosts appear in Prometheus targets and the Node Exporter dashboards.

 

 Summary


 

-With Prometheus, Grafana, Loki, Alloy, and Tempo deployed, I now have complete visibility into the k3s cluster, the FreeBSD storage servers, and the OpenBSD edge relays:

+With Prometheus, Grafana, Loki, and Alloy deployed, I now have visibility into the k3s cluster, the FreeBSD storage servers, and the OpenBSD edge relays:

 

 
 Metrics: Prometheus collects and stores time-series data from all components, including etcd and ZFS
 Logs: Loki aggregates logs from all containers, searchable via Grafana
-Traces: Tempo provides distributed request tracing with service dependency mapping
-Visualisation: Grafana provides dashboards and exploration tools with correlation between all three signals
+Visualisation: Grafana provides dashboards and exploration tools
 Alerting: Alertmanager can notify on conditions defined in Prometheus rules
 


-This observability stack runs entirely on the home lab infrastructure, with data persisted to the NFS share. It's lightweight enough for a three-node cluster but provides the same capabilities as production-grade setups.

+The next part covers the final pillar of observability: distributed tracing with Grafana Tempo.

+

+Part 8b: Distributed Tracing with Tempo

 

 All configuration files are available on Codeberg:

 

 Prometheus, Grafana, and recording rules configuration

 Loki and Alloy configuration

-Tempo configuration

-Demo tracing application

 

 Other *BSD-related posts:

 

 2026-04-02 f3s: Kubernetes with FreeBSD - Part 9: GitOps with ArgoCD

+2025-12-14 f3s: Kubernetes with FreeBSD - Part 8b: Distributed Tracing with Tempo

 2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability (You are currently reading this)

 2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments

 2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage

@@ -7895,6 +7755,7 @@ p hash.values_at(:a, :c)
 2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage

 2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments (You are currently reading this)

 2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability

+2025-12-14 f3s: Kubernetes with FreeBSD - Part 8b: Distributed Tracing with Tempo

 2026-04-02 f3s: Kubernetes with FreeBSD - Part 9: GitOps with ArgoCD

 

 

@@ -9364,6 +9225,7 @@ replicaset.apps/miniflux-server-85d7c64664     1         1         1       54d
 Other *BSD-related posts:

 

 2026-04-02 f3s: Kubernetes with FreeBSD - Part 9: GitOps with ArgoCD

+2025-12-14 f3s: Kubernetes with FreeBSD - Part 8b: Distributed Tracing with Tempo

 2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability

 2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments (You are currently reading this)

 2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage

@@ -10671,6 +10533,7 @@ content = "{CODE}"
 2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage (You are currently reading this)

 2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments

 2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability

+2025-12-14 f3s: Kubernetes with FreeBSD - Part 8b: Distributed Tracing with Tempo

 2026-04-02 f3s: Kubernetes with FreeBSD - Part 9: GitOps with ArgoCD

 

 

@@ -12827,6 +12690,7 @@ http://www.gnu.org/software/src-highlite -->
 Other *BSD-related posts:

 

 2026-04-02 f3s: Kubernetes with FreeBSD - Part 9: GitOps with ArgoCD

+2025-12-14 f3s: Kubernetes with FreeBSD - Part 8b: Distributed Tracing with Tempo

 2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability

 2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments

 2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage (You are currently reading this)

@@ -13901,6 +13765,7 @@ http://www.gnu.org/software/src-highlite -->
 2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage

 2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments

 2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability

+2025-12-14 f3s: Kubernetes with FreeBSD - Part 8b: Distributed Tracing with Tempo

 2026-04-02 f3s: Kubernetes with FreeBSD - Part 9: GitOps with ArgoCD

 

 

@@ -15448,6 +15313,7 @@ earth$ curl https://ifconfig.me  # Should show gateway's
 Other *BSD-related posts:

 

 2026-04-02 f3s: Kubernetes with FreeBSD - Part 9: GitOps with ArgoCD

+2025-12-14 f3s: Kubernetes with FreeBSD - Part 8b: Distributed Tracing with Tempo

 2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability

 2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments

 2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage

@@ -16041,6 +15907,7 @@ __ejm\___/________dwb`---`______________________
 2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage

 2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments

 2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability

+2025-12-14 f3s: Kubernetes with FreeBSD - Part 8b: Distributed Tracing with Tempo

 2026-04-02 f3s: Kubernetes with FreeBSD - Part 9: GitOps with ArgoCD

 

 

@@ -16733,6 +16600,7 @@ etcd_disk_wal_fsync_duration_seconds_bucket{le="0.004"} 408
 Other *BSD-related posts:

 

 2026-04-02 f3s: Kubernetes with FreeBSD - Part 9: GitOps with ArgoCD

+2025-12-14 f3s: Kubernetes with FreeBSD - Part 8b: Distributed Tracing with Tempo

 2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability

 2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments

 2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage

@@ -17458,6 +17326,7 @@ This is perl, v5.8.8 built 
-        https://foo.zone/gemfeed/2024-08-05-typing-127.1-words-per-minute.html
-        2024-08-05T17:39:30+03:00
-        
-            Paul Buetow aka snonux
-            paul@dev.buetow.org
-        
-        After work one day, I noticed some discomfort in my right wrist. Upon research, it appeared to be a mild case of Repetitive Strain Injury (RSI). Initially, I thought that this would go away after a while, but after a week it became even worse. This led me to consider potential causes such as poor posture or keyboard use habits. As an enthusiast of keyboards, I experimented with ergonomic concave ortholinear split keyboards. Wait, what?...
-        
-            
-                Typing 127.1 words per minute (>100wpm average)


-

-Published at 2024-08-05T17:39:30+03:00; Updated at 2025-02-22

-

--,---,---,---,---,---,---,---,---,---,---,---,---,---,-------,
-|1/2| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | + | ' | <-    |
-|---'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-----|
-| ->| | Q | W | E | R | T | Y | U | I | O | P | ] | ^ |     |
-|-----',--',--',--',--',--',--',--',--',--',--',--',--'|    |
-| Caps | A | S | D | F | G | H | J | K | L | \ | [ | * |    |
-|----,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'---'----|
-|    | < | Z | X | C | V | B | N | M | , | . | - |          |
-|----'-,-',--'--,'---'---'---'---'---'---'-,-'---',--,------|
-| ctrl |  | alt |                          |altgr |  | ctrl |
-'------'  '-----'--------------------------'------'  '------'
-      Nieminen Mika	
-
-

-Table of Contents


-

-
-Typing 127.1 words per minute (>100wpm average)
-⇢ Introduction
-⇢ Kinesis review
-⇢ ⇢ Top build quality
-⇢ ⇢ Bluetooth connectivity
-⇢ ⇢ Gateron Brown key switches
-⇢ ⇢ Keycaps
-⇢ ⇢ Keymap editor
-⇢ First steps
-⇢ Considering alternate layouts
-⇢ Training how to type
-⇢ ⇢ Tools
-⇢ My keybr.com statistics
-⇢ Tips and tricks
-⇢ ⇢ Relax
-⇢ ⇢ Focus on accuracy first
-⇢ ⇢ Chording
-⇢ ⇢ Punctuation and Capitalization
-⇢ ⇢ Reverse shifting
-⇢ ⇢ Enter the flow state
-⇢ ⇢ Repeat every word
-⇢ ⇢ Don't use the same finger for two consecutive keystrokes
-⇢ ⇢ Warm-up
-⇢ Travel keyboard
-⇢ Upcoming custom Kinesis build
-⇢ Conclusion
-


-Introduction


-

-After work one day, I noticed some discomfort in my right wrist. Upon research, it appeared to be a mild case of Repetitive Strain Injury (RSI). Initially, I thought that this would go away after a while, but after a week it became even worse. This led me to consider potential causes such as poor posture or keyboard use habits. As an enthusiast of keyboards, I experimented with ergonomic concave ortholinear split keyboards. Wait, what?...

-

-
-Concave: Some fingers are longer than others. A concave keyboard makes it so that the keycaps meant to be pressed by the longer fingers are further down (e.g., left middle finger for e on a Qwerty layout), and keycaps meant to be pressed by shorter fingers are further up (e.g., right pinky finger for the letter p).
-Ortholinear: The keys are arranged in a straight vertical line, unlike most conventional keyboards. The conventional keyboards still resemble the old typewriters, where the placement of the keys was optimized so that the typewriter would not jam. There is no such requirement anymore.
-Split: The keyboard is split into two halves (left and right), allowing one to place either hand where it is most ergonomic.
-


-After discovering ThePrimagen (I found him long ago, but I never bothered buying the same keyboard he is on) on YouTube and reading/watching a couple of reviews, I thought that as a computer professional, the equipment could be expensive anyway (laptop, adjustable desk, comfortable chair), so why not invest a bit more into the keyboard? I purchased myself the Kinesis Advantage360 Professional keyboard. 

-

-Kinesis review


-

-For an in-depth review, have a look at this great article:

-

-Review of the Kinesis Advantage360 Professional keyboard

-

-Top build quality


-

-Overall, the keyboard feels excellent quality and robust. It has got some weight to it. Because of that, it is not ideally suited for travel, though. But I have a different keyboard to solve this (see later in this post). Overall, I love how it is built and how it feels.

-

-

-

-Bluetooth connectivity


-

-Despite encountering concerns about Bluetooth connectivity issues with the Kinesis keyboard during my research, I purchased one anyway as I intended to use it only via USB. However, I discovered that the firmware updates available afterwards had addressed these reported Bluetooth issues, and as a result, I did not experience any difficulties with the Bluetooth functionality. This positive outcome allowed me to enjoy using the keyboard also wirelessly.

-

-Gateron Brown key switches


-

-Many voices on the internet seem to dislike the Gateron Brown switches, the only official choice for non-clicky tactile switches in the Kinesis, so I was also a bit concerned. I almost went with Cherry MX Browns for my Kinesis (a custom build from a 3rd party provider that is partnershipping with Kinesis). Still, I decided on Gateron Browns to try different switches than the Cherry MX Browns I already have on my ZSA Moonlander keyboard (another ortho-linear split keyboard, but without a concave keycap layout). 

-

-At first, I was disappointed by the Gaterons, as they initially felt a bit meshy compared to the Cherries. Still, over the weeks I grew to prefer them because of their smoothness. Over time, the tactile bumps also became more noticeable (as my perception of them improved). Because of their less pronounced tactile feedback, the Gaterons are less tiring for long typing sessions and better suited for a relaxed typing experience.

-

-So, the Cherry MX feel sharper but are more tiring in the long run, and the Gaterons are easier to write on and the tactile Feedback is slightly less pronounced. 

-

-Keycaps


-

-If you ever purchase a Kinesis keyboard, go with the PCB keycaps. They upgrade the typing experience a lot. The only thing you will lose is that the backlighting won't shine through them. But that is a reasonable tradeoff. When do I need backlighting? I am supposed to look at the screen and not the keyboard while typing. 

-

-I went with the blank keycaps, by the way.

-

-

-

-Keymap editor


-

-There is no official keymap editor. You have to edit a configuration file manually, build the firmware from scratch, and upload the firmware with the new keymap to both keyboard halves. The Professional version of his keyboard, by the way, runs on the ZMK open-source firmware.

-

-Many users find the need for an easy-to-use keymap editor an issue. But this is the Pro model. You can also go with the non-Pro, which runs on non-open-source firmware and has no Bluetooth (it must be operated entirely on USB).

-

-There is a 3rd party solution which is supposed to configure the keymap for the Professional model as bliss, but I have never used it. As a part-time programmer and full-time Site Reliability Engineer, I am okay configuring the keymap in my text editor and building it in a local docker container. This is one of the standard ways of doing it here. You could also use a GitHub pipeline for the firmware build, but I prefer building it locally on my machine. This all seems natural to me, but this may be an issue for "the average Joe" user.

-

-First steps


-

-I didn't measure the usual words per minute (wpm) on my previous keyboard, the ZSA Moonlander, but I guess that it was around 40-50wpm. Once the Kinesis arrived, I started practising. The experience was quite different due to the concave keycaps, so I barely managed 10wpm on the first day.

-

-I quickly noticed that I could not continue using the freestyle 6-finger typing system I was used to on my Moonlander or any previous keyboards I worked with. I learned ten-finger touch typing from scratch to be more efficient with the Kinesis keyboard. The keyboard forces you to embrace touch typing.

-

-Sometimes, there were brain farts, and I couldn't type at all. The trick was not to freak out about it, but to move on. If your average goes down a bit for a day, it doesn't matter; the long-term trend over several days and weeks matters, not the one-off wpm high score.

-

-Although my wrist pain seemed to go away aftre the first week of using the Kinesis, my fingers became tired of adjusting to the new way of typing. My hands were stiff, as if I had been training for the Olympics. Only after three weeks did I start to feel comfortable with it. If it weren't for the comments I read online, I would have sent it back after week 2.

-

-I also had a problem with the left pinky finger, where I could not comfortably reach the p key. This involved moving the whole hand. An easy fix was to swap p with ; on the keyboard layout.

-

-Considering alternate layouts


-

-As I was going to learn 10-finger touch typing from scratch, I also played with the thought of switching from the Qwerty to the Dvorak or Colemak keymap, but after reading some comments on the internet, I decided against it: 

-

-
-These layouts (Dvorak and Colemak) will minimize the finger travel for the most commonly used English words, but they necessarily don't give you a better wpm score. 
-One comment on Redit also mentioned that getting stiffer fingers with these layouts is more likely than with Qwerty, as in Qwerty, he had to stretch out his fingers more often, which helps here.
-There are also many applications and websites with keyboard shortcuts and are Qwerty-optimized.
-You won't be able to use someone else's computer as there will be likely Qwerty. Some report that after using an alternative layout for a while, they forget how to use Qwerty.
-


-Training how to type


-

-Tools


-

-One of the most influential tools in my touch typing journey has been keybr.com. This site/app helped me learn 10-finger touch typing, and I practice daily for 30 minutes (in the first two weeks, up to an hour every day). The key is persistence and focus on technique rather than speed; the latter naturally improves with regular practice. Precision matters, too, so I always correct my errors using the backspace key.

-

-https://keybr.com

-

-I also used a command-line tool called tt, which is written in Go. It has a feature that I found very helpful: the ability to practice typing by piping custom text into it. Additionally, I appreciated its customization options, such as choosing a colour theme and specifying how statistics are displayed.

-

-https://github.com/lemnos/tt

-

-I wrote myself a small Ruby script that would randomly select a paragraph from one of my eBooks or book notes and pipe it to tt. This helped me remember some of the books I read and also practice touch typing.

-

-My keybr.com statistics


-

-Overall, I trained for around 4 months in more than 5,000 sessions. My top speed in a session was 127.1wpm (up from barely 10wpm at the beginning).

-

-

-

-My overall average speed over those 5,000 sessions was 80wpm. The average speed over the last week was over 100wpm. The green line represents the wpm average (increasing trend), the purple line represents the number of keys in the practices (not much movement there, as all keys are unlocked), and the red line represents the average typing accuracy.

-

-

-

-Around the middle, you see a break-in of the wpm average value. This was where I swapped the p and ; keys, but after some retraining, I came back to the previous level and beyond.

-

-Tips and tricks


-

-These are some tips and tricks I learned along the way to improve my typing speed:

-

-Relax


-

-It's easy to get cramped when trying to hit this new wpm mark, but this is just holding you back. Relax and type at a natural pace. Now I also understand why my Katate Sensei back in London kept screaming "RELAAAX" at me during practice.... It didn't help much back then, though, as it is difficult to relax while someone screams at you! 

-

-Focus on accuracy first


-

-This goes with the previous point. Instead of trying to speed through sessions as quickly as possible, slow down and try to type the words correctly—so don't rush it. If you aren't fast yet, the reason is that your brain hasn't trained enough. It will come over time, and you will be faster.

-

-Chording


-

-A trick to getting faster is to type by word and pause between each word so you learn the words by chords. From 80wpm and beyond, this makes a real difference. 

-

-Punctuation and Capitalization


-

-I included 10% punctuation and 20% capital letters in my keybr.com practice sessions to simulate real typing conditions, which improved my overall working efficiency. I guess I would have gone to 120wpm in average if I didn't include this options...

-

-Reverse shifting


-

-Reverse shifting aka left-right shifting is to... 

-

-
-...use the left shift key for letters on the right keyboard side.
-...use the right shift key for letters on the left keyboard side.
-


-This makes using the shift key a blaze.

-

-Enter the flow state


-

-Listening to music helps me enter a flow state during practice sessions, which makes typing training a bit addictive (which is good, or isn't it?).

-

-Repeat every word


-

-There's a setting on keybr.com that makes it so that every word is always repeated, having you type every word twice in a row. I liked this feature very much, and I think it also helped to improve my practice.

-

-Don't use the same finger for two consecutive keystrokes


-

-Apparently, if you want to type fast, avoid using the same finger for two consecutive keystrokes. This means you don't always need to use the same finger for the same keys. 

-However, there are no hard and fast rules. Thus, everyone develops their system for typing word combinations. An exception would be if you are typing the very same letter in a row (e.g., t in letter)—here, you are using the same finger for both ts.

-

-Warm-up


-

-You can't reach your average typing speed first ting the morning. It would help if you warmed up before the exercise or practice later during the day. Also, some days are good, others not so, e.g., after a bad night's sleep. What matters is the mid- and long-term trend, not the fluctuations here, though.

-

-Travel keyboard


-

-As mentioned, the Kinesis is a great keyboard, but it is not meant for travel.

-

-I guess keyboards will always be my expensive hobby, so I also purchased another ergonomic, ortho-linear, concave split keyboard, the Glove80 (with the Red Pro low-profile switches). This keyboard is much lighter and, in my opinion, much better suited for travel than the Kinesis. It also comes with a great travel case. 

-

-Here is a photo of me using it with my Surface Go 2 (it runs Linux, by the way) while waiting for the baggage drop at the airport:

-

-

-

-For everyday work, I prefer the tactile Browns on the Kinesis over the Red Pro I have on the Glove80 (normal profile vs. low profile). The Kinesis feels much more premium, whereas the Glove80 is much lighter and easier to store away in a rucksack (the official travel case is a bit bulky, so I wrapped it simply in bubble plastic).

-

-The F-key row is odd at the Glove80. I would have preferred more keys on the sides like the Kinesis, and I use them for [] {} (), which is pretty handy there. However, I like the thumb cluster of the Glove80 more than the one on the Kinesis.

-

-The good thing is that I can switch between both keyboards instantly without retraining my typing memories. I've configured (as much as possible) the same keymaps on both my Kinesis and Glove80, making it easy to switch between them at any occasion. 

-

-Interested in the Glove80? I suggest also reading this review:

-

-Review of the Glove80 keyboard

-

-Upcoming custom Kinesis build


-

-As I mentioned, keyboards will remain an expensive hobby of mine. I don't regret anything here, though. After all, I use keyboards at my day job. I've ordered a Kinesis custom build with the Gateron Kangaroo switches, and I'm excited to see how that compares to my current setup. I'm still deciding whether to keep my Gateron Brown-equipped Kinesis as a secondary keyboard or possibly leave it at my in-laws for use when visiting or to sell it.

-

-Update 2025-02-22: I've received my custom Kinesis Adv. 360 build with the Gateron Baby Kangaroo key switches. I am absolutely in love! I will keep my Gateron Brown version around, though.

-

-Conclusion


-

-When I traveled with the Glove80 for work to the London office, a colleague stared at my keyboard and made jokes that it might be broken (split into two halves). But other than that... 

-

-Ten-finger touch typing has improved my efficiency and has become a rewarding discipline. Whether it's the keyboards I use, the tools I practice with, or the techniques I've adopted, each step has been a learning experience. I hope sharing my journey provides valuable insights and inspiration for anyone looking to improve their touch typing skills.

-

-I also accidentally started using a 10-finger-like system (maybe still 6 fingers, but better than before) on my regular laptop keyboard. I could be more efficient on the laptop keyboard. The form is different there (not ortholinear, not concave keycaps, etc.), but my typing has improved there too (even if it is only by a little bit).

-

-I don't want to return to a non-concave keyboard as my default. I will still use other keyboards once in a while, but only for short periods or when I have to (e.g. when travelling with my laptop and there is no space for an external keyboard).

-

-Learning to touch type has been an eye-opening experience for me, not just for work but also for personal projects. Now, writing documentation is so much fun; who could believe that? Furthermore, working with Slack (communicating with colleagues) is more fun now as well.

-

-E-Mail your comments to paul@nospam.buetow.org :-)

-

 Back to the main site

             
         
diff --git a/gemfeed/index.html b/gemfeed/index.html
index 9b5356cf..5a20ed38 100644
--- a/gemfeed/index.html
+++ b/gemfeed/index.html
@@ -28,6 +28,7 @@
 2026-01-01 - Posts from July to December 2025

 2026-01-01 - Cloudless Kobo Forma with KOReader

 2025-12-24 - X-RAG Observability Hackathon

+2025-12-14 - f3s: Kubernetes with FreeBSD - Part 8b: Distributed Tracing with Tempo

 2025-12-07 - f3s: Kubernetes with FreeBSD - Part 8: Observability

 2025-11-02 - 'The Courage To Be Disliked' book notes

 2025-11-02 - Perl New Features and Foostats

diff --git a/index.html b/index.html
index 23885f17..a5731fb7 100644
--- a/index.html
+++ b/index.html
@@ -56,6 +56,7 @@
 2026-01-01 - Posts from July to December 2025

 2026-01-01 - Cloudless Kobo Forma with KOReader

 2025-12-24 - X-RAG Observability Hackathon

+2025-12-14 - f3s: Kubernetes with FreeBSD - Part 8b: Distributed Tracing with Tempo

 2025-12-07 - f3s: Kubernetes with FreeBSD - Part 8: Observability

 2025-11-02 - 'The Courage To Be Disliked' book notes

 2025-11-02 - Perl New Features and Foostats

-- 
cgit v1.2.3