| -rw-r--r-- | f3s/DRAFT-observability2.gmi | 78 |
| -rw-r--r-- | f3s/prometheus/persistence-values.yaml | 16 |
2 files changed, 15 insertions, 79 deletions
diff --git a/f3s/DRAFT-observability2.gmi b/f3s/DRAFT-observability2.gmi
index 5972df1..74771fd 100644
--- a/f3s/DRAFT-observability2.gmi
+++ b/f3s/DRAFT-observability2.gmi
@@ -2,7 +2,7 @@
 
 ## Introduction
 
-This post covers improvements to the observability stack set up in Part 8. The main focus is making the Grafana dashboards more readable by displaying hostnames instead of IP addresses, and enabling etcd metrics monitoring for the k3s cluster.
+This post covers improvements to the observability stack set up in Part 8. The main focus is making the Node Exporter dashboards more readable by displaying hostnames instead of IP addresses, and enabling etcd metrics monitoring for the k3s cluster.
 
 => ./2025-12-07-f3s-kubernetes-with-freebsd-part-8.html Part 8: Observability
 
@@ -68,34 +68,13 @@ prometheus-node-exporter:
 
 This uses the Kubernetes node name metadata (__meta_kubernetes_pod_node_name) to set the instance label, which automatically gives us r0.lan.buetow.org, r1.lan.buetow.org, and r2.lan.buetow.org.
 
-### Applying the changes
-
-After updating the configuration files, I recreated the secret and upgraded Prometheus:
-
-```
-just upgrade
-```
-
-### Purging old metrics
-
-To avoid having both old IP-based and new hostname-based metrics in Prometheus, I purged all historical data by uninstalling and reinstalling Prometheus:
-
-```
-just uninstall
-# On NFS server (f0 or f1):
-rm -rf /data/nfs/k3svolumes/prometheus/data/*
-just install
-```
-
-This gives a clean start with only hostname-based instance labels.
-
 ## Enabling etcd metrics monitoring
 
-The etcd dashboard in Grafana initially showed no data because k3s uses an embedded etcd that doesn't expose metrics by default.
+The etcd dashboard initially showed no data because k3s uses an embedded etcd that doesn't expose metrics by default.
 
 ### Enabling etcd metrics in k3s
 
-On each control-plane node (r0, r1, r2), create or edit /etc/rancher/k3s/config.yaml:
+On each control-plane node (r0, r1, r2), create /etc/rancher/k3s/config.yaml:
 
 ```
 etcd-expose-metrics: true
@@ -115,7 +94,7 @@ curl http://127.0.0.1:2381/metrics | grep etcd
 
 ### Configuring Prometheus to scrape etcd
 
-In persistence-values.yaml, enable kubeEtcd with the node IP addresses and relabeling for hostnames:
+In persistence-values.yaml, enable kubeEtcd with the node IP addresses:
 
 ```
 kubeEtcd:
@@ -128,20 +107,6 @@ kubeEtcd:
     enabled: true
     port: 2381
     targetPort: 2381
-  serviceMonitor:
-    relabelings:
-      - sourceLabels: [__address__]
-        regex: '192\.168\.1\.120:2381'
-        targetLabel: instance
-        replacement: 'r0.lan.buetow.org'
-      - sourceLabels: [__address__]
-        regex: '192\.168\.1\.121:2381'
-        targetLabel: instance
-        replacement: 'r1.lan.buetow.org'
-      - sourceLabels: [__address__]
-        regex: '192\.168\.1\.122:2381'
-        targetLabel: instance
-        replacement: 'r2.lan.buetow.org'
 ```
 
 Apply the changes:
@@ -152,26 +117,25 @@ just upgrade
 
 ### Verifying etcd metrics
 
-After the changes, all etcd targets show hostnames and are being scraped:
+After the changes, all etcd targets are being scraped:
 
 ```
 kubectl exec -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 \
-  -c prometheus -- wget -qO- 'http://localhost:9090/api/v1/targets' | \
-  jq -r '.data.activeTargets[] | select(.labels.job | test("etcd")) |
-    "\(.labels.job): \(.labels.instance) - \(.health)"'
+  -c prometheus -- wget -qO- 'http://localhost:9090/api/v1/query?query=etcd_server_has_leader' | \
+  jq -r '.data.result[] | "\(.metric.instance): \(.value[1])"'
 ```
 
 Output:
 
 ```
-kube-etcd: r0.lan.buetow.org - up
-kube-etcd: r1.lan.buetow.org - up
-kube-etcd: r2.lan.buetow.org - up
+192.168.1.120:2381: 1
+192.168.1.121:2381: 1
+192.168.1.122:2381: 1
 ```
 
-The etcd dashboard in Grafana now displays metrics for all three control-plane nodes.
+The etcd dashboard in Grafana now displays metrics including Raft proposals, leader elections, and peer round trip times.
 
-## Updated persistence-values.yaml
+## Complete persistence-values.yaml
 
 The complete updated persistence-values.yaml:
 
@@ -193,20 +157,6 @@ kubeEtcd:
     enabled: true
     port: 2381
     targetPort: 2381
-  serviceMonitor:
-    relabelings:
-      - sourceLabels: [__address__]
-        regex: '192\.168\.1\.120:2381'
-        targetLabel: instance
-        replacement: 'r0.lan.buetow.org'
-      - sourceLabels: [__address__]
-        regex: '192\.168\.1\.121:2381'
-        targetLabel: instance
-        replacement: 'r1.lan.buetow.org'
-      - sourceLabels: [__address__]
-        regex: '192\.168\.1\.122:2381'
-        targetLabel: instance
-        replacement: 'r2.lan.buetow.org'
 
 prometheus:
   prometheusSpec:
@@ -246,9 +196,9 @@ grafana:
 
 Two improvements were made to the observability stack:
 
-* Instance labels now show hostnames (e.g., f0.lan.buetow.org) instead of IP addresses
+* Node Exporter instance labels now show hostnames (e.g., f0.lan.buetow.org) instead of IP addresses
 * Enabled etcd metrics monitoring for the k3s embedded etcd
 
-These changes make the dashboards more readable and provide visibility into etcd cluster health.
+These changes make the Node Exporter dashboards more readable and provide visibility into etcd cluster health.
 
 => https://codeberg.org/snonux/conf/src/branch/master/f3s/prometheus prometheus configuration on Codeberg
diff --git a/f3s/prometheus/persistence-values.yaml b/f3s/prometheus/persistence-values.yaml
index d3b8ae0..477410a 100644
--- a/f3s/prometheus/persistence-values.yaml
+++ b/f3s/prometheus/persistence-values.yaml
@@ -15,20 +15,6 @@ kubeEtcd:
     enabled: true
     port: 2381
     targetPort: 2381
-  serviceMonitor:
-    relabelings:
-      - sourceLabels: [__address__]
-        regex: '192\.168\.1\.120:2381'
-        targetLabel: instance
-        replacement: 'r0.lan.buetow.org'
-      - sourceLabels: [__address__]
-        regex: '192\.168\.1\.121:2381'
-        targetLabel: instance
-        replacement: 'r1.lan.buetow.org'
-      - sourceLabels: [__address__]
-        regex: '192\.168\.1\.122:2381'
-        targetLabel: instance
-        replacement: 'r2.lan.buetow.org'
 
 prometheus:
   prometheusSpec:
@@ -61,4 +47,4 @@ grafana:
   podSecurityContext:
     fsGroup: 911
     runAsUser: 911
-    runAsGroup: 911
\ No newline at end of file
+    runAsGroup: 911
