From 4069484ca68ad89ddc720b7fd70cd01922290c60 Mon Sep 17 00:00:00 2001 From: Paul Buetow Date: Mon, 30 Mar 2026 09:17:15 +0300 Subject: Update content for gemtext --- ...-03-31-distributed-systems-simulator-part-1.gmi | 294 + ...31-distributed-systems-simulator-part-1.gmi.tpl | 268 + ...-04-01-distributed-systems-simulator-part-2.gmi | 367 + ...01-distributed-systems-simulator-part-2.gmi.tpl | 349 + ...-04-02-distributed-systems-simulator-part-3.gmi | 264 + ...02-distributed-systems-simulator-part-3.gmi.tpl | 247 + gemfeed/DRAFT-distributed-systems-simulator.gmi | 851 - .../DRAFT-distributed-systems-simulator.gmi.tpl | 806 - gemfeed/atom.xml | 29246 +++++++++---------- gemfeed/index.gmi | 3 + index.gmi | 3 + 11 files changed, 16328 insertions(+), 16370 deletions(-) create mode 100644 gemfeed/2026-03-31-distributed-systems-simulator-part-1.gmi create mode 100644 gemfeed/2026-03-31-distributed-systems-simulator-part-1.gmi.tpl create mode 100644 gemfeed/2026-04-01-distributed-systems-simulator-part-2.gmi create mode 100644 gemfeed/2026-04-01-distributed-systems-simulator-part-2.gmi.tpl create mode 100644 gemfeed/2026-04-02-distributed-systems-simulator-part-3.gmi create mode 100644 gemfeed/2026-04-02-distributed-systems-simulator-part-3.gmi.tpl delete mode 100644 gemfeed/DRAFT-distributed-systems-simulator.gmi delete mode 100644 gemfeed/DRAFT-distributed-systems-simulator.gmi.tpl diff --git a/gemfeed/2026-03-31-distributed-systems-simulator-part-1.gmi b/gemfeed/2026-03-31-distributed-systems-simulator-part-1.gmi new file mode 100644 index 00000000..556f8281 --- /dev/null +++ b/gemfeed/2026-03-31-distributed-systems-simulator-part-1.gmi @@ -0,0 +1,294 @@ +# Distributed Systems Simulator - Part 1: Introduction and GUI + +> Published at 2026-03-31T00:00:00+03:00 + +This is the first blog post of the Distributed Systems Simulator series, written for the recent v1.1.0 release. It explores the Java-based Distributed Systems Simulator program I created as my diploma thesis at the Aachen University of Applied Sciences (August 2008). The simulator offers both built-in implementations of common distributed systems algorithms and an extensible framework that allows researchers and practitioners to implement and test their own custom protocols within the simulation environment. + +=> https://codeberg.org/snonux/ds-sim ds-sim on Codeberg (modernized, English-translated version) + +These are all the posts of this series: + +=> ./2026-03-31-distributed-systems-simulator-part-1.gmi 2026-03-31 Distributed Systems Simulator - Part 1: Introduction and GUI (You are currently reading this) +=> ./2026-04-01-distributed-systems-simulator-part-2.gmi 2026-04-01 Distributed Systems Simulator - Part 2: Built-in Protocols +=> ./2026-04-02-distributed-systems-simulator-part-3.gmi 2026-04-02 Distributed Systems Simulator - Part 3: Advanced Examples and Protocol API + +=> ./distributed-systems-simulator/ds-sim-screenshot.png Screenshot: The Distributed Systems Simulator running a Broadcast protocol simulation with 6 processes. The visualization shows message lines between process bars, with blue indicating delivered messages and green indicating messages still in transit. + +## Table of Contents + +* ⇢ Distributed Systems Simulator - Part 1: Introduction and GUI +* ⇢ ⇢ Motivation +* ⇢ ⇢ Installation +* ⇢ ⇢ Fundamentals +* ⇢ ⇢ ⇢ Client/Server Model +* ⇢ ⇢ ⇢ Processes and Their Roles +* ⇢ ⇢ ⇢ Messages +* ⇢ ⇢ ⇢ Local and Global Clocks +* ⇢ ⇢ ⇢ Events +* ⇢ ⇢ ⇢ Protocols +* ⇢ ⇢ Graphical User Interface (GUI) +* ⇢ ⇢ ⇢ Simple Mode +* ⇢ ⇢ ⇢ The Menu Bar +* ⇢ ⇢ ⇢ The Toolbar +* ⇢ ⇢ ⇢ The Visualization +* ⇢ ⇢ ⇢ Color Differentiation +* ⇢ ⇢ ⇢ The Sidebar +* ⇢ ⇢ ⇢ The Log Window +* ⇢ ⇢ ⇢ Expert Mode +* ⇢ ⇢ ⇢ Configuration Settings + +## Motivation + +Distributed systems are complex—interactions between nodes, network partitions, failure scenarios are hard to debug in production. A simulator lets you experiment with architectures, observe how systems behave under failure, and learn consensus algorithms, replication strategies, and fault tolerance in a controlled, repeatable environment. No operational overhead, no real infrastructure—just focused exploration of system design. + +In the literature, one can find many different definitions of a distributed system. Many of these definitions differ from each other, making it difficult to find a single definition that stands alone as the correct one. Andrew Tanenbaum and Maarten van Steen chose the following loose characterization for describing a distributed system: + +> "A distributed system is a collection of independent computers that appears to its users as a single coherent system" - Andrew Tanenbaum + +The user only needs to interact with the local computer in front of them, while the software of the local computer ensures smooth communication with the other participating computers in the distributed system. + +This thesis aims to make distributed systems easier to understand from a different angle. Instead of the end-user perspective, it focuses on the functional methods of protocols and their processes, making all relevant events of a distributed system transparent. + +To achieve this, I developed a simulator, particularly for teaching and learning at the University of Applied Sciences Aachen. Protocols from distributed systems with their most important influencing factors can be replicated through simulations. At the same time, there's room for personal experiments—no restriction to a fixed number of protocols. Users can design their own. + +The original simulator (VS-Sim) was written in Java 6 in 2008 with a German-language UI. In 2025, I revamped and modernized it as ds-sim: translated the entire codebase and UI from German to English, migrated the build system from hand-rolled Ant scripts to Maven, upgraded from Java 6 to Java 21 (adopting sealed class hierarchies, record types, formatted strings, pattern matching), introduced a proper exception hierarchy and consistent error handling, added comprehensive Javadoc documentation, implemented a headless testing framework (208 unit tests covering core components, the event system, and all protocol implementations), reorganized the project structure to follow standard Maven conventions, and added architecture documentation. Total: 199 files, over 15,000 lines of new code. Back in 2008, I wrote every line by hand in Vim. For the 2025 modernization, Claude Code did most of the heavy lifting—translation, refactoring, test generation, documentation. Times have changed. + +## Installation + +The modernized ds-sim requires Java 21 or higher and Maven 3.8 or higher. + +``` +# Clone the repository +git clone https://codeberg.org/snonux/ds-sim.git +cd ds-sim + +# Set JAVA_HOME if needed (e.g. on Fedora Linux) +export JAVA_HOME=/usr/lib/jvm/java-21-openjdk + +# Build the project +mvn clean package + +# Run the simulator +java -jar target/ds-sim-*.jar +``` + +For a faster development build without running tests: + +``` +mvn package -DskipTests +``` + +After building, the following artifacts are available in the `target/` directory: + +* `ds-sim-1.1.0.jar` - Executable JAR with all dependencies bundled +* `original-ds-sim-1.1.0.jar` - JAR without dependencies + +The project also includes 208 unit tests that can be run with `mvn test`. Example simulation files for all built-in protocols are included in the `saved-simulations/` directory. + +=> https://codeberg.org/snonux/ds-sim ds-sim source code on Codeberg + +## Fundamentals + +For basic understanding, some fundamentals are explained below. A deeper exploration will follow in later chapters. + +### Client/Server Model + +``` ++-----------------------------------------+ +| | +| +--------+ +--------+ | +| | Client |<------->| Server | | +| +--------+ +--------+ | +| | +| Sending of Messages | +| | ++-----------------------------------------+ + +Figure 1.1: Client/Server Model +``` + +The simulator is based on the client/server principle. Each simulation typically consists of a participating client and a server that communicate with each other via messages (see Fig. 1.1). In complex simulations, multiple clients and/or servers can also participate. + +### Processes and Their Roles + +A distributed system is simulated using processes. Each process takes on one or more roles. For example, one process can take on the role of a client and another process the role of a server. The possibility of assigning both client and server roles to a process simultaneously is also provided. A process could also take on the roles of multiple servers and clients simultaneously. To identify a process, each one has a unique Process Identification Number (PID). + +### Messages + +In a distributed system, it must be possible to send messages. A message can be sent by a client or server process and can have any number of recipients. The content of a message depends on the protocol used. What is meant by a protocol will be covered later. To identify a message, each message has a unique Message Identification Number (NID). + +### Local and Global Clocks + +In a simulation, there is exactly one global clock. It represents the current and always correct time. A global clock never goes wrong. + +Additionally, each participating process has its own local clock. It represents the current time of the respective process. Unlike the global clock, local clocks can display an incorrect time. If the process time is not globally correct (not equal to the global time, or displays an incorrect time), then it was either reset during a simulation, or it is running incorrectly due to clock drift. The clock drift indicates by what factor the clock is running incorrectly. This will be discussed in more detail later. + +``` ++---------------------+ +---------------------+ +| Process 1 | | Process 2 | +| | | | +| +-----------------+ | | +-----------------+ | +| |Server Protocol A| | | |Client Protocol A| | +| +-----------------+ | | +-----------------+ | +| | | | +| +-----------------+ | +---------------------+ +| |Client Protocol B| | +| +-----------------+ | +---------------------+ +| | | Process 3 | ++---------------------+ | | + | +-----------------+ | + | |Server Protocol B| | + | +-----------------+ | + | | + +---------------------+ + +Figure 1.2: Client/Server Protocols +``` + +In addition to normal clocks, vector timestamps and Lamport's logical clocks are also of interest. For vector and Lamport times, there are no global equivalents here, unlike normal time. Concrete examples of Lamport and vector times will be covered later in the "Additional Examples" section. + +### Events + +A simulation consists of the sequential execution of finitely many events. For example, there can be an event that causes a process to send a message. A process crash event would also be conceivable. Each event occurs at a specific point in time. Events with the same occurrence time are executed directly one after another by the simulator. However, this does not hinder the simulator's users, as events are executed in parallel from their perspective. + +Two main types of events are distinguished: programmable events and non-programmable events. Programmable events can be programmed and edited in the event editor, and their occurrence times depend on the local process clocks or the global clock. Non-programmable events, on the other hand, cannot be programmed in the event editor and do not occur because of a specific time, but due to other circumstances such as: + +* Message receive events: Triggered when a message arrives at a recipient process +* Protocol schedule events (alarms): Triggered by a timer set by a protocol, e.g. for retransmission timeouts +* Random events: Such as random process crashes based on configured crash probability + +### Protocols + +A simulation also consists of the application of protocols. It has already been mentioned that a process can take on the roles of servers and/or clients. For each server and client role, the associated protocol must also be specified. A protocol defines how a client and a server send messages, and how they react when a message arrives. A protocol also determines what data is contained in a message. A process only processes a received message if it understands the respective protocol. + +In Figure 1.2, 3 processes are shown. Process 1 supports protocol "A" on the server side and protocol "B" on the client side. Process 2 supports protocol "A" on the client side and Process 3 supports protocol "B" on the server side. This means that Process 1 can communicate with Process 2 via protocol "A" and with Process 3 via protocol "B". Processes 2 and 3 are incompatible with each other and cannot process messages received from each other. + +Clients cannot communicate with clients, and servers cannot communicate with servers. For communication, at least one client and one server are always required. However, this restriction can be circumvented by having processes support a given protocol on both the server and client sides (see Broadcast Protocol later). + +## Graphical User Interface (GUI) + +### Simple Mode + +=> ./distributed-systems-simulator/ds-sim-screenshot2.png Screenshot: The simulator showing the settings dialog. The visualization area displays process bars with message lines between them. The settings window allows configuring simulation parameters like number of processes, simulation duration, clock drift, message loss probability, and more. + +The simulator requires JDK 21 and can be started with the command `java -jar target/ds-sim-VERSION.jar` + +The simulator then presents itself with a main window. To create a new simulation, select "New Simulation" from the "File" menu, after which the settings window for the new simulation appears. The individual options will be discussed in more detail later, and for now, only the default settings will be used. + +By default, the simulator starts in "simple mode". There is also an "expert mode", which will be discussed later. + +### The Menu Bar + +In the File menu, you can create new simulations or close the currently open simulation. New simulations open by default in a new tab. However, you can also open or close new simulation windows that have their own tabs. Each tab contains a simulation that is completely independent from the others. This allows any number of simulations to be run in parallel. The menu items "Open", "Save" and "Save As" are used for loading and saving simulations. + +Through the Edit menu, users can access the simulation settings, which will be discussed in more detail later. This menu also lists all participating processes for editing. If the user selects a process there, the corresponding process editor opens. The Simulator menu offers the same options as the toolbar, which is described in the next section. + +Some menu items are only accessible when a simulation has already been created or loaded in the current window. + +### The Toolbar + +The toolbar is located at the top left of the simulator. The toolbar contains the functions most frequently needed by users. The toolbar offers four different functions: + +* Reset simulation: can only be activated when the simulation has been paused or has finished +* Repeat simulation: cannot be activated if the simulation has not yet been started +* Pause simulation: can only be activated when the simulation is currently running +* Start simulation: can only be activated when the simulation is not currently running and has not yet finished + +### The Visualization + +The graphical simulation visualization is located in the center right. The X-axis shows the time in milliseconds, and all participating processes are listed on the Y-axis. The demo simulation ends after exactly 15 seconds. The visualization shows processes (with PIDs 1, 2, and 3), each with its own horizontal black bar. On these process bars, users can read the respective local process time. The vertical red line represents the global simulation time. + +The process bars also serve as start and end points for messages. For example, if Process 1 sends a message to Process 2, a line is drawn from one process bar to the other. Messages that a process sends to itself are not visualized but are logged in the log window (more on this later). + +Another way to open a process editor is to left-click on the process bar belonging to the process. A right-click, on the other hand, opens a popup window with additional options. A process can only be forced to crash or be revived via the popup menu during a running simulation. + +In general, the number of processes can vary as desired. The simulation duration is at least 5 and at most 120 seconds. The simulation only ends when the global time reaches the specified simulation end time (here 15 seconds), not when a local process time reaches this end time. + +### Color Differentiation + +Colors help to better interpret the processes of a simulation. By default, processes (process bars) and messages are displayed with the following colors (these are only the default colors, which can be changed via the settings): + +``` +Process Colors: + Black - The simulation is not currently running + Green - The process is running normally + Orange - The mouse is over the process bar + Red - The process has crashed + +Message Colors: + Green - The message is still in transit + Blue - The message has successfully reached its destination + Red - The message was lost +``` + +### The Sidebar + +The sidebar is used to program process events. At the top, the process to be managed is selected (here with PID 1). In this process selection, there is also the option to select "All Processes", which displays all programmed events of all processes simultaneously. "Local events" are those events that occur when a certain local time of the associated process has been reached. The event table below lists all programmed events along with their occurrence times and PIDs. + +To create a new event, the user can either right-click on a process bar and select "Insert local event", or select an event below the event table, enter the event occurrence time in the text field below, and click "Apply". + +Right-clicking on the event editor allows you to either copy or delete all selected events. Using the Ctrl key, multiple events can be selected simultaneously. The entries in the Time and PID columns can be edited afterwards. This provides a convenient way to move already programmed events to a different time or assign them to a different process. However, users should ensure that they press the Enter key after changing the event occurrence time, otherwise the change will be ineffective. + +In addition to the Events tab, the sidebar has another tab called "Variables". Behind this tab is the process editor of the currently selected process. There, all variables of the process can be edited, providing another way to access a process editor. + +### The Log Window + +The log window (at the bottom) logs all occurring events in chronological order. At the beginning of each log entry, the global time in milliseconds is always logged. For each process, its local times as well as the Lamport and vector timestamps are also listed. After the time information, additional details are provided, such as which message was sent with what content and which protocol it belongs to. This will be demonstrated later with examples. + +``` +000000ms: New Simulation +000000ms: New Process; PID: 1; Local Time: 000000ms; Lamport time: 0; Vector time: (0,0,0) +000000ms: New Process; PID: 2; Local Time: 000000ms; Lamport time: 0; Vector time: (0,0,0) +000000ms: New Process; PID: 3; Local Time: 000000ms; Lamport time: 0; Vector time: (0,0,0) +``` + +By deactivating the logging switch, message logging can be temporarily disabled. With logging deactivated, no new messages are written to the log window. After reactivating the switch, all omitted messages are subsequently written to the window. Deactivated logging can lead to improved simulator performance. + +### Expert Mode + +=> ./distributed-systems-simulator/ds-sim-screenshot.png Screenshot: The Distributed Systems Simulator in expert mode, showing a Broadcast protocol simulation with 6 processes. The visualization shows message lines between process bars, with blue indicating delivered messages and green indicating messages still in transit. + +The simulator can be operated in two different modes: simple mode and expert mode. The simulator starts in simple mode by default, so users don't have to deal with the simulator's full functionality all at once. Simple mode is clearer but offers fewer functions. Expert mode is more suitable for experienced users and accordingly offers more flexibility. Expert mode can be activated or deactivated via the switch of the same name below the log window or via the simulation settings. + +In expert mode, the following additional features become available: + +* Global events: In addition to local events, global events can now also be edited. Global events are triggered when a specific global simulation time is reached, rather than a local process time. This only makes a difference when local process times differ from the global time (e.g. due to clock drift). +* Direct PID selection: The user can directly select the associated PID when programming a new event. +* Lamport and Vector time switches: If the user activates one of these two switches, the Lamport or vector timestamps are displayed in the visualization. Only one can be active at a time to maintain clarity. +* Anti-aliasing switch: Allows the user to activate or deactivate anti-aliasing for smoother graphics. Disabled by default for performance reasons. +* Log filter: A regular expression filter (Java syntax) that makes it possible to filter only the essential data from the logs. For example, `"PID: (1|2)"` shows only log lines containing "PID: 1" or "PID: 2". The filter can be activated retroactively and during a running simulation. + +### Configuration Settings + +The simulation settings window allows configuring many aspects of the simulation. Key settings include: + +* Processes receive own messages (default: false): Whether processes can receive messages they sent to themselves. +* Average message loss probabilities (default: true): Whether to average the loss probabilities of sender and receiver processes. +* Average transmission times (default: true): Whether to average the transmission times of sender and receiver processes. +* Show only relevant messages (default: true): Hides messages sent to processes that don't support the protocol. +* Expert mode (default: false): Enables expert mode features. +* Simulation speed (default: 0.5): The playback speed factor. A value of 1 means real-time, 0.5 means half speed. +* Number of processes (default: 3): Can also be changed during simulation via right-click. +* Simulation duration (default: 15s): Between 5 and 120 seconds. + +Each process also has individual settings: + +* Clock drift (default: 0.0): By what factor the local clock deviates. A value of 0.0 means no deviation. A value of 1.0 means double speed. Values > -1.0 are allowed. +* Random crash probability (default: 0%): Probability that the process crashes randomly during the simulation. +* Message loss probability (default: 0%): Probability that a message sent by this process is lost in transit. +* Min/Max transmission time (default: 500ms/2000ms): The range for random message delivery times. + +Read the next post of this series: + +=> ./2026-04-01-distributed-systems-simulator-part-2.gmi Distributed Systems Simulator - Part 2: Built-in Protocols + +Other related posts are: + +=> ./2026-03-01-loadbars-0.13.0-released.gmi 2026-03-01 Loadbars 0.13.0 released +=> ./2022-12-24-ultrarelearning-java-my-takeaways.gmi 2022-12-24 (Re)learning Java - My takeaways +=> ./2022-03-06-the-release-of-dtail-4.0.0.gmi 2022-03-06 The release of DTail 4.0.0 +=> ./2016-11-20-object-oriented-programming-with-ansi-c.gmi 2016-11-20 Object oriented programming with ANSI C + +E-Mail your comments to `paul@nospam.buetow.org` + +=> ../ Back to the main site diff --git a/gemfeed/2026-03-31-distributed-systems-simulator-part-1.gmi.tpl b/gemfeed/2026-03-31-distributed-systems-simulator-part-1.gmi.tpl new file mode 100644 index 00000000..bd49c8c2 --- /dev/null +++ b/gemfeed/2026-03-31-distributed-systems-simulator-part-1.gmi.tpl @@ -0,0 +1,268 @@ +# Distributed Systems Simulator - Part 1: Introduction and GUI + +> Published at 2026-03-31T00:00:00+03:00 + +This is the first blog post of the Distributed Systems Simulator series, written for the recent v1.1.0 release. It explores the Java-based Distributed Systems Simulator program I created as my diploma thesis at the Aachen University of Applied Sciences (August 2008). The simulator offers both built-in implementations of common distributed systems algorithms and an extensible framework that allows researchers and practitioners to implement and test their own custom protocols within the simulation environment. + +=> https://codeberg.org/snonux/ds-sim ds-sim on Codeberg (modernized, English-translated version) + +These are all the posts of this series: + +<< template::inline::index distributed-systems-simulator-part + +=> ./distributed-systems-simulator/ds-sim-screenshot.png Screenshot: The Distributed Systems Simulator running a Broadcast protocol simulation with 6 processes. The visualization shows message lines between process bars, with blue indicating delivered messages and green indicating messages still in transit. + +<< template::inline::toc + +## Motivation + +Distributed systems are complex—interactions between nodes, network partitions, failure scenarios are hard to debug in production. A simulator lets you experiment with architectures, observe how systems behave under failure, and learn consensus algorithms, replication strategies, and fault tolerance in a controlled, repeatable environment. No operational overhead, no real infrastructure—just focused exploration of system design. + +In the literature, one can find many different definitions of a distributed system. Many of these definitions differ from each other, making it difficult to find a single definition that stands alone as the correct one. Andrew Tanenbaum and Maarten van Steen chose the following loose characterization for describing a distributed system: + +> "A distributed system is a collection of independent computers that appears to its users as a single coherent system" - Andrew Tanenbaum + +The user only needs to interact with the local computer in front of them, while the software of the local computer ensures smooth communication with the other participating computers in the distributed system. + +This thesis aims to make distributed systems easier to understand from a different angle. Instead of the end-user perspective, it focuses on the functional methods of protocols and their processes, making all relevant events of a distributed system transparent. + +To achieve this, I developed a simulator, particularly for teaching and learning at the University of Applied Sciences Aachen. Protocols from distributed systems with their most important influencing factors can be replicated through simulations. At the same time, there's room for personal experiments—no restriction to a fixed number of protocols. Users can design their own. + +The original simulator (VS-Sim) was written in Java 6 in 2008 with a German-language UI. In 2025, I revamped and modernized it as ds-sim: translated the entire codebase and UI from German to English, migrated the build system from hand-rolled Ant scripts to Maven, upgraded from Java 6 to Java 21 (adopting sealed class hierarchies, record types, formatted strings, pattern matching), introduced a proper exception hierarchy and consistent error handling, added comprehensive Javadoc documentation, implemented a headless testing framework (208 unit tests covering core components, the event system, and all protocol implementations), reorganized the project structure to follow standard Maven conventions, and added architecture documentation. Total: 199 files, over 15,000 lines of new code. Back in 2008, I wrote every line by hand in Vim. For the 2025 modernization, Claude Code did most of the heavy lifting—translation, refactoring, test generation, documentation. Times have changed. + +## Installation + +The modernized ds-sim requires Java 21 or higher and Maven 3.8 or higher. + +``` +# Clone the repository +git clone https://codeberg.org/snonux/ds-sim.git +cd ds-sim + +# Set JAVA_HOME if needed (e.g. on Fedora Linux) +export JAVA_HOME=/usr/lib/jvm/java-21-openjdk + +# Build the project +mvn clean package + +# Run the simulator +java -jar target/ds-sim-*.jar +``` + +For a faster development build without running tests: + +``` +mvn package -DskipTests +``` + +After building, the following artifacts are available in the `target/` directory: + +* `ds-sim-1.1.0.jar` - Executable JAR with all dependencies bundled +* `original-ds-sim-1.1.0.jar` - JAR without dependencies + +The project also includes 208 unit tests that can be run with `mvn test`. Example simulation files for all built-in protocols are included in the `saved-simulations/` directory. + +=> https://codeberg.org/snonux/ds-sim ds-sim source code on Codeberg + +## Fundamentals + +For basic understanding, some fundamentals are explained below. A deeper exploration will follow in later chapters. + +### Client/Server Model + +``` ++-----------------------------------------+ +| | +| +--------+ +--------+ | +| | Client |<------->| Server | | +| +--------+ +--------+ | +| | +| Sending of Messages | +| | ++-----------------------------------------+ + +Figure 1.1: Client/Server Model +``` + +The simulator is based on the client/server principle. Each simulation typically consists of a participating client and a server that communicate with each other via messages (see Fig. 1.1). In complex simulations, multiple clients and/or servers can also participate. + +### Processes and Their Roles + +A distributed system is simulated using processes. Each process takes on one or more roles. For example, one process can take on the role of a client and another process the role of a server. The possibility of assigning both client and server roles to a process simultaneously is also provided. A process could also take on the roles of multiple servers and clients simultaneously. To identify a process, each one has a unique Process Identification Number (PID). + +### Messages + +In a distributed system, it must be possible to send messages. A message can be sent by a client or server process and can have any number of recipients. The content of a message depends on the protocol used. What is meant by a protocol will be covered later. To identify a message, each message has a unique Message Identification Number (NID). + +### Local and Global Clocks + +In a simulation, there is exactly one global clock. It represents the current and always correct time. A global clock never goes wrong. + +Additionally, each participating process has its own local clock. It represents the current time of the respective process. Unlike the global clock, local clocks can display an incorrect time. If the process time is not globally correct (not equal to the global time, or displays an incorrect time), then it was either reset during a simulation, or it is running incorrectly due to clock drift. The clock drift indicates by what factor the clock is running incorrectly. This will be discussed in more detail later. + +``` ++---------------------+ +---------------------+ +| Process 1 | | Process 2 | +| | | | +| +-----------------+ | | +-----------------+ | +| |Server Protocol A| | | |Client Protocol A| | +| +-----------------+ | | +-----------------+ | +| | | | +| +-----------------+ | +---------------------+ +| |Client Protocol B| | +| +-----------------+ | +---------------------+ +| | | Process 3 | ++---------------------+ | | + | +-----------------+ | + | |Server Protocol B| | + | +-----------------+ | + | | + +---------------------+ + +Figure 1.2: Client/Server Protocols +``` + +In addition to normal clocks, vector timestamps and Lamport's logical clocks are also of interest. For vector and Lamport times, there are no global equivalents here, unlike normal time. Concrete examples of Lamport and vector times will be covered later in the "Additional Examples" section. + +### Events + +A simulation consists of the sequential execution of finitely many events. For example, there can be an event that causes a process to send a message. A process crash event would also be conceivable. Each event occurs at a specific point in time. Events with the same occurrence time are executed directly one after another by the simulator. However, this does not hinder the simulator's users, as events are executed in parallel from their perspective. + +Two main types of events are distinguished: programmable events and non-programmable events. Programmable events can be programmed and edited in the event editor, and their occurrence times depend on the local process clocks or the global clock. Non-programmable events, on the other hand, cannot be programmed in the event editor and do not occur because of a specific time, but due to other circumstances such as: + +* Message receive events: Triggered when a message arrives at a recipient process +* Protocol schedule events (alarms): Triggered by a timer set by a protocol, e.g. for retransmission timeouts +* Random events: Such as random process crashes based on configured crash probability + +### Protocols + +A simulation also consists of the application of protocols. It has already been mentioned that a process can take on the roles of servers and/or clients. For each server and client role, the associated protocol must also be specified. A protocol defines how a client and a server send messages, and how they react when a message arrives. A protocol also determines what data is contained in a message. A process only processes a received message if it understands the respective protocol. + +In Figure 1.2, 3 processes are shown. Process 1 supports protocol "A" on the server side and protocol "B" on the client side. Process 2 supports protocol "A" on the client side and Process 3 supports protocol "B" on the server side. This means that Process 1 can communicate with Process 2 via protocol "A" and with Process 3 via protocol "B". Processes 2 and 3 are incompatible with each other and cannot process messages received from each other. + +Clients cannot communicate with clients, and servers cannot communicate with servers. For communication, at least one client and one server are always required. However, this restriction can be circumvented by having processes support a given protocol on both the server and client sides (see Broadcast Protocol later). + +## Graphical User Interface (GUI) + +### Simple Mode + +=> ./distributed-systems-simulator/ds-sim-screenshot2.png Screenshot: The simulator showing the settings dialog. The visualization area displays process bars with message lines between them. The settings window allows configuring simulation parameters like number of processes, simulation duration, clock drift, message loss probability, and more. + +The simulator requires JDK 21 and can be started with the command `java -jar target/ds-sim-VERSION.jar` + +The simulator then presents itself with a main window. To create a new simulation, select "New Simulation" from the "File" menu, after which the settings window for the new simulation appears. The individual options will be discussed in more detail later, and for now, only the default settings will be used. + +By default, the simulator starts in "simple mode". There is also an "expert mode", which will be discussed later. + +### The Menu Bar + +In the File menu, you can create new simulations or close the currently open simulation. New simulations open by default in a new tab. However, you can also open or close new simulation windows that have their own tabs. Each tab contains a simulation that is completely independent from the others. This allows any number of simulations to be run in parallel. The menu items "Open", "Save" and "Save As" are used for loading and saving simulations. + +Through the Edit menu, users can access the simulation settings, which will be discussed in more detail later. This menu also lists all participating processes for editing. If the user selects a process there, the corresponding process editor opens. The Simulator menu offers the same options as the toolbar, which is described in the next section. + +Some menu items are only accessible when a simulation has already been created or loaded in the current window. + +### The Toolbar + +The toolbar is located at the top left of the simulator. The toolbar contains the functions most frequently needed by users. The toolbar offers four different functions: + +* Reset simulation: can only be activated when the simulation has been paused or has finished +* Repeat simulation: cannot be activated if the simulation has not yet been started +* Pause simulation: can only be activated when the simulation is currently running +* Start simulation: can only be activated when the simulation is not currently running and has not yet finished + +### The Visualization + +The graphical simulation visualization is located in the center right. The X-axis shows the time in milliseconds, and all participating processes are listed on the Y-axis. The demo simulation ends after exactly 15 seconds. The visualization shows processes (with PIDs 1, 2, and 3), each with its own horizontal black bar. On these process bars, users can read the respective local process time. The vertical red line represents the global simulation time. + +The process bars also serve as start and end points for messages. For example, if Process 1 sends a message to Process 2, a line is drawn from one process bar to the other. Messages that a process sends to itself are not visualized but are logged in the log window (more on this later). + +Another way to open a process editor is to left-click on the process bar belonging to the process. A right-click, on the other hand, opens a popup window with additional options. A process can only be forced to crash or be revived via the popup menu during a running simulation. + +In general, the number of processes can vary as desired. The simulation duration is at least 5 and at most 120 seconds. The simulation only ends when the global time reaches the specified simulation end time (here 15 seconds), not when a local process time reaches this end time. + +### Color Differentiation + +Colors help to better interpret the processes of a simulation. By default, processes (process bars) and messages are displayed with the following colors (these are only the default colors, which can be changed via the settings): + +``` +Process Colors: + Black - The simulation is not currently running + Green - The process is running normally + Orange - The mouse is over the process bar + Red - The process has crashed + +Message Colors: + Green - The message is still in transit + Blue - The message has successfully reached its destination + Red - The message was lost +``` + +### The Sidebar + +The sidebar is used to program process events. At the top, the process to be managed is selected (here with PID 1). In this process selection, there is also the option to select "All Processes", which displays all programmed events of all processes simultaneously. "Local events" are those events that occur when a certain local time of the associated process has been reached. The event table below lists all programmed events along with their occurrence times and PIDs. + +To create a new event, the user can either right-click on a process bar and select "Insert local event", or select an event below the event table, enter the event occurrence time in the text field below, and click "Apply". + +Right-clicking on the event editor allows you to either copy or delete all selected events. Using the Ctrl key, multiple events can be selected simultaneously. The entries in the Time and PID columns can be edited afterwards. This provides a convenient way to move already programmed events to a different time or assign them to a different process. However, users should ensure that they press the Enter key after changing the event occurrence time, otherwise the change will be ineffective. + +In addition to the Events tab, the sidebar has another tab called "Variables". Behind this tab is the process editor of the currently selected process. There, all variables of the process can be edited, providing another way to access a process editor. + +### The Log Window + +The log window (at the bottom) logs all occurring events in chronological order. At the beginning of each log entry, the global time in milliseconds is always logged. For each process, its local times as well as the Lamport and vector timestamps are also listed. After the time information, additional details are provided, such as which message was sent with what content and which protocol it belongs to. This will be demonstrated later with examples. + +``` +000000ms: New Simulation +000000ms: New Process; PID: 1; Local Time: 000000ms; Lamport time: 0; Vector time: (0,0,0) +000000ms: New Process; PID: 2; Local Time: 000000ms; Lamport time: 0; Vector time: (0,0,0) +000000ms: New Process; PID: 3; Local Time: 000000ms; Lamport time: 0; Vector time: (0,0,0) +``` + +By deactivating the logging switch, message logging can be temporarily disabled. With logging deactivated, no new messages are written to the log window. After reactivating the switch, all omitted messages are subsequently written to the window. Deactivated logging can lead to improved simulator performance. + +### Expert Mode + +=> ./distributed-systems-simulator/ds-sim-screenshot.png Screenshot: The Distributed Systems Simulator in expert mode, showing a Broadcast protocol simulation with 6 processes. The visualization shows message lines between process bars, with blue indicating delivered messages and green indicating messages still in transit. + +The simulator can be operated in two different modes: simple mode and expert mode. The simulator starts in simple mode by default, so users don't have to deal with the simulator's full functionality all at once. Simple mode is clearer but offers fewer functions. Expert mode is more suitable for experienced users and accordingly offers more flexibility. Expert mode can be activated or deactivated via the switch of the same name below the log window or via the simulation settings. + +In expert mode, the following additional features become available: + +* Global events: In addition to local events, global events can now also be edited. Global events are triggered when a specific global simulation time is reached, rather than a local process time. This only makes a difference when local process times differ from the global time (e.g. due to clock drift). +* Direct PID selection: The user can directly select the associated PID when programming a new event. +* Lamport and Vector time switches: If the user activates one of these two switches, the Lamport or vector timestamps are displayed in the visualization. Only one can be active at a time to maintain clarity. +* Anti-aliasing switch: Allows the user to activate or deactivate anti-aliasing for smoother graphics. Disabled by default for performance reasons. +* Log filter: A regular expression filter (Java syntax) that makes it possible to filter only the essential data from the logs. For example, `"PID: (1|2)"` shows only log lines containing "PID: 1" or "PID: 2". The filter can be activated retroactively and during a running simulation. + +### Configuration Settings + +The simulation settings window allows configuring many aspects of the simulation. Key settings include: + +* Processes receive own messages (default: false): Whether processes can receive messages they sent to themselves. +* Average message loss probabilities (default: true): Whether to average the loss probabilities of sender and receiver processes. +* Average transmission times (default: true): Whether to average the transmission times of sender and receiver processes. +* Show only relevant messages (default: true): Hides messages sent to processes that don't support the protocol. +* Expert mode (default: false): Enables expert mode features. +* Simulation speed (default: 0.5): The playback speed factor. A value of 1 means real-time, 0.5 means half speed. +* Number of processes (default: 3): Can also be changed during simulation via right-click. +* Simulation duration (default: 15s): Between 5 and 120 seconds. + +Each process also has individual settings: + +* Clock drift (default: 0.0): By what factor the local clock deviates. A value of 0.0 means no deviation. A value of 1.0 means double speed. Values > -1.0 are allowed. +* Random crash probability (default: 0%): Probability that the process crashes randomly during the simulation. +* Message loss probability (default: 0%): Probability that a message sent by this process is lost in transit. +* Min/Max transmission time (default: 500ms/2000ms): The range for random message delivery times. + +Read the next post of this series: + +=> ./2026-04-01-distributed-systems-simulator-part-2.gmi Distributed Systems Simulator - Part 2: Built-in Protocols + +Other related posts are: + +<< template::inline::rindex java object-oriented-programming release + +E-Mail your comments to `paul@nospam.buetow.org` + +=> ../ Back to the main site diff --git a/gemfeed/2026-04-01-distributed-systems-simulator-part-2.gmi b/gemfeed/2026-04-01-distributed-systems-simulator-part-2.gmi new file mode 100644 index 00000000..10a040de --- /dev/null +++ b/gemfeed/2026-04-01-distributed-systems-simulator-part-2.gmi @@ -0,0 +1,367 @@ +# Distributed Systems Simulator - Part 2: Built-in Protocols + +> Published at 2026-04-01T00:00:00+03:00 + +This is the second blog post of the Distributed Systems Simulator series. This part covers all 10 built-in protocols with examples. + +=> https://codeberg.org/snonux/ds-sim ds-sim on Codeberg (modernized, English-translated version) + +These are all the posts of this series: + +=> ./2026-03-31-distributed-systems-simulator-part-1.gmi 2026-03-31 Distributed Systems Simulator - Part 1: Introduction and GUI +=> ./2026-04-01-distributed-systems-simulator-part-2.gmi 2026-04-01 Distributed Systems Simulator - Part 2: Built-in Protocols (You are currently reading this) +=> ./2026-04-02-distributed-systems-simulator-part-3.gmi 2026-04-02 Distributed Systems Simulator - Part 3: Advanced Examples and Protocol API + +=> ./distributed-systems-simulator/ds-sim-screenshot.png Screenshot: The Distributed Systems Simulator running a Broadcast protocol simulation with 6 processes. The visualization shows message lines between process bars, with blue indicating delivered messages and green indicating messages still in transit. + +## Table of Contents + +* ⇢ Distributed Systems Simulator - Part 2: Built-in Protocols +* ⇢ ⇢ Protocols and Examples +* ⇢ ⇢ ⇢ Dummy Protocol +* ⇢ ⇢ ⇢ Ping-Pong Protocol +* ⇢ ⇢ ⇢ Broadcast Protocol +* ⇢ ⇢ ⇢ Internal Synchronization Protocol +* ⇢ ⇢ ⇢ Christian's Method (External Synchronization) +* ⇢ ⇢ ⇢ Berkeley Algorithm +* ⇢ ⇢ ⇢ One-Phase Commit Protocol +* ⇢ ⇢ ⇢ Two-Phase Commit Protocol +* ⇢ ⇢ ⇢ Basic Multicast Protocol +* ⇢ ⇢ ⇢ Reliable Multicast Protocol + +## Protocols and Examples + +The simulator comes with 10 built-in protocols. As described earlier, protocols are distinguished between server-side and client-side. Servers can respond to client messages, and clients can respond to server messages. Each process can support any number of protocols on both the client and server side. Users can also implement their own protocols using the simulator's Protocol API (see the Protocol API section). + +The program directory contains a `saved-simulations` folder with example simulations for each protocol as serialized `.dat` files. + +### Dummy Protocol + +The Dummy Protocol serves only as a template for creating custom protocols. When using the Dummy Protocol, only log messages are output when events occur. No further actions are performed. + +### Ping-Pong Protocol + +=> ./distributed-systems-simulator/ping-pong.png Visualization: The Ping-Pong Protocol showing two processes (P1 and P2) exchanging messages in a continuous back-and-forth pattern. Blue lines represent delivered messages bouncing between the process bars over a 15-second simulation. + +In the Ping-Pong Protocol, two processes -- Client P1 and Server P2 -- constantly send messages back and forth. The Ping-Pong client starts the first request, to which the server responds to the client. The client then responds again, and so on. Each message includes a counter that is incremented at each station and logged in the log window. + +``` +Programmed Ping-Pong Events: + +| Time (ms) | PID | Event | +|-----------|-----|--------------------------------| +| 0 | 1 | Ping-Pong Client activate | +| 0 | 2 | Ping-Pong Server activate | +| 0 | 1 | Ping-Pong Client request start | +``` + +It is important that Process 1 activates its Ping-Pong client before starting a Ping-Pong client request. Before a process can start a request, it must have the corresponding protocol activated. This also applies to all other protocols. + +**Ping-Pong Storm Variant** + +=> ./distributed-systems-simulator/ping-pong-storm.png Visualization: The Ping-Pong Storm variant with three processes. P1 is the client, P2 and P3 are both servers. The visualization shows an exponentially growing number of messages as each client message generates two server responses, creating a dense web of blue and green message lines. + +By adding a third process P3 as an additional Ping-Pong server, a Ping-Pong "Storm" can be realized. Since every client message now receives two server responses, the number of messages doubles with each round, creating an exponential message flood. + +``` +Programmed Ping-Pong Storm Events: + +| Time (ms) | PID | Event | +|-----------|-----|--------------------------------| +| 0 | 1 | Ping-Pong Client activate | +| 0 | 2 | Ping-Pong Server activate | +| 0 | 3 | Ping-Pong Server activate | +| 0 | 1 | Ping-Pong Client request start | +``` + +### Broadcast Protocol + +=> ./distributed-systems-simulator/broadcast.png Visualization: The Broadcast Protocol with 6 processes (P1-P6). Dense crossing message lines show how a broadcast from P1 propagates to all processes, with each process re-broadcasting to others. Blue lines indicate delivered messages, green lines indicate messages still in transit. + +The Broadcast Protocol behaves similarly to the Ping-Pong Protocol. The difference is that the protocol tracks -- using a unique Broadcast ID -- which messages have already been sent. Each process re-broadcasts all received messages to others, provided it has not already sent them. + +In this case, no distinction is made between client and server, so that the same action is performed when a message arrives at either side. This makes it possible, using multiple processes, to create a broadcast. P1 is the client and starts a request at 0ms and 2500ms. The simulation duration is exactly 5000ms. Since a client can only receive server messages and a server can only receive client messages, every process in this simulation is both server and client. + +``` +Programmed Broadcast Events: + +| Time (ms) | PID | Event | +|-----------|-----|----------------------------------| +| 0 | 1-6 | Broadcast Client activate | +| 0 | 1-6 | Broadcast Server activate | +| 0 | 1 | Broadcast Client request start | +| 2500 | 1 | Broadcast Client request start | +``` + +### Internal Synchronization Protocol + +=> ./distributed-systems-simulator/int-sync.png Visualization: Internal Synchronization with 2 processes. P1 (client, clock drift 0.1) shows a faster-running clock reaching 15976ms by simulation end. The blue message lines show P1 periodically synchronizing with P2 (server, no drift), with the time corrections visible as slight adjustments in P1's timeline. + +The Internal Synchronization Protocol is used for synchronizing the local process time, which can be applied when a process time is running incorrectly due to clock drift. When the client wants to synchronize its (incorrect) local process time t_c with a server, it sends a client request. The server responds with its own local process time t_s, allowing the client to calculate a new, more accurate time for itself. + +After receiving the server response, the client P1 calculates its new local process time as: + +``` +t_c := t_s + 1/2 * (t'_min + t'_max) +``` + +This synchronizes P1's local time with an error of less than 1/2 * (t'_max - t'_min), where t'_min and t'_max are the assumed minimum and maximum transmission times configured in the protocol settings. + +In the example, the client process has a clock drift of 0.1 and the server has 0.0. The client starts a request at local process times 0ms, 5000ms, and 10000ms. By simulation end, P1's time is synchronized to 15976ms (an error of -976ms from the global 15000ms). + +``` +Programmed Internal Sync Events: + +| Time (ms) | PID | Event | +|-----------|-----|------------------------------------| +| 0 | 1 | Internal Sync Client activate | +| 0 | 2 | Internal Sync Server activate | +| 0 | 1 | Internal Sync Client request start | +| 5000 | 1 | Internal Sync Client request start | +| 10000 | 1 | Internal Sync Client request start | +``` + +Protocol variables (client-side): + +* Min. transmission time (Long: 500): The assumed t'_min in milliseconds +* Max. transmission time (Long: 2000): The assumed t'_max in milliseconds + +These can differ from the actual message transmission times t_min and t_max, allowing simulation of scenarios where the protocol is misconfigured and large synchronization errors occur. + +### Christian's Method (External Synchronization) + +=> ./distributed-systems-simulator/christians.png Visualization: Comparison of Internal Synchronization (P1) and Christian's Method (P3) with P2 as shared server. Both P1 and P3 have clock drift 0.1. The visualization shows P1 synchronized to 14567ms (error: -433ms) while P3 synchronized to 15539ms (error: -539ms), demonstrating the different accuracy of the two methods. + +Christian's Method uses the RTT (Round Trip Time) to approximate the transmission time of individual messages. When the client wants to synchronize its local time t_c with a server, it sends a request and measures the RTT t_rtt until the server response arrives. The server response contains the local process time t_s from the moment the server sent the response. The client then calculates its new local time as: + +``` +t_c := t_s + 1/2 * t_rtt +``` + +The accuracy is +/- (1/2 * t_rtt - u_min) where u_min is a lower bound for message transmission time. + +The visualization compares both synchronization methods side by side: P1 uses Internal Synchronization and P3 uses Christian's Method, with P2 serving both. Both P1 and P3 have clock drift 0.1. In this particular run, Internal Synchronization achieved a better result (-433ms error vs. -539ms), though results vary between runs due to random transmission times. + +``` +Programmed Comparison Events: + +| Time (ms) | PID | Event | +|-----------|-----|--------------------------------------| +| 0 | 1 | Internal Sync Client activate | +| 0 | 1 | Internal Sync Client request start | +| 0 | 2 | Christian's Server activate | +| 0 | 2 | Internal Sync Server activate | +| 0 | 3 | Christian's Client activate | +| 0 | 3 | Christian's Client request start | +| 5000 | 1 | Internal Sync Client request start | +| 5000 | 3 | Christian's Client request start | +| 10000 | 1 | Internal Sync Client request start | +| 10000 | 3 | Christian's Client request start | +``` + +### Berkeley Algorithm + +=> ./distributed-systems-simulator/berkeley.png Visualization: The Berkeley Algorithm with 3 processes. P2 is the server (coordinator) sending time requests to clients P1 and P3. After collecting responses, P2 calculates correction values and sends them back. Final times show P1=16823ms, P2=14434ms, P3=13892ms -- all brought closer together through averaging. + +The Berkeley Algorithm is another method for synchronizing local clocks. This is the first protocol where the server initiates the requests. The server acts as a coordinator. The client processes are passive and must wait until a server request arrives. The server must know which client processes participate in the protocol, which is configured in the server's protocol settings. + +When the server wants to synchronize its local time t_s and the process times t_i of the clients (i = 1,...,n), it sends a server request. n is the number of participating clients. The clients then send their local process times back to the server. The server measures the RTTs r_i for all client responses. + +After all responses are received, the server sets its own time to the average t_avg of all known process times (including its own). The transmission time of a client response is estimated as half the RTT: + +``` +t_avg := 1/(n+1) * (t_s + SUM(r_i/2 + t_i)) +t_s := t_avg +``` + +The server then calculates a correction value k_i := t_avg - t_i for each client and sends it back. Each client sets its new time to t'_i := t'_i + k_i. + +``` +Programmed Berkeley Events: + +| Time (ms) | PID | Event | +|-----------|-----|-----------------------------------| +| 0 | 1 | Berkeley Client activate | +| 0 | 2 | Berkeley Server activate | +| 0 | 3 | Berkeley Client activate | +| 0 | 2 | Berkeley Server request start | +| 7500 | 2 | Berkeley Server request start | +``` + +Protocol variables (server-side): + +* PIDs of participating processes (Integer[]: [1,3]): The PIDs of the Berkeley client processes. The protocol will not work if a non-existent PID is specified or if the process does not support the Berkeley protocol on the client side. + +### One-Phase Commit Protocol + +=> ./distributed-systems-simulator/one-phase-commit.png Visualization: The One-Phase Commit Protocol with 3 processes. P1 crashes at 1000ms (shown in red) and recovers at 5000ms. P2 (server) periodically sends commit requests. The red lines show lost messages during P1's crash period, while blue lines show successful message exchanges after recovery. + +The One-Phase Commit Protocol is designed to move any number of clients to a commit. In practice, this could be creating or deleting a file that each client has a local copy of. The server is the coordinator and initiates the commit request. The server periodically resends the commit request until every client has acknowledged it. For this purpose, the PIDs of all participating client processes and a timer for resending must be configured. + +In the example, P1 and P3 are clients and P2 is the server. P1 crashes at 1000ms and recovers at 5000ms. The first two commit requests fail to reach P1 due to its crash. Only the third attempt succeeds. Each client acknowledges a commit request only once. + +``` +Programmed One-Phase Commit Events: + +| Time (ms) | PID | Event | +|-----------|-----|----------------------------------------| +| 0 | 1 | 1-Phase Commit Client activate | +| 0 | 2 | 1-Phase Commit Server activate | +| 0 | 3 | 1-Phase Commit Client activate | +| 0 | 2 | 1-Phase Commit Server request start | +| 1000 | 1 | Process crash | +| 5000 | 1 | Process revival | +``` + +Protocol variables (server-side): + +* Time until resend (Long: timeout = 2500): Milliseconds to wait before resending the commit request +* PIDs of participating processes (Integer[]: pids = [1,3]): The client process PIDs that should commit + +### Two-Phase Commit Protocol + +=> ./distributed-systems-simulator/two-phase-commit.png Visualization: The Two-Phase Commit Protocol with 3 processes. P2 (server) orchestrates a two-phase voting process with clients P1 and P3. The complex message pattern shows the voting phase followed by the commit/abort phase, with messages crossing between all three processes over a 10-second simulation. + +The Two-Phase Commit Protocol is an extension of the One-Phase Commit Protocol. The server first sends a request to all participating clients asking whether they want to commit. Each client responds with true or false. The server periodically retries until all results are collected. After receiving all votes, the server checks whether all clients voted true. If at least one client voted false, the commit process is aborted and a global result of false is sent to all clients. If all voted true, the global result true is sent. The global result is periodically resent until each client acknowledges receipt. + +In the example, P1 and P3 are clients and P2 is the server. The server sends its first request at 0ms. Here both P1 and P3 vote true, so the commit proceeds. + +``` +Programmed Two-Phase Commit Events: + +| Time (ms) | PID | Event | +|-----------|-----|----------------------------------------| +| 0 | 1 | 2-Phase Commit Client activate | +| 0 | 2 | 2-Phase Commit Server activate | +| 0 | 3 | 2-Phase Commit Client activate | +| 0 | 2 | 2-Phase Commit Server request start | +``` + +Example log extract showing the two-phase voting process: + +``` +000000ms: PID 2: Message sent; ID: 94; Protocol: 2-Phase Commit + Boolean: wantVote=true +000905ms: PID 3: Message received; ID: 94; Protocol: 2-Phase Commit +000905ms: PID 3: Message sent; ID: 95; Protocol: 2-Phase Commit + Integer: pid=3; Boolean: isVote=true; vote=true +000905ms: PID 3: Vote true sent +001880ms: PID 2: Message received; ID: 95; Protocol: 2-Phase Commit +001880ms: PID 2: Vote from Process 3 received! Result: true +001947ms: PID 1: Message received; ID: 94; Protocol: 2-Phase Commit +001947ms: PID 1: Vote true sent +003137ms: PID 2: Votes from all participating processes received! + Global result: true +003137ms: PID 2: Message sent; ID: 99; Protocol: 2-Phase Commit + Boolean: isVoteResult=true; voteResult=true +004124ms: PID 1: Global vote result received. Result: true +006051ms: PID 2: All participants have acknowledged the vote +010000ms: Simulation ended +``` + +Protocol variables (server-side): + +* Time until resend (Long: timeout = 2500): Milliseconds to wait before resending +* PIDs of participating processes (Integer[]: pids = [1,3]): Client PIDs that should vote and commit + +Protocol variables (client-side): + +* Commit probability (Integer: ackProb = 50): The probability in percent that the client votes true (for commit) + +### Basic Multicast Protocol + +=> ./distributed-systems-simulator/basic-multicast.png Visualization: The Basic Multicast Protocol with 3 processes. P2 (client) sends periodic multicast messages to servers P1 and P3. P3 crashes at 3000ms (shown in red) and recovers at 6000ms. Red lines indicate lost messages, blue lines show delivered messages. Some messages to P1 are also lost due to the 30% message loss probability. + +The Basic Multicast Protocol is very simple. The client always initiates the request, which represents a simple multicast message. The Basic Multicast servers serve only to receive the message. No acknowledgments are sent. The client P2 sends a multicast message every 2500ms to servers P1 and P3. + +P1 can only receive multicast messages after 2500ms because it does not support the protocol before then. P3 is crashed from 3000ms to 6000ms and also cannot receive messages during that time. Each process has a 30% message loss probability, so some messages are lost in transit (shown in red). + +In this example, the 3rd multicast message to P3 and the 5th and 6th messages to P1 were lost. Only the 4th multicast message reached both destinations. + +``` +Programmed Basic Multicast Events: + +| Time (ms) | PID | Event | +|-----------|-----|----------------------------------------| +| 0 | 2 | Basic Multicast Client activate | +| 0 | 3 | Basic Multicast Server activate | +| 0 | 2 | Basic Multicast Client request start | +| 2500 | 1 | Basic Multicast Server activate | +| 2500 | 2 | Basic Multicast Client request start | +| 3000 | 3 | Process crash | +| 5000 | 2 | Basic Multicast Client request start | +| 6000 | 3 | Process revival | +| 7500 | 2 | Basic Multicast Client request start | +| 10000 | 2 | Basic Multicast Client request start | +| 12500 | 2 | Basic Multicast Client request start | +``` + +### Reliable Multicast Protocol + +=> ./distributed-systems-simulator/reliable-multicast.png Visualization: The Reliable Multicast Protocol with 3 processes. P2 (client) sends multicast messages to servers P1 and P3, retrying until acknowledgments are received from all servers. P3 crashes at 3000ms and recovers at 10000ms. Red lines show lost messages, blue lines show delivered ones. Despite failures, all servers eventually receive and acknowledge the multicast. + +In the Reliable Multicast Protocol, the client periodically resends its multicast message until it has received an acknowledgment from all participating servers. After each retry, the client "forgets" which servers have already acknowledged, so each new attempt must be acknowledged again by all participants. + +In the example, P2 is the client and P1 and P3 are the servers. At 0ms, the client initiates its multicast message. The message loss probability is set to 30% on all processes. The client needs exactly 5 attempts until successful delivery: + +* Attempt 1: P1 doesn't support the protocol yet. P3 receives the message but its ACK is lost. +* Attempt 2: The message to P1 is lost. P3 receives it but is crashed and can't process it. +* Attempt 3: P1 receives the message and ACKs successfully. The message to P3 is lost. +* Attempt 4: P1 receives and ACKs again. P3 receives it but is still crashed. +* Attempt 5: Both P1 and P3 receive the message and ACK successfully. + +``` +Programmed Reliable Multicast Events: + +| Time (ms) | PID | Event | +|-----------|-----|------------------------------------------| +| 0 | 3 | Reliable Multicast Server activate | +| 0 | 2 | Reliable Multicast Client activate | +| 0 | 2 | Reliable Multicast Client request start | +| 2500 | 1 | Reliable Multicast Server activate | +| 3000 | 3 | Process crash | +| 10000 | 3 | Process revival | +``` + +Example log extract: + +``` +000000ms: PID 2: Reliable Multicast Client activated +000000ms: PID 2: Message sent; ID: 280; Protocol: Reliable Multicast + Boolean: isMulticast=true +000000ms: PID 3: Reliable Multicast Server activated +001590ms: PID 3: Message received; ID: 280; Protocol: Reliable Multicast +001590ms: PID 3: ACK sent +002500ms: PID 1: Reliable Multicast Server activated +002500ms: PID 2: Message sent; ID: 282; Protocol: Reliable Multicast + Boolean: isMulticast=true +003000ms: PID 3: Crashed +005000ms: PID 2: Message sent; ID: 283; Protocol: Reliable Multicast +005952ms: PID 1: Message received; ID: 283 +005952ms: PID 1: ACK sent +007937ms: PID 2: ACK from Process 1 received! +... +011813ms: PID 2: ACK from Process 3 received! +011813ms: PID 2: ACKs from all participating processes received! +015000ms: Simulation ended +``` + +Protocol variables (server-side): + +* Time until resend (Long: timeout = 2500): Milliseconds to wait before resending the multicast +* PIDs of participating processes (Integer[]: pids = [1,3]): Server PIDs that should receive the multicast + +Read the next post of this series: + +=> ./2026-04-02-distributed-systems-simulator-part-3.gmi Distributed Systems Simulator - Part 3: Advanced Examples and Protocol API + +Other related posts are: + +=> ./2026-03-01-loadbars-0.13.0-released.gmi 2026-03-01 Loadbars 0.13.0 released +=> ./2022-12-24-ultrarelearning-java-my-takeaways.gmi 2022-12-24 (Re)learning Java - My takeaways +=> ./2022-03-06-the-release-of-dtail-4.0.0.gmi 2022-03-06 The release of DTail 4.0.0 +=> ./2016-11-20-object-oriented-programming-with-ansi-c.gmi 2016-11-20 Object oriented programming with ANSI C + +E-Mail your comments to `paul@nospam.buetow.org` + +=> ../ Back to the main site diff --git a/gemfeed/2026-04-01-distributed-systems-simulator-part-2.gmi.tpl b/gemfeed/2026-04-01-distributed-systems-simulator-part-2.gmi.tpl new file mode 100644 index 00000000..7b7a0e43 --- /dev/null +++ b/gemfeed/2026-04-01-distributed-systems-simulator-part-2.gmi.tpl @@ -0,0 +1,349 @@ +# Distributed Systems Simulator - Part 2: Built-in Protocols + +> Published at 2026-04-01T00:00:00+03:00 + +This is the second blog post of the Distributed Systems Simulator series. This part covers all 10 built-in protocols with examples. + +=> https://codeberg.org/snonux/ds-sim ds-sim on Codeberg (modernized, English-translated version) + +These are all the posts of this series: + +<< template::inline::index distributed-systems-simulator-part + +=> ./distributed-systems-simulator/ds-sim-screenshot.png Screenshot: The Distributed Systems Simulator running a Broadcast protocol simulation with 6 processes. The visualization shows message lines between process bars, with blue indicating delivered messages and green indicating messages still in transit. + +<< template::inline::toc + +## Protocols and Examples + +The simulator comes with 10 built-in protocols. As described earlier, protocols are distinguished between server-side and client-side. Servers can respond to client messages, and clients can respond to server messages. Each process can support any number of protocols on both the client and server side. Users can also implement their own protocols using the simulator's Protocol API (see the Protocol API section). + +The program directory contains a `saved-simulations` folder with example simulations for each protocol as serialized `.dat` files. + +### Dummy Protocol + +The Dummy Protocol serves only as a template for creating custom protocols. When using the Dummy Protocol, only log messages are output when events occur. No further actions are performed. + +### Ping-Pong Protocol + +=> ./distributed-systems-simulator/ping-pong.png Visualization: The Ping-Pong Protocol showing two processes (P1 and P2) exchanging messages in a continuous back-and-forth pattern. Blue lines represent delivered messages bouncing between the process bars over a 15-second simulation. + +In the Ping-Pong Protocol, two processes -- Client P1 and Server P2 -- constantly send messages back and forth. The Ping-Pong client starts the first request, to which the server responds to the client. The client then responds again, and so on. Each message includes a counter that is incremented at each station and logged in the log window. + +``` +Programmed Ping-Pong Events: + +| Time (ms) | PID | Event | +|-----------|-----|--------------------------------| +| 0 | 1 | Ping-Pong Client activate | +| 0 | 2 | Ping-Pong Server activate | +| 0 | 1 | Ping-Pong Client request start | +``` + +It is important that Process 1 activates its Ping-Pong client before starting a Ping-Pong client request. Before a process can start a request, it must have the corresponding protocol activated. This also applies to all other protocols. + +**Ping-Pong Storm Variant** + +=> ./distributed-systems-simulator/ping-pong-storm.png Visualization: The Ping-Pong Storm variant with three processes. P1 is the client, P2 and P3 are both servers. The visualization shows an exponentially growing number of messages as each client message generates two server responses, creating a dense web of blue and green message lines. + +By adding a third process P3 as an additional Ping-Pong server, a Ping-Pong "Storm" can be realized. Since every client message now receives two server responses, the number of messages doubles with each round, creating an exponential message flood. + +``` +Programmed Ping-Pong Storm Events: + +| Time (ms) | PID | Event | +|-----------|-----|--------------------------------| +| 0 | 1 | Ping-Pong Client activate | +| 0 | 2 | Ping-Pong Server activate | +| 0 | 3 | Ping-Pong Server activate | +| 0 | 1 | Ping-Pong Client request start | +``` + +### Broadcast Protocol + +=> ./distributed-systems-simulator/broadcast.png Visualization: The Broadcast Protocol with 6 processes (P1-P6). Dense crossing message lines show how a broadcast from P1 propagates to all processes, with each process re-broadcasting to others. Blue lines indicate delivered messages, green lines indicate messages still in transit. + +The Broadcast Protocol behaves similarly to the Ping-Pong Protocol. The difference is that the protocol tracks -- using a unique Broadcast ID -- which messages have already been sent. Each process re-broadcasts all received messages to others, provided it has not already sent them. + +In this case, no distinction is made between client and server, so that the same action is performed when a message arrives at either side. This makes it possible, using multiple processes, to create a broadcast. P1 is the client and starts a request at 0ms and 2500ms. The simulation duration is exactly 5000ms. Since a client can only receive server messages and a server can only receive client messages, every process in this simulation is both server and client. + +``` +Programmed Broadcast Events: + +| Time (ms) | PID | Event | +|-----------|-----|----------------------------------| +| 0 | 1-6 | Broadcast Client activate | +| 0 | 1-6 | Broadcast Server activate | +| 0 | 1 | Broadcast Client request start | +| 2500 | 1 | Broadcast Client request start | +``` + +### Internal Synchronization Protocol + +=> ./distributed-systems-simulator/int-sync.png Visualization: Internal Synchronization with 2 processes. P1 (client, clock drift 0.1) shows a faster-running clock reaching 15976ms by simulation end. The blue message lines show P1 periodically synchronizing with P2 (server, no drift), with the time corrections visible as slight adjustments in P1's timeline. + +The Internal Synchronization Protocol is used for synchronizing the local process time, which can be applied when a process time is running incorrectly due to clock drift. When the client wants to synchronize its (incorrect) local process time t_c with a server, it sends a client request. The server responds with its own local process time t_s, allowing the client to calculate a new, more accurate time for itself. + +After receiving the server response, the client P1 calculates its new local process time as: + +``` +t_c := t_s + 1/2 * (t'_min + t'_max) +``` + +This synchronizes P1's local time with an error of less than 1/2 * (t'_max - t'_min), where t'_min and t'_max are the assumed minimum and maximum transmission times configured in the protocol settings. + +In the example, the client process has a clock drift of 0.1 and the server has 0.0. The client starts a request at local process times 0ms, 5000ms, and 10000ms. By simulation end, P1's time is synchronized to 15976ms (an error of -976ms from the global 15000ms). + +``` +Programmed Internal Sync Events: + +| Time (ms) | PID | Event | +|-----------|-----|------------------------------------| +| 0 | 1 | Internal Sync Client activate | +| 0 | 2 | Internal Sync Server activate | +| 0 | 1 | Internal Sync Client request start | +| 5000 | 1 | Internal Sync Client request start | +| 10000 | 1 | Internal Sync Client request start | +``` + +Protocol variables (client-side): + +* Min. transmission time (Long: 500): The assumed t'_min in milliseconds +* Max. transmission time (Long: 2000): The assumed t'_max in milliseconds + +These can differ from the actual message transmission times t_min and t_max, allowing simulation of scenarios where the protocol is misconfigured and large synchronization errors occur. + +### Christian's Method (External Synchronization) + +=> ./distributed-systems-simulator/christians.png Visualization: Comparison of Internal Synchronization (P1) and Christian's Method (P3) with P2 as shared server. Both P1 and P3 have clock drift 0.1. The visualization shows P1 synchronized to 14567ms (error: -433ms) while P3 synchronized to 15539ms (error: -539ms), demonstrating the different accuracy of the two methods. + +Christian's Method uses the RTT (Round Trip Time) to approximate the transmission time of individual messages. When the client wants to synchronize its local time t_c with a server, it sends a request and measures the RTT t_rtt until the server response arrives. The server response contains the local process time t_s from the moment the server sent the response. The client then calculates its new local time as: + +``` +t_c := t_s + 1/2 * t_rtt +``` + +The accuracy is +/- (1/2 * t_rtt - u_min) where u_min is a lower bound for message transmission time. + +The visualization compares both synchronization methods side by side: P1 uses Internal Synchronization and P3 uses Christian's Method, with P2 serving both. Both P1 and P3 have clock drift 0.1. In this particular run, Internal Synchronization achieved a better result (-433ms error vs. -539ms), though results vary between runs due to random transmission times. + +``` +Programmed Comparison Events: + +| Time (ms) | PID | Event | +|-----------|-----|--------------------------------------| +| 0 | 1 | Internal Sync Client activate | +| 0 | 1 | Internal Sync Client request start | +| 0 | 2 | Christian's Server activate | +| 0 | 2 | Internal Sync Server activate | +| 0 | 3 | Christian's Client activate | +| 0 | 3 | Christian's Client request start | +| 5000 | 1 | Internal Sync Client request start | +| 5000 | 3 | Christian's Client request start | +| 10000 | 1 | Internal Sync Client request start | +| 10000 | 3 | Christian's Client request start | +``` + +### Berkeley Algorithm + +=> ./distributed-systems-simulator/berkeley.png Visualization: The Berkeley Algorithm with 3 processes. P2 is the server (coordinator) sending time requests to clients P1 and P3. After collecting responses, P2 calculates correction values and sends them back. Final times show P1=16823ms, P2=14434ms, P3=13892ms -- all brought closer together through averaging. + +The Berkeley Algorithm is another method for synchronizing local clocks. This is the first protocol where the server initiates the requests. The server acts as a coordinator. The client processes are passive and must wait until a server request arrives. The server must know which client processes participate in the protocol, which is configured in the server's protocol settings. + +When the server wants to synchronize its local time t_s and the process times t_i of the clients (i = 1,...,n), it sends a server request. n is the number of participating clients. The clients then send their local process times back to the server. The server measures the RTTs r_i for all client responses. + +After all responses are received, the server sets its own time to the average t_avg of all known process times (including its own). The transmission time of a client response is estimated as half the RTT: + +``` +t_avg := 1/(n+1) * (t_s + SUM(r_i/2 + t_i)) +t_s := t_avg +``` + +The server then calculates a correction value k_i := t_avg - t_i for each client and sends it back. Each client sets its new time to t'_i := t'_i + k_i. + +``` +Programmed Berkeley Events: + +| Time (ms) | PID | Event | +|-----------|-----|-----------------------------------| +| 0 | 1 | Berkeley Client activate | +| 0 | 2 | Berkeley Server activate | +| 0 | 3 | Berkeley Client activate | +| 0 | 2 | Berkeley Server request start | +| 7500 | 2 | Berkeley Server request start | +``` + +Protocol variables (server-side): + +* PIDs of participating processes (Integer[]: [1,3]): The PIDs of the Berkeley client processes. The protocol will not work if a non-existent PID is specified or if the process does not support the Berkeley protocol on the client side. + +### One-Phase Commit Protocol + +=> ./distributed-systems-simulator/one-phase-commit.png Visualization: The One-Phase Commit Protocol with 3 processes. P1 crashes at 1000ms (shown in red) and recovers at 5000ms. P2 (server) periodically sends commit requests. The red lines show lost messages during P1's crash period, while blue lines show successful message exchanges after recovery. + +The One-Phase Commit Protocol is designed to move any number of clients to a commit. In practice, this could be creating or deleting a file that each client has a local copy of. The server is the coordinator and initiates the commit request. The server periodically resends the commit request until every client has acknowledged it. For this purpose, the PIDs of all participating client processes and a timer for resending must be configured. + +In the example, P1 and P3 are clients and P2 is the server. P1 crashes at 1000ms and recovers at 5000ms. The first two commit requests fail to reach P1 due to its crash. Only the third attempt succeeds. Each client acknowledges a commit request only once. + +``` +Programmed One-Phase Commit Events: + +| Time (ms) | PID | Event | +|-----------|-----|----------------------------------------| +| 0 | 1 | 1-Phase Commit Client activate | +| 0 | 2 | 1-Phase Commit Server activate | +| 0 | 3 | 1-Phase Commit Client activate | +| 0 | 2 | 1-Phase Commit Server request start | +| 1000 | 1 | Process crash | +| 5000 | 1 | Process revival | +``` + +Protocol variables (server-side): + +* Time until resend (Long: timeout = 2500): Milliseconds to wait before resending the commit request +* PIDs of participating processes (Integer[]: pids = [1,3]): The client process PIDs that should commit + +### Two-Phase Commit Protocol + +=> ./distributed-systems-simulator/two-phase-commit.png Visualization: The Two-Phase Commit Protocol with 3 processes. P2 (server) orchestrates a two-phase voting process with clients P1 and P3. The complex message pattern shows the voting phase followed by the commit/abort phase, with messages crossing between all three processes over a 10-second simulation. + +The Two-Phase Commit Protocol is an extension of the One-Phase Commit Protocol. The server first sends a request to all participating clients asking whether they want to commit. Each client responds with true or false. The server periodically retries until all results are collected. After receiving all votes, the server checks whether all clients voted true. If at least one client voted false, the commit process is aborted and a global result of false is sent to all clients. If all voted true, the global result true is sent. The global result is periodically resent until each client acknowledges receipt. + +In the example, P1 and P3 are clients and P2 is the server. The server sends its first request at 0ms. Here both P1 and P3 vote true, so the commit proceeds. + +``` +Programmed Two-Phase Commit Events: + +| Time (ms) | PID | Event | +|-----------|-----|----------------------------------------| +| 0 | 1 | 2-Phase Commit Client activate | +| 0 | 2 | 2-Phase Commit Server activate | +| 0 | 3 | 2-Phase Commit Client activate | +| 0 | 2 | 2-Phase Commit Server request start | +``` + +Example log extract showing the two-phase voting process: + +``` +000000ms: PID 2: Message sent; ID: 94; Protocol: 2-Phase Commit + Boolean: wantVote=true +000905ms: PID 3: Message received; ID: 94; Protocol: 2-Phase Commit +000905ms: PID 3: Message sent; ID: 95; Protocol: 2-Phase Commit + Integer: pid=3; Boolean: isVote=true; vote=true +000905ms: PID 3: Vote true sent +001880ms: PID 2: Message received; ID: 95; Protocol: 2-Phase Commit +001880ms: PID 2: Vote from Process 3 received! Result: true +001947ms: PID 1: Message received; ID: 94; Protocol: 2-Phase Commit +001947ms: PID 1: Vote true sent +003137ms: PID 2: Votes from all participating processes received! + Global result: true +003137ms: PID 2: Message sent; ID: 99; Protocol: 2-Phase Commit + Boolean: isVoteResult=true; voteResult=true +004124ms: PID 1: Global vote result received. Result: true +006051ms: PID 2: All participants have acknowledged the vote +010000ms: Simulation ended +``` + +Protocol variables (server-side): + +* Time until resend (Long: timeout = 2500): Milliseconds to wait before resending +* PIDs of participating processes (Integer[]: pids = [1,3]): Client PIDs that should vote and commit + +Protocol variables (client-side): + +* Commit probability (Integer: ackProb = 50): The probability in percent that the client votes true (for commit) + +### Basic Multicast Protocol + +=> ./distributed-systems-simulator/basic-multicast.png Visualization: The Basic Multicast Protocol with 3 processes. P2 (client) sends periodic multicast messages to servers P1 and P3. P3 crashes at 3000ms (shown in red) and recovers at 6000ms. Red lines indicate lost messages, blue lines show delivered messages. Some messages to P1 are also lost due to the 30% message loss probability. + +The Basic Multicast Protocol is very simple. The client always initiates the request, which represents a simple multicast message. The Basic Multicast servers serve only to receive the message. No acknowledgments are sent. The client P2 sends a multicast message every 2500ms to servers P1 and P3. + +P1 can only receive multicast messages after 2500ms because it does not support the protocol before then. P3 is crashed from 3000ms to 6000ms and also cannot receive messages during that time. Each process has a 30% message loss probability, so some messages are lost in transit (shown in red). + +In this example, the 3rd multicast message to P3 and the 5th and 6th messages to P1 were lost. Only the 4th multicast message reached both destinations. + +``` +Programmed Basic Multicast Events: + +| Time (ms) | PID | Event | +|-----------|-----|----------------------------------------| +| 0 | 2 | Basic Multicast Client activate | +| 0 | 3 | Basic Multicast Server activate | +| 0 | 2 | Basic Multicast Client request start | +| 2500 | 1 | Basic Multicast Server activate | +| 2500 | 2 | Basic Multicast Client request start | +| 3000 | 3 | Process crash | +| 5000 | 2 | Basic Multicast Client request start | +| 6000 | 3 | Process revival | +| 7500 | 2 | Basic Multicast Client request start | +| 10000 | 2 | Basic Multicast Client request start | +| 12500 | 2 | Basic Multicast Client request start | +``` + +### Reliable Multicast Protocol + +=> ./distributed-systems-simulator/reliable-multicast.png Visualization: The Reliable Multicast Protocol with 3 processes. P2 (client) sends multicast messages to servers P1 and P3, retrying until acknowledgments are received from all servers. P3 crashes at 3000ms and recovers at 10000ms. Red lines show lost messages, blue lines show delivered ones. Despite failures, all servers eventually receive and acknowledge the multicast. + +In the Reliable Multicast Protocol, the client periodically resends its multicast message until it has received an acknowledgment from all participating servers. After each retry, the client "forgets" which servers have already acknowledged, so each new attempt must be acknowledged again by all participants. + +In the example, P2 is the client and P1 and P3 are the servers. At 0ms, the client initiates its multicast message. The message loss probability is set to 30% on all processes. The client needs exactly 5 attempts until successful delivery: + +* Attempt 1: P1 doesn't support the protocol yet. P3 receives the message but its ACK is lost. +* Attempt 2: The message to P1 is lost. P3 receives it but is crashed and can't process it. +* Attempt 3: P1 receives the message and ACKs successfully. The message to P3 is lost. +* Attempt 4: P1 receives and ACKs again. P3 receives it but is still crashed. +* Attempt 5: Both P1 and P3 receive the message and ACK successfully. + +``` +Programmed Reliable Multicast Events: + +| Time (ms) | PID | Event | +|-----------|-----|------------------------------------------| +| 0 | 3 | Reliable Multicast Server activate | +| 0 | 2 | Reliable Multicast Client activate | +| 0 | 2 | Reliable Multicast Client request start | +| 2500 | 1 | Reliable Multicast Server activate | +| 3000 | 3 | Process crash | +| 10000 | 3 | Process revival | +``` + +Example log extract: + +``` +000000ms: PID 2: Reliable Multicast Client activated +000000ms: PID 2: Message sent; ID: 280; Protocol: Reliable Multicast + Boolean: isMulticast=true +000000ms: PID 3: Reliable Multicast Server activated +001590ms: PID 3: Message received; ID: 280; Protocol: Reliable Multicast +001590ms: PID 3: ACK sent +002500ms: PID 1: Reliable Multicast Server activated +002500ms: PID 2: Message sent; ID: 282; Protocol: Reliable Multicast + Boolean: isMulticast=true +003000ms: PID 3: Crashed +005000ms: PID 2: Message sent; ID: 283; Protocol: Reliable Multicast +005952ms: PID 1: Message received; ID: 283 +005952ms: PID 1: ACK sent +007937ms: PID 2: ACK from Process 1 received! +... +011813ms: PID 2: ACK from Process 3 received! +011813ms: PID 2: ACKs from all participating processes received! +015000ms: Simulation ended +``` + +Protocol variables (server-side): + +* Time until resend (Long: timeout = 2500): Milliseconds to wait before resending the multicast +* PIDs of participating processes (Integer[]: pids = [1,3]): Server PIDs that should receive the multicast + +Read the next post of this series: + +=> ./2026-04-02-distributed-systems-simulator-part-3.gmi Distributed Systems Simulator - Part 3: Advanced Examples and Protocol API + +Other related posts are: + +<< template::inline::rindex java object-oriented-programming release + +E-Mail your comments to `paul@nospam.buetow.org` + +=> ../ Back to the main site diff --git a/gemfeed/2026-04-02-distributed-systems-simulator-part-3.gmi b/gemfeed/2026-04-02-distributed-systems-simulator-part-3.gmi new file mode 100644 index 00000000..1d549c11 --- /dev/null +++ b/gemfeed/2026-04-02-distributed-systems-simulator-part-3.gmi @@ -0,0 +1,264 @@ +# Distributed Systems Simulator - Part 3: Advanced Examples and Protocol API + +> Published at 2026-04-02T00:00:00+03:00 + +This is the third and final blog post of the Distributed Systems Simulator series. This part covers advanced simulation examples, the Raft consensus protocol, and the extensible Protocol API. + +=> https://codeberg.org/snonux/ds-sim ds-sim on Codeberg (modernized, English-translated version) + +These are all the posts of this series: + +=> ./2026-03-31-distributed-systems-simulator-part-1.gmi 2026-03-31 Distributed Systems Simulator - Part 1: Introduction and GUI +=> ./2026-04-01-distributed-systems-simulator-part-2.gmi 2026-04-01 Distributed Systems Simulator - Part 2: Built-in Protocols +=> ./2026-04-02-distributed-systems-simulator-part-3.gmi 2026-04-02 Distributed Systems Simulator - Part 3: Advanced Examples and Protocol API (You are currently reading this) + +=> ./distributed-systems-simulator/ds-sim-screenshot.png Screenshot: The Distributed Systems Simulator running a Broadcast protocol simulation with 6 processes. The visualization shows message lines between process bars, with blue indicating delivered messages and green indicating messages still in transit. + +## Table of Contents + +* ⇢ Distributed Systems Simulator - Part 3: Advanced Examples and Protocol API +* ⇢ ⇢ Additional Examples +* ⇢ ⇢ ⇢ Lamport and Vector Timestamps +* ⇢ ⇢ ⇢ Simulating Slow Connections +* ⇢ ⇢ ⇢ Raft Consensus Failover +* ⇢ ⇢ Protocol API +* ⇢ ⇢ ⇢ Class Hierarchy +* ⇢ ⇢ ⇢ Implementing a Custom Protocol +* ⇢ ⇢ ⇢ Available API Methods +* ⇢ ⇢ ⇢ Example: Reliable Multicast Implementation +* ⇢ ⇢ Project Statistics + +## Additional Examples + +### Lamport and Vector Timestamps + +=> ./distributed-systems-simulator/lamport-timestamps.png Visualization: Lamport Timestamps displayed on the Berkeley Algorithm simulation. Each event on a process bar shows its Lamport timestamp as a number in parentheses. The timestamps increase monotonically and are updated according to the Lamport clock rules when messages are sent and received between P1, P2, and P3. + +> "For many purposes, it is sufficient that all machines agree on the same time. It is not necessary that this time also agrees with real time, like every hour announced on the radio... For a certain class of algorithms, only the internal consistency of clocks is important." - Andrew Tanenbaum + +Clocks that provide such a time are also known as logical clocks. Two implementations are realized in the simulator: Lamport timestamps and vector timestamps. + +After activating the Lamport time switch in expert mode, the current Lamport timestamp appears at every event of a process. Each process has its own Lamport timestamp that is incremented when a message is sent or received. Each message carries the current Lamport time t_l(i) of the sending process i. When another process j receives this message, its Lamport timestamp t_l(j) is recalculated as: + +``` +t_l(j) := 1 + max(t_l(j), t_l(i)) +``` + +The larger Lamport time of the sender and receiver process is used and then incremented by 1. After the Berkeley simulation shown here, P1 has Lamport timestamp 16, P2 has 14, and P3 has 15. + +=> ./distributed-systems-simulator/vector-timestamps.png Visualization: Vector Timestamps displayed on the same Berkeley Algorithm simulation. Each event shows its vector timestamp as a tuple (v1,v2,v3) representing the known state of all three processes. The tuples grow as processes communicate and merge their knowledge of each other's progress. + +With the active vector time switch, all vector timestamps are displayed. Like the Lamport timestamp, each message includes the current vector timestamp of the sending process. With n participating processes, the vector timestamp v has size n. Each participating process i has its own index, accessible via v(i). When v is the vector timestamp of the receiving process j and w is the vector timestamp of the sending process, the new local vector timestamp of process j is calculated as follows: + +``` +for (i := 0; i < n; i++) { + if (i = j) { + v(i)++; + } else if (v(i) < w(i)) { + v(i) := w(i); + } +} +``` + +By default, the vector timestamp is only incremented when a message is sent or received. In both cases, the sender and receiver each increment their own index in the vector timestamp by 1. Upon receiving a message, the local vector timestamp is then compared with the sender's, and the larger value is taken for all indices. + +After the simulation, P1 has vector timestamp (8,10,6), P2 has (6,10,6), and P3 has (6,10,8). + +The simulation settings include boolean variables "Lamport times affect all events" and "Vector times affect all events" (both default to false). When set to true, all events (not just message send/receive) will update the timestamps. + +### Simulating Slow Connections + +=> ./distributed-systems-simulator/slow-connection.png Visualization: Slow connection simulation comparing Internal Synchronization (P1) and Christian's Method (P3) with P2 as server. P3 has high transmission times (2000-8000ms) simulating a slow network connection. P1 synchronizes to 21446ms (error: -1446ms) while P3 only reaches 16557ms (error: -3443ms), showing how slow connections degrade synchronization quality. + +The simulator can also simulate slow connections to a specific process. This example revisits the comparison of Internal Synchronization (P1) and Christian's Method (P3), with P2 serving both. In this scenario, P3 has a poor network connection, so messages to and from P3 always require a longer transmission time. + +P3's minimum transmission time is set to 2000ms and maximum to 8000ms, while P1 and P2 keep the defaults (500ms/2000ms). The simulation duration is 20000ms. With the "Average transmission times" setting enabled, the effective transmission time for messages involving P3 is: + +``` +1/2 * (rand(500,2000) + rand(2000,8000)) = 1/2 * rand(2500,10000) = rand(1250,5000)ms +``` + +Because P3 starts a new request before receiving the answer to its previous one, and because it always associates server responses with its most recently sent request, its RTT calculations become incorrect on each round, and its local time is poorly synchronized. P1 synchronizes to 21446ms (error: -1446ms) while P3 only reaches 16557ms (error: -3443ms). + +### Raft Consensus Failover + +=> ./distributed-systems-simulator/raft-consensus-failover.png Screenshot: A 60-second Raft simulation with three processes. P1 starts as the initial leader, crashes at 3500ms, later recovers, P2 wins the reelection and remains leader, and P3 crashes later. The blue and red message lines show the continuing heartbeat and acknowledgment traffic during and after failover. + +While modernizing ds-sim, I also added a simplified Raft Consensus example. The simulation is intentionally small: three processes, one initial leader, one crash, a clean reelection, a recovery of the old leader, and then another crash later in the run. This makes it possible to see the most important Raft transitions without being overwhelmed by cluster size. + +The event log tells a very readable story. At `0ms`, `P1` starts as the initial leader in `term 0`. It immediately sends a heartbeat and an `appendEntry` message carrying the log entry `cmd1`. `P2` joins at `100ms`, `P3` at `1700ms`, and both acknowledge the leader's traffic. At that point the cluster is healthy: one leader, two followers, successful heartbeats, and successful log replication. + +At `3500ms`, `P1` crashes. The followers still process the last in-flight messages, but once the election timeout expires, `P2` becomes a candidate and sends a `voteRequest` for `term 1`. `P3` grants that vote, and at `9395ms` the log records the decisive line: + +``` +009395ms: PID: 2; ... Leader elected by majority vote: process 2 (term 1) +``` + +That transition is followed immediately by new heartbeats and a new `appendEntry`, which is exactly what you want to see in a Raft simulation: leadership is not just declared, it is exercised. + +At `12002ms`, the old leader `P1` recovers. Importantly, it does not try to reclaim control. Instead, it receives heartbeats from `P2` and answers with `heartbeatAck` messages, rejoining the cluster as a follower. That is one of the most useful teaching moments in the log, because it makes the term-based leadership model concrete: the recovered node does not become leader again just because it used to be one. + +At `20000ms`, `P3` crashes. The cluster continues running with `P2` as leader and `P1` as follower for the rest of the 60-second simulation. The log remains dominated by periodic heartbeats from `P2` and acknowledgments from `P1`, showing that the system stays stable even after a second failure. + +This single scenario demonstrates several core Raft properties in one replay: + +* Stable startup leadership +* Heartbeats and follower acknowledgments +* Log replication +* Leader failure detection +* Majority-based reelection +* Safe reintegration of a recovered former leader +* Continued service after a later follower crash + +It is also a good example of why a simulator is useful for distributed systems. In a real production system, reconstructing this sort of sequence would require stitching together logs from multiple nodes. Here, the message flow, the crashes, the recoveries, and the Lamport/vector timestamps are all visible in one place. + +## Protocol API + +The simulator was designed from the ground up to be extensible. Users can implement their own protocols in Java by extending the `VSAbstractProtocol` base class. Each protocol has its own class in the `protocols.implementations` package. + +### Class Hierarchy + +``` +VSAbstractEvent + +-- VSAbstractProtocol (base class for all protocols) + +-- VSDummyProtocol + +-- VSPingPongProtocol + +-- VSBroadcastProtocol + +-- VSInternalTimeSyncProtocol + +-- VSExternalTimeSyncProtocol + +-- VSBerkeleyTimeProtocol + +-- VSOnePhaseCommitProtocol + +-- VSTwoPhaseCommitProtocol + +-- VSBasicMulticastProtocol + +-- VSReliableMulticastProtocol +``` + +### Implementing a Custom Protocol + +Each protocol class must implement the following methods: + +* A public constructor: Must specify whether the client or the server initiates requests, using `VSAbstractProtocol.HAS_ON_CLIENT_START` or `VSAbstractProtocol.HAS_ON_SERVER_START`. +* `onClientInit()` / `onServerInit()`: Called once before the protocol is first used. Used to initialize protocol variables and attributes via the VSPrefs methods (e.g. `initVector`, `initLong`). Variables initialized this way appear in the process editor and can be configured by the user. +* `onClientReset()` / `onServerReset()`: Called each time the simulation is reset. +* `onClientStart()` / `onServerStart()`: Called when the client/server initiates a request. Typically creates and sends a `VSMessage` object. +* `onClientRecv(VSMessage)` / `onServerRecv(VSMessage)`: Called when a message arrives. +* `onClientSchedule()` / `onServerSchedule()`: Called when a scheduled alarm fires. +* `toString()`: Optional. Customizes log output for this protocol. + +### Available API Methods + +Methods inherited from `VSAbstractProtocol`: + +* `sendMessage(VSMessage message)`: Sends a protocol message (automatically updates Lamport and Vector timestamps) +* `hasOnServerStart()`: Whether the server or client initiates requests +* `isServer()` / `isClient()`: Whether the current process has the protocol activated as server/client +* `scheduleAt(long time)`: Creates an alarm that fires at the given local process time, triggering `onClientSchedule()` or `onServerSchedule()` +* `removeSchedules()`: Cancels all pending alarms in the current context +* `getNumProcesses()`: Returns the total number of processes in the simulation + +Process methods available via the inherited `process` attribute: + +* `getTime()` / `setTime(long)`: Get/set the local process time +* `getGlobalTime()`: Get the current global simulation time +* `getClockVariance()` / `setClockVariance(float)`: Get/set the clock drift +* `getLamportTime()` / `setLamportTime(long)`: Get/set the Lamport timestamp +* `getVectorTime()` / `updateVectorTime(VSVectorTime)`: Get/update the vector timestamp +* `getProcessID()`: Get the process PID +* `isCrashed()` / `isCrashed(boolean)`: Check or set crash state +* `getRandomPercentage()`: Get a random value between 0 and 100 + +Message methods (`VSMessage`): + +* `new VSMessage()`: Create a new message +* `getMessageID()`: Get the message NID +* `setBoolean(key, value)` / `getBoolean(key)`: Set/get boolean data +* `setInteger(key, value)` / `getInteger(key)`: Set/get integer data +* `setLong(key, value)` / `getLong(key)`: Set/get long data +* `setString(key, value)` / `getString(key)`: Set/get string data +* `getSendingProcess()`: Get a reference to the sending process +* `isServerMessage()`: Whether it's a server or client message + +### Example: Reliable Multicast Implementation + +Here is a condensed example showing key parts of the Reliable Multicast Protocol implementation: + +```java +public class VSReliableMulticastProtocol extends VSAbstractProtocol { + public VSReliableMulticastProtocol() { + // The client initiates requests + super(VSAbstractProtocol.HAS_ON_CLIENT_START); + super.setClassname(super.getClass().toString()); + } + + private ArrayList pids; + + // Initialize protocol variables (editable in the process editor) + public void onClientInit() { + Vector vec = new Vector(); + vec.add(1); vec.add(3); + super.initVector("pids", vec, "PIDs of participating processes"); + super.initLong("timeout", 2500, "Time until resend", "ms"); + } + + // Send multicast to all servers that haven't ACKed yet + public void onClientStart() { + if (pids.size() != 0) { + long timeout = super.getLong("timeout") + process.getTime(); + super.scheduleAt(timeout); + VSMessage message = new VSMessage(); + message.setBoolean("isMulticast", true); + super.sendMessage(message); + } + } + + // Handle ACK from a server + public void onClientRecv(VSMessage recvMessage) { + if (pids.size() != 0 && recvMessage.getBoolean("isAck")) { + Integer pid = recvMessage.getIntegerObj("pid"); + if (pids.contains(pid)) + pids.remove(pid); + super.log("ACK from Process " + pid + " received!"); + if (pids.size() == 0) { + super.log("ACKs from all processes received!"); + super.removeSchedules(); + } + } + } + + // Retry on timeout + public void onClientSchedule() { onClientStart(); } +} +``` + +## Project Statistics + +The original VS-Sim project (August 2008) was written in Java 6 and consisted of: + +* 61 source files across 12 Java packages +* Approximately 15,710 lines of code +* 2.2 MB of generated Javadoc documentation +* 142 KB compiled JAR file +* 10 built-in protocols +* 163 configurable settings + +The modernized successor ds-sim (version 1.1.0) has been updated to Java 21 and translated to English: + +* 146 source files (117 main + 29 test) across 19 Java packages +* Approximately 27,900 lines of code (22,400 main + 5,500 test) +* 12 built-in protocols +* 208 unit tests +* 269 configurable settings + +=> https://codeberg.org/snonux/ds-sim ds-sim source code on Codeberg +=> https://codeberg.org/snonux/vs-sim vs-sim source code on Codeberg (original German version, 2008) + +Other related posts are: + +=> ./2026-03-01-loadbars-0.13.0-released.gmi 2026-03-01 Loadbars 0.13.0 released +=> ./2022-12-24-ultrarelearning-java-my-takeaways.gmi 2022-12-24 (Re)learning Java - My takeaways +=> ./2022-03-06-the-release-of-dtail-4.0.0.gmi 2022-03-06 The release of DTail 4.0.0 +=> ./2016-11-20-object-oriented-programming-with-ansi-c.gmi 2016-11-20 Object oriented programming with ANSI C + +E-Mail your comments to `paul@nospam.buetow.org` + +=> ../ Back to the main site diff --git a/gemfeed/2026-04-02-distributed-systems-simulator-part-3.gmi.tpl b/gemfeed/2026-04-02-distributed-systems-simulator-part-3.gmi.tpl new file mode 100644 index 00000000..b54f6912 --- /dev/null +++ b/gemfeed/2026-04-02-distributed-systems-simulator-part-3.gmi.tpl @@ -0,0 +1,247 @@ +# Distributed Systems Simulator - Part 3: Advanced Examples and Protocol API + +> Published at 2026-04-02T00:00:00+03:00 + +This is the third and final blog post of the Distributed Systems Simulator series. This part covers advanced simulation examples, the Raft consensus protocol, and the extensible Protocol API. + +=> https://codeberg.org/snonux/ds-sim ds-sim on Codeberg (modernized, English-translated version) + +These are all the posts of this series: + +<< template::inline::index distributed-systems-simulator-part + +=> ./distributed-systems-simulator/ds-sim-screenshot.png Screenshot: The Distributed Systems Simulator running a Broadcast protocol simulation with 6 processes. The visualization shows message lines between process bars, with blue indicating delivered messages and green indicating messages still in transit. + +<< template::inline::toc + +## Additional Examples + +### Lamport and Vector Timestamps + +=> ./distributed-systems-simulator/lamport-timestamps.png Visualization: Lamport Timestamps displayed on the Berkeley Algorithm simulation. Each event on a process bar shows its Lamport timestamp as a number in parentheses. The timestamps increase monotonically and are updated according to the Lamport clock rules when messages are sent and received between P1, P2, and P3. + +> "For many purposes, it is sufficient that all machines agree on the same time. It is not necessary that this time also agrees with real time, like every hour announced on the radio... For a certain class of algorithms, only the internal consistency of clocks is important." - Andrew Tanenbaum + +Clocks that provide such a time are also known as logical clocks. Two implementations are realized in the simulator: Lamport timestamps and vector timestamps. + +After activating the Lamport time switch in expert mode, the current Lamport timestamp appears at every event of a process. Each process has its own Lamport timestamp that is incremented when a message is sent or received. Each message carries the current Lamport time t_l(i) of the sending process i. When another process j receives this message, its Lamport timestamp t_l(j) is recalculated as: + +``` +t_l(j) := 1 + max(t_l(j), t_l(i)) +``` + +The larger Lamport time of the sender and receiver process is used and then incremented by 1. After the Berkeley simulation shown here, P1 has Lamport timestamp 16, P2 has 14, and P3 has 15. + +=> ./distributed-systems-simulator/vector-timestamps.png Visualization: Vector Timestamps displayed on the same Berkeley Algorithm simulation. Each event shows its vector timestamp as a tuple (v1,v2,v3) representing the known state of all three processes. The tuples grow as processes communicate and merge their knowledge of each other's progress. + +With the active vector time switch, all vector timestamps are displayed. Like the Lamport timestamp, each message includes the current vector timestamp of the sending process. With n participating processes, the vector timestamp v has size n. Each participating process i has its own index, accessible via v(i). When v is the vector timestamp of the receiving process j and w is the vector timestamp of the sending process, the new local vector timestamp of process j is calculated as follows: + +``` +for (i := 0; i < n; i++) { + if (i = j) { + v(i)++; + } else if (v(i) < w(i)) { + v(i) := w(i); + } +} +``` + +By default, the vector timestamp is only incremented when a message is sent or received. In both cases, the sender and receiver each increment their own index in the vector timestamp by 1. Upon receiving a message, the local vector timestamp is then compared with the sender's, and the larger value is taken for all indices. + +After the simulation, P1 has vector timestamp (8,10,6), P2 has (6,10,6), and P3 has (6,10,8). + +The simulation settings include boolean variables "Lamport times affect all events" and "Vector times affect all events" (both default to false). When set to true, all events (not just message send/receive) will update the timestamps. + +### Simulating Slow Connections + +=> ./distributed-systems-simulator/slow-connection.png Visualization: Slow connection simulation comparing Internal Synchronization (P1) and Christian's Method (P3) with P2 as server. P3 has high transmission times (2000-8000ms) simulating a slow network connection. P1 synchronizes to 21446ms (error: -1446ms) while P3 only reaches 16557ms (error: -3443ms), showing how slow connections degrade synchronization quality. + +The simulator can also simulate slow connections to a specific process. This example revisits the comparison of Internal Synchronization (P1) and Christian's Method (P3), with P2 serving both. In this scenario, P3 has a poor network connection, so messages to and from P3 always require a longer transmission time. + +P3's minimum transmission time is set to 2000ms and maximum to 8000ms, while P1 and P2 keep the defaults (500ms/2000ms). The simulation duration is 20000ms. With the "Average transmission times" setting enabled, the effective transmission time for messages involving P3 is: + +``` +1/2 * (rand(500,2000) + rand(2000,8000)) = 1/2 * rand(2500,10000) = rand(1250,5000)ms +``` + +Because P3 starts a new request before receiving the answer to its previous one, and because it always associates server responses with its most recently sent request, its RTT calculations become incorrect on each round, and its local time is poorly synchronized. P1 synchronizes to 21446ms (error: -1446ms) while P3 only reaches 16557ms (error: -3443ms). + +### Raft Consensus Failover + +=> ./distributed-systems-simulator/raft-consensus-failover.png Screenshot: A 60-second Raft simulation with three processes. P1 starts as the initial leader, crashes at 3500ms, later recovers, P2 wins the reelection and remains leader, and P3 crashes later. The blue and red message lines show the continuing heartbeat and acknowledgment traffic during and after failover. + +While modernizing ds-sim, I also added a simplified Raft Consensus example. The simulation is intentionally small: three processes, one initial leader, one crash, a clean reelection, a recovery of the old leader, and then another crash later in the run. This makes it possible to see the most important Raft transitions without being overwhelmed by cluster size. + +The event log tells a very readable story. At `0ms`, `P1` starts as the initial leader in `term 0`. It immediately sends a heartbeat and an `appendEntry` message carrying the log entry `cmd1`. `P2` joins at `100ms`, `P3` at `1700ms`, and both acknowledge the leader's traffic. At that point the cluster is healthy: one leader, two followers, successful heartbeats, and successful log replication. + +At `3500ms`, `P1` crashes. The followers still process the last in-flight messages, but once the election timeout expires, `P2` becomes a candidate and sends a `voteRequest` for `term 1`. `P3` grants that vote, and at `9395ms` the log records the decisive line: + +``` +009395ms: PID: 2; ... Leader elected by majority vote: process 2 (term 1) +``` + +That transition is followed immediately by new heartbeats and a new `appendEntry`, which is exactly what you want to see in a Raft simulation: leadership is not just declared, it is exercised. + +At `12002ms`, the old leader `P1` recovers. Importantly, it does not try to reclaim control. Instead, it receives heartbeats from `P2` and answers with `heartbeatAck` messages, rejoining the cluster as a follower. That is one of the most useful teaching moments in the log, because it makes the term-based leadership model concrete: the recovered node does not become leader again just because it used to be one. + +At `20000ms`, `P3` crashes. The cluster continues running with `P2` as leader and `P1` as follower for the rest of the 60-second simulation. The log remains dominated by periodic heartbeats from `P2` and acknowledgments from `P1`, showing that the system stays stable even after a second failure. + +This single scenario demonstrates several core Raft properties in one replay: + +* Stable startup leadership +* Heartbeats and follower acknowledgments +* Log replication +* Leader failure detection +* Majority-based reelection +* Safe reintegration of a recovered former leader +* Continued service after a later follower crash + +It is also a good example of why a simulator is useful for distributed systems. In a real production system, reconstructing this sort of sequence would require stitching together logs from multiple nodes. Here, the message flow, the crashes, the recoveries, and the Lamport/vector timestamps are all visible in one place. + +## Protocol API + +The simulator was designed from the ground up to be extensible. Users can implement their own protocols in Java by extending the `VSAbstractProtocol` base class. Each protocol has its own class in the `protocols.implementations` package. + +### Class Hierarchy + +``` +VSAbstractEvent + +-- VSAbstractProtocol (base class for all protocols) + +-- VSDummyProtocol + +-- VSPingPongProtocol + +-- VSBroadcastProtocol + +-- VSInternalTimeSyncProtocol + +-- VSExternalTimeSyncProtocol + +-- VSBerkeleyTimeProtocol + +-- VSOnePhaseCommitProtocol + +-- VSTwoPhaseCommitProtocol + +-- VSBasicMulticastProtocol + +-- VSReliableMulticastProtocol +``` + +### Implementing a Custom Protocol + +Each protocol class must implement the following methods: + +* A public constructor: Must specify whether the client or the server initiates requests, using `VSAbstractProtocol.HAS_ON_CLIENT_START` or `VSAbstractProtocol.HAS_ON_SERVER_START`. +* `onClientInit()` / `onServerInit()`: Called once before the protocol is first used. Used to initialize protocol variables and attributes via the VSPrefs methods (e.g. `initVector`, `initLong`). Variables initialized this way appear in the process editor and can be configured by the user. +* `onClientReset()` / `onServerReset()`: Called each time the simulation is reset. +* `onClientStart()` / `onServerStart()`: Called when the client/server initiates a request. Typically creates and sends a `VSMessage` object. +* `onClientRecv(VSMessage)` / `onServerRecv(VSMessage)`: Called when a message arrives. +* `onClientSchedule()` / `onServerSchedule()`: Called when a scheduled alarm fires. +* `toString()`: Optional. Customizes log output for this protocol. + +### Available API Methods + +Methods inherited from `VSAbstractProtocol`: + +* `sendMessage(VSMessage message)`: Sends a protocol message (automatically updates Lamport and Vector timestamps) +* `hasOnServerStart()`: Whether the server or client initiates requests +* `isServer()` / `isClient()`: Whether the current process has the protocol activated as server/client +* `scheduleAt(long time)`: Creates an alarm that fires at the given local process time, triggering `onClientSchedule()` or `onServerSchedule()` +* `removeSchedules()`: Cancels all pending alarms in the current context +* `getNumProcesses()`: Returns the total number of processes in the simulation + +Process methods available via the inherited `process` attribute: + +* `getTime()` / `setTime(long)`: Get/set the local process time +* `getGlobalTime()`: Get the current global simulation time +* `getClockVariance()` / `setClockVariance(float)`: Get/set the clock drift +* `getLamportTime()` / `setLamportTime(long)`: Get/set the Lamport timestamp +* `getVectorTime()` / `updateVectorTime(VSVectorTime)`: Get/update the vector timestamp +* `getProcessID()`: Get the process PID +* `isCrashed()` / `isCrashed(boolean)`: Check or set crash state +* `getRandomPercentage()`: Get a random value between 0 and 100 + +Message methods (`VSMessage`): + +* `new VSMessage()`: Create a new message +* `getMessageID()`: Get the message NID +* `setBoolean(key, value)` / `getBoolean(key)`: Set/get boolean data +* `setInteger(key, value)` / `getInteger(key)`: Set/get integer data +* `setLong(key, value)` / `getLong(key)`: Set/get long data +* `setString(key, value)` / `getString(key)`: Set/get string data +* `getSendingProcess()`: Get a reference to the sending process +* `isServerMessage()`: Whether it's a server or client message + +### Example: Reliable Multicast Implementation + +Here is a condensed example showing key parts of the Reliable Multicast Protocol implementation: + +```java +public class VSReliableMulticastProtocol extends VSAbstractProtocol { + public VSReliableMulticastProtocol() { + // The client initiates requests + super(VSAbstractProtocol.HAS_ON_CLIENT_START); + super.setClassname(super.getClass().toString()); + } + + private ArrayList pids; + + // Initialize protocol variables (editable in the process editor) + public void onClientInit() { + Vector vec = new Vector(); + vec.add(1); vec.add(3); + super.initVector("pids", vec, "PIDs of participating processes"); + super.initLong("timeout", 2500, "Time until resend", "ms"); + } + + // Send multicast to all servers that haven't ACKed yet + public void onClientStart() { + if (pids.size() != 0) { + long timeout = super.getLong("timeout") + process.getTime(); + super.scheduleAt(timeout); + VSMessage message = new VSMessage(); + message.setBoolean("isMulticast", true); + super.sendMessage(message); + } + } + + // Handle ACK from a server + public void onClientRecv(VSMessage recvMessage) { + if (pids.size() != 0 && recvMessage.getBoolean("isAck")) { + Integer pid = recvMessage.getIntegerObj("pid"); + if (pids.contains(pid)) + pids.remove(pid); + super.log("ACK from Process " + pid + " received!"); + if (pids.size() == 0) { + super.log("ACKs from all processes received!"); + super.removeSchedules(); + } + } + } + + // Retry on timeout + public void onClientSchedule() { onClientStart(); } +} +``` + +## Project Statistics + +The original VS-Sim project (August 2008) was written in Java 6 and consisted of: + +* 61 source files across 12 Java packages +* Approximately 15,710 lines of code +* 2.2 MB of generated Javadoc documentation +* 142 KB compiled JAR file +* 10 built-in protocols +* 163 configurable settings + +The modernized successor ds-sim (version 1.1.0) has been updated to Java 21 and translated to English: + +* 146 source files (117 main + 29 test) across 19 Java packages +* Approximately 27,900 lines of code (22,400 main + 5,500 test) +* 12 built-in protocols +* 208 unit tests +* 269 configurable settings + +=> https://codeberg.org/snonux/ds-sim ds-sim source code on Codeberg +=> https://codeberg.org/snonux/vs-sim vs-sim source code on Codeberg (original German version, 2008) + +Other related posts are: + +<< template::inline::rindex java object-oriented-programming release + +E-Mail your comments to `paul@nospam.buetow.org` + +=> ../ Back to the main site diff --git a/gemfeed/DRAFT-distributed-systems-simulator.gmi b/gemfeed/DRAFT-distributed-systems-simulator.gmi deleted file mode 100644 index b05f64c0..00000000 --- a/gemfeed/DRAFT-distributed-systems-simulator.gmi +++ /dev/null @@ -1,851 +0,0 @@ -# Distributed Systems Simulator v1.1.0 released - -> DRAFT - Not yet published - -This blog explores the Java-based Distributed Systems Simulator program I created as my diploma thesis at the Aachen University of Applied Sciences (August 2008). The simulator offers both built-in implementations of common distributed systems algorithms and an extensible framework that allows researchers and practitioners to implement and test their own custom protocols within the simulation environment. - -=> https://codeberg.org/snonux/ds-sim ds-sim on Codeberg (modernized, English-translated version) - -=> ./distributed-systems-simulator/ds-sim-screenshot.png Screenshot: The Distributed Systems Simulator running a Broadcast protocol simulation with 6 processes. The visualization shows message lines between process bars, with blue indicating delivered messages and green indicating messages still in transit. - -## Table of Contents - -* ⇢ Distributed Systems Simulator v1.1.0 released -* ⇢ ⇢ Motivation -* ⇢ ⇢ Installation -* ⇢ ⇢ Fundamentals -* ⇢ ⇢ ⇢ Client/Server Model -* ⇢ ⇢ ⇢ Processes and Their Roles -* ⇢ ⇢ ⇢ Messages -* ⇢ ⇢ ⇢ Local and Global Clocks -* ⇢ ⇢ ⇢ Events -* ⇢ ⇢ ⇢ Protocols -* ⇢ ⇢ Graphical User Interface (GUI) -* ⇢ ⇢ ⇢ Simple Mode -* ⇢ ⇢ ⇢ The Menu Bar -* ⇢ ⇢ ⇢ The Toolbar -* ⇢ ⇢ ⇢ The Visualization -* ⇢ ⇢ ⇢ Color Differentiation -* ⇢ ⇢ ⇢ The Sidebar -* ⇢ ⇢ ⇢ The Log Window -* ⇢ ⇢ ⇢ Expert Mode -* ⇢ ⇢ ⇢ Configuration Settings -* ⇢ ⇢ Protocols and Examples -* ⇢ ⇢ ⇢ Dummy Protocol -* ⇢ ⇢ ⇢ Ping-Pong Protocol -* ⇢ ⇢ ⇢ Broadcast Protocol -* ⇢ ⇢ ⇢ Internal Synchronization Protocol -* ⇢ ⇢ ⇢ Christian's Method (External Synchronization) -* ⇢ ⇢ ⇢ Berkeley Algorithm -* ⇢ ⇢ ⇢ One-Phase Commit Protocol -* ⇢ ⇢ ⇢ Two-Phase Commit Protocol -* ⇢ ⇢ ⇢ Basic Multicast Protocol -* ⇢ ⇢ ⇢ Reliable Multicast Protocol -* ⇢ ⇢ Additional Examples -* ⇢ ⇢ ⇢ Lamport and Vector Timestamps -* ⇢ ⇢ ⇢ Simulating Slow Connections -* ⇢ ⇢ ⇢ Raft Consensus Failover -* ⇢ ⇢ Protocol API -* ⇢ ⇢ ⇢ Class Hierarchy -* ⇢ ⇢ ⇢ Implementing a Custom Protocol -* ⇢ ⇢ ⇢ Available API Methods -* ⇢ ⇢ ⇢ Example: Reliable Multicast Implementation -* ⇢ ⇢ Project Statistics - -## Motivation - -Distributed systems are complex—interactions between nodes, network partitions, failure scenarios are hard to debug in production. A simulator lets you experiment with architectures, observe how systems behave under failure, and learn consensus algorithms, replication strategies, and fault tolerance in a controlled, repeatable environment. No operational overhead, no real infrastructure—just focused exploration of system design. - -In the literature, one can find many different definitions of a distributed system. Many of these definitions differ from each other, making it difficult to find a single definition that stands alone as the correct one. Andrew Tanenbaum and Maarten van Steen chose the following loose characterization for describing a distributed system: - -> "A distributed system is a collection of independent computers that appears to its users as a single coherent system" - Andrew Tanenbaum - -The user only needs to interact with the local computer in front of them, while the software of the local computer ensures smooth communication with the other participating computers in the distributed system. - -This thesis aims to make distributed systems easier to understand from a different angle. Instead of the end-user perspective, it focuses on the functional methods of protocols and their processes, making all relevant events of a distributed system transparent. - -To achieve this, I developed a simulator, particularly for teaching and learning at the University of Applied Sciences Aachen. Protocols from distributed systems with their most important influencing factors can be replicated through simulations. At the same time, there's room for personal experiments—no restriction to a fixed number of protocols. Users can design their own. - -The original simulator (VS-Sim) was written in Java 6 in 2008 with a German-language UI. In 2025, I revamped and modernized it as ds-sim: translated the entire codebase and UI from German to English, migrated the build system from hand-rolled Ant scripts to Maven, upgraded from Java 6 to Java 21 (adopting sealed class hierarchies, record types, formatted strings, pattern matching), introduced a proper exception hierarchy and consistent error handling, added comprehensive Javadoc documentation, implemented a headless testing framework (208 unit tests covering core components, the event system, and all protocol implementations), reorganized the project structure to follow standard Maven conventions, and added architecture documentation. Total: 199 files, over 15,000 lines of new code. Back in 2008, I wrote every line by hand in Vim. For the 2025 modernization, Claude Code did most of the heavy lifting—translation, refactoring, test generation, documentation. Times have changed. - -## Installation - -The modernized ds-sim requires Java 21 or higher and Maven 3.8 or higher. - -``` -# Clone the repository -git clone https://codeberg.org/snonux/ds-sim.git -cd ds-sim - -# Set JAVA_HOME if needed (e.g. on Fedora Linux) -export JAVA_HOME=/usr/lib/jvm/java-21-openjdk - -# Build the project -mvn clean package - -# Run the simulator -java -jar target/ds-sim-*.jar -``` - -For a faster development build without running tests: - -``` -mvn package -DskipTests -``` - -After building, the following artifacts are available in the `target/` directory: - -* `ds-sim-1.1.0.jar` - Executable JAR with all dependencies bundled -* `original-ds-sim-1.1.0.jar` - JAR without dependencies - -The project also includes 208 unit tests that can be run with `mvn test`. Example simulation files for all built-in protocols are included in the `saved-simulations/` directory. - -=> https://codeberg.org/snonux/ds-sim ds-sim source code on Codeberg - -## Fundamentals - -For basic understanding, some fundamentals are explained below. A deeper exploration will follow in later chapters. - -### Client/Server Model - -``` -+-----------------------------------------+ -| | -| +--------+ +--------+ | -| | Client |<------->| Server | | -| +--------+ +--------+ | -| | -| Sending of Messages | -| | -+-----------------------------------------+ - -Figure 1.1: Client/Server Model -``` - -The simulator is based on the client/server principle. Each simulation typically consists of a participating client and a server that communicate with each other via messages (see Fig. 1.1). In complex simulations, multiple clients and/or servers can also participate. - -### Processes and Their Roles - -A distributed system is simulated using processes. Each process takes on one or more roles. For example, one process can take on the role of a client and another process the role of a server. The possibility of assigning both client and server roles to a process simultaneously is also provided. A process could also take on the roles of multiple servers and clients simultaneously. To identify a process, each one has a unique Process Identification Number (PID). - -### Messages - -In a distributed system, it must be possible to send messages. A message can be sent by a client or server process and can have any number of recipients. The content of a message depends on the protocol used. What is meant by a protocol will be covered later. To identify a message, each message has a unique Message Identification Number (NID). - -### Local and Global Clocks - -In a simulation, there is exactly one global clock. It represents the current and always correct time. A global clock never goes wrong. - -Additionally, each participating process has its own local clock. It represents the current time of the respective process. Unlike the global clock, local clocks can display an incorrect time. If the process time is not globally correct (not equal to the global time, or displays an incorrect time), then it was either reset during a simulation, or it is running incorrectly due to clock drift. The clock drift indicates by what factor the clock is running incorrectly. This will be discussed in more detail later. - -``` -+---------------------+ +---------------------+ -| Process 1 | | Process 2 | -| | | | -| +-----------------+ | | +-----------------+ | -| |Server Protocol A| | | |Client Protocol A| | -| +-----------------+ | | +-----------------+ | -| | | | -| +-----------------+ | +---------------------+ -| |Client Protocol B| | -| +-----------------+ | +---------------------+ -| | | Process 3 | -+---------------------+ | | - | +-----------------+ | - | |Server Protocol B| | - | +-----------------+ | - | | - +---------------------+ - -Figure 1.2: Client/Server Protocols -``` - -In addition to normal clocks, vector timestamps and Lamport's logical clocks are also of interest. For vector and Lamport times, there are no global equivalents here, unlike normal time. Concrete examples of Lamport and vector times will be covered later in the "Additional Examples" section. - -### Events - -A simulation consists of the sequential execution of finitely many events. For example, there can be an event that causes a process to send a message. A process crash event would also be conceivable. Each event occurs at a specific point in time. Events with the same occurrence time are executed directly one after another by the simulator. However, this does not hinder the simulator's users, as events are executed in parallel from their perspective. - -Two main types of events are distinguished: programmable events and non-programmable events. Programmable events can be programmed and edited in the event editor, and their occurrence times depend on the local process clocks or the global clock. Non-programmable events, on the other hand, cannot be programmed in the event editor and do not occur because of a specific time, but due to other circumstances such as: - -* Message receive events: Triggered when a message arrives at a recipient process -* Protocol schedule events (alarms): Triggered by a timer set by a protocol, e.g. for retransmission timeouts -* Random events: Such as random process crashes based on configured crash probability - -### Protocols - -A simulation also consists of the application of protocols. It has already been mentioned that a process can take on the roles of servers and/or clients. For each server and client role, the associated protocol must also be specified. A protocol defines how a client and a server send messages, and how they react when a message arrives. A protocol also determines what data is contained in a message. A process only processes a received message if it understands the respective protocol. - -In Figure 1.2, 3 processes are shown. Process 1 supports protocol "A" on the server side and protocol "B" on the client side. Process 2 supports protocol "A" on the client side and Process 3 supports protocol "B" on the server side. This means that Process 1 can communicate with Process 2 via protocol "A" and with Process 3 via protocol "B". Processes 2 and 3 are incompatible with each other and cannot process messages received from each other. - -Clients cannot communicate with clients, and servers cannot communicate with servers. For communication, at least one client and one server are always required. However, this restriction can be circumvented by having processes support a given protocol on both the server and client sides (see Broadcast Protocol later). - -## Graphical User Interface (GUI) - -### Simple Mode - -=> ./distributed-systems-simulator/ds-sim-screenshot2.png Screenshot: The simulator showing the settings dialog. The visualization area displays process bars with message lines between them. The settings window allows configuring simulation parameters like number of processes, simulation duration, clock drift, message loss probability, and more. - -The simulator requires JDK 21 and can be started with the command `java -jar target/ds-sim-VERSION.jar` - -The simulator then presents itself with a main window. To create a new simulation, select "New Simulation" from the "File" menu, after which the settings window for the new simulation appears. The individual options will be discussed in more detail later, and for now, only the default settings will be used. - -By default, the simulator starts in "simple mode". There is also an "expert mode", which will be discussed later. - -### The Menu Bar - -In the File menu, you can create new simulations or close the currently open simulation. New simulations open by default in a new tab. However, you can also open or close new simulation windows that have their own tabs. Each tab contains a simulation that is completely independent from the others. This allows any number of simulations to be run in parallel. The menu items "Open", "Save" and "Save As" are used for loading and saving simulations. - -Through the Edit menu, users can access the simulation settings, which will be discussed in more detail later. This menu also lists all participating processes for editing. If the user selects a process there, the corresponding process editor opens. The Simulator menu offers the same options as the toolbar, which is described in the next section. - -Some menu items are only accessible when a simulation has already been created or loaded in the current window. - -### The Toolbar - -The toolbar is located at the top left of the simulator. The toolbar contains the functions most frequently needed by users. The toolbar offers four different functions: - -* Reset simulation: can only be activated when the simulation has been paused or has finished -* Repeat simulation: cannot be activated if the simulation has not yet been started -* Pause simulation: can only be activated when the simulation is currently running -* Start simulation: can only be activated when the simulation is not currently running and has not yet finished - -### The Visualization - -The graphical simulation visualization is located in the center right. The X-axis shows the time in milliseconds, and all participating processes are listed on the Y-axis. The demo simulation ends after exactly 15 seconds. The visualization shows processes (with PIDs 1, 2, and 3), each with its own horizontal black bar. On these process bars, users can read the respective local process time. The vertical red line represents the global simulation time. - -The process bars also serve as start and end points for messages. For example, if Process 1 sends a message to Process 2, a line is drawn from one process bar to the other. Messages that a process sends to itself are not visualized but are logged in the log window (more on this later). - -Another way to open a process editor is to left-click on the process bar belonging to the process. A right-click, on the other hand, opens a popup window with additional options. A process can only be forced to crash or be revived via the popup menu during a running simulation. - -In general, the number of processes can vary as desired. The simulation duration is at least 5 and at most 120 seconds. The simulation only ends when the global time reaches the specified simulation end time (here 15 seconds), not when a local process time reaches this end time. - -### Color Differentiation - -Colors help to better interpret the processes of a simulation. By default, processes (process bars) and messages are displayed with the following colors (these are only the default colors, which can be changed via the settings): - -``` -Process Colors: - Black - The simulation is not currently running - Green - The process is running normally - Orange - The mouse is over the process bar - Red - The process has crashed - -Message Colors: - Green - The message is still in transit - Blue - The message has successfully reached its destination - Red - The message was lost -``` - -### The Sidebar - -The sidebar is used to program process events. At the top, the process to be managed is selected (here with PID 1). In this process selection, there is also the option to select "All Processes", which displays all programmed events of all processes simultaneously. "Local events" are those events that occur when a certain local time of the associated process has been reached. The event table below lists all programmed events along with their occurrence times and PIDs. - -To create a new event, the user can either right-click on a process bar and select "Insert local event", or select an event below the event table, enter the event occurrence time in the text field below, and click "Apply". - -Right-clicking on the event editor allows you to either copy or delete all selected events. Using the Ctrl key, multiple events can be selected simultaneously. The entries in the Time and PID columns can be edited afterwards. This provides a convenient way to move already programmed events to a different time or assign them to a different process. However, users should ensure that they press the Enter key after changing the event occurrence time, otherwise the change will be ineffective. - -In addition to the Events tab, the sidebar has another tab called "Variables". Behind this tab is the process editor of the currently selected process. There, all variables of the process can be edited, providing another way to access a process editor. - -### The Log Window - -The log window (at the bottom) logs all occurring events in chronological order. At the beginning of each log entry, the global time in milliseconds is always logged. For each process, its local times as well as the Lamport and vector timestamps are also listed. After the time information, additional details are provided, such as which message was sent with what content and which protocol it belongs to. This will be demonstrated later with examples. - -``` -000000ms: New Simulation -000000ms: New Process; PID: 1; Local Time: 000000ms; Lamport time: 0; Vector time: (0,0,0) -000000ms: New Process; PID: 2; Local Time: 000000ms; Lamport time: 0; Vector time: (0,0,0) -000000ms: New Process; PID: 3; Local Time: 000000ms; Lamport time: 0; Vector time: (0,0,0) -``` - -By deactivating the logging switch, message logging can be temporarily disabled. With logging deactivated, no new messages are written to the log window. After reactivating the switch, all omitted messages are subsequently written to the window. Deactivated logging can lead to improved simulator performance. - -### Expert Mode - -=> ./distributed-systems-simulator/ds-sim-screenshot.png Screenshot: The Distributed Systems Simulator in expert mode, showing a Broadcast protocol simulation with 6 processes. The visualization shows message lines between process bars, with blue indicating delivered messages and green indicating messages still in transit. - -The simulator can be operated in two different modes: simple mode and expert mode. The simulator starts in simple mode by default, so users don't have to deal with the simulator's full functionality all at once. Simple mode is clearer but offers fewer functions. Expert mode is more suitable for experienced users and accordingly offers more flexibility. Expert mode can be activated or deactivated via the switch of the same name below the log window or via the simulation settings. - -In expert mode, the following additional features become available: - -* Global events: In addition to local events, global events can now also be edited. Global events are triggered when a specific global simulation time is reached, rather than a local process time. This only makes a difference when local process times differ from the global time (e.g. due to clock drift). -* Direct PID selection: The user can directly select the associated PID when programming a new event. -* Lamport and Vector time switches: If the user activates one of these two switches, the Lamport or vector timestamps are displayed in the visualization. Only one can be active at a time to maintain clarity. -* Anti-aliasing switch: Allows the user to activate or deactivate anti-aliasing for smoother graphics. Disabled by default for performance reasons. -* Log filter: A regular expression filter (Java syntax) that makes it possible to filter only the essential data from the logs. For example, `"PID: (1|2)"` shows only log lines containing "PID: 1" or "PID: 2". The filter can be activated retroactively and during a running simulation. - -### Configuration Settings - -The simulation settings window allows configuring many aspects of the simulation. Key settings include: - -* Processes receive own messages (default: false): Whether processes can receive messages they sent to themselves. -* Average message loss probabilities (default: true): Whether to average the loss probabilities of sender and receiver processes. -* Average transmission times (default: true): Whether to average the transmission times of sender and receiver processes. -* Show only relevant messages (default: true): Hides messages sent to processes that don't support the protocol. -* Expert mode (default: false): Enables expert mode features. -* Simulation speed (default: 0.5): The playback speed factor. A value of 1 means real-time, 0.5 means half speed. -* Number of processes (default: 3): Can also be changed during simulation via right-click. -* Simulation duration (default: 15s): Between 5 and 120 seconds. - -Each process also has individual settings: - -* Clock drift (default: 0.0): By what factor the local clock deviates. A value of 0.0 means no deviation. A value of 1.0 means double speed. Values > -1.0 are allowed. -* Random crash probability (default: 0%): Probability that the process crashes randomly during the simulation. -* Message loss probability (default: 0%): Probability that a message sent by this process is lost in transit. -* Min/Max transmission time (default: 500ms/2000ms): The range for random message delivery times. - -## Protocols and Examples - -The simulator comes with 10 built-in protocols. As described earlier, protocols are distinguished between server-side and client-side. Servers can respond to client messages, and clients can respond to server messages. Each process can support any number of protocols on both the client and server side. Users can also implement their own protocols using the simulator's Protocol API (see the Protocol API section). - -The program directory contains a `saved-simulations` folder with example simulations for each protocol as serialized `.dat` files. - -### Dummy Protocol - -The Dummy Protocol serves only as a template for creating custom protocols. When using the Dummy Protocol, only log messages are output when events occur. No further actions are performed. - -### Ping-Pong Protocol - -=> ./distributed-systems-simulator/ping-pong.png Visualization: The Ping-Pong Protocol showing two processes (P1 and P2) exchanging messages in a continuous back-and-forth pattern. Blue lines represent delivered messages bouncing between the process bars over a 15-second simulation. - -In the Ping-Pong Protocol, two processes -- Client P1 and Server P2 -- constantly send messages back and forth. The Ping-Pong client starts the first request, to which the server responds to the client. The client then responds again, and so on. Each message includes a counter that is incremented at each station and logged in the log window. - -``` -Programmed Ping-Pong Events: - -| Time (ms) | PID | Event | -|-----------|-----|--------------------------------| -| 0 | 1 | Ping-Pong Client activate | -| 0 | 2 | Ping-Pong Server activate | -| 0 | 1 | Ping-Pong Client request start | -``` - -It is important that Process 1 activates its Ping-Pong client before starting a Ping-Pong client request. Before a process can start a request, it must have the corresponding protocol activated. This also applies to all other protocols. - -**Ping-Pong Storm Variant** - -=> ./distributed-systems-simulator/ping-pong-storm.png Visualization: The Ping-Pong Storm variant with three processes. P1 is the client, P2 and P3 are both servers. The visualization shows an exponentially growing number of messages as each client message generates two server responses, creating a dense web of blue and green message lines. - -By adding a third process P3 as an additional Ping-Pong server, a Ping-Pong "Storm" can be realized. Since every client message now receives two server responses, the number of messages doubles with each round, creating an exponential message flood. - -``` -Programmed Ping-Pong Storm Events: - -| Time (ms) | PID | Event | -|-----------|-----|--------------------------------| -| 0 | 1 | Ping-Pong Client activate | -| 0 | 2 | Ping-Pong Server activate | -| 0 | 3 | Ping-Pong Server activate | -| 0 | 1 | Ping-Pong Client request start | -``` - -### Broadcast Protocol - -=> ./distributed-systems-simulator/broadcast.png Visualization: The Broadcast Protocol with 6 processes (P1-P6). Dense crossing message lines show how a broadcast from P1 propagates to all processes, with each process re-broadcasting to others. Blue lines indicate delivered messages, green lines indicate messages still in transit. - -The Broadcast Protocol behaves similarly to the Ping-Pong Protocol. The difference is that the protocol tracks -- using a unique Broadcast ID -- which messages have already been sent. Each process re-broadcasts all received messages to others, provided it has not already sent them. - -In this case, no distinction is made between client and server, so that the same action is performed when a message arrives at either side. This makes it possible, using multiple processes, to create a broadcast. P1 is the client and starts a request at 0ms and 2500ms. The simulation duration is exactly 5000ms. Since a client can only receive server messages and a server can only receive client messages, every process in this simulation is both server and client. - -``` -Programmed Broadcast Events: - -| Time (ms) | PID | Event | -|-----------|-----|----------------------------------| -| 0 | 1-6 | Broadcast Client activate | -| 0 | 1-6 | Broadcast Server activate | -| 0 | 1 | Broadcast Client request start | -| 2500 | 1 | Broadcast Client request start | -``` - -### Internal Synchronization Protocol - -=> ./distributed-systems-simulator/int-sync.png Visualization: Internal Synchronization with 2 processes. P1 (client, clock drift 0.1) shows a faster-running clock reaching 15976ms by simulation end. The blue message lines show P1 periodically synchronizing with P2 (server, no drift), with the time corrections visible as slight adjustments in P1's timeline. - -The Internal Synchronization Protocol is used for synchronizing the local process time, which can be applied when a process time is running incorrectly due to clock drift. When the client wants to synchronize its (incorrect) local process time t_c with a server, it sends a client request. The server responds with its own local process time t_s, allowing the client to calculate a new, more accurate time for itself. - -After receiving the server response, the client P1 calculates its new local process time as: - -``` -t_c := t_s + 1/2 * (t'_min + t'_max) -``` - -This synchronizes P1's local time with an error of less than 1/2 * (t'_max - t'_min), where t'_min and t'_max are the assumed minimum and maximum transmission times configured in the protocol settings. - -In the example, the client process has a clock drift of 0.1 and the server has 0.0. The client starts a request at local process times 0ms, 5000ms, and 10000ms. By simulation end, P1's time is synchronized to 15976ms (an error of -976ms from the global 15000ms). - -``` -Programmed Internal Sync Events: - -| Time (ms) | PID | Event | -|-----------|-----|------------------------------------| -| 0 | 1 | Internal Sync Client activate | -| 0 | 2 | Internal Sync Server activate | -| 0 | 1 | Internal Sync Client request start | -| 5000 | 1 | Internal Sync Client request start | -| 10000 | 1 | Internal Sync Client request start | -``` - -Protocol variables (client-side): - -* Min. transmission time (Long: 500): The assumed t'_min in milliseconds -* Max. transmission time (Long: 2000): The assumed t'_max in milliseconds - -These can differ from the actual message transmission times t_min and t_max, allowing simulation of scenarios where the protocol is misconfigured and large synchronization errors occur. - -### Christian's Method (External Synchronization) - -=> ./distributed-systems-simulator/christians.png Visualization: Comparison of Internal Synchronization (P1) and Christian's Method (P3) with P2 as shared server. Both P1 and P3 have clock drift 0.1. The visualization shows P1 synchronized to 14567ms (error: -433ms) while P3 synchronized to 15539ms (error: -539ms), demonstrating the different accuracy of the two methods. - -Christian's Method uses the RTT (Round Trip Time) to approximate the transmission time of individual messages. When the client wants to synchronize its local time t_c with a server, it sends a request and measures the RTT t_rtt until the server response arrives. The server response contains the local process time t_s from the moment the server sent the response. The client then calculates its new local time as: - -``` -t_c := t_s + 1/2 * t_rtt -``` - -The accuracy is +/- (1/2 * t_rtt - u_min) where u_min is a lower bound for message transmission time. - -The visualization compares both synchronization methods side by side: P1 uses Internal Synchronization and P3 uses Christian's Method, with P2 serving both. Both P1 and P3 have clock drift 0.1. In this particular run, Internal Synchronization achieved a better result (-433ms error vs. -539ms), though results vary between runs due to random transmission times. - -``` -Programmed Comparison Events: - -| Time (ms) | PID | Event | -|-----------|-----|--------------------------------------| -| 0 | 1 | Internal Sync Client activate | -| 0 | 1 | Internal Sync Client request start | -| 0 | 2 | Christian's Server activate | -| 0 | 2 | Internal Sync Server activate | -| 0 | 3 | Christian's Client activate | -| 0 | 3 | Christian's Client request start | -| 5000 | 1 | Internal Sync Client request start | -| 5000 | 3 | Christian's Client request start | -| 10000 | 1 | Internal Sync Client request start | -| 10000 | 3 | Christian's Client request start | -``` - -### Berkeley Algorithm - -=> ./distributed-systems-simulator/berkeley.png Visualization: The Berkeley Algorithm with 3 processes. P2 is the server (coordinator) sending time requests to clients P1 and P3. After collecting responses, P2 calculates correction values and sends them back. Final times show P1=16823ms, P2=14434ms, P3=13892ms -- all brought closer together through averaging. - -The Berkeley Algorithm is another method for synchronizing local clocks. This is the first protocol where the server initiates the requests. The server acts as a coordinator. The client processes are passive and must wait until a server request arrives. The server must know which client processes participate in the protocol, which is configured in the server's protocol settings. - -When the server wants to synchronize its local time t_s and the process times t_i of the clients (i = 1,...,n), it sends a server request. n is the number of participating clients. The clients then send their local process times back to the server. The server measures the RTTs r_i for all client responses. - -After all responses are received, the server sets its own time to the average t_avg of all known process times (including its own). The transmission time of a client response is estimated as half the RTT: - -``` -t_avg := 1/(n+1) * (t_s + SUM(r_i/2 + t_i)) -t_s := t_avg -``` - -The server then calculates a correction value k_i := t_avg - t_i for each client and sends it back. Each client sets its new time to t'_i := t'_i + k_i. - -``` -Programmed Berkeley Events: - -| Time (ms) | PID | Event | -|-----------|-----|-----------------------------------| -| 0 | 1 | Berkeley Client activate | -| 0 | 2 | Berkeley Server activate | -| 0 | 3 | Berkeley Client activate | -| 0 | 2 | Berkeley Server request start | -| 7500 | 2 | Berkeley Server request start | -``` - -Protocol variables (server-side): - -* PIDs of participating processes (Integer[]: [1,3]): The PIDs of the Berkeley client processes. The protocol will not work if a non-existent PID is specified or if the process does not support the Berkeley protocol on the client side. - -### One-Phase Commit Protocol - -=> ./distributed-systems-simulator/one-phase-commit.png Visualization: The One-Phase Commit Protocol with 3 processes. P1 crashes at 1000ms (shown in red) and recovers at 5000ms. P2 (server) periodically sends commit requests. The red lines show lost messages during P1's crash period, while blue lines show successful message exchanges after recovery. - -The One-Phase Commit Protocol is designed to move any number of clients to a commit. In practice, this could be creating or deleting a file that each client has a local copy of. The server is the coordinator and initiates the commit request. The server periodically resends the commit request until every client has acknowledged it. For this purpose, the PIDs of all participating client processes and a timer for resending must be configured. - -In the example, P1 and P3 are clients and P2 is the server. P1 crashes at 1000ms and recovers at 5000ms. The first two commit requests fail to reach P1 due to its crash. Only the third attempt succeeds. Each client acknowledges a commit request only once. - -``` -Programmed One-Phase Commit Events: - -| Time (ms) | PID | Event | -|-----------|-----|----------------------------------------| -| 0 | 1 | 1-Phase Commit Client activate | -| 0 | 2 | 1-Phase Commit Server activate | -| 0 | 3 | 1-Phase Commit Client activate | -| 0 | 2 | 1-Phase Commit Server request start | -| 1000 | 1 | Process crash | -| 5000 | 1 | Process revival | -``` - -Protocol variables (server-side): - -* Time until resend (Long: timeout = 2500): Milliseconds to wait before resending the commit request -* PIDs of participating processes (Integer[]: pids = [1,3]): The client process PIDs that should commit - -### Two-Phase Commit Protocol - -=> ./distributed-systems-simulator/two-phase-commit.png Visualization: The Two-Phase Commit Protocol with 3 processes. P2 (server) orchestrates a two-phase voting process with clients P1 and P3. The complex message pattern shows the voting phase followed by the commit/abort phase, with messages crossing between all three processes over a 10-second simulation. - -The Two-Phase Commit Protocol is an extension of the One-Phase Commit Protocol. The server first sends a request to all participating clients asking whether they want to commit. Each client responds with true or false. The server periodically retries until all results are collected. After receiving all votes, the server checks whether all clients voted true. If at least one client voted false, the commit process is aborted and a global result of false is sent to all clients. If all voted true, the global result true is sent. The global result is periodically resent until each client acknowledges receipt. - -In the example, P1 and P3 are clients and P2 is the server. The server sends its first request at 0ms. Here both P1 and P3 vote true, so the commit proceeds. - -``` -Programmed Two-Phase Commit Events: - -| Time (ms) | PID | Event | -|-----------|-----|----------------------------------------| -| 0 | 1 | 2-Phase Commit Client activate | -| 0 | 2 | 2-Phase Commit Server activate | -| 0 | 3 | 2-Phase Commit Client activate | -| 0 | 2 | 2-Phase Commit Server request start | -``` - -Example log extract showing the two-phase voting process: - -``` -000000ms: PID 2: Message sent; ID: 94; Protocol: 2-Phase Commit - Boolean: wantVote=true -000905ms: PID 3: Message received; ID: 94; Protocol: 2-Phase Commit -000905ms: PID 3: Message sent; ID: 95; Protocol: 2-Phase Commit - Integer: pid=3; Boolean: isVote=true; vote=true -000905ms: PID 3: Vote true sent -001880ms: PID 2: Message received; ID: 95; Protocol: 2-Phase Commit -001880ms: PID 2: Vote from Process 3 received! Result: true -001947ms: PID 1: Message received; ID: 94; Protocol: 2-Phase Commit -001947ms: PID 1: Vote true sent -003137ms: PID 2: Votes from all participating processes received! - Global result: true -003137ms: PID 2: Message sent; ID: 99; Protocol: 2-Phase Commit - Boolean: isVoteResult=true; voteResult=true -004124ms: PID 1: Global vote result received. Result: true -006051ms: PID 2: All participants have acknowledged the vote -010000ms: Simulation ended -``` - -Protocol variables (server-side): - -* Time until resend (Long: timeout = 2500): Milliseconds to wait before resending -* PIDs of participating processes (Integer[]: pids = [1,3]): Client PIDs that should vote and commit - -Protocol variables (client-side): - -* Commit probability (Integer: ackProb = 50): The probability in percent that the client votes true (for commit) - -### Basic Multicast Protocol - -=> ./distributed-systems-simulator/basic-multicast.png Visualization: The Basic Multicast Protocol with 3 processes. P2 (client) sends periodic multicast messages to servers P1 and P3. P3 crashes at 3000ms (shown in red) and recovers at 6000ms. Red lines indicate lost messages, blue lines show delivered messages. Some messages to P1 are also lost due to the 30% message loss probability. - -The Basic Multicast Protocol is very simple. The client always initiates the request, which represents a simple multicast message. The Basic Multicast servers serve only to receive the message. No acknowledgments are sent. The client P2 sends a multicast message every 2500ms to servers P1 and P3. - -P1 can only receive multicast messages after 2500ms because it does not support the protocol before then. P3 is crashed from 3000ms to 6000ms and also cannot receive messages during that time. Each process has a 30% message loss probability, so some messages are lost in transit (shown in red). - -In this example, the 3rd multicast message to P3 and the 5th and 6th messages to P1 were lost. Only the 4th multicast message reached both destinations. - -``` -Programmed Basic Multicast Events: - -| Time (ms) | PID | Event | -|-----------|-----|----------------------------------------| -| 0 | 2 | Basic Multicast Client activate | -| 0 | 3 | Basic Multicast Server activate | -| 0 | 2 | Basic Multicast Client request start | -| 2500 | 1 | Basic Multicast Server activate | -| 2500 | 2 | Basic Multicast Client request start | -| 3000 | 3 | Process crash | -| 5000 | 2 | Basic Multicast Client request start | -| 6000 | 3 | Process revival | -| 7500 | 2 | Basic Multicast Client request start | -| 10000 | 2 | Basic Multicast Client request start | -| 12500 | 2 | Basic Multicast Client request start | -``` - -### Reliable Multicast Protocol - -=> ./distributed-systems-simulator/reliable-multicast.png Visualization: The Reliable Multicast Protocol with 3 processes. P2 (client) sends multicast messages to servers P1 and P3, retrying until acknowledgments are received from all servers. P3 crashes at 3000ms and recovers at 10000ms. Red lines show lost messages, blue lines show delivered ones. Despite failures, all servers eventually receive and acknowledge the multicast. - -In the Reliable Multicast Protocol, the client periodically resends its multicast message until it has received an acknowledgment from all participating servers. After each retry, the client "forgets" which servers have already acknowledged, so each new attempt must be acknowledged again by all participants. - -In the example, P2 is the client and P1 and P3 are the servers. At 0ms, the client initiates its multicast message. The message loss probability is set to 30% on all processes. The client needs exactly 5 attempts until successful delivery: - -* Attempt 1: P1 doesn't support the protocol yet. P3 receives the message but its ACK is lost. -* Attempt 2: The message to P1 is lost. P3 receives it but is crashed and can't process it. -* Attempt 3: P1 receives the message and ACKs successfully. The message to P3 is lost. -* Attempt 4: P1 receives and ACKs again. P3 receives it but is still crashed. -* Attempt 5: Both P1 and P3 receive the message and ACK successfully. - -``` -Programmed Reliable Multicast Events: - -| Time (ms) | PID | Event | -|-----------|-----|------------------------------------------| -| 0 | 3 | Reliable Multicast Server activate | -| 0 | 2 | Reliable Multicast Client activate | -| 0 | 2 | Reliable Multicast Client request start | -| 2500 | 1 | Reliable Multicast Server activate | -| 3000 | 3 | Process crash | -| 10000 | 3 | Process revival | -``` - -Example log extract: - -``` -000000ms: PID 2: Reliable Multicast Client activated -000000ms: PID 2: Message sent; ID: 280; Protocol: Reliable Multicast - Boolean: isMulticast=true -000000ms: PID 3: Reliable Multicast Server activated -001590ms: PID 3: Message received; ID: 280; Protocol: Reliable Multicast -001590ms: PID 3: ACK sent -002500ms: PID 1: Reliable Multicast Server activated -002500ms: PID 2: Message sent; ID: 282; Protocol: Reliable Multicast - Boolean: isMulticast=true -003000ms: PID 3: Crashed -005000ms: PID 2: Message sent; ID: 283; Protocol: Reliable Multicast -005952ms: PID 1: Message received; ID: 283 -005952ms: PID 1: ACK sent -007937ms: PID 2: ACK from Process 1 received! -... -011813ms: PID 2: ACK from Process 3 received! -011813ms: PID 2: ACKs from all participating processes received! -015000ms: Simulation ended -``` - -Protocol variables (server-side): - -* Time until resend (Long: timeout = 2500): Milliseconds to wait before resending the multicast -* PIDs of participating processes (Integer[]: pids = [1,3]): Server PIDs that should receive the multicast - -## Additional Examples - -### Lamport and Vector Timestamps - -=> ./distributed-systems-simulator/lamport-timestamps.png Visualization: Lamport Timestamps displayed on the Berkeley Algorithm simulation. Each event on a process bar shows its Lamport timestamp as a number in parentheses. The timestamps increase monotonically and are updated according to the Lamport clock rules when messages are sent and received between P1, P2, and P3. - -> "For many purposes, it is sufficient that all machines agree on the same time. It is not necessary that this time also agrees with real time, like every hour announced on the radio... For a certain class of algorithms, only the internal consistency of clocks is important." - Andrew Tanenbaum - -Clocks that provide such a time are also known as logical clocks. Two implementations are realized in the simulator: Lamport timestamps and vector timestamps. - -After activating the Lamport time switch in expert mode, the current Lamport timestamp appears at every event of a process. Each process has its own Lamport timestamp that is incremented when a message is sent or received. Each message carries the current Lamport time t_l(i) of the sending process i. When another process j receives this message, its Lamport timestamp t_l(j) is recalculated as: - -``` -t_l(j) := 1 + max(t_l(j), t_l(i)) -``` - -The larger Lamport time of the sender and receiver process is used and then incremented by 1. After the Berkeley simulation shown here, P1 has Lamport timestamp 16, P2 has 14, and P3 has 15. - -=> ./distributed-systems-simulator/vector-timestamps.png Visualization: Vector Timestamps displayed on the same Berkeley Algorithm simulation. Each event shows its vector timestamp as a tuple (v1,v2,v3) representing the known state of all three processes. The tuples grow as processes communicate and merge their knowledge of each other's progress. - -With the active vector time switch, all vector timestamps are displayed. Like the Lamport timestamp, each message includes the current vector timestamp of the sending process. With n participating processes, the vector timestamp v has size n. Each participating process i has its own index, accessible via v(i). When v is the vector timestamp of the receiving process j and w is the vector timestamp of the sending process, the new local vector timestamp of process j is calculated as follows: - -``` -for (i := 0; i < n; i++) { - if (i = j) { - v(i)++; - } else if (v(i) < w(i)) { - v(i) := w(i); - } -} -``` - -By default, the vector timestamp is only incremented when a message is sent or received. In both cases, the sender and receiver each increment their own index in the vector timestamp by 1. Upon receiving a message, the local vector timestamp is then compared with the sender's, and the larger value is taken for all indices. - -After the simulation, P1 has vector timestamp (8,10,6), P2 has (6,10,6), and P3 has (6,10,8). - -The simulation settings include boolean variables "Lamport times affect all events" and "Vector times affect all events" (both default to false). When set to true, all events (not just message send/receive) will update the timestamps. - -### Simulating Slow Connections - -=> ./distributed-systems-simulator/slow-connection.png Visualization: Slow connection simulation comparing Internal Synchronization (P1) and Christian's Method (P3) with P2 as server. P3 has high transmission times (2000-8000ms) simulating a slow network connection. P1 synchronizes to 21446ms (error: -1446ms) while P3 only reaches 16557ms (error: -3443ms), showing how slow connections degrade synchronization quality. - -The simulator can also simulate slow connections to a specific process. This example revisits the comparison of Internal Synchronization (P1) and Christian's Method (P3), with P2 serving both. In this scenario, P3 has a poor network connection, so messages to and from P3 always require a longer transmission time. - -P3's minimum transmission time is set to 2000ms and maximum to 8000ms, while P1 and P2 keep the defaults (500ms/2000ms). The simulation duration is 20000ms. With the "Average transmission times" setting enabled, the effective transmission time for messages involving P3 is: - -``` -1/2 * (rand(500,2000) + rand(2000,8000)) = 1/2 * rand(2500,10000) = rand(1250,5000)ms -``` - -Because P3 starts a new request before receiving the answer to its previous one, and because it always associates server responses with its most recently sent request, its RTT calculations become incorrect on each round, and its local time is poorly synchronized. P1 synchronizes to 21446ms (error: -1446ms) while P3 only reaches 16557ms (error: -3443ms). - -### Raft Consensus Failover - -=> ./distributed-systems-simulator/raft-consensus-failover.png Screenshot: A 60-second Raft simulation with three processes. P1 starts as the initial leader, crashes at 3500ms, later recovers, P2 wins the reelection and remains leader, and P3 crashes later. The blue and red message lines show the continuing heartbeat and acknowledgment traffic during and after failover. - -While modernizing ds-sim, I also added a simplified Raft Consensus example. The simulation is intentionally small: three processes, one initial leader, one crash, a clean reelection, a recovery of the old leader, and then another crash later in the run. This makes it possible to see the most important Raft transitions without being overwhelmed by cluster size. - -The event log tells a very readable story. At `0ms`, `P1` starts as the initial leader in `term 0`. It immediately sends a heartbeat and an `appendEntry` message carrying the log entry `cmd1`. `P2` joins at `100ms`, `P3` at `1700ms`, and both acknowledge the leader's traffic. At that point the cluster is healthy: one leader, two followers, successful heartbeats, and successful log replication. - -At `3500ms`, `P1` crashes. The followers still process the last in-flight messages, but once the election timeout expires, `P2` becomes a candidate and sends a `voteRequest` for `term 1`. `P3` grants that vote, and at `9395ms` the log records the decisive line: - -``` -009395ms: PID: 2; ... Leader elected by majority vote: process 2 (term 1) -``` - -That transition is followed immediately by new heartbeats and a new `appendEntry`, which is exactly what you want to see in a Raft simulation: leadership is not just declared, it is exercised. - -At `12002ms`, the old leader `P1` recovers. Importantly, it does not try to reclaim control. Instead, it receives heartbeats from `P2` and answers with `heartbeatAck` messages, rejoining the cluster as a follower. That is one of the most useful teaching moments in the log, because it makes the term-based leadership model concrete: the recovered node does not become leader again just because it used to be one. - -At `20000ms`, `P3` crashes. The cluster continues running with `P2` as leader and `P1` as follower for the rest of the 60-second simulation. The log remains dominated by periodic heartbeats from `P2` and acknowledgments from `P1`, showing that the system stays stable even after a second failure. - -This single scenario demonstrates several core Raft properties in one replay: - -* Stable startup leadership -* Heartbeats and follower acknowledgments -* Log replication -* Leader failure detection -* Majority-based reelection -* Safe reintegration of a recovered former leader -* Continued service after a later follower crash - -It is also a good example of why a simulator is useful for distributed systems. In a real production system, reconstructing this sort of sequence would require stitching together logs from multiple nodes. Here, the message flow, the crashes, the recoveries, and the Lamport/vector timestamps are all visible in one place. - -## Protocol API - -The simulator was designed from the ground up to be extensible. Users can implement their own protocols in Java by extending the `VSAbstractProtocol` base class. Each protocol has its own class in the `protocols.implementations` package. - -### Class Hierarchy - -``` -VSAbstractEvent - +-- VSAbstractProtocol (base class for all protocols) - +-- VSDummyProtocol - +-- VSPingPongProtocol - +-- VSBroadcastProtocol - +-- VSInternalTimeSyncProtocol - +-- VSExternalTimeSyncProtocol - +-- VSBerkeleyTimeProtocol - +-- VSOnePhaseCommitProtocol - +-- VSTwoPhaseCommitProtocol - +-- VSBasicMulticastProtocol - +-- VSReliableMulticastProtocol -``` - -### Implementing a Custom Protocol - -Each protocol class must implement the following methods: - -* A public constructor: Must specify whether the client or the server initiates requests, using `VSAbstractProtocol.HAS_ON_CLIENT_START` or `VSAbstractProtocol.HAS_ON_SERVER_START`. -* `onClientInit()` / `onServerInit()`: Called once before the protocol is first used. Used to initialize protocol variables and attributes via the VSPrefs methods (e.g. `initVector`, `initLong`). Variables initialized this way appear in the process editor and can be configured by the user. -* `onClientReset()` / `onServerReset()`: Called each time the simulation is reset. -* `onClientStart()` / `onServerStart()`: Called when the client/server initiates a request. Typically creates and sends a `VSMessage` object. -* `onClientRecv(VSMessage)` / `onServerRecv(VSMessage)`: Called when a message arrives. -* `onClientSchedule()` / `onServerSchedule()`: Called when a scheduled alarm fires. -* `toString()`: Optional. Customizes log output for this protocol. - -### Available API Methods - -Methods inherited from `VSAbstractProtocol`: - -* `sendMessage(VSMessage message)`: Sends a protocol message (automatically updates Lamport and Vector timestamps) -* `hasOnServerStart()`: Whether the server or client initiates requests -* `isServer()` / `isClient()`: Whether the current process has the protocol activated as server/client -* `scheduleAt(long time)`: Creates an alarm that fires at the given local process time, triggering `onClientSchedule()` or `onServerSchedule()` -* `removeSchedules()`: Cancels all pending alarms in the current context -* `getNumProcesses()`: Returns the total number of processes in the simulation - -Process methods available via the inherited `process` attribute: - -* `getTime()` / `setTime(long)`: Get/set the local process time -* `getGlobalTime()`: Get the current global simulation time -* `getClockVariance()` / `setClockVariance(float)`: Get/set the clock drift -* `getLamportTime()` / `setLamportTime(long)`: Get/set the Lamport timestamp -* `getVectorTime()` / `updateVectorTime(VSVectorTime)`: Get/update the vector timestamp -* `getProcessID()`: Get the process PID -* `isCrashed()` / `isCrashed(boolean)`: Check or set crash state -* `getRandomPercentage()`: Get a random value between 0 and 100 - -Message methods (`VSMessage`): - -* `new VSMessage()`: Create a new message -* `getMessageID()`: Get the message NID -* `setBoolean(key, value)` / `getBoolean(key)`: Set/get boolean data -* `setInteger(key, value)` / `getInteger(key)`: Set/get integer data -* `setLong(key, value)` / `getLong(key)`: Set/get long data -* `setString(key, value)` / `getString(key)`: Set/get string data -* `getSendingProcess()`: Get a reference to the sending process -* `isServerMessage()`: Whether it's a server or client message - -### Example: Reliable Multicast Implementation - -Here is a condensed example showing key parts of the Reliable Multicast Protocol implementation: - -```java -public class VSReliableMulticastProtocol extends VSAbstractProtocol { - public VSReliableMulticastProtocol() { - // The client initiates requests - super(VSAbstractProtocol.HAS_ON_CLIENT_START); - super.setClassname(super.getClass().toString()); - } - - private ArrayList pids; - - // Initialize protocol variables (editable in the process editor) - public void onClientInit() { - Vector vec = new Vector(); - vec.add(1); vec.add(3); - super.initVector("pids", vec, "PIDs of participating processes"); - super.initLong("timeout", 2500, "Time until resend", "ms"); - } - - // Send multicast to all servers that haven't ACKed yet - public void onClientStart() { - if (pids.size() != 0) { - long timeout = super.getLong("timeout") + process.getTime(); - super.scheduleAt(timeout); - VSMessage message = new VSMessage(); - message.setBoolean("isMulticast", true); - super.sendMessage(message); - } - } - - // Handle ACK from a server - public void onClientRecv(VSMessage recvMessage) { - if (pids.size() != 0 && recvMessage.getBoolean("isAck")) { - Integer pid = recvMessage.getIntegerObj("pid"); - if (pids.contains(pid)) - pids.remove(pid); - super.log("ACK from Process " + pid + " received!"); - if (pids.size() == 0) { - super.log("ACKs from all processes received!"); - super.removeSchedules(); - } - } - } - - // Retry on timeout - public void onClientSchedule() { onClientStart(); } -} -``` - -## Project Statistics - -The original VS-Sim project (August 2008) was written in Java 6 and consisted of: - -* 61 source files across 12 Java packages -* Approximately 15,710 lines of code -* 2.2 MB of generated Javadoc documentation -* 142 KB compiled JAR file -* 10 built-in protocols -* 163 configurable settings - -The modernized successor ds-sim (version 1.1.0) has been updated to Java 21 and translated to English: - -* 146 source files (117 main + 29 test) across 19 Java packages -* Approximately 27,900 lines of code (22,400 main + 5,500 test) -* 12 built-in protocols -* 208 unit tests -* 269 configurable settings - -=> https://codeberg.org/snonux/ds-sim ds-sim source code on Codeberg -=> https://codeberg.org/snonux/vs-sim vs-sim source code on Codeberg (original German version, 2008) - -Other related posts are: - -=> ./2026-03-01-loadbars-0.13.0-released.gmi 2026-03-01 Loadbars 0.13.0 released -=> ./2022-12-24-ultrarelearning-java-my-takeaways.gmi 2022-12-24 (Re)learning Java - My takeaways -=> ./2022-03-06-the-release-of-dtail-4.0.0.gmi 2022-03-06 The release of DTail 4.0.0 -=> ./2016-11-20-object-oriented-programming-with-ansi-c.gmi 2016-11-20 Object oriented programming with ANSI C - -E-Mail your comments to `paul@nospam.buetow.org` - -=> ../ Back to the main site diff --git a/gemfeed/DRAFT-distributed-systems-simulator.gmi.tpl b/gemfeed/DRAFT-distributed-systems-simulator.gmi.tpl deleted file mode 100644 index 50dfa58e..00000000 --- a/gemfeed/DRAFT-distributed-systems-simulator.gmi.tpl +++ /dev/null @@ -1,806 +0,0 @@ -# Distributed Systems Simulator v1.1.0 released - -> DRAFT - Not yet published - -This blog explores the Java-based Distributed Systems Simulator program I created as my diploma thesis at the Aachen University of Applied Sciences (August 2008). The simulator offers both built-in implementations of common distributed systems algorithms and an extensible framework that allows researchers and practitioners to implement and test their own custom protocols within the simulation environment. - -=> https://codeberg.org/snonux/ds-sim ds-sim on Codeberg (modernized, English-translated version) - -=> ./distributed-systems-simulator/ds-sim-screenshot.png Screenshot: The Distributed Systems Simulator running a Broadcast protocol simulation with 6 processes. The visualization shows message lines between process bars, with blue indicating delivered messages and green indicating messages still in transit. - -<< template::inline::toc - -## Motivation - -Distributed systems are complex—interactions between nodes, network partitions, failure scenarios are hard to debug in production. A simulator lets you experiment with architectures, observe how systems behave under failure, and learn consensus algorithms, replication strategies, and fault tolerance in a controlled, repeatable environment. No operational overhead, no real infrastructure—just focused exploration of system design. - -In the literature, one can find many different definitions of a distributed system. Many of these definitions differ from each other, making it difficult to find a single definition that stands alone as the correct one. Andrew Tanenbaum and Maarten van Steen chose the following loose characterization for describing a distributed system: - -> "A distributed system is a collection of independent computers that appears to its users as a single coherent system" - Andrew Tanenbaum - -The user only needs to interact with the local computer in front of them, while the software of the local computer ensures smooth communication with the other participating computers in the distributed system. - -This thesis aims to make distributed systems easier to understand from a different angle. Instead of the end-user perspective, it focuses on the functional methods of protocols and their processes, making all relevant events of a distributed system transparent. - -To achieve this, I developed a simulator, particularly for teaching and learning at the University of Applied Sciences Aachen. Protocols from distributed systems with their most important influencing factors can be replicated through simulations. At the same time, there's room for personal experiments—no restriction to a fixed number of protocols. Users can design their own. - -The original simulator (VS-Sim) was written in Java 6 in 2008 with a German-language UI. In 2025, I revamped and modernized it as ds-sim: translated the entire codebase and UI from German to English, migrated the build system from hand-rolled Ant scripts to Maven, upgraded from Java 6 to Java 21 (adopting sealed class hierarchies, record types, formatted strings, pattern matching), introduced a proper exception hierarchy and consistent error handling, added comprehensive Javadoc documentation, implemented a headless testing framework (208 unit tests covering core components, the event system, and all protocol implementations), reorganized the project structure to follow standard Maven conventions, and added architecture documentation. Total: 199 files, over 15,000 lines of new code. Back in 2008, I wrote every line by hand in Vim. For the 2025 modernization, Claude Code did most of the heavy lifting—translation, refactoring, test generation, documentation. Times have changed. - -## Installation - -The modernized ds-sim requires Java 21 or higher and Maven 3.8 or higher. - -``` -# Clone the repository -git clone https://codeberg.org/snonux/ds-sim.git -cd ds-sim - -# Set JAVA_HOME if needed (e.g. on Fedora Linux) -export JAVA_HOME=/usr/lib/jvm/java-21-openjdk - -# Build the project -mvn clean package - -# Run the simulator -java -jar target/ds-sim-*.jar -``` - -For a faster development build without running tests: - -``` -mvn package -DskipTests -``` - -After building, the following artifacts are available in the `target/` directory: - -* `ds-sim-1.1.0.jar` - Executable JAR with all dependencies bundled -* `original-ds-sim-1.1.0.jar` - JAR without dependencies - -The project also includes 208 unit tests that can be run with `mvn test`. Example simulation files for all built-in protocols are included in the `saved-simulations/` directory. - -=> https://codeberg.org/snonux/ds-sim ds-sim source code on Codeberg - -## Fundamentals - -For basic understanding, some fundamentals are explained below. A deeper exploration will follow in later chapters. - -### Client/Server Model - -``` -+-----------------------------------------+ -| | -| +--------+ +--------+ | -| | Client |<------->| Server | | -| +--------+ +--------+ | -| | -| Sending of Messages | -| | -+-----------------------------------------+ - -Figure 1.1: Client/Server Model -``` - -The simulator is based on the client/server principle. Each simulation typically consists of a participating client and a server that communicate with each other via messages (see Fig. 1.1). In complex simulations, multiple clients and/or servers can also participate. - -### Processes and Their Roles - -A distributed system is simulated using processes. Each process takes on one or more roles. For example, one process can take on the role of a client and another process the role of a server. The possibility of assigning both client and server roles to a process simultaneously is also provided. A process could also take on the roles of multiple servers and clients simultaneously. To identify a process, each one has a unique Process Identification Number (PID). - -### Messages - -In a distributed system, it must be possible to send messages. A message can be sent by a client or server process and can have any number of recipients. The content of a message depends on the protocol used. What is meant by a protocol will be covered later. To identify a message, each message has a unique Message Identification Number (NID). - -### Local and Global Clocks - -In a simulation, there is exactly one global clock. It represents the current and always correct time. A global clock never goes wrong. - -Additionally, each participating process has its own local clock. It represents the current time of the respective process. Unlike the global clock, local clocks can display an incorrect time. If the process time is not globally correct (not equal to the global time, or displays an incorrect time), then it was either reset during a simulation, or it is running incorrectly due to clock drift. The clock drift indicates by what factor the clock is running incorrectly. This will be discussed in more detail later. - -``` -+---------------------+ +---------------------+ -| Process 1 | | Process 2 | -| | | | -| +-----------------+ | | +-----------------+ | -| |Server Protocol A| | | |Client Protocol A| | -| +-----------------+ | | +-----------------+ | -| | | | -| +-----------------+ | +---------------------+ -| |Client Protocol B| | -| +-----------------+ | +---------------------+ -| | | Process 3 | -+---------------------+ | | - | +-----------------+ | - | |Server Protocol B| | - | +-----------------+ | - | | - +---------------------+ - -Figure 1.2: Client/Server Protocols -``` - -In addition to normal clocks, vector timestamps and Lamport's logical clocks are also of interest. For vector and Lamport times, there are no global equivalents here, unlike normal time. Concrete examples of Lamport and vector times will be covered later in the "Additional Examples" section. - -### Events - -A simulation consists of the sequential execution of finitely many events. For example, there can be an event that causes a process to send a message. A process crash event would also be conceivable. Each event occurs at a specific point in time. Events with the same occurrence time are executed directly one after another by the simulator. However, this does not hinder the simulator's users, as events are executed in parallel from their perspective. - -Two main types of events are distinguished: programmable events and non-programmable events. Programmable events can be programmed and edited in the event editor, and their occurrence times depend on the local process clocks or the global clock. Non-programmable events, on the other hand, cannot be programmed in the event editor and do not occur because of a specific time, but due to other circumstances such as: - -* Message receive events: Triggered when a message arrives at a recipient process -* Protocol schedule events (alarms): Triggered by a timer set by a protocol, e.g. for retransmission timeouts -* Random events: Such as random process crashes based on configured crash probability - -### Protocols - -A simulation also consists of the application of protocols. It has already been mentioned that a process can take on the roles of servers and/or clients. For each server and client role, the associated protocol must also be specified. A protocol defines how a client and a server send messages, and how they react when a message arrives. A protocol also determines what data is contained in a message. A process only processes a received message if it understands the respective protocol. - -In Figure 1.2, 3 processes are shown. Process 1 supports protocol "A" on the server side and protocol "B" on the client side. Process 2 supports protocol "A" on the client side and Process 3 supports protocol "B" on the server side. This means that Process 1 can communicate with Process 2 via protocol "A" and with Process 3 via protocol "B". Processes 2 and 3 are incompatible with each other and cannot process messages received from each other. - -Clients cannot communicate with clients, and servers cannot communicate with servers. For communication, at least one client and one server are always required. However, this restriction can be circumvented by having processes support a given protocol on both the server and client sides (see Broadcast Protocol later). - -## Graphical User Interface (GUI) - -### Simple Mode - -=> ./distributed-systems-simulator/ds-sim-screenshot2.png Screenshot: The simulator showing the settings dialog. The visualization area displays process bars with message lines between them. The settings window allows configuring simulation parameters like number of processes, simulation duration, clock drift, message loss probability, and more. - -The simulator requires JDK 21 and can be started with the command `java -jar target/ds-sim-VERSION.jar` - -The simulator then presents itself with a main window. To create a new simulation, select "New Simulation" from the "File" menu, after which the settings window for the new simulation appears. The individual options will be discussed in more detail later, and for now, only the default settings will be used. - -By default, the simulator starts in "simple mode". There is also an "expert mode", which will be discussed later. - -### The Menu Bar - -In the File menu, you can create new simulations or close the currently open simulation. New simulations open by default in a new tab. However, you can also open or close new simulation windows that have their own tabs. Each tab contains a simulation that is completely independent from the others. This allows any number of simulations to be run in parallel. The menu items "Open", "Save" and "Save As" are used for loading and saving simulations. - -Through the Edit menu, users can access the simulation settings, which will be discussed in more detail later. This menu also lists all participating processes for editing. If the user selects a process there, the corresponding process editor opens. The Simulator menu offers the same options as the toolbar, which is described in the next section. - -Some menu items are only accessible when a simulation has already been created or loaded in the current window. - -### The Toolbar - -The toolbar is located at the top left of the simulator. The toolbar contains the functions most frequently needed by users. The toolbar offers four different functions: - -* Reset simulation: can only be activated when the simulation has been paused or has finished -* Repeat simulation: cannot be activated if the simulation has not yet been started -* Pause simulation: can only be activated when the simulation is currently running -* Start simulation: can only be activated when the simulation is not currently running and has not yet finished - -### The Visualization - -The graphical simulation visualization is located in the center right. The X-axis shows the time in milliseconds, and all participating processes are listed on the Y-axis. The demo simulation ends after exactly 15 seconds. The visualization shows processes (with PIDs 1, 2, and 3), each with its own horizontal black bar. On these process bars, users can read the respective local process time. The vertical red line represents the global simulation time. - -The process bars also serve as start and end points for messages. For example, if Process 1 sends a message to Process 2, a line is drawn from one process bar to the other. Messages that a process sends to itself are not visualized but are logged in the log window (more on this later). - -Another way to open a process editor is to left-click on the process bar belonging to the process. A right-click, on the other hand, opens a popup window with additional options. A process can only be forced to crash or be revived via the popup menu during a running simulation. - -In general, the number of processes can vary as desired. The simulation duration is at least 5 and at most 120 seconds. The simulation only ends when the global time reaches the specified simulation end time (here 15 seconds), not when a local process time reaches this end time. - -### Color Differentiation - -Colors help to better interpret the processes of a simulation. By default, processes (process bars) and messages are displayed with the following colors (these are only the default colors, which can be changed via the settings): - -``` -Process Colors: - Black - The simulation is not currently running - Green - The process is running normally - Orange - The mouse is over the process bar - Red - The process has crashed - -Message Colors: - Green - The message is still in transit - Blue - The message has successfully reached its destination - Red - The message was lost -``` - -### The Sidebar - -The sidebar is used to program process events. At the top, the process to be managed is selected (here with PID 1). In this process selection, there is also the option to select "All Processes", which displays all programmed events of all processes simultaneously. "Local events" are those events that occur when a certain local time of the associated process has been reached. The event table below lists all programmed events along with their occurrence times and PIDs. - -To create a new event, the user can either right-click on a process bar and select "Insert local event", or select an event below the event table, enter the event occurrence time in the text field below, and click "Apply". - -Right-clicking on the event editor allows you to either copy or delete all selected events. Using the Ctrl key, multiple events can be selected simultaneously. The entries in the Time and PID columns can be edited afterwards. This provides a convenient way to move already programmed events to a different time or assign them to a different process. However, users should ensure that they press the Enter key after changing the event occurrence time, otherwise the change will be ineffective. - -In addition to the Events tab, the sidebar has another tab called "Variables". Behind this tab is the process editor of the currently selected process. There, all variables of the process can be edited, providing another way to access a process editor. - -### The Log Window - -The log window (at the bottom) logs all occurring events in chronological order. At the beginning of each log entry, the global time in milliseconds is always logged. For each process, its local times as well as the Lamport and vector timestamps are also listed. After the time information, additional details are provided, such as which message was sent with what content and which protocol it belongs to. This will be demonstrated later with examples. - -``` -000000ms: New Simulation -000000ms: New Process; PID: 1; Local Time: 000000ms; Lamport time: 0; Vector time: (0,0,0) -000000ms: New Process; PID: 2; Local Time: 000000ms; Lamport time: 0; Vector time: (0,0,0) -000000ms: New Process; PID: 3; Local Time: 000000ms; Lamport time: 0; Vector time: (0,0,0) -``` - -By deactivating the logging switch, message logging can be temporarily disabled. With logging deactivated, no new messages are written to the log window. After reactivating the switch, all omitted messages are subsequently written to the window. Deactivated logging can lead to improved simulator performance. - -### Expert Mode - -=> ./distributed-systems-simulator/ds-sim-screenshot.png Screenshot: The Distributed Systems Simulator in expert mode, showing a Broadcast protocol simulation with 6 processes. The visualization shows message lines between process bars, with blue indicating delivered messages and green indicating messages still in transit. - -The simulator can be operated in two different modes: simple mode and expert mode. The simulator starts in simple mode by default, so users don't have to deal with the simulator's full functionality all at once. Simple mode is clearer but offers fewer functions. Expert mode is more suitable for experienced users and accordingly offers more flexibility. Expert mode can be activated or deactivated via the switch of the same name below the log window or via the simulation settings. - -In expert mode, the following additional features become available: - -* Global events: In addition to local events, global events can now also be edited. Global events are triggered when a specific global simulation time is reached, rather than a local process time. This only makes a difference when local process times differ from the global time (e.g. due to clock drift). -* Direct PID selection: The user can directly select the associated PID when programming a new event. -* Lamport and Vector time switches: If the user activates one of these two switches, the Lamport or vector timestamps are displayed in the visualization. Only one can be active at a time to maintain clarity. -* Anti-aliasing switch: Allows the user to activate or deactivate anti-aliasing for smoother graphics. Disabled by default for performance reasons. -* Log filter: A regular expression filter (Java syntax) that makes it possible to filter only the essential data from the logs. For example, `"PID: (1|2)"` shows only log lines containing "PID: 1" or "PID: 2". The filter can be activated retroactively and during a running simulation. - -### Configuration Settings - -The simulation settings window allows configuring many aspects of the simulation. Key settings include: - -* Processes receive own messages (default: false): Whether processes can receive messages they sent to themselves. -* Average message loss probabilities (default: true): Whether to average the loss probabilities of sender and receiver processes. -* Average transmission times (default: true): Whether to average the transmission times of sender and receiver processes. -* Show only relevant messages (default: true): Hides messages sent to processes that don't support the protocol. -* Expert mode (default: false): Enables expert mode features. -* Simulation speed (default: 0.5): The playback speed factor. A value of 1 means real-time, 0.5 means half speed. -* Number of processes (default: 3): Can also be changed during simulation via right-click. -* Simulation duration (default: 15s): Between 5 and 120 seconds. - -Each process also has individual settings: - -* Clock drift (default: 0.0): By what factor the local clock deviates. A value of 0.0 means no deviation. A value of 1.0 means double speed. Values > -1.0 are allowed. -* Random crash probability (default: 0%): Probability that the process crashes randomly during the simulation. -* Message loss probability (default: 0%): Probability that a message sent by this process is lost in transit. -* Min/Max transmission time (default: 500ms/2000ms): The range for random message delivery times. - -## Protocols and Examples - -The simulator comes with 10 built-in protocols. As described earlier, protocols are distinguished between server-side and client-side. Servers can respond to client messages, and clients can respond to server messages. Each process can support any number of protocols on both the client and server side. Users can also implement their own protocols using the simulator's Protocol API (see the Protocol API section). - -The program directory contains a `saved-simulations` folder with example simulations for each protocol as serialized `.dat` files. - -### Dummy Protocol - -The Dummy Protocol serves only as a template for creating custom protocols. When using the Dummy Protocol, only log messages are output when events occur. No further actions are performed. - -### Ping-Pong Protocol - -=> ./distributed-systems-simulator/ping-pong.png Visualization: The Ping-Pong Protocol showing two processes (P1 and P2) exchanging messages in a continuous back-and-forth pattern. Blue lines represent delivered messages bouncing between the process bars over a 15-second simulation. - -In the Ping-Pong Protocol, two processes -- Client P1 and Server P2 -- constantly send messages back and forth. The Ping-Pong client starts the first request, to which the server responds to the client. The client then responds again, and so on. Each message includes a counter that is incremented at each station and logged in the log window. - -``` -Programmed Ping-Pong Events: - -| Time (ms) | PID | Event | -|-----------|-----|--------------------------------| -| 0 | 1 | Ping-Pong Client activate | -| 0 | 2 | Ping-Pong Server activate | -| 0 | 1 | Ping-Pong Client request start | -``` - -It is important that Process 1 activates its Ping-Pong client before starting a Ping-Pong client request. Before a process can start a request, it must have the corresponding protocol activated. This also applies to all other protocols. - -**Ping-Pong Storm Variant** - -=> ./distributed-systems-simulator/ping-pong-storm.png Visualization: The Ping-Pong Storm variant with three processes. P1 is the client, P2 and P3 are both servers. The visualization shows an exponentially growing number of messages as each client message generates two server responses, creating a dense web of blue and green message lines. - -By adding a third process P3 as an additional Ping-Pong server, a Ping-Pong "Storm" can be realized. Since every client message now receives two server responses, the number of messages doubles with each round, creating an exponential message flood. - -``` -Programmed Ping-Pong Storm Events: - -| Time (ms) | PID | Event | -|-----------|-----|--------------------------------| -| 0 | 1 | Ping-Pong Client activate | -| 0 | 2 | Ping-Pong Server activate | -| 0 | 3 | Ping-Pong Server activate | -| 0 | 1 | Ping-Pong Client request start | -``` - -### Broadcast Protocol - -=> ./distributed-systems-simulator/broadcast.png Visualization: The Broadcast Protocol with 6 processes (P1-P6). Dense crossing message lines show how a broadcast from P1 propagates to all processes, with each process re-broadcasting to others. Blue lines indicate delivered messages, green lines indicate messages still in transit. - -The Broadcast Protocol behaves similarly to the Ping-Pong Protocol. The difference is that the protocol tracks -- using a unique Broadcast ID -- which messages have already been sent. Each process re-broadcasts all received messages to others, provided it has not already sent them. - -In this case, no distinction is made between client and server, so that the same action is performed when a message arrives at either side. This makes it possible, using multiple processes, to create a broadcast. P1 is the client and starts a request at 0ms and 2500ms. The simulation duration is exactly 5000ms. Since a client can only receive server messages and a server can only receive client messages, every process in this simulation is both server and client. - -``` -Programmed Broadcast Events: - -| Time (ms) | PID | Event | -|-----------|-----|----------------------------------| -| 0 | 1-6 | Broadcast Client activate | -| 0 | 1-6 | Broadcast Server activate | -| 0 | 1 | Broadcast Client request start | -| 2500 | 1 | Broadcast Client request start | -``` - -### Internal Synchronization Protocol - -=> ./distributed-systems-simulator/int-sync.png Visualization: Internal Synchronization with 2 processes. P1 (client, clock drift 0.1) shows a faster-running clock reaching 15976ms by simulation end. The blue message lines show P1 periodically synchronizing with P2 (server, no drift), with the time corrections visible as slight adjustments in P1's timeline. - -The Internal Synchronization Protocol is used for synchronizing the local process time, which can be applied when a process time is running incorrectly due to clock drift. When the client wants to synchronize its (incorrect) local process time t_c with a server, it sends a client request. The server responds with its own local process time t_s, allowing the client to calculate a new, more accurate time for itself. - -After receiving the server response, the client P1 calculates its new local process time as: - -``` -t_c := t_s + 1/2 * (t'_min + t'_max) -``` - -This synchronizes P1's local time with an error of less than 1/2 * (t'_max - t'_min), where t'_min and t'_max are the assumed minimum and maximum transmission times configured in the protocol settings. - -In the example, the client process has a clock drift of 0.1 and the server has 0.0. The client starts a request at local process times 0ms, 5000ms, and 10000ms. By simulation end, P1's time is synchronized to 15976ms (an error of -976ms from the global 15000ms). - -``` -Programmed Internal Sync Events: - -| Time (ms) | PID | Event | -|-----------|-----|------------------------------------| -| 0 | 1 | Internal Sync Client activate | -| 0 | 2 | Internal Sync Server activate | -| 0 | 1 | Internal Sync Client request start | -| 5000 | 1 | Internal Sync Client request start | -| 10000 | 1 | Internal Sync Client request start | -``` - -Protocol variables (client-side): - -* Min. transmission time (Long: 500): The assumed t'_min in milliseconds -* Max. transmission time (Long: 2000): The assumed t'_max in milliseconds - -These can differ from the actual message transmission times t_min and t_max, allowing simulation of scenarios where the protocol is misconfigured and large synchronization errors occur. - -### Christian's Method (External Synchronization) - -=> ./distributed-systems-simulator/christians.png Visualization: Comparison of Internal Synchronization (P1) and Christian's Method (P3) with P2 as shared server. Both P1 and P3 have clock drift 0.1. The visualization shows P1 synchronized to 14567ms (error: -433ms) while P3 synchronized to 15539ms (error: -539ms), demonstrating the different accuracy of the two methods. - -Christian's Method uses the RTT (Round Trip Time) to approximate the transmission time of individual messages. When the client wants to synchronize its local time t_c with a server, it sends a request and measures the RTT t_rtt until the server response arrives. The server response contains the local process time t_s from the moment the server sent the response. The client then calculates its new local time as: - -``` -t_c := t_s + 1/2 * t_rtt -``` - -The accuracy is +/- (1/2 * t_rtt - u_min) where u_min is a lower bound for message transmission time. - -The visualization compares both synchronization methods side by side: P1 uses Internal Synchronization and P3 uses Christian's Method, with P2 serving both. Both P1 and P3 have clock drift 0.1. In this particular run, Internal Synchronization achieved a better result (-433ms error vs. -539ms), though results vary between runs due to random transmission times. - -``` -Programmed Comparison Events: - -| Time (ms) | PID | Event | -|-----------|-----|--------------------------------------| -| 0 | 1 | Internal Sync Client activate | -| 0 | 1 | Internal Sync Client request start | -| 0 | 2 | Christian's Server activate | -| 0 | 2 | Internal Sync Server activate | -| 0 | 3 | Christian's Client activate | -| 0 | 3 | Christian's Client request start | -| 5000 | 1 | Internal Sync Client request start | -| 5000 | 3 | Christian's Client request start | -| 10000 | 1 | Internal Sync Client request start | -| 10000 | 3 | Christian's Client request start | -``` - -### Berkeley Algorithm - -=> ./distributed-systems-simulator/berkeley.png Visualization: The Berkeley Algorithm with 3 processes. P2 is the server (coordinator) sending time requests to clients P1 and P3. After collecting responses, P2 calculates correction values and sends them back. Final times show P1=16823ms, P2=14434ms, P3=13892ms -- all brought closer together through averaging. - -The Berkeley Algorithm is another method for synchronizing local clocks. This is the first protocol where the server initiates the requests. The server acts as a coordinator. The client processes are passive and must wait until a server request arrives. The server must know which client processes participate in the protocol, which is configured in the server's protocol settings. - -When the server wants to synchronize its local time t_s and the process times t_i of the clients (i = 1,...,n), it sends a server request. n is the number of participating clients. The clients then send their local process times back to the server. The server measures the RTTs r_i for all client responses. - -After all responses are received, the server sets its own time to the average t_avg of all known process times (including its own). The transmission time of a client response is estimated as half the RTT: - -``` -t_avg := 1/(n+1) * (t_s + SUM(r_i/2 + t_i)) -t_s := t_avg -``` - -The server then calculates a correction value k_i := t_avg - t_i for each client and sends it back. Each client sets its new time to t'_i := t'_i + k_i. - -``` -Programmed Berkeley Events: - -| Time (ms) | PID | Event | -|-----------|-----|-----------------------------------| -| 0 | 1 | Berkeley Client activate | -| 0 | 2 | Berkeley Server activate | -| 0 | 3 | Berkeley Client activate | -| 0 | 2 | Berkeley Server request start | -| 7500 | 2 | Berkeley Server request start | -``` - -Protocol variables (server-side): - -* PIDs of participating processes (Integer[]: [1,3]): The PIDs of the Berkeley client processes. The protocol will not work if a non-existent PID is specified or if the process does not support the Berkeley protocol on the client side. - -### One-Phase Commit Protocol - -=> ./distributed-systems-simulator/one-phase-commit.png Visualization: The One-Phase Commit Protocol with 3 processes. P1 crashes at 1000ms (shown in red) and recovers at 5000ms. P2 (server) periodically sends commit requests. The red lines show lost messages during P1's crash period, while blue lines show successful message exchanges after recovery. - -The One-Phase Commit Protocol is designed to move any number of clients to a commit. In practice, this could be creating or deleting a file that each client has a local copy of. The server is the coordinator and initiates the commit request. The server periodically resends the commit request until every client has acknowledged it. For this purpose, the PIDs of all participating client processes and a timer for resending must be configured. - -In the example, P1 and P3 are clients and P2 is the server. P1 crashes at 1000ms and recovers at 5000ms. The first two commit requests fail to reach P1 due to its crash. Only the third attempt succeeds. Each client acknowledges a commit request only once. - -``` -Programmed One-Phase Commit Events: - -| Time (ms) | PID | Event | -|-----------|-----|----------------------------------------| -| 0 | 1 | 1-Phase Commit Client activate | -| 0 | 2 | 1-Phase Commit Server activate | -| 0 | 3 | 1-Phase Commit Client activate | -| 0 | 2 | 1-Phase Commit Server request start | -| 1000 | 1 | Process crash | -| 5000 | 1 | Process revival | -``` - -Protocol variables (server-side): - -* Time until resend (Long: timeout = 2500): Milliseconds to wait before resending the commit request -* PIDs of participating processes (Integer[]: pids = [1,3]): The client process PIDs that should commit - -### Two-Phase Commit Protocol - -=> ./distributed-systems-simulator/two-phase-commit.png Visualization: The Two-Phase Commit Protocol with 3 processes. P2 (server) orchestrates a two-phase voting process with clients P1 and P3. The complex message pattern shows the voting phase followed by the commit/abort phase, with messages crossing between all three processes over a 10-second simulation. - -The Two-Phase Commit Protocol is an extension of the One-Phase Commit Protocol. The server first sends a request to all participating clients asking whether they want to commit. Each client responds with true or false. The server periodically retries until all results are collected. After receiving all votes, the server checks whether all clients voted true. If at least one client voted false, the commit process is aborted and a global result of false is sent to all clients. If all voted true, the global result true is sent. The global result is periodically resent until each client acknowledges receipt. - -In the example, P1 and P3 are clients and P2 is the server. The server sends its first request at 0ms. Here both P1 and P3 vote true, so the commit proceeds. - -``` -Programmed Two-Phase Commit Events: - -| Time (ms) | PID | Event | -|-----------|-----|----------------------------------------| -| 0 | 1 | 2-Phase Commit Client activate | -| 0 | 2 | 2-Phase Commit Server activate | -| 0 | 3 | 2-Phase Commit Client activate | -| 0 | 2 | 2-Phase Commit Server request start | -``` - -Example log extract showing the two-phase voting process: - -``` -000000ms: PID 2: Message sent; ID: 94; Protocol: 2-Phase Commit - Boolean: wantVote=true -000905ms: PID 3: Message received; ID: 94; Protocol: 2-Phase Commit -000905ms: PID 3: Message sent; ID: 95; Protocol: 2-Phase Commit - Integer: pid=3; Boolean: isVote=true; vote=true -000905ms: PID 3: Vote true sent -001880ms: PID 2: Message received; ID: 95; Protocol: 2-Phase Commit -001880ms: PID 2: Vote from Process 3 received! Result: true -001947ms: PID 1: Message received; ID: 94; Protocol: 2-Phase Commit -001947ms: PID 1: Vote true sent -003137ms: PID 2: Votes from all participating processes received! - Global result: true -003137ms: PID 2: Message sent; ID: 99; Protocol: 2-Phase Commit - Boolean: isVoteResult=true; voteResult=true -004124ms: PID 1: Global vote result received. Result: true -006051ms: PID 2: All participants have acknowledged the vote -010000ms: Simulation ended -``` - -Protocol variables (server-side): - -* Time until resend (Long: timeout = 2500): Milliseconds to wait before resending -* PIDs of participating processes (Integer[]: pids = [1,3]): Client PIDs that should vote and commit - -Protocol variables (client-side): - -* Commit probability (Integer: ackProb = 50): The probability in percent that the client votes true (for commit) - -### Basic Multicast Protocol - -=> ./distributed-systems-simulator/basic-multicast.png Visualization: The Basic Multicast Protocol with 3 processes. P2 (client) sends periodic multicast messages to servers P1 and P3. P3 crashes at 3000ms (shown in red) and recovers at 6000ms. Red lines indicate lost messages, blue lines show delivered messages. Some messages to P1 are also lost due to the 30% message loss probability. - -The Basic Multicast Protocol is very simple. The client always initiates the request, which represents a simple multicast message. The Basic Multicast servers serve only to receive the message. No acknowledgments are sent. The client P2 sends a multicast message every 2500ms to servers P1 and P3. - -P1 can only receive multicast messages after 2500ms because it does not support the protocol before then. P3 is crashed from 3000ms to 6000ms and also cannot receive messages during that time. Each process has a 30% message loss probability, so some messages are lost in transit (shown in red). - -In this example, the 3rd multicast message to P3 and the 5th and 6th messages to P1 were lost. Only the 4th multicast message reached both destinations. - -``` -Programmed Basic Multicast Events: - -| Time (ms) | PID | Event | -|-----------|-----|----------------------------------------| -| 0 | 2 | Basic Multicast Client activate | -| 0 | 3 | Basic Multicast Server activate | -| 0 | 2 | Basic Multicast Client request start | -| 2500 | 1 | Basic Multicast Server activate | -| 2500 | 2 | Basic Multicast Client request start | -| 3000 | 3 | Process crash | -| 5000 | 2 | Basic Multicast Client request start | -| 6000 | 3 | Process revival | -| 7500 | 2 | Basic Multicast Client request start | -| 10000 | 2 | Basic Multicast Client request start | -| 12500 | 2 | Basic Multicast Client request start | -``` - -### Reliable Multicast Protocol - -=> ./distributed-systems-simulator/reliable-multicast.png Visualization: The Reliable Multicast Protocol with 3 processes. P2 (client) sends multicast messages to servers P1 and P3, retrying until acknowledgments are received from all servers. P3 crashes at 3000ms and recovers at 10000ms. Red lines show lost messages, blue lines show delivered ones. Despite failures, all servers eventually receive and acknowledge the multicast. - -In the Reliable Multicast Protocol, the client periodically resends its multicast message until it has received an acknowledgment from all participating servers. After each retry, the client "forgets" which servers have already acknowledged, so each new attempt must be acknowledged again by all participants. - -In the example, P2 is the client and P1 and P3 are the servers. At 0ms, the client initiates its multicast message. The message loss probability is set to 30% on all processes. The client needs exactly 5 attempts until successful delivery: - -* Attempt 1: P1 doesn't support the protocol yet. P3 receives the message but its ACK is lost. -* Attempt 2: The message to P1 is lost. P3 receives it but is crashed and can't process it. -* Attempt 3: P1 receives the message and ACKs successfully. The message to P3 is lost. -* Attempt 4: P1 receives and ACKs again. P3 receives it but is still crashed. -* Attempt 5: Both P1 and P3 receive the message and ACK successfully. - -``` -Programmed Reliable Multicast Events: - -| Time (ms) | PID | Event | -|-----------|-----|------------------------------------------| -| 0 | 3 | Reliable Multicast Server activate | -| 0 | 2 | Reliable Multicast Client activate | -| 0 | 2 | Reliable Multicast Client request start | -| 2500 | 1 | Reliable Multicast Server activate | -| 3000 | 3 | Process crash | -| 10000 | 3 | Process revival | -``` - -Example log extract: - -``` -000000ms: PID 2: Reliable Multicast Client activated -000000ms: PID 2: Message sent; ID: 280; Protocol: Reliable Multicast - Boolean: isMulticast=true -000000ms: PID 3: Reliable Multicast Server activated -001590ms: PID 3: Message received; ID: 280; Protocol: Reliable Multicast -001590ms: PID 3: ACK sent -002500ms: PID 1: Reliable Multicast Server activated -002500ms: PID 2: Message sent; ID: 282; Protocol: Reliable Multicast - Boolean: isMulticast=true -003000ms: PID 3: Crashed -005000ms: PID 2: Message sent; ID: 283; Protocol: Reliable Multicast -005952ms: PID 1: Message received; ID: 283 -005952ms: PID 1: ACK sent -007937ms: PID 2: ACK from Process 1 received! -... -011813ms: PID 2: ACK from Process 3 received! -011813ms: PID 2: ACKs from all participating processes received! -015000ms: Simulation ended -``` - -Protocol variables (server-side): - -* Time until resend (Long: timeout = 2500): Milliseconds to wait before resending the multicast -* PIDs of participating processes (Integer[]: pids = [1,3]): Server PIDs that should receive the multicast - -## Additional Examples - -### Lamport and Vector Timestamps - -=> ./distributed-systems-simulator/lamport-timestamps.png Visualization: Lamport Timestamps displayed on the Berkeley Algorithm simulation. Each event on a process bar shows its Lamport timestamp as a number in parentheses. The timestamps increase monotonically and are updated according to the Lamport clock rules when messages are sent and received between P1, P2, and P3. - -> "For many purposes, it is sufficient that all machines agree on the same time. It is not necessary that this time also agrees with real time, like every hour announced on the radio... For a certain class of algorithms, only the internal consistency of clocks is important." - Andrew Tanenbaum - -Clocks that provide such a time are also known as logical clocks. Two implementations are realized in the simulator: Lamport timestamps and vector timestamps. - -After activating the Lamport time switch in expert mode, the current Lamport timestamp appears at every event of a process. Each process has its own Lamport timestamp that is incremented when a message is sent or received. Each message carries the current Lamport time t_l(i) of the sending process i. When another process j receives this message, its Lamport timestamp t_l(j) is recalculated as: - -``` -t_l(j) := 1 + max(t_l(j), t_l(i)) -``` - -The larger Lamport time of the sender and receiver process is used and then incremented by 1. After the Berkeley simulation shown here, P1 has Lamport timestamp 16, P2 has 14, and P3 has 15. - -=> ./distributed-systems-simulator/vector-timestamps.png Visualization: Vector Timestamps displayed on the same Berkeley Algorithm simulation. Each event shows its vector timestamp as a tuple (v1,v2,v3) representing the known state of all three processes. The tuples grow as processes communicate and merge their knowledge of each other's progress. - -With the active vector time switch, all vector timestamps are displayed. Like the Lamport timestamp, each message includes the current vector timestamp of the sending process. With n participating processes, the vector timestamp v has size n. Each participating process i has its own index, accessible via v(i). When v is the vector timestamp of the receiving process j and w is the vector timestamp of the sending process, the new local vector timestamp of process j is calculated as follows: - -``` -for (i := 0; i < n; i++) { - if (i = j) { - v(i)++; - } else if (v(i) < w(i)) { - v(i) := w(i); - } -} -``` - -By default, the vector timestamp is only incremented when a message is sent or received. In both cases, the sender and receiver each increment their own index in the vector timestamp by 1. Upon receiving a message, the local vector timestamp is then compared with the sender's, and the larger value is taken for all indices. - -After the simulation, P1 has vector timestamp (8,10,6), P2 has (6,10,6), and P3 has (6,10,8). - -The simulation settings include boolean variables "Lamport times affect all events" and "Vector times affect all events" (both default to false). When set to true, all events (not just message send/receive) will update the timestamps. - -### Simulating Slow Connections - -=> ./distributed-systems-simulator/slow-connection.png Visualization: Slow connection simulation comparing Internal Synchronization (P1) and Christian's Method (P3) with P2 as server. P3 has high transmission times (2000-8000ms) simulating a slow network connection. P1 synchronizes to 21446ms (error: -1446ms) while P3 only reaches 16557ms (error: -3443ms), showing how slow connections degrade synchronization quality. - -The simulator can also simulate slow connections to a specific process. This example revisits the comparison of Internal Synchronization (P1) and Christian's Method (P3), with P2 serving both. In this scenario, P3 has a poor network connection, so messages to and from P3 always require a longer transmission time. - -P3's minimum transmission time is set to 2000ms and maximum to 8000ms, while P1 and P2 keep the defaults (500ms/2000ms). The simulation duration is 20000ms. With the "Average transmission times" setting enabled, the effective transmission time for messages involving P3 is: - -``` -1/2 * (rand(500,2000) + rand(2000,8000)) = 1/2 * rand(2500,10000) = rand(1250,5000)ms -``` - -Because P3 starts a new request before receiving the answer to its previous one, and because it always associates server responses with its most recently sent request, its RTT calculations become incorrect on each round, and its local time is poorly synchronized. P1 synchronizes to 21446ms (error: -1446ms) while P3 only reaches 16557ms (error: -3443ms). - -### Raft Consensus Failover - -=> ./distributed-systems-simulator/raft-consensus-failover.png Screenshot: A 60-second Raft simulation with three processes. P1 starts as the initial leader, crashes at 3500ms, later recovers, P2 wins the reelection and remains leader, and P3 crashes later. The blue and red message lines show the continuing heartbeat and acknowledgment traffic during and after failover. - -While modernizing ds-sim, I also added a simplified Raft Consensus example. The simulation is intentionally small: three processes, one initial leader, one crash, a clean reelection, a recovery of the old leader, and then another crash later in the run. This makes it possible to see the most important Raft transitions without being overwhelmed by cluster size. - -The event log tells a very readable story. At `0ms`, `P1` starts as the initial leader in `term 0`. It immediately sends a heartbeat and an `appendEntry` message carrying the log entry `cmd1`. `P2` joins at `100ms`, `P3` at `1700ms`, and both acknowledge the leader's traffic. At that point the cluster is healthy: one leader, two followers, successful heartbeats, and successful log replication. - -At `3500ms`, `P1` crashes. The followers still process the last in-flight messages, but once the election timeout expires, `P2` becomes a candidate and sends a `voteRequest` for `term 1`. `P3` grants that vote, and at `9395ms` the log records the decisive line: - -``` -009395ms: PID: 2; ... Leader elected by majority vote: process 2 (term 1) -``` - -That transition is followed immediately by new heartbeats and a new `appendEntry`, which is exactly what you want to see in a Raft simulation: leadership is not just declared, it is exercised. - -At `12002ms`, the old leader `P1` recovers. Importantly, it does not try to reclaim control. Instead, it receives heartbeats from `P2` and answers with `heartbeatAck` messages, rejoining the cluster as a follower. That is one of the most useful teaching moments in the log, because it makes the term-based leadership model concrete: the recovered node does not become leader again just because it used to be one. - -At `20000ms`, `P3` crashes. The cluster continues running with `P2` as leader and `P1` as follower for the rest of the 60-second simulation. The log remains dominated by periodic heartbeats from `P2` and acknowledgments from `P1`, showing that the system stays stable even after a second failure. - -This single scenario demonstrates several core Raft properties in one replay: - -* Stable startup leadership -* Heartbeats and follower acknowledgments -* Log replication -* Leader failure detection -* Majority-based reelection -* Safe reintegration of a recovered former leader -* Continued service after a later follower crash - -It is also a good example of why a simulator is useful for distributed systems. In a real production system, reconstructing this sort of sequence would require stitching together logs from multiple nodes. Here, the message flow, the crashes, the recoveries, and the Lamport/vector timestamps are all visible in one place. - -## Protocol API - -The simulator was designed from the ground up to be extensible. Users can implement their own protocols in Java by extending the `VSAbstractProtocol` base class. Each protocol has its own class in the `protocols.implementations` package. - -### Class Hierarchy - -``` -VSAbstractEvent - +-- VSAbstractProtocol (base class for all protocols) - +-- VSDummyProtocol - +-- VSPingPongProtocol - +-- VSBroadcastProtocol - +-- VSInternalTimeSyncProtocol - +-- VSExternalTimeSyncProtocol - +-- VSBerkeleyTimeProtocol - +-- VSOnePhaseCommitProtocol - +-- VSTwoPhaseCommitProtocol - +-- VSBasicMulticastProtocol - +-- VSReliableMulticastProtocol -``` - -### Implementing a Custom Protocol - -Each protocol class must implement the following methods: - -* A public constructor: Must specify whether the client or the server initiates requests, using `VSAbstractProtocol.HAS_ON_CLIENT_START` or `VSAbstractProtocol.HAS_ON_SERVER_START`. -* `onClientInit()` / `onServerInit()`: Called once before the protocol is first used. Used to initialize protocol variables and attributes via the VSPrefs methods (e.g. `initVector`, `initLong`). Variables initialized this way appear in the process editor and can be configured by the user. -* `onClientReset()` / `onServerReset()`: Called each time the simulation is reset. -* `onClientStart()` / `onServerStart()`: Called when the client/server initiates a request. Typically creates and sends a `VSMessage` object. -* `onClientRecv(VSMessage)` / `onServerRecv(VSMessage)`: Called when a message arrives. -* `onClientSchedule()` / `onServerSchedule()`: Called when a scheduled alarm fires. -* `toString()`: Optional. Customizes log output for this protocol. - -### Available API Methods - -Methods inherited from `VSAbstractProtocol`: - -* `sendMessage(VSMessage message)`: Sends a protocol message (automatically updates Lamport and Vector timestamps) -* `hasOnServerStart()`: Whether the server or client initiates requests -* `isServer()` / `isClient()`: Whether the current process has the protocol activated as server/client -* `scheduleAt(long time)`: Creates an alarm that fires at the given local process time, triggering `onClientSchedule()` or `onServerSchedule()` -* `removeSchedules()`: Cancels all pending alarms in the current context -* `getNumProcesses()`: Returns the total number of processes in the simulation - -Process methods available via the inherited `process` attribute: - -* `getTime()` / `setTime(long)`: Get/set the local process time -* `getGlobalTime()`: Get the current global simulation time -* `getClockVariance()` / `setClockVariance(float)`: Get/set the clock drift -* `getLamportTime()` / `setLamportTime(long)`: Get/set the Lamport timestamp -* `getVectorTime()` / `updateVectorTime(VSVectorTime)`: Get/update the vector timestamp -* `getProcessID()`: Get the process PID -* `isCrashed()` / `isCrashed(boolean)`: Check or set crash state -* `getRandomPercentage()`: Get a random value between 0 and 100 - -Message methods (`VSMessage`): - -* `new VSMessage()`: Create a new message -* `getMessageID()`: Get the message NID -* `setBoolean(key, value)` / `getBoolean(key)`: Set/get boolean data -* `setInteger(key, value)` / `getInteger(key)`: Set/get integer data -* `setLong(key, value)` / `getLong(key)`: Set/get long data -* `setString(key, value)` / `getString(key)`: Set/get string data -* `getSendingProcess()`: Get a reference to the sending process -* `isServerMessage()`: Whether it's a server or client message - -### Example: Reliable Multicast Implementation - -Here is a condensed example showing key parts of the Reliable Multicast Protocol implementation: - -```java -public class VSReliableMulticastProtocol extends VSAbstractProtocol { - public VSReliableMulticastProtocol() { - // The client initiates requests - super(VSAbstractProtocol.HAS_ON_CLIENT_START); - super.setClassname(super.getClass().toString()); - } - - private ArrayList pids; - - // Initialize protocol variables (editable in the process editor) - public void onClientInit() { - Vector vec = new Vector(); - vec.add(1); vec.add(3); - super.initVector("pids", vec, "PIDs of participating processes"); - super.initLong("timeout", 2500, "Time until resend", "ms"); - } - - // Send multicast to all servers that haven't ACKed yet - public void onClientStart() { - if (pids.size() != 0) { - long timeout = super.getLong("timeout") + process.getTime(); - super.scheduleAt(timeout); - VSMessage message = new VSMessage(); - message.setBoolean("isMulticast", true); - super.sendMessage(message); - } - } - - // Handle ACK from a server - public void onClientRecv(VSMessage recvMessage) { - if (pids.size() != 0 && recvMessage.getBoolean("isAck")) { - Integer pid = recvMessage.getIntegerObj("pid"); - if (pids.contains(pid)) - pids.remove(pid); - super.log("ACK from Process " + pid + " received!"); - if (pids.size() == 0) { - super.log("ACKs from all processes received!"); - super.removeSchedules(); - } - } - } - - // Retry on timeout - public void onClientSchedule() { onClientStart(); } -} -``` - -## Project Statistics - -The original VS-Sim project (August 2008) was written in Java 6 and consisted of: - -* 61 source files across 12 Java packages -* Approximately 15,710 lines of code -* 2.2 MB of generated Javadoc documentation -* 142 KB compiled JAR file -* 10 built-in protocols -* 163 configurable settings - -The modernized successor ds-sim (version 1.1.0) has been updated to Java 21 and translated to English: - -* 146 source files (117 main + 29 test) across 19 Java packages -* Approximately 27,900 lines of code (22,400 main + 5,500 test) -* 12 built-in protocols -* 208 unit tests -* 269 configurable settings - -=> https://codeberg.org/snonux/ds-sim ds-sim source code on Codeberg -=> https://codeberg.org/snonux/vs-sim vs-sim source code on Codeberg (original German version, 2008) - -Other related posts are: - -<< template::inline::rindex java object-oriented-programming release - -E-Mail your comments to `paul@nospam.buetow.org` - -=> ../ Back to the main site diff --git a/gemfeed/atom.xml b/gemfeed/atom.xml index e1157fb0..3b4f21af 100644 --- a/gemfeed/atom.xml +++ b/gemfeed/atom.xml @@ -1,6233 +1,4535 @@ - 2026-03-28T00:29:48+02:00 + 2026-03-30T09:16:50+03:00 foo.zone feed To be in the .zone! gemini://foo.zone/ - RCM: The Ruby Configuration Management DSL - - gemini://foo.zone/gemfeed/2026-03-02-rcm-ruby-configuration-management-dsl.gmi - 2026-03-02T00:00:00+02:00 + Distributed Systems Simulator - Part 3: Advanced Examples and Protocol API + + gemini://foo.zone/gemfeed/2026-04-02-distributed-systems-simulator-part-3.gmi + 2026-04-02T00:00:00+03:00 Paul Buetow aka snonux paul@dev.buetow.org - RCM is a tiny configuration management system written in Ruby. It gives me a small DSL for describing how I want my machines to look, then it applies the changes: create files and directories, manage packages, and make sure certain lines exist in configuration files. It's deliberately KISS and optimised for a single person's machines instead of a whole fleet. + This is the third and final blog post of the Distributed Systems Simulator series. This part covers advanced simulation examples, the Raft consensus protocol, and the extensible Protocol API.
-

RCM: The Ruby Configuration Management DSL


+

Distributed Systems Simulator - Part 3: Advanced Examples and Protocol API



-Published at 2026-03-02T00:00:00+02:00
+Published at 2026-04-02T00:00:00+03:00

-RCM is a tiny configuration management system written in Ruby. It gives me a small DSL for describing how I want my machines to look, then it applies the changes: create files and directories, manage packages, and make sure certain lines exist in configuration files. It's deliberately KISS and optimised for a single person's machines instead of a whole fleet.
+This is the third and final blog post of the Distributed Systems Simulator series. This part covers advanced simulation examples, the Raft consensus protocol, and the extensible Protocol API.

-RCM DSL in action
+ds-sim on Codeberg (modernized, English-translated version)
+
+These are all the posts of this series:
+
+2026-03-31 Distributed Systems Simulator - Part 1: Introduction and GUI
+2026-04-01 Distributed Systems Simulator - Part 2: Built-in Protocols
+2026-04-02 Distributed Systems Simulator - Part 3: Advanced Examples and Protocol API (You are currently reading this)
+
+Screenshot: The Distributed Systems Simulator running a Broadcast protocol simulation with 6 processes. The visualization shows message lines between process bars, with blue indicating delivered messages and green indicating messages still in transit.

Table of Contents




-

Why I built RCM


+

Additional Examples



-I've used (and still use) the usual suspects in configuration management: Puppet, Ansible, etc. They are powerful, but also come with orchestration layers, agents, inventories, and a lot of moving parts. For my personal machines I wanted something smaller: one Ruby process, one configuration file, a few resource types, and good enough safety features.
-
-I've always been a fan of Ruby's metaprogramming features, and this project let me explore them in a focused, practical way.
+

Lamport and Vector Timestamps



-Because of that metaprogramming support, Ruby is a great fit for DSLs. You can get very close to natural language without inventing a brand-new syntax. RCM leans into that: the goal is to read a configuration and understand what happens without jumping between multiple files or templating languages.
+Visualization: Lamport Timestamps displayed on the Berkeley Algorithm simulation. Each event on a process bar shows its Lamport timestamp as a number in parentheses. The timestamps increase monotonically and are updated according to the Lamport clock rules when messages are sent and received between P1, P2, and P3.

-RCM repo on Codeberg
+"For many purposes, it is sufficient that all machines agree on the same time. It is not necessary that this time also agrees with real time, like every hour announced on the radio... For a certain class of algorithms, only the internal consistency of clocks is important." - Andrew Tanenbaum

-

How the DSL feels


+Clocks that provide such a time are also known as logical clocks. Two implementations are realized in the simulator: Lamport timestamps and vector timestamps.

-An RCM configuration starts with a configure block. Inside it you declare resources (file, package, given, notify, …). RCM figures out dependencies between resources and runs them in the right order.
+After activating the Lamport time switch in expert mode, the current Lamport timestamp appears at every event of a process. Each process has its own Lamport timestamp that is incremented when a message is sent or received. Each message carries the current Lamport time t_l(i) of the sending process i. When another process j receives this message, its Lamport timestamp t_l(j) is recalculated as:

- -
configure do
-  given { hostname is :earth }
-
-  file '/tmp/test/wg0.conf' do
-    requires file '/etc/hosts.test'
-    manage directory
-    from template
-    'content with <%= 1 + 2 %>'
-  end
-
-  file '/etc/hosts.test' do
-    line '192.168.1.101 earth'
-  end
-end
+
+t_l(j) := 1 + max(t_l(j), t_l(i))
 

-Which would look like this when run:
+The larger Lamport time of the sender and receiver process is used and then incremented by 1. After the Berkeley simulation shown here, P1 has Lamport timestamp 16, P2 has 14, and P3 has 15.

- -
% sudo ruby example.rb
-INFO 20260301-213817 dsl(0) => Configuring...
-INFO 20260301-213817 file('/tmp/test/wg0.conf') => Registered dependency on file('/etc/hosts.test')
-INFO 20260301-213817 file('/tmp/test/wg0.conf') => Evaluating...
-INFO 20260301-213817 file('/etc/hosts.test') => Evaluating...
-INFO 20260301-213817 file('/etc/hosts.test') => Writing file /etc/hosts.test
-INFO 20260301-213817 file('/tmp/test/wg0.conf') => Creating parent directory /tmp/test
-INFO 20260301-213817 file('/tmp/test/wg0.conf') => Writing file /tmp/test/wg0.conf
+Visualization: Vector Timestamps displayed on the same Berkeley Algorithm simulation. Each event shows its vector timestamp as a tuple (v1,v2,v3) representing the known state of all three processes. The tuples grow as processes communicate and merge their knowledge of each other's progress.
+
+With the active vector time switch, all vector timestamps are displayed. Like the Lamport timestamp, each message includes the current vector timestamp of the sending process. With n participating processes, the vector timestamp v has size n. Each participating process i has its own index, accessible via v(i). When v is the vector timestamp of the receiving process j and w is the vector timestamp of the sending process, the new local vector timestamp of process j is calculated as follows:
+
+
+for (i := 0; i < n; i++) {
+    if (i = j) {
+        v(i)++;
+    } else if (v(i) < w(i)) {
+        v(i) := w(i);
+    }
+}
 

-The idea is that you describe the desired state and RCM worries about the steps. The given block can short‑circuit the whole run (for example, only run on a specific hostname). Each file resource can either manage a complete file (from a template) or just make sure individual lines are present.
+By default, the vector timestamp is only incremented when a message is sent or received. In both cases, the sender and receiver each increment their own index in the vector timestamp by 1. Upon receiving a message, the local vector timestamp is then compared with the sender's, and the larger value is taken for all indices.

-

Keywords and resources


+After the simulation, P1 has vector timestamp (8,10,6), P2 has (6,10,6), and P3 has (6,10,8).

-Under the hood, each DSL word is either a keyword or a resource:
+The simulation settings include boolean variables "Lamport times affect all events" and "Vector times affect all events" (both default to false). When set to true, all events (not just message send/receive) will update the timestamps.

-
    -
  • Keyword is the base class for all top‑level DSL constructs.
  • -
  • Resource is the base class for things RCM can manage (files, packages, and so on).
  • -

-Resources can declare dependencies with requires. Before a resource runs, RCM makes sure all its requirements are satisfied and only evaluates each resource once per run. This keeps the mental model simple even when you compose more complex configurations.
+

Simulating Slow Connections



-

Files, directories, and templates


+Visualization: Slow connection simulation comparing Internal Synchronization (P1) and Christian's Method (P3) with P2 as server. P3 has high transmission times (2000-8000ms) simulating a slow network connection. P1 synchronizes to 21446ms (error: -1446ms) while P3 only reaches 16557ms (error: -3443ms), showing how slow connections degrade synchronization quality.

-The file resource handles three common cases:
+The simulator can also simulate slow connections to a specific process. This example revisits the comparison of Internal Synchronization (P1) and Christian's Method (P3), with P2 serving both. In this scenario, P3 has a poor network connection, so messages to and from P3 always require a longer transmission time.

-
    -
  • Managing parent directories (manage directory) so you don't have to create them manually.
  • -
  • Rendering ERB templates (from template) so you can mix Ruby expressions into config files.
  • -
  • Ensuring individual lines exist (line) for the many "append this line if missing" situations.
  • -

-Every write operation creates a backup copy in .rcmbackup/, so you can always inspect what changed and roll back manually if needed.
+P3's minimum transmission time is set to 2000ms and maximum to 8000ms, while P1 and P2 keep the defaults (500ms/2000ms). The simulation duration is 20000ms. With the "Average transmission times" setting enabled, the effective transmission time for messages involving P3 is:

-

How Ruby's metaprogramming helps


+
+1/2 * (rand(500,2000) + rand(2000,8000)) = 1/2 * rand(2500,10000) = rand(1250,5000)ms
+

-The nice thing about RCM is that the Ruby code you write in your configuration is not that different from the Ruby code inside RCM itself. The DSL is just a thin layer on top.
+Because P3 starts a new request before receiving the answer to its previous one, and because it always associates server responses with its most recently sent request, its RTT calculations become incorrect on each round, and its local time is poorly synchronized. P1 synchronizes to 21446ms (error: -1446ms) while P3 only reaches 16557ms (error: -3443ms).

-For example, when you write:
+

Raft Consensus Failover



- -
file '/etc/hosts.test' do
-  line '192.168.1.101 earth'
-end
-
+Screenshot: A 60-second Raft simulation with three processes. P1 starts as the initial leader, crashes at 3500ms, later recovers, P2 wins the reelection and remains leader, and P3 crashes later. The blue and red message lines show the continuing heartbeat and acknowledgment traffic during and after failover.

-Ruby turns file into a method call and '/etc/hosts.test' into a normal argument. Inside RCM, that method builds a File resource object and stores it for later. The block you pass is just a Ruby block; RCM calls it with the file resource as self, so method calls like line configure that resource. There is no special parser here, just plain Ruby method and block dispatch.
+While modernizing ds-sim, I also added a simplified Raft Consensus example. The simulation is intentionally small: three processes, one initial leader, one crash, a clean reelection, a recovery of the old leader, and then another crash later in the run. This makes it possible to see the most important Raft transitions without being overwhelmed by cluster size.

-The same goes for constructs like:
+The event log tells a very readable story. At 0ms, P1 starts as the initial leader in term 0. It immediately sends a heartbeat and an appendEntry message carrying the log entry cmd1. P2 joins at 100ms, P3 at 1700ms, and both acknowledge the leader's traffic. At that point the cluster is healthy: one leader, two followers, successful heartbeats, and successful log replication.

- -
given { hostname is :earth }
+At 3500ms, P1 crashes. The followers still process the last in-flight messages, but once the election timeout expires, P2 becomes a candidate and sends a voteRequest for term 1. P3 grants that vote, and at 9395ms the log records the decisive line:
+
+
+009395ms: PID: 2; ... Leader elected by majority vote: process 2 (term 1)
 

-RCM uses Ruby's dynamic method lookup to interpret hostname and is in that block and to decide whether the rest of the configuration should run at all. Features like method_missing, blocks, and the ability to change what self means in a block make this kind of DSL possible with very little code. You still get all the power of Ruby (conditionals, loops, helper methods), but the surface reads like a small language of its own.
+That transition is followed immediately by new heartbeats and a new appendEntry, which is exactly what you want to see in a Raft simulation: leadership is not just declared, it is exercised.

-

A bit more about method_missing


+At 12002ms, the old leader P1 recovers. Importantly, it does not try to reclaim control. Instead, it receives heartbeats from P2 and answers with heartbeatAck messages, rejoining the cluster as a follower. That is one of the most useful teaching moments in the log, because it makes the term-based leadership model concrete: the recovered node does not become leader again just because it used to be one.

-method_missing is one of the key tools that make the RCM DSL feel natural. In plain Ruby, if you call a method that does not exist, you get a NoMethodError. But before Ruby raises that error, it checks whether the object implements method_missing. If it does, Ruby calls that instead and lets the object decide what to do.
+At 20000ms, P3 crashes. The cluster continues running with P2 as leader and P1 as follower for the rest of the 60-second simulation. The log remains dominated by periodic heartbeats from P2 and acknowledgments from P1, showing that the system stays stable even after a second failure.

-In RCM, you can write things like:
+This single scenario demonstrates several core Raft properties in one replay:

- -
given { hostname is :earth }
+
    +
  • Stable startup leadership
  • +
  • Heartbeats and follower acknowledgments
  • +
  • Log replication
  • +
  • Leader failure detection
  • +
  • Majority-based reelection
  • +
  • Safe reintegration of a recovered former leader
  • +
  • Continued service after a later follower crash
  • +

+It is also a good example of why a simulator is useful for distributed systems. In a real production system, reconstructing this sort of sequence would require stitching together logs from multiple nodes. Here, the message flow, the crashes, the recoveries, and the Lamport/vector timestamps are all visible in one place.
+
+

Protocol API


+
+The simulator was designed from the ground up to be extensible. Users can implement their own protocols in Java by extending the VSAbstractProtocol base class. Each protocol has its own class in the protocols.implementations package.
+
+

Class Hierarchy


+
+
+VSAbstractEvent
+  +-- VSAbstractProtocol (base class for all protocols)
+        +-- VSDummyProtocol
+        +-- VSPingPongProtocol
+        +-- VSBroadcastProtocol
+        +-- VSInternalTimeSyncProtocol
+        +-- VSExternalTimeSyncProtocol
+        +-- VSBerkeleyTimeProtocol
+        +-- VSOnePhaseCommitProtocol
+        +-- VSTwoPhaseCommitProtocol
+        +-- VSBasicMulticastProtocol
+        +-- VSReliableMulticastProtocol
 

-Inside that block, calls such as hostname and is don't map to normal Ruby methods. Instead, RCM's DSL objects see those calls in method_missing, and interpret them as "check the current hostname" and "compare it to this symbol". This lets the DSL stay small and flexible: adding a new keyword can be as simple as handling another case in method_missing, without changing the Ruby syntax at all.
+

Implementing a Custom Protocol



-Put differently: you can write what looks like a tiny English sentence (hostname is :earth) and Ruby breaks it into method calls (hostname, then is) that RCM can interpret dynamically. Those "barewords" are not special syntax; they are just regular Ruby method names that the DSL catches and turns into configuration logic at runtime.
+Each protocol class must implement the following methods:

-Here's a simplified sketch of how such a condition object could look in Ruby:
+
    +
  • A public constructor: Must specify whether the client or the server initiates requests, using VSAbstractProtocol.HAS_ON_CLIENT_START or VSAbstractProtocol.HAS_ON_SERVER_START.
  • +
  • onClientInit() / onServerInit(): Called once before the protocol is first used. Used to initialize protocol variables and attributes via the VSPrefs methods (e.g. initVector, initLong). Variables initialized this way appear in the process editor and can be configured by the user.
  • +
  • onClientReset() / onServerReset(): Called each time the simulation is reset.
  • +
  • onClientStart() / onServerStart(): Called when the client/server initiates a request. Typically creates and sends a VSMessage object.
  • +
  • onClientRecv(VSMessage) / onServerRecv(VSMessage): Called when a message arrives.
  • +
  • onClientSchedule() / onServerSchedule(): Called when a scheduled alarm fires.
  • +
  • toString(): Optional. Customizes log output for this protocol.
  • +

+

Available API Methods


+
+Methods inherited from VSAbstractProtocol:
+
+
    +
  • sendMessage(VSMessage message): Sends a protocol message (automatically updates Lamport and Vector timestamps)
  • +
  • hasOnServerStart(): Whether the server or client initiates requests
  • +
  • isServer() / isClient(): Whether the current process has the protocol activated as server/client
  • +
  • scheduleAt(long time): Creates an alarm that fires at the given local process time, triggering onClientSchedule() or onServerSchedule()
  • +
  • removeSchedules(): Cancels all pending alarms in the current context
  • +
  • getNumProcesses(): Returns the total number of processes in the simulation
  • +

+Process methods available via the inherited process attribute:
+
+
    +
  • getTime() / setTime(long): Get/set the local process time
  • +
  • getGlobalTime(): Get the current global simulation time
  • +
  • getClockVariance() / setClockVariance(float): Get/set the clock drift
  • +
  • getLamportTime() / setLamportTime(long): Get/set the Lamport timestamp
  • +
  • getVectorTime() / updateVectorTime(VSVectorTime): Get/update the vector timestamp
  • +
  • getProcessID(): Get the process PID
  • +
  • isCrashed() / isCrashed(boolean): Check or set crash state
  • +
  • getRandomPercentage(): Get a random value between 0 and 100
  • +

+Message methods (VSMessage):
+
+
    +
  • new VSMessage(): Create a new message
  • +
  • getMessageID(): Get the message NID
  • +
  • setBoolean(key, value) / getBoolean(key): Set/get boolean data
  • +
  • setInteger(key, value) / getInteger(key): Set/get integer data
  • +
  • setLong(key, value) / getLong(key): Set/get long data
  • +
  • setString(key, value) / getString(key): Set/get string data
  • +
  • getSendingProcess(): Get a reference to the sending process
  • +
  • isServerMessage(): Whether it's a server or client message
  • +

+

Example: Reliable Multicast Implementation


+
+Here is a condensed example showing key parts of the Reliable Multicast Protocol implementation:

-
class HostCondition
-  def initialize
-    @current_hostname = Socket.gethostname.to_sym
-  end
+
public class VSReliableMulticastProtocol extends VSAbstractProtocol {
+    public VSReliableMulticastProtocol() {
+        // The client initiates requests
+        super(VSAbstractProtocol.HAS_ON_CLIENT_START);
+        super.setClassname(super.getClass().toString());
+    }
 
-  def method_missing(name, *args, &)
-    case name
-    when :hostname
-      @left = @current_hostname
-      self               # allow chaining: hostname is :earth
-    when :is
-      @left == args.first
-    else
-      super
-    end
-  end
-end
+    private ArrayList<Integer> pids;
 
-HostCondition.new.hostname.is(:earth)
+    // Initialize protocol variables (editable in the process editor)
+    public void onClientInit() {
+        Vector<Integer> vec = new Vector<Integer>();
+        vec.add(1); vec.add(3);
+        super.initVector("pids", vec, "PIDs of participating processes");
+        super.initLong("timeout", 2500, "Time until resend", "ms");
+    }
+
+    // Send multicast to all servers that haven't ACKed yet
+    public void onClientStart() {
+        if (pids.size() != 0) {
+            long timeout = super.getLong("timeout") + process.getTime();
+            super.scheduleAt(timeout);
+            VSMessage message = new VSMessage();
+            message.setBoolean("isMulticast", true);
+            super.sendMessage(message);
+        }
+    }
+
+    // Handle ACK from a server
+    public void onClientRecv(VSMessage recvMessage) {
+        if (pids.size() != 0 && recvMessage.getBoolean("isAck")) {
+            Integer pid = recvMessage.getIntegerObj("pid");
+            if (pids.contains(pid))
+                pids.remove(pid);
+            super.log("ACK from Process " + pid + " received!");
+            if (pids.size() == 0) {
+                super.log("ACKs from all processes received!");
+                super.removeSchedules();
+            }
+        }
+    }
+
+    // Retry on timeout
+    public void onClientSchedule() { onClientStart(); }
+}
 

-RCM's real code is more sophisticated, but the idea is the same: Ruby happily calls method_missing for unknown methods like hostname and is, and the DSL turns those calls into a value (true/false) that decides whether the rest of the configuration should run.
-
-

Ruby metaprogramming: further reading


+

Project Statistics



-If you want to dive deeper into the ideas behind RCM's DSL, these books are great starting points:
+The original VS-Sim project (August 2008) was written in Java 6 and consisted of:

    -
  • "Metaprogramming Ruby 2" by Paolo Perrotta
  • -
  • "The Well-Grounded Rubyist" by David A. Black (and others)
  • -
  • "Eloquent Ruby" by Russ Olsen
  • +
  • 61 source files across 12 Java packages
  • +
  • Approximately 15,710 lines of code
  • +
  • 2.2 MB of generated Javadoc documentation
  • +
  • 142 KB compiled JAR file
  • +
  • 10 built-in protocols
  • +
  • 163 configurable settings

-They all cover Ruby's object model, blocks, method_missing, and other metaprogramming techniques in much more detail than I can in a single blog post.
+The modernized successor ds-sim (version 1.1.0) has been updated to Java 21 and translated to English:

-

Safety, dry runs, and debugging


+
    +
  • 146 source files (117 main + 29 test) across 19 Java packages
  • +
  • Approximately 27,900 lines of code (22,400 main + 5,500 test)
  • +
  • 12 built-in protocols
  • +
  • 208 unit tests
  • +
  • 269 configurable settings
  • +

+ds-sim source code on Codeberg
+vs-sim source code on Codeberg (original German version, 2008)

-RCM has a --dry mode: it logs what it would do without actually touching the file system. I use this when iterating on new configurations or refactoring existing ones. Combined with the built‑in logging and debug output, it's straightforward to see which resources were scheduled and in which order.
+Other related posts are:

-Because RCM is just Ruby, there's no separate agent protocol or daemon. The same process parses the DSL, resolves dependencies, and performs the actions. If something goes wrong, you can drop into the code, add a quick debug statement, and re‑run your configuration.
+2026-03-01 Loadbars 0.13.0 released
+2022-12-24 (Re)learning Java - My takeaways
+2022-03-06 The release of DTail 4.0.0
+2016-11-20 Object oriented programming with ANSI C

-

RCM vs Puppet and other big tools


+E-Mail your comments to paul@nospam.buetow.org

-RCM does not try to compete with Puppet, Chef, or Ansible on scale. Those tools shine when you manage hundreds or thousands of machines, have multiple teams contributing modules, and need centralised orchestration, reporting, and role‑based access control. They also come with their own DSLs, servers/agents, certificate handling, and a long list of resource types and modules. Ansible may be more similar to RCM than the other tools, but it's still much more complex than RCM.
+Back to the main site
+
+
+
+ + Distributed Systems Simulator - Part 2: Built-in Protocols + + gemini://foo.zone/gemfeed/2026-04-01-distributed-systems-simulator-part-2.gmi + 2026-04-01T00:00:00+03:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + This is the second blog post of the Distributed Systems Simulator series. This part covers all 10 built-in protocols with examples. + +
+

Distributed Systems Simulator - Part 2: Built-in Protocols



-For my personal use cases, that layer is mostly overhead. I want:
+Published at 2026-04-01T00:00:00+03:00

-
    -
  • No extra daemon, message bus, or master node.
  • -
  • No separate DSL to learn besides Ruby itself.
  • -
  • A codebase small enough that I can understand and change all of it in an evening.
  • -
  • Behaviour I can inspect just by reading the Ruby code.
  • -

-In that space RCM wins: it is small, transparent, and tuned for one person (me!) with a handful of personal machines or my Laptops. I still think tools like Puppet are the right choice for larger organisations and shared infrastructure, but RCM gives me a tiny, focused alternative for my own systems.
+This is the second blog post of the Distributed Systems Simulator series. This part covers all 10 built-in protocols with examples.

-

Cutting RCM 0.1.0


+ds-sim on Codeberg (modernized, English-translated version)

-As of this post I'm tagging and releasing **RCM 0.1.0**. About 99% of the code has been written by me so far, and before AI agents take over more of the boilerplate and wiring work, it felt like a good moment to cut a release and mark this mostly‑human baseline.
+These are all the posts of this series:

-Future changes will very likely involve more automated help, but 0.1.0 is the snapshot of the original, hand‑crafted version of the tool.
+2026-03-31 Distributed Systems Simulator - Part 1: Introduction and GUI
+2026-04-01 Distributed Systems Simulator - Part 2: Built-in Protocols (You are currently reading this)
+2026-04-02 Distributed Systems Simulator - Part 3: Advanced Examples and Protocol API

-

What's next


+Screenshot: The Distributed Systems Simulator running a Broadcast protocol simulation with 6 processes. The visualization shows message lines between process bars, with blue indicating delivered messages and green indicating messages still in transit.

-RCM already does what I need on my machines, but there are a few ideas I want to explore:
+

Table of Contents




-

Feature overview (for now)


+

Protocols and Examples



-Here is a quick overview of what RCM can do today, grouped by area:
+The simulator comes with 10 built-in protocols. As described earlier, protocols are distinguished between server-side and client-side. Servers can respond to client messages, and clients can respond to server messages. Each process can support any number of protocols on both the client and server side. Users can also implement their own protocols using the simulator's Protocol API (see the Protocol API section).

-
    -
  • File management: file '/path', manage directory, from template, line '...'
  • -
  • Packages: package 'name' resources for installing and updating packages (currently focused on Fedora/DNF)
  • -
  • Conditions and flow: given { ... } blocks, predicates such as hostname is :earth
  • -
  • Notifications and dependencies: requires between resources, notify for follow‑up actions
  • -
  • Safety and execution modes: backups in .rcmbackup/, --dry runs, debug logging
  • -

-Some small examples adapted from RCM's own tests:
+The program directory contains a saved-simulations folder with example simulations for each protocol as serialized .dat files.

-

Template rendering into a file


+

Dummy Protocol



- -
configure do
-  file './.file_example.rcmtmp' do
-    from template
-    'One plus two is <%= 1 + 2 %>!'
-  end
-end
+The Dummy Protocol serves only as a template for creating custom protocols. When using the Dummy Protocol, only log messages are output when events occur. No further actions are performed.
+
+

Ping-Pong Protocol


+
+Visualization: The Ping-Pong Protocol showing two processes (P1 and P2) exchanging messages in a continuous back-and-forth pattern. Blue lines represent delivered messages bouncing between the process bars over a 15-second simulation.
+
+In the Ping-Pong Protocol, two processes -- Client P1 and Server P2 -- constantly send messages back and forth. The Ping-Pong client starts the first request, to which the server responds to the client. The client then responds again, and so on. Each message includes a counter that is incremented at each station and logged in the log window.
+
+
+Programmed Ping-Pong Events:
+
+| Time (ms) | PID | Event                          |
+|-----------|-----|--------------------------------|
+| 0         | 1   | Ping-Pong Client activate      |
+| 0         | 2   | Ping-Pong Server activate      |
+| 0         | 1   | Ping-Pong Client request start |
 

-

Ensuring a line is absent from a file


+It is important that Process 1 activates its Ping-Pong client before starting a Ping-Pong client request. Before a process can start a request, it must have the corresponding protocol activated. This also applies to all other protocols.

- -
configure do
-  file './.file_example.rcmtmp' do
-    line 'Whats up?'
-    is absent
-  end
-end
-
+**Ping-Pong Storm Variant**

-

Guarding a configuration run on the current hostname


+Visualization: The Ping-Pong Storm variant with three processes. P1 is the client, P2 and P3 are both servers. The visualization shows an exponentially growing number of messages as each client message generates two server responses, creating a dense web of blue and green message lines.

- -
configure do
-  given { hostname Socket.gethostname }
-  ...
-end
+By adding a third process P3 as an additional Ping-Pong server, a Ping-Pong "Storm" can be realized. Since every client message now receives two server responses, the number of messages doubles with each round, creating an exponential message flood.
+
+
+Programmed Ping-Pong Storm Events:
+
+| Time (ms) | PID | Event                          |
+|-----------|-----|--------------------------------|
+| 0         | 1   | Ping-Pong Client activate      |
+| 0         | 2   | Ping-Pong Server activate      |
+| 0         | 3   | Ping-Pong Server activate      |
+| 0         | 1   | Ping-Pong Client request start |
 

-

Creating and deleting directories, and purging a directory tree


+

Broadcast Protocol



- -
configure do
-  directory './.directory_example.rcmtmp' do
-    is present
-  end
+Visualization: The Broadcast Protocol with 6 processes (P1-P6). Dense crossing message lines show how a broadcast from P1 propagates to all processes, with each process re-broadcasting to others. Blue lines indicate delivered messages, green lines indicate messages still in transit.
+
+The Broadcast Protocol behaves similarly to the Ping-Pong Protocol. The difference is that the protocol tracks -- using a unique Broadcast ID -- which messages have already been sent. Each process re-broadcasts all received messages to others, provided it has not already sent them.
+
+In this case, no distinction is made between client and server, so that the same action is performed when a message arrives at either side. This makes it possible, using multiple processes, to create a broadcast. P1 is the client and starts a request at 0ms and 2500ms. The simulation duration is exactly 5000ms. Since a client can only receive server messages and a server can only receive client messages, every process in this simulation is both server and client.
+
+
+Programmed Broadcast Events:
 
-  directory delete do
-    path './.directory_example.rcmtmp'
-    is absent
-  end
-end
+| Time (ms) | PID | Event                            |
+|-----------|-----|----------------------------------|
+| 0         | 1-6 | Broadcast Client activate        |
+| 0         | 1-6 | Broadcast Server activate        |
+| 0         | 1   | Broadcast Client request start   |
+| 2500      | 1   | Broadcast Client request start   |
 

-

Managing file and directory modes and ownership


+

Internal Synchronization Protocol



- -
configure do
-  touch './.mode_example.rcmtmp' do
-    mode 0o600
-  end
-
-  directory './.mode_example_dir.rcmtmp' do
-    mode 0o705
-  end
-end
+Visualization: Internal Synchronization with 2 processes. P1 (client, clock drift 0.1) shows a faster-running clock reaching 15976ms by simulation end. The blue message lines show P1 periodically synchronizing with P2 (server, no drift), with the time corrections visible as slight adjustments in P1's timeline.
+
+The Internal Synchronization Protocol is used for synchronizing the local process time, which can be applied when a process time is running incorrectly due to clock drift. When the client wants to synchronize its (incorrect) local process time t_c with a server, it sends a client request. The server responds with its own local process time t_s, allowing the client to calculate a new, more accurate time for itself.
+
+After receiving the server response, the client P1 calculates its new local process time as:
+
+
+t_c := t_s + 1/2 * (t'_min + t'_max)
 

-

Using a chained, more natural language style for notifications


+This synchronizes P1's local time with an error of less than 1/2 * (t'_max - t'_min), where t'_min and t'_max are the assumed minimum and maximum transmission times configured in the protocol settings.

-This will just print out something, not changing anything:
+In the example, the client process has a clock drift of 0.1 and the server has 0.0. The client starts a request at local process times 0ms, 5000ms, and 10000ms. By simulation end, P1's time is synchronized to 15976ms (an error of -976ms from the global 15000ms).

- -
configure do
-  notify hello dear world do
-    thank you to be part of you
-  end
-end
+
+Programmed Internal Sync Events:
+
+| Time (ms) | PID | Event                              |
+|-----------|-----|------------------------------------|
+| 0         | 1   | Internal Sync Client activate      |
+| 0         | 2   | Internal Sync Server activate      |
+| 0         | 1   | Internal Sync Client request start |
+| 5000      | 1   | Internal Sync Client request start |
+| 10000     | 1   | Internal Sync Client request start |
 

-

Touching files and updating their timestamps


+Protocol variables (client-side):

- -
configure do
-  touch './.touch_example.rcmtmp'
-end
+
    +
  • Min. transmission time (Long: 500): The assumed t'_min in milliseconds
  • +
  • Max. transmission time (Long: 2000): The assumed t'_max in milliseconds
  • +

+These can differ from the actual message transmission times t_min and t_max, allowing simulation of scenarios where the protocol is misconfigured and large synchronization errors occur.
+
+

Christian's Method (External Synchronization)


+
+Visualization: Comparison of Internal Synchronization (P1) and Christian's Method (P3) with P2 as shared server. Both P1 and P3 have clock drift 0.1. The visualization shows P1 synchronized to 14567ms (error: -433ms) while P3 synchronized to 15539ms (error: -539ms), demonstrating the different accuracy of the two methods.
+
+Christian's Method uses the RTT (Round Trip Time) to approximate the transmission time of individual messages. When the client wants to synchronize its local time t_c with a server, it sends a request and measures the RTT t_rtt until the server response arrives. The server response contains the local process time t_s from the moment the server sent the response. The client then calculates its new local time as:
+
+
+t_c := t_s + 1/2 * t_rtt
 

-

Expressing dependencies between notifications


+The accuracy is +/- (1/2 * t_rtt - u_min) where u_min is a lower bound for message transmission time.

- -
configure do
-  notify foo do
-    requires notify bar and requires notify baz
-    'foo_message'
-  end
-
-  notify bar
+The visualization compares both synchronization methods side by side: P1 uses Internal Synchronization and P3 uses Christian's Method, with P2 serving both. Both P1 and P3 have clock drift 0.1. In this particular run, Internal Synchronization achieved a better result (-433ms error vs. -539ms), though results vary between runs due to random transmission times.
+
+
+Programmed Comparison Events:
 
-  notify baz do
-    requires notify bar
-    'baz_message'
-  end
-end
+| Time (ms) | PID | Event                                |
+|-----------|-----|--------------------------------------|
+| 0         | 1   | Internal Sync Client activate        |
+| 0         | 1   | Internal Sync Client request start   |
+| 0         | 2   | Christian's Server activate          |
+| 0         | 2   | Internal Sync Server activate        |
+| 0         | 3   | Christian's Client activate          |
+| 0         | 3   | Christian's Client request start     |
+| 5000      | 1   | Internal Sync Client request start   |
+| 5000      | 3   | Christian's Client request start     |
+| 10000     | 1   | Internal Sync Client request start   |
+| 10000     | 3   | Christian's Client request start     |
 

-
+

Berkeley Algorithm



- -
configure do
-  symlink './.symlink_example.rcmtmp' do
-    manage directory
-    './.symlink_target_example.rcmtmp'
-  end
-end
+Visualization: The Berkeley Algorithm with 3 processes. P2 is the server (coordinator) sending time requests to clients P1 and P3. After collecting responses, P2 calculates correction values and sends them back. Final times show P1=16823ms, P2=14434ms, P3=13892ms -- all brought closer together through averaging.
+
+The Berkeley Algorithm is another method for synchronizing local clocks. This is the first protocol where the server initiates the requests. The server acts as a coordinator. The client processes are passive and must wait until a server request arrives. The server must know which client processes participate in the protocol, which is configured in the server's protocol settings.
+
+When the server wants to synchronize its local time t_s and the process times t_i of the clients (i = 1,...,n), it sends a server request. n is the number of participating clients. The clients then send their local process times back to the server. The server measures the RTTs r_i for all client responses.
+
+After all responses are received, the server sets its own time to the average t_avg of all known process times (including its own). The transmission time of a client response is estimated as half the RTT:
+
+
+t_avg := 1/(n+1) * (t_s + SUM(r_i/2 + t_i))
+t_s := t_avg
 

-

Detecting duplicate resource definitions at configure time


+The server then calculates a correction value k_i := t_avg - t_i for each client and sends it back. Each client sets its new time to t'_i := t'_i + k_i.

- -
configure do
-  notify :foo
-  notify :foo # raises RCM::DSL::DuplicateResource
-end
+
+Programmed Berkeley Events:
+
+| Time (ms) | PID | Event                             |
+|-----------|-----|-----------------------------------|
+| 0         | 1   | Berkeley Client activate          |
+| 0         | 2   | Berkeley Server activate          |
+| 0         | 3   | Berkeley Client activate          |
+| 0         | 2   | Berkeley Server request start     |
+| 7500      | 2   | Berkeley Server request start     |
 

-If you find RCM interesting, feel free to browse the code, adapt it to your own setup, or just steal ideas for your own Ruby DSLs. I will probably extend it with more features over time as my own needs evolve.
+Protocol variables (server-side):

-E-Mail your comments to paul@nospam.buetow.org :-)
+
    +
  • PIDs of participating processes (Integer[]: [1,3]): The PIDs of the Berkeley client processes. The protocol will not work if a non-existent PID is specified or if the process does not support the Berkeley protocol on the client side.
  • +

+

One-Phase Commit Protocol



-Other related posts:
+Visualization: The One-Phase Commit Protocol with 3 processes. P1 crashes at 1000ms (shown in red) and recovers at 5000ms. P2 (server) periodically sends commit requests. The red lines show lost messages during P1's crash period, while blue lines show successful message exchanges after recovery.

-2026-03-02 RCM: The Ruby Configuration Management DSL (You are currently reading this)
-2025-10-11 Key Takeaways from The Well-Grounded Rubyist
-2021-07-04 The Well-Grounded Rubyist
-2016-04-09 Jails and ZFS with Puppet on FreeBSD
+The One-Phase Commit Protocol is designed to move any number of clients to a commit. In practice, this could be creating or deleting a file that each client has a local copy of. The server is the coordinator and initiates the commit request. The server periodically resends the commit request until every client has acknowledged it. For this purpose, the PIDs of all participating client processes and a timer for resending must be configured.

-Back to the main site
-
-
-
- - Site Reliability Engineering - Part 5: System Design, Incidents, and Learning - - gemini://foo.zone/gemfeed/2026-03-01-site-reliability-engineering-part-5.gmi - 2026-03-01T12:00:00+02:00 - - Paul Buetow aka snonux - paul@dev.buetow.org - - Welcome to Part 5 of my Site Reliability Engineering (SRE) series. I'm currently working as a Site Reliability Engineer, and I'm here to share what SRE is all about in this blog series. - -
-

Site Reliability Engineering - Part 5: System Design, Incidents, and Learning


+In the example, P1 and P3 are clients and P2 is the server. P1 crashes at 1000ms and recovers at 5000ms. The first two commit requests fail to reach P1 due to its crash. Only the third attempt succeeds. Each client acknowledges a commit request only once.

-Published at 2026-03-01T12:00:00+02:00
+
+Programmed One-Phase Commit Events:
+
+| Time (ms) | PID | Event                                  |
+|-----------|-----|----------------------------------------|
+| 0         | 1   | 1-Phase Commit Client activate         |
+| 0         | 2   | 1-Phase Commit Server activate         |
+| 0         | 3   | 1-Phase Commit Client activate         |
+| 0         | 2   | 1-Phase Commit Server request start    |
+| 1000      | 1   | Process crash                          |
+| 5000      | 1   | Process revival                        |
+

-Welcome to Part 5 of my Site Reliability Engineering (SRE) series. I'm currently working as a Site Reliability Engineer, and I'm here to share what SRE is all about in this blog series.
+Protocol variables (server-side):

-2023-08-18 Site Reliability Engineering - Part 1: SRE and Organizational Culture
-2023-11-19 Site Reliability Engineering - Part 2: Operational Balance
-2024-01-09 Site Reliability Engineering - Part 3: On-Call Culture
-2024-09-07 Site Reliability Engineering - Part 4: Onboarding for On-Call Engineers
-2026-03-01 Site Reliability Engineering - Part 5: System Design, Incidents, and Learning (You are currently reading this)
+
    +
  • Time until resend (Long: timeout = 2500): Milliseconds to wait before resending the commit request
  • +
  • PIDs of participating processes (Integer[]: pids = [1,3]): The client process PIDs that should commit
  • +

+

Two-Phase Commit Protocol


+
+Visualization: The Two-Phase Commit Protocol with 3 processes. P2 (server) orchestrates a two-phase voting process with clients P1 and P3. The complex message pattern shows the voting phase followed by the commit/abort phase, with messages crossing between all three processes over a 10-second simulation.
+
+The Two-Phase Commit Protocol is an extension of the One-Phase Commit Protocol. The server first sends a request to all participating clients asking whether they want to commit. Each client responds with true or false. The server periodically retries until all results are collected. After receiving all votes, the server checks whether all clients voted true. If at least one client voted false, the commit process is aborted and a global result of false is sent to all clients. If all voted true, the global result true is sent. The global result is periodically resent until each client acknowledges receipt.
+
+In the example, P1 and P3 are clients and P2 is the server. The server sends its first request at 0ms. Here both P1 and P3 vote true, so the commit proceeds.

-    ___
-   /   \     resilience
-  |  o  |  <----------  learning
-   \___/
+Programmed Two-Phase Commit Events:
+
+| Time (ms) | PID | Event                                  |
+|-----------|-----|----------------------------------------|
+| 0         | 1   | 2-Phase Commit Client activate         |
+| 0         | 2   | 2-Phase Commit Server activate         |
+| 0         | 3   | 2-Phase Commit Client activate         |
+| 0         | 2   | 2-Phase Commit Server request start    |
 

-This time I want to share some themes that build on what we've already covered: how system design and incident analysis fit together, why observability should not be an afterthought, and how a design‑improvement loop keeps systems getting better. Let's dive in!
+Example log extract showing the two-phase voting process:

-

Table of Contents


+
+000000ms: PID 2: Message sent; ID: 94; Protocol: 2-Phase Commit
+                  Boolean: wantVote=true
+000905ms: PID 3: Message received; ID: 94; Protocol: 2-Phase Commit
+000905ms: PID 3: Message sent; ID: 95; Protocol: 2-Phase Commit
+                  Integer: pid=3; Boolean: isVote=true; vote=true
+000905ms: PID 3: Vote true sent
+001880ms: PID 2: Message received; ID: 95; Protocol: 2-Phase Commit
+001880ms: PID 2: Vote from Process 3 received! Result: true
+001947ms: PID 1: Message received; ID: 94; Protocol: 2-Phase Commit
+001947ms: PID 1: Vote true sent
+003137ms: PID 2: Votes from all participating processes received!
+                  Global result: true
+003137ms: PID 2: Message sent; ID: 99; Protocol: 2-Phase Commit
+                  Boolean: isVoteResult=true; voteResult=true
+004124ms: PID 1: Global vote result received. Result: true
+006051ms: PID 2: All participants have acknowledged the vote
+010000ms: Simulation ended
+
+
+Protocol variables (server-side):


-

System Design and Incident Analysis


+Protocol variables (client-side):

-In my experience, a big chunk of SRE work revolves around system design and incident analysis. The thing that really matters is whether your system can contain cascading failures—because if it can't, one bad component can take everything down.
+
    +
  • Commit probability (Integer: ackProb = 50): The probability in percent that the client votes true (for commit)
  • +

+

Basic Multicast Protocol



-

Resilience and cascading failures


+Visualization: The Basic Multicast Protocol with 3 processes. P2 (client) sends periodic multicast messages to servers P1 and P3. P3 crashes at 3000ms (shown in red) and recovers at 6000ms. Red lines indicate lost messages, blue lines show delivered messages. Some messages to P1 are also lost due to the 30% message loss probability.

-What I've seen work well is thinking about resilience early—at design time, not after the first outage. You look for the weak points, address them before production, and try to keep the blast radius small when (not if) something fails.
+The Basic Multicast Protocol is very simple. The client always initiates the request, which represents a simple multicast message. The Basic Multicast servers serve only to receive the message. No acknowledgments are sent. The client P2 sends a multicast message every 2500ms to servers P1 and P3.

-

Learning from incidents


+P1 can only receive multicast messages after 2500ms because it does not support the protocol before then. P3 is crashed from 3000ms to 6000ms and also cannot receive messages during that time. Each process has a 30% message loss probability, so some messages are lost in transit (shown in red).

-When incidents do happen, their analysis is a goldmine. Every incident exposes gaps—whether in tooling (ops tools that aren't up to the job) or in skills (engineers missing critical know-how). Blaming "human error" doesn't help. The job is to dig into root causes and fix the system. Postmortems that focus on customer impact help us distil lessons and make the system more robust so we're less likely to repeat the same failure.
+In this example, the 3rd multicast message to P3 and the 5th and 6th messages to P1 were lost. Only the 4th multicast message reached both destinations.

-System design and incident analysis form a feedback loop: we improve the design based on what we learn from incidents, and a better design reduces the impact of the next one.
+
+Programmed Basic Multicast Events:
+
+| Time (ms) | PID | Event                                  |
+|-----------|-----|----------------------------------------|
+| 0         | 2   | Basic Multicast Client activate        |
+| 0         | 3   | Basic Multicast Server activate        |
+| 0         | 2   | Basic Multicast Client request start   |
+| 2500      | 1   | Basic Multicast Server activate        |
+| 2500      | 2   | Basic Multicast Client request start   |
+| 3000      | 3   | Process crash                          |
+| 5000      | 2   | Basic Multicast Client request start   |
+| 6000      | 3   | Process revival                        |
+| 7500      | 2   | Basic Multicast Client request start   |
+| 10000     | 2   | Basic Multicast Client request start   |
+| 12500     | 2   | Basic Multicast Client request start   |
+

-

Observability: Don't leave it for when it's too late


+

Reliable Multicast Protocol



-Here's something I've seen over and over: teams agree that "we need better observability" when they're already in the middle of an incident—and by then it's too late. Observability is always an afterthought compared to product features. But you really need it in place before things go wrong. Tools that can query high-cardinality data and give you granular insight into what's happening—that's what saves you when chaos hits. So invest in it early. Trust me on this one.
+Visualization: The Reliable Multicast Protocol with 3 processes. P2 (client) sends multicast messages to servers P1 and P3, retrying until acknowledgments are received from all servers. P3 crashes at 3000ms and recovers at 10000ms. Red lines show lost messages, blue lines show delivered ones. Despite failures, all servers eventually receive and acknowledge the multicast.

-

The iterative spirit


+In the Reliable Multicast Protocol, the client periodically resends its multicast message until it has received an acknowledgment from all participating servers. After each retry, the client "forgets" which servers have already acknowledged, so each new attempt must be acknowledged again by all participants.

-We also accept that system design is never "done." We refine it based on real-world performance, incident learnings, and changing needs. Every incident is a chance to learn and improve; the emphasis is on learning, not blame. SREs work with developers, backend teams, and incident response so that the whole system keeps getting better. It's never perfect, but that's kind of the point.
+In the example, P2 is the client and P1 and P3 are the servers. At 0ms, the client initiates its multicast message. The message loss probability is set to 30% on all processes. The client needs exactly 5 attempts until successful delivery:

-

Book tips


+
    +
  • Attempt 1: P1 doesn't support the protocol yet. P3 receives the message but its ACK is lost.
  • +
  • Attempt 2: The message to P1 is lost. P3 receives it but is crashed and can't process it.
  • +
  • Attempt 3: P1 receives the message and ACKs successfully. The message to P3 is lost.
  • +
  • Attempt 4: P1 receives and ACKs again. P3 receives it but is still crashed.
  • +
  • Attempt 5: Both P1 and P3 receive the message and ACK successfully.
  • +

+
+Programmed Reliable Multicast Events:
+
+| Time (ms) | PID | Event                                    |
+|-----------|-----|------------------------------------------|
+| 0         | 3   | Reliable Multicast Server activate       |
+| 0         | 2   | Reliable Multicast Client activate       |
+| 0         | 2   | Reliable Multicast Client request start  |
+| 2500      | 1   | Reliable Multicast Server activate       |
+| 3000      | 3   | Process crash                            |
+| 10000     | 3   | Process revival                          |
+

-If you want to go deeper, here are a few books I can recommend:
+Example log extract:
+
+
+000000ms: PID 2: Reliable Multicast Client activated
+000000ms: PID 2: Message sent; ID: 280; Protocol: Reliable Multicast
+                  Boolean: isMulticast=true
+000000ms: PID 3: Reliable Multicast Server activated
+001590ms: PID 3: Message received; ID: 280; Protocol: Reliable Multicast
+001590ms: PID 3: ACK sent
+002500ms: PID 1: Reliable Multicast Server activated
+002500ms: PID 2: Message sent; ID: 282; Protocol: Reliable Multicast
+                  Boolean: isMulticast=true
+003000ms: PID 3: Crashed
+005000ms: PID 2: Message sent; ID: 283; Protocol: Reliable Multicast
+005952ms: PID 1: Message received; ID: 283
+005952ms: PID 1: ACK sent
+007937ms: PID 2: ACK from Process 1 received!
+...
+011813ms: PID 2: ACK from Process 3 received!
+011813ms: PID 2: ACKs from all participating processes received!
+015000ms: Simulation ended
+
+
+Protocol variables (server-side):

    -
  • 97 Things Every SRE Should Know: Collective Wisdom from the Experts by Emily Stolarsky and Jaime Woo
  • -
  • Site Reliability Engineering: How Google Runs Production Systems by Jennifer Petoff, Niall Murphy, Betsy Beyer, and Chris Jones
  • -
  • Implementing Service Level Objectives by Alex Hidalgo
  • +
  • Time until resend (Long: timeout = 2500): Milliseconds to wait before resending the multicast
  • +
  • PIDs of participating processes (Integer[]: pids = [1,3]): Server PIDs that should receive the multicast

-E-Mail your comments to paul@nospam.buetow.org :-)
+Read the next post of this series:
+
+Distributed Systems Simulator - Part 3: Advanced Examples and Protocol API
+
+Other related posts are:
+
+2026-03-01 Loadbars 0.13.0 released
+2022-12-24 (Re)learning Java - My takeaways
+2022-03-06 The release of DTail 4.0.0
+2016-11-20 Object oriented programming with ANSI C
+
+E-Mail your comments to paul@nospam.buetow.org

Back to the main site
- Loadbars 0.13.0 released - - gemini://foo.zone/gemfeed/2026-03-01-loadbars-0.13.0-released.gmi - 2026-03-01T00:00:00+02:00 + Distributed Systems Simulator - Part 1: Introduction and GUI + + gemini://foo.zone/gemfeed/2026-03-31-distributed-systems-simulator-part-1.gmi + 2026-03-31T00:00:00+03:00 Paul Buetow aka snonux paul@dev.buetow.org - Loadbars is a real-time server load monitoring tool. It connects to one or more Linux hosts via SSH and shows CPU, memory, network, load average, and disk I/O as vertical colored bars in an SDL window. You can run it locally or point it at your servers and see what's happening right now — like `top` or `vmstat`, but visual and across multiple hosts at once. + This is the first blog post of the Distributed Systems Simulator series, written for the recent v1.1.0 release. It explores the Java-based Distributed Systems Simulator program I created as my diploma thesis at the Aachen University of Applied Sciences (August 2008). The simulator offers both built-in implementations of common distributed systems algorithms and an extensible framework that allows researchers and practitioners to implement and test their own custom protocols within the simulation environment.
-

Loadbars 0.13.0 released


+

Distributed Systems Simulator - Part 1: Introduction and GUI



-Published at 2026-03-01T00:00:00+02:00
+Published at 2026-03-31T00:00:00+03:00

-Loadbars is a real-time server load monitoring tool. It connects to one or more Linux hosts via SSH and shows CPU, memory, network, load average, and disk I/O as vertical colored bars in an SDL window. You can run it locally or point it at your servers and see what's happening right now — like top or vmstat, but visual and across multiple hosts at once.
+This is the first blog post of the Distributed Systems Simulator series, written for the recent v1.1.0 release. It explores the Java-based Distributed Systems Simulator program I created as my diploma thesis at the Aachen University of Applied Sciences (August 2008). The simulator offers both built-in implementations of common distributed systems algorithms and an extensible framework that allows researchers and practitioners to implement and test their own custom protocols within the simulation environment.

-Loadbars in action
+ds-sim on Codeberg (modernized, English-translated version)

-Loadbars can connect to hundreds of servers in parallel; the GIF above doesn't do it justice — at scale you get a wall of bars that makes it easy to spot outliers and compare hosts at a glance.
+These are all the posts of this series:

-Loadbars on Codeberg
+2026-03-31 Distributed Systems Simulator - Part 1: Introduction and GUI (You are currently reading this)
+2026-04-01 Distributed Systems Simulator - Part 2: Built-in Protocols
+2026-04-02 Distributed Systems Simulator - Part 3: Advanced Examples and Protocol API
+
+Screenshot: The Distributed Systems Simulator running a Broadcast protocol simulation with 6 processes. The visualization shows message lines between process bars, with blue indicating delivered messages and green indicating messages still in transit.

Table of Contents




-

What Loadbars is (and isn't)


+

Motivation



-Loadbars shows the current state only. It is not a tool for collecting loads and drawing graphs for later analysis. There is no history, no recording, no database. Tools like Prometheus or Grafana require significant setup before producing results. Loadbars lets you observe the current state immediately: one binary, SSH (or local), and you're done.
+Distributed systems are complex—interactions between nodes, network partitions, failure scenarios are hard to debug in production. A simulator lets you experiment with architectures, observe how systems behave under failure, and learn consensus algorithms, replication strategies, and fault tolerance in a controlled, repeatable environment. No operational overhead, no real infrastructure—just focused exploration of system design.

-
-┌─ Loadbars 0.13.0 ─────────────────────────────────────────┐
-│                                                           │
-│  ████  ████  ████  ██  ████  ████  ████  ██  ░░██  ░░██   │
-│  ████  ████  ████  ██  ████  ████  ████  ██  ░░██  ░░██   │
-│  ████  ████  ████  ██  ████  ████  ████  ██  ░░██  ░░██   │
-│   CPU   cpu0  cpu1  mem  CPU   cpu0  cpu1  mem  net   net │
-│  └──── host1 ────┘      └──── host2 ────┘                 │
-└───────────────────────────────────────────────────────────┘
-
-
-

Use cases


-
-
    -
  • Deployments and rollouts: watch CPU, memory, and network across app servers or nodes while you deploy. Spot the one that isn't coming up or is stuck under load.
  • -
  • Load testing: run your load tool against a cluster and see which hosts (or cores) are saturated, whether memory or disk I/O is the bottleneck, and how load spreads.
  • -
  • Quick health sweep: no dashboards set up yet? SSH to a handful of hosts and run Loadbars. You get an instant picture of who's busy, who's idle, and who's swapping.
  • -
  • Comparing hosts: side-by-side bars make it easy to see if one machine is hotter than the rest (e.g. after a config change or migration).
  • -
  • Local tuning: run loadbars --hosts localhost while you benchmark or stress a single box; the bars and load-average view help correlate activity with what you're doing.
  • -

-

What's new since the Perl version


+In the literature, one can find many different definitions of a distributed system. Many of these definitions differ from each other, making it difficult to find a single definition that stands alone as the correct one. Andrew Tanenbaum and Maarten van Steen chose the following loose characterization for describing a distributed system:

-The original Loadbars (Perl + SDL, ~2010–2013) had CPU, memory, network, ClusterSSH, and a config file. The Go rewrite and subsequent releases added the following. Why each one matters:
+"A distributed system is a collection of independent computers that appears to its users as a single coherent system" - Andrew Tanenbaum

-
    -
  • Load average bars: the Perl version had no load average. Now you get 1/5/15-minute load per host. Useful because load average is the classic "how queued is this box" signal — you see saturation and trends at a glance without reading numbers.
  • -

-
    -
  • Disk I/O bars: disk was invisible in the Perl version. You now get read/write throughput (and optionally utilization %) per host or per device. Whole-disk devices only (partitions, loop, ram, zram, and device-mapper are excluded). Useful when you need to tell "is this slow because of CPU or because of disk?" — especially with many hosts, one disk-heavy host stands out. Disk smoothing (config diskaverage, hotkeys b/x) lets you tune how much the bars are averaged.
  • -

-
    -
  • Extended peak line on CPU: a 1px line shows max system+user over the last N samples. Useful to see short spikes that the stacked bar might smooth out, so you don't miss bursty load.
  • -

-
    -
  • Tooltips and host highlight: hover the mouse over any bar to see a tooltip with exact values (CPU %, memory, network, load, or disk depending on bar type). The hovered host's bars are highlighted (inverted) so you can tell which host you're over. Useful when you have hundreds of bars and want to read a specific number or confirm which host a bar belongs to.
  • -

-
    -
  • GuestNice in CPU bars: CPU bars now show GuestNice as a lime green segment (above Nice). One more breakdown for virtualized or container workloads.
  • -

-
    -
  • Version in window title: the default SDL title is "Loadbars <version> (press h for help on stdout)". Override with --title when you need a custom label.
  • -

-
    -
  • Global average CPU line (key g): a single red line across all hosts at the fleet-average CPU. Useful when you have hundreds of bars: you instantly see which hosts are above or below average without comparing bar heights in your head.
  • -

-
    -
  • Global I/O average line (key i): same idea for iowait+IRQ. Useful to spot which hosts are waiting on I/O more than the rest — quick way to find the disk-bound or interrupt-heavy machines.
  • -

-
    -
  • Host separator lines (key s): a thin red vertical line between each host's bars. Useful at scale so you don't lose track of where one host ends and the next begins when the window is full of bars.
  • -

-
    -
  • Scale reset (key r): reset the auto-scale for load and disk back to the floor. Useful after a big spike so the bars don't stay compressed for the rest of the session.
  • -

-
    -
  • Toggle CPU off (key 1 cycles through aggregate → per-core → off): the Perl version didn't let you turn CPU bars off. Useful when you want to focus only on memory, network, load, or disk and reduce clutter.
  • -

-
    -
  • maxbarsperrow: wrap bars into multiple rows instead of one long row. Useful with many hosts so the window doesn't become impossibly wide; you get a grid and can still scan everything.
  • -

-
    -
  • maxwidth: cap on window width in pixels (default 1900). Stops the window growing unbounded with many hosts; use together with maxbarsperrow for a predictable layout.
  • -

-
    -
  • Startup visibility flags: --showmem, --shownet, --showload, --extended, --cpumode, --diskmode (and friends) let you start with the bars you care about already on. Useful so you don't have to press 2, 3, 4, 5 every time.
  • -

-
    -
  • Window title (--title): set the SDL window title. Useful when you run several Loadbars windows (e.g. one per cluster or environment) and need to tell them apart in your taskbar or window list.
  • -

-
    -
  • SSH options (--sshopts): pass extra flags to ssh (e.g. ConnectTimeout, ProxyJump). Useful on locked-down or jump-host setups so Loadbars works without changing your global SSH config for a one-off session.
  • -

-
    -
  • hasagent: skip extra SSH agent checks when you know the key is already loaded. Useful to avoid startup delay or warnings when you've already run ssh-add and are monitoring many hosts.
  • -

-
    -
  • Config file covers every option: any flag from --help can be set in ~/.loadbarsrc (no leading --). Perl had a config but the Go version supports the full set. Useful for reproducible setups and sharing.
  • -

-
    -
  • Positional host arguments: you can run loadbars server1 server2 without --hosts. Convenience when you only have a few hosts.
  • -

-
    -
  • macOS as client: run the Loadbars binary on a Mac and connect to Linux servers via SSH. The Perl version was Linux-only. Useful to watch production from a laptop without a Linux VM or second machine.
  • -

-
    -
  • Single static binary: no Perl runtime, no SDL Perl modules, no CPAN. Useful for deployment — copy one file to a jump host or new machine and run it.
  • -

-
    -
  • Unit tests: mage test (or go test). The Go version has proper tests; useful for development and catching regressions.
  • -

-
    -
  • Window resize (arrow keys): resize the window with the keyboard (left/right = width, up/down = height). Useful to fit more or fewer bars on screen without touching the mouse. (The Perl version had mouse-based resize; Go uses arrow keys.)
  • -

-
    -
  • Hundreds of hosts in parallel: the Go implementation connects to all hosts concurrently and keeps polling without blocking. The Perl version struggled with many hosts. Useful for large fleets; you get a real "wall of bars" instead of a subset.
  • -

-

Core features


+The user only needs to interact with the local computer in front of them, while the software of the local computer ensures smooth communication with the other participating computers in the distributed system.

-

Load average bars


+This thesis aims to make distributed systems easier to understand from a different angle. Instead of the end-user perspective, it focuses on the functional methods of protocols and their processes, making all relevant events of a distributed system transparent.

-Press 4 or l to toggle. Each host gets a bar: teal fill (1-min load), yellow 1px line (5-min), white 1px line (15-min). Scale: auto (floor 2.0) or fixed with --loadmax N. Press r to reset auto-scale.
+To achieve this, I developed a simulator, particularly for teaching and learning at the University of Applied Sciences Aachen. Protocols from distributed systems with their most important influencing factors can be replicated through simulations. At the same time, there's room for personal experiments—no restriction to a fixed number of protocols. Users can design their own.

-

Disk I/O bars


+The original simulator (VS-Sim) was written in Java 6 in 2008 with a German-language UI. In 2025, I revamped and modernized it as ds-sim: translated the entire codebase and UI from German to English, migrated the build system from hand-rolled Ant scripts to Maven, upgraded from Java 6 to Java 21 (adopting sealed class hierarchies, record types, formatted strings, pattern matching), introduced a proper exception hierarchy and consistent error handling, added comprehensive Javadoc documentation, implemented a headless testing framework (208 unit tests covering core components, the event system, and all protocol implementations), reorganized the project structure to follow standard Maven conventions, and added architecture documentation. Total: 199 files, over 15,000 lines of new code. Back in 2008, I wrote every line by hand in Vim. For the 2025 modernization, Claude Code did most of the heavy lifting—translation, refactoring, test generation, documentation. Times have changed.

-Press 5 to toggle: aggregate (all whole-disk devices per host) → per-device → off. Partitions, loop, ram, zram, and device-mapper are excluded. Purple fill from top = read, darker purple from bottom = write. Extended mode (e) adds a 3px disk-utilization line. Config: diskmode, diskmax, diskaverage. b/x change disk average samples.
+

Installation



-

Global reference lines and options


+The modernized ds-sim requires Java 21 or higher and Maven 3.8 or higher.

-g: global average CPU line (1px red). i: global I/O average line (1px pink). s: host separator lines (1px red). Other options: --maxbarsperrow N, --title, --sshopts, --hasagent. Hotkeys m/n mirror 2/3 for memory and network. Hover over a bar for a tooltip with exact values and host highlight.
+
+# Clone the repository
+git clone https://codeberg.org/snonux/ds-sim.git
+cd ds-sim
+
+# Set JAVA_HOME if needed (e.g. on Fedora Linux)
+export JAVA_HOME=/usr/lib/jvm/java-21-openjdk
+
+# Build the project
+mvn clean package
+
+# Run the simulator
+java -jar target/ds-sim-*.jar
+

-

CPU monitoring


+For a faster development build without running tests:

-CPU usage as vertical stacked bars: System (blue), User (yellow), Nice (green), GuestNice (lime green), Idle (black), IOwait (purple), IRQ/SoftIRQ (white), Guest/Steal (red). Press 1 for aggregate vs. per-core. Press e for extended mode (1px peak line: max system+user over last N samples).
+
+mvn package -DskipTests
+

-

Memory and network


+After building, the following artifacts are available in the target/ directory:

    -
  • 2 / m: memory — left half RAM (dark grey/black), right half Swap (grey/black) per host
  • -
  • 3 / n: network — RX (top, light green) and TX (bottom) summed over non-loopback interfaces. Red bar = no non-lo interface. Use --netlink or f/v for link speed (utilization %). Default gbit.
  • +
  • ds-sim-1.1.0.jar - Executable JAR with all dependencies bundled
  • +
  • original-ds-sim-1.1.0.jar - JAR without dependencies

-

All hotkeys


+The project also includes 208 unit tests that can be run with mvn test. Example simulation files for all built-in protocols are included in the saved-simulations/ directory.

-
-Key     Action
-─────   ──────────────────────────────────────────────────
-1       Toggle CPU (aggregate / per-core / off)
-2 / m   Toggle memory bars
-3 / n   Toggle network bars
-4 / l   Toggle load average bars
-5       Toggle disk I/O (aggregate / per-device / off)
-r       Reset load and disk auto-scale peaks
-e       Toggle extended (peak line on CPU; disk util line)
-g       Toggle global average CPU line
-i       Toggle global I/O average line
-s       Toggle host separator lines
-h       Print hotkey list to stdout
-q       Quit
-w       Write current settings to ~/.loadbarsrc
-a / y   CPU average samples up / down
-d / c   Net average samples up / down
-b / x   Disk average samples up / down
-f / v   Link scale up / down
-Arrows  Resize window
-
+ds-sim source code on Codeberg

-

SSH and config


+

Fundamentals



-Connect with public key auth; hosts need bash and /proc (Linux). No agent needed on the remote side.
+For basic understanding, some fundamentals are explained below. A deeper exploration will follow in later chapters.
+
+

Client/Server Model



-loadbars --hosts server1,server2,server3
-loadbars --hosts root@server1,root@server2
-loadbars servername{01..50}.example.com --showcores 1
-loadbars --cluster production
++-----------------------------------------+
+|                                         |
+|   +--------+         +--------+         |
+|   | Client |<------->| Server |         |
+|   +--------+         +--------+         |
+|                                         |
+|       Sending of Messages               |
+|                                         |
++-----------------------------------------+
+
+Figure 1.1: Client/Server Model
 

-Config: ~/.loadbarsrc (key=value, no --; use # for comments). Any --help option. Press w to save current settings.
+The simulator is based on the client/server principle. Each simulation typically consists of a participating client and a server that communicate with each other via messages (see Fig. 1.1). In complex simulations, multiple clients and/or servers can also participate.

-

Building and platforms


+

Processes and Their Roles



-Go 1.25+ and SDL2. Install SDL2 (e.g. sudo dnf install SDL2-devel on Fedora, brew install sdl2 on macOS), then:
+A distributed system is simulated using processes. Each process takes on one or more roles. For example, one process can take on the role of a client and another process the role of a server. The possibility of assigning both client and server roles to a process simultaneously is also provided. A process could also take on the roles of multiple servers and clients simultaneously. To identify a process, each one has a unique Process Identification Number (PID).

-
-mage build
-./loadbars --hosts localhost
-mage install   # to ~/go/bin
-mage test
-
+

Messages



-Tested on Fedora Linux 43 and common distros; macOS as client to remote Linux only (no local macOS monitoring — no /proc).
+In a distributed system, it must be possible to send messages. A message can be sent by a client or server process and can have any number of recipients. The content of a message depends on the protocol used. What is meant by a protocol will be covered later. To identify a message, each message has a unique Message Identification Number (NID).

-E-Mail your comments to paul@nospam.buetow.org :-)
+

Local and Global Clocks



-Back to the main site
-
-
-
- - My desk rack: DeskPi RackMate T0 - - gemini://foo.zone/gemfeed/2026-02-22-my-desk-rack.gmi - 2026-02-21T11:17:15+02:00 - - Paul Buetow aka snonux - paul@dev.buetow.org - - On my desk sits a small rack that keeps audio gear, power, and network in one place: the DeskPi RackMate T0. Here's what lives in it and how it's wired. - -
-

My desk rack: DeskPi RackMate T0


+In a simulation, there is exactly one global clock. It represents the current and always correct time. A global clock never goes wrong.

-Published at 2026-02-21T11:17:15+02:00
+Additionally, each participating process has its own local clock. It represents the current time of the respective process. Unlike the global clock, local clocks can display an incorrect time. If the process time is not globally correct (not equal to the global time, or displays an incorrect time), then it was either reset during a simulation, or it is running incorrectly due to clock drift. The clock drift indicates by what factor the clock is running incorrectly. This will be discussed in more detail later.

-    ┌─────────────────┐
-    │   ●  ●  AIR     │  ← air-quality monitor
-    ├─────────────────┤
-    │  ╔═╗  CD        │  ← CD transport
-    │  ║ ◉║  S/PDIF   │
-    │  ╚═╝            │
-    ├─────────────────┤
-    │  ▓▓▓  USB PWR   │  ← PinePower
-    ├─────────────────┤
-    │  ░░░  (phones)  │  ← 1U "empty" shelf
-    ├─────────────────┤
-    │  ◉◉◉◉◉  LAN     │  ← 5-port switch
-    ├─────────────────┤
-    │  [E50] [L50]    │  ← DAC + AMP
-    │   DAC   AMP     │
-    └─────────────────┘
-         RackMate T0
++---------------------+     +---------------------+
+|    Process 1        |     |    Process 2        |
+|                     |     |                     |
+| +-----------------+ |     | +-----------------+ |
+| |Server Protocol A| |     | |Client Protocol A| |
+| +-----------------+ |     | +-----------------+ |
+|                     |     |                     |
+| +-----------------+ |     +---------------------+
+| |Client Protocol B| |
+| +-----------------+ |     +---------------------+
+|                     |     |    Process 3        |
++---------------------+     |                     |
+                            | +-----------------+ |
+                            | |Server Protocol B| |
+                            | +-----------------+ |
+                            |                     |
+                            +---------------------+
+
+Figure 1.2: Client/Server Protocols
 

-On my desk sits a small rack that keeps audio gear, power, and network in one place: the DeskPi RackMate T0. Here's what lives in it and how it's wired.
+In addition to normal clocks, vector timestamps and Lamport's logical clocks are also of interest. For vector and Lamport times, there are no global equivalents here, unlike normal time. Concrete examples of Lamport and vector times will be covered later in the "Additional Examples" section.

-DeskPi RackMate T0
+

Events



-DeskPi RackMate T0 on the desk
+A simulation consists of the sequential execution of finitely many events. For example, there can be an event that causes a process to send a message. A process crash event would also be conceivable. Each event occurs at a specific point in time. Events with the same occurrence time are executed directly one after another by the simulator. However, this does not hinder the simulator's users, as events are executed in parallel from their perspective.

-

Table of Contents


+Two main types of events are distinguished: programmable events and non-programmable events. Programmable events can be programmed and edited in the event editor, and their occurrence times depend on the local process clocks or the global clock. Non-programmable events, on the other hand, cannot be programmed in the event editor and do not occur because of a specific time, but due to other circumstances such as:


-

What's in the rack (top to bottom)


-
-

Top: CD transport and air-quality monitor


+

Protocols



-At the top is the S.M.S.L PL200T, a CD transport with anti-vibration design. It outputs digital audio over coaxial S/PDIF into the DAC in the rack. On top of the transport sits a small air-quality monitor so I can keep an eye on the room.
+A simulation also consists of the application of protocols. It has already been mentioned that a process can take on the roles of servers and/or clients. For each server and client role, the associated protocol must also be specified. A protocol defines how a client and a server send messages, and how they react when a message arrives. A protocol also determines what data is contained in a message. A process only processes a received message if it understands the respective protocol.

-S.M.S.L PL200T CD Transport
+In Figure 1.2, 3 processes are shown. Process 1 supports protocol "A" on the server side and protocol "B" on the client side. Process 2 supports protocol "A" on the client side and Process 3 supports protocol "B" on the server side. This means that Process 1 can communicate with Process 2 via protocol "A" and with Process 3 via protocol "B". Processes 2 and 3 are incompatible with each other and cannot process messages received from each other.

-CD transport and air-quality monitor on top
+Clients cannot communicate with clients, and servers cannot communicate with servers. For communication, at least one client and one server are always required. However, this restriction can be circumvented by having processes support a given protocol on both the server and client sides (see Broadcast Protocol later).

-A CD transport is not the same as a CD player. A CD player has a built-in DAC (digital-to-analog converter) and outputs analogue audio—you plug it into an amp or active speakers and you're done. A CD transport only reads the disc and outputs a digital signal (e.g. coaxial or optical S/PDIF). It has no DAC. You feed that digital stream into an external DAC, which then does the conversion. The idea is to separate the mechanical part (spinning the disc, reading the pits) from the conversion stage, so you can use one DAC for CDs, streaming, and other sources, and upgrade or swap the transport and the DAC independently.
+

Graphical User Interface (GUI)



-In the age of streaming and files, putting on a real CD is still a pleasure. You own the disc and the sound isn't at the mercy of a subscription or a server. You pick an album, put it in, and listen from start to finish—no endless scrolling, no algorithm. The format is fixed (16-bit/44.1 kHz), so what you hear is consistent and often better than heavily compressed streams. And there's something satisfying about the ritual: handling the case, the disc, and the artwork instead of tapping a screen.
+

Simple Mode



-

Power and charging: PinePower Desktop + 1U shelf


+Screenshot: The simulator showing the settings dialog. The visualization area displays process bars with message lines between them. The settings window allows configuring simulation parameters like number of processes, simulation duration, clock drift, message loss probability, and more.

-Below that is the PinePower Desktop from Pine64, used as a desktop power and USB charging station for phones and other devices. The rack has one free 1U space under the PinePower where I put the devices that are charging, so cables and gadgets stay in one spot.
+The simulator requires JDK 21 and can be started with the command java -jar target/ds-sim-VERSION.jar

-PinePower Desktop (Pine64)
+The simulator then presents itself with a main window. To create a new simulation, select "New Simulation" from the "File" menu, after which the settings window for the new simulation appears. The individual options will be discussed in more detail later, and for now, only the default settings will be used.

-

Network: 5-port mini switch


+By default, the simulator starts in "simple mode". There is also an "expert mode", which will be discussed later.

-Next is a compact 5-port Ethernet switch. The uplink goes to a wall socket behind the desk; the other ports feed the computer, laptop, and anything else that needs wired LAN on the desk. Next to the switch you can see my Nothing ear buds.
+

The Menu Bar



-Nothing ear buds
+In the File menu, you can create new simulations or close the currently open simulation. New simulations open by default in a new tab. However, you can also open or close new simulation windows that have their own tabs. Each tab contains a simulation that is completely independent from the others. This allows any number of simulations to be run in parallel. The menu items "Open", "Save" and "Save As" are used for loading and saving simulations.

-

Bottom: DAC and headphone amp


+Through the Edit menu, users can access the simulation settings, which will be discussed in more detail later. This menu also lists all participating processes for editing. If the user selects a process there, the corresponding process editor opens. The Simulator menu offers the same options as the toolbar, which is described in the next section.

-At the bottom of the rack are the Topping E50 (DAC) and Topping L50 (headphone amplifier). The E50 converts digital to analogue; the L50 drives the headphones. They drive my Hifiman Sundara headphones.
+Some menu items are only accessible when a simulation has already been created or loaded in the current window.

-Topping E50 DAC
-Topping L50 Headphone Amplifier
-Hifiman Sundara
+

The Toolbar



-

Music sources


+The toolbar is located at the top left of the simulator. The toolbar contains the functions most frequently needed by users. The toolbar offers four different functions:

    -
  • CD transport: coaxial (S/PDIF) from the S.M.S.L PL200T into the Topping E50.
  • -
  • Streaming: USB from the desktop computer and/or laptop on the desk into the E50, so I can play from either machine.
  • +
  • Reset simulation: can only be activated when the simulation has been paused or has finished
  • +
  • Repeat simulation: cannot be activated if the simulation has not yet been started
  • +
  • Pause simulation: can only be activated when the simulation is currently running
  • +
  • Start simulation: can only be activated when the simulation is not currently running and has not yet finished

-

Left side: cable management


+

The Visualization



-On the left of the rack are two cable holders to keep power and signal cables tidy.
+The graphical simulation visualization is located in the center right. The X-axis shows the time in milliseconds, and all participating processes are listed on the Y-axis. The demo simulation ends after exactly 15 seconds. The visualization shows processes (with PIDs 1, 2, and 3), each with its own horizontal black bar. On these process bars, users can read the respective local process time. The vertical red line represents the global simulation time.

-

Next to the rack


+The process bars also serve as start and end points for messages. For example, if Process 1 sends a message to Process 2, a line is drawn from one process bar to the other. Messages that a process sends to itself are not visualized but are logged in the log window (more on this later).

-Right beside the rack is my Supernote Nomad, which I use for notes and reading and have written about elsewhere on this blog. It’s the small tablet-shaped device on the right side of the rack.
+Another way to open a process editor is to left-click on the process bar belonging to the process. A right-click, on the other hand, opens a popup window with additional options. A process can only be forced to crash or be revived via the popup menu during a running simulation.

-Supernote Nomad (small tablet on the right of the rack)
-Supernote Nomad (product page)
+In general, the number of processes can vary as desired. The simulation duration is at least 5 and at most 120 seconds. The simulation only ends when the global time reaches the specified simulation end time (here 15 seconds), not when a local process time reaches this end time.

-Front view of the rack
-Back of the rack
+

Color Differentiation



-

Bedside: another HiFi setup


+Colors help to better interpret the processes of a simulation. By default, processes (process bars) and messages are displayed with the following colors (these are only the default colors, which can be changed via the settings):

-I have a second setup for high-res listening next to my bed. On the nightstand sit my FiiO K13 R2R (an R2R DAC/amp) and my Denon AH-D9200 headphones. I connect the K13 to my laptop via USB and use it for high-resolution files and streaming when I'm not at the desk.
+
+Process Colors:
+  Black   - The simulation is not currently running
+  Green   - The process is running normally
+  Orange  - The mouse is over the process bar
+  Red     - The process has crashed
+
+Message Colors:
+  Green   - The message is still in transit
+  Blue    - The message has successfully reached its destination
+  Red     - The message was lost
+

-Fiio K13 R2R
-Denon AH-D9200
+

The Sidebar



-That's the full desk rack: CD transport and air monitor on top, PinePower and charging shelf, switch, then Topping E50 and L50 at the bottom, with the Hifiman Sundara as the main output and the Supernote Nomad sitting next to it. I hope that you found this interesting.
+The sidebar is used to program process events. At the top, the process to be managed is selected (here with PID 1). In this process selection, there is also the option to select "All Processes", which displays all programmed events of all processes simultaneously. "Local events" are those events that occur when a certain local time of the associated process has been reached. The event table below lists all programmed events along with their occurrence times and PIDs.

-E-Mail your comments to paul@nospam.buetow.org :-)
-
-
-
- - A tmux popup editor for Cursor Agent CLI prompts - - gemini://foo.zone/gemfeed/2026-02-02-tmux-popup-editor-for-cursor-agent-prompts.gmi - 2026-02-01T20:24:16+02:00 - - Paul Buetow aka snonux - paul@dev.buetow.org - - I spend some time in Cursor Agent (the CLI version of the Cursor IDE, I don't like really the IDE), and I also jump between Claude Code CLI, Ampcode, Gemini CLI, OpenAI Codex CLI, OpenCode, and Aider just to see how things are evolving. But for the next month I'll be with Cursor Agent. +To create a new event, the user can either right-click on a process bar and select "Insert local event", or select an event below the event table, enter the event occurrence time in the text field below, and click "Apply".
+
+Right-clicking on the event editor allows you to either copy or delete all selected events. Using the Ctrl key, multiple events can be selected simultaneously. The entries in the Time and PID columns can be edited afterwards. This provides a convenient way to move already programmed events to a different time or assign them to a different process. However, users should ensure that they press the Enter key after changing the event occurrence time, otherwise the change will be ineffective.
+
+In addition to the Events tab, the sidebar has another tab called "Variables". Behind this tab is the process editor of the currently selected process. There, all variables of the process can be edited, providing another way to access a process editor.
+
+

The Log Window


+
+The log window (at the bottom) logs all occurring events in chronological order. At the beginning of each log entry, the global time in milliseconds is always logged. For each process, its local times as well as the Lamport and vector timestamps are also listed. After the time information, additional details are provided, such as which message was sent with what content and which protocol it belongs to. This will be demonstrated later with examples.
+
+
+000000ms: New Simulation
+000000ms: New Process; PID: 1; Local Time: 000000ms; Lamport time: 0; Vector time: (0,0,0)
+000000ms: New Process; PID: 2; Local Time: 000000ms; Lamport time: 0; Vector time: (0,0,0)
+000000ms: New Process; PID: 3; Local Time: 000000ms; Lamport time: 0; Vector time: (0,0,0)
+
+
+By deactivating the logging switch, message logging can be temporarily disabled. With logging deactivated, no new messages are written to the log window. After reactivating the switch, all omitted messages are subsequently written to the window. Deactivated logging can lead to improved simulator performance.
+
+

Expert Mode


+
+Screenshot: The Distributed Systems Simulator in expert mode, showing a Broadcast protocol simulation with 6 processes. The visualization shows message lines between process bars, with blue indicating delivered messages and green indicating messages still in transit.
+
+The simulator can be operated in two different modes: simple mode and expert mode. The simulator starts in simple mode by default, so users don't have to deal with the simulator's full functionality all at once. Simple mode is clearer but offers fewer functions. Expert mode is more suitable for experienced users and accordingly offers more flexibility. Expert mode can be activated or deactivated via the switch of the same name below the log window or via the simulation settings.
+
+In expert mode, the following additional features become available:
+
+
    +
  • Global events: In addition to local events, global events can now also be edited. Global events are triggered when a specific global simulation time is reached, rather than a local process time. This only makes a difference when local process times differ from the global time (e.g. due to clock drift).
  • +
  • Direct PID selection: The user can directly select the associated PID when programming a new event.
  • +
  • Lamport and Vector time switches: If the user activates one of these two switches, the Lamport or vector timestamps are displayed in the visualization. Only one can be active at a time to maintain clarity.
  • +
  • Anti-aliasing switch: Allows the user to activate or deactivate anti-aliasing for smoother graphics. Disabled by default for performance reasons.
  • +
  • Log filter: A regular expression filter (Java syntax) that makes it possible to filter only the essential data from the logs. For example, "PID: (1|2)" shows only log lines containing "PID: 1" or "PID: 2". The filter can be activated retroactively and during a running simulation.
  • +

+

Configuration Settings


+
+The simulation settings window allows configuring many aspects of the simulation. Key settings include:
+
+
    +
  • Processes receive own messages (default: false): Whether processes can receive messages they sent to themselves.
  • +
  • Average message loss probabilities (default: true): Whether to average the loss probabilities of sender and receiver processes.
  • +
  • Average transmission times (default: true): Whether to average the transmission times of sender and receiver processes.
  • +
  • Show only relevant messages (default: true): Hides messages sent to processes that don't support the protocol.
  • +
  • Expert mode (default: false): Enables expert mode features.
  • +
  • Simulation speed (default: 0.5): The playback speed factor. A value of 1 means real-time, 0.5 means half speed.
  • +
  • Number of processes (default: 3): Can also be changed during simulation via right-click.
  • +
  • Simulation duration (default: 15s): Between 5 and 120 seconds.
  • +

+Each process also has individual settings:
+
+
    +
  • Clock drift (default: 0.0): By what factor the local clock deviates. A value of 0.0 means no deviation. A value of 1.0 means double speed. Values > -1.0 are allowed.
  • +
  • Random crash probability (default: 0%): Probability that the process crashes randomly during the simulation.
  • +
  • Message loss probability (default: 0%): Probability that a message sent by this process is lost in transit.
  • +
  • Min/Max transmission time (default: 500ms/2000ms): The range for random message delivery times.
  • +

+Read the next post of this series:
+
+Distributed Systems Simulator - Part 2: Built-in Protocols
+
+Other related posts are:
+
+2026-03-01 Loadbars 0.13.0 released
+2022-12-24 (Re)learning Java - My takeaways
+2022-03-06 The release of DTail 4.0.0
+2016-11-20 Object oriented programming with ANSI C
+
+E-Mail your comments to paul@nospam.buetow.org
+
+Back to the main site
+ + +
+ + RCM: The Ruby Configuration Management DSL + + gemini://foo.zone/gemfeed/2026-03-02-rcm-ruby-configuration-management-dsl.gmi + 2026-03-02T00:00:00+02:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + RCM is a tiny configuration management system written in Ruby. It gives me a small DSL for describing how I want my machines to look, then it applies the changes: create files and directories, manage packages, and make sure certain lines exist in configuration files. It's deliberately KISS and optimised for a single person's machines instead of a whole fleet.
-

A tmux popup editor for Cursor Agent CLI prompts


+

RCM: The Ruby Configuration Management DSL



-Published at 2026-02-01T20:24:16+02:00
+Published at 2026-03-02T00:00:00+02:00

-...and any other TUI based application
+RCM is a tiny configuration management system written in Ruby. It gives me a small DSL for describing how I want my machines to look, then it applies the changes: create files and directories, manage packages, and make sure certain lines exist in configuration files. It's deliberately KISS and optimised for a single person's machines instead of a whole fleet.
+
+RCM DSL in action

Table of Contents




-

Why I built this


+

Why I built RCM



-I spend some time in Cursor Agent (the CLI version of the Cursor IDE, I don't like really the IDE), and I also jump between Claude Code CLI, Ampcode, Gemini CLI, OpenAI Codex CLI, OpenCode, and Aider just to see how things are evolving. But for the next month I'll be with Cursor Agent.
+I've used (and still use) the usual suspects in configuration management: Puppet, Ansible, etc. They are powerful, but also come with orchestration layers, agents, inventories, and a lot of moving parts. For my personal machines I wanted something smaller: one Ruby process, one configuration file, a few resource types, and good enough safety features.

-https://cursor.com/cli
+I've always been a fan of Ruby's metaprogramming features, and this project let me explore them in a focused, practical way.

-Short prompts are fine in the inline input, but for longer prompts I want a real editor: spellcheck, search/replace, multiple cursors, and all the Helix muscle memory I already have.
+Because of that metaprogramming support, Ruby is a great fit for DSLs. You can get very close to natural language without inventing a brand-new syntax. RCM leans into that: the goal is to read a configuration and understand what happens without jumping between multiple files or templating languages.

-Cursor Agent has a Vim editing mode, but not Helix. And even in Vim mode I can't use my full editor setup. I want the real thing, not a partial emulation.
+RCM repo on Codeberg

-https://helix-editor.com
-https://www.vim.org
-https://neovim.io
+

How the DSL feels



-So I built a tiny tmux popup editor. It opens $EDITOR (Helix for me), and when I close it, the buffer is sent back into the prompt. It sounds simple, but it feels surprisingly native.
+An RCM configuration starts with a configure block. Inside it you declare resources (file, package, given, notify, …). RCM figures out dependencies between resources and runs them in the right order.

-This is how it looks like:
+ +
configure do
+  given { hostname is :earth }
+
+  file '/tmp/test/wg0.conf' do
+    requires file '/etc/hosts.test'
+    manage directory
+    from template
+    'content with <%= 1 + 2 %>'
+  end
+
+  file '/etc/hosts.test' do
+    line '192.168.1.101 earth'
+  end
+end
+

-Popup editor in action
+Which would look like this when run:

-

What it is


+ +
% sudo ruby example.rb
+INFO 20260301-213817 dsl(0) => Configuring...
+INFO 20260301-213817 file('/tmp/test/wg0.conf') => Registered dependency on file('/etc/hosts.test')
+INFO 20260301-213817 file('/tmp/test/wg0.conf') => Evaluating...
+INFO 20260301-213817 file('/etc/hosts.test') => Evaluating...
+INFO 20260301-213817 file('/etc/hosts.test') => Writing file /etc/hosts.test
+INFO 20260301-213817 file('/tmp/test/wg0.conf') => Creating parent directory /tmp/test
+INFO 20260301-213817 file('/tmp/test/wg0.conf') => Writing file /tmp/test/wg0.conf
+

-The idea is straightforward:
+The idea is that you describe the desired state and RCM worries about the steps. The given block can short‑circuit the whole run (for example, only run on a specific hostname). Each file resource can either manage a complete file (from a template) or just make sure individual lines are present.
+
+

Keywords and resources


+
+Under the hood, each DSL word is either a keyword or a resource:

    -
  • A tmux key binding prefix-e opens a popup overlay near the bottom of the screen.
  • -
  • The popup starts $EDITOR on a temp file.
  • -
  • When I exit the editor, the script sends the contents back to the original pane with tmux send-keys.
  • +
  • Keyword is the base class for all top‑level DSL constructs.
  • +
  • Resource is the base class for things RCM can manage (files, packages, and so on).

-It also pre-fills the temp file with whatever is already typed after Cursor Agent's prompt, so I can continue where I left off.
+Resources can declare dependencies with requires. Before a resource runs, RCM makes sure all its requirements are satisfied and only evaluates each resource once per run. This keeps the mental model simple even when you compose more complex configurations.

-

How it works (overview)


+

Files, directories, and templates



-This is the tmux binding I use (trimmed to the essentials):
+The file resource handles three common cases:

-
-bind-key e run-shell -b "tmux display-message -p '#{pane_id}'
-  > /tmp/tmux-edit-target-#{client_pid} \;
-  tmux popup -E -w 90% -h 35% -x 5% -y 65% -d '#{pane_current_path}'
-  \"~/scripts/tmux-edit-send /tmp/tmux-edit-target-#{client_pid}\""
+
    +
  • Managing parent directories (manage directory) so you don't have to create them manually.
  • +
  • Rendering ERB templates (from template) so you can mix Ruby expressions into config files.
  • +
  • Ensuring individual lines exist (line) for the many "append this line if missing" situations.
  • +

+Every write operation creates a backup copy in .rcmbackup/, so you can always inspect what changed and roll back manually if needed.
+
+

How Ruby's metaprogramming helps


+
+The nice thing about RCM is that the Ruby code you write in your configuration is not that different from the Ruby code inside RCM itself. The DSL is just a thin layer on top.
+
+For example, when you write:
+
+ +
file '/etc/hosts.test' do
+  line '192.168.1.101 earth'
+end
 

-

Workflow diagram


+Ruby turns file into a method call and '/etc/hosts.test' into a normal argument. Inside RCM, that method builds a File resource object and stores it for later. The block you pass is just a Ruby block; RCM calls it with the file resource as self, so method calls like line configure that resource. There is no special parser here, just plain Ruby method and block dispatch.

-This is the whole workflow:
+The same goes for constructs like:

-
-┌────────────────────┐   ┌───────────────┐   ┌─────────────────────┐   ┌─────────────────────┐
-│ Cursor input box   │-->| tmux keybind  │-->| popup runs script   │-->| capture + prefill   │
-│ (prompt pane)      │   │ prefix + e    │   │ tmux-edit-send      │   │ temp file           │
-└────────────────────┘   └───────────────┘   └─────────────────────┘   └─────────────────────┘
-                                                                                 |
-                                                                                 v
-┌────────────────────┐   ┌────────────────────┐   ┌────────────────────┐   ┌────────────────────┐
-│ Cursor input box   │<--| send-keys back     |<--| close editor+popup |<--| edit temp file     |
-│ (prompt pane)      │   │ to original pane   │   │ (exit $EDITOR)     │   │ in $EDITOR         │
-└────────────────────┘   └────────────────────┘   └────────────────────┘   └────────────────────┘
+
+
given { hostname is :earth }
 

-And this is how it looks like after sending back the text to the Cursor Agent's input:
+RCM uses Ruby's dynamic method lookup to interpret hostname and is in that block and to decide whether the rest of the configuration should run at all. Features like method_missing, blocks, and the ability to change what self means in a block make this kind of DSL possible with very little code. You still get all the power of Ruby (conditionals, loops, helper methods), but the surface reads like a small language of its own.

-Prefilled prompt text
+

A bit more about method_missing



-And here is the full script. It is a bit ugly since it's shell (written with Cursor Agent with GPT-5.2-Codex), and I might (let) rewrite it in Go with proper unit tests, config-file, multi-agent support and release it once I have time. But it works well enough for now.
+method_missing is one of the key tools that make the RCM DSL feel natural. In plain Ruby, if you call a method that does not exist, you get a NoMethodError. But before Ruby raises that error, it checks whether the object implements method_missing. If it does, Ruby calls that instead and lets the object decide what to do.

-Update 2026-02-08: This functionality has been integrated into the hexai project (https://codeberg.org/snonux/hexai) with proper multi-agent support for Cursor Agent, Claude Code CLI, and Ampcode. The hexai version includes unit tests, configuration files, and better agent detection. While still experimental, it's more robust than this shell script. See the hexai-tmux-edit command for details.
+In RCM, you can write things like:

-https://codeberg.org/snonux/hexai
+ +
given { hostname is :earth }
+
+
+Inside that block, calls such as hostname and is don't map to normal Ruby methods. Instead, RCM's DSL objects see those calls in method_missing, and interpret them as "check the current hostname" and "compare it to this symbol". This lets the DSL stay small and flexible: adding a new keyword can be as simple as handling another case in method_missing, without changing the Ruby syntax at all.
+
+Put differently: you can write what looks like a tiny English sentence (hostname is :earth) and Ruby breaks it into method calls (hostname, then is) that RCM can interpret dynamically. Those "barewords" are not special syntax; they are just regular Ruby method names that the DSL catches and turns into configuration logic at runtime.
+
+Here's a simplified sketch of how such a condition object could look in Ruby:

-
#!/usr/bin/env bash
-set -u -o pipefail
+
class HostCondition
+  def initialize
+    @current_hostname = Socket.gethostname.to_sym
+  end
 
-LOG_ENABLED=0
-log_file="${TMPDIR:-/tmp}/tmux-edit-send.log"
-log() {
-  if [ "$LOG_ENABLED" -eq 1 ]; then
-    printf '%s\n' "$*" >> "$log_file"
-  fi
-}
+  def method_missing(name, *args, &)
+    case name
+    when :hostname
+      @left = @current_hostname
+      self               # allow chaining: hostname is :earth
+    when :is
+      @left == args.first
+    else
+      super
+    end
+  end
+end
 
-# Read the target pane id from a temp file created by tmux binding.
-read_target_from_file() {
-  local file_path="$1"
-  local pane_id
-  if [ -n "$file_path" ] && [ -f "$file_path" ]; then
-    pane_id="$(sed -n '1p' "$file_path" | tr -d '[:space:]')"
-    # Ensure pane ID has % prefix
-    if [ -n "$pane_id" ] && [[ "$pane_id" != %* ]]; then
-      pane_id="%${pane_id}"
-    fi
-    printf '%s' "$pane_id"
-  fi
-}
-
-# Read the target pane id from tmux environment if present.
-read_target_from_env() {
-  local env_line pane_id
-  env_line="$(tmux show-environment -g TMUX_EDIT_TARGET 2>/dev/null || true)"
-  case "$env_line" in
-    TMUX_EDIT_TARGET=*)
-      pane_id="${env_line#TMUX_EDIT_TARGET=}"
-      # Ensure pane ID has % prefix
-      if [ -n "$pane_id" ] && [[ "$pane_id" != %* ]] && [[ "$pane_id" =~ ^[0-9]+$ ]]; then
-        pane_id="%${pane_id}"
-      fi
-      printf '%s' "$pane_id"
-      ;;
-  esac
-}
-
-# Resolve the target pane id, falling back to the last pane.
-resolve_target_pane() {
-  local candidate="$1"
-  local current_pane last_pane
-
-  current_pane="$(tmux display-message -p "#{pane_id}" 2>/dev/null || true)"
-  log "current pane=${current_pane:-<empty>}"
-  
-  # Ensure candidate has % prefix if it's a pane ID
-  if [ -n "$candidate" ] && [[ "$candidate" =~ ^[0-9]+$ ]]; then
-    candidate="%${candidate}"
-    log "normalized candidate to $candidate"
-  fi
-  
-  if [ -n "$candidate" ] && [[ "$candidate" == *"#{"* ]]; then
-    log "format target detected, clearing"
-    candidate=""
-  fi
-  if [ -z "$candidate" ]; then
-    candidate="$(tmux display-message -p "#{last_pane}" 2>/dev/null || true)"
-    log "using last pane as fallback: $candidate"
-  elif [ "$candidate" = "$current_pane" ]; then
-    last_pane="$(tmux display-message -p "#{last_pane}" 2>/dev/null || true)"
-    if [ -n "$last_pane" ]; then
-      candidate="$last_pane"
-      log "candidate was current, using last pane: $candidate"
-    fi
-  fi
-  printf '%s' "$candidate"
-}
-
-# Capture the latest multi-line prompt content from the pane.
-capture_prompt_text() {
-  local target="$1"
-  tmux capture-pane -p -t "$target" -S -2000 2>/dev/null | awk '
-    function trim_box(line) {
-      sub(/^ *│ ?/, "", line)
-      sub(/ *│ *$/, "", line)
-      sub(/[[:space:]]+$/, "", line)
-      return line
-    }
-    /^ *│ *→/ && index($0,"INSERT")==0 && index($0,"Add a follow-up")==0 {
-      if (text != "") last = text
-      text = ""
-      capture = 1
-      line = $0
-      sub(/^.*→ ?/, "", line)
-      line = trim_box(line)
-      if (line != "") text = line
-      next
-    }
-    capture {
-      if ($0 ~ /^ *└/) {
-        capture = 0
-        if (text != "") last = text
-        next
-      }
-      if ($0 ~ /^ *│/ && index($0,"INSERT")==0 && index($0,"Add a follow-up")==0) {
-        line = trim_box($0)
-        if (line != "") {
-          if (text != "") text = text " " line
-          else text = line
-        }
-      }
-    }
-    END {
-      if (text != "") last = text
-      if (last != "") print last
-    }
-  '
-}
-
-# Write captured prompt text into the temp file if available.
-prefill_tmpfile() {
-  local tmpfile="$1"
-  local prompt_text="$2"
-  if [ -n "$prompt_text" ]; then
-    printf '%s\n' "$prompt_text" > "$tmpfile"
-  fi
-}
-
-# Ensure the target pane exists before sending keys.
-validate_target_pane() {
-  local target="$1"
-  local pane target_found
-  if [ -z "$target" ]; then
-    log "error: no target pane determined"
-    echo "Could not determine target pane." >&2
-    return 1
-  fi
-  target_found=0
-  log "validate: looking for target='$target' in all panes:"
-  for pane in $(tmux list-panes -a -F "#{pane_id}" 2>/dev/null || true); do
-    log "validate: checking pane='$pane'"
-    if [ "$pane" = "$target" ]; then
-      target_found=1
-      log "validate: MATCH FOUND!"
-      break
-    fi
-  done
-  if [ "$target_found" -ne 1 ]; then
-    log "error: target pane not found: $target"
-    echo "Target pane not found: $target" >&2
-    return 1
-  fi
-  log "validate: target pane validated successfully"
-}
-
-# Send temp file contents to the target pane line by line.
-send_content() {
-  local target="$1"
-  local tmpfile="$2"
-  local prompt_text="$3"
-  local first_line=1
-  local line
-  log "send_content: target=$target, prompt_text='$prompt_text'"
-  while IFS= read -r line || [ -n "$line" ]; do
-    log "send_content: read line='$line'"
-    if [ "$first_line" -eq 1 ] && [ -n "$prompt_text" ]; then
-      if [[ "$line" == "$prompt_text"* ]]; then
-        local old_line="$line"
-        line="${line#"$prompt_text"}"
-        log "send_content: stripped prompt, was='$old_line' now='$line'"
-      fi
-    fi
-    first_line=0
-    log "send_content: sending line='$line'"
-    tmux send-keys -t "$target" -l "$line"
-    tmux send-keys -t "$target" Enter
-  done < "$tmpfile"
-  log "sent content to $target"
-}
-
-# Main entry point.
-main() {
-  local target_file="${1:-}"
-  local target
-  local editor="${EDITOR:-vi}"
-  local tmpfile
-  local prompt_text
-
-  log "=== tmux-edit-send starting ==="
-  log "target_file=$target_file"
-  log "EDITOR=$editor"
-  
-  target="$(read_target_from_file "$target_file" || true)"
-  if [ -n "$target" ]; then
-    log "file target=${target:-<empty>}"
-    rm -f "$target_file"
-  fi
-  if [ -z "$target" ]; then
-    target="${TMUX_EDIT_TARGET:-}"
-  fi
-  log "env target=${target:-<empty>}"
-  if [ -z "$target" ]; then
-    target="$(read_target_from_env || true)"
-  fi
-  log "tmux env target=${target:-<empty>}"
-  target="$(resolve_target_pane "$target")"
-  log "fallback target=${target:-<empty>}"
-
-  tmpfile="$(mktemp)"
-  log "created tmpfile=$tmpfile"
-  if [ ! -f "$tmpfile" ]; then
-    log "ERROR: mktemp failed to create file"
-    echo "ERROR: mktemp failed" >&2
-    exit 1
-  fi
-  mv "$tmpfile" "${tmpfile}.md" 2>&1 | while read -r line; do log "mv output: $line"; done
-  tmpfile="${tmpfile}.md"
-  log "renamed to tmpfile=$tmpfile"
-  if [ ! -f "$tmpfile" ]; then
-    log "ERROR: tmpfile does not exist after rename"
-    echo "ERROR: tmpfile rename failed" >&2
-    exit 1
-  fi
-  trap 'rm -f "$tmpfile"' EXIT
-
-  log "capturing prompt text from target=$target"
-  prompt_text="$(capture_prompt_text "$target")"
-  log "captured prompt_text='$prompt_text'"
-  prefill_tmpfile "$tmpfile" "$prompt_text"
-  log "prefilled tmpfile"
-
-  log "launching editor: $editor $tmpfile"
-  "$editor" "$tmpfile"
-  local editor_exit=$?
-  log "editor exited with status $editor_exit"
-
-  if [ ! -s "$tmpfile" ]; then
-    log "empty file, nothing sent"
-    exit 0
-  fi
-  
-  log "tmpfile contents:"
-  log "$(cat "$tmpfile")"
-
-  log "validating target pane"
-  validate_target_pane "$target"
-  log "sending content to target=$target"
-  send_content "$target" "$tmpfile" "$prompt_text"
-  log "=== tmux-edit-send completed ==="
-}
-
-main "$@"
+HostCondition.new.hostname.is(:earth)
 

-

Challenges and small discoveries


-
-The problems were mostly small but annoying:
+RCM's real code is more sophisticated, but the idea is the same: Ruby happily calls method_missing for unknown methods like hostname and is, and the DSL turns those calls into a value (true/false) that decides whether the rest of the configuration should run.

-
    -
  • Getting the right target pane was the first hurdle. I ended up storing the pane id in a file because of tmux format expansion quirks.
  • -
  • The Cursor UI draws a nice box around the prompt, so the prompt line contains a and other markers. I had to filter those out and strip the box-drawing characters.
  • -
  • When I prefilled text and then sent it back, I sometimes duplicated the prompt. Stripping the prefilled prompt text from the submitted text fixed that.
  • -

-

Test cases (for a future rewrite)


+

Ruby metaprogramming: further reading



-These are the cases I test whenever I touch the script:
+If you want to dive deeper into the ideas behind RCM's DSL, these books are great starting points:

    -
  • Single-line prompt: capture everything after and prefill the editor.
  • -
  • Multi-line boxed prompt: capture the wrapped lines inside the │ ... │ box and join them with spaces (no newline in the editor).
  • -
  • Ignore UI noise: do not capture lines containing INSERT or Add a follow-up.
  • -
  • Preserve appended text: if I add juju to an existing line, the space before juju must survive.
  • -
  • No duplicate send: if the prefilled text is still at the start of the first line, it must be stripped once before sending back.
  • +
  • "Metaprogramming Ruby 2" by Paolo Perrotta
  • +
  • "The Well-Grounded Rubyist" by David A. Black (and others)
  • +
  • "Eloquent Ruby" by Russ Olsen

-

(Almost) works with any editor (or any TUI)


+They all cover Ruby's object model, blocks, method_missing, and other metaprogramming techniques in much more detail than I can in a single blog post.

-Although I use Helix, this is just $EDITOR. If you prefer Vim, Neovim, or something more exotic, it should work. The same mechanism can be used to feed text into any TUI that reads from a terminal pane, not just Cursor Agent.
+

Safety, dry runs, and debugging



-One caveat: different agents draw different prompt UIs, so the capture logic depends on the prompt shape. A future version of this script should be more modular in that respect; for now this is just a PoC tailored to Cursor Agent.
+RCM has a --dry mode: it logs what it would do without actually touching the file system. I use this when iterating on new configurations or refactoring existing ones. Combined with the built‑in logging and debug output, it's straightforward to see which resources were scheduled and in which order.

-Another thing is, what if Cursor decides to change the design of its TUI? I would need to change my script as well.
+Because RCM is just Ruby, there's no separate agent protocol or daemon. The same process parses the DSL, resolves dependencies, and performs the actions. If something goes wrong, you can drop into the code, add a quick debug statement, and re‑run your configuration.

-If I get a chance, I'll clean it up and rewrite it in Go (and release it properly or include it into Hexai, another AI related tool of mine, of which I haven't blogged about yet). For now, I am happy with this little hack. It already feels like a native editing workflow for Cursor Agent prompts.
+

RCM vs Puppet and other big tools



-https://codeberg.org/snonux/hexai
-
-E-Mail your comments to paul@nospam.buetow.org :-)
-
-Other related posts are:
-
-2026-02-02 A tmux popup editor for Cursor Agent CLI prompts (You are currently reading this)
-2025-08-05 Local LLM for Coding with Ollama on macOS
-2025-05-02 Terminal multiplexing with tmux - Fish edition
-2024-06-23 Terminal multiplexing with tmux - Z-Shell edition
+RCM does not try to compete with Puppet, Chef, or Ansible on scale. Those tools shine when you manage hundreds or thousands of machines, have multiple teams contributing modules, and need centralised orchestration, reporting, and role‑based access control. They also come with their own DSLs, servers/agents, certificate handling, and a long list of resource types and modules. Ansible may be more similar to RCM than the other tools, but it's still much more complex than RCM.

-Back to the main site
-
-
-
- - Using Supernote Nomad offline - - gemini://foo.zone/gemfeed/2026-01-01-using-supernote-nomad-offline.gmi - 2025-12-31T16:25:30+02:00 - - Paul Buetow aka snonux - paul@dev.buetow.org - - I am a note taker. For years, I've been searching for a good digital device that could complement my paper notebooks. I've finally found it in the Supernote Nomad. I use it completely offline without cloud-sync, and in this post, I'll explain why this is a benefit. - -
-

Using Supernote Nomad offline


+For my personal use cases, that layer is mostly overhead. I want:

-Published at 2025-12-31T16:25:30+02:00
+
    +
  • No extra daemon, message bus, or master node.
  • +
  • No separate DSL to learn besides Ruby itself.
  • +
  • A codebase small enough that I can understand and change all of it in an evening.
  • +
  • Behaviour I can inspect just by reading the Ruby code.
  • +

+In that space RCM wins: it is small, transparent, and tuned for one person (me!) with a handful of personal machines or my Laptops. I still think tools like Puppet are the right choice for larger organisations and shared infrastructure, but RCM gives me a tiny, focused alternative for my own systems.

-I am a note taker. For years, I've been searching for a good digital device that could complement my paper notebooks. I've finally found it in the Supernote Nomad. I use it completely offline without cloud-sync, and in this post, I'll explain why this is a benefit.
+

Cutting RCM 0.1.0



-Supernote Nomad
+As of this post I'm tagging and releasing **RCM 0.1.0**. About 99% of the code has been written by me so far, and before AI agents take over more of the boilerplate and wiring work, it felt like a good moment to cut a release and mark this mostly‑human baseline.

-I initially bought it because Retta (the manufacturer of the Supernote) stated on their website that an open-source Linux firmware would be released soon. However, after over a year, there still hasn't been any progress (hopefully there will be someday). So I looked into alternative ways to use this device.
+Future changes will very likely involve more automated help, but 0.1.0 is the snapshot of the original, hand‑crafted version of the tool.

-
-⣿⣿⣿⣿⣿⣿⡿⠿⠿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
-⣿⣿⣿⣿⣿⣏⠀⢶⣆⡘⠉⠙⠛⠿⠿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
-⣿⣿⣿⣿⣿⠋⣤⣄⠘⠃⢠⣀⣀⠀⠀⠀⠀⠀⠉⠉⠛⠛⠿⢿⣿⣿⣿⣿⣿⣿
-⣿⣿⣿⣿⡿⠀⡉⠻⡟⠀⠈⠉⠙⠛⠷⠶⣦⣤⣄⣀⠀⠀⠀⠀⠀⣾⣿⣿⣿⣿
-⣿⣿⣿⣿⡄⠸⢿⣤⠀⢠⣤⣀⡀⠀⠀⠀⠀⠀⠉⠙⠛⠻⠶⠀⢰⣿⣿⠻⣿⣿
-⣿⣿⣿⣿⠠⣶⣆⡉⠀⠀⠈⠉⠙⠛⠳⠶⠦⣤⣤⣄⣀⡀⢀⣴⠟⠋⠙⢷⣬⣿
-⣿⣿⣿⠏⣠⡄⠹⠁⠰⢶⣤⣤⣀⡀⠀⠀⠀⠀⠀⠉⢉⣿⠟⠁⠀⠀⣠⣾⣿⣿
-⣿⣿⡿⠂⠙⠻⡆⠀⠀⠀⠀⠈⠉⠛⠛⠷⠶⣦⣤⣴⠟⠁⠀⠀⣠⣾⣿⣿⣿⣿
-⣿⣿⡇⠸⣿⣄⠀⠰⠶⢶⣤⣄⣀⡀⠀⠀⠀⣴⣟⠁⠀⠀⣠⣾⣿⣿⣿⣿⣿⣿
-⣿⡟⠀⣶⣀⠃⠀⠀⠀⠀⠀⠈⠉⠙⠛⠓⢾⡟⢙⣷⣤⢾⣿⣿⣿⣿⣿⣿⣿⣿
-⣿⠋⣀⡉⠻⠀⠘⠛⠻⠶⢶⣤⣤⣀⡀⢠⠿⠟⠛⠉⠁⣸⣿⣿⣿⣿⣿⣿⣿⣿
-⣿⡀⠛⠳⠆⠀⠀⠀⠀⠀⠀⠀⠉⠉⠛⠛⠷⠶⣦⠄⢀⣿⣿⣿⣿⣿⣿⣿⣿⣿
-⣿⣿⣿⣶⣦⣀⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣸⣿⣿⣿⣿⣿⣿⣿⣿⣿
-⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣶⣶⣤⣤⣀⣀⠀⠀⠀⢠⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
-⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣷⣶⣾⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
-
+

What's next



-

Table of Contents


+RCM already does what I need on my machines, but there are a few ideas I want to explore:


-

The Joy of Being Offline


-
-I keep my Supernote Nomad offline at all times. No Wi-Fi, no cloud sync, just me and my notes. And honestly, it's great.
-
-With Wi-Fi off, the battery lasts about a week on a single charge (how convenient :-)).
+

Feature overview (for now)



-Privacy was my main concern, though. I don't sync anything to Retta's cloud, so my notes stay mine. No one's reading or mining my stuff. Simple as that.
+Here is a quick overview of what RCM can do today, grouped by area:

-A picture of the Supernote Nomad
+
    +
  • File management: file '/path', manage directory, from template, line '...'
  • +
  • Packages: package 'name' resources for installing and updating packages (currently focused on Fedora/DNF)
  • +
  • Conditions and flow: given { ... } blocks, predicates such as hostname is :earth
  • +
  • Notifications and dependencies: requires between resources, notify for follow‑up actions
  • +
  • Safety and execution modes: backups in .rcmbackup/, --dry runs, debug logging
  • +

+Some small examples adapted from RCM's own tests:

-

My Offline Workflow


+

Template rendering into a file



-My workflow is simple, only relying on a direct USB connection to my Linux laptop.
+ +
configure do
+  file './.file_example.rcmtmp' do
+    from template
+    'One plus two is <%= 1 + 2 %>!'
+  end
+end
+

-I connect my Supernote Nomad to my Linux laptop via a USB-C cable. The device is automatically recognized as a storage device, and I can directly access the Note folder, which contains all my notes as .note files. I then copy these files to a dedicated archive folder on my laptop.
+

Ensuring a line is absent from a file



-

Converting Notes to PDF


+ +
configure do
+  file './.file_example.rcmtmp' do
+    line 'Whats up?'
+    is absent
+  end
+end
+

-To make my notes accessible and shareable, I convert them from the proprietary .note format to PDF. For this, I use a fantastic open-source tool called supernote-tool. It's not an official tool from Ratta, but it works flawlessly.
+

Guarding a configuration run on the current hostname



-https://github.com/jya-dev/supernote-tool
+ +
configure do
+  given { hostname Socket.gethostname }
+  ...
+end
+

-I've created a small shell script to automate the conversion process using tis tool. This script, convert-notes-to-pdfs.sh, resides in my notes archive folder:
+

Creating and deleting directories, and purging a directory tree



-
#!/usr/bin/env bash
-
-convert () {
-  find . -name \*.note \
-    | while read -r note; do
-        echo supernote-tool convert -a -t pdf "$note" "${note/.note/.pdf}"
-        supernote-tool convert -a -t pdf "$note" "${note/.note/.pdf}.tmp"
-        mv "${note/.note/.pdf}.tmp" "${note/.note/.pdf}"
-        du -hs "$note" "${note/.note/.pdf}"
-        echo
-      done
-}
-
-# Make the PDFs available on my Phone as well
-copy () {
-  if [ ! -d ~/Documents/Supernote ]; then
-    echo "Directory ~/Documents/Supernote does not exist, skipping"
-    exit 1
-  fi
-
-  rsync -delete -av --include='*/' --include='*.pdf' --exclude='*' . ~/Documents/Supernote/
-  echo This was copied from $(pwd) so dont edit manually >~/Documents/Supernote/README.txt
-}
+
configure do
+  directory './.directory_example.rcmtmp' do
+    is present
+  end
 
-convert
-copy
+  directory delete do
+    path './.directory_example.rcmtmp'
+    is absent
+  end
+end
 

-This script does two things:
+

Managing file and directory modes and ownership



-
    -
  • It finds all .note files in the current directory and converts them to PDF using supernote-tool.
  • -
  • It copies the generated PDFs to my ~/Documents/Supernote folder.
  • -

-

Syncing to my Phone


+ +
configure do
+  touch './.mode_example.rcmtmp' do
+    mode 0o600
+  end
+
+  directory './.mode_example_dir.rcmtmp' do
+    mode 0o705
+  end
+end
+

-The ~/Documents/Supernote folder on my laptop is synchronized with my phone using Syncthing. This way, I have access to all my notes in PDF format on my phone, wherever I go, without relying on any cloud service.
+

Using a chained, more natural language style for notifications



-https://syncthing.net/
+This will just print out something, not changing anything:

-

Firmware updates


+ +
configure do
+  notify hello dear world do
+    thank you to be part of you
+  end
+end
+

-One usually updates the software or firmware of the Supernote Nomad via Wi-Fi. However, it is also possible to update it completely offline. To install the firmware update, follow the steps below (the following instructions were copied from the Supernote website):
+

Touching files and updating their timestamps



-
    -
  • Connect your Supernote to your PC with a USB-C cable. For macOS, an MTP software (e.g. OpenMTP or Android File Transfer) is required for your Supernote to show up on your Mac.
  • -
  • For Manta, Nomad, A5 X and A6 X devices, copy the firmware (DO NOT UNZIP) to the "Export" folder of Supernote; for A5 and A6 devices, copy the firmware (DO NOT UNZIP) to the root directory of Supernote.
  • -
  • Unplug the USB connection, tap “OK” on your Supernote to continue, and if no prompt pops up, please restart your device directly to proceed to update.
  • -

-

The Writing Experience


+ +
configure do
+  touch './.touch_example.rcmtmp'
+end
+

-The writing feel of the Supernote Nomad is simply great. The combination of the screen's texture and the ceramic nib of the pen creates a feeling that is remarkably close to writing on real paper. The latency is almost non-existent, and the pressure sensitivity allows for a natural and expressive writing experience. It's great to write on, and it makes me want to take more notes.
+

Expressing dependencies between notifications



-Another picture of the Supernote Nomad
+ +
configure do
+  notify foo do
+    requires notify bar and requires notify baz
+    'foo_message'
+  end
+
+  notify bar
+
+  notify baz do
+    requires notify bar
+    'baz_message'
+  end
+end
+

-

Conclusion


+

-The Supernote Nomad has become an additional tool for me. By using it offline, I've created a distraction-free and private note-taking environment. The simple, manual workflow for transferring and converting notes gives me full control over my data, and the writing experience is second to none. If you're looking for a digital notebook that respects your privacy and helps you focus, I highly recommend giving the Supernote Nomad a try with an offline-first approach.
+ +
configure do
+  symlink './.symlink_example.rcmtmp' do
+    manage directory
+    './.symlink_target_example.rcmtmp'
+  end
+end
+

-The Supernote didn't fully replace my traditional paper journals, though. Each of them has its own use case. However, that is outside the scope of this blog post.
+

Detecting duplicate resource definitions at configure time



-Other related posts:
+ +
configure do
+  notify :foo
+  notify :foo # raises RCM::DSL::DuplicateResource
+end
+

-2026-01-01 Using Supernote Nomad offline (You are currently reading this)
-2026-01-01 Cloudless Kobo Forma with KOReader
+If you find RCM interesting, feel free to browse the code, adapt it to your own setup, or just steal ideas for your own Ruby DSLs. I will probably extend it with more features over time as my own needs evolve.

E-Mail your comments to paul@nospam.buetow.org :-)

+Other related posts:
+
+2026-03-02 RCM: The Ruby Configuration Management DSL (You are currently reading this)
+2025-10-11 Key Takeaways from The Well-Grounded Rubyist
+2021-07-04 The Well-Grounded Rubyist
+2016-04-09 Jails and ZFS with Puppet on FreeBSD
+
Back to the main site
- Posts from July to December 2025 - - gemini://foo.zone/gemfeed/2026-01-01-posts-from-july-to-december-2025.gmi - 2025-12-31T15:49:06+02:00 + Site Reliability Engineering - Part 5: System Design, Incidents, and Learning + + gemini://foo.zone/gemfeed/2026-03-01-site-reliability-engineering-part-5.gmi + 2026-03-01T12:00:00+02:00 Paul Buetow aka snonux paul@dev.buetow.org - Hello there, I wish you all a happy new year! These are my social media posts from the last six months. I keep them here to reflect on them and also to not lose them. Social media networks come and go and are not under my control, but my domain is here to stay. + Welcome to Part 5 of my Site Reliability Engineering (SRE) series. I'm currently working as a Site Reliability Engineer, and I'm here to share what SRE is all about in this blog series.
-

Posts from July to December 2025


+

Site Reliability Engineering - Part 5: System Design, Incidents, and Learning



-Published at 2025-12-31T15:49:06+02:00
+Published at 2026-03-01T12:00:00+02:00

-Hello there, I wish you all a happy new year! These are my social media posts from the last six months. I keep them here to reflect on them and also to not lose them. Social media networks come and go and are not under my control, but my domain is here to stay.
+Welcome to Part 5 of my Site Reliability Engineering (SRE) series. I'm currently working as a Site Reliability Engineer, and I'm here to share what SRE is all about in this blog series.

-These are from Mastodon and LinkedIn. Have a look at my about page for my social media profiles. This list is generated with Gos, my social media platform sharing tool.
+2023-08-18 Site Reliability Engineering - Part 1: SRE and Organizational Culture
+2023-11-19 Site Reliability Engineering - Part 2: Operational Balance
+2024-01-09 Site Reliability Engineering - Part 3: On-Call Culture
+2024-09-07 Site Reliability Engineering - Part 4: Onboarding for On-Call Engineers
+2026-03-01 Site Reliability Engineering - Part 5: System Design, Incidents, and Learning (You are currently reading this)

-My about page
-https://codeberg.org/snonux/gos
+
+    ___
+   /   \     resilience
+  |  o  |  <----------  learning
+   \___/
+
+
+This time I want to share some themes that build on what we've already covered: how system design and incident analysis fit together, why observability should not be an afterthought, and how a design‑improvement loop keeps systems getting better. Let's dive in!

Table of Contents




-

July 2025


-
-

In #Golang, values are actually copied when ...


-
-In #Golang, values are actually copied when assigned (boxed) into an interface. That can have performance impact.
-
-goperf.dev/01-common-patterns/interface-boxing/
-
-

Same experiences I had, but it's a time saver. ...


-
-Same experiences I had, but it's a time saver. and when done correctly, those tools are amazing: #llm #coding #programming
-
-lucumr.pocoo.org/2025/06/21/my-first-ai-library/
-
-

We (programmers) all use them (I hope): ...


+

System Design and Incident Analysis



-We (programmers) all use them (I hope): language servers. LSP stands for Language Server Protocol, which standardizes communication between coding editors or IDEs and language servers, facilitating features like autocompletion, refactoring, linting, error-checking, etc.... It's interesting to look under the hood a little bit to see how your code editor actually communicates with a language server. #LSP #coding #programming
+In my experience, a big chunk of SRE work revolves around system design and incident analysis. The thing that really matters is whether your system can contain cascading failures—because if it can't, one bad component can take everything down.

-packagemain.tech/p/understanding-the-language-server-protocol
+

Resilience and cascading failures



-

Shells of the early unices didnt understand ...


+What I've seen work well is thinking about resilience early—at design time, not after the first outage. You look for the weak points, address them before production, and try to keep the blast radius small when (not if) something fails.

-Shells of the early unices didnt understand file globbing, that was done by the external glob command! #unix #history #shell
+

Learning from incidents



-utcc.utoronto.ca/%7Ecks/space/blog/unix/EtcGlobHistory
+When incidents do happen, their analysis is a goldmine. Every incident exposes gaps—whether in tooling (ops tools that aren't up to the job) or in skills (engineers missing critical know-how). Blaming "human error" doesn't help. The job is to dig into root causes and fix the system. Postmortems that focus on customer impact help us distil lessons and make the system more robust so we're less likely to repeat the same failure.

-

I've picked up a few techniques from this blog ...


+System design and incident analysis form a feedback loop: we improve the design based on what we learn from incidents, and a better design reduces the impact of the next one.

-I've picked up a few techniques from this blog post and found them worth sharing here: #ai #llm #prompting #techniques
+

Observability: Don't leave it for when it's too late



-cracking-ai-engineering.com/writing/2025/07/07/four-prompting-paradigms/
+Here's something I've seen over and over: teams agree that "we need better observability" when they're already in the middle of an incident—and by then it's too late. Observability is always an afterthought compared to product features. But you really need it in place before things go wrong. Tools that can query high-cardinality data and give you granular insight into what's happening—that's what saves you when chaos hits. So invest in it early. Trust me on this one.

-

I've published the sixth part of my "Kubernetes ...


+

The iterative spirit



-I've published the sixth part of my "Kubernetes with FreeBSD" blog series. This time, I set up the storage, which will be used with persistent volume claims later on in the Kubernetes cluster. Have a lot of fun! #freebsd #nfs #ha #zfs #zrepl #carp #kubernetes #k8s #k3s #homelab
+We also accept that system design is never "done." We refine it based on real-world performance, incident learnings, and changing needs. Every incident is a chance to learn and improve; the emphasis is on learning, not blame. SREs work with developers, backend teams, and incident response so that the whole system keeps getting better. It's never perfect, but that's kind of the point.

-foo.zone/gemfeed/2025-07-14-f3s-kubernetes-with-freebsd-part-6.gmi (Gemini)
-foo.zone/gemfeed/2025-07-14-f3s-kubernetes-with-freebsd-part-6.html
+

Book tips



-

The book "Coders at Work" offers a fascinating ...


+If you want to go deeper, here are a few books I can recommend:

-The book "Coders at Work" offers a fascinating glimpse into how programming legends emerged in the early days of computing. I especially enjoyed the personal stories and insights. It would be great to see a new edition reflecting today’s AI and LLM revolution—so much has changed since!
+
    +
  • 97 Things Every SRE Should Know: Collective Wisdom from the Experts by Emily Stolarsky and Jaime Woo
  • +
  • Site Reliability Engineering: How Google Runs Production Systems by Jennifer Petoff, Niall Murphy, Betsy Beyer, and Chris Jones
  • +
  • Implementing Service Level Objectives by Alex Hidalgo
  • +

+E-Mail your comments to paul@nospam.buetow.org :-)

-www.goodreads.com/book/show/6713575-coders-at-work
-
-

For me, that's all normal. Couldn't imagine a ...


-
-For me, that's all normal. Couldn't imagine a simpler job. #software
-
-0x1.pt/2025/04/06/the-insanity-of-being-a-software-engineer/
-
-

This is similar to my #dtail project. It got ...


-
-This is similar to my #dtail project. It got some features, which dtail doesnt, and dtail has some features, which #nerdlog hasnt. But the principle is the same, both tools don't have a centralised log store and both use SSH to connect to the servers (sources of the logs) directly.
+Back to the main site
+
+
+
+ + Loadbars 0.13.0 released + + gemini://foo.zone/gemfeed/2026-03-01-loadbars-0.13.0-released.gmi + 2026-03-01T00:00:00+02:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + Loadbars is a real-time server load monitoring tool. It connects to one or more Linux hosts via SSH and shows CPU, memory, network, load average, and disk I/O as vertical colored bars in an SDL window. You can run it locally or point it at your servers and see what's happening right now — like `top` or `vmstat`, but visual and across multiple hosts at once. + +
+

Loadbars 0.13.0 released



-github.com/dimonomid/nerdlog
+Published at 2026-03-01T00:00:00+02:00

-

I also feel the most comfortable in the ...


+Loadbars is a real-time server load monitoring tool. It connects to one or more Linux hosts via SSH and shows CPU, memory, network, load average, and disk I/O as vertical colored bars in an SDL window. You can run it locally or point it at your servers and see what's happening right now — like top or vmstat, but visual and across multiple hosts at once.

-I also feel the most comfortable in the #terminal. There are a few high-level tools where it doesn't make always a lot of sense like web-browsing most of the web, but for most of the things I do, I prefer the terminal. I think it's a good idea to have a terminal-based interface for most of the things you do. It makes it easier to automate things and to work with other tools.
+Loadbars in action

-lambdaland.org/posts/2025-05-13_real_programmers/
+Loadbars can connect to hundreds of servers in parallel; the GIF above doesn't do it justice — at scale you get a wall of bars that makes it easy to spot outliers and compare hosts at a glance.

-

I have been enjoying lately as an alternative ...


+Loadbars on Codeberg

-I have been enjoying lately as an alternative TUI to Claude Code CLI. It is a 100% open-source agentic coding tool, which supports all models from including local ones (e.g. DeepSeek), and has got some nice tweaks like side-by-side diffs and you can also use your favourite text $EDITOR for prompt editing! Highly recommend! #llm #coding #programming #agentic #ai
+

Table of Contents



-opencode.ai
-models.dev
+
+

What Loadbars is (and isn't)



-

Jonathan's reflection of 10 years of ...


+Loadbars shows the current state only. It is not a tool for collecting loads and drawing graphs for later analysis. There is no history, no recording, no database. Tools like Prometheus or Grafana require significant setup before producing results. Loadbars lets you observe the current state immediately: one binary, SSH (or local), and you're done.

-Jonathan's reflection of 10 years of programming!
+
+┌─ Loadbars 0.13.0 ─────────────────────────────────────────┐
+│                                                           │
+│  ████  ████  ████  ██  ████  ████  ████  ██  ░░██  ░░██   │
+│  ████  ████  ████  ██  ████  ████  ████  ██  ░░██  ░░██   │
+│  ████  ████  ████  ██  ████  ████  ████  ██  ░░██  ░░██   │
+│   CPU   cpu0  cpu1  mem  CPU   cpu0  cpu1  mem  net   net │
+│  └──── host1 ────┘      └──── host2 ────┘                 │
+└───────────────────────────────────────────────────────────┘
+

-jonathan-frere.com/posts/10-years-of-programming/
+

Use cases



-

Some neat zero-copy #Golang tricks here ...


+
    +
  • Deployments and rollouts: watch CPU, memory, and network across app servers or nodes while you deploy. Spot the one that isn't coming up or is stuck under load.
  • +
  • Load testing: run your load tool against a cluster and see which hosts (or cores) are saturated, whether memory or disk I/O is the bottleneck, and how load spreads.
  • +
  • Quick health sweep: no dashboards set up yet? SSH to a handful of hosts and run Loadbars. You get an instant picture of who's busy, who's idle, and who's swapping.
  • +
  • Comparing hosts: side-by-side bars make it easy to see if one machine is hotter than the rest (e.g. after a config change or migration).
  • +
  • Local tuning: run loadbars --hosts localhost while you benchmark or stress a single box; the bars and load-average view help correlate activity with what you're doing.
  • +

+

What's new since the Perl version



-Some neat zero-copy #Golang tricks here
+The original Loadbars (Perl + SDL, ~2010–2013) had CPU, memory, network, ClusterSSH, and a config file. The Go rewrite and subsequent releases added the following. Why each one matters:

-goperf.dev/01-common-patterns/zero-copy/
+
    +
  • Load average bars: the Perl version had no load average. Now you get 1/5/15-minute load per host. Useful because load average is the classic "how queued is this box" signal — you see saturation and trends at a glance without reading numbers.
  • +

+
    +
  • Disk I/O bars: disk was invisible in the Perl version. You now get read/write throughput (and optionally utilization %) per host or per device. Whole-disk devices only (partitions, loop, ram, zram, and device-mapper are excluded). Useful when you need to tell "is this slow because of CPU or because of disk?" — especially with many hosts, one disk-heavy host stands out. Disk smoothing (config diskaverage, hotkeys b/x) lets you tune how much the bars are averaged.
  • +

+
    +
  • Extended peak line on CPU: a 1px line shows max system+user over the last N samples. Useful to see short spikes that the stacked bar might smooth out, so you don't miss bursty load.
  • +

+
    +
  • Tooltips and host highlight: hover the mouse over any bar to see a tooltip with exact values (CPU %, memory, network, load, or disk depending on bar type). The hovered host's bars are highlighted (inverted) so you can tell which host you're over. Useful when you have hundreds of bars and want to read a specific number or confirm which host a bar belongs to.
  • +

+
    +
  • GuestNice in CPU bars: CPU bars now show GuestNice as a lime green segment (above Nice). One more breakdown for virtualized or container workloads.
  • +

+
    +
  • Version in window title: the default SDL title is "Loadbars <version> (press h for help on stdout)". Override with --title when you need a custom label.
  • +

+
    +
  • Global average CPU line (key g): a single red line across all hosts at the fleet-average CPU. Useful when you have hundreds of bars: you instantly see which hosts are above or below average without comparing bar heights in your head.
  • +

+
    +
  • Global I/O average line (key i): same idea for iowait+IRQ. Useful to spot which hosts are waiting on I/O more than the rest — quick way to find the disk-bound or interrupt-heavy machines.
  • +

+
    +
  • Host separator lines (key s): a thin red vertical line between each host's bars. Useful at scale so you don't lose track of where one host ends and the next begins when the window is full of bars.
  • +

+
    +
  • Scale reset (key r): reset the auto-scale for load and disk back to the floor. Useful after a big spike so the bars don't stay compressed for the rest of the session.
  • +

+
    +
  • Toggle CPU off (key 1 cycles through aggregate → per-core → off): the Perl version didn't let you turn CPU bars off. Useful when you want to focus only on memory, network, load, or disk and reduce clutter.
  • +

+
    +
  • maxbarsperrow: wrap bars into multiple rows instead of one long row. Useful with many hosts so the window doesn't become impossibly wide; you get a grid and can still scan everything.
  • +

+
    +
  • maxwidth: cap on window width in pixels (default 1900). Stops the window growing unbounded with many hosts; use together with maxbarsperrow for a predictable layout.
  • +

+
    +
  • Startup visibility flags: --showmem, --shownet, --showload, --extended, --cpumode, --diskmode (and friends) let you start with the bars you care about already on. Useful so you don't have to press 2, 3, 4, 5 every time.
  • +

+
    +
  • Window title (--title): set the SDL window title. Useful when you run several Loadbars windows (e.g. one per cluster or environment) and need to tell them apart in your taskbar or window list.
  • +

+
    +
  • SSH options (--sshopts): pass extra flags to ssh (e.g. ConnectTimeout, ProxyJump). Useful on locked-down or jump-host setups so Loadbars works without changing your global SSH config for a one-off session.
  • +

+
    +
  • hasagent: skip extra SSH agent checks when you know the key is already loaded. Useful to avoid startup delay or warnings when you've already run ssh-add and are monitoring many hosts.
  • +

+
    +
  • Config file covers every option: any flag from --help can be set in ~/.loadbarsrc (no leading --). Perl had a config but the Go version supports the full set. Useful for reproducible setups and sharing.
  • +

+
    +
  • Positional host arguments: you can run loadbars server1 server2 without --hosts. Convenience when you only have a few hosts.
  • +

+
    +
  • macOS as client: run the Loadbars binary on a Mac and connect to Linux servers via SSH. The Perl version was Linux-only. Useful to watch production from a laptop without a Linux VM or second machine.
  • +

+
    +
  • Single static binary: no Perl runtime, no SDL Perl modules, no CPAN. Useful for deployment — copy one file to a jump host or new machine and run it.
  • +

+
    +
  • Unit tests: mage test (or go test). The Go version has proper tests; useful for development and catching regressions.
  • +

+
    +
  • Window resize (arrow keys): resize the window with the keyboard (left/right = width, up/down = height). Useful to fit more or fewer bars on screen without touching the mouse. (The Perl version had mouse-based resize; Go uses arrow keys.)
  • +

+
    +
  • Hundreds of hosts in parallel: the Go implementation connects to all hosts concurrently and keeps polling without blocking. The Perl version struggled with many hosts. Useful for large fleets; you get a real "wall of bars" instead of a subset.
  • +

+

Core features



-

What was it like working at GitLab? A scary ...


+

Load average bars



-What was it like working at GitLab? A scary moment was the deletion of the gitlab.com database, though fortunately, there was a six-hour-old copy on the staging server. More people don't necessarily produce better results. Additionally, Ruby's metaprogramming isn't ideal for large projects. A burnout. And many more insights....
+Press 4 or l to toggle. Each host gets a bar: teal fill (1-min load), yellow 1px line (5-min), white 1px line (15-min). Scale: auto (floor 2.0) or fixed with --loadmax N. Press r to reset auto-scale.

-yorickpeterse.com/articles/what-it-was-like-working-for-gitlab/
+

Disk I/O bars



-

I have learned a lot from the Practical #AI ...


+Press 5 to toggle: aggregate (all whole-disk devices per host) → per-device → off. Partitions, loop, ram, zram, and device-mapper are excluded. Purple fill from top = read, darker purple from bottom = write. Extended mode (e) adds a 3px disk-utilization line. Config: diskmode, diskmax, diskaverage. b/x change disk average samples.

-I have learned a lot from the Practical #AI #podcast, especially from episode 312, which discusses the #MCP (model context protocol). Are there any MCP servers you plan to use or to build?
+

Global reference lines and options



-practicalai.fm/312
+g: global average CPU line (1px red). i: global I/O average line (1px pink). s: host separator lines (1px red). Other options: --maxbarsperrow N, --title, --sshopts, --hasagent. Hotkeys m/n mirror 2/3 for memory and network. Hover over a bar for a tooltip with exact values and host highlight.

-

August 2025


+

CPU monitoring



-

At the end of the article it's mentione that ...


+CPU usage as vertical stacked bars: System (blue), User (yellow), Nice (green), GuestNice (lime green), Idle (black), IOwait (purple), IRQ/SoftIRQ (white), Guest/Steal (red). Press 1 for aggregate vs. per-core. Press e for extended mode (1px peak line: max system+user over last N samples).

-At the end of the article it's mentione that it's difficult to stay in the zone when AI does the coding for you. I think it's possible to stay in the zon, but only when you use AI surgically. #llm #ai #programming
+

Memory and network



-newsletter.pragmaticengineer.com/p/cur..-..email=true&r=4ijqut&triedRedirect=true
+
    +
  • 2 / m: memory — left half RAM (dark grey/black), right half Swap (grey/black) per host
  • +
  • 3 / n: network — RX (top, light green) and TX (bottom) summed over non-loopback interfaces. Red bar = no non-lo interface. Use --netlink or f/v for link speed (utilization %). Default gbit.
  • +

+

All hotkeys



-

Great blog post a out #OpenBSDAmsterdam, of ...


+
+Key     Action
+─────   ──────────────────────────────────────────────────
+1       Toggle CPU (aggregate / per-core / off)
+2 / m   Toggle memory bars
+3 / n   Toggle network bars
+4 / l   Toggle load average bars
+5       Toggle disk I/O (aggregate / per-device / off)
+r       Reset load and disk auto-scale peaks
+e       Toggle extended (peak line on CPU; disk util line)
+g       Toggle global average CPU line
+i       Toggle global I/O average line
+s       Toggle host separator lines
+h       Print hotkey list to stdout
+q       Quit
+w       Write current settings to ~/.loadbarsrc
+a / y   CPU average samples up / down
+d / c   Net average samples up / down
+b / x   Disk average samples up / down
+f / v   Link scale up / down
+Arrows  Resize window
+

-Great blog post a out #OpenBSDAmsterdam, of which I am a customer too for some years now. #OpenBSD
+

SSH and config



-www.tumfatig.net/2025/cruising-a-vps-at-openbsd-amsterdam/
+Connect with public key auth; hosts need bash and /proc (Linux). No agent needed on the remote side.

-

Interesting. #llm #ai #slowdown ...


+
+loadbars --hosts server1,server2,server3
+loadbars --hosts root@server1,root@server2
+loadbars servername{01..50}.example.com --showcores 1
+loadbars --cluster production
+

-Interesting. #llm #ai #slowdown
+Config: ~/.loadbarsrc (key=value, no --; use # for comments). Any --help option. Press w to save current settings.

-m.slashdot.org/story/444304
+

Building and platforms



-

With the help of genai, I could generate this ...


+Go 1.25+ and SDL2. Install SDL2 (e.g. sudo dnf install SDL2-devel on Fedora, brew install sdl2 on macOS), then:

-With the help of genai, I could generate this neat small showcase site, of many of my small to medium sized side projects. The projects descriptions were generated by Claude Code CLI with Sonnet 4 based on the git repo contents. The page content by gitsyncer, a tool I created (listed on the showcase page as well) and gemtexter, which did the HTML generation part (another tool I wrote, listed on the showcase page as well). The stats seem neat, over time a lot of stuff starts to pile up! With the age of AI (so far, only 8 projects were created AI-assisted), I think more projects will spin up faster (not just for me, but for everyone working on side projects). I have more (older) side projects archived on my local NAS, but they are not worth digging out... 📦 Total Projects: 55 📊 Total Commits: 10,379 📈 Total Lines of Code: 252,969 📄 Total Lines of Documentation: 24,167 💻 Languages: Java (22.4%), Go (17.6%), HTML (14.0%), C++ (8.9%), C (7.3%), Perl (6.3%), Shell (6.3%), C/C++ (5.8%), XML (4.6%), Config (1.5%), Ruby (1.1%), HCL (1.1%), Make (0.7%), Python (0.6%), CSS (0.6%), JSON (0.3%), Raku (0.3%), Haskell (0.2%), YAML (0.2%), TOML (0.1%) 📚 Documentation: Text (47.4%), Markdown (38.4%), LaTeX (14.2%) 🤖 AI-Assisted Projects: 8 out of 55 (14.5% AI-assisted, 85.5% human-only) 🚀 Release Status: 31 released, 24 experimental (56.4% with releases, 43.6% experimental) #llm #genai #showcase #coding #programming
+
+mage build
+./loadbars --hosts localhost
+mage install   # to ~/go/bin
+mage test
+

-foo.zone/about/showcase.gmi (Gemini)
-foo.zone/about/showcase.html
+Tested on Fedora Linux 43 and common distros; macOS as client to remote Linux only (no local macOS monitoring — no /proc).

-

I tinkered a bit with local LLMs for coding: ...


+E-Mail your comments to paul@nospam.buetow.org :-)

-I tinkered a bit with local LLMs for coding: #llm #local #ai #coding #ollama #qwen #deepseek #HelixEditor #LSP #codecompletion #aider
+Back to the main site
+
+
+
+ + My desk rack: DeskPi RackMate T0 + + gemini://foo.zone/gemfeed/2026-02-22-my-desk-rack.gmi + 2026-02-21T11:17:15+02:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + On my desk sits a small rack that keeps audio gear, power, and network in one place: the DeskPi RackMate T0. Here's what lives in it and how it's wired. + +
+

My desk rack: DeskPi RackMate T0



-foo.zone/gemfeed/2025-08-05-local-coding-llm-with-ollama.gmi (Gemini)
-foo.zone/gemfeed/2025-08-05-local-coding-llm-with-ollama.html
+Published at 2026-02-21T11:17:15+02:00

-

Good stuff: 10 years of functional options and ...


-
-Good stuff: 10 years of functional options and key lessons Learned along the way #golang
-
-www.bytesizego.com/blog/10-years-functional-options-golang
-
-

Top 5 performance boosters #golang ...


-
-Top 5 performance boosters #golang
-
-blog.devtrovert.com/p/go-performance-boosters-the-top-5
-
-

This person found the balance.. although I ...


-
-This person found the balance.. although I would use a different code editor: Why Open Source Maintainers Thrive in the LLM Era via @wallabagapp #ai #llm #coding #programming
-
-mikemcquaid.com/why-open-source-maintainers-thrive-in-the-llm-era/
-
-

Let's rewrite all slow in #assembly, surely ...


-
-Let's rewrite all slow in #assembly, surely it's not just about the language but also about the architecture and the algorithms used. Still, impressive.
-
-x.com/FFmpeg/status/1945478331077374335
-
-

How to store data forever? #storage ...


-
-How to store data forever? #storage #archiving
-
-drewdevault.com/2020/04/22/How-to-store-data-forever.html
-
-

No wonder, that almost everyone doing something ...


-
-No wonder, that almost everyone doing something with AI is releasing their own aentic coding tool now. As it's so dead simple to write one. #ai #llm #agenticcoding
-
-ampcode.com/how-to-build-an-agent
-
-

Another drawback of running load tests in a ...


-
-Another drawback of running load tests in a pre-prod environment is that it is not always possible to reproduce production load, especially in a complex environment. I personally prefer a combination of pre-prod load testing, production canaries, and gradual production deployment. What are your thoughts? #sre #loadtesting #lt #loadtesting
-
-thefridaydeploy.substack.com/p/load-testing-prepare-for-the-growth
-
-

Interesting read Learnings from two years of ...


-
-Interesting read Learnings from two years of using AI tools for software engineering #ai #llm #genai
-
-newsletter.pragmaticengineer.com/p/two-years-of-using-ai
-
-

Neat little story a school girl writing her ...


-
-Neat little story a school girl writing her first (and only) malware and have it infected her school.
+
+    ┌─────────────────┐
+    │   ●  ●  AIR     │  ← air-quality monitor
+    ├─────────────────┤
+    │  ╔═╗  CD        │  ← CD transport
+    │  ║ ◉║  S/PDIF   │
+    │  ╚═╝            │
+    ├─────────────────┤
+    │  ▓▓▓  USB PWR   │  ← PinePower
+    ├─────────────────┤
+    │  ░░░  (phones)  │  ← 1U "empty" shelf
+    ├─────────────────┤
+    │  ◉◉◉◉◉  LAN     │  ← 5-port switch
+    ├─────────────────┤
+    │  [E50] [L50]    │  ← DAC + AMP
+    │   DAC   AMP     │
+    └─────────────────┘
+         RackMate T0
+

-ntietz.com/blog/that-time-i-wrote-malware/
+On my desk sits a small rack that keeps audio gear, power, and network in one place: the DeskPi RackMate T0. Here's what lives in it and how it's wired.

-

Happy, that I am not yet obsolete! #llm ...


+DeskPi RackMate T0

-Happy, that I am not yet obsolete! #llm #sre
+DeskPi RackMate T0 on the desk

-clickhouse.com/blog/llm-observability-challenge
+

Table of Contents



-

September 2025


+
+

What's in the rack (top to bottom)



-

Loving this as well: #slackware #linux ...


+

Top: CD transport and air-quality monitor



-Loving this as well: #slackware #linux
+At the top is the S.M.S.L PL200T, a CD transport with anti-vibration design. It outputs digital audio over coaxial S/PDIF into the DAC in the rack. On top of the transport sits a small air-quality monitor so I can keep an eye on the room.

-www.osnews.com/story/142145/what-makes-slackware-different/
+S.M.S.L PL200T CD Transport

-

Some #fun: Random Weird Things Part III blog ...


+CD transport and air-quality monitor on top

-Some #fun: Random Weird Things Part III blog post
+A CD transport is not the same as a CD player. A CD player has a built-in DAC (digital-to-analog converter) and outputs analogue audio—you plug it into an amp or active speakers and you're done. A CD transport only reads the disc and outputs a digital signal (e.g. coaxial or optical S/PDIF). It has no DAC. You feed that digital stream into an external DAC, which then does the conversion. The idea is to separate the mechanical part (spinning the disc, reading the pits) from the conversion stage, so you can use one DAC for CDs, streaming, and other sources, and upgrade or swap the transport and the DAC independently.

-foo.zone/gemfeed/2025-08-15-random-weird-things-iii.gmi (Gemini)
-foo.zone/gemfeed/2025-08-15-random-weird-things-iii.html
+In the age of streaming and files, putting on a real CD is still a pleasure. You own the disc and the sound isn't at the mercy of a subscription or a server. You pick an album, put it in, and listen from start to finish—no endless scrolling, no algorithm. The format is fixed (16-bit/44.1 kHz), so what you hear is consistent and often better than heavily compressed streams. And there's something satisfying about the ritual: handling the case, the disc, and the artwork instead of tapping a screen.

-

Yes, write more useless software. I agree that ...


+

Power and charging: PinePower Desktop + 1U shelf



-Yes, write more useless software. I agree that play has a vital role in learning and experimentation. Also, programming is a lot of fun this way. I've learned programming mostly by writing useless software or almost useful tools for myself, but I can now apply all that knowledge to real work as well. #coding #programming
+Below that is the PinePower Desktop from Pine64, used as a desktop power and USB charging station for phones and other devices. The rack has one free 1U space under the PinePower where I put the devices that are charging, so cables and gadgets stay in one spot.

-ntietz.com/blog/write-more-useless-software/
+PinePower Desktop (Pine64)

-

I learned a lot from this #OpenBSD #relayd ...


+

Network: 5-port mini switch



-I learned a lot from this #OpenBSD #relayd talk, and I already put the information into production! I know the excellent OpenBSD manual pages document everything, but it is a bit different when you see it presented in a talk.
+Next is a compact 5-port Ethernet switch. The uplink goes to a wall socket behind the desk; the other ports feed the computer, laptop, and anything else that needs wired LAN on the desk. Next to the switch you can see my Nothing ear buds.

-www.youtube.com/watch?v=yW8QSZyEs6E
+Nothing ear buds

-

Six weeks of claude code


+

Bottom: DAC and headphone amp



-blog.puzzmo.com/posts/2025/07/30/six-weeks-of-claude-code/
+At the bottom of the rack are the Topping E50 (DAC) and Topping L50 (headphone amplifier). The E50 converts digital to analogue; the L50 drives the headphones. They drive my Hifiman Sundara headphones.

-

It's good that there is now a truly open-source ...


+Topping E50 DAC
+Topping L50 Headphone Amplifier
+Hifiman Sundara

-It's good that there is now a truly open-source LLM model; I'm just wondering how it will perform. The difference compared to other open models is that the others only provide open weights, but you can't reproduce the training! That issue would be solved with this Swiss model. I will definitively have a look! #llm #opensource #privacy
+

Music sources



-m.slashdot.org/story/446310
+
    +
  • CD transport: coaxial (S/PDIF) from the S.M.S.L PL200T into the Topping E50.
  • +
  • Streaming: USB from the desktop computer and/or laptop on the desk into the E50, so I can play from either machine.
  • +

+

Left side: cable management



-

Have to try this at some point ...


+On the left of the rack are two cable holders to keep power and signal cables tidy.

-Have to try this at some point, troubleshooting #k8s with the help of #genai
+

Next to the rack



-blog.palark.com/k8sgpt-ai-troubleshooting-kubernetes/
+Right beside the rack is my Supernote Nomad, which I use for notes and reading and have written about elsewhere on this blog. It’s the small tablet-shaped device on the right side of the rack.

-

I could not agree more. For me, a personal ...


+Supernote Nomad (small tablet on the right of the rack)
+Supernote Nomad (product page)

-I could not agree more. For me, a personal (tech oriented) website is not a business contact card, but a playground to experience and learn with/about technologies. The Value of a Personal Site #website #personal #tech
+Front view of the rack
+Back of the rack

-atthis.link/blog/2021/personalsite.html
+

Bedside: another HiFi setup



-

The true enterprise developer can write Java in ...


+I have a second setup for high-res listening next to my bed. On the nightstand sit my FiiO K13 R2R (an R2R DAC/amp) and my Denon AH-D9200 headphones. I connect the K13 to my laptop via USB and use it for high-resolution files and streaming when I'm not at the desk.

-The true enterprise developer can write Java in any language. #java #programming
+Fiio K13 R2R
+Denon AH-D9200

-

#fx is a neat little tool for viewing JSON ...


+That's the full desk rack: CD transport and air monitor on top, PinePower and charging shelf, switch, then Topping E50 and L50 at the bottom, with the Hifiman Sundara as the main output and the Supernote Nomad sitting next to it. I hope that you found this interesting.

-#fx is a neat little tool for viewing JSON files!
+E-Mail your comments to paul@nospam.buetow.org :-)
+
+
+
+ + A tmux popup editor for Cursor Agent CLI prompts + + gemini://foo.zone/gemfeed/2026-02-02-tmux-popup-editor-for-cursor-agent-prompts.gmi + 2026-02-01T20:24:16+02:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + I spend some time in Cursor Agent (the CLI version of the Cursor IDE, I don't like really the IDE), and I also jump between Claude Code CLI, Ampcode, Gemini CLI, OpenAI Codex CLI, OpenCode, and Aider just to see how things are evolving. But for the next month I'll be with Cursor Agent. + +
+

A tmux popup editor for Cursor Agent CLI prompts



-fx.wtf
+Published at 2026-02-01T20:24:16+02:00

-

I wish I had as much time as this guy. He ...


+...and any other TUI based application

-I wish I had as much time as this guy. He writes entire operating systems, including a Unix clone called "Bunnix" in a month. He is also the inventor of the Hare programming language (If I am not wrong). Now, he is also creating a new shell, primarily for his other operating systems and kernels he is working on. #shell #unix #programming #operatingsystem #bunnix #hare
+

Table of Contents



-drewdevault.com/2023/04/18/2023-04-18-A-new-shell-for-Unix.html
+
+

Why I built this



-

What exactly was the point of [ “x$var” = ...


+I spend some time in Cursor Agent (the CLI version of the Cursor IDE, I don't like really the IDE), and I also jump between Claude Code CLI, Ampcode, Gemini CLI, OpenAI Codex CLI, OpenCode, and Aider just to see how things are evolving. But for the next month I'll be with Cursor Agent.

-What exactly was the point of [ “x$var” = “xval” ]? #bash #shell #posix #sh #history
+https://cursor.com/cli

-www.vidarholen.net/contents/blog/?p=1035
+Short prompts are fine in the inline input, but for longer prompts I want a real editor: spellcheck, search/replace, multiple cursors, and all the Helix muscle memory I already have.

-

Neat #ZFS feature (here #FreeBSD) which I ...


+Cursor Agent has a Vim editing mode, but not Helix. And even in Vim mode I can't use my full editor setup. I want the real thing, not a partial emulation.

-Neat #ZFS feature (here #FreeBSD) which I didn't know of before: Pool snapshots, which are different to snapshots of individual data sets:
+https://helix-editor.com
+https://www.vim.org
+https://neovim.io

-it-notes.dragas.net/2024/07/01/enhanci..-..d-stability-with-zfs-pool-checkpoints/
+So I built a tiny tmux popup editor. It opens $EDITOR (Helix for me), and when I close it, the buffer is sent back into the prompt. It sounds simple, but it feels surprisingly native.

-

Longer hours help only short term. About 40 ...


+This is how it looks like:

-Longer hours help only short term. About 40 hours #productivity
+Popup editor in action

-thesquareplanet.com/blog/about-40-hours/
+

What it is



-

You could also use #bpf instead of #strace, ...


+The idea is straightforward:

-You could also use #bpf instead of #strace, albeit modern strace uses bpf if told so: How to use the new Docker Seccomp profiles
+
    +
  • A tmux key binding prefix-e opens a popup overlay near the bottom of the screen.
  • +
  • The popup starts $EDITOR on a temp file.
  • +
  • When I exit the editor, the script sends the contents back to the original pane with tmux send-keys.
  • +

+It also pre-fills the temp file with whatever is already typed after Cursor Agent's prompt, so I can continue where I left off.

-blog.jessfraz.com/post/how-to-use-new-docker-seccomp-profiles/
+

How it works (overview)



-

Some great things are approaching #bhyve on ...


+This is the tmux binding I use (trimmed to the essentials):

-Some great things are approaching #bhyve on #FreeBSD and VM Live Migration – Quo vadis? #freebsd #virtualization #bhyve
+
+bind-key e run-shell -b "tmux display-message -p '#{pane_id}'
+  > /tmp/tmux-edit-target-#{client_pid} \;
+  tmux popup -E -w 90% -h 35% -x 5% -y 65% -d '#{pane_current_path}'
+  \"~/scripts/tmux-edit-send /tmp/tmux-edit-target-#{client_pid}\""
+

-gyptazy.com/bhyve-on-freebsd-and-vm-live-migration-quo-vadis/
+

Workflow diagram



-

Another synchronization tool part of the ...


+This is the whole workflow:

-Another synchronization tool part of the #golang std lib, singleflight! Used to not overload external resources (like DBs) with N concurrent requests. Useful!
+
+┌────────────────────┐   ┌───────────────┐   ┌─────────────────────┐   ┌─────────────────────┐
+│ Cursor input box   │-->| tmux keybind  │-->| popup runs script   │-->| capture + prefill   │
+│ (prompt pane)      │   │ prefix + e    │   │ tmux-edit-send      │   │ temp file           │
+└────────────────────┘   └───────────────┘   └─────────────────────┘   └─────────────────────┘
+                                                                                 |
+                                                                                 v
+┌────────────────────┐   ┌────────────────────┐   ┌────────────────────┐   ┌────────────────────┐
+│ Cursor input box   │<--| send-keys back     |<--| close editor+popup |<--| edit temp file     |
+│ (prompt pane)      │   │ to original pane   │   │ (exit $EDITOR)     │   │ in $EDITOR         │
+└────────────────────┘   └────────────────────┘   └────────────────────┘   └────────────────────┘
+

-victoriametrics.com/blog/go-singleflight/index.html
+And this is how it looks like after sending back the text to the Cursor Agent's input:

-

Too many open files #linux ...


+Prefilled prompt text

-Too many open files #linux
+And here is the full script. It is a bit ugly since it's shell (written with Cursor Agent with GPT-5.2-Codex), and I might (let) rewrite it in Go with proper unit tests, config-file, multi-agent support and release it once I have time. But it works well enough for now.

-mattrighetti.com/2025/06/04/too-many-files-open.html
+Update 2026-02-08: This functionality has been integrated into the hexai project (https://codeberg.org/snonux/hexai) with proper multi-agent support for Cursor Agent, Claude Code CLI, and Ampcode. The hexai version includes unit tests, configuration files, and better agent detection. While still experimental, it's more robust than this shell script. See the hexai-tmux-edit command for details.

-

Just posted Part 4 of my #Bash #Golf ...


+https://codeberg.org/snonux/hexai

-Just posted Part 4 of my #Bash #Golf series:
-
-foo.zone/gemfeed/2025-09-14-bash-golf-part-4.gmi (Gemini)
-foo.zone/gemfeed/2025-09-14-bash-golf-part-4.html
-
-

#Perl is like a swiss army knife, as one of ...


-
-#Perl is like a swiss army knife, as one of the comments states:
-
-developers.slashdot.org/story/25/09/14..-..10th-most-popular-programming-language
-
-

Personally, mainly working with colorless ...


-
-Personally, mainly working with colorless languages like #ruby and #golang, now slowly understand the pain ppl would have w/ Rust or JS. It wasn't just me when I got confused writing that Grafana DS plugin in TypeScript...
-
-jpcamara.com/2024/07/15/ruby-methods-are.html
-
-

How do GPUs work? Usually, people only know ...


-
-How do GPUs work? Usually, people only know about CPUs... ... I got the gist, but #gpu #cpu
-
-blog.codingconfessions.com/p/gpu-computing
-
-

For unattended upgrades you must have a good ...


-
-For unattended upgrades you must have a good testing (or canary) strategy. #sre #reliability #downtime #ubuntu #systemd #kubernetes
-
-newsletter.pragmaticengineer.com/p/why-reliability-is-hard-at-scale
-
-

Surely, in the age of #AI and #LLM, people ...


-
-Surely, in the age of #AI and #LLM, people are not writing as much code manually as before, but I don't think skills like using #Vim (or #HelixEditor) are obsolete just yet. You still need to understand what's happening under the hood, and being comfortable with these tools can make you much more efficient when you do need to edit or review code.
-
-www.youtube.com/watch?v=tW0BSgzr2AM
-
-

On #AI changes everything... ...


-
-On #AI changes everything...
-
-lucumr.pocoo.org/2025/6/4/changes/
-
-

Maps in Go under the hood #golang ...


-
-Maps in Go under the hood #golang
-
-victoriametrics.com/blog/go-map/
-
-

"A project that looks complex might just be ...


-
-"A project that looks complex might just be unfamiliar" - Quote from the Applied Go Weekly Newsletter
-
-

I must admit that partly I see myself there ...


-
-I must admit that partly I see myself there (sometimes). But it is fun :-) #tools #happy
-
-borretti.me/article/you-can-choose-tools-that-make-you-happy
-
-

Makes me think of good old times, where I ...


-
-Makes me think of good old times, where I shipped 5 times as fast.: What happens when code reviews aren’t mandatory? What happens when code reviews aren’t mandatory? via @wallabagapp #productivity #code
-
-testdouble.com/insights/when-code-reviews-arent-mandatory
-
-

Neat little blog post, showcasing various ...


-
-Neat little blog post, showcasing various methods used for generic programming before the introduction of generics. Only reflection wasn't listed. #golang
-
-bitfieldconsulting.com/posts/generics
-
-

share Didn't know, that on MacOS, besides of ...


-
-share Didn't know, that on MacOS, besides of .so (shared object files, which can be dynamically loaded as well) there is also the MacOS' native .dylib format which serves a similar purpose! #macos #dylib #so
-
-cpu.land/becoming-an-elf-lord
-
-

I think this is the way: use LLMs for code you ...


-
-I think this is the way: use LLMs for code you don't care much about and write code manually for what matters most to you. This way, most boring and boilerplate stuff can be auto-generated.
-
-registerspill.thorstenball.com/p/surely-not-all-codes-worth-it
-
-

Always enable keepalive? I'd say most of the ...


-
-Always enable keepalive? I'd say most of the time. I've seen cases, where connections weren't reused but new additional were edtablished, causing the servers to run out of worker threads #sre Always. Enable. Keepalives.
-
-www.honeycomb.io/blog/always-enable-keepalives
-
-

I just finished reading "Chaos Engineering" by ...


-
-I just finished reading "Chaos Engineering" by Casey Rosenthal—an absolute must-read for anyone passionate about building resilient systems! Chaos Engineering is not abbreaking things randomly—it's a disciplined approach to uncovering weaknesses before they become outages. SREs, this book is packed with practical insights and real-world strategies to strengthen your systems against failure. Highly recommended! #ChaosEngineering #Resilience
-
-www.oreilly.com/library/view/chaos-engineering/9781492043850/
-
-

fx is a neat and tidy command-line tool for ...


-
-fx is a neat and tidy command-line tool for interactively viewing JSON files! What I like about it is that it is not too complex (open the help with ? and it is only about one page long) but still very useful. #json #golang
-
-github.com/antonmedv/fx
-
-

Some nice #Golang tricks there ...


-
-Some nice #Golang tricks there
-
-blog.devtrovert.com/p/12-personal-go-tricks-that-transformed
-
-

October 2025


-
-

Word! What Are We Losing With AI? #llm #ai ...


-
-Word! What Are We Losing With AI? #llm #ai
-
-josem.co/what-are-we-losing-with-ai/
-
-

It's not yet time for the friday #fun, but: ...


-
-It's not yet time for the friday #fun, but: OpenOffice does not print on Tuesdays ― Andreas Zwinkau :-)
-
-beza1e1.tuxen.de/lore/print_on_tuesday.html
-
-

Finally, I retired my AWS/ECS setup for my ...


-
-Finally, I retired my AWS/ECS setup for my self-hosted apps, as it was too expensive to operate—I had to pay $20 monthly just to run pods for only a day or so each month, so I rarely used them. Now, everything has been migrated to my FreeBSD-powered Kubernetes home cluster! Part 7 of this blog series covers the initial pod deployments. #freebsd #k8s #selfhosing
-
-foo.zone/gemfeed/2025-10-02-f3s-kubernetes-with-freebsd-part-7.gmi (Gemini)
-foo.zone/gemfeed/2025-10-02-f3s-kubernetes-with-freebsd-part-7.html
-
-

A great blog post about my favourite text ...


-
-A great blog post about my favourite text editor. why even helix? #HeliEditor Now I am considering forking it myself as well :-)
-
-axlefublr.github.io/why-even-helix/
-
-

One of the more confusing parts in Go, nil ...


-
-One of the more confusing parts in Go, nil values vs nil errors: #golang
-
-unexpected-go.com/nil-errors-that-are-non-nil-errors.html
-
-

Strong engineers are pragmatic, work fast, have ...


-
-Strong engineers are pragmatic, work fast, have technical ability, dont need to be technical geniuses and believe in their ability to solve almost any problem #productivity
-
-www.seangoedecke.com/what-makes-strong-engineers-strong/
-
-

I am currently binge-listening to the Google ...


-
-I am currently binge-listening to the Google #SRE ProdCast. It's really great to learn about the stories of individual SREs and their journeys. It is not just about SREs at Google; there are also external guests.
-
-sre.google/prodcast/
-
-

Looks like a neat library for writing ...


-
-Looks like a neat library for writing script-a-like programs in #Golang. But honestly, why not directly use a scripting language like #RakuLang or #Ruby
-
-github.com/bitfield/script
-
-

Where Gen AI shines is the generation and ...


-
-Where Gen AI shines is the generation and management of YAML files... e.g. Kubernetes manifests. Who likes to write YAML files by hand? #genai #llm #ai #yaml #kubernetes #k8s
-
-

At work, everybody is replacable. Some with a ...


-
-At work, everybody is replacable. Some with a hic-up, others with none. There will always someone to step up after you leave.
-
-adamstacoviak.com/im-a-cog/
-
-

I actually would switch back to #FreeBSD as ...


-
-I actually would switch back to #FreeBSD as my main Operating System for personal use on my Laptop - FreeBSD used to be my main driver a couple of years ago when I still used "normal" PCs
-
-www.osnews.com/story/140841/freebsd-to-invest-in-laptop-support/
-
-

Amazing Print is amazing ...


-
-Amazing Print is amazing
-
-github.com/amazing-print/amazing_print
-
-

Always worth a reminde, what are bloom filters ...


-
-Always worth a reminde, what are bloom filters and how do they work? #bloom #bloomfilter #datastructure
-
-micahkepe.com/blog/bloom-filters/
-
-

Some #Ruby book notes of mine: ...


-
-Some #Ruby book notes of mine:
-
-foo.zone/gemfeed/2025-10-11-key-takeaways-from-the-well-grounded-rubyist.gmi (Gemini)
-foo.zone/gemfeed/2025-10-11-key-takeaways-from-the-well-grounded-rubyist.html
-
-

Sad story. #work #scrum #jira ...


-
-Sad story. #work #scrum #jira
-
-lambdaland.org/posts/2023-02-21_metric_worship/
-
-

One of my favorite books: "Some Thoughts on ...


-
-One of my favorite books: "Some Thoughts on Deep Work"
-
-atthis.link/blog/2020/deepwork.html
-
-

ltex-ls is great for integrating ...


-
-ltex-ls is great for integrating #LanguageTool prose checking via #LSP into your #HelixEditor! ... There is also vale-ls, which I have enabled as well. I just download ltex-ls and configure it as an LSP for your .txt and .md docs... that's it!
-
-valentjn.github.io/ltex/
-
-

supernote-tool is awesome, as I can now ...


-
-supernote-tool is awesome, as I can now download my Supernote notes on my #Linux desktop and convert them into PDFs - enables me to use the Supernote Nomad device as mine completely offline!
-
-

Fun story! :-) The case of the 500-mile email ...


-
-Fun story! :-) The case of the 500-mile email ― Andreas Zwinkau via @wallabagapp #unix #sunos #sendmail
-
-beza1e1.tuxen.de/lore/500mile_email.html
-
-

Operating myself some software over 10 years of ...


-
-Operating myself some software over 10 years of age for over 10 years now, this podcast really resonated with me: #podcast #software #maintainability #maintenance
-
-changelog.com/podcast/627
-
-

#git worktrees are awesome! ...


-
-#git worktrees are awesome!
-
-

LLMs for anomaly detection? "While some ...


-
-LLMs for anomaly detection? "While some ML-powered monitoring features have their place, good old-fashioned standard statistics remain hard to beat" Lessons from the pre-LLM AI in Observability: Anomaly Detection and AI-Ops vs. P99 | #llm #monitoring
-
-quesma.com/blog-detail/aiops-observability
-
-

After having heavily vibe-coded (personal pet ...


-
-After having heavily vibe-coded (personal pet projects) for 2 months other the summer, I've come back to more structured and intentional AI coding practices. Surly, it was a great learnig experiment: #llm #ai #risk #code #sre #development #genai
-
-www.okoone.com/spark/technology-innova..-..ode-is-quietly-increasing-system-risk/
-
-

Slowly, one after another, I am switching all ...


-
-Slowly, one after another, I am switching all my Go projects to Mage. Having a Makefile or Taskfile in a native Go format is so much better.
-
-magefile.org/
-
-

Some neat slice tricks for Go: #golang ...


-
-Some neat slice tricks for Go: #golang
-
-blog.devtrovert.com/p/12-slice-tricks-to-enhance-your-go
-
-

I spent way too much time on this site. It's ...


-
-I spent way too much time on this site. It's full of tools for the #terminal! Terminal Trove - The $HOME of all things in the terminal. #linux #bsd #unix #terminal #cli #tools
-
-terminaltrove.com/
-
-

I share similar experiences with #rust, but I ...


-
-I share similar experiences with #rust, but I am sure one just needs a bit more time to feel productive in it. It's not enough just to try rust out once before becoming fluent in it.
-
-m.slashdot.org/story/446164
-
-

Pipelines in Go using channels. #golang ...


-
-Pipelines in Go using channels. #golang
-
-go.dev/blog/pipelines
-
-

Some nifty #Ruby tricks: In my opinion, Ruby ...


-
-Some nifty #Ruby tricks: In my opinion, Ruby is unterrated. It's a great language even without Rails.
-
-www.rubyinside.com/21-ruby-tricks-902.html
-
-

Reflects my experience ...


-
-Reflects my experience
-
-simonwillison.net/2025/Sep/12/matt-webb/#atom-everything
-
-

I like the fact that Markdown fikes, a RCS. an ...


-
-I like the fact that Markdown fikes, a RCS. an text editor and standard unix tools like #grep and #find are all you need for taking notes digitally. I am the same :-) My favorite note-taking method
-
-unixdigest.com/articles/my-favorite-note-taking-method.html
-
-

Rich Interactive Widgets for Terminal UIs, it ...


-
-Rich Interactive Widgets for Terminal UIs, it must not always be BubbleTea #golang #terminal #widgets
-
-github.com/rivo/tview
-
-

Always fun to dig in the #Perl @Perl woods. ...


-
-Always fun to dig in the #Perl @Perl woods. Now, no more Perl 4 pseudo multi-dimensional hashes in Perl 5 (well, they are still there when you require an older version for compatibility via use flag, though)! :-)
-
-www.effectiveperlprogramming.com/2024/..-..fake-multidimensional-data-structures/
-
-

How does #virtual #memory work? #ram ...


-
-How does #virtual #memory work? #ram
-
-drewdevault.com/2018/10/29/How-does-virtual-memory-work.html
-
-

flamelens - An interactive flamegraph viewer in ...


-
-flamelens - An interactive flamegraph viewer in the terminal. - Terminal Trove
-
-terminaltrove.com/flamelens/
-
-

You can now run Ansible Playbooks and shell ...


-
-You can now run Ansible Playbooks and shell scripts from your Terraform more easily #ansible #terraform #iac
-
-danielmschmidt.de/posts/2025-09-26-terraform-actions-introduction/
-
-

For people working with #k8s, this tool is ...


-
-For people working with #k8s, this tool is useful. It lets you fuzzy find different k8s resource types and read a description about them: #kubernetes #fuzzy #cli #tools #devops
-
-github.com/keisku/kubectl-explore
-
-

November 2025


-
-

Yes, using the right #tool for the job and ...


-
-Yes, using the right #tool for the job and also learn along the way!
-
-drewdevault.com/2016/09/17/Use-the-right-tool.html
-
-

Some neat Go tricks: #golang ...


-
-Some neat Go tricks: #golang
-
-harrisoncramer.me/15-go-sublteties-you-may-not-already-know/
-
-

There are some truths in this #SRE article: ...


-
-There are some truths in this #SRE article: However, in my opinion, the more experience you have, the more you are expected to be able to resolve issues. So you can't always fallback to others. New starters are treated differently, of course. #oncall
-
-ntietz.com/blog/what-i-tell-people-new-to-oncall/.
-
-

The Go flight recorder is a tool that allows ...


-
-The Go flight recorder is a tool that allows developers to capture and analyze the execution of Go programs. It provides insights into performance, memory usage, and other runtime characteristics by recording events and metrics during the program's execution. Yet another tool why Go is awesome! #go #golang #tools
-
-go.dev/blog/flight-recorder
-
-

This is useful #golang ...


-
-This is useful #golang
-
-antonz.org/chans/
-
-

Great visually animated guide how #raft ...


-
-Great visually animated guide how #raft #consensus works
-
-thesecretlivesofdata.com/raft/
-
-

"Today’s junior devs who skip the “hard ...


-
-"Today’s junior devs who skip the “hard way” may plateau early, lacking the depth to grow into senior engineers tomorrow." ... Avoiding Skill Atrophy in the Age of AI
-
-addyo.substack.com/p/avoiding-skill-atrophy-in-the-age
-
-

I actually enjoyed readong through the #Fish ...


-
-I actually enjoyed readong through the #Fish #shell docs It's much cleaner than posix shells
-
-fishshell.com/docs/current/language.html
-
-

There can be many things which can go wrong, ...


-
-There can be many things which can go wrong, more than mentioned here: #linux
-
-notes.eatonphil.com/2025-03-27-things-that-go-wrong-with-disk-io.html
-
-

IMHO, motivation is not always enough. There ...


-
-IMHO, motivation is not always enough. There must also be some discipline. That helps then theres only a little or no motivation
-
-world.hey.com/jason/motivation-50ab8280
-
-

Have been generating those CPU flame graphs on ...


-
-Have been generating those CPU flame graphs on bare metal, so being able to use them in k8s seems to be pretty useful to me. #flamegraphs #k8s #kubernetes
-
-www.percona.com/blog/kubernetes-observability-code-profiling-with-flame-graphs/
-
-

I personally don't like the typical whiteboard ...


-
-I personally don't like the typical whiteboard coding exercises, nor do I think LeetCode is the answer. It's impossible to assess the skills of a candidate with a few interviews but it is possible to filter out the bad ones. The aim is to get an idea about the candidate and be positive about their potential. #interview #interviewing #hiring
-
-danielabaron.me/blog/reimagining-technical-interviews/
-
-

If you've wondered how CPUs and operating ...


-
-If you've wondered how CPUs and operating systems generally work and want the basics explained in an easily digestible format without going to college, have a look at CPU.land. I had a lot of fun reading it! #CPU
-
-cpu.land
-
-

And there's an unexpected winner :-) #erlang ...


-
-And there's an unexpected winner :-) #erlang #architecture
-
-freedium.cfd/https://medium.com/@codep..-..t-wasn-t-what-we-expected-67f84c79dc34
-
-

Is it it? This is it. What Is It (in Ruby 3.4)? ...


-
-Is it it? This is it. What Is It (in Ruby 3.4)? #ruby
-
-kevinjmurphy.com/posts/what-is-it-in-ruby-34/
-
-

From my recent #London trip, I've uploaded ...


-
-From my recent #London trip, I've uploaded some new Street Photography photos to my photo site All photos were post-processed using Open-Source software including #Darktable and #Shotwell. The site itself was generated with a simple #bash script! Not all photos are from London, just the recent additions were.
-
-irregular.ninja!
-
-

Agreed, you should make your own programming ...


-
-Agreed, you should make your own programming language, even if it's only for the sake of learning. I also did so over a decade ago. Mine was called Fype - "For Your Program Execution"
-
-ntietz.com/blog/you-should-make-a-new-terrible-programming-language/
-foo.zone/gemfeed/2010-05-09-the-fype-programming-language.gmi (Gemini)
-foo.zone/gemfeed/2010-05-09-the-fype-programming-language.html
-
-

Principles for C programming #C ...


-
-Principles for C programming #C #programming
-
-drewdevault.com/2017/03/15/How-I-learned-to-stop-worrying-and-love-C.html
-
-

#Typst appears to be a great modern ...


-
-#Typst appears to be a great modern alternative to #LaTeX
-
-

Things you can do with a debugger but not with ...


-
-Things you can do with a debugger but not with print debugging #debugger #debugging #coding #programming
-
-mahesh-hegde.github.io/posts/what_debugger_can/
-
-

Neat tutorial, I think I've to try #jujutsu ...


-
-Neat tutorial, I think I've to try #jujutsu out now! #git #vcs #jujutsu #jj
-
-www.stavros.io/posts/switch-to-jujutsu-already-a-tutorial/
-
-

Wise words Best practices are not rules. They ...


-
-Wise words Best practices are not rules. They are guidelines that help you make better decisions. They are not absolute truths, but rather suggestions based on experience and common sense. You should always use your own judgment and adapt them to your specific situation.
-
-www.arp242.net/best-practices.html
-
-

How to build a #Linux #Container from ...


-
-How to build a #Linux #Container from scratch without #Docker, #Podman, etc. #Linux #container from scratch
-
-michalpitr.substack.com/p/linux-contai..-..rom-scratch?r=gt6tv&triedRedirect=true
-
-

When I reach the point where I am trying to ...


-
-When I reach the point where I am trying to recover from panics in Go, something else has already gone wrong with the design of the codebase, IMHO. However, I must admit that my viewpoint may be flawed, as I code small, self-contained tools and rely on as few dependencies as possible. So I rarely rely on 3rd party libs, which may panic (which wouldn’t be nice to begin with; it would be better if they returned errors). #golang
-
-blog.devtrovert.com/p/go-panic-and-recover-dont-make-these
-
-

Personally one of the main benefits of using ...


-
-Personally one of the main benefits of using #tmux over other solutions is, that I can use the same setup on my personal devices (Linux and BSD) and for work (#macOS): you might not need tmux
-
-bower.sh/you-might-not-need-tmux
-
-

December 2025


-
-

Rhese are some nice #Ruby tricks (Ruby is onw ...


-
-Rhese are some nice #Ruby tricks (Ruby is onw of my favourite languages) 11 Ruby Tricks You Haven’t Seen Before via @wallabagapp
-
-www.rubyguides.com/2016/01/ruby-tricks/
-
-

That's fun, use the C preprocessor as a HTML ...


-
-That's fun, use the C preprocessor as a HTML template engine! #c #cpp #fun
-
-wheybags.com/blog/macroblog.html
-
-

#jq but for #Markdown? Thats interesting, ...


-
-#jq but for #Markdown? Thats interesting, never thought of that. mdq: jq for Markdown via @wallabagapp
-
-github.com/yshavit/mdq
-
-

Elvish seems to be a neat little shell. It's ...


-
-Elvish seems to be a neat little shell. It's implemented in #Golang and can make use of the great Go standard library. The language is more modern than other shells out there (e.g., supporting nested data structures) and eliminates backward compatibility issues (e.g., awkward string parsing with spaces that often causes problems in traditional shells). Elvish also comes with some neat interactive TUI elements. Furthermore, there will be a whole TUI framework built directly into the shell. If I weren't so deeply intertwined with #bash and #zsh, I would personally give #Elvish a try... Interesting, at least, it is.
-
-elv.sh/
-
-

Google #SRE required better Wifi on the ...


-
-Google #SRE required better Wifi on the toilet, otherwise YouTube could go down :-)
-
-podcasts.apple.com/us/podcast/incident..-..ai-stacey/id1615778073?i=1000672365156
-
-

Indeed ...


-
-Indeed
-
-aaronfrancis.com/2024/because-i-wanted-to-12c5137c
-
-

Very interesting post how pods are scheduled ...


-
-Very interesting post how pods are scheduled and terminated with some tips how to improve reliability (pods may be terminated before ingress rules are updated and some traffic may hits non existing pods) #k8s #kubernetes
-
-learnk8s.io/graceful-shutdown
-
-

I have added observability to the #Kubernetes ...


-
-I have added observability to the #Kubernetes cluster in the eighth part of my #Kubernetes on #FreeBSD series. #Grafana #Loki #Prometheus #Alloy #k3s #OpenBSD #RockyLinux
-
-foo.zone/gemfeed/2025-12-07-f3s-kubernetes-with-freebsd-part-8.gmi (Gemini)
-foo.zone/gemfeed/2025-12-07-f3s-kubernetes-with-freebsd-part-8.html
-
-

Wondering where I could make use of it ...


-
-Wondering where I could make use of it blog/2025/12/an-svg-is-all-you-need.mld #SVG
-
-jon.recoil.org/blog/2025/12/an-svg-is-all-you-need.html
-
-

Trying out #COSMIC #Desktop... seems ...


-
-Trying out #COSMIC #Desktop... seems snappier than #GNOME and I like the tiling features...
-
-

Best thing I've ever read about #container ...


-
-Best thing I've ever read about #container #security in #kubernetes:
-
-learnkube.com/security-contexts
-
-

While acknowledging luck in finding the right ...


-
-While acknowledging luck in finding the right team and company culture, the author stresses that staying and choosing long-term ownership is a deliberate choice for those valuing deep technical ownership over external validation: Why I Ignore The Spotlight as a Staff Engineer #engineering
-
-lalitm.com/software-engineering-outside-the-spotlight/
-
-

Great explanation #slo #sla #sli #sre ...


-
-Great explanation #slo #sla #sli #sre
-
-blog.alexewerlof.com/p/sla-vs-slo
-
-

Nice service, you send a drive, they host ...


-
-Nice service, you send a drive, they host #ZFS for you!
-
-zfs.rent/
-
-Other related posts:
-
-2025-01-01 Posts from October to December 2024
-2025-07-01 Posts from January to June 2025
-2026-01-01 Posts from July to December 2025 (You are currently reading this)
-
-E-Mail your comments to paul@nospam.buetow.org :-)
-
-Back to the main site
-
-
-
- - Cloudless Kobo Forma with KOReader - - gemini://foo.zone/gemfeed/2026-01-01-cloudless-kobo-forma-with-koreader.gmi - 2025-12-31T16:08:33+02:00 - - Paul Buetow aka snonux - paul@dev.buetow.org - - I am an reader, and for years I've been searching for a good digital e-reader to complement my paper books. I advocate for privacy-first and prefer open-source or self-hosted solutions. If that is not possible, I opt for offline solutions. Even if I don't have anything to hide, the tinkerer in me wants those things anyway. I found my ideal device in the Kobo Forma 7 years ago. Now, I use it without Kobo's cloud sync, and in this post, I'll show you how. - -
-

Cloudless Kobo Forma with KOReader


-
-Published at 2025-12-31T16:08:33+02:00
-
-I am an reader, and for years I've been searching for a good digital e-reader to complement my paper books. I advocate for privacy-first and prefer open-source or self-hosted solutions. If that is not possible, I opt for offline solutions. Even if I don't have anything to hide, the tinkerer in me wants those things anyway. I found my ideal device in the Kobo Forma 7 years ago. Now, I use it without Kobo's cloud sync, and in this post, I'll show you how.
-
-
-Art by Donovan Bake
-
-      __...--~~~~~-._   _.-~~~~~--...__
-    //               `V'               \\ 
-   //                 |                 \\ 
-  //__...--~~~~~~-._  |  _.-~~~~~~--...__\\ 
- //__.....----~~~~._\ | /_.~~~~----.....__\\
-====================\\|//====================
-                dwb `---`
-
-
-

Table of Contents


-
-
-
-I initially bought the Kobo Forma because I wanted a device with a large screen for reading PDFs and ePubs. However, as time went on, I became more concerned about the privacy implications of having all my reading data synced to the Kobo cloud. So, I looked into alternative ways to use this device.
-
-KOReader running on Kobo Forma
-
-The Kobo Forma is so old that it can't be purchased from Kobo directly anymore. But I love the form factor; it's much lighter than the Kobo Sage and still has a 7" screen. It's just that the stock firmware is becoming too slow and sluggish.
-
-Kobo Forma
-
-Note: Some of the screenshots in this post are taken from my Kobo Clara HD, which is another Kobo eReader I have. It's smaller and better for travel, and I use the same KOReader setup on both devices.
-
-

KOReader to the Rescue


-
-I keep my Kobo Forma disconnected from the cloud entirely, and KOReader makes that possible. KOReader is a versatile, open-source document and image viewer which can also be installed on some E Ink reader devices like the Kobo Forma. No cloud sync, no tracking, just reading.
-
-KOReader
-
-By not syncing my reading progress and library to Kobo's cloud service, I retain full ownership and control over my data. There's no risk of my personal reading habits being accessed or mined by third parties.
-
-

Installation


-
-Installing KOReader is straightforward. You can follow the official guide for that. I used the Linux one:
-
-https://github.com/koreader/koreader/wiki/Installation-on-desktop-linux
-
-Basically, what I had to do is to download a .zip file of the KOReader binary and an install.sh script. Then, I plugged in the Kobo Forma via USB and ran the install script, which did the rest for me.
-
-KOReader installation via USB
-
-After the initial install, KOReader can update itself through its menus.
-
-KOReader self-update menu
-
-It is worth noting that after the KOReader install, the Kobo Forma still boots into the proprietary window manager. To start KOReader, you have to select it from the new "Nickel Menu". KOReader will then stay open until you reboot the device. It's a small annoyance, but it's well worth it!
-
-Nickel Menu
-
-

Sideloaded Mode


-
-To use the Kobo Forma completely without a Kobo account, you can enable "Sideloaded Mode". This mode allows you to use the device without being signed in to a Kobo account. When enabled, the home screen will default to your library instead of showing Kobo recommendations, and the sync button will disappear. This prevents the device from trying to sync with the Kobo cloud.
-
-To enable it, you need to edit the configuration file. Connect your Kobo device to your computer via USB. Open the file .kobo/Kobo/Kobo eReader.conf and add the following lines:
-
-
-[ApplicationPreferences]
-SideloadedMode=true
-
-
-After saving the file, eject the device. You might need to restart it for the changes to take effect.
-
-KOReader is much faster than the stock firmware; it feels about three times as fast. Before trying out KOReader, I was thinking about selling the Forma as it felt too sluggish. But now there is new life in this 7-year-old device! It also offers a night mode (inverted colors), a feature that the stock firmware on the Forma is lacking.
-
-KOReader dark mode (inverted colors)
-
-

My Workflow


-
-My workflow is simple and efficient, relying on a direct USB connection to my Linux laptop for sideloading books and a self-hosted sync server for progress synchronization.
-
-

Sideloading Books


-
-I connect my Kobo Forma to my Linux laptop via a USB-C cable. The device is automatically recognized as a storage device, and I can directly access its storage to copy over ePubs, PDFs, and other supported formats.
-
-

KOReader Sync Server


-
-To keep my reading progress synchronized across multiple devices (my Kobo, my phone, and my Linux laptop), I run a koreader-sync-server instance in my k3s cluster. This allows me to pick up reading where I left off, no matter which device I'm using.
-
-https://codeberg.org/snonux/conf/src/branch/master/f3s/kobo-sync-server
-
-Custom sync server configuration
-
-To configure the sync server in KOReader, open a document, go to "Settings" -> "Progress Sync", and select "Custom sync server". There you can enter the URL of your server and your credentials. The progress can then also be synced to and from KOReader running on other devices (e.g. a Laptop or a Smartphone!)
-
-KOReader sync menu
-
-

Exporting Book Notes and Highlights


-
-KOReader allows you to export book notes and highlights directly from the device in various formats, including plain text and Markdown. Unfortunately, these are not automatically synced to the sync server. I have an offline backup procedure where I regularly sync them via USB to my backup server. There's a 3rd party plugin available for KOReader, which seems to be able to do this kind of sync, though.
-
-

Wallabag Integration


-
-KOReader has built-in Wallabag support. This allows me to save articles from the web to my self-hosted Wallabag instance and then read them comfortably on my Kobo.
-
-https://wallabag.org/
-
-I haven't tried it out yet, though. I may will and will update this blog post here after done so.
-
-

Purchasing e-books


-
-If you search a little bit you also find stores which sell digital rights management (DRM) free e-books (in ePub format), for example buecher.de does, they sell german and english books. Before purchasing, just make sure that the book is DRM-free (not all their books are that.)
-
-All the books I read you can see here:
-
-Novels I've read
-Resources, Technical Books, Podcasts, Courses and Guides I recommend
-
-

Conclusion


-
-I'm really happy with this setup. Offline Kobo with KOReader, manual book transfers, self-hosted services—it's simple, private, and the reading experience is just great. If you care about owning your data (and not getting distracted), give it a try.
-
-Other related posts:
-
-2026-01-01 Using Supernote Nomad offline
-2026-01-01 Cloudless Kobo Forma with KOReader (You are currently reading this)
-
-E-Mail your comments to paul@nospam.buetow.org :-)
-
-Back to the main site
-
-
-
- - X-RAG Observability Hackathon - - gemini://foo.zone/gemfeed/2025-12-24-x-rag-observability-hackathon.gmi - 2025-12-24T09:45:29+02:00 - - Paul Buetow aka snonux - paul@dev.buetow.org - - This post describes my hackathon efforts adding observability to X-RAG, the extensible Retrieval-Augmented Generation (RAG) platform built by my brother Florian. I made time over the weekend to join his 3-day hackathon (attending 2 days) with the goal of instrumenting his existing distributed system with observability. What started as 'let's add some metrics' turned into a comprehensive implementation of the three pillars of observability: tracing, metrics, and logs. - -
-

X-RAG Observability Hackathon


-
-Published at 2025-12-24T09:45:29+02:00
-
-This post describes my hackathon efforts adding observability to X-RAG, the extensible Retrieval-Augmented Generation (RAG) platform built by my brother Florian. I made time over the weekend to join his 3-day hackathon (attending 2 days) with the goal of instrumenting his existing distributed system with observability. What started as "let's add some metrics" turned into a comprehensive implementation of the three pillars of observability: tracing, metrics, and logs.
-
-X-RAG source code on GitHub
-
-

Table of Contents


-
-
-

What is X-RAG?


-
-X-RAG is the extensible RAG (Retrieval-Augmented Generation) platform running on Kubernetes. The idea behind RAG is simple: instead of asking an LLM to answer questions from its training data alone, you first retrieve relevant documents from your own knowledge base, then feed those documents to the LLM as context. The LLM synthesises an answer grounded in your actual content—reducing hallucinations and enabling answers about private or recent information the model was never trained on.
-
-X-RAG handles the full pipeline: ingest documents, chunk them into searchable pieces, generate vector embeddings, store them in a vector database, and at query time, retrieve relevant chunks and pass them to an LLM for answer generation. The system supports both local LLMs (Florian runs his on a beefy desktop) and cloud APIs like OpenAI. I configured an OpenAI API key since my laptop's CPU and GPU aren't fast enough for decent local inference.
-
-All services are implemented in Python. I'm more used to Ruby, Go, and Bash these days, but for this project it didn't matter—Python's OpenTelemetry integration is straightforward, I wasn't planning to write or rewrite tons of application code, and with GenAI assistance the language barrier was a non-issue. The OpenTelemetry concepts and patterns should translate to other languages too—the SDK APIs are intentionally similar across Python, Go, Java, and others.
-
-X-RAG consists of several independently scalable microservices:
-
-
    -
  • Search UI: FastAPI web interface for queries
  • -
  • Ingestion API: Document upload endpoint
  • -
  • Embedding Service: gRPC service for vector embeddings
  • -
  • Indexer: Kafka consumer that processes documents
  • -
  • Search Service: gRPC service orchestrating the RAG pipeline
  • -

-The Embedding Service deserves extra explanation because in the beginning I didn't really knew what it was. Text isn't directly searchable in a vector database—you need to convert it to numerical vectors (embeddings) that capture semantic meaning. The Embedding Service takes text chunks and calls an embedding model (OpenAI's text-embedding-3-small in my case, or a local model on Florian's setup) to produce these vectors. For the LLM search completion answer, I used gpt-4o-mini.
-
-Similar concepts end up with similar vectors, so "What is machine learning?" and "Explain ML" produce vectors close together in the embedding space. At query time, your question gets embedded too, and the vector database finds chunks with nearby vectors—that's semantic search.
-
-The data layer includes Weaviate (vector database with hybrid search), Kafka (message queue), MinIO (object storage), and Redis (cache). All of this runs in a Kind Kubernetes cluster for local development, with the same manifests deployable to production.
-
-
-┌─────────────────────────────────────────────────────────────────────────┐
-│                      X-RAG Kubernetes Cluster                           │
-├─────────────────────────────────────────────────────────────────────────┤
-│   ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐    │
-│   │ Search UI   │  │Search Svc   │  │Embed Service│  │   Indexer   │    │
-│   └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘    │
-│          │                │                │                │           │
-│          └────────────────┴────────────────┴────────────────┘           │
-│                                    │                                    │
-│                                    ▼                                    │
-│          ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
-│          │  Weaviate   │  │   Kafka     │  │   MinIO     │              │
-│          └─────────────┘  └─────────────┘  └─────────────┘              │
-└─────────────────────────────────────────────────────────────────────────┘
-
-
-

Running Kubernetes locally with Kind


-
-X-RAG runs on Kubernetes, but you don't need a cloud account to develop it. The project uses Kind (Kubernetes in Docker)—a tool originally created by the Kubernetes SIG for testing Kubernetes itself.
-
-Kind - Kubernetes in Docker
-
-Kind spins up a full Kubernetes cluster using Docker containers as nodes. The control plane (API server, etcd, scheduler, controller-manager) runs in one container, and worker nodes run in separate containers. Inside these "node containers," pods run just like they would on real servers—using containerd as the container runtime. It's containers all the way down.
-
-Technically, each Kind node is a Docker container running a minimal Linux image with kubelet and containerd installed. When you deploy a pod, kubelet inside the node container instructs containerd to pull and run the container image. So you have Docker running node containers, and inside those, containerd running application containers. Network-wise, Kind sets up a Docker bridge network and uses CNI plugins (kindnet by default) for pod networking within the cluster.
-
-
-$ docker ps --format "table {{.Names}}\t{{.Image}}"
-NAMES                  IMAGE
-xrag-k8-control-plane  kindest/node:v1.32.0
-xrag-k8-worker         kindest/node:v1.32.0
-xrag-k8-worker2        kindest/node:v1.32.0
-
-
-The kindest/node image contains everything needed: kubelet, containerd, CNI plugins, and pre-pulled pause containers. Port mappings in the Kind config expose services to the host—that's how http://localhost:8080 reaches the search-ui running inside a pod, inside a worker container, inside Docker.
-
-
-┌─────────────────────────────────────────────────────────────────────────┐
-│                           Docker Host                                   │
-├─────────────────────────────────────────────────────────────────────────┤
-│  ┌───────────────────┐  ┌───────────────────┐  ┌───────────────────┐    │
-│  │ xrag-k8-control   │  │ xrag-k8-worker    │  │ xrag-k8-worker2   │    │
-│  │ -plane (container)│  │ (container)       │  │ (container)       │    │
-│  │                   │  │                   │  │                   │    │
-│  │ K8s API server    │  │ Pods:             │  │ Pods:             │    │
-│  │ etcd, scheduler   │  │ • search-ui       │  │ • weaviate        │    │
-│  │                   │  │ • search-service  │  │ • kafka           │    │
-│  │                   │  │ • embedding-svc   │  │ • prometheus      │    │
-│  │                   │  │ • indexer         │  │ • grafana         │    │
-│  └───────────────────┘  └───────────────────┘  └───────────────────┘    │
-└─────────────────────────────────────────────────────────────────────────┘
-
-
-Why Kind? It gives you a real Kubernetes environment—the same manifests deploy to production clouds unchanged. No minikube quirks, no Docker Compose translation layer. Just Kubernetes. I already have a k3s cluster running at home, but Kind made collaboration easier—everyone working on X-RAG gets the exact same setup by cloning the repo and running make cluster-start.
-
-Florian developed X-RAG on macOS, but it worked seamlessly on my Linux laptop. The only difference was Docker's resource allocation: on macOS you configure limits in Docker Desktop, on Linux it uses host resources directly. That's because under macOS the Linux Docker containers run on an emulation layer as macOS is not Linux.
-
-My hardware: a ThinkPad X1 Carbon Gen 9 with an 11th Gen Intel Core i7-1185G7 (4 cores, 8 threads at 3.00GHz) and 32GB RAM (running Fedora Linux). During the hackathon, memory usage peaked around 15GB—comfortable headroom. CPU was the bottleneck; with ~38 pods running across all namespaces (rag-system, monitoring, kube-system, etc.), plus Discord for the remote video call and Tidal streaming hi-res music, things got tight. When rebuilding Docker images or restarting the cluster, Discord video and audio would stutter—my fellow hackers probably wondered why I kept freezing mid-sentence. A beefier CPU would have meant less waiting and smoother calls, but it was manageable.
-
-

Motivation


-
-When I joined the hackathon, Florian's X-RAG was functional but opaque. With five services communicating via gRPC, Kafka, and HTTP, debugging was cumbersome. When a search request take 5 seconds, there was no visibility into where the time was being spent. Was it the embedding generation? The vector search? The LLM synthesis? Nobody would be able to figure it out quickly.
-
-Distributed systems are inherently opaque. Each service logs its own view of the world, but correlating events across service boundaries is archaeology. Grepping through logs on many pods, trying to mentally reconstruct what happened—not fun. This was the perfect hackathon project: Explore this Observability Stack in greater depth.
-
-

The observability stack


-
-Before diving into implementation, here's what I deployed. The complete stack runs in the monitoring namespace:
-
-
-$ kubectl get pods -n monitoring
-NAME                                  READY   STATUS
-alloy-84ddf4cd8c-7phjp                1/1     Running
-grafana-6fcc89b4d6-pnh8l              1/1     Running
-kube-state-metrics-5d954c569f-2r45n   1/1     Running
-loki-8c9bbf744-sc2p5                  1/1     Running
-node-exporter-kb8zz                   1/1     Running
-node-exporter-zcrdz                   1/1     Running
-node-exporter-zmskc                   1/1     Running
-prometheus-7f755f675-dqcht            1/1     Running
-tempo-55df7dbcdd-t8fg9                1/1     Running
+
+
#!/usr/bin/env bash
+set -u -o pipefail
+
+LOG_ENABLED=0
+log_file="${TMPDIR:-/tmp}/tmux-edit-send.log"
+log() {
+  if [ "$LOG_ENABLED" -eq 1 ]; then
+    printf '%s\n' "$*" >> "$log_file"
+  fi
+}
+
+# Read the target pane id from a temp file created by tmux binding.
+read_target_from_file() {
+  local file_path="$1"
+  local pane_id
+  if [ -n "$file_path" ] && [ -f "$file_path" ]; then
+    pane_id="$(sed -n '1p' "$file_path" | tr -d '[:space:]')"
+    # Ensure pane ID has % prefix
+    if [ -n "$pane_id" ] && [[ "$pane_id" != %* ]]; then
+      pane_id="%${pane_id}"
+    fi
+    printf '%s' "$pane_id"
+  fi
+}
+
+# Read the target pane id from tmux environment if present.
+read_target_from_env() {
+  local env_line pane_id
+  env_line="$(tmux show-environment -g TMUX_EDIT_TARGET 2>/dev/null || true)"
+  case "$env_line" in
+    TMUX_EDIT_TARGET=*)
+      pane_id="${env_line#TMUX_EDIT_TARGET=}"
+      # Ensure pane ID has % prefix
+      if [ -n "$pane_id" ] && [[ "$pane_id" != %* ]] && [[ "$pane_id" =~ ^[0-9]+$ ]]; then
+        pane_id="%${pane_id}"
+      fi
+      printf '%s' "$pane_id"
+      ;;
+  esac
+}
+
+# Resolve the target pane id, falling back to the last pane.
+resolve_target_pane() {
+  local candidate="$1"
+  local current_pane last_pane
+
+  current_pane="$(tmux display-message -p "#{pane_id}" 2>/dev/null || true)"
+  log "current pane=${current_pane:-<empty>}"
+  
+  # Ensure candidate has % prefix if it's a pane ID
+  if [ -n "$candidate" ] && [[ "$candidate" =~ ^[0-9]+$ ]]; then
+    candidate="%${candidate}"
+    log "normalized candidate to $candidate"
+  fi
+  
+  if [ -n "$candidate" ] && [[ "$candidate" == *"#{"* ]]; then
+    log "format target detected, clearing"
+    candidate=""
+  fi
+  if [ -z "$candidate" ]; then
+    candidate="$(tmux display-message -p "#{last_pane}" 2>/dev/null || true)"
+    log "using last pane as fallback: $candidate"
+  elif [ "$candidate" = "$current_pane" ]; then
+    last_pane="$(tmux display-message -p "#{last_pane}" 2>/dev/null || true)"
+    if [ -n "$last_pane" ]; then
+      candidate="$last_pane"
+      log "candidate was current, using last pane: $candidate"
+    fi
+  fi
+  printf '%s' "$candidate"
+}
+
+# Capture the latest multi-line prompt content from the pane.
+capture_prompt_text() {
+  local target="$1"
+  tmux capture-pane -p -t "$target" -S -2000 2>/dev/null | awk '
+    function trim_box(line) {
+      sub(/^ *│ ?/, "", line)
+      sub(/ *│ *$/, "", line)
+      sub(/[[:space:]]+$/, "", line)
+      return line
+    }
+    /^ *│ *→/ && index($0,"INSERT")==0 && index($0,"Add a follow-up")==0 {
+      if (text != "") last = text
+      text = ""
+      capture = 1
+      line = $0
+      sub(/^.*→ ?/, "", line)
+      line = trim_box(line)
+      if (line != "") text = line
+      next
+    }
+    capture {
+      if ($0 ~ /^ *└/) {
+        capture = 0
+        if (text != "") last = text
+        next
+      }
+      if ($0 ~ /^ *│/ && index($0,"INSERT")==0 && index($0,"Add a follow-up")==0) {
+        line = trim_box($0)
+        if (line != "") {
+          if (text != "") text = text " " line
+          else text = line
+        }
+      }
+    }
+    END {
+      if (text != "") last = text
+      if (last != "") print last
+    }
+  '
+}
+
+# Write captured prompt text into the temp file if available.
+prefill_tmpfile() {
+  local tmpfile="$1"
+  local prompt_text="$2"
+  if [ -n "$prompt_text" ]; then
+    printf '%s\n' "$prompt_text" > "$tmpfile"
+  fi
+}
+
+# Ensure the target pane exists before sending keys.
+validate_target_pane() {
+  local target="$1"
+  local pane target_found
+  if [ -z "$target" ]; then
+    log "error: no target pane determined"
+    echo "Could not determine target pane." >&2
+    return 1
+  fi
+  target_found=0
+  log "validate: looking for target='$target' in all panes:"
+  for pane in $(tmux list-panes -a -F "#{pane_id}" 2>/dev/null || true); do
+    log "validate: checking pane='$pane'"
+    if [ "$pane" = "$target" ]; then
+      target_found=1
+      log "validate: MATCH FOUND!"
+      break
+    fi
+  done
+  if [ "$target_found" -ne 1 ]; then
+    log "error: target pane not found: $target"
+    echo "Target pane not found: $target" >&2
+    return 1
+  fi
+  log "validate: target pane validated successfully"
+}
+
+# Send temp file contents to the target pane line by line.
+send_content() {
+  local target="$1"
+  local tmpfile="$2"
+  local prompt_text="$3"
+  local first_line=1
+  local line
+  log "send_content: target=$target, prompt_text='$prompt_text'"
+  while IFS= read -r line || [ -n "$line" ]; do
+    log "send_content: read line='$line'"
+    if [ "$first_line" -eq 1 ] && [ -n "$prompt_text" ]; then
+      if [[ "$line" == "$prompt_text"* ]]; then
+        local old_line="$line"
+        line="${line#"$prompt_text"}"
+        log "send_content: stripped prompt, was='$old_line' now='$line'"
+      fi
+    fi
+    first_line=0
+    log "send_content: sending line='$line'"
+    tmux send-keys -t "$target" -l "$line"
+    tmux send-keys -t "$target" Enter
+  done < "$tmpfile"
+  log "sent content to $target"
+}
+
+# Main entry point.
+main() {
+  local target_file="${1:-}"
+  local target
+  local editor="${EDITOR:-vi}"
+  local tmpfile
+  local prompt_text
+
+  log "=== tmux-edit-send starting ==="
+  log "target_file=$target_file"
+  log "EDITOR=$editor"
+  
+  target="$(read_target_from_file "$target_file" || true)"
+  if [ -n "$target" ]; then
+    log "file target=${target:-<empty>}"
+    rm -f "$target_file"
+  fi
+  if [ -z "$target" ]; then
+    target="${TMUX_EDIT_TARGET:-}"
+  fi
+  log "env target=${target:-<empty>}"
+  if [ -z "$target" ]; then
+    target="$(read_target_from_env || true)"
+  fi
+  log "tmux env target=${target:-<empty>}"
+  target="$(resolve_target_pane "$target")"
+  log "fallback target=${target:-<empty>}"
+
+  tmpfile="$(mktemp)"
+  log "created tmpfile=$tmpfile"
+  if [ ! -f "$tmpfile" ]; then
+    log "ERROR: mktemp failed to create file"
+    echo "ERROR: mktemp failed" >&2
+    exit 1
+  fi
+  mv "$tmpfile" "${tmpfile}.md" 2>&1 | while read -r line; do log "mv output: $line"; done
+  tmpfile="${tmpfile}.md"
+  log "renamed to tmpfile=$tmpfile"
+  if [ ! -f "$tmpfile" ]; then
+    log "ERROR: tmpfile does not exist after rename"
+    echo "ERROR: tmpfile rename failed" >&2
+    exit 1
+  fi
+  trap 'rm -f "$tmpfile"' EXIT
+
+  log "capturing prompt text from target=$target"
+  prompt_text="$(capture_prompt_text "$target")"
+  log "captured prompt_text='$prompt_text'"
+  prefill_tmpfile "$tmpfile" "$prompt_text"
+  log "prefilled tmpfile"
+
+  log "launching editor: $editor $tmpfile"
+  "$editor" "$tmpfile"
+  local editor_exit=$?
+  log "editor exited with status $editor_exit"
+
+  if [ ! -s "$tmpfile" ]; then
+    log "empty file, nothing sent"
+    exit 0
+  fi
+  
+  log "tmpfile contents:"
+  log "$(cat "$tmpfile")"
+
+  log "validating target pane"
+  validate_target_pane "$target"
+  log "sending content to target=$target"
+  send_content "$target" "$tmpfile" "$prompt_text"
+  log "=== tmux-edit-send completed ==="
+}
+
+main "$@"
 

-Each component has a specific role:
+

Challenges and small discoveries



-
    -
  • Grafana Alloy: The unified collector. Receives OTLP from applications, scrapes Prometheus endpoints, tails log files. Think of it as the central nervous system.
  • -
  • Prometheus: Time-series database for metrics. Stores counters, gauges, and histograms with 15-day retention.
  • -
  • Tempo: Trace storage. Receives spans via OTLP, correlates them by trace ID, enables TraceQL queries.
  • -
  • Loki: Log aggregation. Indexes labels (namespace, pod, container), stores log chunks, enables LogQL queries.
  • -
  • Grafana: The unified UI. Queries all three backends, correlates signals, displays dashboards.
  • -
  • kube-state-metrics: Exposes Kubernetes object metrics (pod status, deployments, resource requests).
  • -
  • node-exporter: Exposes host-level metrics (CPU, memory, disk, network) from each Kubernetes node.
  • -

-Everything is accessible via port-forwards:
+The problems were mostly small but annoying:

    -
  • Grafana: http://localhost:3000 (unified UI for all three signals)
  • -
  • Prometheus: http://localhost:9090 (metrics queries)
  • -
  • Tempo: http://localhost:3200 (trace queries)
  • -
  • Loki: http://localhost:3100 (log queries)
  • +
  • Getting the right target pane was the first hurdle. I ended up storing the pane id in a file because of tmux format expansion quirks.
  • +
  • The Cursor UI draws a nice box around the prompt, so the prompt line contains a and other markers. I had to filter those out and strip the box-drawing characters.
  • +
  • When I prefilled text and then sent it back, I sometimes duplicated the prompt. Stripping the prefilled prompt text from the submitted text fixed that.

-

Grafana Alloy: the unified collector


-
-Before diving into the individual signals, I want to highlight Grafana Alloy—the component that ties everything together. Alloy is Grafana's vendor-neutral OpenTelemetry Collector distribution, and it became the backbone of the observability stack.
-
-Grafana Alloy documentation
+

Test cases (for a future rewrite)



-Why use a centralised collector instead of having each service push directly to backends?
+These are the cases I test whenever I touch the script:

    -
  • Decoupling: Applications don't need to know about Prometheus, Tempo, or Loki. They speak OTLP, and Alloy handles the translation.
  • -
  • Unified timestamps: All telemetry flows through one system, making correlation in Grafana more reliable.
  • -
  • Processing pipeline: Batch data before sending, filter noisy metrics, enrich with labels—all in one place.
  • -
  • Backend flexibility: Switch from Tempo to Jaeger without changing application code.
  • -

-Alloy uses a configuration language called River, which feels similar to Terraform's HCL—declarative blocks with attributes. If you've written Terraform, River will look familiar. The full Alloy configuration runs to over 1400 lines with comments explaining each section. It handles OTLP receiving, batch processing, Prometheus export, Tempo export, Kubernetes metrics scraping, infrastructure metrics, and pod log collection. All three signals—metrics, traces, logs—flow through this single component, making Alloy the central nervous system of the observability stack.
-
-In the following sections, I'll cover each observability pillar and show the relevant Alloy configuration for each.
-
-

Centralised logging with Loki


-
-Getting all logs in one place was the foundation. I deployed Grafana Loki in the monitoring namespace, with Grafana Alloy running as a DaemonSet on each node to collect logs.
-
-
-┌──────────────────────────────────────────────────────────────────────┐
-│                           LOGS PIPELINE                              │
-├──────────────────────────────────────────────────────────────────────┤
-│  Applications write to stdout → containerd stores in /var/log/pods   │
-│                                    │                                 │
-│                              File tail                               │
-│                                    ▼                                 │
-│                         Grafana Alloy (DaemonSet)                    │
-│                    Discovers pods, extracts metadata                 │
-│                                    │                                 │
-│                       HTTP POST /loki/api/v1/push                    │
-│                                    ▼                                 │
-│                           Grafana Loki                               │
-│                   Indexes labels, stores chunks                      │
-└──────────────────────────────────────────────────────────────────────┘
-
-
-

Alloy configuration for logs


-
-Alloy discovers pods via the Kubernetes API, tails their log files from /var/log/pods/, and ships to Loki. Importantly, Alloy runs as a DaemonSet on each worker node—it doesn't run inside the application pods. Since containerd writes all container stdout/stderr to /var/log/pods/ on the node's filesystem, Alloy can tail logs for every pod on that node from a single location without any sidecar injection:
-
-
-loki.source.kubernetes "pod_logs" {
-  targets    = discovery.relabel.pod_logs.output
-  forward_to = [loki.process.pod_logs.receiver]
-}
-
-loki.write "default" {
-  endpoint {
-    url = "http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push"
-  }
-}
-
-
-

Querying logs with LogQL


-
-Now I could query logs in Loki (e.g. via Grafana UI) with LogQL:
-
-
-{namespace="rag-system", container="search-ui"} |= "ERROR"
-
-
-

Metrics with Prometheus


+
  • Single-line prompt: capture everything after and prefill the editor.
  • +
  • Multi-line boxed prompt: capture the wrapped lines inside the │ ... │ box and join them with spaces (no newline in the editor).
  • +
  • Ignore UI noise: do not capture lines containing INSERT or Add a follow-up.
  • +
  • Preserve appended text: if I add juju to an existing line, the space before juju must survive.
  • +
  • No duplicate send: if the prefilled text is still at the start of the first line, it must be stripped once before sending back.
  • +
    +

    (Almost) works with any editor (or any TUI)



    -I added Prometheus metrics to every service. Following the Four Golden Signals (latency, traffic, errors, saturation), I instrumented the codebase with histograms, counters, and gauges:
    +Although I use Helix, this is just $EDITOR. If you prefer Vim, Neovim, or something more exotic, it should work. The same mechanism can be used to feed text into any TUI that reads from a terminal pane, not just Cursor Agent.

    - -
    from prometheus_client import Histogram, Counter, Gauge
    -
    -search_duration = Histogram(
    -    "search_service_request_duration_seconds",
    -    "Total duration of Search Service requests",
    -    ["method"],
    -    buckets=[0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 20.0, 30.0, 60.0],
    -)
    -
    -errors_total = Counter(
    -    "search_service_errors_total",
    -    "Error count by type",
    -    ["method", "error_type"],
    -)
    -
    +One caveat: different agents draw different prompt UIs, so the capture logic depends on the prompt shape. A future version of this script should be more modular in that respect; for now this is just a PoC tailored to Cursor Agent.

    -Initially, I used Prometheus scraping—each service exposed a /metrics endpoint, and Prometheus pulled metrics every 15 seconds. This worked, but I wanted a unified pipeline.
    +Another thing is, what if Cursor decides to change the design of its TUI? I would need to change my script as well.

    -

    Alloy configuration for application metrics


    +If I get a chance, I'll clean it up and rewrite it in Go (and release it properly or include it into Hexai, another AI related tool of mine, of which I haven't blogged about yet). For now, I am happy with this little hack. It already feels like a native editing workflow for Cursor Agent prompts.

    -The breakthrough came with Grafana Alloy as an OpenTelemetry collector. Services now push metrics via OTLP (OpenTelemetry Protocol), and Alloy converts them to Prometheus format:
    +https://codeberg.org/snonux/hexai

    -
    -┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
    -│ search-ui   │  │search-svc   │  │embed-svc    │  │  indexer    │
    -│ OTel Meter  │  │ OTel Meter  │  │ OTel Meter  │  │ OTel Meter  │
    -│      │      │  │      │      │  │      │      │  │      │      │
    -│ OTLPExporter│  │ OTLPExporter│  │ OTLPExporter│  │ OTLPExporter│
    -└──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘
    -       │                │                │                │
    -       └────────────────┴────────────────┴────────────────┘
    -                                 │
    -                                 ▼ OTLP/gRPC (port 4317)
    -                        ┌─────────────────────┐
    -                        │   Grafana Alloy     │
    -                        └──────────┬──────────┘
    -                                   │ prometheus.remote_write
    -                                   ▼
    -                        ┌─────────────────────┐
    -                        │    Prometheus       │
    -                        └─────────────────────┘
    -
    +E-Mail your comments to paul@nospam.buetow.org :-)

    -Alloy receives OTLP on ports 4317 (gRPC) or 4318 (HTTP), batches the data for efficiency, and exports to Prometheus:
    +Other related posts are:

    -
    -otelcol.receiver.otlp "default" {
    -  grpc { endpoint = "0.0.0.0:4317" }
    -  http { endpoint = "0.0.0.0:4318" }
    -  output {
    -    metrics = [otelcol.processor.batch.metrics.input]
    -    traces  = [otelcol.processor.batch.traces.input]
    -  }
    -}
    -
    -otelcol.processor.batch "metrics" {
    -  timeout = "5s"
    -  send_batch_size = 1000
    -  output { metrics = [otelcol.exporter.prometheus.default.input] }
    -}
    -
    -otelcol.exporter.prometheus "default" {
    -  forward_to = [prometheus.remote_write.prom.receiver]
    -}
    -
    +2026-02-02 A tmux popup editor for Cursor Agent CLI prompts (You are currently reading this)
    +2025-08-05 Local LLM for Coding with Ollama on macOS
    +2025-05-02 Terminal multiplexing with tmux - Fish edition
    +2024-06-23 Terminal multiplexing with tmux - Z-Shell edition

    -Instead of sending each metric individually, Alloy accumulates up to 1000 metrics (or waits 5 seconds) before flushing. This reduces network overhead and protects backends from being overwhelmed.
    +Back to the main site
    +
    +
    +
    + + Using Supernote Nomad offline + + gemini://foo.zone/gemfeed/2026-01-01-using-supernote-nomad-offline.gmi + 2025-12-31T16:25:30+02:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + I am a note taker. For years, I've been searching for a good digital device that could complement my paper notebooks. I've finally found it in the Supernote Nomad. I use it completely offline without cloud-sync, and in this post, I'll explain why this is a benefit. + +
    +

    Using Supernote Nomad offline



    -

    Kubernetes metrics: kubelet, cAdvisor, and kube-state-metrics


    +Published at 2025-12-31T16:25:30+02:00

    -Alloy also pulls metrics from Kubernetes itself—kubelet resource metrics, cAdvisor container metrics, and kube-state-metrics for cluster state.
    +I am a note taker. For years, I've been searching for a good digital device that could complement my paper notebooks. I've finally found it in the Supernote Nomad. I use it completely offline without cloud-sync, and in this post, I'll explain why this is a benefit.

    -Why three separate sources? It does feel fragmented, but each serves a distinct purpose. kubelet exposes resource metrics about pod CPU and memory usage from its own bookkeeping—lightweight summaries of what's running on each node. cAdvisor (Container Advisor) runs inside kubelet and provides detailed container-level metrics: CPU throttling, memory working sets, filesystem I/O, network bytes. These are the raw runtime stats from containerd. kube-state-metrics is different—it doesn't measure resource usage at all. Instead, it queries the Kubernetes API and exposes the *desired state*: how many replicas a Deployment wants, whether a Pod is pending or running, what resource requests and limits are configured. You need all three because "container used 500MB" (cAdvisor), "pod requested 1GB" (kube-state-metrics), and "node has 4GB available" (kubelet) are complementary views. The fragmentation is a consequence of Kubernetes' architecture—no single component has the complete picture.
    +Supernote Nomad

    -None of these components speak OpenTelemetry—they all expose Prometheus-format metrics via HTTP endpoints. That's why Alloy uses prometheus.scrape instead of receiving OTLP pushes. Alloy handles both worlds: OTLP from our applications, Prometheus scraping for infrastructure.
    +I initially bought it because Retta (the manufacturer of the Supernote) stated on their website that an open-source Linux firmware would be released soon. However, after over a year, there still hasn't been any progress (hopefully there will be someday). So I looked into alternative ways to use this device.

    -prometheus.scrape "kubelet_resource" {
    -  targets         = discovery.relabel.kubelet.output
    -  job_name        = "kubelet-resource"
    -  scheme          = "https"
    -  scrape_interval = "30s"
    -  bearer_token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
    -  tls_config { insecure_skip_verify = true }
    -  forward_to      = [prometheus.remote_write.prom.receiver]
    -}
    -
    -prometheus.scrape "cadvisor" {
    -  targets         = discovery.relabel.cadvisor.output
    -  job_name        = "cadvisor"
    -  scheme          = "https"
    -  scrape_interval = "60s"
    -  bearer_token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
    -  tls_config { insecure_skip_verify = true }
    -  forward_to      = [prometheus.relabel.cadvisor_filter.receiver]
    -}
    -
    -prometheus.scrape "kube_state_metrics" {
    -  targets = [
    -    {"__address__" = "kube-state-metrics.monitoring.svc.cluster.local:8080"},
    -  ]
    -  job_name        = "kube-state-metrics"
    -  scrape_interval = "30s"
    -  forward_to      = [prometheus.relabel.kube_state_filter.receiver]
    -}
    +⣿⣿⣿⣿⣿⣿⡿⠿⠿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
    +⣿⣿⣿⣿⣿⣏⠀⢶⣆⡘⠉⠙⠛⠿⠿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
    +⣿⣿⣿⣿⣿⠋⣤⣄⠘⠃⢠⣀⣀⠀⠀⠀⠀⠀⠉⠉⠛⠛⠿⢿⣿⣿⣿⣿⣿⣿
    +⣿⣿⣿⣿⡿⠀⡉⠻⡟⠀⠈⠉⠙⠛⠷⠶⣦⣤⣄⣀⠀⠀⠀⠀⠀⣾⣿⣿⣿⣿
    +⣿⣿⣿⣿⡄⠸⢿⣤⠀⢠⣤⣀⡀⠀⠀⠀⠀⠀⠉⠙⠛⠻⠶⠀⢰⣿⣿⠻⣿⣿
    +⣿⣿⣿⣿⠠⣶⣆⡉⠀⠀⠈⠉⠙⠛⠳⠶⠦⣤⣤⣄⣀⡀⢀⣴⠟⠋⠙⢷⣬⣿
    +⣿⣿⣿⠏⣠⡄⠹⠁⠰⢶⣤⣤⣀⡀⠀⠀⠀⠀⠀⠉⢉⣿⠟⠁⠀⠀⣠⣾⣿⣿
    +⣿⣿⡿⠂⠙⠻⡆⠀⠀⠀⠀⠈⠉⠛⠛⠷⠶⣦⣤⣴⠟⠁⠀⠀⣠⣾⣿⣿⣿⣿
    +⣿⣿⡇⠸⣿⣄⠀⠰⠶⢶⣤⣄⣀⡀⠀⠀⠀⣴⣟⠁⠀⠀⣠⣾⣿⣿⣿⣿⣿⣿
    +⣿⡟⠀⣶⣀⠃⠀⠀⠀⠀⠀⠈⠉⠙⠛⠓⢾⡟⢙⣷⣤⢾⣿⣿⣿⣿⣿⣿⣿⣿
    +⣿⠋⣀⡉⠻⠀⠘⠛⠻⠶⢶⣤⣤⣀⡀⢠⠿⠟⠛⠉⠁⣸⣿⣿⣿⣿⣿⣿⣿⣿
    +⣿⡀⠛⠳⠆⠀⠀⠀⠀⠀⠀⠀⠉⠉⠛⠛⠷⠶⣦⠄⢀⣿⣿⣿⣿⣿⣿⣿⣿⣿
    +⣿⣿⣿⣶⣦⣀⣀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣸⣿⣿⣿⣿⣿⣿⣿⣿⣿
    +⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣶⣶⣤⣤⣀⣀⠀⠀⠀⢠⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
    +⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣷⣶⣾⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
     

    -Note that kubelet and cAdvisor require HTTPS with bearer token authentication (using the service account token mounted by Kubernetes), while kube-state-metrics is a simple HTTP target. cAdvisor is scraped less frequently (60s) because it returns many more metrics with higher cardinality.
    +

    Table of Contents



    -

    Infrastructure metrics: Kafka, Redis, MinIO


    +
    +

    The Joy of Being Offline



    -Application metrics weren't enough. I also needed visibility into the data layer. Each infrastructure component has a specific role in X-RAG and got its own exporter:
    +I keep my Supernote Nomad offline at all times. No Wi-Fi, no cloud sync, just me and my notes. And honestly, it's great.

    -Redis is the caching layer. It stores search results and embeddings to avoid redundant API calls to OpenAI. We collect 25 metrics via oliver006/redis_exporter running as a sidecar, including cache hit/miss rates, memory usage, connected clients, and command latencies. The key metric? redis_keyspace_hits_total / (redis_keyspace_hits_total + redis_keyspace_misses_total) tells you if caching is actually helping.
    +With Wi-Fi off, the battery lasts about a week on a single charge (how convenient :-)).

    -Kafka is the message queue connecting the ingestion API to the indexer. Documents are published to a topic, and the indexer consumes them asynchronously. We collect 12 metrics via danielqsj/kafka-exporter, with consumer lag being the most critical—it shows how far behind the indexer is. High lag means documents aren't being indexed fast enough.
    +Privacy was my main concern, though. I don't sync anything to Retta's cloud, so my notes stay mine. No one's reading or mining my stuff. Simple as that.

    -MinIO is the S3-compatible object storage where raw documents are stored before processing. We collect 16 metrics from its native /minio/v2/metrics/cluster endpoint, covering request rates, error counts, storage usage, and cluster health.
    +A picture of the Supernote Nomad

    -You can verify these counts by querying Prometheus directly:
    +

    My Offline Workflow



    -
    -$ curl -s 'http://localhost:9090/api/v1/label/__name__/values' \
    -    | jq -r '.data[]' | grep -c '^redis_'
    -25
    -$ curl -s 'http://localhost:9090/api/v1/label/__name__/values' \
    -    | jq -r '.data[]' | grep -c '^kafka_'
    -12
    -$ curl -s 'http://localhost:9090/api/v1/label/__name__/values' \
    -    | jq -r '.data[]' | grep -c '^minio_'
    -16
    -
    +My workflow is simple, only relying on a direct USB connection to my Linux laptop.

    -Full Alloy configuration with detailed metric filtering
    +I connect my Supernote Nomad to my Linux laptop via a USB-C cable. The device is automatically recognized as a storage device, and I can directly access the Note folder, which contains all my notes as .note files. I then copy these files to a dedicated archive folder on my laptop.

    -Alloy scrapes all of these and remote-writes to Prometheus:
    +

    Converting Notes to PDF



    -
    -prometheus.scrape "redis_exporter" {
    -  targets = [
    -    {"__address__" = "xrag-redis.rag-system.svc.cluster.local:9121"},
    -  ]
    -  job_name        = "redis"
    -  scrape_interval = "30s"
    -  forward_to      = [prometheus.relabel.redis_filter.receiver]
    -}
    +To make my notes accessible and shareable, I convert them from the proprietary .note format to PDF. For this, I use a fantastic open-source tool called supernote-tool. It's not an official tool from Ratta, but it works flawlessly.
    +
    +https://github.com/jya-dev/supernote-tool
    +
    +I've created a small shell script to automate the conversion process using tis tool. This script, convert-notes-to-pdfs.sh, resides in my notes archive folder:
    +
    + +
    #!/usr/bin/env bash
     
    -prometheus.scrape "kafka_exporter" {
    -  targets = [
    -    {"__address__" = "kafka-exporter.rag-system.svc.cluster.local:9308"},
    -  ]
    -  job_name        = "kafka"
    -  scrape_interval = "30s"
    -  forward_to      = [prometheus.relabel.kafka_filter.receiver]
    +convert () {
    +  find . -name \*.note \
    +    | while read -r note; do
    +        echo supernote-tool convert -a -t pdf "$note" "${note/.note/.pdf}"
    +        supernote-tool convert -a -t pdf "$note" "${note/.note/.pdf}.tmp"
    +        mv "${note/.note/.pdf}.tmp" "${note/.note/.pdf}"
    +        du -hs "$note" "${note/.note/.pdf}"
    +        echo
    +      done
     }
     
    -prometheus.scrape "minio" {
    -  targets = [
    -    {"__address__" = "xrag-minio.rag-system.svc.cluster.local:9000"},
    -  ]
    -  job_name     = "minio"
    -  metrics_path = "/minio/v2/metrics/cluster"
    -  scrape_interval = "30s"
    -  forward_to   = [prometheus.relabel.minio_filter.receiver]
    +# Make the PDFs available on my Phone as well
    +copy () {
    +  if [ ! -d ~/Documents/Supernote ]; then
    +    echo "Directory ~/Documents/Supernote does not exist, skipping"
    +    exit 1
    +  fi
    +
    +  rsync -delete -av --include='*/' --include='*.pdf' --exclude='*' . ~/Documents/Supernote/
    +  echo This was copied from $(pwd) so dont edit manually >~/Documents/Supernote/README.txt
     }
    +
    +convert
    +copy
     

    -Note that MinIO exposes metrics at a custom path (/minio/v2/metrics/cluster) rather than the default /metrics. Each exporter forwards to a relabel component that filters down to essential metrics before sending to Prometheus.
    +This script does two things:
    +
    +
      +
    • It finds all .note files in the current directory and converts them to PDF using supernote-tool.
    • +
    • It copies the generated PDFs to my ~/Documents/Supernote folder.
    • +

    +

    Syncing to my Phone


    +
    +The ~/Documents/Supernote folder on my laptop is synchronized with my phone using Syncthing. This way, I have access to all my notes in PDF format on my phone, wherever I go, without relying on any cloud service.
    +
    +https://syncthing.net/
    +
    +

    Firmware updates


    +
    +One usually updates the software or firmware of the Supernote Nomad via Wi-Fi. However, it is also possible to update it completely offline. To install the firmware update, follow the steps below (the following instructions were copied from the Supernote website):

    -With all metrics in Prometheus, I can use PromQL queries in Grafana dashboards. For example, to check Kafka consumer lag and see if the indexer is falling behind:
    +
      +
    • Connect your Supernote to your PC with a USB-C cable. For macOS, an MTP software (e.g. OpenMTP or Android File Transfer) is required for your Supernote to show up on your Mac.
    • +
    • For Manta, Nomad, A5 X and A6 X devices, copy the firmware (DO NOT UNZIP) to the "Export" folder of Supernote; for A5 and A6 devices, copy the firmware (DO NOT UNZIP) to the root directory of Supernote.
    • +
    • Unplug the USB connection, tap “OK” on your Supernote to continue, and if no prompt pops up, please restart your device directly to proceed to update.
    • +

    +

    The Writing Experience



    -
    -sum by (consumergroup, topic) (kafka_consumergroup_lag)
    -
    +The writing feel of the Supernote Nomad is simply great. The combination of the screen's texture and the ceramic nib of the pen creates a feeling that is remarkably close to writing on real paper. The latency is almost non-existent, and the pressure sensitivity allows for a natural and expressive writing experience. It's great to write on, and it makes me want to take more notes.

    -Or check Redis cache effectiveness:
    +Another picture of the Supernote Nomad

    -
    -redis_keyspace_hits_total / (redis_keyspace_hits_total + redis_keyspace_misses_total)
    -
    +

    Conclusion



    -

    Distributed tracing with Tempo


    +The Supernote Nomad has become an additional tool for me. By using it offline, I've created a distraction-free and private note-taking environment. The simple, manual workflow for transferring and converting notes gives me full control over my data, and the writing experience is second to none. If you're looking for a digital notebook that respects your privacy and helps you focus, I highly recommend giving the Supernote Nomad a try with an offline-first approach.

    -

    Understanding traces, spans, and the trace tree


    +The Supernote didn't fully replace my traditional paper journals, though. Each of them has its own use case. However, that is outside the scope of this blog post.

    -Before diving into the implementation, let me explain the core concepts I learned. A trace represents a single request's journey through the entire distributed system. Think of it as a receipt that follows your request from the moment it enters the system until the final response.
    +Other related posts:

    -Each trace is identified by a trace ID—a 128-bit identifier (32 hex characters) that stays constant across all services. When I make a search request, every service handling that request uses the same trace ID: 9df981cac91857b228eca42b501c98c6.
    +2026-01-01 Using Supernote Nomad offline (You are currently reading this)
    +2026-01-01 Cloudless Kobo Forma with KOReader

    -Quick video explaining the difference between trace IDs and span IDs in OpenTelemetry
    +E-Mail your comments to paul@nospam.buetow.org :-)

    -Within a trace, individual operations are recorded as spans. A span has:
    +Back to the main site
    +
    +
    +
    + + Posts from July to December 2025 + + gemini://foo.zone/gemfeed/2026-01-01-posts-from-july-to-december-2025.gmi + 2025-12-31T15:49:06+02:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + Hello there, I wish you all a happy new year! These are my social media posts from the last six months. I keep them here to reflect on them and also to not lose them. Social media networks come and go and are not under my control, but my domain is here to stay. + +
    +

    Posts from July to December 2025


    +
    +Published at 2025-12-31T15:49:06+02:00
    +
    +Hello there, I wish you all a happy new year! These are my social media posts from the last six months. I keep them here to reflect on them and also to not lose them. Social media networks come and go and are not under my control, but my domain is here to stay.
    +
    +These are from Mastodon and LinkedIn. Have a look at my about page for my social media profiles. This list is generated with Gos, my social media platform sharing tool.
    +
    +My about page
    +https://codeberg.org/snonux/gos
    +
    +

    Table of Contents




    -The first span in a trace is the root span—it has no parent. When the root span calls another service, that service creates a child span with the root's span ID as its parent. This parent-child relationship forms a tree structure:
    +

    July 2025



    -
    -                        ┌─────────────────────────┐
    -                        │      Root Span          │
    -                        │  POST /api/search       │
    -                        │  span_id: a1b2c3d4...   │
    -                        │  parent: (none)         │
    -                        └───────────┬─────────────┘
    -                                    │
    -              ┌─────────────────────┴─────────────────────┐
    -              │                                           │
    -              ▼                                           ▼
    -┌─────────────────────────┐             ┌─────────────────────────┐
    -│      Child Span         │             │      Child Span         │
    -│  gRPC Search            │             │  render_template        │
    -│  span_id: e5f6g7h8...   │             │  span_id: i9j0k1l2...   │
    -│  parent: a1b2c3d4...    │             │  parent: a1b2c3d4...    │
    -└───────────┬─────────────┘             └─────────────────────────┘
    -            │
    -            ├──────────────────┬──────────────────┐
    -            ▼                  ▼                  ▼
    -     ┌────────────┐     ┌────────────┐     ┌────────────┐
    -     │ Grandchild │     │ Grandchild │     │ Grandchild │
    -     │ embedding  │     │ vector     │     │ llm.rag    │
    -     │ .generate  │     │ _search    │     │ _completion│
    -     └────────────┘     └────────────┘     └────────────┘
    -
    +

    In #Golang, values are actually copied when ...


    +
    +In #Golang, values are actually copied when assigned (boxed) into an interface. That can have performance impact.
    +
    +goperf.dev/01-common-patterns/interface-boxing/
    +
    +

    Same experiences I had, but it's a time saver. ...


    +
    +Same experiences I had, but it's a time saver. and when done correctly, those tools are amazing: #llm #coding #programming
    +
    +lucumr.pocoo.org/2025/06/21/my-first-ai-library/
    +
    +

    We (programmers) all use them (I hope): ...


    +
    +We (programmers) all use them (I hope): language servers. LSP stands for Language Server Protocol, which standardizes communication between coding editors or IDEs and language servers, facilitating features like autocompletion, refactoring, linting, error-checking, etc.... It's interesting to look under the hood a little bit to see how your code editor actually communicates with a language server. #LSP #coding #programming

    -This tree structure answers the critical question: "What called what?" When I see a slow span, I can trace up to see what triggered it and down to see what it's waiting on.
    +packagemain.tech/p/understanding-the-language-server-protocol

    -

    How trace context propagates


    +

    Shells of the early unices didnt understand ...



    -The magic that links spans across services is trace context propagation. When Service A calls Service B, it must pass along the trace ID and its own span ID (which becomes the parent). OpenTelemetry uses the W3C traceparent header:
    +Shells of the early unices didnt understand file globbing, that was done by the external glob command! #unix #history #shell

    -
    -traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
    -             │   │                                │                 │
    -             │   │                                │                 └── flags
    -             │   │                                └── parent span ID (16 hex)
    -             │   └── trace ID (32 hex)
    -             └── version
    -
    +utcc.utoronto.ca/%7Ecks/space/blog/unix/EtcGlobHistory

    -For HTTP, this travels as a request header. For gRPC, it's passed as metadata. For Kafka, it's embedded in message headers. The receiving service extracts this context, creates a new span with the propagated trace ID and the caller's span ID as parent, then continues the chain.
    +

    I've picked up a few techniques from this blog ...



    -This is why all my spans link together—OpenTelemetry's auto-instrumentation handles propagation automatically for HTTP, gRPC, and Kafka clients.
    +I've picked up a few techniques from this blog post and found them worth sharing here: #ai #llm #prompting #techniques

    -

    Implementation


    +cracking-ai-engineering.com/writing/2025/07/07/four-prompting-paradigms/

    -This is where distributed tracing made the difference. I integrated OpenTelemetry auto-instrumentation for FastAPI, gRPC, and HTTP clients, plus manual spans for RAG-specific operations:
    +

    I've published the sixth part of my "Kubernetes ...



    - -
    from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
    -from opentelemetry.instrumentation.grpc import GrpcAioInstrumentorClient
    -
    -# Auto-instrument frameworks
    -FastAPIInstrumentor.instrument_app(app)
    -GrpcAioInstrumentorClient().instrument()
    -
    -# Manual spans for custom operations
    -with tracer.start_as_current_span("llm.rag_completion") as span:
    -    span.set_attribute("llm.model", model_name)
    -    result = await generate_answer(query, context)
    -
    +I've published the sixth part of my "Kubernetes with FreeBSD" blog series. This time, I set up the storage, which will be used with persistent volume claims later on in the Kubernetes cluster. Have a lot of fun! #freebsd #nfs #ha #zfs #zrepl #carp #kubernetes #k8s #k3s #homelab

    -Auto-instrumentation is the quick win: one line of code and you get spans for every HTTP request, gRPC call, or database query. The instrumentor patches the framework at runtime, so existing code works without modification. The downside? You only get what the library authors decided to capture—generic HTTP attributes like http.method and http.status_code, but nothing domain-specific. Auto-instrumented spans also can't know your business logic, so a slow request shows up as "POST /api/search took 5 seconds" without revealing which internal operation caused the delay.
    +foo.zone/gemfeed/2025-07-14-f3s-kubernetes-with-freebsd-part-6.gmi (Gemini)
    +foo.zone/gemfeed/2025-07-14-f3s-kubernetes-with-freebsd-part-6.html

    -Manual spans fill that gap. By wrapping specific operations (like llm.rag_completion or vector_search.query), you get visibility into your application's unique behaviour. You can add custom attributes (llm.model, query.top_k, cache.hit) that make traces actually useful for debugging. The downside is maintenance: manual spans are code you write and maintain, and you need to decide where instrumentation adds value versus where it just adds noise. In practice, I found the right balance was auto-instrumentation for framework boundaries (HTTP, gRPC) plus manual spans for the 5-10 operations that actually matter for understanding performance.
    +

    The book "Coders at Work" offers a fascinating ...



    -The magic is trace context propagation. When the Search UI calls the Search Service via gRPC, the trace ID travels in metadata headers:
    +The book "Coders at Work" offers a fascinating glimpse into how programming legends emerged in the early days of computing. I especially enjoyed the personal stories and insights. It would be great to see a new edition reflecting today’s AI and LLM revolution—so much has changed since!

    -
    -Metadata: [
    -  ("traceparent", "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"),
    -  ("content-type", "application/grpc"),
    -]
    -
    +www.goodreads.com/book/show/6713575-coders-at-work

    -Spans from all services are linked by this trace ID, forming a tree:
    +

    For me, that's all normal. Couldn't imagine a ...



    -
    -Trace ID: 0af7651916cd43dd8448eb211c80319c
    -
    -├─ [search-ui] POST /api/search (300ms)
    -│   │
    -│   ├─ [search-service] Search (gRPC server) (275ms)
    -│   │   │
    -│   │   ├─ [search-service] embedding.generate (50ms)
    -│   │   │   └─ [embedding-service] Embed (45ms)
    -│   │   │       └─ POST https://api.openai.com (35ms)
    -│   │   │
    -│   │   ├─ [search-service] vector_search.query (100ms)
    -│   │   │
    -│   │   └─ [search-service] llm.rag_completion (120ms)
    -│           └─ openai.chat (115ms)
    -
    +For me, that's all normal. Couldn't imagine a simpler job. #software

    -

    Alloy configuration for traces


    +0x1.pt/2025/04/06/the-insanity-of-being-a-software-engineer/

    -Traces are collected by Alloy and stored in Grafana Tempo. Alloy batches traces for efficiency before exporting via OTLP:
    +

    This is similar to my #dtail project. It got ...



    -
    -otelcol.processor.batch "traces" {
    -  timeout = "5s"
    -  send_batch_size = 500
    -  output { traces = [otelcol.exporter.otlp.tempo.input] }
    -}
    -
    -otelcol.exporter.otlp "tempo" {
    -  client {
    -    endpoint = "tempo.monitoring.svc.cluster.local:4317"
    -    tls { insecure = true }
    -  }
    -}
    -
    +This is similar to my #dtail project. It got some features, which dtail doesnt, and dtail has some features, which #nerdlog hasnt. But the principle is the same, both tools don't have a centralised log store and both use SSH to connect to the servers (sources of the logs) directly.

    -In Tempo's UI, I can finally see exactly where time is spent. That 5-second query? Turns out the vector search was waiting on a cold Weaviate connection. Now I knew what to fix.
    +github.com/dimonomid/nerdlog

    -

    Async ingestion trace walkthrough


    +

    I also feel the most comfortable in the ...



    -One of the most powerful aspects of distributed tracing is following requests across async boundaries like message queues. The document ingestion pipeline flows through Kafka, creating spans that are linked even though they execute in different processes at different times.
    +I also feel the most comfortable in the #terminal. There are a few high-level tools where it doesn't make always a lot of sense like web-browsing most of the web, but for most of the things I do, I prefer the terminal. I think it's a good idea to have a terminal-based interface for most of the things you do. It makes it easier to automate things and to work with other tools.

    -

    Step 1: Ingest a document


    +lambdaland.org/posts/2025-05-13_real_programmers/

    -
    -$ curl -s -X POST http://localhost:8082/ingest \
    -  -H "Content-Type: application/json" \
    -  -d '{
    -    "text": "This is the X-RAG Observability Guide...",
    -    "metadata": {
    -      "title": "X-RAG Observability Guide",
    -      "source_file": "docs/OBSERVABILITY.md",
    -      "type": "markdown"
    -    },
    -    "namespace": "default"
    -  }' | jq .
    -{
    -  "document_id": "8538656a-ba99-406c-8da7-87c5f0dda34d",
    -  "status": "accepted",
    -  "minio_bucket": "documents",
    -  "minio_key": "8538656a-ba99-406c-8da7-87c5f0dda34d.json",
    -  "message": "Document accepted for processing"
    -}
    -
    +

    I have been enjoying lately as an alternative ...



    -The ingestion API immediately returns—it doesn't wait for indexing. The document is stored in MinIO and a message is published to Kafka.
    +I have been enjoying lately as an alternative TUI to Claude Code CLI. It is a 100% open-source agentic coding tool, which supports all models from including local ones (e.g. DeepSeek), and has got some nice tweaks like side-by-side diffs and you can also use your favourite text $EDITOR for prompt editing! Highly recommend! #llm #coding #programming #agentic #ai

    -

    Step 2: Find the ingestion trace


    +opencode.ai
    +models.dev

    -Using Tempo's HTTP API (port 3200), we can search for traces by span name using TraceQL:
    +

    Jonathan's reflection of 10 years of ...



    -
    -$ curl -s -G "http://localhost:3200/api/search" \
    -  --data-urlencode 'q={name="POST /ingest"}' \
    -  --data-urlencode 'limit=3' | jq '.traces[0].traceID'
    -"b3fc896a1cf32b425b8e8c46c86c76f7"
    -
    +Jonathan's reflection of 10 years of programming!

    -

    Step 3: Fetch the complete trace


    +jonathan-frere.com/posts/10-years-of-programming/

    -
    -$ curl -s "http://localhost:3200/api/traces/b3fc896a1cf32b425b8e8c46c86c76f7" \
    -  | jq '[.batches[] | ... | {service, span}] | unique'
    -[
    -  { "service": "ingestion-api", "span": "POST /ingest" },
    -  { "service": "ingestion-api", "span": "storage.upload" },
    -  { "service": "ingestion-api", "span": "messaging.publish" },
    -  { "service": "indexer", "span": "indexer.process_document" },
    -  { "service": "indexer", "span": "document.duplicate_check" },
    -  { "service": "indexer", "span": "document.pipeline" },
    -  { "service": "indexer", "span": "storage.download" },
    -  { "service": "indexer", "span": "/xrag.embedding.EmbeddingService/EmbedBatch" },
    -  { "service": "embedding-service", "span": "openai.embeddings" },
    -  { "service": "indexer", "span": "db.insert" }
    -]
    -
    +

    Some neat zero-copy #Golang tricks here ...



    -The trace spans three services: ingestion-api, indexer, and embedding-service. The trace context propagates through Kafka, linking the original HTTP request to the async consumer processing.
    +Some neat zero-copy #Golang tricks here

    -

    Step 4: Analyse the async trace


    +goperf.dev/01-common-patterns/zero-copy/

    -
    -ingestion-api | POST /ingest             |   16ms  ← HTTP response returns
    -ingestion-api | storage.upload           |   13ms  ← Save to MinIO
    -ingestion-api | messaging.publish        |    1ms  ← Publish to Kafka
    -              |                          |         
    -              | ~~~ Kafka queue ~~~      |         ← Async boundary
    -              |                          |         
    -indexer       | indexer.process_document | 1799ms  ← Consumer picks up message
    -indexer       | document.duplicate_check |    1ms
    -indexer       | document.pipeline        | 1796ms
    -indexer       | storage.download         |    1ms  ← Fetch from MinIO
    -indexer       | EmbedBatch (gRPC)        |  754ms  ← Call embedding service
    -embedding-svc | openai.embeddings        |  752ms  ← OpenAI API
    -indexer       | db.insert                | 1038ms  ← Store in Weaviate
    -
    +

    What was it like working at GitLab? A scary ...



    -The total async processing takes ~1.8 seconds, but the user sees a 16ms response. Without tracing, debugging "why isn't my document showing up in search results?" would require correlating logs from three services manually.
    +What was it like working at GitLab? A scary moment was the deletion of the gitlab.com database, though fortunately, there was a six-hour-old copy on the staging server. More people don't necessarily produce better results. Additionally, Ruby's metaprogramming isn't ideal for large projects. A burnout. And many more insights....

    -Key insight: The trace context propagates through Kafka message headers, allowing the indexer's spans to link back to the original ingestion request. This is configured via OpenTelemetry's Kafka instrumentation.
    +yorickpeterse.com/articles/what-it-was-like-working-for-gitlab/

    -

    Viewing traces in Grafana


    +

    I have learned a lot from the Practical #AI ...



    -To view a trace in Grafana's UI:
    +I have learned a lot from the Practical #AI #podcast, especially from episode 312, which discusses the #MCP (model context protocol). Are there any MCP servers you plan to use or to build?

    -1. Open Grafana at http://localhost:3000/explore
    -2. Select Tempo as the data source (top-left dropdown)
    -3. Choose TraceQL as the query type
    -4. Paste the trace ID: b3fc896a1cf32b425b8e8c46c86c76f7
    -5. Click Run query
    +practicalai.fm/312

    -The trace viewer shows a Gantt chart with all spans, their timing, and parent-child relationships. Click any span to see its attributes.
    +

    August 2025



    -Async ingestion trace in Grafana Tempo
    +

    At the end of the article it's mentione that ...



    -Ingestion trace node graph showing service dependencies
    +At the end of the article it's mentione that it's difficult to stay in the zone when AI does the coding for you. I think it's possible to stay in the zon, but only when you use AI surgically. #llm #ai #programming

    -

    End-to-end search trace walkthrough


    +newsletter.pragmaticengineer.com/p/cur..-..email=true&r=4ijqut&triedRedirect=true

    -To demonstrate the observability stack in action, here's a complete trace from a search request through all services.
    +

    Great blog post a out #OpenBSDAmsterdam, of ...



    -

    Step 1: Make a search request


    +Great blog post a out #OpenBSDAmsterdam, of which I am a customer too for some years now. #OpenBSD

    -Normally you'd use the Search UI web interface at http://localhost:8080, but for demonstration purposes curl makes it easier to show the raw request and response:
    +www.tumfatig.net/2025/cruising-a-vps-at-openbsd-amsterdam/

    -
    -$ curl -s -X POST http://localhost:8080/api/search \
    -  -H "Content-Type: application/json" \
    -  -d '{"query": "What is RAG?", "namespace": "default", "mode": "hybrid", "top_k": 5}' | jq .
    -{
    -  "answer": "I don't have enough information to answer this question.",
    -  "sources": [
    -    {
    -      "id": "71adbc34-56c1-4f75-9248-4ed38094ac69",
    -      "content": "# X-RAG Observability Guide This document describes...",
    -      "score": 0.8292956352233887,
    -      "metadata": {
    -        "source": "docs/OBSERVABILITY.md",
    -        "type": "markdown",
    -        "namespace": "default"
    -      }
    -    }
    -  ],
    -  "metadata": {
    -    "namespace": "default",
    -    "num_sources": "5",
    -    "cache_hit": "False",
    -    "mode": "hybrid",
    -    "top_k": "5",
    -    "trace_id": "9df981cac91857b228eca42b501c98c6"
    -  }
    -}
    -
    +

    Interesting. #llm #ai #slowdown ...


    +
    +Interesting. #llm #ai #slowdown
    +
    +m.slashdot.org/story/444304
    +
    +

    With the help of genai, I could generate this ...


    +
    +With the help of genai, I could generate this neat small showcase site, of many of my small to medium sized side projects. The projects descriptions were generated by Claude Code CLI with Sonnet 4 based on the git repo contents. The page content by gitsyncer, a tool I created (listed on the showcase page as well) and gemtexter, which did the HTML generation part (another tool I wrote, listed on the showcase page as well). The stats seem neat, over time a lot of stuff starts to pile up! With the age of AI (so far, only 8 projects were created AI-assisted), I think more projects will spin up faster (not just for me, but for everyone working on side projects). I have more (older) side projects archived on my local NAS, but they are not worth digging out... 📦 Total Projects: 55 📊 Total Commits: 10,379 📈 Total Lines of Code: 252,969 📄 Total Lines of Documentation: 24,167 💻 Languages: Java (22.4%), Go (17.6%), HTML (14.0%), C++ (8.9%), C (7.3%), Perl (6.3%), Shell (6.3%), C/C++ (5.8%), XML (4.6%), Config (1.5%), Ruby (1.1%), HCL (1.1%), Make (0.7%), Python (0.6%), CSS (0.6%), JSON (0.3%), Raku (0.3%), Haskell (0.2%), YAML (0.2%), TOML (0.1%) 📚 Documentation: Text (47.4%), Markdown (38.4%), LaTeX (14.2%) 🤖 AI-Assisted Projects: 8 out of 55 (14.5% AI-assisted, 85.5% human-only) 🚀 Release Status: 31 released, 24 experimental (56.4% with releases, 43.6% experimental) #llm #genai #showcase #coding #programming
    +
    +foo.zone/about/showcase.gmi (Gemini)
    +foo.zone/about/showcase.html
    +
    +

    I tinkered a bit with local LLMs for coding: ...


    +
    +I tinkered a bit with local LLMs for coding: #llm #local #ai #coding #ollama #qwen #deepseek #HelixEditor #LSP #codecompletion #aider
    +
    +foo.zone/gemfeed/2025-08-05-local-coding-llm-with-ollama.gmi (Gemini)
    +foo.zone/gemfeed/2025-08-05-local-coding-llm-with-ollama.html
    +
    +

    Good stuff: 10 years of functional options and ...


    +
    +Good stuff: 10 years of functional options and key lessons Learned along the way #golang
    +
    +www.bytesizego.com/blog/10-years-functional-options-golang

    -The response includes a trace_id that links this request to all spans across services.
    +

    Top 5 performance boosters #golang ...



    -

    Step 2: Query Tempo for the trace


    +Top 5 performance boosters #golang

    -Using the trace ID from the response, query Tempo's API:
    +blog.devtrovert.com/p/go-performance-boosters-the-top-5

    -
    -$ curl -s "http://localhost:3200/api/traces/9df981cac91857b228eca42b501c98c6" \
    -  | jq '.batches[].scopeSpans[].spans[] 
    -        | {name, service: .attributes[] 
    -           | select(.key=="service.name") 
    -           | .value.stringValue}'
    -
    +

    This person found the balance.. although I ...



    -The raw trace shows spans from multiple services:
    +This person found the balance.. although I would use a different code editor: Why Open Source Maintainers Thrive in the LLM Era via @wallabagapp #ai #llm #coding #programming

    -
      -
    • search-ui: POST /api/search (root span, 2138ms total)
    • -
    • search-ui: /xrag.search.SearchService/Search (gRPC client call)
    • -
    • search-service: /xrag.search.SearchService/Search (gRPC server)
    • -
    • search-service: /xrag.embedding.EmbeddingService/Embed (gRPC client)
    • -
    • embedding-service: /xrag.embedding.EmbeddingService/Embed (gRPC server)
    • -
    • embedding-service: openai.embeddings (OpenAI API call, 647ms)
    • -
    • embedding-service: POST https://api.openai.com/v1/embeddings (HTTP client)
    • -
    • search-service: vector_search.query (Weaviate hybrid search, 13ms)
    • -
    • search-service: openai.chat (LLM answer generation, 1468ms)
    • -
    • search-service: POST https://api.openai.com/v1/chat/completions (HTTP client)
    • -

    -

    Step 3: Analyse the trace


    +mikemcquaid.com/why-open-source-maintainers-thrive-in-the-llm-era/

    -From this single trace, I can see exactly where time is spent:
    +

    Let's rewrite all slow in #assembly, surely ...



    -
    -Total request:                     2138ms
    -├── gRPC to search-service:        2135ms
    -│   ├── Embedding generation:       649ms
    -│   │   └── OpenAI embeddings API:   640ms
    -│   ├── Vector search (Weaviate):    13ms
    -│   └── LLM answer generation:     1468ms
    -│       └── OpenAI chat API:       1463ms
    -
    +Let's rewrite all slow in #assembly, surely it's not just about the language but also about the architecture and the algorithms used. Still, impressive.

    -The bottleneck is clear: 68% of time is spent in LLM answer generation. The vector search (13ms) and embedding generation (649ms) are relatively fast. Without tracing, I would have guessed the embedding service was slow—traces proved otherwise.
    +x.com/FFmpeg/status/1945478331077374335

    -

    Step 4: Search traces with TraceQL


    +

    How to store data forever? #storage ...



    -Tempo supports TraceQL for querying traces by attributes:
    +How to store data forever? #storage #archiving

    -
    -$ curl -s -G "http://localhost:3200/api/search" \
    -  --data-urlencode 'q={resource.service.name="search-service"}' \
    -  --data-urlencode 'limit=5' | jq '.traces[:2] | .[].rootTraceName'
    -"/xrag.search.SearchService/Search"
    -"GET /health/ready"
    -
    +drewdevault.com/2020/04/22/How-to-store-data-forever.html

    -Other useful TraceQL queries:
    +

    No wonder, that almost everyone doing something ...



    -
    -# Find slow searches (> 2 seconds)
    -{resource.service.name="search-ui" && name="POST /api/search"} | duration > 2s
    -
    -# Find errors
    -{status=error}
    -
    -# Find OpenAI calls
    -{name=~"openai.*"}
    -
    +No wonder, that almost everyone doing something with AI is releasing their own aentic coding tool now. As it's so dead simple to write one. #ai #llm #agenticcoding

    -

    Viewing the search trace in Grafana


    +ampcode.com/how-to-build-an-agent

    -Follow the same steps as above, but use the search trace ID: 9df981cac91857b228eca42b501c98c6
    +

    Another drawback of running load tests in a ...



    -Search trace in Grafana Tempo
    +Another drawback of running load tests in a pre-prod environment is that it is not always possible to reproduce production load, especially in a complex environment. I personally prefer a combination of pre-prod load testing, production canaries, and gradual production deployment. What are your thoughts? #sre #loadtesting #lt #loadtesting

    -Search trace node graph showing service flow
    +thefridaydeploy.substack.com/p/load-testing-prepare-for-the-growth

    -

    Correlating the three signals


    +

    Interesting read Learnings from two years of ...



    -The real power comes from correlating traces, metrics, and logs. When an alert fires for high error rate, I follow this workflow:
    +Interesting read Learnings from two years of using AI tools for software engineering #ai #llm #genai

    -1. Metrics: Prometheus shows error spike started at 10:23:00
    -2. Traces: Query Tempo for traces with status=error around that time
    -3. Logs: Use the trace ID to find detailed error messages in Loki
    +newsletter.pragmaticengineer.com/p/two-years-of-using-ai

    -
    -{namespace="rag-system"} |= "trace_id=abc123" |= "error"
    -
    +

    Neat little story a school girl writing her ...



    -Prometheus exemplars link specific metric samples to trace IDs, so I can click directly from a latency spike to the responsible trace.
    +Neat little story a school girl writing her first (and only) malware and have it infected her school.

    -

    Grafana dashboards


    +ntietz.com/blog/that-time-i-wrote-malware/

    -During the hackathon, I also created six pre-built Grafana dashboards that are automatically provisioned when the monitoring stack starts:
    +

    Happy, that I am not yet obsolete! #llm ...



    -| Dashboard | Description |
    -|-----------|-------------|
    -| **X-RAG Overview** | The main dashboard with 22 panels covering request rates, latencies, error rates, and service health across all X-RAG components |
    -| **OpenTelemetry HTTP Metrics** | HTTP request/response metrics from OpenTelemetry-instrumented services—request rates, latency percentiles, and status code breakdowns |
    -| **Pod System Metrics** | Kubernetes pod resource utilisation: CPU usage, memory consumption, network I/O, disk I/O, and pod state from kube-state-metrics |
    -| **Redis** | Cache performance: memory usage, hit/miss rates, commands per second, connected clients, and memory fragmentation |
    -| **Kafka** | Message queue health: consumer lag (critical for indexer monitoring), broker status, topic partitions, and throughput |
    -| **MinIO** | Object storage metrics: S3 request rates, error counts, traffic volume, bucket sizes, and disk usage |
    +Happy, that I am not yet obsolete! #llm #sre

    -All dashboards are stored as JSON files in infra/k8s/monitoring/grafana-dashboards/ and deployed via ConfigMaps, so they survive pod restarts and cluster recreations.
    +clickhouse.com/blog/llm-observability-challenge

    -X-RAG Overview dashboard
    -Pod System Metrics dashboard
    +

    September 2025



    -

    Results: two days well spent


    +

    Loving this as well: #slackware #linux ...



    -What did two days of hackathon work achieve? The system went from flying blind to fully instrumented:
    +Loving this as well: #slackware #linux

    -
      -
    • All three pillars implemented: logs (Loki), metrics (Prometheus), traces (Tempo)
    • -
    • Unified collection via Grafana Alloy
    • -
    • Infrastructure metrics for Kafka, Redis, and MinIO
    • -
    • Six pre-built Grafana dashboards covering application metrics, pod resources, and infrastructure
    • -
    • Trace context propagation across all gRPC calls
    • -

    -The biggest insight from testing? The embedding service wasn't the bottleneck I assumed. Traces revealed that LLM synthesis dominated latency, not embedding generation. Without tracing, optimisation efforts would have targeted the wrong component.
    +www.osnews.com/story/142145/what-makes-slackware-different/

    -Beyond the technical wins, I had a lot of fun. The hackathon brought together people working on different projects, and I got to know some really nice folks during the sessions themselves. There's something energising about being in a (virtual) room with other people all heads-down on their own challenges—even if you're not collaborating directly, the shared focus is motivating.
    +

    Some #fun: Random Weird Things Part III blog ...



    -

    SLIs, SLOs and SLAs


    +Some #fun: Random Weird Things Part III blog post

    -The system now has full observability, but there's always more. And to be clear: this is not production-grade yet. It works well for development and could scale to production, but that would need to be validated with proper load testing and chaos testing first. We haven't stress-tested the observability pipeline under heavy load, nor have we tested failure scenarios like Tempo going down or Alloy running out of memory. The Alloy config includes comments on sampling strategies and rate limiting that would be essential for high-traffic environments.
    +foo.zone/gemfeed/2025-08-15-random-weird-things-iii.gmi (Gemini)
    +foo.zone/gemfeed/2025-08-15-random-weird-things-iii.html

    -One thing we didn't cover: monitoring and alerting. These are related but distinct from observability. Observability is about collecting and exploring data to understand system behaviour. Monitoring is about defining thresholds and alerting when they're breached. We have Prometheus with all the metrics, but no alerting rules yet—no PagerDuty integration, no Slack notifications when latency spikes or error rates climb.
    +

    Yes, write more useless software. I agree that ...



    -We also didn't define any SLIs (Service Level Indicators) or SLOs (Service Level Objectives). An SLI is a quantitative measure of service quality—for example, "99th percentile search latency" or "percentage of requests returning successfully." An SLO is a target for that indicator—"99th percentile latency should be under 2 seconds" or "99.9% of requests should succeed." Without SLOs, you don't know what "good" looks like, and alerting becomes arbitrary.
    +Yes, write more useless software. I agree that play has a vital role in learning and experimentation. Also, programming is a lot of fun this way. I've learned programming mostly by writing useless software or almost useful tools for myself, but I can now apply all that knowledge to real work as well. #coding #programming

    -For X-RAG specifically, potential SLOs might include:
    +ntietz.com/blog/write-more-useless-software/

    -
      -
    • Search latency: 99th percentile over 5 minutes search response time under 3 seconds
    • -
    • Uptime: 99.9% availability of the search API endpoint
    • -
    • Response quality: How good was the search? There are some metrics which could be used...
    • -

    -SLAs (Service Level Agreements) are often confused with SLOs, but they're different. An SLA is a contractual commitment to customers—a legally binding promise with consequences (refunds, credits, penalties) if you fail to meet it. SLOs are internal engineering targets; SLAs are external business promises. Typically, SLAs are less strict than SLOs: if your internal target is 99.9% availability (SLO), your customer contract might promise 99.5% (SLA), giving you a buffer before you owe anyone money.
    +

    I learned a lot from this #OpenBSD #relayd ...



    -But then again, X-RAG is a proof-of-concept, a prototype, a learning system—there are no real customers to disappoint. SLOs would become essential if this ever served actual users, and SLAs would follow once there's a business relationship to protect.
    +I learned a lot from this #OpenBSD #relayd talk, and I already put the information into production! I know the excellent OpenBSD manual pages document everything, but it is a bit different when you see it presented in a talk.

    -

    Using Amp for AI-assisted development


    +www.youtube.com/watch?v=yW8QSZyEs6E

    -I used Amp (formerly Ampcode) throughout this project. While I knew what I wanted to achieve, I let the LLM generate the actual configurations, Kubernetes manifests, and Python instrumentation code.
    +

    Six weeks of claude code



    -Amp - AI coding agent by Sourcegraph
    +blog.puzzmo.com/posts/2025/07/30/six-weeks-of-claude-code/

    -My workflow was step-by-step rather than handing over a grand plan:
    +

    It's good that there is now a truly open-source ...



    -1. "Deploy Grafana Alloy to the monitoring namespace"
    -2. "Verify Alloy is running and receiving data"
    -3. "Document what we did to docs/OBSERVABILITY.md"
    -4. "Commit with message 'feat: add Grafana Alloy for telemetry collection'"
    -5. Hand off context, start fresh: "Now instrument the search-ui with OpenTelemetry to push traces to Alloy..."
    +It's good that there is now a truly open-source LLM model; I'm just wondering how it will perform. The difference compared to other open models is that the others only provide open weights, but you can't reproduce the training! That issue would be solved with this Swiss model. I will definitively have a look! #llm #opensource #privacy

    -Chaining many small, focused tasks worked better than one massive plan. Each task had clear success criteria, and I could verify results before moving on. The LLM generated the River configuration, the OpenTelemetry Python code, the Kubernetes manifests—I reviewed, tweaked, and committed.
    +m.slashdot.org/story/446310

    -I only ran out of the 200k token context window once, during a debugging session that involved restarting the Kubernetes cluster multiple times. The fix required correlating error messages across several services, and the conversation history grew too long. Starting a fresh context and summarising the problem solved it.
    +

    Have to try this at some point ...



    -Amp automatically selects the best model for the task at hand. Based on the response speed and Sourcegraph's recent announcements, I believe it was using Claude Opus 4.5 for most of my coding and infrastructure work. The quality was excellent—it understood Python, Kubernetes, OpenTelemetry, and Grafana tooling without much hand-holding.
    +Have to try this at some point, troubleshooting #k8s with the help of #genai

    -Let me be clear: without the LLM, I'd never have managed to write all these configuration files by hand in two days. The Alloy config alone is 1400+ lines. But I also reviewed and verified every change manually, verified it made sense, and understood what was being deployed. This wasn't vibe-coding—the whole point of the hackathon was to learn. I already knew Grafana and Prometheus from previous work, but OpenTelemetry, Alloy, Tempo, Loki and the X-RAG system overall were all pretty new to me. By reviewing each generated config and understanding why it was structured that way, I actually learned the tools rather than just deploying magic incantations.
    +blog.palark.com/k8sgpt-ai-troubleshooting-kubernetes/

    -Cost-wise, I spent around 20 USD on Amp credits over the two-day hackathon. For the amount of code generated, configs reviewed, and debugging assistance—that's remarkably affordable.
    +

    I could not agree more. For me, a personal ...



    -

    Other changes along the way


    +I could not agree more. For me, a personal (tech oriented) website is not a business contact card, but a playground to experience and learn with/about technologies. The Value of a Personal Site #website #personal #tech

    -Looking at the git history, I made 25 commits during the hackathon. Beyond the main observability features, there were several smaller but useful additions:
    +atthis.link/blog/2021/personalsite.html

    -OBSERVABILITY_ENABLED flag: Added an environment variable to completely disable the monitoring stack. Set OBSERVABILITY_ENABLED=false in .env and the cluster starts without Prometheus, Grafana, Tempo, Loki, or Alloy. Useful when you just want to work on application code without the overhead.
    +

    The true enterprise developer can write Java in ...



    -Load generator: Added a make load-gen target that fires concurrent requests at the search API. Useful for generating enough trace data to see patterns in Tempo, and for stress-testing the observability pipeline itself.
    +The true enterprise developer can write Java in any language. #java #programming

    -Verification scripts: Created scripts to test that OTLP is actually reaching Alloy and that traces appear in Tempo. Debugging "why aren't my traces showing up?" is frustrating without a systematic way to verify each hop in the pipeline.
    +

    #fx is a neat little tool for viewing JSON ...



    -Moving monitoring to dedicated namespace: Refactored from having observability components scattered across namespaces to a clean monitoring namespace. Makes kubectl get pods -n monitoring show exactly what's running for observability.
    +#fx is a neat little tool for viewing JSON files!

    -

    Lessons learned


    +fx.wtf

    -
      -
    • Start with metrics, but don't stop there—they tell you *what*, not *why*
    • -
    • Trace context propagation is the key to distributed debugging
    • -
    • Grafana Alloy as a unified collector simplifies the pipeline
    • -
    • Infrastructure metrics matter—your app is only as fast as your data layer
    • -
    • The three pillars work together; none is sufficient alone
    • -

    -All manifests and observability code live in Florian's repository:
    +

    I wish I had as much time as this guy. He ...



    -X-RAG on GitHub (source code, K8s manifests, observability configs)
    +I wish I had as much time as this guy. He writes entire operating systems, including a Unix clone called "Bunnix" in a month. He is also the inventor of the Hare programming language (If I am not wrong). Now, he is also creating a new shell, primarily for his other operating systems and kernels he is working on. #shell #unix #programming #operatingsystem #bunnix #hare

    -The best part? Everything I learned during this hackathon—OpenTelemetry instrumentation, Grafana Alloy configuration, trace context propagation, PromQL queries—I can immediately apply at work as we are shifting to that new observability stack and I am going to have a few meetings talking with developers how and what they need to implement for application instrumentalization. Observability patterns are universal, and hands-on experience with a real distributed system beats reading documentation any day.
    +drewdevault.com/2023/04/18/2023-04-18-A-new-shell-for-Unix.html

    -E-Mail your comments to paul@nospam.buetow.org
    +

    What exactly was the point of [ “x$var” = ...



    -Back to the main site
    -
    -
    -
    - - f3s: Kubernetes with FreeBSD - Part 8: Observability - - gemini://foo.zone/gemfeed/2025-12-07-f3s-kubernetes-with-freebsd-part-8.gmi - 2025-12-06T23:58:24+02:00, last updated Mon 09 Mar 09:33:08 EET 2026 - - Paul Buetow aka snonux - paul@dev.buetow.org - - This is the 8th blog post about the f3s series for my self-hosting demands in a home lab. f3s? The 'f' stands for FreeBSD, and the '3s' stands for k3s, the Kubernetes distribution I use on FreeBSD-based physical machines. - -
    -

    f3s: Kubernetes with FreeBSD - Part 8: Observability


    +What exactly was the point of [ “x$var” = “xval” ]? #bash #shell #posix #sh #history

    -Published at 2025-12-06T23:58:24+02:00, last updated Mon 09 Mar 09:33:08 EET 2026
    +www.vidarholen.net/contents/blog/?p=1035

    -This is the 8th blog post about the f3s series for my self-hosting demands in a home lab. f3s? The "f" stands for FreeBSD, and the "3s" stands for k3s, the Kubernetes distribution I use on FreeBSD-based physical machines.
    +

    Neat #ZFS feature (here #FreeBSD) which I ...



    -2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    -2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    -2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    -2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    -2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    -2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    -2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    -2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability (You are currently reading this)
    +Neat #ZFS feature (here #FreeBSD) which I didn't know of before: Pool snapshots, which are different to snapshots of individual data sets:

    -f3s logo
    +it-notes.dragas.net/2024/07/01/enhanci..-..d-stability-with-zfs-pool-checkpoints/
    +
    +

    Longer hours help only short term. About 40 ...


    +
    +Longer hours help only short term. About 40 hours #productivity
    +
    +thesquareplanet.com/blog/about-40-hours/
    +
    +

    You could also use #bpf instead of #strace, ...


    +
    +You could also use #bpf instead of #strace, albeit modern strace uses bpf if told so: How to use the new Docker Seccomp profiles
    +
    +blog.jessfraz.com/post/how-to-use-new-docker-seccomp-profiles/
    +
    +

    Some great things are approaching #bhyve on ...


    +
    +Some great things are approaching #bhyve on #FreeBSD and VM Live Migration – Quo vadis? #freebsd #virtualization #bhyve
    +
    +gyptazy.com/bhyve-on-freebsd-and-vm-live-migration-quo-vadis/
    +
    +

    Another synchronization tool part of the ...


    +
    +Another synchronization tool part of the #golang std lib, singleflight! Used to not overload external resources (like DBs) with N concurrent requests. Useful!
    +
    +victoriametrics.com/blog/go-singleflight/index.html
    +
    +

    Too many open files #linux ...



    -

    Table of Contents


    +Too many open files #linux

    -
    -

    Introduction


    +mattrighetti.com/2025/06/04/too-many-files-open.html

    -In this blog post, I set up a complete observability stack for the k3s cluster. Observability is crucial for understanding what's happening inside the cluster—whether its tracking resource usage, debugging issues, or analysing application behaviour. The stack consists of five main components, all deployed into the monitoring namespace:
    +

    Just posted Part 4 of my #Bash #Golf ...



    -
      -
    • Prometheus: time-series database for metrics collection and alerting
    • -
    • Grafana: visualisation and dashboarding frontend
    • -
    • Loki: log aggregation system (like Prometheus, but for logs)
    • -
    • Alloy: telemetry collector that ships logs and traces from all pods to Loki and Tempo
    • -
    • Tempo: distributed tracing backend for request flow analysis across microservices
    • -

    -Together, these form the "PLG" stack (Prometheus, Loki, Grafana) extended with Tempo for distributed tracing, which is a popular open-source alternative to commercial observability platforms.
    +Just posted Part 4 of my #Bash #Golf series:

    -All manifests for the f3s stack live in my configuration repository:
    +foo.zone/gemfeed/2025-09-14-bash-golf-part-4.gmi (Gemini)
    +foo.zone/gemfeed/2025-09-14-bash-golf-part-4.html

    -codeberg.org/snonux/conf/f3s
    +

    #Perl is like a swiss army knife, as one of ...



    -

    Important Note: GitOps Migration


    +#Perl is like a swiss army knife, as one of the comments states:

    -**Note:** After publishing this blog post, the f3s cluster was migrated from imperative Helm deployments to declarative GitOps using ArgoCD. The Kubernetes manifests, Helm charts, and Justfiles in the repository have been reorganized for ArgoCD-based continuous deployment.
    +developers.slashdot.org/story/25/09/14..-..10th-most-popular-programming-language

    -**To view the exact configuration as it existed when this blog post was written** (before the ArgoCD migration), check out the pre-ArgoCD revision:
    +

    Personally, mainly working with colorless ...



    - -
    $ git clone https://codeberg.org/snonux/conf.git
    -$ cd conf
    -$ git checkout 15a86f3  # Last commit before ArgoCD migration
    -$ cd f3s/prometheus/
    -
    +Personally, mainly working with colorless languages like #ruby and #golang, now slowly understand the pain ppl would have w/ Rust or JS. It wasn't just me when I got confused writing that Grafana DS plugin in TypeScript...

    -**Current master branch** contains the ArgoCD-managed versions with:
    -
      -
    • Application manifests organized under argocd-apps/{monitoring,services,infra,test}/
    • -
    • Resources organized under prometheus/manifests/, loki/, etc.
    • -
    • Justfiles updated to trigger ArgoCD syncs instead of direct Helm commands
    • -

    -The deployment concepts and architecture remain the same—only the deployment method changed from imperative (helm install/upgrade) to declarative (GitOps with ArgoCD).
    +jpcamara.com/2024/07/15/ruby-methods-are.html

    -

    Persistent storage recap


    +

    How do GPUs work? Usually, people only know ...



    -All observability components need persistent storage so that metrics and logs survive pod restarts. As covered in Part 6 of this series, the cluster uses NFS-backed persistent volumes:
    +How do GPUs work? Usually, people only know about CPUs... ... I got the gist, but #gpu #cpu

    -f3s: Kubernetes with FreeBSD - Part 6: Storage
    +blog.codingconfessions.com/p/gpu-computing

    -The FreeBSD hosts (f0, f1) serve as master-standby NFS servers, exporting ZFS datasets that are replicated across hosts using zrepl. The Rocky Linux k3s nodes (r0, r1, r2) mount these exports at /data/nfs/k3svolumes. This directory contains subdirectories for each application that needs persistent storage—including Prometheus, Grafana, and Loki.
    +

    For unattended upgrades you must have a good ...



    -For example, the observability stack uses these paths on the NFS share:
    +For unattended upgrades you must have a good testing (or canary) strategy. #sre #reliability #downtime #ubuntu #systemd #kubernetes

    -
      -
    • /data/nfs/k3svolumes/prometheus/data — Prometheus time-series database
    • -
    • /data/nfs/k3svolumes/grafana/data — Grafana configuration, dashboards, and plugins
    • -
    • /data/nfs/k3svolumes/loki/data — Loki log chunks and index
    • -
    • /data/nfs/k3svolumes/tempo/data — Tempo trace data and WAL
    • -

    -Each path gets a corresponding PersistentVolume and PersistentVolumeClaim in Kubernetes, allowing pods to mount them as regular volumes. Because the underlying storage is ZFS with replication, we get snapshots and redundancy for free.
    +newsletter.pragmaticengineer.com/p/why-reliability-is-hard-at-scale

    -

    The monitoring namespace


    +

    Surely, in the age of #AI and #LLM, people ...



    -First, I created the monitoring namespace where all observability components will live:
    +Surely, in the age of #AI and #LLM, people are not writing as much code manually as before, but I don't think skills like using #Vim (or #HelixEditor) are obsolete just yet. You still need to understand what's happening under the hood, and being comfortable with these tools can make you much more efficient when you do need to edit or review code.

    - -
    $ kubectl create namespace monitoring
    -namespace/monitoring created
    -
    +www.youtube.com/watch?v=tW0BSgzr2AM

    -

    Installing Prometheus and Grafana


    +

    On #AI changes everything... ...



    -Prometheus and Grafana are deployed together using the kube-prometheus-stack Helm chart from the Prometheus community. This chart bundles Prometheus, Grafana, Alertmanager, and various exporters (Node Exporter, Kube State Metrics) into a single deployment. Ill explain what each component does in detail later when we look at the running pods.
    +On #AI changes everything...

    -

    Prerequisites


    +lucumr.pocoo.org/2025/6/4/changes/

    -Add the Prometheus Helm chart repository:
    +

    Maps in Go under the hood #golang ...



    - -
    $ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    -$ helm repo update
    -
    +Maps in Go under the hood #golang

    -Create the directories on the NFS server for persistent storage:
    +victoriametrics.com/blog/go-map/

    - -
    [root@r0 ~]# mkdir -p /data/nfs/k3svolumes/prometheus/data
    -[root@r0 ~]# mkdir -p /data/nfs/k3svolumes/grafana/data
    -
    +

    "A project that looks complex might just be ...



    -

    Deploying with the Justfile


    +"A project that looks complex might just be unfamiliar" - Quote from the Applied Go Weekly Newsletter

    -The configuration repository contains a Justfile that automates the deployment. just is a handy command runner—think of it as a simpler, more modern alternative to make. I use it throughout the f3s repository to wrap repetitive Helm and kubectl commands:
    +

    I must admit that partly I see myself there ...



    -just - A handy way to save and run project-specific commands
    -codeberg.org/snonux/conf/f3s/prometheus
    +I must admit that partly I see myself there (sometimes). But it is fun :-) #tools #happy

    -To install everything:
    +borretti.me/article/you-can-choose-tools-that-make-you-happy

    - -
    $ cd conf/f3s/prometheus
    -$ just install
    -kubectl apply -f persistent-volumes.yaml
    -persistentvolume/prometheus-data-pv created
    -persistentvolume/grafana-data-pv created
    -persistentvolumeclaim/grafana-data-pvc created
    -helm install prometheus prometheus-community/kube-prometheus-stack \
    -    --namespace monitoring -f persistence-values.yaml
    -NAME: prometheus
    -LAST DEPLOYED: ...
    -NAMESPACE: monitoring
    -STATUS: deployed
    -
    +

    Makes me think of good old times, where I ...



    -The persistence-values.yaml configures Prometheus and Grafana to use the NFS-backed persistent volumes I mentioned earlier, ensuring data survives pod restarts. It also enables scraping of etcd and kube-controller-manager metrics:
    +Makes me think of good old times, where I shipped 5 times as fast.: What happens when code reviews aren’t mandatory? What happens when code reviews aren’t mandatory? via @wallabagapp #productivity #code

    -
    -kubeEtcd:
    -  enabled: true
    -  endpoints:
    -    - 192.168.2.120
    -    - 192.168.2.121
    -    - 192.168.2.122
    -  service:
    -    enabled: true
    -    port: 2381
    -    targetPort: 2381
    -
    -kubeControllerManager:
    -  enabled: true
    -  endpoints:
    -    - 192.168.2.120
    -    - 192.168.2.121
    -    - 192.168.2.122
    -  service:
    -    enabled: true
    -    port: 10257
    -    targetPort: 10257
    -  serviceMonitor:
    -    enabled: true
    -    https: true
    -    insecureSkipVerify: true
    -
    +testdouble.com/insights/when-code-reviews-arent-mandatory

    -By default, k3s binds the controller-manager to localhost only and doesn't expose etcd metrics, so the "Kubernetes / Controller Manager" and "etcd" dashboards in Grafana will show no data. To fix both, add the following to /etc/rancher/k3s/config.yaml on each k3s server node:
    +

    Neat little blog post, showcasing various ...



    - -
    [root@r0 ~]# cat >> /etc/rancher/k3s/config.yaml << 'EOF'
    -kube-controller-manager-arg:
    -  - bind-address=0.0.0.0
    -etcd-expose-metrics: true
    -EOF
    -[root@r0 ~]# systemctl restart k3s
    -
    +Neat little blog post, showcasing various methods used for generic programming before the introduction of generics. Only reflection wasn't listed. #golang

    -Repeat for r1 and r2. After restarting all nodes, the controller-manager metrics endpoint will be accessible and etcd metrics are available on port 2381. Prometheus can now scrape both.
    +bitfieldconsulting.com/posts/generics

    -Verify etcd metrics are exposed:
    +

    share Didn't know, that on MacOS, besides of ...



    - -
    [root@r0 ~]# curl -s http://127.0.0.1:2381/metrics | grep etcd_server_has_leader
    -etcd_server_has_leader 1
    -
    +share Didn't know, that on MacOS, besides of .so (shared object files, which can be dynamically loaded as well) there is also the MacOS' native .dylib format which serves a similar purpose! #macos #dylib #so

    -The full persistence-values.yaml and all other Prometheus configuration files are available on Codeberg:
    +cpu.land/becoming-an-elf-lord

    -codeberg.org/snonux/conf/f3s/prometheus
    +

    I think this is the way: use LLMs for code you ...



    -The persistent volume definitions bind to specific paths on the NFS share using hostPath volumes—the same pattern used for other services in Part 7:
    +I think this is the way: use LLMs for code you don't care much about and write code manually for what matters most to you. This way, most boring and boilerplate stuff can be auto-generated.

    -f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    +registerspill.thorstenball.com/p/surely-not-all-codes-worth-it

    -

    Exposing Grafana via ingress


    +

    Always enable keepalive? I'd say most of the ...



    -The chart also deploys an ingress for Grafana, making it accessible at grafana.f3s.foo.zone. The ingress configuration follows the same pattern as other services in the cluster—Traefik handles the routing internally, while the OpenBSD edge relays terminate TLS and forward traffic through WireGuard.
    +Always enable keepalive? I'd say most of the time. I've seen cases, where connections weren't reused but new additional were edtablished, causing the servers to run out of worker threads #sre Always. Enable. Keepalives.

    -Once deployed, Grafana is accessible and comes pre-configured with Prometheus as a data source. You can verify the Prometheus service is running:
    +www.honeycomb.io/blog/always-enable-keepalives

    - -
    $ kubectl get svc -n monitoring prometheus-kube-prometheus-prometheus
    -NAME                                    TYPE        CLUSTER-IP      PORT(S)
    -prometheus-kube-prometheus-prometheus   ClusterIP   10.43.152.163   9090/TCP,8080/TCP
    -
    +

    I just finished reading "Chaos Engineering" by ...



    -Grafana connects to Prometheus using the internal service URL http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090. The default Grafana credentials are admin/prom-operator, which should be changed immediately after first login.
    +I just finished reading "Chaos Engineering" by Casey Rosenthal—an absolute must-read for anyone passionate about building resilient systems! Chaos Engineering is not abbreaking things randomly—it's a disciplined approach to uncovering weaknesses before they become outages. SREs, this book is packed with practical insights and real-world strategies to strengthen your systems against failure. Highly recommended! #ChaosEngineering #Resilience

    -Grafana dashboard showing Prometheus metrics
    +www.oreilly.com/library/view/chaos-engineering/9781492043850/

    -Grafana dashboard showing cluster metrics
    +

    fx is a neat and tidy command-line tool for ...



    -Grafana etcd dashboard showing cluster health, RPC rate, disk sync duration, and peer round trip times
    +fx is a neat and tidy command-line tool for interactively viewing JSON files! What I like about it is that it is not too complex (open the help with ? and it is only about one page long) but still very useful. #json #golang

    -

    Installing Loki and Alloy


    +github.com/antonmedv/fx

    -While Prometheus handles metrics, Loki handles logs. It's designed to be cost-effective and easy to operate—it doesn't index the contents of logs, only the metadata (labels), making it very efficient for storage.
    +

    Some nice #Golang tricks there ...



    -Alloy is Grafana's telemetry collector (the successor to Promtail). It runs as a DaemonSet on each node, tails container logs, and ships them to Loki.
    +Some nice #Golang tricks there

    -

    Prerequisites


    +blog.devtrovert.com/p/12-personal-go-tricks-that-transformed

    -Create the data directory on the NFS server:
    +

    October 2025



    - -
    [root@r0 ~]# mkdir -p /data/nfs/k3svolumes/loki/data
    -
    +

    Word! What Are We Losing With AI? #llm #ai ...



    -

    Deploying Loki and Alloy


    +Word! What Are We Losing With AI? #llm #ai

    -The Loki configuration also lives in the repository:
    +josem.co/what-are-we-losing-with-ai/

    -codeberg.org/snonux/conf/f3s/loki
    +

    It's not yet time for the friday #fun, but: ...



    -To install:
    +It's not yet time for the friday #fun, but: OpenOffice does not print on Tuesdays ― Andreas Zwinkau :-)

    - -
    $ cd conf/f3s/loki
    -$ just install
    -helm repo add grafana https://grafana.github.io/helm-charts || true
    -helm repo update
    -kubectl apply -f persistent-volumes.yaml
    -persistentvolume/loki-data-pv created
    -persistentvolumeclaim/loki-data-pvc created
    -helm install loki grafana/loki --namespace monitoring -f values.yaml
    -NAME: loki
    -LAST DEPLOYED: ...
    -NAMESPACE: monitoring
    -STATUS: deployed
    -...
    -helm install alloy grafana/alloy --namespace monitoring -f alloy-values.yaml
    -NAME: alloy
    -LAST DEPLOYED: ...
    -NAMESPACE: monitoring
    -STATUS: deployed
    -
    +beza1e1.tuxen.de/lore/print_on_tuesday.html

    -Loki runs in single-binary mode with a single replica (loki-0), which is appropriate for a home lab cluster. This means there's only one Loki pod running at any time. If the node hosting Loki fails, Kubernetes will automatically reschedule the pod to another worker node—but there will be a brief downtime (typically under a minute) while this happens. For my home lab use case, this is perfectly acceptable.
    +

    Finally, I retired my AWS/ECS setup for my ...



    -For full high-availability, you'd deploy Loki in microservices mode with separate read, write, and backend components, backed by object storage like S3 or MinIO instead of local filesystem storage. That's a more complex setup that I might explore in a future blog post—but for now, the single-binary mode with NFS-backed persistence strikes the right balance between simplicity and durability.
    +Finally, I retired my AWS/ECS setup for my self-hosted apps, as it was too expensive to operate—I had to pay $20 monthly just to run pods for only a day or so each month, so I rarely used them. Now, everything has been migrated to my FreeBSD-powered Kubernetes home cluster! Part 7 of this blog series covers the initial pod deployments. #freebsd #k8s #selfhosing

    -

    Configuring Alloy


    +foo.zone/gemfeed/2025-10-02-f3s-kubernetes-with-freebsd-part-7.gmi (Gemini)
    +foo.zone/gemfeed/2025-10-02-f3s-kubernetes-with-freebsd-part-7.html

    -Alloy is configured via alloy-values.yaml to discover all pods in the cluster and forward their logs to Loki:
    +

    A great blog post about my favourite text ...



    - -
    discovery.kubernetes "pods" {
    -  role = "pod"
    -}
    -
    -discovery.relabel "pods" {
    -  targets = discovery.kubernetes.pods.targets
    -
    -  rule {
    -    source_labels = ["__meta_kubernetes_namespace"]
    -    target_label  = "namespace"
    -  }
    -
    -  rule {
    -    source_labels = ["__meta_kubernetes_pod_name"]
    -    target_label  = "pod"
    -  }
    -
    -  rule {
    -    source_labels = ["__meta_kubernetes_pod_container_name"]
    -    target_label  = "container"
    -  }
    -
    -  rule {
    -    source_labels = ["__meta_kubernetes_pod_label_app"]
    -    target_label  = "app"
    -  }
    -}
    -
    -loki.source.kubernetes "pods" {
    -  targets    = discovery.relabel.pods.output
    -  forward_to = [loki.write.default.receiver]
    -}
    -
    -loki.write "default" {
    -  endpoint {
    -    url = "http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push"
    -  }
    -}
    -
    +A great blog post about my favourite text editor. why even helix? #HeliEditor Now I am considering forking it myself as well :-)

    -This configuration automatically labels each log line with the namespace, pod name, container name, and app label, making it easy to filter logs in Grafana.
    +axlefublr.github.io/why-even-helix/

    -

    Adding Loki as a Grafana data source


    +

    One of the more confusing parts in Go, nil ...



    -Loki doesn't have its own web UI—you query it through Grafana. First, verify the Loki service is running:
    +One of the more confusing parts in Go, nil values vs nil errors: #golang

    - -
    $ kubectl get svc -n monitoring loki
    -NAME   TYPE        CLUSTER-IP    PORT(S)
    -loki   ClusterIP   10.43.64.60   3100/TCP,9095/TCP
    -
    +unexpected-go.com/nil-errors-that-are-non-nil-errors.html

    -To add Loki as a data source in Grafana:
    +

    Strong engineers are pragmatic, work fast, have ...



    -
      -
    • Navigate to Configuration → Data Sources
    • -
    • Click "Add data source"
    • -
    • Select "Loki"
    • -
    • Set the URL to: http://loki.monitoring.svc.cluster.local:3100
    • -
    • Click "Save & Test"
    • -

    -Once configured, you can explore logs in Grafana's "Explore" view. I'll show some example queries in the "Using the observability stack" section below.
    +Strong engineers are pragmatic, work fast, have technical ability, dont need to be technical geniuses and believe in their ability to solve almost any problem #productivity

    -Exploring logs in Grafana with Loki
    +www.seangoedecke.com/what-makes-strong-engineers-strong/

    -

    The complete monitoring stack


    +

    I am currently binge-listening to the Google ...



    -After deploying everything, here's what's running in the monitoring namespace:
    +I am currently binge-listening to the Google #SRE ProdCast. It's really great to learn about the stories of individual SREs and their journeys. It is not just about SREs at Google; there are also external guests.

    - -
    $ kubectl get pods -n monitoring
    -NAME                                                     READY   STATUS    RESTARTS   AGE
    -alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0          42d
    -alloy-g5fgj                                              2/2     Running   0          29m
    -alloy-nfw8w                                              2/2     Running   0          29m
    -alloy-tg9vj                                              2/2     Running   0          29m
    -loki-0                                                   2/2     Running   0          25m
    -prometheus-grafana-868f9dc7cf-lg2vl                      3/3     Running   0          42d
    -prometheus-kube-prometheus-operator-8d7bbc48c-p4sf4      1/1     Running   0          42d
    -prometheus-kube-state-metrics-7c5fb9d798-hh2fx           1/1     Running   0          42d
    -prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   0          42d
    -prometheus-prometheus-node-exporter-2nsg9                1/1     Running   0          42d
    -prometheus-prometheus-node-exporter-mqr25                1/1     Running   0          42d
    -prometheus-prometheus-node-exporter-wp4ds                1/1     Running   0          42d
    -tempo-0                                                  1/1     Running   0          1d
    -
    +sre.google/prodcast/

    -Note: Tempo (tempo-0) is deployed later in this post in the "Distributed Tracing with Grafana Tempo" section. It is included in the pod listing here for completeness.
    +

    Looks like a neat library for writing ...



    -And the services:
    +Looks like a neat library for writing script-a-like programs in #Golang. But honestly, why not directly use a scripting language like #RakuLang or #Ruby

    - -
    $ kubectl get svc -n monitoring
    -NAME                                      TYPE        CLUSTER-IP      PORT(S)
    -alertmanager-operated                     ClusterIP   None            9093/TCP,9094/TCP
    -alloy                                     ClusterIP   10.43.74.14     12345/TCP
    -loki                                      ClusterIP   10.43.64.60     3100/TCP,9095/TCP
    -loki-headless                             ClusterIP   None            3100/TCP
    -prometheus-grafana                        ClusterIP   10.43.46.82     80/TCP
    -prometheus-kube-prometheus-alertmanager   ClusterIP   10.43.208.43    9093/TCP,8080/TCP
    -prometheus-kube-prometheus-operator       ClusterIP   10.43.246.121   443/TCP
    -prometheus-kube-prometheus-prometheus     ClusterIP   10.43.152.163   9090/TCP,8080/TCP
    -prometheus-kube-state-metrics             ClusterIP   10.43.64.26     8080/TCP
    -prometheus-prometheus-node-exporter       ClusterIP   10.43.127.242   9100/TCP
    -tempo                                     ClusterIP   10.43.91.44     3200/TCP,4317/TCP,4318/TCP
    -
    +github.com/bitfield/script

    -Let me break down what each pod does:
    +

    Where Gen AI shines is the generation and ...



    -
      -
    • alertmanager-prometheus-kube-prometheus-alertmanager-0: the Alertmanager instance that receives alerts from Prometheus, deduplicates them, groups related alerts together, and routes notifications to the appropriate receivers (email, Slack, PagerDuty, etc.). It runs as a StatefulSet with persistent storage for silences and notification state.
    • -

    -
      -
    • alloy-g5fgj, alloy-nfw8w, alloy-tg9vj: three Alloy pods running as a DaemonSet, one on each k3s node. Each pod tails the container logs from its local node via the Kubernetes API and forwards them to Loki. This ensures log collection continues even if a node becomes isolated from the others.
    • -

    -
      -
    • loki-0: the single Loki instance running in single-binary mode. It receives log streams from Alloy, stores them in chunks on the NFS-backed persistent volume, and serves queries from Grafana. The -0 suffix indicates it's a StatefulSet pod.
    • -

    -
      -
    • prometheus-grafana-...: the Grafana web interface for visualising metrics and logs. It comes pre-configured with Prometheus as a data source and includes dozens of dashboards for Kubernetes monitoring. Dashboards, users, and settings are persisted to the NFS share.
    • -

    -
      -
    • prometheus-kube-prometheus-operator-...: the Prometheus Operator that watches for custom resources (ServiceMonitor, PodMonitor, PrometheusRule) and automatically configures Prometheus to scrape new targets. This allows applications to declare their own monitoring requirements.
    • -

    -
      -
    • prometheus-kube-state-metrics-...: generates metrics about the state of Kubernetes objects themselves: how many pods are running, pending, or failed; deployment replica counts; node conditions; PVC status; and more. Essential for cluster-level dashboards.
    • -

    -
      -
    • prometheus-prometheus-kube-prometheus-prometheus-0: the Prometheus server that scrapes metrics from all configured targets (pods, services, nodes), stores them in a time-series database, evaluates alerting rules, and serves queries to Grafana.
    • -

    -
      -
    • prometheus-prometheus-node-exporter-...: three Node Exporter pods running as a DaemonSet, one on each node. They expose hardware and OS-level metrics: CPU usage, memory, disk I/O, filesystem usage, network statistics, and more. These feed the "Node Exporter" dashboards in Grafana.
    • -

    -
      -
    • tempo-0: the Grafana Tempo instance for distributed tracing. It receives trace data from Alloy via OTLP (OpenTelemetry Protocol), stores traces on the NFS-backed persistent volume, and serves queries to Grafana. Tempo is covered in detail in the "Distributed Tracing with Grafana Tempo" section later in this post.
    • -

    -

    Using the observability stack


    +Where Gen AI shines is the generation and management of YAML files... e.g. Kubernetes manifests. Who likes to write YAML files by hand? #genai #llm #ai #yaml #kubernetes #k8s

    -

    Viewing metrics in Grafana


    +

    At work, everybody is replacable. Some with a ...



    -The kube-prometheus-stack comes with many pre-built dashboards. Some useful ones include:
    +At work, everybody is replacable. Some with a hic-up, others with none. There will always someone to step up after you leave.

    -
      -
    • Kubernetes / Compute Resources / Cluster: overview of CPU and memory usage across the cluster
    • -
    • Kubernetes / Compute Resources / Namespace (Pods): resource usage by namespace
    • -
    • Node Exporter / Nodes: detailed host metrics like disk I/O, network, and CPU
    • -

    -

    Querying logs with LogQL


    +adamstacoviak.com/im-a-cog/

    -In Grafana's Explore view, select Loki as the data source and try queries like:
    +

    I actually would switch back to #FreeBSD as ...



    -
    -# All logs from the services namespace
    -{namespace="services"}
    -
    -# Logs from pods matching a pattern
    -{pod=~"miniflux.*"}
    -
    -# Filter by log content
    -{namespace="services"} |= "error"
    -
    -# Parse JSON logs and filter
    -{namespace="services"} | json | level="error"
    -
    +I actually would switch back to #FreeBSD as my main Operating System for personal use on my Laptop - FreeBSD used to be my main driver a couple of years ago when I still used "normal" PCs

    -

    Creating alerts


    +www.osnews.com/story/140841/freebsd-to-invest-in-laptop-support/

    -Prometheus supports alerting rules that can notify you when something goes wrong. The kube-prometheus-stack includes many default alerts for common issues like high CPU usage, pod crashes, and node problems. These can be customised via PrometheusRule CRDs.
    +

    Amazing Print is amazing ...



    -

    Monitoring external FreeBSD hosts


    +Amazing Print is amazing

    -The observability stack can also monitor servers outside the Kubernetes cluster. The FreeBSD hosts (f0, f1, f2) that serve NFS storage can be added to Prometheus using the Node Exporter.
    +github.com/amazing-print/amazing_print

    -

    Installing Node Exporter on FreeBSD


    +

    Always worth a reminde, what are bloom filters ...



    -On each FreeBSD host, install the node_exporter package:
    +Always worth a reminde, what are bloom filters and how do they work? #bloom #bloomfilter #datastructure

    - -
    paul@f0:~ % doas pkg install -y node_exporter
    -
    +micahkepe.com/blog/bloom-filters/

    -Enable the service to start at boot:
    +

    Some #Ruby book notes of mine: ...



    - -
    paul@f0:~ % doas sysrc node_exporter_enable=YES
    -node_exporter_enable:  -> YES
    -
    +Some #Ruby book notes of mine:

    -Configure node_exporter to listen on the WireGuard interface. This ensures metrics are only accessible through the secure tunnel, not the public network. Replace the IP with the host's WireGuard address:
    +foo.zone/gemfeed/2025-10-11-key-takeaways-from-the-well-grounded-rubyist.gmi (Gemini)
    +foo.zone/gemfeed/2025-10-11-key-takeaways-from-the-well-grounded-rubyist.html

    - -
    paul@f0:~ % doas sysrc node_exporter_args='--web.listen-address=192.168.2.130:9100'
    -node_exporter_args:  -> --web.listen-address=192.168.2.130:9100
    -
    +

    Sad story. #work #scrum #jira ...



    -Start the service:
    +Sad story. #work #scrum #jira

    - -
    paul@f0:~ % doas service node_exporter start
    -Starting node_exporter.
    -
    +lambdaland.org/posts/2023-02-21_metric_worship/

    -Verify it's running:
    +

    One of my favorite books: "Some Thoughts on ...



    - -
    paul@f0:~ % curl -s http://192.168.2.130:9100/metrics | head -3
    -# HELP go_gc_duration_seconds A summary of the wall-time pause...
    -# TYPE go_gc_duration_seconds summary
    -go_gc_duration_seconds{quantile="0"} 0
    -
    +One of my favorite books: "Some Thoughts on Deep Work"

    -Repeat for the other FreeBSD hosts (f1, f2) with their respective WireGuard IPs.
    +atthis.link/blog/2020/deepwork.html

    -

    Adding FreeBSD hosts to Prometheus


    +

    ltex-ls is great for integrating ...



    -Create a file additional-scrape-configs.yaml in the prometheus configuration directory:
    +ltex-ls is great for integrating #LanguageTool prose checking via #LSP into your #HelixEditor! ... There is also vale-ls, which I have enabled as well. I just download ltex-ls and configure it as an LSP for your .txt and .md docs... that's it!

    -
    -- job_name: 'node-exporter'
    -  static_configs:
    -    - targets:
    -      - '192.168.2.130:9100'  # f0 via WireGuard
    -      - '192.168.2.131:9100'  # f1 via WireGuard
    -      - '192.168.2.132:9100'  # f2 via WireGuard
    -      labels:
    -        os: freebsd
    -
    +valentjn.github.io/ltex/

    -The job_name must be node-exporter to match the existing dashboards. The os: freebsd label allows filtering these hosts separately if needed.
    +

    supernote-tool is awesome, as I can now ...



    -Create a Kubernetes secret from this file:
    +supernote-tool is awesome, as I can now download my Supernote notes on my #Linux desktop and convert them into PDFs - enables me to use the Supernote Nomad device as mine completely offline!

    - -
    $ kubectl create secret generic additional-scrape-configs \
    -    --from-file=additional-scrape-configs.yaml \
    -    -n monitoring
    -
    +

    Fun story! :-) The case of the 500-mile email ...



    -Update persistence-values.yaml to reference the secret:
    +Fun story! :-) The case of the 500-mile email ― Andreas Zwinkau via @wallabagapp #unix #sunos #sendmail

    -
    -prometheus:
    -  prometheusSpec:
    -    additionalScrapeConfigsSecret:
    -      enabled: true
    -      name: additional-scrape-configs
    -      key: additional-scrape-configs.yaml
    -
    +beza1e1.tuxen.de/lore/500mile_email.html

    -Upgrade the Prometheus deployment:
    +

    Operating myself some software over 10 years of ...



    - -
    $ just upgrade
    -
    +Operating myself some software over 10 years of age for over 10 years now, this podcast really resonated with me: #podcast #software #maintainability #maintenance

    -After a minute or so, the FreeBSD hosts appear in the Prometheus targets and in the Node Exporter dashboards in Grafana.
    +changelog.com/podcast/627

    -FreeBSD hosts in the Node Exporter dashboard
    +

    #git worktrees are awesome! ...



    -

    FreeBSD memory metrics compatibility


    +#git worktrees are awesome!

    -The default Node Exporter dashboards are designed for Linux and expect metrics like node_memory_MemAvailable_bytes. FreeBSD uses different metric names (node_memory_size_bytes, node_memory_free_bytes, etc.), so memory panels will show "No data" out of the box.
    +

    LLMs for anomaly detection? "While some ...



    -To fix this, I created a PrometheusRule that generates synthetic Linux-compatible metrics from the FreeBSD equivalents:
    +LLMs for anomaly detection? "While some ML-powered monitoring features have their place, good old-fashioned standard statistics remain hard to beat" Lessons from the pre-LLM AI in Observability: Anomaly Detection and AI-Ops vs. P99 | #llm #monitoring

    -
    -apiVersion: monitoring.coreos.com/v1
    -kind: PrometheusRule
    -metadata:
    -  name: freebsd-memory-rules
    -  namespace: monitoring
    -  labels:
    -    release: prometheus
    -spec:
    -  groups:
    -    - name: freebsd-memory
    -      rules:
    -        - record: node_memory_MemTotal_bytes
    -          expr: node_memory_size_bytes{os="freebsd"}
    -        - record: node_memory_MemAvailable_bytes
    -          expr: |
    -            node_memory_free_bytes{os="freebsd"}
    -              + node_memory_inactive_bytes{os="freebsd"}
    -              + node_memory_cache_bytes{os="freebsd"}
    -        - record: node_memory_MemFree_bytes
    -          expr: node_memory_free_bytes{os="freebsd"}
    -        - record: node_memory_Buffers_bytes
    -          expr: node_memory_buffer_bytes{os="freebsd"}
    -        - record: node_memory_Cached_bytes
    -          expr: node_memory_cache_bytes{os="freebsd"}
    -
    +quesma.com/blog-detail/aiops-observability

    -This file is saved as freebsd-recording-rules.yaml and applied as part of the Prometheus installation. The os="freebsd" label (set in the scrape config) ensures these rules only apply to FreeBSD hosts. After applying, the memory panels in the Node Exporter dashboards populate correctly for FreeBSD.
    +

    After having heavily vibe-coded (personal pet ...



    -freebsd-recording-rules.yaml on Codeberg
    +After having heavily vibe-coded (personal pet projects) for 2 months other the summer, I've come back to more structured and intentional AI coding practices. Surly, it was a great learnig experiment: #llm #ai #risk #code #sre #development #genai

    -

    Disk I/O metrics limitation


    +www.okoone.com/spark/technology-innova..-..ode-is-quietly-increasing-system-risk/

    -Unlike memory metrics, disk I/O metrics (node_disk_read_bytes_total, node_disk_written_bytes_total, etc.) are not available on FreeBSD. The Linux diskstats collector that provides these metrics doesn't have a FreeBSD equivalent in the node_exporter.
    +

    Slowly, one after another, I am switching all ...



    -The disk I/O panels in the Node Exporter dashboards will show "No data" for FreeBSD hosts. FreeBSD does expose ZFS-specific metrics (node_zfs_arcstats_*) for ARC cache performance, and per-dataset I/O stats are available via sysctl kstat.zfs, but mapping these to the Linux-style metrics the dashboards expect is non-trivial. To address this, I created custom ZFS-specific dashboards, covered in the next section.
    +Slowly, one after another, I am switching all my Go projects to Mage. Having a Makefile or Taskfile in a native Go format is so much better.

    -

    ZFS Monitoring for FreeBSD Servers


    +magefile.org/

    -The FreeBSD servers (f0, f1, f2) that provide NFS storage to the k3s cluster have ZFS filesystems. Monitoring ZFS performance is crucial for understanding storage performance and cache efficiency.
    +

    Some neat slice tricks for Go: #golang ...



    -

    Node Exporter ZFS Collector


    +Some neat slice tricks for Go: #golang

    -The node_exporter running on each FreeBSD server (v1.9.1) includes a built-in ZFS collector that exposes metrics via sysctls. The ZFS collector is enabled by default and provides:
    +blog.devtrovert.com/p/12-slice-tricks-to-enhance-your-go

    -
      -
    • ARC (Adaptive Replacement Cache) statistics
    • -
    • Cache hit/miss rates
    • -
    • Memory usage and allocation
    • -
    • MRU/MFU cache breakdown
    • -
    • Data vs metadata distribution
    • -

    -

    Verifying ZFS Metrics


    +

    I spent way too much time on this site. It's ...



    -On any FreeBSD server, check that ZFS metrics are being exposed:
    +I spent way too much time on this site. It's full of tools for the #terminal! Terminal Trove - The $HOME of all things in the terminal. #linux #bsd #unix #terminal #cli #tools

    -
    -paul@f0:~ % curl -s http://localhost:9100/metrics | grep node_zfs_arcstats | wc -l
    -      69
    -
    +terminaltrove.com/

    -The metrics are automatically scraped by Prometheus through the existing static configuration in additional-scrape-configs.yaml which targets all FreeBSD servers on port 9100 with the os: freebsd label.
    +

    I share similar experiences with #rust, but I ...



    -

    ZFS Recording Rules


    +I share similar experiences with #rust, but I am sure one just needs a bit more time to feel productive in it. It's not enough just to try rust out once before becoming fluent in it.

    -Created recording rules for easier dashboard consumption in zfs-recording-rules.yaml:
    +m.slashdot.org/story/446164

    -
    -apiVersion: monitoring.coreos.com/v1
    -kind: PrometheusRule
    -metadata:
    -  name: freebsd-zfs-rules
    -  namespace: monitoring
    -  labels:
    -    release: prometheus
    -spec:
    -  groups:
    -    - name: freebsd-zfs-arc
    -      interval: 30s
    -      rules:
    -        - record: node_zfs_arc_hit_rate_percent
    -          expr: |
    -            100 * (
    -              rate(node_zfs_arcstats_hits_total{os="freebsd"}[5m]) /
    -              (rate(node_zfs_arcstats_hits_total{os="freebsd"}[5m]) +
    -               rate(node_zfs_arcstats_misses_total{os="freebsd"}[5m]))
    -            )
    -          labels:
    -            os: freebsd
    -        - record: node_zfs_arc_memory_usage_percent
    -          expr: |
    -            100 * (
    -              node_zfs_arcstats_size_bytes{os="freebsd"} /
    -              node_zfs_arcstats_c_max_bytes{os="freebsd"}
    -            )
    -          labels:
    -            os: freebsd
    -        # Additional rules for metadata %, target %, MRU/MFU %, etc.
    -
    +

    Pipelines in Go using channels. #golang ...



    -These recording rules calculate:
    +Pipelines in Go using channels. #golang

    -
      -
    • ARC hit rate percentage
    • -
    • ARC memory usage percentage (current vs maximum)
    • -
    • ARC target percentage (target vs maximum)
    • -
    • Metadata vs data percentages
    • -
    • MRU vs MFU cache percentages
    • -
    • Demand data and metadata hit rates
    • -

    -

    Grafana Dashboards


    +go.dev/blog/pipelines

    -Created two comprehensive ZFS monitoring dashboards (zfs-dashboards.yaml):
    +

    Some nifty #Ruby tricks: In my opinion, Ruby ...



    -**Dashboard 1: FreeBSD ZFS (per-host detailed view)**
    +Some nifty #Ruby tricks: In my opinion, Ruby is unterrated. It's a great language even without Rails.

    -Includes variables to select:
    +www.rubyinside.com/21-ruby-tricks-902.html

    -
      -
    • FreeBSD server (f0, f1, or f2)
    • -
    • ZFS pool (zdata, zroot, or all)
    • -

    -Pool Overview Row:
    +

    Reflects my experience ...


    +
    +Reflects my experience
    +
    +simonwillison.net/2025/Sep/12/matt-webb/#atom-everything
    +
    +

    I like the fact that Markdown fikes, a RCS. an ...


    +
    +I like the fact that Markdown fikes, a RCS. an text editor and standard unix tools like #grep and #find are all you need for taking notes digitally. I am the same :-) My favorite note-taking method
    +
    +unixdigest.com/articles/my-favorite-note-taking-method.html

    -
      -
    • Pool Capacity gauge (with thresholds: green <70%, yellow <85%, red >85%)
    • -
    • Pool Health status (ONLINE/DEGRADED/FAULTED with color coding)
    • -
    • Total Pool Size stat
    • -
    • Free Space stat
    • -
    • Pool Space Usage Over Time (stacked: used + free)
    • -
    • Pool Capacity Trend time series
    • -

    -Dataset Statistics Row:
    +

    Rich Interactive Widgets for Terminal UIs, it ...



    -
      -
    • Table showing all datasets with columns: Pool, Dataset, Used, Available, Referenced
    • -
    • Automatically filters by selected pool
    • -

    -ARC Cache Statistics Row:
    +Rich Interactive Widgets for Terminal UIs, it must not always be BubbleTea #golang #terminal #widgets

    -
      -
    • ARC Hit Rate gauge (red <70%, yellow <90%, green >=90%)
    • -
    • ARC Size time series (current, target, max)
    • -
    • ARC Memory Usage percentage gauge
    • -
    • ARC Hits vs Misses rate
    • -
    • ARC Data vs Metadata stacked time series
    • -

    -**Dashboard 2: FreeBSD ZFS Summary (cluster-wide overview)**
    +github.com/rivo/tview

    -Cluster-Wide Pool Statistics Row:
    +

    Always fun to dig in the #Perl @Perl woods. ...



    -
      -
    • Total Storage Capacity across all servers
    • -
    • Total Used space
    • -
    • Total Free space
    • -
    • Average Pool Capacity gauge
    • -
    • Pool Health Status (worst case across cluster)
    • -
    • Total Pool Space Usage Over Time
    • -
    • Per-Pool Capacity time series (all pools on all hosts)
    • -

    -Per-Host Pool Breakdown Row:
    +Always fun to dig in the #Perl @Perl woods. Now, no more Perl 4 pseudo multi-dimensional hashes in Perl 5 (well, they are still there when you require an older version for compatibility via use flag, though)! :-)

    -
      -
    • Bar gauge showing capacity by host and pool
    • -
    • Table with all pools: Host, Pool, Size, Used, Free, Capacity %, Health
    • -

    -Cluster-Wide ARC Statistics Row:
    +www.effectiveperlprogramming.com/2024/..-..fake-multidimensional-data-structures/

    -
      -
    • Average ARC Hit Rate gauge across all hosts
    • -
    • ARC Hit Rate by Host time series
    • -
    • Total ARC Size Across Cluster
    • -
    • Total ARC Hits vs Misses (cluster-wide sum)
    • -
    • ARC Size by Host
    • -

    -Dashboard Visualization:
    +

    How does #virtual #memory work? #ram ...



    -ZFS monitoring dashboard in Grafana showing pool capacity, health, and I/O throughput
    -ZFS ARC cache statistics showing hit rate, memory usage, and size trends
    -ZFS datasets table and ARC data vs metadata breakdown
    +How does #virtual #memory work? #ram

    -

    Deployment


    +drewdevault.com/2018/10/29/How-does-virtual-memory-work.html

    -Applied the resources to the cluster:
    +

    flamelens - An interactive flamegraph viewer in ...



    -
    -cd /home/paul/git/conf/f3s/prometheus
    -kubectl apply -f zfs-recording-rules.yaml
    -kubectl apply -f zfs-dashboards.yaml
    -
    +flamelens - An interactive flamegraph viewer in the terminal. - Terminal Trove

    -Updated Justfile to include ZFS recording rules in install and upgrade targets:
    +terminaltrove.com/flamelens/

    -
    -install:
    -    kubectl apply -f persistent-volumes.yaml
    -    kubectl create secret generic additional-scrape-configs --from-file=additional-scrape-configs.yaml -n monitoring --dry-run=client -o yaml | kubectl apply -f -
    -    helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring -f persistence-values.yaml
    -    kubectl apply -f freebsd-recording-rules.yaml
    -    kubectl apply -f openbsd-recording-rules.yaml
    -    kubectl apply -f zfs-recording-rules.yaml
    -    just -f grafana-ingress/Justfile install
    -
    +

    You can now run Ansible Playbooks and shell ...



    -

    Verifying ZFS Metrics in Prometheus


    +You can now run Ansible Playbooks and shell scripts from your Terraform more easily #ansible #terraform #iac

    -Check that ZFS metrics are being collected:
    +danielmschmidt.de/posts/2025-09-26-terraform-actions-introduction/

    -
    -kubectl exec -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 -c prometheus -- \
    -  wget -qO- 'http://localhost:9090/api/v1/query?query=node_zfs_arcstats_size_bytes'
    -
    +

    For people working with #k8s, this tool is ...



    -Check recording rules are calculating correctly:
    +For people working with #k8s, this tool is useful. It lets you fuzzy find different k8s resource types and read a description about them: #kubernetes #fuzzy #cli #tools #devops

    -
    -kubectl exec -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 -c prometheus -- \
    -  wget -qO- 'http://localhost:9090/api/v1/query?query=node_zfs_arc_memory_usage_percent'
    -
    +github.com/keisku/kubectl-explore

    -Example output shows memory usage percentage for each FreeBSD server:
    +

    November 2025



    -
    -"result":[
    -  {"metric":{"instance":"192.168.2.130:9100","os":"freebsd"},"value":[...,"37.58"]},
    -  {"metric":{"instance":"192.168.2.131:9100","os":"freebsd"},"value":[...,"12.85"]},
    -  {"metric":{"instance":"192.168.2.132:9100","os":"freebsd"},"value":[...,"13.44"]}
    -]
    -
    +

    Yes, using the right #tool for the job and ...



    -

    Key Metrics to Monitor


    +Yes, using the right #tool for the job and also learn along the way!

    -
      -
    • ARC Hit Rate: Should typically be above 90% for optimal performance. Lower hit rates indicate the ARC cache is too small or workload has poor locality.
    • -
    • ARC Memory Usage: Shows how much of the maximum ARC size is being used. If consistently at or near maximum, the ARC is effectively utilizing available memory.
    • -
    • Data vs Metadata: Typically data should dominate, but workloads with many small files will show higher metadata percentages.
    • -
    • MRU vs MFU: Most Recently Used vs Most Frequently Used cache. The ratio depends on workload characteristics.
    • -
    • Pool Capacity: Monitor pool usage to ensure adequate free space. ZFS performance degrades when pools exceed 80% capacity.
    • -
    • Pool Health: Should always show ONLINE (green). DEGRADED (yellow) indicates a disk issue requiring attention. FAULTED (red) requires immediate action.
    • -
    • Dataset Usage: Track which datasets are consuming the most space to identify growth trends and plan capacity.
    • -

    -

    ZFS Pool and Dataset Metrics via Textfile Collector


    +drewdevault.com/2016/09/17/Use-the-right-tool.html

    -To complement the ARC statistics from node_exporter's built-in ZFS collector, I added pool capacity and dataset metrics using the textfile collector feature.
    +

    Some neat Go tricks: #golang ...



    -Created a script at /usr/local/bin/zfs_pool_metrics.sh on each FreeBSD server:
    +Some neat Go tricks: #golang

    -
    -#!/bin/sh
    -# ZFS Pool and Dataset Metrics Collector for Prometheus
    -
    -OUTPUT_FILE="/var/tmp/node_exporter/zfs_pools.prom.$$"
    -FINAL_FILE="/var/tmp/node_exporter/zfs_pools.prom"
    -
    -mkdir -p /var/tmp/node_exporter
    -
    -{
    -    # Pool metrics
    -    echo "# HELP zfs_pool_size_bytes Total size of ZFS pool"
    -    echo "# TYPE zfs_pool_size_bytes gauge"
    -    echo "# HELP zfs_pool_allocated_bytes Allocated space in ZFS pool"
    -    echo "# TYPE zfs_pool_allocated_bytes gauge"
    -    echo "# HELP zfs_pool_free_bytes Free space in ZFS pool"
    -    echo "# TYPE zfs_pool_free_bytes gauge"
    -    echo "# HELP zfs_pool_capacity_percent Capacity percentage"
    -    echo "# TYPE zfs_pool_capacity_percent gauge"
    -    echo "# HELP zfs_pool_health Pool health (0=ONLINE, 1=DEGRADED, 2=FAULTED)"
    -    echo "# TYPE zfs_pool_health gauge"
    -
    -    zpool list -Hp -o name,size,allocated,free,capacity,health | \
    -    while IFS=$'\t' read name size alloc free cap health; do
    -        case "$health" in
    -            ONLINE)   health_val=0 ;;
    -            DEGRADED) health_val=1 ;;
    -            FAULTED)  health_val=2 ;;
    -            *)        health_val=6 ;;
    -        esac
    -        cap_num=$(echo "$cap" | sed 's/%//')
    -
    -        echo "zfs_pool_size_bytes{pool=\"$name\"} $size"
    -        echo "zfs_pool_allocated_bytes{pool=\"$name\"} $alloc"
    -        echo "zfs_pool_free_bytes{pool=\"$name\"} $free"
    -        echo "zfs_pool_capacity_percent{pool=\"$name\"} $cap_num"
    -        echo "zfs_pool_health{pool=\"$name\"} $health_val"
    -    done
    -
    -    # Dataset metrics
    -    echo "# HELP zfs_dataset_used_bytes Used space in dataset"
    -    echo "# TYPE zfs_dataset_used_bytes gauge"
    -    echo "# HELP zfs_dataset_available_bytes Available space"
    -    echo "# TYPE zfs_dataset_available_bytes gauge"
    -    echo "# HELP zfs_dataset_referenced_bytes Referenced space"
    -    echo "# TYPE zfs_dataset_referenced_bytes gauge"
    -
    -    zfs list -Hp -t filesystem -o name,used,available,referenced | \
    -    while IFS=$'\t' read name used avail ref; do
    -        pool=$(echo "$name" | cut -d/ -f1)
    -        echo "zfs_dataset_used_bytes{pool=\"$pool\",dataset=\"$name\"} $used"
    -        echo "zfs_dataset_available_bytes{pool=\"$pool\",dataset=\"$name\"} $avail"
    -        echo "zfs_dataset_referenced_bytes{pool=\"$pool\",dataset=\"$name\"} $ref"
    -    done
    -} > "$OUTPUT_FILE"
    -
    -mv "$OUTPUT_FILE" "$FINAL_FILE"
    -
    +harrisoncramer.me/15-go-sublteties-you-may-not-already-know/

    -Deployed to all FreeBSD servers:
    +

    There are some truths in this #SRE article: ...



    -
    -for host in f0 f1 f2; do
    -    scp /tmp/zfs_pool_metrics.sh paul@$host:/tmp/
    -    ssh paul@$host 'doas mv /tmp/zfs_pool_metrics.sh /usr/local/bin/ && \
    -                    doas chmod +x /usr/local/bin/zfs_pool_metrics.sh'
    -done
    -
    +There are some truths in this #SRE article: However, in my opinion, the more experience you have, the more you are expected to be able to resolve issues. So you can't always fallback to others. New starters are treated differently, of course. #oncall

    -Set up cron jobs to run every minute:
    +ntietz.com/blog/what-i-tell-people-new-to-oncall/.

    -
    -for host in f0 f1 f2; do
    -    ssh paul@$host 'echo "* * * * * /usr/local/bin/zfs_pool_metrics.sh >/dev/null 2>&1" | \
    -                    doas crontab -'
    -done
    -
    +

    The Go flight recorder is a tool that allows ...



    -The textfile collector (already configured with --collector.textfile.directory=/var/tmp/node_exporter) automatically picks up the metrics.
    +The Go flight recorder is a tool that allows developers to capture and analyze the execution of Go programs. It provides insights into performance, memory usage, and other runtime characteristics by recording events and metrics during the program's execution. Yet another tool why Go is awesome! #go #golang #tools

    -Verify metrics are being exposed:
    +go.dev/blog/flight-recorder

    -
    -paul@f0:~ % curl -s http://localhost:9100/metrics | grep "^zfs_pool" | head -5
    -zfs_pool_allocated_bytes{pool="zdata"} 6.47622733824e+11
    -zfs_pool_allocated_bytes{pool="zroot"} 5.3338578944e+10
    -zfs_pool_capacity_percent{pool="zdata"} 64
    -zfs_pool_capacity_percent{pool="zroot"} 10
    -zfs_pool_free_bytes{pool="zdata"} 3.48809678848e+11
    -
    +

    This is useful #golang ...



    -All ZFS-related configuration files are available on Codeberg:
    +This is useful #golang

    -zfs-recording-rules.yaml on Codeberg
    -zfs-dashboards.yaml on Codeberg
    +antonz.org/chans/

    -

    Monitoring external OpenBSD hosts


    +

    Great visually animated guide how #raft ...



    -The same approach works for OpenBSD hosts. I have two OpenBSD edge relay servers (blowfish, fishfinger) that handle TLS termination and forward traffic through WireGuard to the cluster. These can also be monitored with Node Exporter.
    +Great visually animated guide how #raft #consensus works

    -

    Installing Node Exporter on OpenBSD


    +thesecretlivesofdata.com/raft/

    -On each OpenBSD host, install the node_exporter package:
    +

    "Today’s junior devs who skip the “hard ...



    - -
    blowfish:~ $ doas pkg_add node_exporter
    -quirks-7.103 signed on 2025-10-13T22:55:16Z
    -The following new rcscripts were installed: /etc/rc.d/node_exporter
    -See rcctl(8) for details.
    -
    +"Today’s junior devs who skip the “hard way” may plateau early, lacking the depth to grow into senior engineers tomorrow." ... Avoiding Skill Atrophy in the Age of AI

    -Enable the service to start at boot:
    +addyo.substack.com/p/avoiding-skill-atrophy-in-the-age

    - -
    blowfish:~ $ doas rcctl enable node_exporter
    -
    +

    I actually enjoyed readong through the #Fish ...



    -Configure node_exporter to listen on the WireGuard interface. This ensures metrics are only accessible through the secure tunnel, not the public network. Replace the IP with the host's WireGuard address:
    +I actually enjoyed readong through the #Fish #shell docs It's much cleaner than posix shells

    - -
    blowfish:~ $ doas rcctl set node_exporter flags '--web.listen-address=192.168.2.110:9100'
    -
    +fishshell.com/docs/current/language.html

    -Start the service:
    +

    There can be many things which can go wrong, ...



    - -
    blowfish:~ $ doas rcctl start node_exporter
    -node_exporter(ok)
    -
    +There can be many things which can go wrong, more than mentioned here: #linux

    -Verify it's running:
    +notes.eatonphil.com/2025-03-27-things-that-go-wrong-with-disk-io.html

    - -
    blowfish:~ $ curl -s http://192.168.2.110:9100/metrics | head -3
    -# HELP go_gc_duration_seconds A summary of the wall-time pause...
    -# TYPE go_gc_duration_seconds summary
    -go_gc_duration_seconds{quantile="0"} 0
    -
    +

    IMHO, motivation is not always enough. There ...



    -Repeat for the other OpenBSD host (fishfinger) with its respective WireGuard IP (192.168.2.111).
    +IMHO, motivation is not always enough. There must also be some discipline. That helps then theres only a little or no motivation

    -

    Adding OpenBSD hosts to Prometheus


    +world.hey.com/jason/motivation-50ab8280

    -Update additional-scrape-configs.yaml to include the OpenBSD targets:
    +

    Have been generating those CPU flame graphs on ...



    -
    -- job_name: 'node-exporter'
    -  static_configs:
    -    - targets:
    -      - '192.168.2.130:9100'  # f0 via WireGuard
    -      - '192.168.2.131:9100'  # f1 via WireGuard
    -      - '192.168.2.132:9100'  # f2 via WireGuard
    -      labels:
    -        os: freebsd
    -    - targets:
    -      - '192.168.2.110:9100'  # blowfish via WireGuard
    -      - '192.168.2.111:9100'  # fishfinger via WireGuard
    -      labels:
    -        os: openbsd
    -
    +Have been generating those CPU flame graphs on bare metal, so being able to use them in k8s seems to be pretty useful to me. #flamegraphs #k8s #kubernetes

    -The os: openbsd label allows filtering these hosts separately from FreeBSD and Linux nodes.
    +www.percona.com/blog/kubernetes-observability-code-profiling-with-flame-graphs/

    -

    OpenBSD memory metrics compatibility


    +

    I personally don't like the typical whiteboard ...



    -OpenBSD uses the same memory metric names as FreeBSD (node_memory_size_bytes, node_memory_free_bytes, etc.), so a similar PrometheusRule is needed to generate Linux-compatible metrics:
    +I personally don't like the typical whiteboard coding exercises, nor do I think LeetCode is the answer. It's impossible to assess the skills of a candidate with a few interviews but it is possible to filter out the bad ones. The aim is to get an idea about the candidate and be positive about their potential. #interview #interviewing #hiring

    -
    -apiVersion: monitoring.coreos.com/v1
    -kind: PrometheusRule
    -metadata:
    -  name: openbsd-memory-rules
    -  namespace: monitoring
    -  labels:
    -    release: prometheus
    -spec:
    -  groups:
    -    - name: openbsd-memory
    -      rules:
    -        - record: node_memory_MemTotal_bytes
    -          expr: node_memory_size_bytes{os="openbsd"}
    -          labels:
    -            os: openbsd
    -        - record: node_memory_MemAvailable_bytes
    -          expr: |
    -            node_memory_free_bytes{os="openbsd"}
    -              + node_memory_inactive_bytes{os="openbsd"}
    -              + node_memory_cache_bytes{os="openbsd"}
    -          labels:
    -            os: openbsd
    -        - record: node_memory_MemFree_bytes
    -          expr: node_memory_free_bytes{os="openbsd"}
    -          labels:
    -            os: openbsd
    -        - record: node_memory_Cached_bytes
    -          expr: node_memory_cache_bytes{os="openbsd"}
    -          labels:
    -            os: openbsd
    -
    +danielabaron.me/blog/reimagining-technical-interviews/

    -This file is saved as openbsd-recording-rules.yaml and applied alongside the FreeBSD rules. Note that OpenBSD doesn't expose a buffer memory metric, so that rule is omitted.
    +

    If you've wondered how CPUs and operating ...



    -openbsd-recording-rules.yaml on Codeberg
    +If you've wondered how CPUs and operating systems generally work and want the basics explained in an easily digestible format without going to college, have a look at CPU.land. I had a lot of fun reading it! #CPU

    -After running just upgrade, the OpenBSD hosts appear in Prometheus targets and the Node Exporter dashboards.
    +cpu.land

    -

    Distributed Tracing with Grafana Tempo


    +

    And there's an unexpected winner :-) #erlang ...



    -After implementing logs (Loki) and metrics (Prometheus), the final pillar of observability is distributed tracing. Grafana Tempo provides distributed tracing capabilities that help understand request flows across microservices.
    +And there's an unexpected winner :-) #erlang #architecture

    -For a preview of what distributed tracing with Tempo looks like in Grafana, see the X-RAG blog post:
    +freedium.cfd/https://medium.com/@codep..-..t-wasn-t-what-we-expected-67f84c79dc34

    -X-RAG Observability Hackathon
    +

    Is it it? This is it. What Is It (in Ruby 3.4)? ...



    -

    Why Distributed Tracing?


    +Is it it? This is it. What Is It (in Ruby 3.4)? #ruby

    -In a microservices architecture, a single user request may traverse multiple services. Distributed tracing:
    +kevinjmurphy.com/posts/what-is-it-in-ruby-34/

    -
      -
    • Tracks requests across service boundaries
    • -
    • Identifies performance bottlenecks
    • -
    • Visualizes service dependencies
    • -
    • Correlates with logs and metrics
    • -
    • Helps debug complex distributed systems
    • -

    -

    Deploying Grafana Tempo


    +

    From my recent #London trip, I've uploaded ...



    -Tempo is deployed in monolithic mode, following the same pattern as Loki's SingleBinary deployment.
    +From my recent #London trip, I've uploaded some new Street Photography photos to my photo site All photos were post-processed using Open-Source software including #Darktable and #Shotwell. The site itself was generated with a simple #bash script! Not all photos are from London, just the recent additions were.

    -#### Configuration Strategy
    +irregular.ninja!

    -**Deployment Mode:** Monolithic (all components in one process)
    -
      -
    • Simpler operation than microservices mode
    • -
    • Suitable for the cluster scale
    • -
    • Consistent with Loki deployment pattern
    • -

    -**Storage:** Filesystem backend using hostPath
    -
      -
    • 10Gi storage at /data/nfs/k3svolumes/tempo/data
    • -
    • 7-day retention (168h)
    • -
    • Local storage is the only option for monolithic mode
    • -

    -**OTLP Receivers:** Standard OpenTelemetry Protocol ports
    -
      -
    • gRPC: 4317
    • -
    • HTTP: 4318
    • -
    • Bind to 0.0.0.0 to avoid Tempo 2.7+ localhost-only binding issue
    • -

    -#### Tempo Deployment Files
    +

    Agreed, you should make your own programming ...



    -Created in /home/paul/git/conf/f3s/tempo/:
    +Agreed, you should make your own programming language, even if it's only for the sake of learning. I also did so over a decade ago. Mine was called Fype - "For Your Program Execution"

    -**values.yaml** - Helm chart configuration:
    +ntietz.com/blog/you-should-make-a-new-terrible-programming-language/
    +foo.zone/gemfeed/2010-05-09-the-fype-programming-language.gmi (Gemini)
    +foo.zone/gemfeed/2010-05-09-the-fype-programming-language.html

    -
    -tempo:
    -  retention: 168h
    -  storage:
    -    trace:
    -      backend: local
    -      local:
    -        path: /var/tempo/traces
    -      wal:
    -        path: /var/tempo/wal
    -  receivers:
    -    otlp:
    -      protocols:
    -        grpc:
    -          endpoint: 0.0.0.0:4317
    -        http:
    -          endpoint: 0.0.0.0:4318
    -
    -persistence:
    -  enabled: true
    -  size: 10Gi
    -  storageClassName: ""
    -
    -resources:
    -  limits:
    -    cpu: 1000m
    -    memory: 2Gi
    -  requests:
    -    cpu: 500m
    -    memory: 1Gi
    -
    +

    Principles for C programming #C ...



    -**persistent-volumes.yaml** - Storage configuration:
    +Principles for C programming #C #programming

    -
    -apiVersion: v1
    -kind: PersistentVolume
    -metadata:
    -  name: tempo-data-pv
    -spec:
    -  capacity:
    -    storage: 10Gi
    -  accessModes:
    -    - ReadWriteOnce
    -  persistentVolumeReclaimPolicy: Retain
    -  hostPath:
    -    path: /data/nfs/k3svolumes/tempo/data
    ----
    -apiVersion: v1
    -kind: PersistentVolumeClaim
    -metadata:
    -  name: tempo-data-pvc
    -  namespace: monitoring
    -spec:
    -  storageClassName: ""
    -  accessModes:
    -    - ReadWriteOnce
    -  resources:
    -    requests:
    -      storage: 10Gi
    -
    +drewdevault.com/2017/03/15/How-I-learned-to-stop-worrying-and-love-C.html

    -**Grafana Datasource Provisioning**
    +

    #Typst appears to be a great modern ...



    -All Grafana datasources (Prometheus, Alertmanager, Loki, Tempo) are provisioned via a unified ConfigMap that is directly mounted to the Grafana pod. This approach ensures datasources are loaded on startup without requiring sidecar-based discovery.
    +#Typst appears to be a great modern alternative to #LaTeX

    -In /home/paul/git/conf/f3s/prometheus/grafana-datasources-all.yaml:
    +

    Things you can do with a debugger but not with ...



    -
    -apiVersion: v1
    -kind: ConfigMap
    -metadata:
    -  name: grafana-datasources-all
    -  namespace: monitoring
    -data:
    -  datasources.yaml: |
    -    apiVersion: 1
    -    datasources:
    -      - name: Prometheus
    -        type: prometheus
    -        uid: prometheus
    -        url: http://prometheus-kube-prometheus-prometheus.monitoring:9090/
    -        access: proxy
    -        isDefault: true
    -      - name: Alertmanager
    -        type: alertmanager
    -        uid: alertmanager
    -        url: http://prometheus-kube-prometheus-alertmanager.monitoring:9093/
    -      - name: Loki
    -        type: loki
    -        uid: loki
    -        url: http://loki.monitoring.svc.cluster.local:3100
    -      - name: Tempo
    -        type: tempo
    -        uid: tempo
    -        url: http://tempo.monitoring.svc.cluster.local:3200
    -        jsonData:
    -          tracesToLogsV2:
    -            datasourceUid: loki
    -            spanStartTimeShift: -1h
    -            spanEndTimeShift: 1h
    -          tracesToMetrics:
    -            datasourceUid: prometheus
    -          serviceMap:
    -            datasourceUid: prometheus
    -          nodeGraph:
    -            enabled: true
    -
    +Things you can do with a debugger but not with print debugging #debugger #debugging #coding #programming

    -The kube-prometheus-stack Helm values (persistence-values.yaml) are configured to:
    -
      -
    • Disable sidecar-based datasource provisioning
    • -
    • Mount grafana-datasources-all ConfigMap directly to /etc/grafana/provisioning/datasources/
    • -

    -This direct mounting approach is simpler and more reliable than sidecar-based discovery.
    +mahesh-hegde.github.io/posts/what_debugger_can/

    -#### Installation
    +

    Neat tutorial, I think I've to try #jujutsu ...



    -
    -cd /home/paul/git/conf/f3s/tempo
    -just install
    -
    +Neat tutorial, I think I've to try #jujutsu out now! #git #vcs #jujutsu #jj

    -Verify Tempo is running:
    +www.stavros.io/posts/switch-to-jujutsu-already-a-tutorial/

    -
    -kubectl get pods -n monitoring -l app.kubernetes.io/name=tempo
    -kubectl exec -n monitoring <tempo-pod> -- wget -qO- http://localhost:3200/ready
    -
    +

    Wise words Best practices are not rules. They ...



    -

    Configuring Grafana Alloy for Trace Collection


    +Wise words Best practices are not rules. They are guidelines that help you make better decisions. They are not absolute truths, but rather suggestions based on experience and common sense. You should always use your own judgment and adapt them to your specific situation.

    -Updated /home/paul/git/conf/f3s/loki/alloy-values.yaml to add OTLP receivers for traces while maintaining existing log collection.
    +www.arp242.net/best-practices.html

    -#### OTLP Receiver Configuration
    +

    How to build a #Linux #Container from ...



    -Added to Alloy configuration after the log collection pipeline:
    +How to build a #Linux #Container from scratch without #Docker, #Podman, etc. #Linux #container from scratch

    -
    -// OTLP receiver for traces via gRPC and HTTP
    -otelcol.receiver.otlp "default" {
    -  grpc {
    -    endpoint = "0.0.0.0:4317"
    -  }
    -  http {
    -    endpoint = "0.0.0.0:4318"
    -  }
    -  output {
    -    traces = [otelcol.processor.batch.default.input]
    -  }
    -}
    -
    -// Batch processor for efficient trace forwarding
    -otelcol.processor.batch "default" {
    -  timeout = "5s"
    -  send_batch_size = 100
    -  send_batch_max_size = 200
    -  output {
    -    traces = [otelcol.exporter.otlp.tempo.input]
    -  }
    -}
    -
    -// OTLP exporter to send traces to Tempo
    -otelcol.exporter.otlp "tempo" {
    -  client {
    -    endpoint = "tempo.monitoring.svc.cluster.local:4317"
    -    tls {
    -      insecure = true
    -    }
    -    compression = "gzip"
    -  }
    -}
    -
    +michalpitr.substack.com/p/linux-contai..-..rom-scratch?r=gt6tv&triedRedirect=true

    -The batch processor reduces network overhead by accumulating spans before forwarding to Tempo.
    +

    When I reach the point where I am trying to ...



    -#### Upgrade Alloy
    +When I reach the point where I am trying to recover from panics in Go, something else has already gone wrong with the design of the codebase, IMHO. However, I must admit that my viewpoint may be flawed, as I code small, self-contained tools and rely on as few dependencies as possible. So I rarely rely on 3rd party libs, which may panic (which wouldn’t be nice to begin with; it would be better if they returned errors). #golang

    -
    -cd /home/paul/git/conf/f3s/loki
    -just upgrade
    -
    +blog.devtrovert.com/p/go-panic-and-recover-dont-make-these

    -Verify OTLP receivers are listening:
    +

    Personally one of the main benefits of using ...



    -
    -kubectl logs -n monitoring -l app.kubernetes.io/name=alloy | grep -i "otlp.*receiver"
    -kubectl exec -n monitoring <alloy-pod> -- netstat -ln | grep -E ':(4317|4318)'
    -
    +Personally one of the main benefits of using #tmux over other solutions is, that I can use the same setup on my personal devices (Linux and BSD) and for work (#macOS): you might not need tmux

    -

    Demo Tracing Application


    +bower.sh/you-might-not-need-tmux

    -Created a three-tier Python application to demonstrate distributed tracing in action.
    +

    December 2025



    -#### Application Architecture
    +

    Rhese are some nice #Ruby tricks (Ruby is onw ...



    -
    -User → Frontend (Flask:5000) → Middleware (Flask:5001) → Backend (Flask:5002)
    -           ↓                          ↓                        ↓
    -                    Alloy (OTLP:4317) → Tempo → Grafana
    -
    +Rhese are some nice #Ruby tricks (Ruby is onw of my favourite languages) 11 Ruby Tricks You Haven’t Seen Before via @wallabagapp

    -Frontend Service:
    +www.rubyguides.com/2016/01/ruby-tricks/

    -
      -
    • Receives HTTP requests at /api/process
    • -
    • Forwards to middleware service
    • -
    • Creates parent span for the entire request
    • -

    -Middleware Service:
    +

    That's fun, use the C preprocessor as a HTML ...



    -
      -
    • Transforms data at /api/transform
    • -
    • Calls backend service
    • -
    • Creates child span linked to frontend
    • -

    -Backend Service:
    +That's fun, use the C preprocessor as a HTML template engine! #c #cpp #fun

    -
      -
    • Returns data at /api/data
    • -
    • Simulates database query (100ms sleep)
    • -
    • Creates leaf span in the trace
    • -

    -OpenTelemetry Instrumentation:
    +wheybags.com/blog/macroblog.html

    -All services use Python OpenTelemetry libraries:
    +

    #jq but for #Markdown? Thats interesting, ...



    -**Dependencies:**
    -
    -flask==3.0.0
    -requests==2.31.0
    -opentelemetry-distro==0.49b0
    -opentelemetry-exporter-otlp==1.28.0
    -opentelemetry-instrumentation-flask==0.49b0
    -opentelemetry-instrumentation-requests==0.49b0
    -
    +#jq but for #Markdown? Thats interesting, never thought of that. mdq: jq for Markdown via @wallabagapp

    -**Auto-instrumentation pattern** (used in all services):
    +github.com/yshavit/mdq

    - -
    from opentelemetry import trace
    -from opentelemetry.sdk.trace import TracerProvider
    -from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    -from opentelemetry.instrumentation.flask import FlaskInstrumentor
    -from opentelemetry.instrumentation.requests import RequestsInstrumentor
    -from opentelemetry.sdk.resources import Resource
    -
    -# Define service identity
    -resource = Resource(attributes={
    -    "service.name": "frontend",
    -    "service.namespace": "tracing-demo",
    -    "service.version": "1.0.0"
    -})
    -
    -provider = TracerProvider(resource=resource)
    -
    -# Export to Alloy
    -otlp_exporter = OTLPSpanExporter(
    -    endpoint="http://alloy.monitoring.svc.cluster.local:4317",
    -    insecure=True
    -)
    -
    -processor = BatchSpanProcessor(otlp_exporter)
    -provider.add_span_processor(processor)
    -trace.set_tracer_provider(provider)
    -
    -# Auto-instrument Flask and requests
    -FlaskInstrumentor().instrument_app(app)
    -RequestsInstrumentor().instrument()
    -
    +

    Elvish seems to be a neat little shell. It's ...



    -The auto-instrumentation automatically:
    -
      -
    • Creates spans for HTTP requests
    • -
    • Propagates trace context via W3C Trace Context headers
    • -
    • Links parent and child spans across service boundaries
    • -

    -Deployment:
    +Elvish seems to be a neat little shell. It's implemented in #Golang and can make use of the great Go standard library. The language is more modern than other shells out there (e.g., supporting nested data structures) and eliminates backward compatibility issues (e.g., awkward string parsing with spaces that often causes problems in traditional shells). Elvish also comes with some neat interactive TUI elements. Furthermore, there will be a whole TUI framework built directly into the shell. If I weren't so deeply intertwined with #bash and #zsh, I would personally give #Elvish a try... Interesting, at least, it is.

    -Created Helm chart in /home/paul/git/conf/f3s/tracing-demo/ with three separate deployments, services, and an ingress.
    +elv.sh/

    -Build and deploy:
    +

    Google #SRE required better Wifi on the ...



    -
    -cd /home/paul/git/conf/f3s/tracing-demo
    -just build
    -just import
    -just install
    -
    +Google #SRE required better Wifi on the toilet, otherwise YouTube could go down :-)

    -Verify deployment:
    +podcasts.apple.com/us/podcast/incident..-..ai-stacey/id1615778073?i=1000672365156

    -
    -kubectl get pods -n services | grep tracing-demo
    -kubectl get ingress -n services tracing-demo-ingress
    -
    +

    Indeed ...



    -Access the application at:
    +Indeed

    -http://tracing-demo.f3s.buetow.org
    +aaronfrancis.com/2024/because-i-wanted-to-12c5137c

    -

    Visualizing Traces in Grafana


    +

    Very interesting post how pods are scheduled ...



    -The Tempo datasource is automatically discovered by Grafana through the ConfigMap label.
    +Very interesting post how pods are scheduled and terminated with some tips how to improve reliability (pods may be terminated before ingress rules are updated and some traffic may hits non existing pods) #k8s #kubernetes

    -#### Accessing Traces
    +learnk8s.io/graceful-shutdown

    -Navigate to Grafana → Explore → Select "Tempo" datasource
    +

    I have added observability to the #Kubernetes ...



    -**Search Interface:**
    -
      -
    • Search by Trace ID
    • -
    • Search by service name
    • -
    • Search by tags
    • -

    -**TraceQL Queries:**
    +I have added observability to the #Kubernetes cluster in the eighth part of my #Kubernetes on #FreeBSD series. #Grafana #Loki #Prometheus #Alloy #k3s #OpenBSD #RockyLinux

    -Find all traces from demo app:
    -
    -{ resource.service.namespace = "tracing-demo" }
    -
    +foo.zone/gemfeed/2025-12-07-f3s-kubernetes-with-freebsd-part-8.gmi (Gemini)
    +foo.zone/gemfeed/2025-12-07-f3s-kubernetes-with-freebsd-part-8.html

    -Find slow requests (>200ms):
    -
    -{ duration > 200ms }
    -
    +

    Wondering where I could make use of it ...



    -Find traces from specific service:
    -
    -{ resource.service.name = "frontend" }
    -
    +Wondering where I could make use of it blog/2025/12/an-svg-is-all-you-need.mld #SVG

    -Find errors:
    -
    -{ status = error }
    -
    +jon.recoil.org/blog/2025/12/an-svg-is-all-you-need.html

    -Complex query - frontend traces calling middleware:
    -
    -{ resource.service.namespace = "tracing-demo" } && { span.http.status_code >= 500 }
    -
    +

    Trying out #COSMIC #Desktop... seems ...



    -#### Service Graph Visualization
    +Trying out #COSMIC #Desktop... seems snappier than #GNOME and I like the tiling features...

    -The service graph shows visual connections between services:
    +

    Best thing I've ever read about #container ...



    -1. Navigate to Explore → Tempo
    -2. Enable "Service Graph" view
    -3. Shows: Frontend → Middleware → Backend with request rates
    +Best thing I've ever read about #container #security in #kubernetes:

    -The service graph uses Prometheus metrics generated from trace data.
    +learnkube.com/security-contexts

    -

    Correlation Between Observability Signals


    +

    While acknowledging luck in finding the right ...



    -Tempo integrates with Loki and Prometheus to provide unified observability.
    +While acknowledging luck in finding the right team and company culture, the author stresses that staying and choosing long-term ownership is a deliberate choice for those valuing deep technical ownership over external validation: Why I Ignore The Spotlight as a Staff Engineer #engineering

    -#### Traces-to-Logs
    +lalitm.com/software-engineering-outside-the-spotlight/

    -Click on any span in a trace to see related logs:
    +

    Great explanation #slo #sla #sli #sre ...



    -1. View trace in Grafana
    -2. Click on a span
    -3. Select "Logs for this span"
    -4. Loki shows logs filtered by:
    - * Time range (span duration ± 1 hour)
    - * Service name
    - * Namespace
    - * Pod
    +Great explanation #slo #sla #sli #sre

    -This helps correlate what the service was doing when the span was created.
    +blog.alexewerlof.com/p/sla-vs-slo

    -#### Traces-to-Metrics
    +

    Nice service, you send a drive, they host ...



    -View Prometheus metrics for services in the trace:
    +Nice service, you send a drive, they host #ZFS for you!

    -1. View trace in Grafana
    -2. Select "Metrics" tab
    -3. Shows metrics like:
    - * Request rate
    - * Error rate
    - * Duration percentiles
    +zfs.rent/

    -#### Logs-to-Traces
    +Other related posts:

    -From logs, you can jump to related traces:
    +2025-01-01 Posts from October to December 2024
    +2025-07-01 Posts from January to June 2025
    +2026-01-01 Posts from July to December 2025 (You are currently reading this)

    -1. In Loki, logs that contain trace IDs are automatically linked
    -2. Click the trace ID to view the full trace
    -3. See the complete request flow
    +E-Mail your comments to paul@nospam.buetow.org :-)

    -

    Generating Traces for Testing


    +Back to the main site
    +
    +
    +
    + + Cloudless Kobo Forma with KOReader + + gemini://foo.zone/gemfeed/2026-01-01-cloudless-kobo-forma-with-koreader.gmi + 2025-12-31T16:08:33+02:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + I am an reader, and for years I've been searching for a good digital e-reader to complement my paper books. I advocate for privacy-first and prefer open-source or self-hosted solutions. If that is not possible, I opt for offline solutions. Even if I don't have anything to hide, the tinkerer in me wants those things anyway. I found my ideal device in the Kobo Forma 7 years ago. Now, I use it without Kobo's cloud sync, and in this post, I'll show you how. + +
    +

    Cloudless Kobo Forma with KOReader



    -Test the demo application:
    +Published at 2025-12-31T16:08:33+02:00
    +
    +I am an reader, and for years I've been searching for a good digital e-reader to complement my paper books. I advocate for privacy-first and prefer open-source or self-hosted solutions. If that is not possible, I opt for offline solutions. Even if I don't have anything to hide, the tinkerer in me wants those things anyway. I found my ideal device in the Kobo Forma 7 years ago. Now, I use it without Kobo's cloud sync, and in this post, I'll show you how.

    -curl http://tracing-demo.f3s.buetow.org/api/process
    +Art by Donovan Bake
    +
    +      __...--~~~~~-._   _.-~~~~~--...__
    +    //               `V'               \\ 
    +   //                 |                 \\ 
    +  //__...--~~~~~~-._  |  _.-~~~~~~--...__\\ 
    + //__.....----~~~~._\ | /_.~~~~----.....__\\
    +====================\\|//====================
    +                dwb `---`
     

    -Load test (generates 50 traces):
    +

    Table of Contents



    -
    -cd /home/paul/git/conf/f3s/tracing-demo
    -just load-test
    -
    +

    -Each request creates a distributed trace spanning all three services.
    +I initially bought the Kobo Forma because I wanted a device with a large screen for reading PDFs and ePubs. However, as time went on, I became more concerned about the privacy implications of having all my reading data synced to the Kobo cloud. So, I looked into alternative ways to use this device.

    -

    Verifying the Complete Pipeline


    +KOReader running on Kobo Forma

    -Check the trace flow end-to-end:
    +The Kobo Forma is so old that it can't be purchased from Kobo directly anymore. But I love the form factor; it's much lighter than the Kobo Sage and still has a 7" screen. It's just that the stock firmware is becoming too slow and sluggish.

    -**1. Application generates traces:**
    -
    -kubectl logs -n services -l app=tracing-demo-frontend | grep -i trace
    -
    +Kobo Forma

    -**2. Alloy receives traces:**
    -
    -kubectl logs -n monitoring -l app.kubernetes.io/name=alloy | grep -i otlp
    -
    +Note: Some of the screenshots in this post are taken from my Kobo Clara HD, which is another Kobo eReader I have. It's smaller and better for travel, and I use the same KOReader setup on both devices.

    -**3. Tempo stores traces:**
    -
    -kubectl logs -n monitoring -l app.kubernetes.io/name=tempo | grep -i trace
    -
    +

    KOReader to the Rescue



    -**4. Grafana displays traces:**
    -Navigate to Explore → Tempo → Search for traces
    +I keep my Kobo Forma disconnected from the cloud entirely, and KOReader makes that possible. KOReader is a versatile, open-source document and image viewer which can also be installed on some E Ink reader devices like the Kobo Forma. No cloud sync, no tracking, just reading.

    -

    Practical Example: Viewing a Distributed Trace


    +KOReader

    -Let's generate a trace and examine it in Grafana.
    +By not syncing my reading progress and library to Kobo's cloud service, I retain full ownership and control over my data. There's no risk of my personal reading habits being accessed or mined by third parties.

    -**1. Generate a trace by calling the demo application:**
    +

    Installation



    -
    -curl -H "Host: tracing-demo.f3s.buetow.org" http://r0/api/process
    -
    +Installing KOReader is straightforward. You can follow the official guide for that. I used the Linux one:

    -**Response (HTTP 200):**
    +https://github.com/koreader/koreader/wiki/Installation-on-desktop-linux

    - -
    {
    -  "middleware_response": {
    -    "backend_data": {
    -      "data": {
    -        "id": 12345,
    -        "query_time_ms": 100.0,
    -        "timestamp": "2025-12-28T18:35:01.064538",
    -        "value": "Sample data from backend service"
    -      },
    -      "service": "backend"
    -    },
    -    "middleware_processed": true,
    -    "original_data": {
    -      "source": "GET request"
    -    },
    -    "transformation_time_ms": 50
    -  },
    -  "request_data": {
    -    "source": "GET request"
    -  },
    -  "service": "frontend",
    -  "status": "success"
    -}
    -
    +Basically, what I had to do is to download a .zip file of the KOReader binary and an install.sh script. Then, I plugged in the Kobo Forma via USB and ran the install script, which did the rest for me.

    -**2. Find the trace in Tempo via API:**
    +KOReader installation via USB

    -After a few seconds (for batch export), search for recent traces:
    +After the initial install, KOReader can update itself through its menus.

    -
    -kubectl exec -n monitoring tempo-0 -- wget -qO- \
    -  'http://localhost:3200/api/search?tags=service.namespace%3Dtracing-demo&limit=5' 2>/dev/null | \
    -  python3 -m json.tool
    -
    +KOReader self-update menu

    -Returns traces including:
    +It is worth noting that after the KOReader install, the Kobo Forma still boots into the proprietary window manager. To start KOReader, you have to select it from the new "Nickel Menu". KOReader will then stay open until you reboot the device. It's a small annoyance, but it's well worth it!

    - -
    {
    -  "traceID": "4be1151c0bdcd5625ac7e02b98d95bd5",
    -  "rootServiceName": "frontend",
    -  "rootTraceName": "GET /api/process",
    -  "durationMs": 221
    -}
    -
    +Nickel Menu

    -**3. Fetch complete trace details:**
    +

    Sideloaded Mode



    -
    -kubectl exec -n monitoring tempo-0 -- wget -qO- \
    -  'http://localhost:3200/api/traces/4be1151c0bdcd5625ac7e02b98d95bd5' 2>/dev/null | \
    -  python3 -m json.tool
    -
    +To use the Kobo Forma completely without a Kobo account, you can enable "Sideloaded Mode". This mode allows you to use the device without being signed in to a Kobo account. When enabled, the home screen will default to your library instead of showing Kobo recommendations, and the sync button will disappear. This prevents the device from trying to sync with the Kobo cloud.

    -**Trace structure (8 spans across 3 services):**
    +To enable it, you need to edit the configuration file. Connect your Kobo device to your computer via USB. Open the file .kobo/Kobo/Kobo eReader.conf and add the following lines:

    -Trace ID: 4be1151c0bdcd5625ac7e02b98d95bd5
    -Services: 3 (frontend, middleware, backend)
    -
    -Service: frontend
    -  └─ GET /api/process                 221.10ms  (HTTP server span)
    -  └─ frontend-process                 216.23ms  (custom business logic span)
    -  └─ POST                             209.97ms  (HTTP client span to middleware)
    -
    -Service: middleware
    -  └─ POST /api/transform              186.02ms  (HTTP server span)
    -  └─ middleware-transform             180.96ms  (custom business logic span)
    -  └─ GET                              127.52ms  (HTTP client span to backend)
    -
    -Service: backend
    -  └─ GET /api/data                    103.93ms  (HTTP server span)
    -  └─ backend-get-data                 102.11ms  (custom business logic span with 100ms sleep)
    +[ApplicationPreferences]
    +SideloadedMode=true
     

    -**4. View the trace in Grafana UI:**
    -
    -Navigate to: Grafana → Explore → Tempo datasource
    +After saving the file, eject the device. You might need to restart it for the changes to take effect.

    -Search using TraceQL:
    -
    -{ resource.service.namespace = "tracing-demo" }
    -
    +KOReader is much faster than the stock firmware; it feels about three times as fast. Before trying out KOReader, I was thinking about selling the Forma as it felt too sluggish. But now there is new life in this 7-year-old device! It also offers a night mode (inverted colors), a feature that the stock firmware on the Forma is lacking.

    -Or directly open the trace by pasting the trace ID in the search box:
    -
    -4be1151c0bdcd5625ac7e02b98d95bd5
    -
    +KOReader dark mode (inverted colors)

    -**5. Trace visualization:**
    +

    My Workflow



    -The trace waterfall view in Grafana shows the complete request flow with timing:
    +My workflow is simple and efficient, relying on a direct USB connection to my Linux laptop for sideloading books and a self-hosted sync server for progress synchronization.

    -Distributed trace visualization in Grafana Tempo showing Frontend → Middleware → Backend spans
    +

    Sideloading Books



    -For additional examples of Tempo trace visualization, see also:
    +I connect my Kobo Forma to my Linux laptop via a USB-C cable. The device is automatically recognized as a storage device, and I can directly access its storage to copy over ePubs, PDFs, and other supported formats.

    -X-RAG Observability Hackathon (more Grafana Tempo screenshots)
    +

    KOReader Sync Server



    -The trace reveals the distributed request flow:
    +To keep my reading progress synchronized across multiple devices (my Kobo, my phone, and my Linux laptop), I run a koreader-sync-server instance in my k3s cluster. This allows me to pick up reading where I left off, no matter which device I'm using.

    -
      -
    • Frontend (221ms): Receives GET /api/process, executes business logic, calls middleware
    • -
    • Middleware (186ms): Receives POST /api/transform, transforms data, calls backend
    • -
    • Backend (104ms): Receives GET /api/data, simulates database query with 100ms sleep
    • -
    • Total request time: 221ms end-to-end
    • -
    • Span propagation: W3C Trace Context headers automatically link all spans
    • -

    -**6. Service graph visualization:**
    +https://codeberg.org/snonux/conf/src/branch/master/f3s/kobo-sync-server

    -The service graph is automatically generated from traces and shows service dependencies. For examples of service graph visualization in Grafana, see the screenshots in the X-RAG Observability Hackathon blog post.
    +Custom sync server configuration

    -X-RAG Observability Hackathon (includes service graph screenshots)
    +To configure the sync server in KOReader, open a document, go to "Settings" -> "Progress Sync", and select "Custom sync server". There you can enter the URL of your server and your credentials. The progress can then also be synced to and from KOReader running on other devices (e.g. a Laptop or a Smartphone!)

    -This visualization helps identify:
    +KOReader sync menu

    -
      -
    • Request rates between services
    • -
    • Average latency for each hop
    • -
    • Error rates (if any)
    • -
    • Service dependencies and communication patterns
    • -

    -

    Storage and Retention


    +

    Exporting Book Notes and Highlights



    -Monitor Tempo storage usage:
    +KOReader allows you to export book notes and highlights directly from the device in various formats, including plain text and Markdown. Unfortunately, these are not automatically synced to the sync server. I have an offline backup procedure where I regularly sync them via USB to my backup server. There's a 3rd party plugin available for KOReader, which seems to be able to do this kind of sync, though.

    -
    -kubectl exec -n monitoring <tempo-pod> -- df -h /var/tempo
    -
    +

    Wallabag Integration



    -With 10Gi storage and 7-day retention, the system handles moderate trace volumes. If storage fills up:
    +KOReader has built-in Wallabag support. This allows me to save articles from the web to my self-hosted Wallabag instance and then read them comfortably on my Kobo.

    -
      -
    • Reduce retention to 72h (3 days)
    • -
    • Implement sampling in Alloy
    • -
    • Increase PV size
    • -

    -

    Configuration Files


    +https://wallabag.org/

    -All configuration files are available on Codeberg:
    +I haven't tried it out yet, though. I may will and will update this blog post here after done so.

    -Tempo configuration
    -Alloy configuration (updated for traces)
    -Demo tracing application
    +

    Purchasing e-books



    -

    Summary


    +If you search a little bit you also find stores which sell digital rights management (DRM) free e-books (in ePub format), for example buecher.de does, they sell german and english books. Before purchasing, just make sure that the book is DRM-free (not all their books are that.)

    -With Prometheus, Grafana, Loki, Alloy, and Tempo deployed, I now have complete visibility into the k3s cluster, the FreeBSD storage servers, and the OpenBSD edge relays:
    +All the books I read you can see here:

    -
      -
    • Metrics: Prometheus collects and stores time-series data from all components, including etcd and ZFS
    • -
    • Logs: Loki aggregates logs from all containers, searchable via Grafana
    • -
    • Traces: Tempo provides distributed request tracing with service dependency mapping
    • -
    • Visualisation: Grafana provides dashboards and exploration tools with correlation between all three signals
    • -
    • Alerting: Alertmanager can notify on conditions defined in Prometheus rules
    • -

    -This observability stack runs entirely on the home lab infrastructure, with data persisted to the NFS share. It's lightweight enough for a three-node cluster but provides the same capabilities as production-grade setups.
    +Novels I've read
    +Resources, Technical Books, Podcasts, Courses and Guides I recommend

    -All configuration files are available on Codeberg:
    +

    Conclusion



    -Prometheus, Grafana, and recording rules configuration
    -Loki and Alloy configuration
    -Tempo configuration
    -Demo tracing application
    +I'm really happy with this setup. Offline Kobo with KOReader, manual book transfers, self-hosted services—it's simple, private, and the reading experience is just great. If you care about owning your data (and not getting distracted), give it a try.

    -Other *BSD-related posts:
    +Other related posts:

    -2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability (You are currently reading this)
    -2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    -2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    -2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    -2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    -2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    -2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    -2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    -2024-04-01 KISS high-availability with OpenBSD
    -2024-01-13 One reason why I love OpenBSD
    -2022-10-30 Installing DTail on OpenBSD
    -2022-07-30 Let's Encrypt with OpenBSD and Rex
    -2016-04-09 Jails and ZFS with Puppet on FreeBSD
    +2026-01-01 Using Supernote Nomad offline
    +2026-01-01 Cloudless Kobo Forma with KOReader (You are currently reading this)

    -E-Mail your comments to paul@nospam.buetow.org
    +E-Mail your comments to paul@nospam.buetow.org :-)

    Back to the main site
    - 'The Courage To Be Disliked' book notes - - gemini://foo.zone/gemfeed/2025-11-02-the-courage-to-be-disliked-book-notes.gmi - 2025-11-01T17:28:38+02:00 + X-RAG Observability Hackathon + + gemini://foo.zone/gemfeed/2025-12-24-x-rag-observability-hackathon.gmi + 2025-12-24T09:45:29+02:00 Paul Buetow aka snonux paul@dev.buetow.org - These are my personal book notes from Ichiro Kishimi and Fumitake Koga's 'The Courage To Be Disliked'. They are for me, but I hope they might be useful to you too. + This post describes my hackathon efforts adding observability to X-RAG, the extensible Retrieval-Augmented Generation (RAG) platform built by my brother Florian. I made time over the weekend to join his 3-day hackathon (attending 2 days) with the goal of instrumenting his existing distributed system with observability. What started as 'let's add some metrics' turned into a comprehensive implementation of the three pillars of observability: tracing, metrics, and logs.
    -

    "The Courage To Be Disliked" book notes


    +

    X-RAG Observability Hackathon



    -Published at 2025-11-01T17:28:38+02:00
    +Published at 2025-12-24T09:45:29+02:00

    -These are my personal book notes from Ichiro Kishimi and Fumitake Koga's "The Courage To Be Disliked". They are for me, but I hope they might be useful to you too.
    +This post describes my hackathon efforts adding observability to X-RAG, the extensible Retrieval-Augmented Generation (RAG) platform built by my brother Florian. I made time over the weekend to join his 3-day hackathon (attending 2 days) with the goal of instrumenting his existing distributed system with observability. What started as "let's add some metrics" turned into a comprehensive implementation of the three pillars of observability: tracing, metrics, and logs.

    -
    -         ,..........   ..........,
    -     ,..,'          '.'          ',..,
    -    ,' ,'            :            ', ',
    -   ,' ,'             :             ', ',
    -  ,' ,'              :              ', ',
    - ,' ,'............., : ,.............', ',
    -,'  '............   '.'   ............'  ',
    - '''''''''''''''''';''';''''''''''''''''''
    -                    '''
    -
    +X-RAG source code on GitHub

    Table of Contents




    -

    The Nature of Life and Happiness


    +

    What is X-RAG?



    -
      -
    • Life and the world are fundamentally simple; we are the ones who make them complicated. Drama does not exist.
    • -
    • Happiness is a choice and is attainable for everyone. Often, we lack the courage to be happy because it's easier to stay in a familiar, albeit unhappy, situation than to choose a new lifestyle, which may bring anxiety and unknowns.
    • -
    • Unhappiness is something you choose for yourself.
    • -

    -

    Subjective Reality and Perception


    +X-RAG is the extensible RAG (Retrieval-Augmented Generation) platform running on Kubernetes. The idea behind RAG is simple: instead of asking an LLM to answer questions from its training data alone, you first retrieve relevant documents from your own knowledge base, then feed those documents to the LLM as context. The LLM synthesises an answer grounded in your actual content—reducing hallucinations and enabling answers about private or recent information the model was never trained on.

    -
      -
    • Our perception of the world is subjective. We don't see the world as it is, but as we are.
    • -
    • The world you see is different from the one I see, and it's impossible to truly share your world with anyone else.
    • -

    -This is illustrated by the "10 people" example: if one person dislikes you, two love you, and seven are indifferent, focusing only on the one who dislikes you gives a distorted and negative view of your life. You are focusing on a tiny, insignificant part and judging the whole by it.
    +X-RAG handles the full pipeline: ingest documents, chunk them into searchable pieces, generate vector embeddings, store them in a vector database, and at query time, retrieve relevant chunks and pass them to an LLM for answer generation. The system supports both local LLMs (Florian runs his on a beefy desktop) and cloud APIs like OpenAI. I configured an OpenAI API key since my laptop's CPU and GPU aren't fast enough for decent local inference.

    -The challenge is to find the courage to see the world directly, without the filters of our own subjective views.
    +All services are implemented in Python. I'm more used to Ruby, Go, and Bash these days, but for this project it didn't matter—Python's OpenTelemetry integration is straightforward, I wasn't planning to write or rewrite tons of application code, and with GenAI assistance the language barrier was a non-issue. The OpenTelemetry concepts and patterns should translate to other languages too—the SDK APIs are intentionally similar across Python, Go, Java, and others.

    -

    The Power to Change and the Role of the Past


    +X-RAG consists of several independently scalable microservices:

      -
    • We are not defined by our past experiences but by the meaning we assign to them. The past does not determine our future.
    • -
    • The book rejects Freudian etiology (the idea that past trauma defines us) in favor of teleology (the idea that we are driven by our present goals).
    • -
    • Change is possible for everyone at any moment, regardless of their circumstances or age. This change must come from your own doing, not from others.
    • -
    • We live in accordance with our present goals, not past causes. The past does not exist; the only issue is the present.
    • -
    • Emotions, like anger, can be fabricated tools used to achieve a goal (e.g., to control or shout at someone) rather than uncontrollable forces that rule us.
    • +
    • Search UI: FastAPI web interface for queries
    • +
    • Ingestion API: Document upload endpoint
    • +
    • Embedding Service: gRPC service for vector embeddings
    • +
    • Indexer: Kafka consumer that processes documents
    • +
    • Search Service: gRPC service orchestrating the RAG pipeline

    -

    Self-Acceptance, Lifestyle, and Life Lies


    +The Embedding Service deserves extra explanation because in the beginning I didn't really knew what it was. Text isn't directly searchable in a vector database—you need to convert it to numerical vectors (embeddings) that capture semantic meaning. The Embedding Service takes text chunks and calls an embedding model (OpenAI's text-embedding-3-small in my case, or a local model on Florian's setup) to produce these vectors. For the LLM search completion answer, I used gpt-4o-mini.

    -
      -
    • Your "lifestyle"—your worldview and outlook on life—is a choice, not a fixed personality trait. You can change it instantly.
    • -
    • The key is self-acceptance, not self-affirmation. Accept what you cannot change and have the courage to change what you can.
    • -
    • You cannot be reborn as someone else. It is better to learn to love yourself and make the best use of the "equipment" you were born with.
    • -
    • Workaholism is a "life lie." It is a form of being in disharmony with life, using work as an excuse to avoid other life tasks and responsibilities.
    • -

    -

    Interpersonal Relationships


    +Similar concepts end up with similar vectors, so "What is machine learning?" and "Explain ML" produce vectors close together in the embedding space. At query time, your question gets embedded too, and the vector database finds chunks with nearby vectors—that's semantic search.

    -
      -
    • All problems are, at their core, problems of interpersonal relationships. To escape all problems would mean to live alone in the universe, which is impossible.
    • -
    • The book identifies three "Life Tasks" that everyone faces: the task of work, the task of friendship, and the task of love.
    • -
    • **Competition:** Life is not a competition. When we stop comparing ourselves to others, we cease to see them as enemies. They become comrades, and we can genuinely celebrate their successes. This removes the fear of losing and allows for peace.
    • -
    • **Power Struggles:** When someone is angry with you, recognize it as their attempt at a power struggle. The person who attacks you is the one with the problem. Do not get drawn in. Arguing about who is right or wrong is a trap. Admitting a fault is not a defeat.
    • -
    • **Horizontal vs. Vertical Relationships:** Strive for "horizontal relationships" based on equality, rather than "vertical relationships" based on hierarchy. Praise and rebuke are forms of manipulation found in vertical relationships. Instead, offer encouragement. (Note: The original author expresses disagreement with applying this to children, feeling a hierarchy is necessary and that children appreciate praise).
    • -
    • **Separation of Tasks:** Understand what is your responsibility and what is someone else's. For example, if someone takes advantage of your trust, that is their task. Your task is to decide whether to trust them in the first place.
    • -
    • **Confidence in Others:** Having unconditional confidence in others helps build deep relationships and a sense of belonging, turning others into comrades.
    • -

    -

    Inferiority and Superiority


    +The data layer includes Weaviate (vector database with hybrid search), Kafka (message queue), MinIO (object storage), and Redis (cache). All of this runs in a Kind Kubernetes cluster for local development, with the same manifests deployable to production.

    -
      -
    • A feeling of inferiority is not inherently bad; it can be a catalyst for growth when we compare ourselves to our ideal self. This "pursuit of superiority" drives progress.
    • -
    • This is different from an "inferiority complex," which is using feelings of inadequacy as an excuse to avoid change and responsibility.
    • -
    • Value is based on a social context. An object's worth is subjective and can be reinterpreted.
    • -

    -

    Community, Contribution, and Happiness


    +
    +┌─────────────────────────────────────────────────────────────────────────┐
    +│                      X-RAG Kubernetes Cluster                           │
    +├─────────────────────────────────────────────────────────────────────────┤
    +│   ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐    │
    +│   │ Search UI   │  │Search Svc   │  │Embed Service│  │   Indexer   │    │
    +│   └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘    │
    +│          │                │                │                │           │
    +│          └────────────────┴────────────────┴────────────────┘           │
    +│                                    │                                    │
    +│                                    ▼                                    │
    +│          ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
    +│          │  Weaviate   │  │   Kafka     │  │   MinIO     │              │
    +│          └─────────────┘  └─────────────┘  └─────────────┘              │
    +└─────────────────────────────────────────────────────────────────────────┘
    +

    -
      -
    • The definition of happiness is the feeling of contribution.
    • -
    • A true sense of self-worth comes from feeling useful to a community (the "community feeling").
    • -
    • This contribution doesn't have to be grand. You can be of worth to the community simply by being.
    • -
    • When you have a genuine feeling of contribution, you no longer need recognition or praise from others.
    • -

    -

    Living in the Here and Now


    +

    Running Kubernetes locally with Kind



    -
      -
    • Life is a series of moments ("dots"), not a continuous line. We should live fully in the "here and now."
    • -
    • The greatest life lie is to dwell on the past and the future, which do not exist, instead of focusing on the present moment.
    • -
    • Focus on the process, not just the outcome. The goal of a dance is the dancing itself, not just reaching a destination.
    • -

    -

    The Courage to Be Normal


    +X-RAG runs on Kubernetes, but you don't need a cloud account to develop it. The project uses Kind (Kubernetes in Docker)—a tool originally created by the Kubernetes SIG for testing Kubernetes itself.

    -
      -
    • Why does everyone want to be special? Is it inferior to be normal?
    • -
    • Embracing being normal, instead of striving for a special status, is a form of courage. In the grander sense, isn't everyone normal?
    • -

    -

    Freedom is Being Disliked


    +Kind - Kubernetes in Docker

    -
      -
    • The price of true freedom is to be disliked by other people. It is a sign that you are living in accordance with your own principles.
    • -

    -

    The Meaning of Life


    +Kind spins up a full Kubernetes cluster using Docker containers as nodes. The control plane (API server, etcd, scheduler, controller-manager) runs in one container, and worker nodes run in separate containers. Inside these "node containers," pods run just like they would on real servers—using containerd as the container runtime. It's containers all the way down.

    -
      -
    • Life has no inherent meaning. It is up to each individual to assign meaning to their own life.
    • -
    • Do not be afraid of being disliked by others for living your life according to the meaning you create.
    • -
    • You have the power to change yourself, and in doing so, you change your world. No one else can change it for you.
    • -

    -E-Mail your comments to paul@nospam.buetow.org :-)
    +Technically, each Kind node is a Docker container running a minimal Linux image with kubelet and containerd installed. When you deploy a pod, kubelet inside the node container instructs containerd to pull and run the container image. So you have Docker running node containers, and inside those, containerd running application containers. Network-wise, Kind sets up a Docker bridge network and uses CNI plugins (kindnet by default) for pod networking within the cluster.
    +
    +
    +$ docker ps --format "table {{.Names}}\t{{.Image}}"
    +NAMES                  IMAGE
    +xrag-k8-control-plane  kindest/node:v1.32.0
    +xrag-k8-worker         kindest/node:v1.32.0
    +xrag-k8-worker2        kindest/node:v1.32.0
    +
    +
    +The kindest/node image contains everything needed: kubelet, containerd, CNI plugins, and pre-pulled pause containers. Port mappings in the Kind config expose services to the host—that's how http://localhost:8080 reaches the search-ui running inside a pod, inside a worker container, inside Docker.
    +
    +
    +┌─────────────────────────────────────────────────────────────────────────┐
    +│                           Docker Host                                   │
    +├─────────────────────────────────────────────────────────────────────────┤
    +│  ┌───────────────────┐  ┌───────────────────┐  ┌───────────────────┐    │
    +│  │ xrag-k8-control   │  │ xrag-k8-worker    │  │ xrag-k8-worker2   │    │
    +│  │ -plane (container)│  │ (container)       │  │ (container)       │    │
    +│  │                   │  │                   │  │                   │    │
    +│  │ K8s API server    │  │ Pods:             │  │ Pods:             │    │
    +│  │ etcd, scheduler   │  │ • search-ui       │  │ • weaviate        │    │
    +│  │                   │  │ • search-service  │  │ • kafka           │    │
    +│  │                   │  │ • embedding-svc   │  │ • prometheus      │    │
    +│  │                   │  │ • indexer         │  │ • grafana         │    │
    +│  └───────────────────┘  └───────────────────┘  └───────────────────┘    │
    +└─────────────────────────────────────────────────────────────────────────┘
    +
    +
    +Why Kind? It gives you a real Kubernetes environment—the same manifests deploy to production clouds unchanged. No minikube quirks, no Docker Compose translation layer. Just Kubernetes. I already have a k3s cluster running at home, but Kind made collaboration easier—everyone working on X-RAG gets the exact same setup by cloning the repo and running make cluster-start.

    -Other book notes of mine are:
    +Florian developed X-RAG on macOS, but it worked seamlessly on my Linux laptop. The only difference was Docker's resource allocation: on macOS you configure limits in Docker Desktop, on Linux it uses host resources directly. That's because under macOS the Linux Docker containers run on an emulation layer as macOS is not Linux.

    -2025-11-02 'The Courage To Be Disliked' book notes (You are currently reading this)
    -2025-06-07 'A Monk's Guide to Happiness' book notes
    -2025-04-19 'When: The Scientific Secrets of Perfect Timing' book notes
    -2024-10-24 'Staff Engineer' book notes
    -2024-07-07 'The Stoic Challenge' book notes
    -2024-05-01 'Slow Productivity' book notes
    -2023-11-11 'Mind Management' book notes
    -2023-07-17 'Software Developers Career Guide and Soft Skills' book notes
    -2023-05-06 'The Obstacle is the Way' book notes
    -2023-04-01 'Never split the difference' book notes
    -2023-03-16 'The Pragmatic Programmer' book notes
    +My hardware: a ThinkPad X1 Carbon Gen 9 with an 11th Gen Intel Core i7-1185G7 (4 cores, 8 threads at 3.00GHz) and 32GB RAM (running Fedora Linux). During the hackathon, memory usage peaked around 15GB—comfortable headroom. CPU was the bottleneck; with ~38 pods running across all namespaces (rag-system, monitoring, kube-system, etc.), plus Discord for the remote video call and Tidal streaming hi-res music, things got tight. When rebuilding Docker images or restarting the cluster, Discord video and audio would stutter—my fellow hackers probably wondered why I kept freezing mid-sentence. A beefier CPU would have meant less waiting and smoother calls, but it was manageable.

    -Back to the main site
    -
    -
    -
    - - Perl New Features and Foostats - - gemini://foo.zone/gemfeed/2025-11-02-perl-new-features-and-foostats.gmi - 2025-11-01T16:10:35+02:00 - - Paul Buetow aka snonux - paul@dev.buetow.org - - Perl recently reached rank 10 in the TIOBE index. That headline made me write this blog post as I was developing the Foostats script for simple analytics of my personal websites and Gemini capsules (e.g. `foo.zone`) and there were a couple of new features added to the Perl language over the last releases. The book *Perl New Features* by brian d foy documents the changes well; this post shows how those features look in a real program that runs every morning for my stats generation. - -
    -

    Perl New Features and Foostats


    +

    Motivation



    -Published at 2025-11-01T16:10:35+02:00
    +When I joined the hackathon, Florian's X-RAG was functional but opaque. With five services communicating via gRPC, Kafka, and HTTP, debugging was cumbersome. When a search request take 5 seconds, there was no visibility into where the time was being spent. Was it the embedding generation? The vector search? The LLM synthesis? Nobody would be able to figure it out quickly.

    -Perl recently reached rank 10 in the TIOBE index. That headline made me write this blog post as I was developing the Foostats script for simple analytics of my personal websites and Gemini capsules (e.g. foo.zone) and there were a couple of new features added to the Perl language over the last releases. The book *Perl New Features* by brian d foy documents the changes well; this post shows how those features look in a real program that runs every morning for my stats generation.
    +Distributed systems are inherently opaque. Each service logs its own view of the world, but correlating events across service boundaries is archaeology. Grepping through logs on many pods, trying to mentally reconstruct what happened—not fun. This was the perfect hackathon project: Explore this Observability Stack in greater depth.

    -Perl re-enters the top ten
    -Perl New Features by Joshua McAdams and brian d foy
    +

    The observability stack


    +
    +Before diving into implementation, here's what I deployed. The complete stack runs in the monitoring namespace:

    -$b="24P7cP3dP31P3bPaP28P24P64P31P2cP24P64P32P2cP24P73P2cP24P67P2cP24P7
    -2P29P3dP28P22P31P30P30P30P30P22P2cP22P31P30P30P30P30P30P22P2cP22P4aP75
    -P7                                                                  3P
    -74                                                                  P2
    -0P  41P6eP6fP74P     68P65P72P20P50 P65P72P6cP2     0P48P           61
    -P6  3P6bP65P72P22P   29P3bPaP40P6dP 3dP73P70P6cP6   9P74P           20
    -P2  fP2fP    2cP22P  2cP2eP3aP21P2  bP2aP    30P4f  P40P2           2P
    -3b  PaP24      P6eP3 dP6c           P65P6      eP67 P74P6           8P
    -20  P24P7      3P3bP aP24           P75P3      dP22 P20P2           2P
    -78  P24P6      eP3bP aPaP           70P72      P69P 6eP74           P2
    -0P  22P5c    P6eP20  P20P           24P75    P5cP7  2P22P           3b
    -Pa  PaP66P6fP72P2    8P24P7aP20P    3dP20P31P3bP    20P24           P7
    -aP  3cP3dP24P6       eP3bP20P24     P7aP2bP2bP      29P20           P7
    -bP  aPaP9            P77P28P24P6    4P31P29P        3bPaP           9P
    -24  P72P3            dP69           P6eP74P28       P72P6           1P
    -6e  P64P2            8P24           P6eP2 9P29P     3bPaP           9P
    -24  P67P3            dP73           P75P6  2P73P    74P72           P2
    -0P  24P73            P2cP24P72P2cP  31P3b   PaP9P   24P67P20P3fP20  P6
    -4P  6fP20            P9P7bP20PaP9P9 P9P9P    9P66P  6fP72P20P28P24  P6
    -bP  3dP30            P3bP24P6bP3cP3 9P3bP    24P6bP 2bP2bP29P20P7b  Pa
    -P9                                                                  P9
    -P9                                                                  P9
    -P9  P9P73P75P6     2P73   P74P  72P2       8P24P75P2c     P24P72    P2
    -cP  31P29P3dP24P   6dP5   bP24  P6bP       5dP3bP20Pa   P9P9  P9P9  P9
    -P9  P70P    72P69  P6eP   74P2  0P22       P20P20P24P  75P      5cP 72
    -P2  2P3b      PaP9 P9P9   P9P9  P9P7       7P28       P24        P6 4P
    -32  P29P      3bPa P9P9   P9P9  P9P7       dPaP       9P9           P9
    -P9  P9P7      3P75 P62P   73P7  4P72       P28P        24P7         5P
    -2c  P24P    72P2c  P31P   29P3  dP24       P67P3bP20P   aP9P9       P9
    -P9  P7dP20PaP9P    9P3a   P20P  72P6       5P64P6fP3b      PaP9     P7
    -3P  75P62P73P      74P7   2P28  P24P       73P2cP24P7        2P2c   P3
    -1P  29P3dP2        2P30   P22P  3bPa       P9P7                0P7  2P
    -69  P6eP74P2       0P22   P20P  20P2       4P75                 P5c P7
    -2P  22P3 bPaPa     P7dP   aPaP  77P2       0P28                 P24 P6
    -4P  32P2  9P3bP    aP70   P72P  69P6       eP74       P2        0P2 2P
    -20  P20P   24P75   P20P21P5cP7  2P22P3bPaP 73P6cP65P6 5P7     0P20  P3
    -2P  3bPa    P70P7  2P69P6eP74P  20P22P20P2 0P24P75P20  P21P  5cP6   eP
    -22  P3bP     aPaP7  3P75P62P2   0P77P20P7b PaP9P24P6c    P3dP73     P6
    -8P                                                                  69
    -P6                                                                  6P
    -74P3bPaP9P66P6fP72P28P24P6aP3dP30P3bP24P6aP3cP24P6cP3bP24P6aP2bP2bP29P
    -7bP7dPaP7dP";$b=~s/\s//g;split /P/,$b;foreach(@_){$c.=chr hex};eval $c
    -
    -The above Perl script prints out "Just Another Perl Hacker !" in an
    -animation of sorts.
    -
    +$ kubectl get pods -n monitoring
    +NAME                                  READY   STATUS
    +alloy-84ddf4cd8c-7phjp                1/1     Running
    +grafana-6fcc89b4d6-pnh8l              1/1     Running
    +kube-state-metrics-5d954c569f-2r45n   1/1     Running
    +loki-8c9bbf744-sc2p5                  1/1     Running
    +node-exporter-kb8zz                   1/1     Running
    +node-exporter-zcrdz                   1/1     Running
    +node-exporter-zmskc                   1/1     Running
    +prometheus-7f755f675-dqcht            1/1     Running
    +tempo-55df7dbcdd-t8fg9                1/1     Running
     

    -

    Table of Contents


    +Each component has a specific role:


    -

    Motivation


    -
    -I've been running foo.zone for a while now, but I've never looked into visitor statistics or analytics. I value privacy—not just my own, but also the privacy of others (the visitors of this site) — so I hesitated to use any off-the-shelf analytics plugins. All I wanted to collect were:
    +Everything is accessible via port-forwards:

      -
    • Which blog posts had the most (unique) visitors
    • -
    • Exclude, if possible, any bots and scrapers from the stats
    • -
    • Track only anonymized IP addresses, never store raw addresses
    • +
    • Grafana: http://localhost:3000 (unified UI for all three signals)
    • +
    • Prometheus: http://localhost:9090 (metrics queries)
    • +
    • Tempo: http://localhost:3200 (trace queries)
    • +
    • Loki: http://localhost:3100 (log queries)

    -With Foostats I've created a Perl script which does that for my highly opinionated website/blog setup, which consists of:
    +

    Grafana Alloy: the unified collector



    -Gemtexter, my static site and Gemini capsule generator
    -How I host this site highly-available using OpenBSD
    +Before diving into the individual signals, I want to highlight Grafana Alloy—the component that ties everything together. Alloy is Grafana's vendor-neutral OpenTelemetry Collector distribution, and it became the backbone of the observability stack.

    -

    Why I used Perl


    +Grafana Alloy documentation

    -Even though nowadays I code more in Go and Ruby, I stuck with Perl for Foostats for four simple reasons:
    +Why use a centralised collector instead of having each service push directly to backends?

      -
    • I wanted an excuse to explore the newer features of my first programming love.
    • -
    • Sometimes, I miss Perl.
    • -
    • Perl ships with OpenBSD (the operating system on which my sites run) by default.
    • -
    • It really does live up to its Practical Extraction and Report Language (that's what the name Perl means) for this kind of log grinding I did with Foostats.
    • +
    • Decoupling: Applications don't need to know about Prometheus, Tempo, or Loki. They speak OTLP, and Alloy handles the translation.
    • +
    • Unified timestamps: All telemetry flows through one system, making correlation in Grafana more reliable.
    • +
    • Processing pipeline: Batch data before sending, filter noisy metrics, enrich with labels—all in one place.
    • +
    • Backend flexibility: Switch from Tempo to Jaeger without changing application code.

    -

    Inside Foostats


    +Alloy uses a configuration language called River, which feels similar to Terraform's HCL—declarative blocks with attributes. If you've written Terraform, River will look familiar. The full Alloy configuration runs to over 1400 lines with comments explaining each section. It handles OTLP receiving, batch processing, Prometheus export, Tempo export, Kubernetes metrics scraping, infrastructure metrics, and pod log collection. All three signals—metrics, traces, logs—flow through this single component, making Alloy the central nervous system of the observability stack.

    -Foostats is simply a log file analyser, which analyses the OpenBSD httpd and relayd logs.
    +In the following sections, I'll cover each observability pillar and show the relevant Alloy configuration for each.

    -https://man.openbsd.org/httpd.8
    -https://man.openbsd.org/relayd.8
    +

    Centralised logging with Loki



    -

    Log pipeline


    +Getting all logs in one place was the foundation. I deployed Grafana Loki in the monitoring namespace, with Grafana Alloy running as a DaemonSet on each node to collect logs.

    -A CRON job starts Foostats, reads OpenBSD httpd and relayd access logs, and produces the numbers published at https://stats.foo.zone and gemini://stats.foo.zone. The dashboards are humble because traffic on my sites is still light, yet the trends are interesting for spotting patterns. The script is opinionated (I am repeating myself here, I know), and I will probably be the only one ever using it for my own sites. However, the code demonstrates how Perl's newer features help keep a small script like this exciting and fun!
    +
    +┌──────────────────────────────────────────────────────────────────────┐
    +│                           LOGS PIPELINE                              │
    +├──────────────────────────────────────────────────────────────────────┤
    +│  Applications write to stdout → containerd stores in /var/log/pods   │
    +│                                    │                                 │
    +│                              File tail                               │
    +│                                    ▼                                 │
    +│                         Grafana Alloy (DaemonSet)                    │
    +│                    Discovers pods, extracts metadata                 │
    +│                                    │                                 │
    +│                       HTTP POST /loki/api/v1/push                    │
    +│                                    ▼                                 │
    +│                           Grafana Loki                               │
    +│                   Indexes labels, stores chunks                      │
    +└──────────────────────────────────────────────────────────────────────┘
    +

    -Foostats (HTTP)
    -Foostats (Gemini)
    +

    Alloy configuration for logs



    -On OpenBSD, I've configured the job via the daily.local on both of my OpenBSD servers (fishfinger.buetow.org and blowfish.buetow.org - note one is the master server, the other is the standby server, but the script runs on both and the stats are merged later in the process):
    +Alloy discovers pods via the Kubernetes API, tails their log files from /var/log/pods/, and ships to Loki. Importantly, Alloy runs as a DaemonSet on each worker node—it doesn't run inside the application pods. Since containerd writes all container stdout/stderr to /var/log/pods/ on the node's filesystem, Alloy can tail logs for every pod on that node from a single location without any sidecar injection:
    +
    +
    +loki.source.kubernetes "pod_logs" {
    +  targets    = discovery.relabel.pod_logs.output
    +  forward_to = [loki.process.pod_logs.receiver]
    +}
    +
    +loki.write "default" {
    +  endpoint {
    +    url = "http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push"
    +  }
    +}
    +
    +
    +

    Querying logs with LogQL


    +
    +Now I could query logs in Loki (e.g. via Grafana UI) with LogQL:
    +
    +
    +{namespace="rag-system", container="search-ui"} |= "ERROR"
    +
    +
    +

    Metrics with Prometheus


    +
    +I added Prometheus metrics to every service. Following the Four Golden Signals (latency, traffic, errors, saturation), I instrumented the codebase with histograms, counters, and gauges:

    -
    fishfinger$ grep foostats /etc/daily.local
    -perl /usr/local/bin/foostats.pl --parse-logs --replicate --report
    +
    from prometheus_client import Histogram, Counter, Gauge
    +
    +search_duration = Histogram(
    +    "search_service_request_duration_seconds",
    +    "Total duration of Search Service requests",
    +    ["method"],
    +    buckets=[0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 20.0, 30.0, 60.0],
    +)
    +
    +errors_total = Counter(
    +    "search_service_errors_total",
    +    "Error count by type",
    +    ["method", "error_type"],
    +)
    +
    +
    +Initially, I used Prometheus scraping—each service exposed a /metrics endpoint, and Prometheus pulled metrics every 15 seconds. This worked, but I wanted a unified pipeline.
    +
    +

    Alloy configuration for application metrics


    +
    +The breakthrough came with Grafana Alloy as an OpenTelemetry collector. Services now push metrics via OTLP (OpenTelemetry Protocol), and Alloy converts them to Prometheus format:
    +
    +
    +┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
    +│ search-ui   │  │search-svc   │  │embed-svc    │  │  indexer    │
    +│ OTel Meter  │  │ OTel Meter  │  │ OTel Meter  │  │ OTel Meter  │
    +│      │      │  │      │      │  │      │      │  │      │      │
    +│ OTLPExporter│  │ OTLPExporter│  │ OTLPExporter│  │ OTLPExporter│
    +└──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘
    +       │                │                │                │
    +       └────────────────┴────────────────┴────────────────┘
    +                                 │
    +                                 ▼ OTLP/gRPC (port 4317)
    +                        ┌─────────────────────┐
    +                        │   Grafana Alloy     │
    +                        └──────────┬──────────┘
    +                                   │ prometheus.remote_write
    +                                   ▼
    +                        ┌─────────────────────┐
    +                        │    Prometheus       │
    +                        └─────────────────────┘
     

    -Internally, Foostats::Logreader parses each line of the log files /var/log/daemon* and /var/www/logs/access_log*, turns timestamps into YYYYMMDD/HHMMSS values, hashes IP addresses with SHA3 (for anonymization), and hands a normalized event to Foostats::Filter. The filter compares the URI against entries in fooodds.txt, tracks how many times an IP address requests within the exact second, and drops anything suspicious (e.g., from web crawlers or malicious attackers). Valid events reach Foostats::Aggregator, which counts requests per protocol, records unique visitors for the Gemtext and Atom feeds, and remembers page-level IP sets. Foostats::FileOutputter writes the result as gzipped JSON files—one per day and per protocol—with IPv4/IPv6 splits, filtered counters, feed readership, and hashes for long URLs.
    -
    -

    fooodds.txt


    -
    -fooodds.txt is a plain text list of substrings of URLs to be blocked, making it quick to shut down web crawlers. Foostats also detects rapid requests (an indicator of excessive crawling) and blocks the IP. Audit lines are written to /var/log/fooodds, which can later be reviewed for false or true positives (I do this around once a month). The Justfile even has a gather-fooodds target that collects suspicious paths from remote logs so new patterns can be added quickly.
    -
    -

    Feed kinds


    -
    -There are different kinds of feeds being tracked by Foostats:
    +Alloy receives OTLP on ports 4317 (gRPC) or 4318 (HTTP), batches the data for efficiency, and exports to Prometheus:

    -
      -
    • The Atom web-feed
    • -
    • The same feed via Gemini
    • -
    • The Gemfeed (a special format popular in the Geminispace)
    • -

    -

    Aggregation and output


    +
    +otelcol.receiver.otlp "default" {
    +  grpc { endpoint = "0.0.0.0:4317" }
    +  http { endpoint = "0.0.0.0:4318" }
    +  output {
    +    metrics = [otelcol.processor.batch.metrics.input]
    +    traces  = [otelcol.processor.batch.traces.input]
    +  }
    +}
    +
    +otelcol.processor.batch "metrics" {
    +  timeout = "5s"
    +  send_batch_size = 1000
    +  output { metrics = [otelcol.exporter.prometheus.default.input] }
    +}
    +
    +otelcol.exporter.prometheus "default" {
    +  forward_to = [prometheus.remote_write.prom.receiver]
    +}
    +

    -As mentioned, Foostats merges the stats from both hosts, master and standby. For the master-standby setup description, read:
    +Instead of sending each metric individually, Alloy accumulates up to 1000 metrics (or waits 5 seconds) before flushing. This reduces network overhead and protects backends from being overwhelmed.

    -KISS high-availability with OpenBSD
    +

    Kubernetes metrics: kubelet, cAdvisor, and kube-state-metrics



    -Those gzipped files land in stats/. From there, Foostats::Replicator can pull matching files from the partner host (fishfinger or blowfish) so the view covers both servers, Foostats::Merger combines them into daily summaries, and Foostats::Reporter rebuilds Gemtext and HTML reports.
    +Alloy also pulls metrics from Kubernetes itself—kubelet resource metrics, cAdvisor container metrics, and kube-state-metrics for cluster state.

    -Those are the raw stats files:
    +Why three separate sources? It does feel fragmented, but each serves a distinct purpose. kubelet exposes resource metrics about pod CPU and memory usage from its own bookkeeping—lightweight summaries of what's running on each node. cAdvisor (Container Advisor) runs inside kubelet and provides detailed container-level metrics: CPU throttling, memory working sets, filesystem I/O, network bytes. These are the raw runtime stats from containerd. kube-state-metrics is different—it doesn't measure resource usage at all. Instead, it queries the Kubernetes API and exposes the *desired state*: how many replicas a Deployment wants, whether a Pod is pending or running, what resource requests and limits are configured. You need all three because "container used 500MB" (cAdvisor), "pod requested 1GB" (kube-state-metrics), and "node has 4GB available" (kubelet) are complementary views. The fragmentation is a consequence of Kubernetes' architecture—no single component has the complete picture.

    -https://blowfish.buetow.org/foostats/
    -https://fishfinger.buetow.org/foostats/
    +None of these components speak OpenTelemetry—they all expose Prometheus-format metrics via HTTP endpoints. That's why Alloy uses prometheus.scrape instead of receiving OTLP pushes. Alloy handles both worlds: OTLP from our applications, Prometheus scraping for infrastructure.

    -These are the 30-day reports generated (already linked earlier in this post, but adding here again for clarity):
    +
    +prometheus.scrape "kubelet_resource" {
    +  targets         = discovery.relabel.kubelet.output
    +  job_name        = "kubelet-resource"
    +  scheme          = "https"
    +  scrape_interval = "30s"
    +  bearer_token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
    +  tls_config { insecure_skip_verify = true }
    +  forward_to      = [prometheus.remote_write.prom.receiver]
    +}
    +
    +prometheus.scrape "cadvisor" {
    +  targets         = discovery.relabel.cadvisor.output
    +  job_name        = "cadvisor"
    +  scheme          = "https"
    +  scrape_interval = "60s"
    +  bearer_token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
    +  tls_config { insecure_skip_verify = true }
    +  forward_to      = [prometheus.relabel.cadvisor_filter.receiver]
    +}
    +
    +prometheus.scrape "kube_state_metrics" {
    +  targets = [
    +    {"__address__" = "kube-state-metrics.monitoring.svc.cluster.local:8080"},
    +  ]
    +  job_name        = "kube-state-metrics"
    +  scrape_interval = "30s"
    +  forward_to      = [prometheus.relabel.kube_state_filter.receiver]
    +}
    +

    -stats.foo.zone Gemini capsule dashboard
    -stats.foo.zone HTTP dashboard
    +Note that kubelet and cAdvisor require HTTPS with bearer token authentication (using the service account token mounted by Kubernetes), while kube-state-metrics is a simple HTTP target. cAdvisor is scraped less frequently (60s) because it returns many more metrics with higher cardinality.

    -

    Command-line entry points


    +

    Infrastructure metrics: Kafka, Redis, MinIO



    -foostats_main is the command entry point. --parse-logs refreshes the gzipped files, --replicate runs the cross-host sync, and --report rebuilds the HTML and Gemini report pages. --all performs everything in one go. Defaults point to /var/www/htdocs/buetow.org/self/foostats for data, /var/gemini/stats.foo.zone for Gemtext output, and /var/www/htdocs/gemtexter/stats.foo.zone for HTML output. Replication always forces the three most recent days' worth of data across HTTPS and leaves older files untouched to save bandwidth.
    +Application metrics weren't enough. I also needed visibility into the data layer. Each infrastructure component has a specific role in X-RAG and got its own exporter:

    -The complete source lives on Codeberg here:
    +Redis is the caching layer. It stores search results and embeddings to avoid redundant API calls to OpenAI. We collect 25 metrics via oliver006/redis_exporter running as a sidecar, including cache hit/miss rates, memory usage, connected clients, and command latencies. The key metric? redis_keyspace_hits_total / (redis_keyspace_hits_total + redis_keyspace_misses_total) tells you if caching is actually helping.

    -Foostats on Codeberg
    +Kafka is the message queue connecting the ingestion API to the indexer. Documents are published to a topic, and the indexer consumes them asynchronously. We collect 12 metrics via danielqsj/kafka-exporter, with consumer lag being the most critical—it shows how far behind the indexer is. High lag means documents aren't being indexed fast enough.

    -Now let's go to some new Perl features:
    +MinIO is the S3-compatible object storage where raw documents are stored before processing. We collect 16 metrics from its native /minio/v2/metrics/cluster endpoint, covering request rates, error counts, storage usage, and cluster health.

    -

    Packages as real blocks


    +You can verify these counts by querying Prometheus directly:

    -

    Scoped packages


    +
    +$ curl -s 'http://localhost:9090/api/v1/label/__name__/values' \
    +    | jq -r '.data[]' | grep -c '^redis_'
    +25
    +$ curl -s 'http://localhost:9090/api/v1/label/__name__/values' \
    +    | jq -r '.data[]' | grep -c '^kafka_'
    +12
    +$ curl -s 'http://localhost:9090/api/v1/label/__name__/values' \
    +    | jq -r '.data[]' | grep -c '^minio_'
    +16
    +

    -Recent Perl versions allow the block form package Foo { ... }. Foostats uses it for every package. Imports stay local to the block, helper subs do not leak into the global symbol table, and configuration happens where the code needs it.
    +Full Alloy configuration with detailed metric filtering

    -The old way:
    +Alloy scrapes all of these and remote-writes to Prometheus:

    - -
    package foo;
    -
    -sub hello {
    -    print "Hello from package foo\n";
    +
    +prometheus.scrape "redis_exporter" {
    +  targets = [
    +    {"__address__" = "xrag-redis.rag-system.svc.cluster.local:9121"},
    +  ]
    +  job_name        = "redis"
    +  scrape_interval = "30s"
    +  forward_to      = [prometheus.relabel.redis_filter.receiver]
     }
     
    -package bar;
    +prometheus.scrape "kafka_exporter" {
    +  targets = [
    +    {"__address__" = "kafka-exporter.rag-system.svc.cluster.local:9308"},
    +  ]
    +  job_name        = "kafka"
    +  scrape_interval = "30s"
    +  forward_to      = [prometheus.relabel.kafka_filter.receiver]
    +}
     
    -sub hello {
    -    print "Hello from package bar\n";
    +prometheus.scrape "minio" {
    +  targets = [
    +    {"__address__" = "xrag-minio.rag-system.svc.cluster.local:9000"},
    +  ]
    +  job_name     = "minio"
    +  metrics_path = "/minio/v2/metrics/cluster"
    +  scrape_interval = "30s"
    +  forward_to   = [prometheus.relabel.minio_filter.receiver]
     }
     

    -But now it is also possible to do this:
    +Note that MinIO exposes metrics at a custom path (/minio/v2/metrics/cluster) rather than the default /metrics. Each exporter forwards to a relabel component that filters down to essential metrics before sending to Prometheus.

    - -
    package foo {
    -    sub hello {
    -        print "Hello from package foo\n";
    -    }
    -}
    -
    -package bar {
    -    sub hello {
    -        print "Hello from package bar\n";
    -    }
    -}
    +With all metrics in Prometheus, I can use PromQL queries in Grafana dashboards. For example, to check Kafka consumer lag and see if the indexer is falling behind:
    +
    +
    +sum by (consumergroup, topic) (kafka_consumergroup_lag)
     

    -

    Postfix dereferencing keeps data structures tidy


    +Or check Redis cache effectiveness:

    -

    Clear dereferencing


    +
    +redis_keyspace_hits_total / (redis_keyspace_hits_total + redis_keyspace_misses_total)
    +

    -The script handles nested hashes and arrays. Postfix dereferencing ($hash->%*, $array->@*) keeps that readable.
    +

    Distributed tracing with Tempo



    -E.g. instead of having to write:
    +

    Understanding traces, spans, and the trace tree



    - -
    for my $elem (@{$array_ref}) {
    -    print "$elem\n";
    -}
    -
    +Before diving into the implementation, let me explain the core concepts I learned. A trace represents a single request's journey through the entire distributed system. Think of it as a receipt that follows your request from the moment it enters the system until the final response.

    -one can now do:
    +Each trace is identified by a trace ID—a 128-bit identifier (32 hex characters) that stays constant across all services. When I make a search request, every service handling that request uses the same trace ID: 9df981cac91857b228eca42b501c98c6.

    - -
    for my $elem ($array_ref->@*) {
    -    print "$elem\n";
    -}
    +Quick video explaining the difference between trace IDs and span IDs in OpenTelemetry
    +
    +Within a trace, individual operations are recorded as spans. A span has:
    +
    +
      +
    • A span ID: 64-bit identifier (16 hex characters) unique to this operation
    • +
    • A parent span ID: links this span to its caller
    • +
    • A name: what operation this represents (e.g., "POST /api/search")
    • +
    • Start time and duration
    • +
    • Attributes: key-value metadata (e.g., http.status_code=200)
    • +

    +The first span in a trace is the root span—it has no parent. When the root span calls another service, that service creates a child span with the root's span ID as its parent. This parent-child relationship forms a tree structure:
    +
    +
    +                        ┌─────────────────────────┐
    +                        │      Root Span          │
    +                        │  POST /api/search       │
    +                        │  span_id: a1b2c3d4...   │
    +                        │  parent: (none)         │
    +                        └───────────┬─────────────┘
    +                                    │
    +              ┌─────────────────────┴─────────────────────┐
    +              │                                           │
    +              ▼                                           ▼
    +┌─────────────────────────┐             ┌─────────────────────────┐
    +│      Child Span         │             │      Child Span         │
    +│  gRPC Search            │             │  render_template        │
    +│  span_id: e5f6g7h8...   │             │  span_id: i9j0k1l2...   │
    +│  parent: a1b2c3d4...    │             │  parent: a1b2c3d4...    │
    +└───────────┬─────────────┘             └─────────────────────────┘
    +            │
    +            ├──────────────────┬──────────────────┐
    +            ▼                  ▼                  ▼
    +     ┌────────────┐     ┌────────────┐     ┌────────────┐
    +     │ Grandchild │     │ Grandchild │     │ Grandchild │
    +     │ embedding  │     │ vector     │     │ llm.rag    │
    +     │ .generate  │     │ _search    │     │ _completion│
    +     └────────────┘     └────────────┘     └────────────┘
     

    -You see that this feature becomes increasingly useful with nested data structures, e.g. to print all keys of the nested hash:
    +This tree structure answers the critical question: "What called what?" When I see a slow span, I can trace up to see what triggered it and down to see what it's waiting on.

    - -
    print for keys $hash->{stats}->%*;
    +

    How trace context propagates


    +
    +The magic that links spans across services is trace context propagation. When Service A calls Service B, it must pass along the trace ID and its own span ID (which becomes the parent). OpenTelemetry uses the W3C traceparent header:
    +
    +
    +traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
    +             │   │                                │                 │
    +             │   │                                │                 └── flags
    +             │   │                                └── parent span ID (16 hex)
    +             │   └── trace ID (32 hex)
    +             └── version
     

    -Loops over like $stats->{page_ips}->{urls}->%* or $merge{$key}->{$_}->%* show which level of the structure is in play. The merger in Foostats updates host and URL statistics without building temporary arrays, and the reporter code mirrors the layout of the final tables. Before postfix dereferencing, the same code relied on braces within braces and was harder to read.
    +For HTTP, this travels as a request header. For gRPC, it's passed as metadata. For Kafka, it's embedded in message headers. The receiving service extracts this context, creates a new span with the propagated trace ID and the caller's span ID as parent, then continues the chain.
    +
    +This is why all my spans link together—OpenTelemetry's auto-instrumentation handles propagation automatically for HTTP, gRPC, and Kafka clients.

    -

    say is the default voice now


    +

    Implementation



    -say became the default once the script switched to use v5.38;. It adds a newline to every message printed, comparable to Ruby's puts, making log messages like "Processing $path" or "Writing report to $report_path" cleaner:
    +This is where distributed tracing made the difference. I integrated OpenTelemetry auto-instrumentation for FastAPI, gRPC, and HTTP clients, plus manual spans for RAG-specific operations:

    -
    use v5.38;
    +
    from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
    +from opentelemetry.instrumentation.grpc import GrpcAioInstrumentorClient
     
    -print "Hello, world!\n";    # old way
    -say "Hello, world!";        # new way
    +# Auto-instrument frameworks
    +FastAPIInstrumentor.instrument_app(app)
    +GrpcAioInstrumentorClient().instrument()
    +
    +# Manual spans for custom operations
    +with tracer.start_as_current_span("llm.rag_completion") as span:
    +    span.set_attribute("llm.model", model_name)
    +    result = await generate_answer(query, context)
     

    -

    Lexical subs promote local reasoning


    +Auto-instrumentation is the quick win: one line of code and you get spans for every HTTP request, gRPC call, or database query. The instrumentor patches the framework at runtime, so existing code works without modification. The downside? You only get what the library authors decided to capture—generic HTTP attributes like http.method and http.status_code, but nothing domain-specific. Auto-instrumented spans also can't know your business logic, so a slow request shows up as "POST /api/search took 5 seconds" without revealing which internal operation caused the delay.

    -Lexical subroutines keep helpers close to the code that needs them. In Foostats::Logreader::parse_web_logs, functions such as my sub parse_date and my sub open_file live only inside that scope.
    +Manual spans fill that gap. By wrapping specific operations (like llm.rag_completion or vector_search.query), you get visibility into your application's unique behaviour. You can add custom attributes (llm.model, query.top_k, cache.hit) that make traces actually useful for debugging. The downside is maintenance: manual spans are code you write and maintain, and you need to decide where instrumentation adds value versus where it just adds noise. In practice, I found the right balance was auto-instrumentation for framework boundaries (HTTP, gRPC) plus manual spans for the 5-10 operations that actually matter for understanding performance.

    -This is an example of a lexical sub named trim, which is only visible within the outer sub named process_lines:
    +The magic is trace context propagation. When the Search UI calls the Search Service via gRPC, the trace ID travels in metadata headers:

    - -
    use v5.38;
    -
    -sub process_lines (@lines) {
    -    my sub trim ($str) {
    -        $str =~ s/^\s+|\s+$//gr;
    -    }
    -    return [ map { trim($_) } @lines ];
    -}
    +
    +Metadata: [
    +  ("traceparent", "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"),
    +  ("content-type", "application/grpc"),
    +]
    +
    +
    +Spans from all services are linked by this trace ID, forming a tree:
    +
    +
    +Trace ID: 0af7651916cd43dd8448eb211c80319c
     
    -my @raw = ("  foo  ", " bar", "baz ");
    -my $cleaned = process_lines(@raw);
    -say for @$cleaned; # prints "foo", "bar", "baz"
    +├─ [search-ui] POST /api/search (300ms)
    +│   │
    +│   ├─ [search-service] Search (gRPC server) (275ms)
    +│   │   │
    +│   │   ├─ [search-service] embedding.generate (50ms)
    +│   │   │   └─ [embedding-service] Embed (45ms)
    +│   │   │       └─ POST https://api.openai.com (35ms)
    +│   │   │
    +│   │   ├─ [search-service] vector_search.query (100ms)
    +│   │   │
    +│   │   └─ [search-service] llm.rag_completion (120ms)
    +│           └─ openai.chat (115ms)
     

    -

    Reference aliasing makes intent explicit


    +

    Alloy configuration for traces



    -Reference aliasing can be enabled with use feature qw(refaliasing) and helps communicate intent more clearly (if you remember the Perl syntax, of course—otherwise, it can look rather cryptic). The filter starts with \my $uri_path = \$event->{uri_path} so any later modification touches the original event. This is an example with ref aliasing in action:
    +Traces are collected by Alloy and stored in Grafana Tempo. Alloy batches traces for efficiency before exporting via OTLP:

    - -
    use feature qw(refaliasing);
    -
    -my $hash = { foo => 42 };
    -\my $foo = \$hash->{foo};
    +
    +otelcol.processor.batch "traces" {
    +  timeout = "5s"
    +  send_batch_size = 500
    +  output { traces = [otelcol.exporter.otlp.tempo.input] }
    +}
     
    -$foo = 99;
    -print $hash->{foo}; # prints 99
    +otelcol.exporter.otlp "tempo" {
    +  client {
    +    endpoint = "tempo.monitoring.svc.cluster.local:4317"
    +    tls { insecure = true }
    +  }
    +}
     

    -The aggregator in Foostats aliases $self->{stats}{$date_key} before updating counters, so the structure remains intact. Combined with subroutine signatures, this makes it obvious when a piece of data is shared instead of copied, preventing silent bugs. This enables having shorter names for long nested data structures.
    +In Tempo's UI, I can finally see exactly where time is spent. That 5-second query? Turns out the vector search was waiting on a cold Weaviate connection. Now I knew what to fix.

    -

    Persistent state without globals


    +

    Async ingestion trace walkthrough



    -A Perl state variable is declared with state $var and retains its value between calls to the enclosing subroutine. Foostats uses that for rate limiting and de-duplicated logging.
    +One of the most powerful aspects of distributed tracing is following requests across async boundaries like message queues. The document ingestion pipeline flows through Kafka, creating spans that are linked even though they execute in different processes at different times.

    -This is a small example demonstrating the use of a state variable in Perl:
    +

    Step 1: Ingest a document



    - -
    sub counter {
    -    state $count = 0;
    -    $count++;
    -    return $count;
    +
    +$ curl -s -X POST http://localhost:8082/ingest \
    +  -H "Content-Type: application/json" \
    +  -d '{
    +    "text": "This is the X-RAG Observability Guide...",
    +    "metadata": {
    +      "title": "X-RAG Observability Guide",
    +      "source_file": "docs/OBSERVABILITY.md",
    +      "type": "markdown"
    +    },
    +    "namespace": "default"
    +  }' | jq .
    +{
    +  "document_id": "8538656a-ba99-406c-8da7-87c5f0dda34d",
    +  "status": "accepted",
    +  "minio_bucket": "documents",
    +  "minio_key": "8538656a-ba99-406c-8da7-87c5f0dda34d.json",
    +  "message": "Document accepted for processing"
     }
    -
    -say counter(); # 1
    -say counter(); # 2
    -say counter(); # 3
     

    -Hash and array state variables have been supported since state arrived in Perl 5.10. Scalar state variables were already supported previously.
    +The ingestion API immediately returns—it doesn't wait for indexing. The document is stored in MinIO and a message is published to Kafka.

    -

    Rate limiting state


    +

    Step 2: Find the ingestion trace



    -In Foostats, state variables store run-specific state without using package globals. state %blocked remembers IP hashes that already triggered the odd-request filter, and state $last_time and state %count track how many requests an IP makes in the exact second.
    +Using Tempo's HTTP API (port 3200), we can search for traces by span name using TraceQL:

    -

    De-duplicated logging


    +
    +$ curl -s -G "http://localhost:3200/api/search" \
    +  --data-urlencode 'q={name="POST /ingest"}' \
    +  --data-urlencode 'limit=3' | jq '.traces[0].traceID'
    +"b3fc896a1cf32b425b8e8c46c86c76f7"
    +

    -state %dedup keeps the log output of the suspicious calls to one warning per URI. Early versions utilized global hashes for the same tasks, producing inconsistent results during tests. Switching to state removed those edge cases.
    +

    Step 3: Fetch the complete trace



    -

    Subroutine signatures


    +
    +$ curl -s "http://localhost:3200/api/traces/b3fc896a1cf32b425b8e8c46c86c76f7" \
    +  | jq '[.batches[] | ... | {service, span}] | unique'
    +[
    +  { "service": "ingestion-api", "span": "POST /ingest" },
    +  { "service": "ingestion-api", "span": "storage.upload" },
    +  { "service": "ingestion-api", "span": "messaging.publish" },
    +  { "service": "indexer", "span": "indexer.process_document" },
    +  { "service": "indexer", "span": "document.duplicate_check" },
    +  { "service": "indexer", "span": "document.pipeline" },
    +  { "service": "indexer", "span": "storage.download" },
    +  { "service": "indexer", "span": "/xrag.embedding.EmbeddingService/EmbedBatch" },
    +  { "service": "embedding-service", "span": "openai.embeddings" },
    +  { "service": "indexer", "span": "db.insert" }
    +]
    +

    -Perl now supports subroutine signatures like other modern languages do. Foostats uses them everywhere. Examples:
    +The trace spans three services: ingestion-api, indexer, and embedding-service. The trace context propagates through Kafka, linking the original HTTP request to the async consumer processing.

    - -
    # Old way
    -sub greet_old { my $name = shift; print "Hello, $name!\n" }
    -
    -# Another old way
    -sub greet_old2 ($) { my $name = shift; print "Hello, $name!\n" }
    -
    -# New way
    -sub greet ($name) { say "Hello, $name!"; }
    -
    -greet("Alice"); # prints "Hello, Alice!"
    +

    Step 4: Analyse the async trace


    +
    +
    +ingestion-api | POST /ingest             |   16ms  ← HTTP response returns
    +ingestion-api | storage.upload           |   13ms  ← Save to MinIO
    +ingestion-api | messaging.publish        |    1ms  ← Publish to Kafka
    +              |                          |         
    +              | ~~~ Kafka queue ~~~      |         ← Async boundary
    +              |                          |         
    +indexer       | indexer.process_document | 1799ms  ← Consumer picks up message
    +indexer       | document.duplicate_check |    1ms
    +indexer       | document.pipeline        | 1796ms
    +indexer       | storage.download         |    1ms  ← Fetch from MinIO
    +indexer       | EmbedBatch (gRPC)        |  754ms  ← Call embedding service
    +embedding-svc | openai.embeddings        |  752ms  ← OpenAI API
    +indexer       | db.insert                | 1038ms  ← Store in Weaviate
     

    -In Foostats, constructors declare sub new ($class, $odds_file, $log_path), anonymous callbacks expose sub ($event), and helper subs list the values they expect, e.g.:
    +The total async processing takes ~1.8 seconds, but the user sees a 16ms response. Without tracing, debugging "why isn't my document showing up in search results?" would require correlating logs from three services manually.

    - -
    my $anon = sub ($name) {
    -    say "Hello, $name!";
    -};
    -
    -$anon->("World"); # prints "Hello, World!"
    -
    +Key insight: The trace context propagates through Kafka message headers, allowing the indexer's spans to link back to the original ingestion request. This is configured via OpenTelemetry's Kafka instrumentation.

    -

    Defined-or assignment for defaults without boilerplate


    +

    Viewing traces in Grafana



    -The operator //= keeps configuration and counters simple. Environment variables may be missing when CRON runs the script, so //=, combined with signatures, sets defaults without warnings. Example use of that operator:
    +To view a trace in Grafana's UI:

    - -
    my $foo;
    -$foo //= 42;
    -say $foo; # prints 42
    -
    -$foo //= 99;
    -say $foo; # still prints 42, because $foo was already defined
    -
    +1. Open Grafana at http://localhost:3000/explore
    +2. Select Tempo as the data source (top-left dropdown)
    +3. Choose TraceQL as the query type
    +4. Paste the trace ID: b3fc896a1cf32b425b8e8c46c86c76f7
    +5. Click Run query

    -

    Cleanup with defer


    +The trace viewer shows a Gantt chart with all spans, their timing, and parent-child relationships. Click any span to see its attributes.

    -Even though not used in Foostats, this feature (similar to Go's defer) is neat to have in Perl now.
    +Async ingestion trace in Grafana Tempo

    -The defer block (use feature 'defer") schedules a piece of code to run when the current scope exits, regardless of how it exits (e.g. normal return, exception). This is perfect for ensuring resources, such as file handles, are closed.
    +Ingestion trace node graph showing service dependencies

    - -
    use feature qw(defer);
    -
    -sub parse_log_file ($path) {
    -    open my $fh, '<', $path or die "Cannot open $path: $!";
    -    defer { close $fh };
    -
    -    while (my $line = <$fh>) {
    -        # ... parsing logic that might throw an exception ...
    +

    End-to-end search trace walkthrough


    +
    +To demonstrate the observability stack in action, here's a complete trace from a search request through all services.
    +
    +

    Step 1: Make a search request


    +
    +Normally you'd use the Search UI web interface at http://localhost:8080, but for demonstration purposes curl makes it easier to show the raw request and response:
    +
    +
    +$ curl -s -X POST http://localhost:8080/api/search \
    +  -H "Content-Type: application/json" \
    +  -d '{"query": "What is RAG?", "namespace": "default", "mode": "hybrid", "top_k": 5}' | jq .
    +{
    +  "answer": "I don't have enough information to answer this question.",
    +  "sources": [
    +    {
    +      "id": "71adbc34-56c1-4f75-9248-4ed38094ac69",
    +      "content": "# X-RAG Observability Guide This document describes...",
    +      "score": 0.8292956352233887,
    +      "metadata": {
    +        "source": "docs/OBSERVABILITY.md",
    +        "type": "markdown",
    +        "namespace": "default"
    +      }
         }
    -    # $fh is automatically closed here
    +  ],
    +  "metadata": {
    +    "namespace": "default",
    +    "num_sources": "5",
    +    "cache_hit": "False",
    +    "mode": "hybrid",
    +    "top_k": "5",
    +    "trace_id": "9df981cac91857b228eca42b501c98c6"
    +  }
     }
     

    -This pattern replaces manual close calls in every exit path of the subroutine and is more robust than relying solely on object destructors.
    -
    -

    Builtins and booleans


    +The response includes a trace_id that links this request to all spans across services.

    -The script also utilizes other modern additions that often go unnoticed. use builtin qw(true false); combined with experimental::builtin provides more real boolean values.
    +

    Step 2: Query Tempo for the trace



    -

    Conclusion


    +Using the trace ID from the response, query Tempo's API:

    -I want to code more in Perl again. The newer features make it a joy to write small scripts like Foostats. If you haven't looked at Perl in a while, give it another try! The main thing which holds me back from writing more Perl is the lack of good tooling. For example, there is no proper LSP and tree sitter support available, which would work as good as the ones available for Go and Ruby.
    +
    +$ curl -s "http://localhost:3200/api/traces/9df981cac91857b228eca42b501c98c6" \
    +  | jq '.batches[].scopeSpans[].spans[] 
    +        | {name, service: .attributes[] 
    +           | select(.key=="service.name") 
    +           | .value.stringValue}'
    +

    -A reader pointed out that there's now a third-party Perl Tree-sitter implementation one could use:
    +The raw trace shows spans from multiple services:

    -https://github.com/tree-sitter-perl/tree-sitter-perl
    +
      +
    • search-ui: POST /api/search (root span, 2138ms total)
    • +
    • search-ui: /xrag.search.SearchService/Search (gRPC client call)
    • +
    • search-service: /xrag.search.SearchService/Search (gRPC server)
    • +
    • search-service: /xrag.embedding.EmbeddingService/Embed (gRPC client)
    • +
    • embedding-service: /xrag.embedding.EmbeddingService/Embed (gRPC server)
    • +
    • embedding-service: openai.embeddings (OpenAI API call, 647ms)
    • +
    • embedding-service: POST https://api.openai.com/v1/embeddings (HTTP client)
    • +
    • search-service: vector_search.query (Weaviate hybrid search, 13ms)
    • +
    • search-service: openai.chat (LLM answer generation, 1468ms)
    • +
    • search-service: POST https://api.openai.com/v1/chat/completions (HTTP client)
    • +

    +

    Step 3: Analyse the trace



    -E-Mail your comments to paul@nospam.buetow.org :-)
    +From this single trace, I can see exactly where time is spent:

    -Other related posts are:
    +
    +Total request:                     2138ms
    +├── gRPC to search-service:        2135ms
    +│   ├── Embedding generation:       649ms
    +│   │   └── OpenAI embeddings API:   640ms
    +│   ├── Vector search (Weaviate):    13ms
    +│   └── LLM answer generation:     1468ms
    +│       └── OpenAI chat API:       1463ms
    +

    -2025-11-02 Perl New Features and Foostats (You are currently reading this)
    -2023-05-01 Unveiling guprecords.raku: Global Uptime Records with Raku
    -2022-05-27 Perl is still a great choice
    -2011-05-07 Perl Daemon (Service Framework)
    -2008-06-26 Perl Poetry
    +The bottleneck is clear: 68% of time is spent in LLM answer generation. The vector search (13ms) and embedding generation (649ms) are relatively fast. Without tracing, I would have guessed the embedding service was slow—traces proved otherwise.

    -Back to the main site
    -
    -
    -
    - - Key Takeaways from The Well-Grounded Rubyist - - gemini://foo.zone/gemfeed/2025-10-11-key-takeaways-from-the-well-grounded-rubyist.gmi - 2025-10-11T15:25:14+03:00 - - Paul Buetow aka snonux - paul@dev.buetow.org - - Some time ago, I wrote about my journey into Ruby and how 'The Well-Grounded Rubyist' helped me to get a better understanding of the language. I took a lot of notes while reading the book, and I think it's time to share some of them. This is not a comprehensive review, but rather a collection of interesting tidbits and concepts that stuck with me. - -
    -

    Key Takeaways from The Well-Grounded Rubyist


    +

    Step 4: Search traces with TraceQL



    -Published at 2025-10-11T15:25:14+03:00
    +Tempo supports TraceQL for querying traces by attributes:

    -Some time ago, I wrote about my journey into Ruby and how "The Well-Grounded Rubyist" helped me to get a better understanding of the language. I took a lot of notes while reading the book, and I think it's time to share some of them. This is not a comprehensive review, but rather a collection of interesting tidbits and concepts that stuck with me.
    +
    +$ curl -s -G "http://localhost:3200/api/search" \
    +  --data-urlencode 'q={resource.service.name="search-service"}' \
    +  --data-urlencode 'limit=5' | jq '.traces[:2] | .[].rootTraceName'
    +"/xrag.search.SearchService/Search"
    +"GET /health/ready"
    +

    -

    Table of Contents


    +Other useful TraceQL queries:

    -
    -My first post about the book.
    +
    +# Find slow searches (> 2 seconds)
    +{resource.service.name="search-ui" && name="POST /api/search"} | duration > 2s
    +
    +# Find errors
    +{status=error}
    +
    +# Find OpenAI calls
    +{name=~"openai.*"}
    +

    -
    +

    Viewing the search trace in Grafana



    -

    The Object Model


    +Follow the same steps as above, but use the search trace ID: 9df981cac91857b228eca42b501c98c6

    -One of the most fascinating aspects of Ruby is its object model. The book does a great job of explaining the details.
    +Search trace in Grafana Tempo

    -

    Everything is an object (almost)


    +Search trace node graph showing service flow

    -In Ruby, most things are objects. This includes numbers, strings, and even classes themselves. This has some interesting consequences. For example, you can't use i++ like in C or Java. Integers are immutable objects. 1 is always the same object. 1 + 1 returns a new object, 2.
    +

    Correlating the three signals



    -

    The self keyword


    +The real power comes from correlating traces, metrics, and logs. When an alert fires for high error rate, I follow this workflow:

    -There is always a current object, self. If you call a method without an explicit receiver, it's called on self. For example, puts "hello" is actually self.puts "hello".
    +1. Metrics: Prometheus shows error spike started at 10:23:00
    +2. Traces: Query Tempo for traces with status=error around that time
    +3. Logs: Use the trace ID to find detailed error messages in Loki

    - -
    # At the top level, self is the main object
    -p self
    -# => main
    -p self.class
    -# => Object
    -
    -def foo
    -  # Inside a method, self is the object that received the call
    -  p self
    -end
    -
    -foo
    -# => main
    +
    +{namespace="rag-system"} |= "trace_id=abc123" |= "error"
     

    -This code demonstrates how self changes depending on the context. At the top level, it's main, an instance of Object. When foo is called without a receiver, it's called on main.
    +Prometheus exemplars link specific metric samples to trace IDs, so I can click directly from a latency spike to the responsible trace.

    -

    Singleton Methods


    +

    Grafana dashboards



    -You can add methods to individual objects. These are called singleton methods.
    +During the hackathon, I also created six pre-built Grafana dashboards that are automatically provisioned when the monitoring stack starts:

    - -
    obj = "a string"
    -
    -def obj.shout
    -  self.upcase + "!"
    -end
    -
    -p obj.shout
    -# => "A STRING!"
    -
    -obj2 = "another string"
    -# obj2.shout would raise a NoMethodError
    -
    +| Dashboard | Description |
    +|-----------|-------------|
    +| **X-RAG Overview** | The main dashboard with 22 panels covering request rates, latencies, error rates, and service health across all X-RAG components |
    +| **OpenTelemetry HTTP Metrics** | HTTP request/response metrics from OpenTelemetry-instrumented services—request rates, latency percentiles, and status code breakdowns |
    +| **Pod System Metrics** | Kubernetes pod resource utilisation: CPU usage, memory consumption, network I/O, disk I/O, and pod state from kube-state-metrics |
    +| **Redis** | Cache performance: memory usage, hit/miss rates, commands per second, connected clients, and memory fragmentation |
    +| **Kafka** | Message queue health: consumer lag (critical for indexer monitoring), broker status, topic partitions, and throughput |
    +| **MinIO** | Object storage metrics: S3 request rates, error counts, traffic volume, bucket sizes, and disk usage |

    -Here, the shout method is only available on the obj object. This is a powerful feature for adding behavior to specific instances.
    +All dashboards are stored as JSON files in infra/k8s/monitoring/grafana-dashboards/ and deployed via ConfigMaps, so they survive pod restarts and cluster recreations.

    -

    Classes are Objects


    +X-RAG Overview dashboard
    +Pod System Metrics dashboard

    -Classes themselves are objects, instances of the Class class. This means you can create classes dynamically.
    +

    Results: two days well spent



    - -
    MyClass = Class.new do
    -  def say_hello
    -    puts "Hello from a dynamically created class!"
    -  end
    -end
    -
    -instance = MyClass.new
    -instance.say_hello
    -# => Hello from a dynamically created class!
    -
    +What did two days of hackathon work achieve? The system went from flying blind to fully instrumented:

    -This shows how to create a new class and assign it to a constant. This is what happens behind the scenes when you use the class keyword.
    +
      +
    • All three pillars implemented: logs (Loki), metrics (Prometheus), traces (Tempo)
    • +
    • Unified collection via Grafana Alloy
    • +
    • Infrastructure metrics for Kafka, Redis, and MinIO
    • +
    • Six pre-built Grafana dashboards covering application metrics, pod resources, and infrastructure
    • +
    • Trace context propagation across all gRPC calls
    • +

    +The biggest insight from testing? The embedding service wasn't the bottleneck I assumed. Traces revealed that LLM synthesis dominated latency, not embedding generation. Without tracing, optimisation efforts would have targeted the wrong component.

    -

    Control Flow and Methods


    +Beyond the technical wins, I had a lot of fun. The hackathon brought together people working on different projects, and I got to know some really nice folks during the sessions themselves. There's something energising about being in a (virtual) room with other people all heads-down on their own challenges—even if you're not collaborating directly, the shared focus is motivating.

    -The book clarified many things about how methods and control flow work in Ruby.
    +

    SLIs, SLOs and SLAs



    -

    case and the === operator


    +The system now has full observability, but there's always more. And to be clear: this is not production-grade yet. It works well for development and could scale to production, but that would need to be validated with proper load testing and chaos testing first. We haven't stress-tested the observability pipeline under heavy load, nor have we tested failure scenarios like Tempo going down or Alloy running out of memory. The Alloy config includes comments on sampling strategies and rate limiting that would be essential for high-traffic environments.

    -The case statement is more powerful than I thought. It uses the === (threequals or case equality) operator for comparison, not ==. Different classes can implement === in their own way.
    +One thing we didn't cover: monitoring and alerting. These are related but distinct from observability. Observability is about collecting and exploring data to understand system behaviour. Monitoring is about defining thresholds and alerting when they're breached. We have Prometheus with all the metrics, but no alerting rules yet—no PagerDuty integration, no Slack notifications when latency spikes or error rates climb.

    - -
    # For ranges, it checks for inclusion
    -p (1..5) === 3 # => true
    -
    -# For classes, it checks if the object is an instance of the class
    -p String === "hello" # => true
    -
    -# For regexes, it checks for a match
    -p /llo/ === "hello" # => true
    -
    -def check(value)
    -  case value
    -  when String
    -    "It's a string"
    -  when (1..10)
    -    "It's a number between 1 and 10"
    -  else
    -    "Something else"
    -  end
    -end
    -
    -p check(5) # => "It's a number between 1 and 10"
    -
    +We also didn't define any SLIs (Service Level Indicators) or SLOs (Service Level Objectives). An SLI is a quantitative measure of service quality—for example, "99th percentile search latency" or "percentage of requests returning successfully." An SLO is a target for that indicator—"99th percentile latency should be under 2 seconds" or "99.9% of requests should succeed." Without SLOs, you don't know what "good" looks like, and alerting becomes arbitrary.

    -

    Blocks and yield


    +For X-RAG specifically, potential SLOs might include:

    -Blocks are a cornerstone of Ruby. You can pass them to methods to customize their behavior. The yield keyword is used to call the block.
    +
      +
    • Search latency: 99th percentile over 5 minutes search response time under 3 seconds
    • +
    • Uptime: 99.9% availability of the search API endpoint
    • +
    • Response quality: How good was the search? There are some metrics which could be used...
    • +

    +SLAs (Service Level Agreements) are often confused with SLOs, but they're different. An SLA is a contractual commitment to customers—a legally binding promise with consequences (refunds, credits, penalties) if you fail to meet it. SLOs are internal engineering targets; SLAs are external business promises. Typically, SLAs are less strict than SLOs: if your internal target is 99.9% availability (SLO), your customer contract might promise 99.5% (SLA), giving you a buffer before you owe anyone money.

    - -
    def my_iterator
    -  puts "Entering the method"
    -  yield
    -  puts "Back in the method"
    -  yield
    -end
    -
    -my_iterator { puts "Inside the block" }
    -# Entering the method
    -# Inside the block
    -# Back in the method
    -# Inside the block
    -
    +But then again, X-RAG is a proof-of-concept, a prototype, a learning system—there are no real customers to disappoint. SLOs would become essential if this ever served actual users, and SLAs would follow once there's a business relationship to protect.

    -This simple iterator shows how yield transfers control to the block. You can also pass arguments to yield and get a return value from the block.
    +

    Using Amp for AI-assisted development



    - -
    def with_return
    -  result = yield(5)
    -  puts "The block returned #{result}"
    -end
    -
    -with_return { |n| n * 2 }
    -# => The block returned 10
    -
    +I used Amp (formerly Ampcode) throughout this project. While I knew what I wanted to achieve, I let the LLM generate the actual configurations, Kubernetes manifests, and Python instrumentation code.

    -This demonstrates passing an argument to the block and using its return value.
    +Amp - AI coding agent by Sourcegraph

    -

    Fun with Data Types


    +My workflow was step-by-step rather than handing over a grand plan:

    -Ruby's core data types are full of nice little features.
    +1. "Deploy Grafana Alloy to the monitoring namespace"
    +2. "Verify Alloy is running and receiving data"
    +3. "Document what we did to docs/OBSERVABILITY.md"
    +4. "Commit with message 'feat: add Grafana Alloy for telemetry collection'"
    +5. Hand off context, start fresh: "Now instrument the search-ui with OpenTelemetry to push traces to Alloy..."

    -

    Symbols


    +Chaining many small, focused tasks worked better than one massive plan. Each task had clear success criteria, and I could verify results before moving on. The LLM generated the River configuration, the OpenTelemetry Python code, the Kubernetes manifests—I reviewed, tweaked, and committed.

    -Symbols are like immutable strings. They are great for keys in hashes because they are unique and memory-efficient.
    +I only ran out of the 200k token context window once, during a debugging session that involved restarting the Kubernetes cluster multiple times. The fix required correlating error messages across several services, and the conversation history grew too long. Starting a fresh context and summarising the problem solved it.

    - -
    # Two strings with the same content are different objects
    -p "foo".object_id
    -p "foo".object_id
    -
    -# Two symbols with the same content are the same object
    -p :foo.object_id
    -p :foo.object_id
    -
    -# Modern hash syntax uses symbols as keys
    -my_hash = { name: "Paul", language: "Ruby" }
    -p my_hash[:name] # => "Paul"
    -
    +Amp automatically selects the best model for the task at hand. Based on the response speed and Sourcegraph's recent announcements, I believe it was using Claude Opus 4.5 for most of my coding and infrastructure work. The quality was excellent—it understood Python, Kubernetes, OpenTelemetry, and Grafana tooling without much hand-holding.

    -This code highlights the difference between strings and symbols and shows the convenient hash syntax.
    +Let me be clear: without the LLM, I'd never have managed to write all these configuration files by hand in two days. The Alloy config alone is 1400+ lines. But I also reviewed and verified every change manually, verified it made sense, and understood what was being deployed. This wasn't vibe-coding—the whole point of the hackathon was to learn. I already knew Grafana and Prometheus from previous work, but OpenTelemetry, Alloy, Tempo, Loki and the X-RAG system overall were all pretty new to me. By reviewing each generated config and understanding why it was structured that way, I actually learned the tools rather than just deploying magic incantations.

    -

    Arrays and Hashes


    +Cost-wise, I spent around 20 USD on Amp credits over the two-day hackathon. For the amount of code generated, configs reviewed, and debugging assistance—that's remarkably affordable.

    -Arrays and hashes have a rich API. The %w and %i shortcuts for creating arrays of strings and symbols are very handy.
    +

    Other changes along the way



    - -
    # Array of strings
    -p %w[one two three]
    -# => ["one", "two", "three"]
    -
    -# Array of symbols
    -p %i[one two three]
    -# => [:one, :two, :three]
    -
    +Looking at the git history, I made 25 commits during the hackathon. Beyond the main observability features, there were several smaller but useful additions:

    -A quick way to create arrays. You can also retrieve multiple values at once.
    +OBSERVABILITY_ENABLED flag: Added an environment variable to completely disable the monitoring stack. Set OBSERVABILITY_ENABLED=false in .env and the cluster starts without Prometheus, Grafana, Tempo, Loki, or Alloy. Useful when you just want to work on application code without the overhead.

    - -
    arr = [10, 20, 30, 40, 50]
    -p arr.values_at(0, 2, 4)
    -# => [10, 30, 50]
    -
    -hash = { a: 1, b: 2, c: 3 }
    -p hash.values_at(:a, :c)
    -# => [1, 3]
    -
    +Load generator: Added a make load-gen target that fires concurrent requests at the search API. Useful for generating enough trace data to see patterns in Tempo, and for stress-testing the observability pipeline itself.

    -The values_at method is a concise way to get multiple elements.
    +Verification scripts: Created scripts to test that OTLP is actually reaching Alloy and that traces appear in Tempo. Debugging "why aren't my traces showing up?" is frustrating without a systematic way to verify each hop in the pipeline.

    -

    Final Thoughts


    +Moving monitoring to dedicated namespace: Refactored from having observability components scattered across namespaces to a clean monitoring namespace. Makes kubectl get pods -n monitoring show exactly what's running for observability.

    -These are just a few of the many things I learned from "The Well-Grounded Rubyist". The book gave me a much deeper appreciation for the language and its design. If you are a Ruby programmer, I highly recommend it. Meanwhile, I also read the book "Programming Ruby 3.3", just I didn't have time to process my notes there yet.
    +

    Lessons learned



    -E-Mail your comments to paul@nospam.buetow.org :-)
    +
      +
    • Start with metrics, but don't stop there—they tell you *what*, not *why*
    • +
    • Trace context propagation is the key to distributed debugging
    • +
    • Grafana Alloy as a unified collector simplifies the pipeline
    • +
    • Infrastructure metrics matter—your app is only as fast as your data layer
    • +
    • The three pillars work together; none is sufficient alone
    • +

    +All manifests and observability code live in Florian's repository:

    -Other Ruby-related posts:
    +X-RAG on GitHub (source code, K8s manifests, observability configs)

    -2026-03-02 RCM: The Ruby Configuration Management DSL
    -2025-10-11 Key Takeaways from The Well-Grounded Rubyist (You are currently reading this)
    -2021-07-04 The Well-Grounded Rubyist
    +The best part? Everything I learned during this hackathon—OpenTelemetry instrumentation, Grafana Alloy configuration, trace context propagation, PromQL queries—I can immediately apply at work as we are shifting to that new observability stack and I am going to have a few meetings talking with developers how and what they need to implement for application instrumentalization. Observability patterns are universal, and hands-on experience with a real distributed system beats reading documentation any day.
    +
    +E-Mail your comments to paul@nospam.buetow.org

    Back to the main site
    - f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments - - gemini://foo.zone/gemfeed/2025-10-02-f3s-kubernetes-with-freebsd-part-7.gmi - 2025-10-02T11:27:19+03:00, last updated Tue 30 Dec 10:11:58 EET 2025 + f3s: Kubernetes with FreeBSD - Part 8: Observability + + gemini://foo.zone/gemfeed/2025-12-07-f3s-kubernetes-with-freebsd-part-8.gmi + 2025-12-06T23:58:24+02:00, last updated Mon 09 Mar 09:33:08 EET 2026 Paul Buetow aka snonux paul@dev.buetow.org - This is the seventh blog post about the f3s series for my self-hosting demands in a home lab. f3s? The 'f' stands for FreeBSD, and the '3s' stands for k3s, the Kubernetes distribution I use on FreeBSD-based physical machines. + This is the 8th blog post about the f3s series for my self-hosting demands in a home lab. f3s? The 'f' stands for FreeBSD, and the '3s' stands for k3s, the Kubernetes distribution I use on FreeBSD-based physical machines.
    -

    f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments


    +

    f3s: Kubernetes with FreeBSD - Part 8: Observability



    -Published at 2025-10-02T11:27:19+03:00, last updated Tue 30 Dec 10:11:58 EET 2025
    +Published at 2025-12-06T23:58:24+02:00, last updated Mon 09 Mar 09:33:08 EET 2026

    -This is the seventh blog post about the f3s series for my self-hosting demands in a home lab. f3s? The "f" stands for FreeBSD, and the "3s" stands for k3s, the Kubernetes distribution I use on FreeBSD-based physical machines.
    +This is the 8th blog post about the f3s series for my self-hosting demands in a home lab. f3s? The "f" stands for FreeBSD, and the "3s" stands for k3s, the Kubernetes distribution I use on FreeBSD-based physical machines.

    2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    @@ -6235,9964 +4537,11538 @@ p hash.values_at(:a, :c) 2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    -2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments (You are currently reading this)
    -2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability
    +2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    +2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability (You are currently reading this)

    f3s logo

    Table of Contents




    Introduction



    -In this blog post, I am finally going to install k3s (the Kubernetes distribution I use) to the whole setup and deploy the first workloads (helm charts, and a private registry) to it.
    +In this blog post, I set up a complete observability stack for the k3s cluster. Observability is crucial for understanding what's happening inside the cluster—whether its tracking resource usage, debugging issues, or analysing application behaviour. The stack consists of five main components, all deployed into the monitoring namespace:

    -https://k3s.io
    +
      +
    • Prometheus: time-series database for metrics collection and alerting
    • +
    • Grafana: visualisation and dashboarding frontend
    • +
    • Loki: log aggregation system (like Prometheus, but for logs)
    • +
    • Alloy: telemetry collector that ships logs and traces from all pods to Loki and Tempo
    • +
    • Tempo: distributed tracing backend for request flow analysis across microservices
    • +

    +Together, these form the "PLG" stack (Prometheus, Loki, Grafana) extended with Tempo for distributed tracing, which is a popular open-source alternative to commercial observability platforms.
    +
    +All manifests for the f3s stack live in my configuration repository:
    +
    +codeberg.org/snonux/conf/f3s

    Important Note: GitOps Migration



    -**Note:** After publishing this blog post, the f3s cluster was migrated from imperative Helm deployments to declarative GitOps using ArgoCD. The Kubernetes manifests and Helm charts in the repository have been reorganized for ArgoCD-based continuous deployment.
    +**Note:** After publishing this blog post, the f3s cluster was migrated from imperative Helm deployments to declarative GitOps using ArgoCD. The Kubernetes manifests, Helm charts, and Justfiles in the repository have been reorganized for ArgoCD-based continuous deployment.

    -**To view the exact manifests and charts as they existed when this blog post was written** (before the ArgoCD migration), check out the pre-ArgoCD revision:
    +**To view the exact configuration as it existed when this blog post was written** (before the ArgoCD migration), check out the pre-ArgoCD revision:

    -
    $ git clone https://codeberg.org/snonux/conf.git
    -$ cd conf
    -$ git checkout 15a86f3  # Last commit before ArgoCD migration
    -$ cd f3s/
    +
    $ git clone https://codeberg.org/snonux/conf.git
    +$ cd conf
    +$ git checkout 15a86f3  # Last commit before ArgoCD migration
    +$ cd f3s/prometheus/
     

    **Current master branch** contains the ArgoCD-managed versions with:
    • Application manifests organized under argocd-apps/{monitoring,services,infra,test}/
    • -
    • Additional resources under */manifests/ directories (e.g., prometheus/manifests/)
    • +
    • Resources organized under prometheus/manifests/, loki/, etc.
    • Justfiles updated to trigger ArgoCD syncs instead of direct Helm commands

    -The deployment concepts and architecture remain the same—only the deployment method changed from imperative (helm install/upgrade) to declarative (GitOps with ArgoCD).
    +The deployment concepts and architecture remain the same—only the deployment method changed from imperative (helm install/upgrade) to declarative (GitOps with ArgoCD).

    -

    Updating


    +

    Persistent storage recap



    -Before proceeding, I bring all systems involved up-to-date. On all three Rocky Linux 9 boxes r0, r1, and r2:
    +All observability components need persistent storage so that metrics and logs survive pod restarts. As covered in Part 6 of this series, the cluster uses NFS-backed persistent volumes:

    - -
    dnf update -y
    -reboot
    -
    +f3s: Kubernetes with FreeBSD - Part 6: Storage

    -On the FreeBSD hosts, I upgraded from FreeBSD 14.2 to 14.3-RELEASE, running this on all three hosts f0, f1 and f2:
    +The FreeBSD hosts (f0, f1) serve as master-standby NFS servers, exporting ZFS datasets that are replicated across hosts using zrepl. The Rocky Linux k3s nodes (r0, r1, r2) mount these exports at /data/nfs/k3svolumes. This directory contains subdirectories for each application that needs persistent storage—including Prometheus, Grafana, and Loki.

    - -
    paul@f0:~ % doas freebsd-update fetch
    -paul@f0:~ % doas freebsd-update install
    -paul@f0:~ % doas reboot
    -.
    -.
    -.
    -paul@f0:~ % doas freebsd-update -r 14.3-RELEASE upgrade
    -paul@f0:~ % doas freebsd-update install
    -paul@f0:~ % doas freebsd-update install
    -paul@f0:~ % doas reboot
    -.
    -.
    -.
    -paul@f0:~ % doas freebsd-update install
    -paul@f0:~ % doas pkg update
    -paul@f0:~ % doas pkg upgrade
    -paul@f0:~ % doas reboot
    -.
    -.
    -.
    -paul@f0:~ % uname -a
    -FreeBSD f0.lan.buetow.org 14.3-RELEASE FreeBSD 14.3-RELEASE
    -        releng/14.3-n271432-8c9ce319fef7 GENERIC amd64
    -
    +For example, the observability stack uses these paths on the NFS share:

    -

    Installing k3s


    +
      +
    • /data/nfs/k3svolumes/prometheus/data — Prometheus time-series database
    • +
    • /data/nfs/k3svolumes/grafana/data — Grafana configuration, dashboards, and plugins
    • +
    • /data/nfs/k3svolumes/loki/data — Loki log chunks and index
    • +
    • /data/nfs/k3svolumes/tempo/data — Tempo trace data and WAL
    • +

    +Each path gets a corresponding PersistentVolume and PersistentVolumeClaim in Kubernetes, allowing pods to mount them as regular volumes. Because the underlying storage is ZFS with replication, we get snapshots and redundancy for free.

    -

    Generating K3S_TOKEN and starting the first k3s node


    +

    The monitoring namespace



    -I generated the k3s token on my Fedora laptop with pwgen -n 32 and selected one of the results. Then, on all three r hosts, I ran the following (replace SECRET_TOKEN with the actual secret):
    +First, I created the monitoring namespace where all observability components will live:

    -
    [root@r0 ~]# echo -n SECRET_TOKEN > ~/.k3s_token
    +
    $ kubectl create namespace monitoring
    +namespace/monitoring created
     

    -The following steps are also documented on the k3s website:
    +

    Installing Prometheus and Grafana



    -https://docs.k3s.io/datastore/ha-embedded
    +Prometheus and Grafana are deployed together using the kube-prometheus-stack Helm chart from the Prometheus community. This chart bundles Prometheus, Grafana, Alertmanager, and various exporters (Node Exporter, Kube State Metrics) into a single deployment. Ill explain what each component does in detail later when we look at the running pods.

    -To bootstrap k3s on the first node, I ran this on r0:
    +

    Prerequisites


    +
    +Add the Prometheus Helm chart repository:

    -
    [root@r0 ~]# curl -sfL https://get.k3s.io | K3S_TOKEN=$(cat ~/.k3s_token) \
    -        sh -s - server --cluster-init \
    -        --node-ip=192.168.2.120 \
    -        --advertise-address=192.168.2.120 \
    -        --tls-san=r0.wg0.wan.buetow.org
    -[INFO]  Finding release for channel stable
    -[INFO]  Using v1.32.6+k3s1 as release
    -.
    -.
    -.
    -[INFO]  systemd: Starting k3s
    +
    $ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    +$ helm repo update
     

    -Note: The --node-ip and --advertise-address flags are important to ensure that the embedded etcd cluster communicates over the WireGuard interface (192.168.2.x) rather than the LAN interface (192.168.1.x). This ensures that all control plane traffic is encrypted via WireGuard.
    -
    -

    Adding the remaining nodes to the cluster


    -
    -Then I ran on the other two nodes r1 and r2:
    +Create the directories on the NFS server for persistent storage:

    -
    [root@r1 ~]# curl -sfL https://get.k3s.io | K3S_TOKEN=$(cat ~/.k3s_token) \
    -        sh -s - server --server https://r0.wg0.wan.buetow.org:6443 \
    -        --node-ip=192.168.2.121 \
    -        --advertise-address=192.168.2.121 \
    -        --tls-san=r1.wg0.wan.buetow.org
    -
    -[root@r2 ~]# curl -sfL https://get.k3s.io | K3S_TOKEN=$(cat ~/.k3s_token) \
    -        sh -s - server --server https://r0.wg0.wan.buetow.org:6443 \
    -        --node-ip=192.168.2.122 \
    -        --advertise-address=192.168.2.122 \
    -        --tls-san=r2.wg0.wan.buetow.org
    -.
    -.
    -.
    -
    +
    [root@r0 ~]# mkdir -p /data/nfs/k3svolumes/prometheus/data
    +[root@r0 ~]# mkdir -p /data/nfs/k3svolumes/grafana/data
     

    -Once done, I had a three-node Kubernetes cluster control plane:
    +

    Deploying with the Justfile


    +
    +The configuration repository contains a Justfile that automates the deployment. just is a handy command runner—think of it as a simpler, more modern alternative to make. I use it throughout the f3s repository to wrap repetitive Helm and kubectl commands:
    +
    +just - A handy way to save and run project-specific commands
    +codeberg.org/snonux/conf/f3s/prometheus
    +
    +To install everything:

    -
    [root@r0 ~]# kubectl get nodes
    -NAME                STATUS   ROLES                       AGE     VERSION
    -r0.lan.buetow.org   Ready    control-plane,etcd,master   4m44s   v1.32.6+k3s1
    -r1.lan.buetow.org   Ready    control-plane,etcd,master   3m13s   v1.32.6+k3s1
    -r2.lan.buetow.org   Ready    control-plane,etcd,master   30s     v1.32.6+k3s1
    -
    -[root@r0 ~]# kubectl get pods --all-namespaces
    -NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
    -kube-system   coredns-5688667fd4-fs2jj                  1/1     Running     0          5m27s
    -kube-system   helm-install-traefik-crd-f9hgd            0/1     Completed   0          5m27s
    -kube-system   helm-install-traefik-zqqqk                0/1     Completed   2          5m27s
    -kube-system   local-path-provisioner-774c6665dc-jqlnc   1/1     Running     0          5m27s
    -kube-system   metrics-server-6f4c6675d5-5xpmp           1/1     Running     0          5m27s
    -kube-system   svclb-traefik-411cec5b-cdp2l              2/2     Running     0          78s
    -kube-system   svclb-traefik-411cec5b-f625r              2/2     Running     0          4m58s
    -kube-system   svclb-traefik-411cec5b-twrd7              2/2     Running     0          4m2s
    -kube-system   traefik-c98fdf6fb-lt6fx                   1/1     Running     0          4m58s
    +
    $ cd conf/f3s/prometheus
    +$ just install
    +kubectl apply -f persistent-volumes.yaml
    +persistentvolume/prometheus-data-pv created
    +persistentvolume/grafana-data-pv created
    +persistentvolumeclaim/grafana-data-pvc created
    +helm install prometheus prometheus-community/kube-prometheus-stack \
    +    --namespace monitoring -f persistence-values.yaml
    +NAME: prometheus
    +LAST DEPLOYED: ...
    +NAMESPACE: monitoring
    +STATUS: deployed
     

    -In order to connect with kubectl from my Fedora laptop, I had to copy /etc/rancher/k3s/k3s.yaml from r0 to ~/.kube/config and then replace the value of the server field with r0.lan.buetow.org. kubectl can now manage the cluster. Note that this step has to be repeated when I want to connect to another node of the cluster (e.g. when r0 is down).
    -
    -

    Test deployments


    +The persistence-values.yaml configures Prometheus and Grafana to use the NFS-backed persistent volumes I mentioned earlier, ensuring data survives pod restarts. It also enables scraping of etcd and kube-controller-manager metrics:

    -

    Test deployment to Kubernetes


    +
    +kubeEtcd:
    +  enabled: true
    +  endpoints:
    +    - 192.168.2.120
    +    - 192.168.2.121
    +    - 192.168.2.122
    +  service:
    +    enabled: true
    +    port: 2381
    +    targetPort: 2381
    +
    +kubeControllerManager:
    +  enabled: true
    +  endpoints:
    +    - 192.168.2.120
    +    - 192.168.2.121
    +    - 192.168.2.122
    +  service:
    +    enabled: true
    +    port: 10257
    +    targetPort: 10257
    +  serviceMonitor:
    +    enabled: true
    +    https: true
    +    insecureSkipVerify: true
    +

    -Let's create a test namespace:
    +By default, k3s binds the controller-manager to localhost only and doesn't expose etcd metrics, so the "Kubernetes / Controller Manager" and "etcd" dashboards in Grafana will show no data. To fix both, add the following to /etc/rancher/k3s/config.yaml on each k3s server node:

    -
    > ~ kubectl create namespace test
    -namespace/test created
    -
    -> ~ kubectl get namespaces
    -NAME              STATUS   AGE
    -default           Active   6h11m
    -kube-node-lease   Active   6h11m
    -kube-public       Active   6h11m
    -kube-system       Active   6h11m
    -test              Active   5s
    -
    -> ~ kubectl config set-context --current --namespace=test
    -Context "default" modified.
    +
    [root@r0 ~]# cat >> /etc/rancher/k3s/config.yaml << 'EOF'
    +kube-controller-manager-arg:
    +  - bind-address=0.0.0.0
    +etcd-expose-metrics: true
    +EOF
    +[root@r0 ~]# systemctl restart k3s
     

    -And let's also create an Apache test pod:
    +Repeat for r1 and r2. After restarting all nodes, the controller-manager metrics endpoint will be accessible and etcd metrics are available on port 2381. Prometheus can now scrape both.
    +
    +Verify etcd metrics are exposed:

    -
    > ~ cat <<END > apache-deployment.yaml
    -# Apache HTTP Server Deployment
    -apiVersion: apps/v1
    -kind: Deployment
    -metadata:
    -  name: apache-deployment
    -spec:
    -  replicas: 1
    -  selector:
    -    matchLabels:
    -      app: apache
    -  template:
    -    metadata:
    -      labels:
    -        app: apache
    -    spec:
    -      containers:
    -      - name: apache
    -        image: httpd:latest
    -        ports:
    -        # Container port where Apache listens
    -        - containerPort: 80
    -END
    -
    -> ~ kubectl apply -f apache-deployment.yaml
    -deployment.apps/apache-deployment created
    -
    -> ~ kubectl get all
    -NAME                                     READY   STATUS    RESTARTS   AGE
    -pod/apache-deployment-5fd955856f-4pjmf   1/1     Running   0          7s
    -
    -NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
    -deployment.apps/apache-deployment   1/1     1            1           7s
    -
    -NAME                                           DESIRED   CURRENT   READY   AGE
    -replicaset.apps/apache-deployment-5fd955856f   1         1         1       7s
    +
    [root@r0 ~]# curl -s http://127.0.0.1:2381/metrics | grep etcd_server_has_leader
    +etcd_server_has_leader 1
     

    -Let's also create a service:
    +The full persistence-values.yaml and all other Prometheus configuration files are available on Codeberg:
    +
    +codeberg.org/snonux/conf/f3s/prometheus
    +
    +The persistent volume definitions bind to specific paths on the NFS share using hostPath volumes—the same pattern used for other services in Part 7:
    +
    +f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    +
    +

    Exposing Grafana via ingress


    +
    +The chart also deploys an ingress for Grafana, making it accessible at grafana.f3s.foo.zone. The ingress configuration follows the same pattern as other services in the cluster—Traefik handles the routing internally, while the OpenBSD edge relays terminate TLS and forward traffic through WireGuard.
    +
    +Once deployed, Grafana is accessible and comes pre-configured with Prometheus as a data source. You can verify the Prometheus service is running:

    -
    > ~ cat <<END > apache-service.yaml
    -apiVersion: v1
    -kind: Service
    -metadata:
    -  labels:
    -    app: apache
    -  name: apache-service
    -spec:
    -  ports:
    -    - name: web
    -      port: 80
    -      protocol: TCP
    -      # Expose port 80 on the service
    -      targetPort: 80
    -  selector:
    -  # Link this service to pods with the label app=apache
    -    app: apache
    -END
    -
    -> ~ kubectl apply -f apache-service.yaml
    -service/apache-service created
    -
    -> ~ kubectl get service
    -NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
    -apache-service   ClusterIP   10.43.249.165   <none>        80/TCP    4s
    +
    $ kubectl get svc -n monitoring prometheus-kube-prometheus-prometheus
    +NAME                                    TYPE        CLUSTER-IP      PORT(S)
    +prometheus-kube-prometheus-prometheus   ClusterIP   10.43.152.163   9090/TCP,8080/TCP
     

    -Now let's create an ingress:
    +Grafana connects to Prometheus using the internal service URL http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090. The default Grafana credentials are admin/prom-operator, which should be changed immediately after first login.

    -Note: I've modified the hosts listed in this example after I published this blog post to ensure that there aren't any bots scraping it.
    +Grafana dashboard showing Prometheus metrics
    +
    +Grafana dashboard showing cluster metrics
    +
    +Grafana etcd dashboard showing cluster health, RPC rate, disk sync duration, and peer round trip times
    +
    +

    Installing Loki and Alloy


    +
    +While Prometheus handles metrics, Loki handles logs. It's designed to be cost-effective and easy to operate—it doesn't index the contents of logs, only the metadata (labels), making it very efficient for storage.
    +
    +Alloy is Grafana's telemetry collector (the successor to Promtail). It runs as a DaemonSet on each node, tails container logs, and ships them to Loki.
    +
    +

    Prerequisites


    +
    +Create the data directory on the NFS server:

    -
    > ~ cat <<END > apache-ingress.yaml
    -
    -apiVersion: networking.k8s.io/v1
    -kind: Ingress
    -metadata:
    -  name: apache-ingress
    -  namespace: test
    -  annotations:
    -    spec.ingressClassName: traefik
    -    traefik.ingress.kubernetes.io/router.entrypoints: web
    -spec:
    -  rules:
    -    - host: f3s.foo.zone
    -      http:
    -        paths:
    -          - path: /
    -            pathType: Prefix
    -            backend:
    -              service:
    -                name: apache-service
    -                port:
    -                  number: 80
    -    - host: standby.f3s.foo.zone
    -      http:
    -        paths:
    -          - path: /
    -            pathType: Prefix
    -            backend:
    -              service:
    -                name: apache-service
    -                port:
    -                  number: 80
    -    - host: www.f3s.foo.zone
    -      http:
    -        paths:
    -          - path: /
    -            pathType: Prefix
    -            backend:
    -              service:
    -                name: apache-service
    -                port:
    -                  number: 80
    -END
    -
    -> ~ kubectl apply -f apache-ingress.yaml
    -ingress.networking.k8s.io/apache-ingress created
    -
    -> ~ kubectl describe ingress
    -Name:             apache-ingress
    -Labels:           <none>
    -Namespace:        test
    -Address:          192.168.2.120,192.168.2.121,192.168.2.122
    -Ingress Class:    traefik
    -Default backend:  <default>
    -Rules:
    -  Host                    Path  Backends
    -  ----                    ----  --------
    -  f3s.foo.zone
    -                          /   apache-service:80 (10.42.1.11:80)
    -  standby.f3s.foo.zone
    -                          /   apache-service:80 (10.42.1.11:80)
    -  www.f3s.foo.zone
    -                          /   apache-service:80 (10.42.1.11:80)
    -Annotations:              spec.ingressClassName: traefik
    -                          traefik.ingress.kubernetes.io/router.entrypoints: web
    -Events:                   <none>
    +
    [root@r0 ~]# mkdir -p /data/nfs/k3svolumes/loki/data
     

    -Notes:
    +

    Deploying Loki and Alloy



    -
      -
    • In the ingress, I use plain HTTP (web) for the Traefik rule, as all the "production" traffic will be routed through a WireGuard tunnel anyway, as I will show later.
    • -

    -So I tested the Apache web server through the ingress rule:
    +The Loki configuration also lives in the repository:
    +
    +codeberg.org/snonux/conf/f3s/loki
    +
    +To install:

    -
    > ~ curl -H "Host: www.f3s.foo.zone" http://r0.lan.buetow.org:80
    -<html><body><h1>It works!</h1></body></html>
    +
    $ cd conf/f3s/loki
    +$ just install
    +helm repo add grafana https://grafana.github.io/helm-charts || true
    +helm repo update
    +kubectl apply -f persistent-volumes.yaml
    +persistentvolume/loki-data-pv created
    +persistentvolumeclaim/loki-data-pvc created
    +helm install loki grafana/loki --namespace monitoring -f values.yaml
    +NAME: loki
    +LAST DEPLOYED: ...
    +NAMESPACE: monitoring
    +STATUS: deployed
    +...
    +helm install alloy grafana/alloy --namespace monitoring -f alloy-values.yaml
    +NAME: alloy
    +LAST DEPLOYED: ...
    +NAMESPACE: monitoring
    +STATUS: deployed
     

    -

    Test deployment with persistent volume claim


    +Loki runs in single-binary mode with a single replica (loki-0), which is appropriate for a home lab cluster. This means there's only one Loki pod running at any time. If the node hosting Loki fails, Kubernetes will automatically reschedule the pod to another worker node—but there will be a brief downtime (typically under a minute) while this happens. For my home lab use case, this is perfectly acceptable.

    -Next, I modified the Apache example to serve the htdocs directory from the NFS share I created in the previous blog post. I used the following manifests. Most of them are the same as before, except for the persistent volume claim and the volume mount in the Apache deployment.
    +For full high-availability, you'd deploy Loki in microservices mode with separate read, write, and backend components, backed by object storage like S3 or MinIO instead of local filesystem storage. That's a more complex setup that I might explore in a future blog post—but for now, the single-binary mode with NFS-backed persistence strikes the right balance between simplicity and durability.
    +
    +

    Configuring Alloy


    +
    +Alloy is configured via alloy-values.yaml to discover all pods in the cluster and forward their logs to Loki:

    -
    > ~ cat <<END > apache-deployment.yaml
    -# Apache HTTP Server Deployment
    -apiVersion: apps/v1
    -kind: Deployment
    -metadata:
    -  name: apache-deployment
    -  namespace: test
    -spec:
    -  replicas: 2
    -  selector:
    -    matchLabels:
    -      app: apache
    -  template:
    -    metadata:
    -      labels:
    -        app: apache
    -    spec:
    -      containers:
    -      - name: apache
    -        image: httpd:latest
    -        ports:
    -        # Container port where Apache listens
    -        - containerPort: 80
    -        readinessProbe:
    -          httpGet:
    -            path: /
    -            port: 80
    -          initialDelaySeconds: 5
    -          periodSeconds: 10
    -        livenessProbe:
    -          httpGet:
    -            path: /
    -            port: 80
    -          initialDelaySeconds: 15
    -          periodSeconds: 10
    -        volumeMounts:
    -        - name: apache-htdocs
    -          mountPath: /usr/local/apache2/htdocs/
    -      volumes:
    -      - name: apache-htdocs
    -        persistentVolumeClaim:
    -          claimName: example-apache-pvc
    -END
    +
    discovery.kubernetes "pods" {
    +  role = "pod"
    +}
     
    -> ~ cat <<END > apache-ingress.yaml
    -apiVersion: networking.k8s.io/v1
    -kind: Ingress
    -metadata:
    -  name: apache-ingress
    -  namespace: test
    -  annotations:
    -    spec.ingressClassName: traefik
    -    traefik.ingress.kubernetes.io/router.entrypoints: web
    -spec:
    -  rules:
    -    - host: f3s.foo.zone
    -      http:
    -        paths:
    -          - path: /
    -            pathType: Prefix
    -            backend:
    -              service:
    -                name: apache-service
    -                port:
    -                  number: 80
    -    - host: standby.f3s.foo.zone
    -      http:
    -        paths:
    -          - path: /
    -            pathType: Prefix
    -            backend:
    -              service:
    -                name: apache-service
    -                port:
    -                  number: 80
    -    - host: www.f3s.foo.zone
    -      http:
    -        paths:
    -          - path: /
    -            pathType: Prefix
    -            backend:
    -              service:
    -                name: apache-service
    -                port:
    -                  number: 80
    -END
    +discovery.relabel "pods" {
    +  targets = discovery.kubernetes.pods.targets
    +
    +  rule {
    +    source_labels = ["__meta_kubernetes_namespace"]
    +    target_label  = "namespace"
    +  }
     
    -> ~ cat <<END > apache-persistent-volume.yaml
    -apiVersion: v1
    -kind: PersistentVolume
    -metadata:
    -  name: example-apache-pv
    -spec:
    -  capacity:
    -    storage: 1Gi
    -  volumeMode: Filesystem
    -  accessModes:
    -    - ReadWriteOnce
    -  persistentVolumeReclaimPolicy: Retain
    -  hostPath:
    -    path: /data/nfs/k3svolumes/example-apache-volume-claim
    -    type: Directory
    ----
    -apiVersion: v1
    -kind: PersistentVolumeClaim
    -metadata:
    -  name: example-apache-pvc
    -  namespace: test
    -spec:
    -  storageClassName: ""
    -  accessModes:
    -    - ReadWriteOnce
    -  resources:
    -    requests:
    -      storage: 1Gi
    -END
    +  rule {
    +    source_labels = ["__meta_kubernetes_pod_name"]
    +    target_label  = "pod"
    +  }
     
    -> ~ cat <<END > apache-service.yaml
    -apiVersion: v1
    -kind: Service
    -metadata:
    -  labels:
    -    app: apache
    -  name: apache-service
    -  namespace: test
    -spec:
    -  ports:
    -    - name: web
    -      port: 80
    -      protocol: TCP
    -      # Expose port 80 on the service
    -      targetPort: 80
    -  selector:
    -  # Link this service to pods with the label app=apache
    -    app: apache
    -END
    +  rule {
    +    source_labels = ["__meta_kubernetes_pod_container_name"]
    +    target_label  = "container"
    +  }
    +
    +  rule {
    +    source_labels = ["__meta_kubernetes_pod_label_app"]
    +    target_label  = "app"
    +  }
    +}
    +
    +loki.source.kubernetes "pods" {
    +  targets    = discovery.relabel.pods.output
    +  forward_to = [loki.write.default.receiver]
    +}
    +
    +loki.write "default" {
    +  endpoint {
    +    url = "http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push"
    +  }
    +}
     

    -I applied the manifests:
    +This configuration automatically labels each log line with the namespace, pod name, container name, and app label, making it easy to filter logs in Grafana.
    +
    +

    Adding Loki as a Grafana data source


    +
    +Loki doesn't have its own web UI—you query it through Grafana. First, verify the Loki service is running:

    -
    > ~ kubectl apply -f apache-persistent-volume.yaml
    -> ~ kubectl apply -f apache-service.yaml
    -> ~ kubectl apply -f apache-deployment.yaml
    -> ~ kubectl apply -f apache-ingress.yaml
    +
    $ kubectl get svc -n monitoring loki
    +NAME   TYPE        CLUSTER-IP    PORT(S)
    +loki   ClusterIP   10.43.64.60   3100/TCP,9095/TCP
     

    -Looking at the deployment, I could see it failed because the directory didn't exist yet on the NFS share (note that I also increased the replica count to 2 so if one node goes down there's already a replica running on another node for faster failover):
    +To add Loki as a data source in Grafana:
    +
    +
      +
    • Navigate to Configuration → Data Sources
    • +
    • Click "Add data source"
    • +
    • Select "Loki"
    • +
    • Set the URL to: http://loki.monitoring.svc.cluster.local:3100
    • +
    • Click "Save & Test"
    • +

    +Once configured, you can explore logs in Grafana's "Explore" view. I'll show some example queries in the "Using the observability stack" section below.
    +
    +Exploring logs in Grafana with Loki
    +
    +

    The complete monitoring stack


    +
    +After deploying everything, here's what's running in the monitoring namespace:

    -
    > ~ kubectl get pods
    -NAME                                 READY   STATUS              RESTARTS   AGE
    -apache-deployment-5b96bd6b6b-fv2jx   0/1     ContainerCreating   0          9m15s
    -apache-deployment-5b96bd6b6b-ax2ji   0/1     ContainerCreating   0          9m15s
    -
    -> ~ kubectl describe pod apache-deployment-5b96bd6b6b-fv2jx | tail -n 5
    -Events:
    -  Type     Reason       Age                   From               Message
    -  ----     ------       ----                  ----               -------
    -  Normal   Scheduled    9m34s                 default-scheduler  Successfully
    -    assigned test/apache-deployment-5b96bd6b6b-fv2jx to r2.lan.buetow.org
    -  Warning  FailedMount  80s (x12 over 9m34s)  kubelet            MountVolume.SetUp
    -    failed for volume "example-apache-pv" : hostPath type check failed:
    -    /data/nfs/k3svolumes/example-apache is not a directory
    +
    $ kubectl get pods -n monitoring
    +NAME                                                     READY   STATUS    RESTARTS   AGE
    +alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0          42d
    +alloy-g5fgj                                              2/2     Running   0          29m
    +alloy-nfw8w                                              2/2     Running   0          29m
    +alloy-tg9vj                                              2/2     Running   0          29m
    +loki-0                                                   2/2     Running   0          25m
    +prometheus-grafana-868f9dc7cf-lg2vl                      3/3     Running   0          42d
    +prometheus-kube-prometheus-operator-8d7bbc48c-p4sf4      1/1     Running   0          42d
    +prometheus-kube-state-metrics-7c5fb9d798-hh2fx           1/1     Running   0          42d
    +prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   0          42d
    +prometheus-prometheus-node-exporter-2nsg9                1/1     Running   0          42d
    +prometheus-prometheus-node-exporter-mqr25                1/1     Running   0          42d
    +prometheus-prometheus-node-exporter-wp4ds                1/1     Running   0          42d
    +tempo-0                                                  1/1     Running   0          1d
     

    -That's intentional—I needed to create the directory on the NFS share first, so I did that (e.g. on r0):
    +Note: Tempo (tempo-0) is deployed later in this post in the "Distributed Tracing with Grafana Tempo" section. It is included in the pod listing here for completeness.
    +
    +And the services:

    -
    [root@r0 ~]# mkdir /data/nfs/k3svolumes/example-apache-volume-claim/
    +
    $ kubectl get svc -n monitoring
    +NAME                                      TYPE        CLUSTER-IP      PORT(S)
    +alertmanager-operated                     ClusterIP   None            9093/TCP,9094/TCP
    +alloy                                     ClusterIP   10.43.74.14     12345/TCP
    +loki                                      ClusterIP   10.43.64.60     3100/TCP,9095/TCP
    +loki-headless                             ClusterIP   None            3100/TCP
    +prometheus-grafana                        ClusterIP   10.43.46.82     80/TCP
    +prometheus-kube-prometheus-alertmanager   ClusterIP   10.43.208.43    9093/TCP,8080/TCP
    +prometheus-kube-prometheus-operator       ClusterIP   10.43.246.121   443/TCP
    +prometheus-kube-prometheus-prometheus     ClusterIP   10.43.152.163   9090/TCP,8080/TCP
    +prometheus-kube-state-metrics             ClusterIP   10.43.64.26     8080/TCP
    +prometheus-prometheus-node-exporter       ClusterIP   10.43.127.242   9100/TCP
    +tempo                                     ClusterIP   10.43.91.44     3200/TCP,4317/TCP,4318/TCP
    +
    +
    +Let me break down what each pod does:
    +
    +
      +
    • alertmanager-prometheus-kube-prometheus-alertmanager-0: the Alertmanager instance that receives alerts from Prometheus, deduplicates them, groups related alerts together, and routes notifications to the appropriate receivers (email, Slack, PagerDuty, etc.). It runs as a StatefulSet with persistent storage for silences and notification state.
    • +

    +
      +
    • alloy-g5fgj, alloy-nfw8w, alloy-tg9vj: three Alloy pods running as a DaemonSet, one on each k3s node. Each pod tails the container logs from its local node via the Kubernetes API and forwards them to Loki. This ensures log collection continues even if a node becomes isolated from the others.
    • +

    +
      +
    • loki-0: the single Loki instance running in single-binary mode. It receives log streams from Alloy, stores them in chunks on the NFS-backed persistent volume, and serves queries from Grafana. The -0 suffix indicates it's a StatefulSet pod.
    • +

    +
      +
    • prometheus-grafana-...: the Grafana web interface for visualising metrics and logs. It comes pre-configured with Prometheus as a data source and includes dozens of dashboards for Kubernetes monitoring. Dashboards, users, and settings are persisted to the NFS share.
    • +

    +
      +
    • prometheus-kube-prometheus-operator-...: the Prometheus Operator that watches for custom resources (ServiceMonitor, PodMonitor, PrometheusRule) and automatically configures Prometheus to scrape new targets. This allows applications to declare their own monitoring requirements.
    • +

    +
      +
    • prometheus-kube-state-metrics-...: generates metrics about the state of Kubernetes objects themselves: how many pods are running, pending, or failed; deployment replica counts; node conditions; PVC status; and more. Essential for cluster-level dashboards.
    • +

    +
      +
    • prometheus-prometheus-kube-prometheus-prometheus-0: the Prometheus server that scrapes metrics from all configured targets (pods, services, nodes), stores them in a time-series database, evaluates alerting rules, and serves queries to Grafana.
    • +

    +
      +
    • prometheus-prometheus-node-exporter-...: three Node Exporter pods running as a DaemonSet, one on each node. They expose hardware and OS-level metrics: CPU usage, memory, disk I/O, filesystem usage, network statistics, and more. These feed the "Node Exporter" dashboards in Grafana.
    • +

    +
      +
    • tempo-0: the Grafana Tempo instance for distributed tracing. It receives trace data from Alloy via OTLP (OpenTelemetry Protocol), stores traces on the NFS-backed persistent volume, and serves queries to Grafana. Tempo is covered in detail in the "Distributed Tracing with Grafana Tempo" section later in this post.
    • +

    +

    Using the observability stack


    +
    +

    Viewing metrics in Grafana


    +
    +The kube-prometheus-stack comes with many pre-built dashboards. Some useful ones include:
    +
    +
      +
    • Kubernetes / Compute Resources / Cluster: overview of CPU and memory usage across the cluster
    • +
    • Kubernetes / Compute Resources / Namespace (Pods): resource usage by namespace
    • +
    • Node Exporter / Nodes: detailed host metrics like disk I/O, network, and CPU
    • +

    +

    Querying logs with LogQL


    +
    +In Grafana's Explore view, select Loki as the data source and try queries like:
    +
    +
    +# All logs from the services namespace
    +{namespace="services"}
     
    -[root@r0 ~]# cat <<END > /data/nfs/k3svolumes/example-apache-volume-claim/index.html
    -<!DOCTYPE html>
    -<html>
    -<head>
    -  <title>Hello, it works</title>
    -</head>
    -<body>
    -  <h1>Hello, it works!</h1>
    -  <p>This site is served via a PVC!</p>
    -</body>
    -</html>
    -END
    +# Logs from pods matching a pattern
    +{pod=~"miniflux.*"}
    +
    +# Filter by log content
    +{namespace="services"} |= "error"
    +
    +# Parse JSON logs and filter
    +{namespace="services"} | json | level="error"
     

    -The index.html file gives us some actual content to serve. After deleting the pod, it recreates itself and the volume mounts correctly:
    +

    Creating alerts


    +
    +Prometheus supports alerting rules that can notify you when something goes wrong. The kube-prometheus-stack includes many default alerts for common issues like high CPU usage, pod crashes, and node problems. These can be customised via PrometheusRule CRDs.
    +
    +

    Monitoring external FreeBSD hosts


    +
    +The observability stack can also monitor servers outside the Kubernetes cluster. The FreeBSD hosts (f0, f1, f2) that serve NFS storage can be added to Prometheus using the Node Exporter.
    +
    +

    Installing Node Exporter on FreeBSD


    +
    +On each FreeBSD host, install the node_exporter package:

    -
    > ~ kubectl delete pod apache-deployment-5b96bd6b6b-fv2jx
    -
    -> ~ curl -H "Host: www.f3s.foo.zone" http://r0.lan.buetow.org:80
    -<!DOCTYPE html>
    -<html>
    -<head>
    -  <title>Hello, it works</title>
    -</head>
    -<body>
    -  <h1>Hello, it works!</h1>
    -  <p>This site is served via a PVC!</p>
    -</body>
    -</html>
    +
    paul@f0:~ % doas pkg install -y node_exporter
    +
    +
    +Enable the service to start at boot:
    +
    + +
    paul@f0:~ % doas sysrc node_exporter_enable=YES
    +node_exporter_enable:  -> YES
     

    -

    Scaling Traefik for faster failover


    +Configure node_exporter to listen on the WireGuard interface. This ensures metrics are only accessible through the secure tunnel, not the public network. Replace the IP with the host's WireGuard address:
    +
    + +
    paul@f0:~ % doas sysrc node_exporter_args='--web.listen-address=192.168.2.130:9100'
    +node_exporter_args:  -> --web.listen-address=192.168.2.130:9100
    +

    -Traefik (used for ingress on k3s) ships with a single replica by default, but for faster failover I bumped it to two replicas so each worker node runs one pod. That way, if a node disappears, the service stays up while Kubernetes schedules a replacement. Here's the command I used:
    +Start the service:

    -
    > ~ kubectl -n kube-system scale deployment traefik --replicas=2
    +
    paul@f0:~ % doas service node_exporter start
    +Starting node_exporter.
     

    -And the result:
    +Verify it's running:

    -
    > ~ kubectl -n kube-system get pods -l app.kubernetes.io/name=traefik
    -kube-system   traefik-c98fdf6fb-97kqk   1/1   Running   19 (53d ago)   64d
    -kube-system   traefik-c98fdf6fb-9npg2   1/1   Running   11 (53d ago)   61d
    +
    paul@f0:~ % curl -s http://192.168.2.130:9100/metrics | head -3
    +# HELP go_gc_duration_seconds A summary of the wall-time pause...
    +# TYPE go_gc_duration_seconds summary
    +go_gc_duration_seconds{quantile="0"} 0
     

    -

    Make it accessible from the public internet


    +Repeat for the other FreeBSD hosts (f1, f2) with their respective WireGuard IPs.

    -Next, I made this accessible through the public internet via the www.f3s.foo.zone hosts. As a reminder from part 1 of this series, I reviewed the section titled "OpenBSD/relayd to the rescue for external connectivity":
    +

    Adding FreeBSD hosts to Prometheus



    -f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    +Create a file additional-scrape-configs.yaml in the prometheus configuration directory:

    -All apps should be reachable through the internet (e.g., from my phone or computer when travelling). For external connectivity and TLS management, I've got two OpenBSD VMs (one hosted by OpenBSD Amsterdam and another hosted by Hetzner) handling public-facing services like DNS, relaying traffic, and automating Let's Encrypt certificates.
    +
    +- job_name: 'node-exporter'
    +  static_configs:
    +    - targets:
    +      - '192.168.2.130:9100'  # f0 via WireGuard
    +      - '192.168.2.131:9100'  # f1 via WireGuard
    +      - '192.168.2.132:9100'  # f2 via WireGuard
    +      labels:
    +        os: freebsd
    +

    -All of this (every Linux VM to every OpenBSD box) will be connected via WireGuard tunnels, keeping everything private and secure. There will be 6 WireGuard tunnels (3 k3s nodes times two OpenBSD VMs).
    +The job_name must be node-exporter to match the existing dashboards. The os: freebsd label allows filtering these hosts separately if needed.

    -So, when I want to access a service running in k3s, I will hit an external DNS endpoint (with the authoritative DNS servers being the OpenBSD boxes). The DNS will resolve to the master OpenBSD VM (see my KISS highly-available with OpenBSD blog post), and from there, the relayd process (with a Let's Encrypt certificate—see my Let's Encrypt with OpenBSD and Rex blog post) will accept the TCP connection and forward it through the WireGuard tunnel to a reachable node port of one of the k3s nodes, thus serving the traffic.
    +Create a Kubernetes secret from this file:

    -
    > ~ curl https://f3s.foo.zone
    -<html><body><h1>It works!</h1></body></html>
    -
    -> ~ curl https://www.f3s.foo.zone
    -<html><body><h1>It works!</h1></body></html>
    -
    -> ~ curl https://standby.f3s.foo.zone
    -<html><body><h1>It works!</h1></body></html>
    +
    $ kubectl create secret generic additional-scrape-configs \
    +    --from-file=additional-scrape-configs.yaml \
    +    -n monitoring
     

    -This is how it works in relayd.conf on OpenBSD:
    +Update persistence-values.yaml to reference the secret:

    -

    OpenBSD relayd configuration


    +
    +prometheus:
    +  prometheusSpec:
    +    additionalScrapeConfigsSecret:
    +      enabled: true
    +      name: additional-scrape-configs
    +      key: additional-scrape-configs.yaml
    +

    -The OpenBSD edge relays keep the Kubernetes-facing addresses for the f3s ingress endpoints in a shared backend table so TLS traffic for every f3s hostname lands on the same pool of k3s nodes (pointing to the WireGuard IP addresses of those nodes - remember, they are running locally in my LAN, wheras the OpenBSD edge relays operate in the public internet):
    +Upgrade the Prometheus deployment:
    +
    + +
    $ just upgrade
    +
    +
    +After a minute or so, the FreeBSD hosts appear in the Prometheus targets and in the Node Exporter dashboards in Grafana.
    +
    +FreeBSD hosts in the Node Exporter dashboard
    +
    +

    FreeBSD memory metrics compatibility


    +
    +The default Node Exporter dashboards are designed for Linux and expect metrics like node_memory_MemAvailable_bytes. FreeBSD uses different metric names (node_memory_size_bytes, node_memory_free_bytes, etc.), so memory panels will show "No data" out of the box.
    +
    +To fix this, I created a PrometheusRule that generates synthetic Linux-compatible metrics from the FreeBSD equivalents:

    -table <f3s> {
    -  192.168.2.120
    -  192.168.2.121
    -  192.168.2.122
    -}
    +apiVersion: monitoring.coreos.com/v1
    +kind: PrometheusRule
    +metadata:
    +  name: freebsd-memory-rules
    +  namespace: monitoring
    +  labels:
    +    release: prometheus
    +spec:
    +  groups:
    +    - name: freebsd-memory
    +      rules:
    +        - record: node_memory_MemTotal_bytes
    +          expr: node_memory_size_bytes{os="freebsd"}
    +        - record: node_memory_MemAvailable_bytes
    +          expr: |
    +            node_memory_free_bytes{os="freebsd"}
    +              + node_memory_inactive_bytes{os="freebsd"}
    +              + node_memory_cache_bytes{os="freebsd"}
    +        - record: node_memory_MemFree_bytes
    +          expr: node_memory_free_bytes{os="freebsd"}
    +        - record: node_memory_Buffers_bytes
    +          expr: node_memory_buffer_bytes{os="freebsd"}
    +        - record: node_memory_Cached_bytes
    +          expr: node_memory_cache_bytes{os="freebsd"}
     

    -Inside the http protocol "https" block each public hostname gets its Let's Encrypt certificate. The protocol configures TLS keypairs for all f3s services and other public endpoints. For f3s hosts specifically, there are no explicit forward to rules in the protocol—they use the relay-level failover mechanism described later. Non-f3s hosts get explicit localhost routing to prevent them from trying the f3s backends:
    +This file is saved as freebsd-recording-rules.yaml and applied as part of the Prometheus installation. The os="freebsd" label (set in the scrape config) ensures these rules only apply to FreeBSD hosts. After applying, the memory panels in the Node Exporter dashboards populate correctly for FreeBSD.
    +
    +freebsd-recording-rules.yaml on Codeberg
    +
    +

    Disk I/O metrics limitation


    +
    +Unlike memory metrics, disk I/O metrics (node_disk_read_bytes_total, node_disk_written_bytes_total, etc.) are not available on FreeBSD. The Linux diskstats collector that provides these metrics doesn't have a FreeBSD equivalent in the node_exporter.
    +
    +The disk I/O panels in the Node Exporter dashboards will show "No data" for FreeBSD hosts. FreeBSD does expose ZFS-specific metrics (node_zfs_arcstats_*) for ARC cache performance, and per-dataset I/O stats are available via sysctl kstat.zfs, but mapping these to the Linux-style metrics the dashboards expect is non-trivial. To address this, I created custom ZFS-specific dashboards, covered in the next section.
    +
    +

    ZFS Monitoring for FreeBSD Servers


    +
    +The FreeBSD servers (f0, f1, f2) that provide NFS storage to the k3s cluster have ZFS filesystems. Monitoring ZFS performance is crucial for understanding storage performance and cache efficiency.
    +
    +

    Node Exporter ZFS Collector


    +
    +The node_exporter running on each FreeBSD server (v1.9.1) includes a built-in ZFS collector that exposes metrics via sysctls. The ZFS collector is enabled by default and provides:
    +
    +
      +
    • ARC (Adaptive Replacement Cache) statistics
    • +
    • Cache hit/miss rates
    • +
    • Memory usage and allocation
    • +
    • MRU/MFU cache breakdown
    • +
    • Data vs metadata distribution
    • +

    +

    Verifying ZFS Metrics


    +
    +On any FreeBSD server, check that ZFS metrics are being exposed:

    -http protocol "https" {
    -    # TLS certificates for all f3s services
    -    tls keypair f3s.foo.zone
    -    tls keypair www.f3s.foo.zone
    -    tls keypair standby.f3s.foo.zone
    -    tls keypair anki.f3s.foo.zone
    -    tls keypair www.anki.f3s.foo.zone
    -    tls keypair standby.anki.f3s.foo.zone
    -    tls keypair bag.f3s.foo.zone
    -    tls keypair www.bag.f3s.foo.zone
    -    tls keypair standby.bag.f3s.foo.zone
    -    tls keypair flux.f3s.foo.zone
    -    tls keypair www.flux.f3s.foo.zone
    -    tls keypair standby.flux.f3s.foo.zone
    -    tls keypair audiobookshelf.f3s.foo.zone
    -    tls keypair www.audiobookshelf.f3s.foo.zone
    -    tls keypair standby.audiobookshelf.f3s.foo.zone
    -    tls keypair gpodder.f3s.foo.zone
    -    tls keypair www.gpodder.f3s.foo.zone
    -    tls keypair standby.gpodder.f3s.foo.zone
    -    tls keypair radicale.f3s.foo.zone
    -    tls keypair www.radicale.f3s.foo.zone
    -    tls keypair standby.radicale.f3s.foo.zone
    -    tls keypair vault.f3s.foo.zone
    -    tls keypair www.vault.f3s.foo.zone
    -    tls keypair standby.vault.f3s.foo.zone
    -    tls keypair syncthing.f3s.foo.zone
    -    tls keypair www.syncthing.f3s.foo.zone
    -    tls keypair standby.syncthing.f3s.foo.zone
    -    tls keypair uprecords.f3s.foo.zone
    -    tls keypair www.uprecords.f3s.foo.zone
    -    tls keypair standby.uprecords.f3s.foo.zone
    -
    -    # Explicitly route non-f3s hosts to localhost
    -    match request header "Host" value "foo.zone" forward to <localhost>
    -    match request header "Host" value "www.foo.zone" forward to <localhost>
    -    match request header "Host" value "dtail.dev" forward to <localhost>
    -    # ... other non-f3s hosts ...
    -
    -    # NOTE: f3s hosts have NO match rules here!
    -    # They use relay-level failover (f3s -> localhost backup)
    -    # See the relay configuration below for automatic failover details
    -}
    +paul@f0:~ % curl -s http://localhost:9100/metrics | grep node_zfs_arcstats | wc -l
    +      69
     

    -Both IPv4 and IPv6 listeners reuse the same protocol definition, making the relay transparent for dual-stack clients while still health checking every k3s backend before forwarding traffic over WireGuard:
    +The metrics are automatically scraped by Prometheus through the existing static configuration in additional-scrape-configs.yaml which targets all FreeBSD servers on port 9100 with the os: freebsd label.
    +
    +

    ZFS Recording Rules


    +
    +Created recording rules for easier dashboard consumption in zfs-recording-rules.yaml:

    -relay "https4" {
    -    listen on 46.23.94.99 port 443 tls
    -    protocol "https"
    -    # Primary: f3s cluster (with health checks) - Falls back to localhost when all hosts down
    -    forward to <f3s> port 80 check tcp
    -    forward to <localhost> port 8080
    -}
    -
    -relay "https6" {
    -    listen on 2a03:6000:6f67:624::99 port 443 tls
    -    protocol "https"
    -    # Primary: f3s cluster (with health checks) - Falls back to localhost when all hosts down
    -    forward to <f3s> port 80 check tcp
    -    forward to <localhost> port 8080
    -}
    +apiVersion: monitoring.coreos.com/v1
    +kind: PrometheusRule
    +metadata:
    +  name: freebsd-zfs-rules
    +  namespace: monitoring
    +  labels:
    +    release: prometheus
    +spec:
    +  groups:
    +    - name: freebsd-zfs-arc
    +      interval: 30s
    +      rules:
    +        - record: node_zfs_arc_hit_rate_percent
    +          expr: |
    +            100 * (
    +              rate(node_zfs_arcstats_hits_total{os="freebsd"}[5m]) /
    +              (rate(node_zfs_arcstats_hits_total{os="freebsd"}[5m]) +
    +               rate(node_zfs_arcstats_misses_total{os="freebsd"}[5m]))
    +            )
    +          labels:
    +            os: freebsd
    +        - record: node_zfs_arc_memory_usage_percent
    +          expr: |
    +            100 * (
    +              node_zfs_arcstats_size_bytes{os="freebsd"} /
    +              node_zfs_arcstats_c_max_bytes{os="freebsd"}
    +            )
    +          labels:
    +            os: freebsd
    +        # Additional rules for metadata %, target %, MRU/MFU %, etc.
     

    -In practice, that means relayd terminates TLS with the correct certificate, keeps the three WireGuard-connected backends in rotation, and ships each request to whichever bhyve VM answers first.
    +These recording rules calculate:

    -

    Automatic failover when f3s cluster is down


    +
      +
    • ARC hit rate percentage
    • +
    • ARC memory usage percentage (current vs maximum)
    • +
    • ARC target percentage (target vs maximum)
    • +
    • Metadata vs data percentages
    • +
    • MRU vs MFU cache percentages
    • +
    • Demand data and metadata hit rates
    • +

    +

    Grafana Dashboards


    +
    +Created two comprehensive ZFS monitoring dashboards (zfs-dashboards.yaml):
    +
    +**Dashboard 1: FreeBSD ZFS (per-host detailed view)**
    +
    +Includes variables to select:

    -Update: This section was added at Tue 30 Dec 10:11:44 EET 2025
    +
      +
    • FreeBSD server (f0, f1, or f2)
    • +
    • ZFS pool (zdata, zroot, or all)
    • +

    +Pool Overview Row:

    -One important aspect of this setup is graceful degradation: when all three f3s nodes are unreachable (e.g., during maintenance or a power outage in my LAN), users should see a friendly status page instead of an error message.
    +
      +
    • Pool Capacity gauge (with thresholds: green <70%, yellow <85%, red >85%)
    • +
    • Pool Health status (ONLINE/DEGRADED/FAULTED with color coding)
    • +
    • Total Pool Size stat
    • +
    • Free Space stat
    • +
    • Pool Space Usage Over Time (stacked: used + free)
    • +
    • Pool Capacity Trend time series
    • +

    +Dataset Statistics Row:

    -OpenBSD's relayd supports automatic failover through its health check mechanism. According to the relayd.conf manual:
    +
      +
    • Table showing all datasets with columns: Pool, Dataset, Used, Available, Referenced
    • +
    • Automatically filters by selected pool
    • +

    +ARC Cache Statistics Row:

    -This directive can be specified multiple times - subsequent entries will be used as the backup table if all hosts in the previous table are down.
    +
      +
    • ARC Hit Rate gauge (red <70%, yellow <90%, green >=90%)
    • +
    • ARC Size time series (current, target, max)
    • +
    • ARC Memory Usage percentage gauge
    • +
    • ARC Hits vs Misses rate
    • +
    • ARC Data vs Metadata stacked time series
    • +

    +**Dashboard 2: FreeBSD ZFS Summary (cluster-wide overview)**

    -The key is the order of forward to statements in the relay configuration. By placing the f3s table first with check tcp health checks, followed by localhost as a backup, relayd automatically routes traffic based on backend availability:
    +Cluster-Wide Pool Statistics Row:

    -When f3s cluster is UP:
    +
      +
    • Total Storage Capacity across all servers
    • +
    • Total Used space
    • +
    • Total Free space
    • +
    • Average Pool Capacity gauge
    • +
    • Pool Health Status (worst case across cluster)
    • +
    • Total Pool Space Usage Over Time
    • +
    • Per-Pool Capacity time series (all pools on all hosts)
    • +

    +Per-Host Pool Breakdown Row:

      -
    • Health checks on port 80 succeed for f3s nodes
    • -
    • All f3s traffic routes to the Kubernetes cluster
    • -
    • Localhost backup remains idle
    • +
    • Bar gauge showing capacity by host and pool
    • +
    • Table with all pools: Host, Pool, Size, Used, Free, Capacity %, Health

    -When f3s cluster is DOWN:
    +Cluster-Wide ARC Statistics Row:

      -
    • All health checks fail (nodes unreachable)
    • -
    • The <f3s> table becomes unavailable
    • -
    • Traffic automatically falls back to <localhost> on port 8080
    • -
    • OpenBSD's httpd serves a static fallback page
    • +
    • Average ARC Hit Rate gauge across all hosts
    • +
    • ARC Hit Rate by Host time series
    • +
    • Total ARC Size Across Cluster
    • +
    • Total ARC Hits vs Misses (cluster-wide sum)
    • +
    • ARC Size by Host

    +Dashboard Visualization:
    +
    +ZFS monitoring dashboard in Grafana showing pool capacity, health, and I/O throughput
    +ZFS ARC cache statistics showing hit rate, memory usage, and size trends
    +ZFS datasets table and ARC data vs metadata breakdown
    +
    +

    Deployment


    +
    +Applied the resources to the cluster:
    +
    -# NEW configuration - supports automatic failover
    -http protocol "https" {
    -    # Explicitly route non-f3s hosts to localhost
    -    match request header "Host" value "foo.zone" forward to <localhost>
    -    match request header "Host" value "dtail.dev" forward to <localhost>
    -    # ... other non-f3s hosts ...
    -
    -    # f3s hosts have NO protocol rules - they use relay-level failover
    -    # (no match rules for f3s.foo.zone, anki.f3s.foo.zone, etc.)
    -}
    -
    -relay "https4" {
    -    # f3s FIRST (with health checks), localhost as BACKUP
    -    forward to <f3s> port 80 check tcp
    -    forward to <localhost> port 8080
    -}
    +cd /home/paul/git/conf/f3s/prometheus
    +kubectl apply -f zfs-recording-rules.yaml
    +kubectl apply -f zfs-dashboards.yaml
     

    -This way, f3s traffic uses the relay's default behavior: try the first table, fall back to the second when health checks fail.
    +Updated Justfile to include ZFS recording rules in install and upgrade targets:

    -

    OpenBSD httpd fallback configuration


    +
    +install:
    +    kubectl apply -f persistent-volumes.yaml
    +    kubectl create secret generic additional-scrape-configs --from-file=additional-scrape-configs.yaml -n monitoring --dry-run=client -o yaml | kubectl apply -f -
    +    helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring -f persistence-values.yaml
    +    kubectl apply -f freebsd-recording-rules.yaml
    +    kubectl apply -f openbsd-recording-rules.yaml
    +    kubectl apply -f zfs-recording-rules.yaml
    +    just -f grafana-ingress/Justfile install
    +

    -The localhost httpd service on port 8080 serves the fallback content from /var/www/htdocs/f3s_fallback/. This directory contains a simple HTML page explaining the situation.
    +

    Verifying ZFS Metrics in Prometheus



    -The key configuration detail is using request rewrite to ensure the fallback page is served for ALL paths, not just the root. Without this, accessing paths like /login?redirect=/files/ would return 404 instead of the fallback page:
    +Check that ZFS metrics are being collected:

    -# OpenBSD httpd.conf
    -# Fallback for f3s hosts - serve fallback page for ALL paths
    -server "f3s.foo.zone" {
    -  listen on * port 8080
    -  log style forwarded
    -  location * {
    -    # Rewrite all requests to /index.html to show fallback page regardless of path
    -    request rewrite "/index.html"
    -    root "/htdocs/f3s_fallback"
    -  }
    -}
    -
    -server "anki.f3s.foo.zone" {
    -  listen on * port 8080
    -  log style forwarded
    -  location * {
    -    request rewrite "/index.html"
    -    root "/htdocs/f3s_fallback"
    -  }
    -}
    -
    -# ... similar blocks for all f3s hostnames ...
    +kubectl exec -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 -c prometheus -- \
    +  wget -qO- 'http://localhost:9090/api/v1/query?query=node_zfs_arcstats_size_bytes'
     

    -The request rewrite "/index.html" directive ensures that whether someone accesses /, /login, /api/status, or any other path, they all receive the same fallback page. This prevents confusing 404 errors when users have bookmarked specific URLs or follow deep links while the cluster is down.
    +Check recording rules are calculating correctly:

    -The fallback page itself is straightforward:
    +
    +kubectl exec -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 -c prometheus -- \
    +  wget -qO- 'http://localhost:9090/api/v1/query?query=node_zfs_arc_memory_usage_percent'
    +

    - -
    <!DOCTYPE html>
    -<html>
    -<head>
    -    <title>Server turned off</title>
    -    <style>
    -        body {
    -            font-family: sans-serif;
    -            text-align: center;
    -            padding-top: 50px;
    -        }
    -        .container {
    -            max-width: 600px;
    -            margin: 0 auto;
    -        }
    -    </style>
    -</head>
    -<body>
    -    <div class="container">
    -        <h1>Server turned off</h1>
    -        <p>The servers are all currently turned off.</p>
    -        <p>Please try again later.</p>
    -        <p>Or email <a href="mailto:paul@nospam.buetow.org">paul@nospam.buetow.org</a>
    -           - so I can turn them back on for you!</p>
    -    </div>
    -</body>
    -</html>
    +Example output shows memory usage percentage for each FreeBSD server:
    +
    +
    +"result":[
    +  {"metric":{"instance":"192.168.2.130:9100","os":"freebsd"},"value":[...,"37.58"]},
    +  {"metric":{"instance":"192.168.2.131:9100","os":"freebsd"},"value":[...,"12.85"]},
    +  {"metric":{"instance":"192.168.2.132:9100","os":"freebsd"},"value":[...,"13.44"]}
    +]
     

    -This approach provides several benefits:
    +

    Key Metrics to Monitor



      -
    • Automatic detection: Health checks run continuously; no manual intervention needed
    • -
    • Instant fallback: When all f3s nodes go down, the next request automatically routes to localhost
    • -
    • Transparent recovery: When f3s comes back online, health checks pass and traffic resumes automatically
    • -
    • User experience: Visitors see a helpful message instead of connection errors
    • -
    • No DNS changes: The same hostnames work whether f3s is up or down
    • +
    • ARC Hit Rate: Should typically be above 90% for optimal performance. Lower hit rates indicate the ARC cache is too small or workload has poor locality.
    • +
    • ARC Memory Usage: Shows how much of the maximum ARC size is being used. If consistently at or near maximum, the ARC is effectively utilizing available memory.
    • +
    • Data vs Metadata: Typically data should dominate, but workloads with many small files will show higher metadata percentages.
    • +
    • MRU vs MFU: Most Recently Used vs Most Frequently Used cache. The ratio depends on workload characteristics.
    • +
    • Pool Capacity: Monitor pool usage to ensure adequate free space. ZFS performance degrades when pools exceed 80% capacity.
    • +
    • Pool Health: Should always show ONLINE (green). DEGRADED (yellow) indicates a disk issue requiring attention. FAULTED (red) requires immediate action.
    • +
    • Dataset Usage: Track which datasets are consuming the most space to identify growth trends and plan capacity.

    -This fallback mechanism has proven invaluable during maintenance windows and unexpected outages, ensuring that users always get a response even when the home lab is offline.
    +

    ZFS Pool and Dataset Metrics via Textfile Collector



    -

    Exposing services via LAN ingress


    +To complement the ARC statistics from node_exporter's built-in ZFS collector, I added pool capacity and dataset metrics using the textfile collector feature.

    -In addition to external access through the OpenBSD relays, services can also be exposed on the local network using LAN-specific ingresses. This is useful for accessing services from within the home network without going through the internet, reducing latency and providing an alternative path if the external relays are unavailable.
    +Created a script at /usr/local/bin/zfs_pool_metrics.sh on each FreeBSD server:

    -The LAN ingress architecture leverages the existing FreeBSD CARP (Common Address Redundancy Protocol) failover infrastructure that's already in place for NFS-over-TLS (see Part 5). Instead of deploying MetalLB or another LoadBalancer implementation, we reuse the CARP virtual IP (192.168.1.138) by adding HTTP/HTTPS forwarding alongside the existing stunnel service on port 2323.
    +
    +#!/bin/sh
    +# ZFS Pool and Dataset Metrics Collector for Prometheus
    +
    +OUTPUT_FILE="/var/tmp/node_exporter/zfs_pools.prom.$$"
    +FINAL_FILE="/var/tmp/node_exporter/zfs_pools.prom"
    +
    +mkdir -p /var/tmp/node_exporter
    +
    +{
    +    # Pool metrics
    +    echo "# HELP zfs_pool_size_bytes Total size of ZFS pool"
    +    echo "# TYPE zfs_pool_size_bytes gauge"
    +    echo "# HELP zfs_pool_allocated_bytes Allocated space in ZFS pool"
    +    echo "# TYPE zfs_pool_allocated_bytes gauge"
    +    echo "# HELP zfs_pool_free_bytes Free space in ZFS pool"
    +    echo "# TYPE zfs_pool_free_bytes gauge"
    +    echo "# HELP zfs_pool_capacity_percent Capacity percentage"
    +    echo "# TYPE zfs_pool_capacity_percent gauge"
    +    echo "# HELP zfs_pool_health Pool health (0=ONLINE, 1=DEGRADED, 2=FAULTED)"
    +    echo "# TYPE zfs_pool_health gauge"
    +
    +    zpool list -Hp -o name,size,allocated,free,capacity,health | \
    +    while IFS=$'\t' read name size alloc free cap health; do
    +        case "$health" in
    +            ONLINE)   health_val=0 ;;
    +            DEGRADED) health_val=1 ;;
    +            FAULTED)  health_val=2 ;;
    +            *)        health_val=6 ;;
    +        esac
    +        cap_num=$(echo "$cap" | sed 's/%//')
    +
    +        echo "zfs_pool_size_bytes{pool=\"$name\"} $size"
    +        echo "zfs_pool_allocated_bytes{pool=\"$name\"} $alloc"
    +        echo "zfs_pool_free_bytes{pool=\"$name\"} $free"
    +        echo "zfs_pool_capacity_percent{pool=\"$name\"} $cap_num"
    +        echo "zfs_pool_health{pool=\"$name\"} $health_val"
    +    done
    +
    +    # Dataset metrics
    +    echo "# HELP zfs_dataset_used_bytes Used space in dataset"
    +    echo "# TYPE zfs_dataset_used_bytes gauge"
    +    echo "# HELP zfs_dataset_available_bytes Available space"
    +    echo "# TYPE zfs_dataset_available_bytes gauge"
    +    echo "# HELP zfs_dataset_referenced_bytes Referenced space"
    +    echo "# TYPE zfs_dataset_referenced_bytes gauge"
    +
    +    zfs list -Hp -t filesystem -o name,used,available,referenced | \
    +    while IFS=$'\t' read name used avail ref; do
    +        pool=$(echo "$name" | cut -d/ -f1)
    +        echo "zfs_dataset_used_bytes{pool=\"$pool\",dataset=\"$name\"} $used"
    +        echo "zfs_dataset_available_bytes{pool=\"$pool\",dataset=\"$name\"} $avail"
    +        echo "zfs_dataset_referenced_bytes{pool=\"$pool\",dataset=\"$name\"} $ref"
    +    done
    +} > "$OUTPUT_FILE"
    +
    +mv "$OUTPUT_FILE" "$FINAL_FILE"
    +
    +
    +Deployed to all FreeBSD servers:

    -

    Architecture overview


    +
    +for host in f0 f1 f2; do
    +    scp /tmp/zfs_pool_metrics.sh paul@$host:/tmp/
    +    ssh paul@$host 'doas mv /tmp/zfs_pool_metrics.sh /usr/local/bin/ && \
    +                    doas chmod +x /usr/local/bin/zfs_pool_metrics.sh'
    +done
    +

    -The LAN access path differs from external access:
    +Set up cron jobs to run every minute:

    -**External access (*.f3s.foo.zone):**
    -Internet → OpenBSD relayd (TLS termination, Let's Encrypt)
    -        → WireGuard tunnel
    -        → k3s Traefik :80 (HTTP)
    -        → Service
    +for host in f0 f1 f2; do
    +    ssh paul@$host 'echo "* * * * * /usr/local/bin/zfs_pool_metrics.sh >/dev/null 2>&1" | \
    +                    doas crontab -'
    +done
     

    -**LAN access (*.f3s.lan.foo.zone):**
    +The textfile collector (already configured with --collector.textfile.directory=/var/tmp/node_exporter) automatically picks up the metrics.
    +
    +Verify metrics are being exposed:
    +
    -LAN → FreeBSD CARP VIP (192.168.1.138)
    -    → FreeBSD relayd (TCP forwarding)
    -    → k3s Traefik :443 (TLS termination, cert-manager)
    -    → Service
    +paul@f0:~ % curl -s http://localhost:9100/metrics | grep "^zfs_pool" | head -5
    +zfs_pool_allocated_bytes{pool="zdata"} 6.47622733824e+11
    +zfs_pool_allocated_bytes{pool="zroot"} 5.3338578944e+10
    +zfs_pool_capacity_percent{pool="zdata"} 64
    +zfs_pool_capacity_percent{pool="zroot"} 10
    +zfs_pool_free_bytes{pool="zdata"} 3.48809678848e+11
     

    -The key architectural decisions:
    +All ZFS-related configuration files are available on Codeberg:

    -
      -
    • FreeBSD relayd performs pure TCP forwarding (Layer 4) for ports 80 and 443, not TLS termination
    • -
    • Traefik inside k3s handles TLS offloading using certificates from cert-manager
    • -
    • Self-signed CA for LAN domains (no external dependencies)
    • -
    • CARP provides automatic failover between f0 and f1
    • -
    • No code changes to applications—just add a LAN ingress resource
    • -

    -

    Installing cert-manager


    +zfs-recording-rules.yaml on Codeberg
    +zfs-dashboards.yaml on Codeberg

    -First, install cert-manager to handle certificate lifecycle management for LAN services. The installation is automated with a Justfile:
    +

    Monitoring external OpenBSD hosts



    -codeberg.org/snonux/conf/f3s/cert-manager
    +The same approach works for OpenBSD hosts. I have two OpenBSD edge relay servers (blowfish, fishfinger) that handle TLS termination and forward traffic through WireGuard to the cluster. These can also be monitored with Node Exporter.
    +
    +

    Installing Node Exporter on OpenBSD


    +
    +On each OpenBSD host, install the node_exporter package:

    -
    $ cd conf/f3s/cert-manager
    -$ just install
    -kubectl apply -f cert-manager.yaml
    -# ... cert-manager CRDs and resources created ...
    -kubectl apply -f self-signed-issuer.yaml
    -clusterissuer.cert-manager.io/selfsigned-issuer created
    -clusterissuer.cert-manager.io/selfsigned-ca-issuer created
    -kubectl apply -f ca-certificate.yaml
    -certificate.cert-manager.io/selfsigned-ca created
    -kubectl apply -f wildcard-certificate.yaml
    -certificate.cert-manager.io/f3s-lan-wildcard created
    +
    blowfish:~ $ doas pkg_add node_exporter
    +quirks-7.103 signed on 2025-10-13T22:55:16Z
    +The following new rcscripts were installed: /etc/rc.d/node_exporter
    +See rcctl(8) for details.
     

    -This creates:
    -
    -
      -
    • A self-signed ClusterIssuer
    • -
    • A CA certificate (f3s-lan-ca) valid for 10 years
    • -
    • A CA-signed ClusterIssuer
    • -
    • A wildcard certificate (*.f3s.lan.foo.zone) valid for 90 days with automatic renewal
    • -

    -Verify the certificates:
    +Enable the service to start at boot:

    -
    $ kubectl get certificate -n cert-manager
    -NAME               READY   SECRET                 AGE
    -f3s-lan-wildcard   True    f3s-lan-tls            5m
    -selfsigned-ca      True    selfsigned-ca-secret   5m
    +
    blowfish:~ $ doas rcctl enable node_exporter
     

    -The wildcard certificate (f3s-lan-tls) needs to be copied to any namespace that uses it:
    +Configure node_exporter to listen on the WireGuard interface. This ensures metrics are only accessible through the secure tunnel, not the public network. Replace the IP with the host's WireGuard address:

    -
    $ kubectl get secret f3s-lan-tls -n cert-manager -o yaml | \
    -    sed 's/namespace: cert-manager/namespace: services/' | \
    -    kubectl apply -f -
    +
    blowfish:~ $ doas rcctl set node_exporter flags '--web.listen-address=192.168.2.110:9100'
     

    -

    Configuring FreeBSD relayd for LAN access


    -
    -On both FreeBSD hosts (f0, f1), install and configure relayd for TCP forwarding:
    +Start the service:

    -
    paul@f0:~ % doas pkg install -y relayd
    -
    -
    -Create /usr/local/etc/relayd.conf:
    -
    -
    -# k3s nodes backend table
    -table <k3s_nodes> { 192.168.1.120 192.168.1.121 192.168.1.122 }
    -
    -# TCP forwarding to Traefik (no TLS termination)
    -relay "lan_http" {
    -    listen on 192.168.1.138 port 80
    -    forward to <k3s_nodes> port 80 check tcp
    -}
    -
    -relay "lan_https" {
    -    listen on 192.168.1.138 port 443
    -    forward to <k3s_nodes> port 443 check tcp
    -}
    -
    -
    -Note: The IP addresses 192.168.1.120-122 are the LAN IPs of the k3s nodes (r0, r1, r2), not their WireGuard IPs. FreeBSD relayd requires PF (Packet Filter) to be enabled. Create a minimal /etc/pf.conf:
    -
    -
    -# Basic PF rules for relayd
    -set skip on lo0
    -pass in quick
    -pass out quick
    +
    blowfish:~ $ doas rcctl start node_exporter
    +node_exporter(ok)
     

    -Enable PF and relayd:
    +Verify it's running:

    -
    paul@f0:~ % doas sysrc pf_enable=YES pflog_enable=YES relayd_enable=YES
    -paul@f0:~ % doas service pf start
    -paul@f0:~ % doas service pflog start
    -paul@f0:~ % doas service relayd start
    +
    blowfish:~ $ curl -s http://192.168.2.110:9100/metrics | head -3
    +# HELP go_gc_duration_seconds A summary of the wall-time pause...
    +# TYPE go_gc_duration_seconds summary
    +go_gc_duration_seconds{quantile="0"} 0
     

    -Verify relayd is listening on the CARP VIP:
    +Repeat for the other OpenBSD host (fishfinger) with its respective WireGuard IP (192.168.2.111).

    - -
    paul@f0:~ % doas sockstat -4 -l | grep 192.168.1.138
    -_relayd  relayd   2903  11  tcp4   192.168.1.138:80      *:*
    -_relayd  relayd   2903  12  tcp4   192.168.1.138:443     *:*
    +

    Adding OpenBSD hosts to Prometheus


    +
    +Update additional-scrape-configs.yaml to include the OpenBSD targets:
    +
    +
    +- job_name: 'node-exporter'
    +  static_configs:
    +    - targets:
    +      - '192.168.2.130:9100'  # f0 via WireGuard
    +      - '192.168.2.131:9100'  # f1 via WireGuard
    +      - '192.168.2.132:9100'  # f2 via WireGuard
    +      labels:
    +        os: freebsd
    +    - targets:
    +      - '192.168.2.110:9100'  # blowfish via WireGuard
    +      - '192.168.2.111:9100'  # fishfinger via WireGuard
    +      labels:
    +        os: openbsd
     

    -Repeat the same configuration on f1. Both hosts will run relayd listening on the CARP VIP, but only the CARP MASTER will respond to traffic. When failover occurs, the new MASTER takes over seamlessly.
    +The os: openbsd label allows filtering these hosts separately from FreeBSD and Linux nodes.

    -

    Adding LAN ingress to services


    +

    OpenBSD memory metrics compatibility



    -To expose a service on the LAN, add a second Ingress resource to its Helm chart. Here's an example:
    +OpenBSD uses the same memory metric names as FreeBSD (node_memory_size_bytes, node_memory_free_bytes, etc.), so a similar PrometheusRule is needed to generate Linux-compatible metrics:

    ----
    -# LAN Ingress for f3s.lan.foo.zone
    -apiVersion: networking.k8s.io/v1
    -kind: Ingress
    +apiVersion: monitoring.coreos.com/v1
    +kind: PrometheusRule
     metadata:
    -  name: ingress-lan
    -  namespace: services
    -  annotations:
    -    spec.ingressClassName: traefik
    -    traefik.ingress.kubernetes.io/router.entrypoints: web,websecure
    +  name: openbsd-memory-rules
    +  namespace: monitoring
    +  labels:
    +    release: prometheus
     spec:
    -  tls:
    -    - hosts:
    -        - f3s.lan.foo.zone
    -      secretName: f3s-lan-tls
    -  rules:
    -    - host: f3s.lan.foo.zone
    -      http:
    -        paths:
    -          - path: /
    -            pathType: Prefix
    -            backend:
    -              service:
    -                name: service
    -                port:
    -                  number: 4533
    +  groups:
    +    - name: openbsd-memory
    +      rules:
    +        - record: node_memory_MemTotal_bytes
    +          expr: node_memory_size_bytes{os="openbsd"}
    +          labels:
    +            os: openbsd
    +        - record: node_memory_MemAvailable_bytes
    +          expr: |
    +            node_memory_free_bytes{os="openbsd"}
    +              + node_memory_inactive_bytes{os="openbsd"}
    +              + node_memory_cache_bytes{os="openbsd"}
    +          labels:
    +            os: openbsd
    +        - record: node_memory_MemFree_bytes
    +          expr: node_memory_free_bytes{os="openbsd"}
    +          labels:
    +            os: openbsd
    +        - record: node_memory_Cached_bytes
    +          expr: node_memory_cache_bytes{os="openbsd"}
    +          labels:
    +            os: openbsd
     

    -Key points:
    +This file is saved as openbsd-recording-rules.yaml and applied alongside the FreeBSD rules. Note that OpenBSD doesn't expose a buffer memory metric, so that rule is omitted.

    -
      -
    • Use web,websecure entrypoints (both HTTP and HTTPS)
    • -
    • Reference the f3s-lan-tls secret in the tls section
    • -
    • Use .f3s.lan.foo.zone subdomain pattern
    • -
    • Same backend service as the external ingress
    • -

    -Apply the ingress and test:
    +openbsd-recording-rules.yaml on Codeberg

    - -
    $ kubectl apply -f ingress-lan.yaml
    -ingress.networking.k8s.io/ingress-lan created
    -
    -$ curl -k https://f3s.lan.foo.zone
    -HTTP/2 302 
    -location: /app/
    -
    +After running just upgrade, the OpenBSD hosts appear in Prometheus targets and the Node Exporter dashboards.

    -

    Client-side DNS and CA setup


    +

    Distributed Tracing with Grafana Tempo


    +
    +After implementing logs (Loki) and metrics (Prometheus), the final pillar of observability is distributed tracing. Grafana Tempo provides distributed tracing capabilities that help understand request flows across microservices.
    +
    +For a preview of what distributed tracing with Tempo looks like in Grafana, see the X-RAG blog post:
    +
    +X-RAG Observability Hackathon
    +
    +

    Why Distributed Tracing?


    +
    +In a microservices architecture, a single user request may traverse multiple services. Distributed tracing:
    +
    +
      +
    • Tracks requests across service boundaries
    • +
    • Identifies performance bottlenecks
    • +
    • Visualizes service dependencies
    • +
    • Correlates with logs and metrics
    • +
    • Helps debug complex distributed systems
    • +

    +

    Deploying Grafana Tempo



    -To access LAN services, clients need DNS entries and must trust the self-signed CA.
    +Tempo is deployed in monolithic mode, following the same pattern as Loki's SingleBinary deployment.

    -Add DNS entries to /etc/hosts on your laptop:
    +#### Configuration Strategy

    - -
    $ sudo tee -a /etc/hosts << 'EOF'
    -# f3s LAN services
    -192.168.1.138  f3s.lan.foo.zone
    -EOF
    -
    +**Deployment Mode:** Monolithic (all components in one process)
    +
      +
    • Simpler operation than microservices mode
    • +
    • Suitable for the cluster scale
    • +
    • Consistent with Loki deployment pattern
    • +

    +**Storage:** Filesystem backend using hostPath
    +
      +
    • 10Gi storage at /data/nfs/k3svolumes/tempo/data
    • +
    • 7-day retention (168h)
    • +
    • Local storage is the only option for monolithic mode
    • +

    +**OTLP Receivers:** Standard OpenTelemetry Protocol ports
    +
      +
    • gRPC: 4317
    • +
    • HTTP: 4318
    • +
    • Bind to 0.0.0.0 to avoid Tempo 2.7+ localhost-only binding issue
    • +

    +#### Tempo Deployment Files

    -The CARP VIP 192.168.1.138 provides high availability—traffic automatically fails over to the backup host if the master goes down.
    +Created in /home/paul/git/conf/f3s/tempo/:

    -Export the self-signed CA certificate:
    +**values.yaml** - Helm chart configuration:

    - -
    $ kubectl get secret selfsigned-ca-secret -n cert-manager -o jsonpath='{.data.ca\.crt}' | \
    -    base64 -d > f3s-lan-ca.crt
    +
    +tempo:
    +  retention: 168h
    +  storage:
    +    trace:
    +      backend: local
    +      local:
    +        path: /var/tempo/traces
    +      wal:
    +        path: /var/tempo/wal
    +  receivers:
    +    otlp:
    +      protocols:
    +        grpc:
    +          endpoint: 0.0.0.0:4317
    +        http:
    +          endpoint: 0.0.0.0:4318
    +
    +persistence:
    +  enabled: true
    +  size: 10Gi
    +  storageClassName: ""
    +
    +resources:
    +  limits:
    +    cpu: 1000m
    +    memory: 2Gi
    +  requests:
    +    cpu: 500m
    +    memory: 1Gi
     

    -Install the CA certificate on Linux (Fedora/Rocky):
    +**persistent-volumes.yaml** - Storage configuration:

    - -
    $ sudo cp f3s-lan-ca.crt /etc/pki/ca-trust/source/anchors/
    -$ sudo update-ca-trust
    +
    +apiVersion: v1
    +kind: PersistentVolume
    +metadata:
    +  name: tempo-data-pv
    +spec:
    +  capacity:
    +    storage: 10Gi
    +  accessModes:
    +    - ReadWriteOnce
    +  persistentVolumeReclaimPolicy: Retain
    +  hostPath:
    +    path: /data/nfs/k3svolumes/tempo/data
    +---
    +apiVersion: v1
    +kind: PersistentVolumeClaim
    +metadata:
    +  name: tempo-data-pvc
    +  namespace: monitoring
    +spec:
    +  storageClassName: ""
    +  accessModes:
    +    - ReadWriteOnce
    +  resources:
    +    requests:
    +      storage: 10Gi
     

    -After trusting the CA, browsers will accept the LAN certificates without warnings.
    -
    -

    Scaling to other services


    +**Grafana Datasource Provisioning**

    -The same pattern can be applied to any service. To add LAN access:
    +All Grafana datasources (Prometheus, Alertmanager, Loki, Tempo) are provisioned via a unified ConfigMap that is directly mounted to the Grafana pod. This approach ensures datasources are loaded on startup without requiring sidecar-based discovery.

    -1. Copy the f3s-lan-tls secret to the service's namespace (if not already there)
    -2. Add a LAN Ingress resource using the pattern above
    -3. Configure DNS: 192.168.1.138 service.f3s.lan.foo.zone
    +In /home/paul/git/conf/f3s/prometheus/grafana-datasources-all.yaml:

    -No changes needed to:
    +
    +apiVersion: v1
    +kind: ConfigMap
    +metadata:
    +  name: grafana-datasources-all
    +  namespace: monitoring
    +data:
    +  datasources.yaml: |
    +    apiVersion: 1
    +    datasources:
    +      - name: Prometheus
    +        type: prometheus
    +        uid: prometheus
    +        url: http://prometheus-kube-prometheus-prometheus.monitoring:9090/
    +        access: proxy
    +        isDefault: true
    +      - name: Alertmanager
    +        type: alertmanager
    +        uid: alertmanager
    +        url: http://prometheus-kube-prometheus-alertmanager.monitoring:9093/
    +      - name: Loki
    +        type: loki
    +        uid: loki
    +        url: http://loki.monitoring.svc.cluster.local:3100
    +      - name: Tempo
    +        type: tempo
    +        uid: tempo
    +        url: http://tempo.monitoring.svc.cluster.local:3200
    +        jsonData:
    +          tracesToLogsV2:
    +            datasourceUid: loki
    +            spanStartTimeShift: -1h
    +            spanEndTimeShift: 1h
    +          tracesToMetrics:
    +            datasourceUid: prometheus
    +          serviceMap:
    +            datasourceUid: prometheus
    +          nodeGraph:
    +            enabled: true
    +

    +The kube-prometheus-stack Helm values (persistence-values.yaml) are configured to:
      -
    • relayd configuration (forwards all traffic)
    • -
    • cert-manager (wildcard cert covers all *.f3s.lan.foo.zone)
    • -
    • CARP configuration (VIP shared by all services)
    • +
    • Disable sidecar-based datasource provisioning
    • +
    • Mount grafana-datasources-all ConfigMap directly to /etc/grafana/provisioning/datasources/

    -

    TLS offloaders summary


    -
    -The f3s infrastructure now has three distinct TLS offloaders:
    +This direct mounting approach is simpler and more reliable than sidecar-based discovery.

    -
      -
    • **OpenBSD relayd**: External internet traffic (*.f3s.foo.zone) using Let's Encrypt
    • -
    • **Traefik (k3s)**: LAN HTTPS traffic (*.f3s.lan.foo.zone) using cert-manager
    • -
    • **stunnel**: NFS-over-TLS (port 2323) using custom PKI
    • -

    -Each serves a different purpose with appropriate certificate management for its use case.
    +#### Installation

    -

    Deploying the private Docker image registry


    +
    +cd /home/paul/git/conf/f3s/tempo
    +just install
    +

    -As not all Docker images I want to deploy are available on public Docker registries and as I also build some of them by myself, there is the need of a private registry.
    +Verify Tempo is running:

    -All manifests for the f3s stack live in my configuration repository:
    +
    +kubectl get pods -n monitoring -l app.kubernetes.io/name=tempo
    +kubectl exec -n monitoring <tempo-pod> -- wget -qO- http://localhost:3200/ready
    +

    -codeberg.org/snonux/conf/f3s
    +

    Configuring Grafana Alloy for Trace Collection



    -Within that repo, the f3s/registry/ directory contains the Helm chart, a Justfile, and a detailed README. Here's the condensed walkthrough I used to roll out the registry with Helm.
    +Updated /home/paul/git/conf/f3s/loki/alloy-values.yaml to add OTLP receivers for traces while maintaining existing log collection.

    -

    Prepare the NFS-backed storage


    +#### OTLP Receiver Configuration

    -Create the directory that will hold the registry blobs on the NFS share (I ran this on r0, but any node that exports /data/nfs/k3svolumes works):
    +Added to Alloy configuration after the log collection pipeline:

    - -
    [root@r0 ~]# mkdir -p /data/nfs/k3svolumes/registry
    +
    +// OTLP receiver for traces via gRPC and HTTP
    +otelcol.receiver.otlp "default" {
    +  grpc {
    +    endpoint = "0.0.0.0:4317"
    +  }
    +  http {
    +    endpoint = "0.0.0.0:4318"
    +  }
    +  output {
    +    traces = [otelcol.processor.batch.default.input]
    +  }
    +}
    +
    +// Batch processor for efficient trace forwarding
    +otelcol.processor.batch "default" {
    +  timeout = "5s"
    +  send_batch_size = 100
    +  send_batch_max_size = 200
    +  output {
    +    traces = [otelcol.exporter.otlp.tempo.input]
    +  }
    +}
    +
    +// OTLP exporter to send traces to Tempo
    +otelcol.exporter.otlp "tempo" {
    +  client {
    +    endpoint = "tempo.monitoring.svc.cluster.local:4317"
    +    tls {
    +      insecure = true
    +    }
    +    compression = "gzip"
    +  }
    +}
     

    -

    Install (or upgrade) the chart


    +The batch processor reduces network overhead by accumulating spans before forwarding to Tempo.

    -Clone the repo (or pull the latest changes) on a workstation that has helm configured for the cluster, then deploy the chart. The Justfile wraps the commands, but the raw Helm invocation looks like this:
    +#### Upgrade Alloy

    - -
    $ git clone https://codeberg.org/snonux/conf/f3s.git
    -$ cd conf/f3s/examples/conf/f3s/registry
    -$ helm upgrade --install registry ./helm-chart --namespace infra --create-namespace
    +
    +cd /home/paul/git/conf/f3s/loki
    +just upgrade
    +
    +
    +Verify OTLP receivers are listening:
    +
    +
    +kubectl logs -n monitoring -l app.kubernetes.io/name=alloy | grep -i "otlp.*receiver"
    +kubectl exec -n monitoring <alloy-pod> -- netstat -ln | grep -E ':(4317|4318)'
     

    -Helm creates the infra namespace if it does not exist, provisions a PersistentVolume/PersistentVolumeClaim pair that points at /data/nfs/k3svolumes/registry, and spins up a single registry pod exposed via the docker-registry-service NodePort (30001). Verify everything is up before continuing:
    +

    Demo Tracing Application



    - -
    $ kubectl get pods --namespace infra
    -NAME                               READY   STATUS    RESTARTS      AGE
    -docker-registry-6bc9bb46bb-6grkr   1/1     Running   6 (53d ago)   54d
    -
    -$ kubectl get svc docker-registry-service -n infra
    -NAME                      TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
    -docker-registry-service   NodePort   10.43.141.56   <none>        5000:30001/TCP   54d
    +Created a three-tier Python application to demonstrate distributed tracing in action.
    +
    +#### Application Architecture
    +
    +
    +User → Frontend (Flask:5000) → Middleware (Flask:5001) → Backend (Flask:5002)
    +           ↓                          ↓                        ↓
    +                    Alloy (OTLP:4317) → Tempo → Grafana
     

    -

    Allow nodes and workstations to trust the registry


    +Frontend Service:

    -The registry listens on plain HTTP, so both Docker daemons on workstations and the k3s nodes need to treat it as an insecure registry. That's fine for my personal needs, as:
    +
      +
    • Receives HTTP requests at /api/process
    • +
    • Forwards to middleware service
    • +
    • Creates parent span for the entire request
    • +

    +Middleware Service:

      -
    • I don't store any secrets in the images
    • -
    • I access the registry this way only via my LAN
    • -
    • I may will change it later on...
    • +
    • Transforms data at /api/transform
    • +
    • Calls backend service
    • +
    • Creates child span linked to frontend

    -On my Fedora workstation where I build images:
    +Backend Service:

    - -
    $ cat <<"EOF" | sudo tee /etc/docker/daemon.json >/dev/null
    -{
    -  "insecure-registries": [
    -    "r0.lan.buetow.org:30001",
    -    "r1.lan.buetow.org:30001",
    -    "r2.lan.buetow.org:30001"
    -  ]
    -}
    -EOF
    -$ sudo systemctl restart docker
    +
      +
    • Returns data at /api/data
    • +
    • Simulates database query (100ms sleep)
    • +
    • Creates leaf span in the trace
    • +

    +OpenTelemetry Instrumentation:
    +
    +All services use Python OpenTelemetry libraries:
    +
    +**Dependencies:**
    +
    +flask==3.0.0
    +requests==2.31.0
    +opentelemetry-distro==0.49b0
    +opentelemetry-exporter-otlp==1.28.0
    +opentelemetry-instrumentation-flask==0.49b0
    +opentelemetry-instrumentation-requests==0.49b0
     

    -On each k3s node, make registry.lan.buetow.org resolve locally and point k3s at the NodePort:
    +**Auto-instrumentation pattern** (used in all services):

    -
    $ for node in r0 r1 r2; do
    ->   ssh root@$node "echo '127.0.0.1 registry.lan.buetow.org' >> /etc/hosts"
    -> done
    +
    from opentelemetry import trace
    +from opentelemetry.sdk.trace import TracerProvider
    +from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
    +from opentelemetry.instrumentation.flask import FlaskInstrumentor
    +from opentelemetry.instrumentation.requests import RequestsInstrumentor
    +from opentelemetry.sdk.resources import Resource
     
    -$ for node in r0 r1 r2; do
    -> ssh root@$node "cat <<'EOF' > /etc/rancher/k3s/registries.yaml
    -mirrors:
    -  "registry.lan.buetow.org:30001":
    -    endpoint:
    -      - "http://localhost:30001"
    -EOF
    -systemctl restart k3s"
    -> done
    +# Define service identity
    +resource = Resource(attributes={
    +    "service.name": "frontend",
    +    "service.namespace": "tracing-demo",
    +    "service.version": "1.0.0"
    +})
    +
    +provider = TracerProvider(resource=resource)
    +
    +# Export to Alloy
    +otlp_exporter = OTLPSpanExporter(
    +    endpoint="http://alloy.monitoring.svc.cluster.local:4317",
    +    insecure=True
    +)
    +
    +processor = BatchSpanProcessor(otlp_exporter)
    +provider.add_span_processor(processor)
    +trace.set_tracer_provider(provider)
    +
    +# Auto-instrument Flask and requests
    +FlaskInstrumentor().instrument_app(app)
    +RequestsInstrumentor().instrument()
     

    -Thanks to the relayd configuration earlier in the post, the external hostnames (f3s.foo.zone, etc.) can already reach NodePort 30001, so publishing the registry later to the outside world is just a matter of wiring the DNS the same way as the ingress hosts. But by default, that's not enabled for now due to security reasons.
    +The auto-instrumentation automatically:
    +
      +
    • Creates spans for HTTP requests
    • +
    • Propagates trace context via W3C Trace Context headers
    • +
    • Links parent and child spans across service boundaries
    • +

    +Deployment:

    -

    Pushing and pulling images


    +Created Helm chart in /home/paul/git/conf/f3s/tracing-demo/ with three separate deployments, services, and an ingress.

    -Tag any locally built image with one of the node IPs on port 30001, then push it. I usually target whichever node is closest to me, but any of the three will do:
    +Build and deploy:

    - -
    $ docker tag my-app:latest r0.lan.buetow.org:30001/my-app:latest
    -$ docker push r0.lan.buetow.org:30001/my-app:latest
    +
    +cd /home/paul/git/conf/f3s/tracing-demo
    +just build
    +just import
    +just install
     

    -Inside the cluster (or from other nodes), reference the image via the service name that Helm created:
    +Verify deployment:

    -image: docker-registry-service:5000/my-app:latest
    +kubectl get pods -n services | grep tracing-demo
    +kubectl get ingress -n services tracing-demo-ingress
     

    -You can test the pull path straight away:
    +Access the application at:

    - -
    $ kubectl run registry-test \
    ->   --image=docker-registry-service:5000/my-app:latest \
    ->   --restart=Never -n test --command -- sleep 300
    -
    +http://tracing-demo.f3s.buetow.org

    -If the pod pulls successfully, the private registry is ready for use by the rest of the workloads. Note, that the commands above actually don't work, they are only for illustration purpose mentioned here.
    +

    Visualizing Traces in Grafana



    -

    Example: Anki Sync Server from the private registry


    +The Tempo datasource is automatically discovered by Grafana through the ConfigMap label.

    -One of the first workloads I migrated onto the k3s cluster after standing up the registry was my Anki sync server. The configuration repo ships everything in examples/conf/f3s/anki-sync-server/: a Docker build context plus a Helm chart that references the freshly built image.
    +#### Accessing Traces

    -

    Build and push the image


    +Navigate to Grafana → Explore → Select "Tempo" datasource

    -The Dockerfile lives under docker-image/ and takes the Anki release to compile as an ANKI_VERSION build argument. The accompanying Justfile wraps the steps, but the raw commands look like this:
    +**Search Interface:**
    +
      +
    • Search by Trace ID
    • +
    • Search by service name
    • +
    • Search by tags
    • +

    +**TraceQL Queries:**

    - -
    $ cd conf/f3s/examples/conf/f3s/anki-sync-server/docker-image
    -$ docker build -t anki-sync-server:25.07.5b --build-arg ANKI_VERSION=25.07.5 .
    -$ docker tag anki-sync-server:25.07.5b \
    -    r0.lan.buetow.org:30001/anki-sync-server:25.07.5b
    -$ docker push r0.lan.buetow.org:30001/anki-sync-server:25.07.5b
    +Find all traces from demo app:
    +
    +{ resource.service.namespace = "tracing-demo" }
     

    -Because every k3s node treats registry.lan.buetow.org:30001 as an insecure mirror (see above), the push succeeds regardless of which node answers. If you prefer the shortcut, just f3s in that directory performs the same build/tag/push sequence.
    +Find slow requests (>200ms):
    +
    +{ duration > 200ms }
    +

    -

    Create the Anki secret and storage on the cluster


    +Find traces from specific service:
    +
    +{ resource.service.name = "frontend" }
    +

    -The Helm chart expects the services namespace, a pre-created NFS directory, and a Kubernetes secret that holds the credentials the upstream container understands:
    +Find errors:
    +
    +{ status = error }
    +

    - -
    $ ssh root@r0 "mkdir -p /data/nfs/k3svolumes/anki-sync-server/anki_data"
    -$ kubectl create namespace services
    -$ kubectl create secret generic anki-sync-server-secret \
    -    --from-literal=SYNC_USER1='paul:SECRETPASSWORD' \
    -    -n services
    +Complex query - frontend traces calling middleware:
    +
    +{ resource.service.namespace = "tracing-demo" } && { span.http.status_code >= 500 }
     

    -If the services namespace already exists, you can skip that line or let Kubernetes tell you the namespace is unchanged.
    +#### Service Graph Visualization

    -

    Deploy the chart


    +The service graph shows visual connections between services:

    -With the prerequisites in place, install (or upgrade) the chart. It pins the container image to the tag we just pushed and mounts the NFS export via a PersistentVolume/PersistentVolumeClaim pair:
    +1. Navigate to Explore → Tempo
    +2. Enable "Service Graph" view
    +3. Shows: Frontend → Middleware → Backend with request rates

    - -
    $ cd ../helm-chart
    -$ helm upgrade --install anki-sync-server . -n services
    -
    +The service graph uses Prometheus metrics generated from trace data.

    -Helm provisions everything referenced in the templates:
    +

    Correlation Between Observability Signals



    -
    -containers:
    -- name: anki-sync-server  image: registry.lan.buetow.org:30001/anki-sync-server:25.07.5b
    -  volumeMounts:
    -  - name: anki-data
    -    mountPath: /anki_data
    -
    +Tempo integrates with Loki and Prometheus to provide unified observability.

    -Once the release comes up, verify that the pod pulled the freshly pushed image and that the ingress we configured earlier resolves through relayd just like the Apache example.
    +#### Traces-to-Logs

    - -
    $ kubectl get pods -n services
    -$ kubectl get ingress anki-sync-server-ingress -n services
    -$ curl https://anki.f3s.foo.zone/health
    -
    +Click on any span in a trace to see related logs:

    -All of this runs solely on first-party images that now live in the private registry, proving the full flow from local bild to WireGuard-exposed service.
    +1. View trace in Grafana
    +2. Click on a span
    +3. Select "Logs for this span"
    +4. Loki shows logs filtered by:
    + * Time range (span duration ± 1 hour)
    + * Service name
    + * Namespace
    + * Pod

    -

    NFSv4 UID mapping for Postgres-backed (and other) apps


    +This helps correlate what the service was doing when the span was created.

    -NFSv4 only sees numeric user and group IDs, so the postgres account created inside the container must exist with the same UID/GID on the Kubernetes worker and on the FreeBSD NFS servers. Otherwise the pod starts with UID 999, the export sees it as an unknown anonymous user, and Postgres fails to initialise its data directory.
    +#### Traces-to-Metrics

    -To verify things line up end-to-end I run id in the container and on the hosts:
    +View Prometheus metrics for services in the trace:

    - -
    > ~ kubectl exec -n services deploy/miniflux-postgres -- id postgres
    -uid=999(postgres) gid=999(postgres) groups=999(postgres)
    -
    -[root@r0 ~]# id postgres
    -uid=999(postgres) gid=999(postgres) groups=999(postgres)
    -
    -paul@f0:~ % doas id postgres
    -uid=999(postgres) gid=99(postgres) groups=999(postgres)
    -
    +1. View trace in Grafana
    +2. Select "Metrics" tab
    +3. Shows metrics like:
    + * Request rate
    + * Error rate
    + * Duration percentiles

    -The Rocky Linux workers get their matching user with plain useradd/groupadd (repeat on r0, r1, and r2):
    +#### Logs-to-Traces

    - -
    [root@r0 ~]# groupadd --gid 999 postgres
    -[root@r0 ~]# useradd --uid 999 --gid 999 \
    -                --home-dir /var/lib/pgsql \
    -                --shell /sbin/nologin postgres
    -
    +From logs, you can jump to related traces:

    -FreeBSD uses pw, so on each NFS server (f0, f1, f2) I created the same account and disabled shell access:
    +1. In Loki, logs that contain trace IDs are automatically linked
    +2. Click the trace ID to view the full trace
    +3. See the complete request flow

    - -
    paul@f0:~ % doas pw groupadd postgres -g 999
    -paul@f0:~ % doas pw useradd postgres -u 999 -g postgres \
    -                -d /var/db/postgres -s /usr/sbin/nologin
    -
    +

    Generating Traces for Testing



    -Once the UID/GID exist everywhere, the Miniflux chart in examples/conf/f3s/miniflux deploys cleanly. The chart provisions both the application and its bundled Postgres database, mounts the exported directory, and builds the DSN at runtime. The important bits live in helm-chart/templates/persistent-volumes.yaml and deployment.yaml:
    +Test the demo application:

    -# Persistent volume lives on the NFS export
    -hostPath:
    -  path: /data/nfs/k3svolumes/miniflux/data
    -  type: Directory
    -...
    -containers:
    -- name: miniflux-postgres
    -  image: postgres:17
    -  volumeMounts:
    -  - name: miniflux-postgres-data
    -    mountPath: /var/lib/postgresql/data
    -
    -
    -Follow the README beside the chart to create the secrets and the target directory:
    -
    - -
    $ cd examples/conf/f3s/miniflux/helm-chart
    -$ mkdir -p /data/nfs/k3svolumes/miniflux/data
    -$ kubectl create secret generic miniflux-db-password \
    -    --from-literal=fluxdb_password='YOUR_PASSWORD' -n services
    -$ kubectl create secret generic miniflux-admin-password \
    -    --from-literal=admin_password='YOUR_ADMIN_PASSWORD' -n services
    -$ helm upgrade --install miniflux . -n services --create-namespace
    +curl http://tracing-demo.f3s.buetow.org/api/process
     

    -And to verify it's all up:
    +Load test (generates 50 traces):

    -$ kubectl get all --namespace=services | grep mini
    -pod/miniflux-postgres-556444cb8d-xvv2p   1/1     Running   0             54d
    -pod/miniflux-server-85d7c64664-stmt9     1/1     Running   0             54d
    -service/miniflux                   ClusterIP   10.43.47.80     <none>        8080/TCP             54d
    -service/miniflux-postgres          ClusterIP   10.43.139.50    <none>        5432/TCP             54d
    -deployment.apps/miniflux-postgres   1/1     1            1           54d
    -deployment.apps/miniflux-server     1/1     1            1           54d
    -replicaset.apps/miniflux-postgres-556444cb8d   1         1         1       54d
    -replicaset.apps/miniflux-server-85d7c64664     1         1         1       54d
    +cd /home/paul/git/conf/f3s/tracing-demo
    +just load-test
     

    -Or from the repository root I simply run:
    -
    -

    Helm charts currently in service


    -
    -These are the charts that already live under examples/conf/f3s and run on the cluster today (and I'll keep adding more as new services graduate into production):
    +Each request creates a distributed trace spanning all three services.

    -
      -
    • anki-sync-server — custom-built image served from the private registry, stores decks on /data/nfs/k3svolumes/anki-sync-server/anki_data, and authenticates through the anki-sync-server-secret.
    • -
    • koreade-sync-server — Sync server for KOReader.
    • -
    • audiobookshelf — media streaming stack with three hostPath mounts (config, audiobooks, podcasts) so the library survives node rebuilds.
    • -
    • example-apache — minimal HTTP service I use for smoke-testing ingress and relayd rules.
    • -
    • example-apache-volume-claim — Apache plus PVC variant that exercises NFS-backed storage for walkthroughs like the one earlier in this post.
    • -
    • miniflux — the Postgres-backed feed reader described above, wired for NFSv4 UID mapping and per-release secrets.
    • -
    • opodsync — podsync deployment with its data directory under /data/nfs/k3svolumes/opodsync/data.
    • -
    • radicale — CalDAV/CardDAV (and gpodder) backend with separate collections and auth volumes.
    • -
    • registry — the plain-HTTP Docker registry exposed on NodePort 30001 and mirrored internally as registry.lan.buetow.org:30001.
    • -
    • syncthing — two-volume setup for config and shared data, fronted by the syncthing.f3s.foo.zone ingress.
    • -
    • wallabag — read-it-later service with persistent data and images directories on the NFS export.
    • -

    -I hope you enjoyed this walkthrough. Read the next post of this series:
    +

    Verifying the Complete Pipeline



    -f3s: Kubernetes with FreeBSD - Part 8: Observability
    +Check the trace flow end-to-end:

    -Other *BSD-related posts:
    +**1. Application generates traces:**
    +
    +kubectl logs -n services -l app=tracing-demo-frontend | grep -i trace
    +

    -2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability
    -2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments (You are currently reading this)
    -2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    -2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    -2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    -2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    -2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    -2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    -2024-04-01 KISS high-availability with OpenBSD
    -2024-01-13 One reason why I love OpenBSD
    -2022-10-30 Installing DTail on OpenBSD
    -2022-07-30 Let's Encrypt with OpenBSD and Rex
    -2016-04-09 Jails and ZFS with Puppet on FreeBSD
    +**2. Alloy receives traces:**
    +
    +kubectl logs -n monitoring -l app.kubernetes.io/name=alloy | grep -i otlp
    +

    -E-Mail your comments to paul@nospam.buetow.org
    +**3. Tempo stores traces:**
    +
    +kubectl logs -n monitoring -l app.kubernetes.io/name=tempo | grep -i trace
    +

    -Back to the main site
    -
    -
    -
    - - Bash Golf Part 4 - - gemini://foo.zone/gemfeed/2025-09-14-bash-golf-part-4.gmi - 2025-09-13T12:04:03+03:00 - - Paul Buetow aka snonux - paul@dev.buetow.org - - This is the fourth blog post about my Bash Golf series. This series is random Bash tips, tricks, and weirdnesses I have encountered over time. - -
    -

    Bash Golf Part 4


    +**4. Grafana displays traces:**
    +Navigate to Explore → Tempo → Search for traces

    -Published at 2025-09-13T12:04:03+03:00
    +

    Practical Example: Viewing a Distributed Trace



    -This is the fourth blog post about my Bash Golf series. This series is random Bash tips, tricks, and weirdnesses I have encountered over time.
    +Let's generate a trace and examine it in Grafana.

    -2021-11-29 Bash Golf Part 1
    -2022-01-01 Bash Golf Part 2
    -2023-12-10 Bash Golf Part 3
    -2025-09-14 Bash Golf Part 4 (You are currently reading this)
    +**1. Generate a trace by calling the demo application:**

    -    '\       '\        '\        '\                   .  .        |>18>>
    -      \        \         \         \              .         ' .   |
    -     O>>      O>>       O>>       O>>         .                 'o |
    -      \       .\. ..    .\. ..    .\. ..   .                      |
    -      /\    .  /\     .  /\     .  /\    . .                      |
    -     / /   .  / /  .'.  / /  .'.  / /  .'    .                    |
    -jgs^^^^^^^`^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    -                        Art by Joan Stark, mod. by Paul Buetow
    +curl -H "Host: tracing-demo.f3s.buetow.org" http://r0/api/process
     

    -

    Table of Contents


    +**Response (HTTP 200):**

    -
    -

    Split pipelines with tee + process substitution


    + +
    {
    +  "middleware_response": {
    +    "backend_data": {
    +      "data": {
    +        "id": 12345,
    +        "query_time_ms": 100.0,
    +        "timestamp": "2025-12-28T18:35:01.064538",
    +        "value": "Sample data from backend service"
    +      },
    +      "service": "backend"
    +    },
    +    "middleware_processed": true,
    +    "original_data": {
    +      "source": "GET request"
    +    },
    +    "transformation_time_ms": 50
    +  },
    +  "request_data": {
    +    "source": "GET request"
    +  },
    +  "service": "frontend",
    +  "status": "success"
    +}
    +

    -Sometimes you want to fan out one stream to multiple consumers and still continue the original pipeline. tee plus process substitution does exactly that:
    +**2. Find the trace in Tempo via API:**
    +
    +After a few seconds (for batch export), search for recent traces:

    -somecommand \
    -    | tee >(command1) >(command2) \
    -    | command3
    +kubectl exec -n monitoring tempo-0 -- wget -qO- \
    +  'http://localhost:3200/api/search?tags=service.namespace%3Dtracing-demo&limit=5' 2>/dev/null | \
    +  python3 -m json.tool
     

    -All of command1, command2, and command3 see the output of somecommand. Example:
    +Returns traces including:

    -
    printf 'a\nb\n' \
    -    | tee >(sed 's/.*/X:&/; s/$/ :c1/') >(tr a-z A-Z | sed 's/$/ :c2/') \
    -    | sed 's/$/ :c3/'
    +
    {
    +  "traceID": "4be1151c0bdcd5625ac7e02b98d95bd5",
    +  "rootServiceName": "frontend",
    +  "rootTraceName": "GET /api/process",
    +  "durationMs": 221
    +}
    +
    +
    +**3. Fetch complete trace details:**
    +
    +
    +kubectl exec -n monitoring tempo-0 -- wget -qO- \
    +  'http://localhost:3200/api/traces/4be1151c0bdcd5625ac7e02b98d95bd5' 2>/dev/null | \
    +  python3 -m json.tool
     

    -Output:
    +**Trace structure (8 spans across 3 services):**

    -a :c3
    -b :c3
    -A :c2 :c3
    -B :c2 :c3
    -X:a :c1 :c3
    -X:b :c1 :c3
    +Trace ID: 4be1151c0bdcd5625ac7e02b98d95bd5
    +Services: 3 (frontend, middleware, backend)
    +
    +Service: frontend
    +  └─ GET /api/process                 221.10ms  (HTTP server span)
    +  └─ frontend-process                 216.23ms  (custom business logic span)
    +  └─ POST                             209.97ms  (HTTP client span to middleware)
    +
    +Service: middleware
    +  └─ POST /api/transform              186.02ms  (HTTP server span)
    +  └─ middleware-transform             180.96ms  (custom business logic span)
    +  └─ GET                              127.52ms  (HTTP client span to backend)
    +
    +Service: backend
    +  └─ GET /api/data                    103.93ms  (HTTP server span)
    +  └─ backend-get-data                 102.11ms  (custom business logic span with 100ms sleep)
     

    -This relies on Bash process substitution (>(...)). Make sure your shell is Bash and not a POSIX /bin/sh.
    +**4. View the trace in Grafana UI:**

    -Example (fails under dash/POSIX sh):
    +Navigate to: Grafana → Explore → Tempo datasource

    - -
    /bin/sh -c 'echo hi | tee >(cat)'
    -# /bin/sh: 1: Syntax error: "(" unexpected
    +Search using TraceQL:
    +
    +{ resource.service.namespace = "tracing-demo" }
     

    -Combine with set -o pipefail if failures in side branches should fail the whole pipeline.
    +Or directly open the trace by pasting the trace ID in the search box:
    +
    +4be1151c0bdcd5625ac7e02b98d95bd5
    +

    -Example:
    +**5. Trace visualization:**

    - -
    set -o pipefail
    -printf 'ok\n' | tee >(false) | cat >/dev/null
    -echo $?   # 1 because a side branch failed
    -
    +The trace waterfall view in Grafana shows the complete request flow with timing:

    -Further reading:
    +Distributed trace visualization in Grafana Tempo showing Frontend → Middleware → Backend spans

    -Splitting pipelines with tee
    +For additional examples of Tempo trace visualization, see also:

    -

    Heredocs for remote sessions (and their gotchas)


    +X-RAG Observability Hackathon (more Grafana Tempo screenshots)

    -Heredocs are great to send multiple commands over SSH in a readable way:
    +The trace reveals the distributed request flow:

    - -
    ssh "$SSH_USER@$SSH_HOST" <<EOF
    -    # Go to the work directory
    -    cd "$WORK_DIR"
    -  
    -    # Make a git pull
    -    git pull
    -  
    -    # Export environment variables required for the service to run
    -    export AUTH_TOKEN="$APP_AUTH_TOKEN"
    -  
    -    # Start the service
    -    docker compose up -d --build
    -EOF
    -
    +
      +
    • Frontend (221ms): Receives GET /api/process, executes business logic, calls middleware
    • +
    • Middleware (186ms): Receives POST /api/transform, transforms data, calls backend
    • +
    • Backend (104ms): Receives GET /api/data, simulates database query with 100ms sleep
    • +
    • Total request time: 221ms end-to-end
    • +
    • Span propagation: W3C Trace Context headers automatically link all spans
    • +

    +**6. Service graph visualization:**

    -Tips:
    +The service graph is automatically generated from traces and shows service dependencies. For examples of service graph visualization in Grafana, see the screenshots in the X-RAG Observability Hackathon blog post.

    -Quoting the delimiter changes interpolation. Use <<'EOF' to avoid local expansion and send the content literally.
    +X-RAG Observability Hackathon (includes service graph screenshots)

    -Example:
    +This visualization helps identify:

    - -
    FOO=bar
    -cat <<'EOF'
    -$FOO is not expanded here
    -EOF
    -
    +
      +
    • Request rates between services
    • +
    • Average latency for each hop
    • +
    • Error rates (if any)
    • +
    • Service dependencies and communication patterns
    • +

    +

    Storage and Retention



    -Prefer explicit quoting for variables (as above) to avoid surprises. Example (spaces preserved only when quoted):
    +Monitor Tempo storage usage:

    - -
    WORK_DIR="/tmp/my work"
    -ssh host <<EOF
    -    cd $WORK_DIR      # may break if unquoted
    -    cd "$WORK_DIR"   # safe
    -EOF
    +
    +kubectl exec -n monitoring <tempo-pod> -- df -h /var/tempo
     

    -Consider set -euo pipefail at the top of the remote block for stricter error handling. Example:
    +With 10Gi storage and 7-day retention, the system handles moderate trace volumes. If storage fills up:

    - -
    ssh host <<'EOF'
    -    set -euo pipefail
    -    false   # causes immediate failure
    -    echo never
    -EOF
    -
    +
      +
    • Reduce retention to 72h (3 days)
    • +
    • Implement sampling in Alloy
    • +
    • Increase PV size
    • +

    +

    Configuration Files



    -Indent-friendly variant: use a dash to strip leading tabs in the body:
    +All configuration files are available on Codeberg:

    - -
    cat <<-EOF > script.sh
    -	#!/usr/bin/env bash
    -	echo "tab-indented content is dedented"
    -EOF
    -
    +Tempo configuration
    +Alloy configuration (updated for traces)
    +Demo tracing application

    -Further reading:
    +

    Summary



    -Heredoc headaches and fixes
    +With Prometheus, Grafana, Loki, Alloy, and Tempo deployed, I now have complete visibility into the k3s cluster, the FreeBSD storage servers, and the OpenBSD edge relays:

    -

    Namespacing and dynamic dispatch with ::


    +
      +
    • Metrics: Prometheus collects and stores time-series data from all components, including etcd and ZFS
    • +
    • Logs: Loki aggregates logs from all containers, searchable via Grafana
    • +
    • Traces: Tempo provides distributed request tracing with service dependency mapping
    • +
    • Visualisation: Grafana provides dashboards and exploration tools with correlation between all three signals
    • +
    • Alerting: Alertmanager can notify on conditions defined in Prometheus rules
    • +

    +This observability stack runs entirely on the home lab infrastructure, with data persisted to the NFS share. It's lightweight enough for a three-node cluster but provides the same capabilities as production-grade setups.

    -You can emulate simple namespacing by encoding hierarchy in function names. One neat pattern is pseudo-inheritance via a tiny super helper that maps pkg::lang::action to a pkg::base::action default.
    +All configuration files are available on Codeberg:

    - -
    #!/usr/bin/env bash
    -set -euo pipefail
    -
    -super() {
    -    local -r fn=${FUNCNAME[1]}
    -    # Split name on :: and dispatch to base implementation
    -    local -a parts=( ${fn//::/ } )
    -    "${parts[0]}::base::${parts[2]}" "$@"
    -}
    -
    -foo::base::greet() { echo "base: $@"; }
    -foo::german::greet()  { super "Guten Tag, $@!"; }
    -foo::english::greet() { super "Good day,  $@!"; }
    -
    -for lang in german english; do
    -    foo::$lang::greet Paul
    -done
    -
    +Prometheus, Grafana, and recording rules configuration
    +Loki and Alloy configuration
    +Tempo configuration
    +Demo tracing application

    -Output:
    +Other *BSD-related posts:

    -
    -base: Guten Tag, Paul!
    -base: Good day,  Paul!
    -
    +2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability (You are currently reading this)
    +2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    +2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    +2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    +2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    +2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    +2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    +2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    +2024-04-01 KISS high-availability with OpenBSD
    +2024-01-13 One reason why I love OpenBSD
    +2022-10-30 Installing DTail on OpenBSD
    +2022-07-30 Let's Encrypt with OpenBSD and Rex
    +2016-04-09 Jails and ZFS with Puppet on FreeBSD

    -

    Indirect references with namerefs


    +E-Mail your comments to paul@nospam.buetow.org

    -declare -n creates a name reference — a variable that points to another variable. It’s cleaner than eval for indirection:
    +Back to the main site
    +
    +
    +
    + + 'The Courage To Be Disliked' book notes + + gemini://foo.zone/gemfeed/2025-11-02-the-courage-to-be-disliked-book-notes.gmi + 2025-11-01T17:28:38+02:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + These are my personal book notes from Ichiro Kishimi and Fumitake Koga's 'The Courage To Be Disliked'. They are for me, but I hope they might be useful to you too. + +
    +

    "The Courage To Be Disliked" book notes



    - -
    user_name=paul
    -declare -n ref=user_name
    -echo "$ref"       # paul
    -ref=julia
    -echo "$user_name" # julia
    -
    +Published at 2025-11-01T17:28:38+02:00

    -Output:
    +These are my personal book notes from Ichiro Kishimi and Fumitake Koga's "The Courage To Be Disliked". They are for me, but I hope they might be useful to you too.

    -paul
    -julia
    +         ,..........   ..........,
    +     ,..,'          '.'          ',..,
    +    ,' ,'            :            ', ',
    +   ,' ,'             :             ', ',
    +  ,' ,'              :              ', ',
    + ,' ,'............., : ,.............', ',
    +,'  '............   '.'   ............'  ',
    + '''''''''''''''''';''';''''''''''''''''''
    +                    '''
     

    -Namerefs are local to functions when declared with local -n. Requires Bash ≥4.3.
    +

    Table of Contents



    -You can also construct the target name dynamically:
    +
    +

    The Nature of Life and Happiness



    - -
    make_var() {
    -    local idx=$1; shift
    -    local name="slot_$idx"
    -    printf -v "$name" '%s' "$*"   # create variable slot_$idx
    -}
    -
    -get_var() {
    -    local idx=$1
    -    local -n ref="slot_$idx"      # bind ref to slot_$idx
    -    printf '%s\n' "$ref"
    -}
    -
    -make_var 7 "seven"
    -get_var 7
    -
    +
      +
    • Life and the world are fundamentally simple; we are the ones who make them complicated. Drama does not exist.
    • +
    • Happiness is a choice and is attainable for everyone. Often, we lack the courage to be happy because it's easier to stay in a familiar, albeit unhappy, situation than to choose a new lifestyle, which may bring anxiety and unknowns.
    • +
    • Unhappiness is something you choose for yourself.
    • +

    +

    Subjective Reality and Perception



    -Output:
    +
      +
    • Our perception of the world is subjective. We don't see the world as it is, but as we are.
    • +
    • The world you see is different from the one I see, and it's impossible to truly share your world with anyone else.
    • +

    +This is illustrated by the "10 people" example: if one person dislikes you, two love you, and seven are indifferent, focusing only on the one who dislikes you gives a distorted and negative view of your life. You are focusing on a tiny, insignificant part and judging the whole by it.

    -
    -seven
    -
    +The challenge is to find the courage to see the world directly, without the filters of our own subjective views.

    -

    Function declaration forms


    +

    The Power to Change and the Role of the Past



    -All of these work in Bash, but only the first one is POSIX-ish:
    +
      +
    • We are not defined by our past experiences but by the meaning we assign to them. The past does not determine our future.
    • +
    • The book rejects Freudian etiology (the idea that past trauma defines us) in favor of teleology (the idea that we are driven by our present goals).
    • +
    • Change is possible for everyone at any moment, regardless of their circumstances or age. This change must come from your own doing, not from others.
    • +
    • We live in accordance with our present goals, not past causes. The past does not exist; the only issue is the present.
    • +
    • Emotions, like anger, can be fabricated tools used to achieve a goal (e.g., to control or shout at someone) rather than uncontrollable forces that rule us.
    • +

    +

    Self-Acceptance, Lifestyle, and Life Lies



    - -
    foo() { echo foo; }
    -function foo { echo foo; }
    -function foo() { echo foo; }
    -
    +
      +
    • Your "lifestyle"—your worldview and outlook on life—is a choice, not a fixed personality trait. You can change it instantly.
    • +
    • The key is self-acceptance, not self-affirmation. Accept what you cannot change and have the courage to change what you can.
    • +
    • You cannot be reborn as someone else. It is better to learn to love yourself and make the best use of the "equipment" you were born with.
    • +
    • Workaholism is a "life lie." It is a form of being in disharmony with life, using work as an excuse to avoid other life tasks and responsibilities.
    • +

    +

    Interpersonal Relationships



    -Recommendation: prefer name() { ... } for portability and consistency.
    +
      +
    • All problems are, at their core, problems of interpersonal relationships. To escape all problems would mean to live alone in the universe, which is impossible.
    • +
    • The book identifies three "Life Tasks" that everyone faces: the task of work, the task of friendship, and the task of love.
    • +
    • **Competition:** Life is not a competition. When we stop comparing ourselves to others, we cease to see them as enemies. They become comrades, and we can genuinely celebrate their successes. This removes the fear of losing and allows for peace.
    • +
    • **Power Struggles:** When someone is angry with you, recognize it as their attempt at a power struggle. The person who attacks you is the one with the problem. Do not get drawn in. Arguing about who is right or wrong is a trap. Admitting a fault is not a defeat.
    • +
    • **Horizontal vs. Vertical Relationships:** Strive for "horizontal relationships" based on equality, rather than "vertical relationships" based on hierarchy. Praise and rebuke are forms of manipulation found in vertical relationships. Instead, offer encouragement. (Note: The original author expresses disagreement with applying this to children, feeling a hierarchy is necessary and that children appreciate praise).
    • +
    • **Separation of Tasks:** Understand what is your responsibility and what is someone else's. For example, if someone takes advantage of your trust, that is their task. Your task is to decide whether to trust them in the first place.
    • +
    • **Confidence in Others:** Having unconditional confidence in others helps build deep relationships and a sense of belonging, turning others into comrades.
    • +

    +

    Inferiority and Superiority



    -

    Chaining function calls in conditionals


    +
      +
    • A feeling of inferiority is not inherently bad; it can be a catalyst for growth when we compare ourselves to our ideal self. This "pursuit of superiority" drives progress.
    • +
    • This is different from an "inferiority complex," which is using feelings of inadequacy as an excuse to avoid change and responsibility.
    • +
    • Value is based on a social context. An object's worth is subjective and can be reinterpreted.
    • +

    +

    Community, Contribution, and Happiness



    -Functions return a status like commands. You can short-circuit them in conditionals:
    +
      +
    • The definition of happiness is the feeling of contribution.
    • +
    • A true sense of self-worth comes from feeling useful to a community (the "community feeling").
    • +
    • This contribution doesn't have to be grand. You can be of worth to the community simply by being.
    • +
    • When you have a genuine feeling of contribution, you no longer need recognition or praise from others.
    • +

    +

    Living in the Here and Now



    - -
    deploy_check() { test -f deploy.yaml; }
    -smoke_test()   { curl -fsS http://localhost/healthz >/dev/null; }
    -
    -if deploy_check || smoke_test; then
    -    echo "All good."
    -else
    -    echo "Something failed." >&2
    -fi
    -
    +
      +
    • Life is a series of moments ("dots"), not a continuous line. We should live fully in the "here and now."
    • +
    • The greatest life lie is to dwell on the past and the future, which do not exist, instead of focusing on the present moment.
    • +
    • Focus on the process, not just the outcome. The goal of a dance is the dancing itself, not just reaching a destination.
    • +

    +

    The Courage to Be Normal



    -You can also compress it golf-style:
    +
      +
    • Why does everyone want to be special? Is it inferior to be normal?
    • +
    • Embracing being normal, instead of striving for a special status, is a form of courage. In the grander sense, isn't everyone normal?
    • +

    +

    Freedom is Being Disliked



    - -
    deploy_check || smoke_test && echo ok || echo fail >&2
    -
    +
      +
    • The price of true freedom is to be disliked by other people. It is a sign that you are living in accordance with your own principles.
    • +

    +

    The Meaning of Life



    -

    Grep, sed, awk quickies


    +
      +
    • Life has no inherent meaning. It is up to each individual to assign meaning to their own life.
    • +
    • Do not be afraid of being disliked by others for living your life according to the meaning you create.
    • +
    • You have the power to change yourself, and in doing so, you change your world. No one else can change it for you.
    • +

    +E-Mail your comments to paul@nospam.buetow.org :-)

    -Word match and context: grep -w word file; with context: grep -C3 foo file (same as -A3 -B3). Example:
    +Other book notes of mine are:

    - -
    cat > /tmp/ctx.txt <<EOF
    -one
    -foo
    -two
    -three
    -bar
    -EOF
    -grep -C1 foo /tmp/ctx.txt
    -
    +2025-11-02 'The Courage To Be Disliked' book notes (You are currently reading this)
    +2025-06-07 'A Monk's Guide to Happiness' book notes
    +2025-04-19 'When: The Scientific Secrets of Perfect Timing' book notes
    +2024-10-24 'Staff Engineer' book notes
    +2024-07-07 'The Stoic Challenge' book notes
    +2024-05-01 'Slow Productivity' book notes
    +2023-11-11 'Mind Management' book notes
    +2023-07-17 'Software Developers Career Guide and Soft Skills' book notes
    +2023-05-06 'The Obstacle is the Way' book notes
    +2023-04-01 'Never split the difference' book notes
    +2023-03-16 'The Pragmatic Programmer' book notes

    -Output:
    +Back to the main site
    +
    +
    +
    + + Perl New Features and Foostats + + gemini://foo.zone/gemfeed/2025-11-02-perl-new-features-and-foostats.gmi + 2025-11-01T16:10:35+02:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + Perl recently reached rank 10 in the TIOBE index. That headline made me write this blog post as I was developing the Foostats script for simple analytics of my personal websites and Gemini capsules (e.g. `foo.zone`) and there were a couple of new features added to the Perl language over the last releases. The book *Perl New Features* by brian d foy documents the changes well; this post shows how those features look in a real program that runs every morning for my stats generation. + +
    +

    Perl New Features and Foostats



    -
    -one
    -foo
    -two
    -
    +Published at 2025-11-01T16:10:35+02:00

    -Skip a directory while recursing: grep -R --exclude-dir=foo 'bar' /path. Example:
    +Perl recently reached rank 10 in the TIOBE index. That headline made me write this blog post as I was developing the Foostats script for simple analytics of my personal websites and Gemini capsules (e.g. foo.zone) and there were a couple of new features added to the Perl language over the last releases. The book *Perl New Features* by brian d foy documents the changes well; this post shows how those features look in a real program that runs every morning for my stats generation.

    - -
    mkdir -p /tmp/golf/foo /tmp/golf/src
    -printf 'bar\n' > /tmp/golf/src/a.txt
    -printf 'bar\n' > /tmp/golf/foo/skip.txt
    -grep -R --exclude-dir=foo 'bar' /tmp/golf
    +Perl re-enters the top ten
    +Perl New Features by Joshua McAdams and brian d foy
    +
    +
    +$b="24P7cP3dP31P3bPaP28P24P64P31P2cP24P64P32P2cP24P73P2cP24P67P2cP24P7
    +2P29P3dP28P22P31P30P30P30P30P22P2cP22P31P30P30P30P30P30P22P2cP22P4aP75
    +P7                                                                  3P
    +74                                                                  P2
    +0P  41P6eP6fP74P     68P65P72P20P50 P65P72P6cP2     0P48P           61
    +P6  3P6bP65P72P22P   29P3bPaP40P6dP 3dP73P70P6cP6   9P74P           20
    +P2  fP2fP    2cP22P  2cP2eP3aP21P2  bP2aP    30P4f  P40P2           2P
    +3b  PaP24      P6eP3 dP6c           P65P6      eP67 P74P6           8P
    +20  P24P7      3P3bP aP24           P75P3      dP22 P20P2           2P
    +78  P24P6      eP3bP aPaP           70P72      P69P 6eP74           P2
    +0P  22P5c    P6eP20  P20P           24P75    P5cP7  2P22P           3b
    +Pa  PaP66P6fP72P2    8P24P7aP20P    3dP20P31P3bP    20P24           P7
    +aP  3cP3dP24P6       eP3bP20P24     P7aP2bP2bP      29P20           P7
    +bP  aPaP9            P77P28P24P6    4P31P29P        3bPaP           9P
    +24  P72P3            dP69           P6eP74P28       P72P6           1P
    +6e  P64P2            8P24           P6eP2 9P29P     3bPaP           9P
    +24  P67P3            dP73           P75P6  2P73P    74P72           P2
    +0P  24P73            P2cP24P72P2cP  31P3b   PaP9P   24P67P20P3fP20  P6
    +4P  6fP20            P9P7bP20PaP9P9 P9P9P    9P66P  6fP72P20P28P24  P6
    +bP  3dP30            P3bP24P6bP3cP3 9P3bP    24P6bP 2bP2bP29P20P7b  Pa
    +P9                                                                  P9
    +P9                                                                  P9
    +P9  P9P73P75P6     2P73   P74P  72P2       8P24P75P2c     P24P72    P2
    +cP  31P29P3dP24P   6dP5   bP24  P6bP       5dP3bP20Pa   P9P9  P9P9  P9
    +P9  P70P    72P69  P6eP   74P2  0P22       P20P20P24P  75P      5cP 72
    +P2  2P3b      PaP9 P9P9   P9P9  P9P7       7P28       P24        P6 4P
    +32  P29P      3bPa P9P9   P9P9  P9P7       dPaP       9P9           P9
    +P9  P9P7      3P75 P62P   73P7  4P72       P28P        24P7         5P
    +2c  P24P    72P2c  P31P   29P3  dP24       P67P3bP20P   aP9P9       P9
    +P9  P7dP20PaP9P    9P3a   P20P  72P6       5P64P6fP3b      PaP9     P7
    +3P  75P62P73P      74P7   2P28  P24P       73P2cP24P7        2P2c   P3
    +1P  29P3dP2        2P30   P22P  3bPa       P9P7                0P7  2P
    +69  P6eP74P2       0P22   P20P  20P2       4P75                 P5c P7
    +2P  22P3 bPaPa     P7dP   aPaP  77P2       0P28                 P24 P6
    +4P  32P2  9P3bP    aP70   P72P  69P6       eP74       P2        0P2 2P
    +20  P20P   24P75   P20P21P5cP7  2P22P3bPaP 73P6cP65P6 5P7     0P20  P3
    +2P  3bPa    P70P7  2P69P6eP74P  20P22P20P2 0P24P75P20  P21P  5cP6   eP
    +22  P3bP     aPaP7  3P75P62P2   0P77P20P7b PaP9P24P6c    P3dP73     P6
    +8P                                                                  69
    +P6                                                                  6P
    +74P3bPaP9P66P6fP72P28P24P6aP3dP30P3bP24P6aP3cP24P6cP3bP24P6aP2bP2bP29P
    +7bP7dPaP7dP";$b=~s/\s//g;split /P/,$b;foreach(@_){$c.=chr hex};eval $c
    +
    +The above Perl script prints out "Just Another Perl Hacker !" in an
    +animation of sorts.
    +
     

    -Output:
    +

    Table of Contents



    -
    -/tmp/golf/src/a.txt:bar
    -
    +
    +

    Motivation



    -Insert lines with sed: sed -e '1isomething' -e '3isomething' file. Example:
    +I've been running foo.zone for a while now, but I've never looked into visitor statistics or analytics. I value privacy—not just my own, but also the privacy of others (the visitors of this site) — so I hesitated to use any off-the-shelf analytics plugins. All I wanted to collect were:

    - -
    printf 'A\nB\nC\n' > /tmp/s.txt
    -sed -e '1iHEAD' -e '3iMID' /tmp/s.txt
    -
    +
      +
    • Which blog posts had the most (unique) visitors
    • +
    • Exclude, if possible, any bots and scrapers from the stats
    • +
    • Track only anonymized IP addresses, never store raw addresses
    • +

    +With Foostats I've created a Perl script which does that for my highly opinionated website/blog setup, which consists of:

    -Output:
    +Gemtexter, my static site and Gemini capsule generator
    +How I host this site highly-available using OpenBSD

    -
    -HEAD
    -A
    -B
    -MID
    -C
    -
    +

    Why I used Perl



    -Drop last column with awk: awk 'NF{NF-=1};1' file. Example:
    +Even though nowadays I code more in Go and Ruby, I stuck with Perl for Foostats for four simple reasons:

    - -
    printf 'a b c\nx y z\n' > /tmp/t.txt
    -cat /tmp/t.txt
    -echo
    -awk 'NF{NF-=1};1' /tmp/t.txt
    -
    +
      +
    • I wanted an excuse to explore the newer features of my first programming love.
    • +
    • Sometimes, I miss Perl.
    • +
    • Perl ships with OpenBSD (the operating system on which my sites run) by default.
    • +
    • It really does live up to its Practical Extraction and Report Language (that's what the name Perl means) for this kind of log grinding I did with Foostats.
    • +

    +

    Inside Foostats



    -Output:
    +Foostats is simply a log file analyser, which analyses the OpenBSD httpd and relayd logs.

    -
    -a b c
    -x y z
    -
    -a b
    -x y
    -
    +https://man.openbsd.org/httpd.8
    +https://man.openbsd.org/relayd.8

    -

    Safe xargs with NULs


    +

    Log pipeline



    -Avoid breaking on spaces/newlines by pairing find -print0 with xargs -0:
    +A CRON job starts Foostats, reads OpenBSD httpd and relayd access logs, and produces the numbers published at https://stats.foo.zone and gemini://stats.foo.zone. The dashboards are humble because traffic on my sites is still light, yet the trends are interesting for spotting patterns. The script is opinionated (I am repeating myself here, I know), and I will probably be the only one ever using it for my own sites. However, the code demonstrates how Perl's newer features help keep a small script like this exciting and fun!

    - -
    find . -type f -name '*.log' -print0 | xargs -0 rm -f
    -
    +Foostats (HTTP)
    +Foostats (Gemini)

    -Example with spaces and NULs only:
    +On OpenBSD, I've configured the job via the daily.local on both of my OpenBSD servers (fishfinger.buetow.org and blowfish.buetow.org - note one is the master server, the other is the standby server, but the script runs on both and the stats are merged later in the process):

    -
    printf 'a\0b c\0' | xargs -0 -I{} printf '<%s>\n' {}
    +
    fishfinger$ grep foostats /etc/daily.local
    +perl /usr/local/bin/foostats.pl --parse-logs --replicate --report
     

    -Output:
    -
    -
    -<a>
    -<b c>
    -
    +Internally, Foostats::Logreader parses each line of the log files /var/log/daemon* and /var/www/logs/access_log*, turns timestamps into YYYYMMDD/HHMMSS values, hashes IP addresses with SHA3 (for anonymization), and hands a normalized event to Foostats::Filter. The filter compares the URI against entries in fooodds.txt, tracks how many times an IP address requests within the exact second, and drops anything suspicious (e.g., from web crawlers or malicious attackers). Valid events reach Foostats::Aggregator, which counts requests per protocol, records unique visitors for the Gemtext and Atom feeds, and remembers page-level IP sets. Foostats::FileOutputter writes the result as gzipped JSON files—one per day and per protocol—with IPv4/IPv6 splits, filtered counters, feed readership, and hashes for long URLs.

    -

    Efficient file-to-variable and arrays


    +

    fooodds.txt



    -Read a whole file into a variable without spawning cat:
    +fooodds.txt is a plain text list of substrings of URLs to be blocked, making it quick to shut down web crawlers. Foostats also detects rapid requests (an indicator of excessive crawling) and blocks the IP. Audit lines are written to /var/log/fooodds, which can later be reviewed for false or true positives (I do this around once a month). The Justfile even has a gather-fooodds target that collects suspicious paths from remote logs so new patterns can be added quickly.

    - -
    cfg=$(<config.ini)
    -
    +

    Feed kinds



    -Read lines into an array safely with mapfile (aka readarray):
    +There are different kinds of feeds being tracked by Foostats:

    - -
    mapfile -t lines < <(grep -v '^#' config.ini)
    -printf '%s\n' "${lines[@]}"
    -
    +
      +
    • The Atom web-feed
    • +
    • The same feed via Gemini
    • +
    • The Gemfeed (a special format popular in the Geminispace)
    • +

    +

    Aggregation and output



    -Assign formatted strings without a subshell using printf -v:
    +As mentioned, Foostats merges the stats from both hosts, master and standby. For the master-standby setup description, read:

    - -
    printf -v msg 'Hello %s, id=%04d' "$USER" 42
    -echo "$msg"
    -
    +KISS high-availability with OpenBSD

    -Output:
    +Those gzipped files land in stats/. From there, Foostats::Replicator can pull matching files from the partner host (fishfinger or blowfish) so the view covers both servers, Foostats::Merger combines them into daily summaries, and Foostats::Reporter rebuilds Gemtext and HTML reports.

    -
    -Hello paul, id=0042
    -
    +Those are the raw stats files:

    -Read NUL-delimited data (pairs well with -print0):
    +https://blowfish.buetow.org/foostats/
    +https://fishfinger.buetow.org/foostats/

    - -
    mapfile -d '' -t files < <(find . -type f -print0)
    -printf '%s\n' "${files[@]}"
    -
    +These are the 30-day reports generated (already linked earlier in this post, but adding here again for clarity):

    -

    Quick password generator


    +stats.foo.zone Gemini capsule dashboard
    +stats.foo.zone HTTP dashboard

    -Pure Bash with /dev/urandom:
    +

    Command-line entry points



    - -
    LC_ALL=C tr -dc 'A-Za-z0-9_' </dev/urandom | head -c 16; echo
    -
    +foostats_main is the command entry point. --parse-logs refreshes the gzipped files, --replicate runs the cross-host sync, and --report rebuilds the HTML and Gemini report pages. --all performs everything in one go. Defaults point to /var/www/htdocs/buetow.org/self/foostats for data, /var/gemini/stats.foo.zone for Gemtext output, and /var/www/htdocs/gemtexter/stats.foo.zone for HTML output. Replication always forces the three most recent days' worth of data across HTTPS and leaves older files untouched to save bandwidth.

    -Alternative using openssl:
    +The complete source lives on Codeberg here:

    - -
    openssl rand -base64 16 | tr -d '\n' | cut -c1-22
    -
    +Foostats on Codeberg

    -

    yes for automation


    +Now let's go to some new Perl features:

    -yes streams a string repeatedly; handy for feeding interactive commands or quick load generation:
    +

    Packages as real blocks



    - -
    yes | rm -r large_directory        # auto-confirm
    -yes n | dangerous-command          # auto-decline
    -yes anything | head -n1            # prints one line: anything
    -
    +

    Scoped packages



    -

    Forcing true to fail (and vice versa)


    +Recent Perl versions allow the block form package Foo { ... }. Foostats uses it for every package. Imports stay local to the block, helper subs do not leak into the global symbol table, and configuration happens where the code needs it.

    -You can shadow builtins with functions:
    +The old way:

    -
    true()  { return 1; }
    -false() { return 0; }
    +
    package foo;
     
    -true  || echo 'true failed'
    -false && echo 'false succeeded'
    +sub hello {
    +    print "Hello from package foo\n";
    +}
     
    -# Bypass function with builtin/command
    -builtin true # returns 0
    -command true # returns 0
    +package bar;
    +
    +sub hello {
    +    print "Hello from package bar\n";
    +}
     

    -To disable a builtin entirely: enable -n true (re-enable with enable true).
    -
    -Further reading:
    -
    -Force true to return false
    -
    -

    Restricted Bash


    -
    -bash -r (or rbash) starts a restricted shell that limits potentially dangerous actions, for example:
    +But now it is also possible to do this:

    -
      -
    • Changing directories (cd).
    • -
    • Modifying PATH, SHELL, BASH_ENV, or ENV.
    • -
    • Redirecting output.
    • -
    • Running commands with / in the name.
    • -
    • Using exec.
    • -

    -It’s a coarse sandbox for highly constrained shells; read man bash (RESTRICTED SHELL) for details and caveats.
    + +
    package foo {
    +    sub hello {
    +        print "Hello from package foo\n";
    +    }
    +}
    +
    +package bar {
    +    sub hello {
    +        print "Hello from package bar\n";
    +    }
    +}
    +

    -Example session:
    +

    Postfix dereferencing keeps data structures tidy


    +
    +

    Clear dereferencing


    +
    +The script handles nested hashes and arrays. Postfix dereferencing ($hash->%*, $array->@*) keeps that readable.
    +
    +E.g. instead of having to write:

    -
    rbash -c 'cd /'            # cd: restricted
    -rbash -c 'PATH=/tmp'       # PATH: restricted
    -rbash -c 'echo hi > out'   # redirection: restricted
    -rbash -c '/bin/echo hi'    # commands with /: restricted
    -rbash -c 'exec ls'         # exec: restricted
    +
    for my $elem (@{$array_ref}) {
    +    print "$elem\n";
    +}
     

    -

    Useless use of cat (and when it’s ok)


    -
    -Avoid the extra process if a command already reads files or STDIN:
    +one can now do:

    -
    # Prefer
    -grep -i foo file
    -<file grep -i foo        # or feed via redirection
    -
    -# Over
    -cat file | grep -i foo
    +
    for my $elem ($array_ref->@*) {
    +    print "$elem\n";
    +}
     

    -But for interactive composition, or when you truly need to concatenate multiple sources into a single stream, cat is fine, as you may think, "First I need the content, then I do X." Changing the "useless use of cat" in retrospect is really a waste of time for one-time interactive use:
    +You see that this feature becomes increasingly useful with nested data structures, e.g. to print all keys of the nested hash:

    -
    cat file1 file2 | grep -i foo
    +
    print for keys $hash->{stats}->%*;
     

    -From notes: “Good for interactivity; Useless use of cat” — use judgment.
    +Loops over like $stats->{page_ips}->{urls}->%* or $merge{$key}->{$_}->%* show which level of the structure is in play. The merger in Foostats updates host and URL statistics without building temporary arrays, and the reporter code mirrors the layout of the final tables. Before postfix dereferencing, the same code relied on braces within braces and was harder to read.

    -

    Atomic locking with mkdir


    +

    say is the default voice now



    -Portable advisory locks can be emulated with mkdir because it’s atomic:
    +say became the default once the script switched to use v5.38;. It adds a newline to every message printed, comparable to Ruby's puts, making log messages like "Processing $path" or "Writing report to $report_path" cleaner:

    -
    lockdir=/tmp/myjob.lock
    -if mkdir "$lockdir" 2>/dev/null; then
    -    trap 'rmdir "$lockdir"' EXIT INT TERM
    -    # critical section
    -    do_work
    -else
    -    echo "Another instance is running" >&2
    -    exit 1
    -fi
    +
    use v5.38;
    +
    +print "Hello, world!\n";    # old way
    +say "Hello, world!";        # new way
     

    -This works well on Linux. Remove the lock in trap so crashes don’t leave stale locks.
    +

    Lexical subs promote local reasoning



    -

    Smarter globs and faster find-exec


    +Lexical subroutines keep helpers close to the code that needs them. In Foostats::Logreader::parse_web_logs, functions such as my sub parse_date and my sub open_file live only inside that scope.
    +
    +This is an example of a lexical sub named trim, which is only visible within the outer sub named process_lines:

    -
      -
    • Enable extended globs when useful: shopt -s extglob; then patterns like !(tmp|cache) work.
    • -
    • Use -exec ... {} + to batch many paths in fewer process invocations:
    • -

    -
    find . -name '*.log' -exec gzip -9 {} +
    +
    use v5.38;
    +
    +sub process_lines (@lines) {
    +    my sub trim ($str) {
    +        $str =~ s/^\s+|\s+$//gr;
    +    }
    +    return [ map { trim($_) } @lines ];
    +}
    +
    +my @raw = ("  foo  ", " bar", "baz ");
    +my $cleaned = process_lines(@raw);
    +say for @$cleaned; # prints "foo", "bar", "baz"
     

    -Example for extglob (exclude two dirs from listing):
    +

    Reference aliasing makes intent explicit


    +
    +Reference aliasing can be enabled with use feature qw(refaliasing) and helps communicate intent more clearly (if you remember the Perl syntax, of course—otherwise, it can look rather cryptic). The filter starts with \my $uri_path = \$event->{uri_path} so any later modification touches the original event. This is an example with ref aliasing in action:

    -
    shopt -s extglob
    -ls -d -- !(.git|node_modules) 2>/dev/null
    +
    use feature qw(refaliasing);
    +
    +my $hash = { foo => 42 };
    +\my $foo = \$hash->{foo};
    +
    +$foo = 99;
    +print $hash->{foo}; # prints 99
     

    -E-Mail your comments to paul@nospam.buetow.org :-)
    -
    -Other related posts are:
    -
    -2025-09-14 Bash Golf Part 4 (You are currently reading this)
    -2023-12-10 Bash Golf Part 3
    -2022-01-01 Bash Golf Part 2
    -2021-11-29 Bash Golf Part 1
    -2021-06-05 Gemtexter - One Bash script to rule it all
    -2021-05-16 Personal Bash coding style guide
    -
    -Back to the main site
    -
    -
    -
    - - Random Weird Things - Part Ⅲ - - gemini://foo.zone/gemfeed/2025-08-15-random-weird-things-iii.gmi - 2025-08-14T23:21:32+03:00 - - Paul Buetow aka snonux - paul@dev.buetow.org - - Every so often, I come across random, weird, and unexpected things on the internet. It would be neat to share them here from time to time. This is the third run. - -
    -

    Random Weird Things - Part Ⅲ


    +The aggregator in Foostats aliases $self->{stats}{$date_key} before updating counters, so the structure remains intact. Combined with subroutine signatures, this makes it obvious when a piece of data is shared instead of copied, preventing silent bugs. This enables having shorter names for long nested data structures.

    -Published at 2025-08-14T23:21:32+03:00
    +

    Persistent state without globals



    -Every so often, I come across random, weird, and unexpected things on the internet. It would be neat to share them here from time to time. This is the third run.
    +A Perl state variable is declared with state $var and retains its value between calls to the enclosing subroutine. Foostats uses that for rate limiting and de-duplicated logging.

    -2024-07-05 Random Weird Things - Part Ⅰ
    -2025-02-08 Random Weird Things - Part Ⅱ
    -2025-08-15 Random Weird Things - Part Ⅲ (You are currently reading this)
    +This is a small example demonstrating the use of a state variable in Perl:

    -
    - /\_/\        /\_/\        /\_/\
    -( o.o ) WHOA!( o.o ) WHOA!( o.o )
    - > ^ <        > ^ <        > ^ <
    - /   \  MEOW! /   \  MOEEW!/   \
    -/_____\      /_____\      /_____\
    +
    +
    sub counter {
    +    state $count = 0;
    +    $count++;
    +    return $count;
    +}
    +
    +say counter(); # 1
    +say counter(); # 2
    +say counter(); # 3
     

    -

    Table of Contents


    -
    -
    -

    21. Doom in TypeScript’s type system


    -
    -Yes, really. Someone has implemented Doom to run within the TypeScript type system—compile-time madness, but fun to watch.
    -
    -Doom in the TS type system
    -
    -TypeScript’s type checker is surprisingly expressive: conditional types, recursion, and template literal types let you encode nontrivial logic that “executes” during compilation. The demo exploits this to build a tiny ray-caster that renders as compiler errors or types. It’s wildly impractical, but a great reminder that enough expressiveness plus recursion tends to drift toward Turing completeness.
    -
    -

    Run it in a PDF


    -
    -

    22. Doom inside a PDF


    -
    -Running Doom embedded in a PDF file. No separate binary—just a cursed document.
    -
    -doompdf
    -
    -This relies on features like PDF JavaScript and interactive objects, which some viewers still support. Expect mixed results: many modern readers sandbox or disable scripting by default for security. If you try it, use a compatible desktop viewer and be prepared for portability quirks.
    -
    -

    23. Linux inside a PDF


    -
    -Boot a tiny Linux inside a PDF. This rabbit hole goes deep.
    -
    -linuxpdf
    -
    -Like the Doom-in-PDF trick, this leans on the PDF runtime to host unconventional logic and rendering. It’s more of an art piece than a daily driver, but it shows how “document” formats can accidentally become platforms. The security posture of PDF viewers varies significantly, so expect inconsistent behaviour across different apps.
    -
    -

    24. SQLite loves Tcl


    -
    -SQLite was initially designed as a Tcl extension and still relies heavily on Tcl today: the amalgamated C source is generated by mksqlite3c.tcl, tests are written in Tcl, and even the documentation is built with it.
    +Hash and array state variables have been supported since state arrived in Perl 5.10. Scalar state variables were already supported previously.

    -Tcl 2017 paper
    +

    Rate limiting state



    -The famous single-file sqlite3.c is not hand-edited—developers maintain sources, plus build scripts that knit everything together deterministically. Their Tcl-centric tooling provides them with reproducible builds and a very opinionated workflow. It’s a great counterexample to the idea that “serious” projects must standardise on the most popular build stacks.
    +In Foostats, state variables store run-specific state without using package globals. state %blocked remembers IP hashes that already triggered the odd-request filter, and state $last_time and state %count track how many requests an IP makes in the exact second.

    -

    25. Fossil, “e”, and a Tcl/Tk chat


    +

    De-duplicated logging



    -The SQLite folks use a custom Tcl/Tk editor called “e”, a homegrown VCS (Fossil), and even a Tcl/Tk chat room for development—peak bespoke tooling.
    +state %dedup keeps the log output of the suspicious calls to one warning per URI. Early versions utilized global hashes for the same tasks, producing inconsistent results during tests. Switching to state removed those edge cases.

    -More details in the paper
    +

    Subroutine signatures



    -Fossil bundles source control, tickets, wiki, and a web UI into a single portable binary—no external services required. The “e” editor and chat complete a tight, integrated loop tailored to their team’s needs and constraints. It’s delightfully “boring tech” that has produced one of the most reliable databases on earth.
    +Perl now supports subroutine signatures like other modern languages do. Foostats uses them everywhere. Examples:

    -

    26. Kubernetes from an Excel spreadsheet


    + +
    # Old way
    +sub greet_old { my $name = shift; print "Hello, $name!\n" }
    +
    +# Another old way
    +sub greet_old2 ($) { my $name = shift; print "Hello, $name!\n" }
    +
    +# New way
    +sub greet ($name) { say "Hello, $name!"; }
    +
    +greet("Alice"); # prints "Hello, Alice!"
    +

    -Drive kubectl from an .xlsx file because clusters belong in spreadsheets, apparently.
    +In Foostats, constructors declare sub new ($class, $odds_file, $log_path), anonymous callbacks expose sub ($event), and helper subs list the values they expect, e.g.:

    -xlskubectl
    + +
    my $anon = sub ($name) {
    +    say "Hello, $name!";
    +};
    +
    +$anon->("World"); # prints "Hello, World!"
    +

    -Resources are rows; columns map to fields; the tool renders YAML and applies it for you. It’s oddly ergonomic for demos, audits, or letting non‑YAML‑native teammates propose changes. Obviously, be careful—permissions and review gates still matter even if your “IDE” is Excel.
    +

    Defined-or assignment for defaults without boilerplate



    -

    27. SRE means “Sorry…”


    +The operator //= keeps configuration and counters simple. Environment variables may be missing when CRON runs the script, so //=, combined with signatures, sets defaults without warnings. Example use of that operator:

    -An industry joke (or truth?) that SRE (short for Site Reliability Engineer) stands for “Sorry…”.
    + +
    my $foo;
    +$foo //= 42;
    +say $foo; # prints 42
    +
    +$foo //= 99;
    +say $foo; # still prints 42, because $foo was already defined
    +

    -Anecdotes are a good reminder that failure is inevitable and empathy is essential. The best takeaways are about clear communication, graceful degradation, and blameless postmortems. Laughing helps, but guardrails and good on‑call hygiene help even more.
    +

    Cleanup with defer



    -

    28. Touch Grass, the app


    +Even though not used in Foostats, this feature (similar to Go's defer) is neat to have in Perl now.

    -When screens consume too much, this site/app nudges you to go outside.
    +The defer block (use feature 'defer") schedules a piece of code to run when the current scope exits, regardless of how it exits (e.g. normal return, exception). This is perfect for ensuring resources, such as file handles, are closed.

    -Touch grass
    + +
    use feature qw(defer);
    +
    +sub parse_log_file ($path) {
    +    open my $fh, '<', $path or die "Cannot open $path: $!";
    +    defer { close $fh };
    +
    +    while (my $line = <$fh>) {
    +        # ... parsing logic that might throw an exception ...
    +    }
    +    # $fh is automatically closed here
    +}
    +

    -It’s simple and playful—sometimes that’s the nudge you need to break doomscroll loops. Treat it like a micro‑ritual: set a reminder, step outside, reset. Your eyes (and nervous system) will thank you.
    +This pattern replaces manual close calls in every exit path of the subroutine and is more robust than relying solely on object destructors.

    -

    29. Blogging with the C preprocessor


    +

    Builtins and booleans



    -Use the C preprocessor to assemble a blog. It shouldn’t work this well—and yet.
    +The script also utilizes other modern additions that often go unnoticed. use builtin qw(true false); combined with experimental::builtin provides more real boolean values.

    -Macroblog with cpp
    +

    Conclusion



    -Posts are stitched together with #includes and macros, giving you DRY content blocks and repeatable builds. It’s hacky, fast, and delightfully text‑only—perfect for people who think makefiles are a UI. Would I recommend it for everyone? No. Is it charming and effective? Absolutely.
    +I want to code more in Perl again. The newer features make it a joy to write small scripts like Foostats. If you haven't looked at Perl in a while, give it another try! The main thing which holds me back from writing more Perl is the lack of good tooling. For example, there is no proper LSP and tree sitter support available, which would work as good as the ones available for Go and Ruby.

    -

    30. Accidentally Turing-complete


    +A reader pointed out that there's now a third-party Perl Tree-sitter implementation one could use:

    -A delightful catalogue of systems that unintentionally become Turing-complete.
    +https://github.com/tree-sitter-perl/tree-sitter-perl

    -Accidentally Turing-complete
    +E-Mail your comments to paul@nospam.buetow.org :-)

    -Give a system conditionals, state, and unbounded composition, and it often crosses the threshold into general computation—whether that was the goal or not. The list includes items such as CSS, regular expression dialects, and even card games. It’s a fun lens for understanding why “just a configuration language” can get complicated fast.
    +Other related posts are:

    -I hope you had some fun. E-Mail your comments to paul@nospam.buetow.org :-)
    +2025-11-02 Perl New Features and Foostats (You are currently reading this)
    +2023-05-01 Unveiling guprecords.raku: Global Uptime Records with Raku
    +2022-05-27 Perl is still a great choice
    +2011-05-07 Perl Daemon (Service Framework)
    +2008-06-26 Perl Poetry

    Back to the main site
    - Local LLM for Coding with Ollama on macOS - - gemini://foo.zone/gemfeed/2025-08-05-local-coding-llm-with-ollama.gmi - 2025-08-04T16:43:39+03:00 + Key Takeaways from The Well-Grounded Rubyist + + gemini://foo.zone/gemfeed/2025-10-11-key-takeaways-from-the-well-grounded-rubyist.gmi + 2025-10-11T15:25:14+03:00 Paul Buetow aka snonux paul@dev.buetow.org - With all the AI buzz around coding assistants, and being a bit concerned about being dependent on third-party cloud providers here, I decided to explore the capabilities of local large language models (LLMs) using Ollama. + Some time ago, I wrote about my journey into Ruby and how 'The Well-Grounded Rubyist' helped me to get a better understanding of the language. I took a lot of notes while reading the book, and I think it's time to share some of them. This is not a comprehensive review, but rather a collection of interesting tidbits and concepts that stuck with me.
    -

    Local LLM for Coding with Ollama on macOS


    +

    Key Takeaways from The Well-Grounded Rubyist



    -Published at 2025-08-04T16:43:39+03:00
    +Published at 2025-10-11T15:25:14+03:00

    -
    -      [::]
    -     _|  |_
    -   /  o  o  \                       |
    -  |    ∆    |  <-- Ollama          / \
    -  |  \___/  |                     /   \
    -   \_______/             LLM --> / 30B \
    -    |     |                     / Qwen3 \
    -   /|     |\                   /  Coder  \
    -  /_|     |_\_________________/ quantised \
    -
    +Some time ago, I wrote about my journey into Ruby and how "The Well-Grounded Rubyist" helped me to get a better understanding of the language. I took a lot of notes while reading the book, and I think it's time to share some of them. This is not a comprehensive review, but rather a collection of interesting tidbits and concepts that stuck with me.

    Table of Contents




    -With all the AI buzz around coding assistants, and being a bit concerned about being dependent on third-party cloud providers here, I decided to explore the capabilities of local large language models (LLMs) using Ollama.
    +My first post about the book.

    -Ollama is a powerful tool that brings local AI capabilities directly to your local hardware. By running AI models locally, you can enjoy the benefits of intelligent assistance without relying on cloud services. This document outlines my initial setup and experiences with Ollama, with a focus on coding tasks and agentic coding.
    +

    -https://ollama.com/
    +

    The Object Model



    -

    Why Local LLMs?


    +One of the most fascinating aspects of Ruby is its object model. The book does a great job of explaining the details.

    -Using local AI models through Ollama offers several advantages:
    +

    Everything is an object (almost)



    -
      -
    • Data Privacy: Keep your code and data completely private by processing everything locally.
    • -
    • Cost-Effective: Reduce reliance on expensive cloud API calls.
    • -
    • Reliability: Works seamlessly even with spotty internet or offline.
    • -
    • Speed: Avoid network latency and enjoy instant responses while coding. Although I mostly found Ollama slower than commercial LLM providers. However, that may change with the evolution of models and hardware.
    • -

    -

    Hardware Considerations


    +In Ruby, most things are objects. This includes numbers, strings, and even classes themselves. This has some interesting consequences. For example, you can't use i++ like in C or Java. Integers are immutable objects. 1 is always the same object. 1 + 1 returns a new object, 2.

    -Running large language models locally is currently limited by consumer hardware capabilities:
    +

    The self keyword



    -
      -
    • GPU Memory: Most consumer-grade GPUs (even in 2025) top out at 16–24GB of VRAM, making it challenging to run larger models like the 30B (30 billion) parameter LLMs (they go up to the 100 billion and more).
    • -
    • RAM Constraints: On my MacBook Pro with M3 CPU and 36GB RAM, I chose a 14B model (qwen2.5-coder:14b-instruct) as it represents a practical balance between capability and resource requirements.
    • -

    -For reference, here are some key points about running large LLMs locally:
    +There is always a current object, self. If you call a method without an explicit receiver, it's called on self. For example, puts "hello" is actually self.puts "hello".

    -
      -
    • Models larger than 30B: I don't even think about running them locally. One (e.g. from Qwen, Deepseek or Kimi K2) with several hundred billion parameters could match the "performance" of commercial LLMs (Claude Sonnet 4, etc). Still, for personal use, the hardware demands are just too high (or temporarily "rent" it via the public cloud?).
    • -
    • 30B models: Require at least 48GB of GPU VRAM for full inference without quantisation. Currently only feasible on high-end professional GPUs (or an Apple-silicone Mac with enough unified RAM).
    • -
    • 14B models: Can run with 16-24GB GPU memory (VRAM), suitable for consumer-grade hardware (or use a quantised larger model)
    • -
    • 7B-13B models: Best fit for mainstream consumer hardware, requiring minimal VRAM and running smoothly on mid-range GPUs, but with limited capabilities compared to larger models and more hallucinations.
    • -

    -The model I'll be mainly using in this blog post (qwen2.5-coder:14b-instruct) is particularly interesting as:
    + +
    # At the top level, self is the main object
    +p self
    +# => main
    +p self.class
    +# => Object
    +
    +def foo
    +  # Inside a method, self is the object that received the call
    +  p self
    +end
    +
    +foo
    +# => main
    +

    -
      -
    • instruct: Indicates this is the instruction-tuned variant, optimised for diverse tasks including coding
    • -
    • coder: Tells me that this model was trained on a mix of code and text data, making it especially effective for programming assistance
    • -

    -https://ollama.com/library/qwen2.5-coder
    -https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct
    +This code demonstrates how self changes depending on the context. At the top level, it's main, an instance of Object. When foo is called without a receiver, it's called on main.

    -For general thinking tasks, I found deepseek-r1:14b to be useful (in the future, I also want to try other qwen models here). For instance, I utilised deepseek-r1:14b to format this blog post and correct some English errors, demonstrating its effectiveness in natural language processing tasks. Additionally, it has proven invaluable for adding context and enhancing clarity in technical explanations, all while running locally on the MacBook Pro. Admittedly, it was a lot slower than "just using ChatGPT", but still within a minute or so.
    +

    Singleton Methods


    +
    +You can add methods to individual objects. These are called singleton methods.
    +
    + +
    obj = "a string"
    +
    +def obj.shout
    +  self.upcase + "!"
    +end
    +
    +p obj.shout
    +# => "A STRING!"
    +
    +obj2 = "another string"
    +# obj2.shout would raise a NoMethodError
    +
    +
    +Here, the shout method is only available on the obj object. This is a powerful feature for adding behavior to specific instances.
    +
    +

    Classes are Objects


    +
    +Classes themselves are objects, instances of the Class class. This means you can create classes dynamically.
    +
    + +
    MyClass = Class.new do
    +  def say_hello
    +    puts "Hello from a dynamically created class!"
    +  end
    +end
    +
    +instance = MyClass.new
    +instance.say_hello
    +# => Hello from a dynamically created class!
    +
    +
    +This shows how to create a new class and assign it to a constant. This is what happens behind the scenes when you use the class keyword.
    +
    +

    Control Flow and Methods


    +
    +The book clarified many things about how methods and control flow work in Ruby.
    +
    +

    case and the === operator


    +
    +The case statement is more powerful than I thought. It uses the === (threequals or case equality) operator for comparison, not ==. Different classes can implement === in their own way.
    +
    + +
    # For ranges, it checks for inclusion
    +p (1..5) === 3 # => true
    +
    +# For classes, it checks if the object is an instance of the class
    +p String === "hello" # => true
    +
    +# For regexes, it checks for a match
    +p /llo/ === "hello" # => true
    +
    +def check(value)
    +  case value
    +  when String
    +    "It's a string"
    +  when (1..10)
    +    "It's a number between 1 and 10"
    +  else
    +    "Something else"
    +  end
    +end
    +
    +p check(5) # => "It's a number between 1 and 10"
    +
    +
    +

    Blocks and yield


    +
    +Blocks are a cornerstone of Ruby. You can pass them to methods to customize their behavior. The yield keyword is used to call the block.
    +
    + +
    def my_iterator
    +  puts "Entering the method"
    +  yield
    +  puts "Back in the method"
    +  yield
    +end
    +
    +my_iterator { puts "Inside the block" }
    +# Entering the method
    +# Inside the block
    +# Back in the method
    +# Inside the block
    +
    +
    +This simple iterator shows how yield transfers control to the block. You can also pass arguments to yield and get a return value from the block.
    +
    + +
    def with_return
    +  result = yield(5)
    +  puts "The block returned #{result}"
    +end
    +
    +with_return { |n| n * 2 }
    +# => The block returned 10
    +

    -https://ollama.com/library/deepseek-r1:14b
    -https://huggingface.co/deepseek-ai/DeepSeek-R1
    +This demonstrates passing an argument to the block and using its return value.

    -A quantised (as mentioned above) LLM which has been converted from high-precision connection (typically 16- or 32-bit floating point) representations to lower-precision formats, such as 8-bit integers. This reduces the overall memory footprint of the model, making it significantly smaller and enabling it to run more efficiently on hardware with limited resources or to allow higher throughput on GPUs and CPUs. The benefits of quantisation include reduced storage and faster inference times due to simpler computations and better memory bandwidth utilisation. However, quantisation can introduce a drop in model accuracy because the lower numerical precision means the model cannot represent parameter values as precisely. In some cases, it may lead to instability or unexpected outputs in specific tasks or edge cases.
    +

    Fun with Data Types



    -

    Basic Setup and Manual Code Prompting


    +Ruby's core data types are full of nice little features.

    -

    Installing Ollama and a Model


    +

    Symbols



    -To install Ollama, performed these steps (this assumes that you have already installed Homebrew on your macOS system):
    +Symbols are like immutable strings. They are great for keys in hashes because they are unique and memory-efficient.

    -
    brew install ollama
    -rehash
    -ollama serve
    +
    # Two strings with the same content are different objects
    +p "foo".object_id
    +p "foo".object_id
    +
    +# Two symbols with the same content are the same object
    +p :foo.object_id
    +p :foo.object_id
    +
    +# Modern hash syntax uses symbols as keys
    +my_hash = { name: "Paul", language: "Ruby" }
    +p my_hash[:name] # => "Paul"
     

    -Which started up the Ollama server with something like this (the screenshots shows already some requests made):
    +This code highlights the difference between strings and symbols and shows the convenient hash syntax.

    -Ollama serving
    +

    Arrays and Hashes



    -And then, in a new terminal, I pulled the model with:
    +Arrays and hashes have a rich API. The %w and %i shortcuts for creating arrays of strings and symbols are very handy.

    -
    ollama pull qwen2.5-coder:14b-instruct
    +
    # Array of strings
    +p %w[one two three]
    +# => ["one", "two", "three"]
    +
    +# Array of symbols
    +p %i[one two three]
    +# => [:one, :two, :three]
     

    -Now, I was ready to go! It wasn't so difficult. Now, let's see how I used this model for coding tasks.
    -
    -

    Example Usage


    -
    -I run the following command to get a Go function for calculating Fibonacci numbers:
    +A quick way to create arrays. You can also retrieve multiple values at once.

    -
    time echo "Write a function in golang to print out the Nth fibonacci number, \
    -  only the function without the boilerplate" | ollama run qwen2.5-coder:14b-instruct
    -
    -Output:
    -
    -func fibonacci(n int) int {
    -    if n <= 1 {
    -        return n
    -    }
    -    a, b := 0, 1
    -    for i := 2; i <= n; i++ {
    -        a, b = b, a+b
    -    }
    -    return b
    -}
    -
    -Execution Metrics:
    +
    arr = [10, 20, 30, 40, 50]
    +p arr.values_at(0, 2, 4)
    +# => [10, 30, 50]
     
    -Executed in    4.90 secs      fish           external
    -   usr time   15.54 millis    0.31 millis   15.24 millis
    -   sys time   19.68 millis    1.02 millis   18.66 millis
    +hash = { a: 1, b: 2, c: 3 }
    +p hash.values_at(:a, :c)
    +# => [1, 3]
     

    -Note, after having written this blog post, I tried the same with the newer model qwen3-coder:30b-a3b-q4_K_M (which "just" came out, and it's a quantised 30B model), and it was much faster:
    +The values_at method is a concise way to get multiple elements.

    -
    -Executed in    1.83 secs      fish           external
    -   usr time   17.82 millis    4.40 millis   13.42 millis
    -   sys time   17.07 millis    1.57 millis   15.50 millis
    -
    +

    Final Thoughts



    -https://ollama.com/library/qwen3-coder:30b-a3b-q4_K_M
    +These are just a few of the many things I learned from "The Well-Grounded Rubyist". The book gave me a much deeper appreciation for the language and its design. If you are a Ruby programmer, I highly recommend it. Meanwhile, I also read the book "Programming Ruby 3.3", just I didn't have time to process my notes there yet.

    -

    Agentic Coding with Aider


    +E-Mail your comments to paul@nospam.buetow.org :-)

    -

    Installation


    +Other Ruby-related posts:

    -Aider is a tool that enables agentic coding by leveraging AI models (also local ones, as in our case). While setting up OpenAI Codex and OpenCode with Ollama proved challenging (those tools either didn't know how to work with the "tools" (the capability to execute external commands or to edit files for example) or didn't connect at all to Ollama for some reason), Aider worked smoothly.
    +2026-03-02 RCM: The Ruby Configuration Management DSL
    +2025-10-11 Key Takeaways from The Well-Grounded Rubyist (You are currently reading this)
    +2021-07-04 The Well-Grounded Rubyist

    -To get started, the only thing I had to do was to install it via Homebrew, initialise a Git repository, and then start Aider with the Ollama model ollama_chat/qwen2.5-coder:14b-instruct:
    +Back to the main site
    +
    +
    +
    + + f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments + + gemini://foo.zone/gemfeed/2025-10-02-f3s-kubernetes-with-freebsd-part-7.gmi + 2025-10-02T11:27:19+03:00, last updated Tue 30 Dec 10:11:58 EET 2025 + + Paul Buetow aka snonux + paul@dev.buetow.org + + This is the seventh blog post about the f3s series for my self-hosting demands in a home lab. f3s? The 'f' stands for FreeBSD, and the '3s' stands for k3s, the Kubernetes distribution I use on FreeBSD-based physical machines. + +
    +

    f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments


    +
    +Published at 2025-10-02T11:27:19+03:00, last updated Tue 30 Dec 10:11:58 EET 2025
    +
    +This is the seventh blog post about the f3s series for my self-hosting demands in a home lab. f3s? The "f" stands for FreeBSD, and the "3s" stands for k3s, the Kubernetes distribution I use on FreeBSD-based physical machines.
    +
    +2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    +2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    +2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    +2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    +2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    +2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    +2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments (You are currently reading this)
    +2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability
    +
    +f3s logo
    +
    +

    Table of Contents


    +
    +
    +

    Introduction


    +
    +In this blog post, I am finally going to install k3s (the Kubernetes distribution I use) to the whole setup and deploy the first workloads (helm charts, and a private registry) to it.
    +
    +https://k3s.io
    +
    +

    Important Note: GitOps Migration


    +
    +**Note:** After publishing this blog post, the f3s cluster was migrated from imperative Helm deployments to declarative GitOps using ArgoCD. The Kubernetes manifests and Helm charts in the repository have been reorganized for ArgoCD-based continuous deployment.
    +
    +**To view the exact manifests and charts as they existed when this blog post was written** (before the ArgoCD migration), check out the pre-ArgoCD revision:

    -
    brew install aider
    -mkdir -p ~/git/aitest && cd ~/git/aitest && git init
    -aider --model ollama_chat/qwen2.5-coder:14b-instruct
    +
    $ git clone https://codeberg.org/snonux/conf.git
    +$ cd conf
    +$ git checkout 15a86f3  # Last commit before ArgoCD migration
    +$ cd f3s/
     

    -https://aider.chat
    -https://opencode.ai
    -https://github.com/openai/codex
    +**Current master branch** contains the ArgoCD-managed versions with:
    +
      +
    • Application manifests organized under argocd-apps/{monitoring,services,infra,test}/
    • +
    • Additional resources under */manifests/ directories (e.g., prometheus/manifests/)
    • +
    • Justfiles updated to trigger ArgoCD syncs instead of direct Helm commands
    • +

    +The deployment concepts and architecture remain the same—only the deployment method changed from imperative (helm install/upgrade) to declarative (GitOps with ArgoCD).

    -

    Agentic coding prompt


    +

    Updating



    -This is the prompt I gave:
    +Before proceeding, I bring all systems involved up-to-date. On all three Rocky Linux 9 boxes r0, r1, and r2:

    -
    -Create a Go project with these files:
    -
    -* `cmd/aitest/main.go`: CLI entry point
    -* `internal/version.go`: Version information (0.0.0), should be printed when the
    -   program was started with `-version` flag
    -* `internal/count.go`: File counting functionality, the program should print out
    -   the number of files in a given subdirectory (the directory is provided as a
    -   command line flag with `-dir`), if none flag is given, no counting should be
    -   done
    -* `README.md`: Installation and usage instructions
    +
    +
    dnf update -y
    +reboot
     

    -It then generated something, but did not work out of the box, as it had some issues with the imports and package names. So I had to do some follow-up prompts to fix those issues with something like this:
    +On the FreeBSD hosts, I upgraded from FreeBSD 14.2 to 14.3-RELEASE, running this on all three hosts f0, f1 and f2:

    -
    -* Update import paths to match module name, github.com/yourname/aitest should be
    -  aitest in main.go
    -* The package names of internal/count.go and internal/version.go should be
    -  internal, and not count and version.
    +
    +
    paul@f0:~ % doas freebsd-update fetch
    +paul@f0:~ % doas freebsd-update install
    +paul@f0:~ % doas reboot
    +.
    +.
    +.
    +paul@f0:~ % doas freebsd-update -r 14.3-RELEASE upgrade
    +paul@f0:~ % doas freebsd-update install
    +paul@f0:~ % doas freebsd-update install
    +paul@f0:~ % doas reboot
    +.
    +.
    +.
    +paul@f0:~ % doas freebsd-update install
    +paul@f0:~ % doas pkg update
    +paul@f0:~ % doas pkg upgrade
    +paul@f0:~ % doas reboot
    +.
    +.
    +.
    +paul@f0:~ % uname -a
    +FreeBSD f0.lan.buetow.org 14.3-RELEASE FreeBSD 14.3-RELEASE
    +        releng/14.3-n271432-8c9ce319fef7 GENERIC amd64
     

    -Aider fixing the packages
    +

    Installing k3s



    -

    Compilation & Execution


    +

    Generating K3S_TOKEN and starting the first k3s node



    -Once done so, the project was ready and I could compile and run it:
    +I generated the k3s token on my Fedora laptop with pwgen -n 32 and selected one of the results. Then, on all three r hosts, I ran the following (replace SECRET_TOKEN with the actual secret):

    -
    go build cmd/aitest/main.go
    -./main -v
    -0.0.0
    -./main -dir .
    -Number of files in directory .: 4
    +
    [root@r0 ~]# echo -n SECRET_TOKEN > ~/.k3s_token
     

    -

    The code


    +The following steps are also documented on the k3s website:

    -The code it generated was simple, but functional. The ./cmd/aitest/main.go file:
    +https://docs.k3s.io/datastore/ha-embedded
    +
    +To bootstrap k3s on the first node, I ran this on r0:

    -
    package main
    -
    -import (
    -	"flag"
    -	"fmt"
    -	"os"
    -
    -	"aitest/internal"
    -)
    -
    -func main() {
    -	var versionFlag bool
    -	flag.BoolVar(&versionFlag, "v", false, "print version")
    -	dir := flag.String("dir", "", "directory to count files in")
    -	flag.Parse()
    -
    -	if versionFlag {
    -		fmt.Println(internal.GetVersion())
    -		return
    -	}
    -
    -	if *dir != "" {
    -		fileCount, err := internal.CountFiles(*dir)
    -		if err != nil {
    -			fmt.Fprintf(os.Stderr, "Error counting files: %v\n", err)
    -			os.Exit(1)
    -		}
    -		fmt.Printf("Number of files in directory %s: %d\n", *dir, fileCount)
    -	} else {
    -		fmt.Println("No directory specified. No count given.")
    -	}
    -}
    +
    [root@r0 ~]# curl -sfL https://get.k3s.io | K3S_TOKEN=$(cat ~/.k3s_token) \
    +        sh -s - server --cluster-init \
    +        --node-ip=192.168.2.120 \
    +        --advertise-address=192.168.2.120 \
    +        --tls-san=r0.wg0.wan.buetow.org
    +[INFO]  Finding release for channel stable
    +[INFO]  Using v1.32.6+k3s1 as release
    +.
    +.
    +.
    +[INFO]  systemd: Starting k3s
     

    -The ./internal/version.go file:
    +Note: The --node-ip and --advertise-address flags are important to ensure that the embedded etcd cluster communicates over the WireGuard interface (192.168.2.x) rather than the LAN interface (192.168.1.x). This ensures that all control plane traffic is encrypted via WireGuard.
    +
    +

    Adding the remaining nodes to the cluster


    +
    +Then I ran on the other two nodes r1 and r2:

    -
    package internal
    +
    [root@r1 ~]# curl -sfL https://get.k3s.io | K3S_TOKEN=$(cat ~/.k3s_token) \
    +        sh -s - server --server https://r0.wg0.wan.buetow.org:6443 \
    +        --node-ip=192.168.2.121 \
    +        --advertise-address=192.168.2.121 \
    +        --tls-san=r1.wg0.wan.buetow.org
     
    -var Version = "0.0.0"
    +[root@r2 ~]# curl -sfL https://get.k3s.io | K3S_TOKEN=$(cat ~/.k3s_token) \
    +        sh -s - server --server https://r0.wg0.wan.buetow.org:6443 \
    +        --node-ip=192.168.2.122 \
    +        --advertise-address=192.168.2.122 \
    +        --tls-san=r2.wg0.wan.buetow.org
    +.
    +.
    +.
     
    -func GetVersion() string {
    -	return Version
    -}
     

    -The ./internal/count.go file:
    +Once done, I had a three-node Kubernetes cluster control plane:

    -
    package internal
    -
    -import (
    -	"os"
    -)
    -
    -func CountFiles(dir string) (int, error) {
    -	files, err := os.ReadDir(dir)
    -	if err != nil {
    -		return 0, err
    -	}
    -
    -	count := 0
    -	for _, file := range files {
    -		if !file.IsDir() {
    -			count++
    -		}
    -	}
    +
    [root@r0 ~]# kubectl get nodes
    +NAME                STATUS   ROLES                       AGE     VERSION
    +r0.lan.buetow.org   Ready    control-plane,etcd,master   4m44s   v1.32.6+k3s1
    +r1.lan.buetow.org   Ready    control-plane,etcd,master   3m13s   v1.32.6+k3s1
    +r2.lan.buetow.org   Ready    control-plane,etcd,master   30s     v1.32.6+k3s1
     
    -	return count, nil
    -}
    +[root@r0 ~]# kubectl get pods --all-namespaces
    +NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
    +kube-system   coredns-5688667fd4-fs2jj                  1/1     Running     0          5m27s
    +kube-system   helm-install-traefik-crd-f9hgd            0/1     Completed   0          5m27s
    +kube-system   helm-install-traefik-zqqqk                0/1     Completed   2          5m27s
    +kube-system   local-path-provisioner-774c6665dc-jqlnc   1/1     Running     0          5m27s
    +kube-system   metrics-server-6f4c6675d5-5xpmp           1/1     Running     0          5m27s
    +kube-system   svclb-traefik-411cec5b-cdp2l              2/2     Running     0          78s
    +kube-system   svclb-traefik-411cec5b-f625r              2/2     Running     0          4m58s
    +kube-system   svclb-traefik-411cec5b-twrd7              2/2     Running     0          4m2s
    +kube-system   traefik-c98fdf6fb-lt6fx                   1/1     Running     0          4m58s
     

    -The code is quite straightforward, especially for generating boilerplate code this will be useful for many use cases!
    -
    -

    In-Editor Code Completion


    -
    -To leverage Ollama for real-time code completion in my editor, I have integrated it with Helix, my preferred text editor. Helix supports the LSP (Language Server Protocol), which enables advanced code completion features. The lsp-ai is an LSP server that can interface with Ollama models for code completion tasks.
    +In order to connect with kubectl from my Fedora laptop, I had to copy /etc/rancher/k3s/k3s.yaml from r0 to ~/.kube/config and then replace the value of the server field with r0.lan.buetow.org. kubectl can now manage the cluster. Note that this step has to be repeated when I want to connect to another node of the cluster (e.g. when r0 is down).

    -https://helix-editor.com
    -https://github.com/SilasMarvin/lsp-ai
    +

    Test deployments



    -

    Installation of lsp-ai


    +

    Test deployment to Kubernetes



    -I installed lsp-ai via Rust's Cargo package manager. (If you don't have Rust installed, you can install it via Homebrew as well.):
    +Let's create a test namespace:

    -
    cargo install lsp-ai
    -
    -
    -

    Helix Configuration


    -
    -I edited ~/.config/helix/languages.toml to include:
    -
    -
    -[[language]]
    -name = "go"
    -auto-format= true
    -diagnostic-severity = "hint"
    -formatter = { command = "goimports" }
    -language-servers = [ "gopls", "golangci-lint-lsp", "lsp-ai", "gpt" ]
    +
    > ~ kubectl create namespace test
    +namespace/test created
    +
    +> ~ kubectl get namespaces
    +NAME              STATUS   AGE
    +default           Active   6h11m
    +kube-node-lease   Active   6h11m
    +kube-public       Active   6h11m
    +kube-system       Active   6h11m
    +test              Active   5s
    +
    +> ~ kubectl config set-context --current --namespace=test
    +Context "default" modified.
     

    -Note that there is also a gpt language server configured, which is for GitHub Copilot, but it is out of scope of this blog post. Let's also configure lsp-ai settings in the same file:
    +And let's also create an Apache test pod:

    -
    -[language-server.lsp-ai]
    -command = "lsp-ai"
    -
    -[language-server.lsp-ai.config.memory]
    -file_store = { }
    -
    -[language-server.lsp-ai.config.models.model1]
    -type = "ollama"
    -model =  "qwen2.5-coder"
    -
    -[language-server.lsp-ai.config.models.model2]
    -type = "ollama"
    -model = "mistral-nemo:latest"
    -
    -[language-server.lsp-ai.config.models.model3]
    -type = "ollama"
    -model = "deepseek-r1:14b"
    -
    -[language-server.lsp-ai.config.completion]
    -model = "model1"
    -
    -[language-server.lsp-ai.config.completion.parameters]
    -max_tokens = 64
    -max_context = 8096
    -
    -## Configure the messages per your needs
    -[[language-server.lsp-ai.config.completion.parameters.messages]]
    -role = "system"
    -content = "Instructions:\n- You are an AI programming assistant.\n- Given a
    -piece of code with the cursor location marked by \"<CURSOR>\", replace
    -\"<CURSOR>\" with the correct code or comment.\n- First, think step-by-step.\n
    -- Describe your plan for what to build in pseudocode, written out in great
    -detail.\n- Then output the code replacing the \"<CURSOR>\"\n- Ensure that your
    -completion fits within the language context of the provided code snippet (e.g.,
    -Go, Ruby, Bash, Java, Puppet DSL).\n\nRules:\n- Only respond with code or
    -comments.\n- Only replace \"<CURSOR>\"; do not include any previously written
    -code.\n- Never include \"<CURSOR>\" in your response\n- If the cursor is within
    -a comment, complete the comment meaningfully.\n- Handle ambiguous cases by
    -providing the most contextually appropriate completion.\n- Be consistent with
    -your responses."
    -
    -[[language-server.lsp-ai.config.completion.parameters.messages]]
    -role = "user"
    -content = "func greet(name) {\n    print(f\"Hello, {<CURSOR>}\")\n}"
    -
    -[[language-server.lsp-ai.config.completion.parameters.messages]]
    -role = "assistant"
    -content = "name"
    -
    -[[language-server.lsp-ai.config.completion.parameters.messages]]
    -role = "user"
    -content = "func sum(a, b) {\n    return a + <CURSOR>\n}"
    -
    -[[language-server.lsp-ai.config.completion.parameters.messages]]
    -role = "assistant"
    -content = "b"
    -
    -[[language-server.lsp-ai.config.completion.parameters.messages]]
    -role = "user"
    -content = "func multiply(a, b int ) int {\n    a * <CURSOR>\n}"
    -
    -[[language-server.lsp-ai.config.completion.parameters.messages]]
    -role = "assistant"
    -content = "b"
    -
    -[[language-server.lsp-ai.config.completion.parameters.messages]]
    -role = "user"
    -content = "// <CURSOR>\nfunc add(a, b) {\n    return a + b\n}"
    +
    +
    > ~ cat <<END > apache-deployment.yaml
    +# Apache HTTP Server Deployment
    +apiVersion: apps/v1
    +kind: Deployment
    +metadata:
    +  name: apache-deployment
    +spec:
    +  replicas: 1
    +  selector:
    +    matchLabels:
    +      app: apache
    +  template:
    +    metadata:
    +      labels:
    +        app: apache
    +    spec:
    +      containers:
    +      - name: apache
    +        image: httpd:latest
    +        ports:
    +        # Container port where Apache listens
    +        - containerPort: 80
    +END
     
    -[[language-server.lsp-ai.config.completion.parameters.messages]]
    -role = "assistant"
    -content = "Adds two numbers"
    +> ~ kubectl apply -f apache-deployment.yaml
    +deployment.apps/apache-deployment created
     
    -[[language-server.lsp-ai.config.completion.parameters.messages]]
    -role = "user"
    -content = "// This function checks if a number is even\n<CURSOR>"
    +> ~ kubectl get all
    +NAME                                     READY   STATUS    RESTARTS   AGE
    +pod/apache-deployment-5fd955856f-4pjmf   1/1     Running   0          7s
     
    -[[language-server.lsp-ai.config.completion.parameters.messages]]
    -role = "assistant"
    -content = "func is_even(n) {\n    return n % 2 == 0\n}"
    +NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
    +deployment.apps/apache-deployment   1/1     1            1           7s
     
    -[[language-server.lsp-ai.config.completion.parameters.messages]]
    -role = "user"
    -content = "{CODE}"
    +NAME                                           DESIRED   CURRENT   READY   AGE
    +replicaset.apps/apache-deployment-5fd955856f   1         1         1       7s
     

    -As you can see, I have also added other models, such as Mistral Nemo and DeepSeek R1, so that I can switch between them in Helix. Other than that, the completion parameters are interesting. They define how the LLM should interact with the text in the text editor based on the given examples.
    -
    -If you want to see more lsp-ai configuration examples, they are some for Vim and Helix in the lsp-ai git repository!
    -
    -

    Code completion in action


    -
    -The screenshot shows how Ollama's qwen2.5-coder model provides code completion suggestions within the Helix editor. LSP auto-completion is triggered by leaving the cursor at position <CURSOR> for a short period in the code snippet, and Ollama responds with relevant completions based on the context.
    -
    -Completing the fib-function
    -
    -In the LSP auto-completion, the one prefixed with ai - was generated by qwen2.5-coder, the other ones are from other LSP servers (GitHub Copilot, Go linter, Go language server, etc.).
    -
    -I found GitHub Copilot to be still faster than qwen2.5-coder:14b, but the local LLM one is actually workable for me already. And, as mentioned earlier, things will likely improve in the future regarding local LLMs. So I am excited about the future of local LLMs and coding tools like Ollama and Helix.
    -
    -After trying qwen3-coder:30b-a3b-q4_K_M (following the publication of this blog post), I found it to be significantly faster and more capable than the previous model, making it a promising option for local coding tasks. Honestly, even my current local setup already handles routine coding stuff pretty well—better than I expected.
    -
    -

    Conclusion


    -
    -Will there ever be a time we can run larger models (60B, 100B, ...and larger) on consumer hardware, or even on our phones? We are not quite there yet, but I am optimistic that we will see improvements in the next few years. As hardware capabilities improve and/or become cheaper, and more efficient models are developed (or new techniques will be invented to make language models more effective), the landscape of local AI coding assistants will continue to evolve.
    -
    -For now, even the models listed in this blog post are very promising already, and they run on consumer-grade hardware (at least in the realm of the initial tests I've performed... the ones in this blog post are overly simplistic, though! But they were good for getting started with Ollama and initial demonstration)! I will continue experimenting with Ollama and other local LLMs to see how they can enhance my coding experience. I may cancel my Copilot subscription, which I currently use only for in-editor auto-completion, at some point.
    -
    -However, truth be told, I don't think the setup described in this blog post currently matches the performance of commercial models like Claude Code (Sonnet 4, Opus 4), Gemini 2.5 Pro, the OpenAI models and others. Maybe we could get close if we had the high-end hardware needed to run the largest Qwen Coder model available. But, as mentioned already, that is out of reach for occasional coders like me. Furthermore, I want to continue coding manually to some degree, as otherwise I will start to forget how to write for-loops, which would be awkward... However, do we always need the best model when AI can help generate boilerplate or repetitive tasks even with smaller models?
    -
    -E-Mail your comments to paul@nospam.buetow.org :-)
    -
    -Other related posts are:
    -
    -2025-08-05 Local LLM for Coding with Ollama on macOS (You are currently reading this)
    -2025-06-22 Task Samurai: An agentic coding learning experiment
    -
    -Back to the main site
    -
    -
    -
    - - f3s: Kubernetes with FreeBSD - Part 6: Storage - - gemini://foo.zone/gemfeed/2025-07-14-f3s-kubernetes-with-freebsd-part-6.gmi - 2025-07-13T16:44:29+03:00, last updated Wed 19 Mar 2026 - - Paul Buetow aka snonux - paul@dev.buetow.org - - This is the sixth blog post about the f3s series for self-hosting demands in a home lab. f3s? The 'f' stands for FreeBSD, and the '3s' stands for k3s, the Kubernetes distribution used on FreeBSD-based physical machines. - -
    -

    f3s: Kubernetes with FreeBSD - Part 6: Storage


    -
    -Published at 2025-07-13T16:44:29+03:00, last updated Wed 19 Mar 2026
    -
    -This is the sixth blog post about the f3s series for self-hosting demands in a home lab. f3s? The "f" stands for FreeBSD, and the "3s" stands for k3s, the Kubernetes distribution used on FreeBSD-based physical machines.
    -
    -2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    -2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    -2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    -2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    -2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    -2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage (You are currently reading this)
    -2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    -2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability
    +Let's also create a service:

    -f3s logo
    + +
    > ~ cat <<END > apache-service.yaml
    +apiVersion: v1
    +kind: Service
    +metadata:
    +  labels:
    +    app: apache
    +  name: apache-service
    +spec:
    +  ports:
    +    - name: web
    +      port: 80
    +      protocol: TCP
    +      # Expose port 80 on the service
    +      targetPort: 80
    +  selector:
    +  # Link this service to pods with the label app=apache
    +    app: apache
    +END
    +
    +> ~ kubectl apply -f apache-service.yaml
    +service/apache-service created
    +
    +> ~ kubectl get service
    +NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
    +apache-service   ClusterIP   10.43.249.165   <none>        80/TCP    4s
    +

    -

    Table of Contents


    +Now let's create an ingress:

    -
    -

    Introduction


    +Note: I've modified the hosts listed in this example after I published this blog post to ensure that there aren't any bots scraping it.

    -In the previous posts, we set up a WireGuard mesh network. In the future, we will also set up a Kubernetes cluster. Kubernetes workloads often require persistent storage for databases, configuration files, and application data. Local storage on each node has significant limitations:
    + +
    > ~ cat <<END > apache-ingress.yaml
    +
    +apiVersion: networking.k8s.io/v1
    +kind: Ingress
    +metadata:
    +  name: apache-ingress
    +  namespace: test
    +  annotations:
    +    spec.ingressClassName: traefik
    +    traefik.ingress.kubernetes.io/router.entrypoints: web
    +spec:
    +  rules:
    +    - host: f3s.foo.zone
    +      http:
    +        paths:
    +          - path: /
    +            pathType: Prefix
    +            backend:
    +              service:
    +                name: apache-service
    +                port:
    +                  number: 80
    +    - host: standby.f3s.foo.zone
    +      http:
    +        paths:
    +          - path: /
    +            pathType: Prefix
    +            backend:
    +              service:
    +                name: apache-service
    +                port:
    +                  number: 80
    +    - host: www.f3s.foo.zone
    +      http:
    +        paths:
    +          - path: /
    +            pathType: Prefix
    +            backend:
    +              service:
    +                name: apache-service
    +                port:
    +                  number: 80
    +END
    +
    +> ~ kubectl apply -f apache-ingress.yaml
    +ingress.networking.k8s.io/apache-ingress created
    +
    +> ~ kubectl describe ingress
    +Name:             apache-ingress
    +Labels:           <none>
    +Namespace:        test
    +Address:          192.168.2.120,192.168.2.121,192.168.2.122
    +Ingress Class:    traefik
    +Default backend:  <default>
    +Rules:
    +  Host                    Path  Backends
    +  ----                    ----  --------
    +  f3s.foo.zone
    +                          /   apache-service:80 (10.42.1.11:80)
    +  standby.f3s.foo.zone
    +                          /   apache-service:80 (10.42.1.11:80)
    +  www.f3s.foo.zone
    +                          /   apache-service:80 (10.42.1.11:80)
    +Annotations:              spec.ingressClassName: traefik
    +                          traefik.ingress.kubernetes.io/router.entrypoints: web
    +Events:                   <none>
    +

    -
      -
    • No data sharing: Pods (once we run Kubernetes) on different nodes can't access the same data
    • -
    • Pod mobility: If a pod moves to another node, it loses access to its data
    • -
    • No redundancy: Hardware failure means data loss
    • -

    -This post implements a robust storage solution using:
    +Notes:

      -
    • CARP: For high availability with automatic IP failover
    • -
    • NFS over stunnel: For secure, encrypted network storage
    • -
    • ZFS: For data integrity, encryption, and efficient snapshots
    • -
    • zrepl: For continuous ZFS replication between nodes
    • +
    • In the ingress, I use plain HTTP (web) for the Traefik rule, as all the "production" traffic will be routed through a WireGuard tunnel anyway, as I will show later.

    -The result is a highly available, encrypted storage system that survives node failures while providing shared storage to all Kubernetes pods.
    -
    -Other than what was mentioned in the first post of this blog series, we aren't using HAST, but zrepl for data replication. Read more about it later in this blog post.
    -
    -

    Additional storage capacity


    -
    -We add 1 TB of additional storage to each of the nodes (f0, f1, f2) in the form of an SSD drive. The Beelink mini PCs have enough space in the chassis for the extra space.
    +So I tested the Apache web server through the ingress rule:

    -
    + +
    > ~ curl -H "Host: www.f3s.foo.zone" http://r0.lan.buetow.org:80
    +<html><body><h1>It works!</h1></body></html>
    +

    -Upgrading the storage was as easy as unscrewing, plugging the drive in, and then screwing it back together again. The procedure was uneventful! We're using two different SSD models (Samsung 870 EVO and Crucial BX500) to avoid simultaneous failures from the same manufacturing batch.
    +

    Test deployment with persistent volume claim



    -We then create the zdata ZFS pool on all three nodes:
    +Next, I modified the Apache example to serve the htdocs directory from the NFS share I created in the previous blog post. I used the following manifests. Most of them are the same as before, except for the persistent volume claim and the volume mount in the Apache deployment.

    -
    paul@f0:~ % doas zpool create -m /data zdata /dev/ada1
    -paul@f0:~ % zpool list
    -NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
    -zdata   928G  12.1M   928G        -         -     0%     0%  1.00x    ONLINE  -
    -zroot   472G  29.0G   443G        -         -     0%     6%  1.00x    ONLINE  -
    +
    > ~ cat <<END > apache-deployment.yaml
    +# Apache HTTP Server Deployment
    +apiVersion: apps/v1
    +kind: Deployment
    +metadata:
    +  name: apache-deployment
    +  namespace: test
    +spec:
    +  replicas: 2
    +  selector:
    +    matchLabels:
    +      app: apache
    +  template:
    +    metadata:
    +      labels:
    +        app: apache
    +    spec:
    +      containers:
    +      - name: apache
    +        image: httpd:latest
    +        ports:
    +        # Container port where Apache listens
    +        - containerPort: 80
    +        readinessProbe:
    +          httpGet:
    +            path: /
    +            port: 80
    +          initialDelaySeconds: 5
    +          periodSeconds: 10
    +        livenessProbe:
    +          httpGet:
    +            path: /
    +            port: 80
    +          initialDelaySeconds: 15
    +          periodSeconds: 10
    +        volumeMounts:
    +        - name: apache-htdocs
    +          mountPath: /usr/local/apache2/htdocs/
    +      volumes:
    +      - name: apache-htdocs
    +        persistentVolumeClaim:
    +          claimName: example-apache-pvc
    +END
     
    -paul@f0:/ % doas camcontrol devlist
    -<512GB SSD D910R170>               at scbus0 target 0 lun 0 (pass0,ada0)
    -<Samsung SSD 870 EVO 1TB SVT03B6Q>  at scbus1 target 0 lun 0 (pass1,ada1)
    -paul@f0:/ %
    +> ~ cat <<END > apache-ingress.yaml
    +apiVersion: networking.k8s.io/v1
    +kind: Ingress
    +metadata:
    +  name: apache-ingress
    +  namespace: test
    +  annotations:
    +    spec.ingressClassName: traefik
    +    traefik.ingress.kubernetes.io/router.entrypoints: web
    +spec:
    +  rules:
    +    - host: f3s.foo.zone
    +      http:
    +        paths:
    +          - path: /
    +            pathType: Prefix
    +            backend:
    +              service:
    +                name: apache-service
    +                port:
    +                  number: 80
    +    - host: standby.f3s.foo.zone
    +      http:
    +        paths:
    +          - path: /
    +            pathType: Prefix
    +            backend:
    +              service:
    +                name: apache-service
    +                port:
    +                  number: 80
    +    - host: www.f3s.foo.zone
    +      http:
    +        paths:
    +          - path: /
    +            pathType: Prefix
    +            backend:
    +              service:
    +                name: apache-service
    +                port:
    +                  number: 80
    +END
    +
    +> ~ cat <<END > apache-persistent-volume.yaml
    +apiVersion: v1
    +kind: PersistentVolume
    +metadata:
    +  name: example-apache-pv
    +spec:
    +  capacity:
    +    storage: 1Gi
    +  volumeMode: Filesystem
    +  accessModes:
    +    - ReadWriteOnce
    +  persistentVolumeReclaimPolicy: Retain
    +  hostPath:
    +    path: /data/nfs/k3svolumes/example-apache-volume-claim
    +    type: Directory
    +---
    +apiVersion: v1
    +kind: PersistentVolumeClaim
    +metadata:
    +  name: example-apache-pvc
    +  namespace: test
    +spec:
    +  storageClassName: ""
    +  accessModes:
    +    - ReadWriteOnce
    +  resources:
    +    requests:
    +      storage: 1Gi
    +END
    +
    +> ~ cat <<END > apache-service.yaml
    +apiVersion: v1
    +kind: Service
    +metadata:
    +  labels:
    +    app: apache
    +  name: apache-service
    +  namespace: test
    +spec:
    +  ports:
    +    - name: web
    +      port: 80
    +      protocol: TCP
    +      # Expose port 80 on the service
    +      targetPort: 80
    +  selector:
    +  # Link this service to pods with the label app=apache
    +    app: apache
    +END
     

    -To verify that we have a different SSD on the second node (the third node has the same drive as the first):
    +I applied the manifests:

    -
    paul@f1:/ % doas camcontrol devlist
    -<512GB SSD D910R170>               at scbus0 target 0 lun 0 (pass0,ada0)
    -<CT1000BX500SSD1 M6CR072>          at scbus1 target 0 lun 0 (pass1,ada1)
    +
    > ~ kubectl apply -f apache-persistent-volume.yaml
    +> ~ kubectl apply -f apache-service.yaml
    +> ~ kubectl apply -f apache-deployment.yaml
    +> ~ kubectl apply -f apache-ingress.yaml
     

    -

    ZFS encryption keys


    -
    -ZFS native encryption requires encryption keys to unlock datasets. We need a secure method to store these keys that balances security with operational needs:
    -
    -
      -
    • Security: Keys must not be stored on the same disks they encrypt
    • -
    • Availability: Keys must be available at boot for automatic mounting
    • -
    • Portability: Keys should be easily moved between systems for recovery
    • -

    -Using USB flash drives as hardware key storage provides a convenient and elegant solution. The encrypted data is unreadable without physical access to the USB key, protecting against disk theft or improper disposal. In production environments, you may use enterprise key management systems; however, for a home lab, USB keys offer good security with minimal complexity.
    -
    -

    UFS on USB keys


    +Looking at the deployment, I could see it failed because the directory didn't exist yet on the NFS share (note that I also increased the replica count to 2 so if one node goes down there's already a replica running on another node for faster failover):

    -We'll format the USB drives with UFS (Unix File System) rather than ZFS for simplicity. There is no need to use ZFS.
    + +
    > ~ kubectl get pods
    +NAME                                 READY   STATUS              RESTARTS   AGE
    +apache-deployment-5b96bd6b6b-fv2jx   0/1     ContainerCreating   0          9m15s
    +apache-deployment-5b96bd6b6b-ax2ji   0/1     ContainerCreating   0          9m15s
    +
    +> ~ kubectl describe pod apache-deployment-5b96bd6b6b-fv2jx | tail -n 5
    +Events:
    +  Type     Reason       Age                   From               Message
    +  ----     ------       ----                  ----               -------
    +  Normal   Scheduled    9m34s                 default-scheduler  Successfully
    +    assigned test/apache-deployment-5b96bd6b6b-fv2jx to r2.lan.buetow.org
    +  Warning  FailedMount  80s (x12 over 9m34s)  kubelet            MountVolume.SetUp
    +    failed for volume "example-apache-pv" : hostPath type check failed:
    +    /data/nfs/k3svolumes/example-apache is not a directory
    +

    -Let's see the USB keys:
    +That's intentional—I needed to create the directory on the NFS share first, so I did that (e.g. on r0):

    -USB keys
    + +
    [root@r0 ~]# mkdir /data/nfs/k3svolumes/example-apache-volume-claim/
    +
    +[root@r0 ~]# cat <<END > /data/nfs/k3svolumes/example-apache-volume-claim/index.html
    +<!DOCTYPE html>
    +<html>
    +<head>
    +  <title>Hello, it works</title>
    +</head>
    +<body>
    +  <h1>Hello, it works!</h1>
    +  <p>This site is served via a PVC!</p>
    +</body>
    +</html>
    +END
    +

    -To verify that the USB key (flash disk) is there:
    +The index.html file gives us some actual content to serve. After deleting the pod, it recreates itself and the volume mounts correctly:

    -
    -paul@f0:/ % doas camcontrol devlist
    -<512GB SSD D910R170>               at scbus0 target 0 lun 0 (pass0,ada0)
    -<Samsung SSD 870 EVO 1TB SVT03B6Q>  at scbus1 target 0 lun 0 (pass1,ada1)
    -<Generic Flash Disk 8.07>          at scbus2 target 0 lun 0 (da0,pass2)
    -paul@f0:/ %
    +
    +
    > ~ kubectl delete pod apache-deployment-5b96bd6b6b-fv2jx
    +
    +> ~ curl -H "Host: www.f3s.foo.zone" http://r0.lan.buetow.org:80
    +<!DOCTYPE html>
    +<html>
    +<head>
    +  <title>Hello, it works</title>
    +</head>
    +<body>
    +  <h1>Hello, it works!</h1>
    +  <p>This site is served via a PVC!</p>
    +</body>
    +</html>
     

    -Let's create the UFS file system and mount it (done on all three nodes f0, f1 and f2):
    +

    Scaling Traefik for faster failover


    +
    +Traefik (used for ingress on k3s) ships with a single replica by default, but for faster failover I bumped it to two replicas so each worker node runs one pod. That way, if a node disappears, the service stays up while Kubernetes schedules a replacement. Here's the command I used:

    -
    paul@f0:/ % doas newfs /dev/da0
    -/dev/da0: 15000.0MB (30720000 sectors) block size 32768, fragment size 4096
    -        using 24 cylinder groups of 625.22MB, 20007 blks, 80128 inodes.
    -        with soft updates
    -super-block backups (for fsck_ffs -b #) at:
    - 192, 1280640, 2561088, 3841536, 5121984, 6402432, 7682880, 8963328, 10243776,
    -11524224, 12804672, 14085120, 15365568, 16646016, 17926464, 19206912,k 20487360,
    -...
    -
    -paul@f0:/ % echo '/dev/da0 /keys ufs rw 0 2' | doas tee -a /etc/fstab
    -/dev/da0 /keys ufs rw 0 2
    -paul@f0:/ % doas mkdir /keys
    -paul@f0:/ % doas mount /keys
    -paul@f0:/ % df | grep keys
    -/dev/da0             14877596       8  13687384     0%    /keys
    +
    > ~ kubectl -n kube-system scale deployment traefik --replicas=2
     

    -USB keys stuck in
    +And the result:

    -

    Generating encryption keys


    + +
    > ~ kubectl -n kube-system get pods -l app.kubernetes.io/name=traefik
    +kube-system   traefik-c98fdf6fb-97kqk   1/1   Running   19 (53d ago)   64d
    +kube-system   traefik-c98fdf6fb-9npg2   1/1   Running   11 (53d ago)   61d
    +

    -The following keys will later be used to encrypt the ZFS file systems. They will be stored on all three nodes, serving as a backup in case one of the keys is lost or corrupted. When we later replicate encrypted ZFS volumes from one node to another, the keys must also be available on the destination node.
    +

    Make it accessible from the public internet



    -
    -paul@f0:/keys % doas openssl rand -out /keys/f0.lan.buetow.org:bhyve.key 32
    -paul@f0:/keys % doas openssl rand -out /keys/f1.lan.buetow.org:bhyve.key 32
    -paul@f0:/keys % doas openssl rand -out /keys/f2.lan.buetow.org:bhyve.key 32
    -paul@f0:/keys % doas openssl rand -out /keys/f0.lan.buetow.org:zdata.key 32
    -paul@f0:/keys % doas openssl rand -out /keys/f1.lan.buetow.org:zdata.key 32
    -paul@f0:/keys % doas openssl rand -out /keys/f2.lan.buetow.org:zdata.key 32
    -paul@f0:/keys % doas chown root *
    -paul@f0:/keys % doas chmod 400 *
    -
    -paul@f0:/keys % ls -l
    -total 20
    -*r--------  1 root wheel 32 May 25 13:07 f0.lan.buetow.org:bhyve.key
    -*r--------  1 root wheel 32 May 25 13:07 f1.lan.buetow.org:bhyve.key
    -*r--------  1 root wheel 32 May 25 13:07 f2.lan.buetow.org:bhyve.key
    -*r--------  1 root wheel 32 May 25 13:07 f0.lan.buetow.org:zdata.key
    -*r--------  1 root wheel 32 May 25 13:07 f1.lan.buetow.org:zdata.key
    -*r--------  1 root wheel 32 May 25 13:07 f2.lan.buetow.org:zdata.key
    -
    +Next, I made this accessible through the public internet via the www.f3s.foo.zone hosts. As a reminder from part 1 of this series, I reviewed the section titled "OpenBSD/relayd to the rescue for external connectivity":

    -After creation, these are copied to the other two nodes, f1 and f2, into the /keys partition (I won't provide the commands here; create a tarball, copy it over, and extract it on the destination nodes).
    +f3s: Kubernetes with FreeBSD - Part 1: Setting the stage

    -

    Configuring zdata ZFS pool encryption


    +All apps should be reachable through the internet (e.g., from my phone or computer when travelling). For external connectivity and TLS management, I've got two OpenBSD VMs (one hosted by OpenBSD Amsterdam and another hosted by Hetzner) handling public-facing services like DNS, relaying traffic, and automating Let's Encrypt certificates.

    -Let's encrypt our zdata ZFS pool. We are not encrypting the whole pool, but everything within the zdata/enc data set:
    +All of this (every Linux VM to every OpenBSD box) will be connected via WireGuard tunnels, keeping everything private and secure. There will be 6 WireGuard tunnels (3 k3s nodes times two OpenBSD VMs).
    +
    +So, when I want to access a service running in k3s, I will hit an external DNS endpoint (with the authoritative DNS servers being the OpenBSD boxes). The DNS will resolve to the master OpenBSD VM (see my KISS highly-available with OpenBSD blog post), and from there, the relayd process (with a Let's Encrypt certificate—see my Let's Encrypt with OpenBSD and Rex blog post) will accept the TCP connection and forward it through the WireGuard tunnel to a reachable node port of one of the k3s nodes, thus serving the traffic.

    -
    paul@f0:/keys % doas zfs create -o encryption=on -o keyformat=raw -o \
    -  keylocation=file:///keys/`hostname`:zdata.key zdata/enc
    -paul@f0:/ % zfs list | grep zdata
    -zdata                                          836K   899G    96K  /data
    -zdata/enc                                      200K   899G   200K  /data/enc
    +
    > ~ curl https://f3s.foo.zone
    +<html><body><h1>It works!</h1></body></html>
     
    -paul@f0:/keys % zfs get all zdata/enc | grep -E -i '(encryption|key)'
    -zdata/enc  encryption            aes-256-gcm                               -
    -zdata/enc  keylocation           file:///keys/f0.lan.buetow.org:zdata.key  local
    -zdata/enc  keyformat             raw                                       -
    -zdata/enc  encryptionroot        zdata/enc                                 -
    -zdata/enc  keystatus             available                                 -
    +> ~ curl https://www.f3s.foo.zone
    +<html><body><h1>It works!</h1></body></html>
    +
    +> ~ curl https://standby.f3s.foo.zone
    +<html><body><h1>It works!</h1></body></html>
     

    -All future data sets within zdata/enc will inherit the same encryption key.
    +This is how it works in relayd.conf on OpenBSD:

    -

    Migrating Bhyve VMs to an encrypted bhyve ZFS volume


    +

    OpenBSD relayd configuration



    -We set up Bhyve VMs in a previous blog post. Their ZFS data sets rely on zroot, which is the default ZFS pool on the internal 512GB NVME drive. They aren't encrypted yet, so we encrypt the VM data sets as well now. To do so, we first shut down the VMs on all three nodes:
    +The OpenBSD edge relays keep the Kubernetes-facing addresses for the f3s ingress endpoints in a shared backend table so TLS traffic for every f3s hostname lands on the same pool of k3s nodes (pointing to the WireGuard IP addresses of those nodes - remember, they are running locally in my LAN, wheras the OpenBSD edge relays operate in the public internet):

    - -
    paul@f0:/keys % doas vm stop rocky
    -Sending ACPI shutdown to rocky
    +
    +table <f3s> {
    +  192.168.2.120
    +  192.168.2.121
    +  192.168.2.122
    +}
    +
    +
    +Inside the http protocol "https" block each public hostname gets its Let's Encrypt certificate. The protocol configures TLS keypairs for all f3s services and other public endpoints. For f3s hosts specifically, there are no explicit forward to rules in the protocol—they use the relay-level failover mechanism described later. Non-f3s hosts get explicit localhost routing to prevent them from trying the f3s backends:
    +
    +
    +http protocol "https" {
    +    # TLS certificates for all f3s services
    +    tls keypair f3s.foo.zone
    +    tls keypair www.f3s.foo.zone
    +    tls keypair standby.f3s.foo.zone
    +    tls keypair anki.f3s.foo.zone
    +    tls keypair www.anki.f3s.foo.zone
    +    tls keypair standby.anki.f3s.foo.zone
    +    tls keypair bag.f3s.foo.zone
    +    tls keypair www.bag.f3s.foo.zone
    +    tls keypair standby.bag.f3s.foo.zone
    +    tls keypair flux.f3s.foo.zone
    +    tls keypair www.flux.f3s.foo.zone
    +    tls keypair standby.flux.f3s.foo.zone
    +    tls keypair audiobookshelf.f3s.foo.zone
    +    tls keypair www.audiobookshelf.f3s.foo.zone
    +    tls keypair standby.audiobookshelf.f3s.foo.zone
    +    tls keypair gpodder.f3s.foo.zone
    +    tls keypair www.gpodder.f3s.foo.zone
    +    tls keypair standby.gpodder.f3s.foo.zone
    +    tls keypair radicale.f3s.foo.zone
    +    tls keypair www.radicale.f3s.foo.zone
    +    tls keypair standby.radicale.f3s.foo.zone
    +    tls keypair vault.f3s.foo.zone
    +    tls keypair www.vault.f3s.foo.zone
    +    tls keypair standby.vault.f3s.foo.zone
    +    tls keypair syncthing.f3s.foo.zone
    +    tls keypair www.syncthing.f3s.foo.zone
    +    tls keypair standby.syncthing.f3s.foo.zone
    +    tls keypair uprecords.f3s.foo.zone
    +    tls keypair www.uprecords.f3s.foo.zone
    +    tls keypair standby.uprecords.f3s.foo.zone
     
    -paul@f0:/keys % doas vm list
    -NAME     DATASTORE  LOADER     CPU  MEMORY  VNC  AUTO     STATE
    -rocky    default    uefi       4    14G     -    Yes [1]  Stopped
    +    # Explicitly route non-f3s hosts to localhost
    +    match request header "Host" value "foo.zone" forward to <localhost>
    +    match request header "Host" value "www.foo.zone" forward to <localhost>
    +    match request header "Host" value "dtail.dev" forward to <localhost>
    +    # ... other non-f3s hosts ...
    +
    +    # NOTE: f3s hosts have NO match rules here!
    +    # They use relay-level failover (f3s -> localhost backup)
    +    # See the relay configuration below for automatic failover details
    +}
     

    -After this, we rename the unencrypted data set to _old, create a new encrypted data set, and also snapshot it as @hamburger.
    -
    - -
    paul@f0:/keys % doas zfs rename zroot/bhyve zroot/bhyve_old
    -paul@f0:/keys % doas zfs set mountpoint=/mnt zroot/bhyve_old
    -paul@f0:/keys % doas zfs snapshot zroot/bhyve_old/rocky@hamburger
    +Both IPv4 and IPv6 listeners reuse the same protocol definition, making the relay transparent for dual-stack clients while still health checking every k3s backend before forwarding traffic over WireGuard:
    +
    +
    +relay "https4" {
    +    listen on 46.23.94.99 port 443 tls
    +    protocol "https"
    +    # Primary: f3s cluster (with health checks) - Falls back to localhost when all hosts down
    +    forward to <f3s> port 80 check tcp
    +    forward to <localhost> port 8080
    +}
     
    -paul@f0:/keys % doas zfs create -o encryption=on -o keyformat=raw -o \
    -  keylocation=file:///keys/`hostname`:bhyve.key zroot/bhyve
    -paul@f0:/keys % doas zfs set mountpoint=/zroot/bhyve zroot/bhyve
    -paul@f0:/keys % doas zfs set mountpoint=/zroot/bhyve/rocky zroot/bhyve/rocky
    +relay "https6" {
    +    listen on 2a03:6000:6f67:624::99 port 443 tls
    +    protocol "https"
    +    # Primary: f3s cluster (with health checks) - Falls back to localhost when all hosts down
    +    forward to <f3s> port 80 check tcp
    +    forward to <localhost> port 8080
    +}
     

    -Once done, we import the snapshot into the encrypted dataset and also copy some other metadata files from vm-bhyve back over.
    +In practice, that means relayd terminates TLS with the correct certificate, keeps the three WireGuard-connected backends in rotation, and ships each request to whichever bhyve VM answers first.
    +
    +

    Automatic failover when f3s cluster is down


    +
    +Update: This section was added at Tue 30 Dec 10:11:44 EET 2025
    +
    +One important aspect of this setup is graceful degradation: when all three f3s nodes are unreachable (e.g., during maintenance or a power outage in my LAN), users should see a friendly status page instead of an error message.
    +
    +OpenBSD's relayd supports automatic failover through its health check mechanism. According to the relayd.conf manual:
    +
    +This directive can be specified multiple times - subsequent entries will be used as the backup table if all hosts in the previous table are down.
    +
    +The key is the order of forward to statements in the relay configuration. By placing the f3s table first with check tcp health checks, followed by localhost as a backup, relayd automatically routes traffic based on backend availability:
    +
    +When f3s cluster is UP:
    +
    +
      +
    • Health checks on port 80 succeed for f3s nodes
    • +
    • All f3s traffic routes to the Kubernetes cluster
    • +
    • Localhost backup remains idle
    • +

    +When f3s cluster is DOWN:

    +
      +
    • All health checks fail (nodes unreachable)
    • +
    • The <f3s> table becomes unavailable
    • +
    • Traffic automatically falls back to <localhost> on port 8080
    • +
    • OpenBSD's httpd serves a static fallback page
    • +

    -paul@f0:/keys % doas zfs send zroot/bhyve_old/rocky@hamburger | \
    -  doas zfs recv zroot/bhyve/rocky
    -paul@f0:/keys % doas cp -Rp /mnt/.config /zroot/bhyve/
    -paul@f0:/keys % doas cp -Rp /mnt/.img /zroot/bhyve/
    -paul@f0:/keys % doas cp -Rp /mnt/.templates /zroot/bhyve/
    -paul@f0:/keys % doas cp -Rp /mnt/.iso /zroot/bhyve/
    +# NEW configuration - supports automatic failover
    +http protocol "https" {
    +    # Explicitly route non-f3s hosts to localhost
    +    match request header "Host" value "foo.zone" forward to <localhost>
    +    match request header "Host" value "dtail.dev" forward to <localhost>
    +    # ... other non-f3s hosts ...
    +
    +    # f3s hosts have NO protocol rules - they use relay-level failover
    +    # (no match rules for f3s.foo.zone, anki.f3s.foo.zone, etc.)
    +}
    +
    +relay "https4" {
    +    # f3s FIRST (with health checks), localhost as BACKUP
    +    forward to <f3s> port 80 check tcp
    +    forward to <localhost> port 8080
    +}
     

    -We also have to make encrypted ZFS data sets mount automatically on boot:
    +This way, f3s traffic uses the relay's default behavior: try the first table, fall back to the second when health checks fail.

    - -
    paul@f0:/keys % doas sysrc zfskeys_enable=YES
    -zfskeys_enable:  -> YES
    -paul@f0:/keys % doas vm init
    -paul@f0:/keys % doas reboot
    -.
    -.
    -.
    -paul@f0:~ % doas vm list
    -paul@f0:~ % doas vm list
    -NAME     DATASTORE  LOADER     CPU  MEMORY  VNC           AUTO     STATE
    -rocky    default    uefi       4    14G     0.0.0.0:5900  Yes [1]  Running (2265)
    +

    OpenBSD httpd fallback configuration


    +
    +The localhost httpd service on port 8080 serves the fallback content from /var/www/htdocs/f3s_fallback/. This directory contains a simple HTML page explaining the situation.
    +
    +The key configuration detail is using request rewrite to ensure the fallback page is served for ALL paths, not just the root. Without this, accessing paths like /login?redirect=/files/ would return 404 instead of the fallback page:
    +
    +
    +# OpenBSD httpd.conf
    +# Fallback for f3s hosts - serve fallback page for ALL paths
    +server "f3s.foo.zone" {
    +  listen on * port 8080
    +  log style forwarded
    +  location * {
    +    # Rewrite all requests to /index.html to show fallback page regardless of path
    +    request rewrite "/index.html"
    +    root "/htdocs/f3s_fallback"
    +  }
    +}
    +
    +server "anki.f3s.foo.zone" {
    +  listen on * port 8080
    +  log style forwarded
    +  location * {
    +    request rewrite "/index.html"
    +    root "/htdocs/f3s_fallback"
    +  }
    +}
    +
    +# ... similar blocks for all f3s hostnames ...
     

    -As you can see, the VM is running. This means the encrypted zroot/bhyve was mounted successfully after the reboot! Now we can destroy the old, unencrypted, and now unused bhyve dataset:
    +The request rewrite "/index.html" directive ensures that whether someone accesses /, /login, /api/status, or any other path, they all receive the same fallback page. This prevents confusing 404 errors when users have bookmarked specific URLs or follow deep links while the cluster is down.

    - -
    paul@f0:~ % doas zfs destroy -R zroot/bhyve_old
    -
    +The fallback page itself is straightforward:

    -To verify once again that zroot/bhyve and zroot/bhyve/rocky are now both encrypted, we run:
    -
    -
    paul@f0:~ % zfs get all zroot/bhyve | grep -E '(encryption|key)'
    -zroot/bhyve  encryption            aes-256-gcm                               -
    -zroot/bhyve  keylocation           file:///keys/f0.lan.buetow.org:bhyve.key  local
    -zroot/bhyve  keyformat             raw                                       -
    -zroot/bhyve  encryptionroot        zroot/bhyve                               -
    -zroot/bhyve  keystatus             available                                 -
    -
    -paul@f0:~ % zfs get all zroot/bhyve/rocky | grep -E '(encryption|key)'
    -zroot/bhyve/rocky  encryption            aes-256-gcm            -
    -zroot/bhyve/rocky  keylocation           none                   default
    -zroot/bhyve/rocky  keyformat             raw                    -
    -zroot/bhyve/rocky  encryptionroot        zroot/bhyve            -
    -zroot/bhyve/rocky  keystatus             available              -
    +
    <!DOCTYPE html>
    +<html>
    +<head>
    +    <title>Server turned off</title>
    +    <style>
    +        body {
    +            font-family: sans-serif;
    +            text-align: center;
    +            padding-top: 50px;
    +        }
    +        .container {
    +            max-width: 600px;
    +            margin: 0 auto;
    +        }
    +    </style>
    +</head>
    +<body>
    +    <div class="container">
    +        <h1>Server turned off</h1>
    +        <p>The servers are all currently turned off.</p>
    +        <p>Please try again later.</p>
    +        <p>Or email <a href="mailto:paul@nospam.buetow.org">paul@nospam.buetow.org</a>
    +           - so I can turn them back on for you!</p>
    +    </div>
    +</body>
    +</html>
     

    -

    ZFS Replication with zrepl


    -
    -Data replication is the cornerstone of high availability. While CARP handles IP failover (see later in this post), we need continuous data replication to ensure the backup server has current data when it becomes active. Without replication, failover would result in data loss or require shared storage (like iSCSI), which introduces a single point of failure.
    -
    -

    Understanding Replication Requirements


    -
    -Our storage system has different replication needs:
    +This approach provides several benefits:

      -
    • NFS data (/data/nfs/k3svolumes): Soon, it will contain active Kubernetes persistent volumes. Needs frequent replication (every minute) to minimise data loss during failover.
    • -
    • VM data (/zroot/bhyve/freebsd): Contains VM images that change less frequently. Can tolerate longer replication intervals (every 10 minutes).
    • +
    • Automatic detection: Health checks run continuously; no manual intervention needed
    • +
    • Instant fallback: When all f3s nodes go down, the next request automatically routes to localhost
    • +
    • Transparent recovery: When f3s comes back online, health checks pass and traffic resumes automatically
    • +
    • User experience: Visitors see a helpful message instead of connection errors
    • +
    • No DNS changes: The same hostnames work whether f3s is up or down

    -The 1-minute replication window is perfectly acceptable for my personal use cases. This isn't a high-frequency trading system or a real-time database—it's storage for personal projects, development work, and home lab experiments. Losing at most 1 minute of work in a disaster scenario is a reasonable trade-off for the reliability and simplicity of snapshot-based replication. Additionally, in the case of a "1 minute of data loss," I would likely still have the data available on the client side.
    +This fallback mechanism has proven invaluable during maintenance windows and unexpected outages, ensuring that users always get a response even when the home lab is offline.

    -Why use zrepl instead of HAST? While HAST (Highly Available Storage) is FreeBSD's native solution for high-availability storage and supports synchronous replication—thus eliminating the mentioned 1-minute window—I've chosen zrepl for several important reasons:
    +

    Exposing services via LAN ingress



    -
      -
    • HAST can cause ZFS corruption: HAST operates at the block level and doesn't understand ZFS's transactional semantics. During failover, in-flight transactions can lead to corrupted zpools. I've experienced this firsthand (I am confident I have configured something wrong) - the automatic failover would trigger while ZFS was still writing, resulting in an unmountable pool.
    • -
    • ZFS-aware replication: zrepl understands ZFS datasets and snapshots. It replicates at the dataset level, ensuring each snapshot is a consistent point-in-time copy. This is fundamentally safer than block-level replication.
    • -
    • Snapshot history: With zrepl, you get multiple recovery points (every minute for NFS data in our setup). If corruption occurs, you can roll back to any previous snapshot. HAST only gives you the current state.
    • -
    • Easier recovery: When something goes wrong with zrepl, you still have intact snapshots on both sides. With HAST, a corrupted primary often means a corrupted secondary as well.
    • -

    -FreeBSD HAST
    +In addition to external access through the OpenBSD relays, services can also be exposed on the local network using LAN-specific ingresses. This is useful for accessing services from within the home network without going through the internet, reducing latency and providing an alternative path if the external relays are unavailable.

    -

    Installing zrepl


    +The LAN ingress architecture leverages the existing FreeBSD CARP (Common Address Redundancy Protocol) failover infrastructure that's already in place for NFS-over-TLS (see Part 5). Instead of deploying MetalLB or another LoadBalancer implementation, we reuse the CARP virtual IP (192.168.1.138) by adding HTTP/HTTPS forwarding alongside the existing stunnel service on port 2323.

    -First, install zrepl on both hosts involved (we will replicate data from f0 to f1):
    +

    Architecture overview



    - -
    paul@f0:~ % doas pkg install -y zrepl
    -
    +The LAN access path differs from external access:

    -Then, we verify the pools and datasets on both hosts:
    +**External access (*.f3s.foo.zone):**
    +
    +Internet → OpenBSD relayd (TLS termination, Let's Encrypt)
    +        → WireGuard tunnel
    +        → k3s Traefik :80 (HTTP)
    +        → Service
    +

    - -
    # On f0
    -paul@f0:~ % doas zpool list
    -NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
    -zdata   928G  1.03M   928G        -         -     0%     0%  1.00x    ONLINE  -
    -zroot   472G  26.7G   445G        -         -     0%     5%  1.00x    ONLINE  -
    -
    -paul@f0:~ % doas zfs list -r zdata/enc
    -NAME        USED  AVAIL  REFER  MOUNTPOINT
    -zdata/enc   200K   899G   200K  /data/enc
    -
    -# On f1
    -paul@f1:~ % doas zpool list
    -NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
    -zdata   928G   956K   928G        -         -     0%     0%  1.00x    ONLINE  -
    -zroot   472G  11.7G   460G        -         -     0%     2%  1.00x    ONLINE  -
    -
    -paul@f1:~ % doas zfs list -r zdata/enc
    -NAME        USED  AVAIL  REFER  MOUNTPOINT
    -zdata/enc   200K   899G   200K  /data/enc
    +**LAN access (*.f3s.lan.foo.zone):**
    +
    +LAN → FreeBSD CARP VIP (192.168.1.138)
    +    → FreeBSD relayd (TCP forwarding)
    +    → k3s Traefik :443 (TLS termination, cert-manager)
    +    → Service
     

    -Since we have a WireGuard tunnel between f0 and f1, we'll use TCP transport over the secure tunnel instead of SSH. First, check the WireGuard IP addresses:
    +The key architectural decisions:

    - -
    # Check WireGuard interface IPs
    -paul@f0:~ % ifconfig wg0 | grep inet
    -	inet 192.168.2.130 netmask 0xffffff00
    -
    -paul@f1:~ % ifconfig wg0 | grep inet
    -	inet 192.168.2.131 netmask 0xffffff00
    -
    +
      +
    • FreeBSD relayd performs pure TCP forwarding (Layer 4) for ports 80 and 443, not TLS termination
    • +
    • Traefik inside k3s handles TLS offloading using certificates from cert-manager
    • +
    • Self-signed CA for LAN domains (no external dependencies)
    • +
    • CARP provides automatic failover between f0 and f1
    • +
    • No code changes to applications—just add a LAN ingress resource
    • +

    +

    Installing cert-manager



    -Let's create a dedicated dataset for NFS data that will be replicated:
    +First, install cert-manager to handle certificate lifecycle management for LAN services. The installation is automated with a Justfile:
    +
    +codeberg.org/snonux/conf/f3s/cert-manager

    -
    # Create the nfsdata dataset that will hold all data exposed via NFS
    -paul@f0:~ % doas zfs create zdata/enc/nfsdata
    +
    $ cd conf/f3s/cert-manager
    +$ just install
    +kubectl apply -f cert-manager.yaml
    +# ... cert-manager CRDs and resources created ...
    +kubectl apply -f self-signed-issuer.yaml
    +clusterissuer.cert-manager.io/selfsigned-issuer created
    +clusterissuer.cert-manager.io/selfsigned-ca-issuer created
    +kubectl apply -f ca-certificate.yaml
    +certificate.cert-manager.io/selfsigned-ca created
    +kubectl apply -f wildcard-certificate.yaml
    +certificate.cert-manager.io/f3s-lan-wildcard created
     

    -Afterwards, we create the zrepl configuration on f0:
    +This creates:
    +
    +
      +
    • A self-signed ClusterIssuer
    • +
    • A CA certificate (f3s-lan-ca) valid for 10 years
    • +
    • A CA-signed ClusterIssuer
    • +
    • A wildcard certificate (*.f3s.lan.foo.zone) valid for 90 days with automatic renewal
    • +

    +Verify the certificates:

    -
    paul@f0:~ % doas tee /usr/local/etc/zrepl/zrepl.yml <<'EOF'
    -global:
    -  logging:
    -    - type: stdout
    -      level: info
    -      format: human
    -
    -jobs:
    -  - name: f0_to_f1_nfsdata
    -    type: push
    -    connect:
    -      type: tcp
    -      address: "192.168.2.131:8888"
    -    filesystems:
    -      "zdata/enc/nfsdata": true
    -    send:
    -      encrypted: true
    -    snapshotting:
    -      type: periodic
    -      prefix: zrepl_
    -      interval: 1m
    -    pruning:
    -      keep_sender:
    -        - type: last_n
    -          count: 10
    -        - type: grid
    -          grid: 4x7d | 6x30d
    -          regex: "^zrepl_.*"
    -      keep_receiver:
    -        - type: last_n
    -          count: 10
    -        - type: grid
    -          grid: 4x7d | 6x30d
    -          regex: "^zrepl_.*"
    -
    -  - name: f0_to_f1_freebsd
    -    type: push
    -    connect:
    -      type: tcp
    -      address: "192.168.2.131:8888"
    -    filesystems:
    -      "zroot/bhyve/freebsd": true
    -    send:
    -      encrypted: true
    -    snapshotting:
    -      type: periodic
    -      prefix: zrepl_
    -      interval: 10m
    -    pruning:
    -      keep_sender:
    -        - type: last_n
    -          count: 10
    -        - type: grid
    -          grid: 4x7d
    -          regex: "^zrepl_.*"
    -      keep_receiver:
    -        - type: last_n
    -          count: 10
    -        - type: grid
    -          grid: 4x7d
    -          regex: "^zrepl_.*"
    -EOF
    +http://www.gnu.org/software/src-highlite -->
    +
    $ kubectl get certificate -n cert-manager
    +NAME               READY   SECRET                 AGE
    +f3s-lan-wildcard   True    f3s-lan-tls            5m
    +selfsigned-ca      True    selfsigned-ca-secret   5m
     

    - We're using two separate replication jobs with different intervals:
    -
    -
      -
    • f0_to_f1_nfsdata: Replicates NFS data every minute for faster failover recovery
    • -
    • f0_to_f1_freebsd: Replicates FreeBSD VM every ten minutes (less critical)
    • -

    -The FreeBSD VM is only used for development purposes, so it doesn't require as frequent replication as the NFS data. It's off-topic to this blog series, but it showcases how zrepl's flexibility in handling different datasets with varying replication needs.
    +The wildcard certificate (f3s-lan-tls) needs to be copied to any namespace that uses it:

    -Furthermore:
    + +
    $ kubectl get secret f3s-lan-tls -n cert-manager -o yaml | \
    +    sed 's/namespace: cert-manager/namespace: services/' | \
    +    kubectl apply -f -
    +

    -
      -
    • We're specifically replicating zdata/enc/nfsdata instead of the entire zdata/enc dataset. This dedicated dataset will contain all the data we later want to expose via NFS, keeping a clear separation between replicated NFS data and other local encrypted data.
    • -
    • We use send: encrypted: true to keep the replication stream encrypted. While WireGuard already encrypts in transit, this provides additional protection. For reduced CPU overhead, you could set encrypted: false since the tunnel is secure.
    • -

    -

    Configuring zrepl on f1 (sink)


    +

    Configuring FreeBSD relayd for LAN access



    -On f1 (the sink, meaning it's the node receiving the replication data), we configure zrepl to receive the data as follows:
    +On both FreeBSD hosts (f0, f1), install and configure relayd for TCP forwarding:

    -
    # First, create a dedicated sink dataset
    -paul@f1:~ % doas zfs create zdata/sink
    +
    paul@f0:~ % doas pkg install -y relayd
    +
    +
    +Create /usr/local/etc/relayd.conf:
    +
    +
    +# k3s nodes backend table
    +table <k3s_nodes> { 192.168.1.120 192.168.1.121 192.168.1.122 }
     
    -paul@f1:~ % doas tee /usr/local/etc/zrepl/zrepl.yml <<'EOF'
    -global:
    -  logging:
    -    - type: stdout
    -      level: info
    -      format: human
    +# TCP forwarding to Traefik (no TLS termination)
    +relay "lan_http" {
    +    listen on 192.168.1.138 port 80
    +    forward to <k3s_nodes> port 80 check tcp
    +}
     
    -jobs:
    -  - name: sink
    -    type: sink
    -    serve:
    -      type: tcp
    -      listen: "192.168.2.131:8888"
    -      clients:
    -        "192.168.2.130": "f0"
    -    recv:
    -      placeholder:
    -        encryption: inherit
    -    root_fs: "zdata/sink"
    -EOF
    +relay "lan_https" {
    +    listen on 192.168.1.138 port 443
    +    forward to <k3s_nodes> port 443 check tcp
    +}
     

    -

    Enabling and starting zrepl services


    +Note: The IP addresses 192.168.1.120-122 are the LAN IPs of the k3s nodes (r0, r1, r2), not their WireGuard IPs. FreeBSD relayd requires PF (Packet Filter) to be enabled. Create a minimal /etc/pf.conf:

    -We then enable and start zrepl on both hosts via:
    +
    +# Basic PF rules for relayd
    +set skip on lo0
    +pass in quick
    +pass out quick
    +
    +
    +Enable PF and relayd:

    -
    # On f0
    -paul@f0:~ % doas sysrc zrepl_enable=YES
    -zrepl_enable:  -> YES
    -paul@f0:~ % doas service `zrepl` start
    -Starting zrepl.
    -
    -# On f1
    -paul@f1:~ % doas sysrc zrepl_enable=YES
    -zrepl_enable:  -> YES
    -paul@f1:~ % doas service `zrepl` start
    -Starting zrepl.
    +
    paul@f0:~ % doas sysrc pf_enable=YES pflog_enable=YES relayd_enable=YES
    +paul@f0:~ % doas service pf start
    +paul@f0:~ % doas service pflog start
    +paul@f0:~ % doas service relayd start
     

    -To check the replication status, we run:
    +Verify relayd is listening on the CARP VIP:

    -
    # On f0, check `zrepl` status (use raw mode for non-tty)
    -paul@f0:~ % doas pkg install jq
    -paul@f0:~ % doas zrepl status --mode raw | grep -A2 "Replication" | jq .
    -"Replication":{"StartAt":"2025-07-01T22:31:48.712143123+03:00"...
    -
    -# Check if services are running
    -paul@f0:~ % doas service zrepl status
    -zrepl is running as pid 2649.
    -
    -paul@f1:~ % doas service zrepl status
    -zrepl is running as pid 2574.
    -
    -# Check for `zrepl` snapshots on source
    -paul@f0:~ % doas zfs list -t snapshot -r zdata/enc | grep zrepl
    -zdata/enc@zrepl_20250701_193148_000    0B      -   176K  -
    -
    -# On f1, verify the replicated datasets  
    -paul@f1:~ % doas zfs list -r zdata | grep f0
    -zdata/f0             576K   899G   200K  none
    -zdata/f0/zdata       376K   899G   200K  none
    -zdata/f0/zdata/enc   176K   899G   176K  none
    -
    -# Check replicated snapshots on f1
    -paul@f1:~ % doas zfs list -t snapshot -r zdata | grep zrepl
    -zdata/f0/zdata/enc@zrepl_20250701_193148_000     0B      -   176K  -
    -zdata/f0/zdata/enc@zrepl_20250701_194148_000     0B      -   176K  -
    -.
    -.
    -.
    +
    paul@f0:~ % doas sockstat -4 -l | grep 192.168.1.138
    +_relayd  relayd   2903  11  tcp4   192.168.1.138:80      *:*
    +_relayd  relayd   2903  12  tcp4   192.168.1.138:443     *:*
     

    -

    Monitoring replication


    +Repeat the same configuration on f1. Both hosts will run relayd listening on the CARP VIP, but only the CARP MASTER will respond to traffic. When failover occurs, the new MASTER takes over seamlessly.

    -You can monitor the replication progress with:
    +

    Adding LAN ingress to services


    +
    +To expose a service on the LAN, add a second Ingress resource to its Helm chart. Here's an example:
    +
    +
    +---
    +# LAN Ingress for f3s.lan.foo.zone
    +apiVersion: networking.k8s.io/v1
    +kind: Ingress
    +metadata:
    +  name: ingress-lan
    +  namespace: services
    +  annotations:
    +    spec.ingressClassName: traefik
    +    traefik.ingress.kubernetes.io/router.entrypoints: web,websecure
    +spec:
    +  tls:
    +    - hosts:
    +        - f3s.lan.foo.zone
    +      secretName: f3s-lan-tls
    +  rules:
    +    - host: f3s.lan.foo.zone
    +      http:
    +        paths:
    +          - path: /
    +            pathType: Prefix
    +            backend:
    +              service:
    +                name: service
    +                port:
    +                  number: 4533
    +
    +
    +Key points:
    +
    +
      +
    • Use web,websecure entrypoints (both HTTP and HTTPS)
    • +
    • Reference the f3s-lan-tls secret in the tls section
    • +
    • Use .f3s.lan.foo.zone subdomain pattern
    • +
    • Same backend service as the external ingress
    • +

    +Apply the ingress and test:

    -
    paul@f0:~ % doas zrepl status
    +
    $ kubectl apply -f ingress-lan.yaml
    +ingress.networking.k8s.io/ingress-lan created
    +
    +$ curl -k https://f3s.lan.foo.zone
    +HTTP/2 302 
    +location: /app/
     

    -zrepl status
    +

    Client-side DNS and CA setup



    -With this setup, both zdata/enc/nfsdata and zroot/bhyve/freebsd on f0 will be automatically replicated to f1 every 1 minute (or 10 minutes in the case of the FreeBSD VM), with encrypted snapshots preserved on both sides. The pruning policy ensures that we keep the last 10 snapshots while managing disk space efficiently.
    +To access LAN services, clients need DNS entries and must trust the self-signed CA.

    -The replicated data appears on f1 under zdata/sink/ with the source host and dataset hierarchy preserved:
    +Add DNS entries to /etc/hosts on your laptop:

    -
      -
    • zdata/enc/nfsdatazdata/sink/f0/zdata/enc/nfsdata
    • -
    • zroot/bhyve/freebsdzdata/sink/f0/zroot/bhyve/freebsd
    • -

    -This is by design - zrepl preserves the complete path from the source to ensure there are no conflicts when replicating from multiple sources.
    + +
    $ sudo tee -a /etc/hosts << 'EOF'
    +# f3s LAN services
    +192.168.1.138  f3s.lan.foo.zone
    +EOF
    +

    -

    Verifying replication after reboot


    +The CARP VIP 192.168.1.138 provides high availability—traffic automatically fails over to the backup host if the master goes down.

    -The zrepl service is configured to start automatically at boot. After rebooting both hosts:
    +Export the self-signed CA certificate:

    -
    paul@f0:~ % uptime
    -11:17PM  up 1 min, 0 users, load averages: 0.16, 0.06, 0.02
    -
    -paul@f0:~ % doas service `zrepl` status
    -zrepl is running as pid 2366.
    -
    -paul@f1:~ % doas service `zrepl` status
    -zrepl is running as pid 2309.
    -
    -# Check that new snapshots are being created and replicated
    -paul@f0:~ % doas zfs list -t snapshot | grep `zrepl` | tail -2
    -zdata/enc/nfsdata@zrepl_20250701_202530_000                0B      -   200K  -
    -zroot/bhyve/freebsd@zrepl_20250701_202530_000               0B      -  2.97G  -
    -.
    -.
    -.
    -
    -paul@f1:~ % doas zfs list -t snapshot -r zdata/sink | grep 202530
    -zdata/sink/f0/zdata/enc/nfsdata@zrepl_20250701_202530_000      0B      -   176K  -
    -zdata/sink/f0/zroot/bhyve/freebsd@zrepl_20250701_202530_000     0B      -  2.97G  -
    -.
    -.
    -.
    +
    $ kubectl get secret selfsigned-ca-secret -n cert-manager -o jsonpath='{.data.ca\.crt}' | \
    +    base64 -d > f3s-lan-ca.crt
    +
    +
    +Install the CA certificate on Linux (Fedora/Rocky):
    +
    + +
    $ sudo cp f3s-lan-ca.crt /etc/pki/ca-trust/source/anchors/
    +$ sudo update-ca-trust
     

    -The timestamps confirm that replication resumed automatically after the reboot, ensuring continuous data protection. We can also write a test file to the NFS data directory on f0 and verify whether it appears on f1 after a minute.
    +After trusting the CA, browsers will accept the LAN certificates without warnings.
    +
    +

    Scaling to other services


    +
    +The same pattern can be applied to any service. To add LAN access:
    +
    +1. Copy the f3s-lan-tls secret to the service's namespace (if not already there)
    +2. Add a LAN Ingress resource using the pattern above
    +3. Configure DNS: 192.168.1.138 service.f3s.lan.foo.zone
    +
    +No changes needed to:
    +
    +
      +
    • relayd configuration (forwards all traffic)
    • +
    • cert-manager (wildcard cert covers all *.f3s.lan.foo.zone)
    • +
    • CARP configuration (VIP shared by all services)
    • +

    +

    TLS offloaders summary


    +
    +The f3s infrastructure now has three distinct TLS offloaders:

    -

    Understanding Failover Limitations and Design Decisions


    +
      +
    • **OpenBSD relayd**: External internet traffic (*.f3s.foo.zone) using Let's Encrypt
    • +
    • **Traefik (k3s)**: LAN HTTPS traffic (*.f3s.lan.foo.zone) using cert-manager
    • +
    • **stunnel**: NFS-over-TLS (port 2323) using custom PKI
    • +

    +Each serves a different purpose with appropriate certificate management for its use case.

    -Our system intentionally fails over to a read-only copy of the replica in the event of the primary's failure. This is due to the nature of zrepl, which only replicates data in one direction. If we mount the data set on the sink node in read-write mode, it would cause the ZFS dataset to diverge from the original, and the replication would break. It can still be mounted read-write on the sink node in case of a genuine issue on the primary node, but that step is left intentionally manual. Therefore, we don't need to fix the replication later on manually.
    +

    Deploying the private Docker image registry



    -So in summary:
    +As not all Docker images I want to deploy are available on public Docker registries and as I also build some of them by myself, there is the need of a private registry.

    -
      -
    • Split-brain prevention: Automatic failover to a read-write copy can cause both nodes to become active simultaneously if network communication fails. This leads to data divergence that's extremely difficult to resolve.
    • -
    • False positive protection: Temporary network issues or high load can trigger unwanted failovers. Manual intervention ensures that failovers occur only when truly necessary.
    • -
    • Data integrity over availability: For storage systems, data consistency is paramount. A few minutes of downtime is preferable to data corruption in this specific use case.
    • -
    • Simplified recovery: With manual failover, you always know which dataset is authoritative, making recovery more straightforward.
    • -

    -

    Mounting the NFS datasets


    +All manifests for the f3s stack live in my configuration repository:

    -To make the NFS data accessible on both nodes, we need to mount it. On f0, this is straightforward:
    +codeberg.org/snonux/conf/f3s
    +
    +Within that repo, the f3s/registry/ directory contains the Helm chart, a Justfile, and a detailed README. Here's the condensed walkthrough I used to roll out the registry with Helm.
    +
    +

    Prepare the NFS-backed storage


    +
    +Create the directory that will hold the registry blobs on the NFS share (I ran this on r0, but any node that exports /data/nfs/k3svolumes works):

    -
    # On f0 - set mountpoint for the primary nfsdata
    -paul@f0:~ % doas zfs set mountpoint=/data/nfs zdata/enc/nfsdata
    -paul@f0:~ % doas mkdir -p /data/nfs
    -
    -# Verify it's mounted
    -paul@f0:~ % df -h /data/nfs
    -Filesystem           Size    Used   Avail Capacity  Mounted on
    -zdata/enc/nfsdata    899G    204K    899G     0%    /data/nfs
    +
    [root@r0 ~]# mkdir -p /data/nfs/k3svolumes/registry
     

    -On f1, we need to handle the encryption key and mount the standby copy:
    +

    Install (or upgrade) the chart


    +
    +Clone the repo (or pull the latest changes) on a workstation that has helm configured for the cluster, then deploy the chart. The Justfile wraps the commands, but the raw Helm invocation looks like this:

    -
    # On f1 - first check encryption status
    -paul@f1:~ % doas zfs get keystatus zdata/sink/f0/zdata/enc/nfsdata
    -NAME                             PROPERTY   VALUE        SOURCE
    -zdata/sink/f0/zdata/enc/nfsdata  keystatus  unavailable  -
    -
    -# Load the encryption key (using f0's key stored on the USB)
    -paul@f1:~ % doas zfs load-key -L file:///keys/f0.lan.buetow.org:zdata.key \
    -    zdata/sink/f0/zdata/enc/nfsdata
    -
    -# Set mountpoint and mount (same path as f0 for easier failover)
    -paul@f1:~ % doas mkdir -p /data/nfs
    -paul@f1:~ % doas zfs set mountpoint=/data/nfs zdata/sink/f0/zdata/enc/nfsdata
    -paul@f1:~ % doas zfs mount zdata/sink/f0/zdata/enc/nfsdata
    -
    -# Make it read-only to prevent accidental writes that would break replication
    -paul@f1:~ % doas zfs set readonly=on zdata/sink/f0/zdata/enc/nfsdata
    -
    -# Verify
    -paul@f1:~ % df -h /data/nfs
    -Filesystem                         Size    Used   Avail Capacity  Mounted on
    -zdata/sink/f0/zdata/enc/nfsdata    896G    204K    896G     0%    /data/nfs
    +
    $ git clone https://codeberg.org/snonux/conf/f3s.git
    +$ cd conf/f3s/examples/conf/f3s/registry
    +$ helm upgrade --install registry ./helm-chart --namespace infra --create-namespace
     

    -Note: The dataset is mounted at the same path (/data/nfs) on both hosts to simplify failover procedures. The dataset on f1 is set to readonly=on to prevent accidental modifications, which, as mentioned earlier, would break replication. If we did, replication from f0 to f1 would fail like this:
    -
    -cannot receive incremental stream: destination zdata/sink/f0/zdata/enc/nfsdata has been modified since most recent snapshot
    -
    -To fix a broken replication after accidental writes, we can do:
    +Helm creates the infra namespace if it does not exist, provisions a PersistentVolume/PersistentVolumeClaim pair that points at /data/nfs/k3svolumes/registry, and spins up a single registry pod exposed via the docker-registry-service NodePort (30001). Verify everything is up before continuing:

    -
    # Option 1: Rollback to the last common snapshot (loses local changes)
    -paul@f1:~ % doas zfs rollback zdata/sink/f0/zdata/enc/nfsdata@zrepl_20250701_204054_000
    +
    $ kubectl get pods --namespace infra
    +NAME                               READY   STATUS    RESTARTS      AGE
    +docker-registry-6bc9bb46bb-6grkr   1/1     Running   6 (53d ago)   54d
     
    -# Option 2: Make it read-only to prevent accidents again
    -paul@f1:~ % doas zfs set readonly=on zdata/sink/f0/zdata/enc/nfsdata
    +$ kubectl get svc docker-registry-service -n infra
    +NAME                      TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
    +docker-registry-service   NodePort   10.43.141.56   <none>        5000:30001/TCP   54d
     

    -And replication should work again!
    +

    Allow nodes and workstations to trust the registry



    -

    Troubleshooting: Files not appearing in replication


    +The registry listens on plain HTTP, so both Docker daemons on workstations and the k3s nodes need to treat it as an insecure registry. That's fine for my personal needs, as:

    -If you write files to /data/nfs/ on f0 but they don't appear on f1, check if the dataset is mounted on f0?
    +
      +
    • I don't store any secrets in the images
    • +
    • I access the registry this way only via my LAN
    • +
    • I may will change it later on...
    • +

    +On my Fedora workstation where I build images:

    -
    paul@f0:~ % doas zfs list -o name,mountpoint,mounted | grep nfsdata
    -zdata/enc/nfsdata                             /data/nfs             yes
    +
    $ cat <<"EOF" | sudo tee /etc/docker/daemon.json >/dev/null
    +{
    +  "insecure-registries": [
    +    "r0.lan.buetow.org:30001",
    +    "r1.lan.buetow.org:30001",
    +    "r2.lan.buetow.org:30001"
    +  ]
    +}
    +EOF
    +$ sudo systemctl restart docker
     

    -If it shows no, the dataset isn't mounted! This means files are being written to the root filesystem, not ZFS. Next, we should check whether the encryption key is loaded:
    +On each k3s node, make registry.lan.buetow.org resolve locally and point k3s at the NodePort:

    -
    paul@f0:~ % doas zfs get keystatus zdata/enc/nfsdata
    -NAME               PROPERTY   VALUE        SOURCE
    -zdata/enc/nfsdata  keystatus  available    -
    -# If "unavailable", load the key:
    -paul@f0:~ % doas zfs load-key -L file:///keys/f0.lan.buetow.org:zdata.key zdata/enc/nfsdata
    -paul@f0:~ % doas zfs mount zdata/enc/nfsdata
    +
    $ for node in r0 r1 r2; do
    +>   ssh root@$node "echo '127.0.0.1 registry.lan.buetow.org' >> /etc/hosts"
    +> done
    +
    +$ for node in r0 r1 r2; do
    +> ssh root@$node "cat <<'EOF' > /etc/rancher/k3s/registries.yaml
    +mirrors:
    +  "registry.lan.buetow.org:30001":
    +    endpoint:
    +      - "http://localhost:30001"
    +EOF
    +systemctl restart k3s"
    +> done
     

    -You can also verify that files are in the snapshot (not just the directory):
    +Thanks to the relayd configuration earlier in the post, the external hostnames (f3s.foo.zone, etc.) can already reach NodePort 30001, so publishing the registry later to the outside world is just a matter of wiring the DNS the same way as the ingress hosts. But by default, that's not enabled for now due to security reasons.
    +
    +

    Pushing and pulling images


    +
    +Tag any locally built image with one of the node IPs on port 30001, then push it. I usually target whichever node is closest to me, but any of the three will do:

    -
    paul@f0:~ % ls -la /data/nfs/.zfs/snapshot/zrepl_*/
    +
    $ docker tag my-app:latest r0.lan.buetow.org:30001/my-app:latest
    +$ docker push r0.lan.buetow.org:30001/my-app:latest
     

    -This issue commonly occurs after a reboot if the encryption keys aren't configured to load automatically.
    +Inside the cluster (or from other nodes), reference the image via the service name that Helm created:

    -

    Configuring automatic key loading on boot


    +
    +image: docker-registry-service:5000/my-app:latest
    +

    -To ensure all additional encrypted datasets are mounted automatically after reboot as well, we do:
    +You can test the pull path straight away:

    -
    # On f0 - configure all encrypted datasets
    -paul@f0:~ % doas sysrc zfskeys_enable=YES
    -zfskeys_enable: YES -> YES
    -paul@f0:~ % doas sysrc zfskeys_datasets="zdata/enc zdata/enc/nfsdata zroot/bhyve"
    -zfskeys_datasets:  -> zdata/enc zdata/enc/nfsdata zroot/bhyve
    -
    -# Set correct key locations for all datasets
    -paul@f0:~ % doas zfs set \
    -  keylocation=file:///keys/f0.lan.buetow.org:zdata.key zdata/enc/nfsdata
    -
    -# On f1 - include the replicated dataset
    -paul@f1:~ % doas sysrc zfskeys_enable=YES
    -zfskeys_enable: YES -> YES
    -paul@f1:~ % doas sysrc \
    -  zfskeys_datasets="zdata/enc zroot/bhyve zdata/sink/f0/zdata/enc/nfsdata"
    -zfskeys_datasets:  -> zdata/enc zroot/bhyve zdata/sink/f0/zdata/enc/nfsdata
    -
    -# Set key location for replicated dataset
    -paul@f1:~ % doas zfs set \
    -  keylocation=file:///keys/f0.lan.buetow.org:zdata.key zdata/sink/f0/zdata/enc/nfsdata
    +
    $ kubectl run registry-test \
    +>   --image=docker-registry-service:5000/my-app:latest \
    +>   --restart=Never -n test --command -- sleep 300
     

    -Important notes:
    +If the pod pulls successfully, the private registry is ready for use by the rest of the workloads. Note, that the commands above actually don't work, they are only for illustration purpose mentioned here.

    -
      -
    • Each encryption root needs its own key load entry
    • -
    • The replicated dataset on f1 uses the same encryption key as the source on f0
    • -
    • Always verify datasets are mounted after reboot with zfs list -o name,mounted
    • -
    • Critical: Always ensure the replicated dataset on f1 remains read-only with doas zfs set readonly=on zdata/sink/f0/zdata/enc/nfsdata
    • -

    -

    Troubleshooting: zrepl Replication Not Working


    +

    Example: Anki Sync Server from the private registry



    -If zrepl replication is not working, here's a systematic approach to diagnose and fix common issues:
    +One of the first workloads I migrated onto the k3s cluster after standing up the registry was my Anki sync server. The configuration repo ships everything in examples/conf/f3s/anki-sync-server/: a Docker build context plus a Helm chart that references the freshly built image.

    -

    Check if zrepl Services are Running


    +

    Build and push the image



    -First, verify that zrepl is running on both nodes:
    +The Dockerfile lives under docker-image/ and takes the Anki release to compile as an ANKI_VERSION build argument. The accompanying Justfile wraps the steps, but the raw commands look like this:

    -
    # Check service status on both f0 and f1
    -paul@f0:~ % doas service zrepl status
    -paul@f1:~ % doas service zrepl status
    -
    -# If not running, start the service
    -paul@f0:~ % doas service zrepl start
    -paul@f1:~ % doas service zrepl start
    +
    $ cd conf/f3s/examples/conf/f3s/anki-sync-server/docker-image
    +$ docker build -t anki-sync-server:25.07.5b --build-arg ANKI_VERSION=25.07.5 .
    +$ docker tag anki-sync-server:25.07.5b \
    +    r0.lan.buetow.org:30001/anki-sync-server:25.07.5b
    +$ docker push r0.lan.buetow.org:30001/anki-sync-server:25.07.5b
     

    -

    Check zrepl Status for Errors


    +Because every k3s node treats registry.lan.buetow.org:30001 as an insecure mirror (see above), the push succeeds regardless of which node answers. If you prefer the shortcut, just f3s in that directory performs the same build/tag/push sequence.

    -Use the status command to see detailed error information:
    +

    Create the Anki secret and storage on the cluster


    +
    +The Helm chart expects the services namespace, a pre-created NFS directory, and a Kubernetes secret that holds the credentials the upstream container understands:

    -
    # Check detailed status (use --mode raw for non-tty environments)
    -paul@f0:~ % doas zrepl status --mode raw
    -
    -# Look for error messages in the replication section
    -# Common errors include "no common snapshot" or connection failures
    +
    $ ssh root@r0 "mkdir -p /data/nfs/k3svolumes/anki-sync-server/anki_data"
    +$ kubectl create namespace services
    +$ kubectl create secret generic anki-sync-server-secret \
    +    --from-literal=SYNC_USER1='paul:SECRETPASSWORD' \
    +    -n services
     

    -

    Fixing "No Common Snapshot" Errors


    -
    -This is the most common replication issue, typically occurring when:
    +If the services namespace already exists, you can skip that line or let Kubernetes tell you the namespace is unchanged.

    -
      -
    • The receiver has existing snapshots that don't match the sender
    • -
    • Different snapshot naming schemes are in use
    • -
    • The receiver dataset was created independently
    • -

    -**Error message example:**
    -
    -no common snapshot or suitable bookmark between sender and receiver
    -
    +

    Deploy the chart



    -**Solution: Clean up conflicting snapshots on receiver**
    +With the prerequisites in place, install (or upgrade) the chart. It pins the container image to the tag we just pushed and mounts the NFS export via a PersistentVolume/PersistentVolumeClaim pair:

    -
    # First, identify the destination dataset on f1
    -paul@f1:~ % doas zfs list | grep sink
    -
    -# Check existing snapshots on the problematic dataset
    -paul@f1:~ % doas zfs list -t snapshot | grep nfsdata
    -
    -# If you see snapshots with different naming (e.g., @daily-*, @weekly-*)
    -# these conflict with zrepl's @zrepl_* snapshots
    -
    -# Destroy the entire destination dataset to allow clean replication
    -paul@f1:~ % doas zfs destroy -r zdata/sink/f0/zdata/enc/nfsdata
    -
    -# For VM replication, do the same for the freebsd dataset
    -paul@f1:~ % doas zfs destroy -r zdata/sink/f0/zroot/bhyve/freebsd
    -
    -# Wake up zrepl to start fresh replication
    -paul@f0:~ % doas zrepl signal wakeup f0_to_f1_nfsdata
    -paul@f0:~ % doas zrepl signal wakeup f0_to_f1_freebsd
    -
    -# Check replication status
    -paul@f0:~ % doas zrepl status --mode raw
    +
    $ cd ../helm-chart
    +$ helm upgrade --install anki-sync-server . -n services
    +
    +
    +Helm provisions everything referenced in the templates:
    +
    +
    +containers:
    +- name: anki-sync-server  image: registry.lan.buetow.org:30001/anki-sync-server:25.07.5b
    +  volumeMounts:
    +  - name: anki-data
    +    mountPath: /anki_data
     

    -**Verification that replication is working:**
    +Once the release comes up, verify that the pod pulled the freshly pushed image and that the ingress we configured earlier resolves through relayd just like the Apache example.

    -
    # Look for "stepping" state and active zfs send processes
    -paul@f0:~ % doas zrepl status --mode raw | grep -A5 "State.*stepping"
    -
    -# Check for active ZFS commands
    -paul@f0:~ % doas zrepl status --mode raw | grep -A10 "ZFSCmds.*Active"
    -
    -# Monitor progress - bytes replicated should be increasing
    -paul@f0:~ % doas zrepl status --mode raw | grep BytesReplicated
    +
    $ kubectl get pods -n services
    +$ kubectl get ingress anki-sync-server-ingress -n services
    +$ curl https://anki.f3s.foo.zone/health
     

    -

    Network Connectivity Issues


    +All of this runs solely on first-party images that now live in the private registry, proving the full flow from local bild to WireGuard-exposed service.

    -If replication fails to connect:
    +

    NFSv4 UID mapping for Postgres-backed (and other) apps


    +
    +NFSv4 only sees numeric user and group IDs, so the postgres account created inside the container must exist with the same UID/GID on the Kubernetes worker and on the FreeBSD NFS servers. Otherwise the pod starts with UID 999, the export sees it as an unknown anonymous user, and Postgres fails to initialise its data directory.
    +
    +To verify things line up end-to-end I run id in the container and on the hosts:

    -
    # Test connectivity between nodes
    -paul@f0:~ % nc -zv 192.168.2.131 8888
    +
    > ~ kubectl exec -n services deploy/miniflux-postgres -- id postgres
    +uid=999(postgres) gid=999(postgres) groups=999(postgres)
     
    -# Check if zrepl is listening on f1
    -paul@f1:~ % doas netstat -an | grep 8888
    +[root@r0 ~]# id postgres
    +uid=999(postgres) gid=999(postgres) groups=999(postgres)
     
    -# Verify WireGuard tunnel is working
    -paul@f0:~ % ping 192.168.2.131
    +paul@f0:~ % doas id postgres
    +uid=999(postgres) gid=99(postgres) groups=999(postgres)
     

    -

    Encryption Key Issues


    -
    -If encrypted replication fails:
    +The Rocky Linux workers get their matching user with plain useradd/groupadd (repeat on r0, r1, and r2):

    -
    # Verify encryption keys are available on both nodes
    -paul@f0:~ % doas zfs get keystatus zdata/enc/nfsdata
    -paul@f1:~ % doas zfs get keystatus zdata/sink/f0/zdata/enc/nfsdata
    -
    -# Load keys if unavailable
    -paul@f1:~ % doas zfs load-key -L file:///keys/f0.lan.buetow.org:zdata.key \
    -    zdata/sink/f0/zdata/enc/nfsdata
    +
    [root@r0 ~]# groupadd --gid 999 postgres
    +[root@r0 ~]# useradd --uid 999 --gid 999 \
    +                --home-dir /var/lib/pgsql \
    +                --shell /sbin/nologin postgres
     

    -

    Monitoring Ongoing Replication


    -
    -After fixing issues, monitor replication health:
    +FreeBSD uses pw, so on each NFS server (f0, f1, f2) I created the same account and disabled shell access:

    -
    # Monitor replication progress (run repeatedly to check status)
    -paul@f0:~ % doas zrepl status --mode raw | grep -A10 BytesReplicated
    -
    -# Or install watch from ports and use it
    -paul@f0:~ % doas pkg install watch
    -paul@f0:~ % watch -n 5 'doas zrepl status --mode raw | grep -A10 BytesReplicated'
    -
    -# Check for new snapshots being created
    -paul@f0:~ % doas zfs list -t snapshot | grep zrepl | tail -5
    -
    -# Verify snapshots appear on receiver
    -paul@f1:~ % doas zfs list -t snapshot -r zdata/sink | grep zrepl | tail -5
    +
    paul@f0:~ % doas pw groupadd postgres -g 999
    +paul@f0:~ % doas pw useradd postgres -u 999 -g postgres \
    +                -d /var/db/postgres -s /usr/sbin/nologin
     

    -This troubleshooting process resolves the most common zrepl issues and ensures continuous data replication between your storage nodes.
    -
    -

    CARP (Common Address Redundancy Protocol)


    -
    -High availability is crucial for storage systems. If the storage server goes down, all NFS clients (which will also be Kubernetes pods later on in this series) lose access to their persistent data. CARP provides a solution by creating a virtual IP address that automatically migrates to a different server during failures. This means that clients point to that VIP for NFS mounts and are always contacting the current primary node.
    -
    -

    How CARP Works


    -
    -In our case, CARP allows two hosts (f0 and f1) to share a virtual IP address (VIP). The hosts communicate using multicast to elect a MASTER, while the other remain as BACKUP. When the MASTER fails, the BACKUP automatically promotes itself, and the VIP is reassigned to the new MASTER. This happens within seconds.
    -
    -Key benefits for our storage system:
    -
    -
      -
    • Automatic failover: No manual intervention is required for basic failures, although there are a few limitations. The backup will have read-only access to the available data by default, as we have already learned.
    • -
    • Transparent to clients: Pods continue using the same IP address
    • -
    • Works with stunnel: Behind the VIP, there will be a stunnel process running, which ensures encrypted connections follow the active server.
    • -

    -FreeBSD CARP
    -Stunnel
    -
    -

    Configuring CARP


    +Once the UID/GID exist everywhere, the Miniflux chart in examples/conf/f3s/miniflux deploys cleanly. The chart provisions both the application and its bundled Postgres database, mounts the exported directory, and builds the DSN at runtime. The important bits live in helm-chart/templates/persistent-volumes.yaml and deployment.yaml:

    -First, we add the CARP configuration to /etc/rc.conf on both f0 and f1:
    +
    +# Persistent volume lives on the NFS export
    +hostPath:
    +  path: /data/nfs/k3svolumes/miniflux/data
    +  type: Directory
    +...
    +containers:
    +- name: miniflux-postgres
    +  image: postgres:17
    +  volumeMounts:
    +  - name: miniflux-postgres-data
    +    mountPath: /var/lib/postgresql/data
    +

    -Update: Sun 4 Jan 00:17:00 EET 2026 - Added advskew 100 to f1 so f0 always wins CARP elections when it comes back online after a reboot.
    +Follow the README beside the chart to create the secrets and the target directory:

    -
    # On f0 - The virtual IP 192.168.1.138 will float between f0 and f1
    -ifconfig_re0_alias0="inet vhid 1 pass testpass alias 192.168.1.138/32"
    -
    -# On f1 - Higher advskew means lower priority, so f0 wins elections
    -ifconfig_re0_alias0="inet vhid 1 advskew 100 pass testpass alias 192.168.1.138/32"
    +
    $ cd examples/conf/f3s/miniflux/helm-chart
    +$ mkdir -p /data/nfs/k3svolumes/miniflux/data
    +$ kubectl create secret generic miniflux-db-password \
    +    --from-literal=fluxdb_password='YOUR_PASSWORD' -n services
    +$ kubectl create secret generic miniflux-admin-password \
    +    --from-literal=admin_password='YOUR_ADMIN_PASSWORD' -n services
    +$ helm upgrade --install miniflux . -n services --create-namespace
     

    -Whereas:
    -
    -
      -
    • vhid 1: Virtual Host ID - must match on all CARP members
    • -
    • advskew: Advertisement skew - higher value means lower priority (f1 uses 100, f0 uses default 0)
    • -
    • pass testpass: Password for CARP authentication (if you follow this, use a different password!)
    • -
    • alias 192.168.1.138/32: The virtual IP address with a /32 netmask
    • -

    -Next, update /etc/hosts on all nodes (f0, f1, f2, r0, r1, r2) to resolve the VIP hostname:
    +And to verify it's all up:

    -192.168.2.138 f3s-storage-ha f3s-storage-ha.wg0 f3s-storage-ha.wg0.wan.buetow.org
    -fd42:beef:cafe:2::138 f3s-storage-ha f3s-storage-ha.wg0 f3s-storage-ha.wg0.wan.buetow.org
    +$ kubectl get all --namespace=services | grep mini
    +pod/miniflux-postgres-556444cb8d-xvv2p   1/1     Running   0             54d
    +pod/miniflux-server-85d7c64664-stmt9     1/1     Running   0             54d
    +service/miniflux                   ClusterIP   10.43.47.80     <none>        8080/TCP             54d
    +service/miniflux-postgres          ClusterIP   10.43.139.50    <none>        5432/TCP             54d
    +deployment.apps/miniflux-postgres   1/1     1            1           54d
    +deployment.apps/miniflux-server     1/1     1            1           54d
    +replicaset.apps/miniflux-postgres-556444cb8d   1         1         1       54d
    +replicaset.apps/miniflux-server-85d7c64664     1         1         1       54d
     

    -This allows clients to connect to f3s-storage-ha regardless of which physical server is currently the MASTER.
    +Or from the repository root I simply run:

    -

    CARP State Change Notifications


    +

    Helm charts currently in service



    -To correctly manage services during failover, we need to detect CARP state changes. FreeBSD's devd system can notify us when CARP transitions between MASTER and BACKUP states.
    +These are the charts that already live under examples/conf/f3s and run on the cluster today (and I'll keep adding more as new services graduate into production):

    -Add this to /etc/devd.conf on both f0 and f1:
    +
      +
    • anki-sync-server — custom-built image served from the private registry, stores decks on /data/nfs/k3svolumes/anki-sync-server/anki_data, and authenticates through the anki-sync-server-secret.
    • +
    • koreade-sync-server — Sync server for KOReader.
    • +
    • audiobookshelf — media streaming stack with three hostPath mounts (config, audiobooks, podcasts) so the library survives node rebuilds.
    • +
    • example-apache — minimal HTTP service I use for smoke-testing ingress and relayd rules.
    • +
    • example-apache-volume-claim — Apache plus PVC variant that exercises NFS-backed storage for walkthroughs like the one earlier in this post.
    • +
    • miniflux — the Postgres-backed feed reader described above, wired for NFSv4 UID mapping and per-release secrets.
    • +
    • opodsync — podsync deployment with its data directory under /data/nfs/k3svolumes/opodsync/data.
    • +
    • radicale — CalDAV/CardDAV (and gpodder) backend with separate collections and auth volumes.
    • +
    • registry — the plain-HTTP Docker registry exposed on NodePort 30001 and mirrored internally as registry.lan.buetow.org:30001.
    • +
    • syncthing — two-volume setup for config and shared data, fronted by the syncthing.f3s.foo.zone ingress.
    • +
    • wallabag — read-it-later service with persistent data and images directories on the NFS export.
    • +

    +I hope you enjoyed this walkthrough. Read the next post of this series:

    - -
    paul@f0:~ % cat <<END | doas tee -a /etc/devd.conf
    -notify 0 {
    -        match "system"          "CARP";
    -        match "subsystem"       "[0-9]+@[0-9a-z.]+";
    -        match "type"            "(MASTER|BACKUP)";
    -        action "/usr/local/bin/carpcontrol.sh $subsystem $type";
    -};
    -END
    -
    -paul@f0:~ % doas service devd restart
    -
    +f3s: Kubernetes with FreeBSD - Part 8: Observability

    -Next, we create the CARP control script that will restart stunnel when the CARP state changes:
    +Other *BSD-related posts:

    -Update: Fixed the script at Sat 3 Jan 23:55:11 EET 2026 - changed $1 to $2 because devd passes $subsystem $type, so the state is in the second argument.
    +2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability
    +2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments (You are currently reading this)
    +2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    +2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    +2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    +2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    +2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    +2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    +2024-04-01 KISS high-availability with OpenBSD
    +2024-01-13 One reason why I love OpenBSD
    +2022-10-30 Installing DTail on OpenBSD
    +2022-07-30 Let's Encrypt with OpenBSD and Rex
    +2016-04-09 Jails and ZFS with Puppet on FreeBSD

    - -
    paul@f0:~ % doas tee /usr/local/bin/carpcontrol.sh <<'EOF'
    -#!/bin/sh
    -# CARP state change control script
    -
    -case "$2" in
    -    MASTER)
    -        logger "CARP state changed to MASTER, starting services"
    -        ;;
    -    BACKUP)
    -        logger "CARP state changed to BACKUP, stopping services"
    -        ;;
    -    *)
    -        logger "CARP state changed to $2 (unhandled)"
    -        ;;
    -esac
    -EOF
    -
    -paul@f0:~ % doas chmod +x /usr/local/bin/carpcontrol.sh
    -
    -# Copy the same script to f1
    -paul@f0:~ % scp /usr/local/bin/carpcontrol.sh f1:/tmp/
    -paul@f1:~ % doas mv /tmp/carpcontrol.sh /usr/local/bin/
    -paul@f1:~ % doas chmod +x /usr/local/bin/carpcontrol.sh
    -
    +E-Mail your comments to paul@nospam.buetow.org

    -Note that carpcontrol.sh doesn't do anything useful yet. We will provide more details (including starting and stopping services upon failover) later in this blog post.
    +Back to the main site
    +
    +
    +
    + + Bash Golf Part 4 + + gemini://foo.zone/gemfeed/2025-09-14-bash-golf-part-4.gmi + 2025-09-13T12:04:03+03:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + This is the fourth blog post about my Bash Golf series. This series is random Bash tips, tricks, and weirdnesses I have encountered over time. + +
    +

    Bash Golf Part 4



    -To enable CARP in /boot/loader.conf, run:
    +Published at 2025-09-13T12:04:03+03:00

    - -
    paul@f0:~ % echo 'carp_load="YES"' | doas tee -a /boot/loader.conf
    -carp_load="YES"
    -paul@f1:~ % echo 'carp_load="YES"' | doas tee -a /boot/loader.conf  
    -carp_load="YES"
    +This is the fourth blog post about my Bash Golf series. This series is random Bash tips, tricks, and weirdnesses I have encountered over time. 
    +
    +2021-11-29 Bash Golf Part 1
    +2022-01-01 Bash Golf Part 2
    +2023-12-10 Bash Golf Part 3
    +2025-09-14 Bash Golf Part 4 (You are currently reading this)
    +
    +
    +    '\       '\        '\        '\                   .  .        |>18>>
    +      \        \         \         \              .         ' .   |
    +     O>>      O>>       O>>       O>>         .                 'o |
    +      \       .\. ..    .\. ..    .\. ..   .                      |
    +      /\    .  /\     .  /\     .  /\    . .                      |
    +     / /   .  / /  .'.  / /  .'.  / /  .'    .                    |
    +jgs^^^^^^^`^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    +                        Art by Joan Stark, mod. by Paul Buetow
     

    -Then reboot both hosts or run doas kldload carp to load the module immediately.
    +

    Table of Contents



    -

    NFS Server Configuration


    +
    +

    Split pipelines with tee + process substitution



    -With ZFS replication in place, we can now set up NFS servers on both f0 and f1 to export the replicated data. Since native NFS over TLS (RFC 9289) has compatibility issues between Linux and FreeBSD (not digging into the details here, but I couldn't get it to work), we'll use stunnel to provide encryption.
    +Sometimes you want to fan out one stream to multiple consumers and still continue the original pipeline. tee plus process substitution does exactly that:

    -

    Setting up NFS on f0 (Primary)


    +
    +somecommand \
    +    | tee >(command1) >(command2) \
    +    | command3
    +

    -First, enable the NFS services in rc.conf:
    +All of command1, command2, and command3 see the output of somecommand. Example:

    -
    paul@f0:~ % doas sysrc nfs_server_enable=YES
    -nfs_server_enable: YES -> YES
    -paul@f0:~ % doas sysrc nfsv4_server_enable=YES
    -nfsv4_server_enable: YES -> YES
    -paul@f0:~ % doas sysrc nfsuserd_enable=YES
    -nfsuserd_enable: YES -> YES
    -paul@f0:~ % doas sysrc nfsuserd_flags="-domain lan.buetow.org"
    -nfsuserd_flags: "" -> "-domain lan.buetow.org"
    -paul@f0:~ % doas sysrc mountd_enable=YES
    -mountd_enable: NO -> YES
    -paul@f0:~ % doas sysrc rpcbind_enable=YES
    -rpcbind_enable: NO -> YES
    +
    printf 'a\nb\n' \
    +    | tee >(sed 's/.*/X:&/; s/$/ :c1/') >(tr a-z A-Z | sed 's/$/ :c2/') \
    +    | sed 's/$/ :c3/'
     

    -Update: 08.08.2025: I've added the domain to nfsuserd_flags
    -
    -And we also create a dedicated directory for Kubernetes volumes:
    +Output:

    - -
    # First, ensure the dataset is mounted
    -paul@f0:~ % doas zfs get mounted zdata/enc/nfsdata
    -NAME               PROPERTY  VALUE    SOURCE
    -zdata/enc/nfsdata  mounted   yes      -
    -
    -# Create the k3svolumes directory
    -paul@f0:~ % doas mkdir -p /data/nfs/k3svolumes
    -paul@f0:~ % doas chmod 755 /data/nfs/k3svolumes
    +
    +a :c3
    +b :c3
    +A :c2 :c3
    +B :c2 :c3
    +X:a :c1 :c3
    +X:b :c1 :c3
     

    -We also create the /etc/exports file. Since we're using stunnel for encryption, ALL clients must connect through stunnel, which appears as localhost (127.0.0.1) to the NFS server:
    +This relies on Bash process substitution (>(...)). Make sure your shell is Bash and not a POSIX /bin/sh.
    +
    +Example (fails under dash/POSIX sh):

    -
    paul@f0:~ % doas tee /etc/exports <<'EOF'
    -V4: /data/nfs -sec=sys
    -/data/nfs -alldirs -maproot=root -network 127.0.0.1 -mask 255.255.255.255
    -EOF
    +
    /bin/sh -c 'echo hi | tee >(cat)'
    +# /bin/sh: 1: Syntax error: "(" unexpected
     

    -The exports configuration:
    +Combine with set -o pipefail if failures in side branches should fail the whole pipeline.

    -
      -
    • V4: /data/nfs -sec=sys: Sets the NFSv4 root directory to /data/nfs
    • -
    • -maproot=root: Maps root user from client to root on server
    • -
    • -network 127.0.0.1: Only accepts connections from localhost (stunnel)
    • -

    -To start the NFS services, we run:
    +Example:

    -
    paul@f0:~ % doas service rpcbind start
    -Starting rpcbind.
    -paul@f0:~ % doas service mountd start
    -Starting mountd.
    -paul@f0:~ % doas service nfsd start
    -Starting nfsd.
    -paul@f0:~ % doas service nfsuserd start
    -Starting nfsuserd.
    -
    -
    -

    Configuring Stunnel for NFS Encryption with CARP Failover


    -
    -Using stunnel with client certificate authentication for NFS encryption provides several advantages:
    -
    -
      -
    • Compatibility: Works with any NFS version and between different operating systems
    • -
    • Strong encryption: Uses TLS/SSL with configurable cipher suites
    • -
    • Transparent: Applications don't need modification, encryption happens at the transport layer
    • -
    • Performance: Minimal overhead (~2% in benchmarks)
    • -
    • Flexibility: Can encrypt any TCP-based protocol, not just NFS
    • -
    • Strong Authentication: Client certificates provide cryptographic proof of identity
    • -
    • Access Control: Only clients with valid certificates signed by your CA can connect
    • -
    • Certificate Revocation: You can revoke access by removing certificates from the CA
    • -

    -Stunnel integrates seamlessly with our CARP setup:
    -
    -
    -                    CARP VIP (192.168.1.138)
    -                           |
    -    f0 (MASTER) ←---------→|←---------→ f1 (BACKUP)
    -    stunnel:2323           |           stunnel:stopped
    -    nfsd:2049              |           nfsd:stopped
    -                           |
    -                    Clients connect here
    +
    set -o pipefail
    +printf 'ok\n' | tee >(false) | cat >/dev/null
    +echo $?   # 1 because a side branch failed
     

    -The key insight is that stunnel binds to the CARP VIP. When CARP fails over, the VIP is moved to the new master, and stunnel starts there automatically. Clients maintain their connection to the same IP throughout.
    -
    -

    Creating a Certificate Authority for Client Authentication


    +Further reading:

    -First, create a CA to sign both server and client certificates:
    +Splitting pipelines with tee

    - -
    # On f0 - Create CA
    -paul@f0:~ % doas mkdir -p /usr/local/etc/stunnel/ca
    -paul@f0:~ % cd /usr/local/etc/stunnel/ca
    -paul@f0:~ % doas openssl genrsa -out ca-key.pem 4096
    -paul@f0:~ % doas openssl req -new -x509 -days 3650 -key ca-key.pem -out ca-cert.pem \
    -  -subj '/C=US/ST=State/L=City/O=F3S Storage/CN=F3S Stunnel CA'
    -
    -# Create server certificate
    -paul@f0:~ % cd /usr/local/etc/stunnel
    -paul@f0:~ % doas openssl genrsa -out server-key.pem 4096
    -paul@f0:~ % doas openssl req -new -key server-key.pem -out server.csr \
    -  -subj '/C=US/ST=State/L=City/O=F3S Storage/CN=f3s-storage-ha.lan'
    -paul@f0:~ % doas openssl x509 -req -days 3650 -in server.csr -CA ca/ca-cert.pem \
    -  -CAkey ca/ca-key.pem -CAcreateserial -out server-cert.pem
    -
    -# Create client certificates for authorised clients
    -paul@f0:~ % cd /usr/local/etc/stunnel/ca
    -paul@f0:~ % doas sh -c 'for client in r0 r1 r2 earth; do 
    -  openssl genrsa -out ${client}-key.pem 4096
    -  openssl req -new -key ${client}-key.pem -out ${client}.csr \
    -    -subj "/C=US/ST=State/L=City/O=F3S Storage/CN=${client}.lan.buetow.org"
    -  openssl x509 -req -days 3650 -in ${client}.csr -CA ca-cert.pem \
    -    -CAkey ca-key.pem -CAcreateserial -out ${client}-cert.pem
    -  # Combine cert and key into a single file for stunnel client
    -  cat ${client}-cert.pem ${client}-key.pem > ${client}-stunnel.pem
    -done'
    -
    +

    Heredocs for remote sessions (and their gotchas)



    -

    Install and Configure Stunnel on f0


    +Heredocs are great to send multiple commands over SSH in a readable way:

    -
    # Install stunnel
    -paul@f0:~ % doas pkg install -y stunnel
    -
    -# Configure stunnel server with client certificate authentication
    -paul@f0:~ % doas tee /usr/local/etc/stunnel/stunnel.conf <<'EOF'
    -cert = /usr/local/etc/stunnel/server-cert.pem
    -key = /usr/local/etc/stunnel/server-key.pem
    -
    -setuid = stunnel
    -setgid = stunnel
    -
    -[nfs-tls]
    -accept = 192.168.1.138:2323
    -connect = 127.0.0.1:2049
    -CAfile = /usr/local/etc/stunnel/ca/ca-cert.pem
    -verify = 2
    -requireCert = yes
    +
    ssh "$SSH_USER@$SSH_HOST" <<EOF
    +    # Go to the work directory
    +    cd "$WORK_DIR"
    +  
    +    # Make a git pull
    +    git pull
    +  
    +    # Export environment variables required for the service to run
    +    export AUTH_TOKEN="$APP_AUTH_TOKEN"
    +  
    +    # Start the service
    +    docker compose up -d --build
     EOF
    -
    -# Enable and start stunnel
    -paul@f0:~ % doas sysrc stunnel_enable=YES
    -stunnel_enable:  -> YES
    -paul@f0:~ % doas service stunnel start
    -Starting stunnel.
    -
    -# Restart stunnel to apply the CARP VIP binding
    -paul@f0:~ % doas service stunnel restart
    -Stopping stunnel.
    -Starting stunnel.
     

    -The configuration includes:
    +Tips:

    -
      -
    • verify = 2: Verify client certificate and fail if not provided
    • -
    • requireCert = yes: Client must present a valid certificate
    • -
    • CAfile: Path to the CA certificate that signed the client certificates
    • -

    -

    Setting up NFS on f1 (Standby)


    +Quoting the delimiter changes interpolation. Use <<'EOF' to avoid local expansion and send the content literally.

    -Repeat the same configuration on f1:
    +Example:

    -
    paul@f1:~ % doas sysrc nfs_server_enable=YES
    -nfs_server_enable: NO -> YES
    -paul@f1:~ % doas sysrc nfsv4_server_enable=YES
    -nfsv4_server_enable: NO -> YES
    -paul@f1:~ % doas sysrc nfsuserd_enable=YES
    -nfsuserd_enable: NO -> YES
    -paul@f1:~ % doas sysrc mountd_enable=YES
    -mountd_enable: NO -> YES
    -paul@f1:~ % doas sysrc rpcbind_enable=YES
    -rpcbind_enable: NO -> YES
    -
    -paul@f1:~ % doas tee /etc/exports <<'EOF'
    -V4: /data/nfs -sec=sys
    -/data/nfs -alldirs -maproot=root -network 127.0.0.1 -mask 255.255.255.255
    -EOF
    -
    -paul@f1:~ % doas service rpcbind start
    -Starting rpcbind.
    -paul@f1:~ % doas service mountd start
    -Starting mountd.
    -paul@f1:~ % doas service nfsd start
    -Starting nfsd.
    -paul@f1:~ % doas service nfsuserd start
    -Starting nfsuserd.
    +http://www.gnu.org/software/src-highlite -->
    +
    FOO=bar
    +cat <<'EOF'
    +$FOO is not expanded here
    +EOF
     

    -And to configure stunnel on f1, we run:
    +Prefer explicit quoting for variables (as above) to avoid surprises. Example (spaces preserved only when quoted):

    -
    # Install stunnel
    -paul@f1:~ % doas pkg install -y stunnel
    -
    -# Copy certificates from f0
    -paul@f0:~ % doas tar -cf /tmp/stunnel-certs.tar \
    -  -C /usr/local/etc/stunnel server-cert.pem server-key.pem ca
    -paul@f0:~ % scp /tmp/stunnel-certs.tar f1:/tmp/
    -
    -paul@f1:~ % cd /usr/local/etc/stunnel && doas tar -xf /tmp/stunnel-certs.tar
    -
    -# Configure stunnel server on f1 with client certificate authentication
    -paul@f1:~ % doas tee /usr/local/etc/stunnel/stunnel.conf <<'EOF'
    -cert = /usr/local/etc/stunnel/server-cert.pem
    -key = /usr/local/etc/stunnel/server-key.pem
    -
    -setuid = stunnel
    -setgid = stunnel
    -
    -[nfs-tls]
    -accept = 192.168.1.138:2323
    -connect = 127.0.0.1:2049
    -CAfile = /usr/local/etc/stunnel/ca/ca-cert.pem
    -verify = 2
    -requireCert = yes
    +
    WORK_DIR="/tmp/my work"
    +ssh host <<EOF
    +    cd $WORK_DIR      # may break if unquoted
    +    cd "$WORK_DIR"   # safe
     EOF
    -
    -# Enable and start stunnel
    -paul@f1:~ % doas sysrc stunnel_enable=YES
    -stunnel_enable:  -> YES
    -paul@f1:~ % doas service stunnel start
    -Starting stunnel.
    -
    -# Restart stunnel to apply the CARP VIP binding
    -paul@f1:~ % doas service stunnel restart
    -Stopping stunnel.
    -Starting stunnel.
     

    -

    CARP Control Script for Clean Failover


    -
    -With stunnel configured to bind to the CARP VIP (192.168.1.138), only the server that is currently the CARP MASTER will accept stunnel connections. This provides automatic failover for encrypted NFS:
    -
    -
      -
    • When f0 is CARP MASTER: stunnel on f0 accepts connections on 192.168.1.138:2323
    • -
    • When f1 becomes CARP MASTER: stunnel on f1 starts accepting connections on 192.168.1.138:2323
    • -
    • The backup server's stunnel process will fail to bind to the VIP and won't accept connections
    • -

    -This ensures that clients always connect to the active NFS server through the CARP VIP. To ensure clean failover behaviour and prevent stale file handles, we'll update our carpcontrol.sh script so that:
    +Consider set -euo pipefail at the top of the remote block for stricter error handling. Example:

    -
      -
    • Stops NFS services on BACKUP nodes (preventing split-brain scenarios)
    • -
    • Starts NFS services only on the MASTER node
    • -
    • Manages stunnel binding to the CARP VIP
    • -

    -This approach ensures clients can only connect to the active server, eliminating stale handles from the inactive server:
    + +
    ssh host <<'EOF'
    +    set -euo pipefail
    +    false   # causes immediate failure
    +    echo never
    +EOF
    +

    -Update: Fixed the script at Sat 3 Jan 23:55:11 EET 2026 - changed $1 to $2 because devd passes $subsystem $type, so the state is in the second argument.
    +Indent-friendly variant: use a dash to strip leading tabs in the body:

    -
    # Create CARP control script on both f0 and f1
    -paul@f0:~ % doas tee /usr/local/bin/carpcontrol.sh <<'EOF'
    -#!/bin/sh
    -# CARP state change control script
    -
    -HOSTNAME=`hostname`
    -
    -if [ ! -f /data/nfs/nfs.DO_NOT_REMOVE ]; then
    -    logger '/data/nfs not mounted, mounting it now!'
    -    if [ "$HOSTNAME" = 'f0.lan.buetow.org' ]; then
    -        zfs load-key -L file:///keys/f0.lan.buetow.org:zdata.key zdata/enc/nfsdata
    -        zfs set mountpoint=/data/nfs zdata/enc/nfsdata
    -    else
    -        zfs load-key -L file:///keys/f0.lan.buetow.org:zdata.key zdata/sink/f0/zdata/enc/nfsdata
    -        zfs set mountpoint=/data/nfs zdata/sink/f0/zdata/enc/nfsdata
    -        zfs mount zdata/sink/f0/zdata/enc/nfsdata
    -        zfs set readonly=on zdata/sink/f0/zdata/enc/nfsdata
    -    fi
    -    service nfsd stop 2>&1
    -    service mountd stop 2>&1
    -fi
    -
    -
    -case "$2" in
    -    MASTER)
    -        logger "CARP state changed to MASTER, starting services"
    -        service rpcbind start >/dev/null 2>&1
    -        service mountd start >/dev/null 2>&1
    -        service nfsd start >/dev/null 2>&1
    -        service nfsuserd start >/dev/null 2>&1
    -        service stunnel restart >/dev/null 2>&1
    -        logger "CARP MASTER: NFS and stunnel services started"
    -        ;;
    -    BACKUP)
    -        logger "CARP state changed to BACKUP, stopping services"
    -        service stunnel stop >/dev/null 2>&1
    -        service nfsd stop >/dev/null 2>&1
    -        service mountd stop >/dev/null 2>&1
    -        service nfsuserd stop >/dev/null 2>&1
    -        logger "CARP BACKUP: NFS and stunnel services stopped"
    -        ;;
    -    *)
    -        logger "CARP state changed to $2 (unhandled)"
    -        ;;
    -esac
    +
    cat <<-EOF > script.sh
    +	#!/usr/bin/env bash
    +	echo "tab-indented content is dedented"
     EOF
    -
    -paul@f0:~ % doas chmod +x /usr/local/bin/carpcontrol.sh
     

    -

    CARP Management Script


    +Further reading:

    -To simplify CARP state management and failover testing, create this helper script on both f0 and f1:
    +Heredoc headaches and fixes
    +
    +

    Namespacing and dynamic dispatch with ::


    +
    +You can emulate simple namespacing by encoding hierarchy in function names. One neat pattern is pseudo-inheritance via a tiny super helper that maps pkg::lang::action to a pkg::base::action default.

    -
    # Create the CARP management script
    -paul@f0:~ % doas tee /usr/local/bin/carp <<'EOF'
    -#!/bin/sh
    -# CARP state management script
    -# Usage: carp [master|backup|auto-failback enable|auto-failback disable]
    -# Without arguments: shows current state
    -
    -# Find the interface with CARP configured
    -CARP_IF=$(ifconfig -l | xargs -n1 | while read if; do
    -    ifconfig "$if" 2>/dev/null | grep -q "carp:" && echo "$if" && break
    -done)
    -
    -if [ -z "$CARP_IF" ]; then
    -    echo "Error: No CARP interface found"
    -    exit 1
    -fi
    -
    -# Get CARP VHID
    -VHID=$(ifconfig "$CARP_IF" | grep "carp:" | sed -n 's/.*vhid \([0-9]*\).*/\1/p')
    -
    -if [ -z "$VHID" ]; then
    -    echo "Error: Could not determine CARP VHID"
    -    exit 1
    -fi
    -
    -# Function to get the current state
    -get_state() {
    -    ifconfig "$CARP_IF" | grep "carp:" | awk '{print $2}'
    -}
    +
    #!/usr/bin/env bash
    +set -euo pipefail
     
    -# Check for auto-failback block file
    -BLOCK_FILE="/data/nfs/nfs.NO_AUTO_FAILBACK"
    -check_auto_failback() {
    -    if [ -f "$BLOCK_FILE" ]; then
    -        echo "WARNING: Auto-failback is DISABLED (file exists: $BLOCK_FILE)"
    -    fi
    +super() {
    +    local -r fn=${FUNCNAME[1]}
    +    # Split name on :: and dispatch to base implementation
    +    local -a parts=( ${fn//::/ } )
    +    "${parts[0]}::base::${parts[2]}" "$@"
     }
     
    -# Main logic
    -case "$1" in
    -    "")
    -        # No argument - show current state
    -        STATE=$(get_state)
    -        echo "CARP state on $CARP_IF (vhid $VHID): $STATE"
    -        check_auto_failback
    -        ;;
    -    master)
    -        # Force to MASTER state
    -        echo "Setting CARP to MASTER state..."
    -        ifconfig "$CARP_IF" vhid "$VHID" state master
    -        sleep 1
    -        STATE=$(get_state)
    -        echo "CARP state on $CARP_IF (vhid $VHID): $STATE"
    -        check_auto_failback
    -        ;;
    -    backup)
    -        # Force to BACKUP state
    -        echo "Setting CARP to BACKUP state..."
    -        ifconfig "$CARP_IF" vhid "$VHID" state backup
    -        sleep 1
    -        STATE=$(get_state)
    -        echo "CARP state on $CARP_IF (vhid $VHID): $STATE"
    -        check_auto_failback
    -        ;;
    -    auto-failback)
    -        case "$2" in
    -            enable)
    -                if [ -f "$BLOCK_FILE" ]; then
    -                    rm "$BLOCK_FILE"
    -                    echo "Auto-failback ENABLED (removed $BLOCK_FILE)"
    -                else
    -                    echo "Auto-failback was already enabled"
    -                fi
    -                ;;
    -            disable)
    -                if [ ! -f "$BLOCK_FILE" ]; then
    -                    touch "$BLOCK_FILE"
    -                    echo "Auto-failback DISABLED (created $BLOCK_FILE)"
    -                else
    -                    echo "Auto-failback was already disabled"
    -                fi
    -                ;;
    -            *)
    -                echo "Usage: $0 auto-failback [enable|disable]"
    -                echo "  enable:  Remove block file to allow automatic failback"
    -                echo "  disable: Create block file to prevent automatic failback"
    -                exit 1
    -                ;;
    -        esac
    -        ;;
    -    *)
    -        echo "Usage: $0 [master|backup|auto-failback enable|auto-failback disable]"
    -        echo "  Without arguments: show current CARP state"
    -        echo "  master: force this node to become CARP MASTER"
    -        echo "  backup: force this node to become CARP BACKUP"
    -        echo "  auto-failback enable:  allow automatic failback to f0"
    -        echo "  auto-failback disable: prevent automatic failback to f0"
    -        exit 1
    -        ;;
    -esac
    -EOF
    -
    -paul@f0:~ % doas chmod +x /usr/local/bin/carp
    +foo::base::greet() { echo "base: $@"; }
    +foo::german::greet()  { super "Guten Tag, $@!"; }
    +foo::english::greet() { super "Good day,  $@!"; }
     
    -# Copy to f1 as well
    -paul@f0:~ % scp /usr/local/bin/carp f1:/tmp/
    -paul@f1:~ % doas cp /tmp/carp /usr/local/bin/carp && doas chmod +x /usr/local/bin/carp
    +for lang in german english; do
    +    foo::$lang::greet Paul
    +done
    +
    +
    +Output:
    +
    +
    +base: Guten Tag, Paul!
    +base: Good day,  Paul!
     

    -Now you can easily manage CARP states and auto-failback:
    +

    Indirect references with namerefs


    +
    +declare -n creates a name reference — a variable that points to another variable. It’s cleaner than eval for indirection:

    -
    # Check current CARP state
    -paul@f0:~ % doas carp
    -CARP state on re0 (vhid 1): MASTER
    -
    -# If auto-failback is disabled, you'll see a warning
    -paul@f0:~ % doas carp
    -CARP state on re0 (vhid 1): MASTER
    -WARNING: Auto-failback is DISABLED (file exists: /data/nfs/nfs.NO_AUTO_FAILBACK)
    -
    -# Force f0 to become BACKUP (triggers failover to f1)
    -paul@f0:~ % doas carp backup
    -Setting CARP to BACKUP state...
    -CARP state on re0 (vhid 1): BACKUP
    -
    -# Disable auto-failback (useful for maintenance)
    -paul@f0:~ % doas carp auto-failback disable
    -Auto-failback DISABLED (created /data/nfs/nfs.NO_AUTO_FAILBACK)
    -
    -# Enable auto-failback
    -paul@f0:~ % doas carp auto-failback enable
    -Auto-failback ENABLED (removed /data/nfs/nfs.NO_AUTO_FAILBACK)
    +
    user_name=paul
    +declare -n ref=user_name
    +echo "$ref"       # paul
    +ref=julia
    +echo "$user_name" # julia
     

    -

    Automatic Failback After Reboot


    +Output:

    -When f0 reboots (planned or unplanned), f1 takes over as CARP MASTER. To ensure f0 automatically reclaims its primary role once it's fully operational, we'll implement an automatic failback mechanism. With:
    +
    +paul
    +julia
    +

    -Update: Fixed the script at Sun 4 Jan 00:04:28 EET 2026 - removed the NFS service check because when f0 is BACKUP, NFS services are intentionally stopped by carpcontrol.sh, which would prevent auto-failback from ever triggering.
    +Namerefs are local to functions when declared with local -n. Requires Bash ≥4.3.
    +
    +You can also construct the target name dynamically:

    -
    paul@f0:~ % doas tee /usr/local/bin/carp-auto-failback.sh <<'EOF'
    -#!/bin/sh
    -# CARP automatic failback script for f0
    -# Ensures f0 reclaims MASTER role after reboot when storage is ready
    -
    -LOGFILE="/var/log/carp-auto-failback.log"
    -MARKER_FILE="/data/nfs/nfs.DO_NOT_REMOVE"
    -BLOCK_FILE="/data/nfs/nfs.NO_AUTO_FAILBACK"
    -
    -log_message() {
    -    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOGFILE"
    +
    make_var() {
    +    local idx=$1; shift
    +    local name="slot_$idx"
    +    printf -v "$name" '%s' "$*"   # create variable slot_$idx
     }
     
    -# Check if we're already MASTER
    -CURRENT_STATE=$(/usr/local/bin/carp | awk '{print $NF}')
    -if [ "$CURRENT_STATE" = "MASTER" ]; then
    -    exit 0
    -fi
    -
    -# Check if /data/nfs is mounted
    -if ! mount | grep -q "on /data/nfs "; then
    -    log_message "SKIP: /data/nfs not mounted"
    -    exit 0
    -fi
    -
    -# Check if the marker file exists
    -# (identifies that the ZFS data set is properly mounted)
    -if [ ! -f "$MARKER_FILE" ]; then
    -    log_message "SKIP: Marker file $MARKER_FILE not found"
    -    exit 0
    -fi
    -
    -# Check if failback is blocked (for maintenance)
    -if [ -f "$BLOCK_FILE" ]; then
    -    log_message "SKIP: Failback blocked by $BLOCK_FILE"
    -    exit 0
    -fi
    -
    -# All conditions met - promote to MASTER
    -log_message "CONDITIONS MET: Promoting to MASTER (was $CURRENT_STATE)"
    -/usr/local/bin/carp master
    +get_var() {
    +    local idx=$1
    +    local -n ref="slot_$idx"      # bind ref to slot_$idx
    +    printf '%s\n' "$ref"
    +}
     
    -# Log result
    -sleep 2
    -NEW_STATE=$(/usr/local/bin/carp | awk '{print $NF}')
    -log_message "Failback complete: State is now $NEW_STATE"
    +make_var 7 "seven"
    +get_var 7
    +
    +
    +Output:
    +
    +
    +seven
    +
    +
    +

    Function declaration forms


    +
    +All of these work in Bash, but only the first one is POSIX-ish:
    +
    + +
    foo() { echo foo; }
    +function foo { echo foo; }
    +function foo() { echo foo; }
    +
    +
    +Recommendation: prefer name() { ... } for portability and consistency.
    +
    +

    Chaining function calls in conditionals


    +
    +Functions return a status like commands. You can short-circuit them in conditionals:
    +
    + +
    deploy_check() { test -f deploy.yaml; }
    +smoke_test()   { curl -fsS http://localhost/healthz >/dev/null; }
     
    -# If successful, log to the system log too
    -if [ "$NEW_STATE" = "MASTER" ]; then
    -    logger "CARP: f0 automatically reclaimed MASTER role"
    +if deploy_check || smoke_test; then
    +    echo "All good."
    +else
    +    echo "Something failed." >&2
     fi
    -EOF
    -
    -paul@f0:~ % doas chmod +x /usr/local/bin/carp-auto-failback.sh
     

    -The marker file identifies that the ZFS data set is mounted correctly. We create it with:
    +You can also compress it golf-style:

    -
    paul@f0:~ % doas touch /data/nfs/nfs.DO_NOT_REMOVE
    +
    deploy_check || smoke_test && echo ok || echo fail >&2
     

    -We add a cron job to check every minute:
    +

    Grep, sed, awk quickies


    +
    +Word match and context: grep -w word file; with context: grep -C3 foo file (same as -A3 -B3). Example:

    -
    paul@f0:~ % echo "* * * * * /usr/local/bin/carp-auto-failback.sh" | doas crontab -
    +
    cat > /tmp/ctx.txt <<EOF
    +one
    +foo
    +two
    +three
    +bar
    +EOF
    +grep -C1 foo /tmp/ctx.txt
     

    -The enhanced CARP script provides integrated control over auto-failback. To temporarily turn off automatic failback (e.g., for f0 maintenance), we run:
    +Output:
    +
    +
    +one
    +foo
    +two
    +
    +
    +Skip a directory while recursing: grep -R --exclude-dir=foo 'bar' /path. Example:

    -
    paul@f0:~ % doas carp auto-failback disable
    -Auto-failback DISABLED (created /data/nfs/nfs.NO_AUTO_FAILBACK)
    +
    mkdir -p /tmp/golf/foo /tmp/golf/src
    +printf 'bar\n' > /tmp/golf/src/a.txt
    +printf 'bar\n' > /tmp/golf/foo/skip.txt
    +grep -R --exclude-dir=foo 'bar' /tmp/golf
     

    -And to re-enable it:
    +Output:
    +
    +
    +/tmp/golf/src/a.txt:bar
    +
    +
    +Insert lines with sed: sed -e '1isomething' -e '3isomething' file. Example:

    -
    paul@f0:~ % doas carp auto-failback enable
    -Auto-failback ENABLED (removed /data/nfs/nfs.NO_AUTO_FAILBACK)
    +
    printf 'A\nB\nC\n' > /tmp/s.txt
    +sed -e '1iHEAD' -e '3iMID' /tmp/s.txt
     

    -To check whether auto-failback is enabled, we run:
    +Output:
    +
    +
    +HEAD
    +A
    +B
    +MID
    +C
    +
    +
    +Drop last column with awk: awk 'NF{NF-=1};1' file. Example:

    -
    paul@f0:~ % doas carp
    -CARP state on re0 (vhid 1): MASTER
    -# If disabled, you'll see: WARNING: Auto-failback is DISABLED
    +
    printf 'a b c\nx y z\n' > /tmp/t.txt
    +cat /tmp/t.txt
    +echo
    +awk 'NF{NF-=1};1' /tmp/t.txt
     

    -The failback attempts are logged to /var/log/carp-auto-failback.log!
    +Output:

    -So, in summary:
    +
    +a b c
    +x y z
    +
    +a b
    +x y
    +

    -
      -
    • After f0 reboots: f1 is MASTER, f0 boots as BACKUP
    • -
    • Cron runs every minute: Checks if conditions are met (Is f0 currently BACKUP? (don't run if already MASTER)), (Is /data/nfs mounted? (ZFS datasets are ready)), (Does marker file exist? (confirms this is primary storage)), (Is failback blocked? (admin can prevent failback)), (Are NFS services running? (system is fully ready))
    • -
    • Failback occurs: Typically 2-3 minutes after boot completes
    • -
    • Logging: All attempts logged for troubleshooting
    • -

    -This ensures f0 automatically resumes its role as primary storage server after any reboot, while providing administrative control when needed.
    +

    Safe xargs with NULs



    -

    Client Configuration for NFS via Stunnel


    +Avoid breaking on spaces/newlines by pairing find -print0 with xargs -0:

    -To mount NFS shares with stunnel encryption, clients must install and configure stunnel using their client certificates.
    + +
    find . -type f -name '*.log' -print0 | xargs -0 rm -f
    +

    -

    Configuring Rocky Linux Clients (r0, r1, r2)


    +Example with spaces and NULs only:
    +
    + +
    printf 'a\0b c\0' | xargs -0 -I{} printf '<%s>\n' {}
    +
    +
    +Output:
    +
    +
    +<a>
    +<b c>
    +
    +
    +

    Efficient file-to-variable and arrays


    +
    +Read a whole file into a variable without spawning cat:
    +
    + +
    cfg=$(<config.ini)
    +

    -On the Rocky Linux VMs, we run:
    +Read lines into an array safely with mapfile (aka readarray):

    -
    # Install stunnel on client (example for `r0`)
    -[root@r0 ~]# dnf install -y stunnel nfs-utils
    -
    -# Copy client certificate and CA certificate from f0
    -[root@r0 ~]# scp f0:/usr/local/etc/stunnel/ca/r0-stunnel.pem /etc/stunnel/
    -[root@r0 ~]# scp f0:/usr/local/etc/stunnel/ca/ca-cert.pem /etc/stunnel/
    -
    -# Configure stunnel client with certificate authentication
    -[root@r0 ~]# tee /etc/stunnel/stunnel.conf <<'EOF'
    -cert = /etc/stunnel/r0-stunnel.pem
    -CAfile = /etc/stunnel/ca-cert.pem
    -client = yes
    -verify = 2
    -
    -[nfs-ha]
    -accept = 127.0.0.1:2323
    -connect = 192.168.1.138:2323
    -EOF
    -
    -# Enable and start stunnel
    -[root@r0 ~]# systemctl enable --now stunnel
    -
    -# Repeat for r1 and r2 with their respective certificates
    +
    mapfile -t lines < <(grep -v '^#' config.ini)
    +printf '%s\n' "${lines[@]}"
     

    -Note: Each client must use its certificate file (r0-stunnel.pem, r1-stunnel.pem, r2-stunnel.pem, or earth-stunnel.pem - the latter is for my Laptop, which can also mount the NFS shares).
    -
    -

    NFSv4 user mapping config on Rocky


    +Assign formatted strings without a subshell using printf -v:

    -Update: This section was added 08.08.2025!
    + +
    printf -v msg 'Hello %s, id=%04d' "$USER" 42
    +echo "$msg"
    +

    -For this, we need to set the Domain in /etc/idmapd.conf on all 3 Rocky hosts to lan.buetow.org (remember, earlier in this blog post we set the nfsuserd domain on the NFS server side to lan.buetow.org as well!)
    +Output:

    -[General]
    -
    -Domain = lan.buetow.org
    -.
    -.
    -.
    +Hello paul, id=0042
     

    -We also need to increase the inotify limit, otherwise nfs-idmapd may fail to start with "Too many open files":
    +Read NUL-delimited data (pairs well with -print0):

    -
    [root@r0 ~]# echo 'fs.inotify.max_user_instances = 512' > /etc/sysctl.d/99-inotify.conf
    -[root@r0 ~]# sysctl -w fs.inotify.max_user_instances=512
    +
    mapfile -d '' -t files < <(find . -type f -print0)
    +printf '%s\n' "${files[@]}"
     

    -And afterwards, we need to run the following on all 3 Rocky hosts:
    +

    Quick password generator


    +
    +Pure Bash with /dev/urandom:

    -
    [root@r0 ~]# systemctl start nfs-idmapd
    -[root@r0 ~]# systemctl enable --now nfs-client.target
    +
    LC_ALL=C tr -dc 'A-Za-z0-9_' </dev/urandom | head -c 16; echo
     

    -and then, safest, reboot those.
    +Alternative using openssl:

    -

    Testing NFS Mount with Stunnel


    + +
    openssl rand -base64 16 | tr -d '\n' | cut -c1-22
    +

    -To mount NFS through the stunnel encrypted tunnel, we run:
    +

    yes for automation


    +
    +yes streams a string repeatedly; handy for feeding interactive commands or quick load generation:

    -
    # Create a mount point
    -[root@r0 ~]# mkdir -p /data/nfs/k3svolumes
    -
    -# Mount through stunnel (using localhost and NFSv4)
    -[root@r0 ~]# mount -t nfs4 -o port=2323 127.0.0.1:/k3svolumes /data/nfs/k3svolumes
    -
    -# Verify mount
    -[root@r0 ~]# mount | grep k3svolumes
    -127.0.0.1:/k3svolumes on /data/nfs/k3svolumes 
    -  type nfs4 (rw,relatime,vers=4.2,rsize=131072,wsize=131072,
    -  namlen=255,hard,proto=tcp,port=2323,timeo=600,retrans=2,sec=sys,
    -  clientaddr=127.0.0.1,local_lock=none,addr=127.0.0.1)
    -
    -# For persistent mount, add to /etc/fstab:
    -127.0.0.1:/k3svolumes /data/nfs/k3svolumes nfs4 port=2323,_netdev,soft,timeo=10,retrans=2,intr 0 0
    +
    yes | rm -r large_directory        # auto-confirm
    +yes n | dangerous-command          # auto-decline
    +yes anything | head -n1            # prints one line: anything
     

    -Note: The mount uses localhost (127.0.0.1) because stunnel is listening locally and forwarding the encrypted traffic to the remote server.
    -
    -

    Testing CARP Failover with mounted clients and stale file handles:


    +

    Forcing true to fail (and vice versa)



    -To test the failover process:
    +You can shadow builtins with functions:

    -
    # On f0 (current MASTER) - trigger failover
    -paul@f0:~ % doas ifconfig re0 vhid 1 state backup
    -
    -# On f1 - verify it becomes MASTER
    -paul@f1:~ % ifconfig re0 | grep carp
    -    inet 192.168.1.138 netmask 0xffffffff broadcast 192.168.1.138 vhid 1
    +
    true()  { return 1; }
    +false() { return 0; }
     
    -# Check stunnel is now listening on f1
    -paul@f1:~ % doas sockstat -l | grep 2323
    -stunnel  stunnel    4567  3  tcp4   192.168.1.138:2323    *:*
    +true  || echo 'true failed'
    +false && echo 'false succeeded'
     
    -# On client - verify NFS mount still works
    -[root@r0 ~]# ls /data/nfs/k3svolumes/
    -[root@r0 ~]# echo "Test after failover" > /data/nfs/k3svolumes/failover-test.txt
    +# Bypass function with builtin/command
    +builtin true # returns 0
    +command true # returns 0
     

    -After a CARP failover, NFS clients may experience "Stale file handle" errors because they cached file handles from the previous server. To resolve this manually, we can run:
    +To disable a builtin entirely: enable -n true (re-enable with enable true).
    +
    +Further reading:
    +
    +Force true to return false
    +
    +

    Restricted Bash


    +
    +bash -r (or rbash) starts a restricted shell that limits potentially dangerous actions, for example:
    +
    +
      +
    • Changing directories (cd).
    • +
    • Modifying PATH, SHELL, BASH_ENV, or ENV.
    • +
    • Redirecting output.
    • +
    • Running commands with / in the name.
    • +
    • Using exec.
    • +

    +It’s a coarse sandbox for highly constrained shells; read man bash (RESTRICTED SHELL) for details and caveats.
    +
    +Example session:

    -
    # Force unmount and remount
    -[root@r0 ~]# umount -f /data/nfs/k3svolumes
    -[root@r0 ~]# mount /data/nfs/k3svolumes
    +
    rbash -c 'cd /'            # cd: restricted
    +rbash -c 'PATH=/tmp'       # PATH: restricted
    +rbash -c 'echo hi > out'   # redirection: restricted
    +rbash -c '/bin/echo hi'    # commands with /: restricted
    +rbash -c 'exec ls'         # exec: restricted
     

    -For the automatic recovery, we create a script:
    +

    Useless use of cat (and when it’s ok)


    +
    +Avoid the extra process if a command already reads files or STDIN:

    -
    [root@r0 ~]# cat > /usr/local/bin/check-nfs-mount.sh << 'EOF'
    -#!/bin/bash
    -# Fast NFS mount health monitor - runs every 10 seconds via systemd timer
    -
    -MOUNT_POINT="/data/nfs/k3svolumes"
    -LOCK_FILE="/var/run/nfs-mount-check.lock"
    -
    -# Use a lock file to prevent concurrent runs
    -if [ -f "$LOCK_FILE" ]; then
    -    exit 0
    -fi
    -touch "$LOCK_FILE"
    -trap "rm -f $LOCK_FILE" EXIT
    -
    -MOUNT_FIXED=0
    -
    -fix_mount () {
    -    echo "Attempting to remount NFS mount $MOUNT_POINT"
    -    if mount -o remount -f "$MOUNT_POINT" 2>/dev/null; then
    -        echo "Remount command issued for $MOUNT_POINT"
    -    else
    -        echo "Failed to remount NFS mount $MOUNT_POINT"
    -    fi
    -
    -    echo "Checking if $MOUNT_POINT is a mountpoint"
    -    if mountpoint "$MOUNT_POINT" >/dev/null 2>&1; then
    -        echo "$MOUNT_POINT is a valid mountpoint"
    -    else
    -        echo "$MOUNT_POINT is not a valid mountpoint, attempting mount"
    -        if mount "$MOUNT_POINT"; then
    -            echo "Successfully mounted $MOUNT_POINT"
    -            MOUNT_FIXED=1
    -            return
    -        else
    -            echo "Failed to mount $MOUNT_POINT"
    -        fi
    -    fi
    -
    -    echo "Attempting to unmount $MOUNT_POINT"
    -    if umount -f "$MOUNT_POINT" 2>/dev/null; then
    -        echo "Successfully unmounted $MOUNT_POINT"
    -    else
    -        echo "Failed to unmount $MOUNT_POINT (it might not be mounted)"
    -    fi
    -
    -    echo "Attempting to mount $MOUNT_POINT"
    -    if mount "$MOUNT_POINT"; then
    -        echo "NFS mount $MOUNT_POINT mounted successfully"
    -        MOUNT_FIXED=1
    -        return
    -    else
    -        echo "Failed to mount NFS mount $MOUNT_POINT"
    -    fi
    -
    -    echo "Failed to fix NFS mount $MOUNT_POINT"
    -    exit 1
    -}
    -
    -if ! mountpoint "$MOUNT_POINT" >/dev/null 2>&1; then
    -    echo "NFS mount $MOUNT_POINT not found"
    -    fix_mount
    -fi
    -
    -if ! timeout 2s stat "$MOUNT_POINT" >/dev/null 2>&1; then
    -    echo "NFS mount $MOUNT_POINT appears to be unresponsive"
    -    fix_mount
    -fi
    -
    -# After a successful remount, delete pods stuck on this node
    -if [ "$MOUNT_FIXED" -eq 1 ]; then
    -    echo "Mount was fixed, checking for stuck pods on this node..."
    -    NODE=$(hostname)
    -    export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
    -    kubectl get pods --all-namespaces \
    -      --field-selector="spec.nodeName=$NODE" \
    -      -o json 2>/dev/null | jq -r '
    -        .items[] |
    -        select(
    -          .status.phase == "Unknown" or
    -          .status.phase == "Pending" or
    -          (.status.conditions // [] |
    -            any(.type == "Ready" and .status == "False")) or
    -          (.status.containerStatuses // [] |
    -            any(.state.waiting.reason == "ContainerCreating"))
    -        ) | "\(.metadata.namespace) \(.metadata.name)"' | \
    -      while read ns pod; do
    -        echo "Deleting stuck pod $ns/$pod"
    -        kubectl delete pod -n "$ns" "$pod" \
    -          --grace-period=0 --force 2>&1
    -      done
    -fi
    -EOF
    +
    # Prefer
    +grep -i foo file
    +<file grep -i foo        # or feed via redirection
     
    -[root@r0 ~]# chmod +x /usr/local/bin/check-nfs-mount.sh
    +# Over
    +cat file | grep -i foo
    +
    +
    +But for interactive composition, or when you truly need to concatenate multiple sources into a single stream, cat is fine, as you may think, "First I need the content, then I do X." Changing the "useless use of cat" in retrospect is really a waste of time for one-time interactive use:
    +
    + +
    cat file1 file2 | grep -i foo
     

    -And we create the systemd service as follows:
    +From notes: “Good for interactivity; Useless use of cat” — use judgment.
    +
    +

    Atomic locking with mkdir


    +
    +Portable advisory locks can be emulated with mkdir because it’s atomic:

    -
    [root@r0 ~]# cat > /etc/systemd/system/nfs-mount-monitor.service << 'EOF'
    -[Unit]
    -Description=NFS Mount Health Monitor
    -After=network-online.target
    -
    -[Service]
    -Type=oneshot
    -ExecStart=/usr/local/bin/check-nfs-mount.sh
    -StandardOutput=journal
    -StandardError=journal
    -EOF
    +
    lockdir=/tmp/myjob.lock
    +if mkdir "$lockdir" 2>/dev/null; then
    +    trap 'rmdir "$lockdir"' EXIT INT TERM
    +    # critical section
    +    do_work
    +else
    +    echo "Another instance is running" >&2
    +    exit 1
    +fi
     

    -And we also create the systemd timer (runs every 10 seconds):
    +This works well on Linux. Remove the lock in trap so crashes don’t leave stale locks.
    +
    +

    Smarter globs and faster find-exec



    +
      +
    • Enable extended globs when useful: shopt -s extglob; then patterns like !(tmp|cache) work.
    • +
    • Use -exec ... {} + to batch many paths in fewer process invocations:
    • +

    -
    [root@r0 ~]# cat > /etc/systemd/system/nfs-mount-monitor.timer << 'EOF'
    -[Unit]
    -Description=Run NFS Mount Health Monitor every 10 seconds
    -Requires=nfs-mount-monitor.service
    -
    -[Timer]
    -OnBootSec=30s
    -OnUnitActiveSec=10s
    -AccuracySec=1s
    -
    -[Install]
    -WantedBy=timers.target
    -EOF
    +
    find . -name '*.log' -exec gzip -9 {} +
     

    -To enable and start the timer, we run:
    +Example for extglob (exclude two dirs from listing):

    -
    [root@r0 ~]# systemctl daemon-reload
    -[root@r0 ~]# systemctl enable nfs-mount-monitor.timer
    -[root@r0 ~]# systemctl start nfs-mount-monitor.timer
    -
    -# Check status
    -[root@r0 ~]# systemctl status nfs-mount-monitor.timer
    -● nfs-mount-monitor.timer - Run NFS Mount Health Monitor every 10 seconds
    -     Loaded: loaded (/etc/systemd/system/nfs-mount-monitor.timer; enabled)
    -     Active: active (waiting) since Sat 2025-07-06 10:00:00 EEST
    -    Trigger: Sat 2025-07-06 10:00:10 EEST; 8s left
    -
    -# Monitor logs
    -[root@r0 ~]# journalctl -u nfs-mount-monitor -f
    +
    shopt -s extglob
    +ls -d -- !(.git|node_modules) 2>/dev/null
     

    -Note: Stale file handles are inherent to NFS failover because file handles are server-specific. The best approach depends on your application's tolerance for brief disruptions. Of course, all the changes made to r0 above must also be applied to r1 and r2.
    +E-Mail your comments to paul@nospam.buetow.org :-)

    -Updated Wed 19 Mar 2026: Added automatic pod restart after NFS remount
    +Other related posts are:

    -The script now also tracks whether a mount was fixed via the MOUNT_FIXED variable. After a successful remount, it queries kubectl for pods on the local node that are stuck in Unknown, Pending, or ContainerCreating state and force-deletes them. Kubernetes then automatically reschedules these pods, which will now succeed because the NFS mount is healthy again. Without this, pods that hit a stale mount would remain broken until manually deleted, even after the underlying NFS issue was resolved.
    +2025-09-14 Bash Golf Part 4 (You are currently reading this)
    +2023-12-10 Bash Golf Part 3
    +2022-01-01 Bash Golf Part 2
    +2021-11-29 Bash Golf Part 1
    +2021-06-05 Gemtexter - One Bash script to rule it all
    +2021-05-16 Personal Bash coding style guide

    -

    Complete Failover Test


    +Back to the main site
    +
    +
    +
    + + Random Weird Things - Part Ⅲ + + gemini://foo.zone/gemfeed/2025-08-15-random-weird-things-iii.gmi + 2025-08-14T23:21:32+03:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + Every so often, I come across random, weird, and unexpected things on the internet. It would be neat to share them here from time to time. This is the third run. + +
    +

    Random Weird Things - Part Ⅲ



    -Here's a comprehensive test of the failover behaviour with all optimisations in place:
    +Published at 2025-08-14T23:21:32+03:00

    - -
    # 1. Check the initial state
    -paul@f0:~ % ifconfig re0 | grep carp
    -    carp: MASTER vhid 1 advbase 1 advskew 0
    -paul@f1:~ % ifconfig re0 | grep carp
    -    carp: BACKUP vhid 1 advbase 1 advskew 100
    -
    -# 2. Create a test file from a client
    -[root@r0 ~]# echo "test before failover" > /data/nfs/k3svolumes/test-before.txt
    -
    -# 3. Trigger failover (f0 → f1)
    -paul@f0:~ % doas ifconfig re0 vhid 1 state backup
    -
    -# 4. Monitor client behaviour
    -[root@r0 ~]# ls /data/nfs/k3svolumes/
    -ls: cannot access '/data/nfs/k3svolumes/': Stale file handle
    -
    -# 5. Check automatic recovery (within 10 seconds)
    -[root@r0 ~]# journalctl -u nfs-mount-monitor -f
    -Jul 06 10:15:32 r0 nfs-monitor[1234]: NFS mount unhealthy detected at \
    -  Sun Jul 6 10:15:32 EEST 2025
    -Jul 06 10:15:32 r0 nfs-monitor[1234]: Attempting to fix stale NFS mount at \
    -  Sun Jul 6 10:15:32 EEST 2025
    -Jul 06 10:15:33 r0 nfs-monitor[1234]: NFS mount fixed at \
    -  Sun Jul 6 10:15:33 EEST 2025
    +Every so often, I come across random, weird, and unexpected things on the internet. It would be neat to share them here from time to time. This is the third run.
    +
    +2024-07-05 Random Weird Things - Part Ⅰ
    +2025-02-08 Random Weird Things - Part Ⅱ
    +2025-08-15 Random Weird Things - Part Ⅲ (You are currently reading this)
    +
    +
    + /\_/\        /\_/\        /\_/\
    +( o.o ) WHOA!( o.o ) WHOA!( o.o )
    + > ^ <        > ^ <        > ^ <
    + /   \  MEOW! /   \  MOEEW!/   \
    +/_____\      /_____\      /_____\
     

    -Failover Timeline:
    +

    Table of Contents




    -Benefits of the Optimised Setup:
    +

    21. Doom in TypeScript’s type system



    -
      -
    • No hanging processes - Soft mounts fail quickly
    • -
    • Clean failover - Old server stops serving immediately
    • -
    • Fast automatic recovery - No manual intervention needed
    • -
    • Predictable timing - Recovery within 10 seconds with systemd timer
    • -
    • Better visibility - systemd journal provides detailed logs
    • -

    -Important Considerations:
    +Yes, really. Someone has implemented Doom to run within the TypeScript type system—compile-time madness, but fun to watch.

    -
      -
    • Recent writes (within 1 minute) may not be visible after failover due to replication lag
    • -
    • Applications should handle brief NFS errors gracefully
    • -
    • For zero-downtime requirements, consider synchronous replication or distributed storage (see "Future storage explorations" section later in this blog post)
    • -

    -

    Update: Upgrade to 4TB drives


    +Doom in the TS type system

    -Update: 27.01.2026 I have since replaced the 1TB drives with 4TB drives for more storage capacity. The upgrade procedure was different for each node!
    +TypeScript’s type checker is surprisingly expressive: conditional types, recursion, and template literal types let you encode nontrivial logic that “executes” during compilation. The demo exploits this to build a tiny ray-caster that renders as compiler errors or types. It’s wildly impractical, but a great reminder that enough expressiveness plus recursion tends to drift toward Turing completeness.

    -

    Upgrading f1 (simpler approach)


    +

    Run it in a PDF



    -Since f1 is the replication sink, the upgrade was straightforward:
    +

    22. Doom inside a PDF



    -
      -
    • 1. Physically replaced the 1TB drive with the 4TB drive
    • -
    • 2. Re-setup the drive as described earlier in this blog post
    • -
    • 3. Re-replicated all data from f0 to f1 via zrepl
    • -
    • 4. Reloaded the encryption keys as described in this blog post
    • -
    • 5. Set the mount point again for the encrypted dataset, explicitly as read-only (since f1 is the replication sink)
    • -

    -

    Upgrading f0 (using ZFS resilvering)


    +Running Doom embedded in a PDF file. No separate binary—just a cursed document.

    -For f0, which is the primary storage node, I used ZFS resilvering to avoid data loss:
    +doompdf

    -
      -
    • 1. Plugged the new 4TB drive into an external USB SSD drive reader
    • -
    • 2. Attached the 4TB drive to the zdata pool for resilvering
    • -
    • 3. Once resilvering completed, detached the 1TB drive from the zdata pool
    • -
    • 4. Shutdown f0 and physically replaced the internal drive
    • -
    • 5. Booted with the new drive in place
    • -
    • 6. Expanded the pool to use the full 4TB capacity:
    • -

    - -
    paul@f0:~ % doas zpool online -e /dev/ada1
    -
    +This relies on features like PDF JavaScript and interactive objects, which some viewers still support. Expect mixed results: many modern readers sandbox or disable scripting by default for security. If you try it, use a compatible desktop viewer and be prepared for portability quirks.

    -
      -
    • 7. Reloaded the encryption keys as described in this blog post
    • -
    • 8. Set the mount point again for the encrypted dataset
    • -

    -This was a one-time effort on both nodes - after a reboot, everything was remembered and came up normally. Here are the updated outputs:
    +

    23. Linux inside a PDF



    - -
    paul@f0:~ % doas zpool list
    -NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
    -zdata  3.63T   677G  2.97T        -         -     3%    18%  1.00x    ONLINE  -
    -zroot   472G  68.4G   404G        -         -    13%    14%  1.00x    ONLINE  -
    -
    -paul@f0:~ % doas camcontrol devlist
    -<512GB SSD D910R170>               at scbus0 target 0 lun 0 (pass0,ada0)
    -<SD Ultra 3D 4TB 530500WD>         at scbus1 target 0 lun 0 (pass1,ada1)
    -<Generic Flash Disk 8.07>          at scbus2 target 0 lun 0 (da0,pass2)
    -
    +Boot a tiny Linux inside a PDF. This rabbit hole goes deep.

    -We're still using different SSD models on f1 (WD Blue SA510 4TB) to avoid simultaneous failures:
    +linuxpdf

    - -
    paul@f1:~ % doas camcontrol devlist
    -<512GB SSD D910R170>               at scbus0 target 0 lun 0 (pass0,ada0)
    -<WD Blue SA510 2.5 4TB 530500WD>   at scbus1 target 0 lun 0 (pass1,ada1)
    -<Generic Flash Disk 8.07>          at scbus2 target 0 lun 0 (da0,pass2)
    -
    +Like the Doom-in-PDF trick, this leans on the PDF runtime to host unconventional logic and rendering. It’s more of an art piece than a daily driver, but it shows how “document” formats can accidentally become platforms. The security posture of PDF viewers varies significantly, so expect inconsistent behaviour across different apps.
    +
    +

    24. SQLite loves Tcl


    +
    +SQLite was initially designed as a Tcl extension and still relies heavily on Tcl today: the amalgamated C source is generated by mksqlite3c.tcl, tests are written in Tcl, and even the documentation is built with it.
    +
    +Tcl 2017 paper
    +
    +The famous single-file sqlite3.c is not hand-edited—developers maintain sources, plus build scripts that knit everything together deterministically. Their Tcl-centric tooling provides them with reproducible builds and a very opinionated workflow. It’s a great counterexample to the idea that “serious” projects must standardise on the most popular build stacks.
    +
    +

    25. Fossil, “e”, and a Tcl/Tk chat


    +
    +The SQLite folks use a custom Tcl/Tk editor called “e”, a homegrown VCS (Fossil), and even a Tcl/Tk chat room for development—peak bespoke tooling.
    +
    +More details in the paper
    +
    +Fossil bundles source control, tickets, wiki, and a web UI into a single portable binary—no external services required. The “e” editor and chat complete a tight, integrated loop tailored to their team’s needs and constraints. It’s delightfully “boring tech” that has produced one of the most reliable databases on earth.
    +
    +

    26. Kubernetes from an Excel spreadsheet



    -

    Conclusion


    +Drive kubectl from an .xlsx file because clusters belong in spreadsheets, apparently.

    -We've built a robust, encrypted storage system for our FreeBSD-based Kubernetes cluster that provides:
    +xlskubectl

    -
      -
    • High Availability: CARP ensures the storage VIP moves automatically during failures
    • -
    • Data Protection: ZFS encryption protects data at rest, stunnel protects data in transit
    • -
    • Continuous Replication: 1-minute RPO for the data, automated via zrepl
    • -
    • Secure Access: Client certificate authentication prevents unauthorised access
    • -

    -Some key lessons learned are:
    +Resources are rows; columns map to fields; the tool renders YAML and applies it for you. It’s oddly ergonomic for demos, audits, or letting non‑YAML‑native teammates propose changes. Obviously, be careful—permissions and review gates still matter even if your “IDE” is Excel.

    -
      -
    • Stunnel vs Native NFS/TLS: While native encryption would be ideal, stunnel provides better cross-platform compatibility
    • -
    • Manual vs Automatic Failover: For storage systems, controlled failover often prevents more problems than it causes
    • -
    • Client Compatibility: Different NFS implementations behave differently - test thoroughly
    • -

    -

    Future Storage Explorations


    +

    27. SRE means “Sorry…”



    -While zrepl provides excellent snapshot-based replication for disaster recovery, there are other storage technologies worth exploring for the f3s project:
    +An industry joke (or truth?) that SRE (short for Site Reliability Engineer) stands for “Sorry…”.

    -

    MinIO for S3-Compatible Object Storage


    +Anecdotes are a good reminder that failure is inevitable and empathy is essential. The best takeaways are about clear communication, graceful degradation, and blameless postmortems. Laughing helps, but guardrails and good on‑call hygiene help even more.

    -MinIO is a high-performance, S3-compatible object storage system that could complement our ZFS-based storage. Some potential use cases:
    +

    28. Touch Grass, the app



    -
      -
    • S3 API compatibility: Many modern applications expect S3-style object storage APIs. MinIO could provide this interface while using our ZFS storage as the backend.
    • -
    • Multi-site replication: MinIO supports active-active replication across multiple sites, which could work well with our f0/f1/f2 node setup.
    • -
    • Kubernetes native: MinIO has excellent Kubernetes integration with operators and CSI drivers, making it ideal for the f3s k3s environment.
    • -

    -

    MooseFS for Distributed High Availability


    +When screens consume too much, this site/app nudges you to go outside.

    -MooseFS is a fault-tolerant, distributed file system that could provide proper high-availability storage:
    +Touch grass

    -
      -
    • True HA: Unlike our current setup, which requires manual failover, MooseFS provides automatic failover with no single point of failure.
    • -
    • POSIX compliance: Applications can use MooseFS like any regular filesystem, no code changes needed.
    • -
    • Flexible redundancy: Configure different replication levels per directory or file, optimising storage efficiency.
    • -
    • FreeBSD support: MooseFS has native FreeBSD support, making it a natural fit for the f3s project.
    • -

    -Both technologies could run on top of our encrypted ZFS volumes, combining ZFS's data integrity and encryption features with distributed storage capabilities. This would be particularly interesting for workloads that need either S3-compatible APIs (MinIO) or transparent distributed POSIX storage (MooseFS). What about Ceph and GlusterFS? Unfortunately, there doesn't seem to be great native FreeBSD support for them. However, other alternatives also appear suitable for my use case.
    +It’s simple and playful—sometimes that’s the nudge you need to break doomscroll loops. Treat it like a micro‑ritual: set a reminder, step outside, reset. Your eyes (and nervous system) will thank you.

    -Read the next post of this series:
    +

    29. Blogging with the C preprocessor



    -f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    +Use the C preprocessor to assemble a blog. It shouldn’t work this well—and yet.

    -Other *BSD-related posts:
    +Macroblog with cpp

    -2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability
    -2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    -2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage (You are currently reading this)
    -2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    -2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    -2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    -2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    -2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    -2024-04-01 KISS high-availability with OpenBSD
    -2024-01-13 One reason why I love OpenBSD
    -2022-10-30 Installing DTail on OpenBSD
    -2022-07-30 Let's Encrypt with OpenBSD and Rex
    -2016-04-09 Jails and ZFS with Puppet on FreeBSD
    +Posts are stitched together with #includes and macros, giving you DRY content blocks and repeatable builds. It’s hacky, fast, and delightfully text‑only—perfect for people who think makefiles are a UI. Would I recommend it for everyone? No. Is it charming and effective? Absolutely.

    -E-Mail your comments to paul@nospam.buetow.org
    +

    30. Accidentally Turing-complete


    +
    +A delightful catalogue of systems that unintentionally become Turing-complete.
    +
    +Accidentally Turing-complete
    +
    +Give a system conditionals, state, and unbounded composition, and it often crosses the threshold into general computation—whether that was the goal or not. The list includes items such as CSS, regular expression dialects, and even card games. It’s a fun lens for understanding why “just a configuration language” can get complicated fast.
    +
    +I hope you had some fun. E-Mail your comments to paul@nospam.buetow.org :-)

    Back to the main site
    - Posts from January to June 2025 - - gemini://foo.zone/gemfeed/2025-07-01-posts-from-january-to-june-2025.gmi - 2025-07-01T22:39:29+03:00 + Local LLM for Coding with Ollama on macOS + + gemini://foo.zone/gemfeed/2025-08-05-local-coding-llm-with-ollama.gmi + 2025-08-04T16:43:39+03:00 Paul Buetow aka snonux paul@dev.buetow.org - These are my social media posts from the last six months. I keep them here to reflect on them and also to not lose them. Social media networks come and go and are not under my control, but my domain is here to stay. + With all the AI buzz around coding assistants, and being a bit concerned about being dependent on third-party cloud providers here, I decided to explore the capabilities of local large language models (LLMs) using Ollama.
    -

    Posts from January to June 2025


    +

    Local LLM for Coding with Ollama on macOS



    -Published at 2025-07-01T22:39:29+03:00
    +Published at 2025-08-04T16:43:39+03:00

    -These are my social media posts from the last six months. I keep them here to reflect on them and also to not lose them. Social media networks come and go and are not under my control, but my domain is here to stay.
    +
    +      [::]
    +     _|  |_
    +   /  o  o  \                       |
    +  |    ∆    |  <-- Ollama          / \
    +  |  \___/  |                     /   \
    +   \_______/             LLM --> / 30B \
    +    |     |                     / Qwen3 \
    +   /|     |\                   /  Coder  \
    +  /_|     |_\_________________/ quantised \
    +

    -These are from Mastodon and LinkedIn. Have a look at my about page for my social media profiles. This list is generated with Gos, my social media platform sharing tool.
    +

    Table of Contents



    -My about page
    -https://codeberg.org/snonux/gos
    +
    +With all the AI buzz around coding assistants, and being a bit concerned about being dependent on third-party cloud providers here, I decided to explore the capabilities of local large language models (LLMs) using Ollama.

    -

    Table of Contents


    +Ollama is a powerful tool that brings local AI capabilities directly to your local hardware. By running AI models locally, you can enjoy the benefits of intelligent assistance without relying on cloud services. This document outlines my initial setup and experiences with Ollama, with a focus on coding tasks and agentic coding.
    +
    +https://ollama.com/
    +
    +

    Why Local LLMs?


    +
    +Using local AI models through Ollama offers several advantages:


    -

    January 2025


    +

    Hardware Considerations



    -

    I am currently binge-listening to the Google ...


    +Running large language models locally is currently limited by consumer hardware capabilities:

    -I am currently binge-listening to the Google #SRE ProdCast. It's really great to learn about the stories of individual SREs and their journeys. It is not just about SREs at Google; there are also external guests.
    +
      +
    • GPU Memory: Most consumer-grade GPUs (even in 2025) top out at 16–24GB of VRAM, making it challenging to run larger models like the 30B (30 billion) parameter LLMs (they go up to the 100 billion and more).
    • +
    • RAM Constraints: On my MacBook Pro with M3 CPU and 36GB RAM, I chose a 14B model (qwen2.5-coder:14b-instruct) as it represents a practical balance between capability and resource requirements.
    • +

    +For reference, here are some key points about running large LLMs locally:

    -sre.google/prodcast/
    +
      +
    • Models larger than 30B: I don't even think about running them locally. One (e.g. from Qwen, Deepseek or Kimi K2) with several hundred billion parameters could match the "performance" of commercial LLMs (Claude Sonnet 4, etc). Still, for personal use, the hardware demands are just too high (or temporarily "rent" it via the public cloud?).
    • +
    • 30B models: Require at least 48GB of GPU VRAM for full inference without quantisation. Currently only feasible on high-end professional GPUs (or an Apple-silicone Mac with enough unified RAM).
    • +
    • 14B models: Can run with 16-24GB GPU memory (VRAM), suitable for consumer-grade hardware (or use a quantised larger model)
    • +
    • 7B-13B models: Best fit for mainstream consumer hardware, requiring minimal VRAM and running smoothly on mid-range GPUs, but with limited capabilities compared to larger models and more hallucinations.
    • +

    +The model I'll be mainly using in this blog post (qwen2.5-coder:14b-instruct) is particularly interesting as:

    -

    Recently, there was a >5000 LOC #bash ...


    +
      +
    • instruct: Indicates this is the instruction-tuned variant, optimised for diverse tasks including coding
    • +
    • coder: Tells me that this model was trained on a mix of code and text data, making it especially effective for programming assistance
    • +

    +https://ollama.com/library/qwen2.5-coder
    +https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct

    -Recently, there was a >5000 LOC #bash codebase at work that reported the progress of a migration, nobody understood it and it was wonky (sometimes it would not return the desired results). On top of that, the coding style was very bad as well (I could rant forever here). The engineer who wrote it left the company. I rewrote it in #Perl in about 300 LOC. Colleagues asked why not Python. Perl is the perfect choice here—it's even in its name: Practical Extraction and Report Language!
    +For general thinking tasks, I found deepseek-r1:14b to be useful (in the future, I also want to try other qwen models here). For instance, I utilised deepseek-r1:14b to format this blog post and correct some English errors, demonstrating its effectiveness in natural language processing tasks. Additionally, it has proven invaluable for adding context and enhancing clarity in technical explanations, all while running locally on the MacBook Pro. Admittedly, it was a lot slower than "just using ChatGPT", but still within a minute or so.

    -

    Ghostty is a terminal emulator that was ...


    +https://ollama.com/library/deepseek-r1:14b
    +https://huggingface.co/deepseek-ai/DeepSeek-R1

    -Ghostty is a terminal emulator that was recently released publicly as open-source. I love that it works natively on both Linux and macOS; it looks great (font rendering) and is fast and customizable via a config file (which I manage with a config mng system). Ghostty is a passion project written in Zig, the author loved the community so much while working on it that he donated $300k to the Zig Foundation. #terminal #emulator
    +A quantised (as mentioned above) LLM which has been converted from high-precision connection (typically 16- or 32-bit floating point) representations to lower-precision formats, such as 8-bit integers. This reduces the overall memory footprint of the model, making it significantly smaller and enabling it to run more efficiently on hardware with limited resources or to allow higher throughput on GPUs and CPUs. The benefits of quantisation include reduced storage and faster inference times due to simpler computations and better memory bandwidth utilisation. However, quantisation can introduce a drop in model accuracy because the lower numerical precision means the model cannot represent parameter values as precisely. In some cases, it may lead to instability or unexpected outputs in specific tasks or edge cases.

    -ghostty.org
    +

    Basic Setup and Manual Code Prompting



    -

    Go is not an easy programming language. Don't ...


    +

    Installing Ollama and a Model



    -Go is not an easy programming language. Don't confuse easy with simple syntax. I'd agree to this. With the recent addition of Generics to the language I also feel that even the syntax stops being simple.. Also, simplicity is complex (especially under the hood how the language works - there are many mechanics you need to know if you really want to master the language). #golang
    +To install Ollama, performed these steps (this assumes that you have already installed Homebrew on your macOS system):

    -www.arp242.net/go-easy.html
    + +
    brew install ollama
    +rehash
    +ollama serve
    +

    -

    How will AI change software engineering (or has ...


    +Which started up the Ollama server with something like this (the screenshots shows already some requests made):

    -How will AI change software engineering (or has it already)? The bottom line is that less experienced engineers may have problems (accepting incomplete or incorrect programs, only reaching 70 percent solutions), while experienced engineers can leverage AI to boost their performance as they know how to fix the remaining 30 percent of the generated code. #ai #engineering #software
    +Ollama serving

    -newsletter.pragmaticengineer.com/p/how-ai-will-change-software-engineering
    +And then, in a new terminal, I pulled the model with:

    -

    Eliminating toil - Toil is not always a bad ...


    + +
    ollama pull qwen2.5-coder:14b-instruct
    +

    -Eliminating toil - Toil is not always a bad thing - some even enjoy toil - it is calming in small amounts - but it becomes toxic in large amounts - #SRE
    +Now, I was ready to go! It wasn't so difficult. Now, let's see how I used this model for coding tasks.

    -sre.google/sre-book/eliminating-toil/
    +

    Example Usage



    -

    Fun read. How about using the character ...


    +I run the following command to get a Go function for calculating Fibonacci numbers:

    -Fun read. How about using the character sequence :-) as a statement separator in a programming language?
    + +
    time echo "Write a function in golang to print out the Nth fibonacci number, \
    +  only the function without the boilerplate" | ollama run qwen2.5-coder:14b-instruct
    +
    +Output:
    +
    +func fibonacci(n int) int {
    +    if n <= 1 {
    +        return n
    +    }
    +    a, b := 0, 1
    +    for i := 2; i <= n; i++ {
    +        a, b = b, a+b
    +    }
    +    return b
    +}
    +
    +Execution Metrics:
    +
    +Executed in    4.90 secs      fish           external
    +   usr time   15.54 millis    0.31 millis   15.24 millis
    +   sys time   19.68 millis    1.02 millis   18.66 millis
    +

    -ntietz.com/blog/researching-why-we-use-semicolons-as-statement-terminators/
    +Note, after having written this blog post, I tried the same with the newer model qwen3-coder:30b-a3b-q4_K_M (which "just" came out, and it's a quantised 30B model), and it was much faster:

    -

    Thats unexpected, you cant remove a NaN key ...


    +
    +Executed in    1.83 secs      fish           external
    +   usr time   17.82 millis    4.40 millis   13.42 millis
    +   sys time   17.07 millis    1.57 millis   15.50 millis
    +

    -Thats unexpected, you cant remove a NaN key from a map without clearing it! #golang via @wallabagapp
    +https://ollama.com/library/qwen3-coder:30b-a3b-q4_K_M

    -unexpected-go.com/you-cant-remove-a-nan-key-from-a-map-without-clearing-it.html
    +

    Agentic Coding with Aider



    -

    Nice refresher for #shell #bash #zsh ...


    +

    Installation



    -Nice refresher for #shell #bash #zsh redirection rules
    +Aider is a tool that enables agentic coding by leveraging AI models (also local ones, as in our case). While setting up OpenAI Codex and OpenCode with Ollama proved challenging (those tools either didn't know how to work with the "tools" (the capability to execute external commands or to edit files for example) or didn't connect at all to Ollama for some reason), Aider worked smoothly.

    -rednafi.com/misc/shell_redirection/
    +To get started, the only thing I had to do was to install it via Homebrew, initialise a Git repository, and then start Aider with the Ollama model ollama_chat/qwen2.5-coder:14b-instruct:

    -

    I think discussing action items in incident ...


    + +
    brew install aider
    +mkdir -p ~/git/aitest && cd ~/git/aitest && git init
    +aider --model ollama_chat/qwen2.5-coder:14b-instruct
    +

    -I think discussing action items in incident reviews is important. At least the obvious should be captured and noted down. It does not mean that the action items need to be fully refined in the review meeting; that would be out of scope, in my opinion.
    +https://aider.chat
    +https://opencode.ai
    +https://github.com/openai/codex

    -surfingcomplexity.blog/2024/09/28/why-..-..-action-items-during-incident-reviews/
    +

    Agentic coding prompt



    -

    At first, functional options add a bit of ...


    +This is the prompt I gave:

    -At first, functional options add a bit of boilerplate, but they turn out to be quite neat, especially when you have very long parameter lists that need to be made neat and tidy. #golang
    +
    +Create a Go project with these files:
    +
    +* `cmd/aitest/main.go`: CLI entry point
    +* `internal/version.go`: Version information (0.0.0), should be printed when the
    +   program was started with `-version` flag
    +* `internal/count.go`: File counting functionality, the program should print out
    +   the number of files in a given subdirectory (the directory is provided as a
    +   command line flag with `-dir`), if none flag is given, no counting should be
    +   done
    +* `README.md`: Installation and usage instructions
    +

    -www.calhoun.io/using-functional-options-instead-of-method-chaining-in-go/
    +It then generated something, but did not work out of the box, as it had some issues with the imports and package names. So I had to do some follow-up prompts to fix those issues with something like this:

    -

    In the "Working with an SRE Interview" I have ...


    +
    +* Update import paths to match module name, github.com/yourname/aitest should be
    +  aitest in main.go
    +* The package names of internal/count.go and internal/version.go should be
    +  internal, and not count and version.
    +

    -In the "Working with an SRE Interview" I have been askd about what it's like working with an SRE! We'd covered much more in depth, but we decided not to make it too long in the final version! #sre #interview
    +Aider fixing the packages

    -foo.zone/gemfeed/2025-01-15-working-with-an-sre-interview.gmi (Gemini)
    -foo.zone/gemfeed/2025-01-15-working-with-an-sre-interview.html
    +

    Compilation & Execution



    -

    Small introduction to the #Android ...


    +Once done so, the project was ready and I could compile and run it:

    -Small introduction to the #Android distribution called #GrapheneOS For myself, I am using a Pixel 7 Pro, which comes with "only" 5 years of support (not yet 7 years like the Pixel 8 and 9 series). I also wrote about GrapheneOS here once:
    + +
    go build cmd/aitest/main.go
    +./main -v
    +0.0.0
    +./main -dir .
    +Number of files in directory .: 4
    +

    -dataswamp.org/~solene/2025-01-12-intro-to-grapheneos.html
    -foo.zone/gemfeed/2023-01-23-why-grapheneos-rox.gmi (Gemini)
    -foo.zone/gemfeed/2023-01-23-why-grapheneos-rox.html
    +

    The code



    -

    Helix 2025.01 has been released. The completion ...


    +The code it generated was simple, but functional. The ./cmd/aitest/main.go file:

    -Helix 2025.01 has been released. The completion of path names and the snippet functionality will be particularly useful for me. Overall, it's a great release. The release notes cover only some highlights, but there are many more changes in this version so also have a look at the Changelog! #HelixEditor
    + +
    package main
    +
    +import (
    +	"flag"
    +	"fmt"
    +	"os"
    +
    +	"aitest/internal"
    +)
    +
    +func main() {
    +	var versionFlag bool
    +	flag.BoolVar(&versionFlag, "v", false, "print version")
    +	dir := flag.String("dir", "", "directory to count files in")
    +	flag.Parse()
    +
    +	if versionFlag {
    +		fmt.Println(internal.GetVersion())
    +		return
    +	}
    +
    +	if *dir != "" {
    +		fileCount, err := internal.CountFiles(*dir)
    +		if err != nil {
    +			fmt.Fprintf(os.Stderr, "Error counting files: %v\n", err)
    +			os.Exit(1)
    +		}
    +		fmt.Printf("Number of files in directory %s: %d\n", *dir, fileCount)
    +	} else {
    +		fmt.Println("No directory specified. No count given.")
    +	}
    +}
    +

    -helix-editor.com/news/release-25-01-highlights/
    +The ./internal/version.go file:

    -

    I found these are excellent examples of how ...


    + +
    package internal
    +
    +var Version = "0.0.0"
    +
    +func GetVersion() string {
    +	return Version
    +}
    +

    -I found these are excellent examples of how #OpenBSD's #relayd can be used.
    +The ./internal/count.go file:

    -www.tumfatig.net/2023/using-openbsd-relayd8-as-an-application-layer-gateway/
    + +
    package internal
    +
    +import (
    +	"os"
    +)
    +
    +func CountFiles(dir string) (int, error) {
    +	files, err := os.ReadDir(dir)
    +	if err != nil {
    +		return 0, err
    +	}
    +
    +	count := 0
    +	for _, file := range files {
    +		if !file.IsDir() {
    +			count++
    +		}
    +	}
    +
    +	return count, nil
    +}
    +

    -

    LLMs for Ops? Summaries of logs, probabilities ...


    +The code is quite straightforward, especially for generating boilerplate code this will be useful for many use cases!

    -LLMs for Ops? Summaries of logs, probabilities about correctness, auto-generating Ansible, some uses cases are there. Wouldn't trust it fully, though.
    +

    In-Editor Code Completion



    -youtu.be/WodaffxVq-E?si=noY0egrfl5izCSQI
    +To leverage Ollama for real-time code completion in my editor, I have integrated it with Helix, my preferred text editor. Helix supports the LSP (Language Server Protocol), which enables advanced code completion features. The lsp-ai is an LSP server that can interface with Ollama models for code completion tasks.

    -

    Enjoying an APC Power-UPS BX750MI in my ...


    +https://helix-editor.com
    +https://github.com/SilasMarvin/lsp-ai

    -Enjoying an APC Power-UPS BX750MI in my #homelab with #FreeBSD and apcupsd. I can easily use the UPS status to auto-shutdown a cluster of FreeBSD machines on a power cut. One FreeBSD machine acts as the apcupsd master, connected via USB to the APC, while the remaining machines read the status remotely via the apcupsd network port from the master. However, it won't work when the master is down. #APC #UPS
    +

    Installation of lsp-ai



    -

    "Even in the projects where I'm the only ...


    +I installed lsp-ai via Rust's Cargo package manager. (If you don't have Rust installed, you can install it via Homebrew as well.):

    -"Even in the projects where I'm the only person, there are at least three people involved: past me, present me, and future me." - Quote from #software #programming
    + +
    cargo install lsp-ai
    +

    -liw.fi/40/#index1h1
    +

    Helix Configuration



    -

    Connecting an #UPS to my #FreeBSD cluster ...


    +I edited ~/.config/helix/languages.toml to include:

    -Connecting an #UPS to my #FreeBSD cluster in my #homelab, protecting it from power cuts!
    +
    +[[language]]
    +name = "go"
    +auto-format= true
    +diagnostic-severity = "hint"
    +formatter = { command = "goimports" }
    +language-servers = [ "gopls", "golangci-lint-lsp", "lsp-ai", "gpt" ]
    +

    -foo.zone/gemfeed/2025-02-01-f3s-kubernetes-with-freebsd-part-3.gmi (Gemini)
    -foo.zone/gemfeed/2025-02-01-f3s-kubernetes-with-freebsd-part-3.html
    +Note that there is also a gpt language server configured, which is for GitHub Copilot, but it is out of scope of this blog post. Let's also configure lsp-ai settings in the same file:

    -

    So, the Co-founder and CTO of honeycomb.io and ...


    +
    +[language-server.lsp-ai]
    +command = "lsp-ai"
    +
    +[language-server.lsp-ai.config.memory]
    +file_store = { }
    +
    +[language-server.lsp-ai.config.models.model1]
    +type = "ollama"
    +model =  "qwen2.5-coder"
    +
    +[language-server.lsp-ai.config.models.model2]
    +type = "ollama"
    +model = "mistral-nemo:latest"
    +
    +[language-server.lsp-ai.config.models.model3]
    +type = "ollama"
    +model = "deepseek-r1:14b"
    +
    +[language-server.lsp-ai.config.completion]
    +model = "model1"
    +
    +[language-server.lsp-ai.config.completion.parameters]
    +max_tokens = 64
    +max_context = 8096
    +
    +## Configure the messages per your needs
    +[[language-server.lsp-ai.config.completion.parameters.messages]]
    +role = "system"
    +content = "Instructions:\n- You are an AI programming assistant.\n- Given a
    +piece of code with the cursor location marked by \"<CURSOR>\", replace
    +\"<CURSOR>\" with the correct code or comment.\n- First, think step-by-step.\n
    +- Describe your plan for what to build in pseudocode, written out in great
    +detail.\n- Then output the code replacing the \"<CURSOR>\"\n- Ensure that your
    +completion fits within the language context of the provided code snippet (e.g.,
    +Go, Ruby, Bash, Java, Puppet DSL).\n\nRules:\n- Only respond with code or
    +comments.\n- Only replace \"<CURSOR>\"; do not include any previously written
    +code.\n- Never include \"<CURSOR>\" in your response\n- If the cursor is within
    +a comment, complete the comment meaningfully.\n- Handle ambiguous cases by
    +providing the most contextually appropriate completion.\n- Be consistent with
    +your responses."
    +
    +[[language-server.lsp-ai.config.completion.parameters.messages]]
    +role = "user"
    +content = "func greet(name) {\n    print(f\"Hello, {<CURSOR>}\")\n}"
    +
    +[[language-server.lsp-ai.config.completion.parameters.messages]]
    +role = "assistant"
    +content = "name"
    +
    +[[language-server.lsp-ai.config.completion.parameters.messages]]
    +role = "user"
    +content = "func sum(a, b) {\n    return a + <CURSOR>\n}"
    +
    +[[language-server.lsp-ai.config.completion.parameters.messages]]
    +role = "assistant"
    +content = "b"
    +
    +[[language-server.lsp-ai.config.completion.parameters.messages]]
    +role = "user"
    +content = "func multiply(a, b int ) int {\n    a * <CURSOR>\n}"
    +
    +[[language-server.lsp-ai.config.completion.parameters.messages]]
    +role = "assistant"
    +content = "b"
    +
    +[[language-server.lsp-ai.config.completion.parameters.messages]]
    +role = "user"
    +content = "// <CURSOR>\nfunc add(a, b) {\n    return a + b\n}"
    +
    +[[language-server.lsp-ai.config.completion.parameters.messages]]
    +role = "assistant"
    +content = "Adds two numbers"
    +
    +[[language-server.lsp-ai.config.completion.parameters.messages]]
    +role = "user"
    +content = "// This function checks if a number is even\n<CURSOR>"
    +
    +[[language-server.lsp-ai.config.completion.parameters.messages]]
    +role = "assistant"
    +content = "func is_even(n) {\n    return n % 2 == 0\n}"
    +
    +[[language-server.lsp-ai.config.completion.parameters.messages]]
    +role = "user"
    +content = "{CODE}"
    +

    -So, the Co-founder and CTO of honeycomb.io and author of the book Observability Engineering always hated observability. And Distinguished Software Engineer and The Pragmatic Engineer host can't pronounce the word Observability. :-) No, jokes aside, I liked this podcast episode of The Pragmatic Engineer: Observability: the present and future, with Charity Majors #sre #observability
    +As you can see, I have also added other models, such as Mistral Nemo and DeepSeek R1, so that I can switch between them in Helix. Other than that, the completion parameters are interesting. They define how the LLM should interact with the text in the text editor based on the given examples.

    -newsletter.pragmaticengineer.com/p/observability-the-present-and-future
    +If you want to see more lsp-ai configuration examples, they are some for Vim and Helix in the lsp-ai git repository!

    -

    February 2025


    +

    Code completion in action



    -

    I don't know about you, but at work, I usually ...


    +The screenshot shows how Ollama's qwen2.5-coder model provides code completion suggestions within the Helix editor. LSP auto-completion is triggered by leaving the cursor at position <CURSOR> for a short period in the code snippet, and Ollama responds with relevant completions based on the context.

    -I don't know about you, but at work, I usually deal with complex setups involving thousands of servers and work in a complex hybrid microservices-based environment (cloud and on-prem), where homelabbing (as simple as described in my blog post) is really relaxing and recreative. So, I was homelabbing a bit again, securing my #FreeBSD cluster from power cuts. #UPS #recreative
    +Completing the fib-function

    -foo.zone/gemfeed/2025-02-01-f3s-kubernetes-with-freebsd-part-3.gmi (Gemini)
    -foo.zone/gemfeed/2025-02-01-f3s-kubernetes-with-freebsd-part-3.html
    +In the LSP auto-completion, the one prefixed with ai - was generated by qwen2.5-coder, the other ones are from other LSP servers (GitHub Copilot, Go linter, Go language server, etc.).

    -

    Great proposal (got accepted by the Goteam) for ...


    +I found GitHub Copilot to be still faster than qwen2.5-coder:14b, but the local LLM one is actually workable for me already. And, as mentioned earlier, things will likely improve in the future regarding local LLMs. So I am excited about the future of local LLMs and coding tools like Ollama and Helix.

    -Great proposal (got accepted by the Goteam) for safer file system open functions #golang
    +After trying qwen3-coder:30b-a3b-q4_K_M (following the publication of this blog post), I found it to be significantly faster and more capable than the previous model, making it a promising option for local coding tasks. Honestly, even my current local setup already handles routine coding stuff pretty well—better than I expected.

    -github.com/golang/go/issues/67002
    +

    Conclusion



    -

    My Gemtexter has only 1320 LOC.... The Biggest ...


    +Will there ever be a time we can run larger models (60B, 100B, ...and larger) on consumer hardware, or even on our phones? We are not quite there yet, but I am optimistic that we will see improvements in the next few years. As hardware capabilities improve and/or become cheaper, and more efficient models are developed (or new techniques will be invented to make language models more effective), the landscape of local AI coding assistants will continue to evolve.

    -My Gemtexter has only 1320 LOC.... The Biggest Shell Programs in the World are huuuge... #shell #sh
    +For now, even the models listed in this blog post are very promising already, and they run on consumer-grade hardware (at least in the realm of the initial tests I've performed... the ones in this blog post are overly simplistic, though! But they were good for getting started with Ollama and initial demonstration)! I will continue experimenting with Ollama and other local LLMs to see how they can enhance my coding experience. I may cancel my Copilot subscription, which I currently use only for in-editor auto-completion, at some point.

    -github.com/oils-for-unix/oils/wiki/The-Biggest-Shell-Programs-in-the-World
    +However, truth be told, I don't think the setup described in this blog post currently matches the performance of commercial models like Claude Code (Sonnet 4, Opus 4), Gemini 2.5 Pro, the OpenAI models and others. Maybe we could get close if we had the high-end hardware needed to run the largest Qwen Coder model available. But, as mentioned already, that is out of reach for occasional coders like me. Furthermore, I want to continue coding manually to some degree, as otherwise I will start to forget how to write for-loops, which would be awkward... However, do we always need the best model when AI can help generate boilerplate or repetitive tasks even with smaller models?

    -

    Against /tmp - He is making a point #unix ...


    +E-Mail your comments to paul@nospam.buetow.org :-)

    -Against /tmp - He is making a point #unix #linux #bsd #filesystem via @wallabagapp
    +Other related posts are:

    -dotat.at/@/2024-10-22-tmp.html
    +2025-08-05 Local LLM for Coding with Ollama on macOS (You are currently reading this)
    +2025-06-22 Task Samurai: An agentic coding learning experiment

    -

    Random Weird Things Part 2: #blog ...


    +Back to the main site
    +
    +
    +
    + + f3s: Kubernetes with FreeBSD - Part 6: Storage + + gemini://foo.zone/gemfeed/2025-07-14-f3s-kubernetes-with-freebsd-part-6.gmi + 2025-07-13T16:44:29+03:00, last updated Wed 19 Mar 2026 + + Paul Buetow aka snonux + paul@dev.buetow.org + + This is the sixth blog post about the f3s series for self-hosting demands in a home lab. f3s? The 'f' stands for FreeBSD, and the '3s' stands for k3s, the Kubernetes distribution used on FreeBSD-based physical machines. + +
    +

    f3s: Kubernetes with FreeBSD - Part 6: Storage



    -Random Weird Things Part 2: #blog #computing
    +Published at 2025-07-13T16:44:29+03:00, last updated Wed 19 Mar 2026

    -foo.zone/gemfeed/2025-02-08-random-weird-things-ii.gmi (Gemini)
    -foo.zone/gemfeed/2025-02-08-random-weird-things-ii.html
    +This is the sixth blog post about the f3s series for self-hosting demands in a home lab. f3s? The "f" stands for FreeBSD, and the "3s" stands for k3s, the Kubernetes distribution used on FreeBSD-based physical machines.

    -

    As a former #Pebble user and fan, thats ...


    +2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    +2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    +2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    +2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    +2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    +2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage (You are currently reading this)
    +2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    +2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability

    -As a former #Pebble user and fan, thats aweaome news. PebbleOS is now open source and there will aoon be a new watch. I don't know about you, but I will be the first getting one :-) #foss
    +f3s logo

    -ericmigi.com/blog/why-were-bringing-pebble-back
    +

    Table of Contents



    -

    I think I am slowly getting the point of Cue. ...


    +
    +

    Introduction



    -I think I am slowly getting the point of Cue. For example, it can replace both a JSON file and a JSON Schema. Furthermore, you can convert it from and into different formats (Cue, JSON, YAML, Go data types, ...), and you can nicely embed this into a Go project as well. #cue #cuelang #golang #configuration
    +In the previous posts, we set up a WireGuard mesh network. In the future, we will also set up a Kubernetes cluster. Kubernetes workloads often require persistent storage for databases, configuration files, and application data. Local storage on each node has significant limitations:

    -cuelang.org
    +
      +
    • No data sharing: Pods (once we run Kubernetes) on different nodes can't access the same data
    • +
    • Pod mobility: If a pod moves to another node, it loses access to its data
    • +
    • No redundancy: Hardware failure means data loss
    • +

    +This post implements a robust storage solution using:

    -

    Jonathan's reflection of 10 years of ...


    +
      +
    • CARP: For high availability with automatic IP failover
    • +
    • NFS over stunnel: For secure, encrypted network storage
    • +
    • ZFS: For data integrity, encryption, and efficient snapshots
    • +
    • zrepl: For continuous ZFS replication between nodes
    • +

    +The result is a highly available, encrypted storage system that survives node failures while providing shared storage to all Kubernetes pods.

    -Jonathan's reflection of 10 years of programming!
    +Other than what was mentioned in the first post of this blog series, we aren't using HAST, but zrepl for data replication. Read more about it later in this blog post.

    -jonathan-frere.com/posts/10-years-of-programming/
    +

    Additional storage capacity



    -

    Really enjoyed reading this. Easily digestible ...


    +We add 1 TB of additional storage to each of the nodes (f0, f1, f2) in the form of an SSD drive. The Beelink mini PCs have enough space in the chassis for the extra space.

    -Really enjoyed reading this. Easily digestible summary of what's new in Go 1.24. #golang
    +

    -antonz.org/go-1-24/
    +Upgrading the storage was as easy as unscrewing, plugging the drive in, and then screwing it back together again. The procedure was uneventful! We're using two different SSD models (Samsung 870 EVO and Crucial BX500) to avoid simultaneous failures from the same manufacturing batch.

    -

    Some great advice from 40 years of experience ...


    +We then create the zdata ZFS pool on all three nodes:

    -Some great advice from 40 years of experience as a software developer. #software #development
    + +
    paul@f0:~ % doas zpool create -m /data zdata /dev/ada1
    +paul@f0:~ % zpool list
    +NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
    +zdata   928G  12.1M   928G        -         -     0%     0%  1.00x    ONLINE  -
    +zroot   472G  29.0G   443G        -         -     0%     6%  1.00x    ONLINE  -
    +
    +paul@f0:/ % doas camcontrol devlist
    +<512GB SSD D910R170>               at scbus0 target 0 lun 0 (pass0,ada0)
    +<Samsung SSD 870 EVO 1TB SVT03B6Q>  at scbus1 target 0 lun 0 (pass1,ada1)
    +paul@f0:/ %
    +

    -liw.fi/40/#index1h1
    +To verify that we have a different SSD on the second node (the third node has the same drive as the first):

    -

    I enjoyed this talk, some recipes I knew ...


    + +
    paul@f1:/ % doas camcontrol devlist
    +<512GB SSD D910R170>               at scbus0 target 0 lun 0 (pass0,ada0)
    +<CT1000BX500SSD1 M6CR072>          at scbus1 target 0 lun 0 (pass1,ada1)
    +

    -I enjoyed this talk, some recipes I knew already, others were new to me. The "line of sight" is my favourite, which I always tend to follow. I also liked the example where the speaker simplified a "complex" nested functions into two not-nested-if-statements. #golang
    +

    ZFS encryption keys



    -www.youtube.com/watch?v=zdKHq9Xo4OY&list=WL&index=5
    +ZFS native encryption requires encryption keys to unlock datasets. We need a secure method to store these keys that balances security with operational needs:

    -

    A way of how to add the version info to the Go ...


    +
      +
    • Security: Keys must not be stored on the same disks they encrypt
    • +
    • Availability: Keys must be available at boot for automatic mounting
    • +
    • Portability: Keys should be easily moved between systems for recovery
    • +

    +Using USB flash drives as hardware key storage provides a convenient and elegant solution. The encrypted data is unreadable without physical access to the USB key, protecting against disk theft or improper disposal. In production environments, you may use enterprise key management systems; however, for a home lab, USB keys offer good security with minimal complexity.

    -A way of how to add the version info to the Go binary. ... I personally just hardcode the version number in version.go and update it there manually for each release. But with Go 1.24, I will try embedding it! #golang
    +

    UFS on USB keys



    -jerrynsh.com/3-easy-ways-to-add-version-flag-in-go/
    +We'll format the USB drives with UFS (Unix File System) rather than ZFS for simplicity. There is no need to use ZFS.

    -

    In other words, using t.Parallel() for ...


    +Let's see the USB keys:

    -In other words, using t.Parallel() for lightweight unit tests will likely make them slower.... #golang
    +USB keys

    -threedots.tech/post/go-test-parallelism/
    +To verify that the USB key (flash disk) is there:

    -

    Neat little blog post, showcasing various ...


    +
    +paul@f0:/ % doas camcontrol devlist
    +<512GB SSD D910R170>               at scbus0 target 0 lun 0 (pass0,ada0)
    +<Samsung SSD 870 EVO 1TB SVT03B6Q>  at scbus1 target 0 lun 0 (pass1,ada1)
    +<Generic Flash Disk 8.07>          at scbus2 target 0 lun 0 (da0,pass2)
    +paul@f0:/ %
    +

    -Neat little blog post, showcasing various methods unsed for generic programming before of the introduction of generics. Only reflection wasn't listed. #golang
    +Let's create the UFS file system and mount it (done on all three nodes f0, f1 and f2):

    -bitfieldconsulting.com/posts/generics
    + +
    paul@f0:/ % doas newfs /dev/da0
    +/dev/da0: 15000.0MB (30720000 sectors) block size 32768, fragment size 4096
    +        using 24 cylinder groups of 625.22MB, 20007 blks, 80128 inodes.
    +        with soft updates
    +super-block backups (for fsck_ffs -b #) at:
    + 192, 1280640, 2561088, 3841536, 5121984, 6402432, 7682880, 8963328, 10243776,
    +11524224, 12804672, 14085120, 15365568, 16646016, 17926464, 19206912,k 20487360,
    +...
    +
    +paul@f0:/ % echo '/dev/da0 /keys ufs rw 0 2' | doas tee -a /etc/fstab
    +/dev/da0 /keys ufs rw 0 2
    +paul@f0:/ % doas mkdir /keys
    +paul@f0:/ % doas mount /keys
    +paul@f0:/ % df | grep keys
    +/dev/da0             14877596       8  13687384     0%    /keys
    +

    -

    The smallest thing in Go #golang ...


    +USB keys stuck in

    -The smallest thing in Go #golang
    +

    Generating encryption keys



    -bitfieldconsulting.com/posts/iota
    +The following keys will later be used to encrypt the ZFS file systems. They will be stored on all three nodes, serving as a backup in case one of the keys is lost or corrupted. When we later replicate encrypted ZFS volumes from one node to another, the keys must also be available on the destination node.

    -

    Fun with defer in #golang, I did't know, that ...


    +
    +paul@f0:/keys % doas openssl rand -out /keys/f0.lan.buetow.org:bhyve.key 32
    +paul@f0:/keys % doas openssl rand -out /keys/f1.lan.buetow.org:bhyve.key 32
    +paul@f0:/keys % doas openssl rand -out /keys/f2.lan.buetow.org:bhyve.key 32
    +paul@f0:/keys % doas openssl rand -out /keys/f0.lan.buetow.org:zdata.key 32
    +paul@f0:/keys % doas openssl rand -out /keys/f1.lan.buetow.org:zdata.key 32
    +paul@f0:/keys % doas openssl rand -out /keys/f2.lan.buetow.org:zdata.key 32
    +paul@f0:/keys % doas chown root *
    +paul@f0:/keys % doas chmod 400 *
    +
    +paul@f0:/keys % ls -l
    +total 20
    +*r--------  1 root wheel 32 May 25 13:07 f0.lan.buetow.org:bhyve.key
    +*r--------  1 root wheel 32 May 25 13:07 f1.lan.buetow.org:bhyve.key
    +*r--------  1 root wheel 32 May 25 13:07 f2.lan.buetow.org:bhyve.key
    +*r--------  1 root wheel 32 May 25 13:07 f0.lan.buetow.org:zdata.key
    +*r--------  1 root wheel 32 May 25 13:07 f1.lan.buetow.org:zdata.key
    +*r--------  1 root wheel 32 May 25 13:07 f2.lan.buetow.org:zdata.key
    +

    -Fun with defer in #golang, I did't know, that a defer object can either be heap or stack allocated. And there are some rules for inlining, too.
    +After creation, these are copied to the other two nodes, f1 and f2, into the /keys partition (I won't provide the commands here; create a tarball, copy it over, and extract it on the destination nodes).

    -victoriametrics.com/blog/defer-in-go/
    +

    Configuring zdata ZFS pool encryption



    -

    What I like about Go is that it is still ...


    +Let's encrypt our zdata ZFS pool. We are not encrypting the whole pool, but everything within the zdata/enc data set:

    -What I like about Go is that it is still possible to understand what's going on under the hood, whereas in JVM-based languages (for example) or dynamic languages, there are too many optimizations and abstractions. However, you don't need to know too much about how it works under the hood in Go (like memory management in C). It's just the fact that you can—you have a choice. #golang
    + +
    paul@f0:/keys % doas zfs create -o encryption=on -o keyformat=raw -o \
    +  keylocation=file:///keys/`hostname`:zdata.key zdata/enc
    +paul@f0:/ % zfs list | grep zdata
    +zdata                                          836K   899G    96K  /data
    +zdata/enc                                      200K   899G   200K  /data/enc
    +
    +paul@f0:/keys % zfs get all zdata/enc | grep -E -i '(encryption|key)'
    +zdata/enc  encryption            aes-256-gcm                               -
    +zdata/enc  keylocation           file:///keys/f0.lan.buetow.org:zdata.key  local
    +zdata/enc  keyformat             raw                                       -
    +zdata/enc  encryptionroot        zdata/enc                                 -
    +zdata/enc  keystatus             available                                 -
    +

    -blog.devtrovert.com/p/goroutine-scheduler-revealed-youll
    +All future data sets within zdata/enc will inherit the same encryption key.

    -

    March 2025


    +

    Migrating Bhyve VMs to an encrypted bhyve ZFS volume



    -

    Television has somewhat transformed how I work ...


    +We set up Bhyve VMs in a previous blog post. Their ZFS data sets rely on zroot, which is the default ZFS pool on the internal 512GB NVME drive. They aren't encrypted yet, so we encrypt the VM data sets as well now. To do so, we first shut down the VMs on all three nodes:

    -Television has somewhat transformed how I work in the shell on a day-to-day basis. It is especially useful for me in navigating all the local Git repositories on my laptop. I have bound Ctrl+G in my shell for that now. #television #tv #tool #shell
    + +
    paul@f0:/keys % doas vm stop rocky
    +Sending ACPI shutdown to rocky
    +
    +paul@f0:/keys % doas vm list
    +NAME     DATASTORE  LOADER     CPU  MEMORY  VNC  AUTO     STATE
    +rocky    default    uefi       4    14G     -    Yes [1]  Stopped
    +

    -github.com/alexpasmantier/television
    +After this, we rename the unencrypted data set to _old, create a new encrypted data set, and also snapshot it as @hamburger.
    +
    + +
    paul@f0:/keys % doas zfs rename zroot/bhyve zroot/bhyve_old
    +paul@f0:/keys % doas zfs set mountpoint=/mnt zroot/bhyve_old
    +paul@f0:/keys % doas zfs snapshot zroot/bhyve_old/rocky@hamburger
    +
    +paul@f0:/keys % doas zfs create -o encryption=on -o keyformat=raw -o \
    +  keylocation=file:///keys/`hostname`:bhyve.key zroot/bhyve
    +paul@f0:/keys % doas zfs set mountpoint=/zroot/bhyve zroot/bhyve
    +paul@f0:/keys % doas zfs set mountpoint=/zroot/bhyve/rocky zroot/bhyve/rocky
    +

    -

    Once in a while, I like to read a book about a ...


    +Once done, we import the snapshot into the encrypted dataset and also copy some other metadata files from vm-bhyve back over.

    -Once in a while, I like to read a book about a programming language I have been using for a while to find new tricks or to refresh and sharpen my knowledge about it. I just finished reading "Programming Ruby 3.3," and I must say this is my favorite Ruby book now. What makes this one so special is that it is quite recent and covers all the new features. #ruby #programming #coding
    +
    +paul@f0:/keys % doas zfs send zroot/bhyve_old/rocky@hamburger | \
    +  doas zfs recv zroot/bhyve/rocky
    +paul@f0:/keys % doas cp -Rp /mnt/.config /zroot/bhyve/
    +paul@f0:/keys % doas cp -Rp /mnt/.img /zroot/bhyve/
    +paul@f0:/keys % doas cp -Rp /mnt/.templates /zroot/bhyve/
    +paul@f0:/keys % doas cp -Rp /mnt/.iso /zroot/bhyve/
    +

    -pragprog.com/titles/ruby5/programming-ruby-3-3-5th-edition/
    +We also have to make encrypted ZFS data sets mount automatically on boot:

    -

    As you may have noticed, I like to share on ...


    + +
    paul@f0:/keys % doas sysrc zfskeys_enable=YES
    +zfskeys_enable:  -> YES
    +paul@f0:/keys % doas vm init
    +paul@f0:/keys % doas reboot
    +.
    +.
    +.
    +paul@f0:~ % doas vm list
    +paul@f0:~ % doas vm list
    +NAME     DATASTORE  LOADER     CPU  MEMORY  VNC           AUTO     STATE
    +rocky    default    uefi       4    14G     0.0.0.0:5900  Yes [1]  Running (2265)
    +

    -As you may have noticed, I like to share on Mastodon and LinkedIn all the technical things I find interesting, and this blog post is technically all about that. Having said that, I love these tiny side projects. They are so relaxing to work on! #gos #golang #tool #programming #fun
    +As you can see, the VM is running. This means the encrypted zroot/bhyve was mounted successfully after the reboot! Now we can destroy the old, unencrypted, and now unused bhyve dataset:

    -foo.zone/gemfeed/2025-03-05-sharing-on-social-media-with-gos.gmi (Gemini)
    -foo.zone/gemfeed/2025-03-05-sharing-on-social-media-with-gos.html
    + +
    paul@f0:~ % doas zfs destroy -R zroot/bhyve_old
    +

    -

    Personally, I think AI (LLMs) are pretty ...


    +To verify once again that zroot/bhyve and zroot/bhyve/rocky are now both encrypted, we run:
    +
    + +
    paul@f0:~ % zfs get all zroot/bhyve | grep -E '(encryption|key)'
    +zroot/bhyve  encryption            aes-256-gcm                               -
    +zroot/bhyve  keylocation           file:///keys/f0.lan.buetow.org:bhyve.key  local
    +zroot/bhyve  keyformat             raw                                       -
    +zroot/bhyve  encryptionroot        zroot/bhyve                               -
    +zroot/bhyve  keystatus             available                                 -
    +
    +paul@f0:~ % zfs get all zroot/bhyve/rocky | grep -E '(encryption|key)'
    +zroot/bhyve/rocky  encryption            aes-256-gcm            -
    +zroot/bhyve/rocky  keylocation           none                   default
    +zroot/bhyve/rocky  keyformat             raw                    -
    +zroot/bhyve/rocky  encryptionroot        zroot/bhyve            -
    +zroot/bhyve/rocky  keystatus             available              -
    +

    -Personally, I think AI (LLMs) are pretty useful. But there's really some Hype around that. However, AI is about to stay - its not all hype
    +

    ZFS Replication with zrepl



    -unixdigest.com/articles/i-passionately-hate-hype-especially-the-ai-hype.html
    +Data replication is the cornerstone of high availability. While CARP handles IP failover (see later in this post), we need continuous data replication to ensure the backup server has current data when it becomes active. Without replication, failover would result in data loss or require shared storage (like iSCSI), which introduces a single point of failure.

    -

    Type aliases in #golang, soon also work with ...


    +

    Understanding Replication Requirements



    -Type aliases in #golang, soon also work with generics. It's an interesting feature, useful for refactorings and simplifications
    +Our storage system has different replication needs:

    -go.dev/blog/alias-names
    +
      +
    • NFS data (/data/nfs/k3svolumes): Soon, it will contain active Kubernetes persistent volumes. Needs frequent replication (every minute) to minimise data loss during failover.
    • +
    • VM data (/zroot/bhyve/freebsd): Contains VM images that change less frequently. Can tolerate longer replication intervals (every 10 minutes).
    • +

    +The 1-minute replication window is perfectly acceptable for my personal use cases. This isn't a high-frequency trading system or a real-time database—it's storage for personal projects, development work, and home lab experiments. Losing at most 1 minute of work in a disaster scenario is a reasonable trade-off for the reliability and simplicity of snapshot-based replication. Additionally, in the case of a "1 minute of data loss," I would likely still have the data available on the client side.

    -

    #Perl, my "first love" of programming ...


    +Why use zrepl instead of HAST? While HAST (Highly Available Storage) is FreeBSD's native solution for high-availability storage and supports synchronous replication—thus eliminating the mentioned 1-minute window—I've chosen zrepl for several important reasons:

    -#Perl, my "first love" of programming languages. Still there, still use it here and then (but not as my primary language at the moment). And others do so as well, apparently. Which makes me happy! :-)
    +
      +
    • HAST can cause ZFS corruption: HAST operates at the block level and doesn't understand ZFS's transactional semantics. During failover, in-flight transactions can lead to corrupted zpools. I've experienced this firsthand (I am confident I have configured something wrong) - the automatic failover would trigger while ZFS was still writing, resulting in an unmountable pool.
    • +
    • ZFS-aware replication: zrepl understands ZFS datasets and snapshots. It replicates at the dataset level, ensuring each snapshot is a consistent point-in-time copy. This is fundamentally safer than block-level replication.
    • +
    • Snapshot history: With zrepl, you get multiple recovery points (every minute for NFS data in our setup). If corruption occurs, you can roll back to any previous snapshot. HAST only gives you the current state.
    • +
    • Easier recovery: When something goes wrong with zrepl, you still have intact snapshots on both sides. With HAST, a corrupted primary often means a corrupted secondary as well.
    • +

    +FreeBSD HAST

    -dev.to/fa5tworm/why-perl-remains-indis..-..e-of-modern-programming-languages-2io0
    +

    Installing zrepl



    -

    I guess there are valid reasons for phttpdget, ...


    +First, install zrepl on both hosts involved (we will replicate data from f0 to f1):

    -I guess there are valid reasons for phttpdget, which I also don't know about? Maybe complexity and/or licensing of other tools. #FreeBSD
    + +
    paul@f0:~ % doas pkg install -y zrepl
    +

    -l33t.codes/2024/12/05/Updating-FreeBSD-and-Re-Inventing-the-Wheel/
    +Then, we verify the pools and datasets on both hosts:

    -

    This is one of the reasons why I like ...


    + +
    # On f0
    +paul@f0:~ % doas zpool list
    +NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
    +zdata   928G  1.03M   928G        -         -     0%     0%  1.00x    ONLINE  -
    +zroot   472G  26.7G   445G        -         -     0%     5%  1.00x    ONLINE  -
    +
    +paul@f0:~ % doas zfs list -r zdata/enc
    +NAME        USED  AVAIL  REFER  MOUNTPOINT
    +zdata/enc   200K   899G   200K  /data/enc
    +
    +# On f1
    +paul@f1:~ % doas zpool list
    +NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
    +zdata   928G   956K   928G        -         -     0%     0%  1.00x    ONLINE  -
    +zroot   472G  11.7G   460G        -         -     0%     2%  1.00x    ONLINE  -
    +
    +paul@f1:~ % doas zfs list -r zdata/enc
    +NAME        USED  AVAIL  REFER  MOUNTPOINT
    +zdata/enc   200K   899G   200K  /data/enc
    +

    -This is one of the reasons why I like terminal-based applications so much—they are usually more lightweight than GUI-based ones (and also more flexible).
    +Since we have a WireGuard tunnel between f0 and f1, we'll use TCP transport over the secure tunnel instead of SSH. First, check the WireGuard IP addresses:

    -www.arp242.net/stupid-light.html
    + +
    # Check WireGuard interface IPs
    +paul@f0:~ % ifconfig wg0 | grep inet
    +	inet 192.168.2.130 netmask 0xffffff00
    +
    +paul@f1:~ % ifconfig wg0 | grep inet
    +	inet 192.168.2.131 netmask 0xffffff00
    +

    -

    Advanced Concurrency Patterns with #Golang ...


    +Let's create a dedicated dataset for NFS data that will be replicated:

    -Advanced Concurrency Patterns with #Golang
    + +
    # Create the nfsdata dataset that will hold all data exposed via NFS
    +paul@f0:~ % doas zfs create zdata/enc/nfsdata
    +

    -blogtitle.github.io/go-advanced-concurrency-patterns-part-1/
    +Afterwards, we create the zrepl configuration on f0:

    -

    #SQLite was designed as an #TCL extension. ...


    + +
    paul@f0:~ % doas tee /usr/local/etc/zrepl/zrepl.yml <<'EOF'
    +global:
    +  logging:
    +    - type: stdout
    +      level: info
    +      format: human
    +
    +jobs:
    +  - name: f0_to_f1_nfsdata
    +    type: push
    +    connect:
    +      type: tcp
    +      address: "192.168.2.131:8888"
    +    filesystems:
    +      "zdata/enc/nfsdata": true
    +    send:
    +      encrypted: true
    +    snapshotting:
    +      type: periodic
    +      prefix: zrepl_
    +      interval: 1m
    +    pruning:
    +      keep_sender:
    +        - type: last_n
    +          count: 10
    +        - type: grid
    +          grid: 4x7d | 6x30d
    +          regex: "^zrepl_.*"
    +      keep_receiver:
    +        - type: last_n
    +          count: 10
    +        - type: grid
    +          grid: 4x7d | 6x30d
    +          regex: "^zrepl_.*"
    +
    +  - name: f0_to_f1_freebsd
    +    type: push
    +    connect:
    +      type: tcp
    +      address: "192.168.2.131:8888"
    +    filesystems:
    +      "zroot/bhyve/freebsd": true
    +    send:
    +      encrypted: true
    +    snapshotting:
    +      type: periodic
    +      prefix: zrepl_
    +      interval: 10m
    +    pruning:
    +      keep_sender:
    +        - type: last_n
    +          count: 10
    +        - type: grid
    +          grid: 4x7d
    +          regex: "^zrepl_.*"
    +      keep_receiver:
    +        - type: last_n
    +          count: 10
    +        - type: grid
    +          grid: 4x7d
    +          regex: "^zrepl_.*"
    +EOF
    +

    -#SQLite was designed as an #TCL extension. There are ~trillion SQLite databases in active use. SQLite heavily relies on #TCL: C code generation via mksqlite3c.tcl, C code isn't edited directly by the SQLite developers, and for testing , and for doc generation). The devs use a custom editor written in Tcl/Tk called "e" to edit the source! There's a custom versioning system Fossil, a custom chat-room written in Tcl/Tk!
    + We're using two separate replication jobs with different intervals:

    -www.tcl-lang.org/community/tcl2017/assets/talk93/Paper.html
    +
      +
    • f0_to_f1_nfsdata: Replicates NFS data every minute for faster failover recovery
    • +
    • f0_to_f1_freebsd: Replicates FreeBSD VM every ten minutes (less critical)
    • +

    +The FreeBSD VM is only used for development purposes, so it doesn't require as frequent replication as the NFS data. It's off-topic to this blog series, but it showcases how zrepl's flexibility in handling different datasets with varying replication needs.

    -

    Git provides automatic rendering of Markdown ...


    +Furthermore:

    -Git provides automatic rendering of Markdown files, including README.md, in a repository’s root directory" .... so much junk now in LLM powered search engines.... #llm #ai
    +
      +
    • We're specifically replicating zdata/enc/nfsdata instead of the entire zdata/enc dataset. This dedicated dataset will contain all the data we later want to expose via NFS, keeping a clear separation between replicated NFS data and other local encrypted data.
    • +
    • We use send: encrypted: true to keep the replication stream encrypted. While WireGuard already encrypts in transit, this provides additional protection. For reduced CPU overhead, you could set encrypted: false since the tunnel is secure.
    • +

    +

    Configuring zrepl on f1 (sink)



    -

    These are some neat little Go tips. Linters ...


    +On f1 (the sink, meaning it's the node receiving the replication data), we configure zrepl to receive the data as follows:

    -These are some neat little Go tips. Linters already tell you when you silently omit a function return value, though. The slice filter without allocation trick is nice and simple. And I agree that switch statements are preferable to if-else statements. #golang
    + +
    # First, create a dedicated sink dataset
    +paul@f1:~ % doas zfs create zdata/sink
    +
    +paul@f1:~ % doas tee /usr/local/etc/zrepl/zrepl.yml <<'EOF'
    +global:
    +  logging:
    +    - type: stdout
    +      level: info
    +      format: human
    +
    +jobs:
    +  - name: sink
    +    type: sink
    +    serve:
    +      type: tcp
    +      listen: "192.168.2.131:8888"
    +      clients:
    +        "192.168.2.130": "f0"
    +    recv:
    +      placeholder:
    +        encryption: inherit
    +    root_fs: "zdata/sink"
    +EOF
    +

    -blog.devtrovert.com/p/go-ep5-avoid-contextbackground-make
    +

    Enabling and starting zrepl services



    -

    This is a great introductory blog post about ...


    +We then enable and start zrepl on both hosts via:

    -This is a great introductory blog post about the Helix modal editor. It's also been my first choice for over a year now. I am really looking forward to the Steel plugin system, though. I don't think I need a lot of plugins, but one or two would certainly be on my wish list. #HelixEditor #Helix
    + +
    # On f0
    +paul@f0:~ % doas sysrc zrepl_enable=YES
    +zrepl_enable:  -> YES
    +paul@f0:~ % doas service `zrepl` start
    +Starting zrepl.
    +
    +# On f1
    +paul@f1:~ % doas sysrc zrepl_enable=YES
    +zrepl_enable:  -> YES
    +paul@f1:~ % doas service `zrepl` start
    +Starting zrepl.
    +

    -felix-knorr.net/posts/2025-03-16-helix-review.html
    +To check the replication status, we run:

    -

    Maps in Go under the hood #golang ...


    + +
    # On f0, check `zrepl` status (use raw mode for non-tty)
    +paul@f0:~ % doas pkg install jq
    +paul@f0:~ % doas zrepl status --mode raw | grep -A2 "Replication" | jq .
    +"Replication":{"StartAt":"2025-07-01T22:31:48.712143123+03:00"...
    +
    +# Check if services are running
    +paul@f0:~ % doas service zrepl status
    +zrepl is running as pid 2649.
    +
    +paul@f1:~ % doas service zrepl status
    +zrepl is running as pid 2574.
    +
    +# Check for `zrepl` snapshots on source
    +paul@f0:~ % doas zfs list -t snapshot -r zdata/enc | grep zrepl
    +zdata/enc@zrepl_20250701_193148_000    0B      -   176K  -
    +
    +# On f1, verify the replicated datasets  
    +paul@f1:~ % doas zfs list -r zdata | grep f0
    +zdata/f0             576K   899G   200K  none
    +zdata/f0/zdata       376K   899G   200K  none
    +zdata/f0/zdata/enc   176K   899G   176K  none
    +
    +# Check replicated snapshots on f1
    +paul@f1:~ % doas zfs list -t snapshot -r zdata | grep zrepl
    +zdata/f0/zdata/enc@zrepl_20250701_193148_000     0B      -   176K  -
    +zdata/f0/zdata/enc@zrepl_20250701_194148_000     0B      -   176K  -
    +.
    +.
    +.
    +

    -Maps in Go under the hood #golang
    +

    Monitoring replication



    -victoriametrics.com/blog/go-map/
    +You can monitor the replication progress with:

    -

    I found that working on multiple side projects ...


    + +
    paul@f0:~ % doas zrepl status
    +

    -I found that working on multiple side projects concurrently is better than concentrating on just one. This seems inefficient, but if you to lose motivation, you can temporarily switch to another one with full élan. Remember to stop starting and start finishing. This doesn't mean you should be working on 10+ side projects concurrently! Select your projects and commit to finishing them before starting the next thing. For example, my current limit of concurrent side projects is around five.
    +zrepl status

    -

    I have been in incidents. Understandably, ...


    +With this setup, both zdata/enc/nfsdata and zroot/bhyve/freebsd on f0 will be automatically replicated to f1 every 1 minute (or 10 minutes in the case of the FreeBSD VM), with encrypted snapshots preserved on both sides. The pruning policy ensures that we keep the last 10 snapshots while managing disk space efficiently.

    -I have been in incidents. Understandably, everyone wants the issue to be resolved as quickly and others want to know how long TTR will be. IMHO, providing no estimates at all is no solution either. So maybe give a rough estimate but clearly communicate that the estimate is rough and that X, Y, and Z can interfere, meaning there is a chance it will take longer to resolve the incident. Just my thought. What's yours?
    +The replicated data appears on f1 under zdata/sink/ with the source host and dataset hierarchy preserved:

    -firehydrant.com/blog/hot-take-dont-provide-incident-resolution-estimates/
    +
      +
    • zdata/enc/nfsdatazdata/sink/f0/zdata/enc/nfsdata
    • +
    • zroot/bhyve/freebsdzdata/sink/f0/zroot/bhyve/freebsd
    • +

    +This is by design - zrepl preserves the complete path from the source to ensure there are no conflicts when replicating from multiple sources.

    -

    I dont understand what it is. Certificates are ...


    +

    Verifying replication after reboot



    -I dont understand what it is. Certificates are so easy to monitor but still, expirations cause so many incidents. #sre
    +The zrepl service is configured to start automatically at boot. After rebooting both hosts:

    -securityboulevard.com/2024/10/dont-let..-..time-prevent-outages-with-a-smart-clm/
    + +
    paul@f0:~ % uptime
    +11:17PM  up 1 min, 0 users, load averages: 0.16, 0.06, 0.02
    +
    +paul@f0:~ % doas service `zrepl` status
    +zrepl is running as pid 2366.
    +
    +paul@f1:~ % doas service `zrepl` status
    +zrepl is running as pid 2309.
    +
    +# Check that new snapshots are being created and replicated
    +paul@f0:~ % doas zfs list -t snapshot | grep `zrepl` | tail -2
    +zdata/enc/nfsdata@zrepl_20250701_202530_000                0B      -   200K  -
    +zroot/bhyve/freebsd@zrepl_20250701_202530_000               0B      -  2.97G  -
    +.
    +.
    +.
    +
    +paul@f1:~ % doas zfs list -t snapshot -r zdata/sink | grep 202530
    +zdata/sink/f0/zdata/enc/nfsdata@zrepl_20250701_202530_000      0B      -   176K  -
    +zdata/sink/f0/zroot/bhyve/freebsd@zrepl_20250701_202530_000     0B      -  2.97G  -
    +.
    +.
    +.
    +

    -

    Don't just blindly trust LLMs. I recently ...


    +The timestamps confirm that replication resumed automatically after the reboot, ensuring continuous data protection. We can also write a test file to the NFS data directory on f0 and verify whether it appears on f1 after a minute.

    -Don't just blindly trust LLMs. I recently trusted an LLM, spent 1 hour debugging, and ultimately had to verify my assumption about fcntl behavior regarding inherited file descriptors in child processes manually with a C program, as the manual page wasn't clear to me. I could have done that immediately and I would have been done within 10 minutes. #productivity #loss #llm #programming #C
    +

    Understanding Failover Limitations and Design Decisions



    -

    April 2025


    +Our system intentionally fails over to a read-only copy of the replica in the event of the primary's failure. This is due to the nature of zrepl, which only replicates data in one direction. If we mount the data set on the sink node in read-write mode, it would cause the ZFS dataset to diverge from the original, and the replication would break. It can still be mounted read-write on the sink node in case of a genuine issue on the primary node, but that step is left intentionally manual. Therefore, we don't need to fix the replication later on manually.

    -

    I knew about any being equivalent to ...


    +So in summary:

    -I knew about any being equivalent to interface{} in #Golang, but wasn't aware, that it was introduced to Go because of the generics.
    +
      +
    • Split-brain prevention: Automatic failover to a read-write copy can cause both nodes to become active simultaneously if network communication fails. This leads to data divergence that's extremely difficult to resolve.
    • +
    • False positive protection: Temporary network issues or high load can trigger unwanted failovers. Manual intervention ensures that failovers occur only when truly necessary.
    • +
    • Data integrity over availability: For storage systems, data consistency is paramount. A few minutes of downtime is preferable to data corruption in this specific use case.
    • +
    • Simplified recovery: With manual failover, you always know which dataset is authoritative, making recovery more straightforward.
    • +

    +

    Mounting the NFS datasets



    -

    Neat summary of new #Perl features per ...


    +To make the NFS data accessible on both nodes, we need to mount it. On f0, this is straightforward:

    -Neat summary of new #Perl features per release
    + +
    # On f0 - set mountpoint for the primary nfsdata
    +paul@f0:~ % doas zfs set mountpoint=/data/nfs zdata/enc/nfsdata
    +paul@f0:~ % doas mkdir -p /data/nfs
    +
    +# Verify it's mounted
    +paul@f0:~ % df -h /data/nfs
    +Filesystem           Size    Used   Avail Capacity  Mounted on
    +zdata/enc/nfsdata    899G    204K    899G     0%    /data/nfs
    +

    -sheet.shiar.nl/perl
    +On f1, we need to handle the encryption key and mount the standby copy:

    -

    errors.As() checks for the error type, whereas ...


    + +
    # On f1 - first check encryption status
    +paul@f1:~ % doas zfs get keystatus zdata/sink/f0/zdata/enc/nfsdata
    +NAME                             PROPERTY   VALUE        SOURCE
    +zdata/sink/f0/zdata/enc/nfsdata  keystatus  unavailable  -
    +
    +# Load the encryption key (using f0's key stored on the USB)
    +paul@f1:~ % doas zfs load-key -L file:///keys/f0.lan.buetow.org:zdata.key \
    +    zdata/sink/f0/zdata/enc/nfsdata
    +
    +# Set mountpoint and mount (same path as f0 for easier failover)
    +paul@f1:~ % doas mkdir -p /data/nfs
    +paul@f1:~ % doas zfs set mountpoint=/data/nfs zdata/sink/f0/zdata/enc/nfsdata
    +paul@f1:~ % doas zfs mount zdata/sink/f0/zdata/enc/nfsdata
    +
    +# Make it read-only to prevent accidental writes that would break replication
    +paul@f1:~ % doas zfs set readonly=on zdata/sink/f0/zdata/enc/nfsdata
    +
    +# Verify
    +paul@f1:~ % df -h /data/nfs
    +Filesystem                         Size    Used   Avail Capacity  Mounted on
    +zdata/sink/f0/zdata/enc/nfsdata    896G    204K    896G     0%    /data/nfs
    +

    -errors.As() checks for the error type, whereas errors.Is() checks for the exact error value. Interesting read about Errors in #golang - and there is also a cat meme in the middle of the blog post! And then, it continues with pointers to pointers to error values or how about a pointer to an empty interface?
    +Note: The dataset is mounted at the same path (/data/nfs) on both hosts to simplify failover procedures. The dataset on f1 is set to readonly=on to prevent accidental modifications, which, as mentioned earlier, would break replication. If we did, replication from f0 to f1 would fail like this:

    -adrianlarion.com/golang-error-handling..-..-errors-unwrap-custom-errors-and-more/
    +cannot receive incremental stream: destination zdata/sink/f0/zdata/enc/nfsdata has been modified since most recent snapshot

    -

    Good stuff: 10 years of functional options and ...


    +To fix a broken replication after accidental writes, we can do:

    -Good stuff: 10 years of functional options and key lessons Learned along the way #golang
    + +
    # Option 1: Rollback to the last common snapshot (loses local changes)
    +paul@f1:~ % doas zfs rollback zdata/sink/f0/zdata/enc/nfsdata@zrepl_20250701_204054_000
    +
    +# Option 2: Make it read-only to prevent accidents again
    +paul@f1:~ % doas zfs set readonly=on zdata/sink/f0/zdata/enc/nfsdata
    +

    -www.bytesizego.com/blog/10-years-functional-options-golang
    +And replication should work again!

    -

    I had some fun with #FreeBSD, #Bhyve and ...


    +

    Troubleshooting: Files not appearing in replication



    -I had some fun with #FreeBSD, #Bhyve and #Rocky #Linux. Not just for fun, also for science and profit! #homelab #selfhosting #self-hosting
    +If you write files to /data/nfs/ on f0 but they don't appear on f1, check if the dataset is mounted on f0?

    -foo.zone/gemfeed/2025-04-05-f3s-kubernetes-with-freebsd-part-4.gmi (Gemini)
    -foo.zone/gemfeed/2025-04-05-f3s-kubernetes-with-freebsd-part-4.html
    + +
    paul@f0:~ % doas zfs list -o name,mountpoint,mounted | grep nfsdata
    +zdata/enc/nfsdata                             /data/nfs             yes
    +

    -

    The moment your blog receives PRs for typo ...


    +If it shows no, the dataset isn't mounted! This means files are being written to the root filesystem, not ZFS. Next, we should check whether the encryption key is loaded:

    -The moment your blog receives PRs for typo corrections, you notice, that people are actually reading and care about your stuff :-) #blog #personal #tech
    + +
    paul@f0:~ % doas zfs get keystatus zdata/enc/nfsdata
    +NAME               PROPERTY   VALUE        SOURCE
    +zdata/enc/nfsdata  keystatus  available    -
    +# If "unavailable", load the key:
    +paul@f0:~ % doas zfs load-key -L file:///keys/f0.lan.buetow.org:zdata.key zdata/enc/nfsdata
    +paul@f0:~ % doas zfs mount zdata/enc/nfsdata
    +

    -

    One thing not mentioned is that #OpenRsync's ...


    +You can also verify that files are in the snapshot (not just the directory):

    -One thing not mentioned is that #OpenRsync's origin is the #OpenBSD project (at least as far as I am aware! Correct me if I am wrong :-) )! #openbsd #rsync #macos #openrsync
    + +
    paul@f0:~ % ls -la /data/nfs/.zfs/snapshot/zrepl_*/
    +

    -derflounder.wordpress.com/2025/04/06/r..-..laced-with-openrsync-on-macos-sequoia/
    +This issue commonly occurs after a reboot if the encryption keys aren't configured to load automatically.

    -

    This is an interesting #Elixir pipes operator ...


    +

    Configuring automatic key loading on boot



    -This is an interesting #Elixir pipes operator experiment in #Ruby. #Python has also been experimenting with such an operator. Raku (not mentioned in the linked article) already has the ==> sequence operator, of course (which can also can be used backwards <== - who has doubted? :-) ). #syntax #codegolf #fun #coding #RakuLang
    +To ensure all additional encrypted datasets are mounted automatically after reboot as well, we do:

    -zverok.space/blog/2024-11-16-elixir-pipes.html
    + +
    # On f0 - configure all encrypted datasets
    +paul@f0:~ % doas sysrc zfskeys_enable=YES
    +zfskeys_enable: YES -> YES
    +paul@f0:~ % doas sysrc zfskeys_datasets="zdata/enc zdata/enc/nfsdata zroot/bhyve"
    +zfskeys_datasets:  -> zdata/enc zdata/enc/nfsdata zroot/bhyve
    +
    +# Set correct key locations for all datasets
    +paul@f0:~ % doas zfs set \
    +  keylocation=file:///keys/f0.lan.buetow.org:zdata.key zdata/enc/nfsdata
    +
    +# On f1 - include the replicated dataset
    +paul@f1:~ % doas sysrc zfskeys_enable=YES
    +zfskeys_enable: YES -> YES
    +paul@f1:~ % doas sysrc \
    +  zfskeys_datasets="zdata/enc zroot/bhyve zdata/sink/f0/zdata/enc/nfsdata"
    +zfskeys_datasets:  -> zdata/enc zroot/bhyve zdata/sink/f0/zdata/enc/nfsdata
    +
    +# Set key location for replicated dataset
    +paul@f1:~ % doas zfs set \
    +  keylocation=file:///keys/f0.lan.buetow.org:zdata.key zdata/sink/f0/zdata/enc/nfsdata
    +

    -

    The story of how my favorite #Golang book was ...


    +Important notes:

    -The story of how my favorite #Golang book was written:
    +
      +
    • Each encryption root needs its own key load entry
    • +
    • The replicated dataset on f1 uses the same encryption key as the source on f0
    • +
    • Always verify datasets are mounted after reboot with zfs list -o name,mounted
    • +
    • Critical: Always ensure the replicated dataset on f1 remains read-only with doas zfs set readonly=on zdata/sink/f0/zdata/enc/nfsdata
    • +

    +

    Troubleshooting: zrepl Replication Not Working



    -www.thecoder.cafe/p/100-go-mistakes
    +If zrepl replication is not working, here's a systematic approach to diagnose and fix common issues:

    -

    These are my personal book notes from Daniel ...


    +

    Check if zrepl Services are Running



    -These are my personal book notes from Daniel Pink's "When: The Scientific Secrets of Perfect Timing." The notes are for me (to improve happiness and productivity). You still need to read the whole book to get your own insights, but maybe the notes will be useful for you as well. #blog #book #booknotes #productivity
    +First, verify that zrepl is running on both nodes:

    -foo.zone/gemfeed/2025-04-19-when-book-notes.gmi (Gemini)
    -foo.zone/gemfeed/2025-04-19-when-book-notes.html
    + +
    # Check service status on both f0 and f1
    +paul@f0:~ % doas service zrepl status
    +paul@f1:~ % doas service zrepl status
    +
    +# If not running, start the service
    +paul@f0:~ % doas service zrepl start
    +paul@f1:~ % doas service zrepl start
    +

    -

    I certainly learned a lot reading this #llm ...


    +

    Check zrepl Status for Errors



    -I certainly learned a lot reading this #llm #coding #programming
    +Use the status command to see detailed error information:

    -simonwillison.net/2025/Mar/11/using-llms-for-code/
    + +
    # Check detailed status (use --mode raw for non-tty environments)
    +paul@f0:~ % doas zrepl status --mode raw
    +
    +# Look for error messages in the replication section
    +# Common errors include "no common snapshot" or connection failures
    +

    -

    Writing indempotent #Bash scripts ...


    +

    Fixing "No Common Snapshot" Errors



    -Writing indempotent #Bash scripts
    +This is the most common replication issue, typically occurring when:

    -arslan.io/2019/07/03/how-to-write-idempotent-bash-scripts/
    +
      +
    • The receiver has existing snapshots that don't match the sender
    • +
    • Different snapshot naming schemes are in use
    • +
    • The receiver dataset was created independently
    • +

    +**Error message example:**
    +
    +no common snapshot or suitable bookmark between sender and receiver
    +

    -

    Regarding #AI for code generation. You should ...


    +**Solution: Clean up conflicting snapshots on receiver**

    -Regarding #AI for code generation. You should be at least a bit curious and exleriement a bit. You don't have to use it if you don't see fit purpose.
    + +
    # First, identify the destination dataset on f1
    +paul@f1:~ % doas zfs list | grep sink
    +
    +# Check existing snapshots on the problematic dataset
    +paul@f1:~ % doas zfs list -t snapshot | grep nfsdata
    +
    +# If you see snapshots with different naming (e.g., @daily-*, @weekly-*)
    +# these conflict with zrepl's @zrepl_* snapshots
    +
    +# Destroy the entire destination dataset to allow clean replication
    +paul@f1:~ % doas zfs destroy -r zdata/sink/f0/zdata/enc/nfsdata
    +
    +# For VM replication, do the same for the freebsd dataset
    +paul@f1:~ % doas zfs destroy -r zdata/sink/f0/zroot/bhyve/freebsd
    +
    +# Wake up zrepl to start fresh replication
    +paul@f0:~ % doas zrepl signal wakeup f0_to_f1_nfsdata
    +paul@f0:~ % doas zrepl signal wakeup f0_to_f1_freebsd
    +
    +# Check replication status
    +paul@f0:~ % doas zrepl status --mode raw
    +

    -registerspill.thorstenball.com/p/they-..-..email=true&r=2n9ive&triedRedirect=true
    +**Verification that replication is working:**

    -

    I like the Rocky metaphor. And this post also ...


    + +
    # Look for "stepping" state and active zfs send processes
    +paul@f0:~ % doas zrepl status --mode raw | grep -A5 "State.*stepping"
    +
    +# Check for active ZFS commands
    +paul@f0:~ % doas zrepl status --mode raw | grep -A10 "ZFSCmds.*Active"
    +
    +# Monitor progress - bytes replicated should be increasing
    +paul@f0:~ % doas zrepl status --mode raw | grep BytesReplicated
    +

    -I like the Rocky metaphor. And this post also reflects my thoughts on coding. #llm #ai #software
    +

    Network Connectivity Issues



    -cekrem.github.io/posts/coding-as-craft-going-back-to-the-old-gym/
    +If replication fails to connect:

    -

    May 2025


    + +
    # Test connectivity between nodes
    +paul@f0:~ % nc -zv 192.168.2.131 8888
    +
    +# Check if zrepl is listening on f1
    +paul@f1:~ % doas netstat -an | grep 8888
    +
    +# Verify WireGuard tunnel is working
    +paul@f0:~ % ping 192.168.2.131
    +

    -

    There's now also a #Fish shell edition of my ...


    +

    Encryption Key Issues



    -There's now also a #Fish shell edition of my #tmux helper scripts: #fishshell
    +If encrypted replication fails:

    -foo.zone/gemfeed/2025-05-02-terminal-multiplexing-with-tmux-fish-edition.gmi (Gemini)
    -foo.zone/gemfeed/2025-05-02-terminal-multiplexing-with-tmux-fish-edition.html
    + +
    # Verify encryption keys are available on both nodes
    +paul@f0:~ % doas zfs get keystatus zdata/enc/nfsdata
    +paul@f1:~ % doas zfs get keystatus zdata/sink/f0/zdata/enc/nfsdata
    +
    +# Load keys if unavailable
    +paul@f1:~ % doas zfs load-key -L file:///keys/f0.lan.buetow.org:zdata.key \
    +    zdata/sink/f0/zdata/enc/nfsdata
    +

    -

    I loved this talk. It's about how you can ...


    +

    Monitoring Ongoing Replication



    -I loved this talk. It's about how you can create your own #Linux #container without Docker, using less than 100 lines of shell code without Docker or Podman and co. - Why is this talk useful? If you understand how #containers work "under the hood," it becomes easier to make design decisions, write your own tools, or debug production systems. I also recommend his training courses, of which I visited one once.
    +After fixing issues, monitor replication health:

    -www.youtube.com/watch?v=4RUiVAlJE2w
    + +
    # Monitor replication progress (run repeatedly to check status)
    +paul@f0:~ % doas zrepl status --mode raw | grep -A10 BytesReplicated
    +
    +# Or install watch from ports and use it
    +paul@f0:~ % doas pkg install watch
    +paul@f0:~ % watch -n 5 'doas zrepl status --mode raw | grep -A10 BytesReplicated'
    +
    +# Check for new snapshots being created
    +paul@f0:~ % doas zfs list -t snapshot | grep zrepl | tail -5
    +
    +# Verify snapshots appear on receiver
    +paul@f1:~ % doas zfs list -t snapshot -r zdata/sink | grep zrepl | tail -5
    +

    -

    Some unexpected #golang stuff, ppl say, that ...


    +This troubleshooting process resolves the most common zrepl issues and ensures continuous data replication between your storage nodes.

    -Some unexpected #golang stuff, ppl say, that Go is a simple language. IMHO the devil is in the details.
    +

    CARP (Common Address Redundancy Protocol)



    -unexpected-go.com/
    +High availability is crucial for storage systems. If the storage server goes down, all NFS clients (which will also be Kubernetes pods later on in this series) lose access to their persistent data. CARP provides a solution by creating a virtual IP address that automatically migrates to a different server during failures. This means that clients point to that VIP for NFS mounts and are always contacting the current primary node.

    -

    With the advent of AI and LLMs, I have observed ...


    +

    How CARP Works



    -With the advent of AI and LLMs, I have observed that being able to type quickly has become even more important for engineers. Previously, fast typing wasn't as crucial when coding, as most of the time was spent thinking or navigating through the code. However, with LLMs, you find yourself typing much more frequently. That's an unexpected personal win for me, as I recently learned fast touch typing: #llm #coding #programming
    +In our case, CARP allows two hosts (f0 and f1) to share a virtual IP address (VIP). The hosts communicate using multicast to elect a MASTER, while the other remain as BACKUP. When the MASTER fails, the BACKUP automatically promotes itself, and the VIP is reassigned to the new MASTER. This happens within seconds.

    -foo.zone/gemfeed/2024-08-05-typing-127.1-words-per-minute.gmi (Gemini)
    -foo.zone/gemfeed/2024-08-05-typing-127.1-words-per-minute.html
    +Key benefits for our storage system:

    -

    For science, fun and profit, I set up a ...


    +
      +
    • Automatic failover: No manual intervention is required for basic failures, although there are a few limitations. The backup will have read-only access to the available data by default, as we have already learned.
    • +
    • Transparent to clients: Pods continue using the same IP address
    • +
    • Works with stunnel: Behind the VIP, there will be a stunnel process running, which ensures encrypted connections follow the active server.
    • +

    +FreeBSD CARP
    +Stunnel

    -For science, fun and profit, I set up a #WireGuard mesh network for my #FreeBSD, #OpenBSD, #RockyLinux and #Kubernetes #homelab: There's also a mesh generator, which I wrote in #Ruby. #k3s #linux #k8s #k3s
    +

    Configuring CARP



    -foo.zone/gemfeed/2025-05-11-f3s-kubernetes-with-freebsd-part-5.gmi (Gemini)
    -foo.zone/gemfeed/2025-05-11-f3s-kubernetes-with-freebsd-part-5.html
    +First, we add the CARP configuration to /etc/rc.conf on both f0 and f1:

    -

    Ever wondered about the hung task Linux ...


    +Update: Sun 4 Jan 00:17:00 EET 2026 - Added advskew 100 to f1 so f0 always wins CARP elections when it comes back online after a reboot.

    -Ever wondered about the hung task Linux messages on a busy server? Every case is unique, and there is no standard approach to debug them, but here it gets a bit demystified: #linux #kernel
    + +
    # On f0 - The virtual IP 192.168.1.138 will float between f0 and f1
    +ifconfig_re0_alias0="inet vhid 1 pass testpass alias 192.168.1.138/32"
    +
    +# On f1 - Higher advskew means lower priority, so f0 wins elections
    +ifconfig_re0_alias0="inet vhid 1 advskew 100 pass testpass alias 192.168.1.138/32"
    +

    -blog.cloudflare.com/searching-for-the-cause-of-hung-tasks-in-the-linux-kernel/
    +Whereas:

    -

    A bit of #fun: The FORTRAN hating gateway ― ...


    +
      +
    • vhid 1: Virtual Host ID - must match on all CARP members
    • +
    • advskew: Advertisement skew - higher value means lower priority (f1 uses 100, f0 uses default 0)
    • +
    • pass testpass: Password for CARP authentication (if you follow this, use a different password!)
    • +
    • alias 192.168.1.138/32: The virtual IP address with a /32 netmask
    • +

    +Next, update /etc/hosts on all nodes (f0, f1, f2, r0, r1, r2) to resolve the VIP hostname:

    -A bit of #fun: The FORTRAN hating gateway ― Andreas Zwinkau
    +
    +192.168.2.138 f3s-storage-ha f3s-storage-ha.wg0 f3s-storage-ha.wg0.wan.buetow.org
    +fd42:beef:cafe:2::138 f3s-storage-ha f3s-storage-ha.wg0 f3s-storage-ha.wg0.wan.buetow.org
    +

    -beza1e1.tuxen.de/lore/fortran_hating_gateway.html
    +This allows clients to connect to f3s-storage-ha regardless of which physical server is currently the MASTER.

    -

    So, Golang was invented while engineers at ...


    +

    CARP State Change Notifications



    -So, Golang was invented while engineers at Google waited for C++ to compile. Here I am, waiting a long time for Java to compile...
    +To correctly manage services during failover, we need to detect CARP state changes. FreeBSD's devd system can notify us when CARP transitions between MASTER and BACKUP states.

    -

    I couldn't do without here-docs. If they did ...


    +Add this to /etc/devd.conf on both f0 and f1:

    -I couldn't do without here-docs. If they did not exist, I would need to find another field and pursue a career there. #bash #sh #shell
    + +
    paul@f0:~ % cat <<END | doas tee -a /etc/devd.conf
    +notify 0 {
    +        match "system"          "CARP";
    +        match "subsystem"       "[0-9]+@[0-9a-z.]+";
    +        match "type"            "(MASTER|BACKUP)";
    +        action "/usr/local/bin/carpcontrol.sh $subsystem $type";
    +};
    +END
    +
    +paul@f0:~ % doas service devd restart
    +

    -rednafi.com/misc/heredoc_headache/
    +Next, we create the CARP control script that will restart stunnel when the CARP state changes:

    -

    I started using computers as a kid on MS-DOS ...


    +Update: Fixed the script at Sat 3 Jan 23:55:11 EET 2026 - changed $1 to $2 because devd passes $subsystem $type, so the state is in the second argument.

    -I started using computers as a kid on MS-DOS and mainly used Norton Commander to navigate the file system in order to start games. Later, I became more interested in computing in general and switched to Linux, but there was no NC. However, there was GNU Midnight Commander, which I still use regularly to this day. It's absolutely worth checking out, even in the modern day. #tools #opensource
    + +
    paul@f0:~ % doas tee /usr/local/bin/carpcontrol.sh <<'EOF'
    +#!/bin/sh
    +# CARP state change control script
    +
    +case "$2" in
    +    MASTER)
    +        logger "CARP state changed to MASTER, starting services"
    +        ;;
    +    BACKUP)
    +        logger "CARP state changed to BACKUP, stopping services"
    +        ;;
    +    *)
    +        logger "CARP state changed to $2 (unhandled)"
    +        ;;
    +esac
    +EOF
    +
    +paul@f0:~ % doas chmod +x /usr/local/bin/carpcontrol.sh
    +
    +# Copy the same script to f1
    +paul@f0:~ % scp /usr/local/bin/carpcontrol.sh f1:/tmp/
    +paul@f1:~ % doas mv /tmp/carpcontrol.sh /usr/local/bin/
    +paul@f1:~ % doas chmod +x /usr/local/bin/carpcontrol.sh
    +

    -en.wikipedia.org/wiki/Midnight_Commander
    +Note that carpcontrol.sh doesn't do anything useful yet. We will provide more details (including starting and stopping services upon failover) later in this blog post.

    -

    Thats interesting, running #Android in ...


    +To enable CARP in /boot/loader.conf, run:

    -Thats interesting, running #Android in #Kubernetes
    + +
    paul@f0:~ % echo 'carp_load="YES"' | doas tee -a /boot/loader.conf
    +carp_load="YES"
    +paul@f1:~ % echo 'carp_load="YES"' | doas tee -a /boot/loader.conf  
    +carp_load="YES"
    +

    -ku.bz/Gs4-wpK5h
    +Then reboot both hosts or run doas kldload carp to load the module immediately.

    -

    Before wiping the pre-installed #Windows 11 ...


    +

    NFS Server Configuration



    -Before wiping the pre-installed #Windows 11 Pro on my new Beelink mini PC, I tested #WSL2 with #Fedora #Linux. I compiled my pet project, I/O Riot NG (ior), which requires many system libraries, including #BPF. I’m impressed—everything works just like on native Fedora, and my tool runs and traces I/O syscalls with BPF out of the box. I might would prefer now Windows over MacOS if I had to chose between those two for work.
    +With ZFS replication in place, we can now set up NFS servers on both f0 and f1 to export the replicated data. Since native NFS over TLS (RFC 9289) has compatibility issues between Linux and FreeBSD (not digging into the details here, but I couldn't get it to work), we'll use stunnel to provide encryption.

    -codeberg.org/snonux/ior
    +

    Setting up NFS on f0 (Primary)



    -

    Some might hate me saying this, but didnt ...


    +First, enable the NFS services in rc.conf:

    -Some might hate me saying this, but didnt #systemd solve the problem of a shared /tmp directory by introducing PrivateTmp?? but yes why did it have to go that way...
    + +
    paul@f0:~ % doas sysrc nfs_server_enable=YES
    +nfs_server_enable: YES -> YES
    +paul@f0:~ % doas sysrc nfsv4_server_enable=YES
    +nfsv4_server_enable: YES -> YES
    +paul@f0:~ % doas sysrc nfsuserd_enable=YES
    +nfsuserd_enable: YES -> YES
    +paul@f0:~ % doas sysrc nfsuserd_flags="-domain lan.buetow.org"
    +nfsuserd_flags: "" -> "-domain lan.buetow.org"
    +paul@f0:~ % doas sysrc mountd_enable=YES
    +mountd_enable: NO -> YES
    +paul@f0:~ % doas sysrc rpcbind_enable=YES
    +rpcbind_enable: NO -> YES
    +

    -www.osnews.com/story/140968/tmp-should-not-exist/
    +Update: 08.08.2025: I've added the domain to nfsuserd_flags

    -

    Wouldn't still do that, even with 100% test ...


    +And we also create a dedicated directory for Kubernetes volumes:

    -Wouldn't still do that, even with 100% test coverage, LT and integration tests, unless theres an exception the business relies on #sre
    + +
    # First, ensure the dataset is mounted
    +paul@f0:~ % doas zfs get mounted zdata/enc/nfsdata
    +NAME               PROPERTY  VALUE    SOURCE
    +zdata/enc/nfsdata  mounted   yes      -
    +
    +# Create the k3svolumes directory
    +paul@f0:~ % doas mkdir -p /data/nfs/k3svolumes
    +paul@f0:~ % doas chmod 755 /data/nfs/k3svolumes
    +

    -medium.com/openclassrooms-product-desi..-..g/do-not-deploy-on-friday-92b1b46ebfe6
    +We also create the /etc/exports file. Since we're using stunnel for encryption, ALL clients must connect through stunnel, which appears as localhost (127.0.0.1) to the NFS server:

    -

    Some neat slice tricks for Go: #golang ...


    + +
    paul@f0:~ % doas tee /etc/exports <<'EOF'
    +V4: /data/nfs -sec=sys
    +/data/nfs -alldirs -maproot=root -network 127.0.0.1 -mask 255.255.255.255
    +EOF
    +

    -Some neat slice tricks for Go: #golang
    +The exports configuration:

    -blog.devtrovert.com/p/12-slice-tricks-to-enhance-your-go
    +
      +
    • V4: /data/nfs -sec=sys: Sets the NFSv4 root directory to /data/nfs
    • +
    • -maproot=root: Maps root user from client to root on server
    • +
    • -network 127.0.0.1: Only accepts connections from localhost (stunnel)
    • +

    +To start the NFS services, we run:

    -

    I understand that Kubernetes is not for ...


    + +
    paul@f0:~ % doas service rpcbind start
    +Starting rpcbind.
    +paul@f0:~ % doas service mountd start
    +Starting mountd.
    +paul@f0:~ % doas service nfsd start
    +Starting nfsd.
    +paul@f0:~ % doas service nfsuserd start
    +Starting nfsuserd.
    +

    -I understand that Kubernetes is not for everyone, but it still seems to be the new default for everything newly built. Despite the fact that Kubernetes is complex to maintain and use, there is still a lot of SRE/DevOps talent out there who have it on their CVs, which contributes significantly to the supportability of the infrastructure and the applications running on it. This way, you don't have to teach every new engineer your "own way" infrastructure. It's like a standard language of infrastructure that many people speak. However, Kubernetes should not be the default solution for everything, in my opinion. #kubernetes #k8s
    +

    Configuring Stunnel for NFS Encryption with CARP Failover



    -www.gitpod.io/blog/we-are-leaving-kubernetes
    +Using stunnel with client certificate authentication for NFS encryption provides several advantages:

    -

    June 2025


    +
      +
    • Compatibility: Works with any NFS version and between different operating systems
    • +
    • Strong encryption: Uses TLS/SSL with configurable cipher suites
    • +
    • Transparent: Applications don't need modification, encryption happens at the transport layer
    • +
    • Performance: Minimal overhead (~2% in benchmarks)
    • +
    • Flexibility: Can encrypt any TCP-based protocol, not just NFS
    • +
    • Strong Authentication: Client certificates provide cryptographic proof of identity
    • +
    • Access Control: Only clients with valid certificates signed by your CA can connect
    • +
    • Certificate Revocation: You can revoke access by removing certificates from the CA
    • +

    +Stunnel integrates seamlessly with our CARP setup:

    -

    Some great advices, will try out some of them! ...


    +
    +                    CARP VIP (192.168.1.138)
    +                           |
    +    f0 (MASTER) ←---------→|←---------→ f1 (BACKUP)
    +    stunnel:2323           |           stunnel:stopped
    +    nfsd:2049              |           nfsd:stopped
    +                           |
    +                    Clients connect here
    +

    -Some great advices, will try out some of them! #programming
    +The key insight is that stunnel binds to the CARP VIP. When CARP fails over, the VIP is moved to the new master, and stunnel starts there automatically. Clients maintain their connection to the same IP throughout.

    -endler.dev/2025/best-programmers/
    +

    Creating a Certificate Authority for Client Authentication



    -

    In #Golang, values are actually copied when ...


    +First, create a CA to sign both server and client certificates:

    -In #Golang, values are actually copied when assigned (boxed) into an interface. That can have performance impact.
    + +
    # On f0 - Create CA
    +paul@f0:~ % doas mkdir -p /usr/local/etc/stunnel/ca
    +paul@f0:~ % cd /usr/local/etc/stunnel/ca
    +paul@f0:~ % doas openssl genrsa -out ca-key.pem 4096
    +paul@f0:~ % doas openssl req -new -x509 -days 3650 -key ca-key.pem -out ca-cert.pem \
    +  -subj '/C=US/ST=State/L=City/O=F3S Storage/CN=F3S Stunnel CA'
    +
    +# Create server certificate
    +paul@f0:~ % cd /usr/local/etc/stunnel
    +paul@f0:~ % doas openssl genrsa -out server-key.pem 4096
    +paul@f0:~ % doas openssl req -new -key server-key.pem -out server.csr \
    +  -subj '/C=US/ST=State/L=City/O=F3S Storage/CN=f3s-storage-ha.lan'
    +paul@f0:~ % doas openssl x509 -req -days 3650 -in server.csr -CA ca/ca-cert.pem \
    +  -CAkey ca/ca-key.pem -CAcreateserial -out server-cert.pem
    +
    +# Create client certificates for authorised clients
    +paul@f0:~ % cd /usr/local/etc/stunnel/ca
    +paul@f0:~ % doas sh -c 'for client in r0 r1 r2 earth; do 
    +  openssl genrsa -out ${client}-key.pem 4096
    +  openssl req -new -key ${client}-key.pem -out ${client}.csr \
    +    -subj "/C=US/ST=State/L=City/O=F3S Storage/CN=${client}.lan.buetow.org"
    +  openssl x509 -req -days 3650 -in ${client}.csr -CA ca-cert.pem \
    +    -CAkey ca-key.pem -CAcreateserial -out ${client}-cert.pem
    +  # Combine cert and key into a single file for stunnel client
    +  cat ${client}-cert.pem ${client}-key.pem > ${client}-stunnel.pem
    +done'
    +

    -goperf.dev/01-common-patterns/interface-boxing/
    +

    Install and Configure Stunnel on f0



    -

    This is a great little tutorial for searching ...


    + +
    # Install stunnel
    +paul@f0:~ % doas pkg install -y stunnel
    +
    +# Configure stunnel server with client certificate authentication
    +paul@f0:~ % doas tee /usr/local/etc/stunnel/stunnel.conf <<'EOF'
    +cert = /usr/local/etc/stunnel/server-cert.pem
    +key = /usr/local/etc/stunnel/server-key.pem
    +
    +setuid = stunnel
    +setgid = stunnel
    +
    +[nfs-tls]
    +accept = 192.168.1.138:2323
    +connect = 127.0.0.1:2049
    +CAfile = /usr/local/etc/stunnel/ca/ca-cert.pem
    +verify = 2
    +requireCert = yes
    +EOF
    +
    +# Enable and start stunnel
    +paul@f0:~ % doas sysrc stunnel_enable=YES
    +stunnel_enable:  -> YES
    +paul@f0:~ % doas service stunnel start
    +Starting stunnel.
    +
    +# Restart stunnel to apply the CARP VIP binding
    +paul@f0:~ % doas service stunnel restart
    +Stopping stunnel.
    +Starting stunnel.
    +

    -This is a great little tutorial for searching in the #HelixEditor #editor #coding
    +The configuration includes:

    -helix-editor-tutorials.com/tutorials/using-helix-global-search/
    +
      +
    • verify = 2: Verify client certificate and fail if not provided
    • +
    • requireCert = yes: Client must present a valid certificate
    • +
    • CAfile: Path to the CA certificate that signed the client certificates
    • +

    +

    Setting up NFS on f1 (Standby)



    -

    The mov instruction of a CPU is turing ...


    +Repeat the same configuration on f1:

    -The mov instruction of a CPU is turing complete. And theres an implementation of #Doom only using mov, it renders one frame per 7 hours! #fun
    + +
    paul@f1:~ % doas sysrc nfs_server_enable=YES
    +nfs_server_enable: NO -> YES
    +paul@f1:~ % doas sysrc nfsv4_server_enable=YES
    +nfsv4_server_enable: NO -> YES
    +paul@f1:~ % doas sysrc nfsuserd_enable=YES
    +nfsuserd_enable: NO -> YES
    +paul@f1:~ % doas sysrc mountd_enable=YES
    +mountd_enable: NO -> YES
    +paul@f1:~ % doas sysrc rpcbind_enable=YES
    +rpcbind_enable: NO -> YES
    +
    +paul@f1:~ % doas tee /etc/exports <<'EOF'
    +V4: /data/nfs -sec=sys
    +/data/nfs -alldirs -maproot=root -network 127.0.0.1 -mask 255.255.255.255
    +EOF
    +
    +paul@f1:~ % doas service rpcbind start
    +Starting rpcbind.
    +paul@f1:~ % doas service mountd start
    +Starting mountd.
    +paul@f1:~ % doas service nfsd start
    +Starting nfsd.
    +paul@f1:~ % doas service nfsuserd start
    +Starting nfsuserd.
    +

    -beza1e1.tuxen.de/articles/accidentally_turing_complete.html
    +And to configure stunnel on f1, we run:

    -

    I removed the social media profile from my ...


    + +
    # Install stunnel
    +paul@f1:~ % doas pkg install -y stunnel
    +
    +# Copy certificates from f0
    +paul@f0:~ % doas tar -cf /tmp/stunnel-certs.tar \
    +  -C /usr/local/etc/stunnel server-cert.pem server-key.pem ca
    +paul@f0:~ % scp /tmp/stunnel-certs.tar f1:/tmp/
    +
    +paul@f1:~ % cd /usr/local/etc/stunnel && doas tar -xf /tmp/stunnel-certs.tar
    +
    +# Configure stunnel server on f1 with client certificate authentication
    +paul@f1:~ % doas tee /usr/local/etc/stunnel/stunnel.conf <<'EOF'
    +cert = /usr/local/etc/stunnel/server-cert.pem
    +key = /usr/local/etc/stunnel/server-key.pem
    +
    +setuid = stunnel
    +setgid = stunnel
    +
    +[nfs-tls]
    +accept = 192.168.1.138:2323
    +connect = 127.0.0.1:2049
    +CAfile = /usr/local/etc/stunnel/ca/ca-cert.pem
    +verify = 2
    +requireCert = yes
    +EOF
    +
    +# Enable and start stunnel
    +paul@f1:~ % doas sysrc stunnel_enable=YES
    +stunnel_enable:  -> YES
    +paul@f1:~ % doas service stunnel start
    +Starting stunnel.
    +
    +# Restart stunnel to apply the CARP VIP binding
    +paul@f1:~ % doas service stunnel restart
    +Stopping stunnel.
    +Starting stunnel.
    +

    -I removed the social media profile from my GrapheneOS phone. Originally, I created a separate profile just for social media to avoid using it too often. But I noticed that I switched to it too frequently. Not having social media within reach is probably the best option. #socialmedia #sm #distractions
    +

    CARP Control Script for Clean Failover



    -

    So want a "real" recent UNIX? Use AIX! #macos ...


    +With stunnel configured to bind to the CARP VIP (192.168.1.138), only the server that is currently the CARP MASTER will accept stunnel connections. This provides automatic failover for encrypted NFS:

    -So want a "real" recent UNIX? Use AIX! #macos #unix #aix
    +
      +
    • When f0 is CARP MASTER: stunnel on f0 accepts connections on 192.168.1.138:2323
    • +
    • When f1 becomes CARP MASTER: stunnel on f1 starts accepting connections on 192.168.1.138:2323
    • +
    • The backup server's stunnel process will fail to bind to the VIP and won't accept connections
    • +

    +This ensures that clients always connect to the active NFS server through the CARP VIP. To ensure clean failover behaviour and prevent stale file handles, we'll update our carpcontrol.sh script so that:

    -www.osnews.com/story/141633/apples-macos-unix-certification-is-a-lie/
    +
      +
    • Stops NFS services on BACKUP nodes (preventing split-brain scenarios)
    • +
    • Starts NFS services only on the MASTER node
    • +
    • Manages stunnel binding to the CARP VIP
    • +

    +This approach ensures clients can only connect to the active server, eliminating stale handles from the inactive server:

    -

    This episode, I think, is kind of an eye-opener ...


    +Update: Fixed the script at Sat 3 Jan 23:55:11 EET 2026 - changed $1 to $2 because devd passes $subsystem $type, so the state is in the second argument.

    -This episode, I think, is kind of an eye-opener for me personally. I knew, that AI is there to stay, but you better should now start playing with your pet projects, otherwise your performance reviews will be awkward in a year or two from now, when you are expected to use AI for your daily work. #ai #llm #coding #programming
    + +
    # Create CARP control script on both f0 and f1
    +paul@f0:~ % doas tee /usr/local/bin/carpcontrol.sh <<'EOF'
    +#!/bin/sh
    +# CARP state change control script
    +
    +HOSTNAME=`hostname`
    +
    +if [ ! -f /data/nfs/nfs.DO_NOT_REMOVE ]; then
    +    logger '/data/nfs not mounted, mounting it now!'
    +    if [ "$HOSTNAME" = 'f0.lan.buetow.org' ]; then
    +        zfs load-key -L file:///keys/f0.lan.buetow.org:zdata.key zdata/enc/nfsdata
    +        zfs set mountpoint=/data/nfs zdata/enc/nfsdata
    +    else
    +        zfs load-key -L file:///keys/f0.lan.buetow.org:zdata.key zdata/sink/f0/zdata/enc/nfsdata
    +        zfs set mountpoint=/data/nfs zdata/sink/f0/zdata/enc/nfsdata
    +        zfs mount zdata/sink/f0/zdata/enc/nfsdata
    +        zfs set readonly=on zdata/sink/f0/zdata/enc/nfsdata
    +    fi
    +    service nfsd stop 2>&1
    +    service mountd stop 2>&1
    +fi
    +
    +
    +case "$2" in
    +    MASTER)
    +        logger "CARP state changed to MASTER, starting services"
    +        service rpcbind start >/dev/null 2>&1
    +        service mountd start >/dev/null 2>&1
    +        service nfsd start >/dev/null 2>&1
    +        service nfsuserd start >/dev/null 2>&1
    +        service stunnel restart >/dev/null 2>&1
    +        logger "CARP MASTER: NFS and stunnel services started"
    +        ;;
    +    BACKUP)
    +        logger "CARP state changed to BACKUP, stopping services"
    +        service stunnel stop >/dev/null 2>&1
    +        service nfsd stop >/dev/null 2>&1
    +        service mountd stop >/dev/null 2>&1
    +        service nfsuserd stop >/dev/null 2>&1
    +        logger "CARP BACKUP: NFS and stunnel services stopped"
    +        ;;
    +    *)
    +        logger "CARP state changed to $2 (unhandled)"
    +        ;;
    +esac
    +EOF
    +
    +paul@f0:~ % doas chmod +x /usr/local/bin/carpcontrol.sh
    +

    -changelog.com/friends/96
    +

    CARP Management Script



    -

    My #OpenBSD blog setup got mentioned in the ...


    +To simplify CARP state management and failover testing, create this helper script on both f0 and f1:

    -My #OpenBSD blog setup got mentioned in the BSDNow.tv Podcast (In the Feedback section) :-) #BSD #podcast #runbsd
    + +
    # Create the CARP management script
    +paul@f0:~ % doas tee /usr/local/bin/carp <<'EOF'
    +#!/bin/sh
    +# CARP state management script
    +# Usage: carp [master|backup|auto-failback enable|auto-failback disable]
    +# Without arguments: shows current state
    +
    +# Find the interface with CARP configured
    +CARP_IF=$(ifconfig -l | xargs -n1 | while read if; do
    +    ifconfig "$if" 2>/dev/null | grep -q "carp:" && echo "$if" && break
    +done)
    +
    +if [ -z "$CARP_IF" ]; then
    +    echo "Error: No CARP interface found"
    +    exit 1
    +fi
    +
    +# Get CARP VHID
    +VHID=$(ifconfig "$CARP_IF" | grep "carp:" | sed -n 's/.*vhid \([0-9]*\).*/\1/p')
    +
    +if [ -z "$VHID" ]; then
    +    echo "Error: Could not determine CARP VHID"
    +    exit 1
    +fi
    +
    +# Function to get the current state
    +get_state() {
    +    ifconfig "$CARP_IF" | grep "carp:" | awk '{print $2}'
    +}
    +
    +# Check for auto-failback block file
    +BLOCK_FILE="/data/nfs/nfs.NO_AUTO_FAILBACK"
    +check_auto_failback() {
    +    if [ -f "$BLOCK_FILE" ]; then
    +        echo "WARNING: Auto-failback is DISABLED (file exists: $BLOCK_FILE)"
    +    fi
    +}
    +
    +# Main logic
    +case "$1" in
    +    "")
    +        # No argument - show current state
    +        STATE=$(get_state)
    +        echo "CARP state on $CARP_IF (vhid $VHID): $STATE"
    +        check_auto_failback
    +        ;;
    +    master)
    +        # Force to MASTER state
    +        echo "Setting CARP to MASTER state..."
    +        ifconfig "$CARP_IF" vhid "$VHID" state master
    +        sleep 1
    +        STATE=$(get_state)
    +        echo "CARP state on $CARP_IF (vhid $VHID): $STATE"
    +        check_auto_failback
    +        ;;
    +    backup)
    +        # Force to BACKUP state
    +        echo "Setting CARP to BACKUP state..."
    +        ifconfig "$CARP_IF" vhid "$VHID" state backup
    +        sleep 1
    +        STATE=$(get_state)
    +        echo "CARP state on $CARP_IF (vhid $VHID): $STATE"
    +        check_auto_failback
    +        ;;
    +    auto-failback)
    +        case "$2" in
    +            enable)
    +                if [ -f "$BLOCK_FILE" ]; then
    +                    rm "$BLOCK_FILE"
    +                    echo "Auto-failback ENABLED (removed $BLOCK_FILE)"
    +                else
    +                    echo "Auto-failback was already enabled"
    +                fi
    +                ;;
    +            disable)
    +                if [ ! -f "$BLOCK_FILE" ]; then
    +                    touch "$BLOCK_FILE"
    +                    echo "Auto-failback DISABLED (created $BLOCK_FILE)"
    +                else
    +                    echo "Auto-failback was already disabled"
    +                fi
    +                ;;
    +            *)
    +                echo "Usage: $0 auto-failback [enable|disable]"
    +                echo "  enable:  Remove block file to allow automatic failback"
    +                echo "  disable: Create block file to prevent automatic failback"
    +                exit 1
    +                ;;
    +        esac
    +        ;;
    +    *)
    +        echo "Usage: $0 [master|backup|auto-failback enable|auto-failback disable]"
    +        echo "  Without arguments: show current CARP state"
    +        echo "  master: force this node to become CARP MASTER"
    +        echo "  backup: force this node to become CARP BACKUP"
    +        echo "  auto-failback enable:  allow automatic failback to f0"
    +        echo "  auto-failback disable: prevent automatic failback to f0"
    +        exit 1
    +        ;;
    +esac
    +EOF
    +
    +paul@f0:~ % doas chmod +x /usr/local/bin/carp
    +
    +# Copy to f1 as well
    +paul@f0:~ % scp /usr/local/bin/carp f1:/tmp/
    +paul@f1:~ % doas cp /tmp/carp /usr/local/bin/carp && doas chmod +x /usr/local/bin/carp
    +

    -www.bsdnow.tv/614
    +Now you can easily manage CARP states and auto-failback:

    -

    #Golang is the best when it comes to agentic ...


    + +
    # Check current CARP state
    +paul@f0:~ % doas carp
    +CARP state on re0 (vhid 1): MASTER
    +
    +# If auto-failback is disabled, you'll see a warning
    +paul@f0:~ % doas carp
    +CARP state on re0 (vhid 1): MASTER
    +WARNING: Auto-failback is DISABLED (file exists: /data/nfs/nfs.NO_AUTO_FAILBACK)
    +
    +# Force f0 to become BACKUP (triggers failover to f1)
    +paul@f0:~ % doas carp backup
    +Setting CARP to BACKUP state...
    +CARP state on re0 (vhid 1): BACKUP
    +
    +# Disable auto-failback (useful for maintenance)
    +paul@f0:~ % doas carp auto-failback disable
    +Auto-failback DISABLED (created /data/nfs/nfs.NO_AUTO_FAILBACK)
    +
    +# Enable auto-failback
    +paul@f0:~ % doas carp auto-failback enable
    +Auto-failback ENABLED (removed /data/nfs/nfs.NO_AUTO_FAILBACK)
    +

    -#Golang is the best when it comes to agentic coding: #llm
    +

    Automatic Failback After Reboot



    -lucumr.pocoo.org/2025/6/12/agentic-coding/
    +When f0 reboots (planned or unplanned), f1 takes over as CARP MASTER. To ensure f0 automatically reclaims its primary role once it's fully operational, we'll implement an automatic failback mechanism. With:

    -

    Where #zsh is better than #bash ...


    +Update: Fixed the script at Sun 4 Jan 00:04:28 EET 2026 - removed the NFS service check because when f0 is BACKUP, NFS services are intentionally stopped by carpcontrol.sh, which would prevent auto-failback from ever triggering.

    -Where #zsh is better than #bash
    + +
    paul@f0:~ % doas tee /usr/local/bin/carp-auto-failback.sh <<'EOF'
    +#!/bin/sh
    +# CARP automatic failback script for f0
    +# Ensures f0 reclaims MASTER role after reboot when storage is ready
    +
    +LOGFILE="/var/log/carp-auto-failback.log"
    +MARKER_FILE="/data/nfs/nfs.DO_NOT_REMOVE"
    +BLOCK_FILE="/data/nfs/nfs.NO_AUTO_FAILBACK"
    +
    +log_message() {
    +    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOGFILE"
    +}
    +
    +# Check if we're already MASTER
    +CURRENT_STATE=$(/usr/local/bin/carp | awk '{print $NF}')
    +if [ "$CURRENT_STATE" = "MASTER" ]; then
    +    exit 0
    +fi
    +
    +# Check if /data/nfs is mounted
    +if ! mount | grep -q "on /data/nfs "; then
    +    log_message "SKIP: /data/nfs not mounted"
    +    exit 0
    +fi
    +
    +# Check if the marker file exists
    +# (identifies that the ZFS data set is properly mounted)
    +if [ ! -f "$MARKER_FILE" ]; then
    +    log_message "SKIP: Marker file $MARKER_FILE not found"
    +    exit 0
    +fi
    +
    +# Check if failback is blocked (for maintenance)
    +if [ -f "$BLOCK_FILE" ]; then
    +    log_message "SKIP: Failback blocked by $BLOCK_FILE"
    +    exit 0
    +fi
    +
    +# All conditions met - promote to MASTER
    +log_message "CONDITIONS MET: Promoting to MASTER (was $CURRENT_STATE)"
    +/usr/local/bin/carp master
    +
    +# Log result
    +sleep 2
    +NEW_STATE=$(/usr/local/bin/carp | awk '{print $NF}')
    +log_message "Failback complete: State is now $NEW_STATE"
    +
    +# If successful, log to the system log too
    +if [ "$NEW_STATE" = "MASTER" ]; then
    +    logger "CARP: f0 automatically reclaimed MASTER role"
    +fi
    +EOF
    +
    +paul@f0:~ % doas chmod +x /usr/local/bin/carp-auto-failback.sh
    +

    -www.arp242.net/why-zsh.html
    +The marker file identifies that the ZFS data set is mounted correctly. We create it with:

    -

    I really enjoyed this talk about obscure Go ...


    + +
    paul@f0:~ % doas touch /data/nfs/nfs.DO_NOT_REMOVE
    +

    -I really enjoyed this talk about obscure Go optimizations. None of it is really standard and can change from one version of Go to another, though. #golang #talk
    +We add a cron job to check every minute:

    -www.youtube.com/watch?v=rRtihWOcaLI
    + +
    paul@f0:~ % echo "* * * * * /usr/local/bin/carp-auto-failback.sh" | doas crontab -
    +

    -

    Commenting your regular expression is generally ...


    +The enhanced CARP script provides integrated control over auto-failback. To temporarily turn off automatic failback (e.g., for f0 maintenance), we run:

    -Commenting your regular expression is generally a good advice! Works pretty well as described in the article not just in #Ruby, but also in #Perl (@Perl), #RakuLang, ...
    + +
    paul@f0:~ % doas carp auto-failback disable
    +Auto-failback DISABLED (created /data/nfs/nfs.NO_AUTO_FAILBACK)
    +

    -thoughtbot.com/blog/comment-your-regular-expressions
    +And to re-enable it:

    -

    You have to make a decision for yourself, but ...


    + +
    paul@f0:~ % doas carp auto-failback enable
    +Auto-failback ENABLED (removed /data/nfs/nfs.NO_AUTO_FAILBACK)
    +

    -You have to make a decision for yourself, but generally, work smarter (and faster—but keep the quality)! About 40 hours #productivity #work #workload
    +To check whether auto-failback is enabled, we run:

    -thesquareplanet.com/blog/about-40-hours/
    + +
    paul@f0:~ % doas carp
    +CARP state on re0 (vhid 1): MASTER
    +# If disabled, you'll see: WARNING: Auto-failback is DISABLED
    +

    -

    "100 Go Mistakes and How to Avoid Them" is one ...


    +The failback attempts are logged to /var/log/carp-auto-failback.log!

    -"100 Go Mistakes and How to Avoid Them" is one of my favorite #Golang books. Julia Evans also stumbled across some issues she'd learned from this book. The book itself is an absolute must for every Gopher (or someone who wants to become one!)
    +So, in summary:

    -jvns.ca/blog/2024/08/06/go-structs-copied-on-assignment/
    +
      +
    • After f0 reboots: f1 is MASTER, f0 boots as BACKUP
    • +
    • Cron runs every minute: Checks if conditions are met (Is f0 currently BACKUP? (don't run if already MASTER)), (Is /data/nfs mounted? (ZFS datasets are ready)), (Does marker file exist? (confirms this is primary storage)), (Is failback blocked? (admin can prevent failback)), (Are NFS services running? (system is fully ready))
    • +
    • Failback occurs: Typically 2-3 minutes after boot completes
    • +
    • Logging: All attempts logged for troubleshooting
    • +

    +This ensures f0 automatically resumes its role as primary storage server after any reboot, while providing administrative control when needed.

    -

    The #Ruby Data class seems quite helpful ...


    +

    Client Configuration for NFS via Stunnel



    -The #Ruby Data class seems quite helpful
    +To mount NFS shares with stunnel encryption, clients must install and configure stunnel using their client certificates.

    -allaboutcoding.ghinda.com/example-of-value-objects-using-rubys-data-class
    +

    Configuring Rocky Linux Clients (r0, r1, r2)



    -Other related posts:
    +On the Rocky Linux VMs, we run:

    -2025-01-01 Posts from October to December 2024
    -2025-07-01 Posts from January to June 2025 (You are currently reading this)
    -2026-01-01 Posts from July to December 2025
    + +
    # Install stunnel on client (example for `r0`)
    +[root@r0 ~]# dnf install -y stunnel nfs-utils
    +
    +# Copy client certificate and CA certificate from f0
    +[root@r0 ~]# scp f0:/usr/local/etc/stunnel/ca/r0-stunnel.pem /etc/stunnel/
    +[root@r0 ~]# scp f0:/usr/local/etc/stunnel/ca/ca-cert.pem /etc/stunnel/
    +
    +# Configure stunnel client with certificate authentication
    +[root@r0 ~]# tee /etc/stunnel/stunnel.conf <<'EOF'
    +cert = /etc/stunnel/r0-stunnel.pem
    +CAfile = /etc/stunnel/ca-cert.pem
    +client = yes
    +verify = 2
    +
    +[nfs-ha]
    +accept = 127.0.0.1:2323
    +connect = 192.168.1.138:2323
    +EOF
    +
    +# Enable and start stunnel
    +[root@r0 ~]# systemctl enable --now stunnel
    +
    +# Repeat for r1 and r2 with their respective certificates
    +

    -E-Mail your comments to paul@nospam.buetow.org :-)
    +Note: Each client must use its certificate file (r0-stunnel.pem, r1-stunnel.pem, r2-stunnel.pem, or earth-stunnel.pem - the latter is for my Laptop, which can also mount the NFS shares).

    -Back to the main site
    -
    -
    -
    - - Task Samurai: An agentic coding learning experiment - - gemini://foo.zone/gemfeed/2025-06-22-task-samurai.gmi - 2025-06-22T20:00:51+03:00 - - Paul Buetow aka snonux - paul@dev.buetow.org - - Task Samurai is a fast terminal interface for Taskwarrior written in Go using the Bubble Tea framework. It displays your tasks in a table and allows you to manage them without leaving your keyboard. - -
    -

    Task Samurai: An agentic coding learning experiment


    +

    NFSv4 user mapping config on Rocky



    -Published at 2025-06-22T20:00:51+03:00
    +Update: This section was added 08.08.2025!

    -Task Samurai Logo
    +For this, we need to set the Domain in /etc/idmapd.conf on all 3 Rocky hosts to lan.buetow.org (remember, earlier in this blog post we set the nfsuserd domain on the NFS server side to lan.buetow.org as well!)

    -

    Table of Contents


    +
    +[General]
    +
    +Domain = lan.buetow.org
    +.
    +.
    +.
    +

    -
    -

    Introduction


    +We also need to increase the inotify limit, otherwise nfs-idmapd may fail to start with "Too many open files":

    -Task Samurai is a fast terminal interface for Taskwarrior written in Go using the Bubble Tea framework. It displays your tasks in a table and allows you to manage them without leaving your keyboard.
    + +
    [root@r0 ~]# echo 'fs.inotify.max_user_instances = 512' > /etc/sysctl.d/99-inotify.conf
    +[root@r0 ~]# sysctl -w fs.inotify.max_user_instances=512
    +

    -https://taskwarrior.org
    -https://github.com/charmbracelet/bubbletea
    +And afterwards, we need to run the following on all 3 Rocky hosts:

    -

    Why does this exist?


    + +
    [root@r0 ~]# systemctl start nfs-idmapd
    +[root@r0 ~]# systemctl enable --now nfs-client.target
    +

    -I wanted to tinker with agentic coding. This project was implemented entirely using OpenAI Codex. (After this blog post was published, I also used the Claude Code CLI.)
    +and then, safest, reboot those.

    -
      -
    • I wanted a faster UI for Taskwarrior than other options, like Vit, which is Python-based.
    • -
    • I wanted something built with Bubble Tea, but I never had time to dive deep into it.
    • -
    • I wanted to build a toy project (like Task Samurai) first, before tackling the big ones, to get started with agentic coding.
    • -

    -https://openai.com/codex/
    +

    Testing NFS Mount with Stunnel



    -I've been curious about agentic coding for a while and wanted to see what it's actually like to build something with it. So I gave it a go (no pun intended).
    +To mount NFS through the stunnel encrypted tunnel, we run:

    -

    How it works


    + +
    # Create a mount point
    +[root@r0 ~]# mkdir -p /data/nfs/k3svolumes
    +
    +# Mount through stunnel (using localhost and NFSv4)
    +[root@r0 ~]# mount -t nfs4 -o port=2323 127.0.0.1:/k3svolumes /data/nfs/k3svolumes
    +
    +# Verify mount
    +[root@r0 ~]# mount | grep k3svolumes
    +127.0.0.1:/k3svolumes on /data/nfs/k3svolumes 
    +  type nfs4 (rw,relatime,vers=4.2,rsize=131072,wsize=131072,
    +  namlen=255,hard,proto=tcp,port=2323,timeo=600,retrans=2,sec=sys,
    +  clientaddr=127.0.0.1,local_lock=none,addr=127.0.0.1)
    +
    +# For persistent mount, add to /etc/fstab:
    +127.0.0.1:/k3svolumes /data/nfs/k3svolumes nfs4 port=2323,_netdev,soft,timeo=10,retrans=2,intr 0 0
    +

    -Task Samurai invokes the task command (that's the original Taskwarrior CLI command) to read and modify tasks. The tasks are displayed in a Bubble Tea table, where each row represents a task. Hotkeys trigger Taskwarrior commands such as starting, completing or annotating tasks. The UI refreshes automatically after each action, so the table is always up to date.
    +Note: The mount uses localhost (127.0.0.1) because stunnel is listening locally and forwarding the encrypted traffic to the remote server.

    -Task Samurai Screenshot
    +

    Testing CARP Failover with mounted clients and stale file handles:



    -

    Where and how to get it


    +To test the failover process:

    -Go to:
    + +
    # On f0 (current MASTER) - trigger failover
    +paul@f0:~ % doas ifconfig re0 vhid 1 state backup
    +
    +# On f1 - verify it becomes MASTER
    +paul@f1:~ % ifconfig re0 | grep carp
    +    inet 192.168.1.138 netmask 0xffffffff broadcast 192.168.1.138 vhid 1
    +
    +# Check stunnel is now listening on f1
    +paul@f1:~ % doas sockstat -l | grep 2323
    +stunnel  stunnel    4567  3  tcp4   192.168.1.138:2323    *:*
    +
    +# On client - verify NFS mount still works
    +[root@r0 ~]# ls /data/nfs/k3svolumes/
    +[root@r0 ~]# echo "Test after failover" > /data/nfs/k3svolumes/failover-test.txt
    +

    -https://codeberg.org/snonux/tasksamurai
    +After a CARP failover, NFS clients may experience "Stale file handle" errors because they cached file handles from the previous server. To resolve this manually, we can run:

    -And follow the README.md!
    + +
    # Force unmount and remount
    +[root@r0 ~]# umount -f /data/nfs/k3svolumes
    +[root@r0 ~]# mount /data/nfs/k3svolumes
    +

    -

    Lessons learned from building Task Samurai with agentic coding


    +For the automatic recovery, we create a script:

    -

    Developer workflow


    + +
    [root@r0 ~]# cat > /usr/local/bin/check-nfs-mount.sh << 'EOF'
    +#!/bin/bash
    +# Fast NFS mount health monitor - runs every 10 seconds via systemd timer
    +
    +MOUNT_POINT="/data/nfs/k3svolumes"
    +LOCK_FILE="/var/run/nfs-mount-check.lock"
    +
    +# Use a lock file to prevent concurrent runs
    +if [ -f "$LOCK_FILE" ]; then
    +    exit 0
    +fi
    +touch "$LOCK_FILE"
    +trap "rm -f $LOCK_FILE" EXIT
    +
    +MOUNT_FIXED=0
    +
    +fix_mount () {
    +    echo "Attempting to remount NFS mount $MOUNT_POINT"
    +    if mount -o remount -f "$MOUNT_POINT" 2>/dev/null; then
    +        echo "Remount command issued for $MOUNT_POINT"
    +    else
    +        echo "Failed to remount NFS mount $MOUNT_POINT"
    +    fi
    +
    +    echo "Checking if $MOUNT_POINT is a mountpoint"
    +    if mountpoint "$MOUNT_POINT" >/dev/null 2>&1; then
    +        echo "$MOUNT_POINT is a valid mountpoint"
    +    else
    +        echo "$MOUNT_POINT is not a valid mountpoint, attempting mount"
    +        if mount "$MOUNT_POINT"; then
    +            echo "Successfully mounted $MOUNT_POINT"
    +            MOUNT_FIXED=1
    +            return
    +        else
    +            echo "Failed to mount $MOUNT_POINT"
    +        fi
    +    fi
    +
    +    echo "Attempting to unmount $MOUNT_POINT"
    +    if umount -f "$MOUNT_POINT" 2>/dev/null; then
    +        echo "Successfully unmounted $MOUNT_POINT"
    +    else
    +        echo "Failed to unmount $MOUNT_POINT (it might not be mounted)"
    +    fi
    +
    +    echo "Attempting to mount $MOUNT_POINT"
    +    if mount "$MOUNT_POINT"; then
    +        echo "NFS mount $MOUNT_POINT mounted successfully"
    +        MOUNT_FIXED=1
    +        return
    +    else
    +        echo "Failed to mount NFS mount $MOUNT_POINT"
    +    fi
    +
    +    echo "Failed to fix NFS mount $MOUNT_POINT"
    +    exit 1
    +}
    +
    +if ! mountpoint "$MOUNT_POINT" >/dev/null 2>&1; then
    +    echo "NFS mount $MOUNT_POINT not found"
    +    fix_mount
    +fi
    +
    +if ! timeout 2s stat "$MOUNT_POINT" >/dev/null 2>&1; then
    +    echo "NFS mount $MOUNT_POINT appears to be unresponsive"
    +    fix_mount
    +fi
    +
    +# After a successful remount, delete pods stuck on this node
    +if [ "$MOUNT_FIXED" -eq 1 ]; then
    +    echo "Mount was fixed, checking for stuck pods on this node..."
    +    NODE=$(hostname)
    +    export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
    +    kubectl get pods --all-namespaces \
    +      --field-selector="spec.nodeName=$NODE" \
    +      -o json 2>/dev/null | jq -r '
    +        .items[] |
    +        select(
    +          .status.phase == "Unknown" or
    +          .status.phase == "Pending" or
    +          (.status.conditions // [] |
    +            any(.type == "Ready" and .status == "False")) or
    +          (.status.containerStatuses // [] |
    +            any(.state.waiting.reason == "ContainerCreating"))
    +        ) | "\(.metadata.namespace) \(.metadata.name)"' | \
    +      while read ns pod; do
    +        echo "Deleting stuck pod $ns/$pod"
    +        kubectl delete pod -n "$ns" "$pod" \
    +          --grace-period=0 --force 2>&1
    +      done
    +fi
    +EOF
    +
    +[root@r0 ~]# chmod +x /usr/local/bin/check-nfs-mount.sh
    +

    -I was trying out OpenAI Codex because I regularly run out of Claude Code CLI (another agentic coding tool I am currently trying out) credits (it still happens!), but Codex was still available to me. So, I took the opportunity to push agentic coding a bit further with another platform.
    +And we create the systemd service as follows:

    -I didn't really love the web UI you have to use for Codex, as I usually live in the terminal. But this is all I have for Codex for now, and I thought I'd give it a try regardless. The web UI is simple and pretty straightforward. There's also a Codex CLI one could use directly in the terminal, but I didn't get it working. I will try again soon.
    + +
    [root@r0 ~]# cat > /etc/systemd/system/nfs-mount-monitor.service << 'EOF'
    +[Unit]
    +Description=NFS Mount Health Monitor
    +After=network-online.target
    +
    +[Service]
    +Type=oneshot
    +ExecStart=/usr/local/bin/check-nfs-mount.sh
    +StandardOutput=journal
    +StandardError=journal
    +EOF
    +

    -Update: Codex CLI now works for me, after OpenAI released a new version!
    +And we also create the systemd timer (runs every 10 seconds):

    -For every task given to Codex, it spins up its own container. From there, you can drill down and watch what it is doing. At the end, the result (in the form of a code diff) will be presented. From there, you can make suggestions about what else to change in the codebase. What I found inconvenient is that for every additional change, there's an overhead because Codex has to spin up a container and bootstrap the entire development environment again, which adds extra delay. That could be eliminated by setting up predefined custom containers, but that feature still seems somewhat limited.
    + +
    [root@r0 ~]# cat > /etc/systemd/system/nfs-mount-monitor.timer << 'EOF'
    +[Unit]
    +Description=Run NFS Mount Health Monitor every 10 seconds
    +Requires=nfs-mount-monitor.service
    +
    +[Timer]
    +OnBootSec=30s
    +OnUnitActiveSec=10s
    +AccuracySec=1s
    +
    +[Install]
    +WantedBy=timers.target
    +EOF
    +

    -Once satisfied, you can ask Codex to create a GitHub PR (too bad only GitHub is supported and no other Git hosters); from there, you can merge it and then pull it to your local laptop or workstation to test the changes again. I found myself looping a lot around the Codex UI, GitHub PRs, and local checkouts.
    +To enable and start the timer, we run:

    -

    How it went


    + +
    [root@r0 ~]# systemctl daemon-reload
    +[root@r0 ~]# systemctl enable nfs-mount-monitor.timer
    +[root@r0 ~]# systemctl start nfs-mount-monitor.timer
    +
    +# Check status
    +[root@r0 ~]# systemctl status nfs-mount-monitor.timer
    +● nfs-mount-monitor.timer - Run NFS Mount Health Monitor every 10 seconds
    +     Loaded: loaded (/etc/systemd/system/nfs-mount-monitor.timer; enabled)
    +     Active: active (waiting) since Sat 2025-07-06 10:00:00 EEST
    +    Trigger: Sat 2025-07-06 10:00:10 EEST; 8s left
    +
    +# Monitor logs
    +[root@r0 ~]# journalctl -u nfs-mount-monitor -f
    +

    -Task Samurai's codebase came together quickly: the entire Git history spans from June 19 to 22, 2025, culminating in 179 commits:
    +Note: Stale file handles are inherent to NFS failover because file handles are server-specific. The best approach depends on your application's tolerance for brief disruptions. Of course, all the changes made to r0 above must also be applied to r1 and r2.

    -
      -
    • June 19: Scaffolded the Go boilerplate, set up tests, integrated the Bubble Tea UI framework, and got the first table views showing up.
    • -
    • June 20: (The big one—120 commits!) Added hotkeys, colourized tasks, annotation support, undo/redo, and, for fun, fireworks on quit (which never worked and got removed at a later point). This is where most of the bugs, merges, and fast-paced changes happen.
    • -
    • June 21: Refined searching, theming, and column sizing and documented all those hotkeys. Numerous tweaks to make the UI cleaner and more user-friendly.
    • -
    • June 22: Final touches—added screenshots, polished the logo, fixed module paths… and then it was a wrap.
    • -

    -Most big breakthroughs (and bug introductions) came during that middle day of intense iteration. The latter stages were all about smoothing out the rough edges.
    +Updated Wed 19 Mar 2026: Added automatic pod restart after NFS remount

    -It's worth noting that I worked on it in the evenings when I had some free time, as I also had to fit in my regular work and family commitments during the day. So, I didn't spend full working days on this project.
    +The script now also tracks whether a mount was fixed via the MOUNT_FIXED variable. After a successful remount, it queries kubectl for pods on the local node that are stuck in Unknown, Pending, or ContainerCreating state and force-deletes them. Kubernetes then automatically reschedules these pods, which will now succeed because the NFS mount is healthy again. Without this, pods that hit a stale mount would remain broken until manually deleted, even after the underlying NFS issue was resolved.

    -

    What went wrong


    +

    Complete Failover Test



    -Going agentic isn't all smooth. Here are the hiccups I ran into, plus a few lessons:
    +Here's a comprehensive test of the failover behaviour with all optimisations in place:

    -
      -
    • Merge Floods: Every minor feature or fix existed on its branch, so merging was a constant process. It kept progress flowing but also drowned the committed history in noise and the occasional conflict. I found this to be an issue with OpenAI's Codex in particular. Not so much with other agentic coding tools like Claude Code CLI (not covered in this blog post.)
    • -
    • Fixes on fixes: Features like "fireworks on exit" had chains of "fix exit," "fix cell selection," etc. Sometimes, new additions introduced bugs that needed rapid patching.
    • -

    -

    Patterns that helped


    + +
    # 1. Check the initial state
    +paul@f0:~ % ifconfig re0 | grep carp
    +    carp: MASTER vhid 1 advbase 1 advskew 0
    +paul@f1:~ % ifconfig re0 | grep carp
    +    carp: BACKUP vhid 1 advbase 1 advskew 100
    +
    +# 2. Create a test file from a client
    +[root@r0 ~]# echo "test before failover" > /data/nfs/k3svolumes/test-before.txt
    +
    +# 3. Trigger failover (f0 → f1)
    +paul@f0:~ % doas ifconfig re0 vhid 1 state backup
    +
    +# 4. Monitor client behaviour
    +[root@r0 ~]# ls /data/nfs/k3svolumes/
    +ls: cannot access '/data/nfs/k3svolumes/': Stale file handle
    +
    +# 5. Check automatic recovery (within 10 seconds)
    +[root@r0 ~]# journalctl -u nfs-mount-monitor -f
    +Jul 06 10:15:32 r0 nfs-monitor[1234]: NFS mount unhealthy detected at \
    +  Sun Jul 6 10:15:32 EEST 2025
    +Jul 06 10:15:32 r0 nfs-monitor[1234]: Attempting to fix stale NFS mount at \
    +  Sun Jul 6 10:15:32 EEST 2025
    +Jul 06 10:15:33 r0 nfs-monitor[1234]: NFS mount fixed at \
    +  Sun Jul 6 10:15:33 EEST 2025
    +

    -Despite the chaos, a few strategies kept things moving:
    +Failover Timeline:

      -
    • Scaffolding First: I started with the basic table UI and command wrappers, then layered on features—never the other way around.
    • -
    • Tiny PRs: Small, atomic merges meant feedback came fast (and so did fixes).
    • -
    • Tests Matter: A solid base of unit tests for task manipulations kept things from breaking entirely when experimenting.
    • -
    • Live Documentation: Documentation, such as the README, is updated regularly to reflect all the hotkey and feature changes.
    • +
    • 0 seconds: CARP failover triggered
    • +
    • 0-2 seconds: Clients get "Stale file handle" errors (not hanging)
    • +
    • 3-10 seconds: Soft mounts ensure quick failure of operations
    • +
    • Within 10 seconds: Automatic recovery via systemd timer

    -Maybe a better approach would have been to design the whole application from scratch before letting Codix do any of the coding. I will try that with my next toy project.
    -
    -

    What I learned using agentic coding


    -
    -Stepping into agentic coding with Codex as my "pair programmer" was a big shift. I learned a lot—not just about automating code generation, but also about how you have to tightly steer, guide, and audit every line as things move at high speed. I must admit, I sometimes lost track of what all the generated code was actually doing. But as the features seemed to work after a few iterations, I was satisfied—which is a bit concerning. Imagine if I approved a PR for a production-grade deployment without fully understanding what it was doing (and not a toy project like in this post).
    -
    -

    how much time did I save?


    -
    -Did it buy me speed?
    +Benefits of the Optimised Setup:

      -
    • Say each commit takes Codex 5 minutes to generate, and you need to review/guide 179 commits = about _6 hours of active development_.
    • -
    • If you coded it all yourself, including all the bug fixes, features, design, and documentation, you might spend _10–20 hours_.
    • -
    • That's a couple of days of potential savings—and I am by no means an expert in agentic coding, since this was my first completed agentic coding project.
    • +
    • No hanging processes - Soft mounts fail quickly
    • +
    • Clean failover - Old server stops serving immediately
    • +
    • Fast automatic recovery - No manual intervention needed
    • +
    • Predictable timing - Recovery within 10 seconds with systemd timer
    • +
    • Better visibility - systemd journal provides detailed logs

    -

    Conclusion


    -
    -Building Task Samurai with agentic coding was a wild ride—rapid feature growth, countless fast fixes, and more merge commits I'd expected. Keep the iterations short (or maybe in my next experiment, much larger, with better and more complete design before generating a single line of code), keep tests and documentation concise, and review and refine for final polish at the end. Even with the bumps along the way, shipping a terminal UI in days instead of weeks is a neat little showcase vibe coding.
    -
    -Am I an agentic coding expert now? I don't think so. There are still many things to learn, and the landscape is constantly evolving.
    -
    -While working on Task Samurai, there were times I missed manual coding and the satisfaction that comes from writing every line yourself, debugging issues manually, and crafting solutions from scratch. However, this is the direction in which the industry seems to be shifting, unfortunately. If applied correctly, AI will boost performance, and if you don't use AI, your next performance review may be awkward.
    -
    -Personally, I am not sure whether I like where the industry is going with agentic coding. I love "traditional" coding, and with agentic coding you operate at a higher level and don't interact directly with code as often, which I would miss. I think that in the future, designing, reviewing, and being able to read and understand code will be more important than writing code by hand.
    -
    -Do you have any thoughts on that? I hope, I am partially wrong at least.
    -
    -E-Mail your comments to paul@nospam.buetow.org :-)
    -
    -Other related posts are:
    -
    -2025-08-05 Local LLM for Coding with Ollama on macOS
    -2025-06-22 Task Samurai: An agentic coding learning experiment (You are currently reading this)
    +Important Considerations:

    -Back to the main site
    -
    -
    -
    - - 'A Monk's Guide to Happiness' book notes - - gemini://foo.zone/gemfeed/2025-06-07-a-monks-guide-to-happiness-book-notes.gmi - 2025-06-07T10:30:11+03:00 - - Paul Buetow aka snonux - paul@dev.buetow.org - - These are my personal book notes from Gelong Thubten's 'A Monk's Guide to Happiness: Meditation in the 21st century.' They are for my own reference, but I hope they might be useful to you as well. - -
    -

    "A Monk's Guide to Happiness" book notes


    +
      +
    • Recent writes (within 1 minute) may not be visible after failover due to replication lag
    • +
    • Applications should handle brief NFS errors gracefully
    • +
    • For zero-downtime requirements, consider synchronous replication or distributed storage (see "Future storage explorations" section later in this blog post)
    • +

    +

    Update: Upgrade to 4TB drives



    -Published at 2025-06-07T10:30:11+03:00
    +Update: 27.01.2026 I have since replaced the 1TB drives with 4TB drives for more storage capacity. The upgrade procedure was different for each node!

    -These are my personal book notes from Gelong Thubten's "A Monk's Guide to Happiness: Meditation in the 21st century." They are for my own reference, but I hope they might be useful to you as well.
    +

    Upgrading f1 (simpler approach)



    -

    Table of Contents


    +Since f1 is the replication sink, the upgrade was straightforward:


    -

    Understanding Happiness


    +

    Upgrading f0 (using ZFS resilvering)



    -
      -
    • Happiness is a skill we can train.
    • -
    • Happiness is not about accomplishing goals, as that would be in the future.
    • -
    • Feel free now. No urge about past and future.
    • -
    • We can learn to produce our own happiness independently of physical needs. When we walk in a park, how do we feel? We can train to reproduce that feeling independently.
    • -

    -

    The Role of Meditation


    +For f0, which is the primary storage node, I used ZFS resilvering to avoid data loss:

      -
    • Meditation is not about clearing your mind. A busy mind has nothing to do with interfering with your meditation.
    • -
    • Our problem is that we need to detect that awareness. Meditation connects us with awareness. Awareness is freedom.
    • -
    • We can let the mind be and don't care about the thoughts. It will have benefits for your life. It will protect you from all kinds of stress.
    • -
    • Better meditate with open eyes so you don't associate it with the dark. You will also be able to be in a meditation state of mind outside of the meditation session.
    • -
    • Have a baseline for time to build up discipline.
    • -
    • We don't need to do anything about stress, just take a step back.
    • +
    • 1. Plugged the new 4TB drive into an external USB SSD drive reader
    • +
    • 2. Attached the 4TB drive to the zdata pool for resilvering
    • +
    • 3. Once resilvering completed, detached the 1TB drive from the zdata pool
    • +
    • 4. Shutdown f0 and physically replaced the internal drive
    • +
    • 5. Booted with the new drive in place
    • +
    • 6. Expanded the pool to use the full 4TB capacity:

    -

    Managing Thoughts and Emotions


    + +
    paul@f0:~ % doas zpool online -e /dev/ada1
    +

      -
    • Our flow of emotions is really just habits. That can be changed through training, e.g., meditation training.
    • -
    • A part of the mind recognises that we are sad or angry. That part is not sad or angry by itself, obviously. So we can escape to that part of the mind, be the observer, and not draw in the constant flow of emotions and thoughts.
    • -
    • Let the front and back doors of your house open, and let the thoughts come in and leave. Just don't serve them tea. This once said, a great Zen master.
    • -
    • Thoughts are friends and not enemies.
    • -
    • Thoughts help the meditation as they make us notice that we wandered off, and therefore, we strengthen the reflection.
    • +
    • 7. Reloaded the encryption keys as described in this blog post
    • +
    • 8. Set the mount point again for the encrypted dataset

    -

    Practice and Discipline


    +This was a one-time effort on both nodes - after a reboot, everything was remembered and came up normally. Here are the updated outputs:
    +
    + +
    paul@f0:~ % doas zpool list
    +NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
    +zdata  3.63T   677G  2.97T        -         -     3%    18%  1.00x    ONLINE  -
    +zroot   472G  68.4G   404G        -         -    13%    14%  1.00x    ONLINE  -
    +
    +paul@f0:~ % doas camcontrol devlist
    +<512GB SSD D910R170>               at scbus0 target 0 lun 0 (pass0,ada0)
    +<SD Ultra 3D 4TB 530500WD>         at scbus1 target 0 lun 0 (pass1,ada1)
    +<Generic Flash Disk 8.07>          at scbus2 target 0 lun 0 (da0,pass2)
    +
    +
    +We're still using different SSD models on f1 (WD Blue SA510 4TB) to avoid simultaneous failures:
    +
    + +
    paul@f1:~ % doas camcontrol devlist
    +<512GB SSD D910R170>               at scbus0 target 0 lun 0 (pass0,ada0)
    +<WD Blue SA510 2.5 4TB 530500WD>   at scbus1 target 0 lun 0 (pass1,ada1)
    +<Generic Flash Disk 8.07>          at scbus2 target 0 lun 0 (da0,pass2)
    +
    +
    +

    Conclusion


    +
    +We've built a robust, encrypted storage system for our FreeBSD-based Kubernetes cluster that provides:

      -
    • The importance of habits to practice mindfulness. Bring mindfulness into the daily practice.
    • -
    • Integrating short moments of mindfulness during the day is the fast track to happiness. Start off with small tasks, e.g. while washing your hands.
    • -
    • Have many small doses of mindfulness and don't prolong as otherwise, your mind will revolt.
    • -
    • Have a small moment of mindfulness when you wake up and go to sleep.
    • -
    • Practice staying fully present in an uncomfortable situation and without judgement.
    • -
    • Don't become two persons who never meet: the meditator and the not meditator. So integrate mindfulness during the day too.
    • +
    • High Availability: CARP ensures the storage VIP moves automatically during failures
    • +
    • Data Protection: ZFS encryption protects data at rest, stunnel protects data in transit
    • +
    • Continuous Replication: 1-minute RPO for the data, automated via zrepl
    • +
    • Secure Access: Client certificate authentication prevents unauthorised access

    -

    Perspectives on Relationships and Interactions


    +Some key lessons learned are:

      -
    • Who is the opponent? The other person. The things he said or our reactions to things? Forgiveness is a high form of compassion.
    • -
    • Understand the suffering of the person who "hurt" us. Where is the aggressor really coming from?
    • -
    • People who are stressed or unhappy do and say things they wouldn't have said have done otherwise. Acting under anger is like being influenced by alcohol.
    • -
    • People don't have a masterplan to destroy others, even if it seems so. They are under strong bad influence by themselves. Something terrible happened to them. Revenge makes no sense.
    • -
    • Be grateful for people "trying" to hurt you as they help you to practice your path.
    • +
    • Stunnel vs Native NFS/TLS: While native encryption would be ideal, stunnel provides better cross-platform compatibility
    • +
    • Manual vs Automatic Failover: For storage systems, controlled failover often prevents more problems than it causes
    • +
    • Client Compatibility: Different NFS implementations behave differently - test thoroughly

    -

    Reflective Questions


    +

    Future Storage Explorations


    +
    +While zrepl provides excellent snapshot-based replication for disaster recovery, there are other storage technologies worth exploring for the f3s project:
    +
    +

    MinIO for S3-Compatible Object Storage


    +
    +MinIO is a high-performance, S3-compatible object storage system that could complement our ZFS-based storage. Some potential use cases:

      -
    • Why do I do all the things I do? What do I try to achieve?
    • -
    • What am I doing about that?
    • -
    • Is it working?
    • -
    • What are the real causes of happiness and suffering?
    • -
    • What about meditation? How does that address the situation?
    • +
    • S3 API compatibility: Many modern applications expect S3-style object storage APIs. MinIO could provide this interface while using our ZFS storage as the backend.
    • +
    • Multi-site replication: MinIO supports active-active replication across multiple sites, which could work well with our f0/f1/f2 node setup.
    • +
    • Kubernetes native: MinIO has excellent Kubernetes integration with operators and CSI drivers, making it ideal for the f3s k3s environment.

    -

    Miscellaneous Guidelines


    +

    MooseFS for Distributed High Availability


    +
    +MooseFS is a fault-tolerant, distributed file system that could provide proper high-availability storage:

      -
    • Posture is important as the mind and body are connected.
    • -
    • Don't use music, so you don't rely on music to change your state of mind. Similar regular guided meditation. Guided meditation is good for learning a technique, but you should not rely on another voice.
    • -
    • You are not trying to relax. Relaxing and trying are two different things.
    • -
    • When you love everything, even the bad things happening to you, then you are invincible.
    • -
    • Happiness is all in your mind. As if you flip a switch there.
    • -
    • Digging for answers will never end. It will always cause more material to dig.
    • +
    • True HA: Unlike our current setup, which requires manual failover, MooseFS provides automatic failover with no single point of failure.
    • +
    • POSIX compliance: Applications can use MooseFS like any regular filesystem, no code changes needed.
    • +
    • Flexible redundancy: Configure different replication levels per directory or file, optimising storage efficiency.
    • +
    • FreeBSD support: MooseFS has native FreeBSD support, making it a natural fit for the f3s project.

    -If happiness is a mental issue. Clearly, the best time is spent training your mind in your free time and don't always be busy with other things. E.g. meditation, or think about the benefits of meditation. All that we do in our free time is search for happiness. Are the things we do actually working? There is always something around the corner...
    +Both technologies could run on top of our encrypted ZFS volumes, combining ZFS's data integrity and encryption features with distributed storage capabilities. This would be particularly interesting for workloads that need either S3-compatible APIs (MinIO) or transparent distributed POSIX storage (MooseFS). What about Ceph and GlusterFS? Unfortunately, there doesn't seem to be great native FreeBSD support for them. However, other alternatives also appear suitable for my use case.

    -E-Mail your comments to paul@nospam.buetow.org :-)
    +Read the next post of this series:

    -Other book notes of mine are:
    +f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments

    -2025-11-02 'The Courage To Be Disliked' book notes
    -2025-06-07 'A Monk's Guide to Happiness' book notes (You are currently reading this)
    -2025-04-19 'When: The Scientific Secrets of Perfect Timing' book notes
    -2024-10-24 'Staff Engineer' book notes
    -2024-07-07 'The Stoic Challenge' book notes
    -2024-05-01 'Slow Productivity' book notes
    -2023-11-11 'Mind Management' book notes
    -2023-07-17 'Software Developers Career Guide and Soft Skills' book notes
    -2023-05-06 'The Obstacle is the Way' book notes
    -2023-04-01 'Never split the difference' book notes
    -2023-03-16 'The Pragmatic Programmer' book notes
    +Other *BSD-related posts:
    +
    +2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability
    +2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    +2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage (You are currently reading this)
    +2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    +2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    +2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    +2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    +2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    +2024-04-01 KISS high-availability with OpenBSD
    +2024-01-13 One reason why I love OpenBSD
    +2022-10-30 Installing DTail on OpenBSD
    +2022-07-30 Let's Encrypt with OpenBSD and Rex
    +2016-04-09 Jails and ZFS with Puppet on FreeBSD
    +
    +E-Mail your comments to paul@nospam.buetow.org

    Back to the main site
    - f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network - - gemini://foo.zone/gemfeed/2025-05-11-f3s-kubernetes-with-freebsd-part-5.gmi - 2025-05-11T11:35:57+03:00, last updated Thu 15 Jan 19:30:46 EET 2026 + Posts from January to June 2025 + + gemini://foo.zone/gemfeed/2025-07-01-posts-from-january-to-june-2025.gmi + 2025-07-01T22:39:29+03:00 Paul Buetow aka snonux paul@dev.buetow.org - This is the fifth blog post about my f3s series for my self-hosting demands in my home lab. f3s? The 'f' stands for FreeBSD, and the '3s' stands for k3s, the Kubernetes distribution I will use on FreeBSD-based physical machines. + These are my social media posts from the last six months. I keep them here to reflect on them and also to not lose them. Social media networks come and go and are not under my control, but my domain is here to stay.
    -

    f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network


    -
    -Published at 2025-05-11T11:35:57+03:00, last updated Thu 15 Jan 19:30:46 EET 2026
    -
    -This is the fifth blog post about my f3s series for my self-hosting demands in my home lab. f3s? The "f" stands for FreeBSD, and the "3s" stands for k3s, the Kubernetes distribution I will use on FreeBSD-based physical machines.
    -
    -I will post a new entry every month or so (there are too many other side projects for more frequent updates — I bet you can understand).
    -
    -This post has been updated to include two roaming clients (earth - Fedora laptop, pixel7pro - Android phone) that connect to the mesh via the internet gateways. The updated content is integrated throughout the post.
    -
    -These are all the posts so far:
    -
    -2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    -2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    -2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    -2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    -2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network (You are currently reading this)
    -2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    -2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    -2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability
    -
    -f3s logo
    -
    -ChatGPT generated logo.
    -
    -Let's begin...
    -
    -

    Table of Contents


    -
    -
    -

    Introduction


    +

    Posts from January to June 2025



    -By default, traffic within my home LAN, including traffic inside a k3s cluster, is not encrypted. While it resides in the "secure" home LAN, adopting a zero-trust policy means encryption is still preferable to ensure confidentiality and security. So we decide to secure all the traffic of all f3s participating hosts by building a mesh network:
    +Published at 2025-07-01T22:39:29+03:00

    -WireGuard mesh network topology
    +These are my social media posts from the last six months. I keep them here to reflect on them and also to not lose them. Social media networks come and go and are not under my control, but my domain is here to stay.

    -The mesh network consists of eight infrastructure hosts and two roaming clients:
    +These are from Mastodon and LinkedIn. Have a look at my about page for my social media profiles. This list is generated with Gos, my social media platform sharing tool.

    -Infrastructure hosts (full mesh):
    +My about page
    +https://codeberg.org/snonux/gos

    -
      -
    • f0, f1, and f2 are the FreeBSD base hosts in my home LAN
    • -
    • r0, r1, and r2 are the Rocky Linux Bhyve VMs running on the FreeBSD hosts
    • -
    • blowfish and fishfinger are two OpenBSD systems running on the internet (as mentioned in the first blog of this series—these systems are already built; in fact, this very blog is served by those OpenBSD systems)
    • -

    -oaming clients (gateway-only connections):
    +

    Table of Contents




    -As we can see from the diagram, the eight infrastructure hosts form a true full-mesh network, where every host has a VPN tunnel to every other host. The benefit is that we do not need to route traffic through intermediate hosts (significantly simplifying the routing configuration). However, the downside is that there is some overhead in configuring and managing all the tunnels. The roaming clients take a simpler approach—they only connect to the two internet-facing gateways (blowfish and fishfinger), which is sufficient for remote access and internet connectivity.
    +

    January 2025



    -For simplicity, we also establish VPN tunnels between f0 <-> r0, f1 <-> r1, and f2 <-> r2. Technically, this wouldn't be strictly required since the VMs rN are running on the hosts fN, and no network traffic is leaving the box. However, it simplifies the configuration as we don't have to account for exceptions, and we are going to automate the mesh network configuration anyway (read on).
    +

    I am currently binge-listening to the Google ...



    -

    Expected traffic flow


    +I am currently binge-listening to the Google #SRE ProdCast. It's really great to learn about the stories of individual SREs and their journeys. It is not just about SREs at Google; there are also external guests.

    -The traffic is expected to flow between the host groups through the mesh network as follows:
    +sre.google/prodcast/

    -nfrastructure mesh traffic:
    +

    Recently, there was a >5000 LOC #bash ...



    -
      -
    • fN <-> rN: The traffic between the FreeBSD hosts and the Rocky Linux VMs will be routed through the VPN tunnels for persistent storage. In a later post in this series, we will set up an NFS server on the fN hosts.
    • -
    • fN <-> blowfish,fishfinger: The traffic between the FreeBSD hosts and the OpenBSD host blowfish,fishfinger will be routed through the VPN tunnels for management. We may want to log in via the internet to set it up remotely. The VPN tunnel will also be used for monitoring purposes.
    • -
    • rN <-> blowfish,fishfinger: The traffic between the Rocky Linux VMs and the OpenBSD host blowfish,fishfinger will be routed through the VPN tunnels for usage traffic. Since k3s will be running on the rN hosts, the OpenBSD servers will route the traffic through relayd to the services running in Kubernetes.
    • -
    • fN <-> fM: The traffic between the FreeBSD hosts may be later used for data replication for the NFS storage.
    • -
    • rN <-> rM: The traffic between the Rocky Linux VMs will later be used by the k3s cluster itself, as every rN will be a Kubernetes worker node.
    • -
    • blowfish <-> fishfinger: The traffic between the OpenBSD hosts isn't strictly required for this setup, but I set it up anyway for future use cases.
    • -

    -oaming client traffic:
    +Recently, there was a >5000 LOC #bash codebase at work that reported the progress of a migration, nobody understood it and it was wonky (sometimes it would not return the desired results). On top of that, the coding style was very bad as well (I could rant forever here). The engineer who wrote it left the company. I rewrote it in #Perl in about 300 LOC. Colleagues asked why not Python. Perl is the perfect choice here—it's even in its name: Practical Extraction and Report Language!

    -
      -
    • earth,pixel7pro <-> blowfish,fishfinger: The roaming clients connect exclusively to the two internet gateways. All traffic from these clients (0.0.0.0/0) is routed through the VPN, providing secure internet access and the ability to reach services running in the mesh (via the gateways). The gateways use NAT to allow roaming clients to access the internet using the gateway's public IP address. The roaming clients cannot be reached by the LAN hosts—they are client-only and initiate all connections.
    • -

    -We won't cover all the details in this blog post, as we only focus on setting up the Mesh network in this blog post. Subsequent posts in this series will cover the other details.
    +

    Ghostty is a terminal emulator that was ...



    -

    Deciding on WireGuard


    +Ghostty is a terminal emulator that was recently released publicly as open-source. I love that it works natively on both Linux and macOS; it looks great (font rendering) and is fast and customizable via a config file (which I manage with a config mng system). Ghostty is a passion project written in Zig, the author loved the community so much while working on it that he donated $300k to the Zig Foundation. #terminal #emulator

    -I have decided to use WireGuard as the VPN technology for this purpose.
    +ghostty.org

    -WireGuard is a lightweight, modern, and secure VPN protocol designed for simplicity, speed, and strong cryptography. It is an excellent choice due to its minimal codebase, ease of configuration, high performance, and robust security, utilizing state-of-the-art encryption standards. WireGuard is supported on various operating systems, and its implementations are compatible with each other. Therefore, establishing WireGuard VPN tunnels between FreeBSD, Linux, and OpenBSD is seamless. This cross-platform availability makes it suitable for setups like the one described in this blog series.
    +

    Go is not an easy programming language. Don't ...



    -We could have used Tailscale for an easy to set up and manage the WireGuard network, but the benefits of creating our own mesh network are:
    +Go is not an easy programming language. Don't confuse easy with simple syntax. I'd agree to this. With the recent addition of Generics to the language I also feel that even the syntax stops being simple.. Also, simplicity is complex (especially under the hood how the language works - there are many mechanics you need to know if you really want to master the language). #golang

    -
      -
    • Learning about WireGuard configuration details
    • -
    • Have full control over the setup
    • -
    • Don't rely on an external provider like Tailscale (even if some of the components are open-source)
    • -
    • Have even more fun along the way
    • -
    • WireGuard is easy to configure on my target operating systems and, therefore, easier to maintain in the long run.
    • -
    • There are no official Tailscale packages available for OpenBSD and FreeBSD. However, getting Tailscale running on these systems is still possible, though some tinkering would be required. Instead, we use that tinkering time to set up WireGuard tunnels ourselves.
    • -

    -https://en.wikipedia.org/wiki/WireGuard
    -https://www.wireguard.com/
    -https://tailscale.com/
    +www.arp242.net/go-easy.html

    -WireGuard Logo
    +

    How will AI change software engineering (or has ...



    -

    Base configuration


    +How will AI change software engineering (or has it already)? The bottom line is that less experienced engineers may have problems (accepting incomplete or incorrect programs, only reaching 70 percent solutions), while experienced engineers can leverage AI to boost their performance as they know how to fix the remaining 30 percent of the generated code. #ai #engineering #software

    -In the following, we prepare the base configuration for the WireGuard mesh network. We will use a similar configuration on all participating hosts, with the exception of the host IP addresses and the private keys.
    +newsletter.pragmaticengineer.com/p/how-ai-will-change-software-engineering

    -

    FreeBSD


    +

    Eliminating toil - Toil is not always a bad ...



    -On the FreeBSD hosts f0, f1 and f2, similar as last time, first, we bring the system up to date:
    +Eliminating toil - Toil is not always a bad thing - some even enjoy toil - it is calming in small amounts - but it becomes toxic in large amounts - #SRE

    - -
    paul@f0:~ % doas freebsd-update fetch
    -paul@f0:~ % doas freebsd-update install
    -paul@f0:~ % doas shutdown -r now
    -..
    -..
    -paul@f0:~ % doas pkg update
    -paul@f0:~ % doas pkg upgrade
    -paul@f0:~ % reboot
    -
    +sre.google/sre-book/eliminating-toil/

    -Next, we install wireguard-tools and configure the WireGuard service:
    +

    Fun read. How about using the character ...



    - -
    paul@f0:~ % doas pkg install wireguard-tools
    -paul@f0:~ % doas sysrc wireguard_interfaces=wg0
    -wireguard_interfaces:  -> wg0
    -paul@f0:~ % doas sysrc wireguard_enable=YES
    -wireguard_enable:  -> YES
    -paul@f0:~ % doas mkdir -p /usr/local/etc/wireguard
    -paul@f0:~ % doas touch /usr/local/etc/wireguard/wg0.conf
    -paul@f0:~ % doas service wireguard start
    -paul@f0:~ % doas wg show
    -interface: wg0
    -  public key: L+V9o0fNYkMVKNqsX7spBzD/9oSvxM/C7ZCZX1jLO3Q=
    -  private key: (hidden)
    -  listening port: 20246
    -
    +Fun read. How about using the character sequence :-) as a statement separator in a programming language?

    -We now have the WireGuard up and running, but it is not yet in any functional configuration. We will come back to that later.
    +ntietz.com/blog/researching-why-we-use-semicolons-as-statement-terminators/

    -Next, we add all the participating WireGuard IPs to the hosts file. This is only convenience, so we don't have to manage an external DNS server for this:
    +

    Thats unexpected, you cant remove a NaN key ...



    - -
    paul@f0:~ % cat <<END | doas tee -a /etc/hosts
    -
    -192.168.1.120 r0 r0.lan r0.lan.buetow.org
    -192.168.1.121 r1 r1.lan r1.lan.buetow.org
    -192.168.1.122 r2 r2.lan r2.lan.buetow.org
    -
    -192.168.2.130 f0.wg0 f0.wg0.wan.buetow.org
    -192.168.2.131 f1.wg0 f1.wg0.wan.buetow.org
    -192.168.2.132 f2.wg0 f2.wg0.wan.buetow.org
    -
    -192.168.2.120 r0.wg0 r0.wg0.wan.buetow.org
    -192.168.2.121 r1.wg0 r1.wg0.wan.buetow.org
    -192.168.2.122 r2.wg0 r2.wg0.wan.buetow.org
    -
    -192.168.2.110 blowfish.wg0 blowfish.wg0.wan.buetow.org
    -192.168.2.111 fishfinger.wg0 fishfinger.wg0.wan.buetow.org
    -
    -fd42:beef:cafe:2::130 f0.wg0 f0.wg0.wan.buetow.org
    -fd42:beef:cafe:2::131 f1.wg0 f1.wg0.wan.buetow.org
    -fd42:beef:cafe:2::132 f2.wg0 f2.wg0.wan.buetow.org
    -
    -fd42:beef:cafe:2::120 r0.wg0 r0.wg0.wan.buetow.org
    -fd42:beef:cafe:2::121 r1.wg0 r1.wg0.wan.buetow.org
    -fd42:beef:cafe:2::122 r2.wg0 r2.wg0.wan.buetow.org
    -
    -fd42:beef:cafe:2::110 blowfish.wg0 blowfish.wg0.wan.buetow.org
    -fd42:beef:cafe:2::111 fishfinger.wg0 fishfinger.wg0.wan.buetow.org
    -END
    -
    +Thats unexpected, you cant remove a NaN key from a map without clearing it! #golang via @wallabagapp

    -As you can see, 192.168.1.0/24 is the network used in my LAN (with the fN and rN hosts) and 192.168.2.0/24 is the network used for the WireGuard mesh network. The wg0 interface will be used for all WireGuard traffic.
    +unexpected-go.com/you-cant-remove-a-nan-key-from-a-map-without-clearing-it.html

    -

    Rocky Linux


    +

    Nice refresher for #shell #bash #zsh ...



    -We bring the Rocky Linux VMs up to date as well with the following:
    +Nice refresher for #shell #bash #zsh redirection rules

    - -
    [root@r0 ~] dnf update -y
    -[root@r0 ~] reboot
    -
    +rednafi.com/misc/shell_redirection/

    -Next, we prepare WireGuard on them. Same as on the FreeBSD hosts, we will only prepare WireGuard without any useful configuration yet:
    -
    - -
    [root@r0 ~] dnf install -y wireguard-tools
    -[root@r0 ~] mkdir -p /etc/wireguard
    -[root@r0 ~] touch /etc/wireguard/wg0.conf
    -[root@r0 ~] systemctl enable wg-quick@wg0.service
    -[root@r0 ~] systemctl start wg-quick@wg0.service
    -[root@r0 ~] systemctl disable firewalld
    -
    +

    I think discussing action items in incident ...



    -We also update the hosts file accordingly:
    +I think discussing action items in incident reviews is important. At least the obvious should be captured and noted down. It does not mean that the action items need to be fully refined in the review meeting; that would be out of scope, in my opinion.

    - -
    [root@r0 ~] cat <<END >>/etc/hosts
    -
    -192.168.1.130 f0 f0.lan f0.lan.buetow.org
    -192.168.1.131 f1 f1.lan f1.lan.buetow.org
    -192.168.1.132 f2 f2.lan f2.lan.buetow.org
    -
    -192.168.2.130 f0.wg0 f0.wg0.wan.buetow.org
    -192.168.2.131 f1.wg0 f1.wg0.wan.buetow.org
    -192.168.2.132 f2.wg0 f2.wg0.wan.buetow.org
    -
    -192.168.2.120 r0.wg0 r0.wg0.wan.buetow.org
    -192.168.2.121 r1.wg0 r1.wg0.wan.buetow.org
    -192.168.2.122 r2.wg0 r2.wg0.wan.buetow.org
    -
    -192.168.2.110 blowfish.wg0 blowfish.wg0.wan.buetow.org
    -192.168.2.111 fishfinger.wg0 fishfinger.wg0.wan.buetow.org
    -
    -fd42:beef:cafe:2::130 f0.wg0 f0.wg0.wan.buetow.org
    -fd42:beef:cafe:2::131 f1.wg0 f1.wg0.wan.buetow.org
    -fd42:beef:cafe:2::132 f2.wg0 f2.wg0.wan.buetow.org
    -
    -fd42:beef:cafe:2::120 r0.wg0 r0.wg0.wan.buetow.org
    -fd42:beef:cafe:2::121 r1.wg0 r1.wg0.wan.buetow.org
    -fd42:beef:cafe:2::122 r2.wg0 r2.wg0.wan.buetow.org
    -
    -fd42:beef:cafe:2::110 blowfish.wg0 blowfish.wg0.wan.buetow.org
    -fd42:beef:cafe:2::111 fishfinger.wg0 fishfinger.wg0.wan.buetow.org
    -END
    -
    +surfingcomplexity.blog/2024/09/28/why-..-..-action-items-during-incident-reviews/

    -Unfortunately, the SELinux policy on Rocky Linux blocks WireGuard's operation. By making the wireguard_t domain permissive using semanage permissive -a wireguard_t, SELinux will no longer enforce restrictions for WireGuard, allowing it to work as intended:
    +

    At first, functional options add a bit of ...



    - -
    [root@r0 ~] dnf install -y policycoreutils-python-utils
    -[root@r0 ~] semanage permissive -a wireguard_t
    -[root@r0 ~] reboot
    -
    +At first, functional options add a bit of boilerplate, but they turn out to be quite neat, especially when you have very long parameter lists that need to be made neat and tidy. #golang

    -https://github.com/angristan/wireguard-install/discussions/499
    +www.calhoun.io/using-functional-options-instead-of-method-chaining-in-go/

    -

    OpenBSD


    +

    In the "Working with an SRE Interview" I have ...



    -Other than the FreeBSD and Rocky Linux hosts involved, my OpenBSD hosts (blowfish and fishfinger, which are running at OpenBSD Amsterdam and Hetzner on the internet) have been running already for longer, so I can't provide you with the "from scratch" installation details here. In the following, we will only focus on the additional configuration needed to set up WireGuard:
    +In the "Working with an SRE Interview" I have been askd about what it's like working with an SRE! We'd covered much more in depth, but we decided not to make it too long in the final version! #sre #interview

    - -
    blowfish$ doas pkg_add wireguard-tools
    -blowfish$ doas mkdir /etc/wireguard
    -blowfish$ doas touch /etc/wireguard/wg0.conf
    -blowsish$ cat <<END | doas tee /etc/hostname.wg0
    -inet 192.168.2.110 255.255.255.0 NONE
    -up
    -!/usr/local/bin/wg setconf wg0 /etc/wireguard/wg0.conf
    -END
    -
    +foo.zone/gemfeed/2025-01-15-working-with-an-sre-interview.gmi (Gemini)
    +foo.zone/gemfeed/2025-01-15-working-with-an-sre-interview.html

    -Note that on blowfish, we configure 192.168.2.110 here in the hostname.wg, and on fishfinger, we configure 192.168.2.111. Those are the IP addresses of the WireGuard interfaces on those hosts.
    +

    Small introduction to the #Android ...



    -And here, we also update the hosts file accordingly:
    +Small introduction to the #Android distribution called #GrapheneOS For myself, I am using a Pixel 7 Pro, which comes with "only" 5 years of support (not yet 7 years like the Pixel 8 and 9 series). I also wrote about GrapheneOS here once:

    - -
    blowfish$ cat <<END | doas tee -a /etc/hosts
    -
    -192.168.2.130 f0.wg0 f0.wg0.wan.buetow.org
    -192.168.2.131 f1.wg0 f1.wg0.wan.buetow.org
    -192.168.2.132 f2.wg0 f2.wg0.wan.buetow.org
    -
    -192.168.2.120 r0.wg0 r0.wg0.wan.buetow.org
    -192.168.2.121 r1.wg0 r1.wg0.wan.buetow.org
    -192.168.2.122 r2.wg0 r2.wg0.wan.buetow.org
    -
    -192.168.2.110 blowfish.wg0 blowfish.wg0.wan.buetow.org
    -192.168.2.111 fishfinger.wg0 fishfinger.wg0.wan.buetow.org
    -192.168.2.200 earth.wg0 earth.wg0.wan.buetow.org
    -192.168.2.201 pixel7pro.wg0 pixel7pro.wg0.wan.buetow.org
    -
    -fd42:beef:cafe:2::130 f0.wg0 f0.wg0.wan.buetow.org
    -fd42:beef:cafe:2::131 f1.wg0 f1.wg0.wan.buetow.org
    -fd42:beef:cafe:2::132 f2.wg0 f2.wg0.wan.buetow.org
    -
    -fd42:beef:cafe:2::120 r0.wg0 r0.wg0.wan.buetow.org
    -fd42:beef:cafe:2::121 r1.wg0 r1.wg0.wan.buetow.org
    -fd42:beef:cafe:2::122 r2.wg0 r2.wg0.wan.buetow.org
    -
    -fd42:beef:cafe:2::110 blowfish.wg0 blowfish.wg0.wan.buetow.org
    -fd42:beef:cafe:2::111 fishfinger.wg0 fishfinger.wg0.wan.buetow.org
    -fd42:beef:cafe:2::200 earth.wg0 earth.wg0.wan.buetow.org
    -fd42:beef:cafe:2::201 pixel7pro.wg0 pixel7pro.wg0.wan.buetow.org
    -END
    -
    +dataswamp.org/~solene/2025-01-12-intro-to-grapheneos.html
    +foo.zone/gemfeed/2023-01-23-why-grapheneos-rox.gmi (Gemini)
    +foo.zone/gemfeed/2023-01-23-why-grapheneos-rox.html

    -To enable roaming clients (like earth and pixel7pro) to access the internet through the VPN, we need to configure NAT on the OpenBSD gateways. This allows the roaming clients to use the gateway's public IP address for outbound traffic. We add the following to /etc/pf.conf on both blowfish and fishfinger:
    +

    Helix 2025.01 has been released. The completion ...



    - -
    # NAT for WireGuard clients to access internet
    -match out on vio0 from 192.168.2.0/24 to any nat-to (vio0)
    -
    -# Allow inbound traffic on WireGuard interface
    -pass in on wg0
    -
    -# Allow all UDP traffic on WireGuard port
    -pass in inet proto udp from any to any port 56709
    -
    +Helix 2025.01 has been released. The completion of path names and the snippet functionality will be particularly useful for me. Overall, it's a great release. The release notes cover only some highlights, but there are many more changes in this version so also have a look at the Changelog! #HelixEditor

    -The NAT rule translates outgoing traffic from the WireGuard network (192.168.2.0/24) to the gateway's public IP. The firewall rules permit WireGuard traffic on the wg0 interface and UDP port 56709. After updating /etc/pf.conf, reload the firewall:
    +helix-editor.com/news/release-25-01-highlights/

    - -
    blowfish$ doas pfctl -f /etc/pf.conf
    -
    +

    I found these are excellent examples of how ...



    -

    WireGuard configuration


    +I found these are excellent examples of how #OpenBSD's #relayd can be used.

    -So far, we have only started WireGuard on all participating hosts without any useful configuration. This means that no VPN tunnel has been established yet between any of the hosts.
    +www.tumfatig.net/2023/using-openbsd-relayd8-as-an-application-layer-gateway/

    -

    Example wg0.conf


    +

    LLMs for Ops? Summaries of logs, probabilities ...



    -Generally speaking, a wg0.conf looks like this (example from f0 host):
    +LLMs for Ops? Summaries of logs, probabilities about correctness, auto-generating Ansible, some uses cases are there. Wouldn't trust it fully, though.

    -
    -[Interface]
    -# f0.wg0.wan.buetow.org
    -Address = 192.168.2.130
    -PrivateKey = **************************
    -ListenPort = 56709
    -
    -[Peer]
    -# f1.lan.buetow.org as f1.wg0.wan.buetow.org
    -PublicKey = **************************
    -PresharedKey = **************************
    -AllowedIPs = 192.168.2.131/32
    -Endpoint = 192.168.1.131:56709
    -# No KeepAlive configured
    -
    -[Peer]
    -# f2.lan.buetow.org as f2.wg0.wan.buetow.org
    -PublicKey = **************************
    -PresharedKey = **************************
    -AllowedIPs = 192.168.2.132/32
    -Endpoint = 192.168.1.132:56709
    -# No KeepAlive configured
    -
    -[Peer]
    -# r0.lan.buetow.org as r0.wg0.wan.buetow.org
    -PublicKey = **************************
    -PresharedKey = **************************
    -AllowedIPs = 192.168.2.120/32
    -Endpoint = 192.168.1.120:56709
    -# No KeepAlive configured
    -
    -[Peer]
    -# r1.lan.buetow.org as r1.wg0.wan.buetow.org
    -PublicKey = **************************
    -PresharedKey = **************************
    -AllowedIPs = 192.168.2.121/32
    -Endpoint = 192.168.1.121:56709
    -# No KeepAlive configured
    -
    -[Peer]
    -# r2.lan.buetow.org as r2.wg0.wan.buetow.org
    -PublicKey = **************************
    -PresharedKey = **************************
    -AllowedIPs = 192.168.2.122/32
    -Endpoint = 192.168.1.122:56709
    -# No KeepAlive configured
    -
    -[Peer]
    -# blowfish.buetow.org as blowfish.wg0.wan.buetow.org
    -PublicKey = **************************
    -PresharedKey = **************************
    -AllowedIPs = 192.168.2.110/32
    -Endpoint = 23.88.35.144:56709
    -PersistentKeepalive = 25
    -
    -[Peer]
    -# fishfinger.buetow.org as fishfinger.wg0.wan.buetow.org
    -PublicKey = **************************
    -PresharedKey = **************************
    -AllowedIPs = 192.168.2.111/32
    -Endpoint = 46.23.94.99:56709
    -PersistentKeepalive = 25
    -
    +youtu.be/WodaffxVq-E?si=noY0egrfl5izCSQI

    -For roaming clients like pixel7pro (Android phone) or earth (Fedora laptop), the configuration looks different because they route all traffic through the VPN and only connect to the internet gateways:
    +

    Enjoying an APC Power-UPS BX750MI in my ...



    -
    -[Interface]
    -# pixel7pro.wg0.wan.buetow.org
    -Address = 192.168.2.201
    -PrivateKey = **************************
    -ListenPort = 56709
    -DNS = 1.1.1.1, 8.8.8.8
    -
    -[Peer]
    -# blowfish.buetow.org as blowfish.wg0.wan.buetow.org
    -PublicKey = **************************
    -PresharedKey = **************************
    -AllowedIPs = 0.0.0.0/0, ::/0
    -Endpoint = 23.88.35.144:56709
    -PersistentKeepalive = 25
    -
    -[Peer]
    -# fishfinger.buetow.org as fishfinger.wg0.wan.buetow.org
    -PublicKey = **************************
    -PresharedKey = **************************
    -AllowedIPs = 0.0.0.0/0, ::/0
    -Endpoint = 46.23.94.99:56709
    -PersistentKeepalive = 25
    -
    +Enjoying an APC Power-UPS BX750MI in my #homelab with #FreeBSD and apcupsd. I can easily use the UPS status to auto-shutdown a cluster of FreeBSD machines on a power cut. One FreeBSD machine acts as the apcupsd master, connected via USB to the APC, while the remaining machines read the status remotely via the apcupsd network port from the master. However, it won't work when the master is down. #APC #UPS

    -Note the key differences for roaming clients:
    -
      -
    • DNS is configured to use external DNS servers (Cloudflare and Google)
    • -
    • AllowedIPs = 0.0.0.0/0, ::/0 routes all traffic (IPv4 and IPv6) through the VPN
    • -
    • Only two peers are configured (the internet gateways), not the full mesh
    • -
    • PersistentKeepalive = 25 is used for both peers to maintain NAT traversal
    • -

    -Whereas there are two main sections. One is [Interface], which configures the current host (here: f0 or pixel7pro):
    +

    "Even in the projects where I'm the only ...



    -
      -
    • Address: Local virtual IP address on the WireGuard interface.
    • -
    • PrivateKey: Private key for this node.
    • -
    • ListenPort: Port on which this WireGuard interface listens for incoming connections.
    • -

    -And in the following, there is one [Peer] section for every peer node on the mesh network:
    +"Even in the projects where I'm the only person, there are at least three people involved: past me, present me, and future me." - Quote from #software #programming

    -
      -
    • PublicKey: The public key of the remote peer is used to authenticate their identity.
    • -
    • PresharedKey: An optional symmetric key is used to enhance security (used in addition to PublicKey).
    • -
    • AllowedIPs: IPs or subnets routed through this peer (traffic is allowed to/from these IPs).
    • -
    • Endpoint: The public IP:port combination of the remote peer for connection.
    • -
    • PersistentKeepalive: Keeps the tunnel alive by sending periodic packets; used for NAT traversal.
    • -

    -

    NAT traversal and keepalive


    +liw.fi/40/#index1h1

    -As all participating hosts, except for blowfish and fishfinger (which are on the internet), are behind a NAT gateway (my home router), we need to use PersistentKeepalive to establish and maintain the VPN tunnel from the LAN to the internet because:
    +

    Connecting an #UPS to my #FreeBSD cluster ...



    -By default, WireGuard tries to be as silent as possible when not being used; it is not a chatty protocol. For the most part, it only transmits data when a peer wishes to send packets. When it's not being asked to send packets, it stops sending packets until it is asked again. In the majority of configurations, this works well. However, when a peer is behind NAT or a firewall, it might wish to be able to receive incoming packets even when it is not sending any packets. Because NAT and stateful firewalls keep track of "connections", if a peer behind NAT or a firewall wishes to receive incoming packets, he must keep the NAT/firewall mapping valid, by periodically sending keepalive packets. This is called persistent keepalives. When this option is enabled, a keepalive packet is sent to the server endpoint once every interval seconds. A sensible interval that works with a wide variety of firewalls is 25 seconds. Setting it to 0 turns the feature off, which is the default, since most users will not need this, and it makes WireGuard slightly more chatty. This feature may be specified by adding the PersistentKeepalive = field to a peer in the configuration file, or setting persistent-keepalive at the command line. If you don't need this feature, don't enable it. But if you're behind NAT or a firewall and you want to receive incoming connections long after network traffic has gone silent, this option will keep the "connection" open in the eyes of NAT.
    +Connecting an #UPS to my #FreeBSD cluster in my #homelab, protecting it from power cuts!

    -That's why you see PersistentKeepAlive = 25 in the blowfish and fishfinger peer configurations. This means that every 25 seconds, a keep-alive signal is sent over the tunnel to maintain its connection. If the tunnel is not yet established, it will be created within 25 seconds latest.
    +foo.zone/gemfeed/2025-02-01-f3s-kubernetes-with-freebsd-part-3.gmi (Gemini)
    +foo.zone/gemfeed/2025-02-01-f3s-kubernetes-with-freebsd-part-3.html

    -Without this, we might never have a VPN tunnel open, as the systems in the LAN may not actively attempt to contact blowfish and fishfinger on their own. In fact, the opposite would likely occur, with the traffic flowing inward instead of outward (this is beyond the scope of this blog post but will be covered in a later post in this series!).
    +

    So, the Co-founder and CTO of honeycomb.io and ...



    -

    Preshared key


    +So, the Co-founder and CTO of honeycomb.io and author of the book Observability Engineering always hated observability. And Distinguished Software Engineer and The Pragmatic Engineer host can't pronounce the word Observability. :-) No, jokes aside, I liked this podcast episode of The Pragmatic Engineer: Observability: the present and future, with Charity Majors #sre #observability

    -In a WireGuard configuration, the PSK (preshared key) is an optional additional layer of symmetric encryption used alongside the standard public key cryptography. It is a shared secret known to both peers that enhances security by requiring an attacker to compromise both the private keys and the PSK to decrypt communication. While optional, using a PSK is better as it strengthens the cryptographic security, mitigating risks of potential vulnerabilities in the key exchange process.
    +newsletter.pragmaticengineer.com/p/observability-the-present-and-future

    -So, because it's better, we are using it.
    +

    February 2025



    -

    Mesh network generator


    +

    I don't know about you, but at work, I usually ...



    -Manually generating wg0.conf files for every peer in a mesh network setup is cumbersome because each peer requires its own unique public/private key pair and a preshared key for each VPN tunnel (resulting in 29 preshared keys for 8 hosts). This complexity scales almost exponentially with the number of peers as the relationships between all peers must be explicitly defined, including their unique configurations such as AllowedIPs and Endpoint and optional settings like PersistentKeepalive. Automating the process ensures consistency, reduces human error, saves considerable time, and allows for centralized management of configuration files.
    +I don't know about you, but at work, I usually deal with complex setups involving thousands of servers and work in a complex hybrid microservices-based environment (cloud and on-prem), where homelabbing (as simple as described in my blog post) is really relaxing and recreative. So, I was homelabbing a bit again, securing my #FreeBSD cluster from power cuts. #UPS #recreative

    -Instead, a script can handle key generation, coordinate relationships, and generate all necessary configuration files simultaneously, making it scalable and far less error-prone.
    +foo.zone/gemfeed/2025-02-01-f3s-kubernetes-with-freebsd-part-3.gmi (Gemini)
    +foo.zone/gemfeed/2025-02-01-f3s-kubernetes-with-freebsd-part-3.html

    -I have written a Ruby script wireguardmeshgenerator.rb to do this for our purposes:
    +

    Great proposal (got accepted by the Goteam) for ...



    -https://codeberg.org/snonux/wireguardmeshgenerator
    +Great proposal (got accepted by the Goteam) for safer file system open functions #golang

    -I use Fedora Linux as my main driver on my personal Laptop, so the script was developed and tested only on Fedora Linux. However, it should also work on other Linux and Unix-like systems.
    +github.com/golang/go/issues/67002

    -To set up the mesh generator on Fedora Linux, we run the following:
    +

    My Gemtexter has only 1320 LOC.... The Biggest ...



    - -
    > git clone https://codeberg.org/snonux/wireguardmeshgenerator
    -> cd ./wireguardmeshgenerator
    -> bundle install
    -> sudo dnf install -y wireguard-tools
    -
    +My Gemtexter has only 1320 LOC.... The Biggest Shell Programs in the World are huuuge... #shell #sh

    -This assumes that Ruby and the bundler gem are already installed. If not, refer to the docs of your distribution.
    +github.com/oils-for-unix/oils/wiki/The-Biggest-Shell-Programs-in-the-World

    -

    wireguardmeshgenerator.yaml


    +

    Against /tmp - He is making a point #unix ...



    -The file wireguardmeshgenerator.yaml configures the mesh generator script.
    +Against /tmp - He is making a point #unix #linux #bsd #filesystem via @wallabagapp

    -
    ----
    -hosts:
    -  f0:
    -    os: FreeBSD
    -    ssh:
    -      user: paul
    -      conf_dir: /usr/local/etc/wireguard
    -      sudo_cmd: doas
    -      reload_cmd: service wireguard reload
    -    lan:
    -      domain: 'lan.buetow.org'
    -      ip: '192.168.1.130'
    -    wg0:
    -      domain: 'wg0.wan.buetow.org'
    -      ip: '192.168.2.130'
    -      ipv6: 'fd42:beef:cafe:2::130'
    -    exclude_peers:
    -      - earth
    -      - pixel7pro
    -  f1:
    -    os: FreeBSD
    -    ssh:
    -      user: paul
    -      conf_dir: /usr/local/etc/wireguard
    -      sudo_cmd: doas
    -      reload_cmd: service wireguard reload
    -    lan:
    -      domain: 'lan.buetow.org'
    -      ip: '192.168.1.131'
    -    wg0:
    -      domain: 'wg0.wan.buetow.org'
    -      ip: '192.168.2.131'
    -      ipv6: 'fd42:beef:cafe:2::131'
    -    exclude_peers:
    -      - earth
    -      - pixel7pro
    -  f2:
    -    os: FreeBSD
    -    ssh:
    -      user: paul
    -      conf_dir: /usr/local/etc/wireguard
    -      sudo_cmd: doas
    -      reload_cmd: service wireguard reload
    -    lan:
    -      domain: 'lan.buetow.org'
    -      ip: '192.168.1.132'
    -    wg0:
    -      domain: 'wg0.wan.buetow.org'
    -      ip: '192.168.2.132'
    -      ipv6: 'fd42:beef:cafe:2::132'
    -    exclude_peers:
    -      - earth
    -      - pixel7pro
    -  r0:
    -    os: Linux
    -    ssh:
    -      user: root
    -      conf_dir: /etc/wireguard
    -      sudo_cmd:
    -      reload_cmd: systemctl reload wg-quick@wg0.service
    -    lan:
    -      domain: 'lan.buetow.org'
    -      ip: '192.168.1.120'
    -    wg0:
    -      domain: 'wg0.wan.buetow.org'
    -      ip: '192.168.2.120'
    -      ipv6: 'fd42:beef:cafe:2::120'
    -    exclude_peers:
    -      - earth
    -      - pixel7pro
    -  r1:
    -    os: Linux
    -    ssh:
    -      user: root
    -      conf_dir: /etc/wireguard
    -      sudo_cmd:
    -      reload_cmd: systemctl reload wg-quick@wg0.service
    -    lan:
    -      domain: 'lan.buetow.org'
    -      ip: '192.168.1.121'
    -    wg0:
    -      domain: 'wg0.wan.buetow.org'
    -      ip: '192.168.2.121'
    -      ipv6: 'fd42:beef:cafe:2::121'
    -    exclude_peers:
    -      - earth
    -      - pixel7pro
    -  r2:
    -    os: Linux
    -    ssh:
    -      user: root
    -      conf_dir: /etc/wireguard
    -      sudo_cmd:
    -      reload_cmd: systemctl reload wg-quick@wg0.service
    -    lan:
    -      domain: 'lan.buetow.org'
    -      ip: '192.168.1.122'
    -    wg0:
    -      domain: 'wg0.wan.buetow.org'
    -      ip: '192.168.2.122'
    -      ipv6: 'fd42:beef:cafe:2::122'
    -    exclude_peers:
    -      - earth
    -      - pixel7pro
    -  blowfish:
    -    os: OpenBSD
    -    ssh:
    -      user: rex
    -      port: 2
    -      conf_dir: /etc/wireguard
    -      sudo_cmd: doas
    -      reload_cmd: sh /etc/netstart wg0
    -    internet:
    -      domain: 'buetow.org'
    -      ip: '23.88.35.144'
    -    wg0:
    -      domain: 'wg0.wan.buetow.org'
    -      ip: '192.168.2.110'
    -      ipv6: 'fd42:beef:cafe:2::110'
    -    exclude_peers:
    -      - earth
    -      - pixel7pro
    -  fishfinger:
    -    os: OpenBSD
    -    ssh:
    -      user: rex
    -      port: 2
    -      conf_dir: /etc/wireguard
    -      sudo_cmd: doas
    -      reload_cmd: sh /etc/netstart wg0
    -    internet:
    -      domain: 'buetow.org'
    -      ip: '46.23.94.99'
    -    wg0:
    -      domain: 'wg0.wan.buetow.org'
    -      ip: '192.168.2.111'
    -      ipv6: 'fd42:beef:cafe:2::111'
    -    exclude_peers:
    -      - earth
    -      - pixel7pro
    -  earth:
    -    os: Linux
    -    wg0:
    -      domain: 'wg0.wan.buetow.org'
    -      ip: '192.168.2.200'
    -      ipv6: 'fd42:beef:cafe:2::200'
    -    exclude_peers:
    -      - f0
    -      - f1
    -      - f2
    -      - r0
    -      - r1
    -      - r2
    -      - pixel7pro
    -  pixel7pro:
    -    os: Android
    -    wg0:
    -      domain: 'wg0.wan.buetow.org'
    -      ip: '192.168.2.201'
    -      ipv6: 'fd42:beef:cafe:2::201'
    -    exclude_peers:
    -      - f0
    -      - f1
    -      - f2
    -      - r0
    -      - r1
    -      - r2
    -      - earth
    -
    +dotat.at/@/2024-10-22-tmp.html
    +
    +

    Random Weird Things Part 2: #blog ...


    +
    +Random Weird Things Part 2: #blog #computing
    +
    +foo.zone/gemfeed/2025-02-08-random-weird-things-ii.gmi (Gemini)
    +foo.zone/gemfeed/2025-02-08-random-weird-things-ii.html
    +
    +

    As a former #Pebble user and fan, thats ...


    +
    +As a former #Pebble user and fan, thats aweaome news. PebbleOS is now open source and there will aoon be a new watch. I don't know about you, but I will be the first getting one :-) #foss
    +
    +ericmigi.com/blog/why-were-bringing-pebble-back
    +
    +

    I think I am slowly getting the point of Cue. ...



    -The file specifies details such as SSH user settings, configuration directories, sudo or reload commands, and IP/domain assignments for both internal LAN-facing interfaces and WireGuard (wg0) interfaces. Each host is assigned specific roles, including internal participants and publicly accessible nodes with internet-facing IPs, enabling the creation of a fully connected mesh VPN.
    +I think I am slowly getting the point of Cue. For example, it can replace both a JSON file and a JSON Schema. Furthermore, you can convert it from and into different formats (Cue, JSON, YAML, Go data types, ...), and you can nicely embed this into a Go project as well. #cue #cuelang #golang #configuration

    -Roaming clients: Note the earth and pixel7pro entries—these are configured differently from the infrastructure hosts. They have no lan or internet sections, which signals to the generator that they are roaming clients. The exclude_peers configuration ensures they only connect to the internet gateways (blowfish and fishfinger) and are not reachable by LAN hosts. The generator automatically configures these clients with AllowedIPs = 0.0.0.0/0, ::/0 to route all traffic through the VPN, includes DNS configuration (1.1.1.1, 8.8.8.8), and enables PersistentKeepalive for NAT traversal.
    +cuelang.org

    -

    wireguardmeshgenerator.rb overview


    +

    Jonathan's reflection of 10 years of ...



    -The wireguardmeshgenerator.rb script consists of the following base classes:
    +Jonathan's reflection of 10 years of programming!

    -
      -
    • KeyTool: Manages WireGuard key generation and retrieval. It ensures the presence of public/private key pairs and preshared keys (PSKs). If keys are missing, it generates them using the wg tool. It provides methods to read the public/private keys and retrieve or generate a PSK for communication with a peer. The keys are stored in a temp directory on the system from where the generator is run.
    • -
    • PeerSnippet: A Struct representing the configuration for a single WireGuard peer in the mesh. Based on the provided attributes and configuration, it generates the peer's WireGuard configuration, including public key, PSK, allowed IPs, endpoint, and keepalive settings.
    • -
    • WireguardConfig: This function generates WireGuard configuration files for the specified host in the mesh network. It includes the [Interface] section for the host itself and the [Peer] sections for all other peers. It can also clean up generated files and directories and create the required directory structure for storing configuration files locally on the system from which the script is run.
    • -
    • InstallConfig: Handles uploading, installing, and restarting the WireGuard service on remote hosts using SSH and SCP. It ensures the configuration file is uploaded to the remote machine, the necessary directories are present and correctly configured, and the WireGuard service reloads with the new configuration.
    • -

    -At the end (if you want to see the code for the stuff listed above, go to the Git repo and have a look), we glue it all together in this block:
    +jonathan-frere.com/posts/10-years-of-programming/

    - -
    begin
    -  options = { hosts: [] }
    -  OptionParser.new do |opts|
    -    opts.banner = 'Usage: wireguardmeshgenerator.rb [options]'
    -    opts.on('--generate', 'Generate Wireguard configs') do
    -      options[:generate] = true
    -    end
    -    opts.on('--install', 'Install Wireguard configs') do
    -      options[:install] = true
    -    end
    -    opts.on('--clean', 'Clean Wireguard configs') do
    -      options[:clean] = true
    -    end
    -    opts.on('--hosts=HOSTS', 'Comma separated hosts to configure') do |hosts|
    -      options[:hosts] = hosts.split(',')
    -    end
    -  end.parse!
    -
    -  conf = YAML.load_file('wireguardmeshgenerator.yaml').freeze
    -  conf['hosts'].keys.select { options[:hosts].empty? || options[:hosts].include?(_1) }
    -               .each do |host|
    -    # Generate Wireguard configuration for the host reload!
    -    WireguardConfig.new(host, conf['hosts']).generate! if options[:generate]
    -    # Install Wireguard configuration for the host.
    -    InstallConfig.new(host, conf['hosts']).upload!.install!.reload! if options[:install]
    -    # Clean Wireguard configuration for the host.
    -    WireguardConfig.new(host, conf['hosts']).clean! if options[:clean]
    -  end
    -rescue StandardError => e
    -  puts "Error: #{e.message}"
    -  puts e.backtrace.join("\n")
    -  exit 2
    -end
    -
    +

    Really enjoyed reading this. Easily digestible ...



    -And we also have a Rakefile:
    +Really enjoyed reading this. Easily digestible summary of what's new in Go 1.24. #golang

    - -
    task :generate do
    -  ruby 'wireguardmeshgenerator.rb', '--generate'
    -end
    -
    -task :clean do
    -  ruby 'wireguardmeshgenerator.rb', '--clean'
    -end
    -
    -task :install do
    -  ruby 'wireguardmeshgenerator.rb', '--install'
    -end
    -
    -task default: :generate
    -
    +antonz.org/go-1-24/

    +

    Some great advice from 40 years of experience ...



    -

    Invoking the mesh network generator


    +Some great advice from 40 years of experience as a software developer. #software #development

    -

    Generating the wg0.conf files and keys


    +liw.fi/40/#index1h1

    -To generate everything (the wg0.conf of all participating hosts, including all keys involved), we run the following:
    +

    I enjoyed this talk, some recipes I knew ...



    - -
    > rake generate
    -/usr/bin/ruby wireguardmeshgenerator.rb --generate
    -Generating dist/f0/etc/wireguard/wg0.conf
    -Generating dist/f1/etc/wireguard/wg0.conf
    -Generating dist/f2/etc/wireguard/wg0.conf
    -Generating dist/r0/etc/wireguard/wg0.conf
    -Generating dist/r1/etc/wireguard/wg0.conf
    -Generating dist/r2/etc/wireguard/wg0.conf
    -Generating dist/blowfish/etc/wireguard/wg0.conf
    -Generating dist/fishfinger/etc/wireguard/wg0.conf
    -Generating dist/earth/etc/wireguard/wg0.conf
    -Generating dist/pixel7pro/etc/wireguard/wg0.conf
    -
    +I enjoyed this talk, some recipes I knew already, others were new to me. The "line of sight" is my favourite, which I always tend to follow. I also liked the example where the speaker simplified a "complex" nested functions into two not-nested-if-statements. #golang

    -It generated all the wg0.conf files listed in the output, plus those keys:
    +www.youtube.com/watch?v=zdKHq9Xo4OY&list=WL&index=5

    - -
    > find keys/ -type f
    -keys/f0/priv.key
    -keys/f0/pub.key
    -keys/psk/f0_f1.key
    -keys/psk/f0_f2.key
    -keys/psk/f0_r0.key
    -keys/psk/f0_r1.key
    -keys/psk/f0_r2.key
    -keys/psk/blowfish_f0.key
    -keys/psk/f0_fishfinger.key
    -keys/psk/f1_f2.key
    -keys/psk/f1_r0.key
    -keys/psk/f1_r1.key
    -keys/psk/f1_r2.key
    -keys/psk/blowfish_f1.key
    -keys/psk/f1_fishfinger.key
    -keys/psk/f2_r0.key
    -keys/psk/f2_r1.key
    -keys/psk/f2_r2.key
    -keys/psk/blowfish_f2.key
    -keys/psk/f2_fishfinger.key
    -keys/psk/r0_r1.key
    -keys/psk/r0_r2.key
    -keys/psk/blowfish_r0.key
    -keys/psk/fishfinger_r0.key
    -keys/psk/r1_r2.key
    -keys/psk/blowfish_r1.key
    -keys/psk/fishfinger_r1.key
    -keys/psk/blowfish_r2.key
    -keys/psk/fishfinger_r2.key
    -keys/psk/blowfish_fishfinger.key
    -keys/psk/blowfish_earth.key
    -keys/psk/earth_fishfinger.key
    -keys/psk/blowfish_pixel7pro.key
    -keys/psk/fishfinger_pixel7pro.key
    -keys/f1/priv.key
    -keys/f1/pub.key
    -keys/f2/priv.key
    -keys/f2/pub.key
    -keys/r0/priv.key
    -keys/r0/pub.key
    -keys/r1/priv.key
    -keys/r1/pub.key
    -keys/r2/priv.key
    -keys/r2/pub.key
    -keys/blowfish/priv.key
    -keys/blowfish/pub.key
    -keys/fishfinger/priv.key
    -keys/fishfinger/pub.key
    -keys/earth/priv.key
    -keys/earth/pub.key
    -keys/pixel7pro/priv.key
    -keys/pixel7pro/pub.key
    -
    +

    A way of how to add the version info to the Go ...


    +
    +A way of how to add the version info to the Go binary. ... I personally just hardcode the version number in version.go and update it there manually for each release. But with Go 1.24, I will try embedding it! #golang
    +
    +jerrynsh.com/3-easy-ways-to-add-version-flag-in-go/
    +
    +

    In other words, using t.Parallel() for ...


    +
    +In other words, using t.Parallel() for lightweight unit tests will likely make them slower.... #golang
    +
    +threedots.tech/post/go-test-parallelism/
    +
    +

    Neat little blog post, showcasing various ...


    +
    +Neat little blog post, showcasing various methods unsed for generic programming before of the introduction of generics. Only reflection wasn't listed. #golang
    +
    +bitfieldconsulting.com/posts/generics
    +
    +

    The smallest thing in Go #golang ...


    +
    +The smallest thing in Go #golang
    +
    +bitfieldconsulting.com/posts/iota
    +
    +

    Fun with defer in #golang, I did't know, that ...


    +
    +Fun with defer in #golang, I did't know, that a defer object can either be heap or stack allocated. And there are some rules for inlining, too.
    +
    +victoriametrics.com/blog/defer-in-go/
    +
    +

    What I like about Go is that it is still ...


    +
    +What I like about Go is that it is still possible to understand what's going on under the hood, whereas in JVM-based languages (for example) or dynamic languages, there are too many optimizations and abstractions. However, you don't need to know too much about how it works under the hood in Go (like memory management in C). It's just the fact that you can—you have a choice. #golang
    +
    +blog.devtrovert.com/p/goroutine-scheduler-revealed-youll
    +
    +

    March 2025


    +
    +

    Television has somewhat transformed how I work ...


    +
    +Television has somewhat transformed how I work in the shell on a day-to-day basis. It is especially useful for me in navigating all the local Git repositories on my laptop. I have bound Ctrl+G in my shell for that now. #television #tv #tool #shell
    +
    +github.com/alexpasmantier/television
    +
    +

    Once in a while, I like to read a book about a ...


    +
    +Once in a while, I like to read a book about a programming language I have been using for a while to find new tricks or to refresh and sharpen my knowledge about it. I just finished reading "Programming Ruby 3.3," and I must say this is my favorite Ruby book now. What makes this one so special is that it is quite recent and covers all the new features. #ruby #programming #coding
    +
    +pragprog.com/titles/ruby5/programming-ruby-3-3-5th-edition/
    +
    +

    As you may have noticed, I like to share on ...


    +
    +As you may have noticed, I like to share on Mastodon and LinkedIn all the technical things I find interesting, and this blog post is technically all about that. Having said that, I love these tiny side projects. They are so relaxing to work on! #gos #golang #tool #programming #fun
    +
    +foo.zone/gemfeed/2025-03-05-sharing-on-social-media-with-gos.gmi (Gemini)
    +foo.zone/gemfeed/2025-03-05-sharing-on-social-media-with-gos.html
    +
    +

    Personally, I think AI (LLMs) are pretty ...


    +
    +Personally, I think AI (LLMs) are pretty useful. But there's really some Hype around that. However, AI is about to stay - its not all hype
    +
    +unixdigest.com/articles/i-passionately-hate-hype-especially-the-ai-hype.html
    +
    +

    Type aliases in #golang, soon also work with ...


    +
    +Type aliases in #golang, soon also work with generics. It's an interesting feature, useful for refactorings and simplifications
    +
    +go.dev/blog/alias-names
    +
    +

    #Perl, my "first love" of programming ...


    +
    +#Perl, my "first love" of programming languages. Still there, still use it here and then (but not as my primary language at the moment). And others do so as well, apparently. Which makes me happy! :-)
    +
    +dev.to/fa5tworm/why-perl-remains-indis..-..e-of-modern-programming-languages-2io0
    +
    +

    I guess there are valid reasons for phttpdget, ...


    +
    +I guess there are valid reasons for phttpdget, which I also don't know about? Maybe complexity and/or licensing of other tools. #FreeBSD
    +
    +l33t.codes/2024/12/05/Updating-FreeBSD-and-Re-Inventing-the-Wheel/
    +
    +

    This is one of the reasons why I like ...


    +
    +This is one of the reasons why I like terminal-based applications so much—they are usually more lightweight than GUI-based ones (and also more flexible).
    +
    +www.arp242.net/stupid-light.html
    +
    +

    Advanced Concurrency Patterns with #Golang ...


    +
    +Advanced Concurrency Patterns with #Golang
    +
    +blogtitle.github.io/go-advanced-concurrency-patterns-part-1/
    +
    +

    #SQLite was designed as an #TCL extension. ...



    -Those keys are embedded in the resulting wg0.conf, so later, we only need to install the wg0.conf files and not all the keys individually.
    +#SQLite was designed as an #TCL extension. There are ~trillion SQLite databases in active use. SQLite heavily relies on #TCL: C code generation via mksqlite3c.tcl, C code isn't edited directly by the SQLite developers, and for testing , and for doc generation). The devs use a custom editor written in Tcl/Tk called "e" to edit the source! There's a custom versioning system Fossil, a custom chat-room written in Tcl/Tk!

    -

    Installing the wg0.conf files


    +www.tcl-lang.org/community/tcl2017/assets/talk93/Paper.html

    -Uploading the wg0.conf files to the participating hosts and reloading WireGuard on them is then just a matter of executing (this expects, that all participating hosts are up and running):
    +

    Git provides automatic rendering of Markdown ...



    - -
    > rake install
    -/usr/bin/ruby wireguardmeshgenerator.rb --install
    -Uploading dist/f0/etc/wireguard/wg0.conf to f0.lan.buetow.org:.
    -Installing Wireguard config on f0
    -Uploading cmd.sh to f0.lan.buetow.org:.
    -+ [ ! -d /usr/local/etc/wireguard ]
    -+ doas chmod 700 /usr/local/etc/wireguard
    -+ doas mv -v wg0.conf /usr/local/etc/wireguard
    -wg0.conf -> /usr/local/etc/wireguard/wg0.conf
    -+ doas chmod 644 /usr/local/etc/wireguard/wg0.conf
    -+ rm cmd.sh
    -Reloading Wireguard on f0
    -Uploading cmd.sh to f0.lan.buetow.org:.
    -+ doas service wireguard reload
    -+ rm cmd.sh
    -Uploading dist/f1/etc/wireguard/wg0.conf to f1.lan.buetow.org:.
    -Installing Wireguard config on f1
    -Uploading cmd.sh to f1.lan.buetow.org:.
    -+ [ ! -d /usr/local/etc/wireguard ]
    -+ doas chmod 700 /usr/local/etc/wireguard
    -+ doas mv -v wg0.conf /usr/local/etc/wireguard
    -wg0.conf -> /usr/local/etc/wireguard/wg0.conf
    -+ doas chmod 644 /usr/local/etc/wireguard/wg0.conf
    -+ rm cmd.sh
    -Reloading Wireguard on f1
    -Uploading cmd.sh to f1.lan.buetow.org:.
    -+ doas service wireguard reload
    -+ rm cmd.sh
    -Uploading dist/f2/etc/wireguard/wg0.conf to f2.lan.buetow.org:.
    -Installing Wireguard config on f2
    -Uploading cmd.sh to f2.lan.buetow.org:.
    -+ [ ! -d /usr/local/etc/wireguard ]
    -+ doas chmod 700 /usr/local/etc/wireguard
    -+ doas mv -v wg0.conf /usr/local/etc/wireguard
    -wg0.conf -> /usr/local/etc/wireguard/wg0.conf
    -+ doas chmod 644 /usr/local/etc/wireguard/wg0.conf
    -+ rm cmd.sh
    -Reloading Wireguard on f2
    -Uploading cmd.sh to f2.lan.buetow.org:.
    -+ doas service wireguard reload
    -+ rm cmd.sh
    -Uploading dist/r0/etc/wireguard/wg0.conf to r0.lan.buetow.org:.
    -Installing Wireguard config on r0
    -Uploading cmd.sh to r0.lan.buetow.org:.
    -+ '[' '!' -d /etc/wireguard ']'
    -+ chmod 700 /etc/wireguard
    -+ mv -v wg0.conf /etc/wireguard
    -renamed 'wg0.conf' -> '/etc/wireguard/wg0.conf'
    -+ chmod 644 /etc/wireguard/wg0.conf
    -+ rm cmd.sh
    -Reloading Wireguard on r0
    -Uploading cmd.sh to r0.lan.buetow.org:.
    -+ systemctl reload wg-quick@wg0.service
    -+ rm cmd.sh
    -Uploading dist/r1/etc/wireguard/wg0.conf to r1.lan.buetow.org:.
    -Installing Wireguard config on r1
    -Uploading cmd.sh to r1.lan.buetow.org:.
    -+ '[' '!' -d /etc/wireguard ']'
    -+ chmod 700 /etc/wireguard
    -+ mv -v wg0.conf /etc/wireguard
    -renamed 'wg0.conf' -> '/etc/wireguard/wg0.conf'
    -+ chmod 644 /etc/wireguard/wg0.conf
    -+ rm cmd.sh
    -Reloading Wireguard on r1
    -Uploading cmd.sh to r1.lan.buetow.org:.
    -+ systemctl reload wg-quick@wg0.service
    -+ rm cmd.sh
    -Uploading dist/r2/etc/wireguard/wg0.conf to r2.lan.buetow.org:.
    -Installing Wireguard config on r2
    -Uploading cmd.sh to r2.lan.buetow.org:.
    -+ '[' '!' -d /etc/wireguard ']'
    -+ chmod 700 /etc/wireguard
    -+ mv -v wg0.conf /etc/wireguard
    -renamed 'wg0.conf' -> '/etc/wireguard/wg0.conf'
    -+ chmod 644 /etc/wireguard/wg0.conf
    -+ rm cmd.sh
    -Reloading Wireguard on r2
    -Uploading cmd.sh to r2.lan.buetow.org:.
    -+ systemctl reload wg-quick@wg0.service
    -+ rm cmd.sh
    -Uploading dist/blowfish/etc/wireguard/wg0.conf to blowfish.buetow.org:.
    -Installing Wireguard config on blowfish
    -Uploading cmd.sh to blowfish.buetow.org:.
    -+ [ ! -d /etc/wireguard ]
    -+ doas chmod 700 /etc/wireguard
    -+ doas mv -v wg0.conf /etc/wireguard
    -wg0.conf -> /etc/wireguard/wg0.conf
    -+ doas chmod 644 /etc/wireguard/wg0.conf
    -+ rm cmd.sh
    -Reloading Wireguard on blowfish
    -Uploading cmd.sh to blowfish.buetow.org:.
    -+ doas sh /etc/netstart wg0
    -+ rm cmd.sh
    -Uploading dist/fishfinger/etc/wireguard/wg0.conf to fishfinger.buetow.org:.
    -Installing Wireguard config on fishfinger
    -Uploading cmd.sh to fishfinger.buetow.org:.
    -+ [ ! -d /etc/wireguard ]
    -+ doas chmod 700 /etc/wireguard
    -+ doas mv -v wg0.conf /etc/wireguard
    -wg0.conf -> /etc/wireguard/wg0.conf
    -+ doas chmod 644 /etc/wireguard/wg0.conf
    -+ rm cmd.sh
    -Reloading Wireguard on fishfinger
    -Uploading cmd.sh to fishfinger.buetow.org:.
    -+ doas sh /etc/netstart wg0
    -+ rm cmd.sh
    -
    +Git provides automatic rendering of Markdown files, including README.md, in a repository’s root directory" .... so much junk now in LLM powered search engines.... #llm #ai

    -

    Re-generating mesh and installing the wg0.conf files again


    +

    These are some neat little Go tips. Linters ...



    -The mesh network can be re-generated and re-installed as follows:
    +These are some neat little Go tips. Linters already tell you when you silently omit a function return value, though. The slice filter without allocation trick is nice and simple. And I agree that switch statements are preferable to if-else statements. #golang

    - -
    > rake clean
    -> rake generate
    -> rake install
    -
    +blog.devtrovert.com/p/go-ep5-avoid-contextbackground-make

    -That would also delete and re-generate all the keys involved.
    +

    This is a great introductory blog post about ...



    -

    Setting up roaming clients


    +This is a great introductory blog post about the Helix modal editor. It's also been my first choice for over a year now. I am really looking forward to the Steel plugin system, though. I don't think I need a lot of plugins, but one or two would certainly be on my wish list. #HelixEditor #Helix

    -For roaming clients like earth (Fedora laptop) and pixel7pro (Android phone), the setup process differs slightly since these devices are not always accessible via SSH:
    +felix-knorr.net/posts/2025-03-16-helix-review.html

    -Android phone (pixel7pro):
    +

    Maps in Go under the hood #golang ...



    -The configuration is transferred to the phone using a QR code. The official WireGuard Android app (from Google Play Store) can scan and import the configuration:
    +Maps in Go under the hood #golang

    - -
    > sudo dnf install qrencode
    -> qrencode -t ansiutf8 < dist/pixel7pro/etc/wireguard/wg0.conf
    -
    +victoriametrics.com/blog/go-map/

    -Scan the QR code with the WireGuard app to import the configuration. The phone will then route all traffic through the VPN when the tunnel is activated. Note that WireGuard does not support automatic failover between the two gateways (blowfish and fishfinger)—if one fails, manual disconnection and reconnection is required to switch to the other.
    +

    I found that working on multiple side projects ...



    -Fedora laptop (earth):
    +I found that working on multiple side projects concurrently is better than concentrating on just one. This seems inefficient, but if you to lose motivation, you can temporarily switch to another one with full élan. Remember to stop starting and start finishing. This doesn't mean you should be working on 10+ side projects concurrently! Select your projects and commit to finishing them before starting the next thing. For example, my current limit of concurrent side projects is around five.

    -For the laptop, manually copy the generated configuration:
    +

    I have been in incidents. Understandably, ...



    - -
    > sudo cp dist/earth/etc/wireguard/wg0.conf /etc/wireguard/
    -> sudo chmod 600 /etc/wireguard/wg0.conf
    -> sudo systemctl start wg-quick@wg0.service  # Start manually
    -> sudo systemctl disable wg-quick@wg0.service  # Prevent auto-start
    -
    +I have been in incidents. Understandably, everyone wants the issue to be resolved as quickly and others want to know how long TTR will be. IMHO, providing no estimates at all is no solution either. So maybe give a rough estimate but clearly communicate that the estimate is rough and that X, Y, and Z can interfere, meaning there is a chance it will take longer to resolve the incident. Just my thought. What's yours?

    -The service is disabled from auto-start so the VPN is only active when manually started. This allows selective VPN usage based on need.
    +firehydrant.com/blog/hot-take-dont-provide-incident-resolution-estimates/

    -

    Adding IPv6 support to the mesh


    +

    I dont understand what it is. Certificates are ...



    -After setting up the IPv4-only mesh network, I decided to add dual-stack IPv6 support to enable more networking capabilities and prepare for the future. All 10 hosts (8 infrastructure + 2 roaming clients) now have both IPv4 and IPv6 addresses on their WireGuard interfaces.
    +I dont understand what it is. Certificates are so easy to monitor but still, expirations cause so many incidents. #sre

    -

    IPv6 addressing scheme


    +securityboulevard.com/2024/10/dont-let..-..time-prevent-outages-with-a-smart-clm/

    -We use ULA (Unique Local Address) private IPv6 space, analogous to RFC1918 private IPv4 addresses:
    +

    Don't just blindly trust LLMs. I recently ...



    -
      -
    • Prefix: fd42:beef:cafe::/48
    • -
    • Subnet: fd42:beef:cafe:2::/64 (wg0 interfaces)
    • -

    -All hosts receive dual-stack addresses:
    +Don't just blindly trust LLMs. I recently trusted an LLM, spent 1 hour debugging, and ultimately had to verify my assumption about fcntl behavior regarding inherited file descriptors in child processes manually with a C program, as the manual page wasn't clear to me. I could have done that immediately and I would have been done within 10 minutes. #productivity #loss #llm #programming #C

    -
    -fd42:beef:cafe:2::110/64  - blowfish.wg0 (OpenBSD gateway)
    -fd42:beef:cafe:2::111/64  - fishfinger.wg0 (OpenBSD gateway)
    -fd42:beef:cafe:2::120/64  - r0.wg0 (Rocky Linux VM)
    -fd42:beef:cafe:2::121/64  - r1.wg0 (Rocky Linux VM)
    -fd42:beef:cafe:2::122/64  - r2.wg0 (Rocky Linux VM)
    -fd42:beef:cafe:2::130/64  - f0.wg0 (FreeBSD host)
    -fd42:beef:cafe:2::131/64  - f1.wg0 (FreeBSD host)
    -fd42:beef:cafe:2::132/64  - f2.wg0 (FreeBSD host)
    -fd42:beef:cafe:2::200/64  - earth.wg0 (roaming laptop)
    -fd42:beef:cafe:2::201/64  - pixel7pro.wg0 (roaming phone)
    -
    +

    April 2025



    -

    Updating the mesh generator for IPv6


    +

    I knew about any being equivalent to ...


    +
    +I knew about any being equivalent to interface{} in #Golang, but wasn't aware, that it was introduced to Go because of the generics.
    +
    +

    Neat summary of new #Perl features per ...


    +
    +Neat summary of new #Perl features per release
    +
    +sheet.shiar.nl/perl
    +
    +

    errors.As() checks for the error type, whereas ...


    +
    +errors.As() checks for the error type, whereas errors.Is() checks for the exact error value. Interesting read about Errors in #golang - and there is also a cat meme in the middle of the blog post! And then, it continues with pointers to pointers to error values or how about a pointer to an empty interface?
    +
    +adrianlarion.com/golang-error-handling..-..-errors-unwrap-custom-errors-and-more/
    +
    +

    Good stuff: 10 years of functional options and ...


    +
    +Good stuff: 10 years of functional options and key lessons Learned along the way #golang
    +
    +www.bytesizego.com/blog/10-years-functional-options-golang
    +
    +

    I had some fun with #FreeBSD, #Bhyve and ...


    +
    +I had some fun with #FreeBSD, #Bhyve and #Rocky #Linux. Not just for fun, also for science and profit! #homelab #selfhosting #self-hosting
    +
    +foo.zone/gemfeed/2025-04-05-f3s-kubernetes-with-freebsd-part-4.gmi (Gemini)
    +foo.zone/gemfeed/2025-04-05-f3s-kubernetes-with-freebsd-part-4.html
    +
    +

    The moment your blog receives PRs for typo ...


    +
    +The moment your blog receives PRs for typo corrections, you notice, that people are actually reading and care about your stuff :-) #blog #personal #tech
    +
    +

    One thing not mentioned is that #OpenRsync's ...


    +
    +One thing not mentioned is that #OpenRsync's origin is the #OpenBSD project (at least as far as I am aware! Correct me if I am wrong :-) )! #openbsd #rsync #macos #openrsync
    +
    +derflounder.wordpress.com/2025/04/06/r..-..laced-with-openrsync-on-macos-sequoia/
    +
    +

    This is an interesting #Elixir pipes operator ...


    +
    +This is an interesting #Elixir pipes operator experiment in #Ruby. #Python has also been experimenting with such an operator. Raku (not mentioned in the linked article) already has the ==> sequence operator, of course (which can also can be used backwards <== - who has doubted? :-) ). #syntax #codegolf #fun #coding #RakuLang
    +
    +zverok.space/blog/2024-11-16-elixir-pipes.html
    +
    +

    The story of how my favorite #Golang book was ...


    +
    +The story of how my favorite #Golang book was written:
    +
    +www.thecoder.cafe/p/100-go-mistakes
    +
    +

    These are my personal book notes from Daniel ...


    +
    +These are my personal book notes from Daniel Pink's "When: The Scientific Secrets of Perfect Timing." The notes are for me (to improve happiness and productivity). You still need to read the whole book to get your own insights, but maybe the notes will be useful for you as well. #blog #book #booknotes #productivity
    +
    +foo.zone/gemfeed/2025-04-19-when-book-notes.gmi (Gemini)
    +foo.zone/gemfeed/2025-04-19-when-book-notes.html
    +
    +

    I certainly learned a lot reading this #llm ...


    +
    +I certainly learned a lot reading this #llm #coding #programming

    -The mesh generator required two modifications to support dual-stack configurations:
    +simonwillison.net/2025/Mar/11/using-llms-for-code/

    -**1. Address generation (address method)**
    +

    Writing indempotent #Bash scripts ...



    -The generator now outputs multiple Address directives when IPv6 is present:
    +Writing indempotent #Bash scripts

    - -
    def address
    -  return '# No Address = ... for OpenBSD here' if hosts[myself]['os'] == 'OpenBSD'
    -
    -  ipv4 = hosts[myself]['wg0']['ip']
    -  ipv6 = hosts[myself]['wg0']['ipv6']
    -
    -  # WireGuard supports multiple Address directives for dual-stack
    -  if ipv6
    -    "Address = #{ipv4}\nAddress = #{ipv6}/64"
    -  else
    -    "Address = #{ipv4}"
    -  end
    -end
    -
    +arslan.io/2019/07/03/how-to-write-idempotent-bash-scripts/

    -**2. AllowedIPs generation (peers method)**
    +

    Regarding #AI for code generation. You should ...



    -For mesh peers, both IPv4 and IPv6 addresses are included in AllowedIPs:
    +Regarding #AI for code generation. You should be at least a bit curious and exleriement a bit. You don't have to use it if you don't see fit purpose.

    - -
    if is_roaming
    -  allowed_ips = '0.0.0.0/0, ::/0'
    -else
    -  # For mesh peers, allow both IPv4 and IPv6 if present
    -  ipv4 = data['wg0']['ip']
    -  ipv6 = data['wg0']['ipv6']
    -  allowed_ips = ipv6 ? "#{ipv4}/32, #{ipv6}/128" : "#{ipv4}/32"
    -end
    -
    +registerspill.thorstenball.com/p/they-..-..email=true&r=2n9ive&triedRedirect=true

    -Roaming clients keep AllowedIPs = 0.0.0.0/0, ::/0 to route all traffic (IPv4 and IPv6) through the VPN.
    +

    I like the Rocky metaphor. And this post also ...



    -

    IPv6 NAT on OpenBSD gateways


    +I like the Rocky metaphor. And this post also reflects my thoughts on coding. #llm #ai #software

    -To allow roaming clients to access the internet via IPv6, we added NAT66 rules to the OpenBSD gateways' pf.conf:
    +cekrem.github.io/posts/coding-as-craft-going-back-to-the-old-gym/

    -
    -# NAT for WireGuard clients to access internet (IPv4)
    -match out on vio0 from 192.168.2.0/24 to any nat-to (vio0)
    -
    -# NAT66 for WireGuard clients to access internet (IPv6)
    -# Uses NPTv6 (Network Prefix Translation) to translate ULA to public IPv6
    -match out on vio0 inet6 from fd42:beef:cafe:2::/64 to any nat-to (vio0)
    -
    -# Allow all UDP traffic on WireGuard port (IPv4 and IPv6)
    -pass in inet proto udp from any to any port 56709
    -pass in inet6 proto udp from any to any port 56709
    -
    +

    May 2025



    -OpenBSD's PF firewall supports IPv6 NAT with the same syntax as IPv4, using NPTv6 (RFC 6296) to translate the ULA addresses to the gateway's public IPv6 address.
    +

    There's now also a #Fish shell edition of my ...



    -

    Manual OpenBSD interface configuration


    +There's now also a #Fish shell edition of my #tmux helper scripts: #fishshell

    -Since OpenBSD doesn't use the Address directive in WireGuard configs, IPv6 must be manually configured on the wg0 interfaces. On blowfish:
    +foo.zone/gemfeed/2025-05-02-terminal-multiplexing-with-tmux-fish-edition.gmi (Gemini)
    +foo.zone/gemfeed/2025-05-02-terminal-multiplexing-with-tmux-fish-edition.html

    - -
    rex@blowfish:~ $ doas vi /etc/hostname.wg0
    -
    +

    I loved this talk. It's about how you can ...



    -Add the IPv6 address (note the order - IPv6 must be configured before up):
    +I loved this talk. It's about how you can create your own #Linux #container without Docker, using less than 100 lines of shell code without Docker or Podman and co. - Why is this talk useful? If you understand how #containers work "under the hood," it becomes easier to make design decisions, write your own tools, or debug production systems. I also recommend his training courses, of which I visited one once.

    -
    -inet 192.168.2.110 255.255.255.0 NONE
    -inet6 fd42:beef:cafe:2::110 64
    -up
    -!/usr/local/bin/wg setconf wg0 /etc/wireguard/wg0.conf
    -
    +www.youtube.com/watch?v=4RUiVAlJE2w

    -Important: The IPv6 address must be specified before the up directive. This ensures the interface has both addresses configured before WireGuard peers are loaded.
    +

    Some unexpected #golang stuff, ppl say, that ...



    -Apply the configuration:
    +Some unexpected #golang stuff, ppl say, that Go is a simple language. IMHO the devil is in the details.

    - -
    rex@blowfish:~ $ doas sh /etc/netstart wg0
    -rex@blowfish:~ $ ifconfig wg0 | grep inet6
    -inet6 fd42:beef:cafe:2::110 prefixlen 64
    -
    +unexpected-go.com/

    -Repeat for fishfinger with address fd42:beef:cafe:2::111.
    +

    With the advent of AI and LLMs, I have observed ...



    -After reboot, the interface will automatically come up with both IPv4 and IPv6 addresses. WireGuard peers may take 30-60 seconds to establish handshakes after boot.
    +With the advent of AI and LLMs, I have observed that being able to type quickly has become even more important for engineers. Previously, fast typing wasn't as crucial when coding, as most of the time was spent thinking or navigating through the code. However, with LLMs, you find yourself typing much more frequently. That's an unexpected personal win for me, as I recently learned fast touch typing: #llm #coding #programming

    -

    Verifying dual-stack connectivity


    +foo.zone/gemfeed/2024-08-05-typing-127.1-words-per-minute.gmi (Gemini)
    +foo.zone/gemfeed/2024-08-05-typing-127.1-words-per-minute.html

    -After regenerating and deploying the configurations, both IPv4 and IPv6 work across the mesh:
    +

    For science, fun and profit, I set up a ...



    - -
    # From r0 (Rocky Linux VM)
    -root@r0:~ # ping -c 2 192.168.2.130  # IPv4 to f0
    -64 bytes from 192.168.2.130: icmp_seq=1 ttl=64 time=2.12 ms
    -64 bytes from 192.168.2.130: icmp_seq=2 ttl=64 time=0.681 ms
    -
    -root@r0:~ # ping6 -c 2 fd42:beef:cafe:2::130  # IPv6 to f0
    -64 bytes from fd42:beef:cafe:2::130: icmp_seq=1 ttl=64 time=2.16 ms
    -64 bytes from fd42:beef:cafe:2::130: icmp_seq=2 ttl=64 time=0.909 ms
    -
    +For science, fun and profit, I set up a #WireGuard mesh network for my #FreeBSD, #OpenBSD, #RockyLinux and #Kubernetes #homelab: There's also a mesh generator, which I wrote in #Ruby. #k3s #linux #k8s #k3s

    -The dual-stack configuration is backward compatible—hosts without the ipv6 field in the YAML configuration will continue to generate IPv4-only configs.
    +foo.zone/gemfeed/2025-05-11-f3s-kubernetes-with-freebsd-part-5.gmi (Gemini)
    +foo.zone/gemfeed/2025-05-11-f3s-kubernetes-with-freebsd-part-5.html

    -

    Benefits of dual-stack


    +

    Ever wondered about the hung task Linux ...



    -Adding IPv6 to the mesh network provides:
    +Ever wondered about the hung task Linux messages on a busy server? Every case is unique, and there is no standard approach to debug them, but here it gets a bit demystified: #linux #kernel

    -
      -
    • Future-proofing: Ready for IPv6-only services and networks
    • -
    • Compatibility: Dual-stack maintains full IPv4 compatibility
    • -
    • Learning: Hands-on experience with IPv6 networking
    • -
    • Flexibility: Roaming clients can access both IPv4 and IPv6 internet resources
    • -

    -

    Happy WireGuard-ing


    +blog.cloudflare.com/searching-for-the-cause-of-hung-tasks-in-the-linux-kernel/

    -All is set up now. E.g. on f0:
    +

    A bit of #fun: The FORTRAN hating gateway ― ...



    - -
    paul@f0:~ % doas wg show
    -interface: wg0
    -  public key: Jm6YItMt94++dIeOyVi1I9AhNt2qQcryxCZezoX7X2Y=
    -  private key: (hidden)
    -  listening port: 56709
    -
    -peer: 8PvGZH1NohHpZPVJyjhctBX9xblsNvYBhpg68FsFcns=
    -  preshared key: (hidden)
    -  endpoint: 46.23.94.99:56709
    -  allowed ips: 192.168.2.111/32, fd42:beef:cafe:2::111/128
    -  latest handshake: 1 minute, 46 seconds ago
    -  transfer: 124 B received, 1.75 KiB sent
    -  persistent keepalive: every 25 seconds
    -
    -peer: Xow+d3qVXgUMk4pcRSQ6Fe+vhYBa3VDyHX/4jrGoKns=
    -  preshared key: (hidden)
    -  endpoint: 23.88.35.144:56709
    -  allowed ips: 192.168.2.110/32, fd42:beef:cafe:2::110/128
    -  latest handshake: 1 minute, 52 seconds ago
    -  transfer: 124 B received, 1.60 KiB sent
    -  persistent keepalive: every 25 seconds
    -
    -peer: s3e93XoY7dPUQgLiVO4d8x/SRCFgEew+/wP7+zwgehI=
    -  preshared key: (hidden)
    -  endpoint: 192.168.1.120:56709
    -  allowed ips: 192.168.2.120/32, fd42:beef:cafe:2::120/128
    -
    -peer: 2htXdNcxzpI2FdPDJy4T4VGtm1wpMEQu1AkQHjNY6F8=
    -  preshared key: (hidden)
    -  endpoint: 192.168.1.131:56709
    -  allowed ips: 192.168.2.131/32, fd42:beef:cafe:2::131/128
    -
    -peer: 0Y/H20W8YIbF7DA1sMwMacLI8WS9yG+1/QO7m2oyllg=
    -  preshared key: (hidden)
    -  endpoint: 192.168.1.122:56709
    -  allowed ips: 192.168.2.122/32, fd42:beef:cafe:2::122/128
    -
    -peer: Hhy9kMPOOjChXV2RA5WeCGs+J0FE3rcNPDw/TLSn7i8=
    -  preshared key: (hidden)
    -  endpoint: 192.168.1.121:56709
    -  allowed ips: 192.168.2.121/32, fd42:beef:cafe:2::121/128
    -
    -peer: SlGVsACE1wiaRoGvCR3f7AuHfRS+1jjhS+YwEJ2HvF0=
    -  preshared key: (hidden)
    -  endpoint: 192.168.1.132:56709
    -  allowed ips: 192.168.2.132/32, fd42:beef:cafe:2::132/128
    -
    +A bit of #fun: The FORTRAN hating gateway ― Andreas Zwinkau

    -All the hosts are pingable as well, e.g.:
    +beza1e1.tuxen.de/lore/fortran_hating_gateway.html

    - -
    paul@f0:~ % foreach peer ( f1 f2 r0 r1 r2 blowfish fishfinger )
    -foreach? ping -c2 $peer.wg0
    -foreach? echo
    -foreach? end
    -PING f1.wg0 (192.168.2.131): 56 data bytes
    -64 bytes from 192.168.2.131: icmp_seq=0 ttl=64 time=0.334 ms
    -64 bytes from 192.168.2.131: icmp_seq=1 ttl=64 time=0.260 ms
    -
    ---- f1.wg0 ping statistics ---
    -2 packets transmitted, 2 packets received, 0.0% packet loss
    -round-trip min/avg/max/stddev = 0.260/0.297/0.334/0.037 ms
    -
    -PING f2.wg0 (192.168.2.132): 56 data bytes
    -64 bytes from 192.168.2.132: icmp_seq=0 ttl=64 time=0.323 ms
    -64 bytes from 192.168.2.132: icmp_seq=1 ttl=64 time=0.303 ms
    -
    ---- f2.wg0 ping statistics ---
    -2 packets transmitted, 2 packets received, 0.0% packet loss
    -round-trip min/avg/max/stddev = 0.303/0.313/0.323/0.010 ms
    -
    -PING r0.wg0 (192.168.2.120): 56 data bytes
    -64 bytes from 192.168.2.120: icmp_seq=0 ttl=64 time=0.716 ms
    -64 bytes from 192.168.2.120: icmp_seq=1 ttl=64 time=0.406 ms
    -
    ---- r0.wg0 ping statistics ---
    -2 packets transmitted, 2 packets received, 0.0% packet loss
    -round-trip min/avg/max/stddev = 0.406/0.561/0.716/0.155 ms
    -
    -PING r1.wg0 (192.168.2.121): 56 data bytes
    -64 bytes from 192.168.2.121: icmp_seq=0 ttl=64 time=0.639 ms
    -64 bytes from 192.168.2.121: icmp_seq=1 ttl=64 time=0.629 ms
    -
    ---- r1.wg0 ping statistics ---
    -2 packets transmitted, 2 packets received, 0.0% packet loss
    -round-trip min/avg/max/stddev = 0.629/0.634/0.639/0.005 ms
    -
    -PING r2.wg0 (192.168.2.122): 56 data bytes
    -64 bytes from 192.168.2.122: icmp_seq=0 ttl=64 time=0.569 ms
    -64 bytes from 192.168.2.122: icmp_seq=1 ttl=64 time=0.479 ms
    -
    ---- r2.wg0 ping statistics ---
    -2 packets transmitted, 2 packets received, 0.0% packet loss
    -round-trip min/avg/max/stddev = 0.479/0.524/0.569/0.045 ms
    -
    -PING blowfish.wg0 (192.168.2.110): 56 data bytes
    -64 bytes from 192.168.2.110: icmp_seq=0 ttl=255 time=35.745 ms
    -64 bytes from 192.168.2.110: icmp_seq=1 ttl=255 time=35.481 ms
    -
    ---- blowfish.wg0 ping statistics ---
    -2 packets transmitted, 2 packets received, 0.0% packet loss
    -round-trip min/avg/max/stddev = 35.481/35.613/35.745/0.132 ms
    -
    -PING fishfinger.wg0 (192.168.2.111): 56 data bytes
    -64 bytes from 192.168.2.111: icmp_seq=0 ttl=255 time=33.992 ms
    -64 bytes from 192.168.2.111: icmp_seq=1 ttl=255 time=33.751 ms
    -
    ---- fishfinger.wg0 ping statistics ---
    -2 packets transmitted, 2 packets received, 0.0% packet loss
    -round-trip min/avg/max/stddev = 33.751/33.872/33.992/0.120 ms
    -
    +

    So, Golang was invented while engineers at ...



    -Note that the loop above is a tcsh loop, the default shell used in FreeBSD. Of course, all other peers can ping their peers as well!
    +So, Golang was invented while engineers at Google waited for C++ to compile. Here I am, waiting a long time for Java to compile...

    -After the first ping, VPN tunnels now also show handshakes and the amount of data transferred through them:
    +

    I couldn't do without here-docs. If they did ...



    - -
    paul@f0:~ % doas wg show
    -interface: wg0
    -  public key: Jm6YItMt94++dIeOyVi1I9AhNt2qQcryxCZezoX7X2Y=
    -  private key: (hidden)
    -  listening port: 56709
    -
    -peer: 0Y/H20W8YIbF7DA1sMwMacLI8WS9yG+1/QO7m2oyllg=
    -  preshared key: (hidden)
    -  endpoint: 192.168.1.122:56709
    -  allowed ips: 192.168.2.122/32, fd42:beef:cafe:2::122/128
    -  latest handshake: 10 seconds ago
    -  transfer: 440 B received, 532 B sent
    -
    -peer: Hhy9kMPOOjChXV2RA5WeCGs+J0FE3rcNPDw/TLSn7i8=
    -  preshared key: (hidden)
    -  endpoint: 192.168.1.121:56709
    -  allowed ips: 192.168.2.121/32, fd42:beef:cafe:2::121/128
    -  latest handshake: 12 seconds ago
    -  transfer: 440 B received, 564 B sent
    -
    -peer: s3e93XoY7dPUQgLiVO4d8x/SRCFgEew+/wP7+zwgehI=
    -  preshared key: (hidden)
    -  endpoint: 192.168.1.120:56709
    -  allowed ips: 192.168.2.120/32, fd42:beef:cafe:2::120/128
    -  latest handshake: 14 seconds ago
    -  transfer: 440 B received, 564 B sent
    -
    -peer: SlGVsACE1wiaRoGvCR3f7AuHfRS+1jjhS+YwEJ2HvF0=
    -  preshared key: (hidden)
    -  endpoint: 192.168.1.132:56709
    -  allowed ips: 192.168.2.132/32, fd42:beef:cafe:2::132/128
    -  latest handshake: 17 seconds ago
    -  transfer: 472 B received, 564 B sent
    -
    -peer: Xow+d3qVXgUMk4pcRSQ6Fe+vhYBa3VDyHX/4jrGoKns=
    -  preshared key: (hidden)
    -  endpoint: 23.88.35.144:56709
    -  allowed ips: 192.168.2.110/32, fd42:beef:cafe:2::110/128
    -  latest handshake: 55 seconds ago
    -  transfer: 472 B received, 596 B sent
    -  persistent keepalive: every 25 seconds
    -
    -peer: 8PvGZH1NohHpZPVJyjhctBX9xblsNvYBhpg68FsFcns=
    -  preshared key: (hidden)
    -  endpoint: 46.23.94.99:56709
    -  allowed ips: 192.168.2.111/32, fd42:beef:cafe:2::111/128
    -  latest handshake: 55 seconds ago
    -  transfer: 472 B received, 596 B sent
    -  persistent keepalive: every 25 seconds
    -
    -peer: 2htXdNcxzpI2FdPDJy4T4VGtm1wpMEQu1AkQHjNY6F8=
    -  preshared key: (hidden)
    -  endpoint: 192.168.1.131:56709
    -  allowed ips: 192.168.2.131/32, fd42:beef:cafe:2::131/128
    -
    +I couldn't do without here-docs. If they did not exist, I would need to find another field and pursue a career there. #bash #sh #shell

    -

    Managing Roaming Client Tunnels


    +rednafi.com/misc/heredoc_headache/

    -Since roaming clients like earth and pixel7pro connect on-demand rather than being always-on like the infrastructure hosts, it's useful to know how to configure and manage the WireGuard tunnels.
    +

    I started using computers as a kid on MS-DOS ...



    -

    Manual gateway failover configuration


    +I started using computers as a kid on MS-DOS and mainly used Norton Commander to navigate the file system in order to start games. Later, I became more interested in computing in general and switched to Linux, but there was no NC. However, there was GNU Midnight Commander, which I still use regularly to this day. It's absolutely worth checking out, even in the modern day. #tools #opensource

    -The default configuration for roaming clients includes both gateways (blowfish and fishfinger) with AllowedIPs = 0.0.0.0/0, ::/0. However, WireGuard doesn't automatically failover between multiple peers with identical AllowedIPs routes. When both gateways are configured this way, WireGuard uses the first peer with a recent handshake. If that gateway goes down, traffic won't automatically switch to the backup gateway.
    +en.wikipedia.org/wiki/Midnight_Commander

    -To enable manual failover, separate configuration files can be created for roaming clients (earth laptop and pixel7pro phone), each containing only a single gateway peer. This provides explicit control over which gateway handles traffic.
    +

    Thats interesting, running #Android in ...



    -Configuration files for pixel7pro (phone):
    +Thats interesting, running #Android in #Kubernetes

    -Two separate configs in /home/paul/git/wireguardmeshgenerator/dist/pixel7pro/etc/wireguard/:
    +ku.bz/Gs4-wpK5h

    -
      -
    • wg0-blowfish.conf - Routes all traffic through blowfish gateway (23.88.35.144)
    • -
    • wg0-fishfinger.conf - Routes all traffic through fishfinger gateway (46.23.94.99)
    • -

    -Generate QR codes for importing into the WireGuard Android app:
    +

    Before wiping the pre-installed #Windows 11 ...



    - -
    qrencode -t ansiutf8 < dist/pixel7pro/etc/wireguard/wg0-blowfish.conf
    -qrencode -t ansiutf8 < dist/pixel7pro/etc/wireguard/wg0-fishfinger.conf
    -
    +Before wiping the pre-installed #Windows 11 Pro on my new Beelink mini PC, I tested #WSL2 with #Fedora #Linux. I compiled my pet project, I/O Riot NG (ior), which requires many system libraries, including #BPF. I’m impressed—everything works just like on native Fedora, and my tool runs and traces I/O syscalls with BPF out of the box. I might would prefer now Windows over MacOS if I had to chose between those two for work.

    -Import both QR codes using the WireGuard app to create two separate tunnel profiles. You can then manually enable/disable each tunnel to select which gateway to use. Only enable one tunnel at a time.
    +codeberg.org/snonux/ior

    -Configuration files for earth (laptop):
    +

    Some might hate me saying this, but didnt ...



    -Two separate configs in /home/paul/git/wireguardmeshgenerator/dist/earth/etc/wireguard/:
    +Some might hate me saying this, but didnt #systemd solve the problem of a shared /tmp directory by introducing PrivateTmp?? but yes why did it have to go that way...

    -
      -
    • wg0-blowfish.conf - Routes all traffic through blowfish gateway
    • -
    • wg0-fishfinger.conf - Routes all traffic through fishfinger gateway
    • -

    -Install both configurations:
    +www.osnews.com/story/140968/tmp-should-not-exist/

    - -
    sudo cp dist/earth/etc/wireguard/wg0-blowfish.conf /etc/wireguard/
    -sudo cp dist/earth/etc/wireguard/wg0-fishfinger.conf /etc/wireguard/
    -
    +

    Wouldn't still do that, even with 100% test ...



    -This approach provides explicit control over which gateway handles roaming client traffic, useful when one gateway needs maintenance or experiences connectivity issues.
    +Wouldn't still do that, even with 100% test coverage, LT and integration tests, unless theres an exception the business relies on #sre

    -

    Starting and stopping on earth (Fedora laptop)


    +medium.com/openclassrooms-product-desi..-..g/do-not-deploy-on-friday-92b1b46ebfe6

    -On the Fedora laptop, WireGuard is managed via systemd. Using the separate gateway configs:
    +

    Some neat slice tricks for Go: #golang ...



    - -
    # Start with blowfish gateway
    -earth$ sudo systemctl start wg-quick@wg0-blowfish.service
    -
    -# Or start with fishfinger gateway
    -earth$ sudo systemctl start wg-quick@wg0-fishfinger.service
    -
    -# Check tunnel status (example with blowfish gateway)
    -earth$ sudo wg show
    -interface: wg0
    -  public key: Mc1CpSS3rbLN9A2w9c75XugQyXUkGPHKI2iCGbh8DRo=
    -  private key: (hidden)
    -  listening port: 56709
    -  fwmark: 0xca6c
    -
    -peer: Xow+d3qVXgUMk4pcRSQ6Fe+vhYBa3VDyHX/4jrGoKns=
    -  preshared key: (hidden)
    -  endpoint: 23.88.35.144:56709
    -  allowed ips: 0.0.0.0/0, ::/0
    -  latest handshake: 5 seconds ago
    -  transfer: 15.89 KiB received, 32.15 KiB sent
    -  persistent keepalive: every 25 seconds
    -
    +Some neat slice tricks for Go: #golang

    -Stopping the tunnel:
    +blog.devtrovert.com/p/12-slice-tricks-to-enhance-your-go

    - -
    earth$ sudo systemctl stop wg-quick@wg0-blowfish.service
    -# Or if using fishfinger:
    -earth$ sudo systemctl stop wg-quick@wg0-fishfinger.service
    -
    -earth$ sudo wg show
    -# No output - WireGuard interface is down
    -
    +

    I understand that Kubernetes is not for ...



    -Switching between gateways:
    +I understand that Kubernetes is not for everyone, but it still seems to be the new default for everything newly built. Despite the fact that Kubernetes is complex to maintain and use, there is still a lot of SRE/DevOps talent out there who have it on their CVs, which contributes significantly to the supportability of the infrastructure and the applications running on it. This way, you don't have to teach every new engineer your "own way" infrastructure. It's like a standard language of infrastructure that many people speak. However, Kubernetes should not be the default solution for everything, in my opinion. #kubernetes #k8s

    - -
    # Switch from blowfish to fishfinger
    -earth$ sudo systemctl stop wg-quick@wg0-blowfish.service
    -earth$ sudo systemctl start wg-quick@wg0-fishfinger.service
    -
    +www.gitpod.io/blog/we-are-leaving-kubernetes

    -The services remain disabled to prevent auto-start on boot, allowing manual control of when the VPN is active and which gateway to use.
    +

    June 2025



    -

    Starting and stopping on pixel7pro (Android phone)


    +

    Some great advices, will try out some of them! ...



    -On Android using the official WireGuard app, you now have two tunnel profiles (wg0-blowfish and wg0-fishfinger) after importing the QR codes:
    +Some great advices, will try out some of them! #programming

    -Starting a tunnel:
    +endler.dev/2025/best-programmers/

    -
      -
    • 1. Open the WireGuard app
    • -
    • 2. Tap the toggle switch next to either wg0-blowfish or wg0-fishfinger tunnel configuration
    • -
    • 3. The switch turns blue/green and shows "Active"
    • -
    • 4. A key icon appears in the notification bar indicating VPN is active
    • -
    • 5. All traffic now routes through the selected gateway
    • -

    -Stopping the tunnel:
    +

    In #Golang, values are actually copied when ...



    -
      -
    • 1. Open the WireGuard app
    • -
    • 2. Tap the toggle switch again to disable it
    • -
    • 3. The switch turns gray and shows "Inactive"
    • -
    • 4. The notification bar key icon disappears
    • -
    • 5. Normal internet routing resumes
    • -

    -Switching between gateways:
    +In #Golang, values are actually copied when assigned (boxed) into an interface. That can have performance impact.

    -
      -
    • 1. Disable the currently active tunnel (e.g., wg0-blowfish)
    • -
    • 2. Enable the other tunnel (e.g., wg0-fishfinger)
    • -
    • Only enable one tunnel at a time
    • -

    -Quick toggling from notification:
    +goperf.dev/01-common-patterns/interface-boxing/
    +
    +

    This is a great little tutorial for searching ...


    +
    +This is a great little tutorial for searching in the #HelixEditor #editor #coding
    +
    +helix-editor-tutorials.com/tutorials/using-helix-global-search/
    +
    +

    The mov instruction of a CPU is turing ...


    +
    +The mov instruction of a CPU is turing complete. And theres an implementation of #Doom only using mov, it renders one frame per 7 hours! #fun
    +
    +beza1e1.tuxen.de/articles/accidentally_turing_complete.html
    +
    +

    I removed the social media profile from my ...


    +
    +I removed the social media profile from my GrapheneOS phone. Originally, I created a separate profile just for social media to avoid using it too often. But I noticed that I switched to it too frequently. Not having social media within reach is probably the best option. #socialmedia #sm #distractions
    +
    +

    So want a "real" recent UNIX? Use AIX! #macos ...


    +
    +So want a "real" recent UNIX? Use AIX! #macos #unix #aix
    +
    +www.osnews.com/story/141633/apples-macos-unix-certification-is-a-lie/
    +
    +

    This episode, I think, is kind of an eye-opener ...


    +
    +This episode, I think, is kind of an eye-opener for me personally. I knew, that AI is there to stay, but you better should now start playing with your pet projects, otherwise your performance reviews will be awkward in a year or two from now, when you are expected to use AI for your daily work. #ai #llm #coding #programming
    +
    +changelog.com/friends/96
    +
    +

    My #OpenBSD blog setup got mentioned in the ...


    +
    +My #OpenBSD blog setup got mentioned in the BSDNow.tv Podcast (In the Feedback section) :-) #BSD #podcast #runbsd
    +
    +www.bsdnow.tv/614
    +
    +

    #Golang is the best when it comes to agentic ...


    +
    +#Golang is the best when it comes to agentic coding: #llm
    +
    +lucumr.pocoo.org/2025/6/12/agentic-coding/
    +
    +

    Where #zsh is better than #bash ...


    +
    +Where #zsh is better than #bash
    +
    +www.arp242.net/why-zsh.html
    +
    +

    I really enjoyed this talk about obscure Go ...


    +
    +I really enjoyed this talk about obscure Go optimizations. None of it is really standard and can change from one version of Go to another, though. #golang #talk
    +
    +www.youtube.com/watch?v=rRtihWOcaLI

    -
      -
    • Pull down the notification shade
    • -
    • Tap the WireGuard notification to quickly enable/disable the tunnel without opening the app
    • -

    -The WireGuard Android app supports automatically activating tunnels based on:
    +

    Commenting your regular expression is generally ...



    -
      -
    • Mobile data connection (e.g., enable VPN when on cellular)
    • -
    • WiFi SSID (e.g., disable VPN when on trusted home network)
    • -
    • Ethernet connection status
    • -

    -These settings can be configured by tapping the pencil icon next to the tunnel name, then scrolling to "Toggle on/off based on" options.
    +Commenting your regular expression is generally a good advice! Works pretty well as described in the article not just in #Ruby, but also in #Perl (@Perl), #RakuLang, ...

    -

    Verifying connectivity


    +thoughtbot.com/blog/comment-your-regular-expressions

    -Once the tunnel is active on either device, verify connectivity:
    +

    You have to make a decision for yourself, but ...



    - -
    # From earth laptop:
    -earth$ ping -c2 blowfish.wg0
    -earth$ ping -c2 fishfinger.wg0
    -earth$ curl https://ifconfig.me  # Should show gateway's public IP
    -
    +You have to make a decision for yourself, but generally, work smarter (and faster—but keep the quality)! About 40 hours #productivity #work #workload

    -Check which gateway is active: Check the transfer statistics with sudo wg show on earth to see which peer shows recent handshakes and increasing transfer bytes. On Android, the WireGuard app shows the active tunnel with data transfer statistics.
    +thesquareplanet.com/blog/about-40-hours/

    -

    Conclusion


    +

    "100 Go Mistakes and How to Avoid Them" is one ...



    -Having a mesh network on our hosts is great for securing all the traffic between them for our future k3s setup. A self-managed WireGuard mesh network is better than Tailscale as it eliminates reliance on a third party and provides full control over the configuration. It reduces unnecessary abstraction and "magic," enabling easier debugging and ensuring full ownership of our network.
    +"100 Go Mistakes and How to Avoid Them" is one of my favorite #Golang books. Julia Evans also stumbled across some issues she'd learned from this book. The book itself is an absolute must for every Gopher (or someone who wants to become one!)

    -Read the next post of this series:
    +jvns.ca/blog/2024/08/06/go-structs-copied-on-assignment/

    -f3s: Kubernetes with FreeBSD - Part 6: Storage
    +

    The #Ruby Data class seems quite helpful ...



    -Other *BSD-related posts:
    +The #Ruby Data class seems quite helpful

    -2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability
    -2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    -2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    -2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network (You are currently reading this)
    -2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    -2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    -2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    -2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    -2024-04-01 KISS high-availability with OpenBSD
    -2024-01-13 One reason why I love OpenBSD
    -2022-10-30 Installing DTail on OpenBSD
    -2022-07-30 Let's Encrypt with OpenBSD and Rex
    -2016-04-09 Jails and ZFS with Puppet on FreeBSD
    +allaboutcoding.ghinda.com/example-of-value-objects-using-rubys-data-class

    -E-Mail your comments to paul@nospam.buetow.org
    +Other related posts:
    +
    +2025-01-01 Posts from October to December 2024
    +2025-07-01 Posts from January to June 2025 (You are currently reading this)
    +2026-01-01 Posts from July to December 2025
    +
    +E-Mail your comments to paul@nospam.buetow.org :-)

    Back to the main site
    - Terminal multiplexing with `tmux` - Fish edition - - gemini://foo.zone/gemfeed/2025-05-02-terminal-multiplexing-with-tmux-fish-edition.gmi - 2025-05-02T00:09:23+03:00 + Task Samurai: An agentic coding learning experiment + + gemini://foo.zone/gemfeed/2025-06-22-task-samurai.gmi + 2025-06-22T20:00:51+03:00 Paul Buetow aka snonux paul@dev.buetow.org - This is the Fish shell edition of the same post (but for Z-Shell) of mine from last year: + Task Samurai is a fast terminal interface for Taskwarrior written in Go using the Bubble Tea framework. It displays your tasks in a table and allows you to manage them without leaving your keyboard.
    -

    Terminal multiplexing with tmux - Fish edition


    +

    Task Samurai: An agentic coding learning experiment



    -Published at 2025-05-02T00:09:23+03:00
    +Published at 2025-06-22T20:00:51+03:00

    -This is the Fish shell edition of the same post (but for Z-Shell) of mine from last year:
    +Task Samurai Logo

    -./2024-06-23-terminal-multiplexing-with-tmux.html
    +

    Table of Contents



    -Tmux (Terminal Multiplexer) is a powerful, terminal-based tool that manages multiple terminal sessions within a single window. Here are some of its primary features and functionalities:
    +
    +

    Introduction


    +
    +Task Samurai is a fast terminal interface for Taskwarrior written in Go using the Bubble Tea framework. It displays your tasks in a table and allows you to manage them without leaving your keyboard.
    +
    +https://taskwarrior.org
    +https://github.com/charmbracelet/bubbletea
    +
    +

    Why does this exist?


    +
    +I wanted to tinker with agentic coding. This project was implemented entirely using OpenAI Codex. (After this blog post was published, I also used the Claude Code CLI.)

      -
    • Session management
    • -
    • Window and Pane management
    • -
    • Persistent Workspace
    • -
    • Customization
    • +
    • I wanted a faster UI for Taskwarrior than other options, like Vit, which is Python-based.
    • +
    • I wanted something built with Bubble Tea, but I never had time to dive deep into it.
    • +
    • I wanted to build a toy project (like Task Samurai) first, before tackling the big ones, to get started with agentic coding.

    -https://github.com/tmux/tmux/wiki
    +https://openai.com/codex/

    -
    -            _______                           s
    -           |.-----.|                           s
    -           || Tmux||                          s
    -           ||_.-._||       |\   \\\\__     o          s
    -           `--)-(--`       | \_/    o \    o          s
    -          __[=== o]__      > _   (( <_  oo            s
    -         |:::::::::::|\    | / \__+___/               s
    -   jgs   `-=========-`()   |/     |/                  s
    -       mod. by Paul B.
    -
    +I've been curious about agentic coding for a while and wanted to see what it's actually like to build something with it. So I gave it a go (no pun intended).

    -

    Table of Contents


    +

    How it works


    +
    +Task Samurai invokes the task command (that's the original Taskwarrior CLI command) to read and modify tasks. The tasks are displayed in a Bubble Tea table, where each row represents a task. Hotkeys trigger Taskwarrior commands such as starting, completing or annotating tasks. The UI refreshes automatically after each action, so the table is always up to date.
    +
    +Task Samurai Screenshot
    +
    +

    Where and how to get it


    +
    +Go to:
    +
    +https://codeberg.org/snonux/tasksamurai
    +
    +And follow the README.md!
    +
    +

    Lessons learned from building Task Samurai with agentic coding


    +
    +

    Developer workflow


    +
    +I was trying out OpenAI Codex because I regularly run out of Claude Code CLI (another agentic coding tool I am currently trying out) credits (it still happens!), but Codex was still available to me. So, I took the opportunity to push agentic coding a bit further with another platform.
    +
    +I didn't really love the web UI you have to use for Codex, as I usually live in the terminal. But this is all I have for Codex for now, and I thought I'd give it a try regardless. The web UI is simple and pretty straightforward. There's also a Codex CLI one could use directly in the terminal, but I didn't get it working. I will try again soon.
    +
    +Update: Codex CLI now works for me, after OpenAI released a new version!
    +
    +For every task given to Codex, it spins up its own container. From there, you can drill down and watch what it is doing. At the end, the result (in the form of a code diff) will be presented. From there, you can make suggestions about what else to change in the codebase. What I found inconvenient is that for every additional change, there's an overhead because Codex has to spin up a container and bootstrap the entire development environment again, which adds extra delay. That could be eliminated by setting up predefined custom containers, but that feature still seems somewhat limited.
    +
    +Once satisfied, you can ask Codex to create a GitHub PR (too bad only GitHub is supported and no other Git hosters); from there, you can merge it and then pull it to your local laptop or workstation to test the changes again. I found myself looping a lot around the Codex UI, GitHub PRs, and local checkouts.
    +
    +

    How it went


    +
    +Task Samurai's codebase came together quickly: the entire Git history spans from June 19 to 22, 2025, culminating in 179 commits:


    -

    Before continuing...


    +Most big breakthroughs (and bug introductions) came during that middle day of intense iteration. The latter stages were all about smoothing out the rough edges.

    -Before continuing to read this post, I encourage you to get familiar with Tmux first (unless you already know the basics). You can go through the official getting started guide:
    +It's worth noting that I worked on it in the evenings when I had some free time, as I also had to fit in my regular work and family commitments during the day. So, I didn't spend full working days on this project.

    -https://github.com/tmux/tmux/wiki/Getting-Started
    +

    What went wrong



    -I can also recommend this book (this is the book I got started with with Tmux):
    +Going agentic isn't all smooth. Here are the hiccups I ran into, plus a few lessons:

    -https://pragprog.com/titles/bhtmux2/tmux-2/
    +
      +
    • Merge Floods: Every minor feature or fix existed on its branch, so merging was a constant process. It kept progress flowing but also drowned the committed history in noise and the occasional conflict. I found this to be an issue with OpenAI's Codex in particular. Not so much with other agentic coding tools like Claude Code CLI (not covered in this blog post.)
    • +
    • Fixes on fixes: Features like "fireworks on exit" had chains of "fix exit," "fix cell selection," etc. Sometimes, new additions introduced bugs that needed rapid patching.
    • +

    +

    Patterns that helped



    -Over the years, I have built a couple of shell helper functions to optimize my workflows. Tmux is extensively integrated into my daily workflows (personal and work). I had colleagues asking me about my Tmux config and helper scripts for Tmux several times. It would be neat to blog about it so that everyone interested in it can make a copy of my configuration and scripts.
    +Despite the chaos, a few strategies kept things moving:

    -The configuration and scripts in this blog post are only the non-work-specific parts. There are more helper scripts, which I only use for work (and aren't really useful outside of work due to the way servers and clusters are structured there).
    +
      +
    • Scaffolding First: I started with the basic table UI and command wrappers, then layered on features—never the other way around.
    • +
    • Tiny PRs: Small, atomic merges meant feedback came fast (and so did fixes).
    • +
    • Tests Matter: A solid base of unit tests for task manipulations kept things from breaking entirely when experimenting.
    • +
    • Live Documentation: Documentation, such as the README, is updated regularly to reflect all the hotkey and feature changes.
    • +

    +Maybe a better approach would have been to design the whole application from scratch before letting Codix do any of the coding. I will try that with my next toy project.

    -Tmux is highly configurable, and I think I am only scratching the surface of what is possible with it. Nevertheless, it may still be useful for you. I also love that Tmux is part of the OpenBSD base system!
    +

    What I learned using agentic coding



    -

    Shell aliases


    +Stepping into agentic coding with Codex as my "pair programmer" was a big shift. I learned a lot—not just about automating code generation, but also about how you have to tightly steer, guide, and audit every line as things move at high speed. I must admit, I sometimes lost track of what all the generated code was actually doing. But as the features seemed to work after a few iterations, I was satisfied—which is a bit concerning. Imagine if I approved a PR for a production-grade deployment without fully understanding what it was doing (and not a toy project like in this post).

    -Since last week, I am playing a bit with the Fish shell. As a result, I also converted all my tmux helper scripts (mentioned in this blog post) from Z-Shell to Fish.
    +

    how much time did I save?



    -https://fishshell.com
    +Did it buy me speed?

    -For the most common Tmux commands I use, I have created the following shell aliases:
    +
      +
    • Say each commit takes Codex 5 minutes to generate, and you need to review/guide 179 commits = about _6 hours of active development_.
    • +
    • If you coded it all yourself, including all the bug fixes, features, design, and documentation, you might spend _10–20 hours_.
    • +
    • That's a couple of days of potential savings—and I am by no means an expert in agentic coding, since this was my first completed agentic coding project.
    • +

    +

    Conclusion



    -
    -alias tn 'tmux::new'
    -alias ta 'tmux::attach'
    -alias tx 'tmux::remote'
    -alias ts 'tmux::search'
    -alias tssh 'tmux::cluster_ssh'
    -alias tm tmux
    -alias tl 'tmux list-sessions'
    -alias foo 'tmux::new foo'
    -alias bar 'tmux::new bar'
    -alias baz 'tmux::new baz'
    -
    +Building Task Samurai with agentic coding was a wild ride—rapid feature growth, countless fast fixes, and more merge commits I'd expected. Keep the iterations short (or maybe in my next experiment, much larger, with better and more complete design before generating a single line of code), keep tests and documentation concise, and review and refine for final polish at the end. Even with the bumps along the way, shipping a terminal UI in days instead of weeks is a neat little showcase vibe coding.
    +
    +Am I an agentic coding expert now? I don't think so. There are still many things to learn, and the landscape is constantly evolving.
    +
    +While working on Task Samurai, there were times I missed manual coding and the satisfaction that comes from writing every line yourself, debugging issues manually, and crafting solutions from scratch. However, this is the direction in which the industry seems to be shifting, unfortunately. If applied correctly, AI will boost performance, and if you don't use AI, your next performance review may be awkward.
    +
    +Personally, I am not sure whether I like where the industry is going with agentic coding. I love "traditional" coding, and with agentic coding you operate at a higher level and don't interact directly with code as often, which I would miss. I think that in the future, designing, reviewing, and being able to read and understand code will be more important than writing code by hand.
    +
    +Do you have any thoughts on that? I hope, I am partially wrong at least.
    +
    +E-Mail your comments to paul@nospam.buetow.org :-)
    +
    +Other related posts are:
    +
    +2025-08-05 Local LLM for Coding with Ollama on macOS
    +2025-06-22 Task Samurai: An agentic coding learning experiment (You are currently reading this)
    +
    +Back to the main site
    +
    +
    +
    + + 'A Monk's Guide to Happiness' book notes + + gemini://foo.zone/gemfeed/2025-06-07-a-monks-guide-to-happiness-book-notes.gmi + 2025-06-07T10:30:11+03:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + These are my personal book notes from Gelong Thubten's 'A Monk's Guide to Happiness: Meditation in the 21st century.' They are for my own reference, but I hope they might be useful to you as well. + +
    +

    "A Monk's Guide to Happiness" book notes



    -Note all tmux::...; those are custom shell functions doing certain things, and they aren't part of the Tmux distribution. But let's run through every aliases one by one.
    +Published at 2025-06-07T10:30:11+03:00

    -The first two are pretty straightforward. tm is simply a shorthand for tmux, so I have to type less, and tl lists all Tmux sessions that are currently open. No magic here.
    +These are my personal book notes from Gelong Thubten's "A Monk's Guide to Happiness: Meditation in the 21st century." They are for my own reference, but I hope they might be useful to you as well.

    -

    The tn alias - Creating a new session


    +

    Table of Contents



    -The tn alias is referencing this function:
    +
    +

    Understanding Happiness



    -
    -# Create new session and if alread exists attach to it
    -function tmux::new
    -    set -l session $argv[1]
    -    _tmux::cleanup_default
    -    if test -z "$session"
    -        tmux::new (string join "" T (date +%s))
    -    else
    -        tmux new-session -d -s $session
    -        tmux -2 attach-session -t $session || tmux -2 switch-client -t $session
    -    end
    -end
    -
    +
      +
    • Happiness is a skill we can train.
    • +
    • Happiness is not about accomplishing goals, as that would be in the future.
    • +
    • Feel free now. No urge about past and future.
    • +
    • We can learn to produce our own happiness independently of physical needs. When we walk in a park, how do we feel? We can train to reproduce that feeling independently.
    • +

    +

    The Role of Meditation



    -There is a lot going on here. Let's have a detailed look at what it is doing.
    +
      +
    • Meditation is not about clearing your mind. A busy mind has nothing to do with interfering with your meditation.
    • +
    • Our problem is that we need to detect that awareness. Meditation connects us with awareness. Awareness is freedom.
    • +
    • We can let the mind be and don't care about the thoughts. It will have benefits for your life. It will protect you from all kinds of stress.
    • +
    • Better meditate with open eyes so you don't associate it with the dark. You will also be able to be in a meditation state of mind outside of the meditation session.
    • +
    • Have a baseline for time to build up discipline.
    • +
    • We don't need to do anything about stress, just take a step back.
    • +

    +

    Managing Thoughts and Emotions



    -First, a Tmux session name can be passed to the function as a first argument. That session name is only optional. Without it, Tmux will select a session named (string join "" T (date +%s)) as a default. Which is T followed by the UNIX epoch, e.g. T1717133796.
    +
      +
    • Our flow of emotions is really just habits. That can be changed through training, e.g., meditation training.
    • +
    • A part of the mind recognises that we are sad or angry. That part is not sad or angry by itself, obviously. So we can escape to that part of the mind, be the observer, and not draw in the constant flow of emotions and thoughts.
    • +
    • Let the front and back doors of your house open, and let the thoughts come in and leave. Just don't serve them tea. This once said, a great Zen master.
    • +
    • Thoughts are friends and not enemies.
    • +
    • Thoughts help the meditation as they make us notice that we wandered off, and therefore, we strengthen the reflection.
    • +

    +

    Practice and Discipline



    -

    Cleaning up default sessions automatically


    +
      +
    • The importance of habits to practice mindfulness. Bring mindfulness into the daily practice.
    • +
    • Integrating short moments of mindfulness during the day is the fast track to happiness. Start off with small tasks, e.g. while washing your hands.
    • +
    • Have many small doses of mindfulness and don't prolong as otherwise, your mind will revolt.
    • +
    • Have a small moment of mindfulness when you wake up and go to sleep.
    • +
    • Practice staying fully present in an uncomfortable situation and without judgement.
    • +
    • Don't become two persons who never meet: the meditator and the not meditator. So integrate mindfulness during the day too.
    • +

    +

    Perspectives on Relationships and Interactions



    -Note also the call to _tmux::cleanup_default; it would clean up all already opened default sessions if they aren't attached. Those sessions were only temporary, and I had too many flying around after a while. So, I decided to auto-delete the sessions if they weren't attached. If I want to keep sessions around, I will rename them with the Tmux command prefix-key $. This is the cleanup function:
    +
      +
    • Who is the opponent? The other person. The things he said or our reactions to things? Forgiveness is a high form of compassion.
    • +
    • Understand the suffering of the person who "hurt" us. Where is the aggressor really coming from?
    • +
    • People who are stressed or unhappy do and say things they wouldn't have said have done otherwise. Acting under anger is like being influenced by alcohol.
    • +
    • People don't have a masterplan to destroy others, even if it seems so. They are under strong bad influence by themselves. Something terrible happened to them. Revenge makes no sense.
    • +
    • Be grateful for people "trying" to hurt you as they help you to practice your path.
    • +

    +

    Reflective Questions



    -
    -function _tmux::cleanup_default
    -    tmux list-sessions | string match -r '^T.*: ' | string match -v -r attached | string split ':' | while read -l s
    -        echo "Killing $s"
    -        tmux kill-session -t "$s"
    -    end
    -end
    -
    +
      +
    • Why do I do all the things I do? What do I try to achieve?
    • +
    • What am I doing about that?
    • +
    • Is it working?
    • +
    • What are the real causes of happiness and suffering?
    • +
    • What about meditation? How does that address the situation?
    • +

    +

    Miscellaneous Guidelines



    -The cleanup function kills all open Tmux sessions that haven't been renamed properly yet—but only if they aren't attached (e.g., don't run in the foreground in any terminal). Cleaning them up automatically keeps my Tmux sessions as neat and tidy as possible.
    +
      +
    • Posture is important as the mind and body are connected.
    • +
    • Don't use music, so you don't rely on music to change your state of mind. Similar regular guided meditation. Guided meditation is good for learning a technique, but you should not rely on another voice.
    • +
    • You are not trying to relax. Relaxing and trying are two different things.
    • +
    • When you love everything, even the bad things happening to you, then you are invincible.
    • +
    • Happiness is all in your mind. As if you flip a switch there.
    • +
    • Digging for answers will never end. It will always cause more material to dig.
    • +

    +If happiness is a mental issue. Clearly, the best time is spent training your mind in your free time and don't always be busy with other things. E.g. meditation, or think about the benefits of meditation. All that we do in our free time is search for happiness. Are the things we do actually working? There is always something around the corner...

    -

    Renaming sessions


    +E-Mail your comments to paul@nospam.buetow.org :-)

    -Whenever I am in a temporary session (named T....), I may decide that I want to keep this session around. I have to rename the session to prevent the cleanup function from doing its thing. That's, as mentioned already, easily accomplished with the standard prefix-key $ Tmux command.
    +Other book notes of mine are:

    -

    The ta alias - Attaching to a session


    +2025-11-02 'The Courage To Be Disliked' book notes
    +2025-06-07 'A Monk's Guide to Happiness' book notes (You are currently reading this)
    +2025-04-19 'When: The Scientific Secrets of Perfect Timing' book notes
    +2024-10-24 'Staff Engineer' book notes
    +2024-07-07 'The Stoic Challenge' book notes
    +2024-05-01 'Slow Productivity' book notes
    +2023-11-11 'Mind Management' book notes
    +2023-07-17 'Software Developers Career Guide and Soft Skills' book notes
    +2023-05-06 'The Obstacle is the Way' book notes
    +2023-04-01 'Never split the difference' book notes
    +2023-03-16 'The Pragmatic Programmer' book notes

    -This alias refers to the following function, which tries to attach to an already-running Tmux session.
    +Back to the main site
    +
    +
    +
    + + f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network + + gemini://foo.zone/gemfeed/2025-05-11-f3s-kubernetes-with-freebsd-part-5.gmi + 2025-05-11T11:35:57+03:00, last updated Thu 15 Jan 19:30:46 EET 2026 + + Paul Buetow aka snonux + paul@dev.buetow.org + + This is the fifth blog post about my f3s series for my self-hosting demands in my home lab. f3s? The 'f' stands for FreeBSD, and the '3s' stands for k3s, the Kubernetes distribution I will use on FreeBSD-based physical machines. + +
    +

    f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network



    -
    -function tmux::attach
    -    set -l session $argv[1]
    -    if test -z "$session"
    -        tmux attach-session || tmux::new
    -    else
    -        tmux attach-session -t $session || tmux::new $session
    -    end
    -end
    -
    +Published at 2025-05-11T11:35:57+03:00, last updated Thu 15 Jan 19:30:46 EET 2026

    -If no session is specified (as the argument of the function), it will try to attach to the first open session. If no Tmux server is running, it will create a new one with tmux::new. Otherwise, with a session name given as the argument, it will attach to it. If unsuccessful (e.g., the session doesn't exist), it will be created and attached to.
    +This is the fifth blog post about my f3s series for my self-hosting demands in my home lab. f3s? The "f" stands for FreeBSD, and the "3s" stands for k3s, the Kubernetes distribution I will use on FreeBSD-based physical machines.

    -

    The tr alias - For a nested remote session


    +I will post a new entry every month or so (there are too many other side projects for more frequent updates — I bet you can understand).

    -This SSHs into the remote server specified and then, remotely on the server itself, starts a nested Tmux session. So we have one Tmux session on the local computer and, inside of it, an SSH connection to a remote server with a Tmux session running again. The benefit of this is that, in case my network connection breaks down, the next time I connect, I can continue my work on the remote server exactly where I left off. The session name is the name of the server being SSHed into. If a session like this already exists, it simply attaches to it.
    +This post has been updated to include two roaming clients (earth - Fedora laptop, pixel7pro - Android phone) that connect to the mesh via the internet gateways. The updated content is integrated throughout the post.

    -
    -function tmux::remote
    -    set -l server $argv[1]
    -    tmux new -s $server "ssh -A -t $server 'tmux attach-session || tmux'" || tmux attach-session -d -t $server
    -end
    -
    +These are all the posts so far:

    -

    Change of the Tmux prefix for better nesting


    +2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    +2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    +2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    +2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    +2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network (You are currently reading this)
    +2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    +2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    +2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability

    -To make nested Tmux sessions work smoothly, one must change the Tmux prefix key locally or remotely. By default, the Tmux prefix key is Ctrl-b, so Ctrl-b $, for example, renames the current session. To change the prefix key from the standard Ctrl-b to, for example, Ctrl-g, you must add this to the tmux.conf:
    +f3s logo

    -
    -set-option -g prefix C-g
    -
    +ChatGPT generated logo.

    -This way, when I want to rename the remote Tmux session, I have to use Ctrl-g $, and when I want to rename the local Tmux session, I still have to use Ctrl-b $. In my case, I have this deployed to all remote servers through a configuration management system (out of scope for this blog post).
    +Let's begin...

    -There might also be another way around this (without reconfiguring the prefix key), but that is cumbersome to use, as far as I remember.
    +

    Table of Contents



    -

    The ts alias - Searching sessions with fuzzy finder


    +
    +

    Introduction



    -Despite the fact that with _tmux::cleanup_default, I don't leave a huge mess with trillions of Tmux sessions flying around all the time, at times, it can become challenging to find exactly the session I am currently interested in. After a busy workday, I often end up with around twenty sessions on my laptop. This is where fuzzy searching for session names comes in handy, as I often don't remember the exact session names.
    +By default, traffic within my home LAN, including traffic inside a k3s cluster, is not encrypted. While it resides in the "secure" home LAN, adopting a zero-trust policy means encryption is still preferable to ensure confidentiality and security. So we decide to secure all the traffic of all f3s participating hosts by building a mesh network:

    - -
    function tmux::search
    -    set -l session (tmux list-sessions | fzf | cut -d: -f1)
    -    if test -z "$TMUX"
    -        tmux attach-session -t $session
    -    else
    -        tmux switch -t $session
    -    end
    -end
    -
    +WireGuard mesh network topology

    -All it does is list all currently open sessions in fzf, where one of them can be searched and selected through fuzzy find, and then either switch (if already inside a session) to the other session or attach to the other session (if not yet in Tmux).
    +The mesh network consists of eight infrastructure hosts and two roaming clients:

    -You must install the fzf command on your computer for this to work. This is how it looks like:
    +Infrastructure hosts (full mesh):

    -Tmux session fuzzy finder
    +
      +
    • f0, f1, and f2 are the FreeBSD base hosts in my home LAN
    • +
    • r0, r1, and r2 are the Rocky Linux Bhyve VMs running on the FreeBSD hosts
    • +
    • blowfish and fishfinger are two OpenBSD systems running on the internet (as mentioned in the first blog of this series—these systems are already built; in fact, this very blog is served by those OpenBSD systems)
    • +

    +oaming clients (gateway-only connections):
    +
    +
      +
    • earth is my Fedora laptop (192.168.2.200) which connects only to the internet gateways for remote access
    • +
    • pixel7pro is my Android phone (192.168.2.201) which routes all traffic through the VPN when activated
    • +

    +As we can see from the diagram, the eight infrastructure hosts form a true full-mesh network, where every host has a VPN tunnel to every other host. The benefit is that we do not need to route traffic through intermediate hosts (significantly simplifying the routing configuration). However, the downside is that there is some overhead in configuring and managing all the tunnels. The roaming clients take a simpler approach—they only connect to the two internet-facing gateways (blowfish and fishfinger), which is sufficient for remote access and internet connectivity.

    -

    The tssh alias - Cluster SSH replacement


    +For simplicity, we also establish VPN tunnels between f0 <-> r0, f1 <-> r1, and f2 <-> r2. Technically, this wouldn't be strictly required since the VMs rN are running on the hosts fN, and no network traffic is leaving the box. However, it simplifies the configuration as we don't have to account for exceptions, and we are going to automate the mesh network configuration anyway (read on).

    -Before I used Tmux, I was a heavy user of ClusterSSH, which allowed me to log in to multiple servers at once in a single terminal window and type and run commands on all of them in parallel.
    +

    Expected traffic flow



    -https://github.com/duncs/clusterssh
    +The traffic is expected to flow between the host groups through the mesh network as follows:

    -However, since I started using Tmux, I retired ClusterSSH, as it came with the benefit that Tmux only needs to be run in the terminal, whereas ClusterSSH spawned terminal windows, which aren't easily portable (e.g., from a Linux desktop to macOS). The tmux::cluster_ssh function can have N arguments, where:
    +nfrastructure mesh traffic:

      -
    • ...the first argument will be the session name (see tmux::tssh_from_argument helper function), and all remaining arguments will be server hostnames/FQDNs to connect to simultaneously.
    • -
    • ...or, the first argument is a file name, and the file contains a list of hostnames/FQDNs (see tmux::ssh_from_file helper function)
    • +
    • fN <-> rN: The traffic between the FreeBSD hosts and the Rocky Linux VMs will be routed through the VPN tunnels for persistent storage. In a later post in this series, we will set up an NFS server on the fN hosts.
    • +
    • fN <-> blowfish,fishfinger: The traffic between the FreeBSD hosts and the OpenBSD host blowfish,fishfinger will be routed through the VPN tunnels for management. We may want to log in via the internet to set it up remotely. The VPN tunnel will also be used for monitoring purposes.
    • +
    • rN <-> blowfish,fishfinger: The traffic between the Rocky Linux VMs and the OpenBSD host blowfish,fishfinger will be routed through the VPN tunnels for usage traffic. Since k3s will be running on the rN hosts, the OpenBSD servers will route the traffic through relayd to the services running in Kubernetes.
    • +
    • fN <-> fM: The traffic between the FreeBSD hosts may be later used for data replication for the NFS storage.
    • +
    • rN <-> rM: The traffic between the Rocky Linux VMs will later be used by the k3s cluster itself, as every rN will be a Kubernetes worker node.
    • +
    • blowfish <-> fishfinger: The traffic between the OpenBSD hosts isn't strictly required for this setup, but I set it up anyway for future use cases.

    -This is the function definition behind the tssh alias:
    -
    -
    -function tmux::cluster_ssh
    -    if test -f "$argv[1]"
    -        tmux::tssh_from_file $argv[1]
    -        return
    -    end
    -    tmux::tssh_from_argument $argv
    -end
    -
    -
    -This function is just a wrapper around the more complex tmux::tssh_from_file and tmux::tssh_from_argument functions, as you have learned already. Most of the magic happens there.
    +oaming client traffic:

    -

    The tmux::tssh_from_argument helper


    +
      +
    • earth,pixel7pro <-> blowfish,fishfinger: The roaming clients connect exclusively to the two internet gateways. All traffic from these clients (0.0.0.0/0) is routed through the VPN, providing secure internet access and the ability to reach services running in the mesh (via the gateways). The gateways use NAT to allow roaming clients to access the internet using the gateway's public IP address. The roaming clients cannot be reached by the LAN hosts—they are client-only and initiate all connections.
    • +

    +We won't cover all the details in this blog post, as we only focus on setting up the Mesh network in this blog post. Subsequent posts in this series will cover the other details.

    -This is the most magic helper function we will cover in this post. It looks like this:
    +

    Deciding on WireGuard



    -
    -function tmux::tssh_from_argument
    -    set -l session $argv[1]
    -    set first_server_or_container $argv[2]
    -    set remaining_servers $argv[3..-1]
    -    if test -z "$first_server_or_container"
    -        set first_server_or_container $session
    -    end
    -
    -    tmux new-session -d -s $session (_tmux::connect_command "$first_server_or_container")
    -    if not tmux list-session | grep "^$session:"
    -        echo "Could not create session $session"
    -        return 2
    -    end
    -    for server_or_container in $remaining_servers
    -        tmux split-window -t $session "tmux select-layout tiled; $(_tmux::connect_command "$server_or_container")"
    -    end
    -    tmux setw -t $session synchronize-panes on
    -    tmux -2 attach-session -t $session || tmux -2 switch-client -t $session
    -end
    -
    +I have decided to use WireGuard as the VPN technology for this purpose.

    -It expects at least two arguments. The first argument is the session name to create for the clustered SSH session. All other arguments are server hostnames or FQDNs to which to connect. The first one is used to make the initial session. All remaining ones are added to that session with tmux split-window -t $session.... At the end, we enable synchronized panes by default, so whenever you type, the commands will be sent to every SSH connection, thus allowing the neat ClusterSSH feature to run commands on multiple servers simultaneously. Once done, we attach (or switch, if already in Tmux) to it.
    +WireGuard is a lightweight, modern, and secure VPN protocol designed for simplicity, speed, and strong cryptography. It is an excellent choice due to its minimal codebase, ease of configuration, high performance, and robust security, utilizing state-of-the-art encryption standards. WireGuard is supported on various operating systems, and its implementations are compatible with each other. Therefore, establishing WireGuard VPN tunnels between FreeBSD, Linux, and OpenBSD is seamless. This cross-platform availability makes it suitable for setups like the one described in this blog series.

    -Sometimes, I don't want the synchronized panes behavior and want to switch it off temporarily. I can do that with prefix-key p and prefix-key P after adding the following to my local tmux.conf:
    +We could have used Tailscale for an easy to set up and manage the WireGuard network, but the benefits of creating our own mesh network are:

    -
    -bind-key p setw synchronize-panes off
    -bind-key P setw synchronize-panes on
    -
    +
      +
    • Learning about WireGuard configuration details
    • +
    • Have full control over the setup
    • +
    • Don't rely on an external provider like Tailscale (even if some of the components are open-source)
    • +
    • Have even more fun along the way
    • +
    • WireGuard is easy to configure on my target operating systems and, therefore, easier to maintain in the long run.
    • +
    • There are no official Tailscale packages available for OpenBSD and FreeBSD. However, getting Tailscale running on these systems is still possible, though some tinkering would be required. Instead, we use that tinkering time to set up WireGuard tunnels ourselves.
    • +

    +https://en.wikipedia.org/wiki/WireGuard
    +https://www.wireguard.com/
    +https://tailscale.com/

    -

    The tmux::tssh_from_file helper


    +WireGuard Logo

    -This one sets the session name to the file name and then reads a list of servers from that file, passing the list of servers to tmux::tssh_from_argument as the arguments. So, this is a neat little wrapper that also enables me to open clustered SSH sessions from an input file.
    +

    Base configuration



    -
    -function tmux::tssh_from_file
    -    set -l serverlist $argv[1]
    -    set -l session (basename $serverlist | cut -d. -f1)
    -    tmux::tssh_from_argument $session (awk '{ print $1 }' $serverlist | sed 's/.lan./.lan/g')
    -end
    -
    +In the following, we prepare the base configuration for the WireGuard mesh network. We will use a similar configuration on all participating hosts, with the exception of the host IP addresses and the private keys.

    -

    tssh examples


    +

    FreeBSD



    -To open a new session named fish and log in to 4 remote hosts, run this command (Note that it is also possible to specify the remote user):
    +On the FreeBSD hosts f0, f1 and f2, similar as last time, first, we bring the system up to date:

    -
    -$ tssh fish blowfish.buetow.org fishfinger.buetow.org \
    -    fishbone.buetow.org user@octopus.buetow.org
    +
    +
    paul@f0:~ % doas freebsd-update fetch
    +paul@f0:~ % doas freebsd-update install
    +paul@f0:~ % doas shutdown -r now
    +..
    +..
    +paul@f0:~ % doas pkg update
    +paul@f0:~ % doas pkg upgrade
    +paul@f0:~ % reboot
     

    -To open a new session named manyservers, put many servers (one FQDN per line) into a file called manyservers.txt and simply run:
    +Next, we install wireguard-tools and configure the WireGuard service:

    -
    -$ tssh manyservers.txt
    +
    +
    paul@f0:~ % doas pkg install wireguard-tools
    +paul@f0:~ % doas sysrc wireguard_interfaces=wg0
    +wireguard_interfaces:  -> wg0
    +paul@f0:~ % doas sysrc wireguard_enable=YES
    +wireguard_enable:  -> YES
    +paul@f0:~ % doas mkdir -p /usr/local/etc/wireguard
    +paul@f0:~ % doas touch /usr/local/etc/wireguard/wg0.conf
    +paul@f0:~ % doas service wireguard start
    +paul@f0:~ % doas wg show
    +interface: wg0
    +  public key: L+V9o0fNYkMVKNqsX7spBzD/9oSvxM/C7ZCZX1jLO3Q=
    +  private key: (hidden)
    +  listening port: 20246
     

    -

    Common Tmux commands I use in tssh


    -
    -These are default Tmux commands that I make heavy use of in a tssh session:
    -
    -
      -
    • Press prefix-key DIRECTION to switch panes. DIRECTION is by default any of the arrow keys, but I also configured Vi keybindings.
    • -
    • Press prefix-key <space> to change the pane layout (can be pressed multiple times to cycle through them).
    • -
    • Press prefix-key z to zoom in and out of the current active pane.
    • -

    -

    Copy and paste workflow


    +We now have the WireGuard up and running, but it is not yet in any functional configuration. We will come back to that later.

    -As you will see later in this blog post, I have configured a history limit of 1 million items in Tmux so that I can scroll back quite far. One main workflow of mine is to search for text in the Tmux history, select and copy it, and then switch to another window or session and paste it there (e.g., into my text editor to do something with it).
    +Next, we add all the participating WireGuard IPs to the hosts file. This is only convenience, so we don't have to manage an external DNS server for this:

    -This works by pressing prefix-key [ to enter Tmux copy mode. From there, I can browse the Tmux history of the current window using either the arrow keys or vi-like navigation (see vi configuration later in this blog post) and the Pg-Dn and Pg-Up keys.
    + +
    paul@f0:~ % cat <<END | doas tee -a /etc/hosts
    +
    +192.168.1.120 r0 r0.lan r0.lan.buetow.org
    +192.168.1.121 r1 r1.lan r1.lan.buetow.org
    +192.168.1.122 r2 r2.lan r2.lan.buetow.org
    +
    +192.168.2.130 f0.wg0 f0.wg0.wan.buetow.org
    +192.168.2.131 f1.wg0 f1.wg0.wan.buetow.org
    +192.168.2.132 f2.wg0 f2.wg0.wan.buetow.org
    +
    +192.168.2.120 r0.wg0 r0.wg0.wan.buetow.org
    +192.168.2.121 r1.wg0 r1.wg0.wan.buetow.org
    +192.168.2.122 r2.wg0 r2.wg0.wan.buetow.org
    +
    +192.168.2.110 blowfish.wg0 blowfish.wg0.wan.buetow.org
    +192.168.2.111 fishfinger.wg0 fishfinger.wg0.wan.buetow.org
    +
    +fd42:beef:cafe:2::130 f0.wg0 f0.wg0.wan.buetow.org
    +fd42:beef:cafe:2::131 f1.wg0 f1.wg0.wan.buetow.org
    +fd42:beef:cafe:2::132 f2.wg0 f2.wg0.wan.buetow.org
    +
    +fd42:beef:cafe:2::120 r0.wg0 r0.wg0.wan.buetow.org
    +fd42:beef:cafe:2::121 r1.wg0 r1.wg0.wan.buetow.org
    +fd42:beef:cafe:2::122 r2.wg0 r2.wg0.wan.buetow.org
    +
    +fd42:beef:cafe:2::110 blowfish.wg0 blowfish.wg0.wan.buetow.org
    +fd42:beef:cafe:2::111 fishfinger.wg0 fishfinger.wg0.wan.buetow.org
    +END
    +

    -I often search the history backwards with prefix-key [ followed by a ?, which opens the Tmux history search prompt.
    +As you can see, 192.168.1.0/24 is the network used in my LAN (with the fN and rN hosts) and 192.168.2.0/24 is the network used for the WireGuard mesh network. The wg0 interface will be used for all WireGuard traffic.

    -Once I have identified the terminal text to be copied, I enter visual select mode with v, highlight all the text to be copied (using arrow keys or Vi motions), and press y to yank it (sorry if this all sounds a bit complicated, but Vim/NeoVim users will know this, as it is pretty much how you do it there as well).
    +

    Rocky Linux



    -For v and y to work, the following has to be added to the Tmux configuration file:
    +We bring the Rocky Linux VMs up to date as well with the following:

    -
    -bind-key -T copy-mode-vi 'v' send -X begin-selection
    -bind-key -T copy-mode-vi 'y' send -X copy-selection-and-cancel
    +
    +
    [root@r0 ~] dnf update -y
    +[root@r0 ~] reboot
     

    -Once the text is yanked, I switch to another Tmux window or session where, for example, a text editor is running and paste the yanked text from Tmux into the editor with prefix-key ]. Note that when pasting into a modal text editor like Vi or Helix, you would first need to enter insert mode before prefix-key ] would paste anything.
    -
    -

    Tmux configurations


    +Next, we prepare WireGuard on them. Same as on the FreeBSD hosts, we will only prepare WireGuard without any useful configuration yet:
    +
    + +
    [root@r0 ~] dnf install -y wireguard-tools
    +[root@r0 ~] mkdir -p /etc/wireguard
    +[root@r0 ~] touch /etc/wireguard/wg0.conf
    +[root@r0 ~] systemctl enable wg-quick@wg0.service
    +[root@r0 ~] systemctl start wg-quick@wg0.service
    +[root@r0 ~] systemctl disable firewalld
    +

    -Some features I have configured directly in Tmux don't require an external shell alias to function correctly. Let's walk line by line through my local ~/.config/tmux/tmux.conf:
    +We also update the hosts file accordingly:

    -
    -source ~/.config/tmux/tmux.local.conf
    +
    +
    [root@r0 ~] cat <<END >>/etc/hosts
     
    -set-option -g allow-rename off
    -set-option -g history-limit 100000
    -set-option -g status-bg '#444444'
    -set-option -g status-fg '#ffa500'
    -set-option -s escape-time 0
    +192.168.1.130 f0 f0.lan f0.lan.buetow.org
    +192.168.1.131 f1 f1.lan f1.lan.buetow.org
    +192.168.1.132 f2 f2.lan f2.lan.buetow.org
    +
    +192.168.2.130 f0.wg0 f0.wg0.wan.buetow.org
    +192.168.2.131 f1.wg0 f1.wg0.wan.buetow.org
    +192.168.2.132 f2.wg0 f2.wg0.wan.buetow.org
    +
    +192.168.2.120 r0.wg0 r0.wg0.wan.buetow.org
    +192.168.2.121 r1.wg0 r1.wg0.wan.buetow.org
    +192.168.2.122 r2.wg0 r2.wg0.wan.buetow.org
    +
    +192.168.2.110 blowfish.wg0 blowfish.wg0.wan.buetow.org
    +192.168.2.111 fishfinger.wg0 fishfinger.wg0.wan.buetow.org
    +
    +fd42:beef:cafe:2::130 f0.wg0 f0.wg0.wan.buetow.org
    +fd42:beef:cafe:2::131 f1.wg0 f1.wg0.wan.buetow.org
    +fd42:beef:cafe:2::132 f2.wg0 f2.wg0.wan.buetow.org
    +
    +fd42:beef:cafe:2::120 r0.wg0 r0.wg0.wan.buetow.org
    +fd42:beef:cafe:2::121 r1.wg0 r1.wg0.wan.buetow.org
    +fd42:beef:cafe:2::122 r2.wg0 r2.wg0.wan.buetow.org
    +
    +fd42:beef:cafe:2::110 blowfish.wg0 blowfish.wg0.wan.buetow.org
    +fd42:beef:cafe:2::111 fishfinger.wg0 fishfinger.wg0.wan.buetow.org
    +END
     

    -There's yet to be much magic happening here. I source a tmux.local.conf, which I sometimes use to override the default configuration that comes from the configuration management system. But it is mostly just an empty file, so it doesn't throw any errors on Tmux startup when I don't use it.
    -
    -I work with many terminal outputs, which I also like to search within Tmux. So, I added a large enough history-limit, enabling me to search backwards in Tmux for any output up to a million lines of text.
    -
    -Besides changing some colours (personal taste), I also set escape-time to 0, which is just a workaround. Otherwise, my Helix text editor's ESC key would take ages to trigger within Tmux. I am trying to remember the gory details. You can leave it out; if everything works fine for you, leave it out.
    -
    -The next lines in the configuration file are:
    +Unfortunately, the SELinux policy on Rocky Linux blocks WireGuard's operation. By making the wireguard_t domain permissive using semanage permissive -a wireguard_t, SELinux will no longer enforce restrictions for WireGuard, allowing it to work as intended:

    -
    -set-window-option -g mode-keys vi
    -bind-key -T copy-mode-vi 'v' send -X begin-selection
    -bind-key -T copy-mode-vi 'y' send -X copy-selection-and-cancel
    +
    +
    [root@r0 ~] dnf install -y policycoreutils-python-utils
    +[root@r0 ~] semanage permissive -a wireguard_t
    +[root@r0 ~] reboot
     

    -I navigate within Tmux using Vi keybindings, so the mode-keys is set to vi. I use the Helix modal text editor, which is close enough to Vi bindings for simple navigation to feel "native" to me. (By the way, I have been a long-time Vim and NeoVim user, but I eventually switched to Helix. It's off-topic here, but it may be worth another blog post once.)
    +https://github.com/angristan/wireguard-install/discussions/499

    -The two bind-key commands make it so that I can use v and y in copy mode, which feels more Vi-like (as already discussed earlier in this post).
    +

    OpenBSD



    -The next set of lines in the configuration file are:
    +Other than the FreeBSD and Rocky Linux hosts involved, my OpenBSD hosts (blowfish and fishfinger, which are running at OpenBSD Amsterdam and Hetzner on the internet) have been running already for longer, so I can't provide you with the "from scratch" installation details here. In the following, we will only focus on the additional configuration needed to set up WireGuard:

    -
    -bind-key h select-pane -L
    -bind-key j select-pane -D
    -bind-key k select-pane -U
    -bind-key l select-pane -R
    -
    -bind-key H resize-pane -L 5
    -bind-key J resize-pane -D 5
    -bind-key K resize-pane -U 5
    -bind-key L resize-pane -R 5
    +
    +
    blowfish$ doas pkg_add wireguard-tools
    +blowfish$ doas mkdir /etc/wireguard
    +blowfish$ doas touch /etc/wireguard/wg0.conf
    +blowsish$ cat <<END | doas tee /etc/hostname.wg0
    +inet 192.168.2.110 255.255.255.0 NONE
    +up
    +!/usr/local/bin/wg setconf wg0 /etc/wireguard/wg0.conf
    +END
     

    -These allow me to use prefix-key h, prefix-key j, prefix-key k, and prefix-key l for switching panes and prefix-key H, prefix-key J, prefix-key K, and prefix-key L for resizing the panes. If you don't know Vi/Vim/NeoVim, the letters hjkl are commonly used there for left, down, up, and right, which is also the same for Helix, by the way.
    +Note that on blowfish, we configure 192.168.2.110 here in the hostname.wg, and on fishfinger, we configure 192.168.2.111. Those are the IP addresses of the WireGuard interfaces on those hosts.

    -The next set of lines in the configuration file are:
    +And here, we also update the hosts file accordingly:

    -
    -bind-key c new-window -c '#{pane_current_path}'
    -bind-key F new-window -n "session-switcher" "tmux list-sessions | fzf | cut -d: -f1 | xargs tmux switch-client -t"
    -bind-key T choose-tree
    +
    +
    blowfish$ cat <<END | doas tee -a /etc/hosts
    +
    +192.168.2.130 f0.wg0 f0.wg0.wan.buetow.org
    +192.168.2.131 f1.wg0 f1.wg0.wan.buetow.org
    +192.168.2.132 f2.wg0 f2.wg0.wan.buetow.org
    +
    +192.168.2.120 r0.wg0 r0.wg0.wan.buetow.org
    +192.168.2.121 r1.wg0 r1.wg0.wan.buetow.org
    +192.168.2.122 r2.wg0 r2.wg0.wan.buetow.org
    +
    +192.168.2.110 blowfish.wg0 blowfish.wg0.wan.buetow.org
    +192.168.2.111 fishfinger.wg0 fishfinger.wg0.wan.buetow.org
    +192.168.2.200 earth.wg0 earth.wg0.wan.buetow.org
    +192.168.2.201 pixel7pro.wg0 pixel7pro.wg0.wan.buetow.org
    +
    +fd42:beef:cafe:2::130 f0.wg0 f0.wg0.wan.buetow.org
    +fd42:beef:cafe:2::131 f1.wg0 f1.wg0.wan.buetow.org
    +fd42:beef:cafe:2::132 f2.wg0 f2.wg0.wan.buetow.org
    +
    +fd42:beef:cafe:2::120 r0.wg0 r0.wg0.wan.buetow.org
    +fd42:beef:cafe:2::121 r1.wg0 r1.wg0.wan.buetow.org
    +fd42:beef:cafe:2::122 r2.wg0 r2.wg0.wan.buetow.org
    +
    +fd42:beef:cafe:2::110 blowfish.wg0 blowfish.wg0.wan.buetow.org
    +fd42:beef:cafe:2::111 fishfinger.wg0 fishfinger.wg0.wan.buetow.org
    +fd42:beef:cafe:2::200 earth.wg0 earth.wg0.wan.buetow.org
    +fd42:beef:cafe:2::201 pixel7pro.wg0 pixel7pro.wg0.wan.buetow.org
    +END
     

    -The first one is that any new window starts in the current directory. The second one is more interesting. I list all open sessions in the fuzzy finder. I rely heavily on this during my daily workflow to switch between various sessions depending on the task. E.g. from a remote cluster SSH session to a local code editor.
    -
    -The third one, choose-tree, opens a tree view in Tmux listing all sessions and windows. This one is handy to get a better overview of what is currently running in any local Tmux session. It looks like this (it also allows me to press a hotkey to switch to a particular Tmux window):
    -
    -Tmux sessiont tree view
    +To enable roaming clients (like earth and pixel7pro) to access the internet through the VPN, we need to configure NAT on the OpenBSD gateways. This allows the roaming clients to use the gateway's public IP address for outbound traffic. We add the following to /etc/pf.conf on both blowfish and fishfinger:

    -The last remaining lines in my configuration file are:
    -
    -
    -bind-key p setw synchronize-panes off
    -bind-key P setw synchronize-panes on
    -bind-key r source-file ~/.config/tmux/tmux.conf \; display-message "tmux.conf reloaded"
    +
    +
    # NAT for WireGuard clients to access internet
    +match out on vio0 from 192.168.2.0/24 to any nat-to (vio0)
    +
    +# Allow inbound traffic on WireGuard interface
    +pass in on wg0
    +
    +# Allow all UDP traffic on WireGuard port
    +pass in inet proto udp from any to any port 56709
     

    -We discussed synchronized panes earlier. I use it all the time in clustered SSH sessions. When enabled, all panes (remote SSH sessions) receive the same keystrokes. This is very useful when you want to run the same commands on many servers at once, such as navigating to a common directory, restarting a couple of services at once, or running tools like htop to quickly monitor system resources.
    -
    -The last one reloads my Tmux configuration on the fly.
    -
    -E-Mail your comments to paul@nospam.buetow.org :-)
    +The NAT rule translates outgoing traffic from the WireGuard network (192.168.2.0/24) to the gateway's public IP. The firewall rules permit WireGuard traffic on the wg0 interface and UDP port 56709. After updating /etc/pf.conf, reload the firewall:

    -Other related posts are:
    + +
    blowfish$ doas pfctl -f /etc/pf.conf
    +

    -2026-02-02 A tmux popup editor for Cursor Agent CLI prompts
    -2025-05-02 Terminal multiplexing with tmux - Fish edition (You are currently reading this)
    -2024-06-23 Terminal multiplexing with tmux - Z-Shell edition
    +

    WireGuard configuration



    -Back to the main site
    -
    -
    -
    - - 'When: The Scientific Secrets of Perfect Timing' book notes - - gemini://foo.zone/gemfeed/2025-04-19-when-book-notes.gmi - 2025-04-19T10:26:05+03:00 - - Paul Buetow aka snonux - paul@dev.buetow.org - - These are my personal book notes from Daniel Pink's 'When: The Scientific Secrets of Perfect Timing.' They are for me, but I hope they might be useful to you too. - -
    -

    "When: The Scientific Secrets of Perfect Timing" book notes


    +So far, we have only started WireGuard on all participating hosts without any useful configuration. This means that no VPN tunnel has been established yet between any of the hosts.

    -Published at 2025-04-19T10:26:05+03:00
    +

    Example wg0.conf



    -These are my personal book notes from Daniel Pink's "When: The Scientific Secrets of Perfect Timing." They are for me, but I hope they might be useful to you too.
    +Generally speaking, a wg0.conf looks like this (example from f0 host):

    -	  __
    - (`/\
    - `=\/\ __...--~~~~~-._   _.-~~~~~--...__
    -  `=\/\               \ /               \\
    -   `=\/                V                 \\
    -   //_\___--~~~~~~-._  |  _.-~~~~~~--...__\\
    -  //  ) (..----~~~~._\ | /_.~~~~----.....__\\
    - ===( INK )==========\\|//====================
    -__ejm\___/________dwb`---`______________________
    +[Interface]
    +# f0.wg0.wan.buetow.org
    +Address = 192.168.2.130
    +PrivateKey = **************************
    +ListenPort = 56709
    +
    +[Peer]
    +# f1.lan.buetow.org as f1.wg0.wan.buetow.org
    +PublicKey = **************************
    +PresharedKey = **************************
    +AllowedIPs = 192.168.2.131/32
    +Endpoint = 192.168.1.131:56709
    +# No KeepAlive configured
    +
    +[Peer]
    +# f2.lan.buetow.org as f2.wg0.wan.buetow.org
    +PublicKey = **************************
    +PresharedKey = **************************
    +AllowedIPs = 192.168.2.132/32
    +Endpoint = 192.168.1.132:56709
    +# No KeepAlive configured
    +
    +[Peer]
    +# r0.lan.buetow.org as r0.wg0.wan.buetow.org
    +PublicKey = **************************
    +PresharedKey = **************************
    +AllowedIPs = 192.168.2.120/32
    +Endpoint = 192.168.1.120:56709
    +# No KeepAlive configured
    +
    +[Peer]
    +# r1.lan.buetow.org as r1.wg0.wan.buetow.org
    +PublicKey = **************************
    +PresharedKey = **************************
    +AllowedIPs = 192.168.2.121/32
    +Endpoint = 192.168.1.121:56709
    +# No KeepAlive configured
    +
    +[Peer]
    +# r2.lan.buetow.org as r2.wg0.wan.buetow.org
    +PublicKey = **************************
    +PresharedKey = **************************
    +AllowedIPs = 192.168.2.122/32
    +Endpoint = 192.168.1.122:56709
    +# No KeepAlive configured
    +
    +[Peer]
    +# blowfish.buetow.org as blowfish.wg0.wan.buetow.org
    +PublicKey = **************************
    +PresharedKey = **************************
    +AllowedIPs = 192.168.2.110/32
    +Endpoint = 23.88.35.144:56709
    +PersistentKeepalive = 25
    +
    +[Peer]
    +# fishfinger.buetow.org as fishfinger.wg0.wan.buetow.org
    +PublicKey = **************************
    +PresharedKey = **************************
    +AllowedIPs = 192.168.2.111/32
    +Endpoint = 46.23.94.99:56709
    +PersistentKeepalive = 25
     

    -

    Table of Contents


    -
    -
    -You are a different kind of organism based on the time of day. For example, school tests show worse results later in the day, especially if there are fewer computers than students available. Every person has a chronotype, such as a late or early peaker, or somewhere in the middle (like most people). You can assess your chronotype here:
    -
    -Chronotype Assessment
    -
    -Following your chronotype can lead to more happiness and higher job satisfaction.
    -
    -

    Daily Rhythms


    -
    -Peak, Trough, Rebound (Recovery): Most people experience these periods throughout the day. It's best to "eat the frog" or tackle daunting tasks during the peak. A twin peak exists every day, with mornings and early evenings being optimal for most people. Negative moods follow the opposite pattern, peaking in the afternoon. Light helps adjust but isn't the main driver of our internal clock. Like plants, humans have intrinsic rhythms.
    +For roaming clients like pixel7pro (Android phone) or earth (Fedora laptop), the configuration looks different because they route all traffic through the VPN and only connect to the internet gateways:

    -

    Optimal Task Timing


    +
    +[Interface]
    +# pixel7pro.wg0.wan.buetow.org
    +Address = 192.168.2.201
    +PrivateKey = **************************
    +ListenPort = 56709
    +DNS = 1.1.1.1, 8.8.8.8
    +
    +[Peer]
    +# blowfish.buetow.org as blowfish.wg0.wan.buetow.org
    +PublicKey = **************************
    +PresharedKey = **************************
    +AllowedIPs = 0.0.0.0/0, ::/0
    +Endpoint = 23.88.35.144:56709
    +PersistentKeepalive = 25
    +
    +[Peer]
    +# fishfinger.buetow.org as fishfinger.wg0.wan.buetow.org
    +PublicKey = **************************
    +PresharedKey = **************************
    +AllowedIPs = 0.0.0.0/0, ::/0
    +Endpoint = 46.23.94.99:56709
    +PersistentKeepalive = 25
    +

    +Note the key differences for roaming clients:
      -
    • Analytical work requiring sharpness and focus is best at the peak.
    • -
    • Creative work is more effective during non-peak times.
    • -
    • Biorhythms can sway performance by up to twenty percent.
    • +
    • DNS is configured to use external DNS servers (Cloudflare and Google)
    • +
    • AllowedIPs = 0.0.0.0/0, ::/0 routes all traffic (IPv4 and IPv6) through the VPN
    • +
    • Only two peers are configured (the internet gateways), not the full mesh
    • +
    • PersistentKeepalive = 25 is used for both peers to maintain NAT traversal

    -

    Exercise Timing


    -
    -Exercise in the morning to lose weight; you burn up to twenty percent more fat if you exercise before eating. Exercising after eating aids muscle gain, using the energy from the food. Morning exercises elevate mood, with the effect lasting all day. They also make forming a habit easier. The late afternoon is best for athletic performance due to optimal body temperature, reducing injury risk.
    -
    -

    Drinking Habits


    +Whereas there are two main sections. One is [Interface], which configures the current host (here: f0 or pixel7pro):

      -
    • Drink water in the morning to counter mild dehydration upon waking.
    • -
    • Delay coffee consumption until cortisol production peaks an hour or 90 minutes after waking. This helps avoid caffeine resistance.
    • -
    • For an afternoon boost, have coffee once cortisol levels drop.
    • +
    • Address: Local virtual IP address on the WireGuard interface.
    • +
    • PrivateKey: Private key for this node.
    • +
    • ListenPort: Port on which this WireGuard interface listens for incoming connections.

    -

    Afternoon Challenges ("Bermuda Triangle")


    +And in the following, there is one [Peer] section for every peer node on the mesh network:

      -
    • Mistakes are more common in hospitals during this period, like incorrect antibiotic subscriptions or missed handwashing.
    • -
    • Traffic accidents and unfavorable judge decisions occur more frequently in the afternoon.
    • -
    • 2:55 pm is the least productive time of the day.
    • +
    • PublicKey: The public key of the remote peer is used to authenticate their identity.
    • +
    • PresharedKey: An optional symmetric key is used to enhance security (used in addition to PublicKey).
    • +
    • AllowedIPs: IPs or subnets routed through this peer (traffic is allowed to/from these IPs).
    • +
    • Endpoint: The public IP:port combination of the remote peer for connection.
    • +
    • PersistentKeepalive: Keeps the tunnel alive by sending periodic packets; used for NAT traversal.

    -

    Breaks and Productivity


    +

    NAT traversal and keepalive



    -Short, restorative breaks enhance performance. Student exam results improved with a half-hour break beforehand. Even micro-breaks can be beneficial—hourly five-minute walking breaks can increase productivity as much as 30-minute walks. Nature-based breaks are more effective than indoor ones, and full detachment in breaks is essential for restoration. Physical activity during breaks boosts concentration and productivity more than long walks do. Complete detachment from work during breaks is critical.
    +As all participating hosts, except for blowfish and fishfinger (which are on the internet), are behind a NAT gateway (my home router), we need to use PersistentKeepalive to establish and maintain the VPN tunnel from the LAN to the internet because:

    -

    Napping


    +By default, WireGuard tries to be as silent as possible when not being used; it is not a chatty protocol. For the most part, it only transmits data when a peer wishes to send packets. When it's not being asked to send packets, it stops sending packets until it is asked again. In the majority of configurations, this works well. However, when a peer is behind NAT or a firewall, it might wish to be able to receive incoming packets even when it is not sending any packets. Because NAT and stateful firewalls keep track of "connections", if a peer behind NAT or a firewall wishes to receive incoming packets, he must keep the NAT/firewall mapping valid, by periodically sending keepalive packets. This is called persistent keepalives. When this option is enabled, a keepalive packet is sent to the server endpoint once every interval seconds. A sensible interval that works with a wide variety of firewalls is 25 seconds. Setting it to 0 turns the feature off, which is the default, since most users will not need this, and it makes WireGuard slightly more chatty. This feature may be specified by adding the PersistentKeepalive = field to a peer in the configuration file, or setting persistent-keepalive at the command line. If you don't need this feature, don't enable it. But if you're behind NAT or a firewall and you want to receive incoming connections long after network traffic has gone silent, this option will keep the "connection" open in the eyes of NAT.

    -Short naps (10-20 minutes) significantly enhance mood, alertness, and cognitive performance, improving learning and problem-solving abilities. Napping increases with age, benefiting mood, flow, and overall health. A "nappuccino," or napping after coffee, offers a double boost, as caffeine takes around 25 minutes to kick in.
    +That's why you see PersistentKeepAlive = 25 in the blowfish and fishfinger peer configurations. This means that every 25 seconds, a keep-alive signal is sent over the tunnel to maintain its connection. If the tunnel is not yet established, it will be created within 25 seconds latest.

    -

    Scheduling Breaks


    +Without this, we might never have a VPN tunnel open, as the systems in the LAN may not actively attempt to contact blowfish and fishfinger on their own. In fact, the opposite would likely occur, with the traffic flowing inward instead of outward (this is beyond the scope of this blog post but will be covered in a later post in this series!).

    -
      -
    • Track breaks just as you do with tasks—aim for three breaks a day.
    • -
    • Every 25 minutes, look away and daydream for 20 seconds, or engage in short exercises.
    • -
    • Meditating for even three minutes is a highly effective restorative activity.
    • -
    • The "Fresh Start Effect" (e.g., beginning a diet on January 1st or a new week) impacts motivation, as does recognizing progress. At the end of each day, spends two minutes to write down accomplishments.
    • -

    -

    Final Impressions


    +

    Preshared key



    -
      -
    • The concluding experience of a vacation significantly influences overall memories.
    • -
    • Restaurant reviews often hinge on the end of the visit, highlighting extras like wrong bills or additional desserts.
    • -
    • Considering one's older future self can motivate improvements in the present.
    • -

    -

    The Midlife U Curve


    +In a WireGuard configuration, the PSK (preshared key) is an optional additional layer of symmetric encryption used alongside the standard public key cryptography. It is a shared secret known to both peers that enhances security by requiring an attacker to compromise both the private keys and the PSK to decrypt communication. While optional, using a PSK is better as it strengthens the cryptographic security, mitigating risks of potential vulnerabilities in the key exchange process.

    -Life satisfaction tends to dip in midlife, around the forties, but increases around age 54.
    +So, because it's better, we are using it.

    -

    Project Management Tips


    +

    Mesh network generator



    -
      -
    • Halfway through a project, there's a concentrated work effort ("Oh Oh Effect"), similar to an alarm when slightly behind schedule.
    • -
    • Recognizing daily accomplishments can elevate motivation and satisfaction.
    • -

    -These insights from "When" can guide actions to optimize performance, well-being, and satisfaction across various aspects of life.
    +Manually generating wg0.conf files for every peer in a mesh network setup is cumbersome because each peer requires its own unique public/private key pair and a preshared key for each VPN tunnel (resulting in 29 preshared keys for 8 hosts). This complexity scales almost exponentially with the number of peers as the relationships between all peers must be explicitly defined, including their unique configurations such as AllowedIPs and Endpoint and optional settings like PersistentKeepalive. Automating the process ensures consistency, reduces human error, saves considerable time, and allows for centralized management of configuration files.

    -E-Mail your comments to paul@nospam.buetow.org :-)
    +Instead, a script can handle key generation, coordinate relationships, and generate all necessary configuration files simultaneously, making it scalable and far less error-prone.

    -Other book notes of mine are:
    +I have written a Ruby script wireguardmeshgenerator.rb to do this for our purposes:

    -2025-11-02 'The Courage To Be Disliked' book notes
    -2025-06-07 'A Monk's Guide to Happiness' book notes
    -2025-04-19 'When: The Scientific Secrets of Perfect Timing' book notes (You are currently reading this)
    -2024-10-24 'Staff Engineer' book notes
    -2024-07-07 'The Stoic Challenge' book notes
    -2024-05-01 'Slow Productivity' book notes
    -2023-11-11 'Mind Management' book notes
    -2023-07-17 'Software Developers Career Guide and Soft Skills' book notes
    -2023-05-06 'The Obstacle is the Way' book notes
    -2023-04-01 'Never split the difference' book notes
    -2023-03-16 'The Pragmatic Programmer' book notes
    +https://codeberg.org/snonux/wireguardmeshgenerator

    -Back to the main site
    -
    -
    -
    - - f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs - - gemini://foo.zone/gemfeed/2025-04-05-f3s-kubernetes-with-freebsd-part-4.gmi - 2025-04-04T23:21:01+03:00, last updated Fri 26 Dec 08:51:06 EET 2025 - - Paul Buetow aka snonux - paul@dev.buetow.org - - This is the fourth blog post about the f3s series for self-hosting demands in a home lab. f3s? The 'f' stands for FreeBSD, and the '3s' stands for k3s, the Kubernetes distribution used on FreeBSD-based physical machines. - -
    -

    f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs


    +I use Fedora Linux as my main driver on my personal Laptop, so the script was developed and tested only on Fedora Linux. However, it should also work on other Linux and Unix-like systems.

    -Published at 2025-04-04T23:21:01+03:00, last updated Fri 26 Dec 08:51:06 EET 2025
    +To set up the mesh generator on Fedora Linux, we run the following:
    +
    + +
    > git clone https://codeberg.org/snonux/wireguardmeshgenerator
    +> cd ./wireguardmeshgenerator
    +> bundle install
    +> sudo dnf install -y wireguard-tools
    +
    +
    +This assumes that Ruby and the bundler gem are already installed. If not, refer to the docs of your distribution.
    +
    +

    wireguardmeshgenerator.yaml


    +
    +The file wireguardmeshgenerator.yaml configures the mesh generator script.
    +
    +
    +---
    +hosts:
    +  f0:
    +    os: FreeBSD
    +    ssh:
    +      user: paul
    +      conf_dir: /usr/local/etc/wireguard
    +      sudo_cmd: doas
    +      reload_cmd: service wireguard reload
    +    lan:
    +      domain: 'lan.buetow.org'
    +      ip: '192.168.1.130'
    +    wg0:
    +      domain: 'wg0.wan.buetow.org'
    +      ip: '192.168.2.130'
    +      ipv6: 'fd42:beef:cafe:2::130'
    +    exclude_peers:
    +      - earth
    +      - pixel7pro
    +  f1:
    +    os: FreeBSD
    +    ssh:
    +      user: paul
    +      conf_dir: /usr/local/etc/wireguard
    +      sudo_cmd: doas
    +      reload_cmd: service wireguard reload
    +    lan:
    +      domain: 'lan.buetow.org'
    +      ip: '192.168.1.131'
    +    wg0:
    +      domain: 'wg0.wan.buetow.org'
    +      ip: '192.168.2.131'
    +      ipv6: 'fd42:beef:cafe:2::131'
    +    exclude_peers:
    +      - earth
    +      - pixel7pro
    +  f2:
    +    os: FreeBSD
    +    ssh:
    +      user: paul
    +      conf_dir: /usr/local/etc/wireguard
    +      sudo_cmd: doas
    +      reload_cmd: service wireguard reload
    +    lan:
    +      domain: 'lan.buetow.org'
    +      ip: '192.168.1.132'
    +    wg0:
    +      domain: 'wg0.wan.buetow.org'
    +      ip: '192.168.2.132'
    +      ipv6: 'fd42:beef:cafe:2::132'
    +    exclude_peers:
    +      - earth
    +      - pixel7pro
    +  r0:
    +    os: Linux
    +    ssh:
    +      user: root
    +      conf_dir: /etc/wireguard
    +      sudo_cmd:
    +      reload_cmd: systemctl reload wg-quick@wg0.service
    +    lan:
    +      domain: 'lan.buetow.org'
    +      ip: '192.168.1.120'
    +    wg0:
    +      domain: 'wg0.wan.buetow.org'
    +      ip: '192.168.2.120'
    +      ipv6: 'fd42:beef:cafe:2::120'
    +    exclude_peers:
    +      - earth
    +      - pixel7pro
    +  r1:
    +    os: Linux
    +    ssh:
    +      user: root
    +      conf_dir: /etc/wireguard
    +      sudo_cmd:
    +      reload_cmd: systemctl reload wg-quick@wg0.service
    +    lan:
    +      domain: 'lan.buetow.org'
    +      ip: '192.168.1.121'
    +    wg0:
    +      domain: 'wg0.wan.buetow.org'
    +      ip: '192.168.2.121'
    +      ipv6: 'fd42:beef:cafe:2::121'
    +    exclude_peers:
    +      - earth
    +      - pixel7pro
    +  r2:
    +    os: Linux
    +    ssh:
    +      user: root
    +      conf_dir: /etc/wireguard
    +      sudo_cmd:
    +      reload_cmd: systemctl reload wg-quick@wg0.service
    +    lan:
    +      domain: 'lan.buetow.org'
    +      ip: '192.168.1.122'
    +    wg0:
    +      domain: 'wg0.wan.buetow.org'
    +      ip: '192.168.2.122'
    +      ipv6: 'fd42:beef:cafe:2::122'
    +    exclude_peers:
    +      - earth
    +      - pixel7pro
    +  blowfish:
    +    os: OpenBSD
    +    ssh:
    +      user: rex
    +      port: 2
    +      conf_dir: /etc/wireguard
    +      sudo_cmd: doas
    +      reload_cmd: sh /etc/netstart wg0
    +    internet:
    +      domain: 'buetow.org'
    +      ip: '23.88.35.144'
    +    wg0:
    +      domain: 'wg0.wan.buetow.org'
    +      ip: '192.168.2.110'
    +      ipv6: 'fd42:beef:cafe:2::110'
    +    exclude_peers:
    +      - earth
    +      - pixel7pro
    +  fishfinger:
    +    os: OpenBSD
    +    ssh:
    +      user: rex
    +      port: 2
    +      conf_dir: /etc/wireguard
    +      sudo_cmd: doas
    +      reload_cmd: sh /etc/netstart wg0
    +    internet:
    +      domain: 'buetow.org'
    +      ip: '46.23.94.99'
    +    wg0:
    +      domain: 'wg0.wan.buetow.org'
    +      ip: '192.168.2.111'
    +      ipv6: 'fd42:beef:cafe:2::111'
    +    exclude_peers:
    +      - earth
    +      - pixel7pro
    +  earth:
    +    os: Linux
    +    wg0:
    +      domain: 'wg0.wan.buetow.org'
    +      ip: '192.168.2.200'
    +      ipv6: 'fd42:beef:cafe:2::200'
    +    exclude_peers:
    +      - f0
    +      - f1
    +      - f2
    +      - r0
    +      - r1
    +      - r2
    +      - pixel7pro
    +  pixel7pro:
    +    os: Android
    +    wg0:
    +      domain: 'wg0.wan.buetow.org'
    +      ip: '192.168.2.201'
    +      ipv6: 'fd42:beef:cafe:2::201'
    +    exclude_peers:
    +      - f0
    +      - f1
    +      - f2
    +      - r0
    +      - r1
    +      - r2
    +      - earth
    +

    -This is the fourth blog post about the f3s series for self-hosting demands in a home lab. f3s? The "f" stands for FreeBSD, and the "3s" stands for k3s, the Kubernetes distribution used on FreeBSD-based physical machines.
    +The file specifies details such as SSH user settings, configuration directories, sudo or reload commands, and IP/domain assignments for both internal LAN-facing interfaces and WireGuard (wg0) interfaces. Each host is assigned specific roles, including internal participants and publicly accessible nodes with internet-facing IPs, enabling the creation of a fully connected mesh VPN.

    -2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    -2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    -2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    -2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs (You are currently reading this)
    -2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    -2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    -2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    -2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability
    +Roaming clients: Note the earth and pixel7pro entries—these are configured differently from the infrastructure hosts. They have no lan or internet sections, which signals to the generator that they are roaming clients. The exclude_peers configuration ensures they only connect to the internet gateways (blowfish and fishfinger) and are not reachable by LAN hosts. The generator automatically configures these clients with AllowedIPs = 0.0.0.0/0, ::/0 to route all traffic through the VPN, includes DNS configuration (1.1.1.1, 8.8.8.8), and enables PersistentKeepalive for NAT traversal.

    -f3s logo
    +

    wireguardmeshgenerator.rb overview



    -

    Table of Contents


    +The wireguardmeshgenerator.rb script consists of the following base classes:


    -

    Introduction


    -
    -In this blog post, we are going to install the Bhyve hypervisor.
    -
    -The FreeBSD Bhyve hypervisor is a lightweight, modern hypervisor that enables virtualization on FreeBSD systems. Bhyve's strengths include its minimal overhead, which allows it to achieve near-native performance for virtual machines. It's efficient and lightweight, leveraging the capabilities of the FreeBSD operating system for performance and network management.
    -
    -https://wiki.freebsd.org/bhyve
    -
    -Bhyve supports running various guest operating systems, including FreeBSD, Linux, and Windows, on hardware platforms that support hardware virtualization extensions (such as Intel VT-x or AMD-V). In our case, we are going to virtualize Rocky Linux, which will later in this series be used to run k3s.
    -
    -

    Check for POPCNT CPU support


    -
    -POPCNT is a CPU instruction that counts the number of set bits (ones) in a binary number. CPU virtualization and Bhyve support for the POPCNT instruction are important because guest operating systems utilize this instruction to perform various tasks more efficiently. If the host CPU supports POPCNT, Bhyve can pass this capability to virtual machines for better performance. Without POPCNT support, some applications might not run or perform sub-optimally in virtualized environments.
    -
    -To check for POPCNT support, run:
    -
    - -
    paul@f0:~ % dmesg | grep 'Features2=.*POPCNT'
    -  Features2=0x7ffafbbf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,SDBG,
    -	FMA,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,
    -	OSXSAVE,AVX,F16C,RDRAND>
    -
    -
    -So it's there! All good.
    -
    -

    Basic Bhyve setup


    -
    -For managing the Bhyve VMs, we are using vm-bhyve, a tool not part of the FreeBSD operating system but available as a ready-to-use package. It eases VM management and reduces a lot of overhead. We also install the required package to make Bhyve work with the UEFI firmware.
    -
    -https://github.com/churchers/vm-bhyve
    -
    -The following commands are executed on all three hosts f0, f1, and f2, where re0 is the name of the Ethernet interface (which may need to be adjusted if your hardware is different):
    -
    - -
    paul@f0:~ % doas pkg install vm-bhyve bhyve-firmware
    -paul@f0:~ % doas sysrc vm_enable=YES
    -vm_enable:  -> YES
    -paul@f0:~ % doas sysrc vm_dir=zfs:zroot/bhyve
    -vm_dir:  -> zfs:zroot/bhyve
    -paul@f0:~ % doas zfs create zroot/bhyve
    -paul@f0:~ % doas vm init
    -paul@f0:~ % doas vm switch create public
    -paul@f0:~ % doas vm switch add public re0
    -
    -
    -Bhyve stores all its data in the /bhyve of the zroot ZFS pool:
    -
    - -
    paul@f0:~ % zfs list | grep bhyve
    -zroot/bhyve                                   1.74M   453G  1.74M  /zroot/bhyve
    -
    -
    -For convenience, we also create this symlink:
    +At the end (if you want to see the code for the stuff listed above, go to the Git repo and have a look), we glue it all together in this block:

    -
    paul@f0:~ % doas ln -s /zroot/bhyve/ /bhyve
    +
    begin
    +  options = { hosts: [] }
    +  OptionParser.new do |opts|
    +    opts.banner = 'Usage: wireguardmeshgenerator.rb [options]'
    +    opts.on('--generate', 'Generate Wireguard configs') do
    +      options[:generate] = true
    +    end
    +    opts.on('--install', 'Install Wireguard configs') do
    +      options[:install] = true
    +    end
    +    opts.on('--clean', 'Clean Wireguard configs') do
    +      options[:clean] = true
    +    end
    +    opts.on('--hosts=HOSTS', 'Comma separated hosts to configure') do |hosts|
    +      options[:hosts] = hosts.split(',')
    +    end
    +  end.parse!
     
    +  conf = YAML.load_file('wireguardmeshgenerator.yaml').freeze
    +  conf['hosts'].keys.select { options[:hosts].empty? || options[:hosts].include?(_1) }
    +               .each do |host|
    +    # Generate Wireguard configuration for the host reload!
    +    WireguardConfig.new(host, conf['hosts']).generate! if options[:generate]
    +    # Install Wireguard configuration for the host.
    +    InstallConfig.new(host, conf['hosts']).upload!.install!.reload! if options[:install]
    +    # Clean Wireguard configuration for the host.
    +    WireguardConfig.new(host, conf['hosts']).clean! if options[:clean]
    +  end
    +rescue StandardError => e
    +  puts "Error: #{e.message}"
    +  puts e.backtrace.join("\n")
    +  exit 2
    +end
     

    -Now, Bhyve is ready to rumble, but no VMs are there yet:
    -
    - -
    paul@f0:~ % doas vm list
    -NAME  DATASTORE  LOADER  CPU  MEMORY  VNC  AUTO  STATE
    -
    -
    -

    Rocky Linux VMs


    -
    -As guest VMs I decided to use Rocky Linux.
    -
    -Using Rocky Linux 9 as a VM-based OS is beneficial primarily because of its long-term support and stable release cycle. This ensures a reliable environment that receives security updates and bug fixes for an extended period, reducing the need for frequent upgrades.
    -
    -Rocky Linux is community-driven and aims to be fully compatible with enterprise Linux, making it a solid choice for consistency and performance in various deployment scenarios.
    -
    -https://rockylinux.org/
    -
    -

    ISO download


    -
    -We're going to install the Rocky Linux from the latest minimal iso:
    -
    - -
    paul@f0:~ % doas vm iso \
    - https://download.rockylinux.org/pub/rocky/9/isos/x86_64/Rocky-9.5-x86_64-minimal.iso
    -/zroot/bhyve/.iso/Rocky-9.5-x86_64-minimal.iso        1808 MB 4780 kBps 06m28s
    -paul@f0:/bhyve % doas vm create rocky
    -
    -
    -

    VM configuration


    -
    -The default Bhyve VM configuration looks like this now:
    -
    - -
    paul@f0:/bhyve/rocky % cat rocky.conf
    -loader="bhyveload"
    -cpu=1
    -memory=256M
    -network0_type="virtio-net"
    -network0_switch="public"
    -disk0_type="virtio-blk"
    -disk0_name="disk0.img"
    -uuid="1c4655ac-c828-11ef-a920-e8ff1ed71ca0"
    -network0_mac="58:9c:fc:0d:13:3f"
    -
    -
    -The uuid and the network0_mac differ for each of the three VMs (the ones being installed on f0, f1 and f2).
    -
    -But to make Rocky Linux boot it (plus some other adjustments, e.g. as we intend to run the majority of the workload in the k3s cluster running on those Linux VMs, we give them beefy specs like 4 CPU cores and 14GB RAM). So we run doas vm configure rocky and modified it to:
    -
    -
    -guest="linux"
    -loader="uefi"
    -uefi_vars="yes"
    -cpu=4
    -memory=14G
    -network0_type="virtio-net"
    -network0_switch="public"
    -disk0_type="virtio-blk"
    -disk0_name="disk0.img"
    -graphics="yes"
    -graphics_vga=io
    -uuid="1c45400b-c828-11ef-8871-e8ff1ed71cac"
    -network0_mac="58:9c:fc:0d:13:3f"
    -
    -
    -

    VM installation


    -
    -To start the installer from the downloaded ISO, we run:
    +And we also have a Rakefile:

    -
    paul@f0:~ % doas vm install rocky Rocky-9.5-x86_64-minimal.iso
    -Starting rocky
    -  * found guest in /zroot/bhyve/rocky
    -  * booting...
    +
    task :generate do
    +  ruby 'wireguardmeshgenerator.rb', '--generate'
    +end
     
    -paul@f0:/bhyve/rocky % doas vm list
    -NAME   DATASTORE  LOADER  CPU  MEMORY  VNC           AUTO  STATE
    -rocky  default    uefi    4    14G     0.0.0.0:5900  No    Locked (f0.lan.buetow.org)
    +task :clean do
    +  ruby 'wireguardmeshgenerator.rb', '--clean'
    +end
     
    -paul@f0:/bhyve/rocky % doas sockstat -4 | grep 5900
    -root     bhyve       6079 8   tcp4   *:5900                *:*
    -
    -
    -Port 5900 now also opens for VNC connections, so I connected it with a VNC client and ran through the installation dialogues. This could be done unattended or more automated, but there are only three VMs to install, and the automation doesn't seem worth it as we do it only once a year or less often.
    -
    -

    Increase of the disk image


    -
    -By default, the VM disk image is only 20G, which is a bit small for our purposes, so we have to stop the VMs again, run truncate on the image file to enlarge them to 100G, and restart the installation:
    -
    - -
    paul@f0:/bhyve/rocky % doas vm stop rocky
    -paul@f0:/bhyve/rocky % doas truncate -s 100G disk0.img
    -paul@f0:/bhyve/rocky % doas vm install rocky Rocky-9.5-x86_64-minimal.iso
    +task :install do
    +  ruby 'wireguardmeshgenerator.rb', '--install'
    +end
    +
    +task default: :generate
     

    -

    Connect to VNC


    -
    -For the installation, I opened the VNC client on my Fedora laptop (GNOME comes with a simple VNC client) and manually ran through the base installation for each of the VMs. Again, I am sure this could have been automated a bit more, but there were just three VMs, and it wasn't worth the effort. The three VNC addresses of the VMs were vnc://f0:5900, vnc://f1:5900, and vnc://f2:5900.
    -
    -
    -
    -
    -
    -I primarily selected the default settings (auto partitioning on the 100GB drive and a root user password). After the installation, the VMs were rebooted.
    -
    -
    -
    -
    -
    -

    After install



    -We perform the following steps for all three VMs. In the following, the examples are all executed on f0 (the VM r0 running on f0):
    +

    Invoking the mesh network generator



    -

    VM auto-start after host reboot


    +

    Generating the wg0.conf files and keys



    -To automatically start the VM on the servers, we add the following to the rc.conf on the FreeBSD hosts:
    +To generate everything (the wg0.conf of all participating hosts, including all keys involved), we run the following:

    -
    paul@f0:/bhyve/rocky % cat <<END | doas tee -a /etc/rc.conf
    -vm_list="rocky"
    -vm_delay="5"
    +
    > rake generate
    +/usr/bin/ruby wireguardmeshgenerator.rb --generate
    +Generating dist/f0/etc/wireguard/wg0.conf
    +Generating dist/f1/etc/wireguard/wg0.conf
    +Generating dist/f2/etc/wireguard/wg0.conf
    +Generating dist/r0/etc/wireguard/wg0.conf
    +Generating dist/r1/etc/wireguard/wg0.conf
    +Generating dist/r2/etc/wireguard/wg0.conf
    +Generating dist/blowfish/etc/wireguard/wg0.conf
    +Generating dist/fishfinger/etc/wireguard/wg0.conf
    +Generating dist/earth/etc/wireguard/wg0.conf
    +Generating dist/pixel7pro/etc/wireguard/wg0.conf
     

    -The vm_delay isn't really required. It is used to wait 5 seconds before starting each VM, but there is currently only one VM per host. Maybe later, when there are more, this will be useful. After adding, there's now a Yes indicator in the AUTO column.
    +It generated all the wg0.conf files listed in the output, plus those keys:

    -
    paul@f0:~ % doas vm list
    -NAME   DATASTORE  LOADER  CPU  MEMORY  VNC           AUTO     STATE
    -rocky  default    uefi    4    14G     0.0.0.0:5900  Yes [1]  Running (2063)
    +
    > find keys/ -type f
    +keys/f0/priv.key
    +keys/f0/pub.key
    +keys/psk/f0_f1.key
    +keys/psk/f0_f2.key
    +keys/psk/f0_r0.key
    +keys/psk/f0_r1.key
    +keys/psk/f0_r2.key
    +keys/psk/blowfish_f0.key
    +keys/psk/f0_fishfinger.key
    +keys/psk/f1_f2.key
    +keys/psk/f1_r0.key
    +keys/psk/f1_r1.key
    +keys/psk/f1_r2.key
    +keys/psk/blowfish_f1.key
    +keys/psk/f1_fishfinger.key
    +keys/psk/f2_r0.key
    +keys/psk/f2_r1.key
    +keys/psk/f2_r2.key
    +keys/psk/blowfish_f2.key
    +keys/psk/f2_fishfinger.key
    +keys/psk/r0_r1.key
    +keys/psk/r0_r2.key
    +keys/psk/blowfish_r0.key
    +keys/psk/fishfinger_r0.key
    +keys/psk/r1_r2.key
    +keys/psk/blowfish_r1.key
    +keys/psk/fishfinger_r1.key
    +keys/psk/blowfish_r2.key
    +keys/psk/fishfinger_r2.key
    +keys/psk/blowfish_fishfinger.key
    +keys/psk/blowfish_earth.key
    +keys/psk/earth_fishfinger.key
    +keys/psk/blowfish_pixel7pro.key
    +keys/psk/fishfinger_pixel7pro.key
    +keys/f1/priv.key
    +keys/f1/pub.key
    +keys/f2/priv.key
    +keys/f2/pub.key
    +keys/r0/priv.key
    +keys/r0/pub.key
    +keys/r1/priv.key
    +keys/r1/pub.key
    +keys/r2/priv.key
    +keys/r2/pub.key
    +keys/blowfish/priv.key
    +keys/blowfish/pub.key
    +keys/fishfinger/priv.key
    +keys/fishfinger/pub.key
    +keys/earth/priv.key
    +keys/earth/pub.key
    +keys/pixel7pro/priv.key
    +keys/pixel7pro/pub.key
     

    -

    Static IP configuration


    -
    -After that, we change the network configuration of the VMs to be static (from DHCP) here. As per the previous post of this series, the three FreeBSD hosts were already in my /etc/hosts file:
    +Those keys are embedded in the resulting wg0.conf, so later, we only need to install the wg0.conf files and not all the keys individually.

    -
    -192.168.1.130 f0 f0.lan f0.lan.buetow.org
    -192.168.1.131 f1 f1.lan f1.lan.buetow.org
    -192.168.1.132 f2 f2.lan f2.lan.buetow.org
    -
    +

    Installing the wg0.conf files



    -For the Rocky VMs, we add those to the FreeBSD host systems as well:
    +Uploading the wg0.conf files to the participating hosts and reloading WireGuard on them is then just a matter of executing (this expects, that all participating hosts are up and running):

    -
    paul@f0:/bhyve/rocky % cat <<END | doas tee -a /etc/hosts
    -192.168.1.120 r0 r0.lan r0.lan.buetow.org
    -192.168.1.121 r1 r1.lan r1.lan.buetow.org
    -192.168.1.122 r2 r2.lan r2.lan.buetow.org
    -END
    +
    > rake install
    +/usr/bin/ruby wireguardmeshgenerator.rb --install
    +Uploading dist/f0/etc/wireguard/wg0.conf to f0.lan.buetow.org:.
    +Installing Wireguard config on f0
    +Uploading cmd.sh to f0.lan.buetow.org:.
    ++ [ ! -d /usr/local/etc/wireguard ]
    ++ doas chmod 700 /usr/local/etc/wireguard
    ++ doas mv -v wg0.conf /usr/local/etc/wireguard
    +wg0.conf -> /usr/local/etc/wireguard/wg0.conf
    ++ doas chmod 644 /usr/local/etc/wireguard/wg0.conf
    ++ rm cmd.sh
    +Reloading Wireguard on f0
    +Uploading cmd.sh to f0.lan.buetow.org:.
    ++ doas service wireguard reload
    ++ rm cmd.sh
    +Uploading dist/f1/etc/wireguard/wg0.conf to f1.lan.buetow.org:.
    +Installing Wireguard config on f1
    +Uploading cmd.sh to f1.lan.buetow.org:.
    ++ [ ! -d /usr/local/etc/wireguard ]
    ++ doas chmod 700 /usr/local/etc/wireguard
    ++ doas mv -v wg0.conf /usr/local/etc/wireguard
    +wg0.conf -> /usr/local/etc/wireguard/wg0.conf
    ++ doas chmod 644 /usr/local/etc/wireguard/wg0.conf
    ++ rm cmd.sh
    +Reloading Wireguard on f1
    +Uploading cmd.sh to f1.lan.buetow.org:.
    ++ doas service wireguard reload
    ++ rm cmd.sh
    +Uploading dist/f2/etc/wireguard/wg0.conf to f2.lan.buetow.org:.
    +Installing Wireguard config on f2
    +Uploading cmd.sh to f2.lan.buetow.org:.
    ++ [ ! -d /usr/local/etc/wireguard ]
    ++ doas chmod 700 /usr/local/etc/wireguard
    ++ doas mv -v wg0.conf /usr/local/etc/wireguard
    +wg0.conf -> /usr/local/etc/wireguard/wg0.conf
    ++ doas chmod 644 /usr/local/etc/wireguard/wg0.conf
    ++ rm cmd.sh
    +Reloading Wireguard on f2
    +Uploading cmd.sh to f2.lan.buetow.org:.
    ++ doas service wireguard reload
    ++ rm cmd.sh
    +Uploading dist/r0/etc/wireguard/wg0.conf to r0.lan.buetow.org:.
    +Installing Wireguard config on r0
    +Uploading cmd.sh to r0.lan.buetow.org:.
    ++ '[' '!' -d /etc/wireguard ']'
    ++ chmod 700 /etc/wireguard
    ++ mv -v wg0.conf /etc/wireguard
    +renamed 'wg0.conf' -> '/etc/wireguard/wg0.conf'
    ++ chmod 644 /etc/wireguard/wg0.conf
    ++ rm cmd.sh
    +Reloading Wireguard on r0
    +Uploading cmd.sh to r0.lan.buetow.org:.
    ++ systemctl reload wg-quick@wg0.service
    ++ rm cmd.sh
    +Uploading dist/r1/etc/wireguard/wg0.conf to r1.lan.buetow.org:.
    +Installing Wireguard config on r1
    +Uploading cmd.sh to r1.lan.buetow.org:.
    ++ '[' '!' -d /etc/wireguard ']'
    ++ chmod 700 /etc/wireguard
    ++ mv -v wg0.conf /etc/wireguard
    +renamed 'wg0.conf' -> '/etc/wireguard/wg0.conf'
    ++ chmod 644 /etc/wireguard/wg0.conf
    ++ rm cmd.sh
    +Reloading Wireguard on r1
    +Uploading cmd.sh to r1.lan.buetow.org:.
    ++ systemctl reload wg-quick@wg0.service
    ++ rm cmd.sh
    +Uploading dist/r2/etc/wireguard/wg0.conf to r2.lan.buetow.org:.
    +Installing Wireguard config on r2
    +Uploading cmd.sh to r2.lan.buetow.org:.
    ++ '[' '!' -d /etc/wireguard ']'
    ++ chmod 700 /etc/wireguard
    ++ mv -v wg0.conf /etc/wireguard
    +renamed 'wg0.conf' -> '/etc/wireguard/wg0.conf'
    ++ chmod 644 /etc/wireguard/wg0.conf
    ++ rm cmd.sh
    +Reloading Wireguard on r2
    +Uploading cmd.sh to r2.lan.buetow.org:.
    ++ systemctl reload wg-quick@wg0.service
    ++ rm cmd.sh
    +Uploading dist/blowfish/etc/wireguard/wg0.conf to blowfish.buetow.org:.
    +Installing Wireguard config on blowfish
    +Uploading cmd.sh to blowfish.buetow.org:.
    ++ [ ! -d /etc/wireguard ]
    ++ doas chmod 700 /etc/wireguard
    ++ doas mv -v wg0.conf /etc/wireguard
    +wg0.conf -> /etc/wireguard/wg0.conf
    ++ doas chmod 644 /etc/wireguard/wg0.conf
    ++ rm cmd.sh
    +Reloading Wireguard on blowfish
    +Uploading cmd.sh to blowfish.buetow.org:.
    ++ doas sh /etc/netstart wg0
    ++ rm cmd.sh
    +Uploading dist/fishfinger/etc/wireguard/wg0.conf to fishfinger.buetow.org:.
    +Installing Wireguard config on fishfinger
    +Uploading cmd.sh to fishfinger.buetow.org:.
    ++ [ ! -d /etc/wireguard ]
    ++ doas chmod 700 /etc/wireguard
    ++ doas mv -v wg0.conf /etc/wireguard
    +wg0.conf -> /etc/wireguard/wg0.conf
    ++ doas chmod 644 /etc/wireguard/wg0.conf
    ++ rm cmd.sh
    +Reloading Wireguard on fishfinger
    +Uploading cmd.sh to fishfinger.buetow.org:.
    ++ doas sh /etc/netstart wg0
    ++ rm cmd.sh
     

    -And we configure the IPs accordingly on the VMs themselves by opening a root shell via SSH to the VMs and entering the following commands on each of the VMs:
    +

    Re-generating mesh and installing the wg0.conf files again


    +
    +The mesh network can be re-generated and re-installed as follows:

    -
    [root@r0 ~] % nmcli connection modify enp0s5 ipv4.address 192.168.1.120/24
    -[root@r0 ~] % nmcli connection modify enp0s5 ipv4.gateway 192.168.1.1
    -[root@r0 ~] % nmcli connection modify enp0s5 ipv4.DNS 192.168.1.1
    -[root@r0 ~] % nmcli connection modify enp0s5 ipv4.method manual
    -[root@r0 ~] % nmcli connection down enp0s5
    -[root@r0 ~] % nmcli connection up enp0s5
    -[root@r0 ~] % hostnamectl set-hostname r0.lan.buetow.org
    -[root@r0 ~] % cat <<END >>/etc/hosts
    -192.168.1.120 r0 r0.lan r0.lan.buetow.org
    -192.168.1.121 r1 r1.lan r1.lan.buetow.org
    -192.168.1.122 r2 r2.lan r2.lan.buetow.org
    -END
    +
    > rake clean
    +> rake generate
    +> rake install
     

    -Whereas:
    +That would also delete and re-generate all the keys involved.

    -
      -
    • 192.168.1.120 is the IP of the VM itself (here: r0.lan.buetow.org)
    • -
    • 192.168.1.1 is the address of my home router, which also does DNS.
    • -

    -

    Permitting root login


    +

    Setting up roaming clients



    -As these VMs aren't directly reachable via SSH from the internet, we enable root login by adding a line with PermitRootLogin yes to /etc/sshd/sshd_config.
    +For roaming clients like earth (Fedora laptop) and pixel7pro (Android phone), the setup process differs slightly since these devices are not always accessible via SSH:

    -Once done, we reboot the VM by running reboot inside the VM to test whether everything was configured and persisted correctly.
    +Android phone (pixel7pro):

    -After reboot, we copy a public key over. E.g. I did this from my Laptop as follows:
    +The configuration is transferred to the phone using a QR code. The official WireGuard Android app (from Google Play Store) can scan and import the configuration:

    -
    % for i in 0 1 2; do ssh-copy-id root@r$i.lan.buetow.org; done
    +
    > sudo dnf install qrencode
    +> qrencode -t ansiutf8 < dist/pixel7pro/etc/wireguard/wg0.conf
     

    -Then, we edit the /etc/ssh/sshd_config file again on all three VMs and configure PasswordAuthentication no to only allow SSH key authentication from now on.
    +Scan the QR code with the WireGuard app to import the configuration. The phone will then route all traffic through the VPN when the tunnel is activated. Note that WireGuard does not support automatic failover between the two gateways (blowfish and fishfinger)—if one fails, manual disconnection and reconnection is required to switch to the other.

    -

    Install latest updates


    +Fedora laptop (earth):
    +
    +For the laptop, manually copy the generated configuration:

    -
    [root@r0 ~] % dnf update
    -[root@r0 ~] % reboot
    +
    > sudo cp dist/earth/etc/wireguard/wg0.conf /etc/wireguard/
    +> sudo chmod 600 /etc/wireguard/wg0.conf
    +> sudo systemctl start wg-quick@wg0.service  # Start manually
    +> sudo systemctl disable wg-quick@wg0.service  # Prevent auto-start
     

    -

    Stress testing CPU


    +The service is disabled from auto-start so the VPN is only active when manually started. This allows selective VPN usage based on need.

    -The aim is to prove that bhyve VMs are CPU efficient. As I could not find an off-the-shelf benchmarking tool available in the same version for FreeBSD as well as for Rocky Linux 9, I wrote my own silly CPU benchmarking tool in Go:
    +

    Adding IPv6 support to the mesh



    - -
    package main
    -
    -import "testing"
    -
    -func BenchmarkCPUSilly1(b *testing.B) {
    -	for i := 0; i < b.N; i++ {
    -		_ = i * i
    -	}
    -}
    -
    -func BenchmarkCPUSilly2(b *testing.B) {
    -	var sillyResult float64
    -	for i := 0; i < b.N; i++ {
    -		sillyResult += float64(i)
    -		sillyResult *= float64(i)
    -		divisor := float64(i) + 1
    -		if divisor > 0 {
    -			sillyResult /= divisor
    -		}
    -	}
    -	_ = sillyResult // to avoid compiler optimization
    -}
    +After setting up the IPv4-only mesh network, I decided to add dual-stack IPv6 support to enable more networking capabilities and prepare for the future. All 10 hosts (8 infrastructure + 2 roaming clients) now have both IPv4 and IPv6 addresses on their WireGuard interfaces.
    +
    +

    IPv6 addressing scheme


    +
    +We use ULA (Unique Local Address) private IPv6 space, analogous to RFC1918 private IPv4 addresses:
    +
    +
      +
    • Prefix: fd42:beef:cafe::/48
    • +
    • Subnet: fd42:beef:cafe:2::/64 (wg0 interfaces)
    • +

    +All hosts receive dual-stack addresses:
    +
    +
    +fd42:beef:cafe:2::110/64  - blowfish.wg0 (OpenBSD gateway)
    +fd42:beef:cafe:2::111/64  - fishfinger.wg0 (OpenBSD gateway)
    +fd42:beef:cafe:2::120/64  - r0.wg0 (Rocky Linux VM)
    +fd42:beef:cafe:2::121/64  - r1.wg0 (Rocky Linux VM)
    +fd42:beef:cafe:2::122/64  - r2.wg0 (Rocky Linux VM)
    +fd42:beef:cafe:2::130/64  - f0.wg0 (FreeBSD host)
    +fd42:beef:cafe:2::131/64  - f1.wg0 (FreeBSD host)
    +fd42:beef:cafe:2::132/64  - f2.wg0 (FreeBSD host)
    +fd42:beef:cafe:2::200/64  - earth.wg0 (roaming laptop)
    +fd42:beef:cafe:2::201/64  - pixel7pro.wg0 (roaming phone)
     

    -You can find the repository here:
    +

    Updating the mesh generator for IPv6



    -https://codeberg.org/snonux/sillybench
    +The mesh generator required two modifications to support dual-stack configurations:

    -

    Silly FreeBSD host benchmark


    +**1. Address generation (address method)**

    -To install it on FreeBSD, we run:
    +The generator now outputs multiple Address directives when IPv6 is present:

    -
    paul@f0:~ % doas pkg install git go
    -paul@f0:~ % mkdir ~/git && cd ~/git && \
    -  git clone https://codeberg.org/snonux/sillybench && \
    -  cd sillybench
    +
    def address
    +  return '# No Address = ... for OpenBSD here' if hosts[myself]['os'] == 'OpenBSD'
    +
    +  ipv4 = hosts[myself]['wg0']['ip']
    +  ipv6 = hosts[myself]['wg0']['ipv6']
    +
    +  # WireGuard supports multiple Address directives for dual-stack
    +  if ipv6
    +    "Address = #{ipv4}\nAddress = #{ipv6}/64"
    +  else
    +    "Address = #{ipv4}"
    +  end
    +end
     

    -And to run it:
    +**2. AllowedIPs generation (peers method)**
    +
    +For mesh peers, both IPv4 and IPv6 addresses are included in AllowedIPs:

    -
    paul@f0:~/git/sillybench % go version
    -go version go1.24.1 freebsd/amd64
    -
    -paul@f0:~/git/sillybench % go test -bench=.
    -goos: freebsd
    -goarch: amd64
    -pkg: codeberg.org/snonux/sillybench
    -cpu: Intel(R) N100
    -BenchmarkCPUSilly1-4    1000000000               0.4022 ns/op
    -BenchmarkCPUSilly2-4    1000000000               0.4027 ns/op
    -PASS
    -ok      codeberg.org/snonux/sillybench 0.891s
    +
    if is_roaming
    +  allowed_ips = '0.0.0.0/0, ::/0'
    +else
    +  # For mesh peers, allow both IPv4 and IPv6 if present
    +  ipv4 = data['wg0']['ip']
    +  ipv6 = data['wg0']['ipv6']
    +  allowed_ips = ipv6 ? "#{ipv4}/32, #{ipv6}/128" : "#{ipv4}/32"
    +end
     

    -

    Silly Rocky Linux VM @ Bhyve benchmark


    +Roaming clients keep AllowedIPs = 0.0.0.0/0, ::/0 to route all traffic (IPv4 and IPv6) through the VPN.

    -OK, let's compare this with the Rocky Linux VM running on Bhyve:
    +

    IPv6 NAT on OpenBSD gateways



    - -
    [root@r0 ~]# dnf install golang git
    -[root@r0 ~]# mkdir ~/git && cd ~/git && \
    -  git clone https://codeberg.org/snonux/sillybench && \
    -  cd sillybench
    +To allow roaming clients to access the internet via IPv6, we added NAT66 rules to the OpenBSD gateways' pf.conf:
    +
    +
    +# NAT for WireGuard clients to access internet (IPv4)
    +match out on vio0 from 192.168.2.0/24 to any nat-to (vio0)
    +
    +# NAT66 for WireGuard clients to access internet (IPv6)
    +# Uses NPTv6 (Network Prefix Translation) to translate ULA to public IPv6
    +match out on vio0 inet6 from fd42:beef:cafe:2::/64 to any nat-to (vio0)
    +
    +# Allow all UDP traffic on WireGuard port (IPv4 and IPv6)
    +pass in inet proto udp from any to any port 56709
    +pass in inet6 proto udp from any to any port 56709
     

    -And to run it:
    +OpenBSD's PF firewall supports IPv6 NAT with the same syntax as IPv4, using NPTv6 (RFC 6296) to translate the ULA addresses to the gateway's public IPv6 address.
    +
    +

    Manual OpenBSD interface configuration


    +
    +Since OpenBSD doesn't use the Address directive in WireGuard configs, IPv6 must be manually configured on the wg0 interfaces. On blowfish:

    -
    [root@r0 sillybench]# go version
    -go version go1.22.9 (Red Hat 1.22.9-2.el9_5) linux/amd64
    -[root@r0 sillybench]# go test -bench=.
    -goos: linux
    -goarch: amd64
    -pkg: codeberg.org/snonux/sillybench
    -cpu: Intel(R) N100
    -BenchmarkCPUSilly1-4    1000000000               0.4347 ns/op
    -BenchmarkCPUSilly2-4    1000000000               0.4345 ns/op
    +
    rex@blowfish:~ $ doas vi /etc/hostname.wg0
     

    -The Linux benchmark is slightly slower than the FreeBSD one. The Go version is also a bit older. I tried the same with the up-to-date version of Go (1.24.x) with similar results. There could be a slight Bhyve overhead, or FreeBSD is just slightly more efficient in this benchmark. Overall, this shows that Bhyve performs excellently.
    +Add the IPv6 address (note the order - IPv6 must be configured before up):

    -

    Silly FreeBSD VM @ Bhyve benchmark


    +
    +inet 192.168.2.110 255.255.255.0 NONE
    +inet6 fd42:beef:cafe:2::110 64
    +up
    +!/usr/local/bin/wg setconf wg0 /etc/wireguard/wg0.conf
    +

    -But as I am curious and don't want to compare apples with bananas, I decided to install a FreeBSD Bhyve VM to run the same silly benchmark in it. I am not going through the details of how to install a FreeBSD Bhyve VM here; you can easily look it up in the documentation.
    +Important: The IPv6 address must be specified before the up directive. This ensures the interface has both addresses configured before WireGuard peers are loaded.

    -But here are the results running the same silly benchmark in a FreeBSD Bhyve VM with the same FreeBSD and Go versions as the host system (I have the VM 4 vCPUs and 14GB of RAM; the benchmark won't use as many CPUs (and memory) anyway):
    +Apply the configuration:

    -
    root@freebsd:~/git/sillybench # go test -bench=.
    -goos: freebsd
    -goarch: amd64
    -pkg: codeberg.org/snonux/sillybench
    -cpu: Intel(R) N100
    -BenchmarkCPUSilly1      1000000000               0.4273 ns/op
    -BenchmarkCPUSilly2      1000000000               0.4286 ns/op
    -PASS
    -ok      codeberg.org/snonux/sillybench  0.949s
    +
    rex@blowfish:~ $ doas sh /etc/netstart wg0
    +rex@blowfish:~ $ ifconfig wg0 | grep inet6
    +inet6 fd42:beef:cafe:2::110 prefixlen 64
     

    -It's a bit better than Linux! I am sure that this is not really a scientific benchmark, so take the results with a grain of salt!
    -
    -

    Benchmarking with ubench


    +Repeat for fishfinger with address fd42:beef:cafe:2::111.

    -Let's run another, more sophisticated benchmark using ubench, the Unix Benchmark Utility available for FreeBSD. It was installed by simply running doas pkg install ubench. It can benchmark CPU and memory performance. Here, we limit it to one CPU for the first run with -s, and then let it run at full speed (using all available CPUs in parallel) in the second run.
    +After reboot, the interface will automatically come up with both IPv4 and IPv6 addresses. WireGuard peers may take 30-60 seconds to establish handshakes after boot.

    -

    FreeBSD host ubench benchmark


    +

    Verifying dual-stack connectivity



    -Single CPU:
    +After regenerating and deploying the configurations, both IPv4 and IPv6 work across the mesh:

    -
    paul@f0:~ % doas ubench -s 1
    -Unix Benchmark Utility v.0.3
    -Copyright (C) July, 1999 PhysTech, Inc.
    -Author: Sergei Viznyuk <sv@phystech.com>
    -http://www.phystech.com/download/ubench.html
    -FreeBSD 14.2-RELEASE-p1 FreeBSD 14.2-RELEASE-p1 GENERIC amd64
    -Ubench Single CPU:   671010 (0.40s)
    -Ubench Single MEM:  1705237 (0.48s)
    ------------------------------------
    -Ubench Single AVG:  1188123
    +
    # From r0 (Rocky Linux VM)
    +root@r0:~ # ping -c 2 192.168.2.130  # IPv4 to f0
    +64 bytes from 192.168.2.130: icmp_seq=1 ttl=64 time=2.12 ms
    +64 bytes from 192.168.2.130: icmp_seq=2 ttl=64 time=0.681 ms
     
    +root@r0:~ # ping6 -c 2 fd42:beef:cafe:2::130  # IPv6 to f0
    +64 bytes from fd42:beef:cafe:2::130: icmp_seq=1 ttl=64 time=2.16 ms
    +64 bytes from fd42:beef:cafe:2::130: icmp_seq=2 ttl=64 time=0.909 ms
     

    -All CPUs (with all Bhyve VMs stopped):
    +The dual-stack configuration is backward compatible—hosts without the ipv6 field in the YAML configuration will continue to generate IPv4-only configs.
    +
    +

    Benefits of dual-stack


    +
    +Adding IPv6 to the mesh network provides:
    +
    +
      +
    • Future-proofing: Ready for IPv6-only services and networks
    • +
    • Compatibility: Dual-stack maintains full IPv4 compatibility
    • +
    • Learning: Hands-on experience with IPv6 networking
    • +
    • Flexibility: Roaming clients can access both IPv4 and IPv6 internet resources
    • +

    +

    Happy WireGuard-ing


    +
    +All is set up now. E.g. on f0:

    -
    paul@f0:~ % doas ubench
    -Unix Benchmark Utility v.0.3
    -Copyright (C) July, 1999 PhysTech, Inc.
    -Author: Sergei Viznyuk <sv@phystech.com>
    -http://www.phystech.com/download/ubench.html
    -FreeBSD 14.2-RELEASE-p1 FreeBSD 14.2-RELEASE-p1 GENERIC amd64
    -Ubench CPU:  2660220
    -Ubench MEM:  3095182
    ---------------------
    -Ubench AVG:  2877701
    +
    paul@f0:~ % doas wg show
    +interface: wg0
    +  public key: Jm6YItMt94++dIeOyVi1I9AhNt2qQcryxCZezoX7X2Y=
    +  private key: (hidden)
    +  listening port: 56709
    +
    +peer: 8PvGZH1NohHpZPVJyjhctBX9xblsNvYBhpg68FsFcns=
    +  preshared key: (hidden)
    +  endpoint: 46.23.94.99:56709
    +  allowed ips: 192.168.2.111/32, fd42:beef:cafe:2::111/128
    +  latest handshake: 1 minute, 46 seconds ago
    +  transfer: 124 B received, 1.75 KiB sent
    +  persistent keepalive: every 25 seconds
    +
    +peer: Xow+d3qVXgUMk4pcRSQ6Fe+vhYBa3VDyHX/4jrGoKns=
    +  preshared key: (hidden)
    +  endpoint: 23.88.35.144:56709
    +  allowed ips: 192.168.2.110/32, fd42:beef:cafe:2::110/128
    +  latest handshake: 1 minute, 52 seconds ago
    +  transfer: 124 B received, 1.60 KiB sent
    +  persistent keepalive: every 25 seconds
    +
    +peer: s3e93XoY7dPUQgLiVO4d8x/SRCFgEew+/wP7+zwgehI=
    +  preshared key: (hidden)
    +  endpoint: 192.168.1.120:56709
    +  allowed ips: 192.168.2.120/32, fd42:beef:cafe:2::120/128
    +
    +peer: 2htXdNcxzpI2FdPDJy4T4VGtm1wpMEQu1AkQHjNY6F8=
    +  preshared key: (hidden)
    +  endpoint: 192.168.1.131:56709
    +  allowed ips: 192.168.2.131/32, fd42:beef:cafe:2::131/128
    +
    +peer: 0Y/H20W8YIbF7DA1sMwMacLI8WS9yG+1/QO7m2oyllg=
    +  preshared key: (hidden)
    +  endpoint: 192.168.1.122:56709
    +  allowed ips: 192.168.2.122/32, fd42:beef:cafe:2::122/128
    +
    +peer: Hhy9kMPOOjChXV2RA5WeCGs+J0FE3rcNPDw/TLSn7i8=
    +  preshared key: (hidden)
    +  endpoint: 192.168.1.121:56709
    +  allowed ips: 192.168.2.121/32, fd42:beef:cafe:2::121/128
    +
    +peer: SlGVsACE1wiaRoGvCR3f7AuHfRS+1jjhS+YwEJ2HvF0=
    +  preshared key: (hidden)
    +  endpoint: 192.168.1.132:56709
    +  allowed ips: 192.168.2.132/32, fd42:beef:cafe:2::132/128
     

    -

    FreeBSD VM @ Bhyve ubench benchmark


    -
    -Single CPU:
    +All the hosts are pingable as well, e.g.:

    -
    root@freebsd:~ # ubench -s 1
    -Unix Benchmark Utility v.0.3
    -Copyright (C) July, 1999 PhysTech, Inc.
    -Author: Sergei Viznyuk <sv@phystech.com>
    -http://www.phystech.com/download/ubench.html
    -FreeBSD 14.2-RELEASE-p1 FreeBSD 14.2-RELEASE-p1 GENERIC amd64
    -Ubench Single CPU:   672792 (0.40s)
    -Ubench Single MEM:   852757 (0.48s)
    ------------------------------------
    -Ubench Single AVG:   762774
    +
    paul@f0:~ % foreach peer ( f1 f2 r0 r1 r2 blowfish fishfinger )
    +foreach? ping -c2 $peer.wg0
    +foreach? echo
    +foreach? end
    +PING f1.wg0 (192.168.2.131): 56 data bytes
    +64 bytes from 192.168.2.131: icmp_seq=0 ttl=64 time=0.334 ms
    +64 bytes from 192.168.2.131: icmp_seq=1 ttl=64 time=0.260 ms
    +
    +--- f1.wg0 ping statistics ---
    +2 packets transmitted, 2 packets received, 0.0% packet loss
    +round-trip min/avg/max/stddev = 0.260/0.297/0.334/0.037 ms
    +
    +PING f2.wg0 (192.168.2.132): 56 data bytes
    +64 bytes from 192.168.2.132: icmp_seq=0 ttl=64 time=0.323 ms
    +64 bytes from 192.168.2.132: icmp_seq=1 ttl=64 time=0.303 ms
    +
    +--- f2.wg0 ping statistics ---
    +2 packets transmitted, 2 packets received, 0.0% packet loss
    +round-trip min/avg/max/stddev = 0.303/0.313/0.323/0.010 ms
    +
    +PING r0.wg0 (192.168.2.120): 56 data bytes
    +64 bytes from 192.168.2.120: icmp_seq=0 ttl=64 time=0.716 ms
    +64 bytes from 192.168.2.120: icmp_seq=1 ttl=64 time=0.406 ms
    +
    +--- r0.wg0 ping statistics ---
    +2 packets transmitted, 2 packets received, 0.0% packet loss
    +round-trip min/avg/max/stddev = 0.406/0.561/0.716/0.155 ms
    +
    +PING r1.wg0 (192.168.2.121): 56 data bytes
    +64 bytes from 192.168.2.121: icmp_seq=0 ttl=64 time=0.639 ms
    +64 bytes from 192.168.2.121: icmp_seq=1 ttl=64 time=0.629 ms
    +
    +--- r1.wg0 ping statistics ---
    +2 packets transmitted, 2 packets received, 0.0% packet loss
    +round-trip min/avg/max/stddev = 0.629/0.634/0.639/0.005 ms
    +
    +PING r2.wg0 (192.168.2.122): 56 data bytes
    +64 bytes from 192.168.2.122: icmp_seq=0 ttl=64 time=0.569 ms
    +64 bytes from 192.168.2.122: icmp_seq=1 ttl=64 time=0.479 ms
    +
    +--- r2.wg0 ping statistics ---
    +2 packets transmitted, 2 packets received, 0.0% packet loss
    +round-trip min/avg/max/stddev = 0.479/0.524/0.569/0.045 ms
    +
    +PING blowfish.wg0 (192.168.2.110): 56 data bytes
    +64 bytes from 192.168.2.110: icmp_seq=0 ttl=255 time=35.745 ms
    +64 bytes from 192.168.2.110: icmp_seq=1 ttl=255 time=35.481 ms
    +
    +--- blowfish.wg0 ping statistics ---
    +2 packets transmitted, 2 packets received, 0.0% packet loss
    +round-trip min/avg/max/stddev = 35.481/35.613/35.745/0.132 ms
    +
    +PING fishfinger.wg0 (192.168.2.111): 56 data bytes
    +64 bytes from 192.168.2.111: icmp_seq=0 ttl=255 time=33.992 ms
    +64 bytes from 192.168.2.111: icmp_seq=1 ttl=255 time=33.751 ms
    +
    +--- fishfinger.wg0 ping statistics ---
    +2 packets transmitted, 2 packets received, 0.0% packet loss
    +round-trip min/avg/max/stddev = 33.751/33.872/33.992/0.120 ms
     

    -Wow, the CPU in the VM was a tiny bit faster than on the host! So this was probably just a glitch in the matrix. Memory seems slower, though.
    +Note that the loop above is a tcsh loop, the default shell used in FreeBSD. Of course, all other peers can ping their peers as well!

    -All CPUs:
    +After the first ping, VPN tunnels now also show handshakes and the amount of data transferred through them:

    -
    root@freebsd:~ # ubench
    -Unix Benchmark Utility v.0.3
    -Copyright (C) July, 1999 PhysTech, Inc.
    -Author: Sergei Viznyuk <sv@phystech.com>
    -http://www.phystech.com/download/ubench.html
    -FreeBSD 14.2-RELEASE-p1 FreeBSD 14.2-RELEASE-p1 GENERIC amd64
    -Ubench CPU:  2652857
    -swap_pager: out of swap space
    -swp_pager_getswapspace(27): failed
    -swap_pager: out of swap space
    -swp_pager_getswapspace(18): failed
    -Apr  4 23:02:43 freebsd kernel: pid 862 (ubench), jid 0, uid 0, was killed: failed to reclaim memory
    -swp_pager_getswapspace(6): failed
    -Apr  4 23:02:46 freebsd kernel: pid 863 (ubench), jid 0, uid 0, was killed: failed to reclaim memory
    -Apr  4 23:02:47 freebsd kernel: pid 864 (ubench), jid 0, uid 0, was killed: failed to reclaim memory
    -Apr  4 23:02:48 freebsd kernel: pid 865 (ubench), jid 0, uid 0, was killed: failed to reclaim memory
    -Apr  4 23:02:49 freebsd kernel: pid 861 (ubench), jid 0, uid 0, was killed: failed to reclaim memory
    -Apr  4 23:02:51 freebsd kernel: pid 839 (ubench), jid 0, uid 0, was killed: failed to reclaim memory
    +
    paul@f0:~ % doas wg show
    +interface: wg0
    +  public key: Jm6YItMt94++dIeOyVi1I9AhNt2qQcryxCZezoX7X2Y=
    +  private key: (hidden)
    +  listening port: 56709
    +
    +peer: 0Y/H20W8YIbF7DA1sMwMacLI8WS9yG+1/QO7m2oyllg=
    +  preshared key: (hidden)
    +  endpoint: 192.168.1.122:56709
    +  allowed ips: 192.168.2.122/32, fd42:beef:cafe:2::122/128
    +  latest handshake: 10 seconds ago
    +  transfer: 440 B received, 532 B sent
    +
    +peer: Hhy9kMPOOjChXV2RA5WeCGs+J0FE3rcNPDw/TLSn7i8=
    +  preshared key: (hidden)
    +  endpoint: 192.168.1.121:56709
    +  allowed ips: 192.168.2.121/32, fd42:beef:cafe:2::121/128
    +  latest handshake: 12 seconds ago
    +  transfer: 440 B received, 564 B sent
    +
    +peer: s3e93XoY7dPUQgLiVO4d8x/SRCFgEew+/wP7+zwgehI=
    +  preshared key: (hidden)
    +  endpoint: 192.168.1.120:56709
    +  allowed ips: 192.168.2.120/32, fd42:beef:cafe:2::120/128
    +  latest handshake: 14 seconds ago
    +  transfer: 440 B received, 564 B sent
    +
    +peer: SlGVsACE1wiaRoGvCR3f7AuHfRS+1jjhS+YwEJ2HvF0=
    +  preshared key: (hidden)
    +  endpoint: 192.168.1.132:56709
    +  allowed ips: 192.168.2.132/32, fd42:beef:cafe:2::132/128
    +  latest handshake: 17 seconds ago
    +  transfer: 472 B received, 564 B sent
    +
    +peer: Xow+d3qVXgUMk4pcRSQ6Fe+vhYBa3VDyHX/4jrGoKns=
    +  preshared key: (hidden)
    +  endpoint: 23.88.35.144:56709
    +  allowed ips: 192.168.2.110/32, fd42:beef:cafe:2::110/128
    +  latest handshake: 55 seconds ago
    +  transfer: 472 B received, 596 B sent
    +  persistent keepalive: every 25 seconds
    +
    +peer: 8PvGZH1NohHpZPVJyjhctBX9xblsNvYBhpg68FsFcns=
    +  preshared key: (hidden)
    +  endpoint: 46.23.94.99:56709
    +  allowed ips: 192.168.2.111/32, fd42:beef:cafe:2::111/128
    +  latest handshake: 55 seconds ago
    +  transfer: 472 B received, 596 B sent
    +  persistent keepalive: every 25 seconds
    +
    +peer: 2htXdNcxzpI2FdPDJy4T4VGtm1wpMEQu1AkQHjNY6F8=
    +  preshared key: (hidden)
    +  endpoint: 192.168.1.131:56709
    +  allowed ips: 192.168.2.131/32, fd42:beef:cafe:2::131/128
     

    -The multi-CPU benchmark in the Bhyve VM ran with almost identical results to the FreeBSD host system. However, the memory benchmark failed with out-of-swap space errors. I am unsure why, as the VM has 14GB RAM, but I am not investigating further.
    +

    Managing Roaming Client Tunnels



    -Also, during the benchmark, I noticed the bhyve process on the host was constantly using 399% of the CPU (all 4 CPUs).
    +Since roaming clients like earth and pixel7pro connect on-demand rather than being always-on like the infrastructure hosts, it's useful to know how to configure and manage the WireGuard tunnels.

    -
    -  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
    - 7449 root         14  20    0    14G    78M kqread   2   2:12 399.81% bhyve
    -
    +

    Manual gateway failover configuration



    -Overall, Bhyve has a small overhead, but the CPU performance difference is negligible. The FreeBSD host is slightly faster than the FreeBSD VM running on Bhyve, but the difference is small enough for our use cases. The memory benchmark seems slightly off, but I'm not sure whether to trust it, especially due to the swap errors. Does ubench's memory benchmark use swap space for the memory test? That wouldn't make sense and might explain the difference to some degree, though. Do you have any ideas?
    +The default configuration for roaming clients includes both gateways (blowfish and fishfinger) with AllowedIPs = 0.0.0.0/0, ::/0. However, WireGuard doesn't automatically failover between multiple peers with identical AllowedIPs routes. When both gateways are configured this way, WireGuard uses the first peer with a recent handshake. If that gateway goes down, traffic won't automatically switch to the backup gateway.

    -

    Rocky Linux VM @ Bhyve ubench benchmark


    +To enable manual failover, separate configuration files can be created for roaming clients (earth laptop and pixel7pro phone), each containing only a single gateway peer. This provides explicit control over which gateway handles traffic.

    -Unfortunately, I wasn't able to find ubench in any of the Rocky Linux repositories. So, I skipped this test.
    +Configuration files for pixel7pro (phone):

    -

    Update: Improving Disk I/O Performance for etcd


    +Two separate configs in /home/paul/git/wireguardmeshgenerator/dist/pixel7pro/etc/wireguard/:

    -Updated: Fri 26 Dec 08:51:23 EET 2025
    +
      +
    • wg0-blowfish.conf - Routes all traffic through blowfish gateway (23.88.35.144)
    • +
    • wg0-fishfinger.conf - Routes all traffic through fishfinger gateway (46.23.94.99)
    • +

    +Generate QR codes for importing into the WireGuard Android app:

    -After running k3s for some time, I noticed frequent etcd leader elections and "apply request took too long" warnings in the logs. Investigation revealed that etcd's sync writes were extremely slow - around 250 kB/s with the default virtio-blk disk emulation. etcd requires fast sync writes (ideally under 10ms fsync latency) for stable operation.
    + +
    qrencode -t ansiutf8 < dist/pixel7pro/etc/wireguard/wg0-blowfish.conf
    +qrencode -t ansiutf8 < dist/pixel7pro/etc/wireguard/wg0-fishfinger.conf
    +

    -

    The Problem


    +Import both QR codes using the WireGuard app to create two separate tunnel profiles. You can then manually enable/disable each tunnel to select which gateway to use. Only enable one tunnel at a time.

    -The k3s logs showed etcd struggling with disk I/O:
    +Configuration files for earth (laptop):

    -
    -{"level":"warn","msg":"apply request took too long","took":"4.996516657s","expected-duration":"100ms"}
    -{"level":"warn","msg":"slow fdatasync","took":"1.328469363s","expected-duration":"1s"}
    -
    +Two separate configs in /home/paul/git/wireguardmeshgenerator/dist/earth/etc/wireguard/:

    -A simple sync write benchmark confirmed the issue:
    +
      +
    • wg0-blowfish.conf - Routes all traffic through blowfish gateway
    • +
    • wg0-fishfinger.conf - Routes all traffic through fishfinger gateway
    • +

    +Install both configurations:

    -
    [root@r0 ~]# dd if=/dev/zero of=/tmp/test bs=4k count=2000 oflag=dsync
    -8192000 bytes copied, 31.7058 s, 258 kB/s
    +
    sudo cp dist/earth/etc/wireguard/wg0-blowfish.conf /etc/wireguard/
    +sudo cp dist/earth/etc/wireguard/wg0-fishfinger.conf /etc/wireguard/
     

    -

    The Solution: Switch to NVMe Emulation


    -
    -Bhyve's NVMe emulation provides significantly better I/O performance than virtio-blk.
    +This approach provides explicit control over which gateway handles roaming client traffic, useful when one gateway needs maintenance or experiences connectivity issues.

    -

    Step 1: Prepare the Guest OS


    +

    Starting and stopping on earth (Fedora laptop)



    -Before changing the disk type, the guest needs NVMe drivers in the initramfs and LVM must be configured to scan all devices (not just those recorded during installation):
    +On the Fedora laptop, WireGuard is managed via systemd. Using the separate gateway configs:

    -
    [root@r0 ~]# cat > /etc/dracut.conf.d/nvme.conf << EOF
    -add_drivers+=" nvme nvme_core "
    -hostonly=no
    -EOF
    +
    # Start with blowfish gateway
    +earth$ sudo systemctl start wg-quick@wg0-blowfish.service
     
    -[root@r0 ~]# sed -i 's/# use_devicesfile = 1/use_devicesfile = 0/' /etc/lvm/lvm.conf
    -[root@r0 ~]# dracut -f
    -[root@r0 ~]# shutdown -h now
    +# Or start with fishfinger gateway
    +earth$ sudo systemctl start wg-quick@wg0-fishfinger.service
    +
    +# Check tunnel status (example with blowfish gateway)
    +earth$ sudo wg show
    +interface: wg0
    +  public key: Mc1CpSS3rbLN9A2w9c75XugQyXUkGPHKI2iCGbh8DRo=
    +  private key: (hidden)
    +  listening port: 56709
    +  fwmark: 0xca6c
    +
    +peer: Xow+d3qVXgUMk4pcRSQ6Fe+vhYBa3VDyHX/4jrGoKns=
    +  preshared key: (hidden)
    +  endpoint: 23.88.35.144:56709
    +  allowed ips: 0.0.0.0/0, ::/0
    +  latest handshake: 5 seconds ago
    +  transfer: 15.89 KiB received, 32.15 KiB sent
    +  persistent keepalive: every 25 seconds
     

    -The hostonly=no setting ensures the initramfs includes drivers for hardware not currently present. The use_devicesfile = 0 tells LVM to scan all block devices rather than only those recorded in /etc/lvm/devices/system.devices - this is important because the device path changes from /dev/vda to /dev/nvme0n1.
    -
    -

    Step 2: Update the Bhyve Configuration


    -
    -On the FreeBSD host, update the VM configuration to use NVMe:
    +Stopping the tunnel:

    -
    paul@f0:~ % doas vm stop rocky
    -paul@f0:~ % doas vm configure rocky
    -
    -
    -Change disk0_type from virtio-blk to nvme:
    -
    -
    -disk0_type="nvme"
    +
    earth$ sudo systemctl stop wg-quick@wg0-blowfish.service
    +# Or if using fishfinger:
    +earth$ sudo systemctl stop wg-quick@wg0-fishfinger.service
    +
    +earth$ sudo wg show
    +# No output - WireGuard interface is down
     

    -Then start the VM:
    +Switching between gateways:

    -
    paul@f0:~ % doas vm start rocky
    +
    # Switch from blowfish to fishfinger
    +earth$ sudo systemctl stop wg-quick@wg0-blowfish.service
    +earth$ sudo systemctl start wg-quick@wg0-fishfinger.service
     

    -

    Benchmark Results


    +The services remain disabled to prevent auto-start on boot, allowing manual control of when the VPN is active and which gateway to use.

    -After switching to NVMe emulation, the sync write performance improved dramatically:
    +

    Starting and stopping on pixel7pro (Android phone)


    +
    +On Android using the official WireGuard app, you now have two tunnel profiles (wg0-blowfish and wg0-fishfinger) after importing the QR codes:
    +
    +Starting a tunnel:
    +
    +
      +
    • 1. Open the WireGuard app
    • +
    • 2. Tap the toggle switch next to either wg0-blowfish or wg0-fishfinger tunnel configuration
    • +
    • 3. The switch turns blue/green and shows "Active"
    • +
    • 4. A key icon appears in the notification bar indicating VPN is active
    • +
    • 5. All traffic now routes through the selected gateway
    • +

    +Stopping the tunnel:
    +
    +
      +
    • 1. Open the WireGuard app
    • +
    • 2. Tap the toggle switch again to disable it
    • +
    • 3. The switch turns gray and shows "Inactive"
    • +
    • 4. The notification bar key icon disappears
    • +
    • 5. Normal internet routing resumes
    • +

    +Switching between gateways:
    +
    +
      +
    • 1. Disable the currently active tunnel (e.g., wg0-blowfish)
    • +
    • 2. Enable the other tunnel (e.g., wg0-fishfinger)
    • +
    • Only enable one tunnel at a time
    • +

    +Quick toggling from notification:
    +
    +
      +
    • Pull down the notification shade
    • +
    • Tap the WireGuard notification to quickly enable/disable the tunnel without opening the app
    • +

    +The WireGuard Android app supports automatically activating tunnels based on:
    +
    +
      +
    • Mobile data connection (e.g., enable VPN when on cellular)
    • +
    • WiFi SSID (e.g., disable VPN when on trusted home network)
    • +
    • Ethernet connection status
    • +

    +These settings can be configured by tapping the pencil icon next to the tunnel name, then scrolling to "Toggle on/off based on" options.
    +
    +

    Verifying connectivity


    +
    +Once the tunnel is active on either device, verify connectivity:

    -
    [root@r0 ~]# dd if=/dev/zero of=/tmp/test bs=4k count=2000 oflag=dsync
    -8192000 bytes copied, 0.330718 s, 24.8 MB/s
    +
    # From earth laptop:
    +earth$ ping -c2 blowfish.wg0
    +earth$ ping -c2 fishfinger.wg0
    +earth$ curl https://ifconfig.me  # Should show gateway's public IP
     

    -That's approximately **100x faster** than before (24.8 MB/s vs 258 kB/s).
    +Check which gateway is active: Check the transfer statistics with sudo wg show on earth to see which peer shows recent handshakes and increasing transfer bytes. On Android, the WireGuard app shows the active tunnel with data transfer statistics.

    -The etcd metrics also showed healthy fsync latencies:
    +

    Conclusion


    +
    +Having a mesh network on our hosts is great for securing all the traffic between them for our future k3s setup. A self-managed WireGuard mesh network is better than Tailscale as it eliminates reliance on a third party and provides full control over the configuration. It reduces unnecessary abstraction and "magic," enabling easier debugging and ensuring full ownership of our network.
    +
    +Read the next post of this series:
    +
    +f3s: Kubernetes with FreeBSD - Part 6: Storage
    +
    +Other *BSD-related posts:
    +
    +2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability
    +2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    +2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    +2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network (You are currently reading this)
    +2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    +2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    +2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    +2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    +2024-04-01 KISS high-availability with OpenBSD
    +2024-01-13 One reason why I love OpenBSD
    +2022-10-30 Installing DTail on OpenBSD
    +2022-07-30 Let's Encrypt with OpenBSD and Rex
    +2016-04-09 Jails and ZFS with Puppet on FreeBSD
    +
    +E-Mail your comments to paul@nospam.buetow.org
    +
    +Back to the main site
    +
    +
    +
    + + Terminal multiplexing with `tmux` - Fish edition + + gemini://foo.zone/gemfeed/2025-05-02-terminal-multiplexing-with-tmux-fish-edition.gmi + 2025-05-02T00:09:23+03:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + This is the Fish shell edition of the same post (but for Z-Shell) of mine from last year: + +
    +

    Terminal multiplexing with tmux - Fish edition


    +
    +Published at 2025-05-02T00:09:23+03:00
    +
    +This is the Fish shell edition of the same post (but for Z-Shell) of mine from last year:
    +
    +./2024-06-23-terminal-multiplexing-with-tmux.html
    +
    +Tmux (Terminal Multiplexer) is a powerful, terminal-based tool that manages multiple terminal sessions within a single window. Here are some of its primary features and functionalities:
    +
    +
      +
    • Session management
    • +
    • Window and Pane management
    • +
    • Persistent Workspace
    • +
    • Customization
    • +

    +https://github.com/tmux/tmux/wiki

    -etcd_disk_wal_fsync_duration_seconds_bucket{le="0.001"} 347
    -etcd_disk_wal_fsync_duration_seconds_bucket{le="0.002"} 396
    -etcd_disk_wal_fsync_duration_seconds_bucket{le="0.004"} 408
    +            _______                           s
    +           |.-----.|                           s
    +           || Tmux||                          s
    +           ||_.-._||       |\   \\\\__     o          s
    +           `--)-(--`       | \_/    o \    o          s
    +          __[=== o]__      > _   (( <_  oo            s
    +         |:::::::::::|\    | / \__+___/               s
    +   jgs   `-=========-`()   |/     |/                  s
    +       mod. by Paul B.
     

    -Most fsyncs now complete in under 1ms, and there are no more "slow fdatasync" warnings in the logs. The k3s cluster is now stable without spurious leader elections.
    -
    -

    Important Notes


    +

    Table of Contents




    -

    Conclusion


    +

    Before continuing...



    -Having Linux VMs running inside FreeBSD's Bhyve is a solid move for future f3s hosting in my home lab. Bhyve provides a reliable way to manage VMs without much hassle. With Linux VMs, I can tap into all the cool stuff (e.g., Kubernetes, eBPF, systemd) in the Linux world while keeping the steady reliability of FreeBSD.
    +Before continuing to read this post, I encourage you to get familiar with Tmux first (unless you already know the basics). You can go through the official getting started guide:

    -Future uses (out of scope for this blog series) would be additional VMs for different workloads. For example, how about a Windows or NetBSD VM to tinker with?
    +https://github.com/tmux/tmux/wiki/Getting-Started

    -This flexibility is great for keeping options open and managing different workloads without overcomplicating things. Overall, it's a nice setup for getting the most out of my hardware and keeping things running smoothly.
    +I can also recommend this book (this is the book I got started with with Tmux):

    -Read the next post of this series:
    +https://pragprog.com/titles/bhtmux2/tmux-2/
    +
    +Over the years, I have built a couple of shell helper functions to optimize my workflows. Tmux is extensively integrated into my daily workflows (personal and work). I had colleagues asking me about my Tmux config and helper scripts for Tmux several times. It would be neat to blog about it so that everyone interested in it can make a copy of my configuration and scripts.
    +
    +The configuration and scripts in this blog post are only the non-work-specific parts. There are more helper scripts, which I only use for work (and aren't really useful outside of work due to the way servers and clusters are structured there).
    +
    +Tmux is highly configurable, and I think I am only scratching the surface of what is possible with it. Nevertheless, it may still be useful for you. I also love that Tmux is part of the OpenBSD base system!
    +
    +

    Shell aliases


    +
    +Since last week, I am playing a bit with the Fish shell. As a result, I also converted all my tmux helper scripts (mentioned in this blog post) from Z-Shell to Fish.
    +
    +https://fishshell.com
    +
    +For the most common Tmux commands I use, I have created the following shell aliases:
    +
    +
    +alias tn 'tmux::new'
    +alias ta 'tmux::attach'
    +alias tx 'tmux::remote'
    +alias ts 'tmux::search'
    +alias tssh 'tmux::cluster_ssh'
    +alias tm tmux
    +alias tl 'tmux list-sessions'
    +alias foo 'tmux::new foo'
    +alias bar 'tmux::new bar'
    +alias baz 'tmux::new baz'
    +

    -f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    +Note all tmux::...; those are custom shell functions doing certain things, and they aren't part of the Tmux distribution. But let's run through every aliases one by one.

    -Other *BSD-related posts:
    +The first two are pretty straightforward. tm is simply a shorthand for tmux, so I have to type less, and tl lists all Tmux sessions that are currently open. No magic here.

    -2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability
    -2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    -2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    -2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    -2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs (You are currently reading this)
    -2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    -2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    -2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    -2024-04-01 KISS high-availability with OpenBSD
    -2024-01-13 One reason why I love OpenBSD
    -2022-10-30 Installing DTail on OpenBSD
    -2022-07-30 Let's Encrypt with OpenBSD and Rex
    -2016-04-09 Jails and ZFS with Puppet on FreeBSD
    +

    The tn alias - Creating a new session



    -E-Mail your comments to paul@nospam.buetow.org
    +The tn alias is referencing this function:

    -Back to the main site
    -
    -
    -
    - - Sharing on Social Media with Gos v1.0.0 - - gemini://foo.zone/gemfeed/2025-03-05-sharing-on-social-media-with-gos.gmi - 2025-03-04T21:22:07+02:00 - - Paul Buetow aka snonux - paul@dev.buetow.org - - As you may have noticed, I like to share on Mastodon and LinkedIn all the technical things I find interesting, and this blog post is technically all about that. - -
    -

    Sharing on Social Media with Gos v1.0.0


    +
    +# Create new session and if alread exists attach to it
    +function tmux::new
    +    set -l session $argv[1]
    +    _tmux::cleanup_default
    +    if test -z "$session"
    +        tmux::new (string join "" T (date +%s))
    +    else
    +        tmux new-session -d -s $session
    +        tmux -2 attach-session -t $session || tmux -2 switch-client -t $session
    +    end
    +end
    +

    -Published at 2025-03-04T21:22:07+02:00
    +There is a lot going on here. Let's have a detailed look at what it is doing.

    -As you may have noticed, I like to share on Mastodon and LinkedIn all the technical things I find interesting, and this blog post is technically all about that.
    +First, a Tmux session name can be passed to the function as a first argument. That session name is only optional. Without it, Tmux will select a session named (string join "" T (date +%s)) as a default. Which is T followed by the UNIX epoch, e.g. T1717133796.

    -Gos logo
    +

    Cleaning up default sessions automatically



    -

    Table of Contents


    +Note also the call to _tmux::cleanup_default; it would clean up all already opened default sessions if they aren't attached. Those sessions were only temporary, and I had too many flying around after a while. So, I decided to auto-delete the sessions if they weren't attached. If I want to keep sessions around, I will rename them with the Tmux command prefix-key $. This is the cleanup function:

    -
    -

    Introduction


    +
    +function _tmux::cleanup_default
    +    tmux list-sessions | string match -r '^T.*: ' | string match -v -r attached | string split ':' | while read -l s
    +        echo "Killing $s"
    +        tmux kill-session -t "$s"
    +    end
    +end
    +

    -Gos is a Go-based replacement (which I wrote) for Buffer.com, providing the ability to schedule and manage social media posts from the command line. It can be run, for example, every time you open a new shell or only once every N hours when you open a new shell.
    +The cleanup function kills all open Tmux sessions that haven't been renamed properly yet—but only if they aren't attached (e.g., don't run in the foreground in any terminal). Cleaning them up automatically keeps my Tmux sessions as neat and tidy as possible.

    -I used Buffer.com to schedule and post my social media messages for a long time. However, over time, there were more problems with that service, including a slow and unintuitive UI, and the free version only allows scheduling up to 10 messages. At one point, they started to integrate an AI assistant (which would seemingly randomly pop up in separate JavaScript-powered input boxes), and then I had enough and decided I had to build my own social sharing tool—and Gos was born.
    +

    Renaming sessions



    -https://buffer.com
    -https://codeberg.org/snonux/gos
    +Whenever I am in a temporary session (named T....), I may decide that I want to keep this session around. I have to rename the session to prevent the cleanup function from doing its thing. That's, as mentioned already, easily accomplished with the standard prefix-key $ Tmux command.

    -

    Gos features


    +

    The ta alias - Attaching to a session



    -
      -
    • Mastodon and LinkedIn support.
    • -
    • Dry run mode for testing posts without actually publishing.
    • -
    • Configurable via flags and environment variables.
    • -
    • Easy to integrate into automated workflows.
    • -
    • OAuth2 authentication for LinkedIn.
    • -
    • Image previews for LinkedIn posts.
    • -

    -

    Installation


    +This alias refers to the following function, which tries to attach to an already-running Tmux session.

    -

    Prequisites


    +
    +function tmux::attach
    +    set -l session $argv[1]
    +    if test -z "$session"
    +        tmux attach-session || tmux::new
    +    else
    +        tmux attach-session -t $session || tmux::new $session
    +    end
    +end
    +

    -The prerequisites are:
    +If no session is specified (as the argument of the function), it will try to attach to the first open session. If no Tmux server is running, it will create a new one with tmux::new. Otherwise, with a session name given as the argument, it will attach to it. If unsuccessful (e.g., the session doesn't exist), it will be created and attached to.

    -
      -
    • Go (version 1.24 or later)
    • -
    • Supported browsers like Firefox, Chrome, etc for oauth2.
    • -

    -

    Build and install


    +

    The tr alias - For a nested remote session



    -Clone the repository:
    +This SSHs into the remote server specified and then, remotely on the server itself, starts a nested Tmux session. So we have one Tmux session on the local computer and, inside of it, an SSH connection to a remote server with a Tmux session running again. The benefit of this is that, in case my network connection breaks down, the next time I connect, I can continue my work on the remote server exactly where I left off. The session name is the name of the server being SSHed into. If a session like this already exists, it simply attaches to it.

    - -
    git clone https://codeberg.org/snonux/gos.git
    -cd gos
    +
    +function tmux::remote
    +    set -l server $argv[1]
    +    tmux new -s $server "ssh -A -t $server 'tmux attach-session || tmux'" || tmux attach-session -d -t $server
    +end
     

    -Build the binaries:
    -
    - -
    go build -o gos ./cmd/gos
    -go build -o gosc ./cmd/gosc
    -sudo mv gos ~/go/bin
    -sudo mv gosc ~/go/bin
    -
    +

    Change of the Tmux prefix for better nesting



    -Or, if you want to use the Taskfile:
    +To make nested Tmux sessions work smoothly, one must change the Tmux prefix key locally or remotely. By default, the Tmux prefix key is Ctrl-b, so Ctrl-b $, for example, renames the current session. To change the prefix key from the standard Ctrl-b to, for example, Ctrl-g, you must add this to the tmux.conf:

    - -
    go-task install
    +
    +set-option -g prefix C-g
     

    -

    Configuration


    +This way, when I want to rename the remote Tmux session, I have to use Ctrl-g $, and when I want to rename the local Tmux session, I still have to use Ctrl-b $. In my case, I have this deployed to all remote servers through a configuration management system (out of scope for this blog post).

    -Gos requires a configuration file to store API secrets and OAuth2 credentials for each supported social media platform. The configuration is managed using a Secrets structure, which is stored as a JSON file in ~/.config/gos/gos.json.
    +There might also be another way around this (without reconfiguring the prefix key), but that is cumbersome to use, as far as I remember.

    -Example Configuration File (~/.config/gos/gos.json):
    +

    The ts alias - Searching sessions with fuzzy finder


    +
    +Despite the fact that with _tmux::cleanup_default, I don't leave a huge mess with trillions of Tmux sessions flying around all the time, at times, it can become challenging to find exactly the session I am currently interested in. After a busy workday, I often end up with around twenty sessions on my laptop. This is where fuzzy searching for session names comes in handy, as I often don't remember the exact session names.

    -
    {
    -  "MastodonURL": "https://mastodon.example.com",
    -  "MastodonAccessToken": "your-mastodon-access-token",
    -  "LinkedInClientID": "your-linkedin-client-id",
    -  "LinkedInSecret": "your-linkedin-client-secret",
    -  "LinkedInRedirectURL": "http://localhost:8080/callback",
    -}
    +
    function tmux::search
    +    set -l session (tmux list-sessions | fzf | cut -d: -f1)
    +    if test -z "$TMUX"
    +        tmux attach-session -t $session
    +    else
    +        tmux switch -t $session
    +    end
    +end
     

    -

    Configuration fields


    +All it does is list all currently open sessions in fzf, where one of them can be searched and selected through fuzzy find, and then either switch (if already inside a session) to the other session or attach to the other session (if not yet in Tmux).

    -
      -
    • MastodonURL: The base URL of the Mastodon instance you are using (e.g., https://mastodon.social).
    • -
    • MastodonAccessToken: Your access token for the Mastodon API, which is used to authenticate your posts.
    • -
    • LinkedInClientID: The client ID for your LinkedIn app, which is needed for OAuth2 authentication.
    • -
    • LinkedInSecret: The client secret for your LinkedIn app.
    • -
    • LinkedInRedirectURL: The redirect URL configured for handling OAuth2 responses.
    • -
    • LinkedInAccessToken: Gos will automatically update this after successful OAuth2 authentication with LinkedIn.
    • -
    • LinkedInPersonID: Gos will automatically update this after successful OAuth2 authentication with LinkedIn.
    • -

    -

    Automatically managed fields


    +You must install the fzf command on your computer for this to work. This is how it looks like:

    -Once you finish the OAuth2 setup (after the initial run of gos), some fields—like LinkedInAccessToken and LinkedInPersonID will get filled in automatically. To check if everything's working without actually posting anything, you can run the app in dry run mode with the --dry option. After OAuth2 is successful, the file will be updated with LinkedInClientID and LinkedInAccessToken. If the access token expires, it will go through the OAuth2 process again.
    +Tmux session fuzzy finder

    -

    Invoking Gos


    +

    The tssh alias - Cluster SSH replacement



    -Gos is a command-line tool for posting updates to multiple social media platforms. You can run it with various flags to customize its behaviour, such as posting in dry run mode, limiting posts by size, or targeting specific platforms.
    +Before I used Tmux, I was a heavy user of ClusterSSH, which allowed me to log in to multiple servers at once in a single terminal window and type and run commands on all of them in parallel.

    -Flags control the tool's behavior. Below are several common ways to invoke Gos and descriptions of the available flags.
    +https://github.com/duncs/clusterssh

    -

    Common flags


    +However, since I started using Tmux, I retired ClusterSSH, as it came with the benefit that Tmux only needs to be run in the terminal, whereas ClusterSSH spawned terminal windows, which aren't easily portable (e.g., from a Linux desktop to macOS). The tmux::cluster_ssh function can have N arguments, where:

      -
    • -dry: Run the application in dry run mode, simulating operations without making any changes.
    • -
    • -version: Display the current version of the application.
    • -
    • -compose: Compose a new entry. Default is set by composeEntryDefault.
    • -
    • -gosDir: Specify the directory for Gos' queue and database files. The default is ~/.gosdir.
    • -
    • —cacheDir: Specify the directory for Gos' cache. The default is based on the gosDir path.
    • -
    • -browser: Choose the browser for OAuth2 processes. The default is "firefox".
    • -
    • -configPath: Path to the configuration file. Default is ~/.config/gos/gos.json.
    • -
    • —platforms: The enabled platforms and their post size limits. The default is "Mastodon:500,LinkedIn:1000."
    • -
    • -target: Target number of posts per week. The default is 2.
    • -
    • -minQueued: Minimum number of queued items before a warning message is printed. The default is 4.
    • -
    • -maxDaysQueued: Maximum number of days' worth of queued posts before the target increases and pauseDays decreases. The default is 365.
    • -
    • -pauseDays: Number of days until the next post can be submitted. The default is 3.
    • -
    • -runInterval: Number of hours until the next post run. The default is 12.
    • -
    • —lookback: The number of days to look back in time to review posting history. The default is 30.
    • -
    • -geminiSummaryFor: Generate a Gemini Gemtext format summary specifying months as a comma-separated string.
    • -
    • -geminiCapsules: Comma-separated list of Gemini capsules. Used to detect Gemtext links.
    • -
    • -gemtexterEnable: Add special tags for Gemtexter, the static site generator, to the Gemini Gemtext summary.
    • -
    • -dev: For internal development purposes only.
    • +
    • ...the first argument will be the session name (see tmux::tssh_from_argument helper function), and all remaining arguments will be server hostnames/FQDNs to connect to simultaneously.
    • +
    • ...or, the first argument is a file name, and the file contains a list of hostnames/FQDNs (see tmux::ssh_from_file helper function)

    -

    Examples


    -
    -*Dry run mode*
    +This is the function definition behind the tssh alias:
    +
    +
    +function tmux::cluster_ssh
    +    if test -f "$argv[1]"
    +        tmux::tssh_from_file $argv[1]
    +        return
    +    end
    +    tmux::tssh_from_argument $argv
    +end
    +

    -Dry run mode lets you simulate the entire posting process without actually sending the posts. This is useful for testing configurations or seeing what would happen before making real posts.
    +This function is just a wrapper around the more complex tmux::tssh_from_file and tmux::tssh_from_argument functions, as you have learned already. Most of the magic happens there.

    - -
    ./gos --dry
    +

    The tmux::tssh_from_argument helper


    +
    +This is the most magic helper function we will cover in this post. It looks like this:
    +
    +
    +function tmux::tssh_from_argument
    +    set -l session $argv[1]
    +    set first_server_or_container $argv[2]
    +    set remaining_servers $argv[3..-1]
    +    if test -z "$first_server_or_container"
    +        set first_server_or_container $session
    +    end
    +
    +    tmux new-session -d -s $session (_tmux::connect_command "$first_server_or_container")
    +    if not tmux list-session | grep "^$session:"
    +        echo "Could not create session $session"
    +        return 2
    +    end
    +    for server_or_container in $remaining_servers
    +        tmux split-window -t $session "tmux select-layout tiled; $(_tmux::connect_command "$server_or_container")"
    +    end
    +    tmux setw -t $session synchronize-panes on
    +    tmux -2 attach-session -t $session || tmux -2 switch-client -t $session
    +end
     

    -*Normal run*
    +It expects at least two arguments. The first argument is the session name to create for the clustered SSH session. All other arguments are server hostnames or FQDNs to which to connect. The first one is used to make the initial session. All remaining ones are added to that session with tmux split-window -t $session.... At the end, we enable synchronized panes by default, so whenever you type, the commands will be sent to every SSH connection, thus allowing the neat ClusterSSH feature to run commands on multiple servers simultaneously. Once done, we attach (or switch, if already in Tmux) to it.

    -Sharing to all platforms is as simple as the following (assuming it is configured correctly):
    +Sometimes, I don't want the synchronized panes behavior and want to switch it off temporarily. I can do that with prefix-key p and prefix-key P after adding the following to my local tmux.conf:

    - -
    ./gos 
    +
    +bind-key p setw synchronize-panes off
    +bind-key P setw synchronize-panes on
     

    -:-)
    +

    The tmux::tssh_from_file helper



    -Gos Screenshot
    +This one sets the session name to the file name and then reads a list of servers from that file, passing the list of servers to tmux::tssh_from_argument as the arguments. So, this is a neat little wrapper that also enables me to open clustered SSH sessions from an input file.

    -However, you will notice that no messages are queued to be posted yet (not like on the screenshot yet!). Relax and read on...
    +
    +function tmux::tssh_from_file
    +    set -l serverlist $argv[1]
    +    set -l session (basename $serverlist | cut -d. -f1)
    +    tmux::tssh_from_argument $session (awk '{ print $1 }' $serverlist | sed 's/.lan./.lan/g')
    +end
    +

    -

    Composing messages to be posted


    +

    tssh examples



    -To post messages using Gos, you need to create text files containing the posts' content. These files are placed inside the directory specified by the --gosDir flag (the default directory is ~/.gosdir). Each text file represents a single post and must have the .txt extension. You can also simply run gos --compose to compose a new entry. It will open simply a new text file in gosDir.
    +To open a new session named fish and log in to 4 remote hosts, run this command (Note that it is also possible to specify the remote user):

    -

    Basic structure of a message file


    +
    +$ tssh fish blowfish.buetow.org fishfinger.buetow.org \
    +    fishbone.buetow.org user@octopus.buetow.org
    +

    -Each text file should contain the message you want to post on the specified platforms. That's it. Example of a Basic Post File ~/.gosdir/samplepost.txt:
    +To open a new session named manyservers, put many servers (one FQDN per line) into a file called manyservers.txt and simply run:

    -This is a sample message to be posted on social media platforms.
    -
    -Maybe add a link here: https://foo.zone
    -
    -#foo #cool #gos #golang
    +$ tssh manyservers.txt
     

    -The message is just arbitrary text, and, besides inline share tags (see later in this document) at the beginning, Gos does not parse any of the content other than ensuring the overall allowed size for the social media platform isn't exceeded. If it exceeds the limit, Gos will prompt you to edit the post using your standard text editor (as specified by the EDITOR environment variable). When posting, all the hyperlinks, hashtags, etc., are interpreted by the social platforms themselves (e.g., Mastodon, LinkedIn).
    +

    Common Tmux commands I use in tssh



    -

    Adding share tags in the filename


    +These are default Tmux commands that I make heavy use of in a tssh session:

    -You can control which platforms a post is shared to, and manage other behaviors using tags embedded in the filename. Add tags in the format share:platform1.-platform2 to target specific platforms within the filename. This instructs Gos to share the message only to platform1 (e.g., Mastodon) and explicitly exclude platform2 (e.g., LinkedIn). You can include multiple platforms by listing them after share:, separated by a .. Use the - symbol to exclude a platform.
    +
      +
    • Press prefix-key DIRECTION to switch panes. DIRECTION is by default any of the arrow keys, but I also configured Vi keybindings.
    • +
    • Press prefix-key <space> to change the pane layout (can be pressed multiple times to cycle through them).
    • +
    • Press prefix-key z to zoom in and out of the current active pane.
    • +

    +

    Copy and paste workflow



    -Currently, only linkedin and mastodon are supported, and the shortcuts li and ma also work.
    +As you will see later in this blog post, I have configured a history limit of 1 million items in Tmux so that I can scroll back quite far. One main workflow of mine is to search for text in the Tmux history, select and copy it, and then switch to another window or session and paste it there (e.g., into my text editor to do something with it).

    -**Examples:**
    +This works by pressing prefix-key [ to enter Tmux copy mode. From there, I can browse the Tmux history of the current window using either the arrow keys or vi-like navigation (see vi configuration later in this blog post) and the Pg-Dn and Pg-Up keys.

    -
      -
    • To share only on Mastodon: ~/.gosdir/foopost.share:mastodon.txt
    • -
    • To exclude sharing on LinkedIn: ~/.gosdir/foopost.share:-linkedin.txt
    • -
    • To explicitly share on both LinkedIn and Mastodon: ~/.gosdir/foopost.share:linkedin:mastodon.txt
    • -
    • To explicitly share only on LinkedIn and exclude Mastodon: ~/.gosdir/foopost.share:linkedin:-mastodon.txt
    • -

    -Besides encoding share tags in the filename, they can also be embedded within the .txt file content to be queued. For example, a file named ~/.gosdir/foopost.txt with the following content:
    +I often search the history backwards with prefix-key [ followed by a ?, which opens the Tmux history search prompt.
    +
    +Once I have identified the terminal text to be copied, I enter visual select mode with v, highlight all the text to be copied (using arrow keys or Vi motions), and press y to yank it (sorry if this all sounds a bit complicated, but Vim/NeoVim users will know this, as it is pretty much how you do it there as well).
    +
    +For v and y to work, the following has to be added to the Tmux configuration file:

    -share:mastodon The content of the post here
    +bind-key -T copy-mode-vi 'v' send -X begin-selection
    +bind-key -T copy-mode-vi 'y' send -X copy-selection-and-cancel
     

    -or
    +Once the text is yanked, I switch to another Tmux window or session where, for example, a text editor is running and paste the yanked text from Tmux into the editor with prefix-key ]. Note that when pasting into a modal text editor like Vi or Helix, you would first need to enter insert mode before prefix-key ] would paste anything.
    +
    +

    Tmux configurations


    +
    +Some features I have configured directly in Tmux don't require an external shell alias to function correctly. Let's walk line by line through my local ~/.config/tmux/tmux.conf:

    -share:mastodon
    -
    -The content of the post is here https://some.foo/link
    +source ~/.config/tmux/tmux.local.conf
     
    -#some #hashtags
    +set-option -g allow-rename off
    +set-option -g history-limit 100000
    +set-option -g status-bg '#444444'
    +set-option -g status-fg '#ffa500'
    +set-option -s escape-time 0
     

    -Gos will parse this content, extract the tags, and queue it as ~/.gosdir/db/platforms/mastodon/foopost.share:mastodon.extracted.txt.... (see how post queueing works later in this document).
    -
    -

    Using the prio tag


    +There's yet to be much magic happening here. I source a tmux.local.conf, which I sometimes use to override the default configuration that comes from the configuration management system. But it is mostly just an empty file, so it doesn't throw any errors on Tmux startup when I don't use it.

    -Gos randomly picks any queued message without any specific order or priority. However, you can assign a higher priority to a message. The priority determines the order in which posts are processed, with messages without a priority tag being posted last and those with priority tags being posted first. If multiple messages have the priority tag, then a random message will be selected from them.
    +I work with many terminal outputs, which I also like to search within Tmux. So, I added a large enough history-limit, enabling me to search backwards in Tmux for any output up to a million lines of text.

    -*Examples using the Priority tag:*
    +Besides changing some colours (personal taste), I also set escape-time to 0, which is just a workaround. Otherwise, my Helix text editor's ESC key would take ages to trigger within Tmux. I am trying to remember the gory details. You can leave it out; if everything works fine for you, leave it out.

    -
      -
    • To share only on Mastodon: ~/.gosdir/foopost.prio.share:mastodon.txt
    • -
    • To not share on LinkedIn: ~/.gosdir/foopost.prio.share:-linkedin.txt
    • -
    • To explicitly share on both: ~/.gosdir/foopost.prio.share:linkedin:mastodon.txt
    • -
    • To explicitly share on only linkedin: ~/.gosdir/foopost.prio.share:linkedin:-mastodon.txt
    • -

    -There is more: you can also use the soon tag. It is almost the same as the prio tag, just with one lower priority.
    +The next lines in the configuration file are:

    -

    More tags


    +
    +set-window-option -g mode-keys vi
    +bind-key -T copy-mode-vi 'v' send -X begin-selection
    +bind-key -T copy-mode-vi 'y' send -X copy-selection-and-cancel
    +

    -
      -
    • A .ask. in the filename will prompt you to choose whether to queue, edit, or delete a file before queuing it.
    • -
    • A .now. in the filename will schedule a post immediately, regardless of the target status.
    • -

    -So you could also have filenames like those:
    +I navigate within Tmux using Vi keybindings, so the mode-keys is set to vi. I use the Helix modal text editor, which is close enough to Vi bindings for simple navigation to feel "native" to me. (By the way, I have been a long-time Vim and NeoVim user, but I eventually switched to Helix. It's off-topic here, but it may be worth another blog post once.)

    -
      -
    • ~/.gosdir/foopost.ask.txt
    • -
    • ~/.gosdir/foopost.now.txt
    • -
    • ~/.gosdir/foopost.ask.share:mastodon.txt
    • -
    • ~/.gosdir/foopost.ask.prio.share:mastodon.txt
    • -
    • ~/.gosdir/foopost.ask.now.share:-mastodon.txt
    • -
    • ~/.gosdir/foopost.now.share:-linkedin.txt
    • -

    -etc...
    +The two bind-key commands make it so that I can use v and y in copy mode, which feels more Vi-like (as already discussed earlier in this post).

    -All of the above also works with embedded tags. E.g.:
    +The next set of lines in the configuration file are:

    -share:mastodon,ask,prio Hello wold :-)
    +bind-key h select-pane -L
    +bind-key j select-pane -D
    +bind-key k select-pane -U
    +bind-key l select-pane -R
    +
    +bind-key H resize-pane -L 5
    +bind-key J resize-pane -D 5
    +bind-key K resize-pane -U 5
    +bind-key L resize-pane -R 5
     

    -or
    +These allow me to use prefix-key h, prefix-key j, prefix-key k, and prefix-key l for switching panes and prefix-key H, prefix-key J, prefix-key K, and prefix-key L for resizing the panes. If you don't know Vi/Vim/NeoVim, the letters hjkl are commonly used there for left, down, up, and right, which is also the same for Helix, by the way.
    +
    +The next set of lines in the configuration file are:

    -share:mastodon,ask,prio
    -
    -Hello World :-)
    +bind-key c new-window -c '#{pane_current_path}'
    +bind-key F new-window -n "session-switcher" "tmux list-sessions | fzf | cut -d: -f1 | xargs tmux switch-client -t"
    +bind-key T choose-tree
     

    -

    The gosc binary


    +The first one is that any new window starts in the current directory. The second one is more interesting. I list all open sessions in the fuzzy finder. I rely heavily on this during my daily workflow to switch between various sessions depending on the task. E.g. from a remote cluster SSH session to a local code editor.

    -gosc stands for Gos Composer and will simply launch your $EDITOR on a new text file in the gosDir. It's the same as running gos --compose, really. It is a quick way of composing new posts. Once composed, it will ask for your confirmation on whether the message should be queued or not.
    +The third one, choose-tree, opens a tree view in Tmux listing all sessions and windows. This one is handy to get a better overview of what is currently running in any local Tmux session. It looks like this (it also allows me to press a hotkey to switch to a particular Tmux window):

    -

    How queueing works in gos


    +Tmux sessiont tree view

    -When you place a message file in the gosDir, Gos processes it by moving the message through a queueing system before posting it to the target social media platforms. A message's lifecycle includes several key stages, from creation to posting, all managed through the ./db/platforms/PLATFORM directories.
    +The last remaining lines in my configuration file are:
    +
    +
    +bind-key p setw synchronize-panes off
    +bind-key P setw synchronize-panes on
    +bind-key r source-file ~/.config/tmux/tmux.conf \; display-message "tmux.conf reloaded"
    +

    -

    Step-by-step queueing process


    +We discussed synchronized panes earlier. I use it all the time in clustered SSH sessions. When enabled, all panes (remote SSH sessions) receive the same keystrokes. This is very useful when you want to run the same commands on many servers at once, such as navigating to a common directory, restarting a couple of services at once, or running tools like htop to quickly monitor system resources.

    -1. Inserting a Message into gosDir: You start by creating a text file that represents your post (e.g., foo.txt) and placing it in the gosDir. When Gos runs, this file is processed. The easiest way is to use gosc here.
    +The last one reloads my Tmux configuration on the fly.

    -2. Moving to the Queue: Upon running Gos, the tool identifies the message in the gosDir and places it into the queue for the specified platform. The message is moved into the appropriate directory for each platform in ./db/platforms/PLATFORM. During this stage, the message file is renamed to include a timestamp indicating when it was queued and given a .queued extension.
    +E-Mail your comments to paul@nospam.buetow.org :-)

    -*Example: If a message is queued for LinkedIn, the filename might look like this:*
    +Other related posts are:

    -
    -~/.gosdir/db/platforms/linkedin/foo.share:-mastodon.txt.20241022-102343.queued
    -
    +2026-02-02 A tmux popup editor for Cursor Agent CLI prompts
    +2025-05-02 Terminal multiplexing with tmux - Fish edition (You are currently reading this)
    +2024-06-23 Terminal multiplexing with tmux - Z-Shell edition

    -3. Posting the Message: Once a message is placed in the queue, Gos posts it to the specified social media platforms.
    +Back to the main site
    +
    +
    +
    + + 'When: The Scientific Secrets of Perfect Timing' book notes + + gemini://foo.zone/gemfeed/2025-04-19-when-book-notes.gmi + 2025-04-19T10:26:05+03:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + These are my personal book notes from Daniel Pink's 'When: The Scientific Secrets of Perfect Timing.' They are for me, but I hope they might be useful to you too. + +
    +

    "When: The Scientific Secrets of Perfect Timing" book notes



    -4. Renaming to .posted: After a message is successfully posted to a platform, the corresponding .queued file is renamed to have a .posted extension, and the filename timestamp is also updated. This signals that the post has been processed and published.
    +Published at 2025-04-19T10:26:05+03:00

    -*Example - After a successful post to LinkedIn, the message file might look like this:*
    +These are my personal book notes from Daniel Pink's "When: The Scientific Secrets of Perfect Timing." They are for me, but I hope they might be useful to you too.

    -./db/platforms/linkedin/foo.share:-mastodon.txt.20241112-121323.posted
    +	  __
    + (`/\
    + `=\/\ __...--~~~~~-._   _.-~~~~~--...__
    +  `=\/\               \ /               \\
    +   `=\/                V                 \\
    +   //_\___--~~~~~~-._  |  _.-~~~~~~--...__\\
    +  //  ) (..----~~~~._\ | /_.~~~~----.....__\\
    + ===( INK )==========\\|//====================
    +__ejm\___/________dwb`---`______________________
     

    -

    How message selection works in gos


    -
    -Gos decides which messages to post using a combination of priority, platform-specific tags, and timing rules. The message selection process ensures that messages are posted according to your configured cadence and targets while respecting pauses between posts and previously met goals.
    -
    -The key factors in message selection are:
    +

    Table of Contents




    -

    Database replication


    +You are a different kind of organism based on the time of day. For example, school tests show worse results later in the day, especially if there are fewer computers than students available. Every person has a chronotype, such as a late or early peaker, or somewhere in the middle (like most people). You can assess your chronotype here:
    +
    +Chronotype Assessment

    -I simply use Syncthing to backup/sync my gosDir. Note, that I run Gos on my personal laptop. No need to run it from a server.
    +Following your chronotype can lead to more happiness and higher job satisfaction.

    -https://syncthing.net
    +

    Daily Rhythms



    -

    Post summary as gemini gemtext


    +Peak, Trough, Rebound (Recovery): Most people experience these periods throughout the day. It's best to "eat the frog" or tackle daunting tasks during the peak. A twin peak exists every day, with mornings and early evenings being optimal for most people. Negative moods follow the opposite pattern, peaking in the afternoon. Light helps adjust but isn't the main driver of our internal clock. Like plants, humans have intrinsic rhythms.

    -For my blog, I want to post a summary of all the social messages posted over the last couple of months. For an example, have a look here:
    +

    Optimal Task Timing



    -./2025-01-01-posts-from-october-to-december-2024.html
    +
      +
    • Analytical work requiring sharpness and focus is best at the peak.
    • +
    • Creative work is more effective during non-peak times.
    • +
    • Biorhythms can sway performance by up to twenty percent.
    • +

    +

    Exercise Timing



    -To accomplish this, run:
    +Exercise in the morning to lose weight; you burn up to twenty percent more fat if you exercise before eating. Exercising after eating aids muscle gain, using the energy from the food. Morning exercises elevate mood, with the effect lasting all day. They also make forming a habit easier. The late afternoon is best for athletic performance due to optimal body temperature, reducing injury risk.

    - -
    gos --geminiSummaryFor 202410,202411,202412
    -
    +

    Drinking Habits



    -This outputs the summary for the three specified months, as shown in the example. The summary includes posts from all social media networks but removes duplicates.
    +
      +
    • Drink water in the morning to counter mild dehydration upon waking.
    • +
    • Delay coffee consumption until cortisol production peaks an hour or 90 minutes after waking. This helps avoid caffeine resistance.
    • +
    • For an afternoon boost, have coffee once cortisol levels drop.
    • +

    +

    Afternoon Challenges ("Bermuda Triangle")



    -Also, add the --gemtexterEnable flag, if you are using Gemtexter:
    +
      +
    • Mistakes are more common in hospitals during this period, like incorrect antibiotic subscriptions or missed handwashing.
    • +
    • Traffic accidents and unfavorable judge decisions occur more frequently in the afternoon.
    • +
    • 2:55 pm is the least productive time of the day.
    • +

    +

    Breaks and Productivity



    +Short, restorative breaks enhance performance. Student exam results improved with a half-hour break beforehand. Even micro-breaks can be beneficial—hourly five-minute walking breaks can increase productivity as much as 30-minute walks. Nature-based breaks are more effective than indoor ones, and full detachment in breaks is essential for restoration. Physical activity during breaks boosts concentration and productivity more than long walks do. Complete detachment from work during breaks is critical.

    - -
    gos --gemtexterEnable --geminiSummaryFor 202410,202411,202412
    -
    +

    Napping



    -Gemtexter
    +Short naps (10-20 minutes) significantly enhance mood, alertness, and cognitive performance, improving learning and problem-solving abilities. Napping increases with age, benefiting mood, flow, and overall health. A "nappuccino," or napping after coffee, offers a double boost, as caffeine takes around 25 minutes to kick in.

    -In case there are HTTP links that translate directly to the Geminispace for certain capsules, specify the Gemini capsules as a comma-separated list as follows:
    +

    Scheduling Breaks



    - -
    gos --gemtexterEnable --geminiSummaryFor 202410,202411,202412 --geminiCapsules "foo.zone,paul.buetow.org"
    -
    +
      +
    • Track breaks just as you do with tasks—aim for three breaks a day.
    • +
    • Every 25 minutes, look away and daydream for 20 seconds, or engage in short exercises.
    • +
    • Meditating for even three minutes is a highly effective restorative activity.
    • +
    • The "Fresh Start Effect" (e.g., beginning a diet on January 1st or a new week) impacts motivation, as does recognizing progress. At the end of each day, spends two minutes to write down accomplishments.
    • +

    +

    Final Impressions



    -It will then also generate Gemini Gemtext links in the summary page and flag them with (Gemini).
    +
      +
    • The concluding experience of a vacation significantly influences overall memories.
    • +
    • Restaurant reviews often hinge on the end of the visit, highlighting extras like wrong bills or additional desserts.
    • +
    • Considering one's older future self can motivate improvements in the present.
    • +

    +

    The Midlife U Curve



    -

    Conclusion


    +Life satisfaction tends to dip in midlife, around the forties, but increases around age 54.

    -Overall, this was a fun little Go project with practical use for me personally. I hope you also had fun reading this, and maybe you will use it as well.
    +

    Project Management Tips


    +
    +
      +
    • Halfway through a project, there's a concentrated work effort ("Oh Oh Effect"), similar to an alarm when slightly behind schedule.
    • +
    • Recognizing daily accomplishments can elevate motivation and satisfaction.
    • +

    +These insights from "When" can guide actions to optimize performance, well-being, and satisfaction across various aspects of life.

    E-Mail your comments to paul@nospam.buetow.org :-)

    +Other book notes of mine are:
    +
    +2025-11-02 'The Courage To Be Disliked' book notes
    +2025-06-07 'A Monk's Guide to Happiness' book notes
    +2025-04-19 'When: The Scientific Secrets of Perfect Timing' book notes (You are currently reading this)
    +2024-10-24 'Staff Engineer' book notes
    +2024-07-07 'The Stoic Challenge' book notes
    +2024-05-01 'Slow Productivity' book notes
    +2023-11-11 'Mind Management' book notes
    +2023-07-17 'Software Developers Career Guide and Soft Skills' book notes
    +2023-05-06 'The Obstacle is the Way' book notes
    +2023-04-01 'Never split the difference' book notes
    +2023-03-16 'The Pragmatic Programmer' book notes
    +
    Back to the main site
    - Random Weird Things - Part Ⅱ - - gemini://foo.zone/gemfeed/2025-02-08-random-weird-things-ii.gmi - 2025-02-08T11:06:16+02:00 + f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs + + gemini://foo.zone/gemfeed/2025-04-05-f3s-kubernetes-with-freebsd-part-4.gmi + 2025-04-04T23:21:01+03:00, last updated Fri 26 Dec 08:51:06 EET 2025 Paul Buetow aka snonux paul@dev.buetow.org - Every so often, I come across random, weird, and unexpected things on the internet. I thought it would be neat to share them here from time to time. This is the second run. + This is the fourth blog post about the f3s series for self-hosting demands in a home lab. f3s? The 'f' stands for FreeBSD, and the '3s' stands for k3s, the Kubernetes distribution used on FreeBSD-based physical machines.
    -

    Random Weird Things - Part Ⅱ


    +

    f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs



    -Published at 2025-02-08T11:06:16+02:00
    +Published at 2025-04-04T23:21:01+03:00, last updated Fri 26 Dec 08:51:06 EET 2025

    -Every so often, I come across random, weird, and unexpected things on the internet. I thought it would be neat to share them here from time to time. This is the second run.
    +This is the fourth blog post about the f3s series for self-hosting demands in a home lab. f3s? The "f" stands for FreeBSD, and the "3s" stands for k3s, the Kubernetes distribution used on FreeBSD-based physical machines.

    -2024-07-05 Random Weird Things - Part Ⅰ
    -2025-02-08 Random Weird Things - Part Ⅱ (You are currently reading this)
    -2025-08-15 Random Weird Things - Part Ⅲ
    +2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    +2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    +2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    +2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs (You are currently reading this)
    +2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    +2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    +2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    +2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability

    -
    -/\_/\           /\_/\
    -( o.o ) WHOA!! ( o.o )
    -> ^ <           > ^ <
    -/   \    MOEEW! /   \
    -/______\       /______\
    -
    +f3s logo

    Table of Contents




    -

    11. The SQLite codebase is a gem


    +

    Introduction



    -Check this out:
    +In this blog post, we are going to install the Bhyve hypervisor.

    -SQLite Gem
    +The FreeBSD Bhyve hypervisor is a lightweight, modern hypervisor that enables virtualization on FreeBSD systems. Bhyve's strengths include its minimal overhead, which allows it to achieve near-native performance for virtual machines. It's efficient and lightweight, leveraging the capabilities of the FreeBSD operating system for performance and network management.

    -Source:
    +https://wiki.freebsd.org/bhyve

    -https://wetdry.world/@memes/112717700557038278
    +Bhyve supports running various guest operating systems, including FreeBSD, Linux, and Windows, on hardware platforms that support hardware virtualization extensions (such as Intel VT-x or AMD-V). In our case, we are going to virtualize Rocky Linux, which will later in this series be used to run k3s.

    -

    Go Programming


    +

    Check for POPCNT CPU support



    -

    12. Official Go font


    +POPCNT is a CPU instruction that counts the number of set bits (ones) in a binary number. CPU virtualization and Bhyve support for the POPCNT instruction are important because guest operating systems utilize this instruction to perform various tasks more efficiently. If the host CPU supports POPCNT, Bhyve can pass this capability to virtual machines for better performance. Without POPCNT support, some applications might not run or perform sub-optimally in virtualized environments.

    -The Go programming language has its own official font, called "Go Font." There's a monospace version for code and a proportional one for regular text.
    +To check for POPCNT support, run:

    -Check out some Go code displayed using the Go font:
    + +
    paul@f0:~ % dmesg | grep 'Features2=.*POPCNT'
    +  Features2=0x7ffafbbf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,SDBG,
    +	FMA,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,
    +	OSXSAVE,AVX,F16C,RDRAND>
    +

    -Go font code
    +So it's there! All good.

    -https://go.dev/blog/go-fonts
    +

    Basic Bhyve setup



    -I found it interesting and/or weird, as Go is a programming language. Why should it bother having its own font? I have never seen another open-source project like Go do this. But I also like it. Maybe I will use it in the future for this blog :-)
    +For managing the Bhyve VMs, we are using vm-bhyve, a tool not part of the FreeBSD operating system but available as a ready-to-use package. It eases VM management and reduces a lot of overhead. We also install the required package to make Bhyve work with the UEFI firmware.

    -

    13. Go functions can have methods


    +https://github.com/churchers/vm-bhyve

    -Functions on struct types? Well known. Functions on types like int and string? It's also known of, but a bit lesser. Functions on function types? That sounds a bit funky, but it's possible, too! For demonstration, have a look at this snippet:
    +The following commands are executed on all three hosts f0, f1, and f2, where re0 is the name of the Ethernet interface (which may need to be adjusted if your hardware is different):

    -
    package main
    -
    -import "log"
    -
    -type fun func() string
    -
    -func (f fun) Bar() string {
    -        return "Bar"
    -}
    -
    -func main() {
    -        var f fun = func() string {
    -                return "Foo"
    -        }
    -        log.Println("Example 1: ", f())
    -        log.Println("Example 2: ", f.Bar())
    -        log.Println("Example 3: ", fun(f.Bar).Bar())
    -        log.Println("Example 4: ", fun(fun(f.Bar).Bar).Bar())
    -}
    +
    paul@f0:~ % doas pkg install vm-bhyve bhyve-firmware
    +paul@f0:~ % doas sysrc vm_enable=YES
    +vm_enable:  -> YES
    +paul@f0:~ % doas sysrc vm_dir=zfs:zroot/bhyve
    +vm_dir:  -> zfs:zroot/bhyve
    +paul@f0:~ % doas zfs create zroot/bhyve
    +paul@f0:~ % doas vm init
    +paul@f0:~ % doas vm switch create public
    +paul@f0:~ % doas vm switch add public re0
     

    -It runs just fine:
    +Bhyve stores all its data in the /bhyve of the zroot ZFS pool:

    -
    ❯ go run main.go
    -2025/02/07 22:56:14 Example 1:  Foo
    -2025/02/07 22:56:14 Example 2:  Bar
    -2025/02/07 22:56:14 Example 3:  Bar
    -2025/02/07 22:56:14 Example 4:  Bar
    -
    -
    -

    macOS


    -
    -For personal computing, I don't use Apple, but I have to use it for work.
    -
    -

    14. ß and ss are treated the same


    -
    -Know German? In German, the letter "sharp s" is written as ß. ß is treated the same as ss on macOS.
    -
    -On a case-insensitive file system like macOS, not only are uppercase and lowercase letters treated the same, but non-Latin characters like the German "ß" are also considered equivalent to their Latin counterparts (in this case, "ss").
    +
    paul@f0:~ % zfs list | grep bhyve
    +zroot/bhyve                                   1.74M   453G  1.74M  /zroot/bhyve
    +

    -So, even though "Maß" and "Mass" are not strictly equivalent, the macOS file system still treats them as the same filename due to its handling of Unicode characters. This can sometimes lead to unexpected behaviour. Check this out:
    +For convenience, we also create this symlink:

    -
    ❯ touch Maß
    -❯ ls -l
    --rw-r--r--@ 1 paul  wheel  0 Feb  7 23:02 Maß
    -❯ touch Mass
    -❯ ls -l
    --rw-r--r--@ 1 paul  wheel  0 Feb  7 23:02 Maß
    -❯ rm Mass
    -❯ ls -l
    -
    -❯ touch Mass
    -❯ ls -ltr
    --rw-r--r--@ 1 paul  wheel  0 Feb  7 23:02 Mass
    -❯ rm Maß
    -❯ ls -l
    +
    paul@f0:~ % doas ln -s /zroot/bhyve/ /bhyve
     
     

    -

    15. Colon as file path separator


    -
    -MacOS can use the colon as a file path separator on its ADFS (file system). A typical ADFS file pathname on a hard disc might be:
    +Now, Bhyve is ready to rumble, but no VMs are there yet:

    -
    -ADFS::4.$.Documents.Techwriter.Myfile
    +
    +
    paul@f0:~ % doas vm list
    +NAME  DATASTORE  LOADER  CPU  MEMORY  VNC  AUTO  STATE
     

    -I can't reproduce this on my (work) Mac, though, as it now uses the APFS file system. In essence, ADFS is an older file system, while APFS is a contemporary file system optimized for Apple's modern devices.
    -
    -https://social.jvns.ca/@b0rk/113041293527832730
    -
    -

    16. Polyglots - programs written in multiple languages


    -
    -A coding polyglot is a program that runs in multiple programming languages without any changes. People usually write them as a fun challenge — you exploit syntax overlaps between languages to make the same file valid (and meaningful) in each one.
    +

    Rocky Linux VMs



    -Check out my very own polyglot:
    +As guest VMs I decided to use Rocky Linux.

    -The fibonatti.pl.c Polyglot
    +Using Rocky Linux 9 as a VM-based OS is beneficial primarily because of its long-term support and stable release cycle. This ensures a reliable environment that receives security updates and bug fixes for an extended period, reducing the need for frequent upgrades.

    -

    17. Languages, where indices start at 1


    +Rocky Linux is community-driven and aims to be fully compatible with enterprise Linux, making it a solid choice for consistency and performance in various deployment scenarios.

    -Array indices start at 1 instead of 0 in some programming languages, known as one-based indexing. This can be controversial because zero-based indexing is more common in popular languages like C, C++, Java, and Python. One-based indexing can lead to off-by-one errors when developers switch between languages with different indexing schemes.
    +https://rockylinux.org/

    -Languages with One-Based Indexing:
    +

    ISO download



    -
      -
    • Fortran
    • -
    • MATLAB
    • -
    • Lua
    • -
    • R (for vectors and lists)
    • -
    • Smalltalk
    • -
    • Julia (by default, although zero-based indexing is also possible)
    • -

    -foo.lua example:
    +We're going to install the Rocky Linux from the latest minimal iso:

    -
    arr = {10, 20, 30, 40, 50}
    -print(arr[1]) -- Accessing the first element
    +
    paul@f0:~ % doas vm iso \
    + https://download.rockylinux.org/pub/rocky/9/isos/x86_64/Rocky-9.5-x86_64-minimal.iso
    +/zroot/bhyve/.iso/Rocky-9.5-x86_64-minimal.iso        1808 MB 4780 kBps 06m28s
    +paul@f0:/bhyve % doas vm create rocky
     

    +

    VM configuration


    +
    +The default Bhyve VM configuration looks like this now:
    +
    -
    ❯ lua foo.lua
    -10
    +
    paul@f0:/bhyve/rocky % cat rocky.conf
    +loader="bhyveload"
    +cpu=1
    +memory=256M
    +network0_type="virtio-net"
    +network0_switch="public"
    +disk0_type="virtio-blk"
    +disk0_name="disk0.img"
    +uuid="1c4655ac-c828-11ef-a920-e8ff1ed71ca0"
    +network0_mac="58:9c:fc:0d:13:3f"
     

    -One-based indexing is more natural for human-readable, mathematical, and theoretical contexts, where counting traditionally starts from one.
    +The uuid and the network0_mac differ for each of the three VMs (the ones being installed on f0, f1 and f2).

    -

    18. Perl Poetry


    +But to make Rocky Linux boot it (plus some other adjustments, e.g. as we intend to run the majority of the workload in the k3s cluster running on those Linux VMs, we give them beefy specs like 4 CPU cores and 14GB RAM). So we run doas vm configure rocky and modified it to:

    -Perl Poetry is a playful and creative practice within the programming community where Perl code is written as a poem. These poems are crafted to be syntactically valid Perl code and make sense as poetic text, often with whimsical or humorous intent. This showcases Perl's flexibility and expressiveness, as well as the creativity of its programmers.
    +
    +guest="linux"
    +loader="uefi"
    +uefi_vars="yes"
    +cpu=4
    +memory=14G
    +network0_type="virtio-net"
    +network0_switch="public"
    +disk0_type="virtio-blk"
    +disk0_name="disk0.img"
    +graphics="yes"
    +graphics_vga=io
    +uuid="1c45400b-c828-11ef-8871-e8ff1ed71cac"
    +network0_mac="58:9c:fc:0d:13:3f"
    +

    -See this Poetry of my own; the Perl interpreter does not yield any syntax error parsing that. But also, the Peom doesn't do anything useful then executed:
    +

    VM installation


    +
    +To start the installer from the downloaded ISO, we run:

    -
    # (C) 2006 by Paul C. Buetow
    -
    -Christmas:{time;#!!!
    -
    -Children: do tell $wishes;
    -
    -Santa: for $each (@children) { 
    -BEGIN { read $each, $their, wishes and study them; use Memoize#ing
    -
    -} use constant gift, 'wrapping'; 
    -package Gifts; pack $each, gift and bless $each and goto deliver
    -or do import if not local $available,!!! HO, HO, HO;
    -
    -redo Santa, pipe $gifts, to_childs;
    -redo Santa and do return if last one, is, delivered; 
    -
    -deliver: gift and require diagnostics if our $gifts ,not break;
    -do{ use NEXT; time; tied $gifts} if broken and dump the, broken, ones;
    -The_children: sleep and wait for (each %gift) and try { to => untie $gifts };
    -
    -redo Santa, pipe $gifts, to_childs;
    -redo Santa and do return if last one, is, delivered; 
    -
    -The_christmas_tree: formline s/ /childrens/, $gifts;
    -alarm and warn if not exists $Christmas{ tree}, @t, $ENV{HOME};  
    -write <<EMail
    - to the parents to buy a new christmas tree!!!!111
    - and send the
    -EMail
    -;wait and redo deliver until defined local $tree;
    -
    -redo Santa, pipe $gifts, to_childs;
    -redo Santa and do return if last one, is, delivered ;}
    -
    -END {} our $mission and do sleep until next Christmas ;}
    +
    paul@f0:~ % doas vm install rocky Rocky-9.5-x86_64-minimal.iso
    +Starting rocky
    +  * found guest in /zroot/bhyve/rocky
    +  * booting...
     
    -__END__
    +paul@f0:/bhyve/rocky % doas vm list
    +NAME   DATASTORE  LOADER  CPU  MEMORY  VNC           AUTO  STATE
    +rocky  default    uefi    4    14G     0.0.0.0:5900  No    Locked (f0.lan.buetow.org)
     
    -This is perl, v5.8.8 built for i386-freebsd-64int
    +paul@f0:/bhyve/rocky % doas sockstat -4 | grep 5900
    +root     bhyve       6079 8   tcp4   *:5900                *:*
     

    -More Perl Poetry of mine
    +Port 5900 now also opens for VNC connections, so I connected it with a VNC client and ran through the installation dialogues. This could be done unattended or more automated, but there are only three VMs to install, and the automation doesn't seem worth it as we do it only once a year or less often.

    -

    19. CSS3 is turing complete


    +

    Increase of the disk image



    -Turns out CSS3 is Turing complete — you can simulate a Turing machine using nothing but CSS animations and styles, no JavaScript needed. Keyframe animations can encode state transitions and perform calculations, which is wild considering CSS is supposed to just make things look pretty.
    +By default, the VM disk image is only 20G, which is a bit small for our purposes, so we have to stop the VMs again, run truncate on the image file to enlarge them to 100G, and restart the installation:

    -Is CSS turing complete?
    + +
    paul@f0:/bhyve/rocky % doas vm stop rocky
    +paul@f0:/bhyve/rocky % doas truncate -s 100G disk0.img
    +paul@f0:/bhyve/rocky % doas vm install rocky Rocky-9.5-x86_64-minimal.iso
    +

    -Check out this 100% CSS implementation of the Conways Game of Life:
    +

    Connect to VNC



    -
    +For the installation, I opened the VNC client on my Fedora laptop (GNOME comes with a simple VNC client) and manually ran through the base installation for each of the VMs. Again, I am sure this could have been automated a bit more, but there were just three VMs, and it wasn't worth the effort. The three VNC addresses of the VMs were vnc://f0:5900, vnc://f1:5900, and vnc://f2:5900.

    -CSS Conways Game of Life
    +

    -Conway's Game of Life is Turing complete because it can simulate a universal Turing machine, meaning it can perform any computation that a computer can, given the right initial conditions and sufficient time and space. Suppose a language can implement Conway's Game of Life. In that case, it demonstrates the language's ability to handle complex state transitions and computations. It has the necessary constructs (like iteration, conditionals, and data manipulation) to simulate any algorithm, thus confirming its Turing completeness.
    +

    -

    20. The biggest shell programs


    +I primarily selected the default settings (auto partitioning on the 100GB drive and a root user password). After the installation, the VMs were rebooted.

    -One would think that shell scripts are only suitable for small tasks. Well, I must be wrong, as there are huge shell programs out there (up to 87k LOC) which aren't auto-generated but hand-written!
    +

    -The Biggest Sell Programs in the World
    +

    -My Gemtexter (bash) is only 1329 LOC as of now. So it's tiny.
    +

    After install



    -Gemtexter - One Bash script to rule it all
    +We perform the following steps for all three VMs. In the following, the examples are all executed on f0 (the VM r0 running on f0):

    -I hope you had some fun. E-Mail your comments to paul@nospam.buetow.org :-)
    +

    VM auto-start after host reboot



    -Back to the main site
    -
    -
    -
    - - f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts - - gemini://foo.zone/gemfeed/2025-02-01-f3s-kubernetes-with-freebsd-part-3.gmi - 2025-01-30T09:22:06+02:00 - - Paul Buetow aka snonux - paul@dev.buetow.org - - This is the third blog post about my f3s series for my self-hosting demands in my home lab. f3s? The 'f' stands for FreeBSD, and the '3s' stands for k3s, the Kubernetes distribution we will use on FreeBSD-based physical machines. - -
    -

    f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts


    +To automatically start the VM on the servers, we add the following to the rc.conf on the FreeBSD hosts:

    -Published at 2025-01-30T09:22:06+02:00
    + +
    paul@f0:/bhyve/rocky % cat <<END | doas tee -a /etc/rc.conf
    +vm_list="rocky"
    +vm_delay="5"
    +

    -This is the third blog post about my f3s series for my self-hosting demands in my home lab. f3s? The "f" stands for FreeBSD, and the "3s" stands for k3s, the Kubernetes distribution we will use on FreeBSD-based physical machines.
    +The vm_delay isn't really required. It is used to wait 5 seconds before starting each VM, but there is currently only one VM per host. Maybe later, when there are more, this will be useful. After adding, there's now a Yes indicator in the AUTO column.

    -2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    -2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    -2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts (You are currently reading this)
    -2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    -2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    -2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    -2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    -2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability
    + +
    paul@f0:~ % doas vm list
    +NAME   DATASTORE  LOADER  CPU  MEMORY  VNC           AUTO     STATE
    +rocky  default    uefi    4    14G     0.0.0.0:5900  Yes [1]  Running (2063)
    +
    +
    +

    Static IP configuration


    +
    +After that, we change the network configuration of the VMs to be static (from DHCP) here. As per the previous post of this series, the three FreeBSD hosts were already in my /etc/hosts file:
    +
    +
    +192.168.1.130 f0 f0.lan f0.lan.buetow.org
    +192.168.1.131 f1 f1.lan f1.lan.buetow.org
    +192.168.1.132 f2 f2.lan f2.lan.buetow.org
    +
    +
    +For the Rocky VMs, we add those to the FreeBSD host systems as well:
    +
    + +
    paul@f0:/bhyve/rocky % cat <<END | doas tee -a /etc/hosts
    +192.168.1.120 r0 r0.lan r0.lan.buetow.org
    +192.168.1.121 r1 r1.lan r1.lan.buetow.org
    +192.168.1.122 r2 r2.lan r2.lan.buetow.org
    +END
    +
    +
    +And we configure the IPs accordingly on the VMs themselves by opening a root shell via SSH to the VMs and entering the following commands on each of the VMs:

    -f3s logo
    + +
    [root@r0 ~] % nmcli connection modify enp0s5 ipv4.address 192.168.1.120/24
    +[root@r0 ~] % nmcli connection modify enp0s5 ipv4.gateway 192.168.1.1
    +[root@r0 ~] % nmcli connection modify enp0s5 ipv4.DNS 192.168.1.1
    +[root@r0 ~] % nmcli connection modify enp0s5 ipv4.method manual
    +[root@r0 ~] % nmcli connection down enp0s5
    +[root@r0 ~] % nmcli connection up enp0s5
    +[root@r0 ~] % hostnamectl set-hostname r0.lan.buetow.org
    +[root@r0 ~] % cat <<END >>/etc/hosts
    +192.168.1.120 r0 r0.lan r0.lan.buetow.org
    +192.168.1.121 r1 r1.lan r1.lan.buetow.org
    +192.168.1.122 r2 r2.lan r2.lan.buetow.org
    +END
    +

    -

    Table of Contents


    +Whereas:


    -

    Introduction


    -
    -In this blog post, we are setting up the UPS for the cluster. A UPS, or Uninterruptible Power Supply, safeguards my cluster from unexpected power outages and surges. It acts as a backup battery that kicks in when the electricity cuts out—especially useful in my area, where power cuts are frequent—allowing for a graceful system shutdown and preventing data loss and corruption. This is especially important since I will also store some of my data on the f3s nodes.
    +

    Permitting root login



    -

    Changes since last time


    +As these VMs aren't directly reachable via SSH from the internet, we enable root login by adding a line with PermitRootLogin yes to /etc/sshd/sshd_config.

    -

    FreeBSD upgrade from 14.1 to 14.2


    +Once done, we reboot the VM by running reboot inside the VM to test whether everything was configured and persisted correctly.

    -There has been a new release since the last blog post in this series. The upgrade from 14.1 was as easy as:
    +After reboot, we copy a public key over. E.g. I did this from my Laptop as follows:

    -
    paul@f0: ~ % doas freebsd-update fetch
    -paul@f0: ~ % doas freebsd-update install
    -paul@f0: ~ % doas freebsd-update -r 14.2-RELEASE upgrade
    -paul@f0: ~ % doas freebsd-update install
    -paul@f0: ~ % doas shutdown -r now
    +
    % for i in 0 1 2; do ssh-copy-id root@r$i.lan.buetow.org; done
     

    -And after rebooting, I ran:
    +Then, we edit the /etc/ssh/sshd_config file again on all three VMs and configure PasswordAuthentication no to only allow SSH key authentication from now on.
    +
    +

    Install latest updates



    -
    paul@f0: ~ % doas freebsd-update install
    -paul@f0: ~ % doas pkg update
    -paul@f0: ~ % doas pkg upgrade
    -paul@f0: ~ % doas shutdown -r now
    +
    [root@r0 ~] % dnf update
    +[root@r0 ~] % reboot
     

    -And after another reboot, I was on 14.2:
    +

    Stress testing CPU


    +
    +The aim is to prove that bhyve VMs are CPU efficient. As I could not find an off-the-shelf benchmarking tool available in the same version for FreeBSD as well as for Rocky Linux 9, I wrote my own silly CPU benchmarking tool in Go:

    -
    paul@f0:~ % uname -a
    -FreeBSD f0.lan.buetow.org 14.2-RELEASE FreeBSD 14.2-RELEASE 
    - releng/14.2-n269506-c8918d6c7412 GENERIC amd64
    +
    package main
    +
    +import "testing"
    +
    +func BenchmarkCPUSilly1(b *testing.B) {
    +	for i := 0; i < b.N; i++ {
    +		_ = i * i
    +	}
    +}
    +
    +func BenchmarkCPUSilly2(b *testing.B) {
    +	var sillyResult float64
    +	for i := 0; i < b.N; i++ {
    +		sillyResult += float64(i)
    +		sillyResult *= float64(i)
    +		divisor := float64(i) + 1
    +		if divisor > 0 {
    +			sillyResult /= divisor
    +		}
    +	}
    +	_ = sillyResult // to avoid compiler optimization
    +}
     

    -And, of course, I ran this on all 3 nodes!
    -
    -

    A new home (behind the TV)


    -
    -I've put all the infrastructure behind my TV, as plenty of space is available. The TV hides most of the setup, which drastically improved the SAF (spouse acceptance factor).
    -
    -New hardware placement arrangement
    -
    -I got rid of the mini-switch I mentioned in the previous blog post. I have the TP-Link EAP615-Wall mounted on the wall nearby, which is my OpenWrt-powered Wi-Fi hotspot. It also has 3 Ethernet ports, to which I connected the Beelink nodes. That's the device you see at the very top.
    +You can find the repository here:

    -The Ethernet cables go downward through the cable boxes to the Beelink nodes. In addition to the Beelink f3s nodes, I connected the TP-Link to the UPS as well (not discussed further in this blog post, but the positive side effect is that my Wi-Fi will still work during a power loss for some time—and during a power cut, the Beelink nodes will still be able to communicate with each other).
    +https://codeberg.org/snonux/sillybench

    -On the very left (the black box) is the UPS, with four power outlets. Three go to the Beelink nodes, and one goes to the TP-Link. A USB output is also connected to the first Beelink node, f0.
    +

    Silly FreeBSD host benchmark



    -On the very right (halfway hidden behind the TV) are the 3 Beelink nodes stacked on top of each other. The only downside (or upside?) is that my 14-month-old daughter is now chaos-testing the Beelink nodes, as the red power buttons (now reachable for her) are very attractive for her to press when passing by randomly. :-) Luckily, that will only cause graceful system shutdowns!
    +To install it on FreeBSD, we run:

    -

    The UPS hardware


    + +
    paul@f0:~ % doas pkg install git go
    +paul@f0:~ % mkdir ~/git && cd ~/git && \
    +  git clone https://codeberg.org/snonux/sillybench && \
    +  cd sillybench
    +

    -I wanted a UPS that I could connect to via FreeBSD, and that would provide enough backup power to operate the cluster for a couple of minutes (it turned out to be around an hour, but this time will likely be shortened after future hardware upgrades, like additional drives and a backup enclosure) and to automatically initiate the shutdown of all the f3s nodes.
    +And to run it:

    -I decided on the APC Back-UPS BX750MI model because:
    + +
    paul@f0:~/git/sillybench % go version
    +go version go1.24.1 freebsd/amd64
    +
    +paul@f0:~/git/sillybench % go test -bench=.
    +goos: freebsd
    +goarch: amd64
    +pkg: codeberg.org/snonux/sillybench
    +cpu: Intel(R) N100
    +BenchmarkCPUSilly1-4    1000000000               0.4022 ns/op
    +BenchmarkCPUSilly2-4    1000000000               0.4027 ns/op
    +PASS
    +ok      codeberg.org/snonux/sillybench 0.891s
    +

    -
      -
    • Zero noise level when there is no power cut (some light noise when the battery is in operation during a power cut).
    • -
    • Cost: It is relatively affordable (not costing thousands).
    • -
    • USB connectivity: Can be connected via USB to one of the FreeBSD hosts to read the UPS status.
    • -
    • A power output of 750VA (or 410 watts), suitable for an hour of runtime for my f3s nodes (plus the Wi-Fi router).
    • -
    • Multiple power outlets: Can connect all 3 f3s nodes directly.
    • -
    • User-replaceable batteries: I can replace the batteries myself after two years or more (depending on usage).
    • -
    • Its compact design. Overall, I like how it looks.
    • -

    -The APC Back-UPS BX750MI in operation.
    +

    Silly Rocky Linux VM @ Bhyve benchmark



    -

    Configuring FreeBSD to Work with the UPS


    +OK, let's compare this with the Rocky Linux VM running on Bhyve:

    -

    USB Device Detection


    + +
    [root@r0 ~]# dnf install golang git
    +[root@r0 ~]# mkdir ~/git && cd ~/git && \
    +  git clone https://codeberg.org/snonux/sillybench && \
    +  cd sillybench
    +

    -Once plugged in via USB on FreeBSD, I could see the following in the kernel messages:
    +And to run it:

    -
    paul@f0: ~ % doas dmesg | grep UPS
    -ugen0.2: <American Power Conversion Back-UPS BX750MI> at usbus0
    +
    [root@r0 sillybench]# go version
    +go version go1.22.9 (Red Hat 1.22.9-2.el9_5) linux/amd64
    +[root@r0 sillybench]# go test -bench=.
    +goos: linux
    +goarch: amd64
    +pkg: codeberg.org/snonux/sillybench
    +cpu: Intel(R) N100
    +BenchmarkCPUSilly1-4    1000000000               0.4347 ns/op
    +BenchmarkCPUSilly2-4    1000000000               0.4345 ns/op
     

    -

    apcupsd Installation


    +The Linux benchmark is slightly slower than the FreeBSD one. The Go version is also a bit older. I tried the same with the up-to-date version of Go (1.24.x) with similar results. There could be a slight Bhyve overhead, or FreeBSD is just slightly more efficient in this benchmark. Overall, this shows that Bhyve performs excellently.

    -To make use of the USB connection, the apcupsd package had to be installed:
    +

    Silly FreeBSD VM @ Bhyve benchmark


    +
    +But as I am curious and don't want to compare apples with bananas, I decided to install a FreeBSD Bhyve VM to run the same silly benchmark in it. I am not going through the details of how to install a FreeBSD Bhyve VM here; you can easily look it up in the documentation.
    +
    +But here are the results running the same silly benchmark in a FreeBSD Bhyve VM with the same FreeBSD and Go versions as the host system (I have the VM 4 vCPUs and 14GB of RAM; the benchmark won't use as many CPUs (and memory) anyway):

    -
    paul@f0: ~ % doas install apcupsd
    +
    root@freebsd:~/git/sillybench # go test -bench=.
    +goos: freebsd
    +goarch: amd64
    +pkg: codeberg.org/snonux/sillybench
    +cpu: Intel(R) N100
    +BenchmarkCPUSilly1      1000000000               0.4273 ns/op
    +BenchmarkCPUSilly2      1000000000               0.4286 ns/op
    +PASS
    +ok      codeberg.org/snonux/sillybench  0.949s
     

    -I have made the following modifications to the configuration file so that the UPS can be used via the USB interface:
    +It's a bit better than Linux! I am sure that this is not really a scientific benchmark, so take the results with a grain of salt!
    +
    +

    Benchmarking with ubench


    +
    +Let's run another, more sophisticated benchmark using ubench, the Unix Benchmark Utility available for FreeBSD. It was installed by simply running doas pkg install ubench. It can benchmark CPU and memory performance. Here, we limit it to one CPU for the first run with -s, and then let it run at full speed (using all available CPUs in parallel) in the second run.
    +
    +

    FreeBSD host ubench benchmark


    +
    +Single CPU:

    -
    paul@f0:/usr/local/etc/apcupsd % diff -u apcupsd.conf.sample  apcupsd.conf
    ---- apcupsd.conf.sample 2024-11-01 16:40:42.000000000 +0200
    -+++ apcupsd.conf        2024-12-03 10:58:24.009501000 +0200
    -@@ -31,7 +31,7 @@
    - #     940-1524C, 940-0024G, 940-0095A, 940-0095B,
    - #     940-0095C, 940-0625A, M-04-02-2000
    - #
    --UPSCABLE smart
    -+UPSCABLE usb
    -
    - # To get apcupsd to work, in addition to defining the cable
    - # above, you must also define a UPSTYPE, which corresponds to
    -@@ -88,8 +88,10 @@
    - #                            that apcupsd binds to that particular unit
    - #                            (helpful if you have more than one USB UPS).
    - #
    --UPSTYPE apcsmart
    --DEVICE /dev/usv
    -+UPSTYPE usb
    -+DEVICE
    +
    paul@f0:~ % doas ubench -s 1
    +Unix Benchmark Utility v.0.3
    +Copyright (C) July, 1999 PhysTech, Inc.
    +Author: Sergei Viznyuk <sv@phystech.com>
    +http://www.phystech.com/download/ubench.html
    +FreeBSD 14.2-RELEASE-p1 FreeBSD 14.2-RELEASE-p1 GENERIC amd64
    +Ubench Single CPU:   671010 (0.40s)
    +Ubench Single MEM:  1705237 (0.48s)
    +-----------------------------------
    +Ubench Single AVG:  1188123
     
    - # POLLTIME <int>
    - #   Interval (in seconds) at which apcupsd polls the UPS for status. This
     

    -I left the remaining settings as the default ones; for example, the following are of main interest:
    +All CPUs (with all Bhyve VMs stopped):

    -
    -# If during a power failure, the remaining battery percentage
    -# (as reported by the UPS) is below or equal to BATTERYLEVEL,
    -# apcupsd will initiate a system shutdown.
    -BATTERYLEVEL 5
    -
    -# If during a power failure, the remaining runtime in minutes
    -# (as calculated internally by the UPS) is below or equal to MINUTES,
    -# apcupsd, will initiate a system shutdown.
    -MINUTES 3
    +
    +
    paul@f0:~ % doas ubench
    +Unix Benchmark Utility v.0.3
    +Copyright (C) July, 1999 PhysTech, Inc.
    +Author: Sergei Viznyuk <sv@phystech.com>
    +http://www.phystech.com/download/ubench.html
    +FreeBSD 14.2-RELEASE-p1 FreeBSD 14.2-RELEASE-p1 GENERIC amd64
    +Ubench CPU:  2660220
    +Ubench MEM:  3095182
    +--------------------
    +Ubench AVG:  2877701
     

    -I then enabled and started the daemon:
    +

    FreeBSD VM @ Bhyve ubench benchmark


    +
    +Single CPU:

    -
    paul@f0:/usr/local/etc/apcupsd % doas sysrc apcupsd_enable=YES
    -apcupsd_enable:  -> YES
    -paul@f0:/usr/local/etc/apcupsd % doas service apcupsd start
    -Starting apcupsd.
    +
    root@freebsd:~ # ubench -s 1
    +Unix Benchmark Utility v.0.3
    +Copyright (C) July, 1999 PhysTech, Inc.
    +Author: Sergei Viznyuk <sv@phystech.com>
    +http://www.phystech.com/download/ubench.html
    +FreeBSD 14.2-RELEASE-p1 FreeBSD 14.2-RELEASE-p1 GENERIC amd64
    +Ubench Single CPU:   672792 (0.40s)
    +Ubench Single MEM:   852757 (0.48s)
    +-----------------------------------
    +Ubench Single AVG:   762774
     

    -

    UPS Connectivity Test


    +Wow, the CPU in the VM was a tiny bit faster than on the host! So this was probably just a glitch in the matrix. Memory seems slower, though.

    -And voila, I could now access the UPS information via the apcaccess command; how convenient :-) (I also read through the manual page, which provides a good understanding of what else can be done with it!).
    +All CPUs:

    -
    paul@f0:~ % apcaccess
    -APC      : 001,035,0857
    -DATE     : 2025-01-26 14:43:27 +0200
    -HOSTNAME : f0.lan.buetow.org
    -VERSION  : 3.14.14 (31 May 2016) freebsd
    -UPSNAME  : f0.lan.buetow.org
    -CABLE    : USB Cable
    -DRIVER   : USB UPS Driver
    -UPSMODE  : Stand Alone
    -STARTTIME: 2025-01-26 14:43:25 +0200
    -MODEL    : Back-UPS BX750MI
    -STATUS   : ONLINE
    -LINEV    : 230.0 Volts
    -LOADPCT  : 4.0 Percent
    -BCHARGE  : 100.0 Percent
    -TIMELEFT : 65.3 Minutes
    -MBATTCHG : 5 Percent
    -MINTIMEL : 3 Minutes
    -MAXTIME  : 0 Seconds
    -SENSE    : Medium
    -LOTRANS  : 145.0 Volts
    -HITRANS  : 295.0 Volts
    -ALARMDEL : No alarm
    -BATTV    : 13.6 Volts
    -LASTXFER : Automatic or explicit self test
    -NUMXFERS : 0
    -TONBATT  : 0 Seconds
    -CUMONBATT: 0 Seconds
    -XOFFBATT : N/A
    -SELFTEST : NG
    -STATFLAG : 0x05000008
    -SERIALNO : 9B2414A03599
    -BATTDATE : 2001-01-01
    -NOMINV   : 230 Volts
    -NOMBATTV : 12.0 Volts
    -NOMPOWER : 410 Watts
    -END APC  : 2025-01-26 14:44:06 +0200
    +
    root@freebsd:~ # ubench
    +Unix Benchmark Utility v.0.3
    +Copyright (C) July, 1999 PhysTech, Inc.
    +Author: Sergei Viznyuk <sv@phystech.com>
    +http://www.phystech.com/download/ubench.html
    +FreeBSD 14.2-RELEASE-p1 FreeBSD 14.2-RELEASE-p1 GENERIC amd64
    +Ubench CPU:  2652857
    +swap_pager: out of swap space
    +swp_pager_getswapspace(27): failed
    +swap_pager: out of swap space
    +swp_pager_getswapspace(18): failed
    +Apr  4 23:02:43 freebsd kernel: pid 862 (ubench), jid 0, uid 0, was killed: failed to reclaim memory
    +swp_pager_getswapspace(6): failed
    +Apr  4 23:02:46 freebsd kernel: pid 863 (ubench), jid 0, uid 0, was killed: failed to reclaim memory
    +Apr  4 23:02:47 freebsd kernel: pid 864 (ubench), jid 0, uid 0, was killed: failed to reclaim memory
    +Apr  4 23:02:48 freebsd kernel: pid 865 (ubench), jid 0, uid 0, was killed: failed to reclaim memory
    +Apr  4 23:02:49 freebsd kernel: pid 861 (ubench), jid 0, uid 0, was killed: failed to reclaim memory
    +Apr  4 23:02:51 freebsd kernel: pid 839 (ubench), jid 0, uid 0, was killed: failed to reclaim memory
     

    -

    APC Info on Partner Nodes:


    +The multi-CPU benchmark in the Bhyve VM ran with almost identical results to the FreeBSD host system. However, the memory benchmark failed with out-of-swap space errors. I am unsure why, as the VM has 14GB RAM, but I am not investigating further.

    -So far, so good. Host f0 would shut down itself when short on power. But what about the f1 and f2 nodes? They aren't connected directly to the UPS and, therefore, wouldn't know that their power is about to be cut off. For this, apcupsd running on the f1 and f2 nodes can be configured to retrieve UPS information via the network from the apcupsd server running on the f0 node, which is connected directly to the APC via USB.
    +Also, during the benchmark, I noticed the bhyve process on the host was constantly using 399% of the CPU (all 4 CPUs).

    -Of course, this won't work when f0 is down. In this case, no operational node would be connected to the UPS via USB; therefore, the current power status would not be known. However, I consider this a rare circumstance. Furthermore, in case of an f0 system crash, sudden power outages on the two other nodes would occur at different times making real data loss (the main concern here) less likely.
    +
    +  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
    + 7449 root         14  20    0    14G    78M kqread   2   2:12 399.81% bhyve
    +

    -And if f0 is down and f1 and f2 receive new data and crash midway, it's likely that a client (e.g., an Android app or another laptop) still has the data stored on it, making data recoverable and data loss overall nearly impossible. I'd receive an alert if any of the nodes go down (more on monitoring later in this blog series).
    +Overall, Bhyve has a small overhead, but the CPU performance difference is negligible. The FreeBSD host is slightly faster than the FreeBSD VM running on Bhyve, but the difference is small enough for our use cases. The memory benchmark seems slightly off, but I'm not sure whether to trust it, especially due to the swap errors. Does ubench's memory benchmark use swap space for the memory test? That wouldn't make sense and might explain the difference to some degree, though. Do you have any ideas?

    -

    Installation on partners


    +

    Rocky Linux VM @ Bhyve ubench benchmark



    -To do this, I installed apcupsd via doas pkg install apcupsd on f1 and f2, and then I could connect to it this way:
    +Unfortunately, I wasn't able to find ubench in any of the Rocky Linux repositories. So, I skipped this test.
    +
    +

    Update: Improving Disk I/O Performance for etcd


    +
    +Updated: Fri 26 Dec 08:51:23 EET 2025
    +
    +After running k3s for some time, I noticed frequent etcd leader elections and "apply request took too long" warnings in the logs. Investigation revealed that etcd's sync writes were extremely slow - around 250 kB/s with the default virtio-blk disk emulation. etcd requires fast sync writes (ideally under 10ms fsync latency) for stable operation.
    +
    +

    The Problem


    +
    +The k3s logs showed etcd struggling with disk I/O:
    +
    +
    +{"level":"warn","msg":"apply request took too long","took":"4.996516657s","expected-duration":"100ms"}
    +{"level":"warn","msg":"slow fdatasync","took":"1.328469363s","expected-duration":"1s"}
    +
    +
    +A simple sync write benchmark confirmed the issue:

    -
    paul@f1:~ % apcaccess -h f0.lan.buetow.org | grep Percent
    -LOADPCT  : 12.0 Percent
    -BCHARGE  : 94.0 Percent
    -MBATTCHG : 5 Percent
    +
    [root@r0 ~]# dd if=/dev/zero of=/tmp/test bs=4k count=2000 oflag=dsync
    +8192000 bytes copied, 31.7058 s, 258 kB/s
     

    -But I want the daemon to be configured and enabled in such a way that it connects to the master UPS node (the one with the UPS connected via USB) so that it can also initiate a system shutdown when the UPS battery reaches low levels. For that, apcupsd itself needs to be aware of the UPS status.
    +

    The Solution: Switch to NVMe Emulation



    -On f1 and f2, I changed the configuration to use f0 (where apcupsd is listening) as a remote device. I also changed the MINUTES setting from 3 to 6 and the BATTERYLEVEL setting from 5 to 10 to ensure that the f1 and f2 nodes could still connect to the f0 node for UPS information before f0 decides to shut down itself. So f1 and f2 must shut down earlier than f0:
    +Bhyve's NVMe emulation provides significantly better I/O performance than virtio-blk.
    +
    +

    Step 1: Prepare the Guest OS


    +
    +Before changing the disk type, the guest needs NVMe drivers in the initramfs and LVM must be configured to scan all devices (not just those recorded during installation):

    -
    paul@f2:/usr/local/etc/apcupsd % diff -u apcupsd.conf.sample apcupsd.conf
    ---- apcupsd.conf.sample 2024-11-01 16:40:42.000000000 +0200
    -+++ apcupsd.conf        2025-01-26 15:52:45.108469000 +0200
    -@@ -31,7 +31,7 @@
    - #     940-1524C, 940-0024G, 940-0095A, 940-0095B,
    - #     940-0095C, 940-0625A, M-04-02-2000
    - #
    --UPSCABLE smart
    -+UPSCABLE ether
    -
    - # To get apcupsd to work, in addition to defining the cable
    - # above, you must also define a UPSTYPE, which corresponds to
    -@@ -52,7 +52,6 @@
    - #                            Network Information Server. This is used if the
    - #                            UPS powering your computer is connected to a
    - #                            different computer for monitoring.
    --#
    - # snmp      hostname:port:vendor:community
    - #                            SNMP network link to an SNMP-enabled UPS device.
    - #                            Hostname is the ip address or hostname of the UPS
    -@@ -88,8 +87,8 @@
    - #                            that apcupsd binds to that particular unit
    - #                            (helpful if you have more than one USB UPS).
    - #
    --UPSTYPE apcsmart
    --DEVICE /dev/usv
    -+UPSTYPE net
    -+DEVICE f0.lan.buetow.org:3551
    -
    - # POLLTIME <int>
    - #   Interval (in seconds) at which apcupsd polls the UPS for status. This
    -@@ -147,12 +146,12 @@
    - # If during a power failure, the remaining battery percentage
    - # (as reported by the UPS) is below or equal to BATTERYLEVEL,
    - # apcupsd will initiate a system shutdown.
    --BATTERYLEVEL 5
    -+BATTERYLEVEL 10
    -
    - # If during a power failure, the remaining runtime in minutes
    - # (as calculated internally by the UPS) is below or equal to MINUTES,
    - # apcupsd, will initiate a system shutdown.
    --MINUTES 3
    -+MINUTES 6
    -
    - # If during a power failure, the UPS has run on batteries for TIMEOUT
    - # many seconds or longer, apcupsd will initiate a system shutdown.
    +
    [root@r0 ~]# cat > /etc/dracut.conf.d/nvme.conf << EOF
    +add_drivers+=" nvme nvme_core "
    +hostonly=no
    +EOF
     
    +[root@r0 ~]# sed -i 's/# use_devicesfile = 1/use_devicesfile = 0/' /etc/lvm/lvm.conf
    +[root@r0 ~]# dracut -f
    +[root@r0 ~]# shutdown -h now
     
    -So I also ran the following commands on f1 and f2:

    - -
    paul@f1:/usr/local/etc/apcupsd % doas sysrc apcupsd_enable=YES
    -apcupsd_enable:  -> YES
    -paul@f1:/usr/local/etc/apcupsd % doas service apcupsd start
    -Starting apcupsd.
    -
    +The hostonly=no setting ensures the initramfs includes drivers for hardware not currently present. The use_devicesfile = 0 tells LVM to scan all block devices rather than only those recorded in /etc/lvm/devices/system.devices - this is important because the device path changes from /dev/vda to /dev/nvme0n1.

    -And then I was able to connect to localhost via the apcaccess command:
    +

    Step 2: Update the Bhyve Configuration


    +
    +On the FreeBSD host, update the VM configuration to use NVMe:

    -
    paul@f1:~ % doas apcaccess | grep Percent
    -LOADPCT  : 5.0 Percent
    -BCHARGE  : 95.0 Percent
    -MBATTCHG : 5 Percent
    +
    paul@f0:~ % doas vm stop rocky
    +paul@f0:~ % doas vm configure rocky
     

    -

    Power outage simulation


    -
    -

    Pulling the plug


    -
    -I simulated a power outage by removing the power input from the APC. Immediately, the following message appeared on all the nodes:
    +Change disk0_type from virtio-blk to nvme:

    -Broadcast Message from root@f0.lan.buetow.org
    -        (no tty) at 15:03 EET...
    -
    -Power failure. Running on UPS batteries.                                              
    +disk0_type="nvme"
     

    -I ran the following command to confirm the available battery time:
    +Then start the VM:

    -
    paul@f0:/usr/local/etc/apcupsd % apcaccess -p TIMELEFT
    -63.9 Minutes
    +
    paul@f0:~ % doas vm start rocky
     

    -And after around one hour (f1 and f2 a bit earlier, f0 a bit later due to the different BATTERYLEVEL and MINUTES settings outlined earlier), the following broadcast was sent out:
    +

    Benchmark Results



    -
    -Broadcast Message from root@f0.lan.buetow.org
    -        (no tty) at 15:08 EET...
    -
    -        *** FINAL System shutdown message from root@f0.lan.buetow.org ***
    -
    -System going down IMMEDIATELY
    -
    -apcupsd initiated shutdown
    -
    +After switching to NVMe emulation, the sync write performance improved dramatically:

    -And all the nodes shut down safely before the UPS ran out of battery!
    + +
    [root@r0 ~]# dd if=/dev/zero of=/tmp/test bs=4k count=2000 oflag=dsync
    +8192000 bytes copied, 0.330718 s, 24.8 MB/s
    +

    -

    Restoring power


    +That's approximately **100x faster** than before (24.8 MB/s vs 258 kB/s).

    -After restoring power, I checked the logs in /var/log/daemon.log and found the following on all 3 nodes:
    +The etcd metrics also showed healthy fsync latencies:

    -Jan 26 17:36:24 f2 apcupsd[2159]: Power failure.
    -Jan 26 17:36:30 f2 apcupsd[2159]: Running on UPS batteries.
    -Jan 26 17:36:30 f2 apcupsd[2159]: Battery charge below low limit.
    -Jan 26 17:36:30 f2 apcupsd[2159]: Initiating system shutdown!
    -Jan 26 17:36:30 f2 apcupsd[2159]: User logins prohibited
    -Jan 26 17:36:32 f2 apcupsd[2159]: apcupsd exiting, signal 15
    -Jan 26 17:36:32 f2 apcupsd[2159]: apcupsd shutdown succeeded
    +etcd_disk_wal_fsync_duration_seconds_bucket{le="0.001"} 347
    +etcd_disk_wal_fsync_duration_seconds_bucket{le="0.002"} 396
    +etcd_disk_wal_fsync_duration_seconds_bucket{le="0.004"} 408
     

    -All good :-)
    +Most fsyncs now complete in under 1ms, and there are no more "slow fdatasync" warnings in the logs. The k3s cluster is now stable without spurious leader elections.
    +
    +

    Important Notes



    +
      +
    • Do NOT use disk0_opts="nocache,direct" with NVMe emulation - in my testing this actually made performance worse.
    • +
    • The guest OS must have NVMe drivers in the initramfs before switching, otherwise it won't boot.
    • +
    • LVM's devices file feature (enabled by default in RHEL 9 / Rocky Linux 9) must be disabled to allow booting from a different device path.
    • +

    Conclusion



    -I have the same UPS (but with a bit more capacity) for my main work setup, which powers my 28" screen, music equipment, etc. It has already been helpful a couple of times during power outages here, so I am sure that the smaller UPS for the F3s setup will be of great use.
    +Having Linux VMs running inside FreeBSD's Bhyve is a solid move for future f3s hosting in my home lab. Bhyve provides a reliable way to manage VMs without much hassle. With Linux VMs, I can tap into all the cool stuff (e.g., Kubernetes, eBPF, systemd) in the Linux world while keeping the steady reliability of FreeBSD.
    +
    +Future uses (out of scope for this blog series) would be additional VMs for different workloads. For example, how about a Windows or NetBSD VM to tinker with?
    +
    +This flexibility is great for keeping options open and managing different workloads without overcomplicating things. Overall, it's a nice setup for getting the most out of my hardware and keeping things running smoothly.

    Read the next post of this series:

    -f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    +f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network

    -Other BSD related posts are:
    +Other *BSD-related posts:

    2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability
    2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    -2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    -2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts (You are currently reading this)
    +2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs (You are currently reading this)
    +2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    2024-04-01 KISS high-availability with OpenBSD
    @@ -16201,553 +16077,1125 @@ Jan 26 17:36:32 f2 apcupsd[2159]: apcupsd shutdown succeeded 2022-07-30 Let's Encrypt with OpenBSD and Rex
    2016-04-09 Jails and ZFS with Puppet on FreeBSD

    -E-Mail your comments to paul@nospam.buetow.org :-)
    +E-Mail your comments to paul@nospam.buetow.org

    Back to the main site
    - Working with an SRE Interview - - gemini://foo.zone/gemfeed/2025-01-15-working-with-an-sre-interview.gmi - 2025-01-15T00:16:04+02:00 + Sharing on Social Media with Gos v1.0.0 + + gemini://foo.zone/gemfeed/2025-03-05-sharing-on-social-media-with-gos.gmi + 2025-03-04T21:22:07+02:00 Paul Buetow aka snonux paul@dev.buetow.org - I have been interviewed by Florian Buetow on `cracking-ai-engineering.com` about what it's like working with a Site Reliability Engineer from the point of view of a Software Engineer, Data Scientist, and AI Engineer. + As you may have noticed, I like to share on Mastodon and LinkedIn all the technical things I find interesting, and this blog post is technically all about that.
    -

    Working with an SRE Interview


    -
    -Published at 2025-01-15T00:16:04+02:00
    +

    Sharing on Social Media with Gos v1.0.0



    -I have been interviewed by Florian Buetow on cracking-ai-engineering.com about what it's like working with a Site Reliability Engineer from the point of view of a Software Engineer, Data Scientist, and AI Engineer.
    +Published at 2025-03-04T21:22:07+02:00

    -See original interview here
    -Cracking AI Engineering
    +As you may have noticed, I like to share on Mastodon and LinkedIn all the technical things I find interesting, and this blog post is technically all about that.

    -Below, I am posting the interview here on my blog as well.
    +Gos logo

    Table of Contents




    -

    Preamble


    +

    Introduction



    -Florian from Cracking AI Engineering interviewed me about my work as a Principal SRE at Mimecast. We talked about what an Embedded SRE actually does, automation, observability, incident management, and how to work well with an SRE — whether you're a developer, data scientist, or manager.
    +Gos is a Go-based replacement (which I wrote) for Buffer.com, providing the ability to schedule and manage social media posts from the command line. It can be run, for example, every time you open a new shell or only once every N hours when you open a new shell.

    -

    Introducing Paul


    +I used Buffer.com to schedule and post my social media messages for a long time. However, over time, there were more problems with that service, including a slow and unintuitive UI, and the free version only allows scheduling up to 10 messages. At one point, they started to integrate an AI assistant (which would seemingly randomly pop up in separate JavaScript-powered input boxes), and then I had enough and decided I had to build my own social sharing tool—and Gos was born.

    -Hi Paul, please introduce yourself briefly to the audience. Who are you, what do you do for a living, and where do you work?
    +https://buffer.com
    +https://codeberg.org/snonux/gos

    -My name is Paul Bütow, I work at Mimecast, and I’m a Principal Site Reliability Engineer there. I’ve been with Mimecast for almost ten years now. The company specializes in email security, including things like archiving, phishing detection, malware protection, and spam filtering.
    +

    Gos features



    -You mentioned that you’re an ‘Embedded SRE.’ What does that mean exactly?
    +
      +
    • Mastodon and LinkedIn support.
    • +
    • Dry run mode for testing posts without actually publishing.
    • +
    • Configurable via flags and environment variables.
    • +
    • Easy to integrate into automated workflows.
    • +
    • OAuth2 authentication for LinkedIn.
    • +
    • Image previews for LinkedIn posts.
    • +

    +

    Installation



    -It means that I’m directly part of the software engineering team, not in a separate Ops department. I ensure that nothing is deployed manually, and everything runs through automation. I also set up monitoring and observability. These are two distinct aspects: monitoring alerts us when something breaks, while observability helps us identify trends. I also create runbooks so we know what to do when specific incidents occur frequently.
    +

    Prequisites



    -Infrastructure SREs on the other hand handle the foundational setup, like providing the Kubernetes cluster itself or ensuring the operating systems are installed. They don't work on the application directly but ensure the base infrastructure is there for others to use. This works well when a company has multiple teams that need shared infrastructure.
    +The prerequisites are:

    -

    How did you get started?


    +
      +
    • Go (version 1.24 or later)
    • +
    • Supported browsers like Firefox, Chrome, etc for oauth2.
    • +

    +

    Build and install



    -How did your interest in Linux or FreeBSD start?
    +Clone the repository:

    -It began during my school days. We had a PC with DOS at home, and I eventually bought Suse Linux 5.3. Shortly after, I discovered FreeBSD because I liked its handbook so much. I wanted to understand exactly how everything worked, so I also tried Linux from Scratch. That involves installing every package manually to gain a better understanding of operating systems.
    + +
    git clone https://codeberg.org/snonux/gos.git
    +cd gos
    +

    -https://www.FreeBSD.org
    -https://linuxfromscratch.org/
    +Build the binaries:

    -And after school, you pursued computer science, correct?
    + +
    go build -o gos ./cmd/gos
    +go build -o gosc ./cmd/gosc
    +sudo mv gos ~/go/bin
    +sudo mv gosc ~/go/bin
    +

    -Exactly. I wasn’t sure at first whether I wanted to be a software developer or a system administrator. I applied for both and eventually accepted an offer as a Linux system administrator. This was before 'SRE' became a buzzword, but much of what I did back then-automation, infrastructure as code, monitoring-is now considered part of the typical SRE role.
    +Or, if you want to use the Taskfile:

    -

    Roles and Career Progression


    + +
    go-task install
    +

    -Tell us about how you joined Mimecast. When did you fully embrace the SRE role?
    +

    Configuration



    -I started as a Linux sysadmin at 1&1. I managed an ad server farm with hundreds of systems and later handled load balancers. Together with an architect, we managed F5 load balancers distributing around 2,000 services, including for portals like web.de and GMX. I also led the operations team technically for a while before moving to London to join Mimecast.
    +Gos requires a configuration file to store API secrets and OAuth2 credentials for each supported social media platform. The configuration is managed using a Secrets structure, which is stored as a JSON file in ~/.config/gos/gos.json.

    -At Mimecast, the job title was explicitly 'Site Reliability Engineer.' The biggest difference was that I was no longer in a separate Ops department but embedded directly within the storage and search backend team. I loved that because we could plan features together-from automation to measurability and observability. Mimecast also operates thousands of physical servers for email archiving, which was fascinating since I already had experience with large distributed systems at 1&1. It was the right step for me because it allowed me to work close to the code while remaining hands-on with infrastructure.
    +Example Configuration File (~/.config/gos/gos.json):

    -What are the differences between SRE, DevOps, SysAdmin, and Architects?
    + +
    {
    +  "MastodonURL": "https://mastodon.example.com",
    +  "MastodonAccessToken": "your-mastodon-access-token",
    +  "LinkedInClientID": "your-linkedin-client-id",
    +  "LinkedInSecret": "your-linkedin-client-secret",
    +  "LinkedInRedirectURL": "http://localhost:8080/callback",
    +}
    +
    +
    +

    Configuration fields


    +
    +
      +
    • MastodonURL: The base URL of the Mastodon instance you are using (e.g., https://mastodon.social).
    • +
    • MastodonAccessToken: Your access token for the Mastodon API, which is used to authenticate your posts.
    • +
    • LinkedInClientID: The client ID for your LinkedIn app, which is needed for OAuth2 authentication.
    • +
    • LinkedInSecret: The client secret for your LinkedIn app.
    • +
    • LinkedInRedirectURL: The redirect URL configured for handling OAuth2 responses.
    • +
    • LinkedInAccessToken: Gos will automatically update this after successful OAuth2 authentication with LinkedIn.
    • +
    • LinkedInPersonID: Gos will automatically update this after successful OAuth2 authentication with LinkedIn.
    • +

    +

    Automatically managed fields


    +
    +Once you finish the OAuth2 setup (after the initial run of gos), some fields—like LinkedInAccessToken and LinkedInPersonID will get filled in automatically. To check if everything's working without actually posting anything, you can run the app in dry run mode with the --dry option. After OAuth2 is successful, the file will be updated with LinkedInClientID and LinkedInAccessToken. If the access token expires, it will go through the OAuth2 process again.
    +
    +

    Invoking Gos


    +
    +Gos is a command-line tool for posting updates to multiple social media platforms. You can run it with various flags to customize its behaviour, such as posting in dry run mode, limiting posts by size, or targeting specific platforms.
    +
    +Flags control the tool's behavior. Below are several common ways to invoke Gos and descriptions of the available flags.
    +
    +

    Common flags


    +
    +
      +
    • -dry: Run the application in dry run mode, simulating operations without making any changes.
    • +
    • -version: Display the current version of the application.
    • +
    • -compose: Compose a new entry. Default is set by composeEntryDefault.
    • +
    • -gosDir: Specify the directory for Gos' queue and database files. The default is ~/.gosdir.
    • +
    • —cacheDir: Specify the directory for Gos' cache. The default is based on the gosDir path.
    • +
    • -browser: Choose the browser for OAuth2 processes. The default is "firefox".
    • +
    • -configPath: Path to the configuration file. Default is ~/.config/gos/gos.json.
    • +
    • —platforms: The enabled platforms and their post size limits. The default is "Mastodon:500,LinkedIn:1000."
    • +
    • -target: Target number of posts per week. The default is 2.
    • +
    • -minQueued: Minimum number of queued items before a warning message is printed. The default is 4.
    • +
    • -maxDaysQueued: Maximum number of days' worth of queued posts before the target increases and pauseDays decreases. The default is 365.
    • +
    • -pauseDays: Number of days until the next post can be submitted. The default is 3.
    • +
    • -runInterval: Number of hours until the next post run. The default is 12.
    • +
    • —lookback: The number of days to look back in time to review posting history. The default is 30.
    • +
    • -geminiSummaryFor: Generate a Gemini Gemtext format summary specifying months as a comma-separated string.
    • +
    • -geminiCapsules: Comma-separated list of Gemini capsules. Used to detect Gemtext links.
    • +
    • -gemtexterEnable: Add special tags for Gemtexter, the static site generator, to the Gemini Gemtext summary.
    • +
    • -dev: For internal development purposes only.
    • +

    +

    Examples


    +
    +*Dry run mode*
    +
    +Dry run mode lets you simulate the entire posting process without actually sending the posts. This is useful for testing configurations or seeing what would happen before making real posts.
    +
    + +
    ./gos --dry
    +
    +
    +*Normal run*
    +
    +Sharing to all platforms is as simple as the following (assuming it is configured correctly):
    +
    + +
    ./gos 
    +
    +
    +:-)
    +
    +Gos Screenshot
    +
    +However, you will notice that no messages are queued to be posted yet (not like on the screenshot yet!). Relax and read on...
    +
    +

    Composing messages to be posted


    +
    +To post messages using Gos, you need to create text files containing the posts' content. These files are placed inside the directory specified by the --gosDir flag (the default directory is ~/.gosdir). Each text file represents a single post and must have the .txt extension. You can also simply run gos --compose to compose a new entry. It will open simply a new text file in gosDir.
    +
    +

    Basic structure of a message file


    +
    +Each text file should contain the message you want to post on the specified platforms. That's it. Example of a Basic Post File ~/.gosdir/samplepost.txt:
    +
    +
    +This is a sample message to be posted on social media platforms.
    +
    +Maybe add a link here: https://foo.zone
    +
    +#foo #cool #gos #golang
    +
    +
    +The message is just arbitrary text, and, besides inline share tags (see later in this document) at the beginning, Gos does not parse any of the content other than ensuring the overall allowed size for the social media platform isn't exceeded. If it exceeds the limit, Gos will prompt you to edit the post using your standard text editor (as specified by the EDITOR environment variable). When posting, all the hyperlinks, hashtags, etc., are interpreted by the social platforms themselves (e.g., Mastodon, LinkedIn).
    +
    +

    Adding share tags in the filename


    +
    +You can control which platforms a post is shared to, and manage other behaviors using tags embedded in the filename. Add tags in the format share:platform1.-platform2 to target specific platforms within the filename. This instructs Gos to share the message only to platform1 (e.g., Mastodon) and explicitly exclude platform2 (e.g., LinkedIn). You can include multiple platforms by listing them after share:, separated by a .. Use the - symbol to exclude a platform.

    -SREs are like the next step after SysAdmins. A SysAdmin might manually install servers, replace disks, or use simple scripts for automation, while SREs use infrastructure as code and focus on reliability through SLIs, SLOs, and automation. DevOps isn’t really a job-it’s more of a way of working, where developers are involved in operations tasks like setting up CI/CD pipelines or on-call shifts. Architects focus on designing systems and infrastructures, such as load balancers or distributed systems, working alongside SREs to ensure the systems meet the reliability and scalability requirements. The specific responsibilities of each role depend on the company, and there is often overlap.
    +Currently, only linkedin and mastodon are supported, and the shortcuts li and ma also work.

    -What are the most important reliability lessons you’ve learned so far?
    +**Examples:**

      -
    • Don’t leave SRE aspects as an afterthought. It’s much better to discuss automation, monitoring, SLIs, and SLOs early on. Traditional sysadmins often installed systems manually, but today, we do everything via infrastructure as code-using tools like Terraform or Puppet.
    • -
    • I also distinguish between monitoring and observability. Monitoring tells us, 'The server is down, alarm!' Observability dives deeper, showing trends like increasing latency so we can act proactively.
    • -
    • SLI, SLO, and SLA are core elements. We focus on what users actually experience-for example, how quickly an email is sent-and set our goals accordingly.
    • -
    • Runbooks are also crucial. When something goes wrong at night, you don’t want to start from scratch. A runbook outlines how to debug and resolve specific problems, saving time and reducing downtime.
    • +
    • To share only on Mastodon: ~/.gosdir/foopost.share:mastodon.txt
    • +
    • To exclude sharing on LinkedIn: ~/.gosdir/foopost.share:-linkedin.txt
    • +
    • To explicitly share on both LinkedIn and Mastodon: ~/.gosdir/foopost.share:linkedin:mastodon.txt
    • +
    • To explicitly share only on LinkedIn and exclude Mastodon: ~/.gosdir/foopost.share:linkedin:-mastodon.txt

    -

    Anecdotes and Best Practices


    +Besides encoding share tags in the filename, they can also be embedded within the .txt file content to be queued. For example, a file named ~/.gosdir/foopost.txt with the following content:

    -Runbooks sound very practical. Can you explain how they’re used day-to-day?
    +
    +share:mastodon The content of the post here
    +

    -Runbooks are essentially guides for handling specific incidents. For instance, if a service won’t start, the runbook will specify where the logs are and which commands to use. Observability takes it a step further, helping us spot changes early-like rising error rates or latency-so we can address issues before they escalate.
    +or

    -When should you decide to put something into a runbook, and when is it unnecessary?
    +
    +share:mastodon
    +
    +The content of the post is here https://some.foo/link
    +
    +#some #hashtags
    +

    -If an issue happens frequently, it should be documented in a runbook so that anyone, even someone new, can follow the steps to fix it. The idea is that 90% of the common incidents should be covered. For example, if a service is down, the runbook would specify where to find logs, which commands to check, and what actions to take. On the other hand, rare or complex issues, where the resolution depends heavily on context or varies each time, don’t make sense to include in detail. For those, it’s better to focus on general troubleshooting steps.
    +Gos will parse this content, extract the tags, and queue it as ~/.gosdir/db/platforms/mastodon/foopost.share:mastodon.extracted.txt.... (see how post queueing works later in this document).

    -How do you search for and find the correct runbooks?
    +

    Using the prio tag



    -Runbooks should be linked directly in the alert you receive. For example, if you get an alert about a service not running, the alert will have a link to the runbook that tells you what to check, like logs or commands to run. Runbooks are best stored in an internal wiki, so if you don’t find the link in the alert, you know where to search. The important thing is that runbooks are easy to find and up to date because that’s what makes them useful during incidents.
    +Gos randomly picks any queued message without any specific order or priority. However, you can assign a higher priority to a message. The priority determines the order in which posts are processed, with messages without a priority tag being posted last and those with priority tags being posted first. If multiple messages have the priority tag, then a random message will be selected from them.

    -Do you have an interesting war story you can share with us?
    +*Examples using the Priority tag:*

    -Sure. At 1&1, we had a proprietary ad server software that ran a SQL query during startup. The query got slower over time, eventually timing out and preventing the server from starting. Since we couldn’t access the source code, we searched the binary for the SQL and patched it. By pinpointing the issue, a developer was able to adjust the SQL. This collaboration between sysadmin and developer perspectives highlights the value of SRE work.
    +
      +
    • To share only on Mastodon: ~/.gosdir/foopost.prio.share:mastodon.txt
    • +
    • To not share on LinkedIn: ~/.gosdir/foopost.prio.share:-linkedin.txt
    • +
    • To explicitly share on both: ~/.gosdir/foopost.prio.share:linkedin:mastodon.txt
    • +
    • To explicitly share on only linkedin: ~/.gosdir/foopost.prio.share:linkedin:-mastodon.txt
    • +

    +There is more: you can also use the soon tag. It is almost the same as the prio tag, just with one lower priority.

    -

    Working with Different Teams


    +

    More tags



    -You’re embedded in a team-how does collaboration with developers work practically?
    +
      +
    • A .ask. in the filename will prompt you to choose whether to queue, edit, or delete a file before queuing it.
    • +
    • A .now. in the filename will schedule a post immediately, regardless of the target status.
    • +

    +So you could also have filenames like those:

    -We plan everything together from the start. If there’s a new feature, we discuss infrastructure, automated deployments, and monitoring right away. Developers are experts in the code, and I bring the infrastructure expertise. This avoids unpleasant surprises before going live.
    +
      +
    • ~/.gosdir/foopost.ask.txt
    • +
    • ~/.gosdir/foopost.now.txt
    • +
    • ~/.gosdir/foopost.ask.share:mastodon.txt
    • +
    • ~/.gosdir/foopost.ask.prio.share:mastodon.txt
    • +
    • ~/.gosdir/foopost.ask.now.share:-mastodon.txt
    • +
    • ~/.gosdir/foopost.now.share:-linkedin.txt
    • +

    +etc...

    -How about working with data scientists or ML engineers? Are there differences?
    +All of the above also works with embedded tags. E.g.:

    -The principles are the same. ML models also need to be deployed and monitored. You deal with monitoring, resource allocation, and identifying performance drops. Whether it’s a microservice or an ML job, at the end of the day, it’s all running on servers or clusters that must remain stable.
    +
    +share:mastodon,ask,prio Hello wold :-)
    +

    -What about working with managers or the FinOps team?
    +or

    -We often discuss costs, especially in the cloud, where scaling up resources is easy. It’s crucial to know our metrics: do we have enough capacity? Do we need all instances? Or is the CPU only at 5% utilization? This data helps managers decide whether the budget is sufficient or if optimizations are needed.
    +
    +share:mastodon,ask,prio
    +
    +Hello World :-)
    +

    -Do you have practical tips for working with SREs?
    +

    The gosc binary



    -Yes, I have a few:
    +gosc stands for Gos Composer and will simply launch your $EDITOR on a new text file in the gosDir. It's the same as running gos --compose, really. It is a quick way of composing new posts. Once composed, it will ask for your confirmation on whether the message should be queued or not.

    -
      -
    • Early involvement: Include SREs from the beginning in your project.
    • -
    • Runbooks & documentation: Document recurring errors.
    • -
    • Try first: Try to understand the issue yourself before immediately asking the SRE.
    • -
    • Basic infra knowledge: Kubernetes and Terraform aren’t magic. Some basic understanding helps every developer.
    • -

    -

    Using AI Tools


    +

    How queueing works in gos



    -Let’s talk about AI. How do you use it in your daily work?
    +When you place a message file in the gosDir, Gos processes it by moving the message through a queueing system before posting it to the target social media platforms. A message's lifecycle includes several key stages, from creation to posting, all managed through the ./db/platforms/PLATFORM directories.

    -For boilerplate code, like Terraform snippets, I often use ChatGPT. It saves time, although I always review and adjust the output. Log analysis is another exciting application. Instead of manually going through millions of lines, AI can summarize key outliers or errors.
    +

    Step-by-step queueing process



    -Do you think AI could largely replace SREs or significantly change the role?
    +1. Inserting a Message into gosDir: You start by creating a text file that represents your post (e.g., foo.txt) and placing it in the gosDir. When Gos runs, this file is processed. The easiest way is to use gosc here.

    -I see AI as an additional tool. SRE requires a deep understanding of how distributed systems work internally. While AI can assist with routine tasks or quickly detect anomalies, human expertise is indispensable for complex issues.
    +2. Moving to the Queue: Upon running Gos, the tool identifies the message in the gosDir and places it into the queue for the specified platform. The message is moved into the appropriate directory for each platform in ./db/platforms/PLATFORM. During this stage, the message file is renamed to include a timestamp indicating when it was queued and given a .queued extension.

    -

    SRE Learning Resources


    +*Example: If a message is queued for LinkedIn, the filename might look like this:*

    -What resources would you recommend for learning about SRE?
    +
    +~/.gosdir/db/platforms/linkedin/foo.share:-mastodon.txt.20241022-102343.queued
    +

    -The Google SRE book is a classic, though a bit dry. I really like 'Seeking SRE,' as it offers various perspectives on SRE, with many practical stories from different companies.
    +3. Posting the Message: Once a message is placed in the queue, Gos posts it to the specified social media platforms.

    -https://sre.google/books/
    -Seeking SRE
    +4. Renaming to .posted: After a message is successfully posted to a platform, the corresponding .queued file is renamed to have a .posted extension, and the filename timestamp is also updated. This signals that the post has been processed and published.

    -Do you have a podcast recommendation?
    +*Example - After a successful post to LinkedIn, the message file might look like this:*

    -The Google SRE prodcast is quite interesting. It offers insights into how Google approaches SRE, along with perspectives from external guests.
    +
    +./db/platforms/linkedin/foo.share:-mastodon.txt.20241112-121323.posted
    +

    -https://sre.google/prodcast/
    +

    How message selection works in gos



    -

    Blogging


    +Gos decides which messages to post using a combination of priority, platform-specific tags, and timing rules. The message selection process ensures that messages are posted according to your configured cadence and targets while respecting pauses between posts and previously met goals.

    -You also have a blog. What motivates you to write regularly?
    +The key factors in message selection are:

    -Writing helps me learn the most. It also serves as a personal reference. Sometimes I look up how I solved a problem a year ago. And of course, others tackling similar projects might find inspiration in my posts.
    +
      +
    • Target Number of Posts Per Week: The -target flag defines how many posts per week should be made to a specific platform. This target helps Gos manage the posting rate, ensuring that the right number of posts are made without exceeding the desired frequency.
    • +
    • Post History Lookback: The -lookback flag tells Gos how many days back to look in the post history to calculate whether the weekly post target has already been met. It ensures that previously posted content is considered before deciding to queue up another message.
    • +
    • Message Priority: Messages with no priority value are processed after those with priority. If two messages have the same priority, one is selected randomly.
    • +
    • Pause Between Posts: The -pauseDays flag allows you to specify a minimum number of days to wait between posts for the same platform. This prevents oversaturation of content and ensures that posts are spread out over time.
    • +

    +

    Database replication



    -What do you blog about?
    +I simply use Syncthing to backup/sync my gosDir. Note, that I run Gos on my personal laptop. No need to run it from a server.

    -Mostly technical topics I find exciting, like homelab projects, Kubernetes, or book summaries on IT and productivity. It’s a personal blog, so I write about what I enjoy.
    +https://syncthing.net

    -

    Wrap-up


    +

    Post summary as gemini gemtext



    -To wrap up, what are three things every team should keep in mind for stability?
    +For my blog, I want to post a summary of all the social messages posted over the last couple of months. For an example, have a look here:

    -First, maintain runbooks and documentation to avoid chaos at night. Second, automate everything-manual installs in production are risky. Third, define SLIs, SLOs, and SLAs early so everyone knows what we’re monitoring and guaranteeing.
    +./2025-01-01-posts-from-october-to-december-2024.html

    -Is there a motto or mindset that particularly inspires you as an SRE?
    +To accomplish this, run:

    -"Keep it simple and stupid"-KISS. Not everything has to be overly complex. And always stay curious. I’m still fascinated by how systems work under the hood.
    + +
    gos --geminiSummaryFor 202410,202411,202412
    +

    -Where can people find you online?
    +This outputs the summary for the three specified months, as shown in the example. The summary includes posts from all social media networks but removes duplicates.

    -You can find links to my socials on my website paul.buetow.org
    -I regularly post articles and link to everything else I’m working on outside of work.
    +Also, add the --gemtexterEnable flag, if you are using Gemtexter:

    -https://paul.buetow.org

    -Thank you very much for your time and this insightful interview into the world of site reliability engineering
    + +
    gos --gemtexterEnable --geminiSummaryFor 202410,202411,202412
    +

    -My pleasure, this was fun.
    +Gemtexter

    -

    Closing comments


    +In case there are HTTP links that translate directly to the Geminispace for certain capsules, specify the Gemini capsules as a comma-separated list as follows:

    -Thanks for reading! Hopefully there’s something useful in here for your own work. Reliable systems are a team effort, after all.
    + +
    gos --gemtexterEnable --geminiSummaryFor 202410,202411,202412 --geminiCapsules "foo.zone,paul.buetow.org"
    +

    -E-Mail your comments to paul@nospam.buetow.org or contact Florian via the Cracking AI Engineering :-)
    +It will then also generate Gemini Gemtext links in the summary page and flag them with (Gemini).
    +
    +

    Conclusion


    +
    +Overall, this was a fun little Go project with practical use for me personally. I hope you also had fun reading this, and maybe you will use it as well.
    +
    +E-Mail your comments to paul@nospam.buetow.org :-)

    Back to the main site
    - Posts from October to December 2024 - - gemini://foo.zone/gemfeed/2025-01-01-posts-from-october-to-december-2024.gmi - 2024-12-31T18:09:58+02:00 + Random Weird Things - Part Ⅱ + + gemini://foo.zone/gemfeed/2025-02-08-random-weird-things-ii.gmi + 2025-02-08T11:06:16+02:00 Paul Buetow aka snonux paul@dev.buetow.org - Happy new year! + Every so often, I come across random, weird, and unexpected things on the internet. I thought it would be neat to share them here from time to time. This is the second run.
    -

    Posts from October to December 2024


    -
    -Published at 2024-12-31T18:09:58+02:00
    -
    -Happy new year!
    -
    -These are my social media posts from the last three months. I keep them here to reflect on them and also to not lose them. Social media networks come and go and are not under my control, but my domain is here to stay.
    +

    Random Weird Things - Part Ⅱ



    -These are from Mastodon and LinkedIn. Have a look at my about page for my social media profiles. This list is generated with Gos, my social media platform sharing tool.
    +Published at 2025-02-08T11:06:16+02:00

    -My about page
    -https://codeberg.org/snonux/gos
    +Every so often, I come across random, weird, and unexpected things on the internet. I thought it would be neat to share them here from time to time. This is the second run.

    -

    Table of Contents


    +2024-07-05 Random Weird Things - Part Ⅰ
    +2025-02-08 Random Weird Things - Part Ⅱ (You are currently reading this)
    +2025-08-15 Random Weird Things - Part Ⅲ

    -
    +
    +
    + + f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts + + gemini://foo.zone/gemfeed/2025-02-01-f3s-kubernetes-with-freebsd-part-3.gmi + 2025-01-30T09:22:06+02:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + This is the third blog post about my f3s series for my self-hosting demands in my home lab. f3s? The 'f' stands for FreeBSD, and the '3s' stands for k3s, the Kubernetes distribution we will use on FreeBSD-based physical machines. + +
    +

    f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts



    -

    Wondering to which #web #browser I should ...


    +Published at 2025-01-30T09:22:06+02:00

    -Wondering to which #web #browser I should switch now personally ...
    +This is the third blog post about my f3s series for my self-hosting demands in my home lab. f3s? The "f" stands for FreeBSD, and the "3s" stands for k3s, the Kubernetes distribution we will use on FreeBSD-based physical machines.

    -www.osnews.com/story/141100/mozilla-fo..-..dvocacy-for-open-web-privacy-and-more/
    +2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    +2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    +2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts (You are currently reading this)
    +2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    +2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    +2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    +2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    +2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability

    -

    eks-node-viewer is a nifty tool, showing the ...


    +f3s logo

    -eks-node-viewer is a nifty tool, showing the compute nodes currently in use in the #EKS cluster. especially useful when dynamically allocating nodes with #karpenter or auto scaling groups.
    +

    Table of Contents



    -github.com/awslabs/eks-node-viewer
    +
    +

    Introduction



    -

    Have put more Photos on - On my static photo ...


    +In this blog post, we are setting up the UPS for the cluster. A UPS, or Uninterruptible Power Supply, safeguards my cluster from unexpected power outages and surges. It acts as a backup battery that kicks in when the electricity cuts out—especially useful in my area, where power cuts are frequent—allowing for a graceful system shutdown and preventing data loss and corruption. This is especially important since I will also store some of my data on the f3s nodes.

    -Have put more Photos on - On my static photo sites - Generated with a #bash script
    +

    Changes since last time



    -irregular.ninja
    +

    FreeBSD upgrade from 14.1 to 14.2



    -

    In Go, passing pointers are not automatically ...


    +There has been a new release since the last blog post in this series. The upgrade from 14.1 was as easy as:

    -In Go, passing pointers are not automatically faster than values. Pointers often force the memory to be allocated on the heap, adding GC overhad. With values, Go can determine whether to put the memory on the stack instead. But with large structs/objects (how you want to call them) or if you want to modify state, then pointers are the semantic to use. #golang
    + +
    paul@f0: ~ % doas freebsd-update fetch
    +paul@f0: ~ % doas freebsd-update install
    +paul@f0: ~ % doas freebsd-update -r 14.2-RELEASE upgrade
    +paul@f0: ~ % doas freebsd-update install
    +paul@f0: ~ % doas shutdown -r now
    +

    -blog.boot.dev/golang/pointers-faster-than-values/
    +And after rebooting, I ran:

    -

    Myself being part of an on-call rotations over ...


    + +
    paul@f0: ~ % doas freebsd-update install
    +paul@f0: ~ % doas pkg update
    +paul@f0: ~ % doas pkg upgrade
    +paul@f0: ~ % doas shutdown -r now
    +

    -Myself being part of an on-call rotations over my whole professional life, just have learned this lesson "Tell people who are new to on-call: Just have fun" :-) This is a neat blog post to read:
    +And after another reboot, I was on 14.2:

    -ntietz.com/blog/what-i-tell-people-new-to-oncall/
    + +
    paul@f0:~ % uname -a
    +FreeBSD f0.lan.buetow.org 14.2-RELEASE FreeBSD 14.2-RELEASE 
    + releng/14.2-n269506-c8918d6c7412 GENERIC amd64
    +

    -

    Feels good to code in my old love #Perl again ...


    +And, of course, I ran this on all 3 nodes!

    -Feels good to code in my old love #Perl again after a while. I am implementing a log parser for generating site stats of my personal homepage! :-) @Perl
    +

    A new home (behind the TV)



    -

    This is an interactive summary of the Go ...


    +I've put all the infrastructure behind my TV, as plenty of space is available. The TV hides most of the setup, which drastically improved the SAF (spouse acceptance factor).

    -This is an interactive summary of the Go release, with a lot of examples utilising iterators in the slices and map packages. Love it! #golang
    +New hardware placement arrangement

    -antonz.org/go-1-23/
    +I got rid of the mini-switch I mentioned in the previous blog post. I have the TP-Link EAP615-Wall mounted on the wall nearby, which is my OpenWrt-powered Wi-Fi hotspot. It also has 3 Ethernet ports, to which I connected the Beelink nodes. That's the device you see at the very top.

    -

    December 2024


    +The Ethernet cables go downward through the cable boxes to the Beelink nodes. In addition to the Beelink f3s nodes, I connected the TP-Link to the UPS as well (not discussed further in this blog post, but the positive side effect is that my Wi-Fi will still work during a power loss for some time—and during a power cut, the Beelink nodes will still be able to communicate with each other).

    -

    Thats unexpected, you cant remove a NaN key ...


    +On the very left (the black box) is the UPS, with four power outlets. Three go to the Beelink nodes, and one goes to the TP-Link. A USB output is also connected to the first Beelink node, f0.

    -Thats unexpected, you cant remove a NaN key from a map without clearing it! #golang
    +On the very right (halfway hidden behind the TV) are the 3 Beelink nodes stacked on top of each other. The only downside (or upside?) is that my 14-month-old daughter is now chaos-testing the Beelink nodes, as the red power buttons (now reachable for her) are very attractive for her to press when passing by randomly. :-) Luckily, that will only cause graceful system shutdowns!

    -unexpected-go.com/you-cant-remove-a-nan-key-from-a-map-without-clearing-it.html
    +

    The UPS hardware



    -

    My second blog post about revamping my home lab ...


    +I wanted a UPS that I could connect to via FreeBSD, and that would provide enough backup power to operate the cluster for a couple of minutes (it turned out to be around an hour, but this time will likely be shortened after future hardware upgrades, like additional drives and a backup enclosure) and to automatically initiate the shutdown of all the f3s nodes.

    -My second blog post about revamping my home lab a little bit just hit the net. #FreeBSD #ZFS #n100 #k8s #k3s #kubernetes
    +I decided on the APC Back-UPS BX750MI model because:

    -foo.zone/gemfeed/2024-12-03-f3s-kubernetes-with-freebsd-part-2.gmi (Gemini)
    -foo.zone/gemfeed/2024-12-03-f3s-kubernetes-with-freebsd-part-2.html
    +
      +
    • Zero noise level when there is no power cut (some light noise when the battery is in operation during a power cut).
    • +
    • Cost: It is relatively affordable (not costing thousands).
    • +
    • USB connectivity: Can be connected via USB to one of the FreeBSD hosts to read the UPS status.
    • +
    • A power output of 750VA (or 410 watts), suitable for an hour of runtime for my f3s nodes (plus the Wi-Fi router).
    • +
    • Multiple power outlets: Can connect all 3 f3s nodes directly.
    • +
    • User-replaceable batteries: I can replace the batteries myself after two years or more (depending on usage).
    • +
    • Its compact design. Overall, I like how it looks.
    • +

    +The APC Back-UPS BX750MI in operation.

    -

    Very insightful article about tech hiring in ...


    +

    Configuring FreeBSD to Work with the UPS



    -Very insightful article about tech hiring in the age of LLMs. As an interviewer, I have experienced some of the scrnarios already first hand...
    +

    USB Device Detection



    -newsletter.pragmaticengineer.com/p/how-genai-changes-tech-hiring
    +Once plugged in via USB on FreeBSD, I could see the following in the kernel messages:

    -

    for #bpf #ebpf performance debugging, have ...


    + +
    paul@f0: ~ % doas dmesg | grep UPS
    +ugen0.2: <American Power Conversion Back-UPS BX750MI> at usbus0
    +

    -for #bpf #ebpf performance debugging, have a look at bpftop from Netflix. A neat tool showing you the estimated CPU time and other performance statistics for all the BPF programs currently loaded into the #linux kernel. Highly recommend!
    +

    apcupsd Installation



    -github.com/Netflix/bpftop
    +To make use of the USB connection, the apcupsd package had to be installed:

    -

    89 things he/she knows about Git commits is a ...


    + +
    paul@f0: ~ % doas install apcupsd
    +

    -89 things he/she knows about Git commits is a neat list of #Git wisdoms
    +I have made the following modifications to the configuration file so that the UPS can be used via the USB interface:

    -www.jvt.me/posts/2024/07/12/things-know-commits/
    + +
    paul@f0:/usr/local/etc/apcupsd % diff -u apcupsd.conf.sample  apcupsd.conf
    +--- apcupsd.conf.sample 2024-11-01 16:40:42.000000000 +0200
    ++++ apcupsd.conf        2024-12-03 10:58:24.009501000 +0200
    +@@ -31,7 +31,7 @@
    + #     940-1524C, 940-0024G, 940-0095A, 940-0095B,
    + #     940-0095C, 940-0625A, M-04-02-2000
    + #
    +-UPSCABLE smart
    ++UPSCABLE usb
    +
    + # To get apcupsd to work, in addition to defining the cable
    + # above, you must also define a UPSTYPE, which corresponds to
    +@@ -88,8 +88,10 @@
    + #                            that apcupsd binds to that particular unit
    + #                            (helpful if you have more than one USB UPS).
    + #
    +-UPSTYPE apcsmart
    +-DEVICE /dev/usv
    ++UPSTYPE usb
    ++DEVICE
    +
    + # POLLTIME <int>
    + #   Interval (in seconds) at which apcupsd polls the UPS for status. This
    +

    -

    I found that working on multiple side projects ...


    +I left the remaining settings as the default ones; for example, the following are of main interest:

    -I found that working on multiple side projects concurrently is better than concentrating on just one. This seems inefficient at first, but whenever you tend to lose motivation, you can temporarily switch to another one with full élan. However, remember to stop starting and start finishing. This doesn't mean you should be working on 10+ (and a growing list of) side projects concurrently! Select your projects and commit to finishing them before starting the next thing. For example, my current limit of concurrent side projects is around five.
    +
    +# If during a power failure, the remaining battery percentage
    +# (as reported by the UPS) is below or equal to BATTERYLEVEL,
    +# apcupsd will initiate a system shutdown.
    +BATTERYLEVEL 5
    +
    +# If during a power failure, the remaining runtime in minutes
    +# (as calculated internally by the UPS) is below or equal to MINUTES,
    +# apcupsd, will initiate a system shutdown.
    +MINUTES 3
    +

    -

    Agreed? Agreed. Besides #Ruby, I would also ...


    +I then enabled and started the daemon:

    -Agreed? Agreed. Besides #Ruby, I would also add #RakuLang and #Perl @Perl to the list of languages that are great for shell scripts - "Making Easy Things Easy and Hard Things Possible"
    + +
    paul@f0:/usr/local/etc/apcupsd % doas sysrc apcupsd_enable=YES
    +apcupsd_enable:  -> YES
    +paul@f0:/usr/local/etc/apcupsd % doas service apcupsd start
    +Starting apcupsd.
    +

    -lucasoshiro.github.io/posts-en/2024-06-17-ruby-shellscript/
    +

    UPS Connectivity Test



    -

    Plan9 assembly format in Go, but wait, it's not ...


    +And voila, I could now access the UPS information via the apcaccess command; how convenient :-) (I also read through the manual page, which provides a good understanding of what else can be done with it!).

    -Plan9 assembly format in Go, but wait, it's not the Operating System Plan9! #golang #rabbithole
    + +
    paul@f0:~ % apcaccess
    +APC      : 001,035,0857
    +DATE     : 2025-01-26 14:43:27 +0200
    +HOSTNAME : f0.lan.buetow.org
    +VERSION  : 3.14.14 (31 May 2016) freebsd
    +UPSNAME  : f0.lan.buetow.org
    +CABLE    : USB Cable
    +DRIVER   : USB UPS Driver
    +UPSMODE  : Stand Alone
    +STARTTIME: 2025-01-26 14:43:25 +0200
    +MODEL    : Back-UPS BX750MI
    +STATUS   : ONLINE
    +LINEV    : 230.0 Volts
    +LOADPCT  : 4.0 Percent
    +BCHARGE  : 100.0 Percent
    +TIMELEFT : 65.3 Minutes
    +MBATTCHG : 5 Percent
    +MINTIMEL : 3 Minutes
    +MAXTIME  : 0 Seconds
    +SENSE    : Medium
    +LOTRANS  : 145.0 Volts
    +HITRANS  : 295.0 Volts
    +ALARMDEL : No alarm
    +BATTV    : 13.6 Volts
    +LASTXFER : Automatic or explicit self test
    +NUMXFERS : 0
    +TONBATT  : 0 Seconds
    +CUMONBATT: 0 Seconds
    +XOFFBATT : N/A
    +SELFTEST : NG
    +STATFLAG : 0x05000008
    +SERIALNO : 9B2414A03599
    +BATTDATE : 2001-01-01
    +NOMINV   : 230 Volts
    +NOMBATTV : 12.0 Volts
    +NOMPOWER : 410 Watts
    +END APC  : 2025-01-26 14:44:06 +0200
    +

    -www.osnews.com/story/140941/go-plan9-memo-speeding-up-calculations-450/
    +

    APC Info on Partner Nodes:



    -

    This is a neat blog post about the Helix text ...


    +So far, so good. Host f0 would shut down itself when short on power. But what about the f1 and f2 nodes? They aren't connected directly to the UPS and, therefore, wouldn't know that their power is about to be cut off. For this, apcupsd running on the f1 and f2 nodes can be configured to retrieve UPS information via the network from the apcupsd server running on the f0 node, which is connected directly to the APC via USB.

    -This is a neat blog post about the Helix text editor, to which I personally switched around a year ago (from NeoVim). I should blog about my experience as well. To summarize: I am using it together with the terminal multiplexer #tmux. It doesn't bother me that Helix is purely terminal-based and therefore everything has to be in the same font. #HelixEditor
    +Of course, this won't work when f0 is down. In this case, no operational node would be connected to the UPS via USB; therefore, the current power status would not be known. However, I consider this a rare circumstance. Furthermore, in case of an f0 system crash, sudden power outages on the two other nodes would occur at different times making real data loss (the main concern here) less likely.

    -jonathan-frere.com/posts/helix/
    +And if f0 is down and f1 and f2 receive new data and crash midway, it's likely that a client (e.g., an Android app or another laptop) still has the data stored on it, making data recoverable and data loss overall nearly impossible. I'd receive an alert if any of the nodes go down (more on monitoring later in this blog series).

    -

    This blog post is basically a rant against ...


    +

    Installation on partners



    -This blog post is basically a rant against DataDog... Personally, I don't have much experience with DataDog (actually, I have never used it), but one reason to work with logs at my day job (with over 2,000 physical server machines) and to be cost-effective is by using dtail! #dtail #logs #logmanagement
    +To do this, I installed apcupsd via doas pkg install apcupsd on f1 and f2, and then I could connect to it this way:

    -crys.site/blog/2024/reinventint-the-weel/
    -dtail.dev
    + +
    paul@f1:~ % apcaccess -h f0.lan.buetow.org | grep Percent
    +LOADPCT  : 12.0 Percent
    +BCHARGE  : 94.0 Percent
    +MBATTCHG : 5 Percent
    +

    -

    Quick trick to get Helix themes selected ...


    +But I want the daemon to be configured and enabled in such a way that it connects to the master UPS node (the one with the UPS connected via USB) so that it can also initiate a system shutdown when the UPS battery reaches low levels. For that, apcupsd itself needs to be aware of the UPS status.

    -Quick trick to get Helix themes selected randomly #HelixEditor
    +On f1 and f2, I changed the configuration to use f0 (where apcupsd is listening) as a remote device. I also changed the MINUTES setting from 3 to 6 and the BATTERYLEVEL setting from 5 to 10 to ensure that the f1 and f2 nodes could still connect to the f0 node for UPS information before f0 decides to shut down itself. So f1 and f2 must shut down earlier than f0:

    -foo.zone/gemfeed/2024-12-15-random-helix-themes.gmi (Gemini)
    -foo.zone/gemfeed/2024-12-15-random-helix-themes.html
    + +
    paul@f2:/usr/local/etc/apcupsd % diff -u apcupsd.conf.sample apcupsd.conf
    +--- apcupsd.conf.sample 2024-11-01 16:40:42.000000000 +0200
    ++++ apcupsd.conf        2025-01-26 15:52:45.108469000 +0200
    +@@ -31,7 +31,7 @@
    + #     940-1524C, 940-0024G, 940-0095A, 940-0095B,
    + #     940-0095C, 940-0625A, M-04-02-2000
    + #
    +-UPSCABLE smart
    ++UPSCABLE ether
    +
    + # To get apcupsd to work, in addition to defining the cable
    + # above, you must also define a UPSTYPE, which corresponds to
    +@@ -52,7 +52,6 @@
    + #                            Network Information Server. This is used if the
    + #                            UPS powering your computer is connected to a
    + #                            different computer for monitoring.
    +-#
    + # snmp      hostname:port:vendor:community
    + #                            SNMP network link to an SNMP-enabled UPS device.
    + #                            Hostname is the ip address or hostname of the UPS
    +@@ -88,8 +87,8 @@
    + #                            that apcupsd binds to that particular unit
    + #                            (helpful if you have more than one USB UPS).
    + #
    +-UPSTYPE apcsmart
    +-DEVICE /dev/usv
    ++UPSTYPE net
    ++DEVICE f0.lan.buetow.org:3551
    +
    + # POLLTIME <int>
    + #   Interval (in seconds) at which apcupsd polls the UPS for status. This
    +@@ -147,12 +146,12 @@
    + # If during a power failure, the remaining battery percentage
    + # (as reported by the UPS) is below or equal to BATTERYLEVEL,
    + # apcupsd will initiate a system shutdown.
    +-BATTERYLEVEL 5
    ++BATTERYLEVEL 10
    +
    + # If during a power failure, the remaining runtime in minutes
    + # (as calculated internally by the UPS) is below or equal to MINUTES,
    + # apcupsd, will initiate a system shutdown.
    +-MINUTES 3
    ++MINUTES 6
    +
    + # If during a power failure, the UPS has run on batteries for TIMEOUT
    + # many seconds or longer, apcupsd will initiate a system shutdown.
    +
    +
    +So I also ran the following commands on f1 and f2:

    -

    Example where complexity attacks you from ...


    + +
    paul@f1:/usr/local/etc/apcupsd % doas sysrc apcupsd_enable=YES
    +apcupsd_enable:  -> YES
    +paul@f1:/usr/local/etc/apcupsd % doas service apcupsd start
    +Starting apcupsd.
    +

    -Example where complexity attacks you from behind #k8s #kubernetes #OpenAI
    +And then I was able to connect to localhost via the apcaccess command:

    -surfingcomplexity.blog/2024/12/14/quic..-..ecent-openai-public-incident-write-up/
    + +
    paul@f1:~ % doas apcaccess | grep Percent
    +LOADPCT  : 5.0 Percent
    +BCHARGE  : 95.0 Percent
    +MBATTCHG : 5 Percent
    +

    -

    LLMs for Ops? Summaries of logs, probabilities ...


    +

    Power outage simulation



    -LLMs for Ops? Summaries of logs, probabilities about correctness, auto-generating Ansible, some uses cases are there. Wouldn't trust it fully, though.
    +

    Pulling the plug



    -youtu.be/WodaffxVq-E?si=noY0egrfl5izCSQI
    +I simulated a power outage by removing the power input from the APC. Immediately, the following message appeared on all the nodes:

    -

    Excellent article about your dream Product ...


    +
    +Broadcast Message from root@f0.lan.buetow.org
    +        (no tty) at 15:03 EET...
    +
    +Power failure. Running on UPS batteries.                                              
    +

    -Excellent article about your dream Product Manager: Why every software team needs a product manager to thrive via @wallabagapp
    +I ran the following command to confirm the available battery time:

    -testdouble.com/insights/why-product-ma..-..s-accelerate-improve-software-delivery
    + +
    paul@f0:/usr/local/etc/apcupsd % apcaccess -p TIMELEFT
    +63.9 Minutes
    +

    -

    I just finished reading all chapters of CPU ...


    +And after around one hour (f1 and f2 a bit earlier, f0 a bit later due to the different BATTERYLEVEL and MINUTES settings outlined earlier), the following broadcast was sent out:

    -I just finished reading all chapters of CPU land: ... not claiming to remember every detail, but it is a great refresher how CPUs and operating systems actually work under the hood when you execute a program, which we tend to forget in our higher abstraction world. I liked the "story" and some of the jokes along the way! Size wise, it is pretty digestable (not talking about books, but only 7 web articles/chapters)! #cpu #linux #unix #kernel #macOS
    +
    +Broadcast Message from root@f0.lan.buetow.org
    +        (no tty) at 15:08 EET...
    +
    +        *** FINAL System shutdown message from root@f0.lan.buetow.org ***
    +
    +System going down IMMEDIATELY
    +
    +apcupsd initiated shutdown
    +

    -cpu.land/
    +And all the nodes shut down safely before the UPS ran out of battery!

    -

    Indeed, useful to know this stuff! #sre ...


    +

    Restoring power



    -Indeed, useful to know this stuff! #sre
    +After restoring power, I checked the logs in /var/log/daemon.log and found the following on all 3 nodes:

    -biriukov.dev/docs/resolver-dual-stack-..-..resolvers-and-dual-stack-applications/
    +
    +Jan 26 17:36:24 f2 apcupsd[2159]: Power failure.
    +Jan 26 17:36:30 f2 apcupsd[2159]: Running on UPS batteries.
    +Jan 26 17:36:30 f2 apcupsd[2159]: Battery charge below low limit.
    +Jan 26 17:36:30 f2 apcupsd[2159]: Initiating system shutdown!
    +Jan 26 17:36:30 f2 apcupsd[2159]: User logins prohibited
    +Jan 26 17:36:32 f2 apcupsd[2159]: apcupsd exiting, signal 15
    +Jan 26 17:36:32 f2 apcupsd[2159]: apcupsd shutdown succeeded
    +

    -

    It's the small things, which make Unix like ...


    +All good :-)

    -It's the small things, which make Unix like systems, like GNU/Linux, interesting. Didn't know about this #GNU #Tar behaviour yet:
    +

    Conclusion



    -xeiaso.net/notes/2024/pop-quiz-tar/
    +I have the same UPS (but with a bit more capacity) for my main work setup, which powers my 28" screen, music equipment, etc. It has already been helpful a couple of times during power outages here, so I am sure that the smaller UPS for the F3s setup will be of great use.

    -

    My New Year's resolution is not to start any ...


    +Read the next post of this series:

    -My New Year's resolution is not to start any new non-fiction books (or only very few) but to re-read and listen to my favorites, which I read to reflect on and see things from different perspectives. Every time you re-read a book, you gain new insights.<nil>17491
    +f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs

    -Other related posts:
    +Other BSD related posts are:

    -2026-01-01 Posts from July to December 2025
    -2025-07-01 Posts from January to June 2025
    -2025-01-01 Posts from October to December 2024 (You are currently reading this)
    +2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability
    +2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    +2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    +2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    +2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    +2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts (You are currently reading this)
    +2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    +2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    +2024-04-01 KISS high-availability with OpenBSD
    +2024-01-13 One reason why I love OpenBSD
    +2022-10-30 Installing DTail on OpenBSD
    +2022-07-30 Let's Encrypt with OpenBSD and Rex
    +2016-04-09 Jails and ZFS with Puppet on FreeBSD

    E-Mail your comments to paul@nospam.buetow.org :-)

    @@ -16756,773 +17204,546 @@ Jan 26 17:36:32 f2 apcupsd[2159]: apcupsd shutdown succeeded - Random Helix Themes - - gemini://foo.zone/gemfeed/2024-12-15-random-helix-themes.gmi - 2024-12-15T13:55:05+02:00 + Working with an SRE Interview + + gemini://foo.zone/gemfeed/2025-01-15-working-with-an-sre-interview.gmi + 2025-01-15T00:16:04+02:00 Paul Buetow aka snonux paul@dev.buetow.org - I thought it would be fun to have a random Helix theme every time I open a new shell. Helix is the text editor I use. + I have been interviewed by Florian Buetow on `cracking-ai-engineering.com` about what it's like working with a Site Reliability Engineer from the point of view of a Software Engineer, Data Scientist, and AI Engineer.
    -

    Random Helix Themes


    -
    -Published at 2024-12-15T13:55:05+02:00; Last updated 2024-12-18
    +

    Working with an SRE Interview



    -I thought it would be fun to have a random Helix theme every time I open a new shell. Helix is the text editor I use.
    +Published at 2025-01-15T00:16:04+02:00

    -https://helix-editor.com/
    +I have been interviewed by Florian Buetow on cracking-ai-engineering.com about what it's like working with a Site Reliability Engineer from the point of view of a Software Engineer, Data Scientist, and AI Engineer.

    -So I put this into my zsh dotfiles (in some editor.zsh.source in my ~ directory):
    +See original interview here
    +Cracking AI Engineering

    +Below, I am posting the interview here on my blog as well.

    -So every time I open a new terminal or shell, editor::helix::random_theme gets called, which randomly selects a theme from all installed ones and updates the helix config accordingly.
    +

    Table of Contents



    +
    +

    Preamble



    -

    A better version


    +Florian from Cracking AI Engineering interviewed me about my work as a Principal SRE at Mimecast. We talked about what an Embedded SRE actually does, automation, observability, incident management, and how to work well with an SRE — whether you're a developer, data scientist, or manager.

    -Update 2024-12-18: This is an improved version, which works cross platform (e.g., also on MacOS) and multiple theme directories:
    +

    Introducing Paul



    +Hi Paul, please introduce yourself briefly to the audience. Who are you, what do you do for a living, and where do you work?

    -I hope you had some fun. E-Mail your comments to paul@nospam.buetow.org :-)
    +My name is Paul Bütow, I work at Mimecast, and I’m a Principal Site Reliability Engineer there. I’ve been with Mimecast for almost ten years now. The company specializes in email security, including things like archiving, phishing detection, malware protection, and spam filtering.

    -Back to the main site
    -
    -
    -
    - - f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation - - gemini://foo.zone/gemfeed/2024-12-03-f3s-kubernetes-with-freebsd-part-2.gmi - 2024-12-02T23:48:21+02:00, last updated Sun 11 Jan 10:30:00 EET 2026 - - Paul Buetow aka snonux - paul@dev.buetow.org - - This is the second blog post about my f3s series for my self-hosting demands in my home lab. f3s? The 'f' stands for FreeBSD, and the '3s' stands for k3s, the Kubernetes distribution I will use on FreeBSD-based physical machines. - -
    -

    f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation


    +You mentioned that you’re an ‘Embedded SRE.’ What does that mean exactly?

    -Published at 2024-12-02T23:48:21+02:00, last updated Sun 11 Jan 10:30:00 EET 2026
    +It means that I’m directly part of the software engineering team, not in a separate Ops department. I ensure that nothing is deployed manually, and everything runs through automation. I also set up monitoring and observability. These are two distinct aspects: monitoring alerts us when something breaks, while observability helps us identify trends. I also create runbooks so we know what to do when specific incidents occur frequently.

    -This is the second blog post about my f3s series for my self-hosting demands in my home lab. f3s? The "f" stands for FreeBSD, and the "3s" stands for k3s, the Kubernetes distribution I will use on FreeBSD-based physical machines.
    +Infrastructure SREs on the other hand handle the foundational setup, like providing the Kubernetes cluster itself or ensuring the operating systems are installed. They don't work on the application directly but ensure the base infrastructure is there for others to use. This works well when a company has multiple teams that need shared infrastructure.

    -We set the stage last time; this time, we will set up the hardware for this project.
    +

    How did you get started?



    -These are all the posts so far:
    +How did your interest in Linux or FreeBSD start?

    -2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    -2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation (You are currently reading this)
    -2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    -2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    -2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    -2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    -2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    -2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability
    +It began during my school days. We had a PC with DOS at home, and I eventually bought Suse Linux 5.3. Shortly after, I discovered FreeBSD because I liked its handbook so much. I wanted to understand exactly how everything worked, so I also tried Linux from Scratch. That involves installing every package manually to gain a better understanding of operating systems.

    -f3s logo
    +https://www.FreeBSD.org
    +https://linuxfromscratch.org/

    -ChatGPT generated logo..
    +And after school, you pursued computer science, correct?

    -Let's continue...
    +Exactly. I wasn’t sure at first whether I wanted to be a software developer or a system administrator. I applied for both and eventually accepted an offer as a Linux system administrator. This was before 'SRE' became a buzzword, but much of what I did back then-automation, infrastructure as code, monitoring-is now considered part of the typical SRE role.

    -

    Table of Contents


    +

    Roles and Career Progression



    -
    -

    Deciding on the hardware


    +Tell us about how you joined Mimecast. When did you fully embrace the SRE role?

    -Note that the OpenBSD VMs included in the f3s setup (which will be used later in this blog series for internet ingress - as you know from the first part of this blog series) are already there. These are virtual machines that I rent at OpenBSD Amsterdam and Hetzner.
    +I started as a Linux sysadmin at 1&1. I managed an ad server farm with hundreds of systems and later handled load balancers. Together with an architect, we managed F5 load balancers distributing around 2,000 services, including for portals like web.de and GMX. I also led the operations team technically for a while before moving to London to join Mimecast.

    -https://openbsd.amsterdam
    -https://hetzner.cloud
    +At Mimecast, the job title was explicitly 'Site Reliability Engineer.' The biggest difference was that I was no longer in a separate Ops department but embedded directly within the storage and search backend team. I loved that because we could plan features together-from automation to measurability and observability. Mimecast also operates thousands of physical servers for email archiving, which was fascinating since I already had experience with large distributed systems at 1&1. It was the right step for me because it allowed me to work close to the code while remaining hands-on with infrastructure.

    -This means that the FreeBSD boxes need to be covered, which will later be running k3s in Linux VMs via bhyve hypervisor.
    +What are the differences between SRE, DevOps, SysAdmin, and Architects?

    -I've been considering whether to use Raspberry Pis or look for alternatives. It turns out that complete N100-based mini-computers aren't much more expensive than Raspberry Pi 5s, and they don't require assembly. Furthermore, I like that they are AMD64 and not ARM-based, which increases compatibility with some applications (e.g., I might want to virtualize Windows (via bhyve) on one of those, though that's out of scope for this blog series).
    +SREs are like the next step after SysAdmins. A SysAdmin might manually install servers, replace disks, or use simple scripts for automation, while SREs use infrastructure as code and focus on reliability through SLIs, SLOs, and automation. DevOps isn’t really a job-it’s more of a way of working, where developers are involved in operations tasks like setting up CI/CD pipelines or on-call shifts. Architects focus on designing systems and infrastructures, such as load balancers or distributed systems, working alongside SREs to ensure the systems meet the reliability and scalability requirements. The specific responsibilities of each role depend on the company, and there is often overlap.

    -

    Not ARM but Intel N100


    +What are the most important reliability lessons you’ve learned so far?

    -I needed something compact, efficient, and capable enough to handle the demands of a small-scale Kubernetes cluster and preferably something I don't have to assemble a lot. After researching, I decided on the Beelink S12 Pro with Intel N100 CPUs.
    +
      +
    • Don’t leave SRE aspects as an afterthought. It’s much better to discuss automation, monitoring, SLIs, and SLOs early on. Traditional sysadmins often installed systems manually, but today, we do everything via infrastructure as code-using tools like Terraform or Puppet.
    • +
    • I also distinguish between monitoring and observability. Monitoring tells us, 'The server is down, alarm!' Observability dives deeper, showing trends like increasing latency so we can act proactively.
    • +
    • SLI, SLO, and SLA are core elements. We focus on what users actually experience-for example, how quickly an email is sent-and set our goals accordingly.
    • +
    • Runbooks are also crucial. When something goes wrong at night, you don’t want to start from scratch. A runbook outlines how to debug and resolve specific problems, saving time and reducing downtime.
    • +

    +

    Anecdotes and Best Practices



    -Beelink Mini S12 Pro N100 official page
    +Runbooks sound very practical. Can you explain how they’re used day-to-day?

    -The Intel N100 CPUs are built on the "Alder Lake-N" architecture. These chips are designed to balance performance and energy efficiency well. With four cores, they're more than capable of running multiple containers, even with moderate workloads. Plus, they consume only around 8W of power (ok, that's more than the Pis...), keeping the electricity bill low enough and the setup quiet - perfect for 24/7 operation.
    +Runbooks are essentially guides for handling specific incidents. For instance, if a service won’t start, the runbook will specify where the logs are and which commands to use. Observability takes it a step further, helping us spot changes early-like rising error rates or latency-so we can address issues before they escalate.

    -Beelink preparation
    +When should you decide to put something into a runbook, and when is it unnecessary?

    -The Beelink comes with the following specs:
    +If an issue happens frequently, it should be documented in a runbook so that anyone, even someone new, can follow the steps to fix it. The idea is that 90% of the common incidents should be covered. For example, if a service is down, the runbook would specify where to find logs, which commands to check, and what actions to take. On the other hand, rare or complex issues, where the resolution depends heavily on context or varies each time, don’t make sense to include in detail. For those, it’s better to focus on general troubleshooting steps.

    -
      -
    • 12th Gen Intel N100 processor, with four cores and four threads, and a maximum frequency of up to 3.4 GHz.
    • -
    • 16 GB of DDR4 RAM, with a maximum (official) size of 16 GB (but people could install 32 GB on it).
    • -
    • 500 GB M.2 SSD, with the option to install a 2nd 2.5 SSD drive (which I want to make use of later in this blog series).
    • -
    • GBit ethernet
    • -
    • Four USB 3.2 Gen2 ports (maybe I want to mount something externally at some point)
    • -
    • Dimensions and weight: 115*102*39mm, 280g
    • -
    • Silent cooling system.
    • -
    • HDMI output (needed only for the initial installation and maybe for troubleshooting later)
    • -
    • Auto power on via WoL (may make use of it)
    • -
    • Wi-Fi (not going to use it)
    • -

    -I bought three (3) of them for the cluster I intend to build.
    +How do you search for and find the correct runbooks?

    -
    +Runbooks should be linked directly in the alert you receive. For example, if you get an alert about a service not running, the alert will have a link to the runbook that tells you what to check, like logs or commands to run. Runbooks are best stored in an internal wiki, so if you don’t find the link in the alert, you know where to search. The important thing is that runbooks are easy to find and up to date because that’s what makes them useful during incidents.

    -Unboxing was uneventful. Every Beelink PC came with:
    +Do you have an interesting war story you can share with us?

    -
      -
    • An AC power adapter
    • -
    • An HDMI cable
    • -
    • A VESA mount with screws (not using it as of now)
    • -
    • Some manuals
    • -
    • The pre-assembled Beelink PC itself.
    • -
    • A "Hello" post card (??)
    • -

    -Overall, I love the small form factor.
    +Sure. At 1&1, we had a proprietary ad server software that ran a SQL query during startup. The query got slower over time, eventually timing out and preventing the server from starting. Since we couldn’t access the source code, we searched the binary for the SQL and patched it. By pinpointing the issue, a developer was able to adjust the SQL. This collaboration between sysadmin and developer perspectives highlights the value of SRE work.

    -

    Network switch


    +

    Working with Different Teams



    -I went with the tp-link mini 5-port switch, as I had a spare one available. That switch will be plugged into my wall ethernet port, which connects directly to my fiber internet router with 100 Mbit/s down and 50 Mbit/s upload speed.
    +You’re embedded in a team-how does collaboration with developers work practically?

    -Switch
    +We plan everything together from the start. If there’s a new feature, we discuss infrastructure, automated deployments, and monitoring right away. Developers are experts in the code, and I bring the infrastructure expertise. This avoids unpleasant surprises before going live.

    -

    Installing FreeBSD


    +How about working with data scientists or ML engineers? Are there differences?

    -

    Base install


    +The principles are the same. ML models also need to be deployed and monitored. You deal with monitoring, resource allocation, and identifying performance drops. Whether it’s a microservice or an ML job, at the end of the day, it’s all running on servers or clusters that must remain stable.

    -First, I downloaded the boot-only ISO of the latest FreeBSD release and dumped it on a USB stick via my Fedora laptop:
    +What about working with managers or the FinOps team?

    - -
    [paul@earth]~/Downloads% sudo dd \
    -  if=FreeBSD-14.1-RELEASE-amd64-bootonly.iso \
    -  of=/dev/sda conv=sync
    -
    +We often discuss costs, especially in the cloud, where scaling up resources is easy. It’s crucial to know our metrics: do we have enough capacity? Do we need all instances? Or is the CPU only at 5% utilization? This data helps managers decide whether the budget is sufficient or if optimizations are needed.

    -Next, I plugged the Beelinks (one after another) into my monitor via HDMI (the resolution of the FreeBSD text console seems strangely stretched, as I am using the LG Dual Up monitor), connected Ethernet, an external USB keyboard, and the FreeBSD USB stick, and booted the devices up. With F7, I entered the boot menu and selected the USB stick for the FreeBSD installation.
    +Do you have practical tips for working with SREs?

    -The installation was uneventful. I selected:
    +Yes, I have a few:

      -
    • Guided ZFS on root (pool zroot)
    • -
    • Unencrypted ZFS (I will encrypt separate datasets later; I want it to be able to boot without manual interaction)
    • -
    • Static IP configuration (to ensure that the boxes always have the same IPs, even after switching the router/DHCP server)
    • -
    • I decided to enable the SSH daemon, NTP server, and NTP time synchronization at boot, and I also enabled powerd for automatic CPU frequency scaling.
    • -
    • In addition to root, I added a personal user, paul, whom I placed in the wheel group.
    • +
    • Early involvement: Include SREs from the beginning in your project.
    • +
    • Runbooks & documentation: Document recurring errors.
    • +
    • Try first: Try to understand the issue yourself before immediately asking the SRE.
    • +
    • Basic infra knowledge: Kubernetes and Terraform aren’t magic. Some basic understanding helps every developer.

    -After doing all that three times (once for each Beelink PC), I had three ready-to-use FreeBSD boxes! Their hostnames are f0, f1 and f2!
    +

    Using AI Tools



    -Beelink installation
    +Let’s talk about AI. How do you use it in your daily work?

    -

    Latest patch level and customizing /etc/hosts


    +For boilerplate code, like Terraform snippets, I often use ChatGPT. It saves time, although I always review and adjust the output. Log analysis is another exciting application. Instead of manually going through millions of lines, AI can summarize key outliers or errors.

    -After the first boot, I upgraded to the latest FreeBSD patch level as follows:
    +Do you think AI could largely replace SREs or significantly change the role?

    - -
    root@f0:~ # freebsd-update fetch
    -root@f0:~ # freebsd-update install
    -root@f0:~ # freebsd-update reboot
    -
    +I see AI as an additional tool. SRE requires a deep understanding of how distributed systems work internally. While AI can assist with routine tasks or quickly detect anomalies, human expertise is indispensable for complex issues.

    -I also added the following entries for the three FreeBSD boxes to the /etc/hosts file:
    +

    SRE Learning Resources



    - -
    root@f0:~ # cat <<END >>/etc/hosts
    -192.168.1.130 f0 f0.lan f0.lan.buetow.org
    -192.168.1.131 f1 f1.lan f1.lan.buetow.org
    -192.168.1.132 f2 f2.lan f2.lan.buetow.org
    -END
    -
    +What resources would you recommend for learning about SRE?

    -You might wonder why bother using the hosts file? Why not use DNS properly? The reason is simplicity. I don't manage 100 hosts, only a few here and there. Having an OpenWRT router in my home, I could also configure everything there, but maybe I'll do that later. For now, keep it simple and straightforward.
    +The Google SRE book is a classic, though a bit dry. I really like 'Seeking SRE,' as it offers various perspectives on SRE, with many practical stories from different companies.

    -

    After install


    +https://sre.google/books/
    +Seeking SRE

    -After that, I installed the following additional packages:
    +Do you have a podcast recommendation?

    - -
    root@f0:~ # pkg install helix doas zfs-periodic uptimed
    -
    +The Google SRE prodcast is quite interesting. It offers insights into how Google approaches SRE, along with perspectives from external guests.

    -

    Helix editor


    +https://sre.google/prodcast/

    -Helix? It's my favourite text editor. I have nothing against vi but like hx (Helix) more!
    +

    Blogging



    -https://helix-editor.com/
    +You also have a blog. What motivates you to write regularly?

    -

    doas


    +Writing helps me learn the most. It also serves as a personal reference. Sometimes I look up how I solved a problem a year ago. And of course, others tackling similar projects might find inspiration in my posts.

    -doas? It's a pretty neat (and KISS) replacement for sudo. It has far fewer features than sudo, which is supposed to make it more secure. Its origin is the OpenBSD project. For doas, I accepted the default configuration (where users in the wheel group are allowed to run commands as root):
    +What do you blog about?

    - -
    root@f0:~ # cp /usr/local/etc/doas.conf.sample /usr/local/etc/doas.conf
    -
    +Mostly technical topics I find exciting, like homelab projects, Kubernetes, or book summaries on IT and productivity. It’s a personal blog, so I write about what I enjoy.

    -https://man.openbsd.org/doas
    +

    Wrap-up



    -

    Periodic ZFS snapshotting


    +To wrap up, what are three things every team should keep in mind for stability?

    -zfs-periodic is a nifty tool for automatically creating ZFS snapshots. I decided to go with the following configuration here:
    +First, maintain runbooks and documentation to avoid chaos at night. Second, automate everything-manual installs in production are risky. Third, define SLIs, SLOs, and SLAs early so everyone knows what we’re monitoring and guaranteeing.

    - -
    root@f0:~ # cat <<END >>/etc/periodic.conf
    -daily_zfs_snapshot_enable="YES"
    -daily_zfs_snapshot_pools="zroot"
    -daily_zfs_snapshot_keep="7"
    -weekly_zfs_snapshot_enable="YES"
    -weekly_zfs_snapshot_pools="zroot"
    -weekly_zfs_snapshot_keep="5"
    -monthly_zfs_snapshot_enable="YES"
    -monthly_zfs_snapshot_pools="zroot"
    -monthly_zfs_snapshot_keep="6"
    -END
    -
    +Is there a motto or mindset that particularly inspires you as an SRE?

    -https://github.com/ross/zfs-periodic
    +"Keep it simple and stupid"-KISS. Not everything has to be overly complex. And always stay curious. I’m still fascinated by how systems work under the hood.

    -Note: We have not added zdata to the list of snapshot pools. Currently, this pool does not exist yet, but it will be created later in this blog series. zrepl, which we will use for replication, later in this blog series will manage the zdata snapshots.
    +Where can people find you online?

    -

    Uptime tracking


    +You can find links to my socials on my website paul.buetow.org
    +I regularly post articles and link to everything else I’m working on outside of work.

    -uptimed? I like to track my uptimes. This is how I configured the daemon:
    +https://paul.buetow.org

    - -
    root@f0:~ # cp /usr/local/mimecast/etc/uptimed.conf-dist \
    -  /usr/local/mimecast/etc/uptimed.conf 
    -root@f0:~ # hx /usr/local/mimecast/etc/uptimed.conf
    -
    +Thank you very much for your time and this insightful interview into the world of site reliability engineering
    +
    +My pleasure, this was fun.
    +
    +

    Closing comments


    +
    +Thanks for reading! Hopefully there’s something useful in here for your own work. Reliable systems are a team effort, after all.
    +
    +E-Mail your comments to paul@nospam.buetow.org or contact Florian via the Cracking AI Engineering :-)
    +
    +Back to the main site
    +
    +
    +
    + + Posts from October to December 2024 + + gemini://foo.zone/gemfeed/2025-01-01-posts-from-october-to-december-2024.gmi + 2024-12-31T18:09:58+02:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + Happy new year! + +
    +

    Posts from October to December 2024



    -In the Helix editor session, I changed LOG_MAXIMUM_ENTRIES to 0 to keep all uptime entries forever and not cut off at 50 (the default config). After that, I enabled and started uptimed:
    +Published at 2024-12-31T18:09:58+02:00

    - -
    root@f0:~ # service uptimed enable
    -root@f0:~ # service uptimed start
    -
    +Happy new year!

    -To check the current uptime stats, I can now run uprecords:
    +These are my social media posts from the last three months. I keep them here to reflect on them and also to not lose them. Social media networks come and go and are not under my control, but my domain is here to stay.

    - -
     root@f0:~ # uprecords
    -     #               Uptime | System                                     Boot up
    -----------------------------+---------------------------------------------------
    -->   1     0 days, 00:07:34 | FreeBSD 14.1-RELEASE      Mon Dec  2 12:21:44 2024
    -----------------------------+---------------------------------------------------
    -NewRec     0 days, 00:07:33 | since                     Mon Dec  2 12:21:44 2024
    -    up     0 days, 00:07:34 | since                     Mon Dec  2 12:21:44 2024
    -  down     0 days, 00:00:00 | since                     Mon Dec  2 12:21:44 2024
    -   %up              100.000 | since                     Mon Dec  2 12:21:44 2024
    -
    +These are from Mastodon and LinkedIn. Have a look at my about page for my social media profiles. This list is generated with Gos, my social media platform sharing tool.

    -This is how I track the uptimes for all of my host:
    +My about page
    +https://codeberg.org/snonux/gos

    -Unveiling guprecords.raku: Global Uptime Records with Raku-
    -https://github.com/rpodgorny/uptimed
    +

    Table of Contents



    -

    Hardware check


    +
    +

    October 2024



    -

    Ethernet


    +

    First on-call experience in a startup. Doesn't ...



    -Works. Nothing eventful, really. It's a cheap Realtek chip, but it will do what it is supposed to do.
    +First on-call experience in a startup. Doesn't sound a lot of fun! But the lessons were learned! #sre

    - -
    paul@f0:~ % ifconfig re0
    -re0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
    -        options=8209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
    -        ether e8:ff:1e:d7:1c:ac
    -        inet 192.168.1.130 netmask 0xffffff00 broadcast 192.168.1.255
    -        inet6 fe80::eaff:1eff:fed7:1cac%re0 prefixlen 64 scopeid 0x1
    -        inet6 fd22:c702:acb7:0:eaff:1eff:fed7:1cac prefixlen 64 detached autoconf
    -        inet6 2a01:5a8:304:1d5c:eaff:1eff:fed7:1cac prefixlen 64 autoconf pltime 10800 vltime 14400
    -        media: Ethernet autoselect (1000baseT <full-duplex>)
    -        status: active
    -        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
    -
    +ntietz.com/blog/lessons-from-my-first-on-call/

    -

    RAM


    +

    Reviewing your own PR or MR before asking ...



    -All there:
    +Reviewing your own PR or MR before asking others to review it makes a lot of sense. Have seen so many silly mistakes which would have been avoided. Saving time for the real reviewer.

    - -
    paul@f0:~ % sysctl hw.physmem
    -hw.physmem: 16902905856
    -
    -
    +www.jvt.me/posts/2019/01/12/self-code-review/

    -

    CPUs


    +

    Fun with defer in #golang, I did't know, that ...



    -They work:
    +Fun with defer in #golang, I did't know, that a defer object can either be heap or stack allocated. And there are some rules for inlining, too.

    - -
    paul@f0:~ % sysctl dev.cpu | grep freq:
    -dev.cpu.3.freq: 705
    -dev.cpu.2.freq: 705
    -dev.cpu.1.freq: 604
    -dev.cpu.0.freq: 604
    -
    +victoriametrics.com/blog/defer-in-go/

    -

    CPU throttling


    +

    I have been in incidents. Understandably, ...



    -With powerd running, CPU freq is dowthrottled when the box isn't jam-packed. To stress it a bit, I run ubench to see the frequencies being unthrottled again:
    +I have been in incidents. Understandably, everyone wants the issue to be resolved as quickly and others want to know how long TTR will be. IMHO, providing no estimates at all is no solution either. So maybe give a rough estimate but clearly communicate that the estimate is rough and that X, Y, and Z can interfere, meaning there is a chance it will take longer to resolve the incident. Just my thought. What's yours?

    - -
    paul@f0:~ % doas pkg install ubench
    -paul@f0:~ % rehash # For tcsh to find the newly installed command
    -paul@f0:~ % ubench &
    -paul@f0:~ % sysctl dev.cpu | grep freq:
    -dev.cpu.3.freq: 2922
    -dev.cpu.2.freq: 2922
    -dev.cpu.1.freq: 2923
    -dev.cpu.0.freq: 2922
    -
    +firehydrant.com/blog/hot-take-dont-provide-incident-resolution-estimates/

    -Idle, all three Beelinks plus the switch consumed 26.2W. But with ubench stressing all the CPUs, it went up to 38.8W.
    +

    Little tips using strings in #golang and I ...



    -Idle consumption.
    +Little tips using strings in #golang and I personally think one must look more into the std lib (not just for strings, also for slices, maps,...), there are tons of useful helper functions.

    -

    Wake-on-LAN Setup


    +www.calhoun.io/6-tips-for-using-strings-in-go/

    -Updated Sun 11 Jan 10:30:00 EET 2026
    +

    Reading this post about #rust (especially the ...



    -As mentioned in the hardware specs above, the Beelink S12 Pro supports Wake-on-LAN (WoL), which allows me to remotely power on the machines over the network. This is particularly useful since I don't need all three machines running 24/7, and I can save power by shutting them down when not needed and waking them up on demand.
    +Reading this post about #rust (especially the first part), I think I made a good choice in deciding to dive into #golang instead. There was a point where I wanted to learn a new programming language, and Rust was on my list of choices. I think the Go project does a much better job of deciding what goes into the language and how. What are your thoughts?

    -The good news is that FreeBSD already has WoL support enabled by default on the Realtek network interface, as evidenced by the WOL_MAGIC option shown in the ifconfig re0 output above (line 215).
    +josephg.com/blog/rewriting-rust/

    -

    Setting up WoL on the laptop


    +

    The opposite of #ChaosMonkey ... ...



    -To wake the Beelinks from my Fedora laptop (earth), I installed the wol package:
    +The opposite of #ChaosMonkey ... automatically repairing and healing services helping to reduce manual toil work. Runbooks and scripts are only the first step, followed by a fully blown service written in Go. Could be useful, but IMHO why not rather address the root causes of the manual toil work? #sre

    - -
    [paul@earth]~% sudo dnf install -y wol
    -
    +blog.cloudflare.com/nl-nl/improving-platform-resilience-at-cloudflare/

    -Next, I created a simple script (~/bin/wol-f3s) to wake and shutdown the machines:
    +

    November 2024



    - -
    #!/bin/bash
    -# Wake-on-LAN and shutdown script for f3s cluster (f0, f1, f2)
    -
    -# MAC addresses
    -F0_MAC="e8:ff:1e:d7:1c:ac"  # f0 (192.168.1.130)
    -F1_MAC="e8:ff:1e:d7:1e:44"  # f1 (192.168.1.131)
    -F2_MAC="e8:ff:1e:d7:1c:a0"  # f2 (192.168.1.132)
    -
    -# IP addresses
    -F0_IP="192.168.1.130"
    -F1_IP="192.168.1.131"
    -F2_IP="192.168.1.132"
    -
    -# SSH user
    -SSH_USER="paul"
    -
    -# Broadcast address for your LAN
    -BROADCAST="192.168.1.255"
    -
    -wake() {
    -    local name=$1
    -    local mac=$2
    -    echo "Sending WoL packet to $name ($mac)..."
    -    wol -i "$BROADCAST" "$mac"
    -}
    -
    -shutdown_host() {
    -    local name=$1
    -    local ip=$2
    -    echo "Shutting down $name ($ip)..."
    -    ssh -o ConnectTimeout=5 "$SSH_USER@$ip" "doas poweroff" 2>/dev/null && \
    -        echo "  ✓ Shutdown command sent to $name" || \
    -        echo "  ✗ Failed to reach $name (already down?)"
    -}
    -
    -ACTION="${1:-all}"
    -
    -case "$ACTION" in
    -    f0) wake "f0" "$F0_MAC" ;;
    -    f1) wake "f1" "$F1_MAC" ;;
    -    f2) wake "f2" "$F2_MAC" ;;
    -    all|"")
    -        wake "f0" "$F0_MAC"
    -        wake "f1" "$F1_MAC"
    -        wake "f2" "$F2_MAC"
    -        ;;
    -    shutdown|poweroff|down)
    -        shutdown_host "f0" "$F0_IP"
    -        shutdown_host "f1" "$F1_IP"
    -        shutdown_host "f2" "$F2_IP"
    -        echo ""
    -        echo "✓ Shutdown commands sent to all machines."
    -        exit 0
    -        ;;
    -    *)
    -        echo "Usage: $0 [f0|f1|f2|all|shutdown]"
    -        exit 1
    -        ;;
    -esac
    -
    -echo ""
    -echo "✓ WoL packets sent. Machines should boot in a few seconds."
    -
    +

    I just became a Silver Patreon for OSnews. What ...



    -After making the script executable with chmod +x ~/bin/wol-f3s, I can now control the machines with simple commands:
    +I just became a Silver Patreon for OSnews. What is OSnews? It is an independent news site about IT. It is slightly independent and, at times, alternative. I have enjoyed it since my early student days. This one and other projects I financially support are listed here:

    - -
    [paul@earth]~% wol-f3s          # Wake all three
    -[paul@earth]~% wol-f3s f0       # Wake only f0
    -[paul@earth]~% wol-f3s shutdown # Shutdown all three via SSH
    -
    +foo.zone/gemfeed/2024-09-07-projects-i-support.gmi (Gemini)
    +foo.zone/gemfeed/2024-09-07-projects-i-support.html

    -

    Testing WoL and Shutdown


    +

    Until now, I wasn't aware, that Go is under a ...



    -To test the setup, I shutdown all three machines using the script's shutdown function:
    +Until now, I wasn't aware, that Go is under a BSD-style license (3-clause as it seems). Neat. I don't know why, but I always was under the impression it would be MIT. #bsd #golang

    - -
    [paul@earth]~% wol-f3s shutdown
    -Shutting down f0 (192.168.1.130)...
    -  ✓ Shutdown command sent to f0
    -Shutting down f1 (192.168.1.131)...
    -  ✓ Shutdown command sent to f1
    -Shutting down f2 (192.168.1.132)...
    -  ✓ Shutdown command sent to f2
    -
    -✓ Shutdown commands sent to all machines.
    -
    +go.dev/LICENSE

    -After waiting for them to fully power down (about 1 minute), I sent the WoL magic packets:
    +

    These are some book notes from "Staff Engineer" ...



    - -
    [paul@earth]~% wol-f3s
    -Sending WoL packet to f0 (e8:ff:1e:d7:1c:ac)...
    -Waking up e8:ff:1e:d7:1c:ac...
    -Sending WoL packet to f1 (e8:ff:1e:d7:1e:44)...
    -Waking up e8:ff:1e:d7:1e:44...
    -Sending WoL packet to f2 (e8:ff:1e:d7:1c:a0)...
    -Waking up e8:ff:1e:d7:1c:a0...
    -
    -✓ WoL packets sent. Machines should boot in a few seconds.
    -
    +These are some book notes from "Staff Engineer" – there is some really good insight into what is expected from a Staff Engineer and beyond in the industry. I wish I had read the book earlier.

    -Within 30-50 seconds, all three machines successfully booted up and became accessible via SSH!
    +foo.zone/gemfeed/2024-10-24-staff-engineer-book-notes.gmi (Gemini)
    +foo.zone/gemfeed/2024-10-24-staff-engineer-book-notes.html

    -This also works fine over WiFi, by the way — as long as the laptop and the Beelinks are on the same local network, the router bridges everything. And wol-f3s shutdown does the reverse (SSH + doas poweroff), so I can spin the whole cluster up and down pretty quickly.
    +

    Looking at #Kubernetes, it's pretty much ...



    -

    BIOS Configuration


    +Looking at #Kubernetes, it's pretty much following the Unix way of doing things. It has many tools, but each tool has its own single purpose: DNS, scheduling, container runtime, various controllers, networking, observability, alerting, and more services in the control plane. Everything is managed by different services or plugins, mostly running in their dedicated pods. They don't communicate through pipes, but network sockets, though. #k8s

    -For WoL to work reliably, make sure to check the BIOS settings on each Beelink:
    +

    There has been an outage at the upstream ...



    -
      -
    • Enable "Wake on LAN" (usually under Power Management)
    • -
    • Disable "ERP Support" or "ErP Ready" (this can prevent WoL from working)
    • -
    • Enable "Power on by PCI-E" or "Wake on PCI-E"
    • -

    -The exact menu names vary, but these settings are typically found in the Power Management or Advanced sections of the BIOS.
    +There has been an outage at the upstream network provider for OpenBSD.Amsterdam (hoster, I am using). This was the first real-world test for my KISS HA setup, and it worked flawlessly! All my sites and services failed over automatically to my other #OpenBSD VM!

    -

    Conclusion


    +foo.zone/gemfeed/2024-04-01-KISS-high-availability-with-OpenBSD.gmi (Gemini)
    +foo.zone/gemfeed/2024-04-01-KISS-high-availability-with-OpenBSD.html
    +openbsd.amsterdam/

    -Honestly, the Beelink S12 Pro with the N100 is kind of perfect for this — tiny, cheap, sips power, and runs both Linux and FreeBSD without drama. I'm pretty happy with it.
    +

    One of the more confusing parts in Go, nil ...



    -Beelinks stacked
    +One of the more confusing parts in Go, nil values vs nil errors: #golang

    -To ease cable management, I need to get shorter ethernet cables. I will place the tower on my shelf, where most of the cables will be hidden (together with a UPS, which will also be added to the setup).
    +unexpected-go.com/nil-errors-that-are-non-nil-errors.html

    -Read the next post of this series:
    +

    Agreeably, writing down with Diagrams helps you ...



    -f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    +Agreeably, writing down with Diagrams helps you to think things more through. And keeps others on the same page. Only worth for projects from a certain size, IMHO.

    -Other *BSD-related posts:
    +ntietz.com/blog/reasons-to-write-design-docs/

    -2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability
    -2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    -2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    -2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    -2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    -2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    -2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation (You are currently reading this)
    -2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    -2024-04-01 KISS high-availability with OpenBSD
    -2024-01-13 One reason why I love OpenBSD
    -2022-10-30 Installing DTail on OpenBSD
    -2022-07-30 Let's Encrypt with OpenBSD and Rex
    -2016-04-09 Jails and ZFS with Puppet on FreeBSD
    +

    I like the idea of types in Ruby. Raku is ...



    -E-Mail your comments to paul@nospam.buetow.org :-)
    +I like the idea of types in Ruby. Raku is supports that already, but in Ruby, you must specify the types in a separate .rbs file, which is, in my opinion, cumbersome and is a reason not to use it extensively for now. I believe there are efforts to embed the type information in the standard .rb files, and that the .rbs is just an experiment to see how types could work out without introducing changes into the core Ruby language itself right now? #Ruby #RakuLang

    -Back to the main site
    -
    -
    -
    - - f3s: Kubernetes with FreeBSD - Part 1: Setting the stage - - gemini://foo.zone/gemfeed/2024-11-17-f3s-kubernetes-with-freebsd-part-1.gmi - 2024-11-16T23:20:14+02:00 - - Paul Buetow aka snonux - paul@dev.buetow.org - - This is the first blog post about my f3s series for my self-hosting demands in my home lab. f3s? The 'f' stands for FreeBSD, and the '3s' stands for k3s, the Kubernetes distribution I will use on FreeBSD-based physical machines. - -
    -

    f3s: Kubernetes with FreeBSD - Part 1: Setting the stage


    +github.com/ruby/rbs

    -Published at 2024-11-16T23:20:14+02:00
    +

    So, #Haskell is better suited for general ...



    -This is the first blog post about my f3s series for my self-hosting demands in my home lab. f3s? The "f" stands for FreeBSD, and the "3s" stands for k3s, the Kubernetes distribution I will use on FreeBSD-based physical machines.
    +So, #Haskell is better suited for general purpose than #Rust? I thought deploying something in Haskell means publishing an academic paper :-) Interesting rant about Rust, though:

    -I will post a new entry every month or so (there are too many other side projects for more frequent updates—I bet you can understand).
    +chrisdone.com/posts/rust/

    -These are all the posts so far:
    +

    At first, functional options add a bit of ...



    -2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage (You are currently reading this)
    -2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    -2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    -2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    -2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    -2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    -2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    -2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability
    +At first, functional options add a bit of boilerplate, but they turn out to be quite neat, especially when you have very long parameter lists that need to be made neat and tidy. #golang

    -f3s logo
    +www.calhoun.io/using-functional-options-instead-of-method-chaining-in-go/

    -ChatGPT generated logo..
    +

    Revamping my home lab a little bit. #freebsd ...



    -Let's begin...
    +Revamping my home lab a little bit. #freebsd #bhyve #rocky #linux #vm #k3s #kubernetes #wireguard #zfs #nfs #ha #relayd #k8s #selfhosting #homelab

    -

    Table of Contents


    +foo.zone/gemfeed/2024-11-17-f3s-kubernetes-with-freebsd-part-1.gmi (Gemini)
    +foo.zone/gemfeed/2024-11-17-f3s-kubernetes-with-freebsd-part-1.html

    -
    -

    Why this setup?


    +

    Wondering to which #web #browser I should ...



    -My previous setup was great for learning Terraform and AWS, but it is too expensive. Costs are under control there, but only because I am shutting down all containers after use (so they are offline ninety percent of the time and still cost around $20 monthly). With the new setup, I could run all containers 24/7 at home, which would still be cheaper in terms of electricity consumption. I have a 400 MBit/s uplink (I could have more if I wanted, but it is more than plenty for my use case already).
    +Wondering to which #web #browser I should switch now personally ...

    -From babylon5.buetow.org to .cloud
    +www.osnews.com/story/141100/mozilla-fo..-..dvocacy-for-open-web-privacy-and-more/

    -Migrating off all my containers from AWS ECS means I need a reliable and scalable environment to host my workloads. I wanted something:
    +

    eks-node-viewer is a nifty tool, showing the ...



    -
      -
    • To self-host all my open-source apps (Docker containers).
    • -
    • Fully under my control (goodbye cloud vendor lock-in).
    • -
    • Secure and redundant.
    • -
    • Cost-efficient (after the initial hardware investment).
    • -
    • Something I can poke around with and also pick up new skills.
    • -

    -

    The infrastructure


    +eks-node-viewer is a nifty tool, showing the compute nodes currently in use in the #EKS cluster. especially useful when dynamically allocating nodes with #karpenter or auto scaling groups.

    -This is still in progress, and I need to own the hardware. But in this first part of the blog series, I will outline what I intend to do.
    +github.com/awslabs/eks-node-viewer

    -Diagram
    +

    Have put more Photos on - On my static photo ...



    -

    Physical FreeBSD nodes and Linux VMs


    +Have put more Photos on - On my static photo sites - Generated with a #bash script

    -The setup starts with three physical FreeBSD nodes deployed into my home LAN. On these, I'm going to run Rocky Linux virtual machines with bhyve. Why Linux VMs in FreeBSD and not Linux directly? I want to leverage the great ZFS integration in FreeBSD (among other features), and I have been using FreeBSD for a while in my home lab. And with bhyve, there is a very performant hypervisor available which makes the Linux VMs de-facto run at native speed (another use case of mine would be maybe running a Windows bhyve VM on one of the nodes - but out of scope for this blog series).
    +irregular.ninja

    -https://www.freebsd.org/
    -https://wiki.freebsd.org/bhyve
    +

    In Go, passing pointers are not automatically ...



    -I selected Rocky Linux because it comes with long-term support (I don't want to upgrade the VMs every 6 months). Rocky Linux 9 will reach its end of life in 2032, which is plenty of time! Of course, there will be minor upgrades, but nothing will significantly break my setup.
    +In Go, passing pointers are not automatically faster than values. Pointers often force the memory to be allocated on the heap, adding GC overhad. With values, Go can determine whether to put the memory on the stack instead. But with large structs/objects (how you want to call them) or if you want to modify state, then pointers are the semantic to use. #golang

    -https://rockylinux.org/
    -https://wiki.rockylinux.org/rocky/version/
    +blog.boot.dev/golang/pointers-faster-than-values/

    -Furthermore, I am already using "RHEL-family" related distros at work and Fedora on my main personal laptop. Rocky Linux belongs to the same type of Linux distribution family, so I already feel at home here. I also used Rocky 9 before I switched to AWS ECS. Now, I am switching back in one sense or another ;-)
    +

    Myself being part of an on-call rotations over ...



    -

    Kubernetes with k3s


    +Myself being part of an on-call rotations over my whole professional life, just have learned this lesson "Tell people who are new to on-call: Just have fun" :-) This is a neat blog post to read:

    -These Linux VMs form a three-node k3s Kubernetes cluster, where my containers will reside moving forward. The 3-node k3s cluster will be highly available (in etcd mode), and all apps will probably be deployed with Helm. Prometheus will also be running in k3s, collecting time-series metrics and handling monitoring. Additionally, a private Docker registry will be deployed into the k3s cluster, where I will store some of my self-created Docker images. k3s is the perfect distribution of Kubernetes for homelabbers due to its simplicity and the inclusion of the most useful features out of the box!
    +ntietz.com/blog/what-i-tell-people-new-to-oncall/

    -https://k3s.io/
    +

    Feels good to code in my old love #Perl again ...



    -

    HA volumes for k3s with HAST/ZFS and NFS


    +Feels good to code in my old love #Perl again after a while. I am implementing a log parser for generating site stats of my personal homepage! :-) @Perl

    -Persistent storage for the k3s cluster will be handled by highly available (HA) NFS shares backed by ZFS on the FreeBSD hosts.
    +

    This is an interactive summary of the Go ...



    -On two of the three physical FreeBSD nodes, I will add a second SSD drive to each and dedicate it to a zhast ZFS pool. With HAST (FreeBSD's solution for highly available storage), this pool will be replicated at the byte level to a standby node.
    +This is an interactive summary of the Go release, with a lot of examples utilising iterators in the slices and map packages. Love it! #golang

    -A virtual IP (VIP) will point to the master node. When the master node goes down, the VIP will failover to the standby node, where the ZFS pool will be mounted. An NFS server will listen to both nodes. k3s will use the VIP to access the NFS shares.
    +antonz.org/go-1-23/

    -FreeBSD Wiki: Highly Available Storage
    +

    December 2024



    -You can think of DRBD being the Linux equivalent to FreeBSD's HAST.
    +

    Thats unexpected, you cant remove a NaN key ...



    -

    OpenBSD/relayd to the rescue for external connectivity


    +Thats unexpected, you cant remove a NaN key from a map without clearing it! #golang

    -All apps should be reachable through the internet (e.g., from my phone or computer when travelling). For external connectivity and TLS management, I've got two OpenBSD VMs (one hosted by OpenBSD Amsterdam and another hosted by Hetzner) handling public-facing services like DNS, relaying traffic, and automating Let's Encrypt certificates.
    +unexpected-go.com/you-cant-remove-a-nan-key-from-a-map-without-clearing-it.html

    -All of this (every Linux VM to every OpenBSD box) will be connected via WireGuard tunnels, keeping everything private and secure. There will be 6 WireGuard tunnels (3 k3s nodes times two OpenBSD VMs).
    +

    My second blog post about revamping my home lab ...



    -https://en.wikipedia.org/wiki/WireGuard
    +My second blog post about revamping my home lab a little bit just hit the net. #FreeBSD #ZFS #n100 #k8s #k3s #kubernetes

    -So, when I want to access a service running in k3s, I will hit an external DNS endpoint (with the authoritative DNS servers being the OpenBSD boxes). The DNS will resolve to the master OpenBSD VM (see my KISS highly-available with OpenBSD blog post), and from there, the relayd process (with a Let's Encrypt certificate—see my Let's Encrypt with OpenBSD and Rex blog post) will accept the TCP connection and forward it through the WireGuard tunnel to a reachable node port of one of the k3s nodes, thus serving the traffic.
    +foo.zone/gemfeed/2024-12-03-f3s-kubernetes-with-freebsd-part-2.gmi (Gemini)
    +foo.zone/gemfeed/2024-12-03-f3s-kubernetes-with-freebsd-part-2.html
    +
    +

    Very insightful article about tech hiring in ...


    +
    +Very insightful article about tech hiring in the age of LLMs. As an interviewer, I have experienced some of the scrnarios already first hand...
    +
    +newsletter.pragmaticengineer.com/p/how-genai-changes-tech-hiring
    +
    +

    for #bpf #ebpf performance debugging, have ...


    +
    +for #bpf #ebpf performance debugging, have a look at bpftop from Netflix. A neat tool showing you the estimated CPU time and other performance statistics for all the BPF programs currently loaded into the #linux kernel. Highly recommend!
    +
    +github.com/Netflix/bpftop
    +
    +

    89 things he/she knows about Git commits is a ...


    +
    +89 things he/she knows about Git commits is a neat list of #Git wisdoms
    +
    +www.jvt.me/posts/2024/07/12/things-know-commits/
    +
    +

    I found that working on multiple side projects ...


    +
    +I found that working on multiple side projects concurrently is better than concentrating on just one. This seems inefficient at first, but whenever you tend to lose motivation, you can temporarily switch to another one with full élan. However, remember to stop starting and start finishing. This doesn't mean you should be working on 10+ (and a growing list of) side projects concurrently! Select your projects and commit to finishing them before starting the next thing. For example, my current limit of concurrent side projects is around five.
    +
    +

    Agreed? Agreed. Besides #Ruby, I would also ...


    +
    +Agreed? Agreed. Besides #Ruby, I would also add #RakuLang and #Perl @Perl to the list of languages that are great for shell scripts - "Making Easy Things Easy and Hard Things Possible"
    +
    +lucasoshiro.github.io/posts-en/2024-06-17-ruby-shellscript/
    +
    +

    Plan9 assembly format in Go, but wait, it's not ...


    +
    +Plan9 assembly format in Go, but wait, it's not the Operating System Plan9! #golang #rabbithole
    +
    +www.osnews.com/story/140941/go-plan9-memo-speeding-up-calculations-450/
    +
    +

    This is a neat blog post about the Helix text ...



    -KISS high-availability with OpenBSD
    -Let's Encrypt with OpenBSD and Rex
    +This is a neat blog post about the Helix text editor, to which I personally switched around a year ago (from NeoVim). I should blog about my experience as well. To summarize: I am using it together with the terminal multiplexer #tmux. It doesn't bother me that Helix is purely terminal-based and therefore everything has to be in the same font. #HelixEditor

    -The OpenBSD setup described here already exists and is ready to use. The only thing that does not yet exist is the configuration of relayd to forward requests to k3s through the WireGuard tunnel(s).
    +jonathan-frere.com/posts/helix/

    -

    Data integrity


    +

    This blog post is basically a rant against ...



    -

    Periodic backups


    +This blog post is basically a rant against DataDog... Personally, I don't have much experience with DataDog (actually, I have never used it), but one reason to work with logs at my day job (with over 2,000 physical server machines) and to be cost-effective is by using dtail! #dtail #logs #logmanagement

    -Let's face it, backups are non-negotiable.
    +crys.site/blog/2024/reinventint-the-weel/
    +dtail.dev

    -On the HAST master node, incremental and encrypted ZFS snapshots are created daily and automatically backed up to AWS S3 Glacier Deep Archive via CRON. I have a bunch of scripts already available, which I currently use for a similar purpose on my FreeBSD Home NAS server (an old ThinkPad T440 with an external USB drive enclosure, which I will eventually retire when the HAST setup is ready). I will copy them and slightly modify them to fit the purpose.
    +

    Quick trick to get Helix themes selected ...



    -There's also zfstools in the ports, which helps set up an automatic snapshot regime:
    +Quick trick to get Helix themes selected randomly #HelixEditor

    -https://www.freshports.org/sysutils/zfstools
    +foo.zone/gemfeed/2024-12-15-random-helix-themes.gmi (Gemini)
    +foo.zone/gemfeed/2024-12-15-random-helix-themes.html

    -The backup scripts also perform some zpool scrubbing now and then. A scrub once in a while keeps the trouble away.
    +

    Example where complexity attacks you from ...



    -

    Power protection


    +Example where complexity attacks you from behind #k8s #kubernetes #OpenAI

    -Power outages are regularly in my area, so a UPS keeps the infrastructure running during short outages and protects the hardware. I'm still trying to decide which hardware to get, and I still need one, as my previous NAS is simply an older laptop that already has a battery for power outages. However, there are plenty of options to choose from. My main criterion is that the UPS should be silent, as the whole setup will be installed in an upper shelf unit in my daughter's room. ;-)
    +surfingcomplexity.blog/2024/12/14/quic..-..ecent-openai-public-incident-write-up/

    -

    Monitoring: Keeping an eye on everything


    +

    LLMs for Ops? Summaries of logs, probabilities ...



    -I want to know when stuff breaks (ideally before it breaks), so monitoring is a big part of the plan.
    +LLMs for Ops? Summaries of logs, probabilities about correctness, auto-generating Ansible, some uses cases are there. Wouldn't trust it fully, though.

    -

    Prometheus and Grafana


    +youtu.be/WodaffxVq-E?si=noY0egrfl5izCSQI

    -Inside the k3s cluster, Prometheus will be deployed to handle metrics collection. It will be configured to scrape data from my Kubernetes workloads, nodes, and any services I monitor. Prometheus also integrates with Alertmanager to generate alerts based on predefined thresholds or conditions.
    +

    Excellent article about your dream Product ...



    -https://prometheus.io
    +Excellent article about your dream Product Manager: Why every software team needs a product manager to thrive via @wallabagapp

    -For visualization, Grafana will be deployed alongside Prometheus. I mostly just want dashboards for CPU, memory, and pod health — the usual stuff. Makes it way easier to figure out what's going wrong when something inevitably does.
    +testdouble.com/insights/why-product-ma..-..s-accelerate-improve-software-delivery

    -https://grafana.com
    +

    I just finished reading all chapters of CPU ...



    -

    Gogios: My custom alerting system


    +I just finished reading all chapters of CPU land: ... not claiming to remember every detail, but it is a great refresher how CPUs and operating systems actually work under the hood when you execute a program, which we tend to forget in our higher abstraction world. I liked the "story" and some of the jokes along the way! Size wise, it is pretty digestable (not talking about books, but only 7 web articles/chapters)! #cpu #linux #unix #kernel #macOS

    -Alerts generated by Prometheus are forwarded to Alertmanager, which I will configure to work with Gogios, a lightweight monitoring and alerting system I wrote myself. Gogios runs on one of my OpenBSD VMs. At regular intervals, Gogios scrapes the alerts generated in the k3s cluster and notifies me via Email.
    +cpu.land/

    -KISS server monitoring with Gogios
    +

    Indeed, useful to know this stuff! #sre ...



    -Ironically, I implemented Gogios to avoid using more complex alerting systems like Prometheus, but here we go—it integrates well now.
    +Indeed, useful to know this stuff! #sre

    -

    Conclusion


    +biriukov.dev/docs/resolver-dual-stack-..-..resolvers-and-dual-stack-applications/

    -This setup may be just the beginning. Some ideas I'm thinking about for the future:
    +

    It's the small things, which make Unix like ...



    -
      -
    • Adding more FreeBSD nodes (in different physical locations, maybe at my wider family's places? WireGuard would make it possible!) for better redundancy. (HA storage then might be trickier)
    • -
    • Deploying more Docker apps (data-intensive ones, like a picture gallery, my entire audiobook catalogue, or even a music server) to k3s.
    • -

    -For now, though, I'm focused on completing the migration from AWS ECS and getting all my Docker containers running smoothly in k3s.
    +It's the small things, which make Unix like systems, like GNU/Linux, interesting. Didn't know about this #GNU #Tar behaviour yet:

    -Anyway, stay tuned — in part 2 I'll probably get into the hardware and OS setup.
    +xeiaso.net/notes/2024/pop-quiz-tar/

    -Read the next post of this series:
    +

    My New Year's resolution is not to start any ...



    -f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    +My New Year's resolution is not to start any new non-fiction books (or only very few) but to re-read and listen to my favorites, which I read to reflect on and see things from different perspectives. Every time you re-read a book, you gain new insights.<nil>17491

    -Other *BSD-related posts:
    +Other related posts:

    -2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability
    -2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    -2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    -2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    -2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    -2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    -2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    -2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage (You are currently reading this)
    -2024-04-01 KISS high-availability with OpenBSD
    -2024-01-13 One reason why I love OpenBSD
    -2022-10-30 Installing DTail on OpenBSD
    -2022-07-30 Let's Encrypt with OpenBSD and Rex
    -2016-04-09 Jails and ZFS with Puppet on FreeBSD
    +2026-01-01 Posts from July to December 2025
    +2025-07-01 Posts from January to June 2025
    +2025-01-01 Posts from October to December 2024 (You are currently reading this)

    E-Mail your comments to paul@nospam.buetow.org :-)

    @@ -17531,726 +17752,773 @@ Waking up e8:ff:1e:d7:1c:a0... - 'Staff Engineer' book notes - - gemini://foo.zone/gemfeed/2024-10-24-staff-engineer-book-notes.gmi - 2024-10-24T20:57:44+03:00 + Random Helix Themes + + gemini://foo.zone/gemfeed/2024-12-15-random-helix-themes.gmi + 2024-12-15T13:55:05+02:00 Paul Buetow aka snonux paul@dev.buetow.org - These are my personal takeaways after reading 'Staff Engineer' by Will Larson. Note that the book contains much more knowledge wisdom and that these notes only contain points I personally found worth writing down. This is mainly for my own use, but you might find it helpful too. + I thought it would be fun to have a random Helix theme every time I open a new shell. Helix is the text editor I use.
    -

    "Staff Engineer" book notes


    -
    -Published at 2024-10-24T20:57:44+03:00
    -
    -These are my personal takeaways after reading "Staff Engineer" by Will Larson. Note that the book contains much more knowledge wisdom and that these notes only contain points I personally found worth writing down. This is mainly for my own use, but you might find it helpful too.
    -
    -
    -         ,..........   ..........,
    -     ,..,'          '.'          ',..,
    -    ,' ,'            :            ', ',
    -   ,' ,'             :             ', ',
    -  ,' ,'              :              ', ',
    - ,' ,'............., : ,.............', ',
    -,'  '............   '.'   ............'  ',
    - '''''''''''''''''';''';''''''''''''''''''
    -                    '''
    -
    -
    -

    Table of Contents


    -
    -
    -

    The Four Archetypes of a Staff Engineer


    -
    -Larson defines four archetypes. You'll probably recognize yourself in one (or a mix):
    -
    -
      -
    • Tech Lead: You own the technical direction of a team. Architecture, quality, keeping everyone aligned.
    • -
    • Solver: You get thrown at the hard cross-team problems. Basically a firefighter for gnarly stuff.
    • -
    • Architect: Long-term technical vision. Standards, system design, things that need to last.
    • -
    • Right Hand: Trusted technical advisor to leadership. Strategy, org politics, the stuff nobody else wants to touch.
    • -

    -

    Influence and Impact over Authority


    +

    Random Helix Themes



    -You won't have direct authority over most people or teams you work with. Influence is the actual tool here. You have to persuade, align, sometimes just nudge people in the right direction. No one reports to you, but you still need to drive outcomes.
    +Published at 2024-12-15T13:55:05+02:00; Last updated 2024-12-18

    -

    Breadth and Depth of Knowledge


    +I thought it would be fun to have a random Helix theme every time I open a new shell. Helix is the text editor I use.

    -You need to know a bit about a lot of things (infra, security, product, etc.) but still be able to go deep in a few areas. The tricky part is keeping that breadth current without spreading yourself too thin.
    +https://helix-editor.com/

    -

    Mentorship and Sponsorship


    +So I put this into my zsh dotfiles (in some editor.zsh.source in my ~ directory):

    -Mentoring is obvious -- help people grow technically and career-wise. But sponsorship is the one that surprised me: actively advocating for people, creating opportunities for them, pushing them forward. It's not just answering questions, it's putting your reputation behind someone.

    -

    Managing Up and Across


    +So every time I open a new terminal or shell, editor::helix::random_theme gets called, which randomly selects a theme from all installed ones and updates the helix config accordingly.

    -You have to manage up (set expectations with leadership, advocate for technical needs) and across (work with peer teams, build alignment). Basically a lot of communication and relationship building. Easy to underestimate this one.

    -

    Strategic Thinking


    +

    A better version



    -Senior engineers focus on execution. Staff engineers need to think about what happens months or years from now. That means sometimes pushing back on short-term pressures in favor of longer-term architectural decisions. Not always a popular move.
    +Update 2024-12-18: This is an improved version, which works cross platform (e.g., also on MacOS) and multiple theme directories:

    -

    Emotional Intelligence



    -The higher you go, the more soft skills matter. Building relationships, resolving conflicts, reading the room. I think this catches a lot of engineers off guard -- you can't just be the smartest person technically anymore.
    +I hope you had some fun. E-Mail your comments to paul@nospam.buetow.org :-)

    -
    +Back to the main site
    +
    +
    +
    + + f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation + + gemini://foo.zone/gemfeed/2024-12-03-f3s-kubernetes-with-freebsd-part-2.gmi + 2024-12-02T23:48:21+02:00, last updated Sun 11 Jan 10:30:00 EET 2026 + + Paul Buetow aka snonux + paul@dev.buetow.org + + This is the second blog post about my f3s series for my self-hosting demands in my home lab. f3s? The 'f' stands for FreeBSD, and the '3s' stands for k3s, the Kubernetes distribution I will use on FreeBSD-based physical machines. + +
    +

    f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation



    -A lot of the problems you deal with are poorly defined. Nobody knows exactly what the problem is, let alone the solution. You have to be comfortable operating in that fog and still making progress.
    +Published at 2024-12-02T23:48:21+02:00, last updated Sun 11 Jan 10:30:00 EET 2026

    -

    Visible and Invisible Work


    +This is the second blog post about my f3s series for my self-hosting demands in my home lab. f3s? The "f" stands for FreeBSD, and the "3s" stands for k3s, the Kubernetes distribution I will use on FreeBSD-based physical machines.

    -A huge chunk of Staff Engineer work is invisible. Aligning teams, influencing decisions, resolving conflicts -- none of that shows up as commits. Larson says you need to get comfortable with that, which I think is genuinely hard for engineers who are used to shipping things.
    +We set the stage last time; this time, we will set up the hardware for this project.

    -

    Scaling Yourself


    +These are all the posts so far:

    -You can't do everything yourself anymore. Write things down, build repeatable processes, mentor others, automate what you can. The goal is to make teams more effective even when you're not in the room.
    +2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    +2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation (You are currently reading this)
    +2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    +2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    +2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    +2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    +2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    +2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability

    -

    Career Progression and Title Inflation


    +f3s logo

    -"Staff Engineer" means wildly different things at different companies. Titles don't always match actual responsibility or skill. Focus on the work and impact, not the title.
    +ChatGPT generated logo..

    -Some of the above is less about technical chops and more about the strategic and interpersonal side of things. Anyway, here are some more concrete takeaways:
    +Let's continue...

    -

    Not a faster Senior Engineer


    +

    Table of Contents




    -It's important to know what work or which role most energizes you. A Staff engineer is not a more senior engineer. A Staff engineer also fits into another archetype.
    -
    -As a staff engineer, you are always expected to go beyond your comfort zone and learn new things.
    -
    -Your job sometimes will feel like an SEM and sometimes strangely similar to your senior roles.
    -
    -A Staff engineer is, like a Manager, a leader. However, being a Manager is a specific job. Leaders can apply to any job, especially to Staff engineers.
    -
    -

    The Balance


    -
    -The more senior you become, the more responsibility you will have to cope with them in less time. Balance your speed of progress with your personal life, don't work late hours and don't skip these personal care events.
    -
    -Do fewer things but do them better. Everything done will accelerate the organization. Everything else will drag it down—quality over quantity.
    -
    -Don't work at ten things and progress slowly; focus on one thing and finish it.
    -
    -Only spend some of the time firefighting. Have time for deep thinking. Only deep think some of the time. Otherwise, you lose touch with reality.
    -
    -Sebactical: Take at least six months. Otherwise, it won't be as restored.
    +

    Deciding on the hardware



    -

    More things


    +Note that the OpenBSD VMs included in the f3s setup (which will be used later in this blog series for internet ingress - as you know from the first part of this blog series) are already there. These are virtual machines that I rent at OpenBSD Amsterdam and Hetzner.

    -
      -
    • Provide simple but widely used tools. Complex and powerful tools will have power users but only a very few. All others will not use the tool.
    • -
    • In meetings, when someone is inactive, try to pull him in. Pull in max one person at a time. Don't open the discussion to multiple people.
    • -
    • Get used to writing things down and repeating yourself. You will scale yourself much more.
    • -
    • Title inflation: skills correspond to work, but the titles don't.
    • -

    -E-Mail your comments to paul@nospam.buetow.org :-)
    +https://openbsd.amsterdam
    +https://hetzner.cloud

    -Other book notes of mine are:
    +This means that the FreeBSD boxes need to be covered, which will later be running k3s in Linux VMs via bhyve hypervisor.

    -2025-11-02 'The Courage To Be Disliked' book notes
    -2025-06-07 'A Monk's Guide to Happiness' book notes
    -2025-04-19 'When: The Scientific Secrets of Perfect Timing' book notes
    -2024-10-24 'Staff Engineer' book notes (You are currently reading this)
    -2024-07-07 'The Stoic Challenge' book notes
    -2024-05-01 'Slow Productivity' book notes
    -2023-11-11 'Mind Management' book notes
    -2023-07-17 'Software Developers Career Guide and Soft Skills' book notes
    -2023-05-06 'The Obstacle is the Way' book notes
    -2023-04-01 'Never split the difference' book notes
    -2023-03-16 'The Pragmatic Programmer' book notes
    +I've been considering whether to use Raspberry Pis or look for alternatives. It turns out that complete N100-based mini-computers aren't much more expensive than Raspberry Pi 5s, and they don't require assembly. Furthermore, I like that they are AMD64 and not ARM-based, which increases compatibility with some applications (e.g., I might want to virtualize Windows (via bhyve) on one of those, though that's out of scope for this blog series).

    -Back to the main site
    -
    -
    -
    - - Gemtexter 3.0.0 - Let's Gemtext again⁴ - - gemini://foo.zone/gemfeed/2024-10-02-gemtexter-3.0.0-lets-gemtext-again-4.gmi - 2024-10-01T21:46:26+03:00 - - Paul Buetow aka snonux - paul@dev.buetow.org - - I proudly announce that I've released Gemtexter version `3.0.0`. What is Gemtexter? It's my minimalist static site generator for Gemini Gemtext, HTML and Markdown, written in GNU Bash. - -
    -

    Gemtexter 3.0.0 - Let's Gemtext again⁴


    +

    Not ARM but Intel N100



    -Published at 2024-10-01T21:46:26+03:00
    +I needed something compact, efficient, and capable enough to handle the demands of a small-scale Kubernetes cluster and preferably something I don't have to assemble a lot. After researching, I decided on the Beelink S12 Pro with Intel N100 CPUs.

    -I proudly announce that I've released Gemtexter version 3.0.0. What is Gemtexter? It's my minimalist static site generator for Gemini Gemtext, HTML and Markdown, written in GNU Bash.
    +Beelink Mini S12 Pro N100 official page

    -https://codeberg.org/snonux/gemtexter
    +The Intel N100 CPUs are built on the "Alder Lake-N" architecture. These chips are designed to balance performance and energy efficiency well. With four cores, they're more than capable of running multiple containers, even with moderate workloads. Plus, they consume only around 8W of power (ok, that's more than the Pis...), keeping the electricity bill low enough and the setup quiet - perfect for 24/7 operation.

    -
    --=[ typewriters ]=-  1/98
    -                                      .-------.
    -       .-------.                     _|~~ ~~  |_
    -      _|~~ ~~  |_       .-------.  =(_|_______|_)
    -    =(_|_______|_)=    _|~~ ~~  |_   |:::::::::|    .-------.
    -      |:::::::::|    =(_|_______|_)  |:::::::[]|   _|~~ ~~  |_
    -      |:::::::[]|      |:::::::::|   |o=======.| =(_|_______|_)
    -      |o=======.|      |:::::::[]|   `"""""""""`   |:::::::::|
    - jgs  `"""""""""`      |o=======.|                 |:::::::[]|
    -  mod. by Paul Buetow  `"""""""""`                 |o=======.|
    -                                                   `"""""""""`
    -
    +Beelink preparation

    -

    Table of Contents


    +The Beelink comes with the following specs:

      -
    • Gemtexter 3.0.0 - Let's Gemtext again⁴
    • -
    • Why Bash?
    • -
    • HTML exact variant is the only variant
    • -
    • Table of Contents auto-generation
    • -
    • Configurable themes
    • -
    • No use of webfonts by default
    • -
    • More
    • +
    • 12th Gen Intel N100 processor, with four cores and four threads, and a maximum frequency of up to 3.4 GHz.
    • +
    • 16 GB of DDR4 RAM, with a maximum (official) size of 16 GB (but people could install 32 GB on it).
    • +
    • 500 GB M.2 SSD, with the option to install a 2nd 2.5 SSD drive (which I want to make use of later in this blog series).
    • +
    • GBit ethernet
    • +
    • Four USB 3.2 Gen2 ports (maybe I want to mount something externally at some point)
    • +
    • Dimensions and weight: 115*102*39mm, 280g
    • +
    • Silent cooling system.
    • +
    • HDMI output (needed only for the initial installation and maybe for troubleshooting later)
    • +
    • Auto power on via WoL (may make use of it)
    • +
    • Wi-Fi (not going to use it)

    -

    Why Bash?


    +I bought three (3) of them for the cluster I intend to build.

    -This project is too complex for a Bash script. Writing it in Bash was to try out how maintainable a "larger" Bash script could be. It's still pretty maintainable and helps me try new Bash tricks here and then!
    +

    -Let's list what's new!
    +Unboxing was uneventful. Every Beelink PC came with:

    -

    HTML exact variant is the only variant


    +
      +
    • An AC power adapter
    • +
    • An HDMI cable
    • +
    • A VESA mount with screws (not using it as of now)
    • +
    • Some manuals
    • +
    • The pre-assembled Beelink PC itself.
    • +
    • A "Hello" post card (??)
    • +

    +Overall, I love the small form factor.

    -The last version of Gemtexter introduced the HTML exact variant, which wasn't enabled by default. This version of Gemtexter removes the previous (inexact) variant and makes the exact variant the default. This is a breaking change, which is why there is a major version bump of Gemtexter. Here is a reminder of what the exact variant was:
    +

    Network switch



    -Gemtexter is there to convert your Gemini Capsule into other formats, such as HTML and Markdown. An HTML exact variant can now be enabled in the gemtexter.conf by adding the line declare -rx HTML_VARIANT=exact. The HTML/CSS output changed to reflect a more exact Gemtext appearance and to respect the same spacing as you would see in the Geminispace.
    +I went with the tp-link mini 5-port switch, as I had a spare one available. That switch will be plugged into my wall ethernet port, which connects directly to my fiber internet router with 100 Mbit/s down and 50 Mbit/s upload speed.

    -

    Table of Contents auto-generation


    +Switch

    -Just add...
    +

    Installing FreeBSD



    -
    - << template::inline::toc
    +

    Base install


    +
    +First, I downloaded the boot-only ISO of the latest FreeBSD release and dumped it on a USB stick via my Fedora laptop:
    +
    + +
    [paul@earth]~/Downloads% sudo dd \
    +  if=FreeBSD-14.1-RELEASE-amd64-bootonly.iso \
    +  of=/dev/sda conv=sync
     

    -...into a Gemtexter template file and Gemtexter will automatically generate a table of contents for the page based on the headings (see this page's ToC for example). The ToC will also have links to the relevant sections in HTML and Markdown output. The Gemtext format does not support links, so the ToC will simply be displayed as a bullet list.
    +Next, I plugged the Beelinks (one after another) into my monitor via HDMI (the resolution of the FreeBSD text console seems strangely stretched, as I am using the LG Dual Up monitor), connected Ethernet, an external USB keyboard, and the FreeBSD USB stick, and booted the devices up. With F7, I entered the boot menu and selected the USB stick for the FreeBSD installation.

    -

    Configurable themes


    +The installation was uneventful. I selected:

    -It was always possible to customize the style of a Gemtexter's resulting HTML page, but all the config options were scattered across multiple files. Now, the CSS style, web fonts, etc., are all configurable via themes.
    +
      +
    • Guided ZFS on root (pool zroot)
    • +
    • Unencrypted ZFS (I will encrypt separate datasets later; I want it to be able to boot without manual interaction)
    • +
    • Static IP configuration (to ensure that the boxes always have the same IPs, even after switching the router/DHCP server)
    • +
    • I decided to enable the SSH daemon, NTP server, and NTP time synchronization at boot, and I also enabled powerd for automatic CPU frequency scaling.
    • +
    • In addition to root, I added a personal user, paul, whom I placed in the wheel group.
    • +

    +After doing all that three times (once for each Beelink PC), I had three ready-to-use FreeBSD boxes! Their hostnames are f0, f1 and f2!

    -Simply configure HTML_THEME_DIR in the gemtexter.conf file to the corresponding directory. For example:
    +Beelink installation
    +
    +

    Latest patch level and customizing /etc/hosts


    +
    +After the first boot, I upgraded to the latest FreeBSD patch level as follows:

    -
    declare -xr HTML_THEME_DIR=./extras/html/themes/simple
    +
    root@f0:~ # freebsd-update fetch
    +root@f0:~ # freebsd-update install
    +root@f0:~ # freebsd-update reboot
     

    -To customize the theme or create your own, simply copy the theme directory and modify it as needed. This makes it also much easier to switch between layouts.
    -
    -

    No use of webfonts by default


    +I also added the following entries for the three FreeBSD boxes to the /etc/hosts file:

    -The default theme is now "back to the basics" and does not utilize any web fonts. The previous themes are still part of the release and can be easily configured. These are currently the future and business themes. You can check them out from the themes directory.
    + +
    root@f0:~ # cat <<END >>/etc/hosts
    +192.168.1.130 f0 f0.lan f0.lan.buetow.org
    +192.168.1.131 f1 f1.lan f1.lan.buetow.org
    +192.168.1.132 f2 f2.lan f2.lan.buetow.org
    +END
    +

    -

    More


    +You might wonder why bother using the hosts file? Why not use DNS properly? The reason is simplicity. I don't manage 100 hosts, only a few here and there. Having an OpenWRT router in my home, I could also configure everything there, but maybe I'll do that later. For now, keep it simple and straightforward.

    -Additionally, there were a couple of bug fixes, refactorings and overall improvements in the documentation made.
    +

    After install



    -E-Mail your comments to paul@nospam.buetow.org :-)
    +After that, I installed the following additional packages:

    -Other related posts are:
    + +
    root@f0:~ # pkg install helix doas zfs-periodic uptimed
    +

    -2024-10-02 Gemtexter 3.0.0 - Let's Gemtext again⁴ (You are currently reading this)
    -2023-07-21 Gemtexter 2.1.0 - Let's Gemtext again³
    -2023-03-25 Gemtexter 2.0.0 - Let's Gemtext again²
    -2022-08-27 Gemtexter 1.1.0 - Let's Gemtext again
    -2021-06-05 Gemtexter - One Bash script to rule it all
    -2021-04-24 Welcome to the Geminispace
    +

    Helix editor



    -Back to the main site
    -
    -
    -
    - - Site Reliability Engineering - Part 4: Onboarding for On-Call Engineers - - gemini://foo.zone/gemfeed/2024-09-07-site-reliability-engineering-part-4.gmi - 2024-09-07T16:27:58+03:00 - - Paul Buetow aka snonux - paul@dev.buetow.org - - Welcome to Part 4 of my Site Reliability Engineering (SRE) series. I'm currently working as a Site Reliability Engineer, and I’m here to share what SRE is all about in this blog series. - -
    -

    Site Reliability Engineering - Part 4: Onboarding for On-Call Engineers


    +Helix? It's my favourite text editor. I have nothing against vi but like hx (Helix) more!

    -Published at 2024-09-07T16:27:58+03:00
    +https://helix-editor.com/

    -Welcome to Part 4 of my Site Reliability Engineering (SRE) series. I'm currently working as a Site Reliability Engineer, and I’m here to share what SRE is all about in this blog series.
    +

    doas



    -2023-08-18 Site Reliability Engineering - Part 1: SRE and Organizational Culture
    -2023-11-19 Site Reliability Engineering - Part 2: Operational Balance
    -2024-01-09 Site Reliability Engineering - Part 3: On-Call Culture
    -2024-09-07 Site Reliability Engineering - Part 4: Onboarding for On-Call Engineers (You are currently reading this)
    -2026-03-01 Site Reliability Engineering - Part 5: System Design, Incidents, and Learning
    +doas? It's a pretty neat (and KISS) replacement for sudo. It has far fewer features than sudo, which is supposed to make it more secure. Its origin is the OpenBSD project. For doas, I accepted the default configuration (where users in the wheel group are allowed to run commands as root):

    -
    -       __..._   _...__
    -  _..-"      `Y`      "-._
    -  \ Once upon |           /
    -  \\  a time..|          //
    -  \\\         |         ///
    -   \\\ _..---.|.---.._ ///
    -jgs \\`_..---.Y.---.._`//	
    +
    +
    root@f0:~ # cp /usr/local/etc/doas.conf.sample /usr/local/etc/doas.conf
     

    -This time, I want to share some tips on how to onboard software engineers, QA engineers, and Site Reliability Engineers (SREs) to the primary on-call rotation. Traditionally, onboarding might take half a year (depending on the complexity of the infrastructure), but with a bit of strategy and structured sessions, we've managed to reduce it to just six weeks per person. Let's dive in!
    +https://man.openbsd.org/doas

    -

    Setting the Scene: Tier-1 On-Call Rotation


    +

    Periodic ZFS snapshotting



    -First things first, let's talk about Tier-1. This is where the magic begins. Tier-1 covers over 80% of the common on-call cases and is the perfect breeding ground for new on-call engineers to get their feet wet. It's designed to be manageable training ground.
    +zfs-periodic is a nifty tool for automatically creating ZFS snapshots. I decided to go with the following configuration here:

    -

    Why Tier-1?


    + +
    root@f0:~ # cat <<END >>/etc/periodic.conf
    +daily_zfs_snapshot_enable="YES"
    +daily_zfs_snapshot_pools="zroot"
    +daily_zfs_snapshot_keep="7"
    +weekly_zfs_snapshot_enable="YES"
    +weekly_zfs_snapshot_pools="zroot"
    +weekly_zfs_snapshot_keep="5"
    +monthly_zfs_snapshot_enable="YES"
    +monthly_zfs_snapshot_pools="zroot"
    +monthly_zfs_snapshot_keep="6"
    +END
    +

    -
      -
    • Easy to Understand: Every on-call engineer should be familiar with Tier-1 tasks.
    • -
    • Training Ground: This is where engineers start their on-call career. It's purposefully kept simple so that it's not overwhelming right off the bat.
    • -
    • Runbook/recipe driven: Every alert is attached to a comprehensive runbook, making it easy for every engineer to follow.
    • -

    -

    Onboarding Process: From 6 Months to 6 Weeks


    +https://github.com/ross/zfs-periodic

    -So how did we cut down the onboarding time so drastically? Here’s the breakdown of our process:
    +Note: We have not added zdata to the list of snapshot pools. Currently, this pool does not exist yet, but it will be created later in this blog series. zrepl, which we will use for replication, later in this blog series will manage the zdata snapshots.

    -Knowledge Transfer (KT) Sessions: We kicked things off with more than 10 KT sessions, complete with video recordings. These sessions are comprehensive and cover everything from the basics to some more advanced topics. The recorded sessions mean that new engineers can revisit them anytime they need a refresher.
    +

    Uptime tracking



    -Shadowing Sessions: Each new engineer undergoes two on-call week shadowing sessions. This hands-on experience is invaluable. They get to see real-time incident handling and resolution, gaining practical knowledge that's hard to get from just reading docs.
    +uptimed? I like to track my uptimes. This is how I configured the daemon:

    -Comprehensive Runbooks: We created 64 runbooks (by the time writing this probably more than 100) that are composable like Lego bricks. Each runbook covers a specific scenario and guides the engineer step-by-step to resolution. Pairing these with monitoring alerts linked directly to Confluence docs, and from there to the respective runbooks, ensures every alert can be navigated with ease (well, there are always exceptions to the rule...).
    + +
    root@f0:~ # cp /usr/local/mimecast/etc/uptimed.conf-dist \
    +  /usr/local/mimecast/etc/uptimed.conf 
    +root@f0:~ # hx /usr/local/mimecast/etc/uptimed.conf
    +
    +
    +In the Helix editor session, I changed LOG_MAXIMUM_ENTRIES to 0 to keep all uptime entries forever and not cut off at 50 (the default config). After that, I enabled and started uptimed:

    -Self-Sufficiency & Confidence Building: With all these resources at their fingertips, our on-call engineers become self-sufficient for most of the common issues they'll face (new starters can now handle around 80% of the most common issue after 6 weeks they had joined the company). This boosts their confidence and ensures they can handle Tier-1 incidents independently.
    + +
    root@f0:~ # service uptimed enable
    +root@f0:~ # service uptimed start
    +

    -Documentation and Feedback Loop: Continuous improvement is key. We regularly update our documentation based on feedback from the engineers. This makes our process even more robust and user-friendly.
    +To check the current uptime stats, I can now run uprecords:

    -

    It's All About the Tiers


    + +
     root@f0:~ # uprecords
    +     #               Uptime | System                                     Boot up
    +----------------------------+---------------------------------------------------
    +->   1     0 days, 00:07:34 | FreeBSD 14.1-RELEASE      Mon Dec  2 12:21:44 2024
    +----------------------------+---------------------------------------------------
    +NewRec     0 days, 00:07:33 | since                     Mon Dec  2 12:21:44 2024
    +    up     0 days, 00:07:34 | since                     Mon Dec  2 12:21:44 2024
    +  down     0 days, 00:00:00 | since                     Mon Dec  2 12:21:44 2024
    +   %up              100.000 | since                     Mon Dec  2 12:21:44 2024
    +

    -Let’s briefly touch on the Tier levels:
    +This is how I track the uptimes for all of my host:

    -
      -
    • Tier 1: Easy and foundational tasks. Perfect for getting new engineers started. This covers around 80% of all on-call cases we face. This is what we trained on.
    • -
    • Tier 2: Slightly more complex, requiring more background knowledge. We trained on some of the topics but not all.
    • -
    • Tier 3: Requires a good understanding of the platform/architecture. Likely needs KT sessions with domain experts.
    • -
    • Tier DE (Domain Expert): The heavy hitters. Domain experts are required for these tasks.
    • -

    -

    Growing into Higher Tiers


    +Unveiling guprecords.raku: Global Uptime Records with Raku-
    +https://github.com/rpodgorny/uptimed

    -From Tier-1, engineers naturally grow into Tier-2 and beyond. The structured training and gradual increase in complexity help ensure a smooth transition as they gain experience and confidence. The key here is that engineers stay curous and engaged in the on-call, so that they always keep learning.
    +

    Hardware check



    -

    Keeping Runbooks Up to Date


    +

    Ethernet



    -It is important that runbooks are not a "project to be finished"; runbooks have to be maintained and updated over time. Sections may change, new runbooks need to be added, and old ones can be deleted. So the acceptance criteria of an on-call shift would not just be reacting to alerts and incidents, but also reviewing and updating the current runbooks.
    +Works. Nothing eventful, really. It's a cheap Realtek chip, but it will do what it is supposed to do.

    -

    Conclusion


    + +
    paul@f0:~ % ifconfig re0
    +re0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
    +        options=8209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
    +        ether e8:ff:1e:d7:1c:ac
    +        inet 192.168.1.130 netmask 0xffffff00 broadcast 192.168.1.255
    +        inet6 fe80::eaff:1eff:fed7:1cac%re0 prefixlen 64 scopeid 0x1
    +        inet6 fd22:c702:acb7:0:eaff:1eff:fed7:1cac prefixlen 64 detached autoconf
    +        inet6 2a01:5a8:304:1d5c:eaff:1eff:fed7:1cac prefixlen 64 autoconf pltime 10800 vltime 14400
    +        media: Ethernet autoselect (1000baseT <full-duplex>)
    +        status: active
    +        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
    +

    -By structuring the onboarding process with KT sessions, shadowing, comprehensive runbooks, and a feedback loop, we've been able to fast-track the process from six months to just six weeks. This not only prepares our engineers for the on-call rotation quicker but also ensures they're confident and capable when handling incidents.
    +

    RAM



    -If you're looking to optimize your on-call onboarding process, these strategies could be your ticket to a more efficient and effective transition. Happy on-calling!
    +All there:

    -Continue with the fifth part of this series:
    + +
    paul@f0:~ % sysctl hw.physmem
    +hw.physmem: 16902905856
    +
    +

    -2026-03-01 Site Reliability Engineering - Part 5: System Design, Incidents, and Learning
    +

    CPUs



    -E-Mail your comments to paul@nospam.buetow.org :-)
    +They work:

    -Back to the main site
    -
    -
    -
    - - Projects I financially support - - gemini://foo.zone/gemfeed/2024-09-07-projects-i-support.gmi - 2024-09-07T16:04:19+03:00 - - Paul Buetow aka snonux - paul@dev.buetow.org - - This is the list of projects and initiatives I support/sponsor. - -
    -

    Projects I financially support


    + +
    paul@f0:~ % sysctl dev.cpu | grep freq:
    +dev.cpu.3.freq: 705
    +dev.cpu.2.freq: 705
    +dev.cpu.1.freq: 604
    +dev.cpu.0.freq: 604
    +

    -Published at 2024-09-07T16:04:19+03:00
    +

    CPU throttling



    -This is the list of projects and initiatives I support/sponsor.
    +With powerd running, CPU freq is dowthrottled when the box isn't jam-packed. To stress it a bit, I run ubench to see the frequencies being unthrottled again:

    -
    -||====================================================================||
    -||//$\\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\//$\\||
    -||(100)==================| FEDERAL SPONSOR NOTE |================(100)||
    -||\\$//        ~         '------========--------'                \\$//||
    -||<< /        /$\              // ____ \\                         \ >>||
    -||>>|  12    //L\\            // ///..) \\         L38036133B   12 |<<||
    -||<<|        \\ //           || <||  >\  ||                        |>>||
    -||>>|         \$/            ||  $$ --/  ||        One Hundred     |<<||
    -||<<|      L38036133B        *\\  |\_/  //* series                 |>>||
    -||>>|  12                     *\\/___\_//*   1989                  |<<||
    -||<<\      Open Source   ______/Franklin\________     Supporting   />>||
    -||//$\                 ~| SPONSORING AND FUNDING |~               /$\\||
    -||(100)===================  AWESOME OPEN SOURCE =================(100)||
    -||\\$//\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\\$//||
    -||====================================================================||
    - 
    +
    +
    paul@f0:~ % doas pkg install ubench
    +paul@f0:~ % rehash # For tcsh to find the newly installed command
    +paul@f0:~ % ubench &
    +paul@f0:~ % sysctl dev.cpu | grep freq:
    +dev.cpu.3.freq: 2922
    +dev.cpu.2.freq: 2922
    +dev.cpu.1.freq: 2923
    +dev.cpu.0.freq: 2922
     

    -

    Table of Contents


    -
    -
    -

    Motivation


    +Idle, all three Beelinks plus the switch consumed 26.2W. But with ubench stressing all the CPUs, it went up to 38.8W.

    -Sponsoring free and open-source projects, even for personal use, is important to ensure the sustainability, security, and continuous improvement of the software. It supports developers who often maintain these projects without compensation, helping them provide updates, new features, and security patches. By contributing, you recognize their efforts, foster a culture of innovation, and benefit from perks like early access or support, all while ensuring the long-term viability of the tools you rely on.
    +Idle consumption.

    -Albeit I am not putting a lot of money into my sponsoring efforts, it still helps the open-source maintainers because the more little sponsors there are, the higher the total sum.
    +

    Wake-on-LAN Setup



    -

    OSnews


    +Updated Sun 11 Jan 10:30:00 EET 2026

    -I am a silver Patreon member of OSnews. I have been following this site since my student years. It's always been a great source of independent and slightly alternative IT news.
    +As mentioned in the hardware specs above, the Beelink S12 Pro supports Wake-on-LAN (WoL), which allows me to remotely power on the machines over the network. This is particularly useful since I don't need all three machines running 24/7, and I can save power by shutting them down when not needed and waking them up on demand.

    -https://osnews.com
    +The good news is that FreeBSD already has WoL support enabled by default on the Realtek network interface, as evidenced by the WOL_MAGIC option shown in the ifconfig re0 output above (line 215).

    -

    Cup o' Go Podcast


    +

    Setting up WoL on the laptop



    -I am a Patreon of the Cup o' Go Podcast. The podcast helps me stay updated with the Go community for around 15 minutes per week. I am not a full-time software developer, but my long-term ambition is to become better in Go every week by working on personal projects and tools for work.
    +To wake the Beelinks from my Fedora laptop (earth), I installed the wol package:

    -https://cupogo.dev
    + +
    [paul@earth]~% sudo dnf install -y wol
    +

    -

    Codeberg


    +Next, I created a simple script (~/bin/wol-f3s) to wake and shutdown the machines:

    -Codeberg e.V. is a nonprofit organization that provides online resources for software development and collaboration. I am a user and a supporting member, paying an annual membership of €24. I didn't have to pay that membership fee, as Codeberg offers all the services I use for free.
    + +
    #!/bin/bash
    +# Wake-on-LAN and shutdown script for f3s cluster (f0, f1, f2)
    +
    +# MAC addresses
    +F0_MAC="e8:ff:1e:d7:1c:ac"  # f0 (192.168.1.130)
    +F1_MAC="e8:ff:1e:d7:1e:44"  # f1 (192.168.1.131)
    +F2_MAC="e8:ff:1e:d7:1c:a0"  # f2 (192.168.1.132)
    +
    +# IP addresses
    +F0_IP="192.168.1.130"
    +F1_IP="192.168.1.131"
    +F2_IP="192.168.1.132"
    +
    +# SSH user
    +SSH_USER="paul"
    +
    +# Broadcast address for your LAN
    +BROADCAST="192.168.1.255"
    +
    +wake() {
    +    local name=$1
    +    local mac=$2
    +    echo "Sending WoL packet to $name ($mac)..."
    +    wol -i "$BROADCAST" "$mac"
    +}
    +
    +shutdown_host() {
    +    local name=$1
    +    local ip=$2
    +    echo "Shutting down $name ($ip)..."
    +    ssh -o ConnectTimeout=5 "$SSH_USER@$ip" "doas poweroff" 2>/dev/null && \
    +        echo "  ✓ Shutdown command sent to $name" || \
    +        echo "  ✗ Failed to reach $name (already down?)"
    +}
    +
    +ACTION="${1:-all}"
    +
    +case "$ACTION" in
    +    f0) wake "f0" "$F0_MAC" ;;
    +    f1) wake "f1" "$F1_MAC" ;;
    +    f2) wake "f2" "$F2_MAC" ;;
    +    all|"")
    +        wake "f0" "$F0_MAC"
    +        wake "f1" "$F1_MAC"
    +        wake "f2" "$F2_MAC"
    +        ;;
    +    shutdown|poweroff|down)
    +        shutdown_host "f0" "$F0_IP"
    +        shutdown_host "f1" "$F1_IP"
    +        shutdown_host "f2" "$F2_IP"
    +        echo ""
    +        echo "✓ Shutdown commands sent to all machines."
    +        exit 0
    +        ;;
    +    *)
    +        echo "Usage: $0 [f0|f1|f2|all|shutdown]"
    +        exit 1
    +        ;;
    +esac
    +
    +echo ""
    +echo "✓ WoL packets sent. Machines should boot in a few seconds."
    +

    -https://codeberg.org
    -https://codeberg.org/snonux - My Codeberg page
    +After making the script executable with chmod +x ~/bin/wol-f3s, I can now control the machines with simple commands:

    -

    GrapheneOS


    + +
    [paul@earth]~% wol-f3s          # Wake all three
    +[paul@earth]~% wol-f3s f0       # Wake only f0
    +[paul@earth]~% wol-f3s shutdown # Shutdown all three via SSH
    +

    -GrapheneOS is an open-source project that improves Android's privacy and security with sandboxing, exploit mitigations, and a permission model. It does not include Google apps or services but offers a sandboxed Google Play compatibility layer and its own apps and services.
    +

    Testing WoL and Shutdown



    -I've made a one-off €100 donation because I really like this, and I run GrapheneOS on my personal Phone as my main daily driver.
    +To test the setup, I shutdown all three machines using the script's shutdown function:

    -https://grapheneos.org/
    -Why GrapheneOS Rox
    + +
    [paul@earth]~% wol-f3s shutdown
    +Shutting down f0 (192.168.1.130)...
    +  ✓ Shutdown command sent to f0
    +Shutting down f1 (192.168.1.131)...
    +  ✓ Shutdown command sent to f1
    +Shutting down f2 (192.168.1.132)...
    +  ✓ Shutdown command sent to f2
    +
    +✓ Shutdown commands sent to all machines.
    +

    -

    AnkiDroid


    +After waiting for them to fully power down (about 1 minute), I sent the WoL magic packets:

    -AnkiDroid is an app that lets you learn flashcards efficiently with spaced repetition. It is compatible with Anki software and supports various flashcard content, syncing, statistics, and more.
    + +
    [paul@earth]~% wol-f3s
    +Sending WoL packet to f0 (e8:ff:1e:d7:1c:ac)...
    +Waking up e8:ff:1e:d7:1c:ac...
    +Sending WoL packet to f1 (e8:ff:1e:d7:1e:44)...
    +Waking up e8:ff:1e:d7:1e:44...
    +Sending WoL packet to f2 (e8:ff:1e:d7:1c:a0)...
    +Waking up e8:ff:1e:d7:1c:a0...
    +
    +✓ WoL packets sent. Machines should boot in a few seconds.
    +

    -I've been learning vocabulary with this free app, and it is, in my opinion, the best flashcard app I know. I've made a 20$ one-off donation to this project.
    +Within 30-50 seconds, all three machines successfully booted up and became accessible via SSH!

    -https://opencollective.com/ankidroid
    +This also works fine over WiFi, by the way — as long as the laptop and the Beelinks are on the same local network, the router bridges everything. And wol-f3s shutdown does the reverse (SSH + doas poweroff), so I can spin the whole cluster up and down pretty quickly.

    -

    OpenBSD through OpenBSD.Amsterdam


    +

    BIOS Configuration



    - The OpenBSD project produces a FREE, multi-platform 4.4BSD-based UNIX-like operating system. Our efforts emphasize portability, standardization, correctness, proactive security and integrated cryptography. As an example of the effect OpenBSD has, the popular OpenSSH software comes from OpenBSD. OpenBSD is freely available from their download sites.
    +For WoL to work reliably, make sure to check the BIOS settings on each Beelink:

    -I implicitly support the OpenBSD project through a VM I have rented at OpenBSD Amsterdam. They donate €10 per VM and €15 per VM for every renewal to the OpenBSD Foundation, with dedicated servers running vmm(4)/vmd(8) to host opinionated VMs.
    +
      +
    • Enable "Wake on LAN" (usually under Power Management)
    • +
    • Disable "ERP Support" or "ErP Ready" (this can prevent WoL from working)
    • +
    • Enable "Power on by PCI-E" or "Wake on PCI-E"
    • +

    +The exact menu names vary, but these settings are typically found in the Power Management or Advanced sections of the BIOS.

    -https://www.OpenBSD.org
    -https://OpenBSD.Amsterdam
    +

    Conclusion



    -

    ProtonMail


    +Honestly, the Beelink S12 Pro with the N100 is kind of perfect for this — tiny, cheap, sips power, and runs both Linux and FreeBSD without drama. I'm pretty happy with it.

    -I am not directly funding this project, but I am a very happy paying customer, and I am listing it here as an alternative to big tech if you don't want to run your own mail infrastructure. I am listing ProtonMail here as it is a non-profit organization, and I want to emphasize the importance of considering alternatives to big tech.
    +Beelinks stacked

    -https://proton.me/
    +To ease cable management, I need to get shorter ethernet cables. I will place the tower on my shelf, where most of the cables will be hidden (together with a UPS, which will also be added to the setup).

    -

    Libro.fm


    +Read the next post of this series:

    -This is the alternative to Audible if you are into audiobooks (like I am). For every book or every month of membership, I am also supporting a local bookstore I selected. Their catalog is not as large as Audible's, but it's still pretty decent.
    +f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts

    -Libro.fm began as a conversation among friends at Third Place Books, a local bookstore in Seattle, Washington, about the growing popularity of audiobooks and the lack of a way for readers to purchase them from independent bookstores. Flash forward, and Libro.fm was founded in 2014.
    +Other *BSD-related posts:

    -https://libro.fm
    +2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability
    +2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    +2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    +2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    +2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    +2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    +2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation (You are currently reading this)
    +2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage
    +2024-04-01 KISS high-availability with OpenBSD
    +2024-01-13 One reason why I love OpenBSD
    +2022-10-30 Installing DTail on OpenBSD
    +2022-07-30 Let's Encrypt with OpenBSD and Rex
    +2016-04-09 Jails and ZFS with Puppet on FreeBSD

    -E-mail your comments to paul@nospam.buetow.org :-)
    +E-Mail your comments to paul@nospam.buetow.org :-)

    Back to the main site
    - Typing `127.1` words per minute (`>100wpm average`) - - gemini://foo.zone/gemfeed/2024-08-05-typing-127.1-words-per-minute.gmi - 2024-08-05T17:39:30+03:00 + f3s: Kubernetes with FreeBSD - Part 1: Setting the stage + + gemini://foo.zone/gemfeed/2024-11-17-f3s-kubernetes-with-freebsd-part-1.gmi + 2024-11-16T23:20:14+02:00 Paul Buetow aka snonux paul@dev.buetow.org - After work one day, I noticed some discomfort in my right wrist. Upon research, it appeared to be a mild case of Repetitive Strain Injury (RSI). Initially, I thought that this would go away after a while, but after a week it became even worse. This led me to consider potential causes such as poor posture or keyboard use habits. As an enthusiast of keyboards, I experimented with ergonomic concave ortholinear split keyboards. Wait, what?... + This is the first blog post about my f3s series for my self-hosting demands in my home lab. f3s? The 'f' stands for FreeBSD, and the '3s' stands for k3s, the Kubernetes distribution I will use on FreeBSD-based physical machines.
    -

    Typing 127.1 words per minute (>100wpm average)


    -
    -Published at 2024-08-05T17:39:30+03:00; Updated at 2025-02-22
    -
    -
    -,---,---,---,---,---,---,---,---,---,---,---,---,---,-------,
    -|1/2| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | + | ' | <-    |
    -|---'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-----|
    -| ->| | Q | W | E | R | T | Y | U | I | O | P | ] | ^ |     |
    -|-----',--',--',--',--',--',--',--',--',--',--',--',--'|    |
    -| Caps | A | S | D | F | G | H | J | K | L | \ | [ | * |    |
    -|----,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'---'----|
    -|    | < | Z | X | C | V | B | N | M | , | . | - |          |
    -|----'-,-',--'--,'---'---'---'---'---'---'-,-'---',--,------|
    -| ctrl |  | alt |                          |altgr |  | ctrl |
    -'------'  '-----'--------------------------'------'  '------'
    -      Nieminen Mika	
    -
    -
    -

    Table of Contents


    -
    -
    -

    Introduction


    -
    -After work one day, I noticed some discomfort in my right wrist. Upon research, it appeared to be a mild case of Repetitive Strain Injury (RSI). Initially, I thought that this would go away after a while, but after a week it became even worse. This led me to consider potential causes such as poor posture or keyboard use habits. As an enthusiast of keyboards, I experimented with ergonomic concave ortholinear split keyboards. Wait, what?...
    -
    -
      -
    • Concave: Some fingers are longer than others. A concave keyboard makes it so that the keycaps meant to be pressed by the longer fingers are further down (e.g., left middle finger for e on a Qwerty layout), and keycaps meant to be pressed by shorter fingers are further up (e.g., right pinky finger for the letter p).
    • -
    • Ortholinear: The keys are arranged in a straight vertical line, unlike most conventional keyboards. The conventional keyboards still resemble the old typewriters, where the placement of the keys was optimized so that the typewriter would not jam. There is no such requirement anymore.
    • -
    • Split: The keyboard is split into two halves (left and right), allowing one to place either hand where it is most ergonomic.
    • -

    -After discovering ThePrimagen (I found him long ago, but I never bothered buying the same keyboard he is on) on YouTube and reading/watching a couple of reviews, I thought that as a computer professional, the equipment could be expensive anyway (laptop, adjustable desk, comfortable chair), so why not invest a bit more into the keyboard? I purchased myself the Kinesis Advantage360 Professional keyboard.
    -
    -

    Kinesis review


    -
    -For an in-depth review, have a look at this great article:
    -
    -Review of the Kinesis Advantage360 Professional keyboard
    -
    -

    Top build quality


    -
    -Overall, the keyboard feels excellent quality and robust. It has got some weight to it. Because of that, it is not ideally suited for travel, though. But I have a different keyboard to solve this (see later in this post). Overall, I love how it is built and how it feels.
    -
    -Kinesis Adv.360 Pro at home
    -
    -

    Bluetooth connectivity


    -
    -Despite encountering concerns about Bluetooth connectivity issues with the Kinesis keyboard during my research, I purchased one anyway as I intended to use it only via USB. However, I discovered that the firmware updates available afterwards had addressed these reported Bluetooth issues, and as a result, I did not experience any difficulties with the Bluetooth functionality. This positive outcome allowed me to enjoy using the keyboard also wirelessly.
    -
    -

    Gateron Brown key switches


    -
    -Many voices on the internet seem to dislike the Gateron Brown switches, the only official choice for non-clicky tactile switches in the Kinesis, so I was also a bit concerned. I almost went with Cherry MX Browns for my Kinesis (a custom build from a 3rd party provider that is partnershipping with Kinesis). Still, I decided on Gateron Browns to try different switches than the Cherry MX Browns I already have on my ZSA Moonlander keyboard (another ortho-linear split keyboard, but without a concave keycap layout).
    -
    -At first, I was disappointed by the Gaterons, as they initially felt a bit meshy compared to the Cherries. Still, over the weeks I grew to prefer them because of their smoothness. Over time, the tactile bumps also became more noticeable (as my perception of them improved). Because of their less pronounced tactile feedback, the Gaterons are less tiring for long typing sessions and better suited for a relaxed typing experience.
    -
    -So, the Cherry MX feel sharper but are more tiring in the long run, and the Gaterons are easier to write on and the tactile Feedback is slightly less pronounced.
    -
    -

    Keycaps


    -
    -If you ever purchase a Kinesis keyboard, go with the PCB keycaps. They upgrade the typing experience a lot. The only thing you will lose is that the backlighting won't shine through them. But that is a reasonable tradeoff. When do I need backlighting? I am supposed to look at the screen and not the keyboard while typing.
    -
    -I went with the blank keycaps, by the way.
    +

    f3s: Kubernetes with FreeBSD - Part 1: Setting the stage



    -Kinesis Adv.360 Pro at home
    +Published at 2024-11-16T23:20:14+02:00

    -

    Keymap editor


    +This is the first blog post about my f3s series for my self-hosting demands in my home lab. f3s? The "f" stands for FreeBSD, and the "3s" stands for k3s, the Kubernetes distribution I will use on FreeBSD-based physical machines.

    -There is no official keymap editor. You have to edit a configuration file manually, build the firmware from scratch, and upload the firmware with the new keymap to both keyboard halves. The Professional version of his keyboard, by the way, runs on the ZMK open-source firmware.
    +I will post a new entry every month or so (there are too many other side projects for more frequent updates—I bet you can understand).

    -Many users find the need for an easy-to-use keymap editor an issue. But this is the Pro model. You can also go with the non-Pro, which runs on non-open-source firmware and has no Bluetooth (it must be operated entirely on USB).
    +These are all the posts so far:

    -There is a 3rd party solution which is supposed to configure the keymap for the Professional model as bliss, but I have never used it. As a part-time programmer and full-time Site Reliability Engineer, I am okay configuring the keymap in my text editor and building it in a local docker container. This is one of the standard ways of doing it here. You could also use a GitHub pipeline for the firmware build, but I prefer building it locally on my machine. This all seems natural to me, but this may be an issue for "the average Joe" user.
    +2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage (You are currently reading this)
    +2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    +2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    +2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    +2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    +2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    +2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    +2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability

    -

    First steps


    +f3s logo

    -I didn't measure the usual words per minute (wpm) on my previous keyboard, the ZSA Moonlander, but I guess that it was around 40-50wpm. Once the Kinesis arrived, I started practising. The experience was quite different due to the concave keycaps, so I barely managed 10wpm on the first day.
    +ChatGPT generated logo..

    -I quickly noticed that I could not continue using the freestyle 6-finger typing system I was used to on my Moonlander or any previous keyboards I worked with. I learned ten-finger touch typing from scratch to be more efficient with the Kinesis keyboard. The keyboard forces you to embrace touch typing.
    +Let's begin...

    -Sometimes, there were brain farts, and I couldn't type at all. The trick was not to freak out about it, but to move on. If your average goes down a bit for a day, it doesn't matter; the long-term trend over several days and weeks matters, not the one-off wpm high score.
    +

    Table of Contents



    -Although my wrist pain seemed to go away aftre the first week of using the Kinesis, my fingers became tired of adjusting to the new way of typing. My hands were stiff, as if I had been training for the Olympics. Only after three weeks did I start to feel comfortable with it. If it weren't for the comments I read online, I would have sent it back after week 2.
    +
    +

    Why this setup?



    -I also had a problem with the left pinky finger, where I could not comfortably reach the p key. This involved moving the whole hand. An easy fix was to swap p with ; on the keyboard layout.
    +My previous setup was great for learning Terraform and AWS, but it is too expensive. Costs are under control there, but only because I am shutting down all containers after use (so they are offline ninety percent of the time and still cost around $20 monthly). With the new setup, I could run all containers 24/7 at home, which would still be cheaper in terms of electricity consumption. I have a 400 MBit/s uplink (I could have more if I wanted, but it is more than plenty for my use case already).

    -

    Considering alternate layouts


    +From babylon5.buetow.org to .cloud

    -As I was going to learn 10-finger touch typing from scratch, I also played with the thought of switching from the Qwerty to the Dvorak or Colemak keymap, but after reading some comments on the internet, I decided against it:
    +Migrating off all my containers from AWS ECS means I need a reliable and scalable environment to host my workloads. I wanted something:

      -
    • These layouts (Dvorak and Colemak) will minimize the finger travel for the most commonly used English words, but they necessarily don't give you a better wpm score.
    • -
    • One comment on Redit also mentioned that getting stiffer fingers with these layouts is more likely than with Qwerty, as in Qwerty, he had to stretch out his fingers more often, which helps here.
    • -
    • There are also many applications and websites with keyboard shortcuts and are Qwerty-optimized.
    • -
    • You won't be able to use someone else's computer as there will be likely Qwerty. Some report that after using an alternative layout for a while, they forget how to use Qwerty.
    • +
    • To self-host all my open-source apps (Docker containers).
    • +
    • Fully under my control (goodbye cloud vendor lock-in).
    • +
    • Secure and redundant.
    • +
    • Cost-efficient (after the initial hardware investment).
    • +
    • Something I can poke around with and also pick up new skills.

    -

    Training how to type


    -
    -

    Tools


    -
    -One of the most influential tools in my touch typing journey has been keybr.com. This site/app helped me learn 10-finger touch typing, and I practice daily for 30 minutes (in the first two weeks, up to an hour every day). The key is persistence and focus on technique rather than speed; the latter naturally improves with regular practice. Precision matters, too, so I always correct my errors using the backspace key.
    +

    The infrastructure



    -https://keybr.com
    +This is still in progress, and I need to own the hardware. But in this first part of the blog series, I will outline what I intend to do.

    -I also used a command-line tool called tt, which is written in Go. It has a feature that I found very helpful: the ability to practice typing by piping custom text into it. Additionally, I appreciated its customization options, such as choosing a colour theme and specifying how statistics are displayed.
    +Diagram

    -https://github.com/lemnos/tt
    +

    Physical FreeBSD nodes and Linux VMs



    -I wrote myself a small Ruby script that would randomly select a paragraph from one of my eBooks or book notes and pipe it to tt. This helped me remember some of the books I read and also practice touch typing.
    +The setup starts with three physical FreeBSD nodes deployed into my home LAN. On these, I'm going to run Rocky Linux virtual machines with bhyve. Why Linux VMs in FreeBSD and not Linux directly? I want to leverage the great ZFS integration in FreeBSD (among other features), and I have been using FreeBSD for a while in my home lab. And with bhyve, there is a very performant hypervisor available which makes the Linux VMs de-facto run at native speed (another use case of mine would be maybe running a Windows bhyve VM on one of the nodes - but out of scope for this blog series).

    -

    My keybr.com statistics


    +https://www.freebsd.org/
    +https://wiki.freebsd.org/bhyve

    -Overall, I trained for around 4 months in more than 5,000 sessions. My top speed in a session was 127.1wpm (up from barely 10wpm at the beginning).
    +I selected Rocky Linux because it comes with long-term support (I don't want to upgrade the VMs every 6 months). Rocky Linux 9 will reach its end of life in 2032, which is plenty of time! Of course, there will be minor upgrades, but nothing will significantly break my setup.

    -All time stats
    +https://rockylinux.org/
    +https://wiki.rockylinux.org/rocky/version/

    -My overall average speed over those 5,000 sessions was 80wpm. The average speed over the last week was over 100wpm. The green line represents the wpm average (increasing trend), the purple line represents the number of keys in the practices (not much movement there, as all keys are unlocked), and the red line represents the average typing accuracy.
    +Furthermore, I am already using "RHEL-family" related distros at work and Fedora on my main personal laptop. Rocky Linux belongs to the same type of Linux distribution family, so I already feel at home here. I also used Rocky 9 before I switched to AWS ECS. Now, I am switching back in one sense or another ;-)

    -Typing speed over leson
    +

    Kubernetes with k3s



    -Around the middle, you see a break-in of the wpm average value. This was where I swapped the p and ; keys, but after some retraining, I came back to the previous level and beyond.
    +These Linux VMs form a three-node k3s Kubernetes cluster, where my containers will reside moving forward. The 3-node k3s cluster will be highly available (in etcd mode), and all apps will probably be deployed with Helm. Prometheus will also be running in k3s, collecting time-series metrics and handling monitoring. Additionally, a private Docker registry will be deployed into the k3s cluster, where I will store some of my self-created Docker images. k3s is the perfect distribution of Kubernetes for homelabbers due to its simplicity and the inclusion of the most useful features out of the box!

    -

    Tips and tricks


    +https://k3s.io/

    -These are some tips and tricks I learned along the way to improve my typing speed:
    +

    HA volumes for k3s with HAST/ZFS and NFS



    -

    Relax


    +Persistent storage for the k3s cluster will be handled by highly available (HA) NFS shares backed by ZFS on the FreeBSD hosts.

    -It's easy to get cramped when trying to hit this new wpm mark, but this is just holding you back. Relax and type at a natural pace. Now I also understand why my Katate Sensei back in London kept screaming "RELAAAX" at me during practice.... It didn't help much back then, though, as it is difficult to relax while someone screams at you!
    +On two of the three physical FreeBSD nodes, I will add a second SSD drive to each and dedicate it to a zhast ZFS pool. With HAST (FreeBSD's solution for highly available storage), this pool will be replicated at the byte level to a standby node.

    -

    Focus on accuracy first


    +A virtual IP (VIP) will point to the master node. When the master node goes down, the VIP will failover to the standby node, where the ZFS pool will be mounted. An NFS server will listen to both nodes. k3s will use the VIP to access the NFS shares.

    -This goes with the previous point. Instead of trying to speed through sessions as quickly as possible, slow down and try to type the words correctly—so don't rush it. If you aren't fast yet, the reason is that your brain hasn't trained enough. It will come over time, and you will be faster.
    +FreeBSD Wiki: Highly Available Storage

    -

    Chording


    +You can think of DRBD being the Linux equivalent to FreeBSD's HAST.

    -A trick to getting faster is to type by word and pause between each word so you learn the words by chords. From 80wpm and beyond, this makes a real difference.
    +

    OpenBSD/relayd to the rescue for external connectivity



    -

    Punctuation and Capitalization


    +All apps should be reachable through the internet (e.g., from my phone or computer when travelling). For external connectivity and TLS management, I've got two OpenBSD VMs (one hosted by OpenBSD Amsterdam and another hosted by Hetzner) handling public-facing services like DNS, relaying traffic, and automating Let's Encrypt certificates.

    -I included 10% punctuation and 20% capital letters in my keybr.com practice sessions to simulate real typing conditions, which improved my overall working efficiency. I guess I would have gone to 120wpm in average if I didn't include this options...
    +All of this (every Linux VM to every OpenBSD box) will be connected via WireGuard tunnels, keeping everything private and secure. There will be 6 WireGuard tunnels (3 k3s nodes times two OpenBSD VMs).

    -

    Reverse shifting


    +https://en.wikipedia.org/wiki/WireGuard

    -Reverse shifting aka left-right shifting is to...
    +So, when I want to access a service running in k3s, I will hit an external DNS endpoint (with the authoritative DNS servers being the OpenBSD boxes). The DNS will resolve to the master OpenBSD VM (see my KISS highly-available with OpenBSD blog post), and from there, the relayd process (with a Let's Encrypt certificate—see my Let's Encrypt with OpenBSD and Rex blog post) will accept the TCP connection and forward it through the WireGuard tunnel to a reachable node port of one of the k3s nodes, thus serving the traffic.

    -
      -
    • ...use the left shift key for letters on the right keyboard side.
    • -
    • ...use the right shift key for letters on the left keyboard side.
    • -

    -This makes using the shift key a blaze.
    +KISS high-availability with OpenBSD
    +Let's Encrypt with OpenBSD and Rex

    -

    Enter the flow state


    +The OpenBSD setup described here already exists and is ready to use. The only thing that does not yet exist is the configuration of relayd to forward requests to k3s through the WireGuard tunnel(s).

    -Listening to music helps me enter a flow state during practice sessions, which makes typing training a bit addictive (which is good, or isn't it?).
    +

    Data integrity



    -

    Repeat every word


    +

    Periodic backups



    -There's a setting on keybr.com that makes it so that every word is always repeated, having you type every word twice in a row. I liked this feature very much, and I think it also helped to improve my practice.
    +Let's face it, backups are non-negotiable.

    -

    Don't use the same finger for two consecutive keystrokes


    +On the HAST master node, incremental and encrypted ZFS snapshots are created daily and automatically backed up to AWS S3 Glacier Deep Archive via CRON. I have a bunch of scripts already available, which I currently use for a similar purpose on my FreeBSD Home NAS server (an old ThinkPad T440 with an external USB drive enclosure, which I will eventually retire when the HAST setup is ready). I will copy them and slightly modify them to fit the purpose.

    -Apparently, if you want to type fast, avoid using the same finger for two consecutive keystrokes. This means you don't always need to use the same finger for the same keys.
    -However, there are no hard and fast rules. Thus, everyone develops their system for typing word combinations. An exception would be if you are typing the very same letter in a row (e.g., t in letter)—here, you are using the same finger for both ts.
    +There's also zfstools in the ports, which helps set up an automatic snapshot regime:

    -

    Warm-up


    +https://www.freshports.org/sysutils/zfstools

    -You can't reach your average typing speed first ting the morning. It would help if you warmed up before the exercise or practice later during the day. Also, some days are good, others not so, e.g., after a bad night's sleep. What matters is the mid- and long-term trend, not the fluctuations here, though.
    +The backup scripts also perform some zpool scrubbing now and then. A scrub once in a while keeps the trouble away.

    -

    Travel keyboard


    +

    Power protection



    -As mentioned, the Kinesis is a great keyboard, but it is not meant for travel.
    +Power outages are regularly in my area, so a UPS keeps the infrastructure running during short outages and protects the hardware. I'm still trying to decide which hardware to get, and I still need one, as my previous NAS is simply an older laptop that already has a battery for power outages. However, there are plenty of options to choose from. My main criterion is that the UPS should be silent, as the whole setup will be installed in an upper shelf unit in my daughter's room. ;-)

    -I guess keyboards will always be my expensive hobby, so I also purchased another ergonomic, ortho-linear, concave split keyboard, the Glove80 (with the Red Pro low-profile switches). This keyboard is much lighter and, in my opinion, much better suited for travel than the Kinesis. It also comes with a great travel case.
    +

    Monitoring: Keeping an eye on everything



    -Here is a photo of me using it with my Surface Go 2 (it runs Linux, by the way) while waiting for the baggage drop at the airport:
    +I want to know when stuff breaks (ideally before it breaks), so monitoring is a big part of the plan.

    -Traveling with the Glove80 using my Surface Go 2
    +

    Prometheus and Grafana



    -For everyday work, I prefer the tactile Browns on the Kinesis over the Red Pro I have on the Glove80 (normal profile vs. low profile). The Kinesis feels much more premium, whereas the Glove80 is much lighter and easier to store away in a rucksack (the official travel case is a bit bulky, so I wrapped it simply in bubble plastic).
    +Inside the k3s cluster, Prometheus will be deployed to handle metrics collection. It will be configured to scrape data from my Kubernetes workloads, nodes, and any services I monitor. Prometheus also integrates with Alertmanager to generate alerts based on predefined thresholds or conditions.

    -The F-key row is odd at the Glove80. I would have preferred more keys on the sides like the Kinesis, and I use them for [] {} (), which is pretty handy there. However, I like the thumb cluster of the Glove80 more than the one on the Kinesis.
    +https://prometheus.io

    -The good thing is that I can switch between both keyboards instantly without retraining my typing memories. I've configured (as much as possible) the same keymaps on both my Kinesis and Glove80, making it easy to switch between them at any occasion.
    +For visualization, Grafana will be deployed alongside Prometheus. I mostly just want dashboards for CPU, memory, and pod health — the usual stuff. Makes it way easier to figure out what's going wrong when something inevitably does.

    -Interested in the Glove80? I suggest also reading this review:
    +https://grafana.com

    -Review of the Glove80 keyboard
    +

    Gogios: My custom alerting system



    -

    Upcoming custom Kinesis build


    +Alerts generated by Prometheus are forwarded to Alertmanager, which I will configure to work with Gogios, a lightweight monitoring and alerting system I wrote myself. Gogios runs on one of my OpenBSD VMs. At regular intervals, Gogios scrapes the alerts generated in the k3s cluster and notifies me via Email.

    -As I mentioned, keyboards will remain an expensive hobby of mine. I don't regret anything here, though. After all, I use keyboards at my day job. I've ordered a Kinesis custom build with the Gateron Kangaroo switches, and I'm excited to see how that compares to my current setup. I'm still deciding whether to keep my Gateron Brown-equipped Kinesis as a secondary keyboard or possibly leave it at my in-laws for use when visiting or to sell it.
    +KISS server monitoring with Gogios

    -Update 2025-02-22: I've received my custom Kinesis Adv. 360 build with the Gateron Baby Kangaroo key switches. I am absolutely in love! I will keep my Gateron Brown version around, though.
    +Ironically, I implemented Gogios to avoid using more complex alerting systems like Prometheus, but here we go—it integrates well now.

    Conclusion



    -When I traveled with the Glove80 for work to the London office, a colleague stared at my keyboard and made jokes that it might be broken (split into two halves). But other than that...
    +This setup may be just the beginning. Some ideas I'm thinking about for the future:

    -Ten-finger touch typing has improved my efficiency and has become a rewarding discipline. Whether it's the keyboards I use, the tools I practice with, or the techniques I've adopted, each step has been a learning experience. I hope sharing my journey provides valuable insights and inspiration for anyone looking to improve their touch typing skills.
    +
      +
    • Adding more FreeBSD nodes (in different physical locations, maybe at my wider family's places? WireGuard would make it possible!) for better redundancy. (HA storage then might be trickier)
    • +
    • Deploying more Docker apps (data-intensive ones, like a picture gallery, my entire audiobook catalogue, or even a music server) to k3s.
    • +

    +For now, though, I'm focused on completing the migration from AWS ECS and getting all my Docker containers running smoothly in k3s.

    -I also accidentally started using a 10-finger-like system (maybe still 6 fingers, but better than before) on my regular laptop keyboard. I could be more efficient on the laptop keyboard. The form is different there (not ortholinear, not concave keycaps, etc.), but my typing has improved there too (even if it is only by a little bit).
    +Anyway, stay tuned — in part 2 I'll probably get into the hardware and OS setup.

    -I don't want to return to a non-concave keyboard as my default. I will use other keyboards still once in a while but only for short periods or when I have to (e.g. travelling with my Laptop and when there is no space to put an external keyboard)
    +Read the next post of this series:

    -Learning to touch type has been an eye-opening experience for me, not just for work but also for personal projects. Now, writing documentation is so much fun; who could believe that? Furthermore, working with Slack (communicating with colleagues) is more fun now as well.
    +f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    +
    +Other *BSD-related posts:
    +
    +2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability
    +2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments
    +2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage
    +2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network
    +2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs
    +2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts
    +2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation
    +2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage (You are currently reading this)
    +2024-04-01 KISS high-availability with OpenBSD
    +2024-01-13 One reason why I love OpenBSD
    +2022-10-30 Installing DTail on OpenBSD
    +2022-07-30 Let's Encrypt with OpenBSD and Rex
    +2016-04-09 Jails and ZFS with Puppet on FreeBSD

    E-Mail your comments to paul@nospam.buetow.org :-)

    @@ -18259,22 +18527,22 @@ jgs \\`_..---.Y.---.._`// - 'The Stoic Challenge' book notes - - gemini://foo.zone/gemfeed/2024-07-07-the-stoic-challenge-book-notes.gmi - 2024-07-07T12:46:55+03:00 + 'Staff Engineer' book notes + + gemini://foo.zone/gemfeed/2024-10-24-staff-engineer-book-notes.gmi + 2024-10-24T20:57:44+03:00 Paul Buetow aka snonux paul@dev.buetow.org - These are my personal takeaways after reading 'The Stoic Challenge: A Philosopher's Guide to Becoming Tougher, Calmer, and More Resilient' by William B. Irvine. + These are my personal takeaways after reading 'Staff Engineer' by Will Larson. Note that the book contains much more knowledge wisdom and that these notes only contain points I personally found worth writing down. This is mainly for my own use, but you might find it helpful too.
    -

    "The Stoic Challenge" book notes


    +

    "Staff Engineer" book notes



    -Published at 2024-07-07T12:46:55+03:00
    +Published at 2024-10-24T20:57:44+03:00

    -These are my personal takeaways after reading "The Stoic Challenge: A Philosopher's Guide to Becoming Tougher, Calmer, and More Resilient" by William B. Irvine.
    +These are my personal takeaways after reading "Staff Engineer" by Will Larson. Note that the book contains much more knowledge wisdom and that these notes only contain points I personally found worth writing down. This is mainly for my own use, but you might find it helpful too.

              ,..........   ..........,
    @@ -18291,1234 +18559,786 @@ jgs \\`_..---.Y.---.._`//
     

    Table of Contents




    -

    God sets you up for a challenge


    -
    -Gods set you up for a challenge to see how resilient you are. Is getting angry worth the price? If you stay calm then you can find the optimal workaround for the obstacle. Stay calm even with big setbacks. Practice minimalism of negative emotions.
    -
    -Put a positive spin on everything. What should you do if someone wrong you? Don't get angry, there is no point in that, it just makes you suffer. Do the best what you got now and keep calm and carry on. A resilient person will refuse to play the role of a victim. You can develop the setback response skills. Turn a setback. e.g. a handycap, into a personal triumph.
    -
    -It is not the things done to you or happen to you what matters but how you take the things and react to these things.
    -
    -Don't row against the other boats but against your own lazy bill. It doesn't matter if you are first or last, as long as you defeat your lazy self.
    -
    -Stoics are thankful that they are mortal. As then you can get reminded of how great it is to be alive at all. In dying we are more alive we have ever been as every thing you do could be the last time you do it. Rather than fighting your death you should embrace it if there are no workarounds. Embrace a good death.
    -
    -

    Negative visualization


    -
    -It is easy what we have to take for granted.
    -
    -
      -
    • Imagine the negative and then think that things are actually much better than they seem to be.
    • -
    • Close your eyes and imagine you are color blind for a minute, then open the eyes again and see all the colours. You will be grateful for being able to see the colours.
    • -
    • Now close your eyes for a minute and imagine you would be blind, so that you will never be able to experience the world again and let it sink in. When you open your eyes again you will feel a lot of gratefulness.
    • -
    • Last time meditation. Lets you appreciate the life as it is now. Life gets vitalised again.
    • -

    -

    Oh, nice trick, you stoic "god"! ;-)


    -
    -Take setbacks as a challenge. Also take it with some humor.
    -
    -
    -What would the stoic god's do next? This is just a test strategy by them. Don't be frustrated at all but be astonished of what comes next. Thank the stoic gods of testing you. This is comfort zone extension of the stoics aka toughness Training.
    -
    -E-Mail your comments to paul@nospam.buetow.org :-)
    -
    -Other book notes of mine are:
    -
    -2025-11-02 'The Courage To Be Disliked' book notes
    -2025-06-07 'A Monk's Guide to Happiness' book notes
    -2025-04-19 'When: The Scientific Secrets of Perfect Timing' book notes
    -2024-10-24 'Staff Engineer' book notes
    -2024-07-07 'The Stoic Challenge' book notes (You are currently reading this)
    -2024-05-01 'Slow Productivity' book notes
    -2023-11-11 'Mind Management' book notes
    -2023-07-17 'Software Developers Career Guide and Soft Skills' book notes
    -2023-05-06 'The Obstacle is the Way' book notes
    -2023-04-01 'Never split the difference' book notes
    -2023-03-16 'The Pragmatic Programmer' book notes
    -
    -Back to the main site
    -
    -
    -
    - - Random Weird Things - Part Ⅰ - - gemini://foo.zone/gemfeed/2024-07-05-random-weird-things.gmi - 2024-07-05T10:59:59+03:00 - - Paul Buetow aka snonux - paul@dev.buetow.org - - Every so often, I come across random, weird, and unexpected things on the internet. I thought it would be neat to share them here from time to time. As a start, here are ten of them. - -
    -

    Random Weird Things - Part Ⅰ


    -
    -Published at 2024-07-05T10:59:59+03:00; Updated at 2025-02-08
    -
    -Every so often, I come across random, weird, and unexpected things on the internet. I thought it would be neat to share them here from time to time. As a start, here are ten of them.
    -
    -2024-07-05 Random Weird Things - Part Ⅰ (You are currently reading this)
    -2025-02-08 Random Weird Things - Part Ⅱ
    -2025-08-15 Random Weird Things - Part Ⅲ
    -
    -
    -		       /\_/\
    -WHOA!! 	     ( o.o )
    -		       > ^ <
    -		      /  -  \
    -		    /        \
    -		   /______\  \
    -
    +

    The Four Archetypes of a Staff Engineer



    -

    Table of Contents


    +Larson defines four archetypes. You'll probably recognize yourself in one (or a mix):


    -

    1. bad.horse traceroute


    -
    -Run traceroute to get the poem (or song).
    -
    -Update: A reader hinted that by specifying -n 60, there will be even more output!
    -
    - -
    ❯ traceroute -m 60 bad.horse
    -traceroute to bad.horse (162.252.205.157), 60 hops max, 60 byte packets
    - 1  _gateway (192.168.1.1)  5.237 ms  5.264 ms  6.009 ms
    - 2  77-85-0-2.ip.btc-net.bg (77.85.0.2)  8.753 ms  7.112 ms  8.336 ms
    - 3  212-39-69-103.ip.btc-net.bg (212.39.69.103)  9.434 ms  9.268 ms  9.986 ms
    - 4  * * *
    - 5  xe-1-2-0.mpr1.fra4.de.above.net (80.81.194.26)  39.812 ms  39.030 ms  39.772 ms
    - 6  * ae12.cs1.fra6.de.eth.zayo.com (64.125.26.172)  123.576 ms *
    - 7  * * *
    - 8  * * *
    - 9  ae10.cr1.lhr15.uk.eth.zayo.com (64.125.29.17)  119.097 ms  119.478 ms  120.767 ms
    -10  ae2.cr1.lhr11.uk.zip.zayo.com (64.125.24.140)  120.398 ms  121.147 ms  120.948 ms
    -11  * * *
    -12  ae25.mpr1.yyz1.ca.zip.zayo.com (64.125.23.117)  145.072 ms *  181.773 ms
    -13  ae5.mpr1.tor3.ca.zip.zayo.com (64.125.23.118)  168.239 ms  168.158 ms  168.137 ms
    -14  64.124.217.237.IDIA-265104-ZYO.zip.zayo.com (64.124.217.237)  168.026 ms  167.999 ms  165.451 ms
    -15  * * *
    -16  t00.toroc1.on.ca.sn11.net (162.252.204.2)  131.598 ms  131.308 ms  131.482 ms
    -17  bad.horse (162.252.205.130)  131.430 ms  145.914 ms  130.514 ms
    -18  bad.horse (162.252.205.131)  136.634 ms  145.295 ms  135.631 ms
    -19  bad.horse (162.252.205.132)  139.158 ms  148.363 ms  138.934 ms
    -20  bad.horse (162.252.205.133)  145.395 ms  148.054 ms  147.140 ms
    -21  he.rides.across.the.nation (162.252.205.134)  149.687 ms  147.731 ms  150.135 ms
    -22  the.thoroughbred.of.sin (162.252.205.135)  156.644 ms  155.155 ms  156.447 ms
    -23  he.got.the.application (162.252.205.136)  161.187 ms  162.318 ms  162.674 ms
    -24  that.you.just.sent.in (162.252.205.137)  166.763 ms  166.675 ms  164.243 ms
    -25  it.needs.evaluation (162.252.205.138)  172.073 ms  171.919 ms  171.390 ms
    -26  so.let.the.games.begin (162.252.205.139)  175.386 ms  174.180 ms  175.965 ms
    -27  a.heinous.crime (162.252.205.140)  180.857 ms  180.766 ms  180.192 ms
    -28  a.show.of.force (162.252.205.141)  187.942 ms  186.669 ms  186.986 ms
    -29  a.murder.would.be.nice.of.course (162.252.205.142)  191.349 ms  191.939 ms  190.740 ms
    -30  bad.horse (162.252.205.143)  195.425 ms  195.716 ms  196.186 ms
    -31  bad.horse (162.252.205.144)  199.238 ms  200.620 ms  200.318 ms
    -32  bad.horse (162.252.205.145)  207.554 ms  206.729 ms  205.201 ms
    -33  he-s.bad (162.252.205.146)  211.087 ms  211.649 ms  211.712 ms
    -34  the.evil.league.of.evil (162.252.205.147)  212.657 ms  216.777 ms  216.589 ms
    -35  is.watching.so.beware (162.252.205.148)  220.911 ms  220.326 ms  221.961 ms
    -36  the.grade.that.you.receive (162.252.205.149)  225.384 ms  225.696 ms  225.640 ms
    -37  will.be.your.last.we.swear (162.252.205.150)  232.312 ms  230.989 ms  230.919 ms
    -38  so.make.the.bad.horse.gleeful (162.252.205.151)  235.761 ms  235.291 ms  235.585 ms
    -39  or.he-ll.make.you.his.mare (162.252.205.152)  241.350 ms  239.407 ms  238.394 ms
    -40  o_o (162.252.205.153)  246.154 ms  247.650 ms  247.110 ms
    -41  you-re.saddled.up (162.252.205.154)  250.925 ms  250.401 ms  250.619 ms
    -42  there-s.no.recourse (162.252.205.155)  256.071 ms  251.154 ms  255.340 ms
    -43  it-s.hi-ho.silver (162.252.205.156)  260.152 ms  261.775 ms  261.544 ms
    -44  signed.bad.horse (162.252.205.157)  262.430 ms  261.410 ms  261.365 ms
    -
    -
    -

    2. ASCII cinema


    -
    -Fancy watching Star Wars Episode IV in ASCII? Head to the ASCII cinema:
    +

    Influence and Impact over Authority



    -https://asciinema.org/a/569727
    +You won't have direct authority over most people or teams you work with. Influence is the actual tool here. You have to persuade, align, sometimes just nudge people in the right direction. No one reports to you, but you still need to drive outcomes.

    -

    3. Netflix's Hello World application


    +

    Breadth and Depth of Knowledge



    -Netflix has got the Hello World application run in production 😱
    +You need to know a bit about a lot of things (infra, security, product, etc.) but still be able to go deep in a few areas. The tricky part is keeping that breadth current without spreading yourself too thin.

    -
      -
    • https://www.Netflix.com/helloworld
    • -

    -By the time this is posted, it seems that Netflix has taken it offline... I should have created a screenshot!
    +

    Mentorship and Sponsorship



    -

    C programming


    +Mentoring is obvious -- help people grow technically and career-wise. But sponsorship is the one that surprised me: actively advocating for people, creating opportunities for them, pushing them forward. It's not just answering questions, it's putting your reputation behind someone.

    -

    4. Indexing an array


    +

    Managing Up and Across



    -In C, you can index an array like this: array[i] (not surprising). But this works as well and is valid C code: i[array], 🤯 It's because after the spec A[B] is equivalent to *(A + B) and the ordering doesn't matter for the + operator. All 3 loops are producing the same output. Would be funny to use i[array] in a merge request of some code base on April Fool's day!
    +You have to manage up (set expectations with leadership, advocate for technical needs) and across (work with peer teams, build alignment). Basically a lot of communication and relationship building. Easy to underestimate this one.

    - -
    #include <stdio.h>
    -
    -int main(void) {
    -  int array[5] = { 1, 2, 3, 4, 5 };
    -
    -  for (int i = 0; i < 5; i++)
    -    printf("%d\n", array[i]);
    -
    -  for (int i = 0; i < 5; i++)
    -    printf("%d\n", i[array]);
    -
    -  for (int i = 0; i < 5; i++)
    -    printf("%d\n", *(i + array));
    -}
    -
    +

    Strategic Thinking



    -

    5. Variables with prefix $


    +Senior engineers focus on execution. Staff engineers need to think about what happens months or years from now. That means sometimes pushing back on short-term pressures in favor of longer-term architectural decisions. Not always a popular move.

    -In C you can prefix variables with $! E.g. the following is valid C code 🫠:
    +

    Emotional Intelligence



    - -
    #include <stdio.h>
    -
    -int main(void) {
    -  int $array[5] = { 1, 2, 3, 4, 5 };
    -
    -  for (int $i = 0; $i < 5; $i++)
    -    printf("%d\n", $array[$i]);
    -
    -  for (int $i = 0; $i < 5; $i++)
    -    printf("%d\n", $i[$array]);
    -
    -  for (int $i = 0; $i < 5; $i++)
    -    printf("%d\n", *($i + $array));
    -}
    -
    +The higher you go, the more soft skills matter. Building relationships, resolving conflicts, reading the room. I think this catches a lot of engineers off guard -- you can't just be the smartest person technically anymore.

    -

    6. Object oriented shell scripts using ksh


    +

    -Experienced software developers are aware that scripting languages like Python, Perl, Ruby, and JavaScript support object-oriented programming (OOP) concepts such as classes and inheritance. However, many might be surprised to learn that the latest version of the Korn shell (Version 93t+) also supports OOP. In ksh93, OOP is implemented using user-defined types:
    +A lot of the problems you deal with are poorly defined. Nobody knows exactly what the problem is, let alone the solution. You have to be comfortable operating in that fog and still making progress.

    - -
    #!/usr/bin/ksh93
    - 
    -typeset -T Point_t=(
    -    integer -h 'x coordinate' x=0
    -    integer -h 'y coordinate' y=0
    -    typeset -h 'point color'  color="red"
    -
    -    function getcolor {
    -        print -r ${_.color}
    -    }
    -
    -    function setcolor {
    -        _.color=$1
    -    }
    -
    -    setxy() {
    -        _.x=$1; _.y=$2
    -    }
    -
    -    getxy() {
    -        print -r "(${_.x},${_.y})"
    -    }
    -)
    - 
    -Point_t point
    - 
    -echo "Initial coordinates are (${point.x},${point.y}). Color is ${point.color}"
    - 
    -point.setxy 5 6
    -point.setcolor blue
    - 
    -echo "New coordinates are ${point.getxy}. Color is ${point.getcolor}"
    - 
    -exit 0
    -
    +

    Visible and Invisible Work



    -Using types to create object oriented Korn shell 93 scripts
    +A huge chunk of Staff Engineer work is invisible. Aligning teams, influencing decisions, resolving conflicts -- none of that shows up as commits. Larson says you need to get comfortable with that, which I think is genuinely hard for engineers who are used to shipping things.

    -

    7. This works in Go


    +

    Scaling Yourself



    -There is no pointer arithmetic in Go like in C, but it is still possible to do some brain teasers with pointers 😧:
    +You can't do everything yourself anymore. Write things down, build repeatable processes, mentor others, automate what you can. The goal is to make teams more effective even when you're not in the room.

    - -
    package main
    -
    -import "fmt"
    -
    -func main() {
    -	var i int
    -	f := func() *int {
    -		return &i
    -	}
    -	*f()++
    -	fmt.Println(i)
    -}
    -
    +

    Career Progression and Title Inflation


    +
    +"Staff Engineer" means wildly different things at different companies. Titles don't always match actual responsibility or skill. Focus on the work and impact, not the title.

    -Go playground
    +Some of the above is less about technical chops and more about the strategic and interpersonal side of things. Anyway, here are some more concrete takeaways:

    -

    8. "I am a Teapot" HTTP response code


    +

    Not a faster Senior Engineer



    -Defined in 1998 as one of the IETF's traditional April Fools' jokes (RFC 2324), the Hyper Text Coffee Pot Control Protocol specifies an HTTP status code that is not intended for actual HTTP server implementation. According to the RFC, this code should be returned by teapots when asked to brew coffee. This status code also serves as an Easter egg on some websites, such as Google.com's "I'm a teapot" feature. Occasionally, it is used to respond to a blocked request, even though the more appropriate response would be the 403 Forbidden status code.
    +
      +
    • A Staff engineer is more than just a faster Senior.
    • +
    • A staff engineer is not a senior engineer but a bit better.
    • +

    +It's important to know what work or which role most energizes you. A Staff engineer is not a more senior engineer. A Staff engineer also fits into another archetype.

    -https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#418
    +As a staff engineer, you are always expected to go beyond your comfort zone and learn new things.

    -

    9. jq is a functional programming language


    +Your job sometimes will feel like an SEM and sometimes strangely similar to your senior roles.

    -Many know of jq, the handy small tool and swiss army knife for JSON parsing.
    +A Staff engineer is, like a Manager, a leader. However, being a Manager is a specific job. Leaders can apply to any job, especially to Staff engineers.

    -https://github.com/jqlang/jq
    +

    The Balance



    -What many don't know that jq is actually a full blown functional programming language jqlang, have a look at the language description:
    +The more senior you become, the more responsibility you will have to cope with them in less time. Balance your speed of progress with your personal life, don't work late hours and don't skip these personal care events.

    -https://github.com/jqlang/jq/wiki/jq-Language-Description
    +Do fewer things but do them better. Everything done will accelerate the organization. Everything else will drag it down—quality over quantity.

    -As a matter of fact, the language is so powerful, that there exists an implementation of jq in jq itself:
    +Don't work at ten things and progress slowly; focus on one thing and finish it.

    -https://github.com/wader/jqjq
    +Only spend some of the time firefighting. Have time for deep thinking. Only deep think some of the time. Otherwise, you lose touch with reality.

    -Here some snipped from jqjq, to get a feel of jqlang:
    +Sebactical: Take at least six months. Otherwise, it won't be as restored.

    -
    -def _token:
    -	def _re($re; f):
    -	  ( . as {$remain, $string_stack}
    -	  | $remain
    -	  | match($re; "m").string
    -	  | f as $token
    -	  | { result: ($token | del(.string_stack))
    -	    , remain: $remain[length:]
    -	    , string_stack:
    -	        ( if $token.string_stack == null then $string_stack
    -	          else $token.string_stack
    -	          end
    -	        )
    -	    }
    -	  );
    -	if .remain == "" then empty
    -	else
    -	  ( . as {$string_stack}
    -	  | _re("^\\s+"; {whitespace: .})
    -	  // _re("^#[^\n]*"; {comment: .})
    -	  // _re("^\\.[_a-zA-Z][_a-zA-Z0-9]*"; {index: .[1:]})
    -	  // _re("^[_a-zA-Z][_a-zA-Z0-9]*"; {ident: .})
    -	  // _re("^@[_a-zA-Z][_a-zA-Z0-9]*"; {at_ident: .})
    -	  // _re("^\\$[_a-zA-Z][_a-zA-Z0-9]*"; {binding: .})
    -	  # 1.23, .123, 123e2, 1.23e2, 123E2, 1.23e+2, 1.23E-2 or 123
    -	  // _re("^(?:[0-9]*\\.[0-9]+|[0-9]+)(?:[eE][-\\+]?[0-9]+)?"; {number: .})
    -	  // _re("^\"(?:[^\"\\\\]|\\\\.)*?\\\\\\(";
    -	      ( .[1:-2]
    -	      | _unescape
    -	      | {string_start: ., string_stack: ($string_stack+["\\("])}
    -	      )
    -	    )
    -	 .
    -	 .
    -	 .
    -
    -
    -

    10. Regular expression to verify email addresses


    -
    -This is a pretty old meme, but still worth posting here (as some may be unaware). The RFC822 Perl regex to validate email addresses is 😱:
    +

    More things



    -
    -(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
    -)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
    -\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
    -?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ 
    -\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
    -31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
    -](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
    -(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
    -(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
    -|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
    -?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
    -r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
    - \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
    -?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
    -)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
    - \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
    -)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
    -)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
    -*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
    -|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
    -\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
    -\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
    -]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
    -]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
    -?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
    -:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
    -:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
    -:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
    -[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] 
    -\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
    -\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
    -@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
    -(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
    -)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
    -".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
    -:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
    -\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
    -\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
    -?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
    -:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
    -^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
    -.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
    -]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
    -[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
    -r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] 
    -\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
    -|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
    -00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
    -.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
    -;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
    -:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
    -(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
    -\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
    -^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
    -]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
    -?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
    -".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
    -?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
    -\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
    -])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
    -])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
    -:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
    -\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
    -[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
    -]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
    -?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
    -()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
    -?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
    -@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
    - \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
    -;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
    -)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
    -".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
    -(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
    -\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
    -\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
    -"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
    -*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
    -+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
    -.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
    -|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
    -?:\r\n)?[ \t])*))*)?;\s*)
    -
    -
    -https://pdw.ex-parrot.com/Mail-RFC822-Address.html
    +
      +
    • Provide simple but widely used tools. Complex and powerful tools will have power users but only a very few. All others will not use the tool.
    • +
    • In meetings, when someone is inactive, try to pull him in. Pull in max one person at a time. Don't open the discussion to multiple people.
    • +
    • Get used to writing things down and repeating yourself. You will scale yourself much more.
    • +
    • Title inflation: skills correspond to work, but the titles don't.
    • +

    +E-Mail your comments to paul@nospam.buetow.org :-)

    -I hope you had some fun. E-Mail your comments to paul@nospam.buetow.org :-)
    +Other book notes of mine are:

    -other related posts are:
    +2025-11-02 'The Courage To Be Disliked' book notes
    +2025-06-07 'A Monk's Guide to Happiness' book notes
    +2025-04-19 'When: The Scientific Secrets of Perfect Timing' book notes
    +2024-10-24 'Staff Engineer' book notes (You are currently reading this)
    +2024-07-07 'The Stoic Challenge' book notes
    +2024-05-01 'Slow Productivity' book notes
    +2023-11-11 'Mind Management' book notes
    +2023-07-17 'Software Developers Career Guide and Soft Skills' book notes
    +2023-05-06 'The Obstacle is the Way' book notes
    +2023-04-01 'Never split the difference' book notes
    +2023-03-16 'The Pragmatic Programmer' book notes

    Back to the main site
    - Terminal multiplexing with `tmux` - Z-Shell edition - - gemini://foo.zone/gemfeed/2024-06-23-terminal-multiplexing-with-tmux.gmi - 2024-06-23T22:41:59+03:00, last updated Fri 02 May 00:10:49 EEST 2025 + Gemtexter 3.0.0 - Let's Gemtext again⁴ + + gemini://foo.zone/gemfeed/2024-10-02-gemtexter-3.0.0-lets-gemtext-again-4.gmi + 2024-10-01T21:46:26+03:00 Paul Buetow aka snonux paul@dev.buetow.org - This is the Z-Shell version. There is also a Fish version: + I proudly announce that I've released Gemtexter version `3.0.0`. What is Gemtexter? It's my minimalist static site generator for Gemini Gemtext, HTML and Markdown, written in GNU Bash.
    -

    Terminal multiplexing with tmux - Z-Shell edition


    -
    -Published at 2024-06-23T22:41:59+03:00, last updated Fri 02 May 00:10:49 EEST 2025
    -
    -This is the Z-Shell version. There is also a Fish version:
    +

    Gemtexter 3.0.0 - Let's Gemtext again⁴



    -./2025-05-02-terminal-multiplexing-with-tmux-fish-edition.html
    +Published at 2024-10-01T21:46:26+03:00

    -Tmux (Terminal Multiplexer) is a powerful, terminal-based tool that manages multiple terminal sessions within a single window. Here are some of its primary features and functionalities:
    +I proudly announce that I've released Gemtexter version 3.0.0. What is Gemtexter? It's my minimalist static site generator for Gemini Gemtext, HTML and Markdown, written in GNU Bash.

    -
      -
    • Session management
    • -
    • Window and Pane management
    • -
    • Persistent Workspace
    • -
    • Customization
    • -

    -https://github.com/tmux/tmux/wiki
    +https://codeberg.org/snonux/gemtexter

    -         _______
    -        |.-----.|
    -        || Tmux||
    -        ||_.-._||
    -        `--)-(--`
    -       __[=== o]___
    -      |:::::::::::|\
    -jgs   `-=========-`()
    -    mod. by Paul B.
    +-=[ typewriters ]=-  1/98
    +                                      .-------.
    +       .-------.                     _|~~ ~~  |_
    +      _|~~ ~~  |_       .-------.  =(_|_______|_)
    +    =(_|_______|_)=    _|~~ ~~  |_   |:::::::::|    .-------.
    +      |:::::::::|    =(_|_______|_)  |:::::::[]|   _|~~ ~~  |_
    +      |:::::::[]|      |:::::::::|   |o=======.| =(_|_______|_)
    +      |o=======.|      |:::::::[]|   `"""""""""`   |:::::::::|
    + jgs  `"""""""""`      |o=======.|                 |:::::::[]|
    +  mod. by Paul Buetow  `"""""""""`                 |o=======.|
    +                                                   `"""""""""`
     

    Table of Contents




    -

    Before continuing...


    -
    -Before continuing to read this post, I encourage you to get familiar with Tmux first (unless you already know the basics). You can go through the official getting started guide:
    -
    -https://github.com/tmux/tmux/wiki/Getting-Started
    -
    -I can also recommend this book (this is the book I got started with with Tmux):
    -
    -https://pragprog.com/titles/bhtmux2/tmux-2/
    -
    -Over the years, I have built a couple of shell helper functions to optimize my workflows. Tmux is extensively integrated into my daily workflows (personal and work). I had colleagues asking me about my Tmux config and helper scripts for Tmux several times. It would be neat to blog about it so that everyone interested in it can make a copy of my configuration and scripts.
    -
    -The configuration and scripts in this blog post are only the non-work-specific parts. There are more helper scripts, which I only use for work (and aren't really useful outside of work due to the way servers and clusters are structured there).
    -
    -Tmux is highly configurable, and I think I am only scratching the surface of what is possible with it. Nevertheless, it may still be useful for you. I also love that Tmux is part of the OpenBSD base system!
    -
    -

    Shell aliases


    -
    -I am a user of the Z-Shell (zsh), but I believe all the snippets mentioned in this blog post also work with Bash.
    +

    Why Bash?



    -https://www.zsh.org
    +This project is too complex for a Bash script. Writing it in Bash was to try out how maintainable a "larger" Bash script could be. It's still pretty maintainable and helps me try new Bash tricks here and then!

    -For the most common Tmux commands I use, I have created the following shell aliases:
    +Let's list what's new!

    - -
    alias tm=tmux
    -alias tl='tmux list-sessions'
    -alias tn=tmux::new
    -alias ta=tmux::attach
    -alias tx=tmux::remote
    -alias ts=tmux::search
    -alias tssh=tmux::cluster_ssh
    -
    +

    HTML exact variant is the only variant



    -Note all tmux::...; those are custom shell functions doing certain things, and they aren't part of the Tmux distribution. But let's run through every alias one by one.
    +The last version of Gemtexter introduced the HTML exact variant, which wasn't enabled by default. This version of Gemtexter removes the previous (inexact) variant and makes the exact variant the default. This is a breaking change, which is why there is a major version bump of Gemtexter. Here is a reminder of what the exact variant was:

    -The first two are pretty straightforward. tm is simply a shorthand for tmux, so I have to type less, and tl lists all Tmux sessions that are currently open. No magic here.
    +Gemtexter is there to convert your Gemini Capsule into other formats, such as HTML and Markdown. An HTML exact variant can now be enabled in the gemtexter.conf by adding the line declare -rx HTML_VARIANT=exact. The HTML/CSS output changed to reflect a more exact Gemtext appearance and to respect the same spacing as you would see in the Geminispace.

    -

    The tn alias - Creating a new session


    +

    Table of Contents auto-generation



    -The tn alias is referencing this function:
    +Just add...

    - -
    # Create new session and if already exists attach to it
    -tmux::new () {
    -    readonly session=$1
    -    local date=date
    -    if where gdate &>/dev/null; then
    -        date=gdate
    -    fi
    -
    -    tmux::cleanup_default
    -    if [ -z "$session" ]; then
    -        tmux::new T$($date +%s)
    -    else
    -        tmux new-session -d -s $session
    -        tmux -2 attach-session -t $session || tmux -2 switch-client -t $session
    -    fi
    -}
    -alias tn=tmux::new
    +
    + << template::inline::toc
     

    -There is a lot going on here. Let's have a detailed look at what it is doing. As a note, the function relies on GNU Date, so MacOS is looking for the gdate commands to be available. Otherwise, it will fall back to date. You need to install GNU Date for Mac, as it isn't installed by default there. As I use Fedora Linux on my personal Laptop and a MacBook for work, I have to make it work for both.
    +...into a Gemtexter template file and Gemtexter will automatically generate a table of contents for the page based on the headings (see this page's ToC for example). The ToC will also have links to the relevant sections in HTML and Markdown output. The Gemtext format does not support links, so the ToC will simply be displayed as a bullet list.

    -First, a Tmux session name can be passed to the function as a first argument. That session name is only optional. Without it, Tmux will select a session named T$($date +%s) as a default. Which is T followed by the UNIX epoch, e.g. T1717133796.
    +

    Configurable themes



    -

    Cleaning up default sessions automatically


    +It was always possible to customize the style of a Gemtexter's resulting HTML page, but all the config options were scattered across multiple files. Now, the CSS style, web fonts, etc., are all configurable via themes.

    -Note also the call to tmux::cleanup_default; it would clean up all already opened default sessions if they aren't attached. Those sessions were only temporary, and I had too many flying around after a while. So, I decided to auto-delete the sessions if they weren't attached. If I want to keep sessions around, I will rename them with the Tmux command prefix-key $. This is the cleanup function:
    +Simply configure HTML_THEME_DIR in the gemtexter.conf file to the corresponding directory. For example:

    -
    tmux::cleanup_default () {
    -    local s
    -    tmux list-sessions | grep '^T.*: ' | grep -F -v attached |
    -    cut -d: -f1 | while read -r s; do
    -        echo "Killing $s"
    -        tmux kill-session -t "$s"
    -    done
    -}
    +
    declare -xr HTML_THEME_DIR=./extras/html/themes/simple
     

    -The cleanup function kills all open Tmux sessions that haven't been renamed properly yet—but only if they aren't attached (e.g., don't run in the foreground in any terminal). Cleaning them up automatically keeps my Tmux sessions as neat and tidy as possible.
    +To customize the theme or create your own, simply copy the theme directory and modify it as needed. This makes it also much easier to switch between layouts.

    -

    Renaming sessions


    +

    No use of webfonts by default



    -Whenever I am in a temporary session (named T....), I may decide that I want to keep this session around. I have to rename the session to prevent the cleanup function from doing its thing. That's, as mentioned already, easily accomplished with the standard prefix-key $ Tmux command.
    +The default theme is now "back to the basics" and does not utilize any web fonts. The previous themes are still part of the release and can be easily configured. These are currently the future and business themes. You can check them out from the themes directory.

    -

    The ta alias - Attaching to a session


    +

    More



    -This alias refers to the following function, which tries to attach to an already-running Tmux session.
    +Additionally, there were a couple of bug fixes, refactorings and overall improvements in the documentation made.

    - -
    tmux::attach () {
    -    readonly session=$1
    -
    -    if [ -z "$session" ]; then
    -        tmux attach-session || tmux::new
    -    else
    -        tmux attach-session -t $session || tmux::new $session
    -    fi
    -}
    -alias ta=tmux::attach
    -
    +E-Mail your comments to paul@nospam.buetow.org :-)

    -If no session is specified (as the argument of the function), it will try to attach to the first open session. If no Tmux server is running, it will create a new one with tmux::new. Otherwise, with a session name given as the argument, it will attach to it. If unsuccessful (e.g., the session doesn't exist), it will be created and attached to.
    +Other related posts are:

    -

    The tr alias - For a nested remote session


    +2024-10-02 Gemtexter 3.0.0 - Let's Gemtext again⁴ (You are currently reading this)
    +2023-07-21 Gemtexter 2.1.0 - Let's Gemtext again³
    +2023-03-25 Gemtexter 2.0.0 - Let's Gemtext again²
    +2022-08-27 Gemtexter 1.1.0 - Let's Gemtext again
    +2021-06-05 Gemtexter - One Bash script to rule it all
    +2021-04-24 Welcome to the Geminispace

    -This SSHs into the remote server specified and then, remotely on the server itself, starts a nested Tmux session. So we have one Tmux session on the local computer and, inside of it, an SSH connection to a remote server with a Tmux session running again. The benefit of this is that, in case my network connection breaks down, the next time I connect, I can continue my work on the remote server exactly where I left off. The session name is the name of the server being SSHed into. If a session like this already exists, it simply attaches to it.
    +Back to the main site
    +
    +
    +
    + + Site Reliability Engineering - Part 4: Onboarding for On-Call Engineers + + gemini://foo.zone/gemfeed/2024-09-07-site-reliability-engineering-part-4.gmi + 2024-09-07T16:27:58+03:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + Welcome to Part 4 of my Site Reliability Engineering (SRE) series. I'm currently working as a Site Reliability Engineer, and I’m here to share what SRE is all about in this blog series. + +
    +

    Site Reliability Engineering - Part 4: Onboarding for On-Call Engineers



    - -
    tmux::remote () {
    -    readonly server=$1
    -    tmux new -s $server "ssh -t $server 'tmux attach-session || tmux'" || \
    -        tmux attach-session -d -t $server
    -}
    -alias tr=tmux::remote
    -
    +Published at 2024-09-07T16:27:58+03:00

    -

    Change of the Tmux prefix for better nesting


    +Welcome to Part 4 of my Site Reliability Engineering (SRE) series. I'm currently working as a Site Reliability Engineer, and I’m here to share what SRE is all about in this blog series.

    -To make nested Tmux sessions work smoothly, one must change the Tmux prefix key locally or remotely. By default, the Tmux prefix key is Ctrl-b, so Ctrl-b $, for example, renames the current session. To change the prefix key from the standard Ctrl-b to, for example, Ctrl-g, you must add this to the tmux.conf:
    +2023-08-18 Site Reliability Engineering - Part 1: SRE and Organizational Culture
    +2023-11-19 Site Reliability Engineering - Part 2: Operational Balance
    +2024-01-09 Site Reliability Engineering - Part 3: On-Call Culture
    +2024-09-07 Site Reliability Engineering - Part 4: Onboarding for On-Call Engineers (You are currently reading this)
    +2026-03-01 Site Reliability Engineering - Part 5: System Design, Incidents, and Learning

    -set-option -g prefix C-g
    +       __..._   _...__
    +  _..-"      `Y`      "-._
    +  \ Once upon |           /
    +  \\  a time..|          //
    +  \\\         |         ///
    +   \\\ _..---.|.---.._ ///
    +jgs \\`_..---.Y.---.._`//	
     

    -This way, when I want to rename the remote Tmux session, I have to use Ctrl-g $, and when I want to rename the local Tmux session, I still have to use Ctrl-b $. In my case, I have this deployed to all remote servers through a configuration management system (out of scope for this blog post).
    +This time, I want to share some tips on how to onboard software engineers, QA engineers, and Site Reliability Engineers (SREs) to the primary on-call rotation. Traditionally, onboarding might take half a year (depending on the complexity of the infrastructure), but with a bit of strategy and structured sessions, we've managed to reduce it to just six weeks per person. Let's dive in!

    -There might also be another way around this (without reconfiguring the prefix key), but that is cumbersome to use, as far as I remember.
    +

    Setting the Scene: Tier-1 On-Call Rotation



    -

    The ts alias - Searching sessions with fuzzy finder


    +First things first, let's talk about Tier-1. This is where the magic begins. Tier-1 covers over 80% of the common on-call cases and is the perfect breeding ground for new on-call engineers to get their feet wet. It's designed to be manageable training ground.

    -Despite the fact that with tmux::cleanup_default, I don't leave a huge mess with trillions of Tmux sessions flying around all the time, at times, it can become challenging to find exactly the session I am currently interested in. After a busy workday, I often end up with around twenty sessions on my laptop. This is where fuzzy searching for session names comes in handy, as I often don't remember the exact session names.
    +

    Why Tier-1?



    - -
    tmux::search () {
    -    local -r session=$(tmux list-sessions | fzf | cut -d: -f1)
    -    if [ -z "$TMUX" ]; then
    -        tmux attach-session -t $session
    -    else
    -        tmux switch -t $session
    -    fi
    -}
    -alias ts=tmux::search
    -
    +
      +
    • Easy to Understand: Every on-call engineer should be familiar with Tier-1 tasks.
    • +
    • Training Ground: This is where engineers start their on-call career. It's purposefully kept simple so that it's not overwhelming right off the bat.
    • +
    • Runbook/recipe driven: Every alert is attached to a comprehensive runbook, making it easy for every engineer to follow.
    • +

    +

    Onboarding Process: From 6 Months to 6 Weeks



    -All it does is list all currently open sessions in fzf, where one of them can be searched and selected through fuzzy find, and then either switch (if already inside a session) to the other session or attach to the other session (if not yet in Tmux).
    +So how did we cut down the onboarding time so drastically? Here’s the breakdown of our process:

    -You must install the fzf command on your computer for this to work. This is how it looks like:
    +Knowledge Transfer (KT) Sessions: We kicked things off with more than 10 KT sessions, complete with video recordings. These sessions are comprehensive and cover everything from the basics to some more advanced topics. The recorded sessions mean that new engineers can revisit them anytime they need a refresher.

    -Tmux session fuzzy finder
    +Shadowing Sessions: Each new engineer undergoes two on-call week shadowing sessions. This hands-on experience is invaluable. They get to see real-time incident handling and resolution, gaining practical knowledge that's hard to get from just reading docs.

    -

    The tssh alias - Cluster SSH replacement


    +Comprehensive Runbooks: We created 64 runbooks (by the time writing this probably more than 100) that are composable like Lego bricks. Each runbook covers a specific scenario and guides the engineer step-by-step to resolution. Pairing these with monitoring alerts linked directly to Confluence docs, and from there to the respective runbooks, ensures every alert can be navigated with ease (well, there are always exceptions to the rule...).

    -Before I used Tmux, I was a heavy user of ClusterSSH, which allowed me to log in to multiple servers at once in a single terminal window and type and run commands on all of them in parallel.
    +Self-Sufficiency & Confidence Building: With all these resources at their fingertips, our on-call engineers become self-sufficient for most of the common issues they'll face (new starters can now handle around 80% of the most common issue after 6 weeks they had joined the company). This boosts their confidence and ensures they can handle Tier-1 incidents independently.

    -https://github.com/duncs/clusterssh
    +Documentation and Feedback Loop: Continuous improvement is key. We regularly update our documentation based on feedback from the engineers. This makes our process even more robust and user-friendly.

    -However, since I started using Tmux, I retired ClusterSSH, as it came with the benefit that Tmux only needs to be run in the terminal, whereas ClusterSSH spawned terminal windows, which aren't easily portable (e.g., from a Linux desktop to macOS). The tmux::cluster_ssh function can have N arguments, where:
    +

    It's All About the Tiers


    +
    +Let’s briefly touch on the Tier levels:

      -
    • ...the first argument will be the session name (see tmux::tssh_from_argument helper function), and all remaining arguments will be server hostnames/FQDNs to connect to simultaneously.
    • -
    • ...or, the first argument is a file name, and the file contains a list of hostnames/FQDNs (see tmux::ssh_from_file helper function)
    • +
    • Tier 1: Easy and foundational tasks. Perfect for getting new engineers started. This covers around 80% of all on-call cases we face. This is what we trained on.
    • +
    • Tier 2: Slightly more complex, requiring more background knowledge. We trained on some of the topics but not all.
    • +
    • Tier 3: Requires a good understanding of the platform/architecture. Likely needs KT sessions with domain experts.
    • +
    • Tier DE (Domain Expert): The heavy hitters. Domain experts are required for these tasks.

    -This is the function definition behind the tssh alias:
    -
    - -
    tmux::cluster_ssh () {
    -    if [ -f "$1" ]; then
    -        tmux::tssh_from_file $1
    -        return
    -    fi
    -
    -    tmux::tssh_from_argument $@
    -}
    -alias tssh=tmux::cluster_ssh
    -
    -
    -This function is just a wrapper around the more complex tmux::tssh_from_file and tmux::tssh_from_argument functions, as you have learned already. Most of the magic happens there.
    -
    -

    The tmux::tssh_from_argument helper


    +

    Growing into Higher Tiers



    -This is the most magic helper function we will cover in this post. It looks like this:
    +From Tier-1, engineers naturally grow into Tier-2 and beyond. The structured training and gradual increase in complexity help ensure a smooth transition as they gain experience and confidence. The key here is that engineers stay curous and engaged in the on-call, so that they always keep learning.

    - -
    tmux::tssh_from_argument () {
    -    local -r session=$1; shift
    -    local first_server=$1; shift
    -
    -    tmux new-session -d -s $session "ssh -t $first_server"
    -    if ! tmux list-session | grep "^$session:"; then
    -        echo "Could not create session $session"
    -        return 2
    -    fi
    -
    -    for server in "${@[@]}"; do
    -        tmux split-window -t $session "tmux select-layout tiled; ssh -t $server"
    -    done
    -
    -    tmux setw -t $session synchronize-panes on
    -    tmux -2 attach-session -t $session | tmux -2 switch-client -t $session
    -}
    -
    +

    Keeping Runbooks Up to Date



    -It expects at least two arguments. The first argument is the session name to create for the clustered SSH session. All other arguments are server hostnames or FQDNs to which to connect. The first one is used to make the initial session. All remaining ones are added to that session with tmux split-window -t $session.... At the end, we enable synchronized panes by default, so whenever you type, the commands will be sent to every SSH connection, thus allowing the neat ClusterSSH feature to run commands on multiple servers simultaneously. Once done, we attach (or switch, if already in Tmux) to it.
    +It is important that runbooks are not a "project to be finished"; runbooks have to be maintained and updated over time. Sections may change, new runbooks need to be added, and old ones can be deleted. So the acceptance criteria of an on-call shift would not just be reacting to alerts and incidents, but also reviewing and updating the current runbooks.

    -Sometimes, I don't want the synchronized panes behavior and want to switch it off temporarily. I can do that with prefix-key p and prefix-key P after adding the following to my local tmux.conf:
    +

    Conclusion



    -
    -bind-key p setw synchronize-panes off
    -bind-key P setw synchronize-panes on
    -
    +By structuring the onboarding process with KT sessions, shadowing, comprehensive runbooks, and a feedback loop, we've been able to fast-track the process from six months to just six weeks. This not only prepares our engineers for the on-call rotation quicker but also ensures they're confident and capable when handling incidents.

    -

    The tmux::tssh_from_file helper


    +If you're looking to optimize your on-call onboarding process, these strategies could be your ticket to a more efficient and effective transition. Happy on-calling!

    -This one sets the session name to the file name and then reads a list of servers from that file, passing the list of servers to tmux::tssh_from_argument as the arguments. So, this is a neat little wrapper that also enables me to open clustered SSH sessions from an input file.
    +Continue with the fifth part of this series:

    - -
    tmux::tssh_from_file () {
    -    local -r serverlist=$1; shift
    -    local -r session=$(basename $serverlist | cut -d. -f1)
    -
    -    tmux::tssh_from_argument $session $(awk '{ print $1} ' $serverlist | sed 's/.lan./.lan/g')
    -}
    -
    +2026-03-01 Site Reliability Engineering - Part 5: System Design, Incidents, and Learning

    -

    tssh examples


    +E-Mail your comments to paul@nospam.buetow.org :-)

    -To open a new session named fish and log in to 4 remote hosts, run this command (Note that it is also possible to specify the remote user):
    +Back to the main site
    +
    +
    +
    + + Projects I financially support + + gemini://foo.zone/gemfeed/2024-09-07-projects-i-support.gmi + 2024-09-07T16:04:19+03:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + This is the list of projects and initiatives I support/sponsor. + +
    +

    Projects I financially support



    -
    -$ tssh fish blowfish.buetow.org fishfinger.buetow.org \
    -    fishbone.buetow.org user@octopus.buetow.org
    -
    +Published at 2024-09-07T16:04:19+03:00

    -To open a new session named manyservers, put many servers (one FQDN per line) into a file called manyservers.txt and simply run:
    +This is the list of projects and initiatives I support/sponsor.

    -$ tssh manyservers.txt
    +||====================================================================||
    +||//$\\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\//$\\||
    +||(100)==================| FEDERAL SPONSOR NOTE |================(100)||
    +||\\$//        ~         '------========--------'                \\$//||
    +||<< /        /$\              // ____ \\                         \ >>||
    +||>>|  12    //L\\            // ///..) \\         L38036133B   12 |<<||
    +||<<|        \\ //           || <||  >\  ||                        |>>||
    +||>>|         \$/            ||  $$ --/  ||        One Hundred     |<<||
    +||<<|      L38036133B        *\\  |\_/  //* series                 |>>||
    +||>>|  12                     *\\/___\_//*   1989                  |<<||
    +||<<\      Open Source   ______/Franklin\________     Supporting   />>||
    +||//$\                 ~| SPONSORING AND FUNDING |~               /$\\||
    +||(100)===================  AWESOME OPEN SOURCE =================(100)||
    +||\\$//\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\\$//||
    +||====================================================================||
    + 
     

    -

    Common Tmux commands I use in tssh


    -
    -These are default Tmux commands that I make heavy use of in a tssh session:
    +

    Table of Contents




    -

    Copy and paste workflow


    -
    -As you will see later in this blog post, I have configured a history limit of 1 million items in Tmux so that I can scroll back quite far. One main workflow of mine is to search for text in the Tmux history, select and copy it, and then switch to another window or session and paste it there (e.g., into my text editor to do something with it).
    +

    Motivation



    -This works by pressing prefix-key [ to enter Tmux copy mode. From there, I can browse the Tmux history of the current window using either the arrow keys or vi-like navigation (see vi configuration later in this blog post) and the Pg-Dn and Pg-Up keys.
    +Sponsoring free and open-source projects, even for personal use, is important to ensure the sustainability, security, and continuous improvement of the software. It supports developers who often maintain these projects without compensation, helping them provide updates, new features, and security patches. By contributing, you recognize their efforts, foster a culture of innovation, and benefit from perks like early access or support, all while ensuring the long-term viability of the tools you rely on.

    -I often search the history backwards with prefix-key [ followed by a ?, which opens the Tmux history search prompt.
    +Albeit I am not putting a lot of money into my sponsoring efforts, it still helps the open-source maintainers because the more little sponsors there are, the higher the total sum.

    -Once I have identified the terminal text to be copied, I enter visual select mode with v, highlight all the text to be copied (using arrow keys or Vi motions), and press y to yank it (sorry if this all sounds a bit complicated, but Vim/NeoVim users will know this, as it is pretty much how you do it there as well).
    +

    OSnews



    -For v and y to work, the following has to be added to the Tmux configuration file:
    +I am a silver Patreon member of OSnews. I have been following this site since my student years. It's always been a great source of independent and slightly alternative IT news.

    -
    -bind-key -T copy-mode-vi 'v' send -X begin-selection
    -bind-key -T copy-mode-vi 'y' send -X copy-selection-and-cancel
    -
    +https://osnews.com

    -Once the text is yanked, I switch to another Tmux window or session where, for example, a text editor is running and paste the yanked text from Tmux into the editor with prefix-key ]. Note that when pasting into a modal text editor like Vi or Helix, you would first need to enter insert mode before prefix-key ] would paste anything.
    +

    Cup o' Go Podcast



    -

    Tmux configurations


    +I am a Patreon of the Cup o' Go Podcast. The podcast helps me stay updated with the Go community for around 15 minutes per week. I am not a full-time software developer, but my long-term ambition is to become better in Go every week by working on personal projects and tools for work.

    -Some features I have configured directly in Tmux don't require an external shell alias to function correctly. Let's walk line by line through my local ~/.config/tmux/tmux.conf:
    +https://cupogo.dev

    -
    -source ~/.config/tmux/tmux.local.conf
    -
    -set-option -g allow-rename off
    -set-option -g history-limit 100000
    -set-option -g status-bg '#444444'
    -set-option -g status-fg '#ffa500'
    -set-option -s escape-time 0
    -
    +

    Codeberg



    -There's yet to be much magic happening here. I source a tmux.local.conf, which I sometimes use to override the default configuration that comes from the configuration management system. But it is mostly just an empty file, so it doesn't throw any errors on Tmux startup when I don't use it.
    +Codeberg e.V. is a nonprofit organization that provides online resources for software development and collaboration. I am a user and a supporting member, paying an annual membership of €24. I didn't have to pay that membership fee, as Codeberg offers all the services I use for free.

    -I work with many terminal outputs, which I also like to search within Tmux. So, I added a large enough history-limit, enabling me to search backwards in Tmux for any output up to a million lines of text.
    +https://codeberg.org
    +https://codeberg.org/snonux - My Codeberg page

    -Besides changing some colours (personal taste), I also set escape-time to 0, which is just a workaround. Otherwise, my Helix text editor's ESC key would take ages to trigger within Tmux. I am trying to remember the gory details. You can leave it out; if everything works fine for you, leave it out.
    +

    GrapheneOS



    -The next lines in the configuration file are:
    +GrapheneOS is an open-source project that improves Android's privacy and security with sandboxing, exploit mitigations, and a permission model. It does not include Google apps or services but offers a sandboxed Google Play compatibility layer and its own apps and services.

    -
    -set-window-option -g mode-keys vi
    -bind-key -T copy-mode-vi 'v' send -X begin-selection
    -bind-key -T copy-mode-vi 'y' send -X copy-selection-and-cancel
    -
    +I've made a one-off €100 donation because I really like this, and I run GrapheneOS on my personal Phone as my main daily driver.

    -I navigate within Tmux using Vi keybindings, so the mode-keys is set to vi. I use the Helix modal text editor, which is close enough to Vi bindings for simple navigation to feel "native" to me. (By the way, I have been a long-time Vim and NeoVim user, but I eventually switched to Helix. It's off-topic here, but it may be worth another blog post once.)
    +https://grapheneos.org/
    +Why GrapheneOS Rox

    -The two bind-key commands make it so that I can use v and y in copy mode, which feels more Vi-like (as already discussed earlier in this post).
    +

    AnkiDroid



    -The next set of lines in the configuration file are:
    +AnkiDroid is an app that lets you learn flashcards efficiently with spaced repetition. It is compatible with Anki software and supports various flashcard content, syncing, statistics, and more.

    -
    -bind-key h select-pane -L
    -bind-key j select-pane -D
    -bind-key k select-pane -U
    -bind-key l select-pane -R
    -
    -bind-key H resize-pane -L 5
    -bind-key J resize-pane -D 5
    -bind-key K resize-pane -U 5
    -bind-key L resize-pane -R 5
    -
    +I've been learning vocabulary with this free app, and it is, in my opinion, the best flashcard app I know. I've made a 20$ one-off donation to this project.

    -These allow me to use prefix-key h, prefix-key j, prefix-key k, and prefix-key l for switching panes and prefix-key H, prefix-key J, prefix-key K, and prefix-key L for resizing the panes. If you don't know Vi/Vim/NeoVim, the letters hjkl are commonly used there for left, down, up, and right, which is also the same for Helix, by the way.
    +https://opencollective.com/ankidroid

    -The next set of lines in the configuration file are:
    +

    OpenBSD through OpenBSD.Amsterdam



    -
    -bind-key c new-window -c '#{pane_current_path}'
    -bind-key F new-window -n "session-switcher" "tmux list-sessions | fzf | cut -d: -f1 | xargs tmux switch-client -t"
    -bind-key T choose-tree
    -
    + The OpenBSD project produces a FREE, multi-platform 4.4BSD-based UNIX-like operating system. Our efforts emphasize portability, standardization, correctness, proactive security and integrated cryptography. As an example of the effect OpenBSD has, the popular OpenSSH software comes from OpenBSD. OpenBSD is freely available from their download sites.

    -The first one is that any new window starts in the current directory. The second one is more interesting. I list all open sessions in the fuzzy finder. I rely heavily on this during my daily workflow to switch between various sessions depending on the task. E.g. from a remote cluster SSH session to a local code editor.
    +I implicitly support the OpenBSD project through a VM I have rented at OpenBSD Amsterdam. They donate €10 per VM and €15 per VM for every renewal to the OpenBSD Foundation, with dedicated servers running vmm(4)/vmd(8) to host opinionated VMs.

    -The third one, choose-tree, opens a tree view in Tmux listing all sessions and windows. This one is handy to get a better overview of what is currently running in any local Tmux session. It looks like this (it also allows me to press a hotkey to switch to a particular Tmux window):
    +https://www.OpenBSD.org
    +https://OpenBSD.Amsterdam

    -Tmux session tree view
    +

    ProtonMail



    +I am not directly funding this project, but I am a very happy paying customer, and I am listing it here as an alternative to big tech if you don't want to run your own mail infrastructure. I am listing ProtonMail here as it is a non-profit organization, and I want to emphasize the importance of considering alternatives to big tech.

    -The last remaining lines in my configuration file are:
    -
    -
    -bind-key p setw synchronize-panes off
    -bind-key P setw synchronize-panes on
    -bind-key r source-file ~/.config/tmux/tmux.conf \; display-message "tmux.conf reloaded"
    -
    +https://proton.me/

    -We discussed synchronized panes earlier. I use it all the time in clustered SSH sessions. When enabled, all panes (remote SSH sessions) receive the same keystrokes. This is very useful when you want to run the same commands on many servers at once, such as navigating to a common directory, restarting a couple of services at once, or running tools like htop to quickly monitor system resources.
    +

    Libro.fm



    -The last one reloads my Tmux configuration on the fly.
    +This is the alternative to Audible if you are into audiobooks (like I am). For every book or every month of membership, I am also supporting a local bookstore I selected. Their catalog is not as large as Audible's, but it's still pretty decent.

    -E-Mail your comments to paul@nospam.buetow.org :-)
    +Libro.fm began as a conversation among friends at Third Place Books, a local bookstore in Seattle, Washington, about the growing popularity of audiobooks and the lack of a way for readers to purchase them from independent bookstores. Flash forward, and Libro.fm was founded in 2014.

    -Other related posts are:
    +https://libro.fm

    -2026-02-02 A tmux popup editor for Cursor Agent CLI prompts
    -2025-05-02 Terminal multiplexing with tmux - Fish edition
    -2024-06-23 Terminal multiplexing with tmux - Z-Shell edition (You are currently reading this)
    +E-mail your comments to paul@nospam.buetow.org :-)

    Back to the main site
    - Projects I currently don't have time for - - gemini://foo.zone/gemfeed/2024-05-03-projects-i-currently-dont-have-time-for.gmi - 2024-05-03T16:23:03+03:00 + Typing `127.1` words per minute (`>100wpm average`) + + gemini://foo.zone/gemfeed/2024-08-05-typing-127.1-words-per-minute.gmi + 2024-08-05T17:39:30+03:00 Paul Buetow aka snonux paul@dev.buetow.org - Over the years, I have collected many ideas for my personal projects and noted them down. I am currently in the process of cleaning up all my notes and reviewing those ideas. I don’t have time for the ones listed here and won’t have any soon due to other commitments and personal projects. So, in order to 'get rid of them' from my notes folder, I decided to simply put them in this blog post so that those ideas don't get lost. Maybe I will pick up one or another idea someday in the future, but for now, they are all put on ice in favor of other personal projects or family time. + After work one day, I noticed some discomfort in my right wrist. Upon research, it appeared to be a mild case of Repetitive Strain Injury (RSI). Initially, I thought that this would go away after a while, but after a week it became even worse. This led me to consider potential causes such as poor posture or keyboard use habits. As an enthusiast of keyboards, I experimented with ergonomic concave ortholinear split keyboards. Wait, what?...
    -

    Projects I currently don't have time for


    -
    -Published at 2024-05-03T16:23:03+03:00
    +

    Typing 127.1 words per minute (>100wpm average)



    -Over the years, I have collected many ideas for my personal projects and noted them down. I am currently in the process of cleaning up all my notes and reviewing those ideas. I don’t have time for the ones listed here and won’t have any soon due to other commitments and personal projects. So, in order to "get rid of them" from my notes folder, I decided to simply put them in this blog post so that those ideas don't get lost. Maybe I will pick up one or another idea someday in the future, but for now, they are all put on ice in favor of other personal projects or family time.
    +Published at 2024-08-05T17:39:30+03:00; Updated at 2025-02-22

    -Art by Laura Brown
    -
    -.'`~~~~~~~~~~~`'.
    -(  .'11 12 1'.  )
    -|  :10 \    2:  |
    -|  :9   @-> 3:  |
    -|  :8       4;  |
    -'. '..7 6 5..' .'
    - ~-------------~  ldb
    -
    +,---,---,---,---,---,---,---,---,---,---,---,---,---,-------,
    +|1/2| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | + | ' | <-    |
    +|---'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-----|
    +| ->| | Q | W | E | R | T | Y | U | I | O | P | ] | ^ |     |
    +|-----',--',--',--',--',--',--',--',--',--',--',--',--'|    |
    +| Caps | A | S | D | F | G | H | J | K | L | \ | [ | * |    |
    +|----,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'-,-'---'----|
    +|    | < | Z | X | C | V | B | N | M | , | . | - |          |
    +|----'-,-',--'--,'---'---'---'---'---'---'-,-'---',--,------|
    +| ctrl |  | alt |                          |altgr |  | ctrl |
    +'------'  '-----'--------------------------'------'  '------'
    +      Nieminen Mika	
     

    Table of Contents




    -

    Hardware projects I don't have time for


    -
    -

    I use Arch, btw!


    -
    -The idea was to build the ultimate Arch Linux setup on an old ThinkPad X200 booting with the open-source LibreBoot firmware, complete with a tiling window manager, dmenu, and all the elite tools. This is mainly for fun, as I am pretty happy (and productive) with my Fedora Linux setup. I ran EndeavourOS (close enough to Arch) on an old ThinkPad for a while, but then I switched back to Fedora because the rolling releases were annoying (there were too many updates).
    -
    -

    OpenBSD home router


    -
    -In my student days, I operated a 486DX PC with OpenBSD as my home DSL internet router. I bought the setup from my brother back then. The router's hostname was fishbone, and it performed very well until it became too slow for larger broadband bandwidth after a few years of use.
    -
    -I had the idea to revive this concept, implement fishbone2, and place it in front of my proprietary ISP router to add an extra layer of security and control in my home LAN. It would serve as the default gateway for all of my devices, including a Wi-Fi access point, would run a DNS server, Pi-hole proxy, VPN client, and DynDNS client. I would also implement high availability using OpenBSD's CARP protocol.
    -
    -https://openbsdrouterguide.net
    -https://pi-hole.net/
    -https://www.OpenBSD.org
    -https://www.OpenBSD.org/faq/pf/carp.html
    +
  • Typing 127.1 words per minute (>100wpm average)
  • +
  • Introduction
  • +
  • Kinesis review
  • +
  • ⇢ ⇢ Top build quality
  • +
  • ⇢ ⇢ Bluetooth connectivity
  • +
  • ⇢ ⇢ Gateron Brown key switches
  • +
  • ⇢ ⇢ Keycaps
  • +
  • ⇢ ⇢ Keymap editor
  • +
  • First steps
  • +
  • Considering alternate layouts
  • +
  • Training how to type
  • +
  • ⇢ ⇢ Tools
  • +
  • My keybr.com statistics
  • +
  • Tips and tricks
  • +
  • ⇢ ⇢ Relax
  • +
  • ⇢ ⇢ Focus on accuracy first
  • +
  • ⇢ ⇢ Chording
  • +
  • ⇢ ⇢ Punctuation and Capitalization
  • +
  • ⇢ ⇢ Reverse shifting
  • +
  • ⇢ ⇢ Enter the flow state
  • +
  • ⇢ ⇢ Repeat every word
  • +
  • ⇢ ⇢ Don't use the same finger for two consecutive keystrokes
  • +
  • ⇢ ⇢ Warm-up
  • +
  • Travel keyboard
  • +
  • Upcoming custom Kinesis build
  • +
  • Conclusion
  • +
    +

    Introduction


    +
    +After work one day, I noticed some discomfort in my right wrist. Upon research, it appeared to be a mild case of Repetitive Strain Injury (RSI). Initially, I thought that this would go away after a while, but after a week it became even worse. This led me to consider potential causes such as poor posture or keyboard use habits. As an enthusiast of keyboards, I experimented with ergonomic concave ortholinear split keyboards. Wait, what?...

    -However, I am putting this on hold as I have opted for an OpenWRT-based solution, which was much quicker to set up and runs well enough.
    +
      +
    • Concave: Some fingers are longer than others. A concave keyboard makes it so that the keycaps meant to be pressed by the longer fingers are further down (e.g., left middle finger for e on a Qwerty layout), and keycaps meant to be pressed by shorter fingers are further up (e.g., right pinky finger for the letter p).
    • +
    • Ortholinear: The keys are arranged in a straight vertical line, unlike most conventional keyboards. The conventional keyboards still resemble the old typewriters, where the placement of the keys was optimized so that the typewriter would not jam. There is no such requirement anymore.
    • +
    • Split: The keyboard is split into two halves (left and right), allowing one to place either hand where it is most ergonomic.
    • +

    +After discovering ThePrimagen (I found him long ago, but I never bothered buying the same keyboard he is on) on YouTube and reading/watching a couple of reviews, I thought that as a computer professional, the equipment could be expensive anyway (laptop, adjustable desk, comfortable chair), so why not invest a bit more into the keyboard? I purchased myself the Kinesis Advantage360 Professional keyboard.

    -https://OpenWRT.org/
    +

    Kinesis review



    -

    Pi-Hole server


    +For an in-depth review, have a look at this great article:

    -Install Pi-hole on one of my Pis or run it in a container on Freekat. For now, I am putting this on hold as the primary use for this would be ad-blocking, and I am avoiding surfing ad-heavy sites anyway. So there's no significant use for me personally at the moment.
    +Review of the Kinesis Advantage360 Professional keyboard

    -https://pi-hole.net/
    +

    Top build quality



    -

    Infodash


    +Overall, the keyboard feels excellent quality and robust. It has got some weight to it. Because of that, it is not ideally suited for travel, though. But I have a different keyboard to solve this (see later in this post). Overall, I love how it is built and how it feels.

    -The idea was to implement my smart info screen using purely open-source software. It would display information such as the health status of my personal infrastructure, my current work tracker balance (I track how much I work to prevent overworking), and my sports balance (I track my workouts to stay within my quotas for general health). The information would be displayed on a small screen in my home office, on my Pine watch, or remotely from any terminal window.
    +Kinesis Adv.360 Pro at home

    -I don't have this, and I haven't missed having it, so I guess it would have been nice to have it but not provide any value other than the "fun of tinkering."
    +

    Bluetooth connectivity



    -

    Reading station


    +Despite encountering concerns about Bluetooth connectivity issues with the Kinesis keyboard during my research, I purchased one anyway as I intended to use it only via USB. However, I discovered that the firmware updates available afterwards had addressed these reported Bluetooth issues, and as a result, I did not experience any difficulties with the Bluetooth functionality. This positive outcome allowed me to enjoy using the keyboard also wirelessly.

    -I wanted to create the most comfortable setup possible for reading digital notes, articles, and books. This would include a comfy armchair, a silent barebone PC or Raspberry Pi computer running either Linux or *BSD, and an e-Ink display mounted on a flexible arm/stand. There would also be a small table for my paper journal for occasional note-taking. There are a bunch of open-source software available for PDF and ePub reading. It would have been neat, but I am currently using the most straightforward solution: a Kobo Elipsa 2E, which I can use on my sofa.
    +

    Gateron Brown key switches



    -

    Retro station


    +Many voices on the internet seem to dislike the Gateron Brown switches, the only official choice for non-clicky tactile switches in the Kinesis, so I was also a bit concerned. I almost went with Cherry MX Browns for my Kinesis (a custom build from a 3rd party provider that is partnershipping with Kinesis). Still, I decided on Gateron Browns to try different switches than the Cherry MX Browns I already have on my ZSA Moonlander keyboard (another ortho-linear split keyboard, but without a concave keycap layout).

    -I had an idea to build a computer infused with retro elements. It wouldn't use actual retro hardware but would look and feel like a retro machine. I would call this machine HAL or Retron.
    +At first, I was disappointed by the Gaterons, as they initially felt a bit meshy compared to the Cherries. Still, over the weeks I grew to prefer them because of their smoothness. Over time, the tactile bumps also became more noticeable (as my perception of them improved). Because of their less pronounced tactile feedback, the Gaterons are less tiring for long typing sessions and better suited for a relaxed typing experience.

    -I would use an old ThinkPad laptop placed on a horizontal stand, running NetBSD, and attaching a keyboard from ModelFkeyboards. I use WindowMaker as a window manager and run terminal applications through Retro Term. For the monitor, I would use an older (black) EIZO model with large bezels.
    +So, the Cherry MX feel sharper but are more tiring in the long run, and the Gaterons are easier to write on and the tactile Feedback is slightly less pronounced.

    -https://www.NetBSD.org
    -https://www.modelfkeyboards.com
    -https://github.com/Swordfish90/cool-retro-term)
    +

    Keycaps



    -The computer would occasionally be used to surf the Gemini space, take notes, blog, or do light coding. However, I have abandoned the project for now because there isn't enough space in my apartment, as my daughter will have a room for herself.
    +If you ever purchase a Kinesis keyboard, go with the PCB keycaps. They upgrade the typing experience a lot. The only thing you will lose is that the backlighting won't shine through them. But that is a reasonable tradeoff. When do I need backlighting? I am supposed to look at the screen and not the keyboard while typing.

    -

    Sound server


    +I went with the blank keycaps, by the way.

    -My idea involved using a barebone mini PC running FreeBSD with the Navidrome sound server software. I could remotely connect to it from my phone, workstation/laptop to listen to my music collection. The storage would be based on ZFS with at least two drives for redundancy. The app would run in a Linux Docker container under FreeBSD via Bhyve.
    +Kinesis Adv.360 Pro at home

    -https://github.com/navidrome/navidrome
    -https://wiki.freebsd.org/bhyve
    +

    Keymap editor



    -

    Project Freekat


    +There is no official keymap editor. You have to edit a configuration file manually, build the firmware from scratch, and upload the firmware with the new keymap to both keyboard halves. The Professional version of his keyboard, by the way, runs on the ZMK open-source firmware.

    -My idea involved purchasing the Meerkat mini PC from System76 and installing FreeBSD. Like the sound-server idea (see previous idea), it would run Linux Docker through Bhyve. I would self-host a bunch of applications on it:
    +Many users find the need for an easy-to-use keymap editor an issue. But this is the Pro model. You can also go with the non-Pro, which runs on non-open-source firmware and has no Bluetooth (it must be operated entirely on USB).

    -
      -
    • Wallabag
    • -
    • Ankidroid
    • -
    • Miniflux & Postgres
    • -
    • Audiobookshelf
    • -
    • ...
    • -

    -All of this would be within my LAN, but the services would also be accessible from the internet through either Wireguard or SSH reverse tunnels to one of my OpenBSD VMs, for example:
    +There is a 3rd party solution which is supposed to configure the keymap for the Professional model as bliss, but I have never used it. As a part-time programmer and full-time Site Reliability Engineer, I am okay configuring the keymap in my text editor and building it in a local docker container. This is one of the standard ways of doing it here. You could also use a GitHub pipeline for the firmware build, but I prefer building it locally on my machine. This all seems natural to me, but this may be an issue for "the average Joe" user.

    -
      -
    • wallabag.awesome.buetow.org
    • -
    • ankidroid.awesome.buetow.org
    • -
    • miniflux.awesome.buetow.org
    • -
    • audiobookshelf.awesome.buetow.org
    • -
    • ...
    • -

    -I am abandoning this project for now, as I am currently hosting my apps on AWS ECS Fargate under *.cool.buetow.org, which is "good enough" for the time being and also offers the benefit of learning to use AWS and Terraform, knowledge that can be applied at work.
    +

    First steps



    -My personal AWS setup
    +I didn't measure the usual words per minute (wpm) on my previous keyboard, the ZSA Moonlander, but I guess that it was around 40-50wpm. Once the Kinesis arrived, I started practising. The experience was quite different due to the concave keycaps, so I barely managed 10wpm on the first day.

    -

    Programming projects I don't have time for


    +I quickly noticed that I could not continue using the freestyle 6-finger typing system I was used to on my Moonlander or any previous keyboards I worked with. I learned ten-finger touch typing from scratch to be more efficient with the Kinesis keyboard. The keyboard forces you to embrace touch typing.

    -

    CLI-HIVE


    +Sometimes, there were brain farts, and I couldn't type at all. The trick was not to freak out about it, but to move on. If your average goes down a bit for a day, it doesn't matter; the long-term trend over several days and weeks matters, not the one-off wpm high score.

    -This was a pet project idea that my brother and I had. The concept was to collect all shell history of all servers at work in a central place, apply ML/AI, and return suggestions for commands to type or allow a fuzzy search on all the commands in the history. The recommendations for the commands on a server could be context-based (e.g., past occurrences on the same server type).
    +Although my wrist pain seemed to go away aftre the first week of using the Kinesis, my fingers became tired of adjusting to the new way of typing. My hands were stiff, as if I had been training for the Olympics. Only after three weeks did I start to feel comfortable with it. If it weren't for the comments I read online, I would have sent it back after week 2.

    -You could decide whether to share your command history with others so they would receive better suggestions depending on which server they are on, or you could keep all the history private and secure. The plan was to add hooks into zsh and bash shells so that all commands typed would be pushed to the central location for data mining.
    +I also had a problem with the left pinky finger, where I could not comfortably reach the p key. This involved moving the whole hand. An easy fix was to swap p with ; on the keyboard layout.

    -

    Enhanced KISS home photo albums


    +

    Considering alternate layouts



    -I don't use third-party cloud providers such as Google Photos to store/archive my photos. Instead, they are all on a ZFS volume on my home NAS, with regular offsite backups taken. Thus, my project would involve implementing the features I miss most or finding a solution simple enough to host on my LAN:
    +As I was going to learn 10-finger touch typing from scratch, I also played with the thought of switching from the Qwerty to the Dvorak or Colemak keymap, but after reading some comments on the internet, I decided against it:

      -
    • A feature I miss presents me with a random day from the past and some photos from that day. This project would randomly select a day and generate a photo album for me to view and reminisce about memories.
    • -
    • Another feature I miss is the ability to automatically deduplicate all the photos, as I am sure there are tons of duplicates on my NAS.
    • -
    • Auto-enhancing the photos (perhaps using ImageMagick?)
    • -
    • I already have a simple photoalbum.sh script that generates an album based on an input directory. However, it would be great also to have a timeline feature to enable browsing through different dates.
    • +
    • These layouts (Dvorak and Colemak) will minimize the finger travel for the most commonly used English words, but they necessarily don't give you a better wpm score.
    • +
    • One comment on Redit also mentioned that getting stiffer fingers with these layouts is more likely than with Qwerty, as in Qwerty, he had to stretch out his fingers more often, which helps here.
    • +
    • There are also many applications and websites with keyboard shortcuts and are Qwerty-optimized.
    • +
    • You won't be able to use someone else's computer as there will be likely Qwerty. Some report that after using an alternative layout for a while, they forget how to use Qwerty.

    -KISS static web photo albums with photoalbum.sh
    +

    Training how to type



    -

    KISS file sync server with end-to-end encryption


    +

    Tools



    -I aimed to have a simple server to which I could sync notes and other documents, ensuring that the data is fully end-to-end encrypted. This way, only the clients could decrypt the data, while an encrypted copy of all the data would be stored on the server side. There are a few solutions (e.g., NextCloud), but they are bloated or complex to set up.
    +One of the most influential tools in my touch typing journey has been keybr.com. This site/app helped me learn 10-finger touch typing, and I practice daily for 30 minutes (in the first two weeks, up to an hour every day). The key is persistence and focus on technique rather than speed; the latter naturally improves with regular practice. Precision matters, too, so I always correct my errors using the backspace key.

    -I currently use Syncthing for encrypted file sync across all my devices; however, the data is not end-to-end encrypted. It's a good-enough setup, though, as my Syncthing server is in my home LAN on an encrypted file system.
    +https://keybr.com

    -https://syncthing.net
    +I also used a command-line tool called tt, which is written in Go. It has a feature that I found very helpful: the ability to practice typing by piping custom text into it. Additionally, I appreciated its customization options, such as choosing a colour theme and specifying how statistics are displayed.
    +
    +https://github.com/lemnos/tt
    +
    +I wrote myself a small Ruby script that would randomly select a paragraph from one of my eBooks or book notes and pipe it to tt. This helped me remember some of the books I read and also practice touch typing.

    -I also had the idea of using this as a pet project for work and naming it Cryptolake, utilizing post-quantum-safe encryption algorithms and a distributed data store.
    +

    My keybr.com statistics



    -

    A language that compiles to bash


    +Overall, I trained for around 4 months in more than 5,000 sessions. My top speed in a session was 127.1wpm (up from barely 10wpm at the beginning).

    -I had an idea to implement a higher-level language with strong typing that could be compiled into native Bash code. This would make all resulting Bash scripts more robust and secure by default. The project would involve developing a parser, lexer, and a Bash code generator. I planned to implement this in Go.
    +All time stats

    -I had previously implemented a tiny scripting language called Fype (For Your Program Execution), which could have served as inspiration.
    +My overall average speed over those 5,000 sessions was 80wpm. The average speed over the last week was over 100wpm. The green line represents the wpm average (increasing trend), the purple line represents the number of keys in the practices (not much movement there, as all keys are unlocked), and the red line represents the average typing accuracy.

    -The Fype Programming Language
    +Typing speed over leson

    -

    A language that compiles to sed


    +Around the middle, you see a break-in of the wpm average value. This was where I swapped the p and ; keys, but after some retraining, I came back to the previous level and beyond.

    -This is similar to the previous idea, but the difference is that the language would compile into a sed script. Sed has many features, but the brief syntax makes scripts challenging to read. The higher-level language would mimic sed but in a form that is easier for humans to read.
    +

    Tips and tricks



    -

    Renovate VS-Sim


    +These are some tips and tricks I learned along the way to improve my typing speed:

    -VS-Sim is an open-source simulator programmed in Java for distributed systems. VS-Sim stands for "Verteilte Systeme Simulator," the German translation for "Distributed Systems Simulator." The VS-Sim project was my diploma thesis at Aachen University of Applied Sciences.
    +

    Relax



    -https://codeberg.org/snonux/vs-sim
    +It's easy to get cramped when trying to hit this new wpm mark, but this is just holding you back. Relax and type at a natural pace. Now I also understand why my Katate Sensei back in London kept screaming "RELAAAX" at me during practice.... It didn't help much back then, though, as it is difficult to relax while someone screams at you!

    -The ideas I had was:
    +

    Focus on accuracy first



    -
      -
    • Translate the project into English.
    • -
    • Modernise the Java codebase to be compatible with the latest JDK.
    • -
    • Make it compile to native binaries using GraalVM.
    • -
    • Distribute the project using AppImages.
    • -

    -I have put this project on hold for now, as I want to do more things in Go and fewer in Java in my personal time.
    +This goes with the previous point. Instead of trying to speed through sessions as quickly as possible, slow down and try to type the words correctly—so don't rush it. If you aren't fast yet, the reason is that your brain hasn't trained enough. It will come over time, and you will be faster.

    -

    KISS ticketing system


    +

    Chording



    -My idea was to program a KISS (Keep It Simple, Stupid) ticketing system for my personal use. However, I am abandoning this project because I now use the excellent Taskwarrior software. You can learn more about it at:
    +A trick to getting faster is to type by word and pause between each word so you learn the words by chords. From 80wpm and beyond, this makes a real difference.

    -https://taskwarrior.org/
    +

    Punctuation and Capitalization



    -

    A domain-specific language (DSL) for work


    +I included 10% punctuation and 20% capital letters in my keybr.com practice sessions to simulate real typing conditions, which improved my overall working efficiency. I guess I would have gone to 120wpm in average if I didn't include this options...

    -At work, an internal service allocates storage space for our customers on our storage clusters. It automates many tasks, but many tweaks are accessible through APIs. I had the idea to implement a Ruby-based DSL that would make using all those APIs for ad-hoc changes effortless, e.g.:
    +

    Reverse shifting



    - -
    Cluster :UK, :uk01 do
    -  Customer.C1A1.segments.volumes.each do |volume|
    -    puts volume.usage_stats
    -    volume.move_off! if volume.over_subscribed?
    -  end
    -end
    -
    +Reverse shifting aka left-right shifting is to...

    -I am abandoning this project because my workplace has stopped the annual pet project competition, and I have other more important projects to work on at the moment.
    +
      +
    • ...use the left shift key for letters on the right keyboard side.
    • +
    • ...use the right shift key for letters on the left keyboard side.
    • +

    +This makes using the shift key a blaze.

    -Creative universe (Work pet project contests)
    +

    Enter the flow state



    -

    Self-hosting projects I don't have time for


    +Listening to music helps me enter a flow state during practice sessions, which makes typing training a bit addictive (which is good, or isn't it?).

    -

    My own Matrix server


    +

    Repeat every word



    -I value privacy. It would be great to run my own Matrix server for communication within my family. I have yet to have time to look into this more closely.
    +There's a setting on keybr.com that makes it so that every word is always repeated, having you type every word twice in a row. I liked this feature very much, and I think it also helped to improve my practice.

    -https://matrix.org
    +

    Don't use the same finger for two consecutive keystrokes



    -

    Ampache music server


    +Apparently, if you want to type fast, avoid using the same finger for two consecutive keystrokes. This means you don't always need to use the same finger for the same keys.
    +However, there are no hard and fast rules. Thus, everyone develops their system for typing word combinations. An exception would be if you are typing the very same letter in a row (e.g., t in letter)—here, you are using the same finger for both ts.

    -Ampache is an open-source music streaming server that allows you to host and manage your music collection online, accessible via a web interface. Setting it up involves configuring a web server, installing Ampache, and organising your music files, which can be time-consuming.
    +

    Warm-up



    -

    Librum eBook reader


    +You can't reach your average typing speed first ting the morning. It would help if you warmed up before the exercise or practice later during the day. Also, some days are good, others not so, e.g., after a bad night's sleep. What matters is the mid- and long-term trend, not the fluctuations here, though.

    -Librum is a self-hostable e-book reader that allows users to manage and read their e-book collection from a web interface. Designed to be a self-contained platform where users can upload, organise, and access their e-books, Librum emphasises privacy and control over one's digital library.
    +

    Travel keyboard



    -https://github.com/Librum-Reader/Librum
    +As mentioned, the Kinesis is a great keyboard, but it is not meant for travel.

    -I am using my Kobo devices or my laptop to read these kinds of things for now.
    +I guess keyboards will always be my expensive hobby, so I also purchased another ergonomic, ortho-linear, concave split keyboard, the Glove80 (with the Red Pro low-profile switches). This keyboard is much lighter and, in my opinion, much better suited for travel than the Kinesis. It also comes with a great travel case.

    -

    Memos - Note-taking service


    +Here is a photo of me using it with my Surface Go 2 (it runs Linux, by the way) while waiting for the baggage drop at the airport:

    -Memos is a note-taking service that simplifies and streamlines information capture and organisation. It focuses on providing users with a minimalistic and intuitive interface, aiming to enhance productivity without the clutter commonly associated with more complex note-taking apps.
    +Traveling with the Glove80 using my Surface Go 2

    -https://www.usememos.com
    +For everyday work, I prefer the tactile Browns on the Kinesis over the Red Pro I have on the Glove80 (normal profile vs. low profile). The Kinesis feels much more premium, whereas the Glove80 is much lighter and easier to store away in a rucksack (the official travel case is a bit bulky, so I wrapped it simply in bubble plastic).

    -I am abandoning this idea for now, as I am currently using plain Markdown files for notes and syncing them with Syncthing across my devices.
    +The F-key row is odd at the Glove80. I would have preferred more keys on the sides like the Kinesis, and I use them for [] {} (), which is pretty handy there. However, I like the thumb cluster of the Glove80 more than the one on the Kinesis.

    -

    Bepasty server


    +The good thing is that I can switch between both keyboards instantly without retraining my typing memories. I've configured (as much as possible) the same keymaps on both my Kinesis and Glove80, making it easy to switch between them at any occasion.

    -Bepasty is like a Pastebin for all kinds of files (text, image, audio, video, documents, binary, etc.). It seems very neat, but I only share a little nowadays. When I do, I upload files via SCP to one of my OpenBSD VMs and serve them via vanilla httpd there, keeping it KISS.
    +Interested in the Glove80? I suggest also reading this review:

    -https://github.com/bepasty/bepasty-server
    +Review of the Glove80 keyboard

    -

    Books I don't have time to read


    +

    Upcoming custom Kinesis build



    -

    Fluent Python


    +As I mentioned, keyboards will remain an expensive hobby of mine. I don't regret anything here, though. After all, I use keyboards at my day job. I've ordered a Kinesis custom build with the Gateron Kangaroo switches, and I'm excited to see how that compares to my current setup. I'm still deciding whether to keep my Gateron Brown-equipped Kinesis as a secondary keyboard or possibly leave it at my in-laws for use when visiting or to sell it.

    -I consider myself an advanced programmer in Ruby, Bash, and Perl. However, Python seems to be ubiquitous nowadays, and most of my colleagues prefer Python over any other languages. Thus, it makes sense for me to also learn and use Python. After conducting some research, "Fluent Python" appears to be the best book for this purpose.
    +Update 2025-02-22: I've received my custom Kinesis Adv. 360 build with the Gateron Baby Kangaroo key switches. I am absolutely in love! I will keep my Gateron Brown version around, though.

    -I don't have time to read this book at the moment, as I am focusing more on Go (Golang) and I know just enough Python to get by (e.g., for code reviews). Additionally, there are still enough colleagues around who can review my Ruby or Bash code.
    +

    Conclusion



    -

    Programming Ruby


    +When I traveled with the Glove80 for work to the London office, a colleague stared at my keyboard and made jokes that it might be broken (split into two halves). But other than that...

    -I've read a couple of Ruby books already, but "Programming Ruby," which covers up to Ruby 3.2, was just recently released. I would like to read this to deepen my Ruby knowledge further and to revisit some concepts that I may have forgotten.
    +Ten-finger touch typing has improved my efficiency and has become a rewarding discipline. Whether it's the keyboards I use, the tools I practice with, or the techniques I've adopted, each step has been a learning experience. I hope sharing my journey provides valuable insights and inspiration for anyone looking to improve their touch typing skills.

    -As stated in this blog post, I am currently more eager to focus on Go, so I've put the Ruby book on hold. Additionally, there wouldn't be enough colleagues who could "understand" my advanced Ruby skills anyway, as most of them are either Java developers or SREs who don't code a lot.
    +I also accidentally started using a 10-finger-like system (maybe still 6 fingers, but better than before) on my regular laptop keyboard. I could be more efficient on the laptop keyboard. The form is different there (not ortholinear, not concave keycaps, etc.), but my typing has improved there too (even if it is only by a little bit).

    -

    Peter F. Hamilton science fiction books


    +I don't want to return to a non-concave keyboard as my default. I will use other keyboards still once in a while but only for short periods or when I have to (e.g. travelling with my Laptop and when there is no space to put an external keyboard)

    -I am a big fan of science fiction, but my reading list is currently too long anyway. So, I've put the Hamilton books on the back burner for now. You can see all the novels I've read here:
    +Learning to touch type has been an eye-opening experience for me, not just for work but also for personal projects. Now, writing documentation is so much fun; who could believe that? Furthermore, working with Slack (communicating with colleagues) is more fun now as well.

    -https://paul.buetow.org/novels.html
    -gemini://paul.buetow.org/novels.gmi
    +E-Mail your comments to paul@nospam.buetow.org :-)

    +Back to the main site
    +
    +
    +
    + + 'The Stoic Challenge' book notes + + gemini://foo.zone/gemfeed/2024-07-07-the-stoic-challenge-book-notes.gmi + 2024-07-07T12:46:55+03:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + These are my personal takeaways after reading 'The Stoic Challenge: A Philosopher's Guide to Becoming Tougher, Calmer, and More Resilient' by William B. Irvine. + +
    +

    "The Stoic Challenge" book notes



    -

    New websites I don't have time for


    +Published at 2024-07-07T12:46:55+03:00

    -

    Create a "Why Raku Rox" site


    +These are my personal takeaways after reading "The Stoic Challenge: A Philosopher's Guide to Becoming Tougher, Calmer, and More Resilient" by William B. Irvine.

    -The website "Why Raku Rox" would showcase the unique features and benefits of the Raku programming language and highlight why it is an exceptional choice for developers. Raku, originally known as Perl 6, is a dynamic, expressive language designed for flexible and powerful software development.
    +
    +         ,..........   ..........,
    +     ,..,'          '.'          ',..,
    +    ,' ,'            :            ', ',
    +   ,' ,'             :             ', ',
    +  ,' ,'              :              ', ',
    + ,' ,'............., : ,.............', ',
    +,'  '............   '.'   ............'  ',
    + '''''''''''''''''';''';''''''''''''''''''
    +                    '''
    +

    -This would be similar to the "Why OpenBSD rocks" site:
    +

    Table of Contents



    -https://why-openbsd.rocks
    -https://raku.org
    +
    +

    God sets you up for a challenge



    -I am not working on this for now, as I currently don’t even have time to program in Raku.
    +Gods set you up for a challenge to see how resilient you are. Is getting angry worth the price? If you stay calm then you can find the optimal workaround for the obstacle. Stay calm even with big setbacks. Practice minimalism of negative emotions.

    -

    Research projects I don't have time for


    +Put a positive spin on everything. What should you do if someone wrong you? Don't get angry, there is no point in that, it just makes you suffer. Do the best what you got now and keep calm and carry on. A resilient person will refuse to play the role of a victim. You can develop the setback response skills. Turn a setback. e.g. a handycap, into a personal triumph.

    -

    Project secure


    +It is not the things done to you or happen to you what matters but how you take the things and react to these things.

    -For work: Implement a PoC that dumps Java heaps to extract secrets from memory. Based on the findings, write a Java program that encrypts secrets in the kernel using the memfd_secret() syscall to make it even more secure.
    +Don't row against the other boats but against your own lazy bill. It doesn't matter if you are first or last, as long as you defeat your lazy self.

    -https://lwn.net/Articles/865256/
    +Stoics are thankful that they are mortal. As then you can get reminded of how great it is to be alive at all. In dying we are more alive we have ever been as every thing you do could be the last time you do it. Rather than fighting your death you should embrace it if there are no workarounds. Embrace a good death.

    -Due to other priorities, I am putting this on hold for now. The software we have built is pretty damn secure already!
    +

    Negative visualization



    -

    CPU utilisation is all wrong


    +It is easy what we have to take for granted.

    -This research project, based on Brendan Gregg's blog post, could potentially significantly impact my work.
    +
      +
    • Imagine the negative and then think that things are actually much better than they seem to be.
    • +
    • Close your eyes and imagine you are color blind for a minute, then open the eyes again and see all the colours. You will be grateful for being able to see the colours.
    • +
    • Now close your eyes for a minute and imagine you would be blind, so that you will never be able to experience the world again and let it sink in. When you open your eyes again you will feel a lot of gratefulness.
    • +
    • Last time meditation. Lets you appreciate the life as it is now. Life gets vitalised again.
    • +

    +

    Oh, nice trick, you stoic "god"! ;-)



    -https://brendangregg.com/blog/2017-05-09/cpu-utilization-is-wrong.html
    +Take setbacks as a challenge. Also take it with some humor.

    -The research project would involve setting up dashboards that display actual CPU usage and the cycles versus waiting time for memory access.
    +
      +
    • A setback in a setback, how Genius :-)
    • +
    • A setback in a setback in a setback: the stoic god's work overtime, eh? :-)
    • +

    +What would the stoic god's do next? This is just a test strategy by them. Don't be frustrated at all but be astonished of what comes next. Thank the stoic gods of testing you. This is comfort zone extension of the stoics aka toughness Training.

    E-Mail your comments to paul@nospam.buetow.org :-)

    -Related and maybe interesting:
    +Other book notes of mine are:

    -Sweating the small stuff - Tiny projects of mine
    +2025-11-02 'The Courage To Be Disliked' book notes
    +2025-06-07 'A Monk's Guide to Happiness' book notes
    +2025-04-19 'When: The Scientific Secrets of Perfect Timing' book notes
    +2024-10-24 'Staff Engineer' book notes
    +2024-07-07 'The Stoic Challenge' book notes (You are currently reading this)
    +2024-05-01 'Slow Productivity' book notes
    +2023-11-11 'Mind Management' book notes
    +2023-07-17 'Software Developers Career Guide and Soft Skills' book notes
    +2023-05-06 'The Obstacle is the Way' book notes
    +2023-04-01 'Never split the difference' book notes
    +2023-03-16 'The Pragmatic Programmer' book notes

    Back to the main site
    diff --git a/gemfeed/index.gmi b/gemfeed/index.gmi index c1a8c5ac..09aea74c 100644 --- a/gemfeed/index.gmi +++ b/gemfeed/index.gmi @@ -2,6 +2,9 @@ ## To be in the .zone! +=> ./2026-04-02-distributed-systems-simulator-part-3.gmi 2026-04-02 - Distributed Systems Simulator - Part 3: Advanced Examples and Protocol API +=> ./2026-04-01-distributed-systems-simulator-part-2.gmi 2026-04-01 - Distributed Systems Simulator - Part 2: Built-in Protocols +=> ./2026-03-31-distributed-systems-simulator-part-1.gmi 2026-03-31 - Distributed Systems Simulator - Part 1: Introduction and GUI => ./2026-03-02-rcm-ruby-configuration-management-dsl.gmi 2026-03-02 - RCM: The Ruby Configuration Management DSL => ./2026-03-01-site-reliability-engineering-part-5.gmi 2026-03-01 - Site Reliability Engineering - Part 5: System Design, Incidents, and Learning => ./2026-03-01-loadbars-0.13.0-released.gmi 2026-03-01 - Loadbars 0.13.0 released diff --git a/index.gmi b/index.gmi index f5a3967a..b1e0d52f 100644 --- a/index.gmi +++ b/index.gmi @@ -30,6 +30,9 @@ Everything you read on this site is my personal opinion and experience. You can ### Posts +=> ./gemfeed/2026-04-02-distributed-systems-simulator-part-3.gmi 2026-04-02 - Distributed Systems Simulator - Part 3: Advanced Examples and Protocol API +=> ./gemfeed/2026-04-01-distributed-systems-simulator-part-2.gmi 2026-04-01 - Distributed Systems Simulator - Part 2: Built-in Protocols +=> ./gemfeed/2026-03-31-distributed-systems-simulator-part-1.gmi 2026-03-31 - Distributed Systems Simulator - Part 1: Introduction and GUI => ./gemfeed/2026-03-02-rcm-ruby-configuration-management-dsl.gmi 2026-03-02 - RCM: The Ruby Configuration Management DSL => ./gemfeed/2026-03-01-site-reliability-engineering-part-5.gmi 2026-03-01 - Site Reliability Engineering - Part 5: System Design, Incidents, and Learning => ./gemfeed/2026-03-01-loadbars-0.13.0-released.gmi 2026-03-01 - Loadbars 0.13.0 released -- cgit v1.2.3