From 03a2cf8147eb2b06404be42314a3134bf835bad9 Mon Sep 17 00:00:00 2001 From: Paul Buetow Date: Sat, 13 Jan 2024 23:08:14 +0200 Subject: Update content for gemtext --- gemfeed/atom.xml.tmp | 655 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 655 insertions(+) create mode 100644 gemfeed/atom.xml.tmp (limited to 'gemfeed/atom.xml.tmp') diff --git a/gemfeed/atom.xml.tmp b/gemfeed/atom.xml.tmp new file mode 100644 index 00000000..f603be57 --- /dev/null +++ b/gemfeed/atom.xml.tmp @@ -0,0 +1,655 @@ + + + 2024-01-13T23:08:07+02:00 + foo.zone feed + To be in the .zone! + + + gemini://foo.zone/ + + One reason why I love OpenBSD + + gemini://foo.zone/gemfeed/2024-01-13-one-reason-why-i-love-openbsd.gmi + 2024-01-13T22:55:33+02:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + I just upgraded my OpenBSD's from `7.3` to `7.4` by following the unattended upgrade guide: + +
+

One reason why I love OpenBSD


+
+Published at 2024-01-13T22:55:33+02:00
+
+
+                         .
+                          A       ;
+                |   ,--,-/ \---,-/|  ,
+               _|\,'. /|      /|   `/|-.
+           \`.'    /|      ,            `;.
+          ,'\   A     A         A   A _ /| `.;
+        ,/  _              A       _  / _   /|  ;
+       /\  / \   ,  ,           A  /    /     `/|
+      /_| | _ \         ,     ,             ,/  \
+     // | |/ `.\  ,-      ,       ,   ,/ ,/      \/
+     / @| |@  / /'   \  \      ,              >  /|    ,--.
+    |\_/   \_/ /      |  |           ,  ,/        \  ./' __:..
+    |  __ __  |       |  | .--.  ,         >  >   |-'   /     `
+  ,/| /  '  \ |       |  |     \      ,           |    /
+ /  |<--.__,->|       |  | .    `.        >  >    /   (
+/_,' \\  ^  /  \     /  /   `.    >--            /^\   |
+      \\___/    \   /  /      \__'     \   \   \/   \  |
+       `.   |/          ,  ,                  /`\    \  )
+         \  '  |/    ,       V    \          /        `-\
+          `|/  '  V      V           \    \.'            \_
+           '`-.       V       V        \./'\
+               `|/-.      \ /   \ /,---`\         kat
+                n
+                /   `._____V_____V'
+                           '     '
+
+
+I just upgraded my OpenBSD machines from 7.3 to 7.4 by following the unattended upgrade guide:
+
+https://www.openbsd.org/faq/upgrade74.html
+
+ +
doas installboot sd0 # Update the bootloader (not required for every upgrade)
+doas sysupgrade # Update all binaries (including the kernel)
+
+
+sysupgrade downloaded the next release, upgraded the system to it, and rebooted. After the reboot, I ran:
+
+ +
doas sysmerge # Update system configuration files
+doas pkg_add -u # Update all packages
+doas reboot # Just in case, reboot one more time
+
+
+That's it! It took me around 5 minutes in total, with only these few commands. It just works! No problems, no conflicts, and no tons of config file merge conflicts (actually, none at all).
+
+I have followed the same procedure for previous upgrades and never encountered any difficulties with any OpenBSD upgrade.
+
+I have seen upgrades of other operating systems either take a long time or break the system (requiring manual steps to repair). That's just one of many reasons why I love OpenBSD! There never seem to be any problems. It just gets the job done!
+
+The OpenBSD Project
+
+BTW: Are you looking for an opinionated OpenBSD VM host? OpenBSD Amsterdam may be for you. They rock (I have a VM there, too)!
+
+https://openbsd.amsterdam
+
+Other *BSD related posts are:
+
+2016-04-09 Jails and ZFS with Puppet on FreeBSD
+2022-07-30 Let's Encrypt with OpenBSD and Rex
+2022-10-30 Installing DTail on OpenBSD
+2024-01-13 One reason why I love OpenBSD (You are currently reading this)
+
+E-Mail your comments to paul@nospam.buetow.org :-)
+
+Back to the main site
+
+
+
+ + Site Reliability Engineering - Part 3: On-Call Culture and the Human Aspect + + gemini://foo.zone/gemfeed/2024-01-09-site-reliability-engineering-part-3.gmi + 2024-01-09T18:35:48+02:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + This is the third part of my Site Reliability Engineering (SRE) series. I am currently employed as a Site Reliability Engineer and will try to share what SRE is about in this blog series. + +
+

Site Reliability Engineering - Part 3: On-Call Culture and the Human Aspect


+
+Published at 2024-01-09T18:35:48+02:00
+
+This is the third part of my Site Reliability Engineering (SRE) series. I am currently employed as a Site Reliability Engineer and will try to share what SRE is about in this blog series.
+
+2023-08-18 Site Reliability Engineering - Part 1: SRE and Organizational Culture
+2023-11-19 Site Reliability Engineering - Part 2: Operational Balance in SRE
+2024-01-09 Site Reliability Engineering - Part 3: On-Call Culture and the Human Aspect (You are currently reading this)
+
+
+                    ..--""""----..                 
+                 .-"   ..--""""--.j-.              
+              .-"   .-"        .--.""--..          
+           .-"   .-"       ..--"-. \/    ;         
+        .-"   .-"_.--..--""  ..--'  "-.  :         
+      .'    .'  /  `. \..--"" __ _     \ ;         
+     :.__.-"    \  /        .' ( )"-.   Y          
+     ;           ;:        ( )     ( ).  \         
+   .':          /::       :            \  \        
+ .'.-"\._   _.-" ; ;      ( )    .-.  ( )  \       
+  "    `."""  .j"  :      :      \  ;    ;  \      
+    bug /"""""/     ;      ( )    "" :.( )   \     
+       /\    /      :       \         \`.:  _ \    
+      :  `. /        ;       `( )     (\/ :" \ \   
+       \   `.        :         "-.(_)_.'   t-'  ;  
+        \    `.       ;                    ..--":  
+         `.    `.     :              ..--""     :  
+           `.    "-.   ;       ..--""           ;  
+             `.     "-.:_..--""            ..--"   
+               `.      :             ..--""        
+                 "-.   :       ..--""              
+                    "-.;_..--""                    
+
+
+
+

On-Call Culture and the Human Aspect: Prioritising Well-being in the Realm of Reliability


+
+Site Reliability Engineering is synonymous with ensuring system reliability, but the human factor is an often-underestimated part of this discipline. Ensuring a healthy on-call culture is as critical as any technical solution, and the well-being of the engineers is an important part of it.
+
+Firstly, a healthy on-call rotation is about more than just managing and responding to incidents. It's about the entire ecosystem that supports this practice. This involves reducing pain points, offering mentorship, iterating rapidly, and ensuring that engineers have the right tools and processes. One caveat is that engineers must be willing to learn. Especially in on-call rotations that embed SREs with other engineers (for example, Software Engineers or QA Engineers), it's difficult to motivate everyone to engage. QA Engineers want to test the software, and Software Engineers want to implement new features; they don't want to troubleshoot and debug production incidents. It can be depressing for the mentoring SRE.
+
+Furthermore, the metrics that measure the success of an on-call experience are not always straightforward. While one might assume that fewer pages translate to a better on-call experience (which is true to a degree; who wants to receive a page outside office hours?), it's not always the volume of pages that matters most. Trust, ownership, accountability, and effective communication play important roles.
+
+An important part is giving feedback about the on-call experience to ensure continuous learning. If alerts are mostly noise, they should be tuned or even eliminated. If alerts are actionable, can recurring tasks be automated? If there are knowledge gaps, can the documentation be improved? Continuous retrospection ensures not only that the systems evolve but also that the on-call experience becomes progressively better.
+
+Onboarding for on-call duties is a crucial aspect of ensuring the reliability and efficiency of systems. This process involves equipping new team members with the knowledge, tools, and support to handle incidents confidently. It begins with an overview of the system architecture and common challenges, followed by training on monitoring tools, alerting mechanisms, and incident response protocols. Shadowing experienced on-call engineers can offer practical exposure. Too often, new engineers are thrown in at the deep end without proper onboarding and training because the more experienced engineers are too busy fire-fighting production issues in the first place.
+
+An always-on, always-alert culture can lead to burnout. Engineers should be encouraged to recognise their limits, take breaks, and seek support when needed. This isn't just about individual health; a burnt-out engineer can have cascading effects on the entire team and the systems they manage. A successful on-call culture ensures that while systems are kept running, the engineers are kept happy, healthy, and supported. The more experienced engineers should take time to mentor the junior engineers, but the junior engineers should also be fully engaged and try to investigate and learn new things by themselves.
+
+For the junior engineer, it's too easy to fall back on asking the experts in the team every time an issue arises. This seems reasonable, but serving recipes for solving production issues on a silver platter won't scale forever, as there are countless ways in which production systems can break. So every engineer should learn to debug, troubleshoot, and resolve production incidents independently. The experts will still be there for guidance and can step in when a junior gets stuck after trying, but the experts should also learn to step back so that less experienced engineers can step up and learn. Mistakes can always happen along the way; that's why having a blameless on-call culture is essential.
+
+A blameless on-call culture is a must for a safe and collaborative environment where engineers can effectively respond to incidents without fear of retribution. This approach acknowledges that mistakes are a natural part of the learning and innovation process. When individuals are assured they won't be punished for errors, they're more likely to openly discuss mistakes, allowing the entire team to learn and grow from each incident. Furthermore, a blameless culture promotes psychological safety, enhances job satisfaction, reduces burnout, and ensures that talent remains committed and engaged.
+
+E-Mail your comments to paul@nospam.buetow.org :-)
+
+Back to the main site
+
+
+
+ + Bash Golf Part 3 + + gemini://foo.zone/gemfeed/2023-12-10-bash-golf-part-3.gmi + 2023-12-10T11:35:54+02:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + This is the third blog post about my Bash Golf series. This series is random Bash tips, tricks, and weirdnesses I have encountered over time. + +
+

Bash Golf Part 3


+
+Published at 2023-12-10T11:35:54+02:00
+
+
+    '\       '\        '\                   .  .          |>18>>
+      \        \         \              .         ' .     |
+     O>>      O>>       O>>         .                 'o  |
+      \       .\. ..    .\. ..   .                        |
+      /\    .  /\     .  /\    . .                        |
+     / /   .  / /  .'.  / /  .'    .                      |
+jgs^^^^^^^`^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+                        Art by Joan Stark, mod. by Paul Buetow
+
+
+This is the third blog post about my Bash Golf series. This series is random Bash tips, tricks, and weirdnesses I have encountered over time.
+
+2021-11-29 Bash Golf Part 1
+2022-01-01 Bash Golf Part 2
+2023-12-10 Bash Golf Part 3 (You are currently reading this)
+
+

FUNCNAME


+
+If you are looking for a way to dynamically determine the name of the current function (which could be considered the callee in the context of its own execution), you can use the special variable FUNCNAME. This is an array variable that contains the names of all shell functions currently in the execution call stack. The element FUNCNAME[0] holds the name of the currently executing function, FUNCNAME[1] the name of the function that called it, and so on.
+
+This is particularly useful for logging when you want to include the name of the calling function in the log output. E.g., look at this log helper, which reads FUNCNAME[1] to get the name of the function that called log:
+
+ +
#!/usr/bin/env bash
+
+log () {
+    local -r level="$1"; shift
+    local -r message="$1"; shift
+    local -i pid="$$"
+
+    local -r callee=${FUNCNAME[1]}
+    local -r stamp=$(date +%Y%m%d-%H%M%S)
+
+    echo "$level|$stamp|$pid|$callee|$message" >&2
+}
+
+at_home_friday_evening () {
+    log INFO 'One Peperoni Pizza, please'
+}
+
+at_home_friday_evening
+
+
+The output is as follows:
+
+ +
./logexample.sh
+INFO|20231210-082732|123002|at_home_friday_evening|One Peperoni Pizza, please
+
+
+
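FUNCNAME holds the whole call stack, not just the caller, so a small stack-trace helper is another possible use. This is a sketch under a few assumptions: print_stack, inner, and outer are hypothetical names, it prints to STDOUT rather than STDERR to keep it easy to inspect, and it also reads BASH_LINENO, whose element i holds the line number from which FUNCNAME[i] was called.

```bash
#!/usr/bin/env bash

# Hypothetical helper: prints every function on the call stack above itself.
print_stack () {
    local -i i
    # Start at 1 to skip print_stack itself, which is FUNCNAME[0].
    for (( i = 1; i < ${#FUNCNAME[@]}; i++ )); do
        # BASH_LINENO[i] is the line from which FUNCNAME[i] was called.
        echo "#$i ${FUNCNAME[$i]} (called from line ${BASH_LINENO[$i]})"
    done
}

inner () {
    print_stack
}

outer () {
    inner
}

outer
```

The first two entries are inner and outer. When the code runs as a script, Bash also lists main as the bottom-most element of FUNCNAME.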

:(){ :|:& };:


+
+This one may be widely known already, but I am including it here as I found a cute image illustrating it. Let's break :(){ :|:& };: down:
+
+
    +
  • :(){ } is really a declaration of the function :
  • +
  • The ; terminates the function definition statement
  • +
  • The : at the end calls the function :
  • +
  • :|:& is the function body
  • +

+Let's break down the function body :|:&:
+
+
    +
  • The first : calls the function recursively
  • +
  • The |: pipes the output into a second call of the function : (parallel recursion, so every call spawns two more)
  • +
  • The & lets it run in the background.
  • +

+So, it's a fork bomb. If you run it, your computer will eventually run out of resources. (Modern Linux distributions may have reasonable process limits configured for your login session, so it may not bring down your whole system anymore unless you run it as root!)
+
+And here is the cute illustration:
+
+Bash fork bomb
+
+
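If the punctuation makes it hard to read, it may help to see the same structure with a readable name. The sketch below uses the hypothetical name bomb. It only defines the function and prints Bash's rendering of it with declare -f; it never calls it, so it is safe to run. Do not add a call to it!

```bash
#!/usr/bin/env bash

# Same structure as :(){ :|:& };: but with a readable name.
# WARNING: defined only, never called. Calling bomb would start a fork bomb.
bomb() {
  bomb | bomb &   # two recursive calls joined by a pipe, sent to the background
}

# Print Bash's own rendering of the function definition (comments are dropped).
declare -f bomb
```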

Inner functions


+
+Bash defines variables as it interprets the code, and the same applies to function declarations: a function only exists once the code declaring it has been executed. Let's consider this code:
+
+ +
#!/usr/bin/env bash
+
+outer() {
+  inner() {
+    echo 'Intel inside!'
+  }
+  inner
+}
+
+inner
+outer
+inner
+
+
+And let's execute it:
+
+
+❯ ./inner.sh
+/tmp/inner.sh: line 10: inner: command not found
+Intel inside!
+Intel inside!
+
+
+What happened? The first time inner was called, it wasn't defined yet; it only gets defined once outer runs. Note that inner is then defined globally, not just inside outer. Functions can also be redefined any number of times (the last definition wins):
+
+ +
#!/usr/bin/env bash
+
+outer1() {
+  inner() {
+    echo 'Intel inside!'
+  }
+  inner
+}
+
+outer2() {
+  inner() {
+    echo 'Wintel inside!'
+  }
+  inner
+}
+
+outer1
+inner
+outer2
+inner
+
+
+And let's run it:
+
+
+❯ ./inner2.sh
+Intel inside!
+Intel inside!
+Wintel inside!
+Wintel inside!
+
+
+
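If you want an inner function that does not leak into the global scope, one option (a sketch, not the only way) is to write the outer function's body with ( ) instead of { }. The body then runs in a subshell, so any function defined inside it disappears when the subshell exits:

```bash
#!/usr/bin/env bash

# The body in ( ) runs in a subshell: inner only exists in there.
outer() (
  inner() {
    echo 'Intel inside!'
  }
  inner
)

outer

# declare -F succeeds only if a function with that name is defined.
if declare -F inner > /dev/null; then
  echo 'inner leaked into the global scope'
else
  echo 'inner is not defined globally'
fi
```

The trade-off is one subshell per call, and variable assignments made inside outer are not visible to the caller either. Alternatively, keep the { } body and call unset -f inner at the end of outer.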

Exporting functions


+
+Have you ever wondered how to execute a shell function in parallel through xargs? The problem is that this won't work:
+
+ +
#!/usr/bin/env bash
+
+some_expensive_operations() {
+  echo "Doing expensive operations with '$1' from pid $$"
+}
+
+for i in {0..9}; do echo $i; done \
+  | xargs -P10 -I{} bash -c 'some_expensive_operations "{}"'
+
+
+Here we try to run ten parallel processes, each running the some_expensive_operations function with a different argument. The arguments are fed to xargs through STDIN, one per line. When executed, we get this:
+
+
+❯ ./xargs.sh
+bash: line 1: some_expensive_operations: command not found
+bash: line 1: some_expensive_operations: command not found
+bash: line 1: some_expensive_operations: command not found
+bash: line 1: some_expensive_operations: command not found
+bash: line 1: some_expensive_operations: command not found
+bash: line 1: some_expensive_operations: command not found
+bash: line 1: some_expensive_operations: command not found
+bash: line 1: some_expensive_operations: command not found
+bash: line 1: some_expensive_operations: command not found
+bash: line 1: some_expensive_operations: command not found
+
+
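+There's an easy solution for this: just export the function with export -f! Bash passes exported functions to its child processes through the environment, so the function becomes available in the bash -c processes started by xargs: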
+
+ +
#!/usr/bin/env bash
+
+some_expensive_operations() {
+  echo "Doing expensive operations with '$1' from pid $$"
+}
+export -f some_expensive_operations
+
+for i in {0..9}; do echo $i; done \
+  | xargs -P10 -I{} bash -c 'some_expensive_operations "{}"'
+
+
+When we run this now, we get:
+
+
+❯ ./xargs.sh
+Doing expensive operations with '0' from pid 132831
+Doing expensive operations with '1' from pid 132832
+Doing expensive operations with '2' from pid 132833
+Doing expensive operations with '3' from pid 132834
+Doing expensive operations with '4' from pid 132835
+Doing expensive operations with '5' from pid 132836
+Doing expensive operations with '6' from pid 132837
+Doing expensive operations with '7' from pid 132838
+Doing expensive operations with '8' from pid 132839
+Doing expensive operations with '9' from pid 132840
+
+
+If some_expensive_operations calls another function, that function must be exported as well. Otherwise, there will be a runtime error again. E.g., this won't work:
+
+ +
#!/usr/bin/env bash
+
+some_other_function() {
+  echo "$1"
+}
+
+some_expensive_operations() {
+  some_other_function "Doing expensive operations with '$1' from pid $$"
+}
+export -f some_expensive_operations
+
+for i in {0..9}; do echo $i; done \
+  | xargs -P10 -I{} bash -c 'some_expensive_operations "{}"'
+
+
+... because some_other_function isn't exported! You will also need to add an export -f some_other_function!
+
+
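Putting it together, a fixed version exports both functions. It also passes each input item to bash -c as a positional parameter ($1, with _ filling in $0) instead of pasting {} into the command string, so characters such as $ or ; in an input item are not re-parsed as shell code. This argument-passing style is my suggestion, not part of the original script:

```bash
#!/usr/bin/env bash

some_other_function() {
  echo "$1"
}

some_expensive_operations() {
  some_other_function "Doing expensive operations with '$1' from pid $$"
}

# Both functions must be exported: the child bash processes need both.
export -f some_other_function some_expensive_operations

# "_" becomes $0 of each child shell; the input item becomes $1.
printf '%s\n' {0..9} \
  | xargs -P10 -I{} bash -c 'some_expensive_operations "$1"' _ {}
```

The order of the output lines can vary from run to run, as the ten processes run in parallel.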

Dynamic variables with local


+
+You probably know that local declares local variables in a function. What most people don't know is that these variables actually have dynamic scope. Let's consider the following example:
+
+ +
#!/usr/bin/env bash
+
+foo() {
+  local foo=bar # Declare local/dynamic variable
+  bar
+  echo "$foo"
+}
+
+bar() {
+  echo "$foo"
+  foo=baz
+}
+
+foo=foo # Declare global variable
+foo # Call function foo
+echo "$foo"
+
+
+Let's pause a minute. What do you think the output would be?
+
+Let's run it:
+
+
+❯ ./dynamic.sh
+bar
+baz
+foo
+
+
+What happened? The variable foo (declared with local) is available in the function it was declared in and in all functions called from there, down the call stack! bar can even modify the value of foo, and the change is visible in foo after bar returns. It's not a global variable, though: on the last line, echo "$foo" prints the content of the global variable, which was never touched.
+
+
+
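The flip side: if a called function wants to reuse a variable name without touching its caller's value, it can declare its own local. The following variant of the example above (an illustration, not from the original post) adds local foo to bar:

```bash
#!/usr/bin/env bash

foo() {
  local foo=bar
  bar
  echo "foo sees: $foo"
}

bar() {
  local foo  # shadows the caller's local; the assignment below stays inside bar
  foo=baz
  echo "bar sees: $foo"
}

foo=foo # Global variable
foo
echo "global: $foo"
```

This prints "bar sees: baz", then "foo sees: bar", then "global: foo": the assignment in bar no longer leaks up the call stack.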

if conditionals


+
+All of the following variants are more or less equivalent:
+
+ +
#!/usr/bin/env bash
+
+declare -r foo=foo
+declare -r bar=bar
+
+if [ "$foo" = foo ]; then
+  if [ "$bar" = bar ]; then
+    echo ok1
+  fi
+fi
+
+if [ "$foo" = foo ] && [ "$bar" == bar ]; then
+  echo ok2a
+fi
+
+[ "$foo" = foo ] && [ "$bar" == bar ] && echo ok2b
+
+if [[ "$foo" = foo && "$bar" == bar ]]; then
+  echo ok3a
+fi
+
+[[ "$foo" = foo && "$bar" == bar ]] && echo ok3b
+
+if test "$foo" = foo && test "$bar" = bar; then
+  echo ok4a
+fi
+
+test "$foo" = foo && test "$bar" = bar && echo ok4b
+
+
+The output we get is:
+
+
+❯ ./if.sh
+ok1
+ok2a
+ok2b
+ok3a
+ok3b
+ok4a
+ok4b
+
+
+
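The variants are not fully interchangeable, though. [[ ]] is a Bash keyword rather than a command, so it adds features that [ and test lack. A short sketch (the file name is made up): unquoted variables are not word-split, an unquoted right-hand side of == is a glob pattern, and =~ matches an extended regular expression with capture groups stored in BASH_REMATCH. With [ $file = 'my report.txt' ], by contrast, the space would split $file into two words and [ would complain about too many arguments.

```bash
#!/usr/bin/env bash

declare -r file='my report.txt'

# No word splitting inside [[ ]]: $file may stay unquoted despite the space.
[[ $file == 'my report.txt' ]] && echo no-split

# Unquoted right-hand side of == is treated as a glob pattern.
[[ $file == *.txt ]] && echo glob-match

# Keeping the regex in a variable avoids quoting pitfalls with =~.
declare -r re='^my (.*)\.txt$'
[[ $file =~ $re ]] && echo "regex-match: ${BASH_REMATCH[1]}"
```

This prints no-split, glob-match, and regex-match: report.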

Multi-line comments


+
+You all know how to comment: put a # in front of the line. To emulate multi-line comments, you can either use multiple single-line comments or abuse a heredoc and redirect it to the : no-op command.
+
+ +
#!/usr/bin/env bash
+
+# Single line comment
+
+# These are two single line
+# comments one after another
+
+: <<'COMMENT'
+This is another way a
+multi line comment
+could be written!
+COMMENT
+
+
+I will not demonstrate the execution of this script, as it won't print anything! It's obviously not the prettiest way to comment your code, but it can sometimes be handy!
+
+
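One caveat with the heredoc trick: if the delimiter is unquoted (<<COMMENT), Bash still performs parameter expansion and command substitution on the body, so a $(...) inside the "comment" actually runs. Quoting the delimiter (<<'COMMENT') turns that off. A small demonstration (the echo commands write to STDERR because STDOUT of a substitution inside a heredoc only ends up in the heredoc body):

```bash
#!/usr/bin/env bash

# Unquoted delimiter: the body is expanded, so this command substitution RUNS.
: <<COMMENT
$(echo 'unquoted delimiter: I was executed' >&2)
COMMENT

# Quoted delimiter: the body is passed through literally, nothing runs.
: <<'COMMENT'
$(echo 'quoted delimiter: I was executed' >&2)
COMMENT
```

Only "unquoted delimiter: I was executed" appears on STDERR.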

Don't change it while it's executed


+
+Consider this script:
+
+ +
#!/usr/bin/env bash
+
+echo foo
+echo echo baz >> $0
+echo bar
+
+
+When it is run, it will do:
+
+
+❯ ./if.sh
+foo
+bar
+baz
+❯ cat if.sh
+#!/usr/bin/env bash
+
+echo foo
+echo echo baz >> $0
+echo bar
+echo baz
+
+
+So what happened? The echo baz line was appended to the script while it was still being executed, and the interpreter picked it up! This tells us that Bash does not parse the whole script up front but reads and executes it incrementally. This can lead to nasty side effects when you edit a script while it is still running, so always keep this in mind!
+
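A common guard is to put the script body into a function and call it together with exit on the last line. Bash parses the whole function definition, and the whole final line, before running them, so anything appended to the file afterwards is never reached. The sketch below demonstrates this on a throwaway script in a temporary directory (guarded.sh and main are hypothetical names), so the example itself is not modified:

```bash
#!/usr/bin/env bash

tmp="$(mktemp -d)"
script="$tmp/guarded.sh"

# The guarded script: body in main, called together with exit on one line.
cat > "$script" <<'EOF'
#!/usr/bin/env bash
main() {
  echo foo
  echo 'echo baz' >> "$0"  # append to the running script
  echo bar
}
main "$@"; exit
EOF

output="$(bash "$script")"
echo "$output"  # prints foo and bar, but not baz

rm -rf "$tmp"
```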
+
+Other related posts are:
+
+2021-05-16 Personal Bash coding style guide
+2021-06-05 Gemtexter - One Bash script to rule it all
+2021-11-29 Bash Golf Part 1
+2022-01-01 Bash Golf Part 2
+2023-12-10 Bash Golf Part 3 (You are currently reading this)
+
+E-Mail your comments to paul@nospam.buetow.org :-)
+
+Back to the main site
+
+
+
+ + Site Reliability Engineering - Part 2: Operational Balance in SRE + + gemini://foo.zone/gemfeed/2023-11-19-site-reliability-engineering-part-2.gmi + 2023-11-19T00:18:18+03:00 + + Paul Buetow aka snonux + paul@dev.buetow.org + + This is the second part of my Site Reliability Engineering (SRE) series. I am currently employed as a Site Reliability Engineer and will try to share what SRE is about in this blog series. + +
+

Site Reliability Engineering - Part 2: Operational Balance in SRE


+
+Published at 2023-11-19T00:18:18+03:00
+
+This is the second part of my Site Reliability Engineering (SRE) series. I am currently employed as a Site Reliability Engineer and will try to share what SRE is about in this blog series.
+
+2023-08-18 Site Reliability Engineering - Part 1: SRE and Organizational Culture
+2023-11-19 Site Reliability Engineering - Part 2: Operational Balance in SRE (You are currently reading this)
+2024-01-09 Site Reliability Engineering - Part 3: On-Call Culture and the Human Aspect
+
+
+⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣠⣾⣷⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
+⠀⠀⠀⠀⣾⠿⠿⠿⠶⠾⠿⠿⣿⣿⣿⣿⣿⣿⠿⠿⠶⠶⠿⠿⠿⣷⠀⠀⠀⠀
+⠀⠀⠀⣸⢿⣆⠀⠀⠀⠀⠀⠀⠀⠙⢿⡿⠉⠀⠀⠀⠀⠀⠀⠀⣸⣿⡆⠀⠀⠀
+⠀⠀⢠⡟⠀⢻⣆⠀⠀⠀⠀⠀⠀⠀⣾⣧⠀⠀⠀⠀⠀⠀⠀⣰⡟⠀⢻⡄⠀⠀
+⠀⢀⣾⠃⠀⠀⢿⡄⠀⠀⠀⠀⠀⢠⣿⣿⡀⠀⠀⠀⠀⠀⢠⡿⠀⠀⠘⣷⡀⠀
+⠀⣼⣏⣀⣀⣀⣈⣿⡀⠀⠀⠀⠀⣸⣿⣿⡇⠀⠀⠀⠀⢀⣿⣃⣀⣀⣀⣸⣧⠀
+⠀⢻⣿⣿⣿⣿⣿⣿⠃⠀⠀⠀⠀⣿⣿⣿⣿⠀⠀⠀⠀⠈⢿⣿⣿⣿⣿⣿⡿⠀
+⠀⠀⠉⠛⠛⠛⠋⠁⠀⠀⠀⠀⢸⣿⣿⣿⣿⡆⠀⠀⠀⠀⠈⠙⠛⠛⠛⠉⠀⠀
+⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠸⣿⣿⣿⣿⠇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
+⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣠⣾⣿⣿⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
+⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣸⣿⣿⣿⣿⣿⣿⣆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
+⠀⠀⠀⠀⠀⠀⠴⠶⠿⠿⠿⠿⠿⠿⠿⠿⠿⠿⠿⠿⠿⠿⠶⠦⠀⠀
+
+
+

Operational Balance in SRE: Finding the Equilibrium in Reliability and Velocity


+
+Site Reliability Engineering has established itself as more than just a set of best practices or methodologies. Instead, it stands as a beacon of operational excellence, which guides engineering teams through the turbulent waters of modern software development and system management.
+
+In the universe of software production, two fundamental forces are often at odds: the drive for rapid feature releases (velocity) and the need for system reliability. Traditionally, the faster teams moved, the more risk was introduced into systems. SRE offers an approach to reconcile these conflicting drives through concepts like error budgets and SLIs/SLOs (Service Level Indicators and Objectives). These mechanisms offer a tangible metric, allowing teams to quantify how much they can push changes without compromising system health. Thus, the error budget becomes a balancing act, where teams weigh the trade-offs between innovation and reliability.
+
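To make the error budget concrete, here is a worked example. The 99.9% availability SLO and the 30-day window are assumed numbers for illustration, not figures from this post:

```latex
\begin{aligned}
\text{error budget} &= (1 - \text{SLO}) \times \text{window} \\
&= (1 - 0.999) \times (30 \times 24 \times 60\ \text{min}) \\
&= 0.001 \times 43\,200\ \text{min} \\
&= 43.2\ \text{min}
\end{aligned}
```

So such a service could be unavailable for roughly 43 minutes per 30 days before missing its SLO. A common policy (not necessarily the one described here) is to slow down feature releases once that budget has been spent.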
+An important part of this balance is the dichotomy between operations and coding. According to SRE principles, an engineer should ideally spend an equal amount of time on operations work and coding - 50% on each. This isn't just a random metric; it's a reflection of the value SRE places on both maintaining operational excellence and progressing forward with innovations. This balance ensures that while SREs are solving today's problems, they are also preparing for tomorrow's challenges.
+
+However, not all operational tasks are equal. SRE differentiates between "ops work" and "toil". While ops work is integral to system maintenance and can provide value, toil represents repetitive, mundane tasks which offer little value in the long run. Recognising and minimising toil is crucial. A culture that allows engineers to drown in toil stifles innovation and growth. Hence, an organisation's approach to toil indicates its operational health and commitment to balance.
+
+A cornerstone of achieving operational balance lies in the tools and processes SREs use. Effective monitoring, observability tools, and ensuring that tools can handle high-cardinality data are foundational. These aren't just technical requisites; they also reflect an organisational culture that prioritises proactive problem-solving. By having systems that effectively flag potential issues before they escalate, SREs can maintain the balance between system stability and forward momentum.
+
+Moreover, operational balance isn't just a technological or process challenge; it's a human one. The health of on-call engineers is as crucial as the health of the services they manage. On-call postmortems, continuous feedback loops, and recognising gaps (be it tooling, operational expertise, or resources) ensure that the human elements of operations are not overlooked.
+
+In conclusion, operational balance in SRE isn't a static thing but an ongoing journey. It requires organisations to constantly evaluate their practices, tools, and, most importantly, their culture. By achieving this balance, organisations can ensure that they have time for innovation while maintaining the robustness and reliability of their systems, resulting in sustainable long-term success.
+
+That all sounds very romantic. The truth is, it's brutal to achieve the perfect balance. No system will ever be perfect. But at least we should aim for it!
+
+Continue with the third part of this series:
+
+2024-01-09 Site Reliability Engineering - Part 3: On-Call Culture and the Human Aspect
+
+E-Mail your comments to paul@nospam.buetow.org :-)
+
+Back to the main site
+
+
+
-- cgit v1.2.3