<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Galo Navarro</title>
    <description>Galo Navarro — Principal Engineer specializing in distributed systems, platform engineering, and infrastructure for AI workloads. Writing about Kubernetes, delivery pipelines, and scaling engineering organizations.
</description>
    <link>https://varoa.net/</link>
    <atom:link href="https://varoa.net/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Sun, 01 Mar 2026 19:30:54 +0000</pubDate>
    <lastBuildDate>Sun, 01 Mar 2026 19:30:54 +0000</lastBuildDate>
    <generator>Jekyll v3.10.0</generator>
    
      <item>
        <title>Inferencing delivery bottlenecks away</title>
<description>&lt;p&gt;One of the hardest parts of integrating AI into our workflows is realizing that
you can, and probably should, throw away assumptions and re-evaluate trade-offs
from scratch.&lt;/p&gt;

&lt;p&gt;Unit tests are insufficient on their own to guarantee the correctness of a
larger system, so we rely on integration / e2e tests to fill the gap. But the
latter are slower and resource-hungry, so as the number of tests grows,
iteration speed suffers and delivery pipelines stop scaling.&lt;/p&gt;

&lt;p&gt;What everyone always wanted was to avoid running the entire suite of
integration / e2e tests for every single change, and run only the relevant
ones. But the problem of calculating the code paths affected by a given change
is hardest precisely at the integration / e2e layer, where the dependency graph
isn’t visible at compilation time and crosses network and system boundaries.&lt;/p&gt;

&lt;p&gt;So the dilemma was never really solved. Other than the consultants coming up
with a revised &lt;a href=&quot;https://varoa.net/2024/02/06/how-about-we-forget-the-concept-of-test-types.html&quot;&gt;geometry of
tests&lt;/a&gt;
every five-odd years, each organization settled on a custom mix of testing
layers that provided enough confidence, not too much friction, reasonable
costs, and got on with their business. Still, scalability problems in
delivery pipelines were routine even in industry leaders&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;With the growing deluge of code hitting every delivery pipeline these days,
whatever equilibrium existed is no more. In the same way as &lt;a href=&quot;https://varoa.net/2026/02/22/code-reviews-cant-keep-up.html&quot;&gt;code reviews can’t
keep up&lt;/a&gt;, running
the entire suite of tests for 10x or 100x more PRs is just not sustainable. And
the problem hits hardest in integration and e2e tests. Even when they can be
parallelized (e.g. a full environment in a GitHub runner or an ephemeral
environment), cost and duration are problematic. Others depend on staging or
persistent production replicas.&lt;/p&gt;

&lt;p&gt;Finding the relevant subset of integration / e2e tests might have just become a
lot simpler: an LLM can understand systems and features at a much higher level
than compile-time analysis. So a small decision step in our delivery pipeline
can evaluate individual changes to any part of a larger system, using
architecture documents, API specs, and similar context, and produce a
reasonably relevant subset of tests.&lt;/p&gt;

&lt;p&gt;The type of prompt I’m using looks like this:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Look at this diff. Use $architecture_knowledge_base and tell me the subset of e2e tests that would be relevant to execute to validate this feature with reasonable confidence.&lt;/p&gt;

&lt;/blockquote&gt;
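
&lt;p&gt;As a sketch (the job name, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;llm-select-tests&lt;/code&gt; CLI, and the paths are all hypothetical placeholders), such a decision step could be a small CI job that feeds the diff and the knowledge base to a model and hands the resulting list to the test runner:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Hypothetical GitHub Actions job: select a relevant e2e subset per PR.
select-e2e-tests:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    # &quot;llm-select-tests&quot; is a placeholder for whatever agent/CLI you use.
    - name: Ask an LLM for the relevant subset
      run: |
        git diff origin/main...HEAD &gt; change.diff
        llm-select-tests \
          --diff change.diff \
          --knowledge-base docs/architecture/ \
          --test-suite e2e/ &gt; selected-tests.txt
    - name: Run only the selected tests
      run: ./run-e2e.sh $(cat selected-tests.txt)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;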

&lt;p&gt;The key is of course the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$architecture_knowledge_base&lt;/code&gt;. Integration / e2e tests
cover the surface of a System Under Test, so the LLM needs a reasonable
picture of its internal structure. In the simplest case, the entire system and
test suite live in a monorepo, so the LLM has everything it needs in the
checkout. In other cases, the monorepo is too large and the exploration costs
way too many tokens. Or the change lives in a different repository from other
components of the system, and even from the integration / e2e test suite. But the
idea doesn’t change much. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$architecture_knowledge_base&lt;/code&gt; might just be a
simple README.md, or a more comprehensive document with the high-level architecture
of the system, API specs, whatever is necessary to help the LLM build a
reasonable approximation of the dependency graph and figure out what subset of
integration / e2e tests seems most relevant for the change at hand.&lt;/p&gt;

&lt;p&gt;The failure mode is that, being an approximation, the selection might skip a
relevant test that would have caught a bug. Running the full e2e suite remains
a reasonable safety net, but you can get away with not doing it for every
change and limit full runs to a fixed schedule (hourly, daily, whatever works).&lt;/p&gt;
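
&lt;p&gt;In GitHub Actions terms, that safety net is just a scheduled trigger on the full-suite workflow (the cron cadence below is only an example):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Full e2e suite decoupled from PRs: run nightly as a safety net.
on:
  schedule:
    - cron: &quot;0 3 * * *&quot;   # every day at 03:00 UTC
  workflow_dispatch: {}     # still allow manual runs
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;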

&lt;p&gt;Complement this setup with the usual rollout strategies (red/black, canaries,
etc.) and solid observability in production, covering not just anomalies in
latency, throughput, or error rates at the service boundaries, but also
business-level metrics that catch silent functional regressions. Overall, this
strikes a reasonable balance between delivery speed and early detection of
whatever issues slip through the cracks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Footnotes&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;I know of a FAANG where an entire business division in charge of a very-well-known product was on the spot internally for the ungodly amount of resources consumed by running e2e tests on their delivery pipelines. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Sun, 01 Mar 2026 11:00:00 +0000</pubDate>
        <link>https://varoa.net/2026/03/01/inferencing-delivery-bottlenecks-away.html</link>
        <guid isPermaLink="true">https://varoa.net/2026/03/01/inferencing-delivery-bottlenecks-away.html</guid>
        
        
      </item>
    
      <item>
        <title>Code reviews can&apos;t keep up</title>
        <description>&lt;p&gt;I’m no longer convinced that the classic code review process is the
right tool for the job.&lt;/p&gt;

&lt;p&gt;For me, code reviews were always the prime resource for knowledge
transfer, consistency, and collective ownership of code (not for finding
bugs, that is what tests are for). I am still convinced that the goals
remain relevant, perhaps more than ever. But I am not blind. In the
teams I work with, every engineer is producing more PRs than they could
realistically review, even at early stages of AI adoption. Code review
boards are clogged with features and bug fixes, and back-pressure
propagates through the engineering team up to the business.&lt;/p&gt;

&lt;p&gt;The argument to slow down production and spend enough time on reviews
sounds like asking to print fewer books because the scribal monks can’t
keep up. The burden is on us, engineers, to figure out what needs to
change in the software production system so that we can leverage the
technology available to us.&lt;/p&gt;

&lt;p&gt;Code reviews are an impediment because they were designed under the
assumption that software follows an artisanal mode of production. But
human labor is no longer the primary constraint. The systems we are
creating to produce software are not better shovels, they are industrial
factories and robots. So whatever replaces code reviews needs to work in
an industrial context, operate at a much higher throughput, and still
respect the same or higher level of quality, reliability, and security
standards.&lt;/p&gt;

&lt;p&gt;As software engineers we know well enough that increasing throughput is
the easy part. The real challenge is making high throughput sustainable.
This is where knowledge transfer, consistency, and collective ownership
of code remain relevant. But we need different ways to deliver them.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;Knowledge transfer looks different when producing software doesn’t need
the classic
&lt;a href=&quot;https://aws.amazon.com/executive-insights/content/amazon-two-pizza-team/&quot;&gt;pizza-team&lt;/a&gt;
model with closely knit groups of 5-10 people collaborating together.
The idea of &lt;a href=&quot;https://every.to/chain-of-thought/the-two-slice-team&quot;&gt;two-slice
teams&lt;/a&gt; is taking
hold, and I am seeing it working well around me. One or two high-agency
engineers with a mission and LLMs.&lt;/p&gt;

&lt;p&gt;One way to see coding agents is as a supply of throwaway hires that one can
delegate work to (including code reviews). I am not interested in
knowledge transfer to disposable hires in the same way as I was with
humans. I do care about being able to feed them a codebase, specs,
design decisions, and historical context about systems in the way that
Neo learned martial arts in The Matrix.&lt;/p&gt;

&lt;center&gt;&lt;img src=&quot;https://media1.tenor.com/m/wRKrqeO4CLcAAAAd/matrix-upload.gif&quot; /&gt;&lt;/center&gt;

&lt;p&gt;The concept of a knowledge cartridge uploaded to a brain in
seconds is very far from the way we’ve been preserving internal
knowledge about software: a soup of half-obsolete Confluence pages,
internal wikis, and so-so commit logs doesn’t cut it. But neither does a
CLAUDE.md or AGENTS.md, however cleverly written.&lt;/p&gt;

&lt;p&gt;To review work effectively, an agent needs requirements, technical
designs,
&lt;a href=&quot;https://docs.aws.amazon.com/prescriptive-guidance/latest/architectural-decision-records/adr-process.html&quot;&gt;ADRs&lt;/a&gt;,
and actual API specs: everything that a human used to internalize over
months, in an instant. When projects are split across multiple repositories,
agents need an authoritative reference providing a high-level overview
and a map of actual hyperlinks. All that information needs to coexist
and evolve with the source code it describes, versioned behind SHAs with a
high-quality commit log, giving agents critical access to the full
history and rationale of changes. The Linux kernel is a reference for
how most projects should start behaving, at any scale.&lt;/p&gt;
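
&lt;p&gt;Concretely, the kind of layout I have in mind (the file and directory names are purely illustrative) keeps that knowledge versioned next to the code it describes:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;repo/
├── src/                      # the code itself
├── docs/
│   ├── architecture.md       # high-level overview, links to sibling repos
│   ├── adr/                  # one file per architectural decision
│   │   └── 0042-split-billing-service.md
│   └── api/
│       └── openapi.yaml      # actual API specs, not prose descriptions
└── AGENTS.md                 # entry point pointing into docs/
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;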

&lt;p&gt;This is where I see one of the biggest gaps in the infrastructure and
toolchain, a gap that slows down the transformation of software production
systems. The blob of code, designs, and specs described above is still
too raw, and looks like a sweet spot to apply Retrieval-Augmented
Generation (RAG) for agents (and humans using them) to navigate a
codebase. Review feedback makes more sense as actual patches (either at
review time or post-merge, depending on the criticality). The automated
delivery pipeline needs built-in &lt;a href=&quot;https://en.wikipedia.org/wiki/Andon_(manufacturing)&quot;&gt;Andon
cords&lt;/a&gt; at different
stages to request and accommodate human intervention when necessary.
Classic UIs and CI/CD systems seem archaic for these purposes, starting
with GitHub as the industry standard. I’m not sure that building
in-house alternatives will be the way to go.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;Consistency doesn’t concern me as much because standards are a
precondition of industrial processes. Linters and formatters already did
much of the work, and agents write code that is ultimately a standard
distilled from reinforcement learning of an industry-wide corpus.
Consistency becomes infrastructure, not a by-product of enforcement.&lt;/p&gt;

&lt;p&gt;Ownership is a different matter.&lt;/p&gt;

&lt;h2 id=&quot;renegotiating-trust-and-accountability-boundaries&quot;&gt;Renegotiating trust and accountability boundaries&lt;/h2&gt;

&lt;p&gt;A code review left an audit trail of responsibility over code and
outcomes. But who is responsible and accountable for the code written,
reviewed, approved, and deployed by agents? Here are AWS’s comments on
the recent &lt;a href=&quot;https://www.theregister.com/2026/02/20/amazon_denies_kiro_agentic_ai_behind_outage/&quot;&gt;13-hour disruption related to its AI agent
Kiro&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;This brief event was the result of user (AWS employee) error - specifically misconfigured access controls - not AI.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It’s true that humans are generally responsible for their use of a tool,
but it feels somewhat simplistic to lay responsibility on the user of
an extraordinarily complex tool operating semi-autonomously in an
extremely complicated distributed system. People can’t reliably prompt
their way into correct outcomes when intent and outcome are separated by
a fog of interpretations made by non-deterministic models. But then why
should we trust this code to reach production?&lt;/p&gt;

&lt;p&gt;Trust in modern software pipelines will come from two sources.&lt;/p&gt;

&lt;p&gt;First, it’s obvious that the sophistication required from testing, QA,
and observability is going to explode to a degree that I’m not sure
we’ve grasped yet. The dashboards, the #alerts channel, the pager, every
mechanism that expects continuous human attention: none of them are it.
But when someone or something pulls the Andon cord and brings a human
into the loop, the complexity of &lt;a href=&quot;https://surfingcomplexity.blog/2026/02/14/lots-of-ai-sre-no-ai-incident-management/&quot;&gt;incident
management&lt;/a&gt;
explodes. Everything that’s going on in the industrial complex needs to
be made visible and intelligible for the human(s) who need to make
decisions at that speed.&lt;/p&gt;

&lt;p&gt;Second, taking the real world as reference, we can see that the need for
trust and accountability led us to demand credentials to use certain
tools in certain contexts: you can’t just operate cars, heavy machinery,
aviation control towers etc. at will. At some point, the same will
happen to operate agents that are functioning autonomously in complex
software environments with non-trivial consequences at stake.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;The accountability point shows that even in industrial systems, humans
close the loop. We may be one or two steps removed from the factory
floor, but stay responsible for design, behaviour, and outcomes. That
requires a deep understanding of the production systems. Code reviews
were a key component of the learning experience that helped engineers
develop the domain expertise and scar tissue that are required to bear
that responsibility. We will need alternative methods to solve that
problem. We might be able to deprecate code reviews. We can’t afford to
deprecate expertise.&lt;/p&gt;
</description>
        <pubDate>Sun, 22 Feb 2026 11:00:00 +0000</pubDate>
        <link>https://varoa.net/2026/02/22/code-reviews-cant-keep-up.html</link>
        <guid isPermaLink="true">https://varoa.net/2026/02/22/code-reviews-cant-keep-up.html</guid>
        
        
      </item>
    
      <item>
        <title>AI workloads challenge the cattle model</title>
<description>&lt;p&gt;AI workloads break the “cattle” &lt;sup&gt;&lt;a href=&quot;#footnotes&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; approach to
infrastructure management that made Kubernetes an effective IaaS
platform. Kubernetes stays agnostic of the workloads, treats resources
as fungible, and the entire stack underneath plays along: nodes on top
of undifferentiated VMs on undifferentiated cloud infrastructure. It’s
cattle all the way down.&lt;/p&gt;

&lt;p&gt;But AI infrastructure punishes mental models applied from inertia.
Generic abstractions that worked for backend services are too limited,
and treating six-figure hardware as disposable, undifferentiated cattle
seems unacceptable.&lt;/p&gt;

&lt;h2 id=&quot;restarts&quot;&gt;Restarts&lt;/h2&gt;

&lt;p&gt;Take restarts. Software engineers learn to design and expect systems
where the impact of a restart should be insignificant. In a generic
backend service (an API, a stream processor) this works because a
restart means dropping a tiny fraction of requests. For most AI
workloads, the impact is a much bigger deal.&lt;/p&gt;

&lt;p&gt;This is because AI workloads have fundamentally different constraints
and different Service Level Objectives (SLOs).&lt;/p&gt;

&lt;p&gt;Training workloads extend over long periods of time (anywhere from hours
to weeks or months) and follow a regular pattern alternating in-GPU
computation for individual batches with bursts of huge data exchanges
during which participating GPUs synchronize gradients, perform
checkpoints, etc. So a Pod dying mid-batch means re-computing and
re-synchronizing that work.&lt;/p&gt;

&lt;p&gt;This breaks expectations. Training workloads ultimately must optimize
for speed and cost. Given that training a model locks a lot of expensive
equipment for a long time, the priority is to maximize GPU
utilisation. Even a modest rate of Pod restarts will have a massive
impact on the overall cost, as a single Pod restart may easily lead
to &lt;a href=&quot;https://kubernetes.io/blog/2025/07/03/navigating-failures-in-pods-with-devices/&quot;&gt;repeating more work across all
GPUs&lt;/a&gt;:
losing just ten minutes of progress on a 1,024-GPU job burns roughly 170
GPU-hours. That multiplier factor doesn’t exist in backend services.&lt;/p&gt;

&lt;p&gt;Inference workloads are short-lived in comparison to training, but still
much longer than the typical request to a backend service (a chatbot
might need seconds or minutes to reply to a prompt). And SLOs are very
nuanced depending on the specific type of inference and use case. A
chatbot needs to keep Time To First Token (TTFT) well under a second, so
a Pod restart degrades the user experience. After the first token, the
rest just needs to keep a stable enough Time Per Output Token (TPOT) to
keep up with the user’s reading speed. But for an LLM summarizing large
bodies of text, or a vision model categorizing images, TTFT is
unimportant compared to overall throughput.&lt;/p&gt;

&lt;h2 id=&quot;topology-awareness&quot;&gt;Topology awareness&lt;/h2&gt;

&lt;p&gt;When a traditional workload requests a particular amount of resources,
the Kubernetes scheduler generally cares about finding available memory
and CPU, but not much about where they are. With GPU infrastructure this
changes.&lt;/p&gt;

&lt;p&gt;Topology awareness and node identity are critical factors for AI
workloads. Training requires moving large amounts of data efficiently
across all participating GPUs, and those transfers happen through many
communication paths across the network fabric, and within individual
servers. To support these needs, a low-end GPU server might look like
this (&lt;a href=&quot;https://docs.nvidia.com/certification-programs/latest/nvidia-certified-configuration-guide.html&quot;&gt;NVIDIA-Certified Systems Configuration
Guide&lt;/a&gt;):&lt;/p&gt;

&lt;div class=&quot;image-box&quot;&gt;
  &lt;img src=&quot;https://varoa.net/assets/ai-workloads/2p-server-2-gpus.png&quot; alt=&quot;2-GPU server topology&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;But paying for the more expensive equipment is not enough if the
scheduler doesn’t leverage it correctly.&lt;/p&gt;

&lt;p&gt;Imagine two training jobs, A and B, each of which &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;requests&lt;/code&gt; 2 GPUs. If
job A gets the two GPUs on the left, and B the two on the right, gradient
synchronization stays within a single CPU socket, making data
transfers efficient (&lt;a href=&quot;https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/&quot;&gt;50-55 GB/s on Gen
5&lt;/a&gt;).
But if A and B each get one GPU on each side, traffic will be forced to
traverse the CPU interconnect, &lt;a href=&quot;https://www.exxactcorp.com/blog/HPC/exploring-the-complexities-of-pcie-connectivity-and-peer-to-peer-communication&quot;&gt;reducing bandwidth by
3-4x&lt;/a&gt;
and creating unnecessary contention. All because of a scheduling decision
that Kubernetes doesn’t even know exists.&lt;/p&gt;

&lt;p&gt;Higher-end servers scale the number of GPUs making the topology more
complex and scheduling decisions more nuanced. And at the top of the
spectrum, servers add direct GPU-to-GPU interconnects, dedicated
high-bandwidth NICs (giving each GPU a dedicated bandwidth of 400 Gbps),
and optimized topologies. The diagram below shows how &lt;a href=&quot;https://docs.nvidia.com/dgx/dgxh100-user-guide/introduction-to-dgxh100.html&quot;&gt;NVIDIA’s DGX
H100&lt;/a&gt;
(a pre-assembled server) creates a full mesh topology directly via
NVLink.&lt;/p&gt;

&lt;div class=&quot;image-box&quot;&gt;
  &lt;img src=&quot;https://varoa.net/assets/ai-workloads/dgx-h100.png&quot; alt=&quot;DGX H100 NVLink topology&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;The scheduling problem becomes much harder than in traditional clusters. It
is no longer about “find an available host with this much memory and
CPU”. When managing AI workloads, the exact placement matters at every
level: the GPU itself, but also which NUMA node, which PCIe switch or
NVSwitch configuration, which NIC, etc. Once we add multiple servers, we
also have to worry about the deep rabbit hole of high-performance
networks.&lt;/p&gt;

&lt;p&gt;Kubernetes has no native understanding of most of these details, so it
relies on the operator supplying that information up front by
configuration or other means.&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html&quot;&gt;NVIDIA GPU
operator&lt;/a&gt;
helps by exposing GPU topology information to the Kubernetes scheduler.
It uses lower-level tools (e.g. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nvidia-smi&lt;/code&gt;) to extract information
about GPU placement, interconnects, and NUMA domains, which enables better
scheduling decisions. The operator also provides a &lt;a href=&quot;https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/&quot;&gt;Kubernetes device
plugin&lt;/a&gt;
so that workloads can request new resource types; &lt;a href=&quot;https://docs.cloud.google.com/kubernetes-engine/docs/how-to/gpu-operator&quot;&gt;this example Pod
spec&lt;/a&gt;
requests 1 GPU:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
  - name: vectoradd
    image: nvidia/samples:vectoradd-cuda11.2.1
    resources:
      limits:
        nvidia.com/gpu: 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;From the user’s perspective this is deceptively simple, but it still
depends on the operator having configured node pools correctly based on
the specific server and GPU types, etc.&lt;/p&gt;

&lt;h2 id=&quot;resource-selection&quot;&gt;Resource selection&lt;/h2&gt;

&lt;p&gt;The cattle model lets users think of resources in terms of high-level
abstractions like “cpu share” or “memory size”. Those categories are too
coarse if we’re dealing with GPU infrastructure.&lt;/p&gt;

&lt;p&gt;For example, the &lt;a href=&quot;http://nvidia.com/gpu&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nvidia.com/gpu&lt;/code&gt;&lt;/a&gt; resource
shown above refers to a full GPU. But some AI workloads may be too
small (e.g. a Jupyter notebook), so reserving that much compute is a
waste. To avoid this, cluster operators can partition GPUs using
features like &lt;a href=&quot;https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html&quot;&gt;multi-instance GPU
(MIG)&lt;/a&gt;,
which allows exposing a GPU as smaller GPU instances.&lt;/p&gt;

&lt;p&gt;For example, the &lt;a href=&quot;https://docs.nvidia.com/datacenter/tesla/mig-user-guide/supported-mig-profiles.html#a100-mig-profiles&quot;&gt;NVIDIA
A100&lt;/a&gt;
supports profiles like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1g.5gb&lt;/code&gt; (exposing up to 7 GPU instances to the host,
each with 1/8th of the memory) or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;3g.20gb&lt;/code&gt; (instances with 3/7ths of the
compute and half the memory). With the NVIDIA GPU operator, those MIG partitions appear as
resource types. A user running a workload can then adjust resource
allocation to request MIG instances and avoid taking whole GPUs unless
necessary.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[...]
resources:
  limits:
    nvidia.com/mig-3g.20gb: 2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But this also implies that users cannot simply throw workloads at the
cluster assuming that it holds a herd of undifferentiated “GPU
units” as they would with CPUs. There is an abstraction leak: users
need to think about GPU partitioning, memory slices, and operator-level
concerns far more than in traditional backends.&lt;/p&gt;

&lt;h2 id=&quot;heterogeneous-clusters&quot;&gt;Heterogeneous clusters&lt;/h2&gt;

&lt;p&gt;The need to adapt infrastructure to the combination of workloads relevant
to each organization is a key reason why heterogeneous clusters make
sense in this context.&lt;/p&gt;

&lt;p&gt;Consider for example &lt;a href=&quot;https://arxiv.org/abs/2401.09670&quot;&gt;disaggregated
inference&lt;/a&gt;. This technique is used to
improve performance in LLM engines based on the SLOs mentioned above. To
handle a request, the engine splits the work into a “prefill” phase
that optimizes for TTFT, and a “decode” phase that optimizes for TPOT.
The two phases have very different constraints: prefill is in charge of
the initial processing of the prompt and is very compute-intensive,
whereas decode is less compute- and more memory-intensive. So it might
make sense to have a heterogeneous cluster that dedicates higher-end GPU
servers with NVLink to prefill, and less powerful GPUs with more memory
to decode.&lt;/p&gt;
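
&lt;p&gt;As a minimal sketch (the node labels are hypothetical, and a real deployment would rely on an inference framework to actually split the phases), the placement side of this could be plain nodeSelectors on two separate Deployments:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Prefill workers pinned to the NVLink node pool...
spec:
  nodeSelector:
    gpu-pool: nvlink-high-end   # hypothetical label on the node pool
---
# ...and decode workers on cheaper, memory-heavy GPU nodes.
spec:
  nodeSelector:
    gpu-pool: memory-optimized
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;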

&lt;p&gt;In this scenario, you probably also want to reserve high-end nodes for
prefill even if the GPUs are idle, to ensure that incoming requests
achieve better TTFT. This trade-off, intentionally keeping expensive
resources underutilized, is not exactly what the Kubernetes scheduler
was designed for.&lt;/p&gt;

&lt;h2 id=&quot;are-pets-back-then&quot;&gt;Are pets back then?&lt;/h2&gt;

&lt;p&gt;No. The complexity and scale of AI infrastructure makes a cattle
approach mandatory. But to keep applying it, we need to extend the
capabilities of our tools to fill gaps as they appear.&lt;/p&gt;

&lt;p&gt;Schedulers like &lt;a href=&quot;https://volcano.sh/en/&quot;&gt;Volcano&lt;/a&gt; or &lt;a href=&quot;https://github.com/NVIDIA/KAI-Scheduler&quot;&gt;NVIDIA’s
KAI&lt;/a&gt; extend Kubernetes with
topology awareness and more sophisticated scheduling policies that help
make effective and efficient use of expensive resources (some, like &lt;a href=&quot;https://kubernetes.io/docs/concepts/scheduling-eviction/gang-scheduling/&quot;&gt;gang
scheduling&lt;/a&gt;, are making their way into Kubernetes itself). Distributed application
frameworks like &lt;a href=&quot;https://www.ray.io/&quot;&gt;Ray&lt;/a&gt; can build on top, providing
users with higher-level abstractions tailored for AI workloads (e.g. the
&lt;a href=&quot;https://docs.ray.io/en/latest/cluster/kubernetes/index.html&quot;&gt;kube-ray
operator&lt;/a&gt;
adds custom resources like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RayCluster&lt;/code&gt; for training jobs, or
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RayService&lt;/code&gt; for inference use cases), and managing their lifecycle more
effectively.&lt;/p&gt;
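
&lt;p&gt;For instance, gang scheduling with Volcano boils down to a PodGroup that only admits a job once all of its workers fit (the name and size below are illustrative):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: training-job-a        # illustrative name
spec:
  minMember: 4                # schedule all 4 workers together, or none
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Worker Pods then join the group via the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scheduling.k8s.io/group-name&lt;/code&gt; annotation, so a half-placed job never holds GPUs hostage.&lt;/p&gt;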

&lt;p&gt;This preserves the core principles of the cattle model: we isolate the
complexity of infrastructure behind general purpose abstractions with
just enough knowledge of the workloads that run on top, and allow users
to submit work that runs effectively without the need to dig in lower
level details.&lt;/p&gt;

&lt;p&gt;The cost comes as more layers of complexity that require operators with
expertise in GPU architectures, PCIe topologies, network fabrics, and
virtualization, and users with enough technical knowledge to understand
how to translate their constraints and SLOs to infrastructure
requirements.&lt;/p&gt;

&lt;h2 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h2&gt;

&lt;p&gt;1: The “pets versus cattle” metaphor contrasts two ways of managing
infrastructure. In the old days, we treated each piece of hardware as a
pet: special, unique, with a server name like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Gandalf&lt;/code&gt; (sticker
included), its very own IP address and configuration. Should it break,
it was a Big Deal to replace it with its successor &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Legolas&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Gimli&lt;/code&gt;, or
whatever fit our naming schema [2]. Under the cattle mindset, resources
are just anonymous, indistinguishable units. When Pod &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;4baad43&lt;/code&gt;
dies, the Kubernetes scheduler shrugs and spawns &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;23f13ca&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;2: Naming schemes seemed notably unimaginative: Lord of the Rings
characters, constellations, metal bands… Some of the
&lt;a href=&quot;http://last.fm&quot;&gt;last.fm&lt;/a&gt; servers were a notable exception, but the
naming scheme shall remain in the dark.&lt;/p&gt;
</description>
        <pubDate>Sat, 07 Feb 2026 11:00:00 +0000</pubDate>
        <link>https://varoa.net/2026/02/07/ai-workloads-challenge-the-cattle-model.html</link>
        <guid isPermaLink="true">https://varoa.net/2026/02/07/ai-workloads-challenge-the-cattle-model.html</guid>
        
        
      </item>
    
      <item>
        <title>PoC is a framework of perverse incentives</title>
        <description>&lt;p&gt;Was it ever possible to build a Proof of Concept that didn’t end up
being rushed into production as a nest of bugs, instability, and
technical debt, regardless of any advance warning from engineering that
it will be “just a PoC”? Can we be at ease, now that anyone with a
laptop has a PoC machine gun?&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;Every PoC starts with good intentions. But the road to hell is paved
with good intentions. PoCs are a bit like that movie “Gremlins”, where a
kid sells a cute pet with three warnings (“no sunlight”, “no contact
with water”, “no food after midnight”). The buyer ignores the warnings
and at the first oversight, the pet nibbles on leftover chicken wings
and transforms into a horde of reptilian psychopaths. The kid’s
grandfather, a grumpy elder, knew that usage warnings are useless and
didn’t want to sell the creature in the first place. But nobody wanted
to listen. The kid was too eager to sell. The man was too eager to buy.&lt;/p&gt;

&lt;p&gt;The term PoC functions as a trigger-word, offering a special type of
deliverable that twists what are fair, rational incentives from every
stakeholder into a perverse framework that works against them.&lt;/p&gt;

&lt;p&gt;For engineers, it carries the implicit license to put on headphones and
indulge in coding without the boring guardrails of the daily job: the
jiras, the tests, the documentation, the standard tools and frameworks,
the attention to quality standards. It’s “just a PoC”, so it’ll be fun
and it’ll be fine; we will only need a bit of extra time to polish, to
productise. They become eager sellers of a dangerous product.&lt;/p&gt;

&lt;p&gt;For anyone on the buy side, the PoC is equally attractive. It finds
eager sponsors among anyone with influence on priorities. They see work
that requires surprisingly small budgets, that motivates engineers, that
perhaps carries fuzzy warnings about it being “just a PoC”, but how
risky can this be? It’s “just a PoC”.&lt;/p&gt;

&lt;p&gt;When the demo arrives PMs are excited that such a valuable feature got
done faster and cheaper than usual. Sales is positive that it can really
help close $customer’s contract renewal. Suddenly, the entire
organization is pushing to roll it out by yesterday. Nobody wants to
productise. They want to profit. Bring value to customers. How can that
be wrong?  Warnings are forgotten. Any delay, incomprehensible. Let’s be
agile and iterate later.&lt;/p&gt;

&lt;p&gt;Sometimes engineers manage to push back and rewrite; the delay
frustrates PMs, Sales, and customers alike (“What is it with you engineers
and technical debt, why did you introduce it in the first place?”).
Other times someone pulls $customer’s monthly bill and strong-arms the
release of a duct-taped, half-baked feature, and engineers brace for the
ensuing maintenance tire fire. Every stakeholder has fair and rational
incentives, but the aggregate is a mess.&lt;/p&gt;

&lt;p&gt;The whole point of PoCs was to confirm the viability of an idea. What’s
viable about all this?&lt;/p&gt;

&lt;h2 id=&quot;you-cant-lose-if-you-dont-play&quot;&gt;You can’t lose if you don’t play&lt;/h2&gt;

&lt;p&gt;The best way to avoid falling into the PoC trap is not doing them in the
first place. Which doesn’t mean giving up on small investments to
validate ideas. What we want to avoid is a particular mindset, a way of
approaching and executing this type of project. The antiquarian in
Gremlins wasn’t against all Christmas presents either, just the kind
that will find a way to eat dinner leftovers and stab you.&lt;/p&gt;

&lt;p&gt;Not doing PoCs is about treating those seemingly special projects like
any other. It means resisting the temptation to compromise on the standards
and basic principles that apply to ordinary work. Those short-cuts are a
naive optimization that, yes, accelerates the demo, but does so at the
expense of creating a flawed deliverable.&lt;/p&gt;

&lt;p&gt;For the sake of speed it seems fine to let tests fail, to hack around the
code base breaking logical boundaries that are there for good reasons,
to leave corner cases unimplemented behind TODOs. If quality controls
make those short-cuts impossible on the main line, then let’s find a
toxic workaround like developing on feature branches or the git-flow
nonsense.&lt;/p&gt;

&lt;p&gt;By tolerating short-cuts, engineers trick themselves with a shiny
looking demo that conceals a cluster of delay-fuse bombs of technical
debt and late integration issues.&lt;/p&gt;

&lt;p&gt;The same standards and principles applied to production are the
necessary guardrails to keep emotional impulses and perverse incentives
in check. Work and demo from trunk, respect the same quality checks,
control exposure to users with feature flags or similar mechanisms, etc.
Even if the demo takes a bit longer, the risk that over-excited PMs push
for a premature release becomes smaller and easier to manage.&lt;/p&gt;
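
&lt;p&gt;As a concrete sketch of the exposure-control point (the flag table and
names here are illustrative, not the API of any particular feature-flag
product), a deterministic percentage rollout fits in a few lines:&lt;/p&gt;

```python
import hashlib

# Hypothetical flag table: feature name mapped to a rollout percentage.
FLAGS = {"smart-search-demo": 5}  # expose the demo to 5% of users

def is_enabled(feature, user_id):
    """Deterministically bucket a user into a 0-99 slot and compare it
    against the feature's rollout percentage. Unknown features are off."""
    rollout = FLAGS.get(feature, 0)
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    # True exactly when the user's bucket falls below the rollout percentage
    return bucket in range(rollout)
```

&lt;p&gt;Hashing makes the decision stable per user, so a demo audience can be
pinned at a tiny percentage while everyone else stays at zero until the
feature graduates from experiment to product.&lt;/p&gt;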

&lt;p&gt;The argument that respecting production-grade delivery and operational
standards slows down experimentation has some merit. But it raises the
question: if those standards matter, why is it acceptable to open
back-doors under the PoC banner? Instead, a more reasonable course of
action would be to adapt delivery processes and operational standards to
enable the speed and rate of experimentation that the business requires.&lt;/p&gt;

&lt;h2 id=&quot;scaling-without-over-engineering&quot;&gt;Scaling without over-engineering&lt;/h2&gt;

&lt;p&gt;Another objection to treating a PoC like real work is that production
standards require taking scalability, maintainability and the whole lot
into account. Won’t that turn a quick experiment into an expensive and
over-engineered mess?&lt;/p&gt;

&lt;p&gt;The answer is that we should not confuse considering those factors with
actually designing and implementing for the maximum. In a well-known
talk about scalability, &lt;a href=&quot;http://static.googleusercontent.com/media/research.google.com/en//people/jeff/WSDM09-keynote.pdf&quot;&gt;Jeff Dean
recommends&lt;/a&gt; to
“design for ~10x growth, but plan to rewrite before ~100x [because] the
right design at x may be very wrong at 10x or 100x.”&lt;/p&gt;

&lt;p&gt;There is an implicit point here: scalability is about ranges, more than
fixed points. What matters is awareness of where the initial
implementation stands, and a clear, credible story for how it will scale
if and when needed.&lt;/p&gt;

&lt;p&gt;The PoC mentality adds pressure to design for very concrete points at
the lower end of the range (“the demo will be for just one user”, “it’s
just a couple of engineers working on this system”, “it’s just a
throwaway”). Which leaves little runway before the system saturates and
forces a rewrite.&lt;/p&gt;

&lt;p&gt;The way to avoid that problem is not jumping straight from “throwaway
demo” to “hyperscale”. It is to apply the same principles one would use
in normal work, but adjusting the scale. A design for a single-digit %
of the full production load with enough headroom to support a 10x growth
is enough to keep the over-engineering risk under control.&lt;/p&gt;
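
&lt;p&gt;To make the arithmetic concrete, here is a back-of-the-envelope sizing
helper (the function, parameter names and numbers are purely
illustrative):&lt;/p&gt;

```python
def experiment_capacity(production_rps, fraction=0.05, headroom=10):
    """Size an experiment for a single-digit slice of production load,
    with ~10x headroom before a redesign is needed."""
    initial = production_rps * fraction
    return {"initial_rps": initial, "design_ceiling_rps": initial * headroom}

# For a service doing 20,000 requests/sec in production, the experiment
# targets 1,000 rps and should hold together up to roughly 10,000 rps
# before a rewrite is on the table.
```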

&lt;p&gt;When it comes to maintainability, it is naive to anticipate years of
future maintenance by a hypothetical team of dozens of engineers, and
bloat the experiment with every architectural paradigm in the book.
Most often, it’s about the exact opposite: avoiding marrying anything
that embeds long-term assumptions (architectural patterns, frameworks,
etc.). Deliver a basic, monolithic design with reasonably clear
boundaries that allow for a clean replacement of functional units, or
for slicing them out into larger subsystems, if and when required.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;Call this type of project whatever you want. What matters is refusing to
delude ourselves into a mindset that treats short-cuts as free, warnings
as formalities, and “we’ll productise later” as a plan. A reasonable
combination of discipline and pragmatism allows you to be much more
ambitious. To build experiments that can satisfy the short-term urges of
PMs or sales, and are able to grow fast into real products without a
rewrite, a fire drill, or an apology to the customer who trusted the
demo.&lt;/p&gt;
</description>
        <pubDate>Fri, 30 Jan 2026 11:00:00 +0000</pubDate>
        <link>https://varoa.net/2026/01/30/poc.html</link>
        <guid isPermaLink="true">https://varoa.net/2026/01/30/poc.html</guid>
        
        
      </item>
    
      <item>
        <title>AI-generated code will choke delivery pipelines</title>
        <description>&lt;p&gt;Everyone is focused on the impact of AI on the production of code. But
code isn’t just produced, it has to be consumed: built, packaged,
tested, distributed, deployed, operated. Leveraging AI to amplify the
supply of code will grow already complex systems and accelerate the pace
of change. Without a realistic plan to scale delivery pipelines, we’re
asking for trouble.&lt;/p&gt;

&lt;p&gt;There is already smoke on the horizon. The DORA report on &lt;a href=&quot;https://dora.dev/research/ai/gen-ai-report/&quot;&gt;Generative AI
impact in Software
Development&lt;/a&gt;, released in March
2025, notes how “&lt;em&gt;contrary to our expectations, […] AI adoption is
negatively impacting software delivery performance&lt;/em&gt;.”&lt;/p&gt;

&lt;p&gt;After all, the software industry was already competent at increasing
supply well before AI. The real struggle happened on the systems
downstream. Here is &lt;a href=&quot;https://slack.engineering/circuit-breakers/&quot;&gt;Slack&lt;/a&gt;
in 2022:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;For several years, internal tooling and services struggled to keep up
with 10% month-over-month growth in CI/CD requests from a combination
of growth in 1) internal headcount and 2) complexity of services and
testing. Development across Slack slowed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Saturated delivery pipelines aren’t just a technical headache. They
rate-limit businesses when growing the flow of changes is most important. But
scaling delivery pipelines the way we’re used to won’t work against a
flood of AI-generated code.&lt;/p&gt;

&lt;h2&gt;Software delivery is a trap of compounding complexity&lt;/h2&gt;

&lt;p&gt;The default approach to software delivery is defensive, with emphasis on
capturing defects before they reach production.&lt;/p&gt;

&lt;p&gt;That turns every growth cycle into a storm of compound effects. More
code leads to larger, more complex systems that are exponentially harder
to test and operate. And change has the annoying side effect of breaking
things, so optimizing to produce more code also means optimizing for
more breakage. Work accumulates, and doing it keeps getting harder.&lt;/p&gt;

&lt;p&gt;It is no coincidence that &lt;a href=&quot;https://slack.engineering/balancing-safety-and-velocity-in-ci-cd-at-slack/&quot;&gt;Slack’s growth
pains&lt;/a&gt;
concentrated around testing. “&lt;em&gt;At its worst, our time-to-test-results for
a commit was 95 minutes&lt;/em&gt;”, “&lt;em&gt;test flakiness per pull request consistently
around 50%&lt;/em&gt;”, “&lt;em&gt;cascading failures on overloaded delivery infrastructure&lt;/em&gt;”,
and so on.&lt;/p&gt;

&lt;p&gt;Those are familiar growth pains along the narrative arc that starts with
a couple of Jenkins jobs, and ends with a proliferation of broad,
heavyweight test suites, a hydra of dev / test / staging environments, the
inevitable Rube-Goldberg mechanism (a.k.a., “internal platform”) to
manage the whole mess (provision and maintain infrastructure, keep test
data, component versions, and configuration coherent and aligned with
each other and production). All with the obligation to keep
production-grade SLOs because any malfunction in the delivery pipeline
has the potential to paralyse the business.&lt;/p&gt;

&lt;p&gt;If compound effects bring trouble, how could people get away with
defensive delivery?&lt;/p&gt;

&lt;p&gt;Because growth has natural constraints. Hiring budgets dry up, org charts
get too heavy: growth stops. At that point technical infrastructure gets
a chance to breathe, catch up with the flows of code and reach a stable
equilibrium. Some friction remains, but the speed bumps are tolerable
(Slack &lt;a href=&quot;https://slack.engineering/balancing-safety-and-velocity-in-ci-cd-at-slack/&quot;&gt;was
content&lt;/a&gt;
with reaching “&lt;em&gt;test turnaround time (p95) […] consistently below 18
minutes and significantly more predictable&lt;/em&gt;”.)&lt;/p&gt;

&lt;h2&gt;AI melted the growth constraints that protected delivery
pipelines&lt;/h2&gt;

&lt;p&gt;To an organization that needs to produce more software, hiring budgets or
management overhead become smaller obstacles once $20 copilots are enough
to power up the existing headcount.&lt;/p&gt;

&lt;p&gt;But AI does a lot more than make engineers churn out more code. It expands
the population that can produce it. It shifts tactical trade-offs in
ways that favour creating more code (e.g. when cloning functionality is
cheap, &lt;a href=&quot;https://leaddev.com/software-quality/how-ai-generated-code-accelerates-technical-debt&quot;&gt;the incentive to reuse
dissolves&lt;/a&gt;).
When AI is able to deliver coding tasks based on a prompt, there won’t
be enough copies of the Mythical Man Month to dissuade business folks
from trying to accelerate road maps and product strategies by
provisioning fleets of AI coding agents.&lt;/p&gt;

&lt;p&gt;Will this translate into &lt;a href=&quot;https://www.lesswrong.com/posts/tqmQTezvXGFmfSe7f/how-much-are-llms-actually-boosting-real-world-programmer&quot;&gt;a net increase of actual
value&lt;/a&gt;
or just an &lt;a href=&quot;https://leaddev.com/software-quality/how-ai-generated-code-accelerates-technical-debt&quot;&gt;accumulation of slop and technical
debt&lt;/a&gt;?
Regardless of the answer, there will be more raw code, making every
organisation susceptible to growth pains of a much larger scale than
they were used to.&lt;/p&gt;

&lt;h2&gt;Can’t we automate the problem away?&lt;/h2&gt;

&lt;p&gt;No. Automation will make the problem worse. It does help get more work
done, but when the job is producing software, that translates into added
pressure on delivery pipelines. To get a measure of the impact of
automation on flows of code, consider these charts from &lt;a href=&quot;https://dl.acm.org/doi/pdf/10.1145/2854146&quot;&gt;a classic
Google article&lt;/a&gt; in 2015,
well before AI:&lt;/p&gt;

&lt;div class=&quot;image-grid&quot;&gt;
  &lt;img src=&quot;https://varoa.net/assets/ai-generated-code/google-human-committers-per-week.png&quot; alt=&quot;Chart showing the evolution of human committers per week at Google between 2010 and 2015&quot; /&gt;
  &lt;img src=&quot;https://varoa.net/assets/ai-generated-code/google-commits-per-week.png&quot; alt=&quot;Chart showing the evolution of commit per week at Google between 2010 and 2015, split by human and automated ones&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;In 5 years, Google had ~3x as many human committers, and their commits
grew by roughly the same amount. Meanwhile, the total commit rate
multiplied by 10. The difference was automation: over 60% of all commits
at Google came from tools like Rosie, used for “performing large-scale
code changes” and responsible for ~7,000 commits per month. This isn’t
exclusive to FAANGs: at &lt;a href=&quot;https://varoa.net/paas/infrastructure/platform/kubernetes/cloud/2020/01/02/talk-how-to-build-a-paas-for-1500-engineers.html&quot;&gt;Adevinta’s engineering
platform&lt;/a&gt;,
serving a few hundred engineers, our bots produced 7,500 commits per
month during 2019, and that was just automating basic chores (e.g.
internal dependency updates, configuration changes, etc.).&lt;/p&gt;

&lt;p&gt;AI also boosts automation itself, which gets more effective and broadens
its scope to more complex tasks: writing tests, infrastructure
management, fixing easy bugs based on support tickets, etc. Capabilities
that will keep expanding. This opens immense opportunities by making it
viable to solve challenging, higher-order problems. But that means code.
A lot. Of. Code.&lt;/p&gt;

&lt;h2&gt;The old playbook to scale software delivery is obsolete&lt;/h2&gt;

&lt;p&gt;Scaling delivery pipelines traditionally boiled down to splitting the
larger system into subdomains that enabled independent flows of code.
This was one of the main selling points of modern architecture patterns
like microservices, serverless, event-driven, etc.&lt;/p&gt;

&lt;p&gt;But this strategy is deceptive.&lt;/p&gt;

&lt;p&gt;The final system is still one, so all flows merge in it eventually,
surfacing failures that only manifest when components interact
under real-world conditions. An organization that insists on capturing
issues before production starts hopping across a catalogue of
poor-quality filters:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Allow independent flows all the way to production, exacerbating the
complexity and cost of troubleshooting late integration issues in an
environment that never stops changing.&lt;/li&gt;
  &lt;li&gt;Sequence full-system tests and roll out to production with a
monolithic delivery process. Which adds control, but defeats the point
by slowing down the entire pipeline.&lt;/li&gt;
  &lt;li&gt;Batch N changes in 1 “release” event to amortize time and cost, which
leads to long-lived “rc” branches and stabilization periods, holds up
new development, and has the same effect on troubleshooting (which of
the N changes caused each issue?)&lt;/li&gt;
  &lt;li&gt;Allow independent flows all the way to production after a full system
validation stage. Which leads to the self-feeding Golem made of
flake-prone, time-consuming, broad scope test suites, the environment
replicas they run on, and the Rube-Goldberg apparatus to manage them.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The few organizations that went through sustained growth periods saw
their pipelines overwhelmed. AI will democratize that experience.&lt;/p&gt;

&lt;h2&gt;Scaling software delivery by shifting focus from defense to offense&lt;/h2&gt;

&lt;p&gt;“&lt;a href=&quot;https://how.complexsystems.fail/#4&quot;&gt;How complex systems fail&lt;/a&gt;”, #4:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The complexity of these systems makes it impossible for them to run
without multiple flaws being present. […] Eradication of all latent
failures is limited primarily by economic cost but also because it is
difficult before the fact to see how such failures might contribute to
an accident.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is impossible to predict every failure mode, let alone replicate it
during development. So why obsess with it? Testing code ahead of
production will always add value. But past an initial safety net, focus
should be on building systems, tools, and processes that can absorb the
fast rate of change that will become the norm as AI disseminates across
the industry.&lt;/p&gt;

&lt;p&gt;My rough mental playbook for scaling delivery pipelines in an AI-driven
ecosystem:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Map defensive testing to a diminishing-returns curve, and trim
aggressively. Keep fast, low-footprint tests for single components,
without dependencies on other components or dedicated infrastructure.
Verify both sides of API contracts. Replay actual production data
rather than programming mocks. Ship.&lt;/li&gt;
  &lt;li&gt;Minimize dev / test / staging and similar environments, ideally
removing them completely. Shift relevant test suites to canary / smoke
tests. Add synthetic load directly on production. Get code where the
real action is, and learn to survive there.&lt;/li&gt;
  &lt;li&gt;To measure and track quality, prefer metrics that look at business
outcomes and user experience over properties of code. Product SLAs /
SLOs, error budgets, etc. work better than test coverage, static
analysis, bug counts and QA reports (even if those have their utility.)&lt;/li&gt;
  &lt;li&gt;Get comfortable operating high-entropy environments with a high
rate of change. Design systems that expect failure but minimize its
impact. Provide high-cardinality, context-rich observability that
speeds up detection, diagnosis, and resolution.&lt;/li&gt;
  &lt;li&gt;Expect to accommodate radical changes in testing practices as
production systems start incorporating inference workloads, making
non-deterministic test results a feature rather than an anomaly. Even
worse if the product embeds ML pipelines with continuous workflows of
(re)training, selection, deployment of models, requiring additional
infrastructure, data intensive workflows, etc.&lt;/li&gt;
  &lt;li&gt;Treat security as an additional dimension of failure, rather than a
sporadic event to worry about during quarterly audits. Probe for
holes, in production, continuously.&lt;/li&gt;
  &lt;li&gt;De-emphasize practices based on control and coordination.
    &lt;ul&gt;
      &lt;li&gt;A prominent example is code reviews. They did deliver valuable
outcomes, but at the expense of throttling delivery. The landscape
has changed now. Humans are not the only inhabitants of codebases.
Sustaining a high rate of change is more critical. Software
maintenance economics are upside down. We can’t do code reviews like
we did 10 years ago.&lt;/li&gt;
      &lt;li&gt;Another: API contracts should be able to evolve without coordinating
systems on each side. Normalize having N versions of an API alive in
production at any point in time, with self-directed strategies to
measure usage, deprecation, migrations, etc.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;In some critical systems, a slower, more defensive approach is
non-negotiable (e.g. health care, aviation, etc.). Using AI to augment
the supply of code may not be adequate there.&lt;/li&gt;
  &lt;li&gt;…&lt;/li&gt;
&lt;/ul&gt;
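
&lt;p&gt;As a minimal sketch of the contract-verification and replay idea from
the playbook above (the recorded traffic and handler are hypothetical;
real replay systems capture far richer data at the API boundary):&lt;/p&gt;

```python
# Hypothetical recorded production traffic: request/response pairs
# captured at the API boundary and replayed instead of hand-written mocks.
RECORDED = [
    {"request": {"path": "/v1/users/42"},
     "response": {"status": 200, "body": {"id": 42, "name": "Ada"}}},
]

def check_contract(handler, recorded):
    """Replay recorded exchanges and report the requests for which the
    handler no longer honours the response shape seen in production."""
    failures = []
    for exchange in recorded:
        actual = handler(exchange["request"])
        expected = exchange["response"]
        if actual["status"] != expected["status"]:
            failures.append(exchange["request"]["path"])
            continue
        # Contract check on shape: every field seen in production must
        # still be present (extra fields are a compatible evolution).
        missing = set(expected["body"]) - set(actual["body"])
        if missing:
            failures.append(exchange["request"]["path"])
    return failures
```

&lt;p&gt;Because the exchanges come from production, the suite stays cheap, has
no environment dependencies, and drifts with actual usage instead of
with hand-maintained mocks.&lt;/p&gt;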

&lt;p&gt;I will close with a quote from &lt;a href=&quot;https://increment.com/testing/i-test-in-production/&quot;&gt;Charity
Majors&lt;/a&gt;: “&lt;em&gt;we need
to help each other get over our fear and paranoia around production
systems. You should be up to your elbows in prod every single day. Prod
is where your users live. Prod is where users interact with your code on
your infrastructure.&lt;/em&gt;”&lt;/p&gt;

&lt;p&gt;This was already true when humans were the only ones. We have to make
room for the AIs.&lt;/p&gt;
</description>
        <pubDate>Mon, 07 Apr 2025 11:00:00 +0000</pubDate>
        <link>https://varoa.net/2025/04/07/ai-generated-code.html</link>
        <guid isPermaLink="true">https://varoa.net/2025/04/07/ai-generated-code.html</guid>
        
        
      </item>
    
      <item>
        <title>Why aren&apos;t we all serverless yet?</title>
        <description>&lt;p&gt;The median product engineer should reason about applications as composites of
high-level, functional Lego blocks where low-level technical details are
invisible. Serverless represents just about the ultimate abstraction for this
mindset. Consider &lt;a href=&quot;https://docs.aws.amazon.com/lambda/latest/dg/welcome.html&quot;&gt;AWS
Lambda&lt;/a&gt;’s elevator
pitch: “&lt;em&gt;you organize your code into Lambda functions [which run] only when
needed and scale automatically. You only pay for the compute time that you
consume&lt;/em&gt;”. Engineers can get away without paying attention to infrastructure,
resource allocation, runtime management, cost optimization, or similar concerns
just like they don’t worry about CPU cache coherency algorithms or the electric
grid &lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;And yet, despite an appealing value proposition, the industry pivot to
serverless compute&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; as the default architectural pattern for cloud
applications, hasn’t really happened. As &lt;a href=&quot;https://aws.amazon.com/blogs/aws/aws-lambda-turns-ten-the-first-decade-of-serverless-innovation/&quot;&gt;AWS Lambda turned
10&lt;/a&gt;
last November, Will Larson
&lt;a href=&quot;https://bsky.app/profile/lethain.com/post/3lbad6v3cxc2x&quot;&gt;posted&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;em&gt;Something I’m still having trouble believing is that complex workflows are
going to move to e.g. AWS Lambda rather than stateless containers orchestrated
by e.g. Amazon EKS. I think 0-1 it makes sense, but operating/scaling
efficiently seems hard. […]&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The scepticism seems well justified. In 2023 &lt;a href=&quot;https://www.datadoghq.com/state-of-serverless/&quot;&gt;Datadog
reported&lt;/a&gt; serverless
adoption growing by 3-7% among the major cloud providers, with over
50% of their customers using some version of it. However, there is a
long way from “we use lambda” to “our complex workflows are all
serverless”, and the growth rate seems too incremental to make the case
that an exodus is on the way. The general trend of &lt;a href=&quot;https://a16z.com/why-software-is-eating-the-world/&quot;&gt;software eating the
world&lt;/a&gt; is a simpler
and more reasonable explanation for that level of growth, especially
with AI putting downwards pressure on the cost and cognitive overhead to
generate small pieces of purpose-specific code (a sweet spot for
serverless). Otherwise, the figures don’t really signal a mass
migration.&lt;/p&gt;

&lt;p&gt;What are the sources of friction against serverless?&lt;/p&gt;

&lt;p&gt;This piece will discuss two factors. First, fatigue from the last
paradigm shift to microservices (a term I’ll use as shorthand for
architectures based on small, loosely coupled services with bounded
contexts). That transition was much harder than expected because of
immaturity in tooling and infrastructure, but also because of a critical
gap between technical and organizational readiness. Second, that while
some might consider the industry too conservative, caution is in fact
reasonable because serverless will exacerbate the same type of
challenges that microservices created (many of which are still
not fully resolved).&lt;/p&gt;

&lt;h2 id=&quot;the-post-traumatic-syndrome-of-microservices&quot;&gt;The post-traumatic syndrome of microservices&lt;/h2&gt;

&lt;p&gt;A technological trend can be directionally correct, but direction says
little about timing, which is where most people get burned. Early
adopters share part of the industry bill for maturing a technology, so
in a way, every migration implies a bet that the cost / benefit ratio
will work out either because the maturity gap is already small enough,
or because the new technology will provide outsized benefits.&lt;/p&gt;

&lt;p&gt;Microservices became a canonical example of how easy it is to miscalibrate
that bet. Since the trend started &lt;a href=&quot;https://www.martinfowler.com/articles/microservices.html&quot;&gt;~15y
ago&lt;/a&gt;, these
architectures proved effective to solve real problems of scale,
reliability or productivity. But also showed a heavy reliance on
load-bearing infrastructure and organizational competence that didn’t
exist back then. FAANGs and service providers subsidized tooling and
infrastructure. The rest of the industry paid out of their own pocket
for the real-world projects where solutions were tested on the longer
tail of use cases and a generation of engineers were trained on
them. (Make your own estimates on what % of tech industry funding
may have gone to “break the monolith” projects alone.)&lt;/p&gt;

&lt;p&gt;The positive side of the microservice bubble was that it socialized the
cost of maturing the technology. In 2025 we enjoy a collective knowledge
base of benefits, trade-offs, risks and, especially, contraindications
(knowing when &lt;em&gt;not&lt;/em&gt; to use a technology is a good marker of maturity,
and it’s significant that only in the last couple of years has it become
acceptable to say that a monolith is usually a better starting point).
The negative side was that, in retrospect, many organizations would have
preferred to opt out of the battle-testing part, and wait on the boring
side of the adoption curve.&lt;/p&gt;

&lt;p&gt;The serverless trend may very well be directionally correct too. But a
technical decision maker considering sponsoring a migration will be
justifiably worried about miscalibrating that bet and underestimating the
pending cost to bridge the remaining maturity gap. With microservice
scars still fresh, and immersed in a
&lt;a href=&quot;https://newsletter.pragmaticengineer.com/p/zirp-engineering-practices&quot;&gt;post-ZIRP&lt;/a&gt;
economic environment, this is already a risky proposition.&lt;/p&gt;

&lt;p&gt;Where is the complexity of moving into serverless?&lt;/p&gt;

&lt;h2 id=&quot;load-bearing-infrastructure&quot;&gt;Load-bearing infrastructure&lt;/h2&gt;

&lt;p&gt;A naive mental model for the transition to serverless involves laying
each microservice on the chopping board and fragmenting its constituent
logical components into a collection of smaller functions. At a high
elevation this changes nothing fundamental: a functional Lego block is
conceptually the same whether it’s implemented as an endpoint in a
microservice or a lambda function. But so are an electric and a
combustion car conceptually the same, at least until you’re trying to
refuel.&lt;/p&gt;

&lt;p&gt;High-level abstractions are always supported by load-bearing
infrastructure which may (and should) be invisible, but is never
irrelevant. The transition to electric cars depends on adapting or
rebuilding the energy distribution network that is taken for granted in
combustion cars. The transition to microservices depended on providing a
new stack of technical infrastructure to solve distributed systems
problems that emerged as soon as communication between Lego blocks took
place over the network instead of a motherboard (to wit: on-wire
formats, latency, reliability, data integrity, service discovery,
deployment, observability, troubleshooting, etc.). In a similar way,
fragmenting a microservice into lambdas implies a leap away from a
significant part of the previous load-bearing infrastructure.&lt;/p&gt;

&lt;p&gt;A basic example is dependency injection frameworks like
&lt;a href=&quot;https://github.com/google/dagger&quot;&gt;Dagger&lt;/a&gt;, &lt;a href=&quot;https://github.com/google/wire&quot;&gt;Google
Wire&lt;/a&gt;, &lt;a href=&quot;https://uber-go.github.io/fx/&quot;&gt;Uber
FX&lt;/a&gt;, or &lt;a href=&quot;https://docs.spring.io/spring-boot/reference/using/spring-beans-and-dependency-injection.html&quot;&gt;Spring
Boot&lt;/a&gt;,
which simplify wiring up dependencies among the handful of internal
components that make up a typical microservice. After those components
fragment into a collection of lambdas, the scope expands from a narrow
problem of bookkeeping local references within a process, to one of
orchestrating cloud resources using raw cloud APIs.&lt;/p&gt;
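
&lt;p&gt;To make the contrast concrete, here is roughly what the in-process
version of the problem looks like (illustrative Python, not the API of
any of the frameworks above):&lt;/p&gt;

```python
# Illustrative sketch of in-process dependency wiring, the problem that
# Dagger, Wire, FX or Spring Boot solve inside a single microservice.
class UserRepo:
    def get(self, user_id):
        return {"id": user_id}

class UserService:
    def __init__(self, repo):
        self.repo = repo

    def fetch(self, user_id):
        return self.repo.get(user_id)

def wire():
    """Bookkeeping of local references within one process: cheap to build,
    visible at startup, and checked before any traffic arrives."""
    return UserService(UserRepo())
```

&lt;p&gt;Once UserService and UserRepo become separate lambdas, that one-line
wiring function turns into orchestration of cloud resources (functions,
permissions, triggers, queues) through raw cloud APIs instead of local
references.&lt;/p&gt;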

&lt;p&gt;Similar conversations appear in many other domains besides application
frameworks: observability, continuous delivery, etc. Piggybacking on
tooling and infrastructure for stateless architectures is insufficient
to support large serverless ones. Building purpose-specific abstractions
to fill those gaps makes for interesting engineering challenges, but any
decision maker considering a migration to serverless will more likely see
them as an inconvenient liability, with uncertain cost and a real risk
of tying up engineering capacity that might be better used
for revenue-generating purposes.&lt;/p&gt;

&lt;h2 id=&quot;organizational-readiness&quot;&gt;Organizational readiness&lt;/h2&gt;

&lt;p&gt;Closing technical maturity gaps is not just a matter of building tools,
but also rewiring individuals and organizations to deploy them
effectively. Back in the 2010s, Netflix became one of the key referents
in applying microservices at scale in good part thanks to open sourcing
a vast &lt;a href=&quot;https://netflix.github.io/&quot;&gt;portfolio of internal tools and
infrastructure&lt;/a&gt;. Their cloud architect at
the time, Adrian Cockcroft, said in a &lt;a href=&quot;https://www.infoq.com/presentations/microservices-netflix-industry/&quot;&gt;2023 microservice
retrospective&lt;/a&gt;:
“&lt;em&gt;maybe […] we came up with stuff and we shared it with people and a few
people took it home and tried to implement it before their organizations
were really ready for it.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Most organizations have appetite for speed and growth. But introducing a
technical innovation that optimizes for scale, speed, and productivity
puts pressure on the organization to keep up and renegotiate trade-offs
around decision-making structures, communication dynamics, operational
capabilities, risk tolerance, quality standards, and similar factors to
stay aligned with the purely technical aspects.&lt;/p&gt;

&lt;p&gt;In my experience, this organizational readiness has a significant lag
that manifests most often in delivery pipelines. Nowadays it’s easy to
find technical systems that look on the whiteboard like the canonical
fabric of independent, decentralized services on top of a &lt;a href=&quot;https://landscape.cncf.io/?view-mode=grid&quot;&gt;bingo
card&lt;/a&gt; of modern tooling. But
behind the surface many conceal monolithic delivery processes where
supposedly autonomous teams are forced to parade changes in lockstep
through a byzantine, maintenance-heavy via crucis of “dev”, “staging” or
“pre-prod” environments where broad test suites validate each change
against the entire system before it reaches production. This model has
obvious scalability problems that cancel most of the potential of
microservice architectures (and sometimes make things worse). But
getting the broader organization to buy into other ways of planning,
developing, testing, delivering and operating complex distributed
systems is tough.&lt;/p&gt;

&lt;h2 id=&quot;serverless-amplifies-the-challenges-of-microservices&quot;&gt;Serverless amplifies the challenges of microservices&lt;/h2&gt;

&lt;p&gt;“Death Star” diagrams were commonly used in the early 2010s to represent
the increased complexity inherent to microservice architectures. If
serverless implies that each one fragments further into hyper granular
lambda functions, then the application becomes a Death Star of Death
Stars, which is bound to exacerbate the same problems that the industry
has been grappling with for the past ~15 years. Technical ones might be
solved through sheer investment in tooling and infrastructure, but fewer
organizations are willing to take an unbounded share of that cost.
Rewiring the individual and collective mental frameworks needed to deploy and
operate these architectures effectively will take much longer.&lt;/p&gt;

&lt;center&gt;
&lt;img width=&quot;50%&quot; src=&quot;https://newsletter.varoa.net/content/images/2025/01/image.png&quot; alt=&quot;Old Death Star diagrams from microservice early adopters (Hailo, Netflix, Twitter)&quot; /&gt;
&lt;/center&gt;

&lt;h2 id=&quot;what-are-effective-vectors-for-serverless-adoption&quot;&gt;What are effective vectors for serverless adoption?&lt;/h2&gt;

&lt;p&gt;Headwinds may mean that wholesale migration of workloads is unlikely to
happen in the short-medium term. This does not mean that serverless
lacks a solid value proposition (Simon Wardley has articulated &lt;a href=&quot;https://youtu.be/b7Nc_FJiosk?si=q2Z21FO_7dQF2q0Q&amp;amp;t=1884&quot;&gt;the
case&lt;/a&gt; &lt;a href=&quot;https://medium.com/a-cloud-guru/simon-wardley-is-a-big-fan-of-containers-despite-what-you-might-think-18c9f5352147&quot;&gt;a
few&lt;/a&gt;
&lt;a href=&quot;https://www.serverlesschats.com/110/&quot;&gt;times&lt;/a&gt;), so it is still worth
finding adoption vectors that allow for selective, low-risk, incremental
steps. My blueprint is basically this:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Focus on domains owned by fully autonomous teams, who are already
fluent in developing, deploying, testing, and operating all their software
independently from the rest of the organization, and where
product stakeholders are equally comfortable with that autonomy.&lt;/li&gt;
  &lt;li&gt;Migrate existing workloads that have well-scoped, self-contained
logic, don’t need complex state management, have low to mid-level
traffic and bursty profiles that fit well with cost / performance
trade-offs of serverless (that paragraph works as a prompt for your
favourite LLM, which should spit out some combination of event-driven,
background tasks, glue for lightweight orchestration, user
authentication flows, etc.)&lt;/li&gt;
  &lt;li&gt;AI and LLM integrations deserve their own category. As I mentioned
above, these pretty much satisfy all the above properties, tend to
appear in a more experimental context and drag fewer legacy
dependencies. With AI agents shaping up to be the trending topic of 2025,
the type of architecture described in Anthropic’s &lt;a href=&quot;https://www.anthropic.com/research/building-effective-agents&quot;&gt;“Building effective
agents”&lt;/a&gt;
lends itself well to composites of small bits of business logic encapsulated
in functions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I will close highlighting a central argument in Simon Wardley’s case for
serverless:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“&lt;em&gt;The future is worrying about things like capital flow through your
applications, where money is actually being spent and what functions,
monitoring that capital flow, tying it to the actual value you’re
creating […] All of a sudden, we’ve got billing by function, we can look
at capital flow in applications, we can associate value to the actual
cost.&lt;/em&gt;” (&lt;a href=&quot;https://www.serverlesschats.com/110/&quot;&gt;source&lt;/a&gt;)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To the extent that the main incentive for broad serverless adoption is
not technical, but financial, the sponsor is unlikely to come from
the engineering department. This has non-trivial implications, but I’ll
leave that for another time.&lt;/p&gt;

&lt;p&gt;(Update: &lt;a href=&quot;https://news.ycombinator.com/item?id=42645012&quot;&gt;Hacker News thread&lt;/a&gt;)&lt;/p&gt;

&lt;h2 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h2&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;A different matter is whether a product &lt;em&gt;engineer&lt;/em&gt; should merrily go about being content with never lifting the cover of the abstractions he relies on. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;I will narrow serverless to compute (e.g. AWS Lambda and family) excluding technologies that some people bundle in the term like event buses, queues, API gateways, etc. Those are core primitives for serverless architectures, but it doesn’t seem right to say that someone consuming from an event bus with stateless service deployed in EC2 is actually doing “serverless”. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Thu, 09 Jan 2025 11:00:00 +0000</pubDate>
        <link>https://varoa.net/2025/01/09/serverless.html</link>
        <guid isPermaLink="true">https://varoa.net/2025/01/09/serverless.html</guid>
        
        
      </item>
    
      <item>
        <title>Identifiers are better off without meaning</title>
        <description>&lt;p&gt;Once at &lt;a href=&quot;https://last.fm&quot;&gt;Last.fm&lt;/a&gt; we had an integer overflow in an
identifier field. I can’t recall where exactly. But I do remember that
the inconvenience of having a bunch of Hadoop jobs disrupted while we
rushed to update the relevant type couldn’t spoil the collective pride
for having more than 2 billion of whatever needed so many ids.&lt;/p&gt;

&lt;p&gt;Being frugal with identifiers is seldom a good idea, but for me the
worst identifier-related headaches came from IDs that had semantic
value.&lt;/p&gt;

&lt;p&gt;At &lt;a href=&quot;https://en.wikipedia.org/wiki/Tuenti&quot;&gt;Tuenti&lt;/a&gt; (once the largest
social network in Spain) there was a concept similar to Facebook pages.
Pages had types and subtypes. A page type might have been “group”, which
had subtypes “business” or “community”. Another type could be “place”
with subtypes like “store” or “landmark”.&lt;/p&gt;

&lt;p&gt;Page identifiers were strings composed by concatenating numeric
identifiers of the type, subtype, and then an increment field in a DB.
If you visited &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;https://tuenti.com/p/3_2_6691&lt;/code&gt; you’d instantly know
the meaning. It was a “place” (3) of subtype “store” (2), and the store ID
was 6691. Page IDs decomposed in this way were useful for multiple
purposes: choosing a type-specific controller implementation to
compose the relevant page, routing to a database shard, that kind of
thing.&lt;/p&gt;
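&lt;p&gt;A minimal sketch of how such composite IDs work, and why the taxonomy leaks everywhere (the type and subtype codes are reconstructed from the example above; the helper names are mine):&lt;/p&gt;

```python
# Sketch of Tuenti-style semantic page IDs. Codes reconstructed from the
# example in the text: 3 = "place", subtype 2 = "store".
def compose_page_id(type_id: int, subtype_id: int, row_id: int) -> str:
    """Concatenate type, subtype and a DB increment into a page ID."""
    return f"{type_id}_{subtype_id}_{row_id}"

def parse_page_id(page_id: str) -> tuple:
    """Decompose a page ID back into its semantic parts."""
    type_id, subtype_id, row_id = page_id.split("_")
    return int(type_id), int(subtype_id), int(row_id)

pid = compose_page_id(3, 2, 6691)
assert pid == "3_2_6691"
assert parse_page_id(pid) == (3, 2, 6691)
# The taxonomy is baked into every stored ID: renaming or merging types
# means rewriting IDs in URLs, caches and databases.
```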

&lt;p&gt;At some point Product wanted to change the classification of pages. To
their frustration, this became problematic because the entire taxonomy
was encrusted across the code all the way from URLs to databases.&lt;/p&gt;

&lt;p&gt;Another example are &lt;a href=&quot;https://docs.newrelic.com/docs/new-relic-solutions/new-relic-one/core-concepts/what-entity-new-relic/#entity-synthesis&quot;&gt;New Relic
entities&lt;/a&gt;.
Entities are an abstraction that broadly represents anything that can
send telemetry to New Relic. A host, a Kubernetes cluster, an
application, a JVM or a network router can be entities. Of course those
are all things you want to identify, so each entity has a Globally Unique
IDentifier, or GUID. Every single telemetry datapoint is stamped with
the GUID of the entity that produced it, so GUIDs act as the keystone of
the New Relic platform. Features like Service maps, distributed tracing,
entity relationships, and many others are built upon them.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/newrelic/entity-definitions/blob/main/docs/entities/guid_spec.md&quot;&gt;Entity GUIDs have
meaning&lt;/a&gt;.
An example of a GUID is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1|APM|APPLICATION|23&lt;/code&gt; where 1 is the account,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;APM&lt;/code&gt; is the domain, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;APPLICATION&lt;/code&gt; is the unique type within that
domain, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;23&lt;/code&gt; is a unique identifier within the domain and type. That
application might be running in a host &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1|INFRA|HOST|12&lt;/code&gt;. If we wanted
to store that relation, we’d have an entry in some database saying:&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;1|INFRA|HOST|12&quot; RUNS &quot;1|APM|APPLICATION|23&quot;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Like at Tuenti, these semantics come in handy. If you’re processing
millions of telemetry datapoints per second, it’s useful to tell the
type of the reporting entity on the fly by decomposing the GUID, rather than
performing an expensive lookup to an external service. Account IDs can be
used to route data to cells (&lt;a href=&quot;https://www.youtube.com/watch?app=desktop&amp;amp;v=eMikCXiBlOA&quot;&gt;this talk from Andrew
Bloomgarden&lt;/a&gt;
explains how NR used this pattern to scale).&lt;/p&gt;
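&lt;p&gt;Per the linked spec, a GUID is four fields joined by &lt;code&gt;|&lt;/code&gt;. A hedged sketch of the on-the-fly decomposition and account-based routing described above (the modulo routing is illustrative, not New Relic’s actual scheme):&lt;/p&gt;

```python
# Sketch: decomposing an "account|domain|type|id" GUID on the hot path.
def parse_guid(guid: str) -> dict:
    account, domain, entity_type, entity_id = guid.split("|")
    return {"account": int(account), "domain": domain,
            "type": entity_type, "id": entity_id}

def route_to_cell(guid: str, n_cells: int) -> int:
    # No lookup to an external service: the account is in the GUID itself.
    # Modulo routing is an illustrative stand-in for a real cell mapping.
    return parse_guid(guid)["account"] % n_cells

entity = parse_guid("1|APM|APPLICATION|23")
assert entity["domain"] == "APM" and entity["type"] == "APPLICATION"
assert route_to_cell("1|INFRA|HOST|12", 8) == 1
```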

&lt;p&gt;Domains like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;INFRA&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;APM&lt;/code&gt; corresponded to the original product
verticals at New Relic. Years later Product decided (with good judgement)
that they created unnecessary fragmentation in the user experience.
Types change, get renamed or merged with others. Sometimes (more often
than it seems) entities had to migrate from one account to another. In
all these cases you would be altering the identifier of many entities.&lt;/p&gt;

&lt;p&gt;It was painful but possible to work around many of these problems. But
replacing GUIDs with semantic-free identifiers was simply impossible.
By virtue of being present in thousands of URLs, NRQL queries, etc.
GUIDs had become a public API that thousands of customers relied upon. A
technical solution to replace identifiers would have been a major
project, but doable. What wasn’t possible was to run a find/replace
across the private documentation and workflows of your entire customer
base.&lt;/p&gt;
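&lt;p&gt;For contrast, a sketch of the semantic-free alternative: an opaque identifier plus a mutable record holding the attributes that would otherwise be baked into the ID (names and attributes are illustrative):&lt;/p&gt;

```python
# Semantic-free IDs: meaning lives in a mutable record, not in the ID.
import uuid

entities: dict = {}

def create_entity(**attrs) -> str:
    eid = uuid.uuid4().hex  # no embedded meaning, safe to keep forever
    entities[eid] = dict(attrs)
    return eid

eid = create_entity(domain="APM", type="APPLICATION", account=1)
# Reclassifying is a record update; every stored copy of the ID stays valid.
entities[eid]["domain"] = "OBSERVABILITY"
assert entities[eid]["type"] == "APPLICATION"
```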

&lt;p&gt;If you look closely, the world is full of semantic identifiers.
Sometimes they are hard to avoid, but they are almost always a pain in
the neck, because they embed a specific model of the world, and models
become obsolete faster than we’d like.&lt;/p&gt;

&lt;p&gt;Addresses are notable examples. The “complex and idiosyncratic”
&lt;a href=&quot;https://en.wikipedia.org/wiki/Japanese_addressing_system&quot;&gt;Japanese address
system&lt;/a&gt;
reflects the organic growth of its urban areas. In &lt;a href=&quot;https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Overview&quot;&gt;British postal
codes&lt;/a&gt;
the final part can designate anything from a street to a flat depending
on the amount of mail received by the premises.&lt;/p&gt;

&lt;p&gt;When I was a kid, license plates would give away the province where the
owner lived, causing an &lt;a href=&quot;https://en.wikipedia.org/wiki/Vehicle_registration_plates_of_Spain#1900_to_1970&quot;&gt;array of
nuisances&lt;/a&gt;.
They were alleviated, but only partly, with the adoption of &lt;a href=&quot;https://en.wikipedia.org/wiki/European_vehicle_registration_plate#European_Union&quot;&gt;European
standards&lt;/a&gt;.
The root problem is that, as in the &lt;a href=&quot;https://en.wikipedia.org/wiki/Domain_Name_System&quot;&gt;Domain Name
System&lt;/a&gt;,
identifiers (“Galo’s website”) remain tied to administrative authorities
(“.net”), which can change regulations, or even disappear.&lt;/p&gt;

&lt;p&gt;Nowadays I find most semantic identifiers in resource management. For
some reason, when infrastructure teams define access rules to the
resources of a particular service, they prefer to create a group named
after the team that owns that service, something like
“owner-team-access-list”. Identifiers tied to the org chart don’t like
it when the service moves to another team, or owner-team is reorged
away.&lt;/p&gt;

&lt;p&gt;–&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Update: a commenter on Reddit pointed to another great example of
the problems of identifiers with semantics: the &lt;a href=&quot;https://en.m.wikipedia.org/wiki/German_tank_problem&quot;&gt;German tank
problem&lt;/a&gt;. Do send
more!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Update: Discussion in &lt;a href=&quot;https://news.ycombinator.com/item?id=40247373&quot;&gt;HackerNews&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
</description>
        <pubDate>Wed, 01 May 2024 22:01:00 +0000</pubDate>
        <link>https://varoa.net/2024/05/01/identifiers-are-better-off-without-meaning.html</link>
        <guid isPermaLink="true">https://varoa.net/2024/05/01/identifiers-are-better-off-without-meaning.html</guid>
        
        
      </item>
    
      <item>
        <title>Alert on symptoms, not causes</title>
        <description>&lt;p&gt;When you are bringing a new system to production you know that you ought
to define SLIs, set up instrumentation, alerting, etc. Nowadays there is
an abundance of tooling and infrastructure to extract data from your
service and the entire stack it runs on. But this leaves you with a
problem. What should you do with all that data? Should you put it all on one
dashboard, or on many? What should trigger an alert and wake you up in the
middle of the night? The possibilities are endless, as Lou Reed would
&lt;a href=&quot;https://open.spotify.com/track/6tM8cMX9S4AyRd5sDDrzhN?si=8b60aa7658d74711&quot;&gt;sing&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Teams usually start sifting through their data from the generalised
monitoring dashboards available in every major observability platform.
They find the usual charts with resources like CPU, memory, IO, and
threads. JVM users get heap and non-heap usage, pauses, and collection
times. APIs expose latency, throughput, and HTTP response codes, all
exhaustively broken down by endpoint. There is an understandable urge to
watch for any source of trouble. All that data seems important, so the
next step is setting alert triggers on all of them with whatever
thresholds seem reasonable at the time.&lt;/p&gt;

&lt;p&gt;What follows is a Niagara of alerts that aren’t quite incidents. It could be
that a sudden CPU spike to 91% crosses the threshold, only to be
dismissed as the JVM just cleaning some garbage. The team
will try to fine-tune the alert (“maybe trigger only if it’s 90% for
more than 2 minutes”), only to be hit by another false positive from
whatever other of the dozens of alerts. After much stitching, false
positives will go down. Then, an actual incident will slip undetected
through the patchwork of alert thresholds.&lt;/p&gt;

&lt;p&gt;Sometimes teams will treat this as part of the inevitable toil that goes
with operating production software, like carrying extra weight in your
backpack during the on-call shift.&lt;/p&gt;

&lt;p&gt;Being perhaps too lazy, when I build systems I aspire to make
operational toil so small that on-call feels like a free bonus. Of
course it’s hard to get there because operating software creates
problems that sometimes require human effort. But a deluge of noisy
instrumentation isn’t one of those problems. Don’t resign yourself to
it.&lt;/p&gt;

&lt;h2 id=&quot;symptoms-before-causes&quot;&gt;Symptoms before causes&lt;/h2&gt;

&lt;p&gt;To turn instrumentation data into high signal my default approach is to
start by focusing on symptoms before causes.&lt;/p&gt;

&lt;p&gt;This often feels counter intuitive to engineers. Our instinct says that
high CPU usage is a harbinger for trouble, so it would seem wise to
alert when it exceeds that threshold. While there is nothing
intrinsically wrong with this, it quickly becomes impractical. There are
innumerable other causes for trouble. Just like high CPU usage, you
could take each of the dozens of “seems quite important” default metrics
produced by your instrumentation framework, and think of thresholds
above which things might start going awry.&lt;/p&gt;

&lt;p&gt;It’s a case of how an aggregation of good individual decisions sometimes
produces a negative outcome. High CPU usage might signal a failure with,
for example, 75% accuracy. But that also means a 25% chance of a
false positive. Every time this alert triggers preemptively it places
a burden on the operator to confirm whether it is actually relevant.
As you add more metrics for the many other possible causes of trouble,
the odds of suffering false positives accumulate to unpleasant levels
quite fast. Discomfort is not even the biggest problem with noisy
alerts. Lots of alerts with a low signal-to-noise ratio slowly induce your team
to ignore them, miss actual incidents, and worsen their impact.&lt;/p&gt;
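&lt;p&gt;The arithmetic behind that accumulation is worth making explicit. Assuming independent alerts, each with false-positive probability &lt;em&gt;p&lt;/em&gt; per evaluation window (the numbers below are illustrative):&lt;/p&gt;

```python
# Why noise compounds: with n independent alerts, each with
# false-positive probability p per evaluation window, the chance of
# at least one false page is 1 - (1 - p)^n.
def p_any_false_positive(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

# One 25%-noise alert is tolerable; ten of them page almost every window.
print(round(p_any_false_positive(0.25, 1), 2))   # 0.25
print(round(p_any_false_positive(0.25, 10), 2))  # 0.94
```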

&lt;h2 id=&quot;going-beyond-just-in-case-alerting&quot;&gt;Going beyond “just in case” alerting&lt;/h2&gt;

&lt;p&gt;The cause-first approach is “just in case” alerting. A symptom-first
approach is less about probabilities than certainties. It’s not about
answering “what might cause a problem” but “what external behaviour
manifests that my service is not healthy”.&lt;/p&gt;

&lt;p&gt;An example of a symptom could be an API’s latency going through the
roof. We don’t know what is causing it; it could be CPU or a million
other things. But we are positive that the service is not doing its
job as expected, so we know that the PagerDuty horn that’s about to
bring the sleeping engineer into cardiac arrest is worth blowing.
It is at this point that the knowledge and intuitions about probable
causes and the fancy instrumentation you collected from the system
become useful and usable. You start with the symptom: latency. Then
you move on to check probable causes, like CPU usage or GC activity.
If those aren’t the actual causes, you have a treasure trove of
telemetry at your disposal to search for anomalies that might explain
the problem that you do know you have.&lt;/p&gt;

&lt;h2 id=&quot;causes-for-failure-are-permanent-residents&quot;&gt;Causes for failure are permanent residents&lt;/h2&gt;

&lt;p&gt;Another problem with focusing on causes is that after a system reaches
a certain level of complexity (by 2024, even the most trivial software
sits on top of complex systems), causes for failure are not only
diverse. They are also everywhere. This is Richard Cook on “&lt;a href=&quot;https://how.complexsystems.fail/#3&quot;&gt;How
complex systems fail&lt;/a&gt;”:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;3. Complex systems contain changing mixtures of failures latent within
them. The complexity of these systems makes it impossible for them to
run without multiple flaws being present. […]

4. Complex systems run in degraded mode. Complex systems run as broken
systems. The system continues to function because it contains so many
redundancies and because people can make it function, despite the
presence of many flaws.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;His point is that in a complex system, there are always potential
causes of trouble lying somewhere in between the cogs. You can’t blow
the horn continuously just in case the potential cause did create an
actual problem this time. When something isn’t exceptional, alerting
on its presence loses most of its value and, as we noted above, tends
to become counter-productive by desensitising operators.&lt;/p&gt;

&lt;p&gt;Instead, your system will be better off treating the presence of
potential causes of failure as normal operation. It should work
despite being exposed to them, and trigger an alert only when it is
unable to perform its function and attention from operators is really
necessary.&lt;/p&gt;

&lt;h2 id=&quot;good-slos-are-almost-always-symptoms&quot;&gt;Good SLOs are almost always symptoms&lt;/h2&gt;

&lt;p&gt;Answers to “what external behaviour manifests that my service is not
healthy” tend to overlap with answers to “how would a user tell that the
service doesn’t work” or “how would the business tell that we’re making
money”. Symptoms get you talking about page loading times, conversions,
successful page loads, and so on. Terms closer to user experience and
business objectives. They may seem too far away from the guts of the
system, but will give you clarity and purpose. It’s almost impossible to
figure out how much time per month you can tolerate with &amp;gt;90% CPU
utilisation. It’s much easier to figure out what is the acceptable
percentage of failed requests.&lt;/p&gt;

&lt;p&gt;Being on the same wavelength as the business gives engineers more
control over alert fatigue and toil. The handful of symptoms that become
SLOs lets you drastically reduce the number of alerts you need to
configure and attend to.&lt;/p&gt;

&lt;p&gt;SLOs also imply an error budget, and knowing how much failure is
acceptable in a period of time gives your team the option to keep the
PagerDuty horn silent until your error budget is at risk. This doesn’t
mean you should become complacent and tolerate errors, but that you can
(and should!) administer the demand for your team’s limited energy.
Similarly, associating each cause of failure with its effect on SLOs
helps prioritise investment in the most impactful ones (which won’t
necessarily be the most frequent!).&lt;/p&gt;
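&lt;p&gt;For illustration, the back-of-the-envelope arithmetic behind an error budget (the 99.9% target and 30-day window are example figures, not from the text):&lt;/p&gt;

```python
# Error budget arithmetic: a 99.9% success SLO tolerates 0.1% failure
# over the SLO window, whether measured in minutes or in requests.
def error_budget(slo: float, total: float) -> float:
    """Allowed failures given an SLO and a total (minutes, requests...)."""
    return (1 - slo) * total

minutes_in_30_days = 30 * 24 * 60  # 43200
print(round(error_budget(0.999, minutes_in_30_days), 1))  # about 43 minutes of downtime
print(round(error_budget(0.999, 1_000_000)))              # about 1000 failed requests
```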

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;When you define health alerts for your systems it’s more useful to start
with symptoms of trouble, rather than potential causes.&lt;/p&gt;

&lt;p&gt;Causes usually look at resources (CPU, memory, IO, etc.) or internal
processes (GC, thread scheduling, etc.), but the presence of a cause
does not guarantee an issue. This places the burden of confirmation on the
operator, which gets multiplied because there are many potential causes
of failure. In a complex system, they are also always present. All
these factors create excess alert noise, fatigue operators, and make
them less effective at keeping the system healthy.&lt;/p&gt;

&lt;p&gt;Symptoms look at high-level function (requests served in time, data
flowing, payments being processed). Symptoms are few, and provide
reliable indicators that the system is unable to perform its job. An
alert gives near certainty that attention from the operator is necessary.
Focusing on symptoms aligns engineering priorities with those of the
business, and helps define SLOs and error budgets that guide engineering
towards more effective and efficient use of their effort, and gives
control over operational toil.&lt;/p&gt;

&lt;h2 id=&quot;some-further-reading&quot;&gt;(Some) further reading&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;“&lt;a href=&quot;https://charity.wtf/2019/09/20/love-and-alerting-in-the-time-of-cholera-and-observability/&quot;&gt;Love (and Alerting) in the Time of Cholera (and
Observability)&lt;/a&gt;”.&lt;/li&gt;
  &lt;li&gt;“&lt;a href=&quot;https://bravenewgeek.com/choosing-good-slis/&quot;&gt;Choosing good SLIs&lt;/a&gt;”.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://sre.google/books/&quot;&gt;Google’s SRE bookshelf&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
        <pubDate>Wed, 06 Mar 2024 12:00:00 +0000</pubDate>
        <link>https://varoa.net/2024/03/06/alert-on-symptoms-not-causes.html</link>
        <guid isPermaLink="true">https://varoa.net/2024/03/06/alert-on-symptoms-not-causes.html</guid>
        
        
      </item>
    
      <item>
        <title>How about we forget the concept of test types?</title>
        <description>&lt;p&gt;I have found that the concept of test types (unit, integration, and so
on) does more harm than good. People often give me odd looks when I say
this, so having the explanation in a URL will come in handy for future
reference. I will use it as an introduction to a series of practical
case studies on the topic of testing software.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;Test types shape the way engineers and even adjacent professionals
reason about testing software. This goes far beyond how an engineer may
decide to test a single change. It also influences how teams and entire
organisations analyse, design, implement, and evolve their testing
pipelines. Which is no joke, given that test writing, test
infrastructure, and test tooling account for around 30-40% of the cost
of maintaining software (ChatGPT or Google queries are consistent in
that ballpark, which roughly matches my experience). This cost is well
justified by the impact of testing on any company whose product depends
on building software.&lt;/p&gt;

&lt;p&gt;Imagine you’re in one of those organisations. It might be new, and
growing. How do you structure the testing pipelines, the infrastructure,
and the principles and “best practices” that govern work inside each
individual engineering team and across them? How do you find the right
balance between quality and velocity? Do you need a QA team? How will all
those elements behave when growing pains appear? Will they adapt and
keep up with the business, or will your engineering machinery grind to a
halt?&lt;/p&gt;

&lt;p&gt;What happens if the organisation was consolidated, and already at that
breaking point? Suffering quality issues, slow development, gruelling
and unpredictable delivery cycles. How do you approach the task of
improving the quality of the overall product, and each of its
components? How do you identify the necessary changes in the existing
testing pipeline? How do you convince leadership to fund them? How do
you execute those changes without disrupting the business? When these
changes impact the way product teams develop, test, and distribute their
software, how do you exercise an effective influence on their (likely
reluctant) engineers, their tech leads, their engineering and product
managers? What type of shared testing infrastructure needs to be built
to support all those teams, and which should never be built, even if
teams ask for it? Was there a QA team already? Does their role change?
How different are the analysis and the solutions if the product is not
just a web site with a bunch of server-side components, but also has a
mobile application, or native ones, or firmware? How do you know if you’re
making progress?&lt;/p&gt;

&lt;p&gt;Having solid foundations to reason about your testing is essential to
answer any of those questions.&lt;/p&gt;

&lt;p&gt;Test types consolidated in that foundational role when Mike Cohn
introduced the Test Pyramid in “&lt;a href=&quot;https://www.amazon.es/Succeeding-Agile-Software-Development-Using/dp/0321579364?&amp;amp;linkCode=ll1&amp;amp;tag=avr0b-21&amp;amp;linkId=76908430c11232694d8007c9d428887b&amp;amp;language=es_ES&amp;amp;ref_=as_li_ss_tl&quot;&gt;Succeeding with
Agile&lt;/a&gt;”
(2010). The key concept it put in the collective mindset was that you
can classify tests in types which are then laid down as layers. Bottom
to top, these were “unit”, “service” (nowadays perhaps more commonly
known as “integration”), and “user interface”. You want to have more of
the lower ones, fewer of the upper ones.&lt;/p&gt;

&lt;p&gt;Here is &lt;a href=&quot;https://www.mountaingoatsoftware.com/blog/the-forgotten-layer-of-the-test-automation-pyramid&quot;&gt;Cohn
himself&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&quot;At the base of the test automation pyramid is unit testing. Unit
testing should be the foundation of a solid test automation strategy and
as such represents the largest part of the pyramid. Automated unit tests
are wonderful because [...]&quot;
&lt;/blockquote&gt;

&lt;p&gt;This says something about their importance relative to other types but
nothing about how to distinguish them. I couldn’t find a clearer
definition in “Succeeding with Agile”. In my experience, when people
talk about unit tests they imply a focus on verifying a narrow surface
of a code base, although it’s unclear how narrow.
&lt;a href=&quot;https://en.wikipedia.org/wiki/Unit_testing&quot;&gt;Wikipedia&lt;/a&gt; says that a
“&lt;em&gt;Unit is the smallest component that can be isolated within the complex
structure of an app. It could be a function, a subroutine, a method or
property&lt;/em&gt;”, but it comes along with a “&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;citation needed&lt;/code&gt;” and if I take
that definition seriously, then Cohn’s later point that Service testing
“fill[s] the gap between unit and user interface testing” sounds like
defining water surfaces as “ponds” and “everything else”.&lt;/p&gt;

&lt;p&gt;So let’s consult more consultants. Martin Fowler explains that when
writing unit tests at C3 they would “set up a test fixture that created
that object with all the necessary dependencies so it could execute its
methods.” Now we seem to be at the granularity of an object. But in the
same explanation he &lt;a href=&quot;https://martinfowler.com/articles/2021-test-shapes.html&quot;&gt;provides a
quote&lt;/a&gt; from
Kent Beck’s &lt;a href=&quot;https://www.amazon.com/Extreme-Programming-Explained-Embrace-Change/dp/0321278658?crid=2Q9PJD594CD0M&amp;amp;keywords=extreme+programming+kent+beck&amp;amp;qid=1704621833&amp;amp;s=books&amp;amp;sprefix=extreme+programming+kent+be%2Cstripbooks-intl-ship%2C259&amp;amp;sr=1-1&amp;amp;linkCode=ll1&amp;amp;tag=avr0b-20&amp;amp;linkId=f6e4c6e224d48f82be27de28db318fd5&amp;amp;language=en_US&amp;amp;ref_=as_li_ss_tl&quot;&gt;Extreme
Programming&lt;/a&gt;,
which Fowler reads as meaning that “&lt;em&gt;’unit test’ means anything written
by the programmers as opposed to a separate testing team&lt;/em&gt;”.&lt;/p&gt;

&lt;p&gt;That scope seems panoramic compared with “&lt;em&gt;the smallest component that
can be isolated within […] an app&lt;/em&gt;” that we read in Wikipedia and many
other sources. Besides, linking the type of test to who writes it is
also problematic: I know of many organisations where programmers write
most tests, even those that verify large areas of the system or the UI.
Does this mean that tests will be “unit” in one company but not in
another?&lt;/p&gt;

&lt;p&gt;According to Martin Fowler, &lt;a href=&quot;https://martinfowler.com/articles/2021-test-shapes.html&quot;&gt;there
was&lt;/a&gt;
“considerable discussion” about Kent Beck’s formulation to the point
that “&lt;em&gt;one test expert vigorously lambasted Kent for his usage&lt;/em&gt;”. They
asked the expert for his definition of unit testing and he replied that
“&lt;em&gt;in the first morning of my training course I cover 24 different
definitions of unit test.&lt;/em&gt;”&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;We can’t expect much of a conceptual framework based on test types when
the key terms, in Fowler’s own words, “&lt;em&gt;have always been rather murky,
even by the slippery standards of most software terminology&lt;/em&gt;”. It
certainly explains why using them in any conversation about testing
among engineers opens the proverbial can of worms: the slimy creatures
crawl out and cause trouble everywhere.&lt;/p&gt;

&lt;p&gt;But it’s worse than that. If engineers can’t have a meaningful
conversation about testing, then the communication with business
stakeholders about quality and software delivery is doomed to be nothing
but dysfunctional.&lt;/p&gt;

&lt;p&gt;I have seen my share of the consequences. Organisations that take the
pyramid to heart and over-invest in the base, with a multitude of
hyper-granular unit tests for individual functions and methods. They run
into the law of diminishing returns. Issues proliferate at the joints
between pieces that were considered individually correct. Business
stakeholders are unimpressed by the high coverage numbers and would
rather see the product work after assembling the pieces. The “&lt;a href=&quot;https://www.reddit.com/r/ProgrammerHumor/comments/dw8s1i/2_unit_tests_0_integration_tests/&quot;&gt;two unit
tests, zero integration
tests&lt;/a&gt;”
memes circulate. Teams get burnt by those problems; some dismiss unit
tests as “encumbering and superficial” and conclude that &lt;a href=&quot;https://news.ycombinator.com/item?id=30942020&quot;&gt;unit testing
is overrated&lt;/a&gt;. They
decide that they need less of that type, and more of the comprehensive
types that exercise complete use cases across wider surfaces of the
system. A good number of those teams later end up buried under the
weight of byzantine test infrastructures that silently grew slow and
unscalable in technical dimensions, organisational dimensions, or both.
In the meantime the business grows more and more frustrated with the
slow pace of delivery.&lt;/p&gt;

&lt;p&gt;All the way through this mess, people try to figure out what went wrong.
They have opinionated debates around the company’s or the internet’s
water cooler that seldom reach a conclusion. Maybe the problem is in the
choice of shape? What if &lt;a href=&quot;https://twitter.com/swyx/status/1261202288476971008&quot;&gt;the Pyramid has fallen out of
style&lt;/a&gt;? Perhaps we
should get more creative. Let’s try the &lt;a href=&quot;https://thetestingarchitect.substack.com/p/test-pyramid-test-honeycomb-test&quot;&gt;Test
Trophy&lt;/a&gt;.
&lt;a href=&quot;https://engineering.atspotify.com/2018/01/testing-of-microservices/&quot;&gt;The Test
Honeycomb&lt;/a&gt;.
&lt;a href=&quot;https://web.dev/articles/ta-strategies&quot;&gt;The Test Diamond&lt;/a&gt;. &lt;a href=&quot;https://web.dev/articles/ta-strategies&quot;&gt;The Test
Crab&lt;/a&gt;. We were not able to
define one type properly but why not add more? Component tests. API
tests. Fowler himself proposes &lt;a href=&quot;https://martinfowler.com/bliki/SubcutaneousTest.html&quot;&gt;Subcutaneous
tests&lt;/a&gt;, &lt;a href=&quot;https://martinfowler.com/bliki/BroadStackTest.html&quot;&gt;Broad
stack tests&lt;/a&gt;,
&lt;a href=&quot;https://martinfowler.com/bliki/UnitTest.html&quot;&gt;Solitary and Sociable
tests&lt;/a&gt;. Sometimes they
depend on how much code surface is touched. Other times on how that
code is structured. On what part of the stack the code belongs to. On
how tests are written, or on who writes them. Anything goes.&lt;/p&gt;

&lt;p&gt;The whole thing reminds me of Borges’ &lt;a href=&quot;https://en.wikipedia.org/wiki/Celestial_Emporium_of_Benevolent_Knowledge&quot;&gt;Celestial Emporium of Benevolent
Knowledge&lt;/a&gt;.
All are valid classifications, none authoritative, and their utility is
limited to some contexts. Which is fine for certain topics. But when it
comes to testing software, the importance of the subject deserves, if
not demands, that software engineers have a solid vocabulary
to hold a rational conversation, among themselves and with stakeholders.&lt;/p&gt;

&lt;div class=&quot;image-box&quot;&gt;
  &lt;img src=&quot;https://varoa.net/assets/test_types/wooden_shapes.png&quot; alt=&quot;A shape sorting toy made out of wood, with coloured pieces&quot; /&gt;
&lt;/div&gt;

&lt;p&gt;How about we stop being content with “slippery standards” in our
professional terminology and pay less attention to these murky terms?
What if, instead, we focus on what we want to achieve and frame the
problem as what it is: a system. After all, a test pipeline is just that:
a system whose inputs are code and related artefacts produced by
development teams, and whose function is to give as much confidence as
possible in whether they work as expected. We want this system to remain
efficient, performant, cost-effective, and scalable to keep up with the
needs of the business. The usual drill.&lt;/p&gt;

&lt;p&gt;With this approach the problem stops being about shoehorning the
complexity of modern software in awkwardly shaped geometric ideals that
fit someone’s wooden box. Instead, we are designing a system with a
clear purpose. We are doing engineering.&lt;/p&gt;

&lt;p&gt;What do we find if we look at a testing pipeline from a systems
perspective? Well, one can think about three basic, familiar properties
we always care about: latency, throughput, error rate.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Latency tells us how long it takes to verify a given change, to run a
test, or the full test suite. It measures the length of our feedback
loops.&lt;/li&gt;
  &lt;li&gt;Throughput tells us how many of those verifications we can run per unit
of time.&lt;/li&gt;
  &lt;li&gt;Error rate tells us what percentage of test executions fail to do
their job. Note that this is not the same as failed tests (a failed
test that catches a regression is a successful execution!). Errors are
false positives (regressions that pass all tests and therefore slip
through our safety net), or false negatives (often flakes that fail
tests even though there wasn’t an actual regression).&lt;/li&gt;
&lt;/ul&gt;
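&lt;p&gt;As a minimal sketch of this framing (the record shape, field names, and
numbers below are hypothetical, not taken from any real pipeline), the three
properties can be computed from a log of pipeline executions:&lt;/p&gt;

```python
# Hypothetical sketch: a test pipeline seen as a system, measured by the
# three properties above. All names and data here are illustrative.
from dataclasses import dataclass

@dataclass
class Execution:
    duration_s: float         # wall-clock time of this pipeline run
    verdict: str              # "pass" or "fail"
    regression_present: bool  # ground truth, known only after the fact

def latency_p50(executions):
    """Median time to verify a change: the length of the feedback loop."""
    times = sorted(e.duration_s for e in executions)
    return times[len(times) // 2]

def throughput(executions, window_s):
    """Verifications completed per unit of time."""
    return len(executions) / window_s

def error_rate(executions):
    """Share of executions that failed to do their job: false positives
    (a regression passed everything) or false negatives (a flake failed
    a good change). A test failing on a real regression is a SUCCESSFUL
    execution, so it does not count as an error."""
    errors = sum(
        1 for e in executions
        if (e.verdict == "pass" and e.regression_present)
        or (e.verdict == "fail" and not e.regression_present)
    )
    return errors / len(executions)
```

&lt;p&gt;Nothing here is specific to any test “type”: a two-second unit test and a
twenty-minute e2e suite are just executions with different values for these
three properties.&lt;/p&gt;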

&lt;p&gt;These are not groundbreaking ideas! They permeate the literature about
software testing in one form or another (the case for preferring unit
tests boils down to trade-offs around latency, error rate, and throughput).
But for some reason types, categories, and shapes take the spotlight and
dominate the discussion. Bringing the focus back to the domain of
systems design, rather than abstract classification games, helps us reason
about problems around testing much more productively.&lt;/p&gt;

&lt;p&gt;As I mentioned above, this is meant to be an introduction to a short
series of posts about testing software. The next posts will be practical
applications of a systems perspective:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Analyze a testing pipeline with multiple teams involved, which suffers
many of the pathologies described above. It will be based on real
examples that I have found in the wild. I will model these pipelines
as an actual system and show how this gives us a much better
understanding of what we can do to improve the situation.&lt;/li&gt;
  &lt;li&gt;Show concrete interventions that can be implemented, from the
individual team level to the larger organisation, and use our model of
the system to observe and measure the impact. I will also try to
reference real work done in some of the companies I’ve worked with.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’m curious about how people design their own testing pipelines,
challenges, and useful patterns, so anything you want to share will be
very welcome.&lt;/p&gt;
</description>
        <pubDate>Tue, 06 Feb 2024 21:00:00 +0000</pubDate>
        <link>https://varoa.net/2024/02/06/how-about-we-forget-the-concept-of-test-types.html</link>
        <guid isPermaLink="true">https://varoa.net/2024/02/06/how-about-we-forget-the-concept-of-test-types.html</guid>
        
        
      </item>
    
      <item>
        <title>How organisations cripple engineering teams with good intentions</title>
        <description>&lt;p&gt;I believe that engineers are at their best when they complement strong
technical expertise with skills from other disciplines such as product,
project and people management, customer support, HR, finance, UX, and
many others. I believe that any software engineer should structure their
growth plan to acquire the basics of some of those disciplines. I
recommend undergoing a &lt;a href=&quot;https://charity.wtf/2019/01/04/engineering-management-the-pendulum-or-the-ladder/&quot;&gt;tour of
duty&lt;/a&gt;
wearing one of those hats.&lt;/p&gt;

&lt;p&gt;I also believe that engineering teams are better when they are not
limited to executing technical work, but also understand why. Engineers
power-up when they have a clear understanding of the business and
product strategy. When they are involved with other domain experts
(product, project, people managers, customer support, HR, finance, UX,
etc.) in designing the organisational structure and processes that
govern their day-to-day technical work.&lt;/p&gt;

&lt;p&gt;Some may disagree with this. That’s fine, but that’d be a different
discussion. Here I assume that we agree on those points, and therefore
we should want to design organisations to pursue those goals. Where
engineers are supported and encouraged to develop a diverse toolbox of
skills from other disciplines, and where engineers are supported and
encouraged to participate in more aspects of the business than merely
typing code.&lt;/p&gt;

&lt;p&gt;Here I want to discuss how attempts to implement these worthy objectives
often backfire.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;Let me present a simplified version of a pattern I’ve witnessed a few
times.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Leadership learns from product manager feedback that technical
decisions are not well aligned with user needs. They diagnose
(rightly) that engineering is not close enough to the user. Since we
have an interest in augmenting engineers with product management
skills, it seems like a good idea to introduce a change in the
organisation’s processes so that engineers spend more quality time
with PMs and customers when defining epics / stories. It’s hard to
argue with this! It makes total sense.&lt;/li&gt;
  &lt;li&gt;Some time later UX designers raise that the software is disconnected
from the actual user experience. After a similar analysis, it seems
like a good idea to modify our processes to allow engineers to spend
more quality time with UX designers when designing features. Again,
it’s hard to argue with this! It makes total sense.&lt;/li&gt;
  &lt;li&gt;Some time later leadership notices that project management work is
falling through the organisational cracks which harms delivery,
quality, etc. They realise that this is a good opportunity to help
engineers develop their project management skills, so we incorporate
some project management responsibilities into engineering teams.
Again, this makes sense!&lt;/li&gt;
  &lt;li&gt;Then leadership meets with Customer Support.&lt;/li&gt;
  &lt;li&gt;Then, with Sales.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You see where I’m going, right?&lt;/p&gt;

&lt;p&gt;Here is a simplified version of another common pattern.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Product stakeholders are defining product priorities for the quarter.
We want an inclusive work environment where any engineer can
contribute ideas for the product. This all makes sense, so we ask the
managers to work with their teams in proposing ideas for initiatives
and projects.&lt;/li&gt;
  &lt;li&gt;HR are looking to improve social media presence and attract leads to
the hiring pipeline. Having Engineer-generated content in the
corporate blog would be powerful! It also helps engineers build
writing skills, get a public presence. Let’s ask engineers to write!&lt;/li&gt;
  &lt;li&gt;Hiring processes are about to be redefined. We dig inclusiveness. We
want engineers engaged and involved. The hiring managers and HR ask
teams to get the engineering hive mind to work and crunch some
proposals.&lt;/li&gt;
  &lt;li&gt;Customer support needs new standards, some consolidation of processes
and tools.  Engineer feedback and engagement is valuable! We ask each
team’s manager to collect feedback and ideas from their teams.&lt;/li&gt;
  &lt;li&gt;And so on.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;A sum of individual good ideas doesn’t guarantee a good outcome&lt;/h2&gt;

&lt;p&gt;It’s a common mistake to assume that by adding up rational individual
decisions you get a good aggregate outcome. It’s usually the
opposite. In an evacuation, running for the exit is a sensible
individual decision, but aggregate those decisions and you get a lethal
stampede.&lt;/p&gt;

&lt;p&gt;Something similar happens in our little stories. Making process changes
to bring PM, UX, Customer Support, etc. closer to engineers, or asking
engineering teams to propose ideas for organisational aspects makes
sense as individual decisions. But put together, they can have bad
consequences which harm engineering teams (and by extension, the larger
organisation). As we implement them, we suddenly realise that engineers
barely spend time in actual engineering.&lt;/p&gt;

&lt;p&gt;Now, I know this is a trigger for many people. They think: “Hah! Here we
go again! An engineer arguing that engineers should be left alone with
the code”. This is not what I’m saying (I said the very opposite in the
first paragraph!). But my strong belief in multi-dimensional engineers
who are engaged with the larger organisation is compatible with the
belief that engineers should spend most of their time on engineering.&lt;/p&gt;

&lt;p&gt;That statement is not even controversial if you swap industries. Think
construction. There is value in bricklayers, plumbers, electricians,
designers and architects knowing some of each other’s skills. Still,
bricklayers must spend a substantial amount of their time laying bricks.
Electricians wiring. Plumbers plumbing. Architects architecting. Or else
the building won’t get done.&lt;/p&gt;

&lt;p&gt;Software engineering is no different. Engineers need to spend time
doing engineering. Who else will do the technical designs, and write,
review, test, and operate software?&lt;/p&gt;

&lt;p&gt;A more measured objection to my point is to accept that while engineers
should spend time engineering, they should still spend some of their
time on activities outside of their core expertise area. Sure. I agree.
But “some” is quite broad. Let’s be more precise. What % of time spent
on core engineering activities are we talking about? 50%? 25%? Give me a
ballpark.&lt;/p&gt;

&lt;h2&gt;Minimum reasonable focus&lt;/h2&gt;

&lt;p&gt;Or rather, let’s switch places. What is the maximum % of time on
activities not related to the core expertise that you consider
reasonable for your own role? Put differently, what is the minimum % of
time that a PM should spend on product management tasks? Is 50%
reasonable or does it seem too low? It does sound low to me! Isn’t it
more like 70%? Would it be reasonable for a UX designer to spend less
than 70% of their time on core UX design activities? What about people
managers? Customer Support?&lt;/p&gt;

&lt;p&gt;But let’s go further. I guess we agree that multi-dimensionality also
applies to non-engineering disciplines: that they are also enriched by
acquiring the basics of engineering (among others). So let me ask the
PMs, UX designers, people managers, HR, sales, and customer support folks
in the room: what % of your time do you spend doing core engineering
activities like code, tests, code reviews, technical documents,
operations, and so on? I am fairly confident that if I took the average
of this poll, it would round to low single digits. That sounds
reasonable! But doesn’t that mean that having engineers spend low
single-digit %s of their time on those non-engineering activities is
also reasonable?&lt;/p&gt;

&lt;p&gt;Building a house needs bricklayers, architects, plumbers, designers,
electricians, and so on. All matter. All are valuable. All benefit from
learning the basics about the other’s expertise area. And yet, plumbers
spend most of their time plumbing. The many experts involved in building
software are in exactly the same situation. All matter. All are
valuable. All are richer when they learn the basics of the other’s
expertise area. And yet, all need to dedicate a substantial % of their
focus and dedication to their core activities.&lt;/p&gt;

&lt;p&gt;We can now go back to that sequence of individual, rational, sound
decisions that the leaders who design organisations tend to make. Of
course it’s great to design organisations that help engineers acquire a
diverse set of skills, and that engage them in defining strategy,
organisation, and process. But because time is limited, we must be
conscious that every time we direct an engineer’s attention away from
engineering, we’re chipping away at the minimum reasonable allocation of
time that they, like any other professional, need for their core
activities. I reiterate that this is not just writing code. It’s also
code reviews, technical designs, and so on. I struggle to see a minimum
reasonable allocation for those core engineering activities of less than
70%.&lt;/p&gt;

&lt;p&gt;You might think that a budget of 30% non-engineering time doesn’t seem
so bad. It’s 12h in a standard 40h week! It can fit a lot of stuff.&lt;/p&gt;

&lt;p&gt;But notice that we didn’t even talk about the baseline of day-to-day
overhead that goes into every individual engineering team. I’m thinking
of activities like backlog grooming or ordinary human
coordination that already consume a good chunk of that non-engineering
budget. Those activities tend to be inflated with a proliferation of
rituals, meetings, and paperwork, rich in post-its and generally under the
umbrella of a methodology, that go well beyond what is necessary to achieve
their purpose with pragmatism. Not much is left of those 12h.&lt;/p&gt;

&lt;p&gt;Project/product/people management specialists easily overlook that
overhead and inflation because, from their perspective, that time seems
well spent (engineers are project managing! Growing multi-disciplinary
skills! Applying the latest methodologies! Good stuff!) But what happens
if we now add some time working on requirements gathering? And on
customer support? And on designing interfaces? And on pitching
project ideas? And writing posts for the corporate blog? Great
learning! Inclusiveness! But ars longa, vita brevis. Engineers spend
less time on engineering.&lt;/p&gt;

&lt;p&gt;Would you, product manager, people manager, UX designer,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$insert_your_discipline_here&lt;/code&gt;, be able to do your job properly if you
had to spend 5% of your time in each of 10 other disciplines? If your
core activity was loaded with a crust of unnecessary ritual? Of course
not!&lt;/p&gt;

&lt;h2 id=&quot;work-fragmentation-hurts-engineers&quot;&gt;Work fragmentation hurts engineers&lt;/h2&gt;

&lt;p&gt;There is a factor that makes this problem even worse in engineering (as
well as in other disciplines). I will refer here to the well-known
&lt;a href=&quot;http://paulgraham.com/makersschedule.html&quot;&gt;Maker’s schedule, Manager’s schedule&lt;/a&gt;. An engineer’s
schedule is like a glass jar that you want to fill with stones, pebbles,
and sand. You can only succeed in that exact order: if you try to put
the pebbles and sand in first, the stones won’t fit. Maker-type work is
primarily like stones: it requires solid blocks of uninterrupted time.
Manager-type work is mostly like sand or pebbles: it can fit in a more
fragmented schedule with small blocks of time. Maker schedules can’t
absorb that fragmentation, so it’s not just a matter of how much time is
spent on non-engineering work. It’s also the fragmentation. Put diverse,
varied activities into an engineering team and the quality of the
engineering will go down, because the engineering rocks won’t fit. This
isn’t to say that Manager-type work is less important! It’s equally
important! But they are different jobs for reasons like this one.&lt;/p&gt;
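&lt;p&gt;The jar argument is, at heart, a packing argument, and a toy simulation
makes it concrete (the slot sizes, task durations, and the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fill_jar&lt;/code&gt;
helper are all hypothetical, chosen only to show the ordering effect):&lt;/p&gt;

```python
# Toy simulation of the jar analogy. A week offers a few contiguous free
# calendar blocks (hours); tasks are "stones" (deep work) and "pebbles"
# (small chores). All numbers here are made up for illustration.

def fill_jar(slot_sizes, tasks):
    """First-fit: place each task, in the given order, into the first
    free slot it fits in. Returns the tasks that did not fit anywhere."""
    slots = list(slot_sizes)       # remaining capacity per free block
    unplaced = []
    for task in tasks:
        for i, free in enumerate(slots):
            if task == min(task, free):  # task fits in this slot
                slots[i] = free - task
                break
        else:
            unplaced.append(task)
    return unplaced

# Three 3-hour free blocks; one 3h deep-work stone plus small pebbles.
free_blocks = [3, 3, 3]
stones_first = [3, 2, 2, 1, 1]   # largest first: everything fits
sand_first = [1, 1, 2, 2, 3]     # pebbles fragment the blocks first
print(fill_jar(free_blocks, stones_first))  # []
print(fill_jar(free_blocks, sand_first))    # [3]  the stone no longer fits
```

&lt;p&gt;Placing the largest pieces first is the first-fit-decreasing heuristic
from bin packing; the “sand first” order leaves only fragments, and the
deep-work block has nowhere to go, even though the total free time is
unchanged.&lt;/p&gt;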

&lt;h2&gt;Consequences&lt;/h2&gt;

&lt;p&gt;When the minimum reasonable threshold of time dedicated to core
engineering tasks is broken, things backfire. I’ve seen this in two main
varieties:&lt;/p&gt;

&lt;h3 id=&quot;teams-neglect-their-engineering-standards&quot;&gt;Teams neglect their engineering standards.&lt;/h3&gt;

&lt;p&gt;This is unsurprising if they don’t have time because they are doing too
much work in adjacent expertise areas, like figuring out requirements,
talking to customers, writing blog posts, or having meeting after
meeting to push JIRAs around. Of course all those activities are
important. Of course engineers grow by doing those activities. Of course
they should individually do them at some point in their careers. But the
time available for them is relatively small. They can’t do all of that,
plus engineering at the right standard. It just can’t happen.&lt;/p&gt;

&lt;p&gt;The amount of time matters. But for engineers and other maker roles,
so does fragmentation. Put diverse, varied activities into an
engineering team and the quality of the engineering will go down.
The engineering rocks won’t fit.&lt;/p&gt;

&lt;h3 id=&quot;it-burns-people-out&quot;&gt;It burns people out.&lt;/h3&gt;

&lt;p&gt;On one hand, red tape and bureaucracy are well-known demotivators for
engineers (“I could fix this in less time than it takes to write the
JIRAs”). On the other hand, we have seen quite a few instances of the
following pattern. A gap appears in the people management / product
management / project management area. A senior engineer is spotted as
capable of plugging that hole. The engineer’s manager makes the case that
taking on those activities will broaden their toolbox. This makes sense!
The engineer accepts. Does less engineering, more people/product/project
management. Gradually, no engineering. Some of these people become happy
project, product, or people managers. But a good chunk of them end up
stuck in a position they don’t quite enjoy, not knowing how to go back,
constrained by a web of pressure points (e.g. the lack of an equally
clear growth plan in the engineering track as for management roles),
until they burn out and interview elsewhere for an engineering position.
And yes, they now shine as multi-dimensional engineers. But they shine
elsewhere.&lt;/p&gt;

&lt;p&gt;Both are bad outcomes for the organisation, even if they derive from an
accumulation of individual decisions that are rational and hard to
disagree with.&lt;/p&gt;

&lt;h2 id=&quot;so-i-guess-i-have-two-messages&quot;&gt;So I guess I have two messages&lt;/h2&gt;

&lt;h3 id=&quot;first-to-leaders-and-managers-outside-of-engineering&quot;&gt;First, to leaders and managers outside of engineering.&lt;/h3&gt;

&lt;p&gt;We are all aligned on the value of multidimensional engineers, on
transparency, on inclusiveness. You should design your organisation
accordingly.  Sometimes, engineers will have to be strong-armed against
their will or their preference! Many times, the “we have project
management work that’s falling through the cracks” or “we need UX and
engineering to be closer” serve as great opportunities to learn the
basics. All that is welcome.&lt;/p&gt;

&lt;p&gt;But please, you need to balance this with an awareness that time matters
and context matters. That you cannot have engineers participate in
product and project management, UX, HR, customer support, and three more
things at the same time, as part of their day-to-day, full-time job as
engineers. That sometimes yes, it’d be great to have engineers join
sales, or HR, or customer support, or something else, but this is
incompatible with keeping the “maker schedule” that is vital to healthy
engineering. That involving the team in that well-intentioned
brainstorming session to figure out the next quarter’s priorities can
prevent the same team from delivering the last quarter’s goals up to the
right standards. That sometimes, if there is a hole in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;some_discipline&amp;gt;&lt;/code&gt;
work, maybe the solution is not to throw an engineer at the problem, but
rather to go and get the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;discipline_domain_experts&amp;gt;&lt;/code&gt; to plug it. That you
may be interested in old or new methodologies, tools of your trade, etc.,
but introducing them in engineering teams may unnecessarily inflate the
amount of non-engineering time they have to deal with, to the
detriment of the time available for their core responsibility.&lt;/p&gt;

&lt;p&gt;That all of this does not mean devaluing your discipline. It just means
that engineering is a different one.&lt;/p&gt;

&lt;h3 id=&quot;second-to-individual-contributors-in-engineering-leadership-roles&quot;&gt;Second, to Individual Contributors in engineering leadership roles.&lt;/h3&gt;

&lt;p&gt;You have to convey to the engineers around you the importance of
understanding the why of your work. The value of growing a diverse
toolbox of skills. Be a role model in this. Help them engage with the
business. Push them to get out of the code sometimes, and walk the
organisation learning what’s beyond the bricklaying. To get into the
project manager’s or the UX designer’s shoes. To go help customers endure
that disaster of an API we designed behind our noise-cancelling
headphones. To do the project management that’s falling through the
cracks, and learn from the experience.&lt;/p&gt;

&lt;p&gt;This is all essential. But it is equally essential that you help
engineers keep their focus, attention, and raw, solid, uninterrupted
quality time on core engineering activities. Ensure that their jars fill
with big stones first. Sometimes you do this by pushing back when the
organisation wants to use some of the engineers’ time budget for
purposes that seem good, well motivated, and rational, but can have
unintended side effects on the engineers’ ability to write, review,
operate, deliver, and maintain software up to high standards.&lt;/p&gt;

&lt;p&gt;This doesn’t mean that engineering is the only dimension that matters.
Nor that it’s the most important one.  Nor should you think of yourself
and your team as sacred cows among lesser professionals.&lt;/p&gt;

&lt;p&gt;It just means that you’re engineers, and you have a job to do.&lt;/p&gt;

</description>
        <pubDate>Tue, 09 Jan 2024 11:00:00 +0000</pubDate>
        <link>https://varoa.net/2024/01/09/how-organisations-cripple-engineering-teams-with-good-intentions.html</link>
        <guid isPermaLink="true">https://varoa.net/2024/01/09/how-organisations-cripple-engineering-teams-with-good-intentions.html</guid>
        
        
      </item>
    
  </channel>
</rss>
