I am a software engineer with 20 years of experience building and operating distributed systems, platforms, and cloud infrastructure at scale. I am currently building GPU infrastructure and AI workload orchestration for private clouds, and integrating agentic AI into our own engineering workflows.
A significant part of my career involved working with engineering leaders and executives on technology strategy and build vs. buy decisions, then leading engineering teams through implementation and operations. Below are some writings from both perspectives.
As AI makes software cheaper, the cost of building the wrong thing becomes the dominant risk. Strategic constraints are shifting from "can we build it" to "should we build it". If you're facing a strategic engineering decision and want a candid outside perspective from an operator who has built these systems, reach out. Most of the advice you're getting comes from someone with a stake in the answer. Mine doesn't.
Get updates via RSS feed or mailing list.
Highlights
- Everyone is measuring what AI does to code production. Nobody is measuring what it does to everything downstream: build, test, review, deploy, operate. The DORA data is already showing the strain.
- The technical and organisational decisions behind building a Kubernetes-based platform serving 1500 engineers across Adevinta's marketplace portfolio.
- Every time someone reports higher latency after migrating to Kubernetes, the root cause is an assumption that changed. This is one of those stories.
All writing
- Inferencing delivery bottlenecks away
- Code reviews can't keep up
- AI workloads challenge the cattle model
- PoC is a framework of perverse incentives
- Why aren't we all serverless yet?
- Identifiers are better off without meaning
- Alert on symptoms, not causes
- How about we forget the concept of test types?
- How organisations cripple engineering teams with good intentions
- Migrating an Eureka-based microservice fleet to Kubernetes
- Sizing Kubernetes pods for JVM apps without fearing the OOM Killer
- GC forensics by example: multi-second pauses and allocation pressure
- How does the default hashCode() work?
- Frugal memory management on the JVM (Meetup)
- DirectBuffer creation / disposal has hidden contention on sun.misc.Cleaner