Every System Tells Its Truth in Production

#systems #production #engineering #reliability

A control system runs a hydraulic system for eighteen months without incident. Then one morning, hydraulic fluid starts leaking from a hose and the system loses pressure, which makes its response sluggish. Nothing trips. No alarms fire. The control panel continues executing its cycle. But as the fluid reservoirs drain over the next few days, the system starts to misbehave and operations begin to fail. Eventually the system goes down and maintenance is notified, but the damage is done: the operation is halted, and the downtime is costing money.

The control system worked exactly as designed. It just wasn't designed for this.

Every System Tells Its Truth in Production

Most systems look reasonable on the happy path. The designed path. These systems behave themselves in demos, pass the developer's tests, and even survive design reviews. After all, the developers have the best intentions. And then they reach production and their intended customers. That is when the conversation takes a turn.

Production isn't just an environment—it's a crucible. Real machinery, real physics, real latency, real operators, real consequences. It's where assumptions stop being hypothetical and start charging interest. Where the difference between 100ms and 200ms isn't academic—it's the difference between a controlled stop and an emergency shutdown.

Every system tells its truth there.

The Comfortable Lie of the Happy Path

The happy path is a useful fiction, an essential poison if you will. We need it to get anything built at all, because it lets us describe intent. It is a composite of requirements translated into user stories: fairy-tale descriptions of how the user will interact with the system. It lets us romanticize the reasons for the desired behaviour, and it reassures us that once the checklist is done, we can finally ship and raise a glass to a job well done.

It starts as a concise set of words in a document and evolves from there. Over time, teams begin to mistake the happy path for reality. Diagrams become promises. Tests become guarantees. Architecture documents become a form of reassurance.

What gets less attention is everything living just outside that path: sensors that drift over months rather than fail outright, sensors installed in a way that makes their output jitter instead of holding steady, networks that don't drop packets but add 50ms of jitter under load, operators who develop workarounds that bypass safety interlocks, physical processes that couple in ways the control model doesn't account for.

These are not edge cases. They are the system.
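
To make the drifting sensor concrete, here is a minimal sketch of a check that hard alarm limits will never perform: comparing a windowed average against the value recorded at commissioning. Everything in it is an assumption for illustration, including the `DriftMonitor` name, the reference value, the tolerance, and the window size. A real installation would choose these from its own calibration records.

```python
from collections import deque
from statistics import mean


class DriftMonitor:
    """Flags slow drift that never crosses the hard alarm limits.

    Hypothetical helper for illustration, not taken from any real system.
    """

    def __init__(self, reference, tolerance, window=100):
        self.reference = reference    # value recorded at commissioning (assumed known)
        self.tolerance = tolerance    # allowed offset of the windowed mean
        self.samples = deque(maxlen=window)

    def update(self, reading):
        """Add a reading; return True once the windowed mean has drifted."""
        self.samples.append(reading)
        if len(self.samples) < self.samples.maxlen:
            return False              # not enough history to judge yet
        return abs(mean(self.samples) - self.reference) > self.tolerance


# Each reading looks fine on its own; only the trend gives the sensor away.
monitor = DriftMonitor(reference=100.0, tolerance=2.0, window=5)
flags = [monitor.update(r) for r in [100, 101, 102, 103, 104, 105]]
```

In this run the flag is raised only on the last reading, once the five-sample mean has moved more than two units from the reference. Averaging over a window is one simple choice among many; it trades detection speed for immunity to a single noisy sample.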

What “Truth” Actually Means

When I say a system tells its truth in production, I don't mean it suddenly becomes malicious or broken. I mean it becomes honest about what it actually is—not what we documented it to be.

Think of it like this: in the lab, you're having a conversation with your system. You command a valve to open and it complies. You trigger an interlock and it responds. You're speaking to it in a controlled environment, with calibrated instruments, asking it controlled questions.

Production is where the system starts talking back.

It reveals what it actually depends on: not just the 24VDC supply you specified, but the quality of that supply under varying loads, the grounding scheme you inherited, the EMI from the VFD three panels over. It shows which failures it tolerates (a single missed communication frame) and which it amplifies (a watchdog timeout that cascades into a full line stop). It exposes where responsibility is clear (who owns the PLC code) and where it diffuses (who decides when to replace aging sensors).
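
The line between a tolerated failure and an amplified one is often a single threshold somebody picked during commissioning. A rough sketch of that decision, assuming a periodic frame and integer millisecond timestamps, might look like the following. The `FrameWatchdog` class, its state names, and the default of three missed frames are all hypothetical.

```python
class FrameWatchdog:
    """Classifies link health from the time since the last received frame.

    Hypothetical sketch: a real watchdog would usually latch the trip and
    require an explicit reset, which this one does not.
    """

    def __init__(self, period_ms, max_missed=3):
        self.period_ms = period_ms    # expected interval between frames
        self.max_missed = max_missed  # overdue frames that force a trip
        self.last_seen_ms = None

    def on_frame(self, now_ms):
        """Record the arrival time of a valid frame."""
        self.last_seen_ms = now_ms

    def state(self, now_ms):
        """Return 'OK', 'DEGRADED', or 'TRIPPED' for the current time."""
        if self.last_seen_ms is None:
            return "TRIPPED"          # never heard from the peer: assume the worst
        missed = (now_ms - self.last_seen_ms) // self.period_ms
        if missed == 0:
            return "OK"
        if missed < self.max_missed:
            return "DEGRADED"         # tolerated, but worth counting and logging
        return "TRIPPED"              # this is where a line stop would begin


wd = FrameWatchdog(period_ms=100, max_missed=3)
wd.on_frame(0)
states = [wd.state(t) for t in (50, 150, 250, 300)]
```

The interesting part is the middle state. A system with only "OK" and "TRIPPED" has no way to say that it is struggling, so the first thing anyone hears about a noisy link is the line stop.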

And here's what makes this uncomfortable: production reveals not just technical design, but operational reality. Maintenance schedules that drift. Calibration records that age. Tribal knowledge about "the weird thing it does on cold starts." Decision-making under the pressure of downtime costs.

None of this is visible in a clean system architecture diagram.

“The System Worked Exactly as Designed”

This sentence appears often in incident reports, usually as a way to end a conversation. But it's almost always true—and deeply misunderstood.

Most failures are not the result of a single fault. They are the natural outcome of tradeoffs made under budget constraints, complexity accumulated across multiple integration phases, and risk accepted implicitly because making it explicit would have delayed commissioning.

The system worked exactly as designed. The problem is that the design included far more than firmware, software, and wiring diagrams. It included unspoken assumptions about how operators would interact with HMIs, implicit bets that sensors wouldn't drift simultaneously and would be installed correctly, and organizational patterns that made certain kinds of information invisible during shift handoffs.

The control system with the leaking hydraulic fluid? It was designed to operate the hydraulics and to ignore response time so that it could support different models of pumps. That was a conscious choice, documented and reviewed. What wasn't designed for was this specific failure mode: sluggish mechanics that still reported plausible-but-wrong values within acceptance bounds, a maintenance team with no way to correlate subtle performance degradation, and no pressure monitoring anywhere in the original specification.

The system told the truth about all of it.
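
For what it's worth, supporting several pump models and watching response time are not necessarily in conflict. One possible approach, sketched here under assumptions and not a description of the system in this story, is to learn a baseline on each installation during its first cycles instead of hardcoding one. The `ResponseTimeMonitor` name, the number of learning cycles, and the 1.5x warning ratio are all illustrative.

```python
from statistics import median


class ResponseTimeMonitor:
    """Learns a per-installation baseline, then flags slow actuation cycles.

    Hypothetical sketch: thresholds and cycle counts are placeholders.
    """

    def __init__(self, learn_cycles=20, warn_ratio=1.5):
        self.learn_cycles = learn_cycles  # cycles used to establish the baseline
        self.warn_ratio = warn_ratio      # how much slower than baseline counts as slow
        self.learning = []
        self.baseline_ms = None

    def record(self, response_ms):
        """Return 'learning', 'ok', or 'slow' for one actuation cycle."""
        if self.baseline_ms is None:
            self.learning.append(response_ms)
            if len(self.learning) >= self.learn_cycles:
                # Median, so one odd cycle during commissioning doesn't skew it.
                self.baseline_ms = median(self.learning)
            return "learning"
        if response_ms > self.warn_ratio * self.baseline_ms:
            return "slow"
        return "ok"


monitor = ResponseTimeMonitor(learn_cycles=3, warn_ratio=1.5)
results = [monitor.record(ms) for ms in (200, 210, 190, 250, 320)]
```

Here the first three cycles establish a 200 ms baseline, 250 ms passes, and 320 ms is flagged. The obvious weakness is that a baseline learned on an already leaking system would bake the fault in, which is its own small example of an assumption waiting for production.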

Why Production Is Uncomfortable

Production removes the illusion of control.

You no longer get to choose when failures occur, how visible they are, what else is happening simultaneously, or how much time you have to respond. It forces decisions to be made with incomplete telemetry, by fatigued operators, with production managers asking for ETAs.

That is not a weakness of production. That is its value.

This is where experience is formed—not by avoiding failures, but by learning what survives them. It's where you discover which of your redundancies were weight-bearing and which were decorative. It's where you learn the difference between a system that degrades gracefully and one that simply hasn't encountered the right combination of conditions yet.

In embedded and IIoT systems, this matters more. You can't just roll back a deployment. You can't hotfix firmware on a system that's safety-critical without a maintenance window. Your "users" are 50-ton machines that cost $10,000 per hour of downtime. The feedback loop is measured in months, not minutes.

This Is Not a Blog About Tools

Tools matter. Microcontrollers matter. Protocols matter. Fieldbus architectures matter. Cloud architectures matter.

But they are rarely the reason systems fail in production.

Systems fail because graceful degradation was an afterthought, ownership boundaries were unclear between firmware and mechanical teams, incentives were misaligned between commissioning speed and long-term reliability, risks were known but unspoken during design reviews, and complexity grew faster than documentation—or understanding.

Those are design problems. Judgment problems. Responsibility problems.

That is what this space is about.

So Why Does This Exist?

Off the Happy Path is a collection of observations, essays, and stories from production systems—embedded firmware, IIoT platforms, control systems, and the organizations that build and operate them.

It is not a tutorial blog. It will not explain fundamentals that datasheets and standards already cover well. It will not optimize for engagement or outrage.

It exists to name things that are usually felt but not discussed. To talk about systems as they behave, not as we wish they would. To capture lessons that tend to surface only after something breaks—and sometimes only after we've stopped to actually listen.

A Final Thought

If a system has never surprised you in production, one of two things is true: either it has not been in production long enough, or you were not listening closely when it spoke.

Because eventually, every system does. It tells its truth.

The question is whether we're ready to hear it.
