Automation has an appealing simplicity at the outset. A repetitive task is identified, a script is written, and almost immediately the benefit becomes visible — time is saved, errors are reduced, and what was once manual effort becomes something that runs with minimal intervention. In those early stages, automation feels like a clear and unambiguous improvement.
And yet, given enough time, many automated processes begin to degrade. They fail intermittently, produce inconsistent results, or require increasingly frequent intervention. In some cases, they become so fragile that the original manual process is quietly reinstated, not because it is better, but because it is more predictable.
This pattern is remarkably common, and it is rarely caused by a single flaw. Instead, it is the result of a series of design decisions — often reasonable in isolation — that collectively undermine the reliability of the system over time. Understanding why this happens requires shifting perspective. Automation is not just about removing manual effort; it is about introducing a system that must continue to function correctly in an environment that is constantly changing.
One of the most common assumptions made during the development of an automated process is that the environment in which it operates will remain stable. File structures are expected to remain consistent, input formats are assumed not to change, and dependencies are treated as fixed. In reality, none of these assumptions hold for long.
File names evolve, folder structures are reorganised, upstream systems introduce subtle changes, and users adapt their behaviour in ways that were not anticipated. What was once a clean and predictable input becomes variable, and the automation — having been designed around a narrower set of conditions — begins to fail at the edges.
The problem is not that the automation is incorrect; it is that it was designed for a snapshot of reality rather than for reality itself.
Reliable automation anticipates variation. It validates inputs, tolerates minor deviations, and fails in controlled and informative ways when expectations are not met. Without these safeguards, even a well-written process becomes brittle.
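A minimal sketch of that idea in Python, assuming a hypothetical CSV input with `date`, `amount`, and `account` columns (the schema and file layout here are illustrative, not from the original): the process checks its expectations up front and fails with a message that says exactly what was wrong, rather than crashing later on malformed data.

```python
import csv
from pathlib import Path

# Hypothetical schema: the columns this process expects to find.
REQUIRED_COLUMNS = {"date", "amount", "account"}

def load_report(path: str) -> list[dict]:
    """Validate the input before processing, and fail informatively if it is wrong."""
    file = Path(path)
    if not file.exists():
        raise FileNotFoundError(f"Expected input file is missing: {file}")
    with file.open(newline="") as handle:
        reader = csv.DictReader(handle)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"{file} is missing required columns: {sorted(missing)}")
        rows = []
        for row in reader:
            if not (row.get("amount") or "").strip():
                # Tolerate a minor deviation (a blank amount) by skipping the row,
                # rather than propagating bad data or crashing mid-run.
                continue
            rows.append(row)
        return rows
```

The useful property is not the specific checks but where they sit: at the boundary, before any real work begins.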
A second, and often more damaging, issue arises when automation fails without making that failure visible.
In manual processes, errors are typically discovered as they occur. A discrepancy is noticed, a value looks incorrect, or a step does not complete as expected. Human involvement, while inefficient, provides a layer of real-time validation. Automation removes that layer. If a process runs unattended and produces incorrect output without raising any signal, the error can persist indefinitely. Reports may be distributed, decisions may be made, and downstream processes may rely on data that is fundamentally flawed, all without any indication that something has gone wrong. This is where many automation efforts fail most critically — not because they break, but because they break quietly.
Reliability, in this context, is not just about correctness, but about observability. A reliable system makes its state visible. It logs what it is doing, records when something unexpected occurs, and provides enough context for issues to be diagnosed without guesswork. Without that visibility, automation becomes opaque, and opacity erodes trust.
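One way to make state visible is ordinary structured logging. The sketch below assumes a hypothetical per-row `handle` step and illustrative log messages; in a real deployment the log would typically go to a file or an aggregation service rather than the console.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("nightly_report")  # hypothetical process name

def handle(row: dict) -> None:
    # Hypothetical per-row work: here, just require a numeric amount.
    float(row["amount"])

def process(rows: list[dict]) -> None:
    log.info("Run started with %d input rows", len(rows))
    failures = 0
    for row in rows:
        try:
            handle(row)
        except Exception:
            failures += 1
            # log.exception records the full traceback, giving enough
            # context to diagnose the issue without guesswork.
            log.exception("Row failed: %r", row)
    log.info("Run finished: %d ok, %d failed", len(rows) - failures, failures)
    if failures:
        # Fail loudly at the end instead of finishing in silence.
        raise RuntimeError(f"{failures} row(s) failed; see the log for details")
```

The key design choice is the final `raise`: the run still processes what it can, but it never reports success when something went wrong.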
As automation grows, it often begins to depend on more than originally intended. A script that once processed a single file may expand to interact with multiple sources, rely on specific directory structures, or assume the presence of certain configurations. These dependencies are rarely formalised. They exist implicitly, embedded within the logic of the process. Over time, this creates a fragile network of assumptions. A small change in one part of the environment — a renamed folder, a moved file, a modified format — can have cascading effects that are difficult to trace back to their origin.
The challenge here is not complexity in itself, but unmanaged complexity. Reliable automation makes dependencies explicit. It centralises configuration, documents assumptions, and isolates external interactions wherever possible. By doing so, it becomes clear what the process relies on, and changes can be managed rather than discovered through failure.
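Centralising configuration can be as simple as one object that names every external dependency in a single place, checked at startup. The directory names and fields below are hypothetical:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class PipelineConfig:
    """Every external dependency in one place, instead of literals scattered through the logic."""
    input_dir: Path
    output_dir: Path
    archive_dir: Path

    def validate(self) -> None:
        # Surface a missing dependency at startup, not halfway through a run.
        for name in ("input_dir", "output_dir", "archive_dir"):
            path = getattr(self, name)
            if not path.exists():
                raise RuntimeError(f"Configured {name} does not exist: {path}")
```

When a folder is renamed, the failure now points at a named configuration entry rather than at some line of processing logic that happened to touch it first.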
Another common failure point is the lack of structured change management.
Automation is rarely static. Requirements evolve, edge cases are discovered, and improvements are made. However, when these changes are applied directly to a live process without versioning or traceability, the system becomes difficult to reason about. When something breaks, it is no longer clear what changed or when the issue was introduced. This is where many automation solutions diverge from traditional software practices. Code is modified in place, previous versions are lost, and there is no reliable way to compare behaviour over time.
Introducing even a basic form of version control transforms this situation. Changes become traceable, previous states can be restored, and the evolution of the system becomes visible rather than implicit.
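A real version control system such as git is the proper tool for this, but even the minimal idea can be sketched: copy the current version aside before changing it, and restore it if the change misbehaves. The file and directory names here are illustrative.

```python
import shutil
import time
from pathlib import Path

def snapshot(script: Path, history_dir: Path) -> Path:
    """Copy the current version aside before modifying it, so the change is traceable."""
    history_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = history_dir / f"{script.stem}.{stamp}{script.suffix}"
    shutil.copy2(script, backup)  # copy2 preserves timestamps for later comparison
    return backup

def restore(backup: Path, script: Path) -> None:
    """Roll back to a known-good state when a change misbehaves."""
    shutil.copy2(backup, script)
```

This is deliberately crude (one snapshot per second, no diffs), but it already delivers the two properties the paragraph above describes: previous states can be restored, and the evolution of the process is visible.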
Reliability is not just about preventing failure, but about being able to recover from it quickly and confidently.
In an effort to improve performance or handle specific scenarios, automation is often optimised beyond what is necessary. Additional conditions are introduced, edge cases are hard-coded, and logic becomes increasingly specialised.
While each change may be justified, the cumulative effect is a system that is tightly coupled to a particular set of circumstances. Such systems tend to perform exceptionally well within their intended scope, but degrade rapidly when conditions deviate. What was once flexible becomes rigid, and what was once simple becomes difficult to modify without unintended consequences.
There is a balance to be struck between optimisation and resilience. Reliable automation favours clarity over cleverness. It handles common cases well, accommodates variation where practical, and avoids unnecessary complexity that makes future changes more difficult.
Finally, there is the role of the user.
Automation does not exist in isolation; it interacts with people who rely on it, adapt to it, and occasionally work around it. If a process is not transparent, users may develop their own methods of verification. If it is unreliable, they may bypass it entirely. If it is poorly understood, they may misuse it in ways that were not anticipated.
In this sense, reliability is not purely technical. It is also about communication. A well-designed automated process makes its purpose clear, communicates its outcomes, and provides feedback when something goes wrong. It does not require users to infer its behaviour, nor does it force them to trust it blindly. Trust, once lost, is difficult to regain.
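That communication can be as simple as a plain-language run summary that users receive without having to inspect logs or code. The function below is a hypothetical sketch of the idea:

```python
def run_summary(processed: int, skipped: int, errors: list[str]) -> str:
    """A plain-English outcome message a user can act on directly."""
    lines = [f"Processed {processed} records ({skipped} skipped)."]
    if errors:
        lines.append(f"{len(errors)} problem(s) need attention:")
        lines.extend(f"  - {message}" for message in errors)
    else:
        lines.append("No problems detected.")
    return "\n".join(lines)
```

Whether this text ends up in an email, a dashboard, or a console, the point is the same: the process states its own outcome, so users never have to infer it.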
What emerges from all of these factors is a broader understanding of what automation really entails. It is not simply the act of replacing manual work with code, but the creation of a system that must operate correctly over time, in the presence of change, variation, and uncertainty. Designing for that reality requires a shift in mindset. Inputs must be validated rather than assumed. Failures must be visible rather than silent. Dependencies must be explicit rather than hidden. Changes must be controlled rather than ad hoc.
When these principles are applied, automation stops being fragile and begins to behave predictably, even as its environment evolves. And that, ultimately, is what distinguishes a script that works today from a system that continues to work tomorrow.
Cat On A Spreadsheet