2026-04-26

Why Your ETL Pipeline Is a Liability, Not an Asset

There's a particular kind of pride that data engineers feel about a pipeline that just runs. You built it, it works, it hasn't needed attention in years. That's a success story.

It's also, often, a liability you haven't recognized yet.

The Reliability Trap

A pipeline that runs without intervention for years is one that nobody has touched in years. That means nobody has had to read the code recently. Nobody has had to explain what it does. Nobody has had to verify that what it produces still matches what downstream consumers expect.

Reliability and understandability are not the same thing. A pipeline can be completely reliable — running nightly, producing output, passing whatever validation exists — while simultaneously being completely opaque to anyone currently on the team.

In regulated environments, that opacity becomes a problem the moment something needs to change. And in regulatory reporting, something always needs to change sooner or later.

The Change That Reveals the Problem

The change doesn't have to be large. A new field required by an updated statutory framework. A source system migration that shifts a column's data type. A business rule clarification that changes how a specific edge case should be handled.

What looks like a small change on paper becomes a forensic exercise when nobody can confidently answer the question: what does this pipeline actually do, step by step?

I've seen teams spend three weeks reverse-engineering a pipeline their own organization built, because the original developers had left, the documentation was never written, and the system's behavior existed only in the code. When the code is too complex to read quickly and confidently, you've lost the ability to change it safely.

And in financial services, changing something unsafely in a compliance pipeline is not a recoverable mistake.

The Hidden Costs That Don't Appear on Any Dashboard

Operational data teams measure throughput. Jobs completed, records processed, failure rates. These are the metrics that appear in monitoring dashboards, and they make pipelines look healthier than they are.

What doesn't appear on any dashboard:

The archaeologist tax. The time senior team members spend reverse-engineering pipeline behavior when a question arises. This is invisible — it looks like "investigation" rather than "technical debt repayment," but it's the same thing.

The change avoidance premium. The number of times a requested change was descoped, deferred, or worked around because the team wasn't confident they could make it safely. This one is completely invisible — it shows up only in what didn't get built.

The knowledge concentration risk. If only one person understands a mission-critical pipeline, that person's availability is a single point of failure for your compliance operations. This rarely appears in a risk register, but it should.

What a Maintainable Pipeline Actually Looks Like

I want to be specific, because "write better documentation" is advice that everyone agrees with and nobody follows.

A maintainable pipeline is one where a competent new team member can understand the full data flow — from source fields to output fields, including every transformation and every filter — in under an hour. Not understand it roughly. Understand it precisely, well enough to explain it to an auditor.

That requires:

Inline documentation of business logic, not code mechanics. The code already explains what the transformation does. What it doesn't explain is why — which regulatory requirement, which business rule, which edge case the logic handles. That's what gets lost when people leave.

Explicit lineage between source fields and output fields. Every field in the output should trace to a source, or to a documented derivation. No mystery columns.

Tested behavior, not just tested execution. A pipeline can execute successfully while producing wrong output. The tests that matter are the ones that verify the output is correct, not just that the job ran to completion.
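The three requirements above can be made concrete in a few lines. What follows is a minimal Python sketch, not a prescribed design: the field names (`acct_no`, `bal_amt`), the `REPORTING_THRESHOLD` value, the `LINEAGE` map, and the business rule in the docstring are all hypothetical placeholders. It shows a docstring that records why the logic exists, a lineage map that lets a check flag any undocumented output column, and tests that assert on output values rather than on successful execution.

```python
from decimal import Decimal

# Hypothetical threshold; the real value and its source belong to the rule owner.
REPORTING_THRESHOLD = Decimal("10000")

# Lineage map (hypothetical field names): every output field names its
# source field or a documented derivation. No mystery columns.
LINEAGE = {
    "account_id": "source: acct_no",
    "balance": "source: bal_amt, parsed from text to Decimal",
    "is_reportable": "derived: balance >= REPORTING_THRESHOLD (inclusive)",
}


def transform(row: dict) -> dict:
    """Map one source row to one output row.

    Why this logic exists (hypothetical rule, for illustration): accounts at
    or above the reporting threshold must be reported. The boundary is
    inclusive, so a balance exactly equal to the threshold IS reportable.
    In a real pipeline, cite the actual requirement and the date of any
    clarification here.
    """
    balance = Decimal(row["bal_amt"])
    return {
        "account_id": row["acct_no"],
        "balance": balance,
        "is_reportable": balance >= REPORTING_THRESHOLD,
    }


def unmapped_fields(output_row: dict) -> list[str]:
    """Return output fields that have no documented lineage."""
    return [field for field in output_row if field not in LINEAGE]


# Behavior tests: they check output values, not just that the job completed.
def test_balance_at_threshold_is_reportable() -> None:
    out = transform({"acct_no": "A-1", "bal_amt": "10000.00"})
    assert out["is_reportable"] is True


def test_balance_below_threshold_is_not_reportable() -> None:
    out = transform({"acct_no": "A-2", "bal_amt": "9999.99"})
    assert out["is_reportable"] is False


def test_every_output_field_has_lineage() -> None:
    out = transform({"acct_no": "A-3", "bal_amt": "1"})
    assert unmapped_fields(out) == []


if __name__ == "__main__":
    test_balance_at_threshold_is_reportable()
    test_balance_below_threshold_is_not_reportable()
    test_every_output_field_has_lineage()
    print("all behavior tests passed")
```

The test functions follow pytest naming, so pytest should collect them as-is; the `__main__` block runs them without any test framework. The boundary test is the kind that pays off when a business rule clarification changes an edge case: if someone later makes the comparison strict, the test fails instead of the report silently changing.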

The Investment Conversation

Retrofitting maintainability into a running pipeline is expensive. It's the kind of work that's hard to schedule because it competes with work that has visible business output.

The way to make the investment case is to make the hidden costs visible. Estimate the archaeologist tax for the last quarter. Identify the knowledge concentration risks by asking: which pipelines would take more than a day to fix if the person who built them weren't available? Quantify the change avoidance premium by reviewing the backlog.
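Putting rough numbers on these costs does not require special tooling. Here is a minimal Python sketch, assuming the team can export time entries tagged by category and can tag backlog items during review. The `TimeEntry` and `BacklogItem` records, their fields, the status names, and the `fixers` mapping are hypothetical and would need to match whatever your tracker actually records.

```python
from dataclasses import dataclass


@dataclass
class TimeEntry:
    """One logged block of work (hypothetical export format)."""
    person: str
    pipeline: str
    hours: float
    category: str  # hypothetical tags, e.g. "investigation", "feature"


@dataclass
class BacklogItem:
    """One backlog item, tagged during review (hypothetical fields)."""
    title: str
    status: str  # e.g. "done", "descoped", "deferred", "worked_around"
    low_confidence: bool  # True if avoided because the change felt unsafe


AVOIDED_STATUSES = {"descoped", "deferred", "worked_around"}


def archaeologist_tax(entries: list[TimeEntry]) -> float:
    """Total hours spent reverse-engineering existing pipeline behavior."""
    return sum(e.hours for e in entries if e.category == "investigation")


def change_avoidance_count(backlog: list[BacklogItem]) -> int:
    """Count items descoped, deferred, or worked around for lack of confidence."""
    return sum(
        1 for item in backlog
        if item.status in AVOIDED_STATUSES and item.low_confidence
    )


def single_owner_pipelines(fixers: dict[str, set[str]]) -> list[str]:
    """Pipelines that at most one person could fix within a day."""
    return sorted(name for name, people in fixers.items() if len(people) <= 1)


if __name__ == "__main__":
    # Illustrative inputs only; replace with your own time log and backlog.
    entries = [
        TimeEntry("ana", "regulatory_feed", 6.0, "investigation"),
        TimeEntry("ben", "regulatory_feed", 3.0, "feature"),
    ]
    backlog = [BacklogItem("add new field", "deferred", True)]
    fixers = {"regulatory_feed": {"ana"}, "gl_extract": {"ana", "ben"}}
    print("investigation hours:", archaeologist_tax(entries))
    print("changes avoided:", change_avoidance_count(backlog))
    print("single points of failure:", single_owner_pipelines(fixers))
```

Figures like these are estimates, not measurements. Their value is in turning "investigation" hours and quietly deferred work into numbers that can sit next to the cost of retrofitting.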

When you can put numbers to those costs, the investment conversation changes. The pipeline that's been running reliably for three years stops looking like a success and starts looking like what it is: deferred maintenance with a running bill.


Build pipelines to be read, not just to run. The difference shows up at the moment of change — which, in a regulatory environment, is never if, always when.