Every engineering team knows about technical debt. It gets tracked in Jira, discussed in sprint retrospectives, and periodically addressed in dedicated cleanup sprints. Leadership understands it, at least abstractly. The concept has made it into mainstream organizational vocabulary.
Data debt hasn't. And in regulated financial environments, that's a serious problem.
What Data Debt Actually Is
Data debt is the accumulated cost of deferred decisions about how data is defined, owned, stored, and governed. It shows up in specific, concrete ways:
Inconsistent definitions across systems. "Active customer" means something different in the CRM, in the actuarial model, and in the regulatory reporting pipeline. Nobody made a deliberate decision for them to diverge. It just happened, over years, as each system evolved independently.
Undocumented transformations. A field called net_premium in the reporting database is not the same as net_premium in the source system. Somewhere between ingestion and presentation, a calculation happened. Who wrote it, when, and why — that information no longer exists. The people who built it have left.
Orphaned data assets. Tables that are populated but never queried. ETL jobs that run nightly and load data that hasn't been accessed in eighteen months. Nobody knows if they're safe to remove. Nobody wants to be the one who finds out the hard way.
Quality rules that exist only in people's heads. A senior analyst who knows that the data from a specific legacy system needs a correction factor applied before it's trusted. When that analyst leaves, the knowledge goes with them.
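One way to get a rule out of someone's head is to record it as data, next to the reason it exists. The sketch below is a minimal illustration, not a prescribed design: the QualityRule class, the legacy_policy_admin system name, the 0.98 factor, the owner, and the date are all hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class QualityRule:
    """A data-quality correction recorded alongside its rationale."""
    field: str            # field the rule applies to
    source_system: str    # system whose data needs the correction
    factor: Decimal       # multiplier applied before the value is trusted
    reason: str           # why the correction exists
    owner: str            # who to ask about it
    recorded_on: date     # when the rule was written down


# Hypothetical rule: every value here is illustrative, not from a real system.
LEGACY_PREMIUM_RULE = QualityRule(
    field="net_premium",
    source_system="legacy_policy_admin",
    factor=Decimal("0.98"),
    reason="Illustrative: legacy system reports premium before a 2% adjustment.",
    owner="reporting-data-team",
    recorded_on=date(2024, 1, 15),
)


def apply_rule(record: dict, rule: QualityRule) -> dict:
    """Return a corrected copy of the record, tagged with the rule that changed it."""
    if record.get("source_system") != rule.source_system:
        return dict(record)  # rule does not apply; return an unchanged copy
    corrected = dict(record)
    corrected[rule.field] = record[rule.field] * rule.factor
    # Keep a trail of which corrections touched this record.
    corrected["applied_rules"] = record.get("applied_rules", []) + [rule.reason]
    return corrected


raw = {"source_system": "legacy_policy_admin", "net_premium": Decimal("1000.00")}
clean = apply_rule(raw, LEGACY_PREMIUM_RULE)
```

The point is less the arithmetic than the metadata: the reason, the owner, and the date survive when the analyst who knew the rule moves on, and each corrected record carries a note of what was done to it.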
Why It Accumulates So Fast in Financial Services
The regulatory environment creates constant pressure to build fast. FATCA arrives, you build a pipeline. CRS gets added, you extend it. HAYMER introduces pension-specific requirements, you patch the existing pipeline. Each addition is done under deadline pressure, with minimal documentation, by whoever is available.
Over time, the reporting infrastructure becomes something nobody fully understands. Each person on the team understands their piece. Nobody has a complete picture. And the complete picture is what an auditor needs when they ask you to explain a submitted figure.
I've seen this pattern repeatedly across organizations of different sizes. The compliance submissions are accurate — usually. The people making them are competent — always. But the ability to explain, reproduce, and audit those submissions degrades every year, because the underlying data infrastructure is carrying debt that was never addressed.
The Moment It Becomes Visible
Data debt is invisible until it isn't. The triggers are predictable:
A regulator asks for a retroactive explanation of a submitted figure from three years ago.
A new system implementation reveals that two source systems have been producing different values for what everyone assumed was the same field.
An executive asks a question about the business that requires combining data from three systems that have never been reconciled.
At that point, what looked like a documentation problem reveals itself as an architecture problem. The work required to answer the question isn't hours — it's weeks.
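A basic reconciliation check can surface this kind of divergence before a regulator or a migration does. The following is a rough sketch under simple assumptions: both systems can export the field keyed by a shared record ID, and the policy IDs, values, and tolerance shown are invented for illustration.

```python
from decimal import Decimal


def reconcile(system_a: dict, system_b: dict,
              tolerance: Decimal = Decimal("0.01")) -> dict:
    """Compare one field's values across two systems, keyed by a shared record ID."""
    report = {"only_in_a": [], "only_in_b": [], "mismatched": []}
    for key in sorted(system_a.keys() | system_b.keys()):
        if key not in system_b:
            report["only_in_a"].append(key)
        elif key not in system_a:
            report["only_in_b"].append(key)
        elif abs(system_a[key] - system_b[key]) > tolerance:
            # Same record, different value: the case nobody knew about.
            report["mismatched"].append((key, system_a[key], system_b[key]))
    return report


# Illustrative data: policy ID -> net_premium as each system reports it.
source_values = {
    "P-001": Decimal("1000.00"),
    "P-002": Decimal("250.00"),
    "P-003": Decimal("75.50"),
}
reporting_values = {
    "P-001": Decimal("1000.00"),
    "P-002": Decimal("245.00"),
    "P-004": Decimal("310.00"),
}

report = reconcile(source_values, reporting_values)
```

With this sample data, the report flags P-002 as mismatched and lists P-003 and P-004 as present in only one system. Run on a schedule, a check like this turns a weeks-long investigation into a standing list of known differences.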
The Difference Between Acknowledging and Addressing It
Most data teams know they have data debt. Acknowledging it isn't the hard part. The hard part is making the organizational case for addressing it before the pain arrives.
The argument that usually works: frame data debt in terms of operational capacity. Every hour a senior analyst spends manually reconciling figures between systems is an hour not spent on analysis. Every sprint a data engineering team spends firefighting undocumented pipeline behavior is a sprint not spent building new capability. The debt has a running operational cost, not just a theoretical future risk.
When you can quantify that cost — even roughly — the conversation about investment becomes different.
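A rough estimate needs only a few inputs. The sketch below shows one way to do the arithmetic. Every figure in it (hours per week, hourly cost, headcount, working weeks per year) is a placeholder to be replaced with your own numbers, not a benchmark.

```python
def annual_debt_cost(hours_per_week: float, hourly_cost: float,
                     people: int, working_weeks: int = 46) -> float:
    """Estimate the yearly cost of recurring manual work caused by data debt."""
    return hours_per_week * hourly_cost * people * working_weeks


# Placeholder inputs: analysts manually reconciling figures between systems.
reconciliation_cost = annual_debt_cost(hours_per_week=6, hourly_cost=90.0, people=3)

# Placeholder inputs: engineers firefighting undocumented pipeline behavior.
firefighting_cost = annual_debt_cost(hours_per_week=4, hourly_cost=80.0, people=5)

total_cost = reconciliation_cost + firefighting_cost
print(f"Estimated annual cost of data debt: {total_cost:,.0f}")
```

Even with wide error bars, a figure like this puts the debt in the same units as the budget request to address it.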
Technical debt eventually forces a reckoning through system instability or developer attrition. Data debt forces its reckoning through audit findings, regulatory questions, and executive mistrust in the numbers. The latter is harder to recover from. Addressing it early is significantly cheaper than remediating it under pressure.