Compliance officers tend to describe FATCA and CRS in clean terms: identify reportable accounts, classify the account holder, produce an XML file, transmit it to the local tax authority by the deadline. Done. The slide deck has three boxes and two arrows.
The pipeline that actually feeds those XML files looks nothing like three boxes and two arrows. It looks like a graveyard of jurisdiction mismatches, half-populated TIN fields, and core systems that were designed in 1998 to capture a customer's mailing address, not their tax residency.
This is the gap I want to talk about. The compliance checkbox and the pipeline reality are almost never in the same room, and that is exactly why FATCA/CRS programs quietly bleed money and accumulate regulatory risk for years.
The data the regulation assumes you have
FATCA and CRS reporting schemas assume the institution can produce, per account, a coherent set of facts:
- Account holder's tax residency country (or countries)
- TIN for each residency, in the correct format for that jurisdiction
- Account holder classification (individual, passive NFE, active NFE, financial institution, etc.)
- Controlling persons for passive entities, with their own residencies and TINs
- Account balance at year-end and gross amounts paid during the year
- Correct currency, correct decimal handling, correct date semantics
The regulation assumes these are facts. In practice, most of them are derivations, and several of them are guesses that survived because nobody downstream challenged them.
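"Correct decimal handling" is a good example of a fact that is really a derivation. A minimal sketch, assuming amounts and FX rates arrive as strings and that half-up rounding to the currency's minor unit is acceptable (the function name `reportable_amount` and the rounding mode are my assumptions, not something either regime prescribes here):

```python
from decimal import Decimal, ROUND_HALF_UP


def reportable_amount(amount, fx_rate="1", exponent="0.01"):
    """Convert an amount with an FX rate and round once, at the very end.

    amount, fx_rate: strings, so no binary float ever touches the value.
    exponent: "0.01" for two-decimal currencies, "1" for zero-decimal ones.
    The rounding mode is an assumption; check local guidance.
    """
    converted = Decimal(amount) * Decimal(fx_rate)
    return converted.quantize(Decimal(exponent), rounding=ROUND_HALF_UP)


balance = reportable_amount("100.00", "32.3456")   # Decimal("3234.56")
```

The point of the design is that rounding happens in exactly one place, with an explicit mode, so the same inputs give the same reported figure every year.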
Where the data actually lives
In a typical Turkish bank or insurer (and frankly in most European institutions I've seen), the inputs to the FATCA/CRS pipeline come from at least six places:
- Core banking or policy admin system for balances, premiums, and account identifiers
- CRM for customer demographics, often with a different customer key
- KYC/onboarding system where self-certification forms live, sometimes scanned PDFs, sometimes structured
- AML system for risk classification that occasionally contradicts KYC
- Document management for the actual signed W-8/W-9/self-cert forms
- Manual spreadsheets maintained by the compliance team for edge cases that nobody wanted to model in a real system
Each of these has its own definition of "customer." The core system thinks a customer is an account. CRM thinks a customer is a person who may have multiple accounts. KYC thinks a customer is a file that was opened on a specific date. The pipeline has to reconcile all of these into a single reportable entity, and then carry that reconciliation forward year after year without breaking historical reports.
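One way to picture the reconciliation is as a union-find over cross-reference links between system keys. This is a sketch under stated assumptions: the `KeyResolver` class, the prefixed key format (`core:`, `crm:`, `kyc:`) and the sample links are all hypothetical, and a real institution would feed it from its own matching rules.

```python
from collections import defaultdict


class KeyResolver:
    """Union-find over system-prefixed customer keys (hypothetical sketch)."""

    def __init__(self):
        self._parent = {}

    def _find(self, key):
        self._parent.setdefault(key, key)
        root = key
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: repoint every key on the walked chain at the root
        while self._parent[key] != root:
            nxt = self._parent[key]
            self._parent[key] = root
            key = nxt
        return root

    def link(self, a, b):
        """Record that keys a and b refer to the same reportable entity."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            # Keep the lexicographically smaller root so the resulting
            # entity id does not depend on the order links arrive in
            lo, hi = sorted((ra, rb))
            self._parent[hi] = lo

    def entity_id(self, key):
        return self._find(key)

    def groups(self):
        out = defaultdict(set)
        for key in list(self._parent):
            out[self._find(key)].add(key)
        return dict(out)


resolver = KeyResolver()
resolver.link("core:ACC-1001", "crm:P-77")
resolver.link("core:ACC-1002", "crm:P-77")
resolver.link("kyc:F-2016-031", "crm:P-77")
resolver.link("core:ACC-2001", "crm:P-90")
```

Here the two core accounts and the KYC file collapse into one entity, and the unrelated account stays separate. To carry the reconciliation forward across years, the resolved entity id would need to be persisted with each submission rather than recomputed from whatever links exist today.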
The tax residency problem
This is the one that breaks everyone. Tax residency is not address. It is not nationality. It is not where the customer pays their phone bill. It is a self-declared status, sometimes supported by documentary evidence, sometimes inferred from indicia.
Most legacy systems have one address field and one nationality field. Neither maps to tax residency. So institutions add a tax residency table, usually as a bolt-on after the regulation came into force, and now they have:
- Customers with a tax residency declared in 2016 that was never refreshed
- Customers whose nationality changed but whose residency record did not
- Customers with two residencies in the CRS system but one in the FATCA system because the two were implemented by different project teams two years apart
- Customers flagged as US persons in AML but not in the FATCA tagging because the indicia rules were implemented inconsistently
When the pipeline runs, every one of these inconsistencies becomes either a reporting decision or a suppression decision. Both are auditable. Both can be wrong.
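A rough sketch of how these inconsistencies could be surfaced as explicit findings instead of being resolved silently inside the run. The `ResidencyView` shape, the source labels, and the three-year staleness threshold are illustrative assumptions, not regulatory rules:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ResidencyView:
    source: str           # hypothetical labels such as "CRS", "FATCA", "AML"
    countries: frozenset  # ISO 3166-1 alpha-2 codes this system holds
    as_of: date           # when the declaration or flag was last refreshed


def residency_findings(views, today, max_age_days=3 * 365):
    """Return a list of findings for one customer across source systems."""
    findings = []
    for view in views:
        # The staleness threshold is an assumption chosen for illustration
        if (today - view.as_of).days > max_age_days:
            findings.append(f"stale:{view.source}")
    # Any disagreement between systems is reported, not auto-resolved
    if len({view.countries for view in views}) > 1:
        sources = ",".join(sorted(view.source for view in views))
        findings.append(f"conflict:{sources}")
    return findings


views = [
    ResidencyView("CRS", frozenset({"DE", "TR"}), date(2016, 5, 1)),
    ResidencyView("FATCA", frozenset({"TR"}), date(2023, 1, 10)),
]
findings = residency_findings(views, today=date(2024, 1, 15))
```

Each finding then becomes a recorded decision to report or suppress, with a named owner, which is what makes it defensible in an audit.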
The TIN format trap
Each jurisdiction has its own TIN format. The US wants a nine-digit SSN, ITIN, or EIN. Turkey wants an 11-digit T.C. Kimlik No or a 10-digit VKN. Germany has its own 11-digit tax identification number. The UK has NINOs and UTRs, which are different identifiers with different formats. Format is validated at submission, and the local tax authority's portal can reject the entire XML file if even one record fails validation.
The pipeline therefore needs jurisdiction-aware validation before submission. Most institutions discover this the hard way, on the day of the first submission, when the file comes back rejected and somebody has to manually identify which of 40,000 records is malformed. After that experience, validation gets pushed upstream. After the second year, validation gets pushed into onboarding. After the third year, somebody finally argues for it to be a hard stop in the CRM, and gets overruled because it would slow down account opening.
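A minimal sketch of what jurisdiction-aware validation can look like when it runs before submission. It covers only the US and Turkish lengths mentioned above and checks shape only; check digits, reserved ranges, and every other country are deliberately left out, and the names `TIN_PATTERNS`, `validate_tin`, and `failing_records` are hypothetical.

```python
import re

# Shape-only rules (assumption: real rules also include check digits)
TIN_PATTERNS = {
    "US": [re.compile(r"\d{9}")],    # SSN, ITIN, or EIN: nine digits
    "TR": [re.compile(r"\d{11}"),    # T.C. Kimlik No: 11 digits
           re.compile(r"\d{10}")],   # VKN: 10 digits
}


def normalise(tin):
    """Strip spaces and hyphens so '123-45-6789' and '123456789' compare equal."""
    return re.sub(r"[\s\-]", "", tin or "").upper()


def validate_tin(country, tin):
    """Return (ok, reason) for one TIN in one jurisdiction."""
    value = normalise(tin)
    if not value:
        return False, "missing"
    patterns = TIN_PATTERNS.get(country)
    if patterns is None:
        # No rule is treated as a failure, never as a silent pass
        return False, "no_rule_for_country"
    if any(p.fullmatch(value) for p in patterns):
        return True, "ok"
    return False, "bad_format"


def failing_records(records):
    """List (record id, reason) for every record that would fail."""
    failures = []
    for record in records:
        ok, reason = validate_tin(record["country"], record.get("tin"))
        if not ok:
            failures.append((record["id"], reason))
    return failures
```

Running `failing_records` over the full extract before the file is built turns "which of 40,000 records is malformed" into a list with a reason code per record, and the same `validate_tin` function can be reused at onboarding.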
Controlling persons and the entity unwind
For passive non-financial entities, you have to report the controlling persons. This means the pipeline has to traverse a graph from the entity to its ultimate beneficial owners, each with their own residencies, TINs, and classification. Most KYC systems store UBO data as a flat list with free-text fields. Turning that into a clean controlling-persons block in the CRS XML is a non-trivial transformation. It is also exactly where data quality issues compound, because an error in the entity record is repeated in every controlling person reported under it.
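The traversal itself can be sketched as a breadth-first walk over ownership edges. This assumes the flat UBO list has already been normalised into two hypothetical lookups (`owners` and `parties`), and it ignores ownership thresholds and control by other means, both of which a real implementation would have to handle:

```python
from collections import deque


def controlling_persons(root_entity, owners, parties):
    """Walk ownership edges upward and collect the natural persons reached.

    owners:  dict of entity id -> list of owner party ids (hypothetical shape)
    parties: dict of party id -> {"kind": "individual" or "entity"}
    Returns (persons, unresolved); unresolved holds owner ids with no
    party record, which are data quality gaps to chase.
    """
    seen = {root_entity}
    queue = deque([root_entity])
    persons, unresolved = [], []
    while queue:
        current = queue.popleft()
        for owner_id in owners.get(current, []):
            if owner_id in seen:
                continue  # guards against circular ownership structures
            seen.add(owner_id)
            party = parties.get(owner_id)
            if party is None:
                unresolved.append(owner_id)
            elif party["kind"] == "individual":
                persons.append(owner_id)
            else:
                queue.append(owner_id)  # intermediate entity: keep unwinding
    return persons, unresolved


owners = {"E1": ["E2", "P1"], "E2": ["P2", "E1", "X9"]}
parties = {
    "E1": {"kind": "entity"},
    "E2": {"kind": "entity"},
    "P1": {"kind": "individual"},
    "P2": {"kind": "individual"},
}
persons, unresolved = controlling_persons("E1", owners, parties)
```

In this example `E2` points back at `E1`, which the `seen` set absorbs, and `X9` has no party record, so it comes back as unresolved instead of being dropped.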
What the pipeline actually has to do
If you strip away the compliance language, the FATCA/CRS pipeline is doing the following:
- Ingest from heterogeneous source systems with conflicting customer keys
- Resolve those keys into a single reportable customer entity
- Enrich with tax residency, classification, and TIN data, applying indicia rules where self-certification is missing
- Validate against jurisdiction-specific format rules
- Aggregate financial data with correct year-end semantics and FX conversion
- Traverse entity structures for controlling persons
- Produce a schema-valid XML file per jurisdiction
- Maintain a defensible audit trail for every classification decision
- Reproduce prior-year submissions exactly, including corrections, for at least the local retention period
That last point is the one compliance never asks about until an audit lands. The pipeline has to be deterministic and reproducible across years, even as upstream systems change schema, even as customers update their self-certifications, even as the legal entity structure of the bank itself reorganises.
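A small sketch of the kind of record that makes reproduction checkable: a manifest stored next to each submission, holding a hash of the exact bytes sent plus the versions that produced them. The field names and the idea of an `input_snapshot_id` are assumptions for illustration.

```python
import hashlib
import json


def submission_manifest(period, xml_bytes, schema_version,
                        ruleset_version, input_snapshot_id):
    """Build a JSON manifest describing exactly what was submitted."""
    manifest = {
        "period": period,
        "schema_version": schema_version,
        "ruleset_version": ruleset_version,
        "input_snapshot_id": input_snapshot_id,  # hypothetical snapshot key
        "xml_sha256": hashlib.sha256(xml_bytes).hexdigest(),
    }
    # sort_keys keeps the manifest itself byte-stable across runs
    return json.dumps(manifest, sort_keys=True)


def matches_submission(manifest_json, regenerated_xml):
    """True if a regenerated file is byte-identical to the one submitted."""
    manifest = json.loads(manifest_json)
    return manifest["xml_sha256"] == hashlib.sha256(regenerated_xml).hexdigest()


manifest = submission_manifest("2023", b"<report/>", "2.0", "rules-7", "snap-2024-01-15")
```

Byte-identical regeneration only works if run-specific values such as message identifiers and timestamps are stored and replayed rather than generated fresh, so those would need to live in the snapshot too.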
Why this matters operationally
The institutions that treat FATCA/CRS as a reporting obligation end up with a fragile annual project. Every January, a team scrambles to extract, reconcile, classify, validate, and submit. Errors get patched in flight. Suppressions get applied informally. The XML goes out, the regulator accepts it, and everyone forgets about it until next year.
The institutions that treat it as a data quality problem invest differently. They push tax residency capture into onboarding as a structured, validated field. They reconcile customer keys continuously, not annually. They treat TIN format validation as a CRM concern, not a reporting concern. They run the pipeline monthly in shadow mode so that the annual submission is the same code path that has been running silently for eleven months.
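Shadow mode is only useful if somebody looks at what changed between runs. A minimal sketch of a run-to-run diff, assuming each run is reduced to a dictionary keyed by account id with a small dictionary of classification fields per account (both shapes are hypothetical):

```python
def diff_runs(previous, current):
    """Compare two pipeline runs and report what moved between them."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    changed = {}
    for account in sorted(set(previous) & set(current)):
        before, after = previous[account], current[account]
        # Collect every field whose value differs, including fields
        # present in only one of the two runs
        fields = sorted(f for f in set(before) | set(after)
                        if before.get(f) != after.get(f))
        if fields:
            changed[account] = fields
    return {"added": added, "removed": removed, "changed": changed}


december = {
    "A1": {"residency": "TR", "reportable": False},
    "A2": {"residency": "DE", "reportable": True},
}
january = {
    "A1": {"residency": "DE", "reportable": True},
    "A3": {"residency": "US", "reportable": True},
}
drift = diff_runs(december, january)
```

An account that quietly drops out of the reportable set in March is a one-line diff in March and an audit finding if it is first noticed the following January.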
The difference between these two approaches is not visible on the compliance slide deck. It is visible in the cost of corrections, in the audit findings, in the operational risk capital, and in how many people you need on call in January.
What I tell teams building this
A few practical points I've consistently seen pay off:
- Make tax residency a first-class field in the source system, structured, validated, and dated, not a free-text attribute hanging off the customer record
- Validate TIN format at the point of capture, with jurisdiction-specific rules, and treat a validation failure as missing data rather than valid data
- Treat indicia-based classification as a separate, versioned ruleset, not as logic buried in an ETL job
- Keep the controlling-persons graph in a real graph model or at least a properly normalised relational structure, not flattened into the customer table
- Run the full pipeline monthly, not annually, and diff the output so that drift is visible early
- Store the exact submitted XML, schema version, and classification logic version for every reporting period; you will need to reproduce it
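To illustrate the point about indicia rules, here is a sketch of a ruleset kept as versioned data, separate from any ETL job. The two rules are simplified examples of US indicia, not the full list, and `RULESET_VERSION`, the rule ids, and the customer field names are all hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

RULESET_VERSION = "2024.1"  # hypothetical; bump on every rule change


@dataclass(frozen=True)
class IndiciaRule:
    rule_id: str
    predicate: Callable[[dict], bool]


# Simplified illustrative rules, deliberately incomplete
RULES = (
    IndiciaRule("US_BIRTHPLACE", lambda c: c.get("birth_country") == "US"),
    IndiciaRule("US_ADDRESS", lambda c: c.get("address_country") == "US"),
)


def evaluate_indicia(customer):
    """Return which rules fired, stamped with the ruleset version used."""
    hits = [rule.rule_id for rule in RULES if rule.predicate(customer)]
    return {"ruleset_version": RULESET_VERSION, "indicia": hits}


result = evaluate_indicia({"birth_country": "US", "address_country": "TR"})
```

Because every result carries the version that produced it, a classification from three years ago can be explained by pointing at the rules as they stood then, not as they stand now.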
FATCA and CRS will not get simpler. New jurisdictions keep joining CRS. The OECD keeps refining the schema. Crypto reporting under CARF is coming on top of all of this, with the same structural problems and a fresh set of identifier headaches. The institutions that built proper pipelines will absorb CARF as an incremental change. The institutions that built annual scramble projects will start the scramble all over again.
The checkbox and the pipeline are the same conversation. The people who own them just haven't been having it.