Industry Notes 3 July 2026

Your data moat is not your source of truth

'Data moat' is one of the most repeated phrases in AI strategy and one of the least examined. A moat is real and worth building — but it is not where your truth lives. The systems your data came from are. Here is what a data moat actually is, what it is not, and how to use one without quietly breaking your own architecture.

We sat in on a planning meeting last year where a number on a dashboard disagreed with the number in the system it had come from. The dashboard was the company's pride — the place everything had finally been brought together, the thing several people had taken to calling "our single source of truth." The figure it showed was wrong. The accounting system it had been built from was right. And for a few minutes the room genuinely debated which one to believe, as though the question were open.

It wasn't open. The accounting system created that number; the dashboard had copied it and mangled it somewhere in the pipe. But the language everyone had adopted — single source of truth — had quietly inverted the architecture in their heads. The place where the data had gathered had been promoted, in everyone's imagination, to the place where the data was true. Those are not the same place, and confusing them is more expensive than it looks.

This post is about a phrase adjacent to that confusion: the "data moat." It is one of the most repeated ideas in AI strategy and one of the least examined. It turns up in pitch decks and board papers as a full stop rather than a starting point — our data is our moat — and everyone nods, because it sounds both true and reassuring. Two different mistakes hide inside that nod. The first is overestimating the moat. The second, the one that breaks systems quietly, is mistaking the moat for the source of truth. We'll take both, but mostly the second, because it is the one that turns a useful idea into an architectural error you pay for later.

The moat is real — and weaker than the slogan

Start with the uncomfortable part, because it sets up everything else. The strongest version of the "data is our moat" claim does not survive much scrutiny.

The clearest takedown is still the one Andreessen Horowitz's Martin Casado and Peter Lauten wrote back in 2019, The Empty Promise of Data Moats. Their argument, aimed at enterprise startups, is that most of what gets called a "data network effect" is really a data scale effect, and that even the scale effect is a weaker defence than founders assume. The economics run the wrong way: unlike a traditional economy of scale, where each additional unit gets cheaper, the cost of adding genuinely useful new data tends to rise while the value of each incremental record falls. Past a threshold, the moat stops widening and the competition catches up. Their sharper warning is strategic rather than technical — treating data as a magical moat misdirects you from the things that actually compound defensibility, like deep workflow integration, switching costs and distribution.

Andreessen Horowitz's later piece on the economics of AI businesses pushed the point further: the moats around AI products look shallower than many expected, and AI is largely a pass-through to the underlying product and data. The arrival of strong foundation models has, if anything, strengthened the original argument: a generic model bootstrapped on public data now closes a lot of the gap that a proprietary corpus used to open. A 2025 survey of the debate put it plainly: many supposed data moats are weaker, more porous, or more illusory than their owners believe, because companies confuse data possession with data leverage.

None of this means data is worthless as an advantage. It means the advantage is specific and conditional, not automatic. We've written before that your data is your moat — and we stand by it — but the claim only holds for a particular kind of data used in a particular way. The frontier model your competitor can rent is the same one you can rent. What it cannot rent is your first-party operational history: your taxonomy, your contracts, your support transcripts, the resolved record of who your customers actually are and what they actually did. That data is unique, it compounds, and it is hard to copy. The pile of it is not the moat. What you have done to make it usable — and whether it is woven into a decision someone actually makes — is the moat.

Which brings us to the more interesting mistake. Because once a team has accepted that their data matters, the next thing they tend to do with it is exactly the wrong thing.

What a data moat is not: the source of truth

Here is the line that matters, and it is worth drawing carefully because the vocabulary in this area is slippery.

"Source of truth" is an overloaded term, and the slipperiness is the problem. In the careful version of the language — IBM's data-management writing is a good reference here — a system of record is the authoritative source for a piece of data within its domain: the place a fact is created, owned, updated and deleted, and the place an auditor points when they want to know what is real. A customer's details are mastered in the CRM; an invoice is mastered in the accounting system; a shipment is mastered in the operations tool. A source of truth, by contrast, is precisely the layer that aggregates several systems of record into one harmonised view so you can answer questions that span them.

Read those two definitions back to back and the danger announces itself. A source of truth is, by definition, an aggregation — but the phrase gets heard as an origin. The moment a team starts calling its warehouse "the single source of truth," the words do something subtle and corrosive: they promote a downstream, derived, integrated copy into the imagined place where truth is made. That is the inversion we watched play out in the meeting at the top of this post.

The architectural reality has been settled since the field was named. Bill Inmon's 1990 definition of a data warehouse, still the one everyone quotes, describes it as "subject-oriented, integrated, time-variant and non-volatile." Three of those four words carry the whole argument. Integrated means data from disparate systems reconciled into a consistent shape, with the naming conflicts and unit mismatches resolved. Subject-oriented means the data is organised around the things it describes (customers, products, shipments) rather than around the systems it came from. A warehouse, a lakehouse, a data hub, whatever you call the convergence layer, is by construction a collection assembled from somewhere else. Non-volatile means that once data lands, it is not supposed to be edited in place, because it is a reflection of what the source systems recorded, not an independent ledger that authors its own facts. It is downstream of the truth, not the truth.

So the moat — the place where your data comes together — is never a system of record. It does not own a single fact. And whatever you choose to call it, it must never become the place you believe truth is decided. There is a clean test for whether a team has this right, and it is the question the meeting failed: when the convergence layer and the source system disagree, which one wins? If the honest answer is "the warehouse," the architecture has been inverted in someone's head. The source system wins. The disagreement is not a philosophical puzzle; it is a pipeline bug, and the fix is in the pipe, not in a debate about which screen to trust.
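The "source wins" test can be made mechanical. What follows is a minimal sketch, not a description of any particular tool: the function name `reconcile`, the invoice numbers and the amounts are all hypothetical. It compares the convergence layer's figures against the system of record and reports every mismatch as a pipeline defect, with the source value always recorded as the expected one.

```python
from decimal import Decimal


def reconcile(source: dict[str, Decimal], warehouse: dict[str, Decimal]) -> list[dict]:
    """Compare warehouse figures to the system of record, key by key.

    Every mismatch is reported as a defect; the source value is always
    the expected one, and the warehouse value is what was observed.
    """
    defects = []
    for key in sorted(source.keys() | warehouse.keys()):
        expected = source.get(key)     # what the system of record says
        observed = warehouse.get(key)  # what the convergence layer shows
        if expected != observed:
            defects.append({"key": key, "expected": expected, "observed": observed})
    return defects


# Hypothetical invoice totals, keyed by invoice number.
accounting = {"INV-001": Decimal("1200.00"), "INV-002": Decimal("450.50")}
dashboard = {"INV-001": Decimal("1200.00"), "INV-002": Decimal("455.00")}

defects = reconcile(accounting, dashboard)
for d in defects:
    print(f"{d['key']}: warehouse shows {d['observed']}, source says {d['expected']}; fix the pipeline")
```

Note what the sketch never does: it has no branch in which the warehouse value overrides the source. A record missing from either side shows up as `None` and is reported the same way.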

The vendors who have thought hardest about this say it more bluntly than we would dare to. Salesforce's own architects, describing the unified customer profile in Data Cloud, are at pains to point out that it does not replace your mastered data — it links fragmented records together and harmonises them into a shared model, deliberately honouring the underlying systems of record rather than overwriting them. They go as far as framing the fact that the unified profile is not a golden record as a feature, not a shortcoming, precisely because it sidesteps the survivorship conflicts and heavy cleansing you inherit the moment you try to make the convergence layer the master. That is the right instinct. The convergence layer's superpower is that it connects truth without pretending to be it.

Two honest qualifications, because this is not a religious position.

First, there is a legitimate counter-view, argued well by teams like Pocus, that for certain data the warehouse genuinely should become the system of record. Product-usage and behavioural events, for instance, are often born in the pipeline and have no prior owning system, and some organisations deliberately master derived or reference data in the warehouse through a proper governance process. That is fine. The discipline is not "the convergence layer may never master anything." It is: be deliberate and explicit about the small set of data your moat actually masters, and honest that everything else it merely aggregates. The failure mode is not mastering data on purpose; it is mastering it by accident, by letting a slogan make the decision for you.

Second, we have argued in a previous post that a company badly needs one place where "revenue" is defined once and everyone queries the same number — and that is true and worth doing. It is worth being precise about what that single source of truth actually is. It is a single source of truth for definitions and for the integrated view you query — the semantic layer where each metric is written down once. It is not a claim that the underlying facts are born in the warehouse. The facts still originate in the systems of record. The convergence layer is the meeting point. It is not the origin. Hold both of those at once and the two posts agree; collapse them and you get the meeting at the top of this one.

What a data moat is for: relate, inspect, interact

If the convergence layer is not where truth lives, what is it actually for? The positive definition is more useful than the warning, and it is where the real value sits.

A convergence layer earns its place by doing three things a single source system cannot, and that no amount of model cleverness substitutes for.

It relates. The first and least glamorous job is resolving identity. In most mid-sized businesses, "Acme Pty Ltd", "Acme (Pty) Ltd" and "ACME" are three rows in three systems, and any number that counts customers or sums their spend is built on sand until something decides they are the same entity. This is the work the entity-resolution and master-data literature has been refining for decades — matching, merging and survivorship to produce a reconciled record, then, increasingly, discovering and tracking the relationships between records so the result is less a tidy list and more a living graph that connects the dots across every source. Once entities are resolved and their relationships are declared, you can walk them: trace a defective batch through to every customer who received it, roll a project's costs up through its work packages, surface the shared phone number that links three loan applications. None of those questions can be answered inside any one source system, because the answer lives in the seams between systems. The convergence layer is where the seams get stitched. This is the part of the moat that genuinely compounds — not the raw data, but the resolved, related structure you have built over it.
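To make the "Acme" example concrete, here is a deliberately naive sketch of the matching step. It is an illustration under stated assumptions, not how production entity resolution works: real engines use fuzzy matching, several attributes and survivorship rules, whereas this one only groups records whose normalised names are identical. The suffix list, the record ids and the function names are all hypothetical.

```python
import re
from collections import defaultdict

# Assumed list of legal-form tokens to ignore when matching; extend per jurisdiction.
LEGAL_SUFFIXES = {"pty", "ltd", "limited", "inc", "llc"}


def match_key(name: str) -> str:
    """Reduce a company name to a crude matching key."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def resolve(records: list[dict]) -> dict[str, list[dict]]:
    """Group records that share a matching key. Source rows are left untouched."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[match_key(rec["name"])].append(rec)
    return dict(clusters)


# Hypothetical rows from three systems of record.
records = [
    {"system": "crm", "id": "C-17", "name": "Acme Pty Ltd"},
    {"system": "accounting", "id": "A-204", "name": "Acme (Pty) Ltd"},
    {"system": "support", "id": "S-9", "name": "ACME"},
    {"system": "crm", "id": "C-18", "name": "Apex Ltd"},
]

clusters = resolve(records)
for key, members in clusters.items():
    print(key, [(m["system"], m["id"]) for m in members])
```

The design choice worth noticing is that the output is a set of links between source records, not a rewritten master row. Each system keeps its own spelling; the convergence layer only records that the three rows refer to the same entity.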

It inspects. Because the layer is integrated and read-oriented, it lets you ask questions across domains that no operational tool was built to answer. IBM's stock example is the right one: "is this customer profitable?" is unanswerable in the CRM alone, or the accounting system alone, or the support desk alone — it requires all three, reconciled against one definition of "customer". The convergence layer is the only place that question has a home. That is its reason to exist: not to store the data more authoritatively than the source, but to make the cross-system question askable at all.
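A small sketch of why the profitability question needs all three systems. It assumes identities have already been resolved into a crosswalk from each system's local id to one shared entity; the crosswalk, the extracts, the flat support rate and the function name are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical crosswalk from (system, local id) to a resolved entity.
crosswalk = {
    ("crm", "C-17"): "acme",
    ("accounting", "A-204"): "acme",
    ("support", "S-9"): "acme",
}

# Hypothetical extracts from each system of record.
crm_accounts = {"C-17": {"name": "Acme Pty Ltd"}}
invoices = [{"customer": "A-204", "amount": 5000.0}, {"customer": "A-204", "amount": 2500.0}]
tickets = [{"customer": "S-9", "hours": 12.0}, {"customer": "S-9", "hours": 8.0}]
SUPPORT_COST_PER_HOUR = 150.0  # assumed flat rate, for illustration only


def local_ids(system: str, entity: str) -> set[str]:
    """Ids that a given system uses for the resolved entity."""
    return {lid for (sys_name, lid), ent in crosswalk.items()
            if sys_name == system and ent == entity}


def customer_profit(entity: str) -> dict:
    """Answer a question that no single source system can answer alone."""
    acct_ids = local_ids("accounting", entity)
    support_ids = local_ids("support", entity)
    revenue = sum(i["amount"] for i in invoices if i["customer"] in acct_ids)
    hours = sum(t["hours"] for t in tickets if t["customer"] in support_ids)
    cost = hours * SUPPORT_COST_PER_HOUR
    names = sorted(crm_accounts[c]["name"] for c in local_ids("crm", entity))
    return {"entity": entity, "crm_names": names, "revenue": revenue,
            "support_cost": cost, "margin": revenue - cost}


report = customer_profit("acme")
print(report)
```

Every input is still read from its owning system. The only thing the convergence layer contributes is the crosswalk that lets the three be summed against one definition of "customer".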

It interacts. This is the job that has changed most in the last two years, and it is the one that matters most for anyone putting AI in front of their business. The newest purpose of a convergence layer is to make the data legible to the AI stack. When SAP acquired the master-data company Reltio earlier this year, the rationale its executives gave was unusually candid for an acquisition announcement: AI, they said, cannot reach its full potential when data is fragmented across platforms and domains without connection or context — and the value of unifying it is to deliver the context that business AI requires. That is exactly right, and it is the same instinct behind the emerging practitioner idea of a "golden context" — master data reshaped from a record built for human eyes into a structure an agent can ground against. A model handed your raw tables has to guess your business logic from column names; a model handed a resolved, related, well-described model of your data can answer against facts it did not invent.

This is the thread that ties back to two principles we keep returning to. We've argued that your AI should never be the source of truth — that the model is a brilliant reader and a fluent writer but not your system of record, and that the deterministic check which confirms an answer must sit outside the model. The convergence layer is where that discipline becomes buildable. The pattern we put into production for clients reflects it directly: the language model is allowed to interpret a plainly-worded question and map it to the data, but a deterministic step confirms that the query it produced actually answers the question before any data is touched, and the figures themselves come from the source, computed, never authored by the model. The model narrates a number it did not make up. That is only possible because there is a structured, governed layer underneath it for the model to be grounded against — and that layer is the moat doing its real job. Not being the truth. Making the truth usable, inspectable, and safe to put an AI on top of.
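The shape of that pattern can be sketched in a few lines. This is not the production implementation described above, and it is a much weaker check than one that confirms a query truly answers the question: here the deterministic gate only verifies that a proposed query stays inside a governed catalogue. The catalogue, the `QuerySpec` type and the sample rows are hypothetical, and the model's output is replaced by a hard-coded stand-in.

```python
from dataclasses import dataclass

# Hypothetical governed catalogue: each metric is defined once,
# together with the dimensions it may be grouped by.
METRICS = {
    "revenue": {"column": "amount", "dimensions": {"region", "month"}},
}


@dataclass(frozen=True)
class QuerySpec:
    metric: str
    group_by: str


def validate(spec: QuerySpec) -> None:
    """Deterministic gate: reject anything outside the catalogue before data is touched."""
    if spec.metric not in METRICS:
        raise ValueError(f"unknown metric: {spec.metric}")
    if spec.group_by not in METRICS[spec.metric]["dimensions"]:
        raise ValueError(f"{spec.metric} cannot be grouped by {spec.group_by}")


def execute(spec: QuerySpec, rows: list[dict]) -> dict[str, float]:
    """Compute the figures from the rows; the model never supplies a number."""
    validate(spec)
    column = METRICS[spec.metric]["column"]
    totals: dict[str, float] = {}
    for row in rows:
        group = row[spec.group_by]
        totals[group] = totals.get(group, 0.0) + row[column]
    return totals


# Stand-in for the model's output: in practice a language model would map
# a plainly-worded question to a spec like this one.
proposed = QuerySpec(metric="revenue", group_by="region")
rows = [
    {"region": "north", "month": "2026-05", "amount": 100.0},
    {"region": "south", "month": "2026-05", "amount": 40.0},
    {"region": "north", "month": "2026-06", "amount": 60.0},
]

result = execute(proposed, rows)
print(result)
```

The model's only contribution is the `QuerySpec`. The numbers in `result` are computed from the rows, so a wrong interpretation fails loudly at `validate` or produces a figure that can be traced, rather than a plausible number the model made up.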

How to use a data moat well

If you take the operating principles and strip the theory away, using a data moat well comes down to a short list. None of it is exotic. All of it is the difference between a moat that compounds and a moat that quietly becomes a liability.

  • Keep the systems of record authoritative. The moat reads from them; it does not overrule them. When the moat and a source system disagree, the source wins and you fix the pipeline. The convergence layer is allowed to be a source of truth in the precise sense — the aggregated view you query — only if everyone understands that means "the place we go to ask," never "the place the facts are born."

  • Integrate to relate, not to master. The convergence layer's job is to resolve identity and harmonise definitions so data can be related and inspected across systems. The handful of things it does master — derived data, reference data, behavioural events with no prior owner — should be a deliberate, governed decision, not an accident of language.

  • Make every figure trace home. If you cannot point at any number in the moat and say which system produced it and how it was computed, you do not yet have a source of truth — you have a confident screen. Lineage is what survives an audit, and increasingly what survives a regulator. It is the same reproducibility discipline that keeps a model out of your decisions, applied one layer down.

  • Build the moat for machines, too. A resolved entity model and a well-described schema that an AI can ground against is now part of the moat, not a nice-to-have bolted on afterwards. The context you create is what lets a model answer with your facts instead of its guesses.

  • Don't mistake the pile for the moat. Volume is not defensibility; the a16z critique has only aged into greater relevance. The defensible asset is the resolved, related, queryable structure you've built over your data and the domain logic on top of it — and, above all, whether it is wired into a decision someone is accountable for. We've made the case at length that the unit of value is the decision, not the data; a moat that improves no decision is a swamp with good lighting.

  • Rent the plumbing; own the moat. The convergence layer's machinery — the pipelines, the storage, the resolution engine, the governance scaffolding — is operational infrastructure that looks identical across every company that needs it, and we've argued you should buy the boring and build the unique. The moat is not the warehouse software. It is your data and the logic you wrap around it. Spend your engineering on that, not on rebuilding plumbing the rest of the industry has already solved.
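Of the items above, "make every figure trace home" is the easiest to show in code. One way to approach it, sketched here with hypothetical names and sample invoices, is to refuse to pass bare numbers around and instead carry each figure together with the system that produced its inputs, the records that fed it and a description of the computation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Figure:
    name: str
    value: float
    source_system: str           # system of record the inputs came from
    source_ids: tuple[str, ...]  # records that fed the computation
    computation: str             # human-readable description of the derivation


def total_invoiced(invoices: list[dict]) -> Figure:
    """Sum invoice amounts and keep the lineage alongside the result."""
    return Figure(
        name="total_invoiced",
        value=sum(i["amount"] for i in invoices),
        source_system="accounting",
        source_ids=tuple(i["id"] for i in invoices),
        computation="sum(amount) over the listed invoices",
    )


# Hypothetical invoices read from the accounting system.
invoices = [{"id": "INV-001", "amount": 1200.0}, {"id": "INV-002", "amount": 450.5}]

figure = total_invoiced(invoices)
print(f"{figure.name} = {figure.value} from {figure.source_system} {figure.source_ids}")
```

A real lineage system would record this at the pipeline level rather than per value, but the test is the same: given any number on a screen, can you name the system and the records behind it?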

The shape of it, plainly

A data moat is real, and it is worth building, and it is almost never the thing the slogan says it is. It is not the volume of data you have accumulated — that advantage erodes faster than anyone selling it admits. And it is not your source of truth — that role belongs, permanently, to the systems where your data was created and is maintained. The moat is the place where the data from those systems comes together: where identities are resolved, relationships are made walkable, cross-system questions become askable, and the whole estate is shaped into something an AI can reason over without being trusted to author the facts.

Picture it as a confluence — the point where separate rivers meet. The meeting point is genuinely valuable; it is where the water becomes navigable, where you can finally see the whole flow at once. But the rivers still rise somewhere upstream, and no one who understands a river confuses the confluence with the spring.

The companies pulling ahead on AI in 2026 are not the ones with the deepest pile or the grandest "single source of truth" on a slide. They are the ones who know exactly where their truth lives, kept those systems authoritative, and built a moat that makes that truth usable — related, inspectable, and safe to put a model in front of — without ever once pretending to be it. Get that distinction right, and the moat compounds. Get it wrong, and you end up in a meeting, arguing about which screen to believe.


References

  1. Andreessen Horowitz — Martin Casado & Peter Lauten, The Empty Promise of Data Moats (2019). https://a16z.com/the-empty-promise-of-data-moats/
  2. Andreessen Horowitz — The New Business of AI and How It's Different From Traditional Software. https://a16z.com/the-new-business-of-ai-and-how-its-different-from-traditional-software/
  3. V7 Labs — Are Data Moats Dead in the Age of AI? A Guide to Data Moats (December 2025). https://www.v7labs.com/blog/data-moats-a-guide
  4. IBM — System of Record vs. Source of Truth: What's the Difference? https://www.ibm.com/think/topics/system-of-record-vs-source-of-truth
  5. Dataversity — The Data Warehouse: From the Past to the Present (Bill Inmon's definition of the data warehouse). https://www.dataversity.net/articles/data-warehouse-past-present/
  6. Salesforce Admins — Rethinking the Golden Record: The Advantages of Data Cloud's Unified Profile (2025). https://admin.salesforce.com/blog/2025/rethinking-golden-record-advantages-of-data-cloud-unified-profile
  7. Senzing — What Is Entity Resolution? How It Works and Why It Matters. https://senzing.com/what-is-entity-resolution/
  8. Semarchy — What is Golden Data in Master Data Management? https://semarchy.com/blog/what-are-golden-data-records/
  9. AI Magazine — Why is SAP Acquiring Master Data Management Company Reltio? (2026). https://aimagazine.com/news/why-is-sap-acquiring-master-data-management-company-reltio
  10. Tahir Khan — From Golden Record to Golden Context: Redefining Master Data for AI Agent Consumption (2026), citing DAMA International (2017). https://medium.com/@Tahir-Khan/from-golden-record-to-golden-context-redefining-master-data-for-ai-agent-consumption-2349eec0840b
  11. Pocus — Is the Data Warehouse the New System of Record? https://www.pocus.com/blog/is-the-data-warehouse-the-new-system-of-record
  12. Aadi Manchanda — The Myth of the Single Source of Truth: What Teams Get Wrong About Data Architecture (2025). https://medium.com/@aadi.manchanda/the-myth-of-the-single-source-of-truth-what-teams-get-wrong-about-data-architecture-622098e90ba8

Written by JP Dippenaar, Sixees Labs. Last reviewed June 2026.

Co-founder of Sixees Labs. Engineer and systems thinker focused on shipping AI that actually works in production.
