Why incumbents quietly won enterprise AI while everyone watched the foundation models

The Direct Message

Tension: Enterprise AI’s winners were supposed to be the companies with the best foundation models. Instead, the advantage is accruing to incumbents with boring operational data and expert workforces they barely knew how to value.

Noise: The industry narrative still fixates on model capability and AI-native startup speed. That narrative obscures where defensibility actually lives: in instrumentation, feedback loops, and the conversion of expert judgment into training signal.

Direct Message: The competitive question is no longer whether your organization can access a capable model. It is whether your daily operations are writing themselves down in a form the machine can learn from — because that is the only asset that compounds.

Every DMNews article follows The Direct Message methodology.

The foundation model is not the moat. For three years, the assumption inside most boardrooms was that whoever had the smartest base model would win enterprise AI. That assumption has quietly collapsed. Incumbents in service-heavy industries — health insurers, logistics operators, legal services firms — hold three compounding assets that AI-native startups cannot replicate on speed alone: proprietary operational data, a workforce of domain experts, and accumulated tacit knowledge encoded in millions of past decisions.

Model interchangeability is rising; OpenAI, Anthropic, and Google are converging on capability benchmarks faster than most procurement teams can evaluate them. The real advantage sits one layer up, inside the plumbing nobody wanted to build: the data capture, the feedback loops, the governance scaffolding, the way an organization turns every expert decision into a signal the machine can learn from.

Consider what that means in practice. A large health insurer processing 200,000 prior authorization cases weekly, capturing six to ten decision points per case, generates over a million labeled examples every seven days. No separate data-collection program. No synthetic training runs. No scraping. The work itself becomes the training signal.
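The volume claim above is simple arithmetic, and it checks out. A back-of-envelope sketch using the article's illustrative figures (these are not real insurer numbers):

```python
# Back-of-envelope check of the labeling volume described above.
# All figures are the article's illustrative numbers, not real insurer data.
cases_per_week = 200_000
decision_points_low, decision_points_high = 6, 10

labels_low = cases_per_week * decision_points_low    # lower bound per week
labels_high = cases_per_week * decision_points_high  # upper bound per week

print(f"Labeled examples per week: {labels_low:,} to {labels_high:,}")
```

Even at the low end of six decision points per case, the operation clears a million labeled examples a week with room to spare.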

Andreessen Horowitz partner Vijay Pande has described the shift as an inversion. The traditional stack was human-then-software-then-decision: a dispatcher looked at a screen, software surfaced options, the human chose. The new stack is AI-executes-then-humans-adjudicate. The model handles routine cases end-to-end. Dispatchers now spend their time on exceptions, and every exception decision gets captured as a teaching moment for the next iteration.
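The inverted stack can be sketched in a few lines. This is a hypothetical illustration, not a real dispatch system: the names (`score_case`, `CONFIDENCE_THRESHOLD`) and the stand-in model are invented for the sketch. The structural point is that the human only touches exceptions, and every exception decision lands in a training log:

```python
# Hypothetical sketch of the "AI executes, humans adjudicate" loop.
# score_case and CONFIDENCE_THRESHOLD are illustrative, not a real API.

CONFIDENCE_THRESHOLD = 0.9
training_log = []  # every human adjudication becomes a labeled example

def score_case(case):
    # Stand-in for a commodity model call; returns (decision, confidence).
    return ("approve", 0.95) if case.get("routine") else ("review", 0.4)

def handle_case(case, human_adjudicate):
    decision, confidence = score_case(case)
    if confidence >= CONFIDENCE_THRESHOLD:
        return decision                # model handles routine cases end-to-end
    label = human_adjudicate(case)     # expert handles the exception...
    training_log.append((case, label)) # ...and the decision is captured as signal
    return label
```

The design choice worth noticing is that the capture step is not a separate program. It sits inline in the exception path, which is exactly the instrumentation point incumbents already own.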


Organizations building this approach did not build better models. They built better instrumentation around commodity models. Permissions. Evaluation harnesses. Audit trails. Change management protocols that let them push model updates without breaking downstream systems. The work looked unglamorous. The results were not.

This is where AI-native challengers tend to stumble. They arrive with architectural speed and no context. They can ship a prototype in six weeks. They cannot ship a system that understands why a particular denial code means something different in Ohio than it does in Texas, or why a certain shipment type requires a second signature in one warehouse and not another. That knowledge does not live in documentation. It lives in the heads of domain experts and in the case histories organizations have been generating for years.

The phrase "one layer up" is doing heavy lifting in the enterprise AI conversation. Incumbents do not have to win the race to the best model. They have to win the race to instrument their own operations before a challenger figures out how to substitute for them. Most of them are losing that race anyway, but for different reasons than the panic narrative suggests. The problem is rarely technical. It is organizational.

McKinsey’s 2025 survey on enterprise AI adoption found that fewer than 15 percent of organizations have moved beyond pilot stage — and the primary barrier cited was not model capability but internal data readiness. Leadership often wants the flashy pilot with an OpenAI or Anthropic logo on the slide deck. The flashy pilot produces a demo. The claims database, properly instrumented, would produce a flywheel.

The distinction matters because flywheels compound and demos do not. A model that improves every week with fresh labeled data pulls steadily away from a static deployment, even if the static deployment started from a stronger base. Over twelve months, the gap becomes structural. This is the logic that has driven companies like Palantir in government operations, Veeva in life sciences, and Epic in health records to focus obsessively on something akin to what machine-learning researchers call knowledge distillation: the systematic conversion of expert decisions into machine-readable signals.

The governance question is where most of these efforts break down. Capturing large volumes of labeled examples is a data-engineering problem. Deciding which of those examples should update the model, which should be held out for evaluation, and which reflect the idiosyncratic preferences of a single overworked reviewer — that is a judgment problem. Organizations that treat governance as a compliance checkbox end up with drift. Organizations that treat it as the actual product tend to end up with something defensible.
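The judgment problem described above can be made concrete as a routing decision per labeled example. The sketch below is hypothetical: the threshold values and the single-reviewer heuristic are invented to illustrate the shape of the policy, not a recommended implementation:

```python
import random

# Hypothetical governance filter: decide whether a labeled example updates
# the model, is held out for evaluation, or is quarantined because it may
# reflect one reviewer's idiosyncratic pattern. Thresholds are invented.

HOLDOUT_FRACTION = 0.1         # share of examples reserved for evaluation
MAX_SHARE_PER_REVIEWER = 0.3   # flag reviewers who dominate a label stream

def route_example(example, reviewer_share, rng=random.random):
    if reviewer_share > MAX_SHARE_PER_REVIEWER:
        return "quarantine"    # likely individual preference, not ground truth
    if rng() < HOLDOUT_FRACTION:
        return "holdout"       # reserved for evaluation, never trained on
    return "train"             # eligible to update the model
```

The point of the sketch is that the routing logic is trivially easy to write and hard to get right: the real work is deciding what the thresholds should be, which is a governance decision, not an engineering one.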

DMNews has previously covered how most companies migrating their data aren’t ready for what comes next, and the operating-layer thesis is the logical consequence. Migration without instrumentation produces a more expensive version of the old system. Migration with instrumentation produces something that improves on its own.


There is a second-order effect that deserves more attention than it gets. When an operating layer is built correctly, the humans inside it do not become redundant. They become more leveraged. Adjudicators now handle fewer cases but harder ones. Their judgment matters more, not less, because every decision they make is teaching the system. The people who used to be a cost center become the training signal. Compensation structures have not caught up to that reality yet, and that gap will produce real labor tension over the next two years.

The federal procurement story illustrates a related dynamic. When agencies negotiate access to frontier models, as seen in the recent reversal on classified model access, the interesting variable is rarely the model itself. It is whether the agency has the operational scaffolding to make the model useful. Most do not. The ones that do are, quietly, the ones nobody reads about.

Consider what a startup is actually competing against when it targets an incumbent in health claims, logistics, underwriting, legal review, or clinical operations. It is not competing against the incumbent’s current software. It is competing against the possibility that the incumbent wakes up, instruments its existing workflows, and turns twenty years of human judgment into a training corpus the startup can never assemble. The window for that awakening is narrow. It has not closed yet.

Mark Gingrich, CTO of Cohere Health, put it bluntly at HIMSS 2025: his company is not going to out-engineer well-funded AI-native competitors. It is going to out-data them. Every prior authorization its pharmacists have ever resolved is already a labeled example. The company just had not been treating it that way. With the right schema, Cohere Health now produces more usable training signal per week than most startup competitors' entire customer bases.

The uncomfortable implication for AI-native startups is that their speed advantage has a shelf life. The first eighteen months of a new category reward architectural cleanliness. The next sixty months reward data gravity. Foundation models keep commoditizing. Operating layers do not, because they are welded to the specific operational reality of a specific business.

There is a version of this story that has played out before, in other technology waves. Cloud computing was supposed to flatten the advantages of established software vendors. It did, for a while. Then the vendors who already sat inside enterprise workflows — SAP, Oracle, Salesforce — figured out how to extend those workflows into the cloud, and the flattening stopped. The same pattern is visible in how mobile conversion strategies matured once the novelty of the channel wore off and operational depth started mattering more than channel presence.

The pattern in AI will rhyme but not repeat. What is different this time is the speed of the compounding. A well-instrumented operation can improve measurably inside a single quarter. Mis-instrumented operations can degrade just as fast, because a feedback loop built on bad labels teaches the system to be confidently wrong. The tooling is powerful in both directions.

Organizations face a clear question: whether their AI will get smarter every week or stay frozen at the capability level of whatever vendor they signed with last quarter. One of those produces a durable business. The other produces a line item.

The companies that treat operational instrumentation as a strategic directive are building something. The companies that treat it as a slogan are buying demos. The distance between those two postures is the distance between the incumbents who survive this decade and the ones who spend it explaining why their pilot never scaled.

What every organization already has, and most are failing to use, is the daily record of its own expertise. The work has been happening the whole time. Someone just has to start writing it down in a format the machine can read.

Direct Message News

Direct Message News is the byline under which DMNews publishes its editorial output. Our team produces content across psychology, politics, culture, digital, analysis, and news, applying the Direct Message methodology of moving beyond surface takes to deliver real clarity. Articles reflect our team's collective editorial process (sourcing, drafting, fact-checking, editing, and review) rather than a single writer's work. DMNews takes editorial responsibility for content under this byline. For more on how we work, see our editorial standards.
