- Tension: Organizations invest heavily in data infrastructure while systematically neglecting the quality of what flows through it.
- Noise: The technology industry promotes ever-larger data platforms as solutions, obscuring that the core failure is maintenance discipline.
- Direct Message: The most expensive data problem in business has the most unglamorous fix: consistent, tedious hygiene work nobody volunteers for.
Across industries, a peculiar pattern has emerged in how organizations allocate their technology budgets. Marketing teams purchase sophisticated customer data platforms. Sales departments adopt CRM systems with AI-driven forecasting. Finance groups invest in real-time analytics dashboards.
Yet when each of these systems underperforms, the postmortem rarely points to the software itself. It points to the information inside the software: duplicated records, outdated contacts, conflicting entries, mismatched fields.
The same companies willing to spend six or seven figures on a data platform routinely balk at the labor required to keep that data accurate. Duplicate data alone costs American businesses an estimated $600 billion annually. That figure sits alongside a broader estimate that poor data management drains $3.1 trillion from the U.S. economy each year.
These numbers land with force, but what makes them remarkable is how preventable the underlying problem tends to be. The records exist. The tools to deduplicate them exist. The knowledge of best practices exists. What remains stubbornly absent is the willingness to do the dull, repetitive, deeply unglamorous work of cleaning and maintaining datasets. The result is a paradox that defines the modern data economy: billions poured into collection, pennies devoted to curation.
The gulf between data ambition and data discipline
A tension runs through nearly every data-driven organization, though it rarely surfaces in strategy decks or board presentations. On one side sits the stated belief that data represents a company’s most valuable asset. On the other sits the observable reality that data stewardship roles remain understaffed, undervalued, and frequently delegated to whoever drew the short straw in a team meeting. This gap between aspiration and action reveals something deeper than a budgeting oversight. It reflects a cultural contradiction baked into how businesses relate to information.
Consider the dynamics within a typical CRM environment. Sales representatives enter records at speed, often under quota pressure, with little incentive to check whether a contact already exists. Marketing teams import purchased lists without rigorous deduplication. Customer support agents create new tickets rather than searching for existing profiles. Each actor behaves rationally within their own workflow. The cumulative effect, however, is a database that degrades steadily over time, like a filing cabinet where every employee adds folders but nobody removes duplicates or corrects mislabeled tabs.
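The mechanics of that degradation, and of the cleanup, can be illustrated with a minimal sketch. It assumes that a trimmed, lowercased email address is a good enough matching key, which is a simplification; real CRM matching usually weighs names, phone numbers, and addresses too. The record layout and the function names are hypothetical.

```python
from collections import defaultdict


def normalize_email(email: str) -> str:
    """Trim and lowercase an email so trivially different spellings match."""
    return email.strip().lower()


def find_duplicate_groups(records: list[dict]) -> dict[str, list[dict]]:
    """Group records by normalized email; keep only groups with 2+ records."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        groups[normalize_email(record["email"])].append(record)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Hypothetical records created by three teams working in their own workflows.
crm_records = [
    {"id": 1, "source": "sales", "email": "Ana.Ruiz@example.com"},
    {"id": 2, "source": "marketing", "email": " ana.ruiz@example.com"},
    {"id": 3, "source": "support", "email": "ana.ruiz@EXAMPLE.com "},
    {"id": 4, "source": "sales", "email": "li.wei@example.com"},
]

duplicates = find_duplicate_groups(crm_records)
for email, group in duplicates.items():
    ids = [record["id"] for record in group]
    print(f"{email}: record ids {ids}")
```

Each of the three entries for the same person looks valid on its own. Only a comparison on a normalized key shows that records 1, 2, and 3 describe one contact.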
As many as 94% of companies suspect their customer and prospect data contains errors. That statistic deserves a second look. It means the vast majority of organizations already know the problem exists. The barrier to action is rarely awareness. The barrier is that data cleaning lacks the narrative appeal of data acquisition. Nobody gets promoted for deduplicating 40,000 contact records on a Tuesday afternoon. The work is invisible when done well and catastrophic when neglected, which creates a structural incentive to delay it indefinitely.
This dynamic plays out with particular intensity in marketing departments, where duplicate records translate directly into wasted spend. Three records for one customer means three catalogs mailed, three email sequences triggered, three ad impressions served to the same person. At scale, the compounding costs become staggering. A company paying $3 per thousand email addresses with a 10% duplication rate spends roughly 30 cents of every $3 on redundant sends, campaign after campaign, yet the loss hides inside aggregated metrics where it looks like normal cost-of-doing-business overhead rather than a fixable leak.
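The arithmetic behind that example can be worked through in a few lines. The $3 per thousand rate and the 10% duplication rate come from the example above; the list size and the number of campaigns per year are hypothetical values chosen only for illustration, and the sketch assumes that every duplicate address triggers one redundant send.

```python
def duplicate_send_cost(list_size: int, cost_per_thousand: float,
                        duplication_rate: float) -> float:
    """Cost of sends that go to redundant duplicate addresses in one campaign."""
    duplicate_addresses = list_size * duplication_rate
    return duplicate_addresses / 1000 * cost_per_thousand


COST_PER_THOUSAND = 3.00   # from the example in the text
DUPLICATION_RATE = 0.10    # from the example in the text
LIST_SIZE = 500_000        # hypothetical list size
CAMPAIGNS_PER_YEAR = 52    # hypothetical: one send per week

per_campaign = duplicate_send_cost(LIST_SIZE, COST_PER_THOUSAND, DUPLICATION_RATE)
per_year = per_campaign * CAMPAIGNS_PER_YEAR

print(f"Wasted per campaign: ${per_campaign:,.2f}")
print(f"Wasted per year:     ${per_year:,.2f}")
```

Under these assumptions the waste is about $150 per campaign and about $7,800 per year for email sends alone. The per-campaign figure is small enough to disappear into overhead, which is the point: the leak is steady rather than dramatic.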
The platform promise that sidesteps the real problem
The technology industry has developed a sophisticated vocabulary for selling solutions to data problems, and much of that vocabulary directs attention toward infrastructure upgrades rather than maintenance routines. The prevailing narrative in enterprise software marketing suggests that fragmented data demands better integration tools, more powerful platforms, or AI-driven cleansing algorithms that automate the mess away. Each promise contains a kernel of truth wrapped in a significant omission: no tool eliminates the need for human judgment about data quality, and no platform can compensate for an organizational culture that treats data entry as someone else’s problem.
A study by SnapLogic found that organizations in the U.S. and U.K. lose $140 billion annually to disconnected data, through inefficiencies and missed opportunities. That finding underscores a real challenge: data siloed across incompatible systems creates friction. But the industry response to disconnection has largely been to sell more connectors, more middleware, more integration layers. The possibility that organizations might benefit more from disciplined governance of existing systems than from additional systems receives far less airtime, for the straightforward reason that governance consulting generates lower margins than platform licensing.
Meanwhile, the conventional wisdom around “data-driven decision-making” often oversimplifies the relationship between data volume and decision quality. The assumption that more data yields better outcomes obscures a critical dependency: that the data being analyzed must first be accurate. Meredith Bell has highlighted that IBM estimated dirty data was costing U.S. companies $3.1 trillion every year as far back as 2016. A decade later, the figure has not meaningfully improved, despite an explosion in the tools available to address it. This stagnation suggests that the obstacle has never been primarily technological. The tooling has advanced. The habits have not.
The noise surrounding data strategy tends to conflate sophistication with effectiveness. Organizations chase real-time dashboards, predictive analytics, and machine learning pipelines while the source tables feeding those systems contain records that have not been audited in years. The result resembles mounting an increasingly powerful engine on a cracked chassis: the performance gains look impressive in isolation but remain constrained by a weakness that precedes them.
Where the real leverage sits
The $600 billion duplicate-data problem persists because organizations optimize for the exciting parts of the data lifecycle and neglect the tedious parts. The leverage sits precisely in the tedium: consistent deduplication, regular audits, enforced entry standards, and cultural recognition that maintaining a clean dataset is skilled, essential work.
This insight resists the gravitational pull of both technology marketing and organizational politics. It offers no product to purchase and no impressive initiative to announce. It asks, instead, for sustained attention to a process that most professionals find deeply unrewarding. That mismatch between where the value lives and where the motivation lies explains the persistence of the problem better than any technical analysis.
Building the unglamorous practice that compounds
If the core challenge is cultural rather than technological, then the path forward involves changing how organizations value and incentivize data stewardship. Several practical dimensions of this shift deserve attention.
Making the cost visible at the team level. Aggregate figures like $600 billion carry rhetorical weight but limited operational influence. When a marketing director can see that 12% of the department’s email budget last quarter went to duplicate sends, the conversation changes. Translating data quality problems into department-specific financial impact creates accountability that abstract industry statistics cannot.
Embedding quality checks into workflows rather than scheduling them as separate projects. Annual data cleansing initiatives follow the same pattern as crash diets: intense short-term effort followed by a return to old habits. Organizations that reduce duplication rates sustainably tend to implement validation rules at the point of entry, requiring records to pass quality checks before they can be saved. This approach trades a small increase in friction during data creation for a large reduction in downstream waste.
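A validation rule at the point of entry might look something like the following sketch. It is an illustration under stated assumptions, not a description of any particular CRM: `ContactStore`, `ValidationError`, and the three checks are hypothetical, the store lives in memory, and the email pattern is deliberately loose.

```python
import re

# Deliberately loose pattern: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


class ValidationError(ValueError):
    """Raised when a record fails a quality check and must not be saved."""


class ContactStore:
    """In-memory stand-in for a CRM table, keyed by normalized email."""

    def __init__(self) -> None:
        self._by_email: dict[str, dict] = {}

    def save(self, name: str, email: str) -> dict:
        """Save a contact only if it passes every entry-time check."""
        clean_name = name.strip()
        clean_email = email.strip().lower()
        if not clean_name:
            raise ValidationError("name is required")
        if not EMAIL_PATTERN.match(clean_email):
            raise ValidationError(f"malformed email: {email!r}")
        if clean_email in self._by_email:
            raise ValidationError(f"contact already exists: {clean_email}")
        record = {"name": clean_name, "email": clean_email}
        self._by_email[clean_email] = record
        return record

    def __len__(self) -> int:
        return len(self._by_email)


store = ContactStore()
store.save("Ana Ruiz", "ana.ruiz@example.com")

attempts = [
    ("Ana Ruiz", " Ana.Ruiz@Example.com"),  # same person, different casing
    ("", "someone@example.com"),            # missing name
    ("Li Wei", "li.wei-at-example.com"),    # malformed email
]
rejected = []
for name, email in attempts:
    try:
        store.save(name, email)
    except ValidationError as error:
        rejected.append(str(error))

print(f"saved: {len(store)}, rejected: {len(rejected)}")
```

The friction is a rejected save and a short error message at the moment of entry. The alternative is a duplicate or malformed record that someone has to find and repair later.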
Elevating the role of data stewardship within team structures. When data quality responsibilities belong to everyone in theory, they belong to nobody in practice. Assigning clear ownership, whether to dedicated data stewards or to rotating responsibilities with defined metrics, breaks the diffusion-of-responsibility pattern that allows databases to degrade unchecked.
Auditing the cost of inaction on KPIs and reporting. Duplicate records do more than inflate costs; they distort the metrics organizations use to evaluate their own performance. Revenue figures, customer acquisition costs, lifetime value calculations, and churn rates all shift when the underlying records contain duplicates. The compounding effect means that strategic decisions made on the basis of polluted data may be directionally wrong in ways that remain invisible until a thorough audit reveals the gap between reported performance and actual performance.
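A small worked example shows how far the metrics can drift. All figures below are hypothetical, and the formulas are simplified on purpose: acquisition cost is taken as spend divided by customer count, and revenue per customer as total revenue divided by customer count. The `summarize` helper is illustrative only.

```python
def summarize(records: list[dict], acquisition_spend: float,
              dedupe: bool) -> dict:
    """Compute simple per-customer metrics, with or without deduplication."""
    if dedupe:
        # Count each normalized email once.
        customer_count = len({r["email"].strip().lower() for r in records})
    else:
        # Treat every row as a separate customer, as a naive report would.
        customer_count = len(records)
    total_revenue = sum(r["revenue"] for r in records)
    return {
        "customers": customer_count,
        "acquisition_cost": acquisition_spend / customer_count,
        "revenue_per_customer": total_revenue / customer_count,
    }


# Hypothetical data: five rows that describe only three real customers.
records = [
    {"email": "a@example.com", "revenue": 120},
    {"email": "A@example.com", "revenue": 80},
    {"email": "b@example.com", "revenue": 200},
    {"email": "c@example.com", "revenue": 100},
    {"email": " c@example.com", "revenue": 100},
]
ACQUISITION_SPEND = 300.0  # hypothetical spend for the period

reported = summarize(records, ACQUISITION_SPEND, dedupe=False)
actual = summarize(records, ACQUISITION_SPEND, dedupe=True)

for metric in reported:
    print(f"{metric}: reported {reported[metric]}, actual {actual[metric]}")
```

With these inputs the report shows five customers acquired at $60 each and worth $120 apiece, while the deduplicated view shows three customers acquired at $100 each and worth $200 apiece. Neither total changed. Only the denominator did, and every per-customer metric moved with it.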
The $600 billion figure will continue to grow as long as organizations treat data quality as a periodic cleanup project rather than an ongoing operational discipline. The spreadsheet, the CRM, the data warehouse: these systems do exactly what they are designed to do. They store and serve back whatever gets put into them. The quality of the output depends entirely on whether someone cared enough to get the input right, and whether the organization valued that care enough to sustain it over time. The fix remains stubbornly, persistently simple. That simplicity is precisely why it keeps getting skipped.