What Business Identity Data Reveals About Medicaid's Biggest Billers

Overview

Medicaid serves over 90 million Americans. It is the single largest source of health coverage in the country, funded by every taxpayer in every state. When fraud drains the program, the cost is not abstract. It means fewer resources for the people Medicaid is designed to serve, higher costs for providers doing legitimate work, and a growing bill that falls on all of us.

Last week, HHS published the largest Medicaid claims dataset in agency history: 10.32 GB of aggregated provider-level claims, procedures, and payments from 2018 through 2024. For the first time, this data is available for public analysis. Regardless of where you sit politically, more transparency into how public dollars are spent is a good thing.

At Middesk, we power business identity intelligence for companies like Plaid, Affirm, and Brex. We are connected to all 50 Secretaries of State, the IRS, and hundreds of government agencies. Our platform verifies over a million businesses and is used to ensure our customers are working with legitimate companies, from newly formed startups to large multinational corporations.

When we saw this dataset, we asked a straightforward question: what happens when you stop looking at what was billed and start looking at who's behind the billing?

Here's what we found.

The dataset

From 2018 to 2024, $1.09 trillion in Medicaid payments was distributed to approximately 1.6 million providers. The largest category by spend and claim frequency was personal care services (in-home visits) at $122 billion. The fastest growing category by volume was behavioral health and substance abuse, which increased more than 450% over those six years.

These numbers tell you about billing patterns. They don't tell you whether the entities doing the billing are legitimate, active, or even real. That distinction matters because every dollar that goes to a fraudulent provider is a dollar that doesn't reach an eligible beneficiary or a legitimate provider delivering care.

That's where business identity comes in.

How we approached it

We started with a known set of bad actors: 1,489 National Provider Identifiers (NPIs) that are either listed on the HHS Office of Inspector General's List of Excluded Individuals and Entities (LEIE), meaning they have been barred from federal healthcare programs for criminal activity or professional misconduct, or whose NPI records have been deactivated by CMS.

These are providers the federal government has already flagged, and they should not be receiving Medicaid payments. Yet we identified 1,175 of them collecting a combined $563 million in payouts, roughly $479,000 per excluded provider. Another $155 million went to providers after their licenses were revoked. Not suspended, not under review. Revoked.
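As a rough illustration of this kind of check, here is a minimal Python sketch that totals payments made to a provider on or after the date its exclusion took effect. The record layout, NPIs, dates, and amounts are invented for the example, and the date comparison is our assumption about how such a match could work, not a description of the published dataset's schema or of Middesk's pipeline.

```python
from datetime import date

# Hypothetical exclusion list: NPI -> date the exclusion took effect.
EXCLUSIONS = {
    "1000000001": date(2020, 3, 1),
    "1000000002": date(2022, 7, 1),
}

# Hypothetical aggregated payment rows: (NPI, service month, amount paid in USD).
PAYMENTS = [
    ("1000000001", date(2019, 12, 1), 40_000.0),   # before the exclusion
    ("1000000001", date(2021, 5, 1), 125_000.0),   # after the exclusion
    ("1000000002", date(2023, 1, 1), 60_000.0),    # after the exclusion
    ("1000000003", date(2023, 1, 1), 90_000.0),    # never excluded
]


def payments_after_exclusion(payments, exclusions):
    """Total paid per NPI for service months on or after its exclusion date."""
    totals = {}
    for npi, month, paid in payments:
        excluded_on = exclusions.get(npi)
        if excluded_on is not None and month >= excluded_on:
            totals[npi] = totals.get(npi, 0.0) + paid
    return totals


flagged = payments_after_exclusion(PAYMENTS, EXCLUSIONS)
total = sum(flagged.values())
average = total / len(flagged) if flagged else 0.0
print(flagged, total, average)
```

Here the pre-exclusion payment and the never-excluded provider are ignored, so only two providers are flagged.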

Then we went further.

Using Middesk's business graph, we traced connections outward from those 1,489 flagged providers. The idea is simple: fraudulent providers rarely operate alone. They share addresses, officers, and ownership structures with other entities that appear independent on paper. By cross-referencing state business filings, registered agent records, and officer data, we can surface those hidden connections. If an excluded or deactivated provider operates out of the same address or shares an owner with another billing entity, that second entity warrants scrutiny. This process is called entity resolution, and it's the same intelligence Middesk uses every day to verify businesses for banks and fintechs.
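To make the idea concrete, here is a minimal Python sketch of a one-hop expansion from a seed set through shared addresses and officers. The provider records, field names, and the `connected_providers` helper are hypothetical, and real matching would also need address normalization and name disambiguation. It illustrates the general technique, not Middesk's business graph.

```python
from collections import defaultdict

# Hypothetical provider records: NPI -> normalized address and officer names.
PROVIDERS = {
    "A": {"address": "12 oak st", "officers": {"j. doe"}},
    "B": {"address": "12 oak st", "officers": {"m. roe"}},
    "C": {"address": "99 elm ave", "officers": {"j. doe", "p. poe"}},
    "D": {"address": "5 pine rd", "officers": {"k. low"}},
}
SEEDS = {"A"}  # providers already flagged


def connected_providers(providers, seeds):
    """Return non-seed NPIs that share an address or officer with a seed,
    mapped to the shared attributes that link them."""
    # Index every provider under each attribute it carries.
    by_attr = defaultdict(set)
    for npi, rec in providers.items():
        by_attr[("address", rec["address"])].add(npi)
        for officer in rec["officers"]:
            by_attr[("officer", officer)].add(npi)

    # Any attribute held by a seed links the other holders to the seed set.
    links = defaultdict(set)
    for attr, npis in by_attr.items():
        if npis & seeds:
            for npi in npis - seeds:
                links[npi].add(attr)
    return dict(links)


linked = connected_providers(PROVIDERS, SEEDS)
print(linked)
```

In this toy data, B is linked through the shared address, C through a shared officer, and D is not linked at all. Keeping the linking attribute alongside each match matters in practice, because it lets a reviewer see why an entity was surfaced before treating it as suspicious.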

This process identified 1,329 additional providers connected to the original set, responsible for $953 million in additional Medicaid payouts. Combined with the original seeds, the total exposure across this connected network is $1.7 billion.
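The headline figures can be checked with simple arithmetic. The sketch below assumes the $1.7 billion total is the sum of the three amounts reported above ($563 million, $155 million, and $953 million). That is one reading that reproduces the figure, not a breakdown stated in the analysis itself.

```python
# Amounts in USD, taken from the figures reported above.
excluded_payouts = 563_000_000      # paid to 1,175 excluded providers
post_revocation = 155_000_000       # paid after license revocation
connected_payouts = 953_000_000     # paid to 1,329 connected providers

# Assumption: total exposure is the sum of all three amounts.
total_exposure = excluded_payouts + post_revocation + connected_payouts
total_in_billions = round(total_exposure / 1e9, 1)

# Average payout per excluded provider.
per_excluded_provider = round(excluded_payouts / 1_175)

print(total_exposure, total_in_billions, per_excluded_provider)
```

Under that assumption the sum is $1.671 billion, which rounds to $1.7 billion, and the per-provider average comes to about $479,000.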

What the data shows

Three patterns emerged when we cross-referenced the Medicaid claims data against our business identity intelligence.

Payments to excluded providers

$563 million in Medicaid payments went to 1,175 providers who appear on the OIG's exclusion list, which bars individuals and entities from federal healthcare programs for criminal convictions, patient abuse, license revocations, or fraud. These aren't edge cases or gray areas. The federal government has already determined these providers should not participate in the program, yet they're still getting paid.

An additional $155 million went to providers whose licenses had been formally revoked by their state licensing authority. In a financial services context, this would be the equivalent of continuing to process transactions for a company whose banking charter has been pulled. It doesn't happen in finance because identity verification catches it in real time. The same capability should exist here.

Connected networks through shared identity signals

Starting from those known-fraud providers, entity resolution surfaced 1,329 additional providers that share addresses, officers, or ownership structures with the excluded entities. These connected providers collected $953 million in Medicaid payments.

The connections aren't visible in claims data. Each entity files its own NPI, bills under its own name, and appears independent. The links only emerge when you cross-reference state business filings, registered agent records, address classifications, and officer data. These are the same data sources Middesk uses to verify businesses for banks and fintechs.

The fastest-growing categories are the most vulnerable

Personal care services (billing code T1019) accounted for $122 billion in the dataset, making it the largest category by spend. Behavioral health and substance abuse claims grew more than 450% over the course of those six years. Both categories share characteristics that make them structurally vulnerable to fraud: services are delivered in homes or community settings with limited oversight, documentation requirements are minimal compared to clinical procedures, and the barrier to becoming a licensed provider is relatively low.

The networks we identified are concentrated in these categories. Fraud tends to gravitate toward the areas with the least structural resistance, which also happen to be the areas where demand is growing fastest and where vulnerable populations depend most on legitimate service delivery.

Why claims data alone isn't enough

The most important thing we learned from this analysis isn't about the dollar amounts. It's about how the fraud is structured.

The conversation around this dataset has focused on which providers billed the most and which codes grew the fastest. That's useful context, but it misses how sophisticated fraud actually works. The most effective schemes don't involve suspicious billing patterns that are easy to spot. They involve networks of entities, each billing plausible amounts, designed to look independent, connected in ways that claims data simply can't see.

A single billing entity submitting $37 million in personal care claims might be a large home health agency. It is a different picture when seven entities sit at the same residential address, are controlled by the same two people, were filed on the same day with sequential formation numbers, share a phone number, have zero web presence, and collectively bill $82 million under a single code. But you only see it when you combine claims data with business identity data.
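The signals in that example can be expressed as simple checks over business records grouped by address. The Python sketch below is an illustration under stated assumptions: the entity records, field names, `min_size` threshold, and `cluster_signals` helper are all hypothetical, and the output is a set of risk flags for a human to review, not a fraud determination.

```python
from collections import defaultdict
from datetime import date

# Hypothetical business records joined to billing entities.
ENTITIES = [
    {"name": "Care One LLC", "address": "7 maple ct", "officers": frozenset({"a", "b"}),
     "formed": date(2021, 4, 2), "filing_no": 1001, "phone": "555-0100"},
    {"name": "Care Two LLC", "address": "7 maple ct", "officers": frozenset({"a", "b"}),
     "formed": date(2021, 4, 2), "filing_no": 1002, "phone": "555-0100"},
    {"name": "Care Three LLC", "address": "7 maple ct", "officers": frozenset({"a", "b"}),
     "formed": date(2021, 4, 2), "filing_no": 1003, "phone": "555-0100"},
    {"name": "Valley Health Inc", "address": "400 main st", "officers": frozenset({"c"}),
     "formed": date(2009, 1, 15), "filing_no": 220, "phone": "555-0199"},
]


def cluster_signals(entities, min_size=3):
    """Group entities by address and report coordination signals per group."""
    groups = defaultdict(list)
    for entity in entities:
        groups[entity["address"]].append(entity)

    report = {}
    for address, members in groups.items():
        if len(members) < min_size:
            continue  # too few entities at this address to call it a cluster
        numbers = sorted(m["filing_no"] for m in members)
        report[address] = {
            "entities": len(members),
            "same_officers": len({m["officers"] for m in members}) == 1,
            "same_formation_date": len({m["formed"] for m in members}) == 1,
            "sequential_filings": all(b - a == 1 for a, b in zip(numbers, numbers[1:])),
            "shared_phone": len({m["phone"] for m in members}) == 1,
        }
    return report


signals = cluster_signals(ENTITIES)
print(signals)
```

Only the three-entity address is reported here, with every signal set. No single flag is conclusive on its own, since shared addresses and phone numbers are common among legitimate affiliates. It is the combination that makes a cluster worth a closer look.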

Claims data shows what was billed. Business identity data shows who's behind the billing and whether those entities are real, active, and independent. Entity resolution is what connects the two.

What we think this means

This analysis is a starting point, not a conclusion. Entity resolution surfaces structural risk signals, including shared addresses, overlapping officers, and coordinated formation patterns, that warrant investigation. It does not determine guilt. Some of the connections we surfaced will have legitimate explanations, and we flagged false positives in our own analysis where large health systems triggered connections simply because of their size and complexity.

That said, three of our top clusters are already corroborated by existing OIG audit findings, DOJ enforcement actions, or state-level fraud investigations. Entity resolution didn't invent those connections. It found them faster, and revealed the full network around already-known bad actors.

Our analysis focused specifically on providers already on the LEIE, providers with deactivated NPIs, and the entities directly connected to them. That's the most conservative approach we could take. Expanding from here, into anomalies in average claim sizes, geographic clustering, temporal patterns in entity formation relative to billing activity, and further provider legitimacy checks beyond blacklists, would surface significantly more.

The bigger picture

Making this data publicly available was a meaningful step forward for program integrity. Fraud in Medicaid is not a partisan issue. It raises costs for taxpayers across the political spectrum, degrades the quality of care for beneficiaries who depend on the program, and undermines the legitimate providers, many of them small businesses, who deliver services honestly. Better data and better tools to analyze it serve everyone's interest.

But transparency alone produces noise, not signal. A spreadsheet of claims doesn't show you that a network of entities at a single residential address shares officers, formation dates, and a phone number while billing $82 million under one code. That picture only emerges when you layer business identity data on top, cross-referencing state filings, address classifications, officer records, and entity formation timelines.

At Middesk, this is what we do every day for the world's leading banks and fintechs. The same infrastructure that prevents fraudulent businesses from opening accounts can identify providers billing government healthcare programs through networks of connected entities. We think these tools should be part of how we protect public programs. Not as a political exercise, but as a basic expectation of how public money is managed.

The data is public, the tools exist, and the fraud is findable. The only question is whether we move fast enough to act on it.

If you're a journalist, researcher, or data analyst working with this dataset, we'd like to help. Reach out at [email protected].

Authors

Kyle Mack
