Yan Zeng is Head of Data at Middesk, where he leads the Data Science, AI/ML, and Data Platform teams. He brings over 15 years of experience building and scaling data-driven products in fintech, with a focus on trust, risk, and payments innovation. Yan helped build and subsequently led Plaid’s payment risk product suite; he was the founding product manager for Signal, and previously held data leadership roles at Intuit, Gusto, and Lending Club.
Now at Middesk, Yan is channeling that experience into building both a world-class data team and the next generation of data and AI systems, helping customers verify and monitor the businesses they onboard with greater accuracy, detect fraud earlier, and strengthen trust across the business ecosystem.
What drew you to Middesk and this opportunity?
I was drawn to Middesk for two main reasons. First, we’re building a compounding network around business identity — something few companies can do at scale. Middesk has a unique combination of network effects and proprietary data assets, which allows us to move beyond simple verification into adjacent products like fraud detection and ongoing risk monitoring.
The second reason is the caliber of the team. I’m surrounded by talented data engineers, data scientists, and machine learning experts who are deeply motivated by solving hard, meaningful problems. It’s a privilege to build alongside them.
What kinds of problems is your team solving right now?
The data organization at Middesk is made up of two core teams: Data Science and Data Platform. On the DS side, we build the ML- and AI-powered products that reduce information asymmetry when our customers onboard a new business: things like industry classification to manage compliance risk, and business and people graphs powered by Middesk’s network data to surface orthogonal insights for fraud detection and lead generation.
The Data Platform team ingests and validates data from hundreds of sources, keeps it fresh, and makes sure our pipelines and ETLs are reliable. The two teams work closely together to manage the end-to-end entity-resolution process that stitches public records to the right business, and they co-own the ML infrastructure: feature store services, training and scoring pipelines, and model deployment, monitoring, and explainability.
Right now we’re focused on three big problem areas. First, expanding US business identity coverage, especially for non-registered businesses, so customers see a complete picture of who they’re onboarding. Second, strengthening business fraud detection by combining our network graph with email, phone, device, and longitudinal business history signals to detect shelf companies, fraud rings, and synthetic identities in real time. Third, reducing friction for good customers with “smart prefill,” where we use our identity graph to automatically fill verified information into applications. What excites me is that these are compounding bets: every improvement in our data and platform translates directly into better fraud-detection, risk, and onboarding outcomes for our customers.
Can you share a bit about the technology your team uses?
We run a modern, cloud-based data and ML stack. On the data engineering side, we manage a few hundred external and first-party data sources, using dbt and Spark to build our pipelines, ETL, and analytics models. On the ML side, we’ve built an end-to-end training and serving pipeline plus a shared feature store service. We leverage Elasticsearch for low-latency search and feature retrieval, Google Cloud’s ML platform and Databricks for training and experimentation, and a versioned deployment flow so models can be rolled out safely with monitoring and explainability.
What’s most exciting is how we’re leaning into AI tooling itself: we use AI coding assistants to help with things like scaffolding new data pipelines, debugging production issues during on-call, and improving our reliability and alerting mechanisms. That lets the team spend more time on higher-leverage problems like graph-based fraud detection and smarter onboarding experiences, rather than boilerplate plumbing.
How would you describe the culture on the Data team and your leadership style?
Collaboration and curiosity are at the core of how we work. Building zero-to-one products means constant iteration and cross-functional teamwork — we partner closely with Product, Engineering, Design, and GTM teams to turn ideas into real, valuable products that move the business forward. I’d describe our group as full-stack data scientists and engineers; everyone contributes to the end-to-end process, from designing features and building pipelines to delivering models in production.
As a leader, I believe in leading by example — staying close to the work, asking detailed questions, and creating a space where people feel empowered to experiment and grow while maintaining a high technical bar.
Middesk’s values include leading with curiosity, embracing ambition, winning together, and putting customers first. Which resonates most with you?
For me, putting customers first is foundational. Especially when building new products, customer feedback and discovery guide everything — it’s how we turn fragmented requests into meaningful, scalable solutions that solve real problems.
I also strongly connect with leading with curiosity. Many of our biggest advances have come from someone on the team asking, “What if we tried this?” and then doing the work together to find out. I learn a ton from those explorations, and I try to match that by continually sharpening my own skills — reading technical books and going back to audit classes at Berkeley years after graduation. That ongoing learning loop, from both the team and the broader field, helps me turn new ideas into practical bets for the business.
What excites you most about what’s ahead for data and AI at Middesk?
We’re just scratching the surface of what’s possible. With our unique combination of network data and a rich business/people graph, we have the opportunity to move beyond verification into deeper fraud detection, AI agents that streamline and automate ops workflows for our customers, and much smarter, context-aware risk insights.
On the technical side, we’re investing in the infrastructure that makes all of this actually work in the real world. That means richer graph- and embedding-based features on top of our business identity graph, and an end-to-end ML platform that can reliably train, version, and serve models across many customers and use cases. We’re also leaning into AI-assisted engineering to evaluate new data sources at scale and spin up new pipelines faster. It’s rare to have both this caliber of data foundation and the mandate to build on it, and that combination is what makes the road ahead so exciting.
What advice would you give to someone interested in joining your team?
Come with an end-to-end mindset and a bias for action. You don’t need to have done everything before, but you should be curious about how your piece connects to the larger system.
Be comfortable with ambiguity — we’re often solving problems that don’t have clear answers yet — and stay anchored in our mission to build trust across the business ecosystem. If that energizes you, this is the place to be.
Interested in building the data and ML systems that power trust for the business economy? Check out the open roles on Yan’s team.
