The messy healthcare data problem

Healthcare data is messy and riddled with data quality issues. Even data teams that successfully clean their data are left with raw healthcare data, and any analytics or machine learning requires building high-level concepts (e.g. features) on top of that raw data.

Fixing data quality issues and defining high-level concepts both require significant healthcare domain knowledge. For example:

What defines a type 2 diabetes patient?
Does this visit count as an unplanned hospital readmission?
How do you merge facility and professional claims into single encounters?

The Tuva Project is an open source dbt project that makes this knowledge available as code.
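
To make "knowledge as code" concrete, here is a minimal Python sketch of one of the questions above: flagging a hospital stay as a readmission. It is not the Tuva Project's logic. The `Stay` record, the `flag_readmissions` function, and the 30-day window are all assumptions chosen for illustration, and the rule does not try to separate planned from unplanned readmissions, handle transfers, or resolve overlapping stays, which is exactly where the domain knowledge comes in.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Stay:
    """One inpatient stay (hypothetical, simplified record)."""
    patient_id: str
    admit_date: date
    discharge_date: date


def flag_readmissions(stays, window_days=30):
    """Pair each stay with True if it begins within `window_days`
    of the same patient's previous discharge (assumed rule)."""
    flagged = []
    last_discharge = {}  # patient_id -> discharge date of the prior stay
    for stay in sorted(stays, key=lambda s: (s.patient_id, s.admit_date)):
        previous = last_discharge.get(stay.patient_id)
        is_readmission = (
            previous is not None
            and 0 <= (stay.admit_date - previous).days <= window_days
        )
        flagged.append((stay, is_readmission))
        last_discharge[stay.patient_id] = stay.discharge_date
    return flagged


stays = [
    Stay("p1", date(2023, 1, 1), date(2023, 1, 5)),
    Stay("p1", date(2023, 1, 20), date(2023, 1, 22)),  # 15 days after discharge
    Stay("p1", date(2023, 6, 1), date(2023, 6, 3)),    # well outside the window
    Stay("p2", date(2023, 2, 1), date(2023, 2, 2)),    # first stay for this patient
]
flags = [is_readmit for _, is_readmit in flag_readmissions(stays)]
print(flags)  # [False, True, False, False]
```

Even this toy rule forces decisions (which date starts the clock, what counts as a separate stay) that a shared, tested definition would settle once for every team.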


How we solve it

The Tuva Project is a collection of healthcare-specific data quality tests and high-level concepts defined on top of raw healthcare data. Our high-level concepts are organized into data marts that give data teams more building blocks to work with.


Who we serve


Digital health companies


Health technology companies