The messy healthcare
data problem

Healthcare data is messy and riddled with data quality issues. Data teams that successfully clean their data are still just left with raw healthcare data. However, doing any analytics or machine learning requires building high-level concepts (e.g. features) on top of the raw data.
Fixing data quality issues and defining high-level concepts like the examples listed here requires significant healthcare domain knowledge. For example:
How we solve it

The Tuva open source project is a collection of healthcare-specific data quality tests and high-level concepts that can be defined using raw healthcare data. Our high-level concepts are organized into a collection of data marts that give data teams more data building blocks to work with.


