Getting started with

What can you do with this package?

  • Fill in missing data via imputation
  • Create and compare models based on your data
  • Save a model to produce daily predictions
  • Write predictions back to a database
  • Learn what factor drives each prediction



  • If you haven't, install 64-bit Python 3.5 via the Anaconda distribution
  • Open the terminal (i.e., CMD or PowerShell, if using Windows)
  • Run conda install pyodbc
  • Upgrade to latest scipy (note that upgrade command took forever)
  • Run conda remove scipy
  • Run conda install scipy
  • To install the latest release, run
    • pip install healthcareai
  • If you know what you're doing, and instead want the bleeding-edge version direct from our github repo, run
    • pip install


You may need to install the following dependencies: - sudo apt-get install python-tk - sudo pip install pyodbc - Note you'll might run into trouble with the pyodbc dependency. You may first need to run sudo apt-get install unixodbc-dev then retry sudo pip install pyodbc. Credit stackoverflow

Once you have the dependencies satisfied run pip install healthcareai or sudo pip install healthcareai


  • pip install healthcareai or sudo pip install healthcareai

Linux and macOS (via docker)

  • Install docker
  • Clone this repo (look for the green button on the repo main page)
  • cd into the cloned directory
  • run docker build -t healthcareai .
  • run the docker instance with docker run -p 8888:8888 healthcareai
  • You should then have a jupyter notebook available on http://localhost:8888.

Verify Installation

To verify that healthcareai installed correctly, open a terminal and run python. This opens an interactive python console (also known as a REPL). Then enter this command: from healthcareai import develop_supervised_model and hit enter. If no error is thrown, you are ready to rock.

If you did get an error, or run into other installation issues, please let us know or better yet post on Stack Overflow(with the healthcare-ai tag) so we can help others along this process.

Getting started

  • Visit to read the docs and find examples.
  • Open Sphinx (which installed with Anaconda) and copy the examples into a new file
  • Modify the queries and parameters to match your data
  • If you plan on deploying a model (ie, pushing predictions to SQL Server), run this in SSMS beforehand:
CREATE TABLE [SAM].[dbo].[HCPyDeployClassificationBASE] (
 [BindingID] [int] ,
 [BindingNM] [varchar] (255),
 [LastLoadDTS] [datetime2] (7),
 [PatientEncounterID] [decimal] (38, 0), --< change to your grain col
 [PredictedProbNBR] [decimal] (38, 2),
 [Factor1TXT] [varchar] (255),
 [Factor2TXT] [varchar] (255),
 [Factor3TXT] [varchar] (255))

CREATE TABLE [SAM].[dbo].[HCPyDeployRegressionBASE] (
 [BindingID] [int],
 [BindingNM] [varchar] (255),
 [LastLoadDTS] [datetime2] (7),
 [PatientEncounterID] [decimal] (38, 0), --< change to your grain col
 [PredictedValueNBR] [decimal] (38, 2),
 [Factor1TXT] [varchar] (255),
 [Factor2TXT] [varchar] (255),
 [Factor3TXT] [varchar] (255))

Note that we're currently working on easy connections to other types of databases.

For Issues

  • Double check that the code follows the examples at
  • If you're still seeing an error, create a post in our Google Group that contains
    • Details on your environment (OS, database type, R vs Py)
    • Goals (ie, what are you trying to accomplish)
    • Crystal clear steps for reproducing the error