
Enterprise Edition Notebooks

A diverse collection of Jupyter notebooks that showcase relational datasets from various domains. The notebooks address typical data science challenges, such as binary classification on time series and regression on complex relational data, and use publicly available datasets for benchmarking.

Algorithms and Predictors

Serving as both documentation and practical blueprints, these notebooks demonstrate the performance of getML's feature engineering algorithms (FastProp, Multirel, Relboost, RelMT) and predictors (LinearRegression, LogisticRegression, XGBoostClassifier, XGBoostRegressor) and benchmark them against competing tools such as featuretools, tsfresh, and Prophet.
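To make the idea of propositionalization (the "Prop" in FastProp) concrete, the following is a toy sketch in plain Python: it flattens a one-to-many relation into one feature row per entry of the population table by applying simple aggregations to a related (peripheral) table. The tables, column names, and the `propositionalize` helper are hypothetical illustrations; this is not getML's API and not FastProp's actual implementation, which searches a much larger space of aggregations and conditions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: a population table (one row per customer, carrying
# the target) and a peripheral table (many transactions per customer).
population = [
    {"customer_id": 1, "churn": 0},
    {"customer_id": 2, "churn": 1},
    {"customer_id": 3, "churn": 0},
]
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 5.0},
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 2, "amount": 7.0},
]

# Candidate aggregations applied to each selected peripheral column.
AGGREGATIONS = {
    "COUNT": len,
    "SUM": sum,
    "AVG": mean,
    "MAX": max,
    "MIN": min,
}


def propositionalize(population, peripheral, join_key, columns):
    """Hypothetical helper: one aggregated feature row per population row."""
    # Group peripheral rows by the join key (the one-to-many relation).
    grouped = defaultdict(list)
    for row in peripheral:
        grouped[row[join_key]].append(row)

    features = []
    for pop_row in population:
        matches = grouped.get(pop_row[join_key], [])
        feature_row = {join_key: pop_row[join_key]}
        for col in columns:
            values = [m[col] for m in matches]
            for name, fn in AGGREGATIONS.items():
                if values:
                    feature_row[f"{name}({col})"] = fn(values)
                else:
                    # No matching rows: COUNT is 0, other aggregations undefined.
                    feature_row[f"{name}({col})"] = 0 if name == "COUNT" else None
        features.append(feature_row)
    return features


if __name__ == "__main__":
    for row in propositionalize(population, transactions, "customer_id", ["amount"]):
        print(row)
```

The resulting flat table can be passed to any ordinary predictor. The target column (`churn`) is deliberately left out of the aggregation so that it cannot leak into the features.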

Enterprise Edition

While FastProp excels in speed and resource efficiency, the more advanced algorithms available only in the Enterprise Edition deliver higher accuracy with even lower resource demands. Discover the benefits of the Enterprise Edition and compare the features of the two editions.

Overview

| Notebook | Task | Data | Size | Domain |
|---|---|---|---|---|
| AdventureWorks - Predicting customer churn | Classification | Relational | 71 tables, 233 MB | Commerce |
| Air Pollution - Why feature learning is better than simple propositionalization | Regression | Multivariate time series | 1 table, 41k rows | Environment |
| Atherosclerosis - Disease lethality prediction | Classification | Relational | 3 tables, 22 MB | Health |
| Baseball - Predicting players' salary | Regression | Relational | 25 tables, 74 MB | Sports |
| Consumer expenditure - Why relational learning matters | Classification | Relational | 3 tables, 150 MB | E-commerce |
| CORA - Categorizing academic publications | Classification | Relational | 3 tables, 4.6 MB | Academia |
| Dodgers - Traffic volume prediction | Regression | Multivariate time series | 1 table, 47k rows | Transportation |
| Formula 1 - Predicting the winner of a race | Classification | Relational | 13 tables, 56 MB | Sports |
| IMDb - Predicting actors' gender | Classification | Relational with text | 7 tables, 477.1 MB | Entertainment |
| Interstate 94 - Multivariate time series prediction | Regression | Multivariate time series | 1 table, 24k rows | Transportation |
| Loans - Predicting loan default risk | Classification | Relational | 8 tables, 60 MB | Financial |
| MovieLens - Predicting a user's gender based on the movies they have watched | Classification | Relational | 7 tables, 20 MB | Entertainment |
| Occupancy - A multivariate time series example | Classification | Multivariate time series | 1 table, 32k rows | Energy |
| Online Retail - Predicting order cancellations | Classification | Relational | 1 table, 398k rows | E-commerce |
| Robot - Feature engineering on sensor data | Regression | Multivariate time series | 1 table, 15k rows | Robotics |
| Seznam - Predicting transaction volume | Regression | Relational | 4 tables, 147 MB | E-commerce |
| SFScores - Predicting health inspection scores of restaurants | Regression | Relational | 3 tables, 9 MB | Restaurants |
| StatsExchange - Predicting users' reputations | Regression | Relational | 8 tables, 658 MB | Internet |

Source

These notebooks are published in the getml-demo repository.