Enterprise Edition Notebooks
This is a diverse collection of Jupyter notebooks that showcase relational datasets across various domains. They address typical data science challenges, such as binary classification on time series and regression on complex relational data, and use publicly available datasets for benchmarking.
Algorithms and Predictors
Serving as both documentation and practical blueprints, these notebooks demonstrate the performance of getML's feature engineering algorithms (FastProp, Multirel, Relboost, RelMT) and predictors (LinearRegression, LogisticRegression, XGBoostClassifier, XGBoostRegressor) against competing tools like featuretools, tsfresh, and Prophet.
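FastProp is getML's propositionalization algorithm, and several notebooks compare it with the more advanced feature learners. As a rough illustration of what propositionalization means, the sketch below flattens a one-to-many relation into aggregate features using only the Python standard library. The tables, column names, and the `propositionalize` helper are hypothetical and chosen for illustration; this is not getML's API or its implementation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: a population table (one row per customer)
# and a peripheral table (zero or more orders per customer).
customers = [
    {"customer_id": 1},
    {"customer_id": 2},
    {"customer_id": 3},
]
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 2, "amount": 15.0},
]


def propositionalize(population, peripheral, key, column):
    """Flatten a one-to-many relation into per-row aggregate features."""
    # Group the peripheral values by the join key.
    grouped = defaultdict(list)
    for row in peripheral:
        grouped[row[key]].append(row[column])

    # Apply a fixed set of aggregations for every population row.
    features = []
    for row in population:
        values = grouped.get(row[key], [])
        features.append({
            key: row[key],
            f"count_{column}": len(values),
            f"sum_{column}": sum(values),
            f"avg_{column}": mean(values) if values else None,
            f"max_{column}": max(values) if values else None,
        })
    return features


feature_table = propositionalize(customers, orders, "customer_id", "amount")
for row in feature_table:
    print(row)
```

The resulting flat table has one row per customer and can be passed to any standard predictor. Propositionalization applies a predefined set of aggregations like these, whereas the feature learning algorithms listed above search for the aggregations and conditions themselves.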
Enterprise Edition
While FastProp excels in speed and resource efficiency, the more advanced algorithms, which are available only in the Enterprise Edition, deliver higher accuracy with even lower resource demands. Discover the benefits of the Enterprise Edition and compare the features of the editions.
Overview
Notebook | Task | Data | Size | Domain |
---|---|---|---|---|
AdventureWorks - Predicting customer churn | Classification | Relational | 71 tables, 233 MB | Commerce |
Air Pollution - Why feature learning is better than simple propositionalization | Regression | Multivariate time series | 1 table, 41k rows | Environment |
Atherosclerosis - Disease lethality prediction | Classification | Relational | 3 tables, 22 MB | Health |
Baseball - Predicting players' salary | Regression | Relational | 25 tables, 74 MB | Sports |
Consumer expenditure - Why relational learning matters | Classification | Relational | 3 tables, 150 MB | E-commerce |
CORA - Categorizing academic publications | Classification | Relational | 3 tables, 4.6 MB | Academia |
Dodgers - Traffic volume prediction | Regression | Multivariate time series | 1 table, 47k rows | Transportation |
Formula 1 - Predicting the winner of a race | Classification | Relational | 13 tables, 56 MB | Sports |
IMDb - Predicting actors' gender | Classification | Relational with text | 7 tables, 477.1 MB | Entertainment |
Interstate 94 - Multivariate time series prediction | Regression | Multivariate time series | 1 table, 24k rows | Transportation |
Loans - Predicting loan default risk | Classification | Relational | 8 tables, 60 MB | Financial |
MovieLens - Predicting a user's gender based on the movies they have watched | Classification | Relational | 7 tables, 20 MB | Entertainment |
Occupancy - A multivariate time series example | Classification | Multivariate time series | 1 table, 32k rows | Energy |
Online Retail - Predicting order cancellations | Classification | Relational | 1 table, 398k rows | E-commerce |
Robot - Feature engineering on sensor data | Regression | Multivariate time series | 1 table, 15k rows | Robotics |
Seznam - Predicting transaction volume | Regression | Relational | 4 tables, 147 MB | E-commerce |
SFScores - Predicting health inspection scores of restaurants | Regression | Relational | 3 tables, 9 MB | Restaurants |
StatsExchange - Predicting users' reputations | Regression | Relational | 8 tables, 658 MB | Internet |
Source
These notebooks are published on the getml-demo repository.