The first ML framework
for relational learning.

Think of getML as TensorFlow – just for relational data.

Why getML?

Machine learning models need features as input, but building features by hand is expensive. Data scientists and domain experts spend up to 90% of their time on tasks related to feature engineering. We at getML build general-purpose algorithms for data scientists that automate feature engineering on any kind of relational data.

Billions of features with ~20 lines of Python

Benefits of getML for feature learning

Feature learning boosts your productivity

Feature learning automates manual feature engineering through supervised learning. This is preferable to writing and maintaining hundreds of SQL, pandas or R/data.table scripts for feature engineering. getML's algorithms allow data scientists to build end-to-end prediction pipelines in days instead of months.

Algorithms that discover domain-specific patterns

Manual feature engineering is an error-prone, repetitive process that requires countless hours of meetings to obtain domain knowledge from experts. With feature learning, data scientists let algorithms learn all the relevant feature logic straight from relational data.

Great features lead to high ML model accuracy

Improving your model performance starts with finding better features. Feature learning helps you avoid the negative impact of unknown unknowns or common time constraints in the model building phase. getML helps data scientists to deliver the most accurate prediction models, faster.

What is getML?

All you need to build
end-to-end ML pipelines.

Load Data
Python
Database connectors
Unified import interface for PostgreSQL, MySQL, MariaDB, SQLite3, SAP HANA, Greenplum and any other ODBC-compatible database
File storage
Import your data from CSV, Parquet or AWS S3 buckets
Machine Learning
Feature Learning
FastProp, Multirel & Relboost for feature learning from relational data and time series
Prediction
Hyperparameter optimization
Evaluate & Deploy
Train pipelines
Wrap feature learner ensembles and predictors in end-to-end ML pipelines
Evaluate
Benchmark models & insights through features
Deploy
Use Python, deploy models behind an HTTP model server to serve predictions or feature transforms, or transpile pipelines to SQLite or Spark SQL.

getML is a high-performance machine learning framework for building regression and prediction models on any kind of relational data. It comes with an easy-to-use Python API that lets you build end-to-end ML pipelines on terabytes of input data.

For maximum performance and speed

Blazing Fast C++ Engine

Core of the getML framework
Standalone application that handles I/O, feature learning & AutoML
Implements a high-performance data management layer for ML models at terabyte scale
Zero external dependencies

Explore pipelines, data frames, and engine processes

getML Interface

Comes with the getML engine
Web frontend for data exploration, easy inspection of trained models and learned features

Easy to use inside your existing Python codebase

Python API

Open-source license, installable with pip
Wrapper around the getML engine for easy integration of relational learning into existing data science workflows
Sends all the instructions & data to the getML engine

How feature learning works

To find the best set of aggregation functions and conditions, getML’s supervised learning algorithms perform an iterative, tree-based search inside relational data. This allows complex features to be generated automatically for a given target variable, at a scale and with an accuracy that no manual or brute-force approach can match.
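
To make "aggregation functions and conditions" concrete, here is a minimal sketch of what a single candidate feature looks like and how candidates can be scored against a target. It is a deliberately naive brute-force toy in plain Python, not getML's algorithm: the data, the thresholds, the helper names (build_feature, abs_correlation, search) and the use of correlation as a score are all assumptions made for illustration.

```python
# Toy illustration only: a brute-force search over (aggregation, condition)
# pairs. getML describes its own search as iterative and tree-based; this
# sketch just shows what one candidate feature is and how it can be scored.
from statistics import mean

# Hypothetical data: one target per account, many transactions per account.
targets = {1: 1.0, 2: 0.0, 3: 1.0, 4: 0.0}
transactions = [
    (1, 900.0), (1, 50.0), (1, 700.0),
    (2, 40.0), (2, 60.0),
    (3, 800.0), (3, 30.0),
    (4, 20.0), (4, 10.0), (4, 70.0),
]

AGGREGATIONS = {
    "COUNT": len,
    "SUM": sum,
    "AVG": lambda xs: mean(xs) if xs else 0.0,
    "MAX": lambda xs: max(xs) if xs else 0.0,
}
THRESHOLDS = [0.0, 100.0, 500.0]


def build_feature(agg, threshold):
    """Aggregate each account's amounts that satisfy `amount > threshold`."""
    feature = {}
    for account_id in targets:
        amounts = [
            amount
            for acc, amount in transactions
            if acc == account_id and amount > threshold
        ]
        feature[account_id] = float(agg(amounts))
    return feature


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 if either side is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def search():
    """Score every candidate against the target and return them best-first."""
    ids = sorted(targets)
    ys = [targets[i] for i in ids]
    scored = []
    for name, agg in AGGREGATIONS.items():
        for threshold in THRESHOLDS:
            feature = build_feature(agg, threshold)
            score = abs_correlation([feature[i] for i in ids], ys)
            scored.append((score, name, threshold))
    return sorted(scored, reverse=True)


ranking = search()
best_score, best_agg, best_threshold = ranking[0]
print(f"best: {best_agg}(amount) WHERE amount > {best_threshold}")
```

Even this toy evaluates 12 candidates for one column and one join. With more tables, columns and nested conditions the candidate space grows combinatorially, which is the motivation for a guided search rather than enumeration.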

How do I use it?

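import getml

# Create or switch to a project that holds the data and pipelines.
getml.set_project("loans")

# Load the example loans data set: train/test population tables
# plus three peripheral tables.
population_train, population_test, order, trans, meta = getml.datasets.load_loans()

# Declare the population table as the center of a star schema.
schema = getml.data.StarSchema(
    train=population_train,
    test=population_test,
    alias="population",
)

# Join transactions on the account, using the time stamps
# of the loan and of each transaction.
schema.join(
    trans,
    on="account_id",
    time_stamps=("date_loan", "date"),
)

# The remaining peripheral tables are joined on the account only.
schema.join(
    order,
    on="account_id",
)

schema.join(
    meta,
    on="account_id",
)

# Feature learner for a classification target.
relmt = getml.feature_learning.RelMT(
    loss_function=getml.feature_learning.loss_functions.CrossEntropyLoss,
)

# Predictor trained on the learned features.
xgboost = getml.predictors.XGBoostClassifier()

# Combine data model, feature learner and predictor in one pipeline.
pipe = getml.pipeline.Pipeline(
    data_model=schema.data_model,
    feature_learners=relmt,
    predictors=xgboost,
)

# Learn the features and train the predictor on the training set.
pipe.fit(schema.train)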
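
The time_stamps=("date_loan", "date") argument in the first join is easy to skim past. The sketch below shows, in plain Python without getML, the kind of restriction such a time-stamped join is meant to express: when building features for a loan, only transactions dated no later than that loan are visible. The rows, the visible_transactions helper and the exact comparison (date <= date_loan) are assumptions for illustration; check the getML documentation for the precise semantics.

```python
from datetime import date

# Hypothetical rows standing in for the population and `trans` tables.
loans = [
    {"account_id": 1, "date_loan": date(1995, 6, 1)},
    {"account_id": 2, "date_loan": date(1996, 1, 15)},
]
trans = [
    {"account_id": 1, "date": date(1995, 5, 1), "amount": 100.0},
    {"account_id": 1, "date": date(1995, 7, 1), "amount": 999.0},  # after the loan
    {"account_id": 2, "date": date(1995, 12, 31), "amount": 40.0},
    {"account_id": 2, "date": date(1996, 1, 15), "amount": 60.0},
]


def visible_transactions(loan, transactions):
    """Rows matching on account_id and dated no later than the loan.

    Assumption: the comparison is `date <= date_loan`.
    """
    return [
        t
        for t in transactions
        if t["account_id"] == loan["account_id"]
        and t["date"] <= loan["date_loan"]
    ]


# One hand-written feature: total amount of the visible transactions.
sum_amount = {
    loan["account_id"]: sum(t["amount"] for t in visible_transactions(loan, trans))
    for loan in loans
}
print(sum_amount)
```

Leaving out the date filter would let the 999.0 transaction leak into the feature for account 1, even though it happened after the loan was granted.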

Try getML

It takes less than 30 seconds to get started.

Test-drive on our test cluster

To skip the setup procedure, you can test-drive getML in a Docker environment on our test cluster.

Install getML locally

Getting started with getML is as easy as downloading the getML suite and pip-installing the getml Python API.

Benchmarks

Beating the state-of-the-art in Relational Learning

getML outperforms modern libraries and academic literature in terms of speed and accuracy.

5%

Beating state-of-the-art approaches when classifying a citation network, with results 5% better than those reported in the academic literature.

11%

Outperforming Facebook’s Prophet by 11 percentage points in one-step-ahead predictions.

179x

Up to 179x faster than the popular feature engineering libraries featuretools and tsfresh.