Predicting

Now that you know how to engineer a flat table of features, you are ready to make predictions of the target variable(s).

Using getML

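getML comes with six built-in machine learning predictors: LinearRegression, LogisticRegression, XGBoostClassifier, XGBoostRegressor, ScaleGBMClassifier, and ScaleGBMRegressor.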

Enterprise edition

ScaleGBMClassifier and ScaleGBMRegressor are exclusive to the Enterprise edition and are not available in the Community edition. Discover the benefits of the Enterprise edition and compare the features of both editions.

For licensing information and technical support, please contact us.

Using one of them in your analysis is simple: pass it as the predictors argument to Pipeline on initialization. To use more than one predictor, pass them as a list.

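import getml

# Two feature learners generate the features; one predictor consumes them.
feature_learner1 = getml.feature_learners.Relboost()

feature_learner2 = getml.feature_learners.Multirel()

predictor = getml.predictors.XGBoostRegressor()

# data_model and peripheral_placeholder are assumed to have been defined beforehand.
pipe = getml.pipeline.Pipeline(
    data_model=data_model,
    peripheral=peripheral_placeholder,
    feature_learners=[feature_learner1, feature_learner2],
    predictors=predictor,
)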

When you call fit() on a pipeline, the entire pipeline will be trained.

Note

The estimated training time for a pipeline is only a rough guide. Occasionally, training can take significantly longer than estimated. However, the pipeline never crashes silently: given enough time, the computations always finish.

Note that Pipeline comes with dependency tracking. That means it can figure out on its own what has changed and what needs to be trained again.

feature_learner1 = getml.feature_learners.Relboost()

feature_learner2 = getml.feature_learners.Multirel()

predictor = getml.predictors.XGBoostRegressor()

pipe = getml.pipeline.Pipeline(
    data_model=data_model,
    population=population_placeholder,
    peripheral=peripheral_placeholder,
    feature_learners=[feature_learner1, feature_learner2],
    predictors=predictor,
)

pipe.fit(...)

pipe.predictors[0].n_estimators = 50

# Only the predictor has changed,
# so only the predictor will be refitted.
pipe.fit(...)
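To make the idea of dependency tracking more concrete, here is a minimal sketch in plain Python. It is not getML's actual implementation: the ToyPipeline class, the fingerprint() helper, and the num_features parameter are hypothetical names invented for illustration. The sketch assumes that each stage can be identified by a hash of its hyperparameters combined with the hash of the stage it depends on, so a change upstream also invalidates everything downstream.

```python
import hashlib
import json


def fingerprint(params, upstream=""):
    """Hash a stage's hyperparameters together with its upstream fingerprint."""
    payload = json.dumps(params, sort_keys=True) + upstream
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ToyPipeline:
    """Hypothetical two-stage pipeline: feature learner -> predictor."""

    def __init__(self, feature_learner_params, predictor_params):
        self.feature_learner_params = feature_learner_params
        self.predictor_params = predictor_params
        self._fitted = {}  # stage name -> fingerprint at the last fit

    def fit(self):
        """Refit only the stages whose fingerprint changed; return their names."""
        fl_print = fingerprint(self.feature_learner_params)
        # The predictor is trained on the generated features, so its
        # fingerprint includes the feature learner's fingerprint.
        pred_print = fingerprint(self.predictor_params, upstream=fl_print)

        refitted = []
        stages = (("feature_learner", fl_print), ("predictor", pred_print))
        for stage, current in stages:
            if self._fitted.get(stage) != current:
                # A real pipeline would run the expensive training here.
                self._fitted[stage] = current
                refitted.append(stage)
        return refitted


pipe = ToyPipeline({"num_features": 100}, {"n_estimators": 100})

first = pipe.fit()  # nothing has been fitted yet: both stages train

pipe.predictor_params["n_estimators"] = 50
second = pipe.fit()  # only the predictor changed

pipe.feature_learner_params["num_features"] = 50
third = pipe.fit()  # the features changed, so the predictor must follow
```

In this sketch, changing only the predictor's hyperparameters leaves the feature learner untouched, while changing the feature learner forces both stages to be refitted.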

To score the performance of your predictions on a test dataset, call the score() method of your pipeline. The available metrics are documented in metrics.
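As a rough illustration of what scoring involves, the following sketch computes two common regression metrics, the mean absolute error and the root mean squared error, in plain Python. The functions and the toy values are made up for this example, and the formulas are the textbook definitions; they are not taken from getML and may differ in detail from what score() reports.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more than MAE does."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Toy targets and predictions, invented for illustration only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

scores = {"mae": mae(y_true, y_pred), "rmse": rmse(y_true, y_pred)}
```

Both metrics are computed on data the model has not seen during training, which is why scoring is done on a separate test dataset.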

To make predictions on new, unseen data with a trained pipeline, including both the learned features and the predictor, call the predict() method of your pipeline.

Using external software

In our experience, the generated features contribute most to the accuracy of the predictions. Before you try to improve your analysis with sophisticated prediction algorithms and tune their hyperparameters, we recommend tuning the hyperparameters of your Multirel or Relboost feature learner first. You can do so either by hand or using getML's automated hyperparameter optimization.

If you wish to use external predictors, you can use the transform() method of your pipeline to turn new data that complies with your relational data model into a flat feature table.
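The point of the flat feature table is that any external predictor can consume it, because it only needs one row per sample and one numeric column per feature. The sketch below illustrates this with a tiny nearest-neighbour regressor written in plain Python, which stands in for an external library. The feature values, the targets, and the NearestNeighbourRegressor class are all made up for illustration; in practice the tables would come from transform().

```python
import math

# Made-up flat feature table standing in for the output of transform():
# one row per population entry, one column per generated feature.
train_features = [[0.0, 1.0], [1.0, 1.0], [4.0, 5.0], [5.0, 5.0]]
train_targets = [10.0, 12.0, 30.0, 32.0]


class NearestNeighbourRegressor:
    """Hypothetical stand-in for an external predictor."""

    def __init__(self, k=2):
        self.k = k
        self._features = []
        self._targets = []

    def fit(self, features, targets):
        # Nearest-neighbour models simply memorize the training data.
        self._features = list(features)
        self._targets = list(targets)
        return self

    def predict(self, features):
        predictions = []
        for row in features:
            # Rank the training rows by Euclidean distance to this row.
            by_distance = sorted(
                zip(self._features, self._targets),
                key=lambda pair: math.dist(pair[0], row),
            )
            nearest = [target for _, target in by_distance[: self.k]]
            predictions.append(sum(nearest) / len(nearest))
        return predictions


model = NearestNeighbourRegressor(k=2).fit(train_features, train_targets)

# New data must be transformed into the same feature columns first.
new_features = [[0.5, 1.0], [4.5, 5.0]]
predictions = model.predict(new_features)
```

Whatever external predictor you choose, the training table and any new table must contain the same features in the same column order.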