LatinHypercubeSearch
```python
LatinHypercubeSearch(
    param_space: Dict[str, Any],
    pipeline: Pipeline,
    score: str = metrics.rmse,
    n_iter: int = 100,
    seed: int = 5483,
    **kwargs
)
```
Bases: _Hyperopt
Latin hypercube sampling of the hyperparameters.
Uses a multidimensional, uniform cumulative distribution function to draw the random numbers from. To draw `n_iter` samples, the distribution is divided into `n_iter * n_iter` hypercubes of equal size (`n_iter` per dimension). `n_iter` of these hypercubes are selected in such a way that only one is used per dimension, and an independent and identically distributed (iid) random number is drawn within the boundaries of each selected hypercube.
A Latin hypercube search can be seen as a compromise between a grid search, which iterates through the entire hyperparameter space, and a random search, which draws completely random samples from it.
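To make the sampling scheme concrete, here is a minimal, standalone sketch of Latin hypercube sampling using SciPy's quasi-Monte Carlo module. It only illustrates the idea described above; it is not getml's internal implementation, and the two parameter bounds are made up for the illustration.

```python
# Illustration only: Latin hypercube sampling with SciPy,
# not getml's internal sampler.
from scipy.stats import qmc

n_iter = 30  # number of samples, as in the n_iter parameter below

# Two hypothetical hyperparameters with [lower, upper] bounds.
l_bounds = [10, 0.0]
u_bounds = [50, 10.0]

# One stratum per sample and dimension: each axis of the unit square is
# cut into n_iter intervals, and each interval is hit exactly once.
sampler = qmc.LatinHypercube(d=2, seed=5483)
unit_sample = sampler.random(n=n_iter)               # shape (30, 2), values in [0, 1)
points = qmc.scale(unit_sample, l_bounds, u_bounds)  # scaled to the bounds

print(points[:3])
```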
Enterprise edition
This feature is exclusive to the Enterprise edition and is not available in the Community edition. Discover the benefits of the Enterprise edition and compare the features of both editions.
For licensing information and technical support, please contact us.
PARAMETER | DESCRIPTION |
---|---|
param_space | Dictionary containing numerical arrays of length two holding the lower and upper bounds of all parameters to be altered in the pipeline during the hyperparameter optimization. If we have two feature learners and one predictor, the hyperparameter space might look like the sketch shown after this table. If we only want to optimize the predictor, we can leave out the feature learners. TYPE: `Dict[str, Any]` |
pipeline | Base pipeline used to derive all models fitted and scored during the hyperparameter optimization. Be careful when constructing it, since only those parameters present in `param_space` will be altered. TYPE: `Pipeline` |
score | The score to optimize. Must be from `metrics`. TYPE: `str` DEFAULT: `metrics.rmse` |
n_iter | Number of iterations in the hyperparameter optimization and thus the number of parameter combinations to draw and evaluate. Range: [1, ∞] TYPE: `int` DEFAULT: `100` |
seed | Seed used for the random number generator that underlies the sampling procedure, making the calculation reproducible. Due to the nature of the underlying algorithm, this is only the case if the fit is done without multithreading. TYPE: `int` DEFAULT: `5483` |
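As referenced in the table, a hyperparameter space for two feature learners and one predictor might look like the following sketch. The bounds are illustrative; the full example below uses the same structure.

```python
param_space = {
    "feature_learners": [
        {
            "num_features": [10, 50],   # bounds for the first feature learner
        },
        {
            "max_depth": [1, 10],       # bounds for the second feature learner
            "num_features": [10, 50],
        },
    ],
    "predictors": [
        {
            "reg_lambda": [0.0, 10.0],  # bounds for the predictor
        },
    ],
}
```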
Example
```python
from getml import data
from getml import datasets
from getml import engine
from getml import feature_learning
from getml.feature_learning import aggregations
from getml.feature_learning import loss_functions
from getml import hyperopt
from getml import pipeline
from getml import predictors

# ----------------

engine.set_project("examples")

# ----------------

population_table, peripheral_table = datasets.make_numerical()

# ----------------

# Construct placeholders
population_placeholder = data.Placeholder("POPULATION")
peripheral_placeholder = data.Placeholder("PERIPHERAL")
population_placeholder.join(peripheral_placeholder, "join_key", "time_stamp")

# ----------------

# Base model - any parameters not included
# in param_space will be taken from this.
fe1 = feature_learning.Multirel(
    aggregation=[
        aggregations.COUNT,
        aggregations.SUM
    ],
    loss_function=loss_functions.SquareLoss,
    num_features=10,
    share_aggregations=1.0,
    max_length=1,
    num_threads=0
)

# ----------------

# Base model - any parameters not included
# in param_space will be taken from this.
fe2 = feature_learning.Relboost(
    loss_function=loss_functions.SquareLoss,
    num_features=10
)

# ----------------

# Base model - any parameters not included
# in param_space will be taken from this.
predictor = predictors.LinearRegression()

# ----------------

pipe = pipeline.Pipeline(
    population=population_placeholder,
    peripheral=[peripheral_placeholder],
    feature_learners=[fe1, fe2],
    predictors=[predictor]
)

# ----------------

# Build a hyperparameter space.
# We have two feature learners and one
# predictor, so this is how we must
# construct our hyperparameter space.
# If we only wanted to optimize the predictor,
# we could just leave out the feature_learners.
param_space = {
    "feature_learners": [
        {
            "num_features": [10, 50],
        },
        {
            "max_depth": [1, 10],
            "min_num_samples": [100, 500],
            "num_features": [10, 50],
            "reg_lambda": [0.0, 0.1],
            "shrinkage": [0.01, 0.4]
        }
    ],
    "predictors": [
        {
            "reg_lambda": [0.0, 10.0]
        }
    ]
}

# ----------------

# Wrap a LatinHypercubeSearch around the reference model
latin_search = hyperopt.LatinHypercubeSearch(
    pipeline=pipe,
    param_space=param_space,
    n_iter=30,
    score=pipeline.metrics.rsquared
)

latin_search.fit(
    population_table_training=population_table,
    population_table_validation=population_table,
    peripheral_tables=[peripheral_table]
)
```
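Once `fit` has returned, the winning model can be retrieved via the `best_pipeline` property and the remaining trial pipelines deleted with `clean_up()`, both documented below. A minimal sketch:

```python
# Sketch: retrieve the winning pipeline once the search has finished,
# then delete all other pipelines created during the search.
best = latin_search.best_pipeline
latin_search.clean_up()  # keeps only the best pipeline
```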
best_pipeline property
best_pipeline: Pipeline
The best pipeline that is part of the hyperparameter optimization.
This is always based on the validation data you have passed even if you have chosen to score the pipeline on other data afterwards.
RETURNS | DESCRIPTION |
---|---|
Pipeline | The best pipeline. |
id property
id: str
Name of the hyperparameter optimization. This is used to uniquely identify it on the engine.
RETURNS | DESCRIPTION |
---|---|
str | The name of the hyperparameter optimization. |
name property
name: str
Returns the ID of the hyperparameter optimization. The name property is kept for backward compatibility.
RETURNS | DESCRIPTION |
---|---|
str | The name of the hyperparameter optimization. |
score property
score: str
The score to be optimized.
RETURNS | DESCRIPTION |
---|---|
str | The score to be optimized. |
type property
type: str
The algorithm used for the hyperparameter optimization.
RETURNS | DESCRIPTION |
---|---|
str | The algorithm used for the hyperparameter optimization. |
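The read-only properties above can be inspected on any search object, for instance as in the sketch below. The printed values in the comments are assumptions about the engine's naming, not guaranteed output.

```python
# Sketch: inspecting the descriptive properties of a search.
print(latin_search.id)     # unique name of the search on the engine
print(latin_search.type)   # algorithm used, e.g. "LatinHypercubeSearch"
print(latin_search.score)  # the score being optimized
```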
clean_up
clean_up() -> None
Deletes all pipelines associated with the hyperparameter optimization except for the best pipeline.
fit
```python
fit(
    container: Union[Container, StarSchema, TimeSeries],
    train: str = "train",
    validation: str = "validation",
) -> _Hyperopt
```
Launches the hyperparameter optimization.
PARAMETER | DESCRIPTION |
---|---|
container | The data container used for the hyperparameter tuning. TYPE: `Union[Container, StarSchema, TimeSeries]` |
train | The name of the subset in `container` used for training. TYPE: `str` DEFAULT: `"train"` |
validation | The name of the subset in `container` used for validation. TYPE: `str` DEFAULT: `"validation"` |
RETURNS | DESCRIPTION |
---|---|
_Hyperopt | The current instance. |
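For illustration, a call matching this signature might look like the sketch below. It assumes the `latin_search` and tables from the example above, and that a `getml.data.Container` is built with subsets named after the defaults; the split shares and the `peripheral` keyword name are assumptions, not requirements of the API.

```python
# Sketch: container-based fit. The peripheral keyword names the table and
# is assumed here to match the peripheral in the pipeline's data model.
split = data.split.random(train=0.8, validation=0.2, test=0.0)

container = data.Container(population=population_table, split=split)
container.add(peripheral=peripheral_table)

latin_search.fit(
    container=container,
    train="train",            # subset used for training
    validation="validation",  # subset used for validation
)
```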
refresh
refresh() -> _Hyperopt
Reloads the hyperparameter optimization from the Engine.
RETURNS | DESCRIPTION |
---|---|
_Hyperopt | The current instance. |
validate
validate() -> None
Validate the parameters of the hyperparameter optimization.