
RandomSearch

RandomSearch(
    param_space: Dict[str, Any],
    pipeline: Pipeline,
    score: str = rmse,
    n_iter: int = 100,
    seed: int = 5483,
    **kwargs
)

Bases: _Hyperopt

Uniformly distributed sampling of the hyperparameters.

During every iteration, a new set of hyperparameters is chosen at random: for each dimension of param_space, a value is drawn uniformly between its lower and upper bound, independently of the other dimensions.
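The sampling step described above can be sketched in plain Python. This is only an illustration of the idea, not getML's implementation (the actual sampling happens in the Engine). The helper `sample_uniform` is hypothetical, and drawing integers whenever both bounds are integers is an assumption made for this example.

```python
import random
from typing import Any, Dict, List


def sample_uniform(bounds: Dict[str, List[float]], rng: random.Random) -> Dict[str, Any]:
    """Draw one value per parameter, uniformly between its lower and upper bound."""
    sample = {}
    for name, (lower, upper) in bounds.items():
        if isinstance(lower, int) and isinstance(upper, int):
            # Assumption: integer bounds indicate an integer-valued parameter.
            sample[name] = rng.randint(lower, upper)
        else:
            sample[name] = rng.uniform(lower, upper)
    return sample


rng = random.Random(5483)
bounds = {"max_depth": [1, 10], "shrinkage": [0.01, 0.4]}

# Each iteration draws every dimension independently of the others.
draws = [sample_uniform(bounds, rng) for _ in range(3)]
for draw in draws:
    print(draw)
```

Each printed dictionary corresponds to one candidate parameter combination that would be fitted and scored.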

Enterprise edition

This feature is exclusive to the Enterprise edition and is not available in the Community edition. Discover the benefits of the Enterprise edition and compare the features of both editions.

For licensing information and technical support, please contact us.

PARAMETER DESCRIPTION
param_space

Dictionary containing numerical arrays of length two holding the lower and upper bounds of all parameters which will be altered in pipeline during the hyperparameter optimization.

If we have two feature learners and one predictor, the hyperparameter space might look like this:

param_space = {
    "feature_learners": [
        {
            "num_features": [10, 50],
        },
        {
            "max_depth": [1, 10],
            "min_num_samples": [100, 500],
            "num_features": [10, 50],
            "reg_lambda": [0.0, 0.1],
            "shrinkage": [0.01, 0.4]
        }],
    "predictors": [
        {
            "reg_lambda": [0.0, 10.0]
        }
    ]
}
If we only want to optimize the predictor, then we can leave out the feature learners.

TYPE: Dict[str, Any]
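As a small illustration of the predictor-only case, the sketch below builds such a param_space and checks that every entry is a lower/upper pair. The helper `check_bounds` is hypothetical and is not part of getML; the library performs its own validation.

```python
from typing import Any, Dict, List


def check_bounds(param_space: Dict[str, List[Dict[str, Any]]]) -> None:
    """Raise ValueError unless every entry is a [lower, upper] pair with lower <= upper."""
    for group, entries in param_space.items():
        for index, params in enumerate(entries):
            for name, bounds in params.items():
                if len(bounds) != 2 or bounds[0] > bounds[1]:
                    raise ValueError(
                        f"{group}[{index}].{name}: expected [lower, upper], got {bounds}"
                    )


# Only the predictor is optimized; the feature learners keep
# the hyperparameters set in the base pipeline.
predictor_only_space = {
    "predictors": [
        {"reg_lambda": [0.0, 10.0]}
    ]
}

check_bounds(predictor_only_space)
```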

pipeline

Base pipeline used to derive all models fitted and scored during the hyperparameter optimization. Be careful in constructing it since only those parameters present in param_space will be overwritten. It defines the data schema and any hyperparameters that are not optimized.

TYPE: Pipeline

score

The score to optimize. Must be from metrics.

TYPE: str DEFAULT: rmse

n_iter

Number of iterations in the hyperparameter optimization and thus the number of parameter combinations to draw and evaluate. Range: [1, ∞)

TYPE: int DEFAULT: 100

seed

Seed for the random number generator that underlies the sampling procedure, used to make the calculation reproducible. Due to the nature of the underlying algorithm, this is only the case if the fit is done without multithreading. To reflect this, a seed of None represents an unreproducible run, and the seed may only be set to an actual integer if both the num_threads and n_jobs instance variables of the predictor and feature_selector in model (if they are instances of either XGBoostRegressor or XGBoostClassifier) are set to 1. Internally, a seed of None will be mapped to 5543. Range: [0, ∞)

TYPE: int DEFAULT: 5483
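What the seed buys you can be shown with Python's standard `random` module: the same seed yields the same sequence of draws as long as they are made in a single thread. This sketch uses Python's generator as a stand-in and does not reproduce the Engine's random number generator; `draw_sequence` is a hypothetical helper.

```python
import random
from typing import List


def draw_sequence(seed: int, n_iter: int, lower: float, upper: float) -> List[float]:
    """Draw n_iter values uniformly from [lower, upper] using a seeded generator."""
    rng = random.Random(seed)
    return [rng.uniform(lower, upper) for _ in range(n_iter)]


first = draw_sequence(5483, 5, 0.0, 10.0)
second = draw_sequence(5483, 5, 0.0, 10.0)

# Same seed and same call order give identical draws.
print(first == second)  # True
```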

Example
from getml import data
from getml import datasets
from getml import engine
from getml import feature_learning
from getml.feature_learning import aggregations
from getml.feature_learning import loss_functions
from getml import hyperopt
from getml import pipeline
from getml import predictors

# ----------------

engine.set_project("examples")

# ----------------

population_table, peripheral_table = datasets.make_numerical()

# ----------------
# Construct placeholders

population_placeholder = data.Placeholder("POPULATION")
peripheral_placeholder = data.Placeholder("PERIPHERAL")
population_placeholder.join(peripheral_placeholder, "join_key", "time_stamp")

# ----------------
# Base model - any parameters not included
# in param_space will be taken from this.

fe1 = feature_learning.Multirel(
    aggregation=[
        aggregations.COUNT,
        aggregations.SUM
    ],
    loss_function=loss_functions.SquareLoss,
    num_features=10,
    share_aggregations=1.0,
    max_length=1,
    num_threads=0
)

# ----------------
# Base model - any parameters not included
# in param_space will be taken from this.

fe2 = feature_learning.Relboost(
    loss_function=loss_functions.SquareLoss,
    num_features=10
)

# ----------------
# Base model - any parameters not included
# in param_space will be taken from this.

predictor = predictors.LinearRegression()

# ----------------

pipe = pipeline.Pipeline(
    population=population_placeholder,
    peripheral=[peripheral_placeholder],
    feature_learners=[fe1, fe2],
    predictors=[predictor]
)

# ----------------
# Build a hyperparameter space.
# We have two feature learners and one
# predictor, so this is how we must
# construct our hyperparameter space.
# If we only wanted to optimize the predictor,
# we could just leave out the feature_learners.

param_space = {
    "feature_learners": [
        {
            "num_features": [10, 50],
        },
        {
            "max_depth": [1, 10],
            "min_num_samples": [100, 500],
            "num_features": [10, 50],
            "reg_lambda": [0.0, 0.1],
            "shrinkage": [0.01, 0.4]
        }],
    "predictors": [
        {
            "reg_lambda": [0.0, 10.0]
        }
    ]
}

# ----------------
# Wrap a RandomSearch around the reference model

random_search = hyperopt.RandomSearch(
    pipeline=pipe,
    param_space=param_space,
    n_iter=30,
    score=pipeline.metrics.rsquared
)

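# fit() expects a data container with named subsets.
# Assumption: the peripheral table is registered under
# the name of its placeholder.
container = data.Container(
    train=population_table,
    validation=population_table
)
container.add(PERIPHERAL=peripheral_table)

random_search.fit(container, train="train", validation="validation")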
Source code in getml/hyperopt/hyperopt.py
def __init__(
    self,
    param_space: Dict[str, Any],
    pipeline: Pipeline,
    score: str = metrics.rmse,
    n_iter: int = 100,
    seed: int = 5483,
    **kwargs,
):
    super().__init__(
        param_space=param_space,
        pipeline=pipeline,
        score=score,
        n_iter=n_iter,
        seed=seed,
        **kwargs,
    )

    self._type = "RandomSearch"

    self.surrogate_burn_in_algorithm = random

    self.validate()

best_pipeline property

best_pipeline: Pipeline

The best pipeline that is part of the hyperparameter optimization.

This is always based on the validation data you have passed even if you have chosen to score the pipeline on other data afterwards.

RETURNS DESCRIPTION
Pipeline

The best pipeline.
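Conceptually, choosing the best pipeline means ranking the evaluated pipelines by their validation score, where the direction depends on the metric: lower is better for error metrics such as rmse, higher is better for rsquared. The sketch below illustrates that idea only. The key `pipeline_name` appears in the library's `clean_up` source, but the `score` key, the set `LOWER_IS_BETTER`, and the helper `pick_best` are assumptions for this example and not the library's internal format.

```python
from typing import Any, Dict, List

# Metrics for which a smaller value is better (assumption for this sketch).
LOWER_IS_BETTER = {"rmse", "mae", "cross_entropy"}


def pick_best(evaluations: List[Dict[str, Any]], metric: str) -> str:
    """Return the name of the pipeline with the best validation score."""
    def validation_score(evaluation: Dict[str, Any]) -> float:
        return evaluation["score"]

    if metric in LOWER_IS_BETTER:
        best = min(evaluations, key=validation_score)
    else:
        best = max(evaluations, key=validation_score)
    return best["pipeline_name"]


# Hypothetical evaluation records, one per sampled parameter combination.
evaluations = [
    {"pipeline_name": "pipe-a", "score": 0.81},
    {"pipeline_name": "pipe-b", "score": 0.93},
    {"pipeline_name": "pipe-c", "score": 0.77},
]

print(pick_best(evaluations, "rsquared"))  # pipe-b
print(pick_best(evaluations, "rmse"))      # pipe-c
```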

id property

id: str

Name of the hyperparameter optimization. This is used to uniquely identify it on the engine.

RETURNS DESCRIPTION
str

The name of the hyperparameter optimization.

name property

name: str

Returns the ID of the hyperparameter optimization. The name property is kept for backward compatibility.

RETURNS DESCRIPTION
str

The name of the hyperparameter optimization.

score property

score: str

The score to be optimized.

RETURNS DESCRIPTION
str

The score to be optimized.

type property

type: str

The algorithm used for the hyperparameter optimization.

RETURNS DESCRIPTION
str

The algorithm used for the hyperparameter optimization.

clean_up

clean_up() -> None

Deletes all pipelines associated with the hyperparameter optimization except the best pipeline.

Source code in getml/hyperopt/hyperopt.py
def clean_up(self) -> None:
    """
    Deletes all pipelines associated with hyperparameter optimization,
    but the best pipeline.
    """
    best_pipeline = self._best_pipeline_name()
    names = [obj["pipeline_name"] for obj in self.evaluations]
    for name in names:
        if name == best_pipeline:
            continue
        if exists(name):
            delete(name)

fit

fit(
    container: Union[Container, StarSchema, TimeSeries],
    train: str = "train",
    validation: str = "validation",
) -> _Hyperopt

Launches the hyperparameter optimization.

PARAMETER DESCRIPTION
container

The data container used for the hyperparameter tuning.

TYPE: Union[Container, StarSchema, TimeSeries]

train

The name of the subset in 'container' used for training.

TYPE: str DEFAULT: 'train'

validation

The name of the subset in 'container' used for validation.

TYPE: str DEFAULT: 'validation'

RETURNS DESCRIPTION
_Hyperopt

The current instance.

Source code in getml/hyperopt/hyperopt.py
def fit(
    self,
    container: Union[Container, StarSchema, TimeSeries],
    train: str = "train",
    validation: str = "validation",
) -> _Hyperopt:
    """Launches the hyperparameter optimization.

    Args:
        container:
            The data container used for the hyperparameter tuning.

        train:
            The name of the subset in 'container' used for training.

        validation:
            The name of the subset in 'container' used for validation.

    Returns:
        The current instance.
    """

    if isinstance(container, (StarSchema, TimeSeries)):
        container = container.container

    if not isinstance(container, Container):
        raise TypeError(
            "'container' must be a `~getml.data.Container`, "
            + "a `~getml.data.StarSchema` or a `~getml.data.TimeSeries`"
        )

    if not isinstance(train, str):
        raise TypeError("""'train' must be a string""")

    if not isinstance(validation, str):
        raise TypeError("""'validation' must be a string""")

    self.pipeline.check(container[train])

    population_table_training = container[train].population

    population_table_validation = container[validation].population

    peripheral_tables = _transform_peripheral(
        container[train].peripheral, self.pipeline.peripheral
    )

    self._send()

    cmd: Dict[str, Any] = {}

    cmd["name_"] = self.id
    cmd["type_"] = "Hyperopt.launch"

    cmd["population_training_df_"] = population_table_training._getml_deserialize()

    cmd["population_validation_df_"] = (
        population_table_validation._getml_deserialize()
    )

    cmd["peripheral_dfs_"] = [
        elem._getml_deserialize() for elem in peripheral_tables
    ]

    with comm.send_and_get_socket(cmd) as sock:
        begin = time.monotonic()
        msg = comm.log(sock)
        end = time.monotonic()

    if msg != "Success!":
        comm.handle_engine_exception(msg)

    _print_time_taken(begin, end, "Time taken: ")

    self._save()

    return self.refresh()

refresh

refresh() -> _Hyperopt

Reloads the hyperparameter optimization from the Engine.

RETURNS DESCRIPTION
_Hyperopt

Current instance

Source code in getml/hyperopt/hyperopt.py
def refresh(self) -> _Hyperopt:
    """Reloads the hyperparameter optimization from the Engine.

    Returns:
            Current instance

    """
    json_obj = _get_json_obj(self.id)
    return self._parse_json_obj(json_obj)

validate

validate() -> None

Validate the parameters of the hyperparameter optimization.

Source code in getml/hyperopt/hyperopt.py
def validate(self) -> None:
    """
    Validate the parameters of the hyperparameter optimization.
    """
    _validate_hyperopt(_Hyperopt._supported_params, **self.__dict__)  # type: ignore

    if self.surrogate_burn_in_algorithm != random:
        raise ValueError("'surrogate_burn_in_algorithm' must be '" + random + "'.")

    if self.ratio_iter != 1.0:
        raise ValueError("'ratio_iter' must be 1.0.")