LatinHypercubeSearch
```python
LatinHypercubeSearch(
    param_space: Dict[str, Any],
    pipeline: Pipeline,
    score: str = metrics.rmse,
    n_iter: int = 100,
    seed: int = 5483,
    **kwargs
)
```
Bases: _Hyperopt
Latin hypercube sampling of the hyperparameters.
Uses a multidimensional, uniform cumulative distribution function to draw the random numbers from. To draw `n_iter` samples, the distribution is divided into `n_iter * n_iter` hypercubes of equal size (`n_iter` per dimension). `n_iter` of these hypercubes are selected in such a way that only one is used per dimension, and an independent and identically distributed (iid) random number is drawn within the boundaries of each selected hypercube.
A Latin hypercube search can be seen as a compromise between a grid search, which iterates through the entire hyperparameter space, and a random search, which draws completely random samples from it.
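To make the sampling scheme concrete, here is a minimal, standalone sketch of Latin hypercube sampling using SciPy's quasi-Monte Carlo module. It only illustrates the idea described above; it is not getml's internal implementation, and the two parameter bounds are made up for the illustration.

```python
# Illustration only: Latin hypercube sampling with SciPy,
# not getml's internal sampler.
from scipy.stats import qmc

n_iter = 30  # number of samples, as in the n_iter parameter below

# Two hypothetical hyperparameters with [lower, upper] bounds.
l_bounds = [10, 0.0]
u_bounds = [50, 10.0]

# One stratum per sample and dimension: each axis of the unit square is
# cut into n_iter intervals, and each interval is hit exactly once.
sampler = qmc.LatinHypercube(d=2, seed=5483)
unit_sample = sampler.random(n=n_iter)               # shape (30, 2), values in [0, 1)
points = qmc.scale(unit_sample, l_bounds, u_bounds)  # scaled to the bounds

print(points[:3])
```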
Enterprise edition
This feature is exclusive to the Enterprise edition and is not available in the Community edition. Discover the benefits of the Enterprise edition and compare the features of both editions.
For licensing information and technical support, please contact us.
PARAMETER | DESCRIPTION |
---|---|
param_space | Dictionary containing numerical arrays of length two holding the lower and upper bounds of all parameters to be altered in the pipeline during the hyperparameter optimization. If we have two feature learners and one predictor, the hyperparameter space might look like the sketch shown after this table. If we only want to optimize the predictor, we can leave out the feature learners. TYPE: `Dict[str, Any]` |
pipeline | Base pipeline used to derive all models fitted and scored during the hyperparameter optimization. Be careful when constructing it, since only those parameters present in `param_space` will be altered. TYPE: `Pipeline` |
score | The score to optimize. Must be from `metrics`. TYPE: `str` DEFAULT: `metrics.rmse` |
n_iter | Number of iterations in the hyperparameter optimization and thus the number of parameter combinations to draw and evaluate. Range: [1, ∞] TYPE: `int` DEFAULT: `100` |
seed | Seed used for the random number generator that underlies the sampling procedure, making the calculation reproducible. Due to the nature of the underlying algorithm, this is only the case if the fit is done without multithreading. TYPE: `int` DEFAULT: `5483` |
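As referenced in the table, a hyperparameter space for two feature learners and one predictor might look like the following sketch. The bounds are illustrative; the full example below uses the same structure.

```python
param_space = {
    "feature_learners": [
        {
            "num_features": [10, 50],   # bounds for the first feature learner
        },
        {
            "max_depth": [1, 10],       # bounds for the second feature learner
            "num_features": [10, 50],
        },
    ],
    "predictors": [
        {
            "reg_lambda": [0.0, 10.0],  # bounds for the predictor
        },
    ],
}
```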
Example
```python
from getml import data
from getml import datasets
from getml import engine
from getml import feature_learning
from getml.feature_learning import aggregations
from getml.feature_learning import loss_functions
from getml import hyperopt
from getml import pipeline
from getml import predictors

# ----------------

engine.set_project("examples")

# ----------------

population_table, peripheral_table = datasets.make_numerical()

# ----------------

# Construct placeholders
population_placeholder = data.Placeholder("POPULATION")
peripheral_placeholder = data.Placeholder("PERIPHERAL")
population_placeholder.join(peripheral_placeholder, "join_key", "time_stamp")

# ----------------

# Base model - any parameters not included
# in param_space will be taken from this.
fe1 = feature_learning.Multirel(
    aggregation=[
        aggregations.COUNT,
        aggregations.SUM
    ],
    loss_function=loss_functions.SquareLoss,
    num_features=10,
    share_aggregations=1.0,
    max_length=1,
    num_threads=0
)

# ----------------

# Base model - any parameters not included
# in param_space will be taken from this.
fe2 = feature_learning.Relboost(
    loss_function=loss_functions.SquareLoss,
    num_features=10
)

# ----------------

# Base model - any parameters not included
# in param_space will be taken from this.
predictor = predictors.LinearRegression()

# ----------------

pipe = pipeline.Pipeline(
    population=population_placeholder,
    peripheral=[peripheral_placeholder],
    feature_learners=[fe1, fe2],
    predictors=[predictor]
)

# ----------------

# Build a hyperparameter space.
# We have two feature learners and one
# predictor, so this is how we must
# construct our hyperparameter space.
# If we only wanted to optimize the predictor,
# we could just leave out the feature_learners.
param_space = {
    "feature_learners": [
        {
            "num_features": [10, 50],
        },
        {
            "max_depth": [1, 10],
            "min_num_samples": [100, 500],
            "num_features": [10, 50],
            "reg_lambda": [0.0, 0.1],
            "shrinkage": [0.01, 0.4]
        }
    ],
    "predictors": [
        {
            "reg_lambda": [0.0, 10.0]
        }
    ]
}

# ----------------

# Wrap a LatinHypercubeSearch around the reference model
latin_search = hyperopt.LatinHypercubeSearch(
    pipeline=pipe,
    param_space=param_space,
    n_iter=30,
    score=pipeline.metrics.rsquared
)

latin_search.fit(
    population_table_training=population_table,
    population_table_validation=population_table,
    peripheral_tables=[peripheral_table]
)
```
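Once `fit` has returned, the winning model can be retrieved via the `best_pipeline` property and the remaining trial pipelines deleted with `clean_up()`, both documented below. A minimal sketch:

```python
# Sketch: retrieve the winning pipeline once the search has finished,
# then delete all other pipelines created during the search.
best = latin_search.best_pipeline
latin_search.clean_up()  # keeps only the best pipeline
```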
best_pipeline property
best_pipeline: Pipeline
The best pipeline that is part of the hyperparameter optimization.
This is always based on the validation data you have passed even if you have chosen to score the pipeline on other data afterwards.
RETURNS | DESCRIPTION |
---|---|
Pipeline | The best pipeline. |
id property
id: str
Name of the hyperparameter optimization. This is used to uniquely identify it on the engine.
RETURNS | DESCRIPTION |
---|---|
str | The name of the hyperparameter optimization. |
name property
name: str
Returns the ID of the hyperparameter optimization. The name property is kept for backward compatibility.
RETURNS | DESCRIPTION |
---|---|
str | The name of the hyperparameter optimization. |
score property
score: str
The score to be optimized.
RETURNS | DESCRIPTION |
---|---|
str | The score to be optimized. |
type property
type: str
The algorithm used for the hyperparameter optimization.
RETURNS | DESCRIPTION |
---|---|
str | The algorithm used for the hyperparameter optimization. |
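The read-only properties above can be inspected on any search object, for instance as in the sketch below. The printed values in the comments are assumptions about the engine's naming, not guaranteed output.

```python
# Sketch: inspecting the descriptive properties of a search.
print(latin_search.id)     # unique name of the search on the engine
print(latin_search.type)   # algorithm used, e.g. "LatinHypercubeSearch"
print(latin_search.score)  # the score being optimized
```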
clean_up
clean_up() -> None
Deletes all pipelines associated with the hyperparameter optimization except for the best pipeline.
fit
```python
fit(
    container: Union[Container, StarSchema, TimeSeries],
    train: str = "train",
    validation: str = "validation",
) -> _Hyperopt
```
Launches the hyperparameter optimization.
PARAMETER | DESCRIPTION |
---|---|
container | The data container used for the hyperparameter tuning. TYPE: `Union[Container, StarSchema, TimeSeries]` |
train | The name of the subset in `container` used for training. TYPE: `str` DEFAULT: `"train"` |
validation | The name of the subset in `container` used for validation. TYPE: `str` DEFAULT: `"validation"` |
RETURNS | DESCRIPTION |
---|---|
_Hyperopt | The current instance. |
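For illustration, a call matching this signature might look like the sketch below. It assumes the `latin_search` and tables from the example above, and that a `getml.data.Container` is built with subsets named after the defaults; the split shares and the `peripheral` keyword name are assumptions, not requirements of the API.

```python
# Sketch: container-based fit. The peripheral keyword names the table and
# is assumed here to match the peripheral in the pipeline's data model.
split = data.split.random(train=0.8, validation=0.2, test=0.0)

container = data.Container(population=population_table, split=split)
container.add(peripheral=peripheral_table)

latin_search.fit(
    container=container,
    train="train",            # subset used for training
    validation="validation",  # subset used for validation
)
```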
refresh
refresh() -> _Hyperopt
Reloads the hyperparameter optimization from the Engine.
RETURNS | DESCRIPTION |
---|---|
_Hyperopt | The current instance. |
validate
validate() -> None
Validate the parameters of the hyperparameter optimization.