getml.feature_learning.Multirel
dataclass
Multirel(
    aggregation: Iterable[MultirelAggregations] = MULTIREL.default,
    allow_sets: bool = True,
    delta_t: float = 0.0,
    grid_factor: float = 1.0,
    loss_function: Optional[Union[CrossEntropyLossType, SquareLossType]] = None,
    max_length: int = 4,
    min_df: int = 30,
    min_num_samples: int = 1,
    num_features: int = 100,
    num_subfeatures: int = 5,
    num_threads: int = 0,
    propositionalization: FastProp = FastProp(),
    regularization: float = 0.01,
    round_robin: bool = False,
    sampling_factor: float = 1.0,
    seed: int = 5543,
    share_aggregations: float = 0.0,
    share_conditions: float = 1.0,
    shrinkage: float = 0.0,
    silent: bool = True,
    vocab_size: int = 500,
)
Bases: _FeatureLearner
Feature learning based on Multi-Relational Decision Tree Learning.
Multirel automates feature learning for relational data and time series. It is based on an efficient variation of Multi-Relational Decision Tree Learning (MRDTL).
For more information on the underlying feature learning algorithm, check out the User guide: Multirel.
Enterprise edition
This feature is exclusive to the Enterprise edition and is not available in the Community edition. Discover the benefits of the Enterprise edition and compare the features of both editions.
For licensing information and technical support, please contact us.
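The following is a minimal sketch of configuring the feature learner, assuming the getml package is installed with an Enterprise license. The keyword names come from the signature above; the chosen values and the helper name `build_multirel` are illustrative assumptions, not recommendations.

```python
# Illustrative, non-default settings. The keyword names are taken from the
# documented signature; the values are assumptions chosen for demonstration.
multirel_params = {
    "num_features": 50,
    "min_num_samples": 100,
    "regularization": 0.05,
    "share_conditions": 0.8,
    "seed": 5543,
    "silent": False,
}


def build_multirel(params):
    """Hypothetical helper: construct a Multirel feature learner.

    getml is imported lazily so that this snippet can be loaded
    even when the package is not installed.
    """
    import getml

    return getml.feature_learning.Multirel(**params)
```

Calling `build_multirel(multirel_params)` should return a configured learner; any parameter left out of the dictionary keeps the default shown in the signature.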
ATTRIBUTE | DESCRIPTION |
---|---|
agg_sets |
A class variable holding the available aggregation sets for the
Multirel feature learner.
Value:
TYPE:
|
PARAMETER | DESCRIPTION |
---|---|
aggregation |
Mathematical operations used by the automated feature learning algorithm to create new features. Each must be an aggregation supported by the Multirel feature learner
(
TYPE:
|
allow_sets |
Multirel can summarize different categories into sets for producing conditions. When expressed as SQL statements these sets might look like this:
sampling_factor is too low.
TYPE:
|
delta_t |
Frequency with which lag variables will be explored in a time series setting. When set to 0.0, there will be no lag variables. For more information, please refer to Time Series in the User Guide. Range: [0, ∞)
TYPE:
|
grid_factor |
Multirel will try a grid of critical values for your
numerical features. A higher
TYPE:
|
loss_function |
Objective function used by the feature learning algorithm
to optimize your features. For regression problems use
TYPE:
|
max_length |
The maximum length a subcondition might have. Multirel will create conditions in the form
TYPE:
|
min_df |
Only relevant for columns with role
TYPE:
|
min_num_samples |
Determines the minimum number of samples a subcondition must apply to in order to be considered. Higher values lead to less complex statements and less danger of overfitting. Range: [1, ∞)
TYPE:
|
num_features |
Number of features generated by the feature learning algorithm. Range: [1, ∞)
TYPE:
|
num_subfeatures |
The number of subfeatures you would like to extract in a subensemble (for snowflake data models only). See Snowflake Schema for more information. Range: [1, ∞)
TYPE:
|
num_threads |
Number of threads used by the feature learning algorithm. If set to zero or a negative value, the number of threads will be determined automatically by the getML Engine. Range: [0, ∞)
TYPE:
|
propositionalization |
The feature learner used for joins which are flagged to be
propositionalized (by setting a join's |
regularization |
Most important regularization parameter for the quality of
the features produced by Multirel. Higher values will lead
to less complex features and less danger of overfitting. A
TYPE:
|
round_robin |
If True, Multirel picks a different
TYPE:
|
sampling_factor |
Multirel uses a bootstrapping procedure (sampling with replacement) to train each of the features. The sampling factor is proportional to the share of the samples randomly drawn from the population table every time Multirel generates a new feature. A lower sampling factor (but still greater than 0.0) will lead to less danger of overfitting, less complex statements, and faster training. When set to 1.0, roughly 20,000 samples are drawn from the population table. If the population table contains fewer than 20,000 samples, standard bagging is used. When set to 0.0, there will be no sampling at all. Range: [0, ∞)
TYPE:
|
seed |
Seed used for the random number generator that underlies
the sampling procedure to make the calculation
reproducible. Internally, a
TYPE:
|
share_aggregations |
Every time a new feature is generated, the
TYPE:
|
share_conditions |
Every time a new column is tested for applying conditions, it might be skipped at random. This parameter determines the probability that a column will not be skipped. Range: [0, 1]
TYPE:
|
shrinkage |
Since Multirel works using a gradient-boosting-like
algorithm,
TYPE:
|
silent |
Controls the logging during training.
TYPE:
|
vocab_size |
Determines the maximum number
of words that are extracted in total from
TYPE:
|
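The bootstrapping behaviour described for sampling_factor and seed can be sketched in plain Python. This is not the getML Engine's implementation: the helper name `draw_bootstrap_indices`, the use of Python's `random.Random`, and the way the factor scales the 20,000-row reference size are all assumptions made for illustration.

```python
import random

# The 20,000-row reference size comes from the sampling_factor description;
# how the factor scales it is an assumption of this sketch.
REFERENCE_SAMPLE_SIZE = 20_000


def draw_bootstrap_indices(num_rows, sampling_factor, seed=5543):
    """Return row indices of the population table (hypothetical helper).

    With sampling_factor == 0.0 there is no sampling: every row is used once.
    Otherwise, rows are drawn with replacement using a seeded generator.
    """
    if sampling_factor < 0.0:
        raise ValueError("sampling_factor must not be negative")
    if sampling_factor == 0.0 or num_rows == 0:
        return list(range(num_rows))
    rng = random.Random(seed)
    # Assumed scaling: the base is capped at the table size, so a small table
    # falls back to standard bagging (num_rows draws) at a factor of 1.0.
    base = min(REFERENCE_SAMPLE_SIZE, num_rows)
    num_draws = max(1, round(sampling_factor * base))
    return [rng.randrange(num_rows) for _ in range(num_draws)]
```

Under these assumptions, a factor below 1.0 draws fewer rows per feature, which is consistent with the faster training described above, and reusing the same seed reproduces the same draw.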
validate
Checks both the types and the values of all instance variables and raises an exception if something is off.
PARAMETER | DESCRIPTION |
---|---|
params |
A dictionary containing the parameters to validate. If none is passed, the instance's own parameters will be validated. |
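To illustrate the kind of value check described here, the sketch below tests a parameter dictionary against the ranges stated in the parameter table. It is a hypothetical stand-in, not the body of `validate`: the names `DOCUMENTED_RANGES` and `check_ranges` are invented for this example, and only parameters with an explicit range in the table are covered.

```python
# Lower and upper bounds copied from the "Range" notes in the parameter
# table; None means there is no upper bound.
DOCUMENTED_RANGES = {
    "delta_t": (0.0, None),
    "min_num_samples": (1, None),
    "num_features": (1, None),
    "num_subfeatures": (1, None),
    "sampling_factor": (0.0, None),
    "share_conditions": (0.0, 1.0),
}


def check_ranges(params):
    """Hypothetical check: raise if a value has the wrong type or range."""
    for name, (low, high) in DOCUMENTED_RANGES.items():
        if name not in params:
            continue
        value = params[name]
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be numeric, got {type(value).__name__}")
        if value < low or (high is not None and value > high):
            raise ValueError(f"{name}={value} is outside the documented range")
```

For example, `check_ranges({"share_conditions": 1.5})` raises a `ValueError`, while a dictionary of in-range values passes silently.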
Source code in getml/feature_learning/multirel.py