getml.data.DataModel
DataModel(population: Union[Placeholder, str])
Abstract representation of the relationship between tables.
You might also want to refer to Placeholder.
ATTRIBUTE | DESCRIPTION |
---|---|
population | The placeholder representing the population table, which defines the statistical population and contains the targets. |
Example
This example constructs a data model in which the 'population_table' depends on the 'peripheral_table' via the 'join_key' column. In addition, only those rows in 'peripheral_table' for which 'time_stamp' is less than or equal to the 'time_stamp' in 'population_table' are considered:
dm = getml.data.DataModel(
    population_table.to_placeholder("POPULATION")
)
dm.add(peripheral_table.to_placeholder("PERIPHERAL"))
dm.POPULATION.join(
    dm.PERIPHERAL,
    on="join_key",
    time_stamps="time_stamp"
)
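The matching rule described for this example can be sketched in plain Python. This is only a conceptual illustration, not how getML implements the join: the `Row` class and the `matching_rows` helper are hypothetical names introduced here, and the time stamps are arbitrary numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical stand-in for one table row; not a getML type.
    join_key: str
    time_stamp: float
    value: float = 0.0


def matching_rows(population_row, peripheral_rows):
    """Return the peripheral rows considered for one population row.

    A row is kept if its join key is equal and its time stamp is
    less than or equal to the population row's time stamp.
    """
    return [
        p
        for p in peripheral_rows
        if p.join_key == population_row.join_key
        and p.time_stamp <= population_row.time_stamp
    ]


population_row = Row("a", 10.0)
peripheral_rows = [
    Row("a", 5.0, 1.0),   # same key, earlier time stamp: kept
    Row("a", 10.0, 2.0),  # same key, equal time stamp: kept
    Row("a", 11.0, 3.0),  # same key, later time stamp: dropped
    Row("b", 1.0, 4.0),   # different key: dropped
]
matches = matching_rows(population_row, peripheral_rows)
```

Only the first two peripheral rows survive, because they share the join key and do not lie in the future of the population row.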
To add more than one peripheral table at once, you can use to_placeholder:
dm = getml.data.DataModel(
    population_table.to_placeholder("POPULATION")
)
dm.add(
    getml.data.to_placeholder(
        PERIPHERAL1=peripheral_table_1,
        PERIPHERAL2=peripheral_table_2,
    )
)
If the relationship between two tables is many-to-one, you can state this explicitly in the join:
dm.POPULATION.join(
    dm.PERIPHERAL,
    on="join_key",
    time_stamps="time_stamp",
    relationship=getml.data.relationship.many_to_one,
)
Please also refer to relationship.
If the join keys or time stamps are named differently in the two tables, use a tuple:
dm.POPULATION.join(
    dm.PERIPHERAL,
    on=("join_key", "other_join_key"),
    time_stamps=("time_stamp", "other_time_stamp"),
)
To join on more than one join key, pass a list. Its entries can be plain names or tuples:
dm.POPULATION.join(
    dm.PERIPHERAL,
    on=["join_key1", "join_key2", ("join_key3", "other_join_key3")],
    time_stamps="time_stamp",
)
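The three shapes accepted for `on` in the examples above (a single name, a tuple of two names, or a list mixing both) can be read as shorthand for a list of column pairs. The sketch below makes that reading explicit. `normalize_on` is a hypothetical helper written for illustration and is not part of getML; it also assumes that the first tuple element names the column in the table being joined from and the second the column in the other table.

```python
from typing import List, Tuple, Union

OnItem = Union[str, Tuple[str, str]]
OnSpec = Union[OnItem, List[OnItem]]


def normalize_on(on: OnSpec) -> List[Tuple[str, str]]:
    """Expand an `on`-style argument into explicit (left, right) column pairs."""
    items = on if isinstance(on, list) else [on]
    pairs = []
    for item in items:
        if isinstance(item, str):
            # Same column name on both sides of the join.
            pairs.append((item, item))
        elif isinstance(item, tuple) and len(item) == 2:
            # Different column names in the two tables.
            pairs.append(item)
        else:
            raise TypeError(f"Unsupported join key specification: {item!r}")
    return pairs


single = normalize_on("join_key")
renamed = normalize_on(("join_key", "other_join_key"))
mixed = normalize_on(["join_key1", "join_key2", ("join_key3", "other_join_key3")])
```

Under this reading, a plain string is simply the special case in which both tables use the same column name.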
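To limit the scope of a join, set memory. Here, it is set to seven days:
dm.POPULATION.join(
    dm.PERIPHERAL,
    on="join_key",
    time_stamps="time_stamp",
    memory=getml.data.time.days(7),
)
To use lagged targets, set lagged_targets together with a horizon. Here, the horizon is one hour and the memory is seven days:
dm.POPULATION.join(
    dm.PERIPHERAL,
    on="join_key",
    time_stamps="time_stamp",
    lagged_targets=True,
    horizon=getml.data.time.hours(1),
    memory=getml.data.time.days(7),
)
Please also refer to time.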
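One way to picture horizon and memory is as the two ends of a time window measured back from the population time stamp. The sketch below is a conceptual illustration only. It assumes the window has the form `t_population - horizon - memory < t_peripheral <= t_population - horizon`; the exact boundary conditions getML uses are not stated on this page, so treat the strict and non-strict comparisons as assumptions. `in_window`, `HOUR` and `DAY` are hypothetical names, and time is expressed in seconds purely for the example.

```python
HOUR = 3600.0
DAY = 24 * HOUR


def in_window(t_population, t_peripheral, horizon=0.0, memory=None):
    """Check whether a peripheral time stamp falls inside the assumed join window."""
    upper = t_population - horizon
    if t_peripheral > upper:
        # Too recent: closer to the population time stamp than the horizon allows.
        return False
    if memory is None:
        # No memory set: arbitrarily old rows are still considered.
        return True
    # Too old if the row lies at or beyond the memory limit.
    return t_peripheral > upper - memory


t_pop = 10 * DAY
inside = in_window(t_pop, t_pop - 2 * HOUR, horizon=HOUR, memory=7 * DAY)
too_recent = in_window(t_pop, t_pop - 0.5 * HOUR, horizon=HOUR, memory=7 * DAY)
too_old = in_window(t_pop, t_pop - 8 * DAY, horizon=HOUR, memory=7 * DAY)
```

With a one-hour horizon and seven days of memory, a row from two hours ago is inside the window, a row from thirty minutes ago is too recent, and a row from eight days ago has been forgotten.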
If the join involves many matches, it might be a good idea to set the relationship to propositionalization. This forces the pipeline to always use a propositionalization algorithm for this join, which can significantly speed things up.
dm.POPULATION.join(
    dm.PERIPHERAL,
    on="join_key",
    time_stamps="time_stamp",
    relationship=getml.data.relationship.propositionalization,
)
Please also refer to relationship.
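For readers unfamiliar with the term, propositionalization generally means summarizing the variable number of matched rows per population row with a fixed set of aggregations. The sketch below shows that general idea in plain Python. It is not getML's algorithm, and the `propositionalize` helper and its choice of aggregations are assumptions made for illustration.

```python
from statistics import mean


def propositionalize(matched_values):
    """Summarize a variable-length list of matched values as fixed-length features."""
    if not matched_values:
        # No matches: return placeholders so every row has the same feature set.
        return {"count": 0, "sum": 0.0, "avg": None, "min": None, "max": None}
    return {
        "count": len(matched_values),
        "sum": float(sum(matched_values)),
        "avg": mean(matched_values),
        "min": min(matched_values),
        "max": max(matched_values),
    }


features = propositionalize([2.0, 4.0, 6.0])
empty_features = propositionalize([])
```

Because each aggregation is a single pass over the matches, this style of feature generation stays cheap even when a join produces many matches per row.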
More complicated data models sometimes require more than one placeholder on the same table. In this case, you can do something like this:
dm.add(
    getml.data.to_placeholder(
        PERIPHERAL=[peripheral_table]*2,
    )
)
# We can now access our two placeholders like this:
placeholder1 = dm.PERIPHERAL[0]
placeholder2 = dm.PERIPHERAL[1]
Source code in getml/data/data_model.py
names
property
add
add(*placeholders: Placeholder)
Adds peripheral placeholders to the data model.
PARAMETER | DESCRIPTION |
---|---|
placeholders | The placeholder or placeholders you would like to add. TYPE: Placeholder |
Source code in getml/data/data_model.py