getml.data.Subset dataclass

Subset(
    container_id: str,
    peripheral: Dict[str, Union[DataFrame, View]],
    population: Union[DataFrame, View],
)

A Subset consists of a population table and one or several peripheral tables.

A Container, StarSchema or TimeSeries creates Subsets and passes them to the Pipeline.

ATTRIBUTE DESCRIPTION
container_id

The ID of the container the subset belongs to.

TYPE: str

peripheral

A dictionary containing the peripheral tables.

TYPE: Dict[str, Union[DataFrame, View]]

population

The population table.

TYPE: Union[DataFrame, View]
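
The three attributes above can be pictured with a simplified stand-in. The sketch below is not getML's implementation: SubsetSketch, the list-of-dicts tables and the ID "container-1" are hypothetical placeholders, chosen so the snippet runs without getML installed. It only illustrates that two subsets of the same container carry the same container_id and the same peripheral tables, and differ in their population table.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class SubsetSketch:
    """Illustrative stand-in for getml.data.Subset (not the real class)."""

    container_id: str
    peripheral: Dict[str, Any]  # real type: Dict[str, Union[DataFrame, View]]
    population: Any             # real type: Union[DataFrame, View]


# Plain lists of dicts stand in for getML DataFrames or Views.
population_train = [{"id": 1, "target": 0.5}]
population_test = [{"id": 2, "target": 1.5}]
peripheral = {"trans": [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 7.0}]}

# "container-1" is a made-up placeholder ID.
train = SubsetSketch("container-1", peripheral, population_train)
test = SubsetSketch("container-1", peripheral, population_test)

# Both subsets point at the same peripheral tables and the same container;
# only the population table differs.
print(sorted(train.peripheral))
print(train.container_id == test.container_id)
print(train.population is not test.population)
```

In getML itself you would normally not construct a Subset by hand; it is obtained from a Container, StarSchema or TimeSeries.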

Example
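import getml

# population_train, population_test, meta, order and trans are assumed
# to be existing getML DataFrames or Views, and my_pipeline an existing
# Pipeline.
container = getml.data.Container(
    train=population_train,
    test=population_test,
)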

container.add(
    meta=meta,
    order=order,
    trans=trans
)

# train and test are Subsets.
# They contain population_train
# and population_test respectively,
# as well as their peripheral tables
# meta, order and trans.
my_pipeline.fit(container.train)

my_pipeline.score(container.test)