getml.pipeline.Tables
This container holds a pipeline's tables. These tables are built from the columns for which importances can be calculated. The container exists to help determine which tables are more important than others.
Tables can be accessed by name, index or with a NumPy array. The container supports slicing and can be sorted and filtered. Further, the container holds global methods to request tables' importances.
PARAMETER | DESCRIPTION |
---|---|
targets | The targets associated with the pipeline. |
columns | The columns with which the tables are built. |
data | A list of |
Note
The container is an iterable. So, in addition to filter, you can also use Python list comprehensions for filtering.
Example
all_my_tables = my_pipeline.tables
first_table = my_pipeline.tables[0]
all_but_last_10_tables = my_pipeline.tables[:-10]
important_tables = [table for table in my_pipeline.tables if table.importance > 0.1]
names, importances = my_pipeline.tables.importances()
Source code in getml/pipeline/tables.py
names
property
targets
property
filter
Filters the tables container.
PARAMETER | DESCRIPTION |
---|---|
conditional | A callable that evaluates to a boolean for a given item. |
RETURNS | DESCRIPTION |
---|---|
Tables | A container of filtered tables. |
Example
important_tables = my_pipeline.tables.filter(lambda table: table.importance > 0.1)
peripheral_tables = my_pipeline.tables.filter(lambda table: table.marker == "[PERIPHERAL]")
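The filtering behavior described above can be illustrated without a running pipeline. The following is a minimal sketch, not getML's implementation: `StubTable`, `filter_tables`, and the sample data are hypothetical stand-ins that only mirror the attributes used in the examples (`importance` and `marker`).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StubTable:
    """Hypothetical stand-in for a table entry; not getML's Table class."""

    name: str
    importance: float
    marker: str


def filter_tables(
    tables: List[StubTable], conditional: Callable[[StubTable], bool]
) -> List[StubTable]:
    """Keep only the items for which `conditional` evaluates to True."""
    return [table for table in tables if conditional(table)]


# Made-up data for illustration only.
stub_tables = [
    StubTable("population", 0.55, "[POPULATION]"),
    StubTable("orders", 0.40, "[PERIPHERAL]"),
    StubTable("returns", 0.05, "[PERIPHERAL]"),
]

important = filter_tables(stub_tables, lambda table: table.importance > 0.1)
peripheral = filter_tables(stub_tables, lambda table: table.marker == "[PERIPHERAL]")

print([table.name for table in important])
print([table.name for table in peripheral])
```

Because the conditional is an ordinary callable, the same predicate can be reused in a list comprehension over the container.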
importances
Returns the importances of tables.
Table importances are calculated by summing up the importances of the columns belonging to the tables. Each column is assigned an importance value that measures its contribution to the predictive performance. For each target, the importances add up to 1.
PARAMETER | DESCRIPTION |
---|---|
target_num | Indicates for which target you want to view the importances. (Pipelines can have more than one target.) |
sort | Whether you want the results to be sorted. |
RETURNS | DESCRIPTION |
---|---|
NDArray[str_] | The first array contains the names of the tables. |
NDArray[float_] | The second array contains their importances. By definition, all importances add up to 1. |
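The aggregation described above, summing column importances per table for a single target, can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's code: `table_importances` and the sample triples are hypothetical, and sorting in descending order is an assumption of this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical (table, column, importance) triples for one target.
# The column importances are made up and chosen to add up to 1.
column_importances = [
    ("population", "age", 0.125),
    ("population", "income", 0.125),
    ("orders", "amount", 0.5),
    ("orders", "timestamp", 0.25),
]


def table_importances(
    columns: List[Tuple[str, str, float]], sort: bool = True
) -> Tuple[List[str], List[float]]:
    """Sum the column importances belonging to each table."""
    totals: Dict[str, float] = defaultdict(float)
    for table, _column, importance in columns:
        totals[table] += importance
    items = list(totals.items())
    if sort:
        # Assumption: most important table first.
        items.sort(key=lambda item: item[1], reverse=True)
    names = [name for name, _ in items]
    values = [value for _, value in items]
    return names, values


names, importances = table_importances(column_importances)
print(names, importances, sum(importances))
```

Since every column belongs to exactly one table in this sketch, the table importances inherit the property that they add up to 1 for the target.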
sort
sort(
by: Optional[str] = None,
key: Optional[Callable[[Table], Any]] = None,
descending: Optional[bool] = None,
) -> Tables
Sorts the Tables container. If no arguments are provided, the container is sorted by target and name.
PARAMETER | DESCRIPTION |
---|---|
by | The name of the field to sort by. Possible fields: name(s), importance(s). |
key | A callable that evaluates to a sort key for a given item. |
descending | Whether to sort in descending order. |
RETURNS | DESCRIPTION |
---|---|
Tables | A container of sorted tables. |
Example
by_importance = my_pipeline.tables.sort(key=lambda table: table.importance)
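The interplay of `key`, `descending`, and the default ordering can be sketched with Python's built-in `sorted`. This is a hedged illustration, not the library's implementation: `StubTable` and `sort_tables` are hypothetical, and the fallback key of (target, name) simply follows the description above.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class StubTable:
    """Hypothetical stand-in for a table entry; not getML's Table class."""

    name: str
    importance: float
    target: str


def sort_tables(
    tables: List[StubTable],
    key: Optional[Callable[[StubTable], Any]] = None,
    descending: bool = False,
) -> List[StubTable]:
    """Return a new sorted list; fall back to (target, name) without a key."""
    if key is None:
        key = lambda table: (table.target, table.name)
    return sorted(tables, key=key, reverse=descending)


# Made-up data for illustration only.
stub_tables = [
    StubTable("returns", 0.05, "churn"),
    StubTable("population", 0.55, "churn"),
    StubTable("orders", 0.40, "churn"),
]

default_order = sort_tables(stub_tables)
by_importance = sort_tables(
    stub_tables, key=lambda table: table.importance, descending=True
)

print([table.name for table in default_order])
print([table.name for table in by_importance])
```

Returning a new list instead of sorting in place mirrors the documented return value, a container of sorted tables, and leaves the original order untouched.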
to_pandas
to_pandas() -> DataFrame
Returns all information related to the tables in a pandas DataFrame.
RETURNS | DESCRIPTION |
---|---|
DataFrame | A pandas DataFrame containing the tables' names, importances, targets and markers. |