Propositionalization: Robot sensor data¶
In this notebook, we compare getML's FastProp against the well-known feature engineering libraries featuretools and tsfresh.
Summary:
- Prediction type: Regression
- Domain: Robotics
- Prediction target: The force vector on the robot's arm
- Population size: 15001
Background¶
A common approach to feature engineering is to generate attribute-value representations from relational data by applying a fixed set of aggregations to columns of interest and then performing feature selection on the (possibly large) set of generated features. In academia, this approach is called propositionalization.
getML's FastProp is an implementation of this propositionalization approach that has been optimized for speed and memory efficiency. In this notebook, we want to demonstrate how – well – fast FastProp is. To this end, we benchmark FastProp against the popular feature engineering libraries featuretools and tsfresh, both of which also rely on propositionalization.
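To make this concrete, here is a minimal sketch of propositionalization on a toy time series using plain pandas. The column names, window length, and set of aggregations are purely illustrative and are not what FastProp uses below.
import pandas as pd

# Toy series: one row per time step (illustrative column names).
toy = pd.DataFrame(
    {"sensor_a": [0.1, 0.4, 0.3, 0.8, 0.5], "sensor_b": [1.0, 0.9, 1.2, 1.1, 0.7]}
)

# Propositionalization: apply a fixed set of aggregations over a rolling
# window, producing one attribute-value row per prediction instance.
aggregations = ["mean", "min", "max", "std"]
features = pd.concat(
    [
        toy[col].rolling(window=3, min_periods=1).agg(agg).rename(f"{agg}({col})")
        for col in toy.columns
        for agg in aggregations
    ],
    axis=1,
)
A feature selection step would then prune this (possibly large) feature set down to those columns that actually help predict the target.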
The data set has been generously provided by Erik Berger who originally collected it for his dissertation:
Berger, E. (2018). Behavior-Specific Proprioception Models for Robotic Force Estimation: A Machine Learning Approach. Freiberg, Germany: Technische Universitaet Bergakademie Freiberg.
Analysis¶
We begin by importing the libraries and setting the project.
%pip install -q "getml==1.5.0" "featuretools==1.31.0" "tsfresh==0.20.3"
Note: you may need to restart the kernel to use updated packages.
import os
import sys
os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"
from pathlib import Path
import numpy as np
import pandas as pd
import getml
print(f"getML API version: {getml.__version__}\n")
getML API version: 1.5.0
getml.engine.launch(allow_remote_ips=True, token="token")
getml.engine.set_project("robot")
getML Engine is already running.
Connected to project 'robot'.
# If we are in Colab, we need to fetch the utils folder from the repository
if os.getenv("COLAB_RELEASE_TAG"):
!curl -L https://api.github.com/repos/getml/getml-demo/tarball/master | tar --wildcards --strip-components=1 -xz '*utils*'
parent = Path(os.getcwd()).parent.as_posix()
if parent not in sys.path:
sys.path.append(parent)
from utils import Benchmark, FTTimeSeriesBuilder, TSFreshBuilder
1. Loading data¶
1.1 Download from source¶
data_all = getml.data.DataFrame.from_csv(
"https://static.getml.com/datasets/robotarm/robot-demo.csv", "data_all"
)
Downloading https://static.getml.com/datasets/robotarm/robot-demo.csv to /tmp/getml/static.getml.com/datasets/robotarm/robot-demo.csv...
Downloading robot-demo.csv... ━━━━━━━━━━━━━━━━━━━━ 100% • 14.7/14.7 MB • 00:00
data_all
name | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | f_x | f_y | f_z |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
role | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float |
0 | 3.4098 | -0.3274 | 0.9604 | -3.7436 | -1.0191 | -6.0205 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8.38e-17 | -4.8116 | -1.4033 | -0.1369 | 0.002472 | 0 | 9.803e-16 | -55.642 | -16.312 | -1.2042 | 0.02167 | 0 | 3.4098 | -0.3274 | 0.9605 | -3.7437 | -1.0191 | -6.0205 | 0 | 0 | 0 | 0 | 0 | 0 | 0.1233 | -6.5483 | -2.8045 | -0.8296 | 0.07625 | -0.1906 | 0.1211 | -6.5483 | -2.8157 | -0.8281 | 0.07015 | -0.1983 | 0.7699 | 0.41 | 0.08279 | -1.4094 | 0.786 | -0.3682 | 0 | 0 | 0 | 0 | 0 | 0 | -22.654 | -11.503 | -18.673 | -3.5155 | 5.8354 | -2.05 | 0.7699 | 0.41 | 0.08278 | -1.4094 | 0.786 | -0.3681 | 0 | 0 | 0 | 0 | 0 | 0 | 48.069 | 48.009 | 0.9668 | 47.834 | 47.925 | 47.818 | 47.834 | 47.955 | 47.971 | -11.03 | 6.9 | -7.33 |
1 | 3.4098 | -0.3274 | 0.9604 | -3.7436 | -1.0191 | -6.0205 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8.38e-17 | -4.8116 | -1.4033 | -0.1369 | 0.002472 | 0 | 9.803e-16 | -55.642 | -16.312 | -1.2042 | 0.02167 | 0 | 3.4098 | -0.3274 | 0.9604 | -3.7437 | -1.0191 | -6.0205 | 0 | 0 | 0 | 0 | 0 | 0 | 0.1188 | -6.5506 | -2.8404 | -0.8281 | 0.06405 | -0.1998 | 0.1211 | -6.5483 | -2.8157 | -0.8281 | 0.07015 | -0.1983 | 0.7699 | 0.41 | 0.0828 | -1.4094 | 0.7859 | -0.3682 | 0 | 0 | 0 | 0 | 0 | 0 | -21.627 | -11.046 | -18.66 | -3.5395 | 5.7577 | -1.9805 | 0.7699 | 0.41 | 0.08278 | -1.4094 | 0.786 | -0.3681 | 0 | 0 | 0 | 0 | 0 | 0 | 48.009 | 48.009 | 0.8594 | 47.834 | 47.925 | 47.818 | 47.834 | 47.955 | 47.971 | -10.848 | 6.7218 | -7.4427 |
2 | 3.4098 | -0.3274 | 0.9604 | -3.7436 | -1.0191 | -6.0205 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8.38e-17 | -4.8116 | -1.4033 | -0.1369 | 0.002472 | 0 | 9.803e-16 | -55.642 | -16.312 | -1.2042 | 0.02167 | 0 | 3.4098 | -0.3274 | 0.9605 | -3.7437 | -1.0191 | -6.0205 | 0 | 0 | 0 | 0 | 0 | 0 | 0.1099 | -6.5438 | -2.8 | -0.8205 | 0.07473 | -0.183 | 0.1211 | -6.5483 | -2.8157 | -0.8281 | 0.07015 | -0.1922 | 0.7699 | 0.41 | 0.08279 | -1.4094 | 0.7859 | -0.3682 | 0 | 0 | 0 | 0 | 0 | 0 | -23.843 | -12.127 | -18.393 | -3.6453 | 5.978 | -1.9978 | 0.7699 | 0.41 | 0.08278 | -1.4094 | 0.786 | -0.3681 | 0 | 0 | 0 | 0 | 0 | 0 | 48.009 | 48.069 | 0.931 | 47.879 | 47.925 | 47.818 | 47.834 | 47.955 | 47.971 | -10.666 | 6.5436 | -7.5555 |
3 | 3.4098 | -0.3274 | 0.9604 | -3.7436 | -1.0191 | -6.0205 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8.38e-17 | -4.8116 | -1.4033 | -0.1369 | 0.002472 | 0 | 9.803e-16 | -55.642 | -16.312 | -1.2042 | 0.02167 | 0 | 3.4098 | -0.3273 | 0.9604 | -3.7437 | -1.0191 | -6.0205 | 0 | 0 | 0 | 0 | 0 | 0 | 0.1233 | -6.5483 | -2.8224 | -0.8266 | 0.07168 | -0.1998 | 0.1211 | -6.5483 | -2.8157 | -0.8281 | 0.07015 | -0.1967 | 0.7699 | 0.41 | 0.08275 | -1.4094 | 0.786 | -0.3681 | 0 | 0 | 0 | 0 | 0 | 0 | -21.772 | -10.872 | -18.691 | -3.5512 | 5.6648 | -1.9976 | 0.7699 | 0.41 | 0.08278 | -1.4094 | 0.786 | -0.3681 | 0 | 0 | 0 | 0 | 0 | 0 | 48.069 | 48.069 | 0.931 | 47.879 | 47.925 | 47.818 | 47.834 | 47.955 | 47.971 | -10.507 | 6.4533 | -7.65 |
4 | 3.4098 | -0.3274 | 0.9604 | -3.7436 | -1.0191 | -6.0205 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8.38e-17 | -4.8116 | -1.4033 | -0.1369 | 0.002472 | 0 | 9.803e-16 | -55.642 | -16.312 | -1.2042 | 0.02167 | 0 | 3.4098 | -0.3274 | 0.9604 | -3.7437 | -1.0191 | -6.0205 | 0 | 0 | 0 | 0 | 0 | 0 | 0.1255 | -6.5394 | -2.8 | -0.8327 | 0.07473 | -0.1952 | 0.1211 | -6.5483 | -2.8157 | -0.8327 | 0.07015 | -0.1922 | 0.7699 | 0.41 | 0.08278 | -1.4094 | 0.786 | -0.3681 | 0 | 0 | 0 | 0 | 0 | 0 | -22.823 | -11.645 | -18.524 | -3.5305 | 5.8712 | -2.0096 | 0.7699 | 0.41 | 0.08278 | -1.4094 | 0.786 | -0.3681 | 0 | 0 | 0 | 0 | 0 | 0 | 48.069 | 48.069 | 0.8952 | 47.879 | 47.925 | 47.818 | 47.834 | 47.955 | 47.971 | -10.413 | 6.6267 | -7.69 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
14996 | 3.0837 | -0.8836 | 1.4501 | -2.2102 | -1.559 | -5.3265 | -0.03151 | -0.05375 | 0.04732 | 0.1482 | -0.05218 | 0.06706 | 0.2969 | 0.5065 | -0.4459 | -1.3963 | 0.4916 | -0.6319 | -0.3694 | -4.1879 | -1.1847 | -0.09441 | -0.1568 | 0.1898 | 1.1605 | -42.951 | -19.023 | -2.6343 | 0.1551 | -0.1338 | 3.0836 | -0.8836 | 1.4503 | -2.2101 | -1.5591 | -5.3263 | -0.03347 | -0.05585 | 0.04805 | 0.151 | -0.05513 | 0.07114 | -0.3564 | -6.0394 | -2.3001 | -0.2181 | -0.1159 | 0.09608 | -0.3632 | -6.0394 | -2.3023 | -0.212 | -0.125 | 0.1113 | 0.7116 | 0.06957 | 0.06036 | -0.8506 | 2.9515 | -0.03352 | -0.03558 | -0.03029 | 0.002444 | -0.04208 | 0.1458 | -0.1098 | -0.8784 | -0.07291 | -37.584 | 0.0001132 | -2.1031 | 0.03318 | 0.7117 | 0.0697 | 0.06044 | -0.8511 | 2.951 | -0.03356 | -0.03508 | -0.02849 | 0.001571 | -0.03951 | 0.1442 | -0.1036 | 48.069 | 48.009 | 0.8952 | 47.818 | 47.834 | 47.818 | 47.803 | 47.94 | 47.94 | 10.84 | -1.41 | 16.14 |
14997 | 3.0835 | -0.884 | 1.4505 | -2.2091 | -1.5594 | -5.326 | -0.02913 | -0.0497 | 0.04376 | 0.137 | -0.04825 | 0.062 | 0.2969 | 0.5065 | -0.4459 | -1.3963 | 0.4916 | -0.6319 | -0.3677 | -4.1837 | -1.1874 | -0.09682 | -0.1562 | 0.189 | 1.1592 | -42.937 | -19.023 | -2.6331 | 0.1545 | -0.1338 | 3.0833 | -0.8841 | 1.4507 | -2.209 | -1.5596 | -5.3258 | -0.02909 | -0.04989 | 0.04198 | 0.1481 | -0.05465 | 0.06249 | -0.3161 | -6.1179 | -2.253 | -0.3752 | -0.03965 | 0.08693 | -0.3273 | -6.1022 | -2.2597 | -0.366 | -0.05033 | 0.0915 | 0.7114 | 0.06932 | 0.06039 | -0.8497 | 2.953 | -0.03359 | -0.0335 | -0.02723 | 0.001208 | -0.04242 | 0.1428 | -0.0967 | -2.7137 | 0.8552 | -38.514 | -0.6088 | -3.2383 | -0.9666 | 0.7114 | 0.06948 | 0.06045 | -0.8503 | 2.9525 | -0.03359 | -0.03246 | -0.02633 | 0.001469 | -0.03657 | 0.1333 | -0.09571 | 48.009 | 48.009 | 0.8594 | 47.818 | 47.834 | 47.818 | 47.803 | 47.94 | 47.94 | 10.857 | -1.52 | 15.943 |
14998 | 3.0833 | -0.8844 | 1.4508 | -2.208 | -1.5598 | -5.3256 | -0.02676 | -0.04565 | 0.04019 | 0.1258 | -0.04431 | 0.05695 | 0.2969 | 0.5065 | -0.4459 | -1.3963 | 0.4916 | -0.6319 | -0.3659 | -4.1797 | -1.1901 | -0.09922 | -0.1555 | 0.1881 | 1.1579 | -42.924 | -19.023 | -2.6321 | 0.154 | -0.1338 | 3.0831 | -0.8844 | 1.451 | -2.2078 | -1.56 | -5.3253 | -0.02776 | -0.04382 | 0.03652 | 0.1295 | -0.05064 | 0.04818 | -0.343 | -6.2569 | -2.1566 | -0.3035 | 0.00305 | 0.1434 | -0.3385 | -6.2322 | -2.1589 | -0.302 | -0.00915 | 0.1571 | 0.7111 | 0.06912 | 0.06039 | -0.849 | 2.9544 | -0.0337 | -0.02911 | -0.02589 | 0.001292 | -0.04046 | 0.1246 | -0.08058 | 4.2749 | 1.0128 | -36.412 | -1.2811 | -0.4296 | -1.1013 | 0.7112 | 0.06928 | 0.06046 | -0.8495 | 2.9538 | -0.03362 | -0.02984 | -0.02417 | 0.001364 | -0.03362 | 0.1224 | -0.08786 | 48.009 | 48.009 | 0.931 | 47.818 | 47.834 | 47.818 | 47.803 | 47.94 | 47.94 | 10.89 | -1.74 | 15.55 |
14999 | 3.0831 | -0.8847 | 1.4511 | -2.2071 | -1.5602 | -5.3251 | -0.02438 | -0.0416 | 0.03662 | 0.1147 | -0.04038 | 0.0519 | 0.2969 | 0.5065 | -0.4459 | -1.3963 | 0.4916 | -0.6319 | -0.3642 | -4.1758 | -1.1928 | -0.1016 | -0.1548 | 0.1873 | 1.1568 | -42.912 | -19.023 | -2.6311 | 0.1535 | -0.1338 | 3.0829 | -0.8848 | 1.4513 | -2.2068 | -1.5604 | -5.3249 | -0.02149 | -0.04059 | 0.03417 | 0.1202 | -0.0395 | 0.04178 | -0.4237 | -6.2703 | -2.0939 | -0.302 | -0.01372 | 0.1739 | -0.4125 | -6.2569 | -2.0916 | -0.2943 | -0.02898 | 0.1891 | 0.7109 | 0.06894 | 0.06039 | -0.8484 | 2.9557 | -0.03384 | -0.02738 | -0.01982 | 0.001031 | -0.03028 | 0.1157 | -0.06702 | 11.518 | 1.5002 | -39.314 | -1.8671 | -0.3734 | -0.5733 | 0.7109 | 0.06909 | 0.06047 | -0.8488 | 2.955 | -0.03364 | -0.02721 | -0.02201 | 0.001255 | -0.03067 | 0.1115 | -0.08003 | 48.009 | 48.009 | 0.931 | 47.818 | 47.834 | 47.818 | 47.803 | 47.94 | 47.94 | 11.29 | -1.4601 | 15.743 |
15000 | 3.0829 | -0.885 | 1.4514 | -2.2062 | -1.5605 | -5.3247 | -0.02201 | -0.03755 | 0.03305 | 0.1035 | -0.03645 | 0.04684 | 0.2969 | 0.5065 | -0.4459 | -1.3963 | 0.4916 | -0.6319 | -0.3624 | -4.172 | -1.1955 | -0.1041 | -0.1542 | 0.1864 | 1.1558 | -42.901 | -19.023 | -2.6302 | 0.1531 | -0.1338 | 3.0827 | -0.8851 | 1.4516 | -2.2059 | -1.5607 | -5.3246 | -0.02096 | -0.03808 | 0.02958 | 0.1171 | -0.03289 | 0.03883 | -0.417 | -6.2434 | -2.058 | -0.4102 | -0.04728 | 0.1967 | -0.4237 | -6.2367 | -2.0714 | -0.4163 | -0.0671 | 0.2059 | 0.7107 | 0.06878 | 0.06041 | -0.8478 | 2.9567 | -0.03382 | -0.02535 | -0.01854 | 0.001614 | -0.02421 | 0.11 | -0.06304 | 15.099 | 2.936 | -39.068 | -1.9402 | 0.139 | -0.2674 | 0.7107 | 0.06893 | 0.06048 | -0.8482 | 2.9561 | -0.03367 | -0.02458 | -0.01986 | 0.001142 | -0.0277 | 0.1007 | -0.07221 | 48.009 | 48.069 | 0.8952 | 47.818 | 47.834 | 47.818 | 47.803 | 47.94 | 47.955 | 11.69 | -1.1801 | 15.937 |
15001 rows x 96 columns
memory usage: 11.52 MB
name: data_all
type: getml.DataFrame
1.2 Prepare data¶
The force vector consists of three components (f_x, f_y, and f_z), meaning that we have three targets. For this comparison, we only predict the first component (f_x).
We also want to speed things up a little, so we only use 10 columns. A previous analysis revealed that these 10 columns carry most of the predictive power:
only_use = ["30", "34", "37", "38", "4", "59", "61", "7", "77", "78"]
data_all.set_role(["f_x"], getml.data.roles.target)
data_all.set_role(only_use, getml.data.roles.numerical)
This is what the data set looks like:
data_all
name | f_x | 30 | 34 | 37 | 38 | 4 | 59 | 61 | 7 | 77 | 78 | 3 | 5 | 6 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 31 | 32 | 33 | 35 | 36 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 60 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | f_y | f_z |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
role | target | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float |
0 | -11.03 | -1.2042 | -0.3274 | -1.0191 | -6.0205 | -0.3274 | 0.08279 | 0.786 | -1.0191 | 0.08278 | -1.4094 | 3.4098 | 0.9604 | -3.7436 | -6.0205 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8.38e-17 | -4.8116 | -1.4033 | -0.1369 | 0.002472 | 0 | 9.803e-16 | -55.642 | -16.312 | 0.02167 | 0 | 3.4098 | 0.9605 | -3.7437 | 0 | 0 | 0 | 0 | 0 | 0 | 0.1233 | -6.5483 | -2.8045 | -0.8296 | 0.07625 | -0.1906 | 0.1211 | -6.5483 | -2.8157 | -0.8281 | 0.07015 | -0.1983 | 0.7699 | 0.41 | -1.4094 | -0.3682 | 0 | 0 | 0 | 0 | 0 | 0 | -22.654 | -11.503 | -18.673 | -3.5155 | 5.8354 | -2.05 | 0.7699 | 0.41 | 0.786 | -0.3681 | 0 | 0 | 0 | 0 | 0 | 0 | 48.069 | 48.009 | 0.9668 | 47.834 | 47.925 | 47.818 | 47.834 | 47.955 | 47.971 | 6.9 | -7.33 |
1 | -10.848 | -1.2042 | -0.3274 | -1.0191 | -6.0205 | -0.3274 | 0.0828 | 0.7859 | -1.0191 | 0.08278 | -1.4094 | 3.4098 | 0.9604 | -3.7436 | -6.0205 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8.38e-17 | -4.8116 | -1.4033 | -0.1369 | 0.002472 | 0 | 9.803e-16 | -55.642 | -16.312 | 0.02167 | 0 | 3.4098 | 0.9604 | -3.7437 | 0 | 0 | 0 | 0 | 0 | 0 | 0.1188 | -6.5506 | -2.8404 | -0.8281 | 0.06405 | -0.1998 | 0.1211 | -6.5483 | -2.8157 | -0.8281 | 0.07015 | -0.1983 | 0.7699 | 0.41 | -1.4094 | -0.3682 | 0 | 0 | 0 | 0 | 0 | 0 | -21.627 | -11.046 | -18.66 | -3.5395 | 5.7577 | -1.9805 | 0.7699 | 0.41 | 0.786 | -0.3681 | 0 | 0 | 0 | 0 | 0 | 0 | 48.009 | 48.009 | 0.8594 | 47.834 | 47.925 | 47.818 | 47.834 | 47.955 | 47.971 | 6.7218 | -7.4427 |
2 | -10.666 | -1.2042 | -0.3274 | -1.0191 | -6.0205 | -0.3274 | 0.08279 | 0.7859 | -1.0191 | 0.08278 | -1.4094 | 3.4098 | 0.9604 | -3.7436 | -6.0205 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8.38e-17 | -4.8116 | -1.4033 | -0.1369 | 0.002472 | 0 | 9.803e-16 | -55.642 | -16.312 | 0.02167 | 0 | 3.4098 | 0.9605 | -3.7437 | 0 | 0 | 0 | 0 | 0 | 0 | 0.1099 | -6.5438 | -2.8 | -0.8205 | 0.07473 | -0.183 | 0.1211 | -6.5483 | -2.8157 | -0.8281 | 0.07015 | -0.1922 | 0.7699 | 0.41 | -1.4094 | -0.3682 | 0 | 0 | 0 | 0 | 0 | 0 | -23.843 | -12.127 | -18.393 | -3.6453 | 5.978 | -1.9978 | 0.7699 | 0.41 | 0.786 | -0.3681 | 0 | 0 | 0 | 0 | 0 | 0 | 48.009 | 48.069 | 0.931 | 47.879 | 47.925 | 47.818 | 47.834 | 47.955 | 47.971 | 6.5436 | -7.5555 |
3 | -10.507 | -1.2042 | -0.3273 | -1.0191 | -6.0205 | -0.3274 | 0.08275 | 0.786 | -1.0191 | 0.08278 | -1.4094 | 3.4098 | 0.9604 | -3.7436 | -6.0205 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8.38e-17 | -4.8116 | -1.4033 | -0.1369 | 0.002472 | 0 | 9.803e-16 | -55.642 | -16.312 | 0.02167 | 0 | 3.4098 | 0.9604 | -3.7437 | 0 | 0 | 0 | 0 | 0 | 0 | 0.1233 | -6.5483 | -2.8224 | -0.8266 | 0.07168 | -0.1998 | 0.1211 | -6.5483 | -2.8157 | -0.8281 | 0.07015 | -0.1967 | 0.7699 | 0.41 | -1.4094 | -0.3681 | 0 | 0 | 0 | 0 | 0 | 0 | -21.772 | -10.872 | -18.691 | -3.5512 | 5.6648 | -1.9976 | 0.7699 | 0.41 | 0.786 | -0.3681 | 0 | 0 | 0 | 0 | 0 | 0 | 48.069 | 48.069 | 0.931 | 47.879 | 47.925 | 47.818 | 47.834 | 47.955 | 47.971 | 6.4533 | -7.65 |
4 | -10.413 | -1.2042 | -0.3274 | -1.0191 | -6.0205 | -0.3274 | 0.08278 | 0.786 | -1.0191 | 0.08278 | -1.4094 | 3.4098 | 0.9604 | -3.7436 | -6.0205 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8.38e-17 | -4.8116 | -1.4033 | -0.1369 | 0.002472 | 0 | 9.803e-16 | -55.642 | -16.312 | 0.02167 | 0 | 3.4098 | 0.9604 | -3.7437 | 0 | 0 | 0 | 0 | 0 | 0 | 0.1255 | -6.5394 | -2.8 | -0.8327 | 0.07473 | -0.1952 | 0.1211 | -6.5483 | -2.8157 | -0.8327 | 0.07015 | -0.1922 | 0.7699 | 0.41 | -1.4094 | -0.3681 | 0 | 0 | 0 | 0 | 0 | 0 | -22.823 | -11.645 | -18.524 | -3.5305 | 5.8712 | -2.0096 | 0.7699 | 0.41 | 0.786 | -0.3681 | 0 | 0 | 0 | 0 | 0 | 0 | 48.069 | 48.069 | 0.8952 | 47.879 | 47.925 | 47.818 | 47.834 | 47.955 | 47.971 | 6.6267 | -7.69 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
14996 | 10.84 | -2.6343 | -0.8836 | -1.5591 | -5.3263 | -0.8836 | 0.06036 | 2.9515 | -1.559 | 0.06044 | -0.8511 | 3.0837 | 1.4501 | -2.2102 | -5.3265 | -0.03151 | -0.05375 | 0.04732 | 0.1482 | -0.05218 | 0.06706 | 0.2969 | 0.5065 | -0.4459 | -1.3963 | 0.4916 | -0.6319 | -0.3694 | -4.1879 | -1.1847 | -0.09441 | -0.1568 | 0.1898 | 1.1605 | -42.951 | -19.023 | 0.1551 | -0.1338 | 3.0836 | 1.4503 | -2.2101 | -0.03347 | -0.05585 | 0.04805 | 0.151 | -0.05513 | 0.07114 | -0.3564 | -6.0394 | -2.3001 | -0.2181 | -0.1159 | 0.09608 | -0.3632 | -6.0394 | -2.3023 | -0.212 | -0.125 | 0.1113 | 0.7116 | 0.06957 | -0.8506 | -0.03352 | -0.03558 | -0.03029 | 0.002444 | -0.04208 | 0.1458 | -0.1098 | -0.8784 | -0.07291 | -37.584 | 0.0001132 | -2.1031 | 0.03318 | 0.7117 | 0.0697 | 2.951 | -0.03356 | -0.03508 | -0.02849 | 0.001571 | -0.03951 | 0.1442 | -0.1036 | 48.069 | 48.009 | 0.8952 | 47.818 | 47.834 | 47.818 | 47.803 | 47.94 | 47.94 | -1.41 | 16.14 |
14997 | 10.857 | -2.6331 | -0.8841 | -1.5596 | -5.3258 | -0.884 | 0.06039 | 2.953 | -1.5594 | 0.06045 | -0.8503 | 3.0835 | 1.4505 | -2.2091 | -5.326 | -0.02913 | -0.0497 | 0.04376 | 0.137 | -0.04825 | 0.062 | 0.2969 | 0.5065 | -0.4459 | -1.3963 | 0.4916 | -0.6319 | -0.3677 | -4.1837 | -1.1874 | -0.09682 | -0.1562 | 0.189 | 1.1592 | -42.937 | -19.023 | 0.1545 | -0.1338 | 3.0833 | 1.4507 | -2.209 | -0.02909 | -0.04989 | 0.04198 | 0.1481 | -0.05465 | 0.06249 | -0.3161 | -6.1179 | -2.253 | -0.3752 | -0.03965 | 0.08693 | -0.3273 | -6.1022 | -2.2597 | -0.366 | -0.05033 | 0.0915 | 0.7114 | 0.06932 | -0.8497 | -0.03359 | -0.0335 | -0.02723 | 0.001208 | -0.04242 | 0.1428 | -0.0967 | -2.7137 | 0.8552 | -38.514 | -0.6088 | -3.2383 | -0.9666 | 0.7114 | 0.06948 | 2.9525 | -0.03359 | -0.03246 | -0.02633 | 0.001469 | -0.03657 | 0.1333 | -0.09571 | 48.009 | 48.009 | 0.8594 | 47.818 | 47.834 | 47.818 | 47.803 | 47.94 | 47.94 | -1.52 | 15.943 |
14998 | 10.89 | -2.6321 | -0.8844 | -1.56 | -5.3253 | -0.8844 | 0.06039 | 2.9544 | -1.5598 | 0.06046 | -0.8495 | 3.0833 | 1.4508 | -2.208 | -5.3256 | -0.02676 | -0.04565 | 0.04019 | 0.1258 | -0.04431 | 0.05695 | 0.2969 | 0.5065 | -0.4459 | -1.3963 | 0.4916 | -0.6319 | -0.3659 | -4.1797 | -1.1901 | -0.09922 | -0.1555 | 0.1881 | 1.1579 | -42.924 | -19.023 | 0.154 | -0.1338 | 3.0831 | 1.451 | -2.2078 | -0.02776 | -0.04382 | 0.03652 | 0.1295 | -0.05064 | 0.04818 | -0.343 | -6.2569 | -2.1566 | -0.3035 | 0.00305 | 0.1434 | -0.3385 | -6.2322 | -2.1589 | -0.302 | -0.00915 | 0.1571 | 0.7111 | 0.06912 | -0.849 | -0.0337 | -0.02911 | -0.02589 | 0.001292 | -0.04046 | 0.1246 | -0.08058 | 4.2749 | 1.0128 | -36.412 | -1.2811 | -0.4296 | -1.1013 | 0.7112 | 0.06928 | 2.9538 | -0.03362 | -0.02984 | -0.02417 | 0.001364 | -0.03362 | 0.1224 | -0.08786 | 48.009 | 48.009 | 0.931 | 47.818 | 47.834 | 47.818 | 47.803 | 47.94 | 47.94 | -1.74 | 15.55 |
14999 | 11.29 | -2.6311 | -0.8848 | -1.5604 | -5.3249 | -0.8847 | 0.06039 | 2.9557 | -1.5602 | 0.06047 | -0.8488 | 3.0831 | 1.4511 | -2.2071 | -5.3251 | -0.02438 | -0.0416 | 0.03662 | 0.1147 | -0.04038 | 0.0519 | 0.2969 | 0.5065 | -0.4459 | -1.3963 | 0.4916 | -0.6319 | -0.3642 | -4.1758 | -1.1928 | -0.1016 | -0.1548 | 0.1873 | 1.1568 | -42.912 | -19.023 | 0.1535 | -0.1338 | 3.0829 | 1.4513 | -2.2068 | -0.02149 | -0.04059 | 0.03417 | 0.1202 | -0.0395 | 0.04178 | -0.4237 | -6.2703 | -2.0939 | -0.302 | -0.01372 | 0.1739 | -0.4125 | -6.2569 | -2.0916 | -0.2943 | -0.02898 | 0.1891 | 0.7109 | 0.06894 | -0.8484 | -0.03384 | -0.02738 | -0.01982 | 0.001031 | -0.03028 | 0.1157 | -0.06702 | 11.518 | 1.5002 | -39.314 | -1.8671 | -0.3734 | -0.5733 | 0.7109 | 0.06909 | 2.955 | -0.03364 | -0.02721 | -0.02201 | 0.001255 | -0.03067 | 0.1115 | -0.08003 | 48.009 | 48.009 | 0.931 | 47.818 | 47.834 | 47.818 | 47.803 | 47.94 | 47.94 | -1.4601 | 15.743 |
15000 | 11.69 | -2.6302 | -0.8851 | -1.5607 | -5.3246 | -0.885 | 0.06041 | 2.9567 | -1.5605 | 0.06048 | -0.8482 | 3.0829 | 1.4514 | -2.2062 | -5.3247 | -0.02201 | -0.03755 | 0.03305 | 0.1035 | -0.03645 | 0.04684 | 0.2969 | 0.5065 | -0.4459 | -1.3963 | 0.4916 | -0.6319 | -0.3624 | -4.172 | -1.1955 | -0.1041 | -0.1542 | 0.1864 | 1.1558 | -42.901 | -19.023 | 0.1531 | -0.1338 | 3.0827 | 1.4516 | -2.2059 | -0.02096 | -0.03808 | 0.02958 | 0.1171 | -0.03289 | 0.03883 | -0.417 | -6.2434 | -2.058 | -0.4102 | -0.04728 | 0.1967 | -0.4237 | -6.2367 | -2.0714 | -0.4163 | -0.0671 | 0.2059 | 0.7107 | 0.06878 | -0.8478 | -0.03382 | -0.02535 | -0.01854 | 0.001614 | -0.02421 | 0.11 | -0.06304 | 15.099 | 2.936 | -39.068 | -1.9402 | 0.139 | -0.2674 | 0.7107 | 0.06893 | 2.9561 | -0.03367 | -0.02458 | -0.01986 | 0.001142 | -0.0277 | 0.1007 | -0.07221 | 48.009 | 48.069 | 0.8952 | 47.818 | 47.834 | 47.818 | 47.803 | 47.94 | 47.955 | -1.1801 | 15.937 |
15001 rows x 96 columns
memory usage: 11.52 MB
name: data_all
type: getml.DataFrame
1.3 Separate data into a training and testing set¶
We separate the data set into a training set and a testing set, using the first 10,500 measurements for training and the remainder for testing.
split = getml.data.split.time(data_all, "rowid", test=10500)
split
0 | train |
---|---|
1 | train |
2 | train |
3 | train |
4 | train |
... |
unknown number of rows
type: StringColumnView
2. Predictive modeling¶
First, we define the data model. The TimeSeries abstraction joins the data set onto itself, so that every row can aggregate over the 30 preceding measurements (memory=30); the row ID serves as the time stamp, and we do not allow lagged targets (lagged_targets=False).
time_series = getml.data.TimeSeries(
population=data_all,
split=split,
time_stamps="rowid",
lagged_targets=False,
memory=30,
)
time_series
data frames | staging table | |
---|---|---|
0 | population | POPULATION__STAGING_TABLE_1 |
1 | data_all | DATA_ALL__STAGING_TABLE_2 |
subset | name | rows | type | |
---|---|---|---|---|
0 | test | data_all | unknown | View |
1 | train | data_all | unknown | View |
name | rows | type | |
---|---|---|---|
0 | data_all | 15001 | View |
2.1 Propositionalization with FastProp¶
We set up a FastProp feature learner with a squared loss, wrap it in a pipeline built on the data model defined above, and then generate features for the training and testing sets.
fast_prop = getml.feature_learning.FastProp(
loss_function=getml.feature_learning.loss_functions.SquareLoss,
)
pipe_fp_fl = getml.pipeline.Pipeline(
data_model=time_series.data_model,
feature_learners=[fast_prop],
tags=["feature learning", "fastprop"],
)
pipe_fp_fl.check(time_series.train)
Checking data model...
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Checking... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
OK.
benchmark = Benchmark()
with benchmark("fastprop"):
pipe_fp_fl.fit(time_series.train)
fastprop_train = pipe_fp_fl.transform(time_series.train, df_name="fastprop_train")
Checking data model...
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
OK.
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 FastProp: Trying 134 features... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
Trained pipeline.
Time taken: 0:00:00.032192. Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Preprocessing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 FastProp: Building features... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
fastprop_test = pipe_fp_fl.transform(time_series.test, df_name="fastprop_test")
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Preprocessing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 FastProp: Building features... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
predictor = getml.predictors.XGBoostRegressor()
pipe_fp_pr = getml.pipeline.Pipeline(
tags=["prediction", "fastprop"], predictors=[predictor]
)
pipe_fp_pr.check(fastprop_train)
Checking data model...
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Checking... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
The pipeline check generated 0 issues labeled INFO and 5 issues labeled WARNING.
type | label | message | |
---|---|---|---|
0 | WARNING | COLUMN SHOULD BE UNUSED | All non-NULL entries in column 'feature_1_121' in POPULATION__STAGING_TABLE_1 are equal to each other. You should consider setting its role to unused_float or using it for comparison only (you can do the latter by setting a unit that contains 'comparison only'). |
1 | WARNING | COLUMN SHOULD BE UNUSED | All non-NULL entries in column 'feature_1_126' in POPULATION__STAGING_TABLE_1 are equal to each other. You should consider setting its role to unused_float or using it for comparison only (you can do the latter by setting a unit that contains 'comparison only'). |
2 | WARNING | COLUMN SHOULD BE UNUSED | All non-NULL entries in column 'feature_1_127' in POPULATION__STAGING_TABLE_1 are equal to each other. You should consider setting its role to unused_float or using it for comparison only (you can do the latter by setting a unit that contains 'comparison only'). |
3 | WARNING | COLUMN SHOULD BE UNUSED | All non-NULL entries in column 'feature_1_128' in POPULATION__STAGING_TABLE_1 are equal to each other. You should consider setting its role to unused_float or using it for comparison only (you can do the latter by setting a unit that contains 'comparison only'). |
4 | WARNING | COLUMN SHOULD BE UNUSED | All non-NULL entries in column 'feature_1_132' in POPULATION__STAGING_TABLE_1 are equal to each other. You should consider setting its role to unused_float or using it for comparison only (you can do the latter by setting a unit that contains 'comparison only'). |
pipe_fp_pr.fit(fastprop_train)
Checking data model...
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
The pipeline check generated 0 issues labeled INFO and 5 issues labeled WARNING.
To see the issues in full, run .check() on the pipeline.
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 XGBoost: Training as predictor... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:04
Trained pipeline.
Time taken: 0:00:04.112529.
Pipeline(data_model='population', feature_learners=[], feature_selectors=[], include_categorical=False, loss_function='SquareLoss', peripheral=[], predictors=['XGBoostRegressor'], preprocessors=[], share_selected_features=0.5, tags=['prediction', 'fastprop'])
pipe_fp_pr.score(fastprop_test)
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Preprocessing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
date time | set used | target | mae | rmse | rsquared | |
---|---|---|---|---|---|---|
0 | 2024-09-13 14:16:39 | fastprop_train | f_x | 0.4383 | 0.5764 | 0.9963 |
1 | 2024-09-13 14:16:39 | fastprop_test | f_x | 0.5516 | 0.7237 | 0.9951 |
2.2 Propositionalization with featuretools¶
data_train = time_series.train.population.to_df("data_train")
data_test = time_series.test.population.to_df("data_test")
dfs_pandas = {}
for df in [data_train, data_test, data_all]:
dfs_pandas[df.name] = df.to_pandas()
delete_columns = [
col for col in dfs_pandas[df.name].columns if col not in only_use + ["f_x"]
]
for col in delete_columns:
del dfs_pandas[df.name][col]
dfs_pandas[df.name]["id"] = 1
dfs_pandas[df.name]["ds"] = pd.to_datetime(
np.arange(0, dfs_pandas[df.name].shape[0]), unit="s"
)
dfs_pandas["data_train"]
30 | 34 | 37 | 38 | 4 | 59 | 61 | 7 | 77 | 78 | f_x | id | ds | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -1.2042 | -0.32739 | -1.0191 | -6.0205 | -0.32737 | 0.082791 | 0.78597 | -1.0191 | 0.082782 | -1.4094 | -11.0300 | 1 | 1970-01-01 00:00:00 |
1 | -1.2042 | -0.32739 | -1.0191 | -6.0205 | -0.32737 | 0.082800 | 0.78592 | -1.0191 | 0.082782 | -1.4094 | -10.8480 | 1 | 1970-01-01 00:00:01 |
2 | -1.2042 | -0.32737 | -1.0191 | -6.0205 | -0.32737 | 0.082786 | 0.78594 | -1.0191 | 0.082782 | -1.4094 | -10.6660 | 1 | 1970-01-01 00:00:02 |
3 | -1.2042 | -0.32734 | -1.0191 | -6.0205 | -0.32737 | 0.082755 | 0.78599 | -1.0191 | 0.082782 | -1.4094 | -10.5070 | 1 | 1970-01-01 00:00:03 |
4 | -1.2042 | -0.32736 | -1.0191 | -6.0205 | -0.32737 | 0.082782 | 0.78597 | -1.0191 | 0.082782 | -1.4094 | -10.4130 | 1 | 1970-01-01 00:00:04 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
10495 | -1.1446 | -0.37311 | -1.0486 | -5.9532 | -0.37326 | 0.087343 | 0.90793 | -1.0488 | 0.087468 | -1.4162 | -9.7673 | 1 | 1970-01-01 02:54:55 |
10496 | -1.1349 | -0.37103 | -1.0472 | -5.9564 | -0.37108 | 0.087241 | 0.90199 | -1.0474 | 0.087274 | -1.4160 | -9.9200 | 1 | 1970-01-01 02:54:56 |
10497 | -1.1255 | -0.36889 | -1.0458 | -5.9596 | -0.36896 | 0.087055 | 0.89618 | -1.0460 | 0.087082 | -1.4158 | -9.7743 | 1 | 1970-01-01 02:54:57 |
10498 | -1.1163 | -0.36680 | -1.0444 | -5.9627 | -0.36689 | 0.086907 | 0.89034 | -1.0447 | 0.086893 | -1.4155 | -8.6109 | 1 | 1970-01-01 02:54:58 |
10499 | -1.1072 | -0.36477 | -1.0430 | -5.9657 | -0.36487 | 0.086720 | 0.88476 | -1.0434 | 0.086706 | -1.4153 | -8.4345 | 1 | 1970-01-01 02:54:59 |
10500 rows × 13 columns
ft_builder = FTTimeSeriesBuilder(
num_features=200,
horizon=pd.Timedelta(seconds=0),
memory=pd.Timedelta(seconds=15),
column_id="id",
time_stamp="ds",
target="f_x",
)
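FTTimeSeriesBuilder is a small helper from this repository's utils folder; it is not part of featuretools itself. Conceptually, it builds an entity set, rolls out one cutoff time per row, and runs deep feature synthesis with a training window. The following sketch of that underlying mechanism uses featuretools' public API directly; the entity-set layout and primitive choices are assumptions for illustration, not the builder's actual code.
import featuretools as ft
import pandas as pd

raw = dfs_pandas["data_train"]

es = ft.EntitySet(id="robot")
# Parent table: one row per series (here a single series with id == 1).
es = es.add_dataframe(
    dataframe_name="population", dataframe=pd.DataFrame({"id": [1]}), index="id"
)
# Child table: the raw sensor readings, indexed by time.
es = es.add_dataframe(
    dataframe_name="peripheral",
    dataframe=raw.drop(columns="f_x"),
    index="obs",
    make_index=True,
    time_index="ds",
)
es = es.add_relationship("population", "id", "peripheral", "id")

# One cutoff time per prediction instance: each feature value may only
# aggregate observations from the 15 seconds before its own time stamp.
cutoff_times = raw[["id", "ds"]].rename(columns={"ds": "time"})
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="population",
    cutoff_time=cutoff_times,
    training_window="15 seconds",
    agg_primitives=["mean", "min", "max", "sum", "std"],
)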
with benchmark("featuretools"):
featuretools_train = ft_builder.fit(dfs_pandas["data_train"])
featuretools_test = ft_builder.transform(dfs_pandas["data_test"])
featuretools: Trying features... Selecting the best out of 442 features... Time taken: 0h:14m:21.609569
featuretools_train
MIN(peripheral.37) | MIN(peripheral.7) | FIRST(peripheral.37) | FIRST(peripheral.7) | MEDIAN(peripheral.37) | MEDIAN(peripheral.7) | MEAN(peripheral.37) | MEAN(peripheral.7) | SUM(peripheral.37) | SUM(peripheral.7) | ... | NUM_ZERO_CROSSINGS(peripheral.77) | NUM_ZERO_CROSSINGS(peripheral.59) | IS_MONOTONICALLY_DECREASING(peripheral.77) | TIME_SINCE_LAST_MAX(peripheral.ds, 4) | TIME_SINCE_LAST_MAX(peripheral.ds, 30) | TIME_SINCE_LAST_MAX(peripheral.ds, 77) | TIME_SINCE_LAST_MAX(peripheral.ds, 7) | f_x | id | ds | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
_featuretools_index | |||||||||||||||||||||
0 | -1.0191 | -1.0191 | -1.0191 | -1.0191 | -1.0191 | -1.0191 | -1.019100 | -1.019100 | -1.0191 | -1.0191 | ... | 0 | 0 | True | 1.726237e+09 | 1.726237e+09 | 1.726237e+09 | 1.726237e+09 | -11.0300 | 1 | 1970-01-01 00:00:00 |
1 | -1.0191 | -1.0191 | -1.0191 | -1.0191 | -1.0191 | -1.0191 | -1.019100 | -1.019100 | -2.0382 | -2.0382 | ... | 0 | 0 | True | 1.726237e+09 | 1.726237e+09 | 1.726237e+09 | 1.726237e+09 | -10.8480 | 1 | 1970-01-01 00:00:01 |
2 | -1.0191 | -1.0191 | -1.0191 | -1.0191 | -1.0191 | -1.0191 | -1.019100 | -1.019100 | -3.0573 | -3.0573 | ... | 0 | 0 | True | 1.726237e+09 | 1.726237e+09 | 1.726237e+09 | 1.726237e+09 | -10.6660 | 1 | 1970-01-01 00:00:02 |
3 | -1.0191 | -1.0191 | -1.0191 | -1.0191 | -1.0191 | -1.0191 | -1.019100 | -1.019100 | -4.0764 | -4.0764 | ... | 0 | 0 | True | 1.726237e+09 | 1.726237e+09 | 1.726237e+09 | 1.726237e+09 | -10.5070 | 1 | 1970-01-01 00:00:03 |
4 | -1.0191 | -1.0191 | -1.0191 | -1.0191 | -1.0191 | -1.0191 | -1.019100 | -1.019100 | -5.0955 | -5.0955 | ... | 0 | 0 | True | 1.726237e+09 | 1.726237e+09 | 1.726237e+09 | 1.726237e+09 | -10.4130 | 1 | 1970-01-01 00:00:04 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
10495 | -1.0716 | -1.0721 | -1.0716 | -1.0721 | -1.0596 | -1.0596 | -1.059793 | -1.059927 | -15.8969 | -15.8989 | ... | 0 | 0 | True | 1.726227e+09 | 1.726227e+09 | 1.726227e+09 | 1.726227e+09 | -9.7673 | 1 | 1970-01-01 02:54:55 |
10496 | -1.0698 | -1.0702 | -1.0698 | -1.0702 | -1.0580 | -1.0580 | -1.058167 | -1.058280 | -15.8725 | -15.8742 | ... | 0 | 0 | True | 1.726227e+09 | 1.726227e+09 | 1.726227e+09 | 1.726227e+09 | -9.9200 | 1 | 1970-01-01 02:54:56 |
10497 | -1.0681 | -1.0684 | -1.0681 | -1.0684 | -1.0565 | -1.0563 | -1.056567 | -1.056667 | -15.8485 | -15.8500 | ... | 0 | 0 | True | 1.726227e+09 | 1.726227e+09 | 1.726227e+09 | 1.726227e+09 | -9.7743 | 1 | 1970-01-01 02:54:57 |
10498 | -1.0662 | -1.0665 | -1.0662 | -1.0665 | -1.0549 | -1.0548 | -1.054987 | -1.055087 | -15.8248 | -15.8263 | ... | 0 | 0 | True | 1.726227e+09 | 1.726227e+09 | 1.726227e+09 | 1.726227e+09 | -8.6109 | 1 | 1970-01-01 02:54:58 |
10499 | -1.0644 | -1.0648 | -1.0644 | -1.0648 | -1.0534 | -1.0532 | -1.053440 | -1.053547 | -15.8016 | -15.8032 | ... | 0 | 0 | True | 1.726227e+09 | 1.726227e+09 | 1.726227e+09 | 1.726227e+09 | -8.4345 | 1 | 1970-01-01 02:54:59 |
10500 rows × 203 columns
roles = {
getml.data.roles.target: ["f_x"],
getml.data.roles.join_key: ["id"],
getml.data.roles.time_stamp: ["ds"],
}
df_featuretools_train = getml.data.DataFrame.from_pandas(
featuretools_train, name="featuretools_train", roles=roles
)
df_featuretools_test = getml.data.DataFrame.from_pandas(
featuretools_test, name="featuretools_test", roles=roles
)
df_featuretools_train.set_role(
df_featuretools_train.roles.unused, getml.data.roles.numerical
)
df_featuretools_test.set_role(
df_featuretools_test.roles.unused, getml.data.roles.numerical
)
predictor = getml.predictors.XGBoostRegressor()
pipe_ft_pr = getml.pipeline.Pipeline(
tags=["prediction", "featuretools"], predictors=[predictor]
)
pipe_ft_pr
Pipeline(data_model='population', feature_learners=[], feature_selectors=[], include_categorical=False, loss_function='SquareLoss', peripheral=[], predictors=['XGBoostRegressor'], preprocessors=[], share_selected_features=0.5, tags=['prediction', 'featuretools'])
pipe_ft_pr.fit(df_featuretools_train)
Checking data model...
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Checking... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
OK.
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 XGBoost: Training as predictor... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:04
Trained pipeline.
Time taken: 0:00:04.476476.
Pipeline(data_model='population', feature_learners=[], feature_selectors=[], include_categorical=False, loss_function='SquareLoss', peripheral=[], predictors=['XGBoostRegressor'], preprocessors=[], share_selected_features=0.5, tags=['prediction', 'featuretools'])
pipe_ft_pr.score(df_featuretools_test)
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Preprocessing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
date time | set used | target | mae | rmse | rsquared | |
---|---|---|---|---|---|---|
0 | 2024-09-13 14:37:28 | featuretools_train | f_x | 0.4396 | 0.584 | 0.9962 |
1 | 2024-09-13 14:37:28 | featuretools_test | f_x | 0.5828 | 0.7595 | 0.9948 |
2.3 Propositionalization with tsfresh¶
tsfresh_builder = TSFreshBuilder(
num_features=200,
memory=15,
column_id="id",
time_stamp="ds",
target="f_x",
)
with benchmark("tsfresh"):
tsfresh_train = tsfresh_builder.fit(dfs_pandas["data_train"])
tsfresh_test = tsfresh_builder.transform(dfs_pandas["data_test"])
Rolling: 100%|██████████| 40/40 [00:02<00:00, 13.60it/s] Feature Extraction: 100%|██████████| 40/40 [00:09<00:00, 4.09it/s] Feature Extraction: 100%|██████████| 40/40 [00:11<00:00, 3.64it/s]
Selecting the best out of 130 features... Time taken: 0h:0m:26.638195
Rolling: 100%|██████████| 40/40 [00:01<00:00, 37.19it/s] Feature Extraction: 100%|██████████| 40/40 [00:04<00:00, 9.62it/s] Feature Extraction: 100%|██████████| 40/40 [00:04<00:00, 9.93it/s]
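TSFreshBuilder is likewise a helper from the utils folder. Under the hood it relies on tsfresh's rolling and extraction utilities; the sketch below shows the underlying calls in a minimal form. The window size and feature settings are illustrative, and the target alignment is simplified compared to what the builder actually does.
import pandas as pd
from tsfresh import extract_features, select_features
from tsfresh.feature_extraction import MinimalFCParameters
from tsfresh.utilities.dataframe_functions import roll_time_series

raw = dfs_pandas["data_train"]

# Turn the single series into one overlapping window of at most 15 time
# steps per prediction instance.
rolled = roll_time_series(
    raw.drop(columns="f_x"), column_id="id", column_sort="ds", max_timeshift=15
)

# Apply tsfresh's aggregation functions to every window; MinimalFCParameters
# keeps the run time manageable.
features = extract_features(
    rolled,
    column_id="id",
    column_sort="ds",
    default_fc_parameters=MinimalFCParameters(),
)

# Align each feature row with the target at its window's end and keep only
# the features that pass tsfresh's relevance tests.
target = pd.Series(raw["f_x"].to_numpy(), index=features.index)
selected = select_features(features.dropna(axis=1), target)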
roles = {
getml.data.roles.target: ["f_x"],
getml.data.roles.join_key: ["id"],
getml.data.roles.time_stamp: ["ds"],
}
df_tsfresh_train = getml.data.DataFrame.from_pandas(
tsfresh_train, name="tsfresh_train", roles=roles
)
df_tsfresh_test = getml.data.DataFrame.from_pandas(
tsfresh_test, name="tsfresh_test", roles=roles
)
df_tsfresh_train.set_role(df_tsfresh_train.roles.unused, getml.data.roles.numerical)
df_tsfresh_test.set_role(df_tsfresh_test.roles.unused, getml.data.roles.numerical)
pipe_tsf_pr = getml.pipeline.Pipeline(
tags=["predicition", "tsfresh"], predictors=[predictor]
)
pipe_tsf_pr
Pipeline(data_model='population', feature_learners=[], feature_selectors=[], include_categorical=False, loss_function='SquareLoss', peripheral=[], predictors=['XGBoostRegressor'], preprocessors=[], share_selected_features=0.5, tags=['prediction', 'tsfresh'])
pipe_tsf_pr.check(df_tsfresh_train)
Checking data model...
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Checking... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
OK.
pipe_tsf_pr.fit(df_tsfresh_train)
Checking data model...
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
OK.
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 XGBoost: Training as predictor... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:03
Trained pipeline.
Time taken: 0:00:03.942466.
Pipeline(data_model='population', feature_learners=[], feature_selectors=[], include_categorical=False, loss_function='SquareLoss', peripheral=[], predictors=['XGBoostRegressor'], preprocessors=[], share_selected_features=0.5, tags=['prediction', 'tsfresh'])
pipe_tsf_pr.score(df_tsfresh_test)
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Preprocessing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
date time | set used | target | mae | rmse | rsquared | |
---|---|---|---|---|---|---|
0 | 2024-09-13 14:38:12 | tsfresh_train | f_x | 0.4916 | 0.6636 | 0.9951 |
1 | 2024-09-13 14:38:12 | tsfresh_test | f_x | 0.5986 | 0.7906 | 0.9938 |
3. Comparison¶
num_features = dict(
fastprop=134,
featuretools=158,
tsfresh=120,
)
runtime_per_feature = [
benchmark.runtimes["fastprop"] / num_features["fastprop"],
benchmark.runtimes["featuretools"] / num_features["featuretools"],
benchmark.runtimes["tsfresh"] / num_features["tsfresh"],
]
features_per_second = [1.0 / r.total_seconds() for r in runtime_per_feature]
normalized_runtime_per_feature = [
r / runtime_per_feature[0] for r in runtime_per_feature
]
comparison = pd.DataFrame(
dict(
runtime=[
benchmark.runtimes["fastprop"],
benchmark.runtimes["featuretools"],
benchmark.runtimes["tsfresh"],
],
num_features=num_features.values(),
features_per_second=features_per_second,
normalized_runtime=[
1,
benchmark.runtimes["featuretools"] / benchmark.runtimes["fastprop"],
benchmark.runtimes["tsfresh"] / benchmark.runtimes["fastprop"],
],
normalized_runtime_per_feature=normalized_runtime_per_feature,
rsquared=[pipe_fp_pr.rsquared, pipe_ft_pr.rsquared, pipe_tsf_pr.rsquared],
rmse=[pipe_fp_pr.rmse, pipe_ft_pr.rmse, pipe_tsf_pr.rmse],
mae=[pipe_fp_pr.mae, pipe_ft_pr.mae, pipe_tsf_pr.mae],
)
)
comparison.index = ["getML: FastProp", "featuretools", "tsfresh"]
comparison
runtime | num_features | features_per_second | normalized_runtime | normalized_runtime_per_feature | rsquared | rmse | mae | |
---|---|---|---|---|---|---|---|---|
getML: FastProp | 0 days 00:00:00.398347 | 134 | 336.360579 | 1.000000 | 1.000000 | 0.995058 | 0.723716 | 0.551608 |
featuretools | 0 days 00:14:21.611109 | 158 | 0.183377 | 2162.966230 | 1834.253280 | 0.994784 | 0.759460 | 0.582831 |
tsfresh | 0 days 00:00:26.638362 | 120 | 4.504789 | 66.872255 | 74.667339 | 0.993836 | 0.790602 | 0.598600 |
All three approaches reach very similar predictive accuracy (out-of-sample R² between roughly 0.994 and 0.995), but FastProp generates its features orders of magnitude faster: in this run, about 2,200 times faster than featuretools and about 67 times faster than tsfresh.
comparison.to_csv("comparisons/robot.csv")
getml.engine.shutdown()