Excerpt
The occupancy detection data set is a very simple multivariate time series. This is to demonstrate how relational learning can be successfully applied to time series.
The key is a self-join. Instead of creating features by merging and aggregating peripheral tables in a relational data model, we perform the same operations on the population table itself. This results in features like these:
Using getML's algorithms for relational learning, we can extract all of these features automatically. Having created a flat table of such features, we can then apply state-of-the-art machine learning algorithms, like xgboost.
This project is based on a public domain time series dataset. It is available in the UC Irvine Machine Learning Repository. The challenge is straightforward: We want to predict whether an office room is occupied at a given moment in time using sensor data. The data is measured about once a minute. The ground-truth occupancy was obtained from time-stamped pictures. The available columns are as follows:
This project demonstrates that relational learning is a powerful tool for time series. getML is able to outperform the benchmarks for a scientific paper on a simple public domain time series data set using relatively little effort.