Interstate 94 (nb intro)

Hourly traffic volume prediction on Interstate 94

Multivariate time series prediction with getML

In this tutorial, we demonstrate a time series application of getML: we predict the hourly traffic volume on I-94 westbound from Minneapolis-St Paul and benchmark our results against Facebook's Prophet. getML's relational learning algorithms outperform Prophet's classical time series approach by roughly 15%.

Summary:

  • Prediction type: Regression model
  • Domain: Transportation
  • Prediction target: Hourly traffic volume
  • Source data: Multivariate time series, 5 components
  • Population size: 24096

Author: Sören Nikolaus

Background

The dataset exhibits several characteristics that are common in time series and that classical models often struggle to handle:

  • High frequency (hourly)
  • Dependence on irregular events (holidays)
  • Strong and overlapping cycles (daily, weekly)
  • Anomalies
  • Multiple seasonalities

The analysis is built on top of a dataset provided by the MN Department of Transportation, with some data preparation done by John Hogue.


Propositionalization: Interstate 94

In this notebook, we compare getML's FastProp against the well-known feature engineering libraries featuretools and tsfresh.

Summary:

  • Prediction type: Regression model
  • Domain: Transportation
  • Prediction target: Hourly traffic volume
  • Source data: Multivariate time series, 5 components
  • Population size: 24096

Author: Sören Nikolaus

Background

A common approach to feature engineering is to generate attribute-value representations from relational data by applying a fixed set of aggregations to the columns of interest and then performing feature selection on the (possibly large) set of generated features. In academia, this approach is called propositionalization.
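
To make the idea concrete, here is a minimal sketch in plain Python. It is not FastProp's implementation, nor that of featuretools or tsfresh. The toy series, the window length, the four aggregations, and the correlation-based selection step are all assumptions chosen for illustration.

```python
from statistics import mean

# Toy hourly series (hypothetical values, not the I-94 data).
traffic = [120, 90, 60, 80, 300, 900, 1400, 1200, 1000, 950, 1100, 1300]

# A fixed set of aggregations, applied to every window of past observations.
AGGREGATIONS = {
    "min": min,
    "max": max,
    "avg": mean,
    "last": lambda xs: xs[-1],
}


def propositionalize(series, window):
    """Build one flat feature row per time step from the preceding `window` values."""
    rows, targets = [], []
    for t in range(window, len(series)):
        past = series[t - window:t]  # strictly before t, so the target is not leaked
        rows.append({name: agg(past) for name, agg in AGGREGATIONS.items()})
        targets.append(series[t])
    return rows, targets


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def select_features(rows, targets, keep):
    """Keep the `keep` features with the highest absolute correlation to the target."""
    scores = {
        name: abs(pearson([row[name] for row in rows], targets))
        for name in rows[0]
    }
    return sorted(scores, key=scores.get, reverse=True)[:keep]


rows, targets = propositionalize(traffic, window=3)
selected = select_features(rows, targets, keep=2)
```

Each entry in `rows` is an attribute-value representation of one hour, built only from the hours before it, and `selected` names the features that survive the selection step. Real propositionalization libraries apply many more aggregations across many columns and tables, which is why the speed and memory footprint of the generation step matter.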

getML's FastProp is an implementation of this propositionalization approach that has been optimized for speed and memory efficiency. In this notebook, we want to demonstrate how – well – fast FastProp is. To this end, we will benchmark FastProp against the popular feature engineering libraries featuretools and tsfresh. Both of these libraries use propositionalization approaches for feature engineering.

As the prediction task, we use the hourly traffic volume on I-94 westbound from Minneapolis-St Paul. The analysis is built on top of a dataset provided by the MN Department of Transportation, with some data preparation done by John Hogue. For further details about the dataset, refer to the full notebook.

Related code example

Initial Notebook:
Open in nbviewer
Open in mybinder

Propositionalization:
Open in nbviewer
Open in mybinder