project

Dodgers (nb intro)

EXCERPT (BLOG ITEM + OG, TWITTER)

Traffic volume prediction on LA's 101 North freeway

Univariate time series prediction with getML

In this tutorial, we demonstrate a time series application of getML.

We benchmark our results against Facebook's Prophet and tsfresh.

getML's relational learning algorithms outperform Prophet's classical time series approach by ~14% and tsfresh's brute-force approach to feature engineering by ~26% (measured by predictive R-squared).

Summary:

  • Prediction type: Regression model
  • Domain: Transportation
  • Prediction target: traffic volume
  • Source data: Univariate time series
  • Population size: 47497

Author: Patrick Urbanke

Background

The data set exhibits several characteristics that are common in time series and that classical models may struggle to handle:

  • High frequency (every five minutes)
  • Dependence on irregular events (holidays, Dodgers games)
  • Strong and overlapping cycles (daily, weekly)
  • Anomalies
  • Multiple seasonalities

To quote the maintainers of the data set:

"This loop sensor data was collected for the Glendale on ramp for the 101 North freeway in Los Angeles. It is close enough to the stadium to see unusual traffic after a Dodgers game, but not so close and heavily used by game traffic so that the signal for the extra traffic is overly obvious."

The dataset was originally collected for this paper:

A. Ihler, J. Hutchins, and P. Smyth. "Adaptive event detection with time-varying Poisson processes." Proceedings of the 12th ACM SIGKDD Conference (KDD-06), August 2006.

It is maintained by the UCI Machine Learning Repository:

Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.


Propositionalization: Traffic near Dodgers' stadium

In this notebook, we compare getML's FastProp against the well-known feature engineering libraries featuretools and tsfresh.

Summary:

  • Prediction type: Regression model
  • Domain: Transportation
  • Prediction target: traffic volume
  • Source data: Univariate time series
  • Population size: 47497

Author: Dr. Patrick Urbanke

Background

A common approach to feature engineering is to generate attribute-value representations from relational data by applying a fixed set of aggregations to the columns of interest and then performing feature selection on the (possibly large) set of generated features. In academia, this approach is called propositionalization.
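To make the idea concrete, here is a minimal sketch of propositionalization on a univariate time series, written in plain Python. It is not getML's API and does not reflect how FastProp is implemented. The function names, the look-back window, the five aggregations, and the correlation-based selection step are all assumptions chosen for illustration, and the toy series is made up.

```python
from statistics import mean, pstdev

# Fixed set of aggregations applied to every look-back window.
# (Illustrative choice; real libraries ship far larger sets.)
AGGREGATIONS = {
    "mean": mean,
    "min": min,
    "max": max,
    "std": pstdev,
    "last": lambda xs: xs[-1],
}


def propositionalize(series, window):
    """Build one flat feature row per time step from the preceding `window` values."""
    rows, targets = [], []
    for t in range(window, len(series)):
        history = series[t - window:t]  # strictly before t, so the target never leaks
        rows.append({name: agg(history) for name, agg in AGGREGATIONS.items()})
        targets.append(series[t])
    return rows, targets


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def select_features(rows, targets, k):
    """Keep the k features whose absolute correlation with the target is highest."""
    scores = {
        name: abs(pearson([row[name] for row in rows], targets))
        for name in rows[0]
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy series (made-up numbers, not the traffic data).
toy_series = [3, 5, 9, 14, 12, 8, 4, 3, 6, 10, 15, 11, 7, 4]
rows, targets = propositionalize(toy_series, window=3)
selected = select_features(rows, targets, k=2)
```

Each entry in `rows` is an attribute-value representation of one time step, and `selected` names the features that survive the selection step. The cost of this approach grows with the number of aggregations, columns, and windows, which is why speed and memory efficiency matter for propositionalization libraries.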

getML's FastProp is an implementation of this propositionalization approach that has been optimized for speed and memory efficiency. In this notebook, we demonstrate just how fast FastProp is. To this end, we benchmark FastProp against the popular feature engineering libraries featuretools and tsfresh, both of which also use propositionalization for feature engineering.

In this notebook, we use traffic data collected at the Glendale on-ramp for the 101 North freeway in Los Angeles. For further details about the data set, refer to the full notebook.

Related code example

Initial Notebook:
Open in nbviewer
Open in mybinder

Propositionalization:
Open in nbviewer
Open in mybinder