In this tutorial, we demonstrate how getML can be applied in an e-commerce context. Using a dataset of about 400,000 orders, our goal is to predict whether an order will be cancelled.
We also show that we can significantly improve our results by using getML's built-in hyperparameter tuning routines.
Author: Dr. Patrick Urbanke
The data set contains about 400,000 orders from a British online retailer. Each order consists of a product and the quantity ordered. Several orders can be grouped on a single invoice. The goal is to predict whether an order will be cancelled.
Because the company mainly sells to other businesses, the cancellation rate is relatively low, namely 1.83%.
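To make the structure of the data and the cancellation rate concrete, here is a minimal sketch in plain Python. The rows and the field names (`invoice`, `product`, `quantity`, `cancelled`) are hypothetical illustrations. They are not the column names of the real data set, and the sketch does not use the getML API.

```python
from collections import defaultdict

# Hypothetical toy rows: one dict per order (one product and its quantity).
# Field names are illustrative, not the column names of the real data set.
orders = [
    {"invoice": "A1", "product": "mug", "quantity": 6, "cancelled": False},
    {"invoice": "A1", "product": "lamp", "quantity": 2, "cancelled": False},
    {"invoice": "A2", "product": "mug", "quantity": 12, "cancelled": True},
    {"invoice": "A3", "product": "clock", "quantity": 1, "cancelled": False},
]


def cancellation_rate(rows):
    """Return the share of orders whose 'cancelled' flag is set."""
    if not rows:
        return 0.0
    return sum(row["cancelled"] for row in rows) / len(rows)


def group_by_invoice(rows):
    """Collect the orders that share an invoice."""
    invoices = defaultdict(list)
    for row in rows:
        invoices[row["invoice"]].append(row)
    return dict(invoices)


rate = cancellation_rate(orders)
invoices = group_by_invoice(orders)

print(f"cancellation rate: {rate:.2%}")
print({invoice: len(rows) for invoice, rows in invoices.items()})
```

With a rate as low as 1.83%, a model that always predicts "not cancelled" would already be about 98.17% accurate, so plain accuracy says little about model quality on this kind of data.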
The data set was originally collected for the following study:
Daqing Chen, Sai Liang Sain, and Kun Guo, Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining, Journal of Database Marketing and Customer Strategy Management, Vol. 19, No. 3, pp. 197-208, 2012 (Published online before print: 27 August 2012. doi: 10.1057/dbm.2012.17).
It was downloaded from the UCI Machine Learning Repository:
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.