Predicting order cancellations using getML

In this tutorial, we demonstrate how getML can be applied in an e-commerce context. Using a dataset of about 400,000 orders, our goal is to predict whether an order will be cancelled.

We also show that we can significantly improve our results by using getML's built-in hyperparameter tuning routines.

Summary:

  • Prediction type: Classification model
  • Domain: E-commerce
  • Prediction target: Whether an order will be cancelled
  • Population size: 397925

Author: Dr. Patrick Urbanke

Background

The data set contains about 400,000 orders from a British online retailer. Each order consists of a single product and the quantity ordered. Several orders can be grouped on a single invoice. The goal is to predict whether an order will be cancelled.
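To make this structure concrete, here is a minimal sketch in plain Python. The field names (invoice_id, product, quantity, cancelled) and the three sample rows are invented for illustration and are not the column names or contents of the actual data set.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    # Hypothetical field names, chosen for illustration only.
    invoice_id: str
    product: str
    quantity: int
    cancelled: bool  # the prediction target


# Made-up sample rows; not taken from the real data set.
orders = [
    Order("A001", "mug", 6, False),
    Order("A001", "teapot", 1, True),
    Order("A002", "candle", 12, False),
]

# Several orders can share one invoice, so group them by invoice ID.
orders_by_invoice = defaultdict(list)
for order in orders:
    orders_by_invoice[order.invoice_id].append(order)

# Number of orders on each invoice.
invoice_sizes = {inv: len(items) for inv, items in orders_by_invoice.items()}

# One binary target per order: 1 if cancelled, 0 otherwise.
targets = [int(o.cancelled) for o in orders]
```

The point of the sketch is that the prediction is made per order, while the invoice is a grouping level above it.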

Because the company mainly sells to other businesses, the cancellation rate is relatively low at 1.83%.
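A quick back-of-the-envelope calculation shows what this rate implies. It uses only the population size and cancellation rate quoted in this post; because the rate is rounded, the resulting count is approximate. The variable names are illustrative.

```python
# Figures quoted in this post.
population_size = 397925
cancellation_rate = 0.0183  # 1.83%, rounded

# Approximate number of cancelled orders implied by the rounded rate.
approx_cancelled = round(population_size * cancellation_rate)

# Accuracy of a trivial model that always predicts "not cancelled".
majority_baseline_accuracy = 1.0 - cancellation_rate

print(f"Approx. cancelled orders: {approx_cancelled}")
print(f"Always-'not cancelled' accuracy: {majority_baseline_accuracy:.2%}")
```

Since a model that never predicts a cancellation would already be right for about 98% of orders, plain accuracy says little here, and metrics that account for class imbalance are usually more informative.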

The data set was originally collected for this study:

Daqing Chen, Sai Liang Sain, and Kun Guo, Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining, Journal of Database Marketing and Customer Strategy Management, Vol. 19, No. 3, pp. 197-208, 2012 (Published online before print: 27 August 2012. doi: 10.1057/dbm.2012.17).

It was downloaded from the UCI Machine Learning Repository:

Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

Related code example

Notebook:
Open in nbviewer
Open in mybinder