CORA - Categorizing academic publications using getML

In this notebook, we compare getML against existing approaches from the relational learning literature on the CORA dataset, which is often used for benchmarking. We demonstrate that getML outperforms the state of the art reported in that literature on this dataset. Beyond benchmarking, this notebook showcases getML's excellent capabilities in dealing with categorical data.

Summary:

  • Prediction type: Classification model
  • Domain: Academia
  • Prediction target: The category of a paper
  • Population size: 2708

Author: Dr. Patrick Urbanke

Background

CORA is a well-known benchmarking dataset in the academic literature on relational learning. The dataset contains 2708 scientific publications on machine learning, divided into 7 categories. The challenge is to predict the category of a paper based on the papers it cites, the papers that cite it, and the keywords it contains.

The dataset was downloaded from the CTU Prague relational learning repository (Motl and Schulte, 2015).
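
To make the prediction task more concrete, here is a minimal sketch of the three kinds of signal available for each paper: the categories of the papers it cites, the categories of the papers citing it, and its keywords. The table layout (a paper table, a citation table and a keyword table), the column order, and all values are assumptions made up for illustration; they are not taken from the actual dataset, and this is not how getML builds its features.

```python
from collections import Counter

# Toy stand-ins for three relational tables (all values are made up).
# paper: paper_id -> category
paper = {1: "Neural_Networks", 2: "Neural_Networks", 3: "Theory", 4: "Theory"}
# cites: (citing_paper_id, cited_paper_id)
cites = [(1, 2), (1, 3), (2, 1), (4, 3)]
# content: (paper_id, keyword)
content = [(1, "word1"), (1, "word2"), (2, "word1"), (3, "word7"), (4, "word7")]


def relational_features(paper_id):
    """Collect the three kinds of relational signal for one paper."""
    cited = [dst for src, dst in cites if src == paper_id]
    cited_by = [src for src, dst in cites if dst == paper_id]
    words = sorted(w for pid, w in content if pid == paper_id)
    return {
        # Categories of the papers this paper cites.
        "cited_classes": Counter(paper[p] for p in cited),
        # Categories of the papers that cite this paper.
        "citing_classes": Counter(paper[p] for p in cited_by),
        # Keywords contained in this paper.
        "keywords": words,
    }


features = relational_features(1)
print(features)
```

Note that the categories of neighbouring papers are themselves the prediction target, so in a real train/test split they would only be usable for papers in the training set.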

Related code example

Notebook:
Open in nbviewer
Open in mybinder