AdventureWorks - Predicting customer churn¶
In this notebook, we will demonstrate how getML can be used for a customer churn project using a synthetic dataset of a fictional company. We will also benchmark getML against featuretools.
Summary:
- Prediction type: Classification model
- Domain: Customer loyalty
- Prediction target: churn
- Population size: 19704
Background¶
AdventureWorks is a fictional company that sells bicycles. It is used by Microsoft to showcase how its MS SQL Server can be used to manage business data. Since this dataset resembles a real-world customer database and it is open-source, we will use it to showcase how getML can be used for a classic customer churn project (real customer databases are not easily available for the purposes of showcasing and benchmarking, for reasons of data privacy).
The dataset has been downloaded from the CTU Prague relational learning repository (Motl and Schulte, 2015) (Now residing at relational-data.org.).
We will benchmark getML 's feature learning algorithms against featuretools, an open-source implementation of the propositionalization algorithm, similar to getML's FastProp.
Analysis¶
Let's get started with the analysis and set up your session:
%pip install -q "getml==1.5.0" "featuretools==1.31.0" "ipywidgets==8.1.5"
import os
import warnings
import numpy as np
import pandas as pd
import featuretools
import woodwork as ww
import getml
os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"
warnings.simplefilter(action='ignore', category=FutureWarning)
getml.engine.launch(allow_remote_ips=True, token='token')
getml.set_project('adventure_works')
Launching ./getML --allow-push-notifications=true --allow-remote-ips=true --home-directory=/home/user --in-memory=true --install=false --launch-browser=true --log=false --token=token in /home/user/.getML/getml-1.5.0-x64-linux... Launched the getML Engine. The log output will be stored in /home/user/.getML/logs/20240912131357.log. Loading pipelines... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
Connected to project 'adventure_works'.
1. Loading data¶
1.1 Download from source¶
We begin by downloading the data:
conn = getml.database.connect_mysql(
host="db.relational-data.org",
dbname="AdventureWorks2014",
port=3306,
user="guest",
password="relational"
)
conn
Connection(dbname='AdventureWorks2014', dialect='mysql', host='db.relational-data.org', port=3306)
def load_if_needed(name):
"""
Loads the data from the relational learning
repository, if the data frame has not already
been loaded.
"""
if not getml.data.exists(name):
data_frame = getml.DataFrame.from_db(
name=name,
table_name=name,
conn=conn
)
data_frame.save()
else:
data_frame = getml.data.load_data_frame(name)
return data_frame
product = load_if_needed("Product")
sales_order_detail = load_if_needed("SalesOrderDetail")
sales_order_header = load_if_needed("SalesOrderHeader")
sales_order_reason = load_if_needed("SalesOrderHeaderSalesReason")
special_offer = load_if_needed("SpecialOffer")
store = load_if_needed("Store")
product
name | ProductID | MakeFlag | FinishedGoodsFlag | SafetyStockLevel | ReorderPoint | DaysToManufacture | ProductSubcategoryID | ProductModelID | Name | ProductNumber | Color | StandardCost | ListPrice | Size | SizeUnitMeasureCode | WeightUnitMeasureCode | Weight | ProductLine | Class | Style | SellStartDate | SellEndDate | DiscontinuedDate | rowguid | ModifiedDate |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
role | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string |
0 | 1 | 0 | 0 | 1000 | 750 | 0 | nan | nan | Adjustable Race | AR-5381 | NULL | 0.0000 | 0.0000 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 2008-04-30 00:00:00 | NULL | NULL | 694215B7-08F7-4C0D-ACB1-D734BA44... | 2014-02-08 09:01:36 |
1 | 2 | 0 | 0 | 1000 | 750 | 0 | nan | nan | Bearing Ball | BA-8327 | NULL | 0.0000 | 0.0000 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 2008-04-30 00:00:00 | NULL | NULL | 58AE3C20-4F3A-4749-A7D4-D568806C... | 2014-02-08 09:01:36 |
2 | 3 | 1 | 0 | 800 | 600 | 1 | nan | nan | BB Ball Bearing | BE-2349 | NULL | 0.0000 | 0.0000 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 2008-04-30 00:00:00 | NULL | NULL | 9C21AED2-5BFA-4F18-BCB8-F11638DC... | 2014-02-08 09:01:36 |
3 | 4 | 0 | 0 | 800 | 600 | 0 | nan | nan | Headset Ball Bearings | BE-2908 | NULL | 0.0000 | 0.0000 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 2008-04-30 00:00:00 | NULL | NULL | ECFED6CB-51FF-49B5-B06C-7D8AC834... | 2014-02-08 09:01:36 |
4 | 316 | 1 | 0 | 800 | 600 | 1 | nan | nan | Blade | BL-2036 | NULL | 0.0000 | 0.0000 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 2008-04-30 00:00:00 | NULL | NULL | E73E9750-603B-4131-89F5-3DD15ED5... | 2014-02-08 09:01:36 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
499 | 995 | 1 | 1 | 500 | 375 | 1 | 5 | 96 | ML Bottom Bracket | BB-8107 | NULL | 44.9506 | 101.2400 | NULL | NULL | G | 168.00 | NULL | M | NULL | 2013-05-30 00:00:00 | NULL | NULL | 71AB847F-D091-42D6-B735-7B0C2D82... | 2014-02-08 09:01:36 |
500 | 996 | 1 | 1 | 500 | 375 | 1 | 5 | 97 | HL Bottom Bracket | BB-9108 | NULL | 53.9416 | 121.4900 | NULL | NULL | G | 170.00 | NULL | H | NULL | 2013-05-30 00:00:00 | NULL | NULL | 230C47C5-08B2-4CE3-B706-69C0BDD6... | 2014-02-08 09:01:36 |
501 | 997 | 1 | 1 | 100 | 75 | 4 | 2 | 31 | Road-750 Black, 44 | BK-R19B-44 | Black | 343.6496 | 539.9900 | 44 | CM | LB | 19.77 | R | L | U | 2013-05-30 00:00:00 | NULL | NULL | 44CE4802-409F-43AB-9B27-CA534218... | 2014-02-08 09:01:36 |
502 | 998 | 1 | 1 | 100 | 75 | 4 | 2 | 31 | Road-750 Black, 48 | BK-R19B-48 | Black | 343.6496 | 539.9900 | 48 | CM | LB | 20.13 | R | L | U | 2013-05-30 00:00:00 | NULL | NULL | 3DE9A212-1D49-40B6-B10A-F564D981... | 2014-02-08 09:01:36 |
503 | 999 | 1 | 1 | 100 | 75 | 4 | 2 | 31 | Road-750 Black, 52 | BK-R19B-52 | Black | 343.6496 | 539.9900 | 52 | CM | LB | 20.42 | R | L | U | 2013-05-30 00:00:00 | NULL | NULL | AE638923-2B67-4679-B90E-ABBAB17D... | 2014-02-08 09:01:36 |
504 rows x 25 columns
memory usage: 0.17 MB
name: Product
type: getml.DataFrame
sales_order_detail
name | SalesOrderID | SalesOrderDetailID | OrderQty | ProductID | SpecialOfferID | CarrierTrackingNumber | UnitPrice | UnitPriceDiscount | LineTotal | rowguid | ModifiedDate |
---|---|---|---|---|---|---|---|---|---|---|---|
role | unused_float | unused_float | unused_float | unused_float | unused_float | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string |
0 | 43659 | 1 | 1 | 776 | 1 | 4911-403C-98 | 2024.9940 | 0.0000 | 2024.994000 | B207C96D-D9E6-402B-8470-2CC176C4... | 2011-05-30 22:00:00 |
1 | 43659 | 2 | 3 | 777 | 1 | 4911-403C-98 | 2024.9940 | 0.0000 | 6074.982000 | 7ABB600D-1E77-41BE-9FE5-B9142CFC... | 2011-05-30 22:00:00 |
2 | 43659 | 3 | 1 | 778 | 1 | 4911-403C-98 | 2024.9940 | 0.0000 | 2024.994000 | 475CF8C6-49F6-486E-B0AD-AFC6A50C... | 2011-05-30 22:00:00 |
3 | 43659 | 4 | 1 | 771 | 1 | 4911-403C-98 | 2039.9940 | 0.0000 | 2039.994000 | 04C4DE91-5815-45D6-8670-F462719F... | 2011-05-30 22:00:00 |
4 | 43659 | 5 | 1 | 772 | 1 | 4911-403C-98 | 2039.9940 | 0.0000 | 2039.994000 | 5A74C7D2-E641-438E-A7AC-37BF2328... | 2011-05-30 22:00:00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
121312 | 75122 | 121313 | 1 | 878 | 1 | NULL | 21.9800 | 0.0000 | 21.980000 | 8CAD6675-18CC-4F47-8287-97B41A8E... | 2014-06-29 22:00:00 |
121313 | 75122 | 121314 | 1 | 712 | 1 | NULL | 8.9900 | 0.0000 | 8.990000 | 84F1C363-1C50-4442-BE16-541C59B6... | 2014-06-29 22:00:00 |
121314 | 75123 | 121315 | 1 | 878 | 1 | NULL | 21.9800 | 0.0000 | 21.980000 | C18B6476-429F-4BB1-828E-2BE5F82A... | 2014-06-29 22:00:00 |
121315 | 75123 | 121316 | 1 | 879 | 1 | NULL | 159.0000 | 0.0000 | 159.000000 | 75A89C6A-C60A-47EA-8A52-B52A9C43... | 2014-06-29 22:00:00 |
121316 | 75123 | 121317 | 1 | 712 | 1 | NULL | 8.9900 | 0.0000 | 8.990000 | 73646D26-0461-450D-8019-2C6C8586... | 2014-06-29 22:00:00 |
121317 rows x 11 columns
memory usage: 21.60 MB
name: SalesOrderDetail
type: getml.DataFrame
sales_order_header
name | SalesOrderID | RevisionNumber | Status | OnlineOrderFlag | CustomerID | SalesPersonID | TerritoryID | BillToAddressID | ShipToAddressID | ShipMethodID | CreditCardID | CurrencyRateID | OrderDate | DueDate | ShipDate | SalesOrderNumber | PurchaseOrderNumber | AccountNumber | CreditCardApprovalCode | SubTotal | TaxAmt | Freight | TotalDue | Comment | rowguid | ModifiedDate |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
role | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_float | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string |
0 | 43659 | 8 | 5 | 0 | 29825 | 279 | 5 | 985 | 985 | 5 | 16281 | nan | 2011-05-30 22:00:00 | 2011-06-12 00:00:00 | 2011-06-07 00:00:00 | SO43659 | PO522145787 | 10-4020-000676 | 105041Vi84182 | 20565.6206 | 1971.5149 | 616.0984 | 23153.2339 | NULL | 79B65321-39CA-4115-9CBA-8FE0903E... | 2011-06-06 22:00:00 |
1 | 43660 | 8 | 5 | 0 | 29672 | 279 | 5 | 921 | 921 | 5 | 5618 | nan | 2011-05-30 22:00:00 | 2011-06-12 00:00:00 | 2011-06-07 00:00:00 | SO43660 | PO18850127500 | 10-4020-000117 | 115213Vi29411 | 1294.2529 | 124.2483 | 38.8276 | 1457.3288 | NULL | 738DC42D-D03B-48A1-9822-F95A67EA... | 2011-06-06 22:00:00 |
2 | 43661 | 8 | 5 | 0 | 29734 | 282 | 6 | 517 | 517 | 5 | 1346 | 4 | 2011-05-30 22:00:00 | 2011-06-12 00:00:00 | 2011-06-07 00:00:00 | SO43661 | PO18473189620 | 10-4020-000442 | 85274Vi6854 | 32726.4786 | 3153.7696 | 985.5530 | 36865.8012 | NULL | D91B9131-18A4-4A11-BC3A-90B6F53E... | 2011-06-06 22:00:00 |
3 | 43662 | 8 | 5 | 0 | 29994 | 282 | 6 | 482 | 482 | 5 | 10456 | 4 | 2011-05-30 22:00:00 | 2011-06-12 00:00:00 | 2011-06-07 00:00:00 | SO43662 | PO18444174044 | 10-4020-000227 | 125295Vi53935 | 28832.5289 | 2775.1646 | 867.2389 | 32474.9324 | NULL | 4A1ECFC0-CC3A-4740-B028-1C50BB48... | 2011-06-06 22:00:00 |
4 | 43663 | 8 | 5 | 0 | 29565 | 276 | 4 | 1073 | 1073 | 5 | 4322 | nan | 2011-05-30 22:00:00 | 2011-06-12 00:00:00 | 2011-06-07 00:00:00 | SO43663 | PO18009186470 | 10-4020-000510 | 45303Vi22691 | 419.4589 | 40.2681 | 12.5838 | 472.3108 | NULL | 9B1E7A40-6AE0-4AD3-811C-A6495185... | 2011-06-06 22:00:00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
31460 | 75119 | 8 | 5 | 1 | 11981 | nan | 1 | 17649 | 17649 | 1 | 6761 | nan | 2014-06-29 22:00:00 | 2014-07-12 00:00:00 | 2014-07-07 00:00:00 | SO75119 | NULL | 10-4030-011981 | 429826Vi35166 | 42.2800 | 3.3824 | 1.0570 | 46.7194 | NULL | 9382F1C9-383A-435F-9449-0EECEA21... | 2014-07-06 22:00:00 |
31461 | 75120 | 8 | 5 | 1 | 18749 | nan | 6 | 28374 | 28374 | 1 | 8925 | nan | 2014-06-29 22:00:00 | 2014-07-12 00:00:00 | 2014-07-07 00:00:00 | SO75120 | NULL | 10-4030-018749 | 929849Vi46003 | 84.9600 | 6.7968 | 2.1240 | 93.8808 | NULL | AE6A4FCF-FF73-4CD4-AF2C-5993D00D... | 2014-07-06 22:00:00 |
31462 | 75121 | 8 | 5 | 1 | 15251 | nan | 6 | 26553 | 26553 | 1 | 14220 | nan | 2014-06-29 22:00:00 | 2014-07-12 00:00:00 | 2014-07-07 00:00:00 | SO75121 | NULL | 10-4030-015251 | 529864Vi73738 | 74.9800 | 5.9984 | 1.8745 | 82.8529 | NULL | D7395C0E-00CB-4BFA-A238-0D6A9F49... | 2014-07-06 22:00:00 |
31463 | 75122 | 8 | 5 | 1 | 15868 | nan | 6 | 14616 | 14616 | 1 | 18719 | nan | 2014-06-29 22:00:00 | 2014-07-12 00:00:00 | 2014-07-07 00:00:00 | SO75122 | NULL | 10-4030-015868 | 330022Vi97312 | 30.9700 | 2.4776 | 0.7743 | 34.2219 | NULL | 4221035A-4159-492F-AF40-4363A64F... | 2014-07-06 22:00:00 |
31464 | 75123 | 8 | 5 | 1 | 18759 | nan | 6 | 14024 | 14024 | 1 | 10084 | nan | 2014-06-29 22:00:00 | 2014-07-12 00:00:00 | 2014-07-07 00:00:00 | SO75123 | NULL | 10-4030-018759 | 230370Vi51970 | 189.9700 | 15.1976 | 4.7493 | 209.9169 | NULL | D54752FF-2B54-4BE5-95EA-3B72289C... | 2014-07-06 22:00:00 |
31465 rows x 26 columns
memory usage: 12.56 MB
name: SalesOrderHeader
type: getml.DataFrame
sales_order_reason
name | SalesOrderID | SalesReasonID | ModifiedDate |
---|---|---|---|
role | unused_float | unused_float | unused_string |
0 | 43697 | 5 | 2011-05-30 22:00:00 |
1 | 43697 | 9 | 2011-05-30 22:00:00 |
2 | 43702 | 5 | 2011-05-31 22:00:00 |
3 | 43702 | 9 | 2011-05-31 22:00:00 |
4 | 43703 | 5 | 2011-05-31 22:00:00 |
... | ... | ... | |
27642 | 75119 | 1 | 2014-06-29 22:00:00 |
27643 | 75120 | 1 | 2014-06-29 22:00:00 |
27644 | 75121 | 1 | 2014-06-29 22:00:00 |
27645 | 75122 | 1 | 2014-06-29 22:00:00 |
27646 | 75123 | 1 | 2014-06-29 22:00:00 |
27647 rows x 3 columns
memory usage: 1.22 MB
name: SalesOrderHeaderSalesReason
type: getml.DataFrame
special_offer
name | SpecialOfferID | MinQty | MaxQty | Description | DiscountPct | Type | Category | StartDate | EndDate | rowguid | ModifiedDate |
---|---|---|---|---|---|---|---|---|---|---|---|
role | unused_float | unused_float | unused_float | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string |
0 | 1 | 0 | nan | No Discount | 0.0000 | No Discount | No Discount | 2011-05-01 00:00:00 | 2014-11-30 00:00:00 | 0290C4F5-191F-4337-AB6B-0A2DDE03... | 2011-03-31 22:00:00 |
1 | 2 | 11 | 14 | Volume Discount 11 to 14 | 0.0200 | Volume Discount | Reseller | 2011-05-31 00:00:00 | 2014-05-30 00:00:00 | D7542EE7-15DB-4541-985C-5CC27AEF... | 2011-04-30 22:00:00 |
2 | 3 | 15 | 24 | Volume Discount 15 to 24 | 0.0500 | Volume Discount | Reseller | 2011-05-31 00:00:00 | 2014-05-30 00:00:00 | 4BDBCC01-8CF7-40A9-B643-40EC5B71... | 2011-04-30 22:00:00 |
3 | 4 | 25 | 40 | Volume Discount 25 to 40 | 0.1000 | Volume Discount | Reseller | 2011-05-31 00:00:00 | 2014-05-30 00:00:00 | 504B5E85-8F3F-4EBC-9E1D-C1BC5DEA... | 2011-04-30 22:00:00 |
4 | 5 | 41 | 60 | Volume Discount 41 to 60 | 0.1500 | Volume Discount | Reseller | 2011-05-31 00:00:00 | 2014-05-30 00:00:00 | 677E1D9D-944F-4E81-90E8-47EB0A82... | 2011-04-30 22:00:00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
11 | 12 | 0 | nan | LL Road Frame Sale | 0.3500 | Excess Inventory | Reseller | 2013-05-30 00:00:00 | 2013-07-14 00:00:00 | C0AF1C89-9722-4235-9248-3FBA4D9E... | 2013-04-29 22:00:00 |
12 | 13 | 0 | nan | Touring-3000 Promotion | 0.1500 | New Product | Reseller | 2013-05-30 00:00:00 | 2013-08-29 00:00:00 | 5061CCE4-E021-45A8-9A75-DFB36CBB... | 2013-04-29 22:00:00 |
13 | 14 | 0 | nan | Touring-1000 Promotion | 0.2000 | New Product | Reseller | 2013-05-30 00:00:00 | 2013-08-29 00:00:00 | 1AF84A9E-A98C-4BD9-B48F-DC2B8B6B... | 2013-04-29 22:00:00 |
14 | 15 | 0 | nan | Half-Price Pedal Sale | 0.5000 | Seasonal Discount | Customer | 2013-07-14 00:00:00 | 2013-08-14 00:00:00 | 03E3594D-6EBB-46A6-B8EE-A9289C0C... | 2013-06-13 22:00:00 |
15 | 16 | 0 | nan | Mountain-500 Silver Clearance Sa... | 0.4000 | Discontinued Product | Reseller | 2014-03-31 00:00:00 | 2014-05-30 00:00:00 | EB7CB484-BCCF-4D2D-BF73-521B2001... | 2014-02-28 23:00:00 |
16 rows x 11 columns
memory usage: 0.00 MB
name: SpecialOffer
type: getml.DataFrame
store
name | BusinessEntityID | SalesPersonID | Name | Demographics | rowguid | ModifiedDate |
---|---|---|---|---|---|---|
role | unused_float | unused_float | unused_string | unused_string | unused_string | unused_string |
0 | 292 | 279 | Next-Door Bike Store | <StoreSurvey xmlns="http://schem... | A22517E3-848D-4EBE-B9D9-7437F343... | 2014-09-12 09:15:07 |
1 | 294 | 276 | Professional Sales and Service | <StoreSurvey xmlns="http://schem... | B50CA50B-C601-4A13-B07E-2C63862D... | 2014-09-12 09:15:07 |
2 | 296 | 277 | Riders Company | <StoreSurvey xmlns="http://schem... | 337C3688-1339-4E1A-A08A-B54B2356... | 2014-09-12 09:15:07 |
3 | 298 | 275 | The Bike Mechanics | <StoreSurvey xmlns="http://schem... | 7894F278-F0C8-4D16-BD75-213FDBF1... | 2014-09-12 09:15:07 |
4 | 300 | 286 | Nationwide Supply | <StoreSurvey xmlns="http://schem... | C3FC9705-A8C4-4F3A-9550-EB2FA4B7... | 2014-09-12 09:15:07 |
... | ... | ... | ... | ... | ... | |
696 | 1988 | 282 | Retreat Inn | <StoreSurvey xmlns="http://schem... | EA21EC81-1BFA-4A07-9B4D-73D9852A... | 2014-09-12 09:15:07 |
697 | 1990 | 281 | Technical Parts Manufacturing | <StoreSurvey xmlns="http://schem... | C8E3C4ED-8F58-4DB2-B600-E0CD11D9... | 2014-09-12 09:15:07 |
698 | 1992 | 277 | Totes & Baskets Company | <StoreSurvey xmlns="http://schem... | CE860B58-643C-4567-BFD8-06E97969... | 2014-09-12 09:15:07 |
699 | 1994 | 277 | World of Bikes | <StoreSurvey xmlns="http://schem... | 0C10F2B6-A13A-440C-9C25-5B28D482... | 2014-09-12 09:15:07 |
700 | 2051 | 275 | A Bicycle Association | <StoreSurvey xmlns="http://schem... | 82237172-D3FE-4A95-82EF-636F6552... | 2014-09-12 09:15:07 |
701 rows x 6 columns
memory usage: 0.38 MB
name: Store
type: getml.DataFrame
1.2 Prepare data for getML¶
getML requires that we define roles for each of the columns.
product.set_role("ProductID", getml.data.roles.join_key)
product.set_role(["MakeFlag", "ProductSubcategoryID", "ProductModelID"], getml.data.roles.categorical)
product.set_role(["SafetyStockLevel", "ReorderPoint", "StandardCost", "ListPrice"], getml.data.roles.numerical)
product
name | ProductID | MakeFlag | ProductSubcategoryID | ProductModelID | SafetyStockLevel | ReorderPoint | StandardCost | ListPrice | FinishedGoodsFlag | DaysToManufacture | Name | ProductNumber | Color | Size | SizeUnitMeasureCode | WeightUnitMeasureCode | Weight | ProductLine | Class | Style | SellStartDate | SellEndDate | DiscontinuedDate | rowguid | ModifiedDate |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
role | join_key | categorical | categorical | categorical | numerical | numerical | numerical | numerical | unused_float | unused_float | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string |
0 | 1 | 0 | nan | nan | 1000 | 750 | 0 | 0 | 0 | 0 | Adjustable Race | AR-5381 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 2008-04-30 00:00:00 | NULL | NULL | 694215B7-08F7-4C0D-ACB1-D734BA44... | 2014-02-08 09:01:36 |
1 | 2 | 0 | nan | nan | 1000 | 750 | 0 | 0 | 0 | 0 | Bearing Ball | BA-8327 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 2008-04-30 00:00:00 | NULL | NULL | 58AE3C20-4F3A-4749-A7D4-D568806C... | 2014-02-08 09:01:36 |
2 | 3 | 1 | nan | nan | 800 | 600 | 0 | 0 | 0 | 1 | BB Ball Bearing | BE-2349 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 2008-04-30 00:00:00 | NULL | NULL | 9C21AED2-5BFA-4F18-BCB8-F11638DC... | 2014-02-08 09:01:36 |
3 | 4 | 0 | nan | nan | 800 | 600 | 0 | 0 | 0 | 0 | Headset Ball Bearings | BE-2908 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 2008-04-30 00:00:00 | NULL | NULL | ECFED6CB-51FF-49B5-B06C-7D8AC834... | 2014-02-08 09:01:36 |
4 | 316 | 1 | nan | nan | 800 | 600 | 0 | 0 | 0 | 1 | Blade | BL-2036 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 2008-04-30 00:00:00 | NULL | NULL | E73E9750-603B-4131-89F5-3DD15ED5... | 2014-02-08 09:01:36 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
499 | 995 | 1 | 5 | 96 | 500 | 375 | 44.9506 | 101.24 | 1 | 1 | ML Bottom Bracket | BB-8107 | NULL | NULL | NULL | G | 168.00 | NULL | M | NULL | 2013-05-30 00:00:00 | NULL | NULL | 71AB847F-D091-42D6-B735-7B0C2D82... | 2014-02-08 09:01:36 |
500 | 996 | 1 | 5 | 97 | 500 | 375 | 53.9416 | 121.49 | 1 | 1 | HL Bottom Bracket | BB-9108 | NULL | NULL | NULL | G | 170.00 | NULL | H | NULL | 2013-05-30 00:00:00 | NULL | NULL | 230C47C5-08B2-4CE3-B706-69C0BDD6... | 2014-02-08 09:01:36 |
501 | 997 | 1 | 2 | 31 | 100 | 75 | 343.6496 | 539.99 | 1 | 4 | Road-750 Black, 44 | BK-R19B-44 | Black | 44 | CM | LB | 19.77 | R | L | U | 2013-05-30 00:00:00 | NULL | NULL | 44CE4802-409F-43AB-9B27-CA534218... | 2014-02-08 09:01:36 |
502 | 998 | 1 | 2 | 31 | 100 | 75 | 343.6496 | 539.99 | 1 | 4 | Road-750 Black, 48 | BK-R19B-48 | Black | 48 | CM | LB | 20.13 | R | L | U | 2013-05-30 00:00:00 | NULL | NULL | 3DE9A212-1D49-40B6-B10A-F564D981... | 2014-02-08 09:01:36 |
503 | 999 | 1 | 2 | 31 | 100 | 75 | 343.6496 | 539.99 | 1 | 4 | Road-750 Black, 52 | BK-R19B-52 | Black | 52 | CM | LB | 20.42 | R | L | U | 2013-05-30 00:00:00 | NULL | NULL | AE638923-2B67-4679-B90E-ABBAB17D... | 2014-02-08 09:01:36 |
504 rows x 25 columns
memory usage: 0.16 MB
name: Product
type: getml.DataFrame
sales_order_detail.set_role(["SalesOrderID", "SalesOrderDetailID", "ProductID", "SpecialOfferID"], getml.data.roles.join_key)
sales_order_detail.set_role(["OrderQty", "UnitPrice", "UnitPriceDiscount", "LineTotal"], getml.data.roles.numerical)
sales_order_detail.set_role("ModifiedDate", getml.data.roles.time_stamp)
sales_order_detail
name | ModifiedDate | SalesOrderID | SalesOrderDetailID | ProductID | SpecialOfferID | OrderQty | UnitPrice | UnitPriceDiscount | LineTotal | CarrierTrackingNumber | rowguid |
---|---|---|---|---|---|---|---|---|---|---|---|
role | time_stamp | join_key | join_key | join_key | join_key | numerical | numerical | numerical | numerical | unused_string | unused_string |
unit | time stamp, comparison only | ||||||||||
0 | 2011-05-30 22:00:00 | 43659 | 1 | 776 | 1 | 1 | 2024.994 | 0 | 2024.994 | 4911-403C-98 | B207C96D-D9E6-402B-8470-2CC176C4... |
1 | 2011-05-30 22:00:00 | 43659 | 2 | 777 | 1 | 3 | 2024.994 | 0 | 6074.982 | 4911-403C-98 | 7ABB600D-1E77-41BE-9FE5-B9142CFC... |
2 | 2011-05-30 22:00:00 | 43659 | 3 | 778 | 1 | 1 | 2024.994 | 0 | 2024.994 | 4911-403C-98 | 475CF8C6-49F6-486E-B0AD-AFC6A50C... |
3 | 2011-05-30 22:00:00 | 43659 | 4 | 771 | 1 | 1 | 2039.994 | 0 | 2039.994 | 4911-403C-98 | 04C4DE91-5815-45D6-8670-F462719F... |
4 | 2011-05-30 22:00:00 | 43659 | 5 | 772 | 1 | 1 | 2039.994 | 0 | 2039.994 | 4911-403C-98 | 5A74C7D2-E641-438E-A7AC-37BF2328... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
121312 | 2014-06-29 22:00:00 | 75122 | 121313 | 878 | 1 | 1 | 21.98 | 0 | 21.98 | NULL | 8CAD6675-18CC-4F47-8287-97B41A8E... |
121313 | 2014-06-29 22:00:00 | 75122 | 121314 | 712 | 1 | 1 | 8.99 | 0 | 8.99 | NULL | 84F1C363-1C50-4442-BE16-541C59B6... |
121314 | 2014-06-29 22:00:00 | 75123 | 121315 | 878 | 1 | 1 | 21.98 | 0 | 21.98 | NULL | C18B6476-429F-4BB1-828E-2BE5F82A... |
121315 | 2014-06-29 22:00:00 | 75123 | 121316 | 879 | 1 | 1 | 159 | 0 | 159 | NULL | 75A89C6A-C60A-47EA-8A52-B52A9C43... |
121316 | 2014-06-29 22:00:00 | 75123 | 121317 | 712 | 1 | 1 | 8.99 | 0 | 8.99 | NULL | 73646D26-0461-450D-8019-2C6C8586... |
121317 rows x 11 columns
memory usage: 14.08 MB
name: SalesOrderDetail
type: getml.DataFrame
sales_order_reason.set_role("SalesOrderID", getml.data.roles.join_key)
sales_order_reason.set_role("SalesReasonID", getml.data.roles.categorical)
sales_order_reason
name | SalesOrderID | SalesReasonID | ModifiedDate |
---|---|---|---|
role | join_key | categorical | unused_string |
0 | 43697 | 5 | 2011-05-30 22:00:00 |
1 | 43697 | 9 | 2011-05-30 22:00:00 |
2 | 43702 | 5 | 2011-05-31 22:00:00 |
3 | 43702 | 9 | 2011-05-31 22:00:00 |
4 | 43703 | 5 | 2011-05-31 22:00:00 |
... | ... | ... | |
27642 | 75119 | 1 | 2014-06-29 22:00:00 |
27643 | 75120 | 1 | 2014-06-29 22:00:00 |
27644 | 75121 | 1 | 2014-06-29 22:00:00 |
27645 | 75122 | 1 | 2014-06-29 22:00:00 |
27646 | 75123 | 1 | 2014-06-29 22:00:00 |
27647 rows x 3 columns
memory usage: 1.00 MB
name: SalesOrderHeaderSalesReason
type: getml.DataFrame
special_offer.set_role(["SpecialOfferID"], getml.data.roles.join_key)
special_offer.set_role(["MinQty", "DiscountPct"], getml.data.roles.numerical)
special_offer.set_role(["Category", "Description", "Type"], getml.data.roles.categorical)
special_offer.set_role(["StartDate", "EndDate"], getml.data.roles.time_stamp)
special_offer
name | StartDate | EndDate | SpecialOfferID | Category | Description | Type | MinQty | DiscountPct | MaxQty | rowguid | ModifiedDate |
---|---|---|---|---|---|---|---|---|---|---|---|
role | time_stamp | time_stamp | join_key | categorical | categorical | categorical | numerical | numerical | unused_float | unused_string | unused_string |
unit | time stamp, comparison only | time stamp, comparison only | |||||||||
0 | 2011-05-01 | 2014-11-30 | 1 | No Discount | No Discount | No Discount | 0 | 0 | nan | 0290C4F5-191F-4337-AB6B-0A2DDE03... | 2011-03-31 22:00:00 |
1 | 2011-05-31 | 2014-05-30 | 2 | Reseller | Volume Discount 11 to 14 | Volume Discount | 11 | 0.02 | 14 | D7542EE7-15DB-4541-985C-5CC27AEF... | 2011-04-30 22:00:00 |
2 | 2011-05-31 | 2014-05-30 | 3 | Reseller | Volume Discount 15 to 24 | Volume Discount | 15 | 0.05 | 24 | 4BDBCC01-8CF7-40A9-B643-40EC5B71... | 2011-04-30 22:00:00 |
3 | 2011-05-31 | 2014-05-30 | 4 | Reseller | Volume Discount 25 to 40 | Volume Discount | 25 | 0.1 | 40 | 504B5E85-8F3F-4EBC-9E1D-C1BC5DEA... | 2011-04-30 22:00:00 |
4 | 2011-05-31 | 2014-05-30 | 5 | Reseller | Volume Discount 41 to 60 | Volume Discount | 41 | 0.15 | 60 | 677E1D9D-944F-4E81-90E8-47EB0A82... | 2011-04-30 22:00:00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
11 | 2013-05-30 | 2013-07-14 | 12 | Reseller | LL Road Frame Sale | Excess Inventory | 0 | 0.35 | nan | C0AF1C89-9722-4235-9248-3FBA4D9E... | 2013-04-29 22:00:00 |
12 | 2013-05-30 | 2013-08-29 | 13 | Reseller | Touring-3000 Promotion | New Product | 0 | 0.15 | nan | 5061CCE4-E021-45A8-9A75-DFB36CBB... | 2013-04-29 22:00:00 |
13 | 2013-05-30 | 2013-08-29 | 14 | Reseller | Touring-1000 Promotion | New Product | 0 | 0.2 | nan | 1AF84A9E-A98C-4BD9-B48F-DC2B8B6B... | 2013-04-29 22:00:00 |
14 | 2013-07-14 | 2013-08-14 | 15 | Customer | Half-Price Pedal Sale | Seasonal Discount | 0 | 0.5 | nan | 03E3594D-6EBB-46A6-B8EE-A9289C0C... | 2013-06-13 22:00:00 |
15 | 2014-03-31 | 2014-05-30 | 16 | Reseller | Mountain-500 Silver Clearance Sa... | Discontinued Product | 0 | 0.4 | nan | EB7CB484-BCCF-4D2D-BF73-521B2001... | 2014-02-28 23:00:00 |
16 rows x 11 columns
memory usage: 0.00 MB
name: SpecialOffer
type: getml.DataFrame
store.set_role(["SalesPersonID"], getml.data.roles.join_key)
store["test"] = store["ModifiedDate"].update(getml.data.random() > 0.5, "NULL")
store.set_role(["SalesPersonID"], getml.data.roles.join_key)
store.set_role(["test"], getml.data.roles.time_stamp)
store
name | test | SalesPersonID | BusinessEntityID | Name | Demographics | rowguid | ModifiedDate |
---|---|---|---|---|---|---|---|
role | time_stamp | join_key | unused_float | unused_string | unused_string | unused_string | unused_string |
unit | time stamp, comparison only | ||||||
0 | 2014-09-12 09:15:07 | 279 | 292 | Next-Door Bike Store | <StoreSurvey xmlns="http://schem... | A22517E3-848D-4EBE-B9D9-7437F343... | 2014-09-12 09:15:07 |
1 | NULL | 276 | 294 | Professional Sales and Service | <StoreSurvey xmlns="http://schem... | B50CA50B-C601-4A13-B07E-2C63862D... | 2014-09-12 09:15:07 |
2 | 2014-09-12 09:15:07 | 277 | 296 | Riders Company | <StoreSurvey xmlns="http://schem... | 337C3688-1339-4E1A-A08A-B54B2356... | 2014-09-12 09:15:07 |
3 | NULL | 275 | 298 | The Bike Mechanics | <StoreSurvey xmlns="http://schem... | 7894F278-F0C8-4D16-BD75-213FDBF1... | 2014-09-12 09:15:07 |
4 | NULL | 286 | 300 | Nationwide Supply | <StoreSurvey xmlns="http://schem... | C3FC9705-A8C4-4F3A-9550-EB2FA4B7... | 2014-09-12 09:15:07 |
... | ... | ... | ... | ... | ... | ... | |
696 | NULL | 282 | 1988 | Retreat Inn | <StoreSurvey xmlns="http://schem... | EA21EC81-1BFA-4A07-9B4D-73D9852A... | 2014-09-12 09:15:07 |
697 | 2014-09-12 09:15:07 | 281 | 1990 | Technical Parts Manufacturing | <StoreSurvey xmlns="http://schem... | C8E3C4ED-8F58-4DB2-B600-E0CD11D9... | 2014-09-12 09:15:07 |
698 | NULL | 277 | 1992 | Totes & Baskets Company | <StoreSurvey xmlns="http://schem... | CE860B58-643C-4567-BFD8-06E97969... | 2014-09-12 09:15:07 |
699 | NULL | 277 | 1994 | World of Bikes | <StoreSurvey xmlns="http://schem... | 0C10F2B6-A13A-440C-9C25-5B28D482... | 2014-09-12 09:15:07 |
700 | 2014-09-12 09:15:07 | 275 | 2051 | A Bicycle Association | <StoreSurvey xmlns="http://schem... | 82237172-D3FE-4A95-82EF-636F6552... | 2014-09-12 09:15:07 |
701 rows x 7 columns
memory usage: 0.38 MB
name: Store
type: getml.DataFrame
sales_order_header["SalesPersonIDCat"] = sales_order_header["SalesPersonID"]
sales_order_header["TerritoryIDCat"] = sales_order_header["TerritoryID"]
sales_order_header.set_role(["CustomerID", "SalesOrderID", "SalesPersonID", "TerritoryID"], getml.data.roles.join_key)
sales_order_header.set_role(
["RevisionNumber", "OnlineOrderFlag", "SalesPersonIDCat", "TerritoryIDCat", "ShipMethodID"],
getml.data.roles.categorical)
sales_order_header.set_role(["SubTotal", "TaxAmt", "Freight", "TotalDue"], getml.data.roles.numerical)
sales_order_header.set_role(["OrderDate", "DueDate", "ShipDate", "ModifiedDate"], getml.data.roles.time_stamp)
sales_order_header
name | OrderDate | DueDate | ShipDate | ModifiedDate | CustomerID | SalesOrderID | SalesPersonID | TerritoryID | RevisionNumber | OnlineOrderFlag | SalesPersonIDCat | TerritoryIDCat | ShipMethodID | SubTotal | TaxAmt | Freight | TotalDue | Status | BillToAddressID | ShipToAddressID | CreditCardID | CurrencyRateID | SalesOrderNumber | PurchaseOrderNumber | AccountNumber | CreditCardApprovalCode | Comment | rowguid |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
role | time_stamp | time_stamp | time_stamp | time_stamp | join_key | join_key | join_key | join_key | categorical | categorical | categorical | categorical | categorical | numerical | numerical | numerical | numerical | unused_float | unused_float | unused_float | unused_float | unused_float | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string |
unit | time stamp, comparison only | time stamp, comparison only | time stamp, comparison only | time stamp, comparison only | ||||||||||||||||||||||||
0 | 2011-05-30 22:00:00 | 2011-06-12 | 2011-06-07 | 2011-06-06 22:00:00 | 29825 | 43659 | 279 | 5 | 8 | 0 | 279 | 5 | 5 | 20565.6206 | 1971.5149 | 616.0984 | 23153.2339 | 5 | 985 | 985 | 16281 | nan | SO43659 | PO522145787 | 10-4020-000676 | 105041Vi84182 | NULL | 79B65321-39CA-4115-9CBA-8FE0903E... |
1 | 2011-05-30 22:00:00 | 2011-06-12 | 2011-06-07 | 2011-06-06 22:00:00 | 29672 | 43660 | 279 | 5 | 8 | 0 | 279 | 5 | 5 | 1294.2529 | 124.2483 | 38.8276 | 1457.3288 | 5 | 921 | 921 | 5618 | nan | SO43660 | PO18850127500 | 10-4020-000117 | 115213Vi29411 | NULL | 738DC42D-D03B-48A1-9822-F95A67EA... |
2 | 2011-05-30 22:00:00 | 2011-06-12 | 2011-06-07 | 2011-06-06 22:00:00 | 29734 | 43661 | 282 | 6 | 8 | 0 | 282 | 6 | 5 | 32726.4786 | 3153.7696 | 985.553 | 36865.8012 | 5 | 517 | 517 | 1346 | 4 | SO43661 | PO18473189620 | 10-4020-000442 | 85274Vi6854 | NULL | D91B9131-18A4-4A11-BC3A-90B6F53E... |
3 | 2011-05-30 22:00:00 | 2011-06-12 | 2011-06-07 | 2011-06-06 22:00:00 | 29994 | 43662 | 282 | 6 | 8 | 0 | 282 | 6 | 5 | 28832.5289 | 2775.1646 | 867.2389 | 32474.9324 | 5 | 482 | 482 | 10456 | 4 | SO43662 | PO18444174044 | 10-4020-000227 | 125295Vi53935 | NULL | 4A1ECFC0-CC3A-4740-B028-1C50BB48... |
4 | 2011-05-30 22:00:00 | 2011-06-12 | 2011-06-07 | 2011-06-06 22:00:00 | 29565 | 43663 | 276 | 4 | 8 | 0 | 276 | 4 | 5 | 419.4589 | 40.2681 | 12.5838 | 472.3108 | 5 | 1073 | 1073 | 4322 | nan | SO43663 | PO18009186470 | 10-4020-000510 | 45303Vi22691 | NULL | 9B1E7A40-6AE0-4AD3-811C-A6495185... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
31460 | 2014-06-29 22:00:00 | 2014-07-12 | 2014-07-07 | 2014-07-06 22:00:00 | 11981 | 75119 | nan | 1 | 8 | 1 | nan | 1 | 1 | 42.28 | 3.3824 | 1.057 | 46.7194 | 5 | 17649 | 17649 | 6761 | nan | SO75119 | NULL | 10-4030-011981 | 429826Vi35166 | NULL | 9382F1C9-383A-435F-9449-0EECEA21... |
31461 | 2014-06-29 22:00:00 | 2014-07-12 | 2014-07-07 | 2014-07-06 22:00:00 | 18749 | 75120 | nan | 6 | 8 | 1 | nan | 6 | 1 | 84.96 | 6.7968 | 2.124 | 93.8808 | 5 | 28374 | 28374 | 8925 | nan | SO75120 | NULL | 10-4030-018749 | 929849Vi46003 | NULL | AE6A4FCF-FF73-4CD4-AF2C-5993D00D... |
31462 | 2014-06-29 22:00:00 | 2014-07-12 | 2014-07-07 | 2014-07-06 22:00:00 | 15251 | 75121 | nan | 6 | 8 | 1 | nan | 6 | 1 | 74.98 | 5.9984 | 1.8745 | 82.8529 | 5 | 26553 | 26553 | 14220 | nan | SO75121 | NULL | 10-4030-015251 | 529864Vi73738 | NULL | D7395C0E-00CB-4BFA-A238-0D6A9F49... |
31463 | 2014-06-29 22:00:00 | 2014-07-12 | 2014-07-07 | 2014-07-06 22:00:00 | 15868 | 75122 | nan | 6 | 8 | 1 | nan | 6 | 1 | 30.97 | 2.4776 | 0.7743 | 34.2219 | 5 | 14616 | 14616 | 18719 | nan | SO75122 | NULL | 10-4030-015868 | 330022Vi97312 | NULL | 4221035A-4159-492F-AF40-4363A64F... |
31464 | 2014-06-29 22:00:00 | 2014-07-12 | 2014-07-07 | 2014-07-06 22:00:00 | 18759 | 75123 | nan | 6 | 8 | 1 | nan | 6 | 1 | 189.97 | 15.1976 | 4.7493 | 209.9169 | 5 | 14024 | 14024 | 10084 | nan | SO75123 | NULL | 10-4030-018759 | 230370Vi51970 | NULL | D54752FF-2B54-4BE5-95EA-3B72289C... |
31465 rows x 28 columns
memory usage: 8.34 MB
name: SalesOrderHeader
type: getml.DataFrame
We must also define customer churn. In this case, we define customer churn as a customer not making another purchase within 180 days of his or her last purchase.
Thus, the churn variable is defined as follows:
- 0, if another purchase by the same customer has been made within 180 days after
OrderDate
- 1, if no purchase by the same customer has been made within 180 days after
OrderDate
- NULL, if
max(OrderDate) - OrderDate <= 180 days
NULL targets can not be used in our analysis.
sales_order_header_pd = sales_order_header[["OrderDate", "CustomerID", "SalesOrderID"]].to_pandas()
repeat_purchases = sales_order_header_pd.merge(
sales_order_header_pd[["OrderDate", "CustomerID"]],
on="CustomerID",
how="left",
)
repeat_purchases = repeat_purchases[
repeat_purchases["OrderDate_y"] > repeat_purchases["OrderDate_x"]
]
repeat_purchases = repeat_purchases[
repeat_purchases["OrderDate_y"] - repeat_purchases["OrderDate_x"] > pd.Timedelta('180 days')
]
repeat_purchases.groupby("SalesOrderID", as_index=False).aggregate({"CustomerID": "max"})
repeat_purchase_ids = {sid: True for sid in repeat_purchases["SalesOrderID"]}
cut_off_date = max(sales_order_header_pd["OrderDate"]) - pd.Timedelta('180 days')
churn = np.asarray([
np.nan if order_date >= cut_off_date else 0 if order_id in repeat_purchase_ids else 1
for (order_date, order_id) in zip(sales_order_header_pd["OrderDate"], sales_order_header_pd["SalesOrderID"])
])
sales_order_header["churn"] = churn
sales_order_header = sales_order_header[~sales_order_header.churn.is_nan()].to_df("SalesOrderHeaderRefined")
sales_order_header.set_role("churn", getml.data.roles.target)
sales_order_header
name | OrderDate | DueDate | ShipDate | ModifiedDate | CustomerID | SalesOrderID | SalesPersonID | TerritoryID | churn | RevisionNumber | OnlineOrderFlag | SalesPersonIDCat | TerritoryIDCat | ShipMethodID | SubTotal | TaxAmt | Freight | TotalDue | Status | BillToAddressID | ShipToAddressID | CreditCardID | CurrencyRateID | SalesOrderNumber | PurchaseOrderNumber | AccountNumber | CreditCardApprovalCode | Comment | rowguid |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
role | time_stamp | time_stamp | time_stamp | time_stamp | join_key | join_key | join_key | join_key | target | categorical | categorical | categorical | categorical | categorical | numerical | numerical | numerical | numerical | unused_float | unused_float | unused_float | unused_float | unused_float | unused_string | unused_string | unused_string | unused_string | unused_string | unused_string |
unit | time stamp, comparison only | time stamp, comparison only | time stamp, comparison only | time stamp, comparison only | |||||||||||||||||||||||||
0 | 2011-05-30 22:00:00 | 2011-06-12 | 2011-06-07 | 2011-06-06 22:00:00 | 29825 | 43659 | 279 | 5 | 0 | 8 | 0 | 279 | 5 | 5 | 20565.6206 | 1971.5149 | 616.0984 | 23153.2339 | 5 | 985 | 985 | 16281 | nan | SO43659 | PO522145787 | 10-4020-000676 | 105041Vi84182 | NULL | 79B65321-39CA-4115-9CBA-8FE0903E... |
1 | 2011-05-30 22:00:00 | 2011-06-12 | 2011-06-07 | 2011-06-06 22:00:00 | 29672 | 43660 | 279 | 5 | 0 | 8 | 0 | 279 | 5 | 5 | 1294.2529 | 124.2483 | 38.8276 | 1457.3288 | 5 | 921 | 921 | 5618 | nan | SO43660 | PO18850127500 | 10-4020-000117 | 115213Vi29411 | NULL | 738DC42D-D03B-48A1-9822-F95A67EA... |
2 | 2011-05-30 22:00:00 | 2011-06-12 | 2011-06-07 | 2011-06-06 22:00:00 | 29734 | 43661 | 282 | 6 | 0 | 8 | 0 | 282 | 6 | 5 | 32726.4786 | 3153.7696 | 985.553 | 36865.8012 | 5 | 517 | 517 | 1346 | 4 | SO43661 | PO18473189620 | 10-4020-000442 | 85274Vi6854 | NULL | D91B9131-18A4-4A11-BC3A-90B6F53E... |
3 | 2011-05-30 22:00:00 | 2011-06-12 | 2011-06-07 | 2011-06-06 22:00:00 | 29994 | 43662 | 282 | 6 | 0 | 8 | 0 | 282 | 6 | 5 | 28832.5289 | 2775.1646 | 867.2389 | 32474.9324 | 5 | 482 | 482 | 10456 | 4 | SO43662 | PO18444174044 | 10-4020-000227 | 125295Vi53935 | NULL | 4A1ECFC0-CC3A-4740-B028-1C50BB48... |
4 | 2011-05-30 22:00:00 | 2011-06-12 | 2011-06-07 | 2011-06-06 22:00:00 | 29565 | 43663 | 276 | 4 | 0 | 8 | 0 | 276 | 4 | 5 | 419.4589 | 40.2681 | 12.5838 | 472.3108 | 5 | 1073 | 1073 | 4322 | nan | SO43663 | PO18009186470 | 10-4020-000510 | 45303Vi22691 | NULL | 9B1E7A40-6AE0-4AD3-811C-A6495185... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
19699 | 2013-12-30 23:00:00 | 2014-01-12 | 2014-01-07 | 2014-01-06 23:00:00 | 20826 | 63358 | nan | 7 | 1 | 8 | 1 | nan | 7 | 1 | 1173.96 | 93.9168 | 29.349 | 1297.2258 | 5 | 24387 | 24387 | 3239 | nan | SO63358 | NULL | 10-4030-020826 | 1142084Vi17039 | NULL | 41278FBB-3DD8-488B-AEA5-8BF4A6F1... |
19700 | 2013-12-30 23:00:00 | 2014-01-12 | 2014-01-07 | 2014-01-06 23:00:00 | 24114 | 63359 | nan | 10 | 1 | 8 | 1 | nan | 10 | 1 | 1179.47 | 94.3576 | 29.4868 | 1303.3144 | 5 | 29682 | 29682 | nan | 10770 | SO63359 | NULL | 10-4030-024114 | NULL | NULL | 749532D4-BFF7-4FEC-9F77-396C4A96... |
19701 | 2013-12-30 23:00:00 | 2014-01-12 | 2014-01-07 | 2014-01-06 23:00:00 | 11417 | 63360 | nan | 7 | 1 | 8 | 1 | nan | 7 | 1 | 548.98 | 43.9184 | 13.7245 | 606.6229 | 5 | 21465 | 21465 | 6582 | nan | SO63360 | NULL | 10-4030-011417 | 242387Vi34223 | NULL | F92FA2A3-73E1-4DD0-987C-99D1C87E... |
19702 | 2013-12-30 23:00:00 | 2014-01-12 | 2014-01-07 | 2014-01-06 23:00:00 | 18125 | 63361 | nan | 8 | 1 | 8 | 1 | nan | 8 | 1 | 2384.07 | 190.7256 | 59.6018 | 2634.3974 | 5 | 26562 | 26562 | 11994 | nan | SO63361 | NULL | 10-4030-018125 | 1242859Vi61993 | NULL | 3A3758BF-CDFA-4740-9104-87BD5A08... |
19703 | 2013-12-30 23:00:00 | 2014-01-12 | 2014-01-07 | 2014-01-06 23:00:00 | 15692 | 63362 | nan | 10 | 1 | 8 | 1 | nan | 10 | 1 | 2419.06 | 193.5248 | 60.4765 | 2673.0613 | 5 | 27090 | 27090 | 16083 | 10770 | SO63362 | NULL | 10-4030-015692 | 242864Vi83167 | NULL | 3151D31F-4020-41B2-87F3-B6732869... |
19704 rows x 29 columns
memory usage: 5.39 MB
name: SalesOrderHeaderRefined
type: getml.DataFrame
split = getml.data.split.random(train=0.8, test=0.2)
split
0 | train |
---|---|
1 | train |
2 | train |
3 | test |
4 | train |
... |
infinite number of rows
type: StringColumnView
container = getml.data.Container(population=sales_order_header, split=split)
container.add(
product=product,
sales_order_detail=sales_order_detail,
sales_order_header=sales_order_header,
sales_order_reason=sales_order_reason,
special_offer=special_offer,
store=store,
)
container
subset | name | rows | type | |
---|---|---|---|---|
0 | test | SalesOrderHeaderRefined | 3879 | View |
1 | train | SalesOrderHeaderRefined | 15825 | View |
alias | name | rows | type | |
---|---|---|---|---|
0 | product | Product | 504 | DataFrame |
1 | sales_order_detail | SalesOrderDetail | 121317 | DataFrame |
2 | sales_order_header | SalesOrderHeaderRefined | 19704 | DataFrame |
3 | sales_order_reason | SalesOrderHeaderSalesReason | 27647 | DataFrame |
4 | special_offer | SpecialOffer | 16 | DataFrame |
5 | store | Store | 701 | DataFrame |
2. Predictive modeling¶
We loaded the data and defined the roles and units. Next, we create a getML pipeline for relational learning.
2.1 Define relational model¶
dm = getml.data.DataModel(sales_order_header.to_placeholder("population"))
dm.add(getml.data.to_placeholder(
product=product,
sales_order_detail=sales_order_detail,
sales_order_header=sales_order_header,
sales_order_reason=sales_order_reason,
special_offer=special_offer,
store=store,
))
dm.population.join(
dm.sales_order_header,
on="CustomerID",
time_stamps="OrderDate",
lagged_targets=True,
horizon=getml.data.time.days(1),
)
dm.population.join(
dm.sales_order_detail,
on="SalesOrderID",
)
dm.population.join(
dm.sales_order_reason,
on="SalesOrderID",
)
dm.population.join(
dm.store,
on="SalesPersonID",
)
dm.sales_order_detail.join(
dm.product,
on="ProductID",
relationship=getml.data.relationship.many_to_one,
)
dm.sales_order_detail.join(
dm.special_offer,
on="SpecialOfferID",
relationship=getml.data.relationship.many_to_one,
)
dm
data frames | staging table | |
---|---|---|
0 | population | POPULATION__STAGING_TABLE_1 |
1 | sales_order_detail, product, special_offer | SALES_ORDER_DETAIL__STAGING_TABLE_2 |
2 | sales_order_header | SALES_ORDER_HEADER__STAGING_TABLE_3 |
3 | sales_order_reason | SALES_ORDER_REASON__STAGING_TABLE_4 |
4 | store | STORE__STAGING_TABLE_5 |
2.2 getML pipeline¶
Set-up the feature learner & predictor
seasonal = getml.preprocessors.Seasonal()
mapping = getml.preprocessors.Mapping()
fast_prop = getml.feature_learning.FastProp(
loss_function=getml.feature_learning.loss_functions.CrossEntropyLoss,
num_threads=1,
num_features=400,
)
relboost = getml.feature_learning.Relboost(
loss_function=getml.feature_learning.loss_functions.CrossEntropyLoss,
num_threads=1,
)
predictor = getml.predictors.XGBoostClassifier(n_jobs=1)
Build the pipeline
pipe1 = getml.Pipeline(
tags=['fast_prop'],
data_model=dm,
preprocessors=[mapping],
feature_learners=[fast_prop],
predictors=[predictor],
include_categorical=True,
)
pipe1
Pipeline(data_model='population', feature_learners=['FastProp'], feature_selectors=[], include_categorical=True, loss_function='CrossEntropyLoss', peripheral=['product', 'sales_order_detail', 'sales_order_header', 'sales_order_reason', 'special_offer', 'store'], predictors=['XGBoostClassifier'], preprocessors=['Mapping'], share_selected_features=0.5, tags=['fast_prop'])
pipe2 = getml.Pipeline(
tags=['relboost'],
data_model=dm,
preprocessors=[seasonal, mapping],
feature_learners=[relboost],
predictors=[predictor],
include_categorical=True,
)
pipe2
Pipeline(data_model='population', feature_learners=['Relboost'], feature_selectors=[], include_categorical=True, loss_function='CrossEntropyLoss', peripheral=['product', 'sales_order_detail', 'sales_order_header', 'sales_order_reason', 'special_offer', 'store'], predictors=['XGBoostClassifier'], preprocessors=['Seasonal', 'Mapping'], share_selected_features=0.5, tags=['relboost'])
2.3 Model training¶
pipe1.check(container.train)
Checking data model...
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Preprocessing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Checking... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
The pipeline check generated 2 issues labeled INFO and 0 issues labeled WARNING.
type | label | message | |
---|---|---|---|
0 | INFO | FOREIGN KEYS NOT FOUND | When joining POPULATION__STAGING_TABLE_1 and SALES_ORDER_REASON__STAGING_TABLE_4 over 'SalesOrderID' and 'SalesOrderID', there are no corresponding entries for 33.769352% of entries in 'SalesOrderID' in 'POPULATION__STAGING_TABLE_1'. You might want to double-check your join keys. |
1 | INFO | FOREIGN KEYS NOT FOUND | When joining POPULATION__STAGING_TABLE_1 and STORE__STAGING_TABLE_5 over 'SalesPersonID' and 'SalesPersonID', there are no corresponding entries for 84.941548% of entries in 'SalesPersonID' in 'POPULATION__STAGING_TABLE_1'. You might want to double-check your join keys. |
pipe1.fit(container.train)
Checking data model...
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Preprocessing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
The pipeline check generated 2 issues labeled INFO and 0 issues labeled WARNING.
To see the issues in full, run .check() on the pipeline.
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Preprocessing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 FastProp: Trying 710 features... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:02 FastProp: Building features... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:01 XGBoost: Training as predictor... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:15
Trained pipeline.
Time taken: 0:00:18.910607.
Pipeline(data_model='population', feature_learners=['FastProp'], feature_selectors=[], include_categorical=True, loss_function='CrossEntropyLoss', peripheral=['product', 'sales_order_detail', 'sales_order_header', 'sales_order_reason', 'special_offer', 'store'], predictors=['XGBoostClassifier'], preprocessors=['Mapping'], share_selected_features=0.5, tags=['fast_prop', 'container-OeYnEZ'])
pipe2.check(container.train)
Checking data model...
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Preprocessing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Checking... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
The pipeline check generated 2 issues labeled INFO and 0 issues labeled WARNING.
type | label | message | |
---|---|---|---|
0 | INFO | FOREIGN KEYS NOT FOUND | When joining POPULATION__STAGING_TABLE_1 and SALES_ORDER_REASON__STAGING_TABLE_4 over 'SalesOrderID' and 'SalesOrderID', there are no corresponding entries for 33.769352% of entries in 'SalesOrderID' in 'POPULATION__STAGING_TABLE_1'. You might want to double-check your join keys. |
1 | INFO | FOREIGN KEYS NOT FOUND | When joining POPULATION__STAGING_TABLE_1 and STORE__STAGING_TABLE_5 over 'SalesPersonID' and 'SalesPersonID', there are no corresponding entries for 84.941548% of entries in 'SalesPersonID' in 'POPULATION__STAGING_TABLE_1'. You might want to double-check your join keys. |
pipe2.fit(container.train)
Checking data model...
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Preprocessing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
The pipeline check generated 2 issues labeled INFO and 0 issues labeled WARNING.
To see the issues in full, run .check() on the pipeline.
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Preprocessing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Relboost: Training features... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:37 Relboost: Building features... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:05 XGBoost: Training as predictor... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:03
Trained pipeline.
Time taken: 0:00:47.069322.
Pipeline(data_model='population', feature_learners=['Relboost'], feature_selectors=[], include_categorical=True, loss_function='CrossEntropyLoss', peripheral=['product', 'sales_order_detail', 'sales_order_header', 'sales_order_reason', 'special_offer', 'store'], predictors=['XGBoostClassifier'], preprocessors=['Seasonal', 'Mapping'], share_selected_features=0.5, tags=['relboost', 'container-OeYnEZ'])
2.4 Model evaluation¶
fastprop_score = pipe1.score(container.test)
fastprop_score
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Preprocessing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 FastProp: Building features... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
date time | set used | target | accuracy | auc | cross entropy | |
---|---|---|---|---|---|---|
0 | 2024-09-12 13:01:58 | train | churn | 0.9187 | 0.975 | 0.2106 |
1 | 2024-09-12 13:02:48 | test | churn | 0.9142 | 0.9723 | 0.2199 |
relboost_score = pipe2.score(container.test)
relboost_score
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Preprocessing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Relboost: Building features... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:01
date time | set used | target | accuracy | auc | cross entropy | |
---|---|---|---|---|---|---|
0 | 2024-09-12 13:02:47 | train | churn | 0.9323 | 0.983 | 0.169 |
1 | 2024-09-12 13:02:51 | test | churn | 0.9286 | 0.9788 | 0.1888 |
2.5 featuretools¶
population_train_pd = container.train.population.drop(container.train.population.roles.unused).to_pandas()
population_test_pd = container.test.population.drop(container.train.population.roles.unused).to_pandas()
product_pd = product.drop(product.roles.unused).to_pandas()
sales_order_detail_pd = sales_order_detail.drop(sales_order_detail.roles.unused).to_pandas()
sales_order_header_pd = sales_order_header.drop(sales_order_header.roles.unused).to_pandas()
sales_order_reason_pd = sales_order_reason.drop(sales_order_reason.roles.unused).to_pandas()
special_offer_pd = special_offer.drop(special_offer.roles.unused).to_pandas()
store_pd = store.drop(store.roles.unused).to_pandas()
featuretools does not support many-to-one joins. Therefore, we must manually merge sales_order_detail_pd
, product_pd
and special_offer_pd
.
sales_order_detail_pd = sales_order_detail_pd.merge(
product_pd,
on="ProductID",
how="left",
)
sales_order_detail_pd = sales_order_detail_pd.merge(
special_offer_pd,
on="SpecialOfferID",
how="left",
)
del sales_order_detail_pd["SalesOrderDetailID"]
del sales_order_detail_pd["ProductID"]
del sales_order_detail_pd["SpecialOfferID"]
sales_order_detail_pd
SalesOrderID | OrderQty | UnitPrice | UnitPriceDiscount | LineTotal | ModifiedDate | MakeFlag | ProductSubcategoryID | ProductModelID | SafetyStockLevel | ReorderPoint | StandardCost | ListPrice | Category | Description | Type | MinQty | DiscountPct | StartDate | EndDate | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 43659 | 1.0 | 2024.994 | 0.0 | 2024.994 | 2011-05-30 22:00:00 | 1 | 1 | 19 | 100.0 | 75.0 | 1898.0944 | 3374.99 | No Discount | No Discount | No Discount | 0.0 | 0.0 | 2011-05-01 | 2014-11-30 |
1 | 43659 | 3.0 | 2024.994 | 0.0 | 6074.982 | 2011-05-30 22:00:00 | 1 | 1 | 19 | 100.0 | 75.0 | 1898.0944 | 3374.99 | No Discount | No Discount | No Discount | 0.0 | 0.0 | 2011-05-01 | 2014-11-30 |
2 | 43659 | 1.0 | 2024.994 | 0.0 | 2024.994 | 2011-05-30 22:00:00 | 1 | 1 | 19 | 100.0 | 75.0 | 1898.0944 | 3374.99 | No Discount | No Discount | No Discount | 0.0 | 0.0 | 2011-05-01 | 2014-11-30 |
3 | 43659 | 1.0 | 2039.994 | 0.0 | 2039.994 | 2011-05-30 22:00:00 | 1 | 1 | 19 | 100.0 | 75.0 | 1912.1544 | 3399.99 | No Discount | No Discount | No Discount | 0.0 | 0.0 | 2011-05-01 | 2014-11-30 |
4 | 43659 | 1.0 | 2039.994 | 0.0 | 2039.994 | 2011-05-30 22:00:00 | 1 | 1 | 19 | 100.0 | 75.0 | 1912.1544 | 3399.99 | No Discount | No Discount | No Discount | 0.0 | 0.0 | 2011-05-01 | 2014-11-30 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
121312 | 75122 | 1.0 | 21.980 | 0.0 | 21.980 | 2014-06-29 22:00:00 | 0 | 30 | 121 | 4.0 | 3.0 | 8.2205 | 21.98 | No Discount | No Discount | No Discount | 0.0 | 0.0 | 2011-05-01 | 2014-11-30 |
121313 | 75122 | 1.0 | 8.990 | 0.0 | 8.990 | 2014-06-29 22:00:00 | 0 | 19 | 2 | 4.0 | 3.0 | 6.9223 | 8.99 | No Discount | No Discount | No Discount | 0.0 | 0.0 | 2011-05-01 | 2014-11-30 |
121314 | 75123 | 1.0 | 21.980 | 0.0 | 21.980 | 2014-06-29 22:00:00 | 0 | 30 | 121 | 4.0 | 3.0 | 8.2205 | 21.98 | No Discount | No Discount | No Discount | 0.0 | 0.0 | 2011-05-01 | 2014-11-30 |
121315 | 75123 | 1.0 | 159.000 | 0.0 | 159.000 | 2014-06-29 22:00:00 | 0 | 27 | 122 | 4.0 | 3.0 | 59.4660 | 159.00 | No Discount | No Discount | No Discount | 0.0 | 0.0 | 2011-05-01 | 2014-11-30 |
121316 | 75123 | 1.0 | 8.990 | 0.0 | 8.990 | 2014-06-29 22:00:00 | 0 | 19 | 2 | 4.0 | 3.0 | 6.9223 | 8.99 | No Discount | No Discount | No Discount | 0.0 | 0.0 | 2011-05-01 | 2014-11-30 |
121317 rows × 20 columns
def prepare_sales_order_header(peripheral_pd, train_or_test):
"""
Helper function that imitates the behavior of
the data model defined above.
"""
peripheral_new = peripheral_pd.merge(
train_or_test[["CustomerID", "OrderDate", "SalesOrderID"]],
on="CustomerID"
)
peripheral_new = peripheral_new[
peripheral_new["OrderDate_x"] < peripheral_new["OrderDate_y"]
]
del peripheral_new["SalesOrderID_x"]
del peripheral_new["OrderDate_y"]
del peripheral_new["CustomerID"]
del peripheral_new["SalesPersonIDCat"]
del peripheral_new["TerritoryIDCat"]
return peripheral_new.rename(columns={"OrderDate_x": "OrderDate", "SalesOrderID_y": "SalesOrderID"})
def prepare_store(peripheral_pd, train_or_test):
"""
Helper function that imitates the behavior of
the data model defined above.
"""
peripheral_new = peripheral_pd.merge(
train_or_test[["SalesPersonID", "SalesOrderID"]],
on="SalesPersonID"
)
return peripheral_new
store_train_pd = prepare_store(store_pd, population_train_pd)
store_test_pd = prepare_store(store_pd, population_test_pd)
store_train_pd
SalesPersonID | test | SalesOrderID | |
---|---|---|---|
0 | 279 | 2014-09-12 09:15:07 | 43659 |
1 | 279 | 2014-09-12 09:15:07 | 43660 |
2 | 279 | 2014-09-12 09:15:07 | 43681 |
3 | 279 | 2014-09-12 09:15:07 | 43685 |
4 | 279 | 2014-09-12 09:15:07 | 43695 |
... | ... | ... | ... |
142427 | 275 | 2014-09-12 09:15:07 | 63252 |
142428 | 275 | 2014-09-12 09:15:07 | 63258 |
142429 | 275 | 2014-09-12 09:15:07 | 63262 |
142430 | 275 | 2014-09-12 09:15:07 | 63280 |
142431 | 275 | 2014-09-12 09:15:07 | 63285 |
142432 rows × 3 columns
sales_order_header_train_pd = prepare_sales_order_header(sales_order_header_pd, population_train_pd)
sales_order_header_test_pd = prepare_sales_order_header(sales_order_header_pd, population_test_pd)
sales_order_header_train_pd
RevisionNumber | OnlineOrderFlag | ShipMethodID | SalesPersonID | TerritoryID | SubTotal | TaxAmt | Freight | TotalDue | churn | OrderDate | DueDate | ShipDate | ModifiedDate | SalesOrderID | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 8 | 0 | 5 | 279 | 5 | 20565.6206 | 1971.5149 | 616.0984 | 23153.2339 | 0.0 | 2011-05-30 22:00:00 | 2011-06-12 | 2011-06-07 | 2011-06-06 22:00:00 | 44305 |
2 | 8 | 0 | 5 | 279 | 5 | 20565.6206 | 1971.5149 | 616.0984 | 23153.2339 | 0.0 | 2011-05-30 22:00:00 | 2011-06-12 | 2011-06-07 | 2011-06-06 22:00:00 | 45061 |
3 | 8 | 0 | 5 | 279 | 5 | 20565.6206 | 1971.5149 | 616.0984 | 23153.2339 | 0.0 | 2011-05-30 22:00:00 | 2011-06-12 | 2011-06-07 | 2011-06-06 22:00:00 | 45779 |
4 | 8 | 0 | 5 | 279 | 5 | 20565.6206 | 1971.5149 | 616.0984 | 23153.2339 | 0.0 | 2011-05-30 22:00:00 | 2011-06-12 | 2011-06-07 | 2011-06-06 22:00:00 | 46604 |
5 | 8 | 0 | 5 | 279 | 5 | 20565.6206 | 1971.5149 | 616.0984 | 23153.2339 | 0.0 | 2011-05-30 22:00:00 | 2011-06-12 | 2011-06-07 | 2011-06-06 22:00:00 | 47693 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
38640 | 8 | 1 | 1 | None | 6 | 21.4900 | 1.7192 | 0.5373 | 23.7465 | 0.0 | 2013-12-21 23:00:00 | 2014-01-03 | 2013-12-29 | 2013-12-28 23:00:00 | 62961 |
38914 | 8 | 1 | 1 | None | 6 | 49.4800 | 3.9584 | 1.2370 | 54.6754 | 1.0 | 2013-12-24 23:00:00 | 2014-01-06 | 2014-01-01 | 2013-12-31 23:00:00 | 62875 |
39079 | 8 | 1 | 1 | None | 6 | 34.9900 | 2.7992 | 0.8748 | 38.6640 | 1.0 | 2013-12-26 23:00:00 | 2014-01-08 | 2014-01-03 | 2014-01-02 23:00:00 | 63307 |
39116 | 8 | 1 | 1 | None | 6 | 108.4600 | 8.6768 | 2.7115 | 119.8483 | 1.0 | 2013-12-26 23:00:00 | 2014-01-08 | 2014-01-03 | 2014-01-02 23:00:00 | 63307 |
39201 | 8 | 1 | 1 | None | 6 | 53.9900 | 4.3192 | 1.3498 | 59.6590 | 0.0 | 2013-12-27 23:00:00 | 2014-01-09 | 2014-01-04 | 2014-01-03 23:00:00 | 63003 |
12271 rows × 15 columns
del population_train_pd["CustomerID"]
del population_train_pd["SalesPersonIDCat"]
del population_test_pd["CustomerID"]
del population_test_pd["SalesPersonIDCat"]
featuretools's dataframes are similar for getML's container class. featuretools uses woodwork for typing information, which will be guessed automatically, but can also be provided manually.
def add_index(df):
df.insert(0, "index", range(len(df)))
population_pd_logical_types = {
'RevisionNumber': ww.logical_types.Categorical,
'OnlineOrderFlag': ww.logical_types.Categorical,
'ShipMethodID': ww.logical_types.Integer,
'TerritoryIDCat': ww.logical_types.Integer,
'SalesOrderID': ww.logical_types.Integer,
'SalesPersonID': ww.logical_types.IntegerNullable,
'TerritoryID': ww.logical_types.IntegerNullable,
'SubTotal': ww.logical_types.Double,
'TaxAmt': ww.logical_types.Double,
'Freight': ww.logical_types.Double,
'TotalDue': ww.logical_types.Double,
'churn': ww.logical_types.Categorical,
'OrderDate': ww.logical_types.Datetime,
'DueDate': ww.logical_types.Datetime,
'ShipDate': ww.logical_types.Datetime,
'ModifiedDate': ww.logical_types.Datetime
}
population_train_pd.ww.init(logical_types=population_pd_logical_types, index="SalesOrderID", name="population")
population_test_pd.ww.init(logical_types=population_pd_logical_types, index="SalesOrderID", name="population")
sales_order_detail_pd_logical_types = {
'index': ww.logical_types.Integer,
'SalesOrderID': ww.logical_types.Integer,
'OrderQty': ww.logical_types.Integer,
'UnitPrice': ww.logical_types.Double,
'UnitPriceDiscount': ww.logical_types.Double,
'LineTotal': ww.logical_types.Double,
'ModifiedDate': ww.logical_types.Datetime,
# merged with product_pd
'MakeFlag': ww.logical_types.Categorical,
'ProductSubcategoryID': ww.logical_types.Categorical,
'ProductModelID': ww.logical_types.Integer,
'ReorderPoint': ww.logical_types.Integer,
'StandardCost': ww.logical_types.Double,
'ListPrice': ww.logical_types.Double,
# merged with special_offer_pd
'Category': ww.logical_types.Categorical,
'Description': ww.logical_types.Categorical,
'Type': ww.logical_types.Categorical,
'MinQty': ww.logical_types.Integer,
'DiscountPct': ww.logical_types.Double,
'StartDate': ww.logical_types.Datetime,
'EndDate': ww.logical_types.Datetime
}
add_index(sales_order_detail_pd)
sales_order_detail_pd.ww.init(logical_types=sales_order_detail_pd_logical_types, index="index", name="sales_order_detail")
sales_order_header_pd_logical_types = {
'index': ww.logical_types.Integer,
'RevisionNumber': ww.logical_types.Categorical,
'OnlineOrderFlag': ww.logical_types.Categorical,
'ShipMethodID': ww.logical_types.Integer,
'SalesOrderID': ww.logical_types.Integer,
'SalesPersonID': ww.logical_types.IntegerNullable,
'TerritoryID': ww.logical_types.IntegerNullable,
'SubTotal': ww.logical_types.Double,
'TaxAmt': ww.logical_types.Double,
'Freight': ww.logical_types.Double,
'TotalDue': ww.logical_types.Double,
'churn': ww.logical_types.Categorical,
'OrderDate': ww.logical_types.Datetime,
'DueDate': ww.logical_types.Datetime,
'ShipDate': ww.logical_types.Datetime,
'ModifiedDate': ww.logical_types.Datetime
}
add_index(sales_order_header_train_pd)
sales_order_header_train_pd.ww.init(logical_types=sales_order_header_pd_logical_types, index="index", name="sales_order_header")
add_index(sales_order_header_test_pd)
sales_order_header_test_pd.ww.init(logical_types=sales_order_header_pd_logical_types, index="index", name="sales_order_header")
sales_order_reason_pd_logical_types = {
'index': ww.logical_types.Integer,
'SalesReasonID': ww.logical_types.Categorical,
'SalesOrderID': ww.logical_types.Integer
}
add_index(sales_order_reason_pd)
sales_order_reason_pd.ww.init(logical_types=sales_order_reason_pd_logical_types, index="index", name="sales_order_reason")
store_pd_logical_types = {
'index': ww.logical_types.Integer,
'SalesPersonID': ww.logical_types.IntegerNullable,
'test': ww.logical_types.Datetime,
'SalesOrderID': ww.logical_types.Integer
}
add_index(store_train_pd)
store_train_pd.ww.init(logical_types=store_pd_logical_types, index="index", name="store")
add_index(store_test_pd)
store_test_pd.ww.init(logical_types=store_pd_logical_types, index="index", name="store")
dataframes_train = {
"population" : (population_train_pd, ),
"sales_order_header": (sales_order_header_train_pd, ),
"sales_order_detail": (sales_order_detail_pd, ),
"sales_order_reason": (sales_order_reason_pd, ),
"store": (store_train_pd, ),
}
dataframes_test = {
"population" : (population_test_pd, ),
"sales_order_header": (sales_order_header_test_pd, ),
"sales_order_detail": (sales_order_detail_pd, ),
"sales_order_reason": (sales_order_reason_pd, ),
"store": (store_test_pd, ),
}
featuretools's relationships are similar for getML's data model.
relationships = [
("population", "SalesOrderID", "sales_order_header", "SalesOrderID"),
("population", "SalesOrderID", "sales_order_detail", "SalesOrderID"),
("population", "SalesOrderID", "sales_order_reason", "SalesOrderID"),
("population", "SalesOrderID", "store", "SalesOrderID"),
]
featuretools_train_pd = featuretools.dfs(
dataframes=dataframes_train,
relationships=relationships,
target_dataframe_name="population")[0]
featuretools_test_pd = featuretools.dfs(
dataframes=dataframes_test,
relationships=relationships,
target_dataframe_name="population")[0]
featuretools_train_pd
RevisionNumber | OnlineOrderFlag | TerritoryIDCat | ShipMethodID | SalesPersonID | TerritoryID | SubTotal | TaxAmt | Freight | TotalDue | ... | NUM_UNIQUE(sales_order_detail.YEAR(ModifiedDate)) | NUM_UNIQUE(sales_order_detail.YEAR(StartDate)) | MODE(store.DAY(test)) | MODE(store.MONTH(test)) | MODE(store.WEEKDAY(test)) | MODE(store.YEAR(test)) | NUM_UNIQUE(store.DAY(test)) | NUM_UNIQUE(store.MONTH(test)) | NUM_UNIQUE(store.WEEKDAY(test)) | NUM_UNIQUE(store.YEAR(test)) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SalesOrderID | |||||||||||||||||||||
43659 | 8 | 0 | 5 | 5 | 279 | 5 | 20565.6206 | 1971.5149 | 616.0984 | 23153.2339 | ... | 1 | 1 | 12 | 9 | 4 | 2014 | 1 | 1 | 1 | 1 |
43660 | 8 | 0 | 5 | 5 | 279 | 5 | 1294.2529 | 124.2483 | 38.8276 | 1457.3288 | ... | 1 | 1 | 12 | 9 | 4 | 2014 | 1 | 1 | 1 | 1 |
43661 | 8 | 0 | 6 | 5 | 282 | 6 | 32726.4786 | 3153.7696 | 985.5530 | 36865.8012 | ... | 1 | 1 | 12 | 9 | 4 | 2014 | 1 | 1 | 1 | 1 |
43663 | 8 | 0 | 4 | 5 | 276 | 4 | 419.4589 | 40.2681 | 12.5838 | 472.3108 | ... | 1 | 1 | 12 | 9 | 4 | 2014 | 1 | 1 | 1 | 1 |
43664 | 8 | 0 | 1 | 5 | 280 | 1 | 24432.6088 | 2344.9921 | 732.8100 | 27510.4109 | ... | 1 | 1 | 12 | 9 | 4 | 2014 | 1 | 1 | 1 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
63358 | 8 | 1 | 7 | 1 | <NA> | 7 | 1173.9600 | 93.9168 | 29.3490 | 1297.2258 | ... | 1 | 1 | NaN | NaN | NaN | NaN | <NA> | <NA> | <NA> | <NA> |
63359 | 8 | 1 | 10 | 1 | <NA> | 10 | 1179.4700 | 94.3576 | 29.4868 | 1303.3144 | ... | 1 | 1 | NaN | NaN | NaN | NaN | <NA> | <NA> | <NA> | <NA> |
63360 | 8 | 1 | 7 | 1 | <NA> | 7 | 548.9800 | 43.9184 | 13.7245 | 606.6229 | ... | 1 | 1 | NaN | NaN | NaN | NaN | <NA> | <NA> | <NA> | <NA> |
63361 | 8 | 1 | 8 | 1 | <NA> | 8 | 2384.0700 | 190.7256 | 59.6018 | 2634.3974 | ... | 1 | 1 | NaN | NaN | NaN | NaN | <NA> | <NA> | <NA> | <NA> |
63362 | 8 | 1 | 10 | 1 | <NA> | 10 | 2419.0600 | 193.5248 | 60.4765 | 2673.0613 | ... | 1 | 1 | NaN | NaN | NaN | NaN | <NA> | <NA> | <NA> | <NA> |
15825 rows × 227 columns
featuretools_train = getml.data.DataFrame.from_pandas(featuretools_train_pd, "featuretools_train")
featuretools_test = getml.data.DataFrame.from_pandas(featuretools_test_pd, "featuretools_test")
featuretools_train.set_role("churn", getml.data.roles.target)
featuretools_train.set_role(featuretools_train.roles.unused_float, getml.data.roles.numerical)
featuretools_train.set_role(featuretools_train.roles.unused_string, getml.data.roles.categorical)
featuretools_train
name | churn | RevisionNumber | OnlineOrderFlag | SalesPersonID | TerritoryID | COUNT(sales_order_header) | MODE(sales_order_header.OnlineOrderFlag) | MODE(sales_order_header.RevisionNumber) | MODE(sales_order_header.churn) | NUM_UNIQUE(sales_order_header.OnlineOrderFlag) | NUM_UNIQUE(sales_order_header.RevisionNumber) | NUM_UNIQUE(sales_order_header.churn) | COUNT(sales_order_detail) | MODE(sales_order_detail.Category) | MODE(sales_order_detail.Description) | MODE(sales_order_detail.MakeFlag) | MODE(sales_order_detail.ProductSubcategoryID) | MODE(sales_order_detail.Type) | NUM_UNIQUE(sales_order_detail.Category) | NUM_UNIQUE(sales_order_detail.Description) | NUM_UNIQUE(sales_order_detail.MakeFlag) | NUM_UNIQUE(sales_order_detail.ProductSubcategoryID) | NUM_UNIQUE(sales_order_detail.Type) | COUNT(sales_order_reason) | MODE(sales_order_reason.SalesReasonID) | NUM_UNIQUE(sales_order_reason.SalesReasonID) | COUNT(store) | DAY(DueDate) | DAY(ModifiedDate) | DAY(OrderDate) | DAY(ShipDate) | MONTH(DueDate) | MONTH(ModifiedDate) | MONTH(OrderDate) | MONTH(ShipDate) | WEEKDAY(DueDate) | WEEKDAY(ModifiedDate) | WEEKDAY(OrderDate) | WEEKDAY(ShipDate) | YEAR(DueDate) | YEAR(ModifiedDate) | YEAR(OrderDate) | YEAR(ShipDate) | MODE(sales_order_header.DAY(DueDate)) | MODE(sales_order_header.DAY(ModifiedDate)) | MODE(sales_order_header.DAY(OrderDate)) | MODE(sales_order_header.DAY(ShipDate)) | MODE(sales_order_header.MONTH(DueDate)) | MODE(sales_order_header.MONTH(ModifiedDate)) | MODE(sales_order_header.MONTH(OrderDate)) | MODE(sales_order_header.MONTH(ShipDate)) | MODE(sales_order_header.WEEKDAY(DueDate)) | MODE(sales_order_header.WEEKDAY(ModifiedDate)) | MODE(sales_order_header.WEEKDAY(OrderDate)) | MODE(sales_order_header.WEEKDAY(ShipDate)) | MODE(sales_order_header.YEAR(DueDate)) | MODE(sales_order_header.YEAR(ModifiedDate)) | MODE(sales_order_header.YEAR(OrderDate)) | MODE(sales_order_header.YEAR(ShipDate)) | NUM_UNIQUE(sales_order_header.DAY(DueDate)) | NUM_UNIQUE(sales_order_header.DAY(ModifiedDate)) | NUM_UNIQUE(sales_order_header.DAY(OrderDate)) | NUM_UNIQUE(sales_order_header.DAY(ShipDate)) | NUM_UNIQUE(sales_order_header.MONTH(DueDate)) | NUM_UNIQUE(sales_order_header.MONTH(ModifiedDate)) | NUM_UNIQUE(sales_order_header.MONTH(OrderDate)) | NUM_UNIQUE(sales_order_header.MONTH(ShipDate)) | NUM_UNIQUE(sales_order_header.WEEKDAY(DueDate)) | NUM_UNIQUE(sales_order_header.WEEKDAY(ModifiedDate)) | NUM_UNIQUE(sales_order_header.WEEKDAY(OrderDate)) | NUM_UNIQUE(sales_order_header.WEEKDAY(ShipDate)) | NUM_UNIQUE(sales_order_header.YEAR(DueDate)) | NUM_UNIQUE(sales_order_header.YEAR(ModifiedDate)) | NUM_UNIQUE(sales_order_header.YEAR(OrderDate)) | NUM_UNIQUE(sales_order_header.YEAR(ShipDate)) | MODE(sales_order_detail.DAY(EndDate)) | MODE(sales_order_detail.DAY(ModifiedDate)) | MODE(sales_order_detail.DAY(StartDate)) | MODE(sales_order_detail.MONTH(EndDate)) | MODE(sales_order_detail.MONTH(ModifiedDate)) | MODE(sales_order_detail.MONTH(StartDate)) | MODE(sales_order_detail.WEEKDAY(EndDate)) | MODE(sales_order_detail.WEEKDAY(ModifiedDate)) | MODE(sales_order_detail.WEEKDAY(StartDate)) | MODE(sales_order_detail.YEAR(EndDate)) | MODE(sales_order_detail.YEAR(ModifiedDate)) | MODE(sales_order_detail.YEAR(StartDate)) | NUM_UNIQUE(sales_order_detail.DAY(EndDate)) | NUM_UNIQUE(sales_order_detail.DAY(ModifiedDate)) | NUM_UNIQUE(sales_order_detail.DAY(StartDate)) | NUM_UNIQUE(sales_order_detail.MONTH(EndDate)) | NUM_UNIQUE(sales_order_detail.MONTH(ModifiedDate)) | NUM_UNIQUE(sales_order_detail.MONTH(StartDate)) | NUM_UNIQUE(sales_order_detail.WEEKDAY(EndDate)) | NUM_UNIQUE(sales_order_detail.WEEKDAY(ModifiedDate)) | NUM_UNIQUE(sales_order_detail.WEEKDAY(StartDate)) | NUM_UNIQUE(sales_order_detail.YEAR(EndDate)) | NUM_UNIQUE(sales_order_detail.YEAR(ModifiedDate)) | NUM_UNIQUE(sales_order_detail.YEAR(StartDate)) | MODE(store.DAY(test)) | MODE(store.MONTH(test)) | MODE(store.WEEKDAY(test)) | MODE(store.YEAR(test)) | NUM_UNIQUE(store.DAY(test)) | NUM_UNIQUE(store.MONTH(test)) | NUM_UNIQUE(store.WEEKDAY(test)) | NUM_UNIQUE(store.YEAR(test)) | TerritoryIDCat | ShipMethodID | SubTotal | TaxAmt | Freight | TotalDue | MAX(sales_order_header.Freight) | MAX(sales_order_header.SalesPersonID) | MAX(sales_order_header.ShipMethodID) | MAX(sales_order_header.SubTotal) | MAX(sales_order_header.TaxAmt) | MAX(sales_order_header.TerritoryID) | MAX(sales_order_header.TotalDue) | MEAN(sales_order_header.Freight) | MEAN(sales_order_header.SalesPersonID) | MEAN(sales_order_header.ShipMethodID) | MEAN(sales_order_header.SubTotal) | MEAN(sales_order_header.TaxAmt) | MEAN(sales_order_header.TerritoryID) | MEAN(sales_order_header.TotalDue) | MIN(sales_order_header.Freight) | MIN(sales_order_header.SalesPersonID) | MIN(sales_order_header.ShipMethodID) | MIN(sales_order_header.SubTotal) | MIN(sales_order_header.TaxAmt) | MIN(sales_order_header.TerritoryID) | MIN(sales_order_header.TotalDue) | SKEW(sales_order_header.Freight) | SKEW(sales_order_header.SalesPersonID) | SKEW(sales_order_header.ShipMethodID) | SKEW(sales_order_header.SubTotal) | SKEW(sales_order_header.TaxAmt) | SKEW(sales_order_header.TerritoryID) | SKEW(sales_order_header.TotalDue) | STD(sales_order_header.Freight) | STD(sales_order_header.SalesPersonID) | STD(sales_order_header.ShipMethodID) | STD(sales_order_header.SubTotal) | STD(sales_order_header.TaxAmt) | STD(sales_order_header.TerritoryID) | STD(sales_order_header.TotalDue) | SUM(sales_order_header.Freight) | SUM(sales_order_header.SalesPersonID) | SUM(sales_order_header.ShipMethodID) | SUM(sales_order_header.SubTotal) | SUM(sales_order_header.TaxAmt) | SUM(sales_order_header.TerritoryID) | SUM(sales_order_header.TotalDue) | MAX(sales_order_detail.DiscountPct) | MAX(sales_order_detail.LineTotal) | MAX(sales_order_detail.ListPrice) | MAX(sales_order_detail.MinQty) | MAX(sales_order_detail.OrderQty) | MAX(sales_order_detail.ProductModelID) | MAX(sales_order_detail.ReorderPoint) | MAX(sales_order_detail.SafetyStockLevel) | MAX(sales_order_detail.StandardCost) | MAX(sales_order_detail.UnitPrice) | MAX(sales_order_detail.UnitPriceDiscount) | MEAN(sales_order_detail.DiscountPct) | MEAN(sales_order_detail.LineTotal) | MEAN(sales_order_detail.ListPrice) | MEAN(sales_order_detail.MinQty) | MEAN(sales_order_detail.OrderQty) | MEAN(sales_order_detail.ProductModelID) | MEAN(sales_order_detail.ReorderPoint) | MEAN(sales_order_detail.SafetyStockLevel) | MEAN(sales_order_detail.StandardCost) | MEAN(sales_order_detail.UnitPrice) | MEAN(sales_order_detail.UnitPriceDiscount) | MIN(sales_order_detail.DiscountPct) | MIN(sales_order_detail.LineTotal) | MIN(sales_order_detail.ListPrice) | MIN(sales_order_detail.MinQty) | MIN(sales_order_detail.OrderQty) | MIN(sales_order_detail.ProductModelID) | MIN(sales_order_detail.ReorderPoint) | MIN(sales_order_detail.SafetyStockLevel) | MIN(sales_order_detail.StandardCost) | MIN(sales_order_detail.UnitPrice) | MIN(sales_order_detail.UnitPriceDiscount) | SKEW(sales_order_detail.DiscountPct) | SKEW(sales_order_detail.LineTotal) | SKEW(sales_order_detail.ListPrice) | SKEW(sales_order_detail.MinQty) | SKEW(sales_order_detail.OrderQty) | SKEW(sales_order_detail.ProductModelID) | SKEW(sales_order_detail.ReorderPoint) | SKEW(sales_order_detail.SafetyStockLevel) | SKEW(sales_order_detail.StandardCost) | SKEW(sales_order_detail.UnitPrice) | SKEW(sales_order_detail.UnitPriceDiscount) | STD(sales_order_detail.DiscountPct) | STD(sales_order_detail.LineTotal) | STD(sales_order_detail.ListPrice) | STD(sales_order_detail.MinQty) | STD(sales_order_detail.OrderQty) | STD(sales_order_detail.ProductModelID) | STD(sales_order_detail.ReorderPoint) | STD(sales_order_detail.SafetyStockLevel) | STD(sales_order_detail.StandardCost) | STD(sales_order_detail.UnitPrice) | STD(sales_order_detail.UnitPriceDiscount) | SUM(sales_order_detail.DiscountPct) | SUM(sales_order_detail.LineTotal) | SUM(sales_order_detail.ListPrice) | SUM(sales_order_detail.MinQty) | SUM(sales_order_detail.OrderQty) | SUM(sales_order_detail.ProductModelID) | SUM(sales_order_detail.ReorderPoint) | SUM(sales_order_detail.SafetyStockLevel) | SUM(sales_order_detail.StandardCost) | SUM(sales_order_detail.UnitPrice) | SUM(sales_order_detail.UnitPriceDiscount) | MAX(store.SalesPersonID) | MEAN(store.SalesPersonID) | MIN(store.SalesPersonID) | SKEW(store.SalesPersonID) | STD(store.SalesPersonID) | SUM(store.SalesPersonID) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
role | target | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical |
0 | 0 | 8 | 0 | 279 | 5 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | 12 | No Discount | No Discount | 1 | 1 | No Discount | 1 | 1 | 2 | 5 | 1 | 0 | NULL | NULL | 80 | 12 | 6 | 30 | 7 | 6 | 6 | 5 | 6 | 6 | 0 | 0 | 1 | 2011 | 2011 | 2011 | 2011 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 30 | 30 | 1 | 11 | 5 | 5 | 6 | 0 | 6 | 2014 | 2011 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 12 | 9 | 4 | 2014 | 1 | 1 | 1 | 1 | 5 | 5 | 20565.6206 | 1971.5149 | 616.0984 | 23153.2339 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6074.982 | 3399.99 | 0 | 6 | 33 | 75 | 100 | 1912.1544 | 2039.994 | 0 | 0 | 1713.8017 | 1989.8658 | 0 | 2.1667 | 17.3333 | 45 | 60 | 1120.2742 | 1193.6427 | 0 | 0 | 10.373 | 8.99 | 0 | 1 | 2 | 3 | 4 | 3.3963 | 5.1865 | 0 | 0 | 1.1963 | -0.3885 | 0 | 1.4799 | -0.06687 | -0.3884 | -0.3884 | -0.3887 | -0.3885 | 0 | 0 | 1883.1394 | 1729.5034 | 0 | 1.5859 | 7.2655 | 37.0749 | 49.4332 | 971.2526 | 1038.0419 | 0 | 0 | 20565.6206 | 23878.39 | 0 | 26 | 208 | 540 | 720 | 13443.2903 | 14323.7118 | 0 | 279 | 279 | 279 | 0 | 0 | 22320 |
1 | 0 | 8 | 0 | 279 | 5 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | 2 | No Discount | No Discount | 1 | 2 | No Discount | 1 | 1 | 1 | 1 | 1 | 0 | NULL | NULL | 80 | 12 | 6 | 30 | 7 | 6 | 6 | 5 | 6 | 6 | 0 | 0 | 1 | 2011 | 2011 | 2011 | 2011 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 30 | 30 | 1 | 11 | 5 | 5 | 6 | 0 | 6 | 2014 | 2011 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 12 | 9 | 4 | 2014 | 1 | 1 | 1 | 1 | 5 | 5 | 1294.2529 | 124.2483 | 38.8276 | 1457.3288 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 874.794 | 1457.99 | 0 | 1 | 30 | 75 | 100 | 884.7083 | 874.794 | 0 | 0 | 647.1264 | 1120.49 | 0 | 1 | 29 | 75 | 100 | 685.7074 | 647.1264 | 0 | 0 | 419.4589 | 782.99 | 0 | 1 | 28 | 75 | 100 | 486.7066 | 419.4589 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 321.9705 | 477.2971 | 0 | 0 | 1.4142 | 0 | 0 | 281.4297 | 321.9705 | 0 | 0 | 1294.2529 | 2240.98 | 0 | 2 | 58 | 150 | 200 | 1371.4149 | 1294.2529 | 0 | 279 | 279 | 279 | 0 | 0 | 22320 |
2 | 0 | 8 | 0 | 282 | 6 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | 15 | No Discount | No Discount | 1 | 1 | No Discount | 1 | 1 | 2 | 5 | 1 | 0 | NULL | NULL | 74 | 12 | 6 | 30 | 7 | 6 | 6 | 5 | 6 | 6 | 0 | 0 | 1 | 2011 | 2011 | 2011 | 2011 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 30 | 30 | 1 | 11 | 5 | 5 | 6 | 0 | 6 | 2014 | 2011 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 12 | 9 | 4 | 2014 | 1 | 1 | 1 | 1 | 6 | 5 | 32726.4786 | 3153.7696 | 985.553 | 36865.8012 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8099.976 | 3399.99 | 0 | 5 | 33 | 375 | 500 | 1912.1544 | 2039.994 | 0 | 0 | 2181.7652 | 1590.4467 | 0 | 2.5333 | 14 | 151 | 201.3333 | 883.0532 | 934.9116 | 0 | 0 | 20.746 | 8.99 | 0 | 1 | 2 | 3 | 4 | 6.9223 | 5.1865 | 0 | 0 | 1.2022 | 0.2799 | 0 | 0.7955 | 0.7239 | 0.6718 | 0.6718 | 0.3285 | 0.348 | 0 | 0 | 2515.2803 | 1423.6488 | 0 | 1.1872 | 10.0499 | 166.7505 | 222.3339 | 802.3616 | 858.7914 | 0 | 0 | 32726.4786 | 23856.7 | 0 | 38 | 210 | 2265 | 3020 | 13245.7975 | 14023.6738 | 0 | 282 | 282 | 282 | 0 | 0 | 20868 |
3 | 0 | 8 | 0 | 276 | 4 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | 1 | No Discount | No Discount | 1 | 2 | No Discount | 1 | 1 | 1 | 1 | 1 | 0 | NULL | NULL | 39 | 12 | 6 | 30 | 7 | 6 | 6 | 5 | 6 | 6 | 0 | 0 | 1 | 2011 | 2011 | 2011 | 2011 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 30 | 30 | 1 | 11 | 5 | 5 | 6 | 0 | 6 | 2014 | 2011 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 12 | 9 | 4 | 2014 | 1 | 1 | 1 | 1 | 4 | 5 | 419.4589 | 40.2681 | 12.5838 | 472.3108 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 419.4589 | 782.99 | 0 | 1 | 30 | 75 | 100 | 486.7066 | 419.4589 | 0 | 0 | 419.4589 | 782.99 | 0 | 1 | 30 | 75 | 100 | 486.7066 | 419.4589 | 0 | 0 | 419.4589 | 782.99 | 0 | 1 | 30 | 75 | 100 | 486.7066 | 419.4589 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 419.4589 | 782.99 | 0 | 1 | 30 | 75 | 100 | 486.7066 | 419.4589 | 0 | 276 | 276 | 276 | 0 | 0 | 10764 |
4 | 0 | 8 | 0 | 280 | 1 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | 8 | No Discount | No Discount | 1 | 1 | No Discount | 1 | 1 | 2 | 2 | 1 | 0 | NULL | NULL | 38 | 12 | 6 | 30 | 7 | 6 | 6 | 5 | 6 | 6 | 0 | 0 | 1 | 2011 | 2011 | 2011 | 2011 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 30 | 30 | 1 | 11 | 5 | 5 | 6 | 0 | 6 | 2014 | 2011 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 12 | 9 | 4 | 2014 | 1 | 1 | 1 | 1 | 1 | 5 | 24432.6088 | 2344.9921 | 732.81 | 27510.4109 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8099.976 | 3399.99 | 0 | 4 | 19 | 75 | 100 | 1912.1544 | 2039.994 | 0 | 0 | 3054.0761 | 2553.115 | 0 | 1.75 | 17 | 57 | 76 | 1438.4664 | 1531.5806 | 0 | 0 | 28.8404 | 49.99 | 0 | 1 | 11 | 3 | 4 | 38.4923 | 28.8404 | 0 | 0 | 0.8117 | -1.4399 | 0 | 1.3554 | -1.4402 | -1.4402 | -1.4402 | -1.4399 | -1.4399 | 0 | 0 | 2860.3836 | 1545.0056 | 0 | 1.165 | 3.7033 | 33.3295 | 44.4394 | 864.1073 | 927.5374 | 0 | 0 | 24432.6088 | 20424.92 | 0 | 14 | 136 | 456 | 608 | 11507.731 | 12252.6448 | 0 | 280 | 280 | 280 | 0 | 0 | 10640 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
15820 | 1 | 8 | 1 | NULL | 7 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | 4 | No Discount | No Discount | 0 | 37 | No Discount | 1 | 1 | 2 | 3 | 1 | 1 | 1 | 1 | 0 | 12 | 6 | 30 | 7 | 1 | 1 | 12 | 1 | 6 | 0 | 0 | 1 | 2014 | 2014 | 2013 | 2014 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 30 | 30 | 1 | 11 | 12 | 5 | 6 | 0 | 6 | 2014 | 2013 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 7 | 1 | 1173.96 | 93.9168 | 29.349 | 1297.2258 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1120.49 | 1120.49 | 0 | 1 | 93 | 375 | 500 | 713.0798 | 1120.49 | 0 | 0 | 293.49 | 293.49 | 0 | 1 | 53.75 | 207 | 276 | 183.2694 | 293.49 | 0 | 0 | 3.99 | 3.99 | 0 | 1 | 4 | 3 | 4 | 1.4923 | 3.99 | 0 | 0 | 1.9981 | 1.9981 | 0 | 0 | -0.2621 | -0.1153 | -0.1153 | 1.9994 | 1.9981 | 0 | 0 | 551.4201 | 551.4201 | 0 | 0 | 44.2371 | 196.204 | 261.6053 | 353.2259 | 551.4201 | 0 | 0 | 1173.96 | 1173.96 | 0 | 4 | 215 | 828 | 1104 | 733.0777 | 1173.96 | 0 | nan | nan | nan | nan | nan | 0 |
15821 | 1 | 8 | 1 | NULL | 10 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | 3 | No Discount | No Discount | 0 | 19 | No Discount | 1 | 1 | 2 | 3 | 1 | 1 | 1 | 1 | 0 | 12 | 6 | 30 | 7 | 1 | 1 | 12 | 1 | 6 | 0 | 0 | 1 | 2014 | 2014 | 2013 | 2014 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 30 | 30 | 1 | 11 | 12 | 5 | 6 | 0 | 6 | 2014 | 2013 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 10 | 1 | 1179.47 | 94.3576 | 29.4868 | 1303.3144 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1120.49 | 1120.49 | 0 | 1 | 29 | 75 | 100 | 713.0798 | 1120.49 | 0 | 0 | 393.1567 | 393.1567 | 0 | 1 | 14 | 27 | 36 | 252.8315 | 393.1567 | 0 | 0 | 8.99 | 8.99 | 0 | 1 | 2 | 3 | 4 | 6.9223 | 8.99 | 0 | 0 | 1.7238 | 1.7238 | 0 | 0 | 0.9352 | 1.7321 | 1.7321 | 1.7199 | 1.7238 | 0 | 0 | 630.2226 | 630.2226 | 0 | 0 | 13.7477 | 41.5692 | 55.4256 | 398.8992 | 630.2226 | 0 | 0 | 1179.47 | 1179.47 | 0 | 3 | 42 | 81 | 108 | 758.4944 | 1179.47 | 0 | nan | nan | nan | nan | nan | 0 |
15822 | 1 | 8 | 1 | NULL | 7 | 3 | 1 | 8 | 0 | 1 | 1 | 1 | 2 | No Discount | No Discount | 0 | 2 | No Discount | 1 | 1 | 2 | 2 | 1 | 1 | 1 | 1 | 0 | 12 | 6 | 30 | 7 | 1 | 1 | 12 | 1 | 6 | 0 | 0 | 1 | 2014 | 2014 | 2013 | 2014 | 14 | 8 | 1 | 9 | 4 | 4 | 4 | 4 | 3 | 4 | 4 | 5 | 2013 | 2013 | 2013 | 2013 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 30 | 30 | 1 | 11 | 12 | 5 | 6 | 0 | 6 | 2014 | 2013 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 7 | 1 | 548.98 | 43.9184 | 13.7245 | 606.6229 | 61.5493 | nan | 1 | 2461.97 | 196.9576 | 7 | 2720.4769 | 42.5419 | nan | 1 | 1701.6761 | 136.1341 | 7 | 1880.3521 | 14.849 | nan | 1 | 593.96 | 47.5168 | 7 | 656.3258 | -1.3935 | nan | 0 | -1.3935 | -1.3935 | 0 | -1.3935 | 24.5318 | nan | 0 | 981.2706 | 78.5017 | 0 | 1084.304 | 127.6258 | 0 | 3 | 5105.0282 | 408.4023 | 21 | 5641.0563 | 0 | 539.99 | 539.99 | 0 | 1 | 113 | 75 | 100 | 343.6496 | 539.99 | 0 | 0 | 274.49 | 274.49 | 0 | 1 | 72 | 39 | 52 | 173.506 | 274.49 | 0 | 0 | 8.99 | 8.99 | 0 | 1 | 31 | 3 | 4 | 3.3623 | 8.99 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 375.4737 | 375.4737 | 0 | 0 | 57.9828 | 50.9117 | 67.8823 | 240.6195 | 375.4737 | 0 | 0 | 548.98 | 548.98 | 0 | 2 | 144 | 78 | 104 | 347.0119 | 548.98 | 0 | nan | nan | nan | nan | nan | 0 |
15823 | 1 | 8 | 1 | NULL | 8 | 1 | 1 | 8 | 0 | 1 | 1 | 1 | 1 | No Discount | No Discount | 1 | 3 | No Discount | 1 | 1 | 1 | 1 | 1 | 1 | 5 | 1 | 0 | 12 | 6 | 30 | 7 | 1 | 1 | 12 | 1 | 6 | 0 | 0 | 1 | 2014 | 2014 | 2013 | 2014 | 12 | 6 | 30 | 7 | 6 | 6 | 5 | 6 | 1 | 2 | 2 | 3 | 2012 | 2012 | 2012 | 2012 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 30 | 30 | 1 | 11 | 12 | 5 | 6 | 0 | 6 | 2014 | 2013 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 8 | 1 | 2384.07 | 190.7256 | 59.6018 | 2634.3974 | 25.0109 | nan | 1 | 1000.4375 | 80.035 | 8 | 1105.4834 | 25.0109 | nan | 1 | 1000.4375 | 80.035 | 8 | 1105.4834 | 25.0109 | nan | 1 | 1000.4375 | 80.035 | 8 | 1105.4834 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 25.0109 | 0 | 1 | 1000.4375 | 80.035 | 8 | 1105.4834 | 0 | 2384.07 | 2384.07 | 0 | 1 | 34 | 75 | 100 | 1481.9379 | 2384.07 | 0 | 0 | 2384.07 | 2384.07 | 0 | 1 | 34 | 75 | 100 | 1481.9379 | 2384.07 | 0 | 0 | 2384.07 | 2384.07 | 0 | 1 | 34 | 75 | 100 | 1481.9379 | 2384.07 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 2384.07 | 2384.07 | 0 | 1 | 34 | 75 | 100 | 1481.9379 | 2384.07 | 0 | nan | nan | nan | nan | nan | 0 |
15824 | 1 | 8 | 1 | NULL | 10 | 2 | 1 | 8 | 0 | 1 | 1 | 2 | 2 | No Discount | No Discount | 0 | 3 | No Discount | 1 | 1 | 2 | 2 | 1 | 0 | NULL | NULL | 0 | 12 | 6 | 30 | 7 | 1 | 1 | 12 | 1 | 6 | 0 | 0 | 1 | 2014 | 2014 | 2013 | 2014 | 13 | 7 | 18 | 8 | 5 | 5 | 4 | 5 | 1 | 0 | 0 | 1 | 2012 | 2012 | 2012 | 2012 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 30 | 30 | 1 | 11 | 12 | 5 | 6 | 0 | 6 | 2014 | 2013 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 10 | 1 | 2419.06 | 193.5248 | 60.4765 | 2673.0613 | 89.4568 | nan | 1 | 3578.27 | 286.2616 | 10 | 3953.9884 | 74.6655 | nan | 1 | 2986.62 | 238.9296 | 10 | 3300.2151 | 59.8743 | nan | 1 | 2394.97 | 191.5976 | 10 | 2646.4419 | nan | nan | nan | nan | nan | nan | nan | 20.918 | nan | 0 | 836.7195 | 66.9376 | 0 | 924.575 | 149.3311 | 0 | 2 | 5973.24 | 477.8592 | 20 | 6600.4303 | 0 | 2384.07 | 2384.07 | 0 | 1 | 34 | 75 | 100 | 1481.9379 | 2384.07 | 0 | 0 | 1209.53 | 1209.53 | 0 | 1 | 33.5 | 39 | 52 | 747.5121 | 1209.53 | 0 | 0 | 34.99 | 34.99 | 0 | 1 | 33 | 3 | 4 | 13.0863 | 34.99 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 1661.0504 | 1661.0504 | 0 | 0 | 0.7071 | 50.9117 | 67.8823 | 1038.6349 | 1661.0504 | 0 | 0 | 2419.06 | 2419.06 | 0 | 2 | 67 | 78 | 104 | 1495.0242 | 2419.06 | 0 | nan | nan | nan | nan | nan | 0 |
15825 rows x 227 columns
memory usage: 22.03 MB
name: featuretools_train
type: getml.DataFrame
featuretools_test.set_role("churn", getml.data.roles.target)
featuretools_test.set_role(featuretools_test.roles.unused_float, getml.data.roles.numerical)
featuretools_test.set_role(featuretools_test.roles.unused_string, getml.data.roles.categorical)
featuretools_test
name | churn | RevisionNumber | OnlineOrderFlag | SalesPersonID | TerritoryID | COUNT(sales_order_header) | MODE(sales_order_header.OnlineOrderFlag) | MODE(sales_order_header.RevisionNumber) | MODE(sales_order_header.churn) | NUM_UNIQUE(sales_order_header.OnlineOrderFlag) | NUM_UNIQUE(sales_order_header.RevisionNumber) | NUM_UNIQUE(sales_order_header.churn) | COUNT(sales_order_detail) | MODE(sales_order_detail.Category) | MODE(sales_order_detail.Description) | MODE(sales_order_detail.MakeFlag) | MODE(sales_order_detail.ProductSubcategoryID) | MODE(sales_order_detail.Type) | NUM_UNIQUE(sales_order_detail.Category) | NUM_UNIQUE(sales_order_detail.Description) | NUM_UNIQUE(sales_order_detail.MakeFlag) | NUM_UNIQUE(sales_order_detail.ProductSubcategoryID) | NUM_UNIQUE(sales_order_detail.Type) | COUNT(sales_order_reason) | MODE(sales_order_reason.SalesReasonID) | NUM_UNIQUE(sales_order_reason.SalesReasonID) | COUNT(store) | DAY(DueDate) | DAY(ModifiedDate) | DAY(OrderDate) | DAY(ShipDate) | MONTH(DueDate) | MONTH(ModifiedDate) | MONTH(OrderDate) | MONTH(ShipDate) | WEEKDAY(DueDate) | WEEKDAY(ModifiedDate) | WEEKDAY(OrderDate) | WEEKDAY(ShipDate) | YEAR(DueDate) | YEAR(ModifiedDate) | YEAR(OrderDate) | YEAR(ShipDate) | MODE(sales_order_header.DAY(DueDate)) | MODE(sales_order_header.DAY(ModifiedDate)) | MODE(sales_order_header.DAY(OrderDate)) | MODE(sales_order_header.DAY(ShipDate)) | MODE(sales_order_header.MONTH(DueDate)) | MODE(sales_order_header.MONTH(ModifiedDate)) | MODE(sales_order_header.MONTH(OrderDate)) | MODE(sales_order_header.MONTH(ShipDate)) | MODE(sales_order_header.WEEKDAY(DueDate)) | MODE(sales_order_header.WEEKDAY(ModifiedDate)) | MODE(sales_order_header.WEEKDAY(OrderDate)) | MODE(sales_order_header.WEEKDAY(ShipDate)) | MODE(sales_order_header.YEAR(DueDate)) | MODE(sales_order_header.YEAR(ModifiedDate)) | MODE(sales_order_header.YEAR(OrderDate)) | MODE(sales_order_header.YEAR(ShipDate)) | NUM_UNIQUE(sales_order_header.DAY(DueDate)) | NUM_UNIQUE(sales_order_header.DAY(ModifiedDate)) | NUM_UNIQUE(sales_order_header.DAY(OrderDate)) | NUM_UNIQUE(sales_order_header.DAY(ShipDate)) | NUM_UNIQUE(sales_order_header.MONTH(DueDate)) | NUM_UNIQUE(sales_order_header.MONTH(ModifiedDate)) | NUM_UNIQUE(sales_order_header.MONTH(OrderDate)) | NUM_UNIQUE(sales_order_header.MONTH(ShipDate)) | NUM_UNIQUE(sales_order_header.WEEKDAY(DueDate)) | NUM_UNIQUE(sales_order_header.WEEKDAY(ModifiedDate)) | NUM_UNIQUE(sales_order_header.WEEKDAY(OrderDate)) | NUM_UNIQUE(sales_order_header.WEEKDAY(ShipDate)) | NUM_UNIQUE(sales_order_header.YEAR(DueDate)) | NUM_UNIQUE(sales_order_header.YEAR(ModifiedDate)) | NUM_UNIQUE(sales_order_header.YEAR(OrderDate)) | NUM_UNIQUE(sales_order_header.YEAR(ShipDate)) | MODE(sales_order_detail.DAY(EndDate)) | MODE(sales_order_detail.DAY(ModifiedDate)) | MODE(sales_order_detail.DAY(StartDate)) | MODE(sales_order_detail.MONTH(EndDate)) | MODE(sales_order_detail.MONTH(ModifiedDate)) | MODE(sales_order_detail.MONTH(StartDate)) | MODE(sales_order_detail.WEEKDAY(EndDate)) | MODE(sales_order_detail.WEEKDAY(ModifiedDate)) | MODE(sales_order_detail.WEEKDAY(StartDate)) | MODE(sales_order_detail.YEAR(EndDate)) | MODE(sales_order_detail.YEAR(ModifiedDate)) | MODE(sales_order_detail.YEAR(StartDate)) | NUM_UNIQUE(sales_order_detail.DAY(EndDate)) | NUM_UNIQUE(sales_order_detail.DAY(ModifiedDate)) | NUM_UNIQUE(sales_order_detail.DAY(StartDate)) | NUM_UNIQUE(sales_order_detail.MONTH(EndDate)) | NUM_UNIQUE(sales_order_detail.MONTH(ModifiedDate)) | NUM_UNIQUE(sales_order_detail.MONTH(StartDate)) | NUM_UNIQUE(sales_order_detail.WEEKDAY(EndDate)) | NUM_UNIQUE(sales_order_detail.WEEKDAY(ModifiedDate)) | NUM_UNIQUE(sales_order_detail.WEEKDAY(StartDate)) | NUM_UNIQUE(sales_order_detail.YEAR(EndDate)) | NUM_UNIQUE(sales_order_detail.YEAR(ModifiedDate)) | NUM_UNIQUE(sales_order_detail.YEAR(StartDate)) | MODE(store.DAY(test)) | MODE(store.MONTH(test)) | MODE(store.WEEKDAY(test)) | MODE(store.YEAR(test)) | NUM_UNIQUE(store.DAY(test)) | NUM_UNIQUE(store.MONTH(test)) | NUM_UNIQUE(store.WEEKDAY(test)) | NUM_UNIQUE(store.YEAR(test)) | TerritoryIDCat | ShipMethodID | SubTotal | TaxAmt | Freight | TotalDue | MAX(sales_order_header.Freight) | MAX(sales_order_header.SalesPersonID) | MAX(sales_order_header.ShipMethodID) | MAX(sales_order_header.SubTotal) | MAX(sales_order_header.TaxAmt) | MAX(sales_order_header.TerritoryID) | MAX(sales_order_header.TotalDue) | MEAN(sales_order_header.Freight) | MEAN(sales_order_header.SalesPersonID) | MEAN(sales_order_header.ShipMethodID) | MEAN(sales_order_header.SubTotal) | MEAN(sales_order_header.TaxAmt) | MEAN(sales_order_header.TerritoryID) | MEAN(sales_order_header.TotalDue) | MIN(sales_order_header.Freight) | MIN(sales_order_header.SalesPersonID) | MIN(sales_order_header.ShipMethodID) | MIN(sales_order_header.SubTotal) | MIN(sales_order_header.TaxAmt) | MIN(sales_order_header.TerritoryID) | MIN(sales_order_header.TotalDue) | SKEW(sales_order_header.Freight) | SKEW(sales_order_header.SalesPersonID) | SKEW(sales_order_header.ShipMethodID) | SKEW(sales_order_header.SubTotal) | SKEW(sales_order_header.TaxAmt) | SKEW(sales_order_header.TerritoryID) | SKEW(sales_order_header.TotalDue) | STD(sales_order_header.Freight) | STD(sales_order_header.SalesPersonID) | STD(sales_order_header.ShipMethodID) | STD(sales_order_header.SubTotal) | STD(sales_order_header.TaxAmt) | STD(sales_order_header.TerritoryID) | STD(sales_order_header.TotalDue) | SUM(sales_order_header.Freight) | SUM(sales_order_header.SalesPersonID) | SUM(sales_order_header.ShipMethodID) | SUM(sales_order_header.SubTotal) | SUM(sales_order_header.TaxAmt) | SUM(sales_order_header.TerritoryID) | SUM(sales_order_header.TotalDue) | MAX(sales_order_detail.DiscountPct) | MAX(sales_order_detail.LineTotal) | MAX(sales_order_detail.ListPrice) | MAX(sales_order_detail.MinQty) | MAX(sales_order_detail.OrderQty) | MAX(sales_order_detail.ProductModelID) | MAX(sales_order_detail.ReorderPoint) | MAX(sales_order_detail.SafetyStockLevel) | MAX(sales_order_detail.StandardCost) | MAX(sales_order_detail.UnitPrice) | MAX(sales_order_detail.UnitPriceDiscount) | MEAN(sales_order_detail.DiscountPct) | MEAN(sales_order_detail.LineTotal) | MEAN(sales_order_detail.ListPrice) | MEAN(sales_order_detail.MinQty) | MEAN(sales_order_detail.OrderQty) | MEAN(sales_order_detail.ProductModelID) | MEAN(sales_order_detail.ReorderPoint) | MEAN(sales_order_detail.SafetyStockLevel) | MEAN(sales_order_detail.StandardCost) | MEAN(sales_order_detail.UnitPrice) | MEAN(sales_order_detail.UnitPriceDiscount) | MIN(sales_order_detail.DiscountPct) | MIN(sales_order_detail.LineTotal) | MIN(sales_order_detail.ListPrice) | MIN(sales_order_detail.MinQty) | MIN(sales_order_detail.OrderQty) | MIN(sales_order_detail.ProductModelID) | MIN(sales_order_detail.ReorderPoint) | MIN(sales_order_detail.SafetyStockLevel) | MIN(sales_order_detail.StandardCost) | MIN(sales_order_detail.UnitPrice) | MIN(sales_order_detail.UnitPriceDiscount) | SKEW(sales_order_detail.DiscountPct) | SKEW(sales_order_detail.LineTotal) | SKEW(sales_order_detail.ListPrice) | SKEW(sales_order_detail.MinQty) | SKEW(sales_order_detail.OrderQty) | SKEW(sales_order_detail.ProductModelID) | SKEW(sales_order_detail.ReorderPoint) | SKEW(sales_order_detail.SafetyStockLevel) | SKEW(sales_order_detail.StandardCost) | SKEW(sales_order_detail.UnitPrice) | SKEW(sales_order_detail.UnitPriceDiscount) | STD(sales_order_detail.DiscountPct) | STD(sales_order_detail.LineTotal) | STD(sales_order_detail.ListPrice) | STD(sales_order_detail.MinQty) | STD(sales_order_detail.OrderQty) | STD(sales_order_detail.ProductModelID) | STD(sales_order_detail.ReorderPoint) | STD(sales_order_detail.SafetyStockLevel) | STD(sales_order_detail.StandardCost) | STD(sales_order_detail.UnitPrice) | STD(sales_order_detail.UnitPriceDiscount) | SUM(sales_order_detail.DiscountPct) | SUM(sales_order_detail.LineTotal) | SUM(sales_order_detail.ListPrice) | SUM(sales_order_detail.MinQty) | SUM(sales_order_detail.OrderQty) | SUM(sales_order_detail.ProductModelID) | SUM(sales_order_detail.ReorderPoint) | SUM(sales_order_detail.SafetyStockLevel) | SUM(sales_order_detail.StandardCost) | SUM(sales_order_detail.UnitPrice) | SUM(sales_order_detail.UnitPriceDiscount) | MAX(store.SalesPersonID) | MEAN(store.SalesPersonID) | MIN(store.SalesPersonID) | SKEW(store.SalesPersonID) | STD(store.SalesPersonID) | SUM(store.SalesPersonID) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
role | target | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical | numerical |
0 | 0 | 8 | 0 | 282 | 6 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | 22 | No Discount | No Discount | 1 | 2 | No Discount | 1 | 1 | 1 | 2 | 1 | 0 | NULL | NULL | 74 | 12 | 6 | 30 | 7 | 6 | 6 | 5 | 6 | 6 | 0 | 0 | 1 | 2011 | 2011 | 2011 | 2011 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 30 | 30 | 1 | 11 | 5 | 5 | 6 | 0 | 6 | 2014 | 2011 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 12 | 9 | 4 | 2014 | 1 | 1 | 1 | 1 | 6 | 5 | 28832.5289 | 2775.1646 | 867.2389 | 32474.9324 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5248.764 | 3578.27 | 0 | 6 | 30 | 375 | 500 | 2171.2942 | 2146.962 | 0 | 0 | 1310.5695 | 1021.1545 | 0 | 2.4545 | 22.1818 | 184.0909 | 245.4545 | 619.8742 | 588.8855 | 0 | 0 | 178.5808 | 337.22 | 0 | 1 | 9 | 75 | 100 | 187.1571 | 178.5808 | 0 | 0 | 1.8938 | 2.1819 | 0 | 0.7899 | -0.6531 | 0.6093 | 0.6093 | 2.1479 | 2.1812 | 0 | 0 | 1219.991 | 911.954 | 0 | 1.5346 | 9.1529 | 147.7098 | 196.9464 | 555.2103 | 556.8657 | 0 | 0 | 28832.5289 | 22465.4 | 0 | 54 | 488 | 4050 | 5400 | 13637.2318 | 12955.4816 | 0 | 282 | 282 | 282 | 0 | 0 | 20868 |
1 | 0 | 8 | 0 | 283 | 1 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | 10 | No Discount | No Discount | 0 | 1 | No Discount | 1 | 1 | 2 | 5 | 1 | 0 | NULL | NULL | 38 | 12 | 6 | 30 | 7 | 6 | 6 | 5 | 6 | 6 | 0 | 0 | 1 | 2011 | 2011 | 2011 | 2011 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 30 | 30 | 1 | 11 | 5 | 5 | 6 | 0 | 6 | 2014 | 2011 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 12 | 9 | 4 | 2014 | 1 | 1 | 1 | 1 | 1 | 5 | 14352.7713 | 1375.9427 | 429.9821 | 16158.6961 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4049.988 | 3399.99 | 0 | 6 | 33 | 75 | 100 | 1912.1544 | 2039.994 | 0 | 0 | 1435.2771 | 1703.841 | 0 | 2 | 19.2 | 39 | 52 | 957.9515 | 1022.007 | 0 | 0 | 10.373 | 8.99 | 0 | 1 | 2 | 3 | 4 | 3.3963 | 5.1865 | 0 | 0 | 0.695 | -9.915e-05 | 0 | 2.5156 | -0.071 | 0 | 0 | -0.0002387 | -8.522e-05 | 0 | 0 | 1652.4063 | 1766.8723 | 0 | 1.4907 | 9.1019 | 37.9473 | 50.5964 | 994.013 | 1060.435 | 0 | 0 | 14352.7713 | 17038.41 | 0 | 20 | 192 | 390 | 520 | 9579.5155 | 10220.0699 | 0 | 283 | 283 | 283 | 0 | 0 | 10754 |
2 | 0 | 8 | 0 | 283 | 1 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | 1 | No Discount | No Discount | 1 | 12 | No Discount | 1 | 1 | 1 | 1 | 1 | 0 | NULL | NULL | 38 | 12 | 6 | 30 | 7 | 6 | 6 | 5 | 6 | 6 | 0 | 0 | 1 | 2011 | 2011 | 2011 | 2011 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 30 | 30 | 1 | 11 | 5 | 5 | 6 | 0 | 6 | 2014 | 2011 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 12 | 9 | 4 | 2014 | 1 | 1 | 1 | 1 | 1 | 5 | 714.7043 | 70.5175 | 22.0367 | 807.2585 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 714.7043 | 1349.6 | 0 | 1 | 5 | 375 | 500 | 739.041 | 714.7043 | 0 | 0 | 714.7043 | 1349.6 | 0 | 1 | 5 | 375 | 500 | 739.041 | 714.7043 | 0 | 0 | 714.7043 | 1349.6 | 0 | 1 | 5 | 375 | 500 | 739.041 | 714.7043 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 714.7043 | 1349.6 | 0 | 1 | 5 | 375 | 500 | 739.041 | 714.7043 | 0 | 283 | 283 | 283 | 0 | 0 | 10754 |
3 | 0 | 8 | 0 | 279 | 5 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | 6 | No Discount | No Discount | 1 | 2 | No Discount | 1 | 1 | 1 | 1 | 1 | 0 | NULL | NULL | 80 | 12 | 6 | 30 | 7 | 6 | 6 | 5 | 6 | 6 | 0 | 0 | 1 | 2011 | 2011 | 2011 | 2011 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 30 | 30 | 1 | 11 | 5 | 5 | 6 | 0 | 6 | 2014 | 2011 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 12 | 9 | 4 | 2014 | 1 | 1 | 1 | 1 | 5 | 5 | 5596.4705 | 537.2612 | 167.8941 | 6301.6258 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1749.588 | 1457.99 | 0 | 2 | 30 | 75 | 100 | 884.7083 | 874.794 | 0 | 0 | 932.7451 | 1007.99 | 0 | 1.5 | 29.3333 | 75 | 100 | 619.3738 | 571.2373 | 0 | 0 | 419.4589 | 782.99 | 0 | 1 | 28 | 75 | 100 | 486.7066 | 419.4589 | 0 | 0 | 0.7356 | 0.9682 | 0 | 0 | -0.9682 | 0 | 0 | 0.9682 | 0.9682 | 0 | 0 | 653.2467 | 348.5685 | 0 | 0.5477 | 1.0328 | 0 | 0 | 205.5272 | 235.134 | 0 | 0 | 5596.4705 | 6047.94 | 0 | 9 | 176 | 450 | 600 | 3716.243 | 3427.4236 | 0 | 279 | 279 | 279 | 0 | 0 | 22320 |
4 | 0 | 8 | 0 | 277 | 4 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | 1 | No Discount | No Discount | 1 | 2 | No Discount | 1 | 1 | 1 | 1 | 1 | 0 | NULL | NULL | 76 | 12 | 6 | 30 | 7 | 6 | 6 | 5 | 6 | 6 | 0 | 0 | 1 | 2011 | 2011 | 2011 | 2011 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 30 | 30 | 1 | 11 | 5 | 5 | 6 | 0 | 6 | 2014 | 2011 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 12 | 9 | 4 | 2014 | 1 | 1 | 1 | 1 | 4 | 5 | 874.794 | 83.9802 | 26.2438 | 985.018 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 874.794 | 1457.99 | 0 | 1 | 28 | 75 | 100 | 884.7083 | 874.794 | 0 | 0 | 874.794 | 1457.99 | 0 | 1 | 28 | 75 | 100 | 884.7083 | 874.794 | 0 | 0 | 874.794 | 1457.99 | 0 | 1 | 28 | 75 | 100 | 884.7083 | 874.794 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 874.794 | 1457.99 | 0 | 1 | 28 | 75 | 100 | 884.7083 | 874.794 | 0 | 277 | 277 | 277 | 0 | 0 | 21052 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
3874 | 1 | 8 | 1 | NULL | 9 | 1 | 1 | 8 | 0 | 1 | 1 | 1 | 2 | No Discount | No Discount | 0 | 2 | No Discount | 1 | 1 | 2 | 2 | 1 | 1 | 1 | 1 | 0 | 12 | 6 | 30 | 7 | 1 | 1 | 12 | 1 | 6 | 0 | 0 | 1 | 2014 | 2014 | 2013 | 2014 | 8 | 2 | 26 | 3 | 4 | 4 | 3 | 4 | 0 | 1 | 1 | 2 | 2013 | 2013 | 2013 | 2013 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 30 | 30 | 1 | 11 | 12 | 5 | 6 | 0 | 6 | 2014 | 2013 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 9 | 1 | 1725.98 | 138.0784 | 43.1495 | 1907.2079 | 51.7855 | nan | 1 | 2071.4196 | 165.7136 | 9 | 2288.9187 | 51.7855 | nan | 1 | 2071.4196 | 165.7136 | 9 | 2288.9187 | 51.7855 | nan | 1 | 2071.4196 | 165.7136 | 9 | 2288.9187 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 51.7855 | 0 | 1 | 2071.4196 | 165.7136 | 9 | 2288.9187 | 0 | 1700.99 | 1700.99 | 0 | 1 | 89 | 375 | 500 | 1082.51 | 1700.99 | 0 | 0 | 862.99 | 862.99 | 0 | 1 | 58 | 225 | 300 | 545.9281 | 862.99 | 0 | 0 | 24.99 | 24.99 | 0 | 1 | 27 | 75 | 100 | 9.3463 | 24.99 | 0 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 1185.111 | 1185.111 | 0 | 0 | 43.8406 | 212.132 | 282.8427 | 758.8413 | 1185.111 | 0 | 0 | 1725.98 | 1725.98 | 0 | 2 | 116 | 450 | 600 | 1091.8563 | 1725.98 | 0 | nan | nan | nan | nan | nan | 0 |
3875 | 1 | 8 | 1 | NULL | 9 | 1 | 1 | 8 | 0 | 1 | 1 | 1 | 4 | No Discount | No Discount | 0 | 37 | No Discount | 1 | 1 | 2 | 3 | 1 | 1 | 1 | 1 | 0 | 12 | 6 | 30 | 7 | 1 | 1 | 12 | 1 | 6 | 0 | 0 | 1 | 2014 | 2014 | 2013 | 2014 | 24 | 18 | 11 | 19 | 10 | 10 | 10 | 10 | 0 | 1 | 1 | 2 | 2011 | 2011 | 2011 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 30 | 30 | 1 | 11 | 12 | 5 | 6 | 0 | 6 | 2014 | 2013 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 9 | 1 | 2488.93 | 199.1144 | 62.2233 | 2750.2677 | 89.4568 | nan | 1 | 3578.27 | 286.2616 | 9 | 3953.9884 | 89.4568 | nan | 1 | 3578.27 | 286.2616 | 9 | 3953.9884 | 89.4568 | nan | 1 | 3578.27 | 286.2616 | 9 | 3953.9884 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 89.4568 | 0 | 1 | 3578.27 | 286.2616 | 9 | 3953.9884 | 0 | 2443.35 | 2443.35 | 0 | 1 | 93 | 375 | 500 | 1554.9479 | 2443.35 | 0 | 0 | 622.2325 | 622.2325 | 0 | 1 | 58.25 | 207 | 276 | 392.9987 | 622.2325 | 0 | 0 | 3.99 | 3.99 | 0 | 1 | 24 | 3 | 4 | 1.4923 | 3.99 | 0 | 0 | 1.9994 | 1.9994 | 0 | 0 | 0.002931 | -0.1153 | -0.1153 | 1.9998 | 1.9994 | 0 | 0 | 1214.1424 | 1214.1424 | 0 | 0 | 38.422 | 196.204 | 261.6053 | 774.6468 | 1214.1424 | 0 | 0 | 2488.93 | 2488.93 | 0 | 4 | 233 | 828 | 1104 | 1571.9949 | 2488.93 | 0 | nan | nan | nan | nan | nan | 0 |
3876 | 1 | 8 | 1 | NULL | 1 | 1 | 1 | 8 | 0 | 1 | 1 | 1 | 3 | No Discount | No Discount | 0 | 37 | No Discount | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 0 | 12 | 6 | 30 | 7 | 1 | 1 | 12 | 1 | 6 | 0 | 0 | 1 | 2014 | 2014 | 2013 | 2014 | 21 | 15 | 8 | 16 | 8 | 8 | 8 | 8 | 6 | 0 | 0 | 1 | 2011 | 2011 | 2011 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 30 | 30 | 1 | 11 | 12 | 5 | 6 | 0 | 6 | 2014 | 2013 | 2011 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | 1 | 2 | 1 | 1 | 1 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 1 | 1 | 1248.83 | 99.9064 | 31.2208 | 1379.9572 | 84.3748 | nan | 1 | 3374.99 | 269.9992 | 1 | 3729.364 | 84.3748 | nan | 1 | 3374.99 | 269.9992 | 1 | 3729.364 | 84.3748 | nan | 1 | 3374.99 | 269.9992 | 1 | 3729.364 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 84.3748 | 0 | 1 | 3374.99 | 269.9992 | 1 | 3729.364 | 0.02 | 1214.85 | 1214.85 | 11 | 1 | 94 | 375 | 500 | 755.1508 | 1214.85 | 0 | 0.006667 | 416.2767 | 416.2767 | 3.6667 | 1 | 73.3333 | 275 | 366.6667 | 255.9531 | 416.2767 | 0 | 0 | 4.99 | 4.99 | 0 | 1 | 35 | 75 | 100 | 1.8663 | 4.99 | 0 | 1.7321 | 1.7297 | 1.7297 | 1.7321 | 0 | -1.7162 | -1.7321 | -1.7321 | 1.7312 | 1.7297 | 0 | 0.01155 | 691.6889 | 691.6889 | 6.3509 | 0 | 33.2315 | 173.2051 | 230.9401 | 432.3412 | 691.6889 | 0 | 0.02 | 1248.83 | 1248.83 | 11 | 3 | 220 | 825 | 1100 | 767.8594 | 1248.83 | 0 | nan | nan | nan | nan | nan | 0 |
3877 | 1 | 8 | 1 | NULL | 4 | 1 | 1 | 8 | 0 | 1 | 1 | 1 | 4 | No Discount | No Discount | 0 | 28 | No Discount | 1 | 1 | 2 | 3 | 1 | 1 | 1 | 1 | 0 | 12 | 6 | 30 | 7 | 1 | 1 | 12 | 1 | 6 | 0 | 0 | 1 | 2014 | 2014 | 2013 | 2014 | 24 | 18 | 11 | 19 | 4 | 4 | 4 | 4 | 1 | 2 | 2 | 3 | 2012 | 2012 | 2012 | 2012 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 30 | 30 | 1 | 11 | 12 | 5 | 6 | 0 | 6 | 2014 | 2013 | 2011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 4 | 1 | 2453.04 | 196.2432 | 61.326 | 2710.6092 | 84.3748 | nan | 1 | 3374.99 | 269.9992 | 4 | 3729.364 | 84.3748 | nan | 1 | 3374.99 | 269.9992 | 4 | 3729.364 | 84.3748 | nan | 1 | 3374.99 | 269.9992 | 4 | 3729.364 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 84.3748 | 0 | 1 | 3374.99 | 269.9992 | 4 | 3729.364 | 0 | 2384.07 | 2384.07 | 0 | 1 | 113 | 75 | 100 | 1481.9379 | 2384.07 | 0 | 0 | 613.26 | 613.26 | 0 | 1 | 91.25 | 21 | 28 | 376.9332 | 613.26 | 0 | 0 | 4.99 | 4.99 | 0 | 1 | 34 | 3 | 4 | 1.8663 | 4.99 | 0 | 0 | 1.9978 | 1.9978 | 0 | 0 | -1.9747 | 2. | 2. | 1.9992 | 1.9978 | 0 | 0 | 1180.758 | 1180.758 | 0 | 0 | 38.2481 | 36 | 48 | 736.7187 | 1180.758 | 0 | 0 | 2453.04 | 2453.04 | 0 | 4 | 365 | 84 | 112 | 1507.7328 | 2453.04 | 0 | nan | nan | nan | nan | nan | 0 |
3878 | 1 | 8 | 1 | NULL | 1 | 0 | NULL | NULL | NULL | NULL | NULL | NULL | 3 | No Discount | No Discount | 0 | 2 | No Discount | 2 | 2 | 2 | 3 | 2 | 2 | 1 | 2 | 0 | 12 | 6 | 30 | 7 | 1 | 1 | 12 | 1 | 6 | 0 | 0 | 1 | 2014 | 2014 | 2013 | 2014 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 30 | 30 | 1 | 11 | 12 | 5 | 6 | 0 | 6 | 2014 | 2013 | 2011 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | 1 | 2 | 1 | 1 | 1 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 1 | 1 | 1179.97 | 94.3976 | 29.4993 | 1303.8669 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.02 | 1120.49 | 1120.49 | 11 | 1 | 33 | 75 | 100 | 713.0798 | 1120.49 | 0 | 0.006667 | 393.3233 | 393.3233 | 3.6667 | 1 | 22 | 27 | 36 | 245.1085 | 393.3233 | 0 | 0 | 24.49 | 24.49 | 0 | 1 | 4 | 3 | 4 | 9.1593 | 24.49 | 0 | 1.7321 | 1.7315 | 1.7315 | 1.7321 | 0 | -1.6067 | 1.7321 | 1.7321 | 1.7319 | 1.7315 | 0 | 0.01155 | 629.7667 | 629.7667 | 6.3509 | 0 | 15.7162 | 41.5692 | 55.4256 | 405.2798 | 629.7667 | 0 | 0.02 | 1179.97 | 1179.97 | 11 | 3 | 66 | 81 | 108 | 735.3254 | 1179.97 | 0 | nan | nan | nan | nan | nan | 0 |
3879 rows x 227 columns
memory usage: 5.40 MB
name: featuretools_test
type: getml.DataFrame
We train an untuned XGBoostRegressor on top of featuretools' features, just like we have done for getML's features.
Since some of featuretools features are categorical, we allow the pipeline to include these features as well. Other features contain NaN values, which is why we also apply getML's Imputation preprocessor.
imputation = getml.preprocessors.Imputation()
predictor = getml.predictors.XGBoostClassifier(n_jobs=1)
pipe3 = getml.Pipeline(
tags=['featuretools'],
preprocessors=[imputation],
predictors=[predictor],
include_categorical=True,
)
pipe3
Pipeline(data_model='population', feature_learners=[], feature_selectors=[], include_categorical=True, loss_function='SquareLoss', peripheral=[], predictors=['XGBoostClassifier'], preprocessors=['Imputation'], share_selected_features=0.5, tags=['featuretools'])
pipe3.fit(featuretools_train)
Checking data model...
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
Preprocessing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Checking... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
The pipeline check generated 0 issues labeled INFO and 19 issues labeled WARNING.
To see the issues in full, run .check() on the pipeline.
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Preprocessing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 XGBoost: Training as predictor... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:06
Trained pipeline.
Time taken: 0:00:07.203621.
Pipeline(data_model='population', feature_learners=[], feature_selectors=[], include_categorical=True, loss_function='SquareLoss', peripheral=[], predictors=['XGBoostClassifier'], preprocessors=['Imputation'], share_selected_features=0.5, tags=['featuretools'])
featuretools_score = pipe3.score(featuretools_test)
featuretools_score
Staging... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00 Preprocessing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 00:00
date time | set used | target | accuracy | auc | cross entropy | |
---|---|---|---|---|---|---|
0 | 2024-09-12 13:04:10 | featuretools_train | churn | 0.9144 | 0.9697 | 0.2329 |
1 | 2024-09-12 13:04:10 | featuretools_test | churn | 0.9118 | 0.966 | 0.2428 |
2.6 Features¶
The most important features look as follows:
pipe1.features.to_sql()[pipe1.features.sort(by="importances")[0].name]
DROP TABLE IF EXISTS "FEATURE_1_301";
CREATE TABLE "FEATURE_1_301" AS
SELECT MIN( t1."orderdate" - t2."t4__startdate" ) AS "feature_1_301",
t1.rowid AS rownum
FROM "POPULATION__STAGING_TABLE_1" t1
INNER JOIN "SALES_ORDER_DETAIL__STAGING_TABLE_2" t2
ON t1."salesorderid" = t2."salesorderid"
GROUP BY t1.rowid;
pipe2.features.to_sql()[pipe2.features.sort(by="importances")[0].name]
DROP TABLE IF EXISTS "FEATURE_1_1";
CREATE TABLE "FEATURE_1_1" AS
SELECT AVG(
CASE
WHEN ( t1."orderdate" - t2."t4__startdate" > 65976703.448276 ) AND ( t1."salespersonidcat" IN ( '279', '282', '276', '280', '283', '277', '275', '278', '281', '289', '290', '287', '284', '286', '288', '285' ) ) AND ( t2."month__strftime__m__modifieddate__mapping_2_target_1_avg" > 0.741158 ) THEN 1.944903076268523
WHEN ( t1."orderdate" - t2."t4__startdate" > 65976703.448276 ) AND ( t1."salespersonidcat" IN ( '279', '282', '276', '280', '283', '277', '275', '278', '281', '289', '290', '287', '284', '286', '288', '285' ) ) AND ( t2."month__strftime__m__modifieddate__mapping_2_target_1_avg" <= 0.741158 OR t2."month__strftime__m__modifieddate__mapping_2_target_1_avg" IS NULL ) THEN -1.952864243920228
WHEN ( t1."orderdate" - t2."t4__startdate" > 65976703.448276 ) AND ( t1."salespersonidcat" NOT IN ( '279', '282', '276', '280', '283', '277', '275', '278', '281', '289', '290', '287', '284', '286', '288', '285' ) OR t1."salespersonidcat" IS NULL ) AND ( t1."territoryidcat" IN ( '6' ) ) THEN 0.3081692636953738
WHEN ( t1."orderdate" - t2."t4__startdate" > 65976703.448276 ) AND ( t1."salespersonidcat" NOT IN ( '279', '282', '276', '280', '283', '277', '275', '278', '281', '289', '290', '287', '284', '286', '288', '285' ) OR t1."salespersonidcat" IS NULL ) AND ( t1."territoryidcat" NOT IN ( '6' ) OR t1."territoryidcat" IS NULL ) THEN 1.657101846423358
WHEN ( t1."orderdate" - t2."t4__startdate" <= 65976703.448276 OR t1."orderdate" IS NULL OR t2."t4__startdate" IS NULL ) AND ( t1."territoryidcat" IN ( '2', '10', '9', '8' ) ) AND ( t2."t3__productmodelid" IN ( '1', '14', '31', '23', '19', '21', '12', '2', '10', '17', '22', '35', '34', '36', '24', '18', '20', '25', '28', '37', '26', '29', '30', '32', '16', '9', '6', '15', '13', '8', '7', '3', '5', '104', '106', '60', '61', '52', '54', '55', '56', '42', '45', '46', '50', '51', '123', '124', '125', '126', '77', '78', '115', '116', '38', '114', '118', '119', '107', '127', '128', '79', '80', '66', '65', '67', '62', '63', '68', '70', '53', '103', '47', '48', '102', '99', '101', '98', '95', '97' ) ) THEN -1.994292123763791
WHEN ( t1."orderdate" - t2."t4__startdate" <= 65976703.448276 OR t1."orderdate" IS NULL OR t2."t4__startdate" IS NULL ) AND ( t1."territoryidcat" IN ( '2', '10', '9', '8' ) ) AND ( t2."t3__productmodelid" NOT IN ( '1', '14', '31', '23', '19', '21', '12', '2', '10', '17', '22', '35', '34', '36', '24', '18', '20', '25', '28', '37', '26', '29', '30', '32', '16', '9', '6', '15', '13', '8', '7', '3', '5', '104', '106', '60', '61', '52', '54', '55', '56', '42', '45', '46', '50', '51', '123', '124', '125', '126', '77', '78', '115', '116', '38', '114', '118', '119', '107', '127', '128', '79', '80', '66', '65', '67', '62', '63', '68', '70', '53', '103', '47', '48', '102', '99', '101', '98', '95', '97' ) OR t2."t3__productmodelid" IS NULL ) THEN 2.393874664272962
WHEN ( t1."orderdate" - t2."t4__startdate" <= 65976703.448276 OR t1."orderdate" IS NULL OR t2."t4__startdate" IS NULL ) AND ( t1."territoryidcat" NOT IN ( '2', '10', '9', '8' ) OR t1."territoryidcat" IS NULL ) AND ( t2."t3__productmodelid" IN ( '1', '14', '31', '23', '19', '21', '2', '10', '11', '4', '17', '35', '33', '24', '18', '28', '37', '26', '29', '30', '32', '16', '9', '6', '13', '8', '7', '3', '5', '104', '106', '59', '60', '61', '52', '55', '42', '45', '46', '50', '51', '123', '124', '125', '78', '116', '38', '118', '119', '107', '127', '128', '79', '80', '81', '84', '66', '67', '62', '63', '64', '68', '69', '70', '53', '103', '47', '48', '102', '99', '100', '101', '98', '95', '97' ) ) THEN -1.575213171680517
WHEN ( t1."orderdate" - t2."t4__startdate" <= 65976703.448276 OR t1."orderdate" IS NULL OR t2."t4__startdate" IS NULL ) AND ( t1."territoryidcat" NOT IN ( '2', '10', '9', '8' ) OR t1."territoryidcat" IS NULL ) AND ( t2."t3__productmodelid" NOT IN ( '1', '14', '31', '23', '19', '21', '2', '10', '11', '4', '17', '35', '33', '24', '18', '28', '37', '26', '29', '30', '32', '16', '9', '6', '13', '8', '7', '3', '5', '104', '106', '59', '60', '61', '52', '55', '42', '45', '46', '50', '51', '123', '124', '125', '78', '116', '38', '118', '119', '107', '127', '128', '79', '80', '81', '84', '66', '67', '62', '63', '64', '68', '69', '70', '53', '103', '47', '48', '102', '99', '100', '101', '98', '95', '97' ) OR t2."t3__productmodelid" IS NULL ) THEN -0.7654079549065077
ELSE NULL
END
) AS "feature_1_1",
t1.rowid AS rownum
FROM "POPULATION__STAGING_TABLE_1" t1
INNER JOIN "SALES_ORDER_DETAIL__STAGING_TABLE_2" t2
ON t1."salesorderid" = t2."salesorderid"
GROUP BY t1.rowid;
2.7 Productionization¶
It is possible to productionize the pipeline by transpiling the features into production-ready SQL code. Please also refer to getML's sqlite3
and spark
modules.
# Creates a folder containing the SQL code.
pipe2.features.to_sql().save("adventure_works", remove=True)
# Creates a folder containing the SQL code.
pipe2.features.to_sql(dialect=getml.pipeline.dialect.spark_sql).save("adventure_works_spark", remove=True)
2.8 Discussion¶
For a more convenient overview, we summarize our results into a table.
scores = [fastprop_score, relboost_score, featuretools_score]
pd.DataFrame(
data={
'Name': ['getML: FastProp', 'getML: Relboost', 'featuretools'],
'Accuracy': [f"{score.accuracy:.2%}" for score in scores],
'AUC': [f"{score.auc:.4f}" for score in scores],
'Cross entropy': [f"{score.cross_entropy:.4f}" for score in scores]
}
)
Name | Accuracy | AUC | Cross entropy | |
---|---|---|---|---|
0 | getML: FastProp | 91.42% | 0.9723 | 0.2199 |
1 | getML: Relboost | 92.86% | 0.9788 | 0.1888 |
2 | featuretools | 91.18% | 0.9660 | 0.2428 |
The picture we get is very consistent: Relboost outperforms FastProp and FastProp outperforms featuretools for all three measures.
These results also clearly demonstrate that this is a synthetic dataset - in the real-world, customer churn can not be predicted at this level of accuracy.
getml.engine.shutdown()
3. Conclusion¶
We demonstrated how getML can be used for a classic customer churn project. We have also benchmarked against featuretools. We found that getML outperforms featuretools.
References¶
Motl, Jan, and Oliver Schulte. "The CTU prague relational learning repository." arXiv preprint arXiv:1511.03086 (2015).