当前位置:
文档之家› 大数据复杂风控模型在PayPal Risk的应用
大数据复杂风控模型在PayPal Risk的应用
9
TRADITINAL FRAUD TYPES IN PAYPAL
404
STOLEN FINACIAL INR & SNAD *
INR: SNAD:
© 2015 PayPal Inc. All rights reserved. Confidential and proprietary.
Item Not Received Significantly Not as Described
Offline Variable Data warehouse (HDFS) Online Variables (NoSql DB) Offline Variables (NoSql DB) Offline Variable Computation
Model Training Platform
Online Variable Simulation
© 2015 PayPal Inc. All rights reserved. Confidential and proprietary.
17
LARGE-SCALE MODELING
Cross Validation Feature Selection Model Ensemble
#44566874232 #65888123344
18
AGENDA
PayPal & PayPal Risk Large Scale Machine Learning Solution Feature Engineering & Modeling Future Plan
© 2015 PayPal Inc. All rights reserved. Confidential and proprietary.
NEW INIT STATS
NORMALIZE
EVAL
VARSELECT
Built on Guagua*
POSTTRAIN*
TRAIN
Built on Guagua*
*Guagua is an iterative computing framework for both Hadoop MapReduce and Hadoop YARN
8
PAYPAL RISK
CONSUMER SELLER
TRUST
PAYPAL RISK
SAVE LOSS
BUYER/SELLER PROTECTION
ENABLE BUSINESS
© 2015 PayPal Inc. All rights reserved. Confidential and proprietary.
Offline Variable Data warehouse (HDFS) Online Variables (NoSql DB) Offline Variables (NoSql DB) Offline Variable Computation
Model Training Platform
Online Variable Simulation
0 9 20 32 40 54 58
© 2015 PayPal Inc. All rights reserved. Confidential and proprietary.
22
DEFAULT BINNING ALGORITHM (SORT)
10
“NEW” FRAUD TYPES IN PAYPAL
MONEY LAUNDRY
COLLUSION
© 2015 PayPal Inc. All rights reserved. Confidential and proprietary.
11
AGENDA
PayPal & PayPal Risk Large Scale Machine Learning Solution Feature Engineering & Modeling Future Plan
Large-Scale Machine Learning at PayPal Risk
1
TO DECLINE, OR NOT DECLINE?
7:16pmCEST PDT 4:16am 7:15pm PDT $100 USD €20 EUR
7:18pm PDT 12:18pm AEST
7:15pm: Card holder lives in US - His wife paid a bill online using their home laptop 4:16am CEST: Card holder’s son studies in Europe - He bought Angry Birds on iPad instead of studying 12:18pm AEST: Card holder travels to Australia - He just paid for lunch at a POS
STRONG FOUNDATION
STRONG MOMENTUM
Active Customer Accounts Gained in 2014
165 Million $235 Billion
Total Payment Volume
Active Customer Accounts
+19 Million +26% +19% +22%
riables Ne ed lots of va to make accu rate decisions… … but that im pacts decision spe ed
Reduce real-time data load… … but that impacts decision accuracy
0 15 24 32 40 46 60
Equal Positive (Each bin with two positive (red) elements)
0 26
30 38 50
60
Equal Negative (Each bin with two negative (green) elements)
© 2015 PayPal Inc. All rights reserved. Confidential and proprietary.
6
© 2015 PayPal Inc. All rights reserved. Confidential and proprietary.
7
PayPal: Leading the Digital Payments Revolution
Speed
© 2015 PayPal Inc. All rights reserved. Confidential and proprietary.
13
RISK MODELING ARCHITECTURE
Online System Offline System
Decision Services Logging System (Kafka + Storm) Variable Services Model Services Models Monitor System
© 2015 PayPal Inc. All rights reserved. Confidential and proprietary.
15
SHIFU: COMBINING FEATURE ENGNEERING AND DATA MODELING
Shifu is an open-source, end-to-end machine learning and data mining framework built on top of Hadoop.
© 2015 PayPal Inc. All rights reserved. Confidential and proprietary.
5
AGENDA
PayPal & PayPal Risk Large Scale Machine Learning Solution Feature Engineering & Modeling Future Plan
Positive Rates Negative Rates
Kolmogorov–Smirnov Values
Information Values Weight of Evidence Values
Skewness & Kurtosis: /div898/handbook/eda/section3/eda35b.htm
© 2015 PayPal Inc. All rights reserved. Confidential and proprietary.
12
TRADE-OFF IN RISK DECISION PLATFORM Accuracy, speed & robustness are conflicting requirements!
Training Set(60%)
#231889456 1 . #2223644887
Cross Validation Dataset(20%) Test Set(20%)
© 2015 PayPal Inc. All rights reserved. Confidential and proprietary.
Unit/Weighted Wised
Mean Std-dev Max
Min Skewness*
Kurtosis* Missing Rate
Unit/Weighted Binning
Boundaries
Counts Positive Counts Negative Counts