Generalized linear models with nonlinear feature transformations are widely used for large-scale regression and classification problems with sparse inputs. Memorization of feature interactions through a wide set of cross-product feature transformations is effective and interpretable, while generalization requires more feature engineering effort. With less feature engineering, deep neural networks can generalize better to unseen feature combinations through low-dimensional dense embeddings learned for the sparse features. However, deep neural networks with embeddings can over-generalize and recommend less relevant items when the user-item interactions are sparse and high-rank. In this paper, we present Wide & Deep learning---jointly trained wide linear models and deep neural networks---to combine the benefits of memorization and generalization for recommender systems. We productionized and evaluated the system on Google Play, a commercial mobile app store with over one billion active users and over one million apps. Online experiment results show that Wide & Deep significantly increased app acquisitions compared with wide-only and deep-only models. We have also open-sourced our implementation in TensorFlow.
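The core idea above is that a single logit combines a linear model over sparse cross-product features (the wide part) with an MLP over learned embeddings of the sparse features (the deep part). The following is a minimal numpy sketch of that joint scoring function, not the open-sourced TensorFlow implementation; all dimensions, weights, and names (`n_wide`, `predict`, etc.) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
n_wide = 20      # binary cross-product feature dimension (wide part)
n_fields = 5     # number of categorical fields (deep part)
vocab = 100      # vocabulary size per field
emb_dim = 8      # embedding dimension per field
hidden = 16      # hidden layer width

# Wide part: a linear model over sparse cross-product features.
w_wide = rng.normal(size=n_wide)

# Deep part: per-field embedding tables feeding a small ReLU MLP.
emb = rng.normal(size=(n_fields, vocab, emb_dim))
W1 = rng.normal(size=(n_fields * emb_dim, hidden))
b1 = np.zeros(hidden)
w_deep = rng.normal(size=hidden)


def predict(x_wide, x_ids):
    """Joint score: sigmoid(wide logit + deep logit)."""
    wide_logit = x_wide @ w_wide
    # Look up one embedding per categorical field and concatenate.
    e = np.concatenate([emb[f, x_ids[f]] for f in range(n_fields)])
    h = np.maximum(0.0, e @ W1 + b1)  # ReLU hidden layer
    deep_logit = h @ w_deep
    return 1.0 / (1.0 + np.exp(-(wide_logit + deep_logit)))


x_wide = rng.integers(0, 2, size=n_wide).astype(float)  # binary crosses
x_ids = rng.integers(0, vocab, size=n_fields)           # one id per field
p = predict(x_wide, x_ids)  # predicted probability in (0, 1)
```

In the jointly trained model both logits share one loss, so the wide part only needs to memorize the exceptions the deep part gets wrong, rather than fit the full label on its own as in an ensemble.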

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Click-Through Rate Prediction | Amazon | Wide & Deep | AUC | 0.8637 | # 5 |
| Click-Through Rate Prediction | Bing News | Wide & Deep | AUC | 0.8377 | # 2 |
| | | | Log Loss | 0.2668 | # 2 |
| Click-Through Rate Prediction | Company* | Wide & Deep (FM & DNN) | AUC | 0.8661 | # 6 |
| | | | Log Loss | 0.02640 | # 6 |
| Click-Through Rate Prediction | Company* | Wide & Deep (LR & DNN) | AUC | 0.8673 | # 3 |
| | | | Log Loss | 0.02634 | # 3 |
| Click-Through Rate Prediction | Criteo | Wide & Deep | AUC | 0.7981 | # 34 |
| | | | Log Loss | 0.46772 | # 22 |
| Click-Through Rate Prediction | Dianping | Wide & Deep | AUC | 0.8361 | # 4 |
| | | | Log Loss | 0.3364 | # 3 |
| News Recommendation | MIND | Wide & Deep | AUC | 62.16 | # 8 |
| | | | MRR | 29.31 | # 8 |
| | | | nDCG@5 | 31.28 | # 8 |
| | | | nDCG@10 | 37.12 | # 8 |
| Click-Through Rate Prediction | MovieLens 20M | Wide & Deep | AUC | 0.7304 | # 6 |
