Computer science
Deep neural networks
Stochastic gradient descent
Counterfactual thinking
Regularization
Deep learning
Artificial intelligence
Estimator
Artificial neural network
Variance
Shrinkage
Empirical risk minimization
Machine learning
Recommender systems
Mathematics
Statistics
Programming language
Business
Philosophy
Accounting
Epistemology
Authors
Thorsten Joachims, Adith Swaminathan, Maarten de Rijke
Abstract
We propose a new output layer for deep neural networks that permits the use of logged contextual bandit feedback for training. Such contextual bandit feedback can be available in huge quantities (e.g., logs of search engines, recommender systems) at little cost, opening up a path for training deep networks on orders of magnitude more data. To this effect, we propose a Counterfactual Risk Minimization (CRM) approach for training deep networks using an equivariant empirical risk estimator with variance regularization, BanditNet, and show how the resulting objective can be decomposed in a way that allows Stochastic Gradient Descent (SGD) training. We empirically demonstrate the effectiveness of the method by showing how deep networks — ResNets in particular — can be trained for object recognition without conventionally labeled images.
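The abstract describes a Counterfactual Risk Minimization objective: an importance-weighted (inverse propensity scoring) estimate of risk from logged bandit feedback, regularized by its empirical variance. A minimal sketch of such a variance-regularized IPS estimate follows; the function name, the penalty weight `lam`, and the synthetic data are illustrative assumptions, not the paper's exact BanditNet formulation:

```python
import numpy as np

def crm_objective(losses, new_probs, logged_props, lam=1.0):
    """Variance-regularized IPS risk estimate on logged bandit feedback.

    losses       : observed loss delta(x_i, y_i) for each logged action
    new_probs    : pi_w(y_i | x_i) under the policy being trained
    logged_props : pi_0(y_i | x_i), propensities of the logging policy
    lam          : weight of the variance penalty (illustrative choice)
    """
    n = len(losses)
    weights = new_probs / logged_props      # importance weights
    terms = losses * weights                # per-example IPS terms
    risk = terms.mean()                     # empirical risk estimate
    var = terms.var(ddof=1)                 # sample variance of the terms
    return risk + lam * np.sqrt(var / n)    # CRM-style objective

# Example: logged feedback from a uniform logging policy over 4 actions
rng = np.random.default_rng(0)
losses = rng.uniform(0.0, 1.0, size=100)
logged = np.full(100, 0.25)
new = np.clip(logged + rng.normal(0.0, 0.05, 100), 0.01, 1.0)
obj = crm_objective(losses, new, logged)
```

Because both the mean and the variance terms are sums over examples, the objective can be rearranged into per-example contributions, which is what makes SGD training of the deep network feasible, as the abstract notes.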