Computer science
Artificial intelligence
Semi-supervised learning
Kernel (algebra)
Machine learning
Support vector machine
Construct (Python library)
Task (project management)
Transfer learning
Feature learning
Pattern recognition (psychology)
Generative model
Class (philosophy)
Coding (social sciences)
Representation (politics)
Labeled data
Generative grammar
Mathematics
Combinatorics
Statistics
Politics
Economics
Management
Programming language
Law
Political science
Authors
Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, Andrew Y. Ng
Source
Venue: International Conference on Machine Learning
Date: 2007-06-20
Citations: 1504
Identifier
DOI:10.1145/1273496.1273592
Abstract
We present a new machine learning framework called "self-taught learning" for using unlabeled data in supervised classification tasks. We do not assume that the unlabeled data follows the same class labels or generative distribution as the labeled data. Thus, we would like to use a large number of unlabeled images (or audio samples, or text documents) randomly downloaded from the Internet to improve performance on a given image (or audio, or text) classification task. Such unlabeled data is significantly easier to obtain than in typical semi-supervised or transfer learning settings, making self-taught learning widely applicable to many practical learning problems. We describe an approach to self-taught learning that uses sparse coding to construct higher-level features using the unlabeled data. These features form a succinct input representation and significantly improve classification performance. When using an SVM for classification, we further show how a Fisher kernel can be learned for this representation.
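The pipeline the abstract describes — learn sparse-coding bases from unlabeled data, re-encode the labeled examples as sparse activations, then train an SVM on that representation — can be sketched as follows. This is a minimal illustration using scikit-learn's dictionary learning as a stand-in for the paper's sparse-coding algorithm; the data shapes and all parameter values are assumptions for the example, not the authors' settings.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)

# Unlabeled data: plentiful, and not assumed to share the labeled data's
# class labels or generative distribution (e.g. random images from the web).
X_unlabeled = rng.randn(500, 64)

# Small labeled training set for the actual classification task.
X_labeled = rng.randn(40, 64)
y_labeled = rng.randint(0, 2, size=40)

# Step 1: learn a set of basis vectors (the "higher-level features")
# from the unlabeled data alone.
coder = MiniBatchDictionaryLearning(n_components=32, alpha=1.0, random_state=0)
coder.fit(X_unlabeled)

# Step 2: represent each labeled example by its sparse activation vector
# over the learned bases.
codes = coder.transform(X_labeled)

# Step 3: train a standard supervised classifier on the new representation.
clf = LinearSVC().fit(codes, y_labeled)
print(codes.shape)  # sparse-code features: one 32-d activation per example
```

Note that step 1 never touches the labels, which is what distinguishes this setting from ordinary semi-supervised learning; the paper's further refinement of learning a Fisher kernel for the SVM is not shown here.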