Reinforcement Learning
Reinforcement
Computer Science
Cognitive Science
Artificial Intelligence
Psychology
Social Psychology
Authors
DeepSeek-AI,Daya Guo,Dejian Yang,Haowei Zhang,Junxiao Song,Ruoyu Zhang,Runxin Xu,Qihao Zhu,Shirong Ma,Peiyi Wang,Xiao Bi,Xiaokang Zhang,Xingkai Yu,Yu Wu,Zhenhua Wu,Zhibin Gou,Zhihong Shao,Zhuoshu Li,Ziyi Gao,Aixin Liu
Source
Journal: Cornell University - arXiv
Date: 2025-01-22
Citations: 8
Identifiers
DOI:10.48550/arxiv.2501.12948
Abstract
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally develops numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.
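The abstract describes large-scale RL with verifiable rewards rather than SFT as the first training step. As an illustration only, the sketch below shows one commonly cited ingredient of this setup: a rule-based reward checked against a reference answer, and group-relative advantage normalization in the style of GRPO. The GRPO connection, the function names, and the toy reward are assumptions for illustration, not the authors' implementation.

from statistics import mean, pstdev

def exact_match_reward(answer: str, reference: str) -> float:
    # Hypothetical rule-based reward: 1.0 if the final answer matches the
    # reference exactly, else 0.0 (no learned reward model involved).
    return 1.0 if answer.strip() == reference.strip() else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    # Normalize each sampled completion's reward against the mean and standard
    # deviation of its own sampling group, so no value/critic model is needed.
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against all-equal rewards
    return [(r - mu) / sigma for r in rewards]

if __name__ == "__main__":
    # Four sampled answers to one math prompt, scored against the reference "42".
    samples = ["42", "41", "42", "forty-two"]
    rewards = [exact_match_reward(s, "42") for s in samples]
    print(group_relative_advantages(rewards))

In a full RL loop these advantages would weight a policy-gradient update on the sampled reasoning traces; the snippet above only shows the reward and normalization step under the stated assumptions.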