A Self-Rewarding Mechanism in Deep Reinforcement Learning for Trading Strategy Optimization

Keywords: Reinforcement learning; Deep learning; Machine learning; Bellman equation; Trading strategy; Reward function; Mathematical optimization; Finance
Authors
Yuling Huang, Chujin Zhou, Lin Zhang, Xiaoping Lu
Source
Journal: Mathematics [Multidisciplinary Digital Publishing Institute]
Volume/Issue: 12 (24): 4020
Identifier
DOI: 10.3390/math12244020
Abstract

Reinforcement Learning (RL) is increasingly being applied to complex decision-making tasks such as financial trading. However, designing effective reward functions remains a significant challenge. Traditional static reward functions often fail to adapt to dynamic environments, leading to inefficiencies in learning. This paper presents a novel approach, called Self-Rewarding Deep Reinforcement Learning (SRDRL), which integrates a self-rewarding network within the RL framework. The SRDRL mechanism operates in two primary phases: First, supervised learning techniques are used to learn from expert knowledge by employing advanced time-series feature extraction models, including TimesNet and WFTNet. This step refines the self-rewarding network parameters by comparing predicted rewards with expert-labeled rewards, which are based on metrics such as Min-Max, Sharpe Ratio, and Return. In the second phase, the model selects the higher value between the expert-labeled and predicted rewards as the RL reward, storing it in the replay buffer. This combination of expert knowledge and predicted rewards enhances the performance of trading strategies. The proposed implementation, called Self-Rewarding Double DQN (SRDDQN), demonstrates that the self-rewarding mechanism improves learning and optimizes trading decisions. Experiments conducted on datasets including DJI, IXIC, and SP500 show that SRDDQN achieves a cumulative return of 1124.23% on the IXIC dataset, significantly outperforming the next best method, Fire (DQN-HER), which achieved 51.87%. SRDDQN also enhances the stability and efficiency of trading strategies, providing notable improvements over traditional RL methods. The integration of a self-rewarding mechanism within RL addresses a critical limitation in reward function design and offers a scalable, adaptable solution for complex, dynamic trading environments.
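The phase-two selection rule described above — take the larger of the expert-labeled reward and the self-rewarding network's predicted reward, then store the transition in the replay buffer — can be sketched minimally as follows. This is an illustrative reconstruction, not the authors' implementation: the names `SelfRewardingBuffer` and `sharpe_ratio` are hypothetical, and the Sharpe-ratio labeler is shown only as one example of the expert-reward metrics mentioned (Min-Max, Sharpe Ratio, Return), without annualization or the TimesNet/WFTNet feature extractors.

```python
import math
from collections import deque


def sharpe_ratio(returns, risk_free=0.0):
    """One possible expert-reward label: per-step Sharpe ratio over a
    window of returns (annualization omitted for simplicity)."""
    n = len(returns)
    mean = sum(returns) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in returns) / n)
    if std == 0.0:
        return 0.0
    return (mean - risk_free) / std


class SelfRewardingBuffer:
    """Replay buffer implementing the phase-two rule: the stored reward is
    the maximum of the expert-labeled and network-predicted rewards."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, expert_reward, predicted_reward,
             next_state, done):
        # Select the higher value as the RL reward, per the SRDRL mechanism.
        reward = max(expert_reward, predicted_reward)
        self.buffer.append((state, action, reward, next_state, done))
        return reward
```

In a full SRDDQN loop, `predicted_reward` would come from the self-rewarding network refined in phase one, and the Double DQN agent would sample from this buffer for its temporal-difference updates.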
