Multi-layer features ablation of BERT model and its application in stock trend prediction

Authors
Feng Zhao,Xinning Li,Yating Gao,Ying Li,Zhiquan Feng,Caiming Zhang
Source
Journal: Expert Systems With Applications [Elsevier]
Volume: 207, Article 117958
Identifier
DOI:10.1016/j.eswa.2022.117958
Abstract

Stock comments published by experts are important references for accurate stock trend prediction. How to comprehensively and accurately capture the topic of expert stock comments is an important problem that belongs to text classification. The Bidirectional Encoder Representations from Transformers (BERT) pretrained language model is widely used for text classification due to its high identification accuracy. However, BERT has some limitations. First, it only accepts fixed-length text, leading to suboptimal performance when exploring long texts. Second, it relies only on the features extracted from the last layer, resulting in incomplete classification features. To tackle these issues, we propose a multi-layer features ablation study of the BERT model for accurate identification of stock comments' themes. Specifically, we first divide the original text, using sliding-window technology, to meet the length requirement of the BERT model. In this way, we enlarge the sample size, which helps reduce over-fitting. At the same time, by dividing the long text into multiple short texts, all the information of the long text can be comprehensively captured by synthesizing the topic information of the multiple short texts. In addition, we extract the output features of each layer in the BERT model and apply an ablation strategy to extract the more effective information among these features. Experimental results demonstrate that, compared with non-intercepted comments, topic recognition accuracy is improved by intercepting stock comments with sliding-window technology, which shows that intercepting text can improve text-classification performance. Compared with plain BERT, the multi-layer features ablation study presented in this paper further improves topic recognition of stock comments and can serve as a reference for investors.
Our approach achieves better performance and practicability in stock trend prediction through stock-comment topic recognition.
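The sliding-window splitting described in the abstract can be sketched as follows. This is a minimal illustration, assuming the comment has already been tokenized; the function name, window size, and stride are illustrative choices (BERT accepts at most 512 tokens including the [CLS] and [SEP] specials), not values taken from the paper.

```python
def split_with_sliding_window(tokens, window_size=510, stride=255):
    """Split a long token sequence into overlapping windows so each
    chunk fits BERT's input limit (512 tokens incl. [CLS]/[SEP]).

    Overlap (stride < window_size) keeps context that would be lost
    at hard chunk boundaries; each chunk also becomes an extra
    training sample, enlarging the effective sample size.
    """
    if len(tokens) <= window_size:
        return [tokens]
    windows = []
    start = 0
    while start < len(tokens):
        windows.append(tokens[start:start + window_size])
        if start + window_size >= len(tokens):
            break  # the last window reached the end of the text
        start += stride
    return windows
```

Each chunk is then classified independently, and the chunk-level predictions for one comment are merged (e.g. by voting or averaging) to recover the topic of the full long text.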
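The multi-layer features ablation can be sketched in the same spirit: pool a per-layer feature vector from each BERT layer, then drop one layer at a time and score the fused remainder to find which layers contribute effective information. Everything here is a hedged stand-in: the pooling (element-wise mean), the leave-one-layer-out loop, and the `score_fn` hook are illustrative; the paper's actual classifier and ablation criterion are not specified in the abstract.

```python
def mean_pool(vectors):
    """Element-wise mean of equally sized feature vectors
    (a simple stand-in for fusing per-layer features)."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def ablate_layers(layer_features, score_fn):
    """Leave-one-layer-out ablation: drop each layer in turn,
    fuse the remaining layers' features, and score the result.
    Returns {dropped_layer_index: score}; low scores flag layers
    whose removal hurts, i.e. layers carrying useful information.
    """
    results = {}
    for drop in range(len(layer_features)):
        kept = [f for i, f in enumerate(layer_features) if i != drop]
        results[drop] = score_fn(mean_pool(kept))
    return results
```

In practice `layer_features` would hold one pooled vector per BERT layer (13 for BERT-base, counting embeddings) and `score_fn` would be validation accuracy of the downstream topic classifier.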