Unstructured data
Computer science
Language model
Finance
Data science
Natural language processing
Data mining
Big data
Business
Authors
Huaxia Li,Haoyun Gao,Chengzhang Wu,Miklos A. Vasarhelyi
Source
Journal: Journal of Information Systems
[American Accounting Association]
Date: 2024-08-01
Volume/issue: not listed; pages: 1-22
Citations: 2
Identifier
DOI:10.2308/isys-2023-047
Abstract
This research addresses the challenge of extracting financial data from unstructured sources, a persistent issue for accounting researchers, investors, and regulators. Leveraging large language models (LLMs), this study introduces a novel framework for automated financial data extraction from Portable Document Format (PDF)-formatted files. Following a design science methodology, this research develops the framework through a combination of text mining and prompt engineering techniques. The framework is subsequently applied to analyze governmental annual reports and corporate environmental, social, and governance reports, which are presented in PDF format. Test results indicate that the framework achieves an average 99.5 percent accuracy rate in a notably short time span when extracting key financial indicators. A subsequent large out-of-sample test reveals an overall accuracy rate converging around 96 percent. This study contributes to the evolving literature on applying LLMs in accounting and offers a valuable tool for both academic and industrial applications.
Data Availability: Data are available upon request.
JEL Classifications: M41; O31; C81.