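Random forest
Artificial intelligence
Computer science
Deep learning
Domain (mathematics)
Classifier (UML)
Natural language processing
Machine learning
Domain (mathematical analysis)
Artificial neural network
Information retrieval
Mathematics
Mathematical analysis
Pure mathematics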
Authors
Haihua Chen,Lei Wu,Jiangping Chen,Wei Lu,Junhua Ding
Identifier
DOI:10.1016/j.ipm.2021.102798
Abstract
Automated legal text classification is a prominent research topic in the legal field and lays the foundation for building an intelligent legal system. The current literature focuses on legal texts from outside the United States, such as Chinese, European, and Australian cases, and pays little attention to text classification for U.S. legal texts. Deep learning has been applied to improve text classification performance, but its effectiveness needs further exploration in domains such as the legal field. This paper investigates legal text classification on a large collection of labeled U.S. case documents by comparing the effectiveness of different text classification techniques. We propose a machine learning algorithm that uses domain concepts as features and random forests as the classifier. Our experimental results on 30,000 full U.S. case documents in 50 categories demonstrate that our approach significantly outperforms a deep learning system built on multiple pre-trained word embeddings and deep neural networks. In addition, using only the top 400 domain concepts as features for building the random forests achieves the best performance. This study provides a reference for selecting machine learning techniques when building high-performance text classification systems in the legal domain or other fields.
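The idea of using a limited set of domain concepts as features can be illustrated with a minimal Python sketch. It is not the authors' implementation: the concept list, the toy documents, the substring matching, and the ranking by document frequency are all assumptions made for illustration, since the abstract does not state how concepts are extracted or how the top 400 are chosen. The function names `count_concepts`, `top_k_concepts`, and `to_feature_vectors` are hypothetical.

```python
from collections import Counter


def count_concepts(text, concepts):
    """Count case-insensitive substring occurrences of each concept phrase."""
    lowered = text.lower()
    return {c: lowered.count(c.lower()) for c in concepts}


def top_k_concepts(documents, concepts, k):
    """Keep the k concepts that appear in the most documents.

    Ranking by document frequency is an assumption of this sketch;
    the paper may rank concepts differently.
    """
    doc_freq = Counter()
    for text in documents:
        counts = count_concepts(text, concepts)
        doc_freq.update(c for c, n in counts.items() if n > 0)
    # Descending frequency, then alphabetical order to break ties stably.
    ranked = sorted(concepts, key=lambda c: (-doc_freq[c], c))
    return ranked[:k]


def to_feature_vectors(documents, selected):
    """Turn each document into a vector of counts over the selected concepts."""
    vectors = []
    for text in documents:
        counts = count_concepts(text, selected)
        vectors.append([counts[c] for c in selected])
    return vectors


# Toy data, invented for this example only.
documents = [
    "The court granted summary judgment on the breach of contract claim.",
    "Defendant moved for summary judgment; negligence was not shown.",
    "The breach of contract was undisputed.",
]
concepts = ["summary judgment", "breach of contract", "negligence", "habeas corpus"]

selected = top_k_concepts(documents, concepts, k=2)
vectors = to_feature_vectors(documents, selected)
```

The resulting count vectors, together with category labels, could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`. In the setting the abstract describes, `k` would be 400 rather than 2.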