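Computer science
Pooling
Artificial intelligence
Process (computing)
Semantics (computer science)
Machine learning
Sequence (biology)
Information retrieval
Data mining
Genetics
Biology
Operating system
Programming language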
Authors
Yunduo Liu, Xu Fang, Yushan Zhao, Zichen Ma, Tengke Wang, Shunxiang Zhang, Yuhao Tian
Identifier
DOI:10.1080/09540091.2023.2295818
Abstract
To further improve the accuracy of Chinese patent classification, this paper proposes a model that builds on the patent structure, takes patent claims as its subject, and uses multi-instance multi-label learning as its main method. First, the patent claims are divided into multiple independent texts, using the sequence number as the splitting token. For each patent, the claims are regarded as multiple instances, and the corresponding International Patent Classification (IPC) codes serve as its multiple labels. Next, the concept of secondary_label is introduced following the composition rules of IPC, and the relationships between instances and multiple secondary_labels are mined through fully-connected layers. To capture more comprehensive semantic information from the instances, BiGRU and self-attention are employed to enhance semantics and reduce information loss during training. Finally, max-pooling operations are used to obtain the predicted categories of a patent from the captured relationships between instances and labels at different hierarchical levels. Experimental results on the '2017 Chinese patent dataset' demonstrate that the multi-instance multi-label approach can effectively mine deeper relationships between patents and labels in classification tasks. As a result, the model significantly improves the accuracy of patent text classification.
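The two steps that frame the pipeline, splitting claims into instances and max-pooling instance scores into patent-level labels, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the regular expression, the example IPC codes, the 0.5 threshold, and the hand-written scores are all assumptions made for illustration, and the BiGRU, self-attention, and secondary_label components are omitted entirely.

```python
import re

# Assumed splitting rule: a claim starts at a sequence number such as "1." or "2、"
# that sits at the start of the text or after whitespace. The negative lookahead
# keeps decimals like "0.5" from being treated as claim numbers. Real claim
# numbering may need a stricter pattern.
CLAIM_NUMBER = re.compile(r"(?:^|\s)\d+\s*[.．、]\s*(?!\d)")


def split_claims(claims_text):
    """Split a patent's claims section into one text per claim (one instance each)."""
    parts = CLAIM_NUMBER.split(claims_text)
    return [part.strip() for part in parts if part.strip()]


def max_pool_scores(instance_scores):
    """Aggregate per-instance label scores into one score per label.

    instance_scores[i][j] is the (hypothetical) score of label j for claim i.
    Max-pooling keeps, for every label, the highest score over all claims.
    """
    return [max(column) for column in zip(*instance_scores)]


def predict_labels(instance_scores, labels, threshold=0.5):
    """Return every label whose pooled score reaches the assumed threshold."""
    pooled = max_pool_scores(instance_scores)
    return [label for label, score in zip(labels, pooled) if score >= threshold]


if __name__ == "__main__":
    text = ("1. A sensor comprising a housing. "
            "2. The sensor of claim 1, wherein the housing is metal.")
    instances = split_claims(text)
    print(len(instances), "instances")

    # Hand-written scores standing in for the output of a trained encoder.
    labels = ["G06F", "H04L", "A61K"]
    scores = [[0.9, 0.1, 0.2],
              [0.3, 0.7, 0.1]]
    print(predict_labels(scores, labels))
```

Under max-pooling, a label is assigned to the patent as soon as any single claim supports it strongly, which matches the multi-instance assumption that one instance can be enough to justify a bag-level label.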