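Computer science
Table (database)
Subnetwork
Pyramid (geometry)
Heuristic
Feature extraction
Field (mathematics)
Data mining
Artificial intelligence
Feature (linguistics)
Information retrieval
Machine learning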
Authors
Xiangben Hu,Jielin Jiang,Zhichen Hu,Tao Huang,Shengjun Xue,Xiaolong Xu
Identifier
DOI:10.1109/dasc-picom-cbdcom-cyberscitech52372.2021.00099
Abstract
In the scholarly literature, tables carry a large amount of information. In traditional table information extraction, researchers often spend considerable manual effort integrating that information. Computer vision technology can improve the efficiency of data gathering. However, in academic literature, an excess of negative samples often leads to poor results. To address this problem, this paper proposes DeshengNet, a deep-learning-based method for extracting table information from digital documents. First, a feature map of the document images is obtained through a deep residual network. Then, the multi-scale features are merged with a feature pyramid network. Next, the class box subnet is used to locate tables. To address the excess of negative samples, focal loss is used for training. After detection, the spatial features of each table are used for heuristic extraction. The experimental results show that the proposed method can be applied in industry and serve researchers.
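The role of focal loss in the abstract can be illustrated with a minimal sketch. It assumes the standard binary formulation FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function names and the values alpha = 0.25 and gamma = 2.0 are illustrative assumptions, since the abstract does not state the settings used for DeshengNet.

```python
import math

def cross_entropy(p, y, eps=1e-12):
    """Plain binary cross-entropy for one prediction (for comparison)."""
    p_t = p if y == 1 else 1.0 - p          # probability given to the true class
    return -math.log(min(max(p_t, eps), 1.0))

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for one prediction.

    p: predicted probability that the region is a table (positive class)
    y: ground-truth label, 1 for table, 0 for background
    alpha, gamma: assumed hyperparameters, not taken from the paper
    """
    p_t = p if y == 1 else 1.0 - p          # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)           # clamp to avoid log(0)
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy negative: background correctly scored as unlikely to be a table.
easy_negative = focal_loss(0.05, 0)
# A hard negative: background wrongly scored as very likely a table.
hard_negative = focal_loss(0.90, 0)
```

In this sketch the easy negative contributes far less loss than the hard one, which is how the many background regions on a page are kept from dominating training.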