Computer science
Artificial intelligence
Metadata
Search engine indexing
Encoder
Document classification
Classifier (UML)
Machine learning
Dual (grammatical number)
Word embedding
Information retrieval
Embedding
Art
Literature
Operating system
Authors
Muhammad Inaam ul Haq, Qianmu Li, Khalid Mahmood, Ayesha Shafique, Rizwan Ullah
Identifiers
DOI: 10.1093/comjnl/bxae132
Abstract
The volume of scientific literature grows continuously: roughly 28,100 journals publish about 2.5 million papers each year. Citation indexes and search engines are used extensively to find these publications, but a query typically returns many documents of which only a few are relevant, and the retrieved documents lack structure due to inadequate indexing; many systems index research papers with keywords rather than subject hierarchies. Within the scientific literature classification paradigm, various multilabel classification methods based on metadata features have been proposed. Existing metadata-driven statistical measures rely on bag-of-words and traditional embedding techniques such as Word2Vec and BERT, which cannot quantify textual properties effectively. In this paper, we address the limitations of existing classification techniques by unveiling the semantic context of words with an advanced transformer-based recurrent neural network (RNN) approach that incorporates dual attention and a layer-wise learning rate to enhance classification performance. We propose a novel model, BioElectra-BiLSTM-Dual Attention, which extracts semantic features from the titles and abstracts of research articles with a BioElectra encoder, feeds them to a BiLSTM layer, and combines dual attention over label embeddings and their correlation matrix with a layer-wise learning rate strategy for performance enhancement. We evaluated the proposed model on the multilabel scientific literature LitCovid dataset; the results suggest that it significantly improves macro-F1 and micro-F1 scores compared with the state-of-the-art baselines (ML-Net, Binary Bert, and LitMCBert).
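The abstract describes the architecture only at a high level. As a rough illustration, the minimal PyTorch sketch below wires a BioELECTRA encoder into a BiLSTM with label-wise ("dual") attention and builds layer-wise learning-rate parameter groups. The checkpoint name, hidden sizes, attention formulation, and decay schedule are assumptions for illustration, not the authors' published configuration, and the label correlation matrix mentioned in the abstract is omitted here.

```python
# A minimal sketch of the described pipeline, NOT the authors' released code.
# Assumed (unspecified in the abstract): the Hugging Face checkpoint name,
# hidden sizes, the exact dual-attention formulation, and the LR schedule.
import torch
import torch.nn as nn
from transformers import AutoModel

class BioElectraBiLSTMDualAttention(nn.Module):
    def __init__(self, encoder_name, num_labels, lstm_hidden=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        enc_dim = self.encoder.config.hidden_size
        self.bilstm = nn.LSTM(enc_dim, lstm_hidden,
                              batch_first=True, bidirectional=True)
        # One learned embedding per label; tokens and labels attend to each
        # other (one plausible reading of "dual attention").
        self.label_emb = nn.Embedding(num_labels, 2 * lstm_hidden)
        self.classifier = nn.Linear(2 * lstm_hidden, 1)

    def forward(self, input_ids, attention_mask):
        tokens = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        h, _ = self.bilstm(tokens)                         # (B, T, 2H)
        labels = self.label_emb.weight                     # (L, 2H)
        # Label-to-token attention: each label pools the token states.
        scores = torch.einsum("bth,lh->btl", h, labels)
        scores = scores.masked_fill(attention_mask.unsqueeze(-1) == 0, -1e9)
        attn = scores.softmax(dim=1)                       # softmax over tokens
        label_ctx = torch.einsum("btl,bth->blh", attn, h)  # (B, L, 2H)
        return self.classifier(label_ctx).squeeze(-1)      # logits, (B, L)

def layerwise_param_groups(model, base_lr=2e-5, decay=0.95, head_lr=1e-3):
    """Geometrically lower learning rates for earlier encoder layers: a
    common layer-wise LR scheme, assumed since no schedule is given."""
    groups = []
    layers = model.encoder.encoder.layer
    for i, layer in enumerate(layers):
        lr = base_lr * (decay ** (len(layers) - 1 - i))
        groups.append({"params": layer.parameters(), "lr": lr})
    groups.append({"params": model.encoder.embeddings.parameters(),
                   "lr": base_lr * (decay ** len(layers))})
    for mod in (model.bilstm, model.label_emb, model.classifier):
        groups.append({"params": mod.parameters(), "lr": head_lr})
    return groups

# Hypothetical usage: the checkpoint name is an assumption.
model = BioElectraBiLSTMDualAttention(
    "kamalkraj/bioelectra-base-discriminator-pubmed", num_labels=7)
optimizer = torch.optim.AdamW(layerwise_param_groups(model))
```

Under this scheme the pretrained encoder layers are fine-tuned progressively more gently toward the bottom of the stack, while the randomly initialized BiLSTM, label embeddings, and classifier head train at a higher rate.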