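Computer science
Task (project management)
Section (typography)
Template
Parsing
Cognitive reframing
Identification (biology)
Reading (process)
Order (exchange)
Syntactic structure
Information retrieval
Natural language processing
Grammar
Programming language
Linguistics
Management
Philosophy
Economics
Finance
Operating system
Biology
Social psychology
Botany
Psychology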
Authors
Matheus Werner, Eduardo Sany Laber
Identifier
DOI:10.1016/j.eswa.2023.122495
Abstract
This paper presents a novel resume parser designed to effectively reorganize the textual content of any resume into its original section structure. Our work addresses two practical challenges overlooked by the existing literature: (i) ensuring the correct reading order of text retrieved from resume files and (ii) individually extracting all sections, as well as work experience and education subsections. By taking into account the observation that most resumes adhere to basic document templates, we reframe the reading order problem as a template identification task. Our experiments suggest that even a widely used small model like EfficientNet-B0 can accurately identify common templates. Additionally, we propose a sequence tagging approach that simultaneously identifies all resume sections and some subsections. We implement and compare two solutions based on the well-known CRF and BERT models. Our evaluation provides strong evidence that the CRF can serve as a practical alternative to BERT, depending on hardware and budget constraints. They yield comparable results in terms of identifying resume sections, while BERT displays a substantial advantage when identifying education and work experience subsections. An interesting direction for future work is to expand our approach to ensure the correct ordering of a large family of templates.
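To make the sequence tagging idea in the abstract more concrete, the following is a minimal sketch of how per-line tags might be turned back into resume sections. It is not the authors' implementation: the BIO-style tag scheme, the label names (`CONTACT`, `EXPERIENCE`, `EDUCATION`), and the helper `group_tagged_lines` are all hypothetical, and the tags are assumed to come from some upstream tagger such as a CRF or BERT model.

```python
from typing import List, Tuple

Section = Tuple[str, List[str]]


def group_tagged_lines(lines: List[str], tags: List[str]) -> List[Section]:
    """Group resume lines into sections from per-line BIO tags.

    Assumed (hypothetical) tag scheme: "B-<LABEL>" opens a section,
    "I-<LABEL>" continues it, and "O" marks a line outside any section.
    """
    if len(lines) != len(tags):
        raise ValueError("lines and tags must have the same length")

    sections: List[Section] = []
    current = None  # the section currently being extended, if any
    for line, tag in zip(lines, tags):
        if tag == "O":
            current = None  # an outside line closes the open section
            continue
        prefix, _, label = tag.partition("-")
        # Open a new section on "B-", or when an "I-" tag has nothing
        # compatible to continue (a common tagger inconsistency).
        if prefix == "B" or current is None or current[0] != label:
            current = (label, [line])
            sections.append(current)
        else:
            current[1].append(line)
    return sections


# Illustrative input; the lines and tags are made up for this sketch.
example_lines = [
    "Jane Doe",
    "Experience",
    "Acme Corp, Engineer, 2019-2022",
    "Education",
    "BSc Computer Science",
]
example_tags = [
    "B-CONTACT",
    "B-EXPERIENCE",
    "I-EXPERIENCE",
    "B-EDUCATION",
    "I-EDUCATION",
]
example_sections = group_tagged_lines(example_lines, example_tags)
```

Under these assumptions, `example_sections` holds three sections in reading order, each paired with its label. The same grouping step could be reused for subsections by running it again over the lines of a single work experience or education section with a finer-grained label set.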