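Artificial intelligence
Computer science
Natural language processing
Cluster analysis
Set (abstract data type)
Support vector machine
Machine translation
Task (project management)
Identification (biology)
Text segmentation
Style (visual arts)
Segmentation
Engineering
History
Plant
Archaeology
Biology
Programming language
Systems engineering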
Authors
Emad Mohamed, Raheem Sarwar, Sayed Mostafa
Source
Journal: Digital Scholarship in the Humanities
[Oxford University Press]
Date: 2022-10-13
Volume (issue): 38 (2): 658-666
Citations: 3
Abstract
Given a set of target-language documents and their translators, the translator attribution task aims to identify which translator translated which document. Attributing translations and identifying a translator’s style could contribute to fields including translation studies, digital humanities, and forensic linguistics. To investigate this task, we first develop a new corpus containing Arabic translations of world-famous books. We then pre-process the books in the corpus, which mainly involves removing irrelevant material, morphological segmentation of words, and devocalization. After pre-processing, we propose using the 100 most frequent words and/or morphologically segmented function words as markers of the translators’ writing style (i.e. stylometric features) to differentiate between the translations of different translators. After feature extraction, we apply several supervised and unsupervised machine-learning algorithms, along with our novel cluster-to-author index, to perform the task. We find that translators are not invisible, and that morphological analysis may not be more useful than simply using the 100 most frequent words as features. A support vector machine with a linear kernel achieved 99% classification accuracy. The unsupervised machine-learning methods, namely k-means clustering and hierarchical clustering, yielded similar findings.
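The devocalization and most-frequent-word steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`devocalize`, `top_words`, `relative_frequencies`), the whitespace tokenization, the exact set of diacritics removed, and the use of relative frequencies as feature values are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: "devocalization" here means removing the Arabic short-vowel
# marks (U+064B to U+0652), the superscript alef (U+0670), and the
# tatweel/kashida (U+0640). The paper may use a different set.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670\u0640]")


def devocalize(text):
    """Return text with Arabic diacritics and tatweel removed."""
    return _DIACRITICS.sub("", text)


def tokenize(text):
    """Hypothetical tokenizer: devocalize, then split on whitespace."""
    return devocalize(text).split()


def top_words(tokenized_docs, n=100):
    """Return the n most frequent words across all documents."""
    counts = Counter()
    for tokens in tokenized_docs:
        counts.update(tokens)
    return [word for word, _ in counts.most_common(n)]


def relative_frequencies(tokens, vocabulary):
    """Map one document to a vector of relative word frequencies.

    Each entry is the share of the document's tokens taken up by the
    corresponding vocabulary word (0.0 if the word is absent).
    """
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty documents
    return [counts[word] / total for word in vocabulary]
```

Each translated book (or chunk of a book) would be turned into one such vector, with `n=100` matching the feature count named in the abstract. The resulting vectors could then be passed to a classifier or a clustering routine of the reader's choice.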