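Malayalam
Computer science
Natural language processing
Machine translation
Artificial intelligence
Agglutinative language
Example-based machine translation
Judgment
Phrase
Evaluation of machine translation
Transfer-based machine translation
Machine translation software usability
Set (abstract data type)
Translation (biology)
Speech recognition
Parsing
Programming language
Chemistry
Messenger RNA
Gene
Biochemistry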
Authors
Mary Priya Sebastian, G. Santhosh Kumar
Source
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Date: 2023-01-19
Volume (Issue): 22 (4): 1-51
Citations: 10
Abstract
Statistical Machine Translation (SMT) is a preferred Machine Translation approach that converts text from one language into another by automatically learning translations from a parallel corpus. SMT has produced quality translations for many foreign languages, but only a few works have attempted it for South Indian languages. This article discusses experiments conducted with SMT for Malayalam and analyzes how the methods defined for SMT in foreign languages affect Malayalam, a Dravidian language. The baseline SMT model does not work for Malayalam because of characteristics such as its agglutinative nature and morphological richness. Hence, the challenge is to identify precisely where the SMT model has to be modified so that the baseline model accommodates these language peculiarities and gives better English-to-Malayalam translations. The alignments between English and Malayalam sentence pairs, which are subjected to the training process in SMT, play a crucial role in producing quality output translations. Therefore, this work focuses on improving the translation model of SMT by refining the alignments between English–Malayalam sentence pairs. The phrase alignment algorithms align the verb and noun phrases in the sentence pairs and develop a new set of alignments for the English–Malayalam sentence pairs. These alignment sets refine the alignments that GIZA++ produces as a result of the EM training algorithm. The improved Phrase-Based SMT model trained using these refined alignments resulted in better translation quality, as indicated by the AER and BLEU scores.
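
For readers unfamiliar with the AER metric named in the abstract, the following is a minimal sketch of the standard Alignment Error Rate definition, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the hypothesized set of alignment links, S the "sure" gold links, and P the "possible" gold links (with S treated as a subset of P). The function name, the toy link sets, and the labels `baseline` and `refined` are hypothetical and are not taken from the article; the article's actual evaluation setup is not described in this record.

```python
def alignment_error_rate(hypothesis, sure, possible):
    """Return AER for sets of (source_index, target_index) links.

    AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), with S included in P.
    Lower is better; 0.0 means every hypothesized link is at least
    "possible" and every "sure" link was found.
    """
    a = set(hypothesis)
    s = set(sure)
    p = set(possible) | s  # sure links also count as possible links
    if not a and not s:
        return 0.0  # nothing to align and nothing predicted
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Toy data (hypothetical): links are (English position, Malayalam position).
sure = {(0, 0), (1, 2)}
possible = {(2, 1)}
baseline = {(0, 0), (1, 1), (2, 1)}  # one wrong link, one sure link missed
refined = {(0, 0), (1, 2), (2, 1)}   # all sure links found, no wrong links

baseline_aer = alignment_error_rate(baseline, sure, possible)
refined_aer = alignment_error_rate(refined, sure, possible)
```

In this toy case the refined link set scores a lower AER than the baseline set, which is the direction of improvement the abstract reports for the refined alignments.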