Computer science
Task (project management)
Language model
Natural language processing
Representation (politics)
Artificial intelligence
Sequence labeling
Coding (set theory)
Sequence (biology)
Speech recognition
Programming language
Management
Set (abstract data type)
Politics
Political science
Biology
Law
Economics
Genetics
Authors
Peerachet Porkaew,Prachya Boonkwan,Thepchai Supnithi
Identifier
DOI:10.1109/isai-nlp54397.2021.9678190
Abstract
Recently, pretrained language representations such as BERT and RoBERTa have drawn more and more attention in NLP. In this work we propose a pretrained language representation for the Thai language, based on the RoBERTa architecture. The monolingual data used for training are collected from publicly available resources, including Wikipedia, OpenSubtitles, news, and articles. Although the pretrained model can be fine-tuned for a wide range of individual tasks, fine-tuning the model with multiple objectives also yields a surprisingly effective model. We evaluated the performance of our multi-task model on part-of-speech tagging, named entity recognition, and clause boundary prediction. Our model achieves performance comparable to strong single-task baselines. Our code and models are available at https://github.com/lstnlp/hoogberta.
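The abstract describes fine-tuning a single pretrained RoBERTa-style encoder with multiple sequence-labeling objectives (POS tagging, NER, clause boundary prediction). Below is a minimal sketch of that multi-task setup using the Hugging Face `transformers` library; the encoder checkpoint name (`xlm-roberta-base`) and the label counts are illustrative assumptions and do not reflect the authors' released HoogBERTa code or configuration.

```python
# Minimal multi-task sequence-labeling sketch: a shared pretrained encoder
# with one token-classification head per task. Checkpoint name and label
# counts are illustrative assumptions, not the paper's actual settings.
import torch
import torch.nn as nn
from transformers import AutoModel

class MultiTaskTagger(nn.Module):
    def __init__(self, encoder_name="xlm-roberta-base",
                 num_pos=16, num_ner=9, num_clause=3):
        super().__init__()
        # Shared RoBERTa-style encoder.
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # One linear head per sequence-labeling task.
        self.pos_head = nn.Linear(hidden, num_pos)
        self.ner_head = nn.Linear(hidden, num_ner)
        self.clause_head = nn.Linear(hidden, num_clause)
        self.loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

    def forward(self, input_ids, attention_mask,
                pos_labels=None, ner_labels=None, clause_labels=None):
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        logits = {
            "pos": self.pos_head(states),
            "ner": self.ner_head(states),
            "clause": self.clause_head(states),
        }
        loss = None
        # Sum per-task losses for whichever labels a batch provides, so
        # batches from different tasks can update the same shared encoder.
        for name, labels in [("pos", pos_labels), ("ner", ner_labels),
                             ("clause", clause_labels)]:
            if labels is not None:
                task_loss = self.loss_fn(
                    logits[name].view(-1, logits[name].size(-1)),
                    labels.view(-1))
                loss = task_loss if loss is None else loss + task_loss
        return loss, logits
```

In this sketch the three tasks share all encoder parameters and differ only in their output heads; summing the available per-task losses is one common way to realize the multi-objective fine-tuning the abstract mentions, though the paper may weight or schedule the tasks differently.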