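Computer science
Benchmark (surveying)
Artificial intelligence
Domain (mathematical analysis)
Language model
Scarcity
Natural language processing
Sentiment analysis
Machine learning
Reliability (semiconductor)
Natural language
Deep learning
Mathematical analysis
Power (physics)
Physics
Mathematics
Geodesy
Quantum mechanics
Economics
Microeconomics
Geography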
Authors
M. Marlot, Divya Nidhi Srivastava, Fui Kent Wong, Ming Xiang Lee
Abstract
In this paper we explore the development of an oil and gas language model (LM) using an unsupervised multitask learning approach. A large language model (LLM) enables computers to understand and generate human language. The study addresses data scarcity and domain-specific language challenges, and reports the model's performance on specific oil and gas tasks as well as in qualitative testing. To do that, we collected a highly diversified dataset of 33,000 documents in the energy and oil and gas domains to train and benchmark our model. Our findings demonstrate that even a small model, properly finetuned on domain-specific data, outperforms larger models trained on generic corpora, highlighting the benefits of finetuning LMs in technical domains. The paper contributes to advancing natural language processing (NLP) in the oil and gas industry, emphasizing the importance of addressing domain-specific nuances and limitations for improved NLP model performance and reliability.