Topics: Computer science, Robustness (evolution), Domain (mathematical analysis), Benchmark (surveying), Ontology engineering, Machine learning, Generative grammar, Artificial intelligence, Data science, Domain knowledge, Process ontology, Mathematical analysis, Biochemistry, Chemistry, Mathematics, Geodesy, Gene, Geography
Abstract
In engineering disciplines, leveraging generative language models requires specialized datasets for training or fine-tuning pre-existing models. Compiling these domain-specific datasets is a complex endeavor, demanding significant human effort and resources. To address the problem of domain-specific dataset scarcity, this study investigates the potential of generative Large Language Models (LLMs) in creating synthetic domain-specific textual datasets for engineering design domains. Harnessing the advanced capabilities of LLMs such as GPT-4, a systematic methodology was developed to create high-fidelity datasets from designed prompts and to evaluate them against a manually labeled benchmark dataset using automated computational metrics, without human intervention. Findings suggest that well-designed prompts can significantly enhance the quality of domain-specific synthetic datasets while reducing manual effort. The research highlights the importance of prompt design in eliciting precise, domain-relevant information and discusses the balance between dataset robustness and richness. It demonstrates that synthetic datasets can rival the quality of human-labeled domain-specific datasets, offering a strategic solution to the limitations imposed by dataset shortages in engineering domains. The implications for design thinking processes are particularly noteworthy, with the potential to assist designers through GPT-4's structured reasoning capabilities. This work presents a complete guide for domain-specific dataset generation, automated evaluation metrics, and insights into the interplay between data robustness and comprehensiveness.
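The generation step the abstract describes, prompting GPT-4 with a designed template and collecting its output as labeled training text, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' actual prompts or code: the prompt template, the example domain, and the helper names are hypothetical, and it presumes the OpenAI Python SDK (v1.x) with an API key available in the environment.

```python
# Minimal sketch of prompt-driven synthetic dataset generation.
# Assumes the OpenAI Python SDK (v1.x) and OPENAI_API_KEY in the environment;
# the prompt template and the example domain below are illustrative only.
from openai import OpenAI

client = OpenAI()

# A "designed prompt": it pins down the domain, the output format, and the
# intended label so the model's output can be used directly as training data.
PROMPT_TEMPLATE = (
    "You are an expert in {domain}. Write {n} short, factually precise "
    "sentences describing {concept}, each usable as a training example "
    "for the category '{label}'. Return one sentence per line."
)

def generate_examples(domain: str, concept: str, label: str, n: int = 5) -> list[str]:
    """Query the model once and split its reply into candidate examples."""
    prompt = PROMPT_TEMPLATE.format(domain=domain, n=n, concept=concept, label=label)
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,  # some diversity, but not so much that precision suffers
    )
    text = resp.choices[0].message.content or ""
    return [line.strip() for line in text.splitlines() if line.strip()]

if __name__ == "__main__":
    for s in generate_examples("mechanical engineering design",
                               "involute gear teeth", "gear_design"):
        print(s)
```

The temperature parameter makes the robustness-versus-richness trade-off discussed above concrete: lower values yield more uniform, tightly on-domain text, while higher values increase lexical variety at some cost to precision.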
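Likewise, the automated evaluation step can be approximated with simple corpus-level metrics computed without human intervention. The two measures below, TF-IDF centroid cosine similarity as a proxy for domain fidelity and type-token ratio as a proxy for lexical richness, are illustrative stand-ins chosen for this sketch, not the paper's actual measurements.

```python
# Minimal sketch: scoring a synthetic dataset against a human-labeled benchmark
# with fully automatic metrics. Both metrics are illustrative stand-ins.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def fidelity(synthetic: list[str], benchmark: list[str]) -> float:
    """Cosine similarity between the TF-IDF centroids of the two corpora;
    closer to 1.0 means the synthetic text occupies the same lexical space."""
    vec = TfidfVectorizer().fit(synthetic + benchmark)
    syn = np.asarray(vec.transform(synthetic).mean(axis=0))
    ref = np.asarray(vec.transform(benchmark).mean(axis=0))
    return float(cosine_similarity(syn, ref)[0, 0])

def richness(corpus: list[str]) -> float:
    """Type-token ratio: unique tokens / total tokens (higher = more varied)."""
    tokens = " ".join(corpus).lower().split()
    return len(set(tokens)) / max(len(tokens), 1)

if __name__ == "__main__":
    benchmark = ["Involute gear teeth transmit uniform rotational motion.",
                 "Gear module is the ratio of pitch diameter to tooth count."]
    synthetic = ["Involute profiles keep the gear velocity ratio constant.",
                 "The module of a gear equals pitch diameter over tooth count."]
    print(f"fidelity: {fidelity(synthetic, benchmark):.3f}")
    print(f"richness: {richness(synthetic):.3f}")
```

Tracking a fidelity-style score and a richness-style score together is one way to make the robustness/richness balance measurable: a prompt change that raises one while collapsing the other signals an over- or under-constrained template.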