Generative grammar
Synthetic biology
Artificial intelligence
Computer science
Engineering
Biology
Computational biology
Source
Journal: Social Science Research Network
[Social Science Electronic Publishing]
Date: 2023-01-01
Citations: 4
Abstract
Classification is paramount in today’s data-rich environment as firms increasingly depend on machine learning to distill intelligence from vast amounts of unstructured text such as news articles, reports, and social media. Contemporary classification models can swiftly identify constructs of interest, such as sentiment, authors’ arguments, or product categorizations in textual data. To train an effective classification model, many correctly labeled examples are required. While simple constructs can be labeled via crowdsourcing, more complex constructs necessitate the involvement of expert labelers, a scarce resource. This research leverages generative AI, specifically ChatGPT4, as a surrogate for human expertise in complex classification tasks. It assesses the feasibility of this approach in an empirical study that identifies marketing mix variables in consumers’ posts on Twitter. The results demonstrate that, unlike crowdsourced labels, those generated by ChatGPT4 are in high agreement with expert labels. To overcome ChatGPT4’s proprietary nature, slow processing speed, and high cost, this research approximates it with an open-source model that is fine-tuned on ChatGPT4’s labels. The created “synthetic expert” not only exhibits near parity with ChatGPT4 in terms of expert agreement, but is also highly scalable, fully independent, and free from third-party constraints. The model and code are shared online to rapidly disseminate the potential of synthetic expertise for complex classification tasks across fields and functions in academia and practice.