Generative grammar
Computer science
Cognitive science
Chemistry
Artificial intelligence
Psychology
Authors
Eman A. Alasadi, Carlos R. Baiz
Identifier
DOI:10.1021/acs.jchemed.4c00138
Abstract
The introduction of multimodal capabilities in large language models (LLMs) marks a significant advancement in the field of artificial intelligence (AI). In particular, the ability to process and interpret visual data, including complex graphs and plots frequently encountered in chemistry, expands the potential of these models. This integration of text and image processing allows multimodal AI to tackle a broader range of problems, especially in areas where visual information is central to understanding and solving them. This study examines GPT-4's image input feature, focusing on its accuracy in interpreting chemical diagrams, structures, and tabular data when solving chemistry problems that require graphical information, and on its utility as an interactive, conversational tutor in chemistry education. The research assesses the consistency of the AI's responses to visual data of varying quality and its ability to parse handwritten problems and answers. Further, the study examines GPT-4's capacity for molecular structure analysis and spectral data interpretation, both vital for advanced problem-solving in chemistry. Through this analysis, we demonstrate how the image processing capabilities of GPT-4 could be leveraged for pedagogical purposes, particularly in undergraduate chemistry courses. In addition, we provide advice for prompt development to improve response quality.