ABSTRACT
With increasing interest and investment in research and development (R&D), the need for more efficient research project management has grown. Accordingly, we built prediction models to classify research projects that were expected to show excellent research output. Specifically, we applied five machine learning techniques to build the prediction models. In an empirical analysis of data on research projects funded by South Korea over five years (2014–2018), we found that the automated machine learning model (autoML), in which the machine builds the most suitable learning model, performs better and more robustly than models based on the other techniques. We also established that research funding and project type played the most important roles in predicting excellent research projects. This study is significant because it shows the need for a paradigm shift toward an evidence-based project management system by verifying the utility and applicability of a data-driven approach in R&D project management.

KEYWORDS: Research and development; research project output; prediction; classification; artificial intelligence

Disclosure statement
No potential conflict of interest was reported by the author(s).

Notes
1 The South Korean government's R&D investment has increased steadily since 1964 and surpassed KRW 20 trillion (≈ USD 17.1 billion) for the first time in 2019. The R&D budget for 2020 was KRW 24 trillion (≈ USD 20.5 billion), a remarkable increase of 17.3% over the previous year.
2 The number of government-funded research projects conducted in South Korea in 2019 was approximately 70,000, a 22.6% growth compared to 2015 (Lee & Yoo, 2020).
3 In a preliminary study, we compared the prediction performance of classical and AI-based approaches. The results demonstrate that AI-based approaches are significantly superior to classical approaches. This substantiates the importance of incorporating advanced quantitative methods such as AI to address our research problem effectively. For the comprehensive experimental findings, please refer to Supplemental S1.
4 AI techniques have recently shown remarkable development in performance, which already exceeds human judgment or prediction in various fields. This development is being applied across the public sector, from image and voice recognition to security and healthcare, contributing to the creation of better social value.
5 NTIS operates and discloses the National R&D Information Standard Database. As of 2017, a total of 422 organizations collect information, including 17 representative specialized agencies and 125 project management agencies that manage R&D projects in each government ministry.
6 For simplicity, only the values of the top three codes of each categorical variable are reported.
7 Naïve Bayes, Support Vector Machine, Random Forest, TabNet, and autoML.
8 A total of seven algorithms are included in autoML: distributed random forest, generalized linear model, XGBoost gradient boosting algorithm, H2O gradient boosting algorithm, deep learning, and stacked ensemble.

Additional information

Funding
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government [grant number 2019R1F1A1063365].

Notes on contributors
Huijae Kim
Huijae Kim is a PhD student in the Department of Industrial and Systems Engineering at KAIST, Korea. Her research interests primarily focus on data analytics and optimisation. Kim received her MS degree from the Department of Industrial and Systems Engineering at KAIST.

Hoon Jang
Hoon Jang is an associate professor in the College of Global Business at Korea University, Korea. His research interests are primarily in the areas of complex system design, data-driven modelling, and applied operations management problems. Dr. Jang obtained his MS and PhD degrees from the Department of Industrial and Systems Engineering at KAIST.