Abstract
Commentary
Although a multitude of predictive factors for total hip replacement (THR) in patients with osteoarthritis (OA) have been identified for use in traditional predictive statistical models¹, their use in emerging deep-learning models has been limited, with deep learning predominantly being utilized for the prediction of established radiographic grades²⁻⁴. von Schacky et al.² developed a multitask deep convolutional neural network (DCNN) model to grade OA features in hip radiographs obtained from the Osteoarthritis Initiative (OAI) study. The DCNN model demonstrated diagnostic accuracy similar to that of an expert musculoskeletal radiologist. Leung et al.³ developed a deep-learning model that utilized radiographs from the OAI study to predict the Kellgren-Lawrence grade and the probability of total knee replacement within 9 years, and the deep-learning model outperformed human binary outcome models based on standard grading systems.
However, prior to the study by Xu et al., no study is believed to have constructed a DCNN model to assess the risk of THR. This retrospective, multicenter, case-control study thus represents the first to utilize a DCNN model to assess the risk of THR with use of baseline radiographs and basic clinical symptoms. The authors applied robust data from the OAI, a National Institutes of Health-initiated longitudinal, multicenter, observational study. Within its limitations, the DCNN model achieved an overall sensitivity and specificity of 92.59% and 86.96%, respectively, and a high area under the receiver operating characteristic curve (AUC) of 0.944 for predicting THR within 9 years. The AUC for the most likely time frame was 0.907 for 0 to 2 years, 0.916 for 3 to 5 years, and 0.841 (95% confidence interval, 0.697 to 0.985) for 6 to 9 years. These high values indicate the feasibility of using the DCNN model to predict the risk of THR from baseline radiographs and clinical symptoms.
The model not only achieved a high AUC for the 9-year risk estimate, but also displayed good discrimination between patients who would and would not undergo THR during the three 3-year time intervals within the 9 years. Thus, the model would enable the identification of patients with an imminent risk of osteoarthritis progression resulting in arthroplasty within 3 years, and would aid in monitoring the patients predicted to be at risk for THR in the 2 later time periods and in arranging appropriately timed interventions.
A total of 736 participants from the OAI data set were analyzed, including 184 with OA who subsequently underwent THR and 552 controls. Over 4,000 individuals were excluded from the analysis of the OAI data set for not meeting the previously defined selection criteria or not having a propensity-score-based match. Cases and controls were each split approximately 72%, 14%, and 14% into training (n = 528), validation (n = 104), and testing (n = 104) cohorts. This split implies just 26 cases each in the validation and testing cohorts. Most participants were White and most had relatively high levels of education, income, and medical insurance, which may have influenced patient decision-making in favor of THR and may limit the generalizability of the results. A high rate of loss to follow-up is to be expected for a study with this design and this duration (108 months in the OAI data set). The study defined the outcome as the performance of THR during various time periods, which enabled training of the DCNN model to classify patients according to whether or not they were expected to undergo THR at any time during one of the time periods. Predicting a particular likely time to THR would have required a very different approach, using regression rather than classification.
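The cohort sizes quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, assuming that the split was stratified so that cases and controls were divided in the same proportions; the function and variable names are illustrative and do not come from Xu et al.

```python
# Illustrative check of the cohort arithmetic quoted in the commentary.
# All names here are hypothetical; this is not code from Xu et al.

TOTAL = 736      # participants analyzed
CASES = 184      # patients with OA who subsequently underwent THR
CONTROLS = 552   # matched controls

# Cohort sizes as reported (cases and controls combined).
COHORT_SIZES = {"training": 528, "validation": 104, "testing": 104}


def cohort_percentages(sizes, total):
    """Percentage of all participants in each cohort, to one decimal place."""
    return {name: round(100 * n / total, 1) for name, n in sizes.items()}


def expected_group_counts(group_size, sizes, total):
    """Expected members of one group per cohort, assuming a stratified split."""
    return {name: round(group_size * n / total) for name, n in sizes.items()}


if __name__ == "__main__":
    print(cohort_percentages(COHORT_SIZES, TOTAL))
    print(expected_group_counts(CASES, COHORT_SIZES, TOTAL))
    print(expected_group_counts(CONTROLS, COHORT_SIZES, TOTAL))
```

Under that assumption, the sketch yields 132, 26, and 26 cases and 396, 78, and 78 controls for the training, validation, and testing cohorts, which is consistent with the 26 cases per held-out cohort noted above.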
Although pure researchers and data scientists may prefer precise estimates of the time to THR, in clinical practice the choice and timing of THR are multifactorial and involve shared decision-making between the patient and surgeon. The radiographs utilized in this DCNN model were exclusively anteroposterior pelvic radiographs, which could be considered a limitation as other models have selectively utilized other views, allowing for greater transfer learning. An additional limitation of the study is that the methodological steps involved in the learning process and the parsing of the input data are inherently indiscernible with the use of artificial intelligence (AI); however, the precision and accuracy of the model provide adequate indirect corroboration that the model was appropriately developed. Such deep-learning models to predict THR in patients with OA need to be refined and validated in a large, diverse, prospective cohort study before being adopted into routine clinical practice; however, such prospective studies would be resource-intensive, and their feasibility is uncertain. Realistically, the incorporation of other statistical models, especially in countries with robust clinical data, may complement and improve the accuracy of deep-learning models.
In theory, the use of AI eliminates the potential for human error of interpretation, ensures diagnostic accuracy comparable with that of expert interpretation, prognosticates the potential risk and timing of THR, and informs shared clinical decision-making. The model described by Xu et al. provides an estimate of the risk of arthroplasty, and of its timing (in 3-year intervals), within 9 years with use of basic anteroposterior radiographs and clinical data. These results represent a fascinating prospect for patient counseling and operative planning, and could factor into the formulation of institutional, regional, and national policy and into health-care delivery.
The use of deep-learning AI to predict operative risk, both generally and within specific time intervals, represents an interesting, imaginative, and innovative field with immense potential for evolution, and it may well prove to be a useful addition to the clinical frontier.