Abstract
Recent advances in machine learning and artificial intelligence (AI) have brought unprecedented promise across fields of medicine, including nephrology. The clinical complexity and challenges in patient management highlight the potential benefit of data-driven, algorithmic approaches in nephrology.1 For example, neural networks and other deep learning methods have been applied to tasks ranging from analyzing kidney biopsy specimens to predicting kidney failure.2 Along with the hopes and hype comes increasing concern that data- and model-based decision making can in fact exacerbate bias and inequity in health care. Researchers have shown that a model-driven prediction of eGFR that has been used for decades could be racially biased by assigning higher eGFR estimates to patients identifying as Black, although uncertainty remains in the biological explanation underlying the race correction.3 Another model implemented in practice, the Kidney Donor Risk Index, assigns unwarrantedly higher predicted risk of kidney graft failure in patients identifying as Black, which can potentially exacerbate inequality in access to organs for transplantation.3 When a single race variable has significant potential to create bias, the likelihood of bias is much greater in black box AI models that often blindly take in a large number of variables. It is imperative, therefore, that both the developers and end users of AI-based clinical applications understand the ways in which biases arise in data and model outputs. Through this article, we aim to help readers recognize biases in AI applications and become familiar with methods to mitigate them.

Types of Bias in Clinical AI Applications

Figure 1 illustrates the types of biases that can arise throughout different stages of AI development. At a high level, biases have an algorithmic side and a human side, as described below.

Figure 1: Stages of artificial intelligence application development and associated biases.
Text in red indicates the type of bias, text in blue indicates nonalgorithmic bias mitigation, and text in tan indicates algorithmic debiasing method application.

Bias in Data-Generating Process

AI development begins with collecting patient data, which almost always come from a selected sample of the underlying target population. Skewed patient sampling can lead to disparate model performance in over- or under-represented subgroups. Differences in outcome ascertainment, such as higher sensitivity or specificity in capturing an event of interest in electronic health data, can be another source of bias. A previous study showed how the accurate capture of health care cost as the outcome resulted in a model preferentially recommending White patients for additional treatment resources because less money is spent on Black patients compared with White patients with similar levels of morbidity.4 Clinician bias and a complex evaluation process have disadvantaged Black patients, leading to disparity in receipt of kidney transplants; data accurately capturing this practice can generate a model that treats Black race as a risk factor for transplant failure, reinforcing the underlying inequity.5

Bias in Model Training, Testing, and Validation

Often unknown to consumers of AI, numerous modeling decisions take place during the course of development. Unlike in traditional medicine, where publication of study protocols has become standard practice, the convoluted process of model selection, training and testing, and validation is seldom prespecified or communicated, although it has a substantial effect on the outcomes. Data quality can be a function of sociodemographic factors if access to care is associated with reliable capture of data. If a group of people has a lot of missing data because of barriers to health care, AI models will likely underperform for this group and can cause more harm than benefit if records with missing data are simply excluded from model training.
Similarly, model updates made after observing subpar performance, the choice of missing data treatment method, and the decision threshold for hemodialysis can affect the performance of models predicting AKI.6

Bias in Interpretation and Application of AI Models

The human side of bias plays a significant role in the translation of AI into clinical benefit. Clinician trust in and acceptance of AI can matter more than actual model performance in determining the extent of real-world application. Variability in health literacy and cultural acceptance among patients can lead to missed opportunities to improve patient outcomes through novel technologies. Importantly, the patient-provider relationship, which is a product of history, culture, and mutual trust, can be altered by the deployment of AI in clinical nephrology, with unintended consequences such as loss of trust and authority or reduced adherence to medical advice.7

Ways to Mitigate and Prevent Biases in Clinical AI Application

Algorithmic Debiasing Methods

Biases that are algorithmic in nature, that is, those related to data sampling, model training, and obtaining outputs, can be addressed at least in part through debiasing methods. In this sense, bias often refers to unwarranted statistical associations between patient attributes of interest and the outcome. Existing algorithmic debiasing methods can be categorized into preprocessing, in-processing, and postprocessing methods.8 Preprocessing methods treat the training data before model fitting to address imbalances in the data. An intuitive example is the reweighing method, which transforms the training data to achieve balance across groups defined by sensitive attributes of interest such as race or sex. In-processing methods modify how a model learns from data in a way that reduces the influence of a sensitive attribute on the learning process.
As the name implies, postprocessing takes place after a model is fitted and adjusts the outputs in a post hoc manner to address biases. Publicly available tools enable users to readily apply these methods in practice.8

Nonalgorithmic Bias Mitigation

Completely unbiased sampling of data is usually infeasible, so it is the responsibility of both developers and users of AI to evaluate patient representation bias. Comparing the distribution of patient demographics between the training data and the target population is a good starting point. Implicit bias in patient care, such as uneven recommendation of novel treatment options, can be identified by examining electronic health data that reflect practice patterns. Efforts to increase diversity in data collection and to provide equitable treatment options should accompany these activities. In addition, detailed and transparent documentation of the modeling process, including publication of datasets and code, should become the norm in the field. Such transparency can also incentivize researchers to perform replication studies and sensitivity analyses, which are critical in ascertaining the clinical benefits of AI. Finally, patient and provider education is paramount to ensuring unbiased interpretation and utilization of AI.

In conclusion, the use of big data and AI is an inevitable wave in medicine, and nephrology is no exception. Rigorous bias evaluation and mitigation throughout the development and application process can prevent biased AI from adversely affecting patients and health systems, especially patients who are underserved. Recent efforts to provide the public with a guideline or playbook for navigating this process are important progress toward achieving fair and equitable utilization of AI.9 A defining feature of AI is its ability to stay “live” and continuously learn over time, which calls for continuous monitoring and retraining of models to ensure unbiasedness of data and model outputs.10
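
As an illustration of the reweighing method mentioned under Algorithmic Debiasing Methods, the following is a minimal sketch, not the implementation used by any particular tool. It assumes one common formulation in which each record receives the weight P(group) × P(outcome) / P(group, outcome), so that the sensitive attribute and the outcome become statistically independent in the weighted training data. The function name `reweighing_weights`, the helper `weighted_positive_rate`, and the toy cohort are hypothetical and chosen only for illustration.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Return one weight per record: P(group) * P(label) / P(group, label).

    Assumed formulation for illustration; records in group/outcome
    combinations that are over-represented receive weights below 1,
    and under-represented combinations receive weights above 1.
    """
    if len(groups) != len(labels):
        raise ValueError("groups and labels must have the same length")
    n = len(groups)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


# Hypothetical toy cohort: 3 of 4 records in group A have the outcome,
# versus 1 of 4 records in group B.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
weights = reweighing_weights(groups, labels)


def weighted_positive_rate(group):
    """Weighted share of records with the outcome within one group."""
    total = sum(w for w, g in zip(weights, groups) if g == group)
    positive = sum(
        w for w, g, y in zip(weights, groups, labels) if g == group and y == 1
    )
    return positive / total


print(weighted_positive_rate("A"), weighted_positive_rate("B"))
```

In this toy cohort the unweighted outcome rates differ between the two groups (0.75 versus 0.25), whereas the weighted rates coincide. The weights would then be passed to a model-fitting routine that accepts per-record sample weights. Whether equalizing outcome rates in this way is clinically appropriate depends on the application and should be judged case by case.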