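Machine learning
Artificial intelligence
Computer science
Health care
Online machine learning
Learning curve
Matching (statistics)
Unsupervised learning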
Authors
Dianbo Liu, Kathe Fox, Griffin Weber, Tim Miller
Identifier
DOI:10.1016/j.jbi.2022.104151
Abstract
A patient's health information is generally fragmented across silos because it follows how care is delivered: by multiple providers in multiple settings. Although it is technically feasible to reunite the data for analysis in a manner that underpins a rapid learning healthcare system, privacy concerns and regulatory barriers limit data centralization for this purpose. Machine learning can be conducted in a federated manner on patient datasets that share the same set of variables but are held in separate storage. Federated learning, however, cannot handle the situation where different data types for a given patient are separated vertically across organizations and patient ID matching across institutions is difficult. We call methods that enable machine learning model training on data separated along two or more dimensions "confederated machine learning", which we aim to develop in this study. We propose and evaluate confederated learning for training machine learning models to stratify the risk of several diseases among silos when data are horizontally separated by individual, vertically separated by data type, and separated by identity without patient ID matching. The confederated learning method can be intuitively understood as a distributed learning method with elements of representation learning, generative modeling, imputation, and data augmentation. Our confederated learning method achieves an AUROC (area under the receiver operating characteristic curve) of 0.787 for diabetes prediction, 0.718 for psychological disorders prediction, and 0.698 for ischemic heart disease prediction using nationwide health insurance claims. Our proposed confederated learning method successfully trained machine learning models on health insurance data separated along two or more dimensions.
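For readers unfamiliar with the metric reported above, the area under the ROC curve can be read as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case. The following is a minimal sketch of that definition in plain Python. The function name `auroc` and the toy labels and scores are illustrative assumptions, not the authors' evaluation code or data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Illustrative helper (hypothetical name, not from the paper):
    returns the fraction of (positive, negative) pairs in which the
    positive case has the higher score, counting ties as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 3 of the 4 positive/negative pairs are ranked correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))
```

Under this reading, a value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The pairwise loop is quadratic in the number of cases, so it is meant for illustration only; a rank-based computation would be preferable for claims-scale data.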