Authors
Julian Matschinske, Julian Späth, Mohammad Bakhtiari, Niklas Probul, Mohammad Mahdi Kazemi Majdabadi, Reza Nasirigerdeh, Reihaneh Torkzadehmahani, Anne Hartebrodt, Balazs-Attila Orban, Sándor-József Fejér, Olga Zolotareva, Supratim Das, Linda Baumbach, Josch K. Pauling, Olivera Tomašević, Béla Bihari, Marcus Bloice, Nina C. Donner, Walid Fdhila, Tobias Frisch, Anne-Christin Hauschild, Dominik Heider, Andreas Holzinger, Walter Hötzendorfer, Jan Hospes, Tim Kacprowski, Markus Kastelitz, Markus List, Rudolf Mayer, Mónika Moga, Heimo Müller, Anastasia Pustozerova, Richard Röttger, Christina C. Saak, Anna Saranti, Harald Schmidt, Christof Tschohl, Nina K. Wenke, Jan Baumbach
Abstract
Background: Machine learning and artificial intelligence have shown promising results in many areas and are driven by the increasing amount of available data. However, these data are often distributed across different institutions and cannot be easily shared owing to strict privacy regulations. Federated learning (FL) allows the training of distributed machine learning models without sharing sensitive data. However, implementing FL is time-consuming and requires advanced programming skills and complex technical infrastructure.
Objective: Various tools and frameworks have been developed to simplify the development of FL algorithms and provide the necessary technical infrastructure. Although there are many high-quality frameworks, most focus on a single application case or method. To our knowledge, there are no generic frameworks; existing solutions are restricted to a particular type of algorithm or application field. Furthermore, most of these frameworks provide an application programming interface that requires programming knowledge. There is no collection of ready-to-use FL algorithms that are extendable and allow users (eg, researchers) without programming knowledge to apply FL. A central FL platform for both FL algorithm developers and users does not exist. This study aimed to address this gap and make FL available to everyone by developing FeatureCloud, an all-in-one platform for FL in biomedicine and beyond.
Methods: The FeatureCloud platform consists of 3 main components: a global frontend, a global backend, and a local controller. Our platform uses Docker to separate the locally acting components of the platform from the sensitive data systems. We evaluated our platform using 4 different algorithms on 5 data sets for both accuracy and runtime.
Results: FeatureCloud removes the complexity of distributed systems for developers and end users by providing a comprehensive platform for executing multi-institutional FL analyses and implementing FL algorithms. Through its integrated artificial intelligence store, federated algorithms can easily be published and reused by the community. To protect sensitive raw data, FeatureCloud supports privacy-enhancing technologies that secure the shared local models, and it maintains high data privacy standards to comply with the strict General Data Protection Regulation. Our evaluation shows that applications developed in FeatureCloud produce results highly similar to those of centralized approaches and scale well with an increasing number of participating sites.
Conclusions: FeatureCloud provides a ready-to-use platform that integrates the development and execution of FL algorithms while reducing the complexity to a minimum and removing the hurdles of federated infrastructure. Thus, we believe that it has the potential to greatly increase the accessibility of privacy-preserving and distributed data analyses in biomedicine and beyond.