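Missing data
Computer science
Range (aeronautics)
Set (abstract data type)
Machine learning
Data science
Data set
Artificial intelligence
Big data
Component (thermodynamics)
Data mining
Engineering
Thermodynamics
Physics
Aerospace engineering
Programming language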
Authors
Robin Mitra, Sarah F. McGough, Tapabrata Chakraborti, Chris Holmes, Ryan Copping, Niels Hagenbuch, Stefanie Biedermann, Jack Noonan, Brieuc Lehmann, Aditi Shenvi, Xuan Vinh Doan, David Leslie, Ginestra Bianconi, Rubén J. Sánchez-García, Alisha Davies, Maxine Mackintosh, Eleni‐Rosalina Andrinopoulou, Anahid Basiri, Chris Harbron, Ben D. MacArthur
Identifier
DOI:10.1038/s42256-022-00596-z
Abstract
Missing data are an unavoidable complication in many machine learning tasks. When data are ‘missing at random’ there exist a range of tools and techniques to deal with the issue. However, as machine learning studies become more ambitious, and seek to learn from ever-larger volumes of heterogeneous data, an increasingly encountered problem arises in which missing values exhibit an association or structure, either explicitly or implicitly. Such ‘structured missingness’ raises a range of challenges that have not yet been systematically addressed, and presents a fundamental hindrance to machine learning at scale. Here we outline the current literature and propose a set of grand challenges in learning from data with structured missingness. Gathering big datasets has become an essential component of machine learning in many scientific areas, but it is unavoidable that some data values are missing. An important and growing effect that needs careful attention, especially when heterogeneous data sources are combined, is that of structured missingness, where data values are missing not at random, but with a specific structure.