Exploring Large-scale Public Medical Image Datasets

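Computer science; Artificial intelligence; Scale (ratio); Image (mathematics); Data science; Information retrieval; Cartography; Geography
Author
Luke Oakden‐Rayner
Source
Journal: Academic Radiology [Elsevier BV]
Volume (issue): 27 (1): 106-112 Citations: 177
Identifier
DOI:10.1016/j.acra.2019.10.006
Abstract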

Medical artificial intelligence systems are dependent on well-characterized large-scale datasets. Recently released public datasets have been of great interest to the field, but pose specific challenges due to the disconnect they cause between data generation and data usage, potentially limiting the utility of these datasets.

We visually explore two large public datasets, to determine how accurate the provided labels are and whether other subtle problems exist. The ChestXray14 dataset contains 112,120 frontal chest films, and the Musculoskeletal Radiology (MURA) dataset contains 40,561 upper limb radiographs. A subset of around 700 images from both datasets was reviewed by a board-certified radiologist, and the quality of the original labels was determined.

The ChestXray14 labels did not accurately reflect the visual content of the images, with positive predictive values mostly between 10% and 30% lower than the values presented in the original documentation. There were other significant problems, with examples of hidden stratification and label disambiguation failure. The MURA labels were more accurate, but the original normal/abnormal labels were inaccurate for the subset of cases with degenerative joint disease, with a sensitivity of 60% and a specificity of 82%.

Visual inspection of images is a necessary component of understanding large image datasets. We recommend that teams producing public datasets should perform this important quality control procedure and include a thorough description of their findings, along with an explanation of the data generating procedures and labeling rules, in the documentation for their datasets.
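
The label-quality measures named in the abstract (positive predictive value, sensitivity, specificity) can be illustrated with a minimal sketch. It assumes the dataset labels and the reviewer's labels are available as parallel lists of 0/1 values. The function name `label_quality` and the toy inputs are hypothetical and are not taken from the paper or from either dataset.

```python
def label_quality(dataset_labels, reviewer_labels):
    """Score dataset labels against a reviewer's labels (both lists of 0/1).

    The reviewer's labels are treated as the reference standard.
    Hypothetical helper for illustration only.
    """
    pairs = list(zip(dataset_labels, reviewer_labels))
    tp = sum(1 for d, r in pairs if d == 1 and r == 1)  # labeled positive, truly positive
    fp = sum(1 for d, r in pairs if d == 1 and r == 0)  # labeled positive, truly negative
    fn = sum(1 for d, r in pairs if d == 0 and r == 1)  # labeled negative, truly positive
    tn = sum(1 for d, r in pairs if d == 0 and r == 0)  # labeled negative, truly negative

    def ratio(numerator, denominator):
        # Return None when the metric is undefined (empty denominator).
        return numerator / denominator if denominator else None

    return {
        "ppv": ratio(tp, tp + fp),          # of labeled positives, fraction confirmed
        "sensitivity": ratio(tp, tp + fn),  # of true positives, fraction labeled positive
        "specificity": ratio(tn, tn + fp),  # of true negatives, fraction labeled negative
    }


# Toy, made-up labels for eight images (not data from the study).
toy_dataset = [1, 1, 1, 0, 0, 0, 1, 0]
toy_reviewer = [1, 0, 1, 0, 1, 0, 1, 0]
print(label_quality(toy_dataset, toy_reviewer))
```

In a review like the one described, the reviewer's labels would come from visual inspection of a sampled subset of images, so the resulting estimates apply to that subset and carry sampling uncertainty.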