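Customization
Computer science
Software deployment
Pipeline (software)
Trustworthiness
Data science
Scalability
Data quality
Artificial intelligence
Software engineering
Database
Computer security
Engineering
Operations management
Metric (unit)
Programming language
Law
Political science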
Authors
Weixin Liang, Girmaw Abebe Tadesse, Daniel E. Ho, Li Fei-Fei, Matei Zaharia, Ce Zhang, James Zou
Identifier
DOI:10.1038/s42256-022-00516-1
Abstract
As artificial intelligence (AI) transitions from research to deployment, creating the appropriate datasets and data pipelines to develop and evaluate AI models is increasingly the biggest challenge. Automated AI model builders that are publicly available can now achieve top performance in many applications. In contrast, the design and sculpting of the data used to develop AI often rely on bespoke manual work, and they critically affect the trustworthiness of the model. This Perspective discusses key considerations for each stage of the data-for-AI pipeline—starting from data design to data sculpting (for example, cleaning, valuation and annotation) and data evaluation—to make AI more reliable. We highlight technical advances that help to make the data-for-AI pipeline more scalable and rigorous. Furthermore, we discuss how recent data regulations and policies can impact AI.