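Benchmark (surveying)
Materials informatics
Informatics
Computer science
Polymer
Database
Data science
Data mining
Health informatics
Materials science
Engineering informatics
Engineering
Geography
Public health
Nursing
Composite material
Electrical engineering
Medicine
Geodesy
Identifier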
DOI:10.1021/acs.jcim.0c00726
Abstract
Large-scale open-source data are a cornerstone of data-driven research, but they are not readily available for polymers. In this work, we build a benchmark database, called PI1M (referring to ∼1 million polymers for polymer informatics), to provide data resources that can be used for machine learning research in polymer informatics. A generative model is trained on ∼12 000 polymers manually collected from PolyInfo, the largest existing polymer database, and the model is then used to generate ∼1 million polymers. A new representation for polymers, polymer embedding (PE), is introduced and used to perform several polymer informatics regression tasks for density, glass transition temperature, melting temperature, and dielectric constant. By comparing the PE trained on the PolyInfo data with that trained on the PI1M data, we conclude that the PI1M database covers a chemical space similar to that of PolyInfo, but significantly populates regions where PolyInfo data are sparse. We believe that PI1M will serve as a good benchmark database for future research in polymer informatics.