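Computer science
Estimator
Sample (material)
Inference
Constraint (computer-aided design)
Statistical inference
Data mining
Sampling (signal processing)
Sample size determination
Artificial intelligence
Statistics
Mathematics
Chemistry
Geometry
Filter (signal processing)
Chromatography
Computer vision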
Authors
Rui Pan, Yingqiu Zhu, Baishan Guo, Xuening Zhu, Hansheng Wang
Source
Journal: IEEE Transactions on Knowledge and Data Engineering
[Institute of Electrical and Electronics Engineers]
Date: 2023-09-01
Volume (issue): 35 (9): 9502-9513
Citations: 2
Identifiers
DOI: 10.1109/tkde.2023.3241075
Abstract
The emergence of massive data in recent years brings challenges to automatic statistical inference. This is particularly true if the data are too numerous to be read into memory as a whole. Accordingly, new sampling techniques are needed to sample data from a hard drive. In this paper, we propose a sequential addressing subsampling (SAS) method that can sample data directly from the hard drive. The newly proposed SAS method saves time in terms of addressing cost compared with the random addressing subsampling (RAS) method. Estimators (e.g., the sample mean) based on the SAS subsamples are constructed, and their properties are studied. We conduct a series of simulation studies to verify the finite-sample performance of the proposed SAS estimators. The time cost is also compared between the SAS and RAS methods. An analysis of the airline data is presented for illustration.
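The contrast between the two addressing schemes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not describe the algorithm in detail, so everything here is an assumption made for illustration, including the fixed-width binary layout (one little-endian float64 per record), the helper names `write_records`, `ras_sample`, and `sas_sample`, and the reading of "sequential addressing" as one random seek followed by a contiguous read of n records.

```python
import os
import random
import struct
import tempfile

# Assumption: each record on disk is a single little-endian float64.
RECORD = struct.Struct("<d")


def write_records(path, values):
    """Write values to a fixed-width binary file (hypothetical helper)."""
    with open(path, "wb") as f:
        for v in values:
            f.write(RECORD.pack(v))


def ras_sample(path, n, rng):
    """Random addressing (assumed reading): one seek per sampled record."""
    total = os.path.getsize(path) // RECORD.size
    out = []
    with open(path, "rb") as f:
        for _ in range(n):
            idx = rng.randrange(total)       # independent random position
            f.seek(idx * RECORD.size)        # n seeks in total
            out.append(RECORD.unpack(f.read(RECORD.size))[0])
    return out


def sas_sample(path, n, rng):
    """Sequential addressing (assumed reading): one seek, n consecutive records."""
    total = os.path.getsize(path) // RECORD.size
    start = rng.randrange(total - n + 1)     # single random starting record
    with open(path, "rb") as f:
        f.seek(start * RECORD.size)          # one seek in total
        block = f.read(n * RECORD.size)      # contiguous read
    return [rec[0] for rec in RECORD.iter_unpack(block)]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "data.bin")
    data_rng = random.Random(1)
    write_records(path, [data_rng.gauss(0.0, 1.0) for _ in range(100_000)])

    rng = random.Random(2)
    ras = ras_sample(path, 500, rng)
    sas = sas_sample(path, 500, rng)
    print("RAS sample mean:", sum(ras) / len(ras))
    print("SAS sample mean:", sum(sas) / len(sas))
```

In this sketch the RAS variant issues n seeks while the SAS variant issues one, which is the kind of addressing cost the abstract refers to. A contiguous block is only representative of the full data when the on-disk order is unrelated to the values; how the paper addresses that point is not stated in the abstract.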