GenerativeMTD: A deep synthetic data generation framework for small datasets

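Keywords: Computer science, Benchmark (surveying), Data mining, Pairwise comparison, Synthetic data, Deep learning, k-nearest neighbors algorithm, Fidelity, Small data, Artificial intelligence, Generative model, Machine learning, Generative grammar, Telecommunications, Geodesy, Geography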
Authors
Jayanth Sivakumar, R. Karthik, Menaka Radhakrishnan, Daehan Won
Source
Journal: Knowledge Based Systems [Elsevier]
Volume/Issue: 280: 110956-110956  Cited by: 5
Identifier
DOI:10.1016/j.knosys.2023.110956
Abstract

Synthetic data generation for tabular data, unlike for computer vision, is an emerging challenge. When tabular data needs to be synthesized, the cause is usually either a small-dataset problem or a privacy risk because the data contains sensitive information. When the data is small, any data-driven modeling leads to biased decision making. Moreover, deep learning models trained on a small dataset are limited in capability. Tabular data also faces a myriad of challenges, such as mixed data types, fidelity, and mode collapse. To address small-dataset problems and increase deep learning capabilities on small data, this research proposes a new generative method, GenerativeMTD. The method generates fake data by using pseudo-real data as input during training. Pseudo-real data allows the deep learning model to be trained on a large number of samples when the real dataset is small. The pseudo-real data is generated from the real data through k-nearest neighbor mega-trend diffusion, and is then translated into synthetic data that is realistic and similar to the real data. The method outperforms some of the state-of-the-art methods for tabular data generation. It also generates quality synthetic data for the benchmark datasets in terms of pairwise correlation difference. In addition, the method surpasses the benchmark models on the distance-based privacy metrics: distance to the closest record and nearest neighbor distance ratio.
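
The abstract names three evaluation metrics without defining them. As a rough illustration, the sketch below computes each one in a commonly used form; it is not taken from the paper, and the exact definitions used by the authors may differ. The function names are hypothetical, the inputs are assumed to be purely numeric arrays that have already been scaled, Euclidean distance is an assumed choice, and the demo data is random noise used only to exercise the code.

```python
import numpy as np


def pairwise_correlation_difference(real, synth):
    """Frobenius norm of the difference between the Pearson correlation
    matrices of the real and synthetic tables (lower means closer)."""
    corr_real = np.corrcoef(real, rowvar=False)
    corr_synth = np.corrcoef(synth, rowvar=False)
    return float(np.linalg.norm(corr_real - corr_synth, ord="fro"))


def _sorted_distances(synth, real):
    # Euclidean distance from every synthetic row to every real row,
    # sorted in ascending order along each synthetic row.
    diff = synth[:, None, :] - real[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=2))
    return np.sort(dists, axis=1)


def distance_to_closest_record(real, synth):
    """For each synthetic row, the distance to its nearest real row.
    Values near zero suggest a synthetic row almost copies a real one."""
    return _sorted_distances(synth, real)[:, 0]


def nearest_neighbor_distance_ratio(real, synth, eps=1e-12):
    """For each synthetic row, nearest real distance divided by the
    second-nearest real distance. Requires at least two real rows."""
    d = _sorted_distances(synth, real)
    return d[:, 0] / (d[:, 1] + eps)


# Demo on random data (assumption: stands in for scaled numeric tables).
rng = np.random.default_rng(0)
real_demo = rng.normal(size=(50, 3))
synth_demo = real_demo + rng.normal(scale=0.1, size=real_demo.shape)

print("PCD:", pairwise_correlation_difference(real_demo, synth_demo))
print("median DCR:", np.median(distance_to_closest_record(real_demo, synth_demo)))
print("median NNDR:", np.median(nearest_neighbor_distance_ratio(real_demo, synth_demo)))
```

Under these definitions, a lower pairwise correlation difference indicates that the synthetic table preserves the real table's correlation structure, while larger distance-based values indicate that synthetic rows sit farther from any individual real record. Mixed-type tables would need an encoding step or a different distance, which this sketch does not cover.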
