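Data warehouse
Computer science
Database
Throughput
Volume (thermodynamics)
Scheduling (production processes)
Materialized view
Table (database)
Process (computing)
Data extraction
Data transformation
Dashboard
Data mining
Real-time computing
View
Database design
Operating system
Engineering
Operations management
Physics
MEDLINE
Quantum mechanics
Law
Political science
Wireless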
Authors
B Suriansyah, Amil Ahmad Ilham, Ady Wahyudi Paundu
Identifier
DOI:10.1109/iccosite57641.2023.10127721
Abstract
Data volumes grow day by day, so the data stored in the data warehouse keeps piling up. When this data is displayed on a dashboard or information system, performance is slow because the queries that load data from the data warehouse into the information system access all the data stored in the warehouse tables. As a result, data loading in the information system slows down, so the data warehouse needs to be optimized to keep the load process light even as the data grows. In this research, a scheduling algorithm is built on Hadoop to run the extract and transform process and load summarized data into several tables. The aim is to streamline and optimize the Extract, Transform, Load (ETL) process into the data warehouse and to reduce the volume of data held in any single table; each table is then indexed on its primary key so that joins across several tables execute quickly. In tests that ran queries with the same goal against different tables, the optimized tables produced a query time of 1.418 seconds, while the unoptimized tables took 2.418 seconds. In addition, a test of data-loading speed into the information system that compared the throughput of the optimized and unoptimized systems found an average throughput difference of 85%. Based on these comparisons, the authors conclude that the speed of loading data into the information system has been successfully optimized.
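The abstract describes a scheduled ETL job that loads pre-summarized data into smaller tables keyed on their primary keys, so that dashboard joins no longer scan every raw row. A minimal sketch of that idea follows, assuming Python's built-in `sqlite3` as a stand-in for the authors' Hadoop-based pipeline, which the source does not show. The schema, the table names (`sales_raw`, `region`, `sales_summary`) and the helper functions are hypothetical illustrations, not the authors' implementation.

```python
import sqlite3


def build_warehouse(rows):
    """Hypothetical warehouse: a raw fact table plus a pre-aggregated summary table."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Unoptimized raw fact table: one row per transaction (hypothetical schema).
    cur.execute("CREATE TABLE sales_raw (id INTEGER PRIMARY KEY, region_id INTEGER, amount REAL)")
    cur.execute("CREATE TABLE region (region_id INTEGER PRIMARY KEY, name TEXT)")
    cur.executemany("INSERT INTO sales_raw (region_id, amount) VALUES (?, ?)", rows)
    cur.executemany("INSERT INTO region VALUES (?, ?)", [(1, "North"), (2, "South")])
    # "Load" step of a scheduled ETL run: write summarized rows into a smaller table.
    # In SQLite an INTEGER PRIMARY KEY is the table's rowid key, so joins on it are key lookups.
    cur.execute("CREATE TABLE sales_summary (region_id INTEGER PRIMARY KEY, total REAL, n INTEGER)")
    cur.execute(
        "INSERT INTO sales_summary SELECT region_id, SUM(amount), COUNT(*) "
        "FROM sales_raw GROUP BY region_id"
    )
    conn.commit()
    return conn


def dashboard_raw(conn):
    # Unoptimized: aggregate over every raw row each time the dashboard loads.
    return conn.execute(
        "SELECT r.name, SUM(s.amount) FROM sales_raw s JOIN region r "
        "ON s.region_id = r.region_id GROUP BY r.name ORDER BY r.name"
    ).fetchall()


def dashboard_summary(conn):
    # Optimized: join the small summary table on its primary key.
    return conn.execute(
        "SELECT r.name, m.total FROM sales_summary m JOIN region r "
        "ON m.region_id = r.region_id ORDER BY r.name"
    ).fetchall()


def percent_reduction(before, after):
    # Relative reduction of a measured time, in percent.
    return (before - after) / before * 100


if __name__ == "__main__":
    conn = build_warehouse([(1, 10.0), (1, 5.0), (2, 7.5)])
    print(dashboard_raw(conn))
    print(dashboard_summary(conn))
    # Query times reported in the abstract: 2.418 s unoptimized, 1.418 s optimized.
    print(round(percent_reduction(2.418, 1.418), 1))
```

Both dashboard queries return the same per-region totals; the optimized path reads only one pre-aggregated row per region instead of every transaction. Applied to the abstract's figures, 1.418 s versus 2.418 s corresponds to roughly a 41% shorter query time.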