Abstract
The use of health big data from the Medical Information Mart for Intensive Care (MIMIC) data sets has brought significant advancements in health informatics and in clinical research on electronic health records (EHRs) in hospital systems. MIMIC-III and MIMIC-IV are the two latest publicly available iterations of these electronic record data; each is a large relational database with its own distinct features, variables, and structures. This study aimed to compare MIMIC-III and MIMIC-IV through big data analytics to support experiential learning of health informatics from EHRs. Both data sets were evaluated quantitatively and qualitatively on three groups of criteria: data set properties (data, list of tables, timeline, patient encounters, software, system); data mining and visual categories (data size, data structure and schema, features/variables, data quality and completeness, clinical focus, access and use, dashboards, and visualization types); and data usability heuristics and utilization (usability, complexity, granularity, applicability, and capacity). The data sets differed substantially in size: MIMIC-III contained 100 patients across 26 tables, whereas MIMIC-IV contained 2,520 patients across 31 tables. MIMIC-III contained 1,716 diagnoses (ICD-9) and 503 procedures (ICD-9). MIMIC-IV contained 262 disease categories with >1,000 disease instances and >2,000 treatments, with no ICD codes. Both data sets also contained large (big data) charting event tables: 758,356 rows in MIMIC-III, and 1,477,163 nursing charting events and 176,089 respiratory charting rows in MIMIC-IV eICU. The results suggest that MIMIC-III provides detailed information for retrospective clinical studies and operations in critical care, with high data granularity in terms of re-admission (fields calculated from its admission table), length of stay, prescriptions, caregivers, and diagnoses and procedures (ICD-9).
However, MIMIC-III lacked clinical capacity because it had no diagnosis or charting event offset times and no Acute Physiology and Chronic Health Evaluation (APACHE) IV scores, both of which were present in the MIMIC-IV eICU data set. Hence, MIMIC-IV showed higher data granularity and capacity. The MIMIC-IV eICU data set introduces enhanced data attributes, more sophisticated patient trajectory tracking at the ICU unit level, and more detailed information from electronic records. Moreover, MIMIC-IV has high usability, complexity, and applicability to critical care, but it lacks the hospital operational data on re-admissions, caregivers, and ICD codes that MIMIC-III contained. MIMIC-IV also contained complex data schemas covering treatment strings, nurse care plans, lab results, medications, microbiology, and infusion drug and respiratory charting. Big data analytics of MIMIC-III and MIMIC-IV should be investigated further for AI applications to demonstrate its usefulness for dynamic decision-making in health care.
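
As an illustration of the calculated fields mentioned above, the following is a minimal Python sketch of how length of stay and a re-admission flag might be derived from an admissions table. It is not the study's method. The column names (`subject_id`, `hadm_id`, `admittime`, `dischtime`) follow the MIMIC-III ADMISSIONS table, but the function name `derive_fields`, the 30-day window, and the definition of re-admission (a new admission within the window after the same patient's previous discharge) are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

# Timestamp layout assumed for the illustrative rows below.
FMT = "%Y-%m-%d %H:%M:%S"


def derive_fields(admissions, window_days=30):
    """Add length of stay (days) and a re-admission flag to each admission.

    `admissions` is a list of dicts with subject_id, hadm_id, admittime,
    and dischtime. The 30-day window is an assumed, hypothetical threshold.
    """
    parsed = [
        {
            "subject_id": row["subject_id"],
            "hadm_id": row["hadm_id"],
            "admittime": datetime.strptime(row["admittime"], FMT),
            "dischtime": datetime.strptime(row["dischtime"], FMT),
        }
        for row in admissions
    ]
    # Order each patient's admissions chronologically.
    parsed.sort(key=lambda r: (r["subject_id"], r["admittime"]))

    window = timedelta(days=window_days)
    last_discharge = {}  # subject_id -> discharge time of the previous stay
    results = []
    for r in parsed:
        stay = r["dischtime"] - r["admittime"]
        previous = last_discharge.get(r["subject_id"])
        # Flag this admission if it starts within the window after the
        # same patient's previous discharge.
        readmitted = previous is not None and r["admittime"] - previous <= window
        results.append(
            {
                "subject_id": r["subject_id"],
                "hadm_id": r["hadm_id"],
                "length_of_stay_days": stay.total_seconds() / 86400,
                "readmitted": readmitted,
            }
        )
        last_discharge[r["subject_id"]] = r["dischtime"]
    return results
```

Calling `derive_fields` on a list of admission rows returns one record per hospital admission, sorted by patient and admission time. A different re-admission definition (for example, a different window or excluding planned admissions) would change only the flag computation.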