Core information and assessment summary
The paper presents a clear problem, systematically addresses the challenges, details a novel framework, describes the resulting dataset, and evaluates its utility through relevant downstream tasks, maintaining a logical flow throughout.
Strengths:
- Detailed description of the proposed end-to-end annotation framework.
- Specific components and models used (retrieval methods, LLM, embedding models) are identified.
- Clear definition of the clinical event and timestamp annotation process.
- Comprehensive evaluation on multiple downstream tasks using established datasets (PubMedQA, TREC CTM).
- Comparison against relevant baseline models (vanilla BERT, vanilla GPT-2).
- Use of multiple standard evaluation metrics (Accuracy, NDCG, Precision, Recall).
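Of the metrics listed, NDCG is the least self-explanatory. As a point of reference for the retrieval evaluation, a minimal sketch of NDCG@k as it is conventionally computed for a ranked list is given below; the relevance grades in the example are hypothetical, not values from the paper.

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: relevance discounted by log2 of rank position.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(relevances, k):
    # Normalize DCG of the actual ranking by the DCG of the ideal (sorted) ranking.
    ideal_dcg = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical graded relevance judgments for five retrieved clinical trials.
print(ndcg_at_k([3, 2, 3, 0, 1], k=5))
```

A perfectly ordered ranking scores 1.0; misplacing a highly relevant item lowers the score according to the log-rank discount.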
Weaknesses:
- Validation of the annotation quality itself rests primarily on concordance with manual annotation of a small sample (10 case reports) rather than a broader evaluation of the full generated dataset.
- The post-processing step for correcting LLM output errors is mentioned but not described rigorously.
The paper provides strong evidence for the utility of the dataset by demonstrating significant performance improvements on three distinct downstream healthcare NLP tasks when used for fine-tuning standard models. The scale of the dataset (22.5M+ events) is a key piece of evidence for its value. Evidence for the quality of the annotation is less extensive, relying on a small validation sample.
The release of a large-scale clinical time series dataset specifically annotated with detailed relative timestamps from unstructured clinical notes appears to be highly original. The proposed end-to-end framework combining contextual retrieval with prompted LLMs for this specific information extraction task also demonstrates novelty.
The dataset addresses a recognized need for temporal clinical data, which is crucial for advancing predictive modeling and causal reasoning in healthcare. Demonstrated improvements on key tasks like QA and CTM suggest high practical significance and potential to impact clinical informatics research and applications.
Strengths:
- The problem statement and motivations are clearly articulated.
- The components of the proposed framework are described step-by-step.
- Technical terms are used appropriately within the domain.
- Evaluation procedures and metrics are explained.
Areas for Improvement: Some complex sentences or phrasing could be simplified for slightly improved readability.
Theoretical: Proposal of a novel end-to-end annotation framework combining contextual retrieval and LLM-based reasoning for temporal clinical event extraction.
Methodological: Development of a high-quality prompt strategy for LLMs; design of a specific architecture (Temporal BERT) for integrating textual and temporal features.
Practical: Release of the MIMIC-IV-Ext-22MCTS dataset (22.5M+ events); Release of fine-tuned BERT and GPT-2 models on this dataset; Demonstration of dataset utility for improving performance on medical question answering, clinical trial matching, and sentence completion.
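The methodological contribution of integrating textual and temporal features (Temporal BERT) could take a form like the sketch below, which concatenates a sinusoidal encoding of an event's relative timestamp onto a text embedding. The fusion scheme, function name, and encoding here are illustrative assumptions for the review, not the paper's actual architecture.

```python
import numpy as np

def fuse_text_and_time(text_emb: np.ndarray, rel_hours: float, time_dim: int = 8) -> np.ndarray:
    """Concatenate a sinusoidal encoding of a relative timestamp (in hours)
    onto a text embedding. Hypothetical fusion scheme for illustration only."""
    # Geometric frequency ladder, as in standard positional encodings.
    freqs = 1.0 / (10000 ** (np.arange(time_dim // 2) / (time_dim // 2)))
    angles = rel_hours * freqs
    time_emb = np.concatenate([np.sin(angles), np.cos(angles)])
    return np.concatenate([text_emb, time_emb])

text_emb = np.random.rand(768)  # stand-in for a BERT [CLS] embedding
fused = fuse_text_and_time(text_emb, rel_hours=-48.0)  # event 48h before reference point
print(fused.shape)  # (776,)
```

Relative (rather than absolute) timestamps keep the encoding anchored to a clinical reference point such as admission, which is what makes the dataset's annotations usable for this kind of fusion.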
Topic Timeliness: high
Literature Review Currency: good
Disciplinary Norm Compliance: Largely follows the disciplinary paradigm. The paper adheres to typical research norms for AI/NLP work in healthcare: it defines the problem, proposes a method, creates and uses a dataset, and evaluates performance with standard metrics and baselines.
Inferred Author Expertise: Clinical Informatics, Artificial Intelligence, Machine Learning, Natural Language Processing, Computer Science
Evaluator: AI Assistant
Evaluation Date: 2025-05-07