Early Prediction of Bronchopulmonary Dysplasia in Preterm Infants Using Clinical Data from the First Three Postnatal Weeks with Large Language Models: A Retrospective Study
This retrospective, observational study evaluates the early prediction of bronchopulmonary dysplasia (BPD) in preterm infants using clinical data from the first, second, and third postnatal weeks. The study includes infants born before 32 weeks of gestation or with a birth weight below 1,500 grams who were followed at the Neonatal Intensive Care Unit of Konya City Hospital. It will compare the performance of different large language models (LLMs), including ChatGPT, Gemini, and Claude, in predicting BPD development. Clinical variables such as gestational age, birth weight, respiratory support, oxygen requirement, mechanical ventilation duration, and infection status will be used.
Primary outcome: Accuracy of BPD risk prediction by each AI model compared to actual clinical outcomes.
Secondary outcomes: Sensitivity and specificity of predictions, weekly prediction performance, and comparative performance among AI models.
The results will provide insight into the potential clinical utility of AI-based approaches for early BPD risk assessment in preterm infants.
Premature birth remains a major risk factor for neonatal morbidity and mortality, with bronchopulmonary dysplasia (BPD) representing one of the most significant chronic pulmonary complications in very preterm infants. Despite advances in neonatal intensive care, early and accurate prediction of BPD remains challenging due to the multifactorial nature of its pathophysiology, involving respiratory support requirements, oxygen exposure, infection burden, and perinatal factors.
This retrospective study evaluates the feasibility of using large language models (LLMs) for early prediction of BPD based on structured clinical data extracted from neonatal intensive care unit (NICU) records. Clinical variables are organized into weekly datasets corresponding to the first, second, and third postnatal weeks to capture the dynamic evolution of respiratory status and clinical condition over time.
Standardized and anonymized patient-level datasets are formatted into structured prompts and provided to multiple LLMs (ChatGPT, Gemini, and Claude). Each model receives identical input variables to ensure comparability. The models are instructed to generate categorical risk stratification (low, medium, high) along with corresponding probability estimates for BPD development. To ensure methodological consistency, prompt engineering is standardized across all models and time points. Outputs are recorded for each weekly time window, allowing temporal comparison of predictive performance and assessment of how early postnatal data influences model accuracy.
Model outputs are subsequently compared with confirmed clinical outcomes of BPD development in the study population. Performance evaluation focuses on discriminative ability and calibration of predictions across different time points and models.
This design enables a systematic assessment of the potential role of LLM-based approaches in neonatal risk stratification and provides insight into their applicability as supportive clinical decision-making tools in neonatal intensive care settings.
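The step of formatting anonymized weekly data into identical structured prompts could look roughly like the sketch below. The record schema, field names, prompt wording, and requested answer format are all hypothetical; the registry entry does not publish the actual prompt template.

```python
from dataclasses import dataclass


@dataclass
class WeeklyRecord:
    """One infant's anonymized data for one postnatal week (hypothetical schema)."""
    postnatal_week: int           # 1, 2, or 3
    gestational_age_weeks: float
    birth_weight_g: int
    respiratory_support: str      # free-text label, e.g. "CPAP"
    max_fio2: float               # highest fraction of inspired oxygen in the week
    ventilation_days: int         # cumulative days of mechanical ventilation
    infection: bool               # infection documented up to this week


# Assumed instruction text; the study's real wording is not given in the source.
INSTRUCTIONS = (
    "Estimate the risk of bronchopulmonary dysplasia (BPD) for this preterm infant.\n"
    "Answer with a risk category (low, medium, or high) and a probability between 0 and 1."
)


def build_prompt(rec: WeeklyRecord) -> str:
    """Render a record as text; the same template is reused for every model."""
    fields = [
        f"Postnatal week: {rec.postnatal_week}",
        f"Gestational age (weeks): {rec.gestational_age_weeks}",
        f"Birth weight (g): {rec.birth_weight_g}",
        f"Respiratory support: {rec.respiratory_support}",
        f"Maximum FiO2: {rec.max_fio2}",
        f"Mechanical ventilation (days): {rec.ventilation_days}",
        f"Infection: {'yes' if rec.infection else 'no'}",
    ]
    return INSTRUCTIONS + "\n\n" + "\n".join(fields)


if __name__ == "__main__":
    example = WeeklyRecord(1, 27.5, 980, "CPAP", 0.35, 3, True)
    print(build_prompt(example))
```

Because the template is a pure function of the record, every model sees the same text for the same infant and week, which is the comparability property the description asks for.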
Study Type
OBSERVATIONAL
Enrollment
108
Different large language models (ChatGPT, Gemini, Claude) will analyze retrospective clinical data to predict the risk of bronchopulmonary dysplasia (BPD). This is an observational evaluation; no experimental treatment or therapy is administered.
Konya City Hospital, İstiklal, Adana Çevre Yolu Cd. No:135/1
Konya, Turkey (Türkiye)
Accuracy of bronchopulmonary dysplasia (BPD) risk prediction by artificial intelligence (AI) models in preterm infants.
The primary outcome is the accuracy of different large language models (ChatGPT, Gemini, Claude) in predicting BPD development. AI-generated risk predictions will be compared to actual clinical outcomes to assess prediction correctness.
Time frame: Postnatal weeks 1, 2, and 3
Sensitivity and specificity of AI predictions
Evaluate the true positive rate (sensitivity) and true negative rate (specificity) of each AI model's BPD risk predictions compared to actual outcomes.
Time frame: Postnatal weeks 1, 2, and 3
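As a minimal sketch of how accuracy, sensitivity, and specificity could be computed from model outputs, the example below binarizes the predicted probabilities and counts agreement with confirmed outcomes. The 0.5 cutoff and the function names are assumptions for illustration; the source does not state how risk outputs are converted to a positive or negative prediction.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for paired boolean-like labels and predictions."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1
        elif not truth and not pred:
            tn += 1
        elif pred:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def binary_metrics(y_true, probs, threshold=0.5):
    """Accuracy, sensitivity, and specificity at an assumed probability cutoff."""
    y_pred = [p >= threshold for p in probs]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total if total else nan,
        # Sensitivity: share of infants who developed BPD that were flagged.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Specificity: share of infants without BPD that were not flagged.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


if __name__ == "__main__":
    # Toy values for illustration only, not study data.
    outcomes = [1, 1, 1, 0, 0, 0, 0, 0]
    probabilities = [0.9, 0.7, 0.2, 0.1, 0.6, 0.3, 0.2, 0.4]
    print(binary_metrics(outcomes, probabilities))
```

Changing `threshold` trades sensitivity against specificity, so any reported pair of values is only meaningful alongside the cutoff that produced it.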
Comparison of prediction accuracy across postnatal weeks
Compare AI model performance at different postnatal weeks to determine if prediction accuracy improves as more clinical data becomes available.
Time frame: Postnatal weeks 1, 2, and 3
Comparative performance of different AI models
Compare the predictive performance of ChatGPT, Gemini, and Claude against one another when each model receives identical clinical input data.
Time frame: Postnatal weeks 1, 2, and 3
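The week-by-week and model-by-model comparisons could be tabulated along the lines of the sketch below, which groups predictions by model and postnatal week and reports accuracy together with the Brier score. The row layout, the 0.5 cutoff, the placeholder model names, and the choice of the Brier score as a calibration-related summary are assumptions; the source does not name specific metrics beyond accuracy, sensitivity, and specificity.

```python
from collections import defaultdict


def summarize(rows, threshold=0.5):
    """Summarize predictions per (model, week).

    rows: iterable of (model, week, probability, outcome) tuples, where
    outcome is 1 if BPD was confirmed and 0 otherwise (hypothetical layout).
    """
    grouped = defaultdict(list)
    for model, week, prob, outcome in rows:
        grouped[(model, week)].append((prob, outcome))

    summary = {}
    for key, pairs in grouped.items():
        n = len(pairs)
        correct = sum(1 for p, y in pairs if (p >= threshold) == bool(y))
        # Brier score: mean squared gap between probability and outcome (lower is better).
        brier = sum((p - y) ** 2 for p, y in pairs) / n
        summary[key] = {"n": n, "accuracy": correct / n, "brier": brier}
    return summary


if __name__ == "__main__":
    # Toy rows for illustration only, not study data.
    toy_rows = [
        ("model_a", 1, 0.8, 1), ("model_a", 1, 0.4, 0),
        ("model_a", 3, 0.9, 1), ("model_a", 3, 0.1, 0),
        ("model_b", 1, 0.3, 1), ("model_b", 1, 0.6, 0),
    ]
    for (model, week), stats in sorted(summarize(toy_rows).items()):
        print(model, week, stats)
```

Reading the result along the week axis for a fixed model addresses whether performance changes as more data accumulates, while reading across models for a fixed week addresses the between-model comparison.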