We conducted a single-center, retrospective observational study to evaluate large language models (ChatGPT 4o, GPT-5, DeepSeek) for automated interpretation of de-identified IOLMaster 700 reports provided as raster images. Models produced structured biometric extraction, toric IOL recommendation, and refractive predictions (sphere, cylinder, axis). Primary outcomes included parameter-level agreement and refractive error metrics; secondary outcomes included decision-support performance for toric IOL selection and agreement on ordered T-codes. No clinical intervention was performed.
This study compares three large language models accessed in their native configurations, without fine-tuning or external tools. For each examination, the original IOLMaster 700 report image was supplied without manual annotation or pre-processing. A standardized instruction required: (i) structured extraction of AL, ACD, LT, WTW, K1/K2 and axes, ΔK, TK1/TK2 and axes, and ΔTK; (ii) binary toric candidacy and T-code according to institutional ALCON mapping; and (iii) refractive recommendations (sphere, cylinder, implantation axis). Each model generated three independent outputs per case. De-identification and IRB oversight (waiver of consent) were implemented according to institutional policy. The unit of enrollment is participants (n=54), with outcomes analyzed per eye (162 eyes) and per model generation where applicable.
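The structured output described above can be pictured as one record per model generation. The following is a minimal sketch only: the class name, field names, units (millimeters for lengths, diopters for keratometry, degrees for axes), and the tolerance are illustrative assumptions, not the study's actual schema. The consistency check further assumes that ΔK and ΔTK are the differences between the paired keratometry readings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelOutput:
    """Hypothetical container for one model generation on one report image."""
    # (i) Structured biometric extraction (assumed units in field suffixes)
    al_mm: float
    acd_mm: float
    lt_mm: float
    wtw_mm: float
    k1_d: float
    k1_axis_deg: float
    k2_d: float
    k2_axis_deg: float
    delta_k_d: float
    tk1_d: float
    tk1_axis_deg: float
    tk2_d: float
    tk2_axis_deg: float
    delta_tk_d: float
    # (ii) Toric decision; t_code is None when no toric lens is recommended
    toric_candidate: bool
    t_code: Optional[str]
    # (iii) Refractive recommendation
    sphere_d: float
    cylinder_d: float
    implantation_axis_deg: float


def astigmatism_is_consistent(out: ModelOutput, tol: float = 0.02) -> bool:
    """Check that the extracted deltas match |K2 - K1| and |TK2 - TK1| (assumed definition)."""
    ok_k = abs(out.delta_k_d - abs(out.k2_d - out.k1_d)) <= tol
    ok_tk = abs(out.delta_tk_d - abs(out.tk2_d - out.tk1_d)) <= tol
    return ok_k and ok_tk


# Illustrative values only, not study data; "T3" is a placeholder code.
example = ModelOutput(
    al_mm=23.5, acd_mm=3.1, lt_mm=4.5, wtw_mm=11.8,
    k1_d=43.25, k1_axis_deg=5.0, k2_d=44.5, k2_axis_deg=95.0, delta_k_d=1.25,
    tk1_d=43.0, tk1_axis_deg=4.0, tk2_d=44.5, tk2_axis_deg=94.0, delta_tk_d=1.5,
    toric_candidate=True, t_code="T3",
    sphere_d=20.5, cylinder_d=1.5, implantation_axis_deg=95.0,
)
example_consistent = astigmatism_is_consistent(example)
```

A check of this kind would flag extractions in which a model misread one keratometry value but copied the printed difference correctly, or the reverse.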
Study Type
OBSERVATIONAL
Enrollment
100
Eye and ENT Hospital of Fudan University
Shanghai, Shanghai Municipality, China
RECRUITING
Refractive prediction error for sphere
Mean absolute error (MAE, diopters) of model-predicted sphere versus clinical reference
Time frame: At index examination
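As an illustration of the metric named above, mean absolute error is the average of the absolute differences between paired predicted and reference values. The sketch below is a generic implementation; the function name and the example values are hypothetical and are not taken from the study.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Mean absolute difference between paired values (here, in diopters)."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - r) for p, r in zip(predicted, reference))
    return total / len(predicted)


# Illustrative values only, not study data.
example_mae = mean_absolute_error([20.5, 21.0, 19.5], [20.0, 21.0, 20.5])
```

The same function applies unchanged to cylinder values, since both are expressed in diopters.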
Cohen's kappa with 95% CIs between model outputs and reference
Cohen's kappa with 95% CIs between model outputs and the clinician-validated reference, computed per parameter
Time frame: At index examination (single time point)
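For categorical outputs such as toric candidacy or T-code, Cohen's kappa compares observed agreement with the agreement expected by chance. The sketch below is one possible computation, not the study's analysis code. It assumes the simple large-sample standard error, sqrt(p_o(1 − p_o) / (n(1 − p_e)²)), for the 95% interval; a bootstrap interval would be a reasonable alternative. The function name and labels are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def cohens_kappa_with_ci(
    model: Sequence[Hashable],
    reference: Sequence[Hashable],
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Return (kappa, lower, upper) for paired categorical labels."""
    n = len(model)
    if n == 0 or n != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed agreement: fraction of cases with identical labels.
    p_o = sum(m == r for m, r in zip(model, reference)) / n
    # Chance agreement: sum over categories of the product of marginal proportions.
    model_counts = Counter(model)
    ref_counts = Counter(reference)
    p_e = sum(model_counts[c] * ref_counts[c] for c in model_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (p_o - p_e) / (1.0 - p_e)
    # Approximate large-sample standard error (an assumption of this sketch).
    se = math.sqrt(p_o * (1.0 - p_o) / (n * (1.0 - p_e) ** 2))
    return kappa, kappa - z * se, kappa + z * se


# Illustrative labels only, not study data.
model_labels = ["T", "T", "T", "T", "N", "N", "N", "N", "N", "T"]
reference_labels = ["T", "T", "T", "N", "N", "N", "N", "N", "T", "T"]
kappa, ci_low, ci_high = cohens_kappa_with_ci(model_labels, reference_labels)
```

In this toy example the raters agree on 8 of 10 cases and chance agreement is 0.5, which gives a kappa of 0.6.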
Cylinder prediction error
Mean absolute error (MAE, diopters) of model-predicted cylinder versus clinical reference
Time frame: At index examination
Axis prediction error
Mean absolute error (MAE, degrees) of model-predicted axis versus clinical reference
Time frame: At index examination
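Because an axis is an orientation rather than a magnitude, 0° and 180° describe the same meridian, so a plain subtraction can overstate the error (175° versus 5° differ by 10°, not 170°). The record does not state how axis error is computed; the sketch below assumes a wrap-around difference on a 180° cycle, and the function names and example values are hypothetical.

```python
from typing import Sequence


def axis_difference_deg(predicted: float, reference: float) -> float:
    """Smallest angular difference between two axes on a 180-degree cycle."""
    diff = abs(predicted - reference) % 180.0
    return min(diff, 180.0 - diff)


def axis_mae_deg(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Mean of the wrap-around axis differences, in degrees."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [axis_difference_deg(p, r) for p, r in zip(predicted, reference)]
    return sum(diffs) / len(diffs)


# Illustrative values only, not study data.
example_axis_mae = axis_mae_deg([175.0, 90.0, 10.0], [5.0, 85.0, 10.0])
```

With this definition the largest possible error for a single eye is 90°.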