Accurate and comprehensive interpretation of anterior segment diseases from slit-lamp and smartphone photographs remains a clinical challenge because existing artificial intelligence tools lack specificity and structured outputs. The purpose of this international, multicenter clinical trial is to develop and validate an agent-based framework that integrates vision-language models and large language models to enhance the diagnostic workflow for anterior segment diseases.
Study Type
OBSERVATIONAL
Enrollment
2,000
Multimodal Vision-language Model for Multi-task Diagnosis and Triage Suggestions of Ophthalmic Diseases

Patients presenting with complaints related to anterior segment diseases first undergo a slit-lamp examination or take a mobile-phone eye photograph. A multimodal vision-language model then uses the patient-related images (such as selfies and eye examination photographs) to generate an intelligent diagnosis, which is kept masked so that it cannot influence the clinicians. The patient subsequently seeks medical attention and is examined by an experienced clinician, after which a second experienced clinician reviews the clinical diagnosis. If the two clinicians agree, their shared diagnosis is taken as the gold standard; if they disagree, the consensus they reach together serves as the gold standard.
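As an illustration of the adjudication rule above, here is a minimal Python sketch. The CaseRecord fields and the gold_standard helper are hypothetical names introduced for this example; they are not part of the study's software.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CaseRecord:
    patient_id: str
    model_diagnosis: str             # VLM output, masked from the clinicians
    clinician_a: str                 # first clinician's diagnosis
    clinician_b: str                 # second clinician's independent review
    consensus: Optional[str] = None  # recorded only when A and B disagree

def gold_standard(case: CaseRecord) -> str:
    """Reference diagnosis per the protocol: clinician agreement is
    accepted directly; disagreement falls back to their consensus."""
    if case.clinician_a == case.clinician_b:
        return case.clinician_a
    if case.consensus is None:
        raise ValueError(f"{case.patient_id}: disagreement requires a consensus")
    return case.consensus
```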
Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University
Guangzhou, Guangdong, China
Status
RECRUITING

Primary Outcome Measure
Diagnostic accuracy of the multimodal vision-language model
For each patient, the diagnosis generated by the multimodal vision-language model and the clinical diagnosis provided by experienced clinicians are documented and compared. Agreement between the two indicates the model's diagnostic accuracy in clinical practice.
Time frame: July 2025 to September 2025
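A back-of-the-envelope sketch of how this primary outcome could be computed, assuming the model and gold-standard diagnoses are collected as paired label lists; the function name and the example labels are hypothetical.

```python
def diagnostic_accuracy(model_dx: list[str], gold_dx: list[str]) -> float:
    """Proportion of cases in which the model's diagnosis matches
    the gold-standard clinical diagnosis."""
    if len(model_dx) != len(gold_dx):
        raise ValueError("paired lists must have the same length")
    agree = sum(m == g for m, g in zip(model_dx, gold_dx))
    return agree / len(model_dx)

# Hypothetical example: 3 of 4 cases agree -> accuracy 0.75
print(diagnostic_accuracy(
    ["keratitis", "cataract", "pterygium", "conjunctivitis"],
    ["keratitis", "cataract", "pterygium", "blepharitis"],
))
```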