This is a multi-center, cross-sectional study evaluating a smartphone-based artificial intelligence (AI) system for anterior segment eye disease screening. The system is designed to identify 16 clinically important anterior segment conditions from images captured using a standard Android smartphone. A core design feature of the system is that all image analysis is performed entirely on the smartphone itself, without requiring internet connectivity or cloud-based server infrastructure. The study is motivated by a structural challenge in the deployment of medical AI: systems that depend on cloud infrastructure for inference are non-functional in settings without reliable internet access, which disproportionately excludes populations in low-resource regions where the burden of preventable eye disease is highest. This study evaluates whether an on-device AI system, designed with operational constraints as a primary engineering objective, can deliver clinically acceptable diagnostic performance while remaining operable under real-world connectivity limitations.
The study comprises five evaluation components. First, the diagnostic performance of the AI system is benchmarked against board-certified ophthalmologists of varying seniority on a standardized set of smartphone-captured anterior segment images. Second, the usability of the system is evaluated among non-medical users who perform self-administered screening with minimal instruction, with per-screening time recorded across consecutive attempts to characterize the learning curve. Third, a head-to-head field trial directly compares the on-device AI system against a functionally equivalent cloud-based deployment of the same model architecture across key operational dimensions, including screening duration, diagnostic performance, and user acceptability. Fourth, population-level screening is conducted among consecutively enrolled community residents at two low-resource sites, with per-disease sensitivity and specificity calculated against reference-standard slit-lamp examinations. Fifth, pre-specified health-economic and environmental analyses compare the two deployment modalities in terms of per-person screening cost, cost-effectiveness, per-inference electricity consumption, and projected carbon emissions at scale.
The reference standard for all diagnostic comparisons is slit-lamp biomicroscopic examination performed by board-certified ophthalmologists. The study is designed and reported in accordance with the DECIDE-AI reporting guideline for early-stage clinical evaluation of AI-driven decision-support systems.
Study Type
OBSERVATIONAL
Enrollment
3,000
A structured-pruned one-stage object-detection model deployed as a standalone Android application, performing all image inference on-device without internet connectivity, designed to detect 16 anterior segment eye diseases from smartphone-captured images.
Zhongshan Ophthalmic Center, Sun Yat-sen University
Guangzhou, Guangdong, China
RECRUITING
Case-level diagnostic accuracy of the AI system compared with board-certified ophthalmologists
Case-level accuracy is defined as the proportion of images with fully correct diagnostic labels concordant with the reference standard. The AI system and board-certified ophthalmologists, stratified by clinical seniority (junior: fewer than 5 years of independent practice; intermediate: 5 to 15 years; senior: more than 15 years), independently evaluate the same standardized set of smartphone-captured anterior segment images, which is sampled to ensure balanced representation of all 16 disease categories and normal eyes. The reference standard for each image is established by a senior ophthalmologist with more than 30 years of clinical experience who does not participate in the benchmarking exercise. Clinicians are masked to the AI system output and to each other's assessments throughout.
Time frame: Day 1
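A minimal Python sketch of the case-level accuracy calculation follows. It assumes that "fully correct" means the predicted label set for an image exactly matches the reference label set, and that a seniority group's figure is the mean of its individual readers' accuracies; both are assumptions, as are all names in the code (`case_level_accuracy`, `accuracy_by_group`) and the hypothetical `disease_*` labels.

```python
from collections import defaultdict


def case_level_accuracy(predictions, reference):
    """Proportion of images whose predicted label set exactly matches the reference set.

    predictions, reference: dict mapping image id -> iterable of diagnostic labels.
    Images missing from `predictions` count as incorrect.
    """
    if not reference:
        raise ValueError("reference set is empty")
    correct = sum(
        1
        for image_id, ref_labels in reference.items()
        if set(predictions.get(image_id, ())) == set(ref_labels)
    )
    return correct / len(reference)


def accuracy_by_group(reader_outputs, reference):
    """Mean case-level accuracy per reader group.

    reader_outputs: dict mapping reader id -> (group name, predictions dict).
    """
    per_group = defaultdict(list)
    for group, predictions in reader_outputs.values():
        per_group[group].append(case_level_accuracy(predictions, reference))
    return {group: sum(accs) / len(accs) for group, accs in per_group.items()}


# Hypothetical three-image example; "normal" marks an image with no disease.
reference = {"img1": {"normal"}, "img2": {"disease_a"}, "img3": {"disease_a", "disease_b"}}
readers = {
    "ai": ("AI system", {"img1": {"normal"}, "img2": {"disease_a"}, "img3": {"disease_a"}}),
    "j1": ("junior", {"img1": {"normal"}, "img2": {"disease_c"}, "img3": {"disease_a", "disease_b"}}),
    "j2": ("junior", {"img1": {"normal"}, "img2": {"disease_a"}, "img3": {"disease_b", "disease_a"}}),
}
group_accuracy = accuracy_by_group(readers, reference)
print(group_accuracy)  # AI system: 2/3 (misses disease_b on img3); junior: mean of 2/3 and 1
```

Comparing label sets rather than single labels means a partially correct multi-disease image counts as incorrect, which is the strictest reading of "fully correct".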
Diagnostic accuracy of the AI system when operated by non-medical users
Diagnostic accuracy is defined as the proportion of images for which the AI system, operated independently by non-medical users, returns a diagnostic result concordant with the reference standard. Non-medical users, including patients and their family members attending the outpatient clinic of Zhongshan Ophthalmic Center, are instructed to install the AI application on their own smartphones, follow the in-app guidelines, and capture anterior segment images of an accompanying person to receive a screening result. All inference is performed on-device without internet connectivity. AI-generated diagnostic outputs are compared with reference-standard diagnoses obtained from subsequent slit-lamp examinations performed by ophthalmologists at the same clinic.
Time frame: Day 1
Sensitivity of the on-device AI system in population-level community screening
Sensitivity is defined as the proportion of participants with a given anterior segment disease who are correctly identified as positive by the on-device AI system (true positives divided by the sum of true positives and false negatives), calculated separately for each target disease category. Village staff without medical training use the on-device AI system to screen consecutively enrolled local residents. The reference standard is established through slit-lamp examinations performed by ophthalmologists.
Time frame: Day 1
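Per-disease sensitivity can be computed from multi-label screening outputs as sketched below, assuming each participant's AI output and reference diagnosis are represented as sets of disease labels. The function name and the `disease_*` labels are hypothetical.

```python
def per_disease_sensitivity(predictions, reference, diseases):
    """Sensitivity = TP / (TP + FN) for each disease.

    predictions, reference: dict mapping participant id -> set of disease labels.
    Only participants who have the disease according to the reference contribute.
    Returns None for a disease with no reference-positive participants.
    """
    result = {}
    for disease in diseases:
        tp = fn = 0
        for pid, ref_labels in reference.items():
            if disease not in ref_labels:
                continue  # reference-negative: does not enter the sensitivity denominator
            if disease in predictions.get(pid, set()):
                tp += 1
            else:
                fn += 1
        result[disease] = tp / (tp + fn) if (tp + fn) else None
    return result


# Hypothetical four-participant example (p3 has none of the target diseases)
reference = {"p1": {"disease_a"}, "p2": {"disease_a", "disease_b"}, "p3": set(), "p4": {"disease_b"}}
predictions = {"p1": {"disease_a"}, "p2": {"disease_b"}, "p3": {"disease_a"}, "p4": {"disease_b"}}
sensitivity = per_disease_sensitivity(predictions, reference, ["disease_a", "disease_b", "disease_c"])
print(sensitivity)  # disease_a: 1 TP, 1 FN; disease_b: 2 TP; disease_c: no reference positives
```

Returning `None` rather than 0 for a disease with no reference-positive participants keeps an undefined ratio from being reported as a real value.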
Incremental cost-effectiveness ratio of on-device versus cloud-based screening
The incremental cost-effectiveness ratio (ICER) is defined as the difference in lifetime costs between on-device and cloud-based screening divided by the difference in quality-adjusted life-years (QALYs) between the two strategies, estimated from a pre-specified decision-analytic model comprising a decision tree with a downstream Markov state-transition structure applied to a simulated cohort of 100,000 individuals.
Time frame: Day 1
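The ICER itself is a simple ratio; its lifetime cost and QALY inputs come from the decision-analytic model. The sketch below pairs the ICER formula with a generic Markov cohort trace that accumulates discounted QALYs cycle by cycle. It illustrates the technique under stated assumptions and is not the study's model: the function names, the 3% default discount rate, the absence of a half-cycle correction, and all example numbers are hypothetical.

```python
def icer(cost_on_device, qaly_on_device, cost_cloud, qaly_cloud):
    """ICER = (C_on_device - C_cloud) / (QALY_on_device - QALY_cloud).

    Returns None when the QALY difference is zero (ratio undefined).
    """
    delta_qaly = qaly_on_device - qaly_cloud
    if delta_qaly == 0:
        return None
    return (cost_on_device - cost_cloud) / delta_qaly


def markov_qalys(transition, utilities, start, cycles, discount_rate=0.03):
    """Discounted QALYs per cohort member from a simple Markov cohort trace.

    transition[i][j]: probability of moving from state i to state j in one cycle.
    utilities[i]: utility weight of one cycle spent in state i.
    start: initial distribution of the cohort over states (sums to 1).
    No half-cycle correction is applied.
    """
    state = list(start)
    n = len(state)
    total = 0.0
    for t in range(cycles):
        # QALYs accrued in cycle t, discounted to present value
        total += sum(p * u for p, u in zip(state, utilities)) / (1 + discount_rate) ** t
        # Advance the cohort: new_state[j] = sum_i state[i] * transition[i][j]
        state = [sum(state[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return total


# Hypothetical two-state model: state 0 has utility 1.0; state 1 is absorbing with utility 0.0
qalys = markov_qalys([[0.5, 0.5], [0.0, 1.0]], [1.0, 0.0], [1.0, 0.0], cycles=3, discount_rate=0.0)
print(qalys)  # 1.0 + 0.5 + 0.25 = 1.75
print(icer(1200.0, 10.5, 1000.0, 10.0))  # 200 / 0.5 = 400.0
```

In a full analysis, each strategy's decision-tree branches would set the starting state distribution, and the per-strategy lifetime costs and QALYs would then feed `icer`.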
Per-inference electricity consumption of on-device versus cloud-based deployment
Electricity consumption in joules per inference cycle is measured over 10,000 inference cycles on both the on-device smartphone and a standardized cloud-server configuration (Intel Xeon Gold 6248 CPU, NVIDIA Tesla V100 GPU) using the Experiment Impact Tracker toolkit. Cloud-server measurements are averaged across low-traffic and high-traffic server conditions. Results are reported separately for the on-device and cloud-based deployment modalities and compared as a ratio.
Time frame: Day 1
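The arithmetic behind this outcome is sketched below. Total measured energy divided by the number of inference cycles gives joules per inference. The cloud figure is the mean of the low-traffic and high-traffic conditions, and the ratio compares the two modalities. The last helper shows how a per-inference figure could be scaled to projected emissions. The grid carbon-intensity parameter, the function names, and all example values are hypothetical, and the code does not call the Experiment Impact Tracker API; it only processes energy totals that such a tool would report.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3,600,000 J


def joules_per_inference(total_joules, n_cycles=10_000):
    """Mean energy per inference over a measured run of n_cycles inferences."""
    return total_joules / n_cycles


def cloud_joules_per_inference(low_traffic_joules, high_traffic_joules, n_cycles=10_000):
    """Cloud figure averaged across the low- and high-traffic server conditions."""
    low = joules_per_inference(low_traffic_joules, n_cycles)
    high = joules_per_inference(high_traffic_joules, n_cycles)
    return (low + high) / 2


def projected_kg_co2(j_per_inference, n_inferences, grid_kg_co2_per_kwh):
    """Scale a per-inference energy figure to projected emissions in kg CO2."""
    kwh = j_per_inference * n_inferences / JOULES_PER_KWH
    return kwh * grid_kg_co2_per_kwh


# Hypothetical totals measured over 10,000 cycles
device_j = joules_per_inference(5_000.0)                  # 0.5 J per inference
cloud_j = cloud_joules_per_inference(20_000.0, 40_000.0)  # (2.0 + 4.0) / 2 = 3.0 J
print(cloud_j / device_j)                                 # cloud-to-device ratio: 6.0
print(projected_kg_co2(cloud_j, 1_200_000, 0.5))          # 3.6e6 J = 1 kWh -> 0.5 kg CO2
```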
Total image evaluation time of the AI system compared with board-certified ophthalmologists
Total time in seconds required to complete independent evaluation of all images in the standardized benchmarking set, recorded separately for the AI system and for the board-certified ophthalmologists. Reported as mean with standard deviation for both the AI system and the clinician group.
Time frame: Day 1
Per-screening time learning curve among non-medical operators
The duration of each screening attempt is recorded in seconds. After each attempt, participants self-rate their operating proficiency and then repeat the screening. The procedure ends once a participant rates themselves as proficient on three consecutive attempts. The learning curve is characterized by the change in mean per-screening duration across consecutive attempts.
Time frame: Day 1
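The stopping rule and the learning-curve summary can be sketched as follows. The sketch assumes each participant's attempts are stored in order, with the duration in seconds and a boolean proficiency self-rating for each attempt. The curve is taken as the mean duration at each attempt index over participants who reached that attempt. The function names and example values are hypothetical.

```python
from statistics import mean


def attempts_until_stop(self_ratings, run_length=3):
    """Attempt number at which the stopping rule is met, or None if it never is.

    self_ratings: booleans in attempt order (True = participant rates self as proficient).
    The procedure stops after `run_length` consecutive proficient ratings.
    """
    streak = 0
    for attempt, proficient in enumerate(self_ratings, start=1):
        streak = streak + 1 if proficient else 0
        if streak == run_length:
            return attempt
    return None


def mean_duration_by_attempt(durations_per_participant):
    """Mean per-screening duration at attempt 1, 2, ... across participants.

    durations_per_participant: list of per-participant lists of seconds, in attempt order.
    Each attempt index is averaged over the participants who reached it.
    """
    max_attempts = max(len(d) for d in durations_per_participant)
    return [
        mean(d[k] for d in durations_per_participant if len(d) > k)
        for k in range(max_attempts)
    ]


print(attempts_until_stop([False, True, False, True, True, True]))  # stops at attempt 6
print(mean_duration_by_attempt([[120, 90, 60], [100, 80]]))        # means: 110, 85, 60 s
```

Because participants stop at different points, later attempt indices average over fewer people; reporting the number of contributors per index alongside the mean would make this visible.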
Screening duration comparing on-device and cloud-based deployment in resource-limited field settings
Mean screening duration in seconds per participant, recorded for each of the two deployment modalities: on-device inference and cloud-based inference using a functionally equivalent deployment of the identical model architecture with inference performed on a remote server. Local residents are divided into two equal groups and screened by village staff without medical training using either the on-device or the cloud-based system.
Time frame: Day 1
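A minimal sketch of the per-modality summary follows: the mean and sample standard deviation of per-participant screening duration for each group, plus the difference in means. The protocol does not name a statistical test, so none is applied here. The function names, dictionary keys, and example values are hypothetical.

```python
from statistics import mean, stdev


def summarize_duration(seconds):
    """Mean and sample standard deviation of per-participant durations (s); needs >= 2 values."""
    return mean(seconds), stdev(seconds)


def compare_modalities(on_device_seconds, cloud_seconds):
    """Summaries for each deployment group and the cloud-minus-on-device mean difference."""
    device_mean, device_sd = summarize_duration(on_device_seconds)
    cloud_mean, cloud_sd = summarize_duration(cloud_seconds)
    return {
        "on_device": {"mean_s": device_mean, "sd_s": device_sd},
        "cloud": {"mean_s": cloud_mean, "sd_s": cloud_sd},
        "mean_difference_s": cloud_mean - device_mean,
    }


summary = compare_modalities([60, 80, 100], [90, 110, 130])
print(summary["mean_difference_s"])  # 110 - 80 = 30 seconds
```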
User acceptability comparing on-device and cloud-based deployment in resource-limited field settings
User acceptability is assessed using a questionnaire covering patient satisfaction, privacy concerns, and willingness to recommend the AI system, administered to participants in both the on-device and cloud-based deployment groups. Local residents are divided into two equal groups and screened by village staff without medical training using either the on-device or the cloud-based system. Questionnaire responses are compared between the two deployment groups.
Time frame: Day 1
Contact
Longhui Li
Diagnostic accuracy of the on-device AI system in population-level community screening
Diagnostic accuracy is defined as the proportion of participants for whom the AI system assigns a correct diagnostic label concordant with the reference standard. Village staff without medical training use the on-device AI system to screen consecutively enrolled local residents. The reference standard is established through slit-lamp examinations performed by ophthalmologists.
Time frame: Day 1
Specificity of the on-device AI system in population-level community screening
Specificity is defined as the proportion of participants without a given anterior segment disease who are correctly identified as negative by the on-device AI system (true negatives divided by the sum of true negatives and false positives), calculated separately for each target disease category. Village staff without medical training use the on-device AI system to screen consecutively enrolled local residents. The reference standard is established through slit-lamp examinations performed by ophthalmologists.
Time frame: Day 1
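Per-disease specificity mirrors the sensitivity calculation but counts only reference-negative participants. The sketch below assumes each participant's AI output and reference diagnosis are sets of disease labels, with an empty reference set meaning the participant has none of the target diseases. The function name and the `disease_*` labels are hypothetical.

```python
def per_disease_specificity(predictions, reference, diseases):
    """Specificity = TN / (TN + FP) for each disease.

    predictions, reference: dict mapping participant id -> set of disease labels.
    Only participants without the disease according to the reference contribute.
    Returns None for a disease with no reference-negative participants.
    """
    result = {}
    for disease in diseases:
        negatives = [pid for pid, labels in reference.items() if disease not in labels]
        # True negatives: reference-negative participants the AI also calls negative
        tn = sum(1 for pid in negatives if disease not in predictions.get(pid, set()))
        # Every remaining reference-negative participant is a false positive
        result[disease] = tn / len(negatives) if negatives else None
    return result


# Hypothetical four-participant example
reference = {"p1": {"disease_a"}, "p2": {"disease_a", "disease_b"}, "p3": set(), "p4": {"disease_b"}}
predictions = {"p1": {"disease_a"}, "p2": {"disease_b"}, "p3": {"disease_a"}, "p4": {"disease_b"}}
specificity = per_disease_specificity(predictions, reference, ["disease_a", "disease_b"])
print(specificity)  # disease_a: p3 is a false positive, p4 a true negative; disease_b: 2 TN
```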