There is a global shortage of radiologists. Automatic report generation by radiology AI is key to improving efficiency and meeting patient needs, especially in areas with scarce medical resources. Multimodal large models make automated reporting systems for medical images feasible. ChatGPT 4o can diagnose medical images to some extent, but it is closed-source and prone to hallucinations. The newly released open-source Janus-Pro 1B, with strong performance, "any-to-any" capability, low cost, and open access, shows potential for medical imaging tasks after further training. However, little research has explored its use in this setting, and most existing models are general-purpose, lacking field-specific optimization and systematic evaluation. This study will develop Janus-Pro-CXR, a medical-image-specific model, from public data, test its value in image diagnosis and report generation, and build an efficient automated diagnostic assistance system.
There is a global shortage of radiologists, and the automatic report generation function of radiology AI systems is crucial for improving medical efficiency and meeting patient needs, especially in areas with scarce medical resources. Multimodal large models have made it possible to develop automatic report generation systems for medical images. Although ChatGPT 4o has some capability in medical image diagnosis, it has drawbacks such as being closed-source and prone to hallucinations. The recently released open-source multimodal large model Janus-Pro offers high performance, "any-to-any" capability, low cost, and open availability; after training and fine-tuning, it has the potential to support medical image diagnosis and report generation. However, research on applying Janus-Pro 1B to image diagnosis is still lacking, and existing models are mostly general-purpose, without in-depth optimization for specific fields or systematic, multi-dimensional evaluation. This study aims to develop Janus-Pro-CXR, a large model specialized for medical images fine-tuned from Janus-Pro 1B on public databases, verify its application value in image diagnosis and radiology report generation, and construct an efficient and accurate automated medical image analysis and diagnostic assistance system.
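The registry entry does not describe the fine-tuning pipeline itself. As a purely illustrative sketch, and assuming a Hugging Face-style processor and model interface, the development step could resemble the supervised fine-tuning loop below; the dataset class, processor behavior, and hyperparameters are placeholders rather than the study's actual implementation.

```python
# Minimal sketch of supervised fine-tuning on image-report pairs.
# The processor interface and dataset below are hypothetical placeholders;
# the study's actual pipeline is not described in this record.
import torch
from torch.utils.data import Dataset, DataLoader
from PIL import Image


class CxrReportDataset(Dataset):
    """Chest radiograph paths paired with free-text reports (placeholder)."""

    def __init__(self, samples, processor):
        self.samples = samples        # list of (image_path, report_text) tuples
        self.processor = processor    # assumed HF-style multimodal processor

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image_path, report = self.samples[idx]
        image = Image.open(image_path).convert("RGB")
        # Assumption: the processor returns a dict of tensors with a leading
        # batch dimension of 1 (input_ids, attention_mask, pixel_values, ...).
        return self.processor(images=image, text=report, return_tensors="pt")


def fine_tune(model, dataset, epochs=3, lr=2e-5, device="cuda"):
    """One plausible training loop; in practice image/prompt tokens would be
    masked out of the loss, and mixed precision or LoRA would likely be used."""
    model.train().to(device)
    loader = DataLoader(dataset, batch_size=1, shuffle=True,
                        collate_fn=lambda batch: batch[0])
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for inputs in loader:
            inputs = {k: v.to(device) for k, v in inputs.items()}
            # Assumption: the model follows the Hugging Face convention of
            # returning a language-modeling loss when `labels` is supplied.
            loss = model(**inputs, labels=inputs["input_ids"]).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model
```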
Study Type
INTERVENTIONAL
Allocation
RANDOMIZED
Purpose
DIAGNOSTIC
Masking
DOUBLE
Enrollment
296
Radiologists generate reports with reference to AI-generated reports
The First Affiliated Hospital of Henan University of Science and Technology
Luoyang, China
Union Hospital Affiliated to Tongji Medical College, Huazhong University of Science and Technology
Wuhan, China
The First Affiliated Hospital of Zhengzhou University
Zhengzhou, China
Report quality scores in the prospective study
In this prospective study, the quality of reports generated by junior radiologists was assessed using a 5-point Likert scale, the "Radiology Report Quality Assessment Scale" (minimum 1, maximum 5), with higher scores indicating better report quality. Scores were compared between the AI-assisted group (junior radiologists generating reports with AI assistance) and the standard-care group (junior radiologists generating reports without AI assistance); an illustrative comparison is sketched below this outcome.
Time frame: 1 week
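The record specifies the rating scale but not the statistical analysis. As one illustration, and assuming the scores are treated as ordinal data, the two groups could be compared with a Mann-Whitney U test; all scores below are made-up placeholders, not study data.

```python
# Illustrative comparison of 5-point Likert report-quality scores between the
# AI-assisted and standard-care groups using a Mann-Whitney U test (suitable
# for ordinal data). All scores here are made-up placeholders.
from scipy.stats import mannwhitneyu

ai_assisted   = [4, 5, 4, 3, 5, 4, 4, 5, 3, 4]
standard_care = [3, 4, 3, 3, 4, 2, 3, 4, 3, 3]

stat, p_value = mannwhitneyu(ai_assisted, standard_care, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.3f}")
```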
Agreement evaluation in the prospective study
In this prospective study, the agreement between reports generated by junior radiologists and standard reports was assessed using the RADPEER scoring system, a peer review program established by the American College of Radiology (ACR) to evaluate radiologists' interpretation accuracy; discrepancies and agreements are graded according to specific criteria that also account for the clinical significance of differences. The RADPEER system uses a 5-category scale with a minimum value of 1 and a maximum value of 5, where higher scores indicate greater agreement between the generated reports and the standard reports.
Time frame: 1 week
Pairwise preference tests in the prospective study
In this prospective study, the preference between reports generated by junior radiologists in the AI-assisted group and those in the standard-care group was evaluated using the "Expert Pairwise Preference Assessment Tool", a structured instrument designed to quantify expert consensus on report superiority. A panel of 5 independent radiology experts reviewed paired reports (one from the AI-assisted group and one from the standard-care group for the same clinical case) and individually indicated which report they considered more clinically valuable, accurate, or comprehensive. The unit of measure for this outcome is the "percentage of paired cases with majority expert preference", defined as the proportion of cases in which at least 3 of the 5 experts expressed a clear preference for either the AI-assisted or the standard-care report (an illustrative calculation is sketched below).
Time frame: 1 week
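The arithmetic behind this outcome can be shown directly. The sketch below tallies hypothetical expert votes per paired case and reports the percentage of cases in which at least 3 of the 5 experts prefer the same report; the vote data and the "no clear preference" option are assumptions for illustration only.

```python
# Illustrative calculation of "percentage of paired cases with majority expert
# preference": a case counts when >= 3 of the 5 experts prefer the same report.
# All votes are hypothetical placeholders.
from collections import Counter

# Each inner list holds the 5 experts' votes for one paired case:
# "AI" = AI-assisted report preferred, "STD" = standard-care report preferred,
# "NONE" = no clear preference expressed.
votes_per_case = [
    ["AI", "AI", "STD", "AI", "AI"],
    ["STD", "STD", "STD", "AI", "STD"],
    ["AI", "AI", "STD", "STD", "NONE"],   # no 3-vote majority for either report
]

def majority_preference(votes, threshold=3):
    """Return the preferred report label if >= threshold experts agree, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= threshold and label != "NONE" else None

majorities = [majority_preference(v) for v in votes_per_case]
pct = 100 * sum(m is not None for m in majorities) / len(votes_per_case)
print(f"Paired cases with a majority expert preference: {pct:.0f}%")
print("Per-case majority preference:", majorities)
```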
Reading time in the prospective study
The time from when radiologists began examining chest radiographs to completion of the final report, compared between the AI-assisted and standard-care groups to assess efficiency.
Time frame: 1 week
Report quality score in the retrospective study
In this retrospective study, the quality of reports generated by Janus-Pro-CXR, Janus-Pro, and ChatGPT 4o (compared to standard reports) was assessed using the 5-point Likert scale titled "Radiology Report Quality Assessment Scale". The scale has a minimum value of 1 and a maximum value of 5, with higher scores indicating better report quality.
Time frame: 1 week
Agreement evaluation in the retrospective study
In this retrospective study, the agreement between reports generated by Janus-Pro-CXR, Janus-Pro, and ChatGPT 4o and standard reports was assessed using the RADPEER Scoring System, a structured peer review system established by the American College of Radiology (ACR) for evaluating radiological interpretation accuracy. The RADPEER system uses a 5-category scale with a minimum value of 1 and a maximum value of 5, where higher scores indicate greater agreement between the generated reports and the standard reports.
Time frame: 1 week
Pairwise preference tests in the retrospective study
In this retrospective study, the preference between reports generated by Janus-Pro-CXR, Janus-Pro, and ChatGPT 4o and the corresponding standard reports was evaluated using the "Expert Pairwise Preference Assessment Tool", a structured instrument designed to quantify expert consensus on report superiority. A panel of 5 independent radiology experts reviewed paired reports (a model-generated report and the standard report for the same clinical case) and individually indicated their preference based on predefined criteria including clinical accuracy, completeness, clarity, and diagnostic utility. The unit of measure for this outcome is the "percentage of paired cases with majority expert preference", defined as the proportion of cases in which at least 3 of the 5 experts expressed a clear preference for one report over the other.
Time frame: 1 week