The global shortage of radiologists is a pressing issue, particularly in regions with limited medical resources. Against this backdrop, making report generation a core objective of radiology AI systems not only aligns with the practical needs of radiologists but also better serves patients. Advances in multimodal large models have made automatic report generation for medical images feasible. Although GPT-4o has demonstrated capability across multiple medical subfields, it is a closed-source system with limitations: its generation mechanism is opaque, and it is prone to hallucination.

Recently, DeepSeek released Janus-Pro, an open-source "any-to-any" multimodal large model that combines high performance with low cost; Nature published three consecutive articles introducing its capabilities. After training and fine-tuning, Janus-Pro shows great potential for medical image diagnosis and report generation. However, its application to image diagnosis has not yet been evaluated. Moreover, most existing models are highly versatile but are not optimized for specific domains, and systematic, multi-dimensional evaluation methods for judging the strengths and weaknesses of multimodal large models in medical radiology are lacking.

Given this situation, the purpose of our research is to develop and validate a large model dedicated to medical imaging, assessing its value in image diagnosis and radiology report generation.
Study Type
OBSERVATIONAL
Enrollment
300
In the Janus-Pro-CXR group, radiologists receive the AI-generated report for reference and may revise it; in the other group, radiologists complete the report independently.
The First Affiliated Hospital of Henan University of Science and Technology
Luoyang, Henan, China
The First Affiliated Hospital of Zhengzhou University
Zhengzhou, Henan, China
Union Hospital Affiliated to Tongji Medical College, Huazhong University of Science and Technology
Wuhan, Hubei, China
Report Quality Score and Agreement Score
Reports generated by the SCP group and the AI-assisted group were evaluated on a five-point Likert scale (5 = best, 1 = worst) to subjectively assess the large model's capability in generating imaging reports. The RADPEER scoring system was used to assess agreement between the original reports and the generated reports, including the clinical significance of any discrepancies.
Time frame: No more than 1 week
Pairwise preference
Each evaluator selects the better report between the one generated by the large model and the published report. The proportion of cases in which three or more evaluators judged the AI-assisted group's report superior to the SCP group's report was then calculated.
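The majority-preference proportion described above is a simple count. As an illustrative sketch only (not the study's actual analysis code), assuming each case's votes are recorded as a list of booleans where True means the evaluator preferred the AI-assisted report:

```python
def majority_preference_rate(votes_per_case, threshold=3):
    """Fraction of cases in which at least `threshold` evaluators
    preferred the AI-assisted report.

    votes_per_case: list of per-case vote lists; True = AI-assisted
    report judged better by that evaluator.
    """
    preferred = sum(1 for votes in votes_per_case if sum(votes) >= threshold)
    return preferred / len(votes_per_case)

# Illustrative data only (five hypothetical evaluators per case):
cases = [
    [True, True, True, False, False],   # 3 of 5 prefer AI -> counts
    [True, False, False, False, True],  # 2 of 5 -> does not count
    [True, True, True, True, True],     # 5 of 5 -> counts
]
print(majority_preference_rate(cases))  # 2/3
```

The threshold of three corresponds to a majority when an odd panel of five evaluators is assumed; the actual evaluator count is not specified here.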
Time frame: No more than 1 week
Reading Time
To evaluate the impact of the large model on workflow efficiency, reading time, defined as the duration from when a radiologist begins reviewing a chest X-ray to completion of the final radiology report, was measured and automatically recorded by the system.
Time frame: No more than 1 week