The goal of this observational study is to establish and validate a comprehensive AI-driven clinical decision support system (AI-CDSS) for whole-chain management of patients with pulmonary tuberculosis (TB). The main questions it aims to answer are: How well does the system predict outcomes at the key stages of TB diagnosis and treatment? Can the system deliver real-world benefits? The AI framework is intended to support clinicians' decision-making, with the aim of improving cure rates and giving each patient more effective, personalized care.
This study establishes TB-ATLAS (Artificial Intelligence-driven Tuberculosis Landscape Analysis & Stratification Research), a modular framework for whole-chain TB management. The objective is to develop and validate an umbrella suite of AI-driven models to optimize clinical decision-making from initial diagnosis to post-treatment follow-up. The core hypothesis is that multimodal patient data can stratify TB phenotypes and predict critical clinical events, enabling precision medicine. Beyond the primary focus on distinguishing Easy-to-Treat (ETT) from Hard-to-Treat (HTT) categories, the system incorporates satellite modules for pre-DST drug resistance risk, treatment adherence monitoring, adverse event (AE) early warning, and risk of post-TB lung disease (PTLD).
This study employs a retrospective-prospective cohort design. Using retrospective individual patient data (IPD) from clinical trials and real-world electronic health records (EHRs) (>30,000 patients), the investigators apply advanced AI methods, including foundation models for feature representation and multi-task learning for modular development. The models take high-dimensional input that integrates structured clinical variables, microbiological profiles, radiomics, and host signatures. Model interpretability is prioritized via SHAP/LIME to support clinical trust. Performance will be evaluated using AUROC and calibration metrics. External validation will occur in a prospective cohort (n≥1,600) to assess how well the system predicts real-world outcomes relative to standardized care.
The expected output is the TB-ATLAS Clinical Decision Support System (AI-CDSS). By providing evidence-based guidance on regimen intensity, resistance risk, and relapse monitoring, the platform is intended to facilitate the transition from "one-size-fits-all" standardized care towards individualized precision management and to improve clinical decision-making across diverse healthcare settings.
Study Type
OBSERVATIONAL
Enrollment
31,600
Hunan Chest Hospital
Changsha, Hunan, China
Huashan Hospital Affiliated to Fudan University
Shanghai, China
Predictive Performance of the "Easy-to-Treat" versus "Hard-to-Treat" stratification model for pulmonary tuberculosis (PTB)
The Area Under the Receiver Operating Characteristic curve (AUROC) of the model for discriminating between PTB patients classified as "Easy-to-Treat" versus "Hard-to-Treat". "Easy-to-Treat" patients are defined as patients with PTB who can achieve a favorable outcome when treated with a short-course regimen (≤4 months for drug-sensitive TB, ≤6 months for rifampin-resistant TB). "Hard-to-Treat" patients are defined as patients with PTB who will experience an unfavorable outcome on the same short-course treatment (≤4 months for drug-sensitive TB, ≤6 months for rifampin-resistant TB).
Time frame: From treatment initiation to 6 months post-treatment
Brier Score of the "Easy-to-Treat" versus "Hard-to-Treat" Model
The Brier score will be used to assess the overall prediction accuracy and reliability of the model. It measures the mean squared difference between the predicted probabilities and the actual observed outcomes. The score ranges from 0 to 1, where 0 represents perfect accuracy and 1 represents total inaccuracy. Lower scores mean better model performance, indicating a better predictive outcome.
Time frame: 6 months post-treatment
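As an illustration of the Brier score described above, the following is a minimal sketch, not the investigators' implementation. The function name `brier_score` and the example values are hypothetical; outcomes are assumed to be coded 1 (event, e.g. "Hard-to-Treat") or 0 (no event).

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical example: four patients, observed outcomes and predicted risks.
outcomes = [1, 0, 1, 0]
predicted = [0.9, 0.1, 0.8, 0.3]
score = brier_score(outcomes, predicted)
print(round(score, 4))
```

A score near 0 means the predicted risks sit close to the observed outcomes; predicting 0.5 for every patient yields 0.25 regardless of the outcomes.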
Calibration Slope of the "Easy-to-Treat" versus "Hard-to-Treat" Model
The calibration slope will be calculated to evaluate the agreement between the model's predicted probabilities and the actual observed outcomes. The ideal calibration slope is 1. A slope below 1 suggests that predictions are too extreme (a typical sign of overfitting), and a slope above 1 suggests that predictions are too moderate. Values closer to 1 indicate better calibration, meaning the predicted probabilities more closely reflect the true risk, which indicates a better predictive outcome.
Time frame: 6 months post-treatment
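One common way to obtain a calibration slope is to fit a logistic regression of the observed outcome on the logit of the predicted probability and read off the coefficient. The sketch below assumes that definition; the registry entry does not state which estimation method will be used. The function names and the toy data are hypothetical.

```python
import math


def logit(p, eps=1e-12):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def calibration_intercept_slope(y_true, y_prob, n_iter=100, tol=1e-10):
    """Fit outcome ~ a + b * logit(predicted) by Newton-Raphson; return (a, b)."""
    x = [logit(p) for p in y_prob]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # gradient w.r.t. intercept
            g1 += (yi - mu) * xi     # gradient w.r.t. slope
            h00 += w                 # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            raise ValueError("singular information matrix; slope not identifiable")
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Hypothetical toy data: observed event rates are 20%, 50%, and 80%, but the
# model predicts more extreme risks (1/17, 0.5, 16/17), i.e. it is overconfident.
y = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
p = [1 / 17] * 10 + [0.5] * 10 + [16 / 17] * 10
intercept, slope = calibration_intercept_slope(y, p)
print(round(intercept, 3), round(slope, 3))
```

In this toy case the predicted log-odds are exactly twice the observed log-odds, so the fitted slope comes out well below 1, the pattern expected from overconfident predictions.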
Area Under the Receiver Operating Characteristic (AUROC) Curve of the Pre-Drug Susceptibility Testing (Pre-DST) Drug Resistance Predictive Model
The AUROC curve will be used to evaluate the discrimination performance of the pre-DST (Drug Susceptibility Testing) drug resistance predictive model. The score ranges from 0 to 1, where 0.5 indicates random guessing and 1 represents perfect discrimination. Higher scores mean better discrimination performance, indicating a better predictive outcome.
Time frame: 6 months post-treatment
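The AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that pairwise definition; it is an illustration only, and the function name `auroc` and the example scores are hypothetical.

```python
def auroc(y_true, y_score):
    """AUROC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = drug-resistant, 0 = drug-susceptible.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))
```

The pairwise loop is quadratic in cohort size, so for cohorts of the scale described here a rank-based or library implementation would be the practical choice; the result is the same.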
Area Under the Precision-Recall Curve (AUPRC) of the Pre-Drug Susceptibility Testing (Pre-DST) Drug Resistance Predictive Model
The AUPRC will be used to evaluate the prediction performance of the pre-drug susceptibility testing (Pre-DST) drug resistance predictive model, particularly under conditions of data imbalance. The score ranges from 0 to 1. Higher scores mean better precision and recall performance, indicating a better predictive outcome.
Time frame: 6 months post-treatment
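The AUPRC is often summarized as average precision: the precision at each distinct score threshold, weighted by the gain in recall at that threshold. The sketch below assumes that summary (the registry entry does not specify how the area will be computed); the function name and example values are hypothetical. Unlike AUROC, the value expected from a random classifier is the prevalence of the positive class rather than 0.5, which is why the metric is informative under class imbalance.

```python
def average_precision(y_true, y_score):
    """Average precision: sum over thresholds of (recall gain) * precision."""
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive case")
    ranked = sorted(zip(y_score, y_true), key=lambda t: -t[0])
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every case that shares this score so ties form one threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        recall = tp / n_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical example: 1 = drug-resistant, 0 = drug-susceptible.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(round(average_precision(labels, scores), 4))
```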
F1-score of the Secondary Decision Models for Pre-Drug Susceptibility Testing (Pre-DST) Drug Resistance Prediction
The F1-score, calculated as the harmonic mean of precision and recall, will be used to evaluate the classification performance of the secondary decision models for pre-drug susceptibility testing (Pre-DST) drug resistance prediction. The score ranges from 0 to 1. Higher scores mean better classification performance, indicating a better predictive outcome.
Time frame: 6 months post-treatment
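The F1-score applies to binary decisions, so predicted probabilities must first be converted to classes. The sketch below assumes a fixed decision threshold of 0.5, which is an illustrative choice and not stated in the registry entry; the function name and example values are likewise hypothetical.

```python
def f1_at_threshold(y_true, y_prob, threshold=0.5):
    """F1 = harmonic mean of precision and recall after thresholding."""
    decisions = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, d in zip(y_true, decisions) if y == 1 and d == 1)
    fp = sum(1 for y, d in zip(y_true, decisions) if y == 0 and d == 1)
    fn = sum(1 for y, d in zip(y_true, decisions) if y == 1 and d == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 1 = drug-resistant, 0 = drug-susceptible.
labels = [1, 1, 1, 0, 0]
probs = [0.9, 0.6, 0.4, 0.7, 0.2]
print(round(f1_at_threshold(labels, probs), 4))
```

Because F1 depends on the chosen threshold, it complements rather than replaces the threshold-free AUROC and AUPRC measures listed for the same model.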
Area Under the Receiver Operating Characteristic (AUROC) Curve of the Adherence Forecasting Model
The AUROC curve will be used to evaluate the discrimination performance of the adherence forecasting model. The score ranges from 0 to 1, where 0.5 indicates random guessing and 1 represents perfect discrimination. Higher scores mean better discrimination performance, indicating a better predictive outcome.
Time frame: From treatment initiation until treatment completion, assessed up to 6 months
Area Under the Precision-Recall Curve (AUPRC) of the Adherence Forecasting Model
The AUPRC will be used to evaluate the prediction performance of the adherence forecasting model under data imbalance. The score ranges from 0 to 1. Higher scores mean better precision and recall performance, indicating a better predictive outcome.
Time frame: From treatment initiation until treatment completion, assessed up to 6 months
F1-score of the Secondary Decision Models for Adherence Forecasting
The F1-score, calculated as the harmonic mean of precision and recall, will be used to evaluate the classification performance of the secondary decision models for adherence forecasting. The score ranges from 0 to 1. Higher scores mean better classification performance, indicating a better predictive outcome.
Time frame: From treatment initiation until treatment completion, assessed up to 6 months
Area Under the Receiver Operating Characteristic (AUROC) Curve of the Treatment Response Predictive Model
The AUROC curve will be used to evaluate the overall discrimination performance of the treatment response predictive model. The score ranges from 0 to 1, where 0.5 indicates random guessing and 1 represents perfect discrimination. Higher scores mean better discrimination performance, indicating a better predictive outcome.
Time frame: 6 months post-treatment
Area Under the Precision-Recall Curve (AUPRC) of the Treatment Response Predictive Model
The AUPRC will be used to evaluate the prediction performance of the treatment response predictive model, particularly under conditions of data imbalance. The score ranges from 0 to 1. Higher scores mean better precision and recall performance, indicating a better predictive outcome.
Time frame: 6 months post-treatment
F1-score of the Secondary Decision Models for Treatment Response Prediction
The F1-score, calculated as the harmonic mean of precision and recall, will be used to evaluate the classification performance of the secondary decision models for treatment response prediction. The score ranges from 0 to 1. Higher scores mean better classification performance, indicating a better predictive outcome.
Time frame: 6 months post-treatment
Area Under the Receiver Operating Characteristic (AUROC) Curve of the Adverse Event (AE) Predictive Model
The AUROC curve will be used to evaluate the overall discrimination performance of the adverse event predictive model. The score ranges from 0 to 1, where 0.5 indicates random guessing and 1 represents perfect discrimination. Higher scores mean better discrimination performance, indicating a better predictive outcome.
Time frame: 6 months post-treatment
Area Under the Precision-Recall Curve (AUPRC) of the Adverse Event (AE) Predictive Model
The AUPRC will be used to evaluate the prediction performance of the adverse event predictive model, particularly under conditions of data imbalance. The score ranges from 0 to 1. Higher scores mean better precision and recall performance, indicating a better predictive outcome.
Time frame: 6 months post-treatment
F1-score of the Secondary Decision Models for Adverse Event (AE) Prediction
The F1-score, calculated as the harmonic mean of precision and recall, will be used to evaluate the classification performance of the secondary decision models for adverse event prediction. The score ranges from 0 to 1. Higher scores mean better classification performance, indicating a better predictive outcome.
Time frame: 6 months post-treatment
Area Under the Receiver Operating Characteristic (AUROC) Curve of the Relapse Predictive Model
The AUROC curve will be used to evaluate the overall discrimination performance of the relapse predictive model. Relapse is defined per the World Health Organization (WHO) standard. The score ranges from 0 to 1, where 0.5 indicates random guessing and 1 represents perfect discrimination. Higher scores mean better discrimination performance, indicating a better predictive outcome.
Time frame: 6 months post-treatment
Area Under the Precision-Recall Curve (AUPRC) of the Relapse Predictive Model
The AUPRC will be used to evaluate the prediction performance of the relapse predictive model, particularly under conditions of data imbalance. Relapse is defined per the World Health Organization (WHO) standard. The score ranges from 0 to 1. Higher scores mean better precision and recall performance, indicating a better predictive outcome.
Time frame: 6 months post-treatment
F1-score of the Secondary Decision Models for Relapse Prediction
The F1-score, calculated as the harmonic mean of precision and recall, will be used to evaluate the classification performance of the secondary decision models for relapse prediction. Relapse is defined per the World Health Organization (WHO) standard. The score ranges from 0 to 1. Higher scores mean better classification performance, indicating a better predictive outcome.
Time frame: 6 months post-treatment
Area Under the Receiver Operating Characteristic (AUROC) Curve of the Post-Tuberculosis (TB) Lung Disease Predictive Model
The AUROC curve will be used to evaluate the overall discrimination performance of the post-tuberculosis (TB) lung disease predictive model. The score ranges from 0 to 1, where 0.5 indicates random guessing and 1 represents perfect discrimination. Higher scores mean better discrimination performance, indicating a better predictive outcome.
Time frame: 6 months post-treatment
Area Under the Precision-Recall Curve (AUPRC) of the Post-Tuberculosis (TB) Lung Disease Predictive Model
The AUPRC will be used to evaluate the prediction performance of the post-tuberculosis (TB) lung disease predictive model, particularly under conditions of data imbalance. The score ranges from 0 to 1. Higher scores mean better precision and recall performance, indicating a better predictive outcome.
Time frame: 6 months post-treatment
F1-score of the Secondary Decision Models for Post-Tuberculosis (TB) Lung Disease Prediction
The F1-score, calculated as the harmonic mean of precision and recall, will be used to evaluate the classification performance of the secondary decision models for post-tuberculosis (TB) lung disease prediction. The score ranges from 0 to 1. Higher scores mean better classification performance, indicating a better predictive outcome.
Time frame: 6 months post-treatment