The goal of this clinical trial is to learn whether an artificial intelligence (AI) tool called FibroX can help primary care providers better diagnose significant liver fibrosis (≥F2) and clinically significant portal hypertension in adults with metabolic dysfunction-associated steatotic liver disease (MASLD). The main questions it aims to answer are:
* Can FibroX improve the accuracy of diagnosing significant liver fibrosis (≥F2) and clinically significant portal hypertension compared to usual care?
* Is FibroX easy to use and acceptable to primary care providers in simulated clinical settings?
* Do providers trust FibroX as a decision-support tool?

Researchers will compare FibroX-assisted care to usual care to see whether FibroX improves diagnostic accuracy and provider trust and supports better decision-making.

Participants will be primary care providers (MDs, DOs, NPs, PAs) from diverse clinics. They will:
* Review simulated patient cases with MASLD risk factors
* Use either usual care tools (standard labs and optional FIB-4 calculator) or FibroX (AI-generated risk score, triage band, and explainability panel)
* Make diagnostic and referral decisions for each case
* Complete surveys on usability, trust in AI, confidence, and cognitive workload

This study will help determine whether FibroX can be integrated into real-world primary care workflows to support earlier and more accurate detection of liver fibrosis and portal hypertension, potentially reducing missed diagnoses and unnecessary referrals and improving patient outcomes.
This study is a 12-month pilot clinical trial designed to evaluate the feasibility, usability, provider trust, and preliminary effectiveness of FibroX, an explainable artificial intelligence (AI) tool developed to improve the diagnosis of significant liver fibrosis (≥F2) and clinically significant portal hypertension in adults with metabolic dysfunction-associated steatotic liver disease (MASLD). MASLD is a common and progressive liver condition that can lead to cirrhosis, liver failure, and increased cardiovascular risk. Early detection of significant fibrosis and portal hypertension is critical because current guidelines recommend initiating therapy once they are identified (e.g., resmetirom or semaglutide for ≥F2 fibrosis and beta-blockers for portal hypertension). However, existing tools like FIB-4 often lack accuracy and usability in routine primary care.

FibroX addresses these limitations by using routinely available clinical data (such as age, liver enzymes, platelet count, BMI, and kidney function) to estimate the probability of significant fibrosis and portal hypertension. It provides a triage band (rule-out, indeterminate, rule-in) and a one-line explanation of which clinical factors most influenced the prediction. This transparency is achieved using Shapley Additive Explanations (SHAP), which help clinicians understand how the AI reached its conclusion. In retrospective studies, FibroX demonstrated superior diagnostic performance compared to FIB-4 (AUROC 0.97 vs. 0.62) and was associated with long-term mortality risk, suggesting prognostic value beyond diagnostic utility.

This pilot trial will simulate real-world primary care workflows to test whether FibroX can be effectively used by clinicians. The study will recruit 30-40 primary care providers (MDs, DOs, NPs, PAs) from 4-6 diverse clinics. Each provider will participate in two simulation periods, each involving 16 synthetic or de-identified patient cases reflecting adults with MASLD risk factors.
Ground truth for fibrosis stage and portal hypertension will be determined by biopsy or expert consensus using Vibration-Controlled Transient Elastography (VCTE) and guideline-based criteria.

Providers will be randomly assigned to one of two sequences that differ in which condition comes first:
* FibroX-Enabled Care: Providers will receive FibroX's risk probability, triage band, and explainability panel.
* Usual Care: Providers will use standard labs and vitals, with optional access to the FIB-4 calculator.

After a one-week washout period, providers will switch to the other condition. For each case, providers will make a management decision (e.g., no action, order VCTE, refer to hepatology), record their confidence level, and complete surveys on usability, trust in AI, and cognitive workload.

Primary Outcomes
* Feasibility: Recruitment rate ≥70%, completion rate ≥85%, median decision time ≤3.5 minutes.
* Usability and Acceptability: System Usability Scale (SUS) score ≥70.
* Provider Trust: AI-Trust Scale score ≥6.
* Effectiveness: Within-provider diagnostic accuracy for significant fibrosis (≥F2) and clinically significant portal hypertension.

Secondary Outcomes
* Appropriate referral rates
* Net reclassification improvement (NRI)
* Calibration metrics (intercept, slope)
* Provider confidence and cognitive load (NASA-TLX)
* Intended downstream testing burden
* Adoption and fidelity to triage recommendations
* Override rates and reasons
* Fairness analysis across subgroups (age, sex, BMI, race/ethnicity)

All provider actions and decision times will be automatically logged. Post-period surveys and qualitative debriefs will explore barriers and facilitators to using FibroX.

Study Significance
This pilot study will generate critical data to support a future multi-center trial and potential integration of FibroX into electronic health records.
If successful, FibroX could enable scalable, guideline-concordant screening for significant liver fibrosis and portal hypertension in primary care, reducing missed diagnoses and unnecessary referrals. This aligns with national priorities for precision medicine and responsible AI implementation in healthcare.
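The two-sequence crossover assignment described above could be sketched as follows. This is an illustrative sketch only: the record does not state the randomization method, so the balanced allocation, the seed, the sequence labels, and the function name `assign_sequences` are all assumptions.

```python
import random


def assign_sequences(provider_ids, seed):
    """Shuffle providers and alternate them between the two crossover sequences.

    Assumption: balanced 1:1 allocation; the record only says "randomly assigned".
    """
    rng = random.Random(seed)  # seeded so the allocation is reproducible
    shuffled = list(provider_ids)
    rng.shuffle(shuffled)
    sequences = ("FibroX-first", "UsualCare-first")  # hypothetical labels
    return {pid: sequences[i % 2] for i, pid in enumerate(shuffled)}


# Hypothetical provider identifiers for an enrollment of 40.
providers = [f"P{n:02d}" for n in range(1, 41)]
allocation = assign_sequences(providers, seed=2024)
```

Alternating after a shuffle keeps the two sequences the same size, which matters for a within-provider comparison with only 30-40 participants.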
Study Type: INTERVENTIONAL
Allocation: RANDOMIZED
Purpose: DIAGNOSTIC
Masking: NONE
Enrollment: 40
FibroX is an explainable artificial intelligence (AI) tool designed to assist primary care providers in diagnosing significant liver fibrosis (≥F2) and clinically significant portal hypertension in patients with metabolic dysfunction-associated steatotic liver disease (MASLD). It uses routinely available clinical data (e.g., age, AST, ALT, platelets, BMI, HbA1c, creatinine) to generate a risk probability score, a triage band (rule-out, indeterminate, rule-in), and a one-line explainability panel using Shapley Additive Explanations (SHAP). Providers use FibroX during simulated patient encounters to guide diagnostic and referral decisions (e.g., order VCTE, refer to hepatology, initiate guideline-based therapy). The tool aims to improve diagnostic accuracy, increase provider trust, reduce missed diagnoses, and support guideline-concordant triage in primary care.
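The mapping from a risk probability to a triage band can be illustrated with a small sketch. The record does not report FibroX's actual cut-points, so the thresholds here are caller-supplied parameters, and the values in the usage lines are purely illustrative placeholders, not FibroX's.

```python
def triage_band(probability, rule_out_below, rule_in_at_or_above):
    """Map a risk probability to rule-out / indeterminate / rule-in.

    Both thresholds are hypothetical inputs; the record does not publish them.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if rule_out_below > rule_in_at_or_above:
        raise ValueError("rule-out threshold must not exceed rule-in threshold")
    if probability < rule_out_below:
        return "rule-out"
    if probability >= rule_in_at_or_above:
        return "rule-in"
    return "indeterminate"


# Illustrative thresholds only (0.20 and 0.60 are assumptions).
bands = [triage_band(p, 0.20, 0.60) for p in (0.05, 0.35, 0.80)]
```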
In the usual care condition, primary care providers assess simulated patient cases using standard clinical tools available in routine practice. These include laboratory results, vital signs, problem lists, medications, and prior imaging. Providers may optionally use the FIB-4 calculator to estimate liver fibrosis risk. No AI decision support is provided. This intervention serves as the comparator to evaluate whether FibroX improves diagnostic accuracy for significant liver fibrosis (≥F2) and clinically significant portal hypertension, as well as provider trust, decision-making quality, and workflow efficiency compared to usual care.
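For reference, the FIB-4 index available in the usual care arm is computed as (age × AST) / (platelet count × √ALT). The sketch below assumes AST and ALT in U/L and platelets in 10^9/L. The cut-offs of 1.3 and 2.67 are commonly cited values and are an assumption here; the record does not say which cut-offs the study's calculator uses.

```python
import math


def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 index: (age x AST) / (platelets x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


def fib4_category(score, low_cutoff=1.3, high_cutoff=2.67):
    """Classify a FIB-4 score; default cut-offs are commonly cited, not from this record."""
    if score < low_cutoff:
        return "low risk"
    if score > high_cutoff:
        return "high risk"
    return "indeterminate"


# Simulated case: age 60, AST 40 U/L, ALT 36 U/L, platelets 200 x 10^9/L.
score = fib4(60, 40, 36, 200)      # (60 * 40) / (200 * 6) = 2.0
category = fib4_category(score)    # falls between the two cut-offs
```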
Diagnostic Accuracy for Significant Liver Fibrosis (≥F2) and Clinically Significant Portal Hypertension Using FibroX Compared to Usual Care
Within-provider diagnostic accuracy for detecting significant liver fibrosis (≥F2) and clinically significant portal hypertension in simulated primary care encounters. Accuracy will be assessed using sensitivity, specificity, and AUROC at clinically relevant thresholds. Ground truth for fibrosis stage and portal hypertension will be derived from biopsy, Vibration-Controlled Transient Elastography (VCTE)-based expert consensus, and guideline-defined criteria. Unit of Measure: Proportion (sensitivity and specificity in %, AUROC as a unitless value)
Time frame: Immediately after each simulation period, up to 24 weeks
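The accuracy metrics named in this outcome can be computed as in the sketch below. It assumes binary ground truth (1 = condition present), binary provider decisions for sensitivity and specificity, and a numeric score for AUROC, with both classes present in the data. The function names and example values are illustrative, not study data.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) from paired binary labels."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(truth, scores):
    """AUROC as the probability that a positive case outscores a negative one.

    Ties count as half a win (equivalent to the Mann-Whitney U statistic).
    """
    pos = [s for t, s in zip(truth, scores) if t]
    neg = [s for t, s in zip(truth, scores) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example with eight simulated cases.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
decisions = [1, 1, 0, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.1, 0.3, 0.6, 0.2, 0.7]
sens, spec = sensitivity_specificity(truth, decisions)  # 0.75, 0.75
area = auroc(truth, scores)                             # 15 of 16 pairs ordered correctly
```

The pairwise AUROC is quadratic in the number of cases, which is fine for 16 cases per period but would be replaced by a rank-based formula for large datasets.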
System Usability Scale (SUS) Score for FibroX Integration
Usability of FibroX assessed using the System Usability Scale (SUS), a validated 10-item questionnaire scored from 0 to 100, where higher scores indicate better usability. Unit of Measure: Score (range: 0-100; higher scores = better usability)
Time frame: Immediately after each simulation period, up to 24 weeks
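The standard SUS scoring rule can be sketched as follows, assuming each of the 10 items is answered on a 1-5 scale: odd-numbered items contribute (response − 1), even-numbered items contribute (5 − response), and the sum is multiplied by 2.5. The function name is illustrative.

```python
def sus_score(responses):
    """Convert ten 1-5 SUS item responses into a 0-100 score."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for item_number, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item_number % 2 == 1 else (5 - r)
    return total * 2.5


neutral = sus_score([3] * 10)                      # all-neutral answers give 50.0
best = sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1])   # best possible answers give 100.0
```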
Provider Trust in AI Tool (FibroX)
Provider trust in FibroX assessed using the validated AI-Trust Scale, which includes 12 items scored on a Likert scale. Higher scores indicate greater trust in the AI tool. Unit of Measure: Score (range: 12-60; higher scores = greater trust)
Time frame: Immediately after the FibroX-enabled simulation period, up to 24 weeks
Median Decision Time per Case
Median time (in minutes) taken by providers to complete management decisions for simulated MASLD cases using FibroX versus usual care. Unit of Measure: Minutes
Time frame: Immediately after each simulation period, up to 24 weeks
Appropriate Referral Rate
Proportion of simulated cases where provider referral decisions (e.g., hepatology referral, VCTE order) align with guideline-concordant triage rules for MASLD risk stratification. Unit of Measure: Proportion (%)
Time frame: Immediately after each simulation period, up to 24 weeks
Net Reclassification Improvement (NRI)
Change in classification accuracy for MASLD risk categories (rule-out, indeterminate, rule-in) when using FibroX compared to usual care. Unit of Measure: NRI score (unitless)
Time frame: Immediately after each simulation period, up to 24 weeks
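A categorical NRI over the three triage bands could be computed as in this sketch. It assumes that each case has a classification under both conditions and a binary ground truth, which the record does not spell out; the band ordering, function name, and example data are illustrative. The NRI is the net proportion of true cases moved to a higher band plus the net proportion of non-cases moved to a lower band.

```python
BAND_ORDER = {"rule-out": 0, "indeterminate": 1, "rule-in": 2}


def categorical_nri(truth, usual_bands, fibrox_bands):
    """NRI = [P(up|event) - P(down|event)] + [P(down|non-event) - P(up|non-event)]."""
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for t, old, new in zip(truth, usual_bands, fibrox_bands):
        move = BAND_ORDER[new] - BAND_ORDER[old]
        if t:
            n_e += 1
            up_e += move > 0
            down_e += move < 0
        else:
            n_n += 1
            up_n += move > 0
            down_n += move < 0
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


# Made-up example: four cases with the condition, four without.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
usual = ["indeterminate", "indeterminate", "rule-in", "rule-out",
         "indeterminate", "rule-out", "rule-in", "indeterminate"]
fibrox = ["rule-in", "rule-in", "rule-in", "indeterminate",
          "rule-out", "rule-out", "indeterminate", "rule-in"]
nri = categorical_nri(truth, usual, fibrox)  # 0.75 for events + 0.25 for non-events
```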
Calibration of Risk Predictions
Calibration of FibroX predictions compared to observed outcomes, assessed using calibration intercept, slope, and calibration plot. Unit of Measure: Intercept and slope (unitless)
Time frame: Immediately after each simulation period, up to 24 weeks
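Calibration slope and intercept are conventionally estimated by logistic regression of the observed outcome on the logit of the predicted probability. The sketch below fits that two-parameter model with Newton-Raphson using only the standard library. It assumes predicted probabilities strictly between 0 and 1 and data that are not perfectly separable; the record does not state the estimation method, so this is one common approach rather than the study's procedure.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def calibration_intercept_slope(probs, outcomes, max_iter=50, tol=1e-10):
    """Fit outcome ~ a + b * logit(p) by Newton-Raphson; return (a, b).

    Perfect calibration corresponds to a = 0 and b = 1.
    """
    x = [math.log(p / (1.0 - p)) for p in probs]
    a, b = 0.0, 1.0  # start at the perfectly calibrated values
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, outcomes):
            mu = _sigmoid(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu             # gradient with respect to a
            g1 += (yi - mu) * xi      # gradient with respect to b
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Made-up overconfident predictions: 10% and 90% predicted, 20% and 80% observed.
probs = [0.1] * 10 + [0.9] * 10
outcomes = [1, 1] + [0] * 8 + [1] * 8 + [0, 0]
intercept, slope = calibration_intercept_slope(probs, outcomes)
```

In this example the slope comes out below 1 (about 0.63), the usual signature of predictions that are too extreme.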
Provider Confidence in Decision-Making
Provider-reported confidence in MASLD management decisions during simulated cases, measured on a 5-point Likert scale (1 = very low confidence; 5 = very high confidence). Unit of Measure: Score (range: 1-5; higher scores = greater confidence)
Time frame: Immediately after each simulation period, up to 24 weeks
Cognitive Load During Case Review
Provider cognitive workload assessed using NASA Task Load Index (NASA-TLX), which provides an overall workload score from 0 to 100 across six dimensions. Unit of Measure: Score (range: 0-100; higher scores = greater workload)
Time frame: Immediately after each simulation period, up to 24 weeks
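One common way to obtain an overall NASA-TLX score is the unweighted "Raw TLX": the mean of the six subscale ratings, each on a 0-100 scale. The record does not say whether the raw or the pairwise-weighted version will be used, so the sketch below is an assumption, and the dictionary keys are illustrative names for the six standard dimensions.

```python
TLX_DIMENSIONS = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings):
    """Unweighted mean of the six NASA-TLX subscale ratings (each 0-100)."""
    missing = [d for d in TLX_DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    return sum(ratings[d] for d in TLX_DIMENSIONS) / len(TLX_DIMENSIONS)


# Made-up ratings from one provider after one simulation period.
workload = raw_tlx({
    "mental_demand": 60,
    "physical_demand": 10,
    "temporal_demand": 50,
    "performance": 30,
    "effort": 55,
    "frustration": 35,
})  # 240 / 6 = 40.0
```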
Intended Downstream Testing Burden
Number and type of additional tests or referrals (e.g., VCTE, hepatology consult) that providers intend to order after each simulated case. Unit of Measure: Count (number of tests/referrals per case)
Time frame: Immediately after each simulation period, up to 24 weeks
Adoption and Fidelity to Triage Recommendations
Proportion of cases where providers follow the FibroX triage recommendation (rule-out, indeterminate, or rule-in) without override. Unit of Measure: Proportion (%)
Time frame: Immediately after each simulation period, up to 24 weeks
Override Rate and Reasons
Proportion of cases where providers override FibroX recommendations and the documented reasons for override. Unit of Measure: Proportion (%)
Time frame: Immediately after each simulation period, up to 24 weeks
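Adherence and override rates could be summarized from the automatically logged decisions roughly as follows. The log structure is hypothetical: the field names `overridden` and `reason`, and the example reason strings, are assumptions, since the record does not describe the logging format.

```python
from collections import Counter


def override_summary(case_logs):
    """Return adherence rate, override rate, and a tally of override reasons."""
    if not case_logs:
        raise ValueError("no logged cases")
    overrides = [c for c in case_logs if c["overridden"]]
    override_rate = len(overrides) / len(case_logs)
    return {
        "adherence_rate": 1.0 - override_rate,
        "override_rate": override_rate,
        "reasons": Counter(c["reason"] for c in overrides),
    }


# Made-up log entries for four FibroX-enabled cases.
logs = [
    {"overridden": False, "reason": None},
    {"overridden": False, "reason": None},
    {"overridden": True, "reason": "clinical judgment"},
    {"overridden": False, "reason": None},
]
summary = override_summary(logs)
```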
Fairness Analysis Across Subgroups
Performance of FibroX (sensitivity, specificity, calibration) across demographic subgroups (age, sex, BMI, race/ethnicity). Unit of Measure: Proportion (%) and AUROC (unitless)
Time frame: Immediately after each simulation period, up to 24 weeks
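A subgroup comparison of sensitivity and specificity could be tabulated as in the sketch below. The record fields `truth`, `flagged`, and the grouping key are hypothetical names, and the example records are made up. Subgroups without any positive (or negative) cases return `None` for the corresponding metric rather than a misleading number.

```python
from collections import defaultdict


def subgroup_sens_spec(records, group_key):
    """Compute sensitivity and specificity separately for each subgroup."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for r in records:
        c = counts[r[group_key]]
        if r["truth"]:
            c["tp" if r["flagged"] else "fn"] += 1
        else:
            c["fp" if r["flagged"] else "tn"] += 1
    result = {}
    for group, c in counts.items():
        pos = c["tp"] + c["fn"]
        neg = c["tn"] + c["fp"]
        result[group] = {
            "sensitivity": c["tp"] / pos if pos else None,
            "specificity": c["tn"] / neg if neg else None,
        }
    return result


# Made-up records: truth = condition present, flagged = tool flagged the case.
records = [
    {"sex": "F", "truth": 1, "flagged": 1},
    {"sex": "F", "truth": 1, "flagged": 0},
    {"sex": "F", "truth": 0, "flagged": 0},
    {"sex": "M", "truth": 1, "flagged": 1},
    {"sex": "M", "truth": 0, "flagged": 1},
    {"sex": "M", "truth": 0, "flagged": 0},
]
by_sex = subgroup_sens_spec(records, "sex")
```

With only 16 cases per simulation period, per-subgroup estimates will be imprecise, so any such table would need confidence intervals before differences are interpreted.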