This observational, multinational study assesses the feasibility of speech and self-report data collection across six languages for Artificial Intelligence (AI)-driven relapse risk estimation in psychosis. Over 12 months, patients at risk of relapse and healthy controls will provide weekly speech recordings and self-report data for automated analysis. Risk scores will be stored but not shared with treating clinicians. Independent clinical evaluations ensure data quality and validation. The study lays the foundation for future Clinical Decision Support System (CDSS) research and explores novel speech markers for relapse prediction while minimizing participant burden.
This project follows a prospective, exploratory, observational design with a non-randomized, two-arm structure: a group of individuals with psychosis at risk of relapse and a healthy control group. As a multicenter, international research project, it will assess the usability and feasibility of speech data collection across six languages (English, German, French, Dutch, Czech and Turkish) and healthcare systems, to establish its applicability in diverse clinical environments. Speech and self-report data will be collected weekly for one year using a finalized smartphone application, so that collection is consistent and feasible in real-world clinical settings. The recordings will be securely transferred to an external platform for AI-based analysis, which prevents any direct impact on clinical decision-making and maintains the study's observational nature.
To ensure data integrity and reliability, a Human-in-the-Loop (HITL) quality control process is implemented after each speech recording session, in two steps.
Initial Data Review: a designated reviewer (HITL1) checks the audio quality and the accuracy of the automated transcripts stored securely in the Trusted Secure Database (TSD) system in Norway. The reviewer corrects transcription errors and flags anomalies such as poor audio quality to maintain data accuracy for analysis.
Risk Assessment and Decision Suggestion: a second reviewer (HITL2) evaluates the data by (i) assessing speech characteristics relevant to relapse risk based on raw response data and performance scores (e.g., story recall accuracy); (ii) providing an independent relapse risk estimate, without access to the automated AI-generated risk assessment; (iii) classifying the participant as belonging to either the psychosis or the healthy control group; and (iv) suggesting a clinical decision, which is recorded for research purposes but not shared with the treating clinician.
Alongside this Fast Diagnostic Loop, in which a clinical decision is suggested, an exploratory component will be incorporated to identify and validate new speech markers associated with relapse. This New Marker Discovery Loop will search for additional speech features associated with relapse beyond the standard markers. The goal is to discover novel speech markers that may improve our understanding of relapse mechanisms and potentially serve as predictive or diagnostic tools for future clinical use.
During the 1-year follow-up period, participants will attend a total of three study visits: Visit 0 (baseline), Visit 1 (6 months post-baseline, ±10 days), and Visit 2 (12 months post-baseline, ±10 days). During these visits, participants from both groups will undergo assessments that evaluate the usability and trustworthiness of the procedure and assess various functional and quality-of-life outcomes.
Study Type
OBSERVATIONAL
Enrollment
360
The intervention is a weekly online assessment of speech designed to detect subtle characteristics of psychotic speech. The recordings are made with a speech data collection tool: a smartphone app with implemented tasks for the collection of behavioral response data (speech and self-report). Data are then transferred to a secure repository (TSD) and analysed by an AI-based speech data analysis algorithm: a backend system runs the predictor code to calculate automated relapse risk scores.
National Institute of Mental Health
Klecany, Czechia
NOT_YET_RECRUITING
Newcastle Hospital and RCSI University of Medicine and Health Sciences
Greystones, Wicklow, Ireland
NOT_YET_RECRUITING
University Medical Center Groningen
Groningen, Netherlands
NOT_YET_RECRUITING
The Arctic University of Norway
Tromsø, Norway
ACTIVE_NOT_RECRUITING
University Hospital Geneva and University of Geneva
Geneva, Chêne-Bourg, Switzerland
NOT_YET_RECRUITING
Psychiatric University Hospital Zurich and University of Zurich
Zurich, Switzerland
RECRUITING
Dokuz Eylul University
Izmir, Turkey (Türkiye)
NOT_YET_RECRUITING
Feasibility: User adherence
Proportion of users adhering to their allocated tasks. Task completion is automatically assessed for each user. A user is regarded as adherent if they complete at least 33% of their tasks across all sessions.
Time frame: From enrollment to the end of the study at 12 months
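The adherence rule above can be illustrated with a minimal Python sketch. The data layout (one list of completed/not-completed flags per user) and all names such as `task_log` are assumptions made for illustration; the record does not describe how the app stores task completion, and the handling of users with no allocated tasks is likewise an assumption.

```python
from typing import Dict, List

# "at least 33% of their tasks", taken from the outcome definition
ADHERENCE_THRESHOLD = 0.33


def is_adherent(completed: List[bool], threshold: float = ADHERENCE_THRESHOLD) -> bool:
    """Return True if the share of completed tasks meets the threshold."""
    if not completed:
        # Assumption: a user with no allocated tasks is not counted as adherent.
        return False
    return sum(completed) / len(completed) >= threshold


def adherence_proportion(task_log: Dict[str, List[bool]]) -> float:
    """Proportion of users who are classed as adherent."""
    if not task_log:
        raise ValueError("task_log is empty")
    adherent_users = sum(is_adherent(flags) for flags in task_log.values())
    return adherent_users / len(task_log)


# Hypothetical example data: one flag per allocated task, across all sessions.
task_log = {
    "user_a": [True, False, False],          # 1 of 3 completed
    "user_b": [True, False, False, False],   # 1 of 4 completed
    "user_c": [True, True, True, False],     # 3 of 4 completed
}

print(adherence_proportion(task_log))
```

In this example `user_a` and `user_c` meet the 33% threshold and `user_b` does not, so two of three users count as adherent.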
Feasibility: Transcription quality
Proportion of users for whom high-quality transcripts are produced. Transcription quality is measured by the Word Error Rate (WER) between Automated Speech Recognition (ASR) transcripts and human-corrected transcripts. A user with an overall WER of at most 35% is regarded as having high-quality transcripts.
Time frame: From enrollment to the end of the study at 12 months
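As a rough illustration of the WER criterion, the sketch below computes WER as the word-level edit distance (substitutions, insertions, deletions) divided by the number of words in the human-corrected reference. How the study aggregates an "overall" WER per user is not stated; pooling errors and reference words across a user's transcripts is an assumption here, as are the lowercasing, whitespace tokenization, and all function names.

```python
from typing import Iterable, List, Tuple

# "overall WER of at most 35%", taken from the outcome definition
WER_THRESHOLD = 0.35


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def overall_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled WER over (human_corrected, asr) transcript pairs for one user."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors += word_edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("no reference words to score against")
    return errors / ref_words


# Hypothetical transcripts: (human-corrected reference, ASR output).
pairs = [
    ("the cat sat on the mat", "the cat sat on mat"),
    ("i went to the shop", "i went to a shop"),
]
wer = overall_wer(pairs)
print(wer, wer <= WER_THRESHOLD)
```

The example has two word errors against eleven reference words, which falls under the 35% threshold.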
Feasibility: Usability of recordings
Proportion of recordings that can be used by the AI algorithm. Usability of each recording is assessed by HITL reviewers based on three questions: (i) has the user interpreted the question correctly, (ii) is the audio (at least partially) audible, and (iii) is the audio (at least partially) comprehensible. All three questions must be answered with yes for the recording to be usable.
Time frame: From enrollment to the end of the study at 12 months
Overall system usability
Proportion of users rating the app as usable. Usability is evaluated using the System Usability Scale (SUS), providing a standardized usability score (range 0-100) based on a 10-item questionnaire. We regard a score of at least 70 as usable.
Time frame: 6 months and 12 months
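For reference, the standard SUS scoring rule can be sketched as follows: each of the 10 items is answered on a 1 to 5 scale, odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5 to give a 0-100 score. The cut-off of 70 comes from the outcome definition; the function and variable names are illustrative only.

```python
from typing import Sequence

# "a score of at least 70", taken from the outcome definition
SUS_USABLE_THRESHOLD = 70.0


def sus_score(responses: Sequence[int]) -> float:
    """Compute the System Usability Scale score (0-100) from 10 item responses."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # odd (positively worded) items
        else:
            total += 5 - response   # even (negatively worded) items
    return total * 2.5


# Hypothetical questionnaire: agree (4) on odd items, disagree (2) on even items.
responses = [4, 2, 4, 2, 4, 2, 4, 2, 4, 2]
score = sus_score(responses)
print(score, score >= SUS_USABLE_THRESHOLD)
```

The proportion reported for this outcome would then be the share of users whose score is at least 70.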
Performance of the monitoring system for relapse prediction
Assessed by the Area Under the Receiver Operating Characteristic Curve (AUC). Thresholds for relapse prediction will be explored as part of the analysis.
Time frame: From enrollment to the end of the study at 12 months
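To make the AUC measure concrete, the sketch below uses its rank-based interpretation: the probability that a randomly chosen participant who relapsed received a higher risk score than a randomly chosen participant who did not, with ties counted as one half. This is a plain-Python illustration on made-up numbers, not the study's analysis code; in practice a statistics library would normally be used.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of relapse (1) and non-relapse (0) scores."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one relapse and one non-relapse case")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0   # relapse case ranked above non-relapse case
            elif pos == neg:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = relapsed during follow-up, 0 = did not relapse.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(labels, scores))
```

Three of the four relapse/non-relapse pairs are ordered correctly in this example, so the AUC is 0.75. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking.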
Provider-perceived usability and usefulness
Provider-perceived usability and usefulness of the app will be assessed using the mHealth App Usability Questionnaire (MAUQ, Provider version). Scores will be reported by domain and overall.
Time frame: 6 months and 12 months
Relapse Outcome 1
Relapse rate, defined as the proportion of participants experiencing at least one relapse (rehospitalization) during the study period.
Time frame: 6 months and 12 months
Relapse Outcome 2
Human-in-the-loop (HITL) relapse risk estimates and relapse incidence: HITL risk scores (range 0-1) will be categorized into low (0-0.2), medium (>0.2-0.6), and high (>0.6) risk levels. Descriptive analyses will assess the proportion of relapses within each category.
Time frame: From enrollment to the end of the study at 12 months
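The risk categories can be expressed as a short Python sketch using the thresholds given in the outcome definition (low: 0 to 0.2, medium: above 0.2 to 0.6, high: above 0.6). Reading "proportion of relapses within each category" as the share of participants in a category who relapsed is an assumption, as are the data layout and all names.

```python
from typing import Dict, Iterable, Optional, Tuple


def risk_category(score: float) -> str:
    """Map a risk score in [0, 1] to low / medium / high."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("risk score must lie between 0 and 1")
    if score <= 0.2:
        return "low"
    if score <= 0.6:
        return "medium"
    return "high"


def relapse_proportion_by_category(
    records: Iterable[Tuple[float, bool]],
) -> Dict[str, Optional[float]]:
    """Share of participants who relapsed within each risk category.

    Each record is (risk_score, relapsed). Categories with no participants
    map to None rather than a proportion.
    """
    counts = {"low": [0, 0], "medium": [0, 0], "high": [0, 0]}  # [relapses, total]
    for score, relapsed in records:
        bucket = counts[risk_category(score)]
        bucket[0] += int(relapsed)
        bucket[1] += 1
    return {
        name: (relapses / total if total else None)
        for name, (relapses, total) in counts.items()
    }


# Hypothetical participants: (risk score, relapsed during follow-up).
records = [
    (0.10, False), (0.20, False),   # low
    (0.30, True), (0.60, False),    # medium
    (0.70, True), (0.90, True),     # high
]
print(relapse_proportion_by_category(records))
```

Note that the boundary values 0.2 and 0.6 fall into the lower of the two adjacent categories, matching the ">0.2" and ">0.6" notation in the definition.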
Relapse Outcome 3
Clinical decision support system (CDSS) relapse risk estimates and relapse incidence: CDSS risk scores (range 0-1) will be categorized into low (0-0.2), medium (>0.2-0.6), and high (>0.6) risk levels, the same thresholds as for the HITL estimates. Descriptive analyses will assess the proportion of relapses within each category.
Time frame: From enrollment to the end of the study at 12 months
Relapse Outcome 4
Predictive performance of HITL relapse risk estimates will be evaluated using the area under the curve (AUC), with exploratory assessment of prediction thresholds.
Time frame: From enrollment to the end of the study at 12 months
Relapse Outcome 5
Concordance between CDSS and HITL relapse risk estimates will be assessed by evaluating the level of agreement between algorithm-based and clinician-based risk scores.
Time frame: From enrollment to the end of the study at 12 months
Recordings Outcome 1
Number of recordings per participant across all sessions.
Time frame: From enrollment to the end of the study at 12 months
Recordings Outcome 2
Interpretation of question: has the user interpreted the question correctly? Yes or no. Per recording.
Time frame: From enrollment to the end of the study at 12 months
Recordings Outcome 3
Audibility of audio: is the audio (at least partially) audible? Yes or no. Per recording.
Time frame: From enrollment to the end of the study at 12 months
Recordings Outcome 4
Comprehensibility of audio: is the audio (at least partially) comprehensible? Yes or no. Per recording.
Time frame: From enrollment to the end of the study at 12 months
Speech-based features
Exploratory comparison analysis of speech-based features extracted using tools such as Prosogram or openSMILE.
Time frame: From enrollment to the end of the study at 12 months
Text-based features Outcome 1
Robust text-based features such as speech rate and pause rate will be derived using common Python libraries for Natural Language Processing (NLP), such as NLTK and spaCy.
Time frame: From enrollment to the end of the study at 12 months.
Text-based features Outcome 2
WordNet will be used to analyze correctness of responses where appropriate (fluency task, story recall task, adapted Stroop task).
Time frame: From enrollment to the end of the study at 12 months
Neuroimaging-based features Outcome 1
Exploratory comparison analysis of features extracted from structural magnetic resonance imaging (sMRI) data.
Time frame: 6 months and 12 months
Neuroimaging-based features Outcome 2
Exploratory comparison analysis of features extracted from functional magnetic resonance imaging (fMRI) data.
Time frame: 6 months and 12 months
Neuroimaging-based features Outcome 3
Exploratory comparison analysis of features extracted from Diffusion Tensor Imaging (DTI) data.
Time frame: 6 months and 12 months
Social and occupational function of participants Outcome 1
Quality of life assessed with Quality of Life Scale (QoL). Assessed only in participants at risk of relapse
Time frame: Baseline, 6 months and 12 months
Social and occupational function of participants Outcome 2
Global Assessment of Functioning Scale (GAF): measures how much a person's symptoms affect their day-to-day life on a scale of 0 to 100. Assessed only in participants at risk of relapse
Time frame: 6 months and 12 months
Social and occupational function of participants Outcome 3
Social and Occupational Functioning Assessment Scale (SOFAS): a global rating of current social and occupational functioning with scores ranging from 0 to 100. It differs from GAF by focusing on social and occupational functioning independent of the overall severity of the individual's psychological symptoms. Assessed only in participants at risk of relapse
Time frame: 6 months and 12 months
Social and occupational function of participants Outcome 4
Number of psychiatric admissions. Assessed only in participants at risk of relapse
Time frame: 6 months and 12 months
Social and occupational function of participants Outcome 5
Duration (in weeks) of psychiatric admissions. Assessed only in participants at risk of relapse
Time frame: 6 months and 12 months
Social and occupational function of participants Outcome 6
Rates of self-harm (including suicide, suicide attempts and aggressive incidents) assessed with the Social Dysfunction & Aggression Scale (SDAS). Assessed only in participants at risk of relapse
Time frame: 6 months and 12 months
Social and occupational function of participants Outcome 7
Positive and Negative Syndrome Scale (PANSS): measures the severity of positive and negative symptoms in schizophrenia. Scores will be reported by dimension (positive, negative, and general psychopathology) and overall. Assessed only in participants at risk of relapse
Time frame: 6 months and 12 months