This pilot randomized controlled trial will investigate the feasibility and effectiveness of using generative artificial intelligence to create personalized therapeutic scripts for imagery rescripting (ImRs). Eighty participants will listen to autobiographical scenarios based on their own memories of childhood criticism and neutral events. The scenarios will be generated by the Gemini large language model and reviewed by trained experimenters. On Day 1, all participants will be exposed to both critical and neutral scenarios and randomly assigned to either an experimental group receiving an ImRs intervention or a control group receiving no therapeutic modification. Skin conductance and subjective emotional ratings will be collected during the session, with follow-up questionnaires administered one week later. In addition, cognitive-behavioral therapists will evaluate the quality of the generated scripts. The study aims to assess emotional and physiological responses to AI-generated content, compare outcomes between groups, and explore the potential of large language models in scalable psychological interventions.
This pilot randomized controlled trial will explore the application of large language models (LLMs) in the development of personalized therapeutic interventions. The study will focus on the emotional and psychophysiological effects of listening to autobiographical scenarios based on participants' childhood experiences of parental criticism and neutral events. All participants will be asked to recall and describe two critical and two neutral childhood memories. Based on this input, personalized scripts will be generated using Gemini, a large language model. Each script will be reviewed and, if necessary, modified by trained experimenters to ensure therapeutic coherence and alignment with imagery rescripting (ImRs) principles. On Day 1, all participants will listen to both critical and neutral personalized scenarios during the laboratory session. Participants will be randomly assigned to one of two groups. The experimental group will listen to modified versions of the critical memory scripts, in which a therapist figure intervenes to address the child's unmet needs, an application of imagery rescripting. The control group will listen to the same autobiographical content without any therapeutic intervention or modification. To assess physiological arousal, skin conductance will be continuously recorded throughout the session. After each scenario, participants will rate their emotional intensity and specific feelings (e.g., fear, sadness) on Likert scales. The experimental group will receive the ImRs intervention after the initial scenario phase. One week later, all participants will complete follow-up questionnaires assessing generalized anxiety (GAD-7) and the frequency of intrusive thoughts related to the memories. In addition, a panel of licensed cognitive-behavioral therapists will evaluate the generated scenarios for therapeutic quality. Their feedback will be used to assess the acceptability and coherence of AI-assisted therapeutic scripts.
The study will test the feasibility of using LLM-generated content in clinical settings and aims to determine whether such interventions can reduce distress and intrusiveness while eliciting measurable emotional and physiological responses.
Hypotheses:
1. The AI-generated scripts will be positively evaluated by cognitive-behavioral therapists, with an average rating above 6 out of 10.
2. The therapist ratings of the AI-generated scripts will not significantly differ from those of human-written scripts based on clinical interviews used in prior research.
3. The criticism scenarios generated by the model will elicit anxiety responses in all participants.
4. The level of anxiety evoked by the AI-generated criticism scenarios will correlate with participants' baseline fear of failure.
5. Participants in the ImRs group will report fewer intrusive thoughts and lower generalized anxiety levels one week after the intervention compared to the control group.
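The registration does not say which statistical test will be applied to the first hypothesis (average therapist rating above 6 out of 10). As a minimal sketch, assuming a one-sample t-test against the reference value 6, the statistic could be computed as follows. The function name `one_sample_t` and the ratings are hypothetical and serve only as illustration; they are not study data.

```python
import math
import statistics


def one_sample_t(ratings, reference=6.0):
    """Return (mean, t statistic, degrees of freedom) for ratings vs. a reference value.

    Assumption: a one-sample t-test is used; the registration does not specify the test.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings")
    mean = statistics.mean(ratings)
    sd = statistics.stdev(ratings)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("ratings have zero variance; t is undefined")
    t = (mean - reference) / (sd / math.sqrt(n))
    return mean, t, n - 1


# Made-up ratings on the 1-10 scale, for illustration only.
example_ratings = [7, 8, 6, 9, 7, 8]
mean_rating, t_stat, df = one_sample_t(example_ratings)
```

A positive `t_stat` indicates a mean above the reference; the one-sided p-value would then be read from a t distribution with `df` degrees of freedom.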
Study Type
INTERVENTIONAL
Allocation
RANDOMIZED
Purpose
BASIC_SCIENCE
Masking
SINGLE
Enrollment
80
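The record states randomized allocation of 80 participants to two groups but does not describe the randomization procedure. The sketch below assumes permuted-block randomization with a block size of 4, which keeps group sizes balanced throughout recruitment. The function name, group labels, block size, and seed are all hypothetical choices, not the study's documented method.

```python
import random


def block_randomize(n_participants, block_size=4, seed=None):
    """Assign participants to 'ImRs' or 'control' in shuffled, balanced blocks.

    Assumption: permuted blocks of equal halves; the registration does not
    specify how allocation is generated.
    """
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for two equal groups")
    rng = random.Random(seed)  # seeded generator so the list is reproducible
    allocation = []
    while len(allocation) < n_participants:
        block = ["ImRs"] * (block_size // 2) + ["control"] * (block_size // 2)
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_participants]


# 80 participants, as in the Enrollment field; the seed is arbitrary.
allocation = block_randomize(80, block_size=4, seed=2024)
```

Because 80 is a multiple of the block size, this yields exactly 40 participants per group.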
Participants will listen to a series of personalized audio scenarios based on their childhood memories of parental criticism and neutral events. At the end of the session, one critical scenario will be presented in a modified version that includes a therapeutic imagery rescripting (ImRs) intervention. In this script, a therapist figure enters the scene and addresses the child's unmet needs, following the standard rescripting format. The scripts will be generated using the Gemini large language model and reviewed by trained experimenters to ensure coherence and therapeutic validity. The intervention aims to reduce distress and intrusive thoughts related to the memory.
SWPS University (University of Social Sciences and Humanities); Poznań Laboratory of Affective Neuroscience
Poznan, Wielkopolska, Poland
RECRUITING
Generalized Anxiety Disorder Scale (GAD-7)
The Generalized Anxiety Disorder 7-item scale (GAD-7) will be used to assess self-reported anxiety. Scores range from 0 to 21; higher scores indicate a higher level of generalized anxiety symptoms. Scores will be collected before the intervention and one week later to measure change in general anxiety levels following exposure to either AI-based ImRs or control scripts.
Time frame: Screening, Pre-intervention (Day 1) and 1-week follow-up
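The 0 to 21 range follows from seven items each rated 0 to 3. A minimal sketch of the total score and the pre-to-follow-up change is shown below; the function names and the response vectors are hypothetical and used only for illustration.

```python
def gad7_total(responses):
    """Sum seven GAD-7 item responses, each rated 0-3, giving a total of 0-21."""
    if len(responses) != 7:
        raise ValueError("GAD-7 has exactly 7 items")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each item must be rated 0, 1, 2, or 3")
    return sum(responses)


def gad7_change(pre, follow_up):
    """Follow-up total minus pre-intervention total; negative means less anxiety."""
    return gad7_total(follow_up) - gad7_total(pre)


# Made-up responses for one participant, for illustration only.
pre_responses = [2, 1, 2, 1, 1, 2, 1]
follow_up_responses = [1, 1, 1, 0, 1, 1, 1]
change = gad7_change(pre_responses, follow_up_responses)
```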
The Performance Failure Appraisal Inventory
The Performance Failure Appraisal Inventory (PFAI) will be used to assess fear of failure. It is a 25-item questionnaire that measures the strength of subjective beliefs about the consequences of failure. The PFAI has five subscales: fear of experiencing shame and embarrassment; fear of devaluing one's self-esteem; fear of having an uncertain future; fear of important others losing interest; and fear of upsetting important others. PFAI scores range from 35 to 175, with higher scores indicating a higher level of fear of failure.
Time frame: Pre-intervention (Day 1)
Intrusive Thought Frequency (Rumination Inventory - adapted)
A modified version of the Event-Related Rumination Inventory will be used to assess the frequency and intrusiveness of thoughts related to the autobiographical criticism memories. The scale includes both intrusive and reflective rumination items; total scores range from 20 to 80, with higher scores indicating a higher level of event-related rumination. Change in scores between baseline and follow-up will serve as an index of the cognitive impact of the intervention.
Time frame: Pre-intervention (Day 1) and 1-week follow-up
Skin Conductance Level (SCL)
Electrodermal activity will be recorded continuously during the presentation of autobiographical scenarios to assess physiological arousal. The SCL signal will be analyzed during the baseline, critical, and neutral conditions, as well as during the imagery rescripting (ImRs) intervention (experimental group only). Data will be used to examine whether AI-generated criticism scripts elicit arousal.
Time frame: During experiment/intervention (Day 1)
Emotional Response Ratings (Subjective)
Participants will rate the intensity of emotional reactions (fear, sadness, arousal, etc.) using Likert scales after each presented scenario. Scores will range from 1 to 10, with higher scores indicating a more intense emotional reaction. These ratings will help determine emotional engagement and compare affective response between criticism and neutral content, as well as between groups.
Time frame: During experiment/intervention (Day 1)
Therapist Ratings of Script Quality
A panel of cognitive-behavioral therapists will rate the AI-generated and manually created scripts on a 10-point Likert scale (1 to 10) for therapeutic quality, coherence, and emotional relevance. Higher scores will indicate higher quality, coherence, or emotional relevance. Average therapist ratings will be used to test hypotheses about the acceptability of LLM-generated therapeutic content.
Time frame: Prior to intervention
Questionnaire on the Perceived Effectiveness and Appropriateness of Imagery-Based Intervention
A custom-developed questionnaire will be administered to participants in the experimental group to assess their subjective evaluation of the imagery rescripting intervention. The measure includes items evaluating emotional intensity, difficulty with imagery, resistance to memory modification, trust in the therapeutic process, realism of the experience, and other affective and cognitive responses. Items are rated on 5-point Likert scales and grouped into subscales representing common therapeutic barriers and facilitators. Each subscale ranges from 4 to 20, with higher scores indicating greater difficulty with imagery techniques. The total and subscale scores will be analyzed to explore which factors are associated with intervention acceptance and perceived effectiveness.
Time frame: Post-intervention (Day 1)
TAPS Tool - Substance Use Screening
The Tobacco, Alcohol, Prescription Medication, and Other Substance Use Tool (TAPS) will be used to screen for problematic substance use. The self-report version of the TAPS consists of two parts; the first covers past 12-month use of tobacco, alcohol, illicit drugs, and prescription medication for non-medical use. Scores will range from 1 (never) to 5 (every day/almost every day), with higher scores indicating a greater frequency of substance use. Scores will be used to exclude participants with probable substance use disorders, in line with the study's eligibility criteria.
Time frame: Screening only
Post-Traumatic Stress Symptoms Scale (DSM)
Self-report scale assessing PTSD symptoms as defined by DSM-5 criteria. The instrument includes 10 items evaluating symptom frequency over the previous 7 days on a 0-4 Likert scale. Total scores range from 0 to 40.
Time frame: Screening only
Working Alliance Inventory - Short Revised
The Working Alliance Inventory - Short Revised (WAI-SR) is a 12-item self-report measure assessing the perceived quality of the therapeutic alliance across three subscales: Goal, Task, and Bond. Each subscale ranges from 4 to 20, with a higher score indicating a stronger alliance on that dimension. Total scores range from 12 to 60, with a higher score indicating a stronger overall working alliance. In this study, the WAI-SR is adapted to assess participants' sense of connection and alliance with the AI-delivered intervention (e.g., the rescripting script and voice used). Scores will be used to explore whether the subjective sense of alliance predicts perceived effectiveness, emotional impact, or response to the intervention.
Time frame: Post-intervention (Day 1)
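The WAI-SR ranges above follow from 12 items rated 1 to 5, with four items per subscale. A minimal scoring sketch is given below. The item-to-subscale mapping in `SUBSCALE_ITEMS` is an assumption for illustration and should be checked against the scoring key of the adapted version used in the study; the function name and example responses are likewise hypothetical.

```python
# Assumed item-to-subscale mapping (4 items each); verify against the
# scoring key of the adapted instrument before use.
SUBSCALE_ITEMS = {
    "goal": (4, 6, 8, 11),
    "task": (1, 2, 10, 12),
    "bond": (3, 5, 7, 9),
}


def score_wai_sr(responses, subscale_items=SUBSCALE_ITEMS):
    """Score the WAI-SR from a dict mapping item number (1-12) to a rating (1-5).

    Returns subscale sums (4-20 each) and the total (12-60).
    """
    if set(responses) != set(range(1, 13)):
        raise ValueError("responses must cover items 1 through 12")
    if any(rating not in range(1, 6) for rating in responses.values()):
        raise ValueError("each rating must be an integer from 1 to 5")
    scores = {
        name: sum(responses[item] for item in items)
        for name, items in subscale_items.items()
    }
    scores["total"] = sum(responses.values())
    return scores


# Made-up responses: all items rated 3 except item 3, rated 5.
example_responses = {item: 3 for item in range(1, 13)}
example_responses[3] = 5
wai_scores = score_wai_sr(example_responses)
```

The total does not depend on the assumed mapping, so only the subscale scores would change if the adapted version assigns items differently.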