Optimization of Late Imagery Rescripting Research Using Generative Artificial Intelligence

University of Social Sciences and Humanities, Warsaw

40 enrolled

Overview

The aim of the study is to examine the effect of imagery rescripting (ImRs) delivered with the help of large language models (LLMs). The intervention will be preceded by the presentation of the most aversive fragment of the memory, the so-called 'hotspot.' This design will allow for a replication of the effect described by Dibbets and Arntz (2016), according to which prior activation of the most emotional element of a memory enhances the effectiveness of ImRs. The study also builds on another ongoing study in which a substantial number of participants have already been examined; because funds were exhausted, the remainder of the recruited sample could not be used. Investigating an additional condition will make fuller use of the available participant pool and increase the project's scientific value by comparing the traditional ImRs mechanism with its AI-generated version.

This pilot randomized controlled trial will explore the use of large language models (LLMs) to develop personalized therapeutic interventions. The study will focus on the emotional and psychophysiological effects of listening to autobiographical scenarios based on participants' childhood experiences of parental criticism.

All participants will be asked to recall and describe two critical and two neutral childhood memories. Based on this input, personalized scripts will be generated using Gemini, a large language model. Each script will be reviewed and, if necessary, modified by trained experimenters to ensure therapeutic coherence and alignment with imagery rescripting (ImRs) principles.

On Day 1, all participants will listen to the personalized critical scenarios during the laboratory session. After this initial scenario phase, the experimental group will receive the ImRs intervention: they will listen to modified versions of the critical memory scripts in which a therapist figure intervenes to address the child's unmet needs. Skin conductance will be recorded continuously throughout the session to assess physiological arousal. After each scenario, participants will rate their emotional intensity and specific feelings (e.g., fear, sadness) on Likert scales.

One week later, all participants will complete follow-up questionnaires assessing generalized anxiety (GAD-7) and the frequency of intrusive thoughts related to the memories. In addition, a panel of licensed cognitive-behavioral therapists will evaluate the generated scenarios for therapeutic quality. Their feedback will be used to assess the acceptability and coherence of AI-assisted therapeutic scripts.

The study will test the feasibility of using LLM-generated content in clinical settings and aims to determine whether such interventions can reduce distress and intrusiveness while eliciting measurable emotional and physiological responses.

Hypotheses:

1. The criticism scenarios generated by the model will elicit fearful responses in all participants.
2. The level of fearful reaction evoked by the AI-generated criticism scenarios will correlate with participants' baseline fear of failure.
3. Participants in the ImRs group will report fewer intrusive thoughts and lower generalized anxiety levels one week after the intervention.
4. The magnitude of Prediction Error (operationalized as the difference in SCL response between the hotspot and intervention parts of the scenario) will correlate positively with a decrease in the number of intrusive thoughts.
5. The magnitude of imagery difficulty during the rescripting part will correlate negatively with the magnitude of Prediction Error.
6. Subjective efficacy of the intervention will be predicted by working alliance.
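The Prediction Error hypothesis can be illustrated with a minimal sketch. It assumes that Prediction Error is computed per participant as the mean SCL during the hotspot minus the mean SCL during the intervention (the record only specifies "a difference", so the direction is an assumption) and that the decrease in intrusive thoughts is the pre-intervention count minus the follow-up count. The function names and all numbers are hypothetical and are not study data.

```python
from math import sqrt


def prediction_error(hotspot_scl, intervention_scl):
    """Mean SCL in the hotspot part minus mean SCL in the intervention part.

    The direction of the subtraction is an assumption made for illustration.
    """
    mean_hotspot = sum(hotspot_scl) / len(hotspot_scl)
    mean_intervention = sum(intervention_scl) / len(intervention_scl)
    return mean_hotspot - mean_intervention


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up SCL samples (microsiemens) and intrusion counts for three
# hypothetical participants; these are NOT study data.
participants = [
    {"hotspot": [6.0, 6.4], "intervention": [5.0, 5.2], "pre": 12, "post": 9},
    {"hotspot": [5.0, 5.4], "intervention": [4.9, 5.1], "pre": 10, "post": 9},
    {"hotspot": [7.0, 7.2], "intervention": [5.0, 5.2], "pre": 14, "post": 10},
]

pe = [prediction_error(p["hotspot"], p["intervention"]) for p in participants]
decrease = [p["pre"] - p["post"] for p in participants]
r = pearson_r(pe, decrease)
print(f"Prediction Error vs. decrease in intrusions: r = {r:.2f}")
```

A positive `r` in such an analysis would be in the direction the hypothesis predicts; the actual statistical model used in the trial is not stated in this record.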

Outcomes

Primary Outcomes

Generalized Anxiety Disorder DSM Scale (GAD)

DSM Scale (Craske et al., 2013): a brief dimensional self-rating questionnaire for generalized anxiety disorder (American Psychiatric Association, 2013; Polish translation made by the authors in 2020 using a standard back-translation method). The scale consists of 10 items relating to the thoughts, feelings, and behaviors the participants have experienced in the last 7 days. Answers are given on a 5-point Likert scale (0 = never, 4 = all the time), with scores ranging from 0 to 4.

Time frame: Screening, Pre-intervention (Day 1) and 1-week follow-up

The Performance Failure Appraisal Inventory

The Performance Failure Appraisal Inventory (PFAI) will be used to assess fear of failure. It is a 25-item questionnaire that measures the strength of subjective beliefs about the consequences of failure. The PFAI has five subscales: fear of experiencing shame and embarrassment; fear of devaluing one's self-esteem; fear of having an uncertain future; fear of important others losing interest; and fear of upsetting important others. The PFAI score ranges from 35 to 175, with higher scores indicating a higher level of fear of failure.

Time frame: Pre-intervention (Day 1)

Intrusive Thought Frequency (Rumination Inventory - adapted)

A modified version of the Event-Related Rumination Inventory will be used to assess the frequency and intrusiveness of thoughts related to the autobiographical criticism memories. The scale includes both intrusive and reflective rumination items. Scores range from 20 to 80, with higher scores indicating a higher level of event-related rumination. The change in scores between baseline and follow-up will serve as an index of the cognitive impact of the intervention.

Time frame: Pre-intervention (Day 1) and 1-week follow-up

Skin Conductance Level (SCL)

Electrodermal activity will be recorded continuously during the presentation of autobiographical scenarios to assess physiological arousal. The SCL signal will be analyzed during the baseline, critical, and neutral conditions, as well as during the imagery rescripting (ImRs) intervention (experimental group only). Data will be used to examine whether AI-generated criticism scripts elicit arousal.

Time frame: During experiment/intervention (Day 1)
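As a rough illustration of the segment-wise SCL analysis described above, the sketch below averages SCL samples within each labeled segment and expresses each segment relative to baseline. The segment labels follow the record, but the data layout, the function names, and the baseline subtraction are assumptions; the record does not specify the preprocessing pipeline, and the values are made up.

```python
from collections import defaultdict


def mean_scl_by_segment(samples):
    """Average SCL per labeled segment.

    `samples` is an iterable of (segment_label, scl_value) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for segment, value in samples:
        totals[segment] += value
        counts[segment] += 1
    return {segment: totals[segment] / counts[segment] for segment in totals}


def baseline_corrected(means, baseline_label="baseline"):
    """Subtract the baseline mean from every other segment mean.

    Baseline correction is an assumed analysis step, shown for illustration.
    """
    base = means[baseline_label]
    return {seg: m - base for seg, m in means.items() if seg != baseline_label}


# Made-up samples in microsiemens; NOT study data.
recording = [
    ("baseline", 4.0), ("baseline", 4.2),
    ("critical", 5.5), ("critical", 5.9),
    ("neutral", 4.3), ("neutral", 4.1),
]

means = mean_scl_by_segment(recording)
reactivity = baseline_corrected(means)
for segment, delta in reactivity.items():
    print(f"{segment}: {delta:+.2f} relative to baseline")
```

Comparing the baseline-corrected value for the critical segment with that for the neutral segment is one simple way to check whether the criticism scripts elicit arousal.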

Emotional Response Ratings (Subjective)

Participants will rate the intensity of emotional reactions (fear, sadness, arousal, etc.) using Likert scales after each presented scenario. Scores will range from 1 to 10, with higher scores indicating a more intense emotional reaction. These ratings will help determine emotional engagement and compare affective response between criticism and neutral content, as well as between groups.

Time frame: During experiment/intervention (Day 1)

Secondary Outcomes

Therapist Ratings of Script Quality

A panel of cognitive-behavioral therapists will rate the AI-generated and manually created scripts on a 10-point Likert scale ranging from 1 to 20, for therapeutic quality, coherence, and emotional relevance. Higher scores will indicate higher quality/coherence/emotional relevance. Average therapist ratings will be used to test hypotheses about the acceptability of LLM-generated therapeutic content.

Time frame: Prior to intervention

Questionnaire on the Perceived Effectiveness and Appropriateness of Imagery-Based Intervention

A custom-developed questionnaire will be administered to participants in the experimental group to assess their subjective evaluation of the imagery rescripting intervention. The measure includes items evaluating emotional intensity, difficulty with imagery, resistance to memory modification, trust in the therapeutic process, realism of the experience, and other affective and cognitive responses. Items are rated on 5-point Likert scales and grouped into subscales representing common therapeutic barriers and facilitators. Each subscale ranges from 4 to 20, with higher scores indicating greater difficulty with imagery techniques. The total and subscale scores will be analyzed to explore which factors are associated with intervention acceptance and perceived effectiveness.

Time frame: Post-intervention (Day 1)

TAPS Tool - Substance Use Screening

The Tobacco, Alcohol, Prescription Medication, and Other Substance Use Tool (TAPS) will be used to screen for problematic substance use. The self-report version of the TAPS consists of two parts: (1) past 12-month use of tobacco, alcohol, illicit drugs, and prescription medication for non-medical use. Scores will range from 1 (never) to 5 (every day/almost every day), with higher scores indicating a greater frequency of substance use. Scores will be used to exclude participants with probable substance use disorders, in line with the study's eligibility criteria.

Time frame: Screening only

Post-Traumatic Stress Symptoms Scale (DSM)

Self-report scale assessing PTSD symptoms as defined by DSM-5 criteria. The instrument includes 10 items evaluating symptom frequency over the previous 7 days on a 0-4 Likert scale. Total scores range from 0 to 40.

Working Alliance Inventory - Short Revised

The Working Alliance Inventory - Short Revised (WAI-SR) is a 12-item self-report measure assessing the perceived quality of the therapeutic alliance across three subscales: Goal, Task, and Bond. Each subscale ranges from 4 to 20, with higher scores indicating a stronger alliance on that factor. Total scores range from 12 to 60, with higher scores indicating a stronger working alliance. In this study, the WAI-SR will be adapted to assess participants' sense of connection and alliance with the AI-delivered intervention (e.g., the rescripting script and voice used). Scores will be used to explore whether the subjective sense of alliance predicts perceived effectiveness, emotional impact, or response to the intervention.

Time frame: Post-intervention (Day 1)
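The WAI-SR score ranges stated above follow from simple sums, as the sketch below shows. It assumes each of the 12 items is rated from 1 to 5 (inferred from four-item subscales ranging from 4 to 20). The item-to-subscale assignment is an assumption for illustration and should be checked against the adapted questionnaire actually administered; `score_wai_sr` is a hypothetical helper name.

```python
# Item-to-subscale assignment: an assumption for illustration only;
# verify it against the adapted instrument before use.
SUBSCALES = {
    "Goal": (4, 6, 8, 11),
    "Task": (1, 2, 10, 12),
    "Bond": (3, 5, 7, 9),
}


def score_wai_sr(responses):
    """Score 12 responses given as {item_number: rating in 1..5}.

    Returns the three subscale sums (each 4..20) and the total (12..60).
    """
    if set(responses) != set(range(1, 13)):
        raise ValueError("expected ratings for items 1 through 12")
    if any(not 1 <= rating <= 5 for rating in responses.values()):
        raise ValueError("ratings must be between 1 and 5")
    scores = {
        name: sum(responses[item] for item in items)
        for name, items in SUBSCALES.items()
    }
    scores["Total"] = sum(responses.values())
    return scores


# A hypothetical participant who answers 3 on every item.
example = score_wai_sr({item: 3 for item in range(1, 13)})
print(example)
```

The same summing pattern would apply to the custom imagery questionnaire, whose subscales are also described as ranging from 4 to 20.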
