Pharmacists currently perform an independent double-check to identify drug-selection errors before they can reach the patient. However, the use of machine intelligence (MI) to support this cognitive decision-making work by pharmacists does not exist in practice. This research examines the effect of the timing of MI advice to determine whether it results in lower task time, increased accuracy, and increased trust in the MI.
Pharmacists currently perform an independent double-check to identify drug-selection errors before they can reach the patient. However, the use of machine intelligence (MI) to support this cognitive decision-making work by pharmacists does not exist in practice. Instead, pharmacists rely solely on reference images of the medication, which they compare to the prescription vial contents. Previous research has shown that decision support systems can effectively improve the efficiency and accuracy of healthcare delivery while preventing adverse drug events. However, little is known about how MI technologies affect pharmacists' work performance and cognitive demand.

To facilitate a long-term symbiotic relationship between pharmacists and the MI system, appropriate trust needs to be established. While trust has been identified as the central factor for effective human-machine teaming, issues arise when humans place unjustified trust in automated technologies or do not place enough trust in them. Overtrust in automation can lead to complacency and automation bias; for instance, pharmacists may rely on the MI system to the extent that they blindly accept any recommendation it makes. Undertrust can result in disuse and potential abandonment of the MI system.

Furthermore, little is known about the effect of the timing of MI advice on pharmacists' work performance. For example, showing the MI's advice while the pharmacist is performing the medication verification task may yield different results than showing it after the pharmacist has made their decision. The study investigators have developed an MI system for medication image classification. The objective of this study is to examine the effect of the timing of MI advice to determine whether it results in lower task time, increased accuracy, and increased trust in the MI.
Study Type
INTERVENTIONAL
Allocation
RANDOMIZED
Purpose
OTHER
Masking
NONE
Enrollment
68
Participants will complete the medication verification task without any MI help.
Participants will receive MI advice in the form of a pop-up message if their decision differs from the MI's determination.
MI help will be displayed concurrently with the filled and reference images.
University of Michigan
Ann Arbor, Michigan, United States
Reaction Time
Difference in task time, measured in seconds from starting the task to accepting or rejecting a medication image.
Time frame: Throughout the verification task
Decision Accuracy
Difference in detection rate, measured by the number of medication verification errors across all participants in the Arm/Group.
Time frame: Throughout the verification task
Trust Change
Participants will complete 100 mock medication verification trials in each of the study arms (i.e., Scenario 1, Scenario 2, and No Help). After each trial in Scenario 1 and Scenario 2, participants will use a visual analog scale (VAS) to respond to the question: "How much do you trust the AI advice?" The endpoints of the 100-point VAS are 'Not at all' and 'Completely trust'. Participants indicate their level of trust in the MI advice after every trial on a scale from 1 to 100, with higher scores indicating greater levels of trust. Trust change, as measured by the VAS, will be calculated using the following formula: Trust change(i) = Trust(i) - Trust(i - 1), where i = 2, 3, ..., 100. To compute a single summary value for trust change within a scenario, the individual trust change scores from the trials are averaged, providing an overall measure of how trust shifted across the duration of the scenario.
Time frame: After every trial in Scenarios 1 and 2
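For illustration only (not part of the study protocol), here is a minimal Python sketch of the calculation above, assuming a participant's 100 VAS ratings are stored in a list in trial order; the function name is hypothetical:

```python
def average_trust_change(trust_ratings):
    """Average per-trial trust change for one scenario.

    trust_ratings: VAS scores (1-100), one per trial, in trial order.
    Trust change(i) = Trust(i) - Trust(i - 1), for i = 2..len(trust_ratings).
    """
    changes = [trust_ratings[i] - trust_ratings[i - 1]
               for i in range(1, len(trust_ratings))]
    return sum(changes) / len(changes)
```

Note that because consecutive differences telescope, this average equals (Trust(100) - Trust(1)) / 99, i.e., the net drift in trust per trial across the scenario.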
Trust
Trust will be assessed using Muir and Moray's (1996) Trust in Automation scale. Scores range from 0 to 100, with higher scores indicating greater levels of trust.
Time frame: Post-intervention in Scenarios 1 and 2.
Cognitive Effort
Participants' eye movements will be tracked using a browser-based online eye tracking system. The outcome measure is the difference in cognitive effort as measured by fixation count in the defined areas of interest: fill image, reference image, or MI plot. Higher fixation counts indicate repeated attention to a given area.
Time frame: Throughout the verification task
Cognitive Effort
Participants' eye movements will be tracked using a browser-based online eye tracking system. The outcome measure is the difference in cognitive effort as measured by the duration of fixations in the defined areas of interest: fill image, reference image, or MI plot. Longer fixation durations indicate higher cognitive load.
Time frame: Throughout the verification task
Workload
Participants will complete 100 mock medication verification trials in each of the 3 arms. The workload of each arm will be measured by the NASA Task Load Index (TLX). The 5 TLX dimensions assessed are: mental demand, effort, temporal demand, performance, and frustration. For each dimension, participants will indicate their response to a single question. For 4 of the dimensions, the endpoints of the Likert scale are 'very low' and 'very high'. The performance dimension is reverse-scored, with endpoints of 'perfect' and 'failure'. Participants then complete 10 pairwise comparisons of the dimensions by indicating which dimension they consider to be a more important factor (e.g., effort vs. frustration). Each dimension's rating is multiplied by the number of times that dimension was selected in the pairwise comparisons; these weighted ratings are summed and divided by 10 to yield an overall weighted workload score between 1 and 20, with higher scores indicating higher workload.
Time frame: After completing 100 mock verification trials in each arm
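For illustration only (not part of the study materials), a minimal Python sketch of the weighted TLX calculation described above, assuming ratings on a 1-20 scale with the performance dimension already reverse-scored; the function name and example values are hypothetical:

```python
def weighted_tlx(ratings, weights):
    """Overall weighted NASA-TLX workload score (1-20).

    ratings: rating (1-20) for each of the 5 dimensions, with the
             performance dimension already reverse-scored.
    weights: number of times each dimension was chosen across the
             10 pairwise comparisons; the weights sum to 10.
    """
    assert sum(weights.values()) == 10
    return sum(ratings[d] * weights[d] for d in ratings) / 10

ratings = {"mental demand": 14, "effort": 12, "temporal demand": 9,
           "performance": 6, "frustration": 4}
weights = {"mental demand": 4, "effort": 3, "temporal demand": 2,
           "performance": 1, "frustration": 0}
print(weighted_tlx(ratings, weights))  # (56 + 36 + 18 + 6 + 0) / 10 = 11.6
```

With 5 dimensions there are exactly C(5,2) = 10 pairwise comparisons, so the weights always sum to 10 and the weighted score stays on the same 1-20 scale as the ratings.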
Usability
Participants will complete 100 mock medication verification trials in each of the 3 arms (No MI Help, Scenario 1, and Scenario 2). After completing 100 trials, participants will assess the mock verification interface using the System Usability Scale (SUS). The SUS comprises 10 statements; participants indicate their agreement with each using a 5-point Likert scale ranging from 'strongly agree' to 'strongly disagree'. Odd-numbered statements are positively worded and even-numbered statements are reverse-scored. Scores are summed and multiplied by 2.5 to get a final SUS score. SUS scores range from 0 to 100, with higher scores indicating greater usability. The average SUS score is considered to be 68. Anything below 50 is considered "Not Acceptable", scores between 50 and 70 are considered "Marginal", those above 70 are considered "Acceptable", and those at 80 or above indicate high usability.
Time frame: After completing 100 mock verification trials in each arm
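For illustration only (not part of the study materials), a minimal Python sketch of the SUS scoring described above, assuming responses coded 1-5 with 5 = 'strongly agree'; the function name is hypothetical:

```python
def sus_score(responses):
    """System Usability Scale score from 10 Likert responses (1-5).

    responses[0] is statement 1, responses[1] is statement 2, and so on.
    Odd-numbered statements (positively worded): contribution = response - 1.
    Even-numbered statements (reverse-scored):   contribution = 5 - response.
    The summed contributions (0-40) are multiplied by 2.5, giving 0-100.
    """
    total = 0
    for i, r in enumerate(responses):
        total += (r - 1) if i % 2 == 0 else (5 - r)
    return total * 2.5

# Example: agreeing strongly with every positive statement and disagreeing
# strongly with every reverse-scored one yields the maximum score.
print(sus_score([5, 1] * 5))  # 40 * 2.5 = 100.0
```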