The goal of this observational study is to learn whether computer analysis of voice recordings can detect Type 2 diabetes in adults. The main questions it aims to answer are:

* Can advanced voice analysis accurately identify participants with Type 2 diabetes or pre-diabetes based on vocal biomarkers?
* How do voice-based predictions compare to HbA1c blood test results for diabetes screening?
* Can machine learning approaches effectively address the challenge of undiagnosed diabetes in population screening?

Participants will:

* Record themselves reading a short passage and answering brief questions out loud in a single online session.
* Complete health questionnaires about diabetes risk factors, medications, and general health status.
* Use their own devices (computer, tablet, or smartphone) to complete all study activities online from home.

A subset of participants (n=1,000) will also provide a blood sample through an at-home HbA1c testing kit, so that voice-based predictions can be validated against laboratory results.
This study addresses a critical challenge in Type 2 diabetes detection: approximately 30% of individuals with diabetes remain undiagnosed, equating to roughly 1 million adults in the UK. Current screening relies on opportunistic testing, and uptake among those offered NHS Health Checks is only 40.4%. This highlights the need for innovative, accessible screening approaches that can identify at-risk individuals before complications develop.

STUDY RATIONALE AND INNOVATION: Recent research indicates that diabetes affects voice production through multiple physiological pathways across the spectrum of the disease:

* Peripheral neuropathy impairs vocal cord control and speech articulation.
* Autonomic neuropathy affects breathing patterns and vocal dynamics.
* Xerostomia (dry mouth) from neuropathic damage alters resonance characteristics.
* Glucose fluctuations modify the elastic properties of the larynx and vocal cords.

These findings, together with early evidence from initial studies, suggest that voice analysis could be used to screen for diabetes. The current study builds on these voice-diabetes associations, using machine learning to develop a non-invasive, scalable screening solution.

STUDY DESIGN AND METHODOLOGY: This two-stage observational study combines large-scale data collection with targeted biological validation. In Stage 1, 10,000 participants complete voice recordings and comprehensive health questionnaires through a secure online platform. In Stage 2, 1,000 of these participants, selected so that their results are as informative as possible for validation, complete HbA1c home testing. Each assessment session lasts approximately 10-20 minutes and includes standardised voice recording tasks alongside validated health questionnaires. The voice tasks consist of reading text aloud and answering brief questions, and are designed to capture diverse vocal characteristics while remaining consistent across participants.
TECHNICAL APPROACH: Voice data will be analysed using multiple specialised software tools to extract acoustic and linguistic features potentially associated with diabetes. Machine learning algorithms will then be used to identify patterns in these features that may indicate diabetes status.

The approach addresses the challenge of undiagnosed diabetes in the general population by developing models that account for the uncertainty inherent in self-reported diabetes status. Rather than treating all participants without a diagnosis as definitively healthy, the investigators use computational methods that allow for the possibility that some of these participants have undiagnosed diabetes. Participants are selected for biological validation so as to maximise the information gained from each HbA1c test, improving validation efficiency whilst limiting testing costs.

Ground truth validation uses CE-marked HbA1c home testing kits with ≥95% documented accuracy, applying the standard clinical threshold of HbA1c ≥6.5% (48 mmol/mol) for diabetes diagnosis. This biological validation allows model performance to be assessed against confirmed diabetes status rather than self-reported diagnoses alone, providing stronger evidence on the clinical utility of voice-based diabetes screening.

DATA COLLECTION AND MANAGEMENT: All data collection occurs remotely through a secure web-based platform accessible via standard internet browsers (Chrome, Firefox, Safari). Participants use their personal devices (computers, tablets, or smartphones) equipped with a microphone. The platform captures voice recordings; questionnaire responses, including PHQ-8 and GAD-7 for mental health assessment; and a comprehensive health history covering cardiovascular, kidney, and respiratory conditions that may confound voice-diabetes associations.
Data are pseudonymised using universally unique identifiers (UUIDs), and raw audio recordings and extracted features are stored in encrypted AWS S3 buckets. Personally identifiable information is deleted once data collection is complete, while research data are retained for up to 10 years for analysis and publication purposes, in compliance with UK GDPR.

SAMPLE SIZE CONSIDERATIONS: The target enrolment of 10,000 participants in Stage 1 reflects the requirements of robust machine learning model development. It provides a diverse dataset that includes individuals with diagnosed diabetes, undiagnosed diabetes, and pre-diabetes, as well as otherwise healthy individuals, and allows representation across the demographic groups, ages, and health conditions needed to develop generalisable voice biomarker models. Selecting 1,000 participants for Stage 2 biological validation is intended to provide sufficient positive and negative cases for model validation while limiting the cost of HbA1c testing.

The study addresses critical gaps in early diabetes detection, with the aim of providing a scalable, cost-effective screening solution that could significantly improve population health outcomes through earlier identification and intervention.
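The pseudonymisation step described under data collection and management can be illustrated with a minimal sketch. It assumes each raw record carries one direct identifier; the field names `email` and `participant_id` and the function `pseudonymise` are hypothetical and do not come from the study protocol. Encryption and S3 storage are not shown.

```python
import uuid


def pseudonymise(records, id_field="email"):
    """Swap a direct identifier for a random UUID.

    Returns (research_rows, key_table). The key table links identifiers to
    UUIDs and is the part that would be deleted after data collection.
    """
    key_table = {}      # identifier -> UUID string (hypothetical linkage table)
    research_rows = []
    for record in records:
        identifier = record[id_field]
        # Reuse the same UUID if a participant appears more than once.
        if identifier not in key_table:
            key_table[identifier] = str(uuid.uuid4())
        # Copy every field except the direct identifier.
        row = {k: v for k, v in record.items() if k != id_field}
        row["participant_id"] = key_table[identifier]
        research_rows.append(row)
    return research_rows, key_table
```

Keeping the linkage table separate from the research rows means that deleting it leaves the research data without a path back to the original identifier, which is the property the retention policy above relies on.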
Study Type
OBSERVATIONAL
Enrollment
7,319
Online
Nationwide, United Kingdom
Accuracy of AI Model for Type 2 Diabetes Classification as Assessed by Voice Biomarker Analysis
Binary classification performance (presence vs. absence of Type 2 diabetes) of the artificial intelligence-based system using voice biomarker analysis, with HbA1c laboratory results (≥48 mmol/mol threshold) serving as ground truth. Performance will be measured using sensitivity (target ≥65%), specificity (target ≥65%), and area under the receiver operating characteristic curve (AUC target ~0.70) through cross-validation methods.
Time frame: Single assessment session at enrolment with HbA1c validation results obtained within 2 months of submission of voice measurement.
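The three performance measures named in this outcome can be computed as in the sketch below. It is an illustration only, not the study's analysis code: the function names are hypothetical, labels are assumed to be 1 for HbA1c ≥48 mmol/mol and 0 otherwise, and the cross-validation loop is omitted.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: ground truth from HbA1c values, scores from a model.
hba1c = [55, 50, 49, 45, 40, 38, 36]                # mmol/mol
y_true = [1 if v >= 48 else 0 for v in hba1c]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]        # model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed cut-off of 0.5
sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Sensitivity and specificity depend on the chosen score cut-off, whereas AUC summarises ranking quality across all cut-offs, which is why the outcome reports both.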
Detection of Pre-diabetes Using Voice Biomarker Analysis
Classification performance for identifying pre-diabetic states (HbA1c 42-47 mmol/mol) using voice biomarker analysis compared to laboratory HbA1c results. Performance measured using sensitivity, specificity, and AUC metrics.
Time frame: Single assessment session at enrolment with HbA1c validation within 2 months of submitting voice measurement.
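The record quotes HbA1c both as a percentage (≥6.5%) and in mmol/mol (≥48, and 42-47 for pre-diabetes). The sketch below shows how the two units relate and how a result maps to the categories used in the outcome measures. It uses the standard IFCC-NGSP master equation; the function names and category labels are hypothetical, and results are assumed to be reported as whole mmol/mol values.

```python
def ngsp_percent_to_ifcc(percent):
    """Convert HbA1c from NGSP percent to IFCC mmol/mol."""
    return 10.929 * (percent - 2.15)


def hba1c_category(mmol_per_mol):
    """Map an HbA1c result in mmol/mol to the ranges quoted in this record."""
    if mmol_per_mol >= 48:
        return "diabetes"
    if mmol_per_mol >= 42:
        return "pre-diabetes"
    return "below pre-diabetes range"


# 6.5% converts to about 47.5 mmol/mol, which rounds to the 48 mmol/mol threshold.
threshold_mmol = round(ngsp_percent_to_ifcc(6.5))
```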