According to the World Health Organization, one in 160 children worldwide has an autism spectrum disorder (ASD). About 25% to 30% of these children are unable to use verbal language to communicate (non-verbal ASD) or are minimally verbal, i.e., use fewer than 10 words (mv-ASD). The ability to communicate is a crucial life skill, and difficulties with communication can have a range of negative consequences, such as poorer quality of life and behavioural difficulties. Communication interventions generally aim to improve children's ability to communicate either through speech or by supplementing speech with other means (e.g., sign language, pictures, or augmentative and alternative communication tools, AAC). Individuals with non-verbal ASD or mv-ASD often communicate through vocalizations that in some cases have a self-consistent phonetic association with concepts (e.g., "ba" to mean "bathroom") or are onomatopoeic expressions (e.g., "woof" to refer to a dog). In most cases, however, the vocalizations sound arbitrary: even though they vary in tone, pitch, and duration, it is extremely difficult to interpret the intended message or the emotional or physical state the individual wants to convey. This creates a barrier between persons with ASD and the rest of the world that causes stress and frustration. Only caregivers who have a long-term acquaintance with the subjects are able to decode such wordless sounds and assign them unique meanings. This project aims to define algorithms, methods, and technologies to identify the communicative intent of vocal expressions generated by children with mv-ASD, and to create tools that help people who are not familiar with the subjects to understand these individuals during spontaneous conversations.
Study Type
INTERVENTIONAL
Allocation
NA
Purpose
BASIC_SCIENCE
Masking
NONE
Enrollment
33
Clinical evaluation of participants by means of the Autism Diagnostic Observation Schedule
The project tests and adapts the technology developed at MIT for vocalization collection and labeling, and contributes to data gathering among Italian subjects (and to validating the quality of these data) in order to create a multi-cultural dataset and to enable cross-cultural studies and analyses. Next, the focus is placed on the analysis of harmonic features of the audio in the vocalizations of the dataset to identify recurring individual features and patterns corresponding to specific communicative purposes or emotional states. Supervised and unsupervised machine learning approaches are developed, and different machine learning algorithms are compared to identify the most accurate ones for the project goal. Last, an exploratory evaluation of the vocalization-understanding machine learning model is conducted to test the usability and utility of the tool for vocalization interpretation.
Scientific Institute, IRCCS Eugenio Medea
Bosisio Parini, Lecco, Italy
Frequency of audio signal samples and their associated labels
Frequency (measured in number per hour) of audio signal samples (sounds and verbalizations) produced by each participant and recorded during hospital stays in various contexts (i.e., during educational interventions and/or in moments of unstructured play), labeled as self-talk, delight, dysregulation, frustration, request, or social exchange. A small, wireless recorder (Sony TX800 Digital Voice Recorder TX Series) will be attached to the participant's clothing using strong magnets. Next, the adults (caregivers and/or operators) will use a web app to associate each sound produced by the child with a label, i.e., an affective state and/or the probable meaning of the vocalization.
Time frame: immediately after the intervention
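The per-hour frequency described above can be illustrated with a minimal Python sketch. The function name `label_rates_per_hour`, the input layout (one label string per recorded sample), and the example session are assumptions made for illustration; the record does not describe how the study computes this measure.

```python
from collections import Counter

# Label set taken from the outcome description above.
LABELS = (
    "self-talk",
    "delight",
    "dysregulation",
    "frustration",
    "request",
    "social exchange",
)


def label_rates_per_hour(sample_labels, recording_seconds):
    """Return the number of labeled samples per hour for each known label.

    sample_labels: iterable with one label string per audio sample (assumed layout).
    recording_seconds: total recording duration in seconds.
    Labels outside LABELS are ignored in the result.
    """
    if recording_seconds <= 0:
        raise ValueError("recording_seconds must be positive")
    hours = recording_seconds / 3600.0
    counts = Counter(sample_labels)
    return {label: counts.get(label, 0) / hours for label in LABELS}


# Hypothetical session: 2 hours of recording with 5 labeled samples.
rates = label_rates_per_hour(
    ["request", "delight", "request", "self-talk", "request"],
    recording_seconds=7200,
)
print(rates["request"])  # 1.5
```

Reporting a rate for every label, including those with zero occurrences, keeps the output comparable across participants and sessions.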
Participant-specific harmonic features derived from the audio signal samples
Temporal and spectral audio features (i.e., pitch-related, formant, energy-related, timing, and articulation features) extracted from the samples and then used for supervised and unsupervised machine learning analysis. The collected audio signal samples will be segmented in the proximity of the temporal locations of the labels, and each segment will be associated with the temporally adjacent labels (affective states or probable meaning of vocalizations). Audio harmonic features (temporal/phonetic characteristics) will then be identified for each participant using supervised/unsupervised machine learning analysis of the audio signal samples. Through this process, participant-specific patterns corresponding to specific communicative purposes or emotional states will be identified.
Time frame: immediately after the intervention
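The segmentation step described above can be sketched in plain Python as follows. Everything specific here is an assumption for illustration: the half-second window, the function names, and the use of RMS energy and zero-crossing rate as simple stand-ins for the energy-related and timing features. The record does not state the study's actual window size or feature definitions.

```python
import math


def segment_around_label(signal, sample_rate, label_time_s, half_window_s=0.5):
    """Return the samples within +/- half_window_s of a label timestamp.

    The window is clipped to the bounds of the signal.
    half_window_s is an illustrative choice, not a value from the study.
    """
    center = int(round(label_time_s * sample_rate))
    half = int(round(half_window_s * sample_rate))
    start = max(0, center - half)
    end = min(len(signal), center + half)
    return signal[start:end]


def rms_energy(segment):
    """Root-mean-square amplitude, a basic energy-related feature."""
    if not segment:
        return 0.0
    return math.sqrt(sum(x * x for x in segment) / len(segment))


def zero_crossing_rate(segment):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(segment) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(segment, segment[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(segment) - 1)


# Synthetic 3-second signal at 10 samples per second:
# silence, then an alternating burst, then silence.
sample_rate = 10
signal = [0.0] * 10 + [1.0, -1.0] * 5 + [0.0] * 10

# Hypothetical label placed at 1.5 s, in the middle of the burst.
segment = segment_around_label(signal, sample_rate, label_time_s=1.5)
features = {
    "label": "request",
    "rms_energy": rms_energy(segment),
    "zero_crossing_rate": zero_crossing_rate(segment),
}
print(len(segment), features["rms_energy"], features["zero_crossing_rate"])
```

Pitch and formant features would normally come from a dedicated audio analysis library; they are left out so that the sketch stays self-contained.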
Accuracy of machine learning prediction
The classification accuracy of the machine learning analysis, i.e., the number of correct predictions divided by the total number of predictions, which will be tested on a held-out test set of recorded audio signal samples. This outcome measure will estimate the usability/utility of the developed tool for vocalization interpretation based on a machine learning analysis of the recorded audio signal samples.
Time frame: immediately after the intervention
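The accuracy definition above (correct predictions divided by total predictions) can be written as a short Python sketch. The function name and the example labels are hypothetical and do not represent study data.

```python
def classification_accuracy(true_labels, predicted_labels):
    """Return the number of correct predictions divided by the total number."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one prediction is required")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


# Hypothetical held-out test set of four labeled vocalizations.
y_true = ["request", "delight", "frustration", "request"]
y_pred = ["request", "delight", "request", "request"]

accuracy = classification_accuracy(y_true, y_pred)
print(accuracy)  # 0.75
```

With six label classes that may occur at very different rates, accuracy alone can hide poor performance on rare labels, so per-label results would be a natural complement; the record itself specifies only overall accuracy.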