A post by Paul Bejjani

A number of studies have pointed towards the existence of systematic cross-modal associations in the general population (Kitamura, Miyashita, Ozawa, Omata, & Imamiya, 2006; Palmer, Schloss, Xu, & Prado-León, 2013; Parise, Knorre, & Ernst, 2014; Slobodenyuk, Jraissati, Kanso, Ghanem, & Elhajj, 2015). Whether these cross-modal associations result from an inherently multi-modal neural structure, statistical regularity, or emotional processes, they do seem to be consistent and recurrent. These findings do not tell us whether cross-modal associations are necessary for the optimal processing of perceptual information. However, the prevalence of these associations, particularly among young children, along with their apparent universality, suggests that they likely have an evolutionary basis.

As such, significant irregularities in specific cross-modal associations may predict certain psychopathologies. For instance, difficulty in audio-visual cross-modal processing was found to be a common symptom of autism spectrum disorder (ASD), as demonstrated in a study by Bebko, Schroeder, and Weiss (2014), in which a sample of children with autism showed a lower rate of the McGurk effect than samples of typically developing children, children with Asperger Syndrome, and children with Down Syndrome. Moreover, other findings have indicated that audio-visual integration deficiencies may underlie the emotion comprehension deficiencies that are also common to individuals with ASD (Matsuda & Yamamoto, 2014).

While little research has addressed the importance of other modalities in audio-visual cross-modal processing mechanisms, it does seem that motor and haptic feedback are essential components of audio-visual cross-modal functioning. Therefore, emotion comprehension difficulties displayed by young children with autism might not originate in a specific audio-visual cross-modal dysfunction, but rather in a general inability to process multi-modal information. As such, music therapy interventions that encourage multi-modal sensory integration might be extremely potent in treating emotion comprehension deficiencies among young children with autism. In what follows, I make this claim, and suggest ways to test it.

Although the McGurk effect is due to an incongruence between a given auditory speech signal and the perceived visual articulation, it may also involve motor and haptic information. Without motor information, visual input would only serve as a gauge to evaluate superficial properties such as overall facial symmetries or feature dispositions. These properties provide few cues relevant to subcutaneous activity, which seems essential for extracting basic semantic information. According to the revised motor theory of speech perception, speech perception consists of matching motor actions displayed by the speaker with those represented in the listener’s brain (Liberman & Mattingly, 1985). Upon stimulation, these motor templates elicit linguistic information. This is also consistent with a study conducted by Vanvuchelen, Roeyers, and De Weerdt (2007), which reveals impairments in general motor skills and gestural imitation among individuals with autism.

Motor information is similarly impotent without related haptic information. Haptic sensations provide the necessary data linking aural and motor modalities. Alterations in sound vibrations and pressure variations are continuously detected throughout the vocal tract. These provide essential cues for the recognition of self-generated aural information, and allow the speaker to assess the accuracy of effected motor actions and their supposed concordance with the intended sound emissions. Accordingly, haptic feedback is a major component of musical training for all instrumentalists, including professional singers. This is especially true in the context of large choirs, where singers may have difficulty hearing their own voices. Without haptic feedback, choir singers would struggle considerably to monitor pitch accuracy and adequate volume, among other sonic properties (see the work of Sakajiri, Miyoshi, Nakamura, Fukushima, and Ifukube (2010) on the importance of haptic information for audio-motor cross-modal functioning in deaf-blind children).

In summary, audio-visual cross-modal processing appears to be intricately linked with the processing of haptic and motor information. Thus, it is possible that individuals with ASD have deficiencies in haptic and motor integration in addition to the observed deficiencies in audio-visual integration. One way to test whether impairment in multi-modal integration among young children with ASD is actually involved in emotion comprehension deficits would be to measure the therapeutic value of haptic, motor, audio, and visual cross-modal training with regard to these deficits.

In another study conducted by Matsuda and Yamamoto (2013), four young children with ASD were successfully taught to relate affective prosodies to facial expressions through audio-visual cross-modal matching-to-sample (MTS) training. During the MTS training phase, the children were required to attend to the affective prosody and then select the corresponding facial expression, by pointing towards the chosen picture or handing it to the experimenter. Although all children were able to generalize their training to new aural stimuli (female voice instead of male voice), the term used (“sensei”) as well as the models used for the comparison stimuli (facial expressions) were the same during post-tests and training. Therefore, it is uncertain whether these children could generalize their training to more complex cross-modal matching of emotional expression, with various aural and visual stimuli. Moreover, because of the design of the MTS training phase, it is likely that the children would not have developed haptic and motor integration, thereby limiting the improvement of audio-visual cross-modal functioning. As such, it is also probable that these children would perform poorly on a modified test with cross-modal matching of emotional expressions involving various aural and visual stimuli. Having the children verbally express the corresponding aural emotional expression themselves during MTS training might allow them to develop cross-modal mechanisms involving haptic and motor information, and consequently generalize audio-visual integration training.

This suggested modification to the MTS training procedure requires that children with ASD accurately and consistently emit a specific aural emotional expression. This would, however, be extremely difficult if not nearly impossible in light of the extensive verbal deficits observed in this population. A therapy that could provide haptic and motor integration training while accounting for possible verbal limitations would avoid this issue. As described by Wan, Bazen, Baars, Libenson, Zipse, Zuk, Norton, and Schlaug (2011), the Auditory-Motor Mapping Training (AMMT) intervention “aims to promote speech production directly by training the association between sounds and articulatory actions using intonation and bimanual motor activities.” More precisely, the therapist presents a certain target phrase by simultaneously singing the words and tapping a pair of drums, which are tuned to the same two pitches that are produced vocally. Through intensive repetition, the child with ASD is progressively trained to do the same without the assistance of the therapist. The study conducted by Wan et al. (2011) showed that children with ASD improved considerably in their verbal abilities after an 8-week therapy (40 sessions). As noted by the researchers, the potency of this method may be due to the excellent musical abilities often displayed by individuals with ASD, as well as their strong tendency to appreciate music-related activities. Put differently, verbal achievements are really the product of musical training, which is much more accessible to children with ASD than direct verbal training. Verbal improvements depend on the successful integration of the various musical components in the therapy, independently of the linguistic significance of vocal elements. This type of music therapy could therefore be very effective in developing auditory, haptic, motor, and visual integration among children with ASD.

Thus, it is likely that children trained with AMMT would perform better than children trained with standard MTS on a modified test of cross-modal matching of emotional expression involving various aural and visual stimuli. In addition, unlike standard MTS therapy, AMMT does not involve explicit emotional expression identification tasks, in that no specific emotional intonations or expressions are involved in training tasks. Instead, the participants are required to use motor, haptic, auditory, and visual cues to appropriately replicate the given word’s pitch and rhythm variations. If children treated with AMMT do in fact perform significantly better on complex cross-modal matching of emotional expression than children trained with standard MTS, this would provide substantial evidence for the importance of multi-modal interactions in emotion comprehension.

Photo credit: Washington University


Bebko, J. M., Schroeder, J. H., & Weiss, J. A. (2014). The McGurk effect in children with autism and Asperger syndrome. Autism Research, 7, 50–59. doi:10.1002/aur.1343

Happé, F., & Frith, U. (2006). The weak coherence account: Detail-focused cognitive style in autism spectrum disorders. Journal of Autism and Developmental Disorders. doi:10.1007/s10803-005-0039-0

Kitamura, E., Miyashita, K., Ozawa, K., Omata, M., & Imamiya, A. (2006). Cross-Modality Between Haptic and Auditory Roughness with a Force Feedback Device. Journal of Robotics and Mechatronics, 18(4), 450–457.

Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition, 21, 1–36.

Matsuda, S., & Yamamoto, J. (2013). Intervention for increasing the comprehension of affective prosody in children with autism spectrum disorders. Research in Autism Spectrum Disorders, 7, 938–946.

Matsuda, S., & Yamamoto, J. (2014). Intramodal and cross-modal matching of emotional expression in young children with autism spectrum disorders. Research in Autism Spectrum Disorders, 10, 109–115.

Palmer, S. E., Schloss, K. B., Xu, Z., & Prado-León, L. R. (2013). Music–color associations are mediated by emotion. PNAS, 110(22), 8836–8841.

Parise, C. V., Knorre, K., & Ernst, M. O. (2014). Natural auditory scene statistics shapes human spatial hearing. PNAS, 111(16), 6104–6108.

Sakajiri, M., Miyoshi, S., Nakamura, K., Fukushima, S., & Ifukube, T. (2010). Voice pitch control using tactile feedback for the deafblind or the hearing impaired persons to assist their singing. doi: 10.1109/ICSMC.2010.5642329

Slobodenyuk, N., Jraissati, Y., Kanso, A., Ghanem, L., & Elhajj, I. (2015). Cross-modal associations between color and haptics. Attention, Perception, & Psychophysics.

Vanvuchelen, M., Roeyers, H., & De Weerdt, W. (2007). Nature of motor imitation problems in school-aged boys with autism: A motor or a cognitive problem? Autism, 11, 225–240.

Wan, C. Y., Bazen, L., Baars, R., Libenson, A., Zipse, L., et al. (2011). Auditory-motor mapping training as an intervention to facilitate speech output in non-verbal children with autism: A proof of concept study. PLoS ONE, 6(9), e25505. doi:10.1371/journal.pone.0025505