A brief introduction to the modality effect

The gist of the modality effect is that the mode in which we learn new information affects how well we can use, and especially recall, that information. For twenty-odd years, from the 1960s until the early 1990s, the term “modality effect” generally referred to the advantage of sound over print. From the 1990s on, research on modality typically compared multimedia, including animation, to a single format, usually print or static images.

Three crucial theories of memory influenced modality research from the mid-1960s to the early 1970s. The first was Atkinson and Shiffrin’s modal model, which suggested that the mind works like a computer: some information moves linearly into short-term memory, and only some of that moves on into long-term memory. The model theorized that reaching long-term memory depends on mental “rehearsal”. https://www.simplypsychology.org/multi-store.html is a brief summary of this cognitive theory.

In response to the modal model, Crowder and Morton theorized in 1969 that there was precategorical acoustic storage for sound. In other words, before your brain works out how to categorize what you hear, the input is “saved” in your short-term memory as raw acoustic sound. They argued that Atkinson and Shiffrin were wrong about rehearsal, at least for the things we hear, and suggested that precategorical sound could rapidly decay or be wiped by additional audio. https://www.researchgate.net/publication/225757678_Precategorical_Acoustic_Storage_PAS is a good, short summary of their theory.

The second theory was Baddeley and Hitch’s 1974 idea of a three-part short-term memory system (now called the Baddeley Model of Working Memory). This system, they suggested, had a visuo-spatial sketchpad and a phonological loop, which separated out the different kinds of information being learned. During the same period, Allan Paivio (1971; 1990; 1991) was first developing his complementary theory of dual coding, which hypothesized that there are two channels for processing information, one visual and one verbal.

Until roughly the early 1990s, the research on audio and memory often focused on a recency effect. If you heard a group of nonsense syllables or numbers, you were better able to remember them in the short term than if you read the same items. Murdock’s 1968 study of serial and paired items is typical of the modality effect studies done during this period, as is a more commonly cited study by Murdock and Walker (1969).

There was some interesting audio modality research testing whether adding the same suffix to all audio words would interfere with the modality effect (it didn’t; Engle, 1974), and whether silence or audio distractions would affect it (Gathercole, Gregg and Gardiner found in 1983 that a short period of audio distraction didn’t cancel the audio modality effect, but more than thirty seconds did). Work by Drewnowski and Murdock (1980) expanded the field considerably by testing well over 1,000 words, after recognizing that earlier studies were more likely to have used acoustically similar words that could potentially influence recency. Penney’s 1989 theory of two codes was also influential. She proposed an A (“auditory”) code, which leads to the modality effect for recency, and a P code for phonological material, which could come both from articulated words (e.g. spoken aloud) and from words read visually. She argued that modality would shape not only short-term processing but also long-term memory processing. Surprenant, Pitt, and Crowder’s 1993 study, I would argue, marks the point where audio modality “fell out of fashion” in research. The exceptions are researchers looking at audio in communications (such as advertising in broadcasting) and in language learning (Vidal, 2011).

In 1973, Paivio and Csapo introduced the picture superiority effect as another question of modality: it appeared that visual images were better remembered in free recall than text describing the images.

Educational television and multimedia were a trend that accelerated in the 1980s and 1990s as computer animation and video became more accessible to educators, and they brought an increase in articles about the multimedia modality effect. Thompson and Paivio (1994) theorized that recency would be superior if subjects watched multimedia, rather than listening to audio imagery or looking at static images. Through this article and other research, Paivio’s dual coding theory became more commonly cited in educational psychology. Paivio’s 1990 book Mental representations: A dual coding approach has been cited over 8,000 times, as has Imagery and verbal processes (Paivio, 2013; 1971), which was cited 3,160 times in the last 10 years, almost as many references as it received in the first 30 years of the book (about 3,700 according to Google Scholar).

Richard Mayer, the coauthor of our textbook, is by far the most influential writer on related multimedia modality research. He draws on Paivio’s dual coding concept (Mayer & Sims, 1994) and on the cognitive load theory suggested by Sweller (1994), which builds on dual coding. Mayer and Roxana Moreno wrote many works in the field that relate to the multimedia modality effect, cognitive load theory, and working memory (Mayer & Moreno, 1998; Mayer & Moreno, 2002; Mayer & Moreno, 2003; Moreno & Mayer, 1999; Moreno & Mayer, 2000; Moreno & Mayer, 2002). Sadly, Moreno died of cancer in 2010. A group of Dutch authors, JJG van Merrienboer, HK Tabbers and RL Martens, have also been extremely influential for their studies of cognitive load theory, modality and multimedia (Tabbers, Martens, & Merrienboer, 2004). A large meta-analysis of modality effect papers by Ginns (2005) is a really helpful breakdown.

Meanwhile, there’s little on audio modality despite the growing popularity of podcasting. Crutcher and Beer’s 2011 paper suggesting the existence of an audio superiority effect is fascinating, but it is nowhere near as widely cited or read as most of what exists on multimedia modality. Their research implies that you would remember an environmental sound better than the spoken words describing it (e.g. an elephant’s roar, versus the word “elephant”).

When I researched modality effects, there were excellent pieces such as the study by Crooks, Cheon, Inan, Ari & Flores (2012) reinforcing the multimedia modality effect. But I also found some mixed results. One study found that students learning a new health warning had better recall from print or audio than from a video presentation (Corston & Colman, 1997), and suggested that this has to do with self-pacing. In other words, if you’re a fast reader, you can read the same item multiple times in the time it would take to hear a speaker read a script once, and you can also choose to reread just the parts that confuse you. Byrne and Curtis (2000) likewise found that a similar message was better remembered in print format. (Strangely, they tested three different multimedia messages and found that students had the highest multimedia recall for the one that, theoretically, should have been the least helpful: a talking-head video with no relevant images.)

Relating to our discussion from the other night, O’Keefe (1990), a communications researcher studying persuasion, made the point that watching a videotape (or, arguably, listening to a podcast) often means that learners don’t have the same level of self-pacing, or signaling, that they would have with printed text. We can flip rapidly through the pages of a book, or use the table of contents or page numbers to pace ourselves, but in a long audio or video file that hasn’t been properly chunked or formatted by the creator, we don’t always have the same freedom. Increasingly, videos are set up to let us fast-forward visually to see where we want to go back or forward, but with audio, there still isn’t anything quite like that. As Austin pointed out, JAWS/NVDA works that way for blind listeners, but in my experience, most sighted people find it hard to imagine listening at the same speed. Witness the following video:

Notice how quickly he listens as he browses by audio.

Furnham, Gunter & Green (1990) argued in a piece about recall and modality that providing a print script for something originally produced as an audio-visual presentation doesn’t mean the two will translate easily into one another, stylistically. That’s pretty good common sense: we know that the audiovisual script was deliberately written with the assumption of a descriptive photograph or footage, so will we really get all we need if one of the components is dropped? This is why, in a nutshell, media usability for people with visual and hearing disabilities is so crucial. It is why people with visual disabilities use “Descriptive Video” to watch television or movies, so someone can tell them, “Elizabeth hesitates and paces as she reads Mr. Darcy’s letter.” Otherwise they’re just hearing someone thumping around.

Many people who do not use JAWS/NVDA every day could conceivably learn how to listen at high speed: for instance, playback of a screen capture of a PowerPoint lecture could go at double speed. But there is still a stylistic cue for them to use – the images of the PowerPoint!

However, some research ignores the stylistic cues in media design. A really good example of this is a study by Daniel & Woody (2010) that was negative about the use of podcasting for learning new concepts. I would argue that their study was flawed: the researchers took a difficult, 3,000-word article and provided it to developmental undergraduates, either in print or in a supposedly “podcast” format, which consisted of a person reading the same article aloud for 22 minutes. If the choice is between reading a complex 3,000-word article and listening to someone read it aloud for 22 minutes, with no way of zipping back and forth to the “important bits” of the audio file other than guesswork, which would you choose? The students specifically complained that they couldn’t review the podcast easily, but the researchers concluded from the test results that the podcast listeners simply didn’t learn as much. Stylistically, though, the file they created isn’t characteristic of podcasts. Imagine if Crutcher and Beer’s “audio superiority effect” had been tested here, for instance.

Byrne, M., & Curtis, R. (2000). Designing health communication: Testing the explanations for the impact of communication medium on effectiveness.  British Journal of Health Psychology, 5(2), 189-199. doi:10.1348/135910700168856

Corston, R., & Colman, A. M. (1997). Modality of communication and recall of health-related information. Journal of Health Psychology, 2(2), 185-194. doi: 10.1177/135910539700200215

Crooks, S. M., Cheon, J., Inan, F., Ari, F., & Flores, R. (2012). Modality and cueing in multimedia learning: Examining cognitive and perceptual explanations for the modality effect. Computers in Human Behavior, 28(3), 1063-1071.  doi:10.1016/j.chb.2012.01.010

Crutcher, R. J., & Beer, J. M. (2011). An auditory analog of the picture superiority effect. Memory & Cognition, 39(1), 63-74.

Daniel, D. B., & Woody, W. D. (2010). They hear, but do not listen: Retention for podcasted material in a classroom context. Teaching of Psychology, 37(3), 199-203.

Drewnowski, A., & Murdock, B. B. (1980). The role of auditory features in memory span for words. Journal of Experimental Psychology: Human Learning and Memory, 6(3), 319-332. doi: 10.1037/0278-7393.6.3.319

Engle, R. W. (1974). The modality effect: Is precategorical acoustic storage responsible? Journal of Experimental Psychology, 102(5), 824-829. doi:10.1037/h0036363

Furnham, A., Gunter, B., & Green, A. (1990). Remembering science: The recall of factual information as a function of the presentation mode. Applied Cognitive Psychology, 4(3), 203-212. doi: 10.1002/acp.2350040305

Gathercole, S. E., Gregg, V. H., & Gardiner, J. M. (1983). Influences of delayed distraction on the modality effect in free recall. British Journal of Psychology, 74(Pt 2), 223-232. doi:10.1111/j.2044-8295.1983.tb01858.x

Ginns, P. (2005). Meta-analysis of the modality effect. Learning and Instruction, 15(4), 313-331.

Mayer, R. E., & Moreno, R. (1998). A split-attention effect in multimedia learning: Evidence for dual processing systems in working memory. Journal of Educational Psychology, 90(2), 312.

Mayer, R. E., & Moreno, R. (2002). Animation as an aid to multimedia learning. Educational Psychology Review, 14(1), 87-99. doi:10.1023/A:1013184611077

Mayer, R. E., & Moreno, R. (2003). Nine ways to reduce cognitive load in multimedia learning. Educational Psychologist, 38(1), 43-52.

Mayer, R. E., & Sims, V. K. (1994). For whom is a picture worth a thousand words? Extensions of a dual-coding theory of multimedia learning. Journal of Educational Psychology, 86(3), 389-401. doi: 10.1037/0022-0663.86.3.389

Moreno, R., & Mayer, R. E. (1999). Cognitive principles of multimedia learning: The role of modality and contiguity. Journal of Educational Psychology, 91(2), 358.

Moreno, R., & Mayer, R. E. (2000). A coherence effect in multimedia learning: The case for minimizing irrelevant sounds in the design of multimedia instructional messages. Journal of Educational Psychology, 92(1), 117.

Moreno, R., & Mayer, R. E. (2002). Verbal redundancy in multimedia learning: When reading helps listening. Journal of Educational Psychology, 94(1), 156.

Murdock, B. B. (1968). Modality effects in short-term memory: storage or retrieval? Journal of Experimental Psychology, 77(1), 79-86. doi:10.1037/h0025786

Murdock, B. B., & Walker, K. D. (1969). Modality effects in free recall. Journal of Verbal Learning and Verbal Behavior, 8(5), 665-676.

O’Keefe, D. J. (1990). Persuasion: Theory and research. In J. G. Delia (Ed.), Current Communication: An Advanced Text Series [Series]. (Vol. 2.) Newbury Park, CA: Sage.

Paivio, A. (1971).  Imagery and verbal processes. New York: Holt, Rinehart, and Winston.

Paivio, A. (1990).  Mental representations: A dual coding approach. New York: Oxford University Press: Clarendon Press.

Paivio, A. (2013). Imagery and verbal processes. Hillsdale, NJ: Lawrence Erlbaum Associates.

Paivio, A. (1991). Dual coding theory: Retrospect and current status.  Canadian Journal of Psychology/Revue canadienne de psychologie, 45(3), 255-287. doi:10.1037/h0084295

Paivio, A., & Csapo, K. (1973). Picture superiority in free recall: Imagery or dual coding? Cognitive Psychology, 5(2), 176-206. doi:10.1016/0010-0285(73)90032-7

Penney, C. G. (1989). Modality effects and the structure of short-term verbal memory. Memory & Cognition, 17(4), 398-422. doi:10.3758/BF03202613

Sweller, J. (1994). Cognitive load theory, learning difficulty, and instructional design. Learning and Instruction, 4(4), 295-312.

Surprenant, A. M., Pitt, M. A., & Crowder, R. G. (1993). Auditory recency in immediate memory. The Quarterly Journal of Experimental Psychology Section A, 46(2), 193-223. doi: 10.1080/14640749308401044

Tabbers, H. K., Martens, R. L., & Merrienboer, J. J. G. (2004). Multimedia instructions and cognitive load theory: Effects of modality and cueing.  British Journal of Educational Psychology, 74(1), 71-81. doi: 10.1348/000709904322848824

Thompson, V. A., & Paivio, A. (1994). Memory for pictures and sounds: Independence of auditory and visual codes. Canadian Journal of Experimental Psychology, 48(3), 380-396. doi:10.1037/1196-1961.48.3.380

Vidal, K. (2011). A comparison of the effects of reading and listening on incidental vocabulary acquisition. Language Learning, 61(1), 219-258.