Looking at archival sound: Enhancing the listening experience in a spoken word archive
First Monday

by Annie Murray and Jared Wiercinski

What helps researchers listen in deep and engaged ways to poetry that is delivered on the Web? This paper considers how visual aspects of Web–based archives for poetry recordings can enhance the listening experience for users by providing more context and clarification that can help users better understand and use the recordings. Drawing from studies in a variety of disciplines that demonstrate that much of our learning is multimodal, the SpokenWeb project in Montreal, Canada is using digitized live recordings of a Montreal poetry reading series from 1965–1972 featuring performances by major North American poets to investigate the features that will be the most conducive to scholarly engagement with recorded poetry recitation and performance. Visual features such as tethering audio playback with a written transcript, sound visualization and including videos and images are discussed as means to enhance the listening experience in a digital spoken word archive.


The scholarly value of spoken word archives
Benefits of audio playback
Visual features that enhance the listening experience
Benefits of audio on its own




Our project is to anticipate and develop useful design and interface features for scholars of poetry and poetic performance. In developing a Web–based system for the delivery of archival audio recordings, we are guided by two questions. What do we look at when we listen? What helps researchers listen in deep and engaged ways to poetry that is delivered on the Web? We wonder what kinds of site navigation, audio visualization, design elements and functionalities could be offered by a Web–based spoken word interface, and how these might enhance the listening process and, ultimately, the scholarly endeavor. Most memory institutions have limited resources at their disposal, so we are mindful that given the vast amounts of cultural heritage stored on vulnerable audio formats, it is often imperative to focus on preservation, digitization and providing some kind of basic access to recorded heritage; however, we must also try to cooperatively develop interesting and cost–effective means of providing meaningful scholarly access to these audio–based cultural treasures.

SpokenWeb (http://spokenweb.concordia.ca) is a team of literary scholars, designers and librarians based at Concordia University in Montreal. We are in the midst of a federally–funded research project to develop a prototype for a Web–based digital spoken word interface for literary research. Our audio content for the SpokenWeb site consists of approximately 80 hours of digitized poetry readings that took place at Sir George Williams University (now Concordia University) from 1965 to 1972. This time period was a particularly dynamic one for North American poetry, and as such, the audio archive is a promising research tool for scholars investigating North American poetry, literary history and poetic performance in the twentieth century. The variety and caliber of poets included in the archive make this a stimulating research project as well, since full Web access to the recordings, which we hope eventually to provide, will undoubtedly generate new and unanticipated audiences and research activities.

According to a 2010 Cisco study of global broadband traffic, “Peer–to–peer has been surpassed by online video as the largest category” [1]. It seems reasonable to predict, then, that audio files will continue to be increasingly paired with some form of visual information and will decreasingly appear as a stand–alone entity. In our exploration of existing digital spoken word archives, we noticed an abundance of sites that provide basic metadata for a recording along with a basic audio player. The sites are often difficult to navigate or offer no enriching features that could help scholars who rely on primary audio sources for their work. Our interest in developing a rich visual experience for users of audio archives, then, coincides with both the prominence of audio as an intrinsic element of the Web and also the amount of still undigitized and undiscovered audio content held by museums, libraries, broadcasters, researchers, and archives. We believe the potential of Web–based sound archives is vast, and adopting a design and development perspective grounded in the user’s experience will allow for rich and stimulating archival research that maximizes the creative and research potential afforded simultaneously by both digital audio and trends in Web development.



The scholarly value of spoken word archives

Spoken word archives have relevance to researchers and students in a variety of fields. SpokenWeb’s focus is on the poetry reading itself as a valuable primary source with inherent creative and research potential. In our thinking about how to optimize a Web environment for delivery and scholarly use of archival poetry recordings, we realize that other types of spoken word recordings would benefit from similar consideration and development. Considering the extent of both known and hidden audio collections held by libraries, archives and museums, we can quickly understand the need for useful Web interfaces to deliver digitized and born–digital recordings such as ethnographic field recordings, interviews, oral histories, radio programs, speeches, and the like. A vast body of cultural heritage could be much more helpfully presented to its users. So although we are addressing a body of poetry recordings in this paper, we are mindful of the sizeable number of other spoken word collections that might benefit from Web development that is attentive to usability, display, design and functionality.

For our purposes, poetry readings are the building blocks of our digital spoken word archive. Poetry readings are a rich site for literary history and criticism since they allow for an engagement with both the poet and the poems that may not be possible through textual sources alone.

Charles Bernstein (2009) argues that poetry readings:

... have been regarded largely as extensions or supplements of the visual text of the poem. Indeed, there has been very little critical work on the poet’s performance of a poem: at least, up until very recently, literary criticism has pretty much been confined to the printed text.

The reason for this is practical as much as conceptual. While archives of poetry recordings exist, they are largely inaccessible. Very few editions of poet’s sound recordings have been published. As a result, basic principles of textual scholarship have not yet been applied to the sound archive. But the times “they are a–changin”. [2]

Indeed, Bernstein (2009) notes that “Within the context of literary criticism or textual scholarship, the performance of the poem has not been generally recognized as part of the work, a designation that is reserved for the text, understood as the scripted incarnation of the poem” [3]. Regarding the inaccessibility of audio recordings, Furr (2010) notes that “the ubiquitous practice of recording poetry seems to be principally documentary, the irony being that the documents generally go unheard” [4]. It is somewhat sad to think of the many terabytes of poetic performances that sit silent.

In her discussion of archival sound, Kate Eichhorn (2009) examines some of the recent critical work in archival theory that concerns the “silent” archive. She notes, “considerably less critical attention has been paid to the archive that literally speaks, as is true of a sound archive” [5]. She asks: “How might an archive provide access to past sound events, including poetry performances, without reducing them to flat and lifeless artifacts?” [6] In her consideration of the difficulties of archiving sound, she suggests that “the poetry reading raises especially pressing questions”, one of which is that the poetry reading is still regarded by many as “secondary to the written or printed poem” [7]. She notes that:

Any attempt to archive poetry in performance requires an interrogation of the poetry readings’ relationship to the written and printed poem, and an interrogation of the widely held assumption that the archive is necessarily a space of writing and, hence, opposed to speech and other performative acts. [8]

Eichhorn’s assessment of the marginality of poetry performance in the archive and her argument that “since the late 1950s the poetry reading has been one of the most important sites for the dissemination of poetry in North America” [9] provide incentive for cultural institutions with poetry recordings to make them available.

It appears that poetry readings and performed poems have not received adequate critical attention compared to published and printed poems. Bernstein (2009) notes that this may be due to inadequate dissemination of sound recordings of poets [10]. Indeed, as a research medium the sound recording presents many obstacles, including the serious likelihood that it is stored on fragile or obsolete media. However, the potential of audio recordings for new directions in understanding and criticism of poetry is considerable. Derek Furr (2010) notes a “symbiotic relationship between technology and interest. As recordings are made easily accessible, public and scholarly interest increases, which in turn drives efforts to recover and disseminate recordings” [11]. We hope, then, that the increasing availability of audio poetry, coupled with the call from scholars such as Bernstein, Eichhorn and Furr for critical attention to it, will lead to rich scholarly engagement with audio archives.

We share Bernstein’s contention that there is much potential for literary scholarship that is attentive to or rooted in the performed versions of poems. In order to place our own archival recordings in context and to build a community of institutions that have similar collections, the SpokenWeb team has conducted a survey of Canadian libraries, museums, archives, English and creative writing departments and other cultural organizations to gauge the extent of collections of recorded English–language poetry readings in Canada. The significance of the poetry reading in the dissemination of poetry and the development of poets in Canada in the twentieth century should not be underestimated. Universities organized numerous reading series that brought in Canadian and American poets. Federal grants often supported Canadian poets on reading tours. The poetry reading was and is a central dissemination mechanism for poetry and remains a vital aspect of Canadian literary culture. We feel there could be a sizable body of archival poetry recordings in Canada that may be languishing, or in need of preservation or improved access. The desired outcome of the survey is a better understanding of the state of recorded poetry in Canada.

If we can locate other poetry readings performed by the same poets, scholars could compare performances of various poems over time, and have a better idea of a poem’s evolution in terms of content, performance and reception. Eichhorn (2009) notes that “many poets consider reading works in progress to an audience to be a vital part of the editing process” [12]. It would be possible to analyze the performance styles of different poets, and to have new insights into the poetry based on the way it has been performed. Increased use of poetry recordings would also yield interesting cultural and social history, because scholars could analyze the context in which poems were written and performed, and they could better know how audiences responded to certain poems. Since poetry readings often featured several poets performing their work, scholars could more closely explore how poets performed their work when other poets were performing before or after them, and how the presence of other poets may have affected selection and performance of individual poems. Preambles, introductions and comments after poems would yield similarly valuable explanatory information about the works performed.

In her examination of sound archives, Eichhorn (2009) suggests that “both the recordings and the archives consistently stop short of exploiting technologies in ways that might enable a deeper investigation, and even realization, of the sound archive’s potential” [13]. In her exploration of the PennSound digital archive and the Woodberry Poetry Room at Harvard, she concludes that “even when recordings are restored and made available to the public, they invariably lack the qualities that mark the poetry reading” [14]. Indeed, it is difficult to recreate the “liveness” and the particular local and evanescent qualities that characterize a poetry reading. However, librarians and archivists who work on providing access to poetry recordings, particularly on the Web, nonetheless have tools and technologies available, and most likely a willing and appreciative user base of scholars and poets, that can help create better online environments for listening to and looking at a poetry reading (an inherently multisensory event) on the Web (an inherently multimodal tool).



Benefits of audio playback

Before examining how certain visual features of an archive’s interface can enhance the listener’s experience, let’s consider the benefits of audio playback from a listener’s perspective. Or, to put it as a question: What are the benefits of hearing an audio recording of a spoken word performance as opposed to, for example, reading the same content in a text document? What cognitive, emotional, interpretive or aesthetic benefits can be gained when hearing content in audio form?

In his essay “Open ears”, Schafer (2011) reminds us of the Latin origins of the word audio:

The Latin word audire (to hear) has many derivations. One may have an “audience” with the king, that is, a chance to have him hear your petitions. One’s financial affairs are “audited” by an accountant, because originally accounts were read aloud for clarity. [15]

Implicit in these examples then, we think, is an understanding that the printed word can be ambiguous (i.e., subject to multiple interpretations) — sometimes to the point of being completely unclear in meaning — and that hearing something read aloud can reduce this ambiguity.

When something is read aloud it provides the listener with considerably more information than could be obtained from a written document alone, and provides certain auditory clues that help the listener to interpret and understand what is being said.

For example, an audio recording of a poetry reading, prepared speech, or interview may provide us with the following:

  • the speaker’s introductory comments prior to a reading and any explanatory comments throughout
  • volume changes in speech
  • pitch changes in speech (e.g., marking emphasis on certain words or syllables)
  • the pace of speech
  • the speaker’s phrasing of specific passages
  • improvised or changed sections of an established work
  • the speaker’s actual physical voice
  • the speaker’s emotional tone
  • pauses in speech
  • the speaker’s intentional and unintentional non–speech sounds (e.g., laughter, coughs, breathing, sighing, hesitations or stutters)
  • sounds originating from the audience (e.g., laughter, questions, comments, heckling, applause, silence)
  • sounds originating from accompanying performers (e.g., musicians)

Although we are concerned in this section with the benefits of a recorded spoken word performance, and therefore naturally this list largely concerns various qualities present in recorded speech, it is also worth emphasizing the potential value of non–speech sounds when deciphering meaning. Freud, as Schafer (2011) points out, “attached great significance to slips of the tongue (Freudian slips) and to other spontaneous or inadvertent sounds such as harsh breathing or the tapping of a foot or fingers...” [16]. While we are not concerned here with “Freudian slips”, spontaneous non–speech sounds are certainly relevant. As Schafer (2011) says:

That spontaneous or uncontrollable sound-making had important implications and could be deciphered like a secret language was a revelation. It was as if the human being was signaling in one way through controlled grammatical speech and in another way in the accents and accidents that surrounded conscious communication. [17]

Both the speech and non–speech sounds present in a recording of a spoken word performance, then, as well as the absence of sound (e.g., pauses, hesitations, the absence of laughter, et cetera), can all provide information that helps a listener to better understand its intellectual, emotional, and aesthetic meaning. Sound recordings may also generate new meanings. As Furr (2010) argues:

The recorded poem often troubles the shape and sounds of the poem as we’ve received it in print. Voice timbre and pitch, ambient sounds in the live recording, the juxtaposition of poetries not commonly anthologized together, the audio poem resonates in ways unique to the medium and causes us to attend equally to poetic craft, materials, form, and contextual constraint. [18]



Visual features that enhance the listening experience

In this section we examine how certain visual features of a digital spoken word archive’s interface can enhance the user experience. Specifically, we are interested in how visual features provide additional or complementary cognitive, emotional, and aesthetic benefits. It is worth emphasizing the word “complementary” here, since there are many situations when a user’s experience of an interface would be best characterized as being multimodal, or involving more than one sense modality at a time. We recognize from the outset that dividing a user’s experience into separate sense modalities (e.g., auditory, visual, haptic, et cetera) is to a large degree artificial. While there are certainly situations when a user will be simply looking at a particular object in the interface, or only listening to a particular sound, there will be many times when these and other sense modalities are all engaged and working in concert. This, of course, is how things usually happen in the natural world, since, as Neisser pointed out, “most naturalistic situations involve simultaneous multimodal input” [19].

In this section we specifically address these multimodal (i.e., auditory and visual) interactions with an interface. As Sarter (2007) points out:

Multimedia output systems (i.e., systems that present information to the user via various media) can support functions such as synergy (i.e., the merging of information that is presented via several modalities and refers to various aspects of the same event or process) or redundancy (i.e., the use of several modalities for processing the exact same information). [20]

We are concerned with how certain visual features of a spoken word archive’s interface can affect a user who is already engaged in listening. Whereas reading a poem is essentially a visual–cognitive task, and listening to an audio recording of a spoken word performance of that poem is an auditory–cognitive task, we are interested in how certain visual features work to enhance or impair the cognitive processing that is occurring while a user is already engaged in listening to audio content. To put it as a question: How does what you look at while you listen change what you hear? How does it affect your ability to interact with and process the site’s content? Given that we are working with spoken word archives (i.e., audio archives), we are in some sense prioritizing audio, and taking it for granted that the user will typically be engaged in listening through the site’s interface. Now we ask: What visual features can affect the listener’s experience?

To begin assessing how visual features can enhance the listener’s experience of a spoken word archive, it will be helpful to examine the literature from a variety of disciplines, such as cognitive science, education, literary studies, and library and information studies. Although it is well outside the scope of this paper to provide a review of all adjacent research areas and disciplines, we survey and include as much as possible of what we consider to be the most relevant research.

Literature review

a. Cognitive studies

There are significant cognitive aspects related to the user experience of the audio and visual aspects of a spoken word archive’s Web interface. Concerning speech perception, a wide range of studies examine the relationship (sometimes complementary, other times competitive) between audio and visual information. On the complementary side, Theeuwes, et al. (2007) note that “in the 1950s it was shown that the presentation of the face of the speaker can improve speech recognition compared with auditory–only presentation” [21]. On the competitive side, there is evidence for the dominant effect of visual information over auditory information. The McGurk illusion (McGurk and MacDonald, 1976) is an often–cited example, and occurs when “an auditory /ba/ dubbed onto a visual /ga/ [i.e., a video of a mouth pronouncing the /ga/ sound] is heard as /da/ by most observers” [22]. The effect of viewing speech while listening is so strong, they note, that if the two are in conflict (e.g., they are not synchronized or are otherwise mismatched), what a person hears will be partially or completely determined by what they see [23].

Concerning desktop user interfaces, Dowell and Shmueli state that the characteristics of visual text have historically made it generally preferable to speech output, and note that Smith and Mosier’s (1986) “[l]ong standing guidelines advise that speech not be used for presenting complex content and, wherever possible, that a visual display of information should be used instead of speech” [24]. In order to further examine this relationship, their 2008 study investigated how well a user comprehends verbal information across three different conditions: a visual display only, speech output only, and a visual display combined with speech output. Their results show that there is no difference across conditions for short sentences. For long sentences, however, they found that participants had better results with the visual–only and multimodal conditions (i.e., participants had poorer results when trying to comprehend long sentences during the speech–only condition). This supports Smith and Mosier’s guideline that speech output should not be used for complex information, at least not in isolation. Or, as they conclude: “The experiment suggests that a redundant multimodal display will neither assist nor disrupt understanding when compared with a purely visual display, but it will assist understanding of complex content when compared with speech output alone” [25].

A further argument for pairing audio and visual representations of verbal content can be found in Ho and Sarter’s (2004) study, which examines the utility of different modes of communication among military officers. As Sarter (2007) states:

The ultimate goal of this study was to inform the design of adaptive interfaces that adjust to user preferences and task content to support both human–computer interaction and computer–supported collaborative work. [26]

Participants had the choice of any combination of visual (text message, drawing/referring), auditory (two–way radio, face–to–face communication) and tactile (vibrotactile) methods of communication. The study found that having access to the same or similar content via different methods of communication made it possible to clear up ambiguities: “87% of all multimodal exchanges in this study served the purpose of mutual disambiguation of two signals or messages to ensure their proper interpretation” [27].

Cognitive research on the use of different modalities together also provides relevant considerations for those deciding how best to offer content, and which media to choose. In some contexts, using multiple modalities in concert serves to lighten the overall cognitive burden. Theeuwes, et al. describe Wickens’ (1984) influential multiple–resource theory, which “posits that when tasks share common resources along a given dimension (e.g., both tasks require the visual channel), performance is poorer than tasks that utilize separate resources (i.e., one task uses the visual domain and the other task uses the auditory domain)” [28]. Their own 2007 study provides some evidence for the idea that there are independent resources for auditory and visual processing, and that interference across channels occurs when higher level cognitive processing of a particular sensory input is required. This situation, they say, “forces serialization between the operations requiring central processing” [29].

Sarter (2007) provides us with a helpful illustration of these ideas using the example of the situation on modern flight decks, “where the auditory channel is already used rather extensively for communication and alerting purposes” [30]. She describes the value of simultaneously presenting information visually, using a 2001 study by Nikolic and Sarter: “Simulator–based evaluations of these interfaces have shown that pilots were better able to track the [automated flight deck systems] if the corresponding information was presented in peripheral vision” [31].

Finally, cognitive studies of participants’ emotional responses to music and dance performances are also relevant. Adams’ (1994) study exposed participants (both musicians and non–musicians) to three different conditions of a recorded concert (aural only, visual only, and aural and visual combined). While there was no difference between the responses of musicians and non–musicians in the aural–only and combined aural–visual conditions, there were significant differences between these two groups in the visual–only condition. Further, the degree of emotional experience was considerably stronger among musicians during the aural–visual condition than during the aural–only condition. However, a similar study by Frego (1996), which examined emotional reactions to a recorded dance and musical performance, found no significant differences between the responses of musicians and non–musicians, and no significant differences across the three conditions.

Vines (2005), like Adams, finds differences across the three perceptual conditions. He studied “the ways in which visual and auditory information convey emotion (as indexed by tension) and structure (as indexed by phrasing)” [32]. He found that the audio and visual channels conveyed both different and complementary information, noting that “the visual modality conveyed tension information that was independent from the tension in sound, though there were some convergent durations when visual information complemented auditory information” [33]. He also noted that participants who both saw and heard the performances sometimes had enhanced emotional responses, whereas at other times including the visual information reduced their emotional responses. Overall, he concludes that “interaction effects strongly suggest that the auditory and visual channels mutually enhance one another to convey content and that an emergent quality exists when a musician is both seen and heard” [34].

b. Education studies

Teachers and researchers in Education have evaluated the combining of learning modalities, engaging both listening and viewing, to increase comprehension and enjoyment in classroom settings. While researching comprehension of videos when subtitles are added for the benefit of students learning English as a second language, Chen (2011) draws upon Paivio’s work on dual coding theory, which contends that “cognition is served by two modality–specific symbolic systems that are experientially derived and differentially specialized for representing and processing information concerning nonverbal objects, events, and language” [35]. Paivio emphasizes that the two systems are “functionally interconnected so that activity in one system can initiate activity in the other” [36]. Chen reported on the students’ improved comprehension of videos once subtitles were made available. He measured this against the use of non–subtitled videos, and found that students were better able to grasp the contents of videos when words were also available. In France, Borrás and Lafayette (1994) conducted similar investigations with American students learning French and noted:

The study has proven that giving learners control over the pace of the subtitles may be an effective way to reduce channel layer density and improve the quantity and quality of learners’ oral intake and production. Further research on the issue should investigate whether the additional possibility of inserting and removing the subtitles may increase such benefits. [37]

Of course, the kinds of cognitive activities undertaken by language learners are different from those being undertaken by a scholar or student of poetry, but encountering audio–only performances of poems nonetheless presents ample opportunity for confusion that could be reduced by having another means whereby the listener could process the aural information of the poem. This could be as simple as providing a transcript. While we do not suggest that a subtitle of an ESL video offers the same kind of assistance as a transcript offers a literary scholar who reads it while listening to a poem, the relevance and broader implications of Paivio’s work on dual coding theory are worth noting in the context of interface design for spoken word archives.

The Scottish Music program at the Royal Scottish Academy of Music and Drama developed the Handing on Tradition by Electronic Dissemination (HOTBED) system as a means of delivering digital recordings of Scottish songs and music to students. Assessment of user needs was undertaken before the development of the software to see what students might be looking for in the future HOTBED database. Students evaluated existing online resources for learning traditional Scottish songs. In their list of most useful features accompanying a song, they cited the availability of transcriptions and being able to see contextual information about each song. They expressed interest in being able to slow down audio playback, which speaks to the difficulty of working with audio and the need for clarification whether through slower playback or visual confirmation of words in a transcript (Marshalsay, 2001).

In journals devoted to the teaching of literature in schools and colleges, there was enthusiastic discussion, particularly from the 1930s through the 1960s, of the use of poetry recordings to stimulate interest in poetry and increase comprehension of poems and plays [38]. Poetry memorization and recitation was already an established pedagogical method in schools, and the popularization of phonograph records featuring poetry was a new opportunity for students to listen to recordings of poetry in the classroom. Educator Walter Ginsberg called recordings an “invaluable scientific aid to high–school English” [39]. In the same year, Ginsberg reported on a large–scale investigation conducted by the Committee on Scientific Aids to Learning of the effectiveness of using sound recordings for the teaching of Shakespeare. Twenty–five schools from different U.S. regions participated in the study, which involved students listening to Mercury recordings and reading along with accompanying texts of The Merchant of Venice and Twelfth Night. Following the use of the texts and the recordings, students were tested on the material, and their affective and attitudinal responses to the performances and texts were informally investigated. The experimental groups “showed consistent superiority in mastery of content, understanding, and appreciation” and one student reported “for the first time we begin to see what the plays are all about” [40].

It has become a truism in Education that there are different kinds of learners (e.g., visual, auditory or kinesthetic). The studies cited here demonstrate the success of blending sensory modalities to increase comprehension and learning. This was more recently demonstrated in a series of eight experiments conducted by Mayer (1997), who studied problem solving in students. When the research team compared the performance of students who learned from visual and verbal explanations that were coordinated (multiple representation group) with the performance of students who received only verbal explanations (single representation group), they found that “students who received visual explanations coordinated with verbal explanations produced more than 75 percent more creative solutions to transfer problems than did students who received the explanation presented only in verbal form” [41].

The idea of improved learning through the use of multiple channels is echoed in multimedia theory, whose “central claim is that presenting information using more than one sensory modality will result in better learning” [42].

c. Literary studies

Poets and scholars of poetry such as Bernstein (2009) and Eichhorn (2009) have drawn attention to the importance of sounded poetry for increased, complementary or even new critical engagement with poems, and have noted the relative marginalization of poetry recordings as a subject for serious literary research. Bernstein (2009) writes that “within the context of literary criticism or textual scholarship, the performance of the poem has not been generally recognized as part of the work, a designation that is reserved for the text, understood as the scripted incarnation of the poem” [43].

Middleton (2005) and Swigg (2007) suggest that simultaneously listening to poetry and reading/seeing the text of a poem is essential. In “How to read a reading of a written poem,” Peter Middleton’s outline of the factors that influence the production of meaning in contemporary Anglophone poetry readings, he argues that “both the performance of the poem and silent reading of the poem are necessary to experience the poem.” He notes the existence of a “mutual interdependence of performance and silent interpretation,” concluding that “both silent reading and oral performance are incomplete scenes of reception” [44].

Swigg (2007) comments on the need to both hear and read William Carlos Williams at the same time, referring to the poet’s “vocal–visual art” and quoting Hugh Kenner who explains “It seems to me that there is an audio–visual counterpoint. You have to hear the poem and you have to see it ... . If you listen to a Williams recording, you’d better have the text in front of you, so you can see what he’s doing with it” [45].

In a similar spirit, Derek Furr’s Recorded poetry and poetic reception from Edna Millay to the circle of Robert Lowell suggests that “text, audio, and context describe a circle that constitutes the scene of listening to poetry from the audio archive. Even as we lean toward the center of the circle to hear the poet’s voice, echo, static, and resonance will cause us to turn our heads or glance at the page” [46]. As we can see, his argument that “aural and print media together may inform an understanding of modern poetry” [47] can be applied to a variety of learning situations involving sound and other media.

d. Library and information studies

Several reports and studies from the library and information studies literature have examined features that can enhance sound archives. Asensio (2003) conducted a user requirement study in order to determine enhancement and development features for moving pictures and sound portals. Asensio used a variety of methods (e.g., interviews, inquiry groups, telephone calls, and workshops) in order to assess the needs of 119 users, comprising teaching staff, researchers, practitioners, developers, and support staff, who were drawn from 16 different institutions. A selection of the most relevant requests that emerged from the study includes:

  • “Consider making manipulation tools available via the portal” [48]
  • “Search on waveform or image characteristics” [49]
  • “Leads to analysis and functional tools (if portal doesn’t have them)” [50]

Barrett, et al.’s (2004) report on the HOTBED project includes a user needs analysis which was used to inform the design of their system for the delivery of digital recordings of Scottish music to students, and is “focused on the transmission of oral/aural rather than written knowledge” [51]. Three user needs analysis sessions conducted with staff and students produced a “wish list” of desired features. A selection of the most relevant of these includes:

  • “Slow it down without altering pitch” [52]
  • “Easily repeat by phrases and loop them” [53]
  • “Relevant images” [54]
  • “Visual component to learning by ear” [55]
  • “... it became clear that video is also of great benefit in learning aurally...” [56]

Finally, Breaden’s (2006) paper “examines the functionality of twenty-five on-line Web exhibits in their use of audio media” and uses a matrix to measure “specific aspects of audio performance” [57]. Given that this paper was written five years ago, and that the Web has advanced considerably in that time, features such as sound visualization, tethered transcripts, and the ability to manipulate audio playback are absent from the list of desired functionality. Our own paper may be viewed as an attempt to extend this list.

Specific visual features that enhance the listening experience

In this section we explore specific visual features of Web interfaces and digital audio tools that can enhance the listening experience. Although our paper is specifically concerned with spoken word archives, we draw our examples here from a variety of types of sound archives and sites — some that are firmly established, and others which are still in development — on the assumption that many of these features are potentially useful in a range of contexts.

a. Two–way tethering of audio playback and written transcript

Our literature review includes a variety of studies that argue for the cognitive and educational value of providing access to written or textual versions of content (e.g., Smith and Mossier, 1986; Marshalsay, 2001; Ho and Sarter, 2004; Middleton, 2005; Swigg, 2007; Dowell and Shmueli, 2008) or of providing access to sub–titles to accompany videos for language learners (e.g., Borrás and Lafayette, 1994; Chen, 2011).

An excellent example of pairing audio playback with a written transcription of the spoken word content comes from an experimental Web site, the Radiolab player demo (http://hyper-audio.org/r/), a collaborative design effort from Radiolab, Mozilla, and SoundCloud [58]; [59].

This site features both a transcript of the spoken word content as well as a waveform display, which is a visual representation of the recorded audio signal’s amplitude (i.e., the physical correlate of perceived volume) over time. More importantly, these two features are interactive. A user can click on different positions in either the transcript or waveform display in order to navigate the content. Impressively, the site also offers two–way synchronization or tethering (i.e., clicking on a particular point in the waveform display changes the highlighted position in the transcript, and vice versa). Figure 1 shows the Radiolab Player demo interface.


Figure 1: A screenshot of the Radiolab Player demo site, which features two–way synchronization between the interactive transcript and waveform display.


These features provide several advantages to users. First, users can search, browse, or navigate the content using the method that best suits their purposes. For example, when searching for a particular word, phrase, or sentence, they can skim the transcript or use their browser’s search function. Or, they can click on certain points in the waveform display in order to quickly jump to various positions in the audio playback. Secondly, this tethering allows the listener to follow along in a very precise way. The synchronization of the transcript is done in such a way that the exact word being spoken is simultaneously highlighted in the transcript (i.e., via the “edge” in the transcript between already–played and yet–to–come text). Therefore, if a user were listening to and viewing the example shown in Figure 1, they would simultaneously hear and see the word “and”. This site is inspiring because it confronts and to some extent resolves an inherent problem with audio: it is not usually searchable. Thus the recording becomes more navigable, and, in a sense, more knowable. On the critical side, there are some minor bugs with the demo (e.g., scrolling needs to be done manually between pages of the transcription).
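To make the idea of two–way tethering concrete, the following is a minimal sketch in Python. It is not the Radiolab Player demo’s code; it assumes that each word of the transcript has already been given a start time, and the word timings, the `TimedWord` type and the function names are all hypothetical. One lookup maps a playback position to the word to highlight, the other maps a clicked word back to a playback position, and a third shows how text search becomes a way of navigating the audio.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedWord:
    text: str
    start: float  # seconds from the beginning of the recording


# Hypothetical word-level timings; a real transcript would supply these.
TRANSCRIPT = [
    TimedWord("What", 0.00),
    TimedWord("do", 0.35),
    TimedWord("we", 0.50),
    TimedWord("look", 0.70),
    TimedWord("at", 1.05),
    TimedWord("when", 1.20),
    TimedWord("we", 1.45),
    TimedWord("listen", 1.60),
]
STARTS = [w.start for w in TRANSCRIPT]


def word_index_at(seconds: float) -> int:
    """Audio -> transcript: index of the word being spoken at `seconds`."""
    # bisect_right counts the words whose start time is <= seconds.
    return max(bisect_right(STARTS, seconds) - 1, 0)


def seek_time_for(word_index: int) -> float:
    """Transcript -> audio: playback position for a clicked word."""
    return TRANSCRIPT[word_index].start


def find_word(query: str) -> list[float]:
    """Text search: start times of every occurrence of `query`."""
    return [w.start for w in TRANSCRIPT if w.text.lower() == query.lower()]
```

In an interface, `word_index_at` would be called as playback advances in order to move the highlight, while `seek_time_for` would be called when a user clicks a word. Keeping both directions in one table of timings is what makes the tethering two–way.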

Other sites include similar features. Alexander Street Press’s Ethnographic Video Online database (http://anth.alexanderstreet.com/) offers two–way tethering between videos and transcripts, without the bugs in the Radiolab Player demo. Finally, the “interactive transcript” feature embedded within talks on the TED Conferences Web site (http://www.ted.com/) also includes tethering, although the tethering is one way (i.e., clicking on the transcript advances the video playback, but not the reverse).

b. Sound visualization

In the previous section we mentioned Radiolab’s use of a waveform display, which is a form of sound visualization. A waveform shows how loud or quiet different parts of the recording are. The SpokenWeb site also includes a waveform display via the SoundCloud (http://soundcloud.com) audio player. Figure 2 shows a waveform display through the SpokenWeb interface.


Figure 2: A screenshot of the SpokenWeb site, which includes a waveform display. Abrupt changes in the volume of the recording often show up as noticeable changes in the waveform, as indicated by the different parts in this recording.


The waveform display can be useful in a number of ways. First, it allows a user to see a recording’s length, and to understand the current position of playback relative to the entire recording. Not all sites or archives that include audio content offer this or similar functionality. The excellent educational site UbuWeb (http://www.ubu.com/), for example, has chosen to include a very simple player that only allows for play or stop functionality, with no visual display to indicate the total length or position of playback.

Figure 3 shows UbuWeb’s minimal audio player.


Figure 3: A screenshot of the UbuWeb site, which uses a very basic audio player with only play and stop functionality.


Waveform displays also allow for improved navigation and browsing. A user can click on different sections of the waveform in order to quickly move around from section to section, in order to hear non–adjacent parts of the recording. Waveform displays are used in digital audio workstation software because they allow a user to see characteristics of the audio that would otherwise be indiscernible. The waveform display allows a user to better understand the audio and, consequently, to interact with, edit or manipulate it in sophisticated ways.
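As a rough illustration of what lies behind such a display, the sketch below reduces a list of audio samples to one peak value per screen column, and converts a click on the display into a playback position. It is a simplified Python sketch under our own assumptions (samples as floating–point values between -1 and 1, a display measured in pixels); the function names are hypothetical and do not describe how SoundCloud or any other player draws its waveform.

```python
def peak_envelope(samples: list[float], columns: int) -> list[float]:
    """Reduce raw samples to one peak amplitude per display column."""
    if columns <= 0 or not samples:
        return []
    n = len(samples)
    peaks = []
    for c in range(columns):
        lo = c * n // columns
        # Guarantee at least one sample per column, even for short inputs.
        hi = max((c + 1) * n // columns, lo + 1)
        peaks.append(max(abs(s) for s in samples[lo:hi]))
    return peaks


def click_to_seconds(x: float, width: float, duration: float) -> float:
    """Map a click at horizontal pixel `x` to a playback position."""
    fraction = min(max(x / width, 0.0), 1.0)  # clamp clicks outside the display
    return fraction * duration
```

For example, a click one quarter of the way across a 600–pixel display of a 2,400–second recording maps to the 600–second mark.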

In some circumstances, users with some experience with waveform displays can navigate the content without even listening to it. This is possible because changes in the volume of the recording typically show up as noticeable changes in the waveform, and often these changes in the waveform indicate different parts or sections in the recording. Figure 2 shows where different parts of a recording of a poetry reading map to changes in the waveform. Part A is a musical introduction featuring Indian music; in Part B things quiet down with singing interspersed with music; in Part C, George Bowering introduces Allen Ginsberg; finally, in Parts D, E, and F, Ginsberg recites separate poems. These different phases of the recording can all be recognized through the waveform display.
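What an experienced eye does with the waveform can be approximated, very crudely, in code. The Python sketch below is one possible heuristic and not a method used by the SpokenWeb project: it assumes that the recording has already been reduced to a list of loudness values (one per display column) and that sections are separated by pauses, which will not hold for every recording. The threshold and minimum gap length are hypothetical parameters that would need tuning by ear.

```python
def split_on_quiet(envelope: list[float], threshold: float,
                   min_gap: int) -> list[tuple[int, int]]:
    """Return (start, end) column ranges separated by quiet gaps.

    A gap is a run of at least `min_gap` columns below `threshold`.
    `end` is exclusive, as in Python slices.
    """
    sections = []
    start = None    # first column of the section currently being read
    quiet_run = 0   # consecutive quiet columns seen inside that section
    for i, level in enumerate(envelope):
        if level >= threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_gap:
                # The section ended where the quiet run began.
                sections.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        sections.append((start, len(envelope) - quiet_run))
    return sections
```

A short dip in volume (a breath, for instance) is shorter than `min_gap` and therefore does not split a section, whereas a sustained pause does. Sections that differ in loudness without a pause between them, such as music giving way directly to speech, would need a different test.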

In order to more clearly indicate different sections of the waveform, Jason Camlot [60], principal investigator of the SpokenWeb research project, has suggested the idea of color–coding the waveform. Different colors could be used, then, to indicate different types of audio content (e.g., music versus poetry), different voices (e.g., the person giving the introduction, the main speaker, questions or comments from the audience) and different sections (e.g., different poems in a multi–poem reading).

An example of sound visualization that uses color to differentiate separate aspects of an audio recording comes from the Web site Mashup Breakdown (http://mashupbreakdown.com/), which is a media player designed by Benjamin Rahn that clearly shows the component parts of musical mashups. A mashup is a song composed entirely of parts taken from existing recordings, often put together in unexpected ways. The site’s interface includes a fairly simple master timeline at the top and uses different colored blocks, stacked vertically and of varying lengths, to indicate when different samples (differentiated by different colors) enter and exit the main mix.

Figure 4 shows the Mashup Breakdown interface.


Figure 4: A screenshot of the Mashup Breakdown site, which uses colored blocks to indicate when different samples enter and exit the main mix.


The site’s usability is greatly enhanced by the use of fading and highlighting; a user can always see all of the samples (represented by colored blocks) that make up the track; active (or sounded) samples are highlighted whereas inactive samples are displayed faintly, and are barely visible. The site’s design is ingenious: it is simple, highly usable, and very effective. The visual display allows a listener to better understand what they are hearing, and helps them tease apart the component parts. This is especially helpful when there are multiple samples overlapping, as, for example, on Girl Talk’s All Day album, where there can be as many as five samples playing at once. Further, it’s easy to see how this interface could be adapted for musical education in other genres. The different colored blocks could be used, for instance, to represent different instruments or musical themes entering and exiting a classical or jazz piece. Finally, the colored blocks could potentially contain waveform displays within them, in order to provide a slightly more detailed representation of the audio information in each sample.
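The highlight–and–fade behavior described here amounts to asking, at each moment of playback, which blocks overlap the current position. The following Python sketch illustrates that logic only; it is not taken from the Mashup Breakdown site, and the block labels, colors and timings are invented placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleBlock:
    label: str
    color: str
    start: float  # seconds at which the sample enters the mix
    end: float    # seconds at which it exits


# Invented placeholder blocks; several may overlap at once.
BLOCKS = [
    SampleBlock("sample A", "red", 0.0, 30.0),
    SampleBlock("sample B", "blue", 20.0, 50.0),
    SampleBlock("sample C", "green", 45.0, 60.0),
]


def active_blocks(blocks: list[SampleBlock], position: float) -> list[SampleBlock]:
    """Blocks that are sounding at the given playback position."""
    return [b for b in blocks if b.start <= position < b.end]


def render_state(blocks: list[SampleBlock], position: float) -> dict[str, str]:
    """Every block stays visible; only the sounding ones are highlighted."""
    sounding = {b.label for b in active_blocks(blocks, position)}
    return {b.label: "highlighted" if b.label in sounding else "faded"
            for b in blocks}
```

The same structure would serve for a poetry reading if the blocks stood for voices or poems rather than musical samples.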

The Variations Audio Timeliner is another simple, yet effective visual interface, which is part of the Variations project from the Indiana University Digital Library Program. The Variations Audio Timeliner features a basic black line to represent a linear timeline, with a grey marker that moves along the timeline to indicate the current position of playback. On top of the line, however, are colored bubbles that delineate different sections of a musical work:

Bubbles are the time–spans between timepoints on the line which may be used to represent the phrases, periods, or larger formal sections of music. The visual properties of bubbles can be edited to better show relationships between sections or to suit personal preferences. Bubbles may also be grouped hierarchically. [61]

Further, the application also features annotations that are timed to display when playback enters a specific bubble or section. Figure 5 shows the Variations Audio Timeliner interface [62].


Figure 5: A screenshot of the Variations Audio Timeliner interface, which uses colored bubbles to indicate different sections of a musical work, and also features annotations that are timed to display when playback enters a specific bubble or section.


These annotations are intended to be used by instructors so that notes about specific sections of the piece appear at the appropriate time, and are correlated with a visual breakdown of the music in order to improve a student’s comprehension of a musical work’s component parts and of how they interrelate — all while a student listens. Finally, it is not hard to imagine how a similar tool could be used for other purposes such as for literary analysis, or for deconstructing interviews or speeches.
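A timeline of this kind can be modeled quite simply. The Python sketch below is our own illustration and not the Variations Audio Timeliner’s implementation: it assumes that an instructor has marked a sorted list of timepoints, treats each span between neighboring timepoints as a bubble, and returns a note only at the moment playback crosses into a new bubble. The timepoints, notes and function names are hypothetical.

```python
from bisect import bisect_right
from typing import Optional


def bubbles_from_timepoints(timepoints: list[float]) -> list[tuple[float, float]]:
    """Each bubble is the span between two neighboring timepoints."""
    return list(zip(timepoints, timepoints[1:]))


def bubble_at(timepoints: list[float], seconds: float) -> Optional[int]:
    """Index of the bubble containing `seconds`, or None if off the timeline."""
    if not timepoints or seconds < timepoints[0] or seconds >= timepoints[-1]:
        return None
    return bisect_right(timepoints, seconds) - 1


def annotation_to_show(timepoints: list[float], notes: dict[int, str],
                       previous: float, current: float) -> Optional[str]:
    """Return a note only when playback has just entered a new bubble."""
    before = bubble_at(timepoints, previous)
    now = bubble_at(timepoints, current)
    if now is not None and now != before:
        return notes.get(now)
    return None
```

For a poetry reading, the timepoints might mark the beginning of each poem, and the notes might carry an instructor’s or researcher’s commentary on each one.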

There are yet other sites and tools that provide more sophisticated forms of sound visualization as well as more advanced features for manipulating audio playback.

The Enabling Access to Sound Archives through Integration, Enrichment and Retrieval (EASAIER) tool, developed by the EASAIER Consortium consisting of three companies and four academic institutions (EASAIER, 2008, Partners section), was designed to address a number of key challenges faced by sound archives. The challenge that is most relevant to the current section is: “[s]ynchronisation of media components for enriched access and visualization” [63]. To this end, the EASAIER tool provides both a waveform and a spectrogram display. A spectrogram is a graphic representation of an audio signal that is often displayed in a graph with three dimensions: time on the horizontal axis, sound wave frequency (i.e., the physical correlate of pitch) on the vertical axis, and a third dimension to represent amplitude (often represented by color).
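For readers unfamiliar with how a spectrogram is produced, the following Python sketch computes one from scratch using a short–time discrete Fourier transform. It is a deliberately slow, textbook version written for clarity and says nothing about how EASAIER computes its display; the frame size, hop size and test tone are assumptions chosen for illustration. Each row of the result is a moment in time, each column a frequency band, and each value an amplitude that a display would map to color.

```python
import cmath
import math


def spectrogram(samples: list[float], frame_size: int, hop: int) -> list[list[float]]:
    """Rows are time frames; columns are frequency bins 0..frame_size // 2."""
    # A Hann window tapers each frame so that its edges do not smear the result.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            total = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


def bin_to_hz(k: int, frame_size: int, sample_rate: int) -> float:
    """Frequency, in hertz, at the center of bin `k`."""
    return k * sample_rate / frame_size


# Usage: a synthetic 1000 Hz tone sampled at 8000 Hz (a test signal, not archival audio).
RATE, FRAME = 8000, 64
tone = [math.sin(2 * math.pi * 1000 * n / RATE) for n in range(256)]
spec = spectrogram(tone, FRAME, hop=32)
loudest_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(bin_to_hz(loudest_bin, FRAME, RATE))  # 1000.0
```

A steady tone produces a single bright horizontal band; speech, by contrast, produces bands that shift as the pitch and vowel quality of the voice change.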

Figure 6 shows the EASAIER interface [64]. The waveform and spectrogram displays provide useful information about the amplitude (i.e., volume) and frequency (i.e., pitch) of an audio recording across time, and also allow the user to interact with the content in sophisticated ways:

  • “Sections of audio can be selected manually or automatically and looped seamlessly for learning or analysis purposes.”
  • “A source separation tool allows a user to listen to individual instruments within the piece of music while a noise reduction tool can be used to eliminate unwanted artifacts.”
  • “Time-stretching allows a user to slow down (or speed up) recordings, without modifying the pitch in real time.” [65]


Figure 6: A screenshot of the EASAIER interface, which is displaying both a waveform and a spectrogram for an audio track.


EASAIER supports access through a Web client and a separate client that needs to be downloaded. Unfortunately, many of the advanced features are only available in the full client (i.e., non–Web) interface.
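The time–stretching feature listed above, slowing a recording without lowering its pitch, can be illustrated with a plain overlap–add technique. The Python sketch below is a stand–in chosen for its simplicity: EASAIER’s own algorithm is not described in the passage quoted here, and this basic method produces audible artifacts that more refined techniques are designed to avoid. The idea is to copy short, unaltered frames of the original (which keeps their pitch intact) and to lay them down further apart or closer together than they were originally. The function name and default frame size are hypothetical.

```python
import math


def stretch_ola(samples: list[float], factor: float, frame: int = 256) -> list[float]:
    """Lengthen (factor > 1) or shorten (factor < 1) audio by overlap-add."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    if len(samples) < frame:
        return list(samples)
    synth_hop = frame // 2              # spacing of frames in the output
    analysis_hop = synth_hop / factor   # spacing of frames read from the input
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    n_frames = int((len(samples) - frame) / analysis_hop) + 1
    out_len = (n_frames - 1) * synth_hop + frame
    out = [0.0] * out_len
    weight = [0.0] * out_len
    for i in range(n_frames):
        src = int(i * analysis_hop)
        dst = i * synth_hop
        for n in range(frame):
            out[dst + n] += samples[src + n] * window[n]
            weight[dst + n] += window[n]
    # Divide by the summed window weights so overlapping frames do not add volume.
    return [o / w if w > 1e-9 else 0.0 for o, w in zip(out, weight)]
```

With a factor of 2 the output is roughly twice as long as the input, which is the kind of slowed playback that the HOTBED students asked for when they wanted to “slow it down without altering pitch.”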

Our final example comes from the RECITE poetry project, which “approaches the acoustic performance of a poem as an opportunity for developing a new mode of literary prosody” [66]. The RECITE team has experimented with open source speech analysis software named Praat (http://www.praat.org), which uses waveform, spectrogram, and pitch curve analyses in order to support phonetic analysis. The RECITE team used Praat to analyze speech patterns in audio recordings of T.S. Eliot’s “The Waste Land”. More specifically, they studied the “visual curves that represent pitch changes in his reading” in order to help “describe the formal significance of Eliot’s mode of poetic recitation.” Their Praat–assisted findings were that Eliot made heavy use of what Camlot refers to as a “drone” style of intonation when reciting his poetry, and that the RECITE team could “track different aspects of vocal variation in his performance by studying the pitch contours” [67]. Camlot notes that “a typical declarative sentence will have a high then falling intonation” but that Eliot’s reading has fewer pitch variations when compared with an actor’s (i.e., Robert Speaight) reading of the same work [68]. Figure 7 shows waveform, spectrogram and pitch curve visualization of Robert Speaight’s reading of “The Burial of The Dead”, whereas Figure 8 shows Eliot’s own reading of the same work. The mark up (i.e., in blue) highlights how Eliot’s intonation is much flatter than Speaight’s.


Figure 7: A waveform and spectrogram visualization of Robert Speaight’s reading of “The Burial of The Dead”.



Figure 8: A waveform and spectrogram visualization of T.S. Eliot’s reading of “The Burial of The Dead”.


Ultimately, the RECITE team discovered that “in using the drone as a reading technique, Eliot was exploring non–intonational methods of articulating variation in his spoken performance of the poem” [69]. The RECITE team’s use of Praat suggests how sound visualization features such as spectrogram displays and visual curve analyses might be useful Web–interface features for scholars who are using spoken word audio collections.
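The visual impression of a “flat” pitch curve can also be expressed as a number. The Python sketch below shows one way this might be done and is not the RECITE team’s method: it assumes that pitch values in hertz have already been extracted from a recording (for instance, with a tool such as Praat), with 0 marking frames where no pitch was detected, and it summarizes each contour by its range and spread in semitones. The two contours in the example are invented numbers for illustration, not measurements of Eliot’s or Speaight’s readings.

```python
import math
import statistics


def to_semitones(hz_values: list[float], reference_hz: float) -> list[float]:
    """Convert hertz to semitones relative to `reference_hz`."""
    return [12 * math.log2(f / reference_hz) for f in hz_values if f > 0]


def contour_summary(hz_values: list[float]) -> dict[str, float]:
    """Range and spread of a pitch contour, in semitones."""
    voiced = [f for f in hz_values if f > 0]  # 0 marks unvoiced frames
    reference = statistics.median(voiced)
    semitones = to_semitones(voiced, reference)
    return {
        "range_st": max(semitones) - min(semitones),
        "stdev_st": statistics.pstdev(semitones),
    }


# Invented contours for illustration only (hertz; 0 = no pitch detected).
drone_like = [110, 112, 0, 111, 109, 110]
more_varied = [110, 150, 0, 180, 130, 95]
print(contour_summary(drone_like))
print(contour_summary(more_varied))
```

Semitones are used rather than hertz because they reflect how pitch intervals are heard: the step from 100 Hz to 200 Hz and the step from 200 Hz to 400 Hz are both one octave, or twelve semitones.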

c. Accompanying videos and images

The Web is intrinsically visual, and offers many possibilities for scholarly enrichment through the development of thoughtful graphical user interfaces. Video, images and other visual features of a site’s design can enhance comprehension, appreciation and scholarship of audio content. Archives with sound recordings may also have other media that would complement the content of a Web–based audio archive. Other content that could enrich an audio archive includes photos of the poet; photos taken at the poetry reading that show the poet, other speakers and audience members; or ephemera from the readings such as promotional brochures or posters, clippings from newspapers, or programs.

Many audio poetry sites on the Web include a photo of the poet with recordings. Just as a reader might curiously flip to the back of a book to see the author’s photo, so too might a listener wonder “What person goes with this voice?” and benefit from seeing a photo of the poet. Sites such as the Poetry Archive (http://www.poetryarchive.org) and Poets.org (http://www.poets.org/) are graphically oriented and feature photos of poets quite prominently. Listeners may wish to orient themselves while listening to a poem by looking at the poet who is speaking.

Another visual feature that might help users connect with the original recording media or the particular time period of the performance is a photo of the actual storage medium on which the analog recording is stored. It is possible that the digitization and display on the Web of archival sound recordings can make the performance become disembodied, even though the listener might simultaneously and paradoxically feel that the recording allows for a particular kind of intimacy. Archivists might consider including photos of the original reels or audio cassettes on which the poetry readings were originally recorded. This may ground the user in the recording’s material origins, and allow for a deeper sense of the contents of archives. Since users may not have access to fragile analog recordings, the photos of the originals might increase the thrill of archival immediacy that so many researchers enjoy. We can see examples of images of analog recording formats featured on Bob Dylan’s Web site (http://www.bobdylan.com). Visitors to the site can preview Bob Dylan songs by selecting a song and seeing a vinyl record spin around as the song plays. Not only does this give a user something to look at while listening, but also the listener is connected to the format on which the majority of Dylan’s older albums were originally released.

Another example of combining an image with a digital recording online is on the Library of Congress’s National Jukebox site (http://www.loc.gov/jukebox/). On this site, when a user accesses a digitized recording, an image of the original physical medium is available. This is a successful visual means of grounding the user in the original artifact.

Images from the archive paired with recordings from the archive have the effect of working in concert to gesture towards a more fully realized and embodied archive, one that allows users to simultaneously engage more than one sense and have a greater sense of the event or moment that was captured, however crudely, on one or more media. Due to the fragility of archival images and recordings, users often approach archives and special collections tentatively, if not reverently. Because such artifacts are often fragile or difficult to discover, users often encounter them one format or medium at a time, resulting in a fragmented or asynchronous experience of an event in time and history. With the development of flexible digital collections platforms, librarians and archivists increasingly have the potential to curate and present related archival materials together in a Web environment. The inclusion of images, digitized ephemera and documentation from poetry readings has both an aesthetic and a documentary value that can add much to a Web–based audio archive that features recordings of poetry readings.

The same could be said for archival video footage, if it is available. Our experience of users and researchers suggests that video material is often highly sought after and appreciated. For researchers interested in the circumstances of a poetry reading, video footage would most likely be very welcome. Video footage could open up new ways of understanding poetic performance by giving users the opportunity to see the poet in action, to observe body language of the poet and the crowd, to see facial expressions and to observe movement in the performance. As Bernstein points out, “[poetry] readings are the central social activity of poetry. They rival publishing as the most significant method of distribution for poetic works” [70]. Given the scholarly importance of poetry readings and their place in the publishing world, we suggest exploring the possibility of including visually oriented archival material alongside any Web–based audio recordings.



Benefits of audio on its own

While we have touted the potential of visual features that could enable increased engagement with audio files of poetry performances, we wish to avoid giving the impression that such features are necessary or even desirable for all types or phases of research. After all, in signaling the research potential of specifically audio versions of poems, we acknowledge and foreground that the primary research activity that needs to take place is listening, most likely of a deep and attentive nature. Sometimes close listening may best occur when the researcher is looking at nothing in particular, or has his or her eyes closed in order to shut out other sensory stimuli. If poetry performances captured in audio recordings contain so much auditory information that can enhance our appreciation or understanding of poetry, then it follows that designers of interfaces should also be willing to accommodate a researcher’s need to simply listen, without undue visual interference. A Web site built for listening should not force someone to look at anything in particular.

Although many specific research tasks might be undertaken on a Web–based digital spoken word archive such as SpokenWeb, some activities may only be related to close listening and therefore too many visual features on the site could become a distraction. If a listener is listening for some of the following things, he or she may wish to avoid too much looking:

  • musicality and rhythm of the poems
  • background noises in the reading
  • audience comments during or after poems
  • breath, pauses or line breaks
  • the words themselves, simply in order to understand them (e.g., during early stages of research)
  • the character of the poet’s voice
  • the non–poetic speech (e.g., introductions, banter, explanations, jokes, question and answer period)

How much sensory multitasking do people do when they are listening very intently to something? Ought we to force a busy or overly stimulating interface on a user who is after all wishing to listen to poetry? We appreciate that users of Web–based audio may be undertaking a variety of listening tasks when a recording is playing, and some of these activities might call for the simplest possible visual environment so as to privilege the act of listening over any other activities. Given the wealth of clues and information that might be contained in an audio recording of a poetic performance, a user should feel free to opt out of an excessively visual experience.

For this reason, we have approached the SpokenWeb project in a spirit of “digital bricolage” (Camlot, 2011b) by assembling components and functionalities from different content management systems, Web sites and services in order to mix in different functionalities. This modular approach helps us envision an interface with features that can be added or subtracted according to a user’s preference.
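In software terms, a modular interface of this kind comes down to treating each visual feature as something that can be switched on or off independently. The Python sketch below is a hypothetical illustration and not the SpokenWeb site’s code; the feature names are simply drawn from the features discussed in this paper, and the class and method names are our own.

```python
from dataclasses import dataclass, field

# Hypothetical feature names, drawn from the visual features discussed above.
ALL_FEATURES = ("transcript", "waveform", "images", "video")


@dataclass
class ListeningInterface:
    enabled: set[str] = field(default_factory=lambda: set(ALL_FEATURES))

    def toggle(self, feature: str, on: bool) -> None:
        """Add or subtract a single visual feature."""
        if feature not in ALL_FEATURES:
            raise ValueError(f"unknown feature: {feature}")
        if on:
            self.enabled.add(feature)
        else:
            self.enabled.discard(feature)

    def listen_only(self) -> None:
        """Turn every visual feature off, leaving only audio playback."""
        self.enabled.clear()

    def visible_panels(self) -> list[str]:
        """Features to draw, in a stable display order."""
        return [f for f in ALL_FEATURES if f in self.enabled]
```

A researcher doing close listening could call `listen_only()` and later bring back only the transcript, without the interface having to anticipate that particular combination in advance.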

Therefore, just as an ideal Web–based digital spoken word archive should provide useful features for researchers, a user of the system should also have the ability to deselect features and functionalities. In his Five laws of library science, Ranganathan (1963) introduced his oft–quoted principles for the organization, management and use of the information in libraries. Although the laws specifically name books as the units of information in question, we can easily translate the laws into a digital context, and even into the more specific realm of interface design for users of cultural materials such as audio recordings.

As introduced by Ranganathan, the laws of library science are:

  1. Books are for use.
  2. Every reader, his book.
  3. Every book, its reader.
  4. Save the time of the reader.
  5. A library is a growing organism.

Applied to interface design for archival audio recordings, we could perhaps say:

  1. Recordings are for use.
  2. Every researcher, their recording.
  3. Every recording, its researcher.
  4. Save the time of the researcher.
  5. A spoken word archive is a growing organism.




Given the scholarly potential of spoken word archives and the rapid development of Web technologies to support media, developers of sound archives should consider incorporating visual features to enhance the user experience. Written transcripts tethered to audio playback, sound visualization and accompanying video and images are promising features that could enhance sound archives. In fact, studies in various disciplines attest to the multimodal nature of learning which these features exemplify. It is also important to provide users with control over enhancement features. Developers should provide a site with clean design so unfettered listening can take place, and provide a flexible interface that allows users to select or deselect visual features. Finally, they should respect the needs of users who mainly just want to listen.


About the authors

Annie Murray is Digital & Special Collections Librarian at Concordia University in Montréal, Canada. She is a collaborator on the SpokenWeb project: “SpokenWeb 2.0: Conceptualizing and Prototyping a Comprehensive Web–based Digital Spoken Word Interface for Literary Research”.
E–mail: annie [dot] murray [at] concordia [dot] ca

Jared Wiercinski is Digital Services & Outreach Librarian at Concordia University in Montréal, Canada. He is a collaborator on the SpokenWeb project: “SpokenWeb 2.0: Conceptualizing and Prototyping a Comprehensive Web–based Digital Spoken Word Interface for Literary Research”.
E–mail: jared [dot] wiercinski [at] concordia [dot] ca



A version of this paper was delivered at the conference of the International Association of Sound and Audiovisual Archives held in Frankfurt, Germany in September 2011.

The authors would like to acknowledge the Social Sciences and Humanities Research Council of Canada for their funding for the SpokenWeb project. They also want to thank Concordia Libraries and Concordia University’s Department of English for their generous support. Finally, they would like to acknowledge SoundCloud, and in particular Ben Fawkes, for both their financial and technical support.



Notes

1. Highlights of the Cisco VNI Usage Results section, § 4.

2. Bernstein, 2009, p. 964.

3. Bernstein, 2009, p. 963.

4. Furr, 2010, p. 12.

5. Eichhorn, 2009, p. 184.

6. Id.

7. Eichhorn, 2009, p. 186.

8. Eichhorn, 2009, pp. 186–187.

9. Eichhorn, 2009, p. 187.

10. Bernstein, 2009, p. 964.

11. Furr, 2010, p. 3.

12. Eichhorn, 2009, p. 188.

13. Eichhorn, 2009, p. 190.

14. Id.

15. “The Ear of the Confessor” section, § 1.

16. “The Ear of the Confessor” section, § 3.

17. Id.

18. Furr, 2010, p. 3.

19. Neisser in Sarter, 2007, p. 187.

20. Sarter, 2007, p. 190.

21. Theeuwes, et al., 2007, p. 196.

22. Kislyuk, et al., 2008, p. 2,175.

23. Id.

24. Smith and Mosier, 2008, p. 782.

25. Dowell and Shmueli, 2008, p. 782.

26. Sarter, 2007, p. 190.

27. Sarter, 2007, p. 191.

28. Theeuwes, et al., 2007, p. 197.

29. Theeuwes, et al., 2007, p. 205.

30. Sarter, 2007, p. 188.

31. Nikolic and Sarter in Sarter, 2007, p. 188.

32. Vines, 2005, p. 56.

33. Id.

34. Vines, 2005, p. 67.

35. Paivio, 1990, p. 55.

36. Paivio, 1990, p. 54.

37. Borrás and Lafayette, 1994, p. 71.

38. Cf., “Improving College and University Teaching,” English Journal and College English.

39. Ginsberg, 1940b, p. 134.

40. Ginsberg, 1940a, p. 291.

41. Mayer, 1997, p. 8.

42. Sullivan, et al., 2006, p. 219.

43. Bernstein, 2009, p. 963.

44. Middleton, 2005, p. 9.

45. Kenner in Swigg, 2007, p. 187.

46. Furr, 2010, pp. 16–17.

47. Furr, 2010, p. 12.

48. Asensio, 2003, p. 21.

49. Asensio, 2003, p. 30.

50. Id.

51. Barrett, et al., 2004, p. 1.

52. Barrett, et al., 2004, p. 6.

53. Id.

54. Id.

55. Id.

56. Barrett, et al., 2004, p. 10.

57. Breaden, 2006, p. 33.

58. For more information on the Radiolab Player demo, see these blog posts from the development team:

59. The source code for the project can be found here: https://github.com/maboa/Radiolab-Soundcloud-Popcorn.js-Demo.

60. Camlot, personal communication, 29 March 2011.

61. Indiana University Digital Library Program, 2008, “Using Timeliner” section.

62. For a demonstration of the Variations Audio Timeliner, see http://www.dlib.indiana.edu/projects/variations3/demos/timeline.html.

63. Damnjanovic, et al., 2008, p. 121.

64. For a demonstration of the EASAIER interface, see http://www.elec.qmul.ac.uk/easaier/index-3.html.

65. Damnjanovic, et al., 2008, p. 125.

66. Camlot, 2011a, p. 1.

67. Jason Camlot, personal communication, 5 October 2011.

68. Camlot, 2011a, p. 6.

69. Jason Camlot, personal communication, 5 October 2011.

70. Bernstein, 1998, p. 22.



References

Bobby L. Adams, 1994. “The effect of visual/aural conditions on the emotional response to music,” Ph.D. dissertation, Florida State University.

Mireia Asensio, 2003. “JISC User Requirement Study for a Moving Pictures and Sound Portal: The Joint Information Systems Committee, Final Report,” at http://www.jisc.ac.uk/media/documents/programmes/portals/mpsportaluserreqsfinalreport.pdf, accessed 28 August 2011.

Stevie Barrett, Celia Duffy, and Karen Marshalsay, 2004. “HOTBED (Handing on Tradition by Electronic Dissemination) Report,” at http://www.hotbed.ac.uk/documents/docs/FinalReport.php, accessed 28 August 2011.

Charles Bernstein, 2009. “Making audio visible: The lessons of visual language for the textualization of sound,” Textual Practice, volume 23, number 6, pp. 959–973. http://dx.doi.org/10.1080/09502360903361550

Isabel Borrás and Robert C. Lafayette, 1994. “Effects of multimedia courseware subtitling on the speaking performance of college students of French,” Modern Language Journal, volume 78, number 1, pp. 61–75. http://dx.doi.org/10.1111/j.1540-4781.1994.tb02015.x

Ian Craig Breaden, 2006. “Sound practices: On–line audio exhibits and the cultural heritage archive,” American Archivist, volume 69, number 1, pp. 33–59.

Jason Camlot, 2011a. “T.S. Eliot’s Pitch Curves: Digital Analysis of Literary Recordings,” paper presented on 22 April at the American Culture Association Conference (San Antonio, Texas).

Jason Camlot, 2011b. “Digital bricolage,” message posted on 4 May, at http://spokenweb.concordia.ca/blog/digital-bricolage/, accessed 4 October 2011.

Hao–Jan Howard Chen, 2011. “Developing and evaluating SynctoLearn, a fully automatic video and transcript synchronization tool for EFL learners,” Computer Assisted Language Learning, volume 24, number 2, pp. 117–130. http://dx.doi.org/10.1080/09588221.2010.526947

Cisco Systems, 2011. “Cisco visual networking index: Usage study,” at http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/Cisco_VNI_Usage_WP.html, accessed 17 August 2011.

Ivan Damnjanovic, Christian Landone, Panos Kudumakis and Josh Reiss, 2008. “Intelligent infrastructure for accessing sound and related multimedia objects,” paper presented at International Conference on Automated Solutions for Cross Media Content and Multi–Channel Distribution, AXMEDIS 08, pp. 121–126.

John Dowell and Yael Shmueli, 2008. “Blending speech output and visual text in the multimodal interface,” Human Factors, volume 50, number 5, pp. 782–788. http://dx.doi.org/10.1518/001872008X354165

EASAIER Consortium, 2008. “EASAIER: Enabling access to sound archives: Integration, enrichment, retrieval,” at http://www.elec.qmul.ac.uk/easaier/index.html, accessed 23 August 2011.

Kate Eichhorn, 2009. “Past performance, present dilemma: A poetics of archiving sound,” Mosaic, volume 42, pp. 183–198.

R.J. David Frego, 1996. “The effect of aural, visual, and aural/visual conditions on subjects’ response to perceived artistic tension in music and dance,” Ph.D. dissertation, Florida State University.

Derek Furr, 2010. Recorded poetry and poetic reception from Edna Millay to the circle of Robert Lowell. Basingstoke: Palgrave Macmillan.

Walter Ginsberg, 1940a. “How helpful are Shakespeare recordings?” English Journal, volume 29, number 4, pp. 289–300. http://dx.doi.org/10.2307/805934

Walter Ginsberg, 1940b. “Recordings for high–school English,” English Journal, volume 29, number 2, pp. 134–140. http://dx.doi.org/10.2307/805509

Chih–Yuan Ho and Nadine B. Sarter, 2004. “Supporting synchronous distributed communication and coordination through multimodal information exchange,” Proceedings of the 48th Annual Meeting of the Human Factors and Ergonomics Society, pp. 426–430.

Indiana University Digital Library Program, 2008. “Variations Audio Timeliner,” at http://variations.sourceforge.net/vat/, accessed 23 August 2011.

Daniel Kislyuk, Riikka Möttönen, and Mikko Sams, 2008. “Visual processing affects the neural basis of auditory discrimination,” Journal of Cognitive Neuroscience, volume 20, number 12, pp. 2,175–2,184.

Karen Marshalsay, 2001. “Report on the user needs analysis sessions,” at http://www.hotbed.ac.uk/documents/index.php, accessed 17 August 2011.

Richard E. Mayer, 1997. “Multimedia learning: Are we asking the right questions?” Educational Psychologist, volume 32, number 1, pp. 1–19. http://dx.doi.org/10.1207/s15326985ep3201_1

Harry McGurk and John MacDonald, 1976. “Hearing lips and seeing voices,” Nature, volume 264, number 5588, pp. 746–748. http://dx.doi.org/10.1038/264746a0

Peter Middleton, 2005. “How to read a reading of a written poem,” Oral Tradition, volume 20, number 1, pp. 7–34. http://dx.doi.org/10.1353/ort.2005.0015

Allan Paivio, 1986. Mental representations: A dual–coding approach. New York: Oxford University Press.

Shiyali Ramamrita Ranganathan, 1963. The five laws of library science. Second edition, reprinted with minor amendments. Bombay: Asia Publishing House.

Nadine B. Sarter, 2007. “Multiple–resource theory as a basis for multimodal interface design: Success stories, qualifications, and research needs,” In: Arthur F. Kramer, Douglas A. Wiegmann, and Alex Kirlik, (editors). Attention: From theory to practice. New York: Oxford University Press, pp. 187–195.

R. Murray Schafer, 2011. “Open ears,” at http://www.acousticecology.ca/open-ears-by-r-murray-shafer/?lang=en, accessed 15 August 2011.

Sidney L. Smith and Jane N. Mosier, 1986. Guidelines for designing user interface software. Report ESD–TR–86–278. Bedford, Mass.: Mitre Corp.

Briana Sullivan, Colin Ware, and Matthew Plumlee, 2006. “Linking audio and visual information while navigating in a virtual reality kiosk display,” Journal of Educational Multimedia and Hypermedia, volume 15, number 2, pp. 217–241.

Richard Swigg, 2007. “Hearing is believing: The recordings of Williams,” William Carlos Williams Review, volume 27, number 2, pp. 187–193. http://dx.doi.org/10.1353/wcw.0.0018

Jan Theeuwes, Erik van der Burg, Christian N.L. Olivers, and Adelbert Bronkhorst, 2007. “Cross–modal interactions between sensory modalities: Implications for the design of multisensory displays,” In: Arthur F. Kramer, Douglas A. Wiegmann, and Alex Kirlik, (editors). Attention: From theory to practice. New York: Oxford University Press, pp. 196–205.

Bradley W. Vines, 2005. “Seeing music: Integrating vision and hearing in the perception of musical performances,” Ph.D. dissertation, McGill University.

Christopher D. Wickens, 1984. “Processing resources in attention,” In: Raja Parasuraman and David Roy Davies (editors). Varieties of attention. New York: Academic Press, pp. 63–101.


Editorial history

Received 5 October 2011; accepted 29 March 2012.

Creative Commons License
“Looking at archival sound: Enhancing the listening experience in a spoken word archive” by Annie Murray and Jared Wiercinski is licensed under a Creative Commons Attribution–NonCommercial–ShareAlike 3.0 Unported License.

Looking at archival sound: Enhancing the listening experience in a spoken word archive
by Annie Murray and Jared Wiercinski
First Monday, Volume 17, Number 4 - 2 April 2012
