The idea of a technologically realizable spoken word archive dates back to the introduction of the tinfoil cylinder phonograph in 1877. In his first predictive essay on the implications of his invention, Thomas Edison suggested that the new medium would be used to create archives of national import for commemorative use:
It will henceforth be possible to preserve for future generations the voices as well as the words of our Washingtons, our Lincolns, our Gladstones, etc., and to have them give us their ‘greatest effort’, in every town and hamlet in the country upon our holidays.1
The voice archive, in this early formulation, represents a kind of Arnoldian repository for the ubiquitous annual observance of ‘the best that has been said’ (if not thought). While the word ‘record’ (as noun and verb) in 1878 certainly suggested new possibilities for the nature and contents of the public archive, and for our means of capturing historical events, Edison’s imagined use of archived voices was still rather limited in scope.2 It was enough for the voices to be preserved on primarily dormant artefacts and revived to be heard on occasion for the archive to prove its value as a repository of national, communal belonging.
In a subsequent predictive essay announcing the 1888 ‘Perfected Phonograph’, Edison likened the markings of a phonographic recording to those found on ancient Assyrian and Babylonian clay cylinders, suggesting the new archival medium of the phonograph cylinder as a ‘progressive’, self-articulating version of these earlier artefacts:
It is curious to reflect that the Assyrians and Babylonians, 2,500 years ago, chose baked clay cylinders inscribed with cuneiform characters, as their medium for perpetuating records; while this recent result of modern science, the phonograph, uses cylinders of wax for a similar purpose, but with the great and progressive difference that our wax cylinders speak for themselves, and will not have to wait dumbly for centuries to be deciphered.3
While I have discussed this last quotation elsewhere in the context of early fantasies of the phonograph as a medium of discursive transparency,4 what I would like to highlight here is the stress placed in each of these essays upon the materiality of the audible artefact, and the event-oriented scenario of its use. A consideration of these categories of artefact and event represents a useful point of departure for a historically motivated theorization of the voice archive.
Neither of the terms, artefact or event, is simple to define in relation to sound phenomena. R. Murray Schafer distinguishes between the sound object and the sound event by identifying the former with Pierre Schaeffer’s clinical definition of l’objet sonore, a specimen of recorded sound considered ‘in physical or psychophysical terms […] without their semantic or referential aspects’, and the latter as ‘something that occurs in a certain place during a particular interval of time’ for which questions of ‘context’ apply. The ‘soundscape’, by extension, ‘is a field of interactions’ consisting of component sound events.5 Don Ihde, in his explorations of sound phenomenology, observes that ‘insofar as all sounds are also “events,” all the sounds are within the first approximation, likely to be considered as “moving”’. Ihde, in this instance, imagines sounds as divorced from visible objects, existing as auditory presences within the horizons of silence that surround them.6 In both cases, one might say that the sound object is identified with some primitive version of the audio signal itself, rather than with a material artefact that either produces or preserves it. Historical sound seems to present itself as a unique kind of artefact to the critic in that its non-visual presence may make it seem more ephemeral than other kinds of visible, material artefacts, like manuscripts, for example, and because its presence depends upon controlled temporal movement.
This last point is at the heart of Friedrich Kittler’s insistence that sound recording technology has had a transformative impact upon our relationship to the past. Stressing first the importance of the nineteenth century’s break with notational systems of transcription (for music, harmony, etc.) through its introduction of the concept of frequency — by which ‘the measure of length is replaced by time as an independent variable’ — Kittler goes on to state that, due to this transition from transposition to Time Axis Manipulation (enabled by the concept of frequency), ‘the real takes the place of the symbolic.’7 Technological media capture, preserve, and (re)produce aspects of the temporal event that the alphabet cannot, delivering the past without ‘the bottleneck of syntactical regimentation’ associated with symbolic language; hence, the replacement of the symbolic with the real.8 As it captures and produces ‘the real’ in its capacity to record a temporal event, the artefact that arises from technological media is mathematically and materially intertwined with the event it preserves. The fact that time itself becomes a variable that can be manipulated with technological media (you can speed up, slow down, reverse the direction of the record) suggests that our capacity to manipulate the media artefact not only enables us to process historical ‘real time’ to be experienced as a temporal event in the present, but to transform historical ‘real time’ into events of alternate temporal orders as well. When we are talking about sound recording, the intervolved relationship of artefact to event suggests the possibilities of replaying history and of making history.
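To make the idea of time as a manipulable variable concrete, here is a minimal sketch in Python, offered under stated assumptions rather than as Kittler's own formulation. It treats a recording as a plain list of amplitude samples and shows the three manipulations mentioned above: reversing, speeding up, and slowing down. The function names `reverse` and `change_speed` are hypothetical, and the linear interpolation used for resampling is merely one simple choice among many.

```python
def reverse(samples):
    """Play the record backwards: the same samples in the opposite order."""
    return samples[::-1]


def change_speed(samples, factor):
    """Resample by linear interpolation.

    factor > 1 speeds the recording up (fewer samples, and a higher pitch
    when played at the original sample rate); factor < 1 slows it down.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = max(1, int(len(samples) / factor))
    out = []
    for j in range(n_out):
        pos = j * factor                      # position on the original time axis
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out


if __name__ == "__main__":
    ramp = [0, 1, 2, 3]
    print(reverse(ramp))
    print(change_speed(ramp, 2))      # twice as fast, half as long
    print(change_speed(ramp, 0.5))    # half speed, twice as long
```

In each case the stored values are untouched; only their arrangement along the time axis changes, which is the sense in which the artefact and the event it yields can come apart.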
In using the term artefact I am, to begin with, aware of its primary meaning referring to ‘an object made or modified by human workmanship, as opposed to one formed by natural processes’.9 The early spoken recording sits oddly at the crossroads of art and nature as both a crafted speech and a sound wave, an empirical phenomenon mechanically captured. Just what is the artefact of the sound archive, and how has it shaped our ideas and expectations of what such a repository should be? Where the early Edisonian idea of an archive of voices was selective and specific in its imagined motivations of use, contemporary capacities for digitization and media aggregation raise the possibility of expanding our own idea of a historical archive of voices from including ‘the best that has been said’ to ‘everything said that has ever been recorded’. My exploration of the questions posed above, in the present article, will proceed with an awareness of the digitally mediated reception of our regular experience of the archival signal, the audio that emanates, post-media migration, as if from some notion of an ever-expanding archive of voices. Proceeding with an awareness of the digital does not imply a circumvention of the original material artefacts in question; nor does it erase a critical motivation to locate historically the voices in this expansive archive — but it qualifies both the artefacts and our historicist motivations in significant ways. As the concluding section of this article will suggest, conceptual and computational manipulation of the audio signal as divorced from the original material artefact allows us to engage in new kinds of contextualization and expands our capacity for philological engagement.
Still, I will begin by presenting the names of some of the artefacts in play as if they have been arranged upon a table before you, to handle and examine: a phonautograph sheet, or phonautogram (c. 1866), a cut from a tinfoil audio cylinder (c. 1877), an Edison wax cylinder (c. 1888), a Berliner flat disc gramophone record (c. 1897), an aluminium instantaneous disc (c. 1933), a reel of quarter-inch polyester audio tape (c. 1966), an audio cassette tape (c. 1979), a compact disc (c. 1990), and an MP3 file (c. 1999).
This list presents a variety of physical substances, ranging from paper and metal to paraffin and plastic, and ends in the ironic fact of the MP3 as a new kind of entity, one that is less substantial than representational: one cannot hold an MP3 file, but one can handle it (select it, ‘click’ it) as a digital representation presented through software, a Graphical User Interface, and the specific hardware (computer, iPod, etc.) that delivers the GUI. Each one of these media artefacts, whether it is substantial or not, together with its audible content, demands the practice of historicist audio forensics. That is to say, such media artefacts demand extensive archival (and geek-ival) research: the use of manuscript and print documents surrounding the sound that the artefacts hold, and the extensive, obsessive knowledge of historians and audio technology researchers about the particularities of the media formats and technologies that preserve those sounds. Increasingly, audio artefacts demand the use of digital tools that enable new possibilities for navigation, manipulation, visualization, and examination of the audio signal, and, in some cases, the sonification of visualizations derived from the original material media formats themselves. It will be the digital side of historicist audio forensics that I will focus on in this article, as I meditate upon the status for historical research of the artefacts that comprise the archive of voices at the present time.
In a recent interview in the new open access journal Amodern, and then again in his book, A New Republic of Letters, Jerome McGann argues that ‘what digital technology has exposed is not that we need a new program of humanities study, a Digital Humanities, but a recovery of philological method for our changed circumstances. Philology in a New Key.’10 By philology, McGann means a discipline of skills designed ‘to preserve, monitor, investigate, and augment our cultural inheritance, including the various material means by which it has been realized and transmitted’.11 Here is a longer quotation from McGann on the same point:
The need to migrate our cultural heritage to a digital condition has exposed the serious limitations in such an approach to the study of our cultural inheritance. The historical record is composed of a vast set of specific material objects that have been created and passed along through an even more vast network of agents and agencies. The meanings of the record — the interpretation of those meanings — are a function of the operations taking place in that dynamic network. Only a sociology of the textual condition can offer an interpretive method adequate to the study of this field and its materials. (A New Republic of Letters, p. 22)
McGann’s call for a philology in a new key entails a renewed application of practical skills, procedural and interpretive tools that include descriptive bibliography, scholarly editing, theory of texts, book history, and, I would add, media history and theory, to the new kinds of databases and interfaces we are creating. This is the big work that needs to be done, the kind of work that has motivated initiatives like NINES, and many of the individual Web projects that have been aggregated into that consortium. Rigorous engagement with the physical materials — volumes of The Yellow Book, artworks, ceramics, and realia on display in John Ruskin’s St George’s Museum, multiple copies of Blake’s illuminated books — that populate sites designed for their digital presentation results in well-researched glosses adjacent to the digital images of the materials. And quite beyond the informative gloss, such rigorous historical engagement results in thorough metadata, and, consequently, expands the ways in which we may remodel and investigate our understanding of historical cultural artefacts. Online, we do not touch paper, canvas, or clay. We organize richly prepared simulacra of the artefacts we imagine to be a part of our rational archive within an environment that exposes them to transformative imagination.
In the sense I am describing it, the digital ‘archive’ is not to be understood as a preservation medium, but as a transformation medium that opens texts and material artefacts to new contexts, new interpretations, and new transformative uses. This work is not a replacement for more empirical methods of writing history, but importantly makes us aware of the significance of media contexts for the kinds of history we wish to pursue, and the kinds of assumptions about the status of the artefacts we wish to contextualize. It also allows us to discover aspects of our objects of inquiry that might not have been discoverable had they remained in a single media format.
To approach texts and objects in the digital environment is to encounter the destabilizing material element inherent in all cultural artefacts, which encourages an approach to research materials as ‘differential texts’, a term introduced by Marjorie Perloff to mean ‘texts that exist in different material forms, with no single version being the definitive one’.12 On the cue of media theorist Darren Wershler, I transmute Perloff’s term and support a claim for the concept of ‘differential media’ as one that demands our awareness of the transformative impact of media contexts as an object of interpretation migrates across, or exists multifariously within, different media platforms.13 The idea of differential media seems particularly well suited to a consideration of a historical sound recording that can be said to exist, uniformly (‘that Tennyson recording’) yet differentially, on cylinder, LP, audio cassette tape, and as an MP3 file (not to mention as a textual transcription or visual representation). There are other concepts with currency that might be considered for our purposes. 
For example, the idea of media specificity in the sense that Katherine Hayles has used the phrase to delineate the characteristics of media environments in relation to the development of digital texts — the movement ‘from the language of “text” to a more precise vocabulary of screen and page, digital program and analogue interface, code and ink, mutable image and durably inscribed mark, texton and scripton, computer and book’ — while certainly useful for our thinking about the text on page and screen, is somewhat too visual in its orientation to serve the present meditation on the archive of voices.14 Similarly, Jay David Bolter and Richard Grusin’s elaboration of the concept of remediation — while extremely useful for, among other things, its observation that ‘no medium […] can now function independently and establish its own separate and purified space of cultural meaning’ — seems similarly biased towards the visual arts and visual media: painting, computer games, film, photography, television, etc.15
The idea of differential media can better accommodate Kittler’s sense of the continuity between technological media and digital media as differential (yet continuous) forms of data processing, so that the wax cylinder and the MP3, the phonograph and the DAW (Digital Audio Workstation) can be conceptualized, at one level, as engaging in the same processes of temporal manipulation, but through different media formats and interfaces, and with significantly variable degrees of transformative impact upon the audible temporal event. Further, the idea of differential media is also useful, I think, for considering the possibilities and implications of migrating our own digital interventions, our digital ‘archive’ projects, maps, databases, and visualizations back into print or other media formats — perhaps even audible formats — and for imagining new forms of scholarly production that are not exclusively concerned with designing websites.
We are all engaged in acts of digital creation and transformation on a regular basis, although we are not always deliberately conscious of this fact. Cultural analysis of the implications of our emailing, googling, blogging, texting, word processing, and digital editing for our work as literary scholars and historical researchers is well underway. For example (one that is specifically relevant to Victorianist researchers), Catherine Robson’s ‘How We Search Now’ asks ‘what it means for our scholarship that our two distinct modes of searching [material v. digital] link us to different archives, require different reading and analytical practices, and produce markedly different results’. Her essay proceeds to provide rich description and documentation of specific processes that inform the relationship between two increasingly intertwined modes of research and discovery.16 On a broader theoretical scale, much significant work that can help us understand the import of how we search, write, think, and hear, etc., has emerged from media historians like Lisa Gitelman, Jonathan Sterne, Ian Bogost, Matthew Kirschenbaum, Jussi Parikka, and Wolfgang Ernst, who have developed concepts of comparative ‘new’ media, ‘format studies’, ‘platform studies’, and media archaeology, respectively.17
All of these cultural theorists and historians are working to articulate the presuppositions we bring to our new media environments, to have us see the historical underpinnings of what we suppose. This is part of the serious work that needs to be done, and can be identified as functioning in line with McGann’s call for a ‘Philology in a New Key’. And then there is a somewhat more ludic, experimental kind of digital engagement that I would argue also has an important role to play in the present recalibration of our understanding of what we, as literary and cultural historians, can do in relation to our changing media environment. This kind of digital engagement — let us call it ‘digitalling around’ — can function as a useful and revealing method of experimental critical play. As Jussi Parikka has argued:
Play is important when understood as part of didactics — the hands-on approach that allows us to try, to have tactile contact with, to touch and open media […]. Despite the fact that technical media often work in subphenomenal ways — in other words, their principles of operation are not directly open to observation by the human eye — such a manner of tinkering with media-technological effects forms a circuit with the theoretical work.18
Two main branches of digital work in the humanities have been those pertaining to ‘humanities computing’ (tool building, text analysis, and encoding) on the one hand, and ‘new media studies and design’ (often pursued by theorist-practitioners interested in exploring the nature and implications of new media) on the other. In a fairly recent and still extremely relevant article on ‘The State of the Digital Humanities’, Alan Liu frames his analysis of the value and potential of both branches just mentioned in terms of their relative degrees of critical awareness. Observing, first, that the expanding domain of digital humanities must ‘in some manner, for better or for worse, […] serve the postindustrial state’, Liu ultimately decides that ‘the digital humanities are not ready to take up their full responsibility [within the discipline] because the field does not yet possess an adequate critical awareness of the larger social, economic, and cultural issues at stake’.19 Historicist audio forensics is concerned with the historical recontextualization of an analogue signal and its related media so that we can gather the aesthetic, social, and material relations that informed their use and meaning. As will soon become clear, digital historicist audio forensics can be deployed to this end as well.
In what follows, I will rehearse a short series of examples that shows how digital tools and processes can be used to amplify and transform our historical understanding of analogue media and sounds, and how digital processes entail a conceptual conversion of the material. In all cases, digital historicist audio forensics entails a combination of researched (neo-philological and documentary) contextualization, and of digital manipulation. It is through such synthesis of ‘old’ archival research practice and ‘new’ digital design and development-oriented experiments that the historical archive of voices is conceptualized, designed, and realized in the present.
‘A thing is when it isn’t doing’, Brian Massumi writes, in italics, as a starting maxim for thinking the category of motion back into critical practice.20 Technically speaking, a recorded sound, an audio record in analogue form, is always doing something because it is always in motion. It is a concrete example of Zeno’s virtual arrow in flight. Its resonance resides in its existence as a non-stop operation. Even a piece of sound (a sound ‘bite’) must be in motion to be audibly perceptible as a thing. Without motion there is no sound. As Ihde has speculated, sound, with its intrinsic movement and its affirmation of the verb over predicate, represents a challenge to the ‘realm of mute objects’ that has functioned as ‘the implicit standard of a visualist metaphysics’ (p. 50). What might a static sound memento look like, and what could such a memento mean as a historical artefact? Lisa Gitelman has pursued these questions in her account of the souvenir foils that were cut from tinfoil recordings made at early demonstrations of Thomas Edison’s 1877 tinfoil phonograph (Fig. 1). The foil cuts were handed out to those who had ‘witnessed’ the demonstration of what Edison’s new invention could do:
Phonograph exhibitors ran through pounds of tinfoil, and audiences scrambled for keepsakes. In their sonic ‘capture’ and later, in their mute evocation of public experience, pieces of tinfoil in private hands formed souvenirs of immense power. They were belongings that vouched for belonging. They were artifacts that vouched for facts.21
The cut of souvenir recording foil functioned as a static object standing in (mutely and still) for a public event that inevitably escaped those who were present at its unfolding. Sound is always, obviously, ‘tinged with event’ (Massumi, p. 11). The visible grooves and bumps of a cylinder foil (for instance) are a static, material manifestation of a sonic event. Once cut up into a keepsake scrap, the souvenir foil, a piece of something that once had the potential to reproduce the eventfulness of the sound demonstration, now took its metonymic relational position as ‘object to event/experience’, to use a formulation from Susan Stewart.22 Unable, like an object in an Anglo-Saxon artefact riddle, to speak for itself, the object that stands (still) in relation to an event/experience is static, incomplete, and secondary. As Stewart explains, ‘the souvenir must remain impoverished and partial so that it can be supplemented by a narrative discourse, a narrative discourse which articulates the play of desire’ (p. 136).
But something happens when the analogue artefact is approached with a digital process. Here is a fragment of phonographic tinfoil captured by the University of Southampton’s Sound Archive Project researchers as a high-resolution 3D digital scan, performed for preservation purposes, and ultimately for digitally generated replay by a virtual stylus (Fig. 2). They use a process that performs a precise coordinate mapping (including the calculation of depth) of the cylinder’s surface, from which they can generate a quantitative analysis of signal reproduction.
The digital process works to render the material fragment eventful and potentially whole again. Assuming the possibility of artefacts as abstract, non-material entities — for example, the way a thesis exists as a necessary, universalizing artefact for philosophical discourse and the production of certain kinds and movements of knowledge — I would suggest that the analogue sound recording, when approached through digital processes, moves from a delimited, manifest material thing, to an eventful performance of the abstract ideal of that material thing.
Stewart’s metonymic image of how object relates to event, and its binary conception of silence as static and sound as kinetic, is reversed when we move from analogue models of sonification to digital ones. This is most dramatically apparent in my second example, the recent sonification through digital scanning of Édouard-Léon Scott de Martinville’s phonautograph transcripts (Fig. 3).23 Scott’s phonautograph was introduced in the 1860s as a way of writing sound, rendering it visible, and (ideally) legible. It was not designed for sound reproduction. Thus, from Scott’s perspective, a phonautogram was not a sound recording in the way we understand that category today. Rather, it was a graphically captured sound inscription; a machine-generated audio fossil — a static artefact bearing materially the trace of a once living sound event. The First Sounds researchers have reanimated these artefacts into actual sound events again by taking high-resolution digital scans of the phonautograms and playing them back (with a lot of adjustment and preconception) via digital sonification.24 One example of this work is the sonification of Scott’s voice singing the children’s song ‘Au clair de la lune’. The process used to generate such a recording represents an audible rendering, through new media, of a visual script of voice that was never intended to be heard and is, in effect, a digital reversal of Scott’s intention. Where he wanted to turn sound into a visual form of data, the First Sounds researchers have turned Scott’s visual data back into sound. And thus, the potential for a new historical sound archive of phonautograms is born, where once there were only silent sheets of smoke-blackened paper.
The process that allows for such a creation is an innovative form of digital historicist audio forensics that Patrick Feaster, folklorist and First Sounds researcher, calls paleospectrophony, the goal of which is to use digital tools to convert waveform images of different kinds into playable frequency and amplitude values, and then to play those values for us to hear in the present.25 As Feaster describes the process on his Phonozoic website that is ‘dedicated to the history of the phonograph and related media’:
Paleospectrophony is the name I’ve given to a distinctive application for reverse Fourier analysis software such as Coagula, AudioPaint and MetaSynth’s Image Synth. Each of these programs can ‘play’ any digital image as though it were a spectrogram: one axis is time, the other is frequency, and the intensity of individual pixels corresponds to amplitude. Paleospectrophony is the use of this type of software to ‘play’ old inscriptions of sound.26
The procedure works to convert waveform images into playable time, frequency, and amplitude values and represents a creative form of Time Axis Manipulation as Kittler has described it. While originating from a material visual object, the audio artefacts generated from this technique are born digital and thus represent a digital realization of the abstract idea of the phonautogram as an audible artefact. When we listen to such a sonified recording, one might say, we are listening to the sound of a conceptual artefact.
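By way of illustration only, the principle Feaster describes (one axis as time, the other as frequency, pixel intensity as amplitude) can be sketched in a few lines of Python. This is not the code of Coagula, AudioPaint, or MetaSynth, whose internals are not documented here; the function name, the `pixels` grid, the list of frequencies assigned to the rows, and the frame length and sample rate are all assumptions made for the sake of the example.

```python
import math


def play_image_as_spectrogram(pixels, freqs_hz, frame_seconds=0.01, sample_rate=8000):
    """Read a grid of pixel intensities as if it were a spectrogram.

    pixels[row][col] is taken as the amplitude of a sine wave at
    freqs_hz[row] during time frame `col`. The sine waves are summed
    (additive synthesis) and the result is scaled to the range -1..1.
    """
    n_frames = len(pixels[0])
    samples_per_frame = round(frame_seconds * sample_rate)
    samples = []
    for col in range(n_frames):
        for k in range(samples_per_frame):
            t = (col * samples_per_frame + k) / sample_rate
            value = sum(
                pixels[row][col] * math.sin(2 * math.pi * freqs_hz[row] * t)
                for row in range(len(freqs_hz))
            )
            samples.append(value)
    peak = max((abs(s) for s in samples), default=0.0)
    return [s / peak for s in samples] if peak > 0 else samples


if __name__ == "__main__":
    # A hypothetical two-row, two-column 'image': a 1000 Hz tone in the
    # first time frame, silence in the second.
    image = [[1.0, 0.0],
             [0.0, 0.0]]
    audio = play_image_as_spectrogram(image, freqs_hz=[1000.0, 2000.0])
    print(len(audio), "samples generated")
```

The sketch makes plain why the resulting audio is best described as born digital: every sample is computed from the image, and none is transduced from a moving surface.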
A variation on the two examples of digital processes applied to material artefacts that I have just described — those of digital restoration and sonification — proposes the possibility of playing the preserved, analogue sound artefact digitally, without ever playing it acoustically. I am referring to Alfred Clark’s early twentieth-century voice archive project, referred to as the ‘Museum of Voices’ — a collection of arias sung by ‘the voices of the greatest singers’ of the day, and then preserved beneath the Paris Opera in a time capsule with instructions for future use. As this procedure of preservation and use was reported in the Literary Digest of 1906:
The records themselves, which have been made on specially prepared plates, have been enclosed in hermetically sealed metal boxes containing a chemical compound to protect them for future years. These boxes have been engraved with the date upon which they are to be opened — one in fifty years, another in one hundred years, and so on, the dates having been chosen to conform with the musical festivities which will undoubtedly take place at that time.27
One hundred years later, EMI, heir to the Gramophone Company that made the recordings, reissued a three-CD set of the buried operatic voices called Les Urnes de l’Opéra (Treasures from the Paris Opera Vaults). What is interesting about this digital reissue of once entombed analogue artefacts is that the audio files delivered on the CDs have not, in fact, come from the flat disc records that were stored in asbestos-covered cloth and long-since shattered glass plates, but from the Bibliothèque nationale’s extensive pre-1938 collection of recordings. (Note that flat discs, unlike cylinders, were particularly useful for reproduction from a master.) The reasons for not using the Clark-stored recordings have to do with more than just the dangers of handling asbestos and broken glass. It was discovered, upon opening the vault in 2007, that about twenty-four of the recordings had already disappeared (stolen, apparently). And then it was decided by the project leader, Elizabeth Giuliani, that it would be better to use ‘identical’ surrogate recordings from the national library even for the recordings that they did have from the vault, so as not to compromise the material integrity of the discs that Clark stored over a century ago. As Giuliani explained, ‘even if these old records are played once, they are slightly damaged by the needle. We decided to await new optical technologies that can read them without touching them.’28 The digital reissue of voices from Les Urnes de l’Opéra does not actually provide voices captured on the exact material artefacts from the Opera’s vaults.
This digital version of Clark’s time capsule is part surrogate and part digital stand-in for twenty-four phantom recordings of the Opera that once were materially lodged in a tomb. Thus, the digital process, in this instance, simultaneously reinforces the idea of the material repository of sound and the sacred nature of its artefacts, even as it works quickly to lock them back up in the vault again so it can disseminate and play those sacred artefacts’ audible content without the fear of their actual, material destruction, and, ultimately, without the necessity of their physical presence.
Many (I would say, most) early analogue sound recordings are only available to us to listen to today because they have been digitized. The audible signals we come to know have been processed through an analogue-to-digital converter. This is the first step in any exploration of how other digital tools may be used to assist in a historical and formal analysis of early sound recordings. Very often, this is not the first media migration that the original acoustic cylinder audio signal has gone through. Numerous early cylinder recordings were reissued on flat disc, then, over time, have been preserved (usually in material archives) through migration to magnetic tape, then to any number of possible digital file formats. The arrival into digital does not end the migration process, as hardware and file format conversions are an ongoing reality for any fugitive acoustic cylinder or early flat disc recording signal. Ten years ago, archives would preserve digitized audio on archival quality CDs, usually in WAV format. Today, they tend to keep them on larger storage drives, often in both WAV and MP3 formats. And since hard drives do not live forever, the migration will continue so long as we care about these semi-tangible artefacts of cultural heritage.
A key point to keep in mind in the course of making such seemingly obvious observations is that ‘format’ as a concept represents a useful, even crucial, historicist preoccupation. As Jonathan Sterne argues:
Most crucial dimensions of format are codified in some way — sometimes through policy, sometimes through the technology’s construction, and sometimes through sedimented habit. They have a contractual and conventional nature. The format is what specifies the protocols by which a medium will operate. (p. 8)
The codification of formats informs use and, consequently, social, cultural, and aesthetic meaning. In considering the audio signal, format (and our historical understanding of formats) may help us to understand just what, exactly, we are listening to, what we are dealing with, and what we might want to pursue in terms of research, interpretation, and critical methodology.
Take the simple example of the differences in frequency range associated with early acoustic cylinder recordings and early electrical recordings (recordings that transduce the acoustic signal into electrical pulses and then reconvert them into acoustic sound waves on output). Below is a quickly generated frequency analysis (using the open source software Audacity) of a very early acoustic sound recording (Fig. 4):
The sound of this recording and the results of the frequency analysis match the audio characteristics associated with this early recording format. Most of its energy sits right up the middle at about 1000 Hz, so the frequency range is narrow: little bass and treble, just mid-range frequencies. It has a relatively low amplitude, overall, and an audibly poor signal-to-noise ratio.
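For readers who wish to see what such a frequency analysis involves, the following is a minimal sketch in Python, using only the standard library. It is not Audacity's implementation; the function names, the Hann window, and the 90 per cent energy band are my own illustrative assumptions, and the test tone merely stands in for a digitized cylinder excerpt. A practical analysis would use a fast Fourier transform from a numerical library rather than this slow, naive transform.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate):
    """Return (frequencies_hz, magnitudes) for a real signal via a naive DFT."""
    n = len(samples)
    # A Hann window tapers the excerpt's edges to reduce spectral leakage.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(samples)]
    freqs, mags = [], []
    for k in range(n // 2 + 1):  # bins from 0 Hz up to the Nyquist frequency
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        freqs.append(k * sample_rate / n)
        mags.append(abs(acc))
    return freqs, mags


def dominant_frequency(freqs, mags):
    """Frequency of the strongest bin, ignoring the 0 Hz (DC) bin."""
    peak = max(range(1, len(mags)), key=lambda k: mags[k])
    return freqs[peak]


def energy_band(freqs, mags, fraction=0.9):
    """Band left after trimming equal shares of energy from both ends."""
    energy = [m * m for m in mags]
    total = sum(energy)
    tail = (1.0 - fraction) / 2.0 * total
    running, low, high = 0.0, freqs[0], freqs[-1]
    found_low = False
    for f, e in zip(freqs, energy):
        running += e
        if not found_low and running >= tail:
            low, found_low = f, True
        if running >= total - tail:
            high = f
            break
    return low, high


# Hypothetical stand-in for a digitized excerpt: a 1000 Hz test tone.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * i / rate) for i in range(512)]
freqs, mags = magnitude_spectrum(tone, rate)
print(dominant_frequency(freqs, mags), energy_band(freqs, mags))
```

A narrow energy band centred on the mid-range would be consistent with an acoustic recording, while a markedly wider band would invite the kind of questions about format and migration raised here. Neither figure settles anything on its own.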
Another recording, allegedly from the same period, Walt Whitman reading the poem ‘America’, shows different results in this basic frequency analysis (Fig. 5).
This ‘Whitman’ recording was apparently first played on an NBC radio show in 1951 (so, if authentic, may have been captured from the air with an aluminium instantaneous disc recorder, or an early reel-to-reel tape machine). It then showed up on a tape of early spoken recordings issued in 1974 and was ‘rediscovered’ as a cultural object of interest by Larry Don Griffin when he mentioned it in an article about Whitman and ‘voice’, published in the Whitman Quarterly Review.29
Whitman scholar Ed Folsom, editor of the Whitman Quarterly Review, wishes to believe the recording is authentic (who doesn’t, really) and has focused on the signal-to-noise ratio of the recording to support the claim for its authenticity. Audio technicians at the Library of Congress have argued that the recording is too well equalized, and presents too great a separation of the voice signal from the surface noise of the cylinder that would have formed a part of the overall signal, the surface noise in this case sounding like it ‘is pretty much in the background’.30 Folsom cites the opinion of Dave Beauvais, an expert on 1890s sound recording technologies, to rebut the argument of the Library of Congress technicians, noting that vertically cut Edison cylinder recordings ‘exhibit this superlative richness, balance, and freedom from distortion in the lower and middle portions of the audible spectrum’ and consequently present ‘near-perfect equalization’ in their delivery of an audio signal. Therefore, according to Beauvais, the known frequency range capacity of certain ‘vertical-cut artifacts’ supports the argument for this recording’s authenticity (cited in Folsom, p. 215).
Digital frequency analysis may suggest otherwise — since the frequency range shown even in my fast (and inconclusive) Audacity analysis indicates the likelihood that a condenser microphone, used in electrical recording processes, may have been deployed in capturing this speech. But even recognition of the overly wide frequency range found in the Whitman recording is complicated by the possible migration history of the recording itself (possibly, from cylinder to instantaneous disc or reel-to-reel tape, to cassette tape, to WAV and MP3 files). We simply do not know what kinds of audio enhancement may have been performed to boost or transform the signal during the process of migration. While the findings of a thorough frequency analysis of the Whitman recording — should someone perform one — may never be completely reliable for deciding upon the recording’s authenticity, such digital analysis does reliably lead us to face and consider questions of media migration and interface, and that, I am suggesting, is a very useful thing. In the meantime, the audio signal of what may or may not be Walt Whitman’s voice is included in the audio domain of the Walt Whitman Archive.31 The 36-second recording does not exist to our knowledge in its original wax cylinder format. What we have is a signal that may, or may not, have been preserved through media migration, and the ability to speculate about the status of this signal as an actual archival artefact — perhaps an Edison vertical-cut cylinder — that has been divorced from its original, material medium. Digital visualizations of audible frequencies paint a picture of our desire to substantiate the fugitive signal as materially artefactual and historically authentic.
Visualizations generated from the digitized audio signal can be used in a great variety of ways, beyond the need for preservation or authentication. I will close this exploration of the status of the artefact we identify with an archive of voices with a few examples from a larger set of experiments I have been performing on early acoustic and electrically recorded performances of Alfred Tennyson’s ‘The Charge of the Light Brigade’: namely, recordings of the poem made by Victorian actors and elocutionists Canon Fleming, Lewis Waller, and Henry Ainley.32
Some overarching questions in this case are: What might we (literary and cultural historians) do with such recordings? How might we visualize and describe the prosody of literary performance? How can digital tools help us understand Victorian elocutionary practice? Like Patrick Feaster, who sonifies previously unheard visualizations of sound, we can experiment with software to explore the possibilities of sound visualization for new kinds of engagement with historical spoken recordings.
As we know, modes of recitation that combined prescribed vocal actions with gesture and facial expression were once an important part of the experience and study of literature. Elocution as a prescriptive, performative practice developed in the eighteenth century as a method for a public reader (other than the author) to convey to the hearer the meaning of the writer. Elocution was, as John Rice put it in An Introduction to the Art of Reading (1765), a method for ‘converting Writing into Speech’.33 The process of this conversion involved a self-conscious performance of natural expression. As Jacqueline George has put it, in elocution, ‘the reader must be at once self-consciously constructed and perfectly natural, adhering to the proper rules for reading — pronunciation, pitch, pauses, gestures — without revealing his reading to be a performance, as such’ (p. 374). The reader, in short, attempts to function as a good (that is to say, natural, immediate) vehicle of delivery between text and audience. In this sense elocution is all about enacting a sound interface. It seems counter-intuitive to us today, when we examine the pages of elocution manuals, with their extensive categories and instructions for vocal manipulation, bodily gesture, facial expression, and symbolic systems of annotating texts for performance, to think of them as handbooks for the naturalization of print. We are estranged in significant ways from understanding what an elocutionary model of reading meant in the nineteenth century: what it meant in relation to print media and written composition, social decorum, and the faculty psychology that informed the particular kind of communication circuit it entailed.
While Victorian elocutionary actions may no longer have the desired emotional effects upon us (instead of awe we feel either embarrassed or amused when we hear them), a mapping of such performed speech effects as visualized in a sound recording, in relation to the psychological attributes commonly associated with them in elocution manuals, can help us decipher, to some extent, what a Victorian elocutionary performance was attempting to achieve. This kind of visual mapping may not help us immerse ourselves in the reciter’s affective motives, even with the most extreme attempts at achieving the kind of ‘critical sympathy’ Jerome McGann calls for in books like The Beauty of Inflections and The Poetics of Sensibility, but it may help us to understand how we were supposed to feel and respond to the inflections that we hear.34 In effect, a highly mediated engagement with the audible signal, freed from its original material format and consequently rendered immeasurably transformable, may allow us to experience and comprehend the artefacts of the Victorian archive of voices with greater depth and contextual specificity.
The exploration process I am describing is highly mediated in terms of format, tools, interface, and disciplinary methodology (among other mediating factors). With this set of explorations, I am proceeding from the assumption that linguistics as a discipline, and some of the digital tools used by linguists for phonetic analysis, can help literary scholars develop methods for approaching literary recordings. One tool I have been experimenting with is called Praat, an open source speech analysis software designed by linguists to ‘do phonetics by computer’. Praat provides a means of visualizing and analysing a variety of properties of speech, at various granularities, with highly accurate results.
The Praat software is not a solution for all forms of literary analysis unto itself as it has some significant shortcomings — the need for extensive familiarity with the details of acoustic analysis, an interface that cannot provide ‘distance’ views of the audio signal but is restricted to the analysis of short audio segments, to name just two. Praat is a tool that evokes possibilities of how we might immerse ourselves in the signal of the historical archive of voices in order to find new formal and prosodic properties, but it is specialized and ponderous enough to inhibit any sense of transparency in the pursuit of such immersion.
In using Praat as a way into thinking about historical recordings, I am using a tool from a discipline other than my own (a tool that embodies the assumptions and ideology of that discipline) against the grain of its own design. But I am also taking advantage of the descriptive vocabulary for speech prosody that the discipline of linguistics has developed. Linguistic features such as intonation, stress, phonemes, words, and phrases are not directly measurable in the acoustic signal. They are abstract concepts from linguistics, imposed by the listener, that have to be found there. So, in using software like Praat, we are looking for acoustic correlates (defined by an academic discipline) in a digital visual representation of the sound wave: we use these concepts to annotate (mark up) the waveform representing the audio signal, and then attempt to compute and render them visually for the purpose of analysis. Speech analysis software can tell us about the measurable, acoustic properties of a speech signal, but can only give us clues about linguistic or expressive information conveyed by the signal. So, in saying I am interested in examining an elocutionist’s pitch contours, for example, I am actually saying that I am interested in discovering what a computer rendering of an isolated linguistic concept like ‘pitch’ might reveal about how Victorian trained elocutionists chose to read Tennyson’s poetry out loud.
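To make concrete what a ‘computer rendering’ of pitch amounts to, here is a minimal sketch in Python of one common approach: estimating the fundamental frequency of short frames by autocorrelation. It is a simplified stand-in and not Praat's own, more refined, algorithm; the function names, the 75 to 500 Hz search range, the frame and hop lengths, and the 0.3 voicing threshold are all assumptions chosen for illustration.

```python
import math


def estimate_pitch(frame, sample_rate, fmin=75.0, fmax=500.0, threshold=0.3):
    """Estimate the fundamental frequency (Hz) of one short frame.

    Returns None when no clear periodicity is found (treated as unvoiced).
    """
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return None  # silence: nothing to measure
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(x) - 1)
    best_lag, best_corr = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Normalized autocorrelation: how well the frame matches itself
        # when shifted by `lag` samples.
        corr = sum(x[i] * x[i + lag] for i in range(len(x) - lag)) / energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    if best_lag is None or best_corr < threshold:
        return None
    return sample_rate / best_lag


def pitch_contour(samples, sample_rate, frame_s=0.04, hop_s=0.01):
    """Return a list of (time_s, pitch_hz_or_None), one entry per frame."""
    frame_len = round(frame_s * sample_rate)
    hop = round(hop_s * sample_rate)
    contour = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        contour.append((start / sample_rate, estimate_pitch(frame, sample_rate)))
    return contour


# Hypothetical input: a steady 200 Hz tone standing in for a held vowel.
rate = 8000
tone = [math.sin(2 * math.pi * 200 * i / rate) for i in range(800)]
contour = pitch_contour(tone, rate)
print(contour[0])
```

The sketch underlines the point made above: what the software returns is a measurement of periodicity in the signal, and it is the annotator who decides to read that measurement as ‘pitch’ in the elocutionary sense.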
The annotation approach I have taken with these early recordings of ‘The Charge’ has been to work quite directly from categories of vocal action as described in a large sampling of elocution manuals published between 1800 and 1922, and then to apply certain ubiquitous categories to speech effects heard in the recordings themselves. My analysis of these poems using Praat has thus far been limited to some basic prosodic details, roughly divided into the three overarching categories of Pitch, Duration, and Amplitude (Fig. 6).
Again, such categories are ultimately abstractions of far more complex phenomena. Praat accommodates this: it is built so that such qualities can be tracked in isolation from each other, using Interval Tiers. Each phrase and the spaces between phrases constitute an interval. The first tier displays the text of the poem, and the subsequent tiers are used to annotate phrases that demonstrate speech effects identified with the categories of Pitch, Duration, and Amplitude.
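The logic of this tiered annotation can be sketched as a small data structure. The Python below is an illustration of the idea only: it is neither Praat's file format nor its scripting interface, the class and function names are my own, and the timings and labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Interval:
    start: float  # seconds from the start of the recording
    end: float
    label: str = ""  # an empty label means nothing is annotated here


@dataclass
class IntervalTier:
    name: str
    intervals: list = field(default_factory=list)

    def add(self, start, end, label=""):
        """Append an interval; intervals stay in order and may not overlap."""
        if end <= start:
            raise ValueError("interval must have positive duration")
        if self.intervals and start < self.intervals[-1].end:
            raise ValueError("intervals must not overlap")
        self.intervals.append(Interval(start, end, label))

    def label_at(self, time):
        """Return the label covering `time`, or '' if none does."""
        for interval in self.intervals:
            if interval.start <= time < interval.end:
                return interval.label
        return ""


def annotations_at(tiers, time):
    """Collect what each tier says about a single moment in the recording."""
    return {tier.name: tier.label_at(time) for tier in tiers}


# Hypothetical timings and labels, for illustration only.
text = IntervalTier("Text")
text.add(0.0, 2.1, "Cannon to right of them")
pitch = IntervalTier("Pitch")
pitch.add(0.0, 2.1, "Tremor")
duration = IntervalTier("Duration")
duration.add(0.4, 1.2, "Prolongation")
amplitude = IntervalTier("Amplitude")

print(annotations_at([text, pitch, duration, amplitude], 0.8))
```

Because each tier is independent, a single stretch of the recording can carry a pitch annotation, a duration annotation, both, or neither, which is what allows the categories to be tracked in isolation.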
Take, for example, the prevalent use of a combination of tremor (vibrato) and prolongation (holding or extending a uniform vocal sound) in Waller’s delivery of the lines ‘Cannon to right of them’. We can see the vibrato very clearly in Waller’s pristinely squiggly blue pitch contours (Fig. 7).
Waller delivers the vocal action here as if to represent the fury of the cannons as they are firing. But, in fact, Waller deploys vibrato almost all the way through his reading, giving the entire performance a frantic and relentless feeling of distress. This is an unorthodox use of the effect according to Victorian elocution manuals, in which vocal tremor is meant to be used sparingly to express ‘the condition of suffering, grief, tenderness, and supplication’.35 Fleming and Ainley, on the other hand, in accordance with the advice of the manuals, use vocal tremor very sparingly: in fact, only at what is arguably the final and penultimate stanza for the expression of tenderness and grief, the stanza that asks when the soldiers’ glory shall fade and then apostrophizes the charge, before demanding that the noble six hundred be honoured:
When can their glory fade?
O the wild charge they made!
All the world wonder’d.
Honour the charge they made!
Honour the Light Brigade,
Noble six hundred!36
Note how the line in bold, above, is performed by Fleming and Ainley (Figs. 8, 9). In each case, the technique of vocal tremor (or vibrato) is implemented uniquely at this important moment in the poem.
Vocal force, apparent most obviously in the peak amplitudes of the waveform itself, is a rich and complex category to which Victorian elocution manuals give a great amount of attention. In one sense this category of vocal action pertains directly to the contextual factors of audience and venue. It is a scaled property to be applied in proportion to the size of the auditorium (50, 500, 5000 people), as Robert Fulton, author of Practical Elements of Elocution, explains (Figs. 10a, 10b).37
Fulton develops numerous categories to describe the qualities and gradations of the many possible degrees of force. I will not pretend to understand all of the hairs the elocution manuals are attempting to split with their different qualifications of force, but what seems clear in Fulton, and in quite a few other elocution manuals, is that a sharp command, such as that in the line ‘Charge for the guns! he said’ demands loud or explosive force from the reader.38 I have simply tagged as ‘Force’ the words that are spoken proportionately the loudest in each of the recordings, and each of our three readers (Waller, Fleming, and Ainley) fulfils the mandate of the elocution manuals, as far as the prescription of substantial force for a sharp command goes (Figs. 11, 12, 13).
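A rough computational counterpart to this kind of tagging would be to compare the average level of each annotated phrase. The Python sketch below ranks intervals by root-mean-square amplitude, expressed in decibels relative to the loudest. It is offered as an assumption about how one might proceed, not as a description of how the annotations discussed here were made; the function names are mine, and the signal and timings are synthetic.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a run of samples (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def rank_by_force(samples, sample_rate, intervals):
    """Rank (start_s, end_s, label) intervals from loudest to quietest.

    Returns (label, level_db) pairs, with levels in dB relative to the
    loudest interval, so the loudest interval sits at 0 dB.
    """
    levels = []
    for start_s, end_s, label in intervals:
        chunk = samples[round(start_s * sample_rate):round(end_s * sample_rate)]
        levels.append((label, rms(chunk)))
    loudest = max(level for _, level in levels)
    ranked = sorted(levels, key=lambda item: item[1], reverse=True)
    return [(label,
             20 * math.log10(level / loudest) if level > 0 else float("-inf"))
            for label, level in ranked]


# Synthetic signal: a loud half-second followed by a quiet half-second.
rate = 8000
signal = [(0.8 if i < 4000 else 0.2) * math.sin(2 * math.pi * 200 * i / rate)
          for i in range(8000)]
# Hypothetical interval boundaries, for illustration only.
phrases = [(0.0, 0.5, "Charge for the guns!"), (0.5, 1.0, "he said")]
ranking = rank_by_force(signal, rate, phrases)
print(ranking)
```

Relative rather than absolute levels are used because the loudness of an early recording depends heavily on the recording apparatus and on each subsequent transfer, so only comparisons within a single recording are meaningful.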
One thing worth noting in the delivery of this command, and most prominently evident in the performances of Fleming and, especially, Ainley, is the difference in quality between the explosive command and the narrator’s attribution of that command with the words ‘he said’ (Fig. 14).
Canon Fleming’s idea, articulated in The Art of Reading and Speaking (1896), that ‘the standard of good speaking is to express one’s self, just as one would in earnest conversation’, that is to say, his idea of reading naturally, entails a recognition in performance of the distinction between the voice of the narrator and the voice of the speaker within the poem.39
When Fleming and Ainley read the line, ‘Charge for the guns! he said’, we do hear such a distinction very clearly. While we do not see many broken pitch curves of the kind I have isolated here in Victorian elocutionary recordings, when they do appear we know we are looking at the complexities of pitch variation linguists identify with natural or conversational speech intonation.40 Natural speech intonation appears as broken lines and fluctuating frequency patterns because conversational, semantic-oriented speech is typically characterized by dramatic changes in pitch, enacted on a constant basis for the purpose of semantic and emotional intelligibility.
The deeper we immerse ourselves in such highly mediated representations of the signal — of Victorian voices — the more familiar we become with the elaborate, conceptual properties of the artefacts of the historical voice archive. In a paradoxical way, the further we move from the material artefact itself, the closer we come to touching its seemingly intangible, historically entrenched features. I will close my discussion on this important point as it highlights, again, the capacity of digital media to render our abstract concepts artefactual, even as they remediate material artefacts into sample bits.
This paradox of proximity allows us to imagine other possible modes of engagement with the historical corpus of literary readings (digitally sampled from all possible previous sound media platforms): for example, one that allows new media technology to exercise its own agency of discovery through what are known as unsupervised learning processes. Unsupervised learning seeks to find hidden structures within data (in our case, digitized audio) that has not been marked up for the detection of particular properties (in the manner that I have annotated my tiny data set of Victorian elocutionary recordings for specific manifestations of pitch, duration, and amplitude). This conception of critical ‘listening’ (in quotation marks) moves away from discursive contextualization, historical narrative, and even traditional metadata, into an algorithmic reading of the sound signal itself. It is a vision that has been articulated elegantly by the media theorist Wolfgang Ernst, who argues that the archive ‘is no longer simply a passive storage space but becomes generative itself in algorithmically ruled processuality’ (p. 29).
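As a very small illustration of what finding ‘hidden structures’ in unannotated data can mean, the Python sketch below groups feature vectors with k-means clustering, one of the simplest unsupervised methods. It is one possible instance among many and not a method proposed by Ernst; the function name, the deterministic starting rule, and the two-dimensional feature values (imagined as normalized measures of pitch and loudness for four excerpts) are all hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Group points into k clusters; returns (labels, centres)."""
    # Deterministic start: the first point, then repeatedly the point
    # farthest from every centre chosen so far.
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centres)))

    def nearest(p):
        # Index of the centre closest to point p.
        return min(range(k), key=lambda j: math.dist(p, centres[j]))

    for _ in range(iterations):
        labels = [nearest(p) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties out
                centres[j] = tuple(sum(axis) / len(members)
                                   for axis in zip(*members))
    return [nearest(p) for p in points], centres


# Hypothetical, already normalized features for four unlabelled excerpts.
excerpts = [(0.10, 0.20), (0.15, 0.25), (0.80, 0.90), (0.85, 0.80)]
labels, centres = kmeans(excerpts, k=2)
print(labels)
```

Nothing in the procedure knows what the numbers represent. The grouping it returns still has to be interpreted, which is where the algorithmic and the discursive modes of engagement meet.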
Such a ‘cool’ conception of media, and the anti-discursive, anti-narrative model of historical description and archival structure that it supports, is one among several that I am interested in exploring in relation to the conception and organization of audio poetry archives. It is important, I think, to locate the history of voice recordings within the discursive protocols that informed those historical voices, but it will also prove to be illuminating to explore the resemblances, connections, and reorganizations of the audible archive that digital media enable, that, in Ernst’s parlance, are ‘media inherent’ (p. 29). By engaging in both narrative and discursive as well as ‘media inherent’ methods of engagement, and by articulating the appositions we, as historical scholars, perceive between them, we can benefit significantly from the differential media structures that inform and will increasingly shape our attempts to understand the archive of historical voices.
All hyperlinks within the article text were accessed on 10 October 2015.
1Thomas A. Edison, ‘The Phonograph and its Future’, North American Review, May–June 1878, pp. 527–36 (p. 534).
2Lisa Gitelman, Always Already New: Media, History, and the Data of Culture (Cambridge, MA: MIT Press, 2008), p. 106.
3Thomas A. Edison, ‘The Perfected Phonograph’, North American Review, June 1888, pp. 641–50 (p. 645).
4Jason Camlot, ‘Early Talking Books: Spoken Recordings and Recitation Anthologies, 1880–1920’, Book History, 6 (2003), 147–73.
5R. Murray Schafer, The Soundscape: Our Sonic Environment and the Tuning of the World (Rochester, VT: Destiny Books, 1994), pp. 129–31 (first publ. as The Tuning of the World (New York: Knopf, 1977)).
6Don Ihde, Listening and Voice: A Phenomenology of Sound (Athens: Ohio University Press, 1974), p. 53.
7Friedrich A. Kittler, Gramophone, Film, Typewriter, trans. by G. Winthrop-Young and M. Wutz (Stanford: Stanford University Press, 1999), p. 24.
8Sybille Krämer, ‘The Cultural Techniques of Time Axis Manipulation: On Friedrich Kittler’s Conception of Media’, Theory, Culture & Society, 23.7–8 (2006), 93–109 (p. 94).
9‘artefact’, n. A.1.a., OED online <http://www.oed.com> [accessed 10 October 2015].
10Jerome McGann, ‘Towards Philology in a New Key’, interview by Scott Pound, Amodern, 1 <http://amodern.net/article/interview-with-jerome-mcgann/> [accessed 10 October 2015].
11Jerome McGann, A New Republic of Letters: Memory and Scholarship in the Age of Digital Reproduction (Cambridge, MA: Harvard University Press, 2014), p. 37, emphasis in original.
12Marjorie Perloff, ‘Screening the Page/Paging the Screen: Digital Poetics and the Differential Text’, in Contemporary Poetics, ed. by Louis Armand (Evanston: Northwestern University Press, 2007), pp. 379–92 (p. 379), emphasis in original.
13In the natural sciences, a differential medium refers to a growth (‘culturing’) medium containing compounds that work to visually distinguish microorganisms, either by the way the colony appears, or by the colony’s contrast with the surrounding medium. This idea of the differentiation of content by a medium is relevant to the present discussion insofar as the media through which a signal may migrate will distinguish certain of its characteristics, and erase others. For an expanded theory of differential media as an assemblage of materials (and their circulation) that characterize a ‘single’ cultural production, see Darren Wershler, Guy Maddin’s ‘My Winnipeg’ (Toronto: University of Toronto Press, 2010), pp. 9–11.
14N. Katherine Hayles, ‘Flickering Connectivities in Shelley Jackson’s Patchwork Girl: The Importance of Media-Specific Analysis’, Postmodern Culture, 10.2 (2000) <http://pmc.iath.virginia.edu/text-only/issue.100/10.2hayles.txt> [accessed 10 October 2015] (para. 3 of 58).
15Jay David Bolter and Richard Grusin, Remediation: Understanding New Media (Cambridge, MA: MIT Press, 1999), p. 55.
16Catherine Robson, ‘How We Search Now: New and Old Ways of Digging Up Wolfe’s “Sir John Moore”’, in Virtual Victorians: Networks, Connections, Technologies, ed. by Veronica Alfano and Andrew Stauffer (New York: Palgrave Macmillan, 2015), pp. 11–28 (p. 14).
17See Gitelman, Always Already New; Jonathan Sterne, MP3: The Meaning of a Format (Durham, NC: Duke University Press, 2012); Ian Bogost and Nick Montfort, Racing the Beam: The Atari Video Computer System (Cambridge, MA: MIT Press, 2009); Matthew Kirschenbaum, Mechanisms: New Media and the Forensic Imagination (Cambridge, MA: MIT Press, 2008) and Track Changes: A Literary History of Word Processing (Cambridge, MA: Harvard University Press/Belknap Press, forthcoming); Jussi Parikka, What is Media Archaeology? (Cambridge: Polity Press, 2012); Wolfgang Ernst, Digital Memory and the Archive, ed. by Jussi Parikka, Electronic Mediations, 39 (Minneapolis: University of Minnesota Press, 2013).
18Jussi Parikka, ‘Archival Media Theory: An Introduction to Wolfgang Ernst’s Media Archaeology’, in Ernst, Digital Memory and the Archive, pp. 1–22 (p. 14).
19Alan Liu, ‘The State of the Digital Humanities: A Report and a Critique’, Arts and Humanities in Higher Education, 11 (2012), 8–41 (p. 11).
20Brian Massumi, Parables for the Virtual: Movement, Affect, Sensation (Durham, NC: Duke University Press, 2002), p. 6.
21Lisa Gitelman, ‘Souvenir Foils: On the Status of Print at the Origin of Recorded Sound’, in New Media, 1740–1915, ed. by Lisa Gitelman and Geoffrey B. Pingree (Cambridge, MA: MIT Press, 2003), pp. 157–73 (p. 166).
22Susan Stewart, On Longing: Narratives of the Miniature, the Gigantic, the Souvenir, the Collection (Durham, NC: Duke University Press, 1993), p. 136.
23The image in Fig. 3 depicting a phonautogram of the human voice from a distance (‘phonautographie de la voix humaine à distance’) is reproduced from Serge Benoit, Daniel Blouin, Jean-Yves Dupont, and Gérard Emptoz, ‘Chronique d’une invention: le phonautographe d’Édouard-Léon Scott de Martinville (1817–1879) et les cercles parisiens de la science et la technique’, Documents pour l’histoire des techniques, 17 (2009), 69–89 (p. 74) <http://dht.revues.org/502> [accessed 10 October 2015].
24The First Sound researchers include David Giovannoni, Patrick Feaster, Richard Martin, and Meagan Hennessey. Their work focuses on identifying, understanding, and playing very early sound recordings.
25For a useful description of Scott’s phonautograph, and a ‘discography’ of the phonautograms Scott produced, see Patrick Feaster, ‘Édouard-Léon Scott de Martinville: An Annotated Discography’, ARSC Journal, 41 (2010), 43–81.
26Patrick Feaster, ‘Paleospectrophony’, <http://www.phonozoic.net/paleospectrophony.html>. See also Patrick Feaster, ‘What is Paleospectrophony?’, <https://griffonagedotcom.wordpress.com/2014/10/15/what-is-paleospectrophony/> [both accessed 10 October 2015].
27‘A Museum of Voices’, Literary Digest, 15 September 1906, pp. 347–48.
28Alan Riding, ‘From a Vault in Paris, Sounds of Opera 1907’, New York Times, 16 February 2009, section C, p. 3 <http://www.nytimes.com/2009/02/17/arts/music/17vaul.html> [accessed 10 October 2015].
29William Grimes, ‘Poem is Whitman’s: Is the Voice?’, New York Times, 16 March 1992, section B, pp. 1–2 <http://www.nytimes.com/1992/03/16/books/poem-is-whitman-s-is-the-voice.html> [accessed 10 October 2015].
30Ed Folsom, ‘The Whitman Recording’, Whitman Quarterly Review, 9 (1994), 214–16 (p. 215).
31‘Pictures & Sound: Audio’, Walt Whitman Archive <http://www.whitmanarchive.org/multimedia/audio.html> [accessed 10 October 2015].
32Lewis Waller, ‘Recitation — Mr. Lewis Waller — Charge of the Light Brigade’ (Gramophone Company, 1907); Canon Fleming, ‘The Charge of the Light Brigade’ (Gramophone and Typewriter Company, 1906); Henry Ainley, ‘The Charge of the Light Brigade (Tennyson)’ (HMV, 1915).
33Cited in Jacqueline George, ‘Public Reading and Lyric Pleasure: Eighteenth Century Elocutionary Debates and Poetic Practices’, ELH, 76 (2009), 371–97 (p. 375).
34Jerome McGann, The Beauty of Inflections: Literary Investigations in Historical Method and Theory (Oxford: Clarendon Press, 1988), p. 202; The Poetics of Sensibility: A Revolution in Poetic Style (Oxford: Oxford University Press, 1998), p. 136.
35James Rush, The Philosophy of the Human Voice (Philadelphia: Lippincott, 1867), p. 447.
36‘The Charge of the Light Brigade’, in Tennyson: A Selected Edition, ed. by Christopher Ricks (Berkeley: University of California Press, 1989), p. 511.
37Robert I. Fulton and Thomas C. Trueblood, Practical Elements of Elocution (Boston: Ginn, 1893), pp. 151, 153.
38See, for example, S. S. Hamill, New Science of Elocution (New York: Philips & Hunt, 1886); and George Lansing Raymond, The Orator’s Manual (London: Putnam’s Sons, 1910).
39James [Canon] Fleming, The Art of Reading and Speaking, 2nd edn (London: Edwin Arnold, 1904), p. 53.
40For a discussion of intonation in conversation within the context of the Spoken English Corpus (SEC) consisting of a range of speech styles, mostly scripted and/or read, see Anne Wichmann, Intonation in Text and Discourse: Beginnings, Middles, Ends (New York: Routledge, 2013), pp. 123–47.