continent. maps a topology of unstable confluences and ranges across new thinking, traversing interstices and alternate directions in culture, theory, biopolitics and art.
Issue 5.3 / 2016:

Sounding Silence

Jacob Gaboury

What is the sound of an empty room? All spaces encase ambient sound shaped by the acoustic affordances of their dimensions, and this shaping forms the silence that holds the potential for sound to be heard. This same acoustic infrastructure likewise informs sounds that are produced in that space, leaving a trace that marks a sound’s initial mediation as it travels to the body of a listener. Indeed, the unique distorting effect of any given space is most present when it resonates in this way, that is, when what was in the first instance silence is transformed into the noise that muddies or distorts a desired sound. Put another way: “Every room sounds different.”[1]

“Introducing Sonos Trueplay feat. Ratatat ‘Cream On Chrome’” Sonos (2015)

In 2015 the American hi-fi audio company Sonos released its Trueplay audio technology for its line of WiFi-connected speakers. Trueplay offers users the ability to acoustically map any room in order to calibrate their Sonos speaker to adapt to and negate the acoustic effect of a given space. “With Trueplay we are able to use your iPhone to listen to the different effects of […] objects in your room, and then adapt the speaker so that it sounds correct even though it is in a poorly placed acoustic location. […] Now any room, from a large glass living room to a small tile bathroom, can give you perfect sound clarity.”[2] The software accomplishes this negation by calculating an infinite impulse response filter using the Sonos iOS application, which the user runs while sonically “scanning” the room by moving their phone or tablet in a series of long sweeping gestures.[3] The filter is then stored on the speaker to modify playback, such that the effect of that space on the sound of the speaker is erased. The technology promises to more closely reproduce music as the producer originally intended it to be heard, assuming that such a “correct” sound would replicate its original recording in a studio constructed to erase ambient noise and interference. In other words, a correct sound for Trueplay is a sound entirely divorced from the acoustic shape of a particular space, a sound that has been thoroughly abstracted.
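Sonos has not published the design of its correction filters, but the general logic of taming a room resonance with an infinite impulse response filter can be sketched with a standard peaking-EQ biquad. Everything below is a minimal illustration rather than Trueplay's implementation: the 120 Hz room mode, sample rate, and gain are hypothetical values, and the coefficient formulas follow the widely circulated Audio EQ Cookbook.

```python
import math

def peaking_biquad(f0, fs, gain_db, q):
    # Peaking-EQ coefficients (Audio EQ Cookbook); a negative gain_db
    # cuts the signal in a narrow band around f0, taming a room mode
    a = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = (1 + alpha * a, -2 * math.cos(w0), 1 - alpha * a)
    a0 = 1 + alpha / a
    return (tuple(x / a0 for x in b),
            ((-2 * math.cos(w0)) / a0, (1 - alpha / a) / a0))

def iir_filter(x, b, a):
    # Direct-form I difference equation; the feedback terms (a) are what
    # make the impulse response infinite, decaying asymptotically to zero
    y = [0.0] * len(x)
    for n in range(len(x)):
        y[n] = b[0] * x[n]
        if n >= 1:
            y[n] += b[1] * x[n - 1] - a[0] * y[n - 1]
        if n >= 2:
            y[n] += b[2] * x[n - 2] - a[1] * y[n - 2]
    return y

# A hypothetical room mode at 120 Hz, cut by 6 dB before playback
b, a = peaking_biquad(f0=120.0, fs=44100.0, gain_db=-6.0, q=4.0)
```

Trueplay derives its measured filters from the phone's microphone sweep rather than from values fixed by hand; the point here is only the shape of the computation: a handful of multiply-adds per sample, whose feedback terms give the filter the asymptotically decaying, "infinite" response described in note 3.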

This claim for total abstraction can be viewed as the solution to a challenge that has plagued recorded sound from its inception, often referred to by sound engineers and audio enthusiasts as the “second-venue problem.” The term refers to the alienation of a sound captured and shaped by one acoustic infrastructure that is then reproduced or simulated in another. The remediation of this second venue produces an uncanny dissonance in which the disparate shape of each space may be felt. With any recorded sound, the space of its recording acts as an instrument that shapes the sound prior to its capture by a body or recording device. What we experience as the silence of an empty room is in this sense the resonant potential of a particular environment. In reproducing that sound in a second space, this shaping is made present in its decontextualization.

“Sonos Trueplay: Behind the Scenes” Sonos (2015)

While the second venue problem is as old as our ability to capture and reproduce sound, it has not always been identifiable as a problem per se. In fact one of the earliest promises of audio recording was, in part, its ability to indexically capture and reproduce sound in its totality, including any ambient noise that marked the space of its recording. Much as with early cinema, in which audiences were transfixed less by the movement of bodies than by what Dai Vaughan has termed the incidentals of scenes — smoke from a forge, steam from a locomotive, brick dust from a demolished wall — early audio recording was able to capture not only the intentional performance of sound, but also the incidental and ambient noise that helped to shape it.[4] Of course, this indexical claim was troubled early on by the advent of synthetic sound in the 1930s, which fundamentally changed the ontological stability of all recorded sound – a shift later exacerbated by digitization.[5] This shift can be understood not simply against some unreconstructed notion of the indexical, but for the way it comes to shape the desires and expectations of a presumed listener. Put another way, our understanding of sound as mutable changes both our expectation that all recorded sound may be manipulated and our sense of what a neutral, authentic, or pure sound might be. Of interest to me here is less the manipulation of recorded sound in terms of fidelity, veracity, or liveness, than the manipulation of sound in order to silence or otherwise erase the contextual noise that marks its mediation.[6] In other words: noise reduction.[7]

Erasing the Medium

It is not a coincidence that the development of noise reduction technology coincides with some of the earliest experiments in digital audio in the early 1960s. The first commercial noise reduction technology was developed by Ray Dolby in 1965, based on the logic of a homomorphic compander.[8] Companding is a signal processing technique that reduces noise by first compressing the dynamic range of the source material in anticipation of its being recorded in a relatively noisy medium. Then, on playback, the noisy encoded material is passed through an expander that restores the original dynamic range of the source material. The contaminating or noisy signal is masked by this dynamic expansion process, resulting in a significant reduction in perceived noise. In Dolby’s case the noise to be eliminated was that of the recording medium itself — the hiss of magnetic tape — but a similar logic could be applied to the reduction of ambient sound produced by the space of production. Companders have been used in signal processing since the early twentieth century, particularly in early telephony, as they allowed signals with a large dynamic range to be transmitted over channels with a smaller dynamic range. They are not by necessity digital technologies, but rather technologies for the processing of signals, acoustic or otherwise.
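The compress-record-expand sequence can be sketched in a few lines of Python. The μ-law companding curve below is a standard textbook choice rather than Dolby's proprietary circuit, and the "tape hiss" is simulated as additive noise injected between the compressor and the expander:

```python
import numpy as np

def compress(x, mu=255.0):
    # Compress dynamic range: quiet passages are boosted before they
    # meet the noisy recording medium (mu-law companding curve)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def expand(y, mu=255.0):
    # Exact inverse of compress: restore the original dynamic range on
    # playback, pushing the medium's noise down along with it
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

def snr_db(clean, noisy):
    # Signal-to-noise ratio of a degraded copy against the clean original
    return 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 8000, endpoint=False)
source = 0.05 * np.sin(2 * np.pi * 440 * t)   # quiet source material
hiss = 0.01 * rng.standard_normal(t.size)     # noise of the recording medium

plain = source + hiss                          # recorded directly
companded = expand(compress(source) + hiss)    # compressed, recorded, expanded

print(f"SNR without companding: {snr_db(source, plain):.1f} dB")
print(f"SNR with companding:    {snr_db(source, companded):.1f} dB")
```

Because the expander exactly inverts the compressor, the source survives intact while the hiss, added in the compressed domain, is pushed well below its original level on expansion.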

TX-2 Computer at MIT Lincoln Labs, ca. 1963

While Dolby was the first to develop a commercial noise reduction system, the very first homomorphic compander for digital audio was developed by Thomas Stockham at MIT’s Lincoln Laboratory beginning in 1959. At the time Stockham was an Associate Professor of Computer Science, working with then-colleague Amar Bose on some of the earliest experiments in digital audio using the transistor-based TX-2 computer.[9] Bose and Stockham were primarily interested in acoustic applications for digital sound, specifically in the challenge of the second-venue problem and a desire to neutralize the acoustic specificity of this second space. As part of their research Bose identified an experimental site intended to mimic the ideal home listening environment — a prototypical living room — and, using digital signal processing, developed a speaker designed to minimize the acoustic effect of that space.[10] The result was the Bose 2201, made up of a collection of 22 small mid-range speakers dispersed over an eighth of a sphere and designed to sit in the corner of a room. In 1966 it would become the Bose Corporation’s first loudspeaker product.

Bose 2201 Loudspeaker (1966)

Image by Hideya Hamano, Licensed Under CC BY-NC-ND 2.0

Bose 2201 Owner’s Manual (1966)

As part of this work with Bose at MIT, Stockham produced what are considered the first digital recordings of high-fidelity sound, in the process reducing the computation time required for digital recording from roughly twenty hours per second of music to seven minutes per second.[11] Stockham continued to work on digital signal processing throughout the 1960s, but left MIT in 1968 to accept a tenured position in the newly formed computer science program at the University of Utah, an ARPA-funded “Center for Excellence” for research into early computer graphics, led by graphics pioneer David C. Evans. At Utah Stockham expanded this early work, developing a technique he termed “blind deconvolution,” whereby two convolved or mixed signals could be computationally disentangled without any knowledge of either signal in isolation.[12] Stockham first applied this technique to image processing, digitally restoring images that had been artificially blurred or pixelated, but his most ambitious application was the restoration of the voice of the opera singer Enrico Caruso.[13]
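The intuition behind blind deconvolution can be sketched numerically. In the log-spectral domain convolution becomes addition, so a channel that colors every frame of a recording identically can be estimated by averaging the log spectra of many frames of varying source material: the changing source averages out to a flat constant while the fixed channel remains. The "horn" resonance curve below is an invented toy, not the transfer function Stockham recovered:

```python
import numpy as np

rng = np.random.default_rng(1)
n_frames, frame_len = 200, 256

# A fixed, smooth "horn" coloration applied identically to every frame
freqs = np.linspace(0.0, 1.0, frame_len)
horn = 1.0 / (1.0 + 8.0 * (freqs - 0.3) ** 2)

# Many frames of varying source material pass through the same channel;
# convolution in time is multiplication in frequency
frames = rng.standard_normal((n_frames, frame_len))
degraded = np.fft.fft(frames, axis=1) * horn

# Taking logs turns that multiplication into addition; averaging over
# frames cancels the varying source and isolates the constant channel
avg_log = np.log(np.abs(degraded)).mean(axis=0)
est_horn = avg_log - avg_log.mean()            # channel estimate, zero-mean
true_horn = np.log(horn) - np.log(horn).mean()
```

With no access to the clean source or the channel in isolation, the averaged log spectrum recovers the horn's shape up to an overall gain. Stockham's actual procedure was considerably more refined, comparing long-term spectral averages of the degraded recordings against those of comparable modern recordings.[13]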


Gramophone Co., Ltd.

Caruso was perhaps the most famous singer of the early twentieth century, due in no small part to the availability of his performances as early gramophone recordings produced by the Victor Talking Machine Company from roughly 1904 until the singer’s death in 1921. These early recordings were made in an era prior to the invention of electrical recording, and so Caruso’s voice was captured using a mechanical horn that would resonate along with the music, distorting the sound of his voice. It was Stockham’s goal to recover the true voice of Caruso by deconvolving the horn’s acoustic characteristics from the voice itself, to extract the voice by silencing the noise of the medium. The success of these experiments would lead Stockham to found Soundstream, Inc. — the first digital recording company in the United States.[14]

Caruso: A Legendary Performer, Soundstream, Inc. 1975

Here we have perhaps the first example of digital noise reduction, a now ubiquitous practice readily available in any audio editing software and used in nearly all professional audio recording. Of course it would be difficult to identify a single moment in which a desire and expectation for noiseless audio recording took hold, as the conditions of any recording are mutually shaped by the technical affordances and cultural expectations of that event. As Jonathan Sterne has argued, the performance of a sound is shaped in anticipation of its recording, and as such speaking of recording as a unidirectional indexing of some a priori unmediated event is, at best, disingenuous.[15] Nonetheless we can identify a tendency toward noiseless or otherwise silenced recordings, and toward the erasure of those environments that shape and mediate a given sound.

Empty Rooms

Jacob Kirkegaard “Bloku” (2008) Documentation of 4 Rooms

“Swimming Pool” (2006) 

How might we begin once again to perceive the mediation offered by these kinds of acoustic infrastructures? Here at last I turn to the work of the Danish sound artist Jacob Kirkegaard whose practice engages the sonification of otherwise imperceptible noise through the mediation of recording technology. Kirkegaard’s fascination with the ambient and imperceptible is perhaps best exemplified by his 4 Rooms series, a sonic portrait of four abandoned rooms inside the “Zone of Exclusion” in Chernobyl, Ukraine.

Recorded in October 2005, the sound of each room was evoked by an elaborate method: Kirkegaard made a ten-minute recording and then played the recording back into the room, recording it again. This process was repeated up to ten times. As the layers grew denser, each room slowly unfolded its own unique drone of various resonant frequencies. Rather than produce a sound that might be used to make present the reverberant quality of each space, Kirkegaard sounds and re-sounds the silence of each room until the ambient noise reveals the underlying acoustic shape of that otherwise empty space.
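The cumulative effect of this layering can be modeled simply: each replay-and-re-record pass applies the room's frequency response once more, so after ten passes that response has been raised to the tenth power and its resonant peaks swamp everything else. The single 300 Hz "room mode" below is a toy stand-in for a real room's response, and the first take is modeled as broadband room tone:

```python
import numpy as np

fs = 8000
n = fs  # one second of audio
freqs = np.fft.rfftfreq(n, 1 / fs)

# Toy "room": a gentle frequency response with one resonance at 300 Hz
room = 0.5 + 0.5 * np.exp(-((freqs - 300.0) ** 2) / (2 * 40.0 ** 2))

rng = np.random.default_rng(2)
take = np.fft.rfft(rng.standard_normal(n))  # first take: broadband room tone

# Each replay-and-re-record pass applies the room's response again
for _ in range(10):
    take = take * room

spectrum = np.abs(take)
peak_hz = freqs[np.argmax(spectrum)]
print(f"dominant frequency after 10 passes: {peak_hz:.0f} Hz")
```

Frequencies the room favors are amplified exponentially with each pass while everything else decays, which is why the layered takes converge on a drone pitched at the room's own resonances rather than on any sound a body made in it.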

Here Kirkegaard follows in the tradition of earlier minimalist and ambient composers such as Alvin Lucier and R. Murray Schafer, but in distinction to those works, here the body is almost entirely absent.[16] Indeed, in producing each take Kirkegaard left the room for the duration of the recording, such that the sound that is amplified is the resonant frequency of the room articulated not by the speech of a body but by the recording medium itself.

Jacob Kirkegaard, Labyrinthitis (2008)

When the body does appear in Kirkegaard’s work, it is deployed as a medium for sound in place of a sensing subject. In his 2008 work Labyrinthitis Kirkegaard explores the phenomenon of otoacoustic emission, in which the ear can — spontaneously or in response to a specific pair of tones — resonate to produce a third tone, a tone generated by the ear itself and audible to others. For Labyrinthitis Kirkegaard recorded the sound of his own otoacoustic emissions in a soundproof chamber and then tone-shifted the noise to produce an extended 40-minute composition.[17] When performed this composition will, in turn, invoke otoacoustic emissions in the ears of its listeners — an uncanny sensation in which a third tone seems to resonate and move inside the listener’s head.[18] Here the body acts as a resonant medium for sound – a medium that, much as with 4 Rooms, is only made sensible through its externalization by means of a recording technology.
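The arithmetic of this "third tone" is that of a distortion product: a compressive nonlinearity fed two primary tones at f1 and f2 generates a combination tone at 2f1 − f2, the component typically measured in otoacoustic-emission testing. The cubic curve below is a crude algebraic stand-in for cochlear mechanics, not a physiological model; the tone frequencies are arbitrary illustrative choices:

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs                       # one second of signal
f1, f2 = 1000.0, 1200.0

# Two "primary" tones, as presented in a distortion-product emission test
x = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

# A toy cubic nonlinearity standing in for the ear's compressive
# response; any such nonlinearity generates combination tones
y = x + 0.1 * x ** 3

spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(t.size, 1 / fs)

# The cubic term produces a "third tone" at 2*f1 - f2 = 800 Hz,
# a frequency present in neither primary
dp = spectrum[np.argmin(np.abs(freqs - (2 * f1 - f2)))]
```

Expanding the cubic of a two-tone sum yields, among other terms, a component at 2f1 − f2; in the ear that tone is generated mechanically and radiated back out, which is what allowed Kirkegaard to record it.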

Herein lies the dual function of mediation: all media obscure the signals they mediate, but in doing so they make sensible the properties of that mediation. While our desire may be to silence these noisy channels and to disappear those infrastructures that shape the production of sound, this erasure is itself an act of mediation. All silence is shaped by the affordances of its medium, be it the shape of a room, an algorithm for deconvolution, or the resonant potential of the body itself. In examining this silence we make present its contextual mediation.

[1] Introducing Sonos Trueplay feat. Ratatat “Cream On Chrome”

[2] Ibid. n. page.

[3] An infinite impulse response (IIR) filter is distinguished by an impulse response that tends asymptotically toward zero, continuing, in theory, indefinitely. IIR filters are more efficient to implement than finite impulse response filters and require less computational power, part of the reason they have become the de facto standard implementation and why they were used for these early experiments.

[4] Vaughan, Dai. “Let there be Lumière” in The Documentary Film Reader: History, Theory, Criticism, Jonathan Kahana ed. Oxford: Oxford University Press (2016) 46. Cited in Jordan Schonig “Kant’s ‘Fuzzy Objects’: Rethinking Contingency From Early Cinema to CGI” Precarious Aesthetics, Oct 15-17, 2015, University of California, Berkeley. See also Mary Ann Doane, The Emergence of Cinematic Time. Cambridge, MA: Harvard University Press (2002) 62-63, 129.

[5] Levin, Thomas “Tones from out of Nowhere” New Media, Old Media: A History and Theory Reader, Wendy Hui-Kyong Chun and Thomas Keenan, eds. New York: Routledge (2006) 69-71.

[6] Discussions of sound fidelity are deeply entangled in cultural, historical, and technological shifts in the production of sound and the expectations these shifts engender. For more on sound fidelity, see Auslander, Philip. Liveness: Performance in a Mediatized Culture. New York: Routledge (1999) 61–111; Lastra, James. Sound Technology and American Cinema: Perception, Representation, Modernity. New York: Columbia University Press (2000) 123–153; Sterne, Jonathan. The Audible Past: Cultural Origins of Sound Reproduction. Durham: Duke University Press (2003) 215–286.

[7] While it might seem logical to focus on synthetic audio technologies such as the vocoder or the theremin, or even more recent and familiar processing technologies such as Auto-Tune, what interests me here is not the production of synthetic sound but the synthetic erasure of ambient noise. See Mills, Mara. “Media and Prosthesis: The Vocoder, the Artificial Larynx, and the History of Signal Processing.” Qui Parle: Critical Humanities and Social Sciences 21, 1 (Fall/Winter 2012) 107-149, and Glinsky, Albert. Theremin: Ether Music and Espionage. Champaign, IL: University of Illinois Press (2000).

[8] A competing technology was developed simultaneously by David E. Blackmer for dbx laboratories in 1971, later incorporated into dynamic noise reduction or DNR systems for telephony in 1981. It should be noted that noise reduction by means of soundproofing is as old as mechanical recording itself, and evolved along with recording technology over the course of the 20th century. See: Horning, Susan Schmidt. Chasing Sound: Technology, Culture, and the Art of Studio Recording from Edison to the LP. Baltimore: Johns Hopkins University Press (2013).

[9] The TX-2 is perhaps best known as the machine used by Ivan Sutherland to develop his highly influential Sketchpad computer graphics program in 1963.

[10] They began by setting up a microphone and recording music produced by the speaker in this "ideal" listening room. They then found the impulse response of the room by setting off a spark – an ideal speaker – in the corner where the speaker had been and recording it over and over again. By convolving the music with the spark recording they could measure how much poorer the loudspeaker was than the spark, and develop filters to neutralize that difference. See: Levitin, Daniel. “Signal Parent: Thomas Stockham on the birth of digital audio.” National Academy of Recording Arts & Sciences Journal, Vol. 5(1), 1994: 10.
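The measure-predict-correct loop the note describes can be sketched as follows. The three-tap echo pattern standing in for the room, and the direct spectral inversion, are illustrative simplifications; a real correction system would need a regularized inverse and non-circular convolution.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 512
music = rng.standard_normal(n)            # stand-in dry source material

# "Spark" measurement: an impulse played in the room records the room's
# impulse response directly (here a toy direct path plus two echoes)
room_ir = np.zeros(n)
room_ir[0] = 1.0
room_ir[40] = 0.5     # an early reflection
room_ir[90] = 0.25    # a later, weaker reflection

# Convolving the dry music with the measured response predicts what the
# room will do to it (circular convolution via the FFT, for brevity)
in_room = np.fft.ifft(np.fft.fft(music) * np.fft.fft(room_ir)).real

# A neutralizing filter divides the measured response back out
inverse = np.fft.ifft(1.0 / np.fft.fft(room_ir)).real
restored = np.fft.ifft(np.fft.fft(in_room) * np.fft.fft(inverse)).real
```

Applying the inverse filter recovers the dry signal from the room-colored one, which is the logic behind both Bose's corner speaker experiments and, decades later, consumer room correction.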

[11] Bose, Amar G. “Alumni Profile” RLE Currents Vol. 8 No. 1 (Spring 1996):  8

[12] Oppenheim, Alan V., Ronald W. Schafer, and Thomas G. Stockham Jr. "Nonlinear filtering of multiplied and convolved signals." Audio and Electroacoustics, IEEE Transactions on 16, no. 3 (1968): 437-466.

[13] Stockham Jr, Thomas G., Thomas M. Cannon, and Robert B. Ingebretsen. "Blind deconvolution through digital signal processing." Proceedings of the IEEE 63, no. 4 (1975): 678-692.

[14] Over the course of the 1970s Soundstream worked with classical and popular musicians to produce some of the first commercial digital audio recordings with artists from the Cincinnati Symphony and John Williams to Fleetwood Mac and Linda Ronstadt.

[15] Sterne, Jonathan. “The Death and Life of Digital Audio.” Interdisciplinary Science Reviews Vol. 31, No. 4 (2006): 342.

[16] Here I refer most explicitly to Alvin Lucier's minimalist performance "I am sitting in a room" [1970], in which the artist recorded the sound of his own voice in a room and repeatedly played this recording back into that same space in an effort to capture “the natural resonant frequencies of the room articulated by speech.” 4 Rooms evokes the soundscape work of R. Murray Schafer, particularly his 1977 book The Tuning of the World.

[17] The initial recording was made in a soundproof chamber using a small pair of speakers and microphones, an environment that clearly evokes John Cage’s 1951 visit to the anechoic chamber at Harvard University that inspired his now infamous 4’33”.

[18] I am describing here my own experience with the work, which was performed in 2013 at the Visions of the Now conference in Stockholm, Sweden at which I was also a speaker.