Audio
Overview
This document is intended to act as a teaching tutorial for sound terminology, theory and practice, across multiple disciplines, but focusing on acoustics, psychoacoustics, environmental acoustics, electroacoustics, speech acoustics, audiology, noise and soundscape studies. In many cases, we draw comparisons between these disciplines and attempt to explain their basic models and how they differ, beginning with the Introductory module.
INTRODUCTION: Sound is .....
A survey of basic concepts in each discipline
1. Sound-Medium Interface
ACOUSTIC
2. Vibration: Frequency and Pitch
3. Vibration: Spectrum and Timbre
4. Magnitude: Levels and Loudness
5. Sound-Environment Interaction
6. Binaural Hearing and Acoustic Space
7. Sound-Sound Interaction
8. Speech Acoustics
9. Audiology and Hearing Loss
10. Effects of Noise and Noise Measurement Systems
ELECTROACOUSTIC
11. Field Recording
12. Filters and Equalization
13. Modulation and Auto-Convolution
14. Time Delays and Phasing
15. Time Delays and Reverberation
16. Dynamic Range and Compression
17. Microsound and Granular Synthesis
18. Voice and Text-based Composition
19. Soundscape Composition
The Tutorial is divided into a number of modules which are designed to cover a particular topic similar to a lab-based class or a set of studio demos. They are divided into an Acoustic set and an Electroacoustic set. Subtopics in each module can be accessed separately by a link in the series A, B, C, etc.
Interdisciplinary Thematic Search Engine
The subject matter of this document is organized according to various themes, the first five of which are traced through various subdisciplines, each of which treats the theme differently. The relevant terms for each theme and each discipline are grouped together. The themes are:
Analytical Dimensions of Sound
Magnitude
Vibration
Levels of Acoustic Interaction
Sound - Medium Interface
Sound - Environment Interaction
Sound - Sound Interaction
Specific Subdisciplines
Audiology and Hearing Loss
Noise Measurement Systems
Electroacoustic and Tape Studio Terms
Linguistics and Speech Acoustics
Communications Theory
The principal discipline which is the "home" for each term is indicated by an icon, as follows:
acoustics, psychoacoustics, soundscape, noise, electroacoustics, linguistics, audiology, music [1]
Terms that are found in more than one discipline are indicated as follows: Acoustics / Electroacoustics
Components of Electronic Instrument
If we want to play some music, sound has to be generated somehow, right? Then the first family of modules that we are going to tackle is that of the sound sources: oscillators, noise sources, and samplers, mainly.
Oscillators
Oscillators are those modules that generate a pitched tone. Their frequency content varies depending on the waveform that is generated. Historically, only simple waveforms were generated, according to the electronic knowledge available. Typical waveforms are thus triangular, rectangular, sawtooth, and sinusoidal. These are all very simple waveforms that can be obtained by a few discrete components. From simple designs come simple spectra: their shape is very straight and unnatural, thus requiring additional processing to obtain pleasant sounds. Their spectral properties are discussed in Section 2.8, after the basic concepts related to frequency-domain analysis have been discussed (Figure 1.5).
Figure 1.4: Connection between two modules. The output of Module 1 is connected to the input of Module 2 by a TS jack. This way, the input voltage of Module 2 follows the output voltage of Module 1, and thus the signal is conveyed from Module 1 to Module 2.
Figure 1.5: Typical synthesizer oscillator waveforms include (from left to right) sawtooth, triangular, and rectangular shapes.
Oscillators usually have at least one controllable parameter: the pitch (i.e. the fundamental frequency they emit). Oscillators also offer control over some spectral properties. For example, rectangular waveform oscillators may allow pulse width modulation (PWM) (i.e. changing the duty cycle Δ, discussed later). Another important feature of oscillators is the synchronization to another signal.
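As a rough illustration of the waveforms named above, the following Python sketch computes naive (non-band-limited) versions of each shape from a normalized phase. The function names and the `duty` parameter are illustrative choices, not taken from any particular synthesizer; `duty` stands in for the duty cycle that PWM varies.

```python
import math

def phase(t, freq):
    """Normalized phase in [0, 1) of an oscillator at time t (seconds)."""
    return (t * freq) % 1.0

def sine(t, freq):
    return math.sin(2.0 * math.pi * freq * t)

def sawtooth(t, freq):
    # Rising ramp from -1 to +1 over each period.
    return 2.0 * phase(t, freq) - 1.0

def triangle(t, freq):
    # Rises from -1 to +1 in the first half period, falls back in the second.
    p = phase(t, freq)
    return 4.0 * p - 1.0 if p < 0.5 else 3.0 - 4.0 * p

def rectangular(t, freq, duty=0.5):
    # +1 for the first `duty` fraction of each period, -1 for the rest.
    # Changing `duty` over time is what pulse width modulation does.
    return 1.0 if phase(t, freq) < duty else -1.0
```

With `duty=0.5` the rectangular wave is a square wave; moving `duty` away from 0.5 changes the balance of harmonics, which is why PWM is heard as a timbre change and not a pitch change.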
Synchronization to an external input (a master oscillator) is available on many oscillator designs. So-called hard sync allows an external rising edge to reset the waveform of the slave oscillator and is a very popular effect to apply to oscillators. The reset implies a sudden transient in the waveform that alters the spectrum, introducing high-frequency content. Other effects known as weak sync and soft sync have different implementations. Generally, with soft sync, the oscillator reverses direction at the rising edge of the external signal. Finally, weak sync is similar to hard sync, but the reset is applied only if the waveform is close to the beginning or ending of its natural cycle. It must be noted, however, that there is no consensus on the use of the last two terms, and different synthesizers have different behaviors. All these synchronization effects require the slave and the master to run at different periods.
More complex oscillators have other ways to alter the spectrum of a simple waveform (e.g. by using waveshaping). Since there are specific modules that perform waveshaping, we shall discuss them later. Oscillators may allow frequency modulation (i.e. roughly speaking, controlling the pitch with a high-frequency signal). Frequency modulation is the basis for FM synthesis techniques, and can be either linear or logarithmic (linear FM is the preferred one for timbre sculpting following the path traced by John Chowning and Yamaha DX7’s sound designers).
To conclude, tone generation may be obtained from modules not originally conceived for this aim, such as an envelope generator (discussed later) triggered at an extremely high frequency.
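Returning to the hard sync described earlier in this section, the behavior can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the master is represented only by its phase, the wrap of its cycle stands in for the rising edge, and the slave is a naive sawtooth. The function name and its parameters are hypothetical.

```python
def hard_sync_saw(master_freq, slave_freq, sample_rate, num_samples):
    """Naive sawtooth slave oscillator, hard-synced to a master phase."""
    master_phase = 0.0
    slave_phase = 0.0
    out = []
    for _ in range(num_samples):
        out.append(2.0 * slave_phase - 1.0)
        master_phase += master_freq / sample_rate
        slave_phase += slave_freq / sample_rate
        if master_phase >= 1.0:
            # The master starts a new cycle: this resets the slave waveform.
            master_phase -= 1.0
            slave_phase = 0.0
        elif slave_phase >= 1.0:
            # Natural wrap of the slave between two master resets.
            slave_phase -= 1.0
    return out
```

The output repeats at the master period even though the slave runs at its own rate, and the abrupt reset is the transient that adds high-frequency content. A production oscillator would also need to band-limit that discontinuity.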
Noise sources
Noise sources also belong to the tone generators family. These have no pitch, since noise is a broadband signal, but may allow the selection of the noise coloration (i.e. the slope of the spectral rolloff), something we shall discuss in the next chapter. Noise sources are very useful to create percussive sounds, to create drones, or to add character to pitched sounds.
Finally, the recent introduction of digital modules allows for samplers to be housed in a Eurorack module. Samplers are usually capable of recording tones from an input, or of recalling recordings from a memory (e.g. an SD card) and triggering their playback. Other all-in-one modules are available that provide advanced tone generation techniques, such as modal synthesis, FM synthesis, formant synthesis, and so on. These are also based on digital architectures with powerful microcontrollers or digital signal processors (DSPs).
1.3.2 Timbre Modification and Spectral Processing
As discussed, most tone generators produce very static sounds that need to be colored, altered, or emphasized. Timbre modification modules can be divided into at least four classes: filters, waveshapers, modulation effects, and vocoders.
Filtering devices are well known to engineers and have played a major role in electrical and communication engineering since the inception of these two fields. They are devices that operate in the frequency domain to attenuate or boost certain frequency components. Common filters are the low-pass, band-pass, and high-pass type. Important filters in equalization applications are the peak, notch, and shelving filters. Musical filters are rarely discussed in engineering textbooks, since engineering requirements are different from musical requirements. The musical requirements include a low implementation cost, a predetermined spectral roll-off (e.g. 12 or 24 dB/oct), and the possibility to introduce a resonance at the cutoff frequency, eventually leading to self-sustained oscillation.
While engineering textbooks consider filters as linear devices, most analog musical filters can be operated in a way that leads to nonlinear behavior, requiring specific knowledge to model them in the digital domain.
Waveshaping devices
Waveshaping devices have been extensively adopted by synthesizer developers such as Don Buchla and others in the West Coast tradition to create distinctive sound palettes. A waveshaper introduces new spectral components by distorting the waveform in the time domain. A common form of waveshaper is the foldback circuit, which folds the signal back whenever it exceeds a desired threshold. Other processing circuits that are familiar to guitar players are distortion and clipping circuits. Waveshaping in the digital domain requires a lot of attention in order to reduce undesired artifacts (aliasing).
Other effects used in modular synthesizers are so-called modulation effects, most of which are based on delay lines: chorus, phaser, flanger, echo and delay, reverb, etc. Effects can be of any sort and are not limited to spectral processing or coloration, so the list can go on.
Vocoders have had a large impact on the history of electronic music and its crossovers with other genres. They also played a major role in the movie industry to shape robot voices. Several variations exist; however, the main idea behind them is to modify the spectrum of a first sound source with a second one that provides spectral information. An example is the use of a human voice to shape a synthesized tone, giving it a speech-like character. This configuration is very popular.
Figure 1.6: A CRB Voco-Strings, exhibited at the temporary Museum of the Italian Synthesizer in 2018, in Macerata, Italy. This keyboard was manufactured in 1979–1982. It was a string machine with vocoder and chorus, designed and produced not more than 3 km from where I wrote most of this book. Photo courtesy of Acusmatiq MATME. Owner: Riccardo Pietroni.
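To make the foldback idea from the waveshaping discussion concrete, here is a small Python sketch that contrasts folding with plain clipping. It is an idealized, memoryless model and not a description of any specific circuit; the function names and the default threshold of 1.0 are assumptions for illustration.

```python
def foldback(x, threshold=1.0):
    """Reflect x back into [-threshold, threshold] each time it crosses a limit."""
    if threshold <= 0.0:
        raise ValueError("threshold must be positive")
    while x > threshold or x < -threshold:
        if x > threshold:
            x = 2.0 * threshold - x    # mirror around +threshold
        else:
            x = -2.0 * threshold - x   # mirror around -threshold
    return x

def hard_clip(x, threshold=1.0):
    """Flatten everything beyond +/- threshold (a basic clipping distortion)."""
    return max(-threshold, min(threshold, x))
```

Applied sample by sample to a sine wave whose amplitude exceeds the threshold, `foldback` creates extra peaks inside each cycle, while `hard_clip` flattens the tops. Both add harmonics, and in a digital implementation both can alias unless oversampling or another anti-aliasing strategy is used.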
Envelope, Dynamics, Articulation
Another notable family of effects includes all the envelope, dynamics, and articulation devices. Voltage-controlled amplifiers (VCAs) are meant to apply a time-varying gain to a signal in order to shape its amplitude in time and create a dynamic contour. They can be controlled by a high-frequency signal, introducing amplitude modulation (AM), but more often they are controlled by envelope generators (EGs). These are tools that respond to a trigger or gate signal to generate a voltage that rises and decays, determining the temporal evolution of a note or any other musical event. Usually, such evolution is described by four parameters: the attack, decay, and release times and the sustain level, producing the ADSR scheme depicted in Figure 1.7. Most envelope generators follow this scheme, where a tone is divided into three phases, requiring four parameters:
• A: The attack time. This parameter is expressed as a time in [s] or [ms], or as a percentage of a maximum attack time (i.e. 1–100).
• D: The decay time. The time to reach a steady-state level (usually the sustain, or zero when no sustain is provided by the EG), also expressed as a time ([s], [ms]) or a percentage of a maximum decay time (1–100).
• S: The sustain level. The steady-state level to be reached when the decay phase ends. This is usually expressed as a percentage of the peak level that is reached in the attack phase (1–100).
• R: The release time. The time to reach zero after the musical event ends (e.g. note off event). This is also expressed in [s], [ms], or percentage of a maximum release time (1–100).
Subsets of this scheme, such as AR, with no sustain phase, can still be obtained by ADSR. An EG generates an envelope signal, which is used as an operand in a product with the actual signal to shape. It is important to distinguish between an EG and a VCA; however, sometimes both functionalities are combined in one device or module.
Envelope generators are also used to control other aspects of sound production, from the pitch of the oscillator to the cutoff of a filter (Figure 1.8).
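The ADSR scheme and the EG/VCA split described above can be sketched as follows. This is a minimal linear-segment model, assuming times in seconds and a sustain level in [0, 1]; the function names and the `gate_length` parameter (how long the gate stays high) are hypothetical, and real envelope generators often use exponential segments instead.

```python
def adsr(t, gate_length, attack, decay, sustain, release):
    """Linear ADSR level in [0, 1] at time t (seconds) after the gate goes high."""
    def held(tt):
        # Envelope shape while the gate is high: attack, decay, then sustain.
        if tt < attack:
            return tt / attack
        if tt < attack + decay:
            return 1.0 - (1.0 - sustain) * (tt - attack) / decay
        return sustain

    if t < 0.0:
        return 0.0
    if t < gate_length:
        return held(t)
    if release <= 0.0:
        return 0.0
    # Release starts from whatever level was reached at note-off.
    start = held(gate_length)
    return max(0.0, start * (1.0 - (t - gate_length) / release))

def apply_envelope(signal, sample_rate, **params):
    """VCA role: multiply each sample by the envelope value at that instant."""
    return [x * adsr(n / sample_rate, **params) for n, x in enumerate(signal)]
```

Keeping `adsr` (the EG) separate from `apply_envelope` (the VCA) mirrors the distinction made in the text: the same envelope function could just as well be scaled and added to an oscillator pitch or a filter cutoff.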
Similarly, low-frequency oscillators (LFOs) are used to control any of these parameters. LFOs are very similar to oscillators, but with a frequency of oscillation that sits below the audible range or slightly overlapping with its lower part. They are used to modulate other parameters. If they modulate the pitch of an oscillator, they are performing vibrato. If they modulate the amplitude of a tone through a VCA, they are performing tremolo. Finally, if they are used to shape the timbre of a sound (e.g. by modulating the cutoff of a filter), they are performing what is sometimes called wobble.
Figure 1.7: A linear envelope generated according to the ADSR scheme.
Other tools for articulation are slew limiters, which smooth step-like transitions of a control voltage. A typical use is the smoothing of a keyboard control voltage that provides a glide or portamento effect by prolonging the transition from one pitch value to another.
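A slew limiter of the kind just described can be sketched as a per-sample rate limit on a control signal. The sketch below is one simple way to do it, assuming a fixed maximum step per sample; the name `slew_limit` and its `max_step` parameter are illustrative.

```python
def slew_limit(signal, max_step):
    """Limit the per-sample change of a control signal to +/- max_step."""
    out = []
    current = signal[0] if signal else 0.0
    for target in signal:
        delta = target - current
        # Clamp the change so the output can only move max_step per sample.
        delta = max(-max_step, min(max_step, delta))
        current += delta
        out.append(current)
    return out
```

Feeding it a stepped pitch control signal turns each jump into a ramp, which is the glide or portamento effect; a smaller `max_step` gives a slower glide.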
A somewhat related type of module is the sample and hold (S&H). This module does the inverse of a slew limiter by taking the value at given time instants and holding it for some time, giving rise to a step-like output. The operation of an S&H device is mathematically known as a zero-order hold filter. An S&H device requires an input signal and depends on a clock that sends triggering pulses. When these are received, the S&H outputs the instantaneous input signal value and holds it until a new trigger arrives. Its output is inherently step-like and can be used to control a range of other modules.
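The sample and hold behavior can be sketched in Python as below. The sketch assumes the clock is given as a list of values sampled alongside the input, and that a trigger is a rising edge through a threshold; the name `sample_and_hold`, the 0.5 threshold, and the initial held value of 0.0 are assumptions.

```python
def sample_and_hold(signal, clock, threshold=0.5):
    """Sample `signal` on each rising edge of `clock` and hold it in between."""
    held = 0.0
    prev_high = False
    out = []
    for x, c in zip(signal, clock):
        high = c > threshold
        if high and not prev_high:
            # Trigger received: capture the instantaneous input value.
            held = x
        prev_high = high
        out.append(held)
    return out
```

A common patch feeds a noise source into the signal input, so that each clock pulse picks a new random value to control pitch or filter cutoff.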
Figure 1.8: Advanced envelope generation schemes may go beyond the ADSR scheme. The panel of a Viscount-Oberheim OB12 is shown, featuring an initial delay (DL) and a double decay (D1, D2) in addition to the usual controls.
1.3.4 “Fire at Will,” or in Short: Sequencers
Step sequencers are another family of modules that allow you to control the performance. Sequencers specifically had – and still have – a distinctive role in the making of electronic music, thanks to their machine-like precision and their obsessive repetition on standard time signatures. Sequencers are made of an array or a matrix of steps, each representing equally spaced time divisions. For drum machines, each step stores one bit of information: fire/do not fire. The sequencer cycles repeatedly along the steps and fires whenever one of them is armed. We may call this a binary sequencer. For synthesizers, each step has one or more control voltage values associated, selectable through knobs or sliders. These can be employed to control any of the synth parameters, most notably the pitch, which is altered cyclically, following the values read at each step. Sequencers may also include both control voltage and a binary switch, the latter for arming the step. Skipping some steps allows creating pauses in the sequence.
Sequencers are usually controlled by a master clock at metronome rate (e.g. 120 bpm), and at each clock pulse a new step is selected for output, sending the value or values stored in that step. This allows, for example, storing musical phrases if the value controls the pitch of a VCO, or storing time-synchronized modulations if the value controls other timbre-related devices. Typical sequencers consist of an array of 8 or 16 steps, used in electronic dance music (EDM) genres to store a musical phrase or a drumming sequence of one or two bars with time signature 4/4. The modular market, however, provides all sorts of weird sequencers that allow for generative music, polyrhythmic composition, and so on.
Binary sequencers are used for drum machines to indicate whether a part of the drum should fire or not. Several rows are required, one for each drum part.
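The stepping logic described above can be sketched as follows. This is an illustrative model only: `run_sequencer` and `step_duration` are hypothetical names, each step stores one control value plus one arming switch, and the default of four steps per beat (so that 16 steps span one 4/4 bar) is an assumption.

```python
def run_sequencer(cv_steps, gates, num_clocks):
    """Return the (cv, gate) pair emitted at each clock pulse.

    cv_steps: control value stored in each step (e.g. a pitch voltage).
    gates: True if the step is armed and fires, False for a pause.
    """
    if len(cv_steps) != len(gates):
        raise ValueError("cv_steps and gates must have the same length")
    events = []
    for clock in range(num_clocks):
        step = clock % len(cv_steps)  # cycle repeatedly along the steps
        events.append((cv_steps[step], gates[step]))
    return events

def step_duration(bpm, steps_per_beat=4):
    """Seconds between two steps for a given tempo."""
    return 60.0 / (bpm * steps_per_beat)
```

A drum machine in this model is simply several gate rows sharing one clock, with one row per drum part and no control value attached.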
Although the Roland TR-808 is widely recognized as one of the first drum machines that could be programmed using a step sequencer, the first drum machine ever to host a step sequencer was the Eko Computer Rhythm, produced in 1972 and developed by Italian engineers Aldo Paci, Giuseppe Censori, and Urbano Mancinelli. This sci-fi wonder has six rows of 16 lit switches, one per step. Each row can play up to two selectable drum parts (Figure 1.9).
Figure 1.9: The Eko Computer Rhythm, the first drum machine ever to be programmed with a step sequencer. It was devised and engineered not more than 30 km away from where this book was written. Photo courtesy of Acusmatiq MATME. Owner: Paolo Bragaglia. Restored by Marco Molendi.
Utility Modules
There are, finally, a terrific number of utility modules that, despite their simplicity, have a high value for patching. Attenuators and attenuverters, mixers, multiples, mutes, and multiplexers and demultiplexers are very important tools to operate on signals. A brief definition is given for each one of these:
• Attenuators and attenuverters. An attenuator is a passive or active circuit that just attenuates the signal using a potentiometer. In the digital domain, this is equivalent to multiplying a signal by any value in the range [0, 1]. Attenuverters, additionally, are able to invert the signal, as if multiplying the signal by a number in the range [−1, 1]. Please note that inversion of a periodic signal is equivalent to a phase shift of 180° or π.
• Mixers. These modules allow you to sum signals together. They may be passive, providing just an electrical sum of the input voltages, or active, and they may have faders to control the gain of each input channel. Of course, in VCV Rack, there will be no difference between active and passive; we will just be summing discrete-time signals.
• Multiples. It is often useful to duplicate a signal. Multiples are made for this. They copy one input signal to several outputs. In Rack, this is not always required, since cables can be stacked from outputs, allowing duplication without requiring a multiple. However, they can still be useful to make a patch tidy.
• Mutes. It is sometimes useful to mute a signal, especially during a performance. Mutes are just switches that allow the signal flow from an input to an output or not.
• Multiplexers and demultiplexers. These modules allow for complex routing of signals. A multiplexer, or mux, has multiple inputs and one output, and a knob to select which input to route to the output. A demultiplexer, or demux, on the contrary, has one input and multiple outputs. In this case, the knob selects where to route the input signal. Mux and demux devices only allow one signal to pass at a time.
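In the digital domain, the utility modules listed above reduce to very small operations on samples. The Python sketch below illustrates them; all function names are hypothetical, and the two routing helpers are named after what they do (pick one of several inputs, or send one input to one of several outputs), so they do not depend on the mux/demux naming.

```python
def attenuvert(x, k):
    """Scale a sample by k in [-1, 1]; negative k also inverts the signal."""
    k = max(-1.0, min(1.0, k))
    return k * x

def mix(*channels):
    """Sum several equally long signals sample by sample."""
    return [sum(samples) for samples in zip(*channels)]

def mute(x, muted):
    """Pass the sample through, or output silence when muted."""
    return 0.0 if muted else x

def select_input(inputs, index):
    """Many inputs, one output: pass only the selected input."""
    return inputs[index]

def route_to_output(x, index, num_outputs):
    """One input, many outputs: send x to the selected output only."""
    outputs = [0.0] * num_outputs
    outputs[index] = x
    return outputs
```

An attenuator is the special case of `attenuvert` with `k` restricted to [0, 1], and a multiple needs no code at all here, since a value can simply be read more than once.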
Interface and control modules are also available to control a performance with external tools or add expressiveness. MIDI-to-CV modules are necessary to transform Musical Instrument Digital Interface (MIDI) messages into a CV. Theremin-like antennas and metal plates are used as input devices, while piezoelectric transducers are used to capture vibrations and touch, to be processed by other modules.
Elements of Signal Processing for Synthesis
2.1 Continuous-Time Signals
TIP: Analog synthesizers and effects work with either voltage or current signals. Like any physical quantity, these are continuous-time signals and their amplitude can take any real value – this is what we call an analog signal. Analog synthesizers do produce analog signals, and thus we need to introduce this class of signals.
A signal is defined as a function or a quantity that conveys some information related to a physical system. The information resides in the variation of that quantity in a specific domain. For instance, the voltage across a microphone capsule conveys information regarding the acoustic pressure applied to it, and hence regarding the environment surrounding it.
From a mathematical standpoint, we represent signals as functions of one or more independent
variables. The independent variable is the one we indicate between parentheses (e.g. when we write
$y = f(x)$, the independent variable is $x$). In other domains, such as image processing, there are
usually two independent variables, the vertical and horizontal axes of the picture. However, most
audio signals are represented in the time domain (i.e. in the form $f(t)$, with $t$ being the time
variable).
Table 2.1: Notable continuous-time signals of interest in sound synthesis
Name | Mathematical description
Sine | $\sin(2\pi f t)$
Cosine | $\cos(2\pi f t)$
Decaying exponential | $e^{-\alpha t}$, $\alpha > 0$ (another, less common form is $a^{t}$, $0 < a < 1$)
Sawtooth | $t \bmod T$
White noise | Zero-mean, aleatory signal
Time can be defined as either continuous or discrete. Physical signals are all continuous-time
signals; however, discretizing the time variable allows for efficient signal processing, as we shall see
later.
Let us define an analog signal as a signal with continuous time and continuous amplitude:
$$s = f(t) : \mathbb{R} \to \mathbb{R}$$
The notation indicates that the variable $t$ belongs to the real set and maps into a value that is a
function of $t$ and equally belongs to the real set. The independent variable is taken from a set (in this
case, $\mathbb{R}$) that is called the domain, while the dependent variable is taken from a set that is called the
codomain. For any time instant $t$, $s(t)$ takes a known value.
Most physical signals, however, have finite length, and this holds true for musical signals as well,
otherwise recording engineers would have to be immortal, which is one of the few qualities they still
lack. For a finite continuous-time signal that lives in the interval $[T_1, T_2]$, we define it in this shorter
time span:
$$s = f(t) : [T_1, T_2] \to \mathbb{R}$$
A class of useful signals is reported in Table 2.1.
2.2 Discrete-Time Signals
TIP: Discrete-time signals are crucial to understand the theory behind DSP. However, they differ
from digital signals, as we shall see later. They represent an intermediate step from the real
world to the digital world where computation takes place.
Let us start with a question: Why do we need discrete-time signals? The short answer is that
computers do not have infinite computational resources. I will elaborate on this further. You need to
know that:
1. Computers crunch numbers.
2. These numbers are represented as a finite sequence of binary digits. Why finite? Think about it:
Would you be able to calculate the sum of two numbers with an infinite number of digits after
the decimal point? It would take you an infinite amount of time. Computers are not better than
you, just a little faster. Small values are thus rounded to reduce the number of digits required to
represent them. They are thus said to be finite-precision (quantized) numbers. Similarly, you
cannot express arbitrarily large numbers with a finite number of digits. This is also very intuitive: If your
pocket calculator only has five digits, you cannot express a number larger than 99,999, right?
3. Computing is done in a short but not null time slice. For this reason, the less data we feed to
a computer, the less time it takes to provide the result. A continuous-time signal has infinitely many
values between any two time instants, even very close ones. This means that it would take an
infinite amount of time to process even a very small slice of signal. To make computation feasible,
therefore, we need to take snapshots of the signal (sampling) at regular intervals. This is an
approximation of the signal, but a good one, if we set the sampling interval according to certain
laws, which we shall review later.
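The sampling step from point 3 can be sketched in Python as taking snapshots of a function of time at regular intervals. The helper name `sample`, the 440 Hz test tone, and the 48 kHz sampling rate are assumptions chosen for illustration.

```python
import math

def sample(f, sample_rate, num_samples):
    """Take snapshots of the continuous-time function f at t = n / sample_rate."""
    return [f(n / sample_rate) for n in range(num_samples)]

def sine_440(t):
    # A continuous-time signal: defined for any real t.
    return math.sin(2.0 * math.pi * 440.0 * t)

# 48 snapshots, i.e. one millisecond of signal at 48 kHz.
snapshots = sample(sine_440, 48000.0, 48)
```

The list `snapshots` is indexed by the integer `n` and no longer by the real-valued time `t`, which is the change of independent variable that the discussion of discrete-time signals builds on.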
In this chapter, we shall discuss mainly the third point (i.e. sampling). Quantization (the second point)
is a secondary issue for DSP beginners, and is left for further reading. Here, I just want to point out
that if you are not familiar with the term “quantization,” it is exactly the same thing you do when
measuring the length of your synth to buy a new shelf for it. You take a reference, the measuring tape,
and compare the length of the synth to it. Then you approximate the measure to a finite number of
digits (e.g. up to the millimeter). Knowing the length with a precision up to the nanometer is not only
impractical by eye, but is also useless and hard to write, store, and communicate. Quantization has
only marginal interest in this book, but a few hints on numerical precision are given in Section 2.13.
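The measuring-tape analogy translates directly into code: quantizing means rounding a value to the nearest multiple of a chosen step. The sketch below is illustrative only; `quantize` and `quantize_to_bits` are hypothetical names, and the uniform grid over [-1, 1) is an assumed convention for audio samples.

```python
def quantize(x, step):
    """Round x to the nearest integer multiple of `step`."""
    if step <= 0.0:
        raise ValueError("step must be positive")
    return round(x / step) * step

def quantize_to_bits(x, bits):
    """Quantize a sample onto a uniform grid of 2**bits levels in [-1, 1)."""
    step = 2.0 / (2 ** bits)
    q = quantize(x, step)
    # Clamp to the representable range [-1, 1 - step].
    return max(-1.0, min(1.0 - step, q))

# A synth measured as 0.48237 m, recorded to the millimeter.
length_m = quantize(0.48237, 0.001)
```

Away from the clamped ends of the range, the rounding error is never larger than half a step, which is why adding bits (a smaller step) makes the approximation finer.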
Let us discuss sampling. As we have hinted, the result of the sampling process is a discrete-time
signal (i.e. a signal that exists only at specific time instants). Let us now familiarize ourselves with
discrete-time signals. These signals are functions, like continuous-time signals, with the independent
variable not belonging to the real set. It may, for example, belong to the integer set Z (i.e. to the set
of all positive and negative integer numbers). While real signals are defined for any time instant t,
discrete signals are defined only at equally spaced time instants n belonging to Z. This set is less
populated than R because it misses values between integer values. There are, for instance, infinitely many
values between instants $n = 1$ and $n = 2$ in the real set that the integer set does not have. However,
do not forget that even Z is an infinite set, meaning that theoretically our signal can go on forever.
Getting more formal, we define a discrete-time signal as $s = f[n] : \mathbb{Z} \to \mathbb{R}$, where we adopt
a notation usual for DSP books, where discrete-time signals are denoted by the square brackets and
the use of the variable n instead of t. As you may notice, the signal is still a real-valued signal. As
we have discussed previously, another required step is quantization. Signals with their amplitude
quantized do not belong to R anymore. Amplitude quantization is a step that is independent of time
discretization. Indeed, we could have continuous time signals with quantized amplitude (although it
is not very usual). Most of the inherent beauty and convenience of digital signal processing is
related to the properties introduced by time discretization, while amplitude quantization has only
a few side effects that require some attention during the implementation phase. We should state
clearly that digital signals are discretized both in time and in amplitude. However, for simplicity,
we shall now focus on signals that have continuous amplitude.
A discrete-time signal is a sequence of ordered numbers and is generally represented as shown in
Figure 2.1. Any such signal can be decomposed into single pulses, known as Dirac pulses. A Dirac
pulse sequence with unitary amplitude is shown in Figure 2.2, and is defined as:
Figure 2.1: An arbitrary discrete-time signal.