Auditory Theory: Acoustics
Lecture 019 Timbre II
Reading Assignment for Lecture 020
Before the next lecture please read Section 6.1, Acoustics of enclosed spaces, pages 247 to 267 of Acoustics and Psychoacoustics. We may have a brief quiz on this section at the beginning of the next class.
Brain Bullets 
Sum and Difference Tones
- When two pure tones are played simultaneously, they are not always perceived as two separate pure tones.
- When two pure tones are heard together, other tones with frequencies lower than those of either pure tone may also be heard. These lower tones are not acoustically present in the stimulating signal; they arise because the stimulus is a 'combination' of at least two pure tones, and they are known as 'combination tones'. One combination tone that is usually quite easily perceived has a frequency equal to the difference (higher minus lower) between the frequencies of the two tones, and is known as the 'difference tone': f_d = f_h - f_l, where f_h and f_l are the frequencies of the higher and lower tones.
- Further combination tones lie at f_l - n(f_h - f_l) for n = 1, 2, 3, ...; they are always below the frequency of the lower pure tone and are spaced at integer multiples of the difference tone frequency below it. No listener hears all of these combination tones, and some listeners hear none. The difference tone and the combination tones for n = 1 and n = 2, known as the 'second-order difference tone' and the 'third-order difference tone', are those that are perceived most readily.
- When the two tones are not members of a harmonic series, the combination tones have no equivalent f0, but they will be equally spaced in frequency. Combination tones are perceived quite easily when two musical instruments that produce fairly pure tones, such as the descant recorder, baroque flute or piccolo, play notes whose f0 values are high and close in frequency. When the two notes played are both exact and adjacent members of the harmonic series formed on their difference tone, the combination tones will be consecutive members of that harmonic series immediately below the lower played note (i.e. the f0 values of both notes and their combination tones will be exact integer multiples of the difference frequency between the notes themselves). The musical relationship of combination tones to the notes played therefore depends on the tuning system in use. Two notes played in a tuning system in which the interval between them is never pure, such as the equal-tempered system, will produce combination tones which are close to, but not exact, harmonics of the series formed on the difference tone.
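The arithmetic behind combination tones can be sketched in a few lines of Python. This is only an illustration: the function name `combination_tones`, the `max_order` parameter and the example frequencies are assumptions made for this sketch, not values from the reading. It takes the difference tone as the higher frequency minus the lower, and places the further combination tones at integer multiples of that difference below the lower tone.

```python
def combination_tones(f_low, f_high, max_order=3):
    """Return (difference tone, combination tones below f_low), in Hz.

    The tones below f_low are f_low - n * (f_high - f_low) for
    n = 1..max_order; only positive frequencies are kept.
    """
    if not 0 < f_low < f_high:
        raise ValueError("expected 0 < f_low < f_high")
    diff = f_high - f_low
    below = [f_low - n * diff for n in range(1, max_order + 1)]
    return diff, [f for f in below if f > 0]


# Pure major third (ratio 5:4): the notes are harmonics 4 and 5 of 500 Hz,
# so every combination tone is an exact harmonic of the difference tone.
pure_diff, pure_tones = combination_tones(2000.0, 2500.0)

# Equal-tempered major third above the same note (ratio 2 ** (4 / 12)):
# the difference tone is no longer 500 Hz, so the tones below the lower
# note are not exact harmonics of it.
et_high = 2000.0 * 2 ** (4 / 12)
et_diff, et_tones = combination_tones(2000.0, et_high)

print(pure_diff, pure_tones)
print(round(et_diff, 2), [round(f, 2) for f in et_tones])
```

In the pure case the lower note is exactly four times the difference tone; in the equal-tempered case the ratio is roughly 3.85, which is why the musical relationship of the combination tones to the played notes depends on the tuning system.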
Masking of one sound by another
- Almost every sound we hear in music consists of at least two frequency components. When two or more pure tones are heard together an effect known as 'masking' can occur, in which an individual tone becomes more difficult or impossible to perceive, that is, partially or completely 'masked', due to the presence of another tone. In such a case the tone which causes the masking is known as the 'masker' and the tone which is masked is known as the 'maskee'. These tones could be individual pure tones but, given the rarity of such sounds in music, they are more likely to be individual frequency components of a note played on one instrument, which mask either other components in that note or frequency components of another note. The extent to which masking occurs depends on the frequencies of the masker and maskee and on their amplitudes.
- The dependence of masking on the frequencies of masker and maskee can be illustrated by reference to Figure 2.9 in which an idealised frequency response curve for an auditory filter is plotted. The filter will respond to components in the input acoustic signal which fall within its response curve whose bandwidth is given by the critical bandwidth for the filter's centre frequency. The filter will respond to components in the input whose frequencies are lower than its centre frequency to a greater degree than components which are higher in frequency than the centre frequency due to the asymmetry of the response curve. Masking can be thought of as the filter's effectiveness in analysing a component at its centre frequency (maskee) being reduced to some degree by the presence of another component (masker) whose frequency falls within the filter's response curve. The degree to which the filter's effectiveness is reduced is usually measured as a shift in hearing threshold, or 'masking level', as illustrated in Figure 5.7. The figure shows that the asymmetry of the response curve results in the masking effect being considerably greater for maskees which are above rather than those below the frequency of the masker.
- At low amplitude levels, the masking effect tends to be similar for frequencies above and below the masker frequency (fmasker). As the amplitude of the masker is raised, the 'low masks high' effect increases and the resulting masking level curve becomes increasingly asymmetric. Thus the masking effect is highly dependent on the amplitude of the masker.
- For complex musical sounds with many spectral components, masking can be analysed in terms of the effect of each individual component on the other components in the sound. If a component is completely masked by another component in the sound, the masked component makes no contribution to the perceived nature of the sound itself and is therefore effectively ignored. If the masker is broadband noise, or 'white noise', then components at all frequencies are masked in an essentially linear fashion (i.e. a 10 dB increase in the level of the noise increases the masking effect by 10 dB at all frequencies). This can be the case, for example, with background noise or a brushed snare drum (see Figure 3.6), which have spectral energy spread over a wide frequency range and can therefore mask components of other sounds that fall within that range.
- The masking effects considered so far are known as 'simultaneous masking' because the masking effect on the maskee by the masker occurs when both sound together (or simultaneously). Two further masking effects are important for the perception of music where the masker and maskee are not sounding together, and these are referred to as 'non-simultaneous masking'. These are 'forward masking' or 'post-masking', and 'backward masking' or 'pre-masking'. In forward masking, a pure tone masker can mask another tone (maskee) which starts after the masker itself has ceased to sound. In other words the masking effect is 'forward' in time from the masker to the maskee. Forward masking can occur for time intervals between the end of the masker and the start of the maskee of up to approximately 30 ms. In backward masking a maskee can be masked by a masker which follows it in time, starting up to approximately 10 ms after the maskee itself has ended.
- Moore (1996) makes the following observations about non-simultaneous masking:
- backward masking is considerably lessened (to zero in some cases) with practice
- recovery rate from forward masking is greater at higher masking levels
- the forward masking effect disappears 100-200 ms after the masker ceases
- the forward masking effect increases for masker durations up to about 50 ms.
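The timing relationships that separate simultaneous, forward and backward masking can be sketched as a small classifier. This is a minimal illustration that looks at timing only: whether masking actually occurs also depends on the levels and frequencies of the two sounds. The function name `masking_type` and the two constants are assumptions made for this sketch, using the approximate 30 ms and 10 ms figures quoted above as hard limits (Moore's observations suggest the forward effect can last longer).

```python
FORWARD_LIMIT_MS = 30.0   # assumed maximum gap: masker ends, then maskee starts
BACKWARD_LIMIT_MS = 10.0  # assumed maximum gap: maskee ends, then masker starts


def masking_type(masker_start, masker_end, maskee_start, maskee_end):
    """Classify the timing of a masker and maskee (all times in ms).

    Returns "simultaneous", "forward", "backward" or "none".
    """
    # The two sounds overlap in time.
    if maskee_start < masker_end and masker_start < maskee_end:
        return "simultaneous"
    # The maskee starts after the masker has ended.
    if maskee_start >= masker_end:
        gap = maskee_start - masker_end
        return "forward" if gap <= FORWARD_LIMIT_MS else "none"
    # Otherwise the masker starts after the maskee has ended.
    gap = masker_start - maskee_end
    return "backward" if gap <= BACKWARD_LIMIT_MS else "none"


print(masking_type(0, 100, 120, 170))  # maskee begins 20 ms after masker ends
print(masking_type(50, 100, 0, 45))    # masker begins 5 ms after maskee ends
```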
MP3
- Masking is exploited practically in digital systems that store and transmit digital audio in order to reduce the amount of information that has to be handled, and therefore to reduce the transmission resource (bandwidth) and the memory, disk or other storage required. Such systems are generally referred to as perceptual coders because they exploit knowledge of human perception. For example, perceptual coding is the operating basis of the MP3 system that is used to transmit music over the Internet, of MP3 players that store many hours of such music in a pocket-sized device, of multi-channel sound in digital audio broadcasting and satellite television systems, and of MiniDisc recorders.
- International standards define perceptual coding schemes for the encoding (recording) and decoding (playback) parts of these systems, enabling different manufacturers to produce compatible equipment. The Moving Picture Experts Group (MPEG) was set up in 1988 to develop international standards for the coding of moving pictures and associated audio, a task it still carries out. Its work has resulted in standards such as MPEG-1, MPEG-2 and MPEG-4, each of which includes three layers: I, II and III. MP3 itself is based on MPEG-1, layer III (not MPEG-3, as this does not exist!).
- The input signal is first split into a number of frequency bands, generally by means of a bank of bandpass filters. These bands are sometimes referred to as sub-bands, giving some coders the often used name 'sub-band coders'. The extent to which this process matches the critical band analysis of the human peripheral hearing system depends on the complexity of the particular coding scheme. The energy in each of these sub-bands is used, with reference to the original signal, to calculate the simultaneous (and in some cases also the non-simultaneous) masking effects for that instant of the input signal. Those elements of the sub-bands that the system decides would not be masked are then digitally coded for transmission and/or storage. At the receiving end there is a decoder which reverses this process to reproduce the audio material. The result is of course not an exact copy of the original input, since masking predictions have been employed to remove material that the listener would not have heard in that context.
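The selection step of a perceptual coder can be illustrated with a toy sketch. This is not the MPEG psychoacoustic model: the function names, the per-band levels and the parameters `offset_db`, `slope_up_db` and `slope_down_db` are all hypothetical, chosen only to show the idea that a strong sub-band masks weaker neighbours, and that it masks bands above it more than bands below it.

```python
def masking_thresholds(levels_db, offset_db=10.0,
                       slope_up_db=10.0, slope_down_db=25.0):
    """Toy masking threshold for each sub-band, in dB.

    Each band's threshold is set by the strongest masker among the other
    bands, after subtracting a fixed offset and a per-band slope. The
    slope is shallower for maskers below the band ('low masks high').
    """
    thresholds = []
    for i in range(len(levels_db)):
        best = float("-inf")
        for j, level in enumerate(levels_db):
            if j == i:
                continue
            distance = abs(i - j)
            # A masker in a lower band (j < i) spreads upward more strongly.
            slope = slope_up_db if j < i else slope_down_db
            best = max(best, level - offset_db - slope * distance)
        thresholds.append(best)
    return thresholds


def audible_bands(levels_db, **params):
    """Return the indices of sub-bands whose level exceeds their threshold."""
    thresholds = masking_thresholds(levels_db, **params)
    return [i for i, (level, threshold)
            in enumerate(zip(levels_db, thresholds)) if level > threshold]


# Hypothetical sub-band levels in dB, lowest band first.
levels = [20.0, 80.0, 55.0, 40.0, 62.0]
kept = audible_bands(levels)
print(kept)  # [1, 4]
```

Only the bands in `kept` would be passed on for coding in this sketch; the others are treated as masked and discarded, which is why the decoded signal is not an exact copy of the input.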
Note grouping illusions
- There are some situations when the perceived sound is unexpected, as a result of either what amounts to an acoustic illusion or the way in which the human hearing system analyses sounds. Whilst some of these sounds will not be found in traditional musical performances using acoustic instruments, since they can only be generated electronically, some of the effects have a bearing on how music is performed. The nature of an illusion and its relationship with the acoustic input which produced it can give rise to new theories of how sound is perceived, and in some cases the effect may already have been used, or could in the future be used, in the performance of music.
- Diana Deutsch describes a number of note grouping acoustic illusions, some of which are summarised below with an indication of their manifestation in music perception and/or performance. Deutsch describes an 'octave illusion' in which a sequence of two tones an octave apart, with high (800 Hz) and low (400 Hz) f0 values, is alternated between the ears. Most listeners report hearing a high tone in the right ear alternating with a low tone in the left ear as illustrated in the figure, no matter which way round the headphones are placed. She further notes that right-handed listeners tend to report hearing the high tone in the right ear alternating with a low tone in the left ear, whilst left-handed listeners tend to hear a high tone alternating with a low tone but are equally likely to hear the high tone in the left or the right ear. This illusion persists when the stimuli are played over loudspeakers.
- In a further experiment, Deutsch (1975) played an ascending and a descending C major scale simultaneously, with alternate notes being switched between the two ears as shown in the lower part of Figure 5.12. The most commonly perceived response is also shown in the figure. Once again the high notes tend to be heard in the right ear and the low notes in the left ear, resulting in a snippet of a C major scale being heard in each ear. Such effects are known as 'grouping' or 'streaming', and by way of explanation, Deutsch invokes some of the grouping principles of the 'Gestalt school' of psychology known as 'good continuation', 'proximity' and 'similarity'. She describes these (Deutsch, 1982) as follows:
- grouping by good continuation: 'elements that follow each other in a given direction are perceived as blending together'
- grouping by proximity: 'nearer elements are grouped together in preference to elements that are spaced farther apart'
- grouping by similarity: 'like elements are grouped together'.
- The finding that the majority of listeners hear the high notes in the right ear and the low notes in the left ear may have some bearing on the natural layout of groups of performing musicians. For example, a string quartet will usually play with the cellist sitting on the left of the viola player who is sitting on the left of the second violinist who in turn is sitting on the left of the first violinist. This means that each player has the instruments playing parts lower than their own on their left-hand side, and those instruments playing higher parts on their right-hand side. Vocal groups tend to organise themselves such that the sopranos are on the right of the altos, and the tenors are on the right of the basses if they are in two or more rows. Small vocal groups such as a quartet consisting of a soprano, alto, tenor and bass will tend to be in a line with the bass on the left and the soprano on the right. In orchestras, the treble instruments tend to be placed with the highest pitched instruments within their section (e.g. first violin, piccolo, trumpet etc.) on the left and bass instruments on the right. Such layouts have become traditional and moving players or singers around such that they are not in this physical position with respect to other instruments or singers is not welcomed. This tradition of musical performance layout may well be in part due to a right-ear preference for the higher notes.
- However, whilst this may work well for the performers, it is back-to-front for the audience. When an audience faces a stage to watch a live performance, the instruments or singers producing the treble notes are on the left and the bass instruments or singers are on the right. This is the wrong way round in terms of the right-ear treble preference, but the correct way round for observing the performers themselves. It is interesting to compare the normal concert hall layout as a listener with the experience of sitting in the audience area behind the orchestra which is possible in halls such as the Royal Festival Hall in London. Unfortunately this is not a test that can be carried out very satisfactorily since it is not usually possible to sit far enough behind the players to gain as good an overall balance as can be obtained from the auditorium in front of the orchestra. It is, however, possible to experience this effect by turning round when listening to a good stereo recording over loudspeakers.