| Icecast Installation and Management: A Guide to Open Source Audio Streaming | ||
|---|---|---|
Mental audio filtering, which occurs at an unconscious level, relies in part on a process called masking. Masking is a central topic in psychoacoustics, the study of the interrelation between the ear, the mind, and vibratory audio signals. Two separate masking effects come into play in MP3 encoding: auditory and temporal.
The simultaneous masking effect, sometimes referred to as "auditory masking", may be best described by analogy. Think of an object flying in front of the sun. This object, whether a bird or plane, flies in from the left and then disappears in front of the sun due to the strong contrast in light. As it moves past the sun, it becomes visible again. In audio terms, you can sometimes hear an acoustic guitarist's fingers sliding over the ridged spirals of the guitar strings during quiet passages. Of course, you seldom if ever hear this effect during a full-on rock anthem, because the wall of sound surrounding the guitar all but completely drowns these subtle effects.
The MP3 codec is unconcerned with guitar strings; all it knows are relative frequencies and volume levels. So, to put simultaneous masking into more concrete terms, let's say you have an audio signal consisting of a perfect sine wave fluctuating at 1,000 Hz. Now you introduce a second perfect sine wave, this one fluctuating at a pitch just slightly higher. Let's make it 1,100 Hz, but also much quieter: say, -10 dB relative to the first. Most humans will not be able to detect the second pitch at all. However, the second pitch is inaudible not just because it's quieter, but because its frequency is very close to that of the first. In other words, the second tone is masked by the louder and similar-sounding first tone.
To illustrate this fact, slowly change the frequency or pitch of the second tone until it's fluctuating at around 4,000 Hz. Its volume stays exactly as it was, at -10 dB. As the second pitch becomes more dissimilar from the first, it becomes more audible, until at a certain point, most humans will hear two distinct tones, one louder than the other, as illustrated in the figure below. At Point A, Tone 2 is barely audible next to Tone 1. At Point B, Tone 2 is quite audible, even though its volume remains unchanged.

As two simultaneous tones become more dissimilar, they become more recognizable as separate entities.
This is a psychoacoustic phenomenon called "simultaneous masking," which demonstrates an important aspect of the mind's role in discerning audible tones and sounds: any time frequencies are close to one another, we have difficulty perceiving each as a unique or separate tone. In much the same way, mountains on the distant horizon may appear evenly textured and similarly colored, even though the same mountains might be full of variation and rich flora if one were hiking in them. In effect, we have the aural equivalent of an optical illusion: a trick of our perceptual capacity that contributes to our brain's ability to filter out the less relevant and give focus to stronger elements.
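The two-tone experiment described above can be reproduced with a few lines of Python. This is only a sketch for listening tests, not part of any encoder. The 44,100 Hz sample rate, the function names, and the reading of "-10 dB" as a level relative to the first tone are all assumptions made for illustration.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second (assumed; not specified in the text)


def db_to_amplitude(db):
    """Convert a relative level in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def two_tone(freq1_hz, freq2_hz, level2_db, seconds=1.0):
    """Mix a full-level sine wave with a second, quieter sine wave.

    level2_db is the level of the second tone relative to the first
    (assumed interpretation of "-10 dB" in the text).
    """
    gain2 = db_to_amplitude(level2_db)
    samples = []
    for i in range(int(SAMPLE_RATE * seconds)):
        t = i / SAMPLE_RATE
        mixed = (math.sin(2 * math.pi * freq1_hz * t)
                 + gain2 * math.sin(2 * math.pi * freq2_hz * t))
        # Scale so the sum of both tones always stays within [-1, 1].
        samples.append(mixed / (1 + gain2))
    return samples


def write_wav(path, samples):
    """Write mono 16-bit PCM samples (floats in [-1, 1]) to a WAV file."""
    frames = struct.pack("<%dh" % len(samples),
                         *(int(s * 32767) for s in samples))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)


# Point A: Tone 2 sits close to Tone 1 and tends to be masked.
point_a = two_tone(1000, 1100, -10)
# Point B: Tone 2 is far from Tone 1 and is easier to hear.
point_b = two_tone(1000, 4000, -10)
```

Calling `write_wav("point_a.wav", point_a)` and `write_wav("point_b.wav", point_b)` (hypothetical file names) produces two files you can compare by ear.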
Now consider that an audio signal consisting of two sine waves, even if one is quieter and "masked" by the louder tone, contains almost twice as much data as a signal containing a single wave. If you were to compress an audio signal containing two sine waves, you would want the ability to devote less disk storage space to the nearly inaudible signal, and more to the dominant signal. This, of course, is precisely what the algorithms behind most audio compression formats do; they exploit certain aspects of human psychoacoustic phenomena to allocate storage space intelligently. Whereas a raw (waveform or PCM) audio storage format will use just as much disk space to store a texturally constant passage in a symphonic work as it will for a dynamically textured one, an MP3 file will not. Thus, MP3 and similar audio compression formats are called "perceptual codecs" because they are, in a sense, mathematical descriptions of the limitations of human auditory perception. The MP3 codec is based on perceptual principles but also encapsulates many other factors, such as the number of bits per second allocated to storing the data and the number of channels being stored, i.e., mono, stereo, or in the case of other formats such as AAC or MP3 with MPEG-2 extensions, multi-channel audio.
Pulse Code Modulation or PCM is the standard designator for the digitization of uncompressed audio, such as that found on audio CDs. Telephone-grade PCM is sampled 8,000 times per second at 8 bits, for a total of 64 kbps. CD audio is sampled 44,100 times per second at 16 bits per channel in stereo, for a total of about 1,411 kbps.
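The bitrate of uncompressed PCM is simply the sample rate multiplied by the bits per sample and the number of channels. The short sketch below works through that arithmetic. The function name is made up for illustration, the CD parameters (44,100 Hz, 16 bits, stereo) are the standard values for that format, and the 128 kbps MP3 rate is an assumed figure used only for comparison.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Return the bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# Telephone-grade PCM: 8,000 samples per second, 8 bits, mono.
telephone_kbps = pcm_bitrate_kbps(8000, 8)

# CD audio: 44,100 samples per second, 16 bits, two channels.
cd_kbps = pcm_bitrate_kbps(44100, 16, channels=2)

# Assumed MP3 bitrate, chosen only to illustrate the size difference.
MP3_KBPS = 128
compression_ratio = cd_kbps / MP3_KBPS

print(telephone_kbps, cd_kbps, round(compression_ratio, 1))
```

Under these assumptions the MP3 stream is roughly one eleventh the size of the CD-quality PCM it was encoded from.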
In addition to auditory masking, which depends on the relationship between frequencies and their relative volumes, a second masking effect comes into play, based on time rather than on frequency. The idea behind temporal masking is that humans also have trouble hearing distinct sounds that are close to one another in time. For example, if a loud sound and a quiet sound are played simultaneously, you will not be able to hear the quiet sound. If, however, there is sufficient delay between the two sounds, you will hear the second, quieter sound. The key to the success of temporal masking lies in quantifying the length of time between the two tones at which the second tone becomes audible, i.e., significant enough to keep in the bitstream rather than throw away. This threshold turns out to be around five milliseconds when working with pure tones, though it varies up and down in accordance with different audio passages.
This process also works in reverse; you may not hear a quiet tone if it comes directly before a louder one, so premasking and postmasking both occur and are accounted for in the algorithm.
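As a toy illustration of the timing rule described above, the following sketch classifies a quiet sound by where it falls relative to a loud one. It is not the psychoacoustic model used by any real MP3 encoder. The function name is hypothetical, and using one symmetric five-millisecond window for both premasking and postmasking is a simplifying assumption based on the figure quoted in the text.

```python
MASKING_WINDOW_MS = 5.0  # approximate pure-tone threshold quoted in the text


def masking_kind(loud_time_ms, quiet_time_ms, window_ms=MASKING_WINDOW_MS):
    """Classify how a quiet sound relates in time to a loud one.

    Returns "simultaneous", "premasking", or "postmasking" when the quiet
    sound falls inside the masking window, and None when the gap is large
    enough for the quiet sound to be heard on its own.
    """
    gap_ms = quiet_time_ms - loud_time_ms
    if abs(gap_ms) >= window_ms:
        return None  # audible: worth keeping in the bitstream
    if gap_ms == 0:
        return "simultaneous"
    # A quiet sound just before the loud one is premasked;
    # one just after it is postmasked.
    return "premasking" if gap_ms < 0 else "postmasking"


print(masking_kind(100.0, 102.0))
print(masking_kind(100.0, 98.0))
print(masking_kind(100.0, 120.0))
```

A candidate sound for which this function returns something other than `None` is one that, under this simplified rule, an encoder could afford to discard.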
For more information on psychoacoustics, read any of the excellent papers on the subject at http://www.cpl.umn.edu/auditory.htm.