The Encoding Process

The MP3 encoding process, like that of other perceptual encoders, can be subdivided into a few discrete tasks, though they are not necessarily performed in this order:

  1. The encoder breaks the audio signal into smaller component pieces called "frames," each lasting a fraction of a second (an MPEG-1 Layer III frame holds 1,152 samples, or about 26 milliseconds at 44.1 kHz).

  2. It next analyzes the signal to determine its "spectral energy distribution." In other words, it computes how the signal's energy is spread across the frequency spectrum, which later guides how the bits should be distributed for the best audio encoding. To make this analysis and the subsequent encoding more efficient, the signal is split into sub-bands (ranges of frequencies), which are processed independently for better results.

  3. Next, the encoding bitrate is determined and the maximum number of bits that can be allocated to each frame is calculated. For example, when encoding at 128 kbps, each second of audio can occupy at most 128 kilobits, and each frame's upper limit is its share of that budget. This changes with variable bitrates, but that is addressed later. This step determines how much of the available audio data is stored and how much is discarded.

  4. The frequency content of each frame is compared to mathematical models of human psychoacoustics, which are stored in the codec as a reference table. From this model, the encoder determines which frequencies must be rendered accurately because they are perceptible to humans, and which can be dropped or allocated fewer bits because the human ear could not detect them.

  5. The bitstream is then run through "Huffman coding," which compresses redundant information throughout the sample. Huffman coding does not rely on a psychoacoustic model; it achieves additional compression via more traditional, lossless means. In effect, MP3 encoding is a two-stage process. First, the psychoacoustic models are applied, discarding data in the process, and then the remainder is compressed to minimize the storage space taken up by any redundancies. This second stage, the Huffman coding, does not discard data. It simply stores what is left in a smaller amount of space.

  6. The collection of frames is assembled into a serial bitstream, with header information preceding each data frame. The headers contain instructional "meta-data" specific to that frame.

  7. Other factors enter into the equation, often as the result of options chosen before encoding begins. In addition, the algorithms that encode an individual frame often rely on the encoding results of the frames that precede or follow it. In practice, the steps above overlap and run almost simultaneously rather than strictly in the order listed.
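
The per-frame bit budget described in step 3 can be made concrete with a little arithmetic. The Python sketch below is only an illustration for a constant bitrate stream. It assumes MPEG-1 Layer III, in which every frame carries 1,152 samples; the function names are hypothetical and do not come from any particular encoder.

```python
# Assumption: MPEG-1 Layer III, where each frame holds 1,152 samples.
SAMPLES_PER_FRAME = 1152


def frame_duration_ms(sample_rate_hz):
    """Playback time covered by one frame, in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz


def frame_bit_budget(bitrate_bps, sample_rate_hz):
    """Average number of bits available per frame at a constant bitrate.

    The figure covers the whole frame, header and side information included.
    """
    return bitrate_bps * SAMPLES_PER_FRAME / sample_rate_hz


def frame_length_bytes(bitrate_bps, sample_rate_hz, padding=0):
    """Whole-byte frame length: 144 is 1,152 samples divided by 8 bits.

    Because the division rarely comes out even, some frames carry one
    extra padding byte so the stream holds its average bitrate.
    """
    return 144 * bitrate_bps // sample_rate_hz + padding


if __name__ == "__main__":
    print(frame_duration_ms(44100))
    print(frame_bit_budget(128000, 44100))
    print(frame_length_bytes(128000, 44100))
```

At 128 kbps and a 44.1 kHz sampling rate, this works out to roughly 26 milliseconds of audio and about 3,344 bits per frame, stored as 417 bytes (418 when the frame is padded).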
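
The lossless stage described in step 5 can also be sketched in a few lines. The Python example below is a generic, textbook Huffman coder, not the MP3 implementation: it builds its code table from the data it is given, whereas an MP3 encoder selects among predefined tables. It is meant only to show that frequent values receive short codes and that decoding recovers the input exactly. All function names are illustrative.

```python
import heapq
from collections import Counter


def build_huffman_codes(symbols):
    """Return a dict mapping each symbol to a prefix-free bit string."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        # A lone symbol still needs a code one bit long.
        return {next(iter(counts)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, codes):
    """Concatenate the code of every symbol into one bit string."""
    return "".join(codes[s] for s in symbols)


def decode(bits, codes):
    """Invert encode(); valid because no code is a prefix of another."""
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out
```

For the ten values 0, 0, 0, 0, 0, 0, 1, 1, -1, 2 (standing in for data left over after the psychoacoustic stage), a fixed two-bit code would need 20 bits, while the Huffman code needs 16, and decoding returns the original ten values unchanged.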