A Short Introduction to Audio Encoding

This page provides a brief introduction to audio encoding. Part of this text was published in an article in the Dutch iPod magazine iPodFan.

Since the introduction of the Compact Disc, consumers have been listening to digitally recorded music. One second of CD-quality music (e.g. stored in the AIFF or WAV file format) requires 1411 kbit of storage, so roughly 100MB is needed for every 10 minutes of music. This implies that a 4GB iPod Nano, for instance, can only carry about 8 CDs of 50 minutes each, and a 128MB memory stick only about 12 minutes of music. Not really useful.
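The storage figures above follow from simple arithmetic. The sketch below assumes the standard CD audio parameters (44.1 kHz sample rate, 16 bits per sample, 2 channels), which the text does not state explicitly, and a hypothetical 4 GB (4000 MB, decimal) player. The function names are illustrative only.

```python
# Standard CD audio (Red Book) parameters; assumed here, not stated in the text
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BITS_PER_SAMPLE = 16
CHANNELS = 2              # stereo

def cd_bitrate_kbit_s():
    """Uncompressed CD bitrate in kbit/s."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000

def megabytes_for_minutes(minutes, bitrate_kbit_s):
    """Storage for `minutes` of audio at the given bitrate, in decimal MB (10^6 bytes)."""
    total_bits = bitrate_kbit_s * 1000 * minutes * 60
    return total_bits / 8 / 1_000_000

rate = cd_bitrate_kbit_s()
print(rate)                                          # → 1411.2
print(f"{megabytes_for_minutes(10, rate):.1f} MB")   # → 105.8 MB per 10 minutes
# Hypothetical 4 GB (4000 MB) player filled with 50-minute CDs
print(f"{4000 / megabytes_for_minutes(50, rate):.1f} CDs")     # → 7.6 CDs
# 128 MB memory stick
print(f"{128 / megabytes_for_minutes(1, rate):.1f} minutes")   # → 12.1 minutes
```

The text rounds these results to "100MB per 10 minutes" and "about 8 CDs".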

Compression enables more efficient storage. Compression reduces the size of the music data, so it can be stored in a smaller file and more music fits on a storage device such as a hard disk, iPod or memory stick. It also enables more efficient transmission of music files, as fewer bytes need to be transferred for a given piece of music. During playback, compressed music is decompressed piece by piece in real time.

If the compression/decompression procedure needs to be transparent, i.e. the music must come out exactly as it went in, lossless compression is required (e.g. Apple Lossless). By applying all kinds of mathematical tricks, the music can be compacted by roughly a factor of 2. This is still not a significant reduction for portable use or transmission.
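To illustrate what "transparent" means, here is a minimal sketch of a lossless scheme, assuming a very simple design: each sample is predicted from the previous one, only the difference (residual) is stored, and Python's general-purpose `zlib` compressor stands in for the entropy coding. This is not how Apple Lossless works internally; real lossless codecs use more elaborate prediction and dedicated entropy coders. The test tone and all function names are hypothetical.

```python
import math
import struct
import zlib

SAMPLE_RATE = 44_100  # CD sample rate (Hz)

def make_tone(freq_hz=440.0, seconds=1.0, amplitude=10_000):
    """Hypothetical test signal: one channel of 16-bit samples of a pure sine tone."""
    n = int(SAMPLE_RATE * seconds)
    return [round(amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
            for i in range(n)]

def encode_lossless(samples):
    # First-order prediction: store each sample as its difference to the previous one.
    residuals = samples[:1] + [cur - prev for prev, cur in zip(samples, samples[1:])]
    # Pack as 32-bit integers: the difference of two 16-bit samples can exceed 16 bits.
    packed = struct.pack(f"<{len(residuals)}i", *residuals)
    return zlib.compress(packed, 9)  # general-purpose compressor as a stand-in

def decode_lossless(blob):
    packed = zlib.decompress(blob)
    residuals = struct.unpack(f"<{len(packed) // 4}i", packed)
    samples, previous = [], 0
    for r in residuals:
        previous += r  # undo the prediction: running sum of the differences
        samples.append(previous)
    return samples

tone = make_tone()
blob = encode_lossless(tone)
assert decode_lossless(blob) == tone  # bit-exact: nothing is lost
print(len(tone) * 2, "bytes as 16-bit PCM,", len(blob), "bytes compressed")
```

A pure, exactly repeating tone compresses far better than real music, so the ratio printed here says nothing about the factor of 2 mentioned above; the point is only that decoding returns exactly the original samples.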

Another technique, called lossy compression, exploits properties of the human auditory system. Depending on the circumstances, humans cannot hear certain tones. For example, a conversation between two people becomes almost inaudible when a train passes by: the sound of the train "masks" the sound of the conversation. Plucking a guitar string does not produce a single tone, but a large number of tones simultaneously, which together define its timbre. We do not hear all of these tones, only a subset of them. A loud tone at a certain pitch or frequency defines a masking area, and all softer tones inside this masking area are inaudible.
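The idea of a masking area can be made concrete with a toy model. The sketch below converts frequencies to the Bark (critical-band) scale using the Zwicker and Terhardt approximation and draws a triangular masking threshold around a single loud tone. The slopes and the offset are assumed, illustrative values, and the function names are hypothetical; the psychoacoustic models in real MP3 and AAC encoders are far more elaborate.

```python
import math

# Assumed, illustrative parameters of a triangular masking curve
LOWER_SLOPE_DB_PER_BARK = 27.0  # masking falls off quickly towards lower frequencies
UPPER_SLOPE_DB_PER_BARK = 10.0  # and more slowly towards higher frequencies
MASKING_OFFSET_DB = 10.0        # threshold at the masker sits this far below its level

def hz_to_bark(f_hz):
    """Zwicker & Terhardt approximation of the critical-band (Bark) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def masking_threshold_db(masker_hz, masker_db, probe_hz):
    """Level below which a tone at probe_hz is masked by the loud masker."""
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = UPPER_SLOPE_DB_PER_BARK if dz >= 0 else LOWER_SLOPE_DB_PER_BARK
    return masker_db - MASKING_OFFSET_DB - slope * abs(dz)

def is_masked(masker_hz, masker_db, probe_hz, probe_db):
    return probe_db < masking_threshold_db(masker_hz, masker_db, probe_hz)

# A loud 1000 Hz tone at 80 dB hides a soft 1100 Hz tone, but not a soft 4000 Hz tone
print(is_masked(1000, 80, 1100, 50))  # → True
print(is_masked(1000, 80, 4000, 50))  # → False
```

The asymmetric slopes reflect that a loud tone masks higher frequencies more than lower ones; an encoder can leave out (or code very coarsely) any component that falls below such a threshold.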



Similarly, a loud sound at a certain moment masks softer sounds occurring shortly before or after it.
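Masking in time can be sketched in the same toy fashion. The sketch below assumes a short pre-masking window before the loud sound and a longer post-masking window after it, with a threshold that decays linearly in dB across each window. The window lengths, the offset and the linear decay are crude, assumed simplifications, and the function names are hypothetical.

```python
# Assumed, illustrative values: masking after a loud sound lasts much longer than before it
PRE_MASK_MS = 20.0
POST_MASK_MS = 150.0
MASKING_OFFSET_DB = 10.0  # threshold right at the masker is this far below its level

def temporal_threshold_db(masker_ms, masker_db, probe_ms):
    """Level below which a sound at probe_ms is masked by a loud sound at masker_ms."""
    dt = probe_ms - masker_ms
    window = POST_MASK_MS if dt >= 0 else PRE_MASK_MS
    if abs(dt) >= window:
        return float("-inf")  # outside the window: no masking at all
    # Crude linear decay from (masker_db - offset) down to 0 dB at the window edge
    return (masker_db - MASKING_OFFSET_DB) * (1.0 - abs(dt) / window)

def is_temporally_masked(masker_ms, masker_db, probe_ms, probe_db):
    return probe_db < temporal_threshold_db(masker_ms, masker_db, probe_ms)

# Loud 90 dB click at t = 0 ms; a 40 dB sound 50 ms later is hidden,
# but the same sound 50 ms earlier is not.
print(is_temporally_masked(0, 90, 50, 40))   # → True
print(is_temporally_masked(0, 90, -50, 40))  # → False
```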



Lossy compression tries to determine which tones we cannot hear, or can hardly hear, and leaves them out of the stored or transmitted file. Compared to the 1411 kbit/s of CD audio, lossy compression achieves a reduction of roughly a factor of 4.4 (320 kbit/s) to 11 (128 kbit/s)! This is the central idea behind MP3 and AAC encoding.
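The compression factors follow directly from dividing the CD bitrate by the encoded bitrate. The sketch below also estimates how many hours of music fit on a hypothetical 4 GB (4000 MB, decimal) player at each bitrate; the player size and function names are illustrative assumptions, and container overhead and album artwork are ignored.

```python
CD_BITRATE_KBIT_S = 1411.2  # 44.1 kHz x 16 bit x 2 channels

def compression_factor(bitrate_kbit_s):
    """How many times smaller than CD audio a stream at this bitrate is."""
    return CD_BITRATE_KBIT_S / bitrate_kbit_s

def minutes_that_fit(capacity_mb, bitrate_kbit_s):
    """Minutes of audio fitting in capacity_mb (decimal MB) at the given bitrate."""
    mb_per_minute = bitrate_kbit_s * 1000 * 60 / 8 / 1_000_000
    return capacity_mb / mb_per_minute

for kbps in (CD_BITRATE_KBIT_S, 320, 256, 128):
    print(f"{kbps:>7} kbit/s: factor {compression_factor(kbps):4.1f}, "
          f"{minutes_that_fit(4000, kbps) / 60:5.1f} hours on a 4 GB player")
```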

The higher the compression ratio, the more files fit on an iPod. But a (too) high compression ratio also degrades the audio quality, as the encoding algorithms have to remove audible information from the audio. This is heard as gurgling in the music (as if the instruments are played under water), flanging (cymbals sound like "pslllssllllsss" instead of "pssssssssssss") and pre-echoes (noise smeared in front of sharp attacks such as percussion, piano or harpsichord notes).

The quality of the compression also depends on the encoder used (e.g. LAME versus Apple's encoder). Determining which sounds are inaudible is not straightforward; research in this field is still ongoing, and new state-of-the-art techniques result in better encoders. Furthermore, encoders have to make a trade-off between encoding speed and encoding quality. An encoder that can spend a whole night on a song may well achieve better quality at, say, 128 kbit/s than an encoder that has to encode the same song at 256 kbit/s in a few seconds.

Choosing an encoding with the right trade-off between sound quality and file size is important, as a wrong initial choice might leave you with hundreds of wrongly encoded CDs. The iPod pages on this site propose a good initial choice, plus a procedure to determine your own personal choice.

For more in-depth information about audio encoding, see for instance:

- Wikipedia about AAC
- Wikipedia about MP3
- Fraunhofer Institute
