Extensive Listening Experiment
In 2003, I wanted to determine a suitable audio compression setting for encoding my 400 CDs. At that time, only scarce and conflicting information was available on the web. I therefore conducted my own experiment to determine a suitable setting.
The test setup
Conducting a test like this requires careful selection of source material and listening equipment. The source material should consist of songs in which compression artifacts quickly become audible (e.g. harpsichord, castanets, cymbals, piano), and should cover a wide range of styles (from popular to classical). Songs that I've used include:
- Tori Amos, Boys for Pele, “Blood Roses”, Atlantic 7567828622
- Diana Krall, The Girl in the Other Room, “Temptation”, Verve 229336-2
- Mariza, Fado em Mim, “Maria Lisboa”, World Connection WC43038
- Mandalay, Empathy, “Opposites”, V2, VVR1001292
- Shostakovich, Symphony No. 14, “Loreley”, Haitink, Philips, 444430-2
- Debussy, Etudes, Uchida, “Pour les arpèges...”, Philips 422 412-2
- Bush, The Science of Things, “Warm Machine”, Trauma Records 490483-2
The tests were performed primarily on an iPod 5G Video with Sennheiser HD497 headphones, but I've also tested with other headphones (Etymotic ER-4P, Beyerdynamic 311, iPod earbuds, and AKG 340 electrostatic headphones) and on hifi sets like my own (Ayre CX-7, Ayre AX-7, Avalon Opus Ceramique).
All tracks have been converted to the following formats (bit rates in kbit/s):
- AIFF: 1411 (as a reference file)
- AAC iTunes: 96, 128 (+VBR), 160 (+VBR), 192 (+VBR), 224, 256 (+VBR), 320 (+VBR).
- MP3 iTunes: 96, 128 (+VBR), 160 (+VBR), 192 (+VBR), 256 (+VBR), 320 (+VBR).
- MP3 LAME: 96, 128, 160, 192VBR, 320VBR (using the -h -v -V0 -b setting).
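To put the bit rates in the list above in perspective, the sketch below computes the bit rate of uncompressed CD audio (44.1 kHz, 16 bits, stereo) and the compression ratio that each lossy bit rate implies. It was not part of the original test, and the function names are only illustrative.

```python
# CD audio parameters: 44.1 kHz sample rate, 16 bits per sample, 2 channels.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def pcm_bitrate_kbps(sample_rate_hz=SAMPLE_RATE_HZ,
                     bits_per_sample=BITS_PER_SAMPLE,
                     channels=CHANNELS):
    """Bit rate of uncompressed PCM audio in kbit/s (1 kbit = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratio(encoded_kbps):
    """How many times smaller the encoded stream is than CD-quality PCM."""
    return pcm_bitrate_kbps() / encoded_kbps


if __name__ == "__main__":
    print(f"Uncompressed CD audio: {pcm_bitrate_kbps():.1f} kbit/s")
    for rate in (96, 128, 160, 192, 224, 256, 320):
        print(f"{rate:>3} kbit/s -> about {compression_ratio(rate):.1f}:1")
```

Even the highest lossy setting tested (320 kbit/s) discards more than three quarters of the original data rate, which is why artifacts remain possible at every setting.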
Recognize compression artifacts
First of all, I listened to all the tracks to get an idea of the audible differences. How can these differences be characterized, and how prominent are they at different bit rates? Listening to the 96 kbit/s files reveals a lot of artifacts. From roughly 192 kbit/s upward, a direct comparison with the original uncompressed song is required to get an idea of the differences.
Especially at the beginning of my tests, I guessed the encoding wrong a couple of times. This improved a lot over time, which means there is a learning effect. I still make a mistake every now and then, which might have to do with fatigue, because conducting such a test intensively demands quite some energy. After these experiments I got a better feeling for what to focus on in order to hear and describe the differences. Though I've tested my observations with many tracks, I will only discuss the Tori Amos track in detail:
• The velocity and timbre of the voice are sharper (for better encodings).
• The transients of the harpsichord are better defined (for better encodings).
• The decay and timbre of the harpsichord are not disturbed by the attack of new harpsichord notes (for better encodings).
• The timbre of the harpsichord is "thin" (for better encodings).
• There is better separation of the voice from the harpsichord (for better encodings).
• There is better pace and rhythm, and less fuzziness (for better encodings).
• The timbre of the harpsichord is "fat" (for worse encodings).
• The overall sound is more "granular" (for worse encodings).
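Because some encodings were guessed wrong, it helps to know how many correct identifications are needed before pure guessing becomes an unlikely explanation. The sketch below is not part of the original test. It assumes a simple blind comparison in which each trial is an independent guess between a fixed number of options (two by default, e.g. "encoded" versus "original"), and the function name is only illustrative.

```python
from math import comb


def chance_probability(correct, trials, options=2):
    """Probability of getting at least `correct` answers right out of
    `trials` independent trials by pure guessing among `options` choices."""
    p = 1 / options
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(correct, trials + 1)
    )


if __name__ == "__main__":
    trials = 16
    for correct in (8, 10, 12, 14, 16):
        prob = chance_probability(correct, trials)
        print(f"{correct}/{trials} correct: chance probability {prob:.4f}")
```

With 16 trials, for example, 12 or more correct answers would occur by guessing alone less than 4% of the time, so a score at that level suggests the difference is really audible.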
Impressions
A quality label is a personal opinion. This listening experiment was performed under less than ideal conditions for a scientifically justified conclusion (which would require a double-blind test), although I often used the shuffle function of the iPod or CD player to add a "fair amount" of blindness to the tests and to check my observations.
Some iTunes MP3 impressions
The 96 kbit/sec (file size 508KB for 42 seconds) is a joke. The artifacts are huge, as if the harpsichord is constantly pushed under water (heavy tremolo). This encoding is not useful for music.
The 128 kbit/sec (file size 676KB for 42 seconds) is also very bad. The artifacts are still huge, and the transients of the harpsichord very clearly interfere with the voice.
The 128 kbit/sec VBR (file size 760KB for 42 seconds) is a bit better, though still very fuzzy. Although the artifacts are still very large, the transients of the harpsichord no longer interfere with the voice as heavily. For less demanding music, this encoding is on the edge of usable.
The 160 kbit/sec VBR still contains a significant amount of gurgling. Not usable.
The 192 kbit/sec VBR (file size 1MB for 42 seconds) is just acceptable. There are still artifacts, which are especially obvious in background sounds (the decay of harpsichord tones). The transients are still not convincing, and the voice seems to be merged with the harpsichord. Tori's feet tapping on the pedals result in loud "bassy" plops (which cannot be heard in the AIFF file). The timbre of the harpsichord is "fat".
The 320 kbit/sec VBR (file size 1.6MB for 42 seconds) is quite acceptable, but still shows some artifacts on the transients of the harpsichord, and loud plops from the tapping of the feet on the pedals.
As you can guess, (iTunes) MP3 encodings are not my cup of tea. There is also a bug in the playback of MP3 files on the iPod 3G. Hans Erik Hazelhorst informed me about severe gurgling in some situations. It only shows up on the iPod, not when the file is played via iTunes on the computer. Another sample I tried, “Talula” by Tori Amos, even shows severe pops and clicks during playback on the iPod. This does not depend on the encoder: it happens with both LAME and iTunes. I've also tried the tracks on a 4G iPod, and there they are artifact-free! That basically means that some iPod models are poorly supported with decoder updates, which I think is very arrogant behavior on Apple's part. The AAC files are free of these severe artifacts. Even transcoding these MP3 files to AAC makes them more acceptable, although a lot of sound quality is lost in the re-encoding.
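The file sizes quoted above for the 42-second excerpt can be checked against the nominal bit rates, since a constant bit rate stream takes roughly bit rate × duration / 8 bytes. The sketch below is not part of the original test. It assumes 1 KB = 1000 bytes (the original does not say which unit was used) and ignores container and tag overhead, and the function name is only illustrative.

```python
EXCERPT_SECONDS = 42  # length of the test excerpt, as quoted in the text


def nominal_size_kb(bitrate_kbps, seconds=EXCERPT_SECONDS):
    """Nominal stream size in KB (assumed 1 KB = 1000 bytes) for a given
    bit rate in kbit/s, ignoring file headers and tags."""
    return bitrate_kbps * 1000 * seconds / 8 / 1000


# Sizes as reported in the text for the iTunes MP3 files.
reported = {
    "96": "508KB",
    "128": "676KB",
    "128 VBR": "760KB",
    "192 VBR": "1MB",
    "320 VBR": "1.6MB",
}

if __name__ == "__main__":
    for label, size in reported.items():
        rate = int(label.split()[0])
        print(f"{label:>8} kbit/s: nominal {nominal_size_kb(rate):7.0f} KB, "
              f"reported {size}")
```

The constant bit rate files land within a few KB of the nominal size, while the 128 kbit/sec VBR file is noticeably larger than its label suggests. That fits the idea that a VBR encoder spends extra bits on demanding passages.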
Some LAME MP3 impressions
The 96 kbit/sec is very different from the iTunes one. The treble is filtered, making the artifacts less prominent, though the whole sounds quite dull. It sounds more acceptable than the iTunes MP3 version, though it still sounds bad overall.
The 128 kbit/sec has its treble back. Overall, it has less gurgling/tremolo than the iTunes version. Although the artifacts are quite obvious, it is getting close to listenable.
The 128 kbit/sec VBR has the same file size as the LAME 192 kbit/sec VBR file, and sounds indistinguishable from it. I suspect the "VBR" of LAME to be VERY variable, so a comparison is not really fair. I therefore excluded it from the table.
The 160 kbit/sec is better, and comparable to the 160 kbit/sec VBR iTunes one. Though it shows fewer artifacts than the 128 kbit/sec AAC, it lacks accuracy.
The 192 kbit/sec VBR is comparable to the iTunes 192 kbit/sec VBR MP3.
The 320 kbit/sec VBR is also comparable to the iTunes 320 kbit/sec VBR MP3.
Some iTunes AAC impressions
The 96 kbit/sec (file size 556KB for 42 seconds) shows large artifacts on the harpsichord, and is unstable (tremolo). The timbre in the upper frequency range is coloured.
The 128 kbit/sec (file size 724KB for 42 seconds) is grainy, the dynamics are not very good, the voice is merged with the rest of the music, and the music sounds "busy" or "stressed".
The 160 kbit/sec (file size 892KB for 42 seconds) already sounds quite acceptable. There is still some grain on the harpsichord, and the voice is a bit less dynamic than in the 192 kbit/sec version, but I could imagine that this encoding will work fine for less demanding music.
The 192 kbit/sec (file size 1MB for 42 seconds) is OK. The transients are still a bit fuzzy, the voice is still merged with the music, and the harpsichord is a bit fat compared to the 224 kbit/sec.
The 224 kbit/sec (file size 1.1MB for 42 seconds) is quite good. This is the first encoding where the harpsichord sounds "thin" as in the original, the voice is separated from the harpsichord, the tonal balance seems to be OK, and the transients of the harpsichord are fierce. There are still minor audible differences compared to the original AIFF file, especially in the transients of the harpsichord and the strength of the treble of the voice.
The 320 kbit/sec (file size 1.6MB for 42 seconds) is almost the same as the AIFF. The AIFF seems to sound a fraction more 'peaceful' and 'thin'. The AAC has a bit of 'grain' over the file. This is more obvious on the iPod; the same track listened to via iTunes is less grainy.
For all AAC encodings, enabling the VBR setting does not change the overall observations. It does however add some sort of "warmth" to the sound, and makes the sound less grainy, which is closer to the original, so it makes sense to always enable it.
Conclusions
I've classified my observations according to the following ratings:
Bad: Too many artifacts to be able to enjoy the music.
Not good: Many artifacts, but starting to resemble music.
Acceptable: Artifacts just audible, suitable for background music.
Good: Artifacts slightly audible, but too small to fuss about.
Very good: No audible artifacts.

The generic conclusions are:
- For equal bit rates, AAC sounds better than MP3.
- AAC at 160 kbit/s and higher gives acceptable sound quality. In some cases you can still hear some encoding artifacts in the sound.
- AAC at 192 kbit/s and higher gives good sound quality. Encoding artifacts are rare, and a direct comparison with the original source material is required to get an idea of the effects of the encoding. This makes it very suitable for practical use.
- The VBR setting improves the sound quality a little bit. In the case of MP3 it improves the sound overall; in the case of AAC it adds a bit of "warmth" and reduces the grain in the sound.
- For uncompromised listening, lossless compression is still the way to go.