It's been a recurring fascination of mine, that being data compression. It's what makes the modern internet possible at all. Only recently have we been able to share multi-gigabyte files quickly and fairly cheaply. In the 90s? There was no question that whatever you uploaded needed to get squished somehow. If you wanted to share music with friends who weren't nearby, you sent MP3s. Your only other option was to mail them a CD. (Which was certainly better sounding, but you had to pay for postage. Tradeoffs, tradeoffs.)
MP3 is probably the poster child for media compression, even close to 30 years after its introduction. It's second only to JPEG in ubiquity, and with neither being protected by patents at this point, there's no stopping them, regardless of whatever new technology Google wants to make you use this week. But once upon a time, that wasn't quite the case! In fact, not all MP3s are made equal! When Fraunhofer, the German research organization that developed much of MP3's technology, was still fiercely guarding its patents and intellectual property, knockoff MP3 encoders of various stripes popped up across the internet, all sounding very, very different from one another.
Data compression of any stripe is a rabbit hole topic, but I thought
it'd be interesting to take some songs and run them through different
encoders and compare the differences in quality and speed. I'll also be
rambling a bit about the innards of an MP3, and how five different,
equally playable MP3s can all sound so massively different. I've even
got a robot listening buddy on board!
MP3 is a lossy compression
codec, which means it'll toss out information to produce a smaller
file, the goal being that it's data you can't hear or won't mind if
it's gone. In the case of MP3, your audio (which is normally
represented as a series of hundreds of thousands or millions of voltage
samples) is converted to a series of frames and then split into roughly 20
bands of frequencies in each frame, and the encoder will determine which
bands are the most audible and which ones can be encoded with lower
accuracy (again, in each frame). It's actually not far off from how
JPEG works, which splits your image into 8x8 blocks and converts the
individual pixel bitmap data in each block to lower-accuracy
frequencies that represent roughly what each block should look like.
This is where JPEG's infamous blocking artifacts come from.
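To make the "convert to frequencies, then keep some of them less accurately" idea a bit more concrete, here's a toy sketch. It is not how a real MP3 encoder works: it uses a plain DCT on a single 32-sample frame instead of MP3's filterbank and MDCT, and the step sizes are hand-picked by me rather than chosen by a psychoacoustic model. The frame, the step sizes, and every function name here are made up for illustration.

```python
import math

def dct(frame):
    """Orthonormal DCT-II: turn time samples into frequency coefficients."""
    n = len(frame)
    coeffs = []
    for k in range(n):
        total = sum(x * math.cos(math.pi / n * (i + 0.5) * k)
                    for i, x in enumerate(frame))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        coeffs.append(scale * total)
    return coeffs

def idct(coeffs):
    """Inverse of dct(): rebuild time samples from frequency coefficients."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi / n * (i + 0.5) * k)
        samples.append(total)
    return samples

def quantize(coeffs, steps):
    """Snap each coefficient to a multiple of its step size (the lossy part)."""
    return [round(c / s) * s for c, s in zip(coeffs, steps)]

N = 32
# One frame: a loud low tone plus a quiet high tone.
frame = [1.0 * math.cos(math.pi / N * (i + 0.5) * 3)
         + 0.05 * math.cos(math.pi / N * (i + 0.5) * 20) for i in range(N)]

# Hand-picked steps (an assumption): fine for the 8 lowest coefficients,
# coarse for everything above them.
steps = [0.01] * 8 + [0.5] * (N - 8)

coeffs = dct(frame)
kept = quantize(coeffs, steps)
decoded = idct(kept)

nonzero = sum(1 for c in kept if c != 0)
max_error = max(abs(a - b) for a, b in zip(frame, decoded))
print(f"{nonzero} of {N} coefficients survive, max sample error {max_error:.3f}")
```

The quiet high tone falls below the coarse step size, so it rounds to zero and simply vanishes from the decoded frame. A real encoder makes that call per band and per frame, which is exactly where different encoders start to disagree.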
There's a lot of misinformation, consumer confusion, and overall snake oil about MP3s. A lot of people think they can skirt the loss of information by using a bitrate of 320kbps (don't, it's just irritating bloat and you're still throwing away data, use a lossless codec), or they think you can take a lower-bitrate MP3 and convert it to a higher-bitrate one or even to WAV for better quality (you can't, the data is permanently gone, start with a lossless or uncompressed original), but one place that MP3 is genuinely a little strange is on the encoder side. Using different encoders will get you different results!
How can that be, though? Don't they all just make MP3s?
MP3 is interesting because its specification is more strict on the decoding process than it is on the encoding process. In other words,
so long as the encoder produces an end MP3 that looks and is structured
like one, it can write pure noisy garbage and all is up to code. While
the MP3 specification does have some sample psychoacoustic models (a
whole bunch of math to approximate what we can hear and what we can't)
and the like, if you're writing an encoder, you aren't bound to use
any of it. You can come up with your own way to throw away data with
impunity. The thought at the time being, if better models were to come
along, we could encode higher-quality MP3s that still worked on all our
old gear. And largely, that's what happened, as you'll soon discover.
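For a taste of the part that is strict, here's a rough sketch of reading the fixed 4-byte header that starts every MP3 frame. This is based on my understanding of the MPEG-1 Layer III header layout and only handles that one flavor (no MPEG-2, no free-format bitrates); the function name and the example header bytes are my own, not from any particular tool.

```python
import struct

# Lookup tables for MPEG-1 Layer III only. None marks free-format or invalid.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96,
                 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]

def parse_frame_header(header_bytes):
    """Decode the fields a player needs from a 4-byte frame header."""
    (word,) = struct.unpack(">I", header_bytes[:4])
    if word >> 21 != 0x7FF:                      # 11 sync bits, all ones
        raise ValueError("no frame sync")
    if (word >> 19) & 0b11 != 0b11 or (word >> 17) & 0b11 != 0b01:
        raise ValueError("not MPEG-1 Layer III")
    bitrate = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(word >> 10) & 0b11]
    if bitrate is None or sample_rate is None:
        raise ValueError("unsupported bitrate or sample rate")
    padding = (word >> 9) & 1
    # Each frame carries 1152 samples, which works out to this many bytes.
    frame_bytes = 144 * bitrate * 1000 // sample_rate + padding
    return {"bitrate_kbps": bitrate, "sample_rate_hz": sample_rate,
            "padding": padding, "frame_bytes": frame_bytes}

# A typical header for a 128kbps, 44.1kHz stereo frame without padding.
info = parse_frame_header(bytes.fromhex("fffb9000"))
print(info)
```

Nothing in that header says how the encoder decided what to keep. As long as the frames are laid out like this and the payload decodes, the player is happy.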
The five "families" of encoders are:
For my tests, I chose two songs, both 44.1kHz and lossless (one from
CD, one from Bandcamp). One's a
well-mastered, clean-sounding crunchy rock track (27.7MB) with tons of
cymbal noise, and the other is a
fairly dry, low-key acoustic guitar number (23.5MB) with two
singers recorded to fairly noisy tape. Both of these have plenty of
characteristics that could easily trip up less capable MP3 encoders,
especially at low bitrates. (Both of those links are to FLAC encodes of
the original WAV.)
Each encoder was set to encode a 128kbps CBR MP3 file. 128kbps is considered rather low these days, but was incredibly common when a lot of these encoders were competing. It's also a good, medium bitrate where artifacting will be noticeable, but not irritating. For projects that survived longer, I've used both period-appropriate and new versions for completeness. The encoders used are:
One final bit of housekeeping involves a certain "ODG score" you'll
see in the table below. All of these encoders were sourced from a site
which specializes in mirroring bizarre, old-school audio software, be
it encoders (for MP3, AAC, or far more obscure formats) or players and
editors. In browsing RRW, I came across a program called EAQUAL, which is
essentially a robot taking a listening test. "ODG" stands for the
Objective Difference Grade, or how big a
difference it can hear between the original and each MP3. The scale
goes from -4 (which sounds terrible and annoying) to 0 (which sounds
exactly the same as the original), so the closer to 0, the better.
While I've of course listened to each sample thoroughly myself, I
used EAQUAL to get a second opinion on which encodes sounded the best.
Surprisingly, we agreed most of the time. Linked under each score is
the exact output of the program in each trial, which is mostly a lot of
other numbers. I'd say run it yourself on each of the samples, but they
need to be perfectly time-aligned (not a sample off!) WAV files—so you
have to encode to MP3, import into an audio editor, line the MP3 up
perfectly with the original file, then transcode both back to WAV.
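If lining things up by hand in an editor sounds tedious, the alignment step can be roughed out in code. This is only a sketch of the idea, assuming both files are already decoded into plain lists of samples and that the MP3 copy just has some extra samples at the start. It brute-forces the delay that makes the two signals match best. The function names are hypothetical and this is not something EAQUAL does for you.

```python
import random

def find_delay(reference, decoded, max_delay):
    """Return how many samples `decoded` lags behind `reference`."""
    best_delay, best_score = 0, float("-inf")
    for delay in range(max_delay + 1):
        overlap = min(len(reference), len(decoded) - delay)
        if overlap <= 0:
            break
        # Average product of overlapping samples: highest when they line up.
        score = sum(reference[i] * decoded[i + delay]
                    for i in range(overlap)) / overlap
        if score > best_score:
            best_delay, best_score = delay, score
    return best_delay

def align(reference, decoded, max_delay):
    """Trim both signals so sample 0 of one matches sample 0 of the other."""
    delay = find_delay(reference, decoded, max_delay)
    trimmed = decoded[delay:delay + len(reference)]
    return reference[:len(trimmed)], trimmed

# Stand-in data: noise as the "original", and a slightly quieter copy with
# 7 samples of silence in front and 5 behind as the "decoded MP3".
rng = random.Random(0)
reference = [rng.uniform(-1, 1) for _ in range(400)]
decoded = [0.0] * 7 + [0.9 * s for s in reference] + [0.0] * 5

ref_aligned, dec_aligned = align(reference, decoded, max_delay=50)
print(find_delay(reference, decoded, max_delay=50), len(ref_aligned), len(dec_aligned))
```

On real music you'd want to search a wider range of delays and only compare a short excerpt, since this brute-force loop gets slow fast.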
Anyway, let's get to the clips! These are ranked worst to best.
First, the crunchy rock song:
||Okay, this one is super
unfair, but mostly, I wanted to see how the quality was from the very
beginning. Answer: it sounds rough. There's incredibly distracting
twinkling and graininess
over the entire track, basically. ffmpeg's MP3 decoder (like the one used in
Audacity) also treats this one like it's skipping constantly, so the decoded
audio only lasts half the runtime. foobar2000 plays the file without issue.
||The watery, tweeting artifacts
are super strong on this one, and there's a hissing, ringing whisper on
hi-hat and vocals in the verses. There's also a "blip" at the very
start of the track, meaning the encoder misread some non-audio data as audio. I
had no metadata chunks anywhere in my input WAV, I promise. Shine is far closer sounding to the
better-performing encoders up at 192kbps
(and quite nice at 256kbps)—even
though this means a respective 50% and
100% increase in size.
||Super noticeable whispering on
the vocals, even before the band kicks in. Basically any time the snare
kicks, the track warbles. I swear I'm hearing a lot of what I can only
describe as "MP3 flutter" if I listen super closely.
||Surprisingly listenable! The
swishing is mostly noticeable in the crash cymbals in the choruses,
which are panned hard left (hard panning is notoriously difficult to
encode). It definitely sounds like a 128kbps MP3, but a not at all
offensive one. Would encode music to put on a Rio S50 with.
||State of the art for 2017 should
damn well come out on top, but not quite, something I'll explain in the
comments for LAME 3.90. Fairly close to Xing, but ever so
slightly cleaner. The hard panned crashes don't trip it up quite as
hard, but the ride cymbal still rings very faintly through the entire
chorus. It doesn't sound quite as "full" as Xing? Not too sure how that
works, like Xing is either exaggerating the bottom end of the track or
LAME is suppressing it.
||About neck and neck with LAME
3.90! I thought I was noticing some clicky distortion bits in the
quiet, vocal-only intro, but that's just how the song sounds. I think
the warbling ride in the choruses is just a tiny bit more noticeable
here, and the hard-panned crash sounds fairly different from any of the
other encoders somehow. ffmpeg decodes these files without issue,
unlike the old l3enc.
||Yes, LAME 3.90 came out on top
of all the encoders, for this song at least. It somehow sounds cleaner
than its cousin from 15 years later! I'm not hearing much artifacting
at all on the snare in the verses, while on 3.100's encode,
it sounds like the snare is "breathing" slightly. I swear the ride
cymbal ringing in the choruses, while still present, is much less
noticeable, though the hard-panned crashes definitely warble some.
And for the acoustic song:
||This one made me laugh. Each pluck of each string
becomes a dead, wobbly mess, and the vocals need to be heard to be
believed. Literal underwater noises.
||Again with the blip at the start
of the track. Funnily enough, both happened in the left ear, suggesting
an encoder defect regardless of what you feed it. The tape hiss becomes
a bed of watery noise, the guitar picking is definitely smeared, and
the entire mix starts to warble when the vocals kick in. Bad/10.
||A little messy on the guitar and tape hiss, but again, the vocals are really what trips this one up. They seem a lot quieter than they should be at first, as if it filtered out all the body in their voices.|
||Surprisingly, 3.90 seems to
handle the tape hiss the least gracefully of the top encoders, but
still plenty listenable.
||Clean encode, and the fastest of
the bunch. The piano in the bridge sounds a little smeary, but only
||The guitar sounds incredibly
clean on this one, very impressive! Even their voices came out pretty
clean. No complaints at all about how Xing's performed over these two
songs, given its age. Impressively, EAQUAL liked it the best too.
||About on par with FhG and Xing. All four of the top encoders in this test are rather interchangeable. I suppose it's not the most complex or brightest-sounding song, but hey, not all of them are.|
l3enc_fp.exe nightmarex.wav nightmarex_l3enc.bit -br 128000
amcenc.exe -c "Fraunhofer IIS MPEG Layer-3 Codec (professional)" -b 128 nightmarex.wav nightmarex_fhg340.mp3
x3enc.exe nightmarex.wav nightmarex_xing.mp3 -b 128000
bladeenc.exe -128 nightmarex.wav nightmarex_blade.mp3
lame.exe -b 128 nightmarex.wav nightmarex_lame.mp3
So what didn't surprise me: l3enc and Shine performed like garbage. l3enc being such an early encoder and Shine being so simple explains both, but it does show you can have wildly varying results, even if the resulting MP3 is perfectly playable (mostly, in l3enc's case). BladeEnc performed slightly less like garbage. I've used BladeEnc to destroy samples of audio for my music before, so I knew it wasn't exactly a top encoder. Both versions of LAME and the ACM Fraunhofer encoder performed wonderfully. LAME was head and shoulders above nearly everyone else, even 20 years ago (which is why you'll find even very old LAME encodes rather frequently), and Fraunhofer stayed competitive just by being a giant corporation with the money to throw at codec research.
What did surprise me was how incredibly well Xing did. In both tests, it regularly output incredibly clean MP3s for its age and for the low bitrate I gave it, and it always came in first place or runner-up in execution speed. Obviously, this isn't a very scientific listening test, and I only tried out two songs. Both of those songs sounded ace though. In a world without LAME, Xing would be the go-to easily available MP3 encoder—and if you're looking to encode audio on older computers or for older devices, Xing is still very well worth a look.
EAQUAL and I also have a surprisingly similar set of ears, and I very much appreciated that. I was genuinely pretty hype when I saw it preferred Xing out of every other encoder in the "Nightmare X" test. Feels good when someone agrees with you, especially if that someone is a robot.
Finally, the sizes of all the encodes didn't vary all that much. All
of the encodes of "Nightmare X" landed around 3.8MB, and all of the
encodes of "You Dodged a Bullet" came out to around 4.3MB.
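That's what you'd expect from constant bitrate: the size depends only on the bitrate and the length of the song, not on the encoder. Here's the arithmetic as a quick sketch. The 240-second duration is a round number I picked for illustration, not the actual length of either track, and the uncompressed figure assumes 16-bit stereo.

```python
def cbr_size_bytes(bitrate_kbps, seconds):
    """Size of a constant-bitrate stream, ignoring tags and headers."""
    return bitrate_kbps * 1000 * seconds / 8

def pcm_size_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size of uncompressed PCM audio, ignoring the WAV header."""
    return sample_rate_hz * bits_per_sample * channels * seconds / 8

seconds = 240  # hypothetical 4-minute song

mp3_128 = cbr_size_bytes(128, seconds)
mp3_192 = cbr_size_bytes(192, seconds)
mp3_256 = cbr_size_bytes(256, seconds)
wav = pcm_size_bytes(44100, 16, 2, seconds)

print(f"128kbps: {mp3_128 / 1e6:.2f}MB")
print(f"192kbps: {mp3_192 / 1e6:.2f}MB ({mp3_192 / mp3_128 - 1:.0%} bigger)")
print(f"256kbps: {mp3_256 / 1e6:.2f}MB ({mp3_256 / mp3_128 - 1:.0%} bigger)")
print(f"WAV:     {wav / 1e6:.2f}MB ({wav / mp3_128:.1f}x the 128kbps MP3)")
```

The same math is behind the 50% and 100% size bumps for the 192kbps and 256kbps Shine encodes mentioned earlier.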
MP3s are a wonderful bit of technology I think most people don't think too much about. We've only had lossy compression for a good 30 years or so, and already, the fact that something as complex as a piece of music can get transferred across the world in ten minutes in the worst possible scenario and near-instantly in the best is absolutely wild. When Apple proclaimed that you could carry a thousand songs with you at any given time, regardless of how you feel about Apple, they weren't wrong. That had simply never happened before. Best you were gonna get was 80 or so across a few CDs, and they weren't about to fit in your pocket.
But even wilder is that MP3s are just the most visible example of this crazy world of smaller and smaller sizes for such complex data. Did you know there was an MP1 and MP2? If you've listened to terrestrial radio in the past 15 years, likely, you were listening to an MP2 stream broadcast over the air! If we're willing to get really experimental, try the mp3PRO and HE-AAC codecs, which actually recreate some of the audio data on-the-fly so they can throw more of it away. And these days, Opus is the dominant ultra-efficient codec, built half from what Skype used to use (SILK) and half from a more MP3-like scheme (CELT) for the best of both worlds in terms of audio quality. And somehow, it streams ridiculously fast and stores ridiculously small. Used Discord's voice chat lately? Opus.
I'm likely to make more pages about the ins and outs of the audio world. Even back in 2003, all modern lossy codecs of the time were more than good enough for casual listening, let alone these days. And as I get older, I kinda dig the sizzly, slightly off sound of it more and more. Not to mention, you can take MP3 with you easier, especially if all you've got is 128MB to work with.
This page last updated June 16, 2021.
*stares at old MP3 players and wants them all*