The Five Families of MP3 Encoders

Data compression has been a recurring fascination of mine. It's what makes the modern internet possible at all. Only recently have we been able to share multi-gigabyte files quickly and fairly cheaply. In the 90s? There was no question that whatever you uploaded needed to get squished somehow. If you wanted to share music with friends who weren't nearby, you sent MP3s. Your only other option was to mail them a CD. (Which was certainly better sounding, but you had to pay for postage. Tradeoffs, tradeoffs.)

MP3 is probably the poster child for media compression, even close to 30 years after its introduction. It's second only to JPEG in ubiquity, and with neither being protected by patents at this point, there's no stopping them, regardless of whatever new technology Google wants to make you use this week. But once upon a time, that wasn't quite the case! In fact, not all MP3s are created equal! When Fraunhofer, the German research organization that developed much of MP3's technology, was still fiercely guarding its patents and intellectual property, knockoff MP3 encoders of various stripes popped up across the internet, all sounding very, very different from one another.

Data compression of any stripe is a rabbit hole topic, but I thought it'd be interesting to take some songs, run them through different encoders, and compare the differences in quality and speed. I'll also be rambling a bit about the innards of an MP3, and how five different, equally playable MP3s can all sound so massively different. I've even got a robot listening buddy on board!

How can there be any different-sounding encoders out there, let alone five?

MP3 is a lossy compression codec, which means it'll toss out information to produce a smaller file, the goal being that it's data you can't hear or won't miss once it's gone. In the case of MP3, your audio (which is normally represented as a series of hundreds of thousands or millions of voltage samples) is converted to a series of frames, and each frame is split into 20 bands of frequencies. The encoder then determines, frame by frame, which bands are the most audible and which ones can be encoded with lower accuracy. It's actually not far off from how JPEG works, which splits your image into 8x8 blocks and converts the pixel data in each block to lower-accuracy frequencies that represent roughly what each block should look like. This is where JPEG's infamous blocking artifacts come from.
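To make "encoded with lower accuracy" a little more concrete, here's a toy sketch of the idea. It is not MP3's actual quantizer (the real thing is considerably more involved); it just rounds each band's value to a grid, with a coarser grid for bands the encoder has decided matter less. The band values and step sizes are made up for illustration.

```python
def quantize_index(value, step):
    """Map a value to the nearest grid point; the integer is what gets stored."""
    return round(value / step)


def dequantize(index, step):
    """What the decoder reconstructs from the stored integer."""
    return index * step


# Hypothetical amplitudes for one frame, one value per frequency band.
band_values = [0.91, -0.42, 0.13, 0.07, -0.03]

# Assumed step sizes: fine steps for "audible" bands, coarse steps for the rest.
steps = [0.01, 0.01, 0.05, 0.1, 0.1]

indices = [quantize_index(v, s) for v, s in zip(band_values, steps)]
decoded = [dequantize(i, s) for i, s in zip(indices, steps)]
errors = [abs(v - d) for v, d in zip(band_values, decoded)]

# Coarser steps mean smaller integers (cheaper to store) but larger errors.
for v, s, i, e in zip(band_values, steps, indices, errors):
    print(f"value={v:+.2f} step={s:.2f} stored={i:+d} error={e:.3f}")
```

The error in each band never exceeds half its step size, which is the whole trade: a bigger step saves bits and costs accuracy, and the encoder's job is deciding where you won't notice.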

There's a lot of misinformation, consumer confusion, and overall snake oil about MP3s. A lot of people think they can skirt the loss of information by using a bitrate of 320kbps (don't, it's just irritating bloat and you're still throwing away data, use a lossless codec). Others think you can take a lower-bitrate MP3 and convert it to a higher-bitrate one, or even to WAV, for better quality (you can't, the data is permanently gone, start with a lossless or uncompressed original). But one place where MP3 is genuinely a little strange is on the encoder side. Using different encoders will get you different results!

How can that be, though? Don't they all just make MP3s?

MP3 is interesting because its specification is stricter about the decoding process than it is about the encoding process. In other words, so long as the encoder produces a file that's structured like a valid MP3, it can write pure noisy garbage and all is up to code. While the MP3 specification does have some sample psychoacoustic models (a whole bunch of math to approximate what we can hear and what we can't) and the like, if you were writing an encoder, you aren't bound to use any of it. You can come up with your own way to throw away data with impunity. The thinking at the time was that if better models were to come along, we could encode higher-quality MP3s that still worked on all our old gear. And largely, that's what happened, as you'll soon discover.
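For a sense of what "structured like a valid MP3" means in practice, here's a minimal sketch that reads the 4-byte header at the start of an MPEG-1 Layer III frame and works out how long that frame is. The function name and the returned dictionary are my own choices for illustration, and the sketch skips things a real parser needs (free-format bitrates, MPEG-2/2.5 streams, ID3 tags, CRC checks).

```python
# MPEG-1 Layer III lookup tables, indexed by the bits read from the header.
# Index 0 ("free format") and index 15 (invalid) are not handled in this sketch.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128,
                 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]


def parse_frame_header(header: bytes) -> dict:
    """Decode the basics of a 4-byte MPEG-1 Layer III frame header."""
    if len(header) != 4:
        raise ValueError("a frame header is exactly 4 bytes")
    word = int.from_bytes(header, "big")

    if word >> 21 != 0x7FF:                  # 11 sync bits, all ones
        raise ValueError("no frame sync")
    if (word >> 19) & 0b11 != 0b11:          # version bits: 0b11 = MPEG-1
        raise ValueError("not MPEG-1")
    if (word >> 17) & 0b11 != 0b01:          # layer bits: 0b01 = Layer III
        raise ValueError("not Layer III")

    bitrate = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(word >> 10) & 0b11]
    padding = (word >> 9) & 1
    if bitrate is None or sample_rate is None:
        raise ValueError("unsupported bitrate or sample rate index")

    # Each frame holds 1152 samples; its size in bytes follows from the bitrate.
    frame_bytes = 144 * bitrate * 1000 // sample_rate + padding
    return {"bitrate_kbps": bitrate, "sample_rate_hz": sample_rate,
            "padding": padding, "frame_bytes": frame_bytes}


# A typical header for 128kbps, 44.1kHz audio.
info = parse_frame_header(bytes([0xFF, 0xFB, 0x90, 0x00]))
print(info)
```

Nothing in that header says anything about how good the audio inside the frame is. A decoder only needs the frames to be laid out correctly, which is exactly why encoders have so much freedom.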

So what are the families?

The five "families" of encoders are:

  1. Fraunhofer-based encoders (l3enc, mp3enc, fastenc). These are the official encoders you could license from Fraunhofer if you wanted to build them into a product. If you've ever converted a song to MP3 using iTunes or Adobe Audition, it was using a Fraunhofer encoder.
  2. dist10-based encoders (8hz, SoloH, Blade). "dist10" was the reference encoder for MP3 files, and infamously, its code was stolen from a university server and used to create a ton of near-identical, competing encoders. These all sound rather artifact-y, given that dist10 was meant to be a reference for codec implementation and not a brilliant quality encoder. Nearly all dist10 encoders were struck down by Fraunhofer, who sent many, many cease-and-desist orders to various projects using their code. Notably, 8hz was a project focused on rewriting dist10 for speed, and it's what LAME originally patched before switching over completely to a dist10 base.
  3. Xing (and later Helix). Xing (pronounced "zing") is an interesting case in that its codebase is completely unique to it, with plenty of it written in x86 assembly, which means it ran much faster than other encoders. (The fact that it was completely custom also kept Fraunhofer from going after them.) Xing was later bought by none other than RealNetworks, who continued development and distributed it as Helix instead, later making it completely open-source.
  4. LAME. Ah, the big one! The highest quality one, the one with the most work put into it, and the best known. LAME started as a patch on the 8hz (and later dist10) sources, which is what helped to keep it off Fraunhofer's radar at first; not redistributing Fraunhofer code means Fraunhofer can't get upset, yeah? Later, since it was only distributed as source code and not binaries directly (their position being "source code counts as speech" and written descriptions of patents aren't illegal), LAME was again bulletproof. Since the expiration of the pertinent MP3 patents, FOSS OSes have included LAME directly, and it continues to get updates to this day.
  5. Shine. A unique encoder built for simplicity rather than speed or quality, essentially becoming a new encoder programmer's scrap wood project. It also happens to have a fixed-point version, which makes it the only open-source encoder that doesn't require an FPU on board to work. This one was written by Gabriel Bouvigne, who's one of the main LAME developers.

Comparing the encoders

For my tests, I chose two songs, both 44.1kHz and lossless (one from CD, one from Bandcamp). One's a well-mastered, clean-sounding crunchy rock track (27.7MB) with tons of cymbal noise, and the other is a fairly dry, low-key acoustic guitar number (23.5MB) with two harmonizing singers recorded to fairly noisy tape. Both of these have plenty of characteristics that could easily trip up less capable MP3 encoders, especially at low bitrates. (Both of those links are to FLAC encodes of the original WAV.)

Each encoder was set to encode a 128kbps CBR MP3 file. 128kbps is considered rather low these days, but was incredibly common when a lot of these encoders were competing. It's also a good, medium bitrate where artifacting will be noticeable, but not irritating. For projects that survived longer, I've used both period-appropriate and new versions for completeness. The encoders used are:

One final bit of housekeeping involves a certain "ODG score" you'll see in the table below. All of these encoders were sourced from a site called ReallyRareWares, which specializes in mirroring bizarre, old-school audio software, be it encoders (for MP3, AAC, or far more obscure formats) or players and editors. In browsing RRW, I came across a program called EAQUAL, which is essentially a robot taking a listening test. "ODG" stands for the Objective Difference Grade, or how big a difference it can hear between the original and each MP3. The scale goes from -4 (which sounds terrible and annoying) to 0 (which sounds exactly the same as the original), so the closer to 0, the better.
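To make the scale a bit more concrete, here's a minimal sketch that turns a score into a rough verbal grade. The in-between labels are an assumption on my part, borrowed from the five-grade impairment scale used by ITU-R BS.1387 (the PEAQ standard); the function name and the example scores are made up for illustration and are not results from these tests.

```python
# Assumed labels, following the five-grade impairment scale of ITU-R BS.1387.
ODG_LABELS = {
    0: "imperceptible",
    -1: "perceptible, but not annoying",
    -2: "slightly annoying",
    -3: "annoying",
    -4: "very annoying",
}


def describe_odg(odg: float) -> str:
    """Snap a score to the nearest whole grade (clamped to -4..0) and label it."""
    nearest = max(-4, min(0, round(odg)))
    return ODG_LABELS[nearest]


# Hypothetical scores, purely to show the mapping.
for score in (-0.2, -1.4, -3.7):
    print(f"{score:+.1f} -> {describe_odg(score)}")
```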

While I've of course listened to each sample thoroughly myself, I used EAQUAL to get a second opinion on which encodes sounded the best. Surprisingly, we agreed most of the time. Linked under each score is the exact output of the program in each trial, which is mostly a lot of other numbers. I'd say run it yourself on each of the samples, but they need to be perfectly time-aligned (not a sample off!) WAV files. That means you have to encode to MP3, import into an audio editor, line the MP3 up perfectly with the original file, then export both back to WAV. Annoying.
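The lining-up step can, in principle, be automated. Here's a rough sketch of one way to do it, which is not how I did it for these tests: slide the decoded audio against the original and pick the shift where the two match best (a brute-force cross-correlation). It assumes both signals are already loaded as plain lists of samples, that the decoded copy only ever starts late (never early), and that the shift is a whole number of samples. The function names and the synthetic test signal are hypothetical.

```python
import random


def find_offset(reference, decoded, max_lag):
    """Return the delay, in samples, at which `decoded` best matches `reference`."""
    best_lag = 0
    best_score = float("-inf")
    for lag in range(max_lag + 1):
        overlap = min(len(reference), len(decoded) - lag)
        if overlap <= 0:
            break
        # Dot product of the overlapping region: largest when the signals line up.
        score = sum(reference[i] * decoded[i + lag] for i in range(overlap))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def trim_to_reference(reference, decoded, max_lag):
    """Drop the leading delay and any trailing padding from `decoded`."""
    lag = find_offset(reference, decoded, max_lag)
    aligned = decoded[lag:lag + len(reference)]
    return reference[:len(aligned)], aligned


# Synthetic stand-in for real audio: noise, then a copy delayed by 37 samples.
rng = random.Random(0)
original = [rng.uniform(-1.0, 1.0) for _ in range(500)]
delayed = [0.0] * 37 + original + [0.0] * 20

lag = find_offset(original, delayed, max_lag=100)
ref, dec = trim_to_reference(original, delayed, max_lag=100)
print(lag, len(ref), len(dec))
```

Pure Python like this is far too slow for full-length songs, so a real version would search only a short excerpt or use an FFT-based correlation. It's the idea that matters: find the delay, trim it off, and both files end up the same length.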

Anyway, let's get to the clips! These are ranked worst to best. First, the crunchy rock song:

Comparing various encodes of c.layne's "You Dodged a Bullet"
l3enc 0.99a
Click to download MP3 132.7s
Okay, this one is super unfair, but mostly, I wanted to see what the quality was like at the very beginning. Answer: it sounds rough. There's incredibly distracting twinkling and graininess over basically the entire track. ffmpeg's MP3 decoder (as used in Audacity) also treats this one like it's skipping constantly, so the decoded audio only lasts half the runtime. foobar2000 plays the file without issue.
Click to download MP3 7s
The watery, tweeting artifacts are super strong on this one, and there's a hissing, ringing whisper on the hi-hat and vocals in the verses. There's also a "blip" at the very start of the track, meaning the encoder interpreted some non-audio data as audio. I had no metadata chunks anywhere in my input WAV, I promise. Shine sounds far closer to the better-performing encoders up at 192kbps (and quite nice at 256kbps), even though this means a respective 50% and 100% increase in size.
Click to download MP3 4s
Super noticeable whispering on the vocals, even before the band kicks in. Basically any time the snare kicks, the track warbles. I swear I'm hearing a lot of what I can only describe as "MP3 flutter" if I listen super closely.
Click to download MP3 2s
Surprisingly listenable! The swishing is mostly noticeable in the crash cymbals in the choruses, which are panned hard left (hard panning is notoriously difficult to encode). It definitely sounds like a 128kbps MP3, but a not at all offensive one. Would encode music to put on a Rio S50 with.
LAME 3.100
Click to download MP3 4s
State of the art for 2017 should damn well come out on top, but not quite, something I'll explain in the comments for LAME 3.90. Fairly close to Xing, but ever so slightly cleaner. The hard panned crashes don't trip it up quite as hard, but the ride cymbal still rings very faintly through the entire chorus. It doesn't sound quite as "full" as Xing? Not too sure how that works, like Xing is either exaggerating the bottom end of the track or LAME is suppressing it.
FhG 3.4.0
Click to download MP3 2.8s
About neck and neck with LAME 3.90! I thought I was noticing some clicky distortion bits in the quiet, vocal-only intro, but that's just how the song sounds. I think the warbling ride in the choruses is just a tiny bit more noticeable here, and the hard-panned crash sounds fairly different from any of the other encoders somehow. ffmpeg decodes these files without issue, unlike the old l3enc.
LAME 3.90
Click to download MP3 4s
Yes, LAME 3.90 came out on top of all the encoders, for this song at least. It somehow sounds cleaner than its cousin from 15 years later! I'm not hearing much artifacting at all on the snare in the verses, while on 3.100's encode, it sounds like the snare is "breathing" slightly. I swear the ride cymbal ringing in the choruses, while still present, is much less noticeable, though the hard-panned crashes definitely warble some.

And for the acoustic song:

Comparing various encodes of alaska!'s "Nightmare X"
l3enc 0.99a
Click to download MP3 110.8s
This one made me laugh. Each pluck of each string becomes a dead, wobbly mess, and the vocals need to be heard to be believed. Literal underwater noises.
Click to download MP3 6s
Again with the blip at the start of the track. Funnily enough, both happened in the left ear, suggesting an encoder defect regardless of what you feed it. The tape hiss becomes a bed of watery noise, the guitar picking is definitely smeared, and the entire mix starts to warble when the vocals kick in. Bad/10.
Click to download MP3 5s
A little messy on the guitar and tape hiss, but again, the vocals are really what trips this one up. They seem a lot quieter than they should be at first, as if it filtered out all the body in their voices. 
LAME 3.90
Click to download MP3 3s
Surprisingly, 3.90 seems to handle the tape hiss the least gracefully of the top encoders, but still plenty listenable.
FhG 3.4.0
Click to download MP3 1.8s
Clean encode, and the fastest of the bunch. The piano in the bridge sounds a little smeary, but only very briefly.
Click to download MP3 2s
The guitar sounds incredibly clean on this one, very impressive! Even their voices came out pretty clean. No complaints at all about how Xing's performed over these two songs, given its age. Impressively, EAQUAL liked it the best too.
LAME 3.100
Click to download MP3 3s
About on par with FhG and Xing. All four of the top encoders in this test are rather interchangeable. I suppose it's not the most complex or brightest-sounding song, but hey, not all of them are.

Results, part A: encoder caveats

Results, part B: the, uh, results

So what didn't surprise me: l3enc and Shine performed like garbage. l3enc being such an early encoder and Shine being so simple explains both, but it does show you can have wildly varying results, even if the resulting MP3 is perfectly playable (mostly, in l3enc's case). BladeEnc performed slightly less like garbage. I've used BladeEnc to destroy samples of audio for my music before, so I knew it wasn't exactly a top encoder. Both versions of LAME and the ACM Fraunhofer encoder performed wonderfully. LAME was head and shoulders above nearly everyone else, even 20 years ago (which is why you'll find even very old LAME encodes rather frequently), and Fraunhofer stayed competitive just by being a giant corporation with the money to throw at codec research.

What did surprise me was how incredibly well Xing did. In both tests, it regularly output incredibly clean MP3s for its age and for the low bitrate I gave it, and it always came in first place or runner-up in execution speed. Obviously, this isn't a very scientific listening test, and I only tried out two songs. Both of those songs sounded ace though. In a world without LAME, Xing would be the go-to easily available MP3 encoder—and if you're looking to encode audio on older computers or for older devices, Xing is still very well worth a look.

EAQUAL and I also have a surprisingly similar set of ears, and I very much appreciated that. I was genuinely pretty hype when I saw it preferred Xing out of every other encoder in the "Nightmare X" test. Feels good when someone agrees with you, especially if that someone is a robot.

Finally, the sizes of all the encodes didn't vary all that much. All of the encodes of "Nightmare X" landed around 3.8MB, and all of the encodes of "You Dodged a Bullet" came out to around 4.3MB.
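That's exactly what you'd expect from constant bitrate encoding, where the file size depends only on the bitrate and the length of the song. A quick back-of-the-envelope sketch, assuming 1MB means 1,000,000 bytes and ignoring tags and header overhead (the function names are just for illustration):

```python
def cbr_size_bytes(bitrate_kbps, seconds):
    """Size of a constant-bitrate stream: bits per second times duration, in bytes."""
    return bitrate_kbps * 1000 * seconds / 8


def cbr_duration_seconds(size_bytes, bitrate_kbps):
    """Invert the formula: how long a CBR file of this size plays for."""
    return size_bytes * 8 / (bitrate_kbps * 1000)


# Approximate file sizes quoted above, both encoded at 128kbps.
bullet_seconds = cbr_duration_seconds(4.3e6, 128)
nightmare_seconds = cbr_duration_seconds(3.8e6, 128)
print(f"You Dodged a Bullet: about {bullet_seconds:.0f}s of audio")
print(f"Nightmare X: about {nightmare_seconds:.0f}s of audio")

# Compression relative to the lossless source sizes quoted earlier.
print(f"Size ratios: {27.7 / 4.3:.1f}x and {23.5 / 3.8:.1f}x smaller")

# Bumping the bitrate scales the file linearly: 192kbps is 1.5x the size of 128kbps.
print(cbr_size_bytes(192, 60) / cbr_size_bytes(128, 60))
```

So the encoder has no say in how big a 128kbps CBR file is, only in how well it spends those bits.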

Wrapping up

MP3s are a wonderful bit of technology I think most people don't think too much about. We've only had lossy compression for a good 30 years or so, and already, the fact that something as complex as a piece of music can get transferred across the world in ten minutes in the worst possible scenario and near-instantly in the best is absolutely wild. When Apple proclaimed that you could carry a thousand songs with you at any given time, regardless of how you feel about Apple, they weren't wrong. That had simply never happened before. Best you were gonna get was 80 or so across a few CDs, and they weren't about to fit in your pocket.

But even wilder is that MP3s are just the most visible example of this crazy world of smaller and smaller sizes for such complex data. Did you know there was an MP1 and MP2? If you've listened to terrestrial radio in the past 15 years, you were likely listening to an MP2 stream broadcast over the air! If we're willing to get really experimental, try the MP3pro and HE-AAC codecs, which actually recreate some of the audio data on-the-fly so they can throw more of it away. And these days, Opus is the dominant ultra-efficient codec, built half from what Skype used to use (SILK) and half from a more MP3-like scheme (CELT) for the best of both worlds in terms of audio quality. And somehow, it streams ridiculously fast and stores ridiculously small. Used Discord's voice chat lately? Opus.

I'm likely to make more pages about the ins and outs of the audio world. Even back in 2003, all modern lossy codecs of the time were more than good enough for casual listening, let alone these days. And as I get older, I kinda dig the sizzly, slightly off sound of it more and more. Not to mention, you can take MP3 with you easier, especially if all you've got is 128MB to work with.

Where to get the songs used

This page last updated June 16, 2021.

*stares at old MP3 players and wants them all*