
Debunked: Making Music With Cars (Bootboxing and Techno Jeep)


Nov 8, 2024

Over the last few months I saw a couple of videos: Bootboxing, featuring Snob Scrilla, and Julian Smith's Techno Jeep (original). Both feature cars being played by a group of people, who appear to be manipulating various parts of the cars in real time to create beats. A lot of people are impressed with the videos, and some have reacted quite angrily to suggestions that there may be foul play going on. It's not my intention to make a value judgment here or to comment on the artistic merit of the videos; this is an attempt to explain why I think there's very good reason to believe that the performances are fake.

So, a bit of background about me: I've been working with audio sequencers and digital audio for about 12 years. Most of that time, my audio work has consisted of arranging very small slices of sound that have a percussive character. Lately, I've written software to measure the precision with which a human plays back a musical passage. So when I watched these videos, it was obvious to me in both cases that the audio tracks we were hearing weren't being created by the people we were watching. Instead, the audio for most of each video was the output of a computer sequencer playing back digital samples of recorded car noises.

In both car performances, there are two unmistakable fingerprints of digital audio sequencing. The first one: the timing is exactly on the grid; it's metronomic. Only a computer performs music that way. Even the most technically skilled humans don't play with that level of precision. Musicians naturally push and pull the timing of notes by tiny amounts, and this is a big part of what gives a musical performance its character. The producers of the car clips could have made them considerably more lifelike by applying processing to humanize the timing of the sequence. Humanizing, in the language of music software, just means applying a random push or pull to each musical event so that it doesn't stay exactly on the timing grid, which makes the sequence sound less robotic.
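As a rough illustration of the two ideas in this paragraph, here is a minimal Python sketch of a quantized sequence, a humanized copy of it, and a check of how far each one strays from the grid. The function names, the 120 BPM tempo, the sixteenth-note grid, and the 10 ms maximum shift are all assumptions chosen for the example; this is not the software mentioned earlier, and real sequencers may humanize differently.

```python
import random

def grid_times(bpm, steps, steps_per_beat=4):
    """Onset times in seconds for a perfectly quantized sequence."""
    step_len = 60.0 / bpm / steps_per_beat
    return [i * step_len for i in range(steps)]

def humanize(times, max_shift_s=0.010, seed=None):
    """Push or pull each onset by a random amount of up to max_shift_s."""
    rng = random.Random(seed)
    return [t + rng.uniform(-max_shift_s, max_shift_s) for t in times]

def max_grid_deviation(times, bpm, steps_per_beat=4):
    """Largest distance in seconds between any onset and its nearest grid line."""
    step_len = 60.0 / bpm / steps_per_beat
    return max(abs(t - round(t / step_len) * step_len) for t in times)

# Hypothetical values: 16 sixteenth-note steps at 120 BPM
quantized = grid_times(bpm=120, steps=16)
loosened = humanize(quantized, max_shift_s=0.010, seed=1)

print(max_grid_deviation(quantized, 120))  # a sequenced part sits exactly on the grid
print(max_grid_deviation(loosened, 120))   # a humanized part drifts by a few milliseconds
```

A measured deviation of zero across a whole passage is the metronomic fingerprint described above; any live performance would be expected to show some spread.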

The second thing was that the sounds repeated themselves exactly. I don't mean that I heard a car door slamming repeatedly; I mean that I heard the same recording of a car door slam over and over. If you repeatedly play back a small piece of audio, for instance a recording of a snare drum, it can quickly sound unnatural, especially if you play it in quick succession. That's what happens in the car clips, and again, the producers missed a simple way to create something more lifelike, using a technique called multi-sampling. What they should have done was sample many different door slams and then have the computer select a random sample each time it needed to play a slam sound.
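The multi-sampling idea can be sketched in a few lines of Python. The class name and the file names below are hypothetical placeholders, and avoiding an immediate repeat of the previous take is an extra design choice added here rather than something described above.

```python
import random

class MultiSample:
    """Holds several recordings of the same sound and picks one per trigger."""

    def __init__(self, takes, seed=None):
        if not takes:
            raise ValueError("need at least one take")
        self.takes = list(takes)
        self.rng = random.Random(seed)
        self.last = None

    def trigger(self):
        # Prefer a take other than the previous one, when there is a choice
        choices = [t for t in self.takes if t != self.last] or self.takes
        self.last = self.rng.choice(choices)
        return self.last

# Hypothetical file names standing in for separately recorded door slams
door = MultiSample(["slam_01.wav", "slam_02.wav", "slam_03.wav", "slam_04.wav"], seed=0)
pattern = [door.trigger() for _ in range(8)]
print(pattern)
```

With only one take loaded, every trigger returns the same recording, which is the situation the car clips appear to be in.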

So, to finish with, we'll take a look at some waveforms. I recorded myself clicking my fingers and imported the recording into Audacity, which is a free audio editor. I made a copy of the recording onto a second track. It's a stereo recording, by the way, so that's why you see four horizontal lines in total. I offset the copy so that two different clicks would align with each other as closely as possible. At this level of magnification, the two clicks and their waveforms look alike. And here's how they sound: this is the first one, and here's the second one. They sound pretty similar; not identical, though pretty close.
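The alignment here was done by hand in Audacity. For readers who want to try something similar in code, one common approach is to slide one track against the other and keep the offset where they overlap best (a brute-force cross-correlation). The sketch below uses short made-up sample lists rather than real recordings, and the function name is an assumption.

```python
def best_lag(reference, other, max_lag):
    """Return the shift in samples at which `other` best lines up with `reference`."""
    def score(lag):
        # Sum of products over the region where the two tracks overlap at this lag
        return sum(reference[i] * other[i + lag]
                   for i in range(len(reference))
                   if 0 <= i + lag < len(other))
    return max(range(-max_lag, max_lag + 1), key=score)

# Made-up data: the same short click placed at sample 10 and at sample 17
click = [0.0, 0.9, -0.7, 0.4, -0.2, 0.1]
track_a = [0.0] * 10 + click + [0.0] * 20
track_b = [0.0] * 17 + click + [0.0] * 13

print(best_lag(track_a, track_b, max_lag=12))  # 7: the second click starts 7 samples later
```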

As we continue to zoom in, though, we notice that the similarity between the waveforms starts to break down. For instance, at this level of magnification, we can already see that the top one has several distinct islands of amplitude, whereas this one has a much more even tail. As we get closer, the differences become more apparent. There's a lot of disagreement between the peaks and troughs of the initial attack portion here. In the second one, you see this area of high peak-to-trough frequency just here, and many of the peaks and troughs don't align.

So, here I've imported the audio from the video Bootboxing featuring Snob Scrilla. I've done a similar thing: I made a copy of the track and offset one of them so that two different bars of the audio coincide with each other as closely as possible. This is a double door slam that we're seeing, and here's how the other one sounds. There's an immediate similarity between the two waveforms, but as we zoom in this time, we see that the similarity holds really well.

Remember that because these are stereo tracks, you should compare the top two and the bottom two. Again, we still have very good agreement; this distinctive 3-3 peak here is repeated in this one. As we see here, the waveform appears to be made up of individual points. Each of these points is one sample. Confusingly, "sample" in this context doesn't mean a short audio clip, but the smallest unit that a digital waveform is made up of. A sample in an audio file is analogous to a pixel in an image file. In this example, there are 44,100 of these samples for every second of playback, so this is an absolutely tiny slice of audio we're looking at, and the waveforms still agree.
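To put the 44,100 samples per second figure in perspective, here is a small Python sketch of the arithmetic. The 100-sample window is a hypothetical width chosen for illustration; the exact number of samples visible on screen isn't stated.

```python
SAMPLE_RATE = 44_100  # samples per second, as stated above

def samples_to_ms(n_samples, rate=SAMPLE_RATE):
    """Duration in milliseconds covered by n_samples."""
    return 1000.0 * n_samples / rate

def ms_to_samples(ms, rate=SAMPLE_RATE):
    """Number of samples (rounded) in a span of ms milliseconds."""
    return round(rate * ms / 1000.0)

print(samples_to_ms(1))    # roughly 0.0227 ms for a single sample
print(samples_to_ms(100))  # roughly 2.27 ms for a hypothetical 100-sample view
print(ms_to_samples(10))   # 441 samples in 10 ms
```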

So, down to the sample level, we still have really high agreement between the two waveforms. Here, I've done the same with the audio from the track Techno Jeep, and I'm focusing on this section here. Here's how the first track sounds. The waveforms look very similar to start with, so let's focus on the first of these slams. You can see the similarity is still really high; just look at this little peak here. Again, even down to the sample level, the agreement between the two tracks is very, very high. This shows beyond doubt that the slamming door sounds in these two videos are sequenced audio clips and not the recording of a human performance.
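The visual comparison above can also be expressed numerically. The sketch below is only an illustration on synthetic data, not an analysis of the actual videos: it builds a made-up decaying noise burst as a stand-in for a slam, then compares it against an exact replay and against a second, independently generated burst. The function names and the synthetic signals are assumptions, and real repeated takes of the same sound would usually be more alike than two random bursts, though still not sample-identical.

```python
import math
import random

def normalized_correlation(a, b):
    """Similarity of two aligned sample lists, between -1 and 1."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0

def max_abs_difference(a, b):
    """Largest sample-by-sample difference between two aligned clips."""
    return max(abs(x - y) for x, y in zip(a, b))

def noise_burst(rng, length=1000, decay=200.0):
    """Synthetic stand-in for a percussive hit: random noise with a decaying envelope."""
    return [math.exp(-i / decay) * rng.uniform(-1.0, 1.0) for i in range(length)]

rng = random.Random(0)
slam = noise_burst(rng)
replayed = list(slam)           # the same clip triggered a second time
second_take = noise_burst(rng)  # a separately "performed" hit

print(normalized_correlation(slam, replayed), max_abs_difference(slam, replayed))
print(normalized_correlation(slam, second_take), max_abs_difference(slam, second_take))
```

A replayed clip gives a correlation of essentially 1 and a maximum difference of zero, which corresponds to the sample-level agreement seen in the car videos. Two separate hits do not.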
