Debunked: Making Music With Cars (Bootboxing and Techno Jeep)


4m read
·Nov 8, 2024

I saw a couple of videos in the last few months: Bootboxing, featuring Snot Scroller, and Julian Smith's Techno Jeep. Both of them featured cars being played by a group of people. The people appeared to be manipulating various parts of the cars in real time to create beats. A lot of people are impressed with the videos, and some have reacted quite angrily to suggestions that there may be foul play going on. It's not my intention to make a value judgment here or to comment on the artistic merit of the videos, but this is an attempt to explain why I think there's very good reason to believe that the performances are fake.

So, a bit of background about me: I've been working with audio sequencers and digital audio for about 12 years. Most of that time, my audio work has consisted of arranging very small slices of sound that have a percussive character. Lately, I've written software to measure the precision with which a human performs a musical passage. So, when I watched these videos, in both cases, it was obvious to me that the audio tracks we were hearing weren't being created by the people we were watching. Instead, the audio for most of the duration of the videos was the output of a computer sequencer playing back digital samples of recorded car noises.

In the case of both car performances, there are two unmistakable fingerprints of digital audio sequencing. The first one: the timing is exactly on the grid; it's metronomic. Only a computer can perform music that way. Even the most technically skilled humans wouldn't perform music with that level of precision. Musicians naturally push and pull the timing of notes by tiny amounts. This is a big part of what gives a musical performance its character. The producers of the car clips could have made them considerably more lifelike by applying processing to humanize the timing of the sequence. Humanizing, in the language of music software, just means applying a random push or pull to each musical event, so that it doesn't stay exactly on the timing grid and the sequence sounds less robotic.
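As a rough illustration of what humanizing does to timing, here is a minimal Python sketch. Everything in it is an assumption made for the example rather than anything taken from the videos or from a particular sequencer: the function names, the 120 BPM tempo, the sixteenth-note grid, and the 10 ms maximum shift are all hypothetical.

```python
import random
import statistics


def grid_times(bpm, steps, steps_per_beat=4):
    """Onset times in seconds for a perfectly quantized pattern."""
    step = 60.0 / bpm / steps_per_beat
    return [i * step for i in range(steps)]


def humanize(onsets, max_shift_ms=10.0, seed=None):
    """Push or pull every onset by a random amount up to max_shift_ms."""
    rng = random.Random(seed)
    shift = max_shift_ms / 1000.0
    return [t + rng.uniform(-shift, shift) for t in onsets]


def timing_spread_ms(onsets, bpm, steps_per_beat=4):
    """Standard deviation (ms) of each onset's distance from its nearest grid line."""
    step = 60.0 / bpm / steps_per_beat
    errors = [(t - round(t / step) * step) * 1000.0 for t in onsets]
    return statistics.pstdev(errors)


# Hypothetical pattern: one bar of sixteenth notes at 120 BPM.
quantized = grid_times(bpm=120, steps=16)
loosened = humanize(quantized, max_shift_ms=10.0, seed=1)

print("quantized spread (ms):", timing_spread_ms(quantized, 120))
print("humanized spread (ms):", timing_spread_ms(loosened, 120))
```

The quantized pattern has no spread at all around the grid, which is the metronomic fingerprint described above, while the humanized copy drifts by a few milliseconds per event.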

The second thing was that the sounds repeated themselves exactly. I don't mean that I heard the car door slamming repeatedly; I mean that I heard the same recording of a car door slam over and over. If you repeatedly play back a small piece of audio, for instance, a recording of a snare drum, it can quickly sound unnatural, especially if you play it repeatedly in quick succession. That's what happens in the car clips, and again, the producers missed a simple way to create something more lifelike using a technique called multi-sampling. What they should have done was to sample many different door slams and then have the computer select a random sample each time it needed to play a slam sound.
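Multi-sampling as described here can be sketched in a few lines of Python. This is only an illustration: the class name, the file names, and the extra rule that avoids playing the same take twice in a row are assumptions added for the example, not something the videos' producers or any specific sampler are known to use.

```python
import random


class MultiSample:
    """Holds several recordings of the same sound and picks one per trigger."""

    def __init__(self, takes, seed=None):
        if not takes:
            raise ValueError("need at least one take")
        self._takes = list(takes)
        self._rng = random.Random(seed)
        self._last = None

    def next_take(self):
        """Pick a random take, avoiding an immediate repeat when possible."""
        choices = [t for t in self._takes if t != self._last] or self._takes
        self._last = self._rng.choice(choices)
        return self._last


# Hypothetical file names standing in for separately recorded door slams.
door_slam = MultiSample(
    ["door_slam_01.wav", "door_slam_02.wav", "door_slam_03.wav"], seed=0
)
pattern = [door_slam.next_take() for _ in range(8)]
print(pattern)
```

With only one take in the list, the sampler falls back to repeating it, which is the situation heard in the car clips.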

So, to finish with, we'll take a look at some waveforms. I recorded myself clicking my fingers. I imported it into Audacity, which is a free audio editor. I made a copy of the recording onto a second track. It's a stereo recording, by the way, so that's why you see four horizontal lines in total. I offset the copy of the recording so that two different clicks would align with each other as closely as possible. So, at this level of magnification, you can see that the clicks look alike, and so do the waveforms. And here's how they sound: so this is the first one, and here's the second one. They sound pretty similar; not identical, though pretty close.

As we continue to zoom in, though, we notice that the similarity between the waveforms starts to break down. For instance, at this level of magnification, we already see in detail that this top one has several distinct islands of amplitude, whereas in this one, we have a much more even tail. So, as we get closer, the differences become more apparent. We already see that there's a lot of disagreement occurring between the peaks and troughs of this initial attack portion here. In the second one, you see this area with a high frequency of peaks and troughs just here, and many of the peaks and troughs don't align.

So, here I've imported the audio from the video Bootboxing featuring Snot Scroller. I've done a similar thing; that is, I made a copy of the track and offset one of them so that two different bars of the audio coincide with each other as closely as possible. So, this is a double door slam that we're seeing, and here's how the other one sounds. There's an immediate similarity between the two waveforms, but as we zoom in, this time we see that the similarity holds really well.

So remember that because these are stereo tracks, you should compare the top two and the bottom two. So, again we have a very good agreement still; this distinctive 3-3 peak here is repeated in this one. So, as we see here, the waveform appears to be made up of individual points. Each of these points is one sample; confusingly, sample in this context doesn't mean a short audio clip, but the smallest unit that a digital waveform is made up of. A sample in an audio file is analogous to a pixel in an image file. In this example, there are 44,100 of these samples for every second of playback, so this is an absolutely tiny slice of audio we’re looking at, and the waveforms still agree.
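To put "absolutely tiny" into numbers, the arithmetic can be sketched as below. The 44,100 samples-per-second rate comes from the text; the helper name and the 100-sample window are assumptions chosen only for illustration, since the transcript does not say how many samples were visible on screen.

```python
SAMPLE_RATE = 44_100  # samples per second, as stated in the text


def samples_to_ms(n_samples, rate=SAMPLE_RATE):
    """Duration in milliseconds covered by n_samples at the given sample rate."""
    return n_samples / rate * 1000.0


print(samples_to_ms(1))       # one sample: roughly 0.023 ms
print(samples_to_ms(100))     # a hypothetical 100-sample window: a little over 2 ms
print(samples_to_ms(44_100))  # one full second: 1000.0 ms
```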

So down to the sample level, we have a really high agreement still between the two waveforms. Here, I've done the same with the audio from the track Techno Jeep, so I'm focusing on this section here. Here's how the first track sounds. Again, we can see the waveforms look very similar to start with, so let's focus on the first of these slams here. So you see the similarities are really high still; just focus on this little peak here. Again, even down to the sample level, the agreement is very, very high between the two tracks. This shows beyond doubt that the slamming door sounds in these two videos are sequenced audio clips and not the recording of a human performance.
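The visual comparison above can also be expressed numerically. The sketch below is not the analysis performed in the video, which was done by eye in Audacity; it is a hedged illustration that uses synthetic decaying-noise bursts as stand-ins for recorded hits. The function names and the burst shape are assumptions, and real repeated performances (such as the finger clicks) would resemble each other far more than two random bursts do.

```python
import math
import random


def make_click(seed, length=400):
    """Synthetic stand-in for one recorded hit: exponentially decaying noise."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) * math.exp(-i / 80.0) for i in range(length)]


def correlation(a, b):
    """Normalized correlation of two equal-length sample lists (1.0 = same shape)."""
    if len(a) != len(b):
        raise ValueError("segments must be the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def max_difference(a, b):
    """Largest sample-by-sample difference between two aligned segments."""
    return max(abs(x - y) for x, y in zip(a, b))


take_one = make_click(seed=1)
take_two = make_click(seed=2)  # a separate "performance" of the same gesture
replayed = list(take_one)      # the same clip triggered a second time

print("two performances:", correlation(take_one, take_two), max_difference(take_one, take_two))
print("one clip replayed:", correlation(take_one, replayed), max_difference(take_one, replayed))
```

A replayed clip matches itself sample for sample, whereas two separate performances diverge once you look closely. In audio taken from an uploaded video, lossy encoding would likely keep the match from being bit-exact, so a very high correlation rather than a zero difference is the realistic thing to look for.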
