Can You Recover Sound From Images?
This video was filmed without sound. Is it possible to use only these images to reconstruct the sound? That is, can you hear pictures? In this video, I'm gonna try to demonstrate that it's possible to get sound from pictures, but it's not gonna be easy, so I'm gonna need some help. This episode was sponsored by LastPass, which allowed me to fly to the Bay Area to meet with one of my science heroes, so now on with the video.
"How're you doing? I'm sorry; my place of my concern is like threw all this crap over here. So this is the experiment that I came up with."
"It's like a crumpled up ball of tinfoil?"
"Yeah, like if I had a more powerful camera and I had like the right lens, then you know we could do something that looks more like you're spying on somebody. But we should be able to demonstrate that you can recover like a rhythm or a sound from, you know, whatever camera you have."
Now you might think it would be easy to record sound in video because, after all, sound is just vibrations, so the air is vibrating back and forth and everything it hits should vibrate back and forth too. So you'd think all we need to do is video that motion and plot displacement versus time and then recover the sound. But it's not that simple because, for one thing, I mean these sound vibrations are incredibly tiny. They move objects only about one micrometer, and even if you're super zoomed in, that is way less than a pixel. We're talking a hundredth or a thousandth of a pixel.
"We're not seeing something that is at one pixel move to an adjacent pixel. You're seeing one pixel get slightly darker and the next pixel get slightly brighter."
"What objects work best for recovering sound?"
"So the things that work best are things that have a lot of damping but are also very light so that they move very readily with changes in the air pressure. So what are some good examples?"
"Well, like a bag of chips. You know the initial experiments were very like contrived in a way. You know we had these objects on optical benches; we were blasting light at them; the sounds were like as loud as we could make them. Mary had a little lamb, little lamb, little lamb. I figured we'd try to do a rhythm. This is... shave and a haircut. Let's put that camera on a tripod."
"Oh yeah, that's, I mean that'd be great."
"All right, let's give it a shot. This is the actual clip I recorded, and I want you to notice two things: first, you can't really see much motion; and second, there are plenty of pixels getting dimmer and brighter because of image noise. I mean it's not a pristine, perfect image so how do you tell the difference between pixels getting brighter and dimmer due to tiny movements versus noise? Essentially, you look for edges in the image and then you say, well if the object moves by some fraction of a fraction of a pixel in one particular direction, then pixels on one side of that edge will get a little bit brighter, pixels on the other side of that edge will get a little bit darker. And so basically what we do is we sum together all the ones that are supposed to get brighter and subtract all the ones that are supposed to get darker, and then that gives us sort of one number, right? And if you track that number over time, then that gives you an estimate of the displacement over time. This is time but in samples, and then this is position."
"Hmm, so what do you do now?"
"Well, we're gonna try to do some filtering on that."
"It's clipping."
"You don't say."
"I mean it's not much... it's not much but you can recognize two of the beats. This is what we've recovered from a hundred and eighty frames per second, which isn't really a whole lot within the range of like audible frequencies, so that's why, you know, kind of the most we can hope for here is a bit of a rhythm."
Now, of course, the main limiting factor is framerate because we can hear sounds from 20 Hertz to 20,000 Hertz, but most cameras only shoot 30 frames per second, so they miss virtually all of these sound frequencies. Imagine this is the motion created by a 30 Hertz sound; if you try to capture this with a camera at 30 frames per second, you would end up seeing the object always in the same position because it's at the same point in the wave cycle.
So your perception would be that the object is not moving at all. So in order to measure a frequency of sound, you actually need to sample at least twice that frequency, which is why a lot of music is sampled at 44 or 48 kilohertz - that's more than twice as much as the highest sound we can hear. At any rate, if you want to get something more intelligible, you're going to need some higher frame rate camera.
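Here is a small sketch of that sampling argument, assuming an idealized object whose position follows a pure sine wave. The function names and the 0.5 radian starting phase are made up for illustration.

```python
import math

def sample_positions(tone_hz, fps, n_frames=6, phase=0.5):
    """Position of a vibrating object (arbitrary units) at each frame time."""
    return [math.sin(2 * math.pi * tone_hz * n / fps + phase) for n in range(n_frames)]

def highest_recoverable_hz(fps):
    """Upper frequency limit for a given frame rate: half the frame rate."""
    return fps / 2

def apparent_hz(tone_hz, fps):
    """Frequency a tone appears to have once it is sampled at `fps`."""
    return abs(tone_hz - fps * round(tone_hz / fps))

at_30 = sample_positions(30, 30)
at_180 = sample_positions(30, 180)
print("30 Hz at 30 fps: ", [round(p, 3) for p in at_30])
print("30 Hz at 180 fps:", [round(p, 3) for p in at_180])

for fps in (30, 180, 1000):
    print(f"{fps} fps can capture tones up to {highest_recoverable_hz(fps)} Hz")
print("a 440 Hz tone at 180 fps looks like", apparent_hz(440, 180), "Hz")
```

At 30 frames per second every sample of the 30 Hertz motion lands on the same value, so the object looks frozen, while at 180 frames per second the same motion is clearly visible. Tones above half the frame rate do not vanish; they show up disguised as lower frequencies.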
"So we just went to the camera store and picked up a new camera that should be able to shoot a thousand frames per second or thereabouts."
"Is that gonna be enough?"
"It'll be enough for something."
"Hehehe I love that confidence. This is one modulation away from dubstep right here. Just a little bit more of a wump wump, and then we have the next big track."
"I've set it to a thousand frames per second; now we're talking."
"Yeah, okay, okay, so you have the footage there and you're cropping in a bit; tell me about that."
"Well, we're running this on my laptop as opposed to the servers that I had back at MIT, and that is gonna mean that if we run it on the full video, my laptop will crash."
"Okay, so we're gonna crop it and try it on that."
"I can see a little bit of motion."
"Yeah, I mean I think that one question is whether that's like-"
"Resonant motion?"
"Yeah, well in this case would be kind of like the equivalent of a rocking chair; like if the foil has a like a rocking mode, then that's actually not gonna give us a lot of sound information."
"Mm-hmm. Can you tell whether this is gonna work or not?"
"I am optimistic. I think because I know what I'm listening for, I can hear it in there. But yeah, you've gotta be careful though."
"Really? That you're-"
"Well, just gotta be careful that you're not like confirmation bias."
"Sure. Let's try that. That's about 60 Hertz; that seems a little much. Okay, we're gonna try one more time."
"We have basically put the piece of foil on top of the speaker; we're dialing up the volume to... 11."
"Well, I mean it's a shower speaker, so..."
"Cool. So what do you think of that image there?"
"It's beautiful; it's gorgeous. It has been like a couple of years since I've looked at this code, so I suspect I might have forgotten something and I'm using it wrong, but I want to point something out here, which is that here's what we recovered and here is the piano roll of the signal that we sent. This looks like duh duh to me, and that's the duhhh."
"Yeah, yeah, that's it. That is the... ---dih duh dud din duhhhh din dih."
"Yeah, okay, so let's see if we can actually get it to hear that."
"We can see it; we just can't hear it."
"I think I know what it is."
"What is it?"
"I don't think my laptop can play those frequencies. Hold on a second; let me get some headphones."
"We need real speakers or something."
"What do you hear, Abe?"
"Hold on. I hear shave and a haircut, two bits. Here, listen to this."
"All right."
"Here, I'll hold it. Thank you."
"Yeah. And... pitch shift [recovered sound of Shave and a haircut, two bits] yep, there you go, visual microphone."
"[laughter] This was a basic proof of concept, but Abe showed that with more powerful equipment he could recover human speech from outside soundproof glass."
"Have you ever considered that your computer is a physical system that gives off vibrations? Each key on the keyboard produces a unique sound due to its unique location. In fact, research has shown that audio recordings of typing reveal 96% of keystrokes accurately."
Now this portion of the video was sponsored by LastPass, an app that stores all your usernames and passwords so you never have to type them in. And this prevents people from stealing your passwords by, say, recording audio of your keystrokes or just looking over your shoulder. What's even better is the convenience of never having to fill in usernames and passwords or getting locked out again.
With LastPass, you don't have to write, remember, or reset passwords. You get unlimited password storage and free cross-device sync. When you open an app or site on your computer or on an iOS or Android device, LastPass fills in your username and password. This saves you valuable brain space, so put your passwords on autopilot with LastPass.
Now something I particularly like is if you upgrade to LastPass premium, you get advanced multi-factor authentication, and it works like magic. So click the link below to find out more about LastPass, and thanks to LastPass for sponsoring today's show.