What Exactly is the Present?
At the 1939 World's Fair in New York, the exciting new tech was the live television broadcast. Roosevelt became the first president to address the nation live on TV. But for years leading up to this event, engineers had been working on one particular technical problem: how to ensure the audio and video remain perfectly synced during a live broadcast. Without this, words and lip movements wouldn't match up, which would be annoying and distracting for viewers.
So, how did they do it? Well actually, they didn't. Instead, they discovered something pretty incredible: We are not very good at discerning whether audio and video are in sync. For example, I intentionally delayed the audio of this entire monologue by nearly a tenth of a second. Did you notice? I'll clap to make it more obvious.
The engineers also found that there's an asymmetry in our tolerance for this misalignment. We don't really notice if a sound lags video by up to 125 milliseconds, but we can tell something is wrong if it's leading the video by more than 45 milliseconds. And to understand why, take a look at this: Here I am bouncing a basketball as I walk away from the camera. The sight and the sound of the bounces match up perfectly. As I walk away, you know the sound must be increasingly delayed because of the extra time it takes to reach the camera, yet it still appears synced. This is because your brain is not reporting each instant to you exactly as it happens, but rather a short interval of time, reorganized to make sense.
So, in this case, your brain automatically aligns the sound with the sight of the bounce. At least, up to a point. Once I'm over 30 meters away, the sound is now delayed by over 100 milliseconds and your brain no longer integrates the information from your eyes and ears. Here, let me play the actual sound of the bounce together with the sound as received by the camera. This explains why sound can lag video by more than it can lead.
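To make the arithmetic behind these figures concrete, here is a minimal Python sketch. It assumes sound travels at about 343 m/s (dry air at roughly 20 °C) and ignores the travel time of light; the function names are hypothetical, and the 125 ms and 45 ms thresholds are simply the ones quoted above.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 degrees C

# Thresholds quoted in the transcript, in milliseconds.
MAX_AUDIO_LAG_MS = 125.0   # audio arriving after the matching video
MAX_AUDIO_LEAD_MS = 45.0   # audio arriving before the matching video


def sound_delay_ms(distance_m: float) -> float:
    """Time for sound to travel distance_m, in milliseconds."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


def distance_for_delay_m(delay_ms: float) -> float:
    """Distance at which sound arrives delay_ms late, in meters."""
    return delay_ms / 1000.0 * SPEED_OF_SOUND_M_PER_S


def is_offset_unnoticed(audio_offset_ms: float) -> bool:
    """Check an audio offset against the asymmetric tolerance window.

    Positive offsets mean the audio lags the video;
    negative offsets mean the audio leads it.
    """
    return -MAX_AUDIO_LEAD_MS <= audio_offset_ms <= MAX_AUDIO_LAG_MS


if __name__ == "__main__":
    for distance in (10, 30, 35):
        print(f"{distance} m -> {sound_delay_ms(distance):.1f} ms delay")
    print(f"100 ms delay -> {distance_for_delay_m(100.0):.1f} m")
```

Under the 343 m/s assumption, a 100-millisecond delay corresponds to a little over 34 meters, which is consistent with the "over 30 meters" figure here.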
I mean, imagine you were at a basketball game, and because of how far away you're sitting, the sound is delayed. Your brain can handle that. But if the sound precedes the sight of an event, that would look really odd because that's something that would never happen in nature.
This is why the broadcast guidelines for acceptable audio and video mismatches are skewed in favor of audio lagging behind the video. Our brains are good at aligning audio with the visuals that preceded it. We can actually exploit our syncing capabilities to produce some strange results.
For example, we created this computer program where, when you press the space bar, a light appears on the screen. But not immediately: there is an 80-millisecond delay between the button push and the light coming on. In a study, participants who familiarized themselves with a similar program came to believe that the light turned on immediately after they pushed the button, just as our brains synchronized the sight and sound of the basketball bounce.
Press the space bar once to begin. This is just the section where you get the idea of what it does, so you push the space bar. Now, watch what happens when you remove the delay. That last one came up without me even pressing anything!
You didn't press anything and it just flashed up there? Right! Some participants were convinced that the light came on before they pushed the button. They believed that something else caused the light to come on, even though it was their action that made it happen.
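One way to picture this illusion is with a toy model in which the brain subtracts whatever delay it has come to expect. The Python sketch below is only an illustration, not the method used in the study: the function name and the `adaptation` parameter (how much of the trained delay gets absorbed) are hypothetical, and full adaptation is assumed by default.

```python
TRAINING_DELAY_MS = 80.0  # delay between key press and light, as quoted above


def perceived_gap_ms(actual_delay_ms: float,
                     trained_delay_ms: float,
                     adaptation: float = 1.0) -> float:
    """Toy model of the perceived time from key press to light.

    The expected delay (trained_delay_ms scaled by adaptation) is
    subtracted from the real delay. A negative result means the light
    seems to come on BEFORE the key press.
    """
    if not 0.0 <= adaptation <= 1.0:
        raise ValueError("adaptation must be between 0 and 1")
    return actual_delay_ms - adaptation * trained_delay_ms


if __name__ == "__main__":
    # While the delay is in place, the light feels immediate.
    print(perceived_gap_ms(TRAINING_DELAY_MS, TRAINING_DELAY_MS))
    # Once the delay is removed, the perceived gap turns negative.
    print(perceived_gap_ms(0.0, TRAINING_DELAY_MS))
```

In this simplified picture, removing the delay after training makes the perceived gap negative, which matches the feeling that the light came on before the button was pushed.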