Scientific polling introduction
In this video, we're going to think about what makes a poll or a survey credible. Because remember, the whole reason why we're going to do a poll or survey is we want to understand public opinion. But if it's not statistically credible, if we can't believe what it's saying, or if we don't understand exactly what it's saying, then we might not have a good sense of what public opinion actually is.
So the first really important thing, if you're running any type of survey, whether it's in politics or government or not, is that you are taking a random sample. What does that mean? Well, we go into a lot more depth on it in our statistics content on Khan Academy, but it means you start from the population whose opinion you care about, say, the voting population or the people in an area. It does get a little bit nuanced and tricky deciding who counts as a likely voter, but the key idea is that every one of those people should have an equal shot of being selected for your survey.
So a good example of a random sample is when you have a random phone number generator for likely voters in your district, and you call them up. Even then, you might say, "Well hey, maybe certain types of people pick up the phone and certain people don't," or "certain people have a phone and certain people don't." So you have to be very careful about this design. But an example of a non-random sample would be to go hang outside of the Democratic Party headquarters in your district and just survey people coming out of that building. That would not be random at all, and you would get very skewed results.
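If it helps to see the idea in code, here's a minimal sketch of a simple random sample, using a made-up list of phone numbers standing in for the likely voters in a district; a real pollster would of course work from actual voter files.

```python
import random

# Made-up phone numbers standing in for the likely voters in a district
# (purely for illustration).
likely_voter_phones = [f"555-01{i:02d}" for i in range(100)]

# A simple random sample: every number on the list has an equal
# chance of being drawn, which is the key property we want.
sample = random.sample(likely_voter_phones, k=10)
print(sample)
```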
Now, the next thing that you would want when you are taking a survey is a large sample size. Typically speaking, it's going to be at least 500 folks, and you're going to see a lot of surveys with about 500 to 1,000 participants. Sometimes in statistics, they'll say your n, which is the number of people you surveyed, is 500 to a thousand. This is so you have a good chance of getting close to the true public opinion. We're going to talk in a second about the margin of error, but the larger your sample is, especially if it's a true random sample, the smaller that margin of error gets.
I'll explain in depth what a margin of error is in a second. Now, another key thing is whatever you ask in your survey, the language needs to be as neutral and unbiased as possible. Let's say there's a new proposition. You could get very different results if the wording of the question is, "Do you support funds for those in need?" or if you said, "Do you support funds for those who are not working?" Those could get very, very different results. There's really an art to trying to get a neutral, unbiased question.
Last but not least, you want to make it very transparent to the public how you conducted your poll so that they can decide for themselves how credible your poll actually is. With that out of the way, let's look at real poll results, and then I'm going to dig into what the idea of margin of error is because you'll see this a lot when you read the news.
This is from the Sun Sentinel in Florida: they have a senatorial race going on, and it says, "Despite millions of dollars in television ad spending, Florida's U.S. Senate race between Bill Nelson and Rick Scott hasn't budged. A Florida Atlantic University poll shows Republican Scott with 44 percent of voters surveyed, Democrat Nelson with 40."
So the first thing you might say is, "Well, what about this? It only adds up to 84 percent. What about the other 16?" Well, those could be undecided voters, or maybe there's a third candidate there. That's a four-point advantage in Scott's favor, but it's within the survey's margin of error, which means the race could be tied or Scott could have a lead. The FAU Business and Economics Polling Initiative survey of 800 Florida registered voters was conducted online and through automated calls to people with landline telephones. Researchers said it had a margin of error of plus or minus three percentage points.
So let's think about whether this is a credible poll and what inferences we can make from it. The first thing is, were they transparent about their methods? It seems like they were. This whole paragraph right over here, they're very transparent about their methods. How many people did they survey? They say how they actually conducted the survey, and so check on transparency. Then, because they were transparent, we can check whether these other things are true.
First, random sample: they don't go into a lot of depth on how it was conducted online, but if it is a neutral site where any Florida voter is equally likely to show up, well then that might be a good random sample. Now, you have to be careful with online because who has access to online, who does not have access to online, who might go to whatever site it is being conducted on. So this has a couple of question marks right over here.
Then they say through automated calls to people with landline telephones. So once again, this seems reasonably random. It's along the lines that I talked about earlier. Maybe they had some random phone number picker in Florida and they called those folks. But maybe, you know, there are certain people who have landlines. A lot of people now only have mobile phones. There are certain people who pick up and might answer things and certain people who might not. So, as you can see, there's an art to getting a truly random sample, but you can tell that the Florida Atlantic University group tried to get a random sample.
The next thing was the large sample size. Well, they said a survey of 800 Florida registered voters, so that's pretty good, and it's not just 800 for 800's sake; it's the sample size that drives the margin of error. So this is a good time to ask, what is the margin of error? A margin of error is going to be associated with a confidence level, and it's typically a 95 percent confidence level if they don't tell you otherwise.
And so what this means is that 95 percent of the time, when you take a random sample of 800 Florida voters like this one, you will get a result that is within three percentage points of the true result. Remember, there's some true result we can't know unless we could perfectly get into everyone's head, and this poll is a way of trying to estimate it.
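The article doesn't say exactly how FAU calculated its plus or minus three points, but a common back-of-the-envelope formula for the 95 percent margin of error is 1.96 times the square root of p(1 - p)/n, with p set to 0.5 as the worst case. Here's a rough sketch of that approximation, which also shows how a bigger sample shrinks the margin:

```python
import math

# A common approximation for the 95% margin of error of a poll:
# MOE ≈ z * sqrt(p * (1 - p) / n), with z ≈ 1.96 and p = 0.5 (worst case).
# Real pollsters often weight responses, so their published figure can differ.
def margin_of_error(n, p=0.5, z=1.96):
    return z * math.sqrt(p * (1 - p) / n)

for n in (500, 800, 1000):
    print(n, f"{100 * margin_of_error(n):.1f} points")
# 500 -> about 4.4 points, 800 -> about 3.5, 1000 -> about 3.1,
# so a larger sample means a smaller margin of error.
```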
So one way to think about being within three percentage points of what you got: Republican Scott's 44 percent really sits inside a confidence interval that runs from three points below it, which is 41 percent, all the way to three points above it, which is 47 percent. One way to think about it is that there's a 95 percent chance this interval contains the true result. You might say, "Hey, well the low end of this range is still higher than Democrat Nelson." But that same margin of error applies to Democrat Nelson as well.
So Democrat Nelson's confidence interval would run from 37 percent, three points below his 40, up to 43 percent, three points above it. There's still a reasonable chance the Democrat is at the higher end of his range and the Republican is at the lower end of his, which means the Democrat could still win, even though the headline numbers show a four-point advantage for the Republican.
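Just to make those two intervals concrete, here's a tiny sketch using the numbers from the article:

```python
# Reported shares and margin of error from the article.
scott, nelson, moe = 0.44, 0.40, 0.03

scott_interval = (scott - moe, scott + moe)     # roughly (0.41, 0.47)
nelson_interval = (nelson - moe, nelson + moe)  # roughly (0.37, 0.43)

# The intervals overlap between 41% and 43%, which is why the article
# says the race could be tied or Scott could have a lead.
overlap = scott_interval[0] <= nelson_interval[1]
print(scott_interval, nelson_interval, overlap)
```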
Outside of the statistical error, the margin of error we're talking about, you have to remember that this poll is before the actual election. People change their minds; there's a whole campaign going on. So even if you were to get the exact result, this is just a snapshot in time. It could change on election day.
Now, this last question: did they ask a neutral, unbiased question? We don't know exactly, at least just from this article. They might have published it in the details of the poll, but if they asked something like, "Who will you vote for in the election on this date?" that would be very neutral and unbiased. But if they said, "Are you going to vote for the public servant who has served Florida for many years, or that Democrat guy?" well, that would be anything but unbiased.