Polls are an irritating fact of political life, much maligned and badly misunderstood. They also seem to be a source of recurrent P&CE questions and disputes.
This thread is meant to address most of those questions, answering the basics and gathering the more sophisticated inquiries into one useful resource.
Many of these issues have been addressed in a number of threads over the years by our various knowledgeable members, but those answers are scattered rather than stickied in one place. Feel free to add to this thread any insights or knowledge you wish.
The underlying statistical principle behind polling is the idea of sampling.
Suppose you have a lot of objects that fall into a few categories. By way of example, suppose you have a bin of ping-pong balls, some red, some blue, and some libertarian -- I mean orange.
You wish to know what proportion of the balls are which color. Why? Because you're a statistician and this is what your life has come to -- I mean, because in the land where math problems are invented that's what people do for fun.
There is a perfect method by which to determine this.
Take out the balls one by one and mark down what color each one is. When you are done you will have an exact determination of which color got the most votes -- I mean, what the proportion of each color is.
For example, if there were 1000 balls, 300 red, 650 blue and 50 orange, then the results would be:
Red 30%
Blue 65%
Orange 5%.
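For anyone who wants to see the count-everything method spelled out, here is a minimal Python sketch. The function name `census_proportions` and the list `bin_of_balls` are made up for illustration; the numbers are the ones from the example above.

```python
from collections import Counter

def census_proportions(balls):
    """Return the exact share of each color by counting every single ball."""
    counts = Counter(balls)
    total = len(balls)
    return {color: count / total for color, count in counts.items()}

# The bin from the example: 300 red, 650 blue, 50 orange.
bin_of_balls = ["red"] * 300 + ["blue"] * 650 + ["orange"] * 50

for color, share in census_proportions(bin_of_balls).items():
    print(f"{color}: {share:.0%}")
```

This is exact, but it touches every ball, which is the part that stops being practical as the bin grows.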
Counting a thousand balls is an annoying task. Imagine if you had 350 million ping-pong balls. It would be ridiculous to try to count them all.
Suppose however you know the balls are mixed together reasonably well, so that when you pull a ball out your odds of pulling a ball of a particular color are about the same as the proportion of that color in the whole mix.
One of the fundamental theorems of probability and statistics is called the "law of large numbers" (popularly, and rather loosely, called the law of averages). For this particular situation, the law says that the bigger the sample you take, the more likely the proportions in the sample are to be very close to the proportions in the entire set of objects you are sampling from.
So, in short, if we can take a sample that is sufficiently large our measurements of the sample are likely to be an accurate reflection of what measuring the whole would be like.
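The effect of sample size is easy to watch in a quick simulation. The sketch below is only an illustration under my own assumptions: a well-mixed bin of 100,000 balls in the same 30/65/5 proportions, a fixed random seed so the run is repeatable, and a made-up helper called `sample_share`.

```python
import random

def sample_share(population, color, sample_size, rng):
    """Draw sample_size balls at random (no ball drawn twice) and
    return the share of the sample that has the given color."""
    sample = rng.sample(population, sample_size)
    return sample.count(color) / sample_size

# Assumed bin: 100,000 balls, 30% red, 65% blue, 5% orange.
population = ["red"] * 30_000 + ["blue"] * 65_000 + ["orange"] * 5_000
rng = random.Random(2012)  # fixed seed, chosen arbitrarily

for n in (10, 100, 1_000, 10_000):
    share = sample_share(population, "blue", n, rng)
    print(f"sample of {n:>6}: blue = {share:.1%} (true value 65.0%)")
```

The small samples tend to bounce around while the large ones tend to settle close to 65%, though any single run can still surprise you.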
Well, maybe.
First, there's the problem of mixing. If all the orange balls are on top and we scoop from the top we'll end up with a disproportionate number of oranges.
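Here is a small sketch of the mixing problem, assuming the worst case where all 50 orange balls sit on top of the 1000-ball bin. The names `unmixed_bin`, `scoop_from_top` and `random_draw` are invented for the example.

```python
import random

# Assumed unmixed bin: the 50 orange balls are on top, then red, then blue.
unmixed_bin = ["orange"] * 50 + ["red"] * 300 + ["blue"] * 650

def orange_share(sample):
    """Share of the sample that is orange."""
    return sample.count("orange") / len(sample)

# Scooping from the top: just take the first 100 balls.
scoop_from_top = unmixed_bin[:100]

# A properly random draw of 100 balls from anywhere in the bin.
rng = random.Random(0)  # fixed seed, chosen arbitrarily
random_draw = rng.sample(unmixed_bin, 100)

print(f"scoop from top: {orange_share(scoop_from_top):.0%} orange")
print(f"random draw:    {orange_share(random_draw):.0%} orange")
print("true value:     5% orange")
```

The scoop reports half the bin as orange, ten times the real figure, and taking a bigger scoop from the same spot would not fix it. Only drawing from the whole bin does.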
Second, there's the problem that you need a large enough sample. If it's too small your results can easily be skewed by sheer bad luck.
There are statistical methods to discern how good your sample size is and these are used in determining margin of error.
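One of those methods can be sketched in a few lines. This uses the standard textbook approximation for a simple random sample, which is my assumption here and not necessarily what any particular pollster does: margin = z * sqrt(p * (1 - p) / n), with z = 1.96 for the conventional 95% confidence level and p = 0.5 as the worst case. The function name `margin_of_error` is made up.

```python
import math

def margin_of_error(sample_size, share=0.5, z=1.96):
    """Approximate margin of error for a simple random sample.

    share=0.5 is the worst case; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(share * (1 - share) / sample_size)

for n in (100, 400, 1000, 2500):
    print(f"n = {n:>4}: +/- {margin_of_error(n):.1%}")
```

Note the square root: to cut the margin in half you need four times as many balls, which is why samples of around a thousand (roughly +/- 3%) are so common and much larger ones are rare.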
Third, there's the fact that even if you do get a decent sample size the law of large numbers says it's unlikely, but not impossible, that you'll get a weird result. That's why sampling once is not enough.
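Margin of error and the law of large numbers are both critical in understanding any sampling result. If a result says 50% +/- 3%, it means the true answer most likely lies somewhere between 47% and 53%. Pollsters conventionally quote the margin at a 95% confidence level, meaning about 19 out of 20 samples taken the same way would land within that margin of the true value.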
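To get a feel for what "most likely" means, you can run the same poll many times on a computer. The sketch below rests on my own assumptions: the true value really is 50%, each poll draws 1000 independent responses, and the helper `share_within_margin` is a made-up name. Margins are conventionally quoted at 95% confidence, so roughly 19 polls in 20 should land inside the range and the rest outside.

```python
import random

def share_within_margin(true_share, sample_size, margin, polls, rng):
    """Run `polls` simulated polls and return the fraction whose result
    lands within `margin` of the true share."""
    inside = 0
    for _ in range(polls):
        # Each respondent says "yes" with probability true_share.
        yes = sum(rng.random() < true_share for _ in range(sample_size))
        result = yes / sample_size
        if abs(result - true_share) <= margin + 1e-9:  # tolerance for float rounding
            inside += 1
    return inside / polls

rng = random.Random(47)  # fixed seed, chosen arbitrarily
fraction = share_within_margin(0.5, 1000, 0.03, 1000, rng)
print(f"{fraction:.1%} of simulated polls fell between 47% and 53%")
```

The handful that fall outside are not mistakes in the polling. They are the unlikely-but-not-impossible weird results that honest sampling produces now and then.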
Suppose you got the following results in five samplings with a margin of error of 5%:
45%, 39%, 60%, 44%, 42%.
The best guess is that the 60% is an aberration and the others are all within the same range, so the true value is probably somewhere in the low 40s.
But suppose you saw this as a graph. It would have a big dip between the first and second values, then a jump up to 60, then a fall back down. The temptation would be to tell a story of massive volatility, when in fact four of the five results fit neatly within a single margin of error and the fifth is the sort of outlier that sampling occasionally throws up.
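The same reasoning can be written down as a short check on the five results. The rule used here, comparing each result to the median and flagging anything more than one margin of error away, is just my illustrative assumption and not a formal statistical test.

```python
import statistics

results = [45, 39, 60, 44, 42]  # the five samplings, in percent
margin = 5                      # stated margin of error, in points

middle = statistics.median(results)

# Split the results into those near the median and those far from it.
consistent = [r for r in results if abs(r - middle) <= margin]
outliers = [r for r in results if abs(r - middle) > margin]

print(f"median: {middle}%")
print(f"consistent with the median: {consistent}")
print(f"likely aberrations: {outliers}")
print(f"average of the consistent results: {statistics.mean(consistent)}%")
```

Four of the five results sit within 5 points of the median of 44%, and averaging those four gives 42.5%. Only the 60% stands apart, which matches the low-40s reading rather than a story about wild swings.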