Want proof that all of this normal distribution talk actually makes sense? Then you’ve come to the right place.
In this lesson, we look at sampling distributions and the idea of the central limit theorem, a basic component of statistics.
What is a Sampling Distribution?
Statisticians sound pretty sure of themselves when talking about the normal distribution. But what makes them so confident that it works? After all, couldn't a sampling distribution, the pattern of results you get across many repeated trials, take some other shape entirely? Look, I understand your skepticism. I'll tell you what: take two dice, roll them, and add the results. If you were a betting woman, I'd bet that if you did that 10 times, you would get more 6s, 7s, and 8s than anything else. Go ahead, you can press the pause button. I'll be here.
Did it work out like I said? Or was I wrong? If I was wrong, go ahead and do the same thing again another 90 times. Trust me, I’ll be waiting.
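If you'd rather not wear out the pause button, here is a small Python sketch, assuming fair six-sided dice. It first lists all 36 equally likely outcomes of two dice to get the exact odds of each total, then simulates 100 rolls (the first 10 plus the extra 90). The function names are hypothetical, chosen for illustration.

```python
from collections import Counter
from itertools import product
import random

def exact_two_dice_counts():
    """Count how many of the 36 equally likely (die 1, die 2) outcomes give each total."""
    return Counter(a + b for a, b in product(range(1, 7), repeat=2))

def simulate_two_dice(rolls, seed=None):
    """Roll two fair dice `rolls` times and tally how often each total appears."""
    rng = random.Random(seed)
    return Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(rolls))

exact = exact_two_dice_counts()
for total in range(2, 13):
    print(f"{total:2d}: {exact[total]}/36")

# 100 simulated rolls: 6s, 7s, and 8s usually dominate, though a short run can wobble.
print(simulate_two_dice(100, seed=1).most_common(3))
```

The exact counts run 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1 for totals 2 through 12, so 7 is the single most likely total, with 6 and 8 close behind.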
Why Use a Normal Distribution?
Countless statistics students have expressed the same doubt that some of you may have. Countless statisticians have since run millions, even billions, of those trials on computers. What keeps showing up is the normal distribution, a roughly bell-shaped distribution that occurs over and over throughout populations and samples. Simply put, when something is staring you in the face, statisticians tend not to ignore it, especially when it's as useful as the normal distribution.
The Central Limit Theorem
On many graphs of normal distributions, you'll see a line running right through the middle, at the highest point of the curve. That line marks the mean, which for a normal distribution is also the median and the most common value. The reason this bell shape keeps turning up is a result called the central limit theorem. It states that if you add up (or average) enough independent random values, the distribution of that total will be approximately normal, even when the individual values are not; a single die, for instance, is equally likely to land on any face. The more values go into each total, the more closely the final curve resembles a normal distribution. Given enough rolls, the total of two dice already forms a rough, triangular hump centered on 7, while adding four or five dice together on each throw produces a much closer match to the bell curve.
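Here is a minimal Python sketch of the dice claim, assuming fair six-sided dice. Rather than rolling, it computes the exact probability of every total for n dice by repeated convolution, then measures the largest gap between those probabilities and a normal curve with the same mean (3.5 per die) and variance (35/12 per die). Because the totals are whole numbers one unit apart, the curve's height at each total serves as an approximate probability. The helper names are hypothetical.

```python
import math

def sum_of_dice_distribution(n_dice, sides=6):
    """Exact probability of each total when n fair dice are added, by repeated convolution."""
    dist = {0: 1.0}
    for _ in range(n_dice):
        new = {}
        for total, p in dist.items():
            for face in range(1, sides + 1):
                new[total + face] = new.get(total + face, 0.0) + p / sides
        dist = new
    return dist

def normal_pdf(x, mean, sd):
    """Height of the normal curve with the given mean and standard deviation at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def max_gap_from_normal(n_dice, sides=6):
    """Largest absolute difference between the exact probabilities and a matching normal curve."""
    mean = n_dice * (sides + 1) / 2          # each die averages (sides + 1) / 2
    var = n_dice * (sides ** 2 - 1) / 12     # each die has variance (sides^2 - 1) / 12
    sd = math.sqrt(var)
    dist = sum_of_dice_distribution(n_dice, sides)
    return max(abs(p - normal_pdf(t, mean, sd)) for t, p in dist.items())

for n in (1, 2, 4, 5):
    print(f"{n} dice: largest gap from the normal curve = {max_gap_from_normal(n):.4f}")
```

The gap shrinks as more dice are added to each total: one die (a flat distribution) sits far from the curve, two dice come much closer, and five dice closer still.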
Protecting the Central Limit Theorem
Of course, such a regular prediction of data is only useful as long as we can protect it from corruption.
Everything must be random. If you were using a set of loaded dice, then chances are your graph looks quite different from mine. The same goes if you weren't making sure that each roll of the dice was an honest, random throw.
Across a larger population, we can't always double-check every input to make sure it was free of influence from other data. However, we can still watch for cases that would corrupt the data. Let's say that you had a class of students that was normally distributed in height and randomly selected from the student body at large. If you were going to expand that class from 30 to 40, the data set only maintains its integrity if the new students are also drawn at random from the student body. If, on the other hand, your class is suddenly flooded with members of the basketball team who chose to take this particular class, the result could change. As basketball players are generally taller than the rest of the student body, your class would no longer be a random sample, and its heights would skew toward the tall end.
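To see how the basketball scenario distorts the class, here is a Python sketch using made-up numbers. It assumes the student body's heights are roughly normal with a mean of 170 cm and a standard deviation of 10 cm, and that basketball players average 195 cm with a standard deviation of 7 cm. These figures are hypothetical, chosen only for illustration, not real data.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so reruns give the same numbers

# Hypothetical populations (illustrative numbers, not real data).
student_body = [rng.gauss(170, 10) for _ in range(5000)]
basketball_team = [rng.gauss(195, 7) for _ in range(15)]

# Draw 40 distinct students at random; the first 30 form the original class.
random_draw = rng.sample(student_body, 40)
original_class = random_draw[:30]

# Fair expansion: the original 30 plus 10 more randomly chosen students.
fair_expansion = random_draw
# Biased expansion: the original 30 plus 10 basketball players who opted in.
biased_expansion = original_class + rng.sample(basketball_team, 10)

for label, group in [("original 30", original_class),
                     ("fair 40", fair_expansion),
                     ("biased 40", biased_expansion)]:
    print(f"{label:>11}: mean {statistics.mean(group):.1f} cm, "
          f"sd {statistics.stdev(group):.1f} cm")
```

The fair expansion should stay close to the original class, while the biased one shifts upward by several centimeters and gains a cluster at the tall end, so it no longer reflects the student body.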
Calculating the Z-Score
Why do we care that data sets so often end up following a normal distribution? In short, because it makes the data much easier to understand.
If a data set followed some other distribution, we might have to resort to calculus to work out its probabilities, often from scratch for every new set of numbers. With a normal distribution, we simply find the Z-score, a measure of distance from the mean in number of standard deviations, Z = (x − mean) / standard deviation, and then check a pre-made table to find the relevant percentages.
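Here is a minimal Python sketch of that calculation. The standard library's `math.erf` stands in for the pre-made table: the fraction of a normal distribution lying below a given Z-score is 0.5 × (1 + erf(Z / √2)). The test score, mean, and standard deviation in the example are hypothetical.

```python
import math

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (positive) or below (negative) the mean."""
    return (x - mean) / sd

def standard_normal_cdf(z):
    """Fraction of a normal distribution below z standard deviations: the value a Z-table lists."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical example: a score of 85 on a test with mean 70 and standard deviation 10.
z = z_score(85, 70, 10)
print(z)                                 # 1.5
print(round(standard_normal_cdf(z), 4))  # 0.9332
```

So a score of 85 sits 1.5 standard deviations above the mean, higher than roughly 93% of scores, assuming the scores really are normally distributed.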
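In this lesson, we examined the concepts of a sampling distribution and the central limit theorem. A sampling distribution is the distribution of a result, such as a sum or an average, across many repeated samples, which you can see by plotting how often each value occurs on a chart.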
As the number of values combined in each result grows, these distributions tend to mirror the normal distribution, a roughly bell-shaped distribution that occurs over and over throughout populations and samples. The reason is the central limit theorem, which states that sums or averages of many independent random values are approximately normally distributed, and more closely so the more values go into each one. However, this only holds if each new data point is random, as our example of flooding a normally distributed class with basketball players demonstrated.
With a normal distribution, we can avoid calculus: we find the Z-score, a measure of distance from the mean in number of standard deviations, and then check a pre-made table to find the relevant percentages.
sampling distribution: the distribution of a result, such as a sum or average, across many repeated samples, as seen when plotted on a chart
normal distribution: a roughly bell-shaped distribution that occurs over and over throughout populations and samples
central limit theorem: states that sums or averages of many independent random values are approximately normally distributed, and more closely so as more values are combined
Z-score: a measure of distance from the mean in number of standard deviations
A clear understanding of this lesson can contribute to your ability to:
- Write the definition of a sampling distribution
- Recognize the practicality of the normal distribution
- State the central limit theorem from memory
- Protect the theorem's results from corruption by non-random data
- Find the Z-score