This lesson explores what a chi-square test is and when it is appropriate to use it. Using a simple example, we will work on understanding the formula and how to calculate the p-value.
Definitions Involved in the Chi-Square Test
I’ve been reading a lot about undercover officers lately, and it made me start wondering how many police officers work undercover versus how many apply to be in the program. Not everyone who applies can work undercover, because they may not fit a need or their scores on psychological tests just don’t measure up.
If the numbers were really close between those who applied and those who got in, we would need to know whether there is a statistically significant difference. Statistically significant means the difference in the results is unlikely to have occurred by random chance alone.
Significance is almost always reported with a lowercase p, which stands for probability. If you have read any psychological research articles, you may have seen p < .05, which means that results like these would turn up by chance alone fewer than 1 time in 20. This has been the conventional cutoff for quite a while. We’ll get into how you figure it out for a chi-square in just a moment.
What we need is a statistical test that can handle categorical data, such as counts of those who made it into the undercover program and those who did not.
That test is the chi-square, a statistical test used to compare the data we expected with the data we actually collected. A chi-square tells us whether there is a large difference between the collected numbers and the expected numbers. If the difference is large, it suggests that something may be causing a significant change. A sufficiently large difference allows us to reject the null hypothesis, which is the prediction that there is no real difference or relationship between the variables.
Basically, if there is a big enough difference between the observed and expected counts, then we can say something significant happened. If the counts are too close, then we cannot reject the null hypothesis, and we treat them as basically the same.
Statistics
The formula for running a chi-square is actually very simple:
χ² = Σ (o - e)² / e
For each category, you take your observed data (o) and subtract what you expected (e). You square the result and divide by the expected value (e) for that category. You then add up the results across all the categories (that is what the Σ means).
To use the number we find, we refer to the degrees of freedom, usually labeled df for short, which is defined for this chi-square as the number of categories minus 1.
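To make the arithmetic concrete, here is a minimal sketch in Python. The counts are invented purely for illustration (they are not real figures about undercover programs), and the expected counts assume a null hypothesis in which applicants are equally likely to fall into either category.

```python
# Hypothetical counts, invented for illustration only.
observed = {"accepted": 30, "not accepted": 70}
# Assumed null hypothesis: 100 applicants split evenly between the categories.
expected = {"accepted": 50, "not accepted": 50}


def chi_square_statistic(observed, expected):
    """Add up (o - e)^2 / e over every category."""
    return sum(
        (observed[category] - expected[category]) ** 2 / expected[category]
        for category in observed
    )


chi_square = chi_square_statistic(observed, expected)
df = len(observed) - 1  # number of categories minus 1

print(chi_square)  # (30-50)^2/50 + (70-50)^2/50 = 8 + 8 = 16.0
print(df)          # 1
```

Each category contributes its own (o - e)² / e term, so a category that is far from its expected count pushes the total up quickly.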
For a chi-square test on a single set of categories like this one, you will always use the number of categories minus 1 to find the degrees of freedom. The reason is that once the total count is fixed, the last category is no longer free to vary: if you know the counts in all the other categories, the final one is determined.
You will then locate a chi-square distribution table, which is found in almost every statistics textbook. Using your degrees of freedom, you will locate the p-value you’re interested in using the process below; typically the p-value is .05. If your chi-square value is larger than the value listed in the table, your result is significant at that level. If you can, see whether your number also exceeds the value listed under .01, which means that your results could only happen by chance 1 in 100 times.
Because of copyright restrictions, we won’t be able to provide a full image of the chi-square distribution table, but below is basically what they look like and how you find the value you’re looking for.
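The table lookup can also be sketched in Python. The small dictionary below holds a few standard critical values of the chi-square distribution; the function names and the chi-square value of 4.5 are hypothetical, chosen only for this example. For the special case of df = 1, the p-value can be computed exactly from the complementary error function, which makes a handy check against the table.

```python
import math

# A few standard chi-square critical values: df -> {significance level: value}.
CRITICAL_VALUES = {
    1: {0.05: 3.841, 0.01: 6.635},
    2: {0.05: 5.991, 0.01: 9.210},
    3: {0.05: 7.815, 0.01: 11.345},
}


def is_significant(chi_square, df, alpha=0.05):
    """Return True when the statistic exceeds the table's critical value."""
    return chi_square > CRITICAL_VALUES[df][alpha]


def p_value_df1(chi_square):
    """Exact p-value for df = 1 only: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(chi_square / 2))


chi_square = 4.5  # hypothetical statistic with df = 1

print(is_significant(chi_square, df=1, alpha=0.05))  # True, since 4.5 > 3.841
print(is_significant(chi_square, df=1, alpha=0.01))  # False, since 4.5 < 6.635
print(p_value_df1(chi_square))  # falls between .01 and .05
```

In this sketch the result clears the .05 cutoff but not the stricter .01 cutoff, which is exactly the kind of comparison the printed table lets you make by eye.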
When rate, or the observation, is about
