In this lesson, we’ll talk about how psychologists use descriptive statistics and inferential statistics in social research. You’ll learn how these statistics differ and why a researcher would use one over the other.

## Statistics & Psychological Research

Did you know that about 50% of statistics are made up on the spot? Kidding! You have probably heard some version of that joke before, but for psychologists and other social scientists, statistical analysis is a powerful tool for research.

Psychologists use statistics for a number of reasons: to find relationships between variables, to identify correlations, and to draw more general conclusions about our society from data.

When psychologists begin a research project, they start with a hypothesis. A hypothesis is a proposed explanation for a particular phenomenon, which the researcher then tests against data to see whether it holds up. To do this, psychologists often use statistics. There are two major types of statistics you should know about: descriptive statistics and inferential statistics. This lesson explores both and explains how researchers choose which one to use in their projects.

## Descriptive Statistics

**Descriptive statistics** describe something in a dataset. Descriptive statistics are useful for asking questions about what is common or typical about a dataset. For example, what is the average household income in the city of Boston? Or, do all students at Harvard have similar SAT scores? Descriptive statistics will give you this type of information. All you’d need to do is look at your data and make some calculations.

You’ll want to remember the three ‘M’s’ of descriptive statistics: mean, median, mode. These are statistics that can easily be calculated from a dataset. Let’s take a closer look at each.

The **mode** is perhaps the easiest to remember, as it simply means the most frequently occurring item. So, in your dataset, you’d count how many times each value occurs, and the one that occurs the most is your mode. Let’s say you have the following set of numbers:

1, 5, 6, 7, 5, 8, 3, 2, 14, 15, 3, 3, 14

Here, the mode is 3 since this number occurs more than any other number in the set.
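If you’d like to check the counting by computer, here is a minimal Python sketch. It uses the same list of numbers as above; the variable names (`values`, `counts`, `mode_value`, `mode_count`) are just illustrative choices, not part of the lesson.

```python
from collections import Counter

# The same numbers used in the mode example above.
values = [1, 5, 6, 7, 5, 8, 3, 2, 14, 15, 3, 3, 14]

# Count how many times each value occurs.
counts = Counter(values)

# most_common(1) returns a list holding the single most frequent
# (value, count) pair.
mode_value, mode_count = counts.most_common(1)[0]

print(f"mode = {mode_value}, occurring {mode_count} times")
```

Python’s built-in `statistics.mode(values)` would give the same answer; the `Counter` version is shown because it makes the counting step visible.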

The **median** means the middle value in a group of data. The median requires a little bit more calculation, but you can do it pretty easily. Let’s say you have a set of numbers: 3, 7, 19, 24, 11, 32, 5. What’s the median?

Remember, the median is just the middle number so all you have to do is put the numbers in order and find the middle: 3, 5, 7, 11, 19, 24, 32. So, start on each side and work your way to the middle: the median in this case is 11 because there are three numbers to the left of the 11 and three numbers to the right of the 11.
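The same two steps, sorting and then picking the middle, can be sketched in Python. This is only an illustration with made-up variable names, and it assumes an odd number of values, as in the example above. With an even number of values there is no single middle item, and the usual convention is to average the two middle values.

```python
import statistics

# The same numbers used in the median example above.
values = [3, 7, 19, 24, 11, 32, 5]

# Step 1: put the numbers in order.
ordered = sorted(values)

# Step 2: pick the middle position (valid here because the count is odd).
middle_index = len(ordered) // 2
median_value = ordered[middle_index]

print(ordered)
print(f"median = {median_value}")

# Cross-check against Python's built-in median function.
assert median_value == statistics.median(values)
```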

The **mean** is another way of saying the average of a set of numbers. So, let’s take the same set of numbers we used for the median. To find the mean, you need to add up all of the numbers and then divide by the number of items you have:

3 + 5 + 7 + 11 + 19 + 24 + 32 = 101

101 / 7 ≈ 14.4

So, the mean, or the average, of this set of numbers is about 14.4.
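Here is the same arithmetic as a short Python sketch, in case you want to verify it yourself. The variable names are illustrative only.

```python
# The same numbers used in the mean example above.
values = [3, 5, 7, 11, 19, 24, 32]

# Add up all of the numbers...
total = sum(values)

# ...then divide by how many numbers there are.
mean_value = total / len(values)

print(f"sum = {total}, count = {len(values)}")
print(f"mean (rounded to one decimal place) = {round(mean_value, 1)}")
```

Notice that the mean (about 14.4) is larger than the median (11) for this set, because the larger values pull the average upward.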

Descriptive statistics are useful for telling us basic information about a dataset. The key takeaway is this: descriptive statistics describe something.

## Inferential Statistics

Now let’s say you’ve described something from your dataset. You know the mean and the mode, for instance, but what if you want to know a bit more? What if you’re interested in drawing a conclusion from your dataset?

In this case, you will need to use **inferential statistics**, which are a bit more complex than descriptive statistics. Inferential statistics allow you to draw conclusions from a dataset. In other words, inferential statistics let you judge whether the findings from your dataset are likely to apply to a larger population.

Before we dig any deeper into inferential statistics, let’s discuss variables. Whenever a psychologist is about to conduct a research study, he or she must first identify the variables, or the things that a study is trying to measure.

There are two major kinds of variables: independent variables and dependent variables. An **independent variable** is something that is not affected by the other variables in the study, but is thought to cause a change in the dependent variable. The **dependent variable** is just that: something that depends on the independent variable.

Let’s gain a better understanding of variables with an example. Say you’re looking for a relationship between parental income and the grades of incoming college freshmen.

The independent variable in this case is parental income, and the dependent variable is the average GPA of college freshmen. Another way of saying this is: the average GPA of incoming freshmen depends on the income of their parents. Think about what would happen if you reversed that: it wouldn’t make much sense, right?

You can use inferential statistics to calculate whether this relationship applies to more than just the people you have in your dataset. You’d probably formulate a hypothesis along the lines of, ‘College freshmen whose parents have higher incomes have higher grades.’ Inferential statistics will let you know if you’re onto something that applies widely.
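To make this a little more concrete, here is a rough Python sketch of one possible inferential approach, a permutation test on a correlation. The lesson does not name a specific method, so this choice is an assumption, and it is only one of many techniques a researcher might use. The income and GPA figures are hypothetical numbers invented purely for illustration; they are not real data. The function name `pearson_r` and all variable names are likewise made up for this example.

```python
import random
import statistics

# HYPOTHETICAL data, invented for illustration only (not real findings).
parent_income = [35, 48, 52, 60, 75, 82, 95, 110, 130, 150]    # thousands of dollars
freshman_gpa = [2.6, 2.9, 2.7, 3.0, 3.1, 3.0, 3.4, 3.3, 3.6, 3.7]


def pearson_r(xs, ys):
    """Pearson correlation: how closely two variables move together (-1 to 1)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / (spread_x * spread_y) ** 0.5


# Descriptive step: the correlation within this particular sample.
observed_r = pearson_r(parent_income, freshman_gpa)

# Inferential step: if income and GPA were unrelated, how often would
# randomly re-pairing them produce a correlation at least this strong?
rng = random.Random(0)  # fixed seed so the run is repeatable
n_shuffles = 5000
at_least_as_strong = 0
for _ in range(n_shuffles):
    shuffled_gpa = freshman_gpa[:]
    rng.shuffle(shuffled_gpa)
    if pearson_r(parent_income, shuffled_gpa) >= observed_r:
        at_least_as_strong += 1

# Add 1 to both counts so the estimated p-value is never exactly zero.
p_value = (at_least_as_strong + 1) / (n_shuffles + 1)

print(f"observed correlation: {observed_r:.2f}")
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value suggests that a relationship this strong would rarely show up by chance alone, which is the kind of evidence that supports generalizing beyond the sample. Keep in mind that a correlation by itself does not show that parental income causes higher grades.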

## Lesson Summary

Psychologists use statistics to identify relationships between variables. Sometimes, you might just want to describe something from data, like the average income in a particular city. **Descriptive statistics** describe something in a dataset and allow you to do some basic calculations to find this information.

However, if you need to know more about your data than some descriptives, you’ll want to use inferential statistics. Remember, **inferential statistics** allow you to draw conclusions from a dataset and generalize them to a bigger population. You also need to remember the basics of gathering statistics in a study, such as the difference between an **independent variable**, something that is not affected by the other variables in the study but is thought to cause a change in the dependent variable, and a **dependent variable**, something that depends on the independent variable. You also need to understand the three ‘M’s’: the **median**, the middle value in a dataset; the **mode**, the most frequently occurring value in the dataset; and the **mean**, the average.