Correlation Analysis: Complete Definition & Example

Correlation analysis is a method for describing the linear relationship between two variables. With it, we can see the pattern in the data and measure how close to linear the relationship is. We say that two variables are associated when a change in one tends to go along with a change in the other.

One thing to note: the correlation coefficient does not tell us whether variable A causes a change in variable B, or whether variable B causes a change in variable A.

It only describes a relationship. We can’t conclude that variable A causes a change in variable B just from correlation analysis. Do not do that!

Let’s see the chart below!

simple detection of correlation analysis

From the chart, you can see that there is a relationship between age and height. The greater the age, the greater the height. Likewise, the lower the age, the lower the height. That’s how correlation works!

It shows you the connection between two different things!

Types of correlation analysis

There are two kinds of relationship in correlation analysis:

1. Positive correlation

A positive correlation is a relationship between two variables in which an increase in one variable goes along with an increase in the other.

It also works the other way: when one variable decreases, the other tends to decrease as well.

For example, when growing fruit, using more fertilizer tends to go along with higher production. For part-time workers paid by the hour, the longer they work, the greater the pay.

2. Negative correlation

A negative correlation is the opposite: a relationship between two variables in which an increase in one variable goes along with a decrease in the other, and vice versa.

For example, when the price of rice keeps rising, people’s purchasing power decreases. The longer a person studies, the fewer mistakes they make.

Using a scatterplot to help you understand the correlation

Correlation analysis always involves two variables that are paired together. Statisticians usually start with a scatterplot to get an initial sense of the data. A scatterplot gives a general picture so that we can see the correlation between the two variables.

The scatterplot also helps check whether there are outliers in the data set. Outliers need to be checked because they can affect the results of both descriptive and inferential analysis.

We need two sets of paired numerical data, one plotted on the horizontal axis and the other on the vertical axis.

Take a look below :

negative correlation example

From the scatterplot, you can see at a glance that price and products sold move in opposite directions. The higher the price, the fewer products are sold. We can conclude that this is a negative correlation.

Now, let’s take a look at a different example.

positive-correlation-example

From the scatterplot above, you’ll see a different pattern from the previous one. The older the kid, the greater the weight. We can conclude that this is a positive correlation.

How to interpret the correlation coefficient?

Basically, the closer the value is to 1, the stronger the positive relationship between the two variables. As it approaches zero, the association between the two variables gets weaker. A negative value means there is a negative correlation.

Negative values are interpreted the same way. The closer to -1, the stronger the negative correlation. The closer to 0, the weaker the negative correlation. The ranges below apply to the size of the coefficient, ignoring its sign:

– 0.00 – 0.19 = Very weak correlation

– 0.20 – 0.39 = Weak correlation

– 0.40 – 0.59 = Moderate correlation

– 0.60 – 0.79 = Strong correlation

– 0.80 – 1.00 = Correlation is very strong
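
The scale above can be turned into a small lookup. This is a minimal sketch, not part of the original article: the function name `describe_strength` is made up for illustration, and it assumes that values falling between the listed ranges (for example 0.195) belong to the lower band.

```python
def describe_strength(r: float) -> str:
    """Map a correlation coefficient to the strength labels listed above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if r == 0:
        return "no correlation"

    size = abs(r)  # strength depends on the size; the sign gives the direction
    if size < 0.20:
        label = "very weak"
    elif size < 0.40:
        label = "weak"
    elif size < 0.60:
        label = "moderate"
    elif size < 0.80:
        label = "strong"
    else:
        label = "very strong"

    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction} correlation"


print(describe_strength(0.96))   # prints: very strong positive correlation
print(describe_strength(-0.66))  # prints: strong negative correlation
```

Taking the absolute value first keeps the strength bands identical for positive and negative coefficients, which matches the interpretation described above.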

Measuring correlation analysis

We can see the pattern and direction of two variables from the scatterplot, but we still need to measure how strong the relationship is. We need a specific number to define the strength of the correlation.

Usually, we use a correlation coefficient to calculate how strongly two variables are associated. Several formulas have been created. This article covers two of them.

1. Pearson Correlation Coefficient

One of the most widely used is the Pearson correlation coefficient. It is often called simply the correlation coefficient. This coefficient is used under the following conditions:

1. The data have an interval or ratio scale.

2. The relationship between the two variables must be linear, meaning the data points generally scatter along a straight line.

3. The data are normally distributed.

Here is the formula :

pearson correlation formula

r = correlation coefficient

x = data values of x data set

y = data values of y data set

\overline{x} = mean of x data set

\overline{y} = mean of y data set

S_{x} = standard deviation of x data

S_{y} = standard deviation of y data
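
Written out with the symbols listed above, the Pearson coefficient is commonly given in the form below. This is a sketch under two assumptions that the list above does not state: n is the number of (x, y) pairs, and S_{x} and S_{y} are sample standard deviations (dividing by n - 1). The second line is an equivalent form that does not depend on that choice.

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \overline{x}}{S_{x}} \right)
    \left( \frac{y_i - \overline{y}}{S_{y}} \right)
  = \frac{\sum_{i=1}^{n} (x_i - \overline{x})(y_i - \overline{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \overline{x})^2 \, \sum_{i=1}^{n} (y_i - \overline{y})^2}}
```
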
Examples of perfect negative correlation

If we get a negative value, we describe the relationship as a negative correlation. If r = -1, it indicates a perfect negative correlation. If r = 0, it indicates that no linear correlation exists. If r = 1, it indicates a perfect positive correlation.

Example :

Let’s use the data from the graph above and find the correlation.

No.   Age (years)   Weight
1     5             39
2     5             42
3     6             43
4     7             44
5     7             42
6     7             46
7     8             48
8     9             52
9     9             53
10    10            55
11    10            60
12    11            61
13    12            62
14    12            62
15    13            60
16    13            63
17    14            64
171464

Answer :

Let’s say x is age and y is weight. We can work out each part of the formula here!

Now, let’s use the Pearson correlation formula!

calculating correlation manually

The correlation value is 0.96.

Based on the correlation value, we can conclude that there is a very strong positive correlation between age and weight. The older someone is, the heavier they tend to be.
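
The manual calculation can be checked with a few lines of Python. This is a minimal sketch using only the standard library: the function name `pearson_r` is made up for illustration, and it uses the sums-of-deviations form of the Pearson formula on the 17 age and weight pairs from the table above.

```python
import math

# Age (years) and weight for the 17 children in the example table.
ages = [5, 5, 6, 7, 7, 7, 8, 9, 9, 10, 10, 11, 12, 12, 13, 13, 14]
weights = [39, 42, 43, 44, 42, 46, 48, 52, 53, 55, 60, 61, 62, 62, 60, 63, 64]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(ages, weights)
print(f"r = {r:.2f}")  # prints: r = 0.96
```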

2. Spearman Correlation

Spearman correlation is a correlation measurement method for data on an ordinal (rank) scale. It is also used when both variables are quantitative but the normality condition is not met.

There are two cases for the Spearman correlation. First, when there are no double ranks (no duplicate data values). Second, when there are double ranks or duplicate data values.

Now, take a look at the steps in using the Spearman correlation test:

  • Rank the data from smallest to largest. If some values are equal, give each of them the average of the ranks they would occupy.
  • Find the difference between the rank of the first variable and the rank of the second variable for each pair.
  • Use the formula that fits the data condition (with or without double ranks).
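
The first two steps above can be sketched in Python. The helper name `average_ranks` and the five pairs of scores are hypothetical, made up only to show how tied values share an average rank; they are not the data from the article’s examples.

```python
def average_ranks(values):
    """Rank from smallest (1) to largest; equal values share their average rank."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first_position = ordered.index(v) + 1  # 1-based rank of the first copy of v
        copies = ordered.count(v)              # how many times v occurs
        ranks.append(first_position + (copies - 1) / 2)
    return ranks


# Hypothetical scores for five students (illustration only).
math_scores = [70, 85, 85, 60, 90]
science_scores = [65, 80, 75, 70, 95]

math_ranks = average_ranks(math_scores)
science_ranks = average_ranks(science_scores)
differences = [m - s for m, s in zip(math_ranks, science_ranks)]

print(math_ranks)     # prints: [2.0, 3.5, 3.5, 1.0, 5.0]
print(science_ranks)  # prints: [1.0, 4.0, 3.0, 2.0, 5.0]
print(differences)    # prints: [1.0, -0.5, 0.5, -1.0, 0.0]
```

The two scores of 85 would occupy ranks 3 and 4, so each gets 3.5. The differences always sum to zero, which is a quick check that the ranking was done consistently.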

1. No double rank/data

Now, let us look at the first case, with no double rank or double data. The formula for the Spearman correlation is:

no double data spearman correlation formula

rs= spearman correlation

di= difference from rank pair …

n = number of rank pairs
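
With the symbols defined above, the usual textbook form of the Spearman formula for data without double ranks is written as follows. The summation index i running from 1 to n is notation assumed here.

```latex
r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n \left( n^{2} - 1 \right)}
```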

We can rank the data from largest to smallest or from smallest to largest before calculating the correlation, depending on the question, as long as both variables are ranked in the same direction.

Now, take a deep breath for the example!

We are examining the math and science marks of ten students. We want to know whether there is a relationship between science score and math score.

example of spearman correlation

Use the formula above and you’ll find this result!

calculation of spearman correlation

Based on the calculation, the correlation value between science score and math score is -0.66.

There is a negative correlation between math score and science score, and by the scale above (0.60 to 0.79) it is a strong relationship. The higher the science score, the lower the math score tends to be.

2. Double rank/data

Sometimes there are duplicate values, and therefore double ranks, in the data for the Spearman correlation test. In that case the formula is different and needs special treatment.

double data spearman formula

Because we have double data or double ranks, we need to apply a correction factor using the following formula.

correction factor of spearman correlation
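
One common textbook form of the tie-corrected Spearman formula and its correction factor is shown below. It is given here as an assumption about the form being used; the symbols T and t are not defined elsewhere in this article. Here t is the number of observations sharing the same rank, and the sum of T runs over every group of tied values within one variable.

```latex
r_s = \frac{\sum x^{2} + \sum y^{2} - \sum d_i^{2}}{2 \sqrt{\sum x^{2} \, \sum y^{2}}}

\sum x^{2} = \frac{n^{3} - n}{12} - \sum T_x ,
\qquad
\sum y^{2} = \frac{n^{3} - n}{12} - \sum T_y ,
\qquad
T = \frac{t^{3} - t}{12}
```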

Example:

Suppose we have Biology score data and History score data of 10 students. We would like to know how strong the correlation is. Let’s use the formula!

double data spearman correlation example
calculation of spearman correlation in double data case

Conclusion: There is a very weak negative correlation between Biology score and History score. The correlation value is -0.146.
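
A calculation with double ranks can be sketched in Python as follows. This assumes the common tie-corrected form in which each variable’s sum of squares is (n^3 - n)/12 minus the sum of (t^3 - t)/12 over its groups of t tied values. The function names and the five pairs of scores are hypothetical, so the result is not the -0.146 from the example above.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank from smallest (1) to largest; equal values share their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def tie_correction(values):
    """Sum of (t^3 - t) / 12 over every group of t equal values."""
    return sum((t ** 3 - t) / 12 for t in Counter(values).values())


def spearman_with_ties(x, y):
    """Spearman correlation using the tie-corrected sums of squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    base = (n ** 3 - n) / 12
    sum_x2 = base - tie_correction(x)
    sum_y2 = base - tie_correction(y)
    if sum_x2 == 0 or sum_y2 == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return (sum_x2 + sum_y2 - sum_d2) / (2 * math.sqrt(sum_x2 * sum_y2))


# Hypothetical scores for five students, each list containing one tie.
biology = [80, 80, 70, 60, 90]
history = [75, 60, 60, 85, 70]

r_s = spearman_with_ties(biology, history)
print(f"r_s = {r_s:.3f}")  # prints: r_s = -0.289
```

Groups with a single value contribute (1 - 1)/12 = 0, so the same function also works when there are no ties.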

How to run correlation analysis with an Excel formula

If you want to use a Microsoft Excel formula, it’s really easy. You can use this simple formula to find the result instantly.

=CORREL(array1, array2), where array1 = first group of data and array2 = second group of data

pearson correlation using microsoft excel formula

It is very simple. Just type the formula, select the cell ranges for the two variables, and hit Enter. You’ll get exactly the same value as in the Pearson example above.

However, CORREL calculates the Pearson coefficient, so applying it directly to the raw data does not give the Spearman value. For that, you can use another statistical tool such as SPSS, SAS, Minitab, or others.

These tools are also useful for correlational research in general.

How to run correlation analysis with SPSS

First, let us take a look at the Pearson correlation. Follow these steps:

  • Open SPSS
  • Enter your variables and data set. I am using the same data as in the Excel example above.
prepare your correlation data set
  • Select Analyze >> Correlate >> Bivariate
  • Move the variables to be analyzed into the Variables box
choosing spearman correlation with SPSS
  • Tick the Pearson option
  • Click OK
  • You’ll find the result!
spearman-correlation-with-SPSS

See? The results from the manual calculation, Microsoft Excel, and SPSS are the same. The correlation value is 0.96.

Let’s move to the Spearman correlation using SPSS. I am using the same data that I calculated manually above. Let’s go through the steps again!

  • Prepare your data set!
data-set-for-spearman-correlation
  • Select the Analyze >> Correlate >> Bivariate menu
  • Move the variables to be analyzed into the Variables box
  • Tick the Spearman option
  • Click OK
result of spearman correlation with SPSS

Conclusion: The correlation value is -0.66, exactly the same as the manual calculation.

Things that we need to remember about correlation analysis

1. Correlation cannot explain causation

Correlation can only describe the strength and direction of the relationship between variables. It cannot tell us whether one variable has an effect on another.

Remember, the correlation coefficient on its own is a descriptive measure, not an inferential statistics technique.

2. Negative correlation does not mean no correlation

Sometimes we think that a negative correlation means there is no relationship between the variables. That is wrong. A negative correlation means the two variables move in opposite directions.

3. Both variables can be switched

If you are analyzing two variables and switch them, it won’t affect the correlation value. For example, if the correlation value of age and height is 0.9, the correlation value of height and age is still 0.9.
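
The symmetry is easy to confirm with a short Python check. The ages and heights below are hypothetical values made up for illustration, and `pearson_r` is a helper name chosen here, not something from a particular library.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ages (years) and heights (cm), for illustration only.
ages = [4, 6, 8, 10, 12]
heights = [102, 115, 128, 138, 149]

r_age_height = pearson_r(ages, heights)
r_height_age = pearson_r(heights, ages)
print(math.isclose(r_age_height, r_height_age))  # prints: True
```

Swapping the arguments only swaps the roles of x and y inside the formula, and every term in it treats the two variables the same way.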

Summary

Correlation analysis is one of the most popular analysis tools. It describes how strongly two variables are related.

You can use various software packages or calculate it manually. If you apply the formula correctly, the result is the same.

Correlation also helps give an overall picture of the data. If you want to know how strong the relationship between two variables is, this is the tool to use.
