Correlation analysis is a method for describing the linear relationship between two variables. With it, we can see the pattern in the data and measure how closely it follows a straight line. Two variables are associated if a change in one tends to go together with a change in the other.
One thing to keep in mind is that the correlation coefficient does not tell us whether variable A causes a change in variable B or whether variable B causes a change in variable A.
It only describes a relationship. We can't conclude that variable A causes a change in variable B from correlation analysis alone. Do not do that!
Let's look at the chart below!
From the chart, you can see that there is a relationship between age and height. The greater the age, the greater the height. Likewise, the lower the age, the lower the height. That's how correlation works!
It shows you the connection between two different things!
- Types of correlation analysis
- Using scatterplot to help you understand the correlation
- How to interpret the correlation coefficient?
- Measuring correlation analysis
- 1. Pearson Correlation Coefficient
- 2. Spearman Correlation
- How to use correlation analysis using Excel Formula
- How to use correlation analysis using SPSS
- Things that we need to remember about correlation analysis
- You have to read this!
Types of correlation analysis
There are two kinds of relationship in correlation analysis:
1. Positive correlation
A positive correlation is a relationship between two variables in which an increase in one variable goes together with an increase in the other.
It also works the other way around: the lower one variable is, the lower the other tends to be.
For example, when growing fruit, using more fertilizer tends to go with higher production. If we are part-time workers, the longer we work, the greater the pay.
2. Negative correlation
A negative correlation is the opposite: a relationship between two variables in which an increase in one variable goes together with a decrease in the other, and vice versa.
For example, when the price of rice keeps rising, people's purchasing power decreases. The longer a person studies, the fewer mistakes he makes.
Using scatterplot to help you understand the correlation
Correlation analysis always involves two variables that are tied together. Statisticians usually start with a scatterplot to get a first impression before analyzing. A scatterplot provides a general picture so that we can see the correlation between the two variables.
The scatterplot also helps check whether there are outliers in the data set. Outliers need to be checked because they can affect the results of both descriptive and inferential analysis.
We need two sets of paired data, one plotted on the horizontal axis and the other on the vertical axis. Both must be numerical.
Take a look below:
From the scatterplot, at a glance, you can see that price and products sold move in opposite directions. The higher the price, the fewer products are sold. We can conclude that this is a negative correlation.
Now, let's take a look at a different example.
From the scatterplot above, you'll see a different pattern from the previous one. The older the kid, the greater the weight. We can conclude that this is a positive correlation.
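To make the idea of a scatterplot concrete, here is a minimal sketch in plain Python that draws a crude text version of one. The function name `text_scatter` and the price and sales numbers are made up for illustration and are not the data behind the charts in this article. In practice you would probably use a plotting library or a spreadsheet chart instead.

```python
def text_scatter(xs, ys, width=30, height=10):
    """Return a crude text scatterplot of the (x, y) pairs as a string."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when all values on an axis are identical.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        # Row 0 is the top of the plot, so larger y values get smaller row indices.
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))
        grid[row][col] = "*"
    return "\n".join("".join(line) for line in grid)


# Hypothetical data: the price goes up while the number of units sold goes down.
price = [10, 12, 14, 16, 18, 20]
units_sold = [95, 90, 70, 65, 40, 30]
plot = text_scatter(price, units_sold)
print(plot)
```

With this made-up data the points run from the top left to the bottom right, which is the visual signature of a negative correlation.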
How to interpret the correlation coefficient?
Basically, the closer the value is to 1, the stronger the positive relationship between the two variables. As it approaches zero, the association gets weaker. A negative value means there is a negative correlation.
Negative values are interpreted the same way. The closer to -1, the stronger the negative correlation. The closer to 0, the weaker the negative correlation. The size of the coefficient, ignoring its sign, can be read as follows:
– 0.00 – 0.19 = Very weak correlation
– 0.20 – 0.39 = Weak correlation
– 0.40 – 0.59 = Moderate correlation
– 0.60 – 0.79 = Strong correlation
– 0.80 – 1.00 = Very strong correlation
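The scale above can be turned into a small helper. This is only a sketch: the function name `describe_correlation` is made up, and it assumes that each band includes its lower bound (so 0.20 counts as weak and 0.195 as very weak), which the list above does not spell out.

```python
def describe_correlation(r):
    """Describe a correlation coefficient using the five-band scale above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if r == 0:
        return "no correlation"
    size = abs(r)  # strength depends only on the size, not on the sign
    if size < 0.20:
        strength = "very weak"
    elif size < 0.40:
        strength = "weak"
    elif size < 0.60:
        strength = "moderate"
    elif size < 0.80:
        strength = "strong"
    else:
        strength = "very strong"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


print(describe_correlation(0.96))
print(describe_correlation(-0.66))
```

Note that the sign only decides the direction, so -0.66 and 0.66 fall in the same strength band.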
Measuring correlation analysis
We can see the pattern and direction of two variables from the scatterplot, but we also need to measure how strong the relationship between them is. We need a specific number to define the strength of the correlation.
Usually, we use a correlation coefficient to calculate how strongly two variables are associated. Several formulas have been created. This article covers two of them.
1. Pearson Correlation Coefficient
One of the most widely used is the Pearson correlation coefficient, often just called the correlation coefficient. It is used under these conditions:
1. The data are on an interval or ratio scale.
2. The relationship between the two variables is linear, meaning that the data points generally scatter along a straight line.
3. The data are normally distributed.
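Here is the formula:
$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y} $$
$r$ = correlation coefficient
$x$ = data values of x data set
$y$ = data values of y data set
$\bar{x}$ = mean of x data set
$\bar{y}$ = mean of y data set
$s_x$ = standard deviation of x data
$s_y$ = standard deviation of y data
$n$ = number of data pairs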
A negative value simply means the correlation is negative. If r = -1, it indicates a perfect negative correlation. If r = 0, it indicates that no linear correlation exists. If r = 1, it indicates a perfect positive correlation.
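As a rough sketch, the Pearson coefficient can be computed by hand in a few lines of Python. The code uses the standard definition: the sum of the products of the deviations from the means, divided by the square root of the product of the two sums of squared deviations. This is equivalent to dividing by (n - 1) times the two sample standard deviations. The function name `pearson_r` and the sample numbers are made up for illustration and are not the age and weight data used in this article.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (the numerator).
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical data, chosen only to show the calculation.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # 6 / sqrt(60), about 0.7746
```

Data that lie exactly on a rising straight line give r = 1, and data on a falling straight line give r = -1.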
Let's use the data from the graph above and find the correlation.
Let's say x is age and y is weight. We can work out some parts of the formula here!
Now, let's use the Pearson correlation formula!
The correlation value is 0.96.
Based on the correlation value, we can conclude that there is a very strong positive correlation between age and weight. The older someone is, the heavier he is.
2. Spearman Correlation
Spearman correlation is a correlation measure for data on an ordinal (rank) scale. It is also used when both variables are quantitative but the normality condition is not met.
There are two cases for Spearman correlation. First, when there are no double ranks (no repeated data values). Second, when there are double ranks or repeated data values.
Now, take a look at the steps in the Spearman correlation test:
- Rank the data from smallest to largest. If some values are the same, give each of them the average of their ranks.
- Find the difference between the rank of the first variable and the rank of the second variable for each pair.
- Use the formula that matches the data condition.
1. No double rank/data
First, let us look at the case with no double rank or double data. The formula for the Spearman correlation is:
$$ r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} $$
$r_s$ = Spearman correlation
$d_i$ = difference from rank pair …
$n$ = total number of rank pairs
Depending on the needs and the type of question, we can rank the data from the largest or from the smallest before calculating the correlation, as long as both variables are ranked in the same direction.
Now, take a deep breath for the example!
We are examining the math and science marks of ten students. We want to know whether there is a relationship between the science score and the math score.
Use the formula above and you'll find this result!
Based on the calculation, the correlation value between science score and math score is -0.66.
There is a negative correlation between math score and science score, and by the scale given earlier it is a strong one. The higher the science score, the lower the math score tends to be.
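The ranking steps and the no-ties Spearman formula, r_s = 1 - 6Σd² / (n(n² - 1)), can be sketched in Python as follows. The function names and the five pairs of scores are made up for illustration. They are not the ten students' marks from the example above.

```python
def ranks_no_ties(values):
    """Rank values from smallest (rank 1) to largest; refuse repeated values."""
    if len(set(values)) != len(values):
        raise ValueError("repeated values found; this formula assumes no ties")
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]


def spearman_no_ties(xs, ys):
    """Spearman correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    rank_x = ranks_no_ties(xs)
    rank_y = ranks_no_ties(ys)
    # d is the difference between the two ranks of each pair.
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical scores for five students.
math_scores = [50, 60, 70, 80, 90]
science_scores = [55, 65, 60, 85, 95]
print(spearman_no_ties(math_scores, science_scores))  # sum(d^2) = 2, so 1 - 12/120
```

Only two students swap places between the two rankings here, so the coefficient stays close to 1.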
2. Double rank/data
Sometimes the data contain repeated values, which produce double ranks in the Spearman correlation test. These cases need special treatment, so the formula we use is different.
Because we have double data or double ranks, we need to apply a correction factor using the following formula.
Suppose we have the Biology and History scores of 10 students and would like to know how strong the correlation is. Let's use the formula!
Conclusion: There is a very weak negative correlation between Biology score and History score. The correlation value is -0.146.
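One common way to handle double ranks in code is to give tied values the average of their ranks and then compute the Pearson correlation of the two rank lists. Note that this is a different route from the correction-factor formula referred to above, not a reproduction of it. The function names and the four pairs of scores below are made up for illustration and are not the ten students' data.

```python
import math


def average_ranks(values):
    """Rank from smallest to largest; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_with_ties(xs, ys):
    """Spearman correlation computed as the Pearson correlation of average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("undefined when all values of one variable are equal")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores; two students share the same Biology score.
biology_scores = [70, 80, 80, 90]
history_scores = [60, 75, 65, 85]
print(average_ranks(biology_scores))
print(round(spearman_with_ties(biology_scores, history_scores), 4))
```

The two tied Biology scores would have taken ranks 2 and 3, so each receives rank 2.5.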
How to use correlation analysis using Excel Formula
If you want to use a Microsoft Excel formula, it's really easy. This simple formula gives the result instantly.
=CORREL(array1, array2) ; array1 = first group of data, array2 = second group of data
It is very simple. Just type the formula, select the cells for each variable, and hit Enter. You'll get exactly the same value as in the Pearson example above.
However, CORREL only computes the Pearson coefficient, so you can't use it directly for ordinal data or the Spearman formula. For that, use another statistical tool such as SPSS, SAS, Minitab, or others to find your correlation value.
These tools are also better suited to correlational research.
How to use correlation analysis using SPSS
First, let us take a look at the Pearson correlation. Follow these steps:
- Open SPSS
- Enter your variables and data set. I am still using the same data as in the previous Excel formula.
- Select Analyze >> Correlate >> Bivariate
- Move the variables to be analyzed
- Tick the Pearson option
- Click OK
- You get the result!
See? The results from manual calculation, Microsoft Excel, and SPSS are the same. The correlation value is 0.96.
Let's move on to the Spearman correlation using SPSS. I am using the same data that I calculated manually above. Let's go through the steps again!
- Prepare your data set!
- Select the Analyze >> Correlate >> Bivariate menu
- Move the variables to be analyzed
- Tick the Spearman option
- Click OK
Conclusion: The correlation value is -0.66, exactly the same as the manual calculation. Isn't it?
Things that we need to remember about correlation analysis
1. Correlation cannot explain causation
Correlation can only describe the strength and direction of the relationship between variables. It cannot tell us whether one variable has an effect on another.
Remember, the correlation coefficient on its own is a descriptive measure, not an inferential statistics technique.
2. Negative correlation does not mean no correlation
Sometimes we think that a negative correlation means there is no relationship between the variables. That is wrong. A negative correlation means the two variables are related in opposite directions.
3. Both variables can be switched
If you are analyzing two variables and switch them, the correlation value won't change. For example, if the correlation between age and height is 0.9, the correlation between height and age is still 0.9.
Correlation analysis is one of the most popular analysis tools. It can describe how strongly two variables are related.
You can use various software packages or calculate it manually. If you use the formula correctly, the result is the same.
Correlation also helps give an overall picture of the data. If you want to know how strong the relationship between two variables is, correlation analysis is the tool to use.