Ordinary least squares (OLS), also called the least squares method, is a mathematical technique used in linear regression to find the best fit line for the data. It minimizes the error between the actual points and the points predicted by the regression line; specifically, it minimizes the sum of the squared differences between them. The least squares method seeks the best fit line that explains the relationship between the dependent variable (Y values) and the independent variable (X values). In regression analysis, we plot the data points on a graph where the independent variable is shown on the horizontal X-axis and the dependent variable is shown on the vertical Y-axis.
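In symbols, for a line Y = M*X + C fitted to n points (X_i, Y_i), OLS picks the pair (M, C) that makes the total squared vertical error as small as possible. The name S below is just a label introduced here for that total:

```latex
S(M, C) = \sum_{i=1}^{n} \bigl( Y_i - (M X_i + C) \bigr)^2,
\qquad
(M, C)_{\text{OLS}} = \arg\min_{M,\,C} \; S(M, C)
```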
In the previous article, we covered linear regression basics and its high-level formulas. In this article, we will cover how linear regression finds the best fit line, the one with the minimum error, using the ordinary least squares (OLS) method, with some mathematical calculations.
Ordinary Least Squares:-
To understand how the ordinary least squares (OLS) method works, let's first look at the equation of the linear regression line, with its dependent and independent variables, which we covered in the last article.
Y = M*X + C
In this basic equation, Y is the dependent variable and X is the independent variable, and we need to find the values of M and C. Let’s do some mathematical calculations.
Let’s assume five data points, with X values (1, 2, 3, 4, 5) and corresponding Y values (3, 4, 2, 4, 5).
Let’s draw a simple graph of the X and Y values to see how the data points look.
We build a simple table to hold all the calculations used in the OLS model.
First, we calculate the mean of the X values and the mean of the Y values:
Mean of X (X̄) = (1+2+3+4+5)/5 = 15/5 = 3
Mean of Y (Ȳ) = (3+4+2+4+5)/5 = 18/5 = 3.6
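The same averages can be computed in a few lines of Python. This is a minimal sketch using plain lists; the names X, Y, x_mean and y_mean are just illustrative.

```python
# Sample data from the example above
X = [1, 2, 3, 4, 5]
Y = [3, 4, 2, 4, 5]

# Mean = sum of the values divided by how many values there are
x_mean = sum(X) / len(X)
y_mean = sum(Y) / len(Y)

print(x_mean, y_mean)  # 3.0 3.6
```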
As we know, the linear regression line formula is Y = MX + C.
This is the slope-intercept form of a straight line, where:
M is the slope of the line,
X is the x value and Y is the y value,
C is the y-intercept of the line.
To calculate M, OLS uses the slope formula M = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)², where X̄ and Ȳ are the means of X and Y.
Putting the corresponding values from the table into this formula gives M = 0.4.
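Written out term by term, using the standard OLS slope formula with the means 3 and 3.6 computed earlier (so the deviations X − X̄ are −2, −1, 0, 1, 2 and the deviations Y − Ȳ are −0.6, 0.4, −1.6, 0.4, 1.4), the calculation of M looks like this:

```latex
M = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
  = \frac{(-2)(-0.6) + (-1)(0.4) + (0)(-1.6) + (1)(0.4) + (2)(1.4)}{(-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2}
  = \frac{1.2 - 0.4 + 0 + 0.4 + 2.8}{4 + 1 + 0 + 1 + 4}
  = \frac{4.0}{10} = 0.4
```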
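Now that we know M, we can find C. Because the OLS line always passes through the point of means (mean of X, mean of Y) = (3, 3.6), we substitute X = 3 and Y = 3.6 into the regression line formula: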
Y = MX + C
3.6 = 0.4 * 3 + C
3.6 = 1.2 + C
C = 2.4
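The whole calculation of M and C can be wrapped into one small function. This is a minimal sketch of closed-form OLS for a single X variable in plain Python; the name `ols_fit` is hypothetical, and it assumes the X values are not all equal (otherwise the denominator would be zero).

```python
def ols_fit(xs, ys):
    """Closed-form OLS for one independent variable: returns (M, C)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator: sum of (X - mean of X) * (Y - mean of Y)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator: sum of (X - mean of X) squared
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    # The OLS line passes through (mean of X, mean of Y), so C = mean Y - M * mean X
    c = y_mean - m * x_mean
    return m, c


m, c = ols_fit([1, 2, 3, 4, 5], [3, 4, 2, 4, 5])
print(round(m, 4), round(c, 4))  # 0.4 2.4
```

The rounding in the print call only hides tiny floating-point noise; the unrounded values are within about 1e-15 of 0.4 and 2.4.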
Once we have the M and C values, we put in each X value one by one to get the corresponding predicted Y value (Yp).
M = 0.4
C = 2.4
Yp = 0.4*X + 2.4
For the given M = 0.4 and C = 2.4, let’s predict Yp for each X:
Yp = 0.4*1 + 2.4 = 2.8
Yp = 0.4*2 + 2.4 = 3.2
Yp = 0.4*3 + 2.4 = 3.6
Yp = 0.4*4 + 2.4 = 4.0
Yp = 0.4*5 + 2.4 = 4.4
Let’s draw a plot of actual Y vs. predicted Y (Yp).
As we can see from the graph, the blue line shows the actual values and the orange line shows the regression line. The regression line tries to fit the whole dataset: it predicts closely at X = 1 (actual 3, predicted 2.8) and at X = 5 (actual 5, predicted 4.4), and at X = 4 the prediction of 4.0 matches the actual value exactly, so the error at that point is zero. The line does not pass through every point (at X = 3 the actual value is 2 but the prediction is 3.6); instead, it is the line that gives the smallest total squared error between the real points and the predicted points.
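The per-point errors discussed for the plot can be checked with a short Python sketch. It uses plain lists and hard-codes M = 0.4 and C = 2.4 from the calculation; the variable names are illustrative.

```python
X = [1, 2, 3, 4, 5]
Y = [3, 4, 2, 4, 5]
M, C = 0.4, 2.4  # slope and intercept found with OLS

# Predicted value for each X using Yp = M*X + C
Yp = [M * x + C for x in X]

# Error (residual) at each point = actual Y - predicted Yp
errors = [y - yp for y, yp in zip(Y, Yp)]

for x, y, yp, e in zip(X, Y, Yp, errors):
    print(f"X={x}  Y={y}  Yp={yp:.1f}  error={e:+.1f}")
```

The errors come out as +0.2, +0.8, −1.6, 0.0 and +0.6: largest at X = 3 and zero at X = 4.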
Wrapping up:- The ordinary least squares (OLS) method in linear regression creates a model that minimizes the sum of squared errors between the real data points Y and the predicted data points Yp.
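As a quick sanity check of that claim (not a proof), the sketch below computes the sum of squared errors (SSE) for the OLS line and for a few slightly nudged lines. The helper name `sse` and the particular nudges are arbitrary choices made for illustration.

```python
X = [1, 2, 3, 4, 5]
Y = [3, 4, 2, 4, 5]


def sse(m, c):
    """Sum of squared errors between actual Y and predicted M*X + C."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(X, Y))


best = sse(0.4, 2.4)
print(f"OLS line (M=0.4, C=2.4): SSE = {best:.2f}")

# Nudge the slope and/or intercept; each nudged line should have a larger SSE
for dm, dc in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.2), (0.0, -0.2), (0.1, -0.3)]:
    other = sse(0.4 + dm, 2.4 + dc)
    print(f"M={0.4 + dm:.1f}, C={2.4 + dc:.1f}: SSE = {other:.2f}")
    assert other > best
```

For the OLS line the SSE is 0.04 + 0.64 + 2.56 + 0 + 0.36 = 3.6, and every nudged line comes out higher.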