To train a model, we predict a value for each set of independent features and compare it with the actual value recorded for those features in the dataset. In regression, the closer the predicted values are to the corresponding actual values, the better the model. The cost function measures how close the predictions are to the actual values, and gradient descent is the method used to minimize that cost function.

**Gradient-based and gradient-free algorithms are the two types of algorithms used to solve model optimization problems.**

The gradient descent method uses three steps to optimize the model:

1. **Search direction.**
2. **Step size.**
3. **Convergence check.**
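The three steps can be sketched on a simple one-variable function. This is a minimal illustration, not a full optimizer: the function name `gradient_descent`, its parameters, and the example function f(x) = (x - 3)^2 are all assumptions chosen for the sketch.

```python
def gradient_descent(grad, x0, step_size=0.1, tol=1e-8, max_iters=1000):
    """Minimize a one-variable function given its derivative `grad`."""
    x = x0
    for iteration in range(max_iters):
        # 1. Search direction: move against the gradient (downhill).
        direction = -grad(x)
        # 2. Step size: scale the direction by a fixed step size.
        x_new = x + step_size * direction
        # 3. Convergence check: stop once the update is negligible.
        if abs(x_new - x) < tol:
            return x_new, iteration + 1
        x = x_new
    return x, max_iters


# Hypothetical example: f(x) = (x - 3)^2 has derivative 2 * (x - 3)
# and its minimum at x = 3.
minimum, iterations = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(minimum, iterations)
```

The fixed step size and the "update is smaller than `tol`" stopping rule are the simplest choices for steps 2 and 3; other strategies exist for both.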

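As we know, a straight line is represented by the equation below:

$$y = ax + b$$

where $a$ is the slope and $b$ is the intercept.

In regression, we write the model as a hypothesis, which is derived from the equation of a line:

$$h_\theta(x) = \theta_0 + \theta_1 x$$

Here $\theta_0$ plays the role of the intercept and $\theta_1$ the role of the slope.

In regression, we measure the error of the model with the cost function, written here in its usual mean-squared-error form:

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

The goal is to minimize the value of $J(\theta_0, \theta_1)$.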
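To make the cost concrete, here is a minimal sketch of a hypothesis and a cost function for a single feature. It assumes the common mean-squared-error form with a 1/(2m) factor; the function names and the small dataset are made up for illustration.

```python
def hypothesis(theta0, theta1, x):
    """Predicted value for one input x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Mean squared error with the conventional 1/(2m) factor (an assumption)."""
    m = len(xs)
    squared_errors = [
        (hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)
    ]
    return sum(squared_errors) / (2 * m)


# Made-up data that lies exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

perfect_fit_cost = cost(1.0, 2.0, xs, ys)  # predictions equal the targets
poor_fit_cost = cost(0.0, 0.0, xs, ys)     # predicts 0 for every row
print(perfect_fit_cost, poor_fit_cost)     # 0.0 10.5
```

The parameters that reproduce the data give a cost of zero, while the all-zero parameters give a much larger cost, which is exactly the gap that training tries to close.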

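To minimize the cost function $J$, we take its partial derivative with respect to each parameter, $\theta_0$ and $\theta_1$.

In the equation below, we take the cost function and replace $h_\theta(x)$ with the hypothesis equation $\theta_0 + \theta_1 x$:

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2$$

When we take the partial derivative with respect to $\theta_0$, the term $\theta_1 x^{(i)}$ is treated as a constant, and the derivative of a constant is zero:

$$\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)$$

Likewise, when we differentiate with respect to $\theta_1$, $\theta_0$ is a constant, so the derivative of $\theta_0$ is zero:

$$\frac{\partial J}{\partial \theta_1} = \frac{1}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right) x^{(i)}$$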
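One way to check partial derivatives like these is to compare them with a numerical estimate. The sketch below assumes a single-feature hypothesis θ0 + θ1·x and a cost with a 1/(2m) factor; the function names and data are hypothetical.

```python
def cost(theta0, theta1, xs, ys):
    """Mean squared error with a 1/(2m) factor (assumed form)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradients(theta0, theta1, xs, ys):
    """Analytic partial derivatives of the cost w.r.t. theta0 and theta1."""
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    d_theta0 = sum(errors) / m
    d_theta1 = sum(e * x for e, x in zip(errors, xs)) / m
    return d_theta0, d_theta1


# Made-up data on the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

analytic = gradients(0.0, 0.0, xs, ys)

# Central finite differences: nudge one parameter, hold the other constant.
eps = 1e-6
numeric_theta0 = (cost(eps, 0.0, xs, ys) - cost(-eps, 0.0, xs, ys)) / (2 * eps)
numeric_theta1 = (cost(0.0, eps, xs, ys) - cost(0.0, -eps, xs, ys)) / (2 * eps)

print(analytic, (numeric_theta0, numeric_theta1))
```

Holding one parameter fixed while nudging the other is the numerical counterpart of treating it as a constant during partial differentiation, so the two pairs of values should agree closely.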

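Each parameter is then updated, all at the same time, using the rule

$$\theta_j := \theta_j - \alpha \frac{\partial J}{\partial \theta_j}$$

where:

$\alpha$ is the learning rate.

**m** is the number of rows.

**n** is the number of columns (features), so $j$ runs from 0 to n.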
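A single update step for a dataset with m rows and n columns might look like the sketch below. It assumes the standard rule "new parameter = old parameter minus learning rate times partial derivative" and a cost with a 1/(2m) factor; `predict`, `update_step`, and the data are hypothetical names and values.

```python
def predict(theta, row):
    # theta[0] is the intercept; theta[1:] pairs with the n feature values.
    return theta[0] + sum(t * x for t, x in zip(theta[1:], row))


def update_step(theta, X, y, alpha):
    """Return the parameters after one gradient descent update."""
    m = len(X)     # number of rows
    n = len(X[0])  # number of columns (features)
    # Errors are computed once, so every parameter is updated simultaneously.
    errors = [predict(theta, row) - target for row, target in zip(X, y)]
    new_theta = [theta[0] - alpha * sum(errors) / m]
    for j in range(n):
        grad_j = sum(e * row[j] for e, row in zip(errors, X)) / m
        new_theta.append(theta[j + 1] - alpha * grad_j)
    return new_theta


# Made-up dataset: m = 3 rows, n = 2 columns.
X = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]
y = [5.0, 2.0, 5.0]

theta = update_step([0.0, 0.0, 0.0], X, y, alpha=0.1)
print(theta)  # approximately [0.4, 0.8, 0.5]
```

Repeating `update_step` until the cost stops changing gives the full training loop.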

At the start, the cost is high because the model begins with random parameter values and then tunes them using the derivative of the cost function. After each update, the cost should fall as the parameters move toward the bottom of the slope. If the error decreases, the model is most likely moving toward the global minimum, or convergence point. If the error increases, the model is overshooting and moving away from the global minimum.

In that case, adjusting the learning rate lets the model reach the global minimum quickly.
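Recording the cost after every update is a simple way to see whether training is heading toward the minimum. The sketch below is illustrative only: it starts from zeros rather than random values so the run is reproducible, assumes a cost with a 1/(2m) factor, and uses made-up data and function names.

```python
def cost(theta0, theta1, xs, ys):
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def train(xs, ys, alpha, iterations):
    """Run gradient descent and record the cost after every update."""
    theta0, theta1 = 0.0, 0.0  # fixed start instead of random (assumption)
    m = len(xs)
    history = [cost(theta0, theta1, xs, ys)]
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
        history.append(cost(theta0, theta1, xs, ys))
    return theta0, theta1, history


# Made-up data on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

theta0, theta1, history = train(xs, ys, alpha=0.1, iterations=200)

# True when the cost never rises from one iteration to the next.
cost_is_decreasing = all(
    later <= earlier for earlier, later in zip(history, history[1:])
)
print(history[0], history[-1], cost_is_decreasing)
```

If `cost_is_decreasing` came back false for some dataset, that would be a signal that the updates are overshooting and the learning rate should be lowered.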

If the learning rate is very low (case 1), each update takes only a very small step, so the model does move toward the global minimum but takes a long time to reach it.

If the learning rate is high (case 2), each step overshoots the global minimum and lands on the other side of the curve, so the model again takes a long time to reach the global minimum and may fail to converge at all. We therefore need to choose a learning rate that reaches the global minimum quickly; 0.1 is a commonly used starting value.
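The effect of the learning rate can be seen by counting how many updates are needed to converge. This sketch assumes a deliberately simple one-parameter model y = θ·x, made-up data, and a stopping rule based on the gradient size; the three rates were picked only to illustrate the low, moderate, and high cases.

```python
def steps_to_converge(alpha, xs, ys, tol=1e-6, max_iters=10_000):
    """Count updates until the gradient of the cost is smaller than `tol`."""
    theta = 0.0
    m = len(xs)
    for step in range(max_iters):
        gradient = sum((theta * x - y) * x for x, y in zip(xs, ys)) / m
        if abs(gradient) < tol:
            return step
        theta -= alpha * gradient
    return max_iters  # did not converge within the budget


# Made-up data on the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

results = {
    alpha: steps_to_converge(alpha, xs, ys) for alpha in (0.001, 0.1, 0.4)
}
for alpha, steps in results.items():
    print(f"alpha={alpha}: {steps} steps")
```

On this data, the low rate (0.001) needs by far the most steps, the high rate (0.4) jumps back and forth across the minimum and needs more steps than the moderate rate (0.1). Rates above roughly 0.43 would make the updates grow instead of shrink for this particular dataset, so the threshold depends on the data.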

As the screenshot above shows, the model's predictions are very poor at the start, when the parameter values are random. As the parameters are updated with a suitable learning rate, the cost decreases and the model's performance improves. Near the global minimum, the predicted values closely match the actual values, and the model is the best fit.

Wrapping up: The cost function in machine learning measures how close the model's predicted values are to the actual values. Gradient descent is the method used to minimize the cost function.