This article covers the machine learning algorithm named **Linear Regression**. After the data has been analysed and prepared, it has to be processed using an ML algorithm. Every dataset has **Features** and **Target** values: features are like the parameters passed to a function, and targets are the return / output value(s).

The study below processes a data set using the Linear Regression algorithm. Here we find the **slope**, the **y-intercept** and the **y-prediction** values. We also find the R^{2} value, which measures how much of the variation in the actual values is explained by the predicted values. If the value falls in the range 0.5 – 1.0, the equation can be considered a good fit.

Let's visualize this on a graph with X and y axes, where X is the independent variable and y is the dependent variable. If the y values increase as the X values increase, the line has a positive slope:

**y^= b**_{0}** + b**_{1}**X**

Here **b**_{1} is called the **slope**, **b**_{0} the **y intercept** and **y^** is the predicted value. We write x as capital X because it is treated as a matrix, while y is a vector.

A line can also have a negative slope: if the y values decrease as the X values increase, the slope is negative.

**y^= b**_{0}** – b**_{1}**X**

Let's understand this with a simple data-set, first using a step-by-step approach in Excel and then seeing how Python's machine learning (Linear Regression) algorithm predicts the values.

Here we take a sample data-set with X (Features) = [1, 2, 3, 4, 5] and y (Targets) = [2, 4, 5, 4, 5].

**Step 1:** Let's calculate the mean of the X and y variables.

Here x̄ (note the bar on top) is the mean of x and ȳ is the mean of y. For this data-set, x̄ = 3 and ȳ = 4.

**Step 2:** Subtract the mean of X from each X value, and the mean of y from each y value, to get the deviations (x – x̄) and (y – ȳ).

**Step 3:** Now we calculate the slope, b_{1}. The formula is:

**b**_{1}** = Σ (x – x̄)(y – ȳ ) / Σ(x – x̄) ^{2}**

Here **Σ (Sigma)** stands for "the sum of". For this data-set the numerator is 6 and the denominator is 10, so b_{1} = 6 / 10 = 0.6.
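Steps 1 to 3 can also be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not the article's Excel sheet; the variable names (`x_dev`, `y_dev`, `b1`) are made up for this example.

```python
# Sample data-set used throughout the article
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Step 1: means of x and y
x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)

# Step 2: deviation of every value from its mean
x_dev = [xi - x_mean for xi in x]
y_dev = [yi - y_mean for yi in y]

# Step 3: slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
numerator = sum(dx * dy for dx, dy in zip(x_dev, y_dev))
denominator = sum(dx ** 2 for dx in x_dev)
b1 = numerator / denominator

print(x_mean, y_mean)          # 3.0 4.0
print(numerator, denominator)  # 6.0 10.0
print(b1)                      # 0.6
```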

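**Step 4:** Now we need to find the y intercept. The regression line passes through the point (x̄, ȳ), so we substitute the two means and the slope into ȳ = b_{0} + b_{1}x̄ and solve for b_{0}: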

**4 = b**_{0}** + 0.6 * 3 (mean of x)**

**4 = b**_{0}** + 1.8**

**4 – 1.8 = b**_{0}** + 1.8 – 1.8**

**b**_{0}** = 2.2**
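The same rearrangement can be checked with a short Python sketch. The means and the slope are the values from the text; the variable names are only illustrative. The result is rounded because floating-point arithmetic leaves a tiny error in the last digits.

```python
# Values taken from the steps above
x_mean = 3.0
y_mean = 4.0
b1 = 0.6  # slope

# Rearranging y_mean = b0 + b1 * x_mean gives b0 = y_mean - b1 * x_mean
b0 = y_mean - b1 * x_mean

print(round(b0, 2))  # 2.2
```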

Now that we have the slope and the y intercept, we can calculate the predicted values y^.

**y^ = 2.2 + 0.6 * 1**

**y^ = 2.8**

With slope = 0.6 and y intercept = 2.2, the predicted value for the first x value (x = 1) is 2.8. The remaining values (2, 3, 4 and 5) are calculated in the same way.

So **y = [2, 4, 5, 4, 5]** are the actual values and **y^ = [2.8, 3.4, 4, 4.6, 5.2]** are the predicted values.
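As a quick check, the prediction formula can be applied to every x value in a small Python loop. This is a minimal sketch using the slope and intercept found above; the rounding to two decimals is an addition here to hide floating-point noise.

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]   # actual values
b0, b1 = 2.2, 0.6     # y intercept and slope from the previous steps

# y^ = b0 + b1 * x for every x
y_hat = [round(b0 + b1 * xi, 2) for xi in x]

print(y_hat)  # [2.8, 3.4, 4.0, 4.6, 5.2]
```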

**Step 5:** Now we need to find the R^{2} value to check how closely the predicted values match the actual values. The formula is:

**R**^{2}** = Σ (y^ – ȳ)^{2} / Σ (y – ȳ)^{2}**

**R**^{2}** = 3.6 / 6 = 0.6**
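The two sums in the formula can be reproduced with a short Python sketch. The names `ss_reg` and `ss_tot` are just labels chosen for this example. For a least-squares line with an intercept, this ratio gives the same result as the more common form 1 – Σ(y – y^)^{2} / Σ(y – ȳ)^{2}.

```python
y = [2, 4, 5, 4, 5]                # actual values
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]  # predicted values
y_mean = sum(y) / len(y)

# Numerator: squared distance of each prediction from the mean of y
ss_reg = sum((yh - y_mean) ** 2 for yh in y_hat)
# Denominator: squared distance of each actual value from the mean of y
ss_tot = sum((yi - y_mean) ** 2 for yi in y)

r2 = ss_reg / ss_tot

print(round(ss_reg, 2), round(ss_tot, 2), round(r2, 2))  # 3.6 6.0 0.6
```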

**This section completes Linear Regression using Excel.**

Now let's check how Python's machine learning algorithm (Linear Regression) works with the same data-set used above. Here we will be using a Jupyter notebook.

**Step 1:** Load the Python libraries that are needed.

**Step 2:** Read the data-set and display its first 3 rows.
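A minimal sketch of these two steps could look like the following. The article's data file is not available here, so this example assumes the five rows are typed straight into a pandas DataFrame; the column names `X` and `y` are also an assumption.

```python
import pandas as pd

# Assumption: data entered by hand instead of being read from the original file
df = pd.DataFrame({
    "X": [1, 2, 3, 4, 5],  # feature
    "y": [2, 4, 5, 4, 5],  # target
})

# Display the first 3 rows
print(df.head(3))
```

If the data lived in a CSV file, `pd.read_csv` with the file's path would take the place of the hand-built DataFrame.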

**Step 3:** Plot the actual values using a scatter plot to see how they appear on the graph.
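One possible way to draw this scatter plot with matplotlib is sketched below. The axis labels, title and marker colour are choices made for this example, not taken from the original notebook.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

fig, ax = plt.subplots()
ax.scatter(x, y, color="black")  # one point per (X, y) pair
ax.set_xlabel("X (feature)")
ax.set_ylabel("y (target)")
ax.set_title("Actual values")
plt.show()
```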

**Step 4:** Fit the data to a Linear Regression model and find the **slope**, the **y intercept** and the **predicted values**.
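A sketch of this step with scikit-learn's `LinearRegression` might look as follows. The data is again entered by hand (an assumption), and X is reshaped into a single-column matrix because scikit-learn expects a 2-D feature array.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# X must be 2-D (rows = samples, columns = features); y stays a 1-D vector
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])

model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]        # b1, expected to be about 0.6
intercept = model.intercept_  # b0, expected to be about 2.2
y_pred = model.predict(X)     # predicted value for every row of X

print(round(float(slope), 2), round(float(intercept), 2))
print(np.round(y_pred, 2))
```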

Here we got the same values for the slope and y intercept as in Excel (0.6 and 2.2). Both the manual (Excel) and the ML (Python) approach predicted the same values.

Let's put the predicted values in a pandas DataFrame and join the two tables.
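A minimal sketch of the join is shown below, assuming the actual values sit in one DataFrame and the predictions in another with the same row index. The column name `y_pred` is made up for this example.

```python
import pandas as pd

actual = pd.DataFrame({"X": [1, 2, 3, 4, 5], "y": [2, 4, 5, 4, 5]})
predicted = pd.DataFrame({"y_pred": [2.8, 3.4, 4.0, 4.6, 5.2]})

# join() lines the two tables up on their shared row index
combined = actual.join(predicted)

print(combined)
```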

Let's plot the results using the Python matplotlib library. The red line indicates the predicted values and the black line represents the actual values.
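The plot could be reproduced roughly as follows. This is a sketch, not the original notebook cell: the colours follow the description above, while the markers, labels and legend are additions for readability.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]                 # actual values
y_pred = [2.8, 3.4, 4.0, 4.6, 5.2]  # predicted values

fig, ax = plt.subplots()
ax.plot(x, y, color="black", marker="o", label="Actual")
ax.plot(x, y_pred, color="red", label="Predicted")
ax.set_xlabel("X")
ax.set_ylabel("y")
ax.legend()
plt.show()
```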

Now we calculate the R^{2} value using the sklearn library.
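With scikit-learn this is a single call to `r2_score`, which takes the actual values first and the predicted values second. The sketch below assumes the values from the earlier steps are entered by hand.

```python
from sklearn.metrics import r2_score

y = [2, 4, 5, 4, 5]                 # actual values
y_pred = [2.8, 3.4, 4.0, 4.6, 5.2]  # predicted values

r2 = r2_score(y, y_pred)

print(round(float(r2), 2))
```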

Here too the R^{2} value is 0.6, which falls in the 0.5 – 1.0 range and so counts as a good fit: the predicted values are reasonably close to the actual values. If the R^{2} value equals 1.0, the actual and predicted values are identical.