This article covers the machine learning algorithm called Linear Regression. Once a dataset has been analysed and prepared, it has to be processed with an ML algorithm. Every dataset has features and target values: the features are like the parameters passed to a function, and the targets are like its return (output) values.

The study below processes a dataset with the Linear Regression algorithm. We find the slope, the y-intercept and the predicted y values. We also find the R² value, which measures how closely the predicted values follow the actual values (the share of the variation in y that the fitted line explains). As a rule of thumb used in this article, if R² falls in the range 0.5 – 1.0, the equation can be considered a good fit.

Let's visualize this on a graph with X and y axes. Here X is the independent variable and y is the dependent variable. If y increases as X increases, the line has a positive slope:

y^ = b0 + b1X

Here b1 is the slope, b0 is the y-intercept and y^ is the predicted value. We write X as a capital letter because it is treated as a matrix of features, while y is a vector.

 

 

A line can also have a negative slope: if y decreases as X increases, the slope is negative. Writing b1 for the size of the slope, the equation becomes

y^ = b0 – b1X

Let's work through this with a simple dataset, first step by step in Excel and then with Python's machine learning (Linear Regression) algorithm, to see how each one predicts the values.

We will take a sample dataset with X (features) and y (targets) variables. Going by the values used in the steps that follow, X = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5].

 

Step 1: Let's calculate the mean of the X and y variables.

 

Here x̄ (note the bar on top) is the mean of x and ȳ is the mean of y. For this dataset x̄ = 3 and ȳ = 4.
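The same means can be checked outside Excel. This is a minimal sketch in plain Python, assuming the sample values X = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5] quoted later in this article; the variable names are only illustrative.

```python
# Sample data, taken from the values quoted in this article.
x = [1, 2, 3, 4, 5]   # features
y = [2, 4, 5, 4, 5]   # targets

# Mean = sum of the values divided by how many values there are.
x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)

print(x_mean, y_mean)  # 3.0 4.0
```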

Step 2: Subtract the mean of X from each X value (x – x̄), and do the same for y (y – ȳ).
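As a rough illustration of this step, the deviations from the mean can be computed with two list comprehensions. The data values are the ones used in this article; the names x_dev and y_dev are assumptions made for the example.

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_mean = sum(x) / len(x)   # 3.0
y_mean = sum(y) / len(y)   # 4.0

# Distance of every value from the mean of its own column.
x_dev = [xi - x_mean for xi in x]
y_dev = [yi - y_mean for yi in y]

print(x_dev)  # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(y_dev)  # [-2.0, 0.0, 1.0, 0.0, 1.0]
```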

Step 3: Now we calculate the slope, b1. The formula is

b1 = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)²

Here Σ (sigma) stands for "the sum of". For this dataset, b1 = 6 / 10 = 0.6.
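The slope formula translates almost word for word into Python. The sketch below assumes the same sample data and uses plain lists rather than any library; numerator and denominator are illustrative names.

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)

# Top of the fraction: sum of (x - x_mean) * (y - y_mean).
numerator = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

# Bottom of the fraction: sum of (x - x_mean) squared.
denominator = sum((xi - x_mean) ** 2 for xi in x)

b1 = numerator / denominator
print(numerator, denominator, b1)  # 6.0 10.0 0.6
```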

 

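Step 4: Now we find the y-intercept, b0. The fitted line always passes through the point of means (x̄, ȳ), so ȳ = b0 + b1x̄. Substituting ȳ = 4, b1 = 0.6 and x̄ = 3:

4 = b0 + 0.6 * 3

4 = b0 + 1.8

4 – 1.8 = b0 + 1.8 – 1.8

b0 = 2.2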
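Rearranging the same relation gives b0 = ȳ – b1 * x̄, which is easy to check in code. This is a small sketch using the means and slope found in the earlier steps; the result is rounded only to hide floating-point noise.

```python
x_mean = 3.0   # mean of X from Step 1
y_mean = 4.0   # mean of y from Step 1
b1 = 0.6       # slope from Step 3

# The fitted line passes through (x_mean, y_mean), so solve for b0.
b0 = y_mean - b1 * x_mean

print(round(b0, 2))  # 2.2
```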

Now we have the slope and the y-intercept, and we can use them to calculate the predicted values y^.

y^ = 2.2 + 0.6 * 1

y^ = 2.8

With slope = 0.6 and y-intercept = 2.2, the predicted value for the first x value (x = 1) is 2.8. The predictions for the remaining x values (2, 3, 4 and 5) are calculated the same way.

So y = [2, 4, 5, 4, 5] are the actual values and y^ = [2.8, 3.4, 4.0, 4.6, 5.2] are the predicted values.
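To repeat the prediction for every x value at once, the line equation can be wrapped in a small function. The function name predict is hypothetical and chosen only for this sketch; results are rounded to one decimal place to match the values above.

```python
def predict(x_values, b0, b1):
    """Apply y^ = b0 + b1 * x to each x value."""
    return [round(b0 + b1 * xi, 1) for xi in x_values]

x = [1, 2, 3, 4, 5]
y_pred = predict(x, b0=2.2, b1=0.6)

print(y_pred)  # [2.8, 3.4, 4.0, 4.6, 5.2]
```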

Step 5: Now we find the R² value to check how well the predicted values match the actual values. The formula is

R² = Σ(y^ – ȳ)² / Σ(y – ȳ)²

R² = 3.6 / 6 = 0.6
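The two sums in the R² formula can be verified with a few lines of plain Python. This sketch assumes the actual and predicted values listed in this article; explained and total are illustrative names for the top and bottom of the fraction.

```python
y = [2, 4, 5, 4, 5]                  # actual values
y_pred = [2.8, 3.4, 4.0, 4.6, 5.2]   # predicted values
y_mean = sum(y) / len(y)             # 4.0

# Top: how far the predictions move away from the mean of y.
explained = sum((p - y_mean) ** 2 for p in y_pred)

# Bottom: how far the actual values move away from the mean of y.
total = sum((yi - y_mean) ** 2 for yi in y)

r2 = explained / total
print(round(explained, 2), total, round(r2, 2))  # 3.6 6.0 0.6
```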

This completes Linear Regression using Excel.

Now let's see how Python's machine learning algorithm (Linear Regression) works with the same dataset used above. We will use a Jupyter notebook.

Step 1: Load the Python libraries that are needed.

Step 2: Read the dataset and display its first 3 rows.
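A possible version of these first two steps is sketched below. The article does not name its data file, so the sample values are typed straight into a pandas DataFrame here; with a real file, something like pd.read_csv would take its place. The column names "X" and "y" are assumptions.

```python
import pandas as pd

# Assumption: the file used in the article is not named, so the
# same five rows are entered by hand instead of being read from disk.
df = pd.DataFrame({"X": [1, 2, 3, 4, 5], "y": [2, 4, 5, 4, 5]})

# Show only the first 3 rows.
first_rows = df.head(3)
print(first_rows)
```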

Step 3: Plot the actual values on a scatter plot to see how they are distributed on the graph.
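One way to draw this scatter plot with matplotlib is sketched here. The axis labels, title and colour are assumptions, since the original figure is not available.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

fig, ax = plt.subplots()
ax.scatter(x, y, color="black")   # one dot per (X, y) pair
ax.set_xlabel("X")
ax.set_ylabel("y")
ax.set_title("Actual values")

# In a Jupyter notebook the figure is rendered at the end of the cell;
# in a plain script, call plt.show() to open it.
```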

Step 4: Fit the data to a Linear Regression model and find the slope, the y-intercept and the predicted values.
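A minimal sketch of this step with scikit-learn's LinearRegression follows. It assumes the sample data from this article and reshapes X into a single-column matrix, because the estimator expects a 2-D feature array.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Features must be 2-D (rows x columns), targets can stay 1-D.
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])

model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]          # b1
intercept = model.intercept_    # b0
y_pred = model.predict(X)       # y^ for every row of X

print(round(slope, 2), round(intercept, 2))
print(np.round(y_pred, 1))
```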

We get the same values for the slope and the y-intercept as in Excel (0.6 and 2.2). Both the manual (Excel) and the ML (Python) approaches predict the same values.

Let's put the predicted values into a pandas DataFrame and join it with the original table.
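The join could look roughly like this. The predicted values are typed in from the results above so that the example runs on its own, and the column name "y_pred" is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"X": [1, 2, 3, 4, 5], "y": [2, 4, 5, 4, 5]})

# Predicted values from the fitted line y^ = 2.2 + 0.6 * X.
pred = pd.DataFrame({"y_pred": [2.8, 3.4, 4.0, 4.6, 5.2]})

# join() lines the two tables up on their row index.
result = df.join(pred)
print(result)
```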

Let's plot this using Python's matplotlib library. The red line shows the predicted values and the black line shows the actual values.
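A plot along these lines can be produced as sketched below. Only the red and black colours come from the description above; the markers, labels and legend are assumptions.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]                  # actual values
y_pred = [2.8, 3.4, 4.0, 4.6, 5.2]   # predicted values

fig, ax = plt.subplots()
ax.plot(x, y, color="black", marker="o", label="Actual")
ax.plot(x, y_pred, color="red", label="Predicted")
ax.set_xlabel("X")
ax.set_ylabel("y")
ax.legend()

# In a Jupyter notebook the figure is rendered at the end of the cell;
# in a plain script, call plt.show() to open it.
```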

 

 

Now we calculate the R² value using the sklearn library.
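With scikit-learn this is a single call to r2_score. The sketch assumes the actual and predicted values from this article. Note that r2_score computes 1 minus the ratio of the residual sum of squares to the total sum of squares; for a least-squares line with an intercept this gives the same number as the formula used in the Excel section.

```python
from sklearn.metrics import r2_score

y_true = [2, 4, 5, 4, 5]             # actual values
y_pred = [2.8, 3.4, 4.0, 4.6, 5.2]   # predicted values

# Argument order matters: actual values first, predictions second.
r2 = r2_score(y_true, y_pred)
print(round(r2, 2))
```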

Here too the R² value is 0.6, which counts as a good fit by the rule of thumb given earlier: the fitted line explains 60% of the variation in the actual values. If R² is equal to 1.0, the actual and predicted values are identical.