Neural Networks with Numpy for Absolute Beginners, Part 2: Linear Regression

In this tutorial, you will learn to implement Linear Regression for prediction using Numpy in detail and also visualize how the algorithm learns epoch by epoch. In addition, you will explore two-layer Neural Networks.

By Suraj Donthi, Computer Vision Consultant & Course Instructor at DataCamp

In the previous tutorial, you got a very brief overview of a perceptron.

Neural Networks with Numpy for Absolute Beginners: Introduction

In this tutorial, you will dig deep into implementing a Linear Perceptron (Linear Regression), with which you'll be able to predict the outcome of a problem!

This tutorial will inevitably include a bit more math, but there's no need to worry, as I will explain it from the ground up. Regardless, keep in mind that all machine learning algorithms are essentially mathematical formulations that are finally implemented in the form of code.

Before we start off, remember that in the previous tutorial we used the threshold activation function to mimic the AND and NOR gates?

Here we will use another extremely simple activation function called the linear activation function (equivalent to not having any activation at all!).

Let us find out the wonders that this activation function can do!

Let's assume that there is only one input and a bias to the perceptron, as shown below. The resulting linear output (i.e., the sum) will be

    y = m * x + b

This is the equation of a straight line, as shown in the figure below. It must be noted here that when no activation function is used, we can say that the activation function is linear.

With multiple inputs, this generalizes to a multivariate (multiple variables) linear equation:

    y = w1 * x1 + w2 * x2 + ... + wn * xn + b
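As a minimal sketch (the function name here is mine, not from the article), a perceptron with one input, a bias, and a linear activation simply evaluates the equation of a straight line:

```python
def linear_perceptron(x, m, b):
    # Weighted sum of the single input plus the bias; applying no
    # activation function is equivalent to a linear activation.
    return m * x + b

# With slope m = 3 and intercept b = 1, the output for x = 2 is 3*2 + 1 = 7
y = linear_perceptron(2.0, 3.0, 1.0)
```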

Let us see how this is utilized for predicting the actual output of y in the next section, i.e., Linear Regression.

Fitting a linear equation to a given set of data in n-dimensional space is called Linear Regression. The GIF below shows an example of Linear Regression. In simple words, you try to find the values of m and b that best fit the set of points shown in the figure. Once you have obtained the best possible fit, you can predict the y values given x.

A very popular example is the housing price prediction problem. In this problem, you are given a set of values like the area of the house and the number of rooms as features, and you must predict the price of the house from these values.

So, the big question is: How does the prediction algorithm work? How does it learn to predict?

Let's start by importing the required packages.

    # Numpy for efficient Matrix and mathematical operations.
    import numpy as np

    # Pandas for table and other related operations
    import pandas as pd

    # Matplotlib for visualizing graphs
    import matplotlib.pyplot as plt
    from matplotlib.pylab import rcParams

    # Sklearn for creating a dataset
    from sklearn.datasets import make_regression

    # train_test_split for splitting the data into training and testing data
    from sklearn.model_selection import train_test_split

    %matplotlib inline

    # Set parameters for plotting
    params = {'axes.titlesize': 'xx-large',  # Set title size
              'axes.labelsize': 'x-large',   # Set label size
              'figure.figsize': (8, 6)}      # Set a figure size
    rcParams.update(params)

You'll use the sklearn dataset generator for creating the dataset. You will also use the package for splitting the data into training and test data. If you are not aware of sklearn, it is a rich package with many machine learning algorithms. Although it provides pre-built functions for performing linear regression, you are going to build it from scratch in this tutorial.

For creating the dataset, you must first set a list of hyperparameters. While m and b are parameters, the number of samples, the number of input features, the number of neurons, the learning rate, and the number of iterations/epochs for training are called hyperparameters. You shall learn about these hyperparameters as you implement the algorithm.

For now, you shall set the number of training samples, the number of input features, the learning rate and epochs. You shall understand learning rate and epochs in a short while.
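A minimal sketch of these settings follows; the article uses 200 samples with a single feature, while the learning rate and epoch count shown here are illustrative placeholders you can tune later:

```python
# Hyperparameters for generating the dataset and training the model.
M = 200              # Number of training samples
n = 1                # Number of input features
learning_rate = 0.01 # Step size used when updating m & b (illustrative value)
epochs = 60          # Number of passes over the training data (illustrative value)
```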

Your first task is to import or generate the data. In this tutorial, you'll generate the dataset using sklearn's make_regression function.

For the purpose of learning, we shall keep the number of features minimal so that it is easy to visualize. Hence, you must choose only one feature.
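A sketch of the generation step, assuming 200 samples and one feature as above (the `noise` and `random_state` values are my own choices, not the article's):

```python
from sklearn.datasets import make_regression

# Generate 200 samples with a single input feature; `noise` spreads the
# points around the underlying line so the fit is non-trivial.
X, y = make_regression(n_samples=200, n_features=1, noise=25, random_state=0)
```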

Now, it's time to visualize what the data generator has cooked up!

    def plot_graph(X, y):
        # Plot the original set of datapoints
        _ = plt.scatter(X, y, alpha=0.8)
        _ = plt.title('Plot of Datapoints generated')
        _ = plt.xlabel('x')
        _ = plt.ylabel('y')
        plt.show()

Let's check the shape of the vectors for consistency.

We need to reset the shape of y to (200, 1) so that we do not get errors during vector multiplications.
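The reshape step can be sketched as follows (regenerating the data here so the snippet is self-contained; the `noise` and `random_state` values are my own choices):

```python
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=1, noise=25, random_state=0)

# y comes out as a flat vector of shape (200,); reshape it into a column
# vector of shape (200, 1) so later vector multiplications broadcast correctly.
y = y.reshape(-1, 1)
```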

Next you will have to split the dataset into train and test sets, so that you can test the accuracy of the regression model using a part of the dataset once you have trained the model.

Now let's split the data into a train set and a test set.

In our case, the training set is 80% and the test set is 20%.

Let's check the shapes of the train and test datasets created.

As you can see, 80% of the data, i.e., 80% of 200 data points, is 160, which is correct.
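The split and shape check described above can be sketched as follows (again regenerating the data so the snippet runs on its own; the `random_state` values are my own choices):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=1, noise=25, random_state=0)
y = y.reshape(-1, 1)

# Hold out 20% of the data for testing; 80% of 200 samples = 160 for training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

print(X_train.shape, X_test.shape)
```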

We have done the initial data preprocessing and also explored the data by visualizing it. This is typically the first step while modeling any machine learning algorithm. We have also split the data for testing the accuracy of the model once it has been trained.

Clearly, as shown in the Linear Regression GIF above, we need to start with a random line and then fit it to the data through training.

Therefore, the next step is to randomly generate a line with a random slope and intercept (bias). The goal is to achieve the best fit for the line.

    # Function to generate parameters of the linear regression model, m & b.
    def init_params():
        m = np.random.normal(scale=10)
        b = np.random.normal(scale=10)
        return m, b

Now, given m & b, we can plot the line generated.

Let's update the function plot_graph to show the predicted line too.

    def plot_graph(dataset, pred_line=None):
        X, y = dataset['X'], dataset['y']

        # Plot the set of datapoints
        _ = plt.scatter(X, y, alpha=0.8)

        if pred_line is not None:
            x_line, y_line = pred_line['x_line'], pred_line['y_line']

            # Plot the randomly generated line
            _ = plt.plot(x_line, y_line, linewidth=2, markersize=12,
                         color='red', alpha=0.8)
            _ = plt.title('Random Line on set of Datapoints')
        else:
            _ = plt.title('Plot of Datapoints')

        _ = plt.xlabel('x')
        _ = plt.ylabel('y')
        plt.show()

    # Function to plot the predicted line
    def plot_pred_line(X, y, m, b):
        # Generate a set of datapoints on x for creating a line.
        x_line = np.linspace(np.min(X), np.max(X), 10)

        # Calculate the corresponding y with random values of m & b
        y_line = m * x_line + b

        dataset = {'X': X, 'y': y}
        pred_line = {'x_line': x_line, 'y_line': y_line}

        plot_graph(dataset, pred_line)
        return

Since the line is now generated, you'll need to predict the values it produces for a given value of x. From these values, all there is to do is calculate the mean squared error. Why?

How could we find the difference between the actual output and the predicted output?

The simplest way would be to just subtract the two. We have a random line that gives an output y_pred for every x, but it's surely not the actual output. Luckily, we have the actual output for every x too! So instead of taking the difference directly (which is technically called the absolute or L1 distance), we square it (related to the Euclidean or L2 distance) and take the mean over all the given points, and this is called the Mean Squared Error.

Let us now predict the values of y_pred from the parameters m & b, given the datapoints X_train, by defining a function forward_prop.
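A minimal sketch of this step follows. The name forward_prop comes from the article; compute_cost is my own helper for the Mean Squared Error described above:

```python
import numpy as np

def forward_prop(X, m, b):
    # The linear model y = m * x + b, applied element-wise to all datapoints
    return m * X + b

def compute_cost(y, y_pred):
    # Mean of the squared differences between actual and predicted outputs
    return np.mean(np.square(y - y_pred))

# A perfect fit (the same m & b that generated y) gives zero cost
X = np.array([[1.0], [2.0], [3.0]])
y = 2 * X + 1
cost = compute_cost(y, forward_prop(X, 2.0, 1.0))
```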
