Polynomial Regression
A cousin of linear regression for fitting non-linear data
Prerequisite
- Linear Regression from Scratch - learnml.hashnode.dev/linear-regression-with..
- Project on Linear Regression - learnml.hashnode.dev/project-predicting-sal..
Notebook
Colab Notebook: colab.research.google.com/drive/1GACG2UAucd..
Feel free to try out what you learn in this article.
Introduction
In linear regression, we made one key assumption: that the data is linear in nature. Simple linear regression can't handle non-linearity in the data. This can be solved by introducing polynomial terms (such as quadratic terms) into the function, and the result is called polynomial regression. Let's walk through these steps in detail and derive polynomial regression.
What is the problem with Linear regression?
Let us create a dataset,
# Imports
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import rcParams
plt.style.use('fivethirtyeight')
# figure size in inches
rcParams['figure.figsize'] = 11.7,5.27
np.random.seed(0)
x = 2 - 3 * np.random.normal(0, 1, 20)
y = x - 2 * (x ** 2) + 0.5 * (x ** 3) + np.random.normal(-3, 3, 20)
plt.scatter(x,y, s=10)
plt.show()
Output will look like this,
Now let us try applying Linear Regression on top of this dataset,
from sklearn.linear_model import LinearRegression
# transforming the data to include another axis
x = x[:, np.newaxis]
y = y[:, np.newaxis]
model = LinearRegression()
model.fit(x, y)
y_pred = model.predict(x)
plt.scatter(x, y, s=10)
plt.plot(x, y_pred, color='r')
plt.show()
The resultant linear model will look like this,
Now let us evaluate the model using regression evaluation metrics. Here we can use RMSE and the R² score.
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
rmse = np.sqrt(mean_squared_error(y, y_pred))
r2 = r2_score(y, y_pred)
print(f"RMSE of linear regression is {rmse}.")
print(f"R2 score of linear regression is {r2}")
We can see that the RMSE is pretty high and the R² score is low; the model is performing poorly. We can't bend a straight line to fit this data, since our line equation is
Y = mX + C
There is no quadratic term in our equation. 💡 Wait, did you get the idea? Why can't we introduce quadratic or higher-order polynomial terms into the equation to create a curve, and apply gradient descent on top of it to get the best-fit curve?
Rise of Polynomial Regression
Polynomial regression is a form of linear regression in which, to capture a non-linear relationship between the dependent and independent variables, we add polynomial terms to the linear model.
To generate a higher-order equation we can add powers of the original features as new features. The linear model,
Y = θ0 + θ1*X
can be transformed to,
Y = θ0 + θ1*X + θ2*X²
This is still considered a linear model, as the coefficients/weights associated with the features are still linear; X² is just another feature. However, the curve that we are fitting is quadratic in nature.
To convert the original features into their higher-order terms we will use the PolynomialFeatures class provided by scikit-learn. Next, we train the model using linear regression.
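The original code for this step isn't shown here, so below is a minimal sketch of the idea: recreate the same dataset as above, expand x into [x, x²] with PolynomialFeatures, and fit an ordinary LinearRegression on the expanded features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import PolynomialFeatures

# Recreate the dataset from earlier (same seed, same formula)
np.random.seed(0)
x = 2 - 3 * np.random.normal(0, 1, 20)
y = x - 2 * (x ** 2) + 0.5 * (x ** 3) + np.random.normal(-3, 3, 20)
x = x[:, np.newaxis]
y = y[:, np.newaxis]

# Expand x into [x, x^2]; include_bias=False leaves the
# intercept to LinearRegression instead of adding a column of 1s
poly = PolynomialFeatures(degree=2, include_bias=False)
x_poly = poly.fit_transform(x)

model = LinearRegression()
model.fit(x_poly, y)
y_pred = model.predict(x_poly)

rmse = np.sqrt(mean_squared_error(y, y_pred))
r2 = r2_score(y, y_pred)
print(f"RMSE of polynomial regression (degree 2) is {rmse}.")
print(f"R2 score of polynomial regression (degree 2) is {r2}")
```

Note that the model itself is unchanged; only the feature matrix is different, which is why this is still linear regression under the hood.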
It is quite clear from the plot that the quadratic curve is able to fit the data better than the linear line. Computing the RMSE and R²-score of the quadratic plot gives:
We can see that RMSE has decreased and R²-score has increased as compared to the linear line.
We can try increasing the degree of the polynomial and experimenting.
If we substitute degree = 20, we can see the line passing through most of the data points (the line wiggles more to hit those points), and this leads to overfitting.
Advantages of polynomial regression
- You can model non-linear relationships between variables
- There is a large range of different functions that you can use for fitting.
- Good for exploration purposes: you can test for the presence of curvature and its inflections.
Disadvantages of polynomial regression
- Even a single outlier in the data plot can seriously mess up the results.
- PR models are prone to overfitting. If enough parameters are used, you can fit anything. As John von Neumann reportedly said: “with four parameters I can fit an elephant, with five I can make him wiggle his trunk.”
- As a consequence of the above, polynomial regression models may not generalize well outside the range of the data they were fitted on.
Conclusion
Polynomial regression is a simple yet powerful tool for predictive analytics. It allows you to model non-linear relations between variables and make predictions with high accuracy. This type of regression can help you predict disease spread rate, calculate fair compensation, or build preventative road-safety software.
Interview Questions
- Can linear regression be used for representing quadratic equations?
- What are the problems with polynomial regression?
- Will overfitting occur in polynomial regression?
- What happens when the degree is increased to a very large value?
- What is the loss function of polynomial regression?