Photo by Sangga Rima Roman Selia on Unsplash
Project - IMDB Sentiment Classification for 50K Reviews
Naive Bayes Classification and Deployment to Heroku with FastAPI
Resources
- Notebook - kaggle.com/syedjaferk/imdb-sentiment-classi..
- GitHub - github.com/syedjafer/Naive-Bayes-Sentiment-..
- Heroku Endpoint - classification-imdb-naivebayes.herokuapp.co..
- Dataset - kaggle.com/datasets/lakshmi25npathi/imdb-da..
Introduction
In this project we will build a sentiment classifier on the IMDB movie reviews dataset, then expose the model as an endpoint using FastAPI and deploy it to Heroku. This post is a very brief walkthrough of the project. Please try out different inputs on the endpoint.
Data cleaning and Preprocessing
In the dataset, the reviews currently contain some HTML tags (such as <br />) and some stray slashes (/).
We can remove the HTML tags using BeautifulSoup, punctuation using regex, and stopwords using nltk. After these cleaning steps, we lemmatize each word in the review. Please check out the notebook for details.
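The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the function names and the tiny STOPWORDS set are hypothetical (the notebook uses nltk's English stopword list), lemmatization is left out to keep the sketch free of nltk corpus downloads, and a simple regex is used as a fallback if BeautifulSoup is not installed.

```python
import re

try:
    from bs4 import BeautifulSoup  # the HTML parser mentioned in this post
except ImportError:  # keep the sketch runnable without bs4
    BeautifulSoup = None

# Tiny illustrative stopword set (an assumption); nltk's list is much larger.
STOPWORDS = {"a", "an", "and", "the", "is", "was", "it", "this", "of", "to", "in"}


def strip_html(text: str) -> str:
    """Drop HTML tags such as <br />, leaving a space where each tag was."""
    if BeautifulSoup is not None:
        return BeautifulSoup(text, "html.parser").get_text(separator=" ")
    return re.sub(r"<[^>]+>", " ", text)


def clean_review(text: str) -> str:
    """Lowercase, strip tags, drop non-letters, and remove stopwords."""
    text = strip_html(text).lower()
    # Replace punctuation, digits, and slashes with spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


print(clean_review("This movie was GREAT!<br /><br />Loved it / 10 out of 10."))
```

In the full pipeline, each remaining token would then be passed through a lemmatizer before vectorizing.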
Model Building
After preprocessing, we need to convert the text into numeric vectors. For the review text in the training data we use CountVectorizer, and for the target labels we use LabelEncoder. We then feed the data into Multinomial Naive Bayes (MultinomialNB) from sklearn and use grid search to find the best values for its parameters.
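As a rough sketch of this step, the snippet below fits the same three pieces on a toy dataset. The eight example reviews, the choice to tune only alpha, and the candidate values in the grid are all assumptions made for illustration; the notebook holds the actual training code and parameter grid.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the cleaned IMDB reviews and their sentiment labels.
reviews = [
    "great movie loved acting", "wonderful story great cast",
    "loved every minute wonderful", "great fun wonderful film",
    "terrible movie hated acting", "awful story boring cast",
    "hated every minute boring", "terrible boring awful film",
]
sentiments = ["positive"] * 4 + ["negative"] * 4

# Bag-of-words counts for the text.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)

# LabelEncoder sorts classes alphabetically: negative -> 0, positive -> 1.
encoder = LabelEncoder()
y = encoder.fit_transform(sentiments)

# Grid search over the smoothing parameter (hypothetical grid, 2-fold CV).
search = GridSearchCV(MultinomialNB(), param_grid={"alpha": [0.1, 0.5, 1.0]}, cv=2)
search.fit(X, y)

model = search.best_estimator_
new = vectorizer.transform(["wonderful great film"])
print(search.best_params_, encoder.inverse_transform(model.predict(new)))
```

Note that new text must go through vectorizer.transform (not fit_transform) so it is mapped onto the vocabulary learned during training.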
Model Saving
Now we need to save the model and the vectorizer using joblib, so that the FastAPI app can load these files and convert user input into the format our model expects.
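A minimal sketch of saving and reloading with joblib is shown below. The file names match the ones loaded by the API code in this post; the toy training data and the temporary output directory are assumptions used only to make the example self-contained.

```python
import tempfile
from pathlib import Path

from joblib import dump, load
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy fit so there is something to save; the real objects come from training.
texts = ["great wonderful film", "terrible boring film"]
labels = [1, 0]  # 1 = positive, 0 = negative
vectorizer = CountVectorizer().fit(texts)
classifier = MultinomialNB().fit(vectorizer.transform(texts), labels)

# Hypothetical output location; the project keeps the files next to the app.
out_dir = Path(tempfile.mkdtemp())
dump(vectorizer, out_dir / "vectors.joblib")
dump(classifier, out_dir / "classifier.joblib")

# Reload both artifacts and check that they still work together.
loaded_vectorizer = load(out_dir / "vectors.joblib")
loaded_classifier = load(out_dir / "classifier.joblib")
restored = loaded_classifier.predict(loaded_vectorizer.transform(["wonderful film"]))
print(int(restored[0]))
```

Both objects have to be saved: the classifier alone cannot interpret raw text without the vocabulary stored in the fitted vectorizer.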
FastAPI
from fastapi import FastAPI
from joblib import load
from pydantic import BaseModel

app = FastAPI()

# Load the fitted vectorizer and the trained classifier saved with joblib.
vector = load('vectors.joblib')
model = load('classifier.joblib')


class get_review(BaseModel):
    # Request body: the raw review text to classify.
    review: str


@app.get("/")
def read_root():
    return {"Hello": "World"}


@app.post("/prediction")
def get_prediction(gr: get_review):
    # Vectorize the single review with the vocabulary learned in training.
    text = [gr.review]
    vec = vector.transform(text)
    # predict() returns an array with one label; take its only element.
    prediction = int(model.predict(vec)[0])
    print(prediction)
    # Label 1 means positive; anything else is reported as negative.
    if prediction == 1:
        prediction = 'positive'
    else:
        prediction = 'negative'
    return {"sentence": gr.review, "prediction": prediction}
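To try the /prediction route, a client needs to POST a JSON body with a single review field. The sketch below uses only the standard library; the helper names are hypothetical, and BASE_URL is a placeholder for wherever the app is running (for example a local server), not the deployed address.

```python
import json
from urllib import request

# Placeholder address (an assumption); point this at your running app.
BASE_URL = "http://127.0.0.1:8000"


def build_prediction_request(review: str, base_url: str = BASE_URL) -> request.Request:
    """Build the POST request that the /prediction route expects."""
    payload = json.dumps({"review": review}).encode("utf-8")
    return request.Request(
        url=f"{base_url}/prediction",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(review: str, base_url: str = BASE_URL) -> dict:
    """Send the review to the API and return the decoded JSON response."""
    with request.urlopen(build_prediction_request(review, base_url)) as resp:
        return json.load(resp)


# Building the request does not need a running server; predict() does.
req = build_prediction_request("A wonderful, moving film.")
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

With the app running, predict("A wonderful, moving film.") should return a dictionary with the sentence and prediction keys produced by the route.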
Deployment
This application is now deployed to Heroku at classification-imdb-naivebayes.herokuapp.co...