Project - IMDB Sentiment Classification for 50K Reviews

Naive Bayes Classification and Deployment to Heroku with FastAPI

Resources

  1. Notebook - kaggle.com/syedjaferk/imdb-sentiment-classi..
  2. GitHub - github.com/syedjafer/Naive-Bayes-Sentiment-..
  3. Heroku Endpoint - classification-imdb-naivebayes.herokuapp.co..
  4. Dataset - kaggle.com/datasets/lakshmi25npathi/imdb-da..

Introduction

In this project, we build a sentiment classifier on the IMDB movie reviews, expose the model as an endpoint using FastAPI, and deploy it to Heroku. This post is a very brief walkthrough of the project. Please try out the endpoint with different inputs.

Data cleaning and Preprocessing

The reviews in the dataset currently contain some HTML tags and stray slashes (/).

We can remove the HTML tags using BeautifulSoup, punctuation using regular expressions, and stopwords using nltk. After these cleaning steps, we lemmatize each word in the review. Please check out the notebook for the full details.
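
The cleaning steps above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the notebook's code: Python's standard-library `html.parser` stands in for BeautifulSoup, a tiny hand-written `STOPWORDS` set stands in for nltk's English stopword list, and lemmatization is left out because it needs nltk's downloaded data. The names `strip_html` and `clean_review` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Tiny illustrative stopword set (an assumption); nltk's English list is much longer.
STOPWORDS = {"a", "an", "and", "the", "is", "was", "it", "this", "of", "to", "in", "i"}


class _TextExtractor(HTMLParser):
    """Collects only the text between tags, dropping the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw: str) -> str:
    """Return the text content of `raw` with HTML tags removed."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join(parser.parts)


def clean_review(raw: str) -> str:
    """Strip tags, lowercase, drop punctuation and stopwords."""
    text = strip_html(raw).lower()
    # Replace anything that is not a lowercase letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)
```

For example, `clean_review("This movie was GREAT!<br /><br />I loved it / a lot.")` returns `"movie great loved lot"`. In the real pipeline, a lemmatizer would then be applied to each remaining token.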

Model Building

After preprocessing, we need to convert the text into vectors. We use CountVectorizer for the review text in the training set and LabelEncoder for the target. We then feed the data into Multinomial Naive Bayes (MultinomialNB) from sklearn and use grid search to find the best values for its parameters.
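
As a rough sketch of this step, the snippet below runs the same sequence on a toy dataset: vectorize the text, encode the labels, then grid-search the Naive Bayes smoothing parameter. The eight sample reviews, the `alpha` grid, and `cv=2` are assumptions chosen so the example runs quickly; they are not the values used in the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the cleaned IMDB reviews and their sentiment labels.
reviews = [
    "great movie loved acting",
    "wonderful story brilliant cast",
    "loved every minute great fun",
    "brilliant film wonderful ending",
    "terrible movie hated acting",
    "awful story boring cast",
    "hated every minute boring mess",
    "awful film terrible ending",
]
sentiments = ["positive"] * 4 + ["negative"] * 4

# Bag-of-words counts for the text, integer codes for the labels.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)
encoder = LabelEncoder()
y = encoder.fit_transform(sentiments)  # classes are sorted: negative -> 0, positive -> 1

# Search over the smoothing strength; this grid is an assumption, not the notebook's.
param_grid = {"alpha": [0.1, 0.5, 1.0]}
search = GridSearchCV(MultinomialNB(), param_grid, cv=2, scoring="accuracy")
search.fit(X, y)

# The best estimator is refit on all the data and can classify new text.
model = search.best_estimator_
new_counts = vectorizer.transform(["boring awful movie"])
predicted = encoder.inverse_transform(model.predict(new_counts))
```

Because LabelEncoder sorts the class names, `negative` becomes 0 and `positive` becomes 1, which is the mapping the API code relies on. New text must go through `vectorizer.transform`, not `fit_transform`, so that it is mapped onto the vocabulary learned during training.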

Model Saving

Next, we save the model and the vectorizer using joblib, so that the FastAPI application can load these files and convert user input into the format the model expects.
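
A minimal sketch of the save and reload round trip is shown below. The file names `vectors.joblib` and `classifier.joblib` match the ones the API loads; the toy training data and the temporary output directory are assumptions made only to keep the example self-contained.

```python
import os
import tempfile

from joblib import dump, load
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data standing in for the real fitted objects.
texts = ["great movie", "loved film", "terrible movie", "hated film"]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

vectorizer = CountVectorizer().fit(texts)
model = MultinomialNB().fit(vectorizer.transform(texts), labels)

# Write into a temporary directory so the sketch does not clutter the
# working directory; the API expects the files next to the application code.
out_dir = tempfile.mkdtemp()
vector_path = os.path.join(out_dir, "vectors.joblib")
model_path = os.path.join(out_dir, "classifier.joblib")
dump(vectorizer, vector_path)
dump(model, model_path)

# Reload both objects, as the API does at startup, and compare predictions.
loaded_vectorizer = load(vector_path)
loaded_model = load(model_path)

sample = ["loved great film"]
original_pred = model.predict(vectorizer.transform(sample))
reloaded_pred = loaded_model.predict(loaded_vectorizer.transform(sample))
```

Saving the vectorizer together with the model matters: the classifier only understands the column layout of the vocabulary it was trained on, so the API needs the exact same fitted vectorizer to transform incoming text.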

FastAPI

from fastapi import FastAPI
from joblib import load
from pydantic import BaseModel


app = FastAPI()
# Load the fitted vectorizer and classifier once, at startup.
vector = load('vectors.joblib')
model = load('classifier.joblib')


class get_review(BaseModel):
    # Request body: the review text to classify.
    review: str


@app.get("/")
def read_root():
    return {"Hello": "World"}


@app.post("/prediction")
def get_prediction(gr: get_review):
    # Vectorize the single review, then take the one predicted label.
    vec = vector.transform([gr.review])
    prediction = int(model.predict(vec)[0])
    # A label of 1 is reported as positive, anything else as negative.
    label = 'positive' if prediction == 1 else 'negative'
    return {"sentence": gr.review, "prediction": label}
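
To try the `/prediction` route, a client has to POST a JSON body with a `review` field, matching the `get_review` model. The sketch below builds such a request with the standard library only. The address `http://127.0.0.1:8000` is an assumption (a local server, for instance one started with uvicorn), and the helper names `build_prediction_request` and `fetch_prediction` are hypothetical.

```python
import json
import urllib.request

# Assumed local address; substitute the deployed host to call the hosted endpoint.
BASE_URL = "http://127.0.0.1:8000"


def build_prediction_request(review: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request whose JSON body matches the `get_review` model."""
    body = json.dumps({"review": review}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/prediction",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_prediction(review: str, base_url: str = BASE_URL) -> dict:
    """Send the request and decode the JSON reply (needs a running server)."""
    request = build_prediction_request(review, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `fetch_prediction("I loved this film")` should return a dictionary with the keys `sentence` and `prediction`, as defined by the route above.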

Deployment

The application is now deployed to Heroku at classification-imdb-naivebayes.herokuapp.co...
