Fake Review Detection System using NLP and ML

Introduction

In the age of online shopping, product reviews play a crucial role in influencing customer decisions. However, not all reviews are genuine: some are fake, written either to promote products or to sabotage competitors. To tackle this issue, we developed a Fake Review Detection System using Natural Language Processing (NLP) and Machine Learning, wrapped in a user-friendly Streamlit web interface.

This project enables users to upload a CSV file of product reviews and receive two separate downloadable files: one containing real reviews and the other with fake reviews.


🎓 What You Will Learn

  • How to preprocess review data
  • How to train an NLP-based ML model
  • How to classify fake vs. real reviews
  • How to create a web interface using Streamlit
  • How to handle file upload and download in a web app

Tech Stack

  • Frontend: Streamlit (Python-based web framework)
  • Backend: Logistic Regression with TF-IDF Vectorizer
  • Language: Python
  • Libraries: Pandas, NumPy, scikit-learn, re, string

Streamlit App Flow

1. Upload the CSV file

Users upload a CSV containing product review data.

2. NLP Model Processes Reviews

The system preprocesses the text (lowercasing, punctuation and digit removal) and uses a pre-trained TF-IDF + Logistic Regression model to classify reviews.
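
The preprocessing step described above can be sketched as follows. This is a minimal illustration using only the standard library; the function name preprocess is an assumption, and the final whitespace cleanup is an extra step not mentioned in the post.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text):
    """Lowercase the text, then remove punctuation and digits."""
    text = str(text).lower()
    text = text.translate(_PUNCT_TABLE)  # remove punctuation
    text = re.sub(r"\d+", "", text)      # remove digits
    # Assumption: collapse the gaps left behind by removed characters.
    return re.sub(r"\s+", " ", text).strip()


print(preprocess("Love this! Well made, sturdy. 10/10"))
```

Applying the same function to both the training text and the uploaded reviews keeps the vocabulary consistent between training and prediction.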

3. Download the Results

Two downloadable CSVs are generated: real_reviews.csv and fake_reviews.csv.
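
One way to produce the two files is to split the uploaded table by predicted class and serialize each half. The sketch below assumes predictions use 1 for real and 0 for fake; the helper name split_reviews and the added prediction column are hypothetical.

```python
import pandas as pd


def split_reviews(df, predictions):
    """Return (real_csv, fake_csv) as UTF-8 bytes; 1 = real, 0 = fake."""
    labeled = df.assign(prediction=list(predictions))
    real = labeled[labeled["prediction"] == 1]
    fake = labeled[labeled["prediction"] == 0]
    return (
        real.to_csv(index=False).encode("utf-8"),
        fake.to_csv(index=False).encode("utf-8"),
    )


# Tiny demonstration with hand-written predictions.
demo = pd.DataFrame({"text_": ["Love this!", "Missing information."]})
real_csv, fake_csv = split_reviews(demo, [1, 0])
print(real_csv.decode("utf-8"))
```

In a Streamlit app, each bytes object could then be passed as the data argument of st.download_button, with file_name set to real_reviews.csv or fake_reviews.csv.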


📂 Required CSV Format

Ensure your file follows this structure:

category            | rating | label | text_
Home_and_Kitchen_5  | 5      | CG    | Love this! Well made, sturdy.
Home_and_Kitchen_5  | 1      | OR    | Missing information on how to use it.

  • category: Product category
  • rating: Star rating (1-5)
  • label: CG for genuine; any other value, such as OR, is treated as fake
  • text_: The review content
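
A column check along these lines can reject a malformed file before any classification runs. This is a sketch, not the app's actual code: load_reviews and REQUIRED_COLUMNS are hypothetical names, and the sample rows assume the category value in the example is Home_and_Kitchen_5.

```python
import io

import pandas as pd

REQUIRED_COLUMNS = ["category", "rating", "label", "text_"]


def load_reviews(csv_source):
    """Read a review CSV and fail early if a required column is missing."""
    df = pd.read_csv(csv_source)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing required column(s): {', '.join(missing)}")
    return df


# The review text contains a comma, so it must be quoted in the CSV.
sample = io.StringIO(
    "category,rating,label,text_\n"
    'Home_and_Kitchen_5,5,CG,"Love this! Well made, sturdy."\n'
    "Home_and_Kitchen_5,1,OR,Missing information on how to use it.\n"
)
reviews = load_reviews(sample)
print(reviews.shape)
```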

🔮 How Fake/Real Is Determined

We used TF-IDF (Term Frequency-Inverse Document Frequency) to transform text data into numerical vectors. Then, we trained a Logistic Regression model using labeled data:

  • Label CG is treated as real (1)
  • Others are treated as fake (0)
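
The training step can be sketched with a scikit-learn pipeline. The four training texts below are made up for illustration and are not the project's dataset; max_iter=1000 and the default TfidfVectorizer settings are assumptions as well.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up training set, for illustration only.
texts = [
    "love this well made sturdy",
    "great quality works as described",
    "missing information on how to use it",
    "stopped working after a week",
]
raw_labels = ["CG", "CG", "OR", "OR"]

# Label mapping as described above: CG -> real (1), anything else -> fake (0).
y = [1 if label == "CG" else 0 for label in raw_labels]

# TF-IDF turns each review into a weighted term vector;
# Logistic Regression then learns a linear decision boundary over those vectors.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, y)

predictions = model.predict(["sturdy and well made"])
print(predictions)
```

Bundling the vectorizer and classifier in one pipeline means new reviews are transformed with the vocabulary learned during training, so predict can be called directly on raw strings.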

Full Streamlit Code Overview

The app is a single Python file:

  • Loads and trains a model using a sample dataset
  • Allows CSV upload
  • Validates column structure
  • Preprocesses text
  • Classifies reviews
  • Generates download buttons for real and fake reviews
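
The steps listed above might fit together roughly as in the skeleton below. It is a hedged sketch rather than the original app: SAMPLE is a hypothetical stand-in for the labeled sample dataset, and the helper names (preprocess, train_model, classify) are assumptions.

```python
import re
import string

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

REQUIRED_COLUMNS = ["category", "rating", "label", "text_"]

# Hypothetical stand-in for the labeled sample dataset the app trains on.
SAMPLE = pd.DataFrame({
    "category": ["Home_and_Kitchen_5"] * 4,
    "rating": [5, 4, 1, 2],
    "label": ["CG", "CG", "OR", "OR"],
    "text_": [
        "Love this! Well made, sturdy.",
        "Great quality, works as described.",
        "Missing information on how to use it.",
        "Stopped working after a week.",
    ],
})

_STRIP = re.compile(f"[{re.escape(string.punctuation)}0-9]")


def preprocess(text):
    """Lowercase, then drop punctuation and digits."""
    return _STRIP.sub("", str(text).lower())


def train_model(sample):
    """Fit TF-IDF + Logistic Regression; CG -> real (1), anything else -> fake (0)."""
    y = (sample["label"] == "CG").astype(int)
    model = make_pipeline(
        TfidfVectorizer(preprocessor=preprocess),
        LogisticRegression(max_iter=1000),
    )
    return model.fit(sample["text_"], y)


def classify(model, df):
    """Validate the columns, then return (real_df, fake_df)."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing column(s): {', '.join(missing)}")
    preds = model.predict(df["text_"].fillna("").astype(str))
    return df[preds == 1], df[preds == 0]


def main():
    import streamlit as st  # imported here so the helpers work without Streamlit

    st.title("Fake Review Detection System")
    model = train_model(SAMPLE)

    uploaded = st.file_uploader("Upload review CSV", type=["csv"])
    if uploaded is None:
        return
    try:
        real, fake = classify(model, pd.read_csv(uploaded))
    except ValueError as err:
        st.error(str(err))
        return

    st.download_button("Download real reviews", real.to_csv(index=False).encode("utf-8"),
                       file_name="real_reviews.csv", mime="text/csv")
    st.download_button("Download fake reviews", fake.to_csv(index=False).encode("utf-8"),
                       file_name="fake_reviews.csv", mime="text/csv")


if __name__ == "__main__":
    main()
```

Keeping the model and validation logic in plain functions, separate from the Streamlit calls in main, makes them easy to exercise without starting the web app.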

🚀 How to Run the App

  1. Save the app as fake_review_app.py
  2. Install dependencies:
       pip install streamlit pandas numpy scikit-learn
  3. Run the Streamlit app:
       streamlit run fake_review_app.py
  4. Open your browser at http://localhost:8501
  5. Upload your review CSV and download the results.

Report

The report will include:

✅ Abstract
✅ Introduction (Overview, Problem Statement, Motivation)
✅ Literature Review
✅ Existing System & Drawbacks
✅ Proposed System
✅ System Architecture (Diagrams)
✅ System Specifications
✅ Experimental Design Diagrams
✅ Implementation (Setup, Modules, Sample Code)
✅ System Testing
✅ Results & Screenshots
✅ Conclusion & Future Scope
✅ References
