In this project, I implemented an algorithm in Python to predict whether a given movie review is positive or negative. I used the “bag of words” method to get the word frequencies and Naive Bayes to determine the class.
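To make the two steps concrete, here is a minimal sketch of bag-of-words counting followed by a multinomial Naive Bayes decision. It is not the code from the repository: the tokenizer, the add-one (Laplace) smoothing, the function names `tokenize`, `train`, and `predict`, and the toy reviews are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def train(docs):
    """docs is a list of (text, label) pairs; returns the counts used by predict."""
    word_counts = {}        # label -> Counter of word frequencies (the bag of words)
    doc_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in docs:
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict(text, model):
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    tokens = [t for t in tokenize(text) if t in vocab]  # skip unseen words
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # log prior + sum of log likelihoods, with add-one smoothing (assumed)
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data, invented for this example only.
toy_reviews = [
    ("a great and moving film", "pos"),
    ("great acting, loved it", "pos"),
    ("a boring and awful film", "neg"),
    ("awful plot, hated it", "neg"),
]
model = train(toy_reviews)
print(predict("loved this great film", model))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, which matters for long reviews.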

I completed this project for my Machine Learning class.

The Dataset 

The movie review dataset contains positive and negative reviews, each with a rating score: a negative review has a score of 4 or lower out of 10, and a positive review has a score of 7 or higher out of 10. The goal is to implement the Naive Bayes algorithm to predict the sentiment of a movie review.

  • It contains 50,000 labeled reviews stored as text files in separate folders (25,000 for training and 25,000 for validation).
  • The training and validation sets each include 12,500 positive reviews and 12,500 negative reviews. Each text file’s name includes its id and rating (id_rating.txt).
  • You can download the dataset from BBM4XX/BBM409_ML/Assignment_2/
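The file naming and rating thresholds described above could be turned into labels roughly as follows. This is a sketch under stated assumptions: the underscore separator between id and rating, the helper names `parse_name`, `label_from_rating`, and `load_reviews`, and the UTF-8 encoding are not confirmed by the dataset description.

```python
from pathlib import Path


def parse_name(filename):
    """Split a name such as '123_8.txt' into (id, rating).

    Assumes the id and rating are separated by an underscore.
    """
    review_id, rating = Path(filename).stem.split("_")
    return int(review_id), int(rating)


def label_from_rating(rating):
    # Thresholds taken from the dataset description: <= 4 negative, >= 7 positive.
    if rating <= 4:
        return "neg"
    if rating >= 7:
        return "pos"
    return None  # scores of 5 and 6 fall outside both classes


def load_reviews(folder):
    """Read every .txt file in a folder and return a list of (text, label) pairs."""
    reviews = []
    for path in sorted(Path(folder).glob("*.txt")):
        _, rating = parse_name(path.name)
        label = label_from_rating(rating)
        if label is not None:
            reviews.append((path.read_text(encoding="utf-8"), label))
    return reviews
```

Since the reviews already sit in separate folders, the label could also be read from the folder name; deriving it from the rating is just one option.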

All the files are on my GitHub, where you can browse the code and read my report on the project. Contributions are welcome.