Large Movie Review Dataset
License: Unknown


This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training and 25,000 for testing. Additional unlabeled data is also included. Both raw text and preprocessed bag-of-words formats are provided. See the README file contained in the release for more details.
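The sketch below shows one way to read both formats in Python. It relies on assumptions that this page does not state and that you should check against the release's README: each raw review is a separate `.txt` file under `pos` and `neg` subfolders of a split directory, and each bag-of-words line uses a sparse SVMlight/LIBSVM-style layout, `label index:count ...`, whose label is the review's 1-10 rating. It also assumes that ratings <= 4 count as negative and ratings >= 7 as positive. The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def parse_bow_line(line: str) -> Tuple[int, Dict[int, int]]:
    """Parse one sparse bag-of-words line of the form 'label idx:count ...'.

    Assumption: the processed files use this SVMlight/LIBSVM-style layout,
    with the leading label holding the review's rating.
    """
    parts = line.split()
    label = int(parts[0])
    counts: Dict[int, int] = {}
    for token in parts[1:]:
        idx, cnt = token.split(":")
        counts[int(idx)] = int(cnt)
    return label, counts


def rating_to_polarity(rating: int) -> int:
    """Map a 1-10 rating to a binary label (1 = positive, 0 = negative).

    Assumption: negative reviews are rated <= 4 and positive ones >= 7.
    Middle ratings are not expected in the labeled splits, so they raise.
    """
    if rating <= 4:
        return 0
    if rating >= 7:
        return 1
    raise ValueError(f"rating {rating} is neither negative nor positive")


def load_raw_split(split_dir: str) -> List[Tuple[str, int]]:
    """Read (text, label) pairs from assumed 'neg'/'pos' subfolders of split_dir."""
    examples: List[Tuple[str, int]] = []
    for name, label in (("neg", 0), ("pos", 1)):
        # Sorting makes the order deterministic across file systems.
        for path in sorted(Path(split_dir, name).glob("*.txt")):
            examples.append((path.read_text(encoding="utf-8"), label))
    return examples
```

For example, `parse_bow_line("9 0:3 5:1")` gives `(9, {0: 3, 5: 1})`, and `rating_to_polarity(9)` maps that rating to the positive class. The parser stores counts in a plain dict to stay dependency-free. For training, a sparse matrix loader may be more practical.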


When using this dataset, please cite our ACL 2011 paper:

  author    = {Maas, Andrew L.  and  Daly, Raymond E.  and  Pham, Peter T.  and  Huang, Dan  and  Ng, Andrew Y.  and  Potts, Christopher},
  title     = {Learning Word Vectors for Sentiment Analysis},
  booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
  month     = {June},
  year      = {2011},
  address   = {Portland, Oregon, USA},
  publisher = {Association for Computational Linguistics},
  pages     = {142--150},
  url       = {}

Data Summary
Provided by: Stanford University
Stanford University is a private research university in Stanford, California. Its undergraduate admissions are among the most selective in the United States. Because of its academic strength, wealth, and proximity to Silicon Valley, it is often cited as one of the world's most prestigious universities.