Stanford Sentiment Treebank
Text
NLP
License: Unknown

Overview

This dataset includes:

  1. original_rt_snippets.txt contains 10,605 processed snippets from the original pool of Rotten
    Tomatoes HTML files. Note that some snippets may contain multiple sentences.
  2. dictionary.txt contains all phrases and their IDs, separated by a vertical bar (|).
  3. sentiment_labels.txt contains all phrase IDs and the corresponding sentiment labels,
    separated by a vertical bar (|).
  4. SOStr.txt and STree.txt encode the structure of the parse trees. STree.txt encodes the trees
    in a parent-pointer format. Each line corresponds to one sentence in the datasetSentences.txt
    file. The Matlab code for the paper cited below shows how to read this format if you are not
    familiar with it.
  5. datasetSentences.txt contains the sentence index, followed by the sentence string,
    separated by a tab. These are the sentences of the train/dev/test sets.
  6. datasetSplit.txt contains the sentence index (corresponding to the index in the
    datasetSentences.txt file), followed by the set label, separated by a comma:
    1 = train, 2 = test, 3 = dev.
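
The file formats listed above can be read with a few small parsers. The following is a minimal Python sketch, not the dataset authors' code: all function names are hypothetical, the sentiment labels are assumed to be numeric, and any line whose ID field is not a number (for example a header row, if the files have one) is assumed to be skippable. The parent-pointer decoding additionally assumes that STree.txt separates entries with a vertical bar, that node numbers are 1-based positions in the line, and that the root stores 0 as its parent; check these against the Matlab code before relying on them.

```python
from collections import defaultdict

# Set labels as stated in the file list: 1=train, 2=test, 3=dev.
SPLIT_NAMES = {1: "train", 2: "test", 3: "dev"}


def parse_dictionary(lines):
    """Map phrase -> phrase ID from 'phrase|id' lines (dictionary.txt)."""
    phrase_to_id = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the LAST bar, in case a phrase itself contains '|'.
        phrase, sep, raw_id = line.rpartition("|")
        if sep and raw_id.strip().isdigit():
            phrase_to_id[phrase] = int(raw_id)
    return phrase_to_id


def parse_labels(lines):
    """Map phrase ID -> label from 'id|label' lines (sentiment_labels.txt)."""
    id_to_label = {}
    for line in lines:
        raw_id, sep, raw_label = line.strip().partition("|")
        # Assumption: lines with a non-numeric ID (e.g. a header) are skipped.
        if sep and raw_id.isdigit():
            id_to_label[int(raw_id)] = float(raw_label)
    return id_to_label


def parse_sentences(lines):
    """Map sentence index -> sentence from tab-separated lines (datasetSentences.txt)."""
    index_to_sentence = {}
    for line in lines:
        raw_idx, sep, sentence = line.rstrip("\n").partition("\t")
        if sep and raw_idx.isdigit():
            index_to_sentence[int(raw_idx)] = sentence
    return index_to_sentence


def parse_splits(lines):
    """Map set name -> sentence indices from 'index,label' lines (datasetSplit.txt)."""
    splits = defaultdict(list)
    for line in lines:
        raw_idx, sep, raw_label = line.strip().partition(",")
        if sep and raw_idx.isdigit() and raw_label.isdigit():
            splits[SPLIT_NAMES[int(raw_label)]].append(int(raw_idx))
    return dict(splits)


def children_from_parents(stree_line):
    """Decode one STree.txt line into (root, {parent: [children]}).

    Assumption: entry i (1-based) holds the parent of node i, and 0 marks the root.
    """
    parents = [int(p) for p in stree_line.strip().split("|")]
    children = defaultdict(list)
    root = None
    for node, parent in enumerate(parents, start=1):
        if parent == 0:
            root = node
        else:
            children[parent].append(node)
    return root, dict(children)
```

Each parser accepts any iterable of lines, so an open file can be passed directly, for example `with open("dictionary.txt", encoding="utf-8") as f: phrases = parse_dictionary(f)`. Joining the result of `parse_dictionary` with that of `parse_labels` on the phrase ID gives a phrase-to-label lookup.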

Citation

Please use the following citation when referencing the dataset:

@incollection{SocherEtAl2013:RNTN,
title = {{Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank}},
author = {Richard Socher and Alex Perelygin and Jean Wu and Jason Chuang and Christopher Manning
and Andrew Ng and Christopher Potts},
booktitle = {{EMNLP}},
year = {2013}
}

Data Summary

Type: Text
Amount: --
Size: 11.38MB
Provided by: Stanford
Stanford University, officially Leland Stanford Junior University, is a private research university in Stanford, California.