C3
Text
NLP
License: Custom

Overview

C3 is the first free-form multiple-Choice Chinese machine reading Comprehension dataset, containing
13,369 documents (dialogues or more formally written mixed-genre texts) and their associated
19,577 multiple-choice free-form questions collected from Chinese-as-a-second-language examinations.

Data Format

data/c3-{m,d}-{train,dev,test}.json: the dataset files, where m and d represent "mixed-genre"
and "dialogue", respectively. The data format is as follows.

[
  [
    [
      document 1
    ],
    [
      {
        "question": document 1 / question 1,
        "choice": [
          document 1 / question 1 / answer option 1,
          document 1 / question 1 / answer option 2,
          ...
        ],
        "answer": document 1 / question 1 / correct answer option
      },
      {
        "question": document 1 / question 2,
        "choice": [
          document 1 / question 2 / answer option 1,
          document 1 / question 2 / answer option 2,
          ...
        ],
        "answer": document 1 / question 2 / correct answer option
      },
      ...
    ],
    document 1 / id
  ],
  [
    [
      document 2
    ],
    [
      {
        "question": document 2 / question 1,
        "choice": [
          document 2 / question 1 / answer option 1,
          document 2 / question 1 / answer option 2,
          ...
        ],
        "answer": document 2 / question 1 / correct answer option
      },
      {
        "question": document 2 / question 2,
        "choice": [
          document 2 / question 2 / answer option 1,
          document 2 / question 2 / answer option 2,
          ...
        ],
        "answer": document 2 / question 2 / correct answer option
      },
      ...
    ],
    document 2 / id
  ],
  ...
]
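
The nested layout above can be flattened into one record per question. The following is a minimal sketch, not an official loader: the names `Example`, `split_paths`, `iter_examples`, and `answer_index` are hypothetical, and it assumes that the document list holds one or more text segments (joined here with newlines) and that each "answer" string appears verbatim among its "choice" entries.

```python
import json
from pathlib import Path
from typing import Dict, Iterator, List, NamedTuple, Tuple


class Example(NamedTuple):
    """One question together with its source document (hypothetical record type)."""
    doc_id: str
    document: str
    question: str
    choices: List[str]
    answer: str


def split_paths(data_dir: str) -> Dict[Tuple[str, str], Path]:
    """Expand data/c3-{m,d}-{train,dev,test}.json into six concrete paths."""
    return {
        (genre, split): Path(data_dir) / f"c3-{genre}-{split}.json"
        for genre in ("m", "d")  # m = mixed-genre, d = dialogue
        for split in ("train", "dev", "test")
    }


def iter_examples(path: Path) -> Iterator[Example]:
    """Yield one flat Example per question in a file with the layout shown above."""
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    # Each entry is a three-item list: [document segments, questions, document id].
    for segments, questions, doc_id in entries:
        # Assumption: the document list may hold several text segments.
        document = "\n".join(segments)
        for q in questions:
            yield Example(doc_id, document, q["question"], q["choice"], q["answer"])


def answer_index(example: Example) -> int:
    """Position of the correct option; assumes the answer text matches a choice exactly."""
    return example.choices.index(example.answer)
```

For instance, `list(iter_examples(split_paths("data")[("d", "dev")]))` would read the dialogue dev split, provided the files sit in a local `data` directory. Converting the answer text to an index is a convenience for classifiers that score each option; `answer_index` raises `ValueError` if the answer does not match any option.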

Citation

@article{sun2019investigating,
  title={Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension},
  author={Sun, Kai and Yu, Dian and Yu, Dong and Cardie, Claire},
  journal={Transactions of the Association for Computational Linguistics},
  year={2020},
  url={https://arxiv.org/abs/1904.09679v3}
}

License

Custom

Data Summary

Type: Text
Amount: 19.577K
Size: 3.09MB
Provided by: dataset.org
Our basic research areas include computer vision, speech recognition, natural language processing, and machine learning. Our applied work draws on Tencent's scenarios and business strengths to build AI in four categories: content, games, social networking, and platform-based tools. At present, our Weiqi (Go) AI is known as 'superb art', and our technology has been applied in Weixin, QQ, Daily Express, and QQ Music, among hundreds of Tencent products.