ChID
Text
NLP
License: Apache-2.0

Overview

ChID: A Large-scale Chinese IDiom Dataset for Cloze Test

Data Format

One example is shown below:

{
    "content": "世锦赛的整体水平远高于亚洲杯,要如同亚洲杯那样“鱼与熊掌兼得”,就需要各方面密切配合、#idiom#。作为主帅的俞觉敏,除了得打破保守思想,敢于破格用人,还得巧于用兵、#idiom#、灵活排阵,指挥得当,力争通过比赛推新人、出佳绩、出新的战斗力。",

    "realCount": 2,
    "groundTruth": ["通力合作", "有的放矢"],
    "candidates": [
        ["凭空捏造", "高头大马", "通力合作", "同舟共济", "和衷共济", "蓬头垢面", "紧锣密鼓"],
        ["叫苦连天", "量体裁衣", "金榜题名", "百战不殆", "知彼知己", "有的放矢", "风流才子"]
    ]
}
  • content: The passage, with each original idiom replaced by the placeholder #idiom#
  • realCount: The number of placeholders (blanks) in the passage
  • groundTruth: The correct idioms, in the order of the blanks
  • candidates: One list of candidate idioms per blank, in the order of the blanks
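
The field descriptions above can be checked mechanically. The following is a minimal sketch, not part of the official ChID tooling: it parses one record (the passage is abridged from the example above), verifies that the four fields agree with each other, and fills the blanks with the correct idioms. The helper names `check_record`, `answer_indices`, and `fill_blanks` are hypothetical, and the sketch assumes that `realCount` always equals the number of `#idiom#` placeholders, as the field description suggests.

```python
import json

# One record in the documented format; the passage is abridged from the example above.
RECORD_JSON = """
{
    "content": "就需要各方面密切配合、#idiom#。还得巧于用兵、#idiom#、灵活排阵。",
    "realCount": 2,
    "groundTruth": ["通力合作", "有的放矢"],
    "candidates": [
        ["凭空捏造", "高头大马", "通力合作", "同舟共济", "和衷共济", "蓬头垢面", "紧锣密鼓"],
        ["叫苦连天", "量体裁衣", "金榜题名", "百战不殆", "知彼知己", "有的放矢", "风流才子"]
    ]
}
"""

PLACEHOLDER = "#idiom#"


def check_record(record):
    """Raise ValueError if the fields disagree about the blanks (hypothetical helper)."""
    n = record["realCount"]
    # Assumption: realCount matches the number of placeholders in the passage.
    if record["content"].count(PLACEHOLDER) != n:
        raise ValueError("placeholder count does not match realCount")
    if len(record["groundTruth"]) != n or len(record["candidates"]) != n:
        raise ValueError("groundTruth/candidates length does not match realCount")
    # Each correct idiom should appear among the candidates for its blank.
    for answer, options in zip(record["groundTruth"], record["candidates"]):
        if answer not in options:
            raise ValueError(f"answer {answer!r} is missing from its candidates")


def answer_indices(record):
    """Return, for each blank, the position of the correct idiom in its candidate list."""
    return [
        options.index(answer)
        for answer, options in zip(record["groundTruth"], record["candidates"])
    ]


def fill_blanks(record, idioms):
    """Replace the placeholders, in order, with the given idioms."""
    parts = record["content"].split(PLACEHOLDER)
    if len(idioms) != len(parts) - 1:
        raise ValueError("need exactly one idiom per placeholder")
    pieces = [parts[0]]
    for idiom, rest in zip(idioms, parts[1:]):
        pieces.append(idiom)
        pieces.append(rest)
    return "".join(pieces)


record = json.loads(RECORD_JSON)
check_record(record)
indices = answer_indices(record)
filled = fill_blanks(record, record["groundTruth"])
print(indices)
print(filled)
```

Turning each answer into an index over its candidate list is one common way to frame the cloze task as multiple-choice classification; whether the original authors trained their models this way is not stated here.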

Citation

The ChID dataset accompanies the paper "ChID: A Large-scale Chinese IDiom Dataset for Cloze Test".

If your research is related to or based on the ChID dataset (or the version adapted for the
competition), please cite the paper:

@inproceedings{zheng-etal-2019-chid,
    title = "{C}h{ID}: A Large-scale {C}hinese {ID}iom Dataset for Cloze Test",
    author = "Zheng, Chujie  and
      Huang, Minlie  and
      Sun, Aixin",
    booktitle = "Proceedings of the 57th Conference of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P19-1075",
    pages = "778--787",
}

License

Apache-2.0

Data Summary

Type: Text
Amount: 3.848K
Size: 195.83MB
Provided by: Chujie Zheng
I am Chujie Zheng (郑楚杰), a first-year Ph.D. student in the THUCoAI group, supervised by Prof. Minlie Huang. My main research interest is open-domain dialog. I received my B.Sc. from the Department of Physics, Tsinghua University.