UD-Chinese-GSD
Text
NLP
License: CC BY-SA 4.0

Overview

Traditional Chinese Universal Dependencies Treebank annotated and converted by Google.

Tokenization and Word Segmentation

  • This corpus contains 4997 sentences and 123291 tokens.
  • This corpus contains 122962 tokens (about 99.7%, shown as 100% after rounding) that are not followed by a space.
  • This corpus does not contain words with spaces.
  • This corpus contains 41 types of words that contain both letters and punctuation. Examples:
    #A, DC-10, km/h, #B, #C, #D, #E, #F, #G, -an, A-AVG, AK-47, Arzacq-Arraziguet, Beaune-Sud,
    Berne-Belp, CI-7957, CRH380B-002, F-15A, F-16A, Frito-Lay, It's, Kink.com, MD-11, Micro-USM,
    NX-01, Navy's, O., P-700, Pre-rendering, S-IVB, TVS-5, Tu-16, Uhler-Phillips, al-Banna, f(x),
    g(x), t.163.com, t.qq.com, t.sina.com.cn, t.sohu.com, t.xxxx.com
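
The counts above can be approximated with a short script. The sketch below is a minimal illustration, not the tool that produced the published figures. It assumes the treebank is read from CoNLL-U text (the standard Universal Dependencies file format), that "not followed by a space" corresponds to `SpaceAfter=No` in the MISC column, and that "letters and punctuation" can be tested with Unicode general categories. The function names and the toy `SAMPLE` sentences are invented for illustration, so the numbers it yields on real data may differ slightly from the official statistics.

```python
import unicodedata


def has_letters_and_punctuation(form):
    """True if the word form mixes Unicode letters (L*) and punctuation (P*)."""
    categories = [unicodedata.category(ch) for ch in form]
    has_letter = any(cat.startswith("L") for cat in categories)
    has_punct = any(cat.startswith("P") for cat in categories)
    return has_letter and has_punct


def corpus_stats(conllu_text):
    """Count sentences, words, no-space words, and mixed word types."""
    sentences = 0
    tokens = 0
    no_space_after = 0
    mixed_types = set()
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            in_sentence = False  # a blank line ends the current sentence
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U word line
        token_id, form, misc = cols[0], cols[1], cols[9]
        # Assumption: skip multiword ranges ("1-2") and empty nodes ("1.1").
        if "-" in token_id or "." in token_id:
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        tokens += 1
        if "SpaceAfter=No" in misc.split("|"):
            no_space_after += 1
        if has_letters_and_punctuation(form):
            mixed_types.add(form)
    return {
        "sentences": sentences,
        "tokens": tokens,
        "no_space_after": no_space_after,
        "mixed_types": mixed_types,
    }


def _row(token_id, form, upos, head, deprel, misc):
    """Build one tab-separated CoNLL-U line for the toy sample."""
    return "\t".join([token_id, form, form, upos, "_", "_", head, deprel, "_", misc])


# Toy sample invented for illustration; it is not taken from the treebank.
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    _row("1", "他", "PRON", "2", "nsubj", "SpaceAfter=No"),
    _row("2", "用", "VERB", "0", "root", "SpaceAfter=No"),
    _row("3", "DC-10", "PROPN", "2", "obj", "SpaceAfter=No"),
    _row("4", "。", "PUNCT", "2", "punct", "SpaceAfter=No"),
    "",
    "# sent_id = toy-2",
    _row("1", "km/h", "NOUN", "0", "root", "_"),
    _row("2", "ok", "X", "1", "dep", "_"),
    "",
])

if __name__ == "__main__":
    print(corpus_stats(SAMPLE))
```

Counting word types with a set, rather than occurrences, mirrors the wording "41 types of words" in the list above: a form such as DC-10 is counted once no matter how often it appears.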


License

CC BY-SA 4.0

Data Summary

  • Type: Text
  • Amount: --
  • Size: 400.69MB
  • Provided by: Universal Dependencies

Universal Dependencies (UD) is a framework for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) across different human languages. UD is an open community effort with over 300 contributors producing more than 150 treebanks in 90 languages.
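
To make the three annotation layers concrete, here is a minimal sketch of reading one word line from a CoNLL-U file, the tab-separated format in which UD treebanks are distributed. The `Word` class, the `parse_word` function, and the example line are hypothetical and written only for illustration; the sketch assumes an ordinary word line with an integer HEAD, not a multiword range or empty node.

```python
from dataclasses import dataclass


@dataclass
class Word:
    form: str     # surface form
    upos: str     # universal part-of-speech tag
    feats: dict   # morphological features, e.g. {"Number": "Plur"}
    head: int     # ID of the syntactic head (0 means root)
    deprel: str   # dependency relation to the head


def parse_word(line):
    """Parse one CoNLL-U word line into the three annotation layers."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError("expected 10 tab-separated CoNLL-U columns")
    # FEATS is "_" when empty, otherwise "Name=Value" pairs joined by "|".
    if cols[5] == "_":
        feats = {}
    else:
        feats = dict(pair.split("=", 1) for pair in cols[5].split("|"))
    return Word(form=cols[1], upos=cols[3], feats=feats,
                head=int(cols[6]), deprel=cols[7])


# Invented example line (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC).
EXAMPLE_LINE = "\t".join(
    ["3", "books", "book", "NOUN", "NNS", "Number=Plur", "2", "obj", "_", "_"]
)

if __name__ == "__main__":
    print(parse_word(EXAMPLE_LINE))
```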