  • C-003870: Academia Sinica Tagged Corpus of Early Mandarin Chinese
    The web version of the Sinica Early Chinese Corpus was opened to the research community in November 2001. The literature available for searching is "Hong-lou-mong" and "San-sui-ping-yao-zhuan". Although the search function and the criteria for segmentation and tagging are mostly the same as those of the Sinica Corpus for modern Chinese, the corpus has features of its own. For example, search results give the part of speech together with the lemma, as well as the source of the cited sentence, which helps researchers trace references. The segmentation and tagging standard was also adjusted, because its focus differs from that of the standard for analyzing the modern language. For instance, the verb-complement structure is labelled in more detail in the Sinica Early Chinese Corpus.
    Scholarly Exchange and Institute of History and Philology in Academia Sinica. The objective at that time was only to collect the raw text. Since then, the collection of the raw corpus has never stopped. The collected data has extended from Primitive Chinese to Medieval Chinese and Early Chinese. The collection work is mainly managed by Prof. Pei-Chuang Wei and is funded by Academia Sinica. The tagging of Primitive Chinese data started in 1995. For Early Chinese, the tagging system was designed in 1997 and applied immediately. The Early Chinese project was led by Prof. Pei-Chuang Wei and Prof. Cheng-hui Liu (Hsing-Hua University). Financial support came from Academia Sinica and the National Science Council; technical support on the tagging system and computer science was provided by Prof. Chu-Ren Huang, Prof. Ker-Jiann Chen, and ASCC.
  • C-003873: MAT-160
    These speech files are collected from 81 male and 79 female speakers through telephone networks.
  • C-003874: MAT-400
    These speech files are collected from 216 male and 184 female speakers through telephone networks.
  • C-003875: MAT-2000Edu
    The original database is MAT-2400 where speech data are collected through telephone networks in Taiwan.
  • C-003876: MAT-2000Com
    The original database is MAT-2400 where speech data are collected through telephone networks in Taiwan.
  • C-003877: MAT-2500ExtV-Edu
    The original database is MAT-2500Ext where speech data are collected through telephone networks in Taiwan. The database contains files provided by 2573 speakers (1268 males and 1305 females).
  • C-003878: MAT-2500ExtV-Com
    The original database is MAT-2500Ext where speech data are collected through telephone networks in Taiwan. The database contains files provided by 2573 speakers.
  • C-003879: TCC-300Edu
    This is a collection of microphone speech databases produced by National Taiwan University, National Cheng Kung University, and National Chiao Tung University. The speech data of each university are provided by 100 speakers (50 males and 50 females). In total, TCC-300 contains speech data from 300 speakers.
  • C-003880: TCC-300Com
    This is a collection of microphone speech databases produced by National Taiwan University, National Cheng Kung University, and National Chiao Tung University. The speech data of each university are provided by 100 speakers (50 males and 50 females). In total, TCC-300 contains speech data from 300 speakers.
  • C-003881: Sinica MCDC
    The Sinica MCDC includes the sound files and transcripts of eight Mandarin conversations.