Language resource #: 3330 Results 181 - 190 of 2023
  • C-000457: F-Korean01 - Foreign Speakers’ Korean
    Korean speech by Korean, Japanese, Chinese, and English speakers
    The prompts are designed to cover Korean phonemes and phonetic environments, as well as the errors foreign speakers frequently make when speaking Korean.
  • C-000458: K-SEC - Korean Speakers’ Korean and English
    English words and sentences uttered by 342 speakers from primary schools, middle schools, and other areas all over the country
  • C-000463: Multimodal01 - Multimodal Speech Corpus
    Multimodal corpus of voice and frontal-face video captured by camcorder
  • C-000464: Simultaneous Interpretation Database (conversation)
    A simultaneous interpretation corpus (monologue and conversation) built over five years, from 1999 to 2003, at CIAIR, Nagoya University. It contains approximately 182 hours of recorded speech; transcription, visualization of the recordings, and linguistic analysis have been completed. The transcribed data contains about 1 million words (morphemes), making it the largest simultaneous interpretation corpus in the world. The dialogue data consists of simulated dialogues.
  • C-000465: SynthFemale01 - Read Sentences Speech Corpus for Prosody Synthesis
    Speech recorded for prosody synthesis
    K-ToBI labeling (1,000 sentences)
    • references: KAIST Tagged Corpus
  • C-000467: The Babel English-Chinese Parallel Corpus
    The Babel English-Chinese Parallel Corpus consists of 327 English articles and their translations in Mandarin Chinese. Of these, 115 texts (121,493 English words plus 135,493 Chinese words) were collected from the World of English between October 2000 and February 2001, while the remaining 212 texts (132,140 English words plus 151,969 Chinese words) were collected from Time between September 2000 and January 2001. The corpus contains a total of 541,095 words (253,633 English words and 287,462 Chinese words). Both the English and Chinese texts are tagged for part of speech. The parallel corpus is aligned at the sentence level; sentence alignment was done automatically and corrected by hand.
  • C-000470: The Bergen Corpus of London Teenage Language
    The Bergen Corpus of London Teenage Language (COLT) is the first large English corpus focusing on the speech of teenagers. It was collected in 1993 and consists of the spoken language of 13- to 17-year-old teenagers from different boroughs of London. The complete corpus, half a million words, has been orthographically transcribed and word-class tagged, and is a constituent of the British National Corpus.
    • isPartOf: British National Corpus
  • C-000471: The Chinese Treebank
    The Chinese Treebank is a segmented, POS-tagged, and bracketed Chinese corpus which currently has 800 thousand words. Portions of this data have been annotated with predicate-argument structures, discourse relations, word senses, and coreference links. The richly annotated data is primarily for use in Natural Language Processing, but it can also be used for linguistic analysis.
  • G-000473: The Enabling Minority Language Engineering Corpus
    A set of corpora for fifteen languages of South Asia. The corpus includes a re-coded version of the corpus collection of the Central Institute of Indian Languages (CIIL). Data includes monolingual written data, monolingual spoken data, and parallel data. Total size is 97 million words.
  • C-000474: The ICAME Corpus Collection
    Written, spoken, historical, tagged, and parsed collections