Language Resource Search - SHACHI: Language Resource Metadata Database

Language resource #: 3330 Results 181 - 190 of 2023

C-000457: F-Korean01 - Foreign Speakers’ Korean
Speech of Korean by Korean, Japanese, Chinese, and English speakers
The prompts are designed in consideration of Korean phonemes and phonetic environments and foreigners' frequent errors in speaking Korean
C-000458: K-SEC - Korean Speakers’ Korean and English
English words and sentences uttered by 342 speakers from primary schools, middle schools, and other areas all over the country
C-000463: Multimodal01 - Multimodal Speech Corpus
Multimodal corpus of voice and frontal-face video captured by a camcorder
C-000464: Simultaneous Interpretation Database (conversation)
A simultaneous interpretation corpus (monologue and conversation) built over five years, from 1999 to 2003, at CIAIR, Nagoya University. It contains approximately 182 hours of recorded speech; transcription, visualization of the recordings, and linguistic analysis have been completed. The transcribed data amounts to about 1 million words (morphemes), making it the largest simultaneous interpretation corpus in the world. The dialogue data consists of simulated dialogues.
- isPartOf: C-003270: Simultaneous Interpretation Database
- hasVersion: C-000553: Simultaneous Interpretation Database (speech)
C-000465: SynthFemale01 - Read Sentences Speech Corpus for Prosody Synthesis
Speech recorded for prosody synthesis
K-ToBI labeling (1,000 sentences)
- references: KAIST Tagged Corpus
C-000467: The Babel English-Chinese Parallel Corpus
The Babel English-Chinese Parallel Corpus consists of 327 English articles and their translations in Mandarin Chinese. Of these, 115 texts (121,493 English words plus 135,493 Chinese words) were collected from the World of English between October 2000 and February 2001 while the remaining 212 texts (132,140 English words plus 151,969 Chinese words) were collected from Time from September 2000 to January 2001. The corpus contains a total of 544,095 words (253,633 English words and 287,462 Chinese words). Both English and Chinese texts are tagged for part of speech. The parallel corpus is aligned at the sentence level. Sentence alignment was done automatically and corrected by hand.
C-000470: The Bergen Corpus of London Teenage Language
The Bergen Corpus of London Teenage Language (COLT) is the first large English Corpus focusing on the speech of teenagers. It was collected in 1993 and consists of the spoken language of 13 to 17-year-old teenagers from different boroughs of London. The complete corpus, half a million words, has been orthographically transcribed and word-class tagged, and is a constituent of the British National Corpus.
- isPartOf: British National Corpus
C-000471: The Chinese Treebank
The Chinese Treebank is a segmented, POS-tagged, and bracketed Chinese corpus which currently has 800 thousand words. Portions of this data have been annotated with predicate-argument structures, discourse relations, word senses, and coreference links. The richly annotated data is primarily for use in Natural Language Processing, but it can also be used for linguistic analysis.
G-000473: The Enabling Minority Language Engineering Corpus
A set of corpora for fifteen languages of South Asia. The corpus includes a re-coded version of the corpus collection of the Central Institute of Indian Languages (CIIL). Data includes monolingual written data, monolingual spoken data, and parallel data. Total size is 97 million words.
C-000474: The ICAME Corpus Collection
Written, spoken, historical, tagged, and parsed collections
