Language resource #: 3330
-
C-003112: CSLU: Yes/No Version 1.2
*Introduction*
This file contains documentation for CSLU: Yes/No Version 1.2, Linguistic Data Consortium (LDC) catalog number LDC2007S05 and ISBN 1-58563-445-X.
CSLU: Yes/No Version 1.2 is a collection of answers to yes/no questions from various telephone speech corpora created by the Center for Spoken Language Understanding, Oregon Health and Science University (CSLU). The corpus contains approximately 20,000 examples from roughly 18,000 speakers saying "yes" or "no" in response to various questions.
Each speech file in the corpus has a corresponding orthographic transcription following the CSLU Labeling Conventions. Where a transcription did not already exist, the utterance was run through a speech recognizer to obtain one automatically.
The data were collected from both analog and digital phone lines. The analog data were recorded using a Gradient Technologies analog-to-digital conversion box; these files were recorded as 16-bit, 8 kHz samples and stored in a linear format. The digital data were recorded with the CSLU T1 digital data collection system; these files were sampled at 8 kHz with 8-bit encoding and stored as µ-law (ulaw) files. All of the data use the RIFF standard file format, which here is 16-bit linearly encoded.
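The digital recordings were captured as 8-bit µ-law, while the RIFF files are described as 16-bit linear, so anyone handling raw µ-law audio from such a source may need to expand it. Below is a minimal Python sketch of the standard ITU-T G.711 µ-law expansion. It is an illustration only, not the CSLU conversion tool and not necessarily how these files were produced. The helper names (`ulaw_byte_to_linear`, `ulaw_to_pcm16`, `write_wav`) and the mono, 8 kHz defaults are assumptions.

```python
import struct
import wave

# Hypothetical helpers (not part of the corpus distribution): a sketch of the
# standard G.711 mu-law expansion to signed 16-bit linear PCM.
BIAS = 0x84  # bias added during mu-law compression (132)


def ulaw_byte_to_linear(u: int) -> int:
    """Expand one 8-bit mu-law code (0-255) to a signed 16-bit sample."""
    u = ~u & 0xFF                 # mu-law codes are stored bit-inverted
    sign = u & 0x80               # top bit set means a negative sample
    exponent = (u >> 4) & 0x07    # segment number (0-7)
    mantissa = u & 0x0F           # step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def ulaw_to_pcm16(data: bytes) -> bytes:
    """Convert a buffer of mu-law bytes to little-endian 16-bit PCM."""
    samples = [ulaw_byte_to_linear(b) for b in data]
    return struct.pack("<%dh" % len(samples), *samples)


def write_wav(path: str, pcm16: bytes, sample_rate: int = 8000) -> None:
    """Write 16-bit PCM into a RIFF/WAV container (assumes mono audio)."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)         # 2 bytes per sample = 16-bit
        w.setframerate(sample_rate)
        w.writeframes(pcm16)
```

For example, `ulaw_byte_to_linear(0xFF)` returns 0 and `ulaw_byte_to_linear(0x80)` returns 32124, the largest positive value in the standard expansion. Each 8-bit code therefore becomes a 16-bit sample, which doubles the data size.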
*Samples*
For a sample of the audio in this corpus, please listen to this sample.
- replaces: Yes/No Corpus Version 1.1
- isReferencedBy: (Online Documentation) http://www.ldc.upenn.edu/Catalog/docs/LDC2007S05/
- isReferencedBy: Mike Noel. 2007. CSLU: Yes/No Version 1.2. Linguistic Data Consortium, Philadelphia.
-
C-003152: NICT JLE Corpus
"NICT-JLE Corpus" is a speech corpus of Japanese students studying English, produced by the National Institute of Information and Communications Technology (NICT) with the cooperation of 1,281 examinees of the SST (Standard Speaking Test). It is likely one of the largest speech corpora of English learners.
- hasPart: Normative Corpus
- hasPart: back-translation corpus
-
C-003153: Comprehensive Database of Chinese Name Variants
Chinese names can be spelled in a bewildering variety of ways. Our databases of Chinese names and non-Chinese proper nouns in both Simplified and Traditional Chinese, including romanized variants, contain nearly two million entries. There are several well-established systems for romanizing/transcribing Chinese, as well as various popular ones and many older ones that have fallen out of use.
- isReferencedBy: C-003171: Chinese-English Database of Proper Nouns
-
C-003171: Chinese-English Database of Proper Nouns
A comprehensive database of proper nouns in Chinese, including:
1. Non-Chinese personal names
2. Chinese place names
3. Non-Chinese place names
4. Companies and organizations
5. Publications and literary works
6. Names of famous people
7. Miscellaneous proper nouns
- isPartOf: C-003153: Comprehensive Database of Chinese Name Variants
- hasVersion: D-003208: SC JAPANESE PROPER NOUNS
- hasVersion: C-003172: CHINESE LEXICAL DATABASE
-
C-003172: CHINESE LEXICAL DATABASE
The Chinese Lexical Database (CLD) is a comprehensive monolingual lexical database of Chinese.
- references: D-003207: SC AND TC CHINESE PINYIN DATABASE
-
C-003173: Multilingual Dictionary of Proper Nouns
This edition, the Multilingual Database of Proper Nouns (CJKE-DPN), currently contains about 150,000 entries (including variants) covering the most common CJK and Western personal names and surnames. It brings together five languages (CJKE) in a multidirectional format: Simplified Chinese (SC), Traditional Chinese (TC), Japanese, Korean and English. It is now being expanded to include Arabic and Spanish.
-
C-003200: Linguistic Development Corpus
The Linguistic Development Corpus is part of the French Learner Language Oral Corpora project, which aims to promote research on the acquisition of French as a second/foreign language. It consists of French speech data from learners aged 13 to 15 at a local secondary school in the UK, and was collected to complement the database already gathered for Years 7 and 8 (aged 11 to 12) and the beginning of Year 9 (Progression Project). Four oral tasks were administered to all speakers: a story re-telling task (Cartoon story task), an interrogative elicitation task using a drawing (Interrogative elicitation task), a one-to-one interview using photographs as prompts (Photos task), and an elicitation task using picture cues in which learners say what they do and do not do (Negative elicitation task). The corpus contains morphosyntactically tagged transcripts.
- isPartOf: N-003064: French Learner Language Oral Corpora (FLLOC)
- hasVersion: C-003201: Progression Corpus
- requires: CHILDES (http://childes.psy.cmu.edu/)
-
C-003201: Progression Corpus
The Progression Corpus is part of the French Learner Language Oral Corpora project, which aims to promote research on the acquisition of French as a second/foreign language. It consists of French speech data from beginning classroom learners aged 11 to 14 in the UK. In the project, students were tracked through two years (six terms), from the second term of Year 7 until the first term of Year 9 inclusive. The tasks administered were a story re-telling task (Task L), an informal conversation task using photos as prompts (Task A), an interrogative elicitation task (Task I), and other tasks done in learner pairs or groups. The corpus contains morphosyntactically tagged transcripts.
- isPartOf: N-003064: French Learner Language Oral Corpora (FLLOC)
- hasVersion: C-003200: Linguistic Development Corpus
- requires: CHILDES (http://childes.psy.cmu.edu/)
-
C-003202: Salford Corpus
The Salford Corpus is part of the French Learner Language Oral Corpora project, which aims to promote research on the acquisition of French as a second/foreign language. It is a longitudinal study of 12 undergraduates studying French at a British university, from their first to their fourth year. The participants were recorded carrying out a variety of production tasks in French, including general chat, cartoon description and oral translation. Some examples also show the participants carrying out the same tasks in English for comparison. The corpus contains morphosyntactically tagged transcripts.
-
C-003203: Brussels Corpus
The Brussels Corpus was collected to investigate the simultaneous learning of two foreign languages (French and English) in an educational context (secondary education in the Dutch-speaking region of Flanders, Belgium). Some 150 Dutch-speaking Flemish students from different schools were tested in both target languages for their proficiency (speaking, reading and writing proficiency, and metalinguistic knowledge) through an individual 15-minute interview with a (near-)native speaker of French and English. The interviews were designed to elicit a variety of discourse types (personal conversation, narrative, descriptive, expository) that could be expected to contain a variety of linguistic structures. The corpus available on the FLLOC website contains only tagged transcripts of the narrative task, with morphosyntactic coding. Sound data are not available on the website.