Language Resource Search - SHACHI: Language Resource Metadata Database

C-004028: HKCAC
An adult language corpus of spoken Hong Kong Cantonese (HKCAC) has recently been developed. It consists of spontaneous speech recorded from radio phone-in programs and forums in Hong Kong. The database represents the speech of a total of sixty-nine speakers in addition to the program hosts, and contains approximately 170,000 characters. HKCAC is expected to be of great value to linguists interested in studying Cantonese, as well as to speech therapists and educators who work with the Cantonese-speaking population.
C-004029: HKCPSC
This study investigated the development of the mental representation of Chinese disyllabic words. Unlike alphabetic languages, Chinese uses a logographic writing system in which the character is the basic unit of meaning. Most Chinese words are composed of two characters. In principle, a Chinese compound word can be read either as a whole unit or through its component characters. Subjects were asked to read aloud a list of two-character words, controlled for word and component-character frequencies across grades. The percentage of correct responses was analyzed using three two-way analyses of variance (ANOVAs). Results indicated that children are able to make use of both levels of reading as early as Grade 1. Children in lower grades tended to rely more on component-character-level reading, while children in higher grades tended to read words as whole units more often.
C-004030: NTU Corpus of Formosan Languages
Most of the Formosan languages lack written records, and many have either become extinct or are now seriously endangered. This linguistic database was created not only to preserve a valuable linguistic heritage, but also to provide a systematic record of these languages for the benefit of linguistic research. The database contains first-hand data transcribed largely according to Du Bois et al. (1993). The Intonation Unit (IU) serves as the basic unit for a detailed record of linguistic phenomena, including pauses, repetitions, repair, and intonation. Besides the recorded data, the database also contains hundreds of field notes gathered in the course of field research. This material is valuable to linguistics researchers, as it reveals the structure of a language and shows the interface between language and the human cognitive system. http://corpus.linguistics.ntu.edu.tw/index_en.php
C-004031: Cleaneval development dataset
This is a Perl script that takes two arguments: first, the file to be scored, and second, the gold-standard file to compare it with. It calculates two scores: (1) one based on the edit distance between the two files and on the extent to which contestant-inserted markup tags mark blocks of text that start and end in the same places; and (2) one based on alignment of the text alone, ignoring the contestant-inserted markup tags. Comments in the code provide more detail. The script has been well tested for English but less thoroughly for Chinese; we hope to publish an amended version for Chinese shortly. The script is available at http://cleaneval.sigwac.org.uk/cleaneval_scorer.zip (zipped so that our server does not try to run it).
- isRequiredBy: Cleaneval
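To illustrate the second, text-only score described above, here is a minimal Python sketch. It is not the Cleaneval scorer itself: it assumes that any `<...>` token is a markup tag to be discarded, compares whitespace-separated words with a standard Levenshtein edit distance, and normalizes by the longer sequence. The function names (`strip_tags`, `edit_distance`, `text_only_score`) and the normalization are hypothetical choices for illustration only.

```python
import re

# Assumption: markup tags look like <p>, <h>, <l>; any <...> token is removed.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(text: str) -> list[str]:
    """Remove markup tags and split the remaining text into word tokens."""
    return TAG_RE.sub(" ", text).split()


def edit_distance(a: list[str], b: list[str]) -> int:
    """Word-level Levenshtein distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr[j] = min(prev[j] + 1,         # delete tok_a
                          curr[j - 1] + 1,     # insert tok_b
                          prev[j - 1] + cost)  # match or substitute
        prev = curr
    return prev[len(b)]


def text_only_score(candidate: str, gold: str) -> float:
    """Hypothetical score in [0, 1]: 1 - distance / length of longer token list."""
    cand, ref = strip_tags(candidate), strip_tags(gold)
    longest = max(len(cand), len(ref))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(cand, ref) / longest
```

For example, `text_only_score("<p>Hello big world", "<p>Hello world")` differs by one inserted word out of three, giving a score of about 0.67. The first, tag-sensitive score would additionally need to compare where candidate and gold-standard blocks begin and end, which this sketch leaves out.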
C-004032: SCoRE: Singapore Corpus of Research in Education
SCoRE is a large collection of data on classroom interactions, teaching materials, and students' assignments in Singapore primary and secondary schools, drawn from various research projects. The proposed deliverables include a speech subcorpus, a lexical subcorpus, and several multilevel annotated subcorpora at different stages of development. Eventually, all these subcorpora will be indexed and incorporated into a large corpus database, which will be provided with sophisticated query tools for both online and offline queries.
C-004033: Singaporean Preschoolers Oral Competence in Mandarin
This is a specific-focus project investigating the relationship between Singaporean Chinese children's home language use and their oral Mandarin competence. The project adopted a random sampling approach, recruiting 1,000 boys and girls aged 5 and 6 from 36 childcare centers and kindergartens (17 public, 10 church-run, and 9 private). In addition to an equal number of sociolinguistic questionnaires completed by the children's parents, which were collected and processed, the oral production of 600 of the 1,000 participants (300 hours of audio recordings) and 24 videotaped classroom observations (12 hours of video recordings) were transcribed and annotated. The ultimate goal of this project is to compile a multimodal corpus of Singapore preschool children's oral Mandarin.
C-004034: Hindi Speech Data base
This is a speech technology resource. The database is intended to support the development of Automatic Speech Recognition (ASR) systems for Hindi.
C-004035: Mandarin Topic-oriented Conversation Corpus
The annotation system is designed to mark discourse functions in natural conversations. A natural, topic-oriented conversation has three main parts: opening, main discussion, and closing. The main discussion contains discourse functions intended to start a discussion, to negotiate a topic, to introduce a topic, to talk about a topic, and to end the discussion.
C-004036: Mandarin Map Task Corpus
The Mandarin Map Task Corpus (MMTC) was recorded from January to March 2002. It contains 30 task-oriented conversations between speakers who were familiar with each other. In each conversation, one speaker, who had a detailed map, gave oral instructions guiding the other speaker, who had a simplified map, to three destinations on the map. The conversations total 5 hours, with an average length of 10 minutes each.
C-004037: The Mandarin Conversational Dialogue Corpus
The Mandarin Conversational Dialogue Corpus (MCDC) was recorded from March to July 2001. It consists of natural conversations between pairs of strangers. The conversation partners were required to introduce themselves at the beginning of the conversation; the rest of the conversation was left entirely up to them. There are 60 speakers in total. The 30 conversations total 25.6 hours, an average of roughly 50 minutes per conversation.
- hasFormat: C-003881: Sinica MCDC