Language Resource Search - SHACHI: Language Resource Metadata Database

Registered language resources: 3330. Showing 71 - 80 of 2023.

C-000143: TEDphone (Polyphone-like Translanguage English Database)
Telephone
TEDphone (1 CD) is part of the TED speech database, described in this catalogue. It consists of Polyphone-like recordings of 64 speakers in English and in their native language. For 25 of the speakers there are also simultaneous laryngograph recordings. The TED (Transnational English Database) corpus contains recordings of speeches made at the Eurospeech-93 conference in Berlin. The name of the corpus, and its nickname, "The Terrible English Database", reflects the fact that a high percentage of the presentations at Eurospeech-93 were given in English by non-native speakers of English.
C-000150: Telephone Speech Data Collection for Czech
Telephone
This database contains speech collected in the Czech Republic during summer 1999. It comprises telephone recordings from 1227 speakers (590 males and 637 females) recorded directly over the fixed telephone network using an ISDN interface.
Speech files are stored as sequences of 8-bit, 8 kHz A-law uncompressed speech samples. Each prompted utterance is stored in a separate file. Each speech file has an accompanying ASCII SAM label file that follows the specifications of the SpeechDat project (URL: http://www.speechdat.com).
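As an illustration of the storage format described above, the following minimal Python sketch expands 8-bit A-law samples to 16-bit linear PCM using the standard ITU-T G.711 expansion, and estimates the duration of an utterance from its file size. It assumes the speech files are headerless (one byte per sample at 8 kHz), which the description suggests but does not state outright. The function names are hypothetical and are not part of the SpeechDat tools.

```python
SAMPLE_RATE_HZ = 8000  # from the description: 8 kHz, one 8-bit sample per byte


def alaw_to_linear(a_val: int) -> int:
    """Expand one 8-bit A-law code to a signed 16-bit PCM value (ITU-T G.711)."""
    a_val ^= 0x55                     # undo the even-bit inversion used on the wire
    t = (a_val & 0x0F) << 4           # 4-bit mantissa
    seg = (a_val & 0x70) >> 4         # 3-bit segment number
    if seg == 0:
        t += 8
    else:
        t += 0x108
        t <<= seg - 1
    return t if a_val & 0x80 else -t  # a set sign bit means a positive sample


def decode_alaw(raw: bytes) -> list[int]:
    """Decode a whole buffer of A-law bytes (assumed headerless)."""
    return [alaw_to_linear(b) for b in raw]


def duration_seconds(n_bytes: int) -> float:
    """Duration of a headerless file: one byte per sample at 8000 samples/s."""
    return n_bytes / SAMPLE_RATE_HZ
```

Under these assumptions, a 16000-byte file would hold two seconds of speech. If the files turn out to carry a header, its length would need to be skipped before decoding.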
Corpus contents:
- connected digits (prompt sheet number, telephone number, credit card number),
- sequences of isolated digits (5 digits),
- answers to yes/no questions,
- common application words and phrases.
The following age distribution has been obtained: 36 speakers are below 16 years old, 537 speakers are between 16 and 30, 306 speakers are between 31 and 45, 259 speakers are between 46 and 60, 88 speakers are over 60, and the age of 1 speaker is unknown.
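The age figures can be cross-checked against the speaker totals given earlier in this entry. The short Python sketch below simply restates the numbers from the description; the dictionary keys are labels chosen here for illustration.

```python
# Age distribution as stated in the description (labels are illustrative).
age_groups = {
    "under 16": 36,
    "16-30": 537,
    "31-45": 306,
    "46-60": 259,
    "over 60": 88,
    "unknown": 1,
}

total_speakers = sum(age_groups.values())

# Share of each age group, in percent of all speakers.
shares = {group: round(100 * n / total_speakers, 1) for group, n in age_groups.items()}

# The total should agree with the gender breakdown: 590 males + 637 females.
assert total_speakers == 590 + 637
```

The age groups add up to the 1227 speakers reported for the corpus, so the two breakdowns are consistent.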
The transcription included in this database is an orthographic, lexical transcription with a few details that represent audible acoustic events (speech and non-speech) present in the corresponding waveform files. SpeechDat conventions were used in this database.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
C-000151: Telephone speech corpus for recognition-Male voice
This corpus was recorded at an 8 kHz sampling rate. It includes 664 speakers with different accents, ages, and knowledge levels. The data were collected over diverse channels, and the long-distance telephone calls came from 38 provinces, cities, and municipalities. To ensure that models trained on this corpus perform well, the recording texts were designed both to reflect the most common situations in telephone speech and to cover the Chinese syllables and their connecting relationships.
http://www.chineseldc.org/EN/doc/CLDC-SPC-2004-010/intro.htm
C-000152: Text of Northern Bunun (Taiwan)...New Text
Texts
This database is an extract from a revised version of a text published in “Bunun Texts No.1” (in Moriguchi, T. (ed.) (2001) A Linguistic Examination of the Oral Traditions and its Relationship to Anthropology. Shizuoka University). This text is currently under revision and editing; and at present, quotes and reviews of dissertations and other papers based on this data are not permitted. Readers who wish to obtain a copy of the book containing the original text should contact the 21st Century COE Program “Usage-Based Linguistic Informatics” at Tokyo University of Foreign Studies via e-mail (coelang@tufs.ac.jp).
- isReferencedBy: Bunun Texts No.1
C-000153: The EMILLE/CIIL Corpus
Written Corpora
The EMILLE/CIIL Corpus consists of three components: monolingual, parallel and annotated corpora. There are fourteen monolingual corpora, including both written and (for some languages) spoken data for fourteen South Asian languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayalam, Marathi, Oriya, Punjabi, Sinhala, Tamil, Telugu and Urdu. The EMILLE monolingual corpora contain approximately 92,799,000 words (including 2,627,000 words of transcribed spoken data for Bengali, Gujarati, Hindi, Punjabi and Urdu). The parallel corpus consists of 200,000 words of text in English and its accompanying translations in Hindi, Bengali, Punjabi, Gujarati and Urdu. The annotated component includes the Urdu monolingual and parallel corpora automatically annotated for parts-of-speech, together with twenty written Hindi corpus files annotated to show the nature of demonstrative use. All other components are annotated at the sentence level. The corpus is marked up using CES-compliant SGML and encoded using Unicode.

References: Xiao, Z., McEnery, A., Baker, P. and Hardie, A. 2004. Developing Asian language corpora: standards and practice. In Sornlertlamvanich, V., Tokunaga, T. and Huang, C. (eds.) Proceedings of the Fourth Workshop on Asian Language Resources, pp. 1-8. March 25, Sanya.

This database is available for research use by academic organisations only. For use by commercial organisations, a subset of the EMILLE/CIIL Corpus is available under the reference ELRA-W0038 The EMILLE Lancaster Corpus.
- references: C-001540: The EMILLE Lancaster Corpus
C-000156: The identifiable speech database of tabletop speech--the people's name, the place's name (120 persons)
Number of speakers recorded: The product uses 70 speakers in total (38 males, 32 females). The speakers have different accents, ages, and educational backgrounds.
Recording content: 50 speakers: The content includes 4 parts: people's names, country names, Chinese city names, street names, company and institution names, and geographical names. 60 sentences of people's names, 20 sentences of country names + 10 sentences of Chinese city names + 30 sentences of street names, 50 sentences of company and institution names, and 10 sentences of geographical names.
Product size: The total product data is 2228 MB, 15 hours in total.
http://www.chineseldc.org/EN/doc/CLDC-SPC-2006-014/intro.htm
- hasVersion: The identifiable speech database of telephone speech——the name of person, the name of place ( 265 people using mobile telephone )
- hasVersion: The identifiable speech database of telephone speech——the name of person, the name of place (285 speakers using stable telephone )
- hasVersion: The identifiable speech database of telephone speech——the number string (265 people using mobile telephone )
- hasVersion: The identifiable speech database of telephone speech——the number string (285 speakers using stable telephone)
- hasVersion: The identifiable speech database of telephone speech——stock (265 people using mobile telephone )
- hasVersion: The identifiable speech database of telephone speech——the stock (285 people using stable telephone )
- hasVersion: The identifiable speech database of telephone speech——the message (64 people using mobile telephone )
- hasVersion: The identifiable speech database of telephone speech——the message (86 people using mobile telephone )
- hasVersion: The identifiable speech database of tabletop speech——the message (200 persons )
- hasVersion: The identifiable speech database of tabletop speech——the number string (200 persons )
- hasVersion: The identifiable speech database of tabletop speech——the number string (10 persons )
- hasVersion: the identifiable speech database of tabletop speech——the message (120 persons )
- hasVersion: The identifiable speech database of tabletop speech——the number string (120 persons )
- hasVersion: The identifiable speech database of tabletop speech——the stock (70 persons )
- hasVersion: The identifiable speech database of tabletop speech——free topic (50 persons )
- hasVersion: The identifiable speech database of Chinese mandarin -----wide label
C-000188: VERBMOBIL - VM CD 1.1 (new edition)
Desktop/Microphone
Verbmobil is a long-term project of the German Federal Ministry of Education, Science, Research and Technology (BMBF, Projektträger DLR). Its aim is to give Germany a leading international position in language technology and its economic application in the next millennium through the cooperation and concentration of as many specialists from industry and science as possible. The long-term aim is the development of a mobile translation system for the translation of spontaneous speech in face-to-face situations. The following resources are spontaneous speech databases recorded in a dialogue task (appointment scheduling).
VM CD 1.1 (new edition) consists of 1 CD-ROM with 63 Dialogues, 209 Appointments, 1840 Turns in German. This new edition contains the transliterations of all dialogues, signal files with PhonDat 2 Header structure, software and speaker documentation. All files were validated according to BAS guidelines.
C-000189: VERBMOBIL - VM CD 12.1 (new edition)
Desktop/Microphone
Verbmobil is a long-term project of the German Federal Ministry of Education, Science, Research and Technology (BMBF, Projektträger DLR). Its aim is to give Germany a leading international position in language technology and its economic application in the next millennium through the cooperation and concentration of as many specialists from industry and science as possible. The long-term aim is the development of a mobile translation system for the translation of spontaneous speech in face-to-face situations. The following resources are spontaneous speech databases recorded in a dialogue task (appointment scheduling).
VM CD 12.1 (new edition) consists of 1 CD-ROM with 207 Dialogues, 207 Appointments, 2,154 Turns in German. This new edition contains the transliterations of all dialogues, signal files with PhonDat 2 Header structure, software, speaker documentation, and partitur files (files describing the different parts which constitute the corpus - word order, phrase order, etc.). All files were validated according to BAS guidelines.
C-000190: VERBMOBIL - VM CD 14.1 (new edition)
Desktop/Microphone
Verbmobil is a long-term project of the German Federal Ministry of Education, Science, Research and Technology (BMBF, Projektträger DLR). Its aim is to give Germany a leading international position in language technology and its economic application in the next millennium through the cooperation and concentration of as many specialists from industry and science as possible. The long-term aim is the development of a mobile translation system for the translation of spontaneous speech in face-to-face situations. The following resources are spontaneous speech databases recorded in a dialogue task (appointment scheduling).
VM CD 14.1 (new edition) consists of 1 CD-ROM with 97 speakers, 1891 turns, 156 spontaneous dialogues, transliterations, PhonDat 2 headers, and partitur files (files describing the different parts which constitute the corpus - word order, phrase order, etc.), all in German.
C-000191: VERBMOBIL - VM CD 2.1 (new edition)
Desktop/Microphone
Verbmobil is a long-term project of the German Federal Ministry of Education, Science, Research and Technology (BMBF, Projektträger DLR). Its aim is to give Germany a leading international position in language technology and its economic application in the next millennium through the cooperation and concentration of as many specialists from industry and science as possible. The long-term aim is the development of a mobile translation system for the translation of spontaneous speech in face-to-face situations. The following resources are spontaneous speech databases recorded in a dialogue task (appointment scheduling).
VM CD 2.1 (new edition) consists of 1 CD-ROM with 81 Dialogues, 227 Appointments, 1538 Turns in German. This new edition contains the transliterations of all dialogues, signal files with PhonDat 2 Header structure, software and speaker documentation. All files were validated according to BAS guidelines.
