Language Resource Search - SHACHI: Language Resource Metadata Database

Language resource #: 3330 Results 651 - 660 of 2023

C-001203: 863 Program in 2004 Assessment and test data of text classification
Amounts to 3,600 files. Log in at http://www.863data.org.cn
http://www.chineseldc.org/EN/doc/2004-863-005/intro.htm
C-001204: 863 program in 2003 automatic index evaluation data
The language material includes 10 articles; the number of words ranges from 1755 to 4502. Log in at http://www.863data.org.cn
http://www.chineseldc.org/EN/doc/2003-863-005/intro.htm
C-001207: 863 program in 2003 part-of-speech evaluation data
242 files, about 400 thousand Chinese characters. Log in at http://www.863data.org.cn
http://www.chineseldc.org/EN/doc/2003-863-008/intro.htm
C-001211: Chinese POS Tagged Corpus
Word-segmented and POS-tagged Chinese corpus of 5,000,000 Chinese characters.
http://www.chineseldc.org/EN/doc/CLDC-LAC-2003-003/intro.htm
C-001212: Chinese and English speech corpus
Under the support of the 863 project, a bilingual speech corpus was built for TTS technology research, system development, and evaluation of synthesized speech. The corpus includes high-quality speech data and the corresponding text, with labeling information in the prosodic and phonemic fields. The corpus can be used in text-to-speech research, prosody modeling, and research on automatic labeling of speech corpora. At the same time, the corpus is expected to promote international technical cooperation in the speech synthesis research field.
http://www.chineseldc.org/EN/doc/CLDC-SPC-2003-008/intro.htm
C-001215: Chinese-English Sentence aligned Bilingual Corpus
Under the support of the 973 project, specifications for annotating bilingual text based on the Dublin Core Element Set were proposed. Research on sentence alignment technology for domain-independent bilingual texts has also been carried out, and a large-scale sentence-aligned Chinese-English bilingual corpus has been built.
http://www.chineseldc.org/EN/doc/CLDC-LAC-2003-004/intro.htm
C-001216: Czech SpeechDat(E) Database
Telephone
The Czech SpeechDat(E) Database (Eastern European Speech Databases for Creation of Voice Driven Teleservices) comprises 1052 Czech speakers (526 males, 526 females) recorded over the Czech fixed telephone network. This database is partitioned into 6 CDs. The speech databases made within the SpeechDat(E) project were validated by SPEX, the Netherlands, to assess their compliance with the SpeechDat(E) format and content specifications.
The speech files are stored as sequences of 8-bit, 8kHz A-law speech files and are not compressed, according to the specifications of SpeechDat(E). Each prompt utterance is stored within a separate file and has an accompanying ASCII SAM label file.
Corpus contents:
- 6 application words;
- 1 sequence of 10 isolated digits;
- 4 connected digits: 1 sheet number (5+ digits), 1 telephone number (9-11 digits), 1 credit card number (14-16 digits), 1 PIN code (6 digits);
- 3 dates: 1 spontaneous date (birthday), 1 prompted date (word style), 1 relative and general date expression;
- 1 spotting phrase using an application word (embedded);
- 1 isolated digit;
- 3 spelled-out words (letter sequences): 1 spontaneous e.g. own forename; 1 spelling of directory assistance city name; 1 real/artificial name for coverage;
- 2 currency money amounts: 1 Czech money amount, 1 International money amount (USD, EURO);
- 1 natural number;
- 6 directory assistance names: 1 spontaneous, e.g. own forename; 1 city of birth / growing up (spontaneous); 1 most frequent city (out of 500); 1 most frequent company/agency (out of 500); 1 "forename surname" (set of 150), 1 "surname" (set of 150);
- 2 questions, including "fuzzy" yes/no: 1 predominantly "yes" question, 1 predominantly "no" question;
- 12 phonetically rich sentences;
- 2 time phrases: 1 time of day (spontaneous), 1 time phrase (word style);
- 4 phonetically rich words;
- 4 additional questions (spontaneous).
The following age distribution has been obtained: 20 speakers are below 16 years old, 490 speakers are between 16 and 30, 238 speakers are between 31 and 45, 230 speakers are between 46 and 60, 71 speakers are over 60, and 3 speakers of unknown age.
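As a quick consistency check, the age bands above can be summed and compared against the 1052 speakers stated for this database. This is only an illustrative sketch; the dictionary keys are shorthand labels chosen here, while the counts are taken from the description.

```python
# Age bands as listed in the Czech SpeechDat(E) description.
CZECH_AGE_BANDS = {
    "under 16": 20,
    "16-30": 490,
    "31-45": 238,
    "46-60": 230,
    "over 60": 71,
    "unknown": 3,
}
EXPECTED_SPEAKERS = 1052  # 526 males + 526 females

total = sum(CZECH_AGE_BANDS.values())
assert total == EXPECTED_SPEAKERS, f"bands sum to {total}"

# Print each band with its share of all speakers.
for band, count in CZECH_AGE_BANDS.items():
    print(f"{band:>8}: {count:4d} ({count / total:.1%})")
```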
A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
C-001217: French Speechdat(II) FDB-5000 database
Telephone
The French SpeechDat(II) FDB-5000 database contains the recordings of 5,040 French speakers (2,693 females, 2,347 males) recorded over the French fixed telephone network. 40 speakers have been added to the original 5,000 speakers to fit the requirements of the database. This database is partitioned into 18 CDs, which comprise 300 speakers sessions each (except for CD 4, with 100 speakers sessions).

Speech samples are stored as sequences of 8-bit, 8kHz A-law and are not compressed. They contain a file header of 16 bytes. Each prompt utterance is stored within a separate file (file extension FRA) and has an accompanying ASCII SAM label file (file extension FRO).
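Given the layout described above (a 16-byte file header followed by one A-law byte per sample at 8 kHz, with the label file sharing the signal file's name but using the FRO extension), a reader might separate header and payload roughly as follows. This is a sketch under those assumptions; the helper names and the example file name are hypothetical, and the content of the 16-byte header is not interpreted here.

```python
from pathlib import Path

HEADER_BYTES = 16       # header size stated in the description
SAMPLE_RATE_HZ = 8000   # 8 kHz, one 8-bit A-law byte per sample

def strip_header(raw: bytes) -> bytes:
    """Return the A-law payload that follows the 16-byte header."""
    if len(raw) < HEADER_BYTES:
        raise ValueError("file is shorter than the 16-byte header")
    return raw[HEADER_BYTES:]

def duration_seconds(payload: bytes) -> float:
    """Duration of the payload, given one byte per sample."""
    return len(payload) / SAMPLE_RATE_HZ

def label_path_for(signal_path: Path) -> Path:
    """Map a .FRA signal file to its .FRO label file (same base name)."""
    return signal_path.with_suffix(".FRO")

def read_utterance(signal_path: Path) -> bytes:
    """Read one signal file from disk and drop its header."""
    return strip_header(signal_path.read_bytes())
```

For instance, `label_path_for(Path("EXAMPLE.FRA"))` yields `EXAMPLE.FRO`; the base name here is invented and does not reflect the database's actual naming scheme.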

This speech database was validated by SPEX (the Netherlands) to assess its compliance with the SpeechDat format and content specifications.

Each speaker uttered the following items:

5 application words
1 sequence of 10 isolated digits
4 connected digits (1 sheet number -5+ digits, 1 telephone number 9/11 digits, 1 credit card number 14/16 digits, 1 PIN code -6 digits)
3 dates (1 spontaneous date e.g. birthday, 1 word style prompted date, 1 relative and general date expression)
2 word spotting phrases using an embedded application word
1 isolated digit
3 spelled words (1 spontaneous e.g. own forename, 1 spelling of directory assistance city name, 1 real/artificial name for coverage)
1 currency money amount
1 natural number
5 directory assistance names and 1 spelled name (1 spontaneous e.g. own forename, 1 city of birth/hometown, 1 most frequent city out of a set of 500, 1 most frequent company/agency out of a set of 500, 1 "forename surname", 1 spelled-out city of birth)
2 yes/no questions (1 predominantly "yes" question, 1 predominantly "no" question)
9 phonetically rich sentences
2 time phrases (1 spontaneous time of day, 1 word style time phrase)
8 phonetically rich words.
The following age distribution has been obtained: 215 speakers are under 16, 2531 speakers are between 16 and 30, 1208 speakers are between 31 and 45, 910 speakers are between 46 and 60, and 176 speakers are over 60.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
- isRequiredBy: C-000938: Fixed1fr Design
C-001222: Norwegian SpeechDat(II) FDB-1000
Telephone
The Norwegian SpeechDat(II) FDB-1000 comprises 1016 Norwegian speakers (517 males, 499 females) recorded over the Norwegian fixed telephone network. The FDB-1000 database is partitioned into 4 CDs. The speech databases made within the SpeechDat(II) project were validated by SPEX, the Netherlands, to assess their compliance with the SpeechDat format and content specifications.
Speech samples are stored as sequences of 8-bit 8 kHz A-law. Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.
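The ASCII SAM label files mentioned above could be read with a small parser along the following lines. This is a hedged sketch: it assumes each line has the form "KEY: value" with a short mnemonic before the first colon, which should be checked against the SpeechDat documentation. The sample text is invented for illustration and is not taken from the database.

```python
def parse_sam_label(text: str) -> dict[str, list[str]]:
    """Collect 'KEY: value' lines; repeated keys keep all their values."""
    fields: dict[str, list[str]] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue  # skip blank lines and lines without a mnemonic
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields

# Hypothetical label content; mnemonics and values are assumptions.
sample = "SAM: 8000\nSSB: 8\nQNT: A-LAW\nLBO: 0,4000,8000,ja\n"

fields = parse_sam_label(sample)
print(fields["QNT"])
```

Values are kept as lists because a label file may repeat a mnemonic; a caller that expects a single value can take the first element.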
Each speaker uttered the following items:
1 isolated single digit
1 sequence of 10 isolated digits
4 numbers : 1 sheet number (8 digits), 1 telephone number (8 digits), 1 credit card number (16 digits), 1 PIN code (6 digits)
1 currency money amount
2 natural numbers
3 dates : 1 spontaneous (date or year of birth), 1 prompted date, 1 relative or general date expression
2 time phrases : 1 time of day (spontaneous), 1 time phrase (word style)
3 spelled words : 1 spontaneous (own forename), 1 city name, 1 artificial letter sequence for coverage
5 directory assistance utterances : 1 spontaneous own forename, 1 city of calling (spontaneous), 2 city names, 1 common forename and surname
2 yes/no questions : 1 predominantly "yes" question, 1 predominantly "no" question
6 application words
1 word spotting phrase using an embedded application word
4 phonetically rich words
9 phonetically rich sentences
1 additional sentence
The following age distribution has been obtained: 3 speakers are below 16 years old, 301 speakers are between 16 and 30, 363 speakers are between 31 and 45, 195 speakers are between 46 and 60, 137 speakers are over 60, and 17 speakers whose age is unknown.
A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
C-001223: RASC863-annotated 4 regional accent speech corpus(I)
RASC863 consists of two parts: natural spoken language (spoken-language monologues and answers to familiar questions) and read speech (speech-balanced sentences, frequently used spoken-language sentences, and frequently used dialect vocabulary). The natural spoken language part is divided by topic into monologues and answers to questions. For the monologue, the speaker randomly selects one of 160 topics designed in advance and talks about it for 3 to 5 minutes; for the question part, every speaker answers 15 familiar questions.
http://www.chineseldc.org/EN/doc/CLDC-SPC-2004-003/intro.htm
- hasVersion: RASC863-annotated 4 regional accent speech corpus(Ⅱ)
- hasVersion: RASC863-annotated 4 regional accent speech corpus(Ⅲ)
