C-001224: RASC863-annotated 4 regional accent speech corpus(III)
RASC863 consists of two parts: natural spoken language (spoken monologues and answers to familiar questions) and read speech (phonetically balanced sentences, frequently used spoken-language sentences, and frequently used dialect vocabulary). The natural spoken language part is divided by topic into spoken monologues and answers to questions. The finely labelled part of the speech includes answers to questions, frequently used spoken-language sentences, phonetically balanced sentences, etc. The labels include a word layer, a syllable layer with time segmentation, a vowel-consonant layer of the actual speech, etc.
http://www.chineseldc.org/EN/doc/CLDC-SPC-2004-005/intro.htm
- hasVersion: RASC863-annotated 4 regional accent speech corpus(Ⅰ)
- hasVersion: RASC863-annotated 4 regional accent speech corpus(Ⅱ)
-
C-001225: SCSC--Syllable Corpus of Standard Chinese
The Mandarin monosyllable corpus comprises monosyllable waveform data, a list of the monosyllables, and management software. It is suited to speech and language research, the development of speech software, and basic Mandarin teaching.
http://www.chineseldc.org/EN/doc/CLDC-SPC-2005-014/intro.htm -
C-001226: Siemens Synthesis Corpus - SI1000P
Desktop/Microphone
The SI1000P recordings were made to provide material for high-quality concatenative speech synthesis. The corpus contains 1000 newspaper sentences read by two professional German broadcasting announcers in studio quality, together with the laryngographic signal and the glottal pulse stream. Parts of the corpus were labelled and segmented phonemically (SAM-PA) and prosodically (borders + accents).
Both speakers are trained and experienced broadcast announcers at the local state broadcasting unit. They were asked to read the texts in a broadcast-announcing style: very correctly, but fluently and without pausing between words.
The recordings were made in a fully echo-cancelling studio at the Institute of Phonetics at the University of Munich. The recording channels were:
- speech signal recorded by Sennheiser MKH20 omnidirectional, 30 cm from mouth.
- laryngograph signal, LxProc of Laryngograph Ltd. London.
- glottis pulse stream by laryngograph
- start/stop pulse at beginning and end of utterance
The recording machine was a high-quality 4-channel DAT recorder (48 kHz, 16 bit). The data were copied to hard disk and cut into separate utterances (one utterance per file) according to the pulse information in the fourth channel.
Speech signals were filtered and downsampled from 48 kHz to 16 kHz. Laryngograph signals were filtered and downsampled to 16 kHz. The format of the signal files is PhonDat 2.
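The filter-and-downsample step from 48 kHz to 16 kHz (a factor of 3) can be illustrated with a minimal sketch. The entry does not say which filter was used for SI1000P, so the windowed-sinc low-pass filter, its length of 63 taps, the cutoff choice, and the function names `lowpass_fir` and `downsample` are all assumptions made for illustration only.

```python
import math


def lowpass_fir(num_taps, cutoff):
    """Windowed-sinc low-pass FIR filter (assumed design, not the corpus's own).

    cutoff is a fraction of the input sample rate (0 < cutoff < 0.5);
    num_taps should be odd so the filter has a centre tap.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response (sinc function).
        h = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        # Hamming window to reduce ripple from truncating the sinc.
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(h * w)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at DC


def downsample(samples, factor=3, num_taps=63):
    """Low-pass filter, then keep every `factor`-th sample (48 kHz -> 16 kHz for factor 3)."""
    # Cutoff slightly below the new Nyquist frequency (0.5 / factor of the input rate).
    taps = lowpass_fir(num_taps, 0.45 / factor)
    half = num_taps // 2
    out = []
    # Only compute the filter output at the positions that are kept.
    for center in range(0, len(samples), factor):
        acc = 0.0
        for k, tap in enumerate(taps):
            idx = center + k - half
            if 0 <= idx < len(samples):  # treat samples outside the signal as zero
                acc += tap * samples[idx]
        out.append(acc)
    return out


# One second of a 1 kHz tone sampled at 48 kHz.
tone = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(48000)]
resampled = downsample(tone)
print(len(resampled))  # 16000
```

Filtering before discarding samples matters because any signal energy above 8 kHz, the Nyquist frequency of the 16 kHz output, would otherwise fold back into the audible band as aliasing.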
The resulting segmentation and all information accompanying the signal is summed up in the corresponding Partitur File. The Partitur File format is an open structure that allows the easy description and processing of information aligned to a speech signal.
The database also provides an ordered list of all occurring words together with the standard pronunciation in SAM-PA and the orthography of all spoken utterances in the corpus. -
C-001227: Spanish SpeechDat Database for the Mobile Telephone Network
Telephone
The Spanish SpeechDat database for the mobile telephone network comprises 1066 Spanish speakers (526 males, 540 females) calling from GSM telephones and recorded over the fixed PSTN using an ISDN-BRI interface. The MDB-1000 database is partitioned into 6 CDs in ISO 9660 format. The speech databases made within the SpeechDat(II) project were validated by SPEX, the Netherlands, to assess their compliance with the SpeechDat format and content specifications.
Speech samples are stored as sequences of 8-bit 8 kHz A-law. Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.
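As an illustration of the 8-bit A-law storage format, the sketch below expands A-law bytes to 16-bit linear PCM values using the standard ITU-T G.711 expansion. The function names `alaw_to_linear` and `decode_alaw` are hypothetical, and the assumption that a signal file is a headerless stream of A-law bytes should be checked against the database documentation.

```python
def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit A-law code to a signed 16-bit linear sample (G.711)."""
    code ^= 0x55                      # undo the even-bit inversion applied by the encoder
    mantissa = (code & 0x0F) << 4     # 4-bit mantissa
    segment = (code & 0x70) >> 4      # 3-bit segment (exponent)
    if segment == 0:
        value = mantissa + 8
    else:
        value = (mantissa + 0x108) << (segment - 1)
    # After the XOR, a set top bit marks a positive sample.
    return value if code & 0x80 else -value


# Precompute all 256 codes once; decoding is then a table lookup per byte.
ALAW_TABLE = [alaw_to_linear(c) for c in range(256)]


def decode_alaw(data: bytes) -> list[int]:
    """Decode a stream of A-law bytes (assumed headerless) to linear samples."""
    return [ALAW_TABLE[b] for b in data]


# Smallest and largest magnitudes of the A-law code space.
print(decode_alaw(bytes([0xD5, 0x55, 0xAA, 0x2A])))  # [8, -8, 32256, -32256]
```

At 8 kHz, each second of speech occupies 8000 bytes in this format, so the number of decoded samples divided by 8000 gives the utterance duration in seconds.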
Each speaker uttered the following items:
- 2 isolated digits
- 1 sequence of 10 isolated digits
- 4 connected digits: 1 sheet number (6 digits), 1 telephone number (9-11 digits), 1 credit card number (14-16 digits), 1 PIN code (6 digits)
- 3 dates: 1 spontaneous date (e.g. birthday), 1 prompted date (word style), 1 relative and general date expression.
- 1 word spotting phrase using an application word (embedded).
- 6 application words
- 3 spelled words: 1 spontaneous name (own forename), 1 city name, 1 real / artificial word for coverage.
- 1 currency money amount.
- 1 natural number.
- 6 directory assistance names: 1 surname (set of 500), 1 city of birth / growing up, 1 most frequent cities (set of 500), 1 most frequent company / agency (set of 500), 1 "forename surname" (set of 150), 1 spontaneous forename.
- 2 questions including "fuzzy" yes / no: 1 predominantly "Yes" question, 1 predominantly "No" question.
- 9 phonetically rich sentences.
- 2 time phrases: 1 time of day (spontaneous), 1 time phrase (word style).
- 4 phonetically rich words.
- Call environment.
The following age distribution has been obtained: 3 speakers are below 16 years old, 147 speakers are between 16 and 30, 149 speakers are between 31 and 45, 56 speakers are between 46 and 60, 48 speakers are over 60.
A pronunciation lexicon with a phonemic transcription in SAMPA is also included. -
C-001228: Special Scene and special domain dialogue corpus
Under the support of the HTRDP Corpora Resources for Chinese Language Processing and Intelligent Human-Machine Interaction, A special scene, special domain dialogue speech corpus has been
http://www.chineseldc.org/EN/doc/CLDC-LAC-2003-008/intro.htm -
C-001229: Swiss-French SpeechDat(II) FDB-3000
Telephone
The Swiss-French SpeechDat(II) FDB-3000 comprises 3000 Swiss-French speakers (1500 males, 1500 females) recorded over the Swiss fixed telephone network. This database is partitioned into 6 CDs, each of which comprises 500 speaker sessions. The speech databases made within the SpeechDat(II) project were validated by SPEX, the Netherlands, to assess their compliance with the SpeechDat format and content specifications.
Speech samples are stored as sequences of 8-bit 8 kHz A-law. Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.
The following items were recorded:
- 5 application words
- 1 sequence of 6 isolated digits including the hash (#) and the star (*)
- 3 connected digits: 1 sheet number, 1 telephone, 1 credit card number (16 digits)
- 2 dates: 1 spontaneous date, e.g. birthday, 1 prompted date, word style
- 3 spelled words from a list of names and titles
- 2 currency money amounts
- 2 numbers: 1 natural number, 1 quantity number (prompted)
- 1 place (province of longest residence)
- 7 optional items: 1 name (spelling table), 1 city name, 1 mother tongue of speaker (spontaneous), 1 education level of speaker (out of 3 choices), 1 type of telephone used, 1 query to telephone directory
- 1 free comment on session
- 1 yes/no question
- 10 phonetically rich sentences
- 1 time phrase (word style)
The following age distribution has been obtained: 69 speakers are below 16 years old, 1006 speakers are between 16 and 30, 944 speakers are between 31 and 45, 629 speakers are between 46 and 60, 311 speakers are over 60, and the age of 41 speakers is unknown.
A pronunciation lexicon with a phonemic transcription in SAMPA is also included. -
C-001230: Swiss-German SpeechDat(II) FDB-2000
Telephone
The Swiss-German SpeechDat(II) FDB-2000 comprises 2000 Swiss-German speakers (992 males, 1008 females) recorded over the Swiss fixed telephone network. This database is partitioned into 6 CDs. The speech databases made within the SpeechDat(II) project were validated by SPEX, the Netherlands, to assess their compliance with the SpeechDat format and content specifications.
Speech samples are stored as sequences of 8-bit 8 kHz A-law. Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.
The following items were recorded:
- 3 application words
- 1 sequence of 9 isolated digits including the hash (#) and the star (*)
- 1 sequence of isolated digits (only digits which are not represented in B1)
- 3 connected digits: 1 area code, 1 spontaneous phone number, 1 credit card number (16 or 15 digits)
- 3 dates: 1 spontaneous date, e.g. birthday, 2 prompted dates
- 3 word spotting phrases using an application word (embedded)
- 1 isolated digit
- 4 spelled words from a list of proper names and cities
- 1 currency money amount
- 1 natural number
- 1 place of education (spontaneous)
- 1 type of telephone used (spontaneous)
- 1 query to telephone directory (spontaneous)
- 2 yes/no questions: one about smoker/non-smoker status and one about the speaker's sex
- 9 phonetically rich sentences
- 2 time phrases: 1 time of day (spontaneous), 1 time phrase (word style)
- 4 phonetically rich words
The following age distribution has been obtained: 33 speakers are below 16 years old, 565 speakers are between 16 and 30, 623 speakers are between 31 and 45, 442 speakers are between 46 and 60, and 337 speakers are over 60.
A pronunciation lexicon with a phonemic transcription in SAMPA is also included. -
C-001231: TC-STAR 2005 Evaluation Package - ASR English
Desktop/Microphone
TC-STAR is a European integrated project focusing on Speech-to-Speech Translation (SST). To encourage significant breakthroughs in all SST technologies, annual open competitive evaluations are organized. Automatic Speech Recognition (ASR), Spoken Language Translation (SLT) and Text-To-Speech (TTS) are evaluated independently and within an end-to-end system.
The first TC-STAR evaluation campaign took place in March 2005.
Two core technologies were evaluated during the campaign:
- Automatic Speech Recognition (ASR),
- Spoken Language Translation (SLT).
Each evaluation package includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the first evaluation campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.
The speech databases made within the TC-STAR project were validated by SPEX, in the Netherlands, to assess their compliance with the TC-STAR format and content specifications.
This package includes the material used for the TC-STAR 2005 Automatic Speech Recognition (ASR) first evaluation campaign for the English language. The same packages are available for both Spanish (ELRA-E0003) and Mandarin (ELRA-E0004) for ASR and for SLT in 3 directions, English-to-Spanish (ELRA-E0005), Spanish-to-English (ELRA-E0006), Chinese-to-English (ELRA-E0007).
To be able to chain the components, ASR and SLT evaluation tasks were designed to use common sets of raw data and conditions. Two evaluation tasks, common to ASR and SLT, were selected: EPPS (European Parliament Plenary Sessions) task and VOA (Voice of America) task. This package was used within the EPPS task and consists of 2 data sets:
- Development data set: consists of audio recordings of Parliament sessions from 25 to 28 October 2004, manually transcribed. Approximately 3.5 hours of recordings were selected and transcribed, corresponding to approximately 35,000 running words in English.
- Test data set: consists of audio recordings of Parliament sessions from 15 to 18 November 2004. Like the development set, the test data set comprises 3.5 hours (35,000 running words).
- hasVersion: C-001233: TC-STAR 2005 Evaluation Package - ASR Spanish
- hasVersion: C-001232: TC-STAR 2005 Evaluation Package - ASR Mandarin Chinese
- hasVersion: N-001235: TC-STAR 2005 Evaluation Package - SLT English-to-Spanish
- hasVersion: N-001236: TC-STAR 2005 Evaluation Package - SLT Spanish-to-English
- hasVersion: N-001234: TC-STAR 2005 Evaluation Package - SLT Chinese-to-English
-
C-001232: TC-STAR 2005 Evaluation Package - ASR Mandarin Chinese
Desktop/Microphone
TC-STAR is a European integrated project focusing on Speech-to-Speech Translation (SST). To encourage significant breakthroughs in all SST technologies, annual open competitive evaluations are organized. Automatic Speech Recognition (ASR), Spoken Language Translation (SLT) and Text-To-Speech (TTS) are evaluated independently and within an end-to-end system.
The first TC-STAR evaluation campaign took place in March 2005.
Two core technologies were evaluated during the campaign:
- Automatic Speech Recognition (ASR),
- Spoken Language Translation (SLT).
Each evaluation package includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the first evaluation campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.
The speech databases made within the TC-STAR project were validated by SPEX, in the Netherlands, to assess their compliance with the TC-STAR format and content specifications.
This package includes the material used for the TC-STAR 2005 Automatic Speech Recognition (ASR) first evaluation campaign for the Mandarin Chinese language. The same packages are available for both English (ELRA-E0002) and Spanish (ELRA-E0003) for ASR and for SLT in 3 directions, English-to-Spanish (ELRA-E0005), Spanish-to-English (ELRA-E0006), Chinese-to-English (ELRA-E0007).
To be able to chain the components, ASR and SLT evaluation tasks were designed to use common sets of raw data and conditions. Two evaluation tasks, common to ASR and SLT, were selected: EPPS (European Parliament Plenary Sessions) task and VOA (Voice of America) task. This package was used within the VOA task and consists of 2 data sets:
- Development data set: consists of 3 hours of audio recordings from Mandarin Voice of America news broadcasts between 1 and 3 December 1998, corresponding to approximately 42,000 Chinese characters.
- Test data set: consists of 3 hours of audio recordings from news broadcasts between 14 and 22 December 1998, corresponding to 44,000 Chinese characters.
- hasVersion: C-001231: TC-STAR 2005 Evaluation Package - ASR English
- hasVersion: C-001233: TC-STAR 2005 Evaluation Package - ASR Spanish
- hasVersion: N-001235: TC-STAR 2005 Evaluation Package - SLT English-to-Spanish
- hasVersion: N-001236: TC-STAR 2005 Evaluation Package - SLT Spanish-to-English
- hasVersion: N-001234: TC-STAR 2005 Evaluation Package - SLT Chinese-to-English
-
C-001233: TC-STAR 2005 Evaluation Package - ASR Spanish
Desktop/Microphone
TC-STAR is a European integrated project focusing on Speech-to-Speech Translation (SST). To encourage significant breakthroughs in all SST technologies, annual open competitive evaluations are organized. Automatic Speech Recognition (ASR), Spoken Language Translation (SLT) and Text-To-Speech (TTS) are evaluated independently and within an end-to-end system.
The first TC-STAR evaluation campaign took place in March 2005.
Two core technologies were evaluated during the campaign:
- Automatic Speech Recognition (ASR),
- Spoken Language Translation (SLT).
Each evaluation package includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the first evaluation campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.
The speech databases made within the TC-STAR project were validated by SPEX, in the Netherlands, to assess their compliance with the TC-STAR format and content specifications.
This package includes the material used for the TC-STAR 2005 Automatic Speech Recognition (ASR) first evaluation campaign for the Spanish language. The same packages are available for both English (ELRA-E0002) and Mandarin (ELRA-E0004) for ASR and for SLT in 3 directions, English-to-Spanish (ELRA-E0005), Spanish-to-English (ELRA-E0006), Chinese-to-English (ELRA-E0007).
To be able to chain the components, ASR and SLT evaluation tasks were designed to use common sets of raw data and conditions. Two evaluation tasks, common to ASR and SLT, were selected: EPPS (European Parliament Plenary Sessions) task and VOA (Voice of America) task. This package was used within the EPPS task and consists of 2 data sets:
- Development data set: consists of audio recordings of Parliament sessions from 25 to 28 October 2004, manually transcribed. 3.75 hours of recordings were selected and transcribed, corresponding to approximately 33,000 running words in Spanish.
- Test data set: consists of audio recordings of Parliament sessions from 15 to 18 November 2004. Like the development set, the test data set comprises 3.75 hours (33,000 running words).
- hasVersion: C-001231: TC-STAR 2005 Evaluation Package - ASR English
- hasVersion: C-001232: TC-STAR 2005 Evaluation Package - ASR Mandarin Chinese
- hasVersion: N-001235: TC-STAR 2005 Evaluation Package - SLT English-to-Spanish
- hasVersion: N-001236: TC-STAR 2005 Evaluation Package - SLT Spanish-to-English
- hasVersion: N-001234: TC-STAR 2005 Evaluation Package - SLT Chinese-to-English