Language resource #: 3330
-
C-004509: Norwegian EUROM1
Desktop/Microphone
EUROM1 is the first truly multilingual speech database produced in Europe. Equivalent corpora were collected for each of the European languages, with the same number of speakers selected in the same way and recorded under the same conditions with common file formats. Initially, eight European countries made recordings: Italy, United Kingdom, Germany, Netherlands, Denmark, Sweden, Norway and France. Additional recordings were then completed in Greece, Spain and Portugal, thanks to the CEE Esprit Project SAM-A. More than sixty speakers were recorded per language.
The content consists of:
1) Continuous speech:
40 passages, each made up of five task-related sentences.
2) Numbers:
The numbers were divided into five blocks, each containing twenty numbers. Each block was recorded as one single take.
3) CVC words:
The CVC word lists contain five list types and also carrier phrases of the suggested type.
- hasVersion: C-000061: EUROM1g German
- hasVersion: C-000915: EUROM1f French
- hasVersion: C-000916: EUROM1i
- hasVersion: C-001403: EUROM1e English
- hasVersion: C-004471: Swedish EUROM1
- hasVersion: C-004488: Danish EUROM1
-
C-004510: SpeechDat(M) Italian Mobile Network Speech Database
Telephone
The SpeechDat(M) Italian Mobile Network Speech Database contains the recordings of 342 speakers (156 males, 186 females) of Italian recorded over the mobile telephone network. This database is distributed on 1 CD-ROM. The database complies with the common specifications created in the SpeechDat project.
Speech samples are stored as sequences of 8-bit 8 kHz A-law. Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.
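As an illustration of the storage format described above, the following is a minimal sketch of expanding 8-bit A-law bytes to 16-bit linear PCM using the standard G.711 expansion. It assumes the signal files are headerless raw bytes, which should be checked against the accompanying SAM label file; the function names are hypothetical and not part of the database distribution.

```python
SAMPLE_RATE_HZ = 8_000  # 8 kHz, as stated in the database description


def alaw_byte_to_linear(code: int) -> int:
    """Expand one G.711 A-law byte to a signed 16-bit linear PCM value."""
    code ^= 0x55                       # undo the bit inversion applied by the encoder
    mantissa = (code & 0x0F) << 4      # 4-bit mantissa, shifted into position
    segment = (code & 0x70) >> 4       # 3-bit segment (exponent)
    if segment == 0:
        value = mantissa + 8
    else:
        value = (mantissa + 0x108) << (segment - 1)
    # In A-law, a set sign bit marks a positive sample.
    return value if code & 0x80 else -value


def decode_alaw(raw: bytes) -> list[int]:
    """Decode a buffer of A-law bytes (assumed headerless) to linear samples."""
    return [alaw_byte_to_linear(b) for b in raw]


def duration_seconds(raw: bytes) -> float:
    """One byte per sample at 8 kHz."""
    return len(raw) / SAMPLE_RATE_HZ
```

A buffer read with `open(path, "rb").read()` can be passed directly to `decode_alaw`; the path and file naming depend on the distribution and are not specified here.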
Each speaker uttered the following items:
6 application words
3 digit strings: prompt sheet number, telephone number, credit card number
3 dates : 1 spontaneous calling date, 2 prompted dates (word style)
3 application word phrases
1 isolated digit
3 spelled words: city name, forename, surname
2 money amounts
3 natural numbers
1 place: city of birth
3 spontaneous yes/no questions
9 phonetically rich sentences
3 time phrases : 1 time of day (spontaneous), 2 time phrases (prompted)
The following age distribution has been obtained: 8 speakers are under 16, 191 are between 16 and 30, 85 are between 31 and 45, 49 are between 46 and 60, and 9 speakers are over 60.
A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
-
C-004511: TC-STAR female baseline voice: Laura
Desktop/Microphone
Laura was created within the scope of the TC-STAR project (IST-FP6-506738) funded by the European Commission.
Laura contains the recordings of one female English (British) speaker recorded in a noise-reduced room through a headset microphone. It consists of the recordings and annotations of read text material of approximately 10 hours of speech for baseline applications (Text-to-Speech systems). This database is distributed on 9 DVDs. The database complies with the common specifications created in the TC-STAR project.
The annotation of the database includes manual orthographic transcriptions, the automatic segmentation into phonemes and automatic generation of pitch marks. A certain percentage of phonetic segments and pitch marks has been manually checked. A pronunciation lexicon in SAMPA with POS, lemma and phonetic transcription of all the words prompted and spoken is also provided.
Speech samples are stored as sequences of 24-bit signed integers sampled at 96 kHz, with the least significant byte first (lohi or Intel format). Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.
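The 24-bit little-endian layout has no native fixed-width type in most languages, so samples have to be assembled from three bytes each. The sketch below shows one way to do this in Python, assuming headerless single-channel raw data (an assumption; the SAM label file should be consulted for the actual layout). The function name is hypothetical.

```python
BYTES_PER_SAMPLE = 3      # 24-bit samples
SAMPLE_RATE_HZ = 96_000   # 96 kHz, as stated in the database description


def decode_pcm24_le(raw: bytes) -> list[int]:
    """Decode signed 24-bit little-endian (lohi / Intel) samples."""
    if len(raw) % BYTES_PER_SAMPLE:
        raise ValueError("byte count is not a multiple of 3")
    return [
        int.from_bytes(raw[i:i + BYTES_PER_SAMPLE], "little", signed=True)
        for i in range(0, len(raw), BYTES_PER_SAMPLE)
    ]
```

The same decoding applies to any of the TC-STAR baseline voices that share this sample format.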
The TC-STAR male baseline voice: Ian is also available via ELRA under reference ELRA-S0303.
- hasVersion: C-004512: TC-STAR male baseline voice: Ian
-
C-004512: TC-STAR male baseline voice: Ian
Desktop/Microphone
Ian was created within the scope of the TC-STAR project (IST-FP6-506738) funded by the European Commission.
Ian contains the recordings of one male English (British) speaker recorded in a noise-reduced room through a headset microphone. It consists of the recordings and annotations of read text material of approximately 10 hours of speech for baseline applications (Text-to-Speech systems). This database is distributed on 9 DVDs. The database complies with the common specifications created in the TC-STAR project.
The annotation of the database includes manual orthographic transcriptions, the automatic segmentation into phonemes and automatic generation of pitch marks. A certain percentage of phonetic segments and pitch marks has been manually checked. A pronunciation lexicon in SAMPA with POS, lemma and phonetic transcription of all the words prompted and spoken is also provided.
Speech samples are stored as sequences of 24-bit signed integers sampled at 96 kHz, with the least significant byte first (lohi or Intel format). Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.
The TC-STAR female baseline voice: Laura is also available via ELRA under reference ELRA-S0302.
- hasVersion: C-004511: TC-STAR female baseline voice: Laura
-
C-004514: BABEL Polish database
Desktop/Microphone
The BABEL Polish Database is a speech database that was produced by a research consortium funded by the European Union under the COPERNICUS programme (COPERNICUS Project 1304). The project began in March 1995 and was completed in December 1998. The objective was to create a database of languages of Central and Eastern Europe in parallel to the EUROM1 databases produced by the SAM Project (funded by the ESPRIT programme).
The BABEL consortium included six partners from Central and Eastern Europe (who had the major responsibility of planning and carrying out the recording and labelling) and six from Western Europe (whose role was mainly to advise and in some cases to act as host to BABEL researchers). The five databases collected within the project concern the Bulgarian, Estonian, Hungarian, Polish, and Romanian languages.
The Polish database consists of the basic "common" set which is:
The Many Talker Set: 30 males, 30 females; each to read 100 numbers, 3 connected passages and 5 filler sentences (or 4 passages if no fillers needed).
The Few Talker Set: 5 males, 5 females, normally selected from the above group: each to read 5 blocks of 100 numbers, 15 passages and 25 filler sentences (or 20 passages if fillers not needed), and 5 lists of syllables.
The Very Few Talker Set: 1 male, 1 female, selected from many-talker set: 5 blocks of syllables, with and without carrier sentences.
-
C-004517: Egyptian Arabic Speecon database
Desktop/Microphone
The Egyptian Arabic Speecon database is divided into 2 sets:
1) The first set comprises the recordings of 550 adult Egyptian speakers of Modern Standard Arabic as spoken in Egypt (273 males, 277 females), recorded over 4 microphone channels in 4 recording environments (office, entertainment, car, public place).
2) The second set comprises the recordings of 50 child Egyptian speakers of Modern Standard Arabic as spoken in Egypt (24 boys, 26 girls), recorded over 4 microphone channels in 1 recording environment (children room).
This database is partitioned into 25 DVDs (first set) and 4 DVDs (second set).
The speech databases made within the Speecon project were validated by SPEX, the Netherlands, to assess their compliance with the Speecon format and content specifications. Each of the four speech channels is recorded at 16 kHz, 16 bit, as uncompressed unsigned integers in Intel format (lo-hi byte order). Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.
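For the 16 kHz, 16-bit format described above, a minimal Python sketch of reading samples and estimating duration might look as follows. It takes the description at face value (unsigned integers, lo-hi byte order) and assumes one channel per headerless file; both points are assumptions to verify against the SAM label files. The function names are hypothetical.

```python
import struct

SAMPLE_RATE_HZ = 16_000   # 16 kHz, as stated in the database description
BYTES_PER_SAMPLE = 2      # 16-bit samples


def decode_u16_le(raw: bytes) -> tuple[int, ...]:
    """Unpack 16-bit little-endian unsigned samples; a trailing odd byte is ignored."""
    count = len(raw) // BYTES_PER_SAMPLE
    return struct.unpack(f"<{count}H", raw[:count * BYTES_PER_SAMPLE])


def duration_seconds(raw: bytes) -> float:
    """Duration of one channel: bytes / (bytes per sample * samples per second)."""
    return len(raw) / (BYTES_PER_SAMPLE * SAMPLE_RATE_HZ)
```

If the data turn out to be signed, replacing the `H` format character with `h` in the `struct.unpack` call gives signed decoding.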
Each speaker uttered the following items (over 290 items for adults and over 210 items for children):
Calibration data:
6 noise recordings
The silence word recording
Free spontaneous items (adults only):
5 minutes (session time) of free spontaneous, rich context items (story telling) (an open number of spontaneous topics out of a set of 30 topics)
17 Elicited spontaneous items (adults only):
3 dates, 2 times, 3 proper names, 2 city names, 1 letter sequence, 2 answers to questions, 3 telephone numbers, 1 language
Read speech:
30 phonetically rich sentences uttered by adults and 60 uttered by children
5 phonetically rich words (adults only)
4 isolated digits
1 isolated digit sequence
4 connected digit sequences
1 telephone number
3 natural numbers
1 money amount
2 time phrases (T1 : analogue, T2 : digital)
3 dates (D1 : analogue, D2 : relative and general date, D3 : digital)
3 letter sequences
1 proper name
2 city or street names
2 questions
2 special keyboard characters
1 Web address
1 email address
204 application specific words and phrases per session (adults)
74 toy commands, 14 phone commands and 34 general commands (children)
The following age distribution has been obtained:
Adults: 290 speakers are between 15 and 30, 166 speakers are between 31 and 45, 94 speakers are over 46.
Children: 24 speakers are between 8 and 10, and 26 speakers are between 11 and 14.
A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
- hasVersion: C-000095: Mandarin Chinese Speecon database
- hasVersion: C-000120: Portuguese Speecon database
- hasVersion: C-000136: Spanish Speecon database
- hasVersion: C-000415: German Speecon database
- hasVersion: C-000936: Finnish Speecon database
- hasVersion: C-000941: French Speecon database
- hasVersion: C-000946: Hebrew Speecon database
- hasVersion: C-000952: Italian Speecon database
- hasVersion: C-000955: Korean Speecon database
- hasVersion: C-000974: Polish Speecon database
- hasVersion: C-000977: Russian Speecon database
- hasVersion: C-000995: Swedish Speecon database
- hasVersion: C-001000: Turkish Speecon database
- hasVersion: C-001002: UK English Speecon database
- hasVersion: C-001237: Taiwan Mandarin Speecon database
- hasVersion: C-001530: Swiss-German Speecon database
- hasVersion: C-001553: US English Speecon database
- hasVersion: C-001554: US Spanish Speecon database
- hasVersion: C-003376: Japanese Speecon database
- hasVersion: C-003377: Danish Speecon Database
- hasVersion: C-003378: Dutch from the Netherlands Speecon Database
- hasVersion: C-003379: Dutch from Belgium Speecon Database
- hasVersion: C-003380: French-Canadian Speecon database
- hasVersion: C-004483: Cantonese Speecon database
- hasVersion: C-004484: Thai Speecon database
- hasVersion: C-004494: Hungarian Speecon database
- hasVersion: C-004495: Czech Speecon database
- hasVersion: C-004539: Catalan Speecon database
-
C-004525: TC-STAR Spanish Baseline Female Speech Database
Desktop/Microphone
The TC-STAR Spanish Baseline Female Speech Database was created within the scope of the TC-STAR project (IST-FP6-506738) funded by the European Commission.
It contains the recordings of one female Spanish speaker recorded in a noise-reduced room simultaneously through a close-talk microphone, a mid-distance microphone and a laryngograph. It consists of the recordings and annotations of read text material of approximately 10 hours of speech for baseline applications (Text-to-Speech systems). This database is distributed on 10 DVDs. The database complies with the common specifications created in the TC-STAR project.
The annotation of the database includes manual orthographic transcriptions, the automatic segmentation into phonemes and automatic generation of pitch marks. A certain percentage of phonetic segments and pitch marks has been manually checked. A pronunciation lexicon in SAMPA with POS, lemma and phonetic transcription of all the words prompted and spoken is also provided.
Speech samples are stored as sequences of 24-bit signed integers sampled at 96 kHz, with the least significant byte first (lohi or Intel format). Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.
The TC-STAR Spanish Baseline Male Speech Database is also available via ELRA under reference ELRA-S0310.
-
C-004526: TC-STAR Spanish Baseline Male Speech Database
Desktop/Microphone
The TC-STAR Spanish Baseline Male Speech Database was created within the scope of the TC-STAR project (IST-FP6-506738) funded by the European Commission.
It contains the recordings of one male Spanish speaker recorded in a noise-reduced room simultaneously through a close-talk microphone, a mid-distance microphone and a laryngograph. It consists of the recordings and annotations of read text material of approximately 10 hours of speech for baseline applications (Text-to-Speech systems). This database is distributed on 9 DVDs. The database complies with the common specifications created in the TC-STAR project.
The annotation of the database includes manual orthographic transcriptions, the automatic segmentation into phonemes and automatic generation of pitch marks. A certain percentage of phonetic segments and pitch marks has been manually checked. A pronunciation lexicon in SAMPA with POS, lemma and phonetic transcription of all the words prompted and spoken is also provided.
Speech samples are stored as sequences of 24-bit signed integers sampled at 96 kHz, with the least significant byte first (lohi or Intel format). Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.
The TC-STAR Spanish Baseline Female Speech Database is also available via ELRA under reference ELRA-S0309.
-
C-004527: TC-STAR Bilingual Voice-Conversion Spanish Speech Database
Desktop/Microphone
4 hours and 80 minutes of speech as spoken by 2 female speakers and 2 male speakers, covering both mimics and parallel voice conversion data.
-
C-004528: TC-STAR Bilingual Voice-Conversion English Speech Database
Desktop/Microphone
4 hours and 80 minutes of speech as spoken by 2 female speakers and 2 male speakers, covering both mimics and parallel voice conversion data.