Registered language resources: 3,330
Showing items 521 - 530 of 2,023
-
C-000974: Polish Speecon database
Desktop/Microphone
The Polish Speecon database is divided into 2 sets:
1) The first set comprises the recordings of 550 adult Polish speakers (286 males, 264 females), recorded over 4 microphone channels in 4 recording environments (office, entertainment, car, public place).
2) The second set comprises the recordings of 50 child Polish speakers (25 boys, 25 girls), recorded over 4 microphone channels in 1 recording environment (children's room).
This database is partitioned into 26 DVDs (first set) and 3 DVDs (second set).
The speech databases made within the Speecon project were validated by SPEX, the Netherlands, to assess their compliance with the Speecon format and content specifications.
Each of the four speech channels is recorded at 16 kHz, 16 bit, as uncompressed unsigned integers in Intel format (lo-hi byte order). Each signal file has a corresponding ASCII SAM label file containing the relevant descriptive information.
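As a rough illustration of the signal format described above, the following Python sketch decodes 16-bit, lo-hi (little-endian), unsigned samples and derives the duration at 16 kHz. It assumes the signal files are headerless raw sample streams, which the description does not state explicitly; the function names are hypothetical and not part of the database distribution.

```python
import struct

SAMPLE_RATE_HZ = 16_000  # sampling rate given in the catalog description
BYTES_PER_SAMPLE = 2     # 16-bit samples


def decode_samples(raw: bytes) -> list[int]:
    """Decode 16-bit unsigned samples stored in lo-hi (little-endian) order.

    Assumption: the data is a headerless stream of samples.
    """
    count = len(raw) // BYTES_PER_SAMPLE
    # "<" selects little-endian byte order, "H" an unsigned 16-bit integer.
    return list(struct.unpack(f"<{count}H", raw[: count * BYTES_PER_SAMPLE]))


def duration_seconds(raw: bytes) -> float:
    """Return the duration of one channel's raw data in seconds."""
    return (len(raw) // BYTES_PER_SAMPLE) / SAMPLE_RATE_HZ


def read_signal(path: str) -> list[int]:
    """Read one signal file (hypothetical path) and return its samples."""
    with open(path, "rb") as handle:
        return decode_samples(handle.read())
```

If the files turn out to hold signed samples, the format character `"H"` would need to become `"h"`; the label file should be treated as the authority on the actual encoding.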
Each speaker uttered the following items:
Calibration data:
6 noise recordings
The silence word recording
Free spontaneous items (adults only):
3 minutes (session time) of free spontaneous, rich context items (story telling) (an open number of spontaneous topics out of a set of 30 topics)
17 Elicited spontaneous items (adults only):
3 dates, 2 times, 3 proper names, 2 city names, 1 letter sequence, 2 answers to questions, 3 telephone numbers, 1 language
Read speech:
30 phonetically rich sentences uttered by adults and 60 uttered by children
5 phonetically rich words (adults only)
4 isolated digits
1 isolated digit sequence
4 connected digit sequences
1 telephone number
3 natural numbers
1 money amount
2 time phrases (T1 : analogue, T2 : digital)
3 dates (D1 : analogue, D2 : relative and general date, D3 : digital)
3 letter sequences
1 proper name
2 city or street names
2 questions
5 special keyboard characters (2 by children)
1 Web address
1 email address
208 application specific words and phrases per session (adults)
74 toy commands, 34 general commands and 14 phone commands (children)
The following age distribution has been obtained:
Adults: 286 speakers are between 15 and 30, 165 speakers are between 31 and 45, 79 speakers are between 46 and 60, and 20 speakers are over 60.
Children: 23 speakers are between 8 and 10, 27 speakers are between 11 and 14.
A pronunciation lexicon with a phonemic transcription in SAMPA is also included. -
C-000975: PolyVar
Telephone
PolyVar is a speaker verification database comprising native and non-native speakers of French, mainly from Switzerland but also from other European countries. It consists of read and spontaneous speech recorded by 143 speakers (85 male and 58 female) amounting to 160 hours of speech. Each speaker recorded from 1 to 229 sessions, giving a total of 3,600 recorded sessions. The data are provided with orthographic annotation.
The number of calls per speaker is as follows:
13 speakers called 100 times
9 speakers called from 51 to 100 times
16 speakers called from 21 to 50 times
3 speakers called from 11 to 20 times
31 speakers called from 2 to 10 times
71 speakers called only once
Each speaker uttered up to 53 different items per session, including:
* 3 sequences of digits (1 ID number, 1 credit card number and 1 sequence of 6 digits)
* 24 application words (17 words about tourism in Martigny)
* 10 read sentences
* 4 numbers (2 natural numbers, 2 amounts)
* 2 items with dates (1 read/1 spontaneous)
* 2 items with hours (1 read/1 spontaneous)
* 2 spelled words
* 3 spontaneous answers (questions about their gender, native language and the weather)
* 1 comment
* 1 telephone enquiry
File format: 8-bit a-law
Standard in use: NIST
Sampling rate: 8 kHz
Medium: 8 CD-ROMs
See also ELRA-S0047. -
C-000976: RVG1 (Regional Variants of German 1, Part 1)
Desktop/Microphone
The corpus consists of single digits, connected digits, phone numbers, phonetically balanced sentences, computer command phrases and spontaneous speech. Each speaker has read a subcorpus of 85 items:
* 11 single digits (0-9, with the two pronunciations of 2 (`zwei', `zwo')),
* 19 connected digits (10-19, 20-100 in steps of ten),
* 12 computer command phrases,
* 30 phonetically balanced sentences,
* 5 6-digit phone numbers,
* 5 7-digit phone numbers,
* 2 phone numbers with area code,
* 1 minute spontaneous speech (monologue).
The speaker was seated in front of a standard IBM-compatible PC. Background noise was limited to the usual office noise, e.g. door slams, background crosstalk, phone ringing, paper rustling and PC noise. The speaker's head was 2-4 feet from the screen and 1-2 feet from the desktop microphones. The speaker was not forced into a particular position. The speaker wore a Sennheiser HD 410 and was free to use the keyboard or the mouse in front of them. The three desktop microphones were: Sennheiser MD 441 U, Telex (Soundblaster) and Talk Back (AT&T). Speakers were selected to reflect the demographic density of the German-speaking areas in Europe (including Austria and Switzerland).
The recorded sound samples are stored in NIST SPHERE format. The resolution is 16 bits. The sampling frequency is 22.05 kHz, except for speakers 001 to 036, who were recorded at 11.025 kHz. Each microphone channel is stored in a separate file. A transliteration of the spontaneous speech in Verbmobil format is also provided.
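Because the sampling frequency differs between speakers, it can be useful to read it from each file rather than assume it. The sketch below parses the plain-text header that NIST SPHERE files begin with (a `NIST_1A` line, a header-size line, then `name -type value` fields up to `end_head`). The function name is hypothetical, and the field names in the usage comment are common SPHERE fields that are assumed, not confirmed, to be present in this corpus.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header at the start of a NIST SPHERE file.

    `data` must contain at least the complete header.
    """
    # The first 16 bytes hold the magic line and the header size line.
    intro = data[:16].decode("ascii", errors="replace").split("\n")
    if intro[0].strip() != "NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    header_size = int(intro[1].strip())

    fields = {}
    text = data[:header_size].decode("ascii", errors="replace")
    for line in text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, kind, value = parts
        if kind == "-i":        # integer field
            fields[name] = int(value)
        elif kind == "-r":      # real-valued field
            fields[name] = float(value)
        else:                   # "-sN": string field of length N
            fields[name] = value
    return fields


# Usage with a hypothetical file name:
# with open("speaker001.nis", "rb") as handle:
#     header = parse_sphere_header(handle.read(4096))
# header.get("sample_rate")
```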
RVG1, Part 1 contains 197 speakers recorded through 2 microphones.
(RVG1, Part 2, with 303 speakers recorded through 2 microphones, will be available from the beginning of 1999.)
- isVersionOf: C-001513: RVG-J (Regional Variants of German J)
-
C-000977: Russian Speecon database
Desktop/Microphone
The Russian Speecon database is divided into 2 sets:
1. The first set comprises the recordings of 550 adult Russian speakers (271 males, 279 females), recorded over 4 microphone channels in 4 recording environments (office, entertainment, car, public place).
2. The second set comprises the recordings of 50 child Russian speakers (31 boys, 19 girls), recorded over 4 microphone channels in 1 recording environment (children's room).
This database is partitioned into 21 DVDs (first set) and 3 DVDs (second set).
The speech databases made within the Speecon project were validated by SPEX, the Netherlands, to assess their compliance with the Speecon format and content specifications. Each of the four speech channels is recorded at 16 kHz, 16 bit, as uncompressed unsigned integers in Intel format (lo-hi byte order). Each signal file has a corresponding ASCII SAM label file containing the relevant descriptive information.
Each speaker uttered the following items:
* Calibration data:
o 6 noise recordings
o The silence word recording
* Free spontaneous items (adults only):
o 5 minutes (session time) of free spontaneous, rich context items (story telling) (an open number of spontaneous topics out of a set of 30 topics)
* 17 Elicited spontaneous items (adults only):
o 3 dates
o 2 times
o 3 proper names
o 2 city names
o 1 letter sequence
o 2 answers to questions
o 3 telephone numbers
o 1 language
* Read speech:
o 30 phonetically rich sentences uttered by adults and 60 uttered by children
o 5 phonetically rich words (adults only)
o 4 isolated digits
o 1 isolated digit sequence
o 4 connected digit sequences
o 1 telephone number
o 3 natural numbers
o 1 money amount
o 2 time phrases (T1 : analogue, T2 : digital)
o 3 dates (D1 : analogue, D2 : relative and general date, D3 : digital)
o 3 letter sequences
o 1 proper name
o 2 city or street names
o 2 questions
o 2 special keyboard characters
o 1 Web address
o 1 email address
o 208 application specific words and phrases per session (adults)
o 74 toy commands and 48 general commands (children)
The following age distribution has been obtained:
* Adults: 290 speakers are between 15 and 30, 187 speakers are between 31 and 45, 73 speakers are between 46 and 60.
* Children: 28 speakers are between 8 and 10, 22 speakers are between 11 and 14.
A pronunciation lexicon with a phonemic transcription in SAMPA is also included. -
C-000985: SIEMENS 1000 - SI1000
Desktop/Microphone
The corpus contains read speech from 10 different speakers. Each speaker read approximately 1,000 sentences from a German newspaper corpus (similar to the Siemens 100 - SI100 corpus described herein). The corpus is distributed on 5 CD-ROMs. -
C-000986: SieTill (Siemens Tillman)
Telephone
The SieTill database consists of 36,000 utterances from 730 speakers (338 female, 392 male). Recordings were made via an Automatic Speech Server attached to an ISDN line. The speakers called from different locations in Germany using their personal telephone sets.
The main German dialects are covered. Recorded Items per speaker include:
· 3 digit-sequences (continuous): 34 combinations
· 5 digit-sequences (continuous): 2 combinations
· date of call, time of call
· birthday of caller, first name of caller
· spelling of first name
· "ja", "nein", "richtig", "falsch"
· arbitrary amount of money
· arbitrary date
· arbitrary telephone number (arbitrary speaking style and isolated digits) -
C-000987: Siemens Russian SpeechDat-like FDB-1000
Telephone
This Russian SpeechDat-like FDB-1000 database contains the recordings of 1,000 speakers (500 males, 500 females) from 5 different regions, mainly Moscow and St. Petersburg (803 speakers), recorded over the fixed telephone network.
The database is partitioned into 4 CD-ROMs. Speech samples are stored as sequences of 8 bits 8 kHz A-law, and the data are stored in a SAM file format.
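For readers unfamiliar with A-law storage, the following Python sketch expands 8-bit A-law bytes to linear PCM values using the standard ITU-T G.711 expansion. It assumes the speech files are headerless A-law byte streams, which the description implies but does not spell out; the function names are hypothetical.

```python
def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit A-law code to a linear 16-bit-range PCM value."""
    code ^= 0x55                     # undo the even-bit inversion of G.711
    sign = code & 0x80               # a set bit marks a positive sample
    exponent = (code >> 4) & 0x07    # segment number
    mantissa = code & 0x0F           # position within the segment
    if exponent == 0:
        magnitude = (mantissa << 4) + 8
    else:
        magnitude = ((mantissa << 4) + 0x108) << (exponent - 1)
    return magnitude if sign else -magnitude


def decode_alaw(raw: bytes) -> list[int]:
    """Decode a headerless A-law byte stream (assumed layout) to PCM values."""
    return [alaw_to_linear(byte) for byte in raw]


def duration_seconds(raw: bytes, sample_rate_hz: int = 8_000) -> float:
    """One byte per sample, so duration is byte count over sampling rate."""
    return len(raw) / sample_rate_hz
```

At 8 kHz and one byte per sample, one second of speech occupies 8,000 bytes, which makes it straightforward to estimate recording time from file sizes.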
The whole database consists of 72 hours of speech, with approx. 49 prompted utterances per speaker.
It was validated and accepted according to the SpeechDat(II) database exchange format.
Each speaker uttered the following items:
* Isolated and connected digits
* Natural numbers
* Money amounts
* Spelled words
* Time and date phrases
* Yes/no questions
* City names
* Common application words
* Application words in phrases
* Phonetically rich sentences
The following age distribution has been obtained: 16 speakers are under 16, 340 speakers are between 16 and 30, 345 speakers are between 31 and 45, 255 speakers are between 46 and 60, and 44 speakers are over 60.
The database is provided with orthographic transliteration for all 48,812 utterances including 4 categories of non-speech acoustic events. A phonetic lexicon with canonical pronunciation in SAMPA is also provided. -
C-000988: Siemens Shanghai Mandarin FDB-1000
Telephone
The Shanghai Mandarin FDB-1000 database contains the recordings of 1,000 speakers (500 males, 500 females) recorded over the fixed telephone network. This acoustic database gathers Mandarin data, as spoken in Shanghai as a first or second Chinese dialect/language.
The corpus consists of read speech, including digits and application words for teleservices, recorded through an ISDN card. Chinese characters and English translation are included, as well as canonical Pinyin transcription including tone markers, and several categories of non-speech events.
Speech samples are stored as sequences of 8 bits 8 kHz A-law. Signal and annotation files are stored separately.
Each speaker uttered about 70 items, which consist of isolated digits, yes/no questions, common application words and phrases.
A pronunciation lexicon with a phonemic transcription in SAMPA is also included. -
C-000989: Slovenian SpeechDat(II) FDB-1000
Telephone
The Slovenian SpeechDat(II) FDB-1000 database contains the recordings of 1,000 Slovenian speakers (500 males and 500 females) from different dialect regions of Slovenia, recorded over the Slovenian fixed telephone network.
The database is partitioned into CD-ROMs. Speech samples are stored as sequences of 8 bits 8 kHz A-law, and the data are stored in a SAM file format.
It was validated and accepted according to the SpeechDat(II) database exchange format.
Each speaker uttered the following items:
* Isolated and connected digits
* Natural numbers
* Money amounts
* Spelled words
* Time and date phrases
* Yes/no questions
* City names
* Common application words
* Application words in phrases
* Phonetically rich sentences
A phonetic lexicon with canonical transcriptions in SAMPA is also provided. -
C-000990: Spanish TTS Speech Corpus (Appen)
Desktop/Microphone
The Spanish TTS Speech Corpus contains the recordings of 1 native Spanish speaker (male, 28 years old) recorded in a studio over 1 channel (Shure SM15 unidirectional professional head-worn condenser microphone). The data collection and transcription were performed by Appen (Australia).
Speech samples are stored as sequences of 16-bit 22.05 kHz PCM in uncompressed WAV files.
The speaker read 1,787 prompted sentences covering all legal triphones and diphones.
The database is provided with orthographic transcriptions in SAMPA, including canonical and alternative pronunciation, and syllable, stress and acoustic events markings. All transcriptions were segmented at the utterance (sentence/command word) level, annotated at the word level and checked manually. A pronunciation lexicon including 3,748 headwords (plus variants) is also available.
This database is intended for use in text-to-speech and speech synthesis applications.