Language Resource Search - SHACHI: Language Resource Metadata Database

Registered language resources: 3330. Showing items 61 - 70 of 2023.

C-000128: SALA Spanish Venezuelan Database
Telephone
The SALA Spanish Venezuelan database contains the recordings of 1,000 Venezuelan speakers (504 males, 496 females) recorded over the Venezuelan fixed telephone network. This database is partitioned into 5 CD-ROMs. The speech files are stored as sequences of 8-bit, 8 kHz mu-law speech files and are not compressed, according to the specifications of SALA. Each prompt utterance is stored within a separate file and has an accompanying ASCII SAM label file.
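As a rough illustration of the storage format described above, the following Python sketch expands 8-bit mu-law bytes into 16-bit linear samples using the standard G.711 expansion. The function names are hypothetical, and the sketch assumes headerless files with one byte per sample, which the description suggests but does not state outright.

```python
SAMPLE_RATE_HZ = 8000  # from the description: 8-bit, 8 kHz mu-law


def mulaw_to_linear(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF                 # mu-law bytes are stored bit-inverted
    t = ((u & 0x0F) << 3) + 0x84     # 4-bit mantissa plus the bias (132)
    t <<= (u & 0x70) >> 4            # scale by the 3-bit segment number
    return (0x84 - t) if (u & 0x80) else (t - 0x84)


def decode_mulaw(raw: bytes) -> list[int]:
    """Decode a whole buffer (assumed headerless, one byte per sample)."""
    return [mulaw_to_linear(b) for b in raw]


def duration_seconds(raw: bytes) -> float:
    """Duration of a buffer at 8 kHz, one byte per sample."""
    return len(raw) / SAMPLE_RATE_HZ
```

At one byte per sample and 8,000 samples per second, each second of speech occupies 8,000 bytes under these assumptions.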

This speech database was validated by SPEX (the Netherlands) to assess its compliance with the SALA format and content specifications.

Each speaker uttered the following items:

* 6 application words
* 1 sequence of 10 isolated digits
* 4 connected digits (1 sheet number -6 digits, 1 telephone number 9/11 digits, 1 credit card number 14/16 digits, 1 PIN code -6 digits)
* 3 dates (1 spontaneous date e.g. birthday, 1 word style prompted date, 1 relative and general date expression)
* 1 spotting phrase using an embedded application word
* 1 isolated digit
* 3 spelled words (1 surname, 1 directory assistance city name, 1 real/artificial name for coverage)
* 1 currency money amount
* 1 natural number
* 5 directory assistance names (1 surname out of a set of 500, 1 city of birth/growing up, 1 most frequent city out of a set of 500, 1 most frequent company/agency out of a set of 500, 1 "forename surname" out of a set of 150)
* 2 yes/no questions (1 predominantly "yes" question, 1 predominantly "no" question)
* 9 phonetically rich sentences
* 1 additional sentence
* 2 time phrases (1 spontaneous time of day, 1 word style time phrase)
* 4 phonetically rich words

The following age distribution has been obtained: 7 speakers are under 16, 476 speakers are between 16 and 30, 330 speakers are between 31 and 45, 177 speakers are between 46 and 60, and 10 speakers are over 60.
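The figures in this entry can be cross-checked with a few lines of Python. This is only a sanity-check sketch: the dictionary names and bucket labels are made up for illustration, while the counts are copied from the description above.

```python
# Counts copied from the SALA Spanish Venezuelan entry.
speakers_by_gender = {"male": 504, "female": 496}
speakers_by_age = {
    "under 16": 7,
    "16-30": 476,
    "31-45": 330,
    "46-60": 177,
    "over 60": 10,
}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each bucket as a percentage of the total, to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


gender_total = sum(speakers_by_gender.values())
age_total = sum(speakers_by_age.values())
age_shares = shares(speakers_by_age)
```

Both breakdowns add up to the stated 1,000 speakers, and the 16 to 30 bucket alone accounts for just under half of them.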
C-000133: Slovak SpeechDat(E) Database
Telephone
The Slovak SpeechDat(E) Database (Eastern European Speech Databases for Creation of Voice Driven Teleservices) comprises 1000 Slovak speakers (498 males, 502 females) recorded over the Slovak fixed telephone network. This database is partitioned into 5 CDs. The speech databases made within the SpeechDat(E) project were validated by SPEX, the Netherlands, to assess their compliance with the SpeechDat(E) format and content specifications.
The speech files are stored as sequences of 8-bit, 8kHz A-law speech files and are not compressed, according to the specifications of SpeechDat(E). Each prompt utterance is stored within a separate file and has an accompanying ASCII SAM label file.
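For readers who need linear samples, the sketch below shows the standard G.711 A-law expansion in Python. The function name is hypothetical, and the code assumes headerless files with one byte per sample; the entry itself only states the encoding and sampling rate.

```python
def alaw_to_linear(byte: int) -> int:
    """Expand one G.711 A-law byte to a signed 16-bit linear sample."""
    a = byte ^ 0x55              # A-law bytes have alternate bits inverted
    t = (a & 0x0F) << 4          # 4-bit mantissa
    seg = (a & 0x70) >> 4        # 3-bit segment number
    if seg == 0:
        t += 8
    else:
        t = (t + 0x108) << (seg - 1)
    return t if (a & 0x80) else -t   # sign bit set means positive


def decode_alaw(raw: bytes) -> list[int]:
    """Decode a whole buffer (assumed headerless, one byte per sample)."""
    return [alaw_to_linear(b) for b in raw]
```

Unlike mu-law, A-law has no exact zero level: the two smallest codes decode to +8 and -8.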
Corpus contents:
- 6 application words;
- 1 sequence of 10 isolated digits;
- 4 connected digits: 1 sheet number (5 digits), 1 telephone number (9-11 digits), 1 credit card number (16 digits), 1 PIN code (6 digits);
- 3 dates: 1 spontaneous date (birthday), 1 prompted date (word style), 1 relative and general date expression;
- 1 spotting phrase using an application word (embedded);
- 1 isolated digit;
- 3 spelled-out words (letter sequences): 1 spontaneous e.g. own forename; 1 spelling of directory assistance city name; 1 real/artificial name for coverage;
- 2 currency money amounts: 1 Slovak money amount, 1 international money amount (USD, EURO);
- 1 natural number;
- 6 directory assistance names: 1 spontaneous, e.g. own forename; 1 city of birth / growing up (spontaneous); 1 most frequent city (out of 500); 1 most frequent company/agency (out of 500); 1 "forename surname" (set of 150); 1 "surname" (set of 150);
- 2 questions, including "fuzzy" yes/no: 1 predominantly "yes" question, 1 predominantly "no" question;
- 12 phonetically rich sentences;
- 2 time phrases: 1 time of day (spontaneous), 1 time phrase (word style);
- 4 phonetically rich words.
The following age distribution has been obtained: 39 speakers are below 16 years old, 446 speakers are between 16 and 30, 253 speakers are between 31 and 45, 214 speakers are between 46 and 60, and 48 speakers are over 60.
A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
C-000134: Spanish SpeechDat(II) FDB-1000
Telephone
The Castilian Spanish SpeechDat(II) FDB-1000 database contains the recordings of 1,000 Castilian Spanish speakers (481 males, 519 females) recorded over the Spanish fixed telephone network. The FDB-1000 database is partitioned into 4 CDs, which comprise 250 speaker sessions each.

Speech samples are stored as sequences of 8-bit 8 kHz A-law. Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.

This speech database was validated by SPEX (the Netherlands) to assess its compliance with the SpeechDat format and content specifications.

Each speaker uttered the following items:

* 3 application words
* 1 sequence of 10 isolated digits
* 4 connected digits (1 sheet number -6 digits, 1 telephone number 9/11 digits, 1 credit card number 14/16 digits, 1 PIN code -6 digits)
* 3 dates (1 spontaneous date e.g. birthday, 1 word style prompted date, 1 relative and general date expression)
* 1 word spotting phrase using an embedded application word
* 1 isolated digit
* 3 spelled words (1 surname, 1 directory city name, 1 real/artificial for coverage)
* 1 currency money amount
* 1 natural number
* 5 directory assistance names (1 spontaneous e.g. own forename, 1 city of birth/growing up, 1 most frequent city out of a set of 500, 1 most frequent company/agency out of a set of 500, 1 "forename surname" out of a set of 150)
* 2 yes/no questions (1 predominantly "yes" question, 1 predominantly "no" question)
* 9 phonetically rich sentences
* 2 time phrases (1 spontaneous time of day, 1 word style time phrase)
* 4 phonetically rich words

The following age distribution has been obtained: 19 speakers are under 16, 555 speakers are between 16 and 30, 198 speakers are between 31 and 45, 198 speakers are between 46 and 60, and 30 speakers are over 60.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.

This database is a subset of the Spanish SpeechDat(II) FDB-4000 (ref. ELRA-S0102).
- hasVersion: C-000135: Spanish SpeechDat(II) FDB-4000
C-000135: Spanish SpeechDat(II) FDB-4000
Telephone
The Castilian Spanish SpeechDat(II) FDB-4000 contains the recordings of 4,000 Castilian Spanish speakers (2,061 males, 1,939 females) recorded over the Spanish fixed network. It is partitioned into 14 CDs.

Speech samples are stored as sequences of 8-bit 8 kHz A-law. Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.

This speech database was validated by SPEX (the Netherlands) to assess its compliance with the SpeechDat format and content specifications.

Each speaker uttered the following items:

* 3 application words
* 1 sequence of 10 isolated digits
* 4 connected digits (1 sheet number -6 digits, 1 telephone number 9/11 digits, 1 credit card number 14/16 digits, 1 PIN code -6 digits out of a set of 150)
* 3 dates (1 spontaneous date e.g. birthday, 1 word style prompted date, 1 relative and general date expression)
* 1 word spotting phrase using an embedded application word
* 1 isolated digit
* 3 spelled words (1 surname, 1 directory assistance city name, 1 real/artificial for coverage)
* 1 currency money amount
* 1 natural number
* 5 directory assistance names (1 spontaneous e.g. own forename, 1 city of birth/growing up, 1 most frequent city out of a set of 500, 1 most frequent company/agency out of a set of 500, 1 "forename surname" out of a set of 150)
* 2 yes/no questions (1 predominantly "yes" question, 1 predominantly "no" question)
* 9 phonetically rich sentences
* 2 time phrases (1 spontaneous time of day, 1 word style time phrase)
* 4 phonetically rich words

The following age distribution has been obtained: 42 speakers are under 16, 2,234 are between 16 and 30, 844 are between 31 and 45, 764 are between 46 and 60, and 116 are over 60.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.

This database includes the Spanish SpeechDat(II) FDB-1000 (ref. ELRA-S0101).
- hasVersion: C-000134: Spanish SpeechDat(II) FDB-1000
C-000136: Spanish Speecon database
Desktop/Microphone
The Spanish Speecon database is divided into 2 sets:

The first set comprises the recordings of 561 adult Spanish speakers (279 males, 282 females), recorded over 4 microphone channels in 4 recording environments (office, entertainment, car, public place).
The second set comprises the recordings of 55 child Spanish speakers (27 boys, 28 girls), recorded over 4 microphone channels in 1 recording environment (children's room).

This database is partitioned into 21 DVDs (first set) and 3 DVDs (second set).
The speech databases made within the Speecon project were validated by SPEX, the Netherlands, to assess their compliance with the Speecon format and content specifications. Each of the four speech channels is recorded at 16 kHz, 16 bit, uncompressed unsigned integers in Intel format (lo-hi byte order). Each signal file has a corresponding ASCII SAM label file which contains the relevant descriptive information.
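To make the "Intel format (lo-hi byte order)" remark concrete, here is a minimal Python sketch that unpacks such a buffer with the standard `struct` module. The function names are hypothetical, and the sketch assumes headerless single-channel data. It follows the entry in treating samples as unsigned; if the files turn out to hold signed samples, the format character `h` would replace `H`.

```python
import struct

SAMPLE_RATE_HZ = 16000   # from the description: 16 kHz
BYTES_PER_SAMPLE = 2     # from the description: 16 bit


def read_samples(raw: bytes) -> tuple[int, ...]:
    """Unpack little-endian (lo-hi) unsigned 16-bit samples.

    A trailing odd byte, if any, is ignored.
    """
    count = len(raw) // BYTES_PER_SAMPLE
    return struct.unpack(f"<{count}H", raw[: count * BYTES_PER_SAMPLE])


def bytes_per_second(channels: int = 1) -> int:
    """Raw data rate for the given number of channels."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * channels
```

Under these assumptions one channel produces 32,000 bytes per second, so the four channels together produce 128,000 bytes per second of recording.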

Each speaker uttered the following items:

Calibration data:
- 6 noise recordings
- The silence word recording

Free spontaneous items (adults only):
- 5 minutes (session time) of free spontaneous, rich context items (story telling) (an open number of spontaneous topics out of a set of 30 topics)

17 elicited spontaneous items (adults only):
- 3 dates
- 2 times
- 3 proper names
- 2 city names
- 1 letter sequence
- 2 answers to questions
- 3 telephone numbers
- 1 language

Read speech:
- 30 phonetically rich sentences uttered by adults and 60 uttered by children
- 5 phonetically rich words (adults only)
- 4 isolated digits
- 1 isolated digit sequence
- 4 connected digit sequences
- 1 telephone number
- 3 natural numbers
- 1 money amount
- 2 time phrases (T1 : analogue, T2 : digital)
- 3 dates (D1 : analogue, D2 : relative and general date, D3 : digital)
- 3 letter sequences
- 1 proper name
- 2 city or street names
- 2 questions
- 2 special keyboard characters
- 1 Web address
- 1 email address
- 208 application specific words and phrases per session (adults)
- 74 toy commands and 48 general commands (children)

The following age distribution has been obtained:
Adults: 313 speakers are between 15 and 30, 176 speakers are between 31 and 45, 61 speakers are between 46 and 60, and 11 speakers are over 60.
Children: 19 speakers are between 8 and 10, 36 speakers are between 11 and 14.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
C-000138: Swedish SpeechDat(II) FDB-1000
Telephone
The Swedish SpeechDat(II) FDB-1000 contains the recordings of 1,000 Swedish speakers recorded over the Swedish fixed telephone network. This database is partitioned into 4 CDs, which comprise 250 speaker sessions each. It was validated by SPEX (the Netherlands) to assess its compliance with the SpeechDat format and content specifications.

Speech samples are stored as sequences of 8-bit 8 kHz A-law. Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.

Each speaker uttered the following items:

* 1 isolated single digit
* 1 sequence of 10 isolated digits
* 4 connected digits (1 sheet number -5-10 digits, 1 telephone number 9/11 digits, 1 credit card number -16 digits, 1 PIN code -6 digits)
* 3 dates (1 spontaneous date e.g. birthday, 1 prompted date, 1 relative and general date expression)
* 1 word spotting phrase using an embedded application word
* 1 isolated digit
* 3 spelled words (1 spontaneous e.g. own forename, 1 spelling of directory city name, 1 real word for coverage)
* 1 currency money amount
* 1 natural number
* 5 directory assistance names (1 spontaneous e.g. own forename, 1 spontaneous city of school at 7 years, 1 most frequent city out of a set of 500, 1 most frequent company/agency out of a set of 500 names, 1 "forename surname" out of a set of 500 names)
* 2 yes/no questions (1 predominantly "yes" question, 1 predominantly "no" question)
* 9 phonetically rich sentences
* 2 time phrases (1 spontaneous time of day, 1 word style time phrase)
* 4 phonetically rich words

The database also contains additional Swedish specific material for speaker verification purposes and dialectal studies:

* 2 sentences for speaker verification purposes, same for all speakers
* 4 connected digits strings (3-6 digits) for speaker verification purposes
* 2 sentences for dialectal studies, same for all speakers

The following age distribution has been obtained: 43 speakers are under 16, 429 speakers are between 16 and 30, 208 speakers are between 31 and 45, 241 speakers are between 46 and 60, and 79 speakers are over 60.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
C-000139: Swedish SpeechDat(II) FDB-5000
Telephone
The Swedish SpeechDat(II) FDB-5000 database contains the recordings of 5,000 Swedish speakers (2,470 males, 2,530 females), recorded over the Swedish fixed telephone network. This database is partitioned into 25 CDs, which comprise 200 speaker sessions each.

This speech database was validated by SPEX (the Netherlands) to assess its compliance with the SpeechDat format and content specifications.

Speech samples are stored as sequences of 8-bit 8 kHz A-law. Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.

Each speaker uttered the following items:

* 3 application words
* 1 sequence of 10 isolated digits
* 4 connected digits (1 sheet number 5/10 digits, 1 telephone number 9/11 digits, 1 credit card number -16 digits, 1 PIN code -6 digits)
* 3 dates (1 spontaneous date e.g. birthday, 1 prompted date, 1 relative and general date expression)
* 1 word spotting phrase using an embedded application word
* 1 isolated digit
* 3 spelled words (1 spontaneous e.g. own forename, 1 spelling of directory city name, 1 real word for coverage)
* 1 currency money amount
* 1 natural number
* 5 directory assistance names (1 spontaneous e.g. own forename, 1 spontaneous city of school at 7 years, 1 most frequent city out of a set of 500, 1 most frequent company/agency out of a set of 500 names, 1 "forename surname" out of a set of 500 names)
* 2 yes/no questions (1 predominantly "yes" question, 1 predominantly "no" question)
* 9 phonetically rich sentences
* 2 time phrases (1 spontaneous time of day, 1 word style time phrase)
* 4 phonetically rich words

The database also contains sentences uttered by all speakers for speaker verification purposes and dialectal studies:

* Sentences for speaker verification purposes, same for all speakers (Additional material for speaker verification and dialectal coverage: X1 - X8)
* Connected digits strings (3-6 digits) for speaker verification purposes
* Sentences for dialectal studies, same for all speakers

The following age distribution has been obtained: 315 speakers are under 16, 2095 speakers are between 16 and 30, 1080 speakers are between 31 and 45, 1078 speakers are between 46 and 60, and 432 speakers are over 60.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
C-000140: Swedish SpeechDat(II) MDB-1000
Telephone
The Swedish SpeechDat(II) MDB-1000 database contains the recordings of 1,000 Swedish speakers recorded over the Swedish mobile telephone network. This database is partitioned into 5 CDs, which comprise 200 speaker sessions each.

This speech database was validated by SPEX (the Netherlands) to assess its compliance with the SpeechDat format and content specifications.

Speech samples are stored as sequences of 8-bit 8 kHz A-law. Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.

Each speaker uttered the following items:

* 2 isolated single digits
* 1 sequence of 10 isolated digits
* 4 connected digits (1 sheet number -5 digits, 1 telephone number 9/11 digits, 1 credit card number -16 digits, 1 PIN code -6 digits)
* 1 currency money amount
* 1 natural number
* 3 dates (1 spontaneous e.g. birthday, 1 prompted date, 1 relative or general date expression)
* 2 time phrases (time of recording, 1 time phrase)
* 3 spelled words (1 spontaneous e.g. own forename, 1 city name, 1 real word for coverage)
* 5 directory assistance names (1 spontaneous e.g. own forename, 1 city of school at 7 years, 1 frequent city name, 1 frequent company name, 1 common forename and surname)
* 2 yes/no questions (1 predominantly "yes" question, 1 predominantly "no" question)
* 6 application words
* 1 keyword phrase using an embedded application word
* 4 phonetically rich words
* 9 phonetically rich sentences

The database also contains additional Swedish specific material for speaker verification purposes and dialectal studies:

* 2 sentences for speaker verification purposes, same for all speakers
* 4 connected digits strings (3-6 digits) for speaker verification purposes
* 2 sentences for dialectal studies, same for all speakers

The following age distribution has been obtained: 32 speakers are under 16, 348 speakers are between 16 and 30, 253 speakers are between 31 and 45, 292 speakers are between 46 and 60, and 75 speakers are over 60.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
C-000141: Swiss-French SpeechDat(M)
Telephone
The Swiss-French SpeechDat(M) project comprises 1000 recorded Swiss-French speakers (575 female and 425 male speakers). The corpus contains phonetically rich sentences and application oriented utterances such as keywords, digits, etc.
Speech samples are stored as sequences of 8-bit 8 kHz A-law speech samples (before compression). Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.
The following items were recorded:
- 1 sequence of 6 single digits including the hash (#) and the star (*) symbols
- 1 sheet id number (5 connected digits / 1 natural number)
- 1 telephone number (spontaneous)
- 1 16-digit credit card number
- 2 natural numbers (1 + sheet id)
- 2 money amounts
- 1 quantity
- 3 spelled words
- 1 time phrase (prompted, word style)
- 1 date (spontaneous)
- 1 date (prompted)
- 1 yes/no question
- 1 city name (prompted)
- 1 city name (spontaneous)
- 5 function words
- 1 name (spelling table)
- 1 mother tongue (spontaneous)
- 1 education level (out of 3 choices)
- 1 telephone type (out of 6 choices)
- 10 sentences (read)
- 1 query to telephone directory (given the name and the city of subject)
- 1 free comment on the session

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
C-000142: TED Translanguage English Database
Desktop/Microphone
LDC reference: http://www.ldc.upenn.edu/Catalog/LDC2002S04.html

The Translanguage English Database (TED) is a corpus of recordings made of oral presentations at Eurospeech'93 in Berlin. The corpus name derives from the high percentage of oral presentations given in English by non-native speakers of English. Two hundred twenty-four (224) oral presentations at the conference were successfully recorded, providing a total of about 75 hours of speech material. These recordings provide a large number of presenters, speaking multiple variants of English, over a relatively large amount of time (15 minutes for each presentation + 5 minutes of discussion), on a specific topic. This release of TED (6 CD-ROMs) includes 188 speeches, without the ensuing discussion periods. This database was produced with the support of ELSNET. Associated text materials consist of ASCII versions of over 400 proceedings papers and oral preparations that were supplied by the authors, as well as 250 speaker questionnaires.
- isVersionOf: C-001544: Translanguage English Database (TED) Transcripts database
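
The "about 75 hours" figure in the TED description follows from the nominal slot lengths it quotes. The short Python calculation below reproduces it; the variable names are illustrative, and the estimate for the released speeches is only a rough upper bound that assumes every talk filled its full 15-minute slot.

```python
# Figures quoted in the TED entry.
PRESENTATIONS_RECORDED = 224
TALK_MINUTES = 15
DISCUSSION_MINUTES = 5
SPEECHES_RELEASED = 188

# All recorded presentations, including the discussion periods.
total_minutes = PRESENTATIONS_RECORDED * (TALK_MINUTES + DISCUSSION_MINUTES)
total_hours = total_minutes / 60

# Released speeches only, without discussion (nominal slot length assumed).
released_hours = SPEECHES_RELEASED * TALK_MINUTES / 60

print(f"recorded: about {total_hours:.1f} hours")
print(f"released: at most about {released_hours:.1f} hours")
```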
