Language Resource Search - SHACHI: Language Resource Metadata Database

Number of registered language resources: 3330. Results 891 - 900 of 2023.

C-001518: SIelex (Siemens Phonetic lexicon)
Speech Related
The lexicon consists of 186,600 entries, including proper names, place names, non-native entries and abbreviations, with phonetic transcriptions, primary stress markers and syllable boundary markers. Most of the entries were selected from the political and economic sections of two German newspapers, the 'Süddeutsche Zeitung' (SZ) and the 'Frankfurter Allgemeine Zeitung' (FAZ). For the most part, the transcription follows standard German pronunciation; deviations are described in the documentation. Some entries, especially homographs and abbreviations, are given multiple pronunciations.
The alphabet is extended German SAMPA, which can easily be translated into other alphabets. The character set is ISO-8859-1; a tool for conversion into LaTeX is provided on the CD-ROM.
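As an illustration of how extended SAMPA transcriptions can be translated into another alphabet, the sketch below maps a few common German SAMPA symbols to IPA using greedy longest-match tokenization. The symbol table is partial and hypothetical, not the SIelex inventory; the tab-separated entry layout and the use of `"` for primary stress and `-` for syllable boundaries are assumptions to check against the lexicon documentation. Decoding with ISO-8859-1 follows the character set stated above.

```python
# Partial, illustrative German SAMPA -> IPA table (hypothetical; not the SIelex inventory).
SAMPA_TO_IPA = {
    "aI": "aɪ", "aU": "aʊ", "OY": "ɔʏ",             # diphthongs
    "I": "ɪ", "E": "ɛ", "O": "ɔ", "U": "ʊ", "Y": "ʏ",
    "2": "ø", "9": "œ", "@": "ə", "6": "ɐ",
    "S": "ʃ", "Z": "ʒ", "C": "ç", "N": "ŋ", "R": "ʁ", "?": "ʔ",
    ":": "ː",                                        # vowel length
    '"': "ˈ", "%": "ˌ",                              # primary / secondary stress
    "-": ".",                                        # syllable boundary (assumed symbol)
}
MAX_SYMBOL_LEN = max(len(symbol) for symbol in SAMPA_TO_IPA)


def sampa_to_ipa(sampa: str) -> str:
    """Translate a SAMPA string to IPA using greedy longest-match tokenization."""
    out = []
    i = 0
    while i < len(sampa):
        for size in range(MAX_SYMBOL_LEN, 0, -1):
            chunk = sampa[i:i + size]
            if chunk in SAMPA_TO_IPA:
                out.append(SAMPA_TO_IPA[chunk])
                i += len(chunk)
                break
        else:
            # Symbols written the same in SAMPA and IPA (p, t, s, a, ...) pass through.
            out.append(sampa[i])
            i += 1
    return "".join(out)


def parse_entry(line: bytes) -> tuple[str, str]:
    """Decode an ISO-8859-1 line assumed to be 'orthography<TAB>SAMPA'."""
    word, transcription = line.decode("iso-8859-1").rstrip("\n").split("\t")
    return word, sampa_to_ipa(transcription)


print(parse_entry(b'Stra\xdfe\t"StRa:-s@'))  # ('Straße', 'ˈʃtʁaː.sə')
```

Longest-match tokenization lets multi-character symbols such as `aI` win over their single-character prefixes; extending the table is the only change needed to cover the full alphabet.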
C-001519: SPK
Desktop/Microphone
SPK is an Italian speech database of isolated and connected digits. It was designed and collected at the Istituto per la Ricerca Scientifica e Tecnologica (ITC/IRST), Trento, Italy, for speaker recognition and verification. This CD-ROM releases the isolated-digit material acquired from 100 speakers (30 females and 70 males, aged 23 to 50). Most of the speakers are from north-eastern Italy.
Speech material was collected from each speaker in five recording sessions held on different days. In each session, the speaker produced four repetitions of each of the ten Italian digits. In total, the corpus comprises 20,000 speech waveform files.
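The file count follows directly from the recording design: 100 speakers × 5 sessions × 4 repetitions × 10 digits. The sketch below enumerates the expected combinations, which could be used to spot missing files; the (speaker, session, repetition, digit) key layout is hypothetical and does not reflect the corpus's actual file naming.

```python
from itertools import product

SPEAKERS = range(1, 101)     # 100 speakers
SESSIONS = range(1, 6)       # 5 sessions on different days
REPETITIONS = range(1, 5)    # 4 repetitions per session
DIGITS = range(10)           # the ten Italian digits, 0-9


def expected_recordings():
    """Yield one hypothetical (speaker, session, repetition, digit) key per waveform file."""
    return product(SPEAKERS, SESSIONS, REPETITIONS, DIGITS)


total = sum(1 for _ in expected_recordings())
print(total)  # 20000
```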
Recordings were made in a quiet room. Speech was acquired at 48 kHz with 16-bit resolution using a Sony TCD-D10PRO digital audio tape recorder and a Sennheiser MKH 416-T super-cardioid microphone; the digital recordings were then downsampled to 16 kHz. The speech waveform files are stored in NIST SPHERE format, written with version 2.6a of the SPHERE library.
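A NIST SPHERE file begins with a plain-ASCII header: the line `NIST_1A`, the header size in bytes (commonly 1024), then `name -type value` fields terminated by `end_head`, where `-i` marks integers, `-r` reals and `-sN` strings of length N. The sketch below parses that header with the standard library only; it is a minimal reader written for illustration, not the SPHERE 2.6a library, and the field names in the example are typical SPHERE fields rather than ones confirmed for SPK.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header of a NIST SPHERE file into a dict of typed fields."""
    magic, size_line, _ = data.split(b"\n", 2)
    if magic.strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    header_size = int(size_line.decode("ascii").strip())
    header = data[:header_size].decode("ascii", errors="replace")
    fields = {}
    for line in header.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(" ", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, ftype, value = parts
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        elif ftype.startswith("-s"):
            fields[name] = value[: int(ftype[2:])]  # string of declared length
    return fields


header = (b"NIST_1A\n   1024\n"
          b"sample_rate -i 16000\n"
          b"channel_count -i 1\n"
          b"sample_coding -s3 pcm\n"
          b"end_head\n").ljust(1024, b" ")
fields = parse_sphere_header(header)
print(fields["sample_rate"], fields["sample_coding"])  # 16000 pcm
```

The audio samples start at byte offset `header_size`, so a reader can slice the remaining bytes once the header has been parsed.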
C-001521: Siemens VoiceMail
Telephone
VoiceMail consists of 17.5 hours of read speech, divided into 9.5 hours of transliterated and 8 hours of non-transliterated speech, recorded over the digital telephone network (ISDN) with 921 speakers from the USA. It contains orthographic transliterations for about 25,000 of the 34,912 utterances.

Standard in use: headerless audio files; one separate transliteration file covering all utterances of all speakers
Sampling rate: 8 kHz
Speakers: 377 males and 544 females
Size: 17.5 hours
Medium: 2 CD-ROMs
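Because the audio files are headerless, their duration has to be derived from the file size and the sampling rate. The sketch below assumes 1 byte per sample, as with G.711 telephone coding; this is an assumption, since the description does not state the sample coding, so the sample width can be overridden.

```python
SAMPLE_RATE_HZ = 8000  # from the description


def duration_seconds(num_bytes: int, bytes_per_sample: int = 1) -> float:
    """Duration of a headerless mono file; bytes_per_sample=1 is an assumption."""
    if num_bytes % bytes_per_sample:
        raise ValueError("file size is not a whole number of samples")
    return num_bytes / (SAMPLE_RATE_HZ * bytes_per_sample)


def expected_bytes(hours: float, bytes_per_sample: int = 1) -> int:
    """Raw audio size implied by a total duration."""
    return round(hours * 3600 * SAMPLE_RATE_HZ * bytes_per_sample)


print(duration_seconds(24_000))  # 3.0
print(expected_bytes(17.5))      # 504000000
```

Under the 1-byte assumption, the 17.5 hours correspond to about 504 MB of raw audio; with 16-bit samples it would be about 1 GB.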
C-001522: SmartKom Public
Multimodal/Multimedia Resources
The SmartKom corpora were produced at BAS from 1999 to 2003 within the SmartKom project, which was funded by the German Ministry of Education and Science. The corpus consists of multimodal recordings (sessions) of 224 persons in a Wizard-of-Oz setting.
Release SKP 2.0 contains 172 recordings made in the SmartKom Public technical setup (scenario), which is comparable to a traditional public phone booth equipped with additional intelligent communication devices. Naive users were asked to test a prototype for a market study, without knowing that the system was in fact controlled by two human operators. Left alone with the system, they had to solve two tasks within 4.5 minutes. Instructions were kept to a minimum: users were told only that the system could understand speech, gestures and even facial expressions, and that it would communicate more or less like a human.
Main technical features of release SKP 2.0
Technical setup: Public (scenario)
Primary domain Cinema; secondary domain Restaurant
Primary domain Fax; secondary domain Telephone, Email
86 users
172 recording sessions; size: 580 GB
Recorded modalities:
o Audio in up to 10 channels
o Video of face
o Video of upper body from the left
o Infrared video of the display area (to capture the 2D gestures) as input to the SIVIT device (Siemens gesture recognizer)
o Video of the GUI output
o Coordinates of graphics tablet (when a pen was used)
o Coordinates of SIVIT device (when fingers/hands were used)
Annotations:
o Transliteration
o 2D Gesture
o User states in three modalities
o Turn segmentation
Documentation, TechDoks and publications
All annotations compatible with the BAS Partitur Format (BPF)

The full database is provided on USB. Single volumes on DVD can be obtained on demand.
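BPF annotation files are plain text made of `KEY: value` lines. As we understand the format, a header block is closed by an `LBD:` line and followed by annotation tiers such as `ORT` (orthography). The sketch below groups lines by key under those assumptions; it does not interpret tier-specific columns such as time stamps or word links, and the sample content is invented for illustration. Consult the BAS BPF documentation before relying on it.

```python
from collections import defaultdict


def parse_bpf(text: str) -> tuple[dict[str, str], dict[str, list[list[str]]]]:
    """Split a BPF file into header fields and body tiers (tier key -> rows of fields)."""
    header: dict[str, str] = {}
    tiers: dict[str, list[list[str]]] = defaultdict(list)
    in_body = False
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # not a "KEY: value" line
        key, value = key.strip(), value.strip()
        if key == "LBD":  # assumed marker: end of header, start of body
            in_body = True
        elif in_body:
            tiers[key].append(value.split())
        else:
            header[key] = value
    return header, dict(tiers)


sample = """LHD: Partitur 1.3
SAM: 16000
LBD:
ORT: 0 Kino
ORT: 1 heute
MAU: 0 1599 -1 <p:>
"""
header, tiers = parse_bpf(sample)
print(header["SAM"], tiers["ORT"])  # 16000 [['0', 'Kino'], ['1', 'heute']]
```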
C-001523: Spanish SpeechDat(M) - DB1
Telephone
The SpeechDat(M) Spanish database contains the recordings of 1,002 Spanish speakers (508 males, 494 females), recorded over the Spanish fixed telephone network.

A pronunciation dictionary for the correctly spoken items is also available.

The standards of the ESPRIT project SAM were followed for speech file storage. Speech samples are stored as sequences of 8-bit, 8 kHz A-law samples (before compression).
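For processing, 8-bit A-law samples are usually expanded to 16-bit linear PCM. The sketch below implements the ITU-T G.711 A-law expansion in pure Python (the standard library's `audioop.alaw2lin` once did this but was removed in Python 3.13). It assumes the signal files hold raw A-law bytes with no header to skip, which should be checked against the corpus documentation.

```python
def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 A-law code to a 16-bit linear PCM value."""
    code ^= 0x55                    # undo the even-bit inversion applied by A-law
    mantissa = (code & 0x0F) << 4   # 4 quantization bits
    segment = (code & 0x70) >> 4    # 3 segment (exponent) bits
    if segment == 0:
        magnitude = mantissa + 8
    else:
        magnitude = (mantissa + 0x108) << (segment - 1)
    return magnitude if code & 0x80 else -magnitude  # bit 7 set means positive


def decode_alaw(data: bytes) -> list[int]:
    """Decode a buffer of raw A-law bytes (e.g. one prompted utterance)."""
    return [alaw_to_linear(byte) for byte in data]


print(decode_alaw(bytes([0xD5, 0x55, 0xAA, 0x2A])))  # [8, -8, 32256, -32256]
```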

Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.

This speech database was validated by SPEX (the Netherlands) to assess its compliance with the SpeechDat format and content specifications.

Each speaker uttered the following items:

* 1 isolated digit
* 4 connected digit strings (1 sheet number of 4 digits, 1 telephone number of 9 digits, 1 credit card number of 16 digits, 1 spontaneous home telephone number)
* 2 natural numbers
* 1 natural number with decimal point
* 2 money amounts (1 large amount, 1 small amount)
* 3 spelled-out words (7 letter sequences)
* 2 time phrases (1 spontaneous time of day, 1 prompted time phrase in word style)
* 3 dates (1 spontaneous date, e.g. birthday; 2 prompted dates)
* 3 yes/no questions (Are you calling from the same province? (as P1), Do you speak another language fluently?, Are you calling from a public phonebox?)
* 1 place (province of longest residence)
* 6 application keywords (out of a set of 54 words)
* 2 additional application keywords (out of a set of 18 words)
* 3 embedded application word phrases (from A1-6 vocabulary)
* 9 read sentences for phonetic coverage

The set of phonetically balanced sentences was automatically transcribed and manually checked by the Departament de Filologia Espanyola of the Universitat Autònoma de Barcelona. Standard Castilian transcription was used; dialectal variations were not considered.

The following age distribution has been obtained: 530 speakers are between 15 and 29 years old, 283 speakers are between 30 and 45, 156 speakers are between 46 and 60, and 23 speakers are over 60; the age of 10 speakers is unknown.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
C-001524: Spanish SpeechDat(M) - DB2
Telephone
Phonetically rich sentences.
Subset of ELRA-S0065.
- isPartOf: C-001523: Spanish SpeechDat(M) - DB1
C-001526: Speecon manually pitch-marked reference database for Spanish
Desktop/Microphone
This database is intended for the development and evaluation of noise-robust pitch marking (PMA) and/or pitch determination (PDA) algorithms. The audio data were selected as a subset of the Speecon Spanish database (see ELRA-S0160).

The acoustic environments in this database are the car interior, the office, and living rooms. The office environment is mostly quiet, slightly affected by stationary, white noise from computer fans or air-conditioning devices; however, some office recordings also contain background voices. The living room recordings (entertainment environment) contain a wider range of noises, less stationary and more colored than the office noises. In some utterances the radio or TV is on, so voices, music, etc. can be heard in the recordings. Reverberation is mostly present in the office and entertainment environments.

The Speecon Spanish database was recorded at a 16 kHz sampling frequency with 16-bit linear quantization. From it, the recordings of 60 speakers were selected (30 male and 30 female, aged 19 to 79). To construct the reference pitch-marked database manually under low-noise, reverberation-free conditions, 1 minute of close-talking microphone recordings was selected per speaker, so the reference database comprises 60 minutes of pitch-marked speech. In the first step, these 60 minutes of close-talking speech were automatically pitch-marked (epoch-marked). In the second step, the pitch marks were carefully checked and corrected by hand, yielding the reference pitch-marked database.
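Pitch marks (epochs) place one mark per glottal cycle, so the interval between consecutive marks is the local pitch period and its reciprocal is the local F0. The sketch below derives an F0 contour from marks given as sample indices at 16 kHz. The input representation and the 50 Hz floor used to skip unvoiced gaps are assumptions for illustration, not the database's file format.

```python
SAMPLE_RATE_HZ = 16000  # Speecon recordings are sampled at 16 kHz


def f0_from_pitch_marks(marks: list[int], min_f0_hz: float = 50.0) -> list[tuple[float, float]]:
    """Convert pitch marks (sample indices) into (time in s, F0 in Hz) points.

    Each pair of consecutive marks spans one pitch period; periods longer than
    1 / min_f0_hz are treated as unvoiced gaps and skipped. The 50 Hz floor is
    a hypothetical choice, not a property of the database.
    """
    max_period = SAMPLE_RATE_HZ / min_f0_hz
    contour = []
    for left, right in zip(marks, marks[1:]):
        period = right - left
        if 0 < period <= max_period:
            midpoint_s = (left + right) / 2 / SAMPLE_RATE_HZ
            contour.append((midpoint_s, SAMPLE_RATE_HZ / period))
    return contour


print(f0_from_pitch_marks([0, 160, 320, 8000, 8080]))
# [(0.005, 100.0), (0.015, 100.0), (0.5025, 200.0)]
```

Comparing such contours computed from automatic marks against those from the manually corrected reference marks is one way the database could be used to evaluate a PMA or PDA.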

Each session consists of 17 utterances:
1 isolated digit sequence
1 money amount
10 phonetically rich sentences
5 phonetically rich isolated words

The following age distribution has been obtained:
40 speakers are between 15 and 30, 11 speakers are between 31 and 45, 8 speakers are between 46 and 60, and 1 speaker is over 60.
C-001528: Swiss-French Polyphone Database 1000 speakers
Telephone
This is a Polyphone-like database, similar to the Dutch and German Polyphone corpora, recorded in Switzerland to cover French as spoken in Romandy (the French-speaking part of Switzerland).

The database consists of 5,000 speakers who answered several questions (around 10), yielding spontaneous speech, and read about 28 items from a form supplied by IDIAP.

This form contains several types of speech items: sentences from different sources (local newspapers, existing corpora, law articles, etc.) to ensure good phonetic coverage, application words from a defined list of command words, currency amounts, quantities, credit card numbers, spelled words (mainly names), etc.
The database is divided into two subsets: the first comprises 1,000 speakers and the second 4,000 speakers (1,000 speakers are not available). Each subset is further divided into two parts: the phonetically rich sentences and the application-oriented data.
C-001529: Swiss-French Polyphone Database 4000 speakers
Telephone
This is a Polyphone-like database, similar to the Dutch and German Polyphone corpora, recorded in Switzerland to cover French as spoken in Romandy (the French-speaking part of Switzerland).

The database consists of 5,000 speakers who answered several questions (around 10), yielding spontaneous speech, and read about 28 items from a form supplied by IDIAP.

This form contains several types of speech items: sentences from different sources (local newspapers, existing corpora, law articles, etc.) to ensure good phonetic coverage, application words from a defined list of command words, currency amounts, quantities, credit card numbers, spelled words (mainly names), etc.
The database is divided into two subsets: the first comprises 1,000 speakers and the second 4,000 speakers (1,000 speakers are not available). Each subset is further divided into two parts: the phonetically rich sentences and the application-oriented data.
C-001530: Swiss-German Speecon database
Desktop/Microphone
The Swiss-German Speecon database is divided into 2 sets:
1) The first set comprises the recordings of 550 adult Swiss-German speakers (273 males, 277 females), recorded over 4 microphone channels in 4 recording environments (office, entertainment, car, public place).
2) The second set comprises the recordings of 50 child Swiss-German speakers (20 boys, 30 girls), recorded over 4 microphone channels in 1 recording environment (children's room).

This database is partitioned into 27 DVDs (first set) and 3 DVDs (second set).
The speech databases made within the Speecon project were validated by SPEX (the Netherlands) to assess their compliance with the Speecon format and content specifications. Each of the four speech channels is recorded at 16 kHz with 16-bit resolution, stored as uncompressed unsigned integers in Intel format (lo-hi byte order). Each signal file is accompanied by an ASCII SAM label file containing the relevant descriptive information.
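The sample layout described for the Speecon channels maps directly onto Python's `struct` module: `<` selects little-endian (lo-hi) byte order and `H` an unsigned 16-bit integer. The description states unsigned integers, so that is the default below; because many 16-bit PCM corpora use signed two's-complement samples, the sketch also accepts `signed=True`. It assumes each channel is stored in its own headerless signal file, which should be confirmed in the Speecon documentation.

```python
import struct

SAMPLE_RATE_HZ = 16000  # from the description


def read_pcm16_le(data: bytes, signed: bool = False) -> list[int]:
    """Decode one channel of raw 16-bit little-endian (Intel, lo-hi) samples."""
    if len(data) % 2:
        raise ValueError("odd byte count: not a whole number of 16-bit samples")
    code = "h" if signed else "H"  # struct codes: h = signed, H = unsigned 16-bit
    return list(struct.unpack(f"<{len(data) // 2}{code}", data))


raw = b"\x01\x00\xff\xff"
print(read_pcm16_le(raw))               # [1, 65535]
print(read_pcm16_le(raw, signed=True))  # [1, -1]
print(len(raw) // 2 / SAMPLE_RATE_HZ)   # duration of this buffer in seconds
```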

Each speaker uttered the following items (over 290 items for adults and over 210 items for children):

Calibration data:
- 6 noise recordings
- The silence word recording

Free spontaneous items (adults only):
- 5 minutes (session time) of free spontaneous, rich-context items (storytelling), on an open number of topics drawn from a set of 30

17 elicited spontaneous items (adults only):
- 3 dates
- 2 times
- 3 proper names
- 2 city names
- 1 letter sequence
- 2 answers to questions
- 3 telephone numbers
- 1 language

Read speech:
- 30 phonetically rich sentences uttered by adults and 60 uttered by children
- 5 phonetically rich words (adults only)
- 4 isolated digits
- 1 isolated digit sequence
- 4 connected digit sequences
- 1 telephone number
- 3 natural numbers
- 1 money amount
- 2 time phrases (T1: analogue, T2: digital)
- 3 dates (D1: analogue, D2: relative and general date, D3: digital)
- 3 letter sequences
- 1 proper name
- 2 city or street names
- 2 questions
- 2 special keyboard characters
- 1 Web address
- 1 email address
- 208 application specific words and phrases per session (adults)
- 74 toy commands, 14 phone commands and 34 general commands (children)

The following age distribution has been obtained:
Adults: 200 speakers are between 15 and 30, 166 speakers are between 31 and 45, and 184 speakers are 46 or older.
Children: 21 speakers are between 8 and 10, and 29 speakers are between 11 and 14.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
