C-004434: FASiL multimodal fasil-mm corpus
Desktop/Microphone
The corpus was collected in the context of the FASiL project, EU FP5 IST-2001-38685 (http://www.fasil.co.uk), as a Wizard-of-Oz experiment; accordingly, it contains sound and interaction recordings of both the subject and the wizard. A total of 90 subjects were recorded (30 per language: English, Portuguese and Swedish).
The corpus is formatted as .wav files (u-law) for audio, plain ASCII text (.txt) for transcriptions, and TASX XML (.xml) for the annotations, which bind everything together.
The multimodal Wizard-of-Oz experiment concerns voice interaction with a Virtual Personal Assistant (VPA) for e-mail, calendar and contacts tasks. Hesitations are marked as UH, noise as NOISE, and other irrelevant material as IRRELEVANT. All annotations are in lower case, except for the markers mentioned above.
The experiment is documented in detail in FASiL deliverable D.2.2_b.
See also S0174-01, S0174-02, S0174-03, and S0174-04. -
C-004435: OrienTel Egypt MCA (Modern Colloquial Arabic) database
Telephone
The OrienTel Egypt MCA (Modern Colloquial Arabic) database comprises 750 Egyptian speakers (398 males, 352 females) recorded over the Egyptian fixed and mobile telephone network. This database is partitioned into 1 CD and 1 DVD. The speech databases made within the OrienTel project were validated by SPEX, the Netherlands, to assess their compliance with the OrienTel format and content specifications.
Speech samples are stored as sequences of 8-bit 8 kHz A-law. Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.
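Because the speech samples are headerless 8-bit A-law byte streams, converting them to 16-bit linear PCM takes only a few lines. The sketch below implements the standard G.711 A-law expansion in Python; the file-reading helper and its path argument are illustrative, not part of the database documentation:

```python
def alaw_to_linear(a_val: int) -> int:
    """Expand one 8-bit G.711 A-law code to a 16-bit signed PCM sample."""
    a_val ^= 0x55                    # undo the even-bit inversion applied at encode time
    t = (a_val & 0x0F) << 4          # quantized mantissa
    seg = (a_val & 0x70) >> 4        # segment (exponent)
    if seg == 0:
        t += 8
    else:
        t = (t + 0x108) << (seg - 1)
    return t if (a_val & 0x80) else -t

def decode_alaw_file(path: str) -> list[int]:
    """Read a headerless A-law file and return its 16-bit PCM samples."""
    with open(path, "rb") as f:
        return [alaw_to_linear(b) for b in f.read()]
```

Python's legacy `audioop` module offered the same conversion, but it has been removed from recent Python versions, hence the direct implementation.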
Each speaker uttered the following items:
1 isolated single digit
1 sequence of 10 isolated digits
5 connected digits: 1 prompt sheet number (6 digits), 1 telephone number (6-15 digits), 1 credit card number (14-16 digits), 1 PIN code (6 digits), 1 spontaneous phone number
2 currency money amounts
1 natural number
4 dates: 1 spontaneous (date or year of birth), 1 prompted date, 1 relative or general date expression, 1 prompted date phrase (Islamic calendar)
2 time phrases: 1 time of day (spontaneous), 1 time phrase (word style)
3 spelled words: 1 spontaneous (own forename), 1 city name, 1 real word for coverage
5 directory assistance utterances: 1 spontaneous (own forename), 1 city of childhood (spontaneous), 1 frequent city name, 1 frequent company name, 1 common forename and surname
2 yes/no questions: 1 predominantly yes question, 1 predominantly no question
6 application keywords/keyphrases
1 word spotting phrase using embedded application words
4 phonetically rich words
9 phonetically rich sentences
3 spontaneous items (for control)
The following age distribution has been obtained: 379 speakers are between 16 and 30, 291 speakers are between 31 and 45, 80 speakers are between 46 and 60.
A pronunciation lexicon with a phonemic transcription in SAMPA is also included. -
C-004437: IDIOLOGOS 2 Eigenspeakers (NEOLOGOS Project)
Telephone
The IDIOLOGOS 2 Eigenspeakers database was produced within the French national project NEOLOGOS, as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The databases produced in the framework of the NEOLOGOS project are designed for the development and assessment of French speech or speaker recognizers and speech synthesizers. They consist of:
1) the IDIOLOGOS databases, made of adult voices and available in 2 subsets:
- the Bootstrap database (catalogue ref. ELRA-S0226-01),
- the Eigenspeakers database (catalogue ref. ELRA-S0226-02)
2) the PAIDIALOGOS database (catalogue ref. ELRA-S0227), made of children's and teenagers' voices.
The IDIOLOGOS 2 Eigenspeakers database contains the recordings of 200 adult French speakers (97 males and 103 females) recorded over the French fixed telephone network. Each speaker made 10 calls and uttered 45 sentences per call; the resulting 450 sentences per speaker are common to all speakers. Speakers were selected from the IDIOLOGOS 1 Bootstrap database (ELRA-S0226-01).
This database is distributed as 1 DVD-ROM. The speech files are stored as uncompressed sequences of 8-bit, 8 kHz A-law samples, according to the NEOLOGOS specifications. Each prompted utterance is stored in a separate file and has an accompanying ASCII SAM label file.
This speech database was validated by SPEX (the Netherlands) to assess its compliance with the NEOLOGOS format and content specifications.
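SAM label files are plain ASCII with one attribute per line, each line consisting of a mnemonic, a colon, and a value. A minimal reader under that assumption might look as follows; the mnemonics and values in the example string are invented for illustration, not taken from this database:

```python
def parse_sam_label(text: str) -> dict[str, str]:
    """Parse SAM label lines of the form 'MNE: value' into a dict.

    Repeated mnemonics keep the last value; lines without a colon are skipped.
    """
    attrs: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            attrs[key.strip()] = value.strip()
    return attrs

# Hypothetical label content, for illustration only:
info = parse_sam_label("LHD: SAM, 6.0\nSAM: 8000\nSNB: 1")
```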
Each speaker uttered the following items:
- 1 digit sequence (6 digits)
- 1 telephone number (10 digits)
- 1 credit card number (16 digits)
- 1 spelling of directory assistance city name
- 1 real/artificial word for coverage
- 45 phonetically rich sentences
The following age distribution has been obtained: 42 speakers are between 18 and 30, 50 speakers are between 31 and 45, 62 speakers are between 46 and 61, and 46 speakers are over 61.
A pronunciation lexicon with a phonemic transcription in SAMPA is also included. -
C-004438: Mandarin Chinese Telephone Speech Recognition Corpus - Digit String
Desktop/Microphone
This corpus comprises 6,140 entries uttered by 265 speakers of different dialects, ages and educational levels (144 males and 141 females), recorded over the mobile telephone network. The database comprises 8,109 Chinese digit strings. Speech samples are stored as 16-bit, 8 kHz WAV files, for a total of 11.8 hours of speech. The total size of the data is 669 MB.
Each speaker read 25-30 items. Text files are stored in Unicode format. All data have been proofread manually.
The transcriptions include non-speech markers (background noise, background speech, speaker sounds) as well as markers for mispronunciation, channel distortions, words left-out and duplicates.
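Since the samples are standard 16-bit, 8 kHz WAV files, per-file durations behind the quoted hourly totals can be checked with Python's standard `wave` module (the path and the corpus file list are placeholders):

```python
import wave

def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

# Hypothetical usage over a list of corpus files:
# total_hours = sum(wav_duration_seconds(p) for p in corpus_paths) / 3600
```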
The corpus is intended for developing and testing telephone-based natural speech recognition systems. -
C-004439: Mandarin Chinese Telephone Speech Recognition Corpus - Stock
Desktop/Microphone
This corpus comprises 3,077 entries uttered by 285 speakers of different dialects, ages and educational levels (144 males and 141 females), recorded over the fixed telephone network. The database comprises 7,239 Chinese stock names. Speech samples are stored as 16-bit, 8 kHz WAV files, for a total of 7 hours of speech. The total size of the data is 373 MB.
Each speaker read 15-30 items. Text files are stored in Unicode format. All data have been proofread manually.
The transcriptions include non-speech markers (background noise, background speech, speaker sounds) as well as markers for mispronunciation, channel distortions, words left-out and duplicates.
The corpus is intended for developing and testing telephone-based natural speech recognition systems. -
C-004440: CHIL 2006 Evaluation Package
Multimodal/Multimedia Resources
The CHIL 2006 Evaluation Package was produced within the CHIL project (Computers in the Human Interaction Loop), an Integrated Project (IP 506909) under the European Commission's Sixth Framework Programme. The objective of the project is to create environments in which computers serve humans who focus on interacting with other humans, rather than having to attend to and be preoccupied with the machines themselves. Instead of computers operating in isolation, with humans thrust into the loop of computers, the project puts computers in the human interaction loop (CHIL).
In this context, the CHIL project produced the CHIL Seminars: scientific presentations given by students, faculty members or invited speakers in the field of multimodal interfaces and speech processing. During the talks, videos of the speaker and the audience from 4 fixed cameras, frontal close-ups of the speaker, and close-talking and far-field microphone recordings of the speaker's voice and ambient sounds were captured.
The CHIL Seminars have been compiled in four different packages, according to the evaluations for which they have been created and used:
- CHIL 2004 Evaluation Package (catalogue reference ELRA-E0009)
- CHIL 2005 Evaluation Package (catalogue reference ELRA-E0010)
- CHIL 2006 Evaluation Package (catalogue reference ELRA-E0017)
- CHIL 2007 Evaluation Package (catalogue reference ELRA-E0033)
The CHIL 2006 Evaluation Package consists of the following contents:
1) A set of audiovisual recordings of seminars, called non-interactive seminars, and of highly interactive small working-group sessions, called interactive seminars. The recordings were made between 2004 and 2005 according to the CHIL Room Setup specification.
2) Video annotations.
3) Orthographic transcriptions.
- hasVersion: C-001352: CHIL 2004 Evaluation Package
- hasVersion: C-001353: CHIL 2005 Evaluation Package
- hasVersion: C-004492: CHIL 2007 Evaluation Package
- hasVersion: C-004578: CHIL 2007+ Evaluation Package
-
C-004444: MIST Multi-lingual Interoperability in Speech Technology database
Desktop/Microphone
In 1996, some 75 Dutch people participated in recording a multi-purpose continuous speech database. Most of them were recruited from the TNO Human Factors Research Institute, where the recordings were made. The main part of the database consists of Dutch sentences, but most speakers also recorded 10 sentences each in English, French and German. The data was initially distributed as a common data set for research leading to presentations and discussions at the ESCA/NATO MIST workshop held in Leusden, The Netherlands, in 1999.
The non-native speech in any particular language (English, for instance) is of course heavily biased towards Dutch accents, so this database can only be considered a starting point for studying non-native speech. Building on experience with it, however, researchers in other countries may record similar data, so that other foreign accents can also be studied and compared with this database.
Recording conditions:
- Sennheiser HMD-414-6 close talking microphone
- B&K MD-211-N far-field microphone
- anechoic silent recording room
- sentences read from computer screen
- Ariel Pro-Port digital recording equipment
- 16 kHz sampling rate, 16 bit resolution
Speech material
- 10 sentences in Dutch, English, French and German, including 5 sentences per language which are identical for all speakers and 5 sentences per language which are unique for each speaker
- Sentence text from newspapers: Dutch: NRC/Handelsblad; English: Wall Street Journal; French: Le Monde; German: Frankfurter Rundschau
The texts of the English, French and German sentences were obtained from other databases recorded or used in the European project SQALE.
Annotation:
- Dutch sentences are orthographically annotated
- For English, French and German sentences the prompt texts are available
- Only the unique Dutch sentences have been listened to and annotated accordingly. The English, French and German annotations were generated from the prompt texts, i.e. only the punctuation characters were removed. For French and English, the first word was de-capitalized using a simple algorithm.
- The spoken text is annotated in a format of one line per speech utterance, with the utterance identification in parentheses at the end.
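Given the one-utterance-per-line convention with the identifier in parentheses at the end, a small parser suffices to split annotation lines into text and ID. The exact identifier format is not specified here, so the example ID below is a hypothetical one:

```python
import re

def parse_utterance_line(line: str) -> tuple[str, str]:
    """Split a line of the form 'transcribed words (utt_id)' into (text, utt_id)."""
    m = re.match(r"^(.*)\(([^()]+)\)\s*$", line.strip())
    if not m:
        raise ValueError(f"no utterance id found: {line!r}")
    return m.group(1).strip(), m.group(2)
```

For example, `parse_utterance_line("de kat zat op de mat (spk01_nl_001)")` would return the transcription and the identifier as separate fields.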
Speakers:
- 74 speakers: 52 males and 22 females
- All speakers are native Dutch. Not all of them were able to produce speech in German, English and French. -
C-004445: N4 (NATO Native and Non Native) database
Desktop/Microphone
Speech technology is covering an increasing number of languages, and systems are becoming more robust with regard to speech variability such as speaking style and accents. However, for real applications, especially in a multilingual and multinational context, further robustness to regional and even non-native accents is necessary. Among numerous corpora available for speech research few have specifically addressed this issue.
The NATO Speech and Language Technology group decided to create a corpus geared towards the study of non-native accents. The group chose naval communications as the common task because it naturally includes a great deal of non-native speech and because there were training facilities where data could be collected in several countries.
The N4 NATO Native and Non-Native Speech corpus was developed by the NATO research group on Speech and Language Technology in order to provide a military-oriented database for multilingual and non-native speech processing studies.
Speech data was recorded in the naval transmission training centres of four countries (Germany, The Netherlands, the United Kingdom, and Canada) during naval communication training sessions in 2000-2002. The material consists of native and non-native speakers using NATO naval English procedure between ships, where a typical sentence sounds like "This is alpha, whiskey, roger. I make two seven zero six hostile, two seven zero six. Out.", and readings of the text "The North Wind and the Sun" in both English and the speaker's native language.
The audio material was recorded on DAT and downsampled to 16 kHz, 16-bit. All audio files have been manually transcribed and annotated with speaker identities using the Transcriber tool. Navy procedure recordings and text readings are stored in separate files; the first digit in the filename indicates the type of speech.
The Navy procedure recordings range from 1.3 to 2.3 hours per site, for a total of 7.5 hours. The native-language text readings range from 1.5 to 22.9 minutes per site, for a total of around one hour. The durations (in hours) break down as follows:

                Canada  Germany  Netherlands  United Kingdom    All
Signal            5.30     3.20         5.00            6.30  19.80
  Silence         3.00     0.56         2.00            4.70  10.26
  Speech          2.30     2.64         3.00            1.60   9.54
    Navy proc     2.00     1.90         2.30            1.30   7.50
    Read text     0.30     0.74         0.70            0.30   2.04
      Non-native  0.27     0.37         0.32            0.00   0.96
      Native      0.03     0.37         0.38            0.30   1.08
The database contains the following information about each speaker: gender, age, weight, height, possible speaking or hearing disorders, education level, living area, accent, second language, and the year English was learned (for non-native speakers). The speaker accents vary widely from country to country. There were 115 speakers in total, 19 of them women, with an average age of 22.6 years.

             Canada  Germany  Netherlands  United Kingdom    All
#Speakers        22       51           31              11    115
#Women            5        0            9               5     19
Age range     22-35    17-23        17-61           19-62  17-62
Mean age       28.3     20.1         21.0            27.5   22.6
-
C-004446: SpeechDat Catalan FDB database
Telephone
The SpeechDat Catalan FDB database contains the recordings of 1,005 Catalan speakers (474 males, 531 females) recorded over the Spanish fixed telephone network. The database is partitioned into 4 CD-ROMs, in ISO 9660 format.
Speech samples are stored as sequences of 8-bit 8 kHz A-law, uncompressed. Each prompted utterance is stored in a separate file, and each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.
Each speaker uttered the following items:
- 3 application words
- 1 sequence of 10 isolated digits
- 4 connected digits (prompt sheet number: 6 digits; telephone number: 9-11 digits; credit card number: 14-16 digits; PIN code: 6 digits)
- 3 dates (spontaneous date e.g. birthday, prompted date, relative and general date expression)
- 1 word spotting phrase using embedded application words
- 1 isolated digit
- 3 spelled words (1 surname, 1 directory assistance city name, 1 real/artificial name for coverage)
- 1 currency money amount
- 1 natural number
- 5 directory assistance names (1 spontaneous, e.g. own surname; 1 city of birth/growing up; 1 most frequent city out of a set of 500; 1 most frequent company/agency out of a set of 500; 1 forename-surname pair out of a set of 150)
- 2 yes/no questions (1 predominantly yes question, 1 predominantly no question, including fuzzy questions)
- 9 phonetically rich sentences
- 2 time phrases (1 spontaneous time of day, 1 word style time phrase)
- 4 phonetically rich words
The following age distribution has been obtained: 13 speakers are under 16, 473 are between 16 and 30, 286 are between 31 and 45, 192 are between 46 and 60, and 41 speakers are over 60.
A pronunciation lexicon with a phonemic transcription in SAMPA is also included. -
C-004447: TC-STAR 2007 Evaluation Package - ASR English
Desktop/Microphone
TC-STAR is a European integrated project focusing on Speech-to-Speech Translation (SST). To encourage significant breakthroughs in all SST technologies, annual open competitive evaluations are organized. Automatic Speech Recognition (ASR), Spoken Language Translation (SLT) and Text-To-Speech (TTS) are evaluated independently and within an end-to-end system.
The third TC-STAR evaluation campaign took place in March 2007.
Three core technologies were evaluated during the campaign:
Automatic Speech Recognition (ASR),
Spoken Language Translation (SLT),
Text to Speech (TTS).
Each evaluation package includes the resources, protocols, scoring tools, results, etc. that were used or produced during the third evaluation campaign. The aim of these evaluation packages is to enable external players to evaluate their own systems and compare their results with those obtained during the campaign itself.
The speech databases made within the TC-STAR project were validated by SPEX, in the Netherlands, to assess their compliance with the TC-STAR format and content specifications.
This package includes the material used in the TC-STAR 2007 third evaluation campaign of Automatic Speech Recognition (ASR) for English. Similar packages are available for ASR in Spanish (ELRA-E0026-01 and E0026-02) and Mandarin (ELRA-E0027), and for SLT in 3 directions: English-to-Spanish (ELRA-E0028), Spanish-to-English (ELRA-E0029-01 and E0029-02) and Chinese-to-English (ELRA-E0030).
To be able to chain the components, ASR, SLT and TTS evaluation tasks were designed to use common sets of raw data and conditions. Three evaluation tasks, common to ASR, SLT and TTS, were selected: EPPS (European Parliament Plenary Sessions) task, CORTES (Spanish Parliament Sessions) task and VOA (Voice of America) task. The CORTES data were used in addition to the EPPS data to evaluate ASR in Spanish and SLT from Spanish into English.
This package was used within the EPPS task and consists of one test data set, composed of audio recordings of European Parliament plenary sessions from June to September 2006. The test data set amounts to 3 hours of speech (29,748 running words).
- hasVersion: C-004448: TC-STAR 2007 Evaluation Package - ASR Spanish - CORTES
- hasVersion: C-004449: TC-STAR 2007 Evaluation Package - ASR Spanish - EPPS
- hasVersion: C-004450: TC-STAR 2007 Evaluation Package - ASR Mandarin Chinese
- hasVersion: C-004451: TC-STAR 2007 Evaluation Package - SLT English-to-Spanish
- hasVersion: C-004452: TC-STAR 2007 Evaluation Package - SLT Spanish-to-English - CORTES
- hasVersion: C-004454: TC-STAR 2007 Evaluation Package - SLT Chinese-to-English
- hasVersion: C-004456: TC-STAR 2007 Evaluation Package End-to-End