Language resource #: 3330 Results 1751 - 1760 of 2023
  • C-004461: TC-STAR English Training Corpora for ASR: Recordings of EPPS Speech
    Desktop/Microphone
    TC-STAR is a European integrated project focusing on all core technologies for Speech-to-Speech Translation (SST): Automatic Speech Recognition (ASR), Spoken Language Translation (SLT), and Text to Speech Synthesis (TTS).

    This corpus consists of around 290 hours of recordings of EPPS (European Parliament Plenary Sessions) speeches held or interpreted in European English (a mixture of native and non-native English), 92 hours of which were annotated (transcribed); the transcriptions are not included in the present package. These recordings were obtained from Europe by Satellite (http://europa.eu.it/comm/ebs) from May 2004 until May 2006.

    The speech signals were transmitted by EbS via the Internet in RealMedia format and via satellite in MPEG-1 Layer 2 format. The signals were decoded, resampled, and stored as WAVE RIFF (Resource Interchange File Format) files. Each file contains a single channel with 16-bit resolution at a sample rate of 16kHz.
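    As an illustrative sketch, the stated format (mono, 16-bit, 16kHz WAVE) can be verified with Python's standard `wave` module; the helper name and file path below are hypothetical, not part of the corpus distribution:

```python
import wave

def check_format(path):
    """Verify a file matches the stated corpus format: mono, 16-bit, 16 kHz WAVE.

    Returns the duration in seconds; raises AssertionError on a mismatch.
    """
    with wave.open(path, "rb") as w:
        assert w.getnchannels() == 1       # single channel
        assert w.getsampwidth() == 2       # 16-bit resolution (2 bytes/sample)
        assert w.getframerate() == 16000   # 16 kHz sample rate
        return w.getnframes() / w.getframerate()
```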

    The speech databases made within the TC-STAR project were validated by SPEX, in the Netherlands, to assess their compliance with the TC-STAR format and content specifications.

    For corresponding transcriptions, see ELRA-S0249.
  • C-004462: TC-STAR Spanish Training Corpora for ASR: Recordings of EPPS Speech
    Desktop/Microphone
    TC-STAR is a European integrated project focusing on all core technologies for Speech-to-Speech Translation (SST): Automatic Speech Recognition (ASR), Spoken Language Translation (SLT), and Text to Speech Synthesis (TTS).

    This corpus consists of around 283 hours of recordings of EPPS (European Parliament Plenary Sessions) speeches held or interpreted in European Spanish (a mixture of native and non-native Spanish), 62 hours of which were annotated (transcribed) within the project (the transcriptions are not provided in the present package but will be made available soon). These recordings were obtained from Europe by Satellite (http://europa.eu.it/comm/ebs) from May 2004 until May 2006.

    The speech signals were transmitted by EbS via the Internet in RealMedia format and via satellite in MPEG-1 Layer 2 format. The signals were decoded, resampled, and stored as WAVE RIFF (Resource Interchange File Format) files. Each file contains a single channel with 16-bit resolution at a sample rate of 16kHz.
  • C-004463: TC-STAR English Test Corpora for ASR
    Desktop/Microphone
    TC-STAR is a European integrated project focusing on all core technologies for Speech-to-Speech Translation (SST): Automatic Speech Recognition (ASR), Spoken Language Translation (SLT), and Text to Speech Synthesis (TTS).

    This corpus consists of 70 hours of recordings of EPPS (European Parliament Plenary Sessions) speeches held or interpreted in European English and other European languages. From this corpus, 16 hours of English speeches (native or non-native) were annotated (transcribed); the transcriptions are included in the present package. The data comprises the test (development and evaluation) data for the TC-STAR project in the years 2005, 2006, and 2007. The recordings were obtained from Europe by Satellite (http://europa.eu.it/comm/ebs) from October to November 2004, June to November 2005, and June to July 2006. The transcription files are stored in Transcriber XML file format.
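    Transcriber files are XML, so they can be read with any standard XML parser. The sketch below extracts speaker turns with Python's `xml.etree.ElementTree`; the embedded `.trs` excerpt is a hypothetical minimal example (real corpus files carry much fuller metadata such as speaker tables and acoustic conditions):

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal Transcriber (.trs) excerpt for illustration only.
TRS = """<Trans audio_filename="sample" version="2">
 <Episode><Section type="report" startTime="0" endTime="4.2">
  <Turn speaker="spk1" startTime="0" endTime="4.2">
   <Sync time="0"/>Mr President, ladies and gentlemen.
  </Turn>
 </Section></Episode>
</Trans>"""

def turns(trs_text):
    """Extract (speaker, start, end, text) tuples from a Transcriber document."""
    root = ET.fromstring(trs_text)
    out = []
    for turn in root.iter("Turn"):
        # itertext() gathers text around <Sync> (and similar) child elements.
        text = " ".join("".join(turn.itertext()).split())
        out.append((turn.get("speaker"), turn.get("startTime"),
                    turn.get("endTime"), text))
    return out
```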

    The speech signals were transmitted by EbS via the Internet in RealMedia format and via satellite in MPEG-1 Layer 2 format. The signals were decoded, resampled, and stored as WAVE RIFF (Resource Interchange File Format) files. Each file contains a single channel with 16-bit resolution at a sample rate of 16kHz.

    The speech databases made within the TC-STAR project were validated by SPEX, in the Netherlands, to assess their compliance with the TC-STAR format and content specifications.
  • C-004464: TC-STAR Spanish Test Corpora for ASR
    Desktop/Microphone
    TC-STAR is a European integrated project focusing on all core technologies for Speech-to-Speech Translation (SST): Automatic Speech Recognition (ASR), Spoken Language Translation (SLT), and Text to Speech Synthesis (TTS).

    This corpus consists of 174 hours of recordings of EPPS (European Parliament Plenary Sessions) speeches held or interpreted in European Spanish and other European languages. From this corpus, 16 hours of Spanish speeches were annotated (transcribed); the transcriptions are included in the present package. The data comprises the test (development and evaluation) data for the TC-STAR project in the years 2005, 2006, and 2007. The recordings were obtained from Europe by Satellite (http://europa.eu.it/comm/ebs) from October to November 2004, June to November 2005, and June to September 2006. The transcription files are stored in Transcriber XML file format.

    The speech signals were transmitted by EbS via the Internet in RealMedia format and via satellite in MPEG-1 Layer 2 format. The signals were decoded, resampled, and stored as WAVE RIFF (Resource Interchange File Format) files. Each file contains a single channel with 16-bit resolution at a sample rate of 16kHz.

    The speech databases made within the TC-STAR project were validated by SPEX, in the Netherlands, to assess their compliance with the TC-STAR format and content specifications.
  • C-004465: Hungarian SpeechDat(E) Database
    Telephone
    The Hungarian SpeechDat(E) Database (Eastern European Speech Database) comprises 1000 speakers (511 males, 489 females) recorded over the local fixed telephone network. This database is partitioned into 5 CDs. The speech databases made within the SpeechDat(E) project were validated by SPEX, in the Netherlands, to assess their compliance with the SpeechDat(E) format and content specifications.

    The speech files are stored as sequences of 8-bit, 8kHz A-law speech files, according to the specifications of SpeechDat(E). Each utterance is stored within a separate file and has an accompanying ASCII SAM label file.

    Corpus contents:
    • 6 application words;
    • 1 sequence of 10 isolated digits;
    • 4 connected digits: 1 sheet number (5+ digits), 1 telephone number (9-11 digits), 1 credit card number (14/16 digits), 1 PIN code (6 digits);
    • 3 dates: 1 spontaneous date (birthday), 1 prompted date (word style), 1 relative and general date expression;
    • 1 spotting phrase using an application word (embedded);
    • 1 isolated digit;
    • 3 spelled words (letter sequences): 1 spontaneous, e.g. own forename; 1 spelling of city name; 1 real/artificial name for coverage;
    • 2 currency money amounts: 1 local money amount, 1 international money amount (USD, EURO);
    • 1 natural number;
    • 6 directory assistance names: 1 spontaneous, e.g. own forename; 1 city of birth / growing up (spontaneous); 1 most frequent city (out of 500); 1 most frequent company/agency (out of 500); 1 "forename surname" (set of 150), 1 "surname" (set of 150);
    • 2 questions, including fuzzy yes/no: 1 predominantly yes question, 1 predominantly no question;
    • 12 phonetically rich sentences;
    • 2 time phrases: 1 time of day (spontaneous), 1 time phrase (word style);
    • 4 phonetically rich words.

    The following age distribution has been obtained: 92 speakers are below 16 years old, 450 speakers are between 16 and 30, 230 speakers are between 31 and 45, 210 speakers are between 46 and 60, 18 speakers are over 60.

    A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
  • C-004466: UPC-TALP database of isolated meeting-room acoustic events
    Desktop/Microphone
    This database was produced within the CHIL Project (Computers in the Human Interaction Loop), in the framework of an Integrated Project (IP 506909) under the European Commission's Sixth Framework Programme. It contains a set of isolated acoustic events that occur in a meeting room environment and that were recorded for the CHIL Acoustic Event Detection (AED) task. The recorded sounds do not have temporal overlapping. The database can be used as training material for AED technologies as well as for testing AED algorithms in quiet environments without temporal sound overlapping.

    The database contains signals corresponding to 23 audio channels with corresponding labels (out of 84 channels used in the whole CHIL task). The 23 audio channels correspond to: 12 microphones of the 3 T-shaped clusters, 4 tabletop omni directional microphones, and 7 channels of the Mark III array.

    Data were recorded at 44.1kHz with 24-bit precision, then converted to 16-bit raw little-endian format. All channels were synchronized. During all recordings, two to three additional people were inside the room to make the scenario more realistic.
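    Because the converted files are headerless raw PCM, a reader must supply the byte interpretation itself. A minimal sketch, assuming the stated 16-bit signed little-endian layout (the helper name is hypothetical):

```python
import struct

def raw16le_to_samples(data):
    """Interpret raw little-endian 16-bit PCM bytes as signed samples.

    The '<' prefix forces little-endian decoding regardless of host byte order.
    """
    n = len(data) // 2
    return list(struct.unpack("<%dh" % n, data[:n * 2]))
```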

    Approximately 60 sounds per sound class were recorded. Each session was produced by the same ten people (5 men and 5 women), with 3 sessions per participant. At each session, the participant took a different place in the room, out of 7 fixed positions, and had to produce the complete set of sounds twice. A script indicating the order of events to be produced was given to each participant. Almost every event was preceded and followed by a pause of several seconds. All sounds were produced individually, except “applause” and several “laugh” events, which were produced jointly by the people inside the room. The annotation was done manually.

    The database is stored on 3 DVDs (one session per DVD).

    The following table summarizes the content of the DVDs and shows the number of annotated acoustic events in each session:
    Event type                        Session 1   Session 2   Session 3
    Knock (door, table)                   15          18          17
    Door open                             20          20          20
    Door close                            20          21          20
    Steps                                 28          24          21
    Chair moving                          23          28          25
    Spoon (cup jingle)                    23          21          24
    Paper work (listing, wrapping)        31          29          24
    Key jingle                            21          21          23
    Keyboard typing                       21          25          20
    Phone ringing/Music                   37          36          43
    Applause                              20          20          20
    Cough                                 22          22          21
    Laugh                                 22          21          21
    Unknown                               38          46          42
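    As a quick cross-check of the table, the per-session counts can be tallied in a few lines of Python (the event counts below are transcribed from the table; the short labels are ours):

```python
# Annotated acoustic event counts per (session 1, session 2, session 3),
# transcribed from the table above.
counts = {
    "knock": (15, 18, 17), "door open": (20, 20, 20), "door close": (20, 21, 20),
    "steps": (28, 24, 21), "chair moving": (23, 28, 25), "spoon": (23, 21, 24),
    "paper work": (31, 29, 24), "key jingle": (21, 21, 23),
    "keyboard typing": (21, 25, 20), "phone/music": (37, 36, 43),
    "applause": (20, 20, 20), "cough": (22, 22, 21), "laugh": (22, 21, 21),
    "unknown": (38, 46, 42),
}
# Column-wise totals: annotated events per session.
session_totals = [sum(col) for col in zip(*counts.values())]
```

    This gives 341, 352 and 341 annotated events for the three sessions, respectively (1,034 in total).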
  • C-004469: Slovenian BNSI Broadcast News Speech Corpus
    Broadcast Resources
    This speech database consists of TV news shows (both the evening news “TV Dnevnik” and the late-night news “Odmevi”) from the archive of the Slovenian national broadcaster RTV Slovenia. The recordings took place between June 1999 and May 2003.

    The database comprises a total of 36 hours of recordings (training set: 30 hours, development set: 3 hours, test set: 3 hours), transcribed and manually checked using the Transcriber tool. Transcription conventions are based on documents defined by LDC, LIMSI and the COST 278 BN SIG. The transcriptions contain 268,000 words, of which 37,000 are distinct. The transcription files contain orthographic transcriptions, information on acoustic conditions and background, and segmentation at turn and section level. The topic is described and marked (25 topic categories) for each section of a news show. Speaker information consists of gender, speaking style, accent and origin.

    1,565 speakers were recorded (1,069 males, 477 females, 19 unspecified).

    The speech signal is as follows: 16kHz, 16 bit, WAV, 1 channel.
  • C-004471: Swedish EUROM1
    Desktop/Microphone
    EUROM1 is the first truly multilingual speech database produced in Europe. Equivalent corpora for each of the European languages were collected with the same number of speakers, selected in the same way, and recorded under the same conditions with common file formats. Initially, eight European countries made recordings: Italy, United Kingdom, Germany, Netherlands, Denmark, Sweden, Norway, France. Additional recordings were then completed (thanks to the EEC Esprit Project SAM-A) in Greece, Spain and Portugal. More than sixty speakers were recorded per language.

    The content consists of:
    1) Continuous speech:
    - 40 passages, each made of five task-related sentences.
    - 50 patching sentences, designed to compensate for uneven phoneme distribution in the passage material.
    2) Numbers:
    The numbers were divided into five blocks, each containing twenty numbers. Each block was recorded as one single take.
    3) CVC words:
    The CVC word lists contain five list types and also carrier phrases of the suggested type. Eighty-two isolated words were used.
  • C-004472: SpeechDat Galician Database for the Fixed Telephone Network
    Telephone
    The SpeechDat Galician Database for the Fixed Telephone Network contains recordings of 653 Galician speakers (217 males, 436 females) made over the fixed telephone network. This database is partitioned into 3 CDs. The database complies with the common specifications created in the SpeechDat project.

    Speech samples are stored as sequences of 8-bit 8 kHz A-law. Each prompted utterance is stored in a separate file. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.
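    SAM label files are line-oriented ASCII, with each line carrying a mnemonic key and its value. A sketch of a generic reader is below; the embedded excerpt is hypothetical and the exact mnemonics used in the corpus may differ:

```python
# Hypothetical excerpt of an ASCII SAM label file, for illustration only;
# real files follow the "MNEMONIC: value" convention with corpus-specific keys.
SAM = """LHD: SAM, 6.0
SAM: 8000
SNB: 1
LBR: 0,12800,,,,yes
"""

def parse_sam(text):
    """Collect SAM label lines into a dict of mnemonic -> list of values."""
    labels = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition(":")
        labels.setdefault(key.strip(), []).append(value.strip())
    return labels
```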

    Each speaker uttered the following 44 items:

    – 3 common application words
    – 1 sequence of isolated digits
    – 4 digit strings: prompt sheet number, telephone number, credit card number, PIN code
    – 1 spontaneous phone number
    – 1 spontaneous PIN code (8 digits)
    – 3 dates: spontaneous date (birth date), prompted date (word style), relative and general date expression
    – 1 application word phrase
    – 1 isolated digit
    – 3 spelled words: spontaneous spelled own forename, spelled directory city name, spelled real/artificial words
    – 1 money amount
    – 2 natural numbers
    – 5 directory assistance items: forename (spontaneous), city of origin (spontaneous), country name (most frequent city), most frequent company/agency name, forename & surname (out of 500), surname (out of 76), “forename surname” (spontaneous)
    – 2 spontaneous yes/no questions
    – 10 phonetically rich sentences
    – 2 time phrases: time of day (spontaneous), time phrase
    – 4 phonetically rich words

    The following age distribution has been obtained: 12 speakers are under 16, 375 are between 16 and 30, 164 are between 31 and 45, 88 are between 46 and 60, and 9 speakers are over 60. (The age of 5 speakers was not defined).

    A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
  • C-004473: SmartWeb Handheld Corpus (SHC)
    Desktop/Microphone
    The SMARTWEB UMTS data collection was created within the publicly funded German SmartWeb project in the years 2004-2006. It comprises a collection of user queries to a naturally spoken Web interface, with the main focus on the 2006 soccer World Cup. The recordings include field recordings using a hand-held UMTS device (one person, SmartWeb Handheld Corpus SHC, ref. ELRA-S0278), field recordings with video capture of the primary speaker and a secondary speaker (SmartWeb Video Corpus SVC, ref. ELRA-S0279), as well as mobile recordings performed on a BMW motorbike (one speaker, SmartWeb Motorbike Corpus SMC, ref. ELRA-S0280).

    This corpus corresponds to the hand-held UMTS device (SmartWeb Handheld Corpus) and contains recordings spoken by 156 speakers in a human-machine query situation. Users were asked to solve several tasks with a spoken query system to the WWW, using a smartphone as a portable device in natural environments (office, hall, restaurant, street). The recorded channels are the Bluetooth headset transmitted over UMTS (telephone quality), and the Bluetooth headset plus an additional collar microphone in high quality.

    The corpus contains:
    - Total number of recorded queries: 10,966
    - Total duration of segmented speech: 1,835 minutes
    - Formats: WAV 44.1kHz 16 bit; A-law 8kHz 8 bit; Verbmobil transliteration; BAS Partitur Format (BPF)
    - Segmentation: automatic segmentation into queries by the recording server
    - Distribution: 15 DVD-R

    See also ELRA-S0279 and ELRA-S0280.