Language Resource Search - SHACHI: Language Resource Metadata Database

Language resource #: 3330 Results 451 - 460 of 2023

C-000858: ETL Spoken Dialog Corpus (Town Guidance Task, Japanese)
A corpus of spontaneous human-machine spoken dialogue collected with the Wizard-of-Oz (WOZ) technique. It was designed to support analysis of the elements that enable natural interaction between human and machine, such as natural turn-taking during the dialog, back-channel responses (chiming in), interruptions, and natural recovery from interruptions. The corpus contains data from 40 speakers recorded in 197 sessions, with a net dialog time of over 1,300 minutes. It includes speech waveforms, pitch patterns, transcriptions, utterance segment boundaries, and semantic representations of the user utterances.
C-000866: The NICT JLE Corpus
A corpus of spoken English produced by Japanese learners of English. The data were collected and evaluated according to the SST (Standard Speaking Test). Each session is an interview of about 15 minutes with one informant, in which the informant exchanges greetings, introduces himself or herself, describes pictures, takes part in role-plays, tells a story, and engages in a closing conversation. The speech data were transcribed, and the transcriptions were annotated using a purpose-built XML editor. Each interview is rated on one of nine levels according to the SST evaluation system.
- hasPart: Normative Corpus
- hasPart: back-translation corpus
C-000867: Tori-Bank
A data bank of linguistic knowledge for natural language processing. It was developed by applying the Theory of Semantic Typology, a principle underlying a new machine translation system, to the analysis of clause relations in complex and compound sentences.
- hasPart: D-000865: Semantic Typology Pattern Dictionary
- hasPart: D-000862: Japanese Semantic Pattern Dictionary
- hasPart: G-000864: Pattern parser program file
- hasPart: Pattern meaning search program file
C-000869: Priority Areas "Spoken Dialogue" Simulated Spoken Dialogue Corpus (PASD)
This document describes the "Simulated Spoken Dialogue Corpus" produced and edited by the Grant-in-Aid for Scientific Research on Priority Areas project "Research on Understanding and Generating Dialogue by Integrated Processing of Speech, Language and Concept" ("Spoken Dialogue" project for short), which was carried out from fiscal 1993 to 1995 under the sponsorship of the Ministry of Education, Science, Sports and Culture of Japan. The corpus contains the speech waveforms and transcriptions of 93 simulated dialogues in Japanese on tasks such as a secretary system, appointment scheduling, and telephone shopping. The dialogues total about 450 minutes. The basic specifications of the corpus were reviewed by the "Spoken Dialogue Corpus Working Group", made up of researchers from 11 universities participating in the project. The working group produced 4 CD-ROMs containing all speech waveforms, transcriptions, and HTML files in which each transcribed utterance is linked to its speech waveform file.
C-000873: ACCOR - English
Desktop/Microphone
ACCOR is a unique acoustic and articulatory database recorded as part of the ESPRIT-ACCOR project, which investigated cross-language acoustic-articulatory correlations in coarticulatory processes. The European languages covered are Catalan, English, French, German, Irish Gaelic, Italian and Swedish.
Recording conditions: the acoustic signal was recorded digitally, simultaneously with additional channels of physiological and aerodynamic data: an electropalatograph, to measure the timing and location of tongue contacts with the palate; a pneumotachograph with a Rothenberg mask, to record the volume velocity of airflow from the nose and mouth; and a laryngograph, to record details of vocal fold vibration.
Sampling rates: Speech signal: 20,000 Hz; Laryngograph: 10,000 Hz; Oral air flow: 500 Hz; Nasal air flow: 500 Hz; EPG data: 200 Hz.
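Because the ACCOR channels are sampled at different rates, locating the same moment across channels comes down to rate conversion. The Python sketch below illustrates this under two assumptions that the description does not state: all channels start at the same instant, and samples are indexed from 0. The channel keys in `RATES_HZ` and the helper functions are hypothetical names, not part of the ACCOR distribution.

```python
import math
from fractions import Fraction

# Sampling rates (Hz) as listed in the ACCOR description.
# The channel keys are hypothetical labels, not names used in the distribution.
RATES_HZ = {
    "speech": 20_000,
    "laryngograph": 10_000,
    "oral_airflow": 500,
    "nasal_airflow": 500,
    "epg": 200,
}


def to_time(channel: str, index: int) -> Fraction:
    """Start time (seconds) of sample `index` in `channel`.

    Assumes every channel's sample 0 starts at time 0.
    Fractions keep the arithmetic exact (no float rounding).
    """
    return Fraction(index, RATES_HZ[channel])


def containing_index(channel: str, t: Fraction) -> int:
    """Index of the sample in `channel` whose sampling period contains time `t`."""
    return math.floor(t * RATES_HZ[channel])


def speech_samples_for_epg_frame(frame: int) -> range:
    """Speech-sample indices that fall within one EPG frame."""
    start = containing_index("speech", to_time("epg", frame))
    end = containing_index("speech", to_time("epg", frame + 1))
    return range(start, end)
```

For example, speech sample 150 falls in EPG frame 1, and each EPG frame spans 100 speech samples (20,000 Hz / 200 Hz).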
Corpora: a common corpus was used for all languages (with a few exceptions where sequences were not phonotactically permissible). It covers: nonsense items (the vowels /i, a, u/ in isolation, and VCV sequences where C = /p, b, t, d, k, s, z, n, l, S, tS/ or the sequences /kl, st/, and V = /i, u, a/); real words that match the VCV nonsense sequences above as closely as possible; and short sentences constructed in each language to illustrate the main connected speech processes in that language (assimilations, weak forms, etc.).
Speakers: Five speakers from each language recorded a total of 10 repetitions of the full corpus. Five of these repetitions have electropalatography, electrolaryngography and audio signal data. The other five repetitions have electropalatography, electrolaryngography, audio signal, and pneumotachography (separate nasal and oral airflow velocity).

Currently, only English is available.
C-000874: APASCI
Desktop/Microphone
APASCI is an Italian speech database recorded in an insulated room with a Sennheiser MKH 416 T microphone. It includes 5,290 phonetically rich sentences and 10,800 isolated digits, for a total of 58,924 word occurrences (2,191 different words) and 641 minutes of speech.
The speech material was read by 100 Italian speakers (50 male and 50 female). Each of them read 1 calibration sentence, 4 sentences with wide phonetic coverage, and 15 or 20 sentences with wide diphone coverage. Six of these speakers (3 male and 3 female) read 26 occurrences of the calibration sentence, 104 sentences with wide phonetic coverage, and 390 sentences with wide diphone coverage. Fifty-four of the speakers (42 male and 12 female) pronounced 20 repetitions of each of the 10 isolated digits.
The database documentation includes both phonemic and orthographic transcriptions of each sentence.
The database can be used to design, train, and evaluate continuous speech recognition systems (speaker-independent, speaker-adaptive, speaker-dependent, and multi-speaker). It was also designed for research on acoustic modelling and on acoustic parameters for speech recognition, as well as for research on speaker recognition.
Format: 16 bit linear
Standard: NIST SPHERE
Sampling rate: 16 kHz
Medium: CD-ROM
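The figures above can be cross-checked with a little arithmetic. The Python sketch below confirms that 54 speakers × 20 repetitions × 10 digits yields the 10,800 isolated digits quoted, and estimates the raw sample data for 641 minutes of 16-bit, 16 kHz speech. Treating the audio as mono and ignoring NIST SPHERE header bytes and any non-speech portions of the files are assumptions, not figures from the description, so the size is only a rough estimate.

```python
# Figures quoted in the APASCI description.
DIGIT_SPEAKERS = 54      # speakers who recorded the isolated digits
REPETITIONS = 20         # repetitions of each digit
DISTINCT_DIGITS = 10     # number of distinct digits

SPEECH_MINUTES = 641
SAMPLE_RATE_HZ = 16_000  # 16 kHz
BYTES_PER_SAMPLE = 2     # 16-bit linear PCM
CHANNELS = 1             # assumption: mono recording (not stated in the description)


def isolated_digit_count() -> int:
    """Total isolated-digit utterances implied by the speaker figures."""
    return DIGIT_SPEAKERS * REPETITIONS * DISTINCT_DIGITS


def raw_audio_bytes(minutes: int = SPEECH_MINUTES) -> int:
    """Uncompressed sample data size, excluding NIST SPHERE file headers."""
    seconds = minutes * 60
    return seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


print(isolated_digit_count())  # 10800
print(raw_audio_bytes())       # 1230720000 (about 1.23 GB)
```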
C-000875: AURORA Project database - Subset of SpeechDat-Car - Danish database - Evaluation Package
The Aurora project was originally set up to establish a worldwide standard for the feature extraction software that forms the core of the front-end of a DSR (Distributed Speech Recognition) system. ETSI formally adopted this activity as work items 007 and 008.

The two work items within ETSI are:
- ETSI DES/STQ WI007: Distributed Speech Recognition - Front-End Feature Extraction Algorithm & Compression Algorithm
- ETSI DES/STQ WI008: Distributed Speech Recognition - Advanced Feature Extraction Algorithm.

This database is a subset of the Danish SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains isolated and connected Danish digits spoken in the following noise and driving conditions inside a car:

1. High speed good road
2. Low speed rough road
3. Stopped with motor running
4. Town traffic
C-000876: AURORA Project database - Subset of SpeechDat-Car - Finnish database - Evaluation Package
The Aurora project was originally set up to establish a worldwide standard for the feature extraction software that forms the core of the front-end of a DSR (Distributed Speech Recognition) system. ETSI formally adopted this activity as work items 007 and 008.

The two work items within ETSI are:
- ETSI DES/STQ WI007: Distributed Speech Recognition - Front-End Feature Extraction Algorithm & Compression Algorithm
- ETSI DES/STQ WI008: Distributed Speech Recognition - Advanced Feature Extraction Algorithm.

This database is a subset of the Finnish SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains isolated and connected Finnish digits spoken in the following driving conditions inside a car:

1. 0 km/h with the car engine on
2. 40-60 km/h with the car windows closed
3. 40-60 km/h with the car windows open
4. 100-120 km/h with no music in the background
5. 100-120 km/h with music in the background

The database also contains the software needed to run simulations using Entropic's HTK, which has been adopted as the "standard" HMM recogniser for the Aurora standard evaluation.
C-000877: AURORA Project database - Subset of SpeechDat-Car - German database - Evaluation Package
The Aurora project was originally set up to establish a worldwide standard for the feature extraction software that forms the core of the front-end of a DSR (Distributed Speech Recognition) system. ETSI formally adopted this activity as work items 007 and 008.

The two work items within ETSI are:
- ETSI DES/STQ WI007: Distributed Speech Recognition - Front-End Feature Extraction Algorithm & Compression Algorithm
- ETSI DES/STQ WI008: Distributed Speech Recognition - Advanced Feature Extraction Algorithm.

This database is a subset of the German SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains isolated and connected German digits spoken in the following noise and driving conditions inside a car:

1. High speed good road
2. Low speed rough road
3. Stopped with motor running
4. Town traffic
C-000878: AURORA Project database - Subset of SpeechDat-Car - Italian database - Evaluation Package
The Aurora project was originally set up to establish a worldwide standard for the feature extraction software that forms the core of the front-end of a DSR (Distributed Speech Recognition) system. ETSI formally adopted this activity as work items 007 and 008.

The two work items within ETSI are:
- ETSI DES/STQ WI007: Distributed Speech Recognition - Front-End Feature Extraction Algorithm & Compression Algorithm
- ETSI DES/STQ WI008: Distributed Speech Recognition - Advanced Feature Extraction Algorithm.

This database is a subset of the Italian SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains 2,200 Italian connected-digit utterances, divided into training and test utterances, spoken in the following noise and driving conditions inside a car:

1. High speed good road
2. Low speed rough road
3. Stopped with motor running
4. Town traffic
