Language resource #: 3330
-
C-004285: Keio University Japanese Emotional Speech Database
The database contains emotional speech recorded from a Japanese male speaker, together with synthetic speech generated by a system that was trained on a subset of this database.
-
C-004288: Tokyo Institute of Technology Multilingual Speech Corpus - Icelandic
The Icelandic speech corpus was developed for training the acoustic models of an automatic speech recognition system. The database contains three kinds of read speech: Icelandic bi-phonetically balanced sentences, questions about weather information, and sentences from the news domain.
- references: JUPITER corpus
- hasVersion: C-004289: Tokyo Institute of Technology Multilingual Speech Corpus - Indonesian
-
C-004289: Tokyo Institute of Technology Multilingual Speech Corpus - Indonesian
The corpus was developed for training the acoustic models of an automatic speech recognition system. The database contains Bahasa Indonesia speech data from 20 Indonesian speakers. Each speaker was asked to read 343 phonetically balanced sentences.
-
C-004291: AWA Long-Term Recording Speech Corpus
The corpus contains speech data from the same person recorded periodically (once a week, in the morning, afternoon, and evening) over 2-10 years. This first distribution contains only one year of data from a male speaker. The dataset also contains supplemental information, including room temperature, humidity, and the speaker's physical condition.
- references: ATR 503 Phonetically Balanced Sentences
-
C-004293: Speech Database of the 1991-1992 Tsuruoka Survey
The database contains speech material recorded during a survey of dialect standardization in Tsuruoka, Yamagata. Each investigator interviewed the informant in question-and-answer mode, following the survey forms. Answers to 78 questions regarding pronunciation, accent, and vocabulary were recorded.
-
C-004295: Vowel Database: Five Japanese Vowels of Males, Females, and Children Along with Relevant Physical Data
This corpus was developed to provide standard scientific material for spoken Japanese. Speech data from men, women, and children between 6 and 56 years of age were edited into files containing /haa, hii, huu, hee, hoo/.
-
C-004297: Reverberant Speech Recognition Evaluation Environment (CENSREC-4)
CENSREC-4 is a common platform for independently evaluating speech recognition accuracy and speech interval detection in noisy environments. The target evaluation framework is distant-talking speech recognition in various reverberant environments. As in CENSREC-1, the data in CENSREC-4 are connected-digit utterances.
- hasVersion: C-003254: CENSREC-1 (AURORA-2-J) Noisy Speech Recognition Evaluation Environments
- hasVersion: C-003255: CENSREC-1-C Noisy Speech Detection Evaluation Environments
- hasVersion: C-003256: CENSREC-2 In-car Spoken Digits Data and Environments for Noisy Speech Recognition
- hasVersion: C-003257: CENSREC-3 In-car Isolated Words Data and Environments for Noisy Speech Recognition
-
C-004299: Chiba University Japanese Map Task Dialogue Corpus (MapTask)
The corpus contains task-oriented dialogues using maps. Two speakers participate in each dialogue: an instruction-giver, who has a map with a route, and an instruction-follower, who has a map without a route. The giver verbally instructs the follower to reconstruct the giver's route on the follower's map.
-
C-004300: Yahoo! Semantically Annotated Snapshot of the English Wikipedia, version 1.0
The dataset contains a snapshot of the English Wikipedia processed with a number of publicly available NLP tools. It contains 1,490,688 entries (excluding redirects). It was built by extracting the text from the XML entries, splitting it into sentences using simple heuristics, running several syntactic and semantic NLP taggers on it, and collecting their output.
-
C-004301: Yahoo! Answers Manner Questions, version 2.0
The corpus is a subset of the Yahoo! Answers corpus from a 10/25/2007 dump, containing 142,627 questions and their answers. It is a small subset of the questions, selected for their linguistic properties. Questions and answers of obviously low quality have been removed. The corpus also contains a small amount of metadata, i.e., which answer was selected as the best answer, and the category and sub-category assigned to each question.