Language Resource Search - SHACHI: Language Resource Metadata Database

Language resource #: 3330 Results 1631 - 1640 of 2023

C-004285: Keio University Japanese Emotional Speech Database
A set of emotional human speech spoken by a Japanese male speaker, together with a set of artificial speech synthesized by a system that was trained on a subset of this database.
C-004288: Tokyo Institute of Technology Multilingual Speech Corpus - Icelandic
The Icelandic speech corpus was developed for training the acoustic models of an automatic speech recognition system. The database contains three kinds of read speech: Icelandic bi-phonetically balanced sentences, weather-information-related questions, and sentences from the news domain.
- references: JUPITER corpus
- hasVersion: C-004289: Tokyo Institute of Technology Multilingual Speech Corpus - Indonesian
C-004289: Tokyo Institute of Technology Multilingual Speech Corpus - Indonesian
The corpus was developed for training the acoustic models of an automatic speech recognition system. The database contains Bahasa Indonesia speech data from 20 Indonesian speakers. Each speaker was asked to read 343 phonetically balanced sentences.
- hasVersion: C-004288: Tokyo Institute of Technology Multilingual Speech Corpus - Icelandic
C-004291: AWA Long-Term Recording Speech Corpus
The corpus contains speech data of the same person recorded periodically (once a week in the morning, afternoon and evening) over 2-10 years. This first distribution contains only a one-year set from one male speaker. The dataset also contains supplemental information, including room temperature, humidity, and the speaker's physical condition.
- references: ATR 503 Phonetically Balanced Sentences
C-004293: Speech Database of the 1991-1992 Tsuruoka Survey
The database contains speech material recorded in an investigation of the standardization of dialects in Tsuruoka, Yamagata. Each investigator interviewed the informant in question-and-answer mode, following the investigation forms. Answers to 78 questions regarding pronunciation, accent, and vocabulary were recorded.
C-004295: Vowel Database: Five Japanese Vowels of Males, Females, and Children Along with Relevant Physical Data
This corpus was developed to provide standard scientific material for spoken Japanese. The speech data of men, women, and children between 6 and 56 years of age were edited into files containing /haa, hii, huu, hee, hoo/.
C-004297: Reverberant Speech Recognition Evaluation Environment (CENSREC-4)
CENSREC-4 is a common platform for independently evaluating speech recognition accuracy and speech interval detection in noisy environments. The target evaluation framework is distant-talking speech recognition in various reverberant environments. The data contained in CENSREC-4 are connected-digit utterances, as in CENSREC-1.
C-004299: Chiba University Japanese Map Task Dialogue Corpus (MapTask)
The corpus contains task-oriented dialogues using maps, in which two speakers participate: an instruction-giver who has a map with a route and an instruction-follower who has a map without a route. The giver verbally instructs the follower to reconstruct the giver's route on the follower's map.
C-004300: Yahoo! Semantically Annotated Snapshot of the English Wikipedia, version 1.0
The dataset contains a snapshot of the English Wikipedia processed with a number of publicly available NLP tools. The dataset contains 1,490,688 entries (excluding redirects). It was built by extracting the text from the XML entries, splitting it into sentences using simple heuristics, running several syntactic and semantic NLP taggers on it, and collecting their output.
C-004301: Yahoo! Answers Manner Questions, version 2.0
The corpus is a subset of the Yahoo! Answers corpus from a 10/25/2007 dump, containing 142,627 questions and their answers. It is a small subset of the questions, selected for their linguistic properties. Questions and answers of obviously low quality were removed. The corpus also contains a small amount of metadata, i.e., which answer was selected as the best answer, and the category and sub-category assigned to each question.
- isPartOf: C-004302: Yahoo! Answers Comprehensive Questions and Answers version 1.0
