Language Resource Search - SHACHI: Language Resource Metadata Database

Language resource #: 3330 Results 241 - 250 of 2023

C-000538: ATR Digital Speech Database (set A)
Japanese sound data
20 speakers
about 8500 words
- hasVersion: C-000539: ATR Digital Speech Database (set B)
- hasVersion: C-000540: ATR Digital Speech Database (set C)
- hasVersion: C-000541: ATR Digital Speech Database (set D)
- hasVersion: C-000542: ATR Digital Speech Database (set E)
- hasVersion: C-000543: ATR Digital Speech Database (set F)
C-000539: ATR Digital Speech Database (set B)
Japanese sound data
About 10000 sentences were chosen randomly from newspapers, magazines, novels, letters, textbooks, and research papers, and 503 sentences were composed to be phonemically well balanced. 10 speakers.
- hasVersion: C-000543: ATR Digital Speech Database (set F)
- hasVersion: C-000538: ATR Digital Speech Database (set A)
- hasVersion: C-000540: ATR Digital Speech Database (set C)
- hasVersion: C-000541: ATR Digital Speech Database (set D)
- hasVersion: C-000542: ATR Digital Speech Database (set E)
C-000540: ATR Digital Speech Database (set C)
Japanese sound data
- hasVersion: C-000543: ATR Digital Speech Database (set F)
- hasVersion: C-000538: ATR Digital Speech Database (set A)
- hasVersion: C-000539: ATR Digital Speech Database (set B)
- hasVersion: C-000541: ATR Digital Speech Database (set D)
- hasVersion: C-000542: ATR Digital Speech Database (set E)
C-000541: ATR Digital Speech Database (set D)
12 research papers chosen from junior high school textbooks and NHK TV materials. 519 sentences. 2 speakers. 2 titles.
- hasVersion: C-000543: ATR Digital Speech Database (set F)
- hasVersion: C-000538: ATR Digital Speech Database (set A)
- hasVersion: C-000539: ATR Digital Speech Database (set B)
- hasVersion: C-000540: ATR Digital Speech Database (set C)
- hasVersion: C-000542: ATR Digital Speech Database (set E)
C-000542: ATR Digital Speech Database (set E)
The 5156 most frequently used words
200 phonemically well-balanced short sentences
4 speakers
4 titles
- hasVersion: C-000543: ATR Digital Speech Database (set F)
- hasVersion: C-000538: ATR Digital Speech Database (set A)
- hasVersion: C-000539: ATR Digital Speech Database (set B)
- hasVersion: C-000540: ATR Digital Speech Database (set C)
- hasVersion: C-000541: ATR Digital Speech Database (set D)
C-000543: ATR Digital Speech Database (set F)
The following data is recorded: phonemically well-balanced sentences, frequently used words with foreign syllables, and 600 technical test sentences made for evaluating the speech translation system. Sound segment labels accompany the recorded speech. 6 speakers. 1122 sentences. 6 titles.
- hasVersion: C-000542: ATR Digital Speech Database (set E)
- hasVersion: C-000538: ATR Digital Speech Database (set A)
- hasVersion: C-000539: ATR Digital Speech Database (set B)
- hasVersion: C-000540: ATR Digital Speech Database (set C)
- hasVersion: C-000541: ATR Digital Speech Database (set D)
C-000549: ASJ Japanese Newspaper Article Sentences Read Speech Corpus
A phonetic database of Mainichi Newspapers articles and phonetically balanced sentences read by 306 speakers. Recording was done with two types of microphone: headset (Sennheiser HMD410/HMD25-1) and desktop (Sanken, Sony, etc.). 16 kHz sampling and 16-bit quantization at A/D conversion; speech waveforms carry a NIST SPHERE header.
- references: C-001603: CD-ROM Mainichi Shimbun '94 Data Collection
- references: C-001599: CD-Mainichi Shimbun '93 Data Collection
- references: C-001602: CD-ROM Mainichi Shimbun '92 Data Collection
- references: C-000838: DCS - Mainichi Newspaper 1991-2006 data files
C-000551: Corpus of Spontaneous Japanese
CSJ, or Corpus of Spontaneous Japanese, is a large-scale annotated corpus of spontaneous Japanese. CSJ is an outcome of Japan's national priority-area research project known as Spontaneous Speech: Corpus and Processing Technology (1999-2003), supported by the Ministry of Education, Culture, Sports, Science and Technology. It is a collaborative work of the National Institute for Japanese Language (NIJL), the Communications Research Laboratory (CRL), and the Tokyo Institute of Technology (TITech). The project supervisor is Professor Sadaoki Furui of TITech. The manual is available from http://www.kokken.go.jp/katsudo/seika/corpus/releaseinfo/#%83}%83j%83%85%83A%83%8B
- replaces: monitor version 2001
- replaces: monitor version 2002
C-000553: Simultaneous Interpretation Database (speech)
A corpus of simultaneous interpretation (monologue and conversation) built over 5 years, from 1999 to 2003, at CIAIR of Nagoya University. It contains approximately 182 hours of recorded speech in total; transcription, visualization of the recordings, and linguistic analysis have been completed. The transcribed data contains about 1 million words (morphemes), making it the largest simultaneous interpretation corpus in the world.
- hasVersion: C-003270: Simultaneous Interpretation Database
- hasVersion: C-000464: Simultaneous Interpretation Database (conversation)
C-000560: 1996 English Broadcast News Dev and Eval (HUB4)
* LDC97S44 - Speech data
* LDC97S66 - Dev and eval
* LDC97T22 - Transcripts

*Introduction*

The 1996 Broadcast News Speech Corpus contains a total of 104 hours of broadcasts from ABC, CNN and CSPAN television networks and NPR and PRI radio networks with corresponding transcripts. The primary motivation for this collection is to provide training data for the DARPA "HUB4" Project on continuous speech recognition in the broadcast domain.

*Data*

The speech files are available in a 19 disc training data set with one additional disc of development data and an additional disc of evaluation data. The following programs are represented in this corpus:

* ABC Nightline
* ABC World Nightly News
* ABC World News Tonight
* CNN Early Edition
* CNN Early Prime News
* CNN Headline News
* CNN Prime Time News
* CNN The World Today
* CSPAN Washington Journal
* NPR All Things Considered
* NPR Marketplace

Transcripts have been made of all recordings in this publication. They are manually time-aligned to the phrasal level and annotated to identify boundaries between news stories, speaker turn boundaries, and the gender of the speakers. The released version of the transcripts is in SGML format; accompanying documentation and an SGML DTD file are included with the transcription release. The transcripts are available via FTP.

*Updates*

There are no updates at this time.

*Pricing*

The Reduced Licensing Fee for this corpus is US$200.
- isPartOf: C-000561: 1996 English Broadcast News Speech (HUB4)
- hasFormat: C-000562: 1996 English Broadcast News Transcripts (HUB4)
- isReferencedBy: David Graff, et al. 1997. 1996 English Broadcast News Dev and Eval (HUB4). Linguistic Data Consortium, Philadelphia.
- isReferencedBy: Specification of the 1996 HUB4 Broadcast News Evaluation (http://www.nist.gov/speech/publications/darpa97/pdf/stern1.pdf)
