Language Resource Search - SHACHI: Language Resource Metadata Database

Registered language resources: 3,330. Showing items 451 - 460 of 2,023.

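C-000858: ETL Route Guidance Dialogue Speech Corpus (電総研道案内対話音声コーパス)
This corpus records route-guidance dialogues between humans and a machine equipped with an automatic inference engine, collected with the Wizard of Oz method. It is designed to support analysis of the elements that make natural human-machine interaction possible, such as turn-taking, nodding, interruptions, and appropriate responses to interruptions.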
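C-000866: NICT JLE Corpus
The NICT JLE Corpus is a collection of spoken English produced by Japanese learners. It was compiled chiefly by the National Institute of Information and Communications Technology (NICT), an Independent Administrative Institution, with the cooperation of 1,281 SST test takers. It is said to be the largest learner speech corpus in the world at present and has attracted attention in language research.
- hasPart: Normative Corpus (正解コーパス)
- hasPart: Japanese Translation Corpus (back-translation corpus)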
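C-000867: Tori-Bank (鳥バンク)
Tori-Bank is a data bank of linguistic knowledge bases for natural language processing. It contains the Semantic Pattern Dictionary (227,000 entries), which covers Japanese compound and complex sentences, together with related data and documentation. Copyright and related rights are managed by the Japanese Expression Semantic Dictionary Management Committee (日本語表現意味辞書等管理委員会; chair: Satoru Ikehara), and distribution is handled on its behalf by the secretariat ((株)学際統合創研).

Tori-Bank is intellectual property provided by courtesy of its authors, free of charge in principle (actual costs may be charged depending on the form of use). Its purpose is to let intellectual outputs such as the Japanese expression semantic dictionary circulate and be used widely, supporting the formation, expansion and development of an open language community and thereby contributing to the development of language culture. For the time being, however, it is available only to researchers and developers who have a concrete plan for using it in research and development. (Note)
- hasPart: Semantic pattern dictionary file
- hasPart: Japanese semantic classification dictionary file
- hasPart: G-000864: Pattern parser program file
- hasPart: G-000863: Pattern semantic search program file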
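C-000869: Priority Area Research "Spoken Dialogue" Dialogue Speech Corpus (重点領域研究「音声対話」対話音声コーパス)
Speech data and transcribed text for a total of 93 dialogues.
- isVersionOf: Kyoto University recordings
- isVersionOf: Osaka University recordings
- isVersionOf: Chiba University recordings
- isVersionOf: University of Tsukuba recordings
- isVersionOf: University of Electro-Communications recordings
- isVersionOf: Waseda University recordings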
C-000873: ACCOR - English
Desktop/Microphone
ACCOR is a unique acoustic and articulatory database recorded as part of the ESPRIT-ACCOR project investigating cross-language acoustic-articulatory correlations in coarticulatory processes. The European languages covered are: Catalan, English, French, German, Irish Gaelic, Italian and Swedish.
Recording Conditions: Simultaneous digital recording of the acoustic signal and of additional channels for physiological and aerodynamic data: an electropalatograph (for measuring the timing and location of tongue contacts with the palate), a pneumotachograph with Rothenberg mask (for recording volume velocity of air flow from nose and mouth), and a laryngograph (for recording details of vocal fold vibration).
Sampling rates: Speech signal: 20,000 Hz; Laryngograph: 10,000 Hz; Oral air flow: 500 Hz; Nasal air flow: 500 Hz; EPG data: 200 Hz.
Corpora: A common corpus was used for all languages (with a few exceptions when sequences were not phonotactically permissible). It covers nonsense items (vowels /i, a, u/ in isolation; VCV sequences, where C = /p, b, t, d, k, s, z, n, l, S, tS/ and the sequences /kl, st/, and V = /i, u, a/); real words which match the VCV nonsense sequences above as closely as possible; and short sentences constructed in each language to illustrate the main connected speech processes in that language (assimilations, weak forms, etc.).
Speakers: Five speakers from each language recorded a total of 10 repetitions of the full corpus. Five of these repetitions have electropalatography, electrolaryngography and audio signal data. The other five repetitions have electropalatography, electrolaryngography, audio signal, and pneumotachography (separate nasal and oral airflow velocity).

Currently, only English is available.
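
The ACCOR channels listed above are sampled at different rates, so comparing them means converting between sample indices. The following is a minimal sketch of that conversion. It assumes all channels start at the same instant (the entry says the recording is simultaneous but does not describe the file layout), and the names `RATES_HZ`, `samples_per_epg_frame` and `sample_index` are illustrative, not part of the database.

```python
from fractions import Fraction

# Sampling rates quoted in the ACCOR entry (Hz).
RATES_HZ = {
    "speech": 20_000,
    "laryngograph": 10_000,
    "oral_airflow": 500,
    "nasal_airflow": 500,
    "epg": 200,
}


def samples_per_epg_frame(rates):
    """Number of samples of each channel that span one EPG frame."""
    epg_rate = rates["epg"]
    return {name: Fraction(rate, epg_rate) for name, rate in rates.items()}


def sample_index(time_s, rate_hz):
    """Index of the last sample at or before time_s.

    Assumption: every channel starts at t = 0.
    """
    return int(time_s * rate_hz)


if __name__ == "__main__":
    for name, ratio in samples_per_epg_frame(RATES_HZ).items():
        print(f"{name}: {float(ratio)} samples per EPG frame")
    print("speech index at 0.25 s:", sample_index(0.25, RATES_HZ["speech"]))
```

Note that the airflow channels yield 2.5 samples per EPG frame, so any frame-by-frame alignment between airflow and EPG data would need interpolation or a rounding convention.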
C-000874: APASCI
Desktop/Microphone
APASCI is an Italian speech database recorded in an insulated room with a Sennheiser MKH 416 T microphone. It includes 5,290 phonetically rich sentences and 10,800 isolated digits, for a total of 58,924 word occurrences (2,191 different words) and 641 minutes of speech.
The speech material was read by 100 Italian speakers (50 male and 50 female). Each of them uttered 1 calibration sentence, 4 sentences with a wide phonetic coverage, 15 or 20 sentences with a wide diphonic coverage. Six of these speakers (3 male and 3 female) read 26 occurrences of the calibration sentence, 104 sentences with a wide phonetic coverage, 390 sentences with a wide diphonic coverage. 54 of the speakers (42 male and 12 female) pronounced 20 repetitions of the 10 isolated digits.
The documentation of the database includes the transcription of each sentence both at phonemic and at orthographic levels.
This database can be used to design, train and evaluate continuous speech recognition systems (speaker-independent, speaker-adaptive, speaker-dependent, multi-speaker). It was also designed for research on acoustic modelling, on acoustic parameters for speech recognition, and on speaker recognition.
Format: 16 bit linear
Standard: NIST SPHERE
Sampling rate: 16 kHz
Medium: CD-ROM
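
As a quick sanity check on the APASCI figures above, the sketch below estimates the uncompressed size of the audio from the stated format and duration, and recomputes the isolated-digit count from the speaker description. It assumes single-channel audio and ignores file headers; neither is stated in the entry, and the function names are illustrative.

```python
# Figures quoted in the APASCI entry.
SAMPLE_RATE_HZ = 16_000   # "Sampling rate: 16 kHz"
BYTES_PER_SAMPLE = 2      # "Format: 16 bit linear"
TOTAL_MINUTES = 641       # "641 minutes of speech"
CHANNELS = 1              # assumption: mono; not stated in the entry


def raw_audio_bytes(minutes, sample_rate_hz, bytes_per_sample, channels=1):
    """Uncompressed PCM payload size in bytes, excluding file headers."""
    return minutes * 60 * sample_rate_hz * bytes_per_sample * channels


def isolated_digit_count(speakers, repetitions, digits):
    """Total isolated-digit utterances for the given recording design."""
    return speakers * repetitions * digits


size_bytes = raw_audio_bytes(
    TOTAL_MINUTES, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS
)
digit_tokens = isolated_digit_count(speakers=54, repetitions=20, digits=10)

if __name__ == "__main__":
    print(f"Raw audio: {size_bytes:,} bytes ({size_bytes / 1e9:.2f} GB)")
    print(f"Isolated digits: {digit_tokens:,}")
```

Under these assumptions the payload comes to about 1.23 GB, and 54 speakers x 20 repetitions x 10 digits reproduces the 10,800 isolated digits stated in the description.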
C-000875: AURORA Project database - Subset of SpeechDat-Car - Danish database - Evaluation Package
The Aurora project was originally set up to establish a worldwide standard for the feature extraction software which forms the core of the front-end of a DSR (Distributed Speech Recognition) system. ETSI formally adopted this activity as work items 007 and 008.

The two work items within ETSI are:
- ETSI DES/STQ WI007: Distributed Speech Recognition - Front-End Feature Extraction Algorithm & Compression Algorithm
- ETSI DES/STQ WI008: Distributed Speech Recognition - Advanced Feature Extraction Algorithm.

This database is a subset of the SpeechDat-Car database in the Danish language, which was collected as part of the European Union-funded SpeechDat-Car project. It contains isolated and connected Danish digits spoken in the following noise and driving conditions inside a car:

1. High speed good road
2. Low speed rough road
3. Stopped with motor running
4. Town traffic
C-000876: AURORA Project database - Subset of SpeechDat-Car - Finnish database - Evaluation Package
The Aurora project was originally set up to establish a worldwide standard for the feature extraction software which forms the core of the front-end of a DSR (Distributed Speech Recognition) system. ETSI formally adopted this activity as work items 007 and 008.

The two work items within ETSI are:
- ETSI DES/STQ WI007: Distributed Speech Recognition - Front-End Feature Extraction Algorithm & Compression Algorithm
- ETSI DES/STQ WI008: Distributed Speech Recognition - Advanced Feature Extraction Algorithm.

This database is a subset of the SpeechDat-Car database in the Finnish language, which was collected as part of the European Union-funded SpeechDat-Car project. It contains isolated and connected Finnish digits spoken in the following driving conditions inside a car:

1. 0 km/hr with the car engine on
2. 40-60 km/hr with the car windows closed
3. 40-60 km/hr with the car windows open
4. 100-120km/hr with no music in the background
5. 100-120km/hr with music in the background

The database also contains the software needed to run simulations using Entropic's HTK, which has been adopted as the "standard" HMM recogniser for the Aurora standard evaluation.
C-000877: AURORA Project database - Subset of SpeechDat-Car - German database - Evaluation Package
The Aurora project was originally set up to establish a worldwide standard for the feature extraction software which forms the core of the front-end of a DSR (Distributed Speech Recognition) system. ETSI formally adopted this activity as work items 007 and 008.

The two work items within ETSI are:
- ETSI DES/STQ WI007: Distributed Speech Recognition - Front-End Feature Extraction Algorithm & Compression Algorithm
- ETSI DES/STQ WI008: Distributed Speech Recognition - Advanced Feature Extraction Algorithm.

This database is a subset of the SpeechDat-Car database in the German language, which was collected as part of the European Union-funded SpeechDat-Car project. It contains isolated and connected German digits spoken in the following noise and driving conditions inside a car:

1. High speed good road
2. Low speed rough road
3. Stopped with motor running
4. Town traffic
C-000878: AURORA Project database - Subset of SpeechDat-Car - Italian database - Evaluation Package
The Aurora project was originally set up to establish a worldwide standard for the feature extraction software which forms the core of the front-end of a DSR (Distributed Speech Recognition) system. ETSI formally adopted this activity as work items 007 and 008.

The two work items within ETSI are:
- ETSI DES/STQ WI007: Distributed Speech Recognition - Front-End Feature Extraction Algorithm & Compression Algorithm
- ETSI DES/STQ WI008: Distributed Speech Recognition - Advanced Feature Extraction Algorithm.

This database is a subset of the Italian SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains 2200 Italian connected digit utterances, divided into training and testing utterances, recorded in the following noise and driving conditions inside a car:

1. High speed good road
2. Low speed rough road
3. Stopped with motor running
4. Town traffic
