Language Resource Search - SHACHI: Language Resource Metadata Database

Registered language resources: 3330. Showing 1141 - 1150 of 2023.

C-003377: Danish Speecon Database
Desktop/Microphone
The Danish Speecon database comprises recordings of 550 adult speakers and 50 child speakers, who uttered over 290 items and 210 items respectively (read and spontaneous).

Prices available upon request. Please contact us.
- hasVersion: C-000095: Mandarin Chinese Speecon database
- hasVersion: C-000120: Portuguese Speecon database
- hasVersion: C-000136: Spanish Speecon database
- hasVersion: C-001554: US Spanish Speecon database
- hasVersion: C-000415: German Speecon database
- hasVersion: C-000936: Finnish Speecon database
- hasVersion: C-000941: French Speecon database
- hasVersion: C-000946: Hebrew Speecon database
- hasVersion: C-000952: Italian Speecon database
- hasVersion: C-000955: Korean Speecon database
- hasVersion: C-000974: Polish Speecon database
- hasVersion: C-000977: Russian Speecon database
- hasVersion: C-000995: Swedish Speecon database
- hasVersion: C-001000: Turkish Speecon database
- hasVersion: C-001002: UK English Speecon database
- hasVersion: C-001553: US English Speecon database
- hasVersion: C-001237: Taiwan Mandarin Speecon database
- hasVersion: C-001530: Swiss-German Speecon database
- hasVersion: C-003376: Japanese Speecon database
- hasVersion: C-003380: French-Canadian Speecon database
- hasVersion: C-003379: Dutch from Belgium Speecon Database
- hasVersion: C-003378: Dutch from the Netherlands Speecon Database
C-003378: Dutch from the Netherlands Speecon Database
Desktop/Microphone
The Dutch from the Netherlands Speecon database comprises recordings of 550 adult speakers and 50 child speakers, who uttered over 290 items and 210 items respectively (read and spontaneous).

Prices available upon request. Please contact us.
- hasVersion: C-000095: Mandarin Chinese Speecon database
- hasVersion: C-000120: Portuguese Speecon database
- hasVersion: C-000136: Spanish Speecon database
- hasVersion: C-001554: US Spanish Speecon database
- hasVersion: C-000415: German Speecon database
- hasVersion: C-000936: Finnish Speecon database
- hasVersion: C-000941: French Speecon database
- hasVersion: C-000946: Hebrew Speecon database
- hasVersion: C-000952: Italian Speecon database
- hasVersion: C-000955: Korean Speecon database
- hasVersion: C-000974: Polish Speecon database
- hasVersion: C-000977: Russian Speecon database
- hasVersion: C-000995: Swedish Speecon database
- hasVersion: C-001000: Turkish Speecon database
- hasVersion: C-001002: UK English Speecon database
- hasVersion: C-001553: US English Speecon database
- hasVersion: C-001237: Taiwan Mandarin Speecon database
- hasVersion: C-001530: Swiss-German Speecon database
- hasVersion: C-003376: Japanese Speecon database
- hasVersion: C-003377: Danish Speecon Database
- hasVersion: C-003380: French-Canadian Speecon database
- hasVersion: C-003379: Dutch from Belgium Speecon Database
C-003379: Dutch from Belgium Speecon Database
Desktop/Microphone
The Dutch from Belgium Speecon database comprises recordings of 550 adult speakers and 50 child speakers, who uttered over 290 items and 210 items respectively (read and spontaneous).

Prices available upon request. Please contact us.
- hasVersion: C-000095: Mandarin Chinese Speecon database
- hasVersion: C-000120: Portuguese Speecon database
- hasVersion: C-000136: Spanish Speecon database
- hasVersion: C-001554: US Spanish Speecon database
- hasVersion: C-000415: German Speecon database
- hasVersion: C-000936: Finnish Speecon database
- hasVersion: C-000941: French Speecon database
- hasVersion: C-000946: Hebrew Speecon database
- hasVersion: C-000952: Italian Speecon database
- hasVersion: C-000955: Korean Speecon database
- hasVersion: C-000974: Polish Speecon database
- hasVersion: C-000977: Russian Speecon database
- hasVersion: C-000995: Swedish Speecon database
- hasVersion: C-001000: Turkish Speecon database
- hasVersion: C-001002: UK English Speecon database
- hasVersion: C-001553: US English Speecon database
- hasVersion: C-001237: Taiwan Mandarin Speecon database
- hasVersion: C-001530: Swiss-German Speecon database
- hasVersion: C-003376: Japanese Speecon database
- hasVersion: C-003377: Danish Speecon Database
- hasVersion: C-003380: French-Canadian Speecon database
- hasVersion: C-003378: Dutch from the Netherlands Speecon Database
C-003380: French-Canadian Speecon database
Desktop/Microphone
The French-Canadian Speecon database is divided into 2 sets:
1) The first set comprises the recordings of 550 adult French-Canadian speakers (276 males, 274 females), recorded over 4 microphone channels in 4 recording environments (office, entertainment, car, public place).
2) The second set comprises the recordings of 50 child French-Canadian speakers (20 boys, 30 girls), recorded over 4 microphone channels in 1 recording environment (children's room).

This database is partitioned into 29 DVDs (first set) and 4 DVDs (second set).
The speech databases made within the Speecon project were validated by SPEX, the Netherlands, to assess their compliance with the Speecon format and content specifications. Each of the four speech channels is recorded at 16 kHz, 16 bit, as uncompressed unsigned integers in Intel format (lo-hi byte order). Each signal file has a corresponding ASCII SAM label file containing the relevant descriptive information.
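
The sample format described above (16 kHz, 16 bit, unsigned, lo-hi byte order) could be decoded roughly as follows. This is a minimal sketch, not the distribution's own tooling: it assumes each signal file is headerless raw sample data, which the record does not state, and the function names decode_samples and duration_seconds are hypothetical.

```python
import struct

SAMPLE_RATE_HZ = 16_000   # stated in the record
BYTES_PER_SAMPLE = 2      # 16 bit


def decode_samples(raw: bytes) -> list[int]:
    """Decode 16-bit unsigned little-endian (lo-hi) samples.

    Assumption: `raw` holds only sample data, with no file header.
    """
    if len(raw) % BYTES_PER_SAMPLE:
        raise ValueError("expected an even number of bytes")
    count = len(raw) // BYTES_PER_SAMPLE
    # "<" selects little-endian byte order, "H" an unsigned 16-bit integer.
    return list(struct.unpack(f"<{count}H", raw))


def duration_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Length of a single-channel recording in seconds."""
    return num_samples / sample_rate
```

For example, the two bytes 0x34 0x12 decode to the single sample value 0x1234, because the low byte is stored first.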

Each speaker uttered the following items (over 290 items for adults and over 210 items for children):
Calibration data:
- 6 noise recordings
- The silence word recording
Free spontaneous items (adults only):
- 5 minutes (session time) of free spontaneous, rich context items (story telling) (an open number of spontaneous topics out of a set of 30 topics)
Elicited spontaneous items (adults only):
- 17 items: 3 dates, 2 times, 3 proper names, 2 city names, 1 letter sequence, 2 answers to questions, 3 telephone numbers, 1 language
Read speech:
- 30 phonetically rich sentences uttered by adults and 60 uttered by children
- 5 phonetically rich words (adults only)
- 4 isolated digits
- 1 isolated digit sequence
- 4 connected digit sequences
- 1 telephone number
- 3 natural numbers
- 1 money amount
- 2 time phrases (T1 : analogue, T2 : digital)
- 3 dates (D1 : analogue, D2 : relative and general date, D3 : digital)
- 3 letter sequences
- 1 proper name
- 2 city or street names
- 2 questions
- 2 special keyboard characters
- 1 Web address
- 1 email address
- 213 application specific words and phrases per session (adults)
- 74 toy commands, 14 phone commands and 34 general commands (children)

The following age distribution has been obtained:
- Adults: 220 speakers are between 15 and 30, 226 speakers are between 31 and 45, 104 speakers are over 46.
- Children: 24 speakers are between 8 and 10, and 26 speakers are between 11 and 15.
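
As a quick consistency check, the age bands above can be summed and compared with the speaker counts given for the two sets (550 adults, 50 children). The sketch below only restates figures from this record; the dictionary and function names are illustrative.

```python
# Figures copied from the record; the band labels are shorthand for its wording.
ADULT_AGE_BANDS = {"15 to 30": 220, "31 to 45": 226, "over 46": 104}
CHILD_AGE_BANDS = {"8 to 10": 24, "11 to 15": 26}
ADULT_GENDER = {"male": 276, "female": 274}
CHILD_GENDER = {"boys": 20, "girls": 30}


def total(groups: dict[str, int]) -> int:
    """Sum the speaker counts of all groups."""
    return sum(groups.values())


if __name__ == "__main__":
    print("adults by age:", total(ADULT_AGE_BANDS))
    print("adults by gender:", total(ADULT_GENDER))
    print("children by age:", total(CHILD_AGE_BANDS))
    print("children by gender:", total(CHILD_GENDER))
```

Both breakdowns add up to the stated totals for each set.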

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
- hasVersion: C-000095: Mandarin Chinese Speecon database
- hasVersion: C-000120: Portuguese Speecon database
- hasVersion: C-000136: Spanish Speecon database
- hasVersion: C-001554: US Spanish Speecon database
- hasVersion: C-000415: German Speecon database
- hasVersion: C-000936: Finnish Speecon database
- hasVersion: C-000941: French Speecon database
- hasVersion: C-000946: Hebrew Speecon database
- hasVersion: C-000952: Italian Speecon database
- hasVersion: C-000955: Korean Speecon database
- hasVersion: C-000974: Polish Speecon database
- hasVersion: C-000977: Russian Speecon database
- hasVersion: C-000995: Swedish Speecon database
- hasVersion: C-001000: Turkish Speecon database
- hasVersion: C-001002: UK English Speecon database
- hasVersion: C-001553: US English Speecon database
- hasVersion: C-001237: Taiwan Mandarin Speecon database
- hasVersion: C-001530: Swiss-German Speecon database
- hasVersion: C-003376: Japanese Speecon database
- hasVersion: C-003377: Danish Speecon Database
- hasVersion: C-003378: Dutch from the Netherlands Speecon Database
- hasVersion: C-003379: Dutch from Belgium Speecon Database
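C-003382: Parallel Database of Japanese Compositions by Learners of Japanese and Their Native-Language Translations (作文対訳DB), Online Edition
This database brings together large quantities of four kinds of data, digitized so that they can be cross-referenced:
1. Japanese compositions written by learners of Japanese (compositions by native speakers of Japanese are included for reference)
2. Translations of 1. into the writer's native language (or the language the writer finds easiest to write in), made by the writer
3. Corrections of 1. by Japanese language teachers and others (for part of the data only)
4. Information on the linguistic background of the writers and correctors
Its abbreviated name is 対訳作文DB.
Since 1999, the National Institute for Japanese Language and Linguistics (国立国語研究所) has collected these data in Japan and abroad and made them available to Japanese language educators and to researchers in Japanese linguistics, contrastive linguistics, and related fields.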
- replaces: Composition Translation Database (作文対訳データベース), CD-ROM edition (March 2001)
- replaces: Composition Translation Database (作文対訳データベース), online edition (2004)
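C-003384: Contrastive Linguistic Database of Japanese and Native-Language Utterances by Learners of Japanese (発話対照DB)
The primary purpose of the 発話対照DB, which is being built at the National Institute for Japanese Language and Linguistics (国立国語研究所), is application in educational settings. These settings include not only the teaching of learners but also the training of Japanese language teachers.
To this end, the database was designed to allow two kinds of comparison:
1) Between a learner's Japanese utterances and native-language utterances of roughly the same content
2) Between a learner's spoken language and written language
- references: C-003382: Parallel Database of Japanese Compositions by Learners of Japanese and Their Native-Language Translations (作文対訳DB), Online Edition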
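C-003386: Database of Newspaper Article Headlines on Language (ことばに関する新聞記事見出しデータベース)
Since 1949, immediately after its founding, the National Institute for Japanese Language and Linguistics (国立国語研究所) has collected newspaper articles about language and preserved them as 「新聞所載国語関係記事切抜集」 and 「新聞所載国語関係記事切抜特集」 (collections of clippings of newspaper articles on the Japanese language; hereafter collectively the "clipping collection"). The clipping collection consists of newspaper articles gathered over more than 50 years from the specific viewpoints of "language" and "language life", and it is a very valuable resource for understanding changes in the Japanese language and in the language life and language awareness of Japanese people. Articles "about language" are not collected by looking for particular words or expressions in the text; the collection targets articles that convey awareness of, opinions on, or explanations of language, or the circumstances surrounding language.
- hasPart: Headline database (見出しデータベース)
- hasFormat: C-004263: Database of Newspaper Article Images on Language (ことばに関する新聞記事画像データベース)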
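C-003388: Modern Women's Magazine Corpus (近代女性雑誌コーパス)
Written modern Japanese can be regarded as having been largely established between the end of the 19th century and the beginning of the 20th century, with the shift from literary-style (bungo) to colloquial-style (kōgo) writing. Two corpora based on magazines were created to support research from various perspectives on modern Japanese in this formative period: the Taiyō Corpus (『太陽コーパス』) and the Modern Women's Magazine Corpus (『近代女性雑誌コーパス』).
The Modern Women's Magazine Corpus was created as comparative material for the Taiyō Corpus, covering magazines of the same period with a female readership.
The general-interest magazine Taiyō (『太陽』) is foremost among the magazines that can represent the Japanese of this period, but women and children fall outside its readership. Corpora covering magazines for women, for children, and for other readers are desirable; as a first step, the Modern Women's Magazine Corpus was created for magazines with a female readership.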
- conformsTo: C-003365: Taiyō Corpus
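C-003390: Collection of Japan's Hometown Speech (日本のふるさとことば集成), Vol. 1: Hokkaido, Aomori
Book: Dialect conversations are transcribed in katakana with a standard Japanese translation. The transcription and the translation are set side by side in two columns to make the meaning easy to follow. The book can also be read while listening to the accompanying CD.
CD: Contains the dialect audio of the complete discourse. Even the speech of an unfamiliar region can be followed with the help of the book.
CD-ROM: The text and audio data of the dialect discourse can be browsed and played. The pages of the book are provided as images, so they can be read on a computer much like the printed book, and the dialect audio is linked so that the conversations can be played page by page. The discourse data can also be searched with the bundled search software. In addition, the transcriptions and standard Japanese translations are included as text files. These digital data may be adapted and used freely for research and education.
- hasVersion: C-003404: Collection of Japan's Hometown Speech, Vol. 2: Iwate, Akita
- hasVersion: C-003478: Collection of Japan's Hometown Speech, Vol. 3: Miyagi, Yamagata, Fukushima
- hasVersion: C-003479: Collection of Japan's Hometown Speech, Vol. 4: Ibaraki, Tochigi
- hasVersion: C-003480: Collection of Japan's Hometown Speech, Vol. 5: Saitama, Chiba
- hasVersion: C-003481: Collection of Japan's Hometown Speech, Vol. 6: Tokyo, Kanagawa
- hasVersion: C-003482: Collection of Japan's Hometown Speech, Vol. 7: Gunma, Niigata
- hasVersion: C-003483: Collection of Japan's Hometown Speech, Vol. 8: Nagano, Yamanashi, Shizuoka
- hasVersion: C-003484: Collection of Japan's Hometown Speech, Vol. 9: Gifu, Aichi, Mie
- hasVersion: C-003485: Collection of Japan's Hometown Speech, Vol. 10: Toyama, Ishikawa, Fukui
- hasVersion: C-003486: Collection of Japan's Hometown Speech, Vol. 11: Kyoto, Shiga
- hasVersion: C-003487: Collection of Japan's Hometown Speech, Vol. 12: Nara, Wakayama
- hasVersion: C-003488: Collection of Japan's Hometown Speech, Vol. 13: Osaka, Hyogo
- hasVersion: C-003489: Collection of Japan's Hometown Speech, Vol. 14: Tottori, Shimane, Okayama
- hasVersion: C-003490: Collection of Japan's Hometown Speech, Vol. 15: Hiroshima, Yamaguchi
- hasVersion: C-003491: Collection of Japan's Hometown Speech, Vol. 16: Kagawa, Tokushima
- hasVersion: C-003492: Collection of Japan's Hometown Speech, Vol. 17: Ehime, Kochi
- hasVersion: C-003493: Collection of Japan's Hometown Speech, Vol. 18: Fukuoka, Saga, Oita
- hasVersion: C-003494: Collection of Japan's Hometown Speech, Vol. 19: Nagasaki, Kumamoto, Miyazaki
- hasVersion: C-003495: Collection of Japan's Hometown Speech, Vol. 20: Kagoshima, Okinawa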
C-003399: PennBioIE Release 0.9
PennBioIE is a biomedical information extraction project at the University of Pennsylvania. The PennBioIE corpus consists of 2258 Medline abstracts which have been manually annotated for paragraphs, sentences, part of speech, and a set of biomedical entity types defined for this project and specific to each domain. In addition, 642 of the abstracts have been syntactically annotated. The entity and POS annotated data are available in three formats: HTML view, WordFreak files, and XML. The syntactically annotated files are stored in the Penn Treebank format.
- references: MEDLINE (http://www.nlm.nih.gov/pubs/factsheets/medline.html)
- conformsTo: C-001546: Treebank-2
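
The Penn Treebank format mentioned in the PennBioIE record stores each parse as a bracketed tree. The sketch below shows one way such a tree might be read into nested tuples. It is an illustration only: the function names are hypothetical, the sample sentence in the usage note is made up rather than taken from the corpus, and the parser assumes every bracket carries a label, so the unlabeled outer bracket that some treebank files wrap around each tree would need to be stripped first.

```python
def parse_bracketed(text: str):
    """Parse one labeled bracketed tree into (label, children) tuples.

    Children are either nested (label, children) tuples or plain word strings.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        if pos >= len(tokens) or tokens[pos] in ("(", ")"):
            raise ValueError("expected a node label")
        label = tokens[pos]
        pos += 1
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])  # a terminal word
                pos += 1
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1  # consume ")"
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("unexpected text after the tree")
    return tree


def leaves(tree) -> list[str]:
    """Return the words of a parsed tree in left-to-right order."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words
```

With the made-up input "(S (NP (NN protein)) (VP (VBZ binds)))", leaves(parse_bracketed(...)) yields the words "protein" and "binds" in order.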
