Language Resource Search - SHACHI: Language Resource Metadata Database

C-004224: Face Emotional Expression
This corpus was designed as a standard database for face recognition, emotional face recognition, and synthesis. It contains 1,000 facial-expression video recordings (500 of emotional facial expressions and 500 of emotional speech expressions) from 70 people. The database covers eight emotional facial expressions and 23 emotional speech expressions.
C-004225: Chinese Event Bank - Part 1
The corpus was designed for Chinese event detection tasks. Its sentences were selected from real-text sentences describing basic event situations, including possession transfer, existence, and spatial and temporal transfer, and each sentence is annotated with detailed event description information. All sentences are extracted from the Tsinghua Chinese Treebank (TCT).
- references: G-001136: Tsinghua Chinese Treebank
- conformsTo: HowNet
- conformsTo: CiLin
- conformsTo: Xianhan
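C-004226: Kyoto University Text Corpus (京都大学テキストコーパス) Version 3.0
A text corpus built from all Mainichi Shimbun articles published from January 1 to 17, 1995 (about 20,000 sentences) plus the editorials from January to December 1995 (about 20,000 sentences), about 40,000 sentences in total. The sentences were automatically analyzed with Kyoto University's morphological analyzer (JUMAN) and syntactic parser (KNP), and the results were then corrected by hand. Note that this package distributes only the morphological and syntactic annotations; the original Mainichi Shimbun text is not included.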
- requires: C-001600: CD-Mainichi Shimbun '95 Data Collection
- isReplacedBy: C-004227: Kyoto University Text Corpus (京都大学テキストコーパス) Version 4.0
- isRequiredBy: G-004228: NICT Case Particle Conversion Data (NICT 格助詞変換データ) Version 1.1
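C-004227: Kyoto University Text Corpus (京都大学テキストコーパス) Version 4.0
A text corpus built from all Mainichi Shimbun articles published from January 1 to 17, 1995 (about 20,000 sentences) plus the editorials from January to December 1995 (about 20,000 sentences), about 40,000 sentences in total. The sentences were automatically analyzed with Kyoto University's morphological analyzer (JUMAN) and syntactic parser (KNP), and the results were then corrected by hand. In this version, 5,000 of these sentences are additionally annotated with case relations, anaphora and ellipsis relations, and coreference. Note that this package distributes only the morphological, syntactic, and other annotations; the original Mainichi Shimbun text is not included.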
- requires: C-001600: CD-Mainichi Shimbun '95 Data Collection
- replaces: C-004226: Kyoto University Text Corpus (京都大学テキストコーパス) Version 3.0
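The relation lines on the two Kyoto University Text Corpus entries come in inverse pairs: Version 3.0 lists isReplacedBy Version 4.0, and Version 4.0 lists replaces Version 3.0. As a minimal sketch, assuming the relations are stored as hypothetical (subject, relation, object) triples, such pairs can be checked for consistency. The `INVERSE` table and the function name `missing_inverses` are illustrative assumptions, not part of SHACHI.

```python
# Hypothetical in-memory form of the relation lines in this listing.
# The INVERSE table is an assumption modeled on paired metadata relations.
INVERSE = {
    "requires": "isRequiredBy",
    "isRequiredBy": "requires",
    "replaces": "isReplacedBy",
    "isReplacedBy": "replaces",
    "hasVersion": "hasVersion",  # treated as symmetric here (an assumption)
}


def missing_inverses(relations):
    """Return the inverse triples that are expected but not recorded.

    Only objects that also appear as subjects are checked, because resources
    without their own relation lines (e.g. G-004228 here) cannot be verified.
    """
    triples = set(relations)
    subjects = {s for s, _, _ in triples}
    missing = []
    for s, rel, o in sorted(triples):
        if o not in subjects:
            continue
        inverse = (o, INVERSE[rel], s)
        if inverse not in triples:
            missing.append(inverse)
    return missing


relations = [
    ("C-004226", "requires", "C-001600"),
    ("C-004226", "isReplacedBy", "C-004227"),
    ("C-004226", "isRequiredBy", "G-004228"),
    ("C-004227", "requires", "C-001600"),
    ("C-004227", "replaces", "C-004226"),
]
print(missing_inverses(relations))  # → []
```

An empty list means every checkable relation between entries on the page has its inverse recorded.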
C-004230: Japanese-English Bilingual Corpus of Wikipedia's Kyoto Articles Version 2.01
The corpus mainly aims to support research and development of high-performance multilingual machine translation, information extraction, and other language processing technologies. It contains Kyoto-related Japanese Wikipedia articles and their manual English translations. The corpus set also includes the Japanese-English Bilingual Kyoto Lexicon, created by extracting Japanese-English word pairs from the corpus.
- replaces: Japanese-English Bilingual Corpus of Wikipedia's Kyoto Articles Version 2.0
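C-004235: Dependency Database of Japanese Wikipedia Entries (日本語Wikipediaエントリの係り受けデータベース)
Dependency relations, with their frequencies, involving Japanese Wikipedia article titles (entries) that span two or more bunsetsu (phrasal units), such as 「風と共に去りぬ」 ("Gone with the Wind"), extracted from a large collection of Japanese Web documents (about 600 million pages, about 43 billion sentences). Ordinary morphological and dependency analysis splits such entries into several bunsetsu, so the Japanese Dependency Database (Version 1.0) previously released through the Advanced Language Information Forum (ALAGIN) contained no dependency information for them. Adding these entries to the morphological analyzer's dictionary as proper nouns makes it possible to extract their dependencies, and the resulting dependency data forms this database.
- hasVersion: C-004236: Japanese Dependency Database (日本語係り受けデータベース) (Version 1.1)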
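The dictionary step described for this database can be illustrated with a small sketch: before dependency parsing, any run of tokens whose concatenation matches a known multi-bunsetsu entry is collapsed into a single unit, so a downstream parser would treat the title as one proper noun. This is a greedy longest-match illustration under stated assumptions, not how JUMAN/KNP actually perform dictionary lookup; the token list below is hand-made, not real analyzer output, and `merge_entries` is a hypothetical helper.

```python
def merge_entries(tokens, entries):
    """Greedily merge runs of two or more tokens that spell a known entry.

    tokens:  surface strings from a (hypothetical) tokenizer, in order
    entries: set of entry titles that should stay single units
    Japanese has no spaces, so tokens are joined with the empty string.
    """
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest span starting at i first, down to length 2.
        for j in range(len(tokens), i + 1, -1):
            candidate = "".join(tokens[i:j])
            if candidate in entries:
                merged.append(candidate)
                i = j
                break
        else:
            # No multi-token entry starts here; keep the token as-is.
            merged.append(tokens[i])
            i += 1
    return merged


tokens = ["風", "と", "共に", "去り", "ぬ", "を", "読む"]  # illustrative split
entries = {"風と共に去りぬ"}
print(merge_entries(tokens, entries))  # → ['風と共に去りぬ', 'を', '読む']
```

Trying the longest span first prevents a shorter entry from swallowing the beginning of a longer title.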
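C-004236: Japanese Dependency Database (日本語係り受けデータベース) (Version 1.1)
This database records dependency relations between words and phrases, together with their frequencies. They were extracted from the results of parsing a large volume of Japanese Web documents with Juman/KNP, and some noisy data was removed. It contains about 4.6 billion distinct dependency relations.
- hasVersion: C-004235: Dependency Database of Japanese Wikipedia Entries (日本語Wikipediaエントリの係り受けデータベース)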
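Collecting dependency relations with frequencies amounts to counting (dependent, head) pairs over parsed sentences. The sketch below assumes parser output is already available as lists of phrase pairs, and uses a minimum-frequency cutoff (`min_freq`, a hypothetical parameter) as a stand-in for noise removal; the listing does not say how the database actually filters noise.

```python
from collections import Counter


def count_dependencies(parsed_sentences, min_freq=2):
    """Count (dependent, head) pairs and drop rare pairs as likely noise.

    parsed_sentences: iterable of lists of (dependent, head) phrase pairs,
    e.g. as produced by a dependency parser. The min_freq cutoff is a
    hypothetical stand-in for the database's unspecified noise filtering.
    """
    counts = Counter(pair for sentence in parsed_sentences for pair in sentence)
    return {pair: n for pair, n in counts.items() if n >= min_freq}


parsed = [
    [("本を", "読む"), ("図書館で", "読む")],
    [("本を", "読む")],
    [("本を", "買う")],
]
print(count_dependencies(parsed))  # → {('本を', '読む'): 2}
```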
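C-004237: Japanese-English-Chinese Basic Sentence Data (日英中基本文データ)
English and Chinese translations of 5,304 basic Japanese sentences, which were automatically extracted based on the Kyoto University Case Frames and then manually corrected.
- references: C-004170: Kyoto University Case Frames (京都大学格フレーム) (Ver 1.0)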
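C-004241: Kyoto Sightseeing Blog Data with Evaluation Annotations (京都観光ブログの評価情報付与データ) (Version 1.0)
The "Kyoto Sightseeing Blog" is a database of Japanese blog articles centered on sightseeing in Kyoto, consisting of 1,041 articles (about 480 characters each on average) by 47 authors. The "evaluation annotation data" is the Kyoto Sightseeing Blog with evaluative information (reputation and opinions) manually extracted and annotated with the evaluation holder, evaluation expression, evaluation target, and so on. Because the data covers a wide range of opinions about sightseeing, it can be used as a training corpus for opinion analysis engines and similar systems.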
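One way to hold the annotation fields named in this entry (holder, expression, target) in code is a small record type. The field layout, the `article_id` field, and the `expressions_by_target` helper are hypothetical; the actual distribution format is not described here, and the sample records are invented for illustration rather than taken from the corpus.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evaluation:
    """One evaluative-information record (hypothetical field layout)."""
    article_id: str         # hypothetical ID of the blog article
    holder: Optional[str]   # evaluation holder; None if not stated
    expression: str         # evaluation expression
    target: Optional[str]   # evaluation target; None if not stated


def expressions_by_target(evaluations):
    """Group evaluation expressions by their target, skipping untargeted ones."""
    grouped = defaultdict(list)
    for ev in evaluations:
        if ev.target is not None:
            grouped[ev.target].append(ev.expression)
    return dict(grouped)


# Invented sample records, for illustration only.
sample = [
    Evaluation("a1", "author", "beautiful", "Kinkaku-ji"),
    Evaluation("a2", None, "crowded", "Kinkaku-ji"),
    Evaluation("a3", "author", "tiring", None),
]
print(expressions_by_target(sample))  # → {'Kinkaku-ji': ['beautiful', 'crowded']}
```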
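C-004245: Japanese Pattern Paraphrase Database (日本語パターン言い換えデータベース) (Version 1)
Built from dependency parsing results, this database lists, for each expression pattern that links two arbitrary nouns A and B within a sentence (such as 「AがBの原因となる」, "A causes B"), similar patterns together with their similarity scores. Search and similarity-calculation scripts are provided along with the pattern data.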
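The listing does not say how pattern similarity is computed. A common distributional approach, assumed here purely for illustration, represents each pattern by counts of the (A, B) noun pairs it occurs with and ranks other patterns by cosine similarity. The pattern strings, counts, and the helpers `cosine` and `similar_patterns` are hypothetical and do not reflect the bundled scripts.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def similar_patterns(query, pattern_vectors, top_k=3):
    """Rank other patterns by cosine similarity of their (A, B) noun-pair counts."""
    q = pattern_vectors[query]
    scored = [(p, cosine(q, vec))
              for p, vec in pattern_vectors.items() if p != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Invented noun-pair counts, for illustration only.
vectors = {
    "A causes B": Counter({("smoking", "cancer"): 3, ("stress", "insomnia"): 1}),
    "A leads to B": Counter({("smoking", "cancer"): 2, ("stress", "insomnia"): 2}),
    "A is made from B": Counter({("bread", "wheat"): 4}),
}
result = similar_patterns("A causes B", vectors)
print(result[0][0])  # → A leads to B
```

Patterns that share no noun pairs with the query score 0.0, so they sink to the bottom of the ranking.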