Language Resource Search - SHACHI: Language Resource Metadata Database

Registered language resources: 3,330. Showing results 1641-1650 of 2,023.

C-004302: Yahoo! Answers Comprehensive Questions and Answers version 1.0
This is the Yahoo! Answers corpus as of 10/25/2007, including all the questions and their corresponding answers. The corpus also contains a small amount of metadata, i.e., which answer was selected as the best answer, and the category and sub-category assigned to each question.
- hasPart: C-004301: Yahoo! Answers Manner Questions, version 2.0
C-004306: Yahoo! News extracted metadata: noun phrases and their context, version 1.0
The dataset contains a large sample of noun phrases and their context, extracted from Yahoo! News data, and can be used for AI and NLP studies.
C-004307: Yahoo! Answers browsing behavior, version 1.0
The dataset contains browsing behavior data for a collection of users on Yahoo! Answers, where users interact socially and are rewarded by a point system built on the Q&A system. The data includes questions, answers, and browsing behavior for users on the site. There is no textual or NLP information.
C-004308: The ClueWeb09 Dataset
The ClueWeb09 dataset was created to support research on information retrieval and related human language technologies, containing about 1 billion web pages in ten languages (English, Chinese, Spanish, Japanese, French, German, Italian, Korean, Portuguese and Arabic).
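C-004311: NTT・東北大親密度別単語了解度試験用音声データセット (NTT / Tohoku University speech dataset for familiarity-graded word intelligibility tests)
Contains speech recordings of every word in the "Word List for Word Intelligibility Tests for the Hearing Impaired" (4,000 four-mora words), as well as monosyllable recordings.
- references: G-004312: 難聴者のための単語了解度試験用単語リスト (Word List for Word Intelligibility Tests for the Hearing Impaired)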
C-004313: 電子協騒音データベース (JEIDA Noise Database)
Contains 17 types of environmental noise.
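C-004315: OpenMWE Corpus v0.02
A collection of usage examples of Japanese idioms built by OpenMWE, a project whose main aim is to develop natural language processing technology for idioms and compound words (MWE: multiword expressions) and which builds MWE-related language resources and distributes them as open-source software. This release targets ambiguous idioms and provides about 1,000 usage examples per headword (idiom). Each example carries a label indicating whether the candidate idiom phrase in it is actually used as an idiom (positive example) or in its literal sense (negative example).
- replaces: C-003617: OpenMWE Corpus v0.01
- references: C-003619: 基本慣用句五種対照表 (Five-Way Comparison Table of Basic Idioms)
- references: Japanese Web Corpus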
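C-004316: Textual Entailment Evaluation Data
Evaluation data for Japanese RTE (Recognizing Textual Entailment). The evaluation set was created by hand; in most problems the expressions differ at only one point, so the problems are easier than those in the Japanese RTE evaluation sets released for RITE and RITE2. The data consists of about 2,700 sets, each annotated with a four-valued inference judgment. Each set is also classified into one of five categories: inclusion, lexical (nominal), lexical (predicate), syntactic, and inference.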
C-004317: Wenzhou Spoken Corpus Version 1.0
Wenzhou Spoken Corpus is an online, searchable corpus of transcribed spoken Wenzhou data, consisting of six sub-corpora: Face to Face Conversation, Phone Call, Wenzhou News Commentary, Internet Chat, Story and Wenzhou Song. The current population of Wenzhou speakers is about 7.5 million. Wenzhou is regarded as a branch of the Southern Wu dialect.
C-004318: The KPG English Corpus
The corpus comprises collections of written English texts (scripts) produced by EFL speakers/learners in Greece. The scripts in the corpus database have been graded by human raters following a 15-point scale corresponding to three broad rating bands
