Language resource #: 3330
-
C-004302: Yahoo! Answers Comprehensive Questions and Answers version 1.0
This is the Yahoo! Answers corpus as of 10/25/2007, including all the questions and their corresponding answers. The corpus also contains a small amount of metadata: which answer was selected as the best answer, and the category and sub-category assigned to each question.
-
C-004306: Yahoo! News extracted metadata: noun phrases and their context, version 1.0
The dataset contains a large sample of noun phrases and their context, extracted from Yahoo! News data, and can be used for AI and NLP studies.
-
C-004307: Yahoo! Answers browsing behavior, version 1.0
The dataset contains browsing behavior data for a collection of users on Yahoo! Answers, where users interact socially and are rewarded through a point system built on the Q&A service. The data includes questions, answers, and browsing behavior for users on the site. There is no textual or NLP information.
-
C-004308: The ClueWeb09 Dataset
The ClueWeb09 dataset was created to support research on information retrieval and related human language technologies. It contains about 1 billion web pages in ten languages (English, Chinese, Spanish, Japanese, French, German, Italian, Korean, Portuguese and Arabic).
-
C-004311: NTT-Tohoku University Speech Data Set for Word Intelligibility Test based on Word Familiarity
The dataset contains recordings of the words (4,000 four-mora words) contained in the "Word list for word intelligibility test for the hard-of-hearing", as well as recordings of monosyllables.
- references: G-004312: Familiarity-controlled word lists 2003
-
C-004313: JEIDA Noise Database
The database contains 17 different environmental noise recordings.
-
C-004315: OpenMWE Corpus v0.02
-
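C-004316: Textual Entailment Evaluation Data
Evaluation data for Japanese RTE (Recognizing Textual Entailment). The evaluation set was created by hand, and in most problems the two expressions differ in only one place, so the problems are easier than those in the Japanese RTE evaluation sets released for RITE and RITE2. The data consists of about 2,700 sets, each annotated with a four-valued inference judgment and classified into one of five categories: inclusion, lexical (nominal), lexical (predicate), syntax, and inference.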
-
C-004317: Wenzhou Spoken Corpus Version 1.0
Wenzhou Spoken Corpus is an online, searchable corpus of transcribed spoken Wenzhou data, consisting of six sub-corpora: Face to Face Conversation, Phone Call, Wenzhou News Commentary, Internet Chat, Story and Wenzhou Song. The current population of Wenzhou speakers is about 7.5 million. Wenzhou is regarded as a branch of the Southern Wu dialect.
-
C-004318: The KPG English Corpus
The corpus comprises collections of written English texts (scripts) produced by EFL speakers/learners in Greece. The scripts in the corpus database have been graded by human raters following a 15-point scale corresponding to three broad rating bands