Language Resource Search - SHACHI: Language Resource Metadata Database

Registered language resources: 3,330. Showing items 1531-1540 of 2,023.

C-004155: Web Corpus
English-language corpora compiled from the Web in 2006 and 2007.
2007: still under development; currently 3,123,996 types and 518,129,710 tokens; target size is at least 1,000,000,000 tokens; will be part-of-speech tagged.
2006: 97,198,272 tokens and 950,087 types; 1- to 6-grams; wildcard searchable; the original texts and URLs are no longer available due to a hard drive failure.
- hasPart: C-004156: Web Corpus 2007
- hasPart: C-004157: Web Corpus 2006
C-004156: Web Corpus 2007
Web Corpus 2007, compiled in July 2007. The goal is to produce a corpus of at least a billion words annotated with the same Part of Speech tagset as the British National Corpus.
- isPartOf: C-004155: Web Corpus
C-004157: Web Corpus 2006
Web Corpus 2006, based on a corpus compiled from the Web in Feb-Mar 2006, with about 104 M tokens in its 'clean' version (140 M tokens in its 'dirty' version). (The original webpages on which these datasets are based were lost in a hard disk crash, so there are some gaps in the data.)
- hasVersion: C-004156: Web Corpus 2007
- isPartOf: C-004155: Web Corpus
C-004158: PERC Corpus
The PERC Corpus (formerly called the "Corpus of Professional English (CPE)") is a 17-million-word corpus of copyright-cleared English academic journal texts in science, engineering, technology and other fields. It was compiled as a part of the project of the Professional English Research Consortium (PERC) and is intended to be used for research in the field of Professional English. Until the end of June, 2010, the PERC Corpus will be available for access free of charge on the web concordancer provided by the "Shogakukan Corpus Network" administered by NetAdvance Inc., based on authorization from PERC.
- isPartOf: Shogakukan Corpus Network
C-004159: WaC Users
WaC Users, derived from the search results of users of this site's Web Concordancer. Biased towards users' interests, with no claim to breadth.
- isVersionOf: C-004155: Web Corpus
C-004160: WaC Users Marginal
WaC Users Marginal, based on text chunks in users' search results not unambiguously identified as English text.
- isPartOf: C-004155: Web Corpus
C-004161: WaC Users Junk
WaC Users Junk, based on text chunks in users' search results rejected as English text. Contains fragments, lists, search-engine spam, non-English content, and occasional gems.
- hasVersion: C-004155: Web Corpus
C-004162: DeWaC German Web Corpus
The corpus was prepared by Marco Baroni in a web crawl as described at EACL 2006. It was part-of-speech tagged and lemmatised using TreeTagger, a leading part-of-speech tagger which has been trained for a number of languages.
C-004164: 日本語アプレイザル評価表現辞書（JAppraisal 辞書）～態度評価編～
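This dictionary classifies and aggregates Japanese evaluative expressions not only by evaluation polarity (positive or negative) but also by the type of evaluation criterion (for example, criteria concerning affection or criteria concerning ethics). It can be used as a resource for classifying and subdividing evaluative information, aggregating similar evaluative information, and filtering and extracting the evaluative information that is needed. This applies, for example, when analyzing evaluations of topics on which opinions are expected to polarize into positive and negative, or of events whose evaluation polarity is skewed, such as earthquake disasters and emergencies. It can also serve as a resource for capturing diachronic changes in evaluative viewpoints, and as a reference book for discourse analysis such as critical discourse analysis.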
- replaces: 日本語アプレイザル評価表現辞書（JAppraisal 辞書） version1.1
- references: C-004324: 現代日本語書き言葉均衡コーパス
C-004165: 岩波国語辞典第五版タグ付きコーパス2004
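This corpus annotates roughly 56,000 entries of the Iwanami Kokugo Jiten (Iwanami Japanese Dictionary), Fifth Edition, with morphological information, syntactic structure, anaphora and coreference, and word-sense information based on the Iwanami Kokugo Jiten itself. All of these annotations have been manually corrected.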
- isReferencedBy: C-004164: 日本語アプレイザル評価表現辞書（JAppraisal 辞書）～態度評価編～
- isRequiredBy: C-004168: 新聞記事GDAコーパス2004
