Registered language resources: 3,330. Showing items 1531-1540 of 2,023.
  • C-004155: Web Corpus
    English-language corpora compiled from the Web in 2006 and 2007.
    2007: still under development; currently 3,123,996 types and 518,129,710 tokens; target size at least 1,000,000,000 tokens; will be part-of-speech tagged.
    2006: 97,198,272 tokens and 950,087 types; 1-6-grams; wildcard searchable; the original texts and URLs are no longer available due to a hard drive failure.
  • C-004156: Web Corpus 2007
    Web Corpus 2007, compiled in July 2007. The goal is to produce a corpus of at least a billion words annotated with the same Part of Speech tagset as the British National Corpus.
  • C-004157: Web Corpus 2006
    Web Corpus 2006, based on a corpus compiled from the Web in Feb-Mar 2006: about 104 M tokens in the 'clean' version (140 M tokens in the 'dirty' version). The original webpages on which these datasets are based were lost in a hard disk crash, so there are some gaps in the data.
  • C-004158: PERC Corpus
    The PERC Corpus (formerly called the "Corpus of Professional English (CPE)") is a 17-million-word corpus of copyright-cleared English academic journal texts in science, engineering, technology and other fields. It was compiled as a part of the project of the Professional English Research Consortium (PERC) and is intended to be used for research in the field of Professional English. Until the end of June, 2010, the PERC Corpus will be available for access free of charge on the web concordancer provided by the "Shogakukan Corpus Network" administered by NetAdvance Inc., based on authorization from PERC.
    • isPartOf: Shogakukan Corpus Network
  • C-004159: WaC Users
    WaC Users, derived from the search results of users of this site's Web Concordancer. Biased towards users' interests, with no claim to breadth.
  • C-004160: WaC Users Marginal
    WaC Users Marginal, based on text chunks in users' search results not unambiguously identified as English text.
  • C-004161: WaC Users Junk
    WaC Users Junk, based on text chunks in users' search results rejected as English text. Contains fragments, lists, search-engine spam, non-English content, and occasional gems.
  • C-004162: DeWaC German Web Corpus
    The corpus was prepared by Marco Baroni in a web crawl as described at EACL 2006. It was part-of-speech tagged and lemmatised using TreeTagger, a leading part-of-speech tagger which has been trained for a number of languages.
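  • C-004164: Japanese Appraisal Evaluative Expression Dictionary (JAppraisal Dictionary): Attitude Evaluation Edition (日本語アプレイザル評価表現辞書(JAppraisal 辞書)~態度評価編~)
    This dictionary classifies and aggregates Japanese evaluative expressions not only by polarity (positive or negative) but also by the type of evaluation criterion (for example, criteria relating to affection or criteria relating to ethics). When analysing evaluations of topics on which opinions are expected to polarise into positive and negative, or of events where polarity is skewed, such as disasters and emergencies, it can serve as a resource for classifying and subdividing evaluative information, aggregating similar evaluative information, and filtering out the evaluative information that is needed. It can also be used as a resource for tracing diachronic changes in evaluative perspective, and as a reference book for discourse analysis such as critical discourse analysis.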
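  • C-004165: Iwanami Kokugo Jiten, 5th Edition, Tagged Corpus 2004 (岩波国語辞典第五版タグ付きコーパス2004)
    This corpus annotates the approximately 56,000 entries of the Iwanami Kokugo Jiten, 5th Edition, with morphological information, syntactic structure, anaphora and coreference, and word-sense information based on the Iwanami Kokugo Jiten itself. All of these annotations have been manually corrected.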