Language resource #: 3330 Results 1531 - 1540 of 2023
  • C-004155: Web Corpus
    English-language corpora compiled from the Web in 2006 and 2007.
    2007: still under development; currently 3,123,996 types and 518,129,710 tokens. The target size is at least 1,000,000,000 tokens, and the corpus will be part-of-speech tagged.
    2006: 97,198,272 tokens and 950,087 types; 1- to 6-grams; wildcard searchable. The original texts and URLs are no longer available due to a hard drive failure.
  • C-004156: Web Corpus 2007
    Web Corpus 2007, compiled in July 2007. The goal is to produce a corpus of at least a billion words annotated with the same Part of Speech tagset as the British National Corpus.
  • C-004157: Web Corpus 2006
    Web Corpus 2006, based on a corpus compiled from the Web in Feb-Mar 2006, with about 104 M tokens in its 'clean' version (140 M tokens in its 'dirty' version). (The original webpages on which these datasets are based were lost in a hard disk crash, so there are some gaps in the data.)
  • C-004158: PERC Corpus
    The PERC Corpus (formerly called the "Corpus of Professional English (CPE)") is a 17-million-word corpus of copyright-cleared English academic journal texts in science, engineering, technology and other fields. It was compiled as a part of the project of the Professional English Research Consortium (PERC) and is intended to be used for research in the field of Professional English. Until the end of June, 2010, the PERC Corpus will be available for access free of charge on the web concordancer provided by the "Shogakukan Corpus Network" administered by NetAdvance Inc., based on authorization from PERC.
    • isPartOf: Shogakukan Corpus Network
  • C-004159: WaC Users
    WaC Users, derived from search results of users of this site's Web Concordancer. Biased towards users' interests, with no claim to breadth.
  • C-004160: WaC Users Marginal
    WaC Users Marginal, based on text chunks in users' search results not unambiguously identified as English text.
  • C-004161: WaC Users Junk
    WaC Users Junk, based on text chunks in users' search results rejected as English text. Contains fragments, lists, search-engine spam, non-English content, and occasional gems.
  • C-004162: DeWaC German Web Corpus
    The corpus was prepared by Marco Baroni in a web crawl as described at EACL 2006. It was part-of-speech tagged and lemmatised using TreeTagger, a leading part-of-speech tagger which has been trained for a number of languages.
  • C-004164: Japanese Dictionary of Appraisal -attitude- version 1.2
    The dictionary provides the classification of Japanese evaluative expressions according to Appraisal theory, i.e. a linguistic model of evaluative language. In this dictionary, the evaluative expressions are classified not only according to polarity (positive/negative attitude) but also in terms of evaluative criteria such as affection, desire, morality, honesty, peacefulness, etc. The dictionary can be utilised as a dictionary for sentiment analyses, as a reference book for discourse analyses such as Critical Discourse Analysis, or as a general resource for linguistic, educational or computational studies.
  • C-004165: Annotated Corpus of Iwanami Japanese Dictionary Fifth Edition 2004
    This corpus of the Iwanami Japanese Dictionary Fifth Edition consists of 56,000 headwords. It is annotated with morphological information, syntactic structures, coreference/anaphora, and word senses defined by the dictionary itself. All annotations are manually revised.