Language Resource Search - SHACHI: Language Resource Metadata Database

C-004155: Web Corpus
English-language corpora compiled from the Web in 2006 and 2007.
2007: still under development; currently 518,129,710 tokens and 3,123,996 types; target size at least 1,000,000,000 tokens; to be part-of-speech tagged.
2006: 97,198,272 tokens and 950,087 types; 1- to 6-grams; wildcard-searchable; the original texts and URLs are no longer available due to a hard drive failure.
- hasPart: C-004156: Web Corpus 2007
- hasPart: C-004157: Web Corpus 2006
C-004156: Web Corpus 2007
Web Corpus 2007, compiled in July 2007. The goal is to produce a corpus of at least a billion words, annotated with the same part-of-speech tagset as the British National Corpus.
- isPartOf: C-004155: Web Corpus
C-004157: Web Corpus 2006
Web Corpus 2006, based on a corpus compiled from the Web in February and March 2006 (about 104 M tokens in the 'clean' version, 140 M tokens in the 'dirty' version). The original web pages on which these datasets are based were lost in a hard disk crash, so there are some gaps in the data.
- hasVersion: C-004156: Web Corpus 2007
- isPartOf: C-004155: Web Corpus
C-004158: PERC Corpus
The PERC Corpus (formerly called the "Corpus of Professional English (CPE)") is a 17-million-word corpus of copyright-cleared English academic journal texts in science, engineering, technology and other fields. It was compiled as part of a project of the Professional English Research Consortium (PERC) and is intended for research in the field of Professional English. Until the end of June 2010, the PERC Corpus will be available free of charge on the web concordancer provided by the "Shogakukan Corpus Network", administered by NetAdvance Inc. under authorization from PERC.
- isPartOf: Shogakukan Corpus Network
C-004159: WaC Users
WaC Users, derived from the search results of users of this site's Web Concordancer. Biased towards users' interests, with no claim to breadth.
- isVersionOf: C-004155: Web Corpus
C-004160: WaC Users Marginal
WaC Users Marginal, based on text chunks in users' search results not unambiguously identified as English text.
- isPartOf: C-004155: Web Corpus
C-004161: WaC Users Junk
WaC Users Junk, based on text chunks in users' search results that were rejected as not being English text. Contains fragments, lists, search-engine spam, non-English content, and occasional gems.
- hasVersion: C-004155: Web Corpus
C-004162: DeWaC German Web Corpus
The corpus was built by Marco Baroni from a web crawl, as described at EACL 2006. It was part-of-speech tagged and lemmatised using TreeTagger, a widely used part-of-speech tagger that has been trained for a number of languages.
C-004164: Japanese Dictionary of Appraisal -attitude- version 1.2
The dictionary classifies Japanese evaluative expressions according to Appraisal theory, a linguistic model of evaluative language. The evaluative expressions are classified not only by polarity (positive/negative attitude) but also by evaluative criteria such as affection, desire, morality, honesty and peacefulness. The dictionary can be used as a lexicon for sentiment analysis, as a reference for discourse analysis such as Critical Discourse Analysis, or as a general resource for linguistic, educational or computational studies.
- replaces: Japanese Dictionary of Appraisal version 1.1
- references: C-004165: Annotated Corpus of Iwanami Japanese Dictionary Fifth Edition 2004
- references: Balanced Corpus of Contemporary Written Japanese Ver.1.0 (BCCWJ)
C-004165: Annotated Corpus of Iwanami Japanese Dictionary Fifth Edition 2004
A corpus built from the Iwanami Japanese Dictionary, Fifth Edition, covering 56,000 headwords. It is annotated with morphological information, syntactic structures, coreference/anaphora, and word senses as defined by the dictionary itself. All annotations have been manually revised.
- isReferencedBy: Japanese Dictionary of Appraisal -attitude-
- isRequiredBy: C-004168: News Article GDA Corpus 2004
