  • C-003633: MultiSemCor Corpus 1.1
    The MultiSemCor project aims to create a semantically annotated corpus by exploiting the annotations in the English SemCor corpus, a subset of the English Brown Corpus containing almost 700,000 running words. MultiSemCor is an English/Italian parallel corpus, aligned at the word level and annotated with POS, lemma, and word sense. It consists of 116 English texts aligned with their 116 Italian translations, for a total of about 464,000 running words. Both languages are annotated with a shared inventory of word senses, the same one used by MultiWordNet.
  • C-003634: SemCor 1.7
    The SemCor corpus 1.7 consists of 352 semantically annotated texts from the Brown Corpus. All words in SemCor are tagged for POS, and more than 200,000 content words are lemmatized and sense-tagged. SemCor 1.7 was created automatically from SemCor 1.6 by mapping WordNet 1.6 senses to WordNet 1.7 senses (SemCor 1.6 was manually annotated).
  • C-003635: SemCor 1.7.1
    The SemCor corpus 1.7.1 consists of 352 semantically annotated texts from the Brown Corpus. All words in SemCor are tagged for POS, and more than 200,000 content words are lemmatized and sense-tagged. SemCor 1.7.1 was created automatically from SemCor 1.6 by mapping WordNet 1.6 senses to WordNet 1.7.1 senses (SemCor 1.6 was manually annotated).
  • C-003636: SemCor 2.0
    The SemCor corpus 2.0 consists of 352 semantically annotated texts from the Brown Corpus. All words in SemCor are tagged for POS, and more than 200,000 content words are lemmatized and sense-tagged. SemCor 2.0 was created automatically from SemCor 1.6 by mapping WordNet 1.6 senses to WordNet 2.0 senses (SemCor 1.6 was manually annotated).
  • C-003637: SemCor 2.1
    The SemCor corpus 2.1 consists of 352 semantically annotated texts from the Brown Corpus. All words in SemCor are tagged for POS, and more than 200,000 content words are lemmatized and sense-tagged. SemCor 2.1 was created automatically from SemCor 1.6 by mapping WordNet 1.6 senses to WordNet 2.1 senses (SemCor 1.6 was manually annotated).
  • C-003638: SemCor 3.0
    The SemCor corpus 3.0 consists of 352 semantically annotated texts from the Brown Corpus. All words in SemCor are tagged for POS, and more than 200,000 content words are lemmatized and sense-tagged. SemCor 3.0 was created automatically from SemCor 1.6 by mapping WordNet 1.6 senses to WordNet 3.0 senses (SemCor 1.6 was manually annotated).
  • C-003639: Wordnet Domains 3.2
    WordNet Domains was created by augmenting WordNet, developed at Princeton University, with domain labels (e.g. Sport, Politics, Medicine) based on the Dewey Decimal Classification. Each synset is annotated with at least one semantic domain label drawn from a set of about two hundred labels. The domain information is complementary to what is already in WordNet. Senses of the same word may be grouped into homogeneous clusters by domain, which reduces word polysemy in WordNet. The corpus includes WordNet-Affect, which contains an additional hierarchy of "affective" domain labels.
    • references: D-000825: WordNet
    • hasPart: WordNet-Affect
    • conformsTo: Dewey Decimal Classification
  • C-003640: Corpus of Written British Creole
    This is a corpus of written Caribbean Creole (Jamaican Creole) English consisting of about 12,000 words. The data is annotated with a set of contrastive tags that mark differences in spelling, lexis, and discoursal and grammatical structure between Standard English and the Creole language texts. In other words, tags are used mainly where the word or structure in the corpus would not be expected in a Standard English text. The types of texts in the corpus include poems, novels/fiction, plays, and miscellaneous writings including advertisements and graffiti.
  • C-003641: AUTONOMATA Spoken Names Corpus
    The AUTONOMATA Spoken Names Corpus is a speech database of about 5000 Dutch and Flemish first names, surnames, street names, city names and control words. The package includes the corresponding phonetic transcriptions. The 240 speakers (120 Dutch and 120 Flemish) comprise 120 native and 120 non-native speakers.
  • C-003642: COREA-coreferentiecorpus
    The COREA corpus is a text corpus in Dutch and Flemish annotated for coreference relations among names, pronouns, and noun phrases. The corpus was developed under the COREA project, a two-year project to develop a robust system that assigns coreference relations automatically.