  • C-004168: News Article GDA Corpus 2004
    3,000 newspaper articles (about 37,000 sentences, 910,000 words) annotated with morphological information, syntactic structures, word senses and coreference/anaphora. All annotations are manually revised. Data is compiled in GDA (Global Document Annotation) format. This data contains metadata only, not the original text. The "Mainichi Shimbun CD-ROM (1994)" is required to restore the complete corpus containing the text. Word meanings are annotated with sense IDs from the Iwanami Japanese Dictionary Corpus. The "Iwanami Japanese Dictionary Corpus 2004" is required to cross-reference the definition sentences for these sense IDs.
  • C-004170: Kyoto University's case frame data ver 1.0
    A database of case frames automatically constructed from 1.6 billion Japanese sentences taken from Web pages. Each case frame is represented as a predicate and a set of its case filler words. The database covers about 40,000 predicates, with an average of 13 case frames per predicate.
  • C-004174: Konan-JIEM Learner Corpus Third Edition
    The KJ learner corpus consists of 233 essays written by Japanese college students. All essays are manually annotated with grammatical errors and POS/chunk tags. The new features of the third edition are: (1) a new set of essay data has been added; (2) parsing information, in which subject-verb relations are manually annotated, is included; (3) the error detection results from the Error Detection and Correction Workshop (EDCW; https://sites.google.com/site/edcw2012/) are available.
  • C-004176: Konan Kodomo corpus
    The Konan Kodomo corpus (KK corpus) consists of texts written by primary school children. The data were collected from 66 students over a period of eight months.
  • C-004178: CASTEL/J CD-ROM V1.5
    This is a database of materials for teaching and learning Japanese, developed by CASTEL/J (Computer Assisted Systems for TEaching and Learning / Japanese special interest group). The data set includes texts from famous essays, movie scripts and novels, a Chinese character dictionary dataset, and sound and image data for teaching Japanese.
  • C-004179: A Linguistic Atlas of Early Middle English Version 2.1
    LAEME aims to present information about the variation in space and time of linguistic forms found in early Middle English texts. It contains the LAEME corpus of lexico-grammatically tagged texts; a Corpus of Etymologies (CE), which provides a narrative etymology for every form type in the LAEME Corpus of Tagged Texts; a Corpus of Changes, which explicates the phonological and morphological changes invoked in the CE; and other materials.
    • hasVersion: A Linguistic Atlas of Late Mediaeval English
    • replaces: A Linguistic Atlas of Early Middle English Version 1.1
  • C-004180: A Representative Corpus of Historical English Registers 3.2
    This is a morphologically tagged, multi-genre historical corpus of British and American English covering the period 1650-1999. The latest version, ARCHER 3.2, consists of ca. 3.2 million words in 1,658 text files, distributed as ca. 1.9 million words in 1,075 British files and ca. 1.3 million words in 583 American files. There are eleven genres (advertising, diaries, drama, fiction, legal texts, letters, journals, medicine, news reportage, science, and sermons). Version 3.2 has improved radically in size, text type coverage, regional coverage, and mark-up (now TEI/XML-conformant and with new POS-tagging).
    • replaces: A Representative Corpus of Historical English Registers 3.1
    • conformsTo: British National Corpus
  • C-004181: British English 2006
    This is a one-million-word corpus of published general written British English with the same sampling frame as the LOB and F-LOB corpora. The corpus consists of 500 files, each a 2,000-word sample, taken from 15 genres of writing. All of the texts were taken from internet sources.
  • C-004182: Michigan Corpus of Upper-Level Student Papers
    The Michigan Corpus of Upper-level Student Papers is a collection of around 830 A-grade papers from a range of disciplines across four academic divisions (Humanities and Arts, Social Sciences, Biological and Health Sciences, Physical Sciences) of the University of Michigan.
  • C-004183: The John Swales Conference Corpus
    The John Swales Conference Corpus is a collection of transcripts from an academic conference held in honour of John Swales and hosted by the English Language Institute at the University of Michigan. The corpus contains both lectures and question-and-answer sessions.