Language resource #: 3330 Results 1501 - 1510 of 2023
  • C-004118: Slovene Dependency Treebank
    The Slovene Dependency Treebank project aims to build a syntactically annotated corpus of Slovene texts. The corpus is to be annotated with dependency analyses, and we are taking as our model the Prague Dependency Treebank.
    • conformsTo: Prague Dependency Treebank
    • hasPart: MULTEXT-East corpus
    • isReferencedBy: CoNLL-X
    • references: SVEZ-IJS
    • hasVersion: MULTEXT-East, Version 3
    • hasVersion: MULTEXT-East Morphosyntactic Specifications, Version 3
  • C-004120: Multext-East Resources, Version 3
    This standardised and linked set of resources covers a large number of mainly Central and Eastern European languages and includes annotated parallel, comparable and speech corpora with morphosyntactic lexica and specifications. The most important component is the linguistically annotated corpus consisting of Orwell's novel ‘1984’ in the English original and translations.
  • C-004122: MULTEXT-East 1984 corpus
    The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This annotated parallel corpus contains the novel in the English original (about 100,000 words in length) and its translations into a number of languages.
  • C-004123: MULTEXT-East comparable corpus
    The multilingual comparable corpus contains a fiction part and a news part, where the data is comparable across the languages in terms of the number and size of texts; each of the 12 parts has approx. 100,000 words. The corpus is structurally marked up with over 40 different elements; however, sub-paragraph markup has not been harmonised across the languages.
  • C-004124: MULTEXT-East parallel speech corpus
    MULTEXT-East produced a small corpus of spoken texts taken from the EUROM-1 speech corpus. It comprises the translations (from English) of forty short passages of five thematically connected sentences. For four languages, the texts have also been read, recorded and included in the distribution. The corpus texts contain links to the spoken passages, which for V3 have been normalised in terms of volume and stored as .wav files. Due to their size, the speech files are stored and distributed in a separate bundle.
  • C-004125: EVROKORPUS
    Evrokorpus consists of parallel bilingual corpora. In 2002, the English-Slovene corpus was compiled from translation memories made in the Translation Unit of the Slovenian Government Office for European Affairs (GOEA) using Trados' translation tool Translator's Workbench. The German-Slovene corpus was made in 2006, followed by the French-Slovene corpus in 2007 and the Italian-Slovene and Spanish-Slovene corpora in 2008.
    In 2008 the corpus was extended by the inclusion of EU Commission data.
  • C-004126: IPI PAN Corpus
    The IPI PAN Corpus is a large (currently over 250 million segments), morphosyntactically annotated, publicly available corpus of Polish, developed by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS), mainly within projects funded by the State Committee for Scientific Research, as well as within statutory research carried out at ICS PAS.
  • C-004127: KACENKA
    Korpus anglicko-cesky - elektronicky nastroj Katedry anglistiky was created in 1997 by the Department of English, Faculty of Arts, Masaryk University, to support research and teaching in the field of translation. It was financed by the FR VS (Development Fund for Universities in the Czech Republic).
  • C-004128: Korpus 2000
    The aim of the Korpus 2000 project is to document the use of the Danish language around the year 2000 - in the form of a text corpus in which one can look up words and phrases via this website. The texts that constitute the Korpus 2000 were written mainly between 1998 and 2002.
  • C-004129: Korpus 90
    Korpus 90 is compiled from text excerpts written in the period 1988-1992. This corpus is quite similar to Korpus 2000 in its composition and size and hence serves as an older comparative corpus for Korpus 2000.