Language Resource Search - SHACHI: Language Resource Metadata Database

Language resource #: 3330 Results 1501 - 1510 of 2023


C-004118: Slovene Dependency Treebank
The Slovene Dependency Treebank project aims to build a syntactically annotated corpus of Slovene texts. The corpus is to be annotated with dependency analyses, and its model is the Prague Dependency Treebank.
- conformsTo: Prague Dependency Treebank
- hasPart: MULTEXT-East corpus
- isReferencedBy: CoNLL-X
- references: SVEZ-IJS
- hasVersion: MULTEXT-East, Version 3
- hasVersion: MULTEXT-East Morphosyntactic Specifications, Version 3
C-004120: Multext-East Resources, Version 3
This standardised and linked set of resources covers a large number of mainly Central and Eastern European languages. It includes annotated parallel, comparable and speech corpora, together with morphosyntactic lexica and specifications. The most important component is the linguistically annotated corpus of Orwell's novel ‘1984’ in the English original and its translations.
- hasPart: MULTEXT-East morphosyntactic specifications
- hasPart: MULTEXT-East morphosyntactic lexica
- hasPart: MULTEXT-East morphosyntactically annotated "1984" corpus
- hasPart: C-004123: MULTEXT-East comparable corpus
- hasPart: C-004124: MULTEXT-East parallel speech corpus
C-004122: MULTEXT-East 1984 corpus
The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This annotated parallel corpus contains the novel in the English original (about 100,000 words) and its translations into a number of languages.
- isPartOf: C-004120: Multext-East Resources, Version 3
C-004123: MULTEXT-East comparable corpus
The multilingual comparable corpus contains a fiction part and a news part, and the data is comparable across the languages in terms of the number and size of texts; each of the 12 parts has approx. 100,000 words. The corpus is structurally marked up with over 40 different elements; however, sub-paragraph markup has not been harmonised across the languages.
- isPartOf: C-004120: Multext-East Resources, Version 3
C-004124: MULTEXT-East parallel speech corpus
MULTEXT-East produced a small corpus of spoken texts taken from the EUROM-1 speech corpus. It comprises translations (from English) of forty short passages, each consisting of five thematically connected sentences. For four languages, the texts have also been read aloud, recorded and included in the distribution. The corpus texts contain links to the spoken passages, which for Version 3 have been normalised for volume and stored as .wav files. Because of their size, the speech files are stored and distributed in a separate bundle.
- isPartOf: C-004120: Multext-East Resources, Version 3
C-004125: EVROKORPUS
Evrokorpus consists of parallel bilingual corpora. In 2002, the English-Slovene corpus was compiled from translation memories created in the Translation Unit of the Slovenian Government Office for European Affairs (GOEA) with Trados' translation tool Translator's Workbench. The German-Slovene corpus followed in 2006, the French-Slovene corpus in 2007, and the Italian-Slovene and Spanish-Slovene corpora in 2008.
In 2008, the corpus was also extended with EU Commission data.
C-004126: IPI PAN Corpus
The IPI PAN Corpus is a large (currently over 250 million segments), morphosyntactically annotated, publicly available corpus of Polish, developed by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS), mainly within projects funded by the State Committee for Scientific Research, as well as within statutory research carried out at ICS PAS.
C-004127: KACENKA
Korpus anglicko-cesky - elektronicky nastroj Katedry anglistiky (English-Czech Corpus: an electronic tool of the Department of English) was created in 1997 by the Department of English, Faculty of Arts, Masaryk University, to support research and teaching in the field of translation. It was financed by the FR VS (Development Fund for Universities in the Czech Republic).
C-004128: Korpus 2000
The aim of the Korpus 2000 project is to document the use of the Danish language around the year 2000 in the form of a text corpus in which words and phrases can be looked up via the project website. The texts that make up Korpus 2000 were written mainly between 1998 and 2002.
- hasVersion: C-004129: Korpus 90
- isRequiredBy: C-004130: KorpusDK
C-004129: Korpus 90
Korpus 90 is compiled from text excerpts written in the period 1988-1992. It is quite similar to Korpus 2000 in composition and size and therefore serves as an older comparison corpus for Korpus 2000.
- hasVersion: C-004128: Korpus 2000
- isReplacedBy: C-004130: KorpusDK
