Language resource #: 3330
-
C-004085: SMULTRON
SMULTRON (Stockholm MULtilingual TReebank) is a parallel treebank developed by the Computational Linguistics Group of the Department of Linguistics at Stockholm University. The parallel treebank contains around 1 000 sentences in English, German and Swedish. The sentences have been PoS-tagged and annotated with phrase structure trees, and the trees have been aligned at the sentence, phrase and word levels. In addition, the German and Swedish monolingual treebanks contain lemma information.
-
C-004086: Stockholm Umeå Corpus
The Stockholm Umeå Corpus (SUC) is a Swedish corpus of 1 million words in which each word has been tagged, i.e. annotated with its part of speech, inflectional form and lemma. All the texts in the corpus were written in the 1990s and are balanced by genre, following the principles used in the Brown and LOB corpora. SUC was developed in a joint project between the universities of Stockholm and Umeå, and it is freely distributed for research purposes.
- conformsTo: C-000751: Brown Corpus
- conformsTo: C-000801: THE LOB CORPUS
- replaces: SUC 1.0
-
C-004087: Göteborg Spoken Language Corpus
The Göteborg Spoken Language Corpus (GSLC) is an incrementally growing corpus of spoken language from different social activities. Because spoken language varies considerably across social activities in pronunciation, vocabulary and grammar, the goal of the corpus is to include spoken language from as many social activities as possible.
-
C-004088: Swedish treebank
The Swedish treebank is now being made available in a preview version for evaluation. The treebank is the result of harmonizing the linguistic information in two existing Swedish language resources: Talbanken and SUC (Stockholm Umeå Corpus).
- hasPart: Talbanken
- hasPart: C-004086: Stockholm Umeå Corpus
-
C-004090: Norwegian Newspaper Corpus
The Norwegian Newspaper Corpus is a large and self-expanding corpus of Norwegian newspaper texts. The collection of this dynamic, continually growing corpus began in 1998.
Project site: http://avis.uib.no/
-
C-004091: LOGON parallel tourist corpus
The LOGON parallel tourist corpus consists of Norwegian texts from several sources, with English translations. All the texts are from the tourist domain, and some are specifically from the hiking domain. The texts are described in more detail here.
The corpus was developed specifically as training and testing material for the LOGON machine translation project, but it is available for any kind of research.
- requires: LOGON
-
C-004092: Sofie Treebank
The Sofie Treebank is a parallel treebank that, when completed, will consist of material in nine North European languages: Danish, Dutch, English, Estonian, Finnish, German, Icelandic, Norwegian and Swedish. The text material of the treebank is taken from the Norwegian original and the translations of the first two chapters of Jostein Gaarder's novel Sofies verden.
- : Nordic Treebank Network
-
C-004093: Corpus of Written Estonian
Newspapers, fiction, science, new media (chat rooms, newsgroups, forums, comments), stenographic records of the Riigikogu (the Estonian parliament), 1980s (other), various scientific articles 1995-2007, etc.
-
C-004095: Balanced Corpus of Estonian
The Balanced Corpus of Estonian was compiled to enable comparison of the three main text classes of the written language: fiction, journalistic writing and scientific writing.
-
C-004096: Estonian Reference Corpus
The Estonian Reference Corpus is a large collection of Estonian texts that is currently under construction. The corpus contains only whole texts, not text samples. Here we collect the written language