Language resource #: 3330
Results 1481 - 1490 of 2023
-
C-004097: English-Estonian and Estonian-English parallel corpus
English-Estonian and Estonian-English parallel corpus
-
C-004098: Corpus of Estonian Dialects
The territory where Estonian is spoken is quite small, but there are large differences between traditional dialects. Researchers of Estonian dialects have identified at least eight main dialects and over a hundred sub-dialects (parish dialects). So far there have been only a few comparative studies of Estonian dialect phonology and grammar, because unified data sources for this kind of analysis have been lacking. The Corpus of Estonian Dialects is intended to facilitate all kinds of research on Estonian dialects.
-
C-004099: Phonetic Corpus of Estonian Spontaneous Speech
The aim of the corpus is to compile a large number of high-quality recordings of spontaneous Estonian speech and to segment them phonetically at different levels. The project started in autumn 2006.
-
C-004100: Corpus of spoken Estonian
The corpus is planned as an open corpus, i.e. no limits have been set. Our intention is to collect various types of spoken language: everyday and institutional conversation, spontaneous and planned speech, monologues and dialogues, face-to-face interaction and media texts.
-
C-004101: PLUG corpus
In work package 1, a bilingual sentence-aligned corpus is to be compiled. The corpus shall include the following language pairs: Swedish-English, Swedish-German, Swedish-Italian.
-
C-004102: Turkish-Swedish Corpus
The main goal of the project is to promote research and teaching in the Turkish language. More specifically, the aim of the project is to build a Turkish-Swedish Parallel Corpus with contrastive studies in focus. The corpus consists of original texts and their translations from Turkish to Swedish and from Swedish to Turkish.
The corpus is built semi-automatically by using a basic language resource kit (BLARK) for the particular languages.
- references: BLARKs
-
C-004103: Talbanken76
Talbanken is a Swedish treebank divided into two main parts, consisting of written and spoken language, respectively:
* Jan Einarsson: Talbankens skriftspråkskonkordans (1976)
* Jan Einarsson: Talbankens talspråkskonkordans (1976)
The data were collected in several projects at Lund University in the 1970s, and the material is described in several publications.
- isReplacedBy: Talbanken05 (Version 1.0)
- isReplacedBy: C-004088: Swedish treebank
-
C-004104: Talbanken05
Talbanken05 is a modernized version of Talbanken76, a Swedish treebank of roughly 300,000 words, constructed at Lund University in the 1970s.
- replaces: Talbanken05 (Version 1.0)
- replaces: C-004103: Talbanken76
- isReplacedBy: C-004088: Swedish treebank
-
C-004105: LinGO Redwoods
The Redwoods initiative is a seed activity in the design and development of a new type of treebank: a dynamic treebank that parses sentences in accordance with a precise HPSG grammar.
-
C-004106: New Corpus for Ireland
The new English-Irish dictionary will be based on the biggest and best linguistic resources available. The corpus includes:
* A new corpus of Irish, containing 30 million words. This corpus is based on the National Corpus of Irish developed by the ITÉ (Linguistics Institute of Ireland) and containing 8.5 million words, together with a further 15.5 million words also collected by the ITÉ. A further 6 million words were added to this database during Phase 1.
* A new corpus of Irish-English, with 25 million words of text written – in newspapers, novels, and other sources – by authors from the island of Ireland.
- references: C-000750: British National Corpus (XML Edition)
- isReferencedBy: New English-Irish Dictionary (NEID)