Registered language resources: 3330. Showing 1481 - 1490 of 2023.
  • C-004097: English-Estonian and Estonian-English parallel corpus
    English-Estonian and Estonian-English parallel corpus
  • C-004098: Corpus of Estonian Dialects
    The territory where Estonian is spoken is quite small, but there are large differences between the traditional dialects. Researchers of Estonian dialects have identified at least eight main dialects and over a hundred sub-dialects (parish dialects). Until now, there have been only a few comparative studies of Estonian dialect phonology and grammar because of the lack of unified data sources for this kind of analysis. The Corpus of Estonian Dialects is meant to simplify every kind of research on Estonian dialects.
  • C-004099: Phonetic Corpus of Estonian Spontaneous Speech
    The aim of the corpus is to compile a large number of quality recordings of spontaneous Estonian and to segment them phonetically on different levels. The project started in autumn 2006.
  • C-004100: Corpus of spoken Estonian
    The corpus is planned as an open corpus, i.e. no limits have been set on its size. Our intention is to collect various types of spoken language: both everyday and institutional conversation, spontaneous and planned speech, monologues and dialogues, face-to-face interaction and media texts.
  • C-004101: PLUG corpus
    In work package 1, a bilingual sentence-aligned corpus is to be compiled. The corpus shall include the following language pairs: Swedish-English, Swedish-German, Swedish-Italian.
  • C-004102: Turkish-Swedish Corpus
    The main goal of the project is to promote research and teaching in the Turkish language. More specifically, the aim of the project is to build a Turkish-Swedish Parallel Corpus with contrastive studies in focus. The corpus consists of original texts and their translations from Turkish to Swedish and from Swedish to Turkish.
    The corpus is built semi-automatically by using a basic language resource kit (BLARK) for the particular languages.
    • references: BLARKs
  • C-004103: Talbanken76
    Talbanken is a Swedish treebank divided into two main parts, covering written and spoken language, respectively:
    * Jan Einarsson: Talbankens skriftspråkskonkordans (1976)
    * Jan Einarsson: Talbankens talspråkskonkordans (1976)
    The data were collected in several projects at Lund University in the 1970s, and the material is described in several publications.
  • C-004104: Talbanken05
    Talbanken05 is a modernized version of Talbanken76, a Swedish treebank of roughly 300,000 words, constructed at Lund University in the 1970s.
  • C-004105: LinGO Redwoods
    The Redwoods initiative is a seed activity in the design and development of a new type of treebank: a dynamic treebank that parses sentences in accordance with a precise HPSG grammar.
  • C-004106: New Corpus for Ireland
    The new English-Irish dictionary will be based on the biggest and best linguistic resources available. The corpus includes:
    * A new corpus of Irish, containing 30 million words. This corpus is based on the National Corpus of Irish developed by the ITÉ (Linguistics Institute of Ireland) and containing 8.5 million words, together with a further 15.5 million words also collected by the ITÉ. A further 6 million words were added to this database during Phase 1.
    * A new corpus of Irish-English, with 25 million words of text written – in newspapers, novels, and other sources – by authors from the island of Ireland.