Registered language resources: 3330
Items 1441 - 1450 of 2023
-
C-004050: The EGYPT Statistical Machine Translation Toolkit
1) East Timorese (Tetun): A small Tetun-English parallel corpus. It was sentence-aligned manually.
2) Arabic: The Quran (Islam's holy book) in Arabic with English translation. It was aligned semi-automatically based on chapter and verse numbers. Verses are typically equivalent to sentences. We have not done any experiments with this corpus.
3) English: Some monolingual text that was downloaded from the UN web site. It was used for our East Timorese (Tetun) experiments.
4) French: Currently no French corpus is distributed.
-
C-004051: Corpus of Contemporary Arabic
The main purpose of my research is to develop a prototype Corpus of Contemporary Arabic (CCA). The target users of this corpus will be language teachers, language engineers, foreign learners of Arabic and material writers. The first step in designing my corpus is deciding on the text type to include in this corpus. For this reason I have developed a questionnaire to help me identify the suitable texts.
- references: International Corpus of Arabic(ICA)
-
C-004052: Penman Upper Model
The objective of the Penman system is to function as a useful and theoretically motivated sentence generator for research groups interested in the nature of language, as well as to provide a text generation system that can be used routinely by computer system developers. The Penman Upper Model is a taxonomy of 250 very general abstractions of the objects, processes, and relations in the world, organized to support linguistic processing [Bateman et al. 89]. This taxonomy serves to link the terms in a user's application domain to the terms used within Penman. The Upper Model is being extended to the Middle Model, a taxonomy of approx. 70,000 concepts modeling the world.
- isReplacedBy: Middle Model
- requires: Penman lexicon of over 90,000 English words (containing word definitions, inflectional forms, etc.)
- requires: Penman
-
C-004054: Unified Medical Language System
The purpose of NLM's Unified Medical Language System® (UMLS) is to facilitate the development of computer systems that behave as if they "understand" the meaning of the language of biomedicine and health. There are three UMLS Knowledge Sources: the Metathesaurus®, the Semantic Network, and the SPECIALIST Lexicon. They are distributed with flexible lexical tools and the MetamorphoSys install and customization program. http://www.nlm.nih.gov/pubs/factsheets/umls.html
- requires: T-000789: Metathesaurus
- requires: T-004055: UMLS Semantic Network
- requires: D-004056: SPECIALIST Lexicon
- requires: MetamorphoSys
- isRequiredBy: PubMed®
- isRequiredBy: NLM Gateway
- isRequiredBy: ClinicalTrials.gov
- isRequiredBy: Indexing Initiative
- isRequiredBy: Enterprise Vocabulary Services
- isRequiredBy: National Guidelines Clearinghouse
- isRequiredBy: National Quality Measures Clearinghouse
-
C-004057: BulTreeBank
The main goal of the BulTreeBank project is to develop a high-quality set (TreeBank) of syntactic trees for Bulgarian within the framework of Head-driven Phrase Structure Grammar (HPSG).
- conformsTo: C-001100: Penn Treebank Online
- hasPart: BulTreeBank-DP
- hasPart: Morphologically Annotated Part of BulTreeBank
- hasPart: Part-of-Speech Taggers for Bulgarian
-
C-004058: Corpus of the Contemporary Lithuanian Language
The corpus is general rather than specialised. It is compiled according to reading and not publishing tendencies. It is growing continuously and consists of whole texts rather than fragments. The design of the corpus follows the principles of some corpora of other European languages (English, German, Danish, Czech, etc.).
-
C-004059: Corpus of Spoken Israeli Hebrew
CoSIH is a corpus of spoken Israeli Hebrew that will amount to 5,000,000 words when completed (still under construction).
This resource will integrate demographic and contextual variables, thereby reflecting the different varieties of Hebrew spoken in Israel.
-
C-004060: DGT Multilingual Translation Memory of the Acquis Communautaire
A parallel multilingual corpus of the legislative documents (Acquis Communautaire) of the European Union in 22 EU languages. DGT-TM is not machine translation software. A translation memory is a collection of small text segments and their translations.
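As an illustration only, a translation memory of this kind can be pictured as a lookup table from stored source segments to their stored translations, which is why it is not machine translation: nothing is generated, only retrieved. The following is a minimal sketch under stated assumptions; the class name, method names, language codes and sample segments are hypothetical and are not taken from DGT-TM or its distribution format.

```python
class TranslationMemory:
    """Toy translation memory: aligned text segments per language pair.

    Hypothetical illustration; not the DGT-TM data format or tooling.
    """

    def __init__(self):
        # Maps (source_lang, target_lang) -> {source segment: target segment}
        self._units = {}

    def add(self, src_lang, tgt_lang, src_segment, tgt_segment):
        """Store one aligned segment pair."""
        pair = self._units.setdefault((src_lang, tgt_lang), {})
        pair[src_segment] = tgt_segment

    def lookup(self, src_lang, tgt_lang, src_segment):
        """Return the stored translation for an exact match, else None."""
        return self._units.get((src_lang, tgt_lang), {}).get(src_segment)


# Sample segments invented for the example.
tm = TranslationMemory()
tm.add("en", "fr", "Article 1", "Article premier")

hit = tm.lookup("en", "fr", "Article 1")    # stored segment is retrieved
miss = tm.lookup("en", "fr", "Article 2")   # unseen segment: nothing is generated
```

Real translation-memory tools typically also offer fuzzy (approximate) matching; the sketch covers exact matches only.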
-
C-004061: multilingual parallel corpus of translation
The multilingual corpus contains EU acts in 22 official EU languages. However, not all the texts in the corpus have been translated into all languages, so the number of hits varies between languages. Most of the texts are in English, which was the source language in most cases. http://evrokorpus.gov.si/k2/about.php?jezik=angl
-
C-004062: EVROKORPUS
Evrokorpus consists of parallel bilingual corpora. In 2002, an English-Slovene corpus was compiled from translation memories made in the Translation Unit of the Slovenian Government Office for European Affairs (GOEA) using Trados' translation tool Translator's Workbench. In 2006, a German-Slovene corpus was made, followed by a French-Slovene corpus in 2007. In 2008, the corpus was extended to include EU Commission data. The currently available corpus contains legal acts and other texts in English, French, German, Italian, Slovene and Spanish.