Language resource #: 3330
-
C-003894: Tamil Digital Corpus
(Chennai, March 27, 1998) The Institute of Indology and Tamil Studies at Cologne in Germany has undertaken a project named 'Pongal 2000' to digitise and computerise Tamil literature on a fairly large scale. Sangam and post-Sangam literature, Silappadikaram, Periyapuranam, Thiruvachakam and Kamparamayanam have been made available in transliterated form on the Internet.
-
C-003895: VerreTaal
Using English as the interface language, it aims to provide a research tool for translation studies, (comparative) literary studies, (inter)cultural studies and area studies, Chinese studies and other academic fields, and, last but not least, for the general reader and library user. The database lists Dutch-language direct and relay translations of works originally written in Chinese: book titles in poetry, fiction and non-fiction, and drama. For entries in multiple-author anthologies, individual author names and story titles are listed.
-
C-003933: Urdu-Nepali-English Parallel Corpus
The Center for Research in Urdu Language Processing (CRULP) is pleased to release Urdu and Nepali corpora parallel to 100,000 words of common English source text from the Penn Treebank corpus, available through the Linguistic Data Consortium (LDC). The text files used are listed in the README files provided for each corpus. The corpora are also tagged for part of speech.
- isPartOf: C-001100: Penn Treebank Online
-
C-003934: Urdu Word List
The word list has 149466 words. About 144000 words have been taken from Urdu Lughat (or generated through these words). The remaining count includes proper names, country names and city names.
Proper names have been extracted from various telephone directories and Urdu websites and include both Urdu names and English names transliterated into Urdu.
The lists of countries and major cities are limited to those commonly used on the Internet. As they are in English, they have been translated into Urdu with the assistance of دنیا کے تمام ممالک کا انسائیکلوپیڈیا, published by علم و عرفان پبلشرز.
- isPartOf: Urdu Lughat
-
C-003938: The Hong Kong Cantonese Child Language Corpus
The files contain episodes of conversational exchanges between children and adults, with each utterance represented in Chinese characters, romanizations as well as corresponding parts-of-speech tags. http://hum.shoppingshop.us/~cancorp/index.html
-
C-003941: Vietnamese Text Corpus
This monolingual corpus consists of Vietnamese texts published on the Internet, sampled here for research and educational purposes. It combines newspaper, literary, and Wikipedia texts.
- hasVersion: C-003942: Vietnamese Bitext Corpus
- hasVersion: C-003943: Vietnamese Dictionary
-
C-003942: Vietnamese Bitext Corpus
A bitext corpus shows words, phrases, and sentences in translation. Insofar as possible, translated texts are aligned sentence-by-sentence. Bitext corpora have many applications.
- isFormatOf: William Peter Hyde's A New Vietnamese-English Dictionary (2008, Dunwoody Press, 928 pages; ISBN 978-1-931546-43-0)
- hasVersion: C-003941: Vietnamese Text Corpus
- hasVersion: C-003943: Vietnamese Dictionary
-
C-003943: Vietnamese Dictionary
The dictionary includes more than 15,000 Vietnamese words, compounds, and phrases.
- hasVersion: C-003941: Vietnamese Text Corpus
- hasVersion: C-003942: Vietnamese Bitext Corpus
-
C-003945: Thai Text Corpus
Thai Text Corpus
- hasVersion: D-003944: Thai Dictionary
- hasVersion: C-003946: Thai Bitext Corpus
-
C-003946: Thai Bitext Corpus
A bitext corpus shows words, phrases, and sentences in translation. Insofar as possible, translated texts are aligned sentence-by-sentence.
- hasVersion: D-003944: Thai Dictionary
- hasVersion: C-003945: Thai Text Corpus