Language resource #: 3330
-
C-004075: ORAL2006
ORAL2006 is the third spoken corpus available within the Czech National Corpus project. It captures spoken Czech from the entire area of Czech dialects in the narrow sense of the word. It is a transcription of 221 recordings made in 2002-2006. All recordings were made in informal situations, which means the speakers knew each other and had friendly relationships. The total length of the recordings is 6 693 minutes, or about 111 and a half hours, and they contain a total of 1 000 798 words from 754 speakers.
- isPartOf: C-004065: Czech National Corpus
- hasVersion: C-004074: ORAL2008
- : C-004076: PMK
-
C-004076: PMK
The Prague Spoken Corpus (PMK) is the first corpus of spoken Czech. It captures authentic spoken Czech, mainly colloquial and thematically unspecialised, or unlimited, from the Prague area and its surroundings. Because of the central and unique status of Prague, people from all regions of the Czech Republic mix here, so the language picture has, to a large extent, a countrywide character. Prague also has the most important media influence over the entire country. The recordings (a total of 304), which are fully anonymous and were gradually transcribed into an electronic format, come from 1988-1996 and thus reflect the language of the end of the previous social era and the beginning of a new one.
- isPartOf: C-004065: Czech National Corpus
- hasVersion: C-004074: ORAL2008
- hasVersion: C-004075: ORAL2006
-
C-004077: BMK
The Brno Spoken Corpus (BMK) is the first corpus of spoken Czech from Moravian regions as part of the Czech National Corpus. It records authentic spoken language in the city of Brno and is not thematically specialised. The BMK is an electronic transcript of 250 anonymous recordings from 1994-1999, capturing 294 speakers.
- isPartOf: C-004065: Czech National Corpus
- conformsTo: C-004076: PMK
-
C-004078: DIAKORP
The diachronic section of the Czech National Corpus (hereafter DCNC) covers texts from a total of seven centuries of the development of the Czech language. The first completed part of the DCNC (approximately 700 000 word forms) was made accessible to the public in September 2005. Publication of the DCNC continues at a pace of about 250 000 word forms per year.
- isPartOf: C-004065: Czech National Corpus
-
C-004079: InterCorp
The InterCorp corpus is the main outcome of the InterCorp project. Its aim is to build a large parallel synchronic corpus covering a number of languages. It is compiled mostly by teachers and students of the Faculty of Arts, Charles University in Prague, and by other collaborators of the ICNC. There are several aspects that make InterCorp special among the corpora published by ICNC. In particular, it is accessible only through a special interface, built on top of the corpus manager Manatee. Also, unlike the other ICNC corpora, which are static (unchanged in time), InterCorp is incremental, with its size and the number of languages growing.
- isPartOf: C-004065: Czech National Corpus
-
C-004080: Audio Archive of Linguistic Fieldwork
This is a collection of sound recordings of endangered or rare languages. The recordings date from the early 1950s to the present and contain linguistic data (wordlists and other elicitations), stories, songs/chants, ethnographic data and other material in about 90 languages. The largest group represented is Native American languages; these recordings are the result of fieldwork sponsored by the Survey of California and Other Indian Languages (UCB, Department of Linguistics). The Berkeley Language Center is currently working on improving preservation of and access to this collection by transferring analog recordings to digital format and by making audio files accessible via the internet with enhanced catalog records.
-
C-004081: Newsgroups UseNet Corpora
NUNC is a multilingual (It. De. Fr. En. Es. Ma. Su. Ee. Pt.) suite of corpora based on the language of newsgroups, freely available and queryable online. Devised by Manuel Barbera, NUNC was started in 2002 and is currently under development by A. Allora, M. Barbera, S. Colombo, E. Corino, C. Marello, S. Casavecchia, C. Onesti, M. Tomatis, L. Valle and others. Some beta versions are already available for testing (Italian, UK English, French and Spanish).
-
C-004082: Hellenic National Corpus
The ILSP Corpus has been developed by the Institute of Language and Speech Processing. It currently contains about 47,000,000 words and is constantly being updated. All texts have been selected so as to present a realistic picture of modern language use.
- isPartOf: ILSP Corpus
-
C-004083: Dialogue Diversity Corpus Version 2.0
The Dialogue Diversity Corpus (DDC) is about finding data to use in research on human interaction, especially dialogue. This edition, Version 2.0, retains access to all of the still-accessible sources that were available through the original release.
-
C-004084: Louvain International Database of Spoken English Interlanguage
In 1995, a complementary project was launched in Louvain to compile a corpus of spoken learner language, the Louvain International Database of Spoken English Interlanguage (LINDSEI). The first component of LINDSEI contains transcripts of 50 interviews (30 female subjects, 20 male subjects) with French mother tongue learners of English (c. 100,000 words of learner language) and research has already begun into the phraseology of this type of interlanguage (see list of publications on learner corpora: De Cock 1996, De Cock et al. 1998, De Cock 1998, De Cock 2000). A number of other components are currently being compiled for different mother tongue backgrounds. Alongside these non-native varieties of English, a comparable corpus of interviews with native speakers of English has been compiled, so that interlanguage and native language can be compared and the universal and L1-specific features of oral interlanguage identified. The corpus needs to be extended still further and we are hoping to attract yet more researchers working with students from different mother tongue backgrounds to join the LINDSEI project.