Language resource #: 3330
-
C-004145: Digital Morphology Archives
The Digital Morphology Archives contains a digitized portion of the data stored in the Morphology Archives at the Department of Finnish Language and Literature at the University of Helsinki. The purpose of the Morphology Archives is to facilitate research on the rich morphology of Finnish and to provide researchers with well-organised data on the dialects of different parishes.
-
C-004146: Finnish Broadcast Corpus
The material in the Finnish broadcast corpora has been divided into four categories:
* Radio monologues
* Radio dialogues
* TV monologues
* TV dialogues
Currently, the Finnish Broadcast Corpus contains two parts: 1 (FBC-1) and 2 (FBC-2). These contain recordings from the Finnish Broadcasting Company.
-
C-004147: Finnish-Swedish Textcollection
The documents are provided as they were received from the publisher in electronic form. Their correspondence to the printed version has not been verified manually.
-
C-004148: Finnish Text Collection
The Finnish Language Text Collection (Suomen kielen tekstikokoelma) is a selection of electronic research material that contains written Finnish from the 1990s.
-
C-004149: Helsinki Corpus of Swahili
The Helsinki Corpus of Swahili (HCS) is an annotated corpus of Standard Swahili text. It contains news texts from several current Swahili newspapers as well as from the news site of Deutsche Welle. It also contains extracts from a number of books containing prose text, including fiction, education and sciences.
-
C-004150: Oulu corpus
The Oulu Corpus is research material on standard Finnish of the 1960s. Its collection was led by Prof. Pauli Saukkonen. The material was later converted, in 1997, into SGML format by the Research Institute for the Languages of Finland.
The corpus project aimed to create a corpus containing a representative sample of the Finnish language used in 1960s media. The corpus does not include language as used in television.
-
C-004151: SFNET discussion group corpus
The Sfnet corpus was collected from a Finnish newsgroup area, sfnet. Sfnet is an administered Finnish Usenet hierarchy. The sfnet newsgroups generally exist for discussion in the Finnish language.
-
C-004152: Berlin Database of Emotional Speech
It contains about 500 utterances spoken by actors in a happy, angry, anxious, fearful, bored and disgusted way as well as in a neutral version. You can choose utterances from 10 different actors and 10 different texts.
The recordings took place in the anechoic chamber of the Technical University Berlin, Department of Technical Acoustics.
-
C-004153: Estonian Emotional Speech Corpus
The Estonian Emotional Speech Corpus (EEKK) is being created in the framework of the National Programme for Estonian Language Technology at the Institute of the Estonian Language. The corpus contains sentences expressing anger, joy and sadness, as well as neutral sentences.
-
C-004154: Weblog Data Collection
Intelliseek will be providing a large corpus of spidered and annotated blog posts to attendees at the 3rd Annual Workshop on the Weblogging Ecosystem (held in conjunction with the WWW 2006 Conference in Edinburgh, Scotland):
The data release comprises a complete set of weblog posts for three weeks in July 2005 (on the order of 10M posts from 1M weblogs). This data set has been selected as it spans a period of time during which an event of global significance occurred, namely the London bombings.
The data set includes the full content of the posts plus mark-up. The marked-up fields include: date of posting, time of posting, author name, title of the post, weblog url, permalink, tags/categories, and outlinks classified by type.
Sounds like a great resource for researchers. I'm also amused (in a dark sort of way) by the individual datashare agreement they require people to sign. Essentially, they admit that there's no way they can get copyright clearance from all the million or so bloggers they've collected, so they just ask everyone to agree to remove any posts if anyone complains, not to use the results for commercial purposes, and not to use the data past the workshop.