  • C-003229: Eindhoven Corpus
    The Eindhoven Corpus is the first collection of written and transcribed spoken Dutch texts; it contains 720,000 tokens from the period 1960 to 1973. It was initially intended for compiling a frequency list for Dutch, and it can now be used for all kinds of language technology research. The corpus has been annotated manually and almost faultlessly, which makes it suitable as training and test data when developing part-of-speech taggers.
  • C-003230: IFA Spoken Language Corpus v1.0
    The IFA Corpus is an open source database of hand-segmented Dutch speech. It contains speech from 8 Dutch speakers. For each speaker, a fixed text was recorded in several "styles", along with a retold version of the fixed text. In addition, each speaker told an informal story face-to-face to an interviewer; this story formed the basis of a speaker-specific variable text corpus, which each speaker individually read and retold. The corpus is unique in that it has phonemic segmentation and the same speakers were recorded in many styles, both of which many currently available speech corpora lack.
    • isReferencedBy: [Reference] The IFA Corpus: a Phonemically Segmented Dutch "Open Source" Speech Database (http://www.fon.hum.uva.nl/Service/IFAcorpus/SLcorpus/AdditionalDocuments/IFAcorpusEurospeech2001.html)
    • isReferencedBy: [Reference] Structure and access of the open source IFA-corpus (http://www.fon.hum.uva.nl/Service/IFAcorpus/SLcorpus/AdditionalDocuments/IRCS2001paper.html)
  • C-003235: Spoken Dutch Corpus 2.0
    The Spoken Dutch Corpus is a collection of approximately 900 hours of Standard Dutch from Flemish and Dutch speakers. The total number of words included is nearly 9 million. All recordings have been aligned with an orthographic transcription and each word has been given a POS tag and a lemma. A selection of one million words has been annotated syntactically, and for a more modest part of the corpus, approximately 250,000 words, a prosodic annotation is available. In this release, the CGN lexicon has also been included.
  • C-003236: CGN Annotation dvd
    This DVD contains the written portion of the Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN), a collection of approximately 900 hours of standard Dutch from Flemish and Dutch speakers. The total number of words included is nearly 9 million. The DVD includes the fully annotated version of the transcribed corpus. The package also includes COREX, the corpus software used by the CGN. A selection of one million words has been annotated syntactically, and for a more modest part of the corpus, approximately 250,000 words, a prosodic annotation is available.
  • C-003244: Priority Area Project on "Spoken Language" - Grant-in-Aid for Developmental Scientific Research on "Speech Database" Continuous Speech Corpus
    PASL-DSR is a speech database composed of 216 ATR phonetically balanced words, 110 monosyllables from the JEIDA list, 37 vowels and numerals from the JEIDA auxiliary list, and 122 sentences. The speech data were originally recorded on video cassettes through a PCM processor for the "Advanced Man-Machine Interface through Spoken Language" project, supported by the Ministry of Education, Science and Culture of Japan. That project was succeeded by a Grant-in-Aid for Developmental Scientific Research on "Speech Database" project, under which the original video-recorded data were converted to DAT cassettes and later to CD-ROM.
    • references: ATR 503 Phonetically Balanced Sentences
  • C-003245: University of Tsukuba Multilingual Speech Corpus
    The UT-ML corpus was created in support of the "Special Research Project for the Typological Investigation of Languages and Cultures of the East and West". It contains speech data in five European and six Asian languages (English, French, German, Russian, Spanish, Arabic, Chinese, Indonesian, Japanese, Korean, and Thai). The numbers of Asian and European languages were kept almost even. The corpus includes utterances with the same semantic content for each language.
  • C-003246: Tohoku University - Matsushita Isolated Word Database
    The TMW corpus was created in support of the "Grant-in-Aid for Publication Scientific Research Results" project, sponsored by the Ministry of Education, Science and Culture of Japan from 1988 to 1990. The corpus contains speech data for about 3500 phonetically balanced words and about 3300 station names, as well as text data with information such as the word lists and the names, ages, and birthplaces of the speakers. The corpus also has phoneme labels for some of the speech data.
  • C-003247: GSR(A) "Regional Difference in Spoken Japanese Dialects" Spoken Japanese Dialect Corpus
    The spoken Japanese dialect corpus was created in support of the "Grant-in-Aid for Scientific Research (A)" project, sponsored by the Ministry of Education, Science and Culture of Japan from 1999 to 2001. The corpus contains speech data from text reading (single words, numerals, and very short sentences or greeting phrases) and from natural conversations, as well as transcripts of the conversation data. The conversations cover various topics, and the recording time varies depending on the speaker and the topic.
  • C-003248: RWCP-SP96 Spoken Dialogue Database (1996 edition)
    The corpus contains speech data of object-oriented face-to-face dialogues between two people about purchasing a car and making an overseas travel plan. It also contains the corresponding transcripts with time codes.
  • C-003249: RWCP-SP97 Spoken Dialogue Database (1997 edition)
    The corpus contains speech data of object-oriented face-to-face dialogues between two people about making an overseas travel plan. It also contains the corresponding transcripts with time codes.