Language Resource Search - SHACHI: Language Resource Metadata Database

Registered language resources: 3,330. Showing items 871 - 880 of 2,023.

C-001493: Original Short-Message Data Collation I in Chinese (participles)
Written Corpora
This corpus comprises 5,891,275 characters, corresponding to 51,568 short messages (SMS) from radio/TV stations and 213,694 daily life short messages. This subset contains original messages together with participles.
All data have been proofread manually.
C-001494: Original Short-Message Data Collation I in Chinese
Written Corpora
This corpus comprises 5,891,275 characters, corresponding to 51,568 short messages (SMS) from radio/TV stations and 213,694 daily life short messages. This subset contains the original messages.
All data have been proofread manually.
C-001495: Original Short-Message Data Collation II in Chinese (PinYin)
Written Corpora
This corpus comprises 2,604,901 characters, corresponding to 202,277 daily life short messages (SMS). This subset contains original messages together with PinYin transcription.
All data, including the PinYin transcription, have been proofread manually.
C-001496: Original Short-Message Data Collation II in Chinese (named entities)
Written Corpora
This corpus comprises 2,604,901 characters, corresponding to 202,277 daily life short messages (SMS). This subset contains original messages together with named entities.
All data have been proofread manually.
C-001497: Original Short-Message Data Collation II in Chinese (participles)
Written Corpora
This corpus comprises 2,604,901 characters, corresponding to 202,277 daily life short messages (SMS). This subset contains original messages together with participles.
All data have been proofread manually.
C-001498: Original Short-Message Data Collation II in Chinese
Written Corpora
This corpus comprises 2,604,901 characters, corresponding to 202,277 daily life short messages (SMS). This subset contains the original messages.
All data have been proofread manually.
C-001500: PAROLE Irish Distributable Corpus
Written Corpora
The PAROLE Irish Distributable Corpus consists of over 8 million words (a subset of the Irish Reference corpus of more than 15 million words).

The text is marked up in accordance with the PAROLE encoding standard, which incorporates the Corpus Encoding Standard (CES) and Text Encoding Initiative (TEI) Guidelines. All the files are in SGML format, with a detailed header and the body of the text tagged to paragraph level. The header includes information such as title, author(s), number of words, ownership, and publication details, as well as a standard coding for Medium, Topic and Genre categories.

A subset of the Distributable Corpus is morpho-syntactically tagged.

This distribution includes approximately 3,000 manually checked words.
C-001504: PHONDAT 2 - PD2 (2nd edition)
Desktop/Microphone
The corpus contains read speech from 16 different speakers. Each speaker read a corpus of 200 different sentences from a train inquiry task. The speakers were recorded at three different sites in Germany (University of Kiel, University of Bonn, University of Munich). The language is German. The corpus contains a total of 3,200 recorded utterances. It is provided with a manual phonological segmentation, an automatic alignment, a word segmentation, and a prosodic segmentation (1 CD-ROM).
C-001506: Phonetically Balanced Sentences
Desktop/Microphone
Large acoustic corpus in Korean produced by Kaist Korterm. 20 native Korean speakers (males and females) each read 539 sentences once, along with a set of 50 common sentences. Information about the speakers, such as size and level of education, is provided. The recordings took place in a soundproof room. The data are stored in 8-bit A-law speech files with a 16 kHz sampling rate. The standard in use is NIST.
C-001507: Phonetically Balanced Words (2)
Desktop/Microphone
Large acoustic corpus of read text in Korean produced by Kaist Korterm. Native Korean speakers (males and females) uttered 36 geographical proper nouns. Information about the speakers, such as size and level of education, is provided. The recordings took place in a soundproof room. The data are stored in 8-bit A-law speech files with a 16 kHz sampling rate. The standard in use is NIST.
