Language Resource Search - SHACHI: Language Resource Metadata Database

Registered language resources: 3,330. Showing items 961 - 970 of 2,023.

C-001605: CD-ROM Nikkei Full-text Database - Nikkei Business Daily · Nikkei Financial Daily, and Nikkei Marketing Journal 2005
This CD-ROM contains full-text articles from Nihon Keizai Shimbun, Nikkei Kin'yu Shimbun and Nikkei MJ (Ryutsu Shimbun). Users can search by keyword. A headline list is included.
- isVersionOf: Nikkei Full-text Database (1990-2006 each)
C-001606: CRL-DB-TEXT-97-1
This data set analyzes the links between the simple sentences obtained by dividing the text of RWC-DB-TEXT-95-2. The analyses have been manually corrected.
- isPartOf: RWC Text Database
C-001607: Mandarin Chinese Telephone Speech Recognition Corpus SMS (Fixed phone 86)
Desktop/Microphone
This corpus comprises 1,648 entries uttered by 86 speakers (64 males and 22 females) of different dialects, ages and educational levels, recorded over the fixed telephone network. The database comprises 4,282 Chinese short messages (SMS). Speech samples are stored as 16-bit, 8 kHz WAV files, for a total of 3.7 hours of speech. The total size of the data is 205 MB.
Each speaker read 50 items. Text files are stored in Unicode format. All data have been proofread manually.
The transcriptions include non-speech markers (background noise, background speech, speaker sounds) as well as markers for mispronunciations, channel distortions, omitted words and duplicated words.
The corpus is intended for testing telephone-based natural speech recognition systems.
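As a rough plausibility check of the figures above, the size of uncompressed PCM audio can be estimated from the sample rate, bit depth and duration. The sketch below assumes single-channel (mono) recordings and ignores WAV header overhead; neither detail is stated in the catalog entry, and the function name is only illustrative.

```python
def pcm_size_bytes(duration_hours: float,
                   sample_rate_hz: int = 8000,
                   bits_per_sample: int = 16,
                   channels: int = 1) -> int:
    """Estimate the size of uncompressed PCM audio in bytes.

    channels=1 (mono) is an assumption; the entry does not state it.
    WAV header bytes are not counted.
    """
    seconds = duration_hours * 3600
    bytes_per_sample = bits_per_sample // 8
    return round(seconds * sample_rate_hz * bytes_per_sample * channels)


size = pcm_size_bytes(3.7)      # 3.7 hours of 16-bit, 8 kHz audio
size_mb = size / 1e6            # decimal megabytes
size_mib = size / 2**20         # binary mebibytes
print(f"{size} bytes = {size_mb:.1f} MB = {size_mib:.1f} MiB")
```

Under these assumptions the estimate comes out slightly above 200 megabytes, the same order as the stated total; the small gap may simply reflect rounding of the 3.7-hour figure.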
C-001608: DVD-ROM 公開公報 (Kokai Koho, published patent application gazette)
This DVD-ROM contains patent reports issued monthly.
C-001616: EDR Japanese Corpus
The Japanese Corpus is composed of records encoded in EUC
(Extended Unix Code). The records of the Japanese Corpus are composed of the
record number, sentence information, constituent information, morphological
information, syntactic information, semantic information and management
information. The basic role of the Japanese Corpus is first to identify the
sentence constituents of sentences, and then to indicate how the constituents
combine to form the morphological, syntactic and semantic structure of the
sentence using a large number of actual examples as the source data.
- isPartOf: D-001615: EDR Japanese Co-occurrence Dictionary
- isPartOf: EDR Corpus
- hasVersion: English Corpus
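The record layout described for the EDR Japanese Corpus can be pictured as a simple container with one field per kind of information. The sketch below is hypothetical: the class name, field names and string types are illustrative and do not reflect the actual EDR file format, and it assumes that "EUC" refers to the EUC-JP encoding, for which Python's standard codec name is `euc_jp`.

```python
from dataclasses import dataclass, fields


@dataclass
class CorpusRecord:
    """Hypothetical container mirroring the seven parts named in the entry."""
    record_number: str
    sentence_info: str
    constituent_info: str
    morphological_info: str
    syntactic_info: str
    semantic_info: str
    management_info: str


def decode_euc(raw: bytes) -> str:
    """Decode bytes as EUC-JP (assumed to be the encoding meant by 'EUC')."""
    return raw.decode("euc_jp")


# Round trip of a short Japanese string through the assumed encoding.
sample = decode_euc("日本語".encode("euc_jp"))
field_names = [f.name for f in fields(CorpusRecord)]
print(sample, field_names)
```

How the real records delimit these parts is not described in the catalog entry, so parsing logic is deliberately left out.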
C-001622: Kyoto Text Corpus
This corpus provides morphological information and sentence information for 40,000 Mainichi Shimbun articles from 1995. It has been manually corrected. The Mainichi Shimbun '95 CD-ROM must also be purchased.
- requires: C-001600: CD-Mainichi Shimbun '95 Data Collection
C-001623: RWC-DB-TEXT-94-1
This is a morphologically analysed corpus of white papers of the Ministry of Economy from 1992 to 1994. Manually corrected. It is not distributed at this moment; it will be distributed again by GSK.
- isPartOf: RWC Text Database
C-001624: RWC-DB-TEXT-94-2
This is a morphologically analysed corpus of reports concerning natural data processing by the Japan Electronic Industry Development Association. Manually corrected. It is not distributed at this moment; it will be distributed again by GSK.
- isPartOf: RWC Text Database
C-001625: RWC-DB-TEXT-95-2
This is part of a morphologically analysed corpus of 3,000 Mainichi Shimbun articles from 1994. Manually corrected. It is not distributed at this moment; it will be distributed again by GSK.
- isPartOf: RWC Text Database
- requires: CD-Mainichi Shimbun 1994 Data Collection
C-001626: RWC-DB-TEXT-95-3
This data set assigns UDC codes to 30,000 Mainichi Shimbun articles from 1994. It is not distributed at this moment; it will be distributed again by GSK.
- isPartOf: RWC Text Database
- requires: CD-Mainichi Shimbun '94 Data Collection
