Language Resource Search - SHACHI: Language Resource Metadata Database

Registered language resources: 3,330. Showing results 1261-1270 of 2,023.

C-003643: D-Coi-corpus
The corpus consists of newspaper text, texts intended for the general public obtained from government websites, journals, brochures, legal texts, manuals, etc. The texts have been annotated with dependency relations according to the guidelines of the D-Coi project, a preparatory project that aimed to produce a blueprint and the tools needed for the construction of a 500-million-word reference corpus of contemporary written Dutch.
- isReferencedBy: C-003642: COREA-coreferentiecorpus
- references: C-003645: Twente Nieuws Corpus
C-003644: DuELME
DuELME is one of the results of the IRME (Identification and Representation of Multiword Expressions) project. It contains lexical descriptions of 5,000 multiword expressions (MWEs) in a format designed to be highly theory- and implementation-independent. It is intended mainly for use in various Dutch NLP systems.
- references: C-003645: Twente Nieuws Corpus
C-003645: Twente Nieuws Corpus
The TwNC is a multifaceted Dutch news corpus, comprising about 530 million words of text data and some audio data useful for language model training. The data includes text data from newspaper and magazine articles, and text and audio data from subtitling and autocues/transcripts of broadcast news shows.
- isReferencedBy: C-003644: DuELME
- isReferencedBy: C-003643: D-Coi-corpus
C-003646: Malay Concordance Project
The MCP contains over 4.8 million words (including over 100,000 verses) from more than 130 sources of pre-modern Malay written text. These texts can be searched on-line to provide useful information about contexts in which words are used, where particular terms or names occur in texts, and patterns of morphology and syntax.
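C-003648: NTCIR Datasets / Test Collections
C-003650: Utsunomiya University Spoken Dialogue Database for Paralinguistic Information Studies
The Utsunomiya University Spoken Dialogue Database for Paralinguistic Information Studies (UUDB) is a speech corpus developed primarily for research on the diverse phonetic and linguistic phenomena found in natural (spontaneous), expressive spoken dialogue.
C-003652: ETL Word Speech Database
Read speech of a phonetically balanced list of 1,542 words.
C-003655: NTT Infant Speech Database
A longitudinal database of the spontaneous speech of five infants from three families and their parents. It was recorded intermittently from shortly after each infant's birth for up to five years.
C-003658: CASTEL/J CD-ROM Version 1.2
Contains various databases for Japanese language education: a kanji dictionary, a kanji stroke-order dictionary, a word dictionary, a technical-terms dictionary (the Ministry of Education's "Glossary of Academic Terms"), an example-sentence dictionary, a Japanese-English dictionary, and a sound and illustration dictionary. It also contains text data such as Kodansha Gendai Shinsho titles, screenplays of Shochiku's "Otoko wa Tsurai yo" films, and works by Sakyo Komatsu.
- hasVersion: D-003659: CASTEL/J CD-ROM Millennium Version (v.1.3)
C-003662: Educational Research Information Database: High School Entrance Examination Questions
This database contains information on the public high school entrance examinations set by each prefectural board of education from fiscal 1991 (Heisei 3) through fiscal 2007 (Heisei 19). Data from fiscal 2001 (Heisei 13) onward is still being organized and will be added to the database as it becomes ready. Up to fiscal 2000 (Heisei 12), the database holds 76,269 records, counted by question. About 8,000 questions are added each year. All records are searchable.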