Language Resource Search - SHACHI: Language Resource Metadata Database

Registered language resources: 3,330. Showing results 1,241-1,250 of 2,023.

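C-003622: Lexical Properties of Japanese, Phase 4 (日本語の語彙特性第4期)
The NTT Database Series "Lexical Properties of Japanese" is a database that serves as a foundation for psycholinguistic research on Japanese. It comprises 9 volumes in 4 phases (Phase 1: Volumes 1-6; Phase 2: Volume 7; Phase 3: Volume 8; Phase 4: Volume 9). With a sufficient number of entries and a range of linguistic information on words and characters, it is the first such resource in Japan reliable enough for practical use in academic research. Phase 4 of the series contains written-word familiarity ratings for approximately 30,000 words: the difference between the approximately 70,000 words covered in Phase 1 and the approximately 100,000 words in the Gakken Kokugo Daijiten, 2nd edition. Simple search software and plain-text data are also included.
- references: Gakken Kokugo Daijiten, 2nd edition (学研国語大辞典第二版)
- isPartOf: D-003476: Lexical Properties of Japanese (日本語の語彙特性)
- hasVersion: D-003474: CD-ROM Lexical Properties of Japanese, Phase 1
- hasVersion: D-003470: CD-ROM Lexical Properties of Japanese, Phase 2
- hasVersion: D-003472: Lexical Properties of Japanese, Phase 3 (with CD-ROM)
- hasVersion: D-003458: Lexical Properties of Japanese, Volume 1
- hasVersion: D-003460: Lexical Properties of Japanese, Volume 2
- hasVersion: D-003462: Lexical Properties of Japanese, Volume 3
- hasVersion: D-003464: Lexical Properties of Japanese, Volume 4
- hasVersion: D-003466: Lexical Properties of Japanese, Volume 5
- hasVersion: D-003468: Lexical Properties of Japanese, Volume 6
- hasVersion: C-003623: Basic Word Database: Sense-Specific Word Familiarity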
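C-003623: Basic Word Database: Sense-Specific Word Familiarity (基本語データベース語義別単語親密度)
Contains word familiarity ratings that take differences in meaning into account (sense-specific word familiarity) for approximately 28,000 Japanese words and approximately 45,000 word senses. For words with multiple meanings, a familiarity rating is recorded for each sense. The data can be used in many fields as basic scientific data on word meaning.
- hasVersion: D-003474: CD-ROM Lexical Properties of Japanese, Phase 1
- hasVersion: D-003470: CD-ROM Lexical Properties of Japanese, Phase 2
- hasVersion: D-003472: Lexical Properties of Japanese, Phase 3 (with CD-ROM)
- hasVersion: C-003622: Lexical Properties of Japanese, Phase 4
- hasVersion: D-003458: Lexical Properties of Japanese, Volume 1
- hasVersion: D-003460: Lexical Properties of Japanese, Volume 2
- hasVersion: D-003462: Lexical Properties of Japanese, Volume 3
- hasVersion: D-003464: Lexical Properties of Japanese, Volume 4
- hasVersion: D-003466: Lexical Properties of Japanese, Volume 5
- hasVersion: D-003468: Lexical Properties of Japanese, Volume 6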
C-003624: Turin University Treebank 1.1
TUT is a morphologically, syntactically and semantically annotated corpus of Italian sentences. It consists of two Italian subcorpora (a Civil law corpus of 1,100 sentences and a Newspaper corpus of 1,100 sentences) and an English corpus (200 sentences) intended to help non-Italian speakers understand the annotation scheme. The Italian corpora are annotated in two different formats: TUT format and Penn Treebank format. TUT format is dependency-oriented and aims at capturing the richness of the predicate-argument structure. The English corpus is annotated in TUT format only.
- conformsTo: C-001546: Treebank-2
- conformsTo: ILEX
C-003625: Tübingen Partially Parsed Corpus of Written German
TüPP-D/Z is a collection of newspaper articles written in German, automatically annotated with clause structure, topological fields, chunks and some low-level annotation, including POS, morphological ambiguity classes and information about some regular types of named entities, such as numerical expressions (dates, numbers and units). The raw text of the corpus consists of more than 200 million words.
C-003626: Tübingen Treebank of Spoken German
The TüBa-D/S treebank was built under the Verbmobil project, a long-term machine translation project for spontaneous speech funded by the Ministry for Education, Science, Research, and Technology (BMBF) in Germany. It contains syntactically annotated transcriptions of spontaneous dialogues in German, consisting of approximately 38,000 sentences (360,000 words). The annotation scheme distinguishes four levels of syntactic constituency: the lexical level, the phrasal level, the level of topological fields, and the clausal level. The treebank is available in three formats: NEGRA export format, XML format and Penn Treebank format.
- hasVersion: C-003627: Tübingen Treebank of Written German Release 4
- hasVersion: C-003628: Tübingen Treebank of Spoken English
- hasVersion: C-003629: Tübingen Treebank of Spoken Japanese
- references: C-000188: VERBMOBIL - VM CD 1.1 (new edition)
- references: C-000189: VERBMOBIL - VM CD 12.1 (new edition)
- references: C-000190: VERBMOBIL - VM CD 14.1 (new edition)
- references: C-000191: VERBMOBIL - VM CD 2.1 (new edition)
- references: C-000192: VERBMOBIL - VM CD 3.1 (new edition)
- references: C-000193: VERBMOBIL - VM CD 4.1 (new edition)
- references: C-000194: VERBMOBIL - VM CD 5.1 (new edition)
- references: C-000196: VERBMOBIL - VM CD 7.1 (new edition)
- references: C-000197: VERBMOBIL - VM CD S 1.0 (original edition)
- references: C-000198: VERBMOBIL II - VM CD 22.1 - VM22.1 (BAS edition)
- references: C-000199: VERBMOBIL II - VM CD 24.1 - VM24.1 (BAS edition)
- references: C-000203: VERBMOBIL II - VM CD 29.1 - VM29.1 (BAS edition)
- references: C-000207: VERBMOBIL II - VM CD 38.1 - VM38.1 (BAS edition)
- references: C-000208: VERBMOBIL II - VM CD 39.1 - VM39.1 (BAS edition)
- references: C-000209: VERBMOBIL II - VM CD 48.1 - VM48.1 (BAS edition)
- references: C-000211: VERBMOBIL II - VM CD20.1 - VM20.1 (new edition)
- references: C-000212: VERBMOBIL II - VM CD21.1 - VM21.1 (new edition)
- references: C-000370: VERBMOBIL II - VM CD 63.0 - VM63.0 (original edition)
- references: C-000373: VERBMOBIL II - VM CD 65.0 - VM65.0 (original edition)
- references: C-000374: VERBMOBIL II - VM CD 53.1 - VM53.1 (BAS edition)
- references: C-000375: VERBMOBIL II - VM CD 60.1 - VM60.1 (BAS edition)
- references: C-000376: VERBMOBIL II - VM CD 61.1 - VM61.1 (BAS edition)
- references: C-000377: VERBMOBIL II - VM CD 64.0 - VM64.0 (original edition)
C-003627: Tübingen Treebank of Written German Release 4
The TüBa-D/Z treebank is a syntactically annotated German newspaper corpus consisting of approximately 36,000 sentences (640,000 words). The annotation represents information on inflectional morphology, syntactic constituency, grammatical functions, (complex) named entities, and anaphora and coreference relations. The corpus is still under development (as of November 2008), and releases of more data will follow.
- replaces: Tübingen Treebank of Written German Release 3
C-003628: Tübingen Treebank of Spoken English
The TüBa-E/S treebank was built under the Verbmobil project, a long-term machine translation project for spontaneous speech funded by the Ministry for Education, Science, Research, and Technology (BMBF) in Germany. It contains syntactically annotated transcriptions of spontaneous dialogues in English, consisting of approximately 30,000 sentences (310,000 words). The manual syntactic annotation is HPSG-oriented and based on three levels of syntactic constituency: the lexical level, the phrasal level and the clausal level. The treebank is available in three formats: NEGRA export format, XML format and Penn Treebank format.
- hasVersion: C-003626: Tübingen Treebank of Spoken German
- hasVersion: C-003629: Tübingen Treebank of Spoken Japanese
- references: C-000195: VERBMOBIL - VM CD 6.1 (new edition)
- references: C-001565: VERBMOBIL - VM CD 8.1 (new edition)
- references: C-001564: VERBMOBIL - VM CD 13.1 (new edition)
- references: C-001567: VERBMOBIL II - VM CD 23.1 - VM23.1 (BAS edition)
- references: C-001568: VERBMOBIL II - VM CD 28.1 - VM28.1 (BAS edition)
- references: C-001569: VERBMOBIL II - VM CD 30.1 - VM30.1 (BAS edition)
- references: C-001570: VERBMOBIL II - VM CD 31.1 - VM31.1 (BAS edition)
- references: C-001571: VERBMOBIL II - VM CD 32.1 - VM32.1 (BAS edition)
- references: C-001572: VERBMOBIL II - VM CD 42.1 - VM42.1 (BAS edition)
- references: C-001573: VERBMOBIL II - VM CD 43.1 - VM43.1 (BAS edition)
- references: C-000210: VERBMOBIL II - VM CD 50.1 - VM50.1 (BAS edition)
C-003629: Tübingen Treebank of Spoken Japanese
The TüBa-J/S treebank was built under the Verbmobil project, a long-term machine translation project for spontaneous speech funded by the Ministry for Education, Science, Research, and Technology (BMBF) in Germany. It contains syntactically annotated transcriptions of spontaneous dialogues in Japanese, consisting of approximately 18,000 sentences (160,000 words). The speech data was romanized and manually annotated. The syntactic annotation is HPSG-oriented and based on three levels of syntactic constituency: the lexical level, the phrasal level and the clausal level. The treebank is available in two formats: NEGRA export format and CoNLL-X Shared Task dependency format.
- hasVersion: C-003626: Tübingen Treebank of Spoken German
- hasVersion: C-003628: Tübingen Treebank of Spoken English
- references: C-000368: VERBMOBIL II - VM CD 16.1 - VM16.1 (new edition)
- references: C-000371: VERBMOBIL II - VM CD 17.1 - VM17.1 (new edition)
- references: C-000367: VERBMOBIL II - VM CD 18.1 - VM18.1 (new edition)
- references: C-000372: VERBMOBIL II - VM CD 19.1 - VM19.1 (new edition)
- references: C-000200: VERBMOBIL II - VM CD 25.1 - VM25.1 (BAS edition)
- references: C-000201: VERBMOBIL II - VM CD 26.1 - VM26.1 (BAS edition)
- references: C-000202: VERBMOBIL II - VM CD 27.1 - VM27.1 (BAS edition)
- references: C-000204: VERBMOBIL II - VM CD 33.1 - VM33.1 (BAS edition)
- references: C-000205: VERBMOBIL II - VM CD 34.1 - VM34.1 (BAS edition)
- references: C-000206: VERBMOBIL II - VM CD 35.1 - VM35.1 (BAS edition)
- references: N-001197: VERBMOBIL II - VM CD 44.1 - VM44.1 (BAS edition)
- references: C-000365: VERBMOBIL II - VM CD 45.1 - VM45.1 (BAS edition)
- references: C-001574: VERBMOBIL II - VM CD 46.1 - VM46.1 (BAS edition)
- references: C-000369: VERBMOBIL II - VM CD 62.1 - VM62.1 (BAS edition)
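As a rough illustration of the CoNLL-X dependency format mentioned for TüBa-J/S, the sketch below reads tab-separated, ten-column CoNLL-X text into sentences and extracts each token's head index. The function name `read_conllx`, the tags, and the sample tokens are invented for illustration and are not taken from the treebank.

```python
# Column names defined by the CoNLL-X shared task (10 tab-separated fields).
CONLLX_FIELDS = ["id", "form", "lemma", "cpostag", "postag",
                 "feats", "head", "deprel", "phead", "pdeprel"]


def read_conllx(text):
    """Parse CoNLL-X text into a list of sentences (lists of token dicts)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        values = line.split("\t")
        if len(values) != len(CONLLX_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(values)}")
        token = dict(zip(CONLLX_FIELDS, values))
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])  # 0 marks the root of the sentence
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


# Hypothetical romanized sample; NOT actual TüBa-J/S data or tags.
SAMPLE = (
    "1\tashita\tashita\tN\tN\t_\t2\tADV\t_\t_\n"
    "2\tikimasu\tiku\tV\tV\t_\t0\tROOT\t_\t_\n"
)

sentences = read_conllx(SAMPLE)
edges = [(tok["form"], tok["head"]) for tok in sentences[0]]
```

Here `edges` pairs each word form with the index of its syntactic head, which is the core information a dependency format of this kind carries.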
C-003631: Princeton WordNet Gloss Corpus
The corpus is a set of annotated, disambiguated glosses. It contains word forms from the definitions in WordNet's synsets, manually linked to the context-appropriate sense in WordNet. The corpus is provided in two different formats: the merged format (all annotations combined in a single file) and the standoff format (annotations stored in documents separate from the gloss text).
- references: D-000825: WordNet
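The difference between the merged and standoff formats can be sketched as follows. This assumes a simplified standoff representation of (start, end, label) character offsets and an invented bracket notation for the merged form; the actual file layout of the Gloss Corpus is not described in this entry, and the sense labels are placeholders.

```python
def merge_standoff(text, annotations):
    """Inline standoff annotations into the text as [span|label] markers.

    `annotations` is a list of non-overlapping (start, end, label)
    character offsets. The bracket notation is illustrative only.
    """
    pieces, cursor = [], 0
    for start, end, label in sorted(annotations):
        if start < cursor:
            raise ValueError("overlapping annotations are not supported")
        pieces.append(text[cursor:start])              # unannotated text
        pieces.append(f"[{text[start:end]}|{label}]")  # annotated span
        cursor = end
    pieces.append(text[cursor:])                       # trailing text
    return "".join(pieces)


# Hypothetical gloss and placeholder sense labels (not real WordNet keys).
gloss = "a large body of water"
standoff = [(8, 12, "SENSE_BODY"), (16, 21, "SENSE_WATER")]
merged = merge_standoff(gloss, standoff)
```

In the standoff form the gloss text stays untouched and the annotations live in a separate list; the merged form combines both into one string.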
C-003632: SemCor 1.6
The SemCor corpus 1.6 is a subcorpus of WordNet 1.6 and consists of 352 texts. All the words in SemCor are tagged for POS, and more than 200,000 content words are lemmatized and sense-tagged according to WordNet 1.6. The semantic tagging of SemCor 1.6 was done manually, whereas all the other versions, such as SemCor 1.7, were created automatically.
- references: D-000825: WordNet
- references: C-000751: Brown Corpus
- isPartOf: C-003633: MultiSemCor Corpus 1.1
- isReplacedBy: C-003634: SemCor 1.7
- isPartOf: D-000825: WordNet
