Language Resource Search - SHACHI: Language Resource Metadata Database

Registered language resources: 3330. Showing items 1491 - 1500 of 2023.

C-004107: speech accent archive
The speech accent archive uniformly presents a large set of speech samples from a variety of language backgrounds. Native and non-native speakers of English all read the same English paragraph and are carefully recorded. The archive is constructed as a teaching tool and as a research tool. It is meant to be used by linguists as well as other people who simply wish to listen to and compare the accents of different English speakers.
C-004109: Lancaster Speech, Writing and Thought Presentation Written Corpus
The Lancaster Speech, Writing and Thought Presentation Written Corpus was built to investigate the nature of speech, writing and thought presentation (SW&TP) in written narrative texts, and to test the model of speech and thought presentation (S&TP) proposed in Leech and Short (1981). It is a corpus of approximately 260,000 words of modern British narrative texts representing three text types (fiction, newspapers, biography), with detailed annotation for all forms of speech, thought and writing presentation that occur in the corpus.
- hasVersion: C-004110: Lancaster Speech, Writing and Thought Presentation Spoken Corpus
C-004110: Lancaster Speech, Writing and Thought Presentation Spoken Corpus
The Lancaster Speech, Writing and Thought Presentation Spoken Corpus has been built as part of an AHRB-funded project to investigate the nature of speech, writing and thought presentation (SW&TP) in contemporary spoken British English.
- hasVersion: C-004109: Lancaster Speech, Writing and Thought Presentation Written Corpus
C-004111: Russian National Corpus
The corpus of Russian is a reference system based on a collection of Russian texts in electronic form. The corpus is intended for all who are interested in the Russian language and various associated fields: professional linguists, language teachers, school and university students, and foreigners learning the language.
C-004112: Deeply Annotated Corpus
This subcorpus of the Russian National Corpus (RNC) contains texts augmented with morphosyntactic annotation. Besides the morphological information ascribed to each word in the text, every sentence has its syntactic structure marked up. Unlike the morphologically annotated portion of the RNC, the Deeply Annotated Corpus (DAC) contains only fully disambiguated annotations (i.e. both morphological and syntactic ambiguity are resolved).
- isPartOf: C-004111: Russian National Corpus
C-004113: Parallel text corpus
The parallel text corpus is a special type of corpus where a text in Russian is complemented by its translation into a different language, and vice versa. The units of the original and the translated texts (usually, a unit is a sentence) are matched through a procedure known as “leveling” (alignment). A leveled parallel corpus is an important tool for various types of research, including studies on the theory of translation; it can also be used as a language teaching tool.
- isPartOf: C-004111: Russian National Corpus
C-004114: Dialectal corpus
The dialectal corpus contains recordings of dialectal speech (presented in loosely standardized orthography) from different regions of Russia. There is no intention to represent phonetic variation, but the morphological, syntactic and lexical peculiarities of these texts are preserved. The subcorpus employs special tags for specifically dialectal morphological features (including those absent in the standard language); moreover, purely dialectal lexemes are supplied with commentary.
- isPartOf: C-004111: Russian National Corpus
C-004115: Corpus of Spoken Russian
The Corpus of Spoken Russian includes recordings of public and spontaneous spoken Russian and transcripts of Russian films. The spoken samples are transcribed in standard spelling. Lexical, morphological and semantic queries are supported. Users can build their own subcorpora, optionally selecting texts by sociological parameters. The corpus contains samples of different genres and types and of different geographic origins (Moscow, Saint Petersburg, Saratov, Ulyanovsk, Taganrog, Ekaterinburg, and so on).
- isPartOf: C-004111: Russian National Corpus
C-004116: FIDA
FIDA, the Corpus of Slovene Language, is a reference corpus of the Slovene language.
C-004117: IJS - ELAN
The IJS-ELAN corpus contains 1 million words from 15 parallel Slovene-English / English-Slovene texts. The composition, annotation, encoding and availability of the corpus are meant to facilitate the development of language technology and studies in bilingual terminology extraction, primarily for the Slovene language.
