Language Resource Search - SHACHI: Language Resource Metadata Database

Registered language resources: 3330. Showing items 411 - 420 of 2023.

C-000757: Cambridge Learner Corpus
The Cambridge Learner Corpus (CLC) is a large collection of exam scripts written by students taking Cambridge ESOL English exams around the world. It currently contains over 85,000 scripts and is growing all the time. It forms part of the Cambridge International Corpus (CIC). It has been built by Cambridge University Press and Cambridge ESOL (part of UCLES, the University of Cambridge Local Examinations Syndicate).
- isPartOf: Cambridge International Corpus (CIC).
- hasVersion: Cambridge and Nottingham Corpus of Discourse in English (CANCODE)
- isPartOf: Cambridge and Nottingham Spoken Business English (CANBEC)
- hasVersion: N-000755: Cambridge Cornell Corpus of Spoken North American English
- isPartOf: Cambridge Corpus of Spoken North American English (CAMSNAE)
- hasVersion: N-000756: Cambridge Corpus of Business English
- isPartOf: Cambridge Corpus of Legal English
- hasVersion: Cambridge Corpus of Financial English
- hasVersion: Cambridge Corpus of Academic English
C-000758: Cambridge and Nottingham Corpus of Discourse in English
The Cambridge and Nottingham Corpus of Discourse in English (CANCODE) is a unique collection of spoken English that has been built up by Cambridge University Press and the University of Nottingham. It forms part of the Cambridge International Corpus. The recordings were collected in Britain between 1995 and 2000, keyboarded by trained transcribers, coded, and stored in a computerised database which can be searched with specially designed software. CANCODE comprises 5 million words.
- isPartOf: Cambridge International Corpus
- isPartOf: Cambridge and Nottingham Spoken Business English (CANBEC)
- isVersionOf: N-000755: Cambridge Cornell Corpus of Spoken North American English
- isPartOf: Cambridge Corpus of Spoken North American English (CAMSNAE)
- isVersionOf: N-000756: Cambridge Corpus of Business English
- isPartOf: Cambridge Corpus of Legal English
- isVersionOf: Cambridge Corpus of Financial English
- isVersionOf: Cambridge Corpus of Academic English
C-000760: Collins Word Web
In order to stay at the forefront of language developments we have an extensive reading, listening and viewing programme, taking in broadcasts, websites and publications from around the globe - from the British Medical Journal to The Sun, from Channel Africa to CBC News. These are fed into our monitoring system, an unparalleled 2.5 billion-word analytical database: the Collins Word Web.
Every month the Collins Word Web grows by 35 million words, making it the largest resource of its type. When new words and phrases emerge, our active system is able to recognize the moment of their acceptance into the language, the precise context of their usage and even subtle changes in definition - and then alert us to them. All of which ensures that when you use a Collins dictionary, you are one of the best-informed language users in the world.
- isReferencedBy: Collins dictionaries
- hasPart: Bank of English®
- hasVersion: La banque de français moderne and El Banco de Español
- requires: C-000826: WordbanksOnline
C-000761: Corpus of Early English Correspondence Sampler
The Corpus of Early English Correspondence Sampler (CEECS) represents the non-copyrighted materials included in the Corpus of Early English Correspondence. CEECS1 covers the 15th and 16th centuries, with the exception of the Hutton collection, which goes on to the 17th century. CEECS2 consists of 17th-century material; only 3 letters in Original 3 are from the late 16th century.
- isPartOf: The Corpus of Early English Correspondence (CEEC)
- hasPart: The Oxford Text Archive
C-000763: ERIC
ERIC is an internet-based digital library of education research and information. ERIC provides access to bibliographic records of journal and non-journal literature indexed from 1966 to the present. ERIC also contains a growing collection of full-text materials in Adobe PDF format, including reports from the What Works Clearinghouse.
What's in the collection?
The ERIC collection includes bibliographic records (citations, abstracts, and other pertinent data) for more than 1.2 million items indexed since 1966, including:
* journal articles
* books
* research syntheses
* conference papers
* technical reports
* policy papers, and
* other education-related materials
- hasPart: T-000762: ERIC Thesaurus
C-000764: English-Norwegian Parallel Corpus
The English-Norwegian Parallel Corpus (ENPC) consists of original texts and their translations (English to Norwegian and Norwegian to English). It is intended as a general research tool, available beyond the present project for applied and theoretical linguistic research.
- references: Oslo Multilingual Corpus
C-000766: European Parliament Proceedings Parallel Corpus
This parallel corpus is extracted from the proceedings of the European Parliament. It includes versions in 11 European languages: Romance (French, Italian, Spanish, Portuguese), Germanic (English, Dutch, German, Danish, Swedish), Greek and Finnish.
- hasVersion: German-English Parallel Corpus
- hasVersion: Pharaoh, 2003, a beam search decoder for phrase-based statistical machine translation models
C-000768: Freiburg - Brown Corpus
In 1991, Christian Mair took the initiative to compile a set of corpora that would match the well-known and widely used Brown and LOB corpora, the only difference being that they should represent the language of the early 1990s.
- conformsTo: C-000751: Brown Corpus
- conformsTo: C-000801: THE LOB CORPUS
- references: Freiburg-LOB Corpus
C-000770: Hansards
This release contains 1.3 million pairs of aligned text chunks (sentences or smaller fragments) from the official records (Hansards) of the 36th Canadian Parliament.
C-000772: Innsbruck Computer Archive of Machine-Readable English Texts
The archive has three sections: the INNSBRUCK MIDDLE ENGLISH PROSE CORPUS, the INNSBRUCK LETTER CORPUS 1386 to 1688, and the ICAMET Varia Corpus. It is distributed as part of the Bergen ICAME CD-ROM, 2nd ed. (launched Sept. 1999), with the restriction that copyright conventions concerning about half of the prose texts (though not the letter texts) allowed the inclusion of only a sampler of the prose texts.
