Language Resource Search - SHACHI: Language Resource Metadata Database

Language resources: 3330. Results 401 - 410 of 2023

C-000745: Australian Corpus of English
It was designed to support a variety of linguistic research, in particular research into the differences between Australian, British and American English. It was also intended to serve as a strategic sample of current Australian English and as a reference corpus for comparison with more specialised, homogeneous corpora in Australia. ACE exists in two versions: ACE I is the full version, containing all 500 samples, available for interrogation via CD-ROM or Internet connection.
ACE II is a reduced version that includes 75% of ACE I, that is, 375 samples, available for unrestricted use.
- conformsTo: C-000751: Brown Corpus
- conformsTo: C-000801: THE LOB CORPUS
C-000746: BNC Sampler
The BNC Sampler is a subset of the full BNC. It comprises two one-million-word samples, one of written and one of spoken material, compiled to mirror the composition of the full BNC as closely as possible. The word-class annotation of the BNC Sampler texts has been carefully checked and manually corrected. The Sampler was first created at Lancaster University during the compilation of the BNC.
It is distributed on the BNC Baby CD together with the BNC Baby and an XML version of the American English Brown corpus.
- references: C-000748: BNC-baby
- isPartOf: British National Corpus
C-000747: BNC Spoken Corpus
As part of a major collaborative research project, the British National Corpus, which collected over 100 million words of written and spoken English, Longman has developed a 10-million-word spoken corpus. The Spoken Corpus consists of natural, spontaneous conversation of the kind heard all around us, together with the language of lectures, business meetings, after-dinner speeches and chat shows. This is the first time that spoken English has been recorded systematically on such a large scale, and lexicographers and linguists now have their first opportunity to study English as it is spoken, the English that is found in the street.
- isReferencedBy: The Longman Dictionary of Contemporary English
- isPartOf: C-000781: Longman Corpus Network
- isPartOf: British National Corpus
C-000748: BNC-baby
BNC Baby is a subset of BNC World. It consists of four one-million-word samples, each compiled as an example of a particular genre: fiction, newspapers, academic writing and spoken conversation. The texts have the same annotation as the full corpus (part of speech, metadata, etc.).
It is distributed on a CD together with the BNC Sampler and an XML version of the American English Brown corpus. The CD can be ordered online.
Full XML text of BNC-Baby
- references: BNC Sampler (subset of BNC)
- isPartOf: BNC World
C-000749: British National Corpus (World Edition)
The BNC contains about 100 million words: 90% written text and 10% orthographically transcribed spoken text. BNC World is a revised version of the original BNC and was produced between 1998 and 2000. It includes a thorough revision of the part-of-speech tagging, several corrections to the headers, and some minor revisions to the SGML markup. BNC World was made available worldwide in 2001. It has since been superseded by the BNC XML Edition.
BNC World was made available on CD for installation on a stand-alone PC or on a Windows, Unix or OSX server. The corpus can also be accessed via the BNC Subscription service or by using the BNC Simple Search.
- hasPart: the BNC Sampler (subset of BNC)
- hasPart: the BNC Baby (subset of BNC)
- replaces: C-001018: British National Corpus 1.0
- isReplacedBy: BNC XML Edition (3rd edition of BNC corpus)
C-000750: British National Corpus (XML Edition)
The BNC contains about 100 million words: 90% written text and 10% orthographically transcribed spoken text. It has been annotated with word-class (part-of-speech) information, and the texts also contain metatextual information. BNC XML Edition is a revised version of BNC World and was released in 2007. It adds information about lemmas and a simplified word class for each word, but apart from corrections to a few errors and inconsistencies, the corpus texts themselves are unchanged between the two versions. This version of the corpus is in XML format and can be used with the XAIRA search program, which offers more search options and a better user interface than its predecessor, SARA.
BNC XML Edition is made available on DVD for installation on a stand-alone PC or on a Windows, Unix or OSX server. It is delivered with a copy of the XAIRA search program and all necessary XAIRA index files.
- hasPart: the BNC Sampler (subset of BNC)
- hasPart: the BNC Baby (subset of BNC)
- replaces: BNC World (2nd edition of BNC corpus)
- replaces: C-001018: British National Corpus 1.0
C-000751: Brown Corpus
This Standard Corpus of Present-Day American English consists of 1,014,312 words of running text of edited English prose printed in the United States during the calendar year 1961.
Forms A, B and C (tagged version) are available.
- hasFormat: A Standard Corpus of Present Day Edited American English (Brown corpus in XML format)
C-000752: CHILDES
The Child Language Data Exchange System (CHILDES) is an international database organized for the study of first and second language acquisition. The project has been directed by Brian MacWhinney in collaboration with Catherine Snow of Harvard University. From 1984 to 1988 support came from the MacArthur Foundation. Since 1987, support has come from NIH and NSF.
- isPartOf: TalkBank
- isPartOf: http://talkbank.org
- hasVersion: AphasiaBank
- hasVersion: Conversation Analysis
- hasVersion: Animal Communication
C-000753: COMPARA
This corpus is based on Portuguese-English and English-Portuguese source texts and translations. You can use COMPARA to find out how words and expressions have been translated from Portuguese into English and from English into Portuguese.
C-000754: CRATER Multilingual Aligned Annotated Corpus
The International Telecommunications Union CRATER Corpus is a 1,000,000-word multilingual aligned annotated corpus of Spanish, French and English, aligned at the sentence level and available from the School of Engineering, Computing and Mathematical Sciences, Lancaster University, UK. The corpus consists entirely of technical texts from the International Telecommunications Union. The texts are annotated with part-of-speech and morphological tags.
http://tcc.itc.it/people/forner/multilingualcorpora.html
