Registered language resources: 3,330. Showing items 1211 - 1220 of 2,023 matches.
  • C-003571: Word Speech Database Vol. 2 (Place Names)
    As information devices with voice-command input reach the practical stage, this series records the spoken words most in demand for system tuning and performance evaluation. In Vol. 2, 200 Japanese speakers utter the names of 216 locations, covering railway stations, airports, and expressway interchanges.
  • C-003572: Word Speech Database Vol. 3 (Personal Names)
    As information devices with voice-command input reach the practical stage, this series records the spoken words most in demand for system tuning and performance evaluation. In Vol. 3, 200 Japanese speakers utter the 256 most common Japanese surnames, ranked 1st through 256th by share of the population.
  • C-003573: Word Speech Database Vol. 4 (Commercial Terms)
    As information devices with voice-command input reach the practical stage, this series records the spoken words most in demand for system tuning and performance evaluation. In Vol. 4, 200 Japanese speakers utter 347 items, including numbers, units, and banking- and transport-related terms.
  • C-003574: Word Speech Database (Telecommunications Terminal Edition)
    Based on Vols. 2, 3, and 4 of the same company's Word Speech Database series, this speech database was produced by acoustic simulation using a head-and-torso simulator (HATS) and an environmental noise database. 200 Japanese speakers utter Japanese place names (Vol. 2), Japanese personal names (Vol. 3), and numbers, units, and banking- and transport-related terms (Vol. 4).
  • C-003575: Word Speech Database Vol. 2 (Place Names) (Telecommunications Terminal Edition)
    Based on the same company's "Word Speech Database" series, this speech database was produced by acoustic simulation using a head-and-torso simulator (HATS) and an environmental noise database. In Vol. 2, 200 Japanese speakers utter the names of 216 locations, covering railway stations, airports, and expressway interchanges.
  • C-003576: Word Speech Database Vol. 3 (Personal Names) (Telecommunications Terminal Edition)
    Based on the same company's "Word Speech Database" series, this speech database was produced by acoustic simulation using a head-and-torso simulator (HATS) and an environmental noise database. In Vol. 3, 200 Japanese speakers utter the 256 most common Japanese surnames, ranked 1st through 256th by share of the population.
  • C-003577: Word Speech Database Vol. 4 (Commercial Terms) (Telecommunications Terminal Edition)
    Based on the same company's "Word Speech Database" series, this speech database was produced by acoustic simulation using a head-and-torso simulator (HATS) and an environmental noise database. In Vol. 4, 200 Japanese speakers utter 347 items, including numbers, units, and banking- and transport-related terms.
  • C-003578: TUNA Corpus
    Multimodal/Multimedia Resources
    TUNA (Towards a UNified Algorithm for the generation of referring expressions) is a research project funded by the UK's Engineering and Physical Sciences Research Council (EPSRC).

    The TUNA Corpus of Referring Expressions was built with contributions from 50 native or fluent speakers of English and contains about 2,000 descriptions (referring expressions).

    Participants described objects (targets) in visual domains by typing and submitting referring expressions that distinguished them from the other objects shown simultaneously (distractors). Each experimental trial consisted of one (singular) or two (plural) targets plus six distractors. Each description is richly annotated with semantic information, including information about all the other objects that the human authors saw.

    The TUNA Corpus was annotated with the main objective of evaluating the output of algorithms for the Generation of Referring Expressions (GRE) in Natural Language Generation, with particular regard to the semantic content of the expressions.
    • isReferencedBy: van Deemter, K., van der Sluis, I. & Gatt, A. (2006). "Building a semantically transparent corpus for the generation of referring expressions". Proceedings of the 4th International Conference on Natural Language Generation (Special Session on Data Sharing and Evaluation), INLG-06. (http://www.csd.abdn.ac.uk/research/tuna/corpus/pub/greDataSharingFinal.pdf)
    • isReferencedBy: "The TUNA final project report" (May 2007). (http://www.csd.abdn.ac.uk/research/tuna/pubs/TUNA-final-report.pdf)
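    To make concrete what a GRE algorithm computes, the sketch below implements a minimal version of a classic attribute-selection strategy (the Incremental Algorithm of Dale & Reiter), which is the kind of system the TUNA Corpus was built to evaluate. The object attributes, values, and preference order are hypothetical illustrations, not taken from the TUNA data, whose actual annotation is richer XML.

    ```python
    def incremental_algorithm(target, distractors, preferred_attrs):
        """Select a distinguishing set of attribute-value pairs for `target`.

        A minimal sketch of the classic Incremental Algorithm for the
        Generation of Referring Expressions (Dale & Reiter, 1995).
        Objects are plain dicts here; the real TUNA trials pair one or
        two targets with six distractors and richer annotation.
        """
        description = {}
        remaining = list(distractors)
        for attr in preferred_attrs:          # fixed preference order
            value = target[attr]
            ruled_out = [d for d in remaining if d.get(attr) != value]
            if ruled_out:                      # attribute has discriminatory power
                description[attr] = value
                remaining = [d for d in remaining if d.get(attr) == value]
            if not remaining:                  # all distractors excluded
                break
        return description

    # Hypothetical domain: one target plus two distractors.
    target = {"type": "sofa", "colour": "red", "size": "large"}
    distractors = [
        {"type": "sofa", "colour": "blue", "size": "large"},
        {"type": "desk", "colour": "red", "size": "small"},
    ]
    print(incremental_algorithm(target, distractors, ["type", "colour", "size"]))
    # → {'type': 'sofa', 'colour': 'red'}  ("the red sofa")
    ```

    Note that "size" is never selected: once "type" and "colour" rule out all distractors, the algorithm stops, which is how it avoids over-specified descriptions.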
  • C-003579: North American News Text, Complete
    *Introduction*

    North American News Text, Complete, Linguistic Data Consortium (LDC) catalog number LDC2008T15 and ISBN 1-58563-483-2, is a collection of English news text from the Los Angeles Times, Washington Post, New York Times, Reuters and the Wall Street Journal. This corpus was originally released in 1995 as the North American News Text Corpus (LDC95T21) and is reissued to complement the release of the Brown Laboratory for Linguistic Information Processing (BLLIP) North American News Text sets (LDC2008T13, LDC2008T14), which consist of Penn Treebank-style parsing of that news text.

    North American News Text is reissued in two versions: North American News Text, Complete LDC2008T15, the members-only original version, now available as a 2008 Membership Year corpus; and North American News Text, General Release LDC2008T16 (which does not include text from the Wall Street Journal), available to nonmembers for the first time. The directory structure of each of these publications has been restructured to be identical to the directory structure of the BLLIP releases.

    *Data*

    The table below contains a breakdown of the sources, epochs and word counts for the data in the North American News Text releases:

    Source                                         Dates                        # Words (millions)
    Los Angeles Times & Washington Post            May 1994 - August 1997       52
    New York Times News & Syndicate                July 1994 - December 1996    173
    Reuters News Service (General and Financial)   April 1994 - December 1996   85
    Wall Street Journal (not in General Release)   July 1994 - December 1996    40

    The New York Times and the Los Angeles Times/Washington Post services include a range of other newspaper sources in their syndicated newswires. The Los Angeles Times/Washington Post material in this corpus includes some news text from the following sources:

    * Newsday
    * The Baltimore Sun
    * The Hartford Courant
    The New York Times material in this corpus contains some data from the following sources, although New York Times articles predominate:

    * Bloomberg Business News
    * The Boston Globe
    * Los Angeles Daily News
    * Fort Worth Star-Telegram
    * Newsweek
    * Cox News Service
    * The Arizona Republic
    * Seattle Post-Intelligencer
    * San Francisco Examiner
    * Houston Chronicle
    * San Francisco Chronicle
    * Economist Newspaper Ltd.
    * Hearst Newspapers
    The text content of each data file (following uncompression with the GNU-unzip utility) consists of plain ASCII character data with SGML tags to indicate article boundaries and organization of information within each article.

    There are differences among the five primary newswire sources in terms of the number and types of SGML tags used in the text, but the following tag structure is common to all data sets:

    (article-opening tag)   -- start of a new article
    ...                     -- some variety of "header" tags appears here
    (text-opening tag)      -- start of the text content of the article
    (paragraph tag)         -- all paragraph boundaries are marked by this tag
    ...                     -- text data as it is provided by the newswire service
    (text-closing tag)      -- end of text content of the article
    ...                     -- some variety of "trailer" tags appears here
    (article-closing tag)   -- end of article

    In general, the differences in format among the various newswire sources will be found in the "header" tags (those between the article-opening and text-opening tags) and the "trailer" tags (those between the text-closing and article-closing tags). The actual text content of articles (the region between the text-opening and text-closing tags) is consistent in format across sources, except for some uses of the SGML "&..;" notation to represent special characters in the data. For example, "&MD;" is used in the "latwp" material to represent the em-dash character, which typically separates the "dateline" from the opening sentence in the first paragraph of each article. There may also be differences in how quotation marks are rendered.
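    As a sketch of how this per-article structure can be consumed, the script below splits a decompressed data file into articles and paragraphs and decodes the "&MD;" entity. The concrete tag names (<DOC>, <TEXT>, <p>) follow a common LDC newswire convention and are assumptions here; only the "&MD;" entity is attested in the description above, so adjust the patterns to the tags actually present in each source.

    ```python
    import re

    def extract_articles(sgml_text):
        """Split newswire SGML text into articles, each a list of paragraphs.

        Assumes the common LDC convention of <DOC>...</DOC> article
        boundaries, <TEXT>...</TEXT> body regions, and <p> paragraph
        markers; the exact tag inventory varies by newswire source.
        """
        articles = []
        for doc in re.findall(r"<DOC>(.*?)</DOC>", sgml_text, re.S):
            m = re.search(r"<TEXT>(.*?)</TEXT>", doc, re.S)
            if not m:
                continue
            body = m.group(1)
            # "&MD;" marks the em-dash in the "latwp" material.
            body = body.replace("&MD;", "--")
            paragraphs = [p.strip()
                          for p in re.split(r"<p>", body, flags=re.I)
                          if p.strip()]
            articles.append(paragraphs)
        return articles
    ```

    For example, a file containing one article whose first paragraph begins "WASHINGTON &MD; ..." would yield one list of paragraphs with the dateline separated by "--".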

    As this re-release is intended to complement the BLLIP North American News Text releases, the directory structure of this corpus is identical to that of the BLLIP publications.

    *Pricing*

    The Reduced Licensing Fee for this corpus is US$200.
  • C-003580: North American News Text, General Release
    *Introduction*

    North American News Text, General Release, Linguistic Data Consortium (LDC) catalog number LDC2008T16 and ISBN 1-58563-484-0, is a collection of English news text from the Los Angeles Times, Washington Post, New York Times and Reuters. This data is a subset of the data contained in the North American News Text Corpus (LDC95T21) released in 1995 and is reissued to complement the release of the Brown Laboratory for Linguistic Information Processing (BLLIP) North American News Text sets (LDC2008T13, LDC2008T14), which consist of Penn Treebank-style parsing of the North American News Text Corpus text.

    North American News Text is reissued in two versions: North American News Text, Complete, LDC2008T15, the members-only original version, now available as a 2008 Membership Year corpus; and North American News Text, General Release LDC2008T16 (which does not include text from the Wall Street Journal), available to nonmembers for the first time. The directory structure of each of these publications has been restructured to be identical to the directory structure of the BLLIP releases.

    *Data*

    The table below contains a breakdown of the sources, epochs and word counts for the data in the North American News Text releases:

    Source                                         Dates                        # Words (millions)
    Los Angeles Times & Washington Post            May 1994 - August 1997       52
    New York Times News & Syndicate                July 1994 - December 1996    173
    Reuters News Service (General and Financial)   April 1994 - December 1996   85
    Wall Street Journal (not in General Release)   July 1994 - December 1996    40

    The New York Times and the Los Angeles Times/Washington Post services include a range of other newspaper sources in their syndicated newswires. The Los Angeles Times/Washington Post material in this corpus includes some news text from the following sources:

    * Newsday
    * The Baltimore Sun
    * The Hartford Courant
    The New York Times material in this corpus contains some data from the following sources, although New York Times articles predominate:

    * Bloomberg Business News
    * The Boston Globe
    * Los Angeles Daily News
    * Fort Worth Star-Telegram
    * Newsweek
    * Cox News Service
    * The Arizona Republic
    * Seattle Post-Intelligencer
    * San Francisco Examiner
    * Houston Chronicle
    * San Francisco Chronicle
    * Economist Newspaper Ltd.
    * Hearst Newspapers
    The text content of each data file (following uncompression with the GNU-unzip utility) consists of plain ASCII character data with SGML tags to indicate article boundaries and organization of information within each article.

    There are differences among the five primary newswire sources in terms of the number and types of SGML tags used in the text, but the following tag structure is common to all data sets:

    (article-opening tag)   -- start of a new article
    ...                     -- some variety of "header" tags appears here
    (text-opening tag)      -- start of the text content of the article
    (paragraph tag)         -- all paragraph boundaries are marked by this tag
    ...                     -- text data as it is provided by the newswire service
    (text-closing tag)      -- end of text content of the article
    ...                     -- some variety of "trailer" tags appears here
    (article-closing tag)   -- end of article

    In general, the differences in format among the various newswire sources will be found in the "header" tags (those between the article-opening and text-opening tags) and the "trailer" tags (those between the text-closing and article-closing tags). The actual text content of articles (the region between the text-opening and text-closing tags) is consistent in format across sources, except for some uses of the SGML "&..;" notation to represent special characters in the data. For example, "&MD;" is used in the "latwp" material to represent the em-dash character, which typically separates the "dateline" from the opening sentence in the first paragraph of each article. There may also be differences in how quotation marks are rendered.

    As this re-release is intended to complement the BLLIP North American News Text releases, the directory structure of this corpus is identical to that of the BLLIP publications.