Registered language resources: 3,330. Showing items 811-820 of 2,023.
  • C-001412: FORM1 Kinematic Gesture
    *Introduction*

    FORM1 Kinematic Gesture was produced by the Linguistic Data Consortium (LDC) and is distributed as catalog number LDC2004V01 (ISBN 1-58563-299-6).

    FORM is a gesture annotation scheme designed to capture the kinematic information in gesture from videos of speakers. This publication is a detailed database of gesture-annotated videos stored in the Anvil and FORM file formats. FORM encodes the "phonetics" of gesture by giving geometric descriptions of location and movement of the right and left arms. Other kinematic information such as effort and shape are also recorded.

    FORM gesture data has applications in statistical natural language processing, gesture recognition and generation, information extraction from video, and human-computer interaction.

    Please go to the FORM website for more information. The FORM2 publication was released in 2003 by the LDC and encoded much of the same data provided here using a more recent tag set.

    *Data*

    This publication contains gesture annotations created using the FORM 1.0 tag set. The Anvil annotation files used in their creation are also included, as are 29.5 minutes of the original audio and video recordings excerpted from a lecture given by Brian MacWhinney on January 24, 2000 at Carnegie Mellon University. A second data set, with 5.5 minutes of Paul Howard telling a story in conversation while being motion captured, is also supplied. These video recordings were chosen because they are part of the NSF-funded TalkBank project.

    There are a total of 69 data files: 21 movie (.mov) files, 24 Anvil (.anvil) files, and 24 FORM (.form1) files.

    The movie files are in QuickTime format with the following specs:

    Size: 360 x 240 pixels
    Compression: H.261
    Video rate: 29.97 fps
    Audio rate: 48 kHz
    Audio format: 8-bit/16-bit stereo

    Anvil files can be opened using the Anvil video annotation tool, which is freely available from Michael Kipp. The .form1 file format is an intermediate data format that contains only the FORM values from each .anvil file in a comma-delimited, frame-by-frame listing of the following form:

    frame,upper_arm_lift,forearm_orientation,handshape,wrist_up_down,wrist_side_side,effort,tension
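As a sketch of how the comma-delimited frame listing above could be consumed, a small Python reader (the field names come from the listing itself; any sample file names are hypothetical):

```python
import csv

# Field layout of the comma-delimited .form1 frame listing, as documented above.
FIELDS = ["frame", "upper_arm_lift", "forearm_orientation", "handshape",
          "wrist_up_down", "wrist_side_side", "effort", "tension"]

def read_form1(path):
    """Read a .form1 frame listing into a list of per-frame dicts."""
    rows = []
    with open(path, newline="") as f:
        for values in csv.reader(f):
            if not values:
                continue  # tolerate blank lines
            rows.append(dict(zip(FIELDS, values)))
    return rows
```

Values are kept as strings, since the listing format does not specify numeric types for every column.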

    *Sponsorship*

    This research was conducted using funding from the following grant sources:

    ISLE - 9910603
    NSF: TalkBank (via subcontract from Carnegie Mellon University) - BCS-998009 and BCS-9978056
    NSF: Discourse and Gesture with Joshi, Liberman, and Martell - EIA98-09209

    *Updates*

    There are no updates available at this time.
    • references: Craig Martell, et al. 2004. FORM1 Kinematic Gesture. Linguistic Data Consortium, Philadelphia.
  • C-001413: FORM2 Kinematic Gesture
    *Introduction*

    FORM2 Kinematic Gesture was produced by the Linguistic Data Consortium.

    FORM is a gesture annotation scheme designed to capture the kinematic information in gesture from videos of speakers. This publication is a detailed database of gesture-annotated videos stored in the Anvil and FORM file formats. FORM encodes the "phonetics" of gesture by giving geometric descriptions of location and movement of the right and left arms. Other kinematic information such as effort and shape are also recorded.

    *Data*

    There are a total of 24 data files: eight movie files, eight Anvil files, and eight FORM files.

    The movie files represent 12 minutes of audio and video recordings excerpted from a lecture given by Brian MacWhinney on January 24, 2000 at Carnegie Mellon University. These video recordings were chosen because they are part of the NSF-funded TalkBank project.

    The video format is as follows:

    Size: 360 x 240 pixels
    Compression: H.261
    Data rate: 696 K/sec
    Video rate: 29.82 fps
    Audio rate: 48.000 kHz
    Audio format: 8-bit stereo

    The gesture annotations were created using the FORM 2.0 tag set. The Anvil annotation files used in their creation, augmented with FORM 1.0 data, are also included. (FORM1 data will be the subject of a separate publication to be released in the near future.) FORM1 values that do not appear in the FORM2 specification are omitted from this publication. A full description of the FORM tag set with explanations of each value can be found in the documentation.

    *Sponsorship*

    This research was conducted using funding from the following grant sources: ISLE - 9910603 NSF: TalkBank (via subcontract from Carnegie Mellon University) - BCS-998009 and BCS-9978056 NSF: Discourse and Gesture - EIA98-09209

    *Updates*

    There are no updates available at this time.

    *Note*

    The cost of the first 50 copies of this publication (not counting the copies distributed to LDC members) is covered by the sponsoring grants; these copies are therefore free of charge to qualified researchers, apart from a $30 shipping and handling fee. After the first 50 copies are distributed, additional copies will be available for the production cost of $500.
    • references: Craig Martell, et al. 2003. FORM2 Kinematic Gesture. Linguistic Data Consortium, Philadelphia.
  • C-001416: Fisher English Training Part 2, Speech
    *Introduction*

    Fisher English Training Part 2 Speech represents the second half of a collection of conversational telephone speech (CTS) that was created at the LDC during 2003. It contains 5,849 audio files, each one containing a full conversation of up to ten minutes. Additional information regarding the speakers involved, and types of telephones used, can be found in the companion text corpus of transcripts, Fisher English Training Part 2, Transcripts (LDC2005T19).

    The Fisher telephone conversation collection protocol was created at LDC to address a critical need of developers trying to build robust automatic speech recognition (ASR) systems. Previous collection protocols, such as CALLFRIEND and Switchboard-II and the resulting corpora, have been adapted for ASR research but were in fact developed for language and speaker identification respectively. Although the CALLHOME protocol and corpora were developed to support ASR technology, they feature small numbers of speakers making telephone calls of relatively long duration with narrow vocabulary across the collection. CALLHOME conversations are challengingly natural and intimate. Under the Fisher protocol, a very large number of participants each make a few calls of short duration speaking to other participants, whom they typically do not know, about assigned topics. This maximizes inter-speaker variation and vocabulary breadth while also increasing formality.

    Previous protocols such as CALLHOME, CALLFRIEND and Switchboard relied upon participant activity to drive the collection. Fisher is unique in being platform driven rather than participant driven. Participants who wish to initiate a call may do so; however the collection platform initiates the majority of calls. Participants need only answer their phones at the times they specified when registering for the study.

    To encourage a broad range of vocabulary, Fisher participants are asked to speak on an assigned topic which is selected at random from a list, which changes every 24 hours and which is assigned to all subjects paired on that day. Some topics are inherited or refined from previous Switchboard studies while others were developed specifically for the Fisher protocol.

    *Data*

    The first half of the collection (Fisher English Training Speech Part 1) was released by the LDC in 2004 (LDC2004S13 for speech data, LDC2004T19 for transcripts). Taken as a whole, the two parts comprise 11,699 recorded telephone conversations.

    The individual audio files are presented in NIST SPHERE format, and contain two-channel mu-law sample data; "shorten" compression has been applied to all files.
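Since the audio is distributed as NIST SPHERE files, a small header reader can help with inventorying the corpus. The sketch below parses the plain-text SPHERE header (magic line NIST_1A, a header-size line, then "name -type value" records ending at "end_head"); note that the mu-law samples themselves are shorten-compressed and require a decompression tool such as LDC's sph2pipe before they can be decoded:

```python
def read_sphere_header(path):
    """Parse the plain-text header of a NIST SPHERE (.sph) file into a dict.

    A SPHERE file begins with the magic line "NIST_1A", a line giving the
    header size in bytes, then "name -type value" records up to "end_head".
    """
    with open(path, "rb") as f:
        magic = f.readline().strip()
        if magic != b"NIST_1A":
            raise ValueError("not a NIST SPHERE file")
        header_size = int(f.readline())
        f.seek(0)
        header = f.read(header_size).decode("ascii", errors="replace")

    fields = {}
    for line in header.splitlines()[2:]:   # skip magic and size lines
        if line.strip() == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        name, ftype, value = parts
        if ftype.startswith("-i"):         # integer field
            fields[name] = int(value)
        elif ftype.startswith("-r"):       # real-number field
            fields[name] = float(value)
        else:                              # -sN string field
            fields[name] = value
    return fields
```

For a two-channel Fisher file, fields such as channel_count and sample_rate can then be checked before processing.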

    Data collection and transcription were sponsored by DARPA and the U.S. Department of Defense, as part of the EARS project for research and development in automatic speech recognition.

    *Samples*

    For an example of this corpus, please examine this audio sample.
    • references: Christopher Cieri, et al. 2005. Fisher English Training Part 2, Speech. Linguistic Data Consortium, Philadelphia.
  • C-001417: Fisher English Training Part 2, Transcripts
    *Introduction*

    Fisher English Training Part 2 Transcripts represents the second half of a collection of conversational telephone speech (CTS) that was created at the LDC during 2003. It consists of transcripts for the speech contained in Fisher English Training Part 2, Speech (LDC2005S13).

    The Fisher telephone conversation collection protocol was created at the LDC to address a critical need of developers trying to build robust automatic speech recognition (ASR) systems.
    Previous collection protocols, such as CALLFRIEND and Switchboard-II and the resulting corpora, have been adapted for ASR research but were in fact developed for language and speaker identification respectively. Although the CALLHOME protocol and corpora were developed to support ASR technology, they feature small numbers of speakers making telephone calls of relatively long duration with narrow vocabulary across the collection. CALLHOME conversations are challengingly natural and intimate. Under the Fisher protocol, a large number of participants each call another participant, whom they typically do not know, for a short period of time to discuss the assigned topics. This maximizes inter-speaker variation and vocabulary breadth while also increasing formality.

    Previous protocols such as CALLHOME, CALLFRIEND and Switchboard relied upon participant activity to drive the collection. Fisher is unique in being platform driven rather than participant driven. Participants who wish to initiate a call may do so; however the collection platform initiates the majority of calls. Participants need only answer their phones at the times they specified when registering for the study.

    To encourage a broad range of vocabulary, Fisher participants are asked to speak on an assigned topic which is selected at random from a list, which changes every 24 hours and which is assigned to all subjects paired on that day. Some topics are inherited or refined from previous Switchboard studies while others were developed specifically for the Fisher protocol.

    *Data*

    The first half of the collection (Fisher English Training Speech, Part 1) was released by the LDC in 2004 (LDC2004S13 for speech data, LDC2004T19 for transcripts). Taken as a whole, the two parts comprise 11,699 recorded telephone conversations.

    The individual audio files are presented in NIST SPHERE format, and contain two-channel mu-law sample data; "shorten" compression has been applied to all files.

    Data collection and transcription were sponsored by DARPA and the U.S. Department of Defense, as part of the EARS project for research and development in automatic speech recognition.

    *Samples*

    To see an example of this corpus, please examine this sample.
    • references: Christopher Cieri, et al. 2005. Fisher English Training Part 2, Transcripts. Linguistic Data Consortium, Philadelphia.
  • C-001418: Fisher English Training Speech Part 1 Speech
    *Introduction*

    Fisher English Training Speech Part 1 Speech represents the first half of a collection of conversational telephone speech (CTS) that was created at the LDC during 2003. It contains 5,850 audio files, each one containing a full conversation of up to 10 minutes. Additional information regarding the speakers involved and types of telephones used can be found in the companion text corpus of transcripts, Fisher English Training Speech Part 1, Transcripts (LDC2004T19).

    The Fisher telephone conversation collection protocol was created at LDC to address a critical need of developers trying to build robust automatic speech recognition (ASR) systems. Previous collection protocols, such as CALLFRIEND and Switchboard-II and the resulting corpora, have been adapted for ASR research but were in fact developed for language and speaker identification respectively. Although the CALLHOME protocol and corpora were developed to support ASR technology, they feature small numbers of speakers making telephone calls of relatively long duration with narrow vocabulary across the collection. CALLHOME conversations are challengingly natural and intimate. Under the Fisher protocol, a very large number of participants each make a few calls of short duration speaking to other participants, whom they typically do not know, about assigned topics. This maximizes inter-speaker variation and vocabulary breadth while also increasing formality.

    Previous protocols such as CALLHOME, CALLFRIEND and Switchboard relied upon participant activity to drive the collection. Fisher is unique in being platform driven rather than participant driven. Participants who wish to initiate a call may do so; however the collection platform initiates the majority of calls. Participants need only answer their phones at the times they specified when registering for the study.

    To encourage a broad range of vocabulary, Fisher participants are asked to speak on an assigned topic which is selected at random from a list, which changes every 24 hours and which is assigned to all subjects paired on that day. Some topics are inherited or refined from previous Switchboard studies while others were developed specifically for the Fisher protocol.

    *Data*

    The individual audio files are presented in NIST SPHERE format, and contain two-channel mu-law sample data; "shorten" compression has been applied to all files.

    Data collection and transcription were sponsored by DARPA and the U.S. Department of Defense, as part of the EARS project for research and development in automatic speech recognition.

    *Samples*

    Please examine this sample to see an example of the data in this corpus.
    • references: Christopher Cieri, et al. 2004. Fisher English Training Speech Part 1 Speech. Linguistic Data Consortium, Philadelphia.
  • C-001419: Fisher English Training Speech Part 1 Transcripts
    *Introduction*

    Fisher English Training Speech Part 1 Transcripts represents the first half of a collection of conversational telephone speech (CTS) that was created at LDC in 2003. It contains transcript data for 5,850 complete conversations, each lasting up to 10 minutes. In addition to the transcriptions, which are found under the trans directory, there is a complete set of tables describing the speakers, the properties of the telephone calls, and the set of topics that were used to initiate the conversations. The corresponding speech files are contained in Fisher English Training Speech Part 1 Speech (LDC2004S13).

    The Fisher telephone conversation collection protocol was created at LDC to address a critical need of developers trying to build robust automatic speech recognition (ASR) systems. Previous collection protocols, such as CALLFRIEND and Switchboard-II and the resulting corpora, have been adapted for ASR research but were in fact developed for language and speaker identification respectively. Although the CALLHOME protocol and corpora were developed to support ASR technology, they feature small numbers of speakers making telephone calls of relatively long duration with narrow vocabulary across the collection. CALLHOME conversations are challengingly natural and intimate. Under the Fisher protocol, a large number of participants each call another participant, whom they typically do not know, for a short period of time to discuss the assigned topics. This maximizes inter-speaker variation and vocabulary breadth while also increasing formality.

    Previous protocols such as CALLHOME, CALLFRIEND and Switchboard relied upon participant activity to drive the collection. Fisher is unique in being platform driven rather than participant driven. Participants who wish to initiate a call may do so, however, the collection platform initiates the majority of calls. Participants need only answer their phones at the times they specified when registering for the study.

    To encourage a broad range of vocabulary, Fisher participants are asked to speak about an assigned topic chosen from a randomly generated list that changes every 24 hours. All participants paired that day are assigned topics from that list. Some topics are inherited or refined from previous Switchboard studies while others were developed specifically for the Fisher protocol.

    *Data*

    Overall, about 12% of the conversations were transcribed at LDC, and the rest were transcribed by BBN and WordWave using a significantly different approach to the task. A central goal in both sets was to maximize the speed and economy of the transcription process. This in turn involved trading away some of the mark-up detail and quality control that may have been common in previous, smaller corpora.

    The LDC transcripts were based on automatic segmentation of the audio data, to identify the utterance end-points on both channels of each conversation. Given these time stamps, manual transcription was simply a matter of typing in the words for each segment and doing a rudimentary spell-check. No attempt was made to modify the segmentation boundaries manually, or to locate utterances that the segmenter might have missed. Portions of speech where the transcriber could not be sure exactly what was said were marked with double parentheses -- (( ... )) -- and the transcriber could hazard a guess as to what was said, or leave the region between parentheses blank. The LDC transcription process yields one plain-text transcript file per conversation, in which the first two lines show the call-ID and the fact that the transcript was developed at LDC. The remainder of the file contains one utterance per line (with blank lines separating the utterances), with the start-time, end-time, speaker/channel-ID and utterance text.
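The per-utterance layout described above (start-time, end-time, speaker/channel ID, utterance text, one utterance per line) can be parsed with a few lines of Python. The "A:"/"B:" channel labels and "#"-prefixed header lines in the sketch below are assumptions about the exact file layout, not guaranteed by this description:

```python
def parse_fisher_transcript(lines):
    """Parse plain-text transcript lines into utterance records.

    Expects "start end speaker text" per line; blank lines and
    comment/header lines (assumed to start with "#") are skipped.
    """
    utterances = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank separator or header line
        start, end, speaker, text = line.split(None, 3)
        utterances.append({
            "start": float(start),
            "end": float(end),
            "speaker": speaker.rstrip(":"),  # "A:" -> "A"
            "text": text,                    # may contain (( ... )) markers
        })
    return utterances
```

Uncertain regions marked with double parentheses survive verbatim in the text field, so downstream filtering of (( ... )) spans remains possible.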

    Data collection and transcription were sponsored by DARPA and the U.S. Department of Defense, as part of the EARS project for research and development in automatic speech recognition.

    *Samples*

    Please examine this sample to see an example of the data in this corpus.
    • references: Christopher Cieri, et al. 2004. Fisher English Training Speech Part 1 Transcripts. Linguistic Data Consortium, Philadelphia.
  • C-001420: Fisher Levantine Arabic Conversational Telephone Speech, Transcripts
    *Introduction*

    Levantine Arabic is spoken along the western Mediterranean coast from Anatolia to the Sinai Peninsula and encompasses the local dialects of Lebanon, Syria and Palestine. There are two distinct varieties: Northern, centered around Syria and Lebanon and Southern, spoken in Jordan and Palestine. Northern Levantine Arabic speakers include approximately 8.8 million speakers in Syria and 6 million speakers in Lebanon. Southern Levantine Arabic speakers include approximately 3.5 million speakers in Jordan, 1.6 million speakers in Palestine and nearly one million speakers in Israel.

    Fisher Levantine Arabic Conversational Telephone Speech, Transcripts contains transcripts for 279 telephone conversations. The majority of the speakers are from Jordan, Lebanon and Palestine. The corresponding telephone speech is contained in Fisher Levantine Arabic Conversational Telephone Speech.

    Speaker Distribution by Region

    Jordan: 60%
    Palestine: 15%
    Lebanon: 15%
    Syria: 8%
    Other: 2%

    The Fisher telephone conversation collection protocol was created at LDC to address a critical need of developers trying to build robust automatic speech recognition (ASR) systems. Previous collection protocols, such as CALLFRIEND and Switchboard-II and the resulting corpora, have been adapted for ASR research but were in fact developed for language and speaker identification respectively. Although the CALLHOME protocol and corpora were developed to support ASR technology, they feature small numbers of speakers making telephone calls of relatively long duration with narrow vocabulary across the collection. CALLHOME conversations are challengingly natural and intimate. Under the Fisher protocol, a very large number of participants each make a few calls of short duration speaking to other participants, whom they typically do not know, about assigned topics. This maximizes inter-speaker variation and vocabulary breadth although it also increases formality.

    Previous protocols such as CALLHOME, CALLFRIEND and Switchboard relied upon participant activity to drive the collection. Fisher is unique in being platform driven rather than participant driven. Participants who wish to initiate a call may do so; however the collection platform initiates the majority of calls. Participants need only answer their phones at the times they specified when registering for the study.

    To encourage a broad range of vocabulary, Fisher participants are asked to speak on an assigned topic which is selected at random from a list, which changes every 24 hours and which is assigned to all subjects paired on that day. Some topics are inherited or refined from previous Switchboard studies while others were developed specifically for the Fisher protocol.

    *Data*

    The transcripts were created with green and yellow layers using LDC's Multi-Dialectal Transcription Tool (AMADAT). The green layer seeks to anchor dialectal forms to similar or related Modern Standard Arabic orthography-based forms. The yellow layer is a more careful and detailed transcription that adds functionally necessary vowels and marks important sociolinguistic variations and morphophonemic features.

    The green-layer transcripts in this corpus are a subset of the transcripts contained in Levantine Arabic QT Training Data Set 5, Transcripts, LDC2006T07. The yellow-layer transcription was added in this release.

    *Samples*

    For an example of the text contained in this corpus, please view this image of the transcriptions (jpeg format).
    • references: Mohamed Maamouri (Project head), et al. 2007. Fisher Levantine Arabic Conversational Telephone Speech, Transcripts. Linguistic Data Consortium, Philadelphia.
  • C-001421: Fisher Levantine Arabic Conversational Telephone Speech
    *Introduction*

    Levantine Arabic is spoken along the western Mediterranean coast from Anatolia to the Sinai Peninsula and encompasses the local dialects of Lebanon, Syria and Palestine. There are two distinct varieties: Northern, centered around Syria and Lebanon; and Southern, spoken in Jordan and Palestine. Northern Levantine Arabic speakers include approximately 8.8 million speakers in Syria and 6 million speakers in Lebanon. Southern Levantine Arabic speakers include approximately 3.5 million speakers in Jordan, 1.6 million speakers in Palestine and nearly one million speakers in Israel.

    Fisher Levantine Arabic Conversational Telephone Speech contains 279 telephone conversations totaling 45 hours of speech. The majority of the speakers are from Jordan, Lebanon and Palestine.

    Speaker Distribution by Region

    Jordan: 60%
    Palestine: 15%
    Lebanon: 15%
    Syria: 8%
    Other: 2%

    The Fisher telephone conversation collection protocol was created at LDC to address a critical need of developers trying to build robust automatic speech recognition (ASR) systems. Previous collection protocols, such as CALLFRIEND and Switchboard-II and the resulting corpora, have been adapted for ASR research but were in fact developed for language and speaker identification respectively. Although the CALLHOME protocol and corpora were developed to support ASR technology, they feature small numbers of speakers making telephone calls of relatively long duration with narrow vocabulary across the collection. CALLHOME conversations are challengingly natural and intimate. Under the Fisher protocol, a very large number of participants each make a few calls of short duration speaking to other participants, whom they typically do not know, about assigned topics. This maximizes inter-speaker variation and vocabulary breadth although it also increases formality.

    Previous protocols such as CALLHOME, CALLFRIEND and Switchboard relied upon participant activity to drive the collection. Fisher is unique in being platform driven rather than participant driven. Participants who wish to initiate a call may do so; however the collection platform initiates the majority of calls. Participants need only answer their phones at the times they specified when registering for the study.

    To encourage a broad range of vocabulary, Fisher participants are asked to speak on an assigned topic which is selected at random from a list, which changes every 24 hours and which is assigned to all subjects paired on that day. Some topics are inherited or refined from previous Switchboard studies while others were developed specifically for the Fisher protocol.

    *Data*

    The conversations in this corpus are a subset of the conversations in Levantine Arabic QT Training Data Set 5, Speech, LDC2006S29. The individual audio files are in NIST Sphere format. The corresponding transcripts may be found in Fisher Levantine Arabic Conversational Telephone Speech, Transcripts, LDC2007T04.

    *Samples*

    For an example of the speech data in this corpus, please listen to this audio sample in wav format.
    • references: Mohamed Maamouri (Project head), et al. 2007. Fisher Levantine Arabic Conversational Telephone Speech. Linguistic Data Consortium, Philadelphia.
  • C-001422: French Gigaword First Edition
    French Gigaword First Edition is a comprehensive archive of newswire text data that has been acquired over several years by the Linguistic Data Consortium (LDC) at the University of Pennsylvania.

    The two distinct international sources of French newswire in this edition, and the time spans of collection covered for each, are as follows:

    * Agence France-Presse (afp_fre) May 1994 - July 2006
    * Associated Press French Service (apw_fre) Nov 1994 - July 2006
    The seven-letter codes in parentheses include the three-character source name abbreviations and the three-character language code ("fre") separated by an underscore ("_") character. The three-letter language code conforms to LDC's new internal convention based on the ISO 639-3 standard.

    The overall totals for each source are summarized below. Note that the "Totl-MB" numbers show the amount of data you get when the files are uncompressed (i.e. approximately 15 gigabytes, total); the "Gzip-MB" column shows totals for compressed file sizes as stored on the DVD-ROM; the "K-wrds" numbers are simply the number of whitespace-separated tokens (of all types) after all SGML tags are eliminated.

    Source    #Files  Gzip-MB  Totl-MB  K-wrds   #DOCs
    AFP_FRE   147     1139     3445     482904   1797139
    APW_FRE   141     389      1167     167405   622740
    TOTAL     288     1528     4612     650309   2419879

    The following tables present "Text-MB", "K-wrds" and "#DOCs" broken down by source and DOC type; "Text-MB" represents the total number of characters (including whitespace) after SGML tags are eliminated.

    type="advis":
    Source    Text-MB  K-wrds   #DOCs
    AFP_FRE   79       10924    47044
    APW_FRE   8        1381     6291
    TOTAL     87       12305    53335

    type="multi":
    Source    Text-MB  K-wrds   #DOCs
    AFP_FRE   40       5964     6828
    APW_FRE   118      18527    29797
    TOTAL     158      24491    36625

    type="other":
    Source    Text-MB  K-wrds   #DOCs
    AFP_FRE   169      23723    155571
    APW_FRE   72       11006    68429
    TOTAL     241      34729    224000

    type="story":
    Source    Text-MB  K-wrds   #DOCs
    AFP_FRE   2848     442284   1587696
    APW_FRE   866      136481   518223
    TOTAL     3715     578765   2105919
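The "K-wrds" figures above count whitespace-separated tokens after SGML tags are eliminated; a minimal sketch of that accounting (the tag-stripping regex is an assumption that tags never contain a literal ">"):

```python
import re

def k_words(sgml_text):
    """Count whitespace-separated tokens after removing SGML tags.

    Dividing the result by 1000 gives the "K-wrds" figure used in the
    tables above.
    """
    no_tags = re.sub(r"<[^>]+>", " ", sgml_text)  # drop <DOC ...>, <P>, etc.
    return len(no_tags.split())
```

The replacement with a space (rather than the empty string) keeps tokens on either side of a tag from merging.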

    *Samples*

    For an example of the data in this corpus, please view this image of the text of French Gigaword.
    • references: David Graff. 2006. French Gigaword First Edition. Linguistic Data Consortium, Philadelphia.
  • C-001424: GRONINGEN
    Desktop/Microphone
    The four CD-ROMs contain over 20 hours of speech: a corpus of read speech material in Dutch, recorded on PCM tape under fairly good conditions. The CD-ROMs contain speech from 238 speakers who read:
    · 2 short texts (the famous North wind text, and a longer text, "de Koning" by Godfried Bomans, with many quoted sentences to elicit `emotional' speech)
    · 23 short sentences (containing all possible vowels and all possible consonants and consonant clusters in Dutch),
    · 20 numbers (the numbers 0--9 and the tens from 10--100), 16 monosyllabic words (containing all possible vowels in Dutch), and 3 long vowels (a:,/E:, \i:)
    Ninety-four of the 238 speakers also read an extended word list. In addition to the speech signal, an electro-glottograph signal has been included on the CD-ROMs. Orthographic transcriptions of the material are included. Resource made available with the financial support of ELSNET.