Language Resource Search - SHACHI: Language Resource Metadata Database

Registered language resources: 3330 (search results 331 - 340 of 2023)

C-000647: CALLHOME American English Speech
*Introduction*

CALLHOME American English Speech was developed by the Linguistic Data Consortium (LDC) and consists of 120 unscripted 30-minute telephone conversations between native speakers of English.

All calls originated in North America; 90 of the 120 calls were placed to various locations outside of North America, while the remaining 30 calls were made within North America. Most participants called family members or close friends.

*Data*

This corpus contains speech data files with documentation describing their contents and format along with the software packages needed to uncompress the speech data. Corresponding transcripts and documentation (LDC97T14) are available separately, as is an associated lexicon (LDC97L20).

*Updates*

The "shorten" and "sphere" directories have been removed.

The sphere directory contained NIST "SPeech HEader REsources" (SPHERE): C-language source code libraries and utilities for manipulating NIST SPHERE-format waveform files.
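
Every SPHERE file begins with a plain-ASCII header, even when the waveform data that follows is shorten-compressed, so basic metadata can be read without the removed C libraries. The Python sketch below assumes the usual layout: a "NIST_1A" line, a line giving the header size in bytes, then "name -type value" lines ending with "end_head". The sample header is invented for illustration; the field names follow common SPHERE usage, but the NIST documentation is the authority on which fields a given file carries.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header of a NIST SPHERE file into a dict.

    Assumed layout: "NIST_1A", a line with the header size in bytes, then
    "name -type value" lines terminated by "end_head".
    Types: -i integer, -r real, -sN string of N characters.
    """
    lines = data.split(b"\n", 2)
    if lines[0].strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    header_size = int(lines[1].strip())
    # Only the declared header region is ASCII; skip the two preamble lines.
    body = data[:header_size].decode("ascii", errors="replace").split("\n")[2:]
    fields = {}
    for line in body:
        line = line.strip()
        if line == "end_head":
            break
        if not line:
            continue
        name, ftype, value = line.split(None, 2)
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        elif ftype.startswith("-s"):
            fields[name] = value[: int(ftype[2:])]
        else:
            fields[name] = value  # unknown type: keep the raw text
    return fields


# Hypothetical header for illustration, padded to the declared 1024 bytes.
SAMPLE_HEADER = (
    "NIST_1A\n   1024\n"
    "sample_rate -i 8000\n"
    "channel_count -i 2\n"
    "sample_n_bytes -i 1\n"
    "sample_coding -s4 ulaw\n"
    "end_head\n"
).encode("ascii").ljust(1024, b" ")

print(parse_sphere_header(SAMPLE_HEADER))
```

In practice only the first header_size bytes of a real file need to be read before calling the parser.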

The shorten directory contained files for the "shorten" software for speech compression.

A more recent version of the SPHERE utilities is available on the NIST web site; additional utilities for converting SPHERE files are also available from LDC's web site.
- hasFormat: C-000648: CALLHOME American English Transcripts
- isReferencedBy: G-000646: CALLHOME American English Lexicon (PRONLEX)
- hasVersion: C-000650: CALLHOME Egyptian Arabic Speech
- hasVersion: C-000654: CALLHOME German Speech
- hasVersion: C-000657: CALLHOME Japanese Speech
- hasVersion: C-000660: CALLHOME Mandarin Chinese Speech
- hasVersion: C-000664: CALLHOME Spanish Speech
- isReferencedBy: (Online documentation) "Documentation for CALLHOME_American_English__Speech" (http://www.ldc.upenn.edu/Catalog/docs/LDC97S42/)
- isReferencedBy: Alexandra Canavan, David Graff, and George Zipperlen 1997 CALLHOME American English Speech Linguistic Data Consortium, Philadelphia
C-000648: CALLHOME American English Transcripts
*Introduction*

The text component of the CALLHOME English package includes transcripts and documentation files for 120 unscripted telephone conversations between native speakers of English. A separate catalog entry (LDC97S42) provides the speech data for these conversations. The conversations are partitioned into separate subdirectories for "training" (80 conversations), "development test set" (20 conversations) and "evaluation test set" (20 conversations).

*Data*

The transcripts cover a contiguous ten-minute segment of each call in the training and development test sets, and a five-minute segment of each call in the evaluation test set, yielding a total of 18.3 hours of transcribed spontaneous speech, comprising about 230,000 words. The transcripts are timestamped by speaker turn for alignment with the speech signal and are provided in standard orthography.
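
The 18.3-hour total follows from the partition sizes given in the introduction: 100 calls with ten-minute segments plus 20 calls with five-minute segments. A quick Python check; the dictionary keys are illustrative labels, not the corpus's actual directory names.

```python
# (conversations, transcribed minutes per conversation), as stated in the entry.
# Keys are illustrative labels only.
PARTITIONS = {
    "training": (80, 10),
    "development_test": (20, 10),
    "evaluation_test": (20, 5),
}

total_calls = sum(calls for calls, _ in PARTITIONS.values())
total_minutes = sum(calls * minutes for calls, minutes in PARTITIONS.values())
total_hours = total_minutes / 60

print(total_calls, total_minutes, round(total_hours, 1))  # 120 1100 18.3
```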

In addition to transcript files, this corpus contains full documentation on the transcription conventions and format. Complete auditing information on the speakers represented in the transcripts (including gender, channel quality and so on) is also included.

This corpus is distributed via LDC's FTP server.

The corpus of telephone speech (LDC97S42) is available separately, as is an associated lexicon (LDC97L20).

*Updates*

There are no updates at this time.
- isFormatOf: C-000647: CALLHOME American English Speech
- isReferencedBy: G-000646: CALLHOME American English Lexicon (PRONLEX)
- hasVersion: C-000652: CALLHOME Egyptian Arabic Transcripts
- hasVersion: C-000655: CALLHOME German Transcripts
- hasVersion: C-000658: CALLHOME Japanese Transcripts
- hasVersion: C-000661: CALLHOME Mandarin Chinese Transcripts
- hasVersion: C-000665: CALLHOME Spanish Transcripts
- isReferencedBy: (Online documentation) "Documentation for CALLHOME_American_English_Transcripts" (http://www.ldc.upenn.edu/Catalog/docs/LDC97T14/)
- isReferencedBy: C-003405: American National Corpus (ANC) Second Release
- isReferencedBy: C-000452: American National Corpus
C-000649: CALLHOME Egyptian Arabic Speech Supplement
*Introduction*

The CALLHOME Egyptian Arabic Speech Supplement was produced by the Linguistic Data Consortium (LDC), catalog number LDC2002S37 and ISBN 1-58563-243-0.

This publication contains 20 CALLHOME Egyptian Arabic telephone conversations. The corresponding transcripts are published as CALLHOME Egyptian Arabic Transcripts Supplement, LDC catalog number LDC2002T38. These conversations had originally been held in reserve for future NIST HUB5 Non-English evaluations, but are being "re-tasked" to provide additional material for general use.

*Data*

There are 20 data files in SPHERE format. The files are 8 kHz, shorten-compressed, two-channel mu-law. Twelve of the files were recorded from domestic phone calls (both parties living in the continental U.S.), while the other eight are overseas calls (a participant in the U.S. called a friend or relative in Egypt or some other overseas country).
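
Getting linear audio out of these files takes two steps: undoing the shorten compression, then expanding each 8-bit mu-law byte to 16-bit linear PCM. The Python sketch below covers only the second step. It implements the standard G.711 mu-law expansion (not code from the corpus distribution) and assumes the decompressed samples are interleaved one byte per channel (channel A, channel B, channel A, ...).

```python
def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 mu-law code to a 16-bit linear PCM value."""
    code = ~code & 0xFF              # mu-law bytes are stored bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # 0x84 = bias
    return -magnitude if sign else magnitude


def split_channels(samples: bytes, channels: int = 2) -> list[list[int]]:
    """Decode interleaved mu-law bytes into one list of linear samples per channel."""
    return [
        [ulaw_to_linear(b) for b in samples[ch::channels]]
        for ch in range(channels)
    ]


# Four interleaved bytes: two samples for each of two channels.
print(split_channels(bytes([0xFF, 0x00, 0xFF, 0x80])))  # [[0, 0], [-32124, 32124]]
```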

There is a total of 273,681,144 bytes (261 Mbytes) or eight hours of audio data.
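
The figures above are consistent with each other if "Mbytes" means binary megabytes (2^20 bytes), and together with the 8 kHz two-channel mu-law format they give a rough sense of the shorten compression ratio. Because "eight hours" is a rounded duration, the ratio below is only approximate.

```python
TOTAL_BYTES = 273_681_144   # stated size of the 20 compressed files
HOURS = 8                   # stated duration (rounded)
SAMPLE_RATE = 8_000         # 8 kHz
CHANNELS = 2
BYTES_PER_SAMPLE = 1        # 8-bit mu-law

mib = TOTAL_BYTES / 2**20
uncompressed = HOURS * 3600 * SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE
ratio = uncompressed / TOTAL_BYTES

print(f"{mib:.0f} MiB compressed, ~{uncompressed:,} bytes uncompressed, ratio ~{ratio:.2f}")
# 261 MiB compressed, ~460,800,000 bytes uncompressed, ratio ~1.68
```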

*Updates*

There are no updates at this time.
- hasFormat: C-000651: CALLHOME Egyptian Arabic Transcripts Supplement
- isVersionOf: C-000650: CALLHOME Egyptian Arabic Speech
- isReferencedBy: Online documentation: http://www.ldc.upenn.edu/Catalog/docs/LDC2002S37/ (the same documents as those for "CALLHOME Egyptian Arabic Speech")
- isReferencedBy: 2002 CALLHOME Egyptian Arabic Speech Supplement Linguistic Data Consortium, Philadelphia
C-000650: CALLHOME Egyptian Arabic Speech
*Introduction*

The CALLHOME Egyptian Arabic corpus of telephone speech consists of 120 unscripted telephone conversations between native speakers of Egyptian Colloquial Arabic (ECA), the spoken variety of Arabic found in Egypt. The dialect of ECA represented in this corpus is Cairene Arabic.

*Data*

All calls, which lasted up to 30 minutes, originated in North America and were placed to locations overseas (typically Egypt). Most participants called family members or close friends.

This corpus contains speech data files ONLY, along with the minimal amount of documentation needed to describe the contents and format of the speech files and the software packages needed to uncompress the speech data. The transcripts and documentation (LDC97T19) are available separately, as is an associated lexicon (LDC99L22).

*Updates*

The "shorten" and "sphere" directories have been removed.

The sphere directory contained NIST "SPeech HEader REsources" (SPHERE): C-language source code libraries and utilities for manipulating NIST SPHERE-format waveform files.

The shorten directory contained files for Tony Robinson's "shorten" software for speech compression.

A more recent version of the SPHERE utilities is now available on the NIST web site; additional utilities for converting from SPHERE to other waveform file formats are also available at the LDC web site.
- hasFormat: C-000652: CALLHOME Egyptian Arabic Transcripts
- hasVersion: C-000649: CALLHOME Egyptian Arabic Speech Supplement
- isReferencedBy: CALLHOME Egyptian Arabic Lexicon
- hasVersion: C-000647: CALLHOME American English Speech
- hasVersion: C-000654: CALLHOME German Speech
- hasVersion: C-000657: CALLHOME Japanese Speech
- hasVersion: C-000660: CALLHOME Mandarin Chinese Speech
- hasVersion: C-000664: CALLHOME Spanish Speech
- isReferencedBy: C-000567: 1997 HUB5 Arabic Evaluation
- isReferencedBy: C-003109: 2003 NIST Rich Transcription Evaluation Data
- isReferencedBy: Online documentation: http://www.ldc.upenn.edu/Catalog/docs/LDC97S45/
- isReferencedBy: "Documentation for CALLHOME_Egyptian_Arabic_Transcripts" (http://www.ldc.upenn.edu/Catalog/docs/LDC97T19/index.html)
- isReferencedBy: Alexandra Canavan, George Zipperlen, and David Graff 1997 CALLHOME Egyptian Arabic Speech Linguistic Data Consortium, Philadelphia
C-000651: CALLHOME Egyptian Arabic Transcripts Supplement
*Introduction*

The CALLHOME Egyptian Arabic Transcripts Supplement corpus was produced by the Linguistic Data Consortium (LDC), catalog number LDC2002T38 and ISBN 1-58563-244-9.

This publication contains transcripts for 20 CALLHOME Egyptian Arabic telephone conversations. The speech for these 20 conversations is published as CALLHOME Egyptian Arabic Speech Supplement (LDC2002S37). These conversations had originally been held in reserve for future NIST HUB5 Non-English evaluations, but are being "re-tasked" to provide additional material for general use.

*Data*

There are 40 data files. Each of the 20 calls has transcripts in two formats: .txt and .scr.

The .txt files are transcript files containing the Romanized orthographic forms that were used in the original transcription process. These forms also serve as the head-words in the associated Egyptian Colloquial Arabic Lexicon (LDC99L22).

The .scr files are transcript files rendered in Arabic script orthography, using the ISO 8859-6 character set; these files were derived from the .txt files by replacing each word token with its Arabic script counterpart (which is also provided in the Egyptian Colloquial Arabic Lexicon). These files have been formatted to avoid problems of bi-directional text: line-feed characters are used to separate ASCII content from Arabic script content in each utterance.
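
Reading the .scr files therefore needs an ISO 8859-6 decoder; Python ships one as the `iso8859_6` codec. The sketch below decodes a file's bytes and labels each non-empty line as ASCII-only or Arabic-script. It assumes, based on the description above but not checked against the actual files, that ASCII fields and Arabic text sit on separate lines; the sample bytes are invented for illustration.

```python
def has_arabic(text: str) -> bool:
    """True if the text contains any character from the Arabic Unicode block."""
    return any("\u0600" <= ch <= "\u06ff" for ch in text)


def split_scr_lines(raw: bytes) -> list[tuple[str, str]]:
    """Decode ISO 8859-6 bytes and tag each non-empty line as 'ascii' or 'arabic'."""
    text = raw.decode("iso8859_6")
    tagged = []
    for line in text.split("\n"):
        if not line.strip():
            continue
        tagged.append(("arabic" if has_arabic(line) else "ascii", line))
    return tagged


# Invented example: an ASCII line, then two Arabic letters (0xC7 = alef, 0xC8 = beh).
print(split_scr_lines(b"0.00 2.50 A:\n\xc7\xc8\n"))
```

Keeping the two kinds of line separate means a display layer can apply right-to-left rendering only to the Arabic lines.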


*Updates*

There are no updates at this time.
- isFormatOf: C-000649: CALLHOME Egyptian Arabic Speech Supplement
- isReferencedBy: Egyptian Colloquial Arabic Lexicon (CALLHOME Arabic Lexicon)
- isVersionOf: C-000652: CALLHOME Egyptian Arabic Transcripts
- isReferencedBy: Online documentation: http://www.ldc.upenn.edu/Catalog/docs/LDC2002T38/ (the same documents as those for "CALLHOME Egyptian Arabic Transcripts")
- isReferencedBy: LDC 2002 CALLHOME Egyptian Arabic Transcripts Supplement Linguistic Data Consortium, Philadelphia
C-000652: CALLHOME Egyptian Arabic Transcripts
*Introduction*

The text component of the CALLHOME Egyptian Arabic package includes transcripts and documentation files. The transcripts cover a contiguous five- or ten-minute segment taken from each of 120 unscripted telephone conversations between native speakers of Egyptian Colloquial Arabic (ECA), the spoken variety of Arabic found in Egypt. The dialect of ECA represented in these transcripts is Cairene Arabic.

*Data*

The transcripts are timestamped by speaker turn for alignment with the speech signal and are provided in standard orthography.
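
Speaker-turn timestamps let each transcript line be mapped onto a span of the 8 kHz audio. A minimal Python sketch, assuming a hypothetical line layout of `start end speaker: text` with times in seconds; the real layout should be taken from the corpus documentation before relying on this. Lines that do not match (headers, comments) are skipped.

```python
import re
from dataclasses import dataclass

# Hypothetical layout: "<start> <end> <speaker>: <text>", times in seconds.
TURN_RE = re.compile(r"^(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s+(\w+):\s*(.*)$")


@dataclass
class Turn:
    start: float
    end: float
    speaker: str
    text: str

    def sample_range(self, rate: int = 8000) -> tuple[int, int]:
        """Convert the turn's time span to sample indices at the given rate."""
        return round(self.start * rate), round(self.end * rate)


def parse_turns(lines):
    """Return a Turn for every line matching TURN_RE; other lines are ignored."""
    turns = []
    for line in lines:
        m = TURN_RE.match(line.strip())
        if m:
            start, end, speaker, text = m.groups()
            turns.append(Turn(float(start), float(end), speaker, text))
    return turns


turns = parse_turns(["12.34 15.00 A: hello there", "# header line", "15.10 16.00 B: yes"])
print([(t.speaker, t.sample_range()) for t in turns])
```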

In addition to transcript files, this corpus contains full documentation on the transcription conventions and format. Complete auditing information on the speakers represented in the transcripts (including gender, channel quality and so on) is also included.


The corpus of telephone speech (LDC97S45) is available separately, as is an associated lexicon (LDC99L22).

*Updates*

There are no updates at this time.
- isFormatOf: C-000650: CALLHOME Egyptian Arabic Speech
- isReferencedBy: CALLHOME Egyptian Arabic Lexicon
- hasVersion: C-000665: CALLHOME Spanish Transcripts
- hasVersion: C-000648: CALLHOME American English Transcripts
- hasVersion: C-000655: CALLHOME German Transcripts
- hasVersion: C-000658: CALLHOME Japanese Transcripts
- hasVersion: C-000661: CALLHOME Mandarin Chinese Transcripts
- hasVersion: C-000651: CALLHOME Egyptian Arabic Transcripts Supplement
- isReferencedBy: C-000568: 1997 HUB5 Arabic Transcripts
- isReferencedBy: (Online documentation) "Documentation for CALLHOME_Egyptian_Arabic_Transcripts" (http://www.ldc.upenn.edu/Catalog/docs/LDC97T19/)
- isReferencedBy: Hassan Gadalla, et al. 1997 CALLHOME Egyptian Arabic Transcripts Linguistic Data Consortium, Philadelphia
C-000654: CALLHOME German Speech
*Introduction*

The CALLHOME German corpus of telephone speech consists of 100 unscripted telephone conversations between native speakers of German.

*Data*

All calls originated in North America and were placed to locations overseas (typically Europe). Most participants called family members or close friends.

This corpus contains speech data files ONLY, along with the minimal amount of documentation needed to describe the contents and format of the speech files and the software packages needed to uncompress the speech data. The transcripts and documentation (LDC97T15) are available separately, as is an associated lexicon (LDC97L18).

*Updates*

There are no updates at this time.
- hasFormat: C-000655: CALLHOME German Transcripts
- isReferencedBy: G-000653: CALLHOME German Lexicon
- hasVersion: C-000647: CALLHOME American English Speech
- hasVersion: C-000657: CALLHOME Japanese Speech
- hasVersion: C-000660: CALLHOME Mandarin Chinese Speech
- hasVersion: C-000650: CALLHOME Egyptian Arabic Speech
- hasVersion: C-000664: CALLHOME Spanish Speech
- isReferencedBy: C-000569: 1997 HUB5 German Evaluation
- isReferencedBy: (Online documentation) "Documentation for CALLHOME_German_Speech" (http://www.ldc.upenn.edu/Catalog/docs/LDC97S43/)
- isReferencedBy: Alexandra Canavan, David Graff, and George Zipperlen 1997 CALLHOME German Speech Linguistic Data Consortium, Philadelphia
C-000655: CALLHOME German Transcripts
*Introduction*

The text component of the CALLHOME German corpus package includes transcripts and documentation files. The transcripts cover contiguous five- or ten-minute segments taken from 100 unscripted telephone conversations between native speakers of German. The transcripts are timestamped by speaker turn for alignment with the speech signal and are provided in standard orthography.

*Data*

In addition to transcript files, this corpus contains full documentation on the transcription conventions and format. Complete auditing information on the speakers represented in the transcripts (including gender, channel quality and so on) is also included.

This corpus is distributed via LDC's FTP server.

The corpus of telephone speech (LDC97S43) is available separately, as is an associated lexicon (LDC97L18).

For a list of updates, user reports, and other addenda, please go to LDC1997T15.

*Updates*

There are no updates at this time.
- isFormatOf: C-000654: CALLHOME German Speech
- isReferencedBy: G-000653: CALLHOME German Lexicon
- hasVersion: C-000648: CALLHOME American English Transcripts
- hasVersion: C-000652: CALLHOME Egyptian Arabic Transcripts
- hasVersion: C-000658: CALLHOME Japanese Transcripts
- hasVersion: C-000661: CALLHOME Mandarin Chinese Transcripts
- hasVersion: C-000665: CALLHOME Spanish Transcripts
- isReferencedBy: C-000570: 1997 HUB5 German Transcripts
- isReferencedBy: (Online documentation) "Documentation for CALLHOME_German_Transcripts" (http://www.ldc.upenn.edu/Catalog/docs/LDC97T15/)
- isReferencedBy: Krisjanis Karins, et al. 1997 CALLHOME German Transcripts Linguistic Data Consortium, Philadelphia
C-000657: CALLHOME Japanese Speech
*Introduction*

The CALLHOME Japanese corpus of telephone speech consists of 120 unscripted telephone conversations between native speakers of Japanese.

All calls, which lasted up to 30 minutes, originated in North America and were placed to locations overseas (typically Japan). Most participants called family members or close friends.

*Data*

This corpus contains speech data files ONLY, along with the minimal amount of documentation needed to describe the contents and format of the speech files and the software packages needed to uncompress the speech data. The transcripts and documentation (LDC96T18) are available separately, as is an associated lexicon and transducer (LDC96L17).

*Updates*

There are no updates at this time.
- hasFormat: C-000658: CALLHOME Japanese Transcripts
- isReferencedBy: G-000656: CALLHOME Japanese Lexicon
- hasVersion: C-000647: CALLHOME American English Speech
- hasVersion: C-000650: CALLHOME Egyptian Arabic Speech
- hasVersion: C-000654: CALLHOME German Speech
- hasVersion: C-000660: CALLHOME Mandarin Chinese Speech
- hasVersion: C-000664: CALLHOME Spanish Speech
- isReferencedBy: (Online documentation) "Documentation for CALLHOME_Japanese_Speech" (http://www.ldc.upenn.edu/Catalog/docs/LDC96S37/)
- isReferencedBy: (Online documentation 2) "Documentation for CALLHOME_Japanese_Transcripts" (http://www.ldc.upenn.edu/Catalog/docs/LDC96T18/index.html)
- isReferencedBy: Alexandra Canavan and George Zipperlen 1996 CALLHOME Japanese Speech Linguistic Data Consortium, Philadelphia
C-000658: CALLHOME Japanese Transcripts
*Introduction*

The text component of the CALLHOME Japanese package includes transcripts and documentation files.

*Data*

The transcripts cover a contiguous five- or ten-minute segment taken from each of 120 unscripted telephone conversations between native speakers of Japanese. The transcripts are timestamped by speaker turn for alignment with the speech signal and are provided in standard orthography.

In addition to transcript files, this corpus contains full documentation on the transcription conventions and format. Auditing and demographic information on the speakers represented in the transcripts (including gender, channel quality and so on) is also included.

This corpus is distributed via LDC's FTP server.

The corpus of telephone speech (LDC96S37) is available separately, as is an associated lexicon and transducer (LDC96L17).

*Updates*

There are no updates at this time.
- isFormatOf: C-000657: CALLHOME Japanese Speech
- isReferencedBy: G-000656: CALLHOME Japanese Lexicon
- hasVersion: C-000648: CALLHOME American English Transcripts
- hasVersion: C-000652: CALLHOME Egyptian Arabic Transcripts
- hasVersion: C-000655: CALLHOME German Transcripts
- hasVersion: C-000661: CALLHOME Mandarin Chinese Transcripts
- hasVersion: C-000665: CALLHOME Spanish Transcripts
- isReferencedBy: (Online documentation) "Documentation for CALLHOME_Japanese_Transcripts" (http://www.ldc.upenn.edu/Catalog/docs/LDC96T18/index.html)
- isReferencedBy: Barbara Wheatley, Masayo Kaneko, and Megumi Kobayashi 1996 CALLHOME Japanese Transcripts Linguistic Data Consortium, Philadelphia
