Registered language resources: 3,330 (showing results 271-280 of 2,023)
  • C-000581: 1999 Speaker Recognition Benchmark
    *Introduction*

    The 1999 speaker recognition evaluation is part of an ongoing series of yearly evaluations conducted by NIST. These evaluations help direct research efforts and calibrate technical capabilities. They are intended to be of interest to all researchers working on the general problem of text-independent speaker recognition.

    *Data*

    The technical objectives of the 1999 speaker recognition evaluation were:

    1. Exploring promising new ideas in speaker recognition
    2. Developing advanced technology incorporating these ideas
    3. Measuring the performance of this technology

    The evaluation data was drawn from the Switchboard-2 Phase 3 corpus. Both training and test segments were constructed by concatenating consecutive turns of the desired speaker, as was done in 1996. Each segment is stored as a continuous speech signal in a separate SPHERE file, in 8-bit μ-law format.
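    Because every segment is a separate SPHERE file, reading the data starts with parsing each file's ASCII header. The sketch below assumes the commonly documented NIST_1A layout: a `NIST_1A` line, a header-length line (often 1024), `name -type value` field lines, and an `end_head` terminator. The function name and the example field values are illustrative, not taken from this corpus's documentation.

```python
def parse_sphere_header(blob: bytes) -> tuple[int, dict]:
    """Parse a NIST_1A SPHERE header; return (header_size, fields).

    Audio samples start at byte offset header_size.
    """
    magic, size_line, _ = blob.split(b"\n", 2)
    if magic.strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    header_size = int(size_line)  # int() accepts bytes with surrounding whitespace
    text = blob[:header_size].decode("ascii", errors="replace")
    fields = {}
    for line in text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # skip padding and comment lines
        name, ftype, value = line.split(None, 2)
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:  # "-sN": a string of N characters
            fields[name] = value
    return header_size, fields


# Usage with a synthetic header padded to 1,024 bytes (values are made up).
demo = (b"NIST_1A\n   1024\nsample_rate -i 8000\nchannel_count -i 1\n"
        b"sample_coding -s4 ulaw\nend_head\n").ljust(1024, b" ") + b"\xff" * 16
size, hdr = parse_sphere_header(demo)
```

    Returning the header size separately lets a caller slice `blob[size:]` to get the raw samples.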

    *Updates*

    There are no updates at this time.
  • C-000582: 2000 Communicator Dialogue Act Tagged
    *Introduction*

    2000 Communicator Dialogue Act Tagged was produced by the Linguistic Data Consortium (LDC): catalog number LDC2004T15, ISBN 1-58563-305-4.

    This corpus is an addendum to the 2000 Communicator Evaluation corpus produced by the LDC in 2002. This addendum contains annotations on the transcriptions of the system and user utterances as taken from the logfiles of the 2000 Communicator Evaluation corpus.

    Dialogue Act annotations are provided for system utterances in the dialogues. The dialogue act tags follow the DATE (Dialogue Act Tagging for Evaluation) scheme. In addition, both system and user utterances are tagged for named entities. For further description of the 2000 Communicator Evaluation corpus, please refer to the main publication from 2002 (LDC2002S56).

    *Data*

    The complete Dialogue Act annotated corpus is available as a single XML text file totalling approximately 16 MB.

    The total number of dialogues is 648. There are 314,223 words (tokens) and 1,403,985 unique words.

    Each dialogue is segmented into system and user turns. The total number of turns for the entire corpus is 24,728 (13,013 system turns and 11,715 user turns).

    Except for one system, no utterance segmentation was done within the turns in the logfiles, so the number of utterances is the same as the number of turns. Utterance segmentation was instead carried out as part of dialogue act segmentation. The total number of tagged dialogue acts is 22,701, with 61 unique tags. There are a total of 275,938 words in the system utterances and 38,285 words in the user utterances.

    Dialogue act tagging was done automatically, by pattern matching against human-labeled versions of the utterances used by the nine participating Communicator systems. Named entity tagging followed the same methodology.
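    The catalog does not publish the matching rules, but template-based dialogue act tagging of this kind can be sketched as follows. Every template and DATE-style label below is hypothetical; the real templates were derived from the labeled prompts of each participating system. Matches are returned in text order, so one turn can yield several dialogue act segments.

```python
import re

# Hypothetical human-labeled templates mapping prompt wording to a
# DATE-style (speech act, task) pair.
TEMPLATES = [
    (r"i'm sorry", ("apology", "meta")),
    (r"what city are you (?:leaving|departing) from", ("request_info", "orig_city")),
    (r"where would you like to go", ("request_info", "dest_city")),
    (r"leaving from \w+", ("implicit_confirm", "orig_city")),
]
COMPILED = [(re.compile(p, re.IGNORECASE), tag) for p, tag in TEMPLATES]


def tag_utterance(text: str) -> list[tuple[int, int, tuple[str, str]]]:
    """Return (start, end, label) for every template match, in text order."""
    spans = []
    for pattern, label in COMPILED:
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


for start, end, label in tag_utterance("I'm sorry. What city are you leaving from?"):
    print(start, end, label)
```

    Each match span doubles as a dialogue act segment boundary, which is one way segmentation can fall out of tagging.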

    *Sponsorship*

    This research was funded by DARPA under contract MDA972-99-3-0003.

    *Updates*

    There are no updates available at this time.
  • C-000583: 2000 Communicator Evaluation
    *Introduction*

    2000 Communicator Evaluation was produced by the Linguistic Data Consortium (LDC): catalog number LDC2002S56, ISBN 1-58563-258-9.

    The original goals of the Communicator program were to support the creation of speech-enabled interfaces that scale gracefully across modalities, from speech-only to interfaces that include graphics, maps, pointing and gesture. The original vision of the Communicator systems included the ability of a user, during one 10-minute session, to plan a three-leg trip, with the three flights/legs on three different days, with rental car and hotel in each of the two "away" cities, plus dictating/sending a voice-mail message.

    The actual research that led to the data collections in 2000 and 2001 explored ways to construct better spoken-dialogue systems, with which users interact via speech alone to perform relatively complex tasks such as travel planning. During 2000 and 2001, two large data sets were collected in which users did travel planning with the Communicator systems built by the research groups. The researchers improved their systems intensively during the ten months between the two data collections. This distribution consists of all the data from the 2000 collection.

    All the Communicator implementations used a common software architecture, called Galaxy-II, which was designed by a research team at MIT and adapted for Communicator in collaboration with a team at MITRE. The architecture supported detailed logging of the interaction between users and the systems.
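    Galaxy-II is a hub-based design: components exchange messages through a central hub, which is what makes complete interaction logs straightforward to produce. The toy Python class below illustrates only that idea. The server names, message fields, and the `Hub` class itself are invented for illustration and are not the Galaxy-II programming interface.

```python
import time
from typing import Callable, Dict, List, Tuple

Frame = Dict[str, object]


class Hub:
    """Toy hub: every message between servers passes through here and is logged."""

    def __init__(self) -> None:
        self.servers: Dict[str, Callable[[Frame], Frame]] = {}
        self.log: List[Tuple[float, str, Frame, Frame]] = []

    def register(self, name: str, handler: Callable[[Frame], Frame]) -> None:
        self.servers[name] = handler

    def send(self, name: str, frame: Frame) -> Frame:
        reply = self.servers[name](frame)
        # Central logging: timestamp, destination server, request, and reply.
        self.log.append((time.time(), name, frame, reply))
        return reply


# Hypothetical servers standing in for a recognizer and a dialogue manager.
hub = Hub()
hub.register("recognizer", lambda f: {"text": "boston"})
hub.register("dialogue", lambda f: {"prompt": f"Leaving from {f['text']}. Where to?"})
hyp = hub.send("recognizer", {"audio": b"\x00" * 160})
prompt = hub.send("dialogue", hyp)
```

    Because no server talks to another directly, the hub's log alone is enough to reconstruct the full exchange.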

    *Data*

    Nine sites participated in this project: ATT, BBN, Carnegie Mellon University, IBM, MIT, MITRE, NIST, SRI and University of Colorado at Boulder.

    In 2000, each user called the nine different automated travel-planning systems to make simulated flight reservations. The order in which the users encountered the systems was counterbalanced, for statistical analysis purposes. All aspects of the reservations were simulated in 2000.
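    The catalog does not say how the order was counterbalanced. One standard approach is a Latin square, in which each system appears exactly once in every calling position across a block of users. The sketch below builds a cyclic square for nine systems with placeholder names. A cyclic square balances position but not which system immediately precedes which; a Williams design is the usual refinement when carry-over effects matter.

```python
def cyclic_latin_square(items):
    """Row r is the calling order for user group r; each item appears once per row and column."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


# Placeholder system labels, not the actual site systems.
systems = [f"sys{i}" for i in range(1, 10)]
orders = cyclic_latin_square(systems)


def order_for_user(user_index):
    """Assign users to rows round-robin so every row is used equally often."""
    return orders[user_index % len(orders)]
```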

    Each user was to make nine calls. The first seven calls had an assigned hypothetical travel task, which the user got via the web. The last two calls asked the user to make simulated travel reservations for a trip that they might wish to take: they were asked to make travel plans for a vacation or pleasure trip on the eighth call and a business trip paid for by an employer on the ninth call.

    All audio files are in SPHERE format, recorded in 8-bit μ-law and PCM at 8 kHz. The files consist of the sites' recordings and the NIST recordings. The sites' recordings are at the utterance level (one channel), while the NIST recordings are continuous recordings of each whole call (two channels: user and system). The two-channel SPHERE files total about 62 hours of audio (3,415 MB), representing about 317K words of transcription. Sample checksums (sample_checksum fields) have been added to the headers of the caller-side files submitted by the sites.
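    Before analysis, μ-law audio such as these files is normally expanded to linear PCM. The sketch below uses the standard bit-level G.711 μ-law expansion, producing 16-bit signed values; it assumes the sample bytes have already been separated from the SPHERE header. The PCM files in the collection need no such step.

```python
import array

BIAS = 0x84  # 132, the bias added during μ-law encoding


def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 μ-law code to a 16-bit signed linear sample."""
    u = ~code & 0xFF                 # μ-law codes are stored bit-inverted
    exponent = (u >> 4) & 0x07       # 3-bit segment number
    mantissa = u & 0x0F              # 4-bit step within the segment
    magnitude = ((mantissa << 3) + BIAS) << exponent
    return BIAS - magnitude if u & 0x80 else magnitude - BIAS


# Precompute all 256 values once, then decode by table lookup.
ULAW_TABLE = [ulaw_to_linear(code) for code in range(256)]


def decode_ulaw(data: bytes) -> array.array:
    return array.array("h", (ULAW_TABLE[b] for b in data))


samples = decode_ulaw(bytes([0xFF, 0x80, 0x00]))
print(samples.tolist())  # [0, 32124, -32124]
```

    A 256-entry table is the usual choice because every possible input byte can be decoded in advance.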

    *Updates*

    There are no updates available at this time.

    NIST and DARPA have an Interagency Agreement by which funds are transferred to NIST. The funds supporting NIST's role in DARPA Communicator were transferred under ARPA Order No. G270.
  • C-000584: 2000 NIST Speaker Recognition Evaluation
    *Introduction*

    This publication contains the 2000 NIST Speaker Recognition Evaluation Corpus: Linguistic Data Consortium (LDC) catalog number LDC2001S97, ISBN 1-58563-192-2. The 2000 NIST Speaker Recognition Evaluation is part of an ongoing series of yearly evaluations conducted by NIST. These evaluations help direct research efforts and calibrate technical capabilities. They are intended to be of interest to all researchers working on the general problem of text-independent speaker recognition. To this end, the evaluation was designed to be simple, to focus on core technology issues, to be fully supported, and to be accessible.

    *Data*

    This publication consists of 10,328 single-channel SPHERE files encoded in 8-bit μ-law, containing approximately 4.31 GB of data in total and covering 148.9 hours of conversational telephone speech collected by the LDC.
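    The stated size and duration can be cross-checked with simple arithmetic. The sketch assumes 8 kHz sampling (standard for telephone speech, though not stated in this entry), one byte per μ-law sample, and a 1,024-byte SPHERE header per file. Under those assumptions the total comes to about 4.30 × 10^9 bytes, close to the quoted ~4.31 GB once rounding of the 148.9-hour figure is allowed for.

```python
def sphere_corpus_bytes(hours, n_files, sample_rate=8000, bytes_per_sample=1,
                        channels=1, header_bytes=1024):
    """Estimate total corpus size: raw sample bytes plus one header per file."""
    audio = round(hours * 3600 * sample_rate * bytes_per_sample * channels)
    return audio + n_files * header_bytes


total = sphere_corpus_bytes(148.9, 10_328)
print(total)                  # 4298895872
print(round(total / 1e9, 2))  # 4.3 (decimal gigabytes)
```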

    Supporting documentation for this evaluation may be found on the 2000 NIST Speaker Recognition Evaluation website.

    Please note that there was an optional additional corpus in the original Evaluation. If you are interested in this AHUMADA corpus, please contact Javier Ortega-Garcia of the Universidad Politecnica de Madrid. Information on how to contact Dr. Ortega-Garcia is available at 2000 NIST Resources.

    *Updates*

    There are no updates at this time.
  • C-000585: 2001 Communicator Dialogue Act Tagged
    *Introduction*

    2001 Communicator Dialogue Act Tagged was produced by the Linguistic Data Consortium (LDC): catalog number LDC2004T16, ISBN 1-58563-306-2.

    This corpus is an addendum to the 2001 Communicator Evaluation corpus produced by the LDC in 2003. This addendum contains annotations on the transcriptions of the system and user utterances as taken from the corrected logfiles of the 2001 Communicator Evaluation corpus. Missing or misaligned time stamps on turn/utterance boundaries were corrected by hand.

    Dialogue Act Annotations are provided for system utterances in the dialogues. The dialogue act tags follow the DATE (Dialogue Act Tagging for Evaluation) scheme. In addition, both system and user utterances are tagged for named entities. For further description of the 2001 Communicator Evaluation corpus, please refer to the main publication from 2003 (LDC2003S01).

    *Data*

    The complete Dialogue Act annotated corpus is available as a single XML text file totalling approximately 67 MB.

    The total number of dialogues is 1,683. There are 1,151,330 words (tokens) and 5,343,286 unique words.

    Each dialogue is segmented into system and user turns. The total number of turns for the entire corpus is 78,718 (39,419 system turns and 39,299 user turns). Turns were further segmented into utterances in the system logfiles. The total number of utterances is 89,666 (39,417 system utterances and 50,249 user utterances). There are a total of 1,048,311 words in the system utterances and a total of 103,019 words in the user utterances.

    The total number of tagged dialogue acts is 82,277, with 68 unique tags. Dialogue act tagging was done automatically, by pattern matching against human-labeled versions of the utterances used by the nine participating Communicator systems. Named entity tagging followed the same methodology.
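    Named entity tagging by pattern matching can be illustrated with a small gazetteer matcher. The entity lists and labels below are hypothetical, not the lists used for this corpus. Longer names are tried first so that a multi-word entry such as "new york" is not cut short by a shorter one such as "york".

```python
import re

# Hypothetical gazetteer; real entity lists would come from each system's domain data.
GAZETTEER = {
    "CITY": ["boston", "new york", "york"],
    "AIRLINE": ["united", "delta"],
}


def build_matcher(gazetteer):
    """Compile one case-insensitive regex that tries longer names first."""
    label_of = {name: label for label, names in gazetteer.items() for name in names}
    names = sorted(label_of, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b", re.IGNORECASE)
    return pattern, label_of


PATTERN, LABEL_OF = build_matcher(GAZETTEER)


def tag_entities(text):
    """Return (surface form, label) pairs in the order they occur."""
    return [(m.group(0), LABEL_OF[m.group(0).lower()]) for m in PATTERN.finditer(text)]


print(tag_entities("United from Boston to New York"))
```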

    *Sponsorship*

    This research was funded by DARPA under contract MDA972-99-3-0003.

    *Updates*

    There are no updates available at this time.
  • C-000586: 2001 Communicator Evaluation
    *Introduction*

    2001 Communicator Evaluation was produced by the Linguistic Data Consortium (LDC): catalog number LDC2003S01, ISBN 1-58563-259-7.

    The original goals of the Communicator program were to support the creation of speech-enabled interfaces that scale gracefully across modalities, from speech-only to interfaces that include graphics, maps, pointing and gesture. The original vision of the Communicator systems included the ability of a user, during one 10-minute session, to plan a three-leg trip, with the three flights/legs on three different days, with rental car and hotel in each of the two "away" cities, plus dictating/sending a voice-mail message.

    The actual research that led to the data collections in 2000 and 2001 explored ways to construct better spoken-dialogue systems, with which users interact via speech alone to perform relatively complex tasks such as travel planning. During 2000 and 2001, two large data sets were collected in which users did travel planning with the Communicator systems built by the research groups. The researchers improved their systems intensively during the ten months between the two data collections. This distribution consists of all the data from the 2001 collection.

    All the Communicator implementations used a common software architecture, called Galaxy-II, which was designed by a research team at MIT and adapted for Communicator in collaboration with a team at MITRE. The architecture supported detailed logging of the interaction between users and the systems.

    For possible updated information about the Communicator project and the data distributions, please visit the NIST website.

    *Data*

    The following sites participated in this project: ATT, BBN, Carnegie Mellon University, IBM, Lucent Bell Labs, MIT, SRI and University of Colorado at Boulder.

    All audio files have been converted into SPHERE format; there are 53,394 SPHERE files, totalling approximately 102 hours of audio. All SPHERE files are one-channel at 8 kHz, but the sample coding and format, while consistent for all files from a given site, vary across sites (for example, some sites provided PCM data, while others provided μ-law data). The documentation included in this distribution is replicated exactly as received from NIST and from the participating sites.
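    Because coding varies by site, a reader should branch on each file's own header fields rather than assume one format. The sketch below assumes the SPHERE `sample_coding` and `sample_byte_format` fields are present, with `"01"` meaning little-endian and `"10"` big-endian 16-bit samples. Compressed (for example shorten-encoded) files would need to be decompressed first and are rejected here.

```python
import array
import sys


def _ulaw_to_linear(code: int) -> int:
    """G.711 μ-law expansion of one byte to a 16-bit signed sample."""
    u = ~code & 0xFF
    magnitude = (((u & 0x0F) << 3) + 0x84) << ((u >> 4) & 0x07)
    return 0x84 - magnitude if u & 0x80 else magnitude - 0x84


_ULAW = [_ulaw_to_linear(code) for code in range(256)]


def to_linear16(samples: bytes, sample_coding: str, byte_format: str = "01") -> array.array:
    """Normalize one file's sample bytes to 16-bit linear PCM in native byte order."""
    coding = sample_coding.lower()
    if "shorten" in coding:
        raise ValueError("decompress shorten-encoded files first")
    if coding.startswith(("ulaw", "mu-law")):
        return array.array("h", (_ULAW[b] for b in samples))
    if coding.startswith("pcm"):
        pcm = array.array("h", samples)  # assumes 2-byte samples
        if (byte_format == "01") != (sys.byteorder == "little"):
            pcm.byteswap()  # file byte order differs from this machine's
        return pcm
    raise ValueError(f"unsupported sample_coding: {sample_coding!r}")
```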

    *Updates*

    There are no updates available at this time.

    NIST and DARPA have an Interagency Agreement by which funds are transferred to NIST. The funds supporting NIST's role in DARPA Communicator were transferred under ARPA Order No. G270.
  • C-000587: 2001 HUB5 English Evaluation
    *Introduction*

    2001 HUB5 English Evaluation was developed by the Linguistic Data Consortium and consists of approximately 5 hours of English conversational telephone speech and associated transcripts used in the 2001 HUB5 evaluation sponsored by NIST (National Institute of Standards and Technology).

    The HUB5 evaluation series focused on conversational speech recognition over the telephone, with the particular task of transcribing conversational speech into text. Its goals were to explore promising new areas in the recognition of conversational speech, to develop advanced technology incorporating those ideas, and to measure the performance of the new technology. Further information about the evaluation is contained in The 2001 NIST Evaluation Plan for Recognition of Conversational Speech over the Telephone, included in this release.

    *Data*

    The source data consists of conversational telephone speech collected between 1990 and 2000 under the Switchboard protocol: specifically, 20 conversations from each of Switchboard-1 Release 2 (LDC97S62), Switchboard-2 Phase III Audio (LDC2002S06), and the Switchboard cellular phone collection, namely Switchboard Cellular Part 1 Audio (LDC2001S13) and Switchboard Cellular Part 2 Audio (LDC2004S07). In the Switchboard study, recruited speakers were connected through a robot operator to carry on casual conversations about a daily topic announced by the robot operator at the start of the call.

    The audio files are two-channel μ-law recordings in SPHERE format. The corresponding transcripts are in STM format.
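    STM (segment time mark) files, as used by NIST's scoring tools, hold one reference segment per line: file name, channel, speaker, begin and end times in seconds, an optional label in angle brackets, then the transcript; lines beginning with `;;` are comments. A minimal parser under that assumption might look like this. The sample line is made up rather than quoted from this release.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StmSegment:
    filename: str
    channel: str
    speaker: str
    start: float
    end: float
    label: Optional[str]
    transcript: str


def parse_stm_line(line: str) -> Optional[StmSegment]:
    """Parse one STM line; return None for blank and ';;' comment lines."""
    line = line.strip()
    if not line or line.startswith(";;"):
        return None
    fields = line.split(None, 5)
    filename, channel, speaker, start, end = fields[:5]
    rest = fields[5] if len(fields) > 5 else ""
    label = None
    if rest.startswith("<") and ">" in rest:
        close = rest.index(">")
        label, rest = rest[: close + 1], rest[close + 1:].strip()
    return StmSegment(filename, channel, speaker, float(start), float(end), label, rest)


seg = parse_stm_line("conv_0001 A conv_0001_A 12.34 15.02 <O,M> yeah i think so")
```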

    *Updates*

    In March 2015, transcripts were added to this release along with updated documentation.
    • references: C-001283: Switchboard-1 Release 2
    • references: C-001285: Switchboard-2 Phase III Audio
    • references: Switchboard-2 Phase 4
    • isReferencedBy: (Online documentation) http://www.ldc.upenn.edu/Catalog/docs/LDC2002S13/
    • isReferencedBy: David Graff, et al. 2002. 2001 HUB5 English Evaluation. Linguistic Data Consortium, Philadelphia.
    • isReferencedBy: "The 2001 NIST Evaluation Plan for Recognition of Conversational Speech over the Telephone (a.k.a. 'Hub5')": http://www.nist.gov/speech/tests/ctr/h5_2001/pas-v1.2.pdf
    • isReferencedBy: "The 2001 NIST Evaluation Plan for Recognition of Conversational Speech over the Telephone Version 1.1": http://www.nist.gov/speech/tests/ctr/h5_2001/h5-01v1.1.pdf
  • C-000588: 2001 HUB5 Mandarin Evaluation
    *Introduction*

    The 2001 HUB5 Mandarin Evaluation, Linguistic Data Consortium (LDC) catalog number LDC2002S12, ISBN 1-58563-228-7, is part of an ongoing series of periodic evaluations conducted by NIST. These evaluations help direct research efforts and calibrate technical capabilities. They are intended to be of interest to all researchers working on the general problem of conversational speech recognition. To this end, the evaluation was designed to be simple, to focus on core speech technology issues, to be fully supported, and to be accessible.

    The evaluation was held from February 21 to March 12, 2001. Systems were required to produce character-level transcripts and character-level confidence scores for the complete set of evaluation test material.
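    Character-level output sidesteps the ambiguity of Mandarin word segmentation, and such transcripts are commonly scored by character error rate (CER): the minimum number of character substitutions, deletions, and insertions separating hypothesis and reference, divided by the number of reference characters. The sketch below computes this with a standard Levenshtein alignment. It ignores whitespace and does not reproduce NIST's scoring tools or their text normalization rules.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, keeping one DP row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion of r
                           cur[j - 1] + 1,           # insertion of h
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / reference characters."""
    ref = [c for c in reference if not c.isspace()]
    hyp = [c for c in hypothesis if not c.isspace()]
    if not ref:
        raise ValueError("reference has no characters")
    return edit_distance(ref, hyp) / len(ref)


print(character_error_rate("我们明天去北京", "我们今天去北京"))  # one substitution in 7 characters
```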

    Additional information is available at the 2001 NIST Evaluation Plan for Recognition of Conversational Speech Over the Telephone website.

    *Data*

    The test data comes from previously unexposed Mandarin CALLHOME conversations, stored in SPHERE format. There are 20 SPHERE files encoded in two-channel interleaved μ-law, totalling 441,990,656 bytes (421 Mbytes), or about eight hours of audio. The LDC transcribed these conversations and time-marked them by speaker turn.
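    In a two-channel interleaved μ-law file, the bytes after the SPHERE header alternate between the two sides of the call, one byte per sample per channel. Splitting them is a strided slice. The sketch assumes the header has already been skipped and that samples are one byte wide; 16-bit PCM would instead need a stride of two bytes per sample.

```python
def deinterleave(samples: bytes, channels: int = 2) -> list[bytes]:
    """Split interleaved one-byte samples (c0, c1, c0, c1, ...) into one byte string per channel."""
    if len(samples) % channels:
        raise ValueError("sample data is not a whole number of frames")
    return [samples[ch::channels] for ch in range(channels)]


side_a, side_b = deinterleave(b"aAbBcC")
```

    Each returned channel can then be decoded independently, for example by a μ-law lookup table.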

    An included documentation table contains information on the speech segments to be processed as follows: ...

    *Updates*

    There are no updates at this time.
    • hasVersion: C-000587: 2001 HUB5 English Evaluation
    • references: C-000660: CALLHOME Mandarin Chinese Speech
    • hasFormat: C-000589: 2001 HUB5 Mandarin Transcripts
    • isReferencedBy: (Online documentation) http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002S12
    • isReferencedBy: David Graff, et al. 2002. 2001 HUB5 Mandarin Evaluation. Linguistic Data Consortium, Philadelphia.
    • isReferencedBy: "The 2001 NIST Evaluation Plan for Recognition of Conversational Speech over the Telephone (a.k.a. 'Hub5')": http://www.nist.gov/speech/tests/ctr/h5_2001/pas-v1.2.pdf
    • isReferencedBy: "The 2001 NIST Evaluation Plan for Recognition of Conversational Speech over the Telephone Version 1.1": http://www.nist.gov/speech/tests/ctr/h5_2001/h5-01v1.1.pdf
  • C-000589: 2001 HUB5 Mandarin Transcripts
    *Introduction*

    The 2001 HUB5 Mandarin Transcripts corpus was produced by the Linguistic Data Consortium (LDC): catalog number LDC2003T01, ISBN 1-58563-252-X.

    This publication contains transcripts for twenty CALLHOME Mandarin telephone conversations. These twenty conversations were used in NIST's 2001 HUB5 Non-English evaluation, and are published as 2001 HUB5 Mandarin Evaluation, LDC catalog number LDC2002S12.

    *Data*

    There are 20 data files in .txt format.

    The .txt files are transcript files rendered in Mandarin script orthography, containing the orthographic forms that were used in the original transcription process. These forms also serve as the head-words in the associated CALLHOME Mandarin Lexicon, LDC catalog number LDC96L15.

    *Updates*

    There are no updates at this time.
  • C-000590: 2001 NIST Speaker Recognition Evaluation Corpus
    *Introduction*

    The 2001 NIST Speaker Recognition Evaluation Corpus was produced by the Linguistic Data Consortium (LDC): catalog number LDC2002S34, ISBN 1-58563-241-4.

    The 2001 NIST Speaker Recognition Evaluation is part of an ongoing series of yearly evaluations conducted by NIST. These evaluations help direct research efforts and calibrate technical capabilities. They are intended to be of interest to all researchers working on the general problem of text-independent speaker recognition. To this end, the evaluation was designed to be simple, to focus on core technology issues, to be fully supported, and to be accessible.

    The corpus is based entirely on conversational cellular telephone speech collected by the LDC.

    Supporting documentation for this evaluation may be found on the 2001 NIST Speaker Recognition Evaluation website. Consult the NIST evaluation plan for detailed instructions on using this evaluation material.

    *Data*

    The files are divided into evaluation and development data. There are 2,350 speech files in total, all compressed SPHERE files encoded as one-channel 8-bit μ-law, totalling 575,337,198 bytes (548.7 Mbytes), or 26 hours of audio.

    The evaluation data is divided into evaluation training data and evaluation test data. The training data consists of 174 speech files that are two minutes long. The test data comprises 2,038 speech files of varying lengths not exceeding sixty seconds.

    The development data is similarly divided into development training data and development test data. The training data comprises 60 speech files with durations of two minutes per target speaker. The 78 development test data files contain segments of varying length not exceeding 60 seconds.
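    The entry does not describe scoring, but NIST speaker recognition evaluations of this period treated each trial (a test segment paired with a target speaker) as a detection decision and summarized performance with a detection cost function. The sketch below computes that cost and its minimum over thresholds. The default parameters (C_miss = 10, C_fa = 1, P_target = 0.01) are believed to match the evaluations of this era but should be checked against the 2001 evaluation plan, and the example scores are made up.

```python
import math


def detection_cost(target_scores, nontarget_scores, threshold,
                   c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Weighted cost of misses and false alarms when accepting scores >= threshold."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


def min_detection_cost(target_scores, nontarget_scores, **params):
    """Minimum cost over thresholds at each observed score, plus reject-everything."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores)) + [math.inf]
    return min(detection_cost(target_scores, nontarget_scores, t, **params)
               for t in thresholds)


targets = [2.1, 1.4, 0.3]             # hypothetical scores for true-speaker trials
nontargets = [-1.0, 0.5, -0.2, 0.1]   # hypothetical scores for impostor trials
print(min_detection_cost(targets, nontargets))
```

    With P_target this low, false alarms dominate the cost, so the best threshold here rejects one genuine trial rather than accept any impostor.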

    *Updates*

    No updates are available at this time.