3330 items
- N-000001: 863 program in 2003 speech synthesis evaluation data
- N-000002: 863 program in 2004 automatic index evaluation data
- N-000003: 863 program in 2004 information index evaluation data
- C-000004: 863 program in 2004 name entry identification evaluation data
- C-000005: 863 program in 2004 speech recognition evaluation data
- C-000006: 863 program in 2005 information index evaluation data
- C-000007: 863 program in 2005 machine translation evaluation data
- C-000008: 863 program in 2005 speech recognition evaluation data
- C-000009: ASCCD-Annotated Speech Corpus of Chinese Discourse
- C-000010: AURORA Project Database - Aurora 4a - Evaluation Package
- D-000011: Adverbial Equivalence Dictionary
- C-000012: Albayzin corpus
- C-000013: Al-Hayat Arabic Corpus
- C-000014: Austrian SpeechDat(AT) FDB-1000 database
- C-000015: BABEL Bulgarian Database
- C-000016: BABEL Hungarian Database
- D-000017: Bilingual English-Russian Russian-English Dictionaries
- D-000020: Biology Database
- G-000021: British English Source Lexicon (BESL) version 2.2
- C-000022: British English SpeechDat(II) FDB-4000
- C-000023: British English SpeechDat(II) MDB-1000
- C-000024: CADCC-Chinese Annotated Dialogue and Conversation Corpus
- C-000025: CASIA single syllable isolated word speech corpus
- C-000026: CASIA-863 Chinese Speech Synthesis Corpus
- N-000027: CASIA-Chinese Emotional Speech Corpus
- C-000028: CASIA-Chinese Question Structures Corpus
- C-000029: CASIA-Mandarin continuous digit speech corpus
- N-000030: CASIA-Northern China accent speech corpus
- N-000031: CASIA-Southern China accent speech corpus
- N-000032: CASIA98-99 speech testing library
- C-000033: CASIA-The weather forecast broadcasts the pronunciation storehouse
- G-000034: CELEX Dutch lexical database - Derivational Morphology Subset
- G-000035: CELEX Dutch lexical database - Frequency Subset
- G-000036: CELEX Dutch lexical database - Inflectional Morphology Subset
- G-000037: CELEX Dutch lexical database - Orthography Subset
- G-000038: CELEX Dutch lexical database - Phonology Subset
- G-000039: CELEX Dutch lexical database - Syntax Subset
- G-000040: Chinese Lexicon
- C-000041: Mandarin Chinese Speech Recognition Corpus (desktop) - single Chinese sentence (200 people)
- C-000042: Mandarin Chinese Speech Recognition Corpus (desktop) - person name (200 people)
- C-000043: Mandarin Chinese Speech Recognition Corpus (telephone channel) - Chinese single sentence (100 people)
- C-000044: Mandarin Chinese Speech Recognition Corpus (telephone channel) - digit string (100 people)
- C-000045: Mandarin Chinese Speech Recognition Corpus (telephone channel) - person name (100 people)
- C-000046: Mandarin Chinese Speech Recognition Corpus (telephone channel) - place name (100 people)
- D-000047: Computer Science Database
- D-000048: Concise Oxford Spanish Dictionary
- D-000049: Concise Oxford-Duden German Dictionary
- C-000050: Corpus of Contemporaneous Spanish Novels
- D-000051: DST Dictionary - String Dictionary (DST)
- D-000052: DST Dictionary - The whole dictionary (DST)
- C-000053: Danish SpeechDat(II) FDB-4000
- D-000054: Dictionary of French verbs (SINEQUA - Jean Dubois)
- D-000055: Dictionary of Law
- D-000056: Dictionary of Medicine
- D-000057: Dictionary of affixes (SINEQUA - Jean Dubois)
- D-000058: Dictionary of verb phrases (SINEQUA - Jean Dubois)
- G-000059: Dutch Lexicon
- G-000060: Dutch PAROLE lexicon
- C-000061: EUROM1g German
- D-000062: English lexicon with morphological information
- D-000063: English lexicon
- G-000064: English-French Lexicon (LanTmark)
- D-000065: EuroWordNet Dutch
- C-000066: Finnish Speechdat(II) FDB-1000
- C-000067: Flemish SpeechDat(II) FDB-1000
- G-000068: French Lexicon
- G-000069: French Source Lexicon
- G-000070: French lexicon with morphological information
- G-000071: French-Dutch Lexicon (LanTmark)
- C-000072: German SpeechDat(II) FDB-4000
- G-000073: German lexicon
- D-000074: German-Danish dictionaries (Institut for Erhvervsinformatik)
- C-000075: Greek SpeechDat(II) FDB-5000
- D-000076: Hydrogeology database
- C-000077: IBNC - An Italian Broadcast News Corpus
- C-000078: IDIOLOGOS 1 Bootstrap (NEOLOGOS Project)
- C-000079: ILPho phonetic lexicon
- C-000080: ISLE Speech Corpus
- G-000081: Italian lexicon with morphological information
- G-000082: Korean Lexicon
- C-000083: LC-STAR Catalan phonetic lexicon
- G-000084: LC-STAR English-Hebrew (Israel) Bilingual Aligned Phrasal lexicon
- G-000085: LC-STAR US English phonetic lexicon
- C-000086: "Le Monde Diplomatique" Text corpus in English
- C-000087: "Le Monde Diplomatique" Text corpus in French - archives 1980-1998
- C-000088: "Le Monde Diplomatique" Text corpus in French - archives from 1999
- G-000089: LusoLEX European Portuguese Lexicon (LusoLEX)
- C-000090: MICROAES
- G-000091: MULTEXT Lexicons
- C-000092: Mandarin Chinese Speech Recognition Corpus (desktop) - digit string (119 people)
- C-000093: Mandarin Chinese Speech Recognition Corpus (desktop) - person name (120 people)
- C-000094: Mandarin Chinese Speech Recognition Corpus (in the car) - person name, place name in Beijing, stocks, digit string (20 people)
- C-000095: Mandarin Chinese Speecon database
- D-000096: Monolingual Danish Lexicon
- D-000097: New Oxford Dictionary of English, 2nd Edition
- T-000098: New Oxford Thesaurus of English
- D-000099: Nominalisation Dictionary
- C-000100: OrienTel Morocco MCA (Modern Colloquial Arabic) database
- D-000101: Oxford Business French Dictionary
- G-000102: Oxford Business Spanish Dictionary
- C-000103: Oxford English phonetics files
- D-000104: Oxford French Minidictionary
- T-000105: Oxford Paperback Thesaurus, 2nd edition
- T-000106: PAROLE English lexicon
- C-000107: PAROLE Portuguese Corpus - complete version
- C-000108: PAROLE Portuguese Corpus - tagged subset
- T-000109: PAROLE Spanish Lexicon
- C-000110: PAROLE-SIMPLE-CLIPS PISA Italian Lexicon Phonetic layer
- D-000111: PAROLE-SIMPLE-CLIPS PISA Italian Lexicon Morphological layer
- D-000112: PAROLE-SIMPLE-CLIPS PISA Italian Lexicon Semantic layer
- G-000113: PAROLE-SIMPLE-CLIPS PISA Italian Lexicon Syntactic layer
- D-000114: POLEX Polish Lexicon
- D-000115: Pedology database
- D-000116: Pocket Oxford Italian Dictionary
- C-000117: Polish SpeechDat(E) Database
- C-000118: Portuguese SpeechDat(II) FDB-4000
- C-000119: Portuguese SpeechDat(M) database
- C-000120: Portuguese Speecon database
- G-000121: Pronunciation lexicon of British place names, surnames and first names
- C-000123: RASC863-annotated 4 regional accent speech corpus(II)
- C-000124: Russian SpeechDat(E) Database
- C-000125: SALA II Spanish Mobile Network Database collected in Venezuela
- C-000126: SALA II Spanish from Mexico database
- C-000127: SALA Spanish Mexican Database
- C-000128: SALA Spanish Venezuelan Database
- D-000129: SCI-ANES English-Spanish Bilingual Dictionary
- D-000130: SCI-FRAL-EURADIC French-German Bilingual Dictionary
- D-000131: SCI-FRAN-EURADIC French-English Bilingual Dictionary
- D-000132: Shorter Oxford English Dictionary - Audio Files
- C-000133: Slovak SpeechDat(E) Database
- C-000134: Spanish SpeechDat(II) FDB-1000
- C-000135: Spanish SpeechDat(II) FDB-4000
- C-000136: Spanish Speecon database
- G-000137: Spanish gilcUB-M Dictionary
- C-000138: Swedish SpeechDat(II) FDB-1000
- C-000139: Swedish SpeechDat(II) FDB-5000
- C-000140: Swedish SpeechDat(II) MDB-1000
- C-000141: Swiss-French SpeechDat(M)
- C-000142: TED Translanguage English Database
- C-000143: TEDphone (Polyphone-like Translanguage English Database)
- D-000144: THAMUS Bilingual dictionaries - Computer Science (4)
- D-000145: THAMUS Bilingual dictionaries - Engineering (4)
- G-000146: THAMUS Bilingual dictionaries - Law (4)
- G-000147: THAMUS Bilingual dictionaries - Medicine (2)
- N-000148: TSC973-Telephone Speech Corpus 973
- D-000149: Terminology database of telecommunication
- C-000150: Telephone Speech Data Collection for Czech
- C-000151: Telephone speech corpus for recognition-Male voice
- C-000152: Text of Northern Bunun (Taiwan)...New TextTexts
- C-000153: The EMILLE/CIIL Corpus
- D-000154: The Oxford Spanish Dictionary
- G-000155: The grammatical knowledge-base of contemporary Chinese (high frequency words)
- C-000156: The identifiable speech database of tabletop speech--the people's name, the place's name (120 persons)
- D-000157: Tri-, quadri-, pentagrams dictionaries
- T-000158: VERBA Polytechnic and Plurilingual Terminological Database - A-QG Metrology
- T-000159: VERBA Polytechnic and Plurilingual Terminological Database - B-AA General Chemistry
- T-000160: VERBA Polytechnic and Plurilingual Terminological Database - B-AB Analytical Chemistry
- T-000161: VERBA Polytechnic and Plurilingual Terminological Database - B-AC Inorganic Chemistry
- T-000162: VERBA Polytechnic and Plurilingual Terminological Database - B-AD Organic Chemistry
- T-000163: VERBA Polytechnic and Plurilingual Terminological Database - B-AE Physical Chemistry
- T-000164: VERBA Polytechnic and Plurilingual Terminological Database - B-MA Acoustics
- T-000165: VERBA Polytechnic and Plurilingual Terminological Database - B-MB Electricity
- T-000166: VERBA Polytechnic and Plurilingual Terminological Database - B-MC Electromechanics
- T-000167: VERBA Polytechnic and Plurilingual Terminological Database - B-MD Spectrography
- T-000168: VERBA Polytechnic and Plurilingual Terminological Database - B-ME Solid State Physics
- T-000169: VERBA Polytechnic and Plurilingual Terminological Database - B-MF General Physics
- T-000170: VERBA Polytechnic and Plurilingual Terminological Database - B-MG Atomic Physics
- T-000171: VERBA Polytechnic and Plurilingual Terminological Database - B-MH Particle Physics
- T-000172: VERBA Polytechnic and Plurilingual Terminological Database - B-MI Plasma Physics
- T-000173: VERBA Polytechnic and Plurilingual Terminological Database - B-MJ Nuclear Physics
- T-000174: VERBA Polytechnic and Plurilingual Terminological Database - B-MK General Mechanics
- T-000175: VERBA Polytechnic and Plurilingual Terminological Database - B-ML Quantum Mechanics
- T-000176: VERBA Polytechnic and Plurilingual Terminological Database - B-MM Statistical Mechanics
- T-000177: VERBA Polytechnic and Plurilingual Terminological Database - B-MN Fluid Mechanics
- T-000178: VERBA Polytechnic and Plurilingual Terminological Database - B-MO Nucleonics
- T-000179: VERBA Polytechnic and Plurilingual Terminological Database - B-MP Optics
- T-000180: VERBA Polytechnic and Plurilingual Terminological Database - B-MQ Relativity
- T-000181: VERBA Polytechnic and Plurilingual Terminological Database - B-MR Thermodynamics
- T-000182: VERBA Polytechnic and Plurilingual Terminological Database - C-AB Geography
- T-000183: VERBA Polytechnic and Plurilingual Terminological Database - C-AC Geology
- T-000184: VERBA Polytechnic and Plurilingual Terminological Database - C-LA Hydrology
- T-000185: VERBA Polytechnic and Plurilingual Terminological Database - C-LB Oceanography
- T-000186: VERBA Polytechnic and Plurilingual Terminological Database - C-RE Energy Resources
- T-000187: VERBA Polytechnic and Plurilingual Terminological Database - D-AE Climate Control
- C-000188: VERBMOBIL - VM CD 1.1 (new edition)
- C-000189: VERBMOBIL - VM CD 12.1 (new edition)
- C-000190: VERBMOBIL - VM CD 14.1 (new edition)
- C-000191: VERBMOBIL - VM CD 2.1 (new edition)
- C-000192: VERBMOBIL - VM CD 3.1 (new edition)
- C-000193: VERBMOBIL - VM CD 4.1 (new edition)
- C-000194: VERBMOBIL - VM CD 5.1 (new edition)
- C-000195: VERBMOBIL - VM CD 6.1 (new edition)
- C-000196: VERBMOBIL - VM CD 7.1 (new edition)
- C-000197: VERBMOBIL - VM CD S 1.0 (original edition)
- C-000198: VERBMOBIL II - VM CD 22.1 - VM22.1 (BAS edition)
- C-000199: VERBMOBIL II - VM CD 24.1 - VM24.1 (BAS edition)
- C-000200: VERBMOBIL II - VM CD 25.1 - VM25.1 (BAS edition)
- C-000201: VERBMOBIL II - VM CD 26.1 - VM26.1 (BAS edition)
- C-000202: VERBMOBIL II - VM CD 27.1 - VM27.1 (BAS edition)
- C-000203: VERBMOBIL II - VM CD 29.1 - VM29.1 (BAS edition)
- C-000204: VERBMOBIL II - VM CD 33.1 - VM33.1 (BAS edition)
- C-000205: VERBMOBIL II - VM CD 34.1 - VM34.1 (BAS edition)
- C-000206: VERBMOBIL II - VM CD 35.1 - VM35.1 (BAS edition)
- C-000207: VERBMOBIL II - VM CD 38.1 - VM38.1 (BAS edition)
- C-000208: VERBMOBIL II - VM CD 39.1 - VM39.1 (BAS edition)
- C-000209: VERBMOBIL II - VM CD 48.1 - VM48.1 (BAS edition)
- C-000210: VERBMOBIL II - VM CD 50.1 - VM50.1 (BAS edition)
- C-000211: VERBMOBIL II - VM CD20.1 - VM20.1 (new edition)
- C-000212: VERBMOBIL II - VM CD21.1 - VM21.1 (new edition)
- C-000213: VERBMOBIL II - VM Lexicon database - VMLEX (BAS edition)
- C-000214: Welsh SpeechDat(II) FDB-2000
- G-000217: VERBA Polytechnic and Plurilingual Terminological Database - D-KA Water Cycle
- G-000218: VERBA Polytechnic and Plurilingual Terminological Database - D-KB Solid Waste Treatment
- G-000219: VERBA Polytechnic and Plurilingual Terminological Database - D-KC Laboratory Techniques
- G-000220: VERBA Polytechnic and Plurilingual Terminological Database - D-KE Environmental Technology
- G-000221: VERBA Polytechnic and Plurilingual Terminological Database - E-AA Health Materials and Equipment
- G-000222: VERBA Polytechnic and Plurilingual Terminological Database - E-AB Hospital Services
- G-000223: VERBA Polytechnic and Plurilingual Terminological Database - E-AC Hospital Management
- G-000224: VERBA Polytechnic and Plurilingual Terminological Database - E-AD Pharmacology
- G-000225: VERBA Polytechnic and Plurilingual Terminological Database - E-AF General Medicine
- G-000226: VERBA Polytechnic and Plurilingual Terminological Database - F-AA Agrarian Economics
- G-000227: VERBA Polytechnic and Plurilingual Terminological Database - F-AB Farming Activities and Techniques
- G-000228: VERBA Polytechnic and Plurilingual Terminological Database - F-AC Edafology
- G-000229: VERBA Polytechnic and Plurilingual Terminological Database - F-AD Drainage and Irrigation
- G-000230: VERBA Polytechnic and Plurilingual Terminological Database - F-AE Fertilizers
- G-000231: VERBA Polytechnic and Plurilingual Terminological Database - F-AF Pest Protection
- G-000232: VERBA Polytechnic and Plurilingual Terminological Database - F-AL Arboriculture and Viticulture
- G-000233: VERBA Polytechnic and Plurilingual Terminological Database - F-AM Trees and Bushes
- G-000234: VERBA Polytechnic and Plurilingual Terminological Database - F-HB Aviculture, Cuniculture, Apiculture
- G-000235: VERBA Polytechnic and Plurilingual Terminological Database - F-HD Animal Health and Nutrition
- G-000236: VERBA Polytechnic and Plurilingual Terminological Database - F-MA Meat Industry
- G-000237: VERBA Polytechnic and Plurilingual Terminological Database - G-AA Computing-General Topics
- G-000238: VERBA Polytechnic and Plurilingual Terminological Database - G-AB Peripherals
- G-000239: VERBA Polytechnic and Plurilingual Terminological Database - G-AE Applications and Services
- G-000240: VERBA Polytechnic and Plurilingual Terminological Database - G-AH Data Processing
- G-000241: VERBA Polytechnic and Plurilingual Terminological Database - G-AN Data Transmission
- G-000242: VERBA Polytechnic and Plurilingual Terminological Database - G-AU General Terminology
- G-000243: VERBA Polytechnic and Plurilingual Terminological Database - G-BC Essays
- G-000244: VERBA Polytechnic and Plurilingual Terminological Database - G-GF Microelectronics
- G-000245: VERBA Polytechnic and Plurilingual Terminological Database - G-GH Cybernetics
- G-000246: VERBA Polytechnic and Plurilingual Terminological Database - G-GJ Cathode Rays
- G-000247: VERBA Polytechnic and Plurilingual Terminological Database - G-GM Semi- and Super-Conductors
- G-000248: VERBA Polytechnic and Plurilingual Terminological Database - G-GP Electronics-General Topics
- G-000249: VERBA Polytechnic and Plurilingual Terminological Database - G-GR Ionics
- G-000250: VERBA Polytechnic and Plurilingual Terminological Database - G-GU Magnetics Recording and Playback
- G-000251: VERBA Polytechnic and Plurilingual Terminological Database - G-GY Integrated Circuits
- G-000252: VERBA Polytechnic and Plurilingual Terminological Database - G-GZ Electronic Office
- G-000253: VERBA Polytechnic and Plurilingual Terminological Database - G-HL Components and Material
- G-000254: VERBA Polytechnic and Plurilingual Terminological Database - G-NA Radioelectric Broadcasting
- G-000255: VERBA Polytechnic and Plurilingual Terminological Database - G-NB Radar
- G-000256: VERBA Polytechnic and Plurilingual Terminological Database - G-NF Cables and Conductors
- G-000257: VERBA Polytechnic and Plurilingual Terminological Database - G-NH Radiocommunications
- G-000258: VERBA Polytechnic and Plurilingual Terminological Database - G-NM T.V.
- G-000259: VERBA Polytechnic and Plurilingual Terminological Database - G-NQ Telecomms Lines and Devices
- G-000260: VERBA Polytechnic and Plurilingual Terminological Database - G-NR Telephone and Telegraph
- G-000261: VERBA Polytechnic and Plurilingual Terminological Database - G-NZ Telecommunications-General Topics
- G-000262: VERBA Polytechnic and Plurilingual Terminological Database - G-OB Control
- G-000263: VERBA Polytechnic and Plurilingual Terminological Database - G-OF Signalling
- G-000264: VERBA Polytechnic and Plurilingual Terminological Database - G-OG Switching Devices
- G-000265: VERBA Polytechnic and Plurilingual Terminological Database - G-SA Electrical Systems
- G-000266: VERBA Polytechnic and Plurilingual Terminological Database - G-SB Instrumentation
- G-000267: VERBA Polytechnic and Plurilingual Terminological Database - H-AB Reinforced Concrete
- G-000268: VERBA Polytechnic and Plurilingual Terminological Database - H-GA Architecture
- G-000269: VERBA Polytechnic and Plurilingual Terminological Database - H-GE Construction-General Topics
- G-000270: VERBA Polytechnic and Plurilingual Terminological Database - H-GG Town Planning
- G-000271: VERBA Polytechnic and Plurilingual Terminological Database - VERBA Polytechnic and Plurilingual Terminological Database
- G-000272: VERBA Polytechnic and Plurilingual Terminological Database - I-AA Metal and Steel Foundries
- G-000273: VERBA Polytechnic and Plurilingual Terminological Database - I-AB Oil Industry
- G-000274: VERBA Polytechnic and Plurilingual Terminological Database - I-AC Automobile Industry
- G-000275: VERBA Polytechnic and Plurilingual Terminological Database - I-AD Textile Industry
- G-000276: VERBA Polytechnic and Plurilingual Terminological Database - I-MA Aerospace Engineering
- G-000277: VERBA Polytechnic and Plurilingual Terminological Database - I-MB Engineering Design
- G-000278: VERBA Polytechnic and Plurilingual Terminological Database - I-MC Mechanical Engineering
- G-000279: VERBA Polytechnic and Plurilingual Terminological Database - I-MG Control Systems
- G-000280: VERBA Polytechnic and Plurilingual Terminological Database - I-MJ Hydraulic Engineering
- G-000281: VERBA Polytechnic and Plurilingual Terminological Database - I-MM Air-Conditioning
- G-000282: VERBA Polytechnic and Plurilingual Terminological Database - I-MN Outfitting
- G-000283: VERBA Polytechnic and Plurilingual Terminological Database - I-MO Tools
- N-000284: VERBA Polytechnic and Plurilingual Terminological Database - I-MY Machine Tools
- G-000285: VERBA Polytechnic and Plurilingual Terminological Database - I-QN Industry, General Topics
- G-000286: VERBA Polytechnic and Plurilingual Terminological Database - I-TA Paints
- G-000287: VERBA Polytechnic and Plurilingual Terminological Database - I-TB Products and Ingredients
- G-000288: VERBA Polytechnic and Plurilingual Terminological Database - I-TC Manufacturing, General Topics
- G-000289: VERBA Polytechnic and Plurilingual Terminological Database - L-AA Transport, General Topics
- N-000290: VERBA Polytechnic and Plurilingual Terminological Database - L-AG Sea Shipping
- G-000291: VERBA Polytechnic and Plurilingual Terminological Database - L-AH Infrastructure
- G-000292: VERBA Polytechnic and Plurilingual Terminological Database - L-MA Transport Vehicles
- G-000293: VERBA Polytechnic and Plurilingual Terminological Database - M-AA Law, General Topics
- G-000294: VERBA Polytechnic and Plurilingual Terminological Database - M-AB Criminal Law
- G-000295: VERBA Polytechnic and Plurilingual Terminological Database - M-AC Civil Law
- G-000296: VERBA Polytechnic and Plurilingual Terminological Database - M-AD Politics
- G-000297: VERBA Polytechnic and Plurilingual Terminological Database - M-AF Financial Law
- G-000298: VERBA Polytechnic and Plurilingual Terminological Database - M-AI International Law
- G-000299: VERBA Polytechnic and Plurilingual Terminological Database - M-AL Marine Law
- G-000300: VERBA Polytechnic and Plurilingual Terminological Database - M-AM Company Law
- G-000301: VERBA Polytechnic and Plurilingual Terminological Database - M-AP Court Procedure
- G-000302: VERBA Polytechnic and Plurilingual Terminological Database - M-AR Roman Law
- G-000303: VERBA Polytechnic and Plurilingual Terminological Database - M-AT Labour Law
- G-000304: VERBA Polytechnic and Plurilingual Terminological Database - M-KD State Administration
- G-000305: VERBA Polytechnic and Plurilingual Terminological Database - M-RA Politics, General Topics
- G-000306: VERBA Polytechnic and Plurilingual Terminological Database - M-RB Diplomacy
- G-000307: VERBA Polytechnic and Plurilingual Terminological Database - M-RC Politics and International Co-operation
- G-000308: VERBA Polytechnic and Plurilingual Terminological Database - M-RD International Conferences
- G-000309: VERBA Polytechnic and Plurilingual Terminological Database - M-RE International Treaties
- G-000310: VERBA Polytechnic and Plurilingual Terminological Database - M-RF International Institutions
- G-000311: VERBA Polytechnic and Plurilingual Terminological Database - M-RG International Courts
- G-000312: VERBA Polytechnic and Plurilingual Terminological Database - M-RH Armed Conflicts
- G-000313: VERBA Polytechnic and Plurilingual Terminological Database - N-AA Economic Growth
- G-000314: VERBA Polytechnic and Plurilingual Terminological Database - N-AB Economic Cycles
- G-000315: VERBA Polytechnic and Plurilingual Terminological Database - N-AC Economic Policy
- G-000316: VERBA Polytechnic and Plurilingual Terminological Database - N-AD Macroeconomics
- G-000317: VERBA Polytechnic and Plurilingual Terminological Database - N-AE Microeconomics
- G-000318: VERBA Polytechnic and Plurilingual Terminological Database - N-AF History of Economics
- G-000319: VERBA Polytechnic and Plurilingual Terminological Database - N-AG Economic Structure
- G-000320: VERBA Polytechnic and Plurilingual Terminological Database - N-AH Accounting
- G-000321: VERBA Polytechnic and Plurilingual Terminological Database - N-AI State Exchequer
- G-000322: VERBA Polytechnic and Plurilingual Terminological Database - N-AJ Natural Resources and Environment
- G-000323: VERBA Polytechnic and Plurilingual Terminological Database - N-AK Statistics
- G-000324: VERBA Polytechnic and Plurilingual Terminological Database - N-AL European Union
- G-000325: VERBA Polytechnic and Plurilingual Terminological Database - N-AM Regional and Urban
- G-000326: VERBA Polytechnic and Plurilingual Terminological Database - N-AN Labour Economy
- G-000327: VERBA Polytechnic and Plurilingual Terminological Database - N-AO Agricultural Economy
- G-000328: VERBA Polytechnic and Plurilingual Terminological Database - N-AU Demography
- G-000329: VERBA Polytechnic and Plurilingual Terminological Database - N-AW Economic Institutions
- G-000330: VERBA Polytechnic and Plurilingual Terminological Database - N-AX Economics of Real Estate
- G-000331: VERBA Polytechnic and Plurilingual Terminological Database - N-BB Social Welfare
- G-000332: VERBA Polytechnic and Plurilingual Terminological Database - N-BC Economics, General Topics
- G-000333: VERBA Polytechnic and Plurilingual Terminological Database - N-LA Trade, General Topics
- G-000334: VERBA Polytechnic and Plurilingual Terminological Database - N-LB Foreign Trade
- G-000335: VERBA Polytechnic and Plurilingual Terminological Database - N-LC Measures and Currencies
- G-000336: VERBA Polytechnic and Plurilingual Terminological Database - N-LF Marketing
- G-000337: VERBA Polytechnic and Plurilingual Terminological Database - N-LG Import-Export
- G-000338: VERBA Polytechnic and Plurilingual Terminological Database - N-LH Distribution
- G-000339: VERBA Polytechnic and Plurilingual Terminological Database - N-LP Business Correspondence
- G-000340: VERBA Polytechnic and Plurilingual Terminological Database - N-PA Banking
- G-000341: VERBA Polytechnic and Plurilingual Terminological Database - N-PB Stock Markets
- G-000342: VERBA Polytechnic and Plurilingual Terminological Database - N-PC International Finance
- G-000343: VERBA Polytechnic and Plurilingual Terminological Database - N-PD Money and Currencies
- G-000344: VERBA Polytechnic and Plurilingual Terminological Database - N-PE Insurance
- G-000345: VERBA Polytechnic and Plurilingual Terminological Database - N-PF Financial Services
- G-000346: VERBA Polytechnic and Plurilingual Terminological Database - N-TA Business Management
- G-000347: VERBA Polytechnic and Plurilingual Terminological Database - N-TB Human Resources
- G-000348: VERBA Polytechnic and Plurilingual Terminological Database - N-TC Quality Control
- G-000349: VERBA Polytechnic and Plurilingual Terminological Database - N-TD Manufacturing and Logistics
- G-000350: VERBA Polytechnic and Plurilingual Terminological Database - N-TF Business Finance
- G-000351: VERBA Polytechnic and Plurilingual Terminological Database - N-TH Business Computing
- G-000352: VERBA Polytechnic and Plurilingual Terminological Database - S-AA Anatomy
- G-000353: VERBA Polytechnic and Plurilingual Terminological Database - S-AC Biology
- G-000354: VERBA Polytechnic and Plurilingual Terminological Database - S-AE Biochemistry
- G-000355: VERBA Polytechnic and Plurilingual Terminological Database - S-AF General Botany
- G-000356: VERBA Polytechnic and Plurilingual Terminological Database - S-AG Cytology
- G-000357: VERBA Polytechnic and Plurilingual Terminological Database - S-AH Ecology
- G-000358: VERBA Polytechnic and Plurilingual Terminological Database - S-AI Embryology
- G-000359: VERBA Polytechnic and Plurilingual Terminological Database - S-AK General Physiology
- G-000360: VERBA Polytechnic and Plurilingual Terminological Database - S-AL Genetics
- G-000361: VERBA Polytechnic and Plurilingual Terminological Database - S-AM Histology
- G-000362: VERBA Polytechnic and Plurilingual Terminological Database - S-AN Mycology
- G-000363: VERBA Polytechnic and Plurilingual Terminological Database - S-AO Microbiology
- G-000364: VERBA Polytechnic and Plurilingual Terminological Database - S-AQ Palaeontology
- C-000365: VERBMOBIL II - VM CD 45.1 - VM45.1 (BAS edition)
- C-000366: VERBMOBIL II - VM Bonus CD - VMBONUS (BAS edition)
- C-000367: VERBMOBIL II - VM CD 18.1 - VM18.1 (new edition)
- C-000368: VERBMOBIL II - VM CD 16.1 - VM16.1 (new edition)
- C-000369: VERBMOBIL II - VM CD 62.1 - VM62.1 (BAS edition)
- C-000370: VERBMOBIL II - VM CD 63.0 - VM63.0 (original edition)
- C-000371: VERBMOBIL II - VM CD 17.1 - VM17.1 (new edition)
- C-000372: VERBMOBIL II - VM CD 19.1 - VM19.1 (new edition)
- C-000373: VERBMOBIL II - VM CD 65.0 - VM65.0 (original edition)
- C-000374: VERBMOBIL II - VM CD 53.1 - VM53.1 (BAS edition)
- C-000375: VERBMOBIL II - VM CD 60.1 - VM60.1 (BAS edition)
- C-000376: VERBMOBIL II - VM CD 61.1 - VM61.1 (BAS edition)
- C-000377: VERBMOBIL II - VM CD 64.0 - VM64.0 (original edition)
- D-000378: MultiWordNet database (included semantic fields) (MultiWordNet)
- D-000379: Labelling of WordNet 1.6 with semantic fields (WordNet Domains)
- C-000380: French SpeechDat-Car
- C-000381: Danish SpeechDat-Car - GSM recordings - GSM recordings only
- C-000382: Danish SpeechDat-Car - In-car recordings
- C-000383: Finnish SpeechDat-Car
- D-000384: NODE+DIMAP
- C-000385: Flemish/Dutch SpeechDat-Car database
- C-000386: Spanish SpeechDat-Car database
- C-000387: Italian SpeechDat-Car database
- N-000388: AURORA Project Database - Aurora 4b - Evaluation Package
- C-000389: Belgian-French SpeechDat(II) FDB-1000
- C-000390: Luxembourgish-French SpeechDat(II) FDB-500 database
- C-000391: Luxembourgish-German SpeechDat(II) FDB-500
- C-000392: American English SpeechDat-Car
- C-000393: British-English SpeechDat-Car
- C-000394: Austrian SpeechDat(AT) MDB-1000 database
- C-000395: M2VTS Speaker Verification Database
- C-000396: ILSP/ELEFTHEROTYPIA Corpus (Greek corpus)
- C-000397: British English SpeechDat(II) SDB-2400
- C-000398: Greek SpeechDat-Car
- G-000399: PHONOLEX (BAS/DFKI)
- G-000401: STO SprogTeknologisk Ordbase (Danish Lexicon for NLP/HLT Applications)
- C-000402: SALA II Spanish from Costa Rica database
- D-000403: Bulgarian WordNet
- C-000404: SALA II Spanish from Argentina database
- C-000405: OrienTel French as spoken in Tunisia database
- C-000406: OrienTel French as spoken in Morocco database
- C-000408: OrienTel Hebrew database
- C-000409: OrienTel Arabic as spoken in Israel database
- C-000410: ZipTel
- C-000411: Venice Italian Treebank (VIT)
- G-000412: LC-STAR Spanish phonetic lexicon
- G-000413: Multilingual Wordbank
- G-000414: Multilingual Phrasebank
- C-000415: German Speecon database
- C-000416: BITS Logatome Synthesis Corpus BITS-LG
- T-000417: PAROLE Italian Corpus
- C-000418: Italian Syntactic-Semantic Treebank (ISST)
- C-000419: PAIDIALOGOS (NEOLOGOS Project)
- C-000420: TC-STAR 2006 Evaluation Package - ASR English
- C-000421: TC-STAR 2006 Evaluation Package - ASR Spanish - CORTES
- C-000422: TC-STAR 2006 Evaluation Package - ASR Mandarin Chinese
- C-000423: TC-STAR 2006 Evaluation Package - SLT English-to-Spanish
- C-000424: TC-STAR 2006 Evaluation Package - SLT Spanish-to-English - CORTES
- C-000426: Kids01 - PC Environment Children's Speech Corpus
- C-000427: Car Environment Speech DB
- C-000428: Car02 - Car Speech Corpus
- C-000429: Car03 - Car Speech Corpus
- C-000430: Car04 - Car Speech Corpus
- C-000431: CarNoise01 - Car Noise Corpus
- C-000432: CarSpkr01 - Speech Corpus for In-car Speaker verification
- C-000433: Chinese01 - Chinese Speech Corpus
- C-000434: Chinese02 - Chinese Speech Corpus
- C-000435: Chinese03 - Chinese Speech Corpus
- C-000436: Chinese04 - Chinese Speech Corpus
- C-000437: CleanSent01 - Read Sentence Clean Speech Corpus
- C-000438: Dict01 - PC Environment Read Sentence Speech Corpus
- C-000439: Dict02 - PC Environment Read Sentence Speech Corpus
- C-000440: Emotion01 - Emotional Speech Corpus
- C-000441: English01 - English Speech Corpus
- C-000442: English02 - English Speech Corpus
- C-000443: Noisy01 - High Noise Speech Corpus
- C-000444: Num01 - Number Speech Corpus
- C-000445: SimulCar01 - Simulated Speech Corpus in Car Environment
- C-000446: Spanish01 - Spanish Speech Corpus
- C-000447: StandMic01 - Speech Corpus for Speech Recognition Assessment
- C-000448: SynthMale01 - Read Sentences Speech Corpus for Prosody Synthesis
- C-000449: TelNum01 - Telephone network Number Speech Corpus
- C-000450: VariMic01 - Microphone Performance Test Speech Corpus
- C-000451: Address01 - Korean Address Speech Corpus
- C-000452: American National Corpus
- C-000453: C-English01 - Chinese Speakers’ English
- C-000454: Center for Spoken Language Understanding Corpora
- T-000455: Bunrui Goihyo (Word List by Semantic Principles), Enlarged and Revised Edition
- C-000456: Embedded01 - Embedded Speech Corpus
- C-000457: F-Korean01 - Foreign Speakers’ Korean
- C-000458: K-SEC - Korean Speakers’ Korean and English
- N-000459: Lancaster Corpus of Children's Project Writing
- G-000460: Lexical collocation data
- G-000461: Morpheme Dictionary
- C-000463: Multimodal01 - Multimodal Speech Corpus
- C-000464: Simultaneous Interpretation Database (conversation)
- C-000465: SynthFemale01 - Read Sentences Speech Corpus for Prosody Synthesis
- N-000466: TC-STAR 2006 Evaluation Package - SLT Chinese-to-English
- C-000467: The Babel English-Chinese Parallel Corpus
- C-000470: The Bergen Corpus of London Teenage Language
- C-000471: The Chinese Treebank
- N-000472: The Corpus of Early English Correspondence
- G-000473: The Enabling Minority Language Engineering Corpus
- C-000474: The ICAME Corpus Collection
- C-000476: The Lancaster Los Angeles Spoken Chinese Corpus
- C-000477: The Machine Readable Spoken English Corpus
- C-000478: The PDC2000 Corpus of Chinese News Text
- C-000479: The UCLA Chinese Corpus
- C-000480: The International Corpus of English
- G-000481: The Enabling Minority Language Engineering Corpus
- N-000482: Turbo Lingo
- T-000483: Web concordance of English romantic literature
- C-000484: WebCorp
- C-000485: 日本音響学会 研究用連続音声データベース
- C-000486: CD-ROM Nikkei Full-text Database - Nikkei Business Daily 1990
- C-000487: CD-ROM Nikkei Full-text Database - Nikkei Business Daily 1991
- C-000488: CD-ROM Nikkei Full-text Database - Nikkei Business Daily 1992
- C-000489: CD-ROM Nikkei Full-text Database - Nikkei Business Daily 1993
- C-000490: CD-ROM Nikkei Full-text Database - Nikkei Business Daily 1994
- C-000491: CD-ROM Nikkei Full-text Database - Nikkei Business Daily 1995
- C-000492: CD-ROM Nikkei Full-text Database - Nikkei Business Daily 1996
- C-000493: CD-ROM Nikkei Full-text Database - Nikkei Business Daily 1997
- C-000494: CD-ROM Nikkei Full-text Database - Nikkei Business Daily 1998
- C-000495: CD-ROM Nikkei Full-text Database - Nikkei Business Daily 1999
- C-000496: CD-ROM Nikkei Full-text Database - Nikkei Business Daily 2000
- C-000497: CD-ROM Nikkei Full-text Database - Nikkei Business Daily 2001
- C-000498: CD-ROM Nikkei Full-text Database - Nikkei Business Daily 2002
- C-000499: CD-ROM Nikkei Full-text Database - Nikkei Business Daily 2003
- C-000500: CD-ROM Nikkei Full-text Database - Nikkei Business Daily 2004
- C-000501: CD-ROM Nikkei Full-text Database - Nikkei Business Daily 2006
- C-000502: CD-ROM Nikkei Full-text Database - Nikkei Business Daily · Nikkei Financial Daily, and Nikkei Marketing Journal 1994
- C-000503: CD-ROM Nikkei Full-text Database - Nikkei Business Daily · Nikkei Financial Daily, and Nikkei Marketing Journal 1995
- C-000504: CD-ROM Nikkei Full-text Database - Nikkei Business Daily · Nikkei Financial Daily, and Nikkei Marketing Journal 1996
- C-000505: CD-ROM Nikkei Full-text Database - Nikkei Business Daily · Nikkei Financial Daily, and Nikkei Marketing Journal 1997
- C-000506: CD-ROM Nikkei Full-text Database - Nikkei Business Daily · Nikkei Financial Daily, and Nikkei Marketing Journal 1998
- C-000507: CD-ROM Nikkei Full-text Database - Nikkei Business Daily · Nikkei Financial Daily, and Nikkei Marketing Journal 1999
- C-000508: CD-ROM Nikkei Full-text Database - Nikkei Business Daily · Nikkei Financial Daily, and Nikkei Marketing Journal 2000
- C-000509: CD-ROM Nikkei Full-text Database - Nikkei Business Daily · Nikkei Financial Daily, and Nikkei Marketing Journal 2001
- C-000510: CD-ROM Nikkei Full-text Database - Nikkei Business Daily · Nikkei Financial Daily, and Nikkei Marketing Journal 2002
- C-000511: CD-ROM Nikkei Full-text Database - Nikkei Business Daily · Nikkei Financial Daily, and Nikkei Marketing Journal 2003
- C-000512: CD-ROM Nikkei Full-text Database - Nikkei Business Daily · Nikkei Financial Daily, and Nikkei Marketing Journal 2004
- C-000513: CD-ROM Nikkei Full-text Database - Nikkei Business Daily · Nikkei Financial Daily, and Nikkei Marketing Journal 2006
- C-000514: 車内対話音声データベース
- D-000515: CICC Basic Chinese Dictionary
- N-000516: CICC Basic Indonesian Dictionary
- N-000517: CICC Basic Thai dictionary
- D-000518: CICC Basic Malaysian Dictionary
- D-000519: CICC technical dictionary
- N-000520: Classical contrastive lexicon
- C-000525: DVD-ROM Nikkei Full-text Database - Nikkei Daily, Nikkei Business Daily, Nikkei Financial Daily, and Nikkei Marketing Journal 1985-1989
- C-000526: DVD-ROM Nikkei Full-text Database - Nikkei Daily, Nikkei Business Daily, Nikkei Financial Daily, and Nikkei Marketing Journal 1990-1995
- C-000527: DVD-ROM Nikkei Full-text Database - Nikkei Daily, Nikkei Business Daily, Nikkei Financial Daily, and Nikkei Marketing Journal 1994-1998
- C-000528: DVD-ROM Nikkei Full-text Database - Nikkei Daily, Nikkei Business Daily, Nikkei Financial Daily, and Nikkei Marketing Journal 1995-1999
- C-000529: DVD-ROM Nikkei Full-text Database - Nikkei Daily, Nikkei Business Daily, Nikkei Financial Daily, and Nikkei Marketing Journal 1996-2000
- C-000530: DVD-ROM Nikkei Full-text Database - Nikkei Daily, Nikkei Business Daily, Nikkei Financial Daily, and Nikkei Marketing Journal 2001
- C-000531: DVD-ROM Nikkei Full-text Database - Nikkei Daily, Nikkei Business Daily, Nikkei Financial Daily, and Nikkei Marketing Journal 2002
- C-000532: DVD-ROM Nikkei Full-text Database - Nikkei Daily, Nikkei Business Daily, Nikkei Financial Daily, and Nikkei Marketing Journal 2003
- C-000533: DVD-ROM Nikkei Full-text Database - Nikkei Daily, Nikkei Business Daily, Nikkei Financial Daily, and Nikkei Marketing Journal 2004
- C-000534: DVD-ROM Nikkei Full-text Database - Nikkei Daily, Nikkei Business Daily, Nikkei Financial Daily, and Nikkei Marketing Journal 2005
- C-000535: DVD-ROM Nikkei Full-text Database - Nikkei Daily, Nikkei Business Daily, Nikkei Financial Daily, and Nikkei Marketing Journal 2006
- C-000536: DVD-ROM Nikkei Full-text Database - Nikkei Daily, Nikkei Business Daily, and Nikkei Marketing Journal 1975-1979
- C-000537: DVD-ROM Nikkei Full-text Database - Nikkei Daily, Nikkei Business Daily, and Nikkei Marketing Journal 1980-1984
- C-000538: デジタル音声データベース(セットA)
- C-000539: デジタル音声データベース(セットB)
- C-000540: デジタル音声データベース(セットC)
- C-000541: デジタル音声データベース(セットD)
- C-000542: デジタル音声データベース(セットE)
- C-000543: デジタル音声データベース(セットF)
- D-000545: EDICT
- O-000546: English basic words list
- G-000547: 北海道大学英語語彙表
- C-000549: 日本音響学会 新聞記事読み上げ音声コーパス
- D-000550: ライフサイエンス辞書
- C-000551: 日本語話し言葉コーパス
- C-000553: 名古屋大学同時通訳データベース
- G-000558: テレビ放送の語彙調査 [語彙表] CD-ROM版
- C-000560: 1996 English Broadcast News Dev and Eval (HUB4)
- C-000561: 1996 English Broadcast News Speech (HUB4)
- C-000562: 1996 English Broadcast News Transcripts (HUB4)
- C-000563: 1997 English Broadcast News Speech (HUB4)
- C-000564: 1997 English Broadcast News Transcripts (HUB4)
- C-000565: 1997 HUB4 Broadcast News Evaluation Non-English Test Material
- C-000566: 1997 HUB4 English Evaluation Speech and Transcripts
- C-000567: 1997 HUB5 Arabic Evaluation
- C-000568: 1997 HUB5 Arabic Transcripts
- C-000569: 1997 HUB5 German Evaluation
- C-000570: 1997 HUB5 German Transcripts
- C-000571: 1997 HUB5 Spanish Evaluation
- C-000572: 1997 HUB5 Spanish Transcripts
- C-000573: 1997 Spanish Broadcast News Speech (HUB4-NE)
- C-000574: 1997 Spanish Broadcast News Transcripts (HUB4-NE)
- C-000575: 1997 Speaker Recognition Benchmark
- C-000576: 1998 HUB4 Broadcast News Evaluation English Test Material
- C-000577: 1998 HUB5 English Evaluation
- C-000578: 1998 HUB5 English Transcripts
- C-000579: 1998 Speaker Recognition Benchmark
- C-000580: 1999 HUB4 Broadcast News Evaluation English Test Material
- C-000581: 1999 Speaker Recognition Benchmark
- C-000582: 2000 Communicator Dialogue Act Tagged
- C-000583: 2000 Communicator Evaluation
- C-000584: 2000 NIST Speaker Recognition Evaluation
- C-000585: 2001 Communicator Dialogue Act Tagged
- C-000586: 2001 Communicator Evaluation
- C-000587: 2001 HUB5 English Evaluation
- C-000588: 2001 HUB5 Mandarin Evaluation
- C-000589: 2001 HUB5 Mandarin Transcripts
- C-000590: 2001 NIST Speaker Recognition Evaluation Corpus
- C-000591: 2002 NIST Speaker Recognition Evaluation
- C-000592: 2002 Rich Transcription Broadcast News and Conversational Telephone Speech
- C-000593: ACE 2004 Multilingual Training Corpus
- C-000594: ACE 2005 Multilingual Training Corpus
- C-000595: ACE Time Normalization (TERN) 2004 English Training Data v 1.0
- C-000596: ACE-2 Version 1.0
- C-000597: ACL/DCI
- C-000598: ARL Urdu Speech Database, Training Data
- C-000599: ATIS0 Read
- C-000600: ATIS0 SD Read
- N-000601: ATIS2
- C-000602: ATIS3 Test Data
- C-000603: ATIS3 Training Data
- C-000604: Air Traffic Control BOS
- C-000605: Air Traffic Control Complete
- C-000606: Air Traffic Control DCA
- C-000607: Air Traffic Control DFW
- G-000608: American English Spoken Lexicon
- C-000609: Arabic Broadcast News Transcripts
- C-000610: Arabic CTS Levantine Fisher Training Data Set 3, Transcripts
- C-000611: Arabic English Parallel News Part 1
- C-000612: Arabic Gigaword Second Edition
- C-000613: Arabic Gigaword
- C-000614: Arabic News Translation Text Part 1
- C-000615: Arabic Newswire Part 1
- C-000616: Arabic Treebank: Part 1 - 10K-word English Translation
- C-000617: Arabic Treebank: Part 1 v 2.0
- C-000618: Arabic Treebank: Part 1 v 3.0 (POS with full vocalization + syntactic analysis)
- C-000619: Arabic Treebank: Part 2 v 2.0
- C-000620: Arabic Treebank: Part 3 (full corpus) v 2.0 (MPG + Syntactic Analysis)
- C-000621: Arabic Treebank: Part 3 v 1.0
- C-000622: Arabic Treebank: Part 4 v 1.0 (MPG Annotation)
- C-000623: Articulation Index
- C-000624: BBN Pronoun Coreference and Entity Type Corpus
- C-000625: BBN/AUB DARPA Babylon Levantine Arabic Speech and Transcripts
- C-000626: BLLIP 1987-89 WSJ Corpus Release 1
- N-000627: BRAMSHILL
- C-000628: Boston University Radio Speech Corpus
- G-000629: Buckwalter Arabic Morphological Analyzer Version 1.0
- G-000630: Buckwalter Arabic Morphological Analyzer Version 2.0
- C-000631: CALLFRIEND American English-Non-Southern Dialect
- C-000632: CALLFRIEND American English-Southern Dialect
- C-000633: CALLFRIEND Canadian French
- C-000634: CALLFRIEND Egyptian Arabic
- C-000635: CALLFRIEND Farsi
- C-000636: CALLFRIEND German
- C-000637: CALLFRIEND Hindi
- C-000638: CALLFRIEND Japanese
- C-000639: CALLFRIEND Korean
- C-000640: CALLFRIEND Mandarin Chinese-Mainland Dialect
- C-000641: CALLFRIEND Mandarin Chinese-Taiwan Dialect
- C-000642: CALLFRIEND Spanish-Caribbean Dialect
- C-000643: CALLFRIEND Spanish-Non-Caribbean Dialect
- C-000644: CALLFRIEND Tamil
- C-000645: CALLFRIEND Vietnamese
- G-000646: CALLHOME American English Lexicon (PRONLEX)
- C-000647: CALLHOME American English Speech
- C-000648: CALLHOME American English Transcripts
- C-000649: CALLHOME Egyptian Arabic Speech Supplement
- C-000650: CALLHOME Egyptian Arabic Speech
- C-000651: CALLHOME Egyptian Arabic Transcripts Supplement
- C-000652: CALLHOME Egyptian Arabic Transcripts
- G-000653: CALLHOME German Lexicon
- C-000654: CALLHOME German Speech
- C-000655: CALLHOME German Transcripts
- G-000656: CALLHOME Japanese Lexicon
- C-000657: CALLHOME Japanese Speech
- C-000658: CALLHOME Japanese Transcripts
- G-000659: CALLHOME Mandarin Chinese Lexicon
- C-000660: CALLHOME Mandarin Chinese Speech
- C-000661: CALLHOME Mandarin Chinese Transcripts
- C-000662: CALLHOME Spanish Dialogue Act Annotation
- G-000663: CALLHOME Spanish Lexicon
- C-000664: CALLHOME Spanish Speech
- C-000665: CALLHOME Spanish Transcripts
- C-000666: CCGbank
- C-000667: CETEMpublico
- G-000668: COMLEX English Syntax Lexicon
- N-000669: COMLEX Syntax Text Corpus Version 2.0
- C-000670: CSLU: 22 Languages Corpus
- C-000671: CSLU: Multilanguage Telephone Speech Version 1.2
- C-000672: CSLU: Names Release 1.3
- C-000673: CSLU: Speaker Recognition Version 1.1
- C-000674: CSLU: Spelled and Spoken Words
- C-000675: CSLU: Spoltech Brazilian Portuguese Version 1.0
- C-000676: CSLU: Voices
- C-000677: CSR-I (WSJ0) Complete
- C-000678: CSR-I (WSJ0) Other
- C-000679: CSR-I (WSJ0) Sennheiser
- C-000680: CSR-II (WSJ1) Other
- C-000681: CSR-II (WSJ1) Sennheiser
- C-000682: CSR-III Speech
- C-000683: CSR-III Text
- C-000684: CSR-IV HUB3
- C-000685: CSR-IV HUB4
- C-000686: CTIMIT
- C-000687: Chinese <-> English Named Entity Lists v 1.0
- C-000688: Chinese English News Magazine Parallel Text
- C-000689: Chinese Gigaword Second Edition
- C-000690: Chinese Gigaword
- C-000691: Chinese News Translation Text Part 1
- C-000692: Chinese Proposition Bank 1.0
- C-000693: Chinese Treebank 2.0
- C-000694: Chinese Treebank 4.0
- C-000695: Chinese Treebank 5.0
- C-000696: Chinese Treebank 5.1
- G-000697: Chinese-English Translation Lexicon Version 3.0
- C-000698: Czech Broadcast News Speech
- C-000699: Czech Broadcast News Transcripts
- C-000700: DCIEM/HCRC
- N-000701: Frontiers in Speech Processing 93
- N-000702: Frontiers in Speech Processing 94
- G-000703: Grassfields Bantu Fieldwork: Dschang Lexicon
- C-000704: Grassfields Bantu Fieldwork: Dschang Tone Paradigms
- C-000705: Grassfields Bantu Fieldwork: Ngomba Tone Paradigms
- C-000706: Gulf Arabic Conversational Telephone Speech, Transcripts
- C-000707: HARD 2004 Text
- C-000708: HARD 2004 Topics and Annotations
- C-000709: HCRC Map Task Corpus
- C-000710: HKUST Mandarin Telephone Speech, Part 1
- C-000711: HKUST Mandarin Telephone Transcript Data, Part 1
- N-000712: HTIMIT
- C-000713: HUB5 Mandarin Telephone Speech Corpus
- C-000714: HUB5 Mandarin Transcripts
- C-000715: HUB5 Spanish Telephone Speech Corpus
- C-000716: HUB5 Spanish Transcripts
- C-000717: ICSI Meeting Speech
- C-000718: ICSI Meeting Transcripts
- C-000719: ISI Arabic-English Automatically Extracted Parallel Text
- C-000720: ISL Meeting Speech Part 1
- C-000721: ISL Meeting Transcripts Part 1
- C-000722: Iraqi Arabic Conversational Telephone Speech, Transcripts
- C-000723: JEIDA/JCSD-Channel 0 City Names
- C-000726: JEIDA/JCSD-Channel 0 Isolated Digits
- C-000734: OGI Multilanguage Corpus
- C-000736: SPIDRE
- C-000738: Switchboard-2 Phase II
- G-000742: Dictionary of words (SINEQUA - Jean Dubois)
- T-000743: AphasiaBank
- T-000744: Art & Architecture Thesaurus (AAT)
- C-000745: Australian Corpus of English
- C-000746: BNC Sampler
- C-000747: BNC Spoken Corpus
- C-000748: BNC-baby
- C-000749: British National Corpus (World Edition)
- C-000750: British National Corpus (XML Edition)
- C-000751: Brown Corpus
- C-000752: CHILDES
- C-000753: COMPARA
- C-000754: CRATER Multilingual Aligned Annotated Corpus
- N-000755: Cambridge Cornell Corpus of Spoken North American English
- N-000756: Cambridge Corpus of Business English
- C-000757: Cambridge Learner Corpus
- C-000758: Cambridge and Nottingham Corpus of Discourse in English
- N-000759: Cambridge and Nottingham Spoken Business English
- C-000760: Collins Word Web
- C-000761: Corpus of Early English Correspondence Sampler
- T-000762: ERIC Thesaurus
- C-000763: ERIC
- C-000764: English-Norwegian Parallel Corpus
- T-000765: EuroWordNet
- C-000766: European Parliament Proceedings Parallel Corpus
- T-000767: FrameNet
- C-000768: Freiburg - Brown Corpus
- N-000769: Freiburg-LOB
- C-000770: Hansards
- T-000771: Hindi WordNet
- C-000772: Innsbruck Computer Archive of Machine-Readable English Texts
- N-000773: International Corpus of Learner English
- C-000774: Kicktionary
- C-000775: Kolhapur Corpus
- C-000776: Lampeter Corpus
- C-000777: Lancaster Parsed Corpus
- C-000778: Lancaster/IBM Spoken English Corpus
- D-000779: Le Petit Robert
- C-000780: London Lund Corpus
- C-000781: Longman Corpus Network
- D-000782: Longman Dictionary of Contemporary English Online
- C-000783: Longman Learners' Corpus
- C-000784: Longman Spoken American Corpus
- C-000785: Longman Written American Corpus
- C-000786: Longman/Lancaster Corpus
- T-000787: Medical Subject Headings
- T-000788: Merriam-Webster's Collegiate® Dictionary & Thesaurus
- T-000789: Metathesaurus
- T-000790: 日本語Mindnet
- C-000791: OPUS
- C-000792: OrienTel Turkish database
- D-000793: Oxford 3000
- D-000794: Oxford Advanced Learner's Dictionary (Online)
- D-000795: Oxford-Hachette French Dictionary
- N-000796: PELCRA
- C-000797: Polytechnic of Wales Corpus
- T-000798: Roget's II: The new thesaurus
- T-000799: Roget's International Thesaurus of English Words and Phrases
- T-000800: SALSA II
- C-000801: THE LOB CORPUS
- C-000802: TIGER
- D-000804: The American Heritage Book of English Usage
- D-000805: The American Heritage Dictionary of the English Language
- C-000806: The Bank of English
- C-000807: The Cambridge International Corpus
- C-000808: The East African Component of The International Corpus of English
- T-000809: The European multilingual thesaurus on health promotion in 12 languages
- T-000810: The Getty Thesaurus of Geographic Names
- C-000811: The Helsinki Corpus of English Texts: Diachronic Part
- C-000812: The Helsinki Corpus of Older Scots
- C-000813: The JRC-Acquis Multilingual Parallel Corpus
- N-000814: The Newdigate Newsletters
- C-000815: The Oslo Multilingual Corpus
- C-000816: The Polyglot Bible
- C-000817: The Polyglot Book of Mormon
- G-000818: The Swedish PAROLE Lexicon
- G-000819: The Swedish SIMPLE Lexicon
- N-000820: The Wellington Corpus of Spoken New Zealand English
- C-000821: The Wellington Corpus of Written New Zealand English
- T-000822: Union List of Artist Names Online
- D-000823: Webster's Revised Unabridged Dictionary
- D-000825: WordNet
- C-000826: WordbanksOnline
- D-000827: DCS-環境問題情報事典
- G-000828: DCS-科学技術略語辞書
- G-000829: DCS-音訓引き難読語辞書
- C-000830: DCS-日本文学作品名よみかた辞書
- D-000831: DCS-季語季題よみかた辞書
- G-000832: DCS-コンピュータ用語辞書
- G-000833: DCS-神社・寺院名よみかた辞書
- G-000834: DCS-動植物名よみかた辞書
- G-000835: DCS-島嶼名辞書
- G-000836: DCS-事項名辞書
- G-000837: DCS-事項名読みふり辞書
- C-000838: DCS-毎日新聞1991~2006データファイル
- G-000839: DCS - Medical term dictionary
- T-000840: DCS-現代イタリア語表現辞書(伊和・和伊/用例・文例)
- G-000841: DCS-駅名よみかた辞書
- D-000842: DCS-ビジネス/技術 実用英語辞書[第3版]
- G-000843: DCS-地名よみかた辞書
- G-000844: DCS-人名辞書
- G-000845: DCS-人名読みふり辞書
- G-000846: DCS-河川名辞書
- G-000847: DCS-カタカナから引く外国人名綴り方辞書
- G-000848: DCS-苗字8万よみかた辞書
- C-000849: DCS-文献目録・憲法論の50年(1945~1995)
- G-000850: DCS-昭和災害史事典
- G-000851: DCS-西洋人名辞書
- D-000852: DCS-歌舞伎人名事典
- D-000853: DCS-NEW 斎藤和英辞書
- G-000854: DCS-機関名辞書
- G-000855: DCS-英米小説原題邦題事典
- D-000856: DCS-教育問題情報事典
- D-000857: Dual Daijirin[Web edition]
- C-000858: 電総研道案内対話音声コーパス
- D-000859: Eijiro on the WEB
- D-000860: Eijiro
- G-000861: GoiTaikei --- A Japanese Lexicon CD-ROM
- D-000862: Japanese Semantic Pattern Dictionary
- G-000863: パターン意味検索プログラムファイル
- G-000864: パターンパーサ・プログラムファイル
- D-000865: 意味類型パターン辞書
- C-000866: NICT JLEコーパス
- C-000867: 鳥バンク(Tori-Bank)
- D-000868: WebLSD
- C-000869: 重点領域研究「音声対話」 対話音声コーパス
- C-000873: ACCOR - English
- C-000874: APASCI
- C-000875: AURORA Project database - Subset of SpeechDat-Car - Danish database - Evaluation Package
- C-000876: AURORA Project database - Subset of SpeechDat-Car - Finnish database - Evaluation Package
- C-000877: AURORA Project database - Subset of SpeechDat-Car - German database - Evaluation Package
- C-000878: AURORA Project database - Subset of SpeechDat-Car - Italian database - Evaluation Package
- C-000879: AURORA Project database - Subset of SpeechDat-Car - Spanish database - Evaluation Package
- C-000880: An-Nahar Newspaper Text Corpus
- N-000881: Automobile Engineering
- C-000882: BABEL Estonian Database
- C-000883: BABEL Romanian database
- C-000884: BDBRUIT
- N-000885: BDLEX
- C-000886: BDSONS Base de données des sons du français
- C-000887: BREF-120 - A large corpus of French read speech
- C-000888: BREF-80
- C-000889: BREF-POLYGLOT
- C-000890: Basque FDB-1060 database (SpeechDat-like)
- C-000891: Basque Spoken Corpus, by Jon Aske (Department of Foreign Languages, Salem State College - Salem, Massachusetts, USA)
- C-000894: COLLECT
- C-000895: COST232
- C-000896: CRATER corpus
- D-000897: DICO-MORPH_Collocation
- D-000898: DICO-MORPH_Lemme
- D-000899: DST Dictionary - Compound nouns (optional) (DST)
- D-000900: DST Dictionary - Gender, number, conjugation (optional) (DST)
- D-000901: DST Dictionary - Lemma (optional) (DST)
- D-000902: DST Dictionary - Part of Speech (optional) (DST)
- D-000903: DST Dictionary - Prep./Adv. phrases (optional) (DST)
- D-000904: DST Dictionary - Semantical information (optional) (DST)
- D-000905: DST Dictionary - Syntactical information (optional) (DST)
- C-000906: Danish SpeechDat(M) database - DB1
- C-000907: Danish SpeechDat(M) database - DB2
- D-000908: DixAF (Bilingual Dictionary French Arabic, Arabic French)
- C-000909: Dutch PAROLE Distributable Corpus
- C-000910: Dutch Polyphone Database
- C-000911: Dutch SpeechDat(II) MDB-250
- D-000912: Dutch-French Lexicon (LanTmark)
- N-000913: ECI-ELSNET Italian & German tagged sub-corpus
- C-000914: ECI/MCI (European Corpus Initiative/Multilingual Corpus I)
- C-000915: EUROM1f French
- C-000916: EUROM1i
- N-000917: Electrical Engineering
- C-000918: Eleftherotypia Journal Speech database
- N-000919: Energy Technology
- C-000920: English SpeechDat Polyphone database DB1
- C-000921: English SpeechDat(M) Polyphone database DB2
- C-000922: Erlanger Bahnansage - ERBA
- D-000923: EuroWordNet Czech
- D-000924: EuroWordNet English Addition to English WordNet
- D-000925: EuroWordNet Estonian
- D-000926: EuroWordNet French
- D-000927: EuroWordNet German
- D-000928: EuroWordNet Italian
- D-000929: EuroWordNet Spanish
- C-000930: Euskararen Datu-Base Lexikala (EDBL) Lexical Database for Basque
- C-000931: FIXED0IT - DB1
- C-000932: FIXED0IT - DB2
- C-000933: FRESCO: French Polyphone Database (SpeechDat(M)) DB1
- C-000934: Farsdat (Farsi Speech Database)
- C-000935: Finnish Speechdat(II) FDB-4000
- C-000936: Finnish Speecon database
- C-000937: Finnish-Swedish Speechdat(II) FDB-1000
- C-000938: Fixed1frDesign
- C-000939: Fixed1it Design
- C-000940: French Speechdat(II) FDB-1000
- C-000941: French Speecon database
- D-000942: GEOBASE
- C-000943: GeFRePaC - German French Reciprocal Parallel Corpus
- N-000944: German SpeechDat(II) MDB-1000
- C-000945: German SpeechDat-Car
- C-000946: Hebrew Speecon database
- C-000947: ILC Italian Morphological Lexicon
- D-000948: Insurance (Termcat)
- C-000949: Italian Speech Corpus 1 (Appen)
- C-000950: Italian SpeechDat(II) FDB-3000
- C-000951: Italian SpeechDat(II) MDB-250
- C-000952: Italian Speecon database
- C-000953: Italian TTS Speech Corpus (Appen)
- G-000954: KORLEX Croatian Lexicon
- C-000955: Korean Speecon database
- T-000956: LORETO Thesaurus
- D-000957: Linguistics (Termcat)
- C-000958: MTP Annotated German corpus - untagged version
- C-000959: MULTEXT JOC Corpus
- C-000960: Mandarin-5000 database
- C-000961: Modern French Corpus including Anaphors Tagging
- T-000962: NEWBASE - Extended version of ELRA-T0090 GEOBASE
- C-000963: Offensive Word Filter 1
- C-000964: Offensive Word Filter 2
- N-000965: Olympic Sports (Termcat)
- C-000966: OrienTel French as spoken in Morocco database
- C-000967: OrienTel Morocco MSA (Modern Standard Arabic) database
- C-000968: OrienTel Tunisia MCA (Modern Colloquial Arabic) database
- C-000969: OrienTel Tunisia MSA (Modern Standard Arabic) database
- N-000970: PAROLE French Corpus
- G-000971: PHONOLEX (BAS/DFKI)
- C-000972: PRESS 65
- C-000973: Phonetically Balanced Words (1)
- C-000974: Polish Speecon database
- C-000975: PolyVar
- C-000976: RVG1 (Regional Variants of German 1, Part 1)
- C-000977: Russian Speecon database
- D-000978: SCI-FRES-EURADIC French-Spanish Bilingual Dictionary
- D-000979: SCI-FRIT-EURADIC French-Italian Bilingual Dictionary
- D-000980: SCIPER-AL-EURADIC German Monolingual Dictionary (SCIPER-AL-EURADIC)
- D-000981: SCIPER-AN-EURADIC English Monolingual Dictionary (SCIPER-AN-EURADIC)
- D-000982: SCIPER-ES-EURADIC Spanish Monolingual Dictionary
- D-000983: SCIPER-FR-EURADIC French Monolingual Dictionary
- D-000984: SCIPER-IT-EURADIC Italian Monolingual Dictionary (SCIPER-IT-EURADIC)
- C-000985: SIEMENS 1000 - SI1000
- C-000986: SieTill (Siemens Tillman)
- C-000987: Siemens Russian SpeechDat-like FDB-1000
- C-000988: Siemens Shanghai Mandarin FDB-1000
- C-000989: Slovenian SpeechDat(II) FDB-1000
- C-000990: Spanish TTS Speech Corpus (Appen)
- C-000991: SpeechDat Speaker Verification database
- C-000992: Strange Corpus 1 - SC1 (ACCENTS)
- C-000993: Strange Corpus 10 - SC10 ('Accents II')
- C-000994: Strange Corpus 2 - SC2 (Noises)
- C-000995: Swedish Speecon database
- D-000996: THAMUS Generic Italian Dictionary - canonical forms
- D-000997: THAMUS. Generic Italian Dictionary - canonical forms - technical domain
- D-000998: THAMUS. Generic Italian Dictionary - inflected forms
- C-000999: Tagged text in French (MEMODATA) with rules of morphological disambiguation
- C-001000: Turkish Speecon database
- C-001001: Twin database - TWINDB1
- C-001002: UK English Speecon database
- T-001003: VERBA Polytechnic and Plurilingual Terminological Database - D-GA Control of Industrial Pollution
- T-001004: VERBA Polytechnic and Plurilingual Terminological Database - D-GB Air Pollution
- T-001005: VERBA Polytechnic and Plurilingual Terminological Database - D-GC Chemical Pollution
- D-001006: VERBA Polytechnic and Plurilingual Terminological Database - D-GD Marine Pollution
- T-001007: VERBA Polytechnic and Plurilingual Terminological Database - D-GE Soil Contamination
- T-001008: VERBA Polytechnic and Plurilingual Terminological Database - D-GH Structures
- D-001009: VERBA Polytechnic and Plurilingual Terminological Database - D-GI Environmental Law
- D-001010: VERBA Polytechnic and Plurilingual Terminological Database - D-GK Noise Pollution
- D-001011: VERBA Polytechnic and Plurilingual Terminological Database - D-KD Sewage Plant equipment
- N-001012: VERBA Polytechnic and Plurilingual Terminological Database - F-AJ Food Plants
- D-001013: VERBA Polytechnic and Plurilingual Terminological Database - F-AQ Agriculture-General Topics
- D-001014: VERBA Polytechnic and Plurilingual Terminological Database - F-AR Tobacco Industry
- D-001015: VERBA Polytechnic and Plurilingual Terminological Database - F-HA Cattle Breeding
- D-001016: VERBA Polytechnic and Plurilingual Terminological Database - G-AS Software Quality and Engineering
- C-001017: Wolverhampton Business English Corpus
- C-001018: British National Corpus 1.0
- C-001019: CNN TRANSCRIPTS
- C-001020: Centre for English Corpus Linguistics - CECL
- C-001021: Cobuild Concordance and Collocations Sampler
- C-001022: JEIDA/JCSD-Channel 0 Complete
- C-001023: JEIDA/JCSD-Channel 0 Control Words
- C-001024: JEIDA/JCSD-Channel 0 Four Digit Sequences
- C-001025: JEIDA/JCSD-Channel 0 Mono Syllables
- C-001026: JEIDA/JCSD-Channel 1 City Names
- C-001027: JEIDA/JCSD-Channel 1 Complete
- C-001028: JEIDA/JCSD-Channel 1 Control Words
- C-001029: JEIDA/JCSD-Channel 1 Four Digit Sequences
- C-001030: JEIDA/JCSD-Channel 1 Isolated Digits
- C-001031: JEIDA/JCSD-Channel 1 Mono Syllables
- C-001032: JURIS
- C-001033: Japanese Business News Text Supplement
- C-001034: Japanese Business News Text
- C-001035: KING Speaker Verification
- C-001036: Klex: Finite-State Lexical Transducer for Korean
- C-001037: Korean Broadcast News Speech
- C-001038: Korean Broadcast News Transcripts
- C-001039: Korean English Treebank Annotations
- C-001040: Korean Newswire
- C-001041: Korean Propbank
- C-001042: Korean Telephone Conversations Complete Set
- G-001043: Korean Telephone Conversations Lexicon
- C-001044: Korean Telephone Conversations Speech
- C-001045: Korean Telephone Conversations Transcripts
- C-001046: Korean Treebank Annotations Version 2.0
- C-001047: LATINO-40 Spanish Read News
- C-001048: LLHDB
- C-001049: Levantine Arabic Conversational Telephone Speech, Transcripts
- C-001050: Levantine Arabic Conversational Telephone Speech
- C-001052: Levantine Arabic QT Training Data Set 5, Speech
- C-001053: Levantine Arabic QT Training Data Set 5, Transcripts
- C-001054: MACROPHONE
- G-001056: Mawukakan Lexicon
- C-001057: Message Understanding Conference (MUC) 6 Additional News Text
- C-001058: Message Understanding Conference (MUC) 6
- C-001059: Message Understanding Conference (MUC) 7
- C-001060: Middle East Technical University Turkish Microphone Speech v 1.0
- C-001061: Morphologically Annotated Korean Text
- C-001062: Multiple-Translation Arabic (MTA) Part 1
- C-001063: Multiple-Translation Arabic (MTA) Part 2
- C-001064: Multiple-Translation Chinese (MTC) Part 2
- C-001065: Multiple-Translation Chinese (MTC) Part 3
- C-001066: Multiple-Translation Chinese (MTC) Part 4
- C-001068: N4 NATO Native and Non-Native Speech
- C-001069: 2003 NIST Language Recognition Evaluation
- C-001070: NIST Meeting Pilot Corpus Speech
- C-001071: NIST Meeting Pilot Corpus Transcripts and Metadata
- C-001072: NTIMIT
- C-001073: North American News Text Corpus
- C-001074: North American News Text Supplement
- C-001075: OGI Spelled and Spoken Word
- C-001076: PhoneBook: NYNEX Isolated Words
- C-001077: Portuguese Newswire Text
- C-001078: Prague Arabic Dependency Treebank 1.0
- C-001079: Prague Czech-English Dependency Treebank 1.0
- C-001080: Prague Dependency Treebank 1.0
- C-001081: Prague Dependency Treebank 2.0
- C-001082: Proposition Bank I
- C-001083: RST Discourse Treebank
- C-001084: RT-03 MDE Training Data Speech
- C-001085: RT-03 MDE Training Data Text and Annotations
- C-001086: RT-04 MDE Training Data Speech
- C-001087: RT-04 MDE Training Data Text/Annotations
- C-001088: Resource Management Complete Set 2.0
- C-001089: Resource Management RM1 2.0
- C-001090: Resource Management RM2 2.0
- C-001091: Road Rally
- C-001092: Russian through Switched Telephone Network (RuSTeN)
- D-001093: SAID
- C-001094: SLX Corpus of Classic Sociolinguistic Interviews
- C-001095: SUSAS Transcripts
- C-001096: SUSAS
- C-001097: Santa Barbara Corpus of Spoken American English Part I
- C-001098: Santa Barbara Corpus of Spoken American English Part II
- C-001099: Santa Barbara Corpus of Spoken American English Part III
- C-001100: Penn Treebank Online
- C-001101: athelstan
- G-001104: Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Economics, law & business management
- C-001105: GlobalPhone Arabic
- C-001106: GlobalPhone Chinese-Mandarin
- C-001107: GlobalPhone Chinese-Shanghai
- C-001108: GlobalPhone Croatian
- C-001109: GlobalPhone Czech
- C-001110: GlobalPhone French
- C-001111: GlobalPhone German
- C-001112: GlobalPhone Japanese
- C-001113: GlobalPhone Korean
- C-001114: GlobalPhone Portuguese (Brazilian)
- C-001115: GlobalPhone Russian
- C-001116: GlobalPhone Spanish (Latin American)
- C-001117: GlobalPhone Swedish
- C-001118: GlobalPhone Tamil
- C-001119: GlobalPhone Turkish
- C-001120: Spanish Speech Corpus 1 (Appen)
- C-001121: The identifiable speech database of tabletop speech--the number string (10 persons)
- C-001122: The identifiable speech database of tabletop speech--the number string (120 persons)
- C-001123: The identifiable speech database of tabletop speech--the number string (200 persons)
- C-001125: The identifiable speech database of tabletop speech--the stock (70 persons)
- C-001126: The identifiable speech database of telephone speech--stock (265 people using mobile telephone)
- C-001127: The identifiable speech database of telephone speech--the message (64 people using mobile telephone)
- C-001128: The identifiable speech database of telephone speech--the message (86 people using mobile telephone)
- C-001129: The identifiable speech database of telephone speech--the name of person, the name of place (265 people using mobile telephone)
- C-001130: The identifiable speech database of telephone speech--the name of person, the name of place (285 speakers using stable telephone)
- C-001131: The identifiable speech database of telephone speech--the number string (265 people using mobile telephone)
- C-001132: The identifiable speech database of telephone speech--the number string (285 speakers using stable telephone)
- C-001133: The identifiable speech database of telephone speech--the stock (285 people using stable telephone)
- C-001134: Three parallel language Chinese, English, Japanese corpus developed for Olympic (Chinese and English)
- G-001135: Three parallel language Chinese, English, Japanese corpus developed for Olympic
- G-001136: Tsinghua Chinese Treebank
- G-001137: Tsinghua-Corpus of speech synthesis
- G-001138: VERBA Polytechnic and Plurilingual Terminological Database - S-AR Plant Pathology
- G-001139: VERBA Polytechnic and Plurilingual Terminological Database - S-AS Taxonomy
- G-001140: VERBA Polytechnic and Plurilingual Terminological Database - S-AT Virology
- G-001141: VERBA Polytechnic and Plurilingual Terminological Database - S-AU Zoology, General Topics
- G-001142: VERBA Polytechnic and Plurilingual Terminological Database - S-AV Zoology of Invertebrates
- G-001143: VERBA Polytechnic and Plurilingual Terminological Database - S-AW Zoology of Vertebrates
- G-001144: VERBA Polytechnic and Plurilingual Terminological Database - S-BJ Flora
- G-001145: VERBA Polytechnic and Plurilingual Terminological Database - S-BK Fauna
- G-001146: VERBA Polytechnic and Plurilingual Terminological Database - T-AB Press
- G-001147: VERBA Polytechnic and Plurilingual Terminological Database - T-AC Radio
- G-001148: VERBA Polytechnic and Plurilingual Terminological Database - T-AD TV
- G-001149: VERBA Polytechnic and Plurilingual Terminological Database - T-AE Cinema
- G-001150: VERBA Polytechnic and Plurilingual Terminological Database - T-AG Communications, General Topics
- G-001151: VERBA Polytechnic and Plurilingual Terminological Database - T-AH Photography
- G-001152: VERBA Polytechnic and Plurilingual Terminological Database - T-AI Printing Industry
- G-001153: VERBA Polytechnic and Plurilingual Terminological Database - T-MB Documentation
- G-001154: VERBA Polytechnic and Plurilingual Terminological Database - V-AA Trampoline Jumping
- G-001155: VERBA Polytechnic and Plurilingual Terminological Database - V-AB Target Shooting
- G-001156: VERBA Polytechnic and Plurilingual Terminological Database - V-AC Skating
- G-001157: VERBA Polytechnic and Plurilingual Terminological Database - V-AD Skiing
- G-001158: VERBA Polytechnic and Plurilingual Terminological Database - V-AE Table Tennis
- G-001159: VERBA Polytechnic and Plurilingual Terminological Database - V-AF Lawn Tennis
- G-001160: VERBA Polytechnic and Plurilingual Terminological Database - V-AG Volleyball
- G-001161: VERBA Polytechnic and Plurilingual Terminological Database - V-AH Weight-Lifting
- G-001162: VERBA Polytechnic and Plurilingual Terminological Database - V-AI Waterpolo
- G-001163: VERBA Polytechnic and Plurilingual Terminological Database - V-AJ Wrestling
- G-001164: VERBA Polytechnic and Plurilingual Terminological Database - V-AS American Football
- G-001165: VERBA Polytechnic and Plurilingual Terminological Database - V-AT Field and Track Athletics
- G-001166: VERBA Polytechnic and Plurilingual Terminological Database - V-AU Tenpin Bowling
- G-001167: VERBA Polytechnic and Plurilingual Terminological Database - V-AV Boxing
- G-001168: VERBA Polytechnic and Plurilingual Terminological Database - V-AW Baseball
- G-001169: VERBA Polytechnic and Plurilingual Terminological Database - V-AX Basketball
- G-001170: VERBA Polytechnic and Plurilingual Terminological Database - V-AY Handball
- G-001171: VERBA Polytechnic and Plurilingual Terminological Database - V-AZ Cricket
- G-001172: VERBA Polytechnic and Plurilingual Terminological Database - V-BA Cycling
- G-001173: VERBA Polytechnic and Plurilingual Terminological Database - V-BC Fencing
- G-001174: VERBA Polytechnic and Plurilingual Terminological Database - V-BD Swimming
- G-001175: VERBA Polytechnic and Plurilingual Terminological Database - V-BE Aquatic Choreography
- G-001176: VERBA Polytechnic and Plurilingual Terminological Database - V-BF Football
- G-001177: VERBA Polytechnic and Plurilingual Terminological Database - V-BG Mountaineering
- G-001178: VERBA Polytechnic and Plurilingual Terminological Database - V-BH Golf
- G-001179: VERBA Polytechnic and Plurilingual Terminological Database - V-BI Gymnastics
- G-001180: VERBA Polytechnic and Plurilingual Terminological Database - V-BJ Hockey
- G-001181: VERBA Polytechnic and Plurilingual Terminological Database - V-BK Ice Hockey
- G-001182: VERBA Polytechnic and Plurilingual Terminological Database - V-BL Judo
- G-001183: VERBA Polytechnic and Plurilingual Terminological Database - V-BM Canoeing
- G-001184: VERBA Polytechnic and Plurilingual Terminological Database - V-BN Modern Pentathlon
- G-001185: VERBA Polytechnic and Plurilingual Terminological Database - V-BO Polo
- G-001186: VERBA Polytechnic and Plurilingual Terminological Database - V-BP Rugby
- G-001187: VERBA Polytechnic and Plurilingual Terminological Database - V-BQ Riding
- G-001188: VERBA Polytechnic and Plurilingual Terminological Database - V-BR Rowing
- G-001189: VERBA Polytechnic and Plurilingual Terminological Database - V-BS Sailing
- G-001190: VERBA Polytechnic and Plurilingual Terminological Database - V-BT Sports, General Topics
- G-001191: VERBA Polytechnic and Plurilingual Terminological Database - V-TA Leisure
- G-001192: VERBA Polytechnic and Plurilingual Terminological Database - W-AA Weapons
- G-001193: VERBA Polytechnic and Plurilingual Terminological Database - W-WA Specialised terminology without field coding
- G-001194: VERBA Polytechnic and Plurilingual Terminological Database - W-WB Specialised terminology without field coding
- G-001195: VERBA Polytechnic and Plurilingual Terminological Database - Z-ZA General Vocabulary
- G-001196: VERBA Polytechnic and Plurilingual Terminological Database - Z-ZB General Vocabulary
- N-001197: VERBMOBIL II - VM CD 44.1 - VM44.1 (BAS edition)
- G-001198: WCSC--Word Corpus of Standard Chinese
- C-001199: 863 Program in 2003 Assessment and test data of Chinese recognition
- C-001200: 863 Program in 2003 Assessment and test data of text classification
- C-001201: 863 Program in 2004 Assessment and test data of machine translation
- N-001202: 863 Program in 2004 Assessment and test data of speech synthesis
- C-001203: 863 Program in 2004 Assessment and test data of text classification
- C-001204: 863 program in 2003 automatic index evaluation data
- N-001205: 863 program in 2003 full text retrieval evaluation data
- N-001206: 863 program in 2003 machine translation evaluation data
- C-001207: 863 program in 2003 part-of-speech evaluation data
- N-001208: 863 program in 2003 speech recognition evaluation data
- G-001210: Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Technology, Engineering & Construction
- C-001211: Chinese POS Tagged Corpus
- C-001212: Chinese and English speech corpus
- G-001213: Chinese geographic name storehouse
- D-001214: Chinese-English Olympic Dictionary
- C-001215: Chinese-English Sentence aligned Bilingual Corpus
- C-001216: Czech SpeechDat(E) Database
- C-001217: French SpeechDat(II) FDB-5000 database
- G-001219: KORLEX Serbian Lexicon
- D-001220: Modern Chinese semantic Dictionary based on International Logical Model
- N-001221: Natural Broadcasting Speech corpus
- C-001222: Norwegian SpeechDat(II) FDB-1000
- C-001223: RASC863-annotated 4 regional accent speech corpus(I)
- C-001224: RASC863-annotated 4 regional accent speech corpus(III)
- C-001225: SCSC--Syllable Corpus of Standard Chinese
- C-001226: Siemens Synthesis Corpus - SI1000P
- C-001227: Spanish SpeechDat Database for the Mobile Telephone Network
- C-001228: Special Scene and special domain dialogue corpus
- C-001229: Swiss-French SpeechDat(II) FDB-3000
- C-001230: Swiss-German SpeechDat(II) FDB-2000
- C-001231: TC-STAR 2005 Evaluation Package - ASR English
- C-001232: TC-STAR 2005 Evaluation Package - ASR Mandarin Chinese
- C-001233: TC-STAR 2005 Evaluation Package - ASR Spanish
- N-001234: TC-STAR 2005 Evaluation Package - SLT Chinese-to-English
- N-001235: TC-STAR 2005 Evaluation Package - SLT English-to-Spanish
- N-001236: TC-STAR 2005 Evaluation Package - SLT Spanish-to-English
- C-001237: Taiwan Mandarin Speecon database
- C-001238: The identifiable speech database of Chinese mandarin-----extract database
- C-001239: The identifiable speech database of Chinese mandarin-----wide label
- C-001240: The identifiable speech database of tabletop speech--free topic (50 persons)
- C-001241: The identifiable speech database of tabletop speech--the message (120 persons)
- C-001242: The identifiable speech database of tabletop speech--the message (200 persons)
- C-001244: Turkish Continuous and Isolated Word Speech Database
- N-001245: VERBMOBIL II - VM CD 49.1 - VM49.1 (BAS edition)
- C-001246: 1997 Mandarin Broadcast News Speech (HUB4-NE)
- C-001247: 1997 Mandarin Broadcast News Transcripts (HUB4-NE)
- C-001248: 20 Newsgroups
- C-001249: 2004 NIST Speaker Recognition Evaluation
- C-001250: ATIS0 Complete
- C-001251: Arabic Broadcast News Speech
- D-001252: CMU Pronouncing Dictionary
- C-001253: CSLU: Stories v 1.2
- C-001255: Chinese Treebank Final Release
- C-001256: CSR-II (WSJ1) Complete
- C-001257: FFMTIMIT
- C-001258: Gulf Arabic Conversational Telephone Speech
- C-001259: Iraqi Arabic Conversational Telephone Speech
- C-001260: Penn Chinese Treebank
- C-001261: Reuters-21578
- C-001262: Santa Barbara Corpus of Spoken American English Part IV
- C-001263: Spanish Gigaword First Edition
- C-001264: Spanish Newswire Text, Volume 2
- C-001265: Spanish News Text
- C-001266: Speech Controlled Computing
- C-001267: Speech in Noisy Environments (SPINE) Evaluation Audio
- C-001268: Speech in Noisy Environments (SPINE) Evaluation Transcripts
- C-001269: Speech in Noisy Environments (SPINE) Training Audio
- C-001270: Speech in Noisy Environments (SPINE) Training Transcripts
- C-001271: Speech in Noisy Environments (SPINE2) Part 1 Audio
- C-001272: Speech in Noisy Environments (SPINE2) Part 1 Transcripts
- C-001273: Speech in Noisy Environments (SPINE2) Part 2 Audio
- C-001274: Speech in Noisy Environments (SPINE2) Part 2 Transcripts
- C-001275: Speech in Noisy Environments (SPINE2) Part 3 Audio
- C-001276: Speech in Noisy Environments (SPINE2) Part 3 Transcripts
- C-001277: Speech in Noisy Environments 1 (SPINE1 CODED) Coded Audio
- C-001278: SummBank 1.0
- C-001279: Switchboard Cellular Part 1 Audio
- C-001280: Switchboard Cellular Part 1 Transcribed Audio
- C-001281: Switchboard Cellular Part 1 Transcription
- C-001282: Switchboard Cellular Part 2 Audio
- C-001283: Switchboard-1 Release 2
- C-001284: Switchboard-2 Phase I
- C-001285: Switchboard-2 Phase III Audio
- C-001286: Syllable-Final /s/ Lenition
- C-001287: TDT Pilot Study Corpus
- C-001288: TDT2 Careful Transcription Audio
- C-001289: TDT2 Careful Transcription Text
- C-001290: TDT2 English Audio
- C-001291: TDT2 Mandarin Audio Corpus
- C-001292: TDT2 Multilanguage Text Version 4.0
- C-001293: TDT3 English Audio
- C-001294: TDT3 Mandarin Audio
- C-001295: TDT3 Multilanguage Text Version 2.0
- C-001296: TDT4 Multilingual Broadcast News Speech Corpus
- C-001297: TDT4 Multilingual Text and Annotations
- C-001298: TDT5 Multilingual Text
- C-001299: TDT5 Topics and Annotations
- C-001300: TI 46-Word
- C-001301: TIDES Extraction (ACE) 2003 Multilingual Training Data
- C-001302: TIDIGITS
- C-001303: TIMIT Acoustic-Phonetic Continuous Speech Corpus
- C-001304: TIPSTER Complete
- C-001305: TIPSTER Volume 1
- C-001306: TIPSTER Volume 2
- C-001307: TIPSTER Volume 3
- C-001308: TRAINS Spoken Dialog Corpus
- C-001309: TREC Mandarin
- N-001310: TREC Spanish
- C-001311: TRECVID 2005 Keyframes & Transcripts
- C-001312: Tactical Speaker Identification Speech Corpus (TSID)
- C-001313: Taiwanese Putonghua Speech and Transcripts
- C-001315: The AQUAINT Corpus of English News Text
- C-001316: The CMU Kids Corpus
- C-001317: West Point Company G3 American English Speech
- C-001318: TimeBank 1.2
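- C-001319: Canadian Bilingual Spoken Language Corpus
- C-001320: French (Aix) Multilingual Spoken Language Corpus
- C-001321: French (Paris) Multilingual Spoken Language Corpus
- C-001322: Malaysian Multilingual Spoken Language Corpus
- C-001323: Spanish Multilingual Spoken Language Corpus, 2004 Edition
- C-001324: Turkish Multilingual Spoken Language Corpus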
- C-001325: ARCADE/ROMANSEVAL corpus
- C-001326: AURORA Project Database 2.0 - Evaluation Package
- C-001327: Amaryllis Corpus - Evaluation Package
- C-001328: Arabic CTS Levantine Fisher Training Data Set 3, Speech
- C-001329: BAS GEO1
- C-001330: BITS Unit Selection Synthesis Corpus
- N-001331: Basic multilingual lexicon (MEMODATA)
- D-001332: Bilingual Collocational Dictionary (Horst Bogatz)
- T-001333: Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Aeronautics, Navigation, Mechanical Engineering
- T-001334: Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Data Processing, Electronics, Telecoms
- T-001335: Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Data Processing
- T-001336: Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Economics
- T-001337: Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Electrical Engineering
- T-001338: Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Exact sciences, Physics, Chemistry, Geology
- T-001339: Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Geography, History, Arts
- T-001340: Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Leisure, Tourism, Sports, Food
- D-001341: Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Natural and medical sciences
- T-001342: Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Plastics and Chemistry
- D-001343: Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Sociology, Psychology, Pedagogy
- T-001345: Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Telecommunications
- C-001346: Bizkaifon (Bizkaieraren Fonoteka)
- T-001347: BrasiLEX Brazilian Portuguese lexicon
- D-001348: Bulgarian Morphological Dictionary
- C-001349: C-ORAL-ROM - Integrated reference corpora for spoken romance languages. Multi-media edition; tools of analysis; standard linguistic measurements for validation in HLT
- C-001350: CELEX Dutch lexical database - Complete set
- G-001351: CELEX2
- C-001352: CHIL 2004 Evaluation Package
- C-001353: CHIL 2005 Evaluation Package
- C-001354: CLUVI Parallel Corpus
- D-001355: COMPDIC
- C-001356: CONSUMER Corpus of Spanish-Galician-Catalan-Basque consumer information
- C-001357: CRATER 2 Corpus
- C-001358: Cantonese SpeechDat-like MDB-2000
- C-001359: Mandarin Chinese Desktop Speech Recognition Corpus - Digit String (120 people)
- C-001360: Mandarin Chinese Desktop Speech Recognition Corpus - Digit String (200 people)
- C-001361: Mandarin Chinese Desktop Speech Recognition Corpus - Digit String (849 people)
- C-001362: Mandarin Chinese Desktop Speech Recognition Corpus - Digit String (98 people)
- C-001363: Mandarin Chinese Desktop Speech Recognition Corpus - Monosyllabic (98 people)
- C-001364: Mandarin Chinese Desktop Speech Recognition Corpus - Person Name, Place Name (70 people)
- C-001365: Mandarin Chinese Desktop Speech Recognition Corpus - Person name (849 people)
- C-001366: Mandarin Chinese Desktop Speech Recognition Corpus - Person name, Place Name (10 people)
- C-001367: Mandarin Chinese Desktop Speech Recognition Corpus - SMS (120 people)
- C-001368: Mandarin Chinese Desktop Speech Recognition Corpus - SMS (200 people)
- C-001369: Mandarin Chinese Desktop Speech Recognition Corpus - Simple Chinese sentences (850 people)
- C-001370: Mandarin Chinese Desktop Speech Recognition Corpus - Spontaneous Speech (50 people)
- C-001371: Mandarin Chinese Desktop Speech Recognition Corpus - Spontaneous Speech (849 people)
- C-001372: Mandarin Chinese Desktop Speech Recognition Corpus - Stock (70 people)
- C-001373: Mandarin Chinese Desktop Speech Recognition Corpus - Stock (849 people)
- C-001374: Mandarin Chinese Desktop Speech Recognition Corpus - Stock, Person Name, Digit String, Simple Chinese sentences, Spontaneous Speech (50 people)
- C-001375: Mandarin Chinese Telephone Speech Recognition Corpus - Digit String (649 people)
- C-001376: Mandarin Chinese Telephone Speech Recognition Corpus - Digit String
- C-001377: Mandarin Chinese Telephone Speech Recognition Corpus - Person Name (649 people)
- C-001378: Mandarin Chinese Telephone Speech Recognition Corpus - Person Name, Place Name (Mobile telephone 265)
- C-001379: Mandarin Chinese Telephone Speech Recognition Corpus - SMS (Mobile telephone 64)
- C-001380: Mandarin Chinese Telephone Speech Recognition Corpus - Simple Chinese sentences (650 people)
- C-001381: Mandarin Chinese Telephone Speech Recognition Corpus - Spontaneous Speech (649 people)
- C-001382: Mandarin Chinese Telephone Speech Recognition Corpus - Stock (649 people)
- C-001383: Mandarin Chinese Telephone Speech Recognition Corpus - Stock
- C-001384: Mandarin Chinese Telephone Speech Recognition Corpus - Person Name, Place Name
- C-001385: Mandarin Chinese Speech Recognition Corpus (desktop) - digit string (200 people)
- C-001386: Mandarin Chinese Speech Recognition Corpus (desktop) - place name (200 people)
- C-001387: Colombian Spanish Speech Database
- C-001388: Concise Oxford Dictionary - Audio Files
- T-001389: DESYN - Synonyms dictionary (DESYN)
- T-001390: DICO-SYNT
- D-001391: DIINAR.1 - Arabic Lexical Resource
- C-001392: DSO Corpus of Sense-Tagged English
- C-001393: Danish SpeechDat-Car - Full database
- D-001394: Danish-German dictionary (Institut for Erhvervsinformatik)
- D-001395: Dictionary of French local authorities (SINEQUA - Jean Dubois)
- D-001396: Dictionary of exclamatory stereotyped phrases (SINEQUA - Jean Dubois)
- D-001397: Dictionary of invariable forms and phrases (SINEQUA - Jean Dubois)
- D-001398: Dictionary of noun phrases and plural-only words (SINEQUA - Jean Dubois)
- C-001399: Discourse Graphbank
- D-001400: Dutch-French Lexicon (LanTmark)
- C-001401: ECI Multilingual Text
- D-001402: ENAMDICT/JMnedict
- C-001403: EUROM1e English
- C-001404: Egyptian Colloquial Arabic Lexicon
- C-001405: Emotional Prosody Speech and Transcripts
- C-001406: English Chinese Translation Treebank v 1.0
- C-001407: English Gigaword Second Edition
- C-001408: English Gigaword
- C-001409: English-Arabic Treebank v 1.0
- C-001410: European Language Newspaper Text
- C-001411: FEGA Corpus of French-Galician literary texts
- C-001412: FORM1 Kinematic Gesture
- C-001413: FORM2 Kinematic Gesture
- T-001414: FRESCO: French Polyphone Database (SpeechDat(M)) DB2
- D-001415: Terminology database of finance
- C-001416: Fisher English Training Part 2, Speech
- C-001417: Fisher English Training Part 2, Transcripts
- C-001418: Fisher English Training Speech Part 1 Speech
- C-001419: Fisher English Training Speech Part 1 Transcripts
- C-001420: Fisher Levantine Arabic Conversational Telephone Speech, Transcripts
- C-001421: Fisher Levantine Arabic Conversational Telephone Speech
- C-001422: French Gigaword First Edition
- D-001423: French-English Lexicon (LanTmark)
- C-001424: GRONINGEN
- C-001425: German Polyphone Database (SpeechDat(M)) DB1
- C-001426: German Polyphone Database (SpeechDat(M)) DB2
- C-001427: German Pronunciation Rules Set - PHONRUL 9.0
- C-001428: German SpeechDat(II) FDB-1000
- C-001429: German spoken by Turkish OrienTel database
- C-001430: Hansard French/English
- C-001431: Hempel
- C-001432: Hong Kong Hansards Parallel Text
- C-001433: Hong Kong Laws Parallel Text
- C-001434: Hong Kong News Parallel Text
- C-001435: Hong Kong Parallel Text
- C-001436: ICE-GB (British English component of the International Corpus of English)
- C-001437: ILE: Italian LExicon
- D-001438: Terminology database of expressions
- D-001439: Italian lexicon with morphological information and clitic verbs
- D-001440: JMdict
- C-001441: Japanese Mandarin Speech Recognition Corpus (desktop) single Japanese sentence (200 people)
- C-001442: Japanese Mandarin Speech Recognition Corpus (desktop) Japanese person name (200 people)
- C-001443: Japanese Mandarin Speech Recognition Corpus (desktop) Japanese place name (200 people)
- C-001444: Japanese Mandarin Speech Recognition Corpus (desktop) digit string (200 people)
- D-001445: KANJIDIC/KANJD212
- C-001446: Karl May Korpus (KMK)
- C-001447: Korean Mandarin Speech Recognition Corpus (desktop) place name (150 people)
- C-001448: Korean Mandarin Speech Recognition Corpus (desktop) digit string (110 people)
- C-001449: Korean Mandarin Speech Recognition Corpus (desktop) person name (150 people)
- C-001450: Korean Mandarin Speech Recognition Corpus (desktop) single Korean sentences (40 people)
- C-001451: LABEL-LEX (MW)
- C-001452: LABEL-LEX (SW)
- D-001453: LC-STAR English-Russian Bilingual Aligned Phrasal lexicon
- D-001454: LC-STAR Hebrew (Israel) phonetic lexicon
- D-001455: LC-STAR Russian phonetic lexicon
- D-001456: LC-STAR Spanish phonetic lexicon
- D-001457: LC-STAR Turkish phonetic lexicon
- C-001458: LEGA Corpus of Galician-Spanish legal texts
- C-001459: LEGE-BI Corpus of Basque-Spanish legal texts
- C-001460: LOGALIZA Corpus of English-Galician software localization
- N-001461: "Le Monde Diplomatique" Text corpus in Arabic
- T-001462: LexIn 2:e Swedish Lexicon
- T-001463: LusoLEX European Portuguese Lexicon
- T-001464: MHATLex
- C-001465: MLCC Multilingual and Parallel Corpora
- C-001466: MTP Annotated German corpus - tagged version
- C-001467: MULTEXT Prosodic database
- C-001468: Mandarin Chinese Speech Recognition Corpus (desktop) - place name (120 people)
- C-001469: Mandarin Chinese Speech Recognition Corpus (desktop) - short message (120 people)
- C-001470: Mandarin Chinese Speech Synthesis Corpus (Basic Corpus)
- C-001471: Mandarin Chinese Speech Synthesis Corpus (Integrated Corpus)
- C-001472: Mandarin Chinese Speech Synthesis Corpus
- C-001473: Mandarin Chinese high clarity Speech Recognition Corpus (in recording studio) - (desktop) person name (200 people)
- C-001474: Mandarin Chinese high clarity Speech Recognition Corpus (in recording studio) - (desktop) place name (200 people)
- C-001475: Mandarin Chinese high clarity Speech Recognition Corpus (in recording studio) - (desktop) digit string (200 people)
- C-001476: Mandarin Chinese high clarity Speech Recognition Corpus (in recording studio) - single Chinese sentence (200 people)
- D-001477: Mechanical Engineering
- C-001478: Mixer Corpus
- N-001479: Monolingual Greek corpus
- D-001480: Multi-domain Lexicons
- C-001481: Multilingual Corpus
- D-001482: N de N Dictionary
- C-001483: NEMLAR Broadcast News Speech Corpus
- C-001484: NEMLAR Speech Synthesis Corpus
- C-001485: NEMLAR Written Corpus
- C-001486: ONOMASTICA-COPERNICUS DATABASE
- D-001487: Onomastica
- C-001488: OrienTel Egypt MCA (Modern Colloquial Arabic) database
- C-001489: OrienTel Egypt MSA (Modern Standard Arabic) database
- C-001490: OrienTel English as spoken in Egypt database
- C-001491: Original Short-Message Data Collation I in Chinese (PinYin)
- C-001492: Original Short-Message Data Collation I in Chinese (named entities)
- C-001493: Original Short-Message Data Collation I in Chinese (participles)
- C-001494: Original Short-Message Data Collation I in Chinese
- C-001495: Original Short-Message Data Collation II in Chinese (PinYin)
- C-001496: Original Short-Message Data Collation II in Chinese (named entities)
- C-001497: Original Short-Message Data Collation II in Chinese (participles)
- C-001498: Original Short-Message Data Collation II in Chinese
- T-001499: PAROLE Greek Lexicon
- C-001500: PAROLE Irish Distributable Corpus
- D-001501: PAROLE Portuguese Lexicon
- T-001502: PAROLE-SIMPLE-CLIPS PISA Italian Lexicon Full lexicon
- N-001503: PHONDAT 1 - PD1 (2nd edition)
- C-001504: PHONDAT 2 - PD2 (2nd edition)
- N-001505: POLYCOST
- C-001506: Phonetically Balanced Sentences
- C-001507: Phonetically Balanced Words (2)
- C-001508: Phonetically Balanced Words (4)
- C-001509: Phonetically Balanced Words (5)
- C-001510: Phonetically Balanced Words (3)
- C-001511: Phonetically Rich Words
- C-001512: Qualified POS Tagged Corpus
- C-001513: RVG-J (Regional Variants of German J)
- C-001514: Russian Speech Database
- C-001515: SALA Spanish Chilean Database
- C-001516: SALA Spanish Colombian Database
- C-001517: SIEMENS 100 - SI100
- C-001518: SIelex (Siemens Phonetic lexicon)
- C-001519: SPK
- N-001520: Siemens Chile Spanish FDB-250
- C-001521: Siemens VoiceMail
- C-001522: SmartKom Public
- C-001523: Spanish SpeechDat(M) - DB1
- C-001524: Spanish SpeechDat(M) - DB2
- D-001525: Spanish lexicon with morphological information
- C-001526: Speecon manually pitch-marked reference database for Spanish
- T-001527: Statistics (Termcat)
- C-001528: Swiss-French Polyphone Database 1000 speakers
- C-001529: Swiss-French Polyphone Database 4000 speakers
- C-001530: Swiss-German Speecon database
- C-001531: Switchboard Credit Card
- N-001532: Switchboard-1 Transcripts
- C-001533: TAXI - Multilingual telephone dialog database
- C-001534: TECTRA Corpus of English-Galician literary texts
- D-001535: THAMUS Bilingual dictionaries - Aeronautics (1)
- D-001536: THAMUS Bilingual dictionaries - Computer science (7)
- D-001537: THAMUS Bilingual dictionaries - Economics (1)
- D-001538: THAMUS. Generic Italian Dictionary - inflected forms - technical domain
- T-001539: TSNLP (Test Suites for NLP Testing)
- C-001540: The EMILLE Lancaster Corpus
- C-001541: The Lancaster Corpus of Mandarin Chinese (LCMC)
- D-001542: Toponymic Geography
- C-001543: Translanguage English Database (TED) Speech
- C-001544: Translanguage English Database (TED) Transcripts database
- C-001545: Translanguage English Database (TED) Transcripts
- C-001546: Treebank-2
- C-001547: Treebank-3
- C-001548: UN Parallel Text (Complete)
- C-001549: UN Parallel Text (English)
- C-001550: UN Parallel Text (French)
- C-001551: UN Parallel Text (Spanish)
- C-001552: UNESCO Corpus of English-Galician-French-Spanish scientific-technical divulgation
- C-001553: US English Speecon database
- C-001554: US Spanish Speecon database
- C-001555: USC Marketplace Broadcast News Speech
- C-001556: USC Marketplace Broadcast News Transcripts
- C-001557: VAHA (POLYPHONE II)
- T-001558: VERBA Polytechnic and Plurilingual Terminological Database - A-QA Mathematics
- T-001559: VERBA Polytechnic and Plurilingual Terminological Database - C-AG Petrology
- T-001560: VERBA Polytechnic and Plurilingual Terminological Database - C-GB Climate Studies
- T-001561: VERBA Polytechnic and Plurilingual Terminological Database - C-GC Weather Studies
- T-001562: VERBA Polytechnic and Plurilingual Terminological Database - G-NC Space Communications
- T-001563: VERBA Polytechnic and Plurilingual Terminological Database - L-AD Air Transport
- C-001564: VERBMOBIL - VM CD 13.1 (new edition)
- C-001565: VERBMOBIL - VM CD 8.1 (new edition)
- C-001566: VERBMOBIL II - VM CD 15.1 - VM15.1 (new edition)
- C-001567: VERBMOBIL II - VM CD 23.1 - VM23.1 (BAS edition)
- C-001568: VERBMOBIL II - VM CD 28.1 - VM28.1 (BAS edition)
- C-001569: VERBMOBIL II - VM CD 30.1 - VM30.1 (BAS edition)
- C-001570: VERBMOBIL II - VM CD 31.1 - VM31.1 (BAS edition)
- C-001571: VERBMOBIL II - VM CD 32.1 - VM32.1 (BAS edition)
- C-001572: VERBMOBIL II - VM CD 42.1 - VM42.1 (BAS edition)
- C-001573: VERBMOBIL II - VM CD 43.1 - VM43.1 (BAS edition)
- C-001574: VERBMOBIL II - VM CD 46.1 - VM46.1 (BAS edition)
- C-001575: VERBMOBIL II - VM CD 47.1 - VM47.1 (BAS edition)
- C-001576: VERBMOBIL II - VM CD 51.1 - VM51.1 (BAS edition)
- C-001577: VERBMOBIL II - VM CD 52.1 - VM52.1 (BAS edition)
- C-001578: VERBMOBIL II - VM CD 55.1 - VM55.1 (BAS edition)
- C-001579: VERBMOBIL II - VM CD 56.1 - VM56.1 (BAS edition)
- C-001580: VERBMOBIL II - VM CD 57.1 - VM57.1 (BAS edition)
- C-001581: VERBMOBIL II - VM CD 58.1 - VM58.1 (BAS edition)
- C-001582: VERBMOBIL II - VM CD 59.1 - VM59.1 (BAS edition)
- C-001583: Voice of America (VOA) Czech Broadcast News Transcripts
- C-001584: Voice of America (VOA) Czech Broadcast News Audio
- C-001585: Voicemail Corpus Part I
- C-001586: Voicemail Corpus Part II
- C-001587: WEBCOMMAND
- C-001588: WSJCAM0 Cambridge Read News
- G-001589: Web 1T 5-gram Version 1
- C-001590: West Point Arabic Speech
- C-001591: West Point Croatian Speech
- C-001592: West Point Heroico Spanish Speech
- C-001593: West Point Korean Speech
- C-001594: West Point Russian Speech
- C-001595: YOHO Speaker Verification
- C-001596: Aozora Bunko
- C-001597: Asahi Shimbun News Article Data for Research
- C-001598: CD-Mainichi Shimbun '91 Data Collection
- C-001599: CD-Mainichi Shimbun '93 Data Collection
- C-001600: CD-Mainichi Shimbun '95 Data Collection
- C-001601: CD-Mainichi Shimbun Data Collection
- C-001602: CD-ROM Mainichi Shimbun '92 Data Collection
- C-001603: CD-ROM Mainichi Shimbun '94 Data Collection
- C-001604: CD-ROM Nikkei Full-text Database - Nikkei Business Daily 2005
- C-001605: CD-ROM Nikkei Full-text Database - Nikkei Business Daily · Nikkei Financial Daily, and Nikkei Marketing Journal 2005
- C-001606: CRL-DB-TEXT-97-1
- C-001607: Mandarin Chinese Telephone Speech Recognition Corpus - SMS (Fixed phone 86)
- C-001608: DVD-ROM Kokai Koho (Published Unexamined Patent Applications Gazette)
- D-001609: EDR Concept Dictionary
- D-001610: EDR Electronic Dictionary
- D-001611: EDR English Co-occurrence Dictionary
- D-001612: EDR English Corpus
- D-001613: EDR English Word Dictionary
- D-001614: EDR English-Japanese Bilingual Dictionary
- D-001615: EDR Japanese Co-occurrence Dictionary
- C-001616: EDR Japanese Corpus
- D-001617: EDR Japanese Word Dictionary
- D-001618: EDR Japanese-English Bilingual Dictionary
- D-001619: EDR Technical Terminology Dictionary(Information Processing)
- D-001620: English Business Letter Examples Dictionary CD-ROM version
- D-001621: Kodansha Japanese-English Dictionary Corpus 2003
- C-001622: Kyoto Text Corpus
- C-001623: RWC-DB-TEXT-94-1
- C-001624: RWC-DB-TEXT-94-2
- C-001625: RWC-DB-TEXT-95-2
- C-001626: RWC-DB-TEXT-95-3
- D-001627: RWC-DB-TEXT-96-2
- C-001628: RWC-DB-TEXT-97-1
- C-001629: RWCP-DB-TEXT-CD2
- C-001630: THE DAILY YOMIURI Articles Data
- C-001631: Telephone/Keyboard Conversation Database
- C-001632: Yomiuri Shimbun Articles Data(Japanese)
- D-001634: Collins Digital Dictionaries — COBUILD DICTIONARY FOR MOBIPOCKET
- D-001635: Collins Digital Dictionaries — COBUILD DICTIONARY FOR SONY UIQ MOBILE PHONE
- D-001636: Collins Digital Dictionaries — COBUILD DICTIONARY FOR BLACKBERRY
- D-001637: Collins Digital Dictionaries — COBUILD STUDENT'S DICTIONARY [Palm edition; HarperCollins.co.uk-only]
- D-001638: Collins Digital Dictionaries — COBUILD STUDENT'S DICTIONARY [Windows Mobile edition; HarperCollins.co.uk-only]
- D-001639: Collins Digital Dictionaries — COBUILD STUDENT'S DICTIONARY [Mac OS X edition; HarperCollins.co.uk-only]
- D-001640: Collins Digital Dictionaries — COBUILD STUDENT'S DICTIONARY [Symbian Series 60 (1st and 2nd Edition) edition; HarperCollins.co.uk-only]
- D-001641: Collins Digital Dictionaries — COBUILD STUDENT'S DICTIONARY [Symbian Series 60 (3rd Edition) edition; HarperCollins.co.uk-only]
- D-001642: Collins Digital Dictionaries — COBUILD STUDENT'S DICTIONARY [Windows edition; HarperCollins.co.uk-only]
- D-001643: Collins Cobuild — STUDENT'S DICTIONARY PLUS GRAMMAR: Plus CD-Rom [Third edition]
- D-001644: Collins Cobuild — ADVANCED LEARNER'S ENGLISH DICTIONARY [Fifth edition]
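- D-001646: Dual Wisdom Japanese-English Dictionary
- D-001647: Dual Wisdom English-Japanese Dictionary
- D-001648: Medical Dictionary '07 for ATOK
- D-001649: Medical Dictionary '06 for ATOK
- D-001650: Medical Dictionary '05 for ATOK
- D-001651: Medical Dictionary '04 for ATOK
- D-001652: Medical Dictionary '03 for ATOK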
- D-001653: Collins Robert French-English/English-French Unabridged Dictionary
- D-001654: Papillon project
- D-001655: CJK Lexical Resources
- G-001656: Multilingual Glossary of technical and popular medical terms in nine European Languages
- D-001657: Dictionary of administrative official document notation for ATOK
- D-001658: Kyodo News reporters' handbook dictionary the 10th edition for ATOK
- D-001659: NHK New Dictionary for the usage of technical terms for ATOK revised edition
- D-001660: Yuhikaku legal terms conversion dictionary V2 for ATOK
- N-001661: Yuhikaku legal terms transformable dictionary for ATOK
- G-001662: Life science dictionary Plus 2007 for ATOK
- G-001663: New Oriental Medicine Dictionary V6 for ATOK
- G-001664: New Oriental Medicine Dictionary V5 for ATOK
- D-001665: New Oriental Medicine Dictionary V4 for ATOK
- G-001666: New Oriental Medicine Dictionary V3 for ATOK
- G-001667: New Oriental Medicine Dictionary V2 for ATOK
- G-001668: 170,000 electric・electronic・informational terms transformable・translatable for ATOK
- G-001669: Transformable・translatable dictionary of architecture・civil engineering terms for ATOK
- D-001670: Transformable・translatable dictionary of 170,000 machinery・engineering terms for ATOK
- D-001671: Equine Multilingual Dictionary
- G-001672: Comprehensive Database of Japanese Name Variants
- G-001673: Transformative and translatable dictionary of science・agriculture for ATOK
- D-001674: Wikipedia
- C-001675: Wikibooks
- N-001676: Dictionary for spelling foreign names for ATOK
- N-001677: Kaishashikihou transformable dictionary of company names for ATOK
- G-001678: Electronic Japanese dictionary of Kojien 5th edition for ATOK
- G-001679: Meikyou Japanese Dictionary・Genius English Japanese Dictionary・Japanese English Dictionary R.2
- G-001680: Meikyou Japanese Dictionary・Genius English Japanese Dictionary・Japanese English Dictionary
- G-001681: Dictionary of modern Japanese 2007 for ATOK
- G-001682: Dictionary of modern Japanese 2006 for ATOK
- G-001683: Dictionary of modern Japanese 2005 for ATOK
- G-001684: Dictionary of modern Japanese for ATOK
- G-001685: Electronic Encyclopedia Mypedia (edited in August 2006) for ATOK
- C-001687: Wikinews
- C-001688: Wikisource
- C-001689: Wikispecies
- C-001690: Wikiquote
- C-001691: Wikimedia Commons
- C-001692: Mandarin Chinese News Text
- C-001693: Reuters Corpus, Volume 1
- C-001694: Reuters Corpus, Volume 2
- C-001695: 1996 CSR HUB4 Language Model
- D-001696: Wiktionary
- C-001697: Wikiversity
- N-001698: Meta-Wiki
- C-001699: GENIA Corpus Version 3.02
- C-001700: GENIA corpus 3.02p
- C-001701: GENIA Corpus
- C-001702: GENIA Treebank Beta
- G-001703: CrossTowns - Automatically Generated Phonetic Lexicons of Cross-Lingual Pronunciation Variants of European City Names
- C-001704: Academia Sinica Balanced Corpus of Modern Chinese
- C-001705: Sinica Treebank
- C-001706: NEGRA Corpus Version 2
- C-001707: Corpus NILC/São Carlos
- C-001708: CorpusPE
- C-001709: CorpusALCA
- C-001710: CorpusNYT
- C-001711: Corpus Gesproken Nederlands
- T-001712: Arabic WordNet
- D-001713: Japanese-English-Japanese Technical Terms
- C-001714: METU Turkish Corpus
- C-001715: METU-Sabanci Turkish Treebank
- D-001716: English-Japanese Mechanical Engineering Terms
- D-001717: Japanese<>English Dictionary of Chemistry and Chemical Engineering
- D-001718: Japanese<>English Dictionary of Medicine and Pharmaceutics
- D-001719: Japanese<>English Dictionary of Biology and Biotechnology
- G-001720: 日英コンピュータ・IT用語辞典
- D-001721: Contextual Clues for Named Entity Recognition in Japanese
- C-001722: CSLU: Apple Words and Phrases
- G-001723: Multilingual Database of Proper Nouns
- D-001724: CJE Database of Technical Terminology
- G-001725: English-Chinese Dictionary of Computer Terms
- C-001726: TC-STAR 2006 Evaluation Package - ASR Spanish - EPPS
- C-001727: TC-STAR 2006 Evaluation Package - SLT Spanish-to-English - EPPS
- D-001728: The CJKI Japanese English Dictionary
- D-001729: The CJKI English-Japanese Dictionary
- D-001730: JAPANESE PHONOLOGICAL DATABASE
- G-001741: Japanese Lexical Database
- D-001742: Japanese-English Dictionary of Proper Nouns
- G-001743: WESTERN PERSONAL AND PLACE NAMES IN JAPANESE
- G-001744: Katakana Lexical Database
- G-001745: Japanese Orthographic Variants Classified by Type
- G-001746: Japanese Organization and Company Names
- G-001747: Database of Japanese-English Neologisms
- D-001748: Japanese and International Show Business Celebrities
- D-001749: Japanese-English Dictionary of Business and Finance
- D-001750: Cross-Synonym and Cross-Language Searching in Japanese
- D-001751: SINGLE-CHARACTER DICTIONARY FOR JAPANESE INPUT METHODS
- D-001752: The Kodansha Kanji Learner's Dictionary Electronic Dictionary Edition
- C-003102: 2001 Topic Annotated Enron Email Data Set
- C-003103: CSLU: Foreign Accented English Release 1.2
- C-003106: Web日本語Nグラム第1版
- C-003107: Web English N-gram Data
- C-003109: 2003 NIST Rich Transcription Evaluation Data
- C-003110: 2004 Spring NIST Rich Transcription (RT-04S) Evaluation Data
- C-003111: CSLU: Kids' Speech Version 1.1
- C-003112: CSLU: Yes/No Version 1.2
- C-003152: NICT JLE Corpus
- C-003153: Comprehensive Database of Chinese Name Variants
- C-003171: Chinese-English Database of Proper Nouns
- C-003172: CHINESE LEXICAL DATABASE
- C-003173: Multilingual Dictionary of Proper Nouns
- N-003198: English-to-Simplified Chinese Dictionary
- N-003199: English-to-Traditional Chinese Dictionary
- C-003200: Linguistic Development Corpus
- C-003201: Progression Corpus
- C-003202: Salford Corpus
- C-003203: Brussels Corpus
- D-003204: Japanese-Chinese-English Database of Computer and IT Terminology
- D-003205: Dictionary of Chinese-English Neologisms
- D-003206: Chinese Morphological Database
- D-003207: SC AND TC CHINESE PINYIN DATABASE
- D-003208: SC JAPANESE PROPER NOUNS
- D-003210: Korean Lexical Database
- D-003211: Multilingual Dictionary of Korean Proper Nouns
- G-003212: Dictionary of Arabic Place Name Variants
- D-003213: Database of Arabic Names
- D-003214: Database of Arabic Proper Nouns
- C-003215: Newcastle Corpus
- C-003216: BioCaster ontology
- D-003217: PTS Dictionary (English-Malay)
- T-003218: PTS Thesaurus: (Malay)
- G-003219: Malay/Indonesian glossary
- D-003220: Terminology Dictionary
- C-003221: News articles
- C-003222: Collection of Malay Speech Sentences
- C-003223: UEA Corpus
- C-003224: Lancom Corpus
- C-003225: INL 27 Million Words Newspaper Corpus 1995
- C-003226: INL 5 Million Words Corpus 1994
- C-003227: INL 38 Million Words Corpus 1996
- C-003228: INL PAROLE Corpus 2004
- C-003229: Eindhoven Corpus
- C-003230: IFA Spoken Language Corpus v1.0
- D-003231: e-Lex 1.1
- D-003232: Neologisms Online v2
- D-003233: Reference Database for Belgian-Dutch
- D-003234: Reference Database for Dutch
- C-003235: Spoken Dutch Corpus 2.0
- C-003236: CGN Annotation dvd
- D-003237: Old Dutch Dictionary
- D-003238: Word List of the Dutch Language 1995
- D-003239: Word List of the Dutch Language 2005
- D-003240: CGN lexicon
- D-003241: Fonilex pronunciation database
- G-003242: CGN totalph
- G-003243: CGN lemalph
- C-003244: 重点領域研究「音声言語」・試験研究「音声DB」 連続音声データベース
- C-003245: 筑波大 多言語音声コーパス
- C-003246: 東北大-松下単語音声データベース
- C-003247: 基盤研究(A)「日本語方言の地域差」方言音声コーパス
- C-003248: 音声対話データベース (96年版)
- C-003249: 音声対話データベース (97年版)
- C-003250: 検索・要約用ニュース音声データベース
- C-003251: 会議音声データベース
- C-003252: RWCP 実環境音声・音響データベース
- C-003253: CIAIR 子供の声データベース
- C-003254: 雑音重畳日本語連続数字 音声認識評価環境
- C-003255: 雑音下日本語連続数字 音声区間検出評価環境
- C-003256: 実環境車内 日本語連続数字 音声認識評価環境
- C-003257: 実環境車内 日本語単語 音声認識評価環境
- C-003258: 日本人学生による読み上げ英語音声データベース
- C-003259: 留学生による読み上げ日本語音声データベース
- C-003260: 理研ワープロ操作対話音声コーパス
- C-003261: 日本音響学会新聞記事読み上げ音声コーパス
- C-003262: 新聞記事読み上げ高齢者音声コーパス
- C-003263: ATR Chinese Hotel Reservation Dialogue
- C-003264: Singapore Primary School Chinese Language Text
- C-003265: CSTSC-Flight Corpus
- C-003266: CUCorpora
- C-003267: CUSYL (Version 1.0)
- C-003268: CUWORD (Version 1.0)
- C-003269: CUSENT (Version 1.0)
- C-003270: Simultaneous Interpretation Database
- C-003271: CUDIGIT (Version 1.0)
- C-003272: CUCMD (Version 1.0)
- C-003273: CUCall Cantonese Sentences (Version 1.0)
- C-003274: CUCall Cantonese Words (Version 1.0)
- C-003275: CUCall Cantonese Digits (Version 1.0)
- C-003276: CUCall Cantonese Paragraphs (Version 1.0)
- C-003277: CUCall Cantonese Spontaneous Speech (Version 1.0)
- C-003278: CUCall Putonghua Speech (Version 1.0)
- C-003279: CUCall
- D-003280: CULEX (Version 1.0)
- D-003281: CUPDICT (Version 1.0)
- C-003282: 500-People Telephone Read Speech Corpus
- C-003283: Telephone Name Dialing Corpus
- C-003284: CASS Corpus
- C-003285: Wu-Dialectal Chinese Speech Corpus
- C-003286: BIT-MobileSpeech
- C-003287: BIT-MobileTalk
- C-003288: BIT-TeleSpeech
- C-003289: BIT-TonalName
- C-003290: BIT-MonoSyllable
- C-003291: CCC-VPR3C2005
- C-003292: CCC-VPR2C2005-1000X
- C-003293: CCC-VPR2C2005-1000
- C-003294: CCC-VPR2C2005-3000
- C-003295: CCC-VPR2C2005-6000
- C-003296: CCC-VPR2C2006-10000
- C-003297: CCC-VPR27C2006-50
- C-003298: CCC-VPR36C2006-100
- C-003299: Affective Speech Recognition
- C-003300: Penn Discourse Treebank Version 2.0
- C-003301: cbc4kids - Reading Comprehension Corpus
- C-003302: Chinese Affect Recognition
- C-003303: Chinese Treebank 6.0
- C-003304: Chinese Gigaword Third Edition
- C-003305: Tagged Chinese Gigaword
- C-003306: CELT - Corpus of Electronic Texts
- C-003307: CHRISTINE Corpus
- C-003308: Mandarin Affective Speech
- C-003309: Nationwide Speech Project
- C-003310: Corpus of Spoken, Professional American-English
- C-003311: ELISA English Language Interview Corpus as a Second-Language Application
- C-003312: THE LANCASTER-OSLO/BERGEN CORPUS
- C-003313: The LUCY Corpus
- C-003314: OntoNotes Release 1.0
- C-003315: MICASE
- C-003316: The SUSANNE Analytic Scheme
- D-003317: Biographical Dictionary
- D-003318: Cambridge Dictionaries Online
- D-003319: The CMU Pronouncing Dictionary
- N-003320: LEO Dictionary
- D-003321: The Rap Dictionary
- C-003322: OntoNotes Release 2.0
- T-003323: Thesaurus.com
- D-003324: TU Chemnitz Online Dictionary
- N-003325: Wikipedia - the free Encyclopedia
- C-003326: Global English Monitor Corpus
- C-003327: GALE Phase 1 Arabic Broadcast News Parallel Text - Part 1
- C-003328: GALE Phase 1 Chinese Broadcast News Parallel Text - Part 1
- C-003329: ISI Chinese-English Automatically Extracted Parallel Text
- C-003330: ACE 2005 English SpatialML Annotations
- C-003331: CSLU: Portland Cellular Telephone Speech Version 1.3
- C-003332: Hungarian-English Parallel Text, Version 1.0
- C-003333: Hungarian National Corpus
- D-003334: LC-STAR German Phonetic lexicon
- D-003335: LC-STAR German Phonetic lexicon in the Touristic Domain
- D-003336: LC-STAR Standard Arabic Phonetic lexicon
- D-003337: LC-STAR Finnish Phonetic lexicon
- D-003338: LC-STAR Mandarin Chinese Phonetic lexicon
- D-003339: LC-STAR Greek Phonetic lexicon
- D-003340: LC-STAR Italian Phonetic lexicon
- D-003341: LC-STAR English-German Bilingual Aligned Phrasal lexicon
- D-003342: LC-STAR English-Finnish Bilingual Aligned Phrasal lexicon
- D-003343: LC-STAR English-Italian Bilingual Aligned Phrasal lexicon
- C-003344: SALA II US English database
- C-003345: SALA II Portuguese from Brazil database
- C-003346: SALA II Spanish from Colombia Database
- C-003347: SALA II US Spanish West
- C-003348: GALE Phase 1 Arabic Blog Parallel Text
- C-003349: STC-TIMIT 1.0
- C-003350: CSLU: National Cellular Telephone Speech Release 2.3
- C-003351: スポーツ報知記事データ
- C-003353: ARCADE II Evaluation Package
- C-003355: 現代日本語書き言葉均衡コーパス(モニター公開版)
- C-003357: CESTA Evaluation Package
- C-003358: EQueR Evaluation Package
- C-003359: EvaSy Evaluation Package
- C-003360: CESART Evaluation Package
- C-003361: MEDIA Evaluation Package
- C-003362: ESTER Evaluation Package
- C-003363: ESTER Corpus
- C-003364: Text corpus of "Le Monde"
- C-003365: 太陽コーパス
- D-003367: 計算機用日本語基本辞書IPAL
- C-003369: 大阪外国語大学 多言語平行旅行会話文集 (サンプル)
- C-003370: Orientel United Arab Emirates MCA (Modern Colloquial Arabic)
- C-003371: Orientel United Arab Emirates MSA (Modern Standard Arabic)
- C-003372: Orientel English as spoken in the United Arab Emirates
- C-003373: MEDIA speech database for French
- C-003375: JEITAマルチモーダル対話コーパス
- C-003376: Japanese Speecon database
- C-003377: Danish Speecon Database
- C-003378: Dutch from the Netherlands Speecon Database
- C-003379: Dutch from Belgium Speecon Database
- C-003380: French-Canadian Speecon database
- C-003382: 日本語学習者による日本語作文と,その母語訳との対訳データベース(作文対訳DB) オンライン版
- C-003384: 日本語学習者による日本語/母語発話の対照言語データベース(発話対照DB)
- C-003386: ことばに関する新聞記事見出しデータベース
- C-003388: 近代女性雑誌コーパス
- C-003390: 日本のふるさとことば集成 第1巻 北海道・青森
- D-003391: Polderland Dutch Lexicon of Abbreviations and Acronyms
- D-003392: Polderland Dutch General Lexicon
- D-003393: Polderland Dutch Lexicon of Names
- D-003394: Polderland Dutch Lexicon of Business Terminology
- D-003395: Polderland Dutch Lexicon of Legal Terminology
- D-003396: Polderland Dutch Lexicon of Medical Terminology
- D-003397: Polderland Dutch Lexicon of Social Terminology
- D-003398: Polderland Dutch Lexicon of Technical Terminology
- C-003399: PennBioIE Release 0.9
- T-003402: 分類語彙表-増補改訂版データベース
- C-003404: 日本のふるさとことば集成 第2巻 岩手・秋田
- C-003405: American National Corpus (ANC) Second Release
- D-003407: 形態素解析辞書UniDic
- D-003409: 近代文語UniDic ver.0.7
- C-003410: American National Corpus Second Release - Open Portion
- D-003412: ANC Second Release Frequency Data
- C-003431: ICIC Corpus of Philanthropic Fundraising Discourse
- D-003432: 語種辞書『かたりぐさ』ver.1.0.1
- D-003433: 表記統合辞書 ver.1.0
- C-003436: 自然発話音声データベース SDB
- C-003438: 自然発話音声・言語データベース(日英対訳) SLDB
- C-003440: ATR Dialogue Database
- C-003442: 多数話者音声データベース APP
- C-003444: 多数話者音声データベース APPBLA
- C-003446: 多数話者音声データベース APPDIC
- C-003448: 多数話者音声データベース
- C-003450: 会話表現データベース 模擬会話データ
- C-003452: 会話表現データベース 会話表現集データ
- C-003454: 会話表現データベース
- D-003458: 日本語の語彙特性 第1巻
- D-003460: 日本語の語彙特性 第2巻
- D-003462: 日本語の語彙特性 第3巻
- D-003464: 日本語の語彙特性 第4巻
- D-003466: 日本語の語彙特性 第5巻
- D-003468: 日本語の語彙特性 第6巻
- D-003470: CD-ROM 日本語の語彙特性 第2期
- D-003472: 日本語の語彙特性 第3期 CD-ROM付き
- D-003474: CD-ROM 日本語の語彙特性 第1期
- D-003476: 日本語の語彙特性
- C-003478: 日本のふるさとことば集成 第3巻 宮城・山形・福島
- C-003479: 日本のふるさとことば集成 第4巻 茨城・栃木
- C-003480: 日本のふるさとことば集成 第5巻 埼玉・千葉
- C-003481: 日本のふるさとことば集成 第6巻 東京・神奈川
- C-003482: 日本のふるさとことば集成 第7巻 群馬・新潟
- C-003483: 日本のふるさとことば集成 第8巻 長野・山梨・静岡
- C-003484: 日本のふるさとことば集成 第9巻 岐阜・愛知・三重
- C-003485: 日本のふるさとことば集成 第10巻 富山・石川・福井
- C-003486: 日本のふるさとことば集成 第11巻 京都・滋賀
- C-003487: 日本のふるさとことば集成 第12巻 奈良・和歌山
- C-003488: 日本のふるさとことば集成 第13巻 大阪・兵庫
- C-003489: 日本のふるさとことば集成 第14巻 鳥取・島根・岡山
- C-003490: 日本のふるさとことば集成 第15巻 広島・山口
- C-003491: 日本のふるさとことば集成 第16巻 香川・徳島
- C-003492: 日本のふるさとことば集成 第17巻 愛媛・高知
- C-003493: 日本のふるさとことば集成 第18巻 福岡・佐賀・大分
- C-003494: 日本のふるさとことば集成 第19巻 長崎・熊本・宮崎
- C-003495: 日本のふるさとことば集成 第20巻 鹿児島・沖縄
- C-003497: BYU CORPUS OF AMERICAN ENGLISH
- C-003498: TIME CORPUS
- C-003499: BYU-BNC
- D-003500: OED Corpus of Historical English
- C-003501: Corpus del Español
- C-003502: Corpus del Español: Registers
- C-003503: O CORPUS DO PORTUGUÊS
- C-003504: LDS GENERAL CONFERENCE TALKS
- C-003505: Early English Books Online (EEBO) and Literature Online (LION)
- C-003506: Hong Kong ICE Corpus
- C-003507: East African Component of the International Corpus of English (Release 2)
- C-003508: Indian component of the International Corpus of English
- C-003509: New Zealand ICE corpus
- C-003523: Singapore component of the International Corpus of English
- C-003524: 通話品質測定用多言語音声データベース(擬似音声)
- C-003525: 日本語音声データベース
- C-003526: 単語音声データベース
- C-003527: 多言語音声データベース1994
- C-003528: 多言語音声データベース2002
- C-003529: 通話品質測定用多言語音声データベース(都市名)
- C-003530: 広帯域ステレオ音声データベース
- C-003531: 姓名録音音声データベース Ver. 1.1
- C-003532: 高齢者&子供音声データベース
- C-003533: 音素バランス音声データ
- C-003534: The Philippines component of the International Corpus of English
- C-003535: The LUCY Corpus: Documentation
- C-003537: こども語辞書
- D-003539: goo流行語辞書2006
- D-003541: goo流行語辞書2005
- D-003543: 医療辞書'08 for ATOK
- D-003545: ぎょうせい 公用文表記辞書 for ATOK
- D-003552: 共同通信社 記者ハンドブック辞書 第11版 for ATOK
- D-003554: NHK 新用字用語辞書2008 for ATOK
- D-003555: 有斐閣法律法学用語変換辞書V3 for ATOK
- C-003557: 法令翻訳データ (標準対訳辞書対応)
- D-003559: 標準対訳辞書 ver. 3.0 (CSV)
- D-003561: 標準対訳辞書(計算機用)
- C-003563: 単語音声データベースVol.1(制御用語編)
- C-003571: 単語音声データベースVol.2(地名編)
- C-003572: 単語音声データベースVol.3(人名編)
- C-003573: 単語音声データベースVol.4(商取引用語編)
- C-003574: 単語音声データベース (通信端末版)
- C-003575: 単語音声データベースVol.2(地名編) (通信端末版)
- C-003576: 単語音声データベースVol.3(人名編) (通信端末版)
- C-003577: 単語音声データベースVol.4(商取引用語編) (通信端末版)
- C-003578: TUNA Corpus
- C-003579: North American News Text, Complete
- C-003580: North American News Text, General Release
- C-003581: BLLIP North American News Text, Complete
- C-003582: BLLIP North American News Text, General Release
- C-003585: CD-毎日新聞2007データ集
- C-003588: CD-毎日新聞2007データ集プラス
- C-003589: CD-毎日新聞2006データ集プラス
- C-003590: CD-毎日新聞2005データ集プラス
- C-003593: 朝日新聞記事データ集 学術研究用 2007
- C-003594: 朝日新聞記事データ集 学術研究用 2006
- C-003604: 読売新聞記事データ<邦文>2007年版 (*CSV形式)
- C-003605: 読売新聞記事データ<邦文>2006年版 (*CSV形式)
- C-003606: 読売新聞記事データ集 2007 (*テキスト形式)
- C-003607: 読売新聞記事データ集 2006 (*テキスト形式)
- C-003608: 読売新聞記事データ<英文>2007年版 (*CSV形式)
- C-003609: 読売新聞記事データ<英文>2006年版 (*CSV形式)
- C-003610: THE DAILY YOMIURI 記事データ集 2007 (*テキスト形式)
- C-003611: THE DAILY YOMIURI 記事データ集 2006 (*テキスト形式)
- D-003612: ヨミダス用語辞書
- C-003614: 日本語複合辞用例データベース第1版
- C-003615: UAM Spanish Treebank
- C-003617: OpenMWEコーパスv0.01
- C-003619: 基本慣用句五種対照表
- C-003622: 日本語の語彙特性 第4期
- C-003623: 基本語データベース語義別単語親密度
- C-003624: Turin University Treebank 1.1
- C-003625: Tübingen Partially Parsed Corpus of Written German
- C-003626: Tübingen Treebank of Spoken German
- C-003627: Tübingen Treebank of Written German Release 4
- C-003628: Tübingen Treebank of Spoken English
- C-003629: Tübingen Treebank of Spoken Japanese
- T-003630: GermaNet 5.1
- C-003631: Princeton WordNet Gloss Corpus
- C-003632: SemCor 1.6
- C-003633: MultiSemCor Corpus 1.1
- C-003634: SemCor 1.7
- C-003635: SemCor 1.7.1
- C-003636: SemCor 2.0
- C-003637: SemCor 2.1
- C-003638: SemCor 3.0
- C-003639: Wordnet Domains 3.2
- C-003640: Corpus of Written British Creole
- C-003641: AUTONOMATA Spoken Names Corpus
- C-003642: COREA-coreferentiecorpus
- C-003643: D-Coi-corpus
- C-003644: DuELME
- C-003645: Twente Nieuws Corpus
- C-003646: Malay Concordance Project
- T-003647: Asian WordNet
- C-003648: NTCIRデータセット/テストコレクション
- C-003650: 宇都宮大学パラ言語情報研究向け音声対話データベース
- C-003652: 電総研 単語音声データベース
- C-003655: NTT 乳幼児音声データベース
- C-003658: CASTEL/J CD-ROM バージョン1.2
- D-003659: CASTEL/J CD-ROM ミレニアム・バージョン (v.1.3)
- C-003662: 教育研究情報データベース 高校入試問題
- C-003664: 教育研究情報データベース 初等中等教育諸学校における実践的教育研究主題
- C-003665: 教育研究情報データベース 教育学関係博士・修士学位論文題目
- C-003668: 教育研究情報データベース 教育研究所・教育センター刊行論文
- C-003669: 教育研究情報データベース 地方教育センター等における教職員研修講座
- C-003671: 東京大学史料編纂所データベース
- C-003674: 古文書フルテキストデータベース
- C-003676: 古記録フルテキストデータベース
- C-003677: 奈良時代古文書フルテキストデータベース
- C-003679: 平安遺文フルテキストデータベース
- C-003682: 鎌倉遺文フルテキストデータベース
- D-003683: 電子くずし字字典
- C-003686: JCMD大阪
- C-003687: 日本主要都市方言音声データベース
- C-003690: JCMD「天気予報」Ⅰ~Ⅴ
- C-003691: 琉球方言音声データベース
- D-003694: 今帰仁方言音声データベース
- D-003696: 首里・那覇方言音声データベース
- D-003697: 奄美方言音声データベース
- D-003699: 宮古方言音声データベース
- C-003701: 古今和歌集データベース
- D-003703: 電子音声沖縄語辞典
- C-003706: 連歌連想語彙データベース
- C-003708: 連歌データベース
- C-003710: 古事類苑全文データベース
- C-003711: 和歌データベース
- C-003714: 俳諧データベース
- C-003739: NTCIR Test Collections
- C-003740: NTCIR-1(情報検索/用語抽出研究用テストコレクション)
- C-003741: NTCIR-2(情報検索用テストコレクション)
- C-003742: NTCIR-2 SUMM(テキスト自動要約用テストコレクション)
- C-003743: NTCIR-2 SUMM TAO(自動要約用データ:TAO作成)
- C-003744: NTCIR-3 CLIR(情報検索/言語横断検索用テストコレクション)
- C-003745: NTCIR-3 PATENT(特許検索テストコレクション)
- C-003746: NTCIR-3 QA(質問応答用テストコレクション)
- C-003747: NTCIR-3 SUMM(テキスト自動要約用テストコレクション)
- C-003748: NTCIR-3 WEB(Web検索評価用テストコレクション)
- C-003749: NTCIR-4 CLIR テストコレクション
- C-003750: NTCIR-4 特許検索タスクテストコレクション
- C-003751: NTCIR-4 QAC2(質問応答テストコレクション)
- C-003752: NTCIR-4 WEB(Web検索評価用テストコレクションタスク文書データ)
- C-003753: NTCIR-5 CLIR テストコレクション
- C-003754: NTCIR-5 CLQA 多言語質問応答テストコレクション
- C-003755: NTCIR-5 特許検索タスクテストコレクション
- C-003756: NTCIR-5 QAC 質問応答テストコレクション
- C-003757: NTCIR-5 WEB検索評価用テストコレクション
- C-003758: NTCIR-6 CLIR テストコレクション
- C-003759: NTCIR-6 CLQA 多言語質問応答テストコレクション
- C-003760: NTCIR-6 OPINION 意見分析タスクテストコレクション
- C-003761: NTCIR-6 特許検索タスクテストコレクション
- C-003762: NTCIR-6 QAC 質問応答テストコレクション
- C-003763: NTCIR-6 MuST 「動向情報の要約と可視化」テストコレクション
- C-003764: 怪異・妖怪伝承データベース
- C-003767: 現代中国語コーパス
- D-003769: ヒンディー語・日本語・英語辞書
- D-003771: ヒンディー語辞書見出し語データベース
- D-003772: ヒンディー・パンジャービー・カンナダ・テルグの辞書・見出し語辞書
- D-003775: シンハラ語・日本語辞書(試作版)
- D-003777: パンジャービー語・日本語・英語辞書
- D-003778: 現代チベット語動詞辞典
- C-003781: ヒンディー語古典データベース
- C-003782: ヒンドゥー教聖典データベース
- N-003784: 古代チベット語文献オンライン
- N-003787: 漢字字体規範データベース
- C-003788: 中期朝鮮語形態素データベース
- C-003790: 大規模ブログコーパス
- C-003793: 日米特許対訳コーパス
- D-003795: 基本語意味データベース: Lexeed
- C-003796: 100地名単語データベース
- C-003798: 中日対訳コーパス
- T-003801: IPA STaX
- C-003803: 日本語会話データベース
- C-003804: 羅生門
- C-003805: The Small Catechism of Martin Luther
- C-003806: 奥の細道
- C-003807: Learner Corpus Data
- C-003808: Learner Corpus Data: Traveling and Gardening
- C-003809: Learner Corpus Data: Cooking and Gardening
- C-003810: Learner Corpus Data: How Far Did the Kite Go?
- C-003811: Learner Corpus Data: Momotaro
- C-003812: Learner Corpus Data: The North Wind and the Sun
- C-003813: Learner Corpus Data: Mercury and the Workmen
- C-003814: Data Documentation: 9th Graders' Email Messages
- D-003815: Dictionary of Sources of Classical Japan
- C-003816: 日本語音韻データベース
- C-003819: 大正新脩大藏經テキストデータベース
- D-003821: 日英英日専門用語辞書
- C-003823: LEXICAL FREQUENCY STATISTICS IN JAPANESE
- N-003824: Parallel Texts for Medical Scenes
- G-003825: Kawasaki City CEC Handbook of Common Expressions in School
- G-003826: 算数6ヶ国語対訳集
- C-003827: Kawasaki City CEC Teacher to Parent Letter Correspondence
- C-003828: Kawasaki City CEC Elementary School Education Series: Social Studies
- C-003829: Dialog Corpus for Medical Scenes
- T-003830: Multilingual Thesaurus of Academic Terms
- D-003831: Multilingual Dictionary for Kyoto Tourism
- C-003833: 日本人はどんなふうにしゃべっているの?
- C-003835: 新聞記事文庫
- D-003836: EDR Japanese/English Word Dictionary
- N-003837: Picton
- D-003838: ライフサイエンス辞書Plus 2009 for ATOK
- C-003840: EPWING互換形式ライフサイエンス辞書 2009 + Jamming Light
- C-003841: Tinkertoy Corpus
- N-003842: CSD
- C-003843: The Tanaka Corpus
- D-003844: WWWJDIC
- D-003845: French-English-Malay Dictionary
- D-003846: Forgiving Online Kanji Search
- D-003847: SaiKam
- D-003848: Từ điển tiếng Việt
- D-003849: 和独辞典 WaDokuJT (EPWING)
- D-003850: 和独辞典
- C-003852: 中国語方言字音データベース
- C-003853: 広東語常用単語データベース
- T-003855: 多言語間語義ネットワーク
- D-003856: 中国語動詞補語用法オンライン辞書
- C-003857: The JEFLL (Japanese EFL Learner) Corpus
- C-003858: 文法項目別BNC用例集
- C-003859: ORCHID POS-Tagged Corpus
- N-003860: LOTUS
- C-003861: LIVAC Synchronous Corpus
- N-003862: Chinese Dependency Treebank
- C-003863: AnnCorra
- C-003864: Corpus Program
- C-003865: Sinica Balanced Corpus
- C-003866: Word List with Accumulated Word Frequency in Sinica Corpus 3.0
- C-003867: Chinese Electronic Dictionary
- T-003868: Academia Sinica Bilingual WordNet
- T-003869: Academia Sinica Bilingual Ontological Database
- C-003870: Academia Sinica Tagged Corpus of Early Mandarin Chinese
- N-003871: Formosan Language Archive
- N-003872: CIRB030
- C-003873: MAT-160
- C-003874: MAT-400
- C-003875: MAT-2000Edu
- C-003876: MAT-2000Com
- C-003877: MAT-2500ExtV-Edu
- C-003878: MAT-2500ExtV-Com
- C-003879: TCC-300Edu
- C-003880: TCC-300Com
- C-003881: Sinica MCDC
- C-003882: EAT-ALL
- C-003883: EAT-200
- C-003884: MATBN
- C-003885: Affix Database
- T-003886: E-HowNet Ontology
- C-003887: The NIE Corpus of Spoken Singapore English
- C-003888: The Lim Siew Hwee Corpus of Informal Singapore Speech
- C-003889: A Corpus of Spoken PRC English
- C-003890: The Yeo (2001) Corpus of Sec 2 Compositions
- C-003891: Gyan Nidhi
- D-003892: SHABDIKA
- C-003893: Annotated Speech Corpora-DRDO
- C-003894: Tamil Digital Corpus
- C-003895: VerreTaal
- D-003896: Digital Dictionaries of South Asia
- G-003897: The Hobson-Jobson Anglo-Indian dictionary
- D-003898: Candrakānta abhidhāna
- D-003899: A course in Baluchi
- D-003900: A grammar, phrase book and vocabulary of Baluchi
- D-003901: A sketch of the northern Balochi language
- D-003902: A text book of the Balochi language
- G-003903: Baluchi glossary
- D-003904: Samsada Bangala abhidhana
- D-003905: Samsad Bengali-English dictionary
- D-003906: Bangala bhashara abhidhana
- D-003907: A practical Hindi-English dictionary
- D-003908: A dictionary of the Kashmiri language
- D-003909: Dictionary of the Lushai language
- D-003910: A dictionary, Marathi and English
- D-003911: The Aryabhusan school dictionary, Marathi-English
- D-003912: A practical dictionary of modern Nepali
- D-003913: A comparative and etymological dictionary of the Nepali language
- D-003914: The Pali Text Society's Pali-English dictionary
- D-003915: A dictionary of the Puk'hto, Pus'hto, or language of the Afghans
- D-003916: New Persian-English dictionary
- D-003917: A comprehensive Persian-English dictionary
- D-003918: A dictionary of the dialects spoken in the state of Jeypore
- D-003919: The practical Sanskrit-English dictionary
- D-003920: A practical Sanskrit dictionary
- N-003921: A Sindhi-English dictionary
- D-003922: J. P. Fabricius's Tamil and English dictionary
- D-003923: Na Kadirvelu Pillai, Tamil Moli Akarathi
- D-003924: A core vocabulary for Tamil
- D-003925: Tamil lexicon
- D-003926: A comprehensive Tamil and English dictionary of high and low Tamil
- D-003927: Charles Philip Brown, A Telugu-English dictionary
- D-003928: J. P. L. Gwynn, A Telugu-English dictionary
- D-003929: A dictionary of Urdu, classical Hindi, and English
- D-003930: A dictionary, Hindustani and English
- D-003931: A Dravidian etymological dictionary
- D-003932: A comparative dictionary of Indo-Aryan languages
- C-003933: Urdu-Nepali-English Parallel Corpus
- C-003934: Urdu Word List
- G-003935: Urdu 5000 Most Frequently Used Words List
- G-003936: Urdu Closed Class Words List
- D-003937: Sindhi English Dictionary
- C-003938: The Hong Kong Cantonese Child Language Corpus
- G-003939: Multi-language Glossary on Natural Disasters
- D-003940: TCL's Computational Lexicon
- C-003941: Vietnamese Text Corpus
- C-003942: Vietnamese Bitext Corpus
- C-003943: Vietnamese Dictionary
- D-003944: Thai Dictionary
- C-003945: Thai Text Corpus
- C-003946: Thai Bitext Corpus
- C-003947: Burmese Text Corpus
- C-003948: Burmese Bitext Corpus
- D-003949: Burmese Dictionary
- C-003950: Khmer Text Corpus
- D-003951: Khmer Dictionary
- C-003952: Khmer Bitext Corpus
- C-003953: Lao Text Corpus
- D-003954: Lao Dictionary
- D-003955: Shan Dictionary
- C-003956: Shan Bitext Corpus
- D-003957: Sgaw Karen Dictionary
- C-003958: Mon-Khmer languages database
- D-003959: Mon-Khmer etymological dictionary
- C-003960: Corpus of Khmer Inscriptions
- C-003961: The King's Thai
- C-003962: Bar Ahom Lexicon
- C-003963: Bar Ahom Manuscript
- C-003964: Indian languages Corpora
- C-003965: Assamese Corpora
- C-003966: Manipuri Corpora
- D-003967: English-Assamese Dictionary
- D-003968: Assamese-English Dictionary
- D-003969: English-Manipuri Dictionary
- D-003970: Manipuri-English Dictionary
- D-003971: Hindi-Assamese Dictionary
- D-003972: Transliterated Assamese Dictionary
- C-003973: Archive of Recorded World Literature
- D-003974: GujaratiLexicon
- D-003975: Monier Williams Sanskrit Dictionary
- T-003976: Bhagavadgita
- T-003977: Tirukkural Wordlists
- N-003978: dhvani
- D-003979: KK Dictionary
- D-003980: Shabdanjali: English-Hindi dictionary
- D-003981: Marathi - Hindi Dictionary
- D-003982: Kannada - Hindi Dictionary
- D-003983: Telugu - Hindi Dictionary
- D-003984: Punjabi - Hindi Dictionary
- D-003985: Calita Bengali-Hindi Dictionary
- D-003986: Dishi Bengali-Hindi Dictionary
- T-003987: English-Telugu Dictionary
- D-003988: English - Hindi Dictionary : Version-2.0
- D-003989: Trilingual Dictionary
- D-003990: Universal Word - Hindi Dictionary
- C-003991: Newari Lexicon
- C-003992: Digitised-Online Bilingual Puratan Janam Sakhi
- C-003993: Pushto (Pashtun language) instructional recordings
- C-003994: Classical Urdu Poetry
- C-003995: Punjabi instructional recordings
- C-003996: Valmiki Ramayana Translation
- D-003997: Monier Williams Sanskrit-English Dictionary (current versions)
- D-003998: Apte English-Sanskrit Dictionary
- D-003999: Monier Williams Sanskrit-English Dictionary (earlier versions)
- C-004000: Project Madurai Archives
- D-004001: Kamus Bahasa
- C-004002: Corpus of Modern Tamil text
- C-004003: Radio plays
- C-004004: Modern Short Stories for reading comprehension
- C-004005: ACIP RELEASE 6
- C-004006: Great Books of Yoga
- C-004007: Chone Drakpa Shedrup Rinpoche
- C-004008: St. Petersburg Catalog
- D-004009: Wa Dictionary Database
- N-004010: Wa Language Corpus
- C-004011: Comparative Chart of Wa Orthographies
- C-004012: List of about 300 Pairs of Morphologically-related Wa Words
- C-004013: Samples of spoken Wa
- C-004014: Chinese Web 5-gram Corpus
- G-004015: English Keywords for public servants
- C-004016: 863 program in 2007 SSMT machine translation evaluation data
- C-004017: The contemporary Chinese general balanced corpus of National Language Committee (Segmentation lexicon)
- C-004018: The contemporary Chinese general balanced corpus of National Language Committee (Syntactic Treebank)
- C-004019: The contemporary Chinese general balanced corpus of National Language Committee (Segmentation and part-of-speech annotated)
- C-004020: The contemporary Chinese general balanced corpus of National Language Committee (Raw)
- C-004021: Chinese-English/Chinese-Japanese parallel corpora
- G-004022: A Glossary of Pali and Buddhist Terms
- C-004023: Audio recordings and streams
- T-004024: Marathi Wordnet
- T-004025: OriNet
- T-004026: San-Net
- D-004027: ORIDIC
- C-004028: HKCAC
- C-004029: HKCPSC
- C-004030: NTU Corpus of Formosan Languages
- C-004031: Cleaneval development dataset
- C-004032: SCoRE: Singapore Corpus of Research in Education
- C-004033: Singaporean Preschoolers Oral Competence in Mandarin
- C-004034: Hindi Speech Database
- C-004035: Mandarin Topic-oriented Conversation Corpus
- C-004036: Mandarin Map Task Corpus
- C-004037: The Mandarin Conversational Dialogue Corpus
- T-004038: eXtended WordNet
- T-004039: VerbNet
- T-004040: AlbaNet
- T-004041: Croatian WordNet
- T-004042: DanNet
- T-004043: Hebrew WordNet
- T-004044: Líonra Séimeantach na Gaeilge
- T-004045: PersiaNet
- T-004046: sloWNet
- T-004047: Italian Wordform List
- D-004049: Nijmegen Arabic/Dutch Dictionary
- C-004050: The EGYPT Statistical Machine Translation Toolkit
- C-004051: Corpus of Contemporary Arabic
- C-004052: Penman Upper Model
- T-004053: SENSUS
- C-004054: Unified Medical Language System
- T-004055: UMLS Semantic Network
- D-004056: SPECIALIST Lexicon
- C-004057: BulTreeBank
- C-004058: Corpus of the Contemporary Lithuanian Language
- C-004059: Corpus of Spoken Israeli Hebrew
- C-004060: DGT Multilingual Translation Memory of the Acquis Communautaire
- C-004061: multilingual parallel corpus of translation
- C-004062: EVROKORPUS
- C-004063: WPT 05
- C-004064: Croatian National Corpus
- C-004065: Czech National Corpus
- D-004067: Dictionnaire français-japonais version préliminaire 3
- C-004068: SYN2006PUB
- C-004069: SYN2005
- C-004070: SYN2000
- C-004071: FSC2000
- C-004072: KSK-DOPISY
- C-004073: ORWELL
- C-004074: ORAL2008
- C-004075: ORAL2006
- C-004076: PMK
- C-004077: BMK
- C-004078: DIAKORP
- C-004079: InterCorp
- C-004080: Audio Archive of Linguistic Fieldwork
- C-004081: NewsgroupsUseNet Corpora
- C-004082: Hellenic National Corpus
- C-004083: Dialogue Diversity Corpus Version 2.0
- C-004084: Louvain International Database of Spoken English Interlanguage
- C-004085: SMULTRON
- C-004086: Stockholm Umeå Corpus
- C-004087: Göteborg Spoken Language Corpus
- C-004088: Swedish treebank
- D-004089: NorKompLeks
- C-004090: Norwegian Newspaper Corpus
- C-004091: LOGON parallel tourist corpus
- C-004092: Sofie Treebank
- C-004093: Corpus of Written Estonian
- T-004094: Word Sense Disambiguation Test Collection
- C-004095: Balanced Corpus of Estonian
- C-004096: Estonian Reference Corpus
- C-004097: English-Estonian and Estonian-English parallel corpus
- C-004098: Corpus of Estonian Dialects
- C-004099: Phonetic Corpus of Estonian Spontaneous Speech
- C-004100: Corpus of spoken Estonian
- C-004101: PLUG corpus
- C-004102: Turkish-Swedish Corpus
- C-004103: Talbanken76
- C-004104: Talbanken05
- C-004105: LinGO Redwoods
- C-004106: New Corpus for Ireland
- C-004107: speech accent archive
- T-004108: TermNet
- C-004109: Lancaster Speech, Writing and Thought Presentation Written Corpus
- C-004110: Lancaster Speech, Writing and Thought Presentation Spoken Corpus
- C-004111: Russian National Corpus
- C-004112: Deeply Annotated Corpus
- C-004113: Parallel text corpus
- C-004114: Dialectal corpus
- C-004115: Corpus of Spoken Russian
- C-004116: FIDA
- C-004117: IJS - ELAN
- C-004118: Slovene Dependency Treebank
- T-004119: sloWNet
- C-004120: Multext-East Resources, Version 3
- T-004121: MULTEXT-East morphosyntactic lexicons
- C-004122: MULTEXT-East 1984 corpus
- C-004123: MULTEXT-East comparable corpus
- C-004124: MULTEXT-East parallel speech corpus
- C-004125: EVROKORPUS
- C-004126: IPI PAN Corpus
- C-004127: KACENKA
- C-004128: Korpus 2000
- C-004129: Korpus 90
- C-004130: KorpusDK
- C-004131: Leipzig Corpora Collection
- C-004132: National Corpus of Polish
- C-004133: Uppsala Corpus
- N-004134: Corpus of Interviews
- C-004135: Scottish Gaelic corpus
- C-004136: Welsh corpus
- D-004137: Chronological Morphemic and Word-Formational Dictionary of Russian
- D-004138: Corpus of frequency dictionary of contemporary Polish
- C-004139: Comparable corpus of English and Russian news texts
- C-004140: The HIWIRE database, a noisy and non-native English speech corpus for cockpit communication
- T-004141: Austronesian Basic Vocabulary Database
- T-004142: Bantu Basic Vocabulary Database
- C-004143: BySoc
- C-004144: amph
- C-004145: Digital Morphology Archives
- C-004146: Finnish Broadcast Corpus
- C-004147: Finnish-Swedish Textcollection
- C-004148: Finnish Text Collection
- C-004149: Helsinki Corpus of Swahili
- C-004150: Oulu corpus
- C-004151: SFNET discussion group corpus
- C-004152: Berlin Database of Emotional Speech
- C-004153: Estonian Emotional Speech Corpus
- C-004154: Weblog Data Collection
- C-004155: Web Corpus
- C-004156: Web Corpus 2007
- C-004157: Web Corpus 2006
- C-004158: PERC Corpus
- C-004159: WaC Users
- C-004160: WaC Users Marginal
- C-004161: WaC Users Junk
- C-004162: DeWaC German Web Corpus
- C-004164: 日本語アプレイザル評価表現辞書(JAppraisal 辞書)~態度評価編~
- C-004165: 岩波国語辞典第五版タグ付きコーパス2004
- C-004168: 新聞記事GDAコーパス2004
- C-004170: 京都大学格フレーム(Ver 1.0)
- D-004172: GSK地名施設名辞書
- C-004174: 甲南大学-教育測定研究所 Konan-JIEM Learner Corpus Third Edition
- C-004176: 甲南大学こどもコーパス
- C-004178: CASTEL/J CD-ROM V1.5
- C-004179: A Linguistic Atlas of Early Middle English Version 2.1
- C-004180: A Representative Corpus of Historical English Registers 3.2
- C-004181: British English 2006
- C-004182: Michigan Corpus of Upper-Level Student Papers
- C-004183: The John Swales Conference Corpus
- C-004184: Vienna-Oxford International Corpus of English 1.1
- C-004185: Vienna-Oxford International Corpus of English (version 1.1 XML)
- C-004186: Yahoo-based Contrastive Corpus of Questions and Answers
- C-004187: The Penn-Helsinki Parsed Corpus of Modern British English
- C-004188: The Penn-Helsinki Parsed Corpus of Middle English, second edition
- C-004189: The Penn-Helsinki Parsed Corpus of Early Modern English
- C-004190: The Penn Corpora of Historical English
- C-004191: The Small Corpus of Political Speeches
- C-004192: Corpus of Contemporary American English
- C-004193: Corpus of Historical American English
- C-004194: Corpus of American Soap Operas
- C-004195: ukWaC
- C-004196: deWaC
- C-004197: itWaC
- C-004198: frWaC
- C-004199: PukWaC
- C-004200: SdeWaC
- C-004201: WaCkypedia_EN
- C-004202: Morph-it! Version 0.48
- C-004203: La Repubblica Corpus
- C-004204: Internet Argument Corpus
- C-004205: Film Corpus
- C-004206: Uppsala PErsian Corpus
- C-004207: Uppsala PErsian Dependency Treebank
- C-004208: Bijankhan Corpus
- C-004209: Tehran English-Persian Parallel Corpus
- C-004210: Prague Czech-English Dependency Treebank 2.0
- C-004211: CD-毎日新聞2008データ集
- C-004212: CD-毎日新聞2009データ集
- C-004213: CD-毎日新聞2010データ集
- C-004214: CD-毎日新聞2011データ集
- C-004215: CD-毎日新聞2008データ集プラス
- C-004216: CD-毎日新聞2009データ集プラス
- C-004217: CD-毎日新聞2010データ集プラス
- C-004218: CD-毎日新聞2011データ集プラス
- C-004219: English as a Lingua Franca in Academic Settings
- C-004220: Helsinki Archive of Regional English Speech – Cambridgeshire sampler
- C-004221: Helsinki Archive of Regional English Speech
- C-004222: Cantonese Accent Chinese Speech Corpus
- C-004223: TH Corpus of Speech Synthesis No. 0
- C-004224: Face Emotional Expression
- C-004225: Chinese Event Bank - Part 1
- C-004226: 京都大学テキストコーパス Version 3.0
- C-004227: 京都大学テキストコーパス Version 4.0
- G-004228: NICT 格助詞変換データ Version 1.1
- C-004230: Wikipedia日英京都関連文書対訳コーパス Version 2.01
- D-004233: 日本語WordNet (1.1)
- T-004234: 日本語WordNet同義対データベース ver 1.0
- C-004235: 日本語Wikipediaエントリの係り受けデータベース
- C-004236: 日本語係り受けデータベース (Version 1.1)
- C-004237: 日英中基本文データ
- G-004238: 文脈類似語データベース (Ver. 1.1.2)
- G-004239: 単語共起頻度データベース (Version 1.1)
- G-004240: 基本的意味関係の事例ベース (Version 1.4)
- C-004241: 京都観光ブログの評価情報付与データ (Version 1.0)
- G-004242: 動詞含意関係データベース (Version 1.3.1)
- G-004243: 負担・トラブル表現リスト (Version 1.0)
- T-004244: 上位語階層データ (Version 1.0.1)
- C-004245: 日本語パターン言い換えデータベース (Version 1)
- G-004246: 日本語異表記対データベース (Version 1.1)
- C-004247: 日英翻訳エンジン学習・評価用対訳コーパス (Version 1.0)
- C-004248: A Chinese Dependency Parser(CNP)用中国語解析モデル Version 1
- C-004249: 意見(評価表現)抽出ツール用モデル Version 1.2
- C-004250: 日本語高齢者音声データベース
- C-004251: ノンネイティブ英語音声データベース
- C-004252: 中国語音声データベース
- C-004253: 京都観光案内対話データベース
- C-004254: 日本語小学生音声データベース
- C-004255: 日本語音声データベース
- C-004256: 日英・日中バイリンガル独話音声データベース
- C-004257: 明六雑誌コーパス
- G-004258: Webデータに基づく複合動詞用例データベース/日本語複合動詞リスト (ver.1.1)
- C-004259: Webデータに基づく複合動詞用例データベース
- C-004260: 外国人学習者の日本語誤用例集 データベース版
- C-004261: 外国人学習者の日本語誤用例集 PDF版
- C-004262: OJAD
- C-004263: ことばに関する新聞記事画像データベース
- C-004264: 雑誌『国語学』全文データベース
- C-004266: 米国議会図書館蔵『源氏物語』翻字本文
- C-004268: 楽天データ
- C-004269: IDENTIC
- C-004270: The Thor Corpus
- C-004271: The Jensson Corpus
- C-004272: The RÚV Corpus
- C-004273: English Web Treebank
- C-004274: Enron Email Dataset
- C-004275: The EnronSent Corpus
- C-004277: マルチモーダル音声認識評価環境
- C-004279: 音声研究用X線フィルムデータベース (X-Ray)
- C-004281: 特定領域研究「韻律と音声処理」日本語MULTEXT韻律コーパス
- C-004283: 中国語MULTEXTコーパス
- C-004285: 慶應義塾大学 研究用感情音声データベース
- C-004288: 東工大 多言語音声コーパス アイスランド語
- C-004289: 東工大 多言語音声コーパス インドネシア語
- C-004291: AWA長期間収録音声コーパス
- C-004293: 鶴岡調査音声データベース91-92
- C-004295: 身体情報付き男・女・子どもの母音音声データベース
- C-004297: 残響下日本語連続数字 音声認識評価環境
- C-004299: 千葉大地図課題対話コーパス (MapTask)
- C-004300: Yahoo! Semantically Annotated Snapshot of the English Wikipedia, version 1.0
- C-004301: Yahoo! Answers Manner Questions, version 2.0
- C-004302: Yahoo! Answers Comprehensive Questions and Answers version 1.0
- G-004303: Yahoo! Answers Question Types, version 1.0
- G-004304: Yahoo! Search Query Logs for Nine Languages, version 1.0
- G-004305: Yahoo! Search Popularity by Location for Websites on Politicians and Athletes
- C-004306: Yahoo! News extracted metadata: noun phrases and their context, version 1.0
- C-004307: Yahoo! Answers browsing behavior, version 1.0
- C-004308: The ClueWeb09 Dataset
- C-004311: NTT・東北大親密度別単語了解度試験用音声データセット
- G-004312: 難聴者のための単語了解度試験用単語リスト
- C-004313: 電子協騒音データベース
- T-004314: 動詞項構造シソーラス
- C-004315: OpenMWEコーパスv0.02
- C-004316: Textual Entailment 評価データ
- C-004317: Wenzhou Spoken Corpus Version 1.0
- C-004318: The KPG English Corpus
- C-004319: The EMIME Bilingual Finnish/English German/English Database Version 1.0
- C-004320: The EMIME Mandarin/English Bilingual Database Version 1.1
- C-004321: The Accents of the British Isles (ABI-1) Speech Corpus
- C-004322: The Second Accents of the British Isles Speech Corpus
- C-004323: The PF-STAR British English Children's Speech Corpus
- C-004324: 現代日本語書き言葉均衡コーパス
- D-004325: 近代文語UniDic ver.1.3
- D-004326: 中古和文UniDic ver.1.3
- C-004327: EPAC Corpus: orthographic transcriptions
- C-004328: ESTER 2 Corpus
- D-004329: The MWN.PT - MultiWordnet of Portuguese
- C-004330: The CINTIL Corpus International Corpus of Portuguese
- C-004331: SIGNUM Database
- C-004332: SmartKom Home
- C-004333: SmartKom Audio
- C-004334: SmartKom Mobil
- C-004335: European Parliament Interpretation Corpus (EPIC)
- C-004336: GlobalPhone Thai
- C-004337: GlobalPhone Polish
- C-004338: GlobalPhone Vietnamese
- C-004339: GlobalPhone Bulgarian
- C-004340: GlobalPhone Hausa
- C-004341: 朝日新聞記事データ(学術・研究用)2008年版
- C-004342: 朝日新聞記事データ(学術・研究用)2009年版
- C-004343: 朝日新聞記事データ(学術・研究用)2010年版
- C-004344: 朝日新聞記事データ(学術・研究用)2011年版
- D-004345: An English Dictionary of the Tamil Verb Second Edition
- C-004346: Audiovisual Database of Spoken American English
- C-004347: BioProp Version 1.0
- C-004348: Chinese Gigaword Fourth Edition
- C-004349: FactBank 1.0
- C-004350: GALE Phase 1 Arabic Newsgroup Parallel Text - Part 1
- C-004351: GALE Phase 1 Arabic Newsgroup Parallel Text - Part 2
- C-004352: GALE Phase 1 Chinese Newsgroup Parallel Text - Part 1
- C-004353: GALE Phase 1 Chinese Newsgroup Parallel Text - Part 2
- C-004354: Czech Broadcast Conversation Speech
- C-004355: Czech Broadcast Conversation MDE Transcripts
- C-004356: GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 1
- C-004357: GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 2
- C-004358: ACE 2005 Mandarin SpatialML Annotations
- C-004360: Chinese Treebank 7.0
- C-004361: Indian Language Part-of-Speech Tagset: Bengali
- C-004362: Indian Language Part-of-Speech Tagset: Hindi
- C-004363: Indian Language Part-of-Speech Tagset: Sanskrit
- C-004364: Korean Newswire Second Edition
- C-004365: MASC I
- C-004366: MASC II
- C-004367: MASC III
- C-004368: Full MASC
- C-004369: MINI-MASC
- C-004370: MASC-CONLL
- C-004371: MASC-PROPBANK-ORIG
- C-004372: Language Understanding Annotation Corpus
- C-004373: NomBank.1.0
- C-004374: XLEL-21
- C-004375: NPS Internet Chatroom Conversations, Release 1.0
- C-004376: Quranic Arabic Corpus - Version 0.4
- C-004377: RWTH-PHOENIX-Weather Database of German Sign Language
- C-004378: The CONCISUS Corpus of Event Summaries
- C-004379: NKI-CCRT Corpus
- C-004380: Arabic Treebank - Broadcast News v1.0
- C-004381: Arabic-Dialect/English Parallel Text
- C-004382: Chinese Dependency Treebank 1.0
- C-004383: Chinese-English Semiconductor Parallel Text
- C-004384: Malto Speech and Transcripts
- C-004385: Turkish Broadcast News Speech and Transcripts
- C-004386: USC-SFI MALACH Interviews and Transcripts English
- C-004387: Tagged and Cleaned Wikipedia
- C-004388: CALLHOME Mandarin Chinese Transcripts - XML version
- C-004389: Annotated English Gigaword
- C-004390: CD-毎日新聞2012データ集
- C-004391: CD-毎日新聞2012データ集プラス
- C-004393: 南琉球新城方言音声データベース
- C-004395: 宮古大神島方言音声データベース
- C-004397: 感情評定値付きオンラインゲーム音声チャットコーパス
- C-004398: ANITA (Audio eNhancement In Telecom Applications)
- C-004399: NetDC Arabic BNSC (Broadcast News Speech Corpus)
- N-004400: Tagged text in French (MEMODATA) with typographic tags
- C-004401: The "SIVA" Speech Database for Speaker Verification and Identification
- N-004402: THAMUS Bilingual dictionaries - Computer Science (1)
- N-004403: Dutch-French Lexicon (LanTmark)
- N-004404: French-Dutch Lexicon (LanTmark)
- N-004405: French-Dutch Lexicon (LanTmark)
- N-004406: THAMUS Bilingual dictionaries - Computer Science (3)
- N-004407: THAMUS Bilingual dictionaries - Law (1)
- N-004408: THAMUS Bilingual dictionaries - Law (3)
- N-004409: THAMUS Bilingual dictionaries - Computer Science (5)
- N-004410: THAMUS Bilingual dictionaries - Medicine (1)
- N-004411: THAMUS Bilingual dictionaries - Economics (3)
- N-004412: THAMUS Bilingual dictionaries - Engineering (1)
- N-004413: THAMUS Bilingual dictionaries - Engineering (3)
- N-004414: THAMUS Bilingual dictionaries - Computer Science (2)
- N-004415: THAMUS Bilingual dictionaries - Aeronautics (2)
- N-004416: THAMUS Bilingual dictionaries - Law (2)
- N-004417: THAMUS Bilingual dictionaries - Computer Science (6)
- N-004418: THAMUS Bilingual dictionaries - Computer Science (8)
- N-004419: THAMUS Bilingual dictionaries - Economics (2)
- N-004420: THAMUS Bilingual dictionaries - Economics (4)
- N-004421: THAMUS Bilingual dictionaries - Engineering (2)
- C-004422: SPINA Corpus ("Robots Commands")
- C-004423: Danish SpeechDat(II) FDB-1000
- N-004424: A "scientific" corpus of modern French ("La Recherche" magazine) - Raw data
- N-004425: A "scientific" corpus of modern French ("La Recherche" magazine) - Complete version
- N-004426: SCI-AN-ALL English-German Bilingual Dictionary
- N-004427: SCIPER-AN-EURADIC English Monolingual Dictionary
- N-004428: SCIPER-AL-EURADIC German Monolingual Dictionary
- N-004429: SCIPER-IT-EURADIC Italian Monolingual Dictionary
- C-004430: FASiL English unimodal fasil-uk corpus
- C-004431: FASiL Portuguese unimodal fasil-pt corpus
- C-004432: FASiL Swedish unimodal fasil-sv corpus
- C-004433: FASiL combined unimodal fasil-all corpus
- C-004434: FASiL multimodal fasil-mm corpus
- C-004435: OrienTel Egypt MCA (Modern Colloquial Arabic) database
- N-004436: The CLEF Test Suite for the CLEF 2000-2003 Campaigns Evaluation Package
- C-004437: IDIOLOGOS 2 Eigenspeakers (NEOLOGOS Project)
- C-004438: Mandarin Chinese Telephone Speech Recognition Corpus - Digit String
- C-004439: Mandarin Chinese Telephone Speech Recognition Corpus - Stock
- C-004440: CHIL 2006 Evaluation Package
- N-004441: ItalWordNet (Italian WordNet)
- N-004442: Bulgarian Linguistic Database
- N-004443: Catalan Corpus of News Articles
- C-004444: MIST Multi-lingual Interoperability in Speech Technology database
- C-004445: N4 (NATO Native and Non Native) database
- C-004446: SpeechDat Catalan FDB database
- C-004447: TC-STAR 2007 Evaluation Package - ASR English
- C-004448: TC-STAR 2007 Evaluation Package - ASR Spanish - CORTES
- C-004449: TC-STAR 2007 Evaluation Package - ASR Spanish - EPPS
- C-004450: TC-STAR 2007 Evaluation Package - ASR Mandarin Chinese
- C-004451: TC-STAR 2007 Evaluation Package - SLT English-to-Spanish
- C-004452: TC-STAR 2007 Evaluation Package - SLT Spanish-to-English - CORTES
- C-004453: TC-STAR 2007 Evaluation Package - SLT Spanish-to-English - EPPS
- C-004454: TC-STAR 2007 Evaluation Package - SLT Chinese-to-English
- C-004455: TC-STAR 2006 Evaluation Package End-to-End
- C-004456: TC-STAR 2007 Evaluation Package End-to-End
- N-004457: AURORA-5
- N-004458: Macedonian Morphological Lexicon (MACPLEX)
- C-004459: TC-STAR English Training Corpora for ASR: Transcriptions of EPPS Speech
- N-004460: TC-STAR English-Spanish Training Corpora for Machine Translation: Aligned Final Text Editions of EPPS
- C-004461: TC-STAR English Training Corpora for ASR: Recordings of EPPS Speech
- C-004462: TC-STAR Spanish Training Corpora for ASR: Recordings of EPPS Speech
- C-004463: TC-STAR English Test Corpora for ASR
- C-004464: TC-STAR Spanish Test Corpora for ASR
- C-004465: Hungarian SpeechDat(E) Database
- C-004466: UPC-TALP database of isolated meeting-room acoustic events
- N-004467: LC-STAR Slovenian Phonetic lexicon
- N-004468: LC-STAR English-Slovenian Bilingual Aligned Phrasal lexicon
- C-004469: Slovenian BNSI Broadcast News Speech Corpus
- N-004470: euLEX (Lexical Database for Basque)
- C-004471: Swedish EUROM1
- C-004472: SpeechDat Galician Database for the Fixed Telephone Network
- C-004473: SmartWeb Handheld Corpus (SHC)
- C-004474: SmartWeb Motorbike Corpus (SMC)
- C-004475: SmartWeb Video Corpus (SVC)
- C-004476: LILA Hindi-L1 database
- C-004477: BAS PHATT 1.0.X (sub-set)
- C-004478: BAS PHATT 1.1.X (complete corpus)
- C-004479: Laboratory Conditions Czech Audio-Visual Speech Corpus
- C-004480: Czech Audio-Visual Speech Corpus for Recognition with Impaired Conditions
- N-004481: Czech Sign Language Corpus for Recognition Amateur Signer
- N-004482: Czech Sign Language Corpus for Recognition Professional Signer
- C-004483: Cantonese Speecon database
- C-004484: Thai Speecon database
- C-004485: OrienTel Jordan MCA (Modern Colloquial Arabic) database
- C-004486: OrienTel Jordan MSA (Modern Standard Arabic) database
- C-004487: OrienTel English as spoken in Jordan database
- C-004488: Danish EUROM1
- N-004489: Czech WordNet
- C-004490: CHIEDE Corpus: a spontaneous child language corpus of Spanish
- C-004491: LILA Korean database
- C-004492: CHIL 2007 Evaluation Package
- C-004493: FBK-Irst database of isolated meeting-room acoustic events
- C-004494: Hungarian Speecon database
- C-004495: Czech Speecon database
- N-004496: "Le Monde Diplomatique" Arabic tagged corpus
- C-004497: Alcohol Language Corpus (BAS ALC)
- N-004498: Basque WordNet
- N-004499: Multilingual Dictionary of Sports English-French-Greek-Arabic-German-Spanish-Portuguese multilingual database
- N-004500: Multilingual Dictionary of Sports English-French bilingual database
- N-004501: Multilingual Dictionary of Sports English-French-Greek trilingual database
- N-004502: Multilingual Dictionary of Sports English-French-Arabic trilingual database
- N-004503: Multilingual Dictionary of Sports English-French-German trilingual database
- N-004504: Multilingual Dictionary of Sports English-French-Spanish trilingual database
- N-004505: Multilingual Dictionary of Sports English-French-Portuguese trilingual database
- N-004506: English-Persian parallel Corpus
- N-004507: EASy Evaluation Package
- N-004508: BioLexicon
- C-004509: Norwegian EUROM1
- C-004510: SpeechDat(M) Italian Mobile Network Speech Database
- C-004511: TC-STAR female baseline voice: Laura
- C-004512: TC-STAR male baseline voice: Ian
- N-004513: TC-STAR Transcriptions of Spanish Parliamentary Speech
- C-004514: BABEL Polish database
- N-004515: Terminology database of natural sciences
- N-004516: Catalan-Spanish Parallel Corpus
- C-004517: Egyptian Arabic Speecon database
- N-004518: Persian 1984 corpus (Multext-East framework)
- N-004519: Persian Multext-East framework lexicon
- N-004520: Persian Lexicon
- N-004521: DEFT'08 Evaluation Package
- N-004522: CLEF AdHoc-News Test Suites (2004-2008) Evaluation Package
- N-004523: CLEF Domain Specific Test Suites (2004-2008) Evaluation Package
- N-004524: CLEF Question Answering Test Suites (2003-2008) Evaluation Package
- C-004525: TC-STAR Spanish Baseline Female Speech Database
- C-004526: TC-STAR Spanish Baseline Male Speech Database
- C-004527: TC-STAR Bilingual Voice-Conversion Spanish Speech Database
- C-004528: TC-STAR Bilingual Voice-Conversion English Speech Database
- C-004529: TC-STAR Bilingual Expressive Speech Database
- C-004530: LILA Marathi database
- C-004531: A-SpeechDB
- C-004532: Catalan-SpeechDat For the Fixed Telephone Network Database
- C-004533: Catalan-SpeechDat for the Mobile Telephone Network Database
- N-004534: Arabic Morphological Dictionary
- C-004535: Acoustic database for Polish unit selection speech synthesis
- N-004536: MEDAR Evaluation Package
- D-004537: GlobalPhone French Pronunciation Dictionary
- C-004538: Catalan SpeechDat-Car database
- C-004539: Catalan Speecon database
- C-004540: Spanish EUROM.1
- C-004541: Emotional speech synthesis database
- C-004542: FESTCAT Catalan TTS baseline male speech database
- C-004543: FESTCAT Catalan TTS baseline female speech database
- C-004544: FESTCAT Catalan TTS baseline speech database - 8 speakers
- C-004545: Spanish Festival HTS models - male speech
- C-004546: Spanish Festival HTS models - female speech
- C-004547: Bilingual (Spanish-English) Speech synthesis HTS models
- C-004548: Spanish Festival voice male
- C-004549: Spanish Festival voice female
- N-004550: CLEF QAST (2007-2009) Evaluation Package
- D-004551: GlobalPhone German Pronunciation Dictionary
- G-004552: Acoustic database for Polish concatenative speech synthesis
- C-004553: VERIF1DE
- C-004554: LILA Hindi Belt database
- C-004555: Spoken Portuguese Corpus
- C-004556: Fundamental Portuguese Corpus
- N-004557: CINTIL-TreeBank
- N-004558: CINTIL-PropBank
- N-004559: PANACEA English-French and English-Greek parallel corpus acquired for Environment domain
- N-004560: PANACEA English-French and English-Greek parallel corpus acquired for Labour Legislation domain
- N-004561: LT Corpus
- N-004562: PTPARL Corpus
- N-004563: CINTIL-DependencyBank
- N-004564: CINTIL-DeepBank
- D-004565: GlobalPhone Japanese Pronunciation Dictionary
- N-004566: PANACEA Environment English monolingual corpus
- N-004567: PANACEA Labour English monolingual corpus
- N-004568: PANACEA Environment French monolingual corpus
- N-004569: PANACEA Labour French monolingual corpus
- N-004570: PANACEA Environment Greek monolingual corpus
- N-004571: PANACEA Labour Greek monolingual corpus
- N-004572: PANACEA Environment Italian monolingual corpus
- N-004573: PANACEA Labour Italian monolingual corpus
- N-004574: PANACEA Environment Spanish monolingual corpus
- N-004575: PANACEA Labour Spanish monolingual corpus
- C-004576: Quaero Broadcast News Extended Named Entity corpus
- N-004577: Quaero Old Press Extended Named Entity corpus
- C-004578: CHIL 2007+ Evaluation Package
- D-004579: GlobalPhone Arabic Pronunciation Dictionary
- D-004580: GlobalPhone Bulgarian Pronunciation Dictionary
- D-004581: GlobalPhone Czech Pronunciation Dictionary
- D-004582: GlobalPhone Hausa Pronunciation Dictionary
- D-004583: GlobalPhone Polish Pronunciation Dictionary
- D-004584: GlobalPhone Portuguese (Brazilian) Pronunciation Dictionary
- D-004585: GlobalPhone Swedish Pronunciation Dictionary
- D-004586: GlobalPhone Croatian Pronunciation Dictionary
- D-004587: GlobalPhone Russian Pronunciation Dictionary
- D-004588: GlobalPhone Spanish (Latin American) Pronunciation Dictionary
- D-004589: GlobalPhone Turkish Pronunciation Dictionary
- D-004590: GlobalPhone Vietnamese Pronunciation Dictionary
- D-004591: GlobalPhone Chinese-Mandarin Pronunciation Dictionary
- D-004592: GlobalPhone Korean Pronunciation Dictionary
- C-004593: aGender
- N-004594: Amharic-English bilingual corpus
- N-004595: Nepali Monolingual written corpus
- N-004596: English-Nepali Parallel Corpus
- C-004597: LECTRA (LECture TRAnscriptions in European Portuguese)
- C-004598: CORAL Corpus
- N-004599: CLEFeHealth 2013 Task 3 Evaluation Package
- N-004600: ACL RD-TEC: A Reference Dataset for Terminology Extraction and Classification Research in Computational Linguistics
- C-004601: Nepali Spoken Corpus
- C-004602: CLIPS_MT_MANUAL
- C-004604: PortMedia French and Italian corpus
- N-004605: NE3L named entities Arabic corpus
- N-004606: NE3L named entities Chinese corpus
- N-004607: NE3L named entities Russian corpus
- D-004608: GlobalPhone Thai Pronunciation Dictionary
- N-004609: Macedonian lexicon of toponyms (MACPLEX_TOPO)
- N-004610: Macedonian lexicon of proper nouns (MACPLEX_PROPERS)
- N-004611: Macedonian lexicon of derived adjectives (MACPLEX_ADJDERV)
- N-004612: Macedonian lexicon of participles (MACPLEX_ADJPARTIC)
- N-004613: Macedonian lexicon of compound words (MACPLEX_COMP)
- N-004614: Khresmoi manually annotated reference corpus
- N-004615: CLEFeHealth 2014 Task 3 Evaluation Package
- N-004616: 88milSMS. A corpus of authentic text messages in French
- C-004617: REPERE Evaluation Package
- N-004618: MAURDOR Evaluation Package
- N-004619: deL1L2IM corpus
- C-004620: TDT2 Careful Transcription Audio
- N-004621: Korean Newswire
- N-004622: Hong Kong News Parallel Text
- N-004623: Hong Kong Laws Parallel Text
- N-004624: Hong Kong Hansards Parallel Text
- N-004625: Prague Dependency Treebank 1.0
- N-004626: Chinese Proposition Bank 3.0
- N-004627: GALE Arabic-English Parallel Aligned Treebank -- Broadcast News Part 1
- C-004628: 2000 HUB5 English Evaluation Speech
- C-004629: 1997 HUB5 English Evaluation
- N-004630: Multiple-Translation Chinese Corpus
- N-004631: RST Discourse Treebank
- N-004632: Korean English Treebank Annotations
- N-004633: Korean Telephone Conversations Lexicon
- N-004634: Korean Telephone Conversations Transcripts
- N-004635: Multiple-Translation Chinese (MTC) Part 2
- N-004636: Multiple-Translation Arabic (MTA) Part 1
- N-004637: Klex: Finite-State Lexical Transducer for Korean
- N-004638: Prague Dependency Treebank 2.0
- C-004640: 2008 NIST Speaker Recognition Evaluation Training Set Part 1
- C-004641: 2005 Spring NIST Rich Transcription (RT-05S) Evaluation Set
- N-004642: Multiple-Translation Chinese (MTC) Part 3
- N-004643: Hong Kong Parallel Text
- N-004644: NIST Meeting Pilot Corpus Transcripts and Metadata
- N-004645: Proposition Bank I
- N-004646: Prague Arabic Dependency Treebank 1.0
- N-004647: Prague Czech-English Dependency Treebank 1.0
- N-004648: Multiple-Translation Arabic (MTA) Part 2
- C-004649: CSC Deceptive Speech
- N-004650: Chinese Treebank 8.0
- N-004651: The ARRAU Corpus of Anaphoric Information
- N-004652: GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 1
- C-004653: ATIS0 Pilot
- N-004654: North American News Text Corpus
- N-004655: Japanese Business News Text
- N-004656: CELEX2
- C-004657: RM Isolated and Spelled Word Data
- C-004658: 1996 Speaker Recognition Benchmark
- N-004659: NIST 2008-2012 Open Machine Translation (OpenMT) Progress Test Sets
- N-004660: CALLHOME Spanish Transcripts
- N-004661: North American News Text Supplement
- N-004662: JURIS
- N-004663: Japanese Business News Text Supplement
- N-004664: Portuguese Newswire Text
- C-004665: Levantine Arabic QT Training Data Set 4 (Speech + Transcripts)
- N-004666: Korean Propbank
- N-004667: Multiple-Translation Chinese (MTC) Part 4
- N-004668: Levantine Arabic QT Training Data Set 5, Transcripts
- N-004669: Korean Treebank Annotations Version 2.0
- D-004670: An English Dictionary of the Tamil Verb
- N-004671: Levantine Arabic Conversational Telephone Speech, Transcripts
- N-004672: Czech Academic Corpus 2.0
- N-004673: Korean Broadcast News Transcripts
- N-004674: English Gigaword Third Edition
- N-004675: MITRE 1997 Mandarin Broadcast News Speech Translations (HUB-4NE)
- N-004676: GALE Phase 1 Distillation Training
- N-004677: TRECVID 2003 Keyframes & Transcripts
- C-004678: 2004 Spring NIST Rich Transcription (RT-04S) Development Data
- N-004679: Arabic Gigaword Third Edition
- N-004680: GALE Phase 1 Chinese Blog Parallel Text
- N-004681: Arabic Treebank: Part 3 v 3.2
- N-004682: Hindi WordNet
- N-004683: Chinese Proposition Bank 2.0
- C-004684: West Point Brazilian Portuguese Speech
- N-004685: GALE Phase 1 Chinese Broadcast News Parallel Text - Part 2
- N-004686: GALE Phase 1 Arabic Broadcast News Parallel Text - Part 2
- C-004687: 2005 NIST Language Recognition Evaluation
- C-004688: CSLU: Alphadigit Version 1.3
- N-004689: Global Yoruba Lexical Database v. 1.0
- N-004690: PennBioIE CYP 1.0
- N-004691: GALE Phase 1 Chinese Broadcast News Parallel Text - Part 3
- C-004692: CSLU: ISOLET Spoken Letter Database Version 1.3
- N-004693: The New York Times Annotated Corpus
- N-004694: PennBioIE Oncology 1.0
- N-004695: NomBank v 1.0
- N-004696: COMNOM v 1.0
- N-004697: AQUAINT-2 Information-Retrieval Text Research Collection
- C-004698: LDC Spoken Language Sampler
- C-004699: CSLU: Numbers Version 1.3
- C-004700: CHAracterizing INdividual Speakers (CHAINS)
- C-004701: English CTS Treebank with Structural Metadata
- N-004702: 2008 NIST Metrics for Machine Translation (MetricsMATR08) Development Data
- N-004703: Japanese Web N-gram Version 1
- N-004704: Unified Linguistic Annotation Text Collection
- N-004705: REFLEX Entity Translation Training/DevTest
- N-004706: 2008 CoNLL Shared Task Data
- N-004707: English Gigaword Fourth Edition
- N-004708: Tagged Chinese Gigaword Version 2.0
- N-004709: Spanish Gigaword Second Edition
- N-004710: Arabic Newswire English Translation Collection
- C-004711: CSLU: S4X Release 1.2
- N-004712: OntoNotes Release 3.0
- N-004713: Web 1T 5-gram, 10 European Languages Version 1
- N-004714: NXT Switchboard Annotations
- N-004715: French Gigaword Second Edition
- C-004716: 2007 NIST Language Recognition Evaluation Test Set
- C-004717: 2007 NIST Language Recognition Evaluation Supplemental Training Set
- N-004718: ACL Anthology Reference Corpus
- N-004719: Arabic Gigaword Fourth Edition
- N-004720: NIST Open MT 2008 Evaluation (MT08) Selected References and System Translations
- N-004721: Czech Broadcast News MDE Transcripts
- N-004722: Fisher Spanish - Transcripts
- C-004723: Fisher Spanish Speech
- N-004724: Chinese Web 5-gram Version 1
- C-004725: WTIMIT 1.0
- N-004726: 2000 HUB5 English Evaluation Transcripts
- C-004727: 2003 NIST Speaker Recognition Evaluation
- N-004728: NIST 2002 Open Machine Translation (OpenMT) Evaluation
- N-004729: NIST 2003 Open Machine Translation (OpenMT) Evaluation
- N-004730: NIST 2004 Open Machine Translation (OpenMT) Evaluation
- N-004731: TRECVID 2004 Keyframes & Transcripts
- N-004732: LDC Standard Arabic Morphological Analyzer (SAMA) Version 3.1
- N-004733: Arabic Treebank: Part 1 v 4.1
- N-004734: TRECVID 2006 Keyframes
- C-004735: Asian Elephant Vocalizations
- N-004736: NIST 2005 Open Machine Translation (OpenMT) Evaluation
- N-004737: Asian Spoken Language Sampler
- N-004738: Message Understanding Conference 7 Timed (MUC7_T)
- N-004739: NIST 2006 Open Machine Translation (OpenMT) Evaluation
- N-004740: ACE Time Normalization (TERN) 2004 English Evaluation Data V1.0
- C-004741: TIMIT Acoustic-Phonetic Continuous Speech (MS-WAV version)
- N-004742: NIST 2008 Open Machine Translation (OpenMT) Evaluation
- N-004743: NIST 2009 Open Machine Translation (OpenMT) Evaluation
- N-004744: Manually Annotated Sub-Corpus First Release
- N-004745: SemEval-2010 Task 1 OntoNotes English: Coreference Resolution in Multiple Languages
- N-004746: ACE 2005 English SpatialML Annotations Version 2
- N-004747: OntoNotes Release 4.0
- N-004748: 2008/2010 NIST Metrics for Machine Translation (MetricsMaTr) GALE Evaluation Set
- N-004749: NIST/USF Evaluation Resources for the VACE Program - Meeting Data Training Set Part 1
- N-004750: NIST/USF Evaluation Resources for the VACE Program - Meeting Data Training Set Part 2
- N-004751: Broadcast News Lattices
- N-004752: NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 1
- C-004753: 2005 NIST Speaker Recognition Evaluation Training Data
- N-004754: English Gigaword Fifth Edition
- N-004755: Datasets for Generic Relation Extraction (reACE)
- C-004756: 2006 NIST Spoken Term Detection Development Set
- C-004757: 2006 NIST Spoken Term Detection Evaluation Set
- N-004758: NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 2
- C-004759: 2005 NIST Speaker Recognition Evaluation Test Data
- N-004760: Arabic Treebank: Part 2 v 3.1
- C-004761: 2008 NIST Speaker Recognition Evaluation Training Set Part 2
- N-004762: 2006 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 2
- N-004763: French Gigaword Third Edition
- N-004764: 2006 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 1
- N-004765: ModeS TimeBank 1.0
- C-004766: 2008 NIST Speaker Recognition Evaluation Test Set
- N-004767: Arabic Gigaword Fifth Edition
- N-004768: Spanish Gigaword Third Edition
- C-004769: 2006 NIST Speaker Recognition Evaluation Training Set
- N-004770: Chinese Gigaword Fifth Edition
- C-004771: 2006 NIST Speaker Recognition Evaluation Test Set Part 1
- C-004772: 2008 NIST Speaker Recognition Evaluation Supplemental Set
- C-004773: TORGO Database of Dysarthric Articulation
- C-004774: Digital Archive of Southern Speech
- N-004775: English Translation Treebank: An-Nahar Newswire
- N-004776: 2005 NIST/USF Evaluation Resources for the VACE Program - Broadcast News
- N-004777: 2009 CoNLL Shared Task Part 1
- N-004778: 2009 CoNLL Shared Task Part 2
- C-004779: GALE Phase 2 Arabic Broadcast Conversation Parallel Text Part 1
- N-004780: Catalan TimeBank 1.0
- N-004781: American English Nickname Collection
- N-004782: Spanish TimeBank 1.0
- N-004783: GALE Phase 2 Arabic Broadcast Conversation Parallel Text Part 2
- N-004784: GALE Chinese-English Word Alignment and Tagging Training Part 1 -- Newswire and Web
- C-004785: Multi-Channel WSJ Audio
- N-004786: MADCAT Phase 1 Training Set
- N-004787: Maninkakan Lexicon
- N-004788: GALE Phase 2 Arabic Newswire Parallel Text
- N-004789: GALE Phase 2 Arabic Broadcast News Parallel Text
- N-004790: GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 1
- N-004791: GALE Phase 2 Arabic Web Parallel Text
- N-004792: GALE Chinese-English Word Alignment and Tagging Training Part 2 -- Newswire
- N-004793: GALE Chinese-English Word Alignment and Tagging Training Part 4 -- Web
- N-004794: NIST 2012 Open Machine Translation (OpenMT) Evaluation
- C-004795: GALE Phase 2 Arabic Broadcast Conversation Speech Part 1
- N-004796: GALE Chinese-English Word Alignment and Tagging Training Part 3 -- Web
- N-004797: Russian-English Computer Security Parallel Text
- N-004798: Chinese-English Biology and Chemistry Abstract Parallel Text
- N-004799: 1993-2007 United Nations Parallel Text
- C-004800: Mixer 6 Speech
- N-004801: GALE Phase 2 Chinese Broadcast Conversation Transcripts
- C-004802: GALE Phase 2 Chinese Broadcast Conversation Speech
- N-004803: MADCAT Phase 2 Training Set
- N-004804: GALE Arabic-English Parallel Aligned Treebank -- Newswire
- C-004805: Greybeard
- N-004806: GALE Phase 2 Chinese Broadcast Conversation Parallel Text Part 1
- N-004807: Manually Annotated Sub-Corpus Third Release
- C-004808: LDC Spoken Language Sampler - Second Release
- N-004809: GALE Phase 2 Chinese Broadcast Conversation Parallel Text Part 2
- N-004810: MADCAT Phase 3 Training Set
- C-004811: GALE Phase 2 Arabic Broadcast Conversation Speech Part 2
- N-004812: GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 2
- N-004813: Semantic Textual Similarity (STS) 2013 Machine Translation
- N-004814: GALE Phase 2 Chinese Broadcast News Transcripts
- C-004815: GALE Phase 2 Chinese Broadcast News Speech
- N-004816: OntoNotes Release 5.0
- N-004817: CALLFRIEND Farsi Second Edition Transcripts
- C-004818: CALLFRIEND Farsi Second Edition Speech
- N-004819: GALE Arabic-English Parallel Aligned Treebank -- Broadcast News Part 2
- N-004820: NIST 2012 Open Machine Translation (OpenMT) Progress Test Five Language Source
- C-004821: King Saud University Arabic Speech Database
- N-004822: GALE Phase 2 Chinese Broadcast News Parallel Text Part 1
- N-004823: GALE Arabic-English Word Alignment Training Part 1 -- Newswire and Web
- N-004824: ETS Corpus of Non-Native Written English
- N-004825: Domain-Specific Hyponym Relations
- C-004826: USC-SFI MALACH Interviews and Transcripts Czech
- N-004827: GALE Arabic-English Parallel Aligned Treebank -- Web Training
- N-004828: HyTER Networks of Selected OpenMT08/09 Sentences
- N-004829: GALE Arabic-English Word Alignment Training Part 2 -- Newswire
- C-004830: Hispanic-English Database
- N-004831: GALE Phase 2 Chinese Broadcast News Parallel Text Part 2
- N-004832: Abstract Meaning Representation (AMR) Annotation Release 1.0
- N-004833: MADCAT Chinese Pilot Training Set
- C-004834: 2009 NIST Language Recognition Evaluation Test Set
- N-004835: GALE Arabic-English Word Alignment Training Part 3 -- Web
- N-004836: GALE Phase 2 Chinese Newswire Parallel Text Part 1
- N-004837: TAC KBP Reference Knowledge Base
- N-004838: GALE Phase 2 Arabic Broadcast News Transcripts Part 1
- C-004839: GALE Phase 2 Arabic Broadcast News Speech Part 1
- N-004840: ACE 2007 Multilingual Training Corpus
- N-004841: GALE Phase 2 Chinese Newswire Parallel Text Part 2
- N-004842: GALE Arabic-English Word Alignment -- Broadcast Training Part 1
- N-004843: Chinese Discourse Treebank 0.5
- C-004844: United Nations Proceedings Speech
- N-004845: GALE Arabic-English Word Alignment -- Broadcast Training Part 2
- N-004846: Fisher and CALLHOME Spanish--English Speech Translation
- N-004847: Boulder Lies and Truth
- N-004848: GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 2
- N-004849: GALE Phase 2 Chinese Web Parallel Text
- N-004850: Benchmarks for Open Relation Extraction
- C-004851: GALE Phase 3 Chinese Broadcast Conversation Speech Part 1
- N-004852: GALE Phase 3 Chinese Broadcast Conversation Transcripts Part 1
- N-004853: SenSem Databank
- N-004854: GALE Phase 2 Arabic Broadcast News Transcripts Part 2
- C-004855: GALE Phase 2 Arabic Broadcast News Speech Part 2
- N-004856: Avocado Research Email Collection
- N-004857: GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 3
- C-004858: RATS Speech Activity Detection
- C-004859: Mandarin-English Code-Switching in South-East Asia
- N-004860: GALE Phase 3 and 4 Arabic Broadcast Conversation Parallel Text
- N-004861: GALE Chinese-English Parallel Aligned Treebank -- Training
- C-004862: Mandarin Chinese Phonetic Segmentation and Tone
- C-004863: The Subglottal Resonances Database
- N-004864: GALE Phase 3 and 4 Arabic Broadcast News Parallel Text
- N-004865: SenSem Lexicons
- N-004866: Coordination Annotation for the Penn Treebank
- N-004867: GALE Phase 3 Chinese Broadcast Conversation Transcripts Part 2
- C-004868: GALE Phase 3 Chinese Broadcast Conversation Speech Part 2
- C-004869: CIEMPIESS
- N-004870: RST Signalling Corpus
- N-004871: 2006 CoNLL Shared Task - Ten Languages
- N-004872: 2006 CoNLL Shared Task - Arabic & Czech
- N-004873: English News Text Treebank: Penn Treebank Revised
- N-004874: GALE Phase 4 Chinese Broadcast Conversation Parallel Sentences
- N-004875: The Walking Around Corpus
- N-004876: TS Wikipedia
- N-004877: GALE Phase 4 Chinese Broadcast News Parallel Sentences
- C-004878: LDC Spoken Language Sampler - Third Release
- C-004879: Arabic Learner Corpus
- C-004880: GALE Phase 3 Arabic Broadcast Conversation Speech Part 1
- N-004881: GALE Phase 3 Arabic Broadcast Conversation Transcripts Part 1
- N-004882: ACE 2007 Spanish DevTest - Pilot Evaluation
- N-004883: NewSoMe Corpus of Opinion in News Reports
- N-004884: GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 4
- N-004885: GALE Phase 3 and 4 Arabic Newswire Parallel Text
- N-004886: Karlsruhe Children's Text
- C-004887: Articulation Index LSCP
- N-004888: KHATT: Handwritten Arabic Text
- N-004889: GALE Phase 4 Chinese Newswire Parallel Sentences
- N-004890: GALE Phase 3 Chinese Broadcast News Transcripts
- C-004891: GALE Phase 3 Chinese Broadcast News Speech
- N-004892: NewSoMe Corpus of Opinion in Blogs
- N-004893: Arabic Treebank - Weblog
- N-004894: GALE Phase 4 Chinese Weblog Parallel Sentences
- N-004895: BOLT Chinese Discussion Forums
- N-004896: GALE Phase 3 Arabic Broadcast Conversation Transcripts Part 2
- C-004897: GALE Phase 3 Arabic Broadcast Conversation Speech Part 2
- N-004898: DEFT Narrative Text
- N-004899: GALE Phase 3 and 4 Arabic Web Parallel Text
- N-004900: GALE Phase 3 and 4 Chinese Broadcast Conversation Parallel Text
- C-004901: Chiba University Three-Party Conversation Corpus
- C-004903: GALE Phase 4 Chinese Broadcast Conversation Speech
- C-004904: GALE Phase 4 Chinese Broadcast Conversation Transcripts
- C-004905: FoxPersonTracks: a Benchmark for Person Re-Identification from TV Broadcast Shows
- C-004906: TRAD Pashto Broadcast News Speech Corpus
- C-004907: MoveOn Speech and Noise Corpus
- C-004908: GVLEX tales corpus
- C-004909: GlobalPhone Swahili
- C-004910: GlobalPhone Ukrainian
- C-004911: Large Farsdat
- N-004912: H1 Children's Writing
- C-004913: IARPA Babel Cantonese Language Pack IARPA-babel101b-v0.4c
- N-004914: SDP 2014 & 2015: Broad Coverage Semantic Dependency Parsing
- N-004915: GALE Phase 4 Arabic Broadcast Conversation Parallel Sentences
- N-004916: HAVIC Pilot Transcription
- N-004917: Chinese Treebank 9.0
- C-004918: CHM150
- N-004919: GALE Phase 4 Arabic Weblog Parallel Sentences
- N-004920: GALE Phase 3 and 4 Chinese Broadcast News Parallel Text
- N-004921: English Speed Networking Conversational Transcripts
- C-004922: Digital Archive of Southern Speech - NLP Version
- C-004923: IARPA Babel Assamese Language Pack IARPA-babel102b-v0.5a
- C-004924: IARPA Babel Bengali Language Pack IARPA-babel103b-v0.4b
- C-004925: GALE Phase 3 Arabic Broadcast News Transcripts Part 1
- C-004926: GALE Phase 3 Arabic Broadcast News Speech Part 1
- N-004927: ARL Arabic Dependency Treebank
- N-004928: BOLT Chinese-English Word Alignment and Tagging -- Discussion Forum Training
- N-004929: GALE Phase 4 Arabic Broadcast News Parallel Sentences
- C-004930: IARPA Babel Pashto Language Pack IARPA-babel104b-v0.4bY
- N-004931: Richer Event Description
- C-004932: IARPA Babel Turkish Language Pack IARPA-babel105b-v0.5
- N-004933: KAFD: Arabic Font Database
- C-004934: IARPA Babel Georgian Language Pack IARPA-babel404b-v1.0a
- C-004935: Multi-Language Conversational Telephone Speech 2011 -- Slavic Group
- N-004936: GALE Phase 3 and 4 Chinese Newswire Parallel Text
- N-004937: GALE Phase 4 Arabic Newswire Parallel Sentences
- C-004938: IARPA Babel Tagalog Language Pack IARPA-babel106-v0.2g
- N-004939: TAC KBP Spanish Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2012-2014
- N-004940: Bamanankan Lexicon
- N-004941: MWE-Aware English Dependency Corpus
- N-004942: Arabic Speech Recognition Pronunciation Dictionary
- C-004943: IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7
- N-004944: GALE Phase 3 and 4 Chinese Web Parallel Text
- N-004945: First-Year Law Students' Court Memoranda
- N-004946: Chinese-English Parallel Sentences Extracted from Patents
- N-004947: JANA: A Human-Human Dialogues Corpus for Egyptian Dialect
- C-004948: GALE Phase 3 Arabic Broadcast News Speech Part 2
- C-004949: GALE Phase 3 Arabic Broadcast News Transcripts Part 2
- C-004950: IARPA Babel Haitian Creole Language Pack IARPA-babel201b-v0.2b
- C-004957: Collins Multilingual database (MLD) – WordBank with audio files
- C-004958: Collins Multilingual database (MLD) – PhraseBank with audio files
- C-004959: Arabic Speech Corpus
- C-004960: Serbian emotional speech database
- C-004961: SecuVoice
- C-004962: SALA II US English database (2000 speakers)
- C-004963: Buckeye Corpus
- C-004964: Annotated Speech Corpora for 3 East Indian Languages
- C-004965: RML Emotion Database
- C-004966: Surrey Audio-Visual Expressed Emotion (SAVEE) Database
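- C-004967: Taiwan Mandarin Multilingual Spoken Language Corpus
- C-004968: Spanish Multilingual Spoken Language Corpus, FY2006 Edition
- D-004970: Topical Ainu Conversation Dictionary
- C-004971: Ainu Oral Literature Corpus (with Audio and Glosses)
- C-004972: Database of Endangered Languages and Dialects of Japan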
- D-004974: GlobalPhone Bulgarian Pronunciation Dictionary 260k entries (extended version)
- C-004975: Accented English GlobalPhone
- C-004976: The FAME! Speech Corpus
- C-004977: IARPA Babel Swahili Language Pack IARPA-babel202b-v1.0d
- C-004978: Noisy TIMIT Speech
- C-004979: Danish Propbank
- C-004980: TRAD Chinese-French Email Parallel corpus – Test Set
- C-004981: TRAD Chinese-English Email Parallel corpus – Test Set
- C-004982: TRAD Chinese-French Email Parallel corpus – Development Set
- C-004983: TRAD Chinese-English Email Parallel corpus – Development Set
- C-004984: TRAD Chinese-English News Articles Parallel corpus
- C-004985: TRAD Chinese-French News Articles Parallel corpus
- C-004986: TRAD Chinese-English Web domain (blogs) Parallel corpus
- C-004987: TRAD Chinese-French Web domain (blogs) Parallel corpus
- C-004988: TRAD Arabic-English Mailing lists Parallel corpus - Development set
- C-004989: TRAD Arabic-French Mailing lists Parallel corpus - Development set
- C-004990: TRAD Arabic-English Mailing lists Parallel corpus - Test set
- C-004991: TRAD Arabic-French Mailing lists Parallel corpus - Test set
- C-004992: TRAD Arabic-English Web domain (blogs) Parallel corpus
- C-004993: TRAD Arabic-French Web domain (blogs) Parallel corpus
- C-004994: TRAD Arabic-English Parallel corpus of transcribed Broadcast News Speech
- C-004995: TRAD Arabic-French Parallel corpus of transcribed Broadcast News Speech
- C-004996: TRAD Arabic-English Newspaper Parallel corpus - Test set 1
- C-004997: TRAD Arabic-French Newspaper Parallel corpus - Test set 2
- C-004998: TRAD Arabic-French Newspaper Parallel corpus - Test set 1
- C-004999: TRAD Pashto-English News Articles Parallel corpus
- C-005000: TRAD Pashto-French News Articles Parallel corpus
- C-005001: TRAD Pashto-English Parallel corpus of transcribed Broadcast News Speech - Test data
- C-005002: TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Test data
- C-005003: TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Training data
- C-005004: TRAD Pashto Monolingual text Corpus
- C-005005: Linguatools Webcrawl Parallel Corpus German-English 2015
- C-005006: EUROPARL Corpus Parallel Corpora: Portuguese-English
- C-005007: 2010 NIST Speaker Recognition Evaluation Test Set
- C-005008: CHiME2 Grid
- C-005009: NPChunks
- C-005010: ROMBAC - Romanian balanced corpus
- C-005012: ROCO Romanian journalistic corpus
- C-005013: Arboretum treebank
- D-005014: Collins Multilingual database (MLD) - PhraseBank
- D-005015: Collins Multilingual database (MLD) - WordBank
- C-005016: BOLT Chinese Discussion Forum Parallel Training Data
- C-005017: BOLT Egyptian Arabic SMS/Chat and Transliteration
- C-005018: GALE English-Chinese Parallel Aligned Treebank -- Training
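- C-005019: Asahi Shimbun Article Data (for Academic and Research Use), 2012 Edition
- C-005020: Asahi Shimbun Article Data (for Academic and Research Use), 2013 Edition
- C-005021: Asahi Shimbun Article Data (for Academic and Research Use), 2014 Edition
- C-005022: Asahi Shimbun Article Data (for Academic and Research Use), 2015 Edition
- C-005023: Asahi Shimbun Article Data (for Academic and Research Use), 2016 Edition
- C-005024: Corpus of Contemporary Japanese with Syntactic and Semantic Analysis Information
- C-005025: REX Corpus
- C-005026: Simulated Medical Record Text Data
- C-005027: CD-Mainichi Shimbun 2013 Data Collection
- C-005028: CD-Mainichi Shimbun 2014 Data Collection
- C-005029: CD-Mainichi Shimbun 2015 Data Collection
- C-005030: CD-Mainichi Shimbun 2016 Data Collection
- C-005031: CD-Mainichi Shimbun 2013 Data Collection Plus
- C-005032: CD-Mainichi Shimbun 2014 Data Collection Plus
- C-005033: CD-Mainichi Shimbun 2015 Data Collection Plus
- C-005034: CD-Mainichi Shimbun 2016 Data Collection Plus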
- C-005035: IARPA Babel Lao Language Pack IARPA-babel203b-v3.1a
- C-005037: Phrase Detectives Corpus
- C-005038: The EventStatus Corpus
- C-005039: ATR Regional English Speech Database
- C-005040: ATR Regional Chinese Speech Database
- C-005041: ATR Japanese-Speaker English Speech Database
- C-005042: Parallel EMG-Acoustic English GlobalPhone
- C-005043: NICT Voice Actor Dialogue Corpus
- D-005044: Pashto phonetic lexicon
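- C-005045: Yomiuri Shimbun Article Data (Japanese), 2008 Edition
- C-005046: Yomiuri Shimbun Article Data (Japanese), 2009 Edition
- C-005047: Yomiuri Shimbun Article Data (Japanese), 2010 Edition
- C-005048: Yomiuri Shimbun Article Data (Japanese), 2011 Edition
- C-005050: Yomiuri Shimbun Article Data (Japanese), 2013 Edition
- C-005051: Yomiuri Shimbun Article Data (Japanese), 2014 Edition
- C-005052: Yomiuri Shimbun Article Data (Japanese), 2015 Edition
- C-005053: Yomiuri Shimbun Article Data (Japanese), 2016 Edition
- D-005054: Dictionary of Internet and Youth Slang
- T-005055: Company Name Dictionary
- D-005056: NDK English-Chinese-Japanese Company Name Dictionary
- D-005057: Russian-Japanese Dictionary Database
- D-005058: Japanese-Russian Dictionary Database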
- C-005061: CMU_ARCTIC speech synthesis databases
- C-005062: The Vera am Mittag German Audio-Visual Spontaneous Speech Database
- C-005063: AV16.3
- C-005064: Disco-Annotation
- C-005065: Mediaparl
- C-005066: MOBIO
- C-005067: Tense-Annotation
- C-005068: Abstract Meaning Representation (AMR) Annotation Release 2.0
- C-005069: CHiME2 WSJ0
- C-005070: LibriSpeech ASR corpus
- D-005071: Sprakbanken
- D-005072: BEEP Dictionary
- C-005073: The AMI Corpus
- D-005074: Sprakbanken_Swe
- G-005075: Spanish Word list
- C-005076: Quoted Speech Attribution Corpus
- C-005077: The Kiel Corpus of Read Speech Vol. I
- C-005078: The Kiel Corpus of Spontaneous Speech Vol. I
- C-005079: The Kiel Corpus of Spontaneous Speech Vol. II
- C-005080: The Kiel Corpus of Spontaneous Speech Vol. III
- C-005081: Kindai University Children's Word Speech Database
- C-005082: Persian Speech Corpus
- C-005083: NUM 5M Mongolian written corpus
- C-005084: Metalogue Multi-Issue Bargaining Dialogue
- C-005085: UCLA High-Speed Laryngeal Video and Audio