Note that all token counts are based on the Chinese data only. One token is equivalent to one character, and one word is equivalent to 1.5 characters.
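As an illustration only, the stated conversion rates can be applied to estimate a word count from a token count; this helper is a sketch, not part of the corpus tooling:

```python
# Conversion rates as stated above:
# 1 token = 1 Chinese character; 1 word = 1.5 characters.
CHARS_PER_TOKEN = 1.0
CHARS_PER_WORD = 1.5

def tokens_to_words(token_count):
    """Estimate the Chinese word count implied by a token (character) count."""
    return token_count * CHARS_PER_TOKEN / CHARS_PER_WORD

print(tokens_to_words(300000))  # 300,000 tokens ≈ 200,000 words
```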
The Chinese word alignment tasks consisted of the following components:
* Identifying, aligning, and tagging 8 different types of links
* Identifying, attaching, and tagging local-level unmatched words
* Identifying and tagging sentence/discourse-level unmatched words
* Identifying and tagging all instances of Chinese 的 (DE), except when they were part of a semantic link
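As an illustration of the annotation categories above, the records might be modeled as follows. The field names, link-type labels, and structure are assumptions for clarity, not the corpus's actual delivery format:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class AlignmentLink:
    """A tagged link between Chinese and English tokens (hypothetical schema)."""
    source_tokens: List[int]  # indices into the Chinese token sequence
    target_tokens: List[int]  # indices into the English token sequence
    link_type: str            # one of the 8 link-type tags

@dataclass
class UnmatchedWord:
    """A word with no counterpart in the other language (hypothetical schema)."""
    token_index: int
    level: str                     # "local" or "sentence/discourse"
    attached_to: Optional[int] = None  # local-level words attach to a link

# Example: a one-to-many link tagged with an illustrative type label.
link = AlignmentLink(source_tokens=[3], target_tokens=[5, 6], link_type="semantic")
print(link.link_type)
```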
Please view the following samples:
* Chinese Raw
* Chinese Token
* English Raw
* English Token
* Word Alignment
This work was supported in part by the Defense Advanced Research Projects Agency, GALE Program Grant No. HR0011-06-1-0003. The content of this publication does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.
None at this time.