Note that word count is based on the untokenized Arabic source, and token count is based on the tokenized Arabic source.
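As a minimal illustration of this distinction (an informal sketch, not the corpus's official counting tool), word count comes from whitespace-delimited units in the raw source, while token count comes from the tokenized source, where punctuation and clitics have been split into separate units. The Arabic sentence below is a hypothetical example:

```python
# Sketch: word count vs. token count. Tokenization splits off the final
# period, so the tokenized line has one more unit than the raw line.
raw = "ذهب الولد إلى المدرسة."        # untokenized (raw) Arabic source
tokenized = "ذهب الولد إلى المدرسة ."  # tokenized Arabic source

def count_units(text: str) -> int:
    """Count whitespace-delimited units."""
    return len(text.split())

print(count_units(raw))        # word count: 4
print(count_units(tokenized))  # token count: 5
```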
The Arabic word alignment tasks consisted of the following components:
* Normalizing tokens as needed
* Identifying different types of links
* Identifying sentence segments not suitable for annotation
* Tagging unmatched words attached to other words or phrases
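To make the link-identification step above concrete, the sketch below parses alignment links written in the common `i-j` pair notation, where each pair joins a source-token index to a translation-token index; this notation is an assumption for illustration, and the corpus's actual word alignment file format may differ.

```python
# Hypothetical parser for alignment links written as 'i-j' pairs, e.g.
# "1-2 2-1 3-3". Each pair links a source token index to a translation
# token index; tokens with no counterpart simply appear in no pair.
from typing import List, Tuple

def parse_links(line: str) -> List[Tuple[int, int]]:
    """Split a line of 'i-j' pairs into (source, target) index tuples."""
    links = []
    for pair in line.split():
        src, tgt = pair.split("-")
        links.append((int(src), int(tgt)))
    return links

print(parse_links("1-2 2-1 3-3"))  # [(1, 2), (2, 1), (3, 3)]
```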
Please view the following samples:
* English Raw
* English Token
* Arabic Raw
* Arabic Token
* Word Alignment
This work was supported in part by the Defense Advanced Research Projects Agency, GALE Program Grant No. HR0011-06-1-0003. The content of this publication does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.
None at this time.