Purpose
To provide a generic representation of numbers and a corresponding method of translating them between various scripts and number systems.
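As a rough illustration of the idea (not NumberTrans's actual code or API), the sketch below uses a plain Python integer as the script-independent representation and converts between the decimal digit sets of a few scripts. The names `ZERO`, `parse_digits`, `render_digits`, and `translate_digits` are hypothetical. It handles positional decimal digits only, so spelled-out numerals and non-positional systems are out of scope here.

```python
import unicodedata

# Code point of the digit zero in each script. Unicode assigns the decimal
# digits 0-9 of a script to ten consecutive code points, so digit d is
# found at zero + d. The script names used as keys are illustrative.
ZERO = {
    "western": 0x0030,
    "arabic": 0x0660,
    "bengali": 0x09E6,
    "thai": 0x0E50,
    "tibetan": 0x0F20,
}


def parse_digits(text):
    """Read a string of decimal digits in any script into a plain int."""
    if not text:
        raise ValueError("empty number")
    value = 0
    for ch in text:
        # unicodedata.decimal raises ValueError for non-digit characters.
        value = value * 10 + unicodedata.decimal(ch)
    return value


def render_digits(value, script):
    """Write a non-negative int using the digits of the given script."""
    if value < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    zero = ZERO[script]
    return "".join(chr(zero + int(d)) for d in str(value))


def translate_digits(text, script):
    """Translate a digit string into another script via the int form."""
    return render_digits(parse_digits(text), script)


if __name__ == "__main__":
    print(translate_digits("2024", "thai"))
    print(translate_digits("\u0661\u0662\u0663", "western"))
```

Going through a neutral intermediate value means each script needs only one reader and one writer, rather than a separate converter for every pair of scripts.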
Why?
First, this project might be useful for other programs that need internationalization (I18N) support.
But mainly, NumberTrans was inspired by current research in statistical machine translation (a method of translating human languages by computer). Some word categories, such as prepositions and conjunctions, are considered "closed." That is, it is very unlikely that new words in these categories will be added to a language. Others, such as verbs and nouns, are considered "open." New words in these categories are coined frequently. For example, the verb "google" was added to the dictionary recently.
Numbers, however, are an oddity. Though it is unlikely that any "new" numbers will be added to a given language, nearly any combination of digits or numerals forms an intelligible number. You might say that the category of numbers is semantically closed (we know the meaning of any given number) while being lexically infinite (there is no limit to how many different meaningful numbers can be formed).
This means it is impossible for a statistical machine translation system to encounter every possible number during training, so it will inevitably meet numbers at translation time that it has never seen. NumberTrans is meant to help. Research is under way on how best to connect machine translation systems with external translators such as NumberTrans.
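The "semantically closed, lexically infinite" point can be made concrete with a small sketch. The function below spells out English numbers from a fixed vocabulary of a few dozen words, yet every distinct integer yields a distinct phrase, so no finite training corpus can contain them all. This is only an illustration under simple assumptions (American-style wording without "and"); the name `spell` and the word tables are hypothetical and do not come from NumberTrans.

```python
# Fixed vocabulary: the only words this sketch ever emits.
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
# Scale words, largest first, so the biggest applicable one is used.
SCALES = [(10**9, "billion"), (10**6, "million"),
          (10**3, "thousand"), (10**2, "hundred")]


def spell(n):
    """Spell a non-negative integer in English words."""
    if n < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    for size, name in SCALES:
        if n >= size:
            # Spell the multiplier, then the scale word, then the remainder.
            head, rest = divmod(n, size)
            text = spell(head) + " " + name
            return text + (" " + spell(rest) if rest else "")


if __name__ == "__main__":
    for n in (7, 21, 342, 12345, 1000001):
        print(n, "=", spell(n))
```

The vocabulary never grows, but the set of phrases it can produce has no upper bound, which is why a rule-based component handles numbers more reliably than examples learned from data.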
Supported Languages
The following languages/scripts are at least partially supported:
Arabic
Bengali
Chinese
English
Gujarati
Gurumukhi
Kannada
Korean
Malayalam
Oriya
Tamil
Telugu
Thai
Tibetan
Developers
NumberTrans is developed by Jonathan Clark.
Status
NumberTrans is currently under heavy development. We welcome feedback as to the usability and functionality of the framework. For a very limited live demonstration, check out our demo applet. Check back soon for a full listing of implemented languages and scripts.
For now, check out our project page. Beta code is provided in our CVS repository.