TM Cleaner

TM Cleaner is software for identifying translation units, in translation memories or parallel corpora, whose segments are not translations of each other.

TM Cleaner has the following features:

  1. It is written in Python and Java.

  2. It is hosted on GitHub: https://github.com/SoimulPatriei/TMCleaner

  3. It is free for research and commercial use: it is released under the LGPL license.

  4. It uses a combination of machine learning and rule-based classification to identify incorrectly translated segments.

  5. It is fully integrated with scikit-learn, so any of the scikit-learn classification algorithms can be used with it easily.

  6. It can be used standalone or as a web service.

  7. It works in three modes: with the Bing translation engine, with the Hunalign sentence aligner, and with the Fastalign word aligner.

  8. It contains a module for evaluating the accuracy of the classification.

  9. It has easy-to-follow tutorials.

For comments, bug reports, or suggestions, please use the email address tm.cleaner@yahoo.com.
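
To make features 4, 5 and 8 more concrete, here is a minimal sketch of how rule-based checks, a scikit-learn classifier and an accuracy evaluation could fit together. It is not TM Cleaner's own code or API: the function names, the two features, the rules and the toy training data are all assumptions made for illustration. Label 1 means the segments are translations of each other, label 0 means they are not.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


def extract_features(source, target):
    """Hypothetical features for one translation unit (not TM Cleaner's own)."""
    src_len, tgt_len = len(source), len(target)
    # Ratio of the shorter to the longer segment, in [0, 1].
    length_ratio = min(src_len, tgt_len) / max(src_len, tgt_len, 1)
    # 1.0 when both segments contain the same multiset of digits.
    src_digits = sorted(c for c in source if c.isdigit())
    tgt_digits = sorted(c for c in target if c.isdigit())
    same_digits = 1.0 if src_digits == tgt_digits else 0.0
    return [length_ratio, same_digits]


def rule_based_label(source, target):
    """Return 0 when a simple rule fires, or None when the rules are undecided."""
    if not source.strip() or not target.strip():
        return 0  # one side is empty
    if source.strip() == target.strip():
        return 0  # assumption: an untranslated copy counts as wrong
    return None


def classify(units, model):
    """Apply the rules first and fall back to the trained classifier."""
    labels = []
    for source, target in units:
        label = rule_based_label(source, target)
        if label is None:
            label = int(model.predict([extract_features(source, target)])[0])
        labels.append(label)
    return labels


# Toy labelled data, invented for this sketch.
train = [
    ("The cat sleeps.", "Le chat dort.", 1),
    ("Page 12 of 40", "Page 12 sur 40", 1),
    ("Click Save to continue.", "Cliquez sur Enregistrer pour continuer.", 1),
    ("Yes", "Ce document décrit la procédure d'installation complète.", 0),
    ("Version 3", "Merci beaucoup pour votre aide aujourd'hui 7", 0),
    ("See chapter 5 for details about the configuration file.", "Non", 0),
]
X = [extract_features(s, t) for s, t, _ in train]
y = [label for _, _, label in train]

# Any scikit-learn classifier with fit/predict could be swapped in here.
model = LogisticRegression().fit(X, y)

# Accuracy on the training data only; a real evaluation needs held-out data.
predicted = classify([(s, t) for s, t, _ in train], model)
accuracy = accuracy_score(y, predicted)
```

Running the rules before the classifier keeps the obvious cases cheap and predictable, and the learned model only has to decide the units on which the rules are silent.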

Authors: