TM Cleaner
TM Cleaner is a tool for identifying translation units in translation memories or parallel corpora whose segments are not translations of each other.
TM Cleaner has the following features:
- It is written in Python and Java.
- It is hosted on GitHub: https://github.com/SoimulPatriei/TMCleaner
- It is free for research and commercial use: it is released under the LGPL license.
- It uses a combination of machine learning and rule-based classification to identify wrong translation units.
- It is fully integrated with scikit-learn, so any scikit-learn classification algorithm can be plugged in easily.
- It can be used standalone or as a web service.
- It works in three modes: with the Bing translation engine, with the Hunalign sentence aligner, or with the Fastalign word aligner.
- It contains a module for evaluating the accuracy of the classification.
- It has easy-to-follow tutorials.
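
To make the combination of rules, a learned classifier, and accuracy evaluation more concrete, here is a minimal Python sketch. It is not TM Cleaner's actual code or API: the features, the rule, the `ThresholdClassifier` stand-in, and every function name are assumptions made up for illustration. The stand-in exposes the same `fit`/`predict` interface as scikit-learn estimators, so a scikit-learn classifier could presumably be passed to `classify` in its place.

```python
# Illustrative sketch only: all names and features are hypothetical,
# not TM Cleaner's actual API.

def features(source, target):
    """Two toy features for one translation unit."""
    ls, lt = len(source), len(target)
    length_ratio = min(ls, lt) / max(ls, lt) if max(ls, lt) else 0.0
    src_digits = sorted(c for c in source if c.isdigit())
    tgt_digits = sorted(c for c in target if c.isdigit())
    return [length_ratio, 1.0 if src_digits == tgt_digits else 0.0]


def rule_says_wrong(source, target):
    """Hand-written rule: empty or identical segments are not translations."""
    s, t = source.strip(), target.strip()
    return not s or not t or s == t


class ThresholdClassifier:
    """Stand-in learner with the fit/predict interface of scikit-learn estimators."""

    def fit(self, X, y):
        # Learn the midpoint between the mean length ratio of each class.
        good = [x[0] for x, label in zip(X, y) if label == 1]
        bad = [x[0] for x, label in zip(X, y) if label == 0]
        self.threshold_ = (sum(good) / len(good) + sum(bad) / len(bad)) / 2
        return self

    def predict(self, X):
        return [1 if x[0] >= self.threshold_ and x[1] == 1.0 else 0 for x in X]


def classify(units, model):
    """Label each (source, target) pair: 1 = translation, 0 = wrong."""
    labels = []
    for source, target in units:
        if rule_says_wrong(source, target):
            labels.append(0)  # the rule overrides the learned model
        else:
            labels.append(int(model.predict([features(source, target)])[0]))
    return labels


def accuracy(predicted, gold):
    """Fraction of units whose predicted label matches the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Tiny hand-labelled training set (invented for this example).
train_units = [
    ("The house is red.", "La casa es roja."),
    ("Open the file.", "Abra el archivo."),
    ("Page 12 of 40", "Página 12 de 40"),
    ("Yes", "Este contrato se rige por las leyes de España."),
    ("Please read the full terms and conditions before you continue.", "No"),
]
train_labels = [1, 1, 1, 0, 0]

model = ThresholdClassifier().fit(
    [features(s, t) for s, t in train_units], train_labels
)

# Held-out units with gold labels, used to measure accuracy.
test_units = [
    ("Save changes?", "¿Guardar cambios?"),
    ("Click OK.", "Click OK."),
    ("Version 2", "Versión 3"),
    ("Hi", "El sistema se reiniciará en unos minutos."),
]
gold = [1, 0, 0, 0]

predicted = classify(test_units, model)
print("predicted:", predicted)
print("accuracy:", accuracy(predicted, gold))
```

Applying the rule before the model keeps obviously bad units (empty or untranslated segments) from depending on the training data, while the learned part handles the less clear-cut cases.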
For comments, bug reports, or suggestions, please write to tm.cleaner@yahoo.com.