Gale-Church Filter: Cleaning noisy parallel data for machine translation

The Gacha filter removes sentence pairs whose global character mean falls below a certain threshold. Use this cleaner to produce a small quantity of high-quality sentence pairs. It is an aggressive cleaner: with the threshold set at 20%, it removed ~64% of HindEnCorp during WMT14 (Tan and Pal, 2014) and achieved the lowest TER. (see