ReVal: A Simple and Effective Machine Translation Evaluation Metric Based on Recurrent Neural Networks

ReVal is a machine translation evaluation metric based on dense vector spaces and recurrent neural networks. In particular, the metric uses Tree-Structured Long Short-Term Memory networks (Tai et al., 2015) and GloVe word vectors (Pennington et al., 2014). The training data is computed automatically from the WMT-13 (Bojar et al., 2013) human evaluation rankings, which are converted into similarity scores between the reference and the translation. The metric has been tested on the WMT-12 and WMT-14 test sets and participated in the WMT-15 metrics task. For all three years, the metric performed either best or second best overall at the system level, across different correlation measures, when evaluating translations into English. ReVal is available as open source.
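
To make the phrase "system level using different correlation measures" concrete, here is a minimal sketch of how a metric's system-level scores can be compared with human judgments. It is not taken from the ReVal code base. The helper names (`system_score`, `pearson`, `spearman`), the choice of averaging segment scores to obtain a system score, and all numbers are assumptions made purely for illustration.

```python
from math import sqrt


def system_score(segment_scores):
    """Average segment-level metric scores into one system-level score.

    Averaging is an assumption of this sketch, not a documented ReVal step.
    """
    return sum(segment_scores) / len(segment_scores)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    # Hypothetical segment-level metric scores for three MT systems.
    metric_segments = {
        "system_a": [3.9, 4.2, 4.0],
        "system_b": [3.1, 3.4, 3.0],
        "system_c": [2.5, 2.9, 2.4],
    }
    # Hypothetical human scores for the same systems.
    human = {"system_a": 0.61, "system_b": 0.48, "system_c": 0.35}

    names = sorted(metric_segments)
    metric_scores = [system_score(metric_segments[name]) for name in names]
    human_scores = [human[name] for name in names]

    print("Pearson: ", pearson(metric_scores, human_scores))
    print("Spearman:", spearman(metric_scores, human_scores))
```

A higher correlation means the metric orders and spaces the systems more like the human judges do. Pearson is sensitive to the score magnitudes, while Spearman depends only on the ordering of the systems.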

The metric, including training data and license, can be downloaded from

For more information about the metric, please refer to the following publications:

  • Rohit Gupta, Constantin Orasan, and Josef van Genabith. 2015. ReVal: A Simple and Effective Machine Translation Evaluation Metric Based on Recurrent Neural Networks. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal.
  • Rohit Gupta, Constantin Orasan, and Josef van Genabith. 2015. Machine Translation Evaluation using Recurrent Neural Networks. In Proceedings of the Tenth Workshop on Statistical Machine Translation, Lisbon, Portugal, September. Association for Computational Linguistics.