Santanu Pal, ESR2

Location: Universität des Saarlandes, Germany

Project Title: Investigation of an ideal translation workflow for hybrid translation approaches

Project Description: 

The last decade has seen phenomenal growth in research and development in Machine Translation (MT), particularly in statistical MT (SMT). Researchers have proposed many different approaches to the component technologies of SMT, from pre-processing techniques through core SMT processes to post-editing techniques, all of which have brought some improvement over baseline SMT. Despite these advances, however, not all of these techniques are compatible with one another in practice. In this sense, the MT workflow has been investigated less than the individual MT processes: little research has addressed how to make the best of these techniques within an ideal framework, given the resources and constraints available for a particular language pair. This study works towards such an ideal MT workflow. We have studied how different paradigms, such as Translation Memory (TM), Example-Based MT and SMT, can be harmonized to best effect. We have also studied how different component technologies, such as word alignment and phrase alignment, can be hybridized using state-of-the-art techniques to improve MT. The study further includes an investigation into optimal human-machine interactive MT that keeps humans in the loop: we have studied how post-editor feedback can be integrated directly into the system, and how automatic post-editing tools can be developed from post-edited data. Finally, the research aims to improve translation quality and user satisfaction with existing corpus-based TM and MT technologies by allowing different levels of “assistance” in a user-friendly workflow, proposed within an Ideal Hybrid Machine Translation framework.

Research Interests: Machine Translation (Statistical, Example-Based and Rule-Based Approaches), Human-Computer Interaction in Machine Translation, Multi-Word Expressions, Question Answering and Generation, Natural Language Processing.

Website: http://fr46.uni-saarland.de/index.php?id=3748

Publication list

  1. Lapshinova-Koltunski, E. and Santanu Pal. 2014. Comparability of Corpora in Human and Machine Translation. In Proceedings of BUCC, 7th Workshop on Building and Using Comparable Corpora: Building Resources for Machine Translation Research (BUCC-2014), Reykjavik, May 27, 2014.

    http://www.expert-itn.eu/?q=system/files/private/documentation/bucc-2014.pdf
  2. Pintu Lohar, Pinaki Bhaskar, Santanu Pal and Sivaji Bandyopadhyay (2014) Cross Lingual Snippet Generation using Snippet Translation System. In the Springer LNCS Proceedings of the 15th International Conference on Intelligent Text Processing and Computational Linguistics (CICLING - 2014), Kathmandu, Nepal.

    http://www.expert-itn.eu/?q=system/files/private/documentation/cicling2014_2.pdf
  3. T. Nayak, S. Pal, N. Sudip, and J. van Genabith. (2016) Beyond translation memories: Generating translation suggestions based on parsing and POS tagging. In Proceedings of the 2nd Workshop on Natural Language Processing for Translation Memories (NLP4TM 2016), pages 12–20.

  4. S. Pal, S. K. Naskar, and S. Bandyopadhyay. (2013) MWE alignment in phrase based statistical machine translation. In Proceedings of the XIV Machine Translation Summit, pages 61–68.
  5. Santanu Pal, Pintu Lohar and Sudip Kumar Naskar (2014) Role of Paraphrases in PB-SMT. In the Springer LNCS Proceedings of the 15th International Conference on Intelligent Text Processing and Computational Linguistics (CICLING - 2014), Kathmandu, Nepal.

  6. Santanu Pal, Partha Pakray and Sudip Kumar Naskar (2014) Automatic Building and Using Parallel Resources for SMT from Comparable Corpora. In Proceedings of the Hybrid Approaches to Translation (HyTra-2014) Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014), Gothenburg, Sweden, 26–30 April 2014.

    http://www.expert-itn.eu/?q=system/files/private/documentation/hytra2014.pdf
  7. Santanu Pal, Ankit Srivastava, Sandipan Dandapat, Andy Way and Josef Van Genabith. 2014. “USAAR-DCU Hybrid MT System”. In Proceedings of 11th International Conference on Natural Language Processing (ICON 2014, MT tool contest). Goa, India, 18-22 December 2014.

  8. Santanu Pal, Braja Gopal Patra, Dipankar Das, Sudip Kumar Naskar, Sivaji Bandyopadhyay and Josef Van Genabith. 2014. “How sentiment analysis could help Machine Translation”. In Proceedings of 11th International Conference on Natural Language Processing (ICON 2014, Main Conference). Goa, India, 18-22 December 2014.

  9. Santanu Pal, Sudip Kumar Naskar and Sivaji Bandyopadhyay (2014) Word Alignment-Based Reordering of Source Chunks in PB-SMT. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC-2014), Reykjavik, Iceland.

    http://www.expert-itn.eu/?q=system/files/private/documentation/lrec-2014.pdf
  10. Santanu Pal and Sudip Kumar Naskar. 2015. “Hybrid Word Alignment”. To appear in the Springer book series “Theory and Applications of Natural Language Processing”.

  11. Santanu Pal, Partha Pakray, and Alexander Gelbukh. 2015. How Textual Entailment Can Help Corpus-based Machine Translation. In Proceedings of 14th Mexican International Conference on Artificial Intelligence (MICAI), October 25 to 31, 2015, Cuernavaca, Mexico.

  12. Santanu Pal, Mihaela Vela, Sudip Kumar Naskar, Josef van Genabith. 2015. USAAR-SAPE: An English–Spanish Statistical Automatic Post-Editing System. In the Proceedings of the EMNLP 2015 Tenth Workshop on Statistical Machine Translation (WMT 2015), Lisbon, Portugal.

  13. Santanu Pal, Sudip Kumar Naskar, Josef van Genabith. 2015. UdS-Sant: English–German Hybrid Machine Translation System. In the Proceedings of the EMNLP 2015 Tenth Workshop on Statistical Machine Translation (WMT 2015), Lisbon, Portugal.

  14. Tapas Nayek, Sudip Kumar Naskar, Santanu Pal, Marcos Zampieri, Mihaela Vela and Josef van Genabith. 2015. CATaLog: New Approaches to TM and Post Editing Interfaces. In the Proceedings of the 1st Workshop on Natural Language Processing for Translation Memories (NLP4TM), collocated with RANLP 2015, Hissar, Bulgaria.

  15. Santanu Pal, Partha Pakray, Alexander Gelbukh, Josef Van Genabith. 2015. Mining Parallel Resources from Comparable Corpora to improve performance of Machine Translation. Computational Linguistics and Intelligent Text Processing. Lecture Notes in Computer Science Volume 9041, 2015, pp 534-544, April 14-20, 2015, Cairo, Egypt.

  16. Santanu Pal. 2015. Statistical Automatic Post Editing. In the Proceedings of the EXPERT Scientific and Technological Workshop, Malaga, Spain.

  17. S. Pal, S. K. Naskar, and J. van Genabith (2016) Multi-engine and multi-alignment automatic post-editing and its impact on translation productivity. In The 26th International Conference on Computational Linguistics, 2016.

  18. S. Pal, S. K. Naskar, M. Vela, and J. van Genabith (2016) A neural network based approach to automatic post-editing. In the Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, page 281, 2016.

    http://www.aclweb.org/anthology/P16-2046

  19. S. Pal, S. K. Naskar, M. Zampieri, T. Nayak, and J. van Genabith (2016) CATaLog Online: A web-based CAT tool for distributed translation with data capture for APE and translation process research. In The 26th International Conference on Computational Linguistics.

  20. S. Pal, M. Zampieri, S. K. Naskar, T. Nayak, M. Vela, and J. van Genabith (2016) CATaLog Online: Porting a post-editing tool to the web. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), May 2016.

  21. S. Pal, M. Zampieri, and J. van Genabith (2016) USAAR: An operation sequential model for automatic statistical post-editing. In Proceedings of the First Conference on Machine Translation. Association for Computational Linguistics.

  22. Liling Tan and Santanu Pal. 2014. Manawi: using multi-word expressions and named entities to improve machine translation. In Proceedings of Ninth Workshop on Statistical Machine Translation. Baltimore, USA.

    http://www.expert-itn.eu/?q=system/files/private/documentation/manawi-final.pdf