Carolina Scarton, ESR7

Location: University of Sheffield, UK

Project Title: Estimating the confidence and quality of corpus-based approaches to translation

Project Description: 

Predicting the quality and confidence of machine translations is a challenging topic. Over the years, several automatic metrics have been proposed to reduce the need for direct human effort. Quality estimation (QE) of machine translations is a type of evaluation that considers only the source and target documents, without the need for a reference translation. This approach is promising because machine learning techniques can be used to predict the quality of unseen data from only a small number of labelled data points. Most work in QE focuses on sentence-level and word-level prediction. However, document-level QE can be desirable for applications aimed at other types of end users (such as gisting) and where fully automated MT is needed (e.g. because the amount of data makes human post-editing unfeasible). In my PhD project I address document-level QE. More specifically, I focus on two problems. The first is finding the best features for document-level QE, including studies of discourse phenomena and document-wide information. The second is finding appropriate quality labels for documents that go beyond the simple aggregation of sentence-level quality scores.

Research Interests: Quality estimation of machine translation, automatic evaluation of NLP tasks, discourse analysis for NLP evaluation, readability assessment, text simplification and cross-lingual computational lexical resources

Website: http://staffwww.dcs.shef.ac.uk/people/C.Scarton/

Publication list

  1. O. Bojar, R. Chatterjee, C. Federmann, B. Haddow, C. Hokamp, M. Huck, V. Logacheva, P. Koehn, C. Monz, M. Negri, P. Pecina, M. Post, C. Scarton, L. Specia and M. Turchi (2015): Findings of the 2015 Workshop on Statistical Machine Translation. In Proceedings of the Tenth Workshop on Statistical Machine Translation (WMT 2015), Lisbon, Portugal, pp. 1–46.

  2. O. Bojar, R. Chatterjee, C. Federmann, Y. Graham, B. Haddow, M. Huck, A.J. Yepes, P. Koehn, V. Logacheva, C. Monz, M. Negri, A. Neveol, M. Neves, M. Popel, M. Post, R. Rubino, C. Scarton, L. Specia, M. Turchi, K. Verspoor and M. Zampieri (2016): Findings of the 2016 Conference on Machine Translation. In Proceedings of the First Conference on Machine Translation (WMT 2016), Berlin, Germany, August 2016, pp. 131–198.

  3. Carolina Scarton and Lucia Specia (2014): Document-level translation quality estimation: exploring discourse and pseudo-references. In Proceedings of EAMT 2014, Dubrovnik, Croatia, pp. 101–108.

  4. C. Scarton and L. Specia (2014): Exploring Consensus in Machine Translation for Quality Estimation. In Proceedings of WMT 2014, Baltimore, MD, pp. 342–347.

  5. Carolina Scarton (2015): Discourse and Document-level Information for Evaluating Language Output Tasks. In Proceedings of the NAACL-HLT 2015 Student Research Workshop (SRW), Denver, CO, pp. 118–125.

  6. Carolina Scarton, Marcos Zampieri, Mihaela Vela, Josef van Genabith and Lucia Specia (2015): Searching for Context: a Study on Document-Level Labels for Translation Quality Estimation. In Proceedings of EAMT 2015, Antalya, Turkey, pp. 121–128.

  7. Carolina Scarton, Liling Tan and Lucia Specia (2015): USHEF and USAAR-USHEF Participation in the WMT15 Quality Estimation Shared Task. In Proceedings of the Tenth Workshop on Statistical Machine Translation, Lisbon, Portugal.

    http://www.statmt.org/wmt15/pdf/WMT40.pdf

  8. Carolina Scarton and Lucia Specia (2015): A quantitative analysis of discourse phenomena in machine translation. Discours - Revue de linguistique, psycholinguistique et informatique, number 16.

    https://discours.revues.org/9047
  9. Carolina Scarton (2015): Finding Ways to Assess Machine Translated Documents for Document-level Quality Prediction. In Proceedings of the EXPERT Scientific and Technological Workshop, Malaga, Spain.

  10. C. Scarton, D. Beck, K. Shah, K. S. Smith and L. Specia (2016): Word embeddings and discourse information for Quality Estimation. In Proceedings of the First Conference on Machine Translation, Berlin, Germany, pp. 831–837.

  11. Carolina Scarton and Lucia Specia (2016): A Reading Comprehension Corpus for Machine Translation Evaluation. In Proceedings of the Tenth International Conference on Language Resources and Evaluation, Portorož, Slovenia, pp. 3652–3658.

  12. Lucia Specia, Gustavo Henrique Paetzold and Carolina Scarton (2015): Multi-level Translation Quality Prediction with QuEst++. In Proceedings of ACL-IJCNLP 2015 System Demonstrations, Beijing, China, pp. 115–120.

  13. Liling Tan, Carol Scarton, Lucia Specia and Josef van Genabith (2015): USAAR-SHEFFIELD: Semantic Textual Similarity with Deep Regression and Machine Translation Evaluation Metrics. In Proceedings of the Ninth International Workshop on Semantic Evaluation (SemEval 2015), Denver, USA.

    http://www.aclweb.org/anthology/S15-2015

  14. Liling Tan, Carol Scarton, Lucia Specia and Josef van Genabith (2016): SAARSHEFF: Semantic Textual Similarity with Machine Translation Evaluation Metrics Ensembles. In Proceedings of the Tenth International Workshop on Semantic Evaluation (SemEval 2016), San Diego, USA.