Joachim Daiber, ESR10

Location: Universiteit van Amsterdam, Netherlands

Project Title: Exploiting hierarchical alignments for linguistically-informed SMT models to meet the hybrid approaches that aim at compositional translation

Project Description: 

I am a PhD candidate at the Institute for Logic, Language and Computation at the University of Amsterdam, where I work under the supervision of Dr. Khalil Sima'an. My main research interests lie in the area of statistical machine translation, specifically in linguistically-informed models of syntax and morphology that aim at compositional translation.

Translation into morphologically rich languages often proves problematic under the assumptions of current models. Statistical machine translation systems face three major challenges in this area: firstly, freer word order is difficult to model; secondly, this freedom, combined with the large space of possible morphological inflections, leads to problems of data sparsity; and finally, morphological agreement is often expressed over long distances. For many language pairs, both syntax and morphology have to be taken into account to produce a good translation, and the two may interact in various ways. The goal of my research is to investigate and develop statistical models for machine translation that compose translation units in a more informed way.

To this end, I have recently focused on three main problems. First, as a step towards better handling morphology, word order and their interaction, I introduced a model for predicting target-side morphology from the predicate-argument structure of the source sentence (presented at MT Summit 2015). Second, I introduced an approach for addressing the rich set of word order choices that morphologically rich languages provide (presented at the 1st Deep Machine Translation Workshop in Prague, Czech Republic). Third, to address compounding, an exceedingly productive word formation process in languages such as German, I built an unsupervised splitting method based on regularities in the semantic representations of words (also presented at the 1st Deep Machine Translation Workshop). The outcome of this work is a high-quality compound splitting tool that we have released as open source (Apache License 2.0): https://github.com/jodaiber/semantic_compound_splitting.
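The idea behind splitting compounds via semantic regularities can be illustrated with a toy sketch: a candidate split is plausible if the vector difference between the compound and its head points in a "prototypical" compounding direction. All words and vectors below are made up for illustration, and the prototype is derived from a single example, whereas the actual system learns from large word embedding spaces.

```python
# Toy sketch of compound splitting guided by word embeddings.
# Vectors and vocabulary are invented for illustration only.
import math

# Hypothetical tiny embedding table (real systems use learned embeddings).
emb = {
    "apfelbaum": (0.9, 0.8, 0.1),   # "apple tree"
    "apfel":     (0.8, 0.1, 0.0),   # "apple"
    "baum":      (0.1, 0.9, 0.1),   # "tree"
    "apfelb":    (0.3, 0.2, 0.9),   # implausible fragment
    "aum":       (0.2, 0.3, 0.8),   # implausible fragment
}

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Prototype direction: how a compound typically differs from its head.
# (Here taken from one known compound; a real model would average many.)
prototype = sub(emb["apfelbaum"], emb["baum"])

def score_split(compound, modifier, head):
    """Score a candidate (modifier, head) split: both parts must be known
    words, and compound minus head should align with the prototype."""
    if modifier not in emb or head not in emb:
        return -1.0
    return cosine(sub(emb[compound], emb[head]), prototype)

candidates = [("apfel", "baum"), ("apfelb", "aum")]
best = max(candidates, key=lambda s: score_split("apfelbaum", *s))
print(best)  # the linguistically plausible split scores highest
```

The key design choice, mirrored from the paper's analogy idea, is that splits are ranked semantically rather than by character statistics alone, so spurious string-level splits like "apfelb" + "aum" are penalized.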

I am currently working on unsupervised methods for word order and morphology prediction that extend to a broader set of language pairs.

Research Interests: SMT, Machine Learning, Parsing

Website: http://jodaiber.github.io/

Publication list

  1. Joachim Daiber and Khalil Sima’an (2014) Morphology-Sensitive n-Best Preordering for Machine Translation. Technical report. July
  2. Joachim Daiber, Lautaro Quiroz, Roger Wechsler and Stella Frank (2015) Splitting Compounds by Semantic Analogy. In Proceedings of the 1st Deep Machine Translation Workshop, Prague, Czech Republic, pp. 20 - 28

    http://jodaiber.github.io/doc/compound_analogy.pdf
  3. Daiber, J. and Sima’an, K (2015). Machine Translation with Source-Predicted Target Morphology. Proceedings of MT Summit XV. Miami, USA.

    http://jodaiber.github.io/doc/mtsummit2015.pdf
  4. Daiber, J. and Sima’an, K (2015). Delimiting Morphosyntactic Search Space with Source-Side Reordering Models. In Proceedings of the first Deep Machine Translation Workshop. Prague, Czech Republic.

    http://jodaiber.github.io/doc/preordering_spaces.pdf
  5. Daiber, J., Stanojević, M., and Sima'an, K. (2016) Universal Reordering via Linguistic Typology. In Proceedings of the 26th International Conference on Computational Linguistics.
  6. Daiber, J., Stanojević, M., Aziz, W. and Sima'an, K. (2016) Examining the Relationship between Preordering and Word Order Freedom in Machine Translation. In Proceedings of the First Conference on Machine Translation.

  7. Daiber, J. and van der Goot, R. (2016) The Denoised Web Treebank: Evaluating Dependency Parsing under Noisy Input Conditions. Proceedings of LREC 2016. Portorož, Slovenia. http://www.lrec-conf.org/proceedings/lrec2016/summaries/86.html