Liling Tan, ESR5

Location: Universität des Saarlandes, Germany

Project Title: Using Terminologies and Ontologies to Improve Translation

Project Description: 

A term is any conventional symbol representing a concept defined in a subject field; a terminology is the aggregate of terms that represent the system of concepts of an individual subject field. The core characteristic of a term is its termhood, i.e. the degree to which a linguistic unit is related to a domain-specific context. An ontology is a structured categorization of domain-specific terms, usually represented as a hierarchical tree of conceptual terms or as a graph in which the vertices are the conceptual terms and the edge between a pair of terms represents the relation between them.

The aim of my project is to develop novel algorithms for term extraction and ontology induction from domain-specific text. The focus is on the scalability and maintainability of the algorithms on web-scale data, with state-of-the-art performance. Eventually, I hope to show that automatically extracted semantic knowledge (terminology and/or ontology) can be used to improve machine translation quality.

Currently, we (I, with fellow ESRs and colleagues in the EXPERT project) have developed (i) a Language Model based Pointwise Mutual Information (LMPMI) term extractor that adapts to different domains by using pre-trained language models and (ii) an ontology induction system that uses neural word representations to discover hypernym-hyponym relations between terms.
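
As a rough illustration of the pointwise mutual information idea behind the term extractor, here is a minimal sketch that scores adjacent word pairs as term candidates. It is not the LMPMI system itself: the probabilities here are assumed to come from raw corpus counts rather than from pre-trained language models, and the function name `pmi_bigrams` and the `min_count` threshold are hypothetical choices made for this example.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by PMI = log2(p(x, y) / (p(x) * p(y))).

    Probabilities are simple relative frequencies (an assumption of this
    sketch; a language-model based variant would obtain them from a model).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            # Skip rare pairs, whose PMI estimates are unreliable.
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))

    # Highest-scoring pairs first: these are the strongest term candidates.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


tokens = "machine translation is fun and machine translation is hard".split()
ranked = pmi_bigrams(tokens)
```

On this toy input, only the pairs that occur at least twice are kept as candidates. A real extractor would also need a stop-word or part-of-speech filter so that pairs such as ("translation", "is") are not proposed as terms.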

In relation to translation research, as part of the final phase of my EXPERT ESR fellowship, I am conducting pilot experiments to validate the improvements made to a phrase-based machine translation system by extending its models with lexical information from automatically extracted terminology.

Research Interests: MT/TM, Word Sense Disambiguation (WSD), Asian Language NLP, Knowledge Base Population (KBP)

Website: http://alvations.com

Publication list

  1. Hanna Bechara, Rohit Gupta, Liling Tan, Constantin Orasan, Ruslan Mitkov and Josef van Genabith. 2016. WOLVESAAR: Replicating the Success of Monolingual Word Alignment and Neural Embeddings for Semantic Textual Similarity. In Proceedings of the Tenth International Workshop on Semantic Evaluation (SemEval 2016). San Diego, USA.

  2. Jon Dehdari, Liling Tan and Josef van Genabith. 2016. BIRA: Improved Predictive Exchange Word Clustering. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. San Diego, California.

  3. Jon Dehdari, Liling Tan and Josef van Genabith. 2016. Scaling Up Word Clustering. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations. San Diego, California.

  4. Guy Emerson, Liling Tan, Susanne Fertmann, Alexis Palmer and Michaela Regneri. 2014. SeedLing: Building and using a seed corpus for the Human Language Project. In Proceedings of the Use of Computational Methods in the Study of Endangered Languages (ComputEL) Workshop. Baltimore, USA.

  5. Jose M.M. Martinez and Liling Tan. 2016. Complex Word Identification with Sense Entropy and Sentence Perplexity. In Proceedings of the Tenth International Workshop on Semantic Evaluation (SemEval 2016). San Diego, USA.

  6. Carolina Scarton, Liling Tan and Lucia Specia. 2015. USHEF and USAAR-USHEF Participation in the WMT15 Quality Estimation Shared Task. In Proceedings of the Tenth Workshop on Statistical Machine Translation. Lisbon, Portugal.

    http://www.statmt.org/wmt15/pdf/WMT40.pdf
  7. Liling Tan and Santanu Pal. 2014. Manawi: using multi-word expressions and named entities to improve machine translation. In Proceedings of the Ninth Workshop on Statistical Machine Translation. Baltimore, USA.

    http://www.expert-itn.eu/?q=system/files/private/documentation/manawi-final.pdf
  8. Liling Tan and Francis Bond. 2014. NTU-MC Toolkit: Annotating a Linguistically Diverse Corpus. In Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014). Dublin, Ireland.
  9. Liling Tan, Anne Schumann, Jose M.M. Martinez and Francis Bond. 2014. Sensible: L2 Translation Assistance by Emulating the Manual Post-Editing Process. In Proceedings of the Eighth International Workshop on Semantic Evaluation (SemEval 2014). Dublin, Ireland.

  10. Liling Tan, Marcos Zampieri, Nikola Ljubešić and Jörg Tiedemann. 2014. Merging Comparable Data Sources for the Discrimination of Similar Languages: The DSL Corpus Collection. In Proceedings of the 7th Workshop on Building and Using Comparable Corpora: Building Resources for Machine Translation Research. Reykjavik, Iceland.

  11. Liling Tan and Francis Bond. 2014. Manipulating Input Data for Machine Translation [poster]. In Workshop on Asian Translation. Tokyo, Japan.
  12. Liling Tan, Rohit Gupta and Josef van Genabith. 2015. USAAR-WLV: Hypernym Generation with Deep Neural Nets. In Proceedings of the International Workshop on Semantic Evaluation (SemEval-2015). Denver, Colorado, USA.

    http://www.aclweb.org/anthology/S15-2155
  13. Liling Tan, Josef van Genabith and Francis Bond. 2015. Passive and Pervasive Use of a Bilingual Dictionary in Statistical Machine Translations. In Proceedings of the Fourth Workshop on Hybrid Approaches to Translation (HyTra). Beijing, China.

    http://glicom.upf.edu/hytra2015/pdf/HyTra-405.pdf
  14. Liling Tan. 2015. EXPERT Innovations in Terminology Extraction and Ontology Induction. In Proceedings of the EXPERT Scientific and Technological Workshop. Malaga, Spain.

  15. Liling Tan, Carol Scarton, Lucia Specia and Josef van Genabith. 2015. USAAR-SHEFFIELD: Semantic Textual Similarity with Deep Regression and Machine Translation Evaluation Metrics. In Proceedings of the Ninth International Workshop on Semantic Evaluation (SemEval 2015). Denver, USA.

    http://www.aclweb.org/anthology/S15-2015
  16. Liling Tan and Noam Ordan. 2015. USAAR-CHRONOS: Crawling the Web for Temporal Annotations. In Proceedings of the Ninth International Workshop on Semantic Evaluation (SemEval 2015). Denver, USA.

    http://www.aclweb.org/anthology/S15-2143
  17. Liling Tan. 2015. Statistical Machine Translation with NLTK. Poster presented at Python Conference.
  18. Liling Tan, Jon Dehdari and Josef van Genabith. 2015. An Awkward Disparity between BLEU / RIBES and Human Judgements in Machine Translation. In Proceedings of the 2nd Workshop on Asian Translation (WAT2015). Kyoto, Japan.

  19. Liling Tan, Jon Dehdari and Josef van Genabith. 2016. Faster and Lighter Baseline for Machine Translation. In Proceedings of the Third Workshop on Asian Translation (WAT2016). Osaka, Japan.

  20. Liling Tan, Carol Scarton, Lucia Specia and Josef van Genabith. 2016. SAARSHEFF: Semantic Textual Similarity with Machine Translation Evaluation Metrics Ensembles. In Proceedings of the Tenth International Workshop on Semantic Evaluation (SemEval 2016). San Diego, USA.

  21. Liling Tan, Francis Bond and Josef van Genabith. 2016. Hyponym Endocentricity. In Proceedings of the Tenth International Workshop on Semantic Evaluation (SemEval 2016). San Diego, USA.

  22. Mihaela Vela and Liling Tan. 2015. Predicting Machine Translation Adequacy with Document Embeddings. In Proceedings of the Tenth Workshop on Statistical Machine Translation. Lisbon, Portugal.

    http://www.statmt.org/wmt15/pdf/WMT51.pdf
  23. Constance Wang and Liling Tan. 2014. Explicit Holmes: A Diachronic Investigation of Explicitness and Explicitation in Chinese Translations of Detective Stories. In Kerstin Kunz, Elke Teich, Silvia Hansen-Schirra, Stella Neumann and Peggy Daut (Editors). Caught in the Middle – Language Use and Translation: A Festschrift for Erich Steiner on the Occasion of his 60th Birthday. Germany: Saarland University Press.
  24. Marcos Zampieri and Liling Tan. 2014. Grammatical Error Detection with Limited Training Data: The Case of Chinese. In Proceedings of the 1st Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA'14), Nara, Japan, 30 November, 2014, pp. 69-74.

  25. Marcos Zampieri, Liling Tan, Nikola Ljubešić and Jörg Tiedemann. 2014. A Report on the Discriminating between Similar Languages (DSL) Shared Task 2014. In Proceedings of the 1st Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects (VarDial). Dublin, Ireland.
  26. Marcos Zampieri, Liling Tan and Constance Wang. 2014. Translation Clouds: Producing Word Clouds from Non-aligned Parallel Corpora [abstract]. In Proceedings of the 6th International Conference on Corpus Linguistics (CILC6). Las Palmas de Gran Canaria, Spain.

  27. Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann and Preslav Nakov. 2015. Overview of the DSL Shared Task 2015. In Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects (LT4VarDial). Hissar, Bulgaria.

  28. Marcos Zampieri, Shervin Malmasi, Liling Tan and Josef van Genabith. 2016. MacSaar: Zipfian and Character-level features for Complex Word Identification. In Proceedings of the Tenth International Workshop on Semantic Evaluation (SemEval 2016). San Diego, USA.