Work Packages

EXPERT has eight work packages (WPs). The table below gives an overview of how the project is organised into work packages:

WP No | Work Package Title | Lead Beneficiary
WP1 | Management | UoW
WP2 | User perspective | UMA
WP3 | Data collection | Translated
WP4 | Language technology, domain ontologies and terminologies | USAAR
WP5 | Learning from and informing translators | USFD
WP6 | Hybrid corpus-based approaches | DCU
WP7 | Training | UvA
WP8 | Dissemination | Pangeanic

The activities involved in achieving the project's goals complement each other, and their intertwining ensures high-quality, innovative research output.

The proposed research programme is highly multidisciplinary and interdisciplinary, involving computer science, translation studies, linguistics, language technology, statistics and machine learning, and the interaction among all these areas. It also requires a high degree of intersectoral collaboration, drawing on the strong development and commercialisation experience of the industrial partners. This is reflected in the different types and expertise of the academic and industrial institutions in the consortium, and in their contributions to different but interconnected parts of the project's scientific, training and implementation work.

The five main research sub-topics to be addressed in EXPERT are summarised in the table below, together with the state of the art, its main limitations, and how these limitations will be addressed in EXPERT in order to obtain technologies that are maximally useful to human translators.

Topic | State-of-the-art and limitations | EXPERT solutions
User perspective | MT systems force the users to change their working style by imposing the use of sentential segments and not allowing reuse of translations. | Consider the real needs of translators, involving them in the development of technologies, and providing training to prepare them with new skills.
Data collection and preparation | Existing TM, EBMT and SMT approaches have particular data constraints which prevent the use of the same data for different approaches. | Investigate how data repositories can be built automatically in a way that makes them useful to multiple corpus-based approaches to translation.
Improve matching and retrieval with linguistic processing | The lack of linguistic processing constrains the retrieval of previous translation examples to those matching the input text at the surface level. | Investigate matching algorithms which rely on lexical, syntactic and semantic variations of texts, including the use of automatically acquired domain ontologies and terminology databases to intelligently replace such invariant parts, making translations consistent.
Hybrid approaches for translation | Hybrid corpus-based solutions consider each approach individually as a tool, not fully exploiting integration possibilities. | Fully integrate corpus-based approaches to improve translation quality and minimize translation effort and cost.
Human translator in the loop: Informing users and learning from user feedback | In interactive workflows where humans post-edit/complete system translations, translators are not informed about the quality of the translations. The translators’ choice is at best saved for future use. | Generate confidence and quality estimation mechanisms to allow these choices to be based on the quality of the TM/MT output. Make use of translators’ feedback as produced at translation time to improve the system on the fly.