Work Packages
EXPERT has 8 work packages (WPs). The table below gives an overview of how the project is organised into work packages:
WP No | Work Package Title | Lead Beneficiary |
---|---|---|
WP1 | Management | UoW |
WP2 | User perspective | UMA |
WP3 | Data collection | Translated |
WP4 | Language technology, domain ontologies and terminologies | USAAR |
WP5 | Learning from and informing translators | USFD |
WP6 | Hybrid corpus-based approaches | DCU |
WP7 | Training | UvA |
WP8 | Dissemination | Pangeanic |
The activities in these work packages complement one another, and their interdependence supports high-quality, innovative research output.
The proposed research programme is highly multidisciplinary and interdisciplinary, spanning computer science, translation studies, linguistics, language technology, statistics and machine learning, as well as the interaction among these areas. It also requires a high degree of intersectoral collaboration, drawing on the strong development and commercialisation experience of the industrial partners. This is reflected in the different types and expertise of the academic and industrial institutions in the consortium, and in their contributions to different but interconnected scientific, training and implementation parts of the project.
The five main research sub-topics to be addressed in EXPERT are summarised in the table below, together with the state-of-the-art, main limitations and how these limitations will be addressed in EXPERT in order to obtain technologies that are maximally useful to human translators.
Topic | State-of-the-art and limitations | EXPERT solutions |
---|---|---|
User perspective | MT systems force the users to change their working style by imposing the use of sentential segments and not allowing reuse of translations. | Consider the real needs of translators, involving them in the development of technologies, and providing training to prepare them with new skills. |
Data collection and preparation | Existing TM, EBMT and SMT approaches have particular data constraints which prevent the use of the same data for different approaches. | Investigate how data repositories can be built automatically in a way that makes them useful to multiple corpus-based approaches to translation. |
Improve matching and retrieval with linguistic processing | The lack of linguistic processing constrains the retrieval of previous translation examples to those matching the input text at the surface level. | Investigate matching algorithms that rely on lexical, syntactic and semantic variations of texts, including the use of automatically acquired domain ontologies and terminology databases to intelligently replace such invariant parts, making translations consistent. |
Hybrid approaches for translation | Hybrid corpus-based solutions consider each approach individually as a tool, not fully exploiting integration possibilities. | Fully integrate corpus-based approaches to improve translation quality and minimise translation effort and cost. |
Human translator in the loop: Informing users and learning from user feedback | In interactive workflows where humans post-edit/complete system translations, translators are not informed about the quality of the translations. The translators’ choice is at best saved for future use. | Generate confidence and quality estimation mechanisms to allow these choices to be based on the quality of the TM/MT output. Make use of translators’ feedback as produced at translation time to improve the system on the fly. |