English-Vietnamese Machine Translation

Important dates

  • Sep 10, 2020: Registration open

  • Sep 30, 2020: Registration closed

  • Oct 1, 2020: Training and dev/test data released

  • Nov 14, 2020: Docker guide released

  • Dec 02, 2020: Official test set released

  • Dec 04, 2020: System submission deadline

  • Dec 15, 2020: Technical report submission

  • Dec 18, 2020: Result announcement (workshop day)


Machine translation is one of the classic and most challenging tasks in the field of NLP. Recently, due to the rapidly increasing amount of data and computing power in ubiquitous environments, the development of Vietnamese-related translation systems has attracted not only academic institutes but also R&D units from companies, both within Vietnam and overseas. One notable milestone was the IWSLT 2015 evaluation campaign, where the English-Vietnamese language pair became one of the official translation directions of that prestigious campaign in the machine translation community.

For the first time since 2013, Machine Translation makes its way back into the VLSP Evaluation Campaign. This campaign aims to create an authentic environment for automatically and manually evaluating translation systems, thereby boosting Machine Translation research within the Vietnamese Language Technology community. We also welcome research groups to improve their methods and bring their systems into real-world use. Fostering further domestic and international collaboration in the field is also among our goals.

Task Description

The task this year includes only one track: text translation from English to Vietnamese in the news domain. Results will be ranked by human evaluation. Participants can submit constrained and unconstrained systems. Constrained systems are systems developed by the participants and trained only on the data provided by the organizers. Unconstrained systems include systems that use commercial translation products developed by parties other than the participants (e.g., Systran products), systems in which software developed by others performs the main part of the translation process (e.g., Google Translate), and systems that use data not provided by the workshop. Only constrained systems will be human-evaluated and ranked. You can, however, use other data and systems to demonstrate the significant improvements that large data makes possible, and report them in your system paper.

To ensure fair results and ease the evaluation, participants will be asked to package their systems into Docker images. A detailed guide on how to do that will be posted.

Training and Test Data

  • Parallel Corpora:


  • Monolingual Corpora:


  • Development set and (public) test set:

The development set and (public) test set will be provided together with the training sets. Participants can use these datasets during training to validate their models before applying them to the official (private) test set, which will be released on the planned date. You can safely assume that the development set, the public test set, and the official test set are in the same domain (news). Participants should use the public test set with an automatic metric to decide which systems to submit. We suggest the BLEU score (Papineni et al., 2002) as implemented in SacreBLEU.
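As a concrete illustration of the suggested metric, the sketch below implements a toy sentence-level BLEU using only the standard library. It is for intuition only; the function name and the absence of smoothing are our own simplifications, and official validation should use SacreBLEU as suggested above.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Toy sentence-level BLEU (Papineni et al., 2002), uniform weights,
    single reference, no smoothing. Illustration only; use SacreBLEU
    for official scoring."""
    hyp, ref = hypothesis.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped matches
        total = max(sum(hyp_ngrams.values()), 1)
        if overlap == 0:
            return 0.0  # any zero n-gram precision zeroes the unsmoothed score
        log_prec += math.log(overlap / total)
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(log_prec / max_n)
```

A perfect match scores 1.0, and any hypothesis sharing no words with the reference scores 0.0; SacreBLEU reports the corpus-level score scaled to 0-100.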

  • Official (private) test set:

Will be posted here and announced via the VLSP mailing list.

Data Format

  • Input format:

    • For the parallel data, the training, development and test sets will be provided as UTF-8 plain text, 1-to-1 sentence aligned, one “sentence” per line. Note that a “sentence” here is not necessarily a linguistic sentence; it may also be a phrase.

    • For the monolingual corpora, we provide UTF-8 plain text, one “sentence” per line, in the form in which the texts were originally downloaded.
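Given the one-sentence-per-line, 1-to-1 aligned format above, a minimal sanity check for a parallel corpus might look like the following sketch (the file names are hypothetical placeholders, not official release names):

```python
def check_alignment(src_path, tgt_path):
    """Read a parallel corpus and verify it is 1-to-1 sentence aligned.
    Returns the list of (source, target) line pairs."""
    with open(src_path, encoding="utf-8") as f_src, \
         open(tgt_path, encoding="utf-8") as f_tgt:
        src = f_src.read().splitlines()
        tgt = f_tgt.read().splitlines()
    if len(src) != len(tgt):
        raise ValueError(f"misaligned corpus: {len(src)} vs {len(tgt)} lines")
    return list(zip(src, tgt))
```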

  • Output format:

    • UTF-8, precomposed Unicode plain text, one sentence per line. Participants may choose appropriate preprocessing steps such as word segmentation, truecasing, lowercasing, or leaving the text as is. You might want to use the tools available in the Moses git repository.
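Since the output must be precomposed Unicode, a minimal normalization step could look like the sketch below; Python's standard unicodedata module recombines decomposed Vietnamese diacritics into their precomposed (NFC) form:

```python
import unicodedata

def to_nfc_lines(lines):
    """Normalize each output line to precomposed Unicode (NFC), so that
    e.g. a base letter plus combining diacritics becomes one code point."""
    return [unicodedata.normalize("NFC", line) for line in lines]
```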


Multiple run submissions are allowed, but participants must explicitly mark one run as PRIMARY. All other submissions are treated as CONTRASTIVE runs. If none of the runs is marked as PRIMARY, the latest submission (according to the file time-stamp) for the respective track will be used as the PRIMARY run. Only PRIMARY systems are evaluated.
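The fallback rule above (the latest file time-stamp wins when no run is marked PRIMARY) can be sketched as follows; the function and file names are illustrative only, not part of the official submission tooling:

```python
import os

def pick_primary(run_paths, marked_primary=None):
    """Return the run to evaluate: the explicitly marked PRIMARY run if any,
    otherwise the submission with the most recent modification time."""
    if marked_primary is not None:
        return marked_primary
    return max(run_paths, key=os.path.getmtime)
```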

Docker Guide for Submission


Task Organizers:

  • Thanh-Le Ha (Karlsruhe Institute of Technology and Vingroup Big Data Institute)

  • Kim-Anh Nguyen (Vingroup Big Data Institute)

  • Van-Khanh Tran (Vingroup Big Data Institute)

Copyrights of the data – Acknowledgment: