VLSP 2025 Challenge on Medical-Domain MT with Limited-Pretraining Models
VLSP 2025 CHALLENGE ON MEDICAL MACHINE TRANSLATION WITH LIMITED PARAMETERS AND RESOURCES USING PRE-TRAINED MODELS
Important dates
June 23, 2025: Registration open
July 3, 2025: Training data and base models release
July 17, 2025: Public test release
August 23, 2025: System submission deadline
August 30, 2025: Private test results release
September 10, 2025: Technical report submission
September 27, 2025: Notification of acceptance
October 3, 2025: Camera-ready deadline
October 29-30, 2025: Conference dates
Task Description
Machine Translation (MT) in the medical domain poses a particular challenge due to the high demands on accuracy and the presence of complex terminology, domain-specific sentence structures, and fine-grained semantic nuances. When this problem is tackled with models that have limited pre-training resources, the difficulty increases significantly.
Challenges to be addressed:
Lack of background knowledge: Models with limited pre-training often lack the deep understanding of language and the world that large models have, which makes it difficult to grasp the complex grammatical structures and context of medical texts.
Handling specialized terminology: Medical texts contain acronyms, drug names, disease names, and specific procedures. Models trained from scratch encounter many out-of-vocabulary (OOV) problems and cannot accurately translate these terms.
Accuracy: Errors in medical translation can have serious consequences. The lack of large pre-training data reduces the model's ability to produce faithful and reliable translations.
Limited medical bilingual data: High-quality bilingual data in the medical domain is often scarce and expensive, making it more difficult to address the shortage of pre-training through fine-tuning.
Teams will be provided with the same datasets as in previous years, including training, development, and test sets from the medical domain. Additionally, teams may use pre-trained models with limited parameters and open datasets to address the challenges outlined above.
Domain: medical
Languages: English into Vietnamese and Vietnamese into English
Evaluation: translation quality
We set this shared task in a restricted context with limited resources: the base LLM is fixed to the Qwen 2.5 and Qwen 3 families, with a maximum of 3B parameters.
For Machine Translation, we focus on the following directions, which are currently favoured by the respective communities:
English to Vietnamese (en→vi)
Vietnamese to English (vi→en)
Submissions for the MT task must be generated from the same model for each language direction.
To enable participation even with few computational resources, we constrain the base models to a maximum of 3B parameters. Base models are from the Qwen 2.5 and Qwen 3 families (a minimal loading sketch follows the model lists below):
- 3B: https://huggingface.co/Qwen/Qwen2.5-3B-Instruct
- 1.5B: https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct
- 0.5B: https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct
- 1.7B: https://huggingface.co/Qwen/Qwen3-1.7B
- 0.6B: https://huggingface.co/Qwen/Qwen3-0.6B
You are also permitted to use any of the quantized or Unsloth versions found here (provided they have 3B parameters or fewer):
- Qwen 2.5 family: https://huggingface.co/collections/Qwen/qwen25-66e81a666513e518adb90d9e
- Unsloth Qwen 2.5 family: https://huggingface.co/collections/unsloth/qwen-25-66fe4c08fb9ada518e8a0d3f
- Qwen 3 family: https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f
- Unsloth Qwen 3 family: https://huggingface.co/collections/unsloth/qwen3-680edabfb790c8c34a242f95
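For orientation, here is a minimal sketch of loading one of the allowed base models for translation with the Hugging Face transformers library. The model choice, prompt wording, and generation settings are illustrative assumptions, not part of the task specification:

# Minimal loading/translation sketch (assumes transformers with a PyTorch
# backend; device_map="auto" additionally requires the accelerate package).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-1.5B-Instruct"  # any listed model with <= 3B parameters

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype="auto", device_map="auto"
)

def translate(sentence: str, direction: str = "en2vi") -> str:
    # The prompt template below is an illustrative assumption.
    src, tgt = ("English", "Vietnamese") if direction == "en2vi" else ("Vietnamese", "English")
    messages = [{"role": "user",
                 "content": f"Translate this {src} medical sentence into {tgt}:\n{sentence}"}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=256)
    # Decode only the newly generated tokens.
    return tokenizer.decode(output_ids[0][input_ids.shape[-1]:],
                            skip_special_tokens=True).strip()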
Evaluation
System results will be ranked by human evaluation. Participants may only submit constrained systems, i.e., systems developed by the participants and trained on the data provided by the organizers. Only constrained systems will be evaluated and ranked. You may, however, use other data and systems to demonstrate the improvements obtained with larger data and report them in your system paper. Details on how to submit your systems will be announced later.
Training and Test Data
Parallel Corpora: English-Vietnamese
Monolingual Corpora: English and Vietnamese
Development set and (public) test set: English-Vietnamese
The development set and (public) test set will be provided together with the training sets. Participants may use these datasets during training to validate their models before applying them to the official (private) test set, which will be released on the planned date. You can safely assume that the development set, the public test set, and the official test set are in the same domain. Participants should use the public test set with an automatic metric to decide which systems to submit. We suggest using SacreBLEU (Post, 2018) to evaluate your systems.
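For example, a small scoring sketch using the sacrebleu Python package (file names are placeholders):

# Score system outputs against references on the public test set.
import sacrebleu

with open("hypotheses.vi", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("references.vi", encoding="utf-8") as f:
    references = [line.strip() for line in f]

# corpus_bleu takes the hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")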
The official (private) test set will be posted here and announced via the VLSP mailing list.
Data Format
Input format:
For the parallel data, the training, development, and public test sets will be provided as UTF-8 plain text, 1-to-1 sentence aligned, one “sentence” per line. Note that a “sentence” here is not necessarily a linguistic sentence but may be a phrase.
For the monolingual corpora, we provide UTF-8 plain text, one “sentence” per line, in the form in which the corpora were downloaded.
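For illustration, a minimal sketch of reading the line-aligned parallel files, assuming Python 3.10+ and placeholder file names:

# Read 1-to-1 aligned parallel data; strict=True raises an error if the
# files have different numbers of lines (i.e., the alignment is broken).
def read_parallel(src_path: str, tgt_path: str) -> list[tuple[str, str]]:
    with open(src_path, encoding="utf-8") as fs, open(tgt_path, encoding="utf-8") as ft:
        return [(s.strip(), t.strip()) for s, t in zip(fs, ft, strict=True)]

pairs = read_parallel("train.en", "train.vi")  # placeholder file names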
Output format:
UTF-8, precomposed Unicode plain text, one sentence per line. Participants may choose appropriate preprocessing steps: word segmentation, truecasing, lowercasing, or leaving the text as is. You may want to use the tools available in the Moses git repository.
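For illustration, a hedged preprocessing/postprocessing sketch using sacremoses (a Python port of the Moses scripts) and NFC normalization for the precomposed-Unicode output requirement; whether to tokenize, truecase, or lowercase remains the participant's choice:

import unicodedata
from sacremoses import MosesTokenizer, MosesDetokenizer

mt = MosesTokenizer(lang="en")
md = MosesDetokenizer(lang="en")

def preprocess(line: str) -> str:
    # One possible choice: Moses tokenization plus lowercasing.
    return mt.tokenize(line.strip(), return_str=True).lower()

def postprocess(line: str) -> str:
    # Detokenize and emit precomposed (NFC) Unicode, one sentence per line.
    return unicodedata.normalize("NFC", md.detokenize(line.split()))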
Submission
Participants must submit a working Docker image that satisfies the following constraints:
Self-contained: the image contains your model and all of its dependencies, and must work offline. Do not use any online service/API in your code.
Accompanied by a Bash script that uploads an input text file to your Docker image and receives the corresponding translations in an output text file. This Bash script will receive (1) the host:port web path of your Docker image, (2) the path to the input file, and (3) the path to the output file as arguments. Additionally, participants are encouraged to print statistics to standard output, such as total run time and average sentences per second and words per second (see the sketch after this list).
Compressed in a known format: .tar.gz | .tar.bz2 | .7z | .rar | .zip
Provided with an MD5 checksum for integrity verification.
The compressed file must be hosted on a known cloud storage service (e.g., Google Drive, Microsoft OneDrive) with appropriate download permissions.
Multiple submissions are allowed, but only the last submission will be evaluated.
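The wrapper itself must be a Bash script, but for illustration here is a Python sketch of the flow such a script has to implement. The /translate endpoint and the multipart upload are assumptions, since each team's image defines its own interface; the wrapper only has to take (host:port, input path, output path) and produce the translations file:

# Illustrative client flow: upload the input file to the running Docker image,
# write the returned translations, and print simple throughput statistics.
import sys
import time
import requests

def main() -> None:
    host_port, input_path, output_path = sys.argv[1], sys.argv[2], sys.argv[3]
    start = time.time()
    with open(input_path, "rb") as f:
        # Hypothetical endpoint; replace with whatever your image exposes.
        response = requests.post(f"http://{host_port}/translate", files={"file": f})
    response.raise_for_status()
    with open(output_path, "wb") as f:
        f.write(response.content)
    elapsed = time.time() - start
    with open(output_path, encoding="utf-8") as f:
        n_sentences = sum(1 for _ in f)
    print(f"Translated {n_sentences} sentences in {elapsed:.1f}s "
          f"({n_sentences / elapsed:.2f} sentences/s)")

if __name__ == "__main__":
    main()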
Organizers
Van-Vinh Nguyen (vinhnv@vnu.edu.vn) - VNU University of Engineering and Technology (VNU-UET)
Hong-Viet Tran (thviet@vnu.edu.vn) - VNU University of Engineering and Technology (VNU-UET)
Minh-Quy Nguyen (minhquy1624@gmail.com) - VNU University of Engineering and Technology (VNU-UET)
Sponsors and Partners