VLSP 2021 - vieCap4H Challenge: Automatic image caption generation for healthcare domains in Vietnamese

Important dates:

  • Aug 5, 2021: Registration open

  • Aug 30, 2021: Registration closed

  • Sep 20, 2021: Challenge started (via AIHUB.VN)

  • Sep 25, 2021: Public testing phase started

  • Oct 15, 2021: Registration deadline for using pre-trained models. 

  • Oct 20, 2021: (1) Private testing phase started, (2) deadline for team merging, and (3) deadline to sign the USER AGREEMENT form [Link] and return it by email to viecap4h-organizers@aihub.vn.

  • Oct 25, 2021: Private testing phase ended at 23:59:59 GMT+7.

  • Oct 27, 2021: Announcement of the teams selected to submit technical reports.

  • Nov 10, 2021: Deadline for selected teams to submit technical reports.

  • Nov 20, 2021: Final winners announcement.

  • Nov 26, 2021: Result presentation and award ceremony (workshop day).


Humans are unique in their ability to interpret what they see and describe it in natural language. Although modern AI has achieved ground-breaking successes in the last decade, building a machine that learns to talk about what it sees remains very challenging. Against this backdrop, Image Captioning, the machine learning task of automatically generating a natural language description of a given image, has emerged and attracted enormous attention in the AI research community. The task is both fascinating and challenging because it sits at the intersection of Computer Vision and Natural Language Processing, two of the most important fields of modern AI.

The COVID-19 pandemic has exacerbated the ongoing global shortage of health workers, creating an urgent need for smart assistants that can cooperate effectively with humans to fill the gap. This challenge works toward that goal by assessing a machine's ability to describe visual content in healthcare settings in Vietnamese. It gives participants an opportunity to contribute their knowledge to advance the field and to make potential applications of the task, in both healthcare and general settings (e.g. virtual assistants for blind and visually impaired people, or visual content indexing and searching), accessible to the local community.

Data Format

The Vietnamese Image Captioning dataset includes:

  • Images: train set, public test set and private test set.

  • Annotations: provided as plain-text JSON files in the following format:

    [{"id": "uuid_img1", "captions": "corresponding_caption1"}, {"id": "uuid_img2", "captions": "corresponding_caption2"}]
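
As a rough illustration of the format above, the following sketch parses an annotation file into a lookup table. The names `SAMPLE` and `load_annotations` are hypothetical, and collecting captions into a list per id is an assumption made here so that repeated ids, if they occur, are not silently overwritten.

```python
import json

# Hypothetical sample in the annotation format described above.
SAMPLE = (
    '[{"id": "uuid_img1", "captions": "corresponding_caption1"}, '
    '{"id": "uuid_img2", "captions": "corresponding_caption2"}]'
)


def load_annotations(text):
    """Map each image id to the list of captions recorded for it."""
    records = json.loads(text)
    captions_by_id = {}
    for record in records:
        # Assumption: an id may appear more than once, so captions are appended.
        captions_by_id.setdefault(record["id"], []).append(record["captions"])
    return captions_by_id


annotations = load_annotations(SAMPLE)
print(len(annotations), "images loaded")
```

For a real file, the text would be read with `open(path, encoding="utf-8")` so that Vietnamese diacritics are decoded correctly.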
Evaluation Metric

Submissions are evaluated with BLEU scores against the ground-truth captions. Specifically, the evaluation metric is the average of BLEU-1, BLEU-2, BLEU-3 and BLEU-4. For details, please refer to NLTK's BLEU score implementation (nltk.translate.bleu_score, NLTK 3.6 documentation).
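
To make the averaging concrete, here is a minimal dependency-free sketch of the metric. It is not the organizers' scoring script: whitespace tokenization, a single reference per image, corpus-level aggregation and the absence of smoothing are all assumptions made here, and NLTK's implementation may differ in edge cases. The function names are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu_n(references, hypotheses, n):
    """Corpus-level BLEU-n with uniform weights over 1..n-grams.

    Assumes exactly one tokenized reference per hypothesis and no smoothing.
    """
    matches = [0] * n
    totals = [0] * n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for k in range(1, n + 1):
            hyp_counts = ngram_counts(hyp, k)
            ref_counts = ngram_counts(ref, k)
            # Clipped counts: an n-gram is credited at most as often as it
            # occurs in the reference.
            matches[k - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[k - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / n
    # Brevity penalty for hypotheses shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)


def average_bleu(references, hypotheses):
    """Average of BLEU-1 to BLEU-4, as described in the text above."""
    return sum(corpus_bleu_n(references, hypotheses, n) for n in range(1, 5)) / 4


refs = ["bác sĩ đang khám cho bệnh nhân".split()]
hyps = ["bác sĩ đang khám cho bệnh nhân".split()]
print(average_bleu(refs, hyps))
```

A caption that shares no words with its reference scores 0.0 under this sketch, while an exact match of four or more tokens scores 1.0.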

Result Submission

To have your results evaluated, please submit a JSON file containing one generated caption for each image in the test file. The image ids in the submission file must appear in the same order as in the provided test file. The submission format is as follows:

[{"id": "uuid_img1", "captions": "corresponding_caption1"}, {"id": "uuid_img2", "captions": "corresponding_caption2"}]
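
The ordering requirement is easy to violate when predictions are stored in a dictionary, so a small helper can enforce it. The sketch below assumes predictions are held as a mapping from image id to caption; `build_submission` and `to_json` are hypothetical names, not part of any official tooling.

```python
import json


def build_submission(test_ids, predictions):
    """Return submission records in the same order as the test file ids."""
    missing = [image_id for image_id in test_ids if image_id not in predictions]
    if missing:
        raise ValueError(f"no caption generated for {len(missing)} image(s)")
    return [{"id": image_id, "captions": predictions[image_id]} for image_id in test_ids]


def to_json(submission):
    """Serialize the records; ensure_ascii=False keeps Vietnamese text readable."""
    return json.dumps(submission, ensure_ascii=False)


# Ids in the order they appear in the provided test file (placeholders).
test_ids = ["uuid_img1", "uuid_img2"]
predictions = {
    "uuid_img2": "corresponding_caption2",
    "uuid_img1": "corresponding_caption1",
}
print(to_json(build_submission(test_ids, predictions)))
```

Writing the result with `open(path, "w", encoding="utf-8")` produces a file in the format shown above.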


  • Thao Minh Le
  • Anh Tuan Hoang
  • Long Hoang Dang
  • Thanh-Son Nguyen
  • Xuan-Son Vu


How to cite this data challenge?


title = {VLSP 2021 - VieCap4H Challenge: Automatic Image Caption Generation for Healthcare Domain in Vietnamese},
author = {Le, Thao Minh and Dang, Long Hoang and Nguyen, Thanh-Son and Nguyen, Thi Minh Huyen and Vu, Xuan-Son},
booktitle = {Proceedings of the 8th International Workshop on Vietnamese Language and Speech Processing},
month = {12}, year = {2021}, address = {Ho Chi Minh, Vietnam},
publisher = {VNU Journal of Science: Computer Science and Communication Engineering}}

- Preprint paper: https://people.cs.umu.se/sonvx/files/VieCap4H_VLSP21.pdf 

Contact Us:

If you have any questions, please contact us privately at viecap4h-organizers@aihub.vn or publicly at https://groups.google.com/g/viecap4h-organizers




Sponsors - Partners

VinBDI   VinIF     Aimesoft   VBee     Zalo   InfoRe      VTCC    VCCorp    VAIS    ReML.AI     Dagoras


IOIT     HUS     USTH    UET    HUST     Vietlex     INT2 

VLSP 2021 Sponsors


 UIT     HUS     UET     VinIF     Aimesoft     VBee     Zalo     INT2