VLSP 2025 challenge on Multimodal Legal QA on Traffic Sign Rules

Important dates (updated)

26 June 2025: Call for participants.

07 July 2025: Training set release

15 August 2025: Registration closes

20 August 2025: Private test set release

20-25 August 2025: Testing phase

+ 20-22 August 2025: Testing for Task 1 (from 00:00 on 20 August 2025 to 23:59 on 22 August 2025, UTC+0)

+ 23-25 August 2025: Testing for Task 2 (from 00:00 on 23 August 2025 to 23:59 on 25 August 2025, UTC+0)

27 August 2025: Result announcement and the beginning of paper submission for the top 5 teams.

10 September 2025: Paper and source code submission deadline for top 5 teams.

27 September 2025: Notification of acceptance.

03 October 2025: Camera ready.

29 - 30 October 2025: Conference (workshop)

Private test submission instruction

Official submissions must be made via the Codabench system in the Private test phase. Late submissions via email are not accepted.
Each team has a maximum of three submissions (trials). The final result is the best of the three trials.
During the submission stage, the leaderboard will be hidden.
The data for submission will be sent via email to the team leader. Please check your email frequently during the Private test phase.

Registration

Please register via this link: https://forms.gle/jKGSWKDRjpUwzYhA9

Submission

Submission system: https://www.codabench.org/competitions/9525/

Please follow the instructions in the submission system.

Task Description

Question answering (QA) is a widely applicable problem in artificial intelligence, especially in natural language processing (NLP). Applied to the legal domain (legal QA), it can help build intelligent support systems that serve users’ legal information retrieval needs. Compliance with road traffic safety regulations is an urgent issue for protecting the lives and property of road users, and thoroughly understanding and strictly following the instructions given by signs and signals is the foundation of safe travel. The VLSP 2025 MLQA-TSR Shared Task aims to promote NLP research through the QA task and to help build systems that support users in understanding the meanings of road traffic signs and the traffic scenarios based on those signs, thereby raising awareness of traffic safety. Notably, this is the first time the task combines image and text data, with the goal of developing multimodal models that support research in NLP in particular and AI in general.

The VLSP 2025 MLQA-TSR consists of two sub-tasks:

Subtask 1: Multimodal Retrieval

Input:

+ A question about the traffic signs, in natural language.

+ An actual image of the traffic signs on the street.

Output: Reference article(s) in the Law on Road Traffic Order and Safety (36/2024/QH15) or the National Technical Regulation on Traffic Signs and Signals (QCVN 41:2024/BGTVT).

Subtask 2: Question answering

Input:

+ A question about the traffic signs, in natural language.

+ An actual image of the traffic signs on the street.

+ Reference: term(s) in Regulation on Traffic or National Technical Regulation on Traffic Signs and Signals

Output: The answer to a multiple-choice question (4 options: A, B, C, D) or to a Yes/No question.

For example (in Vietnamese):

The traffic sign image:

Example

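Question: Các loại xe nào được phép lưu thông vào đoạn đường trên trong khoảng từ 6:00 đến 22:00? (Which types of vehicles are allowed to travel on the road section above between 6:00 and 22:00?)

A. Xe khách 40 chỗ. (40-seat passenger coach.)

B. Xe ô tô con. (Passenger car.)

C. Xe đầu kéo. (Tractor unit.)

D. Ô tô kéo rơ moóc. (Motor vehicle towing a trailer.)

Reference: Điều 26.1, P.106(a,b) trong Thông tư 54/2019/TT-BGTVT (Article 26.1, P.106(a,b) in Circular 54/2019/TT-BGTVT)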

Correct answer: B.

Evaluation metric

Subtask 1: F2 score

For one sample, the F2 is computed as:

+ precision = the number of correctly retrieved articles / the number of retrieved articles

+ recall = the number of correctly retrieved articles / the number of relevant articles

+ F2 = 5*precision*recall / (4*precision + recall)

The final F2 score is the average value over all samples.
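
The per-sample F2 and its average over all samples can be sketched in Python as follows. This is a minimal illustration of the formulas above, not the official scoring script: the function names, the use of string article identifiers, and the convention of returning 0 when no retrieved article is correct (which avoids division by zero) are all assumptions.

```python
def f2_score(retrieved, relevant):
    """F2 for one sample, following the precision/recall definitions above."""
    retrieved, relevant = set(retrieved), set(relevant)
    correct = len(retrieved & relevant)
    if correct == 0:
        # Assumed convention: score 0 when nothing correct was retrieved.
        return 0.0
    precision = correct / len(retrieved)
    recall = correct / len(relevant)
    return 5 * precision * recall / (4 * precision + recall)


def mean_f2(predictions, gold):
    """Average the per-sample F2 over all samples (paired by position)."""
    scores = [f2_score(p, g) for p, g in zip(predictions, gold)]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical article identifiers, for illustration only.
predictions = [["art_1", "art_2"], ["art_3"]]
gold = [["art_1"], ["art_3", "art_4"]]
print(round(mean_f2(predictions, gold), 4))
```

In the first sample precision is 0.5 and recall is 1, giving F2 = 2.5 / 3; in the second, precision is 1 and recall is 0.5, giving F2 = 2.5 / 4.5. Because F2 weights recall more heavily than precision, retrieving an extra article costs less than missing a relevant one.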

Subtask 2: Accuracy.

Accuracy = number of correct answers / total number of questions
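
A minimal Python sketch of this accuracy computation is given below. The function name and the normalization of answers (trimming whitespace and ignoring case) are assumptions made for illustration; the official scorer may compare answers differently.

```python
def accuracy(predicted, gold):
    """Fraction of questions whose predicted answer matches the gold answer."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    # Assumed normalization: ignore surrounding whitespace and letter case.
    correct = sum(
        p.strip().upper() == g.strip().upper() for p, g in zip(predicted, gold)
    )
    return correct / len(gold)


# Two of the three answers match, so this prints 0.6667.
print(round(accuracy(["B", "A", "Yes"], ["B", "C", "yes"]), 4))
```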

Rules

1. The participating teams will be provided with a dataset by the Organizers and are only allowed to use the dataset provided by the competition; external datasets are not permitted.

2. Teams are allowed to use open-source large language models (LLMs), and are encouraged to adopt methods that use small but efficient LLMs. Commercial LLMs such as ChatGPT, Claude Sonnet, etc., are not allowed. Any model used must have been published on HuggingFace or GitHub.

3. Teams must submit a technical report describing the method proposed for the shared task, along with source code capable of reproducing the model or system.

Organizers

Minh Le Nguyen - Japan Advanced Institute of Science and Technology (JAIST)

Ngan Luu-Thuy Nguyen - University of Information Technology, Vietnam National University, Ho Chi Minh City (VNUHCM-UIT)

Kiet Van Nguyen - University of Information Technology, Vietnam National University, Ho Chi Minh City (VNUHCM-UIT)

Vu Tran - Japan Advanced Institute of Science and Technology (JAIST)

Trung Vo - Japan Advanced Institute of Science and Technology (JAIST)

Son Thanh Luu - University of Information Technology, Vietnam National University, Ho Chi Minh City (VNUHCM-UIT), and Japan Advanced Institute of Science and Technology (JAIST)

Hiep Nguyen - Japan Advanced Institute of Science and Technology (JAIST)

Khanh Tran - University of Information Technology, Vietnam National University, Ho Chi Minh City (VNUHCM-UIT)

Contact

Mr. Son Thanh Luu (sonlt@uit.edu.vn)

Association for Vietnamese Language and Speech Processing
