VLSP 2025 Challenge on DRiLL: The challenge of Deep Retrieval in the expansive Legal Landscape

Important dates

June 20, 2025: Registration open
July 01, 2025: Training data release
July 15, 2025: Public test release
August 08, 2025: Private test release
August 12, 2025: System submission deadline
August 15, 2025: Private test results release
August 30, 2025: Technical report submission
September 27, 2025: Notification of acceptance
October 03, 2025: Camera-ready deadline
October 29-30, 2025: Conference dates

Registration

https://bit.ly/vlsp-2025-drill

Task Description

With the rapid advancement of artificial intelligence, particularly generative models in natural language processing (NLP) such as ChatGPT, DeepSeek, and Qwen, the demand for intelligent tools to process legal texts is growing significantly. While Legal NLP research has seen substantial progress in languages like English, Japanese, and Chinese, foundational research for Vietnamese legal text processing remains relatively underdeveloped. In this shared task, we introduce one of the first initiatives aimed at advancing Vietnamese Legal NLP.

Information Retrieval (IR) is a core task in NLP, concerned with identifying which pieces of information are most relevant to a given query. In the legal domain, the Legal Document Retrieval task focuses on determining which legal articles are relevant to a specific legal question. The task can be formalized as follows: given a set of questions Q = {q1, q2, ..., qn} and a corpus of articles A = {a1, a2, ..., am}, the goal is to identify, for each question q ∈ Q, a subset A′ ⊆ A in which every article ai ∈ A′ is considered “relevant” to q.

We call an article “relevant” to a query if a Yes/No answer to the query sentence is entailed by the meaning of the article.

External Data & PLM Usage

To ensure fairness in the competition, participants may not use external data in any processing step.

You may use pre-trained language models and LLMs whose training data and/or weights are publicly available (e.g., on Hugging Face or similar sites), but you may not use closed models (e.g., GPT-4o, Gemini, ...). Additionally, only models released before January 1st, 2025 (VNT) are allowed.

For reproducibility, please state in the paper how each model you used can be obtained.

Evaluation metrics

Automatic Evaluation: Recall, Precision, and Macro-F2. The final evaluation score is macro-averaged: each measure is calculated for every query and then averaged over all queries.
Human Evaluation
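
The macro-averaging described above can be sketched as follows. This is only an illustration and not the official scorer: it assumes F2 is computed per query with beta = 2 and then averaged, that gold labels and predictions are given as sets of article IDs per question, and that a question with no predictions scores zero. The names `f_beta` and `macro_scores` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; beta = 2 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_scores(
    gold: Dict[int, Set[int]],
    pred: Dict[int, Set[int]],
) -> Tuple[float, float, float]:
    """Return (macro precision, macro recall, macro F2) over all gold questions.

    Assumption: each measure is computed per question, then averaged.
    """
    precisions: List[float] = []
    recalls: List[float] = []
    f2s: List[float] = []
    for qid, relevant in gold.items():
        retrieved = pred.get(qid, set())  # missing prediction counts as empty
        hits = len(relevant & retrieved)
        p = hits / len(retrieved) if retrieved else 0.0
        r = hits / len(relevant) if relevant else 0.0
        precisions.append(p)
        recalls.append(r)
        f2s.append(f_beta(p, r))
    n = len(gold)
    return sum(precisions) / n, sum(recalls) / n, sum(f2s) / n


# Toy data with made-up IDs, for illustration only.
toy_gold = {1: {10, 20}, 2: {30}}
toy_pred = {1: {10}, 2: {30, 40}}
toy_p, toy_r, toy_f2 = macro_scores(toy_gold, toy_pred)
print(round(toy_p, 4), round(toy_r, 4), round(toy_f2, 4))
```

Because beta = 2 favours recall, returning one extra irrelevant article (question 2 in the toy data) costs less than missing a relevant one (question 1).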

Data Format

Training data:

{

"qid": <integer>, # Unique question identifier

"question": "<string>", # Text of the question

"relevant_laws": [<integer>] # List of `aid` values (article IDs) from `legal_corpus.json`

}

Example:

[

{

"qid": 11938,

"question": "Chế độ báo cáo của doanh nghiệp kinh doanh dịch vụ xếp hạng tín nhiệm quy định thế nào?",

"relevant_laws": [

27053,

27071

]

}

]
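
A minimal sketch of reading records in this format, assuming the training data is a JSON list of such objects. The training file name is not stated here, so the example parses an inline string that mirrors the record above; `parse_training` is a hypothetical helper name.

```python
import json
from typing import Dict, Set, Tuple

# Inline sample mirroring the example record; a real run would read the
# released training file instead (its name is not given in this description).
SAMPLE_TRAIN = """
[
  {
    "qid": 11938,
    "question": "Chế độ báo cáo của doanh nghiệp kinh doanh dịch vụ xếp hạng tín nhiệm quy định thế nào?",
    "relevant_laws": [27053, 27071]
  }
]
"""


def parse_training(text: str) -> Tuple[Dict[int, str], Dict[int, Set[int]]]:
    """Return (qid -> question text, qid -> set of relevant article IDs)."""
    questions: Dict[int, str] = {}
    gold: Dict[int, Set[int]] = {}
    for record in json.loads(text):
        qid = int(record["qid"])
        questions[qid] = record["question"]
        gold[qid] = {int(aid) for aid in record["relevant_laws"]}
    return questions, gold


sample_questions, sample_gold = parse_training(SAMPLE_TRAIN)
print(sorted(sample_gold[11938]))
```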

Legal corpus:

{

"id": <integer>, # Unique document identifier

"law_id": "<string>", # Official law number

"content": [ # List of dictionaries, each dictionary contains the article's id (aid), the article's content (content_Article)

{

"aid": <integer>, # Article identifier within this document

"content_Article": "<string>" # Full text of the article

},

...

]

}

Example

[

{

"id": 0,

"law_id": "14/2022/TT-NHNN",

"content": [

{

"aid": 0,

"content_Article": "1. Thông tư này quy định mã số, tiêu chuẩn chuyên môn, nghiệp vụ và xếp lương đối với các ngạch công chức chuyên ngành Ngân hàng.\n\n2. Thông tư này áp dụng đối với công chức làm việc tại các đơn vị thuộc Ngân hàng Nhà nước Việt Nam (gọi tắt là Ngân hàng Nhà nước)."

},

{

"aid": 1,

"content_Article": "1. Kiểm soát viên cao cấp ngân hàng Mã số: 07.044 2. Kiểm soát viên chính ngân hàng Mã số: 07.045 3. Kiểm soát viên ngân hàng Mã số: 07.046 4. Thủ kho, thủ quỹ ngân hàng Mã số: 07.048 5. Nhân viên Tiền tệ - Kho quỹ Mã số: 07.047 "

},

......

]

}

]
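
Since `relevant_laws` refers to articles by `aid`, it can help to flatten the corpus into a lookup from `aid` to article text. The sketch below assumes `aid` values are unique across the whole corpus and raises an error if they are not. It uses an inline sample with placeholder article text rather than the real `legal_corpus.json`, and `flatten_corpus` is a hypothetical helper name.

```python
import json
from typing import Dict, Tuple

# Inline sample following the corpus schema; article text is placeholder only.
SAMPLE_CORPUS = """
[
  {
    "id": 0,
    "law_id": "14/2022/TT-NHNN",
    "content": [
      {"aid": 0, "content_Article": "placeholder text for article 0"},
      {"aid": 1, "content_Article": "placeholder text for article 1"}
    ]
  }
]
"""


def flatten_corpus(text: str) -> Dict[int, Tuple[str, str]]:
    """Map each aid to (law_id, article text).

    Assumption: aid is unique corpus-wide; a duplicate raises ValueError.
    """
    articles: Dict[int, Tuple[str, str]] = {}
    for document in json.loads(text):
        law_id = str(document["law_id"])
        for article in document["content"]:
            aid = int(article["aid"])
            if aid in articles:
                raise ValueError(f"duplicate aid {aid} in corpus")
            articles[aid] = (law_id, article["content_Article"])
    return articles


sample_articles = flatten_corpus(SAMPLE_CORPUS)
print(len(sample_articles), sample_articles[1][0])
```

With the real corpus, the same function could be called on the contents of `legal_corpus.json` read as UTF-8 text.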

Submission guidelines

The results are submitted using the Codabench platform via the following URL: https://www.codabench.org/competitions/9722/

Please note that you MUST sign in with your registered email. To prevent the use of multiple accounts, only accounts with a registered email can access the competition.
Please ensure that your team name on the leaderboard exactly matches the one provided in your application form. Any discrepancy between the two may result in your submission being disqualified or excluded from the final evaluation.
Only one Codabench account per team is approved to submit results.
The final results will not be considered official until a working notes paper with the full description of the methods is submitted.

Contact

If you have any questions, please get in touch with minhnt@jaist.ac.jp

Organizers

Thi-Hai-Yen Vuong - VNU University of Engineering and Technology (VNU-UET) - yenvth@vnu.edu.vn
Ha-Thanh Nguyen - National Institute of Informatics (NII), Japan
Trong-Khoi Dao - VNU University of Law (VNU-UL)
Tan-Minh Nguyen - Japan Advanced Institute of Science and Technology (JAIST) - minhnt@jaist.ac.jp
Hoang-Trung Nguyen - VNU University of Engineering and Technology (VNU-UET)
Hoang-Quynh Le - VNU University of Engineering and Technology (VNU-UET)

Association for Vietnamese Language and Speech Processing
