VLSP 2023 Challenge on Legal Textual Entailment Recognition

Important dates
(Timezone: UTC+7)

Aug 15, 2023: Registration open
Aug 31, 2023: Registration close
Oct 09, 2023: Training data and dev/test data release
Nov 04, 2023: System submission deadline (prediction file + docker)
Nov 26, 2023: Technical report submission
Dec 15-16, 2023: Result announcement - Workshop days

Task description

With the rapid development of AI, especially in natural language processing (NLP) (e.g., large language models such as ChatGPT and Bard), the demand for applying AI to legal text analysis and processing is becoming increasingly critical. Research on NLP for languages such as English, Japanese, and Chinese is well established. In this context, through the VLSP (Vietnamese Language and Speech Processing) workshop, we introduce the first fundamental research task for the Vietnamese language in the legal domain.

This research aims to determine the legal relationship between a legal statement and a legal passage, a capability that is fundamental to Legal AI tasks. In this VLSP event, we will therefore explore applications of NLP, deep learning, and generative AI to detecting the relationship between a long legal passage and a quoted statement.

Recognizing Textual Entailment (RTE) is a fundamental task in Natural Language Understanding. The task is to decide whether the meaning of a text can be inferred from the meaning of another one.

Legal textual entailment is the task of checking whether a given statement is entailed by the relevant legal passage(s).

The task can be described as follows. The input is a set of statements and, for each statement S, a set of legal passages L1, L2, ..., LN. The system must decide whether this set of legal passages entails S.
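To make the input/output contract concrete, the following Python sketch treats the task as a function from a statement S and passages L1, ..., LN to a "Yes"/"No" label. The scoring rule is a deliberately naive lexical-overlap baseline chosen only for illustration. It is not the organizers' method, and it ignores negation such as the trailing "là sai" ("is false") in the sample statement. The names `entails` and `tokenize` and the `threshold` value are hypothetical.

```python
import re
import unicodedata


def tokenize(text: str) -> set[str]:
    """NFC-normalize and lowercase the text, then return its set of word tokens."""
    # NFC keeps Vietnamese letters with diacritics as single characters,
    # so the Unicode-aware \w pattern does not split them apart.
    text = unicodedata.normalize("NFC", text).lower()
    return set(re.findall(r"\w+", text))


def entails(statement: str, passages: list[str], threshold: float = 0.8) -> str:
    """Toy baseline: "Yes" if at least `threshold` of the statement's tokens
    occur somewhere in the passages, otherwise "No".

    Illustrates only the interface (S, [L1..LN]) -> Yes/No; it is not a
    real entailment model and ignores negation and word order.
    """
    statement_tokens = tokenize(statement)
    if not statement_tokens:
        return "No"
    passage_tokens: set[str] = set()
    for passage in passages:
        passage_tokens |= tokenize(passage)
    overlap = len(statement_tokens & passage_tokens) / len(statement_tokens)
    return "Yes" if overlap >= threshold else "No"


# Made-up English strings, used only to show the call shape.
print(entails("The distributor is responsible for the film content.",
              ["The distributor is responsible for the content of the film."]))
# prints Yes
```

A competitive system would replace the overlap rule with a trained model. The function shape stays the same: one statement and its passages go in, and one label comes out.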

Data Format

Data is exchanged in JSON format.

Training Data Format
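```
[
{
"example_id": "DS-101",
"label": "Yes/No",
"statement": "Cơ sở điện ảnh phát hành phim phải chịu trách nhiệm trước pháp luật về nội dung phim phát hành là sai.",
"legal_passages": [
{
"type": "law",
"law_id": "05/2022/QH15",
"article_id": "15"
}
]
}
]
```

The "label" field takes one of two values, "Yes" or "No". Each entry in "legal_passages" identifies a passage by its law ("law_id") and article ("article_id"). In English, the example statement reads: "The claim that a film-distributing establishment must be held legally responsible for the content of the films it distributes is false."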

The test data format is the same as the training data format, with the “label” field removed.
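The Python sketch below reads files in this format. It assumes each file is a UTF-8 JSON array shaped like the training example. "label" is treated as optional so that the same parser also accepts test data. The parser also assumes the "Yes/No" in the example is a placeholder for one of the two values, so it rejects any other label. The names `LegalPassageRef`, `Example`, `parse_examples`, and `load_examples` are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class LegalPassageRef:
    """Reference to a legal passage: passage type, law identifier, article number."""
    type: str
    law_id: str
    article_id: str


@dataclass
class Example:
    example_id: str
    statement: str
    legal_passages: list[LegalPassageRef]
    label: Optional[str] = None  # absent in the test data


def parse_examples(raw: str) -> list[Example]:
    """Parse a JSON array of examples in the training/test format."""
    examples = []
    for item in json.loads(raw):
        label = item.get("label")
        # Assumption: real labels are exactly "Yes" or "No".
        if label is not None and label not in ("Yes", "No"):
            raise ValueError(f"{item['example_id']}: unexpected label {label!r}")
        refs = [
            LegalPassageRef(p["type"], p["law_id"], p["article_id"])
            for p in item["legal_passages"]
        ]
        examples.append(Example(item["example_id"], item["statement"], refs, label))
    return examples


def load_examples(path: str) -> list[Example]:
    """Read and parse a data file from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_examples(f.read())
```

For example, load_examples("train.json") would return one Example per JSON object. The file name here is only illustrative.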

Prediction Format

Participating teams shall submit their prediction files in the following format:

```
[
{
"example_id": "DS-101",
"label": "Yes/No"
}
]
```
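Hand-edited JSON breaks easily. A trailing comma after the last field, for instance, makes the file invalid. It is therefore safer to generate prediction files with a JSON library. The Python sketch below serializes a mapping from example_id to label. It then checks that every expected example_id has a "Yes" or "No" prediction. Requiring exactly one prediction per test example is an assumption. The function names `format_predictions` and `check_predictions` are hypothetical.

```python
import json


def format_predictions(predictions: dict[str, str]) -> str:
    """Serialize {example_id: label} into the prediction-file format."""
    records = [{"example_id": eid, "label": label} for eid, label in predictions.items()]
    # ensure_ascii=False keeps non-ASCII text readable; indent is cosmetic.
    return json.dumps(records, ensure_ascii=False, indent=2)


def check_predictions(prediction_json: str, expected_ids: set[str]) -> list[str]:
    """Return a list of problems found in a prediction file (empty if none)."""
    problems = []
    seen = set()
    for record in json.loads(prediction_json):
        eid = record.get("example_id")
        if record.get("label") not in ("Yes", "No"):
            problems.append(f"{eid}: label must be 'Yes' or 'No'")
        if eid in seen:
            problems.append(f"{eid}: duplicate prediction")
        seen.add(eid)
    for missing in sorted(expected_ids - seen):
        problems.append(f"{missing}: no prediction")
    return problems
```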

Restriction

In the spirit of fostering open research, participating teams may use any resources publicly available to the research community, for example, online law libraries such as vbpl.vn and open-weight LLMs such as Llama-2. However, the use of closed, proprietary services (ChatGPT, GPT-4, etc.) is prohibited. Results obtained in violation of this restriction will be excluded from the team ranking.

Evaluation

The evaluation measure is accuracy: the proportion of test examples whose Yes/No label is predicted correctly. Human evaluation may also be used if necessary.
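Accuracy is the number of correctly predicted labels divided by the number of test examples. The Python sketch below assumes that gold labels and predictions are both keyed by example_id. It also counts an example with no prediction as wrong, which is an assumption: the official scorer's handling of missing predictions is not stated here. The function name `accuracy` and the toy IDs and labels are hypothetical.

```python
def accuracy(gold: dict[str, str], predicted: dict[str, str]) -> float:
    """Fraction of gold examples whose predicted label matches exactly.

    Examples missing from `predicted` are counted as incorrect.
    """
    if not gold:
        raise ValueError("gold labels are empty")
    correct = sum(1 for eid, label in gold.items() if predicted.get(eid) == label)
    return correct / len(gold)


if __name__ == "__main__":
    # Toy labels, not real task data.
    gold = {"DS-101": "Yes", "DS-102": "No", "DS-103": "No", "DS-104": "Yes"}
    pred = {"DS-101": "Yes", "DS-102": "Yes", "DS-103": "No"}
    print(accuracy(gold, pred))  # 2 of 4 correct -> 0.5
```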

Submission

A participating team must complete the following two submission requirements to be considered for team ranking.

System Submission

Each participating team shall submit its prediction file together with its system packaged as a Docker image. When necessary, the organizers will run the following command, with docker-image replaced by the team's image name, to reproduce and confirm the system's predictions:

```
cat input_file.json | docker run -i -a stdin -a stdout -a stderr docker-image > prediction_file.json 2> error.log
```
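The command pipes the test file into the container's standard input. It captures standard output as the prediction file and sends standard error to error.log. A container entrypoint therefore only needs to read the test JSON from stdin and write the prediction JSON to stdout. Any logging should go to stderr so it cannot corrupt the predictions. The Python sketch below assumes that contract. `predict_label` is a hypothetical placeholder that always answers "Yes" and stands in for a team's real model, and `predict_all` and `main` are likewise illustrative names.

```python
import json
import sys


def predict_label(example: dict) -> str:
    """Hypothetical placeholder model: always answers "Yes".

    Replace this with the team's actual entailment system.
    """
    return "Yes"


def predict_all(examples: list[dict]) -> list[dict]:
    """Turn test examples into prediction records."""
    return [
        {"example_id": ex["example_id"], "label": predict_label(ex)}
        for ex in examples
    ]


def main() -> None:
    raw = sys.stdin.read()
    # Treat empty input as an empty test set rather than crashing.
    examples = json.loads(raw) if raw.strip() else []
    predictions = predict_all(examples)
    # Diagnostics go to stderr (error.log) so stdout holds only the predictions.
    print(f"predicted {len(predictions)} examples", file=sys.stderr)
    json.dump(predictions, sys.stdout, ensure_ascii=False)
    sys.stdout.write("\n")


if __name__ == "__main__":
    main()
```

Packaged as the image's ENTRYPOINT, a script like this would make the organizers' command above produce prediction_file.json directly.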

Technical Report Submission

Each participating team shall submit to the workshop a technical report detailing its methodology.

Presentation at the Workshop

The best team will have an oral presentation at the workshop. Other teams will have their posters showcased in poster sessions.

Organizers

Le-Minh Nguyen - JAIST (nguyenml@jaist.ac.jp)
Ha-Thanh Nguyen - NII (nguyenhathanh@nii.ac.jp)
Vu Tran - ISM (vutran@ism.ac.jp)
Nguyen Truong Son - HCMU-US
Nguyen Minh Tien - Hung Yen Technology

Association for Vietnamese Language and Speech Processing
