
Association for Vietnamese Language and Speech Processing

A chapter of VAIP - Vietnam Association for Information Processing

VLSP 2025 challenge on Vietnamese Legal Assistant with Small Language Models

(to be updated)

Important Dates
Motivation

While large language models (LLMs) like ChatGPT and Gemini have demonstrated impressive general capabilities, there remains a strong need for smaller, more efficient, and privacy-compliant models that can run on limited-resource devices, be deployed offline, and be tailored to specific domains.

Legal and administrative tasks in Vietnam involve a mix of formal documents, informal everyday language, and a constantly evolving regulatory environment. Current general-purpose LLMs often struggle with legal accuracy, contextual specificity, and up-to-date coverage of local legal content, which leaves room for optimized small models with domain-specific training.

Task Description

Build a Small Language Model (SLM) that can:

  • Understand and respond to real-life legal and administrative questions in Vietnamese
  • Provide legal guidance based on Vietnamese laws (e.g., civil, labor, land, family, social insurance, and public administration)
  • Retrieve and cite relevant legal documents, articles, or procedural steps
  • Operate in offline or low-resource environments, such as on consumer laptops or embedded devices
Input / Output
  • Input: A legal or administrative question written in Vietnamese.
  • Output:
    • A concise, correct, and helpful response
    • May include references to relevant legal documents, articles, or procedural steps
    • (Optional) Links or excerpts from cited documents or templates
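As an illustration of the input and output described above, the following is a minimal sketch of how a system might represent one question and its answer internally. It is not the official data format, which has not been published; the class names `Citation` and `LegalAnswer` and all field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    """One reference attached to an answer (all fields are hypothetical)."""
    document: str       # identifier or title of the cited legal document
    article: str = ""   # article or procedural step within that document
    excerpt: str = ""   # optional excerpt or link


@dataclass
class LegalAnswer:
    """A question paired with the generated response and optional citations."""
    question: str       # legal or administrative question in Vietnamese
    answer: str         # concise response
    citations: list[Citation] = field(default_factory=list)


# Placeholder values only; no real question or document is implied.
example = LegalAnswer(
    question="<question in Vietnamese>",
    answer="<concise response>",
    citations=[Citation(document="<document id>", article="<article>")],
)
```

Keeping citations as a separate list, rather than embedding them in the answer text, would make it easier to check them independently of the response wording.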
Evaluation
  • The organizers will provide two independent evaluation sets:
    • Automatic evaluation: Compare generated responses to gold-standard answers; check legal citation accuracy and procedural correctness
    • Human evaluation (if feasible): Assess legal accuracy, clarity, fluency, and usefulness of responses
  • Detailed evaluation criteria will be announced later on the official task website.
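The official metrics have not been announced. Purely as an assumption about what "legal citation accuracy" could look like, the sketch below computes set-based precision, recall, and F1 between the articles a system cites and the articles in a gold-standard answer. The function name `citation_scores` and the identifier strings are hypothetical.

```python
def citation_scores(predicted: set[str], gold: set[str]) -> dict[str, float]:
    """Set-based precision, recall, and F1 over cited article identifiers."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical identifiers, invented for illustration only.
gold = {"doc-A/art-12", "doc-A/art-13"}
predicted = {"doc-A/art-12", "doc-B/art-5"}
scores = citation_scores(predicted, gold)
```

Here one of the two predicted citations is correct and one of the two gold citations is found, so precision, recall, and F1 all come out to 0.5.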
Training and Test Data
Technical Requirements
Category | Specification
Model size | ≤ 4 billion parameters (4B)
Inference memory usage | ≤ 4 GB system CPU RAM
Runtime limit | Will be announced later
External APIs | Not allowed (no GPT, Gemini, Claude, or cloud APIs)
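The size and memory limits above interact: a model at the 4B-parameter ceiling does not fit in 4 GB at every numeric precision. The following back-of-the-envelope sketch estimates the memory taken by the weights alone. It treats the 4 GB limit as 4 GiB and ignores activations, the key-value cache, and runtime overhead, all of which are simplifying assumptions rather than rules from the organizers.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


MAX_PARAMS = 4e9      # stated limit: at most 4 billion parameters
RAM_LIMIT_GIB = 4.0   # stated limit of 4 GB, treated here as GiB (assumption)

for bits in (16, 8, 4):
    gib = weight_memory_gib(MAX_PARAMS, bits)
    verdict = "fits" if gib <= RAM_LIMIT_GIB else "exceeds limit"
    print(f"{bits}-bit weights: {gib:.2f} GiB ({verdict})")
```

Under these assumptions a 4B model needs about 7.45 GiB at 16 bits, 3.73 GiB at 8 bits, and 1.86 GiB at 4 bits. The 8-bit case leaves very little headroom for everything other than weights, so a model near the parameter ceiling would likely need lower-bit quantization, or a smaller parameter count, to stay within the memory limit.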
Dataset and Baselines

The organizers will provide:

  • A training dataset, including:
    • A sample of real-life legal and administrative questions from Vietnamese citizens
    • A curated collection of Vietnamese legal documents (laws, decrees, circulars)
  • Baseline models, such as QLLaMA and Sailor

Teams may use additional public datasets, but must clearly declare data sources and comply with licensing terms.

Data Format 
Submission

Submission instructions will be announced later.

Organizers

 

Sponsors and Partners

VinBIGDATA, VinIF, AIMESOFT, bee, Dagoras

zalo, VTCC, VCCorp

IOIT, HUS, USTH, UET, TLU, UIT, INT2, jaist, VIETLEX