
Vietnamese Language and Speech Processing Club

A chapter of the Vietnam Association for Information Processing

VLSP 2025 Challenge on Vietnamese Spoofing-Aware Speaker Verification

Important dates 
  • June 24, 2025: Registration open
  • July 10, 2025: Training data release (Check the Training and Test Data section below)
  • July 15, 2025: Public test release
  • August 8, 2025: Private test release
  • August 12, 2025: System submission deadline
  • August 15, 2025: Private test results release
  • August 30, 2025: Technical report submission
  • September 27, 2025: Notification of acceptance
  • October 3, 2025: Camera-ready deadline
  • October 29-30, 2025: Conference dates
Task Description

The Vietnamese Spoofing-Aware Speaker Verification Challenge (VSASV 2025) aims to accelerate research in speaker verification (SV) and spoof detection for Vietnamese, a language that remains limited in data resources. The challenge features two tracks:

  • Task 1: Spoofing-Aware Speaker Verification

Participants will develop SV systems using the dataset provided by the Organizers. Each trial presents an enrollment audio and a test audio, where the test audio may be either bona fide or spoofed. Systems must verify whether the enrollment and test audio share the same speaker identity while remaining resilient to spoofing attacks; i.e., if the test audio is spoofed, the SV system must reject the trial.

  • Task 2: Vietnamese Spoof Detection

Participants will build systems to classify a single audio sample as either bona fide or spoofed, without speaker identity information.

Both tasks are evaluated primarily using the Equal Error Rate (EER). To simulate real-world conditions, negative trials may include synthetic samples generated from the target speaker’s voice and recordings captured across diverse devices, introducing both spoofing and channel variability. Competitors are encouraged to design robust solutions capable of handling both speaker imposture and advanced spoofing techniques.

The test set may include partially spoofed audio: audio that is originally bona fide but has short segments replaced with synthetic speech.
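
For concreteness, one common (unofficial) recipe for Task 1 is score-level fusion of a speaker-verification score and a spoofing-countermeasure (CM) score, so that a trial only scores high when the voices match and the test audio appears bona fide. The sketch below assumes speaker embeddings and a CM bona fide probability produced by the participant's own models; all names and the fusion weight are hypothetical.

import numpy as np

def cosine_score(enroll_emb, test_emb):
    # Cosine similarity between two speaker embeddings.
    a = enroll_emb / np.linalg.norm(enroll_emb)
    b = test_emb / np.linalg.norm(test_emb)
    return float(np.dot(a, b))

def sasv_score(enroll_emb, test_emb, cm_bonafide_prob, alpha=0.5):
    # Score-level fusion (hypothetical weight alpha, tuned on dev data):
    # the trial scores high only if the speakers match AND the
    # countermeasure judges the test audio to be bona fide.
    sv = cosine_score(enroll_emb, test_emb)
    return alpha * sv + (1.0 - alpha) * cm_bonafide_prob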

Evaluation

The performance of the models will be evaluated by the Equal Error Rate (EER), i.e., the operating point at which the False Acceptance Rate (FAR) equals the False Rejection Rate (FRR).
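
As a reference implementation (not the official scoring script), EER can be computed from trial scores and binary labels by sweeping the decision threshold over all scores and taking the point where FAR and FRR cross:

import numpy as np

def compute_eer(scores, labels):
    # scores: higher means "accept"; labels: 1 = positive (target / bona fide).
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores)                  # sort trials by descending score
    labels = labels[order]
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    far = np.cumsum(labels == 0) / n_neg         # FAR after accepting the top-k trials
    frr = 1.0 - np.cumsum(labels == 1) / n_pos   # FRR after accepting the top-k trials
    k = np.argmin(np.abs(far - frr))             # point where FAR and FRR cross
    return float((far[k] + frr[k]) / 2.0)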

Training and Test Data

Participants will receive a Vietnamese speech dataset from multiple speakers. The challenge emphasizes generalization: the test set will include speakers unseen during training, requiring models to generalize effectively to new voices.

The training dataset has been released at the following link.

Submission

Multiple submissions are allowed, up to a per-phase limit; the evaluation result is based on the submission with the lowest EER.

The submission file comprises a header, the set of test pairs, and the cosine similarity output by the system for each pair. The order of the pairs in the submission file must match the order of the provided pair list. Each line must contain three fields separated by tab characters, in the following format:

enrollment_wav<TAB>test_wav<TAB>score<NEWLINE>

where:

enrollment_wav - The enrollment utterance

test_wav - The test utterance

score - The cosine similarity

For example:

enrollment_wav test_wav score

file1.wav file2.wav 0.81285

file1.wav file3.wav 0.01029

...
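
A minimal sketch for producing a valid submission file, assuming a provided pair list named trials.txt and a hypothetical score_pair() wrapping the participant's own model:

def score_pair(enrollment_wav, test_wav):
    # Placeholder: extract embeddings with your own model and return
    # their cosine similarity.
    return 0.0

with open("trials.txt") as fin, open("submission.txt", "w") as fout:
    fout.write("enrollment_wav\ttest_wav\tscore\n")          # header line
    for line in fin:                                         # keeps the pair order
        enrollment_wav, test_wav = line.split()[:2]
        score = score_pair(enrollment_wav, test_wav)
        fout.write(f"{enrollment_wav}\t{test_wav}\t{score:.5f}\n")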

Basic Regulations
  • The use of pre-trained models is allowed only if their checkpoints are publicly available. This includes models originally trained for other tasks, such as speech recognition, text-to-speech synthesis, speech enhancement, or voice activity detection.
  • Competitors are allowed to utilize non-speech resources (e.g., noise recordings, impulse responses) for data augmentation purposes, but they must clearly declare and share these resources with all other teams.
  • The competition includes both a public and a private test set, with the final rankings determined by performance on the private test set. Teams will be asked to submit their source code for verification and reproduction of the final results, as well as a technical report in the form of a formal 5-page conference paper.
  • A team is eligible for a prize if and only if it satisfies all of the conditions below:
    • The code is submitted to the Organizers.
    • The results are reproduced by the Organizers.
    • The report is submitted and conforms to the format of a formal conference paper.
  • Participating teams may generate additional spoofed data using signal manipulation (e.g., additive noise, speed perturbation, pitch shifting) or other models, but (1) only from the data provided by the Organizers, and (2) all methods for generating additional data must be submitted to the Organizers, together with the generation model and detailed reproduction instructions. A sketch of such manipulations appears after this list.
  • The use of any external speaker or speech datasets (speaker recognition, speech recognition, voice cloning, emotion datasets, etc.) is strictly prohibited.
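
As referenced above, a sketch of the permitted signal manipulations, applied only to audio from the provided dataset. It assumes librosa is available; the file names and parameter values are illustrative.

import numpy as np
import librosa

def add_noise(speech, noise, snr_db):
    # Mix a declared noise recording into speech at a target SNR (dB).
    noise = np.resize(noise, speech.shape)       # tile/truncate noise to match
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

speech, sr = librosa.load("provided_utt.wav", sr=16000)   # from the official data
noise, _ = librosa.load("declared_noise.wav", sr=16000)   # declared non-speech resource

noisy = add_noise(speech, noise, snr_db=10)                       # additive noise
faster = librosa.effects.time_stretch(speech, rate=1.1)           # tempo-based speed perturbation
shifted = librosa.effects.pitch_shift(speech, sr=sr, n_steps=2)   # pitch shifting (+2 semitones)
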
Registration

Participants can register through the link.

Organizers
  • Nguyễn Thị Thu Trang
  • Phương Tuấn Đạt

Hosted by: Hanoi University of Science and Technology

Contact: trangntt@soict.hust.edu.vn

Sponsors and Partners

VinBIGDATA, VinIF, AIMESOFT, bee, Dagoras, zalo, VTCC, VCCorp, IOIT, HUS, USTH, UET, TLU, UIT, INT2, jaist, VIETLEX