
Vietnamese Language and Speech Processing Club

A chapter of the Vietnam Association for Information Processing

VLSP 2025 Challenge on Vietnamese Spoofing-Aware Speaker Verification

Important dates 
  • June 24, 2025: Registration open
  • July 10, 2025: Training data release (Check the Training and Test Data section below)
  • July 15, 2025: Public test release
  • August 8, 2025: Private test release
  • August 12, 2025: System submission deadline
  • August 15, 2025: Private test results release
  • August 30, 2025: Technical report submission
  • September 27, 2025: Notification of acceptance
  • October 3, 2025: Camera-ready deadline
  • October 29-30, 2025: Conference dates
Task Description

The Vietnamese Spoofing-Aware Speaker Verification Challenge (VSASV 2025) aims to accelerate research in speaker verification (SV) and spoof detection for Vietnamese, a language that remains limited in data resources. The challenge features two tracks:

  • Task 1: Spoofing-Aware Speaker Verification

Participants will develop SV systems using the dataset provided by the Organizers. Each trial presents an enrollment audio and a test audio, where the test audio may be either bona fide or spoofed. Systems must verify whether the enrollment and test audio share the same speaker identity while remaining resilient to spoofing attacks; i.e., if the test audio is spoofed, the SV system must reject the trial.

  • Task 2: Vietnamese Spoof Detection

Participants will build systems to classify a single audio sample as either bona fide or spoofed, without speaker identity information.

Both tasks are evaluated primarily using the Equal Error Rate (EER). To simulate real-world conditions, negative trials may include synthetic samples generated from the target speaker’s voice and recordings captured across diverse devices, introducing both spoofing and channel variability. Competitors are encouraged to design robust solutions capable of handling both speaker imposture and advanced spoofing techniques.

The test set may include partially spoofed audio: audio that is originally bona fide but has short segments replaced with synthetic speech.
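
For concreteness, one common (unofficial) recipe for Task 1 is score-level fusion of a speaker-verification score and a spoofing-countermeasure (CM) score, so that a trial only scores high when the voices match and the test audio appears bona fide. The sketch below assumes speaker embeddings and a CM bona fide probability produced by the participant's own models; all names and the fusion weight are hypothetical.

import numpy as np

def cosine_score(enroll_emb, test_emb):
    # Cosine similarity between two speaker embeddings.
    a = enroll_emb / np.linalg.norm(enroll_emb)
    b = test_emb / np.linalg.norm(test_emb)
    return float(np.dot(a, b))

def sasv_score(enroll_emb, test_emb, cm_bonafide_prob, alpha=0.5):
    # Score-level fusion (hypothetical weight alpha, tuned on dev data):
    # the trial scores high only if the speakers match AND the
    # countermeasure judges the test audio to be bona fide.
    sv = cosine_score(enroll_emb, test_emb)
    return alpha * sv + (1.0 - alpha) * cm_bonafide_prob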

Evaluation

The performance of the models will be evaluated by the Equal Error Rate (EER), i.e., the operating point at which the False Acceptance Rate (FAR) equals the False Rejection Rate (FRR).
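
As a reference implementation (not the official scoring script), EER can be computed from trial scores and binary labels by sweeping the decision threshold over all scores and taking the point where FAR and FRR cross:

import numpy as np

def compute_eer(scores, labels):
    # scores: higher means "accept"; labels: 1 = positive (target / bona fide).
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores)                  # sort trials by descending score
    labels = labels[order]
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    far = np.cumsum(labels == 0) / n_neg         # FAR after accepting the top-k trials
    frr = 1.0 - np.cumsum(labels == 1) / n_pos   # FRR after accepting the top-k trials
    k = np.argmin(np.abs(far - frr))             # point where FAR and FRR cross
    return float((far[k] + frr[k]) / 2.0)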

Training and Test Data

Participants will receive a Vietnamese speech dataset from multiple speakers. The challenge emphasizes generalization: the test set will include speakers unseen during training, requiring models to generalize effectively to new voices.

The training dataset has been released at the following link.

Submission

Multiple submissions are allowed, up to a per-phase limit; the evaluation result is based on the submission with the lowest EER.

The submission file comprises a header, the set of test pairs, and the cosine similarity output by the system for each pair. The order of the pairs in the submission file must match the order of the provided pair list. Each line must contain three fields separated by tab characters, in the following format:

enrollment_wav<TAB>test_wav<TAB>score<NEWLINE>

where:

enrollment_wav - The enrollment utterance

test_wav - The test utterance

score - The cosine similarity

For example:

enrollment_wav test_wav score

file1.wav file2.wav 0.81285

file1.wav file3.wav 0.01029

...
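
A minimal sketch for producing a valid submission file, assuming a provided pair list named trials.txt and a hypothetical score_pair() wrapping the participant's own model:

def score_pair(enrollment_wav, test_wav):
    # Placeholder: extract embeddings with your own model and return
    # their cosine similarity.
    return 0.0

with open("trials.txt") as fin, open("submission.txt", "w") as fout:
    fout.write("enrollment_wav\ttest_wav\tscore\n")          # header line
    for line in fin:                                         # keeps the pair order
        enrollment_wav, test_wav = line.split()[:2]
        score = score_pair(enrollment_wav, test_wav)
        fout.write(f"{enrollment_wav}\t{test_wav}\t{score:.5f}\n")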

Basic Regulations
  • The use of pre-trained models is allowed only if their checkpoints are publicly available. This includes models originally trained for other tasks, such as speech recognition, text-to-speech synthesis, speech enhancement, or voice activity detection.
  • Competitors are allowed to utilize non-speech resources (e.g., noise recordings, impulse responses) for data augmentation purposes, but they must clearly declare and share these resources with all other teams.
  • The competition includes both a public and a private test set, with the final rankings determined by performance on the private test set. Teams will be asked to submit their source code for verification and reproduction of the final results, as well as a technical report in the form of a formal 5-page conference paper.
  • A team is eligible for a prize if and only if it satisfies all of the conditions below:
    • The code is submitted to the Organizers.
    • The results are reproduced by the Organizers.
    • The report is submitted and conforms to the format of a formal conference paper.
  • Participating teams may generate additional spoofed data using signal manipulation (e.g., additive noise, speed perturbation, pitch shifting) or other models, but (1) only from the data provided by the Organizers, and (2) all methods for generating additional data must be submitted to the Organizers, together with the generation model and detailed reproduction instructions. A sketch of such manipulations appears after this list.
  • The use of any external speaker or speech datasets (speaker recognition, speech recognition, voice cloning, emotion datasets, etc.) is strictly prohibited.
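
As referenced above, a sketch of the permitted signal manipulations, applied only to audio from the provided dataset. It assumes librosa is available; the file names and parameter values are illustrative.

import numpy as np
import librosa

def add_noise(speech, noise, snr_db):
    # Mix a declared noise recording into speech at a target SNR (dB).
    noise = np.resize(noise, speech.shape)       # tile/truncate noise to match
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

speech, sr = librosa.load("provided_utt.wav", sr=16000)   # from the official data
noise, _ = librosa.load("declared_noise.wav", sr=16000)   # declared non-speech resource

noisy = add_noise(speech, noise, snr_db=10)                       # additive noise
faster = librosa.effects.time_stretch(speech, rate=1.1)           # tempo-based speed perturbation
shifted = librosa.effects.pitch_shift(speech, sr=sr, n_steps=2)   # pitch shifting (+2 semitones)
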
Registration

Participants can register through the link.

Organizers
  • Nguyễn Thị Thu Trang
  • Phương Tuấn Đạt

Hosted by: Hanoi University of Science and Technology

Contact: trangntt@soict.hust.edu.vn

Sponsors and Partners

VinBIGDATA, VinIF, AIMESOFT, bee, Dagoras, zalo, VTCC, VCCorp, IOIT, HUS, USTH, UET, TLU, UIT, INT2, jaist, VIETLEX