VLSP 2025 Speech Quality Assessment

Important dates

June 23, 2025: Registration open

July 1, 2025: Training data, public test release

August 14, 2025: Private test release

August 14, 2025: System submission deadline

August 14, 2025: Private test results release

August 30, 2025: Technical report submission

September 27, 2025: Notification of acceptance

October 3, 2025: Camera-ready deadline

October 29-30, 2025: Conference dates

General Description

With the advancement of information and communication technology, connecting with others via the Internet and telecommunication systems has become effortless. However, speech transmitted over these networks often degrades, diminishing its original quality and potentially leading to annoyance or misunderstandings. Consequently, Speech Quality Assessment (SQA) is crucial for evaluating the performance of communication systems, drawing interest from telephone companies and Internet service providers. In this task, participants will work with a Vietnamese dataset, where each degraded speech sample is assigned a quality score from 1 to 5. The objective is to develop a model that can predict the channel quality scores for given speech samples.

This year, in addition to the data provided by the organizers, teams can utilize external resources like pretrained models and open datasets. Before the competition starts, teams are encouraged to propose external resources. The organizers will review these suggestions and select resources based on criteria such as accuracy, popularity, and size to ensure fairness. During the competition, teams are only permitted to use the resources approved by the organizers.

Dataset

In this competition, teams will receive speech recordings captured over a mobile network, along with quality scores ranging from 1 to 5. We first recorded the original speech using high-quality equipment. Using Nemo Handy software [1], we then placed calls between two mobile phones: the calling phone played back the recorded speech, while the receiving phone stored the transmitted audio. By comparing the transmitted audio with the original, Nemo Handy produced POLQA [2] quality scores for the channel. The speech is stored in .wav format with an 8 kHz sampling rate, and each sample is labeled with a channel score.
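As a quick sanity check on the audio described above, the sketch below reads the sampling rate and duration of a WAV file using only the Python standard library. The names `wav_info`, `make_silent_wav`, and `EXPECTED_RATE_HZ` are hypothetical, and the 16-bit mono layout used for the synthetic demo clip is an assumption; the task description only states the .wav format and the 8 kHz rate.

```python
import io
import wave

# Sampling rate stated in the dataset description.
EXPECTED_RATE_HZ = 8000


def wav_info(source):
    """Return (sample_rate_hz, duration_seconds) for a WAV path string or file object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
    return rate, frames / rate


def make_silent_wav(seconds, rate=EXPECTED_RATE_HZ):
    """Build an in-memory WAV of silence, for demonstration only.

    Assumption: 16-bit mono samples; the real data may use a different layout.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)  # rewind so the buffer can be read back
    return buffer


if __name__ == "__main__":
    rate, duration = wav_info(make_silent_wav(2))
    if rate != EXPECTED_RATE_HZ:
        print(f"unexpected sampling rate: {rate} Hz")
    print(f"{rate} Hz, {duration:.2f} s")
```

With real data, `wav_info` would be called with the path to one of the provided recordings instead of the synthetic buffer.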

Evaluation Metrics

This task uses two common metrics, the Pearson Correlation Coefficient (PCC) and the Mean Squared Error (MSE):

PCC: measures the linear correlation between the predicted and ground-truth scores.
MSE: measures the average squared difference between the predicted and ground-truth scores.

A higher PCC and a lower MSE indicate a better model. The overall evaluation metric therefore combines the two (higher is better):

Final_Score = 0.7 * PCC - 0.3 * MSE
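For local validation, the formula above can be sketched in plain Python as follows. This is not the organizers' scoring script: the function names `pearson`, `mse`, and `final_score` are hypothetical, and details such as rounding or how constant predictions are handled are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Assumption: treat constant input as an error, since PCC is undefined.
        raise ValueError("PCC is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def mse(preds, targets):
    """Mean of the squared differences between predictions and targets."""
    if len(preds) != len(targets) or not targets:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def final_score(preds, targets):
    """Final_Score = 0.7 * PCC - 0.3 * MSE (higher is better)."""
    return 0.7 * pearson(preds, targets) - 0.3 * mse(preds, targets)
```

Because the MSE term is unbounded while PCC lies in [-1, 1], predictions that correlate well but are offset from the true scale can still score poorly, so the output range matters as much as the ranking.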

Submission Format

utterance_name<TAB>SQA_score

Note: utterance_name does not contain the file extension (e.g. .wav)

For example

0001 4.301

0002 2.975

0003 3.002
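A file in the format above could be produced with a small helper like the one below. The function name `format_submission` is hypothetical, and printing three decimal places is an assumption taken from the example rows, not a stated requirement.

```python
import os


def format_submission(predictions):
    """Format (file_name, score) pairs as utterance_name<TAB>SQA_score lines.

    Directory components and the file extension are stripped from file_name.
    Assumption: scores are written with three decimal places.
    """
    lines = []
    for file_name, score in predictions:
        utterance_name = os.path.splitext(os.path.basename(file_name))[0]
        lines.append(f"{utterance_name}\t{score:.3f}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [("0001.wav", 4.301), ("0002.wav", 2.975), ("0003.wav", 3.002)]
    print(format_submission(demo), end="")
```

The returned string can be written directly to the submission file with a single `write` call.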

Contact

Zalo Group: https://zalo.me/g/jrpmsi296

Registration

https://forms.gle/F9tQjCBUBpM52Zcr9

Organizers

Tạ Bảo Thắng, Hanoi University of Science and Technology, tabaothang97@gmail.com
Lê Minh Tú, WorldQuant, minhtutx@gmail.com
Lê Quang Trung, Torilab, trungle.bka@gmail.com
Đỗ Văn Hải, Thuyloi University, haidv@tlu.edu.vn

References

[1] https://www.keysight.com/us/en/assets/7018-05575/flyers/5992-2050.pdf
[2] Beerends, John G., Christian Schmidmer, Jens Berger, Matthias Obermann, Raphael Ullmann, Joachim Pomy, and Michael Keyhl. "Perceptual Objective Listening Quality Assessment (POLQA), the Third Generation ITU-T Standard for End-to-End Speech Quality Measurement Part I—Temporal Alignment." Journal of the Audio Engineering Society 61, no. 6 (2013): 366-384.
[3] G. Mittag, B. Naderi, A. Chehadi, and S. Möller. "NISQA: A Deep CNN-Self-Attention Model for Multidimensional Speech Quality Prediction with Crowdsourced Datasets." In Proc. Interspeech 2021, pp. 2127-2131. 2021.
[4] Le, Minh Tu, Bao Thang Ta, Phi Le Nguyen, and Van Hai Do. "A Gaussian Distribution Labeling Method for Speech Quality Assessment." International Conference on Computational Data and Social Networks. Singapore: Springer Nature Singapore, 2023.
[5] Bao Thang Ta, Minh Tu Le, Van Hai Do, and Huynh Thi Thanh Binh. "Enhancing No-Reference Speech Quality Assessment with Pairwise, Triplet Ranking Losses, and ASR Pretraining." In Proc. Interspeech 2024, pp. 2700-2704. 2024.

Association for Vietnamese Language and Speech Processing
