VLSP 2021 - Vietnamese Automatic Speech Recognition | Association for Vietnamese Language and Speech Processing

Important dates

Aug 5, 2021: Registration opens
Aug 30, 2021: Registration closes
Sept 6, 2021: Dataset building starts
Sept 24, 2021: Dataset building ends
Sept 30, 2021: Training data for the ASR-T1 released
Nov 8, 2021: Test set released for both ASR-T1 and ASR-T2
Nov 10, 2021: Test result submission
Nov 19, 2021: Technical report submission
Nov 26, 2021: Result announcement (workshop day)

General Description

VLSP2021 ASR will feature two evaluation tasks. Teams can participate in one of the tasks or both.

Task-01 (ASR-T1): Focuses on developing a full ASR pipeline from scratch. The organizers will provide two training datasets. The first is around 280 hours of transcribed general-domain data; each participant has to label a part of this dataset before receiving it. The second is around 400 hours of untranscribed in-domain data. Participants must use only the provided data to develop their models, including acoustic and language models. Use of any other resource for model development is not permitted.

Task-02 (ASR-T2): Focuses on spontaneous speech in different real-world scenarios, e.g., meeting conversations and lectures. For this task, the organizers will not provide training data; participants can use any available data source to develop their models, without limitation.

Training Data

For the ASR-T1 task:

Dataset	Size (hours)	With transcription	Domain
Transcribed data	280	Yes	General domain
Untranscribed data	400	No	In-domain

Evaluation Data

Two evaluation sets will be made available: vlsp2021-asr-t1 for task ASR-T1 and vlsp2021-asr-t2 for task ASR-T2.

Evaluation metric

The quality of the models will be evaluated by the Syllable Error Rate (SyER) metric.

SyER = (S+D+I)/N

where

S is the number of substitutions,
D is the number of deletions,
I is the number of insertions,
C is the number of correct syllables,
N is the number of syllables in the reference (N=S+D+C)
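
As an illustration of the formula above, the following Python sketch computes SyER for a single reference/hypothesis pair. It assumes that syllables are separated by whitespace and that S, D and I are taken from a minimum edit distance (Levenshtein) alignment. The function names are hypothetical, and this is not the organizers' official scoring tool; how scores are aggregated across utterances is not specified here.

```python
def syllable_error_counts(reference, hypothesis):
    """Return (S, D, I, C) for one minimum-edit-distance alignment of syllables."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds (errors, S, D, I, C) for aligning the processed
    # reference prefix with hyp[:j]. With an empty reference prefix,
    # every hypothesis syllable is an insertion.
    prev = [(j, 0, 0, j, 0) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        # With an empty hypothesis prefix, every reference syllable is a deletion.
        curr = [(i, 0, i, 0, 0)]
        for j in range(1, len(hyp) + 1):
            e, s, d, ins, c = prev[j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, s, d, ins, c + 1)      # correct syllable
            else:
                diag = (e + 1, s + 1, d, ins, c)  # substitution
            e, s, d, ins, c = prev[j]
            deletion = (e + 1, s, d + 1, ins, c)
            e, s, d, ins, c = curr[j - 1]
            insertion = (e + 1, s, d, ins + 1, c)
            # Keep the candidate with the fewest total errors.
            curr.append(min(diag, deletion, insertion, key=lambda t: t[0]))
        prev = curr
    _, s, d, ins, c = prev[-1]
    return s, d, ins, c


def syer(reference, hypothesis):
    """SyER = (S + D + I) / N, with N = S + D + C."""
    s, d, ins, c = syllable_error_counts(reference, hypothesis)
    n = s + d + c
    if n == 0:
        raise ValueError("reference contains no syllables")
    return (s + d + ins) / n
```

For instance, `syer("tên tôi là nguyễn văn a", "tên tôi là nguyễn a")` evaluates to 1/6, because one of the six reference syllables is deleted. Since insertions count as errors but do not add to N, the value can exceed 1.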

Submission Guidelines

ASR Run Submission Format

Submissions must be UTF-8 encoded and lower-case, with one line per utterance in the following format:

utterance_name recognized_text_sequence

For example

0001.wav chào mừng các bạn đã tham dự cuộc thi
0002.wav tên tôi là nguyễn văn a
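
The format above could be produced with a small helper along the following lines. This is a minimal sketch: the function names are hypothetical, and it assumes that recognized text may be lower-cased with Python's `str.lower()`, that runs of whitespace may be collapsed to single spaces, and that the utterance name should be left unchanged.

```python
def format_submission_line(utterance_name, text):
    """Build one 'utterance_name recognized_text_sequence' line (no newline)."""
    # Lower-case the text and collapse whitespace so the utterance stays on one line.
    normalized = " ".join(text.lower().split())
    return f"{utterance_name} {normalized}"


def render_submission(results):
    """Render (utterance_name, text) pairs as submission text, one line each."""
    return "".join(format_submission_line(name, text) + "\n" for name, text in results)
```

The returned string can then be written to disk with `open(path, "w", encoding="utf-8")` so that the file is UTF-8 encoded regardless of the platform default.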

Output Conventions

Because the same input speech can sometimes be transcribed in more than one way, the following rules apply to reduce ambiguity:

1. Numbers, dates etc. need to be transcribed in words as they are spoken, not in digits.

2. Common acronyms such as nato and fifa are written as one word, without any special markers between the letters. This applies whether they are spoken as one word or spelled out as a letter sequence. All other spelled-out letter sequences are written as individual letters with spaces in between.

3. English words and foreign names of people and places, such as youtube and facebook, are written as is, not in their Vietnamese pronunciation.
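
Some of these conventions can be checked mechanically before submitting. The sketch below is a rough heuristic under stated assumptions, not an official validator: the function name and the warning messages are hypothetical, the regular expressions are only one way to detect digits and letters joined by "." or "-", and rule 3 cannot be verified this way.

```python
import re

# Any digit suggests a number or date that was not spelled out (rule 1).
_DIGIT = re.compile(r"\d")
# Single letters joined by '.' or '-', e.g. "n.a.t.o" or "f-i-f-a" (rule 2).
_MARKED_LETTERS = re.compile(r"\b[^\W\d_](?:[.\-][^\W\d_])+\b")


def convention_warnings(text):
    """Return a list of human-readable warnings for one recognized text sequence."""
    warnings = []
    if _DIGIT.search(text):
        warnings.append("contains digits; spell numbers and dates out in words")
    if text != text.lower():
        warnings.append("contains upper-case characters; submissions are lower-case")
    if _MARKED_LETTERS.search(text):
        warnings.append("letters joined by '.' or '-'; remove markers between letters")
    return warnings
```

An empty list means that none of these heuristics fired; it does not guarantee that the transcript follows every rule.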
