The VLSP 2022 evaluation campaign comprises seven shared tasks in text and speech processing.
Text processing
- Vietnamese Constituency Parsing: Assigning a constituency structure to a Vietnamese sentence (datasets released by the Association for VLSP)
- Machine Translation: Chinese - Vietnamese and/or Vietnamese - Chinese machine translation (datasets released by VNU-UET)
- Multilingual Visual Question Answering: EVJVQA Challenge - multilingual English-Vietnamese-Japanese Visual Question Answering (corpus released by VNUHCM-UIT)
- Vietnamese Abstractive Multi-document Summarization: AbMusu Challenge - generating an abstractive summary of multiple input documents
Speech processing
- Automatic Speech Recognition for Vietnamese: Automatic speech recognition for conversational speech (datasets released by Thuyloi University)
- Vietnamese Text-To-Speech: Emotional speech synthesis (datasets released by HUST)
- Speaker Verification: Multilingual SV challenge held jointly with O-COCOSDA 2022
VLSP shared tasks aim to promote the most efficient methods for these important tools. Organizing these campaigns with sponsorship from academia and industry makes it possible to build gold-standard datasets and offer them to the VLSP community for training and testing Vietnamese text and speech processing systems.
Participants in all speech shared tasks this year must contribute to building the dataset before receiving it. For a small part of the dataset, the main task is to transcribe the audio, correct an existing transcription, or verify that recordings share the same speaker identity.
Participants in the evaluation campaign will be asked to present their systems in a dedicated paper.