VLSP 2021 - Vietnamese Machine Reading Comprehension

Task Description

Machine Reading Comprehension (MRC) has recently emerged as an area of computational linguistics (CL) in which automatic systems are developed to find correct answers to questions posed in natural language, given documents that contain those answers. The Vietnamese Machine Reading Comprehension task is extraction-based MRC on texts drawn from Vietnamese Wikipedia. Following SQuAD [1, 2], we developed the Vietnamese Question Answering Dataset (UIT-ViQuAD), a reading comprehension dataset of questions posed by crowd-workers on a set of Vietnamese Wikipedia articles, where the answer to every question is a span of text from the corresponding reading passage, or the question is unanswerable.

UIT-ViQuAD2.0 combines the 23K questions in UIT-ViQuAD1.0 [3] with over 12K unanswerable questions written adversarially by crowd-workers to look similar to answerable ones. To do well on UIT-ViQuAD2.0, MRC systems must not only answer questions when possible but also determine when no answer is supported by the context and abstain from answering. In this task, participating teams use UIT-ViQuAD2.0 to evaluate machine reading comprehension models.

UIT-ViQuAD1.0, the previous version of the UIT-ViQuAD dataset [3], contains 23K+ question-answer pairs on 170+ articles.

Dataset Information

We provide participating teams with UIT-ViQuAD2.0, which consists of over 35K questions. The dataset is stored in .json format. Below are two example questions taken from the dataset.

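Context: Khác với nhiều ngôn ngữ Ấn-Âu khác, tiếng Anh đã gần như loại bỏ hệ thống biến tố dựa trên cách để thay bằng cấu trúc phân tích. Đại từ nhân xưng duy trì hệ thống cách hoàn chỉnh hơn những lớp từ khác. Tiếng Anh có bảy lớp từ chính: động từ, danh từ, tính từ, trạng từ, hạn định từ (tức mạo từ), giới từ, và liên từ. Có thể tách đại từ khỏi danh từ, và thêm vào thán từ.
(English translation: Unlike many other Indo-European languages, English has almost entirely abandoned the case-based inflectional system in favor of analytic constructions. Personal pronouns retain a more complete case system than other word classes. English has seven main word classes: verbs, nouns, adjectives, adverbs, determiners (i.e., articles), prepositions, and conjunctions. Pronouns can be separated from nouns, and interjections can be added.)

question: Tiếng Anh có bao nhiêu loại từ? (How many word classes does English have?)
is_impossible: False. // An answer to the question exists in the Context.
answer: bảy. (seven)
question: Ngôn ngữ Ấn-Âu có bao nhiêu loại từ? (How many word classes do Indo-European languages have?)
is_impossible: True. // No correct answer can be extracted from the Context.
plausible_answer: bảy. (seven) // A plausible but incorrect answer from the Context, of the same type the question asks for.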
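
The description above only states that the data is stored in .json format. The sketch below assumes the files follow the SQuAD2.0 layout (data → paragraphs → qas, with is_impossible, answers and plausible_answers fields), which matches the field names in the example; the exact schema, the id values, and the file path are assumptions, so check them against the files you actually receive.

```python
import json
from typing import Iterator, NamedTuple


class Question(NamedTuple):
    qid: str
    context: str
    question: str
    is_impossible: bool
    answers: list[str]            # gold answer texts (empty if unanswerable)
    plausible_answers: list[str]  # plausible but incorrect spans (unanswerable only)


def iter_questions(dataset: dict) -> Iterator[Question]:
    """Yield one Question per entry of an (assumed) SQuAD2.0-style dataset dict."""
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                yield Question(
                    qid=qa["id"],
                    context=context,
                    question=qa["question"],
                    is_impossible=qa.get("is_impossible", False),
                    answers=[a["text"] for a in qa.get("answers", [])],
                    plausible_answers=[a["text"] for a in qa.get("plausible_answers", [])],
                )


def load_questions(path: str) -> list[Question]:
    """Read a JSON file from disk; `path` is a placeholder for the file you receive."""
    with open(path, encoding="utf-8") as f:
        return list(iter_questions(json.load(f)))


# Hypothetical in-memory sample mirroring the two example questions above
# (ids are made up; answer_start offsets are omitted for brevity).
SAMPLE = {
    "data": [{
        "paragraphs": [{
            "context": "Tiếng Anh có bảy lớp từ chính: động từ, danh từ, tính từ, "
                       "trạng từ, hạn định từ (tức mạo từ), giới từ, và liên từ.",
            "qas": [
                {"id": "q1", "question": "Tiếng Anh có bao nhiêu loại từ?",
                 "is_impossible": False, "answers": [{"text": "bảy"}]},
                {"id": "q2", "question": "Ngôn ngữ Ấn-Âu có bao nhiêu loại từ?",
                 "is_impossible": True, "answers": [],
                 "plausible_answers": [{"text": "bảy"}]},
            ],
        }],
    }],
}

for q in iter_questions(SAMPLE):
    print(q.qid, q.is_impossible, q.answers or q.plausible_answers)
```

Keeping answers and plausible_answers as separate fields makes it harder to accidentally train or score on a plausible answer as if it were gold.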

Note: All data will be sent to participating teams via email.

Evaluation Metrics

Following SQuAD2.0 [2], we use Exact Match (EM) and F1-score as the evaluation metrics for Vietnamese machine reading comprehension:

  •  Exact Match (EM): For each question-answer pair, EM = 1 if the characters of the MRC system's predicted answer exactly match the characters of (one of) the gold standard answer(s), and EM = 0 otherwise. EM is a stringent all-or-nothing metric: being off by a single character yields a score of 0. For an unanswerable (negative) question, if the system predicts any text span as an answer, it automatically receives a score of 0 for that question.

  • F1-score: F1-score is a widely used metric in natural language processing, including machine reading comprehension. It is computed over the individual tokens of the predicted answer against those of the gold standard answer, based on the number of tokens the two share:

    Precision=(the number of matched tokens)/(the total number of tokens in the predicted answer)

    Recall=(the number of matched tokens)/(the total number of tokens in the gold standard answer)

    F1-score=(2*Precision*Recall)/(Precision+Recall)
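
The two metrics above can be sketched in Python as follows. This is an illustration, not the official VLSP scorer: the normalize step (lowercasing, dropping ASCII punctuation, collapsing whitespace), whitespace tokenization, and taking the maximum F1 over several gold answers are assumptions borrowed from common SQuAD-style evaluation practice; a strictly character-level reading of EM would skip normalization. An unanswerable question is represented by an empty gold list, and abstaining by an empty prediction.

```python
import string
from collections import Counter


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop ASCII punctuation, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> float:
    """EM = 1.0 if the prediction matches one of the gold answers, else 0.0."""
    if not gold_answers:
        # Unanswerable question: only an empty prediction (abstaining) scores 1.
        return float(normalize(prediction) == "")
    return float(any(normalize(prediction) == normalize(g) for g in gold_answers))


def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between one prediction and one gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty -> 1.0; exactly one empty -> 0.0.
        return float(pred_tokens == gold_tokens)
    # Matched tokens = size of the multiset intersection of the two token lists.
    matched = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if matched == 0:
        return 0.0
    precision = matched / len(pred_tokens)
    recall = matched / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def f1_score(prediction: str, gold_answers: list[str]) -> float:
    """F1 against several gold answers: keep the best match (assumed, as in SQuAD)."""
    if not gold_answers:
        # Same all-or-nothing rule as EM for unanswerable questions.
        return float(normalize(prediction) == "")
    return max(token_f1(prediction, g) for g in gold_answers)


print(exact_match("bảy lớp từ", ["bảy"]))        # 0.0
print(f1_score("bảy lớp từ", ["bảy"]))           # 0.5
print(exact_match("", []), f1_score("bảy", []))  # 1.0 0.0
```

For example, predicting "bảy lớp từ" against the gold answer "bảy" gives EM = 0, while F1 gives partial credit: one matched token, so Precision = 1/3, Recall = 1/1, and F1 = 2 * (1/3) * 1 / (1/3 + 1) = 0.5.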

The final ranking is determined on the test set by F1-score, with EM used as a secondary metric to break ties.
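
The ranking rule can be expressed as a sort on the pair (F1, EM). The sketch assumes, as is conventional for SQuAD2.0, that a team's test-set scores are its per-question scores averaged and reported as percentages; the team names and scores in the usage lines are made up purely for illustration.

```python
from typing import NamedTuple


class TeamScore(NamedTuple):
    team: str
    f1: float  # mean F1 over test questions, in percent
    em: float  # mean EM over test questions, in percent


def aggregate(team: str, per_question: list[tuple[float, float]]) -> TeamScore:
    """Average per-question (EM, F1) pairs into percentages (assumed aggregation)."""
    n = len(per_question)
    em = 100.0 * sum(e for e, _ in per_question) / n
    f1 = 100.0 * sum(f for _, f in per_question) / n
    return TeamScore(team, f1, em)


def rank(scores: list[TeamScore]) -> list[TeamScore]:
    """Sort by F1 (descending); EM decides the order when F1 is tied."""
    return sorted(scores, key=lambda s: (s.f1, s.em), reverse=True)


# Made-up per-question (EM, F1) scores for three hypothetical teams.
leaderboard = rank([
    aggregate("team_a", [(1.0, 1.0), (0.0, 0.5)]),    # F1 75.0, EM 50.0
    aggregate("team_b", [(0.0, 0.75), (0.0, 0.75)]),  # F1 75.0, EM 0.0
    aggregate("team_c", [(1.0, 1.0), (1.0, 1.0)]),    # F1 100.0, EM 100.0
])
for position, s in enumerate(leaderboard, start=1):
    print(position, s.team, f"F1={s.f1:.2f}", f"EM={s.em:.2f}")
```

Sorting on the tuple (F1, EM) implements the tie-break directly: Python compares tuples element by element, so EM is only consulted when two F1 values are equal, which places team_a above team_b here.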

Baseline System

We provide a simple baseline system that is based on mBERT (Vietnamese, Base).

Submission System

All phases of the competition are run on the submission system at https://aihub.vn/competitions/35

Important Dates

  • August 5 - September 30, 2021: Shared-task registration.
  • October 1, 2021: Trial data.
  • October 5, 2021: Public test.
  • October 25, 2021: Private test.
  • October 27, 2021: Competition ends.
  • November 10, 2021: Paper submission due. The top 3 teams are required to submit a paper to VLSP 2021 to have their results acknowledged. If a top team does not submit a paper, the next-ranked team may submit one and take its place. Accepted papers will be published in the VLSP 2021 proceedings on ACL Anthology.
  • November 15, 2021: Notification of acceptance.
  • November 20, 2021: Camera-ready due.
  • November 26, 2021: Presentation at the VLSP 2021 Shared Task.

Contact Us

Please feel free to contact us if you need any further information: kietnv@uit.edu.vn and sonlt@uit.edu.vn.

References

[1] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. "SQuAD: 100,000+ Questions for Machine Comprehension of Text." Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016.

[2] Pranav Rajpurkar, Robin Jia, and Percy Liang. "Know What You Don’t Know: Unanswerable Questions for SQuAD." Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2018.

[3] Kiet Van Nguyen, Duc-Vu Nguyen, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen. "A Vietnamese Dataset for Evaluating Machine Reading Comprehension." Proceedings of the 28th International Conference on Computational Linguistics. 2020.