VLSP 2021 - Vietnamese and English-Vietnamese Textual Entailment

Introduction

Natural language inference (NLI) is the task of determining whether a “hypothesis” is true (entailment), false (contradiction), or undetermined (neutral) given a “premise”.

This challenge aims to determine, for a given pair of sentences, whether the two sentences semantically agree, disagree, or are neutral/irrelevant to each other. Here, the sentences are in English or Vietnamese and may not be in the same language. The task is important for identifying, from large online information sources, evidence that supports or refutes a statement. The identification of such evidence is in turn useful for many information-tracking applications, such as opinion mining, brand and reputation management, and particularly the fight against fake news. Through this challenge, we would like to give participants who are interested in the problem an opportunity to contribute their knowledge to improving existing techniques and methods for the task, and thereby enhance the effectiveness of those applications.

Important dates

  • October 1, 2021: Training dataset available

  • October 1 - November 8, 2021: Challenge time

  • November 9, 2021: Testing dataset available

  • November 10, 2021: Submit results on testing dataset

  • November 15, 2021: Result notification

  • November 30, 2021: Technical report submission

  • December 10, 2021: Notification of acceptance

  • December 15, 2021: Camera-ready due

  • December 18, 2021: VLSP 2021 Workshop

Data Format

Each instance includes 6 main attributes, as follows (a sample instance is shown after the list):

  • id: unique id for the sentence pair
  • lang_1: language of the first sentence, either ‘vi’ or ‘en’ for Vietnamese or English respectively
  • lang_2: language of the second sentence, either ‘vi’ or ‘en’ for Vietnamese or English respectively
  • sentence_1: the first sentence
  • sentence_2: the second sentence
  • label: a manually annotated label which marks the entailment relationship of the two sentences
    • agree: If the two sentences semantically agree with each other
    • disagree: If the two sentences semantically disagree with each other
    • neutral: If the two sentences are semantically neutral or irrelevant to each other
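
For concreteness, the snippet below constructs and prints one such instance as JSON; the id and the sentence texts are invented for illustration and are not taken from the actual dataset.

    import json

    # A hypothetical training instance; the id and sentences are invented
    # examples, not taken from the real VLSP 2021 dataset.
    instance = {
        "id": "train_0001",
        "lang_1": "en",
        "lang_2": "vi",
        "sentence_1": "The company reported record profits this quarter.",
        "sentence_2": "Công ty báo cáo lợi nhuận kỷ lục trong quý này.",
        "label": "agree",
    }

    print(json.dumps(instance, ensure_ascii=False, indent=2))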

Evaluation methods

Evaluation data: The test set is a JSON file containing a list of instances. Each instance includes 3 attributes:

  • id: unique id for the TEST sentence pair
  • sentence_1: the first sentence
  • sentence_2: the second sentence

Note that the two sentences of a pair may not be in the same language, so there are two types of sentence pairs (a loading sketch follows the list):

  • vi - vi
  • en - vi
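
A minimal loading sketch, assuming the test set is stored at the hypothetical path test.json (not an official file name):

    import json

    # Load the test file (the path "test.json" is a placeholder).
    with open("test.json", encoding="utf-8") as f:
        test_instances = json.load(f)

    # Each test instance carries only id, sentence_1, and sentence_2;
    # the gold label is withheld during the challenge.
    for inst in test_instances:
        print(inst["id"], inst["sentence_1"], inst["sentence_2"])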

Result submission: the submission is a JSON file containing a list of instances (a sketch for producing such a file follows the list). Each instance includes 2 attributes:

  • id: unique id for the TEST sentence pair
  • label: the predicted label, one of:
    • agree
    • disagree
    • neutral
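
Building on the loading sketch above, and assuming a hypothetical predict() function standing in for a real NLI model, a valid submission file could be produced as follows (submission.json is likewise a placeholder name):

    import json

    def predict(sentence_1, sentence_2):
        # Stand-in for an actual NLI model; always answering "neutral"
        # merely keeps the sketch runnable.
        return "neutral"

    with open("test.json", encoding="utf-8") as f:
        test_instances = json.load(f)

    submission = [
        {"id": inst["id"],
         "label": predict(inst["sentence_1"], inst["sentence_2"])}
        for inst in test_instances
    ]

    with open("submission.json", "w", encoding="utf-8") as f:
        json.dump(submission, f, ensure_ascii=False, indent=2)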

The performance of NLI systems will be evaluated by the F1 score for each label type (see the computation sketch after the definitions below):

                   F1 = 2 * P * R / (P + R)

in which P (Precision) and R (Recall) are determined as follows:

                   P = SP-true/SP-sys

                   R = SP-true/SP-ref

where:

  • SP-ref: the number of sentence pairs (SPs) with the given label in the gold data
  • SP-sys: the number of SPs to which the system assigned that label
  • SP-true: the number of SPs whose label is correctly predicted by the system
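
To make these definitions concrete, the sketch below computes per-label precision, recall, and F1 from aligned lists of gold and predicted labels; the function name and the toy data are ours, not part of any official evaluation script.

    def per_label_f1(gold, pred, labels=("agree", "disagree", "neutral")):
        """Compute precision, recall, and F1 for each label.

        gold, pred: equal-length lists of labels, aligned by sentence pair.
        """
        scores = {}
        for label in labels:
            sp_ref = sum(g == label for g in gold)   # SPs with this label in gold data
            sp_sys = sum(p == label for p in pred)   # SPs given this label by the system
            sp_true = sum(g == p == label            # SPs predicted correctly
                          for g, p in zip(gold, pred))
            p = sp_true / sp_sys if sp_sys else 0.0
            r = sp_true / sp_ref if sp_ref else 0.0
            f1 = 2 * p * r / (p + r) if p + r else 0.0
            scores[label] = {"P": p, "R": r, "F1": f1}
        return scores

    # Toy example (labels invented for illustration):
    gold = ["agree", "disagree", "neutral", "agree"]
    pred = ["agree", "neutral", "neutral", "disagree"]
    print(per_label_f1(gold, pred))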