Vietnamese Natural Language Generation

Important dates

  • Sep 10, 2020: Registration open

  • Sep 30, 2020: Registration closed

  • Oct 01, 2020: Sample input & output for the NLG task released

  • Dec 07, 2020: Test input released; test results due 1 hour after release

  • Dec 15, 2020: Technical report submission

  • Dec 18, 2020: Result announcement (workshop day)

NLG teams are free to choose any technology and are encouraged to explain their methods or share their work, so that the VLSP community can study and reproduce similar results.

The organizer will provide a sample input and a sample expected output. 

Sample input: a topic title in Vietnamese and a list of open-access online documents in English and Vietnamese related to the given topic (scientific papers, established online news, …).
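As a rough illustration, the input described above could be modelled as a small data structure. This is a sketch under stated assumptions: the class and field names (`SourceDocument`, `TaskInput`, `url`, `lang`, `kind`) are hypothetical, the example topic and URLs are placeholders, and the organizers' actual input file format is not specified in this description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SourceDocument:
    # Hypothetical fields: the task only says the documents are
    # open-access online sources in English or Vietnamese.
    url: str
    lang: str  # "en" or "vi"
    kind: str = "news"  # e.g. "paper" or "news" (assumed labels)


@dataclass
class TaskInput:
    topic_title: str  # topic title in Vietnamese
    documents: List[SourceDocument] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject languages other than the two the task mentions.
        for doc in self.documents:
            if doc.lang not in ("en", "vi"):
                raise ValueError(f"unsupported language: {doc.lang}")

    def by_language(self, lang: str) -> List[SourceDocument]:
        # Return the source documents written in the given language.
        return [d for d in self.documents if d.lang == lang]


# Placeholder example: topic and URLs are illustrative only.
example = TaskInput(
    topic_title="Trí tuệ nhân tạo",
    documents=[
        SourceDocument(url="https://example.com/paper.pdf", lang="en", kind="paper"),
        SourceDocument(url="https://example.com/bai-viet", lang="vi"),
    ],
)
```

Splitting documents by language is one plausible first step, since the sources mix English and Vietnamese while the output must be in Vietnamese.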

Sample output: encyclopedic content in Vietnamese summarizing knowledge about the topic, in Wikipedia source code (wikitext) format. The output contains several sections: an introduction section with several main summary points, followed by sections that expand on each summary point mentioned in the introduction, and finally a reference section. Each sentence in the sample output is followed by an inline citation.
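The output structure above (introduction, expanding sections, reference section, one inline citation per sentence) can be sketched as a small wikitext assembler. This is not the organizers' sample output: it assumes MediaWiki Cite syntax (`<ref>` tags collected by `<references />`), assumes "Tham khảo" as the reference heading used on Vietnamese Wikipedia, and the function name `build_wikitext` and its arguments are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of a cited sentence: (text, source key).
Sentence = Tuple[str, str]


def build_wikitext(
    intro: List[Sentence],
    sections: List[Tuple[str, List[Sentence]]],
    sources: Dict[str, str],
    ref_heading: str = "Tham khảo",  # assumed reference-section heading
) -> str:
    """Assemble intro, body sections and a reference section as wikitext.

    Every sentence gets an inline <ref> citation: the first use of a
    source carries its URL, later uses reuse the named reference.
    """
    used = set()

    def cite(text: str, key: str) -> str:
        if key not in sources:
            raise KeyError(f"sentence cites unknown source: {key}")
        if key in used:
            return f'{text}<ref name="{key}" />'
        used.add(key)
        return f'{text}<ref name="{key}">{sources[key]}</ref>'

    def paragraph(sentences: List[Sentence]) -> str:
        return " ".join(cite(text, key) for text, key in sentences)

    parts = [paragraph(intro)]
    for heading, body in sections:
        parts.append(f"== {heading} ==\n{paragraph(body)}")
    # <references /> renders all citations collected from the <ref> tags.
    parts.append(f"== {ref_heading} ==\n<references />")
    return "\n\n".join(parts)
```

Rejecting sentences whose source key is unknown is a design choice that mirrors the requirement that every sentence carries a citation to an identifiable source.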

Teams will develop systems that take the given input and generate the expected output. In the evaluation phase, each team receives a new input (on a similar topic and in a similar format) and must submit the output generated by its system within 1 hour. Outputs are evaluated by human referees according to the guidelines in the Evaluation Metric section below. The final grade of an output is the average of the grades given by all referees.
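A team could check locally whether a submission falls inside the 1-hour window with a few lines of date arithmetic. This is a sketch only: the release time in the usage below is a made-up placeholder, and treating a submission at exactly 60 minutes as on time is an assumption, since the boundary rule is not stated.

```python
from datetime import datetime, timedelta, timezone

# Submission window stated in the task description: 1 hour.
WINDOW = timedelta(hours=1)


def is_on_time(released_at: datetime, submitted_at: datetime,
               window: timedelta = WINDOW) -> bool:
    """Return True if the submission falls within the window after release.

    Assumption: the boundary is inclusive (exactly 60 minutes counts).
    """
    if submitted_at < released_at:
        raise ValueError("submission precedes test input release")
    return submitted_at - released_at <= window


# Placeholder release time on the test day; the actual hour is not given.
release = datetime(2020, 12, 7, 9, 0, tzinfo=timezone.utc)
```

Using timezone-aware datetimes avoids off-by-hours errors when the organizers' clock and a team's clock use different time zones.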

Submission Guidelines

Multiple run submissions are allowed, but participants must explicitly mark one run as PRIMARY. All other runs are treated as CONTRASTIVE runs. If none of the runs is marked as PRIMARY, the latest submission (according to the file time-stamp) for the respective track is used as the PRIMARY run.
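The PRIMARY-run rule above can be expressed as a short selection function over the runs submitted for one track. The `Run` type and `select_primary` name are hypothetical, and raising an error when several runs are marked PRIMARY is an assumption: the guidelines ask for exactly one but do not say how the organizers resolve multiple marks.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Run:
    name: str
    timestamp: datetime  # file time-stamp of the submission
    primary: bool = False


def select_primary(runs: List[Run]) -> Run:
    """Pick the PRIMARY run among the runs submitted for one track."""
    if not runs:
        raise ValueError("no runs submitted")
    marked = [r for r in runs if r.primary]
    if len(marked) > 1:
        # Assumption: several PRIMARY marks are treated as an error here.
        raise ValueError("more than one run marked PRIMARY")
    if marked:
        return marked[0]
    # Fallback from the guidelines: the latest submission by time-stamp.
    return max(runs, key=lambda r: r.timestamp)
```

All runs not returned by this function would count as CONTRASTIVE.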

Runs must be submitted as a gzipped TAR archive and uploaded to an assigned folder. Participants will receive the folder URL after registration.
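One way to produce the gzipped TAR archive is Python's standard `tarfile` module. The directory layout and archive name here are hypothetical; the guidelines do not prescribe how files should be organized inside the archive.

```python
import tarfile
from pathlib import Path


def package_runs(run_dir: str, archive_path: str) -> str:
    """Bundle every file under run_dir into a gzipped TAR archive."""
    src = Path(run_dir)
    if not src.is_dir():
        raise NotADirectoryError(run_dir)
    # Mode "w:gz" writes a gzip-compressed tar file.
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the run directory's own name,
        # so absolute local paths do not leak into the archive.
        tar.add(src, arcname=src.name)
    return archive_path
```

For example, `package_runs("team_runs", "team_runs.tar.gz")` would archive a local `team_runs/` folder (a placeholder name) recursively.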

Evaluation Metric

Each criterion below is listed with its maximum points and its evaluation method, grouped by criteria group.

Well written

  • Each sentence is understandable, clear and concise, with correct spelling and grammar. Maximum: 2 points. Score: number of good sentences / total number of sentences × 2.

  • The output follows the required structure: an introduction section with several main summary points, followed by sections that expand on each summary point mentioned in the introduction, and finally a reference section. Maximum: 2 points. Score: number of correct sections / total number of sections × 2.

Verifiable

  • Each sentence in the output is followed by an inline citation. Any cited sources outside the list given in the input must be relevant, verifiable and reliable. Maximum: 1 point. Score: number of good sentences / total number of sentences.

  • The output contains no copyright violations or plagiarism. Maximum: 1 point. Score: number of good sentences / total number of sentences.

Coverage

  • The output focuses on the topic, with no irrelevant content and no important point from the sources missing. Maximum: 2 points. Score: number of good sentences / total number of sentences × 2.

  • The output is written in summary style, with no unnecessary detail. Maximum: 1 point. Score: number of good sentences / total number of sentences.

  • The output is neutral and balances the weight of the different viewpoints in the sources on the topic, with no editorial bias. Maximum: 1 point. Score: number of good sentences / total number of sentences.
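The scoring arithmetic in the table (good items / total items × maximum points, summed over criteria, then averaged across referees) can be sketched as follows. Only the arithmetic is shown: which sentences or sections count as "good" is a human referee judgment. The criterion keys are hypothetical labels; the point values are taken from the table and add up to a maximum of 10 points.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Maximum points per criterion, from the Evaluation Metric table.
# The keys are hypothetical labels for the seven criteria.
MAX_POINTS: Dict[str, int] = {
    "well_written_sentences": 2,
    "structure_sections": 2,
    "citations": 1,
    "no_plagiarism": 1,
    "coverage": 2,
    "summary_style": 1,
    "neutrality": 1,
}


def criterion_score(good: int, total: int, max_points: int) -> float:
    """good / total * max points, as in the evaluation method."""
    if total <= 0 or not 0 <= good <= total:
        raise ValueError("need 0 <= good <= total and total > 0")
    return good / total * max_points


def referee_score(counts: Dict[str, Tuple[int, int]]) -> float:
    """Sum the criterion scores from one referee's (good, total) counts."""
    if set(counts) != set(MAX_POINTS):
        raise ValueError("counts must cover exactly the seven criteria")
    return sum(criterion_score(g, t, MAX_POINTS[k]) for k, (g, t) in counts.items())


def final_score(per_referee: List[Dict[str, Tuple[int, int]]]) -> float:
    """Average of all referees' totals, as stated in the task description."""
    return mean(referee_score(c) for c in per_referee)
```

For instance, a referee who judges 9 of 10 sentences well written and 5 of 10 neutral, with every other item good, would give 1.8 + 2 + 1 + 1 + 2 + 1 + 0.5 = 9.3 points.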