Detailed annotation guidelines (in Vietnamese)
1. Introduction
With the development of technology and the Internet, social media such as social networks and forums allow people not only to share information but also to express their opinions and attitudes on products, services, and social issues. The Internet has become a valuable source of such information: people use it to guide their decisions when buying a product or using a service. This information also gives manufacturers and service providers feedback on the limitations of their products, so that they can improve them to better meet their customers' needs. It can likewise help authorities understand residents' attitudes and opinions on social events and make appropriate adjustments.
Since the early 2000s, opinion mining and sentiment analysis have been an active research topic in natural language processing and data mining. Tasks in this field include:
- Subjectivity classification: detect whether a document contains personal opinions or only states facts.
- Polarity classification (sentiment classification): classify the opinion expressed in a document as “positive”, “negative”, or “neutral”.
- Spam detection: detect fake reviews and reviewers.
- Rating: express the personal opinion in a document as a rating from 1 to 5 stars (very negative to very positive).
Recently, Aspect Based Sentiment Analysis (ABSA), the task of mining and summarizing opinions in text about specific entities and their aspects, has attracted growing research interest. ABSA for English and other languages (but not Vietnamese) was introduced as a SemEval task in 2014 (SE-ABSA14), 2015 (SE-ABSA15), and 2016 (SE-ABSA16). These tasks provide evaluation frameworks and benchmark datasets of reviews annotated with opinion target expressions and sentiment polarities.
The first related campaign for Vietnamese sentiment analysis was organized at VLSP 2016 and focused only on polarity classification. Its dataset consisted of short reviews, each annotated with one of three labels: “positive”, “negative”, or “neutral”.
In this task for Vietnamese, we address the ABSA problem: given a review, determine the aspects it discusses and the sentiment polarity of each. We call this task VABSA (Vietnamese Aspect Based Sentiment Analysis).
2. Task Description
This task is similar to SemEval 2016 ABSA subtask 2. Given a set of customer reviews about a target entity (e.g. a hotel or a restaurant), the goal is to identify a set of {aspect, polarity} tuples that summarize the opinions expressed in each review.
Each aspect is an (entity, attribute) pair. The polarity is one of three classes: positive, negative, or neutral.
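One way to represent these tuples in code is sketched below. This is only an illustration, not part of the task definition: the names `Opinion`, `make_opinion`, and `review_summary` are hypothetical, and the `ENTITY#ATTRIBUTE` spelling of an aspect follows the annotation examples given later in this document.

```python
from typing import NamedTuple

# The three polarity classes named in the task description.
POLARITIES = ("positive", "negative", "neutral")


class Opinion(NamedTuple):
    """One {aspect, polarity} tuple; the aspect is an (entity, attribute) pair."""
    entity: str
    attribute: str
    polarity: str

    @property
    def aspect(self) -> str:
        # Aspects are written ENTITY#ATTRIBUTE in the annotation examples.
        return f"{self.entity}#{self.attribute}"


def make_opinion(entity: str, attribute: str, polarity: str) -> Opinion:
    """Build an Opinion, rejecting polarity values outside the three classes."""
    if polarity not in POLARITIES:
        raise ValueError(f"unknown polarity: {polarity!r}")
    return Opinion(entity, attribute, polarity)


# A review is summarized by a set of such tuples (hypothetical sample values).
review_summary = {
    make_opinion("FOOD", "QUALITY", "positive"),
    make_opinion("SERVICE", "GENERAL", "negative"),
}
```

Storing the summary as a set mirrors the task wording ("a set of {aspect, polarity} tuples") and makes the set-based evaluation straightforward.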
3. Data
In this campaign, we deal with data in two domains: restaurant and hotel.
3.1 Data for domain “restaurant”
A) Opinion polarity labels
- positive
- negative
- neutral
B) Aspect labels are made up of entity-attribute pairs
Entities:
- RESTAURANT (in general)
- AMBIENCE
- LOCATION
- FOOD
- DRINKS
- SERVICE
Attributes:
- GENERAL
- QUALITY
- PRICE
- STYLE_OPTIONS
- MISCELLANEOUS
The possible combinations between these entities and attributes are given in the following table.
Some examples of annotation:
#1 180K/suất không phải là cái giá rẻ đối với người Sài Gòn. Nhưng nếu so với giá Chả cá Anh Vũ - Giảng Võ và vị trí ngay khu TT Q3 thì cũng hợp lý. Phải nói là chả có ngon nhất SG mình từng ăn tới lúc này. Bài trí và hương vị khá giống nguyên bản ngoài Hà Nội. Từ chanh ớt, vị mắm tôm cho tới cái chảo rán cá. Duy chỉ có mùi ngũ vị hương hơi đậm là mình ko khoái so với ngoài kia. Creme Brulee khá ngon và ít ngọt. Nói chung sẽ quay lại nhiều.
{RESTAURANT#PRICE, neutral}, {FOOD#QUALITY, positive}, {FOOD#STYLE_OPTIONS, positive}, {RESTAURANT#GENERAL, positive}
#2 Hôm nay mình với 3 bạn nữa tới ăn. Quán thấy khá đông, vị trí ngay trung tâm, không gian bình thường. Mấy bạn nhân viên thấy khá lúng túng trong việc phục vụ, phải nhắc 2 3 lần mới được. Ăn thì thấy cũng được thôi, có lẽ do mình không thích món Bắc lắm. Quán thì chỉ có đúng 1 món thôi, mình kêu 4 phần ra được 2 chảo cá không nhiều lắm. Bánh trứng thì ngon. 4 bạn ăn gần hết 900k. Giá quá mắc cho món này. Do mọi người khen nhiều nên ăn 1 lần cho biết thôi.
{FOOD#PRICE, negative}, {SERVICE#GENERAL, negative}, {AMBIENCE#GENERAL, neutral}, {LOCATION#GENERAL, positive}, {FOOD#QUALITY, positive}, {RESTAURANT#GENERAL, negative}
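The label strings in the examples above follow a regular `{ENTITY#ATTRIBUTE, polarity}` pattern, so they can be read with a short regular expression. The sketch below assumes exactly that surface form. The actual layout of the data files is not specified in this document, and `LABEL_PATTERN` and `parse_labels` are hypothetical names.

```python
import re

# One "{ENTITY#ATTRIBUTE, polarity}" group, as written in the examples.
# "&" and "_" are allowed because labels such as FOOD&DRINKS and
# STYLE_OPTIONS appear in the label lists.
LABEL_PATTERN = re.compile(r"\{\s*([A-Z_&]+)#([A-Z_&]+)\s*,\s*(\w+)\s*\}")


def parse_labels(annotation: str) -> list[tuple[str, str, str]]:
    """Return (entity, attribute, polarity) triples found in an annotation line."""
    return LABEL_PATTERN.findall(annotation)


line = "{FOOD#PRICE, negative}, {SERVICE#GENERAL, negative}, {AMBIENCE#GENERAL, neutral}"
for entity, attribute, polarity in parse_labels(line):
    print(entity, attribute, polarity)
```

Text that does not match the pattern is silently skipped, so a stricter loader might also check that nothing but commas and whitespace remains between the matched groups.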
3.2 Data for domain “hotel”
A) Opinion polarity labels
- positive
- negative
- neutral
B) Aspect labels are made up of entity-attribute pairs
Entities:
- HOTEL (in general)
- ROOMS
- ROOM_AMENITIES
- FACILITIES
- SERVICE
- LOCATION
- FOOD&DRINKS
Attributes:
- GENERAL
- PRICES
- DESIGN&FEATURES
- CLEANLINESS
- COMFORT
- QUALITY
- STYLE&OPTIONS
- MISCELLANEOUS
The possible combinations between these entities and attributes are given in the following table.
Some examples of annotation:
#1 Khách sạn giá rẻ, gần biển.Nhân viên lễ tân thiếu lịch sự với khách, thái độ khó chịu.
{HOTEL#PRICES, positive}, {LOCATION#GENERAL, positive}, {SERVICE#GENERAL, negative}
#2 Phòng ốc rộng rãi và thoáng. Nhân viên phục vụ rất tận tình, chu đáo. Nói chung là ok. Nếu đưa ra khuyết điểm thì có lẽ là lối đi vào buổi tối hơi khó vì gần parking.
{ROOMS#COMFORT, positive}, {SERVICE#GENERAL, positive}, {HOTEL#GENERAL, positive}, {HOTEL#MISCELLANEOUS, negative}
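A simple sanity check for hotel labels can be built from the entity, attribute, and polarity lists above. The sketch below only tests membership in those lists. It does not check which entity-attribute combinations are allowed, because that table is not reproduced here. The function name `label_problems` is hypothetical.

```python
# Label inventories copied from the lists in section 3.2.
HOTEL_ENTITIES = {
    "HOTEL", "ROOMS", "ROOM_AMENITIES", "FACILITIES",
    "SERVICE", "LOCATION", "FOOD&DRINKS",
}
HOTEL_ATTRIBUTES = {
    "GENERAL", "PRICES", "DESIGN&FEATURES", "CLEANLINESS",
    "COMFORT", "QUALITY", "STYLE&OPTIONS", "MISCELLANEOUS",
}
POLARITIES = {"positive", "negative", "neutral"}


def label_problems(entity: str, attribute: str, polarity: str) -> list[str]:
    """Return human-readable problems with one label; an empty list means none found."""
    problems = []
    if entity not in HOTEL_ENTITIES:
        problems.append(f"unknown entity: {entity}")
    if attribute not in HOTEL_ATTRIBUTES:
        problems.append(f"unknown attribute: {attribute}")
    if polarity not in POLARITIES:
        problems.append(f"unknown polarity: {polarity}")
    return problems


# Labels from hotel example #1 pass; a misspelled attribute is reported.
print(label_problems("HOTEL", "PRICES", "positive"))
print(label_problems("HOTEL", "PRICE", "positive"))
```

Note that the hotel attribute is `PRICES` while the restaurant attribute is `PRICE`, so each domain needs its own inventory.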
4. Evaluation
Evaluation data: The test set consists of two text files, one for the hotel domain and one for the restaurant domain. These files have the same format as the training and development files, except that the annotations are removed. Each team should submit its results in the same format as the training and development data.
Result submission: Each team may submit one or more system results in separate folders, with numbered folder names indicating the priority of each result.
The performance of participating systems will be evaluated in two phases.
4.1. Phase A: Aspect (Entity-Attribute)
The F1 score will be calculated for aspects only.
Let A be the set of predicted aspects (entity-attribute pairs) and B the set of annotated aspects. Precision, recall, and the F1 score are computed as follows:
Precision=|A∩B|/|A|
Recall=|A∩B|/|B|
F1=2*Precision*Recall/(Precision+Recall)
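The three formulas above translate directly into set operations. The sketch below is not the official scorer. It makes two assumptions that the text does not state: each aspect is paired with a review identifier so that the same aspect in different reviews counts separately, and any ratio with a zero denominator is reported as 0. The name `precision_recall_f1` and the sample data are hypothetical.

```python
def precision_recall_f1(predicted: set, annotated: set) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 for predicted set A and annotated set B."""
    overlap = len(predicted & annotated)  # |A ∩ B|
    # Assumption: an empty denominator yields 0.0 rather than an error.
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(annotated) if annotated else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: (review_id, aspect) pairs for a single review.
annotated = {(1, "HOTEL#PRICES"), (1, "LOCATION#GENERAL"), (1, "SERVICE#GENERAL")}
predicted = {(1, "HOTEL#PRICES"), (1, "SERVICE#GENERAL"), (1, "ROOMS#COMFORT")}

precision, recall, f1 = precision_recall_f1(predicted, annotated)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Here two of the three predicted aspects are correct and two of the three annotated aspects are found, so precision, recall, and F1 all equal 2/3.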
4.2. Phase B: Full (Aspect-Polarity)
The F1 score will be calculated for both aspects and sentiment polarities.
Let A be the set of predicted tuples (entity-attribute-polarity) and B the set of annotated tuples. Precision, recall, and the F1 score are computed in the same way as in Phase A.
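The difference between the two phases can be illustrated with a small example in which every aspect is predicted correctly but one polarity is wrong. This is a minimal sketch with hypothetical names and data, using the same zero-denominator convention (score 0) as an assumption. It is not the official evaluation script.

```python
def f1_score(predicted: set, annotated: set) -> float:
    """F1 between a predicted set A and an annotated set B (0.0 if undefined)."""
    overlap = len(predicted & annotated)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(annotated)
    return 2 * precision * recall / (precision + recall)


# Hypothetical (aspect, polarity) tuples for one review.
annotated = {
    ("HOTEL#PRICES", "positive"),
    ("LOCATION#GENERAL", "positive"),
    ("SERVICE#GENERAL", "negative"),
}
predicted = {
    ("HOTEL#PRICES", "positive"),
    ("LOCATION#GENERAL", "positive"),
    ("SERVICE#GENERAL", "positive"),  # right aspect, wrong polarity
}

# Phase A compares aspects only; Phase B compares full tuples.
phase_a = f1_score({a for a, _ in predicted}, {a for a, _ in annotated})
phase_b = f1_score(predicted, annotated)
print(f"Phase A F1={phase_a:.3f}, Phase B F1={phase_b:.3f}")
```

A tuple with the right aspect but the wrong polarity counts as both a false positive and a false negative in Phase B, so the Phase B score can never exceed the Phase A score for the same predictions.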
5. References
http://alt.qcri.org/semeval2016/task5/
... to be updated ...