Measuring Emotion – Exploring the Feasibility of Automatically Classifying Emotional Text

Date: 10 October 2014

Venue: Willem Elsschotzaal, Hof van Liere - Prinsstraat 13 - 2000 Antwerpen

Time: 3:00 PM

Organization / co-organization: Department of Linguistics

PhD candidate: Frederik Vaassen

Principal investigator: Walter Daelemans

Short description: PhD Defense Frederik Vaassen, Department of Linguistics

Abstract: This thesis explores methods for the automatic supervised categorization of text according to the emotions of its author. Can we train a computer to recognize the emotion the author of a text felt when writing it, based solely on the information in the text itself?



To explore the possibilities and limitations of automatic emotion classification, we develop two very different case studies in detail.

In the first case study, we attempt to automatically classify sentences from business conversations according to a dimensional emotion model called Leary's Rose, or the Interpersonal Circumplex. We discretize this model into eight emotion categories and search for the feature set that best captures the problem using a Support Vector Machine-based classification system. We find that character n-grams capture relevant information well, and that they are usefully supplemented with carefully selected emotion keywords. In theory, a speaker's previous emotional state makes an excellent predictor of their current position on the Circumplex, but in practice, classification performance is too low for contextual class information to be useful. Comparing the classifiers' performance to human performance on the dataset, we conclude that the ambiguity is also present in the gold-standard data, as even inter-annotator agreement is very low.
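The feature setup described above can be sketched in a few lines. The snippet below is an illustrative reconstruction, not the thesis's actual pipeline: the sentences and Circumplex-style labels are invented, and scikit-learn's character n-gram vectorizer and linear SVM stand in for the original system.

```python
# Hedged sketch: character n-gram features feeding a linear SVM,
# as one might express the setup in scikit-learn. Toy data only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented sentences with hypothetical Circumplex-style labels.
sentences = [
    "You will do exactly as I say.",        # dominant
    "Whatever you think is best, really.",  # submissive
    "I'm so glad we're working together!",  # affiliative
    "I want nothing to do with you.",       # hostile
] * 5  # repeat to give the classifier a few examples per class
labels = ["dominant", "submissive", "affiliative", "hostile"] * 5

clf = make_pipeline(
    # char_wb builds character n-grams within word boundaries
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LinearSVC(),
)
clf.fit(sentences, labels)
print(clf.predict(["You will listen to me."])[0])
```

A real system would of course also append the selected emotion-keyword features to the n-gram vectors before training.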

The second case study concerns the automatic identification of emotions in suicide notes. We again classify on a sentence-by-sentence basis, assigning each sentence one or more of 15 possible emotion labels. Character n-grams once more prove to capture relevant features well. The results also show very clearly that the number of training instances available for a label strongly influences the classifier's performance on that emotion. This is hardly a surprising finding, but given the limited size of existing emotion datasets, sparse classes are a problem that needs to be tackled.
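Assigning several labels to one sentence calls for a multi-label setup. One common approach is binary relevance: train one binary classifier per emotion. The sketch below is only a hedged illustration; the sentences, the four-label subset, and the classifier choice are all invented, not taken from the thesis.

```python
# Hedged sketch: binary-relevance multi-label classification, letting
# each sentence carry several emotion labels at once. Toy data only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

sentences = [
    "I love you all, and I am so sorry.",
    "I can't take the guilt any longer.",
    "Please forgive me for what I must do.",
] * 5
labels = [
    {"love", "sorrow"},
    {"guilt", "hopelessness"},
    {"guilt", "sorrow"},
] * 5

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)  # one binary column per emotion label

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    OneVsRestClassifier(LinearSVC()),  # one binary SVM per label
)
clf.fit(sentences, Y)

pred = mlb.inverse_transform(clf.predict(["Forgive me, I am sorry."]))
print(pred[0])
```

With 15 labels, the per-label binary problems for the rare emotions have very few positive examples, which is exactly the sparse-class problem noted above.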

Having learned from the case studies that low-agreement training data and data sparseness are significant hurdles for emotion classification, we look for ways to work around these problems, or at least to mitigate their negative impact. While it seems unrealistic to build a large, high-agreement corpus for every new emotion classification task, it might be possible to build a single large, reliable corpus that can then be reused across a variety of other tasks. A series of cross-domain experiments validates this hypothesis.

We conclude our exploration of automatic emotion classification with some best practices: it is usually a good idea to train on a larger dataset with higher-confidence emotion annotations, even if this training dataset and the target data do not belong to the same domain. When supplemented with a portion of the target data, such a dataset serves as a good basis for a cross-domain emotion classifier. In terms of features, character n-grams capture a lot of relevant information, and it pays to supplement them with a limited set of carefully selected emotion keywords (from an emotion lexicon, for instance). Dealing with any remaining noise and class skew is best left to the classifier, as manipulating the instance space prior to training will likely result in performance loss.
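The cross-domain recipe amounts to concatenating the out-of-domain corpus with a small in-domain sample before training. A minimal sketch, with all data invented for illustration:

```python
# Hedged sketch of cross-domain training: a larger out-of-domain corpus
# plus a small target-domain sample, fed to one classifier. Toy data only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Stand-in for a larger, high-agreement out-of-domain corpus.
source = [
    ("I am furious about this.", "anger"),
    ("What wonderful news!", "joy"),
] * 10
# Stand-in for a small labeled sample from the target domain.
target_sample = [
    ("This delay makes me furious.", "anger"),
    ("Wonderful, the results are in!", "joy"),
] * 2

texts, labels = zip(*(source + target_sample))
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LinearSVC(),
)
clf.fit(list(texts), list(labels))
print(clf.predict(["I am absolutely furious."])[0])
```

The small target sample nudges the decision boundary toward the target domain's vocabulary without giving up the bulk of the reliable out-of-domain training signal.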

For the future, we hope to see the available emotion datasets grow both in size and in quality, but we believe text-based emotion classification can only go so far. We therefore hope to see more multi-modal approaches to emotion detection: approaches that do not abstract away from the wealth of emotional information to be found in intonation, facial expressions, body language, and other non-textual sources.