Contreras et al_2022_Haiti_polarity_predictions.xlsx (697.26 kB)

Sentiment analysis (SA) (supervised and unsupervised classification) of original Twitter data posted in English about the 10th anniversary of the 2010 Haiti Earthquake

posted on 06.05.2022, 14:35 by Diana Contreras Mojica, Dimosthenis Antypas, Jose Camacho-Collados, Sean Wilkinson


This dataset contains the sentiment analysis (SA) of original tweets posted in English by users about the 10th anniversary of the 2010 Haiti earthquake. Each tweet is classified by polarity or as not related, using both supervised and unsupervised classification. The dataset compares the accuracy (ACC) of three tools for unsupervised text classification: ‘MonkeyLearn’, a no-code machine learning (ML) classification platform, and two models fine-tuned for SA, ‘troberta’ and ‘btweet’. The latter two are language models based on the RoBERTa and BERTweet architectures, respectively, and both are available on the Hugging Face platform. The first author performed the supervised classification and trained MonkeyLearn at the tweet level using samples of 1, 5 and 10 per cent of the tweets in the dataset (the training samples were excluded when testing the ACC of the predictions). This supervised classification is compared with the unsupervised classification performed by ‘MonkeyLearn’, ‘troberta’ and ‘btweet’. For ‘MonkeyLearn’, the average confidence in the classification increases with the number of tweets used for training (0.39, 0.56 and 0.64 for the 1, 5 and 10 per cent samples, respectively). The average confidence of troberta (0.89) and btweet (0.92) in their own classifications is very high and exceeds MonkeyLearn’s in every case.
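As a rough illustration of the two quantities compared above, the following minimal sketch computes ACC (the share of predictions that match the manual label) and the average confidence for one tool. The rows, the label names and the function names are hypothetical and only stand in for the spreadsheet's contents; the actual column layout of the .xlsx file may differ.

```python
from statistics import mean

# Hypothetical rows: (manual_label, predicted_label, confidence).
# These values are made up for illustration; they are not taken from the dataset.
rows = [
    ("negative", "negative", 0.91),
    ("positive", "neutral", 0.55),
    ("not related", "not related", 0.72),
    ("neutral", "neutral", 0.88),
]


def accuracy(rows):
    """Share of rows where the predicted label equals the manual label."""
    return sum(manual == predicted for manual, predicted, _ in rows) / len(rows)


def average_confidence(rows):
    """Mean of the tool's reported confidence over all rows."""
    return mean(confidence for _, _, confidence in rows)


acc = accuracy(rows)
avg_conf = average_confidence(rows)
print(f"ACC = {acc:.2f}, average confidence = {avg_conf:.2f}")
```

Running the same two functions once per tool (and once per training-sample size for MonkeyLearn) would give figures comparable to those reported in the description, assuming the spreadsheet stores one prediction and one confidence value per tweet and tool.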


Learning from Earthquakes: Building Resilient Communities Through Earthquake Reconnaissance, Response and Recovery

Engineering and Physical Sciences Research Council
