Comparative sentiment analysis (SA) of original Twitter data posted in English about the 10th anniversary of the 2010 Haiti Earthquake
This dataset contains the sentiment analysis (SA) of original tweets posted in English by users about the 10th anniversary of the 2010 Haitian earthquake. Each tweet is classified either by its polarity or as not related. The dataset includes both supervised and unsupervised classifications, and compares the accuracy (ACC) of three tools for unsupervised text classification: a no-code machine learning (ML) classification platform, ‘MonkeyLearn’, and two trained models fine-tuned for SA, ‘troberta’ and ‘btweet’. The latter two are language models based on the RoBERTa (https://aclanthology.org/2020.findings-emnlp.148/) and BERTweet (https://aclanthology.org/2020.emnlp-demos.2/) architectures, respectively. Both models are available on the Hugging Face platform. The first author performed the supervised classification and trained MonkeyLearn at the tweet level using samples of 1, 5 and 10 per cent of the tweets in the dataset; these samples were excluded when testing the ACC of the predictions. This supervised classification is compared with the unsupervised classifications produced by ‘MonkeyLearn’, ‘troberta’ and ‘btweet’. For ‘MonkeyLearn’, the average confidence in the classification increases with the number of training tweets (0.39, 0.56 and 0.64), whereas the average confidence of ‘troberta’ (0.89) and ‘btweet’ (0.92) in their own classifications is very high and exceeds that of ‘MonkeyLearn’.
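The two quantities compared in the description, ACC against the supervised labels and a tool's average confidence in its own labels, can be illustrated with a minimal Python sketch. The record layout, the label names and every value below are assumptions invented for illustration; they are not taken from the dataset and do not reflect its actual column names.

```python
from statistics import mean

# Toy records, invented for illustration only (NOT values from the dataset).
# Assumed layout: (supervised_label, predicted_label, confidence).
# The label names are also assumptions about how polarity might be encoded.
records = [
    ("negative", "negative", 0.91),
    ("positive", "neutral", 0.55),
    ("neutral", "neutral", 0.80),
    ("not related", "negative", 0.40),
]


def accuracy(rows):
    """Share of rows whose predicted label matches the supervised label."""
    matches = sum(1 for supervised, predicted, _ in rows if supervised == predicted)
    return matches / len(rows)


def average_confidence(rows):
    """Mean of the confidence scores a tool reports for its own labels."""
    return mean(confidence for _, _, confidence in rows)


acc = accuracy(records)
avg_conf = average_confidence(records)
print(f"ACC = {acc:.2f}, average confidence = {avg_conf:.3f}")
```

Keeping the two measures separate matters because a tool can be highly confident in its own labels and still disagree with the supervised classification.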
Funding
Learning from Earthquakes: Building Resilient Communities Through Earthquake Reconnaissance, Response and Recovery
Engineering and Physical Sciences Research Council
Categories
- Architectural heritage and conservation
- Building science, technologies and systems
- Community planning
- Other earth sciences not elsewhere classified
- Geology not elsewhere classified
- Geophysics not elsewhere classified
- Civil geotechnical engineering
- Civil engineering not elsewhere classified
- Natural language processing