The Identification of Invalid Information about the COVID-19 Coronavirus Pandemic on a Social Networking Platform
Abstract
The outbreak of COVID-19 caused a parallel contagion in the sphere of information, known as an infodemic. Social media, as a popular communication channel, amplified the spread of misinformation, with multidimensional effects at both the societal and the individual level. Twitter, as a web forum, hosts various types of false content posted, either deliberately or unintentionally, by experts, politicians or civilians. This democratized environment offers the opportunity for the exchange of opinions but can also maximize the consequences of misinformation. Conspiracy theories, false therapies and dystopian predictions about the future monopolized Twitter's daily activity, highlighting the need for a supervisory mechanism that would eliminate such content. In this paper, machine learning techniques are implemented to detect fake COVID-19-related content. For this purpose, Natural Language Processing (NLP) algorithms are utilized. The data used to train the algorithms are derived from a publicly accessible dataset of tweets related to the current pandemic that were published in the Greek language. These tweets were classified and annotated in three categories: true, irrelevant, or false. Once a sufficient amount of data has been annotated, the most common words in each category are visualized through word clouds. In addition, a set of linguistic and morphological features was extracted from the tweets by applying methods for converting texts into vectors, together with features related to the subjectivity of the tweets' texts.
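As a rough illustration of two steps named in the abstract (counting the most common words per category, which is the information a word cloud displays, and converting texts into vectors), the following minimal sketch uses only the Python standard library. The example tweets, the whitespace tokenizer, the function names and the choice of TF-IDF weighting are all assumptions made for illustration; the abstract does not state which vectorization method or tooling was used, and subjectivity features are not covered here.

```python
import math
from collections import Counter

# Hypothetical toy examples in English; NOT taken from the paper's Greek dataset.
tweets = [
    ("vaccines passed clinical trials", "true"),
    ("garlic cures the virus", "false"),
    ("the virus is a hoax", "false"),
    ("nice weather in athens today", "irrelevant"),
]

def tokenize(text):
    # Naive whitespace tokenizer (assumption); real Greek text would need
    # proper normalization, stop-word removal, and so on.
    return text.lower().split()

def top_words(labelled_tweets, label, n=3):
    # Word frequencies for one category: the counts a word cloud would render.
    counts = Counter()
    for text, lab in labelled_tweets:
        if lab == label:
            counts.update(tokenize(text))
    return counts.most_common(n)

def tfidf_vectors(texts):
    # One common way to turn texts into vectors (assumed here): TF-IDF,
    # with tf = count / document length and idf = log(N / document frequency).
    docs = [tokenize(t) for t in texts]
    vocab = sorted({w for d in docs for w in d})
    n_docs = len(docs)
    df = Counter(w for d in docs for w in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[w] / len(d) * math.log(n_docs / df[w]) for w in vocab])
    return vocab, vectors

vocab, vectors = tfidf_vectors([text for text, _ in tweets])
false_top = top_words(tweets, "false", n=2)
```

Each row of `vectors` could then be passed, together with its label, to any standard classifier; words that occur in many tweets receive a lower weight than words specific to a few.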
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The copyright of individual articles is with the author(s) and CEPOL. Reproduction without alterations is authorised for non-commercial purposes, provided the source is acknowledged.