Catching Multimodal Metaphors? New Twitter Corpus Available for Academics
Twitter announced last Tuesday that it is opening up an archive for academic research. This could open the door to new research on public discourse, social media, political communication, and more.
In work with Luciane Corrêa Ferreira, we recently compared Twitter data to the type of data we usually collect in laboratory interview experiments and to data I had found in a large corpus of American English (COCA). We found that there was something to be gained from Twitter data. It allowed us to collect informal, multimodal, instant metaphor creations. Hashtags can make for great search tools.
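To make the hashtag point concrete, here is a minimal sketch in Python of filtering a handful of tweets by hashtag. The sample tweets and the function names (`hashtags`, `filter_by_hashtag`) are invented for illustration; this is not the procedure from the article with Ferreira, and a real project would pull tweets from an archive or API rather than a hard-coded list.

```python
import re

# Matches a '#' followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#(\w+)")


def hashtags(text):
    """Return the hashtags in a tweet, lowercased and without the '#'."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


def filter_by_hashtag(tweets, tag):
    """Keep only the tweets that contain the given hashtag (case-insensitive)."""
    wanted = tag.lstrip("#").lower()
    return [tweet for tweet in tweets if wanted in hashtags(tweet)]


# Made-up example tweets, for illustration only.
sample = [
    "Morning hike, my mind is a clear lake #nature #mindfulness",
    "Stuck in traffic again #mondays",
    "The forest is a cathedral today #Nature",
]

matches = filter_by_hashtag(sample, "#nature")
for tweet in matches:
    print(tweet)
```

Lowercasing the tags means #nature and #Nature are treated as the same search term, which is usually what you want when collecting examples.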
I haven't used the archived corpus myself but remain hopeful that it may include the emoticons, GIFs, photos, and other media elements that help make Twitter special.
Another benefit of Twitter is that people tweeting are often in the midst of the activity they are describing. In the article with Ferreira, we found people describing their current bodily states or sharing photos from the time they created the tweet (for example, when they were out in nature or even during spiritual rituals). In cognitive psychology, we know that even a few minutes after an event concludes, people often don't fully remember the emotions they were experiencing. For that reason, scientists tend to prefer "online" data, or data collected right when participants are having the experiences. Twitter seems to provide such online, situated accounts, which leads to finding different kinds of metaphors from those that can typically be found in a regular corpus. It also has advantages over a laboratory interview, where you can only ask people to try to remember or to simulate the experiences. Each method has its own strengths.
Twitter has already been involved in several neat research projects. Here are a few others to check out:
- A metaphor Twitterbot, @MetaphorMagnet, which aims to "temper the uncanny with aptness," created by Tony Veale, who is well known for his work in metaphor, AI, and computational creativity.
- A project on multilingual emoticon sentiment analysis by Nurendra Choudhary, Rajat Singh, Vijjini Anvesh Rao, and Manish Shrivastava.
- A tagged corpus specializing in political and non-political language. It covers only 2009, but may serve as a good comparison group for more recent political tweet behaviors.
- A Covid Twitter corpus developed by a team of researchers, which includes sentiment analysis.
Ethics can be a concern with Twitter data, so be sure to get an IRB/ethics review and anonymize the data if the tweets aren't from public personalities. The corpus is available only to academics with institutional affiliations. It also does not include data from banned accounts, for example that of ex-POTUS Trump, who was banned after the incidents on January 6th.
Are you familiar with any neat Twitter projects? Tell me about them in the comments.
PS: Sentiment Analysis
For those curious, sentiment analysis is the automatic analysis of text (now extended to cover visual or multimodal data) by AI to find affective cues. For example, a simple sentiment analysis may categorize content as positive or negative. It is also known as opinion mining or emotional AI.
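As a rough illustration of the "positive or negative" idea, here is a toy word-counting sketch in Python. The word lists and the `polarity` function are made up for this example; real sentiment tools rely on much larger lexicons or trained models, and this version ignores negation, sarcasm, emoji, and everything multimodal.

```python
import re

# Tiny hand-made word lists, purely illustrative (not a published lexicon).
POSITIVE = {"love", "great", "happy", "beautiful", "calm"}
NEGATIVE = {"hate", "awful", "sad", "tired", "angry"}


def polarity(text):
    """Label a text positive, negative, or neutral by counting cue words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for tweet in [
    "I love this beautiful trail",
    "So tired and sad today",
    "Going to the store",
]:
    print(polarity(tweet), "|", tweet)
```

Counting cue words is about the simplest approach there is, which makes it easy to see what the label is based on, and also easy to see where it would go wrong (for instance, "not happy" would count as positive here).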
More sophisticated systems can categorize along more specific emotions. Check out this analysis of Zuckerberg's testimony in front of Congress, which analyzes a number of emotions evoked in the speech using "lexical emotional content".