Mets Twitter Sentiment Tracking
Mets fans have a reputation for being pessimistic and overly negative. Given the team's history these feelings aren't completely unfounded, but I wanted to quantify the negativity of the Mets fanbase at any given point in time. And there is no better source than #MetsTwitter for measuring the emotions of the Mets fanbase. So in June 2022 I began measuring the real-time sentiment of #MetsTwitter. What follows is a little background about the project and a couple of things I learned along the way.
A quick note: I started this project in June 2022 with the intention of working through bugs and having a process in place for the 2023 MLB season. Since then Twitter changed ownership, and under the new ownership access to the API has been restricted, making this project financially infeasible, which is a real bummer.
Getting the Twitter Data
First things first, I needed data from Twitter. I set up an AWS EC2 instance running a Tweepy StreamingClient to pull tweets from the stream. When new tweets are received they are written to an Elastic index. A separate process runs every minute to apply the NLP models and update the Elastic documents in batches. Initially I tried applying the NLP models as tweets came through the stream, but the processing latency caused the stream to fall behind. Splitting the stream ingestion and the NLP processing into separate tasks sped up the overall tweet processing.
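The batch step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the index name "tweets", the "sentiment" field, the shape of the input documents, and the classify callable are all assumptions. The dicts it yields follow the partial-update action format accepted by elasticsearch.helpers.bulk.

```python
from typing import Callable, Iterable, Iterator


def build_update_actions(
    docs: Iterable[dict],
    classify: Callable[[str], str],
    index: str = "tweets",  # hypothetical index name
) -> Iterator[dict]:
    """Yield one partial-update action per tweet that still needs a sentiment label."""
    for doc in docs:
        yield {
            "_op_type": "update",   # update the existing document in place
            "_index": index,
            "_id": doc["id"],       # assumes each tweet was indexed under its tweet id
            "doc": {"sentiment": classify(doc["text"])},  # hypothetical field name
        }


# In the scheduled job, the actions would be sent in one request, e.g.:
#   from elasticsearch import Elasticsearch, helpers
#   helpers.bulk(Elasticsearch("http://localhost:9200"), build_update_actions(pending, classify))
```

Keeping the stream writer limited to a plain index call and pushing all model work into a job like this is what lets the stream keep up: the slow step no longer sits between receiving a tweet and storing it.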
Creating the Models
Two models were trained, one for tweet sentiment and another to identify player names. To measure sentiment the Hugging Face bertweet-base-sentiment-analysis model is used. From reviewing tweets coming off the stream, the model works reasonably well out of the box. However, sports-related tweets can be dripping in sarcasm that the base model might not pick up. For example, consider this tweet:
"Can’t wait to see Dom go 0-4 2Ks and 2 Ground ball outs one GIDP and a groundout to first"
The author is being sarcastic, but the base model classified this tweet as having a positive sentiment (probably because of the "Can't wait" piece). Fine-tuning the model on baseball tweets would help it account for this sarcasm when determining tweet sentiment.
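For reference, running the base model over a handful of tweets could look something like the sketch below. The Hub ID finiteautomata/bertweet-base-sentiment-analysis is my assumption for the model named above, and the helper names are made up for illustration. The classifier is passed in as an argument so the counting logic can be tried without downloading the model.

```python
from collections import Counter

# Assumed Hugging Face Hub ID for the model mentioned in the text.
MODEL_ID = "finiteautomata/bertweet-base-sentiment-analysis"


def load_classifier():
    """Build a transformers pipeline for the base model (downloads weights on first use)."""
    from transformers import pipeline  # imported lazily so the rest runs without it
    return pipeline("sentiment-analysis", model=MODEL_ID)


def label_counts(texts, classifier):
    """Classify each tweet and count how many landed on each label."""
    results = classifier(list(texts))  # one {"label": ..., "score": ...} dict per tweet
    return Counter(result["label"] for result in results)


# Example use (requires transformers and a model download):
#   counts = label_counts(["Can't wait to see Dom go 0-4"], load_classifier())
```

Spot-checking the counts against a hand-labeled sample is one simple way to see how often sarcasm like the tweet above gets scored as positive.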
I wanted to measure not only the sentiment of the fanbase overall, but also the sentiment towards specific players. To do that, player names needed to be extracted from the tweets, so the second model is a custom NER model trained using spaCy to identify player names and common nicknames (looking at you, Daniel Vogelbomb!). Nicknames were then mapped to the player so that all tweets related to a player would be captured.
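The nickname mapping can be as simple as a lookup table applied to whatever the NER model extracts. The sketch below is only an illustration: the table entries and the function name are assumptions, and a real table would be built from the annotated tweets.

```python
# Illustrative entries only; keys are lowercased mentions as they might appear in tweets.
NICKNAME_TO_PLAYER = {
    "daniel vogelbomb": "Daniel Vogelbach",
    "vogelbomb": "Daniel Vogelbach",
    "degrom": "Jacob deGrom",
}


def canonical_player(mention: str) -> str:
    """Map an extracted mention to a canonical player name, or return it unchanged."""
    cleaned = mention.strip()
    return NICKNAME_TO_PLAYER.get(cleaned.lower(), cleaned)
```

Normalizing to one canonical name before aggregating means tweets about "Vogelbomb" and "Daniel Vogelbach" count toward the same player.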
For each model I manually annotated tweets to build the datasets that were needed, and then built a model training pipeline so the models could be updated periodically. Each day a sample of tweets was annotated to make sure the training data came from a variety of different circumstances.
Once the data pipelines were built and the models were trained I could actually begin doing the analytics. Initially I started measuring the overall sentiment of Mets fans, which for the most part proved to be more positive than I had thought (at least at one point during the summer). During one game Gary called out #MetsTwitter as overflowing with negativity, but for the season through August 19, 2022, Gary's claims were unfounded.
The most active periods of #MetsTwitter were during games, so I started thinking about how monitoring real-time tweet activity could be used to enhance live events. For each game I tracked the overall sentiment of Mets fans, the sentiment trend during the game, and the players of the game according to #MetsTwitter, which were summarized in a tweet after each game.
Get it back tomorrow with deGrom.
Mets Twitter Game Summary 8/30/22:
🗣 Activity Rank: 16
Word of the Game: "gon na"
Players of the Game:
1. Mark Canha
2. Starling Marte
3. Brandon Nimmo
#LGM pic.twitter.com/hpVLAERwJp
Mets Vibes (@MetsMood), August 31, 2022
For a more interactive experience I began building a Streamlit app for users to explore trends for the season and during the active game. Full disclosure, the summer got away from me as more of my time was spent exploring new career opportunities, so I didn't make as much progress as I had hoped and was content to set this piece aside and finish it later when I had the time. Below is a link to an early version if you're interested in building a Streamlit app that interacts with an Elastic data store. I've deleted the data, so it's no longer interactive, but the code is still there, and I've found myself referring back to it for other projects now and then.
It's a bit of a bummer that this project ended how it did, but that's the reality most of the time. My favorite part was seeing this project evolve and grow on its own. Each week there was a new idea to explore. I started this project to practice fine-tuning NLP models. By the end I was thinking about how the activity of digital communities like #MetsTwitter could be leveraged to enhance the broadcast experience of live events. I still think there is an opportunity, just maybe not with Twitter given the new API limits.
In the end this project seems like a bit of a failure, but I learned a lot working through the various problems that were encountered and there's always value in doing something to learn!