NFTs Sentiment Analysis (2nd position Solution)
This hackathon was hosted and organized by Dphi and Bitscrunch.
My solution placed 2nd among the competitors in solving the business use case provided and judged by the Bitscrunch team.
Attached above is the congratulatory message for my win in the challenge.
Abstract
A non-fungible token (NFT) is a non-interchangeable unit of data stored on a blockchain, a form of digital ledger, that can be sold and traded. NFT data units may be associated with digital files such as photos, videos, and audio. In this project, we used data collected from Twitter under various NFT tags to identify NFT trends using natural language processing (sentiment analysis) and machine learning. With this approach, we were able to identify the trend in the data.
Introduction
This rising cryptocurrency niche recorded over $23 billion in trading volumes as per the latest DappRadar report. Currently, NFT-related active wallets account for close to 50% of the total crypto industry usage, a statistic that will likely increase given the continued interest in 2022.
Before jumping into the developments and prospects, it is worth understanding why NFTs are gaining traction across the board. There are many factors behind the sudden surge, but the most significant one is the unique, non-interchangeable nature of NFTs. Each NFT has a unique value, making it a suitable on-chain asset to represent digital collectibles such as in-game items, or off-chain assets like property and tokenized stocks.
Problem Statement
We are living in a world with a massive explosion of digital assets: hundreds of blockchains, thousands of metaverses, tens of thousands of NFT collections, and millions of NFTs. These numbers are growing rapidly day by day, so there is a dire need to identify new and trending NFT collections across different blockchains to keep up with the latest happenings. Social media plays a crucial role in today's NFT world: collectors flaunt their NFT art on social media platforms, where it can quickly go viral. The aim of this challenge is therefore to identify those collections early using these social media signals.
Hackathon Objective:
Identify the trending NFT collections on Twitter using Twitter data on a daily basis and analyze their sentiments.
Methodology
- Data collection and gathering
- Sentiment analysis
- Generating statistical and time data features
- Exploratory data analysis
- Named entity recognition
- Data preprocessing
- Word cloud on each sentiment
- Text feature extraction
- Machine learning model
- Model explainability
Data Collection and Gathering
Most text comes as unstructured data. For the challenge, the data was collected from the Twitter Developer API using Twint at 14-day intervals. The image below shows the pipeline for the data collection on Twitter. The features needed for the analysis were then selected.
Pipeline for the data collection on Twitter.
Sentiment Analysis
The collected tweets come without sentiment labels, so this starts as an unsupervised learning task. We therefore labeled each collected tweet using the TextBlob API for sentiment analysis.
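The labeling step could look roughly like the sketch below. It assumes TextBlob's polarity score is thresholded at zero. The cutoffs and the helper names `label_from_polarity` and `label_tweet` are illustrative assumptions, not taken from the original notebook.

```python
def label_from_polarity(polarity: float) -> str:
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    Assumption: zero is the cutoff; the original notebook may use other thresholds.
    """
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def label_tweet(text: str) -> str:
    """Label one tweet using TextBlob's default sentiment analyzer."""
    # Imported here so the thresholding helper can be used without TextBlob installed.
    from textblob import TextBlob

    return label_from_polarity(TextBlob(text).sentiment.polarity)
```

Applying `label_tweet` to every tweet would produce the sentiment column used in the later sections.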
Statistical and Time data features
We generated statistical features for each tweet, such as word_count, character_count, and word_density.
We also derived features from the date, such as the day of the week and the day.
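A minimal sketch of these features for a single tweet follows. The formula for word_density (characters per word, with +1 in the denominator) and the reading of "day" as day of the month are assumptions, and the function name `tweet_features` is hypothetical.

```python
from datetime import datetime

# Fixed names keep the result independent of the system locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def tweet_features(text: str, created_at: datetime) -> dict:
    """Build simple statistical and time features for one tweet."""
    word_count = len(text.split())
    character_count = len(text)
    return {
        "word_count": word_count,
        "character_count": character_count,
        # Assumption: density = characters per word; +1 avoids division by zero.
        "word_density": character_count / (word_count + 1),
        "day_of_week": DAY_NAMES[created_at.weekday()],
        # Assumption: "day" means day of the month.
        "day": created_at.day,
    }


features = tweet_features("New NFT drop today", datetime(2022, 1, 7, 12, 30))
```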
Exploratory Data Analysis
In this section, we analyzed our data using univariate and bivariate analysis to answer the following questions:
1. The percentage of each sentiment in the data
2. The top 10 most frequent word counts and character counts in the data
3. The top 10 like counts in the data
4. The percentage of tweets for each day of the week
5. The average like count for each sentiment
6. The total word count and character count for each sentiment
7. The sentiment percentages for each day of the week
Univariate and Bivariate Analysis
Friday shows the most positive sentiment, at 48.3%.
Tuesday shows the most neutral sentiment, at 54.2%.
Saturday has 15.2% negative sentiment, the highest percentage of negative sentiment across the days of the week.
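Per-day sentiment percentages like these could be computed as in the sketch below. The function name `sentiment_share_by_day` is hypothetical, and the rows are toy values for illustration only, not the challenge data.

```python
from collections import Counter, defaultdict


def sentiment_share_by_day(rows):
    """Return {day: {sentiment: percentage}} from (day, sentiment) pairs."""
    counts = defaultdict(Counter)
    for day, sentiment in rows:
        counts[day][sentiment] += 1
    shares = {}
    for day, counter in counts.items():
        total = sum(counter.values())
        shares[day] = {
            label: round(100 * n / total, 1) for label, n in counter.items()
        }
    return shares


# Toy rows for illustration only; these are not the challenge data.
rows = [
    ("Friday", "positive"),
    ("Friday", "positive"),
    ("Friday", "neutral"),
    ("Friday", "negative"),
    ("Saturday", "neutral"),
    ("Saturday", "negative"),
]
shares = sentiment_share_by_day(rows)
```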
Named Entity Recognition
In this section, we used a spaCy model to analyze the top 5 tweets with the most likes. I show the analysis of the top 2 tweets; the other analyses can be found in the notebook linked on my GitHub page.
Data Preprocessing
In this section, we preprocessed the raw text data by lowercasing the text and removing stop words, repeated words, and other noise. We then lemmatized the text.
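The cleaning steps might be sketched as follows. The stop-word list is a tiny illustrative one, and treating URLs, mentions, and non-letter characters as "noise" is an assumption. The lemmatizer is left as a pluggable callable that defaults to a no-op, since the original does not say which lemmatizer was used. The name `clean_tweet` is hypothetical.

```python
import re

# Assumption: a tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "this", "to"}


def clean_tweet(text, lemmatize=lambda token: token):
    """Lowercase, strip noise, drop stop words and repeats, then lemmatize."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    # Drop immediately repeated words, e.g. "best best best" -> "best".
    deduped = [t for i, t in enumerate(tokens) if i == 0 or t != tokens[i - 1]]
    return " ".join(lemmatize(t) for t in deduped)


cleaned = clean_tweet("This NFT is the BEST best best drop!! https://t.co/xyz @user")
```

A real lemmatizer function can be passed through the `lemmatize` argument in place of the default.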
Word Cloud analysis
In this section, we analyzed the cleaned text data using a word cloud for each sentiment.
Machine Learning
The sentiment labels were encoded as classes, and the data was split into 80% training data and 20% test data. We then extracted features from the text data using TfidfVectorizer. Afterwards, we compared 5 machine learning models (Logistic Regression, RandomForestClassifier, DecisionTreeClassifier, LightGBM, and XGBoost) to find the one with the best accuracy, using accuracy score as the evaluation metric.
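The split, TF-IDF extraction, and accuracy evaluation could be sketched with scikit-learn as below. Only Logistic Regression is shown, and the other four models are omitted. The hyperparameters, the random seed, the function name `train_and_score`, and the toy texts are assumptions for illustration, not the settings or data used in the challenge.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def train_and_score(texts, labels, seed=42):
    """Split 80/20, fit TF-IDF + Logistic Regression, return (model, accuracy)."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, random_state=seed
    )
    # Fitting the vectorizer inside the pipeline keeps test text out of the vocabulary.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


# Toy examples for illustration only; these are not the challenge data.
texts = [
    "love this nft collection", "great art amazing drop", "best project ever",
    "awesome mint happy holder", "beautiful piece love it",
    "terrible scam avoid", "worst project rug pull", "hate this floor crash",
    "awful fees bad mint", "sad loss bad trade",
    "minting starts today", "new collection listed", "floor price update",
    "auction ends tonight", "wallet connected to marketplace",
]
labels = ["positive"] * 5 + ["negative"] * 5 + ["neutral"] * 5
model, accuracy = train_and_score(texts, labels)
```

With so few toy examples the accuracy value itself is not meaningful; the sketch only illustrates the flow from raw text to an accuracy score.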
Model accuracy
XGBoost, a non-linear gradient boosting model, performed best with 75.10% accuracy, and Logistic Regression, a linear model, also performed well with an accuracy of 74.69%.
Model Explainability
We can interpret the model using LIME, which helps us better understand the sentiment assigned to a text and supports model debugging in the long run.
I give a shout-out to Dphi and Nexford University.
Dphi has improved my technical skills in solving data science-related problems through the boot camps I have attended.
Nexford University has improved my problem-solving skills through their well-created educational content for their students.
References
1. https://dphi.tech/challenges/nft-datacrunch-league-bitscrunch-bronze-edition/198/overview/about
2. https://dphi.tech/blog/winners-of-the-bitscrunch-nft-datathon-envisioned-towards-safeguarding-the-nft-ecosystem/
3. https://www.youtube.com/watch?v=_SqgSh3aR1g&t=420s
4. https://medium.com/analytics-vidhya/how-to-scrape-tweets-from-twitter-with-python-twint-83b4c70c5536