For the capstone project in my Data Science course, I worked with a team of three to analyse whether Elon Musk's tweets correlate with Tesla's daily closing price. The hype around social media influence made for a compelling problem, but we wanted to test that claim with rigorous methodology and communicate the limitations transparently.
Building the Data Pipeline
- Collected 10+ years of Tesla stock data and matched it to timestamped tweets retrieved through the Twitter API.
- Structured the pipeline in Python with Pandas for cleaning, resampling, and merging datasets.
- Normalised both datasets to daily intervals to align price movements with public communications.
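
A minimal sketch of the daily alignment step, not the project's actual code. The column names (`date`, `close`, `created_at`), the `build_daily_frame` helper, and the choice to bucket tweets by UTC calendar day are all assumptions made for illustration.

```python
import pandas as pd


def build_daily_frame(prices: pd.DataFrame, tweets: pd.DataFrame) -> pd.DataFrame:
    """Align daily closing prices with per-day tweet counts (hypothetical schema)."""
    prices = prices.copy()
    prices["date"] = pd.to_datetime(prices["date"]).dt.normalize()
    prices = prices.set_index("date").sort_index()

    tweets = tweets.copy()
    tweets["created_at"] = pd.to_datetime(tweets["created_at"], utc=True)
    # Bucket tweets by UTC calendar day; a real pipeline might use exchange time instead.
    daily_tweets = (
        tweets.set_index("created_at")
        .resample("D")
        .size()
        .rename("tweet_count")
    )
    # Drop the timezone so the index matches the tz-naive price dates.
    daily_tweets.index = daily_tweets.index.tz_localize(None)

    # Left join keeps trading days only; days without tweets get a count of 0.
    merged = prices.join(daily_tweets, how="left")
    merged["tweet_count"] = merged["tweet_count"].fillna(0).astype(int)
    merged["return"] = merged["close"].pct_change()
    return merged
```

Joining on the price index means tweets posted on weekends or market holidays are dropped in this sketch. Rolling them forward to the next trading day would be a reasonable alternative.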
 
Sentiment Analysis with a Human-in-the-Loop
We used a Llama 3 language model to score tweet sentiment. To counter sarcasm and emoji ambiguity, we integrated a human-in-the-loop review process. Samples were manually checked, and the team documented model failure patterns so future iterations could improve accuracy.
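
One way such a review loop could be structured is sketched below. It deliberately leaves out the model call itself and assumes each tweet already carries a sentiment `score` in the range -1 to 1. The function names, the near-zero "ambiguous" band, and the spot-check sample size are illustrative assumptions, not the team's actual procedure.

```python
import random


def select_for_review(scored_tweets, sample_size=2, neutral_band=0.2, seed=0):
    """Pick tweets for manual checking: ambiguous scores plus a random spot check."""
    # Assumption: scores lie in [-1, 1], and values near 0 are the ambiguous ones.
    ambiguous = [t for t in scored_tweets if abs(t["score"]) <= neutral_band]
    rest = [t for t in scored_tweets if abs(t["score"]) > neutral_band]
    rng = random.Random(seed)
    spot_checks = rng.sample(rest, min(sample_size, len(rest)))
    return ambiguous + spot_checks


def apply_review(scored_tweets, corrections):
    """Overlay human labels on model scores and log every disagreement."""
    final, failures = [], []
    for tweet in scored_tweets:
        human = corrections.get(tweet["id"])
        if human is not None and human != tweet["score"]:
            # Keep both values so failure patterns can be documented later.
            failures.append({"id": tweet["id"], "model": tweet["score"], "human": human})
        final.append({**tweet, "score": tweet["score"] if human is None else human})
    return final, failures
```

The `failures` list is the raw material for documenting failure patterns: grouping it by tweet features such as emoji use or sarcasm markers shows where the model tends to go wrong.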
Regression & Interpretation
Linear regression models tested for a relationship between daily sentiment scores and stock price changes. The results were not statistically significant, which is consistent with other, macro-level factors dominating price movements at the daily level. We explored shorter windows, but the noise floor remained too high without intraday data.
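
A simple version of this test can be sketched with `scipy.stats.linregress`, which fits a one-variable least-squares line and reports a two-sided p-value for the slope. The `sentiment_regression` wrapper and the 0.05 threshold are assumptions for illustration. The original analysis may have used a different library or model specification.

```python
import numpy as np
from scipy import stats


def sentiment_regression(sentiment, returns, alpha=0.05):
    """Fit returns = intercept + slope * sentiment and test the slope against zero."""
    x = np.asarray(sentiment, dtype=float)
    y = np.asarray(returns, dtype=float)
    # Drop days where either series is missing (e.g. the first return is NaN).
    mask = ~(np.isnan(x) | np.isnan(y))
    fit = stats.linregress(x[mask], y[mask])
    return {
        "slope": fit.slope,
        "r_squared": fit.rvalue ** 2,
        "p_value": fit.pvalue,
        "significant": bool(fit.pvalue < alpha),
    }
```

A non-significant slope only means the data do not rule out a zero effect at the chosen threshold. It does not by itself show that tweets have no influence on price.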
Skills Demonstrated
- Python data engineering with Pandas and API integration.
- Applying large language models to NLP tasks with HITL validation.
- Statistical modelling, reporting, and communicating nuanced findings.