Hey, I’m Rosalina 👋
Data Engineering Portfolio
About Me
I’m a driven Data Analytics Engineering graduate student at Northeastern University with over a decade of experience in sales strategy, channel development, and analytics. Today, I’m channeling that business acumen into mastering Machine Learning and AI Engineering, turning raw data into intelligence that supports decision-making and drives innovation.
🎓 Education
Currently pursuing my Master’s in Data Analytics Engineering, I specialize in ML/AI applications. My coursework spans advanced statistics, predictive modeling, big data ecosystems, cloud computing, and AI governance—equipping me to design systems that are not just powerful, but also ethical and trustworthy.
💡 What Drives Me
I’m fascinated by the convergence of data engineering and artificial intelligence—how the right architecture can transform millions of scattered data points into real-time insights. My passion lies in building scalable pipelines, production-ready ML models, and intelligent systems that solve high-impact, real-world challenges.
🚀 Career Goals
My vision is to become an ML/AI Engineer who develops end-to-end solutions—from automated data ingestion to AI-driven decision support—helping businesses, policymakers, and communities thrive in a data-first world. I’m particularly drawn to projects that advance education, equity, and economic growth through evidence-based innovation.
My Projects
🚀 Real-Time Crypto ML Trading Pipeline
Developed a production-ready cryptocurrency trading system that combines a Random Forest model with real-time WebSocket streaming. Built for scalability, performance, and intelligent decision-making in live market conditions.
- Real-time data ingestion from multiple exchanges via WebSockets
- Ensemble ML model with feature engineering (RSI, MACD, Bollinger Bands, sentiment, on-chain metrics)
- Interactive dashboard with live predictions, confidence scores, and trade simulation
- Containerized deployment with Docker and REST API integration
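Two of the indicators named in the feature list can be computed directly from a series of closing prices. The following is a minimal sketch, not the project's actual code: the function names `rsi` and `bollinger_bands`, the default periods, and the sample prices are assumptions for illustration, and the RSI here uses simple averages (often called Cutler's RSI) rather than Wilder's smoothing.

```python
from statistics import mean, pstdev


def rsi(closes, period=14):
    """Relative Strength Index over the last `period` price changes.

    Uses simple averages of gains and losses (an assumption; Wilder's
    original definition uses exponential smoothing instead).
    """
    if len(closes) <= period:
        raise ValueError("need more than `period` closing prices")
    # Pair each of the last `period` closes with the close before it.
    deltas = [b - a for a, b in zip(closes[-period - 1:-1], closes[-period:])]
    avg_gain = sum(d for d in deltas if d > 0) / period
    avg_loss = sum(-d for d in deltas if d < 0) / period
    if avg_loss == 0:
        return 100.0  # no losses in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def bollinger_bands(closes, window=20, num_std=2.0):
    """Return (lower, middle, upper) bands for the last `window` closes."""
    if len(closes) < window:
        raise ValueError("need at least `window` closing prices")
    recent = closes[-window:]
    middle = mean(recent)
    spread = num_std * pstdev(recent)  # population standard deviation
    return middle - spread, middle, middle + spread


if __name__ == "__main__":
    # Hypothetical closing prices, for illustration only.
    prices = [100 + (i % 5) - 0.3 * i for i in range(30)]
    print("RSI:", rsi(prices))
    print("Bollinger bands:", bollinger_bands(prices))
```

Values like these would typically be appended as columns to each incoming price record before being passed to the model.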
🌍 Democracy Clustering Analysis
Applied K-Means and Hierarchical Clustering to The Economist’s Democracy Index to uncover patterns of governance beyond traditional classifications. Revealed data-driven insights into regime types, democratic transitions, and global political trends.
- Cleaned and normalized the dataset, then reduced its dimensionality with PCA (82% variance retained)
- Implemented K-Means and Agglomerative Clustering with optimal k determined by Elbow + Silhouette methods
- Created a Democratic Stability Index to identify countries at transition boundaries
- Geographic visualization of democracy patterns across 160+ nations
- Compared algorithmic clusters against official EIU regime classifications
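To illustrate how a silhouette score can guide the choice of k, here is a small dependency-free sketch. It is not the project's code: the helper names `kmeans`, `silhouette`, and `best_k` and the toy points are assumptions, and a real analysis would more likely rely on a library such as scikit-learn.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; requires at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for name, other in clusters.items() if name != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k(points, candidates, seed=0):
    """Return (k with the highest silhouette, all scores by k)."""
    scores = {k: silhouette(points, kmeans(points, k, seed=seed))
              for k in candidates}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy 2-D points for illustration only (not Democracy Index data).
    toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(best_k(toy, [2, 3]))
```

The elbow method would complement this by plotting within-cluster variance against k; the silhouette score has the advantage of yielding a single number per candidate k.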
🧠 Network & Word Frequency Analysis for Data Mining
Applied Network Science and Text Mining to academic article metadata to uncover semantic relationships through keyword co-occurrence. Built weighted network graphs to reveal thematic clusters, bridge concepts, and influential terms across research domains.
- Extracted and cleaned author-assigned keywords from academic metadata
- Constructed weighted co-occurrence matrices and converted them into undirected graphs
- Applied centrality metrics (degree, betweenness, eigenvector) to identify influential nodes
- Detected thematic communities using the Louvain algorithm
- Developed interactive and static network visualizations for research mapping
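The co-occurrence step described above can be sketched with the standard library alone. This is an illustrative outline, not the project's implementation: the function names and the sample keyword lists are assumptions, and a graph library such as NetworkX would typically handle betweenness, eigenvector centrality, and community detection.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(keyword_lists):
    """Count how often each pair of keywords appears on the same article.

    Returns a Counter mapping (keyword_a, keyword_b) -> weight, with each
    pair stored in alphabetical order so the graph is undirected.
    """
    edges = Counter()
    for keywords in keyword_lists:
        # Normalize case and whitespace, and drop duplicates within an article.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum of edge weights attached to each keyword (node strength)."""
    strength = Counter()
    for (a, b), weight in edges.items():
        strength[a] += weight
        strength[b] += weight
    return strength


def degree_centrality(edges):
    """Fraction of the other nodes each keyword is directly connected to.

    Keywords that never co-occur with another keyword have no edges and
    therefore do not appear in the result.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


if __name__ == "__main__":
    # Hypothetical author-assigned keywords, for illustration only.
    articles = [
        ["Machine Learning", "Clustering"],
        ["clustering", "machine learning", "PCA"],
        ["PCA", "Networks"],
    ]
    edges = cooccurrence_edges(articles)
    print(edges.most_common())
    print(degree_centrality(edges))
```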
🏃 MotionInsight: Advanced Human Activity Recognition
Engineered a time series analysis system that uses Permutation Entropy and Jensen–Shannon Complexity to distinguish subtle human activities in accelerometer data. The resulting entropy-complexity signatures separate walking, running, and climbing patterns, with potential applications in wearables, healthcare, and workplace safety.
- Processed accelerometer time series from chest-mounted sensors with advanced feature engineering
- Optimized the embedding parameters (dimension and delay) using the F-statistic to maximize activity discrimination
- Identified characteristic entropy-complexity signatures for each activity type
- Created interactive dashboard with 3D activity mapping, performance metrics, and feature correlations
- Demonstrated immediate applications in fitness wearables, healthcare monitoring, and industrial safety
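For readers unfamiliar with permutation entropy, the sketch below shows how the embedding dimension and delay enter the calculation. It covers only the entropy part (not Jensen–Shannon complexity or the F-statistic tuning), and the function names and sample signal are assumptions rather than the project's code.

```python
import math
from collections import Counter


def ordinal_patterns(series, dimension=3, delay=1):
    """Count the ordinal (rank-order) patterns of embedded windows.

    Each window takes `dimension` samples spaced `delay` apart; its pattern
    is the order of indices that sorts the window. Ties are broken by
    position because Python's sort is stable.
    """
    n_windows = len(series) - (dimension - 1) * delay
    if n_windows <= 0:
        raise ValueError("series too short for this dimension and delay")
    counts = Counter()
    for i in range(n_windows):
        window = [series[i + j * delay] for j in range(dimension)]
        pattern = tuple(sorted(range(dimension), key=window.__getitem__))
        counts[pattern] += 1
    return counts


def permutation_entropy(series, dimension=3, delay=1, normalize=True):
    """Shannon entropy of the ordinal pattern distribution.

    With normalize=True the result is divided by log(dimension!), so it
    lies in [0, 1]: 0 for a perfectly regular signal, 1 when every
    pattern is equally likely.
    """
    counts = ordinal_patterns(series, dimension, delay)
    total = sum(counts.values())
    entropy = sum((c / total) * math.log(total / c) for c in counts.values())
    if normalize:
        entropy /= math.log(math.factorial(dimension))
    return entropy


if __name__ == "__main__":
    # Synthetic signal for illustration only (not real accelerometer data).
    signal = [math.sin(0.3 * t) + 0.1 * math.cos(2.1 * t) for t in range(200)]
    for d in (3, 4, 5):
        print(d, permutation_entropy(signal, dimension=d, delay=1))
```

Sweeping `dimension` and `delay` as in the usage loop is one simple way to see how the parameter choice changes the separation between signals.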
🤖 Boston Heatwave Monitor
Real-time heat monitoring system analyzing vulnerable populations across 10 Boston neighborhoods with interactive dashboards and risk assessments.
- Live weather data from NOAA Weather Service API (Logan Airport)
- Green space correlation analysis
- Vulnerable population assessment
- Interactive visualizations
- Dashboard interface built with Streamlit
📊 ForecastPro Analytics
Built a professional time series forecasting platform with React that lets users upload data, select from multiple forecasting models, and visualize predictions alongside performance metrics.
- Drag-and-drop CSV upload with instant data processing
- Multiple forecasting models: ARIMA, Prophet, Exponential Smoothing, Holt-Winters, Moving Average
- Interactive charts updating in real time with model comparison metrics (MAE, RMSE, MAPE)
- Export forecasts as CSV files for further analysis
- Clean, responsive UI with inline styling and iconography from Lucide React
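The comparison metrics listed above (MAE, RMSE, MAPE) and the simplest of the models, a moving average, can be sketched as follows. The platform itself is built with React; this Python version is only an illustration under assumed function names and sample values, not a port of the actual code.

```python
import math


def _check(actual, predicted):
    """Validate that both series are non-empty and the same length."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and equal in length")


def mae(actual, predicted):
    """Mean absolute error."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    _check(actual, predicted)
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so that case raises.
    """
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def moving_average_forecast(history, window, horizon):
    """Forecast `horizon` steps ahead as the mean of the last `window` values.

    Each forecast is fed back into the series, so multi-step forecasts
    are averages of earlier forecasts.
    """
    values = list(history)
    if window < 1 or len(values) < window:
        raise ValueError("window must be between 1 and len(history)")
    forecasts = []
    for _ in range(horizon):
        nxt = sum(values[-window:]) / window
        forecasts.append(nxt)
        values.append(nxt)
    return forecasts


if __name__ == "__main__":
    # Hypothetical series, for illustration only.
    train, test = [12, 14, 13, 15, 16, 15], [17, 16]
    preds = moving_average_forecast(train, window=3, horizon=len(test))
    print(preds, mae(test, preds), rmse(test, preds), mape(test, preds))
```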
Technical Skills
🐍 Programming & Data Engineering
🤖 Machine Learning & AI
☁️ Cloud & DevOps
📊 Data Analytics & Visualization
📚 Frameworks & Libraries
⚖️ Strategic & Leadership
Beyond the Code
The human behind the algorithms 🎯
Lines of Code
250K+
And only half of them have bugs! 😄
Neural Networks
47
Built from scratch (TensorFlow is still my friend though)
Fuel Source
3-5
Cups of coffee per model training session
Learning Hours
1000+
Online courses, papers, and “just one more tutorial”
Debugging Sessions
∞
It’s not a bug, it’s an undocumented feature!
Deploy Success Rate
99.9%
That 0.1%? We don’t talk about production Fridays