Welcome to my Projects page, where I showcase data-driven analysis and predictive modeling work.
Each project highlights my ability to extract insights from complex datasets, optimize processes, and apply machine learning techniques to solve real-world business problems. From predictive modeling to campaign optimization, these projects demonstrate my technical skills and analytical approach to driving meaningful outcomes.
House Price Prediction Model Using Machine Learning
In this project, I analyzed housing data from Ames, Iowa, to predict house sale prices using a multiple linear regression model. The process involved extensive data preparation, including handling missing values, identifying outliers, and addressing data quality issues. I selected key variables such as overall quality, basement size, garage capacity, and the year built, which showed strong correlations with sale prices.
I split the dataset into training and testing sets and built various regression models to optimize accuracy. After multiple iterations, I finalized a model that achieved an 82% prediction accuracy. I checked for multicollinearity using variance inflation factors (VIF) and for autocorrelated residuals using the Durbin-Watson test, alongside a check of residual normality, to confirm the model's robustness.
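As a rough illustration of the two diagnostics named above, here is a minimal Python sketch. It is not the code used in the project: the function names are my own, and the VIF shortcut assumes exactly two predictors, where the auxiliary R² is just the squared correlation between them.

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with exactly two predictors, R^2 is r(x1, x2)^2."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den
```

A VIF close to 1 suggests little collinearity, and a Durbin-Watson value near 2 suggests little first-order autocorrelation in the residuals.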
This project demonstrates my ability to clean data, build predictive models, and perform comprehensive evaluation and assumption testing to deliver actionable insights.
Click here to access full document.
Bank Term Deposit Subscription Prediction Using Logistic Regression
In this project, I worked with data from a bank's telemarketing campaign to predict whether customers would subscribe to a long-term deposit product. I applied logistic regression to address this binary classification problem. The data underwent extensive cleaning, including handling missing values, identifying outliers, and correcting data types. Key variables such as customer age, employment status, loan status, and the Euribor 3-month rate were selected due to their impact on subscription likelihood.
After splitting the data into training and testing sets, I built several logistic regression models, selecting the final model based on its performance and AIC value. The final model achieved a prediction accuracy of 72% on the test set. To ensure the robustness of the model, I checked key assumptions like multicollinearity and linearity of the logit using VIF scores and residual analysis.
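To show what comparing models by AIC involves, here is a small Python sketch. It assumes you already have predicted probabilities from a fitted model; the function names and the parameter count argument are illustrative, not taken from the project.

```python
import math


def log_likelihood(y_true, p_pred):
    """Bernoulli log-likelihood of 0/1 labels given predicted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(y_true, p_pred)
    )


def aic(y_true, p_pred, n_params):
    """AIC = 2k - 2 ln(L); lower is better when models are fit to the same data."""
    return 2 * n_params - 2 * log_likelihood(y_true, p_pred)
```

Because the 2k term penalizes extra parameters, a larger model has to improve the likelihood enough to justify its added complexity.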
This project showcases my ability to clean complex datasets, build predictive models using logistic regression, and conduct thorough evaluation and assumption testing to produce actionable business insights.
Click here to access full document.
Life Insurance Purchase Prediction Using Machine Learning
In this project, I analyzed customer data from Imperials Ltd to predict whether a customer would purchase life insurance using various machine learning models. The dataset contained information on customer demographics, financial data, and other relevant metrics. I conducted thorough data preparation, including handling missing values and transforming categorical data into numerical formats.
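One common way to turn categorical data into numerical form is one-hot encoding. The sketch below is a plain-Python illustration under that assumption; the records and field names are hypothetical and not drawn from the project dataset.

```python
def one_hot(rows, column):
    """Replace a categorical column with 0/1 indicator columns, one per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


# Hypothetical records; the field names are illustrative only.
customers = [
    {"income": 52000, "education": "degree"},
    {"income": 31000, "education": "school"},
]
encoded = one_hot(customers, "education")
```

For a regression model with an intercept, one indicator per variable is usually dropped, since the full set of indicators is perfectly collinear with the intercept.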
I experimented with several classification algorithms, including logistic regression, random forest, and support vector machine (SVM). After testing these models, I selected logistic regression as the final model for its accuracy and interpretability. The model achieved 72% accuracy on the test set. I also performed feature importance analysis to understand which variables, such as house value, education, and income, had the greatest impact on the predicted likelihood of purchase.
This project highlights my skills in data preprocessing, model selection, and performance evaluation using various machine learning techniques to provide actionable business insights.
Click here to access full document.
Twitter Engagement Analysis Using Text Mining and Machine Learning
In this project, I used text mining techniques to analyze Twitter engagement metrics based on Elon Musk’s tweets. The goal was to identify patterns that influence the number of likes and retweets a tweet receives. The dataset contained over 3,000 tweets, including metrics like retweets, likes, and sentiment scores.
I cleaned the data by removing short tweets, stop words, and irrelevant characters, then extracted features such as hashtags, mentions, sentiment score, and time of day. I used linear regression, random forest, and support vector regression to predict the number of likes from these features.
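The feature extraction step can be sketched roughly as follows. This is an illustration with a made-up tweet, not the project's pipeline; the feature names are my own, and sentiment scoring is left out because it would need a separate lexicon or library.

```python
import re
from datetime import datetime


def tweet_features(text, posted_at):
    """Count hashtags, mentions, and words, and record the posting hour."""
    return {
        "n_hashtags": len(re.findall(r"#\w+", text)),
        "n_mentions": len(re.findall(r"@\w+", text)),
        "n_words": len(text.split()),
        "hour": posted_at.hour,
    }


# Made-up example tweet and timestamp.
features = tweet_features(
    "Testing new build #demo @example_user", datetime(2022, 1, 5, 14, 30)
)
```

Each tweet becomes one row of numeric features that a regression model can take as input.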
The linear regression model gave the best fit, with an R² score of 0.87, and sentiment score was the most important predictor of engagement. This project demonstrates my ability to conduct text mining, engineer features, and build predictive models to derive actionable insights from social media data.
Click here to access full document.
Market Segmentation Analysis Using Clustering Techniques
In this project, I applied hierarchical and k-means clustering techniques to segment customers of a restaurant chain based on their preferences and behaviors. The goal was to identify distinct customer segments to inform targeted marketing strategies as the restaurant looked to expand in Belfast.
After cleaning and preparing data from a survey of 1,000 customers, I selected 14 key variables for analysis, including average order size, food quality, location, service, and pricing. Hierarchical clustering identified three distinct customer segments: value-conscious, quality-conscious, and technology-savvy. I validated these results using k-means clustering, which confirmed the segments' characteristics.
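For readers unfamiliar with k-means, here is a minimal Python sketch of the assignment and update steps (Lloyd's algorithm). The starting centroids are passed in explicitly to keep the example deterministic, and the two-dimensional points are toy values, not survey data.

```python
def kmeans(points, centroids, n_iter=10):
    """Plain k-means on tuples, starting from the given centroids."""
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Toy points forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(1, 1), (8, 8)])
```

In practice the result depends on the starting centroids, which is one reason to cross-check the segments against a second method such as hierarchical clustering.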
The analysis showed that the value-conscious segment was driven by reasonable pricing, while the quality-conscious segment valued food and service quality. The technology-savvy segment was interested in innovation and restaurant technology. These insights helped provide actionable recommendations for the restaurant’s marketing strategies.
Click here to access full document.
Laptop Product Design Using Conjoint Analysis and Perceptual Maps
In this project, I performed a conjoint analysis to understand customer preferences for laptop attributes such as brand, hard drive capacity, RAM, screen size, and price. A dataset containing 20 product profiles and survey responses from 132 consumers was analyzed to determine the part-worths of each attribute and to estimate market share for various laptop configurations.
The conjoint analysis revealed that RAM size was the most influential factor in consumer decision-making, followed by screen size and price. Brands like Apple and Lenovo were found to have a significant influence on customer preferences, with Apple expected to capture the majority market share. Additionally, I used Principal Component Analysis (PCA) to create perceptual maps, visualizing the relationships between different laptop brands and attributes.
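The link between part-worths, attribute importance, and estimated share can be sketched as below. The part-worth values are hypothetical, chosen only for illustration, and the logit share-of-preference rule is one common choice that I am assuming here rather than the specific rule used in the project.

```python
import math

# Hypothetical part-worths for illustration; not the values estimated in the project.
part_worths = {
    "ram": {"8GB": -1.0, "16GB": 1.0},
    "screen": {"13in": -0.5, "15in": 0.5},
    "price": {"high": -0.5, "low": 0.5},
}


def attribute_importance(pw):
    """Importance = an attribute's part-worth range as a share of all ranges."""
    ranges = {a: max(lv.values()) - min(lv.values()) for a, lv in pw.items()}
    total = sum(ranges.values())
    return {a: r / total for a, r in ranges.items()}


def utility(profile, pw):
    """Total utility of a profile = sum of the part-worths of its levels."""
    return sum(pw[attr][level] for attr, level in profile.items())


def logit_shares(profiles, pw):
    """Share-of-preference (multinomial logit) rule over competing profiles."""
    exp_u = [math.exp(utility(p, pw)) for p in profiles]
    return [e / sum(exp_u) for e in exp_u]


importance = attribute_importance(part_worths)
shares = logit_shares(
    [
        {"ram": "16GB", "screen": "15in", "price": "high"},
        {"ram": "8GB", "screen": "13in", "price": "low"},
    ],
    part_worths,
)
```

With these made-up numbers, RAM has the widest part-worth range and therefore the highest importance, which mirrors the kind of ranking reported above.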
This project demonstrates my ability to use advanced analytics techniques to inform product development and segmentation strategies.
Click here to access full document.
Current and Future Benefits of HR Analytics to Firm Performance
In this article, I explored the strategic role of Human Resource (HR) analytics in enhancing organizational performance. HR analytics is a data-driven approach to workforce management, using metrics and predictive models to support decision-making in areas like employee retention, recruitment, and productivity.
The article highlights the core pillars of HR analytics, including descriptive, diagnostic, predictive, and prescriptive analytics, and shows how HR analytics can help businesses optimize employee performance, reduce attrition, and improve recruitment quality. I also discussed case studies of successful companies like Google, Microsoft, and Walmart, demonstrating how HR analytics has transformed their operations by integrating data into their HR functions.
This article underscores the growing importance of HR analytics in building data-driven strategies to boost employee satisfaction, productivity, and overall organizational success.
Click here to access full document.
Optimizing Workforce and Logistics Using Data-Driven Models
In this project, I tackled three distinct optimization problems using data-driven decision-making models and R programming:
Nurse Planning Problem: The objective was to minimize the cost of scheduling full-time and part-time nurses across a week while meeting daily staffing requirements. The model incorporated cost variations for weekdays, weekends, and the maximum allowable part-time nurses. I used linear programming to determine the optimal number of nurses required per day, minimizing costs to €28,015 per week.
Chipset Logistics Optimization: This problem focused on minimizing the shipping cost of chipsets between fabrication plants and distribution centers. I formulated a linear program to optimize the number of chipsets to be sent across 26 different links, achieving a total shipping cost of €36,400,000.
Ocean Internet Cables Production: In this problem, I optimized the production schedule of two types of internet cables across two plants over three months. The model aimed to maximize profit while meeting demand and adhering to production capacity constraints. Unfortunately, the linear programming model yielded no feasible solution due to conflicting constraints.
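To give a feel for how such a model is formulated, here is a toy staffing problem in Python. All numbers are assumed for illustration and are unrelated to the project's data, and the sketch enumerates integer plans by brute force rather than calling a linear programming solver (the project itself used R).

```python
from itertools import product

# Toy numbers, assumed for illustration; they are not the project's data.
FT_HOURS, PT_HOURS = 8, 4        # hours covered per nurse
FT_COST, PT_COST = 100, 45       # cost per nurse per day
REQUIRED_HOURS = 40              # daily coverage requirement
MAX_PART_TIME = 3                # cap on part-time nurses


def solve_staffing(max_full_time=10):
    """Enumerate integer staffing plans and return the cheapest feasible one."""
    best = None
    for ft, pt in product(range(max_full_time + 1), range(MAX_PART_TIME + 1)):
        if ft * FT_HOURS + pt * PT_HOURS < REQUIRED_HOURS:
            continue  # violates the coverage constraint
        cost = ft * FT_COST + pt * PT_COST
        if best is None or cost < best[0]:
            best = (cost, ft, pt)
    return best  # None would mean no plan satisfies the constraints


best_plan = solve_staffing()  # (cost, full_time, part_time)
```

The same structure (a cost to minimize, plus coverage and capacity constraints) carries over to a real solver, and a `None` result here corresponds to the infeasible case described for the cable production problem.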
These projects demonstrate my ability to apply optimization techniques, such as linear programming, to solve real-world problems in workforce scheduling, logistics, and production planning.
Click here to access full document.
Insurance Data Management and Quality Assessment
In this project, I used SQL, R, and Microsoft Access to manage and analyze customer data for an insurance company. The goal was to clean, organize, and analyze datasets related to motor, health, and travel insurance policies, enabling the company to refine its marketing strategies.
I imported and linked datasets in MS Access to create an analytics base table using SQL queries. I then performed data quality checks in R, addressing issues like missing values, inconsistencies, and invalid data. For example, I corrected typos in categorical fields like gender and eliminated outliers in the age column.
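The kind of checks described above can be sketched as follows. The project used R and SQL; this Python version is only an illustration, and the typo mapping, age thresholds, and sample records are hypothetical.

```python
# Hypothetical mapping and thresholds; the project's actual rules are not shown here.
GENDER_FIXES = {
    "m": "Male",
    "male": "Male",
    "f": "Female",
    "female": "Female",
    "femal": "Female",
}


def clean_gender(value):
    """Normalize case and whitespace, then map known variants; unknowns become None."""
    return GENDER_FIXES.get(value.strip().lower())


def valid_age(age, low=18, high=100):
    """Treat ages outside a plausible range as invalid."""
    return low <= age <= high


# Made-up records containing one typo and one impossible age.
records = [{"gender": " femal", "age": 34}, {"gender": "M", "age": 210}]
cleaned = [
    {"gender": clean_gender(r["gender"]), "age": r["age"]}
    for r in records
    if valid_age(r["age"])
]
```

Mapping unknown values to `None` rather than guessing keeps unresolved entries visible for manual review.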
The analysis revealed key insights, such as the relationship between gender and claims, where females had a higher number of claims, suggesting targeted marketing strategies for renewal periods. I also explored the correlation between age groups and health policies, helping the company identify opportunities to market policies more effectively to younger customers.
This project highlights my ability to manage large datasets, perform data quality assessments, and derive actionable insights for business strategies.
Click here to access full document.
Big Data Applications in Netflix's Personalization and Content Delivery
This article examines how Netflix leverages big data to personalize recommendations and enhance its content delivery strategies. Netflix collects vast amounts of structured and unstructured data, including user demographics, viewing habits, and metadata about content. By applying machine learning algorithms and predictive analytics, Netflix tailors its recommendations to individual users based on their interests and viewing history.
In addition to content recommendations, Netflix utilizes big data to optimize its content delivery network, ensuring smooth streaming experiences worldwide. Cloud computing platforms like AWS, distributed systems, and content delivery networks (CDNs) form the technical infrastructure behind Netflix's scalability and high performance.
This analysis highlights the key role of big data in Netflix’s success, from personalized recommendations to efficient content delivery, contributing to its competitive edge in the streaming industry.
Click here to access full document.
Healthcare Worker Technology Tendency Analysis Using Machine Learning
In this project, I analyzed a dataset of over 79,000 healthcare workers from three U.S. states (OH, NV, PA) to predict their technology adoption tendencies. The data included information such as primary specialty, gender, graduation year, and state. The goal was to identify the factors influencing healthcare workers' tendency to adopt new technologies.
I cleaned and preprocessed the data, addressing missing values, duplicates, and outliers. I applied several classification models, including logistic regression, linear discriminant analysis (LDA), and Naive Bayes, to predict technology adoption from key features such as primary specialty, and compared the models on prediction accuracy. I also used K-fold cross-validation to validate model performance.
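K-fold cross-validation splits the data into k parts and uses each part once as the test set. The sketch below shows only the index bookkeeping, in plain Python; it is an illustration rather than the project's code, and in practice the rows would usually be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
```

Averaging the accuracy across folds gives a steadier estimate than a single train/test split.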
The logistic regression model achieved an accuracy of 68%, with insights revealing that healthcare workers in Family Medicine and Dermatology were more inclined to adopt new technologies compared to other specialties.
This project demonstrates my ability to handle large datasets, apply machine learning models, and generate actionable insights from healthcare data.
Click here to access full document.
Airbnb Price Prediction Using Machine Learning Models
In this project, I analyzed Airbnb listings from Athens, Bologna, and Copenhagen to predict property rental prices. The dataset included 5,867 observations and 49 variables related to property attributes, host details, and location. The goal was to develop a model that could accurately predict property prices based on various features.
I applied three regression models: LASSO Regression, Random Forest, and Support Vector Regression (SVR). After cleaning the data and addressing outliers, I conducted exploratory data analysis, examining the relationships between price and key variables such as the number of bathrooms, guest capacity (accommodates), and the host's number of listings.
The Random Forest model performed best, with an RMSE of 67.58 and an R-squared value of 0.36, meaning it explained approximately 36% of the variance in prices. The analysis showed that the number of bathrooms, guest capacity, and the host's number of listings were significant factors in predicting prices.
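For reference, the two metrics quoted above can be computed as in this minimal Python sketch. The function names are my own, and the inputs in any usage would be actual and predicted prices.

```python
def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot: share of variance in y_true explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

RMSE is read in price units, while R-squared is unitless, so the two together describe both the typical error size and how much of the price variation the model captures.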
This project showcases my ability to perform data cleaning and feature engineering and to apply and compare machine learning models for property price prediction.
Click here to access full document.
Implications of Artificial Intelligence in Lethal Autonomous Weapon Systems (LAWS)
In this blog, I explored the diverse implications of artificial intelligence (AI) in Lethal Autonomous Weapon Systems (LAWS).
I discussed the potential risks and advantages of AI-powered autonomous weapon systems, including ethical concerns regarding accountability, human oversight, and the impact on international humanitarian law.
The blog examines how AI may shape the future of warfare by reducing human involvement, and it raises critical questions about moral and legal responsibility.
Click here to access full document.
Implications of Artificial Intelligence in Leadership and Management
In this blog, I explored the diverse implications of artificial intelligence (AI) in leadership and management.
It covers the challenges faced by managers and leaders as AI technology transforms traditional business models.
The blog highlights how organizations can leverage AI to gain competitive advantages while also emphasizing the need for leadership adaptability in a rapidly evolving technological landscape.
Click here to access full document.
Implications of Artificial Intelligence in Mass Surveillance
In this blog, I explored the diverse implications of artificial intelligence (AI) in mass surveillance.
I analyzed the growing use of AI in mass surveillance, weighing its benefits against its ethical challenges. AI's ability to enhance data collection and decision-making raises significant privacy and security concerns, particularly regarding employment, human rights, and social equality.
This blog illustrates the far-reaching implications of AI in modern society, emphasizing both its transformative potential and the need for ethical safeguards.
Click here to access full document.