Projects
FestMart Sales Analysis Project
Objective: Provide a dynamic BI dashboard to present FestMart’s overall sales, profitability, and category-level performance, highlighting the influence of discounts and regional trends on revenue and profit outcomes.
Approach: Implemented advanced drill-down and drill-through functionalities for seamless navigation between sales, profit, and order metrics. Employed data ingestion and cleaning processes alongside interactive visualisations to ensure insightful, user-friendly reporting.
Analysis: Recorded total sales of £2.30 million and profits of £286.40 thousand, led by strong Technology sales. Although the West and East regions dominated in volume, the South region revealed higher profit margins despite lower sales. Notably, minimal discounts correlated with higher profits, and seasonal peaks were uncovered, guiding inventory and marketing strategies.
Outcome: Recommended refining discount policies, particularly where deeper discounts reduce margins, while capitalising on high-margin opportunities in the South. Proposed leveraging identified seasonal peaks for timely marketing and inventory decisions. Advised re-evaluating the Central region’s furniture segment to address potential losses and boost profitability.
Festman Electronics Sales Performance Dashboard
Objective: Provide a comprehensive analysis of Festman Electronics’ global sales data, emphasising revenue, profit, brand performance, and seasonal trends to inform strategic decision-making.
Approach: Developed a Power BI dashboard equipped with advanced analytics to consolidate key metrics. The process involved data ingestion, cleaning, and creating interactive reports spanning multiple product categories, subcategories, and brands.
Analysis: Identified top-performing categories such as Computers, Home Appliances, and TV & Video. Discovered recurring seasonal spikes in January and February. Compared online and in-store sales channels, revealing a slightly higher Average Order Value for in-store transactions.
Outcome: Highlighted growth opportunities in product expansion and marketing campaigns, particularly around seasonal peaks. Recommended bolstering online promotions to close the spending gap between in-store and digital customers.
Deep Learning for Personalised Book Recommendation System
Objective: Develop a robust book recommendation system leveraging advanced deep learning methods.
Approach: Utilised collaborative filtering and transformer architectures to deliver personalised suggestions.
Dataset: Performed thorough data quality checks on the Bookcrossing dataset, ensuring high integrity for model training.
Outcome: Refined the system through experimentation with various models, selecting the most effective solution and gaining valuable insights for continued AI research.
Analysis on WeRateDog Tweets
Objective: Examine and visualise trends within the WeRateDog Twitter dataset.
Approach: Resolved data-quality and tidiness issues to ensure dataset accuracy.
Analysis: Executed detailed data transformations and visualisations to uncover meaningful patterns.
Outcome: Contributed to the study of social media trends by revealing critical insights from the data.
Student Performance Evaluation
Objective: Investigate the primary factors influencing student performance.
Approach: Extracted and transformed data from over 600 features, focusing on 29 key indicators for concise analysis.
Analysis: Employed multivariate visualisation to examine how parental engagement affects student outcomes.
Outcome: Offered actionable insights to enhance educational strategies, demonstrating mastery of data wrangling and statistical analysis.
The Movies Database Analysis
Objective: Explore and analyse pivotal trends in a comprehensive movies database.
Approach: Drafted more than eight targeted questions to guide the project’s analytical goals.
Dataset Preparation: Undertook extensive data cleaning and wrangling to address quality and tidiness, preserving coherent and reliable data.
Outcome: Effectively answered all research queries and presented results through a variety of visualisations, yielding clear, in-depth insights.
Punch News Web Scraper
Objective: Streamline the collection of the latest Punch news articles by crawling relevant URLs, capturing titles, content, and source links in a single repository.
Approach: Developed a Scrapy-based crawler configured to store data in different formats (text file, CSV) and deploy seamlessly to Scrapyhub. All dependencies are managed through a minimal installation, requiring only Scrapy.
Analysis: Validated the extraction process across multiple news pages to ensure complete and accurate article retrieval. Emphasised maintaining a consistent structure that facilitates reuse and further development.
Outcome: Simplified the acquisition of Punch news content for timely reference and analysis, offering a modular and easily deployable solution for both local setups and scalable production environments.
BehindTheName.com Global Names Scraper
Objective: Gather user-submitted names, genders, locations, and country-specific descriptions from behindthename.com to create a comprehensive dataset reflecting diverse naming practices worldwide.
Approach: Leveraged the Scrapy framework to systematically crawl behindthename.com. The initial version targeted Eastern Africa, while the updated iteration expanded to all available countries, capturing each region’s specific information and descriptive text.
Outcome: Compiled a structured, scalable dataset of global name submissions, laying the groundwork for deeper analysis of cultural naming trends across numerous regions.
Bank Institution Term Deposit Prediction
Objective: Construct a predictive model to assess whether new bank clients would opt for a term deposit, guiding data-driven marketing and engagement efforts.
Approach: Carried out data preprocessing with one-hot encoding, outlier handling, and scaling (e.g., MinMaxScaler, StandardScaler). Deployed dimensionality reduction techniques (t-SNE, autoencoders, PCA) to boost model performance. Employed a 90/10 train/test split alongside K-Fold and Stratified K-Fold cross-validation.
Analysis: Compared diverse machine learning algorithms—Logistic Regression, XGBoost, Multilayer Perceptron, SVM, Decision Trees, and Random Forest—using metrics such as ROC, F1, and Accuracy. Investigated performance variations under Stratified K-Fold to address bias from improper data splitting.
Outcome: Identified the three most effective models, emphasising the significance of feature engineering, scaling, and rigorous validation approaches. Published a Medium article covering key findings and made the complete codebase and documentation available on GitHub.
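The preprocessing and validation pipeline described above can be sketched as follows. The column names and synthetic data are placeholders, not the actual bank dataset; the pipeline shows one-hot encoding, scaling, and Stratified K-Fold cross-validation as used in the project.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative stand-in for the bank marketing data
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 90, 500),
    "balance": rng.normal(1000.0, 300.0, 500),
    "job": rng.choice(["admin", "technician", "services"], 500),
})
y = rng.integers(0, 2, 500)

# One-hot encode categoricals, scale numerics
pre = ColumnTransformer([
    ("num", StandardScaler(), ["age", "balance"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["job"]),
])
model = Pipeline([("pre", pre), ("clf", LogisticRegression(max_iter=1000))])

# Stratified K-Fold preserves the class ratio in every fold,
# avoiding the bias from improper splits noted in the analysis
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, df, y, cv=cv, scoring="f1")
```

Swapping `LogisticRegression` for XGBoost, SVM, or Random Forest inside the same pipeline is how the model comparison stays fair: every candidate sees identically preprocessed folds.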
Zimnat Insurance Recommendation Challenge
Objective: Build a machine learning model for Zimnat to suggest additional insurance products that existing customers are likely to require, enhancing targeted product offerings.
Approach: Analysed data from roughly 40,000 customers—each holding at least two insurance products—and reframed the task as one of finding the "missing" product for each customer. Explored a range of classification and recommendation systems to balance accuracy with scalability.
Analysis: Investigated demographic factors, purchasing behaviour, and product ownership combinations. Assessed model performance on a 10,000-customer test sample by deliberately omitting one product per customer, illustrating the model’s cross-selling potential.
Outcome: Devised a reliable recommendation system, giving Zimnat data-driven advice on refining product diversification and marketing efforts in the Zimbabwean insurance market.
Rossmann Sales Prediction
Objective: Forecast a six-week sales window for over 1,100 Rossmann stores across Germany to enhance stock control and staffing decisions.
Approach: Unified and cleansed historical sales data, store attributes, and temporal elements such as promotions and holidays. Trialled an array of forecasting methods (e.g., regression and ensemble models) to capture store-specific and seasonal trends. Collaboratively refined models within the 10 Academy – Team Harar Jugol.
Analysis: Performed exploratory data analysis to detect patterns in day-to-day and weekly sales, noting store-level variations. Engineered temporal features (day-of-week, promotional periods) to increase predictive accuracy. Validated the final models against competition benchmarks through team coordination.
Outcome: Generated reliable sales forecasts, storing all code, models, and figures in a shared repository for reproducibility. Demonstrated how precise sales predictions can inform strategic promotions and workforce allocation in the retail sector.
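The temporal feature engineering mentioned above can be illustrated with a small pandas sketch; the rows and column names here are an assumed slice, not the actual Rossmann training file.

```python
import pandas as pd

# Illustrative slice of historical sales data (one row per store-day)
sales = pd.DataFrame({
    "Store": [1, 1, 2],
    "Date": pd.to_datetime(["2015-07-29", "2015-07-30", "2015-07-30"]),
    "Promo": [0, 1, 1],
    "Sales": [5263, 6064, 8314],
})

# Engineered temporal features: day-of-week, month, weekend flag
sales["DayOfWeek"] = sales["Date"].dt.dayofweek  # Monday = 0
sales["Month"] = sales["Date"].dt.month
sales["IsWeekend"] = sales["DayOfWeek"].isin([5, 6]).astype(int)
```

Features like these let a regression or ensemble model separate store-specific weekday patterns from promotion-driven spikes.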
Water Analysis Project
Objective: Examine water usage across multiple locations and measure tap flow, aiming to inform resource management and conservation strategies.
Approach: Implemented a dedicated Python module (Data class) to streamline data handling and analysis. Allowed direct method invocation from the module for a more efficient workflow than a traditional Jupyter Notebook. Incorporated built-in Python documentation features (?Data, help(Data)) for real-time method references.
Analysis: Reviewed water resource allocation and utilisation at each location. Assessed tap flow rates to detect consumption patterns and pinpoint opportunities for optimised usage. Employed modular methods for data cleaning, transformation, and visual exploration.
Outcome: Delivered a flexible, module-based framework for water usage analytics. This self-contained solution simplifies adaptation to new datasets or requirements, supporting ongoing research and evidence-based decision-making in water resource management.
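A minimal sketch of the module-based Data class pattern described above; the method names and columns are illustrative, but the docstrings show how `help(Data)` and IPython's `?Data` provide real-time method references.

```python
import pandas as pd


class Data:
    """Self-documenting helper for water-usage analysis.

    Docstrings like this one are what help(Data) and IPython's
    ?Data surface as real-time method references.
    """

    def __init__(self, frame: pd.DataFrame):
        self.frame = frame

    def clean(self) -> "Data":
        """Drop rows with missing readings, returning self for chaining."""
        self.frame = self.frame.dropna()
        return self

    def mean_flow(self, location: str) -> float:
        """Average tap flow rate (litres per minute) for one location."""
        rows = self.frame[self.frame["location"] == location]
        return rows["flow_lpm"].mean()


# Usage: chainable cleaning, then direct method invocation
d = Data(pd.DataFrame({
    "location": ["A", "A", "B"],
    "flow_lpm": [5.0, 7.0, None],
})).clean()
```

Keeping the methods on one class means new datasets only need to match the expected columns to reuse the whole analysis workflow.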
Autism Project
Objective: Use exploratory data analysis and machine learning to predict autism status in children, focusing on features that distinguish autistic from non-autistic participants.
Approach: Investigated univariate statistics (e.g., box plots) for the first four attributes to judge their ability to separate autism and non-autism classes. Analysed data distributions across Australia, Germany, Italy, and India via bar charts, with discussions on alternative visual approaches. Built and tested a linear regression model using the two most indicative attributes (score, score2) to forecast autism status.
Analysis: Found that India constituted the largest segment of children in the dataset. Noted higher standard and alternative test scores for autistic children compared to non-autistic peers. Highlighted elevated testing costs for autistic children, suggesting potential socio-economic implications.
Outcome: Demonstrated the predictive value of key features and validated the utility of both simple (bar charts, box plots) and more advanced analytics. Established a springboard for refining models and further exploring autism-related variables in paediatric studies.