Vendor Performance Data Analytics End-To-End Project | SQL + Python + Power BI + Reporting (ENG-SUB)
Understanding Real-World Data Analytics Projects
Importance of Advanced Projects
- The project discussed is not for beginners; it reflects company-standard practices used by top data analysts.
- Common beginner projects (like basic EDA, SQL dashboards in Power BI or Tableau) do not prepare candidates for real job opportunities.
- Recruiters prefer seeing actual company-level projects on resumes rather than sample projects.
Challenges Faced by Aspiring Analysts
- Many individuals are unaware of the types of projects that impress recruiters and how to integrate multiple skills/tools (SQL, Python, Power BI).
- There is a misconception that basic projects will suffice in job interviews.
- Analysts often struggle to extract meaningful insights relevant to specific business problems.
Project Overview and Structure
- The project will follow a structured approach starting with defining a business problem, which is crucial in corporate settings.
- A dataset stored in a database will be explored using SQL to understand its structure and identify valuable data points for analysis.
Data Processing Steps
- The process includes merging and cleaning data from various tables to create an aggregated final table for further analysis.
- This aggregated table will be saved back into the database, reflecting common practices where companies store data across multiple tables.
ETL Pipeline and Analysis
- The project will cover the complete ETL pipeline: extracting, transforming, and loading data at scheduled intervals.
- After creating the final dataframe in Jupyter Notebook using Python, exploratory data analysis (EDA), cleaning, and research question formulation based on the business problem will occur.
Visualization and Reporting
- A Power BI dashboard will be created based on insights derived from Python analyses.
- Report writing is emphasized as essential; analysts must present findings effectively to stakeholders through reports and presentations.
Defining the Business Problem
Focus on Inventory Management
- The project's central business problem revolves around effective inventory and sales management critical for profitability in retail/wholesale industries.
Vendor Performance Analysis
- Using vendor data (e.g., from Amazon), the project aims to analyze purchasing prices versus selling prices to assess vendor performance.
Analysis of Brand Performance and Vendor Relationships
Identifying Underperforming Brands
- The primary goal is to identify underperforming brands that need promotional and pricing adjustments. This includes recognizing brands with low sales, similar to how Amazon uses sponsored ads or price reductions to boost visibility and sales.
Top Vendors and Their Impact
- It is essential to determine which top vendors contribute significantly to sales and gross profit. Strong relationships with these vendors can yield more benefits for the company. Understanding who these vendors are is crucial for strategic planning.
Analyzing Bulk Purchasing Effects
- The analysis will include examining how bulk purchasing affects unit costs, specifically how prices change when products are bought in larger quantities. This insight helps in negotiating better deals with suppliers.
Inventory Management Insights
- Assessing inventory turnover rates is vital for reducing holding costs and improving operational efficiency. This involves measuring how much inventory remains unsold versus what has been sold over a specific period.
Profitability Variance Investigation
- Investigating the profitability variance between high-performing and low-performing vendors will highlight areas for improvement. By identifying successful vendors, strategies can be developed to enhance relationships or address shortcomings with less effective ones.
Data Handling Techniques
Data Storage Strategy
- The project involves creating a report using SQL, Python, and dashboarding tools by storing various CSV files (e.g., inventory data, purchase prices) into a database system tailored for this analysis rather than relying on pre-existing company databases.
Database Creation Process
- A SQLite database named "inventory" will be created where all relevant CSV files will be imported as tables after reading them from the designated directory using Python libraries like Pandas and SQLAlchemy for efficient data handling.
Functionality of Data Ingestion
- A function called `df_ingest_db` will be established to automate inserting data frames into the database tables, ensuring that existing data can either be replaced or appended based on requirements during continuous updates from CSV files.
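A minimal sketch of this ingestion step, assuming a SQLite file named `inventory.db` and CSV filenames of my own choosing (the function is shortened here to `ingest_db`; the video's exact identifiers may differ):

```python
# Sketch of the CSV-to-SQLite ingestion step; filenames and the function
# name `ingest_db` are assumptions, not the video's exact code.
import os
import sqlite3
import pandas as pd

def ingest_db(df: pd.DataFrame, table_name: str, conn) -> None:
    """Write a dataframe into the database, replacing any existing table."""
    df.to_sql(table_name, conn, if_exists="replace", index=False)

conn = sqlite3.connect("inventory.db")

# Each CSV in the data directory becomes one table named after the file.
data_dir = "data"
os.makedirs(data_dir, exist_ok=True)
pd.DataFrame({"Brand": [1, 2], "Price": [9.41, 12.0]}).to_csv(
    os.path.join(data_dir, "purchase_prices.csv"), index=False
)
for file in os.listdir(data_dir):
    if file.endswith(".csv"):
        df = pd.read_csv(os.path.join(data_dir, file))
        ingest_db(df, file.removesuffix(".csv"), conn)

print(pd.read_sql("SELECT COUNT(*) AS n FROM purchase_prices", conn)["n"][0])
```

Using `if_exists="replace"` refreshes a table on each run; switching it to `"append"` covers the continuous-update case mentioned above.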
Scripting Automation Importance
- Emphasizing the necessity of scripting in professional environments, scripts can automate regular data ingestion processes (e.g., every 15 minutes), ensuring timely updates without manual intervention, thus enhancing operational efficiency in managing large datasets.
Challenges with Large Datasets
- When dealing with extensive datasets during SQL queries, performance issues may arise; therefore, optimization techniques must be considered to ensure smooth operations when retrieving or manipulating large volumes of data within the database system being utilized.
Understanding Data Ingestion and Logging in Python
Importance of Database Management
- The speaker emphasizes the significance of database management, noting that simply using Kaggle data for projects won't guarantee job opportunities. Understanding how to optimize large datasets is crucial.
- It is highlighted that handling large datasets requires inserting them into a database first, rather than directly analyzing from Excel.
Script Structure and Logging
- The initial script for data insertion is discussed, with a focus on maintaining proper coding structure by encapsulating code within functions.
- The concept of logging is introduced as a method to track program execution, including warnings and errors. This helps identify which steps are completed or where issues arise.
Implementing Logging in Scripts
- A structured directory for logs and project data is recommended. Logs should be stored in a dedicated folder to maintain organization.
- Basic configuration for logging is explained, including setting file names, log levels (e.g., debug), and formatting messages with timestamps.
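The logging setup described above can be sketched roughly like this (the folder and file names are assumptions; `force=True` simply resets any handlers a previous run may have left behind):

```python
import logging
import os

# Keep logs in a dedicated folder, as recommended above.
os.makedirs("logs", exist_ok=True)

logging.basicConfig(
    filename="logs/ingestion_db.log",   # assumed log file name
    level=logging.DEBUG,                # capture debug messages and above
    format="%(asctime)s - %(levelname)s - %(message)s",
    force=True,                         # replace any previously set handlers
)

logging.info("Ingestion started")
logging.error("Example error message")
```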
Function Creation and Time Tracking
- The process of creating functions for loading raw data into the database is outlined. Instead of print statements, logging methods like `info` and `error` are suggested for better monitoring.
- Time tracking during ingestion processes is discussed; start and end times are recorded to calculate total ingestion time effectively.
Finalizing the Script
- The importance of documenting functions with docstrings is emphasized. This provides clarity on what each function does for future reference.
- A complete function structure includes error handling through try-except blocks, although this was not elaborated upon in detail during the discussion.
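Putting these pieces together, a documented ingestion function with timing and a try-except block might look like this (names and the demo CSV are illustrative, not the video's exact code):

```python
# Sketch of a documented ingestion function with timing and error handling;
# function and table names are assumptions.
import logging
import sqlite3
import time
import pandas as pd

logging.basicConfig(level=logging.INFO, force=True)

def load_raw_data(csv_path: str, table_name: str, conn) -> float:
    """Load one CSV into the database and return the ingestion time in seconds.

    Progress is reported with logging.info; failures with logging.error.
    """
    start = time.time()
    try:
        df = pd.read_csv(csv_path)
        df.to_sql(table_name, conn, if_exists="replace", index=False)
        logging.info("Ingested %s into %s", csv_path, table_name)
    except Exception:
        logging.error("Failed to ingest %s", csv_path, exc_info=True)
        raise
    total = time.time() - start
    logging.info("Total ingestion time: %.2f seconds", total)
    return total

# Demo with a tiny in-memory database.
conn = sqlite3.connect(":memory:")
pd.DataFrame({"SalesQuantity": [1, 2, 3]}).to_csv("sales_demo.csv", index=False)
elapsed = load_raw_data("sales_demo.csv", "sales", conn)
```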
Running the Script
- Finally, the speaker mentions calling the main function within the Python script file `ingestion_db.py`, indicating readiness to run it from a notebook environment while demonstrating how logs will be generated.
Database Creation and Data Exploration
Overview of Database Creation
- A file has been created in the database log, showing the insertion of a CSV file into the database.
- The last table contains over 1 crore (10 million) records, leading to a longer processing time; the total time for the complete process exceeded 8 minutes.
- The script for data creation has been successfully completed.
Initial Data Exploration
- The next step involves exploring the types of values present in various tables within the database relevant to business problems such as vendor performance, pricing strategies, and inventory management.
- This phase is termed exploratory data analysis (EDA), focusing on identifying necessary columns for final data extraction.
Database Connection and Table Inspection
- Importing pandas and establishing a connection to the 'inventory' database using `sqlite3.connect`.
- A review of existing tables reveals names like purchases, purchase prices, vendor invoice, begin inventory, end inventory, and sales.
Querying Table Records
- SQL queries are utilized to count records from each table; an example query is `SELECT COUNT(*) FROM purchases`.
- By applying limits in queries, only top five records are displayed initially to understand column types and values.
Displaying Data Frame Information
- For each table iterated through, record counts are printed alongside initial values from each column.
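The inspection loop can be sketched as follows, using toy tables in place of the real inventory database:

```python
# Sketch of the table-inspection loop: list tables, count records, preview
# the top rows. The table contents are toy stand-ins.
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
pd.DataFrame({"Brand": [1, 2, 3]}).to_sql("purchases", conn, index=False)
pd.DataFrame({"Brand": [1]}).to_sql("sales", conn, index=False)

# sqlite_master lists every user table in a SQLite database.
tables = pd.read_sql(
    "SELECT name FROM sqlite_master WHERE type = 'table'", conn
)["name"].tolist()

counts = {}
for table in tables:
    n = pd.read_sql(f"SELECT COUNT(*) AS n FROM {table}", conn)["n"][0]
    counts[table] = n
    print(f"{table}: {n} records")
    print(pd.read_sql(f"SELECT * FROM {table} LIMIT 5", conn))
```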
- The structure of the purchase table includes key fields such as inventory ID, store number, brand number, vendor details, purchase order dates, prices, quantities, etc.
Insights on Purchase Prices and Inventory Tables
- In the purchase prices table with 12k records: it includes brand descriptions and pricing information but lacks continuous vendor entries throughout the year.
- Both beginning and ending inventory tables provide yearly snapshots but do not contribute significantly to ongoing business problem analysis due to limited temporal coverage.
Sales Table Analysis
- The sales table captures essential metrics including inventory ID, store details, sales quantity/dollars/prices along with excise tax information.
Vendor Sales and Purchase Data Analysis
Overview of Vendor Data Tables
- The analysis begins with an exploration of vendor sales and purchase data, focusing on how much has been sold by each brand, including sales dollars and prices derived from the sales table.
- A specific vendor (ID: 466) is chosen to illustrate how information is stored across different tables. Filtering the first table reveals 2000 records related to this vendor.
- Further filtering shows three records in the Purchase Prices table, around 55 records in the Vendor Invoice table, and 9453 records in the Sales table for the selected vendor.
Detailed Examination of Purchase Data
- In the purchases table, both brand names and purchase prices are repeated. Grouping by brand allows for summation of quantities and dollar amounts to analyze total purchases.
- The purchase price for a specific brand (ID: 5215) is noted as $9.41, along with details about quantity purchased and dates from the purchase table.
Insights from Vendor Invoice Table
- The Vendor Invoice table includes unique purchase order numbers (PO numbers), which are confirmed to be unique across all entries for this vendor.
- PO numbers are derived from the Vendor Invoice, ensuring that each entry maintains uniqueness based on both PO number and vendor.
Sales Data Analysis
- The sales table captures actual transactions detailing brands sold by vendors, including quantities sold, selling prices, and revenue earned. Grouping by brand allows comparison with previous purchase data.
- For example, comparing sales data for brand ID 5215 shows significant profits; this brand has performed well in terms of sales value across various dates.
Summary Points on Data Structure
- Key points regarding data structure include:
- The Purchases Table contains actual purchase data such as date, amount paid in dollars, and quantity purchased.
- The Purchase Prices Table provides product-wise actual prices at which vendors bought items.
- The Vendor Invoice aggregates data summarizing quantities and dollar amounts while maintaining uniqueness based on PO number per vendor.
Need for Aggregated Summary Table
- An aggregated summary table is necessary due to distributed data across multiple tables. This final summary should include:
- Purchase transactions made by vendors
- Sales transaction data
- Freight costs associated with deliveries
- Freight cost refers to delivery expenses incurred during transportation. A comprehensive query will be developed to extract summaries from different tables effectively.
Freight Summary and Vendor Analysis
Overview of Freight Summary
- The discussion begins with the creation of a freight summary using SQL queries to extract data from the vendor invoice table.
- Key columns identified in the vendor invoice include vendor number, vendor name, invoice date, PO number, and PO date. The focus is on grouping by vendor number to calculate total freight costs.
- A SQL query is constructed to select the vendor number and sum of freight costs from the vendor invoice table, resulting in 126 unique vendors.
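A sketch of that freight summary query, assuming `VendorNumber` and `Freight` as the column names in the vendor invoice table (the data is synthetic):

```python
# Sketch of the freight summary: total freight cost per vendor.
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
pd.DataFrame({
    "VendorNumber": [466, 466, 480],
    "Freight": [10.0, 5.0, 2.5],
}).to_sql("vendor_invoice", conn, index=False)

freight_summary = pd.read_sql(
    """
    SELECT VendorNumber, SUM(Freight) AS FreightCost
    FROM vendor_invoice
    GROUP BY VendorNumber
    """,
    conn,
)
print(freight_summary)
```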
Purchase Prices Data Integration
- The next step involves joining purchase prices data to analyze actual purchase prices against freight costs for each vendor.
- Details required include quantity purchased, price per item, and total dollars spent. A new SQL query is initiated to gather this information.
- The query specifies selecting columns such as vendor number, brand, purchase price (sum), quantity (total), and total purchase dollars while applying necessary joins.
Data Validation and Filtering
- Analyzing results reveals discrepancies; some items show zero values for both purchase price and actual price. This indicates potential data entry errors.
- A condition is proposed to filter out records where the purchase price is zero since these do not contribute meaningfully to analysis.
Sales Table Insights
- Moving forward, attention shifts to the sales table which includes columns like vendor number, brand, sales quantity, sales dollar amount, and sales price.
- Another SQL query aims to summarize sales data by selecting relevant fields including total sales dollars and excise tax while grouping by both vendor number and brand.
Final Table Construction
- After executing three separate queries for freight summary, purchase prices, and sales data, each yielding significant insights, the next step involves combining these into a final comprehensive table.
- Emphasis is placed on performing joins across all three datasets to create a unified view that encapsulates all relevant metrics for analysis.
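The three summaries can be combined with LEFT JOINs keyed on vendor number (and brand where both sides carry it); the schema below mirrors the description, but the exact column names are assumptions:

```python
# Sketch of the final join across the three pre-aggregated summaries.
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
pd.DataFrame({"VendorNumber": [466], "FreightCost": [15.0]}).to_sql(
    "freight_summary", conn, index=False)
pd.DataFrame({"VendorNumber": [466], "Brand": [5215],
              "TotalPurchaseDollars": [941.0]}).to_sql(
    "purchase_summary", conn, index=False)
pd.DataFrame({"VendorNumber": [466], "Brand": [5215],
              "TotalSalesDollars": [1500.0]}).to_sql(
    "sales_summary", conn, index=False)

final = pd.read_sql(
    """
    SELECT p.VendorNumber, p.Brand, p.TotalPurchaseDollars,
           s.TotalSalesDollars, f.FreightCost
    FROM purchase_summary p
    LEFT JOIN sales_summary s
        ON p.VendorNumber = s.VendorNumber AND p.Brand = s.Brand
    LEFT JOIN freight_summary f
        ON p.VendorNumber = f.VendorNumber
    """,
    conn,
)
print(final)
```

Left joins keep every purchased vendor/brand even when there is no matching sales row, which is exactly the "purchased but never sold" case discussed later.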
Query Optimization in Data Analysis
Understanding Query Execution Time
- The speaker discusses the importance of measuring query execution time, using `time.time()` to track how long a specific SQL query takes to run.
- They explain that the final table is created by joining three different tables (purchase prices, sales, and vendor invoice), followed by a group-by operation.
Aggregation and Data Structure
- The speaker highlights that quantity and dollar columns appear in both the purchase and vendor invoice tables, emphasizing their role in data aggregation.
- They mention grouping data by vendor number, brand, and purchase price to streamline data retrieval.
Importance of Query Optimization
- The necessity for optimized queries is stressed; inefficient queries can lead to unacceptable delays in environments requiring real-time data access.
- If a query takes longer than the required refresh rate (e.g., 5 minutes), it risks serving stale or outdated information to downstream consumers.
Interview Insights on Query Performance
- The speaker notes that during job interviews for data analyst positions, candidates are often asked about strategies for optimizing slow-running SQL queries.
Handling Operational Errors
- An operational error occurs due to insufficient database disk space when attempting to execute a memory-intensive query. This necessitates further optimization efforts.
Data Volume Challenges
- The discussion reveals that while individual tables may be small (126 rows or 10,000 rows), larger datasets (over 1 crore, i.e. 10 million, records in sales) complicate joins and filtering processes.
Efficient Table Creation Strategy
- To improve performance, separate queries were created as smaller summary tables before being joined together efficiently using left joins based on vendor numbers.
Results of Optimized Queries
- After optimization efforts, the final data frame was generated quickly (in under one minute), yielding 10,692 records with all necessary values included.
Significance of Data Consolidation
- The process illustrates how large datasets can be distilled into meaningful summaries for analysis; only relevant records are retained for further work.
Storing Processed Data
- Emphasizing best practices in companies, storing processed tables allows analysts to avoid repetitive querying and facilitates easier access for future analyses or dashboard creation.
Performance Optimization Techniques Summary
- Key points regarding performance optimization include generating vendor-wise sales and purchases summaries through pre-aggregated results which mitigate repeated calculations.
Data Cleaning and Feature Engineering in Vendor Sales Summary
Importance of Storing Vendor and Brand Data
- The stored vendor and brand-wise data facilitates fast dashboarding and reporting, allowing dashboards to fetch data quickly from the vendor sales summary instead of running expensive queries each time.
Data Consistency Checks
- Initial checks for data consistency reveal that all numerical columns are in float format, while 'Volume' is an object type, indicating potential inconsistencies needing correction.
- Missing values identified in four columns (178 records), suggesting some products were purchased but not sold, leading to gaps in sales quantity and dollars.
Addressing Inconsistencies
- Irrelevant white spaces found in vendor names need removal; categorical variables like descriptions show no such issues.
- The volume column's data type will be converted to float64, and missing values will be filled with zeros to indicate no sales.
Creating New Features for Analysis
- New features such as gross profit, stock turnover, and profit margin are essential for deeper analysis. Gross profit is calculated by subtracting purchase dollars from sales dollars.
- Profit margin is determined using the formula: (Gross Profit / Total Sales Dollars) * 100. Negative profits indicate losses which will be flagged later.
Stock Turnover Calculation
- Stock turnover measures how many units were sold relative to how many were purchased. A ratio greater than one indicates that stock carried over from earlier periods was also sold.
- Additional metrics include the sales-to-purchase ratio, calculated as Total Sales Dollars divided by Total Purchase Dollars.
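The cleaning and feature-engineering steps above can be sketched on a toy frame (column names follow the description; the values are made up):

```python
# Sketch of the cleaning + feature-engineering steps on synthetic data.
import pandas as pd

df = pd.DataFrame({
    "VendorName": ["ACME CORP   ", "BETA LLC"],
    "Volume": ["750", "1000"],            # stored as text in the raw table
    "TotalSalesDollars": [1500.0, None],  # None: purchased but never sold
    "TotalSalesQuantity": [100.0, None],
    "TotalPurchaseDollars": [941.0, 200.0],
    "TotalPurchaseQuantity": [100.0, 20.0],
})

df["VendorName"] = df["VendorName"].str.strip()   # drop stray whitespace
df["Volume"] = df["Volume"].astype("float64")     # object -> float64
df = df.fillna(0)                                 # no sale recorded -> 0

df["GrossProfit"] = df["TotalSalesDollars"] - df["TotalPurchaseDollars"]
# Division by zero sales yields an infinite margin, the anomaly noted later.
df["ProfitMargin"] = df["GrossProfit"] / df["TotalSalesDollars"] * 100
df["StockTurnover"] = df["TotalSalesQuantity"] / df["TotalPurchaseQuantity"]
df["SalesToPurchaseRatio"] = (
    df["TotalSalesDollars"] / df["TotalPurchaseDollars"]
)
```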
Finalizing the Vendor Sales Summary Table
- A new table named "vendor_sales_summary" is created with appropriate columns and data types defined. The primary key consists of a combination of vendor number and brand.
- After executing the creation query, values are inserted into the newly created table ensuring it reflects accurate vendor sales summaries derived from earlier analyses.
Creating a Final Data Frame
Overview of the Process
- The main step involves creating the final data frame by bringing data from a query, cleaning it, adding new columns, and inserting it into the database.
- A Python script named "Get Vendor Summary" will be created to automate this process for repetitive tasks in data analysis.
Script Setup
- Essential libraries such as `sqlite3`, `pandas`, and `logging` are imported. The log file is named after the Python file for easy identification.
- A function called `create_vendor_summary` is defined to generate vendor sales summaries using a specific query.
Data Cleaning and Ingestion
Functions for Data Management
- A separate function for cleaning data consolidates all steps taken during the cleaning process along with new feature additions.
- The ingestion function from a previously created Python file (`ingestion_db`) is utilized to insert cleaned data into the database.
Execution Steps
- After establishing a connection with the inventory database, functions are called sequentially: creating vendor summary, printing top records, cleaning data, and ingesting it into the database.
- The script can be run directly from VS Code or through other means while logging timestamps for tracking progress.
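The overall script flow might be sketched like this; the function bodies are simplified stand-ins for the real query and cleaning logic:

```python
# Sketch of the get_vendor_summary script flow: build, clean, ingest.
# Function names mirror the description; bodies are simplified stand-ins.
import logging
import sqlite3
import pandas as pd

logging.basicConfig(level=logging.INFO, force=True)

def create_vendor_summary(conn) -> pd.DataFrame:
    """Fetch the aggregated vendor summary (placeholder query for the sketch)."""
    return pd.read_sql("SELECT VendorNumber, TotalSalesDollars FROM raw", conn)

def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing sales with zero, as in the cleaning step described above."""
    return df.fillna(0)

def ingest_db(df: pd.DataFrame, table_name: str, conn) -> None:
    """Insert the cleaned frame into the database."""
    df.to_sql(table_name, conn, if_exists="replace", index=False)

conn = sqlite3.connect(":memory:")
pd.DataFrame({"VendorNumber": [466], "TotalSalesDollars": [None]}).to_sql(
    "raw", conn, index=False)

logging.info("Creating vendor summary")
summary = clean_data(create_vendor_summary(conn))
print(summary.head())
ingest_db(summary, "vendor_sales_summary", conn)
logging.info("Ingestion complete")
```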
Exploratory Data Analysis (EDA)
Setting Up Analysis Environment
- A new notebook titled "Vendor Performance Analysis" is created where necessary libraries like Pandas and Seaborn are imported alongside tools for hypothesis testing.
Loading and Exploring Data
- A connection to the inventory database is established to fetch complete vendor sales summary data without conditions. EDA will help understand distribution patterns and identify anomalies in this dataset.
Summary Statistics Examination
Initial Insights from EDA
- Summary statistics provide insights into variable relationships within the dataset; understanding these relationships aids in determining which variables should be included in further analysis.
Observations on Data Quality
- Upon examining summary statistics, issues such as negative gross profit values indicate losses. Additionally, infinite profit margins arise due to division by zero errors.
Identifying Anomalies
Understanding Sales Dynamics
- Notably low total sales quantities suggest that products were purchased but never sold. This raises questions about product performance and inventory management strategies.
Data Visualization and Outlier Analysis
Distribution Plots for Numerical Data
- The speaker discusses the importance of plotting distribution plots to understand the range of values and identify outliers in numerical data.
- A histogram is created for each numerical column, with sub-plots organized by column names. Notably, many columns show outliers, particularly those with purchases exceeding $4000 due to high brand prices.
Box Plot Visualization
- The speaker transitions to using box plots to visualize outliers across different columns, confirming their presence in every column analyzed.
- Summary statistics highlight significant discrepancies between maximum values and means, indicating a substantial number of outliers in purchase and actual price data.
Insights from Summary Statistics
- The analysis reveals that some brands are shipped in bulk or have high shipping costs. Stock turnover ranges from 0 to 274.5, suggesting rapid sales for some products while others remain unsold.
- Emphasis is placed on filtering records where gross profit, profit margin, and total sales quantity exceed zero for further analysis.
Handling Outliers
- Instead of removing all outliers indiscriminately, the focus is on identifying premium brands while filtering out inconsistent outliers (e.g., negative gross profits).
- A SQL query is utilized to filter data directly from the database based on specified criteria (gross profit, profit margin > 0).
Data Cleaning and Further Analysis
- After cleaning the dataset, the new frame contains 8,565 records, down from the initial 10,692. This indicates a more refined dataset ready for analysis.
- Count plots are generated to display vendor names and product descriptions among the top ten categories based on record counts.
Correlation Analysis Using Heat Maps
- To explore relationships between numerical variables, a heat map is plotted using correlation matrices derived from numerical columns.
- The speaker explains how correlation values are formatted for clarity within the heat map visualization process.
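A sketch of the correlation step on synthetic columns; the seaborn heat map call is shown as a comment since plotting requires a display backend:

```python
# Sketch of the correlation matrix over numerical columns; data is synthetic.
import pandas as pd

df = pd.DataFrame({
    "TotalPurchaseQuantity": [10.0, 20.0, 30.0, 40.0],
    "TotalSalesQuantity": [9.0, 21.0, 29.0, 41.0],
    "PurchasePrice": [5.0, 3.0, 8.0, 2.0],
})

corr = df.corr(numeric_only=True).round(2)
print(corr)

# Heat map as used in the notebook (requires seaborn/matplotlib):
#   import seaborn as sns
#   sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
```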
Correlation Analysis and Business Insights
Understanding Correlation in Sales Data
- The correlation heat map indicates a very weak relationship between purchase price and total sales dollars (roughly 0 to 0.01), suggesting price variation has minimal impact on sales revenue and profit.
- An increase in purchase price does not guarantee higher sales revenue or profit, indicating that these variables are not significantly correlated.
- A strong correlation exists between total purchase quantity and total sales quantity, confirming efficient inventory turnover where stock is sold effectively.
- There is a negative correlation between profit margin and total sales price; as the sales price increases, profit margins tend to decrease due to competitive pricing pressures.
- Stock turnover shows a weak negative correlation with gross profit and profit margin, implying that higher turnover does not necessarily lead to increased profitability.
Identifying Brands for Promotional Adjustments
- The primary business problem is identifying brands with lower sales performance but higher profit margins that may require promotional or pricing adjustments.
- To analyze brand performance, thresholds are set: the low sales threshold at the 15th percentile of total sales dollars, and the high margin threshold at the 85th percentile of profit margin.
- Brands meeting the criteria of less than $560 in total sales dollars and more than 64% in profit margin are targeted for promotional needs.
- Out of 7000 records analyzed, 198 brands were identified as having low sales but high profits, indicating they need pricing adjustments to boost revenue.
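The threshold logic can be sketched with pandas quantiles on synthetic data (100 toy brands, so the counts and cutoffs below are specific to this example, not the real dataset):

```python
# Sketch of the low-sales / high-margin brand filter using quantiles.
import pandas as pd

df = pd.DataFrame({
    "Brand": range(1, 101),
    "TotalSalesDollars": [float(i * 100) for i in range(1, 101)],
    "ProfitMargin": [float(100 - i) for i in range(1, 101)],
})

low_sales_threshold = df["TotalSalesDollars"].quantile(0.15)
high_margin_threshold = df["ProfitMargin"].quantile(0.85)

target_brands = df[
    (df["TotalSalesDollars"] < low_sales_threshold)
    & (df["ProfitMargin"] > high_margin_threshold)
]
print(len(target_brands), "brands need promotional or pricing adjustments")
```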
Visualizing Brand Performance
- A scatter plot visualizes brand performance against established thresholds; target brands are highlighted in red while others appear blue for clarity.
- The visualization includes lines representing high margin and low margin thresholds; filtering out outliers enhances clarity without losing overall data representation.
Analyzing Top Vendors and Brands
- The second research question focuses on identifying vendors and brands demonstrating the highest sales performance through grouping by vendor name and description.
- Total sales dollars are summed up for each vendor/brand combination to filter out the top ten performers based on highest values.
- For better readability of financial figures, a formatting function converts dollar amounts into thousands or millions using appropriate suffixes (k/m).
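A sketch of that formatting helper (the exact cutoffs and decimal places are assumptions):

```python
# Sketch of a dollar formatter with k/m suffixes for readability.
def format_dollars(value: float) -> str:
    """Render a dollar amount in thousands (k) or millions (m)."""
    if value >= 1_000_000:
        return f"${value / 1_000_000:.2f}m"
    if value >= 1_000:
        return f"${value / 1_000:.2f}k"
    return f"${value:.2f}"

print(format_dollars(7_960_000))   # $7.96m
print(format_dollars(560))         # $560.00
```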
Sales Analysis and Vendor Contribution
Understanding Sales Figures
- The sales figures are presented in millions for clarity, with specific values noted as 7.96 million, 7.40 million, and 7.21 million to enhance readability.
Bar Plot Visualization
- A bar plot is created to visualize the top vendors and brands by sales, using vendor indices on the y-axis and their corresponding sales values on the x-axis.
- The actual sales values are displayed next to each bar using a formatted dollar representation for better understanding.
Vendor Performance Analysis
- The analysis shifts to identifying which vendors contribute most significantly to total purchase dollars through a group-by operation.
- A new column titled "Purchase Contribution" is introduced, calculated as each vendor's total purchase dollars divided by the overall total.
Percentage Contribution Insights
- For example, if a vendor contributes $30 out of a total of $100 in purchases, this indicates a 30% contribution towards total purchases.
- Values are rounded to two decimal places for clarity; all monetary figures are formatted consistently in dollars.
Top Vendors' Contributions
- The top ten vendors' contributions reveal Diageo at 16% and Martini at 8%, with the last vendor contributing only 3%.
- It's noted that these ten vendors account for over half of the total purchases among approximately 119 vendors analyzed.
Cumulative Contribution Visualization
- A cumulative contribution chart is proposed to illustrate both individual vendor contributions and overall totals visually.
- This involves plotting both bar charts for percentage contributions alongside line plots for cumulative contributions.
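The contribution and cumulative-contribution numbers behind that chart can be sketched as (vendor names and amounts are synthetic):

```python
# Sketch of per-vendor purchase contribution and its cumulative sum.
import pandas as pd

purchases = pd.DataFrame({
    "VendorName": ["A", "B", "C", "D"],
    "TotalPurchaseDollars": [30.0, 25.0, 25.0, 20.0],
})

total = purchases["TotalPurchaseDollars"].sum()
purchases["PurchaseContribution%"] = (
    purchases["TotalPurchaseDollars"] / total * 100
).round(2)
purchases = purchases.sort_values("PurchaseContribution%", ascending=False)
# Running total: the line plotted over the contribution bars.
purchases["CumulativeContribution%"] = (
    purchases["PurchaseContribution%"].cumsum()
)
print(purchases)
```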
Total Procurement Dependency on Top Vendors
- An inquiry into how much procurement relies on top vendors shows they account for about 65.69% of total procurement.
Visual Representation of Procurement Data
- A pie chart is suggested to represent remaining contributions versus those from top vendors effectively, illustrating that other vendors contribute only about 34.3%.
Bulk Purchasing Impact on Unit Price
- The discussion transitions into whether bulk purchasing reduces unit prices and explores optimal purchase volumes for cost savings.
Researching Optimal Purchase Volumes and Vendor Performance
Understanding Discounts and Purchase Volumes
- The discussion begins with the concept of negotiating discounts based on the quantity of items purchased, highlighting how unit prices decrease with larger orders.
- The primary research question focuses on vendor performance to enhance relationships and increase product sales.
Calculating Unit Purchase Price
- The unit purchase price is calculated by dividing total purchase dollars by total purchase quantity, providing insights into pricing strategies.
- Data is categorized into small, medium, and large order sizes using quantiles to analyze purchasing behavior effectively.
Analyzing Order Sizes and Pricing Trends
- By grouping data based on order size categories, the mean unit purchase price for each category is determined.
- Results show that smaller orders have a higher unit price ($39), while larger orders significantly reduce costs ($10), indicating bulk purchasing benefits.
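The bucketing can be sketched with `pd.qcut`; the quantities and prices below are synthetic, chosen so small orders cost about $39/unit and large ones about $10/unit, echoing the numbers above:

```python
# Sketch of order-size bucketing and per-bucket mean unit price.
import pandas as pd

df = pd.DataFrame({
    "TotalPurchaseQuantity": [5.0, 10.0, 50.0, 100.0, 500.0, 1000.0],
    "TotalPurchaseDollars": [195.0, 390.0, 1000.0, 2000.0, 5000.0, 10000.0],
})

df["UnitPurchasePrice"] = (
    df["TotalPurchaseDollars"] / df["TotalPurchaseQuantity"]
)
# qcut splits the quantity distribution into three equal-sized buckets.
df["OrderSize"] = pd.qcut(
    df["TotalPurchaseQuantity"], q=3, labels=["Small", "Medium", "Large"]
)
mean_price = df.groupby("OrderSize", observed=True)["UnitPurchasePrice"].mean()
print(mean_price)
```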
Implications of Bulk Purchasing Strategies
- A clear correlation emerges: as vendors increase their order sizes, they benefit from lower unit prices, leading to higher profit margins despite reduced per-unit revenue.
- The substantial price difference (up to 72% reduction in cost for large orders) suggests that effective bulk pricing strategies can drive increased sales volume.
Identifying Low Inventory Turnover Vendors
- Another research question addresses which vendors exhibit low inventory turnover rates, indicating excess stock or slow-moving products.
- Vendors with a stock turnover ratio below one are prioritized for analysis since they indicate poor sales performance relative to inventory levels.
Analyzing Unsold Inventory Capital
- The focus shifts to calculating capital locked in unsold inventory: the unsold quantity (total purchase quantity minus total sales quantity) multiplied by the purchase price.
- Summing these values reveals significant unsold inventory capital amounting to $2.7 million, prompting further investigation into vendor contributions to this figure.
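The calculation can be sketched as unsold units valued at the purchase price, summed across vendors (values synthetic):

```python
# Sketch of the unsold-inventory capital calculation.
import pandas as pd

df = pd.DataFrame({
    "VendorName": ["A", "B"],
    "TotalPurchaseQuantity": [100.0, 50.0],
    "TotalSalesQuantity": [60.0, 50.0],
    "PurchasePrice": [10.0, 8.0],
})

df["UnsoldInventoryValue"] = (
    (df["TotalPurchaseQuantity"] - df["TotalSalesQuantity"])
    * df["PurchasePrice"]
)
total_unsold_capital = df["UnsoldInventoryValue"].sum()
print(total_unsold_capital)  # 400.0
```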
Analysis of Vendor Performance and Profit Margins
Identifying Top and Low Performing Vendors
- The analysis begins with the need to identify top and low performing vendors based on their profit margins, specifically focusing on a 95% confidence interval.
- A threshold is established: vendors above the 75th percentile in sales are classified as top performers, while those below the 25th percentile are considered low performers.
Calculating Confidence Intervals
- A function for calculating confidence intervals is introduced, which takes vendor data as input. The default confidence level can be adjusted to 0.95.
- Key statistical measures such as mean value, standard deviation, and standard error are calculated to determine the margin of error for the confidence interval.
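A sketch of that helper using the t-distribution (SciPy's `stats.t.ppf` supplies the critical value; the default confidence level is 0.95 as described):

```python
# Sketch of a confidence-interval helper: mean plus/minus the margin of error.
import numpy as np
from scipy import stats

def confidence_interval(data, confidence: float = 0.95):
    """Return (mean, lower, upper) bounds at the given confidence level."""
    data = np.asarray(data, dtype=float)
    mean = data.mean()
    std_err = data.std(ddof=1) / np.sqrt(len(data))   # standard error
    t_crit = stats.t.ppf((1 + confidence) / 2, df=len(data) - 1)
    margin = t_crit * std_err                          # margin of error
    return mean, mean - margin, mean + margin

# Toy profit-margin sample roughly in the top-performer range quoted above.
mean, lower, upper = confidence_interval([30.0, 31.0, 32.0, 31.0, 31.0])
```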
Visualizing Profit Margins
- Code is written to plot histograms for both top and low performing vendors, illustrating their respective profit margins within defined confidence intervals.
- The results show that the profit margin for low performing vendors (40.48% - 42.62%) is significantly higher than that of top performing vendors (30.74% - 31.61%).
Insights from Profit Margin Analysis
- It is suggested that lower performing vendors may maintain higher profit margins due to premium pricing or lower operational costs.
- Recommendations for improving profitability among high-performing vendors include selective price adjustments, cost optimization, and bundling strategies.
Hypothesis Testing on Profit Margin Differences
- A hypothesis test is conducted to assess whether there is a significant difference in profit margins between top and low performing vendors.
- The null hypothesis states no significant difference exists; however, results indicate a statistically significant difference in profit margins.
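The test itself can be sketched with SciPy's Welch t-test on synthetic margin samples shaped like the intervals reported above (the actual notebook's samples and test variant may differ):

```python
# Sketch of a two-sample t-test on profit margins (Welch's, equal_var=False).
from scipy import stats

top_margins = [30.7, 31.2, 30.9, 31.6, 31.0, 30.8]   # synthetic samples
low_margins = [40.5, 42.6, 41.1, 40.9, 42.0, 41.5]

t_stat, p_value = stats.ttest_ind(top_margins, low_margins, equal_var=False)

alpha = 0.05
if p_value < alpha:
    print("Reject the null: margins differ significantly")
else:
    print("Fail to reject the null")
```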
Conclusion and Future Steps
- The analysis concludes with an emphasis on utilizing various statistical methods including hypothesis testing and confidence intervals to address business problems effectively.
Creating a Power BI Dashboard
Overview of Dashboard Components
- An introduction to creating a Power BI dashboard based on previous analyses includes visualizations like total sales, purchases, gross profits, etc.
Customization of Visual Elements
- The dashboard theme is customized (blue and green colors) to give it a professional appearance while presenting key metrics related to vendor performance.
Data Analysis and Visualization Techniques
Data Filtering and Preparation
- Data is first filtered so that total sales quantity, gross profit, and profit margin are each greater than zero, reducing the dataset from an original 10,000 records to a cleaned 8,000 for analysis.
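The three-condition filter is a straightforward boolean mask in pandas; column names here are assumed for illustration:

```python
import pandas as pd

# Toy data standing in for the vendor sales summary
df = pd.DataFrame({
    "TotalSalesQuantity": [10, 0, 5, 8],
    "GrossProfit": [50.0, 20.0, -3.0, 12.0],
    "ProfitMargin": [25.0, 10.0, -1.5, 30.0],
})

# Keep only rows where all three metrics are strictly positive
clean = df[
    (df["TotalSalesQuantity"] > 0)
    & (df["GrossProfit"] > 0)
    & (df["ProfitMargin"] > 0)
]
```

Rows with zero sales quantity or a negative profit figure are dropped, mirroring the 10,000-to-8,000 reduction described above.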
Visualizations in Power BI
- The first visualization created is a card visualization displaying total sales over the year. Customizations include adding dollar signs to totals and adjusting color schemes for clarity.
- Additional metrics such as total purchase dollars and average profit margin are included in subsequent visualizations. A new column for unsold capital was created by multiplying total purchase quantity by purchase price.
Creating New Tables
- A new table called "Purchase Contribution" summarizes vendor sales data without altering the original vendor sales summary. It includes columns for vendor name and total purchase dollars.
- Purchase contribution is calculated by dividing each vendor's total purchase dollars by the sum across all vendors, underscoring the value of completing the Python analysis before building the dashboard.
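In pandas, the same summary table can be sketched with a group-by and a share-of-total column (names are assumptions for illustration):

```python
import pandas as pd

# Toy vendor purchase records
sales = pd.DataFrame({
    "VendorName": ["A", "B", "A", "C"],
    "TotalPurchaseDollars": [100.0, 300.0, 100.0, 500.0],
})

# Total purchase dollars per vendor, kept as a separate summary table
contrib = sales.groupby("VendorName", as_index=False)["TotalPurchaseDollars"].sum()

# Each vendor's share of the overall purchase spend, in percent
contrib["PurchaseContribution%"] = (
    contrib["TotalPurchaseDollars"] / contrib["TotalPurchaseDollars"].sum() * 100
)
```

The contribution column sums to 100%, which is what the donut chart of top vendors visualizes.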
Insights on Vendor Performance
- A donut chart visualizes top vendors based on their contributions to purchases. This chart helps identify key players within the vendor landscape.
- Two additional charts highlight top vendors by sales and top brands by sales using similar methodologies as previous visualizations, reinforcing consistency across analyses.
Identifying Low Performing Vendors
- A table was created to analyze low-performing vendors based on stock turnover rates below one, which serves as a critical metric in assessing vendor efficiency.
- The filtered data variable captures this low turnover condition from the vendor sales summary, allowing for targeted insights into underperforming areas within the business model.
Vendor Sales Analysis and Dashboard Creation
Overview of Vendor Sales Summary
- The vendor sales summary includes the vendor name and average stock turnover, calculated by grouping records and taking the mean of stock turnover.
- For dashboard visualization, only the top five vendors with the lowest turnover are displayed using a funnel chart, categorizing by vendor name and average stock turnover.
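The filter-then-aggregate logic for the funnel chart can be sketched as below; the DataFrame and column names are illustrative assumptions:

```python
import pandas as pd

# Toy stand-in for the vendor sales summary
vendor_sales_summary = pd.DataFrame({
    "VendorName": ["A", "A", "B", "C", "D", "E", "F"],
    "StockTurnover": [0.4, 0.6, 0.8, 0.3, 0.9, 1.5, 0.7],
})

# Keep vendors whose stock turnover is below 1 (slow-moving inventory)
filtered = vendor_sales_summary[vendor_sales_summary["StockTurnover"] < 1]

# Average turnover per vendor, then the five lowest for the funnel chart
low_turnover = filtered.groupby("VendorName")["StockTurnover"].mean().nsmallest(5)
```

Vendors at or above a turnover of 1 never enter the aggregation, so the funnel chart shows only genuinely underperforming vendors.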
Low Performing Vendors Identification
- A new table was created to summarize low-performing vendors based on total sales and profit margins.
- The analysis involved summing total sales in dollars and calculating average profit margin while applying group-by functions in Python.
Target Brand Identification Criteria
- Brands were flagged as targets when their total sales fell below a low-sales threshold (the 15th percentile) while their profit margin exceeded a high-margin threshold (the 85th percentile).
- A conditional check was implemented to classify brands accordingly, and a scatter plot was created to visualize the result.
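A possible reading of this classification, assuming the 15%/85% figures refer to percentile thresholds (column names and data are illustrative, not the project's exact values):

```python
import numpy as np
import pandas as pd

# Toy brand-level summary
brands = pd.DataFrame({
    "Brand": ["X", "Y", "Z", "W"],
    "TotalSalesDollars": [50.0, 5000.0, 80.0, 120.0],
    "ProfitMargin": [72.0, 20.0, 60.0, 10.0],
})

# Assumed thresholds: low sales = 15th percentile, high margin = 85th percentile
low_sales_threshold = brands["TotalSalesDollars"].quantile(0.15)
high_margin_threshold = brands["ProfitMargin"].quantile(0.85)

# Flag low-sales / high-margin brands as promotion or repricing candidates
brands["Target"] = np.where(
    (brands["TotalSalesDollars"] <= low_sales_threshold)
    & (brands["ProfitMargin"] >= high_margin_threshold),
    "Target Brand",
    "Other",
)
```

Plotting sales against margin and coloring by the `Target` column reproduces the scatter-plot view described above.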
Dashboard Visualization Techniques
- Filters were applied (e.g., total sales under $1,000) to improve the visualization without distorting the interpretation of the data.
- Screenshots of summary statistics tables and histogram plots were shared alongside insights regarding outliers detected through high standard deviation.
Insights from Research Questions
- The report highlights 198 brands with lower sales but higher profit margins as potential candidates for promotion or pricing adjustments.
- Insights on top vendors by sales contribution were presented under headings like "Impact of Bulk Purchasing," emphasizing strategic vendor management.
Recommendations for Business Strategy
- Suggestions include reevaluating pricing strategies for low-selling high-margin brands to boost volume without sacrificing profitability.
- Additional recommendations focus on diversifying vendor partnerships, optimizing inventory management, enhancing marketing strategies for low performers, and achieving overall operational efficiency.
Conclusion on Project Outcomes
- A comprehensive report was created detailing analysis results along with suggestions for improving business operations through effective data-driven decisions.