Modelamiento de Datos en POWER BI. DESCUBRE TODOS LOS SECRETOS DEL MODELO ESTRELLA Y SUS VARIANTES.
Understanding Data Modeling in Power BI
Introduction to Data Modeling
- Julio introduces the topic of data modeling, emphasizing its importance for effective reporting in Power BI.
- A good report relies on a well-constructed data model; issues with indicators often stem from model construction rather than DAX formula errors.
Importance of Separate Tables
- While single-table models may suffice for basic reports, analytical capabilities improve significantly with separate tables.
- The video discusses the need for balance in presenting information through separate tables without excessive fragmentation.
Dimensional Modeling and Star Schema
- The star schema is introduced as a standard approach for dimensional modeling, which will underpin future DAX formulas learned in the course.
- Key topics include disadvantages of single-table models, essential elements of dimensional models, and complexities involving multiple fact tables.
Advanced Topics and Recommendations
- Advanced concepts such as snowflake models and relationship properties (cardinality and cross-filter direction) will also be covered.
- Julio recommends watching a previous video on DAX tips before proceeding due to overlapping content.
Challenges with Single Table Models
Example Scenario
- An example is presented where all necessary fields are contained within a single table, illustrating potential issues with this approach.
Indicator Calculation Issues
- A request arises to calculate average annual income per customer by category for promotional eligibility.
- Using the
AVERAGEfunction returns incorrect results because it averages transaction values instead of customer-specific averages.
Understanding Average Calculations
- The calculation error is highlighted: it reflects average income per transaction rather than per customer.
Correcting the Calculation Methodology
Understanding Data Models and Their Challenges
The Importance of Accurate Averages
- The correct average value in a data model can yield different results when placed in a matrix, highlighting the importance of accurate indicators. For instance, an indicator greater than 50,000 for the "montañera" category suggests that previous calculations may be flawed.
Granularity Issues in Single Table Models
- Using a single table model complicates age formulas due to incorrect granularity assignment. A master table containing all clients would simplify this process significantly.
Redundancy and Update Challenges
- In datasets with multiple clients and products, redundancy occurs as client types repeat. Changing a client's category requires numerous updates across repeated entries, leading to inefficiencies.
- Manual updates pose risks of errors, especially when data is stored in flat files or Excel sheets. This can lead to integrity issues where one client might appear under multiple categories if not corrected properly.
Solutions Through Master Tables
- To address these problems, creating master tables for entities within business processes (e.g., doctors, patients in medical contexts; clients and products in commercial contexts) is essential.
- By isolating records related to specific entities like customers and removing duplicates, we can create a comprehensive customer master table that simplifies data retrieval.
Transitioning to Data Models
- With a customer master table established, we can utilize functions (like VLOOKUP in Excel) to retrieve information efficiently without needing extensive columns.
- Similarly, product-related records should be isolated to generate effective formulas for information retrieval.
Key Components of Data Models
Structure of Data Models
- A basic data model consists of two types of tables: master tables (e.g., customer and product masters) and intermediary tables known as dimension tables in Power BI.
Characteristics of Dimension Tables
- Dimension tables must contain at least one unique identifier column (primary key), which should not repeat across rows. For example:
- Product dimension uses the product column as its primary key.
- Customer dimension typically uses the customer name as its primary key.
Understanding Dimensions through Business Questions
- Identifying dimensions involves asking critical questions about the data:
- Customer Dimension: Answers "Who do I sell to?"
- Product Dimension: Answers "What do I sell?"
The Essential Time Dimension
- The time or calendar dimension is crucial for any data model as it answers "When?" It is recommended to always include this dimension since many DAX functions rely on it for proper functionality.
Understanding Dimension and Fact Tables in Power BI
Characteristics of Dimension Tables
- Dimension tables contain various attributes such as month, day, year, and week that can be used for filtering or categorizing data in Power BI reports.
- These tables serve as the basis for all filters applied to reports, ensuring that any information needed for analysis comes from dimension tables.
Overview of Fact Tables
- The central table in a star schema is known as the fact table (or "fact" table), represented with the prefix "fct."
- Each row in a fact table represents an event or transaction, such as a product purchase made by a customer at a specific time.
- Fact tables include two types of columns: foreign keys (which link to dimension tables) and metrics (numerical data to be analyzed).
- They typically have the highest level of granularity, representing combinations of multiple entities (e.g., product purchases by customers at stores).
Star Schema Model
- The star schema model resembles a star diagram where relationships exist only between dimensions and the fact table.
- Normalization involves separating data into distinct tables; primary keys from dimension tables connect with foreign keys in the fact table.
Creating Basic Star Models
- A basic star model is created by linking primary keys from dimension tables to corresponding foreign keys in the fact table.
- It's possible to incorporate additional fact tables into a star model; however, they must share some dimensions while maintaining unique ones.
Handling Multiple Fact Tables
- When adding another fact table (e.g., stock information), identify shared dimensions but recognize that not all dimensions may overlap.
- Treat models with different facts sharing some dimensions as separate models within the same environment while adhering to star schema rules.
Combining Similar Data Sources
- Be cautious when dealing with seemingly multiple fact tables; if they share identical dimensions, combine them into one unified table.
- For example, sales data across different years should be merged rather than treated separately if they represent similar events.
Snowflake Schema Variant
Understanding Data Modeling in Power BI
Normalization and Dimensional Relationships
- The speaker discusses the concept of normalization within the customer dimension, suggesting the creation of an independent table to manage location combinations (districts and departments) effectively.
- A distinction is made between star and snowflake models; a snowflake model arises when dimensions relate indirectly through another dimension rather than directly to a fact table.
- The validity of using a snowflake model is acknowledged, although it may lead to more tables than necessary. It’s noted that this approach does not significantly save space in Power BI.
- The recommendation is to maintain a standard star model unless subdivision is essential, emphasizing simplicity in data modeling.
Cardinality in Relationships
- Cardinality refers to the relationship type between tables, illustrated with examples involving one-to-many relationships where each record in one table corresponds to multiple records in another.
- An example highlights how for each product record, there can be multiple associated records in the fact table, establishing a one-to-many relationship crucial for effective reporting.
- Two additional types of relationships are introduced: one-to-one and many-to-many. One-to-one relationships suggest combining tables as they represent a single entity for reporting purposes.
Handling Many-to-Many Relationships
- In cases of many-to-many relationships (e.g., between redemption and call logs), creating an intermediary table is recommended. This helps establish clear connections among multiple entities involved.
- The speaker emphasizes that defining an intermediary dimension allows for maintaining a star schema while managing complex relationships effectively across different fact tables.
Managing Ambiguities in Relationships
- A critical point about primary keys relates to their inability to connect with multiple foreign keys from the same table without causing ambiguity.
Understanding Relationships in Power BI
Defining Active and Inactive Relationships
- The speaker explains how to activate either an active or inactive relationship based on the required calculation in Power BI. An active relationship exists between "fecha" (date) and "fecha orden" (order date), while an inactive relationship is defined between "fecha" and "fecha Delivery" (delivery date).
Calculating Units Sold vs. Units Shipped
- A measure called "unidades vendidas" (units sold) sums elements from the sales date column, utilizing the active relationship. To calculate units shipped instead, a DAX formula is employed to activate the inactive relationship.
Using DAX for Relationship Activation
- The speaker introduces DAX functions to modify existing measures. By using a modifier called
USERelationship, they can switch from calculating units sold to units shipped by activating the previously inactive relationship.
Cross Filter Direction in Relationships
- The concept of cross filter direction is discussed, where a matrix report shows months as rows sourced from a calendar dimension. The default filter direction flows from dimensions to fact tables, allowing monthly unit sales calculations.
Handling Unique City Sales Indicators
- When creating an indicator for distinct cities sold per month, issues arise due to filtering direction. Since city data resides in the dimension table rather than the fact table, it does not flow correctly through filters unless adjustments are made.
Adjusting Filter Directions for Accurate Data
- To resolve filtering issues with city sales indicators, one option is to change filter directions from single-directional to bidirectional. This allows filters to flow back up towards dimension tables for accurate monthly city counts.
Risks of Bidirectional Filtering
- While enabling bidirectional filtering may seem beneficial, it can lead to ambiguity within models containing multiple fact tables. Multiple paths for filtering could yield unexpected results and complicate maintenance of indicators.
Best Practices for Filter Direction Management
- It’s recommended that filter directions remain unidirectional by default to maintain clarity in relationships between tables. Specific calculations requiring different relationships can be handled using DAX modifiers like
CALCULATEandCROSSFILTER.
This structured approach provides insights into managing relationships within Power BI effectively while highlighting key concepts related to DAX usage and data modeling best practices.
Advanced Variants of the Star Schema Model
Overview of Advanced Variants
- The speaker discusses advanced variants of the star schema model, emphasizing that while the standard star schema is often sufficient, there are specific cases where variations are necessary.
Key Variants Explained
- Header-Detail Model: This variant involves two fact tables with a one-to-many relationship. Each case warrants its own detailed explanation.
- Time Interval Models: These models deal with data over time intervals rather than daily events, complicating calendar dimension treatment.
- Snapshot Tables: Snapshot tables capture static information at a point in time (e.g., financial balances), requiring special handling due to their non-additive nature.
- Changing Dimensions Models: In these models, dimensions do not remain fixed; attributes can change over time (e.g., customer details), necessitating specific modeling adjustments.
- Parent-Child Hierarchical Models: These hierarchical models allow for group-level indicators based on individual relationships within a hierarchy, focusing on collective metrics rather than individual ones.
Recommended Reading
- The speaker recommends two books for further understanding:
- "Análisis" which covers relevant topics discussed.