CONSTRUYAMOS UN MODELO DIMENSIONAL EN POWER BI. INCLUYE ENLACES DE DESCARGA.

CONSTRUYAMOS UN MODELO DIMENSIONAL EN POWER BI. INCLUYE ENLACES DE DESCARGA.

Introduction to Power BI Data Modeling

Overview of the Video

  • Julio welcomes viewers and introduces the focus of the video: practical application of data modeling in Power BI.
  • The session will utilize Power Query functionalities, with a promise to cover DAX topics in future videos.

Objectives and Learning Goals

  • Emphasis on understanding how to build a data model for effective information analysis.
  • Examples provided are generic; viewers are encouraged to adapt them to their specific situations.

Building a Dimensional Model

Identifying Dimensions and Primary Keys

  • The initial model consists of a single table; the goal is to convert it into a dimensional model by identifying dimensions such as time, store, product attributes, and customer.
  • For products, either product name or SKU can serve as primary keys; SKU is preferred due to lower duplication risk.

Handling Customer Dimension Challenges

  • No clear primary key exists for customers; alternatives will be explored using Power Query.

Steps in Power Query

Creating Tables from Original Data

  • Four duplicates will be created from the original table: one fact table and four dimension tables.
  • Each dimension (calendar, store, product, customer) will have its relevant columns retained while removing duplicates.

Defining Primary Keys for Dimensions

  • Calendar dimension uses 'date' as its primary key; store dimension uses 'store name'.
  • For customer dimension, email serves as the primary key after evaluation. A client code can also be generated if needed.

Finalizing Fact Table Structure

Preparing Fact Table

  • The fact table should only contain foreign keys and metrics necessary for analysis. Email may be replaced with transaction codes later.

Configuring Power BI Settings

Disabling Automatic Relationship Detection

How to Configure Data Sources in Power Query

Initial Setup and Configuration

  • The process begins with configuring the data source by updating the file path to point to the local file on your computer. After selecting the correct file, you need to accept and close the dialog before refreshing the preview.

Column Distribution and Duplicates

  • The interface includes a column distribution view that can be enabled from the "View" tab. This feature is essential for validating columns during data processing.
  • Four duplicates of the original query are created, each named according to their respective dimensions: fac ventas, dim calendario, dim tienda, dim producto, and dim cliente.

Dimension Table Adjustments

Time Dimension (dim calendario)

  • Focus on retaining only time-related columns by selecting them and removing others. This involves using right-click options for column management.
  • To eliminate duplicate entries, select all relevant rows and use the "Remove Duplicates" function.

Primary Key Validation

  • A primary key must have an equal number of distinct values as unique values. An example illustrates this concept with a series of numbers demonstrating how to identify valid primary keys.

Store Dimension (dim tienda)

  • Similar steps are taken for the store dimension where relevant columns (store name, zone, region, country) are selected while ensuring no duplicates exist.

Product Dimension (dim producto)

  • For product dimensions, select necessary columns from product name through category. Again, ensure there are no duplicates before determining a primary key.

Customer Dimension (dim cliente)

  • The customer dimension follows suit by selecting all relevant fields (name, surname, date). The email field is identified as a potential primary key due to its uniqueness.

Handling Missing Keys

  • If no suitable primary key exists (like an email), consider creating a composite key by concatenating multiple fields such as name and surname.

Composite Key Creation

  • Use Power Query's "Combine Columns" feature to create a new column that serves as a composite key if needed; however, in this case, since an email exists, it will not be utilized.

Power BI Data Modeling Techniques

Creating an Index Column

  • To create a unique identifier for clients, you can add an index column in Power BI. This is done by navigating to the transformation menu and selecting "Add Column" followed by "Index Column" starting from one.

Renaming and Organizing Columns

  • After creating the index column, it should be renamed to "Client ID." It’s recommended to move this column to the beginning of your dataset for better organization.

Filtering Fact Table Columns

  • In the fact table, unnecessary columns should be removed. Retain only foreign keys, metrics, and transaction codes. The focus should be on essential data that contributes to analysis.

Replacing Email with Client Code

  • Replace email addresses in the dataset with client codes using the "Merge Queries" feature. This involves matching email columns from both tables (fact table and client dimension).

Finalizing Data Model Structure

  • Once all transformations are complete, load the information into your model. A well-organized data model typically has dimensions at the top and fact tables below, resembling an inverted triangle structure.

Handling Multiple Tables in Power BI

Analyzing Sales and Returns Tables

  • When working with sales and returns tables that share common columns (like SKU), avoid creating direct relationships as they may lead to many-to-many relationships which are not advisable.

Identifying Dimensions for Analysis

  • Each table must have clearly defined dimensions. The sales table includes four dimensions: time, store, product, and customer; while the returns table has three dimensions.

Creating Unified Dimensions

  • Since combining tables directly isn't feasible due to differing dimensions, create unified dimensions that relate to both tables simultaneously.

Deriving Store Dimension from Both Tables

  • Start by deriving a store dimension from both sales and returns tables while removing duplicates. This ensures all stores are accounted for regardless of whether they have associated returns or not.

Optimizing Product Dimension Creation

Data Modeling in Power BI

Creating Dimensions from Sales Data

  • The customer dimension is derived solely from the sales table, as it only exists at the sales level and not for returns.
  • It is recommended to create a separate calendar dimension table rather than deriving it from other tables. A pre-created time table will be used for this purpose.
  • Relationships are established where the store relates to both tables, the product also connects with both, while the customer links only to the sales table and the calendar connects with both tables.

Building a Star Schema Model

  • Transitioning into Power BI's Query Editor to convert data into a star schema model begins with creating a copy of the returns table named "store returns."
  • Selecting relevant columns related to stores (store name, zone, department) ensures that only necessary data is retained in "store returns."
  • The "sales" table is duplicated and renamed "store sales," retaining only essential columns like store name.

Merging and Cleaning Data

  • Both queries ("store returns" and "store sales") are appended together to form a unified dataset.
  • Duplicate rows are removed to ensure unique store entries; this allows future additions of new stores without affecting existing records.

Finalizing Store Dimension

  • The process continues by duplicating the sales table again for creating a product dimension based on its attributes (SKU, color, category).
  • A similar approach is taken for creating a customer dimension by selecting relevant fields (customer name, date of birth, marital status).

Importing Calendar Dimension

  • The calendar dimension is imported from an Excel file containing all necessary dates; this avoids issues associated with incomplete date ranges in direct tables.
  • This dedicated calendar file ensures that all years and dates are accounted for properly in analyses.

Structuring Fact Tables

  • Adjustments are made to original fact tables by selecting foreign keys and metrics needed for analysis (e.g., units sold, amounts).

Modeling Data in Power BI: Star vs Snowflake

Building the Model

  • The speaker discusses loading a model with multiple fact tables (two in this case) and dimension tables, emphasizing the importance of organization for ease of access.
  • Relationships are established between dimensions and fact tables; for instance, the calendar dimension links to both sales and returns via date fields, while the customer dimension connects solely to sales.

Understanding Star Schema

  • The star schema is highlighted as advantageous because it allows for comprehensive reporting. For example, one can analyze sales units by store alongside return units simultaneously.
  • A snowflake model is introduced, characterized by dimensions relating to other dimensions through intermediate relationships. This complexity can be simplified into a star schema by denormalizing data.

Transitioning from Snowflake to Star Schema

  • To convert a snowflake model into a star schema, one must incorporate sub-dimension information directly into their respective main dimensions.
  • The speaker warns against common beginner mistakes such as trying to link city data from different tables which could lead to ambiguity in reports.

Addressing Ambiguity Issues

  • Ambiguity arises when attempting to create reports that require filtering on multiple paths (e.g., country based on customer or store). This can result in inactive relationships within Power BI.
  • The importance of maintaining clear relationships is emphasized; if two paths exist for generating indicators, it complicates data interpretation.

Normalization Process

  • The speaker suggests normalizing data directly rather than creating ambiguous relationships. This involves transforming data within Power BI's query editor.
  • An example is provided where customer data is combined with city information using queries to enhance the dataset without retaining unnecessary dimensions.

Finalizing the Model

  • After combining necessary columns from various dimensions (like continent and state), redundant tables can be hidden or disabled from loading.
Video description

En el video anterior vimos los temas teóricos mas importantes de modelamiento. Para afianzar los conocimiento construyamos desde cero un modelo de datos usando las funcionalidades de Power Query y la vista de modelamiento de Power BI Desktop. * Si tienes algun problema o consulta relacionada a Power BI, ya puedes acceder a consultas personalizadas en nuestro sitio web :) https://menquisbi.com/s/asesoria-personalizada * Ya está disponible nuestro primer curso de SQL :), puedes adquirirlo aquí: https://menquisbi.com/curso/sql-server-para-analisis-de-datos Conviértete en miembro del canal: https://www.youtube.com/channel/UCu25jQpvCNUgYV-Tl0iBnQg/join Enlace de descarga de archivos: https://drive.google.com/drive/folders/17-GUjQcd43StifFDeBjw5tvaG9CS-_tL?usp=drive_link 00:00 Introducción 01:51 Modelo dimensional a partir de 1 sola tabla. 16:27 Modelo dimensional con varias tablas de hechos. 27:27 Modelo Estrella a partir de un modelo Copo de nieve. Suscribete y dale like para mas contenido :)