How to Aggregate Large Datasets in Power BI (with Tristan Malherbe)

How to Aggregate Large Datasets in Power BI (with Tristan Malherbe)

Introduction

The speakers introduce themselves and discuss their experiences presenting virtually and in-person.

Introductions

  • Tristan and Reed introduce themselves as Microsoft MVPs on the data platform.
  • They discuss their experiences presenting virtually and in-person, including a recent virtual presentation they both gave at the Dynamics Community Summit.

Presenting Virtually vs In-Person

The speakers discuss the differences between presenting virtually and in-person.

Differences Between Virtual and In-Person Presentations

  • Both speakers prefer presenting in-person but have had to adapt to virtual presentations due to COVID-19.
  • Reed discusses how he misses being able to read the room during virtual presentations since most people turn off their cameras.
  • Tristan shares his experience of doing some trainings with his company but has not yet done any public presentations since COVID-19.

Background Information

Tristan provides background information about himself, his consulting company, and the French Power BI user group he founded.

Background Information About Tristan

  • Tristan is based in France and runs a consulting and training company called Data Ports.
  • His company provides consulting services and training for Power BI, Azure, Power Apps, Automate, etc.
  • He founded Club Power BI which is the French Power BI user group that has set up groups in many different cities.
  • The group has an active forum, YouTube channel, and recently reached 3,000 members on its meetup page making it one of the biggest user groups in the world.

Overall Summary

Tristan introduces himself as a Microsoft MVP on the data platform based in France who runs a consulting/training company called Data Ports. He also founded Club Power BI which is one of the biggest user groups worldwide with over 3k members. Both speakers discuss their experiences presenting virtually and in-person. They also discuss the differences between virtual and in-person presentations.

Introduction

In this section, the speaker introduces himself and the topic of Power BI aggregations. He also mentions that he will be using a practical example to illustrate the power of aggregations.

  • The speaker is new to working with large data sets and irrigation.
  • Aggregations can be useful for handling larger data sets in the hundreds of millions or billions of rows.
  • User-defined aggregations were released by Microsoft three years ago and are available in both Power BI Pro and Premium.

What are Power BI Aggregations?

This section covers what Power BI aggregations are, how they work, and their types.

  • There are two types of Power BI aggregations: user-defined and automatic.
  • User-defined aggregations must be defined manually by the user.
  • Automatic aggregations are a premium-only feature released by Microsoft recently.
  • Aggregations leverage composite models which allow combining tables stored in different ways within the same data model.
  • Composite models allow combining tables stored in import mode (imported into cache memory) with those stored in direct query mode (not imported into Power BI).

How to Build a Data Model with Aggregations

This section covers how to build a data model with user-defined aggregations using an example dataset.

  • The speaker uses the New York taxi dataset as an example to demonstrate how to build a data model with user-defined aggregations.
  • The demo shows how to create a composite model by combining tables from different sources (import mode and direct query mode).
  • The demo shows how to define user-defined aggregations on top of this composite model.

Conclusion

This section summarizes the pros and cons of using user-defined aggregations in Power BI.

  • User-defined aggregations can significantly improve performance and optimization for large datasets.
  • They are available in both Power BI Pro and Premium.
  • Automatic aggregations are a premium-only feature that is not covered in this video.
  • Composite models allow combining tables stored in different ways within the same data model.

Power BI Aggregations

This section introduces Power BI aggregations and how they work in Power BI.

What are Power BI Aggregations?

  • Power BI aggregations allow users to analyze large datasets without the need for a premium license.
  • Aggregated tables can be stored in import mode, while detailed tables can be stored in direct query mode.
  • Dimensions tables adjust automatically depending on the grain of your query and what you want to achieve in your report.

Demo: Example with New York City Taxi Trips

  • The demo uses data from monthly CSV files available online for free.
  • Data is loaded into an Azure SQL database to support direct query and enable the use of Power BI aggregations.
  • Without using Power BI aggregations, importing all 325 million rows of data would be necessary.

Options for Analyzing Large Datasets

This section discusses options for analyzing large datasets without using Power BI aggregations.

Import Mode

  • Importing all data into Power BI is an option but may not be feasible due to size limitations.
  • Large datasets may require more memory than available on a user's computer.

Direct Query Mode

  • Direct query mode allows users to connect directly to a data source without importing it into Power BI.
  • Limited functionality is available when using direct query mode compared to import mode.

Composite Model

  • A composite model combines both import and direct query modes within one report.
  • This allows users to take advantage of the benefits of both modes.

Power BI Aggregations in Action

This section demonstrates how to use Power BI aggregations in a report.

Creating an Aggregated Table

  • To create an aggregated table, select the detailed table and choose "New Aggregation" from the modeling tab.
  • Choose the columns to aggregate and set the aggregation type (e.g. sum, average).
  • Select a storage mode for the aggregated table (e.g. import, direct query).

Using an Aggregated Table

  • Aggregated tables can be used in visuals just like any other table.
  • When a user interacts with a visual, Power BI automatically determines whether to use the detailed or aggregated table based on the level of detail needed.

Performance Benefits

  • Using aggregated tables can significantly improve report performance by reducing query times.
  • Users can analyze large datasets without experiencing slow load times or crashes.

Building a Power BI Data Model with Big Data

In this section, the speaker discusses how he built a data model in Power BI using Power Query parameters and filtered data to get a subset of the data. He also talks about the importance of query folding when working with big amounts of data and setting up incremental refresh.

Importing Subset of Data into Power BI Desktop

  • Using Power Query parameters and filtering data to get a subset of the data in Power BI desktop.
  • Applied filter for incremental refresh which supports query folding.
  • Only worked with two days' worth of data (May 6th and May 7th, 2020) to build the data model, relationships, measures, etc.

Publishing Data Set into Power BI Service

  • Published the dataset into Power BI service in the cloud.
  • Adjusted values of parameters in the cloud to fully expand scope of analysis and get 325 million rows (5 years' worth of data).
  • Faced several problems due to such a large amount of data.

Importing All Data into Memory

  • Cannot import all the data into memory as it will quickly exceed one gigabyte per dataset size limit available in Power BI Pro.
  • Requires a Power BI Premium license because with 325 million rows there is a high chance to go above this limit.

Refreshing All Data

  • Full refresh took around 1 hour and 15 minutes from Azure SQL database to Power BI.
  • Setting up incremental refresh can help with quicker refreshes later on but refreshing history can still take long if importing whole dataset.

Power BI Import Mode vs Direct Query vs Aggregations

This section discusses the pros and cons of using Power BI Import Mode, Direct Query, and Aggregations.

Limitations of Power BI Import Mode

  • The data set size is limited by the amount of memory available in Power BI Pro.
  • Processing time can be long, even with incremental refresh.
  • Every query and visual will always create a query for all 325 million rows of data.

Limitations of Direct Query

  • Queries are sent back to the underlying SQL database, which can be slow.
  • Interactive reports are not possible due to slow query times.

Advantages of Power BI Aggregations

  • Combines the benefits of both Import Mode and Direct Query.
  • Allows for interactive reports while still being able to handle large data sets.

Introduction to Aggregated Tables

In this section, the speaker explains how he imported some tables into memory and kept others in Azure SQL Database. He also discusses how he used composite models and directory mode to adjust the grain of his analysis.

Importing Tables into Memory

  • The speaker imported some tables like ag1, ag2, and ag3 into memory.
  • These tables are aggregated versions of the detailed table that remains in Azure SQL Database.
  • The choice to import these tables was made so that composite models and directory mode could be used.

Adjusting Grain with Composite Models

  • The speaker's dimensions include facts that are either directory or import.
  • Depending on the grain of his query, his dimension will automatically adjust by querying either an import or aggregated table.
  • For example, his calendar table is dual-linked to ag1, ag2, and ag3 depending on the grain of his data.

Preparing Aggregated Tables

  • The speaker performed exploratory analysis using DAX or SQL to determine which analytical axes were worth aggregating.
  • By importing only five analytical axes from a table containing 325 million rows, he reduced it to 100 thousand rows.
  • He added additional analytical axes for pickup location and drop-off location to further reduce the size of the table.

Choosing Aggregation Levels

  • Several levels of aggregation can be defined in one table depending on use cases and data.
  • Finding a good balance between storing too few or too many aggregated tables is important.

Preparing Data for Power BI Importation

In this section, the speaker explains several options for preparing data for Power BI importation.

Using Views in SQL

  • One option is building a SQL view using group by and join statements.

Using Power Query

  • Another option is using Power Query, which has a feature that allows for the creation of aggregated tables.

Conclusion

  • The speaker emphasizes the importance of finding a good balance between storing too few or too many aggregated tables.
  • He also stresses the importance of performing exploratory analysis to determine which analytical axes are worth aggregating.

Using Group By Feature in Mercury

In this section, the speaker discusses three options for using the group by feature in Mercury. The third option is pre-computing and storing materialized aggregated tables in a data warehouse.

Three Options for Using Group By Feature

  • Option 1: Use SQL view
  • Option 2: Use Power Query
  • Option 3: Pre-compute and store materialized aggregated tables in a data warehouse (Azure SQL Database)

Pre-Computing Aggregated Tables with Stored Procedures

  • Build stored procedures to select analytical axes from raw data
  • Materialize table in Azure SQL Data Warehouse
  • Define three stored procedures to create additional tables in the data warehouse
  • Discusses advantages and disadvantages of using stored procedures versus SQL views

Comparing Stored Procedures and SQL Views

This section compares the use of stored procedures versus SQL views.

Advantages of Stored Procedures

  • Data is pre-computed and pre-aggregated, leading to faster refresh times on Power BI side
  • Better performance when dealing with large datasets

Advantages of SQL Views

  • Accessed in real-time, making it easier to set up
  • Less resource-intensive than stored procedures

Conclusion

The speakers summarize their discussion on using group by feature in Mercury and comparing stored procedures versus SQL views.

Key Takeaways

  • Three options for using group by feature: SQL view, Power Query, and pre-computing aggregated tables with stored procedures
  • Stored procedures are better for large datasets and faster refresh times on Power BI side, while SQL views are easier to set up and less resource-intensive.

Power BI Aggregations

In this section, the speaker discusses the importance of managing Power BI aggregations and how to link tables together.

Importance of Managing Power BI Aggregations

  • The choice between pre-computing data or letting Power BI do it on the fly depends on factors such as what you want to achieve, number of users, database capacity, and processing time.
  • It is difficult to make a choice that preserves all three sides of the triangle (front-end, back-end, ergonomy) when dealing with large amounts of data. A compromise must be found.

Linking Tables Together

  • All fact tables (import mode and directory mode) follow a clear and logical star schema with dimensions linked to them.
  • Aggregated tables are in import mode and will impact dataset size and refresh. Detailed data can only be accessed through directory table which is in directory mode.
  • Dimension tables are installed in dual mode so they can query aggregated tables in import mode or send a SQL query to the SQL database depending on analysis grade.

Setting Up Power BI Aggregations

  • To set up Power BI aggregations, link all tables together using star schema logic.
  • Use filtered data for import mode only datasets for larger tables in memory and set up incremental refresh for more efficient work in Power BI desktop file.
  • Manage aggregations by doing two additional things after linking tables together:
  • Help engine choose between each table.
  • Make sure that detailed data can be accessed through directory table while aggregated data is used from import mode tables.

Building Reports with Aggregations

In this section, the speaker explains how to build reports on top of a data set using aggregations. The speaker demonstrates how to hide tables from users and map columns between tables.

Mapping Columns and Adjusting Precedence

  • To create an aggregated table, map columns from the aggregated table to the columns of the detail table.
  • Adjust precedence levels to tell the engine which table to query first when analyzing data.
  • Use a user-friendly interface that does not require coding.

Benefits of Aggregations

  • Once aggregations are defined, DAX measures will adjust and choose the right path depending on the grain of analysis.
  • No complex DAX or IF statements are needed for DAX measures in your model.

Monitoring Aggregation Performance

  • A demo is shown at the end of the video that demonstrates how to monitor and know which aggregation table is hit or not.

Power BI Aggregations

In this section, the speaker discusses the advantages of using aggregations in Power BI.

Advantages of Using Aggregations

  • Aggregations allow for simpler measures and queries without complex DAX statements.
  • Using aggregations reduces the size of the data set, making it much lighter and easier to manage.
  • The processing time for refreshes is much quicker when using aggregations because only aggregated levels are imported.
  • Reports and queries run faster when using aggregations.

Data Set Size Reduction

  • By using aggregations, the speaker was able to reduce the size of their data set from 6GB to less than 1GB.
  • This reduction in size allows for running a Power BI data set in Power BI Pro instead of requiring a premium feature.

Processing Time Improvement

  • Refresh times were reduced from one hour and 50 minutes to nine minutes by using aggregations.
  • Incremental refreshes take less than one minute with aggregated levels.

Faster Reports and Queries

  • A report connected to an aggregated Power BI model containing 325 million rows runs very quickly, even when analyzing large amounts of data.

Monitoring Performance with DAX Studio

In this section, the speaker demonstrates how to monitor the performance of DAX queries and determine which aggregated table is being hit.

Precedence in Calculation Groups

  • The speaker explains that Microsoft's official documentation recommends using a multiplier of ten for precedence values.
  • However, any number can be used as long as it is higher or lower than other values to indicate order of execution.

Using DAX Studio for Performance Troubleshooting

  • The speaker connects to their premium workspace using DAX Studio and runs a query on their aggregated data set.
  • They use server timings to monitor which aggregate table is being hit by the query.
  • This allows users to compare the performance of DAX queries between import and aggregated data sets.

Introduction to Aggregations in Power BI

In this section, the presenter introduces the concept of aggregations in Power BI and explains how they can be used to optimize data sets for faster performance.

Adding a New Column to an Aggregation Table

  • The presenter demonstrates how to add a new column, such as pickup location, to an aggregation table that is not present in the original table.
  • After adding the new column, the query should hit both the aggregation table and the original table.

Running DAX Queries on Aggregated Data Sets

  • The presenter runs a DAX query on an aggregated data set and shows that it takes around 2.6 seconds to execute.
  • After running the same query multiple times, it consistently takes around 2.3 - 2.5 seconds.
  • The presenter then runs the same query on an import data set (without using aggregations), which takes twice as long as running it on the aggregated data set.
  • By using aggregations, you can have lighter data sets with faster refreshes and better front-end performance in DAX.

Conclusion

  • Aggregations are a powerful feature that allow for optimized data sets and improved performance in Power BI. They can be used in Power BI Pro as long as you have a data source that supports them.

Introduction

In this section, the speaker introduces himself and his topic. He also mentions that the model he will be discussing is less than one gigabyte in size.

Speaker Introduction

  • The speaker introduces himself and his topic.
  • He mentions that the model he will be discussing is less than one gigabyte in size.

Q&A Session

In this section, the speaker answers questions from the audience.

Answering Questions

  • The speaker invites the audience to ask any remaining questions they may have.
  • He mentions that most of the important questions were answered during the presentation itself.
  • One question from Donald Parsh was about whether or not Phil C Mark's aggregations were used. The speaker recommends checking out Phil C Mark's blog for useful articles on aggregations.

Importance of Large Data Sets

In this section, the speaker discusses why it is important to use techniques like aggregations when working with large data sets.

Importance of Large Data Sets

  • The speaker emphasizes that using techniques like aggregations becomes necessary when working with larger data sets.
  • He compares it to a specialty knife in a tool set - you don't need it often, but when you do, it's essential.
  • The audience is encouraged to share the video on social media if they found it helpful.
Video description

In this session, Tristan Malherbe (Microsoft Data Platform MVP) will show you how you can leverage Power BI aggregations to analyze big volumes of data in Power BI. Tristan will illustrate the power of Aggregations with the famous New York City Taxi dataset. GUEST BIO 👤 Tristan Malherbe is the Founder of Data Pulse and a Microsoft Data Platform MVP since 2017. He is also the co-founder and current co-leader of the French Power BI User Group in France (Club Power BI). His favorite topics are: advanced data modelling, DAX, Data Visualization & performance tuning. RELATED CONTENT 🔗 Tristan's LinkedIn -- https://www.linkedin.com/in/tristanmalherbe/ Tristan's Twitter -- https://twitter.com/Datatouille Tristan's YouTube -- https://www.youtube.com/c/TristanMalherbe LET'S CONNECT! 🧑🏽‍🤝‍🧑🏽 🌟 -- https://twitter.com/HavensBI -- https://www.linkedin.com/in/reidhavens -- https://www.youtube.com/c/HavensConsulting HAVENS CONSULTING PAGES 📄 Home Page - http://www.havensconsulting.net Blog - http://www.havensconsulting.net/blog-and-media Blog Files - http://www.havensconsulting.net/blog-files Files & Templates - http://www.havensconsulting.net/files-and-templates Consulting Services - http://www.havensconsulting.net/consulting-services Contact & Support - http://www.havensconsulting.net/contact-and-support EMAIL US AT 📧 info@havensconsulting.net #PowerBI #powerplatform #microsoft #businessintelligence #datascience #data #dataanalytics #excel #powerapps #datavisualization #dashboard #bi #analytics #dax #pagination #paginated #aggregations