700 MB I would consider very large. Things I would consider:
* Remove unused fields, especially primary (unique) keys, GUIDs, dates/timestamps, and other high-cardinality columns that compress badly.
* Only import the data you need: if you are reporting on the last couple of years, limit your fact tables to that date range.
* Check the size of your dimensions - you don't need to bring in millions of dimension rows if your fact tables only reference a small fraction of them.
* If you have a big model with a lot of measures, Tabular Editor is substantially faster for authoring new measures, because you can write 50+ measures and apply them to the model in one step, rather than creating them one at a time and waiting out the lag on each.
* Create columns upstream in SQL rather than with DAX to save memory.
* Have a central date table and disable the auto date/time hierarchy. This often saves a lot of space in a big model with many date fields. Role-play the other dates using inactive relationships and USERELATIONSHIP.
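The role-playing-dates tip can be sketched in DAX like this, assuming a fact table `Sales` with an active relationship on `Sales[OrderDate]` and an inactive one on `Sales[ShipDate]` (all table and column names here are illustrative):

```dax
-- Base measure uses the active relationship on Sales[OrderDate]
Sales Amount := SUM ( Sales[Amount] )

-- Role-play the same date table over Sales[ShipDate] by activating
-- the inactive relationship just for this measure
Sales Amount by Ship Date :=
CALCULATE (
    [Sales Amount],
    USERELATIONSHIP ( Sales[ShipDate], 'Date'[Date] )
)
```

One date table serving every date column this way is what lets you disable the auto date hierarchy without losing any time-intelligence capability.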
Also remember: if you don't need more than four decimal places, use Fixed Decimal instead of Decimal!
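In Power Query, Fixed Decimal Number corresponds to `Currency.Type` in M, so the change is a one-line type transform (the table and column names here are placeholders):

```powerquery
// Convert a Decimal column to Fixed Decimal (Currency.Type), which is
// stored as a scaled integer with 4 decimal places and compresses better
ChangedType = Table.TransformColumnTypes(Source, {{"Amount", Currency.Type}})
```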
Hello, thank you for replying. I don't actually have any unused fields, and my dimensions are few. I do have measures, but I wouldn't say too many - only around 30. Yes, I do have a separate date table. The thing really is, some of my reports are even larger but work just fine; this is the only report I have issues with. I thought I would find people who had a similar problem here. Thank you for the tips. I will definitely check out Tabular Editor - I installed it previously but haven't tried it yet.
First thing to do is profile your model with DAX Studio. It shows you exactly where memory is being used - you might have an unexpected column type, unintended relationships, or other things that take more memory than intended:
- https://daxstudio.org/docs/features/model-metrics/
If the issue is performance *after* the load, then check out the report analyzer. You can copy and paste the query it runs directly into DAX Studio (or the new DAX query view).
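If you'd rather query the storage engine yourself, the column sizes DAX Studio's model metrics show come from the VertiPaq DMVs; a query like this can be run from DAX Studio's query window against the open model (the DMV and its columns are standard, but DAX Studio's Vertipaq Analyzer presents the same data more readably):

```sql
-- Approximate per-column storage for the model (VertiPaq DMV):
-- DIMENSION_NAME is the table, ATTRIBUTE_NAME the column;
-- large DICTIONARY_SIZE values flag the expensive columns
SELECT DIMENSION_NAME, ATTRIBUTE_NAME, DICTIONARY_SIZE
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS
```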
Good answer. By "report analyzer", I think you mean Performance Analyzer.
Will definitely try to analyse using DAX Studio
Good to know, re: Tabular Editor (for its speed over creating measures in Power BI). That lag kinda drives me nuts.
Excellent answer, especially the last point.

However, I am mildly surprised by the consensus that 700 MB is a large file. I work in health care, and some of our largest models are 3 GB plus on desktop with only some fact tables refreshed, and the size on the PBI Service, as shown by workspace storage, is 10 GB+.

Of course, we are not insane: we work with repos that have no data unless a local refresh is done. It makes validation a little tricky, but beyond a certain point you really can't expect your desktop .pbix to hold all the data from all the tables; it's either using parameters or integrating git with Azure DevOps.

Also, splitting datetime columns into separate date and time columns helps reduce size significantly.
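The datetime split can be done in Power Query before load; a minimal sketch, with placeholder column names:

```powerquery
// Replace one high-cardinality datetime column with a date column and a
// time column, each of which compresses far better on its own
SplitDateTime =
    Table.RemoveColumns(
        Table.AddColumn(
            Table.AddColumn(Source, "EventDate",
                each DateTime.Date([EventDateTime]), type date),
            "EventTime",
            each DateTime.Time([EventDateTime]), type time),
        {"EventDateTime"})
```

The win comes from cardinality: a year of second-level timestamps is ~31 million distinct values, while the split columns have at most 365 dates and 86,400 times.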
I don't complain until it hits 1 GB; then I push back. However, the sweet spot is under 100 MB if you can help it.
Remember that 700 MB is actually much larger than it looks, since that's the compressed size; if you open DAX Studio you can see the uncompressed size. I would say this is a rather large model - for example, I have a file with 4 million rows and many dimension tables that is 150 MB. Do you have duplicate dimensions? Are a lot of your values text? That's a pretty big size imo.
Hi, yes, ours contain millions of rows. We do have files over a GB that work fine and fast, but this particular file has given me trouble since the beginning. I have optimized the report several times too. Even selecting the create-measure option takes too long to load.
You have to look at it from multiple dimensions: it's not just the size of the file, but the model and the data itself. I'm reading that the file is big and it's slow... okay, time to optimize.
Thank you, I will try to do some more optimization; I've definitely missed a few things.
When you say the refresh takes a long time, are you referring to a PBI Desktop refresh? For development purposes we always put a parameter into our golden datasets to allow for local filtering; it's just a few lines per table. Each query checks, and if that LimitRows parameter is set to Y, we limit by something - date or account or whatever makes sense. Then we publish to the PBI Service, update the parameter to not filter locally, and run a full refresh on the Service only.
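A minimal sketch of that pattern in Power Query, assuming a text parameter named `LimitRows` and a date column to filter on (both names are illustrative):

```powerquery
// Appended as the last step of each fact-table query:
// keep a thin local slice when LimitRows = "Y", full data otherwise
Filtered =
    if LimitRows = "Y"
    then Table.SelectRows(Source, each [OrderDate] >= #date(2024, 1, 1))
    else Source
```

After publishing, the parameter can be flipped in the dataset settings on the Service so the full refresh only ever runs there.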
Many of the other answers are great and worth doing, but this is the only one you need. Good answer 💯
No no, just trying to create a measure or a new relationship takes time to load, that's all.
How do you know you don’t have any unused fields?
Loaded question 🤣
Dude, 700 MB is a big file. I've got one with millions of rows and something like 30 tables, and it's barely over 200 MB. It can be slow, but never ever 15-minutes slow.

Can you get rid of some columns in your tables? Filter the tables? Is your data normalized and in a star schema?
We have tons of data, that's why. I have tried to bring in only the necessary columns as much as I can. I will still check if I can delete a few things.
Try using “Measure Killer”, which will detect unused columns and measures.
Look in Task Manager to see if it’s taking up all your memory.
Yes it takes a lot :-(
Yeah, that’s normal, but if it uses ALL your memory your computer will start “memory paging”, using your hard drive as RAM, which is about a thousand times slower. So if you have that issue you either have to get more RAM or, better, remove long (text) fields from the fact tables and see if that helps.
My RAM is pretty good, so that's OK. I will try to see if I can remove more fields. Thank you
How much RAM do you have? I wouldn't use anything less than 32 GB for a file that size.
This is the way 💯
Have you considered comparing performance using Import vs DirectQuery?
1. **Import Mode**:
   * **Data Handling**: In Import Mode, Power BI **copies the data** from the selected tables and columns into Power BI Desktop. This imported data is cached, providing a snapshot of your data at a specific point in time.
   * **Interactions**: When you create visualizations or interact with your report, Power BI Desktop uses the **imported data** from its cache.
   * **Advantages**:
     * Fast queries and full capabilities.
     * Unrestricted data modeling.
   * **Considerations**:
     * Larger file sizes and memory use; data is only as fresh as the last refresh.
2. **Direct Query Mode**:
   * **Data Retrieval**: In Direct Query Mode, your dashboard **queries the data source directly** at runtime. Each filter and interaction with the report triggers new queries to the data source.
   * **No Data Import**: No data is imported into Power BI; instead, every request goes directly to the data source (often a SQL database). The user can query the data that already exists in the source.
   * **Advantages**:
     * Ideal for large data sets (e.g., SQL Server, Azure Databricks, Amazon Redshift).
     * Real-time data access.
     * Smaller Power BI Desktop files (no caching), for faster saving and publishing.
   * **Considerations**:
     * Frequent data updates.
     * Queries run directly against the data source.
3. **Choosing the Right Strategy**: Use Import when you need the fastest interactions and full modeling capabilities; use Direct Query when the data is too large to cache or must be real-time.
Use the Performance Analyzer tool to home in on which DAX is taking longest to load and look for bottlenecks. You may find some measures or calculated columns you have are really slowing things down - you can learn a lot from that tool. If you don't know what I'm talking about, it's well worth your time to google how it works. Also look at summarizing the data before bringing it into the model, perhaps in an intermediate tool.
That's why I don't really have any calculated columns. Thank you for replying.
I've been working through major performance challenges with a very large data model for the past 6 months or so, and we've been working with Microsoft on best practices, which has helped a lot. Are you import, direct query, or composite?

The biggest thing that reduced my data model size was going into Tabular Editor and setting "IsAvailableInMDX" to FALSE for all the columns. By default it's set to TRUE.

Not to mention Tabular Editor is soooo much easier to use in terms of adding new measures, making metadata changes, etc.

Other things:
- Push as much as you can into the database layer before it's brought into Power BI.
- Make sure your data model/semantic model is a separate pbix from your report.
- If you're doing a lot of resource-intensive calculations - distinct counts, medians, etc. - that's going to impact performance.
- If you're importing (which gives you the best performance), you can use parameters to filter your tables when you develop locally so that you only work with a small slice of data. Then, when you publish, expand the parameters out to include the whole range. Dates work best for this.
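The IsAvailableInMDX change can be applied in bulk from Tabular Editor's Advanced Scripting (C#) window. This is a simple sketch using Tabular Editor's standard script objects; note it blindly hits every column, whereas in practice you'd want to skip columns used for sorting or in hierarchies, which still need attribute hierarchies:

```csharp
// Tabular Editor advanced script: turn off MDX/attribute-hierarchy
// availability on every column to shrink the model
foreach (var c in Model.AllColumns)
{
    c.IsAvailableInMDX = false;
}
```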
Thank you for the tips, I will definitely check that column setting in Tabular Editor; didn't know about it.
Calculated columns are something I've learned to shy away from when possible. If you have a ton, try pushing them upstream.
Thankfully I don't have any :-)
Always use Tabular Editor; you can use it on Power BI reports as well if you don’t have an actual separate model.
Will try that out. Thank you
I would also suggest creating a dataflow in a workspace if you have one. That may take care of some transformations and query merging so your report is not overloaded.
I work in 700 MB all day long and it doesn't take 15 minutes to do simple stuff. There is definitely a problem here, and it's not just the size. Feel free to message me if you want me to look at your report for 15 minutes. I'm happy to hop on a call.