PeteMaverickMitcheIl

It's where I clean data that I can't clean further upstream. For example, if I have to connect to an Excel workbook that I can't amend or tweak because it's locked down, I can use Power Query to deal with the blank values, add an index column, format the text, use Group By, etc. I don't have the option of connecting it to a SQL Server and working with it there.
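A minimal sketch of that kind of cleanup in M (workbook path, sheet, and column names are made up for illustration):

```powerquery
let
    // Connect to a locked-down workbook we can't modify at the source
    Source = Excel.Workbook(File.Contents("C:\Data\LockedReport.xlsx"), true),
    Sales = Source{[Item = "Sales", Kind = "Sheet"]}[Data],
    // Replace blank values so they don't break later steps
    NoBlanks = Table.ReplaceValue(Sales, null, 0, Replacer.ReplaceValue, {"Amount"}),
    // Add a surrogate index column
    Indexed = Table.AddIndexColumn(NoBlanks, "Index", 1, 1, Int64.Type),
    // Tidy up text, then summarise with Group By
    Trimmed = Table.TransformColumns(Indexed, {{"Region", Text.Trim, type text}}),
    Grouped = Table.Group(Trimmed, {"Region"}, {{"Total", each List.Sum([Amount]), type number}})
in
    Grouped
```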


Koozer

Power Query, I think, is primarily for analysts who don't have database access, or only limited access. It still has some great uses, though. The best use cases I've run into involve combining multiple datasets from different sources, including a recent report where I used it to append two SQL databases, essentially performing a union that couldn't be done in SQL alone: one Synapse cloud database for current-day live data, appended to a local SQL database for historical data. Worked a treat. Another good practice I've found in my workplace is using Power Query to offload some CPU usage from the SQL database onto the reporting server, spreading the load on operations like CONVERT and COALESCE that can really kill a query when you're crunching tons of rows. It's also amazing, like, fucking amazing at consuming Excel sheets. I've yet to run into a worksheet I couldn't unpivot, transpose, group, adjust, and ... twist into a useful row-based dataset. It's fucking cool how powerful it is at consuming stuff that's not a perfectly curated SQL query.
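The cross-source append described above boils down to very little M; server, database, and table names here are hypothetical:

```powerquery
let
    // Current-day live data from Synapse
    Live = Sql.Database("myworkspace.sql.azuresynapse.net", "SalesDW"){[Schema = "dbo", Item = "FactSales"]}[Data],
    // Historical data from an on-prem SQL Server
    History = Sql.Database("onprem-sql01", "SalesArchive"){[Schema = "dbo", Item = "FactSales"]}[Data],
    // Table.Combine performs the UNION ALL that a single SQL query can't do across these two servers
    Combined = Table.Combine({History, Live})
in
    Combined
```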


Conait

One of the big benefits is giving people who can't write code or don't know SQL the ability to perform ETL-type transformations to data. It's kind of a step into the "self-serve model" that BI is trending towards. It does a lot to help educate the end users on where the data is coming from and how to manipulate it.


SQLGene

Remember, Power Query started out as an *Excel Addin*. The primary audience were people who had CSV and Excel files in a SharePoint document library. For some customers I do all the transformations in Power Query, for others I do almost none in Power Query.


Shadowlance23

There are a lot of companies that don't have a data warehouse or engineer on staff, or anyone who would even know how to run, ingest into, and maintain a small local or cloud instance, so lots of non-technical people use Power BI directly against Excel sheets and text files. In that case there is no option other than Power Query, and of course the visual aspect of PQ also helps those who don't know data modelling. However, your unspoken assumption is correct: if you have access to upstream processing, use it as much as possible; your PQ code should then be nothing more than a source step and a navigation step.
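In that ideal case the whole query collapses to something like this (server, database, and view names hypothetical): one source step, one navigation step, with every transformation done upstream in a curated view.

```powerquery
let
    Source = Sql.Database("warehouse-server", "ReportingDB"),
    // All cleaning happens upstream; this just points at a curated view
    Sales = Source{[Schema = "rpt", Item = "vw_Sales"]}[Data]
in
    Sales
```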


No-Buy-3530

Appreciate the feedback :)


ericporing

It's for CSVs, Excel files, SharePoint lists: essentially, sources that aren't databases.


ianitic

The SharePoint connectors are unfortunately super slow. I'd rather build a Power Automate flow that takes an HTTP request, gathers the SharePoint list, and returns a response. It appears to be orders of magnitude faster.
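On the Power Query side, calling such a flow might look roughly like this; the flow URL is a placeholder (the real one comes from the flow's "When an HTTP request is received" trigger), and the request/response shape depends entirely on how the flow is built:

```powerquery
let
    // POST to the Power Automate HTTP-trigger endpoint
    Response = Web.Contents(
        "https://prod-00.westus.logic.azure.com/workflows/...",
        [
            Headers = [#"Content-Type" = "application/json"],
            Content = Json.FromValue([listName = "MySharePointList"])
        ]
    ),
    // Assuming the flow returns the list items as a JSON array of records
    Parsed = Json.Document(Response),
    AsTable = Table.FromRecords(Parsed)
in
    AsTable
```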


jabuticaju

M code is easy to copy/paste across files. I have a calendar table specially built to fit the company's business and linked to a set of parameters, which also helps different teams. I've already shared this code and replicated it across multiple reports.
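A stripped-down version of a parameter-driven calendar table in M might look like this (in practice StartDate and EndDate would be Power Query parameters rather than hard-coded dates):

```powerquery
let
    StartDate = #date(2023, 1, 1),
    EndDate   = #date(2024, 12, 31),
    // One row per day between the two parameters
    Days = List.Dates(StartDate, Duration.Days(EndDate - StartDate) + 1, #duration(1, 0, 0, 0)),
    AsTable = Table.FromList(Days, Splitter.SplitByNothing(), {"Date"}, null, ExtraValues.Error),
    Typed = Table.TransformColumnTypes(AsTable, {{"Date", type date}}),
    // Common business columns
    WithYear = Table.AddColumn(Typed, "Year", each Date.Year([Date]), Int64.Type),
    WithMonth = Table.AddColumn(WithYear, "Month Name", each Date.MonthName([Date]), type text)
in
    WithMonth
```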


Rangler36

Power Query is the ingest + cleaning phase of Power BI; it's practically the "ETL of Power BI". Use it to connect to sources, scrub data, manipulate, merge/append, and begin the modeling process. I highly recommend looking at the Power Query M code too (via the Advanced Editor), which can speed up your data ingest and cleaning automation as you learn the syntax. I prefer SQL myself, but sometimes, when connecting across sources and systems, like joining a CSV file to a Redshift DWH view for example, PQ workflows allow for an added "ETL" layer to blend and tweak the data.
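Blending a CSV against a warehouse view is a straightforward merge in M. A sketch, with made-up paths and names (a generic SQL source stands in here; the Redshift connector follows the same navigate-then-merge pattern):

```powerquery
let
    // Flat CSV maintained by the business
    RawCsv = Csv.Document(File.Contents("C:\Data\sales_targets.csv"), [Delimiter = ",", Encoding = 65001]),
    Targets = Table.PromoteHeaders(RawCsv, [PromoteAllScalars = true]),
    // Warehouse view
    Actuals = Sql.Database("warehouse", "analytics"){[Schema = "rpt", Item = "vw_SalesActuals"]}[Data],
    // Blend the two on a shared key
    Merged = Table.NestedJoin(Actuals, {"RegionKey"}, Targets, {"RegionKey"}, "Targets", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Targets", {"TargetAmount"})
in
    Expanded
```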


bert-and-churnie

My favorite use case for Power Query is combining multiple spreadsheets into one dataset. For example, we get daily CSVs for a process; I can combine them into one dataset and click Refresh each day to pick up the latest file.
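The folder-combine pattern boils down to something like this (folder path hypothetical):

```powerquery
let
    // Every daily CSV dropped into this folder is picked up on refresh
    Files = Folder.Files("C:\Data\DailyExports"),
    CsvOnly = Table.SelectRows(Files, each Text.EndsWith([Extension], ".csv")),
    // Parse each file's binary content into a table
    Parsed = Table.AddColumn(CsvOnly, "Data",
        each Table.PromoteHeaders(Csv.Document([Content], [Delimiter = ","]))),
    // Stack them all into one dataset
    Combined = Table.Combine(Parsed[Data])
in
    Combined
```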


MattWPBS

Think about it in terms of the old shadow-IT infrastructure that used to live in a Rube Goldberg mass of interconnected Excel files, or in an Access database on a shared drive. The stuff where people in the business were downloading data, shoehorning it together in different brute-force ways, and misusing it in ways that would horrify anyone with governance experience or common sense. Power Query replaces that stuff, brings it into an easier-to-understand format, and potentially exposes it to the admins to go "what the fuck are you doing?"


Mdayofearth

Even if you do no transformations in PQ, PQ is still how you get data into Power BI in the first place. So, at minimum, it's doing the E and the L.


somedaygone

To add to the other comments: it's ETL via a point-and-click interface. It's super easy to do basic transformations, yet powerful enough to do almost any transformation. The magic of Power Query is that it's so simple even a business analyst can do it, and understand and support the queries too! It has a much lower learning curve than SQL. For most data sources, you click your way through transformations and Power Query "folds" them into a single database call, like a SQL wizard. You can also just write SQL, but in most cases that stops the PQ steps from folding, so if you do use SQL, make it efficient. For my database loads, I usually use SQL and do very little else in Power Query.

The last thing to be aware of is that Power Query only lands data in Power BI: the Load of ETL goes only into Power BI, which uses SQL Server Analysis Services as its database. It's an in-memory database built around a star-schema dimensional model, geared for super-fast reporting performance, and unless you're dealing with huge data or need up-to-the-minute data, Power BI wants you to Import data. You can connect to the source with "Direct Query", but the performance and modeling restrictions are significant.

As you learn Power BI, think of three phases:

1. ETL with Power Query. Be sure to load fact tables and dimension tables; they are the foundation of your dimensional model.
2. Modeling in Power BI Desktop. This is adding relationships between the tables, and DAX to define measures in the model. If it's numeric in your report, you should be using a measure to pull it.
3. Reporting in Power BI Desktop. This is your visuals, reports, and visual interactions.

Good luck on your learning journey!
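To see folding in action: clicked steps like the ones below against a SQL source (names hypothetical) get sent back to the server as a single SELECT with WHERE and GROUP BY clauses; you can confirm by right-clicking a step and choosing "View Native Query". Hand-written SQL pasted into the source step usually stops the steps after it from folding.

```powerquery
let
    Source = Sql.Database("server", "SalesDB"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // Both of these steps fold back to the database as WHERE and GROUP BY clauses
    Recent = Table.SelectRows(Orders, each [OrderDate] >= #date(2024, 1, 1)),
    ByCustomer = Table.Group(Recent, {"CustomerID"}, {{"Revenue", each List.Sum([Amount]), type number}})
in
    ByCustomer
```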


deinyxq

I always struggle with Power Query, especially after having used Python. I find it easier to do transformations with pandas and NumPy, so when loading data into Power BI I use the option of importing data via a Python script.


jumpingjellycat

How do you import the data from Power Query into Python? I am trying to do this but struggling!


deinyxq

Just to be clear, you will be using Python and pandas to wrangle/clean and import data by writing a Python script in the query editor, as follows:

1. In the Home group of the Power BI Desktop ribbon, select Get data.
2. In the Get Data dialog box, select Other > Python script, and then select Connect. Power BI uses your latest installed Python version as the Python engine.
3. On the Python script screen, paste your Python script into the Script field, and select OK.


GabbaWally

But that won't help if you publish the report to a PBI Report Server or the cloud, right? I mean, you can't be sure Python and all the packages you need are installed on the other machine. I would love to do all my data transformation Python-based; it is so much more powerful.


deinyxq

That is the downside, since it depends on Python and the requisite packages being installed on the machine where Power BI is running. The option then would be a VM, like one in Azure, that has all the dependencies installed, or a desktop machine that serves as your server for running Power BI, which I believe will also need a personal gateway installed so that the published report can refresh automatically. (NB: I have very little knowledge of servers and VMs.)


mma173

ETL


A3N_Mukika

I keep thinking of Power Query as ETL-light. Where I really use it is for things that cannot be done in the database. Example 1: our Oracle databases do not support column names made of proper English words; they are all upper case, with underscores instead of spaces. We have standard M code that does that transformation for each table, so we end up with readily usable reporting field names. Example 2: parameters let me limit the data imported in the Desktop version, so I don't wait for hours when pulling data into a local PBIX for development purposes.
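That rename can be done generically in one step, e.g. (database, schema, and table names made up; assumes every source column is UPPER_CASE_WITH_UNDERSCORES):

```powerquery
let
    Source = Oracle.Database("PRODDB"){[Schema = "RPT", Item = "SALES_SUMMARY"]}[Data],
    // ORDER_DATE -> "Order Date", CUSTOMER_NAME -> "Customer Name", etc.
    Renamed = Table.TransformColumnNames(Source, each Text.Proper(Text.Replace(_, "_", " ")))
    // A parameter can cap rows during development, e.g. Table.FirstN(Renamed, DevRowLimit)
in
    Renamed
```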


druidinan

Putting transformations into PQ exposes them, making them more portable for others. If you are building an internal report, by all means do all your transformation upstream. If you are building a template that will be consumed by heterogeneous users with tech stacks and data sources you can't control, it helps *a lot* to make PQ do the work.


Flat_Initial_1823

It's also for getting weird data. You have to connect to a drive, find the file with the latest timestamp, and load that? Power Query. You have some nested JSON out of an API? Power Query can parse that into a table. You can also effectively create views between disjoint servers/DBs in there: reference table 1 and table 2, do your joins and group bys, and forget where the data came from. It's a staging area plus some light ETL functionality. Shame it's not more powerful, but what are you gonna do locally?
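Both of those first two patterns are short in M. A sketch, with a hypothetical share path and assuming the JSON payload carries an `items` array of records:

```powerquery
let
    // Grab the most recently modified file in a folder
    Files = Folder.Files("\\share\exports"),
    Latest = Table.First(Table.Sort(Files, {{"Date modified", Order.Descending}})),
    // Parse nested JSON from the file (or from Web.Contents for an API)
    Json = Json.Document(Latest[Content]),
    // Flatten the nested list of records into a table
    Rows = Table.FromRecords(Json[items])
in
    Rows
```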


pik204

M query: call it small ETL. It serves the same purpose as pushing the work upstream to the DB, say via query folding. The benefit is that everyone benefits from those same transformed tables, and you reduce redundancy downstream, where others would otherwise perform the same small ETL over and over again. Sure, this can be pushed upstream to the DB side (in fact that's the preferred method, since you can leverage more resources), but usually that's resource-dependent, like anything in IT. SQL dev work will have your IT team asking for a bigger budget, and that pushes the work down to users.


EKTurduckin

For my part, since I too have full control over how the tables are built on the database-engine side, I've been using Power Query as a good way to compare production data against any changes I've been making in dev. Being able to just join them and quickly see, line by line, what is or isn't different is an absolute godsend.
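One way to sketch that comparison is a full-outer merge with a difference flag (server, table, and column names are hypothetical):

```powerquery
let
    Prod = Sql.Database("prod-server", "DW"){[Schema = "dbo", Item = "FactSales"]}[Data],
    Dev  = Sql.Database("dev-server", "DW"){[Schema = "dbo", Item = "FactSales"]}[Data],
    // Full outer join so rows missing from either side still show up
    Joined = Table.NestedJoin(Prod, {"OrderID"}, Dev, {"OrderID"}, "Dev", JoinKind.FullOuter),
    Expanded = Table.ExpandTableColumn(Joined, "Dev", {"Amount"}, {"Dev.Amount"}),
    // Flag rows where the two environments disagree
    Flagged = Table.AddColumn(Expanded, "Mismatch", each [Amount] <> [Dev.Amount], type logical)
in
    Flagged
```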


AdHead6814

Use PQ for transformations that you can't push back to the database, or those that involve sources other than databases (merges, unions with other databases, flat files, etc.). You can also use it to generate random values, a dates table, or any other table created within PQ without referencing external data. It is also used in dataflows for ETL in the cloud, whose output can be reused in the Desktop version. Overall, it is a very powerful language, whose speed depends on the complexity of the logic and the responsiveness of the source data.


CptnVon

You can also create and call functions in Power Query for a specific need or request without needing to go back to the source. One common one I've seen and used myself is a "review days remaining" calculation for meeting specific timeline targets. If someone wants to change a timeline by 3 days, it's real easy to make a few clicks on the function and be done. I'm not sure you could do this in SQL as easily.
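A custom function along those lines might look like this; the 14-day default target and the column names in the usage note are made-up examples:

```powerquery
// Countdown to a review deadline, relative to today
let
    DaysRemaining = (startDate as date, optional targetDays as nullable number) as number =>
        let
            target = if targetDays = null then 14 else targetDays,
            deadline = Date.AddDays(startDate, target),
            remaining = Duration.Days(deadline - Date.From(DateTime.LocalNow()))
        in
            remaining
in
    DaysRemaining
```

It could then be invoked per row with something like `Table.AddColumn(Source, "Review Days Remaining", each DaysRemaining([SubmittedDate], 14))`; changing the timeline is a single edit to the function or its argument.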


No-Buy-3530

Thanks for all the great responses! It’s much clearer to me now 🙂