Power Query team up with SSIS

Power Query is one of my favorite tools of the Power BI suite, the self-service BI offering of Microsoft. The strategy of Microsoft self-service BI (the very very short version) is to empower people to to analyze and report on data without the direct evolvement of IT-professionals.

With Power Pivot Microsoft offers in the self-service BI solution the concept of using the data on a personal, team and corporate level. This is a nice BI life cycle approach, when an Information Worker starts with Excel, deploys his workbook to SharePoint for the team and if it get used by more and more people the IT-Pro can deploy the model on SSAS Tabular Server.

When using Power Query this bit is missing. There is no function of using a Power Query and publish it to the equivalent Integration Services. Of course, I hope that Microsoft is working on something like this, but I believe it is not that easy to just take the “M” script behind Power Query and copy it into SSIS use it like BIML and off we go….

Lately Tillmann and myself were at the SQL Saturday in Vienna and our fellow Regis (@regbac) had a nice idea in his session on Power Query. The idea was to use an Integration Service Script Task to address the Excel Object to requery a workbook with  Power Query queries insight. The idea is not new, Matt already blogged about it. But this was a trigger for us to think about a little SSIS component, which does the job without a script task, just easy.

Here we go: In our session today at the SQL Saturday in Lisbon (a great event btw, organized by our fellow Niko (@NikoNeugebauer) and his awesome crew) we announced this little component in a short demo. You can download the component on Codeplex here.

How does this SSIS Excel refresh work?

Let’s do a little sample: We create a Power Query workbook. This workbook combines streams from different sources, SQL Server, Azure data market and 180 Text Files from a folder. It also uses dynamic filtering to show the refresh capabilities of the component.

PQ Dataflow

Doing dynamic filtering with M

The red box is just a simple view from MDS, which give us the function of changing parameters,
which filter the stream of mashed up data from pervious queries.

M script2

This view has a content and factor attribute. We use the factor values to filter and change the data on the fly in the query. Simple. So we are dynamic by now in Power Query.

Using the workbook in SSIS

Now let’s assume this workbook was created by an information worker and you want to use it as a source in SSIS. Yes, you can take the workbook and use the Excel Data Source and load data
from the Excel spreadsheet to your data flow. But now that PowerQuery is dynamic, how would
you do it? Now the SSIS Excel refresh component comes into place:

So, let’s look to the Control Flow: Very easy, there are just 3 tasks: Two Excel refresh and one data flow.

ControlFlow

The configuration for the Excel Refresh is shown below:

ExcelRefreshConfig

PQ queries are represented as OLE DB connections within Excel. In the component you can choose the Excel Connection Manager of your package. I my case, I choose the PowerQueryWorkbook connection. Next, I can choose the timeout, because there is latency between the time Power Query runs the queries and when you can access it via Excel externally and the component needs to know, when to close the Excel Connection.
Up next I can choose to refresh all connections or just specific ones. Here, I do this with a specific connection first. Why? Currently the component don’t offer a sequence of refreshing the connections (maybe in the future). So I need to simulate this via the control flow. As the Parameters have to be updated first, to filter and manipulate the data stream they are first. The next task will do the refresh of the sales data stream.

I can tick a box for creating a backup of the queried Excel Workbook, just in case my query does bad stuff 😉 Hey folks this is beta! 🙂

But that’s it, the next step is just using the Excel datasource:

DataFlow

Run, run

ReRunSSIS

Now, let’s start the package and see what happens: [Step 1] we run the package. We get a result. [Step 2] we change parameter values from 3 to 30000 in Masterdata Service Excel Add In. We log it. [Step 3] we rerun the package and voilá our Power Query dynamic script returns the newly transformed values. No big development in SSIS, just using the query of the Information Worker.

Hmmmm…

Folks OK, we know there are a lot of downsides to this approach. Excel on Server? Be careful.
Security issues accessing data sources in the Power Query? Are not transparent. Controlling the Excel Object for long running queries? Etc., etc., etc….

But hey, in between waiting for a professional solution from the SSIS Team maybe go for it and have fun. Please send us your thoughts…