There are three ways to use R in Power BI
- Load a R Script as a data source
- Add an R Script as a QE step – introduced in the July 2016 update
- Plot a R Visual
In this article we’ll look at the first of these. The subsequent articles will look at the other two ways. We’ll use as an example in a typical reporting and visualisation challenge taken from the area of report progress of task to meet certain regulatory requirements – in this instance the 11 principles for good aggregation and reporting of data issued in by the BCBS – details here if you are interested.
Imagine the source data looks like this.
This shows a row for each task with some properties (Key, Name, Start Date, End Date). It also shows which task are applicable for which regions (denoted by the x in the cell). The set of regions is implicit in the column names. It also shows the owners of each task – if there is more than one owner, these are in a comma separated-list. In a similar way, it also shows the principles covered by each task. There are 11 principles and we know this list will only have integers between 1 and 11. We also have on a separate sheet the details of the principles
This is not a good structure for reporting and analysis. We need to tease out the data into separate, related tables –
- TaskPrincipleMap (to hold the many to many relationships between Tasks and Principles),
- TaskRegionMap (again to hold the many to many relationships between Tasks and Regions)
and combine this with the Principle table to arrive at a data model that looks something like this
The R script below uses the tidyr and dplyr packages to perform a sequence of data transformations to achieve exactly this. These include projecting columns (select), filtering rows (filter), splitting a comma-separated column into individual values (spread) and unpivot (gather). The result is a set of data frames (R’s equivalent of a table) that correspond to the tables required.
We can paste this RScript into the “Load R Script“ window.
One of the nice features of this is that you can then choose exactly which dataframes created by the script to import – and that you can load several dataframes not just one.
This then loads our data into the Query Editor below – each selected R data frame becomes a separate query. (in the snapshot below, I have renamed the queries from R naming convention to more typical table / query names e.g. from df.task.principle.map to TaskPrincipleMap)
In this case, we can do these operations directly in Power BI of course – but R has many advance statistical capabilities that are not in Power BI; decision tress, clustering, correlations. Some of these are shown in the R Script Showcase here.