Category Archives: R

The R statistical language

Using R in Power BI Part (3) – Create a R Script Visual

The third way of using R in Power BI is to create an R Script visual.  This is especially useful when we want to create a visual such as small multiples that is not available (yet) in the suite of standard or custom PBI visuals.

Imagine that we have a daily time series of profit and loss (P&L) for December 2015, and we want to show a trellis of bar charts: one chart for each combination of region (Asia, EMEA and US) and Volcker flag (which is either In or Out). So we want to show six charts in all. We also want each chart to show a bar in a different colour for each of three clusters. Given a data frame named dataset with Date, Region, Volcker, Cluster and P&L columns, we can create this in R using the ggplot2 library with the following code.

 ggplot R code
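The original code is only available as a screenshot, so here is a minimal sketch of what such a trellis might look like. It assumes a column named PnL for the P&L and made-up cluster labels and values; inside Power BI the dataset data frame is supplied by the visual and would not be built by hand.

```r
library(ggplot2)

# Made-up stand-in for the `dataset` data frame that Power BI supplies.
# The column name PnL (for P&L) and the cluster labels are assumptions.
set.seed(1)
dataset <- expand.grid(
  Date    = seq(as.Date("2015-12-01"), as.Date("2015-12-31"), by = "day"),
  Region  = c("Asia", "EMEA", "US"),
  Volcker = c("In", "Out"),
  Cluster = c("C1", "C2", "C3"),
  stringsAsFactors = FALSE
)
dataset$PnL <- round(rnorm(nrow(dataset), mean = 0, sd = 100))

# One panel per Region x Volcker combination (3 x 2 = 6 panels),
# with one bar per cluster per day, coloured by cluster.
p <- ggplot(dataset, aes(x = Date, y = PnL, fill = Cluster)) +
  geom_col(position = "dodge") +
  facet_grid(Volcker ~ Region) +
  labs(x = "Date", y = "P&L", fill = "Cluster")

print(p)
```

Here facet_grid does the small-multiples work; facet_wrap would be an alternative if a single strip of six panels were preferred.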

Once we have written and tested this code in RStudio, we can create an R Script visual. The R Script visual provides a dataframe (R’s equivalent of a table) named dataset. This contains a column for each of the fields in the Values well – in this case Region, Cluster, VolckerFlag, Date and P&L.

 R Script Values Well

We can copy our R  snippet into the bottom of the R Script Visual code window. 

R script editor

 Once done we can see our chart in Power BI.

R visual

 

Using R in Power BI Part (2) – Execute A R Step in a Query

The second way of using R in Power BI is as a step in a query in the Query Editor. Imagine we have a Task query with a TaskOwner column that contains a comma-separated list of one or more owners, as in the snapshot below.

Task Query

Our objective is to create a new query that contains the mapping of tasks to owners – this requires us to split the TaskOwner column and create a row for each task and owner.

Since we are creating an additional dataset, let’s start by duplicating the Task query and renaming the copy to TaskOwnerMap. We can now insert an R step (rightmost icon on the Transform ribbon below).

Run R Script button

This opens a dialog where we can paste our R code snippet.

run r script window

The code above uses functions from the tidyr package. The separate function splits the TaskOwner column into several columns, one for each owner. The gather function then unpivots these columns to create an attribute column and a value column. The select function, which comes from the companion dplyr package, chooses the columns required for our final dataset. (It’s best to develop, debug and test in RStudio or another development environment beforehand.)
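As the script itself is only visible in the screenshot, the following is a hedged sketch of the separate / gather / select sequence just described. The TaskKey column, the owner names and the cap of three owners per task are assumptions made for illustration; in the query step the input arrives as dataset.

```r
library(tidyr)
library(dplyr)

# Made-up stand-in for the Task query; in Power BI it arrives as `dataset`.
# TaskKey and the owner names are hypothetical.
dataset <- data.frame(
  TaskKey   = c("T1", "T2", "T3"),
  TaskOwner = c("Ann, Bob", "Cara", "Ann, Bob, Dev"),
  stringsAsFactors = FALSE
)

task.owner.map <- dataset %>%
  # Split the list into up to three owner columns; short lists are padded with NA.
  separate(TaskOwner, into = c("Owner1", "Owner2", "Owner3"),
           sep = ",\\s*", fill = "right") %>%
  # Unpivot the owner columns into attribute/value pairs, dropping the NA padding.
  gather(key = "OwnerSlot", value = "Owner", Owner1:Owner3, na.rm = TRUE) %>%
  # Keep only the columns needed for the mapping table.
  select(TaskKey, Owner) %>%
  arrange(TaskKey, Owner)

print(task.owner.map)
```

The result has one row per task and owner. Newer versions of tidyr also offer separate_rows, which does the split and unpivot in one call.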

This transforms the data as required – we’ve named our result dataframe task.owner.map and we expand this in the next step to see the columns.

QE Final TaskOwnerMap

This particular example (splitting a column) is more easily accomplished directly in the Query Editor than using R but I wanted to show a simple case to explain the mechanics of the process.

Using R in Power BI Part (1) – Load a R Script

There are three ways to use R in Power BI:

  1. Load an R Script as a data source
  2. Add an R Script as a Query Editor (QE) step – introduced in the July 2016 update
  3. Plot an R Visual

In this article we’ll look at the first of these. The subsequent articles will look at the other two ways. As an example, we’ll use a typical reporting and visualisation challenge: reporting the progress of tasks to meet certain regulatory requirements, in this instance the 11 principles for good aggregation and reporting of data issued by the BCBS (details here if you are interested).

Imagine the source data looks like this.

Excel Compliance Task Data Source

This shows a row for each task with some properties (Key, Name, Start Date, End Date). It also shows which tasks are applicable to which regions (denoted by the x in the cell). The set of regions is implicit in the column names. It also shows the owners of each task; if there is more than one owner, these are in a comma-separated list. In a similar way, it also shows the principles covered by each task. There are 11 principles and we know this list will only have integers between 1 and 11. We also have the details of the principles on a separate sheet.

Excel Compliance Principle List

This is not a good structure for reporting and analysis.  We need to tease out the data into separate, related tables –

  • Task,
  • TaskPrincipleMap (to hold the many to many relationships between Tasks and Principles),
  • Region,
  • TaskRegionMap (again to hold the many to many relationships between Tasks and Regions)

and combine this with the Principle table to arrive at a data model that looks something like this

Target Data Map

The R script below uses the tidyr and dplyr packages to perform a sequence of data transformations to achieve exactly this. These include projecting columns (select), filtering rows (filter), splitting a comma-separated column into individual values (separate) and unpivoting (gather). The result is a set of data frames (R’s equivalent of a table) that correspond to the tables required.

Compliance Data Transform R Code
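The script is shown only as an image, so here is a cut-down sketch of how select, filter, separate and gather might combine to produce the target tables. The sample rows, the column names and the df.* names other than df.task.principle.map are assumptions for illustration; a real script would read the Excel sheet rather than build the data inline.

```r
library(tidyr)
library(dplyr)

# Made-up stand-in for the Excel sheet; column names and values are assumptions.
df.source <- data.frame(
  Key        = c("T1", "T2"),
  Name       = c("Define data taxonomy", "Automate aggregation"),
  Asia       = c("x", NA),
  EMEA       = c("x", "x"),
  US         = c(NA, "x"),
  Principles = c("1, 2", "3"),
  stringsAsFactors = FALSE
)

# Task: project just the task-level columns.
df.task <- df.source %>% select(Key, Name)

# TaskRegionMap: unpivot the region columns and keep the cells marked with an x.
df.task.region.map <- df.source %>%
  select(Key, Asia, EMEA, US) %>%
  gather(key = "Region", value = "Flag", Asia, EMEA, US) %>%
  filter(!is.na(Flag), Flag == "x") %>%
  select(Key, Region) %>%
  arrange(Key, Region)

# Region: the distinct regions that appear in the map.
df.region <- df.task.region.map %>% distinct(Region)

# TaskPrincipleMap: split the list (at most 11 entries), then unpivot.
df.task.principle.map <- df.source %>%
  select(Key, Principles) %>%
  separate(Principles, into = paste0("P", 1:11), sep = ",\\s*", fill = "right") %>%
  gather(key = "Slot", value = "Principle", P1:P11, na.rm = TRUE) %>%
  mutate(Principle = as.integer(Principle)) %>%
  select(Key, Principle) %>%
  arrange(Key, Principle)
```

Each data frame left in the workspace at the end of the script is then offered for import as a separate table.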

We can paste this R script into the “Load R Script” window.

Get Data R Script

 

execute R script window

One of the nice features of this is that you can then choose exactly which dataframes created by the script to import – and that you can load several dataframes not just one.

Navigator Window

This then loads our data into the Query Editor below – each selected R data frame becomes a separate query. (In the snapshot below, I have renamed the queries from the R naming convention to more typical table / query names, e.g. from df.task.principle.map to TaskPrincipleMap.)

QE view

In this case, we can do these operations directly in Power BI of course – but R has many advanced statistical capabilities that are not in Power BI, such as decision trees, clustering and correlations. Some of these are shown in the R Script Showcase here.

Using The R Script Visual in Power BI To Build Charts With Precise Layouts

Financial regulators are unreasonable people – at least that’s what those of us who report a bank’s risk numbers to them would have you believe. Not only do they demand daily reporting but they also require the charts that we provide to have a defined layout and format. This does raise a problem for Power BI in that while it has very good visual practices, these are not exactly those of the regulator. However, we can use the R visual to create a chart in exactly the fashion that a regulator demands.

One very important chart for regulators is the market risk back test chart. It is really two charts in one: a bar chart and an area chart. It compares the profit and loss (P&L) that the bank, or part of the bank such as a legal entity or desk, makes against a margin of safety, known as value-at-risk or VaR, within which we promise the P&L will stay most of the time. On those days when the P&L exceeds the VaR, that’s a breach. If we have too many breaches, the regulator will take action.

The regulator wants to see the VaR as an area chart with VaR as a mirror image both above and below the x (date) axis since a breach occurs if the bank makes either a profit or a loss outside this VaR envelope. Regulators want this VaR envelope to be partly transparent so that we can superimpose the P&L on it. The daily P&L must be shown as blue bars with breaches shown in red.

In a native Power BI chart we can get close to these requirements – close, but no cigar.

Backtest built in native PBI

There are three problems:

  • The VaR envelope is shown as two boundary lines rather than as an area chart. The Power BI combo chart visual combines a bar chart (used for the P&L) with a line chart, but we can’t combine a bar chart with any other chart type.
  • The P&L bars are all blue; there is no means of changing the colour of individual bars based on a condition.
  • The snapshot shows synchronized axes for P&L and VaR, but if the ranges of these two measures are sufficiently different, the chart will automatically use different scales for P&L and VaR, which is not what we want and is potentially misleading.

However, we can meet all requirements if we use the R visual. We can use the fine tuning allowed in the plotting package to define our chart elements precisely. The snapshot below shows the R chart.

Backtest built in PBI R Script Visual

The R visual creates a script editor in a lower pane. It provides the first 2 lines of R code to create an R dataframe (R’s equivalent of a table) that contains all unique values of the columns in the values well. From here we can add a few lines of R code firstly to determine if there is a breach on a given day and then to plot the chart. The R code uses two very popular R packages. The dplyr package enables us to manipulate dataframes and the ggplot2 package gives us the ability to plot exactly the chart we require. The ggplot call first creates the P&L bar chart. It selects the colour of the P&L bars based on the breach condition. It then overlays this with partially transparent area charts – one for the upper VaR bound, then one for the lower (negative) VaR bound.

PBI R Script Editor Window
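Since the script is only shown as a screenshot, here is a rough sketch of the approach described: flag the breaches with dplyr, draw the P&L bars coloured by that flag, then overlay two semi-transparent VaR areas. The column names PnL and VaR, the sample values and the grey envelope colour are assumptions; in the visual, dataset is supplied by Power BI.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for the `dataset` data frame supplied by the R visual.
# Column names PnL and VaR are assumptions.
dataset <- data.frame(
  Date = seq(as.Date("2016-01-04"), by = "day", length.out = 10),
  PnL  = c(20, -35, 50, 120, -10, 60, -130, 15, 40, -25),
  VaR  = c(100, 100, 105, 105, 110, 110, 110, 105, 100, 100)
)

# A breach is a profit or a loss that falls outside the VaR envelope.
backtest <- dataset %>%
  mutate(Breach = abs(PnL) > VaR)

p <- ggplot(backtest, aes(x = Date)) +
  # P&L bars: blue normally, red on breach days.
  geom_col(aes(y = PnL, fill = Breach)) +
  scale_fill_manual(values = c("FALSE" = "blue", "TRUE" = "red"), guide = "none") +
  # Mirror-image VaR envelope, partly transparent so the bars show through.
  geom_area(aes(y = VaR), fill = "grey50", alpha = 0.3) +
  geom_area(aes(y = -VaR), fill = "grey50", alpha = 0.3) +
  labs(x = "Date", y = "P&L and VaR")

print(p)
```

Because both measures share one y axis here, the scale mismatch mentioned in the list of problems cannot arise.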

Even with a small amount of R code like this, it is best to build and test the R code within RStudio, the typical development environment for R, then copy it into the code window for the visual once it is fully working.

The ggplot2 package allows fine tuning of a chart’s appearance – enough to satisfy the most unreasonable regulator.

This article first appeared on the Microsoft MVP Technical Tuesdays blog  here
The source data is here.

Charting A Mashup Stress Scenario using R

From the previous post we have our mashup stress scenario data in R and now we’d like to plot it. We could of course save the R data to a database or file, import it into a data visualisation tool such as Tableau and chart it there. However, R has some powerful charting capabilities and this article will demonstrate these.

Let’s create a lattice chart with each of the five scenarios in a different panel. In each panel, we’ll show the stress P&L by counterparty. Firstly, we need to summarise the data. Figure 1 shows the data frame.
Figure 1

Figure 2 shows the R code to sum the stress P&L and group by the two categories we want to see in our plot (counterparty and scenario) and to build the plot.
Figure 2
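The figure is not reproduced here, so the following is only a sketch of how the summarise-and-plot step might be written. It uses base R's aggregate and the lattice package as an assumption about tooling; the original may have used something else. The column names and values are made up.

```r
library(lattice)

# Made-up trade-level stress results; column names are assumptions.
stress <- data.frame(
  Scenario     = rep(c("Grexit", "Rally"), each = 4),
  Counterparty = rep(c("Bank A", "Bank A", "Bank B", "Bank B"), times = 2),
  StressPnL    = c(-10, -5, 3, 4, 6, 2, -1, 8),
  stringsAsFactors = TRUE
)

# Sum the stress P&L by counterparty and scenario.
stress.summary <- aggregate(StressPnL ~ Counterparty + Scenario,
                            data = stress, FUN = sum)

# One panel per scenario, one bar per counterparty within each panel.
p <- barchart(StressPnL ~ Counterparty | Scenario, data = stress.summary,
              horizontal = FALSE, origin = 0, ylab = "Stress P&L")

print(p)
```

The `| Scenario` part of the formula is what produces one panel per scenario.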

Figure 3 shows the resulting plot.
Figure 3

We may like to see a different perspective on the data – perhaps as a clustered bar chart showing the stress P&L by scenario for each counterparty. Figure 4 shows the code to generate the plot and Figure 5 the result.

Figure 4
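As a hedged illustration of the clustered view (again assuming the lattice package and made-up, already summarised data), the scenario can move from the panel condition to the groups argument:

```r
library(lattice)

# Made-up summary of stress P&L by counterparty and scenario (assumed names).
stress.summary <- data.frame(
  Counterparty = rep(c("Bank A", "Bank B"), times = 2),
  Scenario     = rep(c("Grexit", "Rally"), each = 2),
  StressPnL    = c(-15, 7, 8, 7),
  stringsAsFactors = TRUE
)

# One cluster of bars per counterparty, one bar per scenario within it.
p <- barchart(StressPnL ~ Counterparty, groups = Scenario,
              data = stress.summary, horizontal = FALSE, origin = 0,
              auto.key = TRUE, ylab = "Stress P&L")

print(p)
```

Setting auto.key = TRUE adds a legend so that the scenario colours can be read.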

Figure 5

This article just scratches the surface of R’s plotting capabilities. It’s also possible to create box plots, line charts and scatter plots, to control the detail of each chart with great precision, to lay out several charts on a single page and to produce charts in various formats including PDF.

Calculating A Mashup Stress Scenario Using R

A previous article described the business case for creating a mashup scenario quickly.  Often, this needs to be a self-service solution created by risk managers or business analysts themselves. One approach to doing this is to use R, an open source language for statistical use that has very good capabilities for transforming data. This article will use R to calculate the mashup scenario. The next article will use R to provide graphs of the results.

The first step is to load the stress data of the four original scenarios into an R data frame (R’s equivalent of a database table).  Figure 1 shows some of the stress data in an R data frame.
Figure 1

Now we can build the mashup scenario in a few steps.  The previous post described the business rules.  The first rule is to copy over trades from the Grexit scenario where either the issuer or counterparty is from Greece.  Figure 2 shows the single line of code necessary.
Figure 2
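The original line of code is in the figure; a base R sketch of the same kind of filter, with every column name and value assumed for illustration, could look like this:

```r
# Made-up stress results; every column name here is an assumption.
stress <- data.frame(
  TradeId             = c(1, 2, 3, 1, 2, 3),
  Scenario            = rep(c("Grexit", "Rally"), each = 3),
  IssuerCountry       = rep(c("Greece", "Germany", "France"), times = 2),
  CounterpartyCountry = rep(c("UK", "Greece", "US"), times = 2),
  StressPnL           = c(-50, -20, -5, 10, 15, 30),
  stringsAsFactors    = FALSE
)

# Rule 1: take the Grexit rows where the issuer or the counterparty is Greek.
mashup <- stress[stress$Scenario == "Grexit" &
                   (stress$IssuerCountry == "Greece" |
                      stress$CounterpartyCountry == "Greece"), ]

print(mashup)
```

With this sample data, trades 1 and 2 are copied across from the Grexit scenario.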

The next step is to add trades from the Rally scenario where the instrument currency is in Euros.  However, as this is not the first step, we have to be careful not to copy over any trades already in the mashup scenario.  Figure 3 shows the code.
Figure 3
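To illustrate the "not already in the mashup" condition, here is a small base R sketch. The TradeId key, the Currency column and the sample rows are assumptions, and the mashup is seeded by hand so that the example stands alone.

```r
# Made-up stress results; column names are assumptions.
stress <- data.frame(
  TradeId   = c(1, 2, 3, 1, 2, 3),
  Scenario  = rep(c("Grexit", "Rally"), each = 3),
  Currency  = rep(c("EUR", "USD", "EUR"), times = 2),
  StressPnL = c(-50, -20, -5, 10, 15, 30),
  stringsAsFactors = FALSE
)

# Suppose the first rule has already copied trade 1 from the Grexit scenario.
mashup <- stress[stress$Scenario == "Grexit" & stress$TradeId == 1, ]

# Rule 2: Rally rows in EUR, excluding trades that are already in the mashup.
rally.rows <- stress[stress$Scenario == "Rally" &
                       stress$Currency == "EUR" &
                       !(stress$TradeId %in% mashup$TradeId), ]

mashup <- rbind(mashup, rally.rows)
print(mashup)
```

Trade 1 is in euros in the Rally scenario too, but it is skipped because the Grexit rule claimed it first; only trade 3 is added.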

The following steps to append the trades from the other two scenarios are similar. The final steps are just tidying up the scenario names and appending the mashup data set onto the original data set. Figure 4 shows the newly created mashup scenario rows.
Figure 4
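The tidy-up and append steps might be sketched as follows. This assumes that tidying the names simply means relabelling the selected rows with one scenario name (here the hypothetical label "Mashup"); the sample data and column names are made up.

```r
# Made-up stress results for two of the original scenarios (assumed columns).
stress <- data.frame(
  TradeId   = c(1, 3, 1, 3),
  Scenario  = rep(c("Grexit", "Rally"), each = 2),
  StressPnL = c(-50, -5, 10, 30),
  stringsAsFactors = FALSE
)

# Suppose the rules picked trade 1 from Grexit and trade 3 from Rally.
mashup <- stress[c(1, 4), ]

# Tidy up: give every selected row the same (hypothetical) scenario name.
mashup$Scenario <- "Mashup"

# Append the mashup rows to the original data set.
stress.all <- rbind(stress, mashup)
print(stress.all)
```

The combined data frame then holds the original scenarios plus the mashup, ready for summarising or plotting by scenario.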

Figure 5 shows the code in its entirety.  It shows how R can perform quite complicated operations very concisely. 
Figure 5