Visualizing Bootstrapped Stepwise Regression in R using Plotly

May 29, 2016, 7:12 pm

We all have used stepwise regression at some point. Stepwise regression is known to be sensitive to initial inputs. One way to mitigate this sensitivity is to repeatedly run stepwise regression on bootstrap samples.
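To make the idea concrete, here is a minimal hand-written sketch of the bootstrap loop. It is not the internals of the bootStepAIC package. It assumes the built-in mtcars data and an arbitrarily chosen set of predictors, and the names B, picked and pick_pct are illustrative only.

```r
library(MASS)  # provides stepAIC()

set.seed(1)    # make the resampling reproducible
B <- 20        # number of bootstrap samples (kept small for speed)

predictors <- c("wt", "hp", "qsec", "drat", "disp")
picked <- setNames(numeric(length(predictors)), predictors)

for (b in seq_len(B)) {
  # Resample rows with replacement
  boot_data <- mtcars[sample(nrow(mtcars), replace = TRUE), ]

  # Refit the full model, then let stepAIC() prune it (backward by default)
  boot_fit <- lm(mpg ~ wt + hp + qsec + drat + disp, data = boot_data)
  step_fit <- stepAIC(boot_fit, trace = 0)

  # Tally which predictors survived in this sample
  kept <- attr(terms(step_fit), "term.labels")
  picked[kept] <- picked[kept] + 1
}

# Percentage of bootstrap samples in which each predictor was retained
pick_pct <- 100 * picked / B
print(pick_pct)
```

A predictor retained in nearly every sample is a more stable choice than one that only appears occasionally, and that stability is what the rest of this post visualizes.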

R has a nice package called bootStepAIC which (from its description) “Implements a Bootstrap procedure to investigate the variability of model selection under the stepAIC() stepwise algorithm of package MASS.”

It provides a lot of information as output, and it can be challenging to keep track of it all, especially when there are many covariates. In this post we’ll try to come up with a simple visualization that summarizes the output of the function boot.stepAIC().

Running `boot.stepAIC()`

Using boot.stepAIC() is fairly simple. Just input an already fitted lm/glm model and the associated dataset.

We’ll use the BostonHousing dataset from the mlbench package. More details here

library(bootStepAIC)
library(plotly)
library(mlbench)

# Load Boston housing dataset
data("BostonHousing")

# Fit Linear regression model
fit <- lm(crim ~ ., data = BostonHousing)

# Run bootstrapped stepwise regression
fit.boot <- boot.stepAIC(fit, data = BostonHousing, B = 100) # That's it !

Collecting required information

The output from boot.stepAIC() contains the following. Note that each output is shown as a percentage (based on the total number of bootstrapped samples):

Number of times a covariate was featured in the final model from stepAIC()
Number of times a covariate’s coefficient sign was positive / negative
Number of times a covariate was statistically significant (default at alpha = 5%)

We’ll collect all of this information first and create data frames so as to make charting easier later on.

Note that in this particular example there is a variable named chas, which is a factor with levels 0 and 1. R names its coefficient chas1 by default.
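The renaming can be seen on a small example. The sketch below uses made-up toy data (the names toy, toy_fit, coef_names and clean_names are illustrative) and shows one way to map the coefficient name back to the variable name by name, not by row position.

```r
# Toy data: chas is a two-level factor, as in BostonHousing
toy <- data.frame(
  y    = c(1.2, 2.3, 2.9, 4.1, 5.2, 5.8),
  x    = c(1, 2, 3, 4, 5, 6),
  chas = factor(c(0, 1, 0, 1, 0, 1))
)
toy_fit <- lm(y ~ x + chas, data = toy)

# The coefficient is named after the variable plus the non-reference level
coef_names <- names(coef(toy_fit))
print(coef_names)

# Map the coefficient name back to the variable name
clean_names <- sub("^chas1$", "chas", coef_names)
print(clean_names)
```

Matching on the name avoids depending on where the row happens to sit, which can differ between bootstrap runs.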

# Extract data
nBoot <- summary(fit.boot)[8,1]
origModel <- paste(names(coef(fit.boot$OrigModel)), collapse = " + ")
stepModel <- paste(names(coef(fit.boot$OrigStepAIC)), collapse = " + ")

# Names of covariates
covariates <- rownames(fit.boot$Covariates)
nCovariates <- length(covariates)

# Matrix of number of times each covariate was picked
coef.pick <- fit.boot$Covariates

# Matrix for the consistency of sign on each covariate
coef.sign <- fit.boot$Sign

# Change name for "chas" since it is a factor
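# (matched by name, since the row position can vary between bootstrap runs)
rownames(coef.sign)[rownames(coef.sign) == "chas1"] <- "chas"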
coef.sign <- coef.sign[match(rownames(coef.pick), rownames(coef.sign)),]

# Matrix for statistical significance
coef.stat <- fit.boot$Significance

# Change name for "chas" since it is a factor
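# (matched by name, since the row position can vary between bootstrap runs)
rownames(coef.stat)[rownames(coef.stat) == "chas1"] <- "chas"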
coef.stat <- coef.stat[match(rownames(coef.pick), rownames(coef.stat)),]

# Make into long form for charting later
coef.stat.long <- data.frame()

for(i in 1:length(coef.stat)){
  n <- round(coef.stat[i],0)
  vec <- seq(0, n, by = 2)
  mat <- data.frame(rep(names(coef.stat)[i], length(vec)), vec, paste("% Sig", n))
  names(mat) <- c("variable", "sig", "text")
  
  # We'll use mode = "line". NA helps separate line segments
  coef.stat.long <- rbind(coef.stat.long, mat, c(NA, NA))
}

# Convert to dataframes
coef.pick <- as.data.frame(coef.pick)
coef.stat <- as.data.frame(coef.stat)
coef.sign <- as.data.frame(coef.sign)

names(coef.pick) <- "pick"
names(coef.sign) <- c("pos", "neg")
names(coef.stat) <- "stat"
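The plotting code that follows refers to pick.text, sign.text.up, sign.text.down and scale, none of which are defined in the post. The sketch below shows one plausible way to build such hover strings and a size divisor. It runs on made-up toy data frames with a toy. prefix so that it stands alone; the wording of the hover text and the value 3 are assumptions, not the author's originals.

```r
# Toy stand-ins for coef.pick and coef.sign (values are made up)
toy.pick <- data.frame(pick = c(100, 62, 35),
                       row.names = c("zn", "rad", "age"))
toy.sign <- data.frame(pos = c(100, 10, 30),
                       neg = c(0, 52, 5),
                       row.names = rownames(toy.pick))

# Hover text for the bar layer
toy.pick.text <- paste0(rownames(toy.pick),
                        "<br>Picked: ", toy.pick$pick, "%")

# Hover text for the up and down triangles
toy.sign.text.up <- paste0(rownames(toy.sign),
                           "<br>Coef. positive: ", toy.sign$pos, "%")
toy.sign.text.down <- paste0(rownames(toy.sign),
                             "<br>Coef. negative: ", toy.sign$neg, "%")

# Divisor that keeps marker sizes readable (used as pos / scale, neg / scale)
toy.scale <- 3

print(toy.pick.text)
```

Applying the same expressions to coef.pick and coef.sign, and storing the results as pick.text, sign.text.up, sign.text.down and scale, would supply the objects the plot expects.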

Plot

Now that we have all the information we need, we just need to plot. The plot is arranged as follows:

One layer for the number of times a variable was picked up by stepAIC() (barplot)
One layer for the positive and negative coefficients (scatter plot using triangles)
One layer for the number of times a variable was significant (vertical line chart)
Annotation for some other information
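The vertical line layer relies on NA rows to break one trace into separate segments, since a line trace leaves a gap wherever it meets a missing value. A minimal base-R sketch of that layout, using made-up percentages and illustrative names (sig, segments):

```r
# Toy percentages: how often each covariate was significant (made-up values)
sig <- c(zn = 80, rad = 45)

# One two-point segment per covariate (0 up to its percentage),
# followed by an NA row that breaks the line before the next covariate
segments <- do.call(rbind, lapply(names(sig), function(v) {
  data.frame(variable = c(v, v, NA),
             sig      = c(0, sig[[v]], NA))
}))

print(segments)
```

Each covariate contributes its own vertical bar, and the NA rows keep the bars from being joined to one another.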

# Base plot for number of times a variable was picked by stepAIC
plot_ly(coef.pick, x = rownames(coef.pick), y = pick,
        type = "bar", opacity = 0.75, name = "Times picked (%)",
        hoverinfo = "text", text = pick.text,
        marker = list(color = "#00994d", line = list(width = 2))) %>% 
  
  # Layer for number of times a variable was statistically significant at 5%
  add_trace(data = coef.stat.long, x = variable, y = sig, 
            type = "scatter", mode = "markers+lines", name = "Stat. Sig (%)",
            line = list(color = "#ffdb4d", width = 15),
            hoverinfo = "text", text = text) %>% 
  
  # Layer for number of times a variable's coefficient was positive
  add_trace(data = coef.sign, x = rownames(coef.pick), y = rep(-5, nCovariates), 
            type = "scatter", mode = "markers", name = "Coef Sign(% pos)",
            marker = list(symbol = "triangle-up", size = pos/scale, color = "#4da6ff",
                          line = list(color = "black", width = 2)),
            hoverinfo = "text", text = sign.text.up) %>% 
  
  # Layer for number of times a variable's coefficient was negative
  add_trace(data = coef.sign, x = rownames(coef.pick), y = rep(-10, nCovariates), 
            type = "scatter", mode = "markers", name = "Coef Sign(% neg)",
            marker = list(symbol = "triangle-down", size = neg/scale, color = "#ff704d",
                          line = list(color = "black", width = 2)),
            hoverinfo = "text", text = sign.text.down) %>% 
  
  # Layout, annotations, axis options etc
  layout(xaxis = list(title = "<b>Covariates</b>"),
         yaxis = list(title = "<b>Percentage(%)</b>",
                      tickmode = "array", 
                      tickvals = round(seq(0, 100, length.out = 10), 0),
                      domain = c(0.2, 1)),
         plot_bgcolor = "#e1efc3",
         paper_bgcolor = "#e1efc3",
         
         annotations = list(
           list(x = 0.1, y = 1, 
                xref = "paper", yref = "paper", 
                xanchor = "left", yanchor = "top",
                ax = 0, ay = 0,
                text = "Visualizing <em>boot.stepAIC()</em>",
                font = list(family = "serif", size = 30)),
           
           list(x = 0.3, y = 0.1, 
                xref = "paper", yref = "paper", 
                xanchor = "left", yanchor = "top",
                ax = 0, ay = 0,
                text = paste("<em>Original Model:</em>", origModel),
                font = list(family = "PT Sans Narrow", size = 15)),
           
           list(x = 0.21, y = 0.05, 
                xref = "paper", yref = "paper", 
                xanchor = "left", yanchor = "top", align = "left",
                ax = 0, ay = 0,
                text = paste("<em>Stepwise Model:</em>", stepModel),
                font = list(family = "PT Sans Narrow", size = 15)),
           
           list(x = 0.8, y = 0.90, 
                xref = "paper", yref = "paper", 
                xanchor = "left", yanchor = "top", align = "left",
                ax = 0, ay = 0,
                text = paste0("<em>No. of Covariates:</em>", nCovariates, "<br>",
                              "<em>No. of bootstrap samples:</em>", nBoot, "<br>"),
                font = list(family = "PT Sans Narrow", size = 15))
         ))

↧

Radial bar charts in R using Plotly

June 10, 2016, 9:03 am

Creating a radial bar chart is fairly easy using plotly. In this post we’ll focus on modifying a radial line chart to make it look like a bar chart, to come up with a nice visualization of CO₂ emissions.
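The core of the approach is a polar-to-Cartesian conversion: each bar is a straight line from an inner radius to an outer radius along one angle. A minimal sketch, where radial_segment is a hypothetical helper and inner plays the role of the central hole:

```r
# Endpoints of one radial bar: from the inner circle out to inner + value
radial_segment <- function(value, theta, inner = 20) {
  outer <- value + inner
  data.frame(x = c(inner * cos(theta), outer * cos(theta)),
             y = c(inner * sin(theta), outer * sin(theta)))
}

# A bar of length 10 pointing straight up (theta = pi / 2)
seg <- radial_segment(10, pi / 2)
print(round(seg, 6))
```

Stepping theta by a fixed increment for every row of the data places the bars evenly around the circle.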

The visualization is inspired by this awesome chart.

# inspired by 
# https://s-media-cache-ak0.pinimg.com/736x/22/1a/d0/221ad079e362ba13969b1bef30b6a5f2.jpg

library(plotly)

# read in data
df <- read.csv("https://cdn.rawgit.com/plotly/datasets/master/Emissions%20Data.csv", stringsAsFactors = F)

# Show only 2011 values
df <- subset(df, Year == "2011")

# Arrange in increasing order of emissions
df <- df %>% dplyr::arrange(Emission)
df <- df[-(1:50),]

#  Add colors
colors <- RColorBrewer::brewer.pal(length(unique(df$Continent)), "Spectral")
continent <- unique(df$Continent)

df$colors <- df$Continent

for(i in 1:length(continent)){
  idx <- df$colors %in% continent[i]   
  df$colors[idx] <- colors[i]
}

# Get incremental angle value
n <- nrow(df) + 20
dtheta <- 2*pi / n
theta <- pi / 2

# Initialise
x.coord <- c()
y.coord <- c()
cols <- c()

# This is for the white - circle in the middle
adjust <-  20

# Initialize plot
p <- plot_ly()

for(ctr in 1:nrow(df)){
  
  a <- df$Emission[ctr] + adjust
  
  x1 <- adjust * cos(theta)
  y1 <- adjust * sin(theta)
  
  x2 <- a * cos(theta)
  y2 <- a * sin(theta)
  
  x.coord <- c(x.coord, x1, x2, NA)
  y.coord <- c(y.coord, y1, y2, NA)
  cols <- c(cols, df$Continent[ctr], df$Continent[ctr], NA)
  
  theta <- theta + dtheta
  
  p <- add_trace(p, 
                 x = c(x1, x2),
                 y = c(y1, y2),
                 mode = "lines", 
                 line = list(width = 5, color = df$colors[ctr]),
                 evaluate = T)
}

# Keep x and y axis extents the same
up <- max(na.omit(c(x.coord, y.coord))) + 10
down <- min(na.omit(c(x.coord, y.coord))) - 10

# Add layout options, shapes etc
p <- layout(p,
            showlegend = F,
            xaxis = list(range = c(down, up), domain = c(0, 0.5),
                         title = "", showgrid = F, zeroline = F, showticklabels = F),
            yaxis = list(range = c(down, up), 
                         title = "", showgrid = F, zeroline = F, showticklabels = F),
            shapes = list(
              list(type = "circle",
                   x0 = (-5 - adjust),
                   y0 = (-5 - adjust),
                   x1 = (5 + adjust),
                   y1 = (5 + adjust),
                   fillcolor = "transparent",
                   line = list(color = "white", width = 2)),
              
              list(type = "circle",
                   x0 = (-15 - adjust),
                   y0 = (-15 - adjust),
                   x1 = (15 + adjust),
                   y1 = (15 + adjust),
                   fillcolor = "transparent",
                   line = list(color = "white", width = 2)),
              
              list(type = "circle",
                   x0 = (-25 - adjust),
                   y0 = (-25 - adjust),
                   x1 = (25 + adjust),
                   y1 = (25 + adjust),
                   fillcolor = "transparent",
                   line = list(color = "white", width = 2)),
              
              list(type = "circle",
                   x0 = (-35 - adjust),
                   y0 = (-35 - adjust),
                   x1 = (35 + adjust),
                   y1 = (35 + adjust),
                   fillcolor = "transparent",
                   line = list(color = "white", width = 2))))


# Add annotations for country names
p <- plotly_build(p)

theta <- pi / 2
textangle <- 90

for(ctr in 1:nrow(df)){

  a <- df$Emission[ctr] + adjust
  a <- a + a/12

  x <- a * cos(theta)
  y <- a * sin(theta)
  
  if(ctr < 51) {xanchor <- "right"; yanchor <- "bottom"}
  if(ctr > 51 & ctr < 84) {xanchor <- "right"; yanchor <- "top"}
  if(ctr > 84) {xanchor <- "left"; yanchor <- "top"}

  p$layout$annotations[[ctr]] <- list(x = x, y = y, showarrow = F,
                             text = paste0(df$Country[ctr]),
                             textangle = textangle,
                             xanchor = xanchor,
                             yanchor = yanchor,
                             font = list(family = "serif", size = 9),
                             borderpad = 0,
                             borderwidth = 0)
  theta <- theta + dtheta
  textangle <- textangle - (180 / pi * dtheta)
  
  if(textangle < -90) textangle <- 90
}

# Titles and some other details
p$layout$annotations[[148]] <- list(xref = "paper", yref = "paper",
                                    x = 0, y = 1, showarrow = F,
                                    xanchor = "left", yanchor = "top",
                                    align = "left",
                                    text = "<em>Carbon dioxide emissions</em><br><sup>(metric tons per capita)</sup>",
                                    font = list(size = 25, color = "black"))

p$layout$annotations[[149]] <- list(xref = "paper", yref = "paper",
                                    x = 0, y = 0.9, showarrow = F,
                                    xanchor = "left", yanchor = "top",
                                    align = "left",
                                    text = "Emissions from burning of solid, liquid and <br>gas fuels and the manufacture of cement.",
                                    font = list(size = 18, color = "#808080"))

p$layout$annotations[[150]] <- list(xref = "paper", yref = "paper",
                                    x = 0.15, y = 0.5, showarrow = F,
                                    xanchor = "left", yanchor = "top",
                                    align = "left",
                                    text = "<b>Annual CO<sub>2</sub> emissions</b><br><b>for 147 countries.</b>",
                                    font = list(size = 13, color = "black"))

p$data[[149]] <- list(x = rep(-7, 6), y = c(-6, -4, -2, 0, 2, 4), mode = "markers",
                      marker = list(color = colors, size = 10))

p$data[[150]] <- list(x = rep(1, 6), y = c(-6, -4, -2, 0, 2, 4), mode = "text",
                      text = rev(continent),
                      marker = list(color = colors, size = 10))
p

↧

Radar charts in R using Plotly

June 15, 2016, 11:22 pm

This post is inspired by this question on Stack Overflow.

We’ll show how to create Excel-style radar charts in R using the plotly package.
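A radar chart is a closed polygon whose vertices sit at equally spaced angles, with each radius given by a data value. The sketch below is a compact, vectorised take on that idea; polar_polygon is a hypothetical helper, separate from the function used in the post's own code.

```r
# Convert radii at equally spaced angles into a closed polygon
polar_polygon <- function(r) {
  n <- length(r)
  # n equally spaced angles starting at 0 (drop the duplicate 2 * pi)
  theta <- seq(0, 2 * pi, length.out = n + 1)[seq_len(n)]
  x <- r * cos(theta)
  y <- r * sin(theta)
  # Repeat the first vertex so the outline closes
  data.frame(x = c(x, x[1]), y = c(y, y[1]))
}

square <- polar_polygon(rep(1, 4))
print(round(square, 6))
```

Repeating the first vertex at the end is what lets a line trace draw the final edge back to the start.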

library(plotly)
library(dplyr)

# Read in data
df <- read.csv("https://cdn.rawgit.com/plotly/datasets/master/Consumer%20Complaints.csv", 
               stringsAsFactors = F, check.names = F)

# Melt
df <- reshape2::melt(df, id = c("Company"))
colnames(df) <- c("Company", "Complaint", "Percent")

getPolarCoord <- function(r, matrix = F, na = F){
  # Get starting angle and angle increments
  theta <- 0
  dtheta <- 360 / length(r)
  dtheta <- (pi / 180) * dtheta  # in radians
  
  # Get polar coordinates
  x <- c()
  y <- c()
  
  for(i in 1:length(r)){
    
    x <- c(x, r[i] * cos(theta))
    y <- c(y, r[i] * sin(theta))
    
    theta <- theta + dtheta
  }
  
  x[length(x) + 1] <- x[1]
  y[length(y) + 1] <- y[1]
  
  if(na == T){
    x[length(x) + 1] <- NA
    y[length(y) + 1] <- NA
  }
  
  
  if(matrix == T){
    return(cbind(x, y))
  }else{
    return(list(x = x, 
                y = y))
  }
  
}

coords <- by(df, df[,"Complaint"], function(r){
  x <- getPolarCoord(r[,3])
  x <- cbind(x$x, x$y)
  x <- data.frame(rbind(r, r[1,]), x = x[,1], y = x[,2])
  return(x)
})

coords <- rbind(coords[[1]], coords[[2]], coords[[3]])
df <- data.frame(coords, txt = paste(coords$Company, "<br>", 
                                     coords$Complaint, ":", 
                                     round(coords$Percent*100, 2), "%"))

# Plot
smooth <- 1
bgcolor <- "white"

p <- plot_ly(data = df, 
             x = x, y = y, mode = "lines", 
             group = Complaint,
             fill = "toself",
             line = list(smoothing = smooth, shape = "spline"),
             hoverinfo = "text",
             text = txt) %>% 
  
  add_trace(data = df, 
            x = x, y = y, mode = "markers", 
            marker = list(color = "white", 
                          size = 10, 
                          line = list(width = 2)),
            hoverinfo = "none",
            showlegend = F) %>% 
  
  layout(xaxis = list(title = "", showgrid = F, zeroline = F, showticklabels = F,
                      domain = c(0.02, 0.48)),
         yaxis = list(title = "", showgrid = F, zeroline = F, showticklabels = F,
                      domain = c(0, 0.92)),
         font = list(family = "serif", size = 15),
         legend = list(x = 0.55, y = 0.9, bgcolor = "transparent"),
         plot_bgcolor = bgcolor,
         paper_bgcolor = bgcolor)

# Add grids
grid <- rbind(getPolarCoord(rep(0.05, 50), matrix = T, na = T),
              getPolarCoord(rep(0.10, 80), matrix = T, na = T),
              getPolarCoord(rep(0.15, 150), matrix = T, na = T),
              getPolarCoord(rep(0.20, 170), matrix = T, na = T),
              getPolarCoord(rep(0.25, 200), matrix = T, na = T))

grid <- as.data.frame(grid)

p <- add_trace(p, data = grid,
               x = x, y = y, mode = "lines",
               line = list(color = "#57788e", dash = "4px", width = 1),
               showlegend = F,
               hoverinfo = "none")

inner <- getPolarCoord(rep(0.06, 5))
outer <- getPolarCoord(rep(0.27, 5))

x = t(cbind(inner$x, outer$x))
y = t(cbind(inner$y, outer$y))

x <- as.numeric(apply(x, 2, function(vec){
  return(c(vec, NA))
}))

y <- as.numeric(apply(y, 2, function(vec){
  return(c(vec, NA))
}))

linegrid <- data.frame(x = x, y = y)

p <- add_trace(p, data = linegrid,
               x = x, y = y, mode = "lines",
               line = list(color = "#57788e", dash = "4px", width = 1),
               showlegend = F,
               hoverinfo = "none")

# Add text
banks <- c("Bank of<br>America",
           "Wells Fargo<br>&Company",
           "JP Morgan<br>Chase & Co.",
           "CitiBank",
           "Capital One")
labels <- paste0("<em>", banks, "</em>")
p <- add_trace(p, data = getPolarCoord(rep(0.28, 5)),
               x = x, y = y, mode = "text", text = labels,
               showlegend = F,
               hoverinfo = "none",
               textfont = list(family = "serif", color = "#808080"))

# Add a gray circle
p <- add_trace(p, data = getPolarCoord(rep(0.24, 200)),
               x = x, y = y,
               fill = "toself",
               fillcolor = "rgba(200, 200, 200, 0.3)",
               line = list(color = "transparent"),
               mode = "lines",
               hoverinfo = "none",
               showlegend = F)

# Add titles, description etc

p <- layout(p, 
            annotations = list(
              list(xref = "paper", yref = "paper", 
                   xanchor = "left", yanchor = "top",
                   x = 0.03, y = 1, 
                   showarrow = F, 
                   text = "<b>Consumer complaints for five large banks in the U.S.</b>",
                   font = list(family = "serif",
                               size = 25, 
                               color = "#4080bf")),
              
              list(xref = "paper", yref = "paper", 
                   xanchor = "left", yanchor = "top",
                   x = 0.03, y = 0.95, 
                   showarrow = F, 
                   text = '<em>Source: Consumer Financial Protection Bureau</em>',
                   font = list(family = "serif",
                               size = 16, 
                               color = "#679bcb")),
              
              list(xref = "paper", yref = "paper", 
                   xanchor = "left", yanchor = "top",
                   x = 0.60, y = 0.20, 
                   showarrow = F, 
                   align = "left",
                   text = "Complaints received by the Consumer Financial Protection Bureau<br>regarding financial products and services offered by five large banks<br>in the United States expressed as a percentage of total number<br>of complaints.",
                   font = list(family = "arial",
                               size = 12)),
              
              list(xref = "paper", yref = "paper", 
                   xanchor = "left", yanchor = "top",
                   x = 0.60, y = 0.05, 
                   showarrow = F, 
                   align = "left",
                   text = '<a href = "https://catalog.data.gov/dataset/consumer-complaint-database">Click here to go to source</a>',
                   font = list(family = "arial",
                               size = 14))
            ),
            
            shapes = list(
              list(
              xref = "paper", yref = "paper",
              x0 = 0, x1 = 0.95,
              y0 = 0, y1 = 1,
              type = "rect",
              layer = "above",
              fillcolor = "rgba(191, 191, 191, 0.1)",
              line = list(color = "transparent"))
            ))

print(p)

↧

Interactive Q-Q Plots in R using Plotly

June 27, 2016, 3:01 pm

Introduction

In a recent blog post, I introduced the new R package, manhattanly, which creates interactive manhattan plots using the plotly.js engine.

In this post, I describe how to create interactive Q-Q plots using the manhattanly package. Q-Q plots tell us about the distributional assumptions of the observed test statistics and are common visualisation tools in statistical analyses.

Visit the package website for full details and example usage.

Quick Start

The following three lines of code will produce the Q-Q plot below

install.packages("manhattanly")
library(manhattanly)
qqly(HapMap, snp = "SNP", gene = "GENE")

Notice that we have added two annotations (the SNP and nearest GENE), that are revealed when hovering the mouse over a point. This feature of interactive Q-Q plots adds a great deal of information to the plot without cluttering it with text.

The Data

Inspired by the heatmaply package by Tal Galili, we split the tasks into data pre-processing and plot rendering. Therefore, we can use the manhattanly::qqr function to get the data used to produce a Q-Q plot. This allows flexibility in the rendering of the plot, since any graphics package, such as plot in base R, can be used to create the plot.

The plot data is derived using the manhattanly::qqr function:

qqrObject <- qqr(HapMap)
str(qqrObject)

## List of 6
##  $ data           :'data.frame': 14412 obs. of 3 variables:
##   ..$ P       : num [1:14412] 6.75e-10 3.41e-09 3.95e-09 4.71e-09 5.02e-09 ...
##   ..$ OBSERVED: num [1:14412] 9.17 8.47 8.4 8.33 8.3 ...
##   ..$ EXPECTED: num [1:14412] 4.46 3.98 3.76 3.61 3.51 ...
##  $ pName          : chr "P"
##  $ snpName        : logi NA
##  $ geneName       : logi NA
##  $ annotation1Name: logi NA
##  $ annotation2Name: logi NA
##  - attr(*, "class")= chr "qqr"

head(qqrObject[["data"]])

##                P OBSERVED EXPECTED
## 4346 6.75010e-10 9.170690 4.459754
## 4347 3.41101e-09 8.467117 3.982633
## 4344 3.95101e-09 8.403292 3.760784
## 4338 4.70701e-09 8.327255 3.614656
## 4342 5.02201e-09 8.299122 3.505512
## 4341 6.22801e-09 8.205651 3.418362
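The OBSERVED and EXPECTED columns appear to be on a -log10 scale. The sketch below shows how such columns can be computed in base R on simulated p-values. Using ppoints() for the expected quantiles is an assumption here (it is consistent with the values printed above, but the package may compute them differently), and the names pvals and qq are illustrative.

```r
set.seed(42)
pvals <- runif(1000)  # simulated p-values under the null

# Sort so the smallest p-value (largest -log10) comes first
qq <- data.frame(P = sort(pvals))
qq$OBSERVED <- -log10(qq$P)

# Expected -log10 quantiles of a uniform distribution
qq$EXPECTED <- -log10(ppoints(length(pvals)))

head(qq)

# A static version of the Q-Q plot in base graphics
plot(qq$EXPECTED, qq$OBSERVED,
     xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
abline(0, 1, col = "red")
```

Points that lie along the red line are consistent with the null distribution; points that rise above it at the right-hand end indicate more small p-values than expected.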

This qqrObject, which is of class qqr, can also be passed to the manhattanly::qqly function to produce the interactive Q-Q plot above:

qqly(qqrObject)

Related Work

This work is based on the qqman package by Stephen Turner. It produces manhattan and Q-Q plots similar to those from the qqman::manhattan and qqman::qq functions; the main difference here is being able to interact with the plot, including extra annotation information and seamless integration with HTML.

↧

Macroeconomic charts by the Fed using R and Plotly

July 4, 2016, 6:37 am

In this post we’ll try to replicate some of the charts created by the Federal Reserve which visualize some well known macroeconomic indicators. We’ll also showcase the new Plotly 4.0 syntax.

Key Macroeconomic Indicators

library(plotly)
library(zoo)

df %

  add_lines(data = df %>% filter(variable != "recession"),
            color = ~variable, line = list(width = 3),
            hoverinfo = "x+y") %>%

  add_lines(data = df %>% filter(variable == "recession"),
            line = list(width = 0),
            fill = "tozerox",
            fillcolor = "rgba(64, 64, 64, 0.2)",
            showlegend = F,
            hoverinfo = "none") %>%

  layout(title = "<b>Key Macroeconomic Indicators</b>",
         legend = list(x = 0.3, y = 0.05, orientation = "h"),
         yaxis = list(title = "", range = c(-5, 20), showgrid = F, zerolinewidth = 2, zerolinecolor = "#b3b3b3",
                      domain = c(0.1, 0.9),
                      showline = T,
                      ticklen = 4),
         xaxis = list(title = "", showgrid = F,
                      showline = T,
                      ticklen = 4,
                      rangeselector = list(x = 0.1, y = 0.95,
                        buttons = list(
                          list(
                            count = 5,
                            label = "5 Y",
                            step = "year",
                            stepmode = "todate"),

                          list(
                            count = 10,
                            label = "10Y",
                            step = "year",
                            stepmode = "todate"),

                          list(
                            count = 15,
                            label = "15 Y",
                            step = "year",
                            stepmode = "todate"),

                          list(
                            step = "all")
                        )
                      )),

         annotations = list(
           list(x = 0.9, y = -0.05,
                xref = "paper", yref = "paper",
                showarrow = F,
                text = 'source: <a href="http://www.frbsf.org/education/files/CTF_chart_sources.pdf">BLS, BEA and Federal Reserve</a>'),

           list(x = -0.05, y = 0.95,
                xref = "paper", yref = "paper",
                showarrow = F,
                text = "<b>Percent</b>"),

           list(x = 0.05, y = 0.98,
                xref = "paper", yref = "paper",
                showarrow = F,
                text = "<b>Zoom</b>")
         ),

         width = 1024,
         height = 600)
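The data-loading step for these charts is truncated in the source, but the calls above filter on a variable column that includes a "recession" level, which suggests a long-format data frame. The sketch below illustrates such a layout with made-up numbers. The column names Date and value, and the series names, are assumptions.

```r
# Toy wide data: two indicators plus a recession flag scaled to the y-range
wide <- data.frame(
  Date         = as.Date(c("2008-01-01", "2008-04-01", "2008-07-01")),
  unemployment = c(5.0, 5.3, 6.0),
  inflation    = c(4.1, 4.4, 5.3),
  recession    = c(20, 20, 20)
)

# Stack the series into long format: one row per date and variable
series <- c("unemployment", "inflation", "recession")
long <- data.frame(
  Date     = rep(wide$Date, times = length(series)),
  variable = rep(series, each = nrow(wide)),
  value    = unlist(wide[series], use.names = FALSE)
)

# The two subsets that the chart layers draw from
indicators <- subset(long, variable != "recession")
shading    <- subset(long, variable == "recession")

print(indicators)
```

Keeping the recession flag as just another series lets a filled, zero-width line trace act as the shaded band behind the indicator lines.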

Monetary Policy Transmission

library(plotly)
library(zoo)

df %

  add_lines(data = df %>% filter(variable != "recession"),
            color = ~variable, line = list(width = 3),
            hoverinfo = "x+y") %>%

  add_lines(data = df %>% filter(variable == "recession"),
            line = list(width = 0),
            fill = "tozerox",
            fillcolor = "rgba(64, 64, 64, 0.2)",
            showlegend = F,
            hoverinfo = "none") %>%

  layout(title = "<b>Monetary Policy Transmission</b>",
         legend = list(x = 0.3, y = 0.05, orientation = "h"),
         yaxis = list(title = "", range = c(0, 20), showgrid = F, zerolinewidth = 2, zerolinecolor = "#b3b3b3",
                      domain = c(0.1, 0.9),
                      showline = T,
                      ticklen = 4),
         xaxis = list(title = "", showgrid = F,
                      showline = T,
                      ticklen = 4,
                      rangeselector = list(x = 0.1, y = 0.95,
                                           buttons = list(
                                             list(
                                               count = 5,
                                               label = "5 Y",
                                               step = "year",
                                               stepmode = "todate"),

                                             list(
                                               count = 10,
                                               label = "10Y",
                                               step = "year",
                                               stepmode = "todate"),

                                             list(
                                               count = 15,
                                               label = "15 Y",
                                               step = "year",
                                               stepmode = "todate"),

                                             list(
                                               step = "all")
                                           )
                      )),

         annotations = list(
           list(x = 0.9, y = -0.05,
                xref = "paper", yref = "paper",
                showarrow = F,
                text = 'source: <a href="http://www.frbsf.org/education/files/CTF_chart_sources.pdf">Freddie Mac and Federal Reserve</a>'),

           list(x = -0.05, y = 0.95,
                xref = "paper", yref = "paper",
                showarrow = F,
                text = "<b>Percent</b>"),

           list(x = 0.05, y = 0.98,
                xref = "paper", yref = "paper",
                showarrow = F,
                text = "<b>Zoom</b>")
         ),

         width = 1024,
         height = 600)

Nominal and Real Fed Funds Rate

library(plotly)
library(zoo)

df %

  add_lines(data = df %>% filter(variable != "recession"),
            color = ~variable, line = list(width = 3),
            hoverinfo = "x+y") %>%

  add_lines(data = df %>% filter(variable == "recession"),
            line = list(width = 0),
            fill = "tozerox",
            fillcolor = "rgba(64, 64, 64, 0.2)",
            showlegend = F,
            hoverinfo = "none") %>%

  layout(title = "<b>Nominal and Real Fed Funds Rate</b>",
         legend = list(x = 0.3, y = 0.05, orientation = "h"),
         yaxis = list(title = "", range = c(-10, 25), showgrid = F, zerolinewidth = 2, zerolinecolor = "#b3b3b3",
                      domain = c(0.1, 0.9),
                      ticklen = 4,
                      showline = T),
         xaxis = list(title = "", showgrid = F,
                      showline = T,
                      ticklen = 4,
                      rangeselector = list(x = 0.1, y = 0.95,
                                           buttons = list(
                                             list(
                                               count = 5,
                                               label = "5 Y",
                                               step = "year",
                                               stepmode = "todate"),

                                             list(
                                               count = 10,
                                               label = "10Y",
                                               step = "year",
                                               stepmode = "todate"),

                                             list(
                                               count = 15,
                                               label = "15 Y",
                                               step = "year",
                                               stepmode = "todate"),

                                             list(
                                               step = "all")
                                           )
                      )),

         annotations = list(
           list(x = 0.9, y = -0.05,
                xref = "paper", yref = "paper",
                showarrow = F,
                text = 'source: <a href="http://www.frbsf.org/education/files/CTF_chart_sources.pdf">BEA and Federal Reserve</a>'),

           list(x = -0.05, y = 0.95,
                xref = "paper", yref = "paper",
                showarrow = F,
                text = "<b>Percent</b>"),

           list(x = 0.05, y = 0.98,
                xref = "paper", yref = "paper",
                showarrow = F,
                text = "<b>Zoom</b>")
         ),

         width = 1024,
         height = 600)

↧

Time series charts by the Economist in R using Plotly

July 11, 2016, 6:53 am

In this post we’ll recreate two infographics created by The Economist. The code uses the new Plotly 4.0 syntax.

Note: Plotly 4.0 has not been officially released yet. You can download the dev version using

devtools::install_github("ropensci/plotly@fix/nse")

Volume of Google searches related to immigrating to Canada

library(plotly)
library(zoo)

# Trends Data
trends <- read.csv("https://cdn.rawgit.com/plotly/datasets/master/Move%20to%20Canada.csv", check.names = F, stringsAsFactors = F)
trends.zoo <- zoo(trends[,-1], order.by = as.Date(trends[,1], format = "%d/%m/%Y"))
trends.zoo <- aggregate(trends.zoo, as.yearmon, mean)

trends <- data.frame(Date = index(trends.zoo),
                     coredata(trends.zoo))

# Immigration Data
immi <- read.csv("https://cdn.rawgit.com/plotly/datasets/master/Canada%20Immigration.csv", stringsAsFactors = F)

labels <- format(as.yearmon(trends$Date), "%Y")
labels <- as.character(sapply(labels, function(x){
  unlist(strsplit(x, "20"))[2]
}))

test <- labels[1]
for(i in 2:length(labels)){
  if(labels[i] == test) {
    labels[i] <- ""
  }else{
    test <- labels[i]
  }
}
labels[1] <- "2004"
hovertext1 <- paste0("Date:<b>", trends$Date, "</b><br>",
                     "From US:<b>", trends$From.US, "</b><br>")

hovertext2 <- paste0("Date:<b>", trends$Date, "</b><br>",
                     "From Britain:<b>", trends$From.Britain, "</b><br>")


p <- plot_ly(data = trends, x = ~Date) %>%

  # Time series chart

  add_lines(y = ~From.US, line = list(color = "#00526d", width = 4),
            hoverinfo = "text", text = hovertext1, name = "From US") %>%

  add_lines(y = ~From.Britain, line = list(color = "#de6e6e", width = 4),
            hoverinfo = "text", text = hovertext2, name = "From Britain") %>%

  add_markers(x = c(as.yearmon("2004-11-01"), as.yearmon("2016-03-01")),
              y = c(24, 44),
              marker = list(size = 15, color = "#00526d"),
              showlegend = F) %>%

  add_markers(x = c(as.yearmon("2008-07-01"), as.yearmon("2016-07-01")),
              y = c(27, 45),
              marker = list(size = 15, color = "#de6e6e"),
              showlegend = F) %>%

  # Markers for legend
  add_markers(x = c(as.yearmon("2005-01-01"), as.yearmon("2005-01-01")),
              y = c(40, 33.33),
              marker = list(size = 15, color = "#00526d"),
              showlegend = F) %>%

  add_markers(x = c(as.yearmon("2005-01-01"), as.yearmon("2005-01-01")),
              y = c(36.67, 30),
              marker = list(size = 15, color = "#de6e6e"),
              showlegend = F) %>%

  add_text(x = c(as.yearmon("2004-11-01"), as.yearmon("2016-03-01")),
           y = c(24, 44),
           text = c("<b>1</b>", "<b>3</b>"),
           textfont = list(color = "white", size = 8),
           showlegend = F) %>%

  add_text(x = c(as.yearmon("2008-07-01"), as.yearmon("2016-07-01")),
           y = c(27, 45),
           text = c("<b>2</b>", "<b>4</b>"),
           textfont = list(color = "white", size = 8),
           showlegend = F) %>%

  # Text for legend
  add_text(x = c(as.yearmon("2005-01-01"), as.yearmon("2005-01-01"), as.yearmon("2005-01-01"), as.yearmon("2005-01-01")),
           y = c(40, 36.67, 33.33, 30),
           text = c("<b>1</b>", "<b>2</b>", "<b>3</b>", "<b>4</b>"),
           textfont = list(color = "white", size = 8),
           showlegend = F) %>%

  # Bar chart
  add_bars(data = immi, x = ~Year, y = ~USA, yaxis = "y2", xaxis = "x2", showlegend = F,
           marker = list(color = "#00526d"), name = "USA") %>%

  add_bars(data = immi, x = ~Year, y = ~UK, yaxis = "y2", xaxis = "x2", showlegend = F,
           marker = list(color = "#de6e6e"), name = "UK") %>%

  layout(legend = list(x = 0.8, y = 0.36, orientation = "h", font = list(size = 10),
                       bgcolor = "transparent"),

         yaxis = list(domain = c(0.4, 0.95), side = "right", title = "", ticklen = 0,
                      gridwidth = 2),

         xaxis = list(showgrid = F, ticklen = 4, nticks = 100,
                      ticks = "outside",
                      tickmode = "array",
                      tickvals = trends$Date,
                      ticktext = labels,
                      tickangle = 0,
                      title = ""),

         yaxis2 = list(domain = c(0, 0.3), gridwidth = 2, side = "right"),
         xaxis2 = list(anchor = "free", position = 0),

         # Annotations
         annotations = list(
           list(xref = "paper", yref = "paper", xanchor = "left", yanchor = "right",
                x = 0, y = 1, showarrow = F,
                text = "<b>Your home and native land?</b>",
                font = list(size = 18, family = "Balto")),

           list(xref = "paper", yref = "paper", xanchor = "left", yanchor = "right",
                x = 0, y = 0.95, showarrow = F,
                align = "left",
                text = "<b>Google search volume for <i>'Move to Canada'</i></b><br><sup>100 is peak volume<br><b>Note</b> that monthly averages are used</sup>",
                font = list(size = 13, family = "Arial")),

           list(xref = "x", yref = "y", xanchor = "left", yanchor = "right",
                x = as.yearmon("2005-03-01"), y = 40, showarrow = F,
                align = "left",
                text = "<b>George W. Bush is re-elected</b>",
                font = list(size = 12, family = "Arial"),
                bgcolor = "white"),

           list(xref = "x", yref = "y", xanchor = "left", yanchor = "right",
                x = as.yearmon("2005-03-01"), y = 36.67, showarrow = F,
                align = "left",
                text = "<b>Canadian minister visits Britain, encourages skilled workers to move</b>",
                font = list(size = 12, family = "Arial"),
                bgcolor = "white"),

           list(xref = "x", yref = "y", xanchor = "left", yanchor = "right",
                x = as.yearmon("2005-03-01"), y = 33.33, showarrow = F,
                align = "left",
                text = "<b>Super Tuesday: Donald Trump wins 7 out of 11 Republican primaries</b>",
                font = list(size = 12, family = "Arial"),
                bgcolor = "white"),

           list(xref = "x", yref = "y", xanchor = "left", yanchor = "right",
                x = as.yearmon("2005-03-01"), y = 30, showarrow = F,
                align = "left",
                text = "<b>Britain votes 52-48% to leave the European Union</b>",
                font = list(size = 12, family = "Arial"),
                bgcolor = "white"),

           list(xref = "paper", yref = "paper", xanchor = "left", yanchor = "right",
                x = 0, y = 0.3, showarrow = F,
                align = "left",
                text = "<b>Annual immigration to Canada</b>",
                font = list(size = 12, family = "Arial")),

           list(xref = "paper", yref = "paper", xanchor = "left", yanchor = "right",
                x = 0, y = -0.07, showarrow = F,
                align = "left",
                text = "<b>Source:</b> Google trends and national statistics",
                font = list(size = 12, family = "Arial")),

           list(xref = "paper", yref = "paper", xanchor = "left", yanchor = "right",
                x = 0.85, y = 0.98, showarrow = F,
                align = "left",
                text = 'Inspired by <a href = "http://www.economist.com/blogs/graphicdetail/2016/07/daily-chart">The Economist</a>',
                font = list(size = 12, family = "Arial"))),

         paper_bgcolor = "#f2f2f2",
         margin = list(l = 18, r = 30, t = 18),
         width = 1024,height = 600)

print(p)

AIDS-related visualization

library(plotly)
library(zoo)
library(tidyr)
library(dplyr)

# Aids Data
df <- read.csv("https://cdn.rawgit.com/plotly/datasets/master/Aids%20Data.csv", stringsAsFactors = F)

# AIDS Related Deaths ####
plot.df <- df %>%
  filter(Indicator == "AIDS-related deaths") %>%
  filter(Subgroup %in% c("All ages estimate",
                         "All ages upper estimate",
                         "All ages lower estimate"))

# Munge
plot.df <- plot.df %>%
  select(Subgroup, Time.Period, Data.Value) %>%
  spread(Subgroup, Data.Value) %>%
  data.frame()

hovertxt <- paste0("<b>Year: </b>", plot.df$Time.Period, "<br>",
                   "<b>Est.: </b>", round(plot.df$All.ages.estimate/1e6,2),"M<br>",
                   "<b>Lower est.: </b>", round(plot.df$All.ages.lower.estimate/1e6,2),"M<br>",
                   "<b>Upper est.: </b>", round(plot.df$All.ages.upper.estimate/1e6,2), "M")

# Plot
p <- plot_ly(plot.df, x = ~Time.Period, showlegend = F) %>%
  add_lines(y = ~All.ages.estimate/1e6, line = list(width = 4, color = "#1fabdd"),
            hoverinfo = "text", text = hovertxt) %>%
  add_lines(y = ~All.ages.lower.estimate/1e6, line = list(color = "#93d2ef"),
            hoverinfo = "none") %>%
  add_lines(y = ~All.ages.upper.estimate/1e6, line = list(color = "#93d2ef"),
            fill = "tonexty",
            hoverinfo = "none")

# New HIV Infections ####
plot.df <- df %>%
  filter(Indicator == "New HIV Infections") %>%
  filter(Subgroup %in% c("All ages estimate",
                         "All ages upper estimate",
                         "All ages lower estimate"))

# Munge
plot.df <- plot.df %>%
  select(Subgroup, Time.Period, Data.Value) %>%
  spread(Subgroup, Data.Value) %>%
  data.frame()

hovertxt <- paste0("<b>Year: </b>", plot.df$Time.Period, "<br>",
                   "<b>Est.: </b>", round(plot.df$All.ages.estimate/1e6,2),"M<br>",
                   "<b>Lower est.: </b>", round(plot.df$All.ages.lower.estimate/1e6,2),"M<br>",
                   "<b>Upper est.: </b>", round(plot.df$All.ages.upper.estimate/1e6,2), "M")

# Add to current plot
p <- p %>%
  add_lines(data = plot.df, y = ~All.ages.estimate/1e6, line = list(width = 4, color = "#00587b"),
            hoverinfo = "text", text = hovertxt) %>%
  add_lines(data = plot.df, y = ~All.ages.lower.estimate/1e6, line = list(color = "#3d83a3"),
            hoverinfo = "none") %>%
  add_lines(data = plot.df, y = ~All.ages.upper.estimate/1e6, line = list(color = "#3d83a3"),
            fill = "tonexty",
            hoverinfo = "none")

# People receiving ART ####
x <- c(2010:2015)
y <- c(7501470, 9134270, 10935600, 12936500, 14977200, 17023200)

hovertxt <- paste0("<b>Year:</b>", x, "<br>",
                   "<b>Est.:</b> ", round(y/1e6,2), "M")

p <- p %>%
  add_lines(x = x, y = y/1e6, line = list(width = 5, color = "#e61a20"),
            yaxis = "y2",
            hoverinfo = "text", text = hovertxt)

# Layout
p <- p %>%
  layout(xaxis = list(title = "", showgrid = F, ticklen = 4, ticks = "inside",
                      domain = c(0, 0.9)),
         yaxis = list(title = "", gridwidth = 2, domain = c(0, 0.9), range = c(-0.01, 4)),
         yaxis2 = list(overlaying = "y", side = "right", showgrid = F, color = "#e61a20",
                       range = c(5,18)),

         annotations = list(
           list(xref = "paper", yref = "paper", xanchor = "left", yanchor = "right",
                x = 0, y = 1, showarrow = F, align = "left",
                text = "<b>Keeping the pressure up<br><sup>Worldwide, (in millions)</sup></b>",
                font = list(size = 18, family = "Arial")),

           list(xref = "paper", yref = "paper", xanchor = "left", yanchor = "right",
                x = 0, y = -0.07, showarrow = F, align = "left",
                text = "<b>Source: UNAIDS</b>",
                font = list(size = 10, family = "Arial", color = "#bfbfbf")),

           list(xref = "x", yref = "y", xanchor = "left", yanchor = "right",
                x = 1995, y = 3.92, showarrow = F, align = "left",
                text = "<b>New HIV Infections (per year)</b>",
                font = list(size = 12, family = "Arial", color = "#00587b")),

           list(xref = "x", yref = "y", xanchor = "left", yanchor = "right",
                x = 1999, y = 1, showarrow = F, align = "left",
                text = "<b>AIDS related deaths (per year)</b>",
                font = list(size = 12, family = "Arial", color = "#1fabdd")),

           list(xref = "x", yref = "y", xanchor = "left", yanchor = "right",
                x = 2010, y = 3, showarrow = F, align = "left",
                text = "<b>People receiving Anti-<br>Retroviral Therapy (total)</b>",
                font = list(size = 12, family = "Arial", color = "#e61a20")),

           list(xref = "paper", yref = "paper", xanchor = "left", yanchor = "right",
                x = 0.85, y = 0.98, showarrow = F,
                align = "left",
                text = 'Inspired by <a href = "http://www.economist.com/blogs/graphicdetail/2016/05/daily-chart-23">The Economist</a>',
                font = list(size = 12, family = "Arial")),

           list(xref = "paper", yref = "paper", xanchor = "left", yanchor = "middle",
                x = 0.375, y = 0.9, showarrow = F, align = "left",
                text = "<b>Lower bound</b>",
                font = list(size = 10, family = "Arial", color = "#8c8c8c")),

           list(xref = "paper", yref = "paper", xanchor = "left", yanchor = "middle",
                x = 0.375, y = 0.95, showarrow = F, align = "left",
                text = "<b>Upper bound</b>",
                font = list(size = 10, family = "Arial", color = "#8c8c8c")),

           list(xref = "paper", yref = "paper", xanchor = "left", yanchor = "middle",
                x = 0.485, y = 0.925, showarrow = F, align = "left",
                text = "<b>Estimate</b>",
                font = list(size = 10, family = "Arial", color = "#8c8c8c"))
         ),

         shapes = list(
           list(type = "rect",
                xref = "paper", yref = "paper",
                x0 = 0.45, x1 = 0.48, y0 = 0.9, y1 = 0.95,
                fillcolor = "#d9d9d9",
                line = list(width = 0)),

           list(type = "line",
                xref = "paper", yref = "paper",
                x0 = 0.45, x1 = 0.48, y0 = 0.9, y1 = 0.9,
                line = list(width = 2, color = "#8c8c8c")),

           list(type = "line",
                xref = "paper", yref = "paper",
                x0 = 0.45, x1 = 0.48, y0 = 0.95, y1 = 0.95,
                fillcolor = "#bfbfbf",
                line = list(width = 2, color = "#8c8c8c")),

           list(type = "line",
                xref = "paper", yref = "paper",
                x0 = 0.45, x1 = 0.48, y0 = 0.925, y1 = 0.925,
                fillcolor = "#bfbfbf",
                line = list(width = 2, color = "#404040"))),

         height = 600,width = 1024)

print(p)

↧

Principal Component Analysis Cluster Plots with Plotly

July 19, 2016, 9:16 am

≫ Next: Candlestick charts using Quandl and Plotly

≪ Previous: Time series charts by the Economist in R using Plotly

The Problem

When clustering data using principal component analysis, it is often of interest to visually inspect how well the data points separate in 2-D space based on principal component scores. While this is fairly straightforward to visualize with a scatterplot, the plot can become cluttered quickly with annotations as shown in the following figure:

Solution using `ggrepel`

The ggrepel package by Kamil Slowikowski implements functions to repel overlapping text labels away from each other and away from the data points that they label. It’s an easy to use package that works well in this example as shown in the following figure:

Solution using `plotly`

An alternative solution is to use interactive plots that are usable from the R console, in the RStudio viewer pane, in R Markdown documents, and in Shiny apps. Annotations can be viewed by hovering the mouse pointer over a point or dragging a rectangle around the relevant area to zoom in. Interactive plots using plotly allow you to de-clutter the plotting area, include extra annotation information and create interactive web-based visualizations directly from R. Once uploaded to a plotly account, plotly graphs (and the data behind them) can be viewed and modified in a web browser.

The resulting plot is clean and not cluttered with text annotations. While the ggrepel package provides a nice solution in this example, the plotly solution will be even more useful with a larger number of data points.

The Code

Principal Component Analysis and Hierarchical Clustering

# cor = TRUE indicates that PCA is performed on 
# standardized data (mean = 0, variance = 1)
pcaCars <- princomp(mtcars, cor = TRUE)

# view objects stored in pcaCars
names(pcaCars)

# proportion of variance explained
summary(pcaCars)

# scree plot
plot(pcaCars, type = "l")

# cluster cars
carsHC <- hclust(dist(pcaCars$scores), method = "ward.D2")

# dendrogram
plot(carsHC)

# cut the dendrogram into 3 clusters
carsClusters <- cutree(carsHC, k = 3)

# add cluster to data frame of scores
carsDf <- data.frame(pcaCars$scores, "cluster" = factor(carsClusters))
carsDf <- transform(carsDf, cluster_name = paste("Cluster",carsClusters))

First figure using `ggplot2`

library(ggplot2)
p1 <- ggplot(carsDf,aes(x=Comp.1, y=Comp.2)) +
      theme_classic() +
      geom_hline(yintercept = 0, color = "gray70") +
      geom_vline(xintercept = 0, color = "gray70") +
      geom_point(aes(color = cluster), alpha = 0.55, size = 3) +
      xlab("PC1") +
      ylab("PC2") + 
      xlim(-5, 6) + 
      ggtitle("PCA Clusters from Hierarchical Clustering of Cars Data") 

p1 + geom_text(aes(y = Comp.2 + 0.25, label = rownames(carsDf)))

Second figure using `ggplot2` with `ggrepel`

library(ggplot2)
library(ggrepel)

p1 + geom_text_repel(aes(y = Comp.2 + 0.25, label = rownames(carsDf)))

Interactive plot using `plotly`

library(plotly)
# Formula (~) syntax is required by plotly 4.0 and later
p <- plot_ly(carsDf, x = ~Comp.1, y = ~Comp.2, text = rownames(carsDf),
             type = "scatter", mode = "markers", color = ~cluster_name,
             marker = list(size = 11))

p <- layout(p, title = "PCA Clusters from Hierarchical Clustering of Cars Data", 
       xaxis = list(title = "PC 1"),
       yaxis = list(title = "PC 2"))

p

References

PCA with R by Gaston Sanchez

↧

Candlestick charts using Quandl and Plotly

July 19, 2016, 9:56 am

≫ Next: New feature: Dropdown menus in Plotly and R

≪ Previous: Principal Component Analysis Cluster Plots with Plotly

In this post we’ll show how to create candlestick charts using the new plotly 4.0 syntax. You can refer to this older post as well.

This time we’ll use the Quandl package to retrieve stock data. See here for more details.

install.packages("Quandl")
library(Quandl)
df <- Quandl("WIKI/AAPL")
df <- df[,c(1, 9:12)]

names(df) <- c("Date", "Open", "High", "Low", "Close")
df$Date <- as.Date(df$Date)

df <- df[1:1000,]

hovertxt <- Map(function(x, y)paste0(x, ":", y), names(df), df)
hovertxt <- Reduce(function(x, y)paste0(x, "<br>", y), hovertxt)

plot_ly(df, x = ~Date, xend = ~Date, hoverinfo = "none",
        color = ~Close > Open, colors = c("#00b386","#ff6666")) %>%
  
  add_segments(y = ~Low, yend = ~High, line = list(width = 1, color = "black")) %>%
  
  add_segments(y = ~Open, yend = ~Close, line = list(width = 3)) %>%
  
  add_markers(y = ~(Low + High)/2, hoverinfo = "text",
              text = hovertxt, marker = list(color = "transparent")) %>% 
  
  layout(showlegend = FALSE, 
         color = "white",
         yaxis = list(title = "Price", domain = c(0, 0.9)),
         annotations = list(
           list(xref = "paper", yref = "paper", 
                x = 0, y = 1, showarrow = F, 
                xanchor = "left", yanchor = "top",
                align = "left",
                text = paste0("<b>AAPL</b>")),
           
           list(xref = "paper", yref = "paper", 
                x = 0.75, y = 1, showarrow = F, 
                xanchor = "left", yanchor = "top",
                align = "left",
                text = paste(range(df$Date), collapse = " : "),
                font = list(size = 8))),
         plot_bgcolor = "#f2f2f2")

↧

New feature: Dropdown menus in Plotly and R

August 9, 2016, 11:51 pm

≫ Next: Using cranlogs in R with Plotly

≪ Previous: Candlestick charts using Quandl and Plotly

In this post we’ll showcase the new dropdown menu button feature. It adds a layer of interactivity that is similar to Shiny but supported within the plotly package.

Adding new menu buttons is as simple as specifying the updatemenus parameter inside layout.
Each dropdown menu needs to be a separate list with position parameters x and y as well as a buttons parameter, which houses individual named items with some additional parameters. Below are some examples:

Re-Styling a graph

library(plotly)

x <- seq(-2*pi, 2*pi, length.out = 1000)
df <- data.frame(x, y1 = sin(x), y2 = cos(x))

p <- plot_ly(df, x = ~x) %>%
  add_lines(y = ~y1, name = "A") %>%
  add_lines(y = ~y2, name = "B", visible = F)


p <- p %>% layout(
  title = "Drop down menus - Styling",
  xaxis = list(domain = c(0.1, 1)),
  yaxis = list(title = "y"),
  updatemenus = list(
    list(
      y = 0.8,
      buttons = list(

        list(method = "restyle",
             args = list("line.color", "blue"),
             label = "Blue"),

        list(method = "restyle",
             args = list("line.color", "red"),
             label = "Red"))),

    list(
      y = 0.7,
      buttons = list(
        list(method = "restyle",
             args = list("visible", list(TRUE, FALSE)),
             label = "Sin"),

        list(method = "restyle",
             args = list("visible", list(FALSE, TRUE)),
             label = "Cos")))
  ))

p

Changing chart type

library(MASS)
library(plotly)
# A covariance matrix must be symmetric, so both off-diagonal entries are equal
covmat <- matrix(c(0.8, 0.3, 0.3, 0.8), nrow = 2, byrow = T)
df <- mvrnorm(n = 10000, c(0,0), Sigma = covmat)
df <- as.data.frame(df)

colnames(df) <- c("x", "y")
p <- plot_ly(df, x = ~x, y = ~y) %>%
  add_markers(marker = list(opacity = 0.3, line = list(color = "black", width = 1)))

p <- p %>% layout(
  title = "Drop down menus - Plot type",
  xaxis = list(domain = c(0.1, 1)),
  yaxis = list(title = "y"),
  updatemenus = list(
    list(
      y = 0.8,
      buttons = list(

        list(method = "restyle",
             args = list("type", "scatter"),
             label = "Scatter"),

        list(method = "restyle",
             args = list("type", "histogram2d"),
             label = "2D Histogram")))
  ))

p

↧

Using cranlogs in R with Plotly

August 22, 2016, 7:48 am

≫ Next: NBA shots analysis using Plotly shapes

≪ Previous: New feature: Dropdown menus in Plotly and R

In this post we’ll use the cranlogs package to visualize the number of downloads for Plotly’s R API, with ggplot2 shown for comparison.

library(cranlogs)
library(plotly)
library(zoo)

# Get data
df <- cran_downloads(packages = c("plotly", "ggplot2"), 
                     from = "2015-10-01", to = "2016-08-01")

# Convert dates
df$date <- as.Date(df$date)

# Make data frame
df <- data.frame(date = unique(df$date),
                 count.plotly = subset(df, package == "plotly")$count,
                 count.ggplot = subset(df, package == "ggplot2")$count)

# Smooth data 
# 5 day moving averages
nWidth <- 5
df <- zoo(df[,-1], order.by = df[,1])
df.smooth <- rollapply(df, width = nWidth, FUN = mean)

df <- data.frame(date = index(df.smooth),
                 count.plotly = round(df.smooth[,1],0),
                 count.ggplot = round(df.smooth[,2],0))

plot_ly(df, x = ~date) %>% 
  add_lines(y = ~count.plotly, fill = "tozeroy", line = list(shape = "spline"), name = "Plotly") %>% 
  add_lines(y = ~count.ggplot, fill = "tozeroy", line = list(shape = "spline"), name = "ggplot2", 
            yaxis = "y2", xaxis = "x2") %>% 
  
  layout(yaxis = list(domain = c(0.55,1), title = "Count", anchor = "xaxis",
                      tickfont = list(color = "#595959", size = 12)),
         
         yaxis2 = list(domain = c(0, 0.45), title = "Count", anchor = "xaxis2",
                       tickfont = list(color = "#595959", size = 12)),
         
         xaxis = list(anchor = "yaxis", title = "Date"),
         
         xaxis2 = list(anchor = "free", title = "Date"),
         
         annotations = list(
           
           list(x = 0.05, y = 1, xref = "paper", yref = "paper",
                showarrow = FALSE,
                text = "<b>Plotly downloads</b>",
                font = list(size = 17)),
           
           list(x = 0.05, y = 0.45, xref = "paper", yref = "paper",
                showarrow = FALSE,
                text = "<b>ggplot2 downloads</b>",
                font = list(size = 17))))

↧

NBA shots analysis using Plotly shapes

August 25, 2016, 8:08 am

≫ Next: Analyzing Plotly’s Python package downloads

≪ Previous: Using cranlogs in R with Plotly

In this post, we will analyse the shots by Stephen Curry, the top scorer of the 2015-16 NBA season.

You can create SVG shapes like line, circle, rectangle, and path using Plotly’s shapes feature. With the help of the shapes, we will create the basketball court and plot all his shots on it.

Data Collection

We will collect the necessary data from NBA Stats.

import requests as r

# Chrome's user-agent string, to simulate a browser visiting the webpage
headers = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36'
}

# Stephen Curry's player id
player_id = 201939

# season details
season = '2015-16'
season_type = 'Regular Season'

# request parameters
req_params = {
 'AheadBehind': '',
 'ClutchTime': '',
 'ContextFilter': '',
 'ContextMeasure': 'FGA',
 'DateFrom': '',
 'DateTo': '',
 'EndPeriod': '',
 'EndRange': '',
 'GameID': '',
 'GameSegment': '',
 'LastNGames': 0,
 'LeagueID': '00',
 'Location': '',
 'Month': 0,
 'OpponentTeamID': 0,
 'Outcome': '',
 'Period': 0,
 'PlayerID': player_id,
 'PointDiff': '',
 'Position': '',
 'RangeType': '',
 'RookieYear': '',
 'Season': season,
 'SeasonSegment': '',
 'SeasonType': season_type,
 'StartPeriod': '',
 'StartRange': '',
 'TeamID': 0,
 'VsConference': '',
 'VsDivision': ''
}

res = r.get('http://stats.nba.com/stats/shotchartdetail', params=req_params, headers=headers)

Data Transformation

After collecting the necessary data, the next step is to transform the data into a proper format for querying.

Using pandas we can create a DataFrame object from the JSON response content.

import pandas as pd

res_json = res.json()

# column names
headers = res_json['resultSets'][0]['headers']
# row content (one list of values per shot)
shots_data = res_json['resultSets'][0]['rowSet']

shots_df = pd.DataFrame(shots_data, columns=headers)
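To make the shape of the response easier to picture, here is a minimal sketch that pairs column names with rows using only the standard library. The payload below is hand-made for illustration and is an assumption about the layout (one entry in 'resultSets' holding 'headers' and 'rowSet'); the values are not real NBA data, and rows_to_records is a hypothetical helper name.

```python
# Hypothetical, hand-made payload that mimics the layout used above.
# The numbers are invented for illustration only.
sample_json = {
    'resultSets': [{
        'headers': ['EVENT_TYPE', 'LOC_X', 'LOC_Y'],
        'rowSet': [
            ['Made Shot', -12, 35],
            ['Missed Shot', 220, 10],
            ['Made Shot', 0, 250],
        ],
    }]
}


def rows_to_records(payload):
    """Pair every row with the column names, giving one dict per shot."""
    result = payload['resultSets'][0]
    headers = result['headers']
    return [dict(zip(headers, row)) for row in result['rowSet']]


records = rows_to_records(sample_json)
print(records[0])
```

Passing the header list as the columns of a DataFrame performs the same pairing, only in tabular form.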

We can see that there are two unique values (‘Made Shot’, ‘Missed Shot’) for the column ‘EVENT_TYPE’ in the DataFrame. They mark whether a shot went into the basket or missed it.

shots_df['EVENT_TYPE'].unique()
>> array([u'Made Shot', u'Missed Shot'], dtype=object)

A row in the DataFrame represents a single shot, and the columns ‘LOC_X’ and ‘LOC_Y’ represent the location of that shot.

Shot Locations

Let’s start with creating a scatter chart of all the ‘missed’ shots by Stephen Curry.

import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode()

shot_trace = go.Scatter(
    x = shots_df[shots_df['EVENT_TYPE'] == 'Missed Shot']['LOC_X'],
    y = shots_df[shots_df['EVENT_TYPE'] == 'Missed Shot']['LOC_Y'],
    mode = 'markers'
)

data = [shot_trace]
layout = go.Layout(
    showlegend=False,
    height=600,
    width=600
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

We can see that most of the shots are near the hoop and three-point arc (line).

Similarly, you can create the chart for all the successful shots using the following pandas selection syntax.

shots_df[shots_df['EVENT_TYPE'] == 'Made Shot']

It will select all the shots with ‘EVENT_TYPE’ equal to ‘Made Shot’.
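Once the two event types can be separated, a natural follow-up is the share of shots that went in. The sketch below works on a plain list of event labels so that it runs without the NBA data; field_goal_pct is a hypothetical helper name and the sample list is made up. With the real data, the ‘EVENT_TYPE’ column could presumably be passed in the same way, since iterating over it yields the labels.

```python
from collections import Counter


def field_goal_pct(event_types):
    """Percentage of attempts labelled 'Made Shot' (0.0 if there are none)."""
    counts = Counter(event_types)
    made = counts['Made Shot']
    attempts = made + counts['Missed Shot']
    return 100.0 * made / attempts if attempts else 0.0


# Made-up labels, for illustration only
events = ['Made Shot', 'Missed Shot', 'Made Shot', 'Made Shot']
print(field_goal_pct(events))  # 75.0
```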

Creating the court

The X-axis and Y-axis of our court chart will range from -300 to 300 and -100 to 500 respectively; 10 units on the chart scale equal 1 foot.
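The scale can be captured in a couple of small helpers. This is only a sketch: the function names are hypothetical, and it assumes that the shot coordinates share the same scale as the court and that the hoop sits at the origin of the chart.

```python
import math

UNITS_PER_FOOT = 10  # chart scale stated above


def feet_to_units(feet):
    """Convert a length in feet to chart units."""
    return feet * UNITS_PER_FOOT


def shot_distance_feet(loc_x, loc_y):
    """Straight-line distance from the origin (assumed hoop position), in feet."""
    return math.hypot(loc_x, loc_y) / UNITS_PER_FOOT


print(feet_to_units(25))           # 250, half the width of a 50 ft. court
print(shot_distance_feet(30, 40))  # 5.0
```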

For reference to the court dimensions, we are using this image linked in the post by Savvas Tjortjoglou.

1. Outer Lines

The boundary of the court is a rectangle of 50 ft. by 94 ft.; we are drawing just half of its length (47 ft.).

Here, the points (x0, y0) and (x1, y1) represent the bottom-left and top-right points of the rectangle.

# list containing all the shapes
court_shapes = []

outer_lines_shape = dict(
  type='rect',
  xref='x',
  yref='y',
  x0='-250',
  y0='-47.5',
  x1='250',
  y1='422.5',
  line=dict(
      color='rgba(10, 10, 10, 1)',
      width=1
  )
)

court_shapes.append(outer_lines_shape)

2. Basketball hoop

We will draw it using a circle shape. The center of the circle is at the origin of the graph, with a radius of 7.5 units.

hoop_shape = dict(
  type='circle',
  xref='x',
  yref='y',
  x0='7.5',
  y0='7.5',
  x1='-7.5',
  y1='-7.5',
  line=dict(
    color='rgba(10, 10, 10, 1)',
    width=1
  )
)

court_shapes.append(hoop_shape)
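Plotly describes a circle by the corners of its bounding box rather than by center and radius. Since several shapes on the court are circles, a small helper can do the conversion. This is a sketch with a hypothetical function name; it uses numeric coordinates instead of the strings used elsewhere in this post, which Plotly should accept equally.

```python
def circle_shape(cx, cy, radius, dash=None):
    """Build a Plotly circle shape dict from a center point and a radius."""
    line = dict(color='rgba(10, 10, 10, 1)', width=1)
    if dash is not None:
        line['dash'] = dash  # e.g. 'dot' for a dotted outline
    return dict(
        type='circle',
        xref='x',
        yref='y',
        # bounding box: bottom-left and top-right corners
        x0=cx - radius,
        y0=cy - radius,
        x1=cx + radius,
        y1=cy + radius,
        line=line
    )


# Same geometry as the hoop above: centered at the origin, radius 7.5 units
hoop = circle_shape(0, 0, 7.5)
print(hoop['x0'], hoop['y0'], hoop['x1'], hoop['y1'])
```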

3. Basket Backboard

The backboard is a raised vertical board with a basket attached. It’s 72 inches (60 units) wide.

backboard_shape = dict(
  type='rect',
  xref='x',
  yref='y',
  x0='-30',
  y0='-7.5',
  x1='30',
  y1='-6.5',
  line=dict(
    color='rgba(10, 10, 10, 1)',
    width=1
  ),
  fillcolor='rgba(10, 10, 10, 1)'
)

court_shapes.append(backboard_shape)

4. Outer box of three-second area

It’s a rectangle 16 ft. wide and 19 ft. long.

outer_three_sec_shape = dict(
  type='rect',
  xref='x',
  yref='y',
  x0='-80',
  y0='-47.5',
  x1='80',
  y1='143.5',
  line=dict(
      color='rgba(10, 10, 10, 1)',
      width=1
  )
)

court_shapes.append(outer_three_sec_shape)

5. Inner box of three-second area

It’s a rectangle 12 ft. wide and 19 ft. long.

inner_three_sec_shape = dict(
  type='rect',
  xref='x',
  yref='y',
  x0='-60',
  y0='-47.5',
  x1='60',
  y1='143.5',
  line=dict(
      color='rgba(10, 10, 10, 1)',
      width=1
  )
)

court_shapes.append(inner_three_sec_shape)

6. Three-point line (left)

The left side line of the three-point line, 14 ft. in length.

The points (x0, y0) and (x1, y1) represent the endpoints of the line.

left_line_shape = dict(
  type='line',
  xref='x',
  yref='y',
  x0='-220',
  y0='-47.5',
  x1='-220',
  y1='92.5',
  line=dict(
      color='rgba(10, 10, 10, 1)',
      width=1
  )
)

court_shapes.append(left_line_shape)

7. Three-point line (right)

The right side line of the Three-point line.

right_line_shape = dict(
  type='line',
  xref='x',
  yref='y',
  x0='220',
  y0='-47.5',
  x1='220',
  y1='92.5',
  line=dict(
      color='rgba(10, 10, 10, 1)',
      width=1
  )
)

court_shapes.append(right_line_shape)

8. Three-point arc

The farthest point of the arc is 23 ft. 9 in. (23.75 feet) away from the origin.

We are using the cubic Bézier curve command (C) to approximate the arc. You can learn more about SVG paths from this tutorial by Mozilla.

three_point_arc_shape = dict(
  type='path',
  xref='x',
  yref='y',
  path='M -220 92.5 C -70 300, 70 300, 220 92.5',
  line=dict(
      color='rgba(10, 10, 10, 1)',
      width=1
  )
)

court_shapes.append(three_point_arc_shape)
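The curve above is an approximation of the arc. As an alternative sketch, the arc can be sampled as points on a circle around the origin and joined with straight line segments (the M and L path commands). The function name is hypothetical, and the radius of 237.5 units assumes the NBA distance of 23 ft. 9 in. at the 10 units per foot scale. With these numbers the arc meets x = ±220 at roughly y = 89.5 rather than 92.5, so the side lines would need the same end point for an exact join.

```python
import math


def three_point_arc_path(radius=237.5, corner_x=220.0, n_points=50):
    """SVG path tracing a circular arc around the origin from x = -corner_x to x = +corner_x."""
    # Height at which a circle of this radius crosses x = +/- corner_x
    corner_y = math.sqrt(radius ** 2 - corner_x ** 2)
    start = math.atan2(corner_y, -corner_x)  # angle of the left end point
    end = math.atan2(corner_y, corner_x)     # angle of the right end point

    points = []
    for i in range(n_points + 1):
        t = start + (end - start) * i / n_points
        points.append((radius * math.cos(t), radius * math.sin(t)))

    commands = ['M {:.1f} {:.1f}'.format(*points[0])]
    commands += ['L {:.1f} {:.1f}'.format(x, y) for x, y in points[1:]]
    return ' '.join(commands)


arc_path = three_point_arc_path()
print(arc_path[:40])
```

The resulting string can be used as the path value of a shape with type='path', in place of the Bézier string.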

9. Center circle

This circle has a radius of 6 feet.

center_circle_shape = dict(
  type='circle',
  xref='x',
  yref='y',
  x0='60',
  y0='482.5',
  x1='-60',
  y1='362.5',
  line=dict(
      color='rgba(10, 10, 10, 1)',
      width=1
  )
)

court_shapes.append(center_circle_shape)

10. Restraining circle

This circle has a radius of 2 feet.

res_circle_shape = dict(
  type='circle',
  xref='x',
  yref='y',
  x0='20',
  y0='442.5',
  x1='-20',
  y1='402.5',
  line=dict(
      color='rgba(10, 10, 10, 1)',
      width=1
  )
)

court_shapes.append(res_circle_shape)

11. Free-throw circle

This circle has a radius of 6 feet.

free_throw_circle_shape = dict(
  type='circle',
  xref='x',
  yref='y',
  x0='60',
  y0='200',
  x1='-60',
  y1='80',
  line=dict(
      color='rgba(10, 10, 10, 1)',
      width=1
  )
)

court_shapes.append(free_throw_circle_shape)

12. Restricted area

We are using the dash property to style the circle; it has a radius of 4 feet.

res_area_shape = dict(
  type='circle',
  xref='x',
  yref='y',
  x0='40',
  y0='40',
  x1='-40',
  y1='-40',
  line=dict(
    color='rgba(10, 10, 10, 1)',
    width=1,
    dash='dot'
  )
)

court_shapes.append(res_area_shape)

That’s the basketball court outline created with the help of Plotly shapes and annotations.

Charting the shots

Now that we have all the shapes for the court, we will plot all the shots on it as a scatter plot.

We have made two traces, one for ‘missed’ and one for ‘made’ shots, with different colors to help tell them apart.

missed_shot_trace = go.Scatter(
    x = shots_df[shots_df['EVENT_TYPE'] == 'Missed Shot']['LOC_X'],
    y = shots_df[shots_df['EVENT_TYPE'] == 'Missed Shot']['LOC_Y'],
    mode = 'markers',
    name = 'Missed Shot',
    marker = dict(
        size = 5,
        color = 'rgba(255, 255, 0, .8)',
        line = dict(
            width = 1,
            color = 'rgba(0, 0, 0, 1)'
        )
    )
)

made_shot_trace = go.Scatter(
    x = shots_df[shots_df['EVENT_TYPE'] == 'Made Shot']['LOC_X'],
    y = shots_df[shots_df['EVENT_TYPE'] == 'Made Shot']['LOC_Y'],
    mode = 'markers',
    name = 'Made Shot',
    marker = dict(
        size = 5,
        color = 'rgba(0, 200, 100, .8)',
        line = dict(
            width = 1,
            color = 'rgba(0, 0, 0, 1)'
        )
    )
)

data = [missed_shot_trace, made_shot_trace]

layout = go.Layout(
    title='Shots by Stephen Curry in NBA season 2015-16',
    showlegend=True,
    xaxis=dict(
        showgrid=False,
        range=[-300, 300]
    ),
    yaxis=dict(
        showgrid=False,
        range=[-100, 500]
    ),
    height=600,
    width=650,
    shapes=court_shapes
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

You can toggle and view the distribution of different shot types by clicking on the legend.

↧

Analyzing Plotly’s Python package downloads

August 29, 2016, 6:03 am

In this post, we will collect and analyze download statistics for Plotly’s Python package available on PyPI. We will also compare the downloads with other interactive charting tools like Bokeh, Vincent, and MPLD3.

Data Collection

PyPI used to show download stats for packages, but the service has been terminated while the next generation of the Python package repository, Warehouse, is being developed.

Linehaul will act as a statistics collection daemon for incoming logs from the new PyPI (warehouse). Right now, the current activity log on PyPI is being stored in a BigQuery database. (source: [Distutils] Publicly Queryable Statistics)

import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot

init_notebook_mode(connected=True)

We will use the gbq.read_gbq function to read BigQuery dataset into Pandas DataFrame objects.

import pandas as pd
from pandas.io import gbq

import numpy as np

We will use the linregress function for linear regression of scatter plots.

from scipy.stats import linregress

Read the post Using Google BigQuery with Plotly and Pandas to create a new project.

project_id = 'sixth-edition-678'

This query will collect the timestamp, package name, and total download count columns from the table (on a daily basis).

daily_download_query = """
SELECT
  DATE(timestamp) as day,
  MONTH(timestamp) as month,
  file.project,
  COUNT(*) as total_downloads,
FROM
  TABLE_DATE_RANGE(
    [the-psf:pypi.downloads],
    TIMESTAMP("20120701"),
    CURRENT_TIMESTAMP()
  )
WHERE
  file.project = '{0}'
GROUP BY
  day, file.project, month
ORDER BY
  day asc
"""

The following function runs the query and returns a DataFrame object, if successful.

def package_df(package):
    """ Return the query result as a pandas.DataFrame object
    
    param: package(str): Name of the package on PyPI
    """
    
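    try:
        df = gbq.read_gbq(daily_download_query.format(package), project_id=project_id)
        return df
    except Exception as err:
        # Report which package failed and why instead of raising a bare IOError
        raise IOError("BigQuery request for '{0}' failed: {1}".format(package, err))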

We will construct different DataFrames for each package.

plotly_df = package_df('plotly')
bokeh_df = package_df('bokeh')
matplotlib_df = package_df('matplotlib')
mpld3_df = package_df('mpld3')
vincent_df = package_df('vincent')

Inspecting for missing data

Using a simple TimeDelta calculation, we can find if some rows are missing from the DataFrame.

from datetime import datetime, timedelta

# Number of rows in the DataFrame
actual_rows = len(plotly_df)

start_date = datetime.strptime(plotly_df.iloc[0]['day'], '%Y-%m-%d') # 2016-01-22
end_date = datetime.strptime(plotly_df.iloc[actual_rows - 1]['day'], '%Y-%m-%d') # 2016-08-29

# Expected rows if there was no missing data (day)
expected_rows = (end_date - start_date).days + 1

if (actual_rows != expected_rows):
    print("{0} rows are missing in the DataFrame.".format(expected_rows - actual_rows))

We find that there are no rows from 2016-03-06 to 2016-05-21.
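
The row count above only tells us how many days are missing, not which ones. A minimal sketch of one way to list the missing days is shown below; it uses only the standard library, `missing_days` is a hypothetical helper, and the sample dates are made up for illustration rather than taken from the query result.

```python
from datetime import datetime, timedelta

def missing_days(day_strings):
    """Return the 'YYYY-MM-DD' days absent between the first and last given day."""
    days = sorted(datetime.strptime(d, '%Y-%m-%d').date() for d in day_strings)
    present = set(days)
    missing = []
    current = days[0]
    # Walk the calendar one day at a time and record every day with no row
    while current <= days[-1]:
        if current not in present:
            missing.append(current.isoformat())
        current += timedelta(days=1)
    return missing

# Toy input (hypothetical), with a two-day gap
sample_days = ['2016-03-04', '2016-03-05', '2016-03-08', '2016-03-09']
gaps = missing_days(sample_days)
print(gaps)  # ['2016-03-06', '2016-03-07']
```

Applied to the 'day' column of a DataFrame, the first and last entries of the returned list would give the start and end of a single continuous gap such as the one found here.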

Data Transformation

Here, we will append rows for the missing days to the DataFrames.

missing_data_start_date = '2016-03-06'
missing_data_end_date = '2016-05-21'

# starting/ending date for missing data and time difference (1 day)
s = datetime.strptime(missing_data_start_date, '%Y-%m-%d')
e = datetime.strptime(missing_data_end_date, '%Y-%m-%d')
diff = timedelta(days=1)

# generate all the missing dates in the same format
missing_dates = []
missing_dates_month = []

while (s <= e):
    missing_dates.append(s.strftime('%Y-%m-%d'))
    missing_dates_month.append(s.month)
    s += diff
    
missing_row_count = len(missing_dates) # 77

We are using the pandas.concat function to append the new DataFrame with missing values to the old DataFrame.

The following function returns the updated DataFrame after sorting it (sort_values) by the values in the column ‘day’.

def append_missing_data(dataframe, package):
    """Append the missing dates DataFrame to a given DataFrame
    
    param: dataframe(pandas.DataFrame): DataFrame to append
    param: package(str): Name of package on PyPI
    """
    
    missing_dates_df = pd.DataFrame({'day': missing_dates,
                                    'month': missing_dates_month,
                                    'file_project': [package for i in range(missing_row_count)],
                                    'total_downloads': [0 for i in range(missing_row_count)]}
                                   )
    
    # append the missing rows; sorting by day below puts them in their right place
    new_df = pd.concat([dataframe, missing_dates_df])
    
    return new_df.sort_values('day')

Updated DataFrames with the recovered missing data.

bokeh_df = append_missing_data(bokeh_df, 'bokeh')
matplotlib_df = append_missing_data(matplotlib_df, 'matplotlib')
mpld3_df = append_missing_data(mpld3_df, 'mpld3')
plotly_df = append_missing_data(plotly_df, 'plotly')
vincent_df = append_missing_data(vincent_df, 'vincent')

Package Downloads Comparison (daily)

trace1 = go.Scatter(
    x=plotly_df['day'],
    y=plotly_df['total_downloads'],
    name='Plotly',
    mode='lines',
    line=dict(width=0.5,
              color='rgb(10, 240, 10)'),
    fill='tonexty'
)

trace2 = go.Scatter(
    x=bokeh_df['day'],
    y=bokeh_df['total_downloads'],
    name='Bokeh',
    mode='lines',
    line=dict(width=0.5,
              color='rgb(42, 77, 20)'),
    fill='tonexty'
)

trace3 = go.Scatter(
    x=mpld3_df['day'],
    y=mpld3_df['total_downloads'],
    name='MPLD3',
    mode='lines',
    line=dict(width=0.5,
              color='rgb(20, 33, 61)'),
    fill='tonexty'
)

trace4 = go.Scatter(
    x=vincent_df['day'],
    y=vincent_df['total_downloads'],
    name='Vincent',
    mode='lines',
    line=dict(width=0.5,
              color='rgb(0, 0, 0)'),
    fill='tonexty'
)

data = [trace1, trace2, trace3, trace4]

layout = go.Layout(
    title='Package Downloads Comparison (Daily)',
    showlegend=True,
    xaxis=dict(
        type='category',
        showgrid=False
    ),
    yaxis=dict(
        title='No. of downloads (daily)',
        type='linear',
        range=[1, 10000]
    ),
    plot_bgcolor='rgba(250, 250, 250, 1)',
    shapes=[
        dict(
            type='line',
            xref='x',
            yref='y',
            x0='45',
            y0='2000',
            x1='120',
            y1='2000'
        )
    ],
    annotations=[
        dict(
            x=75,
            y=2400,
            xref='x',
            yref='y',
            text="PyPI's stats collection service was down from March 6 to May 21",
            showarrow=False
        ),
        dict(
            x=115,
            y=9600,
            xref='x',
            yref='y',
            text='From Jan 22, 2016 To Aug 29, 2016',
            showarrow=False
        ),
        dict(
            x=121,
            y=2000,
            xref='x',
            yref='y',
            text="",
            showarrow=True,
            ay=0,
            ax=-5
        ),
        dict(
            x=45,
            y=2000,
            xref='x',
            yref='y',
            text="",
            showarrow=True,
            ay=0,
            ax=5
        )
    ]
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

Package Downloads Comparison (Monthly)

The dataset was created on Jan 22, 2016. We will use these months on the x-axis.

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug']

We are using pandas’ groupby method to gather all the rows by their month value and then summing their counts to find the total downloads in each month.

trace1 = go.Bar(x=months, y=plotly_df.groupby('month').sum()['total_downloads'], name='Plotly')
trace2 = go.Bar(x=months, y=vincent_df.groupby('month').sum()['total_downloads'], name='Vincent')
trace3 = go.Bar(x=months, y=bokeh_df.groupby('month').sum()['total_downloads'], name='Bokeh')
trace4 = go.Bar(x=months, y=mpld3_df.groupby('month').sum()['total_downloads'], name='MPLD3')

data = [trace1, trace2, trace3, trace4]

layout = go.Layout(
    barmode='group',
    title="Package Downloads Comparison (PyPI)",
    yaxis=dict(
        title='No. of downloads (monthly)'
    ),
    xaxis=dict(
        title='Month'
    ),
    annotations=[
        dict(
            x=3,
            y=0,
            xref='x',
            yref='y',
            text="PyPI's stats collection service<br>was down from March 6 to May 21",
            showarrow=True,
            arrowhead=2,
            ax=0,
            ay=-150
        ),
        dict(
            x=3.7,
            y=90000,
            xref='x',
            yref='y',
            text='From Jan 22, 2016 To Aug 29, 2016',
            showarrow=False
        )
    ]
)
fig = go.Figure(data=data, layout=layout)
iplot(fig)

Growth of Plotly package downloads

Following the tutorial Linear fit in Python, we will try to find an approximate regression line for the scatter graph of Plotly package’s downloads.

xvals = np.arange(0, len(plotly_df))

The following traces are for the package downloads scatter plot (for each package).

trace1 = go.Scatter(
    x=xvals[:44], 
    y=plotly_df['total_downloads'].iloc[:44], 
    mode='markers',
    marker=go.Marker(color='rgb(255, 127, 14)',size=5,symbol='x'),
    name='Plotly Downloads'
)

trace2 = go.Scatter(
    x=xvals[121:], 
    y=plotly_df['total_downloads'].iloc[121:],
    mode='markers',
    marker=go.Marker(color='rgb(255, 127, 14)',size=5,symbol='x'),
    name='Plotly Downloads',
    showlegend=False
)

# linear regression line for Plotly package downloads
pslope, pintercept, pr_value, pp_value, pstd_err = linregress(xvals, plotly_df['total_downloads'])
plotly_line = pslope*xvals + pintercept

trace3 = go.Scatter(
    x=xvals, 
    y=plotly_line, 
    mode='lines',
    marker=go.Marker(color='rgb(10, 20, 30)'),
    name='Plotly Regression Line',
    line=dict(
        color='rgba(10, 10, 10, 1)',
        width=1,
        dash='longdashdot'
    )
)

layout = go.Layout(
    title='Linear Regression Line for Plotly\'s Package Downloads Growth',
    yaxis = dict(
        title='No. of downloads (daily)'
    ),
    xaxis = dict(
        title='# days'
    ),
    annotations=[
        dict(
            x=85,
            y=2000,
            xref='x',
            yref='y',
            text="<b>Y = 13.29X - 282.55</b>",
            showarrow=False
        )
    ]
)

data = [trace1, trace2, trace3]

fig = go.Figure(data=data, layout=layout)
iplot(fig)
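
The slope and intercept returned by linregress are the ordinary least-squares estimates. As a rough illustration of what that means, here is a pure-Python sketch; `least_squares_line` is a hypothetical helper and the data is a toy series on an exact line, not the download counts.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    # Slope is the covariance of x and y divided by the variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data lying exactly on y = 2x + 1 (hypothetical)
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
slope, intercept = least_squares_line(xs, ys)
print(slope, intercept)
```

Note that the fit in this post runs over every row of plotly_df, so the zero-filled days from the missing period also enter the regression, even though the scatter traces skip them.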

Similarly, we can find the approximate growth line for ‘Matplotlib’.

mslope, mintercept, mr_value, mp_value, mstd_err = linregress(xvals, matplotlib_df['total_downloads'])
matplotlib_line = mslope*xvals + mintercept

Daily download counts for ‘Matplotlib’ range around 7000-8000 as of now.

How much time will it take for Plotly to reach that level?

Using Plotly’s growth line equation Y = 13.29X − 282.55, we can find out the approximate no. of days for downloads to reach 8000.

Setting Y = 8000 results in X = 624 (rounded to a whole day), where the current day index is 220 as of Aug 29, 2016.

That means it will take almost 400 days (from 29 Aug, 2016) for Plotly to reach the current download range of Matplotlib.
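
The arithmetic behind this estimate can be checked in a few lines. The sketch uses the rounded coefficients quoted in the text, and rounding up to the next whole day is an assumption made here for illustration.

```python
import math

slope, intercept = 13.29, -282.55  # rounded coefficients quoted in the text
target_downloads = 8000
current_day_index = 220

# Solve target = slope * x + intercept for x
x_exact = (target_downloads - intercept) / slope

# First whole day index on which the fitted line is at or above the target
x_day = int(math.ceil(x_exact))
days_remaining = x_day - current_day_index

print(x_day, days_remaining)  # 624 404
```

The 404 remaining days match the length of the future_xvals range used for the prediction trace.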

# linear regression line for Plotly package downloads
pslope, pintercept, pr_value, pp_value, pstd_err = linregress(xvals, plotly_df['total_downloads'])
plotly_line = pslope*xvals + pintercept

trace1 = go.Scatter(
    x=xvals, 
    y=plotly_line, 
    mode='lines',
    marker=go.Marker(color='rgb(10, 20, 30)'),
    name='Plotly Regression (Actual)',
    line=dict(
        color='rgba(10, 10, 10, 1)',
        width=1,
        dash='longdashdot'
    )
)

future_xvals = np.arange(221, 221 + 404)

trace2 = go.Scatter(
    x=future_xvals, 
    y=pslope*future_xvals+pintercept, 
    mode='lines',
    marker=go.Marker(color='rgb(10, 20, 30)'),
    name='Plotly Regression (Prediction)',
    line=dict(
        color='rgba(10, 10, 10, 1)',
        width=1,
        dash='dot'
    )
)

layout = go.Layout(
    title='Prediction for Plotly\'s Package Downloads Growth',
    yaxis = dict(
        title='No. of downloads (daily)'
    ),
    xaxis = dict(
        title='# days'
    ),
    annotations=[
        dict(
            x=85,
            y=2000,
            xref='x',
            yref='y',
            text="<b>Y = 13.29X - 282.55</b>",
            showarrow=False
        ),
        dict(
            x=400,
            y=7800,
            xref='x',
            yref='y',
            text="Current download range for Matplotlib",
            showarrow=False
        )
    ],
    shapes=[
        dict(
            type='line',
            xref='x',
            yref='y',
            x0=0,
            y0=8000,
            x1=624,
            y1=8000,
            line=dict(
                color='rgba(10, 10, 10, 1)',
                width=1,
                dash='solid'
            )
        ),
        dict(
            type='line',
            xref='x',
            yref='y',
            x0=624,
            y0=0,
            x1=624,
            y1=8000,
            line=dict(
                color='rgba(10, 10, 10, 1)',
                width=1,
                dash='solid'
            )
        )
    ]
)

data = [trace1, trace2]

fig = go.Figure(data=data, layout=layout)
iplot(fig)

The IPython Notebook for this analysis is available here, Analyzing Plotly’s Python package downloads.

↧

Radial Stacked Area Chart in R using Plotly

September 24, 2016, 6:03 am

In this post we’ll quickly show how to create radial stacked area charts in plotly. We’ll use the AirPassengers dataset.

Inspired by Mike Bostock’s post: http://bl.ocks.org/mbostock/3048740

#devtools::install_github("ropensci/plotly")

library(plotly)
library(zoo)
library(data.table)

# Load Airpassengers data set
data("AirPassengers")

# Create data frame with year and month
AirPassengers <- zoo(coredata(AirPassengers), order.by = as.yearmon(index(AirPassengers)))
df <- data.frame(month = format(index(AirPassengers), "%b"),
                 year =  format(index(AirPassengers), "%Y"),
                 value = coredata(AirPassengers))

# Get coordinates for plotting
#Angles for each month
nMonths <- length(unique(df$month))
theta <- seq(0, 2*pi, by = (2*pi)/nMonths)[-(nMonths+1)]

# Append these angles to the data frame
df$theta <- rep(theta, nMonths)

# Cumulatively sum number of passengers
dt <- as.data.table(df)
dt[,cumvalue := cumsum(value), by = month]
df <- as.data.frame(dt)

# Cartesian coordinates (x, y) space will be value*cos(theta) and value*sin(theta)
df$x <- df$cumvalue * cos(df$theta)
df$y <- df$cumvalue * sin(df$theta)

# Create hovertext
df$hovertext <- paste("Year:", df$year, "<br>",
                      "Month:", df$month, "<br>",
                      "Passengers:", df$value)

# Repeat January values
ddf <- data.frame()
for(i in unique(df$year)){
  temp <- subset(df, year == i)
  temp <- rbind(temp, temp[1,])
  ddf <- rbind(ddf, temp)
}

df <- ddf

# Plot
colorramp <- colorRampPalette(c("#bfbfbf", "#f2f2f2"))
cols <- colorramp(12)

cols <- rep(c("#e6e6e6", "#f2f2f2"), 6)

linecolor <- "#737373"

p <- plot_ly(subset(df, year == 1949), x = ~x, y = ~y, hoverinfo = "text", text = ~hovertext,
             type = "scatter", mode = "lines",
             line = list(shape = "spline", color = linecolor))

k <- 2
for(i in unique(df$year)[-1]){
  p <- add_trace(p, data = subset(df, year == i), 
                 x = ~x, y = ~y, hoverinfo = "text", text = ~hovertext,
                 type = "scatter", mode = "lines",
                 line = list(shape = "spline", color = linecolor),
                 fillcolor = cols[k], fill = "tonexty")
  
  k <- k + 1
}

start <- 100
end <- 4350
axisdf <- data.frame(x = start*cos(theta), y = start*sin(theta),
                     xend = end*cos(theta), yend = end*sin(theta))

p <- add_segments(p = p, data = axisdf, x = ~x, y = ~y, xend = ~xend, yend = ~yend, inherit = F,
                  line = list(dash = "8px", color = "#737373", width = 4),
                  opacity = 0.7)

p <- add_text(p, x = (end + 200)*cos(theta), y = (end + 200)*sin(theta), text = unique(df$month), inherit = F,
              textfont = list(color = "black", size = 18))

p <- layout(p, showlegend = F,
       title = "Radial Stacked Area Chart",
       xaxis = list(showgrid = F, zeroline = F, showticklabels = F, domain = c(0.25, 0.80)),
       yaxis = list(showgrid = F, zeroline = F, showticklabels = F),
       width = 1024,
       height = 600)

p

Radial Stacked Area Chart
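
The coordinate step at the heart of the chart (one angle per month, a running total per month as the radius, and January repeated to close each ring) can be sketched outside R as well. The following is a Python transliteration for illustration only; `radial_coords` is a hypothetical helper and the input is toy data, not AirPassengers.

```python
import math

def radial_coords(values_by_year, n_months=12):
    """Map per-year monthly values to stacked (x, y) points on a circle.

    values_by_year holds one list of n_months values per year. The radius of
    each point is the running total for that month, so each year's ring sits
    outside the previous one.
    """
    thetas = [2 * math.pi * m / n_months for m in range(n_months)]
    running = [0.0] * n_months
    rings = []
    for year_values in values_by_year:
        ring = []
        for m, value in enumerate(year_values):
            running[m] += value
            ring.append((running[m] * math.cos(thetas[m]),
                         running[m] * math.sin(thetas[m])))
        ring.append(ring[0])  # repeat the first month so the ring closes
        rings.append(ring)
    return rings

# Toy data: two "years" of four "months" each (hypothetical values)
rings = radial_coords([[1, 2, 3, 4], [1, 1, 1, 1]], n_months=4)
```

Each inner list corresponds to one trace in the plot, and filling each trace to the previous one gives the stacked bands.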

↧

Visualizing ROC Curves in R using Plotly

October 15, 2016, 12:12 pm

In this post we’ll create some simple functions to compute a Receiver Operating Characteristic (ROC) curve and visualize it using Plotly. See Carson’s plotly book for more details around changes in syntax.

We’ll do this from a credit risk perspective, i.e. validating a bank’s internal rating model (we’ll create a sample dataset keeping this in mind).

We’ll replicate computations highlighted in this paper.

library(plotly)
library(dplyr)
library(flux)

Sample data

set.seed(123)
n <- 100000
lowest.rating <- 10

# Sample internal ratings
# Say we have a rating scale of 1 to 10
ratings <- sample(1:lowest.rating, size = n, replace = T)

# Defaults
# We'll randomly assign defaults concentrating more defaults 
# in the lower rating ranges. We'll do this by creating exponentially
# increasing PDs across the rating range

power <- 5
PD <- log(1:lowest.rating)
PD <- PD ^ power

#PD <- exp((1:lowest.rating))
PD <- PD/(max(PD) * 1.2)  # increased denominator to make the PDs more realistic
# Now given PD for each rating category sample from a binomial distribution
# to assign actual defaults
defaults <- rep(0, n)
k <- 1
for(i in ratings){
  defaults[k] <- rbinom(1, 1, PD[i])
  
  k <- k + 1
}

dataset <- data.frame(Rating = ratings,
                      Default = defaults)

# Check if dataset looks realistic
# df <- dataset %>% 
#   group_by(Rating) %>% 
#   summarize(Def = sum(Default == 1), nDef = sum(Default == 0))

ROC Curve Computation

Now that we have a sample dataset to work with, we can start to create the ROC curve.

ROCFunc <- function(cutoff, df){
  
  # Function counts the number of defaults happening in all the rating
  # buckets greater than or equal to the cutoff
  
  # Number of hits = number of defaults with rating >= cutoff / total defaults
  # Number of false alarms = number of non defaults with rating >= cutoff / total non defaults

  nDefault <- sum(df$Default == 1)
  notDefault <- sum(df$Default == 0)

  temp <- df %>% filter(Rating >= cutoff)
  hits <- sum(temp$Default == 1)/nDefault
  falsealarm <- sum(temp$Default == 0)/notDefault
  ret <- matrix(c(hits, falsealarm), nrow = 1)
  colnames(ret) <- c("Hits", "Falsealarm")

  return(ret)
}

# Arrange ratings in decreasing order
# A lower rating is better than a higher rating
vec <- sort(unique(ratings), decreasing = T)
ROC.df <- data.frame()

for(i in vec){
  ROC.df <- rbind(ROC.df, ROCFunc(i, dataset))
}

# Labels (rating cutoffs) used to annotate each point on the curve
labels <- data.frame(x = ROC.df$Falsealarm, 
                     y = ROC.df$Hits,
                     text = vec)

# Prepend the origin (0, 0) to complete the polygon
ROC.df <- rbind(c(0,0), ROC.df)

# Area under curve
AUC <- round(auc(ROC.df$Falsealarm, ROC.df$Hits),3)
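
To make the computation above easier to follow, here is a Python transliteration on a tiny toy portfolio. It is a sketch under stated assumptions: `roc_points` and `trapezoid_auc` are hypothetical helpers, the data is made up, and the area uses a plain trapezoid rule, which is assumed here to be comparable to what auc() computes.

```python
def roc_points(ratings, defaults):
    """Return [(false_alarm_rate, hit_rate), ...] starting at (0, 0).

    Cutoffs run from the worst (highest) rating down to the best, and every
    exposure rated at or above the cutoff is flagged as a predicted default.
    """
    n_default = sum(defaults)
    n_non_default = len(defaults) - n_default
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(ratings), reverse=True):
        flagged = [d for r, d in zip(ratings, defaults) if r >= cutoff]
        hits = sum(flagged) / float(n_default)
        false_alarms = (len(flagged) - sum(flagged)) / float(n_non_default)
        points.append((false_alarms, hits))
    return points

def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy portfolio (hypothetical): ratings 1 (best) to 3 (worst), 1 = default
ratings = [1, 1, 2, 2, 3, 3]
defaults = [0, 0, 0, 1, 1, 1]
curve = roc_points(ratings, defaults)
auc_value = trapezoid_auc(curve)
```

A model with no discriminatory power would trace the diagonal and give an area of 0.5, which is the "Random" line drawn in the plot below.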

Plot

plot_ly(ROC.df, y = ~Hits, x = ~Falsealarm, hoverinfo = "none") %>% 
  
  add_lines(name = "Model",
            line = list(shape = "spline", color = "#737373", width = 7), 
            fill = "tozeroy", fillcolor = "#2A3356") %>% 
  
  add_annotations(y = labels$y, x = labels$x, text = labels$text,
                  ax = 20, ay = 20,
                  arrowcolor = "white",
                  arrowhead = 3,
                  font = list(color = "white")) %>% 
  
  add_segments(x = 0, y = 0, xend = 1, yend = 1, 
               line = list(dash = "7px", color = "#F35B25", width = 4), 
               name = "Random") %>% 
  
  add_segments(x = 0, y = 0, xend = 0, yend = 1, 
               line = list(dash = "10px", color = "black", width = 4), 
               showlegend = F) %>%
  
  add_segments(x = 0, y = 1, xend = 1, yend = 1, 
               line = list(dash = "10px", color = "black", width = 4), 
               showlegend = F) %>% 
  
  add_annotations(x = 0.8, y = 0.2, showarrow = F, 
                  text = paste0("Area Under Curve: ", AUC),
                  font = list(family = "serif", size = 18, color = "#E8E2E2")) %>%
  
  add_annotations(x = 0, y = 1, showarrow = F, xanchor = "left", 
                  xref = "paper", yref = "paper",
                  text = paste0("Receiver Operator Curve"),
                  font = list(family = "arial", size = 30, color = "#595959")) %>%
  
  add_annotations(x = 0, y = 0.95, showarrow = F, xanchor = "left", 
                  xref = "paper", yref = "paper",
                  text = paste0("Charts the percentage of correctly identified defaults (hits) against the percentage of non defaults incorrectly identified as defaults (false alarms)"),
                  font = list(family = "serif", size = 14, color = "#999999")) %>% 
  
   
  layout(xaxis = list(range = c(0,1), zeroline = F, showgrid = F,
                      title = "Number of False Alarms"),
         yaxis = list(range = c(0,1), zeroline = F, showgrid = F,
                      domain = c(0, 0.9),
                      title = "Number of Hits"),
         plot_bgcolor = "#E8E2E2",
         height = 800, width = 1024)

↧

Filled Chord Diagram in R using Plotly

November 8, 2016, 8:43 am

In this post we’ll create a Filled Chord Diagram using plotly. The post is inspired by Plotly’s Python documentation.

Install / update packages

Just to ensure we are working with the latest dev version of plotly.

# Install packages if needed
# install.packages(c("devtools", "dplyr"))
# library(devtools)
# install_github("ropensci/plotly")

library(plotly)
library(dplyr)

Dataset

The dataset we’ll use consists of the number of comments a person made on her own Facebook posts as well as the number of comments on her friends’ Facebook posts.

# Dataset for creating chord diagram ------------------------------------------
# Data for number of facebook posts
# See https://plot.ly/python/filled-chord-diagram/

df <- rbind(c(16, 3, 28, 0, 18),
            c(18, 0, 12, 5, 29),
            c(9, 11, 17, 27, 0),
            c(19, 0, 31, 11, 12),
            c(23, 17, 10, 0, 34))
df <- data.frame(df)

colnames(df) <- c('Emma', 'Isabella', 'Ava', 'Olivia', 'Sophia')
rownames(df) <- c('Emma', 'Isabella', 'Ava', 'Olivia', 'Sophia')

Global settings

These are some plot-related settings which we can set up right now. These settings will be fed to plot_ly() later on. Also, having these aesthetic settings accessible now will make it easier for us to make changes later on.

# Settings --------------------------------------------------------------------
# Overall plot settings like color and transparency

cols <- RColorBrewer::brewer.pal(nrow(df), "Set1")  # Set of colors (n = number of rows in data)
opacity <- 0.5  # Opacity of ideogram
chord.opacity <- 0.3  # Opacity of individual chords
linecolor <- "black"
circlefill <- "#f2f2f2"
inner.radius <- 0.93
gap <- 0.02

Creating the Ideogram

We’ll first create the ideogram. The ideogram is essentially a set of sectors plotted on a unit circle that represent the row sums of the dataset, i.e. the total number of comments made by each person (on their own posts and their friends’). We’ll start with two helper functions: toAngular(), which maps a vector of numeric values onto the unit circle using cumulative sums, essentially creating sectors, and addGaps(), which adds some space between sectors for aesthetic purposes.

# Function Definition: addGaps() ----------------------------------------------
addGaps <- function(theta, gap = 0.05){
  
  # Takes a vector of angles and adds a gap in-between them
  # Adds and subtracts the gap value from computed angle
  
  newtheta <- data.frame()
  
  for (i in 1:length(theta)) {
    
    if(i == 1){
      x <- 0 + gap
      y <- theta[i] - gap
      newtheta <- rbind(newtheta, c(x, y))
    }else{
      x <- theta[i - 1] + gap
      y <- theta[i] - gap
      newtheta <- rbind(newtheta, c(x, y))
    }
  }
  
  newtheta <- data.frame(theta, newtheta)
  colnames(newtheta) <- c("theta", "start", "end")
  
  return(newtheta)
}

# Function Definition: toAngular() --------------------------------------------
toAngular <- function(x, rad = 1, gap = 0.05, lower = 0, upper = 2*pi, addgaps = T){
  
  # Maps a set of numbers onto the unit circle by computing cumulative
  # sums and assigning angles to each sum
  
  cumtotals <- cumsum(x / sum(x))
  
  # Upper and lower bounds are the angle limits to which mapping
  # is limited. Ex 0 - 2PI
  delta <- ifelse(upper > lower, upper - lower, (2*pi) - lower + upper)
  theta <-  cumtotals * delta
  
  x <- rad * cos(theta)
  y <- rad * sin(theta)
  
  df <- data.frame(x, y)
  
  # Additionally, add gaps in between each sector using the addGaps() function
  if (addgaps == T) {
    gaps <- addGaps(theta, gap = gap)
    ret <- list(theta = theta,
                coord = df,
                gaps = gaps)
  }else{
    ret <- list(theta = theta,
                coord = df)
  }
  
  return(ret)
}
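
For readers who want to check the mapping by hand, the two helpers above can be transliterated to Python as follows. This is an illustrative sketch rather than a replacement for the R functions: `to_angular` and `add_gaps` are hypothetical names, and the input is the row sums of the comment matrix defined earlier.

```python
import math

def to_angular(values, rad=1.0, lower=0.0, upper=2 * math.pi):
    """Map values to cumulative angles and (x, y) points on a circle of radius rad."""
    total = float(sum(values))
    # Angle range available for the mapping, e.g. the full circle 0 to 2*pi
    delta = upper - lower if upper > lower else 2 * math.pi - lower + upper
    thetas = []
    running = 0.0
    for v in values:
        running += v / total
        thetas.append(running * delta)
    coords = [(rad * math.cos(t), rad * math.sin(t)) for t in thetas]
    return thetas, coords

def add_gaps(thetas, gap=0.05):
    """Return (start, end) angle pairs, pulling each sector in by gap on both sides."""
    starts = [0.0] + list(thetas[:-1])
    return [(s + gap, e - gap) for s, e in zip(starts, thetas)]

# Row sums of the comment matrix: Emma, Isabella, Ava, Olivia, Sophia
row_sums = [65, 64, 64, 73, 84]
thetas, coords = to_angular(row_sums)
sectors = add_gaps(thetas, gap=0.02)
```

Each sector ends where the cumulative share of comments ends, so the last angle lands on the full circle and neighbouring sectors are separated by twice the gap.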

Now that the functions are defined we need to create the ideogram. We do so by:

Creating an outer ring which’ll lie on the unit circle
Creating an inner ring which lies inside the outer circle (radius is defined in the settings section above)
Creating an SVG path for each sector bounded by the outer and inner circles and the end points of each gap
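
The third step, turning one sector into an SVG path, can be sketched in Python as below. This mirrors the idea rather than the exact R code: `sector_path` is a hypothetical helper, and the angles and point count in the example call are arbitrary illustration values.

```python
import math

def sector_path(start, end, inner_radius, n_points=100):
    """Build an SVG path string for an annular sector between two angles.

    Walks the outer arc (radius 1) from start to end, then the inner arc
    back from end to start, and closes the outline with Z.
    """
    step = (end - start) / float(n_points - 1)
    thetas = [start + i * step for i in range(n_points)]
    outer = [(math.cos(t), math.sin(t)) for t in thetas]
    inner = [(inner_radius * math.cos(t), inner_radius * math.sin(t))
             for t in reversed(thetas)]
    points = outer + inner
    head = "M {} {}".format(*points[0])
    tail = " ".join("L {} {}".format(x, y) for x, y in points[1:])
    return "{} {} Z".format(head, tail)

# Arbitrary example: a sector from 0.02 to 1.1 radians, 5 points per arc
path = sector_path(0.02, 1.1, inner_radius=0.93, n_points=5)
```

The resulting string has the same M ... L ... Z structure that plotly accepts for a shape of type "path".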

# Create ideogram -------------------------------------------------------------
# See https://plot.ly/python/filled-chord-diagram/

# The ideogram is constructed of the row sums i.e total interactions in each row
dat <- rowSums(df)

# Outer ring is the unit circle
outer <- toAngular(dat, gap = gap)

# Inner ring has radius < 1
inner <- toAngular(dat, rad = inner.radius, gap = gap)

# Ideogram is charted as a svg path and fed to plot_ly() as a shape
# Compute a path for each sector of the ideogram by combining the 
# coordinates of the outer and inner circles
outer.inner <- rbind(outer$gaps, inner$gaps) %>% arrange(theta)  # arrange in increasing order of theta

# Each sector of the ideogram is made of four points - 
# the start and end points of the outer and inner circles
# Hence increment by 2 and not 1
vec <- seq(1, nrow(outer.inner), by = 2)

# Create an empty dataframe
ideogram <- data.frame()

k <- 1  # Counter for each row / group

# Loop through each sector and create a svg path using the start and end
# points of the outer and inner circles
for (i in vec) {
  
  # Get starting and ending point for 'i' th sector
  start <- outer.inner$start[i]
  end <- outer.inner$end[i]
  
  # Ensure starting point is always less than ending point of sector
  if (start > end) start <- start - (2*pi)
  
  # Create a sequence of thetas along the sector
  thetas <- seq(start, end, length.out = 100)
  
  # Compute x and y coordinates
  x <- c(cos(thetas), inner.radius * cos(rev(thetas)))
  y <- c(sin(thetas), inner.radius * sin(rev(thetas)))
  
  # Add a group for easy subsetting later on
  coords <- data.frame(x, y, group = k)
  ideogram <- rbind(ideogram, coords)
  
  # Increment group number
  k <- k + 1
}

# Function definition: createPath() -------------------------------------------
createPath <- function(df){
  
  # Given x and y coordinates creates a string containing a svg path
  # that can be fed to plotly as a shape
  
  start <- paste("M", df$x[1], df$y[1])
  path <- paste("L", df$x[-1], df$y[-1], collapse = " ")
  path <- paste(start, path, "Z")
  return(path)
}

# Use group numbers assigned to each sector to subset and create a path string
ideogram.path <- by(ideogram, ideogram$group, createPath)

# Plot the ideogram (just as a check). Chord diagram is generated separately later
# Create shape list
ideogram.shapes <- list()  # Used later on

for (i in 1:nrow(df)) {
  
  # Use plotly syntax to save shapes of each sector as a list
  ideogram.shapes[[i]] <- list(type = "path",
                               path = ideogram.path[i],
                               fillcolor = cols[i],
                               line = list(color = linecolor, width = 1),
                               opacity = opacity)
}

# Just to check if things are looking okay
ideogram.plot <- plot_ly(height = 800, width = 800) %>%
  layout(
    xaxis = list(showgrid = F, zeroline = F, showticklabels = F),
    yaxis = list(showgrid = F, zeroline = F, showticklabels = F),
    shapes = ideogram.shapes)

ideogram.plot

You should now have something similar to this:

Creating Chords

The following set of code snippets essentially do this:

Divide each sector on the ideogram into sub-sectors based on the number of comments made by each person, i.e. traverse the dataset row-wise and map the numeric vectors in each row onto the associated sector (not the unit circle). Example: Emma has a total of 65 comments amongst herself and her friends. The sector corresponding to Emma’s 65 comments needs to be divided into further sectors based on the 16, 3, 28, 0 and 18 comments.
Find the four points that bind each chord. A chord is a planar shape that is bound by two bezier curves and two circular arcs. For each bezier curve the control point is generated by finding the mean of the angular coordinates of the end points.
Fill each chord (ribbon) with the appropriate color. Example: Emma made 18 comments on Sophia’s posts but Sophia made 23 comments on Emma’s posts. Since Sophia made more comments, the chord depicting the interaction between Emma and Sophia is colored using the same color that is used for coloring Sophia’s sector on the ideogram.
Create SVG shapes for each chord and then plot using plot_ly()

Note that the ordering of the endpoints of bezier curves and circular arcs is tricky and was done based on trial and error.
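
The first step above, dividing one person’s sector in proportion to the values in her row, can be illustrated with Emma’s row. This Python sketch is a simplified stand-in for the R code that follows: `split_sector` is a hypothetical helper, and the sector limits passed in are made-up angles, not the ones computed for the ideogram.

```python
def split_sector(row, start, end):
    """Split the angle range [start, end] in proportion to the sorted row values.

    row maps a friend's name to a comment count. Returns (name, value, angle)
    tuples, where angle is the upper boundary of that friend's sub-sector.
    """
    items = sorted(row.items(), key=lambda kv: kv[1])
    total = float(sum(row.values()))
    running = 0
    result = []
    for name, value in items:
        running += value
        result.append((name, value, start + (end - start) * running / total))
    return result

emma = {'Emma': 16, 'Isabella': 3, 'Ava': 28, 'Olivia': 0, 'Sophia': 18}

# Hypothetical sector limits in radians, for illustration only
divisions = split_sector(emma, start=0.02, end=1.15)
for name, value, angle in divisions:
    print(name, value, round(angle, 3))
```

A zero entry (Olivia here) produces a sub-sector of zero width at the start of the sector, which is why the R code adds an explicit starting row for each sector.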

# Create chords ---------------------------------------------------------------
# Divide each sector corresponding to each interaction in each row
sector.angles <- inner$gaps
angle.list <- data.frame()
for (i in 1:nrow(sector.angles)) {
  # Get starting and ending points of each sector
  start <- sector.angles$start[i]
  end <- sector.angles$end[i]
  
  # Sort each row in increasing order
  dat <- sort(df[i,])
  
  # Use toAngular() function to get thetas corresponding to each row item
  angle <- toAngular(as.numeric(dat), lower = start, upper = end, addgaps = F)$theta
  
  # Offset by the starting point since the function returns values in 
  # the [0 - (start - end)] interval
  angle <- c(start + angle)
  
  # Collate all the data for each division of the sector 
  temp <- data.frame(from = rownames(sector.angles)[i],
                     to = names(dat),
                     value = as.numeric(dat),
                     angle,
                     x = inner.radius * cos(angle),
                     y = inner.radius * sin(angle),
                     stringsAsFactors = F)
  
  # Add the starting point to the divisions
  # If min value in a row is zero then starting point for that division
  # must be the starting point of the sector
  startrow <- data.frame(from = rownames(sector.angles)[i],
                         to = "start",
                         value = 0,
                         angle = sector.angles$start[i],
                         x = inner.radius * cos(sector.angles$start[i]),
                         y = inner.radius * sin(sector.angles$start[i]),
                         stringsAsFactors = F)
  
  angle.list <- rbind(angle.list, startrow, temp)
}

# Create unique path IDs i.e. each set of interactions gets a unique ID
# Example - A -> B and B -> A will get the same ID
k <- 1
angle.list$ID <- rep(0, nrow(angle.list))
revstr <- paste(angle.list$to, angle.list$from)

for (i in 1:nrow(angle.list)) {
  if (angle.list$ID[i] == 0) {
    from = angle.list$from[i]
    to = angle.list$to[i]
    str <- paste(from, to)
    mtch <- match(str, revstr)
    
    if (!is.na(mtch)) {
      angle.list$ID[c(i, mtch)] <- k
      k <- k + 1
    }
  }
}

# Each chord is bounded by four points: 
# 1. two actual data points corresponding to the actual interaction i.e. A -> B (p1) and B -> A (p2)
# 2. And two previous data points to complete the polygon
# We'll create some helper functions

# Function definition: bezierCurve() ------------------------------------------
bezierCurve <- function(t1, t2){
  
  # Takes two angles as arguments and returns the x and y coordinates
  # of a quadratic bezier curve 
  
  t <- seq(0, 1, length.out = 100)
  
  p0 <- c(inner.radius * cos(t1), inner.radius * sin(t1))  # Starting point (t1)
  p2 <- c(inner.radius * cos(t2), inner.radius * sin(t2))  # Ending point (t2)
  p1 <- c(0, 0)  # Control point: the centre of the circle, so chords bend inwards
  
  # Curve = (1 - t)^2 * p0 + 2 * (1 - t) * t * p1 + t^2 * p2
  x <- (1 - t)^2 * p0[1] + 2 * (1 - t) * t * p1[1] + t^2 * p2[1]
  y <- (1 - t)^2 * p0[2] + 2 * (1 - t) * t * p1[2] + t^2 * p2[2]
  df <- data.frame(x, y)
  
  return(df)
}

# Function definition: circleCurve() ------------------------------------------
circleCurve <- function(t1, t2){

  # Returns the x and y coordinates of points lying on the inner 
  # boundary of the ideogram bounded by two angles t1 and t2 
    
  t <- seq(min(t1, t2), max(t1, t2), length.out = 50)
  x <- inner.radius * cos(t)
  y <- inner.radius * sin(t)
  
  df <- data.frame(x, y)
  
  return(df)
}

# Function definition: opposite() ---------------------------------------------
opposite <- function(df){
  
  # Given a dataframe, simply returns the dataframe in reverse order
  
  n <- nrow(df)
  df <- df[n:1,]
  return(df)
}

# Function definition: chordShape() -------------------------------------------
chordShape <- function(ID){
  
  # Function to create svg path for a chord given by a unique ID (created earlier)
  
  id <- which(angle.list$ID == ID)
  
  # Get color based on higher number of connects
  idx <- which.max(angle.list$value[id])
  fillcolor <- angle.list$from[id[idx]]
  fillcolor <- cols[which(rownames(df) == fillcolor)]
  
  # Append the two prior points to complete polygon
  id <- c(id, id - 1)
  t <- angle.list$angle[id]
  
  # Each chord is made of two bezier curves and two arcs (or one bezier curve
  # and one arc) lying on the inner boundary of the ideogram
  if(length(t) == 4){
    a <- bezierCurve(t[1], t[4])
    b <- bezierCurve(t[3], t[2])
    c <- circleCurve(t[1], t[3])
    d <- circleCurve(t[2], t[4])
    
    df <- rbind(a, d, opposite(b), c)
    
    pth <- createPath(df)
    shp <- list(type = "path",
                path = pth,
                fillcolor = fillcolor,
                line = list(color = linecolor, width = 1),
                opacity = chord.opacity)
    
  }else{
    
    # Case when there are zero interactions in one direction i.e. 
    # A -> B > 0 but B -> A = 0 or vice versa
    a <- bezierCurve(t[1], t[2])
    b <- circleCurve(t[1], t[2])
    
    df <- rbind(a, b)
    
    pth <- createPath(df)
    shp <- list(type = "path",
                path = pth,
                fillcolor = fillcolor,
                line = list(color = linecolor, width = 1),
                opacity = chord.opacity)
  }
  
  return(shp)
}

# Loop through each unique ID and create a shape for each corresponding polygon
chord.shapes <- list()
for(i in unique(angle.list$ID)){
  if(i != 0){
    chord.shapes[[i]] <- chordShape(ID = i)
  }
}

# Create a grey circle on the inside for aesthetics
ang <- seq(0, (2*pi), length.out = 100)
x <- 1 * cos(ang)
y <- 1 * sin(ang)

pth <- createPath(df = data.frame(x, y))
inner.circle <- list(list(type = "path",
                          path = pth,
                          fillcolor = circlefill,
                          line = list(color = linecolor, width = 1),
                          opacity = 0.2))

# Add all shapes to same list
all.shapes <- c(ideogram.shapes, chord.shapes, inner.circle)
length(all.shapes)

# Plot chord diagram ----------------------------------------------------------
# Just a description of chord diagram
description <- paste0("<i>","A chord diagram is a graphical method of displaying the inter-relationships ",
                      "between data in a matrix. The data is <br> arranged radially around a circle ",
                      "with the relationships between the points typically drawn as arcs connecting ",
                      "the<br>data together - <b>Wikipedia</b>","</i>")

# Coordinates for labels
labels <- data.frame(x = 1.1 * cos(outer$theta - pi/5),
                     y = 1.1 * sin(outer$theta - pi/5),
                     text = paste0("<b>", rownames(df), "</b>"))

# Plot using plot_ly()
chord.plot <- plot_ly(width = 800, height = 800) %>%
  
  # Add labels to sectors
  add_text(data = labels, x = ~x, y = ~y, text = ~text, hoverinfo = "none",
           textfont = list(family = "serif", size = 14, color = "#999999")) %>%
  
  # Layout for shapes, annotations and axis options
  layout(
    xaxis = list(title = "", showgrid = F, zeroline = F, showticklabels = F, domain = c(0, 0.9)),
    yaxis = list(title = "", showgrid = F, zeroline = F, showticklabels = F, domain = c(0, 0.9)),
    shapes = all.shapes,
    
    annotations = list(
      list(xref = "paper", yref = "paper",
           xanchor = "left", yanchor = "top",
           x = 0, y = 1, showarrow = F,
           text = "<b>Filled Chord Diagram</b>",
           font = list(family = "serif", size = 25, color = "black")),
      
      list(xref = "paper", yref = "paper",
           xanchor = "left", yanchor = "top",
           x = 0, y = 0.95, showarrow = F,
           text = description,
           align = "left",
           font = list(family = "arial", size = 10, color = "black"))
    ))

print(chord.plot)

You should now have something like this:

I am sure there are more efficient ways of going about this but hopefully you found this post helpful. Here are some additional resources to look at:

↧

NHL shots analysis using Plotly shapes

November 24, 2016, 8:19 am

In this post, we will analyse the shots by P. K. Subban for the season 2015-16. He is a defenceman for the Nashville Predators of the National Hockey League.

It’s the second post in the series of posts on Plotly shapes. You can also read the first one, NBA shots analysis using Plotly shapes.

You can create SVG shapes like line, circle, rectangle, and path using Plotly’s shapes feature. With the help of these shapes, we will create the Ice Hockey rink and plot all his shots (in the season 2015-16) on it.

Data Collection

We have collected the shot location for P.K. Subban from SportingCharts’ Ice Tracker Tool. The dataset is publicly available at our datasets repository.

Creating the ice hockey rink

The rink has physical dimensions of 200 ft (61 m) × 85 ft (26 m). We will use the NHL Rulebook for reference.

For this post, we will draw just half of the rink (split along its length).

The X-axis and Y-axis of our rink chart will range from -250 to 250 and 0 to 580 respectively. A single unit on the chart scale is equal to 0.17 ft (85/500) on the X-axis and 0.172 ft (100/580) on the Y-axis.
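The scale factors above can be wrapped in two small helpers, which makes it easier to see where numbers such as the 11 ft and 25 ft offsets used later in this post end up on the chart. This is only a sketch: the names ft_to_x_units and ft_to_y_units are made up here for illustration and are not part of Plotly or of the original code.

```python
# Hypothetical helpers (not from the original post) that convert feet to chart units.
RINK_WIDTH_FT = 85.0    # full width of the rink
HALF_LENGTH_FT = 100.0  # only half of the 200 ft length is drawn
X_UNITS = 500.0         # the x-axis runs from -250 to 250
Y_UNITS = 580.0         # the y-axis runs from 0 to 580


def ft_to_x_units(feet):
    """Convert a distance in feet to x-axis chart units."""
    return feet * X_UNITS / RINK_WIDTH_FT


def ft_to_y_units(feet):
    """Convert a distance in feet to y-axis chart units."""
    return feet * Y_UNITS / HALF_LENGTH_FT


# A line 11 ft from the end of the rink sits at about y = 580 - 63.8 = 516.2
goal_line_y = Y_UNITS - ft_to_y_units(11)

# A line 25 ft from the center line sits at y = 145
blue_line_y = ft_to_y_units(25)
```

Keeping the conversion in one place avoids retyping rounded constants for every shape.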

Outer lines

The standard size of the rink is 200 ft long and 85 ft wide. The corners are rounded in the arc of a circle with a radius of 28 ft.

We will use the Rectangle, Line, and Arc shapes to draw the outer lines. The upper side of this rectangle is a line 11 ft away from the end of the rink.

#list containing all the shapes
rink_shapes = []

outer_rect_shape = dict(
            type='rect',
            xref='x',
            yref='y',
            x0='-250',
            y0='0',
            x1='250',
            y1='516.2',
            line=dict(
                width=1,
            )
)

rink_shapes.append(outer_rect_shape)

To support the arcs, we’ll draw a line parallel to the X-axis.

outer_line_shape = dict(
            type='line',
            xref='x',
            yref='y',
            x0='200',
            y0='580',
            x1='-200',
            y1='580',
            line=dict(
                width=1,
            )
)

rink_shapes.append(outer_line_shape)

To simulate the rounded corners, we will use the Arc shape.

outer_arc1_shape = dict(
            type='path',
            xref='x',
            yref='y',
            path='M 200 580 C 217 574, 247 532, 250 516.2',
            line=dict(
                width=1,
            )
)

rink_shapes.append(outer_arc1_shape)

We need to draw one more arc, the mirror image of this one across the Y-axis.
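One way to get the second arc without retyping the coordinates is to negate every x coordinate of the first path. The sketch below does this for absolute M/L/C path strings like the ones used in this post, where the numbers simply alternate between x and y. The names mirror_path_x and outer_arc2_shape are introduced here for illustration only.

```python
import re


def mirror_path_x(path):
    """Negate every x coordinate of an absolute M/L/C SVG path string.

    Assumes the numbers in the path alternate x, y, x, y, ...
    """
    # Splitting on a captured group keeps the numbers: they land at odd indices.
    tokens = re.split(r'(-?\d+(?:\.\d+)?)', path)
    for k, idx in enumerate(range(1, len(tokens), 2)):
        is_x = (k % 2 == 0)
        if is_x and float(tokens[idx]) != 0:
            if tokens[idx].startswith('-'):
                tokens[idx] = tokens[idx][1:]
            else:
                tokens[idx] = '-' + tokens[idx]
    return ''.join(tokens)


outer_arc1_path = 'M 200 580 C 217 574, 247 532, 250 516.2'

outer_arc2_shape = dict(
    type='path',
    xref='x',
    yref='y',
    path=mirror_path_x(outer_arc1_path),
    line=dict(width=1),
)
```

The resulting dictionary can be appended to rink_shapes in the same way as the first arc.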

Red lines

The first is the red line in the center of the rink, running along its width.

center_red_line_shape = dict(
            type='line',
            xref='x',
            yref='y',
            x0='-250',
            y0='0',
            x1='250',
            y1='0',
            line=dict(
                width=1,
                color='rgba(255, 0, 0, 1)'
            )
)

rink_shapes.append(center_red_line_shape)

The second red line is 11 ft away from the end of the rink.

end_line_shape = dict(
            type='line',
            xref='x',
            yref='y',
            x0='-250',
            y0='516.2',
            x1='250',
            y1='516.2',
            line=dict(
                width=1,
                color='rgba(255, 0, 0, 1)'
            )
)

rink_shapes.append(end_line_shape)

Blue line

It’s 1 ft wide and located 25 ft from the center red line. We will use the rect shape to give it the desired width.

blue_line_shape = dict(
            type='rect',
            xref='x',
            yref='y',
            x0='250',
            y0='150.8',
            x1='-250',
            y1='145',
            line=dict(
                color='rgba(0, 0, 255, 1)',
                width=1
            ),
            fillcolor='rgba(0, 0, 255, 1)'
)

rink_shapes.append(blue_line_shape)

Face-off spots and circles

A circular blue spot, 1 ft in diameter, shall be marked exactly in the center of the rink.

center_blue_spot_shape = dict(
            type='circle',
            xref='x',
            yref='y',
            x0='2.94',
            y0='2.8',
            x1='-2.94',
            y1='-2.8',
            line=dict(
                color='rgba(0, 0, 255, 1)',
                width=1
            ),
            fillcolor='rgba(0, 0, 255, 1)'
)

rink_shapes.append(center_blue_spot_shape)

Two red spots 2 ft in diameter shall be marked on the ice in the neutral zone 5 ft from each blue line.

red_spot1_shape = dict(
            type='circle',
            xref='x',
            yref='y',
            x0='135.5',
            y0='121.8',
            x1='123.5',
            y1='110.2',
            line=dict(
                color='rgba(255, 0, 0, 1)',
                width=1
            ),
            fillcolor='rgba(255, 0, 0, 1)'
)

rink_shapes.append(red_spot1_shape)

The second red spot in the neutral zone is the mirror image of this one across the Y-axis.
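For rect, circle and line shapes, mirroring only requires flipping the sign of x0 and x1. A minimal sketch follows, assuming the shape dictionaries look like the ones in this post; mirror_shape_x and red_spot2_shape are hypothetical names, not something the original code defines.

```python
import copy


def mirror_shape_x(shape):
    """Return a copy of a rect/circle/line shape reflected across the Y-axis."""
    mirrored = copy.deepcopy(shape)
    for key in ('x0', 'x1'):
        # Coordinates in this post are strings, so convert before negating
        mirrored[key] = -float(shape[key])
    return mirrored


red_spot1_shape = dict(
    type='circle',
    xref='x',
    yref='y',
    x0='135.5',
    y0='121.8',
    x1='123.5',
    y1='110.2',
    line=dict(color='rgba(255, 0, 0, 1)', width=1),
    fillcolor='rgba(255, 0, 0, 1)',
)

red_spot2_shape = mirror_shape_x(red_spot1_shape)
```

Note that the mirrored x values come back as floats rather than strings; the y values and styling are copied unchanged.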

In both end zones and on both sides of each goal, red face-off spots and circles shall be marked on the ice. Circles of radius 15 ft are drawn around the center blue spot and around each red spot in the end zones.

red_spot1_circle_shape = dict(
            type='circle',
            xref='x',
            yref='y',
            x0='217.6',
            y0='487.2',
            x1='41.2',
            y1='313.2',
            line=dict(
                width=1,
                color='rgba(255, 0, 0, 1)'
            )
)

rink_shapes.append(red_spot1_circle_shape)
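The corner coordinates of a circle shape can be derived from the circle's centre and radius in feet. In the sketch below, the centre (22 ft from the long axis of the rink and 69 ft from the center line) is inferred from the coordinates used above rather than stated in the post, and circle_bbox is a hypothetical helper.

```python
X_UNITS_PER_FT = 500.0 / 85.0   # x-axis: 500 units across 85 ft
Y_UNITS_PER_FT = 580.0 / 100.0  # y-axis: 580 units across 100 ft


def circle_bbox(cx_ft, cy_ft, radius_ft):
    """Return (x0, y0, x1, y1) in chart units for a circle given in feet."""
    cx = cx_ft * X_UNITS_PER_FT
    cy = cy_ft * Y_UNITS_PER_FT
    rx = radius_ft * X_UNITS_PER_FT
    ry = radius_ft * Y_UNITS_PER_FT
    return (cx + rx, cy + ry, cx - rx, cy - ry)


# End-zone face-off circle: assumed centre (22 ft, 69 ft), radius 15 ft
x0, y0, x1, y1 = circle_bbox(22, 69, 15)
```

Rounded to one decimal place, these values agree with the x0, y0, x1 and y1 used for the face-off circle above.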

Face-off lines

At the outer edge of both sides of each face-off circle and parallel to the goal line shall be marked two red lines, 2 inches wide and 2 ft in length and 5 ft 7 inches apart.

parallel_line1_shape = dict(
            type='line',
            xref='x',
            yref='y',
            x0='230',
            y0='416.4',
            x1='217.8',
            y1='416.4',
            line=dict(
                color='rgba(255, 0, 0, 1)',
                width=1
            )
)

parallel_line2_shape = dict(
            type='line',
            xref='x',
            yref='y',
            x0='230',
            y0='384',
            x1='217.8',
            y1='384',
            line=dict(
                color='rgba(255, 0, 0, 1)',
                width=1
            )
)

rink_shapes.append(parallel_line1_shape)
rink_shapes.append(parallel_line2_shape)

The other two red lines will be the mirror images of these lines across the Y-axis.

Here is the line configuration near face-off spots.

These four line shapes approximate the line configuration around a face-off spot.

faceoff_line1_shape = dict(
            type='line',
            xref='x',
            yref='y',
            x0='141.17',
            y0='423.4',
            x1='141.17',
            y1='377',
            line=dict(
                color='rgba(10, 10, 100, 1)',
                width=1
            )
)

faceoff_line2_shape = dict(
            type='line',
            xref='x',
            yref='y',
            x0='117.62',
            y0='423.4',
            x1='117.62',
            y1='377',
            line=dict(
                color='rgba(10, 10, 100, 1)',
                width=1
            )
)

faceoff_line3_shape = dict(
            type='line',
            xref='x',
            yref='y',
            x0='153',
            y0='406',
            x1='105.8',
            y1='406',
            line=dict(
                color='rgba(10, 10, 100, 1)',
                width=1
            )
)

faceoff_line4_shape = dict(
            type='line',
            xref='x',
            yref='y',
            x0='153',
            y0='394.4',
            x1='105.8',
            y1='394.4',
            line=dict(
                color='rgba(10, 10, 100, 1)',
                width=1
            )
)

rink_shapes.append(faceoff_line1_shape)
rink_shapes.append(faceoff_line2_shape)
rink_shapes.append(faceoff_line3_shape)
rink_shapes.append(faceoff_line4_shape)

Goal Crease

goal_line1_shape = dict(
            type='line',
            xref='x',
            yref='y',
            x0='64.7',
            y0='516.2',
            x1='82.3',
            y1='580',
            line=dict(
                width=1
            )
)

goal_line2_shape = dict(
            type='line',
            xref='x',
            yref='y',
            x0='23.5',
            y0='516.2',
            x1='23.5',
            y1='493',
            line=dict(
                width=1
            )
)

# the mirror images of "goal_line1" and "goal_line2" across the Y-axis are drawn the same way

goal_arc1_shape = dict(
            type='path',
            xref='x',
            yref='y',
            path='M 23.5 493 C 20 480, -20 480, -23.5 493',
            line=dict(
                width=1,
            )
)

goal_arc2_shape = dict(
            type='path',
            xref='x',
            yref='y',
            path='M 17.6 516.2 C 15 530, -15 530, -17.6 516.2',
            line=dict(
                width=1,
            )
)

rink_shapes.append(goal_line1_shape)
rink_shapes.append(goal_line2_shape)
rink_shapes.append(goal_arc1_shape)
rink_shapes.append(goal_arc2_shape)

Referee Crease

On the ice immediately in front of the Penalty Timekeeper’s seat there shall be marked in red on the ice a semi-circle of 10 ft radius and 2 inches in width, known as the REFEREE’s CREASE.

referee_crease_shape = dict(
            type='path',
            xref='x',
            yref='y',
            path='M ',
            line=dict(
                width=1,
                color='rgba(255, 0, 0, 1)'
            )
)

rink_shapes.append(referee_crease_shape)

Here is the resultant “Ice Hockey Rink Outline” diagram.

Point Chart for Shots

We will start by importing the JSON dataset into a pandas DataFrame object.

import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode()

import pandas as pd
df = pd.read_json('data.json')

For a point chart, we will use a Scatter trace.

point_trace = go.Scatter(
    x = df['x'] - 250,
    y = 580 - df['y'],
    mode = 'markers',
    marker = dict(
        size = 4
    )
)

data = [point_trace]

layout = go.Layout(
    shapes=rink_shapes
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

Heatmap Chart for Shots

point_trace = go.Scatter(
    x = df['x'] - 250,
    y = 580 - df['y'],
    mode = 'markers',
    marker = dict(
        size = 4
    )
)

heatmap_trace = go.Histogram2dcontour(
    x=df['x'] - 250, y=580 - df['y'], name='density', ncontours=2,
    colorscale='Hot', reversescale=True, showscale=False,
    contours=dict(coloring='heatmap')
)

data = [point_trace, heatmap_trace]

layout = go.Layout(
    shapes=rink_shapes
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

↧

Animations in R using Plotly

January 1, 2017, 8:07 am

Like last year, let’s have some fun with the Plotly package. We’ll try out Plotly’s new animation capabilities.

library(plotly)
rm(list = ls())
gc()

# Options for plotting ----
x <- 0.2
y <- 0.72
speed <- 250
nbkdrops <- 100

# Colorset for plot
# See http://colorhunt.co/
cols <- c("#FFC85B", "#379956","#234C63")
ncolors <- length(cols)


# Function to create random points by adding jitter to ----
# a starting set of points
n <- 1000  # Number of points

# Starting template
bkdrop.x <- runif(n, min = 0, max = 1)
bkdrop.y <- runif(n, min = 0, max = 1)

# Function Definition
bkdrop <- function(n = 1000, amount = 0.005){
  
  x <- jitter(bkdrop.x, amount = amount)
  y <- jitter(bkdrop.y, amount = amount)
  
  df <- data.frame(x, y)
  
  return(df)
  
}

# Make backdrops ----
# Each call to the backdrop function is a separate frame
# Number of frames is controlled by nbkdrops

bkdrop.df <- data.frame()
for(i in 1:nbkdrops){
  temp <- bkdrop()
  temp <- data.frame(temp, frame = i, color = sample(1:ncolors, size = nrow(temp), replace = T))
  bkdrop.df <- rbind(bkdrop.df, temp)
  
}

# Make back lights ----
# Coordinates for backlight rectangles
# Will be plotted as line segments
bklight.x <- c(0.28, 0.18, 0.48)
bklight.y <- c(0.42, 0.62, 0.65)
bklight.xend <- c(0.63, 0.50, 0.75)
bklight.yend <- c(0.42, 0.62, 0.65)

# Function to create a dataframe containing coordinates, frame and
# color of each backlight segment
makebklight <- function(id){
  bklight <- data.frame()
  
  for(i in 1:nbkdrops){
    temp <- data.frame(x = bklight.x[id],
                       y = bklight.y[id],
                       xend = bklight.xend[id],
                       yend = bklight.yend[id],
                       frame = i, 
                       color = sample(1:ncolors, size = 1))
    
    bklight <- rbind(bklight, temp)
  }
  
  return(bklight)
}

# Create backlight segments
bklight1 <- makebklight(1)
bklight2 <- makebklight(2)
bklight3 <- makebklight(3)

# Initialize colors for first frame
bklight1$color[1] <- 1
bklight2$color[1] <- 2
bklight3$color[1] <- 3

# Plot !! ----
p <- plot_ly(height = 800, width = 1024, 
             colors = cols, 
             frame = ~frame,
             x = ~x, 
             y = ~y,
             color = ~factor(color)) %>%  
  
  # Backdrop
  add_markers(data = bkdrop.df, 
              opacity = 0.8,
              marker = list(symbol = "star", size = 8),
              hoverinfo = "none") %>%
  
  # Add segments (for back lighting)
  add_segments(data = bklight1, 
               xend = ~xend, yend = ~yend, 
               line = list(width = 150)) %>%
  
  add_segments(data = bklight2, 
               xend = ~xend, yend = ~yend, 
               line = list(width = 150)) %>% 
  
  add_segments(data = bklight3, 
               xend = ~xend, yend = ~yend, 
               line = list(width = 150)) %>% 
  
  # Animation options
  # See https://cpsievert.github.io/plotly_book/key-frame-animations.html
  
  animation_opts(speed, easing = "linear", transition = 0) %>%
  animation_button(x = 1, xanchor = "right", y = 1, yanchor = "bottom") %>%
  animation_slider(hide = T) %>%
  
  # Layout, annotations and shapes
  
  layout(
    showlegend = F,
    
    xaxis = list(title = "", showgrid = F, zeroline = F, showticklabels = F, range = c(0, 1)),
    yaxis = list(title = "", showgrid = F, zeroline = F, showticklabels = F, range = c(0, 1)),
    
    annotations = list(
      
      # For shadow
      list(xref = "paper", yref = "paper",
           xanchor = "left", yanchor = "top",
           x = x + 0.002, y = y + 0.002, 
           showarrow = F,
           text = "Happy New<br>Year !",
           font = list(size = 100, family = "Times New Roman",
                       color = "black")),
      
      list(xref = "paper", yref = "paper",
           xanchor = "left", yanchor = "top",
           x = x + 0.003, y = y + 0.003, 
           showarrow = F,
           text = "Happy New<br>Year !",
           font = list(size = 100, family = "Times New Roman",
                       color = "black")),
      
      list(xref = "paper", yref = "paper",
           xanchor = "left", yanchor = "top",
           x = x + 0.004, y = y + 0.004, 
           showarrow = F,
           text = "Happy New<br>Year !",
           font = list(size = 100, family = "Times New Roman",
                       color = "black")),
      
      # Actual
      list(xref = "paper", yref = "paper",
           xanchor = "left", yanchor = "top",
           x = x, y = y, 
           showarrow = F,
           text = "Happy New<br>Year !",
           font = list(size = 100, family = "Times New Roman",
                       color = "#ff6666"))
      ),
    
    shapes = list(
      
      # Border
      list(xref = "paper", yref = "paper",
           x0 = 0, y0 = 0, 
           x1 = 1, y1 = 1,
           type = "rect",
           line = list(width = 10, color = cols[1])),
      
      list(xref = "paper", yref = "paper",
           x0 = 0.01, y0 = 0.01, 
           x1 = 0.99, y1 = 0.99,
           type = "rect",
           line = list(width = 10, color = cols[2])),
      
      list(xref = "paper", yref = "paper",
           x0 = 0.02, y0 = 0.02, 
           x1 = 0.98, y1 = 0.98,
           type = "rect",
           line = list(width = 10, color = cols[3])),
      
      # Black outline
      list(xref = "plot", yref = "plot",
           path = "
           M 0.50 0.53
           L 0.50 0.50
           L 0.18 0.50 
           L 0.18 0.73
           L 0.48, 0.73",
           type = "path",
           line = list(width = 7, color = "black")),
      
      list(xref = "plot", yref = "plot",
           path = "
           M 0.50 0.535
           L 0.48 0.535
           L 0.48 0.77
           L 0.75 0.77
           L 0.75 0.535
           Z",
           type = "path",
           line = list(width = 7, color = "black")),
      
      list(xref = "plot", yref = "plot",
           path = "
           M 0.28 0.5
           L 0.28 0.31
           L 0.63 0.31
           L 0.63 0.535",
           type = "path",
           line = list(width = 7, color = "black"))

    )
  )

print(p)

You should now have something like this:

For more details visit:
Plotly for R by Carson Sievert.

↧

Funnel charts in Python using Plotly

January 2, 2017, 6:50 pm

Funnel Charts are often used to represent data in different stages of a business process. It’s an important mechanism in Business Intelligence to identify potential problem areas of a process. For example, it’s used to observe the revenue or loss in a sales process for each stage.

In this post, we’ll learn how to plot a funnel chart using a numerical dataset.

We are going to use a sample dataset from a dummy E-commerce firm’s social media campaign. The funnel chart will represent the flow of new users at different stages of the campaign.

These are the five stages of user flow:

Link Visit : When a user clicks on the campaign link
Sign-up : When a user creates an account
Selection : When a user adds a product to the cart
Purchase : When a user buys a product
Review : When a user reviews a purchased product

Here is the table (dataset) containing values (number of users) for all the intermediate phases.

Phases	Values
Visit	13873
Sign-up	10553
Selection	5443
Purchase	3703
Review	1708

Let’s represent the data in Python lists.

# __future__ imports must come before any other statement
from __future__ import division

import plotly.plotly as py
import plotly.graph_objs as go

# campaign data
phases = ['Visit', 'Sign-up', 'Selection', 'Purchase', 'Review']
values = [13873, 10553, 5443, 3703, 1708]

colors = ['rgb(32,155,160)', 'rgb(253,93,124)', 'rgb(28,119,139)', 'rgb(182,231,235)', 'rgb(35,154,160)']

We will use Plotly shapes to draw the sections of a funnel. Each funnel section will be represented by a Quadrilateral (4 sided polygon).

A section will be a Rectangle if its value is equal to the next phase’s value, or an Isosceles Trapezoid (Isosceles Trapezium in British English) if its value differs from the next phase’s value.

We are using a fixed width for the plot and the section (phase) having the maximum users (value). All other sections will be drawn according to their values relative to the maximum value.

n_phase = len(phases)

# the fixed width for the plot
plot_width = 400

# height of a section and gap between sections
section_h = 100
section_d = 10

# multiply factor to calculate the width of other sections
unit_width = plot_width / max(values)

# width for all the sections (phases)
phase_w = [int(value * unit_width) for value in values]
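As a quick check of the scaling, the sketch below recomputes the section widths for the campaign data. It multiplies before dividing so that the widest section comes out at exactly plot_width; that ordering is a choice made for this example and is not how the code above is written.

```python
# Campaign data and plot width, repeated so that this block runs on its own
values = [13873, 10553, 5443, 3703, 1708]
plot_width = 400

# Multiply first, then divide, then truncate to whole pixels
phase_w = [int(value * plot_width / float(max(values))) for value in values]

print(phase_w)  # [400, 304, 156, 106, 49]
```

The widest phase (Visit) fills the full 400px and every other phase is drawn in proportion to it.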

Each section will have a height of 100px and there will be a gap of 10px between successive sections.

To draw a section, we are going to use SVG paths.

height = section_h * n_phase + section_d * (n_phase-1)

shapes = []

label_y = []

for i in range(n_phase):
        if (i == n_phase-1):
                points = [phase_w[i]/2, height, phase_w[i]/2, height - section_h]
        else:
                points = [phase_w[i]/2, height, phase_w[i+1]/2, height - section_h]

        path = 'M {0} {1} L {2} {3} L -{2} {3} L -{0} {1} Z'.format(*points)

        shape = {
                'type': 'path',
                'path': path,
                'fillcolor': colors[i],
                'line': {
                    'width': 1,
                    'color': colors[i]
                }
        }
        shapes.append(shape)
        
        # Y-axis location for this section's details (phase name and value)
        label_y.append(height - (section_h / 2))

        height = height - (section_h + section_d)
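To make the path format easier to follow, here is the same template applied to a single section in isolation. This is a sketch only: section_path is a hypothetical helper, and the numbers are the top section of the funnel (400px wide at the top, 304px at the bottom, starting at y = 540).

```python
def section_path(top_width, bottom_width, top_y, section_height):
    """Build the SVG path of one funnel section, centred on x = 0."""
    half_top = top_width / 2.0
    half_bottom = bottom_width / 2.0
    bottom_y = top_y - section_height
    # Right-top -> right-bottom -> left-bottom -> left-top -> close
    return 'M {0} {1} L {2} {3} L -{2} {3} L -{0} {1} Z'.format(
        half_top, top_y, half_bottom, bottom_y)


# Total height: 5 sections of 100px plus 4 gaps of 10px = 540
first_section = section_path(400, 304, 540, 100)

print(first_section)  # M 200.0 540 L 152.0 440 L -152.0 440 L -200.0 540 Z
```

The path traces the right edge downwards and the left edge back up, which gives a trapezoid when the two widths differ and a rectangle when they are equal.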

We will use text mode to draw the name of phase and its value.

# For phase names
label_trace = go.Scatter(
    x=[-350]*n_phase,
    y=label_y,
    mode='text',
    text=phases,
    textfont=dict(
        color='rgb(200,200,200)',
        size=15
    )
)

# For phase values
value_trace = go.Scatter(
    x=[350]*n_phase,
    y=label_y,
    mode='text',
    text=values,
    textfont=dict(
        color='rgb(200,200,200)',
        size=15
    )
)

We will style the plot by changing the background color of the plot and the plot paper, hiding the legend and tick labels, and removing the zeroline.

data = [label_trace, value_trace]

layout = go.Layout(
    title='Funnel Chart',
    shapes=shapes,
    height=560,
    width=800,
    showlegend=False,
    paper_bgcolor='rgba(44,58,71,1)',
    plot_bgcolor='rgba(44,58,71,1)',
    xaxis=dict(
        showticklabels=False,
        zeroline=False,
    ),
    yaxis=dict(
        showticklabels=False,
        zeroline=False
    )
)

fig = go.Figure(data=data, layout=layout)
py.plot(fig)

We can observe that the number of users is decreasing at each stage.

At the end of the funnel, we can see that 1,708 users have reviewed their purchase. The E-commerce firm can analyze their reviews and work on creating a better experience for them in the future.

That’s one example use case of funnel charts. You can create them to monitor your own business processes.

↧

Heatmaps with padding gaps in Plotly

January 2, 2017, 9:51 pm

This post will introduce you to xgap and ygap fields for Plotly Heatmaps.

You can set the horizontal and vertical gaps (in pixels) between the heatmap bricks using these fields.

We will create two different plots, one with the padding and another without it.
The plots will show events per weekday and time of day.

import plotly.plotly as py
import plotly.graph_objs as go

from random import randint

We will use randomly generated data in the plots.

The variable hours represents the hours in a day.

hours = ['00','01','02','03','04','05','06','07','08','09','10', '11','12','13','14','15','16','17','18','19','20','21','22','23']

The variable days represents all the days in a week.

days = ['Saturday','Friday','Thursday','Wednesday','Tuesday','Monday','Sunday']

Using the randint function, we will generate the events in the range from 1000 to 1800.

events = [[randint(1000, 1800) for j in range(24)] for i in range(7)]
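Because the data is random, the heatmaps will look different on every run. If you want the two plots to be reproducible, one option (not part of the original code) is to seed the generator before creating the events. The seed value below is arbitrary.

```python
import random

# Arbitrary seed, chosen here only for illustration
random.seed(2017)

# 7 rows (days) x 24 columns (hours); randint includes both end points
events = [[random.randint(1000, 1800) for j in range(24)] for i in range(7)]
```

With a fixed seed, the padded and unpadded heatmaps can be regenerated later from identical data.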

Heatmap without padding

data = [go.Heatmap(
  z = events,
  y = days,
  x = hours,
  colorscale = 'Viridis'
)]

layout = go.Layout(
  title = 'Events per weekday & time of day',
  xaxis = dict(
    tickmode = 'linear'
  )
)

fig = go.Figure(data=data, layout=layout)

py.plot(fig, filename='heatmap-without-padding')

Heatmap with padding

We are setting both horizontal and vertical padding of 5 pixels.

data = [go.Heatmap(
  z = events,
  y = days,
  x = hours,
  xgap = 5,
  ygap = 5,
  colorscale = 'Viridis'
)]

layout = go.Layout(
  title = 'Events per weekday & time of day',
  xaxis = dict(
    tickmode = 'linear'
  )
)

fig = go.Figure(data=data, layout=layout)

py.plot(fig, filename='heatmap-with-padding')

↧

Segmented Funnel charts in Python using Plotly

January 3, 2017, 9:37 am

≫ Next: 7 Interactive Plots from the Pharmaceutical Industry

≪ Previous: Heatmaps with padding gaps in Plotly

Funnel Charts are often used to represent data in different stages of a business process. You can learn more about them in our previous post, Funnel charts in Python using Plotly.

In this post, we will learn about creating Segmented Funnel Charts.

Unlike plain funnel charts, which have a single data source, segmented funnel charts combine multiple data sources.

We are going to use a sample dataset from a dummy E-commerce firm’s quarterly product sales. The funnel chart will represent the users at different stages of the process. We can also inspect the number of users contributed by different segments (channels).

Here is the dataset for this post, segment-funnel-dataset.csv.

IPython Notebook for the source code is available here.

	Ad	Media	Affiliates	Referrals	Direct
Visit	9806	13105	6505	2517	24321
Sign-up	3065	6096	3011	1710	11453
Selection	1765	3592	2234	1555	8603
Purchase	1507	2403	1610	1005	5798

# __future__ imports must come first when this code is run as a script
from __future__ import division

import plotly.plotly as py
import plotly.graph_objs as go

# campaign data (download the file mentioned above)
import pandas as pd
df = pd.read_csv('segment-funnel-dataset.csv')

# color for each segment
colors = ['rgb(63,92,128)', 'rgb(90,131,182)', 'rgb(255,255,255)', 'rgb(127,127,127)', 'rgb(84,73,75)']

We can calculate the total number of users in each phase using the DataFrame.iterrows() method.

total = [sum(row[1]) for row in df.iterrows()]

The number of phases and segments can be read from the shape attribute of the DataFrame, which is a tuple of (rows, columns).

n_phase, n_seg = df.shape

The plot has a fixed width, and the width of each phase is scaled by its total number of users relative to the initial phase.

plot_width = 600
unit_width = plot_width / total[0]

phase_w = [int(value * unit_width) for value in total]

# height of a section and difference between sections 
section_h = 100
section_d = 10

# shapes of the plot
shapes = []

# plot traces data
data = []

# height of the phase labels
label_y = []
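To make the width calculation above concrete, here is a small sketch that does not need pandas or Plotly. It assumes Python 3 division and copies the values from the table above into a plain dictionary. The names table and phases are illustrative only. It also multiplies before dividing, a small departure from the original code that keeps the first phase at exactly plot_width.

```python
# Segment values copied from the table above (one list per phase)
table = {
    'Visit':     [9806, 13105, 6505, 2517, 24321],
    'Sign-up':   [3065, 6096, 3011, 1710, 11453],
    'Selection': [1765, 3592, 2234, 1555, 8603],
    'Purchase':  [1507, 2403, 1610, 1005, 5798],
}
phases = ['Visit', 'Sign-up', 'Selection', 'Purchase']

# Total users per phase
total = [sum(table[p]) for p in phases]

plot_width = 600

# Width of each phase, relative to the first (widest) phase
phase_w = [int(value * plot_width / total[0]) for value in total]

for name, users, width in zip(phases, total, phase_w):
    print('{0:<10} users={1:>6} width={2:>4}'.format(name, users, width))
```

With these numbers the totals are 56254, 25335, 17749 and 12323 users, which map to widths of 600, 270, 189 and 131.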

A phase in the chart will be a rectangle made of smaller rectangles representing different segments.

height = section_h * n_phase + section_d * (n_phase-1)

# rows of the DataFrame
df_rows = list(df.iterrows())

# iteration over all the phases
for i in range(n_phase):
    # phase name
    row_name = df.index[i]
    
    # width of each segment (smaller rectangles) will be calculated
    # according to their contribution in the total users of phase
    seg_unit_width = phase_w[i] / total[i]
    seg_w = [int(df_rows[i][1][j] * seg_unit_width) for j in range(n_seg)]
    
    # starting point of segment (the rectangle shape) on the X-axis
    xl = -1 * (phase_w[i] / 2)
    
    # iteration over all the segments
    for j in range(n_seg):
        # name of the segment
        seg_name = df.columns[j]
        
        # corner points of a segment used in the SVG path
        points = [xl, height, xl + seg_w[j], height, xl + seg_w[j], height - section_h, xl, height - section_h]
        path = 'M {0} {1} L {2} {3} L {4} {5} L {6} {7} Z'.format(*points)
        
        shape = {
                'type': 'path',
                'path': path,
                'fillcolor': colors[j],
                'line': {
                    'width': 1,
                    'color': colors[j]
                }
        }
        shapes.append(shape)
        
        # to support hover on shapes
        hover_trace = go.Scatter(
            x=[xl + (seg_w[j] / 2)],
            y=[height - (section_h / 2)],
            mode='markers',
            marker=dict(
                size=min(seg_w[j]/2, (section_h / 2)),
                color='rgba(255,255,255,1)'
            ),
            text="Segment : %s" % (seg_name),
            name="Value : %d" % (df[seg_name][row_name])
        )
        data.append(hover_trace)
        
        xl = xl + seg_w[j]

    label_y.append(height - (section_h / 2))

    height = height - (section_h + section_d)
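The geometry in the loop above can be tried out without Plotly. The following sketch lays out the segments of the first phase ("Visit") and builds one SVG path per segment. The helper name rect_path is hypothetical, the values are copied from the table above, and Python 3 division is assumed.

```python
def rect_path(x_left, y_top, width, height):
    """Return an SVG path string for an axis-aligned rectangle."""
    points = [
        x_left, y_top,                   # top-left
        x_left + width, y_top,           # top-right
        x_left + width, y_top - height,  # bottom-right
        x_left, y_top - height,          # bottom-left
    ]
    return 'M {0} {1} L {2} {3} L {4} {5} L {6} {7} Z'.format(*points)

# "Visit" row from the table above
visit = [9806, 13105, 6505, 2517, 24321]

phase_width = 600  # width of the first phase
section_h = 100    # height of one phase
y_top = 430        # 4 sections of 100 plus 3 gaps of 10

# Each segment's share of the phase width, truncated to whole units
seg_w = [int(v * phase_width / sum(visit)) for v in visit]

# Start at the left edge so that the phase is centered on x = 0
x_left = -phase_width // 2
paths = []
for w in seg_w:
    paths.append(rect_path(x_left, y_top, w, section_h))
    x_left += w  # the next segment starts where this one ends

for p in paths:
    print(p)
```

Because each width is truncated with int(), the segments of a phase can add up to a few units less than the phase width, which leaves a small gap at the right edge.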

We will use text mode to draw the name of each phase and its total value.

# For phase names
label_trace = go.Scatter(
    x=[-350]*n_phase,
    y=label_y,
    mode='text',
    text=df.index.tolist(),
    textfont=dict(
        color='rgb(200,200,200)',
        size=15
    )
)

data.append(label_trace)
 
# For phase values (total)
value_trace = go.Scatter(
    x=[350]*n_phase,
    y=label_y,
    mode='text',
    text=total,
    textfont=dict(
        color='rgb(200,200,200)',
        size=15
    )
)

data.append(value_trace)

We will style the plot by changing the background color of the plot and the plot paper, hiding the legend and tick labels, and removing the zeroline.

layout = go.Layout(
    title="<b>Segmented Funnel Chart</b>",
    titlefont=dict(
        size=20,
        color='rgb(230,230,230)'
    ),
    hovermode='closest',
    shapes=shapes,
    showlegend=False,
    paper_bgcolor='rgba(44,58,71,1)',
    plot_bgcolor='rgba(44,58,71,1)',
    xaxis=dict(
        showticklabels=False,
        zeroline=False,
    ),
    yaxis=dict(
        showticklabels=False,
        zeroline=False
    )
)

fig = go.Figure(data=data, layout=layout)
py.plot(fig)

You can also inspect individual segments by hovering over them.

↧
