Principal Component Analysis Cluster Plots with Plotly

The Problem

When clustering data on principal component scores, it is often useful to inspect visually how well the data points separate in 2-D space. A scatterplot of the first two components makes this fairly straightforward, but the plot quickly becomes cluttered once every point is annotated, as shown in the following figure:

Solution using `ggrepel`

The ggrepel package by Kamil Slowikowski implements functions to repel overlapping text labels away from each other and away from the data points that they label. It’s an easy-to-use package that works well in this example, as shown in the following figure:

Solution using `plotly`

An alternative solution is to use interactive plots that are usable from the R console, in the RStudio viewer pane, in R Markdown documents, and in Shiny apps. Annotations can be viewed by hovering the mouse pointer over a point or dragging a rectangle around the relevant area to zoom in. Interactive plots using plotly allow you to de-clutter the plotting area, include extra annotation information and create interactive web-based visualizations directly from R. Once uploaded to a plotly account, plotly graphs (and the data behind them) can be viewed and modified in a web browser.

The resulting plot is clean and not cluttered with text annotations. While the ggrepel package provides a nice solution in this example, the plotly solution will be even more useful with a larger number of data points.

The Code

Principal Component Analysis and Hierarchical Clustering

# cor = TRUE indicates that PCA is performed on 
# standardized data (mean = 0, variance = 1)
pcaCars <- princomp(mtcars, cor = TRUE)

# view objects stored in pcaCars
names(pcaCars)

# proportion of variance explained
summary(pcaCars)

# scree plot
plot(pcaCars, type = "l")

# cluster cars
carsHC <- hclust(dist(pcaCars$scores), method = "ward.D2")

# dendrogram
plot(carsHC)

# cut the dendrogram into 3 clusters
carsClusters <- cutree(carsHC, k = 3)

# add cluster to data frame of scores
carsDf <- data.frame(pcaCars$scores, "cluster" = factor(carsClusters))
carsDf <- transform(carsDf, cluster_name = paste("Cluster", carsClusters))
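
The figures reported by `summary(pcaCars)` and the sizes of the three clusters can also be checked by hand. The following is a minimal sketch that rebuilds the PCA so it runs on its own; the names `pcVar` and `propVar` are introduced here for illustration and are not part of the original code.

```r
# Rebuild the PCA so this block runs on its own
pcaCars <- princomp(mtcars, cor = TRUE)

# Each component's variance is its squared standard deviation
pcVar <- pcaCars$sdev^2      # illustrative name
propVar <- pcVar / sum(pcVar) # illustrative name

# Proportion and cumulative proportion of variance, first three components
round(rbind(proportion = propVar, cumulative = cumsum(propVar))[, 1:3], 3)

# Number of cars assigned to each of the 3 clusters
carsClusters <- cutree(hclust(dist(pcaCars$scores), method = "ward.D2"), k = 3)
table(carsClusters)
```

Because the PCA is run on the correlation matrix, the component variances sum to the number of variables, so each proportion is simply a variance divided by that total.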

First figure using `ggplot2`

library(ggplot2)
p1 <- ggplot(carsDf, aes(x = Comp.1, y = Comp.2)) +
      theme_classic() +
      geom_hline(yintercept = 0, color = "gray70") +
      geom_vline(xintercept = 0, color = "gray70") +
      geom_point(aes(color = cluster), alpha = 0.55, size = 3) +
      xlab("PC1") +
      ylab("PC2") + 
      xlim(-5, 6) + 
      ggtitle("PCA Clusters from Hierarchical Clustering of Cars Data") 

p1 + geom_text(aes(y = Comp.2 + 0.25, label = rownames(carsDf)))

Second figure using `ggplot2` with `ggrepel`

library(ggplot2)
library(ggrepel)

p1 + geom_text_repel(aes(y = Comp.2 + 0.25, label = rownames(carsDf)))

Interactive plot using `plotly`

library(plotly)
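# current plotly versions map data frame columns with the formula (~) syntax
# and expect an explicit trace type
p <- plot_ly(carsDf, x = ~Comp.1, y = ~Comp.2, text = rownames(carsDf),
             type = "scatter", mode = "markers",
             color = ~cluster_name, marker = list(size = 11))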

p <- layout(p, title = "PCA Clusters from Hierarchical Clustering of Cars Data", 
       xaxis = list(title = "PC 1"),
       yaxis = list(title = "PC 2"))

p
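
The hover text is also where extra annotation information can go. The sketch below is one possible way to do that, assuming a recent plotly version; the `hover` column, the choice of `mpg` and `hp`, and the name `pHover` are illustrative and not part of the original code.

```r
library(plotly)

# Rebuild the scores and clusters so this block runs on its own
pcaCars <- princomp(mtcars, cor = TRUE)
carsClusters <- cutree(hclust(dist(pcaCars$scores), method = "ward.D2"), k = 3)
carsDf <- data.frame(pcaCars$scores,
                     cluster_name = paste("Cluster", carsClusters))

# Illustrative hover label: car name, cluster, and two original variables
# (<br> starts a new line inside the plotly tooltip)
carsDf$hover <- paste0(rownames(mtcars),
                       "<br>", carsDf$cluster_name,
                       "<br>mpg: ", mtcars$mpg,
                       "<br>hp: ", mtcars$hp)

# hoverinfo = "text" shows only the custom label instead of the x/y values
pHover <- plot_ly(carsDf, x = ~Comp.1, y = ~Comp.2,
                  type = "scatter", mode = "markers",
                  color = ~cluster_name, marker = list(size = 11),
                  text = ~hover, hoverinfo = "text")
pHover
```

Keeping the label in its own column means the tooltip can be extended without touching the plotting call.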

References

PCA with R by Gaston Sanchez
