Scatterplot in R

R is a popular programming language and environment for statistical computing and graphics. One of its most useful plots is the scatterplot, which visualizes the relationship between two continuous variables. This article explores several ways to create scatterplots in R, using base graphics, the car package, ggplot2, and 3D plotting packages.

Basic Scatterplot

The basic scatterplot can be created using the plot function.

plot(y ~ x)

This will create a scatterplot of y against x. You can customize the plot by adding labels, titles, and other features.
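As a minimal sketch of such customization, the example below uses simulated data (an assumption made purely for illustration) and standard graphical parameters of plot. The fitted line added with abline is optional.

```r
# Simulated data, assumed here only for illustration
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

plot(y ~ x,
     main = "Basic scatterplot", # Plot title
     xlab = "Variable X",        # X-axis label
     ylab = "Variable Y",        # Y-axis label
     pch = 19,                   # Solid circles
     col = "steelblue")          # Point color

# Optional: overlay the least-squares regression line
abline(lm(y ~ x), col = "red", lwd = 2)
```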

Customizing the Legend

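The scatterplot function of the car package draws an enhanced scatterplot that by default includes marginal boxplots, a regression line, and a smoother. When the points are plotted by groups, you can disable the legend by setting the legend argument to FALSE. Other elements are switched on or off in the same way.

library(car)

scatterplot(x, y,
 legend = FALSE, # Disable legend (only drawn when plotting by groups)
 boxplots = "", # Disable boxplots
 grid = FALSE, # Disable plot grid
 ellipse = TRUE) # Draw ellipses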

There are more arguments you can customize, so remember to run ?scatterplot for additional details.
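The legend only appears when the points are split by groups, so its effect is easiest to see with grouped data. The sketch below assumes the car package is installed and uses the built-in mtcars dataset; the data frame name cars_df is chosen here for illustration.

```r
library(car)

# Copy of mtcars with transmission type (am) as a labelled factor
cars_df <- transform(mtcars,
                     am = factor(am, labels = c("Automatic", "Manual")))

# Points split by group: a legend is drawn by default
scatterplot(mpg ~ wt | am, data = cars_df, smooth = FALSE)

# Same plot with the legend suppressed
scatterplot(mpg ~ wt | am, data = cars_df, smooth = FALSE, legend = FALSE)
```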

Scatterplot Matrix

When working with several variables, it is common to arrange the scatterplots in a matrix in which each variable is plotted against every other, so that pairwise relationships can be compared at a glance. In base R you can create such a scatterplot matrix with the pairs function.

pairs(~disp + wt + mpg + hp, data = mtcars)

In addition, if your dataset contains a grouping variable, you can pass it as a factor to the col argument as follows to plot the groups in different colors.

pairs(~disp + wt + mpg + hp, col = factor(mtcars$am), pch = 19, data = mtcars)
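The pairs function also accepts custom panel functions. As a sketch, the hypothetical helper panel_cor below (not part of base R) prints the correlation coefficient in the upper panels, while the lower panels keep the scatterplots.

```r
# Hypothetical helper: print the correlation coefficient in a panel
panel_cor <- function(x, y, ...) {
  op <- par(usr = c(0, 1, 0, 1)) # Use unit coordinates inside the panel
  on.exit(par(op))               # Restore the previous coordinates on exit
  text(0.5, 0.5, sprintf("%.2f", cor(x, y)), cex = 1.5)
}

vars <- mtcars[, c("disp", "wt", "mpg", "hp")]

pairs(vars,
      upper.panel = panel_cor,  # Correlations above the diagonal
      pch = 19,
      col = factor(mtcars$am))  # Color points by transmission type
```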

An alternative is to use the scatterplotMatrix function of the car package, which adds kernel density estimates on the diagonal.

# install.packages("car")
library(car)

scatterplotMatrix(~ disp + wt + mpg + hp, data = mtcars)

You can customize the colors of the previous plot with the corresponding arguments:

scatterplotMatrix(~ disp + wt + mpg + hp, data = mtcars,
 diagonal = FALSE, # Remove kernel density estimates
 regLine = list(col = "green", # Linear regression line color
                lwd = 3), # Linear regression line width
 smooth = list(col.smooth = "red", # Non-parametric mean color
               col.spread = "blue")) # Non-parametric variance color

Scatterplot with ggplot2

To create a scatterplot with the ggplot2 package, use the geom_point function. To color the points by group, map the grouping variable to the colour aesthetic inside aes.

# install.packages("ggplot2")
library(ggplot2)

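# Example data, simulated here for illustration; replace with your own
set.seed(1)
x <- rnorm(100)
y <- x + rnorm(100)
group <- factor(ifelse(x > 0, "Group 1", "Group 2"))

my_df <- data.frame(x = x, y = y, group = group)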

ggplot(my_df, aes(x = x, y = y)) +
 geom_point(aes(colour = group)) + # Points and color by group
 scale_color_discrete("Groups") + # Change legend title
 xlab("Variable X") + # X-axis label
 ylab("Variable Y") + # Y-axis label
 theme(axis.line = element_line(colour = "black", # Draw black axis lines
                                linewidth = 0.24)) # Line width (replaces the deprecated size argument)
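For a self-contained variant, the sketch below applies the same idea to the built-in mtcars dataset. The data frame name cars_df and the optional per-group trend line added with geom_smooth are choices made here for illustration.

```r
library(ggplot2)

# Copy of mtcars with transmission type (am) as a labelled factor
cars_df <- transform(mtcars,
                     am = factor(am, labels = c("Automatic", "Manual")))

p <- ggplot(cars_df, aes(x = wt, y = mpg, colour = am)) +
  geom_point(size = 2) +                                    # Points colored by group
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # One linear fit per group
  labs(x = "Weight (1000 lbs)",
       y = "Miles per gallon",
       colour = "Transmission")                             # Axis and legend titles

print(p)
```

Mapping colour in the top-level aes means both the points and the fitted lines inherit the grouping.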

3D Scatterplot

With the scatterplot3d and rgl packages, you can create 3D scatterplots in R. The scatterplot3d function creates a static 3D plot of three variables.

# install.packages("scatterplot3d")
library(scatterplot3d)

set.seed(2)
x <- rnorm(1000)
y <- rnorm(1000)
z <- rnorm(1000)

scatterplot3d(x, y, z, pch = 19, color = "blue")
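The object returned by scatterplot3d can be used to add further elements to the plot. As a sketch, the example below uses the built-in mtcars dataset (an assumption made for illustration) and overlays a regression plane with the plane3d function stored in the returned object.

```r
library(scatterplot3d)

s3d <- scatterplot3d(mtcars$wt, mtcars$hp, mtcars$mpg,
                     pch = 19, color = "blue",
                     type = "h", # Vertical lines from each point to the base
                     xlab = "Weight", ylab = "Horsepower", zlab = "MPG")

# Linear model of the z variable on the x and y variables
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Draw the fitted regression plane on the existing plot
s3d$plane3d(fit)
```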

An alternative is to use the plot3d function of the rgl package, which produces an interactive visualization that you can rotate and zoom in or out.

# install.packages("rgl")
library(rgl)

plot3d(x, y, z,
 type = "s", # Type of the plot ("s" draws spheres)
 radius = 0.1, # Radius of the spheres
 col = "lightblue", # Color of the observations
 xlab = "X axis lab", # Label of the X axis
 ylab = "Y axis lab", # Label of the Y axis
 zlab = "Z axis lab") # Label of the Z axis

In conclusion, R provides several ways to create scatterplots, including basic scatterplots, scatterplot matrices, and 3D scatterplots. With packages such as ggplot2 and rgl, you can customize your scatterplots with colors, labels, and other features.