Interpreting Q-Q Plot in Linear Regression: Explained

A quantile-quantile (Q-Q) plot is a graphical tool for assessing whether a set of data plausibly came from some theoretical distribution, such as a Normal, exponential, or Uniform distribution. It can also help determine whether two data sets come from populations with a common distribution.

Advantages

  1. Flexible: A two-sample Q-Q plot can be used with samples of different sizes.
  2. Robust: Many distributional aspects like shifts in location, shifts in scale, changes in symmetry, and the presence of outliers can all be detected from this plot.

Scenarios Checked

  1. Common Distribution: Whether two data sets come from populations with a common distribution.
  2. Common Location and Scale: Whether two data sets have the same mean and standard deviation.
  3. Similar Distributional Shapes: Whether two data sets have similar distributional shapes (e.g., symmetric or skewed).
  4. Similar Tail Behavior: Whether two data sets have similar tail behavior (i.e., extreme values).

Interpretation

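A two-sample Q-Q plot is a plot of the quantiles of the first data set against the quantiles of the second data set. In the one-sample case, the sample quantiles are plotted against the quantiles of a theoretical distribution.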

  • Similar Distribution: The points lie on or close to a straight line at an angle of 45 degrees from the x-axis (the line y = x).
  • Different Distribution: The points depart systematically from the straight line at an angle of 45 degrees from the x-axis.
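The comparison against the 45-degree line can be sketched numerically with the Python standard library alone. The helper names qq_points and max_gap below are hypothetical, chosen for illustration, and the samples are simulated rather than taken from any real data set.

```python
import random
import statistics


def qq_points(sample_a, sample_b, n_quantiles=99):
    """Pair the quantiles of two samples at the same probabilities.

    Hypothetical helper: the samples may have different sizes because
    each one is reduced to the same number of quantiles.
    """
    qa = statistics.quantiles(sample_a, n=n_quantiles + 1, method="inclusive")
    qb = statistics.quantiles(sample_b, n=n_quantiles + 1, method="inclusive")
    return list(zip(qa, qb))


def max_gap(points):
    """Largest vertical distance between the Q-Q points and the line y = x."""
    return max(abs(x - y) for x, y in points)


rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(2000)]
b = [rng.gauss(0, 1) for _ in range(500)]        # same distribution, smaller sample
c = [rng.expovariate(1.0) for _ in range(500)]   # different distribution

print("same distribution, max gap from y = x:", max_gap(qq_points(a, b)))
print("different distribution, max gap from y = x:", max_gap(qq_points(a, c)))
```

A small gap suggests the points hug the 45-degree line, while a large gap suggests they leave it. A plot is still the better diagnostic, since it shows where along the line the departure occurs.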

Python

The statsmodels.api module provides qqplot() and qqplot_2samples() to draw Q-Q plots for a single data set and for two data sets, respectively.


Interpreting Shape of QQ Plot of Standardized Residuals

When interpreting the shape of a Q-Q plot of standardized residuals, we can identify several patterns:

  • Fatter Tails: The ends of the line of points turn counter-clockwise relative to the middle, indicating that the tails of your distribution are fatter than those of a true Normal distribution.
  • Mixture of Distributions: One way such a shape can arise is from a mixture of two distributions with the same mean but different standard deviations.

Code Example

The following R code generates a Q-Q plot with this shape:

set.seed(646)                      # this makes the example exactly reproducible
s = 4                              # this is the ratio of SDs
x = c(rnorm(11600, mean=0, sd=1),  # about 96.7% of the data come from the 1st distribution
      rnorm(  400, mean=0, sd=s))  # small fraction (about 3.3%) comes from 2nd dist w/ greater SD
qqnorm(x)                          # a basic qq-plot
qqline(x)                          # reference line through the first and third quartiles
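For readers working in Python, the same mixture can be checked numerically with the standard library. This is a sketch under the same assumptions as the R example (11,600 draws with SD 1 and 400 draws with SD 4). Python's generator will not reproduce R's random draws, so only the overall pattern is comparable.

```python
import random
from statistics import NormalDist, fmean, pstdev, quantiles

rng = random.Random(646)  # seeded for repeatability; will not match R's draws
s = 4                     # ratio of SDs, as in the R example
x = ([rng.gauss(0, 1) for _ in range(11600)] +
     [rng.gauss(0, s) for _ in range(400)])

# Standardize so the sample can be compared with the standard Normal.
mu, sd = fmean(x), pstdev(x)
z = [(v - mu) / sd for v in x]

# 999 cut points; cuts[k - 1] estimates the quantile at probability k/1000.
cuts = quantiles(z, n=1000)
std_normal = NormalDist()

rows = {}
for k in (1, 250, 500, 750, 999):
    sample_q = cuts[k - 1]
    theory_q = std_normal.inv_cdf(k / 1000)
    rows[k] = (sample_q, theory_q)
    print(f"p={k / 1000:.3f}  sample={sample_q:7.3f}  normal={theory_q:7.3f}")
```

In the extreme rows the sample quantiles lie farther from zero than the Normal quantiles, while at the quartiles they lie closer to zero. On a Q-Q plot, that is the pattern in which the ends of the line of points are steeper than the middle.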


The Q-Q plot is a powerful tool for checking which distribution your data plausibly follow. In this article, we discussed how to interpret the Q-Q plot in linear regression and how to recognize differences in distribution, location, scale, shape, and tail behavior. We also showed example R code that generates a Q-Q plot with fatter-than-Normal tails.
