When to Use Bar Chart vs Scatter: Understanding Data Visualization

In this era of information abundance, data visualization is designed to facilitate knowledge transfer between creators and receivers. Therefore, it is crucial for dashboard creators to understand which chart aligns with the key delivery objectives. Moreover, having a basic understanding of each chart's underlying meaning helps the audience interpret dashboards effectively.

In this article, I will introduce a way to better understand some common charts and graphs by grouping them into four main types: distribution, comparison, composition, and correlation. This is not a clear-cut rule or a rigid boundary that limits each chart to a single use. Rather, it is drawn from my experience of what each chart is able to communicate.

Distribution

This type of data visualization helps interpret univariate analysis results in the early analytical stage. Simply put, it shows where data points are dense and where they are sparse along one dimension. Distribution charts are also widely applied in market research, for example in demographic analysis and customer segmentation. Common charts in this category include histograms, box plots, and maps.

Histogram

Histograms look very similar to bar charts because they are also composed of bars. However, instead of comparing categorical data, a histogram breaks a numeric variable down into interval groups (bins) and shows how many data points fall into each group. Histograms are commonly used to gain insights about customers, such as Pinterest using them to show the age distribution of its audience. They are good at revealing the shape of a distribution along a numeric spectrum.
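To make the binning step concrete, here is a minimal sketch in plain Python. The function name `histogram_counts`, the bin width of 10, and the list of ages are all invented for illustration; they do not come from any real audience data.

```python
def histogram_counts(values, bin_width, start=0):
    """Count how many values fall into each [lower, lower + bin_width) interval."""
    counts = {}
    for v in values:
        # Lower edge of the interval that contains v
        lower = start + ((v - start) // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical ages, made up for this example
ages = [18, 22, 25, 27, 31, 34, 35, 38, 42, 47, 51, 63]
counts = histogram_counts(ages, bin_width=10, start=10)

# Print a rough text histogram: one '#' per data point in the bin
for lower, n in counts.items():
    print(f"{lower}-{lower + 9}: {'#' * n}")
```

Each bar's height is simply the count for its interval, so changing `bin_width` changes how coarse or fine the picture of the distribution looks.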

Map

Maps are frequently used to show demographic data by linking it to geospatial data and indicating where audiences or customers are located. The logic behind map charts is that numeric values are aggregated by a geospatial attribute, such as a country, state, or city. The map below shows not only the variation within each individual subject group but also the variation among the critical reading, mathematics, and writing groups.
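The aggregation behind a map chart can be sketched as a simple group-by. This is only an illustration: the helper `mean_by_region`, the state codes, and the scores are hypothetical, and a real map would additionally need the geographic shapes to color.

```python
from collections import defaultdict


def mean_by_region(records):
    """Average a numeric value per geospatial key from (region, value) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for region, value in records:
        totals[region] += value
        counts[region] += 1
    return {region: totals[region] / counts[region] for region in totals}


# Hypothetical (state, score) records, made up for this example
records = [("CA", 500), ("CA", 520), ("NY", 480), ("TX", 510), ("NY", 500)]
averages = mean_by_region(records)

for state, avg in sorted(averages.items()):
    print(state, avg)
```

Each region ends up with one aggregated number, which is what the map then encodes as a color or a symbol size.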

Composition

Charts in this category illustrate part-to-whole relationships: they show how the components of a total add up. Pie charts, stacked bar charts, and area charts are common examples.

Pie Chart

Pie charts represent the percentage or weight of each component of one categorical attribute. The size of each slice is proportional to its percentage, so the chart intuitively shows how much of the whole each component occupies.
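The proportionality can be written out in a few lines. In this sketch, the function name `slice_shares` and the sales figures are assumptions made for illustration.

```python
def slice_shares(amounts):
    """Return each component's percentage of the total and its slice angle in degrees."""
    total = sum(amounts.values())
    return {
        name: {"percent": 100 * value / total, "degrees": 360 * value / total}
        for name, value in amounts.items()
    }


# Hypothetical sales per product category, made up for this example
sales = {"Furniture": 20, "Office Supplies": 30, "Technology": 50}
shares = slice_shares(sales)

for name, s in shares.items():
    print(f"{name}: {s['percent']:.1f}% of the pie, {s['degrees']:.0f} degrees")
```

Because the percentages always sum to 100, a pie chart only makes sense when the components really do add up to one meaningful whole.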

Stacked Bar Chart

Stacked bar charts are used when we need to break a primary category down by a secondary category. As the chart below shows, it closely resembles a regular bar chart. Horizontally, it compares the performance of each market. Vertically, it shows the composition of segments within each market.

Area Chart

Area charts plot a measure, broken down by a categorical dimension, against a date-type variable. The chart below shows how the profit of each product category fluctuates over time. Unlike a line chart, a stacked area chart accumulates the measures and stacks them from bottom to top. It can therefore illustrate how each category contributes to the whole throughout the timeline.
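The bottom-to-top accumulation described above can be sketched as follows. The function `stack_layers`, the category names, and the profit values are hypothetical, and the sketch assumes non-negative values, since stacking negative numbers is harder to read.

```python
def stack_layers(series):
    """For each category, return the (bottom, top) edges of its band per time period.

    `series` maps a category name to a list of non-negative values, one per period.
    Categories are stacked in the order they appear in the dict.
    """
    n_periods = len(next(iter(series.values())))
    baseline = [0] * n_periods
    bands = {}
    for name, values in series.items():
        top = [b + v for b, v in zip(baseline, values)]
        bands[name] = list(zip(baseline, top))
        baseline = top  # the next category starts where this one ended
    return bands


# Hypothetical profit per category over three periods, made up for this example
profit = {
    "Furniture": [1, 2, 3],
    "Office Supplies": [2, 2, 2],
    "Technology": [3, 4, 5],
}
bands = stack_layers(profit)

for name, edges in bands.items():
    print(name, edges)
```

The top edge of the last band equals the total for each period, which is why the outline of a stacked area chart also shows how the whole evolves over time.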

Correlation

Charts in this category help discover whether one or more pairs of variables are related. They indicate an association between variables rather than a causal relationship, that is, one where changes in variable x cause changes in variable y. Scatter plots and heatmaps are great tools to depict correlation.

Scatter Plot

Scatter plots plot one numeric attribute against another and visualize the correlation between the two axes. They are commonly used to spot relationships that a regression model, such as linear regression, could then describe. They also give a quick visual sense of how strong a correlation is: the relationship is stronger when the data points cluster tightly around a line or curve, and weaker when they are scattered widely with no clear pattern.
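One common way to put a number on what a scatter plot shows is the Pearson correlation coefficient, which measures linear association only. The sketch below computes it with the standard library; the function name `pearson_r` and both toy data sets are assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


# Hypothetical data, made up for this example
x = [1, 2, 3, 4, 5]
tight = [2, 4, 6, 8, 10]   # points lie exactly on a line
loose = [5, 1, 4, 2, 6]    # points are scattered with little pattern

r_tight = pearson_r(x, tight)
r_loose = pearson_r(x, loose)
print(f"tight: r = {r_tight:.2f}")
print(f"loose: r = {r_loose:.2f}")
```

A coefficient near 1 or -1 corresponds to points hugging a straight line, while a value near 0 corresponds to a diffuse cloud. A curved but tight pattern can still yield a low coefficient, so it is worth looking at the plot as well as the number.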

Heatmap

Heatmaps are commonly used as a visual representation of correlation matrices. They are a powerful way to find correlated attributes, for example before applying principal component analysis (PCA). By using a gradient color code, we can directly see which attribute pairs are strongly correlated. In the heatmap below, highly positively correlated attributes are shown in darker blue.

In conclusion, understanding data visualization is crucial for dashboard creators and their audience. By categorizing charts into four main types (distribution, comparison, composition, and correlation), we can better comprehend each chart's underlying meaning.