Plotnine Grammar of Graphics

  • ID: VISPY-005
  • Type: Lesson
  • Audience: Public
  • Theme: Grammar-of-graphics thinking

Matplotlib and Seaborn are excellent for building figures directly.

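Plotnine is useful when you want to think in layers: you start from the data, then add mappings, geoms, facets, labels, and a theme one piece at a time.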

The point of this lesson is not to adopt a new library. It is to learn a mental model that improves plotting in any library.


Setup

import pandas as pd

from plotnine import (
    ggplot, aes, geom_point, geom_smooth,
    geom_boxplot, facet_wrap, labs,
    theme_light
)

from cdi_viz.theme import cdi_notebook_init, show_and_save_plotnine, cdi_theme_plotnine

# Chapter init (shared counter + plotly template if needed elsewhere)
cdi_notebook_init(chapter="05")

df = pd.read_csv("data/cdi-student-outcomes.csv")
df.head()
     group  test_prep  study_hours  math_score  reading_score  writing_score
0  Group B  completed          3.9          58             64             51
1  Group A       none          7.7          67             85             61
2  Group A       none          9.3          83             65             73
3  Group A       none          3.9          60             67             48
4  Group A       none          8.3          68             63             47

Core idea: mappings vs style

In grammar-of-graphics, you separate:

  • what the data encodes (mappings)
  • what is only styling (fixed colors, sizes, transparency, labels)

Example:

  • mapping: x=study_hours, y=math_score, color=test_prep
  • styling: point transparency, line thickness

Relationship plot (layered)

p = (
    ggplot(df, aes(x="study_hours", y="math_score", color="test_prep"))
    + geom_point(alpha=0.55)
    + geom_smooth(method="lm", se=False)
    + labs(
        title="Study Hours vs Math Score",
        subtitle="Trend shown separately by test preparation",
        x="Study hours per week",
        y="Math score",
        color="Test prep"
    )
    + theme_light(base_size=14)
)

show_and_save_plotnine(p)
Saved PNG → figures/05_001.png

What matters here is the separation:

  • aes(...) describes meaning (what encodes what)
  • geom_* layers add marks and models
  • labels and theme tune readability without changing meaning

Group comparison (summary)

p = (
    ggplot(df, aes(x="test_prep", y="math_score"))
    + geom_boxplot()
    + labs(
        title="Math Score by Test Preparation",
        subtitle="Boxplots summarize differences across groups",
        x="Test preparation",
        y="Math score"
    )
    + theme_light(base_size=14)
)

show_and_save_plotnine(p)
Saved PNG → figures/05_002.png

Boxplots summarize:

  • central tendency (median)
  • spread (IQR)
  • potential outliers
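
To make these three summaries concrete, the sketch below computes them by hand with pandas on a small made-up series. It assumes the conventional 1.5 × IQR rule for flagging outliers; the exact quartile interpolation a plotting library uses may differ slightly from pandas' default.

```python
import pandas as pd

# Made-up scores for illustration; not taken from the lesson's dataset.
scores = pd.Series([47, 55, 58, 60, 62, 65, 67, 70, 98], name="math_score")

# Central tendency and spread.
q1 = scores.quantile(0.25)
median = scores.median()
q3 = scores.quantile(0.75)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box edges are drawn as potential outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]

print(f"median={median}, IQR={iqr}, outliers={outliers.tolist()}")
```

Here the box spans 58 to 67, the median line sits at 62, and only the score of 98 falls outside the fences.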

Faceting: small multiples

Faceting repeats the same plot once per group. This forces disciplined comparison because every panel uses the same encodings and, by default, the same axis scales.

p = (
    ggplot(df, aes(x="study_hours", y="math_score"))
    + geom_point(alpha=0.55)
    + geom_smooth(method="lm", se=False)
    + facet_wrap("~group")
    + labs(
        title="Study Hours vs Math Score by Group",
        subtitle="Same plot repeated by group for disciplined comparison",
        x="Study hours per week",
        y="Math score"
    )
    + theme_light(base_size=14)
)

show_and_save_plotnine(p)
Saved PNG → figures/05_003.png
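
The small-multiples idea is not tied to plotnine. As a rough sketch of the same mental model in Matplotlib, assuming a small made-up DataFrame (`toy`, including a hypothetical "Group C"), each group gets one panel and the axes are shared so the panels stay comparable.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the lesson's data; values are illustrative only.
toy = pd.DataFrame({
    "group": ["Group A", "Group A", "Group B", "Group B", "Group C", "Group C"],
    "study_hours": [3.0, 8.0, 4.0, 9.0, 2.5, 7.5],
    "math_score": [55, 78, 60, 85, 50, 72],
})

groups = sorted(toy["group"].unique())

# One panel per group; shared axes play the role of fixed facet scales.
fig, axes = plt.subplots(1, len(groups), sharex=True, sharey=True, figsize=(9, 3))

for ax, name in zip(axes, groups):
    panel = toy[toy["group"] == name]
    ax.scatter(panel["study_hours"], panel["math_score"], alpha=0.55)
    ax.set_title(name)
    ax.set_xlabel("Study hours per week")

axes[0].set_ylabel("Math score")
fig.tight_layout()
```

The loop does by hand what `facet_wrap` does declaratively: same encodings, same scales, one subset of the data per panel.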


Key Takeaways

  • Grammar-of-graphics is a mental model, not a brand.
  • Separate mappings (meaning) from styling (appearance).
  • Layering lets you modify a plot by adding or swapping layers instead of rebuilding it from scratch.
  • Faceting is one of the cleanest ways to compare groups.