SHERLOCK HOLMES: The Artometrics of Sherlock Holmes

This report analyzes the TidyTuesday 2025-11-18 release on Sherlock Holmes — 65,958 rows after cleaning and merge. How is word count distributed across the Holmes canon?

Five charts track Word count across time, category, and named entities — trend, leaders, distribution, tiers, and relationships. Where companion files exist in the repo, they are joined before analysis so reception, geography, or metadata columns are not left on the table.

FAST FACTS

65,958Records in the working dataset

12.0Median Word count

18.0Highest observed Word count

The Adventure of the Beryl CTop Book by Word count

DATASET CONTEXT

The source is the TidyTuesday release from 2025-11-18 (R for Data Science community). This working file contains 65,958 rows and 4 columns after merging all available CSV/XLSX tables in the week folder.

Charts are exported as Plotly JSON with PNG fallbacks. Medians are used for robustness where distributions skew. Index-style fields (row numbers, sequential IDs) are excluded from metric selection.

How to read this report: start with the chart caption, then ask what the metric actually means, what a non-expert should notice first, and what an expert would challenge in the source. The goal is not to memorize every number; it is to leave with a sharper question than the one you arrived with.

Reader path: if you are new to the topic, treat each chart as a guided tour of one question: who leads, how concentrated the field is, what changes over time, and where the outliers sit. If you already know the domain, use the same charts as a challenge: check whether the metric is the right proxy, whether the source omits an important population, and whether the headline survives the limitations section.

CHART 1 — BREAKDOWN

Word count by Book

The Yellow Face leads at 13.0; A Case of Identity anchors the low end at 12.0.

Grouping by book exposes how the metric varies across the catalog's major entities.

CHART 2 — LEADERS

The Yellow Face leads at 13.0 — 12.0 marks the median among the top dozen

The Yellow Face leads at 13.0 — 12.0 marks the median among the top dozen.

Head-of-field concentration is where quality, scale, or brand visibly separates from the pack.

CHART 3 — DISTRIBUTION

Word count Distribution

Median 12.0 vs mean 10.9 — the shape is relatively symmetric.

The top decile begins at 14.0; that tail is where defining cases live.

CHART 4 — CONCENTRATION

Cumulative Word count

The top 5 book entries account for 34% of the aggregate word count.

Steep Pareto curves mean a small head drives most of the signal — the long tail is noise until it isn't.

SUPPLEMENT — CONCENTRATION

Cumulative Word count

The top 5 book entries account for 34% of the aggregate word count.

Steep Pareto curves mean a small head drives most of the signal — the long tail is noise until it isn't.

LIMITATIONS

Community-cleaned TidyTuesday snapshots are not live APIs. Missing values, spelling variants, and week-of-export coverage limits apply. Merged tables may fan out or duplicate rows when join keys are imperfect.

Findings describe the file on hand — treat them as structural signals about Sherlock Holmes, not exhaustive truth about the full domain.

CONCLUSION

Read as a teaching map, Sherlock Holmes shows why one metric is rarely enough: leaders, tails, trends, and relationships each answer a different question about word count.

The best reading is modest: use the chart to sharpen the question, then check the source and limits before turning it into a claim.

REFERENCES

Data Science Learning Community. (2025). TidyTuesday: Sherlock Holmes. https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-11-18/holmes.csv

EDITOR'S NOTE

Artometrics data report from the TidyTuesday research pipeline. Charts and aggregates are reproducible from the embedded exhibits and public source files.

View TidyTuesday source on GitHub