Humanities Data

Providing resources and education around the development and use of data for, in, and about the humanities


Approaching humanities sources as data is a key component of digital humanities work. At the CDH, we focus on critically engaging with the concepts and methods in data science as they apply to the creation and analysis of humanities data.

We grapple with the technical necessity of cleaning, reducing, and normalizing data in ways that don't hide diversity and nuance. How much messiness should we maintain in our data, and how can we do so? How do we want to invite other scholars to view our data: as the polished result of an investigation or argument, or as an experimental lens on our subject matter, a sandbox that we invite others to play in?

These are the questions negotiated by datasets produced at the CDH, each of which imagines different forms of argumentation and different kinds of stories that can be told in humanities scholarship. These are also the questions we encourage others to ask through our programming and resources.

Datasets published by the CDH

Princeton Prosody Archive

Inviting users to rethink poetry's past through a collection of historical prosodic works

Built by CDH

Shakespeare and Company Project

Recreating the world of the Lost Generation in interwar Paris

Built by CDH

Derrida’s Margins

An online research tool for exploring the philosopher’s annotations, offering a behind-the-scenes look at his reading practices and the philosophy of deconstruction

Built by CDH

Princeton Ethiopian Miracles of Mary Project

Folklore in Ethiopian literature and art about how the Virgin Mary helps believers


Resources for finding humanities data

Resource Description

Curated List of Humanities Datasets

CDH-maintained list of humanities datasets

Journal of Open Humanities Data

Peer-reviewed forum for reports on the curation and publication of new datasets

UC Berkeley Library's Text Mining & Computational Text Analysis

Web portal with tutorials, sources, etc.

Jupyter Notebooks for digital humanities

Collection of data analysis notebooks, curated by Quinn Dombrowski

Rutgers University Libraries Datasets

Datasets distinctive or unique to Rutgers

Matthew Lavin's list of datasets

Datasets list

List maintained by Melanie Walsh, including example uses and tutorials for each dataset

Alan Liu's DH Toychest

List of text corpora

Data-focused programming and curriculum

Humanities + Data Science Institute

A five-day intensive faculty seminar to explore the conceptual, practical, and ethical aspects of data science