Graduate Fellows Scan, Model, and Map: New Discoveries from Sermons to Ballet
3 June 2026
Digital collage featuring illustrative elements of fellows' research. Credit: Carrie Ruddick
This spring, six CDH Graduate Fellows arrived with their research in progress, asking six different questions across multiple disciplines ranging from History and Comparative Literature to Music and English. Their findings? Working with data and computational methods rarely unfolded the way they expected, and while oftentimes arduous, the labor uncovered “strange and beautiful” discoveries.
"We are receiving more competitive applications to the Graduate Fellowship than ever before," said Grant Wythoff, who directs CDH graduate student programs. "Some of these emerging scholars bring knowledge of data curation standards and machine learning methods. Others are tuned into the latest debates on AI's political and epistemological impacts. The mix of voices makes for an incredibly exciting group dynamic."
Cecelia Ramsey (French and Italian) came to the CDH with a question about literary afterlives: what makes a book experience a revival many years after its initial release? To study this at scale, she worked with BiblioBase, a database of the nineteenth-century Bibliographie de la France, tracking gaps between editions and reeditions and looking for patterns in a book's reintroduction.
The data was messy—inconsistent titles, variable spellings of authors' names—and rather than cleaning the inconsistencies away, Cecelia explored them as an opportunity to learn more about the nature of reeditions and the format of the Bibliographie itself. "Interacting with the messy data taught me how slippery the very name of a work can be," she reflected.
It was also her first substantial engagement with DH methods, and she described the fellowship's atmosphere as essential. When Grant Wythoff opened the semester by telling the cohort it was normal not to know things—that the DH world is so interdisciplinary that all scholars often feel that way—it changed what was possible. "This introduction made it a space where it's normal to ask questions, to learn, and to just be openly curious," Cecelia said. "What a gift."
A pedigree chart tracing the lineage of Ignez de Guiné, the matriarch of several prominent Portuguese families.
Amanda Pinheiro (History) has been working with a database developed over the last ten years, containing 115,545 baptismal, notarial, and judicial documents from ten villages in colonial Brazil. These records detail the eighteenth- and nineteenth-century lives of roughly 6,000 individuals who inhabited the south of Brazil and may have migrated through the frontier zone between the Spanish and Portuguese empires. The issue: that number is inflated by duplicate names, with varying spellings and characteristics across documents. Her fellowship project used Splink, a Python library for probabilistic record linkage, to calculate the statistical likelihood that two records refer to the same individual—generating a unique identifier for each person so she could cross-reference this database with her new archival findings.
What surprised her was how much human judgment, or as she describes it, laborious decision-making, the automation required. "I realized that automation necessarily requires diligent and continuous manual labor," Amanda reflected. "The two are interconnected and walk hand-in-hand in the digital realm." With guidance from Wouter Haverals (Associate Research Scholar, CDH; Perkins Fellow, Humanities Council), she completed Splink tests and generated an analytical report on the quality of her datasets, one she hopes to publish and add to her metadata in the future.
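For readers curious how probabilistic record linkage weighs evidence, here is a minimal sketch in plain Python. It does not use Splink's API and is not Amanda's pipeline. The field names, sample records, and probabilities are all invented for illustration. It shows only the general Fellegi-Sunter idea that Splink builds on: agreement on a field raises the odds that two records describe the same person, and disagreement lowers them.

```python
import math

# Hypothetical parameters per field (illustrative values, not estimated from real data).
# m = P(field agrees | same person), u = P(field agrees | different people)
FIELD_PARAMS = {
    "given_name": (0.90, 0.05),
    "surname": (0.85, 0.02),
    "village": (0.80, 0.10),
}
PRIOR_MATCH = 0.01  # assumed share of record pairs that are true matches


def normalize(value):
    """Lowercase and trim so trivial variation does not block agreement."""
    return value.strip().lower()


def match_probability(rec_a, rec_b):
    """Combine per-field evidence into a posterior match probability."""
    # Start from the prior odds, working in log2 space
    log_odds = math.log2(PRIOR_MATCH / (1 - PRIOR_MATCH))
    for field, (m, u) in FIELD_PARAMS.items():
        if normalize(rec_a[field]) == normalize(rec_b[field]):
            log_odds += math.log2(m / u)  # agreement weight
        else:
            log_odds += math.log2((1 - m) / (1 - u))  # disagreement weight
    odds = 2 ** log_odds
    return odds / (1 + odds)


# Invented records: a and b differ only in case and spacing
a = {"given_name": "Maria", "surname": "Souza", "village": "Village A"}
b = {"given_name": "maria ", "surname": "Souza", "village": "Village A"}
c = {"given_name": "Joana", "surname": "Lima", "village": "Village A"}

p_same = match_probability(a, b)
p_diff = match_probability(a, c)
print(f"a vs b: {p_same:.3f}   a vs c: {p_diff:.5f}")
```

Even in this toy version, the human decisions are visible: someone has to choose which fields to compare, how to normalize spellings, and where to set the threshold for calling two records a match.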
Amy Weng (English) asked whether a seventeenth-century English preacher's confessional affiliation (Anglican, Nonconformist, or Catholic) leaves a detectable fingerprint in their printed sermons. Her project, Godly and Learned Divines (GoLD), represented each of 809 preachers across 2,877 books as a 1,000-dimensional vector built from scripture citations, named references, topic modeling, and entity types, then trained a Random Forest classifier to predict denomination. The model achieved an F1-score (the harmonic mean of precision and recall, a standard measure of classifier performance) of 0.80 for Anglicans, 0.51 for Nonconformists, and 0.12 for Catholics.
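To make the F1-score mentioned above concrete, the following sketch computes it per class in plain Python. The labels are invented toy data and do not reproduce the GoLD dataset or its results. The function name is hypothetical.

```python
def f1_per_class(y_true, y_pred):
    """Return {label: F1}, where F1 = 2 * precision * recall / (precision + recall)."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Invented toy labels, for illustration only
y_true = ["Anglican", "Anglican", "Anglican", "Nonconformist", "Nonconformist", "Catholic"]
y_pred = ["Anglican", "Anglican", "Nonconformist", "Nonconformist", "Anglican", "Anglican"]

scores = f1_per_class(y_true, y_pred)
for label, score in scores.items():
    print(f"{label}: {score:.2f}")
```

In this toy example the rarest class is never predicted, so its F1 is zero. That shows how per-class scores can diverge sharply. It is not a claim about why they diverged in GoLD.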
Wikidata-Linkable Preachers in EEBO-TCP
Clustered based on the distribution of topics, named entities, and scriptural references in their sermons
The results were striking. Place of education turned out to be the least important feature—far less predictive than the types of sources a preacher reached for. "Bible versions matter more than the proportions of Bible divisions," Amy concluded, "and ancient entities once again outrank medieval and contemporary references. Generally, godly learnedness—patterns of referencing scripture—distinguishes preachers across confessional divides more than overall learnedness."
Amy credited Jacob Murel (Research Software Engineer, Classics) for sustained mentorship on using large language models for orthographic standardization, and Wouter Haverals for introducing her to Wikidata reconciliation.
Pierre Azou (French and Italian) examines the relationship between literature and political violence in his doctoral research and found himself drawn to the “digital sphere” as the space where the questions he studies in published books are being reconfigured. For his fellowship project investigating the link between "manliness" and insecurity in contemporary French public discourse, he turned to two foundational texts in the French debate on masculinity: Élisabeth Badinter's XY, de l'identité masculine (1992) and Éric Zemmour's Le Premier Sexe (2006). These works start from opposing premises, one theorizing a fragile masculinity, the other insisting it is strong but under siege, yet both bind manliness tightly to a language of threat and crisis.
Using keyness analysis and topic modeling in Python, Pierre compared the density of each author's clusters of insecurity-related words (fear, violence, war, crisis, domination). He identified each author's statistically distinctive vocabulary and examined the semantic neighborhoods of shared terms. The most productive approach was contextual: looking at what words appear near a key shared term like virilité in each text. "It turns out the same word lives in completely different semantic environments in Badinter and Zemmour," Pierre noted.
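As a rough illustration of what keyness analysis measures, the sketch below ranks words by the log-likelihood (G2) statistic, one common keyness measure. The article does not say which statistic Pierre used, so that choice is an assumption. The token lists are invented and are not drawn from either book.

```python
import math
from collections import Counter


def log_likelihood(a, b, total_a, total_b):
    """G2 keyness for a word seen a times in corpus A and b times in corpus B."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    if a:
        g2 += a * math.log(a / expected_a)
    if b:
        g2 += b * math.log(b / expected_b)
    return 2 * g2


def keyness(tokens_a, tokens_b):
    """Rank words from most distinctive of A (positive) to most distinctive of B (negative)."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    total_a, total_b = len(tokens_a), len(tokens_b)
    rows = []
    for word in set(counts_a) | set(counts_b):
        a, b = counts_a[word], counts_b[word]
        g2 = log_likelihood(a, b, total_a, total_b)
        # Sign the score by which corpus uses the word relatively more often
        sign = 1 if a / total_a >= b / total_b else -1
        rows.append((word, sign * g2))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Invented toy token lists, for illustration only
text_a = "crise peur crise fragile doute fragile homme".split()
text_b = "guerre homme guerre domination homme force guerre".split()

ranked = keyness(text_a, text_b)
for word, score in ranked:
    print(f"{word:12s} {score:+.2f}")
```

A word shared by both texts, like "homme" here, scores near zero. That is why the contextual step matters: keyness shows which words set the texts apart, while looking at neighboring words shows how a shared word is used differently.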
A technical challenge gave him pause early on—preprocessing French text that contained English-language citations required combining stopword lists (words like “a,” “the,” “and” or “un,” “le,” “et”) and filtering bibliographic noise—but his more substantive reflection was on what data cleaning actually does. "It reminded me that cleaning decisions in DH are more than purely technical, as they also shape the findings."
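The preprocessing step described above can be sketched as follows. The stopword sets and the list of bibliographic noise tokens are tiny hand-written samples chosen for illustration. They are assumptions, not the lists Pierre used, and real stopword lists are far longer.

```python
import re

# Small illustrative samples (hypothetical; real lists are much longer)
FRENCH_STOPWORDS = {"un", "une", "le", "la", "les", "et", "de", "des", "du"}
ENGLISH_STOPWORDS = {"a", "an", "the", "and", "of", "in"}
# Assumed examples of bibliographic noise left over from citations
BIBLIO_NOISE = {"ibid", "op", "cit", "pp", "vol"}

# One combined set, so mixed-language passages are filtered in a single pass
STOPWORDS = FRENCH_STOPWORDS | ENGLISH_STOPWORDS | BIBLIO_NOISE


def clean_tokens(text):
    """Lowercase, keep runs of letters only, and drop stopwords and citation noise."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


sample = "La peur et the crisis of masculinity, ibid., pp. 12-14"
tokens = clean_tokens(sample)
print(tokens)
```

Every entry in those sets is a decision. Adding or removing a single word changes the counts that later analyses depend on, which is the sense in which cleaning shapes the findings.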
Nathaniel Gallant (Comparative Literature) studies the relationship between Buddhism and the history of dramatic and poetic theory across Japanese and Tibetan literary traditions. In his daily research, he relies on well-developed digital tools and databases built for pre-modern Japanese sources—resources that reflect years of philological groundwork by scholars who came before him. For Tibetan studies, that infrastructure is still being built. DH projects in the field are scattered across academic, non-profit, and private spheres, with no centralized view of what exists or where the gaps are.
Nathaniel’s fellowship project addressed that directly: he created a database cataloging existing DH projects in Tibetan studies, with visualizations mapping networks of funding sources, text archives, OCR and LLM development projects, and institutional stakeholders. The goal was to understand current patterns in project development and identify potential directions for future text-digitization projects, particularly in the history of Tibetan literature and poetry.
Each issue of the Yearbook of the Imperial Theaters includes detailed lists (spiski) of creators and artists.
Rachel Glodo (Music) is reconstructing the world of the Imperial Ballet in the Russian Silver Age through eighteen volumes of the Yearbook of the Imperial Theaters (1890–1908), elaborate annual retrospectives documenting productions, performers, choreographers, musicians, designers, and administrators across St. Petersburg and Moscow. The challenge was getting that data off the page and into a form that a researcher could query. Rachel used optical character recognition (OCR) to convert nineteenth-century printed Cyrillic into machine-readable text, while Andy Janco (Digital Scholarship Specialist) developed custom Python scripts, based on her project design, to convert images of lists and tables into structured spreadsheets.
What Rachel hadn't anticipated was how much the project would begin with physical, analog labor. "The most challenging part of my project wasn't the implementation of DH methodologies," she said, "but the quotidian task of scanning and saving thousands of images spanning 18 volumes." She described it as "the strange and beautiful juxtaposition of 'distant' and 'close' readings that characterizes DH." And she was surprised by how much the technology itself shifted between her original proposal and the start of the fellowship—she ended up using an entirely different processing strategy than she had planned, with Christine Roughan (Postdoctoral Research Associate, CDH/MARBAS) and Andy as crucial partners in identifying her priorities and methods.
The eureka moment came when they ran the Python scripts together for the first time. "All the hours of scanning documents, mental grappling, design, and redesign suddenly crystallized into something coherent, beautiful, and, almost miraculously, exactly what I needed," she recalled. "It was a glorious moment."
A record of all productions on the Imperial stages, including ballets and operas.
Throughout the semester, the cohort's monthly sessions became as important as the technical work itself. "The regular meetings provided me with more productive time and space to learn about digital tools than scheduling different consultations could have," Amanda said. For Cecelia, the cross-disciplinary exchange was its own kind of finding: "It's exciting to step outside your discipline and be invited into someone else's world while it's still in the making—while they're still experimenting and puzzling through the challenges."
Interested in applying for a Graduate Fellowship? Visit the Graduate Fellowships page, or head to the CDH Graduate Program page to see more opportunities for graduate students.