Senior Thesis Prize Winners

“Gendered Topics: Boyhood and Girlhood in a Century of (Cotsen) Children’s Literature”

My thesis . . . examined a hundred years of children’s literature (1800–1900). . . . I used the nineteenth-century catalogue of the Cotsen Children’s Library, a special collection at Princeton, as a guide to create my own dataset: the Cotsen Children’s Literature (CCL) dataset. I chose the nineteenth century both because its works are in the public domain, allowing unrestricted academic access, and because it was a time of great change in children’s literature; for instance, it includes much of the so-called Golden Age. The Cotsen Children’s Library also allowed me to look at a diverse set of works: novels, anthologies, magazines, short stories, and even creative non-fiction.

Abstract: In order to examine themes across a century of children’s literature, the Cotsen Children’s Literature (CCL) dataset was curated for this project based on the nineteenth-century catalogue of the Cotsen Children’s Library. As a proof of concept for the corpus’s ability to shed light on the nineteenth century, the gendered nature of children’s publishing was examined by labeling texts in the dataset with the gender of their protagonist, which was assumed to be a proxy for the gender of the intended audience. By training a topic model on the subset of the CCL dataset that had a protagonist gender label, 125 topics were produced, 112 of which differed to a statistically significant degree by protagonist gender. By conducting a qualitative analysis of these results, the gendered landscape of nineteenth-century children’s literature was explored, particularly how boys and girls occupy space differently. Specifically, the paper concludes that the home space is largely absent from boys’ stories but is the defining space for girls’ stories. This conclusion extends and revises existing children’s literature scholarship by describing how domesticity shapes boy and girl characters’ experiences of space and time.
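The abstract does not say how significance by protagonist gender was tested. As one possible illustration, a two-sided permutation test can compare a single topic's average weight in boy-protagonist and girl-protagonist texts. The function name, the toy weights, and the choice of test are all assumptions made for this sketch; it is not the thesis's method.

```python
import random
from statistics import mean


def permutation_p_value(boy_weights, girl_weights, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in mean topic weight.

    Hypothetical helper: shuffles the gender labels many times and counts how
    often the shuffled difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(boy_weights) - mean(girl_weights))
    pooled = list(boy_weights) + list(girl_weights)
    n_boys = len(boy_weights)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_boys]) - mean(pooled[n_boys:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (extreme + 1) / (n_permutations + 1)


# Made-up weights of one topic in twelve texts (illustration only).
boy_topic_weights = [0.01, 0.02, 0.00, 0.03, 0.01, 0.02]
girl_topic_weights = [0.20, 0.15, 0.25, 0.18, 0.22, 0.19]

p_value = permutation_p_value(boy_topic_weights, girl_topic_weights)
same_groups_p = permutation_p_value([0.1, 0.1, 0.1], [0.1, 0.1, 0.1])
```

With 125 topics tested, a multiple-comparison correction would usually be applied before counting significant topics; the abstract does not state whether one was used.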

“Defending Our Freedom: The U.S. Military, Environmental Contamination, and Ongoing Native Land Theft in the Choctaw Nation”

My thesis looked at the impact of the McAlester Army Ammunition Plant (McAAP) in McAlester, Oklahoma, on the environment and public health of the Choctaw Nation. I used methods from anthropology, history, environmental studies, geosciences, digital humanities, and other fields to uncover the relationship between the facility and the lived experiences of McAlester residents.

Abstract: Relatively little is known about environmental contamination on American Indian reservations in the United States. Yet the problem is widespread in Indian Country. I used ArcGIS Online to uncover 1,250 Superfund sites – sites with uncontrolled hazardous waste – on or within five miles of 302 Tribal Nations. I then investigated the environmental health of a town on the reservation of my Tribe, the Choctaw Nation, where the U.S. military decommissions old bombs through daily detonations. By testing surface and tap water, and by installing and monitoring air sensors, I evaluated and documented contamination of Choctaw water, land, and air. At the same time, I used anthropological field research and interviews to explore Choctaw experiences of this contamination and its adverse health effects. I argue that these environmental assaults on the Choctaw Nation are an expression of ongoing Native Land theft, aided by the politicization of environmental data and inadequate regulations.

The Side Unseen (and accompanying essay, “Ethnographic Data Visualization as a Methodology to Visualize the Health Impacts of Structural Violence in Urban Philadelphia Communities”)

My thesis . . . takes the form of a digital ethnography and exists as a website. It is a compilation of interviews with community residents, presented in interactive formats such as video, audio, and ethnographic data visualization. My thesis focuses on the anthropological topic of structural violence and how it manifests in health crises (such as the opioid epidemic, environmental contamination, and the COVID-19 pandemic) in urban Philadelphia neighborhoods. It aims to contextualize the numbers that society sees as representative of health by layering them in conversation with perspectives and stories of individuals who actually reside in these Philadelphia neighborhoods.


Abstract: Structural violence harms individuals’ health; however, this connection is not broadly recognized in society because the relationships that constitute structural violence are invisible. This lack of recognition is compounded by society viewing data as representative of an ultimate truth.

This thesis is twofold; its primary work is the website, The Side Unseen. This website shows how ethnographic data visualizations can highlight a more complete story surrounding structural violence in Philadelphia. It can be accessed here.

This written methodology is a supplement to the website; it addresses the anthropological theory behind why structural violence demands visualization by discussing the subjectivity and power dynamics behind data creation. Ethnographic data visualization layers data with interlocutor narrative to emphasize the absence inherent in data. I argue that it is necessary to use an anthropological perspective when analyzing data, because all data are social constructions.

“The Old Bailey, U.S. Reports, and OCR: Benchmarking AWS, Azure, and GCP on 360,000 Page Images”

One hundred eighty thousand images can be challenging to grasp. Looking at every single image in the Old Bailey for just one second each would take over two days and two hours. If a human worked at this for only eight hours a day, it would take a whole week. Therefore, I created a web-based “explorer” to enable rapid investigation of, and random access to, the whole dataset by allowing humans to search for anomalies using dynamically loaded metrics and to build an intuition for the contents of a dataset without attempting to look at every page.
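The viewing-time estimate can be checked with a few lines of arithmetic. This sketch takes the rounded figure of 180,000 images and one second per image from the text; the variable names are only illustrative.

```python
# Back-of-the-envelope check of the viewing-time estimate.
IMAGES = 180_000         # rounded image count, as stated in the text
SECONDS_PER_IMAGE = 1    # one second per image, as stated in the text

total_seconds = IMAGES * SECONDS_PER_IMAGE
total_hours = total_seconds / 3600       # hours of continuous viewing
days, hours = divmod(total_hours, 24)    # split into full days and leftover hours

workdays = total_hours / 8               # eight-hour working days needed

print(f"{total_hours:.0f} hours = {days:.0f} days and {hours:.0f} hours")
print(f"{workdays:.2f} eight-hour workdays")
```

At exactly 180,000 images this comes to 50 hours, or two days and two hours of continuous viewing, and 6.25 eight-hour workdays, which is roughly the "whole week" mentioned above.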

Abstract: The goal of my thesis was to benchmark three leading Optical Character Recognition (OCR) cloud services from Amazon, Microsoft, and Google on over 360,000 page images of legal documents in the Old Bailey in London (1674-1913) and the U.S. Supreme Court and predecessor courts (1754-1915). Over the course of my project, I ran over a million cloud OCR calls on all three services combined, which was possible through generous funding from Princeton. Error rates were calculated by comparing OCR results for the Old Bailey against human transcriptions created for each page by the Old Bailey Online Project. The Supreme Court records, which do not have human transcriptions, serve only as a relative measure of similarity between services. Ultimately, I found that Amazon’s Textract service had the lowest error rate on the Old Bailey dataset, followed by Google’s Vision, and finally Microsoft’s Cognitive Services OCR. This work also led to the creation of the open-source tigerocr tool, which enables reproducible benchmarking of cloud services by handling their different interfaces and presents a unified file format for results that includes coordinates for each text block, line, and word.
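The abstract does not name the error metric that was used. A common choice when comparing OCR output against a human transcription is the character error rate: the edit (Levenshtein) distance between the two texts divided by the length of the reference. The sketch below shows that calculation under this assumption; the function names and the sample strings are hypothetical and are not part of tigerocr or of any cloud service's API.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text: str, reference: str) -> float:
    """Edit distance from the reference to the OCR text, per reference character."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(reference, ocr_text) / len(reference)


# Invented example: two misread characters in a 29-character line.
reference_line = "the prisoner was found guilty"
ocr_line = "tbe prisoner was fonnd guilty"
cer = character_error_rate(ocr_line, reference_line)
```

Averaging such a rate over all pages for each service would give one way to rank them; a word-level variant follows the same pattern with lists of words in place of strings.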

“Can a Machine Originate Art? Creating Traditional Chinese Landscape Paintings Using Artificial Intelligence”

The goal of my thesis was to generate traditional Chinese landscape paintings using artificial intelligence. To make the paintings edge-defined and high-quality, I developed a two-stage machine learning model based on a framework known as the Generative Adversarial Network (GAN). The first stage of the GAN generated sketches of the landscape painting, and the second stage painted within those sketches to produce the fake painting. In a survey of 262 people, the paintings generated by my model were mistaken for human art 55% of the time. This research is significant because it introduces a model so “intelligent” that it produces paintings good enough to pass as human-created.

Abstract: The Generative Adversarial Network (GAN) is a machine learning model that has introduced the possibility of artificial intelligence-created art. However, direct generation methods fail to create convincing artworks that are realistic and structurally well-defined. Here we present a GAN variant, CompositionGAN (CGAN), which originates edge-defined, artistically structured paintings without a dependence on supervised style transfer. CGAN is composed of two stages, edge generation and edge-to-painting translation, and is trained on a new dataset of traditional Chinese landscape paintings never before used for generative research. A 242-person human Visual Turing Test study reveals that CGAN paintings are mistaken for human artwork over 55% of the time, significantly outperforming those of a baseline GAN model. Our work highlights the importance of artistic composition in art generation and takes an exciting step toward computational originality.

Previous Winners: