Senior Thesis Prize Winners

“Defending Our Freedom: The U.S. Military, Environmental Contamination, and Ongoing Native Land Theft in the Choctaw Nation”

My thesis looked at the impact of the McAlester Army Ammunition Plant (McAAP) in McAlester, Oklahoma, on the environment and public health of the Choctaw Nation. I drew on methods from anthropology, history, environmental studies, geosciences, digital humanities, and other fields to examine the relationship between the facility and the lived experiences of McAlester residents.

Abstract: Relatively little is known about environmental contamination on American Indian reservations in the United States, yet the problem is widespread in Indian Country. Using ArcGIS Online, I identified 1,250 Superfund sites (sites with uncontrolled hazardous waste) on or within five miles of the lands of 302 Tribal Nations. I then investigated the environmental health of a town on the reservation of my Tribe, the Choctaw Nation, where the U.S. military decommissions old bombs through daily detonations. By testing surface and tap water, and by installing and monitoring air sensors, I evaluated and documented contamination of Choctaw water, land, and air. At the same time, I used anthropological field research and interviews to explore Choctaw experiences of this contamination and its adverse health effects. I argue that these environmental assaults on the Choctaw Nation are an expression of ongoing Native Land theft, aided by the politicization of environmental data and inadequate regulations.
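The "on or within five miles" criterion is a geographic proximity test. The thesis performed it in ArcGIS Online, which works with actual boundary polygons; the sketch below is a simplified illustration only, assuming each site and each tribal area is reduced to a single latitude/longitude point and using the haversine great-circle distance. The function names `haversine_miles` and `sites_near` and all coordinates are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius in miles

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def sites_near(sites, tribal_points, radius_miles=5.0):
    """Return names of sites lying within radius_miles of any tribal reference point.

    Simplification (assumption): tribal lands are represented as points, not polygons,
    so "on" a reservation is approximated by "near its reference point".
    """
    near = []
    for name, lat, lon in sites:
        if any(haversine_miles(lat, lon, tlat, tlon) <= radius_miles
               for tlat, tlon in tribal_points):
            near.append(name)
    return near

# Hypothetical data: one site about 3.5 miles away, one about 13.8 miles away.
sites = [("Site A", 35.05, -95.0), ("Site B", 35.20, -95.0)]
tribal_points = [(35.0, -95.0)]
print(sites_near(sites, tribal_points))
```

A real analysis would buffer the reservation polygons by five miles and intersect them with site locations; the point-based version only conveys the idea of a distance threshold.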

“The Old Bailey, U.S. Reports, and OCR: Benchmarking AWS, Azure, and GCP on 360,000 Page Images”

One hundred eighty thousand images can be hard to grasp. Looking at every page image in the Old Bailey for just one second each would take fifty hours, or two days and two hours. If a person worked at this for only eight hours a day, it would take more than six days, nearly a full week. I therefore created a web-based “explorer” that enables rapid investigation of, and random access to, the whole dataset: it lets people search for anomalies using dynamically loaded metrics and build an intuition for the dataset's contents without attempting to look at every page.
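The viewing-time estimate is simple unit arithmetic. This sketch reproduces it, assuming one second per image and 180,000 images as stated; `viewing_time` is a hypothetical helper, not part of the thesis tooling.

```python
def viewing_time(num_images, seconds_per_image=1.0, hours_per_day=8.0):
    """Convert a per-image viewing time into total hours, days + hours, and workdays."""
    total_hours = num_images * seconds_per_image / 3600
    full_days, rem_hours = divmod(total_hours, 24)   # split into 24-hour days
    working_days = total_hours / hours_per_day       # days at hours_per_day each
    return total_hours, int(full_days), rem_hours, working_days

hours, days, extra_hours, workdays = viewing_time(180_000)
print(f"{hours:.0f} h = {days} days {extra_hours:.0f} h; {workdays:.2f} eight-hour days")
# prints: 50 h = 2 days 2 h; 6.25 eight-hour days
```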

Abstract: The goal of my thesis was to benchmark three leading cloud Optical Character Recognition (OCR) services from Amazon, Microsoft, and Google on over 360,000 page images of legal documents from the Old Bailey in London (1674-1913) and from the U.S. Supreme Court and its predecessor courts (1754-1915). Over the course of the project, I made over a million cloud OCR calls across the three services combined, which was made possible by generous funding from Princeton. Error rates were calculated by comparing OCR results for the Old Bailey against the human transcriptions that the Old Bailey Online Project created for each page. The Supreme Court records, which lack human transcriptions, serve only as a relative measure of similarity between services. Ultimately, I found that Amazon’s Textract service had the lowest error rate on the Old Bailey dataset, followed by Google’s Vision and then Microsoft’s Cognitive Services OCR. This work also led to the creation of the open-source tigerocr tool, which enables reproducible benchmarking of cloud services by handling their different interfaces and provides a unified results file format that includes coordinates for each text block, line, and word.
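The abstract does not name the error metric used. One common choice for comparing OCR output against a human transcription is the character error rate (CER): the Levenshtein edit distance between the two strings divided by the length of the reference. The sketch below assumes that metric; it is not taken from tigerocr, and the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def character_error_rate(ocr_text: str, reference: str) -> float:
    """Edit distance normalized by the length of the human reference transcription."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ocr_text, reference) / len(reference)

print(character_error_rate("Old Baily", "Old Bailey"))  # one missing character out of ten
```

Averaging such per-page rates over the whole dataset for each service would give one way to rank the services; the Supreme Court pages, lacking a reference, could only be compared service against service.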

“Can a Machine Originate Art? Creating Traditional Chinese Landscape Paintings Using Artificial Intelligence”

The goal of my thesis was to generate traditional Chinese landscape paintings using artificial intelligence. To make the paintings edge-defined and high-quality, I developed a two-stage machine learning model based on a framework known as the Generative Adversarial Network (GAN). The first stage generated sketches of a landscape painting, and the second stage painted within those sketches to produce the final generated painting. In a survey of 262 people, the paintings generated by my model were mistaken for human art 55% of the time. This research is significant because it introduces a model whose paintings are good enough to pass as human-created.

Abstract: The Generative Adversarial Network (GAN) is a machine learning model that has opened up the possibility of art created by artificial intelligence. However, direct generation methods fail to create convincing artworks that are both realistic and structurally well-defined. Here we present a GAN variant, CompositionGAN (CGAN), which originates edge-defined, artistically structured paintings without depending on supervised style transfer. CGAN is composed of two stages, edge generation and edge-to-painting translation, and is trained on a new dataset of traditional Chinese landscape paintings never before used for generative research. A 242-person human Visual Turing Test study reveals that CGAN paintings are mistaken for human artwork over 55% of the time, significantly outperforming those of a baseline GAN model. Our work highlights the importance of artistic composition in art generation and takes an exciting step toward computational originality.
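The abstract reports that CGAN paintings "significantly" outperform a baseline GAN in how often they are mistaken for human work, without naming the test. One standard way to compare two such "fooled" rates is a one-sided pooled two-proportion z-test. The sketch below assumes that test and uses purely hypothetical counts; it also assumes independent judgments, which repeated responses from the same survey participant would not strictly satisfy.

```python
import math

def two_proportion_z_test(fooled_a, n_a, fooled_b, n_b):
    """One-sided test of H1: rate_a > rate_b using the pooled two-proportion z statistic.

    Returns (z, p_value), where p_value is the upper-tail standard normal probability.
    """
    p_a, p_b = fooled_a / n_a, fooled_b / n_b
    pooled = (fooled_a + fooled_b) / (n_a + n_b)
    variance = pooled * (1 - pooled) * (1 / n_a + 1 / n_b)
    if variance == 0:
        raise ValueError("pooled rate is 0 or 1; the z statistic is undefined")
    z = (p_a - p_b) / math.sqrt(variance)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for a standard normal Z
    return z, p_value

# Hypothetical counts: model judged human in 550 of 1,000 trials, baseline in 400 of 1,000.
z, p = two_proportion_z_test(550, 1000, 400, 1000)
print(f"z = {z:.2f}, one-sided p = {p:.2e}")
```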