The Center for Digital Humanities (CDH) recently hosted an all-day Hackathon in which developers from across campus formed teams to explore library collection data through a digital humanities lens, applying computational methods to analyze the archives in creative ways. Participants included more than 20 developers from CDH, the Princeton University Library (PUL) and the Princeton Institute for Computational Science and Engineering/Research Computing (PICSciE/RC).
This Hackathon was the first hands-on collaboration connecting the CDH, PUL and PICSciE/RC, and it marked the first time many of the participants used a high-performance computing cluster, which allows for much faster processing of large-scale data. “It was a unique opportunity to bring together developers that support computational research at Princeton in a wide variety of ways,” commented Ben Hicks, a developer with both CDH and Princeton Research Computing.
The event also reinforced the ongoing partnership between the Library and the CDH, building on discussions that emerged from the “Collections as Data” Reading Groups. In these meetings, held at least once a month, PUL and CDH staff examine Princeton’s library collections through short readings, discussions, presentations and hands-on activities, and coordinate efforts to sustain data-driven scholarship at Princeton.
The library collection used for the Hackathon, PUL’s Latin American Ephemera (LAE) collection, is a growing repository of approximately 12,200 published items (pamphlets, posters, etc.) highlighting politics, public culture and social change in Latin America. Materials include primary sources originally created around the turn of the 20th century, as well as newly acquired materials. Fernando Acosta-Rodríguez, Librarian for Latin American, Iberian and Latino Studies, has overseen the LAE collection for the past 16 years and introduced it to Hackathon participants.
To begin, CDH and Library staff led demonstrations to familiarize everyone with an extensive data set that included metadata, images, and Optical Character Recognition (OCR) text. To minimize the technical frustrations that inevitably arise from working with new tools, organizers encouraged an element of fun and playfulness, and groups formed based on how each programmer wanted to experiment with the data.
Reflecting on the experience, Nikitas Tampakis, PUL Discovery Infrastructure Developer, noted: “It was a fun, low-stakes way to experiment with new tools and learn something new.” Many of the groups took their names from their intended experiments, such as the “Lost in Space” group, which worked with spatial data, and the “Topic Thunder” group, which experimented with text analysis, including topic models.
By the end of the day, Hackathon team members shared their work with each other in several presentations. These included topic modeling and sentiment analysis examining different sides of contested issues (e.g., reproductive rights), and a network analysis of items based on their source and subject locations, showing which countries and regions are publishing work about each other. One participant, using results from the Google Vision API, demonstrated a prototype that made it possible to browse materials by facial expressions.
Vineet Bansal, a research software engineer at Princeton Research Computing, summed up the spirit of the day-long collaborative effort: “Programmers can always get stellar (projects) done on their own. It’s working with others and asking for help that we need to get better at.”
In addition to the Hackathon, upcoming “Year of Data” events presented by the CDH, including a two-part “Playing with Data” series on Feb. 21 and Feb. 28, offer terrific opportunities to collaborate and explore new ways of thinking about and working with humanities data.