From Logbook to Spreadsheet to . . . ?
5 December 2023
Princeton senior Brett Zeligson’s new system for automating data extraction promises to open up new pathways for research on a famous bookshop and lending library.
What can business records tell us about the cultural world of interwar Paris?
Launched in 2020, the CDH-built Shakespeare and Company Project approaches this question via the celebrated Shakespeare and Company lending library and bookshop, whose members included James Joyce, Gertrude Stein, Walter Benjamin, and other writers and intellectuals.
The Project draws on three kinds of sources from the papers of Shakespeare and Company’s founder, Sylvia Beach, which are held by Special Collections at Princeton’s Firestone Library: 1) lending library cards; 2) address books; and 3) logbooks. Together, these sources show what the lending library’s members read and where they lived, and they help researchers understand the intellectual community of interwar Paris.
Much of the information from Beach’s logbooks, which include data such as dates of membership purchases, renewals, and reimbursements, forms an important part of the Project's web application. Until recently, however, some of the other notations made by Beach and her employees have remained understudied.
Princeton senior Brett Zeligson hopes to change that.
Brett, who is earning a BSE in Computer Science, is working on a project to “automat[e] data extraction and tabulation from the project’s database” of logbook images to facilitate work on logbook notations.
“The first component of this project involved training, testing, and tuning several Transkribus optical character recognition models for text extraction from the logbook pages,” Brett explained. “The second project component involved building a system to ingest images of the logbook pages and the corresponding transcription files in XML format and output a spreadsheet with all the transcriptions organized neatly in columns for price, book title, etc.”
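For readers curious what the second component might look like in practice, here is a minimal sketch in Python. It is not Brett's code. It assumes the transcription files follow the PAGE XML layout that Transkribus commonly exports (TextLine elements carrying a Coords polygon and a TextEquiv/Unicode transcription), and the column boundaries, row tolerance, function names, and sample text are all hypothetical stand-ins for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical column layout: (name, left edge in pixels). Real logbook pages
# would need boundaries measured from the scans themselves.
COLUMNS = [("date", 0), ("title", 300), ("price", 900)]


def local(tag):
    """Strip an XML namespace prefix such as '{http://...}TextLine'."""
    return tag.rsplit("}", 1)[-1]


def child(element, name):
    """Return the first direct child with the given local tag name, or None."""
    return next((c for c in element if local(c.tag) == name), None)


def read_lines(xml_text):
    """Yield (x, y, text) for each transcribed line in a PAGE-style file."""
    for line in ET.fromstring(xml_text).iter():
        if local(line.tag) != "TextLine":
            continue
        coords, equiv = child(line, "Coords"), child(line, "TextEquiv")
        if coords is None or equiv is None:
            continue
        unicode_el = child(equiv, "Unicode")
        text = (unicode_el.text or "").strip() if unicode_el is not None else ""
        # 'points' is assumed to look like "x1,y1 x2,y2 ..." (a polygon outline).
        points = [tuple(float(v) for v in p.split(","))
                  for p in coords.get("points", "").split()]
        if text and points:
            yield min(x for x, _ in points), min(y for _, y in points), text


def column_for(x):
    """Return the right-most column whose left edge is at or before x."""
    name = COLUMNS[0][0]
    for col, left in COLUMNS:
        if x >= left:
            name = col
    return name


def to_rows(lines, row_tolerance=20):
    """Group lines whose top edges fall within row_tolerance pixels of a row's first line."""
    rows, row_y = [], None
    for x, y, text in sorted(lines, key=lambda item: (item[1], item[0])):
        if row_y is None or y - row_y > row_tolerance:
            rows.append({name: "" for name, _ in COLUMNS})
            row_y = y
        col = column_for(x)
        rows[-1][col] = f"{rows[-1][col]} {text}".strip()
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with one column per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[n for n, _ in COLUMNS],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample page; the entries are placeholders, not real logbook data.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page.jpg" imageWidth="1200" imageHeight="1600">
    <TextRegion id="r1">
      <TextLine id="l1"><Coords points="40,100 250,100 250,140 40,140"/><TextEquiv><Unicode>3 Jan</Unicode></TextEquiv></TextLine>
      <TextLine id="l2"><Coords points="320,104 800,104 800,144 320,144"/><TextEquiv><Unicode>Example Title</Unicode></TextEquiv></TextLine>
      <TextLine id="l3"><Coords points="920,98 1100,98 1100,138 920,138"/><TextEquiv><Unicode>12.50</Unicode></TextEquiv></TextLine>
      <TextLine id="l4"><Coords points="40,160 250,160 250,200 40,200"/><TextEquiv><Unicode>4 Jan</Unicode></TextEquiv></TextLine>
      <TextLine id="l5"><Coords points="320,162 800,162 800,202 320,202"/><TextEquiv><Unicode>Another Title</Unicode></TextEquiv></TextLine>
      <TextLine id="l6"><Coords points="920,161 1100,161 1100,201 920,201"/><TextEquiv><Unicode>8.00</Unicode></TextEquiv></TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    print(to_csv(to_rows(read_lines(SAMPLE))))
```

The sketch sorts lines by position rather than trusting the order in the file, since handwritten ledger entries are not always transcribed top to bottom. Fixed pixel boundaries are the simplest possible way to assign columns; a real system working with skewed or uneven scans would likely need something more forgiving.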
Now, Brett is busy “tuning the text extraction model further to improve the accuracy of transcriptions.”
“My favorite part of working on this project is knowing that the system I’ve built has the ability to unlock so much data that hasn’t been explored yet. It’s super exciting to me that this project may in the future enable faculty to use extracted data to make ground-breaking discoveries about the exchange of knowledge in twentieth-century Paris.”
Brett's involvement in the Project emerged from a conversation with CDH Lead Research Software Engineer Rebecca Sutton Koeser, the Project's technical lead (Joshua Kotin, associate professor of English, is project director).
As Brett explains, Koeser suggested that the logbooks project would allow him to explore his “interests in history and urban studies,” in which Brett is earning a certificate. Of course, his work also draws on his training in computer science.
“My work with the Shakespeare and Company Project is directly related to . . . my Department of Computer Science coursework,” Brett noted. “I learned many of the computer vision and natural language processing concepts which I utilized throughout this project from taking the Computer Vision and Natural Language Processing courses.”
Following graduation this spring, Brett hopes to continue his work at the intersection of computer science and the humanities.
“I am strongly considering graduate study and in doing so am planning on working with the corresponding center for digital humanities at the university I attend.”
We’re excited to see what happens next!