We are pleased to highlight the work of Nobline Yoo (Computer Science ’23), who was the recipient of the Independent Work Award at the Center for Statistics and Machine Learning’s (CSML) poster session in May 2022.
Nobline’s project, “Building a Tool for Chronicling America: Flexibility and Efficiency in Digital Humanities,” was the culminating project for COS 398, a computer science independent work seminar taught by Brian Kernighan, former acting faculty director of the Center for Digital Humanities. CDH Postdoctoral Research Associate Kavita Kulkarni served as a teaching assistant for the course.
Chronicling America is a database of American newspapers from 1777–1963, run by the Library of Congress. Notably, Chronicling America featured prominently in a February CDH panel with Benjamin Lee, who developed the Newspaper Navigator tool (watch the panel).
Nobline told us about her award-winning project and her experiences of being a computer science student working with digital humanities.
Why were you interested in taking COS 398?
I was drawn to the connection between computing and human studies. This was a space where I could explore human-computer interaction.
Tell me about your project, “Building a Tool for Chronicling America: Flexibility and Efficiency in Digital Humanities.”
Digital humanities is at the interface between humanistic studies and computational power. Hence, when building tools in this space, it is important to consider, firstly, the humanities scholar who desires an accessible and flexible data analysis tool, and secondly, the developer who would prefer efficiency in creating the tool. Broadly, in this project, we redefine “flexibility” and “efficiency” in the space of digital humanities tools: “flexibility” as flexibility of code adaptation via code-to-feature correspondence, and “efficiency” as efficiency of deployment via intentional choice of development/deployment platform. Concretely, we (1) create a toolkit that uses statistical methods (word density, lexical density dispersion plot, etc.) and machine learning (Word2Vec, Latent Dirichlet Allocation) to distill historical trends and anomalies from Chronicling America, a dataset of American newspapers ranging from the 1700s to 1900s, and (2) identify Google Colab as an “enabler platform” that enables flexibility and efficiency in the tool development process, as defined in our project. In redefining “flexibility” and “efficiency” and producing our own data analysis tool for Chronicling America, our work sets forth a new lens through which we can evaluate digital humanities tools and allows historians to unveil novel historical anomalies and trends that help teach us about ourselves and the potential future.
As an example, Figure 1 illustrates the density of the word “federalist” over time. From around 1790 to the early 1820s, the density generally increases, in line with the official dates of the Federalist Party’s existence. What is unexpected is that the highest peak occurs only after 1817, around 1840. Why might this be? It may indicate that later policies implemented earlier Federalist ideals.
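The word-density measure behind a figure like this can be sketched roughly as below. This is a minimal illustration, not the project's actual toolkit: `texts_by_year` is a hypothetical input mapping each year to newspaper text already retrieved from Chronicling America, and the tokenizer is deliberately simple.

```python
import re


def word_density(text, target):
    """Fraction of tokens in `text` equal to `target` (case-insensitive)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return tokens.count(target.lower()) / len(tokens)


def density_by_year(texts_by_year, target):
    """Map each year to the density of `target` in that year's combined text.

    `texts_by_year` is a hypothetical {year: text} input; in practice the text
    would come from Chronicling America's OCR'd newspaper pages.
    """
    return {year: word_density(text, target)
            for year, text in sorted(texts_by_year.items())}


# Toy example: density of "federalist" across two small samples of text.
sample = {
    1798: "the federalist press praised the treaty",
    1840: "federalist ideals echoed in the federalist revival",
}
densities = density_by_year(sample, "federalist")
```

Plotting `densities` year by year would yield a curve of the kind shown in Figure 1; peaks mark periods when the term occupied an unusually large share of the printed text.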
Looking forward, I am hoping to work on publishing my paper.
How did you get the idea for your project?
I first learned about the Chronicling America dataset in COS 398. I saw Chronicling America as a largely unexplored space where I could apply machine learning to uncover new patterns in American history. I like the interdisciplinary nature of having my research help us understand ourselves. As I went deeper into developing the project, I saw an opportunity to connect it with the study of human-computer interaction (HCI) by looking into the factors that make digital humanities tools easier to interact with, from both a developer and a user point of view. I took that opportunity; I am interested in computer science research done in the context of the human user and developer.
What did you learn from this project?
I learned how to write a proposal, get funding, and write a paper. I also learned that I enjoy conducting research projects in HCI.
Tell me more about your poster for the CSML poster session.
The poster was a way to present my work to a broader audience. It was a challenge to condense my work into a poster, but I’m thankful that my audience understood my motivation for the project!
What is it like being a computer science student engaged in a digital humanities project?
It’s exciting, because there’s so much to learn about how computing and the humanities can interact.
What advice would you give to other computer science majors interested in digital humanities projects?
Conducting research projects at the interface between computing and the humanities can force us to think outside the box, because it adds new constraints—we have to think about human impact. So, I think digital humanities is a very interesting space to study, because it can open up new ways of thinking.
Nobline’s Acknowledgments: I thank God for blessing me with such wonderful advisors who helped me succeed in this project. I'm thankful for Professor Brian Kernighan, who gave me extremely valuable advice on how to approach problems. I would also like to thank Professor Karthik Narasimhan and Professor Prateek Mittal for advice on natural language processing tools and bypassing CAPTCHA, respectively. Furthermore, I thank Professor Olga Russakovsky for advising me on how to conduct research. I would also like to thank Dr. Kavita Kulkarni for introducing me to the Chronicling America dataset, which was a turning point in my project timeline. I would also like to thank my colleagues in my IW seminar for feedback, support, and weekly encouragement.