CDH Postdocs Among PLI Seed Grant Awardees

9 March 2024

Happy Buzaaba, Wouter Haverals, and Christine Roughan will work with Princeton faculty on projects ranging from literary “style” in the age of LLMs to HTR workflows.

A few weeks ago, Princeton Language and Intelligence (PLI) announced that fourteen research projects across the Princeton University campus would receive funding from its first round of seed grants. Among them: three collaborations with our CDH-affiliated postdocs! We checked in with Happy Buzaaba, Wouter Haverals, and Christine Roughan to learn more about these projects and their collaborations with faculty throughout the University.

Happy Buzaaba (CDH / African Language Technologies Postdoctoral Research Associate) will work with Christiane Fellbaum (Linguistics / Computer Science) on Infrastructure for African Languages: Culturally Diverse and Theoretically Sound Benchmarks for Automatic Language Processing. The goal of the project is to create high-quality datasets so that users of African languages can better engage with Large Language Models (LLMs) in their research. The collaborators are currently working with eleven African languages, including Igbo, Wolof, Kiswahili, and IsiXhosa.

“The underrepresentation of African languages in NLP and LLMs has mainly been [due to] the lack of high-quality, human-annotated datasets,” Happy explains. “This project aims not only to increase the representation of African languages in NLP and LLM research but also [to promote] the values of African languages and culture.”

Also working on LLMs will be Wouter Haverals (Perkins Fellow in the Humanities Council) and CDH Faculty Director Meredith Martin (English). Their project, AI for Humanists / Humanities for AI: Parameterizing Style: Dataset, Workshop, Notebook, Paper, will bring together scholars from computer science and the humanities to engage in a dialogue about literary style. If successful, the project will add nuance to both technical and literary assumptions about what makes creative writing succeed, with implications for writers and editors. Wouter sees connections between the project and his past research on “the writings of medieval monks who, due to their close collaboration, naturally impacted each other's spelling, choice of words, and phrasing.” Adds Wouter: “There exist fascinating similarities in the way creative writing processes cross-fertilize and delineate between 'unique' and 'shared' stylistic features.”

“In an era where LLMs can potentially flood the market with computer-written books—simultaneously raising copyright violation issues—it's crucial to investigate the notion of 'literary style' more deeply,” Wouter notes.

Meanwhile, Christine Roughan (CDH / MARBAS Postdoctoral Research Associate) is working with Helmut Reimitz (History) and Marina Rustow (History / Near Eastern Studies) on the Princeton Open HTR Initiative: Creating Infrastructure for Modeling Historical Texts. In collaboration with Princeton Research Computing, the team will create infrastructure and workflows specifically for Princeton researchers who want to use handwritten text recognition (HTR) in their research with unpublished manuscripts. For example, Christine notes that she is currently “training models to facilitate research with medieval Arabic scientific manuscripts, many of which have not been published before.”

“An increasing number of scholars are eager to leverage HTR because it opens up handwritten texts to everything from text search to computational analyses at scale,” Christine explains. “In addressing this need, this initiative will be a foundation supporting research projects and contributions across the Princeton community.”