Old Documents, New Insights

9 January 2024

Catching up with the Princeton Geniza Project team


In summer 2020, the Center for Digital Humanities teamed up with the Princeton Geniza Lab to create a new and improved Princeton Geniza Project (PGP) database and web application offering researchers access to tens of thousands of medieval documents found in the geniza chamber of the Ben Ezra Synagogue in Cairo.

The team—led by Marina Rustow, Khedouri A. Zilkha Professor of Jewish Civilization in the Near East and director of the Princeton Geniza Lab, and Rebecca Sutton Koeser, lead research software engineer at the CDH—launched a redesigned public site in early 2022.

We checked in with the PGP collaborators to find out more about recent additions to the PGP and about the research the new database has made possible.

Our takeaways: not only is the PGP becoming more user-friendly and accessible to scholars around the world, but it is also making a difference for researchers on the ground.

New Features

  • In summer 2023, the team built on PGP version 4’s transcription feature by introducing side-by-side transcriptions and translations. The team can now load new and existing translations at any time; five hundred are already in place. That means researchers can see a transcription of a text in its original language (Judaeo-Arabic, Hebrew, Aramaic, or Arabic) alongside an English translation. PGP Project Manager Ksenia Ryzhova (NES) reports that Hebrew translations are in the works as well.

Number of transcriptions and translations added to PGPv4 over time

  • Speaking of languages: the PGP recently launched an Arabic version of the site featuring translations by Princeton graduate student Fatima Zaraket (NES). As a result, the PGP site is now available in three languages: English, Hebrew, and Arabic.

Research Impact

  • PGP in the real world! Senior Research Assistant Dr. Alan Elbaum published an article in the January 2023 issue of Speculum entitled “‘The Fire in My Heart and the Pain in My Eyes’: Interdependence and Outburst in the Illness Letters of the Cairo Geniza.” Elbaum added descriptions of the documents to the PGP site while working on the article.
  • “The PGP appeared in force” at the recent Middle East Studies Association annual meeting, reports Ryzhova, who presented at the conference. Two separate panels featured PGP team members past and present, most of them sharing work that drew from the PGP. “Using the Cairo Geniza in scholarship is just so much easier now that we have the PGP because all the up-to-date scholarship is mostly in one place, alongside images, transcriptions and translations,” she said.
  • At the Association for Jewish Studies conference, former PGP Project Manager Rachel Richman (NES) presented on a panel on “Popularizing Primary Sources: Virtual Libraries, Archives, and Public Access to Jewish Studies Sources.” Richman was also awarded a CDH Data Fellowship last year to support a research project involving PGP resources.
  • Rebecca Koeser reports that she started working on a Python library named “undate” during a DHTech hackathon last year. Explains Koeser: “PGP has much more complicated and ambiguous dates than previous projects . . . . I hope to eventually incorporate the logic for handling different calendars and ambiguous dates that we developed as part of working on PGPv4.”
  • Koeser also reminded us that the transcriptions feature itself has research impact—now, the team can more easily document their transcriptions, and other scholars can use the transcriptions in their work.

What’s Next?

  • The team’s not done yet! The Handwritten Text Recognition group at the Princeton Geniza Lab, which oversees the PGP, has been training an eScriptorium model on published geniza documents and testing it on documents without transcriptions. Explains Ryzhova: “We are currently in the process of manually checking these new automatic transcriptions, but so far the results are really promising! Eventually, after we finish reviewing this batch (and re-training the model with our corrections), we will ingest them into the PGP and exponentially increase the number of documents with transcriptions.”
  • The PGP team is also tracking people and places mentioned in the documents in hopes of assembling enough data to produce network graphs.
  • Koeser is “looking forward to publishing datasets and a transcription corpus” as well as “seeing the content used more and more for research, including different scale and scope of analysis.” She also mentioned that the PGP will have a new research software engineer (RSE) who “will be working on automated transliteration from Judaeo-Arabic to Arabic.”

We’re excited to keep following the PGP in 2024!

Carousel Image: The PGP now provides researchers with side-by-side transcriptions and translations of hundreds of documents, such as this eleventh-century legal document.