Princeton Open HTR Initiative
Establishing research infrastructure to support the use of HTR at Princeton for manuscripts and archival documents in a variety of languages and scripts
While digitization efforts of recent decades have revolutionized access to historical texts in libraries, archives and cultural heritage institutions, handwritten text recognition (HTR) is making these digitized materials machine readable, opening them up to text search and computational analysis at scale. An increasing number of humanities scholars – including at Princeton – are eager to integrate HTR into their research workflows. But barriers to entry can be significant, particularly in technical expertise and cost.
The Princeton Open HTR Initiative: Creating Infrastructure for Modeling Historical Texts is developing local infrastructure and workflows for HTR in conversation with the team behind eScriptorium, the open-source leader in HTR technology for humanistic scholarship. Currently, researchers face a technical barrier to getting started with eScriptorium (especially for team use cases), as well as a lack of direct integration with high-performance computing clusters. This project will address these technical limitations in order to create a local environment for scholarly use and to support a community of users at Princeton.
Related projects
Bringing HTR to the HPC
Customizing the eScriptorium HTR software for use on Princeton high-performance computing hardware
Related posts
CDH Postdocs Among PLI Seed Grant Awardees
9 March 2024
Happy Buzaaba, Wouter Haverals, and Christine Roughan will work with Princeton faculty on projects ranging from literary “style” in the age of LLMs to HTR workflows.
Related research groups
Text Technologies for Manuscript Cultures
Using emerging technologies to transform research, teaching and understanding of pre-modern evidence
Grants
2024–2025
Princeton Language + Intelligence (PLI) Seed Grant