Workshop 1 of the New Languages for NLP Institute focuses on building fundamental skills for annotation and data collection. We will start with a seminar-style discussion to help participants situate their work at the intersection of NLP and humanities research.
The sessions that follow will focus on the specific tools and skills needed to create a richly annotated linguistic corpus, including the annotation tools Cadet and INCEpTION, as well as the spaCy NLP library.
By the end of the first workshop, participants will have the skills and resources needed to begin annotating their texts and using those annotations to create valid training data.
Workshop sessions are limited to Institute participants. However, we plan to publish all training material on DARIAH-Campus in late 2022.
"New Languages for NLP: Building Linguistic Diversity in the Digital Humanities" is funded by a National Endowment for the Humanities Institute for Advanced Topics in the Digital Humanities grant. Held at the Center for Digital Humanities at Princeton, this Institute is a collaboration with Haverford College, the Library of Congress Labs, and DARIAH, the European Digital Research Infrastructure for the Arts and Humanities.