“New Languages for NLP” Scholars Will Bring Global Perspectives to Text Analysis
We are excited to announce the ten language teams selected to participate in The New Languages for NLP: Building Linguistic Diversity in the Digital Humanities series of workshops, held at the Center for Digital Humanities at Princeton and funded by the National Endowment for the Humanities.
The language teams were selected from a field of more than eighty-five applications and chosen for the potential impact of their projects on current speakers as well as on scholars studying historical languages. Many of the participants are graduate students or early-career scholars.
Participants come from a variety of academic disciplines, including comparative literature, history, computational linguistics, sociolinguistics, and language and literature studies (learn more about the participants on the languages page).
Starting in June 2021, the nineteen participants will create linguistic data and trained language models for the following world languages:
- Classical Arabic (ٱلْعَرَبِيَّةُ ٱلْفُصْحَىٰ)
- Classical Chinese (文言文, funded by the CDH)
- Kanbun (漢文)
- Ottoman Turkish (لسان عثمانى)
- Quechua (Qheswa simi)
- Dostoevsky's Russian (funded by the Canadian Social Sciences and Humanities Research Council)
- Tigrinya (ትግርኛ)
- Yiddish (ייִדיש)
- Yoruba (Èdè Yorùbá)
Cohort members will work over the course of a year, and will meet for three intensive workshops to learn cutting-edge natural language processing (NLP) tools as well as best practices in project and data management. They will advance their own research by creating, employing and interrogating text-analysis tools and methods, while increasing much-needed linguistic diversity in the field of NLP.
Held at the Center for Digital Humanities at Princeton, this Institute is a collaboration with Haverford College, the Library of Congress Labs, and DARIAH, the European Digital Research Infrastructure for the Arts and Humanities.
Further information can be found on the project website.