New Languages for NLP: Building Linguistic Diversity in the Digital Humanities

An NEH-funded workshop to help scholars use NLP for new languages

Natural Language Processing (NLP) has revolutionized our ability to analyze texts at scale. However, of the world's more than 7,500 languages, the major NLP resources support only eighty-five. This means that text mining, topic modeling and other methods of computational text analysis are unavailable for the vast majority of languages, especially those that are minority, regional or endangered. The concentration of data and tools in a few dominant languages will hinder research and perpetuate existing structural inequalities on both local and global scales.

“New Languages for NLP: Building Linguistic Diversity in the Digital Humanities” is a workshop series funded by a National Endowment for the Humanities Institutes for Advanced Topics in the Digital Humanities grant.

Between June 2021 and May 2022, Institute participants will be taught to annotate linguistic data and train statistical language models using cutting-edge NLP tools. They will learn best practices in project and research data management. They will join discussions with leaders in the fields of multilingual NLP and DH. They will advance their own research projects by creating, employing and interrogating text-analysis tools and methods, while increasing much-needed linguistic diversity in the field of NLP.

Hosted by the CDH, this Institute is a collaboration with Haverford College, the Library of Congress Labs, and DARIAH, the European Digital Research Infrastructure for the Arts and Humanities. 

Visit our website for more information. 


 Any views, findings, conclusions, or recommendations expressed on this page do not necessarily represent those of the National Endowment for the Humanities.

CDH Grant History

  • 2020–2022 NEH Institutes for Advanced Topics in the Digital Humanities