Computational Approaches to Nigerian Literature
Experiments in NLP for texts in Yoruba and Efik
Despite growing interest in digital humanities approaches to African literary studies, the lack of robust linguistic resources and tools for African languages remains a significant barrier to computational research, particularly for methods that harness the latest AI technologies. Thus, while digital humanities analyses of literature in languages such as English, German, French, and Spanish continue to evolve, computationally enabled scholarship on texts in African languages remains almost nonexistent.
This project seeks to bring African literature into computational literary studies by focusing on texts in Yoruba and Efik, two of the languages spoken in Nigeria. We aim to create new data, workflows and models, and to leverage and modify existing ones, so that Nigerian literature can be understood at scale.
The project builds on work started during the New Languages for NLP: Building Linguistic Diversity for the Digital Humanities project (2020-2021). In this phase, we will:
1) test the accuracy of multilingual pre-trained language models (both massively multilingual and Africa-centric) on Yoruba and Efik literary texts;
2) enhance our corpus by annotating a set of five novels by Daniel O. Fagunwa, the pioneer of Yoruba novels, published between 1938 and 1961: Ògbójú ọdẹ nínú igbó Irúnmọlẹ̀, Igbó Olódùmarè, Ìrèké oníbùdó, Ìrìnkerìndó nínú igbó elégbèje and Àdììtú Olódùmarè;
3) test tools that exist for computational literary studies, such as BookNLP, on our corpus, and compare results when run on the Yoruba original and English translations of Fagunwa’s works.
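As an illustration of what testing the accuracy of several models against an annotated corpus could look like, here is a minimal sketch in Python. It is not the project's evaluation code: the function names, the model names (model_a, model_b) and the placeholder labels are all assumptions made for the example, and it assumes each model's output is already aligned token by token with the gold annotation.

```python
from typing import Dict, List, Sequence, Tuple

Sentences = Sequence[Sequence[str]]


def token_accuracy(gold: Sentences, predicted: Sentences) -> float:
    """Return the share of tokens whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of sentences")
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("each predicted sentence must align with its gold sentence")
        correct += sum(g == p for g, p in zip(gold_sent, pred_sent))
        total += len(gold_sent)
    if total == 0:
        raise ValueError("no tokens to score")
    return correct / total


def rank_models(gold: Sentences, outputs: Dict[str, Sentences]) -> List[Tuple[str, float]]:
    """Score every model against the same gold labels, best first."""
    scores = {name: token_accuracy(gold, pred) for name, pred in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder labels for two short sentences (hypothetical, not project data).
GOLD = [["NOUN", "VERB", "NOUN"], ["PRON", "VERB"]]

# Hypothetical outputs from two models under comparison.
OUTPUTS = {
    "model_a": [["NOUN", "VERB", "NOUN"], ["PRON", "NOUN"]],
    "model_b": [["NOUN", "NOUN", "NOUN"], ["NOUN", "VERB"]],
}

if __name__ == "__main__":
    for name, score in rank_models(GOLD, OUTPUTS):
        print(f"{name}: {score:.2f}")
```

Token-level accuracy is only one possible measure. Tasks such as named entity recognition are more commonly scored with span-level precision, recall and F1.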
Related projects
African_UD: Universal Dependencies Treebank for African Languages
Increasing the representation of African languages in NLP by creating quality datasets for eleven African languages
Related research group
Infrastructure for African Languages
Increasing representation of African languages in NLP, LLMs, and AI
Related events
African Languages in the Age of AI (AAA) Speaker Series
Bringing leading scholars to Princeton to discuss the opportunities and challenges for developing technologies that empower African languages
Team
Project Director
Project Advisor
Grants
2023–
Staff Project
2020–2021
NEH Institutes for Advanced Topics in the Digital Humanities