Computational Approaches to Nigerian Literature

Experiments in NLP for texts in Yoruba and Efik

AI/ML
African and African Diaspora Studies
Computer Science
Digital Research Infrastructures
Multilingual
Natural Language Processing

Despite growing interest in digital humanities approaches to African literary studies, the lack of robust linguistic resources and tools for African languages remains a significant barrier to computational research, particularly for methods that harness the latest AI technologies. Thus, while digital humanities analyses of literature in languages such as English, German, French, and Spanish continue to evolve, computationally enabled scholarship on texts in African languages remains almost nonexistent.

This project seeks to bring African literature into computational literary studies by focusing on texts in Yoruba and Efik, two of the languages spoken in Nigeria. We aim to create new data, workflows, and models, and to leverage and adapt existing ones, so that Nigerian literature can be studied at scale.

The project builds on work started during the New Languages for NLP: Building Linguistic Diversity for the Digital Humanities project (2020–2021). In this phase, we will:

1) test the accuracy of multilingual pre-trained language models (both massively multilingual and Africa-centric) on Yoruba and Efik literary texts;

2) enhance our corpus by annotating a set of five novels by Daniel O. Fagunwa, the pioneer of the Yoruba novel, published between 1938 and 1961: Ògbójú ọdẹ nínú igbó Irúnmọlẹ̀, Igbó Olódùmarè, Ìrèké oníbùdó, Ìrìnkerìndó nínú igbó elégbèje, and Àdììtú Olódùmarè;

3) test existing tools for computational literary studies, such as BookNLP, on our corpus, and compare the results obtained on the Yoruba originals with those obtained on English translations of Fagunwa’s works.

Related projects

African_UD: Universal Dependencies Treebank for African Languages

Increasing the representation of African languages in NLP by creating quality datasets for eleven African languages

Related research group

Infrastructure for African Languages

Increasing representation of African languages in NLP, LLMs, and AI

Related events

African Languages in the Age of AI (AAA) Speaker Series

Bringing leading scholars to Princeton to discuss the opportunities and challenges for developing technologies that empower African languages

Team

Project Director

Utitofon Inyang
Temitayo Olatoye

Project Advisor

Grants

2023–

Staff Project

2020–2021

NEH Institutes for Advanced Topics in the Digital Humanities