Tracking the Sounds of Early Chinese Texts

Excerpt from the Shuowen guyin pu 説文古音譜 by Liu Ze 劉賾, published in 1963. Like the DIRECT Project, this work deals with Old Chinese phonology; it is based on a system by Huang Kan 黃侃.

How did puns and clichés function in ancient societies? The Digital Intertextual Resonances in Early Chinese Texts (DIRECT) Project helps scholars of ancient China ask questions like this one and others besides: What did ancient Chinese sound like? And how can similar sounds in ancient Chinese texts obscure those texts’ original meanings for modern readers?

One of the main challenges of working with classical Chinese texts, according to DIRECT’s website, is that “while the sounds of the Chinese language changed over time, the characters did not change.” As a result, modern scholars reading ancient texts may miss connections that would have been obvious to ancient readers. That’s where DIRECT comes in: an open-source, Python-based resource created by Princeton East Asian Studies PhD students Gian Rominger and John O’Leary.

Gian Rominger started the project as a seminar paper exploring how intertextual links could be conceived within the Chuci 楚辭, a collection of classical Chinese poetry. Because the characters stayed the same while their sounds changed, such links can be hard for modern scholars to spot. With the DIRECT Project, Rominger wanted to focus on how written Chinese, although its characters seem unchanged, differed from modern Chinese in semantics, grammar, and pronunciation, a relation comparable to that between Latin and Italian.

According to Rominger, the DIRECT Project has two main goals: 1) to compare two texts by phonetic similarity rather than by their characters, and 2) to identify sound-based text-structuring devices with the same toolkit. To achieve these goals, Rominger built a database of the Chinese characters used in ancient texts, organized by their reconstructed phonetic properties. “For that, I upheld the basic rules of historical phonology,” he added.
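Both goals can be illustrated with a minimal Python sketch. It is not DIRECT’s code and rests on several assumptions: a lookup table maps each character to a reconstructed (onset, rime) pair; two syllables count as “similar” only when both parts match exactly, a simplification of whatever similarity measure the project actually uses; and rhyme stands in as one example of a sound-based structuring device. The single capital letters are placeholders for characters, their sound values are invented rather than real Old Chinese reconstructions, and the names `SOUNDS`, `to_sounds`, `shared_sound_ngrams`, and `rhyme_groups` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy table: placeholder "characters" (capital letters) mapped to
# invented (onset, rime) pairs. These are NOT real Old Chinese reconstructions.
SOUNDS = {
    "A": ("k", "a"), "B": ("l", "am"), "C": ("t", "ak"), "D": ("ng", "a"),
    "E": ("k", "a"), "F": ("l", "am"), "G": ("p", "u"), "H": ("t", "ak"),
}


def to_sounds(text):
    """Map each character to its (onset, rime) pair, or None if it is not in the table."""
    return [SOUNDS.get(ch) for ch in text]


def shared_sound_ngrams(text_a, text_b, n=2):
    """Find runs of n consecutive syllables that sound identical in both texts,
    regardless of which characters are used to write them."""
    sa, sb = to_sounds(text_a), to_sounds(text_b)
    # Index every fully known n-gram of syllables in text_a by its start position.
    index = defaultdict(list)
    for i in range(len(sa) - n + 1):
        gram = tuple(sa[i:i + n])
        if None not in gram:
            index[gram].append(i)
    # Look up each n-gram of text_b; n-grams with unknown syllables never match.
    matches = []
    for j in range(len(sb) - n + 1):
        for i in index.get(tuple(sb[j:j + n]), []):
            matches.append((i, j, text_a[i:i + n], text_b[j:j + n]))
    return matches


def rhyme_groups(lines):
    """Group line numbers by the rime of each line's final character,
    keeping only rimes shared by at least two lines."""
    groups = defaultdict(list)
    for k, line in enumerate(lines):
        sound = SOUNDS.get(line[-1]) if line else None
        if sound is not None:
            groups[sound[1]].append(k)
    return {rime: ks for rime, ks in groups.items() if len(ks) > 1}
```

With this toy table, `shared_sound_ngrams("ABCD", "EFGH")` returns `[(0, 0, "AB", "EF")]`: “AB” and “EF” are written with different characters but sound the same. `rhyme_groups(["AB", "CD", "GE"])` returns `{"a": [1, 2]}`, flagging the last two lines as rhyming. Indexing one text’s n-grams in a dictionary avoids checking every pair of positions across the two texts.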

Rominger initially estimated that he would be working with 6,000 to 7,000 characters used in early texts, entering them into a spreadsheet by hand. “Yet by now I’m sitting on roughly 13,000, and still adding.” To parse all of this information, Rominger teamed up with John O’Leary, a PhD student in East Asian Studies, and Nick Budak, a developer at the Center for Digital Humanities (CDH). Budak has been using his programming skills to reveal, and visualize digitally, that parts of ancient texts were reused throughout history more often than is widely thought. Having studied Asian Studies as an undergraduate, Budak could draw on his knowledge of Chinese as well as his coding experience. Still, he encountered challenges, including the fact that the primary sources are spread across a large number of disorganized texts with overlapping characters.
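A hand-built spreadsheet like this could be turned into a lookup structure roughly as sketched below, again in Python and under stated assumptions: the CSV columns `char`, `onset`, and `rime` are hypothetical, the sample rows are invented placeholders, and `Reading`, `load_readings`, and `homophone_sets` are illustrative names rather than part of DIRECT. Because a single Chinese character can have more than one reading, the table maps each character to a list of readings instead of a single value.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One reconstructed reading of a character (hypothetical fields)."""
    onset: str
    rime: str


# Hypothetical CSV export of a hand-built spreadsheet. The column names are
# assumptions and the values are invented placeholders, not real reconstructions.
SAMPLE_CSV = """char,onset,rime
A,k,a
B,l,am
A,g,a
E,k,a
"""


def load_readings(csv_text):
    """Map each character to every distinct reading listed for it, in file order."""
    table = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        reading = Reading(row["onset"].strip(), row["rime"].strip())
        if reading not in table[row["char"]]:  # skip duplicate rows
            table[row["char"]].append(reading)
    return dict(table)


def homophone_sets(table):
    """Invert the table: for each reading, list the characters that can have it,
    keeping only readings shared by two or more characters."""
    by_sound = defaultdict(list)
    for char, readings in table.items():
        for reading in readings:
            by_sound[reading].append(char)
    return {sound: chars for sound, chars in by_sound.items() if len(chars) > 1}
```

With the sample data, `homophone_sets(load_readings(SAMPLE_CSV))` returns `{Reading("k", "a"): ["A", "E"]}`, the kind of same-sound, different-character group that a sound-based comparison relies on.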

For Budak, one of the most exciting moments of the project came when Rominger ran a comparison between two texts at a conference at the University of Pennsylvania and found two strings of characters that share the same sounds even though their meanings differ. As a result, Budak says, “Now the analyses of scholars who have previously engaged with these texts might have to be reevaluated because of this discovery.” Discoveries like this only hint at the project’s potential.

The DIRECT team will present at conferences in 2020 and hopes to develop software that lets users search for any text and see it visualized across all Chinese languages. Eventually, Budak hopes to apply the project’s tools to texts in other languages, such as Old Tibetan. Rominger adds, “I hope at some point we can make this project more accessible for people to use and contribute. I’d also like to see the same tools applied to medieval Chinese, Chinese dialects, and other writing systems (e.g. hieroglyphics).” Extending the project to more languages would let scholars and linguists work across disciplines, exploring the relations among textual studies, linguistics, phonetics, Asian studies, and computer programming.


