What can algorithms do for the humanities? How can technology befriend Russian literature?
This story is crossposted on the website of the Slavic DH Working Group at Princeton.
On May 28, the “Slavic DH Workshop: Russian Literary Studies in the Digital Age,” sponsored by the Slavic DH Working Group at Princeton, showed that there are many ways to answer these questions. The presenters – Frank Fischer and Boris Orekhov, researchers at the Higher School of Economics Centre for Digital Humanities in Moscow – gave an interactive overview of some of the DH tools that could benefit a scholar of Russian literature. An audience of about twenty Slavists – from Princeton, as well as from Rutgers, Haverford College, Yale, Hunter College, Swarthmore College, and other area colleges and universities – gathered for a stimulating day of collaborative learning.
The presenters began by charting the history of Digital Humanities in Russia – from the modernist writer Andrei Bely’s predilection for quantitative methods of studying literature to Russia joining the European Association for Digital Humanities in 2017. In between, we learned about the uniquely Russian approach to computational methods in the humanities in the 20th century: from the philologist Boris Yarkho (1889-1942), who sought to bring scientific precision to his discipline; to the mathematician Andrei Kolmogorov’s (1903-1987) seminars; to the computer scientist Andrei Ershov’s (1931-1988) corpus of Russian (1978-2006); to the scholarly “History and Computers” Association. The session then pivoted to DH initiatives that exist in Russia today. These include the Russian National Corpus, digital libraries (such as Maksim Moshkov’s Library and the Russian Virtual Library), and Prozhito – a unique project that digitizes diaries, both past and contemporary, and is sustained by the dedication of volunteers. The Higher School of Economics alone co-sponsors the Moscow-Tartu Digital Humanities School and has recently started a new master’s program that trains digital humanists – all in addition to running the Centre for Digital Humanities.
The next session, “Programmable Corpora: A New Infrastructural Concept for Digital Literary Studies,” offered insight into some of the technologies used by the researchers at the Centre. The term “programmable corpora” refers to the drama corpora hosted by the Drama Corpora Project, or DraCor. The platform, which is slated for expansion, allows users to create network graphs from data, gathered from around the web, on German, Russian, Greek, and Spanish plays, as well as plays by Shakespeare. The data makes it possible, for example, to visualize how characters engage with each other throughout a play; as Fischer pointed out, the graphs featured on DraCor incorporate even the characters left off dramatis personae lists. Fischer also showed how one could create one’s own graph: all that is required is the software RStudio and a metadata file that can be downloaded from DraCor. When it comes to using these graphs in teaching, sometimes a computer is not even necessary: Fischer introduced the audience to a card game with the self-explanatory title Brecht Beats Shakespeare! – A Card-Game Introduction to the Network Analysis of European Drama, which invites participants to compare graphs based on different plays.
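To make the idea of a character network concrete, here is a minimal Python sketch. It is not DraCor's own code or data format: it assumes, for illustration only, that two characters are linked whenever they appear in the same scene, and the toy play, the character names, and the helper functions `build_network` and `degrees` are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_network(scenes):
    """Map each unordered pair of characters to the number of scenes they share.

    `scenes` is a list of lists of character names (an assumed, simplified
    input format, not the format of any DraCor download).
    """
    edges = Counter()
    for speakers in scenes:
        # Sort and deduplicate so that each pair has a single canonical form.
        for a, b in combinations(sorted(set(speakers)), 2):
            edges[(a, b)] += 1
    return edges


def degrees(edges):
    """Count how many distinct characters each character is linked to."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


# Hypothetical four-scene play, invented for this example.
toy_scenes = [
    ["Anna", "Boris"],
    ["Boris", "Vera", "Servant"],
    ["Anna", "Vera"],
    ["Anna", "Boris"],
]

network = build_network(toy_scenes)
character_degrees = degrees(network)

for (a, b), weight in sorted(network.items()):
    print(f"{a} - {b}: {weight} shared scene(s)")
```

Because the network is built from who is actually on stage rather than from a cast list, a minor figure such as the hypothetical "Servant" still shows up as a node, which mirrors Fischer's point about characters left off dramatis personae lists.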
Data visualization is key to another DH initiative – the monumental corpus of Leo Tolstoy’s collected works, introduced by Boris Orekhov in the next installment of the workshop. The fruit of a collaboration between the Tolstoy Museum in Moscow and the Higher School of Economics, the corpus, which comprises 90 downloadable volumes of Tolstoy’s writings, is equipped with a searchable index of the proper names that occur in Tolstoy’s texts. A search for a proper name yields not only links to the volumes where it occurs, but also a graph that shows what other names are mentioned alongside it in Tolstoy’s corpus. This feature, as Orekhov pointed out, could be invaluable to researchers seeking to take the full measure of Tolstoy’s erudition. A word cloud showing the most common proper names is another useful feature of the corpus, especially given that each word is clickable. Although the index does not include fictional proper names, these can be found through another search tool, which is tied to the same 90-volume corpus and which can be used to search for any word, whether or not it is a proper name. The search can be customized using several parameters, including, helpfully, the option of limiting it to Tolstoy’s letters only, or to his fiction only.
If humans need such signposts to navigate vast amounts of text, neural networks can simply plough through them. The last presentation, “Neural Network Poetry Meets Distant Reading: Analyzing Computer-Generated Echoes of Russian Literary History,” focused on what happens when a neural network analyzes the body of work of a particular poet (or poets) and then creates poetry of its own. As Boris Orekhov noted, computer-generated poetry in itself is not new, its history reaching back to the 1940s; and yet, he stressed, there is a fundamental difference between the “creative efforts” of neural networks and the computer poetry that existed previously. Before the advent of neural networks, the machine would compose poetry in an arbitrary manner – either by jumbling phrases from the work of a particular writer or by stringing random words on a preprogrammed grammatical frame. Neural networks, on the other hand, dispense with arbitrariness: they identify the recurring features of the corpus they analyze. As we learned from the session, although networks do not fare particularly well when they are fed narrative poetry, they are vastly more successful when working with experimental, rule-bending texts – such as the poetry of N.M. Azarova. In this latter case, the neural network yielded poems strikingly similar to Azarova’s texts, down to their alliteration. This is why one of Orekhov’s suggestions for how to use such poems in teaching was to ask students to identify whose works “inspired” a neural network.
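The older, pre-neural approach of stringing random words on a preprogrammed grammatical frame can be sketched in a few lines of Python. This is an illustration of the general technique only, not a reconstruction of any historical program: the frame, the tiny English lexicon, and the function names `compose_line` and `compose_poem` are all invented for the example.

```python
import random

# Hypothetical lexicon: a few words per part of speech, invented for this sketch.
LEXICON = {
    "ADJ": ["silent", "golden", "distant"],
    "NOUN": ["river", "night", "garden"],
    "VERB": ["sleeps", "burns", "waits"],
}

# The "preprogrammed grammatical frame": every line is adjective + noun + verb.
FRAME = ["ADJ", "NOUN", "VERB"]


def compose_line(frame, lexicon, rng):
    """Fill each slot of the frame with a randomly chosen word of that category."""
    return " ".join(rng.choice(lexicon[slot]) for slot in frame)


def compose_poem(n_lines, seed=0):
    """Compose n_lines lines; a fixed seed makes the result repeatable."""
    rng = random.Random(seed)
    return [compose_line(FRAME, LEXICON, rng) for _ in range(n_lines)]


poem = compose_poem(4)
print("\n".join(poem))
```

Every line is grammatical because the frame guarantees it, but the word choices are arbitrary. Nothing in the program is learned from a poet's corpus, which is the contrast with neural networks that Orekhov drew.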
The conversation about the educational uses of neural networks was a fitting coda to the day. Indeed, the “Slavic DH Workshop” has shown that digital humanities not merely promise to diversify our teaching, learning, and research practices, but also deliver on that promise.
The slides of all four presentations are provided by Frank Fischer and are available at https://twitter.com/umblaetterer/status/1133360545523015681.