Paper Journals, Digital Models: New Frontiers in Slavic DH

17 December 2020


[Editor’s note: This post is part of our series on multilingual DH at the CDH. The series kicked off with our announcement that the NEH had awarded funding to “New Languages for NLP: Building Linguistic Diversity in the Digital Humanities,” which will be hosted by the CDH beginning in spring 2021.]

Like scholars across all of academia, Slavic digital humanists have been forced to transition into a virtual space over the past few months. Instead of gathering in person, the DH world has regrouped online, transforming its periodic meetings into Zoom get-togethers. Perhaps appropriately, then, much of our work has been devoted to virtualizing another episodic phenomenon: we have been digitizing a slate of Russian periodicals.

My name is Alexander Jacobson, and I am a fourth-year graduate student in the Department of Slavic Languages and Literatures. Over the past few months, I have been the graduate student liaison to Pages of Early Soviet Performance, a project of the Princeton Slavic Digital Humanities Working Group. Comprising scholars, librarians, and coders from two continents, the group has devoted considerable effort to analyzing a corpus of both pre- and post-revolutionary Russian journals, a database sourced from Princeton University Library’s rich collections. Fundamentally, we have aimed to make this database usable for research, transforming it from a collection of images into a dataset ripe for intellectual exploration.

I was brought into the project because of my dissertation, in which I attempt to decode the importance of materiality for books and other forms of textual publication. I argue for the importance of physical form in textual consumption, showing that features such as imprints, paper, and bibliographic dimensions (quarto, folio, etc.) shaped the interpretation of Russian texts writ large. Accordingly, the hope was that I could advocate for this point, providing a voice that would foreground the original materiality of our dataset. Further, I had previously worked on the publishing history of Russian periodicals of this era; the group hoped that my familiarity with this frankly esoteric field could also be of use.

When I finally joined the working group, though, I quickly realized that I would need to acclimate to the dynamics inherent to a large-scale, multinational DH project. Coming from the traditional humanistic world, where individual research questions undergird all large projects (like dissertations), I was shocked by the actual direction of our project. Perhaps foolishly, I had assumed that we would begin with a set of potential research questions, hypotheses about journals or journal production, and subsequently develop tools that would render our conclusions self-evident.

Instead, the group worked toward a vastly different goal. Rather than pursuing individual hypotheses, we aimed to create a sort of base for future scholarship, a corpus of material – and, perhaps more importantly, a toolkit – that could be easily employed by future scholars.

Specifically, we decided to divide journals into three constituent parts: text, images, and what we termed “intertext,” paratextual features (like advertisements, imprints, cover pages, etc.) that employed text, graphic design, and position within the journal to impart meaning. We aimed to record both the textual content of these features and their location within the journal, which we hoped would allow future scholars to analyze both these journals’ text and their physical layout.
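To make this three-part division concrete, here is a minimal sketch in Python of what one such record might look like. It is purely illustrative and is not the working group's actual schema: the names `RegionKind`, `Region`, and `area_by_kind`, as well as the choice to store positions as fractions of the page, are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class RegionKind(Enum):
    """The three constituent parts described above."""
    TEXT = "text"
    IMAGE = "image"
    INTERTEXT = "intertext"  # advertisements, imprints, cover pages, etc.


@dataclass(frozen=True)
class Region:
    """One labeled area of a journal page (hypothetical record layout)."""
    kind: RegionKind
    page: int          # page number within the issue
    x: float           # left edge, as a fraction of page width (assumption)
    y: float           # top edge, as a fraction of page height (assumption)
    width: float       # fraction of page width
    height: float      # fraction of page height
    content: str = ""  # transcribed text, empty for pure images

    def area(self) -> float:
        """Share of the page that this region covers."""
        return self.width * self.height


def area_by_kind(regions):
    """Sum the page area covered by each kind of region."""
    totals = Counter()
    for region in regions:
        totals[region.kind] += region.area()
    return dict(totals)


# Invented sample data for two pages of a hypothetical issue.
regions = [
    Region(RegionKind.INTERTEXT, 1, 0.0, 0.0, 1.0, 0.25, "cover title"),
    Region(RegionKind.IMAGE, 1, 0.0, 0.25, 1.0, 0.5),
    Region(RegionKind.TEXT, 2, 0.0, 0.0, 0.5, 1.0, "article"),
    Region(RegionKind.INTERTEXT, 2, 0.5, 0.5, 0.5, 0.5, "advertisement"),
]

totals = area_by_kind(regions)
```

Because each record carries both content and position, a later scholar could ask layout questions (how much of an issue is given over to advertisements, say) as easily as textual ones.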

In a sense, this labor resembled the traditional task of the academic librarian. In subdividing journals into constituent parts and recording both the location and content of these features, we effectively created a new set of metadata, information that offered a deep window into the character and nature of these periodicals. Like librarians, too, we aimed to create a scaffolding for future scholarship; our work was not to investigate these journals, but to undergird future investigations.

The metadata-forward nature of our work reflected my first big surprise about this project – the idea that teamwork is not an incidental feature of digital humanities, but rather deeply bound to its output. On a project of this scale, where no one scholar possesses the expertise – or the free time – to analyze these journals alone, teamwork is an unavoidable necessity. When teams are assembled, though, the diversity of their members’ interests creates a whole different from the sum of its parts. Rather than engaging in directed, strictly defined scholarship, teams work towards a sort of common medium, allowing each individual to subsequently pursue their own research questions. In our context, such an approach led to a metadata-centric project, one devoted to creating data potentially useful for a variety of projects.

This orientation, where scholars work towards knowledge production outside of their particular wheelhouses, is a structural consequence of teamwork; pursuing any single avenue would be necessarily unfair to other teammates. However, this should not be thought of as a drawback; instead, it is a deep and exciting boon. This teamwork forces humanists into an unfamiliar, but productive type of work; in a sense, it necessitates that we broaden our work patterns and acclimate to a collective form of academic production. In a field predicated upon the lone library-bound scholar, this is unfamiliar, but it complements our traditional style of work. Such labor eventually enables further research projects, investigations which employ DH-generated datasets to produce otherwise inaccessible knowledge.

Further, working alongside a group of scholars – including highly proficient coders – led to my second surprise related to this project: as someone who by definition works with analogue objects, I had failed to appreciate the power and accuracy of contemporary computational tools. Rather than subdividing journals into text, image, and intertext ourselves, we trained computer models to analyze our corpus for us. Each member of our team was asked to categorize roughly fifty pages of material, which were subsequently used as a nucleus for training a computer model to duplicate our labor. Happily, our model took to our data and displayed a tremendous degree of accuracy, quickly providing us with highly productive results.

Given that our dataset comprised thousands of pages, a scale infeasible for purely human analysis, I have been shocked by the efficacy and utility of this model. I was particularly astounded by the ease of setting it up – as I recall, between the manual categorization referred to above and the coding of the model, we obtained our first results after approximately twenty hours of labor.

This ease is particularly incredible in light of the effort required by traditional metadata. Unlike traditional bibliographic descriptors, which were formulated through the expenditure of vast effort by thousands of cataloguers, our model developed a set of metadata with an extremely limited amount of human labor. Further, our description went far deeper than traditional metadata does; we aimed not for an impressionistic, but for a wholesale description of the objects under our purview. This dynamic has astounded me and has opened my eyes to a transformative future, one where computational models both expand and facilitate bibliographical work on the part of libraries themselves.

So, in sum, working on this project has been fantastically exciting for me, a graduate student reared in the humanities. I was accustomed to working solo on my own projects, and digital humanities has forced me into a radically different academic paradigm, one where cooperation and compromise are the name of the game. Further, it has granted me a window into the cutting-edge world of computational possibilities, showing the amazing potential of easily developed computer models. Altogether, this has been a tremendous experience, and I’m phenomenally excited to continue my work with this group moving forward.

Carousel image: Cover of issue 3 of the periodical 30 Days [Princeton Slavic Collections]