Bridging Digital Humanities Internal And Open Source Software Projects Through Reusable Building Blocks


The following is content associated with a poster presented at Digital Humanities 2018. The full abstract appears below and is also available on the conference abstracts site.

Why Write Reusable Code?

  • Do one thing and do it well for maximum reusability
  • Take advantage of synergy across projects with similar functionality
  • Test and maintain code more easily
  • Minimize or make explicit assumptions embedded in code that could impact research conclusions
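Here is a hypothetical illustration of the last point. The short Python sketch below moves one research-relevant assumption, whether undated items are counted at all, out of a silent filter buried in the code and into an explicit keyword argument. The function name, its parameter, and the sample years are invented for this example and are not taken from the projects discussed here.

```python
from collections import Counter
from typing import Iterable, Optional


def count_by_decade(
    years: Iterable[Optional[int]],
    include_undated: bool = False,
) -> Counter:
    """Count publication years per decade (e.g. '1960s').

    How undated items are treated is a documented parameter rather
    than a hidden filtering step, because it can change the result.
    """
    counts: Counter = Counter()
    for year in years:
        if year is None:
            # The assumption is surfaced here: count undated items or skip them.
            if include_undated:
                counts["undated"] += 1
            continue
        decade = year - year % 10
        counts[f"{decade}s"] += 1
    return counts
```

For example, `count_by_decade([1967, 1962, 1972, None])` counts two items for the 1960s and one for the 1970s. Passing `include_undated=True` adds an `"undated"` count of one. Because the choice is part of the signature, every project that reuses the function states its assumption at the call site.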

Focus new development on research-specific innovation.
Development activity over one year for the Derrida's Margins project codebase and its dependencies
Modules developed and released alongside the primary project benefit other projects
Development activity over one year for the Shakespeare and Company Project codebase and its dependencies
Development activity of projects and dependencies over one year

Emergent Packages

  • Identify common tasks that repeat across projects
  • Implement project-specific code with an eye towards generalization
  • Extend to a reusable software package based on multiple use cases and a confirmed need


Software family tree

Possible Future Packages

  • Django Solr Indexable (created for the Princeton Prosody Archive, PPA; expected to be reused)
  • GeoNames code
  • Database footnotes - generic footnoting across all database content that allows for conflicting evidence
  • Bibliographic description - a minimal library catalog for projects that need descriptions of books
  • Multi-author semantic blog component for editorial content
  • Django unAPI for Zotero support (implemented in PPA)
  • Reusable JavaScript components: search form submission, date histogram
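The "Database footnotes" idea listed above can be sketched as a small data structure. The framework-free Python below is an illustration built on assumptions, not the planned package. Each hypothetical `Footnote` points at any record by a (record type, primary key) pair, much as Django's contenttypes framework lets a `GenericForeignKey` point at rows of any model. Conflicting evidence is represented as several footnotes on the same field that support different values. All class, field, and method names are invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical key for "any record in any table": (record type, primary key).
RecordKey = Tuple[str, int]


@dataclass(frozen=True)
class Footnote:
    record: RecordKey  # the database record this note documents
    field_name: str    # which field of that record the evidence concerns
    value: str         # the value this source supports
    source: str        # citation for the evidence


@dataclass
class FootnoteIndex:
    notes: List[Footnote] = field(default_factory=list)

    def add(self, note: Footnote) -> None:
        self.notes.append(note)

    def for_record(self, record: RecordKey) -> List[Footnote]:
        """All footnotes attached to one record, in the order added."""
        return [note for note in self.notes if note.record == record]

    def conflicts(self, record: RecordKey) -> Dict[str, List[Footnote]]:
        """Fields of a record for which sources support different values.

        Conflicting notes are kept side by side rather than resolved,
        so the disagreement itself stays visible as evidence.
        """
        by_field: Dict[str, List[Footnote]] = {}
        for note in self.for_record(record):
            by_field.setdefault(note.field_name, []).append(note)
        return {
            name: notes
            for name, notes in by_field.items()
            if len({note.value for note in notes}) > 1
        }
```

Keeping every claim and its source, rather than overwriting a field with a single "correct" value, lets a project display a disputed date or attribution alongside the sources that disagree.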

Reusable packages in use across multiple projects



Abstract

Software development is often an integral aspect of Digital Humanities projects. By working to generalize and build small modules or utilities that target specific needs, rather than large-scale systems, DH software developers can create tools with greater potential for scholarly reuse. This should enable more rapid development on future projects and allow developers to focus on innovative work. This poster presents a case study of modular software developed as part of ongoing DH projects.

There is a tendency among some institutions, particularly libraries, to adopt existing large-scale open source software solutions and adapt them for local needs. As Hector Correa points out, however, this approach skips the work of thinking carefully about users and local needs (Correa, 2017). If large-scale software solutions developed by coalitions of libraries are problematic even where needs are broadly similar and only content structures or workflows differ (Princeton University Library Systems, 2017), the problem is compounded for research software, which is much more likely to be bespoke to a particular problem. As Correa argues, single-purpose software is less complex and easier to understand and manage. Understanding the logic of code is crucial for research that is based on or otherwise makes use of software (Koeser, 2015).

Applying best practices from software development such as modular design can mitigate these problems through an emphasis on delivering working components of software and focusing on simplicity of purpose—a single, well-honed and balanced knife rather than a multi-tool with every imaginable attachment.  This approach is consistent with the design philosophy from one of the greatest success stories of modern open-source software, UNIX and its derivatives (Raymond, 2003).

There are certainly possible drawbacks and concerns about this approach. It may require more effort, and perhaps different skills, to create, release, and manage independent software packages or modules. According to Glass's Facts and Fallacies of Software Engineering, it is “three times as difficult to build reusable components as single use components” (Glass, 2003: 49). In our case, new software modules were developed and extended in tandem with an existing project, so finalizing a new release of that project also meant releasing and publishing several modules. There is also a danger of generalizing too soon. Another familiar rule of thumb in software is that you have to do something three times before you know how to generalize it properly (Glass, 2003).

As a case study, our poster presents an overview of the software written for two annotation projects developed at the same time. “Derrida’s Margins” analyzes the work of Jacques Derrida through the references in De la grammatologie and the corresponding annotations in the books he cited. “The Winthrop Family on the Page” examines a community of readers connected through books over time via their annotations. This software ecosystem includes two project codebases (Koeser et al., 2018; Koeser and Hicks, 2018b) that make use of four new reusable components (Koeser and Hicks, 2018a; Koeser, 2018b; Koeser, 2018a; Koeser and Hicks, 2018c). Two of these components (Koeser, 2018a; Koeser and Hicks, 2018c) were adapted from the “Readux” codebase (Koeser et al., 2017), which was previously developed at Emory University. In the process, we also used and made minor updates to a related, pre-existing module (Koeser, 2018c).
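The pre-existing module mentioned above (Koeser, 2018c) is a Python library for generating and parsing IIIF Image API URLs. The Python below is a rough sketch of what such a single-purpose helper does; it is explicitly not piffle's own API. It models the Image API URL pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The class and method names are hypothetical, and the defaults assume Image API 2.x, which accepts size `full` for a full-size image (Image API 3.0 uses `max` instead).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IIIFImageURL:
    """Hypothetical model of an IIIF Image API request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}
    Defaults follow Image API 2.x conventions.
    """
    base: str            # scheme, server, and optional prefix
    identifier: str      # must already be URL-encoded (e.g. "/" as %2F)
    region: str = "full"
    size: str = "full"
    rotation: str = "0"
    quality: str = "default"
    fmt: str = "jpg"

    def __str__(self) -> str:
        return "/".join([
            self.base.rstrip("/"), self.identifier, self.region,
            self.size, self.rotation, f"{self.quality}.{self.fmt}",
        ])

    def with_width(self, width: int) -> "IIIFImageURL":
        # Size "w," asks the server to scale to this width, keeping aspect ratio.
        return replace(self, size=f"{width},")

    @classmethod
    def parse(cls, url: str) -> "IIIFImageURL":
        # The last five path segments are the image parameters;
        # everything before them is the base URL.
        base, identifier, region, size, rotation, last = url.rsplit("/", 5)
        quality, fmt = last.rsplit(".", 1)
        return cls(base, identifier, region, size, rotation, quality, fmt)
```

For example, `str(IIIFImageURL("https://example.org/iiif", "abc123").with_width(200))` produces `https://example.org/iiif/abc123/full/200,/0/default.jpg`, and `parse` recovers an equal object from that string. The right-to-left split only works because the Image API specification requires slashes inside identifiers to be encoded.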

For each of these tools, a use case emerged in one project that could be generalized for other projects, with potential for broader reuse. For example, “viapy”, a Python module for searching VIAF and providing VIAF data to a web framework, was adapted from previous work. It first existed as code within one of the annotation projects, but it proved generalizable. In fact, it proved easier to extract it as a reusable component than to duplicate it: one project team discovered a previously undetected bug, and having a reusable package allowed us to fix the problem once for both projects. Likewise, code for storing and displaying annotations from the Readux project was ripe for repackaging as a general module because of its relatively narrow purpose, despite the different intellectual aims of these projects. However, these codebases also contain similar, potentially reusable functionality that is not yet ready for generalization.
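Below is a minimal sketch of the kind of logic a shared VIAF helper centralizes, so that a fix lands in one place for every project that depends on it. The function names are hypothetical, and this is not viapy's API. The URI pattern `http://viaf.org/viaf/{id}` is VIAF's standard form. The response shape is an assumption modeled on VIAF's AutoSuggest service: a `"result"` list of items with `"term"`, `"viafid"`, and `"nametype"` keys, possibly null when nothing matches.

```python
from typing import Any, Dict, List, Optional

# VIAF identifiers resolve at URIs of the form http://viaf.org/viaf/{id}
VIAF_BASE = "http://viaf.org/viaf/"


def viaf_uri(viaf_id: str) -> str:
    """Build a VIAF URI from a bare identifier."""
    return VIAF_BASE + viaf_id.strip()


def suggestions(
    response: Dict[str, Any], nametype: Optional[str] = None
) -> List[Dict[str, str]]:
    """Turn an (assumed) AutoSuggest-style payload into label/URI pairs
    for an autocomplete widget, optionally keeping one name type only."""
    # Assumption: matches sit under "result", which may be null or absent.
    results = response.get("result") or []
    return [
        {"label": item["term"], "uri": viaf_uri(str(item["viafid"]))}
        for item in results
        if nametype is None or item.get("nametype") == nametype
    ]
```

Guarding against a missing or null result list is the sort of edge case that is easy to handle in one project and forget in another. With the logic in a shared package, every project that uses it gets the same behavior.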

These projects provide a view into the ongoing process of balancing customized solutions for DH projects with generalizing focused portions of functionality. Modular design aimed at ‘doing one thing and doing it well’ makes it possible to build an ecosystem of reusable packages that are widely useful and applicable, and that can contribute to a larger community of open source and DH software research.


Bibliography

Correa, H. (2017). Build Your Own Software. Hector Correa. http://hectorcorrea.com/blog/build-your-own-software/70 (accessed 28 November 2017).

Glass, R. L. (2003). Facts and Fallacies of Software Engineering. Addison-Wesley Professional.

Koeser, R. S. (2015). Trusting Others to ‘Do the Math’. Interdisciplinary Science Reviews, 40(4): 376–92. doi:10.1080/03080188.2016.1165454. https://doi.org/10.1080/03080188.2016.1165454.

Koeser, R. S. (2018a). Django-Annotator-Store: Django Application to Act as an Annotator.js 2.x Annotator-Store Backend. Python. Center for Digital Humanities at Princeton. https://github.com/Princeton-CDH/django-annotator-store.

Koeser, R. S. (2018b). Viapy: VIAF via Python. Python. Center for Digital Humanities at Princeton. https://github.com/Princeton-CDH/viapy.

Koeser, R. S. (2018c). Piffle: Python Library for Generating and Parsing IIIF Image API URLs. Python. Center for Digital Humanities at Princeton. https://github.com/Princeton-CDH/piffle.

Koeser, R. S., Glover, K., Li, Y., Varner, J. and Thomas, A. (2017). Readux: Django Web Application to Display, Annotate, and Export Digitized Books in a Fedora Commons Repository. JavaScript. Emory Center for Digital Scholarship. https://github.com/ecds/readux.

Koeser, R. S. and Hicks, B. W. (2018a). Django-Pucas: Django App to Streamline CAS Auth and Populate User Attributes from LDAP. Python. Center for Digital Humanities at Princeton. https://github.com/Princeton-CDH/django-pucas.

Koeser, R. S. and Hicks, B. W. (2018b). Winthrop-Django: Django Web Application for the Winthrop Family on the Page Project. Python. Center for Digital Humanities at Princeton. https://github.com/Princeton-CDH/winthrop-django.

Koeser, R. S. and Hicks, B. W. (2018c). Djiffy: Django Application to Index and Display IIIF Manifests for Books. Python. Center for Digital Humanities at Princeton. https://github.com/Princeton-CDH/djiffy.

Koeser, R. S., Hicks, B. W., Glover, K. and Budak, N. (2018). Derrida-Django: Django Web Application for Derrida’s Margins. Python. Center for Digital Humanities at Princeton. https://github.com/Princeton-CDH/derrida-django.

O’Sullivan, J., Jakacki, D. and Galvin, M. (2015). Programming in the Digital Humanities. Digital Scholarship in the Humanities, 30(suppl_1): i142–47. doi:10.1093/llc/fqv042. https://academic.oup.com/dsh/article/30/suppl_1/i142/364055 (accessed 28 November 2017).

Princeton University Library Systems (2017). Valkyrie. Princeton University Library Systems. https://pulibrary.github.io/2017-07-06-valkyrie (accessed 28 November 2017).

Raymond, E. S. (2003). The Art of Unix Programming. Addison-Wesley Professional. http://proquest.safaribooksonline.com/book/operating-systems-and-server-administration/unix/0131429019.

Koeser, R. S., Hicks, B. W., Glover, K., Budak, N., Li, X. and Bauer, J. (2018). Princeton-CDH/Derrida-Django: V1.1. Zenodo. doi:10.5281/zenodo.1299972. https://zenodo.org/record/1299972#.WzZ-BxJKjMU (accessed 29 June 2018).


Acknowledgements

Thanks to Dr. Nora Benedict for the Spanish translation.


Keywords: software development, DH2018


Rebecca Sutton Koeser, Benjamin W. Hicks
