Reading Group

Collections as Data Reading Group

In 2018-2019, the CDH Reading Group will partner with the Princeton University Library to explore the topic of "Collections as Data," and consider how Princeton’s library collections can be leveraged to support computationally-driven research and teaching. We invite members from the Princeton research community who play various roles in the creation, dissemination and use of library collections. Through short readings, discussions, presentations and hands-on activities, this group will identify ways that PUL collections are currently being exposed as data, and explore ways to better coordinate efforts to support and sustain cutting-edge data-driven scholarship at Princeton.

Topics for discussion may include:

  • What data does PUL currently make available?
  • Who are the main audiences on campus using PUL collections as data? Who else do we want to reach?
  • What are examples of research outcomes using library collections data?
  • What are the different types of data that we can make available?
  • What data does the library produce that is hidden or siloed?
  • How are changes in the cataloging and metadata profession transforming the data libraries produce?
  • How do we address the legal issues associated with data use and re-use (e.g. copyright, licensing, negotiating with vendors)?
  • What new workflows, roles and services are needed to make our collections accessible as data?
  • What new skills or training would be needed to support the research use of collections as data? Who would provide that support?
  • What are the barriers to making and using collections as data?

Our group will also read Safiya Noble's Algorithms of Oppression: How Search Engines Reinforce Racism, in advance of the author's public talk at Princeton on Thursday, Dec. 6.

Meetings take place every few weeks during the semester, on Wednesdays, 12-1:20pm, at the CDH on B Floor of Firestone. Lunch will be provided. Discussion topics and short readings will be posted on the CDH events page in advance of each meeting. Drop-ins are welcome!


Inspiration for Collections as Data comes from initiatives led by the Library of Congress and the IMLS-funded Always Already Computational: Collections as Data, which aims to promote conversations among librarians, archivists, and museum curators to develop a framework for creating and sharing cultural heritage collections as data and a community for developing best practices.


Schedule & Topics


  • Wed Oct 10 - What data does PUL produce? Moderated by Jim Casey.
    • Contact for access to a collectively-generated list. For examples of data and datasets produced at other institutions, see the Always Already Computational group's Data Facets.


  • Wed Oct 24 - Who creates the data? Moderated by Thomas Keenan and Don Thornbury
    • Who are data creators/producers?
    • Who are/will be the data stewards?
    • How do changes in the metadata & cataloging profession affect data/dataset creation?
    • How does the trend toward outsourcing metadata work affect our goals? How do we assess data quality when work is outsourced?
    • What new roles and workflows are needed to make collections available as data?
    • How do we value labor within data-driven collection work?


  • Wed Nov 14 - "Identifying our users: Who will use the data?" Moderated by Jim Casey and Nick Budak.

    Please take a look at the Collections as Data Personas (v2) online here: 

    Then, browse the list of 50 things you can do to get started in Collections as Data: We really like the randomizer. From the list, please come prepared to share one thing you think could be a useful first step in your work.

    Other motivating questions for discussion: 

    • Who do we imagine as users of PUL collections as data?
    • What are examples of library data being used for computational purposes? Success or failure stories.
    • How do we make collections as data accessible to a variety of users, including non-technical users?
    • How do we use data & digital tools to promote learning, engagement and innovation? How do we promote/encourage the use of PUL collections as data?
    • How do we train Princeton researchers to use collections as data?


  • Wed Dec 5 - Ethical & cultural issues/Algorithms of Oppression  

We'll also have a special presentation by members of the Mudd Manuscript Library about the Archives Research and Collaborative History (ARCH) program.



Spring semester:

Systems and services: How do we do it?

  • How do we leverage existing library tools (Figgy, etc.) to make sure data is available in usable formats?
  • What new platforms, tools and services need to be built to make collections available as data?
  • How do we design collections for Open Access?
  • How do we connect the different types of collections as data (library, archives, museums, smaller collections/projects)?
  • How do we ensure the stability and integrity of the data we produce?
  • How should researchers cite the data they use?


Wed Feb 13 - PUL Digital Library Infrastructure Demystified

For our first meeting of the Spring semester, the Collections as Data Reading Group turns to the topic of services and systems at PUL, and how they support - or could better support - work with data-driven collections.

We'll be joined by Esme Cowles, Software Development Manager at Digital Repository and Discovery Services (Library Information Technology, Imaging and Metadata Services), who will walk us through PUL's digital library infrastructure.


Wed Feb 27 - What's next for the Finding Aids? Session moderators Kelly Bolding and Faith Charlton (Rare Books and Special Collections)

To continue our discussion about PUL's systems and services, our next Collections as Data Reading Group session will delve into Princeton's Finding Aids site to investigate how Princeton's rich archival metadata can be used for computational research and analysis.

We will start with a special presentation by the PULFA 3.0 team, who will discuss the working group’s progress thus far and give some ideas about how Princeton's Finding Aids may evolve in the future. 


Wed April 3 - Services: Who makes Collections as Data work? 

In this meeting, we turn our attention to the people, roles and skills needed to make data-driven work on our collections possible.  

  • Where do researchers currently go to interact with PUL collections as data?
  • What services, roles and workflows does PUL currently have in place to make our collections available and useable as data? What new services, roles and workflows do we need?
  • What new skills or training would be needed to support the research use of collections as data? Who would provide that support?

Please look again at the Collections as Data: 50 Things You Can Do document.


Thursday June 6 - Services, continued; wrap-up and next steps