In 2018-2019, the CDH Reading Group will partner with the Princeton University Library to explore the topic of "Collections as Data," and consider how Princeton’s library collections can be leveraged to support computationally-driven research and teaching. We invite members from the Princeton research community who play various roles in the creation, dissemination and use of library collections. Through short readings, discussions, presentations and hands-on activities, this group will identify ways that PUL collections are currently being exposed as data, and explore ways to better coordinate efforts to support and sustain cutting-edge data-driven scholarship at Princeton.
Topics for discussion may include:
- What data does PUL currently make available?
- Who are the main audiences on campus using PUL collections as data? Who else do we want to reach?
- What are examples of research outcomes using library collections data?
- What are the different types of data that we can make available?
- What data does the library produce that is hidden or siloed?
- How are changes in the cataloging and metadata profession transforming the data libraries produce?
- How do we address the legal issues associated with data use and re-use (e.g. copyright, licensing, negotiating with vendors)?
- What new workflows, roles and services are needed to make our collections accessible as data?
- What new skills or training would be needed to support the research use of collections as data? Who would provide that support?
- What are the barriers to making and using collections as data?
Our group will also read Safiya Noble's Algorithms of Oppression: How Search Engines Reinforce Racism, in advance of the author's public talk at Princeton on Thursday, Dec. 6.
Meetings take place every three weeks during the semester, on Wednesdays, 12-1:20pm, at the CDH on B Floor of Firestone. Lunch will be provided. Discussion topics and short readings will be posted on the CDH events page in advance of each meeting. Drop-ins are welcome!
Inspiration for Collections as Data comes from initiatives led by the Library of Congress and the IMLS-funded Always Already Computational: Collections as Data, which aims to promote conversations among librarians, archivists, and museum curators to develop a framework for creating and sharing cultural heritage collections as data and a community for developing best practices.
Proposed Fall 2018 Schedule & Topics
- Wed Sept 26 - Introduction: the Santa Barbara Statement on Collections as Data on the Always Already Computational website.
- Wed Oct 10 - What data does PUL produce? Moderated by Jim Casey.
- Wed Oct 24 - Who creates the data? Moderated by Thomas Keenan and Don Thornbury.
- Who are data creators/producers?
- Who are/will be the data stewards?
- How do changes in the metadata & cataloging profession affect data/dataset creation?
- How does the trend toward outsourcing metadata work affect our goals? How do we assess data quality when work is outsourced?
- What new roles and workflows are needed to make collections available as data?
- How do we value labor within data-driven collection work?
- Wed Nov 14 - Identifying our users: Who will use the data? Moderated by Jim Casey and Nick Budak.
Please take a look at the Collections as Data Personas (v2) online here: https://collectionsasdata.github.io/personas/
Then, browse the list of 50 things you can do to get started in Collections as Data: https://collectionsasdata.github.io/fiftythings/. We really like the randomizer. From the list, please come prepared to share one thing you think could be a useful first step in your work.
Other motivating questions for discussion:
- Who do we imagine as users of PUL collections as data?
- What are examples of library data being used for computational purposes? What are the success or failure stories?
- How do we make collections as data accessible to a variety of users, including non-technical users?
- How do we use data & digital tools to promote learning, engagement and innovation? How do we promote/encourage the use of PUL collections as data?
- How do we train Princeton researchers to use collections as data?
- Wed Dec 5 - Ethical & cultural issues/Algorithms of Oppression
- What are the ethical issues we need to keep in mind as we generate collections as data?
- How do we document/publish our ethical approach/stance?
- Discussion of Safiya Noble's Algorithms of Oppression: How Search Engines Reinforce Racism
We'll also have a special presentation by members of the Mudd Manuscript Library about the Archives Research and Collaborative History (ARCH) program.
Systems and services: How do we do it?
- How do we leverage existing library tools (Figgy, etc.) to make sure data is available in usable formats?
- What new platforms, tools and services need to be built to make collections available as data?
- How do we design collections for Open Access?
- How do we connect the different types of collections as data (library, archives, museums, smaller collections/projects)?
- How do we ensure the stability and integrity of the data we produce?
- How should researchers cite the data they use?