Data Curation

The Center for Digital Humanities offers a Dataset Curation Grant for humanities data sets, as well as consultations on humanities data curation and humanities data sets. To request a consultation, please enter your information here.

The Research Data Lifecycle

Rather than collecting a final, published version of scholarly research, data curation intervenes in the research process itself to allow current and future researchers and educators to use and reuse the data for their own scholarship and teaching. The research data lifecycle can be simplified into four basic stages:

plan: a research question and a process for gathering the data necessary to answer it; plans vary in specificity, scale, and duration, but any plan that involves creating data sets will benefit from data curation practices

create: the raw data is generated (or gathered from preexisting sources)

analyze: the data set is parsed, filtered, and run through the necessary analyses to answer the research question; in the case of digital art projects, this may be supplemented (or replaced) by a performance or installation

share: data is made available for use and reuse

Each research discipline has its own path through these stages, but here are some potential steps.

While the plan is often presented as a straight, chronological line, these steps interweave, creating a research data lifecycle closer to the one pictured below.[1]

Describes the flow of the data lifecycle, where analysis can lead back to creating more data and sharing data can result in new plans.

Data curation has its own set of processes that integrate into this larger cycle. They provide structure and provenance for the data as it is created, continuity of data sets as they pass through various levels of analysis, and finally frameworks through which the data can be shared in intelligible and reusable forms. These practices include ensuring and maintaining data quality, organizing files and related material consistently, documenting the process by which the data was derived (along with other metadata) on an ongoing basis, assessing which data should be retained (and for how long), and planning for storage and reuse, including the assignment of identifiers and stable links.
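As one illustration of how documentation and data quality checks might be carried out in practice, the Python sketch below records a simple manifest for a folder of data files and later checks whether any file has changed. It is a minimal example under stated assumptions, not a method prescribed by the Center: the function names (build_manifest, changed_files, write_manifest) and the manifest fields are hypothetical, and a real project would likely follow a metadata standard suited to its discipline.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path, description: str) -> dict:
    """List every file under data_dir with its size and checksum.

    The field names here are illustrative assumptions, not a standard.
    """
    files = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        files.append({
            "path": path.relative_to(data_dir).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return {
        "description": description,
        "created": datetime.now(timezone.utc).isoformat(),
        "files": files,
    }


def changed_files(data_dir: Path, manifest: dict) -> list:
    """Return manifest paths that are now missing or have a different checksum."""
    changed = []
    for entry in manifest["files"]:
        path = data_dir / entry["path"]
        if not path.is_file() or sha256_of(path) != entry["sha256"]:
            changed.append(entry["path"])
    return changed


def write_manifest(manifest: dict, out_path: Path) -> None:
    """Save the manifest as human-readable JSON."""
    out_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
```

A researcher might call build_manifest when raw data is first gathered, save the result with write_manifest somewhere outside the data folder, and run changed_files before analysis or sharing to confirm that nothing was altered unintentionally.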

[1] This graphic draws on the larger literature of the research lifecycle, especially the DCC Curation Lifecycle Model.