Who or what gets counted in the production and maintenance of data today? How does data change depending on who does the counting? When visibility entails vulnerability, how might counting become a dangerous activity?
During the last week of August, Rebecca Munson (Project and Education Coordinator) and Grant Wythoff (Digital Humanities Strategist) taught a class titled “Who Counts? Data Citizenship in a Digital World.” The class was part of the Freshman Scholars Institute (FSI), a summer program at Princeton that gives an entering cohort of students the chance to experience intellectual and social life at Princeton before the fall semester begins. The FSI especially welcomes first-generation college and lower-income applicants, military veterans, transfer students, and others seeking a supportive scholarly community. Students spend seven weeks taking two full-credit courses, with the opportunity to take an elective course during their eighth week; “Data Citizenship” was one of these options.
The objective was to prompt students to become critical readers, as well as producers, of data and to provide them with a look at the tools and methods available to help with this daunting task. The class introduced students to the ways in which data is gathered, curated, analyzed, described, stored, and communicated in often biased ways. Through a series of in-class activities and their final presentations, students had the opportunity to discover and analyze the ways in which data practices—particularly those that track and categorize individuals—replicate and institutionalize existing inequalities.
In the first session, on data privacy, the class focused on the kinds of data about our lives already out in the world. In a two-part exercise, students first acted as private investigators and researched themselves using their own publicly available information, then imagined that they were “data brokers” looking to sell that information to a marketing company. The exercise led to a discussion of the potential risks and rewards of keeping data private versus making it public. The concept of “data publics” guided the second session, in which students considered different conceptions of the private/public divide and the impact of data practices on politics, culture, and the public sphere, focusing on everyday interactions with social media and “smart home” devices like Alexa.
The final two sessions tackled the topic of “data bias,” exploring how humans and machines are biased in different ways and how machine learning produces algorithms that replicate and amplify existing problems in human-produced datasets. Students considered the implications of the differences between human bias and statistical bias in how algorithms are trained and applied in real-world situations, such as deciding whether to grant bail in criminal court cases. The class then examined a recent DH project, Torn Apart / Separados, which attempts to address these issues of (in)visibility and to perform a critical analysis of its own data practices.
For their final project, students were asked to “adopt” a dataset, consider the conditions of its assembly, and expose the assumptions, practices, and potential problems behind it. They also had the opportunity to suggest how the dataset might be used or modified to address those problems and foster more equitable practices at every stage of the data lifecycle. They discovered that, by and large, data sources were not clearly documented, making it all too easy to be guided by assumptions rather than facts. The final project demonstrated the need to approach both the production and consumption of data with critical distance, a skill these students can now put into practice during their time at Princeton.