SHARING IN A GRAY AREA: A FRAMEWORK FOR BIG DATA CURATION
Our world is data-driven. "Big data" can create insights to inform all kinds of efforts, from business strategy and market prediction to media stories and academic research. For the purpose of this talk, we define "big data" as data about people that is collected as a corollary to other services. Examples include shopping behavior tracked by retailers, clickstream data tracked by websites, geospatial data generated from people's mobile devices, and social media posts. As big data is increasingly used to conduct research, it is also increasingly shared publicly, whether in response to requirements from funding agencies and academic journals or simply in the spirit of Open Data. While the research community has developed guidelines for anonymizing and sharing human subjects data, the anonymization and sharing of big data continue to inhabit a gray area. Is big data human subjects data, or is it already-existing data that can be shared freely? Our talk proposes a framework, drawing on case studies from the Dryad Repository, to help guide data curators through ethical inquiry when assessing big data for public archiving. The framework explores factors such as the sensitivity of the research being conducted, the context in which the data was collected, and the expectations of the users whose lives and actions constitute big data. This framework is not meant to provide hard-and-fast rules, but rather aims to improve practice and minimize risk for all humans involved in the open data ecosystem.