How the project started
The Data Hazards project started in 2021. We (Natalie Zelenka and Nina Di Cara) wanted a way to communicate what might go wrong in data science projects, because we were frustrated by the repetitive themes we were seeing in the harmful technologies we discussed at Data Ethics Club. We were also concerned that many projects with significant societal impact do not have those impacts scrutinised by an Ethics Committee, because they do not technically have research participants. From those conversations we came up with the idea of hazard labels for communicating these potential harms, and called them Data Hazards. We decided they should be visual, like COSHH chemical hazard labels, and that they should give people at all stages of data science technology development a way to communicate about the same potential outcomes (no matter how far away those outcomes might seem).
Once we had drafted the original list of Hazards, we wanted a format that encouraged researchers to reflect, invite different opinions, and think more broadly about the potential ethical concerns arising from their project. This led to the development of our workshop format and all the materials we have since made for self-reflection and teaching. All our resources are designed for re-use by others.
The Data Hazards are built on the foundations of standpoint theory. This is an epistemological theory that knowledge (including in the sciences) is not objective, and that our perspectives are shaped by our lived socio-political experiences. This means that ethical problems are not going to have a single correct answer, and that to get a well-rounded understanding of the ethical issues of any new technology we need people from lots of different standpoints to analyse it from their perspective. This is the best way we can understand the harms it could possibly cause. We also need to make sure that we are paying attention to how technology might be more likely to adversely affect people from minoritised backgrounds.
In summary, the Data Hazards exist to prompt discussion, reflection and thought. They are not a checkbox exercise, and there is no requirement for a group to reach a consensus. In an individual context you will likely come to a conclusion, but someone else may hold a different view. We hope that the Data Hazards discussion and reflective activities will help researchers become aware of a broader range of potential ethical risks in tech projects, and recognise that ethics is complex, situational and worth discussing.
Here’s a rough project timeline to let you know what we’ll be up to:
- March–April 2021: Behind-the-scenes planning
  - Thinking, reading and planning
  - Getting feedback on initial ideas
- Sept 2021: Ran first Data Hazards workshops (academic-focused)
  - Ran the first Data Hazards workshop on 21st Sept 2021.
- Oct 2021: Used workshop feedback to improve the Data Hazards and presented early results
  - Presented early results from the workshop at AI Ethics Best Practices and the Future of Innovation, part of Bristol Tech Festival, on 13th Oct 2021 (slides).
  - Used workshop feedback to make improvements to:
    - the Data Hazards labels
    - the workshop exercises/materials
- Jan 2022: Awarded £20,000 Enhancing Research Culture funding
- Feb–May 2022: Developed new labels and facilitator training materials
  - Hired an animator to create animated explainers for the Data Hazards and the new Hazard labels
  - Developed and released run-your-own workshop materials
  - Held a Data Hazards discussion session at Mozfest 2022
  - Ran a Data Hazards workshop as part of the Jean Golding Institute Showcase
  - Ran the first in-person Data Hazards facilitator workshop as part of Bristol Data Week
  - Ran the second (online) Data Hazards facilitator workshop as part of Bristol Data Week