Read more about the origin of the project and its ethos, contributors and timeline here.
How the project started
The Data Hazards project started in 2021, when we (Natalie Zelenka and Nina Di Cara) spoke about wanting a way to communicate what might go wrong in data science projects. We were frustrated by the repetitive themes we kept seeing in the harmful technologies we discussed at Data Ethics Club. We were also concerned that many projects with significant societal impact never have those impacts scrutinised by an ethics committee, because they do not technically have research participants.
After this conversation we came up with the idea of Hazard labels for communicating these potential harms, and called them Data Hazards. We decided they should be visual, like COSHH chemical hazards are, and that they should be a way for people at all stages of data science technology development to communicate about the same potential outcomes (no matter how far away those outcomes might seem).
These days the project is bigger than just us, and we have many contributors who suggest new content, propose changes to the labels, help us teach others about ethical hazards, or run their own events. If you would like to get involved (we’d love you to!), we’ve listed lots of ways you can on our Contributing page.
Once we had thought of the original list of Hazards, we wanted a way for researchers to engage with them in a format that encouraged them to reflect, invite different opinions, and think more broadly about the potential ethical concerns of their project. This led to the development of our workshop format and all the materials we have since made for self-reflection and teaching. All our resources are designed (and licensed) for re-use by others.
The Data Hazards are currently intended to be used creatively and flexibly, in whatever way they are useful to the user. Sometimes this means they are flashcards for teaching students about ethics, sometimes they are displayed with new research to communicate potential harms, and sometimes they are used in workshops as prompts.
We believe it is important, when using the Data Hazards to investigate risks in a project, that people beyond the original researcher are consulted on potential hazards. This is because we believe that knowledge, including in the sciences, is not objective, and that our perspectives are shaped by our lived socio-political experiences (an idea based on standpoint theory). It follows that ethical problems will not have a single correct answer, and that to get a well-rounded understanding of the ethical issues of any new technology we need people from many different standpoints to analyse it from their own perspective. This is the best way we can understand the harms it could possibly cause. We also need to make sure we pay attention to how technology may be more likely to adversely affect people from minoritised backgrounds.
We developed our workshop format to help researchers to gather these different views.
In summary, the Data Hazards exist to prompt discussion, reflection and thought. They are not a checkbox exercise, and there is no requirement for a group to come to a consensus. In an individual context you will likely come to a conclusion, but someone else may reach a different one. We hope that the Data Hazards discussion and reflective activities will help researchers become aware of a broader variety of potential ethical risks in tech projects, and recognise that ethics is complex, situational and worth discussing.
Our brilliant contributors are listed here, and you can read more detail about our contributing process here.
Here’s a rough timeline covering the history of the project, and sometimes what we have coming up!
March-April 2021: Behind the scenes plans
Thinking, reading and planning.
Getting feedback on initial ideas.
Sept 2021: First Data Hazards workshops (academic-focused)
Ran the first Data Hazards workshop on 21st Sept 2021.
Oct 2021: Used workshop feedback to improve the Data Hazards and presented early results
Presented early results from workshop at AI Ethics Best Practices and the Future of Innovation as part of Bristol Tech Festival on 13th Oct 2021 (slides).
Used workshop feedback to make improvements to the Hazard labels and workshop materials.
Jan 2022: Awarded £20,000 Enhancing Research Culture funding
Set up our new project to deliver a ‘train-the-trainer’ for Data Hazards.
Feb-May 2022: Developed new labels and facilitator training materials
Hired animator to create animated explainers for Data Hazards and new Hazard labels.
Developed and released run-your-own workshop materials.
We went to MozFest 2022 to run a Data Hazards workshop!
June 2022: Run our facilitator workshops
Ran a Data Hazards workshop as part of the Jean Golding Institute Showcase.
Ran our first Data Hazards facilitator workshop in-person as part of Bristol Data Week.
Ran our second (online) Data Hazards facilitator workshop.
July-Dec 2022: Prepare for first version release
Analysed all the results and feedback from our five workshops.
Applied (successfully) for launch event funding from UKRN.
Collected all previous suggestions for the project and thought about future versions.
March 2023: Version 1.0 Release!
Attend AI UK 2023 as exhibitors!
29th March we will run the Data Hazards V1.0 Launch event!
We will release a pre-print of our first paper about the project!