Data Hazards is a project about the worst-case scenarios of Data Science. Data Scientists are good at selling our work, for example by communicating gains in efficiency and accuracy, but we are less well-practiced in thinking through its ethical implications. These implications go beyond the remit of most ethics Institutional Review Boards, to questions about the wider societal impact of Data Science and algorithmic work.
We aim to create resources to:
Create a shared vocabulary of Data Hazards in the form of Data Hazard Labels.
Make ethical thinking and future-thinking more accessible to data scientists, computer scientists, and applied mathematicians, so that they can apply it to their own work.
Bring together and respect diverse, interdisciplinary viewpoints on this work, through workshops and mailing lists.
Find out in what circumstances, and for whom, these resources work best.
To support our aims we will:
Get feedback on our draft Data Hazard Labels, to develop them with the communities who will be using them.
Create resources that help data scientists reflect on their own work, by creating prompts, frameworks, and forms for them to consider.
Run workshops and mailing lists where data scientists can listen to diverse perspectives and grow their ideas of what is possible, and where interdisciplinary researchers and the public can be heard and respected by the people doing computational and mathematical work.
Listen to our community’s feedback through surveys.
Why are the Hazard Labels so scary-looking?
We know that the Data Hazard Labels are a bit frightening. Argh, there’s a skull! Please know that we don’t want these labels to scare anyone away from considering ethics or from doing data science, and we will do everything we can to make applying Data Hazard Labels as welcoming and approachable as possible. But we also have good reasons for choosing these images.
We chose this format because of its similarity to COSHH (Control of Substances Hazardous to Health) hazard labels for chemicals. We made this choice because we want a similar response from people:
They are attention-grabbing: they ask people to stop, think, and take the safety precautions seriously, rather than treating them as an optional extra.
They ask people to “handle with care”, not to stop doing the work. We still use chemicals, but we think about how they can be used safely and how to avoid emergencies.
They are familiar, especially to scientists, who (within universities) tend to have the least experience of applying ethics processes.
Here’s a rough project timeline to let you know what we’ll be up to:
March-April 2021: Behind the scenes plans
Thinking, reading and planning
Getting feedback on initial ideas
Sept 2021: Run first Data Hazards workshop (academic-focused)
Run first Data Hazards workshop on 21st Sept 2021.
Oct 2021: Use workshop feedback to improve the Data Hazards and present early results
Present early results from workshop at AI Ethics Best Practices and the Future of Innovation as part of Bristol Tech Festival on 13th Oct 2021 (slides).
Look at workshop feedback to make improvements to:
Data Hazard Labels
Dec 2021: Trial asynchronous Data Hazards materials (without group discussion) as a tool for self-assessment.
Successful JGI Seedcorn applicants will be invited to trial our asynchronous materials and help us to improve the labels (info for Seedcorn applicants here).
Early 2022: Run second Data Hazards workshop (public and company/local-government focused)
The second Data Hazards workshop will be part of the JGI Showcase in February 2022.
Spring 2022: Write up Data Hazards paper