Examples#

Here we share a series of examples of how people have made use of the Data Hazards in their work. All examples here are shared with permission.

Teaching#

Holly Fraser, University of Bristol, used the Data Hazards with MSc students studying Digital Health and Care for an AI and Ethics themed seminar.

The seminar went really well I thought, and the labels linked really well with the other ethics content in the course. I adapted the slides you provided to give an overview of the labels, and found some real life examples of the hazards (e.g. Boris Johnson blaming a ‘mutant algorithm’ for the A-Level results prediction fiasco back in 2020, instead of the government taking accountability), then asked the students to apply the labels to some real life projects. I used my PhD project as an example, and some other people from the Digital Health CDT kindly let me use their projects as well.

I asked for some feedback at the end, and the students said they found the labels easy to use and seemed to understand the concepts really well. They all managed to apply multiple labels to the different projects, so I think the label imagery and the explanations worked well in the different research contexts. They had one comment on the label ‘High environmental use’, where they definitely understood the concept but weren’t sure how to measure high or low use, or what unit they would use to quantify it, which I thought was an interesting discussion point.

The students worked in small groups (about 5 per group) which worked well, I think even pairs would have been fine too though.

Nina Di Cara used the Data Hazards to teach MSc students in Medical Statistics and Health Data Science at the University of Bristol about data ethics.

Using the Data Hazards cards students really quickly started talking about what they thought. We had a good group discussion for 20 minutes in groups of 4-5 and afterwards the groups fed back about what they thought.

One really interesting observation made by one of the students was that the Hazards naturally split into those which apply in the development of a project, and those which apply in the way that information about the project is shared.

I’m planning to use this again next year as it went really well, especially for a topic which can be tricky for people to get started on.

ALSPAC Data Protection Impact Assessment (DPIA) forms#

The Avon Longitudinal Study of Parents And Children (ALSPAC), also known as the Children of the 90s project, is a longitudinal cohort study that follows children who were born in the South West of England between 1991 and 1992. Pioneered by Professor Jean Golding (OBE), ALSPAC is hosted at the University of Bristol and provides a resource of health, social and lifestyle data that has generated more than 3000 peer-reviewed research papers and is used by researchers all over the world.

Due to the sensitive nature of the data, stringent ethics checks are in place for researchers hoping to access the data. Alongside these formal ethics requirements, the data hazard warning labels are used as part of any Data Protection Impact Assessments required for data access.

We have included the Data Hazards as a part of our DPIA (Data Protection Impact Assessment, which is required by GDPR). We use the Data Hazards as a way to prompt ourselves when considering what risks could be associated with a new data project, and then to consider ways to mitigate those risks.

Self Assessment#

Natalie Zelenka used the Data Hazards in her thesis as part of the discussion of the ethical aspects of her work. See this example on Natalie’s website.

You can also see another analysis by Natalie here.

Susana Román García has integrated the Data Hazards analysis into her PhD work, including it as part of a roadmap to ethical and reproducible research. Susana’s beautiful poster about her work, presented at COMBINE 2022, is available here.

Delivering Workshops#

COMBINE 2022#

Susana Román García delivered a Data Hazards Workshop at COMBINE 2022 after attending our facilitator training in Summer 2022. You can see Susana’s workshop materials here. As a result, we also received some new Data Hazard suggestions from the attendees!

JGI Seedcorn Projects 2023#

As part of the annual Seedcorn funding call run by the Jean Golding Institute for data science and data intensive research (JGI), successful applicants had the opportunity to present their projects at a Data Hazards workshop run by Huw Day and Nina Di Cara. Members of the University of Bristol’s research community were invited to attend, learn about the Data Hazards labels and how they work, and then apply the labels to these exciting interdisciplinary data science projects. This included discussing where certain hazards might apply and how they might be mitigated. The project owners were then given the chance to feed back to the group on what they could take away from the session.

JGI Data Week 2024 Data Hazards Creative Workshop#

As part of the JGI’s annual Data Week conference, Huw Day and Nina Di Cara ran a creative Data Hazards workshop. Participants were first introduced to the necessity of ethical practice in data science projects, then shown a series of reports from the AI Incident Database and invited to discuss what sorts of harms or hazards might occur. Then, without seeing the original Data Hazards labels, participants were encouraged to come up with their own ideas for labels and to create them by drawing or using arts and crafts. Some of these hazards have been added as suggested new hazards on our GitHub issues page.