Data Hazard labels

This page contains the Data Hazard labels themselves. These labels, descriptions, examples, and safety precautions will evolve as we develop the hazard labels with the communities who will use them. We welcome suggested changes; please check our contribution guidelines if you would like to propose one.

Each hazard has:

  • A hazard image, title, and description, which represent and describe the risk.

  • Examples to clarify what the hazard covers.

  • Safety Precautions - things that we would want to see done before the research is deployed.

They are designed to help us think about the different types of hazards.
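As an illustration only, the parts of a label listed above could be held in a small record. This is a minimal sketch, not an official schema: the class name `HazardLabel`, its field names, and the helper `missing_parts` are all hypothetical, and the image is assumed to be stored as a path or URL string.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HazardLabel:
    """One Data Hazard label. Field names are hypothetical, not an official schema."""

    title: str
    description: str
    image: str = ""  # assumed: path or URL of the hazard image
    examples: list[str] = field(default_factory=list)
    safety_precautions: list[str] = field(default_factory=list)


def missing_parts(label: HazardLabel) -> list[str]:
    """Return the names of the optional parts of a label that are still empty."""
    parts = {
        "image": label.image,
        "examples": label.examples,
        "safety_precautions": label.safety_precautions,
    }
    return [name for name, value in parts.items() if not value]


# Title and description are taken from the "Risk To Privacy" label on this page.
label = HazardLabel(
    title="Risk To Privacy",
    description=(
        "This technology may risk the privacy of individuals "
        "whose data is processed by it."
    ),
)
print(missing_parts(label))
```

A check like `missing_parts` may be handy when drafting a new label, since it shows which parts still need to be written with the community.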

Data Hazard

Data Science is being used in this output, and any negative outcomes of using this work are not the fault of “the algorithm” or “the software”.

This hazard applies to all Data Science research outputs.

Reinforces Existing Biases

Reinforces unfair treatment of individuals and groups. This may be due to, for example, input data, algorithm or software design choices, or society at large.

Note: this is a hazard in its own right, even if the output is not then used to harm people directly, because it can, for example, reinforce stereotypes.

Ranks Or Classifies People

Rankings and classifications of people are hazards in their own right and should be handled with care.

To see why, we can think about what happens when the ranking/classification is inaccurate, when people disagree with how they are ranked/classified, as well as who the ranking/classification is and is not working for, how it can be gamed, and what it is used to justify or explain.

High Environmental Cost

This hazard is appropriate where methodologies are energy-hungry, data-hungry (requiring more and more computation), or require special hardware that requires rare materials.

Lacks Community Involvement

This applies when technology is being produced without input from the community it is supposed to serve.

Danger Of Misuse

There is a danger of misusing the algorithm, technology, or data collected as part of this work.

Difficult To Understand

There is a danger that the technology is difficult to understand. This could be because the technology itself is hard to interpret (e.g. neural nets), or because of problems with its implementation (i.e. code is not provided, or not documented).

Depending on the circumstances of its use, this could mean that incorrect results are hard to identify, or that the technology is inaccessible to people (difficult to implement or use).

May Cause Direct Harm

The application area of this technology means that it is capable of causing direct physical or psychological harm to someone even if used correctly. For example, healthcare and driverless vehicles may be expected to directly harm someone unless they have 100% accuracy.

Risk To Privacy

This technology may risk the privacy of individuals whose data is processed by it.

Automates Decision Making

Automated decision making can be hazardous for a number of reasons, and these will be highly dependent on the field in which it is being applied. We should ask ourselves whose decisions are being automated, what automation can bring to the process, and who benefits from or is harmed by this automation.

Lacks Informed Consent

This hazard applies to datasets or algorithms that use data which has not been provided with the explicit consent of the data owner/creator. Such data often lacks other contextual information, which can also make it difficult to understand how the dataset may be biased.