Ethics in data science

As data scientists, we design algorithms or experiments and use them to derive results. But how do these results impact society and our surrounding environment?

Experiments are a particular class of algorithms that involve processing data.

Designing algorithms enables us to build systems that can be reused in different contexts, including ones completely outside our control.

To some extent, this is what we want, as it enables a high degree of scalability: income and business success do not rely on one person conducting the analysis, because anyone can pick it up. On the other hand, it also means we lose control over how our algorithms are used and how their outputs are interpreted.

We, therefore, need to be aware of how our algorithms could be used and how their outputs could be interpreted. We have previously seen examples of interpreting results in Step 2.13: Sense checking sensational statistics.

In designing our algorithms and experiments, we must take into account:

  • Any applicable laws and regulations, for example the General Data Protection Regulation (GDPR) or the California Consumer Privacy Act (CCPA).

  • Privacy and anonymity of individuals. We need to be aware that data science can be used to link different datasets, which can put anonymity under threat. Privacy is also a concern when the consumers of a dataset change, because data collected for one audience may then be seen by another.

  • Ethical use of data. Not all data that is available can or should be used. We should follow guidelines published by professional bodies, such as the ACM Code of Ethics and Professional Conduct and the BCS Code of Conduct.

  • Validity of data and absence of bias. We need to make sure the data we hold accurately reflects the facts and is representative. Equally, the further processing of data must maintain representativeness and not introduce any bias.

  • Interpretation of results. Statistical models may be used in a predictive manner, and different audiences may draw different conclusions from them. Such models provide no guarantee that events outside their predictions will not occur. The Black Swan theory describes the potential impact of unexpected events.
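
The privacy point above can be made concrete with a small sketch. It is an illustration only: the two datasets, the field names and the helper `reidentify` are all hypothetical and not taken from any real system. It shows how records with names removed can still be re-identified when they share so-called quasi-identifiers (here postcode area, birth year and sex) with a second dataset that does contain names.

```python
from collections import defaultdict

# Hypothetical "anonymised" records: names removed, quasi-identifiers kept.
health_records = [
    {"postcode": "CV1", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "CV1", "birth_year": 1980, "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "CV2", "birth_year": 1975, "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical public dataset that includes names.
public_register = [
    {"name": "Alice", "postcode": "CV1", "birth_year": 1980, "sex": "F"},
    {"name": "Bob", "postcode": "CV1", "birth_year": 1980, "sex": "M"},
    {"name": "Carol", "postcode": "CV2", "birth_year": 1975, "sex": "F"},
    {"name": "Dana", "postcode": "CV2", "birth_year": 1975, "sex": "F"},
]

# Assumption: these are the fields the two datasets have in common.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key(record):
    """Build the linking key from a record's quasi-identifier values."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymised, named):
    """Return {name: diagnosis} for every record matching exactly one person."""
    names_by_key = defaultdict(list)
    for person in named:
        names_by_key[key(person)].append(person["name"])

    matches = {}
    for record in anonymised:
        candidates = names_by_key.get(key(record), [])
        if len(candidates) == 1:  # exactly one person fits: anonymity is lost
            matches[candidates[0]] = record["diagnosis"]
    return matches


linked = reidentify(health_records, public_register)
print(linked)  # {'Alice': 'asthma', 'Bob': 'diabetes'}
```

In this toy data, two of the three records are linked to a named person. The third is not, because two people in the register share its quasi-identifiers. Coarsening or removing such fields before release is one way to reduce the risk, although it does not remove it.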

Whether designing or applying algorithms, we need to make sure we do so in an ethical fashion, taking into account all of the above. Even when doing so, we may encounter unexpected use of our technology at a future point in time.

References

ACM Ethics. (2018). ACM code of ethics and professional conduct. https://ethics.acm.org

Chappelow, J. (2020, March 11). Black swan. Investopedia. https://www.investopedia.com/terms/b/blackswan.asp

European Commission. (2020). EU data protection rules. https://ec.europa.eu/info/law/law-topic/data-protection/eu-data-protection-rules_en

State of California Department of Justice. (2020). California Consumer Privacy Act (CCPA). https://oag.ca.gov/privacy/ccpa

The Chartered Institute for IT. (2020). BCS code of conduct. https://www.bcs.org/membership/become-a-member/bcs-code-of-conduct/

© Coventry University. CC BY-NC 4.0
This article is from the free online course Get ready for a Masters in Data Science and AI.

Created by
FutureLearn - Learning For Life
