
Principles of FAIR and open access data

FAIR principles

Not all data is created and shared equally. When we contribute data to the wider community, it is important to follow some basic principles so that it is usable by all.

Scientists increasingly work in data-rich environments, where human techniques of knowledge discovery (e.g. reading and observation) operate alongside machine-driven activities such as large-scale data generation and analysis. It is therefore important to standardise approaches to data management and stewardship, so that data can be used by people and machines alike.

In 2016 a consortium of scientists published the FAIR guiding principles for scientific data management and stewardship. These call for all data to be:

  • Findable via a unique and persistent identifier
  • Accessible using their identifier via standard communication protocols (e.g. the hypertext transfer protocol that powers the World Wide Web)
  • Interoperable by using a formal, accessible, shared and broadly applicable language for knowledge representation
  • Reusable through technical (e.g. common, standardised format) and legal (i.e. an associated data use licence) clarity on reuse.
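To make the four principles more concrete, here is a minimal Python sketch of a checklist applied to a description of a shared dataset. Everything in it is an illustrative assumption: the `DatasetRecord` fields, the `fair_gaps` function and the small set of "standard" formats are hypothetical, and it is a teaching aid rather than a formal FAIR assessment tool. Interoperability (use of a shared language for knowledge representation) is hard to test with a simple rule, so the sketch does not check it.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical, simplified description of a shared dataset.
# The field names are illustrative assumptions, not part of any FAIR standard.
@dataclass
class DatasetRecord:
    identifier: Optional[str]   # persistent identifier, e.g. an accession number
    access_url: Optional[str]   # where the data can be retrieved
    data_format: Optional[str]  # file format of the data
    licence: Optional[str]      # data use licence


# Assumed examples of common, standardised formats; a real list would be domain-specific.
STANDARD_FORMATS = {"FASTA", "FASTQ", "CSV", "JSON"}


def fair_gaps(record: DatasetRecord) -> List[str]:
    """Return a rough list of FAIR problems (a teaching sketch, not a formal assessment)."""
    gaps = []
    # Findable: the data needs a unique, persistent identifier.
    if not record.identifier:
        gaps.append("Findable: no persistent identifier")
    # Accessible: the data should be retrievable over a standard protocol such as HTTP(S).
    if not (record.access_url and record.access_url.startswith(("http://", "https://"))):
        gaps.append("Accessible: not retrievable over a standard protocol")
    # Reusable (technical): the data should be in a common, standardised format.
    if record.data_format is None or record.data_format.upper() not in STANDARD_FORMATS:
        gaps.append("Reusable: no common, standardised format")
    # Reusable (legal): the data should carry a data use licence.
    if not record.licence:
        gaps.append("Reusable: no data use licence")
    # Interoperability is deliberately not checked here.
    return gaps
```

For example, a supplementary table posted at a web address but with no identifier of its own and no licence would be flagged on two counts:

```python
supp = DatasetRecord(None, "https://example.org/supplementary_table.csv", "CSV", None)
print(fair_gaps(supp))
# ['Findable: no persistent identifier', 'Reusable: no data use licence']
```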

Applying the FAIR principles to datasets, and even to teaching materials, can accelerate collaboration between groups who may never meet in person or communicate directly.

Open Data

The notion of FAIR is sometimes confused with that of Open Data. Open Data, a concept that has evolved over at least fifty years, describes data that “can be freely used, re-used and redistributed by anyone – subject only, at most, to the requirement to attribute and share alike”. Open Data can be FAIR, but the two ideas are distinct: the FAIR principles are concerned with how data can be used in a data-rich world, not with whether access to it is unrestricted. FAIR data might not be Open, and Open Data might not be FAIR. For example, supplementary Excel files shared in an open access journal are Open Data, but they are not FAIR compliant.

The world’s largest genomic database collaboration, the International Nucleotide Sequence Database Collaboration (INSDC), comprises the databases hosted by Japan’s National Institute of Genetics (DDBJ), the European Bioinformatics Institute (ENA) and the National Center for Biotechnology Information in the United States of America (GenBank). We will explain this collaboration in more detail in a future step, but briefly, these databases contain genomic sequence data and associated metadata. Metadata is data about data: for example, the date and location at which a microbial sample was collected.

Open and FAIR genomic data

The INSDC operates on an Open Data model, with all data in this collaboration “freely accessible without restrictive licensing as part of the scientific record”. The INSDC also aims to operate on FAIR principles, although historical inconsistencies in metadata pose a challenge for full reusability.
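To illustrate why inconsistent metadata limits reuse, consider collection dates recorded in different styles over the years. The Python sketch below tries a few candidate formats and converts whatever it can parse to ISO 8601. The list of formats and the `normalise_collection_date` function are hypothetical examples, not the INSDC's actual rules. Note that a value such as "03/04/2015" is ambiguous (3 April or 4 March?); the sketch simply assumes day-first, which is exactly the kind of guess that inconsistent historical metadata forces on anyone reusing it.

```python
from datetime import datetime
from typing import Optional

# Hypothetical legacy date styles, tried in order. "%d/%m/%Y" assumes day-first,
# which may be wrong for some records; the ambiguity cannot be resolved from the string alone.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d-%b-%Y", "%Y"]


def normalise_collection_date(raw: str) -> Optional[str]:
    """Return the date in ISO 8601 form, or None if no candidate format matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue  # this format does not match; try the next one
        # A bare year is less precise, so keep it as a year rather than inventing a day.
        return str(parsed.year) if fmt == "%Y" else parsed.date().isoformat()
    return None  # e.g. free text such as "missing"
```

With this sketch, "2015-03-12", "12/03/2015" and "12-Mar-2015" all become "2015-03-12", "2015" stays "2015", and a free-text entry such as "missing" returns `None` and would need manual curation.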

These definitions of FAIR and Open Data have largely emerged from conversations in high-income countries, and they have limitations as a result.

FAIR+E data and proper attribution

Antimicrobial resistance (AMR) is a global issue, but research on this problem is not evenly distributed worldwide. The study of AMR relies heavily on laboratory infrastructure and expertise, which are lacking in many regions. Like other aspects of human life, scientific research is shaped by networks of global exchange as well as by inequalities and exploitation that stem from centuries of European colonialism. This context highlights why data sharing is necessary for addressing AMR, while also revealing the complications that arise from it.

The World Health Organization, in its guiding principles for pathogen data sharing, has “highlighted the importance of addressing equity in addition to aligning with FAIR data sharing principles”, suggesting that FAIR be replaced with FAIR+E (FAIR + equitable). Along similar lines, LATech4Good advocates for data equity, where “equity seeks to ensure fair treatment, equality of opportunity, and fairness in access to information and resources for all”. Finally, the Public Health Alliance for Genomic Epidemiology (PHA4GE) Ethics and Data Sharing working group has produced a proposed Data Sharing Accord for Microbial Data that “aims to provide generally accepted, practical ‘rule of thumb’ elements for data users to observe when conducting secondary analyses with data generated by others, helping to operationalise generalised principles into simple and unequivocal practice”.

The PHA4GE Data Sharing Accord proposes a set of clauses that data generators may wish to implement when sharing their data. These clauses cover several important aspects, such as:

  • Attribution: Ensuring credit is given to the original data generators.
  • Review of Outputs: Allowing data generators to review any outputs generated from their data before publication.
  • Restrictions on Further Sharing: Placing limits on the onward sharing of the data.
  • Collaboration Requirements: Mandating that secondary users seek collaboration with the original data generators.

The Accord's authors suggest that data repositories build support for stipulating such clauses into their platforms; this has not yet happened.

The emergence of the proposed PHA4GE Data Sharing Accord and new genomic databases such as Pathoplexus that support restricted sharing of genomic data illustrates that the discussion around equitable data-sharing practices is very much alive.

How might adherence to the FAIR principles affect the long-term preservation and accessibility of scientific data for future research, especially in rapidly evolving fields? Let us know what you think.
© Wellcome Connecting Science
This article is from the free online course Antimicrobial Databases and Genotype Prediction: Data Sharing and Analysis.
