Carrying on from our open science feature in the last Quarterly newsletter, this time we explore the importance of ‘FAIR principles’, as the bedrock of solid research and replicable data, through the work of the European Open Science Cloud (EOSC) and GO-FAIR Organisation.
Even at the height of the Covid-19 pandemic, anti-vaccination propaganda proliferated across the internet. It was a wake-up call for the scientific community that its methods and motivations can never be taken for granted.
Throughout the ages, scientific advances have relied on a solid foundation of evidence underpinned by reliable data. The more unconventional the innovation or development, the higher the standard of proof needed to overcome natural conservatism. Early reticence about the novel RNA-based Covid vaccines is a case in point.
To counter such resistance, scientists’ measurements and observations typically focus on building and strengthening a body of evidence fed by carefully scrutinised datasets to “anticipate, identify and minimise (or even eliminate) sources of error”, according to the New South Wales government’s guidance on evaluating scientific data. The Australian State adds:
“Every aspect of a scientific investigation must be scrutinised for errors, as they may affect the investigator’s conclusions. When experiments are repeated, the errors of measurement may compound. Therefore, scientists use several criteria to decide if an experiment, and the conclusions derived from it, are acceptable.”
The internet, digital technologies and powerful tools to sort, search and understand research results, including machine learning, are fuelling a data-led scientific revolution. But with these developments come added responsibility and new challenges, warranting the creation of new disciplines and even operating systems to foster coherent solutions for what is being called a global ‘Internet of FAIR Data and Services’ (IFDS).
So, what does FAIR stand for?
The concept of making digital assets Findable, Accessible, Interoperable, and Reusable was first set out in a 2016 article, ‘The FAIR Guiding Principles for scientific data management and stewardship’ (Nature – Scientific Data). The principles emphasise “machine-actionability”, or how easily computers can interact with increasingly complex and large datasets.
Efforts to build the IFDS are well underway across Europe and in other regions, including Australia, Africa, and the US. The work focuses on establishing what GO-FAIR describes as a “federated environment” for scientific data-sharing and re-use based on existing and emerging elements in EU Member States and “lightweight” international guidance and governance.
Emphasis is on avoiding top-down decrees and allowing a “large degree of freedom regarding practical implementation” in much the same way as the internet currently functions with “no single centralised governance”.
Here, GO-FAIR stresses that the “dominance of a very limited number of private or public parties” should be avoided by copying the internet’s “hourglass model” of minimal yet rigorous standards and protocols.
This, the organisation adds, allows open and common implementation through different stakeholders: “All kinds of providers, both public and private, can start implementing prototype applications for the Internet of FAIR Data and Services on the day minimal standards and minimal rules of engagement are released.”
Guidelines have been published on how FAIR works in practice and summarised on GO-FAIR’s website (‘Three-point FAIRification Framework’). The principles apply to three main types of entities: data (or any digital object), metadata (information about that digital object), and infrastructure.
For EURAXESS Worldwide’s busy community, we provide a quick overview of the main points.
Good metadata (machine-readable descriptions) is key to making your data ‘findable’. The FAIR framework assigns (meta)data a globally unique and persistent identifier, which should be clearly and explicitly included in the rich data descriptions and registered or indexed in a searchable repository or resource.
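As a sketch of what this looks like in practice, a findable metadata record pairs a persistent identifier with a rich, machine-readable description. The field names and DOI below are purely illustrative, loosely echoing DataCite-style metadata rather than any fixed schema:

```python
import json

# Illustrative metadata record: field names and the DOI are hypothetical,
# loosely modelled on DataCite-style metadata.
record = {
    # Globally unique, persistent identifier, stated inside the metadata itself.
    "identifier": "https://doi.org/10.1234/example-dataset",
    "title": "Example survey dataset",
    "creators": ["A. Researcher"],
    "description": "Rich, machine-readable description that search "
                   "engines and repositories can index.",
    "keywords": ["open science", "FAIR", "survey"],
}

# Serialising to JSON keeps the record machine-readable, ready to be
# registered or indexed in a searchable repository.
print(json.dumps(record, indent=2))
```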
Once users find the data, they need to be able to access it, possibly after authentication and authorisation. For this, (meta)data must be retrievable by their identifier using standard communications protocols, which should be open, free and universally implementable. Metadata should remain accessible even when the data themselves are no longer available.
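In practice, “standard protocols” often means HTTPS plus content negotiation. As a small sketch (the request is only constructed here, not actually sent), this asks the doi.org resolver for machine-readable citation metadata about the 2016 FAIR principles article, using the content-negotiation media type supported by DOI registration agencies:

```python
from urllib.request import Request

# DOI of the 2016 FAIR principles article (Wilkinson et al., Scientific Data).
doi = "10.1038/sdata.2016.18"

# Open, standard protocol (HTTPS) + content negotiation: the Accept header
# requests citation metadata instead of the human-readable landing page.
req = Request(
    f"https://doi.org/{doi}",
    headers={"Accept": "application/vnd.citationstyles.csl+json"},
)

print(req.full_url)            # https://doi.org/10.1038/sdata.2016.18
print(req.get_header("Accept"))
```

Sending the same request without the Accept header would return the article’s landing page, which is the point of the design: one identifier, many representations.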
Data should be well integrated with other datasets, applications and workflows for optimal analysis, storage, and processing. This means using accessible and broadly applicable language (lexicon) that follows FAIR principles and includes qualified references to other (meta)data.
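One widely used way to adopt a broadly applicable, shared vocabulary is JSON-LD, where an `@context` maps local field names onto community-defined terms (here schema.org), and qualified references point to other (meta)data by identifier. A hedged sketch, with a hypothetical dataset and source DOI:

```python
import json

# JSON-LD-style record: the @context maps plain field names onto
# schema.org terms, so other tools can interpret them unambiguously.
record = {
    "@context": {
        "name": "https://schema.org/name",
        "isBasedOn": "https://schema.org/isBasedOn",
    },
    "name": "Example derived dataset",
    # Qualified reference to another digital object by identifier
    # (hypothetical DOI), stating *how* the two objects are related.
    "isBasedOn": "https://doi.org/10.1234/source-dataset",
}

print(json.dumps(record, indent=2))
```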
Replicability is vital to good science and underpins FAIR principles. (Meta)data should thus be well-formulated for ease of use and re-use in different settings. That means it should be “richly described” with accurate and relevant attributes, released with clear and accessible data-usage licensing and provenance information, and should meet domain-relevant community standards.
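A minimal sketch of what reusable (meta)data can carry, again with hypothetical field names: rich descriptive attributes, an explicit and accessible usage licence (CC BY 4.0 is used here only as an example), and provenance recording where the data came from and how they were produced:

```python
import json

# Illustrative reusable metadata: field names are hypothetical, but the
# ingredients follow the text — rich attributes, a clear licence, and
# provenance information.
record = {
    "identifier": "https://doi.org/10.1234/example-dataset",
    # Clear, accessible data-usage licence (example: CC BY 4.0).
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "provenance": {
        "derivedFrom": "https://doi.org/10.1234/source-dataset",  # hypothetical source
        "generatedBy": "survey-cleaning pipeline v1.2",            # hypothetical tool
        "dateCreated": "2021-06-01",
    },
    # Accurate, domain-relevant attributes for would-be re-users.
    "attributes": {"subjects": 412, "instrument": "online questionnaire"},
}

print(json.dumps(record, indent=2))
```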