IDOC-DATA: data management and value chain

IDOC-DATA

The main course of IDOC-DATA's activities can be summarised in the following diagram:

IDOC-DATA data ingestion and curation


    FAIR principles

IDOC has been actively involved from the outset in efforts to improve the infrastructure supporting the reuse of scholarly data. When a diverse set of stakeholders (representing academia, industry, funding agencies, and scholarly publishers) came together to design and jointly endorse a concise and measurable set of principles, now referred to as the FAIR Data Principles, IDOC therefore committed to making every effort to comply with these recommendations.

For example, the ability of machines to find and use the data automatically has been built into the data access interfaces since their first implementations.

    Data Integrity and authenticity

Whatever service IDOC implements, and regardless of any enhancement made to the datasets, the initial dataset is kept unchanged. This means that every level of dataset management assumes that added value and curation are applied only to copies of the originals.

Moreover, IDOC’s dataset management ensures integrity and authenticity during the processes of ingest, archival storage, and data access: changes to data and metadata are documented and the relationship of the dataset with the original data is maintained.
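The two rules above (originals stay untouched, and every change is documented with a link back to the original) can be illustrated with a small sketch. It is not a description of IDOC's actual tooling: the function names, the `provenance.json` file, and the choice of SHA-256 checksums are all assumptions made for illustration.

```python
import hashlib
import json
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_working_copy(original: Path, workdir: Path, note: str) -> Path:
    """Copy the original for curation and record its checksum (hypothetical layout)."""
    workdir.mkdir(parents=True, exist_ok=True)
    copy = workdir / original.name
    shutil.copy2(original, copy)
    record = {
        "original": original.name,
        "original_sha256": sha256_of(original),
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "note": note,  # free-text description of the planned change
    }
    (workdir / "provenance.json").write_text(json.dumps(record, indent=2))
    return copy


def original_is_intact(original: Path, workdir: Path) -> bool:
    """Check that the original still matches the checksum recorded at copy time."""
    record = json.loads((workdir / "provenance.json").read_text())
    return sha256_of(original) == record["original_sha256"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "raw_observation.dat"  # hypothetical file name
        original.write_bytes(b"example payload")
        copy = make_working_copy(original, root / "curated", "unit conversion")
        copy.write_bytes(b"curated payload")  # curation touches only the copy
        print(original_is_intact(original, root / "curated"))
```

Recording the checksum at the moment the copy is made gives a later audit something concrete to compare against, whatever has happened to the curated copy in the meantime.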

    Cyclical curation model

The preservation of digital content (a set of formatted and organized information) must ensure:

  •     Stability: the ability to obtain the same result over time,
  •     Referencing: the location of the information is predictable,
  •     Certified origin: the information is produced by successive certified processes,
  •     Context: each bit of information has a context that allows it to be understood.

IDOC-DATA provision

    Identifying the expectations of Designated Communities

The IDOC-DATA staff is composed of engineers who take an active part in the research of the scientific teams building the datasets. They thus have the knowledge needed to accommodate changes in the data; most of the time, they are even the driving force behind these changes. Moreover, the IDOC-DATA organization strongly promotes technology monitoring in order to explore, in advance, technical developments that seem promising for its present and future requirements. This makes it possible to integrate the right tools at the right time, if and when their benefits are validated. The participation of IDOC-DATA members in regional and national networks reinforces the effectiveness of this technology monitoring.

For each of the five scientific themes it hosts (solar physics, interstellar medium physics, cosmology, stellar physics, and planetary surfaces, plus other themes of GEOPS and AIM), IDOC-DATA includes a scientific leader. Each of these leaders is a recognized senior scientist who acts as an adviser, independently of the requirements of any specific project.

A group of independent experts validates the technical and scientific orientations and the quality of the actions undertaken.

During the definition phase of each project, the leader usually builds a team of (national or international) experts in the given theme who help to define the requirements and act as beta testers. Once the service is online, the leader presents the datasets or tools at conferences and workshops and collects feedback.

This feedback leads to changes in the interface or is integrated into its FAQs. A "contacts and credits" page is always available, which also allows contributors to be thanked.

The expert team also acts as a periodic reviewer to ensure that the tools and datasets evolve in step with the needs of the community. Further communication takes place through interface sites, exchange forums or collaborative tools.

    Environment of the provision

Each of the interfaces giving access to the 63 datasets distributed by IDOC-DATA allows users:

- to find the first level of help for the use of these data  

- to contact the experts of this interface  

- to find the DOI of the dataset and the contact information of the creator(s) of this dataset  

- to take part in a collaborative exchange space, where one exists.

     Supporting users

IDOC-DATA also organizes workshops to help you get to grips with the most complex aspects or new categories of data.  

 

Citations

The interfaces developed explicitly advise IDOC users to include in their articles and publications an explicit reference to the IDOC site that has enabled them to progress in their work.  

Because the debate between identifying users through accounts that give access to the data and offering free access is not easily settled, the use of the data is not monitored uniformly across datasets.

Indeed, a registration request may put off some visitors and discourage them from using the data presented. Conversely, requiring users to log in with a (free) account seems to be more effective in getting IDOC-DATA cited in articles or other types of publications resulting from the use of the data or tools made available.

Consequently, the choice implemented for each dataset is the result of a discussion among its stakeholders: providers, the relevant user committee, etc.
 

IDOC-DATA Governance and rules of commitment

IDOC-DATA governance is ensured by the IDOC-DATA steering committee, which is designated by the OSUPS Governing Board; the Board also gives its recommendations. The steering committee nominates both the IDOC-DATA technical and scientific leaders.

    Compliance with specific contracts or Data Management Plans  

Whatever service IDOC implements, it must be delivered in accordance with the terms agreed with the producers of the data or the project funders (e.g. space agencies).

Moreover, IDOC’s dataset management ensures integrity and authenticity during the processes of ingestion, storage, data access and preservation: changes to data and metadata are documented and the relationship of the dataset with the original data is maintained.

Data management OAIS description: https://www.dpconline.org/docs/technology-watch-reports/1359-dpctw14-02/...

The Provision function maintains databases of descriptive metadata identifying and describing the archived information in support of the OAIS’s finding aids; it also manages the administrative data supporting the OAIS’s internal system operations, such as system performance data or access statistics. The primary functions of Data Management include maintaining the databases for which it is responsible; performing queries on these databases and generating reports in response to requests from other functional entities within the OAIS; and conducting updates to the databases as new information arrives, or existing information is modified or deleted. In managing these databases, the Data Management function supports search and retrieval of the OAIS’s archived content, and administration of the OAIS’s internal operations. 
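The three activities named above (maintaining the databases, answering queries and producing reports, and applying updates) can be sketched with an in-memory SQLite catalogue. This is a minimal illustration only: the table layout, column names, and function names are assumptions and do not describe IDOC's actual databases.

```python
import sqlite3


def open_catalogue() -> sqlite3.Connection:
    """Create an in-memory catalogue with descriptive and administrative tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE descriptive_metadata (
               dataset_id TEXT PRIMARY KEY,
               title      TEXT NOT NULL,
               theme      TEXT NOT NULL)"""
    )
    conn.execute(
        """CREATE TABLE access_log (
               dataset_id  TEXT NOT NULL,
               accessed_on TEXT NOT NULL)"""
    )
    return conn


def upsert_dataset(conn: sqlite3.Connection, dataset_id: str, title: str, theme: str) -> None:
    """Insert a new description, or replace an existing one with the same id."""
    conn.execute(
        "INSERT OR REPLACE INTO descriptive_metadata (dataset_id, title, theme) "
        "VALUES (?, ?, ?)",
        (dataset_id, title, theme),
    )


def record_access(conn: sqlite3.Connection, dataset_id: str, accessed_on: str) -> None:
    """Append one access event (administrative data)."""
    conn.execute(
        "INSERT INTO access_log (dataset_id, accessed_on) VALUES (?, ?)",
        (dataset_id, accessed_on),
    )


def find_by_theme(conn: sqlite3.Connection, theme: str) -> list[str]:
    """Finding aid: list dataset ids for a scientific theme."""
    rows = conn.execute(
        "SELECT dataset_id FROM descriptive_metadata WHERE theme = ? ORDER BY dataset_id",
        (theme,),
    )
    return [row[0] for row in rows]


def access_report(conn: sqlite3.Connection) -> dict[str, int]:
    """Report: number of recorded accesses per dataset."""
    rows = conn.execute(
        "SELECT dataset_id, COUNT(*) FROM access_log GROUP BY dataset_id"
    )
    return dict(rows.fetchall())
```

Keeping descriptive metadata and access statistics in separate tables mirrors the distinction the paragraph draws between data that supports finding aids and data that supports internal operations.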
 

  
 

IDOC-DATA preservation model and its strategy

Long term preservation of digital documents has three main objectives:

  •     Preserve the information  
  •     Make it accessible
  •     Preserve intelligibility.

These three objectives aim to perpetuate not only the data as such but, above all, their capacity to be used effectively by the user communities.

Let's detail these objectives:

Preserve the information over the years?  
That is the most obvious function expected of a repository. It must ensure that the record is always available on the storage medium, and that it maintains its integrity. 

Make it accessible?  
This means that you can find the document on the storage medium and retrieve its contents for use from any workstation that is normally available to users of that data. 

Preserve the intelligibility of the document?  
The aim is to ensure not only that the document is readable but, above all, that its content is intelligible to the user and that the semantics carried by this content are preserved.

Note: Secure backup (or storage) only takes into account the first two of the three objectives listed above and only in the short and medium term.  

Ensuring that all three objectives are met means that it is necessary to validate over time that the tools, interfaces, descriptions, etc., which are the environment of the data and allow its use, retain their relevance for the understanding and effective use of the data.

To build this validation and repeat it at regular intervals, IDOC works with the scientific teams behind the data to describe this environment. This interactive procedure is described in the following chapters; its outcome identifies how to mitigate the four main risks that any dataset inevitably faces:

  •     Hardware obsolescence,
  •     Software obsolescence,
  •     File format obsolescence,
  •     Loss of the meaning of the content.

This makes it possible to determine the specific points of attention for the dataset, which are added to the usual points that IDOC already monitors carefully.

Over time, these points of attention are validated cyclically, and this scheme keeps the data intelligible.
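The cyclical validation of points of attention can be pictured as a small review register. The sketch below is an assumption made for illustration: the class and function names and the default one-year review interval are hypothetical, and only the four risk categories come from the text above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Risk(Enum):
    HARDWARE = "hardware obsolescence"
    SOFTWARE = "software obsolescence"
    FILE_FORMAT = "file format obsolescence"
    MEANING = "loss of the meaning of the content"


@dataclass
class AttentionPoint:
    risk: Risk
    description: str
    last_validated: date
    interval_days: int = 365  # assumed review period, not an IDOC rule

    def is_due(self, today: date) -> bool:
        """A point is due once its review interval has elapsed."""
        return today >= self.last_validated + timedelta(days=self.interval_days)


def due_for_review(points: list[AttentionPoint], today: date) -> list[AttentionPoint]:
    """Return the points of attention that must be revalidated in this cycle."""
    return [point for point in points if point.is_due(today)]


def uncovered_risks(points: list[AttentionPoint]) -> set[Risk]:
    """Return the risk categories for which no point of attention is recorded."""
    return set(Risk) - {point.risk for point in points}
```

Calling `due_for_review` at each cycle lists what needs revalidation, while `uncovered_risks` flags a dataset whose register does not yet address all four risk categories.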