Lifecycle of research data

Research data is inextricably bound to a research project. The lifecycle of the data, however, can extend beyond that of the project, and therefore requires proper (long-term) planning.

At a high level, the research project lifecycle is simple and consists of three major stages:

        flowchart LR

subgraph planstage["plan stage"]
    plan["plan"]
end

subgraph runstage["run stage"]
    experiment
    manage["manage / process"]
    publish
end

subgraph closeoutstage["closeout stage"]
    closeout["closeout"]
end

planstage --> runstage --> closeoutstage
experiment --> manage --> publish --> experiment
    

The time it takes to pass through these stages ranges from a few weeks to, in some cases, more than ten years. Note that after the closeout stage has ended, the research data can (and often will) still exist and be accessible. However, the researcher who ran the research project may no longer be responsible for that data.

From a data management perspective, the following activities can be mapped onto the stages:

        flowchart LR

classDef irods stroke:#00bdab,stroke-width:4px;

subgraph planstage["plan stage"]
    direction TB
    dmp["Write a data management plan"]
    funding["Acquire funding for archiving"]
    storageaccess["Ensure data storage access"]
    dmp ~~~ funding ~~~ storageaccess
end

subgraph runstage["run stage"]
    store["Store data"]:::irods
    process["Access / process data"]:::irods
    move["Move/copy data to compute facilities"]:::irods
    publish["Publish data"]
    share["Share data with colleagues"]:::irods
    delete["Delete data"]:::irods

    store ~~~ process ~~~ move ~~~ delete ~~~ share ~~~ publish
end

subgraph closeoutstage["closeout stage"]
    co_delete["Delete data"]:::irods
    co_hotstore["Archive data online for n years"]
    co_coldstore["Archive data offline for n years"]

    co_delete ~~~ co_hotstore ~~~ co_coldstore
    
end

planstage --> runstage --> closeoutstage
    

Not all activities presented here are relevant for every research project. There are, however, dependencies between the named activities. For instance, it will be nearly impossible to acquire funding for archiving your data if you have not first written a data management plan. Likewise, you will not be able to store data until you have ensured access to a data storage facility.
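
The dependencies between activities can be made explicit in a few lines of Python. The following is only a minimal sketch: the `PREREQUISITES` map and both helper functions are hypothetical names introduced for illustration, and only the two relations mentioned above are encoded. A real project will likely have more.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical prerequisite map (an assumption for illustration):
# each activity maps to the set of activities that must come first.
PREREQUISITES = {
    "Write a data management plan": set(),
    "Ensure data storage access": set(),
    "Acquire funding for archiving": {"Write a data management plan"},
    "Store data": {"Ensure data storage access"},
}


def missing_prerequisites(activity, completed):
    """Return the prerequisites of `activity` that are not yet completed."""
    return PREREQUISITES.get(activity, set()) - set(completed)


def feasible_order():
    """Return one ordering of the activities that respects all prerequisites."""
    return list(TopologicalSorter(PREREQUISITES).static_order())


if __name__ == "__main__":
    # Storing data is blocked until storage access has been arranged.
    print(missing_prerequisites("Store data", completed=[]))
    # One possible order in which the activities can be carried out.
    print(feasible_order())
```

Calling `missing_prerequisites` before starting an activity shows what still has to be arranged, while `feasible_order` gives one order in which all activities can be carried out.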

iRODS provides functionality that supports the tasks outlined in green.

