Lifecycle of research data
Research data is inextricably bound to a research project. The lifecycle of the data, however, can be separate from that of the project, and therefore requires proper long-term planning.
At a high level, the research project lifecycle is simple and consists of three major stages:
flowchart LR
    subgraph planstage["plan stage"]
        plan["plan"]
    end
    subgraph runstage["run stage"]
        experiment
        manage["manage / process"]
        publish
    end
    subgraph closeoutstage["closeout stage"]
        closeout["closeout"]
    end
    planstage --> runstage --> closeoutstage
    experiment --> manage --> publish --> experiment
Moving through these stages can take anywhere from a few weeks to more than ten years. Note that after the closeout stage has ended, the research data can (and often will) still exist and be accessible. However, the researcher who ran the research project may no longer be responsible for that data.
From a data management perspective, the following activities can be mapped onto these stages:
flowchart LR
    classDef irods stroke:#00bdab,stroke-width:4px;
    subgraph planstage["plan stage"]
        direction TB
        dmp["Write a data management plan"]
        funding["Acquire funding for archiving"]
        storageaccess["Ensure data storage access"]
        dmp ~~~ funding ~~~ storageaccess
    end
    subgraph runstage["run stage"]
        store["Store data"]:::irods
        process["Access / process data"]:::irods
        move["Move/copy data to compute facilities"]:::irods
        publish["Publish data"]
        share["Share data with colleagues"]:::irods
        delete["Delete data"]:::irods
        store ~~~ process ~~~ move ~~~ delete ~~~ share ~~~ publish
    end
    subgraph closeoutstage["closeout stage"]
        co_delete["Delete data"]:::irods
        co_hotstore["Archive data online for n years"]
        co_coldstore["Archive data offline for n years"]
        co_delete ~~~ co_hotstore ~~~ co_coldstore
    end
    planstage --> runstage --> closeoutstage
Not all activities presented here are relevant to every research project. There are, however, dependencies between the activities. For instance, it will be nearly impossible to acquire funding for archiving your data if you have not first written a data management plan. Likewise, you cannot store data until you have ensured access to a data storage facility.
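As an illustration only, the two dependencies mentioned above can be written down as a small prerequisite map and checked programmatically. The sketch below is not part of iRODS or any data management tool; the `PREREQUISITES` dictionary and both helper functions are hypothetical names chosen for this example, and the map contains only the two relations named in the text.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical prerequisite map (an assumption for illustration):
# each activity maps to the set of activities that must be done first.
PREREQUISITES = {
    "write data management plan": set(),
    "ensure data storage access": set(),
    "acquire funding for archiving": {"write data management plan"},
    "store data": {"ensure data storage access"},
}


def missing_prerequisites(activity, completed):
    """Return the prerequisites of `activity` that are not yet in `completed`."""
    return PREREQUISITES.get(activity, set()) - set(completed)


def planning_order():
    """Return one valid order in which all activities can be carried out."""
    return list(TopologicalSorter(PREREQUISITES).static_order())


if __name__ == "__main__":
    # Storing data is blocked until storage access has been arranged.
    print(missing_prerequisites("store data", completed=[]))
    print(planning_order())
```

A real project would have more activities and relations than these two; the point is only that writing the dependencies down explicitly makes it easy to see which step blocks which.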
iRODS has functionality to support you with the tasks outlined in green.