Developing and reusing bioinformatics data analysis pipelines using scientific workflow systems. 2023

Marine Djaffardjy, George Marchment, Clémence Sebe, Raphael Blanchet, Khalid Bellajhame, Alban Gaignard, Frédéric Lemoine, and Sarah Cohen-Boulakia
Université Paris-Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique, Orsay 91405, France.

Data analysis pipelines are now established as an effective means for specifying and executing bioinformatics data analyses and experiments. Scripting languages, particularly Python and R, and notebooks are popular and sufficient for developing small-scale pipelines, which are often intended for a single user. It is now widely recognized, however, that they are not enough to support the development of large-scale, shareable, maintainable and reusable pipelines capable of handling large volumes of data and running on high-performance computing clusters. This review outlines the key requirements for building large-scale data pipelines and maps existing solutions to the requirements they fulfill. We then highlight the benefits of using scientific workflow systems to obtain modular, reproducible and reusable bioinformatics data analysis pipelines. Finally, we discuss current workflow reuse practices based on an empirical study we performed on a large collection of workflows.

