CloudOS Meets Jupyter: Fixing the Reproducibility Crisis in Science

3 minute read
Lifebit

Anyone working in a data-driven industry will undoubtedly recognise the name Jupyter, a clever nod to Julia, Python and R (although dozens of other programming languages are now supported).

The ultimate digital notebooks. Forget wikis, Google Drive and Evernote. The open-source, language-agnostic Jupyter notebooks support execution environments (kernels) for many programming languages, and allow users to create and share dynamic documents that contain live code, equations, interactive visualisations and explanatory text – features that have made them very popular with researchers. Astonishingly, over 3,000,000 Jupyter notebooks are currently publicly available on GitHub, while an equivalent number of private notebooks also exists.

The fast and broad adoption of Jupyter notebooks within the life sciences is mainly attributed to the fact that they are based on “literate programming” – essentially, a programming style which emphasises a prose-first approach, or human-friendly text, punctuated with machine-readable executable code blocks, also referred to as cells. Interestingly, this programming mindset is the opposite of how we usually approach coding.
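To make the idea of prose punctuated with code cells concrete, here is a minimal sketch of how a notebook might be laid out as a file, assuming the nbformat 4 JSON layout. The helper names (`markdown_cell`, `code_cell`, `build_notebook`) and the read counts are hypothetical and purely illustrative.

```python
import json


def markdown_cell(text):
    """Return a prose cell in the nbformat 4 JSON layout."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """Return an unexecuted code cell (no outputs recorded yet)."""
    return {
        "cell_type": "code",
        "metadata": {},
        "execution_count": None,
        "outputs": [],
        "source": source,
    }


def build_notebook(cells):
    """Wrap a list of cells in the top-level notebook structure."""
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


# Prose first, then the code that backs it up (hypothetical example data).
notebook = build_notebook([
    markdown_cell("## Read counts\nWe start by summing the read counts per sample."),
    code_cell("counts = {'sample_a': 120, 'sample_b': 95}\nprint(sum(counts.values()))"),
])

# An .ipynb file is this structure serialised as JSON.
notebook_json = json.dumps(notebook, indent=1)
```

Writing `notebook_json` to a file with the `.ipynb` extension should give something a Jupyter installation can open, with the explanation rendered as text and the code ready to run.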

Jupyter notebooks lead to better documentation, better workflows and better collaboration among researchers.

These notebooks can be used for data cleaning and transformation, numerical simulation, statistical modelling, machine learning and much more. Furthermore, Jupyter notebooks offer an ideal working environment for visual exploratory data analysis, which opens up many different use cases across data-driven scientific disciplines.

Personal & collaborative logs

With the surge of next-generation sequencing (NGS) data, the increased complexity of bioinformatics analysis and the fast-moving bioinformatics software landscape (3,000 new bioinformatics tools developed in 2018 alone), researchers are increasingly faced with the burden of documenting the steps they follow during their work, usually by chronicling them in a README file. Jupyter notebooks allow researchers to essentially skip this step, and instead compile that record automatically as they work. Ciao README files!

Scientific publications

Jupyter notebooks are becoming increasingly popular in research because they enable researchers to effectively reproduce and share complex computational workflows. 


For the most part, scientific manuscripts are still static PDF documents which fail to communicate the complexity and sophistication of today’s data-driven scientific methods. Publications usually rely on chains of computer programs that generate, clean up, plot and run statistical models on data – methods which are critical to the science performed but are extremely difficult to write out as the code just doesn’t naturally translate into prose. 

Interactive Jupyter notebooks enable scientists to walk readers through the work behind every figure in a publication (check out this Jupyter notebook detailing an open RNA-Seq data analysis pipeline to reprocess data from a recent Zika virus study). Anyone can run the code themselves, and tweak parts of it to experiment. In short, Jupyter notebooks empower researchers to review the validity of scientific publications.

Education & training

Ensuring that life science students are exposed to bioinformatics methods as the field becomes more data-driven is a major focus. By presenting content in Jupyter notebooks, students are encouraged to edit and execute code, enabling a rapid transition from learning theory to applying techniques in a real-world setting (check out this openly available book for teaching with Jupyter, written by the Jupyter community).

Removing challenges to implementing Jupyter notebooks in research

Despite the rapid adoption of Jupyter notebooks across scientific disciplines, there are challenges the community must overcome to ensure the seamless incorporation of these notebooks in research.


In a utopian world of bioinformatics, scientists, regardless of their coding skills, would be able to use and collaborate through Jupyter notebooks, installation and configuration steps would magically disappear, compute resources would be infinite, and the reuse and experimentation with existing Jupyter notebooks would be easy peasy lemon squeezy.

Seamlessly execute Jupyter notebooks with Lifebit CloudOS

So, how do we go about this at Lifebit? We democratise the use of Jupyter notebooks through Lifebit CloudOS. Of course, this is easier said than done, but we’re proud to say we’ve done it!

We are delighted to announce the integration of JupyterLab (the next-generation interface for Jupyter notebooks, both flexible and powerful) with Lifebit CloudOS.


This powerful integration allows Lifebit CloudOS users to benefit from JupyterLab all in one place, lowering the barriers to complex data analysis for all researchers. Specifically, the integration of JupyterLab with Lifebit CloudOS benefits users by:

  • Removing entry barriers by making analysis tools accessible to all members of the team – this also gives researchers the opportunity to learn basic exploratory skills, regardless of their coding skills. Run analyses through the user interface or via the command line interface (CLI): it’s your choice!
  • Eliminating the need to install, configure and maintain the Jupyter environment – get right into action with Jupyter notebooks through Lifebit CloudOS!
  • Delivering access to a wide breadth of compute resources – Lifebit CloudOS users can select anything from micro instances (1 CPU/1 GB memory) to very large instances for more computationally demanding tasks. Users can also switch instances dynamically as the amount of data increases or decreases.
  • Automatically synchronising libraries (and library versions) – users no longer need to worry about library synchronisation when sharing and collaborating through Jupyter notebooks on Lifebit CloudOS.
  • Enabling interactive exploration of results – easily produce visually impactful graphs and results with all of your data in one environment.
  • Fostering reuse and experimentation – select any existing Jupyter notebook to apply to your own datasets.
  • Supercharging collaboration ⚡ – Jupyter notebooks on Lifebit CloudOS are designed to be shared simply by sending a link.
Easily clone repositories with Jupyter notebooks into your Lifebit CloudOS environment!
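
As a rough illustration of what keeping library versions in step involves when notebooks are shared, the sketch below records installed package versions and compares two such records. It is not a description of how Lifebit CloudOS does this; the function names and the example version numbers are assumptions made up for illustration.

```python
from importlib import metadata


def snapshot_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def find_mismatches(mine, theirs):
    """Return the packages whose versions differ between two snapshots."""
    return {
        name: (mine.get(name), theirs.get(name))
        for name in sorted(set(mine) | set(theirs))
        if mine.get(name) != theirs.get(name)
    }


# Hypothetical snapshots taken on two collaborators' machines.
alice = {"numpy": "1.17.4", "pandas": "0.25.3"}
bob = {"numpy": "1.17.4", "pandas": "1.0.0"}

mismatches = find_mismatches(alice, bob)
print(mismatches)
```

Here only `pandas` would be reported, since it is the one package whose version differs between the two snapshots. A shared environment takes this kind of bookkeeping off the collaborators' hands.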

 

If you are already using Jupyter notebooks on our CloudOS platform, we would love to know what you think! If you’re interested in using Jupyter notebooks for your research, contact our Customer Success team below. They would love to help you out!