SQR-023: Design of the notebook-based report system

  • Jonathan Sick

Latest Revision: 2018-08-15

1   Abstract

The notebook-based test report system provides a way for LSST to generate and publish data-driven reports with an automated system. This technote describes the technical design behind the notebook-based report system.

For usage information, see the nbreport documentation at https://nbreport.lsst.io.

2   Requirements

The notebook-based report system is driven by these informally-defined design requirements:

  • A document handle corresponds to a templated Jupyter Notebook.
  • Notebook instances are generated from templates.
  • Each notebook instance has a unique serial number so that a notebook instance can be identified universally from a combination of document handle and instance serial number.
  • Once a notebook instance is generated, it is runnable. Notebooks may only be runnable in certain environments where data is available, such as the LSST Science Platform.
  • Generated notebook instances are published to the web with LSST the Docs.
  • The process of creating a notebook instance, running the notebook, and publishing the notebook instance is completely automated. Manual intervention is only required to develop the notebook template for the document and to set up the automations to run the report generation workflow.
  • The notebook-based report system should share as much infrastructure as possible with notebook-based technical notes.

3   Report generation sequence

This section traces the generation of a notebook report instance. In doing so, this section identifies the key components of the notebook report system and their designed functionality.

Figure 1 Processing sequence for generating and publishing a notebook-based report.

3.1   Compute platform phase

Notebook instances are generated on a compute platform that provides access to software libraries and datasets. The LSST Science Platform is envisioned as the baseline compute platform.

Running on the compute platform, the generation of a report instance is coordinated by the nbreport command-line tool. nbreport is a command-line tool so that it can be universally scripted. The all-in-one command for generating and publishing a notebook report is nbreport issue. This section describes the nbreport issue command.

The nbreport issue command begins by cloning the requested document repository. A document repository contains a templated Jupyter notebook. The template language is Jinja, compatible with Cookiecutter. Cookiecutter and Jinja have already been adopted by LSST DM for the lsst/templates repository. The report’s notebook is templated so that both code and documentation content can be updated. For example, the Python code block for a Butler get can be templated so that the report corresponds to a specified dataset.

Before the template is rendered, the report instance is registered with the nbreport API service (POST api.lsst.codes/nbreport/documents/<handle>/instances/). Doing so provides a unique, monotonically increasing ID for the report instance. Registering the report instance now allows the report’s instance ID to be used as a template variable. This registration step could also be used to preallocate a DOI.
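
As an illustration of the registration call, the following minimal sketch builds the POST request without sending it. The HTTPS scheme, the function name build_registration_request, and the Authorization header format are assumptions made for this example; this note specifies only the endpoint path and that authentication is GitHub-based.

```python
import urllib.request

# Assumption: the service is reached over HTTPS at this root.
API_ROOT = "https://api.lsst.codes/nbreport"


def build_registration_request(handle, token):
    """Build (but do not send) the POST that registers a new report instance."""
    url = f"{API_ROOT}/documents/{handle}/instances/"
    return urllib.request.Request(
        url,
        method="POST",
        # Hypothetical header: the token scheme is not specified in this note.
        headers={"Authorization": f"token {token}"},
    )


request = build_registration_request("DMQA-001", "example-token")
print(request.get_method(), request.full_url)
```

Sending the request (for example with urllib.request.urlopen) would return the reserved instance ID; the shape of that response is not defined here.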

Next, nbreport issue renders a Jupyter notebook instance using the Jinja/Cookiecutter API. Besides the instance ID, nbreport issue also gathers template variables from command-line arguments.

nbreport issue executes the report instance programmatically using nbconvert’s ExecutePreprocessor class. Doing so allows nbreport to work as a “headless” service that doesn’t need to open a browser window to execute a Jupyter Notebook. The executed notebook is saved as an ipynb file.
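
A minimal sketch of headless execution with ExecutePreprocessor is shown below. The function name execute_notebook, the timeout, and the kernel name are assumptions for illustration, not the nbreport implementation. The Jupyter imports are deferred into the function so that the module can be loaded where nbformat and nbconvert are not installed.

```python
from pathlib import Path


def execute_notebook(src, dest, timeout=600, kernel_name="python3"):
    """Execute the notebook at `src` without a browser and save it to `dest`."""
    # Deferred imports: only needed when a notebook is actually executed.
    import nbformat
    from nbconvert.preprocessors import ExecutePreprocessor

    src, dest = Path(src), Path(dest)
    with src.open(encoding="utf-8") as f:
        notebook = nbformat.read(f, as_version=4)

    executor = ExecutePreprocessor(timeout=timeout, kernel_name=kernel_name)
    # Run cells with the destination directory as the working directory.
    executor.preprocess(notebook, {"metadata": {"path": str(dest.parent)}})

    with dest.open("w", encoding="utf-8") as f:
        nbformat.write(notebook, f)
    return dest
```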

Finally, nbreport issue uploads that ipynb file to the nbreport web service (POST api.lsst.codes/nbreport/documents/<handle>/instances/<id>). All interactions with the web service are authenticated with GitHub-based OAuth, and authorized by GitHub organization membership.

3.2   Web service-based publishing phase

Once the api.lsst.codes/nbreport web service receives the ipynb file for the report instance, it converts that ipynb file into an LSST-branded HTML page using nbconvert. This report web page is similar in appearance to notebook-based technical notes. The web service uploads the HTML page for the report, in addition to the source ipynb file, to LSST the Docs. The notebook-based report system uses LSST the Docs’s editions feature to render each report instance at separate /v/<id> paths.

The api.lsst.codes/nbreport web service also generates an index page of all report instances. This index page is displayed at the report’s main URL. For example, if the report’s handle is DMQA-001, the main URL is https://dmqa-001.lsst.io. The URL for an individual report instance with an ID of 1 is then https://dmqa-001.lsst.io/v/1.
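
The URL scheme in this example can be sketched as two small helpers. The rule that the subdomain is the lower-cased handle is inferred from the single DMQA-001 example and is an assumption; the function names are hypothetical.

```python
def report_url(handle):
    """Main URL of a report (assumes the subdomain is the lower-cased handle)."""
    return f"https://{handle.lower()}.lsst.io"


def instance_url(handle, instance_id):
    """URL of one report instance, published as an edition under /v/<id>."""
    return f"{report_url(handle)}/v/{instance_id}"


print(report_url("DMQA-001"))       # https://dmqa-001.lsst.io
print(instance_url("DMQA-001", 1))  # https://dmqa-001.lsst.io/v/1
```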

At its most basic, the index page provides a chronological listing of reports. The index page may also be developed to enable filtering and search of reports based on template variables.

4   The GitHub repository of a report

Each notebook-based report has its own GitHub repository. In the notebook-based report system, the GitHub repository for a report contains the source to generate a report instance. The GitHub repository does not contain the instances themselves; those are published and archived with LSST the Docs.

The GitHub repository for a report has several standardized files. For illustration, the repository for a report with a handle DMQA-001 is laid out like this:

DMQA-001/
├── cookiecutter.json
├── DMQA-001.ipynb
├── nbreport.yaml
└── README.rst

The following sections describe each file.

4.1   cookiecutter.json

The cookiecutter.json file, adopted from the Cookiecutter project, establishes the template context. This file both defines the template variables that are expected in the report notebook and also defines default values.

A basic cookiecutter.json file that defines keys cookiecutter.a and cookiecutter.b with default values of 0 and 1, respectively, looks like this:

{
  "a": 0,
  "b": 1
}

Then a cell in the DMQA-001.ipynb notebook file can use those template variables using standard Jinja syntax:

answer = {{ cookiecutter.a }} + {{ cookiecutter.b }}

That cell would be rendered, if using the default values, as:

answer = 0 + 1

4.2   DMQA-001.ipynb

This file, named after the report’s handle, is a Jupyter notebook. This notebook contains the code and prose that, when executed, becomes a report instance.

This ipynb file must be committed into the GitHub repository in an unexecuted state, without outputs.
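
One way to check this condition before committing is sketched below. It works on the notebook's JSON structure (nbformat 4) and reports code cells that still carry outputs or an execution count. The function name executed_cells is hypothetical, and such a check is not described as part of nbreport.

```python
def executed_cells(notebook):
    """Return indices of code cells that have outputs or an execution count."""
    dirty = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        if cell.get("outputs") or cell.get("execution_count") is not None:
            dirty.append(index)
    return dirty


clean = {"cells": [
    {"cell_type": "markdown", "source": "# Title"},
    {"cell_type": "code", "source": "x = 1",
     "execution_count": None, "outputs": []},
]}
dirty = {"cells": [
    {"cell_type": "markdown", "source": "# Title"},
    {"cell_type": "code", "source": "x = 1",
     "execution_count": 3, "outputs": []},
]}
print(executed_cells(clean))  # []
print(executed_cells(dirty))  # [1]
```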

The source of each cell in the ipynb file is treated as a Jinja template (see 5   Templating of the report notebook).

4.3   nbreport.yaml

This file provides configuration for the report within the LSST notebook-based report system.

For the DMQA-001 example report, this file looks like:

handle: DMQA-001
ltd_product: dmqa-001
repository: https://github.com/lsst/DMQA-001
published_url: https://dmqa-001.lsst.io
ipynb: DMQA-001.ipynb

4.4   README.rst

The README file describes the report, for users on GitHub.

Note

The report repository should contain a description that can be published on the report’s homepage. This description could either come from the README or from the nbreport.yaml file.

5   Templating of the report notebook

The Jupyter notebook in a report’s GitHub repository is templated so that it can be customized on-demand when the report is generated.

5.1   Use of Cookiecutter and Jinja

Jinja is the templating format. The notebook-based report system uses Cookiecutter as a convenient wrapper around Jinja. Although the notebook-based report system does not use Cookiecutter for its true purpose of instantiating entire file trees, Cookiecutter has these capabilities that can be adapted into the notebook-based report system:

  • The cookiecutter.json file is useful for defining the full set of template variables, along with their types, and defaults. cookiecutter.json is also useful for creating secondary variables based on prior variables.
  • Cookiecutter has a mechanism for running pre- and postprocessing hooks, if necessary.
  • Cookiecutter has a useful way of registering Jinja extensions.

Cookiecutter’s own command-line interface is not used by the notebook-based report system. Instead, Cookiecutter’s Python APIs are invoked by the nbreport command-line client. Doing so enables cell-wise templating, as described next. This pattern is already used by LSST for the lsst/templates project.

5.2   Cell source templating

Rather than interpreting the entire notebook file as a Jinja template, the notebook-based report system is designed so that the source of individual cells is processed as a Jinja template. This distinction is key because it ensures that the notebook file (ipynb format) can always be opened, displayed, and authored in the Jupyter notebook viewer or JupyterLab. Notebook authors simply mark up the Markdown and Python cells with Jinja formatting.
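
The following sketch illustrates cell-wise templating. As a simplification it calls Jinja directly (the jinja2 package) rather than going through Cookiecutter's Python API, and the function name render_cells is hypothetical. The template notebook remains an ordinary nbformat 4 structure, so it stays valid JSON that Jupyter can open.

```python
import copy

from jinja2 import Environment, StrictUndefined


def render_cells(notebook, context):
    """Return a copy of `notebook` with each cell source rendered by Jinja."""
    env = Environment(undefined=StrictUndefined, keep_trailing_newline=True)
    rendered = copy.deepcopy(notebook)
    for cell in rendered["cells"]:
        source = cell["source"]
        # nbformat 4 allows the source to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        cell["source"] = env.from_string(source).render(cookiecutter=context)
    return rendered


template = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": "Sum of {{ cookiecutter.a }} and {{ cookiecutter.b }}"},
        {"cell_type": "code", "metadata": {},
         "execution_count": None, "outputs": [],
         "source": "answer = {{ cookiecutter.a }} + {{ cookiecutter.b }}"},
    ],
}

defaults = {"a": 0, "b": 1}  # the cookiecutter.json defaults from section 4.1
instance = render_cells(template, defaults)
print(instance["cells"][1]["source"])  # answer = 0 + 1
```

StrictUndefined is chosen here so that a template variable missing from the context raises an error instead of silently rendering as an empty string.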

6   nbreport command-line interface

nbreport provides a command-line interface that can be used directly, or through automated scripting. nbreport uses the subcommand pattern so that several atomic commands are encapsulated in the same executable. The CLI itself is implemented with Click. This section describes the basic design of this CLI.

6.1   nbreport clone

This command clones a report’s repository from GitHub to the local filesystem.

Example:

nbreport clone https://github.com/lsst/DMQA-001

6.2   nbreport init

This command initializes a report instance.

Example:

nbreport init DMQA-001

Alternative example that also clones the report repository:

nbreport init https://github.com/lsst/DMQA-001

This command does the following:

  1. Reserves an instance ID for the report.

    Instance IDs are managed by the api.lsst.codes/nbreport service.

  2. Creates a directory named after the report instance.

    For example, if the report is DMQA-001 and the reserved ID is 1, then the report instance is named DMQA-001-1. This report instance directory (DMQA-001-1) is where all notebook computations are carried out. By giving each notebook an isolated directory, the system allows notebooks to create intermediate files in the current working directory without any concern of colliding with other report instances.

  3. Copies the nbreport.yaml file into the instance’s directory.

    In addition, a field named instance_id is added to the nbreport.yaml file. This allows the nbreport tool to concretely identify the report and its instance.
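
Steps 2 and 3 can be sketched as follows, assuming the instance ID has already been reserved. The function name init_instance and its parameters are hypothetical, and appending a line to the file assumes that nbreport.yaml is a flat mapping like the example in section 4.3.

```python
import tempfile
from pathlib import Path


def init_instance(repo_dir, handle, instance_id, parent_dir):
    """Create the instance directory and copy nbreport.yaml with instance_id."""
    instance_dir = Path(parent_dir) / f"{handle}-{instance_id}"
    # Fail rather than reuse a directory that belongs to another instance.
    instance_dir.mkdir(parents=True, exist_ok=False)

    config = (Path(repo_dir) / "nbreport.yaml").read_text(encoding="utf-8")
    if not config.endswith("\n"):
        config += "\n"
    config += f"instance_id: {instance_id}\n"
    (instance_dir / "nbreport.yaml").write_text(config, encoding="utf-8")
    return instance_dir


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "DMQA-001"
    repo.mkdir()
    (repo / "nbreport.yaml").write_text("handle: DMQA-001\n", encoding="utf-8")

    instance = init_instance(repo, "DMQA-001", 1, tmp)
    instance_name = instance.name
    instance_config = (instance / "nbreport.yaml").read_text(encoding="utf-8")

print(instance_name)  # DMQA-001-1
print(instance_config)
```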

6.3   nbreport render

This command renders the report template from the report repository into a Jupyter Notebook in the instance directory.

Example:

nbreport render DMQA-001-1 -c dataRef=xyz -c paramX=2.9

The -c options are context overrides: values that replace the template variable defaults set in cookiecutter.json.
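
How the overrides combine with the defaults can be sketched in a few lines. The function names and the default values are hypothetical. Note that in this sketch override values remain strings; whether nbreport converts types is not specified here.

```python
def parse_overrides(pairs):
    """Turn ["key=value", ...] strings from -c options into a dict."""
    overrides = {}
    for pair in pairs:
        key, separator, value = pair.partition("=")
        if not separator or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        overrides[key] = value
    return overrides


def build_context(defaults, pairs):
    """Overlay command-line overrides on the cookiecutter.json defaults."""
    return {**defaults, **parse_overrides(pairs)}


defaults = {"dataRef": "abc", "paramX": "1.0"}  # hypothetical defaults
context = build_context(defaults, ["dataRef=xyz", "paramX=2.9"])
print(context)  # {'dataRef': 'xyz', 'paramX': '2.9'}
```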

6.4   nbreport compute

This command computes the Jupyter notebook in the report instance. It does so in a “headless” manner, without opening a browser window.

Example:

nbreport compute DMQA-001-1

6.5   nbreport upload

This command uploads the report instance to the api.lsst.codes/nbreport service, which then publishes the report with LSST the Docs.

Example:

nbreport upload DMQA-001-1

Authentication for this command comes from a GitHub token.

6.6   nbreport issue

This all-in-one command renders, computes, and uploads a report instance. This command is useful for automated environments.

Example:

nbreport issue https://github.com/lsst/DMQA-001 -c dataRef=xyz

This command carries out the following steps:

  1. Clones the report repository (like 6.1   nbreport clone).
  2. Reserves the report instance number (like 6.2   nbreport init).
  3. Renders the notebook given the provided context variables (like 6.3   nbreport render).
  4. Computes the notebook (like 6.4   nbreport compute).
  5. Uploads the computed notebook (like 6.5   nbreport upload).
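
The ordering of these steps can be sketched as a simple pipeline. The function issue and the steps mapping are hypothetical; each stage is passed in as a callable so that the sequence can be exercised here with stubs that only record the call order.

```python
def issue(repo_url, overrides, steps):
    """Run the five stages in order; `steps` maps stage names to callables."""
    repo_dir = steps["clone"](repo_url)
    instance_dir = steps["init"](repo_dir)
    steps["render"](instance_dir, overrides)
    steps["compute"](instance_dir)
    steps["upload"](instance_dir)
    return instance_dir


calls = []


def clone(url):
    calls.append("clone")
    return "DMQA-001"


def init(repo_dir):
    calls.append("init")
    return f"{repo_dir}-1"


def record(name):
    """Make a stub stage that only records that it was called."""
    def stage(*args):
        calls.append(name)
    return stage


steps = {"clone": clone, "init": init, "render": record("render"),
         "compute": record("compute"), "upload": record("upload")}
result = issue("https://github.com/lsst/DMQA-001", {"dataRef": "xyz"}, steps)
print(result)  # DMQA-001-1
print(calls)   # ['clone', 'init', 'render', 'compute', 'upload']
```

Because each stage runs only after the previous one returns, an exception in any stage stops the pipeline before anything is uploaded.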

6.7   nbreport test

The nbreport test command provides a workflow for testing the executability of notebook templates during development.

This command works on an already-cloned report repository (the most common case in development).

Example:

nbreport test DMQA-001

This command carries out the following steps:

  1. Creates a test instance directory (DMQA-001-test, by default) and clears any pre-existing test instance.
  2. Renders the notebook instance using default values from cookiecutter.json (by default). It is also possible to pass context overrides on the command line.
  3. Executes and saves the notebook instance for inspection.

The advantage of this workflow for testing is that it creates a local test instance, rather than registering an instance with api.lsst.codes/nbreport.

6.8   nbreport reproduce

The nbreport reproduce command is used to verify that a report is reproducible. “Reproducible” in this context means that a published report instance can be re-generated, given the same template context variables, without meaningful variations in the output cells.

Example:

nbreport reproduce https://dmqa-001.lsst.io/v/1

In this example, nbreport reproduce attempts to regenerate the DMQA-001-1 report instance published at dmqa-001.lsst.io/v/1.
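
A rough sketch of the two pieces involved is given below: parsing the published URL, and comparing output cells between the published and regenerated notebooks. All function names are hypothetical. The comparison is an exact match on text outputs, which is only one possible reading of “without meaningful variations”; a real implementation would likely need to tolerate timestamps, memory addresses, and similar noise.

```python
from urllib.parse import urlparse


def parse_instance_url(url):
    """Split a published instance URL into (product, instance_id)."""
    parts = urlparse(url)
    product = parts.hostname.split(".")[0]
    segments = parts.path.strip("/").split("/")
    if len(segments) != 2 or segments[0] != "v":
        raise ValueError(f"not an instance URL: {url!r}")
    return product, segments[1]


def text_outputs(notebook):
    """Collect the text outputs of every code cell, in order."""
    collected = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue
        texts = []
        for output in cell.get("outputs", []):
            if output["output_type"] == "stream":
                texts.append(output["text"])
            elif "data" in output:
                texts.append(output["data"].get("text/plain", ""))
        collected.append(texts)
    return collected


def is_reproduced(published, regenerated):
    """True if both notebooks have identical text outputs."""
    return text_outputs(published) == text_outputs(regenerated)


def code_cell(text):
    """Build a minimal executed code cell that printed `text`."""
    return {"cell_type": "code", "source": "print(answer)",
            "outputs": [{"output_type": "stream", "name": "stdout",
                         "text": text}]}


published = {"cells": [code_cell("1\n")]}
print(parse_instance_url("https://dmqa-001.lsst.io/v/1"))  # ('dmqa-001', '1')
print(is_reproduced(published, {"cells": [code_cell("1\n")]}))  # True
print(is_reproduced(published, {"cells": [code_cell("2\n")]}))  # False
```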

Note

This command could be implemented by adapting the nbval package.

7   Further reading

These are links to related reports, user documentation, or software repositories: