Overview

A Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. It is widely used in data science, machine learning, scientific computing, and academic research. Here are some key features and components of Jupyter Notebooks:

Key Features

  • Interactive Computing: You can write and execute code in real-time, making it easy to test and debug your code on the fly.
  • Support for Multiple Languages: While it is most commonly used with Python, Jupyter supports many programming languages through the use of kernels (e.g., R, Julia, Scala).
  • Rich Text Support: You can include Markdown, HTML, and LaTeX to create well-formatted text, making it easy to document your code and findings.
  • Data Visualization: Jupyter Notebooks can display plots and graphs inline, which is particularly useful for data analysis and visualization tasks.
  • Export Options: You can export notebooks to various formats, including HTML, PDF, and Markdown, making it easy to share your work with others.
  • Extensions and Customization: There are many extensions available that enhance the functionality of Jupyter Notebooks, allowing for customization to fit specific needs.

Typical Use Cases

  • Data Analysis: Analyzing datasets using libraries like Pandas and NumPy.
  • Machine Learning: Building and testing machine learning models with libraries like TensorFlow.
  • Education: Teaching programming, data science, and other subjects in an interactive manner.
  • Research: Documenting experiments and results in a reproducible format.

Example Structure of a Jupyter Notebook

A typical Jupyter Notebook consists of cells that can contain:

  • Code Cells: Where you write and execute code.
  • Markdown Cells: Where you write formatted text, including explanations and documentation.
  • Outputs: The results of code execution, such as tables, plots, or text output, displayed directly below the code cell that produced them.
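As a rough illustration of how these pieces are stored on disk, the sketch below builds a minimal notebook as a Python dictionary and prints it as JSON. It assumes the nbformat 4 layout, in which outputs are saved as part of the code cell that produced them. The function name `build_minimal_notebook` and the cell contents are made up for illustration.

```python
import json


def build_minimal_notebook():
    """Return a dict following the nbformat 4 layout (illustrative content only)."""
    markdown_cell = {
        "cell_type": "markdown",
        "metadata": {},
        "source": ["# Example analysis\n", "Explanatory text goes here."],
    }
    code_cell = {
        "cell_type": "code",
        "execution_count": 1,
        "metadata": {},
        "source": ["print(1 + 1)"],
        # Outputs are saved alongside the code that produced them.
        "outputs": [
            {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
        ],
    }
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [markdown_cell, code_cell],
    }


if __name__ == "__main__":
    # Saving this JSON text to a file ending in ".ipynb" gives a notebook file.
    print(json.dumps(build_minimal_notebook(), indent=1))
```

Because a notebook is plain JSON, it can be inspected with ordinary text tools as well as through the Jupyter interface.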

Orbit and Jupyter Notebooks in LC

Orbit is a system for connecting a federation of JupyterHubs together into one interface. Changes were made to the default Orbit in order to launch Jupyter Notebooks that adhere to LC security policies.

You can reach the various JupyterHubs through Orbit at the following URLs:

CZ: https://lc.llnl.gov/orbit

RZ: https://rzlc.llnl.gov/orbit

SCF: https://lc.llnl.gov/orbit

Caveats

Notebooks are files that are essentially equivalent to Python files, with the addition of the output from the last time they were run. You can run notebooks that you obtain from other users, but, as with any code you did not write yourself, exercise caution.
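One way to exercise that caution is to read a notebook's code before opening and running it. The following is a minimal sketch, assuming the standard nbformat 4 JSON layout; the helper names `code_cell_sources` and `print_code_cells` are hypothetical and not part of any LC or Jupyter tool.

```python
import json


def code_cell_sources(notebook_path):
    """Return the source text of every code cell in a notebook file."""
    with open(notebook_path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", "")
            # "source" may be stored as one string or as a list of lines.
            if isinstance(source, list):
                source = "".join(source)
            sources.append(source)
    return sources


def print_code_cells(notebook_path):
    """Print each code cell so it can be reviewed before anything is executed."""
    for index, source in enumerate(code_cell_sources(notebook_path), start=1):
        print(f"--- code cell {index} ---")
        print(source)
```

Calling `print_code_cells("shared_notebook.ipynb")` (a placeholder file name) lists the code without executing any of it.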

For notebook workloads that are computationally intensive, please use a Batch Spawner. There are two available Batch Spawners: "LSF Spawner" for `BlueOS` machines, and "Slurm Spawner" for `TOSS4` machines. Batch Spawners are subject to the regular wait times for an allocation. To reduce the long wait time on normal partitions, there are special partitions, `pci` and `pjupyter`, that have faster turnaround times but are subject to both machine and extra node availability (these "overlap" with other partitions like `pbatch`). If a machine's other partitions are at capacity, it may not be possible to get an allocation via `pci` or `pjupyter`.



For light processing, you may use the "Login Node Spawner". This will launch notebooks on a login node of the chosen machine. Notebooks launch faster this way, but you will be subject to the conditions and load of the login node. Remember to only use the "Login Node Spawner" for small tasks: the login nodes are shared resources.

Notebook lifetimes depend on the type of spawner you use. Notebooks spawned on login nodes have a time limit of 12 hours, while notebooks spawned on compute nodes are subject to the time limit of the chosen partition/queue. Work is saved in ".ipynb" files (in your home directory by default) and can be restored on a relaunch.



As noted above, JupyterHubs are now hosted on cluster login nodes. The availability of JupyterHub depends on the machine's availability and load. If all the login nodes on a machine are down, Orbit will not display that machine as an option. If the login nodes are experiencing heavy load, JupyterHub's performance will also be negatively affected.

Finally, the JupyterHub deployments that we expose are what you should use to run a notebook. Do not launch notebooks on your own on LC systems.

Custom kernels

JupyterHub, as deployed in LC, ships with a read-only Python 3 kernel that includes all installed system packages. For some notebooks, that might be sufficient. If you'd like to install other Python packages for a project that you're working on, you'll need a custom kernel. There are two options for installing a custom kernel: installing in your home directory or installing into a team project directory (to be shared with a group).

Also note that a custom kernel is the preferred way to use custom packages, rather than having LC update the primary Python kernel. We have added some custom packages that would not otherwise work in a custom kernel, but we must be judicious: the more dependencies we add to this kernel, the higher the likelihood of conflict with dependencies of the core packages required to run JupyterHub.

Custom kernel caveats

By default, custom kernel invocation entails launching the ipykernel_launcher with the python specified in your virtualenv. This does not source your virtualenv (that is, it does not add the virtualenv /bin directory to your path). As such, referring to "binaries" in your virtualenv or to site-packages from a subprocess launched from a notebook will not work. To remedy this, after installing your custom kernel following the instructions below, open the relevant kernelspec file under `.local/share/jupyter/kernels` and add the "env" option to the kernel spec as shown in the sample JSON file below.

Home directory install

To set up a kernel in your home directory, run the following from the command line while in $HOME:

# Set up a virtualenv in your home directory
virtualenv --system-site-packages my_personal_env

# Activate your environment (bash)
source my_personal_env/bin/activate

# Or activate your environment (csh)
source my_personal_env/bin/activate.csh

# Install your custom kernel to .local in your home directory
python3 -m ipykernel install --prefix=$HOME/.local/ --name 'some-unique-name' --display-name 'My awesome kernel'

# "--name" must be unique among your kernels and "--display-name" is what you'll see when selecting a new kernel
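Since "--name" must be unique among your kernels, it can help to see which names are already taken before installing. The sketch below lists them, assuming kernels installed with the `--prefix` shown above end up in `$HOME/.local/share/jupyter/kernels/<name>/kernel.json`; the helper `installed_kernel_names` is a hypothetical name used only for illustration.

```python
from pathlib import Path


def installed_kernel_names(kernels_dir):
    """Return the sorted names of kernel directories that contain a kernel.json."""
    kernels_dir = Path(kernels_dir)
    if not kernels_dir.is_dir():
        return []
    return sorted(
        entry.name
        for entry in kernels_dir.iterdir()
        if (entry / "kernel.json").is_file()
    )


if __name__ == "__main__":
    # Assumed location, matching the --prefix used in the install command.
    user_kernels = Path.home() / ".local" / "share" / "jupyter" / "kernels"
    print(installed_kernel_names(user_kernels))
```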

 

Custom Kernel ENV

Ensure that a JSON file with the following format exists in `.local/share/jupyter/kernels` for your new custom kernel:

{
 "display_name": "Python 2", 
 "language": "python",
 "_comment": "NOTE: variable expansion _not_ supported",
 "env": {
  "PATH": "/g/g0/user/python2-venv/bin:/existing/path/here:..."
 },
 "argv": [
  "/g/g0/user/python2-venv/bin/python", 
  "-m", 
  "ipykernel_launcher", 
  "-f", 
  "{connection_file}"
 ]
}
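If you would rather not edit the file by hand, the "env" entry can also be added with a short script. This is a minimal sketch rather than an LC-provided tool: the function name `add_venv_path` and its arguments are assumptions, and the paths in the usage note are placeholders. Because variable expansion is not supported in the kernel spec, the script writes the fully expanded PATH value.

```python
import json
import os
from pathlib import Path


def add_venv_path(kernel_json, venv_bin, base_path=None):
    """Add an "env" entry to a kernel spec so venv_bin comes first on PATH."""
    kernel_json = Path(kernel_json)
    spec = json.loads(kernel_json.read_text(encoding="utf-8"))
    if base_path is None:
        # Default to the PATH of the shell running this script.
        base_path = os.environ.get("PATH", "")
    venv_bin = str(venv_bin)
    # Kernel specs do not expand variables, so store the literal value.
    # Drop empty entries and any existing copy of venv_bin to avoid duplicates.
    rest = [p for p in base_path.split(os.pathsep) if p and p != venv_bin]
    spec.setdefault("env", {})["PATH"] = os.pathsep.join([venv_bin] + rest)
    kernel_json.write_text(json.dumps(spec, indent=1) + "\n", encoding="utf-8")
    return spec
```

For example, `add_venv_path("/path/to/kernels/some-unique-name/kernel.json", "/path/to/my_personal_env/bin")` would place the virtualenv's bin directory ahead of your current PATH in that spec. Keep a copy of the original file before modifying it.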

Once you finish installing your custom kernel, it will be available as an option on JupyterHub when you create a new notebook.

Nonstandard kernels requiring additional/different steps:

Matlab kernel

Julia kernel

R kernel

For powerai installations and TensorFlow, please refer to IBM Power AI in LC - JupyterHub Custom Kernel

Interactive plotting and widgets

For users wanting to generate interactive plots or widgets within a Jupyter notebook, we’ve found that the following packages seem to work on LC JupyterHub:

We unfortunately have been unable to get the following packages to create interactive visualizations/widgets on LC JupyterHub:

If you work with or have had difficulty with packages for interactive plotting that are not listed here, please let us know so that we can document this for other users!

* Plotly users: Please note that a couple of users have reported difficulties saving notebooks that included embedded Plotly graphics. The issue appears to be sporadic. So far, users have been able to resolve it and save their notebooks by deleting the Plotly graphics/output cells (but not the code/input cells that generated those graphics).



* Matplotlib-widgets users: Please use `%matplotlib widget` instead of `%matplotlib inline` for rendering graphs.

Troubleshooting

Other common issues

Problem: "500: Internal Server Error"

Explanation: A common cause of this is the existence of a python directory in `~/.local/lib`, which confuses the JupyterHub server about the correct source of Python packages. If this is the cause of your "500: Internal Server Error", you will likely see error messages such as `ImportError: cannot import name 'contextfilter' from 'jinja2'` written to your `~/.jupyter/jupyterhub/resources/notebook.log`.

Solution: Rename the problematic python directory. For example, if it is called `python3.7`, you might run `mv ~/.local/lib/python3.7/ ~/.local/lib/python3.7.bak`. To re-test JupyterHub, stop and restart any running server.
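To check whether you have such a directory, a small script along these lines may help. It is only a sketch: the helper name `find_user_python_dirs` is hypothetical, and treating names ending in `.bak` as already-renamed backups is an assumption based on the example above.

```python
from pathlib import Path


def find_user_python_dirs(local_lib):
    """Return python* directories directly under local_lib, skipping *.bak backups."""
    local_lib = Path(local_lib)
    if not local_lib.is_dir():
        return []
    return sorted(
        entry
        for entry in local_lib.glob("python*")
        if entry.is_dir() and not entry.name.endswith(".bak")
    )


if __name__ == "__main__":
    for directory in find_user_python_dirs(Path.home() / ".local" / "lib"):
        print(f"Possible conflict: {directory}")
```

The script only reports candidates. Renaming is left to you, since a directory found this way may still be needed by tools you use outside of JupyterHub.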