Orbit and Jupyter Notebooks

Overview

A Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. It is widely used in data science, machine learning, scientific computing, and academic research. Here are some key features and components of Jupyter Notebooks:

Key Features

Interactive Computing: You can write and execute code in real-time, making it easy to test and debug your code on the fly.
Support for Multiple Languages: While it is most commonly used with Python, Jupyter supports many programming languages through the use of kernels (e.g., R, Julia, Scala).
Rich Text Support: You can include Markdown, HTML, and LaTeX to create well-formatted text, making it easy to document your code and findings.
Data Visualization: Jupyter Notebooks can display plots and graphs inline, which is particularly useful for data analysis and visualization tasks.
Export Options: You can export notebooks to various formats, including HTML, PDF, and Markdown, making it easy to share your work with others.
Extensions and Customization: There are many extensions available that enhance the functionality of Jupyter Notebooks, allowing for customization to fit specific needs.

Typical Use Cases

Data Analysis: Analyzing datasets using libraries like Pandas and NumPy.
Machine Learning: Building and testing machine learning models with libraries like TensorFlow.
Education: Teaching programming, data science, and other subjects in an interactive manner.
Research: Documenting experiments and results in a reproducible format.

Example Structure of a Jupyter Notebook

A typical Jupyter Notebook consists of cells that can contain:

Code Cells: Where you write and execute code.
Markdown Cells: Where you write formatted text, including explanations and documentation.
Output Cells: Where the results of code execution are displayed, such as tables, plots, or text output.
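
To make the cell structure concrete, the sketch below builds a tiny notebook as a plain Python dictionary laid out like an `.ipynb` file, assuming version 4 of the notebook format. The cell contents are invented for illustration. Note that in the file itself, output is stored inside the code cell that produced it rather than in a separate cell.

```python
import json


def make_minimal_notebook():
    """Build a minimal notebook as a plain dict in the nbformat 4 layout."""
    markdown_cell = {
        "cell_type": "markdown",
        "metadata": {},
        "source": ["# Demo\n", "Explanatory text goes here."],
    }
    code_cell = {
        "cell_type": "code",
        "execution_count": 1,
        "metadata": {},
        "source": ["print(1 + 1)"],
        # Output from the last run is stored with the code cell itself.
        "outputs": [
            {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
        ],
    }
    return {
        "cells": [markdown_cell, code_cell],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


notebook = make_minimal_notebook()
# This JSON text is what would be written to a file ending in .ipynb.
notebook_text = json.dumps(notebook, indent=1)
```

In practice you rarely write this JSON by hand; the Jupyter interface creates and updates it as you edit and run cells.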

Orbit and Jupyter Notebooks in LC

Orbit is a system for connecting a federation of JupyterHubs together into one interface. Changes were made to the default Orbit in order to launch Jupyter Notebooks that adhere to LC security policies.

You can reach the various JupyterHubs through Orbit at the following URLs:

CZ: https://lc.llnl.gov/orbit

RZ: https://rzlc.llnl.gov/orbit

SCF: https://lc.llnl.gov/orbit

Caveats

Notebooks are files essentially equivalent to Python files that additionally contain the output from their last run. You can run notebooks that you obtain from other users, but, as with any code you did not write, exercise caution.
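
One way to exercise that caution is to read a notebook's code before opening and running it. The following is a minimal sketch, assuming the notebook is a version 4 `.ipynb` JSON document; the function name `code_cells_for_review` and the sample notebook are hypothetical.

```python
import json


def code_cells_for_review(notebook_text):
    """Return the source of each code cell in a notebook's JSON text."""
    notebook = json.loads(notebook_text)
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format allows "source" to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    return sources


# A made-up notebook standing in for one received from another user.
sample = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["Notes"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["import os\n", "print(os.getcwd())"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
})
reviewed = code_cells_for_review(sample)
```

To review a real file, read its text first (for example with `open(path).read()`) and pass that to the function. This only shows you the code; deciding whether it is safe to run is still up to you.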

For notebook workloads that are computationally intensive, please use a Batch Spawner. There are two available Batch Spawners: "LSF Spawner" for `BlueOS` machines, and "Slurm Spawner" for `TOSS4` machines. Batch Spawners are subject to the regular wait times for an allocation. To reduce the long wait times on normal partitions, there are special partitions, `pci` and `pjupyter`, that have faster turnaround times but are subject to both machine and extra node availability (these "overlap" with other partitions like `pbatch`). If a machine's other partitions are at capacity, it may not be possible to get an allocation via `pci` and `pjupyter`.

For light processing, you may use the "Login Node Spawner". This will launch notebooks on a login node of the chosen machine. There is a faster turnaround time when launching notebooks this way, but you will be subject to the conditions and load of the login node. Remember to only use the "Login Node Spawner" for small tasks: the login nodes are shared resources.

Notebook lifetimes depend on the type of spawner you use. Notebooks spawned on login nodes have a time limit of 12 hours, while notebooks spawned on compute nodes are subject to the time limit of the chosen partition/queue. Work is saved in ".ipynb" files (in your home directory by default) and can be restored on a relaunch.

As noted above, JupyterHubs are now hosted on cluster login nodes. The availability of JupyterHub depends on the machine's availability and load. If all the login nodes are down on a machine, Orbit will not display that machine as an option. If the login nodes are experiencing heavy load, JupyterHub's performance will also be negatively affected.

Finally, the JupyterHub deployments that we expose are what you should use to run a notebook. Do not launch notebooks on your own on LC systems.

Custom kernels

JupyterHub, as deployed in LC, ships with a read-only Python 3 kernel that includes all installed system packages. For some notebooks, that might be sufficient. If you'd like to install other Python packages for a project that you're working on, you'll need a virtualenv Python environment and a custom kernel installed with that virtualenv. If you have an existing virtualenv, you may use that environment.

Home directory install

To set up a kernel in your home directory, run the following from the command line while in $HOME. If you already have a virtualenv that you would like to use, you may skip the first step. Note the virtualenv you use can be any environment you have read/execute access to, so this can also be a virtualenv in some /usr/workspace/yourGroupID space.

# Set up a virtualenv in your current working directory (this can be your $HOME or /usr/workspace/$USER directory)
virtualenv --system-site-packages my_personal_env

# Activate your environment (bash)
source my_personal_env/bin/activate

# Or activate your environment (csh)
source my_personal_env/bin/activate.csh

# Install your custom kernel to .local in your home directory
python3 -m ipykernel install --prefix=$HOME/.local/ --name 'yourKernelName' --display-name 'My awesome kernel'

# "--name" must be unique among your kernels and "--display-name" is what you'll see when selecting a new kernel
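
After the install step, you may want to confirm that the kernel spec landed where JupyterHub will look for it. The sketch below is one way to do that, assuming each kernel is a directory containing a `kernel.json` file; the helper name `list_user_kernels` is hypothetical, and the demonstration runs against a temporary directory rather than your real `$HOME`.

```python
import json
import tempfile
from pathlib import Path


def list_user_kernels(kernels_dir):
    """Map each kernel name under kernels_dir to its display name."""
    kernels = {}
    root = Path(kernels_dir)
    if not root.is_dir():
        return kernels
    for entry in sorted(root.iterdir()):
        spec_file = entry / "kernel.json"
        # is_file() follows symlinks, so linked kernel directories count too.
        if spec_file.is_file():
            spec = json.loads(spec_file.read_text())
            kernels[entry.name] = spec.get("display_name", "")
    return kernels


# Demonstration in a throwaway directory with one stand-in kernel.
with tempfile.TemporaryDirectory() as tmp:
    kernel_dir = Path(tmp) / "yourKernelName"
    kernel_dir.mkdir()
    (kernel_dir / "kernel.json").write_text(
        json.dumps({"display_name": "My awesome kernel", "language": "python"})
    )
    (Path(tmp) / "not_a_kernel").mkdir()  # no kernel.json, so it is skipped
    found = list_user_kernels(tmp)
```

To check your own kernels, call the function with `Path.home() / ".local" / "share" / "jupyter" / "kernels"`.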

The custom kernel must be visible from $HOME/.local/share/jupyter. However, if you want to create a custom kernel for a group, this can be done by having one group member create the kernel and then all users can create a symlink in their $HOME. For example:

# only one user will have to run these steps:
mkdir -p /usr/workspace/yourGroupID/orbit
cd /usr/workspace/yourGroupID/orbit
virtualenv --system-site-packages my_group_env
source my_group_env/bin/activate # activate your environment (bash)
source my_group_env/bin/activate.csh # Or activate your environment (csh)
python3 -m ipykernel install --prefix=/usr/workspace/yourGroupID/orbit/.local/ --name 'yourGroupKernelName' --display-name 'Our group kernel'
chmod -R g+rwX,o+rX /usr/workspace/yourGroupID/orbit
chgrp -R yourGroupID /usr/workspace/yourGroupID/orbit

# All users of the kernel will have to run this step, including the kernel's creator:
mkdir -p $HOME/.local/share/jupyter/kernels
ln -s /usr/workspace/yourGroupID/orbit/.local/share/jupyter/kernels/yourGroupKernelName $HOME/.local/share/jupyter/kernels

# In all of the above commands, please replace "yourGroupID" and "yourGroupKernelName" and set the --display-name value to your desired description.
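
The per-user linking step can also be scripted so that it is safe to run more than once. The following is a minimal sketch of that idea, not an LC-provided tool: the function name `link_group_kernel` is hypothetical, and the demonstration uses stand-in paths inside a temporary directory instead of a real group workspace.

```python
import os
import tempfile
from pathlib import Path


def link_group_kernel(group_kernel_dir, user_kernels_dir):
    """Symlink a shared kernel directory into a user's kernels directory."""
    source = Path(group_kernel_dir)
    kernels = Path(user_kernels_dir)
    kernels.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    link = kernels / source.name
    if link.is_symlink() and Path(os.readlink(link)) == source:
        return link  # already linked, nothing to do
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"{link} exists and is not the expected link")
    link.symlink_to(source, target_is_directory=True)
    return link


# Demonstration in a throwaway directory; all paths here are stand-ins.
with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp) / "workspace" / "kernels" / "yourGroupKernelName"
    shared.mkdir(parents=True)
    (shared / "kernel.json").write_text("{}")
    user_kernels = Path(tmp) / "home" / ".local" / "share" / "jupyter" / "kernels"

    first = link_group_kernel(shared, user_kernels)
    second = link_group_kernel(shared, user_kernels)  # repeating is harmless
    link_is_symlink = first.is_symlink()
    spec_visible = (first / "kernel.json").is_file()
    same_link = first == second
```

The function refuses to overwrite anything that already exists under the same name, which avoids silently replacing a personal kernel that happens to share the group kernel's name.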

Custom kernel caveats

By default, custom kernel invocation entails launching the ipykernel_launcher with the python specified in your virtualenv. This does not activate your virtualenv, so the virtualenv's bin directory is not added to your PATH. As such, referring to "binaries" in your virtualenv or to site-packages from a subprocess launched from a notebook will not work. To remedy this, after installing your custom kernel following the instructions above, open the relevant kernelspec file under `.local/share/jupyter/kernels` and add the "env" option to the kernel spec as shown in the sample JSON file below. You may need to do the same with the LD_LIBRARY_PATH variable. In the terminal where you have your virtualenv activated, you may run `echo $PATH` and `echo $LD_LIBRARY_PATH` to determine the values to set in your kernel.json.

Custom Kernel ENV

Ensure that a JSON file with the following format exists in `$HOME/.local/share/jupyter/kernels/yourKernelName/kernel.json` for your new custom kernel:

{
 "argv": [
  "/usr/WS2/myUID/my_personal_env/bin/python3",
  "-Xfrozen_modules=off",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "env": {
  "PATH": "/g/replace/this/value/with/your/desired/path/bin",
  "LD_LIBRARY_PATH": "/g/replace/this/value/with/your/desired/path/lib"
 },
 "display_name": "My awesome kernel",
 "language": "python",
 "metadata": {
  "debugger": true
 }
}

Once you finish installing your custom kernel, it will be available as an option on JupyterHub when you create a new notebook.
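
If you would rather not edit kernel.json by hand, the "env" section can be added with a short script. The sketch below assumes the file is valid JSON; the function name `add_env_to_kernelspec` and the example PATH and LD_LIBRARY_PATH values are placeholders, and the demonstration writes to a temporary file rather than a real kernel spec.

```python
import json
import tempfile
from pathlib import Path


def add_env_to_kernelspec(kernel_json_path, env_updates):
    """Merge env_updates into the "env" section of a kernel.json file."""
    path = Path(kernel_json_path)
    # json.loads raises an error on syntax slips such as a trailing comma.
    spec = json.loads(path.read_text())
    env = spec.setdefault("env", {})
    env.update(env_updates)
    path.write_text(json.dumps(spec, indent=1) + "\n")
    return spec


# Demonstration against a stand-in kernel.json in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    kernel_json = Path(tmp) / "kernel.json"
    kernel_json.write_text(json.dumps({
        "argv": ["python3", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": "My awesome kernel",
        "language": "python",
    }))
    updated = add_env_to_kernelspec(kernel_json, {
        "PATH": "/example/env/bin:/usr/bin",        # placeholder value
        "LD_LIBRARY_PATH": "/example/env/lib",      # placeholder value
    })
    reloaded = json.loads(kernel_json.read_text())
```

For a real kernel, pass the path to your own kernel.json and the values reported by `echo $PATH` and `echo $LD_LIBRARY_PATH` in the terminal where your virtualenv is activated.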

Nonstandard kernels requiring additional/different steps:

Matlab kernel

Julia kernel

R kernel

For powerai installations and TensorFlow, please refer to IBM Power AI in LC - JupyterHub Custom Kernel

Interactive plotting and widgets

For users wanting to generate interactive plots or widgets within a Jupyter notebook, we’ve found that the following packages seem to work on LC JupyterHub:

We unfortunately have been unable to get the following packages to create interactive visualizations/widgets on LC JupyterHub:

pygal

If you work with or have had difficulty with packages for interactive plotting that are not listed here, please let us know so that we can document this for other users!

* Plotly users: Please note that a couple of users have reported difficulties saving notebooks that included embedded Plotly graphics. The issue appears to be sporadic. So far, we've found that users have been able to resolve this issue and save their notebooks after deleting their Plotly graphics/output cells (but not the code/input cells that generated those graphics).

* Matplotlib-widgets users: Please use `%matplotlib widget` instead of `%matplotlib inline` for rendering graphs.

Troubleshooting

Other common issues

Problem: "500: Internal Server Error"

Explanation: A common cause of this is the existence of a python directory in `~/.local/lib`, which confuses the JupyterHub server about the correct source of python packages. If this is the cause of your "500: Internal Server Error", you will likely see 'jinja2': ImportError: cannot import name 'contextfilter' from 'jinja2' error messages written to your `~/.jupyter/jupyterhub/resources/notebook.log`.

Solution: Rename the problematic python directory. For example, if it is called `python3.7`, you might `mv ~/.local/lib/python3.7/ ~/.local/lib/python3.7.bak`. To re-test JupyterHub, stop and restart any running server.
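
Before renaming anything, it can help to see which directories are candidates. The following is a minimal sketch that lists directories named like `python3.7` directly under a given lib directory. It assumes that the `pythonX.Y` naming pattern is what identifies the problematic directory, which goes slightly beyond what is stated above; the function name `find_python_lib_dirs` is hypothetical and the demonstration uses a temporary directory.

```python
import re
import tempfile
from pathlib import Path


def find_python_lib_dirs(lib_dir):
    """Return names of pythonX.Y directories directly under lib_dir."""
    root = Path(lib_dir)
    if not root.is_dir():
        return []
    pattern = re.compile(r"python\d+\.\d+")
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and pattern.fullmatch(entry.name)
    )


# Demonstration in a throwaway directory standing in for ~/.local/lib.
with tempfile.TemporaryDirectory() as tmp:
    lib = Path(tmp) / ".local" / "lib"
    (lib / "python3.7").mkdir(parents=True)
    (lib / "python3.9.bak").mkdir()  # already renamed, so it is ignored
    (lib / "node_modules").mkdir()   # unrelated directory
    suspects = find_python_lib_dirs(lib)
```

To check your own account, call the function with `Path.home() / ".local" / "lib"`. The script only reports names; the rename itself is still done with `mv` as described in the solution.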