By default, TensorBoard is not configured to run securely in LC's multi-tenant environment. To improve security and reliability, LC provides a curated, containerized version of TensorBoard designed for use on login nodes. Follow the steps below to launch TensorBoard and access it from your local system.

Quickstart Guide

Step 1: Enable Podman (on LC)

If this is your first time using TensorBoard or Podman, run the following command to enable Podman:

/collab/usr/gapps/lcweg/containers/scripts/enable-podman.sh

Step 2: Launch TensorBoard (on LC)

Run the TensorBoard script with the required -l option to specify the log directory:

/collab/usr/gapps/lcweg/containers/scripts/launch-tensorboard.sh -l </path/to/logs>

Step 3: Access TensorBoard (locally)

Once the script runs successfully, it prints an SSH command and a web address. Run the SSH command on your local machine to forward a local port to the TensorBoard socket on LC:

ssh <login-node-pub>.llnl.gov -L6006:/var/tmp/<private_socket_directory>/tensorboard.sock

Then, open TensorBoard in your browser using the web address:

http://localhost:6006

Replace 6006 with the port used in the SSH command, if different.

Step 4: Stop the Container (on LC)

The container stops automatically after 12 hours. To stop it sooner, run the podman stop command that the launch script printed in Step 2:

podman stop <container name or ID>

If you no longer have the container name or ID, you can recover it with podman ps. For example:

[class01@oslic7:~]$ podman ps
CONTAINER ID  IMAGE                                                 COMMAND               CREATED         STATUS         PORTS       NAMES
fdae6af69d75  wci-repo.llnl.gov:4567/lc-private-tensorboard:latest  supervisord -c /e...  59 minutes ago  Up 59 minutes              tensorboard_mgwl0se4
[class01@oslic7:~]$ podman stop tensorboard_mgwl0se4
tensorboard_mgwl0se4
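
If other containers are running on the same node, the lookup can be narrowed to TensorBoard containers. The following is a minimal sketch, not part of the LC scripts: it assumes container names contain tensorboard_, as in the example above, and the function names list_tensorboard_containers and stop_tensorboard_containers are hypothetical.

```shell
# Hypothetical helper: print the names of running containers whose names
# contain "tensorboard_" (the prefix seen in the example above; this naming
# is an assumption, not something the launch script guarantees).
list_tensorboard_containers() {
    podman ps --filter "name=tensorboard_" --format "{{.Names}}"
}

# Hypothetical helper: stop every container returned by the lookup above.
stop_tensorboard_containers() {
    list_tensorboard_containers | while IFS= read -r name; do
        if [ -n "$name" ]; then
            # Redirect stdin so podman cannot consume the list being read.
            podman stop "$name" </dev/null
        fi
    done
}
```

Run list_tensorboard_containers first to confirm which containers match before stopping anything.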

Example

Running on oslic7 and using the sample data in /usr/global/docs/training/pacman-sample, your output should look similar to this:

[class01@oslic7:~]$ /collab/usr/gapps/lcweg/containers/scripts/enable-podman.sh
[class01@oslic7:~]$ /collab/usr/gapps/lcweg/containers/scripts/launch-tensorboard.sh -l /usr/global/docs/training/pacman-sample
Interface: aci.73
IP Address: 134.9.73.11
DNS Name: oslic7.llnl.gov
-------------------------
Single DNS name found: oslic7.llnl.gov
Temporary directory created: /var/tmp/class01/tmp.snas17XRDi
Generated container name: tensorboard_mgwl0se4
Trying to pull wci-repo.llnl.gov:4567/lc-private-tensorboard:latest...
Getting image source signatures
Copying blob 5b2bc9e67c87 done   |
Copying blob f370f1e672e8 done   |
Copying blob f599217d4457 done   |
Copying blob c7cd81da46dd done   |
Copying blob b89ca137b47b done   |
Copying blob e0a9ed50581f done   |
Copying blob 225ba378e48c done   |
Copying blob 0a9ff2b5240b done   |
Copying blob 01612657d478 done   |
Copying blob e8d2a6f4f578 done   |
Copying config 1ad0649082 done   |
Writing manifest to image destination
fdae6af69d75f2ffed5adc9ad9e0992b5e969604b732429f7d42831026ebeeb9
Container started successfully with name: tensorboard_mgwl0se4

============================================================
To access TensorBoard, run the following command on your local system:

ssh oslic7.llnl.gov -L6006:/var/tmp/class01/tmp.snas17XRDi/tensorboard.sock

Then open your web browser and navigate to:
http://localhost:6006

IMPORTANT:
- Port 6006 is your local system's port. Using a different port will not impact the running container.
- The container will automatically stop after 12 hours.
- If you are done before the timeout, you can stop the container manually by running:
  podman stop tensorboard_mgwl0se4
============================================================

Using the output above, you would run this SSH command on your laptop/desktop:

ssh oslic7.llnl.gov -L6006:/var/tmp/class01/tmp.snas17XRDi/tensorboard.sock

Then, paste the following into a browser on your laptop/desktop:

http://localhost:6006

and, when done using TensorBoard, run this on LC:

podman stop tensorboard_mgwl0se4
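
If you saved the launch output to a file, the two commands you need can be pulled out of it rather than copied by hand. This is a sketch under stated assumptions: it relies on the output format shown in the example above, and the function name extract_commands and the log file name in the usage note are hypothetical.

```shell
# Hypothetical helper: read saved launch-tensorboard.sh output on stdin and
# print the SSH command and the podman stop command it contains.
# Assumes the output format shown in the example above.
extract_commands() {
    # Strip leading whitespace, then keep lines that begin with either command.
    sed 's/^[[:space:]]*//' | grep -E '^(ssh |podman stop )'
}
```

For example, if the output was saved to a file named tensorboard-launch.log, run extract_commands < tensorboard-launch.log on LC.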

Advanced Options

To see the script's usage options, run the following command:

/collab/usr/gapps/lcweg/containers/scripts/launch-tensorboard.sh -h

Customizing DNS

If you need to manually specify a DNS name, use the -H option:

/collab/usr/gapps/lcweg/containers/scripts/launch-tensorboard.sh -H <my.dns.name> -l </path/to/logs>

Verbose Mode

Enable verbose mode for detailed output using the -v option:

/collab/usr/gapps/lcweg/containers/scripts/launch-tensorboard.sh -v -l </path/to/logs>

Custom SSH Port

If local port 6006 is already in use, substitute any available port above 1024 in both the SSH command and the web address:

ssh <login-node-pub>.llnl.gov -L<custom_port>:/var/tmp/<private_socket_directory>/tensorboard.sock
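
The substitution can be scripted on your local machine so that the SSH command and the web address always use the same port. The sketch below is illustrative only: the function name build_tunnel_cmd is hypothetical, and it checks only that the port is a number above 1024, not whether the port is actually free.

```shell
# Hypothetical helper: print the SSH tunnel command and matching web address
# for a chosen local port.
# Arguments: <login node host> <local port> <socket path on LC>
build_tunnel_cmd() {
    host=$1
    port=$2
    sock=$3
    # Reject empty or non-numeric ports.
    case $port in
        ''|*[!0-9]*)
            echo "port must be a number" >&2
            return 1
            ;;
    esac
    # Accept only ports above 1024 and within the valid TCP range.
    if [ "$port" -le 1024 ] || [ "$port" -gt 65535 ]; then
        echo "port must be between 1025 and 65535" >&2
        return 1
    fi
    echo "ssh $host -L$port:$sock"
    echo "http://localhost:$port"
}
```

For example, build_tunnel_cmd oslic7.llnl.gov 8080 /var/tmp/class01/tmp.snas17XRDi/tensorboard.sock prints the SSH command with -L8080 followed by http://localhost:8080.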

Summary of SSH Command Format

The general form of the SSH command is:

ssh <login-node-pub>.llnl.gov -L<local_port>:/var/tmp/<private_socket_directory>/tensorboard.sock

Example:

ssh ruby963.llnl.gov -L6006:/var/tmp/westlund2/tmp.PmTkgDOwoy/tensorboard.sock

Depending on how things are configured on your local machine, you may need to explicitly add your LC username to the SSH command provided to you. For example, for LC user class01, the provided

ssh oslic7.llnl.gov -L6006:/var/tmp/class01/tmp.snas17XRDi/tensorboard.sock

may need to become

ssh class01@oslic7.llnl.gov -L6006:/var/tmp/class01/tmp.snas17XRDi/tensorboard.sock

Conclusion

By following these steps, you can successfully launch and privately access TensorBoard on LC login nodes. Please reach out to the LC Hotline with questions.