By default, TensorBoard is not configured to run securely in LC's multi-tenant environment. To improve security and reliability, LC provides a curated, containerized version of TensorBoard designed specifically for use on login nodes. Follow the steps below to launch TensorBoard and access it from your local system.
Quickstart Guide
Step 1: Enable Podman (on LC)
If this is your first time using TensorBoard or Podman, run the following command to enable Podman:
/collab/usr/gapps/lcweg/containers/scripts/enable-podman.sh
Step 2: Launch TensorBoard (on LC)
Run the TensorBoard script with the required -l option to specify the log directory:
/collab/usr/gapps/lcweg/containers/scripts/launch-tensorboard.sh -l </path/to/logs>
Step 3: Access TensorBoard (locally)
When the script finishes successfully, it prints an SSH command and a web address. Run the SSH command on your local machine to open a secure tunnel to TensorBoard, and keep that SSH session open while you use TensorBoard:
ssh <login-node-pub>.llnl.gov -L6006:/var/tmp/<private_socket_directory>/tensorboard.sock
Then, open TensorBoard in your browser using the web address:
http://localhost:6006
If you used a different local port in the SSH command, replace 6006 in the address with that port.
Step 4: Stop the Container (on LC)
The container stops automatically after 12 hours. To stop it sooner, run the podman stop command printed by the launch script in Step 2:
podman stop <container ID>
If you no longer have the container name or ID when you are finished, you can look it up with podman ps. For example:
[class01@oslic7:~]$ podman ps
CONTAINER ID  IMAGE                                                 COMMAND               CREATED         STATUS         PORTS  NAMES
fdae6af69d75  wci-repo.llnl.gov:4567/lc-private-tensorboard:latest  supervisord -c /e...  59 minutes ago  Up 59 minutes         tensorboard_mgwl0se4
[class01@oslic7:~]$ podman stop tensorboard_mgwl0se4
tensorboard_mgwl0se4
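If you prefer to script the cleanup, the lookup above can be automated. The sketch below is a hypothetical helper (tb_containers is not part of LC's tooling); it assumes the default podman ps table layout, where NAMES is the last column, and that the launch script always names containers with the tensorboard_ prefix, as in the example output.

```shell
# Hypothetical helper (not an LC-provided tool): read `podman ps` output on
# stdin and print every container name that starts with "tensorboard_".
# Assumes the default table layout, where NAMES is the last column.
tb_containers() {
  # NR > 1 skips the header row; $NF is the last whitespace-separated field.
  awk 'NR > 1 && $NF ~ /^tensorboard_/ { print $NF }'
}

# Typical use on the LC login node (not run here):
#   podman ps | tb_containers
#   podman stop "$(podman ps | tb_containers | head -n 1)"
```

Check the printed name before stopping anything if you run more than one TensorBoard container at a time.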
Example
When run on oslic7 with the sample data in /usr/global/docs/training/pacman-sample, the output should look similar to this:
[class01@oslic7:~]$ /collab/usr/gapps/lcweg/containers/scripts/enable-podman.sh
[class01@oslic7:~]$ /collab/usr/gapps/lcweg/containers/scripts/launch-tensorboard.sh -l /usr/global/docs/training/pacman-sample
Interface: aci.73
IP Address: 134.9.73.11
DNS Name: oslic7.llnl.gov
-------------------------
Single DNS name found: oslic7.llnl.gov
Temporary directory created: /var/tmp/class01/tmp.snas17XRDi
Generated container name: tensorboard_mgwl0se4
Trying to pull wci-repo.llnl.gov:4567/lc-private-tensorboard:latest...
Getting image source signatures
Copying blob 5b2bc9e67c87 done |
Copying blob f370f1e672e8 done |
Copying blob f599217d4457 done |
Copying blob c7cd81da46dd done |
Copying blob b89ca137b47b done |
Copying blob e0a9ed50581f done |
Copying blob 225ba378e48c done |
Copying blob 0a9ff2b5240b done |
Copying blob 01612657d478 done |
Copying blob e8d2a6f4f578 done |
Copying config 1ad0649082 done |
Writing manifest to image destination
fdae6af69d75f2ffed5adc9ad9e0992b5e969604b732429f7d42831026ebeeb9
Container started successfully with name: tensorboard_mgwl0se4
============================================================
To access TensorBoard, run the following command on your local system:
ssh oslic7.llnl.gov -L6006:/var/tmp/class01/tmp.snas17XRDi/tensorboard.sock
Then open your web browser and navigate to:
http://localhost:6006
IMPORTANT:
- Port 6006 is your local system's port. Using a different port will not impact the running container.
- The container will automatically stop after 12 hours.
- If you are done before the timeout, you can stop the container manually by running:
podman stop tensorboard_mgwl0se4
============================================================
Using the output above, you would run this SSH command on your laptop/desktop:
ssh oslic7.llnl.gov -L6006:/var/tmp/class01/tmp.snas17XRDi/tensorboard.sock
Then, paste the following into a browser on your laptop/desktop:
http://localhost:6006
When you are done using TensorBoard, run this on LC:
podman stop tensorboard_mgwl0se4
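If you save the launch script's output (for example by piping it through tee into a log file; the name tensorboard-launch.log below is only an illustration), you can recover both commands later. The helpers tb_ssh_line and tb_stop_line are hypothetical, not part of LC's scripts, and they assume the output keeps the format shown in the example above.

```shell
# Hypothetical helpers (not LC-provided): pull the SSH tunnel command and the
# podman stop command out of saved launch-tensorboard.sh output on stdin.
# Assumes the output format shown in the example above.
tb_ssh_line() {
  grep -Eo 'ssh [^ ]+ -L[0-9]+:[^ ]+/tensorboard\.sock' | head -n 1
}

tb_stop_line() {
  grep -Eo 'podman stop tensorboard_[A-Za-z0-9]+' | head -n 1
}

# Typical use on LC (not run here):
#   /collab/usr/gapps/lcweg/containers/scripts/launch-tensorboard.sh \
#       -l </path/to/logs> | tee tensorboard-launch.log
#   tb_ssh_line  < tensorboard-launch.log   # run the result on your local machine
#   tb_stop_line < tensorboard-launch.log   # run the result on LC when finished
```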
Advanced Options
To see the script's usage options, run the following command:
/collab/usr/gapps/lcweg/containers/scripts/launch-tensorboard.sh -h
Customizing DNS
If you need to manually specify a DNS name, use the -H option:
/collab/usr/gapps/lcweg/containers/scripts/launch-tensorboard.sh -H <my.dns.name> -l </path/to/logs>
Verbose Mode
Enable verbose mode for detailed output using the -v option:
/collab/usr/gapps/lcweg/containers/scripts/launch-tensorboard.sh -v -l </path/to/logs>
Custom SSH Port
If local port 6006 is already in use on your machine, you can use any other available port above 1024 instead. Change the port in your SSH command, and in the web address, to match:
ssh <login-node-pub>.llnl.gov -L<custom_port>:/var/tmp/<private_socket_directory>/tensorboard.sock
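As a quick sanity check before editing the command, the sketch below tests whether a candidate value is a whole number in the unprivileged range 1025 to 65535. The function name tb_valid_port and the example value 6007 are hypothetical; the helper only checks the range and does not detect whether the port is actually free on your machine.

```shell
# Hypothetical helper: succeed only if $1 is a whole number from 1025 to 65535,
# i.e. an unprivileged TCP port usable as <custom_port>.
# It does NOT check whether something is already listening on that port.
tb_valid_port() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
  esac
  [ "$1" -gt 1024 ] && [ "$1" -le 65535 ]
}

port=6007   # example value only
if tb_valid_port "$port"; then
  echo "OK: use port $port in both the SSH command and the web address"
else
  echo "Pick a port between 1025 and 65535" >&2
fi
```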
Summary of SSH Command Format
The general form of the SSH command is:
ssh <login-node-pub>.llnl.gov -L<local_port>:/var/tmp/<private_socket_directory>/tensorboard.sock
Example:
ssh ruby963.llnl.gov -L6006:/var/tmp/westlund2/tmp.PmTkgDOwoy/tensorboard.sock
Depending on how your local machine is configured (for example, if your local username differs from your LC username), you may need to add your LC username to the SSH command explicitly. For example, for LC user class01, the command
ssh oslic7.llnl.gov -L6006:/var/tmp/class01/tmp.snas17XRDi/tensorboard.sock
may need to become
ssh class01@oslic7.llnl.gov -L6006:/var/tmp/class01/tmp.snas17XRDi/tensorboard.sock
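The pieces of the SSH command can also be assembled programmatically. The sketch below uses a hypothetical function, tb_ssh_cmd, that takes the login node DNS name, the local port, the socket path printed by the launch script, and an optional LC username, and prints a command in the general form shown above.

```shell
# Hypothetical helper (not an LC-provided tool): print the tunnel command.
#   $1  login node DNS name, e.g. oslic7.llnl.gov
#   $2  local port, e.g. 6006
#   $3  socket path printed by launch-tensorboard.sh
#   $4  optional LC username; when given, the host becomes user@host
tb_ssh_cmd() {
  if [ -n "${4:-}" ]; then
    printf 'ssh %s@%s -L%s:%s\n' "$4" "$1" "$2" "$3"
  else
    printf 'ssh %s -L%s:%s\n' "$1" "$2" "$3"
  fi
}

tb_ssh_cmd oslic7.llnl.gov 6006 /var/tmp/class01/tmp.snas17XRDi/tensorboard.sock class01
# prints: ssh class01@oslic7.llnl.gov -L6006:/var/tmp/class01/tmp.snas17XRDi/tensorboard.sock
```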
Conclusion
By following these steps, you can launch TensorBoard on an LC login node and access it privately from your local system. If you have questions, contact the LC Hotline.