VNC: NICE DCV

Overview

NICE DCV is a 3D/GLX-capable Virtual Network Computing (VNC) server that provides a securely authenticated and encrypted way for users to create and view a virtual desktop with 3D/GLX applications that persists even if no client is actually viewing it. As an example, let's say you are at an airport getting ready to fly to Texas for a conference, and you wish to create an animation with VisIt, an OpenGL application, that is going to take a couple of hours to create. You log into the Pascal cluster, launch a batch job, run a NICE DCV session on the batch node, and connect to that virtual desktop session. You launch VisIt inside the DCV session, click the button to create your animation, and then disconnect from the session and catch your flight. You can then reconnect to your DCV session from the comfort of your hotel in Texas and view VisIt's window, which has been happily updating the whole time. Science is saved again!

As in the example above, NICE DCV allows users to start any GUI-based program on the cluster that opens a main window, and then disconnect from the cluster without closing the main window and exiting the program. The user can then reconnect later to the virtual desktop to view any changes in their program. NICE DCV at LLNL is secure, highly performant, and supports OpenGL programs such as VisIt and ParaView. You may connect to your DCV virtual desktop with a DCV "thick" client on a Windows or Linux platform, or via a web browser from Windows, macOS, or Linux. DCV shares functionality with RealVNC; however, DCV only runs on the batch nodes (not the login node) and exits when your batch job is over. For these reasons, if you do not need the 3D/GLX capability, it is recommended that you use RealVNC instead, which runs independently from Slurm and provides longer-term virtual sessions, albeit with 2D graphics.

Environment

Machines

NICE DCV is installed only on the batch nodes of Surface and Pascal on the CZ, Rzhasgpu on the RZ, and the GPU nodes of Max on the SCF.

Location

NICE DCV is accessed by running /usr/bin/dcvsession.

Settings

No special settings are necessary for NICE DCV.

Usage

  1. Get the proper account(s) and bank(s). You must have an account on either the pascal or rzhasgpu clusters to use DCV. You can request an account through the ID Management system. Once you have an account, you must also request a bank through the LC Hotline; call extension 2-4531 to set this up.
  2. Log into pascal.llnl.gov or rzhasgpu.llnl.gov and, from the login node, reserve a batch node. Contact the hotline if this is unclear or you don't have a bank. Having an account does not mean you have a bank!
    1. UPDATE: As of Feb. 2019, the --load-nvcache option to salloc or sbatch should not be used.
    2. If you will be reserving the node from a persistent terminal, such as on your workstation in your office, you may use the salloc command.
      1. Reserve a node with the salloc command; run with no options, salloc reserves one node in the default Slurm partition for the default length of time. You may optionally request a different length of time (-t <HH:MM:SS>) or supply other Slurm options.
        1. Note: On Max, not all compute nodes have GPUs; therefore, you must specify the -p pgpu option to salloc or sbatch to obtain a compute node containing a GPU.
      2. On the batch node, run /usr/bin/dcvsession -o <YOUR_OS_TYPE>
        • <YOUR_OS_TYPE> is "osx", "lin", or "win".
        • Use the -h flag with dcvsession to see available options.
        • Note: Read the output from dcvsession carefully, as it will provide instructions to connect to your batch node. Instructions may differ between RZ and CZ hosts.
        • Note for Windows users: Windows does not include SSH by default. You can either install Cygwin and use SSH in that environment, or you can install PuTTY with plink.exe from the PuTTY website. For more information about plink, see the PuTTY documentation. dcvsession will assume you are using plink, but if you are using SSH with Cygwin, simply use "lin" for your OS type instead of "win".
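      3. For example, a complete interactive sequence might look like the following (a minimal sketch; the batch node name and walltime are illustrative, so adjust the options to your needs):
        [<user>@pascal83:~]$ salloc -N 1 -t 4:00:00
        [<user>@pascal10:~]$ /usr/bin/dcvsession -o lin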
    3. If you will be submitting your job from a terminal that will be closed, such as on your laptop or a VPN session that will end, you should use the sbatch command instead of salloc. With this method, you will need to create a small job script, submit it, and then look at the job output for instructions on connecting to your DCV session.
      1. Below is a sample job script that starts a DCV session and then sleeps for a long period of time, presumably longer than you will want the node reserved.
        1. [<user>@pascal83:~]$ cat dcv.sh

          #!/bin/sh
          # Start a DCV session for a Windows client (use "lin" or "osx" as needed)
          /usr/bin/dcvsession -o win
          # Keep the job, and thus the DCV session, alive for up to 10 days
          sleep 10d

      2. You should then submit the job using sbatch: sbatch dcv.sh, adding additional Slurm options if desired.

      3. Finally, look at the contents of slurm-<job#>.out to view instructions on connecting to your DCV session.
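
      4. For example, a minimal sketch of the batch workflow (the walltime is illustrative, and squeue is the standard Slurm command for checking job status):

        sbatch -t 8:00:00 dcv.sh
        squeue -u $USER              # confirm the job is running
        cat slurm-<job#>.out         # connection instructions appear here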

  3. To access your virtual desktop you may use a web browser or, if using Linux or Windows, a "thick" client. The DCV Endstation client may be downloaded and installed from the vendor's website. Make sure you download the Client and not the Server software.
  4. Follow the instructions printed in the output of the dcvsession command to connect to your session.
    1. If using a web browser, make sure you use

      https://localhost:<port number>

      in the URL, where the port number is the first (left-most) number in your SSH tunnel.
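
      For example, if dcvsession instructs you to create a tunnel such as the following (the batch node name and port are hypothetical; use the exact values printed for your session):

        ssh -fN -L 8443:pascal10:8443 <user>@pascal.llnl.gov

      you would then browse to https://localhost:8443.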

    2. If using the thick client, you can also set up a shortcut to connect to a cluster once the proxy server is running. To do so, create a file named <cluster>.dcv on your desktop or home directory, where <cluster> is the name of the cluster you are connecting to (pascal or rzhasgpu). This only ever needs to be configured once per cluster on your workstation.
      1. For the CZ system, add the following lines:
        [version]
        format=1.0

        [connect]
        user=<YOUR LC USER NAME>
        proxyhost=127.0.0.1
        proxyport=1080
        proxytype=Socks5
      2. For the RZ system, add the following lines (note the different port on the proxyport line):
        [version]
        format=1.0

        [connect]
        user=<YOUR LC USER NAME>
        proxyhost=127.0.0.1
        proxyport=1081
        proxytype=Socks5
      3. Start an SSH SOCKS proxy server on your workstation to tunnel traffic to and from the allocated batch node. On the RZ, you'll need to set up an intermediary tunnel first.
        1. When using Pascal, type:
          1. ssh -fN -D 1080 <user>@pascal
        2. When using Rzhasgpu, first set up a tunnel through the RZ gateway, and then start the proxy server directed at the forwarded port on your workstation:
          1. ssh -fN -L 11081:rzhasgpu:22 rzgw
          2. ssh -fN -D 1081 -p 11081 localhost
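        Before launching the client, you can optionally confirm that the proxy is listening (assuming nc/netcat is installed on your workstation; use port 1081 on the RZ):
          nc -vz localhost 1080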
      4. Launch your DCV Endstation client and connect to the hostname of the allocated batch node.
  5. To exit your DCV session, log out of the virtual desktop, or simply exit your interactive Slurm session from step 2.
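    If you reserved the node with sbatch rather than salloc, the job keeps running after you log out of the desktop; you can release the node early by cancelling the job with the standard Slurm command (where <job#> is the job number reported by sbatch):

      scancel <job#>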

Help

Troubleshooting Tips

  • If you get a message similar to bind: Address already in use when trying to set up the SSH proxy connection:
    • Determine whether you have already established the proxy (you generally only set it up once, until your machine is rebooted). Try connecting to the port with the nc or netcat command; if it reports a connection, the proxy is already running and you can proceed past the dcvsession prompt to set up the proxy:
      • ~$ nc -v localhost 1080
        Ncat: Version 7.60 ( https://nmap.org/ncat )
        Ncat: Connected to ::1:1080.
    • Try using a different port. For instance, use port 1082 instead of 1080. You'll then need to change your .dcv file to use the same port number.
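      • A minimal sketch of that workaround (the port choice is arbitrary; any free local port works as long as your .dcv file matches):
        ssh -fN -D 1082 <user>@pascal    # start the proxy on the alternate port
        # then set proxyport=1082 in your <cluster>.dcv file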

Help is also available from the LC Hotline: lc-hotline@llnl.gov, (925) 422-4531.

UCRL-MI-128467-REV-1