The directions and requirements below cover large data transfers, especially bringing large data sets into LC. For best practices or additional support, contact the LC Hotline.

NOTE: Any outgoing data transfers, regardless of size, need to follow the LLNL review and release processes.

Use SLIC Nodes Instead of Compute Nodes

The SLIC nodes exist to facilitate external (wide-area network, WAN) data transfers.

Unlike most LC compute clusters, which are optimized for internal data transfers, the SLIC nodes have a network stack and physical network connections optimized for external data transfer. External data transfers involve more latency than internal ones, so the SLIC nodes provide a physically shorter, optimized data path to the internet.
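One common way to see why latency affects host tuning is the bandwidth-delay product: the amount of data that must be in flight to keep a link full. The sketch below is illustrative only. The link speed and round-trip times are assumed figures, not measurements of LC networks, and the function name is hypothetical.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep a link of this speed busy for one round trip."""
    return bandwidth_bits_per_s * rtt_seconds / 8


# Assumed, illustrative numbers: a 10 Gbit/s link with a short internal
# round trip versus a much longer wide-area round trip.
LINK_BITS_PER_S = 10e9
for label, rtt in (("internal", 0.0001), ("external", 0.05)):
    bdp = bandwidth_delay_product_bytes(LINK_BITS_PER_S, rtt)
    print(f"{label}: about {bdp / 1e6:.3f} MB in flight to fill the link")
```

The longer the round trip, the more buffering a host needs to sustain the same throughput, which is one reason a node tuned for internal traffic can perform poorly on wide-area transfers.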

In addition, SLIC nodes do not contain expensive GPUs, APUs, or other compute-optimized hardware. Such hardware does nothing for data transfer and sits idle (i.e., wasted) when data transfer tasks run on compute clusters.

OSLIC and RZSLIC have exceptions in place for special access to Amazon Web Services (AWS) S3. If you are moving data to or from AWS S3, it will work much better from the SLIC nodes than from any other LC HPC system.

Before Starting

When must I talk to LC about my large data transfer?

  1. If your file transfer uses HTTP/HTTPS as the transport protocol, contact us for help.

    Why? All external web traffic goes through the LLNL web proxy by default. It is very easy for an HPC data transfer to (or from) LC to swamp the institutional web proxy, causing web access for the rest of LLNL to slow down or even grind to a near halt.

    This includes curl, git over HTTP, and anything else that takes a URL, which is likely using HTTP or HTTPS and is therefore subject to the proxy.

    Depending on the use case, our security or networking staff can either put in a temporary exception to skip the web proxy, or advise you on how many streams to run and verify that you are NOT overwhelming the institutional web proxy.

  1. If the data you are bringing in is of questionable origin, contact us for help.

    Everything done on LC compute platforms, as on all government resources, needs to consider both the law and the reputation of LLNL. Public data sets may contain data that is “Not Suitable for Work,” data containing copyrighted content, or data that is otherwise not appropriate for government systems.

    LC ISSO/OISSO have worked with programs, including CSP and OIS (Office of Investigative Services), to document and allow uncurated data sets. Contact us for help before adding data sets that are even likely to contain data that is generally not allowed on government systems.

  1. If your data transfer is to or from questionable or even unknown places, contact us for help.

    If your data set is being harvested from many locations around the internet instead of just one, the LLNL web proxy will likely block some of those locations. You will still get results, but not the correct ones: in place of the data, you will get a file saying that your access to the site was blocked, which you may or may not notice in a large automated data set. Such transfers typically set off alarms if the sites being accessed are known to host malware, porn, gambling, or any other data that is not allowed at LLNL.

    Avoid becoming the target of an OIS investigation. These investigations take LC staff time that could be better spent in other ways, so talk to the LC ISSOs first. We have sample processes that others have used, with permission, for legitimate LLNL work, with added controls and modified processes so that everything is clear up front.
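The silent failure described in the last item, where a blocked request leaves a block-notice file in place of the real data, can be caught with a post-transfer check. The following is a minimal sketch, not an LC-provided tool. It assumes the data provider publishes SHA-256 checksums for its files; the function names and file names are hypothetical. The HTML check is only a heuristic, since the format of the proxy's block notice is not documented here and is assumed to be an HTML page.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_suspect_files(download_dir: Path, expected: dict[str, str]) -> list[str]:
    """Return names of files that are missing or whose SHA-256 differs from the manifest."""
    suspect = []
    for name, want in sorted(expected.items()):
        path = download_dir / name
        if not path.is_file() or sha256_of(path) != want.lower():
            suspect.append(name)
    return suspect


def looks_like_html(path: Path, probe_bytes: int = 512) -> bool:
    """Heuristic fallback: does the file start like an HTML page?

    Assumption: a block notice arrives as HTML. Useful only when the
    expected data is not itself HTML and no checksums are available.
    """
    with path.open("rb") as handle:
        head = handle.read(probe_bytes).lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))


# Example usage (hypothetical directory and manifest):
#   manifest = {"sample_0001.bin": "<sha256 hex digest published by the provider>"}
#   for name in find_suspect_files(Path("downloads"), manifest):
#       print("re-check before using:", name)
```

A check like this only detects bad files after the fact. It does not replace talking to the LC ISSOs before starting the transfer.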