Start here: Preferred, policy-aligned options (recommended)

Use Option 1 or Option 2 first. They are designed to address security controls, policy compliance, and operational support with minimal user burden.

Option 1 (Preferred): LivAI Endpoints

What: Centralized, managed access to commercial and advanced LLMs (e.g., OpenAI) via a secure API.

How to access:

Pros:

  • Fast setup

  • Access to recent commercial models

  • Institution tracks which models meet policy guidelines

  • Usage tracking and budget management

  • Access a subset of models and LLNL data interactively via the LivChat interface

Caveats / constraints:

  • $500/year/user cap (can request more for projects)

  • Data up to CUI only; no PII or classified data allowed
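The endpoint speaks an OpenAI-style chat-completions API, so a request can be sketched with only the standard library. The URL and model name below are placeholders, not the real LivAI values; substitute the endpoint and key from the LivAI documentation.

```python
import json
import urllib.request

# Hypothetical values -- replace with the real LivAI endpoint URL,
# an approved model name, and your issued API key.
LIVAI_URL = "https://livai.example.llnl.gov/v1/chat/completions"
MODEL = "example-model"

def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def send(payload: dict, api_key: str) -> dict:
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        LIVAI_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because usage counts against the per-user budget cap, keep prompts and retries deliberate rather than looping blindly on failures.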

Option 2 (Preferred): LLamaMe (LC-Hosted Open Weight LLMs)

What: Locally hosted open-source LLMs (for example, Llama, Mistral, Codestral) available in LC’s Collaboration, Restricted, and Secure zones.

How to access:

Pros:

  • Fast setup

  • Data stays within LC zones: process data up to the classification supported by each zone.

  • LC tracks which models meet policy guidelines

  • No additional spend

  • Good for workflows requiring open-source models

Caveats / constraints:

  • Rate limits (default: 20 requests/min)

  • Keys expire every 30 days
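Given the default 20 requests/min limit, client code should pace itself rather than hammer the service and get throttled. A minimal sketch of a sliding-window limiter (the limit value comes from the default above; the class itself is illustrative, not part of LLamaMe):

```python
import time
from collections import deque

class RateLimiter:
    """Client-side pacing to stay under a requests-per-minute cap
    (LLamaMe's documented default is 20 requests/min)."""

    def __init__(self, max_per_minute: int = 20):
        self.max_per_minute = max_per_minute
        self.sent = deque()  # timestamps of recent requests

    def wait(self, now=None):
        """Block just long enough that the next request stays under the cap."""
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) >= self.max_per_minute:
            time.sleep(60 - (now - self.sent[0]))
            now = time.monotonic()
        self.sent.append(now)
```

Call `limiter.wait()` immediately before each API request; remember separately that the key itself must be rotated every 30 days.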

Option 3 (Fallback): Self-Hosting on LC Compute Nodes

Use this approach only when the preferred options cannot satisfy a specific technical requirement (for example, custom model builds, unusual model sizes, specialized frameworks, or multi-node inference). This option places the most responsibility on the user and should be undertaken in coordination with your ISSO.

What: Run your own LLMs on LC batch compute nodes using tools like vLLM, Llama.cpp, or Mastodon for multi-node, multi-GPU distributed inference.

How to get started:

  • Pick an inference framework (e.g., vLLM or Llama.cpp)

  • Work with your ISSO on model approval and any security or policy questions

Pros:

  • Full control over models, parameters, and data

  • Supports very large models (multi-node, multi-GPU)

Caveats / constraints:

  • Requires local exclusive resources like an LC batch compute node

  • Do not use shared resources, like a login node, when running services that open ports

  • You manage toolchain, containers, dependencies, and scaling

  • You monitor which models meet policy requirements, including approvals being revoked

  • Contact your ISSO if you have questions about model approval
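As a sketch of what a vLLM launch might look like under these constraints, the helper below assembles the `vllm serve` command line, binding to the loopback interface so the port is not exposed on shared networks. The flags shown are current vLLM options, but verify them against `vllm serve --help` for your installed version; the model path is a placeholder.

```python
def vllm_serve_command(model_path: str, port: int = 8000, gpus: int = 1) -> list[str]:
    """Build a vLLM server command line for an LC batch compute node.

    Binding to 127.0.0.1 keeps the listening port off shared interfaces;
    per the constraints above, never run this on a login node.
    """
    return [
        "vllm", "serve", model_path,
        "--host", "127.0.0.1",
        "--port", str(port),
        "--tensor-parallel-size", str(gpus),  # GPUs per node to shard across
    ]

# Run inside your batch allocation, e.g.:
#   subprocess.run(vllm_serve_command("/path/to/approved-model", gpus=4))
```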

LLNL LLM Model Download Decision Guide

Purpose

Help you decide whether a language model you want to download and run can be used at LLNL, while reducing security, compliance, and operational risk.

  • If you are uncertain at any point, stop and contact your ISSO for guidance on model approval and acceptable use.

Step 0: Write down what you are trying to do

Capture this up front because it affects the risk decision.

Record each item:

  • Intended use: chat, coding help, summarization, analysis, RAG, tool use, etc.

  • Where it will run: workstation, cluster node, etc.

  • Who will access it: just you, a team, or an org-wide service

  • Data you will put into it: data classification, presence of PII, export-controlled content, etc.

  • Integrations: RAG knowledge base, MCP tools, databases, shell, file access, external APIs

If any of these are unclear, ask your ISSO before you proceed.
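The Step 0 items can be kept as a small structured record so gaps are obvious before you proceed. The field names below are illustrative, not an official form:

```python
from dataclasses import dataclass, fields

@dataclass
class ModelIntake:
    """Step 0 record; field names are illustrative, not an official form."""
    intended_use: str = ""      # chat, coding help, summarization, RAG, tool use, ...
    run_location: str = ""      # workstation, cluster node, ...
    audience: str = ""          # just you, team, org-wide service
    data_sensitivity: str = ""  # classification, PII, export-controlled content, ...
    integrations: str = ""      # RAG store, MCP tools, databases, shell, external APIs

    def unclear_fields(self) -> list[str]:
        """Anything left blank should go to your ISSO before proceeding."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```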

Step 1: Identify the exact model and source

You cannot assess approval risk without pinning the exact artifact.

Record each item:

  • Model name: e.g., “VendorModel-7B-Instruct”

  • Exact version / tag: e.g., “v1.2” or a commit hash

  • Download source: official repo URL, vendor registry, internal mirror

  • Model card link: URL

  • License text: link or included file

  • Weight file details: filenames, formats, checksums if available
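Checksums are what pin the exact artifact: recorded alongside the version and source, they let you later prove the weights on disk are the ones you assessed. A minimal sketch using the standard library:

```python
import hashlib
from pathlib import Path

def sha256sum(path, chunk_size: int = 1 << 20) -> str:
    """Stream a (possibly multi-GB) weight file through SHA-256."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def manifest(weight_dir: str) -> dict[str, str]:
    """Checksum every file in a weights directory so the exact artifact
    can be recorded with the model name, version, and download source."""
    return {
        p.name: sha256sum(p)
        for p in sorted(Path(weight_dir).iterdir())
        if p.is_file()
    }
```

Compare the manifest against any checksums the vendor publishes; a mismatch means you are not holding the artifact you assessed.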

Step 2: Country of origin and jurisdiction

Why it matters: jurisdiction, sanctions, and data transfer laws may restrict use.

Decision questions

  • Where was the model developed, maintained, and distributed?

  • Are there known restrictions for using this model on DOE or LLNL-connected systems?

Hard stop example:

  • Per DOE guidance, DeepSeek and its products, including “open weights,” are not authorized on DOE assets or any device that is DOE-network accessible.

If you cannot confidently answer, stop and contact your ISSO.

Step 3: Weight distribution format and supply chain safety

Why it matters: some packages include loaders, scripts, or templates that could execute arbitrary code.

What to prefer

  • Plain, well-documented weight files from trusted sources.

What to treat as high risk

  • Bundles requiring opaque loaders or installer-like scripts.

  • “Convenience” repos that include execution templates, unreviewed binaries, or complex startup scripts.

Practical implication

  • Treat model downloads like software installs: use vetted repositories and require a basic security review before running.

  • safetensors and similar plain-weight formats are low risk.

  • GGUF-type formats carry interpreted content in their metadata (e.g., embedded chat templates) that could be exploited; use tools to evaluate the additional metadata in these files.

  • PyTorch .pt files are pickle-based binary blobs that can execute arbitrary code on load and are high risk.

If anything looks opaque or you do not understand what will execute, stop and contact your ISSO.
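One reason safetensors is the low-risk choice is that its layout can be inspected without executing anything: the file starts with an 8-byte little-endian header length followed by that many bytes of JSON. The sketch below (helper names are my own) reads only that header and flags the file extensions that are pickle-based:

```python
import json
import struct

def read_safetensors_header(path: str, max_header: int = 100_000_000) -> dict:
    """Parse only the JSON header of a .safetensors file.

    No tensor data is loaded and nothing is executed. Per the
    safetensors format: 8-byte little-endian header length, then
    that many bytes of JSON describing dtypes, shapes, and offsets.
    """
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        if n > max_header:
            raise ValueError("implausible header size; treat file as suspect")
        return json.loads(f.read(n))

def looks_high_risk(filename: str) -> bool:
    """Flag formats that can run code on load (pickle-based PyTorch files)."""
    return filename.endswith((".pt", ".pth", ".bin", ".pkl"))
```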

Step 4: Training data provenance

Why it matters (from the best practices): unclear or disputed sources create risk, and training can be tainted by output from other banned models.

Decision questions

  • Does the model card describe data sources, and licensing?

  • Is provenance vague, contested, or missing?

Practical implication:

  • Even if a model is not explicitly banned, it may have been trained on the output of, or derived from, models that are not authorized.

If provenance is unclear, stop and contact your ISSO.

Step 5: License compatibility

Why it matters: licenses may restrict use and fine-tuning.

Decision questions

  • Does the license allow your intended use (modifications, fine-tuning)?

  • Are there field-of-use, or deployment restrictions?

Practical implication (from the best practices)

  • Pin the model version and store the license.

If the license is confusing or restrictive, stop and contact your ISSO.

Step 6: Shared model or service

If you plan to make the running model accessible to other users (UI/API/service), stop and contact WEG (for LC-based use cases) or your ISSO before proceeding. A shared deployment introduces required controls for authentication, authorization, logging, and retention, and the architecture should be reviewed.

Step 7: If you will use tools (MCP) or RAG, you have extra gates

MCP tools (Model Context Protocol)

From the best practices:

  • Tools run where the MCP server is, not where the LLM is.

  • Treat each MCP server like an integration with its own data flow, permissions, and logging.

Minimum controls to confirm

  • Least-privilege scopes and allowlists.

  • Read-only by default, human-in-the-loop for destructive actions.

  • Scoped, short-lived credentials.

If you cannot constrain tool permissions, stop and contact WEG (for LC-based use cases) or your ISSO.
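The minimum controls above amount to a deny-by-default gate in front of every tool call. A sketch, with illustrative tool names (not real MCP server tools):

```python
READ_ONLY_TOOLS = {"search_docs", "read_file"}   # allowlist; names illustrative
DESTRUCTIVE_TOOLS = {"write_file", "run_shell"}  # require human approval

def authorize_tool_call(tool: str, approved_by_human: bool = False) -> bool:
    """Deny-by-default gate applied before an MCP tool call is forwarded.

    Mirrors the controls above: an explicit allowlist, read-only by
    default, and human-in-the-loop for destructive actions.
    """
    if tool in READ_ONLY_TOOLS:
        return True
    if tool in DESTRUCTIVE_TOOLS:
        return approved_by_human
    return False  # anything not explicitly listed is denied
```

The same principle applies to credentials: issue the MCP server scoped, short-lived tokens rather than reusing your own.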

RAG (Retrieval-Augmented Generation)

From the best practices, exposure points include:

  • ingestion pipeline, embedding generation, storage/index, retrieval-time context sent to the LLM, logs/telemetry.

RAG-specific threats and mitigations in the best practices

  • Prompt injection via documents: use guardrails and sanitization, and ensure the system policy overrides document text.

  • Data poisoning: use provenance checks, source allowlists, signatures, and controlled update workflows.

If you are adding RAG and do not have a clear end-to-end data flow, contact WEG (for LC-based use cases) or your ISSO.
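The data-poisoning mitigations above land at ingestion time: admit a document into the index only if it comes from an allowlisted source and its checksum was recorded through a controlled update workflow. A minimal sketch (the hostname is a placeholder, not a real service):

```python
from urllib.parse import urlparse

ALLOWED_SOURCES = {"docs.internal.example.gov"}  # illustrative allowlist

def admit_document(url: str, checksum: str, known_checksums: set[str]) -> bool:
    """Ingestion-time gate against data poisoning.

    Only allowlisted hosts, and only documents whose checksum was
    registered through a controlled update workflow, reach the index.
    """
    host = urlparse(url).hostname or ""
    return host in ALLOWED_SOURCES and checksum in known_checksums
```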