Start here: Preferred, policy-aligned options (recommended)

Use Option 1 or Option 2 first. They are designed to address security controls, policy compliance, and operational support with minimal user burden.

Option 1 (Preferred): LivAI Endpoints

What

Centralized, managed access to commercial and advanced LLMs (e.g., OpenAI models) via a secure API.

How to access

Pros

  • Fast setup

  • Access to recent commercial models

  • Institution tracks which models meet policy guidelines

  • Usage tracking and budget management

  • Access a subset of models and LLNL data interactively via the LivChat interface

Caveats / constraints

  • $500/year/user cap (can request more for projects)

  • Data up to CUI only; no PII or classified data allowed
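As a sketch of what API access typically looks like, the snippet below assembles and sends a chat-completion request, assuming the service exposes an OpenAI-compatible route. The base URL, environment-variable name, and model name are placeholders; substitute the values from your LivAI onboarding materials.

```python
"""Hedged sketch of a chat-completion call against an OpenAI-compatible
endpoint. BASE_URL, the LIVAI_API_KEY variable, and the model name are
illustrative placeholders, not confirmed LivAI values."""
import json
import os
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Assemble the URL, headers, and JSON body for one chat-completion call."""
    url = f"{base_url.rstrip('/')}/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, body


def chat(base_url: str, model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply (requires network access)."""
    url, headers, body = build_chat_request(
        base_url, os.environ["LIVAI_API_KEY"], model, prompt
    )
    req = urllib.request.Request(
        url, data=json.dumps(body).encode(), headers=headers, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Only `urllib` from the standard library is used here, so the sketch runs without extra dependencies; confirm the actual endpoint path and authentication scheme against the LivAI documentation before use.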

Option 2 (Preferred): LLamaMe (LC-Hosted Open Weight LLMs)

What

Locally hosted open-source LLMs (for example, Llama, Mistral, Codestral) available in LC’s Collaboration, Restricted, and Secure zones.

How to access

Pros

  • Fast setup

  • Data stays within LC zones; you can process data up to the classification level each zone supports

  • LC tracks which models meet policy guidelines

  • No additional spend

  • Good for workflows requiring open-source models

Caveats / constraints

  • Rate limits (default: 20 requests/min)

  • Keys expire every 30 days
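Since the default limit is 20 requests per minute, client-side pacing avoids tripping the server-side limiter. The sketch below is a simple fixed-interval throttle (an assumed client-side pattern, not part of the LLamaMe API) that spaces calls to fit under a configurable per-minute cap.

```python
"""Minimal client-side request pacer for a per-minute rate limit.
This is a generic pattern sketch; LLamaMe itself only enforces the limit
server-side, so the exact cap should match your key's configuration."""
import time


class RateLimiter:
    """Spaces out calls so at most `max_per_minute` start per minute
    (simple fixed-interval pacing, not a token bucket)."""

    def __init__(self, max_per_minute: int = 20,
                 clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / max_per_minute  # seconds between calls
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()

    def wait(self) -> None:
        """Block until the next request slot opens, then reserve it."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Usage is one `limiter.wait()` before each API call; the injectable `clock` and `sleep` parameters also make the pacing logic easy to unit-test without real delays.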

Option 3 (Fallback): Self-Hosting on LC Compute Nodes

Use this approach only when the preferred options cannot satisfy a specific technical requirement (for example, custom model builds, unusual model sizes, specialized frameworks, or multi-node inference). This option has the most user responsibility and should be done in coordination with your ISSO.

What

Run your own LLMs on LC batch compute nodes using tools like vLLM, Llama.cpp, or Mastodon for multi-node, multi-GPU distributed inference.

How to get started

  • Pick an inference framework (for example, vLLM or Llama.cpp)

  • Work with your ISSO on model approval and any security or policy questions

Pros

  • Full control over models, parameters, and data

  • Supports very large models (multi-node, multi-GPU)

Caveats / constraints

  • Requires dedicated, exclusive-use resources, such as an LC batch compute node

  • Do not use shared resources, like a login node, when running services that open ports

  • You manage toolchain, containers, dependencies, and scaling

  • You are responsible for monitoring which models meet policy requirements, including whether a model's approval has been revoked

  • Contact your ISSO if you have questions about model approval
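To make the self-hosting workflow concrete, the sketch below builds the command line for vLLM's OpenAI-compatible API server and launches it as a child process. The model path, port, and flag set are assumptions to verify against the vLLM version installed on your system; binding to 127.0.0.1 keeps the port off shared networks, consistent with the constraint above about not opening ports on shared resources.

```python
"""Hedged sketch of launching a vLLM OpenAI-compatible server inside a
batch allocation. Run this only on a compute node you have exclusive use
of, never on a login node. Flags shown are typical for vLLM but should be
checked against your installed version."""
import subprocess


def build_vllm_command(model_path: str, port: int, tensor_parallel: int = 1):
    """Build the argv list for vLLM's OpenAI-compatible API server.

    --host 127.0.0.1 restricts the listener to the local node, so only
    processes on that node (e.g., your batch job) can reach it."""
    return [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", model_path,
        "--host", "127.0.0.1",
        "--port", str(port),
        "--tensor-parallel-size", str(tensor_parallel),
    ]


def launch(model_path: str, port: int = 8000) -> subprocess.Popen:
    """Start the server as a child process (requires vLLM to be installed)."""
    return subprocess.Popen(build_vllm_command(model_path, port))
```

Once the server is up, clients on the same node can talk to it with the same OpenAI-compatible request shape used by the managed endpoints, which keeps your application code portable across Options 1 through 3.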