Start here: Preferred, policy-aligned options (recommended)

Use Option 1 or Option 2 first. They are designed to address security controls, policy compliance, and operational support with minimal user burden.

Option 1 (Preferred): LivAI Endpoints

What: Centralized, managed access to commercial and advanced LLMs (e.g., OpenAI) via a secure API.

How to access:

Pros:

  • Fast setup

  • Access to recent commercial models

  • Institution tracks which models meet policy guidelines

  • Usage tracking and budget management

  • Access a subset of models and LLNL data interactively via the LivChat interface

Caveats / constraints:

  • $500/year/user cap (can request more for projects)

  • Data up to CUI only; no PII or classified data allowed
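The endpoint speaks an OpenAI-style chat-completions API, so a request can be sketched with only the standard library. The URL and model name below are placeholders, not the real LivAI values; substitute the endpoint and key from the LivAI documentation.

```python
import json
import urllib.request

# Hypothetical values -- replace with the real LivAI endpoint URL,
# an approved model name, and your issued API key.
LIVAI_URL = "https://livai.example.llnl.gov/v1/chat/completions"
MODEL = "example-model"

def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def send(payload: dict, api_key: str) -> dict:
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        LIVAI_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because usage counts against the per-user budget cap, keep prompts and retries deliberate rather than looping blindly on failures.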

Option 2 (Preferred): LLamaMe (LC-Hosted Open Weight LLMs)

What: Locally hosted open-source LLMs (for example, Llama, Mistral, Codestral) available in LC’s Collaboration, Restricted, and Secure zones.

How to access:

Pros:

  • Fast setup

  • Data stays within LC zones: process data up to the classification supported by each zone.

  • LC tracks which models meet policy guidelines

  • No additional spend

  • Good for workflows requiring open-source models

Caveats / constraints:

  • Rate limits (default: 20 requests/min)

  • Keys expire every 30 days
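Given the default 20 requests/min limit, client code should pace itself rather than hammer the service and get throttled. A minimal sketch of a sliding-window limiter (the limit value comes from the default above; the class itself is illustrative, not part of LLamaMe):

```python
import time
from collections import deque

class RateLimiter:
    """Client-side pacing to stay under a requests-per-minute cap
    (LLamaMe's documented default is 20 requests/min)."""

    def __init__(self, max_per_minute: int = 20):
        self.max_per_minute = max_per_minute
        self.sent = deque()  # timestamps of recent requests

    def wait(self, now=None):
        """Block just long enough that the next request stays under the cap."""
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) >= self.max_per_minute:
            time.sleep(60 - (now - self.sent[0]))
            now = time.monotonic()
        self.sent.append(now)
```

Call `limiter.wait()` immediately before each API request; remember separately that the key itself must be rotated every 30 days.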

Option 3 (Fallback): Self-Hosting on LC Compute Nodes

Use this approach only when the preferred options cannot satisfy a specific technical requirement (for example, custom model builds, unusual model sizes, specialized frameworks, or multi-node inference). This option places the most responsibility on the user and should be undertaken in coordination with your ISSO.

What: Run your own LLMs on LC batch compute nodes using tools like vLLM, Llama.cpp, or Mastodon for multi-node, multi-GPU distributed inference.

How to get started:

  • Pick an inference framework (e.g., vLLM or Llama.cpp)

  • Work with your ISSO on model approval and any security or policy questions

Pros:

  • Full control over models, parameters, and data

  • Supports very large models (multi-node, multi-GPU)

Caveats / constraints:

  • Requires local exclusive resources like an LC batch compute node

  • Do not use shared resources, like a login node, when running services that open ports

  • You manage toolchain, containers, dependencies, and scaling

  • You monitor which models meet policy requirements, including approvals being revoked

  • Contact your ISSO if you have questions about model approval
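As a sketch of what a vLLM launch might look like under these constraints, the helper below assembles the `vllm serve` command line, binding to the loopback interface so the port is not exposed on shared networks. The flags shown are current vLLM options, but verify them against `vllm serve --help` for your installed version; the model path is a placeholder.

```python
def vllm_serve_command(model_path: str, port: int = 8000, gpus: int = 1) -> list[str]:
    """Build a vLLM server command line for an LC batch compute node.

    Binding to 127.0.0.1 keeps the listening port off shared interfaces;
    per the constraints above, never run this on a login node.
    """
    return [
        "vllm", "serve", model_path,
        "--host", "127.0.0.1",
        "--port", str(port),
        "--tensor-parallel-size", str(gpus),  # GPUs per node to shard across
    ]

# Run inside your batch allocation, e.g.:
#   subprocess.run(vllm_serve_command("/path/to/approved-model", gpus=4))
```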

LLNL LLM Model Download Decision Guide

Purpose

Help you decide whether a language model you want to download and run can be used at LLNL, while reducing security, compliance, and operational risk.

  • If you are uncertain at any point, stop and contact your ISSO for guidance on model approval and acceptable use.

Step 0: Write down what you are trying to do

Capture this up front because it affects the risk decision.

Record each item:

  • Intended use: chat, coding help, summarization, analysis, RAG, tool use, etc.

  • Where it will run: workstation, cluster node, etc.

  • Who will access it: just you, a team, or an org-wide service

  • Data you will put into it: data classification, presence of PII, export-controlled content, etc.

  • Integrations: RAG knowledge base, MCP tools, databases, shell, file access, external APIs

If any of these are unclear, ask your ISSO before you proceed.
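The Step 0 items can be kept as a small structured record so gaps are obvious before you proceed. The field names below are illustrative, not an official form:

```python
from dataclasses import dataclass, fields

@dataclass
class ModelIntake:
    """Step 0 record; field names are illustrative, not an official form."""
    intended_use: str = ""      # chat, coding help, summarization, RAG, tool use, ...
    run_location: str = ""      # workstation, cluster node, ...
    audience: str = ""          # just you, team, org-wide service
    data_sensitivity: str = ""  # classification, PII, export-controlled content, ...
    integrations: str = ""      # RAG store, MCP tools, databases, shell, external APIs

    def unclear_fields(self) -> list[str]:
        """Anything left blank should go to your ISSO before proceeding."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```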

Step 1: Identify the exact model and source

You cannot assess approval risk without pinning the exact artifact.

Record each item:

  • Model name: e.g., “VendorModel-7B-Instruct”

  • Exact version / tag: e.g., “v1.2” or a commit hash

  • Download source: official repo URL, vendor registry, internal mirror

  • Model card link: URL

  • License text: link or included file

  • Weight file details: filenames, formats, checksums if available
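Checksums are what pin the exact artifact: recorded alongside the version and source, they let you later prove the weights on disk are the ones you assessed. A minimal sketch using the standard library:

```python
import hashlib
from pathlib import Path

def sha256sum(path, chunk_size: int = 1 << 20) -> str:
    """Stream a (possibly multi-GB) weight file through SHA-256."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def manifest(weight_dir: str) -> dict[str, str]:
    """Checksum every file in a weights directory so the exact artifact
    can be recorded with the model name, version, and download source."""
    return {
        p.name: sha256sum(p)
        for p in sorted(Path(weight_dir).iterdir())
        if p.is_file()
    }
```

Compare the manifest against any checksums the vendor publishes; a mismatch means you are not holding the artifact you assessed.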

Step 2: Country of origin and jurisdiction

Why it matters: jurisdiction, sanctions, and data transfer laws may restrict use.

Decision questions

  • Where was the model developed, maintained, and distributed?

  • Are there known restrictions for using this model on DOE or LLNL-connected systems?

Hard stop example:

  • Per DOE guidance, DeepSeek and its products, including “open weights,” are not authorized on DOE assets or any device that is DOE-network accessible.

If you cannot confidently answer, stop and contact your ISSO.

Step 3: Weight distribution format and supply chain safety

Why it matters: some packages include loaders, scripts, or templates that could execute arbitrary code.

What to prefer

  • Plain, well-documented weight files from trusted sources.

What to treat as high risk

  • Bundles requiring opaque loaders or installer-like scripts.

  • “Convenience” repos that include execution templates, unreviewed binaries, or complex startup scripts.

Practical implication

  • Treat model downloads like software installs: use vetted repositories and require a basic security review before running.

  • safetensors and similar plain-weight formats are low risk.

  • GGUF-type formats carry interpreted content in their metadata (e.g., embedded chat templates) that could be exploited; use tools to evaluate the additional metadata in these files.

  • PyTorch .pt files are pickle-based binary blobs that can execute arbitrary code on load and are high risk.

If anything looks opaque or you do not understand what will execute, stop and contact your ISSO.
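One reason safetensors is the low-risk choice is that its layout can be inspected without executing anything: the file starts with an 8-byte little-endian header length followed by that many bytes of JSON. The sketch below (helper names are my own) reads only that header and flags the file extensions that are pickle-based:

```python
import json
import struct

def read_safetensors_header(path: str, max_header: int = 100_000_000) -> dict:
    """Parse only the JSON header of a .safetensors file.

    No tensor data is loaded and nothing is executed. Per the
    safetensors format: 8-byte little-endian header length, then
    that many bytes of JSON describing dtypes, shapes, and offsets.
    """
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        if n > max_header:
            raise ValueError("implausible header size; treat file as suspect")
        return json.loads(f.read(n))

def looks_high_risk(filename: str) -> bool:
    """Flag formats that can run code on load (pickle-based PyTorch files)."""
    return filename.endswith((".pt", ".pth", ".bin", ".pkl"))
```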

Step 4: Training data provenance

Why it matters (from the best practices): unclear or disputed sources create risk, and training can be tainted by output from other banned models.

Decision questions

  • Does the model card describe data sources, and licensing?

  • Is provenance vague, contested, or missing?

Practical implication:

  • Even if a model is not explicitly banned, it may have been trained on the output of, or derived from, models that are not authorized.

If provenance is unclear, stop and contact your ISSO.

Step 5: License compatibility

Why it matters: licenses may restrict use and fine-tuning.

Decision questions

  • Does the license allow your intended use (modifications, fine-tuning)?

  • Are there field-of-use, or deployment restrictions?

Practical implication (from the best practices)

  • Pin the model version and store the license.

If the license is confusing or restrictive, stop and contact your ISSO.

Step 6: Shared model or service

If you plan to make the running model accessible to other users (UI/API/service), stop and contact WEG (for LC-based use cases) or your ISSO before proceeding. A shared deployment introduces required controls for authentication, authorization, logging, and retention, and the architecture should be reviewed.

Step 7: If you will use tools (MCP) or RAG, you have extra gates

MCP tools (Model Context Protocol)

From the best practices:

  • Tools run where the MCP server is, not where the LLM is.

  • Treat each MCP server like an integration with its own data flow, permissions, and logging.

Minimum controls to confirm

  • Least-privilege scopes and allowlists.

  • Read-only by default, human-in-the-loop for destructive actions.

  • Scoped, short-lived credentials.

If you cannot constrain tool permissions, stop and contact WEG (for LC-based use cases) or your ISSO.
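The minimum controls above amount to a deny-by-default gate in front of every tool call. A sketch, with illustrative tool names (not real MCP server tools):

```python
READ_ONLY_TOOLS = {"search_docs", "read_file"}   # allowlist; names illustrative
DESTRUCTIVE_TOOLS = {"write_file", "run_shell"}  # require human approval

def authorize_tool_call(tool: str, approved_by_human: bool = False) -> bool:
    """Deny-by-default gate applied before an MCP tool call is forwarded.

    Mirrors the controls above: an explicit allowlist, read-only by
    default, and human-in-the-loop for destructive actions.
    """
    if tool in READ_ONLY_TOOLS:
        return True
    if tool in DESTRUCTIVE_TOOLS:
        return approved_by_human
    return False  # anything not explicitly listed is denied
```

The same principle applies to credentials: issue the MCP server scoped, short-lived tokens rather than reusing your own.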

RAG (Retrieval-Augmented Generation)

From the best practices, exposure points include:

  • ingestion pipeline, embedding generation, storage/index, retrieval-time context sent to the LLM, logs/telemetry.

RAG-specific threats and mitigations in the best practices

  • Prompt injection via documents: use guardrails and sanitization, and ensure the system policy overrides document text.

  • Data poisoning: use provenance checks, source allowlists, signatures, and controlled update workflows.

If you are adding RAG and do not have a clear end-to-end data flow, contact WEG (for LC-based use cases) or your ISSO.
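The data-poisoning mitigations above land at ingestion time: admit a document into the index only if it comes from an allowlisted source and its checksum was recorded through a controlled update workflow. A minimal sketch (the hostname is a placeholder, not a real service):

```python
from urllib.parse import urlparse

ALLOWED_SOURCES = {"docs.internal.example.gov"}  # illustrative allowlist

def admit_document(url: str, checksum: str, known_checksums: set[str]) -> bool:
    """Ingestion-time gate against data poisoning.

    Only allowlisted hosts, and only documents whose checksum was
    registered through a controlled update workflow, reach the index.
    """
    host = urlparse(url).hostname or ""
    return host in ALLOWED_SOURCES and checksum in known_checksums
```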