Start here Preferred, policy-aligned options (recommended)
Use Option 1 or Option 2 first. They are designed to address security controls, policy compliance, and operational support with minimal user burden.
Option 1 (Preferred): LivAI Endpoints
What: Centralized, managed access to commercial and advanced LLMs (e.g. OpenAI) via a secure API.
How to access:
Pros:
-
Fast setup
-
Access to recent commercial models
-
Institution tracks which models meet policy guidelines
-
Usage tracking and budget management
-
Access a subset of models and LLNL data interactively via the LivChat interface
-
Caveats / constraints:
-
$500/year/user cap (can request more for projects)
-
CUI only, no PII/classified data allowed
Option 2 (Preferred): LLamaMe (LC-Hosted Open Weight LLMs)
What: Locally hosted open-source LLMs (for example, Llama, Mistral, Codestral) available in LC’s Collaboration, Restricted, and Secure zones.
How to access:
-
Request API key via LaunchIT catalog
Pros:
-
Fast Setup
-
Data stays within LC zones: process data up to the classification supported by each zone.
-
LC tracks which models meet policy guidelines
-
No additional spend
-
Good for workflows requiring open-source models
Caveats / constraints:
-
Rate limits (default: 20 requests/min)
-
Keys expire every 30 days
Option 3 (Fallback): Self-Hosting on LC Compute Nodes
Use this approach only when the preferred options cannot satisfy a specific technical requirement (for example, custom model builds, unusual model sizes, specialized frameworks, or multi-node inference). This option has the most user responsibility and should be done in coordination with your ISSO.
What: Run your own LLMs on LC batch compute nodes using tools like vLLM, Llama.cpp, or Mastodon for multi-node, multi-GPU distributed inference.
How to get started:
-
User responsibility: You must ensure any model you download, or run is approved for use at LLNL. (See: LC LLM Model Download Decision Guide)
-
Pick an inference framework, e.g.: vLLM or Llama.cpp
-
Work with your ISSO on model approval and any security or policy questions
Pros:
-
Full control over models, parameters, and data
-
Supports very large models (multi-node, multi-GPU)
-
Caveats / constraints:
-
Requires local exclusive resources like an LC batch compute node
-
Do not use shared resources, like a login node, when running services that open ports
-
You manage toolchain, containers, dependencies, and scaling
-
You monitor which models meet policy requirements, including approvals being revoked
-
Contact your ISSOs if you have questions about model approval
LLNL LLM Model Download Decision Guide
Purpose
Help you decide whether a language model you want to download and run can be used at LLNL, while reducing security, compliance, and operational risk.
-
If you are uncertain at any point, stop and contact your ISSO for guidance on model approval and acceptable use.
Step 0, Write down what you are trying to do
Capture this up front because it affects the risk decision.
|
Item |
What to record |
|
Intended use |
Chat, coding help, summarization, analysis, RAG, tool use, etc. |
|
Where it will run |
Workstation, cluster node, etc |
|
Who will access it |
Just you, team, org-wide service |
|
Data you will put into it |
Data classification, presence of PII, export-controlled content, etc. |
|
Integrations |
RAG knowledge base, MCP tools, databases, shell, file access, external APIs |
If any of these are unclear, ask your ISSO before you proceed.
Step 1: Identify the exact model and source
You cannot assess approval risk without pinning the exact artifact.
|
Item |
Example |
|
Model name |
“VendorModel-7B-Instruct” |
|
Exact version / tag |
“v1.2” or commit hash |
|
Download source |
Official repo URL, vendor registry, internal mirror |
|
Model card link |
URL |
|
License text |
Link or included file |
|
Weight file details |
Filenames, formats, checksums if available |
Step 2: Country of origin and jurisdiction
Why it matters: jurisdiction, sanctions, and data transfer laws may restrict use.
Decision questions
-
Where was the model developed, maintained, and distributed?
-
Are there known restrictions for using this model on DOE or LLNL-connected systems?
Hard stop example:
-
Per DOE guidance, DeepSeek and its products, including “open weights,” are not authorized on DOE assets or any device that is DOE-network accessible.
If you cannot confidently answer, stop and contact your ISSO.
Step 3: Weight distribution format and supply chain safety
Why it matters: some packages include loaders, scripts, or templates that could execute arbitrary code.
What to prefer
-
Plain, well-documented weight files from trusted sources.
What to treat as high risk
-
Bundles requiring opaque loaders or installer-like scripts.
-
“Convenience” repos that include execution templates, unreviewed binaries, or complex startup scripts.
Practical implication
-
Treat model downloads like software installs, use vetted repositories and require a basic security review before running.
-
safetensors and similar are low risk
-
GGUF type formats contain interpreted content that could be exploited, use tools to evaluate the additional metadata in these files
-
PyTorch .pt files are binary blobs (pickle based) that are high risk
If anything looks opaque or you do not understand what will execute, stop and contact your ISSO.
Step 4: Training data provenance
Why it matters (from the best practices): unclear or disputed sources create risk, training can be tainted by using output from other banned models.
Decision questions
-
Does the model card describe data sources, and licensing?
-
Is provenance vague, contested, or missing?
Practical implication:
-
Even if a model is not explicitly not authorized it still may have used output or be the product of training based on models that are not authorized.
If provenance is unclear, stop and contact your ISSO.
Step 5: License compatibility
Why it matters: licenses may restrict use and fine-tuning.
Decision questions
-
Does the license allow your intended use (modifications, fine-tuning)?
-
Are there field-of-use, or deployment restrictions?
Practical implication (from the best practices)
-
Pin the model version and store the license.
If the license is confusing or restrictive, stop and contact your ISSO.
Step 6: Shared model or service
If you plan to make the running model accessible to other users (UI/API/service), stop and contact WEG (for LC based use cases) or your ISSO before proceeding. A shared deployment introduces required controls for authentication, authorization, logging, and retention, and the architecture should be reviewed.
Step 7: If you will use tools (MCP) or RAG, you have extra gates
MCP tools (Model Context Protocol)
From the best practices:
-
Tools run where the MCP server is, not where the LLM is.
-
Treat each MCP server like an integration with its own data flow, permissions, and logging.
Minimum controls to confirm
-
Least privilege scopes, allowlists.
-
Read-only by default, human-in-the-loop for destructive actions.
-
Scoped, short-lived credentials.
If you cannot constrain tool permissions, stop and contact WEG (for LC based use cases) or your ISSO.
RAG (Retrieval-Augmented Generation)
From the best practices, exposure points include:
-
ingestion pipeline, embedding generation, storage/index, retrieval-time context sent to the LLM, logs/telemetry.
RAG-specific threats and mitigations in the best practices
-
Prompt injection via documents, use guardrails and sanitization, system policy overrides document text.
-
Data poisoning, use provenance checks, source allowlists, signatures, controlled update workflows.
If you are adding RAG and do not have a clear end-to-end data flow, contact WEG (for LC based use cases) or your ISSO.
