Purpose

This document helps you decide whether a language model you want to download and run can be used at LLNL, while reducing security, compliance, and operational risk.

NOTE If you are uncertain at any point, stop and contact your ISSO for guidance on model approval and acceptable use.

Step 0: Write down what you are trying to do

Capture this up front because it affects the risk decision.

For each item, record:

  • Intended use: chat, coding help, summarization, analysis, RAG, tool use, etc.

  • Where it will run: workstation, cluster node, etc.

  • Who will access it: just you, a team, or an org-wide service.

  • Data you will put into it: data classification, presence of PII, export-controlled content, etc.

  • Integrations: RAG knowledge base, MCP tools, databases, shell, file access, external APIs.
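A minimal sketch of capturing these items as a structured record before you proceed. Field names and values are illustrative, not an official LLNL form:

```python
# Hypothetical Step 0 intake record; field names are illustrative only.
intake = {
    "intended_use": "coding help and summarization",
    "where_it_runs": "single workstation",
    "who_accesses": "just me",
    "data_classification": "unclassified, no PII",
    "integrations": ["RAG knowledge base"],
}

# Any missing or empty answer means: stop and ask your ISSO first.
unclear = [key for key, value in intake.items() if not value]
print("contact ISSO first" if unclear else "all items recorded")  # prints "all items recorded"
```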

If any of these are unclear, ask your ISSO before you proceed.

Step 1: Identify the exact model and source

You cannot assess approval risk without pinning the exact artifact.

Record the following (examples in quotes):

  • Model name: “VendorModel-7B-Instruct”

  • Exact version / tag: “v1.2” or a commit hash

  • Download source: official repo URL, vendor registry, or internal mirror

  • Model card link: URL

  • License text: link or included file

  • Weight file details: filenames, formats, and checksums if available
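Checksums are worth verifying before anything loads the weights. A minimal Python sketch, assuming the source publishes a SHA-256 digest; the filename below is a placeholder:

```python
# Verify a downloaded weight file against a digest published by the source.
import hashlib

def sha256sum(path: str, chunk: int = 1 << 20) -> str:
    """Stream the file in chunks so multi-GB weight files do not exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def verify(path: str, expected: str) -> None:
    """Raise rather than proceed if the file does not match the published digest."""
    if sha256sum(path) != expected:
        raise ValueError(f"checksum mismatch for {path}: refuse to load")

# Usage (placeholder values):
# verify("model-00001-of-00002.safetensors", "<digest from the model card>")
```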

Step 2: Country of origin and jurisdiction

Why it matters:

Jurisdiction, sanctions, and data transfer laws may restrict use.

Decision questions

  • Where was the model developed, maintained, and distributed?

  • Are there known restrictions for using this model on DOE or LLNL-connected systems?

Hard stop example:

  • Per DOE guidance, DeepSeek and its products, including “open weights,” are not authorized on DOE assets or any device that is DOE-network accessible.

NOTE If you cannot confidently answer, stop and contact your ISSO.

Step 3: Weight distribution format and supply chain safety

Why it matters 

Some packages include loaders, scripts, or templates that could execute arbitrary code.

What to prefer

  • Plain, well-documented weight files from trusted sources.

What to treat as high risk

  • Bundles requiring opaque loaders or installer-like scripts.

  • “Convenience” repos that include execution templates, unreviewed binaries, or complex startup scripts.

Practical implication

  • Treat model downloads like software installs: use vetted repositories and require a basic security review before running.

  • safetensors and similar tensor-only formats are low risk.

  • GGUF-type formats contain interpreted content (for example, chat templates embedded in metadata) that could be exploited; use tools to evaluate the additional metadata in these files.

  • PyTorch .pt files are pickle-based binary blobs that can execute arbitrary code when loaded; treat them as high risk.
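One low-cost check for safetensors files is to read the header with the standard library alone, before any framework code touches the file. A sketch based on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header):

```python
# Inspect what a .safetensors file claims to contain, without loading it
# through any ML framework. Stdlib only.
import json
import struct

def read_safetensors_header(path: str) -> dict:
    """Return the JSON header: tensor names, dtypes, shapes, and metadata."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        return json.loads(f.read(header_len))

# Usage (placeholder path):
# header = read_safetensors_header("model.safetensors")
# print(header.get("__metadata__"))
```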

NOTE If anything looks opaque or you do not understand what will execute, stop and contact your ISSO.

Step 4: Training data provenance

Why it matters (from the best practices)

Unclear or disputed sources create risk; a model can also be tainted if it was trained on output from banned models.

Decision questions

  • Does the model card describe data sources and licensing?

  • Is provenance vague, contested, or missing?

Practical implication

  • Even if a model is not explicitly unauthorized, it may still have been trained on output from, or be derived from, models that are not authorized.

NOTE If provenance is unclear, stop and contact your ISSO.

Step 5: License compatibility

Why it matters

Licenses may restrict use and fine-tuning.

Decision questions

  • Does the license allow your intended use (modifications, fine-tuning)?

  • Are there field-of-use or deployment restrictions?

Practical implication (from the best practices)

  • Pin the model version and store the license.

NOTE If the license is confusing or restrictive, stop and contact your ISSO.
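Pinning can be as simple as a small manifest stored next to the weights. A sketch with illustrative field names and a placeholder URL, not an official format:

```python
# Record the pinned version and license location so the approval decision
# is auditable later. All values below are placeholders.
import json

manifest = {
    "model": "VendorModel-7B-Instruct",
    "version": "v1.2",                                # or an exact commit hash
    "source": "https://example.com/vendor/model",     # placeholder URL
    "license_file": "LICENSE.txt",                    # stored alongside the weights
    "weight_checksums": {"model.safetensors": "<sha256 from the source>"},
}

print(json.dumps(manifest, indent=2))                 # store next to the weights
```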

Step 6: Shared model or service

If you plan to make the running model accessible to other users (UI/API/service), stop and contact WEG (for LC-based use cases) or your ISSO before proceeding. A shared deployment introduces required controls for authentication, authorization, logging, and retention, and its architecture should be reviewed.

Step 7: If you will use tools (MCP) or RAG, extra gates apply

MCP tools (Model Context Protocol)

From the best practices:

  • Tools run where the MCP server is, not where the LLM is.

  • Treat each MCP server like an integration with its own data flow, permissions, and logging.

Minimum controls to confirm

  • Least privilege scopes, allowlists.

  • Read-only by default, human-in-the-loop for destructive actions.

  • Scoped, short-lived credentials.
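The controls above can be sketched as a gate in front of tool calls. Tool names and the approval flag are illustrative; a real MCP deployment enforces this server-side, where the tools actually run:

```python
# Allowlist with read-only defaults and human-in-the-loop for destructive
# actions. Tool names are hypothetical examples.
READ_ONLY_TOOLS = {"search_docs", "read_file"}     # allowlisted, safe by default
DESTRUCTIVE_TOOLS = {"delete_file", "run_shell"}   # require explicit approval

def gate_tool_call(tool: str, human_approved: bool = False) -> bool:
    if tool in READ_ONLY_TOOLS:
        return True                                # read-only by default
    if tool in DESTRUCTIVE_TOOLS and human_approved:
        return True                                # human-in-the-loop
    return False                                   # deny anything else

assert gate_tool_call("read_file")
assert not gate_tool_call("run_shell")             # blocked without approval
assert gate_tool_call("run_shell", human_approved=True)
```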

If you cannot constrain tool permissions, stop and contact WEG (for LC-based use cases) or your ISSO.

RAG (Retrieval-Augmented Generation)

From the best practices, exposure points include:

  • The ingestion pipeline, embedding generation, storage/index, retrieval-time context sent to the LLM, and logs/telemetry.

RAG-specific threats and mitigations in the best practices

  • Prompt injection via documents: use guardrails and sanitization, and ensure the system policy overrides document text.

  • Data poisoning: use provenance checks, source allowlists, signatures, and controlled update workflows.
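The source-allowlist mitigation can be sketched at ingestion time. Domains here are placeholders:

```python
# Reject documents from any host not on an explicit allowlist before they
# enter the RAG index. Hostnames are hypothetical examples.
from urllib.parse import urlparse

ALLOWED_SOURCES = {"docs.internal.example", "wiki.internal.example"}

def allowed_to_ingest(doc_url: str) -> bool:
    host = urlparse(doc_url).hostname or ""
    return host in ALLOWED_SOURCES                 # off-allowlist means rejected

assert allowed_to_ingest("https://docs.internal.example/guide.html")
assert not allowed_to_ingest("https://evil.example/poison.html")
```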

If you are adding RAG and do not have a clear end-to-end data flow, contact WEG (for LC-based use cases) or your ISSO.