LLamaMe (LLM) API Service

LC LLamaMe API is now integrated with LaunchIT ver. 2.3 available in general availability (GA). This capability was presented during LC user meeting in March 2025, see presentation.

LLamaMe is LC's locally hosted large language model (LLM) service. LC provides general availability (GA) API access to locally hosted open-source LLMs served using the vLLM library in the CZ, RZ, and SCF. Feel free to join the internal LC LLamaMe Microsoft teams channel for more updates!

Large Language Models hosted in LC

LLM Details
LC Network Zone	Large Language Model	Max Content Length	Infrastructure GPU
Collaboration Zone (CZ)	meta-llama/Llama-3.3-70B-Instruct	64000	2 A100 80GB RAM
Collaboration Zone (CZ)	openai/gpt-oss-120b	64000	2 A100 80GB RAM
Restricted Zone (RZ)	meta-llama/Meta-Llama-3.1-8B-Instruct	4096	1 A100 40GB RAM
	intfloat/e5-mistral-7b-instruct*	32768	1 Tesla V100S 32GB RAM
	Llama-3.3-70B-Instruct	32768	32 AMD MI250 120GB
	Codestral-22B-v0.1	32768
	Llama-4-Scout-17B-16E-Instruct	128000
	gpt-oss-120b	131072
Secure Compute Facility (SCF)	meta-llama/Llama-3.3-70B-Instruct	110000	2 H100 80GB RAM
	gpt-oss-20b	128000	1 H100 80GB RAM
	intfloat/e5-mistral-7b-instruct*	32768	1 H100 80GB RAM
	Meta-Llama-3.1-8B-Instruct	32768	24 AMD MI250 120GB
	Llama-4-Scout-17B-16E-Instruct	128000
	gpt-oss-120b	131072

*embedding model

NOTE LC locally hosted models are subject to change as they may be upgraded in the future. There is a default rate limit of 20 requests per minute and API keys will expire after 30 days.

Getting a LLamaMe API Key

Provision an API key to access the LLamaMe endpoint through LaunchIT catalog. For further information please visit our documentation on LaunchIT.

Once in the LaunchIT catalog, select the workspace for the project you will be using API access for. Note that keys can also be directly provisioned from a workspace.

Persistent Data Services — LC LLamaMe API in LaunchIT Catalog

Once your API key has been created, you may access it at any time through your workspace dashboard. Your LLamaMe will be listed as a separate resource under your workspace dashboard, and the LLamaMe endpoint and models you have provisioned a key for will be displayed along with the key.

LLamaMe Resource in LaunchIT Workspace Dashboard

NOTE Keys expire every 30 days and must be regenerated to maintain your access to the LLamaMe API. API keys may be regenerated at any time through your LaunchIT resource dashboard.

Example LLamaMe Resource Dashboard in LaunchIT

Getting Started with the LLamaMe API

Set your API_KEY as an environment variable (can be copied from your LaunchIT workspace dashboard).

export API_KEY = <your API key>

Here's an example Python script to check which models are being hosted and ask a specific LLM to tell you a joke. Replace the endpoint and model with the endpoint and any available model displayed in your LaunchIT workspace dashboard.

NOTE Please make sure your environment is configured properly. See this internal CZ GitLab repo for more details.

import os
from openai import OpenAI

API_KEY = os.environ.get("API_KEY")

client = OpenAI(base_url=<LLamaMe endpoint>, api_key=API_KEY)

# Check which LLModels LC is hosting
print(client.models.list())

chat_response = client.chat.completions.create(
    model="<LLamaMe model>",
    messages=[
        {"role": "user", "content": "Tell me a joke."},
    ]
)

print("Chat response:", chat_response)

# Enjoy!