Machine Learning Tools
Machine Learning Frameworks
PyTorch is available on LC systems. Documentation coming soon!
Jupyter Notebooks provide an interactive environment for developing and testing machine learning models using frameworks like PyTorch. For setup instructions and best practices on LC systems, please see "Orbit and Jupyter notebooks".
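As a quick sanity check in a terminal session or notebook cell, the sketch below reports whether PyTorch can be imported and whether it sees any GPUs. The helper name describe_torch_environment is hypothetical and not part of any LC tooling. It assumes only that PyTorch, if present, is importable as torch in the active Python environment. ROCm builds of PyTorch expose AMD GPUs through the same torch.cuda interface, so the check is not specific to NVIDIA hardware.

```python
import importlib.util


def describe_torch_environment():
    """Summarize the PyTorch install and the GPUs it can see.

    Hypothetical helper for illustration; not part of any LC tooling.
    """
    # Avoid an ImportError when PyTorch is not in the active environment.
    if importlib.util.find_spec("torch") is None:
        return {
            "torch_installed": False,
            "version": None,
            "gpu_available": False,
            "gpu_count": 0,
        }

    import torch

    # torch.cuda also covers AMD GPUs when PyTorch is built with ROCm.
    gpu_available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        "version": torch.__version__,
        "gpu_available": gpu_available,
        "gpu_count": torch.cuda.device_count() if gpu_available else 0,
    }


if __name__ == "__main__":
    for name, value in describe_torch_environment().items():
        print(f"{name}: {value}")
```

On a login node this may report no GPUs even when the compute nodes of the same system have them, so run the check inside a job allocation to see what your job will actually get.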
Machine Learning Visualization
TensorBoard is available on LC. Please see Using TensorBoard on Login Nodes for info about how to run it securely.
Machine Learning Workflow Tools
If you're looking for a workflow orchestration tool, we recommend Merlin, which supports HPC workflows and scales to the many-job workflows typical of ML.
LC's Workflow Enablement Group supports Merlin by providing you with the required Persistent Data Services in LaunchIT.
AI/ML Services
Please see our AI/ML services page for information about LLMs hosted on LC (a service called LLamaMe) and how to get API keys in LaunchIT. The service is available in all LC network zones (CZ, RZ, and SCF).
We support GitLab Duo on the CZ and RZ. See Tech Bulletin 582 for the most recent information.
Hardware for AI/ML
Our systems that include GPUs are listed at "Compute Platforms with GPUs".
Data Storage and Sharing for AI/ML
We recommend our VAST filesystems for AI/ML workloads; VAST scales well for workloads that involve many small input/output files and frequent I/O operations.
We also offer object storage compatible with the S3 protocol.
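Because the object storage speaks the S3 protocol, standard S3 clients should be able to talk to it. The sketch below shows one way to upload a model checkpoint with the third-party boto3 library. It is an illustration under stated assumptions, not LC's documented procedure: the endpoint URL is a placeholder, the S3_ENDPOINT_URL variable name is hypothetical, and the bucket and key names in the usage note are made up. Substitute the values provided with your storage allocation.

```python
import os


def s3_client_kwargs(env=None):
    """Build connection settings for an S3-compatible endpoint.

    S3_ENDPOINT_URL is a hypothetical variable name and the default URL
    is a placeholder; neither is an LC-provided value.
    """
    env = os.environ if env is None else env
    return {
        "endpoint_url": env.get("S3_ENDPOINT_URL", "https://s3.example.invalid"),
        "aws_access_key_id": env.get("AWS_ACCESS_KEY_ID"),
        "aws_secret_access_key": env.get("AWS_SECRET_ACCESS_KEY"),
    }


def upload_checkpoint(local_path, bucket, key, env=None):
    """Upload one local file to the given bucket under the given key."""
    # boto3 is third-party; imported here so the settings helper above
    # works without it.
    import boto3

    client = boto3.client("s3", **s3_client_kwargs(env))
    client.upload_file(local_path, bucket, key)
```

With credentials exported in the environment, a call such as upload_checkpoint("model.pt", "my-bucket", "runs/001/model.pt") would copy the file into the bucket. Reading credentials from the environment keeps them out of scripts and notebooks that may be shared with collaborators.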
If you need to share models or other data with a group of collaborators, please don’t hesitate to reach out to us. We can create shared directories for groups of users upon request and are actively exploring additional solutions to facilitate model sharing.