Build an enterprise-grade LLM as a Service Platform in your environment and adopt Generative AI with flexibility, privacy, and security.

This won’t take long…

				
Linux / macOS:

    curl -sfL https://get.gpustack.ai | sh -s -

Windows (PowerShell):

    Invoke-Expression (Invoke-WebRequest -Uri "https://get.gpustack.ai" -UseBasicParsing).Content

For detailed installation instructions, refer to the docs.


Why GPUStack?
Open-source, flexible and easy to use


Platform Flexibility

Seamlessly adaptable from desktop to server, across all major OS and GPU hardware.

Enterprise Ready

One-click deployment with a fully integrated technical stack, built for enterprise needs.

Privacy & Security

100% open-source and fully on-premises: complete control over your data and access in your environment.

Run anywhere, from Desktop to Server

Develop and test your AI applications with models on your local Mac or Windows desktop, then seamlessly transition to production on Linux GPU servers. GPUStack supports all major platforms and provides a consistent experience across them.

Multiple Inference Engines & Model Types

With support for vLLM, llama.cpp, and more, GPUStack is built for cross-platform compatibility and performance. Easily extend and upgrade your inference engines and models to meet evolving needs.

Flexible Scheduling and High Availability

GPUStack ensures maximum GPU utilization and high availability with flexible scheduling strategies, automated resource calculation, and support for multi-model replicas with automatic load-balancing.

Streamlined AI Development for Developers

The Playground enables fast iteration and testing with tools like prompt tests, parameter configuration, multi-model comparison, and code examples. 
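The code examples in the Playground target GPUStack's OpenAI-compatible API. As a minimal sketch (assuming a GPUStack server at http://localhost, a deployed model named "llama3.2", and a placeholder API key; all three are illustrative, not prescriptive), a chat completion request can be built with nothing but the Python standard library:

```python
import json
import urllib.request

# Assumptions: GPUStack server at http://localhost, a deployed model
# named "llama3.2", and YOUR_API_KEY generated in the GPUStack UI.
payload = {
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
}
request = urllib.request.Request(
    "http://localhost/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",
    },
)
# With a running GPUStack server, send the request:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint follows the OpenAI API shape, any OpenAI-compatible client SDK can be pointed at the same server by overriding its base URL.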

Comprehensive Monitoring and Metrics

The Dashboard provides real-time insights into system performance, resource usage, and API access statistics. Track token usage, top users, and model status to ensure optimal operations and efficiency.