GPUStack v0.7: One‑Click Installers for macOS & Windows and Usage Metering

GPUStack v0.7

We’re thrilled to announce the release of GPUStack v0.7!

 

GPUStack is an open-source Model-as-a-Service (MaaS) platform designed for enterprise-level deployment. It runs on Linux, Windows, and macOS, and supports heterogeneous GPU clusters built with a wide range of hardware, including NVIDIA, AMD, Apple Silicon, Ascend, Hygon, and Moore Threads.

 

GPUStack supports a broad range of model types—including LLMs, VLMs, embeddings, rerankers, image generation, speech-to-text, and text-to-speech—and integrates seamlessly with multiple inference engines such as vLLM, MindIE, and llama-box (built on llama.cpp and stable-diffusion.cpp). It also enables coexistence of multiple versions of the same inference engine, providing greater flexibility for deployment and performance tuning.

 

The platform offers enterprise-grade features such as automatic resource scheduling, fault recovery, distributed inference, heterogeneous inference, request load balancing, resource and model monitoring, user management, and API authentication and authorization.

 

GPUStack also provides OpenAI-compatible APIs, making it easy to integrate with upper-layer application frameworks such as Dify, n8n, LangChain, and LlamaIndex. This makes it an ideal choice for enterprises building robust AI model serving infrastructure.
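As a sketch of what OpenAI compatibility means in practice, the snippet below builds a standard chat-completion request body and shows (commented out) how it would be sent to a GPUStack server. The URL, API key, and model name are placeholders for illustration, not values shipped with this release.

```python
import json

# Placeholder values: substitute your own GPUStack server address,
# API key, and the name of a model you have deployed.
GPUSTACK_URL = "http://localhost/v1/chat/completions"
API_KEY = "your-gpustack-api-key"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build a standard OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("qwen3", "Hello, GPUStack!")
body = json.dumps(payload).encode("utf-8")

# To actually send the request (requires a running GPUStack server):
# import urllib.request
# req = urllib.request.Request(
#     GPUSTACK_URL, data=body, method="POST",
#     headers={"Content-Type": "application/json",
#              "Authorization": f"Bearer {API_KEY}"})
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the request and response shapes match the OpenAI API, any client library or framework that speaks that protocol can point its base URL at a GPUStack server without code changes.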

 

GPUStack v0.7 enhances the platform across four pillars: inference performance, ease of deployment, heterogeneous-hardware compatibility, and system observability. Key updates include:

 

  • One‑click installers for desktop OSs: Greatly simplifies local environment setup, enabling individual developers and small teams to start local inference services quickly.
  • Usage metering & billing data collection: Adds fine‑grained statistics for inference requests, laying data foundations for precise operations and billing.
  • Ascend MindIE multi‑node distributed inference: Extends single‑node MindIE support to multi‑node clusters on Ascend NPUs, meeting deployment needs for ultra‑large models.
  • Cambricon MLU compatibility: Thanks to Cambricon for contributing MLU adaptation, further enriching GPUStack’s hardware ecosystem and advancing heterogeneous compute support.

 

Beyond these highlights, v0.7 delivers 70+ feature optimizations and stability fixes spanning performance tuning, user experience, production readiness, and ops/observability, empowering users to build high‑performance, scalable model‑service systems with ease!

 

For more information about GPUStack, visit:

GitHub repo: https://github.com/gpustack/gpustack

User guide: https://docs.gpustack.ai

 

Key Feature Highlights

One‑Click Installers for macOS & Windows

Previously, desktop deployment relied on shell scripts or Python, and installs often failed due to environment conflicts, missing dependencies, or network issues, with no progress feedback along the way.

 

v0.7 introduces native one‑click installers for macOS and Windows:

  • Zero dependencies: No pre‑installed Python, Docker Desktop, or other components required—just double‑click to install and launch.

  • Built‑in GPUStack Helper panel: A unified GUI for:

    • Quick setup of core parameters (Server/Worker role, port, env vars)
    • One‑click access to config directories for easy edits
    • Service status control & logs, plus Web console shortcuts
  • Lowering the entry barrier: Ideal for local tests, personal development, and small deployments—making local inference as easy as installing an app.

 

This dramatically improves the desktop experience, letting developers get hands‑on with models in minutes.

 

Usage Metering & Billing Data Collection

GPUStack v0.7 adds fine‑grained metering for inference requests—by user and model—capturing request counts, input/output tokens, and more.

 

Transparent, traceable data is crucial as services grow multi‑user and multi‑model. GPUStack provides a solid data foundation for flexible billing, anomaly tracing, resource quotas, and other governance needs—enabling a closed‑loop of refined operations.
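To make the metering data concrete: OpenAI-compatible responses carry a standard `usage` object with `prompt_tokens` and `completion_tokens`, and per-user, per-model statistics like those v0.7 collects can be thought of as aggregations over these fields. The sketch below is purely illustrative of that idea; it is not GPUStack’s internal implementation, and the user and model names are made up.

```python
from collections import defaultdict

def aggregate_usage(records):
    """Aggregate per-request usage into per-(user, model) totals.

    records: iterable of (user, model, usage) tuples, where `usage`
    follows the OpenAI response format, e.g.
    {"prompt_tokens": 12, "completion_tokens": 40, "total_tokens": 52}.
    """
    totals = defaultdict(
        lambda: {"requests": 0, "input_tokens": 0, "output_tokens": 0}
    )
    for user, model, usage in records:
        entry = totals[(user, model)]
        entry["requests"] += 1
        entry["input_tokens"] += usage.get("prompt_tokens", 0)
        entry["output_tokens"] += usage.get("completion_tokens", 0)
    return dict(totals)

# Hypothetical request log for two users and two models.
records = [
    ("alice", "qwen3", {"prompt_tokens": 12, "completion_tokens": 40}),
    ("alice", "qwen3", {"prompt_tokens": 8, "completion_tokens": 25}),
    ("bob", "deepseek-r1", {"prompt_tokens": 30, "completion_tokens": 100}),
]
stats = aggregate_usage(records)
print(stats[("alice", "qwen3")])
# {'requests': 2, 'input_tokens': 20, 'output_tokens': 65}
```

Totals of this kind are what downstream billing, quota, and anomaly-tracing systems consume.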


▲ Fine‑grained metering introduced in v0.7

 

Ascend MindIE Multi‑Node Distributed Inference

MindIE is Ascend’s high‑performance inference framework, offering acceleration, debugging, and rapid deployment. Since v0.6, GPUStack has integrated MindIE for stable, efficient single‑node inference on Ascend 910B and 310P. Responding to growing demand for horizontal scaling, v0.7 now supports multi‑node distributed inference with MindIE.

 

We have validated deployments—including DeepSeek R1 671B—across multiple Ascend nodes, confirming stability and high throughput. This fully meets large‑scale model needs in NPU clusters.


▲ GPUStack + Ascend MindIE running DeepSeek R1 671B across an NPU cluster

 

Cambricon MLU Compatibility

Thanks to community contributions from Cambricon, v0.7 now supports MLU chips, marking another milestone in heterogeneous‑hardware adaptation and reflecting growing recognition from mainstream vendors.

 

GPUStack now unifies scheduling across NVIDIA, AMD, Apple Silicon, Ascend, Hygon, Moore Threads, Iluvatar, Cambricon, and more—offering a one‑stop solution for stable inference on heterogeneous resources.

 

As support for heterogeneous chips widens, GPUStack is becoming the backbone platform for large‑model localization, unified scheduling, and high‑performance model services.

 

UI / UX Overhaul

v0.7 delivers a complete UI/UX redesign:

  • Reworked menus: Clearer top‑level nav with fewer clicks
  • Persistent key entries: Quick access to frequently used features
  • Consistent interactions: Logical layouts and smoother workflows

 

The new interface aligns with user habits, enhancing discoverability and efficiency so you can focus on your models.


▲ Brand‑new v0.7 UI—key actions one click away, boosting speed and ease

 

Join the Community

Explore more at our GitHub repo: https://github.com/gpustack/gpustack. Feel free to open issues or PRs, and give us a Star ⭐️ while you’re there!

 

Need help? Join our Discord: https://discord.gg/VXYJzuaqwD for technical support and discussion.

 

If GPUStack helps you, a like, share, or follow is always appreciated!
