GPUStack is continuously improving to cover more enterprise-level LLM-as-a-Service (LLMaaS) use cases. GPUStack 0.3.1 is released this week, introducing support for Rerank models and a Rerank API, adding support for Windows ARM64 devices, and addressing several community-reported issues to better accommodate diverse use cases.
For more information about GPUStack, visit:
GitHub repo: https://github.com/gpustack/gpustack
User guide: https://docs.gpustack.ai
Key features
Support Rerank models / API
RAG systems, such as knowledge bases, are one of the key directions for LLM applications. In best practices for RAG systems, it’s generally recommended to use both Embedding and Rerank models.
• Embedding models convert the text from a knowledge base into vectors, which are stored in a vector database. When a user asks a question, the Embedding model vectorizes the query. By calculating vector distances, the system retrieves the most relevant contexts from the vector database and combines them with the user’s question before sending the result to the LLM.
• Rerank models reorder multiple contexts retrieved from the vector database, ensuring that the most relevant contexts are passed to the LLM, thereby improving the accuracy of retrieval.
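The retrieval step described above can be sketched in a few lines. This is a toy illustration, not GPUStack code: the embeddings are hand-written stand-ins for vectors an Embedding model would produce, and cosine similarity stands in for the vector database's distance metric.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors (higher = more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "vector database": pretend these embeddings came from an Embedding model.
docs = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.1, 0.9, 0.0],
    "doc3": [0.7, 0.3, 0.0],
}
query = [1.0, 0.0, 0.0]  # the vectorized user question

# Retrieve contexts ranked by similarity to the query; a Rerank model would
# then reorder these candidates before they are passed to the LLM.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)
```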
In a RAG system, the combination of chat models (LLMs), Embedding models, and Rerank models is a common best practice. GPUStack introduced support for Embedding models in 0.1.1. As of 0.3.1, GPUStack supports Rerank models, providing a Jina-compatible Rerank API. Additionally, GPUStack has extended support for Embedding models on the vLLM backend. From 0.3.1 onwards, GPUStack offers comprehensive support for various RAG systems.
To use a Rerank model deployed via GPUStack in a RAG system such as Dify, select the Jina Rerank model from the Model Provider and enter the Rerank API endpoint and API key provided by GPUStack:
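You can also call the Rerank API directly. The sketch below builds a request following the Jina-compatible rerank schema (`model`, `query`, `documents`, `top_n`); the server URL, API key, and model name are placeholders you would replace with your own GPUStack deployment details.

```python
import json
from urllib import request

# Placeholders: substitute your GPUStack server URL, API key, and the name
# of the Rerank model you deployed.
GPUSTACK_URL = "http://localhost"
API_KEY = "your-api-key"

def build_rerank_request(query, documents, model="bge-reranker-v2-m3", top_n=3):
    """Build a Jina-compatible rerank request against a GPUStack server."""
    payload = {
        "model": model,
        "query": query,
        "documents": documents,
        "top_n": top_n,
    }
    return request.Request(
        f"{GPUSTACK_URL}/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )

req = build_rerank_request(
    "What is GPUStack?",
    [
        "GPUStack is an open-source GPU cluster manager for running LLMs.",
        "Bananas are rich in potassium.",
    ],
)
# Sending the request with request.urlopen(req) returns a JSON body whose
# `results` list carries an `index` and `relevance_score` per document.
```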
Expanded support
GPUStack 0.3.1 now supports Windows ARM64 devices, enabling full support across AMD64 (x86_64) and ARM64 platforms on Linux, Windows, and macOS.
We are continuously expanding GPUStack’s support, with ongoing work planned for additional hardware platforms such as AMD GPUs.
GPUStack currently requires Python >= 3.10. If your Python version is below 3.10, we recommend using Miniconda to create a Python environment that meets the version requirement.
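A quick way to check your interpreter against this requirement is sketched below; the helper function is illustrative, not part of GPUStack itself.

```python
import sys

def meets_requirement(version_info, minimum=(3, 10)):
    """Return True if the interpreter version satisfies GPUStack's minimum."""
    return tuple(version_info[:2]) >= minimum

if not meets_requirement(sys.version_info):
    # Create a suitable environment, e.g. with Miniconda:
    #   conda create -n gpustack python=3.10
    print("Python too old for GPUStack; create a >= 3.10 environment first.")
```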
Other features and fixes
GPUStack 0.3.1 also includes many improvements and bug fixes based on user feedback, including:
• Enhanced installation script output to guide users in accessing GPUStack.
• Proper error messages in the Playground when models are not ready.
• The ability to manually define GPU resources when automatic detection is unavailable.
• A fix for the default installation failing to run Qwen2-VL.
• Prevention of repeated downloads when downloading models from ModelScope.
For other enhancements and bug fixes, see the full changelog:
https://github.com/gpustack/gpustack/releases/tag/0.3.1
Join Our Community
For more information about GPUStack, please visit: https://gpustack.ai.
If you encounter any issues or have suggestions, feel free to join our Community for support from the GPUStack team and connect with users from around the world.
We are continuously improving the GPUStack project. Before getting started, we encourage you to follow and star our project on GitHub at gpustack/gpustack to receive updates on future releases. We also welcome contributions to the project.