Introducing GPUStack 0.3.1: Ready for RAG systems & Windows ARM support


GPUStack is continuously improving and covering more enterprise-level LLM-as-a-Service (LLMaaS) use cases. GPUStack 0.3.1, released this week, introduces support for Rerank models and a Rerank API as well as Windows ARM64 devices, and addresses several community-reported issues to better accommodate diverse use cases.

 

For more information about GPUStack, visit:

GitHub repo: https://github.com/gpustack/gpustack

User guide: https://docs.gpustack.ai

 

Key features

Support Rerank models / API

RAG systems, such as knowledge bases, are one of the key directions for LLM applications. In best practices for RAG systems, it’s generally recommended to use both Embedding and Rerank models.

 

Embedding models convert the text from a knowledge base into vectors, which are stored in a vector database. When a user asks a question, the Embedding model vectorizes the query. By calculating vector distances, the system retrieves the most relevant contexts from the vector database and sends them, together with the user's question, to the LLM.
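To make the retrieval step concrete, here is a minimal sketch of similarity search over embeddings. The short toy vectors stand in for real Embedding-model output; in practice the vectors would come from the embedding API and a vector database would perform this search at scale:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors: dot product over norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors standing in for real embeddings.
chunks = {
    "GPUStack supports rerank models": [0.9, 0.1, 0.2],
    "The cafeteria opens at noon":     [0.1, 0.8, 0.3],
}
query_vec = [0.85, 0.15, 0.25]  # embedding of the user's question

# Retrieve the chunk whose embedding is closest to the query.
best = max(chunks, key=lambda text: cosine_similarity(query_vec, chunks[text]))
print(best)
```

The chunk with the highest similarity score is the context that would be combined with the user's question and sent to the LLM.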

Rerank models reorder multiple contexts retrieved from the vector database, ensuring that the most relevant contexts are passed to the LLM, thereby improving the accuracy of retrieval.

 

In a RAG system, the combination of chat models (LLMs), Embedding models, and Rerank models is a common best practice. GPUStack introduced support for Embedding models in 0.1.1. As of 0.3.1, GPUStack supports Rerank models, providing a Jina-compatible Rerank API. Additionally, GPUStack has extended support for Embedding models on the vLLM backend. From 0.3.1 onwards, GPUStack offers comprehensive support for various RAG systems.
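As a sketch of how the Jina-compatible API is typically called, the snippet below builds a rerank request using only the Python standard library. The server URL, API key, and model name are placeholders, and the `/v1/rerank` path follows the Jina Rerank API convention, so adapt these to your own deployment:

```python
import json
import urllib.request

def rerank_request(base_url, api_key, model, query, documents, top_n=3):
    """Build a Jina-style rerank request for a GPUStack server."""
    payload = json.dumps({
        "model": model,
        "query": query,
        "documents": documents,
        "top_n": top_n,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/rerank",
        data=payload,  # providing data makes this a POST request
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

# Sending the request requires a running GPUStack server, e.g.:
# with urllib.request.urlopen(rerank_request(
#         "http://your-gpustack-server", "your-api-key",
#         "your-rerank-model", "What is GPUStack?",
#         ["doc one", "doc two"])) as resp:
#     results = json.load(resp)["results"]  # ordered by relevance_score
```

In a Jina-style response, each entry in `results` carries the document's original `index` and a `relevance_score`, so the caller can keep only the top-ranked contexts before sending them to the LLM.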

 

To use the Rerank model deployed via GPUStack in a RAG system, such as Dify, you can select the Jina Rerank model from the Model Provider and input the Rerank API and API Key provided by GPUStack:

(Screenshot: selecting the Jina Rerank model in Dify's Model Provider settings and entering the Rerank API and API Key from GPUStack)

 

Expanded support

GPUStack 0.3.1 now supports Windows ARM64 devices, enabling full support across AMD64 (x86_64) and ARM64 platforms on Linux, Windows, and macOS.

 

We are continuously expanding GPUStack's support, with ongoing work planned for additional platforms such as AMD GPUs.

GPUStack currently requires Python >= 3.10. If your Python version is below 3.10, we recommend using Miniconda to create a Python environment that meets the version requirement.
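A quick way to check whether your interpreter meets the requirement, sketched in plain Python:

```python
import sys

def meets_requirement(version_info, minimum=(3, 10)):
    """Return True if the given Python version satisfies
    GPUStack's Python >= 3.10 requirement."""
    return tuple(version_info[:2]) >= minimum

# Check the interpreter you are running right now.
if meets_requirement(sys.version_info):
    print("Python version OK for GPUStack")
else:
    print("Python too old -- create a 3.10+ environment, e.g. with Miniconda")
```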

 

Other features and fixes

GPUStack 0.3.1 also includes many improvements and bug fixes based on user feedback: the installation script output now guides users in accessing GPUStack, the Playground returns proper errors when models are not ready, and GPU resources can be defined manually when automatic detection is unavailable. The release also resolves issues such as the default installation failing to run Qwen2-VL and repeated downloads when fetching models from ModelScope.

 

For other enhancements and bug fixes, see the full changelog:

https://github.com/gpustack/gpustack/releases/tag/0.3.1

 

Join Our Community

For more information about GPUStack, please visit: https://gpustack.ai.

 

If you encounter any issues or have suggestions, feel free to join our Community for support from the GPUStack team and connect with users from around the world.

 

We are continuously improving the GPUStack project. Before getting started, we encourage you to follow and star our project on GitHub at gpustack/gpustack to receive updates on future releases. We also welcome contributions to the project.

 
