What is GPUStack?
GPUStack is an open-source GPU cluster manager for running Large Language Models (LLMs). GPUStack allows you to create a unified cluster from any brand of GPUs in Apple MacBooks, Windows PCs, and Linux servers. Administrators can deploy LLMs from popular repositories such as Hugging Face. Developers can then access LLMs just as easily as accessing public LLM services from vendors like OpenAI or Microsoft Azure.
For more details about GPUStack, visit:
Introducing GPUStack: https://gpustack.ai/introducing-gpustack
GitHub repo: https://github.com/gpustack/gpustack
User guide: https://docs.gpustack.ai
Getting Started with GPUStack
GPUStack requires Python 3.10 or later.
Installation
Linux or macOS
GPUStack provides a script that installs it as a service on systemd- or launchd-based systems. To install GPUStack this way, run:

```shell
curl -sfL https://get.gpustack.ai | sh -
```
You have now deployed and started the GPUStack server, which also serves as the first worker node. You can access the GPUStack UI via http://myserver (replace myserver with the IP address or domain name of the host where you installed GPUStack).
Log in to GPUStack with the username admin and the default password. In the default setup, you can retrieve the password with:

```shell
cat /var/lib/gpustack/initial_admin_password
```
To add additional worker nodes and form a GPUStack cluster, run the following command on each worker node:

```shell
curl -sfL https://get.gpustack.ai | sh - --server-url http://myserver --token mytoken
```
Replace http://myserver with your GPUStack server URL and mytoken with the secret token used for adding workers. In the default setup, retrieve the token on the GPUStack server with:

```shell
cat /var/lib/gpustack/token
```
Alternatively, follow the instructions shown in the GPUStack UI to add workers.
Windows
Run PowerShell as administrator, then run the following command to install GPUStack:

```powershell
Invoke-Expression (Invoke-WebRequest -Uri "https://get.gpustack.ai" -UseBasicParsing).Content
```
You can access the GPUStack UI via http://myserver (replace myserver with the IP address or domain name of the host where you installed GPUStack).
Log in to GPUStack with the username admin and the default password. In the default setup, you can retrieve the password with:

```powershell
Get-Content -Path (Join-Path -Path $env:APPDATA -ChildPath "gpustack\initial_admin_password") -Raw
```
Optionally, you can add extra workers to form a GPUStack cluster by running the following command on other nodes:

```powershell
Invoke-Expression "& { $((Invoke-WebRequest -Uri "https://get.gpustack.ai" -UseBasicParsing).Content) } --server-url http://myserver --token mytoken"
```
In the default setup, you can run the following to get the token used for adding workers:

```powershell
Get-Content -Path (Join-Path -Path $env:APPDATA -ChildPath "gpustack\token") -Raw
```
For other installation scenarios, please refer to our installation documentation at: https://docs.gpustack.ai/docs/quickstart
Serving LLMs
As an LLM administrator, you can log in to GPUStack as the default system admin, navigate to Resources to monitor your GPU status and capacity, and then go to Models to deploy any open-source LLM into the GPUStack cluster. You can then provide these LLMs to regular users for integration into their applications, making efficient use of your existing resources and delivering stable LLM services for a variety of needs and scenarios.
- Access GPUStack to deploy the LLMs you need. Choose models from Hugging Face (only the GGUF format is currently supported) or the Ollama Library, download them to your local environment, and run them.
- GPUStack automatically schedules the model to run on an appropriate worker.
- You can manage and maintain LLMs by monitoring API requests, token consumption, token throughput, resource utilization, and more. This helps you decide whether to scale out or upgrade LLMs to keep the service stable.
Integrating with your applications
As an AI application developer, you can log in to GPUStack as a regular user and navigate to Playground from the menu, where you can interact with the LLM through the UI.

Next, visit API Keys to generate and save your API key. Return to Playground to customize your LLM by adjusting the system prompt, adding few-shot learning examples, or tuning prompt parameters. When you're done, click View Code and select your preferred code format (curl, Python, Node.js) along with the API key. Use this code in your applications to communicate with your private LLMs.
You can now access the OpenAI-compatible API. For example, with curl:
```shell
export GPUSTACK_API_KEY=myapikey
curl http://myserver/v1-openai/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GPUSTACK_API_KEY" \
  -d '{
        "model": "llama3",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Hello!"}
        ],
        "stream": true
      }'
```
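The same call can be driven from application code. The sketch below uses only the Python standard library to build the request body from the curl example and to parse the Server-Sent Events lines that an OpenAI-compatible endpoint emits when `"stream": true`; the URL and API key are the same placeholders as above, and the function names are illustrative, not part of GPUStack.

```python
import json
import urllib.request

# Placeholders from the curl example above -- replace with your deployment.
API_URL = "http://myserver/v1-openai/chat/completions"
API_KEY = "myapikey"

def build_payload(user_message, model="llama3"):
    """Build the same request body as the curl example."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "stream": True,
    }

def parse_sse_line(line):
    """Extract the content delta from one streamed 'data: ...' line.

    With "stream": true the server replies with Server-Sent Events in the
    standard OpenAI streaming format: each event carries a JSON chunk whose
    choices[0].delta may hold a piece of the generated text. Returns None
    for the '[DONE]' sentinel and for lines without a content delta.
    """
    if not line.startswith("data: "):
        return None
    data = line[len("data: "):].strip()
    if data == "[DONE]":
        return None
    chunk = json.loads(data)
    return chunk["choices"][0]["delta"].get("content")

def stream_chat(user_message):
    """POST the request and yield content pieces as they arrive."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(user_message)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            piece = parse_sse_line(raw.decode())
            if piece:
                yield piece

# Local demonstration of the parser (no server needed):
sample = 'data: {"choices": [{"delta": {"content": "Hello"}}]}'
print(parse_sse_line(sample))  # -> Hello
```

Against a running GPUStack server, iterating over `stream_chat("Hello!")` would print the reply incrementally as the model generates it.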
Manage GPUStack
For macOS
In macOS, GPUStack runs as a launchd service. Use launchctl to manage the GPUStack service:
- View the configuration

```shell
sudo launchctl print system/ai.gpustack
```
- Stop service

```shell
sudo launchctl unload /Library/LaunchDaemons/ai.gpustack.plist
ps -ef | grep gpustack
```
- Start service

```shell
sudo launchctl load /Library/LaunchDaemons/ai.gpustack.plist
ps -ef | grep gpustack
```
- Edit configuration and restart service

```shell
sudo launchctl unload /Library/LaunchDaemons/ai.gpustack.plist
sudo vim /Library/LaunchDaemons/ai.gpustack.plist
sudo launchctl load /Library/LaunchDaemons/ai.gpustack.plist
ps -ef | grep gpustack
```
- View logs

You can view GPUStack logs using the following path and command:

```shell
tail -200f /var/log/gpustack.log
```
- Uninstall

Run the following command to uninstall GPUStack:

```shell
/var/lib/gpustack/uninstall.sh
```
For Linux
In Linux, GPUStack runs as a systemd service. Use systemctl to manage the GPUStack service:
- View the configuration

```shell
sudo cat /etc/systemd/system/gpustack.service
```
- Stop service

```shell
sudo systemctl stop gpustack
ps -ef | grep gpustack
```
- Start service

```shell
sudo systemctl start gpustack
ps -ef | grep gpustack
```
- Edit configuration and restart service

```shell
sudo vim /etc/systemd/system/gpustack.service
sudo systemctl daemon-reload
sudo systemctl restart gpustack
ps -ef | grep gpustack
```
- View logs

You can view GPUStack logs using the following path and command:

```shell
tail -200f /var/log/gpustack.log
```
- Uninstall

Run the following command to uninstall GPUStack:

```shell
/var/lib/gpustack/uninstall.sh
```
For Windows
In Windows, you can use PowerShell to manage the GPUStack service:
- View the configuration

```powershell
Get-WmiObject Win32_Process -Filter "Name = 'gpustack.exe'"
```
- Stop service

```powershell
Stop-Service -Name "GPUStack"
Get-WmiObject Win32_Process -Filter "Name = 'gpustack.exe'" | Select-Object ProcessId, CommandLine
```
- Start service

```powershell
Start-Service -Name "GPUStack"
Get-WmiObject Win32_Process -Filter "Name = 'gpustack.exe'" | Select-Object ProcessId, CommandLine
```
- Edit the configuration using nssm and restart the service

```powershell
nssm edit GPUStack
```
Restart the service after editing the configuration:

```powershell
Restart-Service -Name "GPUStack"
Get-Service -Name "GPUStack"
```
- View logs

You can view GPUStack logs using the following path and command:

```powershell
Get-Content "$env:APPDATA\gpustack\log\gpustack.log" -Tail 200 -Wait
```
- Uninstall

Run the following PowerShell command to uninstall GPUStack:

```powershell
Set-ExecutionPolicy Bypass -Scope Process -Force; & "$env:APPDATA\gpustack\uninstall.ps1"
```
Join Our Community
Please find more information about GPUStack at: https://gpustack.ai.
If you encounter any issues or have suggestions for GPUStack, feel free to join our Community for support from the GPUStack team and to connect with fellow users globally.
We are actively enhancing the GPUStack project and plan to introduce new features in the near future, including support for multimodal models, additional accelerator stacks such as AMD ROCm and Intel oneAPI, and more inference engines. Before getting started, we encourage you to follow and star our project on GitHub at gpustack/gpustack to receive instant notifications about all future releases. We welcome your contributions to the project.
About Us
GPUStack is brought to you by Seal, Inc., a team dedicated to enabling AI access for all. Our mission is to enable enterprises to use AI to conduct their business, and GPUStack is a significant step towards achieving that goal.
Quickly build your own LLMaaS platform with GPUStack! Start experiencing the ease of creating GPU clusters locally, running and using LLMs, and integrating them into your applications.