Massaki Archambault e359def1cc update readme 2024-11-12 23:27:30 -05:00

local-llm

A quick prototype to self-host Open WebUI backed by Ollama to run LLM inference locally.

Getting started

Prerequisites

  • Linux or WSL2
  • docker

Steps for NVIDIA GPU

  1. Make sure your drivers are up to date.
  2. Install the NVIDIA Container Toolkit.
  3. Clone the repo.
  4. Symlink the NVIDIA compose spec to select it: ln -s docker-compose.nvidia.yml docker-compose.yml
  5. Run docker compose up. Wait a few minutes for the model to be downloaded and served.
  6. Browse http://localhost:8080/
  7. Create an account and start chatting!
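
Step 5 leaves the waiting time open. As a minimal sketch, a polling helper can block until the web UI answers instead of guessing. The function name wait_for_url and its retry defaults are hypothetical and not part of this repo, and the sketch assumes curl is installed. It only confirms that the URL responds; the model download may still be in progress at that point.

```shell
# Hypothetical helper (not part of the repo): poll a URL until it responds.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
    local url="$1"
    local attempts="${2:-60}"   # assumed default: 60 tries
    local delay="${3:-5}"       # assumed default: 5 seconds between tries
    local i=0

    while [ "$i" -lt "$attempts" ]; do
        # -f makes curl fail on HTTP errors, -sS hides progress but keeps
        # error messages, -o /dev/null discards the response body.
        if curl -fsS -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done

    echo "gave up waiting for $url" >&2
    return 1
}
```

For example, run wait_for_url http://localhost:8080/ from a second terminal while docker compose up is starting. The same helper applies to the AMD and CPU steps.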

Steps for AMD GPU

Warning: AMD GPUs are not supported on Windows at the moment. Use Linux.

  1. Make sure your drivers are up to date.
  2. Clone the repo.
  3. Symlink the AMD compose spec to select it: ln -s docker-compose.amd.yml docker-compose.yml
  4. Run docker compose up. Wait a few minutes for the model to be downloaded and served.
  5. Browse http://localhost:8080/
  6. Create an account and start chatting!

Steps for NO GPU (use CPU)

Warning: This may be very slow depending on your CPU, and may use a lot of RAM depending on the model.

  1. Make sure your drivers are up to date.
  2. Clone the repo.
  3. Symlink the CPU compose spec to select it: ln -s docker-compose.cpu.yml docker-compose.yml
  4. Run docker compose up. Wait a few minutes for the model to be downloaded and served.
  5. Browse http://localhost:8080/
  6. Create an account and start chatting!
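
Each of the three step lists above selects a compose spec with a symlink. A minimal sketch of that step as a reusable helper follows. The function name select_compose_spec is hypothetical, and the sketch assumes the link should be named docker-compose.yml, which is one of the file names docker compose looks for by default.

```shell
# Hypothetical helper (not part of the repo): point docker-compose.yml at
# one of the variant specs. Run it from the root of the cloned repo.
# Usage: select_compose_spec nvidia|amd|cpu
select_compose_spec() {
    local variant="$1"
    local spec="docker-compose.${variant}.yml"

    # Refuse unknown variants instead of creating a dangling link.
    if [ ! -f "$spec" ]; then
        echo "no such compose spec: $spec" >&2
        return 1
    fi

    # -s creates a symbolic link, -f replaces an existing link so that
    # switching between variants is repeatable.
    ln -sf "$spec" docker-compose.yml
}
```

For example, select_compose_spec cpu followed by docker compose up switches an existing checkout to the CPU spec.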

Using the API

Open WebUI acts as a proxy to Ollama and LiteLLM. Both APIs authenticate with a JWT token, which you can copy from the Settings > About page in Open WebUI.

Open WebUI exposes the Ollama API at the URL http://localhost:8080/ollama/api.
Example usage:

curl -H "Authorization: Bearer <Paste your JWT token here>" http://localhost:8080/ollama/api/tags
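
To avoid pasting the token into every command, the request above can be wrapped in a small helper that reads the token from the environment. This is only a sketch: the names OPEN_WEBUI_TOKEN, OPEN_WEBUI_URL and owui_get are hypothetical and are not defined by Open WebUI or this repo.

```shell
# Hypothetical names: OPEN_WEBUI_TOKEN, OPEN_WEBUI_URL and owui_get are
# illustrative only and are not defined by Open WebUI or this repo.
OPEN_WEBUI_URL="${OPEN_WEBUI_URL:-http://localhost:8080}"

# Usage: owui_get /path/below/the/base/url
owui_get() {
    # Fail early with a clear message if the token is missing.
    if [ -z "${OPEN_WEBUI_TOKEN:-}" ]; then
        echo "OPEN_WEBUI_TOKEN is not set" >&2
        return 1
    fi

    # Send the JWT token as a bearer token, as in the curl example above.
    curl -sS -H "Authorization: Bearer ${OPEN_WEBUI_TOKEN}" "${OPEN_WEBUI_URL}$1"
}
```

After export OPEN_WEBUI_TOKEN with the value copied from Settings > About, owui_get /ollama/api/tags issues the same request as the example above.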

The Ollama API can also be queried directly on port 11434, without proxying through Open WebUI, which can be simpler when working locally. Direct access requires no authentication.
Example usage:

curl http://localhost:11434/api/tags

Ollama also has some OpenAI-compatible APIs. See the blog post for more detailed usage instructions.
Example usage:

curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mistral",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'

Open WebUI exposes the LiteLLM API (for external providers) at the URL http://localhost:8080/litellm/api/v1.
Example usage:

curl -H "Authorization: Bearer <Paste your JWT token here>" http://localhost:8080/litellm/api/v1/models

The JWT token can be used in place of the OpenAI API key for OpenAI-compatible libraries/applications.
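
One possible way to do this from the shell is shown below. It assumes the client reads the OPENAI_BASE_URL and OPENAI_API_KEY environment variables, as recent official OpenAI SDKs do. Other OpenAI-compatible tools may use different settings, so check their documentation.

```shell
# Assumption: the client honours these two environment variables.
# Point it at the LiteLLM endpoint proxied by Open WebUI...
export OPENAI_BASE_URL="http://localhost:8080/litellm/api/v1"

# ...and supply the JWT token from Settings > About in place of an API key.
export OPENAI_API_KEY="<Paste your JWT token here>"
```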

Update

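Run docker compose pull followed by docker compose up -d. The up command recreates the containers from the newly pulled images, whereas docker compose restart only restarts the existing containers on their old images.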

Alternatives

Check out LM Studio for a more integrated alternative that is not web-based!