Massaki Archambault 5b8ec8fc3d update compose		2024-11-12 23:27:20 -05:00

local-llm

A quick prototype to self-host Open WebUI backed by Ollama to run LLM inference locally.

Goals

Streamline deployment of a local LLM for experimentation purposes.
Deploy a ChatGPT clone for daily use.
Deploy an OpenAI-like API for hacking on Generative AI using well-supported libraries.
Use Docker to prepare for an eventual deployment on a container orchestration platform like Kubernetes.

Getting started

Prerequisites

Linux or WSL2
docker

Steps for NVIDIA GPU

Make sure your drivers are up to date.
Install the NVIDIA Container Toolkit.
Clone the repo.
Copy the NVIDIA compose spec to select it: cp docker-compose.nvidia.yml docker-compose.yml
Run docker compose up, then wait a few minutes for the model to be downloaded and served.
Browse to http://localhost:8080/
Create an account and start chatting!

Steps for AMD GPU

Warning: AMD was not tested on Windows.

Make sure your drivers are up to date.
Clone the repo.
Copy the AMD compose spec to select it: cp docker-compose.amd.yml docker-compose.yml
Run docker compose up, then wait a few minutes for the model to be downloaded and served.
Browse to http://localhost:8080/
Create an account and start chatting!

Steps for NO GPU (use CPU)

Warning: This may be very slow depending on your CPU, and may use a lot of RAM depending on the model.

Make sure your drivers are up to date.
Clone the repo.
Copy the CPU compose spec to select it: cp docker-compose.cpu.yml docker-compose.yml
Run docker compose up, then wait a few minutes for the model to be downloaded and served.
Browse to http://localhost:8080/
Create an account and start chatting!
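
The three variants above differ only in which compose spec is copied into place. As a minimal sketch, that selection step could be scripted as follows. The helper name select_compose_spec is hypothetical, and the target name docker-compose.yml is an assumption: it is one of the file names Docker Compose looks for by default.

```python
import shutil
from pathlib import Path

# Hardware variants that have a compose spec in the repository.
VARIANTS = ("nvidia", "amd", "cpu")


def select_compose_spec(variant: str, repo_dir: str = ".") -> Path:
    """Copy docker-compose.<variant>.yml to docker-compose.yml and return the new path."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}, expected one of {VARIANTS}")
    repo = Path(repo_dir)
    source = repo / f"docker-compose.{variant}.yml"
    # Assumption: docker-compose.yml is the name Docker Compose should pick up.
    target = repo / "docker-compose.yml"
    shutil.copyfile(source, target)
    return target
```

Calling select_compose_spec("cpu") from the repository root would be the equivalent of the cp command in the CPU steps; docker compose up still has to be run afterwards.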

Configuring additional models

Self-hosted (Ollama)

Browse the Ollama models library to find a model you wish to add. For this example, we will add gemma.

Configuring via the command-line

docker compose exec ollama ollama pull gemma
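
When several models are needed, the same command can be repeated for each of them. The sketch below wraps it in Python. The function names are hypothetical, the service name ollama is taken from the command above, and mistral is only used as a second example model. By default it only builds the commands without running them.

```python
import subprocess


def pull_command(model: str, service: str = "ollama") -> list[str]:
    """Build the argv for pulling one model inside the compose service."""
    return ["docker", "compose", "exec", service, "ollama", "pull", model]


def pull_models(models: list[str], dry_run: bool = True) -> list[list[str]]:
    """Pull each model in turn; with dry_run=True only return the commands."""
    commands = [pull_command(model) for model in models]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if a pull fails.
            subprocess.run(command, check=True)
    return commands


# Dry run: print the commands that would be executed.
for command in pull_models(["gemma", "mistral"]):
    print(" ".join(command))
```

Passing dry_run=False runs the pulls for real. When no terminal is attached, docker compose exec may need its -T flag, which disables pseudo-TTY allocation.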

External providers (OpenAI, Mistral, Anthropic, etc.)

External providers can be configured through a LiteLLM instance embedded in Open WebUI. A full list of supported providers, and how to configure them, can be found in the documentation.

Let's say we want to configure gpt-3.5-turbo with an OpenAI API key.

Configuring via a config file

Open the file ./litellm/config.yaml in your editor.

Add an entry under model_list:

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: <put your OpenAI API key here>

Run docker compose restart open-webui to restart Open WebUI.

Using the API

Open WebUI acts as a proxy to Ollama and LiteLLM. For both APIs, authentication is done through a JWT, which can be fetched from the Settings > About page in Open WebUI.

Open WebUI exposes the Ollama API at the URL http://localhost:8080/ollama/api.
Example usage:

curl -H "Authorization: Bearer <Paste your JWT token here>" http://localhost:8080/ollama/api/tags

The Ollama API can also be queried directly on port 11434, without proxying through Open WebUI. This can be simpler in some cases, such as when working locally. In that case, there is no authentication.
Example usage:

curl http://localhost:11434/api/tags
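
Both ways of reaching the Ollama API described above can be sketched in Python with the standard library alone. The function names are hypothetical. The code assumes that the /api/tags response is a JSON object with a models list whose entries carry a name field, as in Ollama's API documentation.

```python
import json
import urllib.request
from typing import Optional


def build_tags_request(base_url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for <base_url>/tags, adding the JWT when one is given."""
    request = urllib.request.Request(base_url.rstrip("/") + "/tags")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request


def model_names(tags_payload: dict) -> list[str]:
    """Extract the model names from a decoded /api/tags response body."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def list_models(base_url: str, token: Optional[str] = None) -> list[str]:
    """Send the request and return the names of the models Ollama has available."""
    request = build_tags_request(base_url, token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return model_names(json.load(response))
```

Through the proxy, this would be called as list_models("http://localhost:8080/ollama/api", token="<Paste your JWT token here>"). Against Ollama directly, list_models("http://localhost:11434/api") needs no token.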

Ollama also has some OpenAI-compatible APIs. See the blog post for more detailed usage instructions.
Example usage:

curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mistral",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
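
The same request can be made from Python. This is a minimal sketch using only the standard library, with hypothetical function names. It assumes the response follows the OpenAI chat completion format, where the reply is found under choices[0].message.content.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_message: str) -> urllib.request.Request:
    """Build a POST request for <base_url>/chat/completions with a single user message."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reply_text(completion: dict) -> str:
    """Return the assistant message from an OpenAI-style chat completion body."""
    return completion["choices"][0]["message"]["content"]


def chat(base_url: str, model: str, user_message: str) -> str:
    """Send the chat request and return the assistant's reply."""
    request = build_chat_request(base_url, model, user_message)
    # Generation can be slow, especially on CPU, hence the generous timeout.
    with urllib.request.urlopen(request, timeout=120) as response:
        return reply_text(json.load(response))
```

With the stack running and the mistral model pulled, chat("http://localhost:11434/v1", "mistral", "Hello!") sends the same payload as the curl command above.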

Open WebUI exposes the LiteLLM API (for external providers) at the URL http://localhost:8080/litellm/api/v1.
Example usage:

curl -H "Authorization: Bearer <Paste your JWT token here>" http://localhost:8080/litellm/api/v1/models

The JWT token can be used in place of the OpenAI API key for OpenAI-compatible libraries/applications.
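
In practice, an OpenAI-compatible client needs two settings: a base URL and an API key. The sketch below shows how the JWT fills the second role. It rests on several assumptions: the environment variable name OPEN_WEBUI_JWT and the function names are hypothetical, and the /chat/completions route is assumed to exist under the LiteLLM URL because that is the standard OpenAI path.

```python
import json
import os
import urllib.request

LITELLM_BASE_URL = "http://localhost:8080/litellm/api/v1"


def client_settings(env=None) -> dict:
    """Return the base URL and API key an OpenAI-compatible client would be given."""
    env = os.environ if env is None else env
    # Assumption: the JWT was exported beforehand as OPEN_WEBUI_JWT.
    return {"base_url": LITELLM_BASE_URL, "api_key": env["OPEN_WEBUI_JWT"]}


def build_completion_request(settings: dict, model: str, prompt: str) -> urllib.request.Request:
    """Build the chat request an OpenAI-style client would send with these settings."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        settings["base_url"] + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The JWT goes where an OpenAI API key would normally go.
            "Authorization": f"Bearer {settings['api_key']}",
        },
        method="POST",
    )
```

The request built by build_completion_request(client_settings(), "gpt-3.5-turbo", "Hello!") can be sent with urllib.request.urlopen. A library that accepts a custom base URL and API key would take the same two values.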

Alternatives

Check out LM Studio for a more integrated, but non-web-based, alternative!