
Compare commits


2 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Massaki Archambault | e359def1cc | update readme | 2024-11-12 23:27:30 -05:00 |
| Massaki Archambault | 5b8ec8fc3d | update compose | 2024-11-12 23:27:20 -05:00 |
7 changed files with 35 additions and 82 deletions

README.md

@@ -2,13 +2,6 @@
 A quick prototype to self-host [Open WebUI](https://docs.openwebui.com/) backed by [Ollama](https://ollama.com/) to run LLM inference locally.
-## Goals
-* Streamline deployment of a local LLM for experimentation purpose.
-* Deploy a ChatGPT Clone for daily use.
-* Deploy an OpenAI-like API for hacking on Generative AI using well-supported libraries.
-* Use docker to prepare for an eventual deployment on a container orchestration platform like Kubernetes.
 ## Getting started
 ### Prerequisites
@@ -21,18 +14,18 @@ A quick prototype to self-host [Open WebUI](https://docs.openwebui.com/) backed
 1. Make sure your drivers are up to date.
 2. Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
 3. Clone the repo.
-4. Copy the NVIDIA compose spec to select it. `cp docker-compose.nvidia.yml docker.compose.yml`
+4. Symlink the NVIDIA compose spec to select it. `ln -s docker-compose.nvidia.yml docker.compose.yml`
 5. Run `docker compose up`. Wait for a few minutes for the model to be downloaded and served.
 6. Browse http://localhost:8080/
 7. Create an account and start chatting!
 ### Steps for AMD GPU
-**Warning: AMD was not tested on Windows.**
+**Warning: AMD *isn't* supported on Windows at the moment. Use Linux.**
 1. Make sure your drivers are up to date.
 2. Clone the repo.
-3. Copy the AMD compose spec to select it. `cp docker-compose.amd.yml docker.compose.yml`
+3. Symlink the AMD compose spec to select it. `ln -s docker-compose.amd.yml docker.compose.yml`
 4. Run `docker compose up`. Wait for a few minutes for the model to be downloaded and served.
 5. Browse http://localhost:8080/
 6. Create an account and start chatting!
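
Whichever spec is chosen, the bring-up can be sanity-checked from a shell. A minimal sketch, assuming the default port mappings in these compose specs (8080 for Open WebUI, 11434 for Ollama) and `curl` on the host; the spec is passed explicitly with `-f` here:

``` sh
# Start the stack with the NVIDIA spec (swap in the AMD or CPU file as needed).
docker compose -f docker-compose.nvidia.yml up -d

# Ollama should answer on its published port once the container is up.
curl http://localhost:11434/api/version

# Open WebUI serves the chat UI on port 8080.
curl -I http://localhost:8080/
```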
@@ -43,39 +36,11 @@ A quick prototype to self-host [Open WebUI](https://docs.openwebui.com/) backed
 1. Make sure your drivers are up to date.
 2. Clone the repo.
-3. Copy the CPU compose spec to select it. `cp docker-compose.cpu.yml docker.compose.yml`
+3. Symlink the CPU compose spec to select it. `ln -s docker-compose.cpu.yml docker.compose.yml`
 4. Run `docker compose up`. Wait for a few minutes for the model to be downloaded and served.
 5. Browse http://localhost:8080/
 6. Create an account and start chatting!
-## Configuring additional models
-### Self-hosted (Ollama)
-Browse the [Ollama models library](https://ollama.ai/library) to find a model you wish to add. For this example we will add [gemma](https://ollama.com/library/gemma).
-#### Configuring via the command-line
-``` sh
-docker compose exec ollama ollama pull gemma
-```
-### External providers (OpenAI, Mistral, Anthropic, etc.)
-External providers can be configured through a [LiteLLM](https://github.com/BerriAI/litellm) instance embedded into open-webui. A full list of supported providers, and how to configure them, can be found in the [documentation](https://docs.litellm.ai/docs/providers).
-Let's say we want to configure gpt-3.5-turbo with an OpenAI API key.
-#### Configuring via a config file
-1. Open the file *./litellm/config.yaml* in your editor.
-2. Add an entry under `model_list`:
-``` yaml
-model_list:
-  - model_name: gpt-3.5-turbo
-    litellm_params:
-      model: gpt-3.5-turbo
-      api_key: <put your OpenAI API key here>
-```
-3. Run `docker compose restart open-webui` to restart Open WebUI.
 ## Using the API
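
Even with this section removed, additional Ollama models can still be pulled into the running service by hand, as the removed command-line example showed. A small sketch, assuming the stack is up and the service is named `ollama`:

``` sh
# Pull another model from the Ollama library into the running service.
docker compose exec ollama ollama pull gemma

# List the models that are now available locally.
docker compose exec ollama ollama list
```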
@@ -121,6 +86,10 @@ curl -H "Authorization: Bearer <Paste your JWT token here>" http://localhost:808
 The JWT token can be used in place of the OpenAI API key for OpenAI-compatible libraries/applications.
+## Update
+Simply run `docker compose pull` followed by `docker compose restart`.
 ## Alternatives
 Check out [LM Studio](https://lmstudio.ai/) for a more integrated, but non web-based alternative!
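
Because the compose changes below publish Ollama's port 11434 directly, its OpenAI-compatible endpoint can also be exercised straight from the host. A sketch, assuming a model such as `mistral` has already been pulled:

``` sh
# Chat completion against Ollama's OpenAI-compatible API on the published port.
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistral", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}'
```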

docker-compose.amd.yml

@@ -6,11 +6,9 @@ services:
   ollama:
     image: ollama/ollama:rocm
     restart: unless-stopped
-    entrypoint: /bootstrap.sh
-    command: mistral
-    network_mode: service:open-webui
-    environment:
-      OLLAMA_HOST: http://localhost:11434
+    ports:
+      - 11434:11434
     # begin for AMD GPU support
     devices:
       - /dev/kfd
@@ -22,12 +20,13 @@ services:
       - SYS_PTRACE
     security_opt:
       - seccomp=unconfined
-    environment:
-      # https://github.com/ROCm/ROCm/issues/2625
-      GPU_MAX_HW_QUEUES: 1
-      # https://github.com/ROCm/ROCm/issues/2788#issuecomment-1915765846
-      # HSA_OVERRIDE_GFX_VERSION: 11.0.0
+    # environment:
+    #   # https://github.com/ROCm/ROCm/issues/2788#issuecomment-1915765846
+    #   HSA_OVERRIDE_GFX_VERSION: 11.0.0
     # end of section for AMD GPU support
     volumes:
-      - ./ollama/bootstrap.sh:/bootstrap.sh:ro
-      - ./ollama:/root/.ollama
+      - ollama_data:/root/.ollama
+volumes:
+  ollama_data:
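
If Ollama starts but falls back to CPU, it is worth checking that the ROCm device nodes this spec passes through exist on the host, and what the container reports. A sketch, assuming the service name `ollama`:

``` sh
# The ROCm device nodes handed to the container must exist on the host
# (/dev/kfd is mapped here; /dev/dri is typically needed as well).
ls -l /dev/kfd /dev/dri

# Ollama's logs say whether a ROCm GPU was detected.
docker compose logs ollama | grep -iE 'rocm|gpu'
```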

(shared compose configuration for the open-webui service)

@@ -18,13 +18,12 @@ services:
     image: ghcr.io/open-webui/open-webui:main
     ports:
       - 8080:8080
-      - 11434:11434
     environment:
-      OLLAMA_BASE_URL: http://localhost:11434
+      OLLAMA_BASE_URL: http://ollama:11434
+      WEBUI_AUTH: "False"
     extra_hosts:
       - host.docker.internal:host-gateway
     volumes:
-      - ./litellm/config.yaml:/app/backend/data/litellm/config.yaml
       - open-webui_data:/app/backend/data
 volumes:
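
Now that Ollama no longer shares Open WebUI's network namespace, Open WebUI reaches it by service name rather than localhost. A quick sketch for verifying the wiring, assuming the service names `open-webui` and `ollama` and that `printenv` is available in the Open WebUI image:

``` sh
# The URL handed to Open WebUI should point at the ollama service, not localhost.
docker compose exec open-webui printenv OLLAMA_BASE_URL

# From the host, the Ollama API remains reachable on the published port.
curl http://localhost:11434/api/tags
```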

docker-compose.cpu.yml

@@ -6,11 +6,10 @@ services:
   ollama:
     image: ollama/ollama:latest
     restart: unless-stopped
-    entrypoint: /bootstrap.sh
-    command: mistral
-    network_mode: service:open-webui
-    environment:
-      OLLAMA_HOST: http://localhost:11434
+    ports:
+      - 11434:11434
     volumes:
-      - ./ollama/bootstrap.sh:/bootstrap.sh:ro
-      - ./ollama:/root/.ollama
+      - ollama_data:/root/.ollama
+volumes:
+  ollama_data:
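
Models now live in the `ollama_data` named volume instead of the `./ollama` bind mount. A sketch for locating that storage, assuming docker compose's default volume naming (project-name prefix, usually the directory name):

``` sh
# The named volume is prefixed with the compose project name.
docker volume ls | grep ollama_data

# Show where the model files actually live on disk.
docker volume inspect $(docker volume ls -q | grep ollama_data)
```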

docker-compose.nvidia.yml

@@ -6,11 +6,9 @@ services:
   ollama:
     image: ollama/ollama:latest
     restart: unless-stopped
-    entrypoint: /bootstrap.sh
-    command: mistral
-    network_mode: service:open-webui
-    environment:
-      OLLAMA_HOST: http://localhost:11434
+    ports:
+      - 11434:11434
     # begin for NVIDIA GPU support
     deploy:
       resources:
@@ -20,6 +18,9 @@ services:
             count: 1
             capabilities: [gpu]
     # end of section for NVIDIA GPU support
     volumes:
-      - ./ollama/bootstrap.sh:/bootstrap.sh:ro
-      - ./ollama:/root/.ollama
+      - ollama_data:/root/.ollama
+volumes:
+  ollama_data:
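
To confirm the GPU reservation actually reached the container, the driver utilities that the NVIDIA Container Toolkit normally injects can be called inside it. A sketch, assuming the service name `ollama` and that the toolkit mounted `nvidia-smi` into the container:

``` sh
# nvidia-smi inside the container confirms the GPU was passed through.
docker compose exec ollama nvidia-smi

# Ollama's logs also report which GPU, if any, it uses for inference.
docker compose logs ollama | grep -iE 'cuda|gpu'
```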

ollama/.gitignore

@@ -1,3 +0,0 @@
-*
-!.gitignore
-!bootstrap.sh

ollama/bootstrap.sh

@@ -1,11 +0,0 @@
-#!/bin/bash -x
-ollama serve &
-sleep 1
-for model in ${@:-mistral}; do
-  ollama pull "$model"
-done
-wait