Compare commits

2 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Massaki Archambault | e359def1cc | update readme | 2024-11-12 23:27:30 -05:00 |
| Massaki Archambault | 5b8ec8fc3d | update compose | 2024-11-12 23:27:20 -05:00 |

7 changed files with 35 additions and 82 deletions

@@ -2,13 +2,6 @@
A quick prototype to self-host [Open WebUI](https://docs.openwebui.com/) backed by [Ollama](https://ollama.com/) to run LLM inference locally.
## Goals
* Streamline deployment of a local LLM for experimentation purposes.
* Deploy a ChatGPT Clone for daily use.
* Deploy an OpenAI-like API for hacking on Generative AI using well-supported libraries.
* Use docker to prepare for an eventual deployment on a container orchestration platform like Kubernetes.
## Getting started
### Prerequisites
@@ -21,18 +14,18 @@ A quick prototype to self-host [Open WebUI](https://docs.openwebui.com/) backed
1. Make sure your drivers are up to date.
2. Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
3. Clone the repo.
4. Copy the NVIDIA compose spec to select it. `cp docker-compose.nvidia.yml docker.compose.yml`
4. Symlink the NVIDIA compose spec to select it. `ln -s docker-compose.nvidia.yml docker.compose.yml`
5. Run `docker compose up`. Wait for a few minutes for the model to be downloaded and served.
6. Browse http://localhost:8080/
7. Create an account and start chatting!
### Steps for AMD GPU
**Warning: AMD was not tested on Windows.**
**Warning: AMD *doesn't* support Windows at the moment. Use Linux.**
1. Make sure your drivers are up to date.
2. Clone the repo.
3. Copy the AMD compose spec to select it. `cp docker-compose.amd.yml docker.compose.yml`
3. Symlink the AMD compose spec to select it. `ln -s docker-compose.amd.yml docker.compose.yml`
4. Run `docker compose up`. Wait for a few minutes for the model to be downloaded and served.
5. Browse http://localhost:8080/
6. Create an account and start chatting!
@@ -43,39 +36,11 @@ A quick prototype to self-host [Open WebUI](https://docs.openwebui.com/) backed
1. Make sure your drivers are up to date.
2. Clone the repo.
3. Copy the CPU compose spec to select it. `cp docker-compose.cpu.yml docker.compose.yml`
3. Symlink the CPU compose spec to select it. `ln -s docker-compose.cpu.yml docker.compose.yml`
4. Run `docker compose up`. Wait for a few minutes for the model to be downloaded and served.
5. Browse http://localhost:8080/
6. Create an account and start chatting!
## Configuring additional models
### Self-hosted (Ollama)
Browse the [Ollama models library](https://ollama.ai/library) to find a model you wish to add. For this example, we will add [gemma](https://ollama.com/library/gemma).
#### Configuring via the command-line
``` sh
docker compose exec ollama ollama pull gemma
```
### External providers (OpenAI, Mistral, Anthropic, etc.)
External providers can be configured through a [LiteLLM](https://github.com/BerriAI/litellm) instance embedded into open-webui. A full list of supported providers, and how to configure them, can be found in the [documentation](https://docs.litellm.ai/docs/providers).
Let's say we want to configure gpt-3.5-turbo with an OpenAI API key.
#### Configuring via a config file
1. Open the file *./litellm/config.yaml* in your editor.
2. Add an entry under `model_list`:
``` yaml
model_list:
- model_name: gpt-3.5-turbo
litellm_params:
model: gpt-3.5-turbo
api_key: <put your OpenAI API key here>
```
3. Run `docker compose restart open-webui` to restart Open WebUI.
## Using the API
@@ -121,6 +86,10 @@ curl -H "Authorization: Bearer <Paste your JWT token here>" http://localhost:808
The JWT token can be used in place of the OpenAI API key for OpenAI-compatible libraries/applications.
## Update
Run `docker compose pull` followed by `docker compose up -d`, which recreates the containers from the updated images.
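As a rough sketch, the update can be wrapped in a small shell helper. `update_stack` and the `COMPOSE` override are hypothetical names, not part of this repository. The sketch uses `docker compose up -d` as its second step, on the understanding that `docker compose restart` restarts existing containers without recreating them from a newly pulled image.

```sh
#!/bin/sh
# Hypothetical helper (not part of the repository): update the running stack.
# COMPOSE is an assumed override hook; set COMPOSE=echo for a dry run.
update_stack() {
    # Fetch newer images for every service in the active compose spec.
    ${COMPOSE:-docker compose} pull || return 1
    # Recreate containers whose image changed; unchanged ones are left alone.
    ${COMPOSE:-docker compose} up -d
}
```

Running `COMPOSE=echo update_stack` only prints the two subcommands (`pull`, then `up -d`), which is a cheap way to check the helper before pointing it at a real deployment.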
## Alternatives
Check out [LM Studio](https://lmstudio.ai/) for a more integrated, but non web-based alternative!

@@ -6,11 +6,9 @@ services:
ollama:
image: ollama/ollama:rocm
restart: unless-stopped
entrypoint: /bootstrap.sh
command: mistral
network_mode: service:open-webui
environment:
OLLAMA_HOST: http://localhost:11434
ports:
- 11434:11434
# begin for AMD GPU support
devices:
- /dev/kfd
@@ -22,12 +20,13 @@ services:
- SYS_PTRACE
security_opt:
- seccomp=unconfined
environment:
# https://github.com/ROCm/ROCm/issues/2625
GPU_MAX_HW_QUEUES: 1
# https://github.com/ROCm/ROCm/issues/2788#issuecomment-1915765846
# HSA_OVERRIDE_GFX_VERSION: 11.0.0
# environment:
# # https://github.com/ROCm/ROCm/issues/2788#issuecomment-1915765846
# HSA_OVERRIDE_GFX_VERSION: 11.0.0
# end of section for AMD GPU support
volumes:
- ./ollama/bootstrap.sh:/bootstrap.sh:ro
- ./ollama:/root/.ollama
- ollama_data:/root/.ollama
volumes:
ollama_data:

@@ -18,13 +18,12 @@ services:
image: ghcr.io/open-webui/open-webui:main
ports:
- 8080:8080
- 11434:11434
environment:
OLLAMA_BASE_URL: http://localhost:11434
OLLAMA_BASE_URL: http://ollama:11434
WEBUI_AUTH: "False"
extra_hosts:
- host.docker.internal:host-gateway
volumes:
- ./litellm/config.yaml:/app/backend/data/litellm/config.yaml
- open-webui_data:/app/backend/data
volumes:

@@ -6,11 +6,10 @@ services:
ollama:
image: ollama/ollama:latest
restart: unless-stopped
entrypoint: /bootstrap.sh
command: mistral
network_mode: service:open-webui
environment:
OLLAMA_HOST: http://localhost:11434
ports:
- 11434:11434
volumes:
- ./ollama/bootstrap.sh:/bootstrap.sh:ro
- ./ollama:/root/.ollama
- ollama_data:/root/.ollama
volumes:
ollama_data:

@@ -6,11 +6,9 @@ services:
ollama:
image: ollama/ollama:latest
restart: unless-stopped
entrypoint: /bootstrap.sh
command: mistral
network_mode: service:open-webui
environment:
OLLAMA_HOST: http://localhost:11434
ports:
- 11434:11434
# begin for NVIDIA GPU support
deploy:
resources:
@@ -20,6 +18,9 @@ services:
count: 1
capabilities: [gpu]
# end of section for NVIDIA GPU support
volumes:
- ./ollama/bootstrap.sh:/bootstrap.sh:ro
- ./ollama:/root/.ollama
- ollama_data:/root/.ollama
volumes:
ollama_data:

ollama/.gitignore

@@ -1,3 +0,0 @@
*
!.gitignore
!bootstrap.sh

@@ -1,11 +0,0 @@
#!/bin/bash -x
ollama serve &
sleep 1
for model in ${@:-mistral}; do
ollama pull "$model"
done
wait
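
The removed script waits a fixed `sleep 1` before pulling, which assumes the server is ready within a second. If a similar bootstrap were ever reintroduced, polling for readiness would be one alternative. The sketch below is only an illustration; `wait_for` is a hypothetical helper, not something the repository provides.

```sh
#!/bin/sh
# Hypothetical helper (not part of the removed script): retry a command
# once per second until it succeeds or the attempt budget runs out.
# Note: POSIX sh has no local variables, so "attempts" is a global here.
wait_for() {
    attempts=$1
    shift
    while [ "$attempts" -gt 0 ]; do
        # Run the probe quietly; success means the service is ready.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        # Pause only when another attempt remains.
        [ "$attempts" -gt 0 ] && sleep 1
    done
    return 1
}
```

For instance, `wait_for 30 ollama list` could stand in for the `sleep 1`, assuming `ollama list` keeps failing until the server accepts connections.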