Compare commits
No commits in common. "e359def1cc4998eb9ac13df8344f26ef13ea81a1" and "136fc43f23215197f527325b600a52392fecec6c" have entirely different histories.
e359def1cc...136fc43f23

README.md

```diff
@@ -2,6 +2,13 @@
 
 A quick prototype to self-host [Open WebUI](https://docs.openwebui.com/) backed by [Ollama](https://ollama.com/) to run LLM inference locally.
 
+## Goals
+
+* Streamline deployment of a local LLM for experimentation purpose.
+* Deploy a ChatGPT Clone for daily use.
+* Deploy an OpenAI-like API for hacking on Generative AI using well-supported libraries.
+* Use docker to prepare for an eventual deployment on a container orchestration platform like Kubernetes.
+
 ## Getting started
 
 ### Prerequisites
```

```diff
@@ -14,18 +21,18 @@ A quick prototype to self-host [Open WebUI](https://docs.openwebui.com/) backed
 1. Make sure your drivers are up to date.
 2. Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
 3. Clone the repo.
-4. Symlink the NVIDIA compose spec to select it. `ln -s docker-compose.nvidia.yml docker.compose.yml`
+4. Copy the NVIDIA compose spec to select it. `cp docker-compose.nvidia.yml docker.compose.yml`
 5. Run `docker compose up`. Wait for a few minutes for the model to be downloaded and served.
 6. Browse http://localhost:8080/
 7. Create an account and start chatting!
 
 ### Steps for AMD GPU
 
-**Warning: AMD will *doesn't* support Windows at the moment. Use Linux.**
+**Warning: AMD was not tested on Windows.**
 
 1. Make sure your drivers are up to date.
 2. Clone the repo.
-3. Symlink the AMD compose spec to select it. `ln -s docker-compose.amd.yml docker.compose.yml`
+3. Copy the AMD compose spec to select it. `cp docker-compose.amd.yml docker.compose.yml`
 4. Run `docker compose up`. Wait for a few minutes for the model to be downloaded and served.
 5. Browse http://localhost:8080/
 6. Create an account and start chatting!
```
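
The per-hardware steps above pick a spec by copying it into place. Below is a minimal sketch that wraps that copy. It assumes the `docker-compose.<variant>.yml` naming shown in the steps (`nvidia`, `amd`, `cpu`). `select_compose_spec` is a hypothetical helper and is not part of the repository. One caveat: `docker compose` only discovers `compose.yaml`, `compose.yml`, `docker-compose.yaml` or `docker-compose.yml` on its own. A file named `docker.compose.yml` therefore has to be passed with `-f`, and the optional second argument lets you choose the target name.

```sh
# Hypothetical helper: copy the compose spec for one hardware variant into place.
# Assumes the docker-compose.<variant>.yml naming used in the README steps.
select_compose_spec() {
  local variant="$1"
  # Default target mirrors the README; pass docker-compose.yml instead to get a
  # file that `docker compose` finds without -f.
  local target="${2:-docker.compose.yml}"
  local src
  case "$variant" in
    nvidia|amd|cpu) ;;
    *)
      echo "unknown variant '$variant' (expected nvidia, amd or cpu)" >&2
      return 2
      ;;
  esac
  src="docker-compose.${variant}.yml"
  if [ ! -f "$src" ]; then
    echo "missing $src; run this from the repository root" >&2
    return 1
  fi
  # A plain copy (not a symlink), matching the updated README.
  cp "$src" "$target"
}
```

Usage examples:

* `select_compose_spec nvidia && docker compose -f docker.compose.yml up`
* `select_compose_spec cpu docker-compose.yml && docker compose up`, provided the repository does not already use `docker-compose.yml` for something else.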

```diff
@@ -36,11 +43,39 @@ A quick prototype to self-host [Open WebUI](https://docs.openwebui.com/) backed
 
 1. Make sure your drivers are up to date.
 2. Clone the repo.
-3. Symlink the CPU compose spec to select it. `ln -s docker-compose.cpu.yml docker.compose.yml`
+3. Copy the CPU compose spec to select it. `cp docker-compose.cpu.yml docker.compose.yml`
 4. Run `docker compose up`. Wait for a few minutes for the model to be downloaded and served.
 5. Browse http://localhost:8080/
 6. Create an account and start chatting!
 
+## Configuring additional models
+
+### Self-hosted (Ollama)
+
+Browse the [Ollama models library](https://ollama.ai/library) to find a model you wish to add. For this example we will add [gemma](https://ollama.com/library/gemma)
+
+#### Configuring via the command-line
+``` sh
+docker compose exec ollama ollama pull gemma
+```
+
+### External providers (OpenAI, Mistral, Anthropic, etc.)
+
+External providers can be configured through a [LiteLLM](https://github.com/BerriAI/litellm) instance embedded into open-webui. A full list of supported providers, and how to configure them, can be found in the [documentation](https://docs.litellm.ai/docs/providers).
+
+Let say we want to configure gpt-3.5-turbo with an OpenAI API key.
+
+#### Configuring via a config file
+1. Open the file *./litellm/config.yaml* in your editor.
+2. Add an entry under `model_list`:
+``` yaml
+model_list:
+  - model_name: gpt-3.5-turbo
+    litellm_params:
+      model: gpt-3.5-turbo
+      api_key: <put your OpenAI API key here>
+```
+3. Run `docker compose restart open-webui` to restart Open WebUI.
 
 ## Using the API
 
```
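
The command-line step pulls one model at a time. Below is a small sketch for pulling several in one go. It assumes the Compose service is named `ollama`, as in the specs. `pull_models` is a hypothetical helper that stops at the first failure. The `-T` flag turns off pseudo-TTY allocation, so the same call also works from a script.

```sh
# Hypothetical helper: pull one or more models into the running "ollama" service.
pull_models() {
  local model
  if [ "$#" -eq 0 ]; then
    echo "usage: pull_models MODEL [MODEL...]" >&2
    return 2
  fi
  for model in "$@"; do
    # -T disables pseudo-TTY allocation, so this also works from scripts and CI.
    if ! docker compose exec -T ollama ollama pull "$model"; then
      echo "failed to pull $model" >&2
      return 1
    fi
  done
}
```

For example: `pull_models gemma mistral`.

There is also a second route. The specs pass `command:` to `bootstrap.sh`, which treats its arguments as model names. Listing extra names there (e.g. `command: mistral gemma`) should therefore pull them on the next start.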

```diff
@@ -86,10 +121,6 @@ curl -H "Authorization: Bearer <Paste your JWT token here>" http://localhost:808
 
 The JWT token can be used in place of the OpenAI API key for OpenAI-compatible libraries/applications.
 
-## Update
-
-Simply run `docker compose pull` followed by `docker compose restart`.
-
 ## Alternatives
 
 Check out [LM Studio](https://lmstudio.ai/) for a more integrated, but non web-based alternative!
```
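
The JWT note can be applied through environment variables:

* Many OpenAI-compatible clients read the key from `OPENAI_API_KEY`.
* The official OpenAI SDKs (v1 and later) also read the endpoint from `OPENAI_BASE_URL`.

Below is a minimal sketch that exports both. It assumes you supply the Open WebUI endpoint yourself, since the exact URL is not reproduced on this page. `use_openwebui_token` is a hypothetical helper. It only checks that the token has the three dot-separated parts of a JWT; it does not verify it.

```sh
# Hypothetical helper: point OpenAI-compatible clients at Open WebUI with a JWT.
# Source the file that defines it so the exports land in your current shell.
use_openwebui_token() {
  local token="$1" base_url="$2"
  if [ -z "$token" ] || [ -z "$base_url" ]; then
    echo "usage: use_openwebui_token JWT BASE_URL" >&2
    return 2
  fi
  # A JWT is three base64url segments joined by dots; catch copy/paste mistakes early.
  if ! printf '%s\n' "$token" | grep -Eq '^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$'; then
    echo "token does not look like a JWT" >&2
    return 1
  fi
  export OPENAI_API_KEY="$token"
  # Drop a single trailing slash so clients can append their own paths.
  export OPENAI_BASE_URL="${base_url%/}"
}
```

For example, run `use_openwebui_token "<Paste your JWT token here>" "<Open WebUI base URL>"`, then start the client from the same shell.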

```diff
@@ -6,9 +6,11 @@ services:
   ollama:
     image: ollama/ollama:rocm
     restart: unless-stopped
-    ports:
-      - 11434:11434
-
+    entrypoint: /bootstrap.sh
+    command: mistral
+    network_mode: service:open-webui
+    environment:
+      OLLAMA_HOST: http://localhost:11434
     # begin for AMD GPU support
     devices:
       - /dev/kfd
```
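
With `network_mode: service:open-webui`, the Ollama container joins the Open WebUI container's network namespace. That is why `OLLAMA_HOST` now points at `localhost`. It is also why port 11434 is published from the `open-webui` service instead: Docker rejects port publishing on a container that uses another container's network.

Below is a sketch for checking from the host that Ollama answers, assuming that port mapping. It calls Ollama's `GET /api/tags` endpoint, which lists the pulled models. The function names are hypothetical. The `grep`/`sed` extraction is a rough text match, not a JSON parser.

```sh
# Hypothetical helpers: list the models an Ollama server reports via GET /api/tags.

# Pull every "name" value out of the compact JSON returned by /api/tags.
# This is a rough text match, not a JSON parser; jq would be more robust.
parse_model_names() {
  # grep exits 1 when nothing matches; an empty model list is not an error.
  { grep -o '"name":"[^"]*"' || true; } | sed -e 's/^"name":"//' -e 's/"$//'
}

# Query the server; the default URL assumes port 11434 is published on the host.
ollama_models() {
  local url="${1:-http://localhost:11434}" body
  if ! body=$(curl -fsS --max-time 5 "${url%/}/api/tags"); then
    echo "Ollama did not answer at $url" >&2
    return 1
  fi
  printf '%s\n' "$body" | parse_model_names
}
```

Once the bootstrap pull has finished, `ollama_models` should list `mistral:latest`.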

```diff
@@ -20,13 +22,12 @@ services:
       - SYS_PTRACE
     security_opt:
       - seccomp=unconfined
-    # environment:
-    # # https://github.com/ROCm/ROCm/issues/2788#issuecomment-1915765846
+    environment:
+      # https://github.com/ROCm/ROCm/issues/2625
+      GPU_MAX_HW_QUEUES: 1
+      # https://github.com/ROCm/ROCm/issues/2788#issuecomment-1915765846
       # HSA_OVERRIDE_GFX_VERSION: 11.0.0
     # end of section for AMD GPU support
-
     volumes:
-      - ollama_data:/root/.ollama
-
-volumes:
-  ollama_data:
+      - ./ollama/bootstrap.sh:/bootstrap.sh:ro
+      - ./ollama:/root/.ollama
```
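
The AMD spec keeps `HSA_OVERRIDE_GFX_VERSION: 11.0.0` commented out. That variable is a common workaround for GPUs that ROCm does not officially support. It makes the runtime treat the card as another gfx target, written as a dotted version: `gfx1100` corresponds to `11.0.0` and `gfx1030` to `10.3.0`. `rocminfo` prints each agent's target name on lines such as `Name: gfx1100`.

Below is a small sketch that converts such a name into the dotted form. It assumes the `gfx<major><minor><stepping>` layout, with single hex digits for minor and stepping. Printing hex steppings in decimal is an assumption. `gfx_to_hsa_version` is hypothetical, and the right override for a given card still has to be checked against the ROCm issues linked in the spec.

```sh
# Hypothetical helper: turn a ROCm gfx target name (e.g. gfx1100) into the dotted
# form used by HSA_OVERRIDE_GFX_VERSION (e.g. 11.0.0).
# Assumes gfx<major><minor><stepping>, with minor and stepping one hex digit each.
gfx_to_hsa_version() {
  local target="${1#gfx}" major rest minor stepping
  major="${target%??}"        # everything except the last two characters
  rest="${target#"$major"}"   # the last two characters
  minor="${rest%?}"
  stepping="${rest#?}"
  case "$major" in
    ''|*[!0-9]*) echo "unrecognised gfx target: $1" >&2; return 1 ;;
  esac
  case "$rest" in
    [0-9a-fA-F][0-9a-fA-F]) ;;
    *) echo "unrecognised gfx target: $1" >&2; return 1 ;;
  esac
  # printf reads 0x-prefixed arguments as hex, so the "a" in gfx90a prints as 10.
  printf '%d.%d.%d\n' "$major" "0x$minor" "0x$stepping"
}
```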

```diff
@@ -18,12 +18,13 @@ services:
     image: ghcr.io/open-webui/open-webui:main
     ports:
       - 8080:8080
+      - 11434:11434
     environment:
-      OLLAMA_BASE_URL: http://ollama:11434
-      WEBUI_AUTH: "False"
+      OLLAMA_BASE_URL: http://localhost:11434
     extra_hosts:
       - host.docker.internal:host-gateway
     volumes:
+      - ./litellm/config.yaml:/app/backend/data/litellm/config.yaml
       - open-webui_data:/app/backend/data
 
 volumes:
```
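
The Open WebUI spec now bind-mounts `./litellm/config.yaml` into the container. With the short `volumes` syntax, Docker creates a missing host path as an empty directory, so the file should exist before the first `docker compose up`.

Below is a minimal sketch that writes a starting config with the same `model_list` layout as the README's gpt-3.5-turbo example. `write_litellm_config` is a hypothetical helper, and the model name and key are whatever you pass in. It refuses to overwrite an existing path, which also catches the case where Docker already created a directory there.

```sh
# Hypothetical helper: create a LiteLLM config with a single model entry,
# using the model_list layout from the README example.
write_litellm_config() {
  local cfg="$1" model="$2" api_key="$3"
  if [ -z "$cfg" ] || [ -z "$model" ] || [ -z "$api_key" ]; then
    echo "usage: write_litellm_config PATH MODEL API_KEY" >&2
    return 2
  fi
  # Never clobber an existing config (or a directory Docker created in its place).
  if [ -e "$cfg" ]; then
    echo "$cfg already exists; not overwriting" >&2
    return 1
  fi
  mkdir -p "$(dirname "$cfg")" || return 1
  cat > "$cfg" <<EOF
model_list:
  - model_name: ${model}
    litellm_params:
      model: ${model}
      api_key: ${api_key}
EOF
}
```

For example, run `write_litellm_config litellm/config.yaml gpt-3.5-turbo "$OPENAI_API_KEY"`, then `docker compose restart open-webui` as in the README. The key is stored in plain text, so keep the file out of version control.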

```diff
@@ -6,10 +6,11 @@ services:
   ollama:
     image: ollama/ollama:latest
     restart: unless-stopped
-    ports:
-      - 11434:11434
+    entrypoint: /bootstrap.sh
+    command: mistral
+    network_mode: service:open-webui
+    environment:
+      OLLAMA_HOST: http://localhost:11434
     volumes:
-      - ollama_data:/root/.ollama
-
-volumes:
-  ollama_data:
+      - ./ollama/bootstrap.sh:/bootstrap.sh:ro
+      - ./ollama:/root/.ollama
```
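
These Ollama hunks replace the `ollama_data` named volume with a `./ollama` bind mount. Models pulled under the previous layout therefore stay behind in the volume.

Below is a sketch for copying them over with a throwaway container. It assumes the old volume still exists and that using the `alpine` image for the copy is acceptable. Compose normally prefixes volume names with the project name, so check `docker volume ls` for the exact name. `migrate_ollama_volume` is a hypothetical helper.

```sh
# Hypothetical helper: copy the old named volume's contents into the ./ollama bind mount.
migrate_ollama_volume() {
  local volume="$1" dest="${2:-$PWD/ollama}"
  if [ -z "$volume" ]; then
    echo "usage: migrate_ollama_volume VOLUME [DEST_DIR]" >&2
    return 2
  fi
  mkdir -p "$dest" || return 1
  # Bind mounts need an absolute host path.
  dest="$(cd "$dest" && pwd)" || return 1
  # Mount the volume read-only and copy everything (dotfiles included) with attributes.
  docker run --rm \
    -v "${volume}:/from:ro" \
    -v "${dest}:/to" \
    alpine sh -c 'cp -a /from/. /to/'
}
```

For example, run `migrate_ollama_volume <project>_ollama_data` from the repository root before starting the updated stack.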

```diff
@@ -6,9 +6,11 @@ services:
   ollama:
     image: ollama/ollama:latest
     restart: unless-stopped
-    ports:
-      - 11434:11434
-
+    entrypoint: /bootstrap.sh
+    command: mistral
+    network_mode: service:open-webui
+    environment:
+      OLLAMA_HOST: http://localhost:11434
     # begin for NVIDIA GPU support
     deploy:
       resources:
```

```diff
@@ -18,9 +20,6 @@ services:
               count: 1
               capabilities: [gpu]
     # end of section for NVIDIA GPU support
-
     volumes:
-      - ollama_data:/root/.ollama
-
-volumes:
-  ollama_data:
+      - ./ollama/bootstrap.sh:/bootstrap.sh:ro
+      - ./ollama:/root/.ollama
```

```diff
@@ -0,0 +1,3 @@
+*
+!.gitignore
+!bootstrap.sh
```
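
The new ignore file is an allow-list. `*` ignores everything in its directory, including whatever Ollama writes there through the `./ollama` bind mount. The two `!` lines re-include only the ignore file itself and `bootstrap.sh`.

Below is a sketch that confirms this with `git check-ignore`. It assumes the file sits in `ollama/` next to `bootstrap.sh`; the mount paths suggest this, but the page does not name the file. `report_ignores` is a hypothetical helper.

```sh
# Hypothetical helper: print, for each path, whether git would ignore it.
# git check-ignore -q exits 0 if the path is ignored, 1 if not, 128 on error.
report_ignores() {
  local repo="$1" p rc
  shift
  for p in "$@"; do
    rc=0
    git -C "$repo" check-ignore -q -- "$p" || rc=$?
    case "$rc" in
      0) echo "ignored $p" ;;
      1) echo "kept    $p" ;;
      *) echo "git check-ignore failed for $p" >&2; return 1 ;;
    esac
  done
}
```

Run it from the repository root, for example `report_ignores . ollama/models ollama/bootstrap.sh`.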

```diff
@@ -0,0 +1,11 @@
+#!/bin/bash -x
+
+ollama serve &
+
+sleep 1
+
+for model in ${@:-mistral}; do
+  ollama pull "$model"
+done
+
+wait
```
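
`bootstrap.sh` does three things in order:

* It starts the server in the background.
* It pulls each model named in the service's `command:`. Compose passes `command` to the `entrypoint` as arguments, and `${@:-mistral}` falls back to `mistral` when there are none.
* It `wait`s on the server so the container keeps running.

The fixed `sleep 1` assumes the server is listening within a second. Below is a sketch of a readiness poll that could replace it, assuming `ollama list` fails until the server accepts connections. `wait_until`, `models_to_pull` and the 30-second timeout are hypothetical.

```sh
# Hypothetical helper: run a command once per second until it succeeds,
# giving up after TIMEOUT seconds.
wait_until() {
  local timeout="$1" elapsed=0
  shift
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Same default as bootstrap.sh: the given model names, or "mistral" if there are none.
models_to_pull() {
  if [ "$#" -eq 0 ]; then
    set -- mistral
  fi
  printf '%s\n' "$@"
}

# In bootstrap.sh, the fixed `sleep 1` could then become:
#   ollama serve &
#   wait_until 30 ollama list || exit 1
#   for model in $(models_to_pull "$@"); do ollama pull "$model"; done
#   wait
```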