badjware

Massaki Archambault 938ab69322 leave restart policy to default & cleanup		2024-11-13 20:22:17 -05:00

local-llm

A quick prototype to self-host Open WebUI backed by Ollama to run LLM inference locally.

Getting started

Prerequisites

Linux or WSL2
docker

Steps for NVIDIA GPU

Make sure your drivers are up to date.
Install the NVIDIA Container Toolkit.
Clone the repo.
Symlink the NVIDIA compose spec to select it. ln -s docker-compose.nvidia.yml docker-compose.yml
Run docker compose up.
Browse http://localhost:8080/
Add a model and start chatting!

Steps for AMD GPU

Warning: AMD GPUs are not supported on Windows at the moment. Use Linux.

Make sure your drivers are up to date.
Clone the repo.
Symlink the AMD compose spec to select it. ln -s docker-compose.amd.yml docker-compose.yml
Run docker compose up.
Browse http://localhost:8080/
Add a model and start chatting!

Steps for NO GPU (use CPU)

Warning: This may be very slow depending on your CPU and may use a lot of RAM depending on the model.

Make sure your drivers are up to date.
Clone the repo.
Symlink the CPU compose spec to select it. ln -s docker-compose.cpu.yml docker-compose.yml
Run docker compose up.
Browse http://localhost:8080/
Add a model and start chatting!
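
The three sets of steps above differ only in which compose spec the symlink points to. As a minimal sketch, the selection could be wrapped in a small shell helper. The function names compose_spec_for and select_compose_spec are hypothetical, and the link name docker-compose.yml is an assumption: it is one of the file names Docker Compose looks for by default.

```shell
# Hypothetical helper: map a hardware target to the compose spec shipped in this repo.
compose_spec_for() {
    case "$1" in
        nvidia|amd|cpu) printf 'docker-compose.%s.yml\n' "$1" ;;
        *) echo "unknown target: $1 (expected nvidia, amd or cpu)" >&2; return 1 ;;
    esac
}

# Point the symlink at the chosen spec, replacing any previous selection.
select_compose_spec() {
    local spec
    spec="$(compose_spec_for "$1")" || return 1
    ln -sfn "$spec" docker-compose.yml
}

# Usage, from the root of the cloned repo:
#   select_compose_spec nvidia && docker compose up
```

Switching hardware later only requires calling select_compose_spec again with another target, since the existing symlink is replaced.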

Adding models

Ollama makes it easy to download and start using new LLMs. Its structure is quite similar to Docker's, so it should feel familiar if you have used Docker before. A list of available models can be found on their site (analogous to Docker Hub). You can also import models downloaded from other platforms like HuggingFace using a Modelfile (analogous to a Dockerfile).
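
As a rough sketch of the Modelfile route, the helper below writes a one-line Modelfile and registers a GGUF file that was downloaded elsewhere. It assumes the compose service is named ollama (as in the commands used elsewhere in this README) and that /tmp inside the container is writable. The function names, the /tmp location, and the example file and model names are all hypothetical.

```shell
# Write a minimal Modelfile whose FROM line points at a local GGUF file.
write_modelfile() {
    # $1: path to the GGUF file as seen from inside the ollama container
    printf 'FROM %s\n' "$1" > Modelfile
}

# Copy the weights and the Modelfile into the container, then register the model.
import_gguf() {
    # $1: GGUF file on the host, $2: name to give the model in Ollama
    local file
    file="$(basename "$1")"
    write_modelfile "/tmp/$file" &&
        docker compose cp "$1" "ollama:/tmp/$file" &&
        docker compose cp Modelfile ollama:/tmp/Modelfile &&
        docker compose exec ollama ollama create "$2" -f /tmp/Modelfile
}

# Usage, from the root of the cloned repo (file and model names are placeholders):
#   import_gguf ./my-model.gguf my-model
```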

GUI

Open WebUI provides an easy-to-use frontend to manage your Ollama models. You can do so via the Settings > Admin Settings > Models page.

Open WebUI can also be used as a front-end for SaaS such as OpenAI, Anthropic, Mistral, etc. Refer to the documentation.

Command-line

If you prefer using the command line:

Ensure the Docker Compose project is up and running.
Make sure your working directory is set to the folder where you cloned this repo.

Then, you should be able to run the ollama command line directly inside the ollama container.

Examples:

To download a model:

docker compose exec ollama ollama pull gemma2

To list all downloaded models:

docker compose exec ollama ollama list

To delete a model:

docker compose exec ollama ollama rm gemma2

A full list of commands can be seen by running:

docker compose exec ollama ollama help
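
For frequent use, the docker compose exec prefix can be hidden behind a shell function. This is only a convenience sketch and not something the repo ships: the function names ollama_exec and pull_models are hypothetical, and the functions must be called from the folder where the repo was cloned.

```shell
# Run the ollama CLI inside the ollama container, forwarding all arguments.
ollama_exec() {
    docker compose exec ollama ollama "$@"
}

# Pull several models in one go, stopping at the first failure.
pull_models() {
    local model
    for model in "$@"; do
        ollama_exec pull "$model" || return 1
    done
}

# Usage:
#   ollama_exec list
#   pull_models gemma2 mistral
```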

Using the API

Open WebUI

Open WebUI can act as a proxy to Ollama. Authentication is done through a JWT, which can be fetched from the Settings > About page in Open WebUI.

Open WebUI exposes the Ollama API at the URL http://localhost:8080/ollama/api.
Example usage:

curl -H "Authorization: Bearer <Paste your JWT token here>" http://localhost:8080/ollama/api/tags

The Ollama API can also be queried directly on port 11434, without proxying through Open WebUI. This is often simpler when working locally. There is no authentication on this port.
Example usage:

curl http://localhost:11434/api/tags
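
The two access paths above can be combined in one helper that goes through Open WebUI when a token is available and talks to Ollama directly otherwise. This is a sketch under a few assumptions: OPEN_WEBUI_TOKEN is a hypothetical environment variable holding the JWT, the function names are made up for illustration, and the tags response is assumed to list each model under a "name" key.

```shell
# Call an Ollama API path, proxied through Open WebUI if a token is set.
ollama_api() {
    # $1: API path relative to /api, for example "tags"
    if [ -n "${OPEN_WEBUI_TOKEN:-}" ]; then
        curl -fsS -H "Authorization: Bearer ${OPEN_WEBUI_TOKEN}" \
            "http://localhost:8080/ollama/api/$1"
    else
        curl -fsS "http://localhost:11434/api/$1"
    fi
}

# Extract the "name" fields from a tags response read on stdin.
# Text matching on JSON is fragile; a JSON-aware tool is a safer choice.
model_names() {
    grep -o '"name": *"[^"]*"' | sed 's/^"name": *"//; s/"$//'
}

# Usage:
#   ollama_api tags | model_names
```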

Ollama

Ollama also has some OpenAI-compatible APIs. See the blog post for more detailed usage instructions.
Example usage:

curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mistral",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
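
When the model or the message changes often, the request above can be wrapped in a function that builds the JSON body from its arguments. The sketch below is deliberately minimal: the function names are hypothetical, only a single user message is sent, and the escaping covers backslashes and double quotes only, so multi-line messages are not handled.

```shell
# Escape backslashes and double quotes so a single-line string can sit inside JSON.
json_escape() {
    printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build a chat completion request body.
chat_payload() {
    # $1: model name, $2: user message
    printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}\n' \
        "$(json_escape "$1")" "$(json_escape "$2")"
}

# Send the request to the OpenAI-compatible endpoint; the body is read from stdin.
chat() {
    chat_payload "$1" "$2" |
        curl -fsS http://localhost:11434/v1/chat/completions \
            -H "Content-Type: application/json" \
            -d @-
}

# Usage (the model must already be downloaded):
#   chat mistral "Hello!"
```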

Example integrations

Using the API, this deployment can be used as the basis for other applications that leverage LLM technology.

Examples:

Updating

Simply pull the latest changes and images, then recreate the containers so they pick up the new images:

git pull
docker compose pull
docker compose up -d

Alternatives

Check out LM Studio for a more integrated, but non-web-based, alternative!