1. Check if your GPU is supported: https://github.com/ollama/ollama/blob/main/docs/gpu.md#nvidia. You need a card with CUDA compute capability 5.0 or higher. For reference, the oldest cards I managed to run it on are a GeForce GTX 970 Ti and a Quadro M4000 (both were quite slow though).
2. Make sure your drivers are up to date. If you are running Docker on Windows, the NVIDIA drivers must be up to date on the Windows host itself.
3. Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
4. Clone the repo.
5. Symlink the NVIDIA compose spec to select it: `ln -s docker-compose.nvidia.yml docker-compose.yml`
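
Before bringing the stack up, it can be worth sanity-checking that Docker can actually see the GPU. A minimal sketch (the exact test command follows NVIDIA's install guide; adjust image and flags to your setup):

```sh
# Confirm the NVIDIA Container Toolkit is wired up: this should print your GPU
docker run --rm --gpus all ubuntu nvidia-smi

# Then start the stack with the NVIDIA spec selected
docker compose up -d
```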
1. Check if your GPU is supported: https://github.com/ollama/ollama/blob/main/docs/gpu.md#amd-radeon. It may be possible to run Ollama even on an unsupported GPU (I once managed to run it on a 5700 XT) by setting the `HSA_OVERRIDE_GFX_VERSION` environment variable, but you are on your own. You can add this environment variable by editing `docker-compose.amd.yml` (see the sketch after this list).
2. Make sure your drivers are up to date.
3. Clone the repo.
4. Symlink the AMD compose spec to select it: `ln -s docker-compose.amd.yml docker-compose.yml`
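
As an illustration, the override in `docker-compose.amd.yml` could look roughly like the snippet below. The service name `ollama` and the version value are assumptions; the correct `HSA_OVERRIDE_GFX_VERSION` depends on your card.

```yaml
services:
  ollama:
    environment:
      # Hypothetical example: spoof the GFX version so ROCm accepts an
      # officially unsupported card. Pick the value that matches your GPU.
      - HSA_OVERRIDE_GFX_VERSION=10.3.0
```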
Ollama makes it easy to download and start using new LLM models. Its structure is quite similar to Docker's, so it should feel familiar if you have used Docker before. A list of available models can be found on [their site](https://ollama.com/search) (analogous to Docker Hub). You can also import models downloaded from other platforms like [HuggingFace](https://huggingface.co/) using a [Modelfile](https://github.com/ollama/ollama/blob/main/docs/modelfile.md) (analogous to a Dockerfile).
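
For instance, the workflow looks roughly like this (model names are just examples; in this setup the `ollama` CLI runs inside the container, see the Command-line section below):

```sh
# Pull and chat with a model from the Ollama library
ollama pull llama3
ollama run llama3

# Import a model downloaded elsewhere (e.g. a GGUF file from HuggingFace).
# Example Modelfile contents (one line):  FROM ./my-model.gguf
ollama create my-model -f Modelfile
ollama run my-model
```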
### GUI
Open WebUI provides an easy-to-use front-end for managing your Ollama models. You can do so via the **Settings > Admin Settings > Models** page.
Open WebUI can also be used as a front-end for SaaS providers such as [OpenAI](https://openai.com/), [Anthropic](https://www.anthropic.com/), [Mistral](https://mistral.ai/), etc. Refer to the [documentation](https://docs.openwebui.com/).
### Command-line
If you prefer using the command line:
1. Ensure the Docker Compose project is up and running.
2. Make sure your working directory is set to the folder where you cloned this repo.

Then you should be able to run the `ollama` command line directly inside the *ollama* container.
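
For example, assuming the compose service is named `ollama`, the commands from the Ollama section above can be run like this:

```sh
# Run the ollama CLI inside the running container
docker compose exec ollama ollama list
docker compose exec ollama ollama pull llama3   # model name is just an example
```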
Open WebUI can act as a proxy to Ollama. Authentication is done through a JWT token, which can be fetched in the **Settings > About** page in Open WebUI.
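
As a rough example (the host port and the `/ollama` proxy prefix reflect a typical Open WebUI setup and may differ in this repo), listing models through the proxy could look like:

```sh
# Replace the port and token with your own; /ollama is Open WebUI's
# proxy route to the Ollama API (assumed here).
curl -H "Authorization: Bearer $OPEN_WEBUI_TOKEN" \
  http://localhost:3000/ollama/api/tags
```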
The Ollama API can also be queried directly on port 11434, without proxying through Open WebUI. In some cases, such as when working locally, this may be easier. Note that there is no authentication on this port.
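
For instance (the model name is just an example), a direct request to Ollama could look like:

```sh
# List installed models
curl http://localhost:11434/api/tags

# Generate a completion (returns streaming JSON responses)
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3", "prompt": "Why is the sky blue?"}'
```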