# local-llm
A quick prototype to self-host [Open WebUI](https://docs.openwebui.com/) backed by [Ollama](https://ollama.com/) to run LLM inference locally.
## Goals
* Streamline deployment of a local LLM for experimentation purposes.
* Deploy a ChatGPT Clone for daily use.
* Deploy an OpenAI-like API for hacking on Generative AI using well-supported libraries.
* Use docker to prepare for an eventual deployment on a container orchestration platform like Kubernetes.
## Getting started
### Prerequisites
* Linux or WSL2
* docker
### Steps for NVIDIA GPU
1. Make sure your drivers are up to date.
2. Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
3. Clone the repo.
4. Copy the NVIDIA compose spec to select it. `cp docker-compose.nvidia.yml docker-compose.yml`
5. Run `docker compose up`. Wait for a few minutes for the model to be downloaded and served.
6. Browse http://localhost:8080/
7. Create an account and start chatting!
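
If the stack starts but inference does not use the GPU, a quick sanity check (independent of this repo) is to confirm that Docker can see the GPU through the NVIDIA Container Toolkit:

``` sh
# nvidia-smi is injected into the container by the NVIDIA Container Toolkit,
# so a plain Ubuntu image is enough. It should list your GPU(s).
docker run --rm --gpus all ubuntu nvidia-smi
```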
### Steps for AMD GPU
**Warning: AMD GPU support has not been tested on Windows.**
1. Make sure your drivers are up to date.
2. Clone the repo.
3. Copy the AMD compose spec to select it. `cp docker-compose.amd.yml docker-compose.yml`
4. Run `docker compose up`. Wait for a few minutes for the model to be downloaded and served.
5. Browse http://localhost:8080/
6. Create an account and start chatting!
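
If Ollama falls back to CPU on an AMD card, first check that the ROCm device nodes exist on the host. (It is assumed here that the AMD compose spec maps `/dev/kfd` and `/dev/dri` into the container, as ROCm containers typically require.)

``` sh
# Both device nodes must be present for a ROCm-enabled container to see the GPU.
# If /dev/kfd is missing, the amdgpu kernel driver is not set up correctly.
ls -l /dev/kfd /dev/dri
```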
### Steps for CPU only (no GPU)
**Warning: This may be very slow depending on your CPU and may use a lot of RAM depending on the model.**
1. Make sure your drivers are up to date.
2. Clone the repo.
3. Copy the CPU compose spec to select it. `cp docker-compose.cpu.yml docker-compose.yml`
4. Run `docker compose up`. Wait for a few minutes for the model to be downloaded and served.
5. Browse http://localhost:8080/
6. Create an account and start chatting!
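
Since CPU inference loads the whole model into system RAM, it can help to keep an eye on memory while chatting. One way to do that with plain Docker tooling (container names will depend on your checkout directory):

``` sh
# One-shot snapshot of CPU and memory usage for all running containers.
docker stats --no-stream
```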
## Configuring additional models
### Self-hosted (Ollama)
Browse the [Ollama models library](https://ollama.ai/library) to find a model you wish to add. For this example, we will add [gemma](https://ollama.com/library/gemma).
#### Configuring via the command-line
``` sh
docker compose exec ollama ollama pull gemma
```
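
To confirm the model finished downloading, you can list the models known to the Ollama container; once it shows up there, it should also be selectable in Open WebUI:

``` sh
docker compose exec ollama ollama list
```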
### External providers (OpenAI, Mistral, Anthropic, etc.)
External providers can be configured through a [LiteLLM](https://github.com/BerriAI/litellm) instance embedded into Open WebUI. A full list of supported providers, and how to configure them, can be found in the [documentation](https://docs.litellm.ai/docs/providers).

Let's say we want to configure gpt-3.5-turbo with an OpenAI API key.
#### Configuring via a config file
1. Open the file *./litellm/config.yaml* in your editor.
2. Add an entry under `model_list`:
``` yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: <put your OpenAI API key here>
```
3. Run `docker compose restart open-webui` to restart Open WebUI.
## Using the API
Open WebUI acts as a proxy to Ollama and LiteLLM. For both APIs, authentication is done through a JWT token, which can be fetched from the **Settings > About** page in Open WebUI.

Open WebUI exposes the Ollama API at the URL http://localhost:8080/ollama/api.
Example usage:
``` sh
curl -H "Authorization: Bearer <Paste your JWT token here>" http://localhost:8080/ollama/api/tags
```
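
For example, assuming the default model served by this setup is `mistral` (the same model used in the OpenAI-compatible example below), a one-off generation request through the proxy could look like this (`stream` is disabled to get a single JSON response):

``` sh
TOKEN="<Paste your JWT token here>"
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:8080/ollama/api/generate \
  -d '{"model": "mistral", "prompt": "Why is the sky blue?", "stream": false}'
```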
The Ollama API can also be queried directly on port 11434, without proxying through Open WebUI. In some cases, such as when working locally, this may be more convenient. In that case, no authentication is required.
Example usage:
``` sh
curl http://localhost:11434/api/tags
```
[Ollama also has some OpenAI-compatible APIs](https://ollama.com/blog/openai-compatibility). See the blog post for more detailed usage instructions.
Example usage:
``` sh
curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mistral",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
```
Open WebUI exposes the LiteLLM API (for external providers) at the URL http://localhost:8080/litellm/api/v1.
Example usage:
``` sh
curl -H "Authorization: Bearer <Paste your JWT token here>" http://localhost:8080/litellm/api/v1/models
```
The JWT token can be used in place of the OpenAI API key for OpenAI-compatible libraries/applications.
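
For example, an OpenAI-style chat completion against the embedded LiteLLM instance, using the JWT in place of the API key (this assumes the proxy exposes the standard `/chat/completions` route under the base URL above):

``` sh
curl http://localhost:8080/litellm/api/v1/chat/completions \
  -H "Authorization: Bearer <Paste your JWT token here>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```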
## Alternatives
Check out [LM Studio](https://lmstudio.ai/) for a more integrated, but non-web-based, alternative!