
update README

Massaki Archambault 2024-04-04 10:16:41 -04:00
parent 9ea2781562
commit 1a3b9cde2c
1 changed file with 75 additions and 49 deletions

README.md

# local-llm
A quick prototype to self-host [Open WebUI](https://docs.openwebui.com/) backed by [Ollama](https://ollama.com/) to run LLM inference locally.
## Goals
### Steps for NVIDIA GPU
1. Make sure your drivers are up to date.
2. Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
3. Clone the repo.
4. Copy the NVIDIA compose spec to select it. `cp docker-compose.nvidia.yml docker-compose.yml`
5. Run `docker compose up`. Wait for a few minutes for the model to be downloaded and served.
6. Browse http://localhost:8080/
7. Create an account and start chatting!
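Before step 5, you can optionally confirm that containers can reach the GPU through the toolkit. A quick sanity check, borrowed from the sample workload in the NVIDIA Container Toolkit install guide:
``` sh
# Should print the same GPU table as running nvidia-smi directly on the host.
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```
If this fails, fix the driver or toolkit setup before moving on to `docker compose up`.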
### Steps for AMD GPU
**Warning: AMD was not tested on Windows.**
1. Make sure your drivers are up to date.
2. Clone the repo.
3. Copy the AMD compose spec to select it. `cp docker-compose.amd.yml docker-compose.yml`
4. Run `docker compose up`. Wait for a few minutes for the model to be downloaded and served.
5. Browse http://localhost:8080/
6. Create an account and start chatting!
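To confirm Ollama actually picked up the AMD GPU rather than silently falling back to CPU, you can skim the container logs once the stack is up. A minimal check, assuming the Ollama service is named `ollama` as in the compose specs:
``` sh
# Look for lines mentioning ROCm/GPU detection; a CPU-only fallback is also logged here.
docker compose logs ollama | grep -iE "rocm|gpu"
```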
### Steps for NO GPU (use CPU)
2. Clone the repo.
3. Copy the CPU compose spec to select it. `cp docker-compose.cpu.yml docker-compose.yml`
4. Run `docker compose up`. Wait for a few minutes for the model to be downloaded and served.
5. Browse http://localhost:8080/
6. Create an account and start chatting!
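Whichever compose spec you selected, the first `docker compose up` has to download the model, which can take a while. A convenient way to follow progress is to tail the logs of the Ollama service (named `ollama` in the compose specs):
``` sh
# Ctrl+C stops following the logs, not the containers.
docker compose logs -f ollama
```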
## Configuring additional models
### Self-hosted (Ollama)
Browse the [Ollama models library](https://ollama.ai/library) to find a model you wish to add. For this example we will add [gemma](https://ollama.com/library/gemma).
#### Configuring via the command-line
``` sh
docker compose exec ollama ollama pull gemma
```
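Once the pull finishes, the model should be selectable from the model dropdown in Open WebUI. You can also double-check from the command line; a quick sketch, again assuming the service is named `ollama`:
``` sh
# Lists every model currently available to Ollama, including the freshly pulled gemma.
docker compose exec ollama ollama list
```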
### External providers (OpenAI, Mistral, Anthropic, etc.)
External providers can be configured through a [LiteLLM](https://github.com/BerriAI/litellm) instance embedded into open-webui. A full list of supported providers, and how to configure them, can be found in the [documentation](https://docs.litellm.ai/docs/providers).
Let's say we want to configure gpt-3.5-turbo with an OpenAI API key.
#### Configuring via a config file
1. Open the file *./litellm/config.yaml* in your editor.
2. Add an entry under `model_list`:
``` yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: <put your OpenAI API key here>
```
3. Run `docker compose restart open-webui` to restart Open WebUI.
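If the new entry does not show up in Open WebUI, a quick way to see whether the embedded LiteLLM accepted the config (or rejected it with a YAML error) is to watch the logs during the restart; a sketch, assuming the service is named `open-webui` as in the restart command above:
``` sh
# Restart the service and follow its logs; configuration errors are usually printed at startup.
docker compose restart open-webui && docker compose logs -f open-webui
```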
## Using the API
Open WebUI acts as a proxy to Ollama and LiteLLM. For both APIs, authentication is done through a JWT token, which can be fetched from the **Settings > About** page in Open WebUI.
Open WebUI exposes the Ollama API at http://localhost:8080/ollama/api.
Example usage:
``` sh
curl -H "Authorization: Bearer <Paste your JWT token here>" http://localhost:8080/ollama/api/tags
```
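The same proxied base URL should work for the other Ollama endpoints. For example, a one-off completion against the `mistral` model used elsewhere in this README (hypothetical prompt, with `stream` disabled to get a single JSON response):
``` sh
curl -H "Authorization: Bearer <Paste your JWT token here>" \
    -H "Content-Type: application/json" \
    http://localhost:8080/ollama/api/generate \
    -d '{
        "model": "mistral",
        "prompt": "Why is the sky blue?",
        "stream": false
    }'
```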
The Ollama API can also be queried directly on port 11434, without proxying through Open WebUI. In some cases, like when working locally, it may be easier to query Ollama directly rather than going through the proxy. In that case, there is no authentication.
Example usage:
``` sh
curl http://localhost:11434/api/tags
```
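Besides `/api/tags`, Ollama's native `/api/chat` endpoint gives you multi-turn chat without going through the OpenAI-compatible layer. A minimal sketch against the same local instance:
``` sh
curl http://localhost:11434/api/chat \
    -d '{
        "model": "mistral",
        "messages": [
            { "role": "user", "content": "Hello!" }
        ],
        "stream": false
    }'
```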
[Ollama also has some OpenAI-compatible APIs](https://ollama.com/blog/openai-compatibility). See the blog post for more detailed usage instructions.
Example usage:
``` sh
curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mistral",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
```
Open WebUI exposes the LiteLLM API (for external providers) at http://localhost:8080/litellm/api/v1.
Example usage:
``` sh
curl -H "Authorization: Bearer <Paste your JWT token here>" http://localhost:8080/litellm/api/v1/models
```
The JWT token can be used in place of the OpenAI API key for OpenAI-compatible libraries/applications.
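For example, an OpenAI-style chat completion against the `gpt-3.5-turbo` entry configured earlier, using the JWT as the bearer token (assuming the same `/litellm/api/v1` prefix as the models call above):
``` sh
curl http://localhost:8080/litellm/api/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer <Paste your JWT token here>" \
    -d '{
        "model": "gpt-3.5-turbo",
        "messages": [
            { "role": "user", "content": "Hello!" }
        ]
    }'
```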
## Alternatives