local-llm/README.md

# librechat-mistral

A quick prototype to self-host [LibreChat](https://github.com/danny-avila/LibreChat) with [Mistral](https://mistral.ai/news/announcing-mistral-7b/), and a OpenAI-like api provided by [LiteLLM](https://github.com/BerriAI/litellm) on the side.

## Goals

* Deploy a ChatGPT Clone for daily use.
* Deploy an OpenAI-like API for hacking on Generative AI using well-supported libraries.
* Use docker to prepare for an eventual deployment on a container orchestration platform like Kubernetes.

## Getting started

### Prerequisites

* Linux or WSL2
* docker

### Steps for NVIDIA GPU

1. Clone the repo
2. Copy the AMD compose spec to select it. `cp docker-compose.nvidia.yml docker.compose.yml`
3. Run `docker compose up`. Wait for a few minutes for the model to be downloaded and served.
4. Browse http://localhost:3080/
5. Create an admin account and start chatting!

### Steps for AMD GPU

1. Clone the repo
2. Copy the AMD compose spec to select it. `cp docker-compose.amd.yml docker.compose.yml`
3. If you are using an RX (consumer) series GPU, you *may* need to set `HSA_OVERRIDE_GFX_VERSION` to an appropriate value for the model of your GPU. You will need to look it up. The value can be set in *docker-compose.yml*,
4. Run `docker compose up`. Wait for a few minutes for the model to be downloaded and served.
5. Browse http://localhost:3080/
6. Create an admin account and start chatting!

The API along with the APIDoc will be available at http://localhost:8000/

## Configuring additional models

### SASS services

Read: https://docs.librechat.ai/install/configuration/dotenv.html#endpoints

**TL:DR**

Let say we want to configure an OpenAI API key.

1. Open the *.env* file.
2. Uncomment the line `# OPENAI_API_KEY=user_provided`.
3. Replace `user_provided` with your API key.
4. Restart LibreChat `docker compose restart librechat`.

Refer to the [LibreChat documentation](https://docs.librechat.ai/install/configuration/ai_setup.html#openai) for the full list of configuration options.

### Ollama (self-hosted)

Browse the [Ollama models library](https://ollama.ai/library) to find a model you wish to add. For this example we will add [mistral-openorca](https://ollama.ai/library/mistral-openorca)

1. Open the *docker compose.yml* file.
2. Find the `ollama` service. Find the `command:` option under the ollama sevice. Append the name of the model you wish to add at the end of the list (eg: `command: mistral mistral-openorca`).
3. Open the *litellm/config.yaml* file.
4. Add the following a the end of the file, replace {model_name} placeholders with the name of your model
``` yaml
  - model_name: {model_name}
    litellm_params:
      model: ollama/{model_name}
      api_base: http://ollama:11434
```
eg:
``` yaml
  - model_name: mistral-openorca
    litellm_params:
      model: ollama/mistral-openorca
      api_base: http://ollama:11434
```
5. Open the *librechat/librechat.yaml* file.
6. In our case, **mistral-openorca** is a variation of **mistral-7b** so we will group it with the existing **Mistral** endpoint. Refer to the [LibreChat documentation](https://docs.librechat.ai/install/configuration/custom_config.html#custom-endpoint-object-structure) if you wish to organize your new model as a new Endpoint.
``` yaml
      models: 
        default: ["mistral-7b"]
```
becomes:
``` yaml
      models: 
        default: ["mistral-7b", "mistral-openorca"]
```
7. Restart the stack `docker compose restart`. Wait for a few minutes for the model to be downloaded and served.

## Architecture components

* [LibreChat](https://github.com/danny-avila/LibreChat) is a ChatGPT clone with support for multiple AI endpoints. It's deployed alongside a [MongoDB](https://github.com/mongodb/mongo) database and [Meillisearch](https://github.com/meilisearch/meilisearch) for search. It's exposed on http://localhost:3080/.
* [LiteLLM](https://github.com/BerriAI/litellm) is an OpenAI-like API. It is exposed on http://localhost:8000/ without any authentication by default.
* [Ollama](https://github.com/ollama/ollama) manages and serve the local models.
Initial commit 2024-02-02 17:19:32 +00:00			`# librechat-mistral`

readme 2024-02-06 18:29:57 +00:00			`A quick prototype to self-host [LibreChat](https://github.com/danny-avila/LibreChat) with [Mistral](https://mistral.ai/news/announcing-mistral-7b/), and a OpenAI-like api provided by [LiteLLM](https://github.com/BerriAI/litellm) on the side.`

			`## Goals`

			`* Deploy a ChatGPT Clone for daily use.`
			`* Deploy an OpenAI-like API for hacking on Generative AI using well-supported libraries.`
			`* Use docker to prepare for an eventual deployment on a container orchestration platform like Kubernetes.`

			`## Getting started`

			`### Prerequisites`

update readme 2024-02-08 04:19:26 +00:00			`* Linux or WSL2`
readme 2024-02-06 18:29:57 +00:00			`* docker`

update readme 2024-02-08 04:19:26 +00:00			`### Steps for NVIDIA GPU`

			`1. Clone the repo`
			2. Copy the AMD compose spec to select it. `cp docker-compose.nvidia.yml docker.compose.yml`
			3. Run `docker compose up`. Wait for a few minutes for the model to be downloaded and served.
			`4. Browse http://localhost:3080/`
			`5. Create an admin account and start chatting!`

			`### Steps for AMD GPU`
readme 2024-02-06 18:29:57 +00:00
			`1. Clone the repo`
update readme 2024-02-08 04:19:26 +00:00			2. Copy the AMD compose spec to select it. `cp docker-compose.amd.yml docker.compose.yml`
			3. If you are using an RX (consumer) series GPU, you may need to set `HSA_OVERRIDE_GFX_VERSION` to an appropriate value for the model of your GPU. You will need to look it up. The value can be set in docker-compose.yml,
			4. Run `docker compose up`. Wait for a few minutes for the model to be downloaded and served.
			`5. Browse http://localhost:3080/`
			`6. Create an admin account and start chatting!`
readme 2024-02-06 18:29:57 +00:00
			`The API along with the APIDoc will be available at http://localhost:8000/`

			`## Configuring additional models`

			`### SASS services`

			`Read: https://docs.librechat.ai/install/configuration/dotenv.html#endpoints`

			`TL:DR`

			`Let say we want to configure an OpenAI API key.`

			`1. Open the .env file.`
			2. Uncomment the line `# OPENAI_API_KEY=user_provided`.
			3. Replace `user_provided` with your API key.
update readme 2024-02-08 04:19:26 +00:00			4. Restart LibreChat `docker compose restart librechat`.
readme 2024-02-06 18:29:57 +00:00
update readme 2024-02-08 04:19:26 +00:00			`Refer to the [LibreChat documentation](https://docs.librechat.ai/install/configuration/ai_setup.html#openai) for the full list of configuration options.`
readme 2024-02-06 18:29:57 +00:00
			`### Ollama (self-hosted)`

			`Browse the [Ollama models library](https://ollama.ai/library) to find a model you wish to add. For this example we will add [mistral-openorca](https://ollama.ai/library/mistral-openorca)`

update readme 2024-02-08 04:19:26 +00:00			`1. Open the docker compose.yml file.`
readme 2024-02-06 18:29:57 +00:00			2. Find the `ollama` service. Find the `command:` option under the ollama sevice. Append the name of the model you wish to add at the end of the list (eg: `command: mistral mistral-openorca`).
			`3. Open the litellm/config.yaml file.`
			`4. Add the following a the end of the file, replace {model_name} placeholders with the name of your model`
			``` yaml
			`- model_name: {model_name}`
			`litellm_params:`
			`model: ollama/{model_name}`
			`api_base: http://ollama:11434`
			```
			`eg:`
			``` yaml
			`- model_name: mistral-openorca`
			`litellm_params:`
			`model: ollama/mistral-openorca`
			`api_base: http://ollama:11434`
			```
			`5. Open the librechat/librechat.yaml file.`
			`6. In our case, mistral-openorca is a variation of mistral-7b so we will group it with the existing Mistral endpoint. Refer to the [LibreChat documentation](https://docs.librechat.ai/install/configuration/custom_config.html#custom-endpoint-object-structure) if you wish to organize your new model as a new Endpoint.`
			``` yaml
			`models:`
			`default: ["mistral-7b"]`
			```
			`becomes:`
			``` yaml
			`models:`
			`default: ["mistral-7b", "mistral-openorca"]`
			```
update readme 2024-02-08 04:19:26 +00:00			7. Restart the stack `docker compose restart`. Wait for a few minutes for the model to be downloaded and served.
readme 2024-02-06 18:29:57 +00:00
			`## Architecture components`

fix typos 2024-02-07 00:16:13 +00:00			`* [LibreChat](https://github.com/danny-avila/LibreChat) is a ChatGPT clone with support for multiple AI endpoints. It's deployed alongside a [MongoDB](https://github.com/mongodb/mongo) database and [Meillisearch](https://github.com/meilisearch/meilisearch) for search. It's exposed on http://localhost:3080/.`
readme 2024-02-06 18:29:57 +00:00			`* [LiteLLM](https://github.com/BerriAI/litellm) is an OpenAI-like API. It is exposed on http://localhost:8000/ without any authentication by default.`
			`* [Ollama](https://github.com/ollama/ollama) manages and serve the local models.`