librechat-mistral

A quick prototype to self-host LibreChat backed by a locally-run Mistral model, with an OpenAI-like API provided by LiteLLM on the side.

Goals

  • Streamline deployment of a local LLM for experimentation purposes.
  • Deploy a ChatGPT Clone for daily use.
  • Deploy an OpenAI-like API for hacking on Generative AI using well-supported libraries.
  • Use docker to prepare for an eventual deployment on a container orchestration platform like Kubernetes.

Getting started

Prerequisites

  • Linux or WSL2
  • docker

Steps for NVIDIA GPU

  1. Make sure your drivers are up to date.
  2. Install the NVIDIA Container Toolkit.
  3. Clone the repo.
  3. Copy the NVIDIA compose spec to select it: cp docker-compose.nvidia.yml docker-compose.yml (see the consolidated commands after this list).
  5. Run docker compose up. Wait for a few minutes for the model to be downloaded and served.
  6. Browse http://localhost:3080/
  7. Create an admin account and start chatting!
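
For reference, the full sequence of commands might look like the sketch below. This is a minimal sketch only: <repo-url> is a placeholder for this repository's clone URL, and the directory name librechat-mistral is assumed.

  # sketch only: replace <repo-url> with this repository's clone URL
  git clone <repo-url>
  cd librechat-mistral
  # select the NVIDIA variant of the compose spec
  cp docker-compose.nvidia.yml docker-compose.yml
  # start the stack; the first run downloads the model and takes a few minutes
  docker compose up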

Steps for AMD GPU

Warning: AMD support has not been tested on Windows and appears to be less mature than on Linux.

  1. Make sure your drivers are up to date.
  2. Clone the repo.
  3. Copy the AMD compose spec to select it: cp docker-compose.amd.yml docker-compose.yml
  4. If you are using an RX (consumer) series GPU, you may need to set HSA_OVERRIDE_GFX_VERSION to a value appropriate for your GPU model; you will need to look it up. The value can be set in docker-compose.yml (see the sketch after this list).
  5. Run docker compose up. Wait for a few minutes for the model to be downloaded and served.
  6. Browse http://localhost:3080/
  7. Create an admin account and start chatting!
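
A minimal sketch of that override in docker-compose.yml, assuming the variable belongs on the ollama service; the value 10.3.0 is only an example (often used for RX 6000-series cards) and must be looked up for your exact GPU:

  services:
    ollama:
      # ...keep the rest of the existing service definition...
      environment:
        # example value only; look up the correct one for your GPU model
        HSA_OVERRIDE_GFX_VERSION: "10.3.0"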

Steps for NO GPU (use CPU)

Warning: This may be very slow depending on your CPU and may use a lot of RAM depending on the model.

  1. Make sure your drivers are up to date.
  2. Clone the repo.
  3. Copy the CPU compose spec to select it: cp docker-compose.cpu.yml docker-compose.yml
  4. Run docker compose up. Wait for a few minutes for the model to be downloaded and served.
  5. Browse http://localhost:3080/
  6. Create an admin account and start chatting!

Configuring additional models

SaaS services

Read: https://docs.librechat.ai/install/configuration/dotenv.html#endpoints

TL;DR

Let's say we want to configure an OpenAI API key.

  1. Open the .env file.
  2. Uncomment the line # OPENAI_API_KEY=user_provided.
  3. Replace user_provided with your API key.
  4. Restart LibreChat: docker compose restart librechat.
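
The resulting change in .env looks like this; a minimal sketch, with a placeholder standing in for a real key:

  # before:
  # OPENAI_API_KEY=user_provided
  # after (placeholder shown; paste your actual key):
  OPENAI_API_KEY=<your-openai-api-key>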

Refer to the LibreChat documentation for the full list of configuration options.

Ollama (self-hosted)

Browse the Ollama models library to find a model you wish to add. For this example, we will add mistral-openorca.

  1. Open the docker-compose.yml file.
  2. Find the command: option under the ollama service and append the name of the model you wish to add to the end of the list (e.g. command: mistral mistral-openorca); a sketch of the result follows at the end of this section.
  3. Open the litellm/config.yaml file.
  4. Add the following at the end of the file, replacing the {model_name} placeholders with the name of your model:
  - model_name: {model_name}
    litellm_params:
      model: ollama/{model_name}
      api_base: http://ollama:11434

For example:

  - model_name: mistral-openorca
    litellm_params:
      model: ollama/mistral-openorca
      api_base: http://ollama:11434
  5. Restart the stack: docker compose restart. Wait for a few minutes for the model to be downloaded and served.
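
As referenced in step 2, here is a minimal sketch of the relevant part of docker-compose.yml; only the command: line changes, and the rest of the ollama service definition stays as-is:

  services:
    ollama:
      # ...keep the rest of the existing service definition...
      # append the new model to the list of models to pull and serve
      command: mistral mistral-openorca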

Architecture components
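
In short, the stack combines three services: LibreChat provides the ChatGPT-like web UI, LiteLLM exposes the OpenAI-like API, and Ollama downloads and serves the local Mistral model.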

Alternatives

Check out LM Studio for a more integrated, but non-web-based, alternative!