
librechat-mistral

A quick prototype for self-hosting LibreChat with Mistral, plus an OpenAI-compatible API provided by LiteLLM on the side.

Goals

  • Deploy a ChatGPT clone for daily use.
  • Deploy an OpenAI-compatible API for hacking on generative AI using well-supported libraries.
  • Use Docker to prepare for an eventual deployment on a container orchestration platform like Kubernetes.

Getting started

Prerequisites

  • Linux or WSL2
  • docker

Steps for NVIDIA GPU

  1. Clone the repo
  2. Copy the NVIDIA compose spec to select it: cp docker-compose.nvidia.yml docker-compose.yml
  3. Run docker compose up. Wait a few minutes for the model to be downloaded and served.
  4. Browse http://localhost:3080/
  5. Create an admin account and start chatting!
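
Put together, the NVIDIA setup looks roughly like this (the clone URL is a placeholder for this repository's URL):

    # clone the repo (placeholder URL; substitute this repository's clone URL)
    git clone https://example.com/librechat-mistral.git
    cd librechat-mistral

    # select the NVIDIA compose spec
    cp docker-compose.nvidia.yml docker-compose.yml

    # start the stack; the first run downloads the model, so give it a few minutes
    docker compose up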

Steps for AMD GPU

  1. Clone the repo
  2. Copy the AMD compose spec to select it: cp docker-compose.amd.yml docker-compose.yml
  3. If you are using an RX (consumer) series GPU, you may need to set HSA_OVERRIDE_GFX_VERSION to a value appropriate for your GPU model; you will need to look it up. The value can be set in docker-compose.yml (see the sketch after this list).
  4. Run docker compose up. Wait a few minutes for the model to be downloaded and served.
  5. Browse http://localhost:3080/
  6. Create an admin account and start chatting!
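
As a sketch of step 3, the override goes in the ollama service's environment in docker-compose.yml. The value 10.3.0 below is an assumption (a commonly used override for RX 6000-series / gfx1030 cards); look up the correct value for your GPU:

    ollama:
      environment:
        # assumed value for RX 6000-series cards; replace with the value for your GPU
        - HSA_OVERRIDE_GFX_VERSION=10.3.0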

The API, along with its documentation, will be available at http://localhost:8000/
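
Since LiteLLM exposes an OpenAI-compatible API, a quick smoke test from the command line might look like this (the model name mistral-7b assumes the default model configured in this repo):

    curl http://localhost:8000/v1/chat/completions \
      -H 'Content-Type: application/json' \
      -d '{"model": "mistral-7b", "messages": [{"role": "user", "content": "Hello!"}]}'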

Configuring additional models

SaaS services

Read: https://docs.librechat.ai/install/configuration/dotenv.html#endpoints

TL;DR

Let's say we want to configure an OpenAI API key.

  1. Open the .env file.
  2. Uncomment the line # OPENAI_API_KEY=user_provided.
  3. Replace user_provided with your API key.
  4. Restart LibreChat: docker compose restart librechat.
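
For example, the relevant line in .env goes from the commented-out placeholder to your actual key (the value below is a fake placeholder):

    # before
    # OPENAI_API_KEY=user_provided

    # after
    OPENAI_API_KEY=sk-your-key-here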

Refer to the LibreChat documentation for the full list of configuration options.

Ollama (self-hosted)

Browse the Ollama models library to find a model you wish to add. For this example, we will add mistral-openorca.

  1. Open the docker-compose.yml file.
  2. Find the ollama service and locate its command: option. Append the name of the model you wish to add to the end of the list (e.g. command: mistral mistral-openorca); a sketch of the resulting snippet follows at the end of these steps.
  3. Open the litellm/config.yaml file.
  4. Add the following at the end of the file, replacing the {model_name} placeholders with the name of your model:
  - model_name: {model_name}
    litellm_params:
      model: ollama/{model_name}
      api_base: http://ollama:11434

e.g.:

  - model_name: mistral-openorca
    litellm_params:
      model: ollama/mistral-openorca
      api_base: http://ollama:11434
  5. Open the librechat/librechat.yaml file.
  6. In our case, mistral-openorca is a variation of mistral-7b, so we will group it with the existing Mistral endpoint. Refer to the LibreChat documentation if you wish to organize your new model as a new endpoint.
      models: 
        default: ["mistral-7b"]

becomes:

      models: 
        default: ["mistral-7b", "mistral-openorca"]
  7. Restart the stack: docker compose restart. Wait a few minutes for the model to be downloaded and served.
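
As a sketch of step 2 above, the ollama service in docker-compose.yml would end up looking something like this (other fields elided; the exact shape of the service is defined in docker-compose.base.yml):

    ollama:
      # ...other options unchanged...
      command: mistral mistral-openorca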

Architecture components