1
0
Fork 0
This repository has been archived on 2024-03-28. You can view files and clone it, but cannot push or open issues or pull requests.
Go to file
Massaki Archambault 657ebae73c add nvidia version of docker compose spec 2024-02-07 22:55:21 -05:00
librechat working prototype using ollama 2024-02-02 22:27:25 -05:00
litellm working prototype using ollama 2024-02-02 22:27:25 -05:00
ollama bump ollama version 2024-02-07 22:53:11 -05:00
.env add some comments to config files 2024-02-06 19:27:03 -05:00
.gitignore modularize docker-compose spec 2024-02-07 22:42:57 -05:00
README.md fix typos 2024-02-06 19:16:13 -05:00
docker-compose.amd.yml bump ollama version 2024-02-07 22:53:11 -05:00
docker-compose.base.yml modularize docker-compose spec 2024-02-07 22:42:57 -05:00
docker-compose.nvidia.yml add nvidia version of docker compose spec 2024-02-07 22:55:21 -05:00

README.md

librechat-mistral

A quick prototype to self-host LibreChat with Mistral, and a OpenAI-like api provided by LiteLLM on the side.

Currently setup to run on an AMD GPU (RX 7xxx series), although the deployment could be adapted for Nvidia or other AMD GPUS

Goals

  • Deploy a ChatGPT Clone for daily use.
  • Deploy an OpenAI-like API for hacking on Generative AI using well-supported libraries.
  • Use docker to prepare for an eventual deployment on a container orchestration platform like Kubernetes.

Getting started

Prerequisites

  • Linux (WSL2 is untested)
  • An AMD 7xxx series GPU (technically optional, Ollama will fallback to using the CPU but it will be very slow. Other GPUS are supported but the deployment must be modified to use them)
  • docker
  • docker-compose

Steps

  1. Clone the repo
  2. Run docker-compose up. Wait for a few minutes for the model to be downloaded and served.
  3. Browse http://localhost:3080/
  4. Create an admin account and start chatting!

The API along with the APIDoc will be available at http://localhost:8000/

Configuring additional models

SASS services

Read: https://docs.librechat.ai/install/configuration/dotenv.html#endpoints

TL:DR

Let say we want to configure an OpenAI API key.

  1. Open the .env file.
  2. Uncomment the line # OPENAI_API_KEY=user_provided.
  3. Replace user_provided with your API key.
  4. Restart LibreChat docker-compose restart librechat.

Refer to the LibreChat documentation for the full list of configuration options.

Ollama (self-hosted)

Browse the Ollama models library to find a model you wish to add. For this example we will add mistral-openorca

  1. Open the docker-compose.yml file.
  2. Find the ollama service. Find the command: option under the ollama sevice. Append the name of the model you wish to add at the end of the list (eg: command: mistral mistral-openorca).
  3. Open the litellm/config.yaml file.
  4. Add the following a the end of the file, replace {model_name} placeholders with the name of your model
  - model_name: {model_name}
    litellm_params:
      model: ollama/{model_name}
      api_base: http://ollama:11434

eg:

  - model_name: mistral-openorca
    litellm_params:
      model: ollama/mistral-openorca
      api_base: http://ollama:11434
  1. Open the librechat/librechat.yaml file.
  2. In our case, mistral-openorca is a variation of mistral-7b so we will group it with the existing Mistral endpoint. Refer to the LibreChat documentation if you wish to organize your new model as a new Endpoint.
      models: 
        default: ["mistral-7b"]

becomes:

      models: 
        default: ["mistral-7b", "mistral-openorca"]
  1. Restart the stack docker-compose restart. Wait for a few minutes for the model to be downloaded and served.

Architecture components

TODO

At the time of this project, I only had access to a Linux machine with an AMD RX 7800XT GPU. I would like to include support for Windows and/or Nvidia GPUs when I get the chance.