Massaki Archambault f912cf0179 remove .env 2024-04-04 10:16:41 -04:00
litellm promote use of open-webui embedded litellm 2024-04-04 10:16:41 -04:00
ollama fix issue with http_proxy 2024-02-21 22:45:35 -05:00
.gitignore modularize docker-compose spec 2024-02-07 22:42:57 -05:00
README.md rename project ot local-llama 2024-03-28 10:14:04 -04:00
docker-compose.amd.yml remove .env 2024-04-04 10:16:41 -04:00
docker-compose.base.yml remove .env 2024-04-04 10:16:41 -04:00
docker-compose.cpu.yml remove .env 2024-04-04 10:16:41 -04:00
docker-compose.nvidia.yml remove .env 2024-04-04 10:16:41 -04:00

README.md

local-llama

A quick prototype to self-host LibreChat backed by a locally run Mistral model, with an OpenAI-like API provided by LiteLLM on the side.

Goals

  • Streamline deployment of a local LLM for experimentation purposes.
  • Deploy a ChatGPT Clone for daily use.
  • Deploy an OpenAI-like API for hacking on Generative AI using well-supported libraries.
  • Use docker to prepare for an eventual deployment on a container orchestration platform like Kubernetes.

Getting started

Prerequisites

  • Linux or WSL2
  • docker

Steps for NVIDIA GPU

  1. Make sure your drivers are up to date.
  2. Install the NVIDIA Container Toolkit.
  3. Clone the repo.
  4. Copy the NVIDIA compose spec to select it: cp docker-compose.nvidia.yml docker-compose.yml
  5. Run docker compose up. Wait for a few minutes for the model to be downloaded and served.
  6. Browse http://localhost:3080/
  7. Create an admin account and start chatting!
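The copy step has the same shape for every hardware variant, so it can be scripted. The following is a minimal sketch, not part of the repository: `select_compose_spec` is a hypothetical helper, and it assumes that the variant spec files (`docker-compose.nvidia.yml`, `docker-compose.amd.yml`, `docker-compose.cpu.yml`) sit in the repository root and that the selected spec should be named `docker-compose.yml`, the file the later steps edit.

```python
import shutil
from pathlib import Path

# Hardware variants that have a matching docker-compose.<variant>.yml file.
VARIANTS = ("nvidia", "amd", "cpu")


def select_compose_spec(variant: str, repo_dir: str = ".") -> Path:
    """Copy docker-compose.<variant>.yml to docker-compose.yml and return the new path.

    Hypothetical helper: equivalent to running
    `cp docker-compose.<variant>.yml docker-compose.yml` by hand.
    """
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}, expected one of {VARIANTS}")
    root = Path(repo_dir)
    source = root / f"docker-compose.{variant}.yml"
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found; run this from the repo root")
    target = root / "docker-compose.yml"
    shutil.copyfile(source, target)
    return target
```

Calling `select_compose_spec("nvidia")` from the repository root would then be followed by `docker compose up` as in the steps above. Rejecting unknown variant names guards against selecting the wrong spec by accident.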

Steps for AMD GPU

Warning: AMD GPUs were not tested on Windows, where support appears to be weaker than on Linux.

  1. Make sure your drivers are up to date.
  2. Clone the repo.
  3. Copy the AMD compose spec to select it: cp docker-compose.amd.yml docker-compose.yml
  4. If you are using an RX (consumer) series GPU, you may need to set HSA_OVERRIDE_GFX_VERSION to an appropriate value for your GPU model. You will need to look it up. The value can be set in docker-compose.yml.
  5. Run docker compose up. Wait for a few minutes for the model to be downloaded and served.
  6. Browse http://localhost:3080/
  7. Create an admin account and start chatting!

Steps for NO GPU (use CPU)

Warning: This may be very slow depending on your CPU and may use a lot of RAM depending on the model.

  1. Make sure your drivers are up to date.
  2. Clone the repo.
  3. Copy the CPU compose spec to select it: cp docker-compose.cpu.yml docker-compose.yml
  4. Run docker compose up. Wait for a few minutes for the model to be downloaded and served.
  5. Browse http://localhost:3080/
  6. Create an admin account and start chatting!
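Because the first start can take a few minutes, it may help to poll the web UI instead of refreshing the browser by hand. This is a minimal sketch using only the Python standard library; `is_up` and `wait_for_ui` are hypothetical names, and the timeout and polling interval are arbitrary assumptions. Only the URL `http://localhost:3080/` comes from the steps above.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def is_up(url: str) -> bool:
    """Return True if the URL answers with any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=5):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, reset, or timed out: not serving yet.
        return False


def wait_for_ui(
    url: str = "http://localhost:3080/",
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    probe: Callable[[str], bool] = is_up,
) -> bool:
    """Poll `url` until `probe` succeeds or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Running `wait_for_ui()` after `docker compose up` returns `True` once the UI responds. A response from the web UI does not guarantee that the model has finished downloading, so the first chat request may still be slow or fail.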

Configuring additional models

SaaS services

Read: https://docs.librechat.ai/install/configuration/dotenv.html#endpoints

TL;DR

Let's say we want to configure an OpenAI API key.

  1. Open the .env file.
  2. Uncomment the line # OPENAI_API_KEY=user_provided.
  3. Replace user_provided with your API key.
  4. Restart LibreChat: docker compose restart librechat

Refer to the LibreChat documentation for the full list of configuration options.
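The uncomment-and-replace edit from the TL;DR can also be scripted. This is a minimal sketch that operates on the text of the `.env` file; `set_env_value` is a hypothetical helper, and it assumes the simple `KEY=value` line format shown above.

```python
import re


def set_env_value(env_text: str, key: str, value: str) -> str:
    """Return `env_text` with `key` set to `value`.

    The first line defining `key`, commented out or not, is replaced.
    If no such line exists, `KEY=value` is appended.
    """
    # Matches "KEY=", "# KEY=", "#KEY=" with optional surrounding whitespace.
    pattern = re.compile(rf"^\s*#?\s*{re.escape(key)}\s*=")
    lines = env_text.splitlines()
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = f"{key}={value}"
            break
    else:
        # No existing definition was found.
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"
```

To apply it, read `.env`, pass its contents through `set_env_value(text, "OPENAI_API_KEY", "<your key>")`, write the result back, and restart LibreChat as in step 4.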

Ollama (self-hosted)

Browse the Ollama models library to find a model you wish to add. For this example, we will add mistral-openorca.

  1. Open the docker-compose.yml file.
  2. Find the ollama service, then find the command: option under it. Append the name of the model you wish to add to the end of the list (e.g. command: mistral mistral-openorca).
  3. Open the litellm/config.yaml file.
  4. Add the following at the end of the file, replacing the {model_name} placeholders with the name of your model:
  - model_name: {model_name}
    litellm_params:
      model: ollama/{model_name}
      api_base: http://ollama:11434

e.g.:

  - model_name: mistral-openorca
    litellm_params:
      model: ollama/mistral-openorca
      api_base: http://ollama:11434
  5. Restart the stack: docker compose restart. Wait a few minutes for the model to be downloaded and served.
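The entry added in step 4 follows a fixed template, so it can be generated rather than typed. This is a minimal sketch: `litellm_model_entry` and `append_model` are hypothetical helpers, and they assume that the model list is the last section of `litellm/config.yaml`, as the instruction to add the entry at the end of the file implies.

```python
from pathlib import Path


def litellm_model_entry(model_name: str, api_base: str = "http://ollama:11434") -> str:
    """Render one model entry for litellm/config.yaml as YAML text."""
    return (
        f"  - model_name: {model_name}\n"
        f"    litellm_params:\n"
        f"      model: ollama/{model_name}\n"
        f"      api_base: {api_base}\n"
    )


def append_model(config_path: str, model_name: str) -> None:
    """Append the entry for `model_name` to the end of the config file."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8")
    # Make sure the new entry starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + litellm_model_entry(model_name), encoding="utf-8")
```

For the example above, `append_model("litellm/config.yaml", "mistral-openorca")` appends the same four lines shown in the example entry. The model name still has to be added to the ollama service's command: option by hand (step 2).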

Architecture components

Alternatives

Check out LM Studio for a more integrated, but non-web-based, alternative!