Massaki Archambault 9ea2781562 | ||
---|---|---|
litellm | ||
ollama | ||
.gitignore | ||
README.md | ||
docker-compose.amd.yml | ||
docker-compose.base.yml | ||
docker-compose.cpu.yml | ||
docker-compose.nvidia.yml |
README.md
local-llama
A quick prototype to self-host LibreChat backed by a locally-run Mistral model, and an OpenAI-like api provided by LiteLLM on the side.
Goals
- Streamline deployment of a local LLM for experimentation purpose.
- Deploy a ChatGPT Clone for daily use.
- Deploy an OpenAI-like API for hacking on Generative AI using well-supported libraries.
- Use docker to prepare for an eventual deployment on a container orchestration platform like Kubernetes.
Getting started
Prerequisites
- Linux or WSL2
- docker
Steps for NVIDIA GPU
- Make sure your drivers are up to date.
- Install the NVIDIA Container Toolkit.
- Clone the repo.
- Copy the AMD compose spec to select it.
cp docker-compose.nvidia.yml docker.compose.yml
- Run
docker compose up
. Wait for a few minutes for the model to be downloaded and served. - Browse http://localhost:3080/
- Create an admin account and start chatting!
Steps for AMD GPU
Warning: AMD was not tested on Windows and support seems to not be as good as on Linux.
- Make sure your drivers are up to date.
- Clone the repo.
- Copy the AMD compose spec to select it.
cp docker-compose.amd.yml docker.compose.yml
- If you are using an RX (consumer) series GPU, you may need to set
HSA_OVERRIDE_GFX_VERSION
to an appropriate value for the model of your GPU. You will need to look it up. The value can be set in docker-compose.yml, - Run
docker compose up
. Wait for a few minutes for the model to be downloaded and served. - Browse http://localhost:3080/
- Create an admin account and start chatting!
Steps for NO GPU (use CPU)
Warning: This may be very slow depending on your CPU and may us a lot of RAM depending on the model
- Make sure your drivers are up to date.
- Clone the repo.
- Copy the CPU compose spec to select it.
cp docker-compose.cpu.yml docker.compose.yml
- Run
docker compose up
. Wait for a few minutes for the model to be downloaded and served. - Browse http://localhost:3080/
- Create an admin account and start chatting!
Configuring additional models
SASS services
Read: https://docs.librechat.ai/install/configuration/dotenv.html#endpoints
TL:DR
Let say we want to configure an OpenAI API key.
- Open the .env file.
- Uncomment the line
# OPENAI_API_KEY=user_provided
. - Replace
user_provided
with your API key. - Restart LibreChat
docker compose restart librechat
.
Refer to the LibreChat documentation for the full list of configuration options.
Ollama (self-hosted)
Browse the Ollama models library to find a model you wish to add. For this example we will add mistral-openorca
- Open the docker compose.yml file.
- Find the
ollama
service. Find thecommand:
option under the ollama sevice. Append the name of the model you wish to add at the end of the list (eg:command: mistral mistral-openorca
). - Open the litellm/config.yaml file.
- Add the following a the end of the file, replace {model_name} placeholders with the name of your model
- model_name: {model_name}
litellm_params:
model: ollama/{model_name}
api_base: http://ollama:11434
eg:
- model_name: mistral-openorca
litellm_params:
model: ollama/mistral-openorca
api_base: http://ollama:11434
- Restart the stack
docker compose restart
. Wait for a few minutes for the model to be downloaded and served.
Architecture components
- LibreChat is a ChatGPT clone with support for multiple AI endpoints. It's deployed alongside a MongoDB database and Meillisearch for search. It's exposed on http://localhost:3080/.
- LiteLLM is an OpenAI-like API. It is exposed on http://localhost:8000/ without any authentication by default.
- Ollama manages and serve the local models.
Alternatives
Check out LM Studio for a more integrated, but non web-based alternative!