LLaMa Web is a web interface for chatting and playing with LLaMa-based models.
Build the client and the api:

```bash
cd client && pnpm install --frozen-lockfile && pnpm build
cd ..
cd api && yarn install --frozen-lockfile && yarn build
```
Copy the example.env file in both the client and api folders to .env and edit it:
```bash
cp client/example.env client/.env && nano client/.env
cp api/example.env api/.env && nano api/.env
```
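As a rough sketch of the kind of values you may want to review (only the variables mentioned in this guide are shown; the authoritative list is in each example.env):

```env
# client/.env
SKIP_AUTH=true   # only if you do not want to use Keycloak

# api/.env
SKIP_AUTH=true                           # must be set in the api as well
ALLOW_ALTERNATIVE_COMPUTE_BACKEND=true   # set to false to disable the alternative compute backend
```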
In both folders, run the following command:
```bash
yarn start
```
Edit the `docker-compose.yml` file and change the environment variables.
However, you can't change the `DB`, `LLAMA_PATH` and `LLAMA_EMBEDDING_PATH` variables.
If you don't want to use Keycloak, you can enable the `SKIP_AUTH` variable by setting it to `true` in both the client AND the api.
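A rough sketch of what such an edit could look like (the service names and surrounding structure are assumptions, keep whatever the file already defines; only the variables discussed above are shown):

```yaml
services:
  api:                      # service name is an assumption; keep the existing one
    environment:
      SKIP_AUTH: "true"     # only if you do not want to use Keycloak
      # DB, LLAMA_PATH and LLAMA_EMBEDDING_PATH must be left as they are
  client:
    environment:
      SKIP_AUTH: "true"
```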
Then start the containers:

```bash
docker-compose up -d
```
> [!NOTE]
> We assume you want to use TheBloke/Llama-2-7B-Chat-GGUF. Good to know: this project is tested using TheBloke's GGUF models.
> [!NOTE]
> The steps are the same whether you run with Docker or without it.
1. Go to the Models tab.
2. Click on "Install a new model".
3. Enter a name for the model (e.g. `llama-2-7b-chat`).
4. Enter the model URL (e.g. https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf).
5. Click on "Install the new model".

> [!WARNING]
> Built-in models management is not supported when using an alternative compute backend. You have to edit the alternative backend directly to support the model you want to use. No support will be provided for this. It is possible to use the built-in models management and the alternative compute backend at the same time.
> [!NOTE]
> You can disable the alternative compute backend by setting `ALLOW_ALTERNATIVE_COMPUTE_BACKEND` to `false` in the api `.env` file.
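>
> For example, in `api/.env`:
>
> ```env
> ALLOW_ALTERNATIVE_COMPUTE_BACKEND=false
> ```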
An example alternative compute backend is provided in `examples/alt-backend/mixtral8x7B.py`.

1. Go to the Models tab.
2. Click on "Install a new model".
3. Enter a name for the model (e.g. `llama-2-7b-chat`).
4. Enable "Use alternative compute backend".
5. Enter the backend URL (e.g. https://my-alternative-compute-backend.domain.com).
6. Click on "Add the alternative backend model".
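For orientation, the block below is a minimal, hypothetical sketch of what an alternative compute backend can look like: a small HTTP service that loads a model and answers generation requests. The real request/response contract is the one implemented in `examples/alt-backend/mixtral8x7B.py`, so treat the endpoint path, payload fields and response shape below as assumptions and adapt them to that example.

```python
# Hypothetical sketch of an alternative compute backend -- NOT the real protocol.
# The authoritative example is examples/alt-backend/mixtral8x7B.py; the endpoint
# name, payload fields and response shape here are assumptions for illustration.
from flask import Flask, jsonify, request
from llama_cpp import Llama  # pip install flask llama-cpp-python

app = Flask(__name__)

# The model is loaded directly by the backend. This is why built-in models
# management does not apply here: to serve another model, you edit this file.
llm = Llama(model_path="/models/llama-2-7b-chat.Q4_K_M.gguf")

@app.post("/generate")  # hypothetical endpoint
def generate():
    body = request.get_json(force=True)
    prompt = body.get("prompt", "")                 # hypothetical field name
    max_tokens = int(body.get("max_tokens", 256))   # hypothetical field name
    result = llm(prompt, max_tokens=max_tokens)
    return jsonify({"text": result["choices"][0]["text"]})  # hypothetical shape

if __name__ == "__main__":
    # Expose the service at a URL the api can reach, e.g.
    # https://my-alternative-compute-backend.domain.com
    app.run(host="0.0.0.0", port=8000)
```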