TGI Inference with Mistral-7b
In this tutorial, we will use Huggingface's TGI (Text Generation Inference) API to query a Large Language Model (LLM) and enable users to request jobs from it, both on-chain and off-chain.
Hardware Requirements
- A compute instance with GPU access (for running the TGI service).
- Any machine, such as a laptop (for running the Infernet Node and the tgi-llm container).
Tutorial Video
Install Pre-requisites
For this tutorial you'll need to have the following installed:
- Docker
- Foundry
Setting up a TGI LLM Service
Included with this tutorial is a containerized LLM service.
Rent a GPU machine
To run this service, you will need access to a machine with a powerful GPU. In the video above, we use an A100-80G instance on Paperspace.
Install docker
You need to install Docker. For Ubuntu, you can run the following commands:
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
As docker installation may vary depending on your operating system, consult the official documentation for more information.
After installation, you can verify that docker is installed by running:
docker run hello-world
and expecting to see output like:
Hello from Docker!
Ensure CUDA is installed
Depending on where you rent your GPU machine, CUDA is typically pre-installed. For Ubuntu, you can follow the instructions here.
You can verify that CUDA is installed and visible to PyTorch by running:
# verify installation
python -c '
import torch
print("torch.cuda.is_available()", torch.cuda.is_available())
print("torch.cuda.device_count()", torch.cuda.device_count())
print("torch.cuda.current_device()", torch.cuda.current_device())
print("torch.cuda.get_device_name(0)", torch.cuda.get_device_name(0))
'
If CUDA is installed and available, your output will look similar to the following:
torch.cuda.is_available() True
torch.cuda.device_count() 1
torch.cuda.current_device() 0
torch.cuda.get_device_name(0) Tesla V100-SXM2-16GB
Ensure nvidia-container-runtime is installed
For your container to be able to access the GPU, you will need to install the nvidia-container-runtime.
On Ubuntu, you can run the following commands:
# Docker GPU support
# nvidia container-runtime repos
# https://nvidia.github.io/nvidia-container-runtime/
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
# install nvidia-container-runtime
# https://docs.docker.com/config/containers/resource_constraints/#gpu
sudo apt-get install -y nvidia-container-runtime
As always, consult the official documentation for more information.
You can verify that nvidia-container-runtime is installed by running:
which nvidia-container-runtime-hook
# this should return a path to the nvidia-container-runtime-hook
Now, with the pre-requisites installed, we can move on to setting up the TGI service.
Clone this repository
# Clone locally
git clone --recurse-submodules https://github.com/ritual-net/infernet-container-starter
# Navigate to the repository
cd infernet-container-starter
Run the TGI service
make run-service project=tgi-llm service=tgi
This will start the tgi service. Note that this service has to download a large model file, so it may take a few minutes to be fully ready. The downloaded model is cached, so subsequent runs will be faster.
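Before moving on, you can optionally check that the service answers generation requests. Below is a minimal Python sketch (not part of the repository) that assumes TGI's standard /generate endpoint and response format; replace the host and port with those of your instance:
import requests

# Replace with the IP and port of your TGI instance
TGI_URL = "http://{your_service_ip}:{your_service_port}"

# TGI's /generate endpoint takes the prompt under "inputs",
# plus optional generation parameters
response = requests.post(
    f"{TGI_URL}/generate",
    json={"inputs": "What is Deep Learning?", "parameters": {"max_new_tokens": 50}},
    timeout=60,
)
response.raise_for_status()
print(response.json()["generated_text"])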
Testing the tgi-llm service via the Gradio UI
Included with this project is a simple Gradio chat UI that allows you to interact with the tgi-llm service. This is not needed for running the Infernet node, but it is a nice way to debug and test the TGI service.
Ensure docker & foundry exist
To check for docker, run the following command in your terminal:
docker --version
# Docker version 25.0.2, build 29cf629 (example output)
You'll also need to ensure that docker-compose is available on your machine:
which docker-compose
# /usr/local/bin/docker-compose (example output)
To check for foundry, run the following command in your terminal:
forge --version
# forge 0.2.0 (551bcb5 2024-02-28T07:40:42.782478000Z) (example output)
Clone the starter repository
Just like our other examples, we're going to clone this repository. All of the code and instructions for this tutorial can be found in the projects/tgi-llm directory of the repository.
# Clone locally
git clone --recurse-submodules https://github.com/ritual-net/infernet-container-starter
# Navigate to the repository
cd infernet-container-starter
Configure the UI Service
You'll need to configure the UI service to point to the tgi service. To do this, you'll have to pass that info as environment variables. There is a gradio_ui.env.sample file in the projects/tgi-llm/ui directory. Simply copy this file to gradio_ui.env and set TGI_SERVICE_URL to the address of the tgi service.
cd projects/tgi-llm/ui
cp gradio_ui.env.sample gradio_ui.env
Then modify the content of gradio_ui.env
to look like this:
TGI_SERVICE_URL={your_service_ip}:{your_service_port} # <- replace with your service ip & port
HF_API_TOKEN={huggingface_api_token} # <- replace with your huggingface api token
PROMPT_FILE_PATH=./prompt.txt # <- path to the prompt file
The env vars are as follows:
- TGI_SERVICE_URL is the address of the tgi service.
- HF_API_TOKEN is the Huggingface API token. You can get one by signing up at Huggingface.
- PROMPT_FILE_PATH is the path to the system prompt file. By default it is set to ./prompt.txt. A simple prompt.txt file is included in the ui directory.
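To make the wiring concrete, here is a simplified, hypothetical sketch of such a Gradio chat app (not the project's actual UI code) showing how these variables could be consumed; HF_API_TOKEN is omitted for brevity:
import os

import gradio as gr
import requests

# Same variables as in gradio_ui.env (TGI_SERVICE_URL is host:port, without a scheme)
TGI_SERVICE_URL = os.environ["TGI_SERVICE_URL"]
PROMPT_FILE_PATH = os.environ.get("PROMPT_FILE_PATH", "./prompt.txt")

# The system prompt is prepended to every user message
with open(PROMPT_FILE_PATH) as f:
    SYSTEM_PROMPT = f.read()

def chat(message, history):
    # Forward the combined prompt to TGI's /generate endpoint
    resp = requests.post(
        f"http://{TGI_SERVICE_URL}/generate",
        json={"inputs": f"{SYSTEM_PROMPT}\n{message}", "parameters": {"max_new_tokens": 200}},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["generated_text"]

# Serve a simple chat interface, like the bundled UI does on port 3001
gr.ChatInterface(chat).launch(server_name="0.0.0.0", server_port=3001)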
Build the UI service
From the top-level directory of the repository, simply run the following command to build the UI service:
# navigate back to the root of the repository
cd ../../..
# build the UI service
make build-service project=tgi-llm service=ui
Run the UI service
In the same directory, you can also run the following command to run the UI service:
make run-service project=tgi-llm service=ui
By default the service will run on http://localhost:3001. You can navigate to this address in your browser to see the UI.
Chat with the TGI service!
🎉 Congratulations! You can now chat with the TGI service using the Gradio UI. You can enter a prompt and see the response from the TGI service.
Now that we've tested the TGI service, we can move on to setting up the Infernet Node and the tgi-llm container.
Setting up the Infernet Node along with the tgi-llm container
You can follow the steps below on your local machine to set up the Infernet Node and the tgi-llm container.
Steps 1 & 2 are identical to those of the previous section, so if you've already completed them, you can skip ahead to configuring the tgi-llm container.
Ensure docker & foundry exist
To check for docker, run the following command in your terminal:
docker --version
# Docker version 25.0.2, build 29cf629 (example output)
You'll also need to ensure that docker-compose is available on your machine:
which docker-compose
# /usr/local/bin/docker-compose (example output)
To check for foundry, run the following command in your terminal:
forge --version
# forge 0.2.0 (551bcb5 2024-02-28T07:40:42.782478000Z) (example output)
Clone the starter repository
Just like our other examples, we're going to clone this repository. All of the code and instructions for this tutorial can be found in the projects/tgi-llm directory of the repository.
# Clone locally
git clone --recurse-submodules https://github.com/ritual-net/infernet-container-starter
# Navigate to the repository
cd infernet-container-starter
Configure the tgi-llm container
The tgi-llm container needs to know where to find the TGI service that we started in the steps above. To do this, we need to modify the configuration file for the tgi-llm container. We have a sample config.json file. Simply navigate to the projects/tgi-llm/container directory and set up the config file:
cd projects/tgi-llm/container
cp config.sample.json config.json
In the containers field, you will see the following:
"containers": [
{
// ...
"env": {
// TODO: replace with your service ip & port
"TGI_SERVICE_URL": "http://{your_service_ip}:{your_service_port}"
}
}
],
Replace {your_service_ip} and {your_service_port} with the IP address and port of the TGI service.
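Before deploying, you can optionally verify the value with a small, hypothetical sanity check (run from the repository root; not part of the project tooling). It assumes tgi-llm is the first entry under containers and uses TGI's standard /health endpoint:
import json
import urllib.request

# Load the configuration file we just edited
with open("projects/tgi-llm/container/config.json") as f:
    config = json.load(f)

# Assumes the tgi-llm container is the first (and only) entry under "containers"
tgi_url = config["containers"][0]["env"]["TGI_SERVICE_URL"]

try:
    # TGI's /health endpoint returns 200 once the model server is up
    with urllib.request.urlopen(f"{tgi_url}/health", timeout=5) as resp:
        print("TGI reachable, status:", resp.status)
except Exception as err:
    print("TGI not reachable:", err)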
Build the tgi-llm container
First, navigate back to the root of the repository. Then simply run the following command to build the tgi-llm container:
make build-container project=tgi-llm
Deploy the tgi-llm container with Infernet
You can run a simple command to deploy the tgi-llm container along with bootstrapping the rest of the Infernet node stack in one go:
make deploy-container project=tgi-llm
Check the running containers
At this point it makes sense to check the running containers to ensure everything is running as expected.
docker container ps
You should expect to see something like this:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
83d39a063615 ritualnetwork/example-tgi-llm-infernet:latest "hypercorn app:creat…" 15 hours ago Up 15 hours 0.0.0.0:3000->3000/tcp tgi-llm
47758185b1cc ritualnetwork/infernet-node:1.3.1 "/app/entrypoint.sh" 15 hours ago Up 15 hours 0.0.0.0:4000->4000/tcp infernet-node
49cc4b28f8d1 redis:7.4.0 "docker-entrypoint.s…" 15 hours ago Up 15 hours 0.0.0.0:6379->6379/tcp infernet-redis
16e96b377a15 fluent/fluent-bit:3.1.4 "/fluent-bit/bin/flu…" 15 hours ago Up 15 hours 2020/tcp, 24224/tcp infernet-fluentbit
0b2c077302f5 ritualnetwork/infernet-anvil:1.0.0 "anvil --host 0.0.0.…" 15 hours ago Up 15 hours 0.0.0.0:8545->3000/tcp infernet-anvil
Notice that five different containers are running, including the infernet-node and the tgi-llm containers.
Send a job request to the tgi-llm container
From here, we can make a Web2 job request to the container by posting a request to the api/jobs endpoint.
curl -X POST http://127.0.0.1:4000/api/jobs \
-H "Content-Type: application/json" \
-d '{"containers": ["tgi-llm"], "data": {"prompt": "Can shrimp actually fry rice fr?"}}'
You will get a job id in response. You can use this id to check the status of the job:
{"id": "7a375a56-0da0-40d8-91e0-6440b3282ed8"}
Check the status of the job
You can make a GET request to the api/jobs endpoint to check the status of the job.
curl -X GET "http://127.0.0.1:4000/api/jobs?id=7a375a56-0da0-40d8-91e0-6440b3282ed8"
You will get a response similar to this:
[
{
"id": "7a375a56-0da0-40d8-91e0-6440b3282ed8",
"result": {
"container": "tgi-llm",
"output": {
"data": "\n\n## Can you fry rice in a wok?\n\nThe wok is the"
}
},
"status": "success"
}
]
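The same two calls can also be scripted. Below is a small illustrative Python client (not part of the repository) that submits the job to the node's REST API and polls until it reports success, using only the endpoints shown above:
import time

import requests

NODE_URL = "http://127.0.0.1:4000"

# Submit a job request to the tgi-llm container
job = requests.post(
    f"{NODE_URL}/api/jobs",
    json={"containers": ["tgi-llm"], "data": {"prompt": "Can shrimp actually fry rice fr?"}},
    timeout=10,
).json()
job_id = job["id"]
print("job id:", job_id)

# Poll the job status until it reaches "success" (or give up after ~60 seconds)
for _ in range(30):
    result = requests.get(f"{NODE_URL}/api/jobs", params={"id": job_id}, timeout=10).json()[0]
    if result["status"] == "success":
        print(result["result"]["output"])
        break
    time.sleep(2)
else:
    print("Job did not complete in time:", result)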
🎉 Congratulations! You have successfully set up the Infernet Node and the tgi-llm container. Now let's move on to calling our service from a smart contract (a la web3 request).
Calling our service from a smart contract
In the following steps, we will deploy our consumer contract and make a subscription request by calling the contract.
Setup
Ensure that you have followed Steps 1-6 in the previous section to set up the Infernet Node and the tgi-llm container.
Notice that in one of the steps above we have an Anvil node running on port 8545.
By default, the infernet-anvil image deploys the Infernet SDK and other relevant contracts for you:
- Coordinator: 0x5FbDB2315678afecb367f032d93F642f64180aa3
- Primary node: 0x70997970C51812dc3A010C7d01b50e0d17dc79C8
Deploy our Prompter smart contract
In this step, we will deploy our Prompter.sol contract to the Anvil node. This contract simply allows us to submit a prompt to the LLM; it then receives the result of the prompt and prints it to the Anvil console.
Anvil logs
During this process, it is useful to look at the logs of the Anvil node to see what's going on. To follow the logs, in a new terminal, run:
docker logs -f infernet-anvil
Deploying the contract
Once ready, to deploy the Prompter consumer contract, in another terminal, run:
make deploy-contracts project=tgi-llm
You should expect to see similar Anvil logs:
eth_getTransactionReceipt
Transaction: 0x17a9d17cc515d39eef26b6a9427e04ed6f7ce6572d9756c07305c2df78d93ffe
Contract created: 0x663f3ad617193148711d28f5334ee4ed07016602
Gas used: 731312
Block Number: 1
Block Hash: 0xd17b344af15fc32cd3359e6f2c2724a8d0a0283fc3b44febba78fc99f2f00189
Block Time: "Wed, 6 Mar 2024 18:21:01 +0000"
eth_getTransactionByHash
From our logs, we can see that the Prompter contract has been deployed to address 0x663f3ad617193148711d28f5334ee4ed07016602.
Call the contract
Now, let's call the contract with a prompt! In the same terminal, run:
make call-contract project=tgi-llm prompt="What is 2 * 3?"
You should first expect to see an initiation transaction sent to the Prompter contract:
eth_getTransactionReceipt
Transaction: 0x988b1b251f3b6ad887929a58429291891d026f11392fb9743e9a90f78c7a0801
Gas used: 190922
Block Number: 2
Block Hash: 0x51f3abf62e763f1bd1b0d245a4eab4ced4b18f58bd13645dbbf3a878f1964044
Block Time: "Wed, 6 Mar 2024 18:21:34 +0000"
eth_getTransactionByHash
eth_getTransactionReceipt
Shortly after that you should see another transaction submitted from the Infernet Node which is the result of your on-chain subscription and its associated job request:
eth_sendRawTransaction
_____ _____ _______ _ _ _
| __ \|_ _|__ __| | | | /\ | |
| |__) | | | | | | | | | / \ | |
| _ / | | | | | | | |/ /\ \ | |
| | \ \ _| |_ | | | |__| / ____ \| |____
|_| \_\_____| |_| \____/_/ \_\______|
subscription Id 1
interval 1
redundancy 1
node 0x70997970C51812dc3A010C7d01b50e0d17dc79C8
output:
2 * 3 = 6
Transaction: 0xdaaf559c2baba212ab218fb268906613ce3be93ba79b37f902ff28c8fe9a1e1a
Gas used: 116153
Block Number: 3
Block Hash: 0x2f26b2b487a4195ff81865b2966eab1508d10642bf525a258200eea432522e24
Block Time: "Wed, 6 Mar 2024 18:21:35 +0000"
eth_blockNumber
🎉 Congratulations! You have successfully enabled a contract to access a TGI LLM service.
Next steps
This container is for demonstration purposes only, and is purposefully simplified for readability and ease of comprehension. For a production-ready version of this code, check out:
- The TGI Client Inference Workflow: A Python class that implements a TGI service client similar to this example, and can be used to build production-ready containers.
- The TGI Client Inference Service: A production-ready, Infernet-compatible container that works out-of-the-box with minimal configuration, and serves inference using the TGI Client Inference Workflow.