TGI Inference with Mistral-7B

In this tutorial, we will use Huggingface's TGI (Text Generation Inference) API to query a Large Language Model (LLM) and enable users to request jobs from it, both on-chain and off-chain.

Hardware Requirements

  1. A compute instance with GPU access, for running the TGI service.
  2. Any laptop or local machine, for running the Infernet Node and the rest of the tutorial.

Tutorial Video

Install Pre-requisites

For this tutorial you'll need to have the following installed.

  1. Docker
  2. Foundry

Setting up a TGI LLM Service

Included with this tutorial is a containerized LLM service.

Rent a GPU machine

To run this service, you will need access to a machine with a powerful GPU. In the video above, we use an A100-80G instance on Paperspace.
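Once the machine is up, it's worth confirming that the GPU is visible before going further. A quick check (assuming the NVIDIA drivers are already installed, as they are on most GPU cloud images):

# list the GPUs visible to the driver; you should see your A100 (or similar) here
nvidia-smi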

Install docker

You need to install Docker. For Ubuntu, you can run the following commands:

sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

As docker installation may vary depending on your operating system, consult the official documentation for more information.

After installation, you can verify that docker is installed by running:

docker run hello-world

You should see output that includes:

Hello from Docker!

Ensure CUDA is installed

Depending on where you rent your GPU machine, CUDA is typically pre-installed. For Ubuntu, you can follow the instructions here.

You can verify that CUDA is installed by running:

# verify installation
python -c '
import torch
print("torch.cuda.is_available()", torch.cuda.is_available())
print("torch.cuda.device_count()", torch.cuda.device_count())
print("torch.cuda.current_device()", torch.cuda.current_device())
print("torch.cuda.get_device_name(0)", torch.cuda.get_device_name(0))
'

If CUDA is installed and available, your output will look similar to the following:

torch.cuda.is_available() True
torch.cuda.device_count() 1
torch.cuda.current_device() 0
torch.cuda.get_device_name(0) Tesla V100-SXM2-16GB
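Note that the check above uses PyTorch; if it is not already present in your Python environment, a minimal way to install it (the exact command may vary with your CUDA version) is:

# install PyTorch so the CUDA check above can run
pip install torch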

Ensure nvidia-container-runtime is installed

For your container to be able to access the GPU, you will need to install the nvidia-container-runtime. On Ubuntu, you can run the following commands:

# Docker GPU support
# nvidia container-runtime repos
# https://nvidia.github.io/nvidia-container-runtime/
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
 
# install nvidia-container-runtime
# https://docs.docker.com/config/containers/resource_constraints/#gpu
sudo apt-get install -y nvidia-container-runtime

As always, consult the official documentation for more information.

You can verify that nvidia-container-runtime is installed by running:

which nvidia-container-runtime-hook
# this should return a path to the nvidia-container-runtime-hook
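You can also smoke-test GPU access from inside a container, for example with one of NVIDIA's public CUDA images (the exact image tag here is only an example and may need adjusting):

# run nvidia-smi inside a throwaway container to confirm Docker can reach the GPU
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi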

Now, with the pre-requisites installed, we can move on to setting up the TGI service.

Clone this repository

# Clone locally
git clone --recurse-submodules https://github.com/ritual-net/infernet-container-starter
# Navigate to the repository
cd infernet-container-starter

Run the TGI service

make run-service project=tgi-llm service=tgi

This will start the tgi service. Note that the service has to download a large model file the first time, so it may take a few minutes to be fully ready. The downloaded model is cached, so subsequent runs will be faster.
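Once the model has loaded, you can optionally hit TGI's generate endpoint directly to confirm it is serving; replace the host and port below with wherever your tgi service is exposed:

# simple request against TGI's /generate endpoint (host and port are placeholders)
curl http://{your_service_ip}:{your_service_port}/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 32}}'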

Testing the tgi-llm service via the gradio UI

Included with this project is a simple gradio chat UI that allows you to interact with the tgi-llm service. This is not needed for running the Infernet Node, but it is a nice way to debug and test the TGI service.

Ensure docker & foundry exist

To check for docker, run the following command in your terminal:

docker --version
# Docker version 25.0.2, build 29cf629 (example output)

You'll also need to ensure that docker-compose exists in your terminal:

which docker-compose
# /usr/local/bin/docker-compose (example output)

To check for foundry, run the following command in your terminal:

forge --version
# forge 0.2.0 (551bcb5 2024-02-28T07:40:42.782478000Z) (example output)

Clone the starter repository

Just like our other examples, we're going to clone this repository. All of the code and instructions for this tutorial can be found in the projects/tgi-llm directory of the repository.

# Clone locally
git clone --recurse-submodules https://github.com/ritual-net/infernet-container-starter
# Navigate to the repository
cd infernet-container-starter

Configure the UI Service

You'll need to configure the UI service to point to the tgi service. To do this, you pass that information in as environment variables. A gradio_ui.env.sample file is provided in the projects/tgi-llm/ui directory. Simply copy this file to gradio_ui.env and set TGI_SERVICE_URL to the address of the tgi service.

cd projects/tgi-llm/ui
cp gradio_ui.env.sample gradio_ui.env

Then modify the content of gradio_ui.env to look like this:

TGI_SERVICE_URL={your_service_ip}:{your_service_port} # <- replace with your service ip & port
HF_API_TOKEN={huggingface_api_token} # <- replace with your huggingface api token
PROMPT_FILE_PATH=./prompt.txt # <- path to the prompt file

The env vars are as follows:

  • TGI_SERVICE_URL is the address of the tgi service
  • HF_API_TOKEN is the Huggingface API token. You can get one by signing up at Huggingface
  • PROMPT_FILE_PATH is the path to the system prompt file. By default it is set to ./prompt.txt. A simple prompt.txt file is included in the ui directory.

Build the UI service

From the top-level directory of the repository, simply run the following command to build the UI service:

# navigate back to the root of the repository
cd ../../..
# build the UI service
make build-service project=tgi-llm service=ui

Run the UI service

From the same directory, run the following command to start the UI service:

make run-service project=tgi-llm service=ui

By default the service will run on http://localhost:3001. You can navigate to this address in your browser to see the UI.
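If the page does not load, you can first check whether the UI container is actually serving on that port, for example:

# print the HTTP status code returned by the UI; a 200 here means it is up
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3001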

Chat with the TGI service!

🎉 Congratulations! You can now chat with the TGI service using the gradio UI. You can enter a prompt and see the response from the TGI service.

Now that we've tested the TGI service, we can move on to setting up the Infernet Node and the tgi-llm container.

Setting up the Infernet Node along with the tgi-llm container

Follow these steps on your local machine to set up the Infernet Node and the tgi-llm container.

Steps 1 & 2 are identical to those of the previous section. So if you've already completed those steps, you can skip to configuring the tgi-llm container.

Ensure docker & foundry exist

To check for docker, run the following command in your terminal:

docker --version
# Docker version 25.0.2, build 29cf629 (example output)

You'll also need to ensure that docker-compose exists in your terminal:

which docker-compose
# /usr/local/bin/docker-compose (example output)

To check for foundry, run the following command in your terminal:

forge --version
# forge 0.2.0 (551bcb5 2024-02-28T07:40:42.782478000Z) (example output)

Clone the starter repository

Just like our other examples, we're going to clone this repository. All of the code and instructions for this tutorial can be found in the projects/tgi-llm directory of the repository.

# Clone locally
git clone --recurse-submodules https://github.com/ritual-net/infernet-container-starter
# Navigate to the repository
cd infernet-container-starter

Configure the tgi-llm container

The tgi-llm container needs to know where to find the TGI service that we started in the steps above. To do this, we need to modify the configuration file for the tgi-llm container. We have a sample config.json file. Simply navigate to the projects/tgi-llm/container directory and set up the config file:

cd projects/tgi-llm/container
cp config.sample.json config.json

In the containers field, you will see the following:

"containers": [
    {
        // ...
        "env": {
            // TODO: replace with your service ip & port
            "TGI_SERVICE_URL": "http://{your_service_ip}:{your_service_port}"
        }
    }
],

Replace {your_service_ip} and {your_service_port} with the IP address and port of the TGI service.

Build the tgi-llm container

First, navigate back to the root of the repository. Then simply run the following command to build the tgi-llm container:

make build-container project=tgi-llm

Deploy the tgi-llm container with Infernet

You can run a simple command to deploy the tgi-llm container along with bootstrapping the rest of the Infernet node stack in one go:

make deploy-container project=tgi-llm

Check the running containers

At this point it makes sense to check the running containers to ensure everything is running as expected.

docker container ps

You should expect to see something like this:

CONTAINER ID   IMAGE                                           COMMAND                  CREATED        STATUS        PORTS                    NAMES
83d39a063615   ritualnetwork/example-tgi-llm-infernet:latest   "hypercorn app:creat…"   15 hours ago   Up 15 hours   0.0.0.0:3000->3000/tcp   tgi-llm
47758185b1cc   ritualnetwork/infernet-node:1.3.1               "/app/entrypoint.sh"     15 hours ago   Up 15 hours   0.0.0.0:4000->4000/tcp   infernet-node
49cc4b28f8d1   redis:7.4.0                                     "docker-entrypoint.s…"   15 hours ago   Up 15 hours   0.0.0.0:6379->6379/tcp   infernet-redis
16e96b377a15   fluent/fluent-bit:3.1.4                         "/fluent-bit/bin/flu…"   15 hours ago   Up 15 hours   2020/tcp, 24224/tcp      infernet-fluentbit
0b2c077302f5   ritualnetwork/infernet-anvil:1.0.0              "anvil --host 0.0.0.…"   15 hours ago   Up 15 hours   0.0.0.0:8545->3000/tcp   infernet-anvil

Notice that five different containers are running, including the infernet-node and the tgi-llm containers.
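If any of these containers are missing or keep restarting, tailing the node's logs (container name taken from the listing above) is usually the quickest way to see what went wrong:

# follow the Infernet Node logs
docker logs -f infernet-node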

Send a job request to the tgi-llm container

From here, we can make a Web2 (off-chain) job request to the container by posting a request to the api/jobs endpoint.

curl -X POST http://127.0.0.1:4000/api/jobs \
-H "Content-Type: application/json" \
-d '{"containers": ["tgi-llm"], "data": {"prompt": "Can shrimp actually fry rice fr?"}}'

You will get a job id in response. You can use this id to check the status of the job:

{"id": "7a375a56-0da0-40d8-91e0-6440b3282ed8"}

Check the status of the job

You can make a GET request to the api/jobs endpoint to check the status of the job.

curl -X GET "http://127.0.0.1:4000/api/jobs?id=7a375a56-0da0-40d8-91e0-6440b3282ed8"

You will get a response similar to this:

[
    {
        "id": "7a375a56-0da0-40d8-91e0-6440b3282ed8",
        "result": {
            "container": "tgi-llm",
            "output": {
                "data": "\n\n## Can you fry rice in a wok?\n\nThe wok is the"
            }
        },
        "status": "success"
    }
]
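For convenience, here is a small sketch that submits a job and polls until it reports success, assuming the node's REST API is on 127.0.0.1:4000 as above and that jq is installed (note it loops indefinitely if the job never succeeds):

# submit a job and capture the returned job id
JOB_ID=$(curl -s -X POST http://127.0.0.1:4000/api/jobs \
  -H "Content-Type: application/json" \
  -d '{"containers": ["tgi-llm"], "data": {"prompt": "Why is the sky blue?"}}' | jq -r '.id')

# poll until the job status becomes "success"
until [ "$(curl -s "http://127.0.0.1:4000/api/jobs?id=$JOB_ID" | jq -r '.[0].status')" = "success" ]; do
  sleep 2
done

# print the model output
curl -s "http://127.0.0.1:4000/api/jobs?id=$JOB_ID" | jq -r '.[0].result.output.data'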

🎉 Congratulations! You have successfully set up the Infernet Node and the tgi-llm container. Now let's move on to calling our service from a smart contract (a la Web3 request).

Calling our service from a smart contract

In the following steps, we will deploy our consumer contract and make a subscription request by calling the contract.

Setup

Ensure that you have followed Steps 1-6 in the previous section to set up the Infernet Node and the tgi-llm container.

Notice that in one of the steps above we have an Anvil node running on port 8545.

By default, the infernet-anvil image deploys the Infernet SDK and other relevant contracts for you:

  • Coordinator: 0x5FbDB2315678afecb367f032d93F642f64180aa3
  • Primary node: 0x70997970C51812dc3A010C7d01b50e0d17dc79C8
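As a quick sanity check, you can confirm that the Coordinator is actually deployed on the local Anvil chain (the address comes from the list above; 8545 is the port mapped by infernet-anvil):

# non-empty bytecode output means the Coordinator contract is deployed
cast code 0x5FbDB2315678afecb367f032d93F642f64180aa3 --rpc-url http://localhost:8545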

Deploy our Prompter smart contract

In this step, we will deploy our Prompter.sol contract to the Anvil node. This contract simply allows us to submit a prompt to the LLM; it receives the result of the prompt and prints it to the Anvil console.

Anvil logs

During this process, it is useful to look at the logs of the Anvil node to see what's going on. To follow the logs, in a new terminal, run:

docker logs -f infernet-anvil

Deploying the contract

Once ready, deploy the Prompter consumer contract by running the following in another terminal:

make deploy-contracts project=tgi-llm

You should expect to see similar Anvil logs:

eth_getTransactionReceipt

Transaction: 0x17a9d17cc515d39eef26b6a9427e04ed6f7ce6572d9756c07305c2df78d93ffe
Contract created: 0x663f3ad617193148711d28f5334ee4ed07016602
Gas used: 731312

Block Number: 1
Block Hash: 0xd17b344af15fc32cd3359e6f2c2724a8d0a0283fc3b44febba78fc99f2f00189
Block Time: "Wed, 6 Mar 2024 18:21:01 +0000"

eth_getTransactionByHash

From our logs, we can see that the Prompter contract has been deployed to address 0x663f3ad617193148711d28f5334ee4ed07016602.

Call the contract

Now, let's call the contract with a prompt! In the same terminal, run:

make call-contract project=tgi-llm prompt="What is 2 * 3?"

You should first expect to see an initiation transaction sent to the Prompter contract:

eth_getTransactionReceipt

Transaction: 0x988b1b251f3b6ad887929a58429291891d026f11392fb9743e9a90f78c7a0801
Gas used: 190922

Block Number: 2
Block Hash: 0x51f3abf62e763f1bd1b0d245a4eab4ced4b18f58bd13645dbbf3a878f1964044
Block Time: "Wed, 6 Mar 2024 18:21:34 +0000"

eth_getTransactionByHash
eth_getTransactionReceipt

Shortly after that you should see another transaction submitted from the Infernet Node which is the result of your on-chain subscription and its associated job request:

eth_sendRawTransaction


_____  _____ _______ _    _         _
|  __ \|_   _|__   __| |  | |  /\   | |
| |__) | | |    | |  | |  | | /  \  | |
|  _  /  | |    | |  | |  | |/ /\ \ | |
| | \ \ _| |_   | |  | |__| / ____ \| |____
|_|  \_\_____|  |_|   \____/_/    \_\______|


subscription Id 1
interval 1
redundancy 1
node 0x70997970C51812dc3A010C7d01b50e0d17dc79C8
output:

2 * 3 = 6

Transaction: 0xdaaf559c2baba212ab218fb268906613ce3be93ba79b37f902ff28c8fe9a1e1a
Gas used: 116153

Block Number: 3
Block Hash: 0x2f26b2b487a4195ff81865b2966eab1508d10642bf525a258200eea432522e24
Block Time: "Wed, 6 Mar 2024 18:21:35 +0000"

eth_blockNumber

🎉 Congratulations! You have successfully enabled a contract to have access to a TGI LLM service.

Next steps

This container is for demonstration purposes only, and is purposefully simplified for readability and ease of comprehension. For a production-ready version of this code, check out: