Walkthrough: Deploying a Hugging Face Model as a Worker Node on the Allora Network
This guide provides a step-by-step process to deploy a Hugging Face model as a Worker Node within the Allora Network. By following these instructions, you will be able to integrate and run models from Hugging Face, contributing to the Allora decentralized machine intelligence ecosystem.
Prerequisites
Before you start, ensure you have the following:
- A Docker environment with docker compose installed.
- Basic knowledge of machine learning and the Hugging Face ecosystem.
- Familiarity with the Allora Network documentation on building and deploying a worker node using Docker.
Overview
During this walkthrough, we will build a worker node from an existing Hugging Face model, then deploy it to participate on the Allora Network. We will use this model to predict the price of BTC 24 hours ahead.
You can find all the files in this Git repository.
In this example, we will use the Chronos model amazon/chronos-t5-tiny. Chronos is a family of pretrained time series forecasting models based on language model architectures. In essence:
- A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss.
- Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context.
Chronos models have been trained on a large corpus of publicly available time series data, as well as synthetic data generated using Gaussian processes.
For simplicity, we will use zero-shot forecasting, which refers to the ability of a model to generate forecasts for datasets it has never seen during training.
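To make this concrete, here is a minimal zero-shot sketch using the same ChronosPipeline API our inference server will rely on later. The five-point price series is made up purely for illustration:

import torch
from chronos import ChronosPipeline

# load the pretrained model; zero-shot use requires no fine-tuning
pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-tiny",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# any 1-D tensor of historical values serves as context (illustrative values only)
context = torch.tensor([100.0, 101.5, 99.8, 102.3, 103.1])

# sample forecast trajectories one step ahead
forecast = pipeline.predict(context, prediction_length=1)  # shape [num_series, num_samples, prediction_length]
print(forecast[0].mean().item())  # mean across the sampled trajectories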
Our worker will provide inferences on the BTC 24h Prediction, which is topic 4 on the Allora Testnet.
Note: To deploy on the Allora Network, you will need to pick the topic ID you wish to generate inference for, or create a new topic.
We will use Coingecko to fetch the data. You will need to create an API key.
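As a preview, the raw request our inference server will make later looks roughly like this; the x-cg-demo-api-key header assumes a demo-tier key:

import requests

# daily BTC/USD prices for the last 30 days
url = "https://api.coingecko.com/api/v3/coins/bitcoin/market_chart?vs_currency=usd&days=30&interval=daily"
headers = {
    "accept": "application/json",
    "x-cg-demo-api-key": "<Your Coingecko API key>",  # replace with your API key
}
prices = requests.get(url, headers=headers).json()["prices"]
print(prices[:3])  # [[timestamp_ms, price], ...]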
Clone the repo
Clone the basic-coin-prediction-node repository. It will serve as the base sample for your quick setup.
git clone https://github.com/allora-network/basic-coin-prediction-node
cd basic-coin-prediction-node
Configure Your Environment
- Copy config.example.json and name the copy config.json.
- Open config.json and update the necessary fields inside the wallet sub-object and the worker config with your specific values:
wallet Sub-object
- nodeRpc: The RPC URL for the network the node will be deployed on.
- addressKeyName: The name you gave your wallet key when setting up your wallet.
- addressRestoreMnemonic: The mnemonic that was output when setting up a new key.
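Putting these fields together, the wallet sub-object might look like the sketch below; every value is a placeholder, and your config.json may contain additional wallet fields beyond these three:

"wallet": {
    "addressKeyName": "my-worker-key",
    "addressRestoreMnemonic": "<your mnemonic>",
    "nodeRpc": "<RPC URL of the network you are deploying to>"
}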
worker Config
- topicId: The specific topic ID you created the worker for.
- InferenceEndpoint: The endpoint exposed by your worker node to provide inferences to the network.
- Token: The token for the specific topic you are providing inferences for. The token needs to be exposed in the inference server endpoint for retrieval.
Note: The Token variable is specific to the endpoint you expose in your app.py file. It is not related to any topic parameter.
The worker config is an array of sub-objects, each representing a different topic ID. This structure allows you to manage multiple topic IDs, each within its own sub-object.

To deploy a worker that provides inferences for multiple topics, duplicate the existing sub-object and add it to the worker array, then update the topicId, InferenceEndpoint, and Token fields with the appropriate values for each new topic:

"worker": [
    {
        "topicId": 1,
        "inferenceEntrypointName": "api-worker-reputer",
        "loopSeconds": 5,
        "parameters": {
            "InferenceEndpoint": "http://localhost:8000/inference/{Token}",
            "Token": "ETH"
        }
    },
    // worker providing inferences for topic ID 2
    {
        "topicId": 2,
        "inferenceEntrypointName": "api-worker-reputer",
        "loopSeconds": 5,
        "parameters": {
            "InferenceEndpoint": "http://localhost:8000/inference/{Token}", // the specific endpoint providing inferences
            "Token": "ETH" // the token specified in the endpoint
        }
    }
],
Creating the inference server
We will create a very simple Flask application to serve inferences from the Hugging Face model.
Here is an example of our newly created app.py:
from flask import Flask, Response
import requests
import json
import pandas as pd
import torch
from chronos import ChronosPipeline

# create our Flask app
app = Flask(__name__)

# define the Hugging Face model we will use
model_name = "amazon/chronos-t5-tiny"

def get_coingecko_url(token):
    base_url = "https://api.coingecko.com/api/v3/coins/"
    token_map = {
        'ETH': 'ethereum',
        'SOL': 'solana',
        'BTC': 'bitcoin',
        'BNB': 'binancecoin',
        'ARB': 'arbitrum'
    }
    token = token.upper()
    if token in token_map:
        url = f"{base_url}{token_map[token]}/market_chart?vs_currency=usd&days=30&interval=daily"
        return url
    else:
        raise ValueError("Unsupported token")

# define our endpoint
@app.route("/inference/<string:token>")
def get_inference(token):
    """Generate inference for given token."""
    try:
        # use a pipeline as a high-level helper
        pipeline = ChronosPipeline.from_pretrained(
            model_name,
            device_map="auto",
            torch_dtype=torch.bfloat16,
        )
    except Exception as e:
        return Response(json.dumps({"pipeline error": str(e)}), status=500, mimetype='application/json')

    try:
        # get the data from Coingecko
        url = get_coingecko_url(token)
    except ValueError as e:
        return Response(json.dumps({"error": str(e)}), status=400, mimetype='application/json')

    headers = {
        "accept": "application/json",
        "x-cg-demo-api-key": "<Your Coingecko API key>"  # replace with your API key
    }
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        data = response.json()
        df = pd.DataFrame(data["prices"])
        df.columns = ["date", "price"]
        df["date"] = pd.to_datetime(df["date"], unit='ms')
        df = df[:-1]  # removing today's price
        print(df.tail(5))
    else:
        return Response(json.dumps({"Failed to retrieve data from the API": str(response.text)}),
                        status=response.status_code,
                        mimetype='application/json')

    # define the context and the prediction length
    context = torch.tensor(df["price"])
    prediction_length = 1

    try:
        forecast = pipeline.predict(context, prediction_length)  # shape [num_series, num_samples, prediction_length]
        print(forecast[0].mean().item())  # taking the mean of the forecasted prediction
        return Response(str(forecast[0].mean().item()), status=200)
    except Exception as e:
        return Response(json.dumps({"error": str(e)}), status=500, mimetype='application/json')

# run our Flask app
if __name__ == '__main__':
    app.run(host="0.0.0.0", port=8000, debug=True)
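Note that this example loads the Chronos pipeline inside the request handler, so the model is re-initialized on every call. That keeps the example easy to follow, but if your topic has a short cadence, loading the pipeline once at module level is a reasonable optimization.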
Modifying requirements.txt
Update requirements.txt to include the necessary packages for the inference server:
flask[async]
gunicorn[gthread]
transformers[torch]
pandas
git+https://github.com/amazon-science/chronos-forecasting.git
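Before containerizing, you can sanity-check the server by running python app.py directly (it binds to port 8000 with debug enabled). Since requirements.txt pulls in gunicorn, a production-style run such as gunicorn --bind 0.0.0.0:8000 app:app should also work, though the repository's Dockerfile may start the server differently.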
Deployment
Now that the node is configured, let's deploy it and register it with the network. To run the node, follow these steps:
Export Variables
Execute the following command from the root directory:
chmod +x init.config
./init.config
This command will automatically export the necessary variables from the account created. These variables are used by the offchain node and are bundled with your provided config.json
, then passed to the node as environment variables.
If you need to make changes to your config.json file after running the init.config command, rerun:
chmod +x init.config
./init.config
before proceeding.
Request from Faucet
Copy your Allora address and request some tokens from the Allora Testnet Faucet so that you can successfully register your worker in the next step.
Deploy the Node
docker compose up --build
Both the offchain node and the source services will be started. They will communicate through endpoints attached to the internal DNS.
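Note: the localhost value used in InferenceEndpoint earlier assumes the offchain node can reach the inference server on its own host. If the inference server runs as a separate Compose service, point InferenceEndpoint at that service's DNS name instead, for example http://source:8000/inference/{Token} if the service were named source; check the service names in your docker-compose.yml.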
If your node is working correctly, you should see it actively checking for the active worker nonce:
offchain_node | {"level":"debug","topicId":1,"time":1723043600,"message":"Checking for latest open worker nonce on topic"}
A successful response from your Worker should display:
{"level":"debug","msg":"Send Worker Data to chain","txHash":<tx-hash>,"time":<timestamp>,"message":"Success"}
Congratulations! You've successfully deployed and registered your node on Allora.
Testing
You can test your local inference server by performing a GET
request on http://localhost:8000/inference/<token>
.
curl http://localhost:8000/inference/<token>
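For example, to request a BTC inference (assuming the server is running locally on port 8000):

curl http://localhost:8000/inference/BTC

A successful response is a single floating-point number: the mean of the sampled forecast trajectories returned by the model.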