Walkthrough: Deploying a Hugging Face Model as a Worker Node on the Allora Network
This guide provides a step-by-step process to deploy a Hugging Face model as a Worker Node within the Allora Network. By following these instructions, you will be able to integrate and run models from Hugging Face, contributing to the Allora decentralized machine intelligence ecosystem.
Prerequisites
Before you start, ensure you have the following:
- A Docker environment with docker compose installed.
- Basic knowledge of machine learning and the Hugging Face ecosystem.
- Familiarity with the Allora Network documentation on building and deploying a worker node using Docker.
Overview
During this walkthrough, we will build a worker node from an existing Hugging Face model, then deploy it to participate on the Allora Network. We will use this model to predict the price of BTC 24 hours ahead.
You can find all the files in this Git repository.
In this example, we will use the Chronos model amazon/chronos-t5-tiny. Chronos is a family of pretrained time series forecasting models based on language model architectures. In essence:
- A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss.
- Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context.
Chronos models have been trained on a large corpus of publicly available time series data, as well as synthetic data generated using Gaussian processes.
For simplicity, we will use zero-shot forecasting, which refers to the ability of a model to generate forecasts for datasets it has never seen during training.
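To make this concrete, here is a minimal zero-shot sketch using the same ChronosPipeline API our inference server will rely on later. The five-point price series is made up purely for illustration:

import torch
from chronos import ChronosPipeline

# load the pretrained model; zero-shot use requires no fine-tuning
pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-tiny",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# any 1-D tensor of historical values serves as context (illustrative values only)
context = torch.tensor([100.0, 101.5, 99.8, 102.3, 103.1])

# sample forecast trajectories one step ahead
forecast = pipeline.predict(context, prediction_length=1)  # shape [num_series, num_samples, prediction_length]
print(forecast[0].mean().item())  # mean across the sampled trajectories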
Our worker will provide inferences on the BTC 24h Prediction, which is topic 4 on the Allora Testnet.
Note: To deploy on the Allora Network, you will need to pick the topic ID you wish to generate inference for, or create a new topic.
We will use Coingecko to fetch the data. You will need to create an API key.
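As a preview, the raw request our inference server will make later looks roughly like this; the x-cg-demo-api-key header assumes a demo-tier key:

import requests

# daily BTC/USD prices for the last 30 days
url = "https://api.coingecko.com/api/v3/coins/bitcoin/market_chart?vs_currency=usd&days=30&interval=daily"
headers = {
    "accept": "application/json",
    "x-cg-demo-api-key": "<Your Coingecko API key>",  # replace with your API key
}
prices = requests.get(url, headers=headers).json()["prices"]
print(prices[:3])  # [[timestamp_ms, price], ...]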
Clone the repo
Clone the basic-coin-prediction-node repository. It will serve as the base sample for your quick setup.
git clone https://github.com/allora-network/basic-coin-prediction-node
cd basic-coin-prediction-node
Configure Your Environment
- Copy config.example.json and name the copy config.json.
- Open config.json and update the necessary fields inside the wallet sub-object and the worker config with your specific values:
wallet Sub-object
- nodeRpc: The RPC URL for the network the node will be deployed on.
- addressKeyName: The name you gave your wallet key when setting up your wallet.
- addressRestoreMnemonic: The mnemonic that was output when setting up a new key.
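Putting these fields together, the wallet sub-object might look like the sketch below; every value is a placeholder, and your config.json may contain additional wallet fields beyond these three:

"wallet": {
    "addressKeyName": "my-worker-key",
    "addressRestoreMnemonic": "<your mnemonic>",
    "nodeRpc": "<RPC URL of the network you are deploying to>"
}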
worker Config
- topicId: The specific topic ID you created the worker for.
- InferenceEndpoint: The endpoint exposed by your worker node to provide inferences to the network.
- Token: The token for the specific topic you are providing inferences for. The token needs to be exposed in the inference server endpoint for retrieval.
Note: The Token variable is specific to the endpoint you expose in your app.py file. It is not related to any topic parameter.
The worker config is an array of sub-objects, each representing a different topic ID. This structure allows you to manage multiple topic IDs, each within its own sub-object.

To deploy a worker that provides inferences for multiple topics, duplicate the existing sub-object and add it to the worker array, then update the topicId, InferenceEndpoint, and Token fields with the appropriate values for each new topic:

"worker": [
    {
        "topicId": 1,
        "inferenceEntrypointName": "api-worker-reputer",
        "loopSeconds": 5,
        "parameters": {
            "InferenceEndpoint": "http://localhost:8000/inference/{Token}",
            "Token": "ETH"
        }
    },
    // worker providing inferences for topic ID 2
    {
        "topicId": 2,
        "inferenceEntrypointName": "api-worker-reputer",
        "loopSeconds": 5,
        "parameters": {
            "InferenceEndpoint": "http://localhost:8000/inference/{Token}", // the specific endpoint providing inferences
            "Token": "ETH" // the token specified in the endpoint
        }
    }
],
Creating the inference server
We will create a very simple Flask application to serve inferences from the Hugging Face model.
Here is an example of our newly created app.py:
from flask import Flask, Response
import requests
import json
import pandas as pd
import torch
from chronos import ChronosPipeline

# create our Flask app
app = Flask(__name__)

# define the Hugging Face model we will use
model_name = "amazon/chronos-t5-tiny"

def get_coingecko_url(token):
    base_url = "https://api.coingecko.com/api/v3/coins/"
    token_map = {
        'ETH': 'ethereum',
        'SOL': 'solana',
        'BTC': 'bitcoin',
        'BNB': 'binancecoin',
        'ARB': 'arbitrum'
    }
    token = token.upper()
    if token in token_map:
        url = f"{base_url}{token_map[token]}/market_chart?vs_currency=usd&days=30&interval=daily"
        return url
    else:
        raise ValueError("Unsupported token")

# define our endpoint
@app.route("/inference/<string:token>")
def get_inference(token):
    """Generate inference for given token."""
    try:
        # use a pipeline as a high-level helper
        pipeline = ChronosPipeline.from_pretrained(
            model_name,
            device_map="auto",
            torch_dtype=torch.bfloat16,
        )
    except Exception as e:
        return Response(json.dumps({"pipeline error": str(e)}), status=500, mimetype='application/json')

    try:
        # get the data from Coingecko
        url = get_coingecko_url(token)
    except ValueError as e:
        return Response(json.dumps({"error": str(e)}), status=400, mimetype='application/json')

    headers = {
        "accept": "application/json",
        "x-cg-demo-api-key": "<Your Coingecko API key>"  # replace with your API key
    }
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        data = response.json()
        df = pd.DataFrame(data["prices"])
        df.columns = ["date", "price"]
        df["date"] = pd.to_datetime(df["date"], unit='ms')
        df = df[:-1]  # removing today's price
        print(df.tail(5))
    else:
        return Response(json.dumps({"Failed to retrieve data from the API": str(response.text)}),
                        status=response.status_code,
                        mimetype='application/json')

    # define the context and the prediction length
    context = torch.tensor(df["price"])
    prediction_length = 1

    try:
        forecast = pipeline.predict(context, prediction_length)  # shape [num_series, num_samples, prediction_length]
        print(forecast[0].mean().item())  # taking the mean of the forecasted prediction
        return Response(str(forecast[0].mean().item()), status=200)
    except Exception as e:
        return Response(json.dumps({"error": str(e)}), status=500, mimetype='application/json')

# run our Flask app
if __name__ == '__main__':
    app.run(host="0.0.0.0", port=8000, debug=True)
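Note that this example loads the Chronos pipeline inside the request handler, so the model is re-initialized on every call. That keeps the example easy to follow, but if your topic has a short cadence, loading the pipeline once at module level is a reasonable optimization.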
Modifying requirements.txt
Update requirements.txt to include the necessary packages for the inference server:
flask[async]
gunicorn[gthread]
transformers[torch]
pandas
git+https://github.com/amazon-science/chronos-forecasting.git
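Before containerizing, you can sanity-check the server by running python app.py directly (it binds to port 8000 with debug enabled). Since requirements.txt pulls in gunicorn, a production-style run such as gunicorn --bind 0.0.0.0:8000 app:app should also work, though the repository's Dockerfile may start the server differently.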
Deployment
Now that the node is configured, let's deploy it and register it with the network. To run the node, follow these steps:
Export Variables
Execute the following command from the root directory:
chmod +x init.config
./init.config
This command will automatically export the necessary variables from the account created. These variables are used by the offchain node and are bundled with your provided config.json
, then passed to the node as environment variables.
If you need to make changes to your config.json file after running the init.config command, rerun:
chmod +x init.config
./init.config
before proceeding.
Request from Faucet
Copy your Allora address and request some tokens from the Allora Testnet Faucet so that you can successfully register your worker in the next step.
Deploy the Node
docker compose up --build
Both the offchain node and the source services will be started. They will communicate through endpoints attached to the internal DNS.
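Note: the localhost value used in InferenceEndpoint earlier assumes the offchain node can reach the inference server on its own host. If the inference server runs as a separate Compose service, point InferenceEndpoint at that service's DNS name instead, for example http://source:8000/inference/{Token} if the service were named source; check the service names in your docker-compose.yml.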
If your node is working correctly, you should see it actively checking for the active worker nonce:
offchain_node | {"level":"debug","topicId":1,"time":1723043600,"message":"Checking for latest open worker nonce on topic"}
A successful response from your Worker should display:
{"level":"debug","msg":"Send Worker Data to chain","txHash":<tx-hash>,"time":<timestamp>,"message":"Success"}
Congratulations! You've successfully deployed and registered your node on Allora.
Testing
You can test your local inference server by performing a GET
request on http://localhost:8000/inference/<token>
.
curl http://localhost:8000/inference/<token>
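For example, to request a BTC inference (assuming the server is running locally on port 8000):

curl http://localhost:8000/inference/BTC

A successful response is a single floating-point number: the mean of the sampled forecast trajectories returned by the model.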