Build and Deploy a Worker using the Allora Model Development Kit (MDK)
The Allora MDK is an open-source GitHub repository that allows users to spin up an inference model for over 7,000 cryptocurrencies and stocks. The MDK leverages the Tiingo API as a data feed for these cryptocurrencies and stocks, although custom datasets can be integrated as well.
Let's walk through the steps needed to download, train, and evaluate a given model on a custom dataset, and then deploy this trained model onto the network.
Regression Techniques
Each of these regression techniques is implemented at a basic level and is available out of the box in the Model Development Kit (MDK). These models provide a foundation that you can build upon to create more advanced solutions.
| Model | Description |
|---|---|
| ARIMA | Auto-Regressive Integrated Moving Average model used for time series forecasting by modeling the dependencies between data points. |
| LSTM | Long Short-Term Memory neural network, a type of recurrent neural network (RNN) that excels at capturing long-term dependencies in sequential data, like time series. |
| Prophet | A forecasting model developed by Facebook, designed to handle seasonality and make predictions over long time horizons. |
| Random Forest | An ensemble learning method for regression tasks that builds multiple decision trees and outputs the average prediction from individual trees. |
| Random Forest (Time Series) | A time series variant of Random Forest, optimized for predicting time-dependent variables. |
| Regression | A simple linear regression model for predicting continuous values based on input features. |
| Regression (Time Series) | A time series version of basic regression models, optimized for forecasting trends over time. |
| XGBoost | Extreme Gradient Boosting, a highly efficient and scalable implementation of gradient boosting machines for regression tasks, often used for time series forecasting. |
| XGBoost (Time Series) | A time series-specific adaptation of XGBoost, tuned for forecasting with sequential data. |
Although these models are already integrated into the MDK, you can add more models as well as modify existing ones to create a better inference model tailored to your specific needs.
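As a standalone illustration (not MDK code) of the kind of model behind the simplest of these techniques, the sketch below fits a basic time-series linear regression on synthetic prices using lagged values as features, roughly what the Regression (Time Series) entry describes:

```python
# Standalone illustration only -- not taken from the MDK.
# A basic time-series regression: predict the next price from the previous 5 prices.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
prices = np.cumsum(rng.normal(0, 1, 500)) + 100  # synthetic random-walk price series

lags = 5
X = np.column_stack([prices[i:len(prices) - lags + i] for i in range(lags)])  # lagged features
y = prices[lags:]                                                             # next-step targets

model = LinearRegression().fit(X[:-50], y[:-50])  # train on all but the last 50 points
preds = model.predict(X[-50:])                    # forecast the held-out tail
print("first 3 forecasts:", preds[:3].round(2))
```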
Installation
Clone the MDK Repository
Run the following commands in a new terminal window:
git clone https://github.com/allora-network/allora-model-maker.git
cd allora-model-maker
Conda not Installed?
On Mac, simply use brew to install Miniconda:
brew install miniconda
Create Conda Environment
conda env create -f environment.yml
If you want to set it up manually:
conda create --name modelmaker python=3.9 && conda activate modelmaker
pip install setuptools==72.1.0 Cython==3.0.11 numpy==1.24.3
Install Dependencies
pip install -r requirements.txt
Add Tiingo API Key
Go to tiingo.com and set up an API key after creating an account, then add it to your .env file:
# .env
TIINGO_API_KEY=your_tiingo_api_key
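Optionally, you can sanity-check the key before training. The snippet below is not part of the MDK workflow; it loads the key with python-dotenv and calls Tiingo's connection-test endpoint (if that endpoint has changed, any authenticated Tiingo request will do):

```python
# Optional sanity check, not part of the MDK workflow.
import os
import requests
from dotenv import load_dotenv  # pip install python-dotenv requests

load_dotenv()  # reads TIINGO_API_KEY from the .env file in the current directory
api_key = os.environ["TIINGO_API_KEY"]

# Tiingo's connection-test endpoint (per Tiingo's API docs at the time of writing)
resp = requests.get("https://api.tiingo.com/api/test", params={"token": api_key}, timeout=10)
print(resp.status_code, resp.json())
```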
Usage
Model Training
make train
Running the above command guides you through a series of prompts for building a training set around the cryptocurrency or stock you choose as a target variable.
Select the Data Source
After running make train, the command line will prompt you to select your dataset:
Select the data source:
1. Tiingo Stock Data
2. Tiingo Crypto Data
3. Load data from CSV file
Enter your choice (1/2/3):
- Although the MDK is natively integrated with Tiingo, you can also train on any custom dataset by loading it from a CSV file, as shown in the sketch below.
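The exact CSV schema the MDK expects isn't covered here, so check the repository's examples; as a generic illustration (filename hypothetical), a typical time-series CSV with a date column can be inspected with pandas before training:

```python
# Generic illustration only; confirm the column names the MDK expects in its docs/examples.
import pandas as pd

df = pd.read_csv("my_prices.csv", parse_dates=["date"])  # e.g. columns: date, open, high, low, close, volume
df = df.sort_values("date").set_index("date")
print(df.tail())
```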
Select the Target Variable
After selecting your data source, you will be prompted to pick a target variable for your model to provide inferences on.
Enter the crypto symbol (default: btcusd):
Select the Time Interval
Next, you'll have to select the time interval. The time interval determines how frequently the data points are sampled or aggregated over a given period of time.
- If you're dealing with smaller epoch lengths, shorter intervals like minutes or seconds might be necessary to capture rapid changes in the market.
- For longer epoch lengths, you may choose daily, weekly, or monthly intervals.
Enter the frequency (1min/5min/4hour/1day, default: 1day):
Using shorter time intervals increases CPU power requirements because the dataset grows significantly. More data points lead to larger memory consumption, longer data processing times, and more complex computations. The CPU has to handle more input/output operations, and models take longer to train due to the higher volume of data needed to capture patterns effectively.
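For a rough sense of scale, here is the approximate number of rows one year of data produces at each supported frequency (ignoring market closures for stocks):

```python
# Approximate rows per year at each sampling frequency -- why shorter intervals cost more.
minutes_per_year = 365 * 24 * 60
for label, minutes in [("1min", 1), ("5min", 5), ("4hour", 240), ("1day", 1440)]:
    print(f"{label:>5}: ~{minutes_per_year // minutes:,} rows per year")
```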
Start and End Date
When selecting the start and end dates for your training data, keep in mind that larger time periods result in more data, requiring increased CPU power and memory. Longer timeframes capture more trends but also demand greater computational resources, especially during model training.
Enter the start date (YYYY-MM-DD, default: 2021-01-01):
Enter the end date (YYYY-MM-DD, default: 2024-10-20):
Selecting Models to Train
Now that we've set up our data source, target variable, and time interval, it's time to select the models to train on. In the prompt, you can either choose to train on all available models or make a custom selection.
Select the models to train:
1. All models
2. Custom selection
Enter your choice (1/2):
If you opt for Custom selection, you will be prompted to choose from the regression techniques listed earlier, such as ARIMA, LSTM, Random Forest, or XGBoost. You can select the models that are best suited for your specific problem or dataset.
Model Evaluation
After selecting and training the models, the next step is to evaluate them. The MDK provides built-in tools to assess the performance of your model using standard metrics like Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). Simply run:
make eval
This will generate performance reports, helping you identify the best model to deploy.
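For reference, the two headline metrics can be reproduced independently of the MDK's reports with a few lines of scikit-learn:

```python
# How MAE and RMSE are computed (illustrative values, not MDK output).
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([101.0, 103.5, 102.2, 105.0])  # actual prices
y_pred = np.array([100.5, 104.0, 101.0, 106.2])  # model predictions

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # RMSE = sqrt(MSE)
print(f"MAE: {mae:.3f}, RMSE: {rmse:.3f}")
```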
Deployment
Deploying a model requires packaging your trained model from the MDK and integrating it with a worker node repository before exposing the worker as an endpoint.
Package your Trained Model
Run the following command to package your model for the Allora worker:
make package-arima
Replace arima with the name of the model you’d like to package (e.g., lstm, xgboost, etc.).
This will:
- Copy the model’s files and dependencies into the packaged_models/package folder.
- Run tests for inference and training to validate functionality in a worker.
- Generate a configuration file, config.py, that contains the active model information.
Clone the Allora Worker Repository
Run the following commands in a new terminal window:
git clone https://github.com/allora-network/allora-worker.git
cd allora-worker
Integrate your Model
After running the packaging command:
- Navigate to the packaged_models folder in your allora-model-maker repo.
- Copy the package folder into the src folder of your allora-worker repository.
If you did this correctly, your allora-worker repo will now contain allora-worker/src/package.
Deploy your Worker
Expose the Endpoint
Run:
MODEL=ARIMA make run
cd src && uvicorn main:app --reload --port 8000
Replace ARIMA with the name of the model you’d like to run (e.g., LSTM, XGBOOST, etc.).
This will expose your endpoint, which will be called when a worker nonce is available. If your endpoint is exposed successfully, you should see the following output on your command line:
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
You can query your endpoint in the CLI by running:
curl http://127.0.0.1:8000/inference
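The same check from Python, if you prefer to script it (the response body depends on the model you packaged):

```python
# Scripted equivalent of the curl check above.
import requests

resp = requests.get("http://127.0.0.1:8000/inference", timeout=10)
print(resp.status_code, resp.text)
```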
Deploy to the Network
Now that you have a specific endpoint that can be queried for an inference output, you can paste the endpoint into the config.json file of your prediction node repository.
Configure Your Environment
- Copy example.config.json and name the copy config.json.
- Open config.json and update the necessary fields inside the wallet sub-object and worker config with your specific values:
wallet Sub-object
- nodeRpc: The RPC URL for the corresponding network the node will be deployed on
- addressKeyName: The name you gave your wallet key when setting up your wallet
- addressRestoreMnemonic: The mnemonic that was outputted when setting up a new key
worker Config
- topicId: The specific topic ID you created the worker for.
- InferenceEndpoint: The endpoint exposed by your worker node to provide inferences to the network.
- Token: The token for the specific topic you are providing inferences for. The token needs to be exposed in the inference server endpoint for retrieval.

The Token variable is specific to the endpoint you expose in your main.py file. It is not related to any topic parameter.
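To make the relationship concrete, a hypothetical FastAPI route in main.py that exposes the token as a path parameter (matching the /inference/{Token} template used below) could look like this; the actual main.py shipped with allora-worker may differ:

```python
# Hypothetical sketch only; the real route in allora-worker's main.py may differ.
from fastapi import FastAPI

app = FastAPI()

@app.get("/inference/{token}")
def inference(token: str):
    # Look up or compute the latest prediction for the requested token,
    # e.g. by calling into the packaged model.
    prediction = 1234.56  # placeholder value for illustration
    return {"token": token, "prediction": prediction}
```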
The worker config is an array of sub-objects, each representing a different topic ID. This structure allows you to manage multiple topic IDs, each within its own sub-object.
To deploy a worker that provides inferences for multiple topics, duplicate the existing sub-object and add it to the worker array. Update the topicId, InferenceEndpoint, and Token fields with the appropriate values for each new topic:
"worker": [
{
"topicId": 1,
"inferenceEntrypointName": "api-worker-reputer",
"loopSeconds": 5,
"parameters": {
"InferenceEndpoint": "http://localhost:8000/inference/{Token}",
"Token": "ETH"
}
},
// worker providing inferences for topic ID 2
{
"topicId": 2,
"inferenceEntrypointName": "api-worker-reputer",
"loopSeconds": 5,
"parameters": {
"InferenceEndpoint": "http://localhost:8000/inference/{Token}", // the specific endpoint providing inferences
"Token": "ETH" // The token specified in the endpoint
}
}
],
Then run:
make node-env
make compose
- This will load your config into your environment and spin up your Docker node, which will check for open worker nonces and submit inferences to the network.
If your node is working correctly, you should see it actively checking for the active worker nonce:
offchain_node | {"level":"debug","topicId":1,"time":1723043600,"message":"Checking for latest open worker nonce on topic"}
A successful response from your Worker should display:
{"level":"debug","msg":"Send Worker Data to chain","txHash":<tx-hash>,"time":<timestamp>,"message":"Success"}