ReadyAI
Introduction to ReadyAI
ReadyAI is an open-source initiative aimed at providing a low-cost, resource-minimal data structuring and semantic tagging pipeline for any individual or business. AI runs on structured data, and ReadyAI is a low-cost, structured data pipeline that turns your raw data into structured data for your vector databases and AI applications.
If you are new to Bittensor, please check out the Bittensor Website before proceeding to the setup section.
Key Features
Raw Data in, structured AI Ready Data out
Fractal data mining allows miners to process a wide variety of data sources and create tagged, structured data for the end user’s specific needs
Validators establish a ground truth by tagging the data in full, create data windows for fractal mining, and score miner submissions
Scoring is based on a cosine distance calculation between the miner’s window tagged output and the validator’s ground truth tagged output
ReadyAI has created a low-cost structured data pipeline capitalizing on two key innovations: (1) LLMs are now more accurate and cheaper than human annotators, and (2) distributed compute, rather than distributed human workers, makes this highly scalable
Incentivized mining and validation system for data contribution and integrity
Getting Started
Installation & Compute Requirements
This repository requires a Python version greater than 3.8 and no higher than 3.11. To get started, clone the repository and install the required dependencies:
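A minimal setup sketch, assuming a standard Python project layout (the repository URL is a placeholder; use the official ReadyAI repository):

```bash
# Clone the repository (URL is a placeholder)
git clone https://github.com/<org>/<readyai-repo>.git
cd <readyai-repo>

# Create a virtual environment and install the required dependencies
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```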
Miners and validators using an OpenAI API Key will need a CPU with at least 8 GB of RAM and 20 GB of disk space.
Quickstart Mock Tests
The best way to begin to understand ReadyAI’s data pipeline is to run the unit tests. These tests are meant to provide verbose output so you can see how the process works.
Configuration
Let's configure your instance and run the tests that verify everything is set up properly.
You'll need to duplicate the dotenv file to set up your own configuration:
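For example (the example file's exact name may differ in your checkout):

```bash
cp example.env .env
```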
Use your editor to open the .env file, and follow the instructions to enter the required API Keys and configurations. An OpenAI API key is required by both miners and validators. GPT-4o is the default LLM used for all operations, as it is the cheapest and most performant model accessible via API. Please see LLM Selection below for more information.
A Weights and Biases Key is required by both miners and validators as well.
Please follow all instructions in the .env
If you're on a Linux box, the nano editor is usually the easiest:
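```bash
nano .env
```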
LLM Selection
LLM utilization is required in this subnet to annotate raw data. For both miners and validators, GPT-4o is the default LLM used for all operations. If you wish to override this default, you can follow the override instructions below or in your .env file. After completing the steps in Configuration, open your .env file and view the options. Currently, we offer out-of-the-box configuration for the OpenAI, Anthropic, and Groq APIs.
To change the default OpenAI model used by your miner or validator, you must first uncomment LLM_TYPE_OVERRIDE=openai and then select your model using the OPENAI_MODEL parameter in the .env:
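For example (the model name shown is illustrative):

```bash
LLM_TYPE_OVERRIDE=openai
OPENAI_MODEL=gpt-4o-mini  # illustrative model name
```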
If you wish to use a provider other than OpenAI, you select your LLM Override by uncommenting a line in this section of the .env:
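The section looks roughly like this; uncomment the one line you want (a sketch -- check your .env for the exact lines):

```bash
# LLM_TYPE_OVERRIDE=openai
# LLM_TYPE_OVERRIDE=anthropic
# LLM_TYPE_OVERRIDE=groq
```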
Please ensure you only have one LLM_TYPE_OVERRIDE config parameter uncommented before moving on. Once you have selected the LLM_TYPE, follow prompts in the .env file to fill in required fields for your override LLM provider.
Running the Tests
Once you have finalized your configuration, you can run the miner loop test suite, which now exercises the entire flow (validator + miner). These tests use conversations from ReadyAI's test API or a local source and your OpenAI key, but they do not touch the Bittensor network.
First, set up a fresh virtual environment for running tests, and install the test requirements. (These requirements differ from production; keep them separate from your normal venv.)
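A sketch, assuming the test requirements file is named requirements-test.txt (check your checkout for the exact filename):

```bash
python3 -m venv venv-test
source venv-test/bin/activate
pip install -r requirements-test.txt  # filename is an assumption
```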
Miner Loop Test
Run the miner loop test:
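The exact test filename is an assumption -- check the tests/ folder in your checkout. A typical invocation looks like:

```bash
python -m pytest -s --disable-warnings tests/test_miner_lib.py
```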
This test will:
Start a validator
Obtain a test conversation from the ReadyAI API (or from your local API if configured)
Details on how to use a local API are here
Generate ground-truth tags
Break the conversation into windows
Behave like 3 miners
Send conversation windows to the miners
Each miner:
Processes the window with the LLM
Generates tags, annotations, and embeddings
Returns metadata to the validator
The validator:
Receives metadata
Scores tags against the full ground truth
Pushes metadata into the store
You’ll see detailed logs of scoring and metadata evaluation. Example output is shown below.
Notes
These tests run outside the Bittensor network (so no emissions).
They require your OpenAI key in .env.
The API hosts and ports used in the tests also come from the .env. Check the corresponding section for more details.
If you see errors, check your .env and Python environment and re-run.
Once the test passes and you're ready to connect to the testnet, please see Registration.
For testing with a local API
If you want to use the local API instead, you need to follow the steps here
Then modify the .env to point at the web server. Comment out the lines:
Uncomment the lines:
If you want your run to be uploaded to WandB, set WAND_ENABLED=1
After these changes, the DB Read/Write Configuration section of the .env file should look like this:
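An illustrative sketch only -- the exact variable names live in your .env's DB Read/Write Configuration section; these are placeholders:

```bash
# Placeholders -- match the variable names in your own .env
# Read endpoint pointed at the local web server:
CGP_API_READ_HOST=http://localhost
CGP_API_READ_PORT=10000
# Write endpoint pointed at the local web server:
CGP_API_WRITE_HOST=http://localhost
CGP_API_WRITE_PORT=10000
```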
Now you can run the test script and see the data written properly (replace the filename with your database file).
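If the script name in your checkout is unclear, an equivalent check with sqlite3 (use your configured database filename):

```bash
sqlite3 conversations.sqlite "SELECT * FROM results LIMIT 5;"
```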
Or from the Docker:
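A sketch -- the container name and in-container database path are placeholders:

```bash
docker exec -it <container-name> sqlite3 /path/to/conversations.sqlite "SELECT * FROM results LIMIT 5;"
```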
That will provide some of the data inserted into the results table.
API Metrics
The API exposes a /metrics endpoint that you can scrape with Prometheus to have information about the usage of your API.
By default, the basic metrics are exposed, but there is also a custom one:
api_requests_total is a counter that is incremented every time a request is received. It is labeled with the api_key, ip, path, and status of the request.
Feel free to add more and open a PR. If they help you they will help someone else!
To scrape the metric endpoint of the API with a local Prometheus deployment, add this to your scrape_configs:
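For example (the job name is up to you; adjust the target to your API's host and port):

```yaml
scrape_configs:
  - job_name: "readyai-api"
    static_configs:
      - targets: ["localhost:10000"]  # adjust to your API's host:port
```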
If you host it somewhere else, adjust the target to use your specific IP and port combination.
Registration
Before mining or validating, you will need a UID, which you can acquire by following the documentation on the Bittensor website here.
To register on testnet, add the flag --subtensor.network test to your registration command, and specify --netuid 138, which is our testnet subnet uid.
To register on mainnet, specify --netuid 33, which is our mainnet subnet uid.
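For example (wallet names are placeholders, and btcli syntax can vary slightly between versions):

```bash
# Testnet (subnet uid 138)
btcli subnet register --netuid 138 --subtensor.network test --wallet.name <coldkey> --wallet.hotkey <hotkey>

# Mainnet (subnet uid 33)
btcli subnet register --netuid 33 --wallet.name <coldkey> --wallet.hotkey <hotkey>
```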
Subnet Roles
Mining
You can launch your miners on testnet using the following command.
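A sketch, assuming the standard neurons/miner.py entry point (wallet names and port are placeholders):

```bash
python neurons/miner.py --netuid 138 --subtensor.network test \
  --wallet.name <coldkey> --wallet.hotkey <hotkey> --axon.port <open port> --logging.debug
```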
To run with pm2 please see instructions here
If you are running on runpod, please read instructions here.
To setup and run a miner with Docker, see instructions here.
Once you've registered on mainnet SN33, you can start your miner with this command:
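A sketch under the same assumptions as the testnet command above:

```bash
python neurons/miner.py --netuid 33 \
  --wallet.name <coldkey> --wallet.hotkey <hotkey> --axon.port <open port>
```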
Validating
To run a validator, you will first need to generate a ReadyAI Conversation Server API Key. Please see the guide here. If you wish to validate via a local datastore, please see the section below on Validating with a Custom Conversation Server.
You can launch your validator on testnet using the following command.
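A sketch, assuming the standard neurons/validator.py entry point (wallet names and port are placeholders):

```bash
python neurons/validator.py --netuid 138 --subtensor.network test \
  --wallet.name <coldkey> --wallet.hotkey <hotkey> --axon.port <open port> --logging.debug
```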
To run with pm2 please see instructions here
If you are running on runpod, please read instructions here
To setup and run a validator with Docker, see instructions here.
Once you've registered on mainnet SN33, you can start your validator with this command:
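A sketch under the same assumptions as the testnet command above:

```bash
python neurons/validator.py --netuid 33 \
  --wallet.name <coldkey> --wallet.hotkey <hotkey> --axon.port <open port>
```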
Validating with a Custom Conversation Server
Validators, by default, access the ReadyAI API to retrieve conversations and store results. However, the subnet is designed to be a decentralized “Scale AI” where each validator can sell access to their bandwidth for structuring raw data. The validator can run against any of its own data sources and process custom or even proprietary data.
Make sure the raw data source is reasonably large. We recommend 50,000 input items at a minimum to prevent miners from re-using previous results.
The Code
In the web/ folder, you will find a sample implementation of a Custom Server setup. You will want to modify this server for your own needs.
The relevant code files in the web/ folder include:
readyai_conversation_data_importer.py -- An example processor that reads the ReadyAi/5000-podcast-conversations-with-metadata-and-embedding-dataset, processes a subset of it, and inserts it into the conversations.sqlite data store
facebook_conversation_data_importer.py -- An example processor that reads the subset of the Facebook conversation data and processes it into the conversations.sqlite data store
app.py -- A FastAPI-based web server that provides both the read and write endpoints for the conversation server
Data files include:
facebook-chat-data_2000rows.csv -- A 128-conversation subset of the Facebook conversation data (full data available here: https://www.kaggle.com/datasets/atharvjairath/personachat/data)
Additional files included:
start_conversation_store.sh -- A convenient bash file to start the server
Converting the Example Data
Install dependencies and navigate to the proper folder:
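Assuming you start at the repository root (the requirements filename in web/ is an assumption):

```bash
cd web
pip install -r requirements.txt
```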
Now you will run the data importer script to populate the SQLite database:
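```bash
python readyai_conversation_data_importer.py
```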
This will download the training data from ReadyAi/5000-podcast-conversations-with-metadata-and-embedding-dataset and insert the conversations into the conversations.sqlite database. If you delete conversations.sqlite, the script will create a new one and re-insert the data.
You can also use facebook_conversation_data_importer.py if you want another dataset!
After launching the command, you should see progress like this:
If you have sqlite3 installed, you can open the database file and see the inserted data like this:
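```bash
sqlite3 conversations.sqlite
# then, at the sqlite> prompt:
#   .tables
#   SELECT * FROM conversations LIMIT 1;
```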
That will show you the tables in the database (only 1 -- conversations) and then you will see one of the conversations like this:
With the data populated, you're ready to start running the server.
Important: Do not run your validator against this example dataset on mainnet. Please use a custom dataset of at least 50,000 raw data sources to prevent miners from re-using previous results. Modify this script to process and load the data from a more robust data store of your choosing.
🚀 Running the Conversation Server Locally
Using the Prebuilt Docker Image
This section shows you how to quickly run the API server using a prebuilt Docker image, with no build step required.
The image comes preloaded with a conversations.sqlite database containing 4,888 podcast conversations ready for training or testing.
Steps
Create Your .env File
Copy the example environment file and create your own configuration:
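```bash
# The example file's exact name may differ in your checkout
cp example.env .env
```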
Configure the Environment
Open the .env file and set the TYPE variable to api (you will also need to adjust the endpoints for your needs, as explained here):
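```bash
TYPE=api
```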
[!IMPORTANT] If you are a validator and you want to use the local API and test dataset to send conversations to miners, you must set TYPE=validator and START_LOCAL_CGP_API=true instead.
Start the Server
Run the following script to launch the server:
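The launch script's name lives in the repo; as a sketch, it wraps something equivalent to:

```bash
docker compose up -d
```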
This will:
Download the Docker image if not already present
Start the API using Docker Compose
To build the image yourself instead of using the prebuilt one:
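A sketch, assuming the repo's Compose file defines a build section:

```bash
docker compose build
```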
Verifying the Server
If the server starts correctly, your logs should show something like:
Test the API
Make a test request to verify it’s running:
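For example (host and port are assumptions -- use the values from your .env):

```bash
curl http://localhost:10000/
```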
Expected output:
From the Python Code
This section will walk you through how to get the server up and running from the available Python code.
To get the server up and running, you can use the bash file:
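```bash
# Run from the web/ folder
bash start_conversation_store.sh
```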
To run this in pm2, please follow the installation instructions here and then use the command:
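```bash
# Process name is up to you
pm2 start start_conversation_store.sh --name conversation-store
```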
Important: By default, the API will return a random task ("conversation_tagging", "webpage_metadata_generation", "survey_metadata_generation").
If you want it to return a specific task for testing purposes, you can add the following body to the post:
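A guess at the shape only -- the exact field name is defined in app.py, so treat this as a placeholder:

```json
{ "task": "conversation_tagging" }
```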
Helpful Guides
Using Runpod
Runpod is a very helpful resource for easily launching and managing cloud GPU and CPU instances; however, there are several configuration settings that must be applied both on Runpod and in your start command for the subnet.
Choosing an Instance
To run the subnet code for ReadyAI, you'll need either a GPU or a CPU, depending on your subnet role and configuration.
If you are a miner or validator using an OpenAI API Key, you will need a CPU with at least 8 GB of RAM and 20 GB of disk space. Runpod provides basic CPU units of different processing powers.
Configuring Your Instance
Runpod instances are dockerized. As a result, specific port configurations are needed to run processes over the network.
When you are launching your pod, and have selected your instance, click "Edit Template."
With the editing window open, you can adjust your container disk space and/or volume disk space to match the needs of your neuron, and you can expose additional ports. You will need to expose symmetrical TCP ports, which requires you to specify non-standard ports >=70000 in the "Expose TCP ports" field. Add however many ports you will need (we recommend at least 2, or more if you want to run additional miners).
Now, you can deploy your instance. Once it is deployed, navigate to your pods, find the instance you just launched, click "Connect," and navigate to the "TCP Port Mappings" tab. Here, you should see your symmetrical TCP port IDs.
NOTE: Even though the ports do not match the original values of 70000 and 70001, two symmetrical port mappings were created. These can be used for Bittensor neurons.
Starting Your Neuron
Important!! You will need to add one of these ports to your start command for the neuron you are running, using the flag
--axon.port <port ID>
Every process will require a unique port, so if you run a second neuron, you will need a second Port ID.
Running a Subtensor on Runpod
Unfortunately, there is no stable and reliable way to run a local subtensor on a Runpod Instance. You can, however, leverage another cloud provider of your choice to run a Subtensor, and connect to that local subtensor using the --subtensor.chain_endpoint <your chain endpoint> flag in your neuron start command. For further information on running a local subtensor, please see the Bittensor Docs.
Managing Processes
While there are many options for managing your processes, we recommend either pm2 or Screen. Please see below for instructions on installing and running pm2
Making sure your port is open
For nodes to talk together properly, it's imperative the ports they use are open to communication. Miner/Validator communication is done via HTTP, therefore you have to ensure your node can receive that type of traffic on the port you serve in your Axon.
Example: If your axon is served on 123.123.123.123:22222, you must ensure HTTP traffic works on port 22222.
To easily validate if the port is open and receives traffic, you can do the following:
Get on the machine you want to validate the port on
Start a temporary Python HTTP server on the specified port using the following command:
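```bash
python3 -m http.server 22222  # replace 22222 with the port you want to test
# Expected startup message:
#   Serving HTTP on 0.0.0.0 port 22222 (http://0.0.0.0:22222/) ...
```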
If you see this, it worked!
Test connectivity from another machine by running:
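```bash
curl http://123.123.123.123:22222  # use your server's public IP and port
```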
Check the response:
If you see incoming requests in the terminal of the server machine, the port is open and functioning correctly.
If no request appears, the traffic is being blocked. You may need to investigate firewall settings, network rules, or port forwarding configurations.
pm2 Installation
To install pm2 on your Ubuntu device, use:
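```bash
sudo apt update
sudo apt install -y npm
sudo npm install -g pm2
```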
The basic command structure to run a process in pm2 is below:
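```bash
pm2 start <script> --name <process name> --interpreter <interpreter> -- <script arguments>
```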
Running a Miner with PM2
To run a miner with PM2, you can use the following template:
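A sketch, assuming the standard miner entry point (placeholders in angle brackets):

```bash
pm2 start neurons/miner.py --name readyai-miner --interpreter python3 -- \
  --netuid 33 --wallet.name <coldkey> --wallet.hotkey <hotkey> --axon.port <open port>
```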
Running a Validator with PM2
To run a validator with PM2, you can use the following template:
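A sketch, assuming the standard validator entry point (placeholders in angle brackets):

```bash
pm2 start neurons/validator.py --name readyai-validator --interpreter python3 -- \
  --netuid 33 --wallet.name <coldkey> --wallet.hotkey <hotkey> --axon.port <open port>
```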
Useful PM2 Commands
The following commands will be useful for management:
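```bash
pm2 list              # show all managed processes
pm2 logs <name>       # stream a process's logs
pm2 restart <name>    # restart a process
pm2 stop <name>       # stop a process
pm2 delete <name>     # remove a process from pm2
pm2 save              # persist the process list across restarts
```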
Running a Miner or a Validator with Docker
Requirements
A hotkey registered on subnet 33 (or 138 if you want to run on the test network)
Getting up and running
Follow these steps to set up and run a miner or validator using Docker:
1. Configure Your Wallet
Ensure that your coldkey and hotkey are properly set up on the machine you intend to use. These should be stored in:
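```
~/.bittensor/wallets/
```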
It is best practice, when you regenerate your coldkey on a machine used to mine or validate, to regenerate only the public coldkey using the following command:
btcli w regen-coldkeypub
2. Set Up Environment Variables
At the root of the repository, create a copy of the environment variables file:
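```bash
# The example file's exact name may differ in your checkout
cp example.env .env
```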
3. Update Configuration
Modify the .env file to include your specific values and ensure all required fields are set:
Do not forget to set your OpenAI API Key
Set TYPE=miner to run a miner, or TYPE=validator to run a validator.
Set NETWORK=finney to run on the main net, or NETWORK=test to run on the test net.
Don't forget that the port you chose has to be open and able to receive HTTP requests. To validate this, follow the steps here.
If you are a validator:
Do not forget to set your WANDB_API_KEY and to set WAND_ENABLED to 1.
On Finney, do not forget to set up your ReadyAI API key by following the steps here, and make sure you have a file called readyai_api_data.json containing your API key.
On testnet, copy the provided API key file in the root of the repository from testnet_readyai_api_data.json to readyai_api_data.json using cp testnet_readyai_api_data.json readyai_api_data.json. It will be pre-loaded in the Docker automatically.
4. Start the Node
Once the configuration is complete, start the node using:
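A sketch, assuming the repo's Compose file defines the node service:

```bash
docker compose up -d
```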
5. Monitor Logs
To check the node logs:
List running containers:
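```bash
docker ps
```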
Find the CONTAINER ID of your node.
Stream the logs:
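```bash
docker logs -f <CONTAINER ID>
```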
This setup ensures that your miner or validator runs smoothly within a Docker environment. 🚀
How to Run a Bittensor Miner on Subnet 33 Using Runpod
This guide walks you through the process of deploying a Bittensor miner on Subnet 33 using Runpod. In a few simple steps, you’ll go from zero to mining on the testnet or mainnet.
✅ Requirements
A Runpod.io account
An OpenAI API key
🔐 Generate Your SSH Key
You’ll use SSH to access your miner. On your local machine, run the following:
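```bash
ssh-keygen -t ed25519
```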
Then, add your key to the SSH agent:
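```bash
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519
```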
🚀 Deploy the Pod on Runpod
Click the template link to start deploying 👉 Launch Template
Create a Network Volume
This is your miner's persistent storage. Choose a data center with available CPU instances.

Set a volume name and allocate 10 GB (enough for typical usage).
✅ Tip: Make sure the volume is selected before creating the pod.
Choose a Pod Configuration
Select the cheapest available CPU option — it’s sufficient for this subnet.
Edit the Template and Set Environment Variables
| Variable | Value |
| --- | --- |
| NETWORK | Use test for testnet or finney for mainnet. |
| OPENAI_API_KEY | Create a Runpod secret named openai_key containing your API key. |
| SSH_PUBLIC_KEY | Paste the contents of your public SSH key (usually in ~/.ssh/id_ed25519.pub). |
🧳 First Boot: Create Your Wallet
Once the pod starts, it will download the Docker image and initialize. You’ll see an error if your wallet isn't yet created — that's expected.
Connect via SSH
In your pod's Connect section, you'll find the SSH command to access the miner. Use that to SSH in:
Create Your Wallet (if you don’t already have one) or recreate your wallet if you do!
Your wallet must be in /workspace/wallets in order to be picked up by the miner and persisted.
Option 1: Create your wallet
We will create both the hotkey and coldkey at the same time:
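A sketch (wallet names are placeholders; note the --wallet.path so the keys land in persistent storage):

```bash
btcli wallet create --wallet.path /workspace/wallets \
  --wallet.name <coldkey name> --wallet.hotkey <hotkey name>
```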
🔒 IMPORTANT: Back up your wallet to your local machine using scp:
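```bash
# Run from your local machine -- host, port, and destination are placeholders
scp -P <ssh port> -i ~/.ssh/id_ed25519 -r root@<pod ip>:/workspace/wallets ./wallets-backup
```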
Option 2: Recreate your wallet
First recreate your public coldkey:
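```bash
# Wallet path and name flags are assumptions -- adjust to your setup
btcli w regen-coldkeypub --wallet.path /workspace/wallets --wallet.name <coldkey name>
```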
Then recreate your hotkey:
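```bash
# Flags are assumptions -- adjust to your setup
btcli w regen-hotkey --wallet.path /workspace/wallets --wallet.name <coldkey name> --wallet.hotkey <hotkey name>
```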
🌍 Make Sure Your Miner is Reachable
Validators need to be able to reach your miner's Axon port.
Click Connect on your pod.

Note the Direct TCP port (not port 22). This is your Axon port and must be publicly accessible.

To test if it's reachable:
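```bash
curl http://<pod public ip>:<direct tcp port>  # values from your pod's TCP Port Mappings
```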
You should:
Get a JSON-like response in your terminal

See logs in your miner indicating it received a connection

🔑 Registering your miner
You will need:
Enough TAO to pay the registration fee
You can register by following these steps:
Connect via SSH into your miner as explained here
Run the following command and complete the prompts:
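A sketch (wallet names and path are placeholders):

```bash
btcli subnet register --netuid 138 --network test \
  --wallet.path /workspace/wallets --wallet.name <coldkey name> --wallet.hotkey <hotkey name>
```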
You can change the --netuid to 33 and the --network to finney if you want to register on the main Bittensor network
🎉 Your Miner Is Live!
You should now see logs indicating that the miner is running and active. When it actually handles validator requests, you will see logs like this: 
It can take up to 1 hour before validators send requests to your miner.
📌 Final Notes
If you're using testnet, you can get TAO from the faucet.
Keep an eye on your pod’s logs to monitor performance and connection health.
Don’t forget to monitor your wallet, especially if you're switching to mainnet!
ReadyAI Overview
ReadyAI uses the Bittensor infrastructure to annotate raw data, creating the structured data that is the "oil" required by AI applications to operate.
Benefits
Cost-efficiency: Our validators can generate structured data from any arbitrary raw text data. ReadyAI provides a cost-efficient pipeline for the processing of unstructured data into the valuable digital commodity of structured data.
Quality: By using advanced language models and built-in quality control via the incentive mechanism arbitrated by validation, we can achieve more consistent, higher-quality annotations compared to crowd workers.
Speed: AI-powered annotation can process data orders of magnitude faster than human annotators.
Flexibility: The decentralized nature of our system allows it to rapidly scale and adapt to new task types. Validators can independently sell access to this data generation pipeline to process any type of text-based data (e.g., conversational transcripts, corporate documents, web-scraped data, etc.)
Specialized knowledge: Unlike general-purpose crowd workers, our AI models can be fine-tuned on domain-specific data, allowing for high-quality annotations on specialized topics.
System Design
Data stores: Primary source of truth, fractal data windows, and vector embedding creation
Validator roles: Pull data, generate overview metadata for the data ground truth, create windows, and score submissions
Miner roles: Process data windows, provide metadata and annotations
Data flow: Ground truth establishment, window creation, miner submissions, scoring, and validation
Reward Mechanism
The reward mechanism for the ReadyAI subnet is designed to incentivize miners to contribute accurate and valuable metadata to the ReadyAI dataset. Three miners are selected by a validator to receive the same Data Window, which is pulled from a larger raw data source. After they generate a set of tags for their assigned window, miners are rewarded based on the quality and relevance of their tags, as evaluated by validators against the set of tags for the full, ground truth data source.
A score for each miner-submitted tag is derived by a cosine distance calculation from the embedding of that tag to the vector neighborhood of the ground truth tags. The set of miner tags is then evaluated in full based on the mean of their top 3 unique tag scores (55% weight), the overall mean score of the set of tags submitted (25% weight), the median score of the tags submitted (10% weight) and their single top score (10% weight). The weights for each scoring component prioritize the overall goal of the miner– to provide unique and meaningful tags on the corpus of data – while still allowing room for overlap between the miner and ground truth tag sets, which is an indication of a successful miner. There are also a set of penalties that will be assessed if the miner response doesn’t meet specific requirements - such as not providing any tags shared with the ground truth, not providing a minimum number of unique tags, and not providing any tags over a low-score threshold. The tag scoring system informs the weighting and ranking of each server in the subnet.
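In symbols, the composite score described above is:

$$
S = 0.55\,\overline{s}_{\text{top-3 unique}} + 0.25\,\overline{s}_{\text{all}} + 0.10\,\tilde{s} + 0.10\,s_{\max}
$$

where $\overline{s}_{\text{top-3 unique}}$ is the mean of the miner's three highest-scoring unique tags, $\overline{s}_{\text{all}}$ is the mean over all submitted tags, $\tilde{s}$ is the median tag score, and $s_{\max}$ is the single top score.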
License
This repository is licensed under the MIT License.