When your automation needs to process tens of thousands of products, running them sequentially in a single job can take hours. Batch processing splits the work into smaller chunks that run in parallel, dramatically reducing total processing time.
## When to Use Batching
Batching is most useful when:
- Your product catalog exceeds 10,000 items
- The automation exports to a slow API endpoint (one request per product)
- You want to process large vendor feeds faster by parallelizing the work
## How Batching Works
When you set `limit_batch_size`, the Automation Engine queries all matching products, divides them into chunks of that size, and enqueues each chunk as a separate parallel job. Each batch processes independently, and a finalization step runs once all batches complete to stitch the results together.
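The chunking step can be pictured as a short sketch (illustrative only; the engine's internals are not part of the public API):

```python
def split_into_batches(product_ids, batch_size):
    """Divide the matching product IDs into fixed-size chunks,
    one chunk per parallel batch job."""
    return [product_ids[i:i + batch_size]
            for i in range(0, len(product_ids), batch_size)]

# With limit_batch_size = 5000, 12,500 matching products
# become three batches: 5000 + 5000 + 2500.
batches = split_into_batches(list(range(12_500)), 5000)
```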
### Batch Lifecycle
- **Enqueueing** -- The parent automation queries all matching product IDs and splits them into batches. Each batch is saved and enqueued as a separate job.
- **Processing** -- Each batch runs as an independent automation, processing only its assigned product IDs. Multiple batches execute in parallel up to the concurrency limit.
- **Complete / Failed** -- Each batch updates its status when it finishes. If a batch fails, it can be retried automatically.
- **Finalization** -- Once all batches reach a terminal state (complete or failed), the finalizer job runs to consolidate results and update the automation log.
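The finalization gate amounts to a simple check over batch statuses. A sketch (not the engine's actual code; status strings are assumed):

```python
TERMINAL_STATES = {"complete", "failed"}

def ready_to_finalize(batch_statuses):
    """The finalizer job runs only after every batch has reached
    a terminal state -- complete or failed."""
    return all(status in TERMINAL_STATES for status in batch_statuses)

ready_to_finalize(["complete", "processing", "complete"])  # False
ready_to_finalize(["complete", "failed", "complete"])      # True
```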
## Configuration Options
### Core Batch Settings
Add these to a `file_configs` entry:
```json
"file_configs": [
  {
    "field_map": { "guid": "sku", "price": "price", "stock": "qty" },
    "update": "edit",
    "limit_batch_size": 5000,
    "limit_batch_concurrency": 10
  }
]
```
| Key | Description | Default |
|---|---|---|
| `limit_batch_size` | Number of items per batch. Maximum is 10,000. | -- (batching disabled) |
| `limit_batch_concurrency` | Maximum number of batches running in parallel at once. Maximum is 100. | 10 |
### Stall Handling
Sometimes a batch can get stuck -- perhaps the worker crashed or the API endpoint became unresponsive. The engine monitors for stalled batches and automatically retries them:
| Key | Description | Default |
|---|---|---|
| `batch_stall_time` | Seconds of inactivity before a batch is considered stalled. | 10800 (3 hours) |
| `batch_max_retries` | Number of times to retry a stalled batch before marking it as failed. | 3 |
When a batch has been inactive longer than `batch_stall_time`, the engine re-enqueues it. If a batch exceeds `batch_max_retries`, it is marked as failed so the remaining batches can finish and the finalizer can run.
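The retry logic can be sketched as follows. This is a minimal illustration, assuming hypothetical `last_activity` and `retries` fields on a batch record; the engine's actual bookkeeping may differ:

```python
import time

def check_batch(batch, stall_time=10800, max_retries=3):
    """Re-enqueue a batch inactive longer than batch_stall_time;
    mark it failed once batch_max_retries is exhausted.
    (Illustrative sketch; field names are hypothetical.)"""
    inactive = time.time() - batch["last_activity"]
    if inactive < stall_time:
        return "running"
    if batch["retries"] >= max_retries:
        return "failed"       # unblocks the finalizer
    batch["retries"] += 1
    batch["last_activity"] = time.time()
    return "re-enqueued"

# A batch silent for 4 hours with one retry left gets re-enqueued.
stuck = {"last_activity": time.time() - 4 * 3600, "retries": 2}
check_batch(stuck)  # "re-enqueued"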
### Batch Throttling
If the target API has strict rate limits, you can throttle how quickly batches are dispatched:
```json
"batch_throttle": {
  "request_limit": 1,
  "time_period": 60
}
```
| Key | Description | Default |
|---|---|---|
| `request_limit` | Number of batches to dispatch within the time period. | Required |
| `time_period` | Time window in seconds. | 1 |
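The pacing behaves roughly like this sketch, which dispatches at most `request_limit` batches, then waits out the `time_period` window (illustrative only; `send` is a stand-in for the real dispatch step):

```python
import time

def dispatch_batches(batches, request_limit=1, time_period=60, send=print):
    """Dispatch at most `request_limit` batches per `time_period` seconds."""
    for i in range(0, len(batches), request_limit):
        for batch in batches[i:i + request_limit]:
            send(batch)
        if i + request_limit < len(batches):   # no wait after the last group
            time.sleep(time_period)
```

With `request_limit: 1` and `time_period: 60`, ten batches take at least nine minutes just to dispatch, so reserve throttling for genuinely rate-limited endpoints.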
## Other Limit Controls
Beyond batching, several other limit settings control how much data an automation processes:
| Key | Description |
|---|---|
| `limit_export` | When exporting with `"payload_multi": true`, limits the number of objects included in each HTTP request. The automation splits into multiple requests to send all data. |
| `limit_import` | Maximum total number of items to process during an import. Items beyond this limit are skipped. |
| `limit_template_size` | Limit a template payload to approximately this size in megabytes. Useful when APIs have request body size limits. |
| `limit_files` | When using regex to match multiple files, limit how many files are processed. |
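One way to picture the size-based split behind `limit_template_size` (a sketch; the engine's actual sizing heuristic is not documented here, and serialized JSON length is only an approximation of payload size):

```python
import json

def split_by_size(objects, limit_mb):
    """Group objects into payloads of at most ~limit_mb megabytes
    of serialized JSON, spilling extras into further payloads."""
    limit = limit_mb * 1024 * 1024
    chunks, current, size = [], [], 0
    for obj in objects:
        obj_size = len(json.dumps(obj).encode("utf-8"))
        if current and size + obj_size > limit:
            chunks.append(current)   # close the current payload
            current, size = [], 0
        current.append(obj)
        size += obj_size
    if current:
        chunks.append(current)
    return chunks
```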
## Example: Processing 50K Products in Batches
Suppose you have a vendor feed with 50,000 products and you want to import updates efficiently:
```json
{
  "name": "Daily Vendor Stock Update",
  "vendor": "Acme Distributor",
  "active": true,
  "schedule": "0 6 * * *",
  "type": "products",
  "action": "import",
  "connection": {
    "type": "sftp",
    "address": "sftp.acme.com",
    "username": "{{sftp_user}}",
    "password": "{{sftp_pass}}",
    "path": "/exports/",
    "port": 22
  },
  "file_configs": [
    {
      "name": "daily_inventory.csv",
      "update": "edit",
      "field_map": {
        "guid": "SKU",
        "stock": "QtyOnHand",
        "price": "DealerPrice"
      },
      "diff_update": true,
      "diff_fields": ["stock", "price"],
      "limit_batch_size": 5000,
      "limit_batch_concurrency": 5,
      "batch_stall_time": 7200,
      "batch_max_retries": 2
    }
  ]
}
```
With this configuration:
- The 50,000 products are split into 10 batches of 5,000 each.
- Up to 5 batches run in parallel at a time.
- If any batch has no activity for 2 hours, it is retried up to 2 times.
- Combined with `diff_update`, only products with actual stock or price changes generate updates, further reducing processing time.
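The batch and wave counts in those bullets fall out of simple ceiling division:

```python
import math

total_products = 50_000
batch_size = 5000          # limit_batch_size
concurrency = 5            # limit_batch_concurrency

n_batches = math.ceil(total_products / batch_size)   # 10 batches
waves = math.ceil(n_batches / concurrency)           # at least 2 waves of 5
```

So even before `diff_update` trims the workload, the run completes in roughly two parallel waves instead of one long sequential pass.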