Batch rendering at scale: queues, retries, idempotency, cost model

Batch rendering is the act of running the same pinned template version across a dataset to produce many outputs (PNG, PDF, MP4) in a predictable, operationally safe way.

Success looks like this: you can launch a "render 10,000 outputs" job, finish within an expected time window, and get a complete manifest that explains what succeeded, what failed, what was skipped, and exactly how to re-run only the missing subset without duplicating outputs.

This is where teams usually get burned when they "just throw it on a queue":

Retries create duplicates because the system is not truly idempotent
One bad record poisons the whole batch
Font or asset drift makes outputs non-repeatable across time
A small failure rate (1–2%) becomes a support and ops nightmare
Costs become unpredictable because there is no measurement loop

If you are implementing batch rendering around an embedded template/editor stack, treat rendering like payments: expect retries, expect partial failures, and design for deterministic, auditable outcomes.

Quick start (a working Polotno render in about a minute)

This is a minimal "hello world" you can run locally to prove the end-to-end loop: load a template JSON, merge one record, and export a PNG/PDF.

1) Install

bash

npm init -y
npm i polotno react-konva konva

If you are adding this to an existing React app, you can skip npm init -y.

2) Create a tiny template + dataset

Use the same mental model as variable data printing (VDP): one template, many records.

javascript

const dataset = [
  {
    customer_name: 'Sarah Patel',
    invoice_id: 'INV-2026-0412',
    due_date: '2026-05-10',
    amount_cents: 129900,
    currency: 'USD',
    plan: 'Business',
    is_overdue: false,
  },
  {
    customer_name: 'James Chen',
    invoice_id: 'INV-2026-0413',
    due_date: '2026-04-01',
    amount_cents: 259900,
    currency: 'USD',
    plan: 'Enterprise',
    is_overdue: true,
  },
];

const designJSON = {
  width: 900,
  height: 520,
  pages: [
    {
      background: '#0B1220',
      children: [
        {
          id: 'title',
          type: 'text',
          x: 48,
          y: 56,
          width: 800,
          text: 'Invoice {{invoice_id}}',
          fontSize: 44,
          fontFamily: 'Arial',
          fill: '#ffffff',
        },
        {
          id: 'to',
          type: 'text',
          x: 48,
          y: 124,
          width: 800,
          text: 'Customer: {{customer_name}}',
          fontSize: 20,
          fontFamily: 'Arial',
          fill: '#C7D2FE',
        },
        {
          id: 'line',
          type: 'text',
          x: 48,
          y: 168,
          width: 800,
          text: 'Plan: {{plan}}  ·  Due: {{due_date}}',
          fontSize: 18,
          fontFamily: 'Arial',
          fill: '#A5B4FC',
        },
        {
          id: 'amount',
          type: 'text',
          x: 48,
          y: 240,
          width: 800,
          text: 'Total: {{amount_formatted}}',
          fontSize: 34,
          fontFamily: 'Arial',
          fill: '#ffffff',
        },
        {
          id: 'status',
          type: 'text',
          x: 48,
          y: 300,
          width: 800,
          text: '{{status_label}}',
          fontSize: 18,
          fontFamily: 'Arial',
          fill: '#FCA5A5',
        },
      ],
    },
  ],
};

3) Replace variables (deterministically)

Keep the merge step pure and deterministic. No network calls, no timestamps.

javascript

const formatMoney = ({ amount_cents, currency }) => {
  const amount = amount_cents / 100;
  return new Intl.NumberFormat('en-US', {
    style: 'currency',
    currency,
  }).format(amount);
};

const buildRenderData = (record) => {
  const amount_formatted = formatMoney(record);
  const status_label = record.is_overdue
    ? 'Status: Payment overdue'
    : 'Status: Pending payment';

  return {
    ...record,
    amount_formatted,
    status_label,
  };
};

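// Escape a value so it can be spliced into a JSON string literal:
// quotes, backslashes, or newlines in record data would otherwise break JSON.parse.
const escapeForJson = (value) => JSON.stringify(String(value)).slice(1, -1);

const replaceVariables = (template, data) => {
  let json = JSON.stringify(template);
  Object.keys(data).forEach((key) => {
    const regex = new RegExp(`\\{\\{${key}\\}\\}`, 'g');
    // A replacer function inserts the value literally; a plain replacement string
    // would interpret "$" sequences such as "$&" or "$$" inside record data.
    json = json.replace(regex, () => escapeForJson(data[key]));
  });
  return JSON.parse(json);
};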

4) Render + export (prototype loop)

This loop is intentionally sequential so it is easy to follow. In production, this becomes your queue worker.

javascript

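// Prototype loop for the browser, where saveAsImage/saveAsPDF trigger file downloads.
// Exact wiring depends on your app setup. Create the store once, for example:
//   import { createStore } from 'polotno/model/store';
//   const store = createStore({ key: 'YOUR_API_KEY', showCredit: true });
// Then, per record: store.loadJSON(json) -> store.waitLoading() -> store.saveAsImage() / store.saveAsPDF()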

const renderAll = async ({ store }) => {
  const baseTemplate = designJSON;

  for (const record of dataset) {
    const renderData = buildRenderData(record);
    const personalized = replaceVariables(baseTemplate, renderData);

    await store.loadJSON(personalized);
    await store.waitLoading();

    await store.saveAsImage({
      fileName: `${record.invoice_id}.png`,
      pixelRatio: 2,
    });

    // Or export a PDF per record
    // await store.saveAsPDF({ fileName: `${record.invoice_id}.pdf` });
  }

  await store.loadJSON(baseTemplate);
};

5) What to do next

If you want to sanity-check the editing experience, open the Polotno Studio editor and try a couple of exports.

If you are embedding this into your own product, follow the SDK getting started guide and wire the same render loop into a worker.

1) What "batch rendering" means (in template systems)

Batch rendering means executing the same template version across many records to produce outputs deterministically.

In template systems, it typically shows up as:

Personalized PDFs for direct mail, invoices, or statements
Social and ad creative variants generated from a product feed
On-demand document generation inside a SaaS product
Branded assets generated from AI outputs and structured data

Typical triggers include campaign sends, nightly exports, CRM segment updates, product feed changes, or "re-render everything with the latest brand template."

Typical deliverables include:

One output file per record (preferred for traceability)
Optional merged PDFs for print houses or stakeholder review
A machine-readable manifest of all records and their output URLs

2) Core concepts (define once)

These definitions are intentionally operational. Use them in logs, manifests, and dashboards.

Record: One unit of personalization. Example: one customer, one product SKU, one listing, one invoice.
Dataset: A versioned collection of records. Prefer "dataset version" over "current snapshot."
Template version: An immutable snapshot of the template (and its dependencies) used to render outputs. Treat it as a build artifact.
Render job: The batch-level unit of work. Example: "render template v42 for dataset v7."
Render task: The per-record unit of work inside a job.
Attempt: A single execution of a render task. Each attempt has an outcome and error metadata.
Determinism vs best-effort: Deterministic means same inputs produce the same output. Best-effort means you accept drift.
Idempotency key: A stable identifier used to deduplicate execution and outputs across retries and replays.
Dead letter queue (DLQ): The holding area for tasks that exceeded retries or are deemed non-retriable.

3) Reference architecture (end-to-end)

A resilient batch renderer is an assembly line, not a single service.

Pipeline flow (happy path):

1. A producer creates a render job from a dataset and a pinned template version.
2. The producer expands that job into per-record render tasks and pushes them to a durable queue.
3. Stateless workers claim tasks, fetch assets, render, and upload outputs.
4. Workers write per-record results to a manifest store (or job database).
5. A control plane tracks progress and enables pause, cancel, or partial rerun.
6. A reconciler verifies expected vs produced outputs and generates a final report.
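The happy path can be prototyped in memory before any infrastructure exists. The sketch below is hypothetical: `expandJob` plays the producer, a plain array stands in for the durable queue, a `Map` stands in for the manifest store, and `renderRecord` is whatever function your worker uses to render and upload one record. None of these names come from a specific library.

```javascript
// Producer: expand a job that pins template + dataset versions into one task per record.
function expandJob(job, recordIds) {
  if (!job.template_version_id || !job.dataset_version_id) {
    throw new Error('job must pin template_version_id and dataset_version_id');
  }
  const seen = new Set();
  const tasks = [];
  for (const recordId of recordIds) {
    if (seen.has(recordId)) continue; // a duplicated record id would otherwise yield a duplicate task
    seen.add(recordId);
    tasks.push({
      job_id: job.job_id,
      task_id: `${job.job_id}:${recordId}`,
      template_version_id: job.template_version_id,
      dataset_version_id: job.dataset_version_id,
      record_id: recordId,
    });
  }
  return tasks;
}

// Worker loop + reconciler, with in-memory stand-ins for the queue and manifest store.
async function runJob(job, recordIds, renderRecord) {
  const tasks = expandJob(job, recordIds);
  const queue = [...tasks];
  const manifest = new Map(); // record_id -> { status, output_url?, error? }

  while (queue.length > 0) {
    const task = queue.shift();
    try {
      const outputUrl = await renderRecord(task);
      manifest.set(task.record_id, { status: 'success', output_url: outputUrl });
    } catch (err) {
      // One bad record is recorded and skipped; it does not stop the batch.
      const message = err instanceof Error ? err.message : String(err);
      manifest.set(task.record_id, { status: 'failed', error: message });
    }
  }

  // Reconciler: compare what should exist (tasks) with what the manifest reports.
  const missing = tasks.filter((t) => !manifest.has(t.record_id)).map((t) => t.record_id);
  const failed = [...manifest].filter(([, e]) => e.status === 'failed').map(([id]) => id);
  return { manifest, missing, failed };
}
```

In a real deployment each piece is a separate process. The useful part of the sketch is the contract: the producer owns the list of expected tasks, and the reconciler judges the manifest against that list rather than against whatever workers happened to report.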

Reference components:

Producer: Creates jobs and tasks. Owns "what should exist."
Queue: Durable message broker with visibility timeouts and retry semantics.
Workers: Stateless render executors. Horizontal scale. Strict resource limits.
Asset layer: Cached fonts/images and deterministic asset resolution.
Output storage: Object store (S3/GCS/etc.) with stable paths and optional CDN.
Control plane: Job state, throttling, cancel/pause, run history, reruns.

If Polotno SDK is part of your rendering stack, the integration path is typically: template creation inside your product (embedded editor) → store template JSON as a versioned artifact → render that template JSON on a backend worker and export to PNG/PDF/MP4.

For an integration starting point, see the SDK getting started guide.

4) Job payload design (what to include)

Your payload should be debug-friendly, safe, and stable across time.

Minimum fields:

template_version_id (required): The immutable identifier.
record_id: Stable ID used in logs and output naming.
record payload or pointer: Either include the data or reference a signed URL / database row version.
output spec: Format (pdf/png/jpg/mp4), size, bleed, DPI, naming rules.
trace/correlation IDs: For joining logs across producer, queue, worker, and storage.
idempotency key: See next section.

Payload guidance:

Prefer a pointer to a versioned payload if records are large or contain PII.
Include a schema version in the payload so workers can handle evolution safely.
Include explicit locale/timezone formatting directives if text/date formatting matters.
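A sketch of a task payload that follows the field list above. It assumes the record is referenced by a pointer (for example a database row version or a signed URL) rather than embedded, and that locale and time zone travel inside the output spec. `buildTaskPayload`, `parsePayload`, and `SCHEMA_VERSION` are hypothetical names.

```javascript
const SCHEMA_VERSION = 1;

// Producer side: build a payload and fail fast on missing required fields.
function buildTaskPayload({ templateVersionId, recordId, recordPointer, outputSpec, traceId, idempotencyKey }) {
  const missing = [];
  if (!templateVersionId) missing.push('template_version_id');
  if (!recordId) missing.push('record_id');
  if (!recordPointer) missing.push('record_pointer');
  if (!outputSpec || !outputSpec.format) missing.push('output_spec.format');
  if (!idempotencyKey) missing.push('idempotency_key');
  if (missing.length > 0) {
    throw new Error(`invalid render payload, missing: ${missing.join(', ')}`);
  }
  return {
    schema_version: SCHEMA_VERSION,
    template_version_id: templateVersionId,
    record_id: recordId,
    record_pointer: recordPointer, // e.g. a row version or signed URL, keeping PII out of the queue
    output_spec: { ...outputSpec }, // format, size, bleed, DPI, locale, time zone, naming rules
    trace_id: traceId,
    idempotency_key: idempotencyKey,
  };
}

// Worker side: refuse payload versions this worker build does not understand.
function parsePayload(payload) {
  if (payload.schema_version !== SCHEMA_VERSION) {
    throw new Error(`unsupported schema_version: ${payload.schema_version}`);
  }
  return payload;
}
```

Rejecting unknown schema versions on the worker side is what makes a rolling deploy safe: old workers fail loudly (and retry later) instead of misreading a newer payload.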

5) Idempotency & deterministic outputs

"Same inputs → same output" is non-negotiable if you want partial reruns, safe retries, and predictable debugging.

Pin everything that can drift:

Template version (not "latest template")
Font files and font versions
Images and external assets, ideally by content hash
Rendering runtime version (or container image digest)
Any "data transforms" used to map raw records into template-ready fields

Idempotency key: how to compute it

A practical pattern is to compute an idempotency key as a hash of:

template_version_id
dataset_version_id (or record_version)
record_id
output_spec (size, format, bleed)
renderer_version

If any of those change, you want a new key. If they do not change, you want the exact same output path and the system should dedupe safely.
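A minimal sketch of that key in Node.js, assuming SHA-256 over a canonical (key-sorted) JSON encoding so that field order in the output spec never changes the hash. `computeIdempotencyKey` and its parameter names are illustrative, not part of any SDK.

```javascript
const { createHash } = require('node:crypto');

// Canonical JSON: object keys sorted recursively, so equal inputs always serialize identically.
function canonicalJson(value) {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(value[k])}`).join(',')}}`;
  }
  return JSON.stringify(value);
}

function computeIdempotencyKey({ templateVersionId, datasetVersionId, recordId, outputSpec, rendererVersion }) {
  const fields = {
    template_version_id: templateVersionId,
    dataset_version_id: datasetVersionId,
    record_id: recordId,
    output_spec: outputSpec,
    renderer_version: rendererVersion,
  };
  for (const [name, v] of Object.entries(fields)) {
    // A missing field would silently collapse distinct renders onto one key.
    if (v === undefined || v === null) throw new Error(`idempotency key input missing: ${name}`);
  }
  return createHash('sha256').update(canonicalJson(fields)).digest('hex');
}
```

Because `renderer_version` is part of the key, deploying a new renderer image intentionally produces new keys, and therefore new output paths, rather than silently overwriting older outputs.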

Output naming + dedupe strategy

Store outputs at a stable path derived from the idempotency key.
Make uploads conditional. Example: "upload only if object does not already exist," or "compare content hash."
Write manifest entries with the idempotency key so the reconciliation step can detect duplicates.
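The storage half of that strategy can be sketched with an in-memory stand-in. `InMemoryObjectStore.putIfAbsent` is hypothetical; with a real object store you would use its own create-only precondition (for example Google Cloud Storage's `ifGenerationMatch: 0`, or an `If-None-Match: *` conditional write where Amazon S3 supports it), so check your store's documentation for the exact semantics.

```javascript
const { createHash } = require('node:crypto');

// In-memory stand-in for an object store with a "create only if absent" write.
class InMemoryObjectStore {
  constructor() {
    this.objects = new Map(); // path -> Buffer
  }
  putIfAbsent(path, bytes) {
    if (this.objects.has(path)) return false;
    this.objects.set(path, bytes);
    return true;
  }
  get(path) {
    return this.objects.get(path);
  }
}

const sha256 = (bytes) => createHash('sha256').update(bytes).digest('hex');

// Stable path derived only from the idempotency key, so every retry targets the same object.
const outputPathFor = (idempotencyKey, format) => `renders/${idempotencyKey}.${format}`;

function storeOutput(store, { idempotencyKey, format, bytes }) {
  const path = outputPathFor(idempotencyKey, format);
  const created = store.putIfAbsent(path, bytes);
  // If the object already existed, compare content hashes: a mismatch means a
  // supposedly deterministic render drifted, which is worth alerting on, not overwriting.
  const drifted = !created && sha256(store.get(path)) !== sha256(bytes);
  return { path, created, drifted };
}
```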

Handling non-deterministic inputs

The most common determinism killers are timestamps, random IDs, and "current state" lookups.

Guardrails:

Forbid non-deterministic fields in templates (or isolate them behind an explicit "best-effort" mode).
Never call external APIs at render time unless you version and cache responses.
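One way to enforce the first guardrail is a lint step that runs when a template version is published, and again before enqueue. The sketch assumes `{{name}}` placeholders, as in the quick start, and a hypothetical deny-list of names that imply time or randomness; adjust both to your own template conventions.

```javascript
// Hypothetical deny-list: placeholder names that would make outputs vary between runs.
const NON_DETERMINISTIC = new Set(['now', 'today', 'timestamp', 'random', 'uuid']);

// Returns a list of problems; an empty list means the template passed.
function lintTemplate(templateJson, allowedFields) {
  const text = JSON.stringify(templateJson);
  const problems = new Set(); // a Set so repeated placeholders are reported once
  for (const match of text.matchAll(/\{\{\s*([\w.]+)\s*\}\}/g)) {
    const name = match[1];
    if (NON_DETERMINISTIC.has(name)) {
      problems.add(`non-deterministic placeholder: ${name}`);
    } else if (!allowedFields.has(name)) {
      problems.add(`unknown placeholder: ${name}`);
    }
  }
  return [...problems];
}
```

Flagging unknown placeholders as well catches the quieter failure: a typo in a field name that would otherwise render the literal `{{...}}` text into thousands of outputs.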

6) Retry strategy (the hard part)

Retries are a feature only if the system can tolerate replays.

Classify failures:

Transient: Network timeouts, temporary storage errors, short-lived rate limits.
Deterministic: Bad data, missing required fields, invalid images, broken template JSON.
Systemic: A bad deploy, a broken renderer build, an expired credential, an upstream outage.

Backoff strategy and max attempts

Use exponential backoff with jitter for transient errors.
Keep max attempts low enough to protect the system under incident conditions.
Consider separate retry policies per error class.

When to stop retrying and route to DLQ

A render task should go to the DLQ when:

It is clearly deterministic (same failure twice with identical inputs).
It exceeds max attempts.
It fails with an explicit non-retriable validation error.
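Those three rules fit into a single decision function that a worker calls after each failed attempt. This is a sketch under assumptions: `errorClass` comes from your own classifier ('transient', 'deterministic', 'systemic', or 'unknown'), and `errorSignature` is a hypothetical stable fingerprint, such as error type plus code, used to spot the same failure repeating with identical inputs.

```javascript
// attempts = number of attempts made so far, including the one that just failed.
function nextStep({ errorClass, nonRetriable, attempts, maxAttempts, errorSignature, previousSignature }) {
  // Explicit validation failures never improve on retry.
  if (nonRetriable) return { action: 'dlq', reason: 'non_retriable_validation' };
  // Bad data or a broken template: retrying just burns capacity.
  if (errorClass === 'deterministic') return { action: 'dlq', reason: 'deterministic' };
  // An unclassified error that repeats exactly is treated as deterministic.
  if (errorClass === 'unknown' && previousSignature !== undefined && errorSignature === previousSignature) {
    return { action: 'dlq', reason: 'same_failure_twice' };
  }
  if (attempts >= maxAttempts) return { action: 'dlq', reason: 'max_attempts' };
  return { action: 'retry' };
}
```

Note that systemic errors only hit the attempt cap here. Pausing the job from the control plane during a bad deploy or an expired credential avoids filling the DLQ with records that were never actually at fault.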

Capturing error artifacts

To debug "1% failures," you need artifacts. Capture at least:

Error type + message
Template version and renderer version
The resolved record payload (or pointer)
Asset fetch failures with URLs and status codes
Optional: a proof render (low-res) or screenshot of the broken output

Polotno-focused note: if you expose a programmatic rendering API internally, keep it stable and versioned. The more "debuggability" you add (render logs, asset resolution logs), the faster your support loop becomes.

7) Partial reruns and reconciliation

The goal is not "all green." The goal is "explainable outcomes and safe reruns."

Re-render only failed records

Store per-record states so you can re-queue only:

failed
timed_out
skipped_due_to_dependency

Keep "success" immutable unless you intentionally invalidate it (template version change, renderer change, or data change).
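Selecting the rerun set is then a pure function over the manifest. In this sketch (status names from the list above), records with no manifest entry at all are re-queued too, while `success` and any other status, including a DLQ status, are left alone. Treating DLQ entries as needing a human decision is an assumption here, not a rule.

```javascript
const RERUNNABLE = new Set(['failed', 'timed_out', 'skipped_due_to_dependency']);

// expectedRecordIds comes from the producer; manifestEntries from the manifest store.
function selectRerun(expectedRecordIds, manifestEntries) {
  const latest = new Map(manifestEntries.map((e) => [e.record_id, e])); // last entry per record wins
  return expectedRecordIds.filter((id) => {
    const entry = latest.get(id);
    return entry === undefined || RERUNNABLE.has(entry.status);
  });
}
```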

Produce a reconciliation report

Your final output should be a report that includes:

Total records expected
Total outputs produced
Success count
Failure count
Skipped count
DLQ count
Links to a manifest and any DLQ artifacts
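A sketch of the report builder. It assumes the manifest holds one upserted entry per `record_id` with a `status` of `success`, `failed`, `timed_out`, a `skipped_*` value, or `dlq`; the field names and link parameters are illustrative. Records that were expected but never reported are listed separately, because "missing" is a different problem from "failed".

```javascript
function buildReconciliationReport(expectedRecordIds, manifestEntries, links = {}) {
  const expected = new Set(expectedRecordIds);
  const latest = new Map(manifestEntries.map((e) => [e.record_id, e])); // dedupe by record_id
  const counts = { success: 0, failed: 0, skipped: 0, dlq: 0 };
  const outputUrls = new Set();
  const unexpected = [];

  for (const [recordId, entry] of latest) {
    if (!expected.has(recordId)) {
      unexpected.push(recordId); // produced but never asked for: investigate
      continue;
    }
    if (entry.status === 'success') {
      counts.success += 1;
      outputUrls.add(entry.output_url);
    } else if (entry.status === 'dlq') {
      counts.dlq += 1;
    } else if (entry.status.startsWith('skipped')) {
      counts.skipped += 1;
    } else {
      counts.failed += 1; // failed, timed_out, and anything unrecognized
    }
  }

  return {
    total_expected: expected.size,
    total_outputs: outputUrls.size,
    success_count: counts.success,
    failure_count: counts.failed,
    skipped_count: counts.skipped,
    dlq_count: counts.dlq,
    missing_record_ids: [...expected].filter((id) => !latest.has(id)),
    unexpected_record_ids: unexpected,
    manifest_url: links.manifestUrl ?? null,
    dlq_artifacts_url: links.dlqArtifactsUrl ?? null,
  };
}
```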

"Exactly-once" vs "at-least-once" reality

Most queues give you at-least-once delivery. Accept it.

Design consequences:

Workers must be idempotent.
Output storage must be dedupe-safe.
Manifests must be upserted using a stable key.
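Those three consequences combine into a handler that is safe to run twice on the same message. In this sketch `manifest` and `store` are plain `Map`s standing in for the manifest store and object storage, and `render` is a hypothetical function returning output bytes. A duplicate delivery that arrives after success is acknowledged without rendering; two concurrent deliveries may both render, but they still produce one object and one manifest row.

```javascript
async function handleDelivery(task, { manifest, store, render }) {
  // 1) Cheap check first: a replay of an already-successful task is acknowledged as-is.
  const existing = manifest.get(task.idempotency_key);
  if (existing && existing.status === 'success') {
    return { rendered: false, entry: existing };
  }

  // 2) Render, then write to a key-derived path only if nothing is there yet.
  const bytes = await render(task);
  const path = `renders/${task.idempotency_key}.${task.format}`;
  if (!store.has(path)) store.set(path, bytes);

  // 3) Upsert the manifest on the stable key, never append.
  const entry = { record_id: task.record_id, output_path: path, status: 'success' };
  manifest.set(task.idempotency_key, entry);
  return { rendered: true, entry };
}
```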

8) Throughput, concurrency, and backpressure

Scaling batch rendering is primarily about protecting shared dependencies.

Worker concurrency limits

Set strict CPU and memory budgets per worker.
Limit concurrent renders per node. Rendering is often memory-heavy.
Add a global cap per job to keep large jobs from starving smaller ones.
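A minimal in-process sketch of both caps: `createLimiter` is a small promise semaphore (similar in spirit to libraries such as p-limit), used once per node and once per job. The limits are placeholders, and these caps only hold within one process; a fleet-wide per-job cap needs shared state, for example a counter kept by your control plane.

```javascript
// Promise semaphore: at most `limit` functions run at once; the rest wait in FIFO order.
function createLimiter(limit) {
  let active = 0;
  const waiting = [];
  const next = () => {
    if (active >= limit || waiting.length === 0) return;
    active += 1;
    const { fn, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(fn)
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        next();
      });
  };
  return (fn) =>
    new Promise((resolve, reject) => {
      waiting.push({ fn, resolve, reject });
      next();
    });
}

const nodeLimit = createLimiter(4); // hypothetical: 4 concurrent renders per node
const jobLimits = new Map(); // job_id -> limiter

function runRender(jobId, renderFn, perJobCap = 2) {
  if (!jobLimits.has(jobId)) jobLimits.set(jobId, createLimiter(perJobCap));
  // Take the job slot first, then the node slot, so a job that is at its cap
  // never sits on node capacity another job could use.
  return jobLimits.get(jobId)(() => nodeLimit(renderFn));
}
```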

Queue visibility timeout / lease

Ensure the visibility timeout exceeds worst-case render time.
Extend leases when a render is still active.
If a task exceeds the lease and gets re-delivered, idempotency prevents duplicate outputs.
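Lease extension is usually a heartbeat around the render. In this sketch `extendLease` is a hypothetical callback; on Amazon SQS, for instance, it would wrap a `ChangeMessageVisibility` call, and other brokers have their own equivalents. The interval should sit comfortably below the lease length so a single slow extension does not let the lease lapse.

```javascript
async function withLeaseHeartbeat(task, extendLease, work, { intervalMs = 10_000 } = {}) {
  const timer = setInterval(() => {
    // Fire-and-forget: a failed extension is logged; if the lease lapses and the task
    // is redelivered, idempotent outputs keep the duplicate harmless.
    Promise.resolve()
      .then(() => extendLease(task))
      .catch((err) => console.error('lease extension failed', task.task_id, err));
  }, intervalMs);
  try {
    return await work(task);
  } finally {
    clearInterval(timer); // stop extending as soon as the render settles, success or failure
  }
}
```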

Rate limits when fetching assets

Asset fetches often become your bottleneck.

Apply per-domain rate limits.
Cache aggressively.
Pre-warm assets for large jobs.
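Per-domain rate limiting can be as small as a token bucket keyed by hostname. This sketch is in-process only and assumes the caller waits the returned number of milliseconds (or re-queues the fetch) before trying again; `now` is injectable so the behavior can be tested without real time. Enforcing the limit across a fleet would need a shared cache or a fetch proxy.

```javascript
// Token bucket per hostname: up to `burst` requests at once, refilled at `ratePerSec`.
class DomainRateLimiter {
  constructor({ ratePerSec, burst, now = () => Date.now() }) {
    this.ratePerSec = ratePerSec;
    this.burst = burst;
    this.now = now;
    this.buckets = new Map(); // hostname -> { tokens, updatedAt }
  }

  // Returns 0 if the fetch may proceed now, else how many ms to wait before retrying.
  tryAcquire(url) {
    const host = new URL(url).hostname;
    const t = this.now();
    const bucket = this.buckets.get(host) ?? { tokens: this.burst, updatedAt: t };
    // Refill proportionally to elapsed time, never above the burst size.
    const elapsedSec = (t - bucket.updatedAt) / 1000;
    bucket.tokens = Math.min(this.burst, bucket.tokens + elapsedSec * this.ratePerSec);
    bucket.updatedAt = t;
    this.buckets.set(host, bucket);
    if (bucket.tokens >= 1) {
      bucket.tokens -= 1;
      return 0;
    }
    return Math.ceil(((1 - bucket.tokens) / this.ratePerSec) * 1000);
  }
}
```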

Protecting shared resources

Avoid hitting your primary DB on every render. Use payload pointers and caching.
Treat fonts as build artifacts, not runtime fetches.

9) Caching and asset efficiency

Caching is the difference between "rendering is expensive" and "rendering is predictable."

Font caching across jobs

Package fonts into the worker image or mount a versioned font bundle.
Cache fonts by content hash so you can safely reuse across jobs.

Image caching and resizing strategy

Cache original images and derived sizes.
Resize once, reuse many times.
Keep "render-time transforms" minimal.
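A sketch of "resize once, reuse many times": derived images are cached under a key built from the original's content hash plus the requested width and format, so the same source at the same size is resized once per worker. `resize` is a hypothetical async function wrapping whatever image library you use, and the cache is unbounded here; a real worker would use an LRU or a shared cache instead.

```javascript
const { createHash } = require('node:crypto');

class DerivedImageCache {
  constructor(resize) {
    this.resize = resize; // async (bytes, width, format) => bytes
    this.derived = new Map(); // "hash:width:format" -> Promise<bytes>
  }

  static contentHash(bytes) {
    return createHash('sha256').update(bytes).digest('hex');
  }

  get(originalBytes, width, format) {
    const key = `${DerivedImageCache.contentHash(originalBytes)}:${width}:${format}`;
    if (!this.derived.has(key)) {
      // Cache the promise, so concurrent requests for the same size share one resize.
      const pending = Promise.resolve().then(() => this.resize(originalBytes, width, format));
      // Drop failed entries so a transient resize error is not cached forever.
      pending.catch(() => this.derived.delete(key));
      this.derived.set(key, pending);
    }
    return this.derived.get(key);
  }
}
```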

Warm pools vs cold starts

For spiky workloads, keep a warm pool of workers.
For steady workloads, autoscale with measured concurrency.

10) Output packaging patterns

Your packaging should match your consumer.

One file per record vs merged PDF

One file per record is best for traceability, partial reruns, and downstream automation.
Merged PDFs are useful for print houses or stakeholder review, but make partial reruns harder.

Folder layout

A clean, predictable layout avoids accidental overwrites:

/job_id/record_id/output.pdf
/job_id/record_id/output.png
/job_id/manifest.json

Manifest schema

At minimum: record_id → output_url + status + idempotency_key + error metadata.

Compression / zip bundles (optional)

Only compress if a downstream consumer truly needs it. Compression adds CPU and creates large "single artifact" failure modes.

11) Cost model (make it concrete)

Rendering cost is not mysterious. It is measurable and controllable.

Cost drivers

CPU time per render (seconds)
Memory footprint (drives instance sizing)
Asset bandwidth and egress
Storage (outputs + manifests + artifacts)
Proof runs and retries

Levers

Reduce render time with caching and prefetching
Reduce retries by validating records before enqueue
Reduce egress with regional storage/CDN placement
Reduce storage by expiring intermediate artifacts
Bound concurrency to avoid cascading failures

Estimation method

Sample 100 representative records.
Measure p50 and p95 render time, memory usage, and asset bytes.
Extrapolate to the full dataset.
Add a safety margin for incident conditions (asset slowness, transient retries).

A practical rule: you should be able to predict job duration and compute cost to within a small factor before you press "run." If you cannot, you are missing instrumentation.
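The estimation method reduces to a few lines of arithmetic. This sketch uses nearest-rank percentiles and assumes compute is billed per worker-hour of busy time and transfer per GB; every price, the worker count, and the 1.5x safety margin are placeholder inputs you supply, not benchmarks.

```javascript
// Nearest-rank percentile over a non-empty sample.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

function estimateJob({ sampleRenderSec, sampleAssetBytes, totalRecords, workers, pricePerWorkerHour, pricePerGB, safetyMargin = 1.5 }) {
  if (sampleRenderSec.length === 0 || sampleAssetBytes.length === 0) {
    throw new Error('need a non-empty sample');
  }
  // Compute: mean render time scaled to the full dataset, padded for incident conditions.
  const computeSec = mean(sampleRenderSec) * totalRecords * safetyMargin;
  const computeCost = (computeSec / 3600) * pricePerWorkerHour;
  // Transfer: mean asset bytes per record scaled to the full dataset.
  const transferCost = ((mean(sampleAssetBytes) * totalRecords) / 1e9) * pricePerGB;
  return {
    p50: percentile(sampleRenderSec, 50),
    p95: percentile(sampleRenderSec, 95),
    wallClockHours: computeSec / workers / 3600,
    computeCost,
    transferCost,
    total: computeCost + transferCost,
  };
}
```

For example, a sample averaging 2 s per render, 10,000 records, and the 1.5x margin gives 30,000 compute-seconds, or about 0.83 hours of wall-clock time across 10 workers. If p95 sits far above p50, size the lease timeout and the margin from p95, not the mean.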

12) Observability and operations

Assume incidents happen. Operate the renderer like production infrastructure.

Metrics

Success rate per template version
p50 / p95 render time
Retries per job and per error class
Queue backlog and age
DLQ volume
Asset fetch error rate

Logging

Log with record_id, job_id, template_version_id, idempotency key.
Emit structured logs so you can aggregate by template version or customer.

Alerting

Spike in failure rate above baseline
Rapid growth in queue backlog
DLQ growth
Sudden increase in render time (possible asset issues or regression)

13) Security & compliance (if dataset has PII)

Batch rendering often touches sensitive data. Keep it tight.

Minimize payload exposure in queues by using pointers and encryption.
Use encryption in transit and at rest.
Define retention policies for outputs and error artifacts.
Prefer scoped, time-limited access tokens for asset fetch and storage upload.

14) FAQ

How do I generate 10,000 personalized PDFs safely?

Pin a template version, validate records before enqueue, use an at-least-once queue with idempotent workers, write a per-record manifest, and design reruns that only reprocess failed records.

How do I debug "1% failures" without rerunning everything?

Capture error artifacts per record, route non-retriable failures to a DLQ, and use the manifest to re-queue only failed record_ids. Build a "failure explorer" that groups by error type and template version.

How do I cancel or pause a running batch?

Use a control plane state (paused/cancelled). Workers should check state between major steps (before render, before upload). For cancellation, ensure idempotency so tasks that are in-flight cannot create duplicate outputs.
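A sketch of those checkpoints. `getJobState`, `render`, `upload`, and `requeue` are hypothetical callbacks into your control plane, renderer, storage, and queue, and the state names ('running', 'paused', 'cancelled') are assumptions. A paused task is handed back to the queue rather than held, so pausing never ties up worker capacity.

```javascript
// Returns 'uploaded', 'requeued', or 'cancelled'.
async function processTask(task, { getJobState, render, upload, requeue }) {
  // Checkpoint between major steps: stop on cancel, hand the task back on pause.
  const checkpoint = async () => {
    const state = await getJobState(task.job_id);
    if (state === 'cancelled') return 'cancelled';
    if (state === 'paused') {
      await requeue(task);
      return 'requeued';
    }
    return null; // keep going
  };

  const beforeRender = await checkpoint();
  if (beforeRender) return beforeRender;
  const bytes = await render(task);

  // A cancel that arrived mid-render still prevents the output from being written.
  const beforeUpload = await checkpoint();
  if (beforeUpload) return beforeUpload;
  await upload(task, bytes);
  return 'uploaded';
}
```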

How do I keep costs predictable?

Measure render time and bytes early, cap concurrency per job, cache aggressively, and run a small sample to estimate total cost before running the full batch.

Glossary of terms

At-least-once delivery: A queue may deliver the same task more than once; consumers must dedupe.
Backoff: Waiting longer between retries to reduce pressure on dependencies.
Control plane: The system that tracks job state, progress, cancel/pause, and reruns.
Dead letter queue (DLQ): Where tasks go when they cannot be processed successfully.
Determinism: Same inputs, same output, across time.
Idempotency key: Stable key that ensures repeated attempts do not create duplicates.
Manifest: A file or table that maps each record to output URL and status.
Partial rerun: Re-processing only failed or missing records.
Render worker: A stateless process that executes render tasks.
Template version: Immutable template snapshot used for a job.