Polotno

PDF to JSON converter

Convert PDF to structured JSON in your browser. Get pages, text, fonts, images, and positions as a clean Polotno-schema object — feed it to an LLM, automate edits, or load into a canvas editor.

Quick answer

Drop a PDF, get a structured JSON describing every page, text run, image, and shape. Same schema as the Polotno editor.

Formula: PDF → Polotno JSON (browser-only via pdf.js)

Runs entirely in your browser. Your file never leaves your device.

Drop a PDF, get a structured JSON object in the Polotno design schema: typed elements with explicit position, font, color, and embedded image data, ready to feed to a canvas editor or to an LLM with spatial context. This is not a plain-text extractor or an invoice parser; if that's what you need, this tool is probably not the best fit.

The output shape

The JSON has a top-level width / height / dpi / unit describing the document, an array of fonts used, and a pages array. Each page contains a children array of typed elements:

  • type: "text" — text content, fontFamily, fontSize, fontWeight, fill, x/y/width/height, rotation, alignment.
  • type: "image" — base64 src, crop region, position, opacity.
  • type: "svg" — vector shapes re-emitted as inline SVG, with position and size.
  • type: "line" — stroke segments with color, dash, position.
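As a rough orientation, the shape described above could be typed along these lines. This is a sketch only: the field names follow the list above where it names them, but `text`, `align`, `color`, `dash`, the optionality markers, and the helper `countByType` are assumptions for illustration, not the published Polotno typings. Crop fields are omitted for brevity.

```typescript
// Assumed shapes based on the description above; not the official typings.
interface Positioned { x: number; y: number; }
interface Box extends Positioned { width: number; height: number; }

interface TextElement extends Box {
  type: "text";
  text: string;          // assumed name for the text content
  fontFamily: string;
  fontSize: number;
  fontWeight?: string;
  fill?: string;
  rotation?: number;
  align?: string;        // assumed name for the alignment field
}
interface ImageElement extends Box { type: "image"; src: string; opacity?: number; }
interface SvgElement extends Box { type: "svg"; src: string; }
interface LineElement extends Positioned { type: "line"; color?: string; dash?: number[]; }

type DesignElement = TextElement | ImageElement | SvgElement | LineElement;

interface Page { children: DesignElement[]; }
interface Design {
  width: number;
  height: number;
  dpi?: number;
  unit?: string;
  fonts: unknown[];
  pages: Page[];
}

// Hypothetical helper: count elements of each type across all pages.
function countByType(design: Design): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const page of design.pages) {
    for (const el of page.children) {
      counts[el.type] = (counts[el.type] ?? 0) + 1;
    }
  }
  return counts;
}
```

A quick call such as `countByType(json)` gives a first impression of what the converter found before you decide how to process it.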

Use cases

  • LLM ingestion — give the model not just text but also visual context (positions, fonts) for layout-aware tasks: contract review, form understanding, quote extraction.
  • Automated edits — redact a phrase, swap a template variable, change a color, then re-export to PDF.
  • Editor handoff — load the JSON into a Polotno editor in your app and let the user customize the design. That's the live demo on this page.
  • Schema-driven storage — store designs as JSON rows in your database instead of opaque PDF blobs.
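The "automated edits" case can be sketched as a pure function over the JSON, run before re-export. This is a minimal illustration, not the SDK's own redaction feature: `redactPhrase` and the `*Like` types are hypothetical, and the `text` field name on text elements is an assumption. A phrase that the converter split across two text runs would not be matched by this approach.

```typescript
// Loose, assumed shapes: only the fields this sketch touches are named.
interface ElementLike { type: string; text?: string; [key: string]: unknown; }
interface PageLike { children: ElementLike[]; [key: string]: unknown; }
interface DesignLike { pages: PageLike[]; [key: string]: unknown; }

// Return a copy of the design with every occurrence of `phrase` in text
// elements replaced by a same-length mask. The input is not mutated.
function redactPhrase(design: DesignLike, phrase: string, mask = "*"): DesignLike {
  if (phrase === "") return design;
  const replacement = mask.repeat(phrase.length);
  return {
    ...design,
    pages: design.pages.map((page) => ({
      ...page,
      children: page.children.map((el) =>
        el.type === "text" && typeof el.text === "string"
          ? { ...el, text: el.text.split(phrase).join(replacement) }
          : el
      ),
    })),
  };
}
```

Because the function returns a new object, the original conversion result stays available for comparison or audit.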

The same conversion in code

pdf-to-json.ts
import { pdfToJson } from "@polotno/pdf-import";
import { createStore } from "polotno/model/store";

// `file` is a File, e.g. from an <input type="file"> or a drop event
const buffer = await file.arrayBuffer();
const json = await pdfToJson({ pdf: buffer });

// json is a Polotno design: load into a store, edit, re-export
const store = createStore({ key: "YOUR_KEY" });
store.loadJSON(json);

Full API reference: PDF Import docs.

Frequently asked questions

Q: What does PDF to JSON give me?

A structured object in the Polotno design schema. Top-level width / height / dpi, a fonts array, and a pages array where each page has a children array of typed elements: text (with content, font, size, weight, color, position), images (with src and crop region), vector shapes re-emitted as SVG fragments, and lines (with stroke color and dash). This is exactly what store.loadJSON() in the Polotno SDK accepts.

Q: Is this a generic PDF-to-JSON parser?

No. The output uses the Polotno design schema, which is purpose-built for canvas editors. If you need plain-text extraction, form-field data, table extraction, or invoice OCR, this is probably not the best fit. If you want layout-aware structured data you can render in a canvas editor or feed to an LLM as visually grounded context, this is the intended use.

Q: Why JSON instead of plain text extraction?

Plain-text extraction throws away everything except the words. JSON preserves layout: where each text run sits, what font, what color, what page. That's what you need to (a) feed an LLM that benefits from spatial context, (b) automate edits in a pipeline, (c) re-render the design in a different format, or (d) load it into a canvas editor for further work.
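For case (a), one way to hand the layout to an LLM is to flatten the text elements into position-tagged lines in reading order. The sketch below assumes text elements carry `text`, `x`, `y`, and `fontSize`; `toLayoutLines` and the bracketed line format are hypothetical choices, and sorting by y then x is a simple heuristic that will not handle multi-column pages well.

```typescript
// Assumed minimal shapes: only the fields this sketch reads.
interface ElementLike { type: string; text?: string; x?: number; y?: number; fontSize?: number; }
interface PageLike { children: ElementLike[]; }
interface DesignLike { pages: PageLike[]; }

// Emit one line per text element, ordered top-to-bottom then left-to-right.
function toLayoutLines(design: DesignLike): string[] {
  const lines: string[] = [];
  design.pages.forEach((page, pageIndex) => {
    const runs = page.children
      .filter((el) => el.type === "text" && typeof el.text === "string")
      .sort((a, b) => (a.y ?? 0) - (b.y ?? 0) || (a.x ?? 0) - (b.x ?? 0));
    for (const run of runs) {
      const x = Math.round(run.x ?? 0);
      const y = Math.round(run.y ?? 0);
      const size = run.fontSize ?? "?";
      lines.push(`[page ${pageIndex + 1} x=${x} y=${y} size=${size}] ${run.text}`);
    }
  });
  return lines;
}
```

Joining the result with newlines gives a compact prompt section in which a large font size near the top of a page still reads as a heading, which plain text would lose.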

Q: Is this the same as the @polotno/pdf-import package?

Yes. This page is a UI on top of the same pdfToJson() function from @polotno/pdf-import that our SDK customers use server-side and in their own browser apps. If you like the output here, npm install @polotno/pdf-import and you can run the same conversion programmatically.

Q: Can I use the JSON to re-render or modify the PDF?

Yes. Pass the JSON to a Polotno store via store.loadJSON(json), then call store.saveAsPDF() to round-trip, store.saveAsImage() for raster, store.saveAsSVG() for vector — or mutate any element first. The Edit button after conversion runs that exact flow with a live editor UI.

Q: Does this run in my browser?

Yes, fully client-side. The PDF is parsed by Mozilla's pdf.js, a JavaScript PDF library, inside @polotno/pdf-import. Your file never reaches our servers, which makes it suitable for confidential documents.

Q: What about scanned PDFs?

Image-only scans will return JSON containing image elements but no extractable text — there's no text data in the file to pull out. For scans, run an OCR pass first (e.g. Tesseract.js in the browser, or a server-side OCR service), then convert the resulting text-augmented PDF here.
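A pipeline can check for this case before deciding whether to route a file through OCR. The heuristic below is an assumption for illustration, not part of @polotno/pdf-import: `looksScanned` is a hypothetical helper that treats a design as a scan when it contains at least one image element and no non-empty text element.

```typescript
// Assumed minimal shapes: only the fields this sketch reads.
interface ElementLike { type: string; text?: string; }
interface PageLike { children: ElementLike[]; }
interface DesignLike { pages: PageLike[]; }

// True when the design has images but no text content at all.
function looksScanned(design: DesignLike): boolean {
  let imageCount = 0;
  for (const page of design.pages) {
    for (const el of page.children) {
      if (el.type === "text" && (el.text ?? "").trim() !== "") return false;
      if (el.type === "image") imageCount += 1;
    }
  }
  return imageCount > 0;
}
```

A mixed document, such as a scan with a typed cover page, returns false here, so a per-page variant may be more useful for such files.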

Want this in your app? Embed Polotno SDK.
