Question 1

What does PDF to JSON give me?

Accepted Answer

A structured object in the Polotno design schema. It has top-level width, height, and dpi fields, a fonts array, and a pages array in which each page has a children array of typed elements: text (with font, size, weight, color, position, characters), images (with src and crop region), shapes (paths and lines with stroke/fill), and SVG fragments. This is exactly what store.loadJSON() in the Polotno SDK accepts.
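
To make that shape concrete, here is a rough TypeScript sketch of the structure described above, plus a small helper that tallies elements by type. Only width, height, dpi, fonts, pages, and children come from the description. The `type` discriminator and the helper name `countElementsByType` are assumptions for illustration, so inspect real output before relying on them.

```typescript
// Loose, illustrative view of the design JSON. Per-element properties are
// left open because their exact names are not specified here.
interface DesignElement {
  type: string; // assumed discriminator, e.g. "text" or "image"
  [key: string]: unknown;
}

interface DesignPage {
  children: DesignElement[];
}

interface Design {
  width: number;
  height: number;
  dpi?: number;
  fonts: unknown[];
  pages: DesignPage[];
}

// Count how many elements of each type appear across all pages.
function countElementsByType(design: Design): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const page of design.pages) {
    for (const element of page.children) {
      counts[element.type] = (counts[element.type] ?? 0) + 1;
    }
  }
  return counts;
}
```

A tally like this is a quick sanity check after conversion: a text-heavy PDF that comes back with only image elements is likely a scan.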

Question 2

Is this a generic PDF-to-JSON parser?

Accepted Answer

No. The output uses the Polotno design schema, which is purpose-built for canvas editors. If you need plain-text extraction, form-field data, table extraction, or invoice OCR, this is probably not the best fit. If you want layout-aware structured data you can render in a canvas editor or feed to an LLM as visually grounded context, this is the intended use.

Question 3

Why JSON instead of plain text extraction?

Accepted Answer

Plain-text extraction throws away everything except the words. JSON preserves layout: where each text run sits, what font, what color, what page. That's what you need to (a) feed an LLM that benefits from spatial context, (b) automate edits in a pipeline, (c) re-render the design in a different format, or (d) load it into a canvas editor for further work.
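
As an illustration of use (a), the sketch below flattens text elements into position-tagged lines that could be pasted into an LLM prompt. The property names `type`, `x`, `y`, and `text` and the helper `toSpatialLines` are assumptions made for this example, not a confirmed part of the schema.

```typescript
// Assumed minimal element shape: a type label, a position, and the text itself.
interface LayoutElement {
  type: string;
  x?: number;
  y?: number;
  text?: string;
}

interface LayoutDesign {
  pages: { children: LayoutElement[] }[];
}

// Emit one line per text element, ordered top-to-bottom then left-to-right
// within each page, with the page number and rounded coordinates attached.
function toSpatialLines(design: LayoutDesign): string[] {
  const lines: string[] = [];
  design.pages.forEach((page, pageIndex) => {
    const runs = page.children
      .filter((el) => el.type === "text" && typeof el.text === "string")
      .map((el) => ({ x: el.x ?? 0, y: el.y ?? 0, text: el.text as string }))
      .sort((a, b) => a.y - b.y || a.x - b.x);
    for (const run of runs) {
      lines.push(
        `[page ${pageIndex + 1} x=${Math.round(run.x)} y=${Math.round(run.y)}] ${run.text}`
      );
    }
  });
  return lines;
}
```

Joining the result with newlines gives a compact prompt section in which a heading, a table cell, and a footer can be told apart by where they sit on the page.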

Question 4

Is this the same as the @polotno/pdf-import package?

Accepted Answer

Yes. This page is a UI on top of the same pdfToJson() function from @polotno/pdf-import that our SDK customers use server-side and in their own browser apps. If you like the output here, npm install @polotno/pdf-import and you can run the same conversion programmatically.
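
A minimal sketch of calling the converter from your own code follows. The converter is passed in as a parameter because the exact signature of pdfToJson() is not spelled out here: the `{ pdf: ArrayBuffer }` argument shape and the wrapper name `convertPdfBytes` are assumptions, so check the package documentation before copying the wiring.

```typescript
// Assumed converter signature; verify against the @polotno/pdf-import docs.
type PdfConverter = (input: { pdf: ArrayBuffer }) => Promise<unknown>;

// Run a conversion and check that the result at least has a pages array.
async function convertPdfBytes(
  bytes: ArrayBuffer,
  convert: PdfConverter
): Promise<{ pages: unknown[] }> {
  if (bytes.byteLength === 0) {
    throw new Error("Empty PDF input");
  }
  const json = await convert({ pdf: bytes });
  const pages = (json as { pages?: unknown } | null)?.pages;
  if (!Array.isArray(pages)) {
    throw new Error("Converter did not return a design with a pages array");
  }
  return json as { pages: unknown[] };
}

// Hypothetical wiring in a browser app (not verified against the package):
//   import { pdfToJson } from "@polotno/pdf-import";
//   const design = await convertPdfBytes(await file.arrayBuffer(), pdfToJson);
```

Injecting the converter keeps the wrapper testable without a real PDF and isolates the one place that depends on the package's actual API.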

Question 5

Can I use the JSON to re-render or modify the PDF?

Accepted Answer

Yes. Pass the JSON to a Polotno store via store.loadJSON(json), then call store.saveAsPDF() to round-trip, store.saveAsImage() for raster, store.saveAsSVG() for vector — or mutate any element first. The Edit button after conversion runs that exact flow with a live editor UI.
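
The mutate-then-export flow might look like the sketch below. It edits the plain JSON before loading it, and describes the store only through the two methods named above. The `type` and `text` property names, the `DesignStore` interface, and both helper functions are assumptions for illustration rather than the SDK's own types.

```typescript
// Assumed element and design shapes; real elements carry more properties.
interface EditableElement {
  type: string;
  text?: string;
  [key: string]: unknown;
}

interface EditableDesign {
  pages: { children: EditableElement[] }[];
  [key: string]: unknown;
}

// Structural stand-in for a Polotno store, limited to the calls used here.
interface DesignStore {
  loadJSON(json: unknown): void;
  saveAsPDF(): unknown;
}

// Return a copy of the design with `search` replaced in every text element.
// The input object is left untouched.
function replaceText(
  design: EditableDesign,
  search: string,
  replacement: string
): EditableDesign {
  return {
    ...design,
    pages: design.pages.map((page) => ({
      ...page,
      children: page.children.map((el) =>
        el.type === "text" && typeof el.text === "string"
          ? { ...el, text: el.text.split(search).join(replacement) }
          : el
      ),
    })),
  };
}

// Load the edited design into the store and export it as a PDF.
async function editAndExport(
  store: DesignStore,
  design: EditableDesign,
  search: string,
  replacement: string
): Promise<void> {
  store.loadJSON(replaceText(design, search, replacement));
  await store.saveAsPDF();
}
```

Editing the JSON before loading keeps the change independent of the editor, which suits batch pipelines. For interactive edits, mutating elements on the loaded store is the more natural route.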

Question 6

Does this run in my browser?

Accepted Answer

Yes, fully client-side. The PDF is parsed inside @polotno/pdf-import by Mozilla's pdf.js, which is itself written in JavaScript. Your file never reaches our servers, so the tool is safe for confidential documents.

Question 7

What about scanned PDFs?

Accepted Answer

Image-only scans will return JSON containing image elements but no extractable text — there's no text data in the file to pull out. For scans, run an OCR pass first (e.g. Tesseract.js in the browser, or a server-side OCR service), then convert the resulting text-augmented PDF here.
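
One way to decide whether an OCR pass is needed is to check the converted JSON for images with no accompanying text. The heuristic below is a sketch under assumptions: the `type` and `text` property names and the function `looksLikeScan` are illustrative, and a PDF made only of vector artwork would need separate handling.

```typescript
// Assumed minimal element shape for this check.
interface ScannedElement {
  type: string;
  text?: string;
}

interface ScannedDesign {
  pages: { children: ScannedElement[] }[];
}

// True when the design has at least one image and no non-blank text at all,
// which is what an image-only scan is described as producing.
function looksLikeScan(design: ScannedDesign): boolean {
  let imageCount = 0;
  let textChars = 0;
  for (const page of design.pages) {
    for (const el of page.children) {
      if (el.type === "image") {
        imageCount += 1;
      } else if (el.type === "text" && typeof el.text === "string") {
        textChars += el.text.trim().length;
      }
    }
  }
  return imageCount > 0 && textChars === 0;
}
```

If the check returns true, run the file through OCR and convert the text-augmented PDF again.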

PDF to JSON converter
