Client-side PDF to image conversion

Today was about making PDF conversion actually useful. The image converter tool already handled format conversions - PNG to JPG, WebP to whatever. But PDFs were a gap. Users kept asking for it.

The challenge: convert multi-page PDFs to images, entirely in the browser, with no server uploads. Privacy is the whole point of these tools.

The Architecture

PDF rendering in browsers requires PDF.js - Mozilla's JavaScript PDF renderer. It's powerful but comes with complexity: a separate worker thread for parsing, canvas rendering for each page, and memory management that can crash tabs if you're not careful.

[Diagram: PDF to image conversion flow - PDF file upload → PDF.js worker (parse & decode) → canvas render, page by page → single page as a direct image, multi-page as a ZIP archive. All processing happens in the browser: no server upload, files never leave the device, Web Workers keep parsing non-blocking, memory cleanup prevents tab crashes.]

The key insight: single-page PDFs should return a direct image file. Multi-page PDFs need packaging - we went with ZIP. Nobody wants to download 47 separate images.

The Worker Problem

PDF.js needs a worker file. It handles the heavy parsing off the main thread. The naive approach is loading it from a CDN:

// Don't do this
pdfjs.GlobalWorkerOptions.workerSrc = "https://unpkg.com/pdfjs-dist@5.4.530/build/pdf.worker.min.mjs";

This breaks immediately if you have Content Security Policy headers. And you should have CSP headers. The alternative: bundle the worker locally.
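
For context, a policy as simple as this (an illustrative header, not this site's exact configuration) blocks both the cross-origin script and the worker it spawns:

Content-Security-Policy: script-src 'self'; worker-src 'self'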

We solved this with a postinstall script:

{
  "scripts": {
    "postinstall": "cp ./node_modules/pdfjs-dist/build/pdf.worker.min.mjs ./public/"
  }
}

Now the worker loads from the same origin:

pdfjs.GlobalWorkerOptions.workerSrc = "/pdf.worker.min.mjs"; 

Clean, no CSP changes needed, and it stays in sync with the package version.

Lazy Loading PDF.js

PDF.js is heavy - about 400KB parsed. Users converting PNG to JPG shouldn't pay that cost. The solution: dynamic imports that only load when needed.

type PdfJsLib = typeof import("pdfjs-dist");
let pdfjsLibPromise: Promise<PdfJsLib> | null = null;
 
async function getPdfJs(): Promise<PdfJsLib> {
  if (typeof window === "undefined") {
    throw new Error("PDF.js can only be used in the browser");
  }
  if (!pdfjsLibPromise) {
    pdfjsLibPromise = (async () => {
      const pdfjs = await import("pdfjs-dist");
      pdfjs.GlobalWorkerOptions.workerSrc = "/pdf.worker.min.mjs";
      return pdfjs;
    })();
  }
  return pdfjsLibPromise;
}

The typeof window === "undefined" check prevents SSR crashes. Next.js tries to render everything server-side first, and PDF.js depends on browser APIs that don't exist in Node.

[Diagram: SSR compatibility - on the server (Node.js) there is no window or document and PDF.js throws; in the browser, canvas and Web Workers exist and PDF.js works. Lazy loading plus the window check makes SSR safe: the code only runs when a user uploads a PDF.]
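
To make the lazy path concrete, the upload handler can gate on the file type so none of this code runs until a PDF is actually selected. A rough sketch (handleFile and the "png" literal are illustrative; convertPdfToImages and its options match the function shown in the next section):

// Illustrative handler: the PDF.js chunk is only fetched when this branch runs.
async function handleFile(file: File) {
  if (file.type === "application/pdf") {
    const result = await convertPdfToImages(file, "png", {
      dpi: 150,
      onProgress: (current, total) =>
        console.log(`Converting page ${current} of ${total}`),
    });
    // result.url is an object URL, ready to use as a download href
    return result;
  }
  // Non-PDF files go through the existing image conversion path.
}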

The Conversion Function

The core function handles both single and multi-page PDFs. The branching happens after we know how many pages exist. Note that renderPage is defined in the next section, and baseName and extension (derived from the input file name and target format) are computed earlier and elided here:

export async function convertPdfToImages(
  file: File,
  targetFormat: ImageFormat,
  options: PdfConversionOptions = {}
): Promise<ConversionResult> {
  const { dpi = 150, quality = 0.92, onProgress } = options;
 
  const pdfjs = await getPdfJs();
  const arrayBuffer = await file.arrayBuffer();
  const pdf = await pdfjs.getDocument({ data: arrayBuffer }).promise;
  const numPages = pdf.numPages;
 
  // Single page - return image directly
  if (numPages === 1) {
    onProgress?.(1, 1);
    const { blob, width, height } = await renderPage(1);
    return {
      blob,
      url: URL.createObjectURL(blob),
      fileName: `${baseName}.${extension}`,
      // ...
    };
  }
 
  // Multi-page - create ZIP
  const JSZip = (await import("jszip")).default;
  const zip = new JSZip();
 
  for (let i = 1; i <= numPages; i++) {
    onProgress?.(i, numPages);
    const { blob } = await renderPage(i);
    zip.file(`${baseName}-page${i}.${extension}`, blob);
  }
 
  const zipBlob = await zip.generateAsync({ type: "blob" });
  return {
    blob: zipBlob,
    url: URL.createObjectURL(zipBlob),
    fileName: `${baseName}-${numPages}pages.zip`,
    // ...
  };
}

JSZip is also lazy-loaded. No point bundling it for users who only convert JPGs.

Rendering Pages to Canvas

Each PDF page gets rendered to a canvas, then converted to a blob. The DPI setting controls output quality - 72 is screen resolution, 150 is good for most uses, 300 is print quality:

const renderPage = async (pageNum: number) => {
  const page = await pdf.getPage(pageNum);
  const scale = dpi / 72; // PDF default is 72 DPI
  const viewport = page.getViewport({ scale });
 
  const canvas = document.createElement("canvas");
  canvas.width = Math.floor(viewport.width);
  canvas.height = Math.floor(viewport.height);
 
  const ctx = canvas.getContext("2d", {
    alpha: targetFormat === "png" || targetFormat === "webp",
  });
  if (!ctx) throw new Error("Could not get a 2D canvas context");
 
  // White background for formats without transparency
  if (targetFormat !== "png" && targetFormat !== "webp") {
    ctx.fillStyle = "#FFFFFF";
    ctx.fillRect(0, 0, canvas.width, canvas.height);
  }
 
  await page.render({ canvas, viewport }).promise;
 
  const blob = await new Promise<Blob>((resolve, reject) => {
    canvas.toBlob(
      (result) => result ? resolve(result) : reject(new Error("Conversion failed")),
      MIME_TYPES[targetFormat],
      quality
    );
  });
 
  return { blob, width: canvas.width, height: canvas.height };
};

The alpha option on canvas context matters. JPEG doesn't support transparency - enabling alpha wastes memory. PNG and WebP need it.
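
The snippet above also references a MIME_TYPES lookup that isn't shown. A minimal version consistent with the formats used here might look like this (the exact keys and shape in the real code may differ):

// Assumed mapping from output format to the MIME type passed to canvas.toBlob.
type ImageFormat = "png" | "jpeg" | "webp";

const MIME_TYPES: Record<ImageFormat, string> = {
  png: "image/png",
  jpeg: "image/jpeg",
  webp: "image/webp",
};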

Canvas Size Limits

Browsers have maximum canvas dimensions. Chrome caps out around 16,384 pixels per side. A 300 DPI render of an A4 page hits ~3500 pixels on the long edge - safe. But a large-format page, an architectural drawing or a poster, rendered at high DPI can exceed the limit.

const MAX_CANVAS_DIMENSION = 16384;
const DEFAULT_MAX_DIMENSION = 8192;
 
function calculateDimensions(width: number, height: number, maxDimension: number) {
  const maxDim = Math.min(maxDimension, MAX_CANVAS_DIMENSION);
 
  if (width <= maxDim && height <= maxDim) {
    return { width, height, scaled: false };
  }
 
  const ratio = Math.min(maxDim / width, maxDim / height);
  return {
    width: Math.floor(width * ratio),
    height: Math.floor(height * ratio),
    scaled: true, 
  };
}

If dimensions exceed the limit, we scale down proportionally. The user gets a working image instead of a crashed tab.
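
Wiring that into renderPage means clamping the scale before the canvas is created. A sketch of how it could slot in (the actual integration may differ):

// Sketch: derive the viewport at the requested DPI, then shrink the scale
// if the resulting canvas would exceed the configured maximum dimension.
const baseViewport = page.getViewport({ scale: dpi / 72 });
const { width, height, scaled } = calculateDimensions(
  Math.floor(baseViewport.width),
  Math.floor(baseViewport.height),
  DEFAULT_MAX_DIMENSION
);
const viewport = scaled
  ? page.getViewport({ scale: (dpi / 72) * (width / baseViewport.width) })
  : baseViewport;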

Progress Feedback

Multi-page conversions can take time. The progress callback keeps users informed:

onProgress: (current, total) => {
  setConversion((prev) =>
    prev ? { ...prev, pdfProgress: { current, total } } : null
  );
}

The UI renders it in the button:

{conversion.status === "converting" ? (
  <>
    <RefreshCw className="h-5 w-5 animate-spin" />
    {conversion.pdfProgress 
      ? `Converting page ${conversion.pdfProgress.current} of ${conversion.pdfProgress.total}...`
      : "Converting..."} // [!code ++]
  </>
) : (
  // ...
)}

[UI mock: progress feedback - the convert button reading "Converting page 3 of 5...", followed by a "Download ZIP" action.]

Docker Deployment

The postinstall script needed Docker adjustments. The deps stage runs pnpm install, but the public folder doesn't exist yet:

# Before: postinstall fails
RUN pnpm install --frozen-lockfile
 
# After: create directory first
RUN mkdir -p ./public
RUN pnpm install --frozen-lockfile

We also copy the worker explicitly in the builder stage - belt and suspenders:

# Copy PDF.js worker to public folder
RUN cp ./node_modules/pdfjs-dist/build/pdf.worker.min.mjs ./public/

What We Removed

The original implementation had a page selector - users picked which page to convert. This made sense initially but the feedback was clear: people want all pages.

[Diagram: UX simplification - before: a page selector ("1 of 47") converting one page at a time, 47 clicks for 47 pages; after: "All 47 pages converted", downloaded as a ZIP in one click.]

The code got simpler too. No more pdfSelectedPage state, no page number validation, no UI for incrementing/decrementing. The relevant slice of the state interface, with the removed and added fields marked:

interface ConversionState {
  // ...
  pdfPageCount?: number;
  pdfSelectedPage: number; // removed - no more per-page selection
  pdfDpi: PdfDpi;
  pdfProgress?: { current: number; total: number }; // added for multi-page progress
}

Memory Management

PDF.js creates document references that need explicit cleanup. Without it, converting several large PDFs in a session eventually crashes the tab:

try {
  // ... conversion logic
} finally {
  if (pdf) {
    pdf.destroy(); 
  }
}

For image conversions (non-PDF), the same principle applies to ImageBitmap objects:

finally {
  if (imageBitmap) {
    imageBitmap.close(); 
  }
}
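
For illustration, the non-PDF path looks roughly like this - the function name and structure are assumptions, but the cleanup pattern is the point:

// Hypothetical image conversion path: decode, draw, encode, always release the bitmap.
async function convertImage(file: File, mimeType: string, quality = 0.92): Promise<Blob> {
  let imageBitmap: ImageBitmap | null = null;
  try {
    imageBitmap = await createImageBitmap(file);
    const canvas = document.createElement("canvas");
    canvas.width = imageBitmap.width;
    canvas.height = imageBitmap.height;
    canvas.getContext("2d")?.drawImage(imageBitmap, 0, 0);
    return await new Promise<Blob>((resolve, reject) =>
      canvas.toBlob(
        (blob) => (blob ? resolve(blob) : reject(new Error("Conversion failed"))),
        mimeType,
        quality
      )
    );
  } finally {
    // Free the decoded pixel data immediately instead of waiting for GC.
    imageBitmap?.close();
  }
}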

The .gitignore Addition

The worker file is generated, not source. It shouldn't be committed:

# generated files
/public/pdf.worker.min.mjs

Fresh clones run pnpm install, which triggers postinstall, which copies the worker. The file appears where it needs to be.

Takeaways

Browser-based document processing is viable but requires care:

  1. Lazy load heavy dependencies - Don't penalize users who don't need them
  2. Handle SSR explicitly - Check for browser APIs before using them
  3. Respect canvas limits - Scale down rather than crash
  4. Clean up resources - Memory leaks compound quickly
  5. Progress feedback matters - Users need to know something is happening

A 10-page PDF converts in about 3 seconds at 150 DPI. Fast enough to feel instant, slow enough that the progress indicator earns its keep.