# Extraction Fallback
`extractPdf(...)` is the high-level helper intended for OpenClaw-style model input.

    import { extractPdf } from "clawpdf";

    const result = await extractPdf("report.pdf", {
      mode: "auto",
      maxPages: 20,
      minTextChars: 200,
      image: {
        dpi: 96,
        maxPixels: 4_000_000,
        maxDimension: 10_000,
        forms: true,
      },
    });

Flow for `mode: "auto"`:
- Extract text from selected pages.
- Return text only when text length reaches `minTextChars`.
- Otherwise render selected pages as compressed PNG images.
- Stop rendering when image budget is exhausted.
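
The flow above can be sketched as a pure decision function. This is a minimal illustration, not the library's implementation: `PageInput`, `AutoPlan`, and `planAuto` are hypothetical names, the page texts are assumed to be joined with newlines, and the budget is assumed to be charged as `width * height` per page with rendering stopping at the first page that does not fit.

```typescript
// Hypothetical page data; in practice text and sizes would come from a PDF engine.
type PageInput = { page: number; text: string; width: number; height: number };

type AutoPlan = {
  text: string;
  renderedPages: number[];
  imagesTruncated: boolean;
};

// Sketch of the mode "auto" decision: keep text alone when it is long enough,
// otherwise fall back to rendering pages until the pixel budget runs out.
function planAuto(
  pages: PageInput[],
  minTextChars = 200,
  maxPixels = 4_000_000,
): AutoPlan {
  const text = pages.map((p) => p.text).join("\n");
  if (text.length >= minTextChars) {
    return { text, renderedPages: [], imagesTruncated: false };
  }

  const renderedPages: number[] = [];
  let remaining = maxPixels;
  let imagesTruncated = false;
  for (const p of pages) {
    const pixels = p.width * p.height;
    if (pixels > remaining) {
      imagesTruncated = true; // assumed behavior: stop at the first page that does not fit
      break;
    }
    remaining -= pixels;
    renderedPages.push(p.page);
  }
  return { text, renderedPages, imagesTruncated };
}
```

A scanned document with almost no extractable text would take the second branch, and `imagesTruncated` would be set once the remaining budget cannot hold the next page.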
# Modes
- `auto`: always extract text; render images only when text is short.
- `text`: extract text only.
- `images`: render images only.
- `both`: extract text and render images.
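
One way to read the four modes is as a small lookup from mode to the steps that run. The sketch below restates the list as code; `ExtractMode`, `ImagePolicy`, and `stepsForMode` are illustrative names and are not claimed to be part of the package's API.

```typescript
type ExtractMode = "auto" | "text" | "images" | "both";
type ImagePolicy = "never" | "always" | "when-text-short";

// Maps each mode to whether text is extracted and when images are rendered.
function stepsForMode(mode: ExtractMode): { extractText: boolean; images: ImagePolicy } {
  switch (mode) {
    case "auto":
      return { extractText: true, images: "when-text-short" };
    case "text":
      return { extractText: true, images: "never" };
    case "images":
      return { extractText: false, images: "always" };
    case "both":
      return { extractText: true, images: "always" };
  }
}
```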
# Options
- `pages`: one-based pages to inspect.
- `maxPages`: finite positive maximum pages to inspect; the default `20` is ignored when `pages` is provided, but an explicit `maxPages` still caps that list.
- `minTextChars`: text threshold before image fallback, default `200`.
- `maxTextChars`: text output cap, default `200_000`.
- `password`: optional PDF user password.
- `engine`: optional `PdfEngine` for caller-owned reuse.
- `image.dpi` or `image.scale`: fallback render size, default `dpi: 96`.
- `image.maxPixels`: finite positive total rendered image pixel budget, default `4_000_000`.
- `image.maxDimension`: finite positive maximum rendered PNG width or height, default `10_000`.
- `image.forms`: render form widgets in fallback images, default `true`.
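
The interaction between `pages` and `maxPages` is the subtle part of this list, so here is a rough sketch of how the selection could be resolved. `resolvePages` and `PageOptions` are hypothetical, and two behaviors are assumptions rather than documented facts: out-of-range page numbers are silently dropped, and an invalid `maxPages` throws a `RangeError`.

```typescript
type PageOptions = { pages?: number[]; maxPages?: number };

const DEFAULT_MAX_PAGES = 20;

// Resolves the one-based page numbers to inspect for a document of `pageCount` pages.
function resolvePages(pageCount: number, opts: PageOptions = {}): number[] {
  const { pages, maxPages } = opts;
  if (maxPages !== undefined && !(Number.isFinite(maxPages) && maxPages > 0)) {
    throw new RangeError("maxPages must be a finite positive number");
  }

  // Explicit list if given (assumption: invalid entries are dropped), else every page.
  const candidates = pages
    ? pages.filter((p) => Number.isInteger(p) && p >= 1 && p <= pageCount)
    : Array.from({ length: pageCount }, (_, i) => i + 1);

  // The default cap applies only when no explicit page list was provided;
  // an explicit maxPages caps either kind of list.
  const cap = maxPages ?? (pages ? candidates.length : DEFAULT_MAX_PAGES);
  return candidates.slice(0, Math.floor(cap));
}
```

Under these assumptions, a 50-page document with no options yields the first 20 pages, an explicit list of 30 pages yields all 30, and the same list with `maxPages: 5` yields its first five entries.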
# Result

    type ExtractResult = {
      text: string;
      images: Array<{
        page: number;
        width: number;
        height: number;
        bytes: Uint8Array;
        mimeType: "image/png";
      }>;
      pagesProcessed: number[];
      truncated: {
        text: boolean;
        images: boolean;
      };
    };

Image bytes are raw PNG data. Use `toMessageContent(result)` or `toDataUrls(result)` when a transport needs base64.
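
To make the raw-bytes point concrete, the sketch below turns one image entry into a base64 data URL by hand. It only illustrates the kind of conversion a transport needs and makes no claim about how `toDataUrls` is implemented; `ExtractedImage`, `toBase64`, and `imageToDataUrl` are hypothetical names. The encoder is written out so the example runs without any runtime-specific API.

```typescript
type ExtractedImage = {
  page: number;
  width: number;
  height: number;
  bytes: Uint8Array;
  mimeType: "image/png";
};

const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 with "=" padding, processing three input bytes at a time.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const has1 = i + 1 < bytes.length;
    const has2 = i + 2 < bytes.length;
    const triple =
      (bytes[i] << 16) | ((has1 ? bytes[i + 1] : 0) << 8) | (has2 ? bytes[i + 2] : 0);
    out += BASE64_ALPHABET.charAt((triple >> 18) & 63);
    out += BASE64_ALPHABET.charAt((triple >> 12) & 63);
    out += has1 ? BASE64_ALPHABET.charAt((triple >> 6) & 63) : "=";
    out += has2 ? BASE64_ALPHABET.charAt(triple & 63) : "=";
  }
  return out;
}

// Builds a data URL from the raw PNG bytes of one image entry.
function imageToDataUrl(image: ExtractedImage): string {
  return `data:${image.mimeType};base64,${toBase64(image.bytes)}`;
}
```

In Node.js, `Buffer.from(bytes).toString("base64")` produces the same encoding and is usually the simpler choice.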