Extraction Endpoints

Extract text, embedded files, and metadata from PDF documents via the REST API.

Overview

The extraction endpoints let you read data out of PDF documents: all text content, embedded file attachments, and document metadata. These three operations leave the source document unchanged. A fourth endpoint updates metadata properties and returns the updated PDF. Text extraction and metadata reads return JSON responses, embedded-file extraction returns a ZIP archive, and metadata updates return a PDF file.

Info
All extraction endpoints require authentication. Include your X-Api-Key and X-Api-Secret headers on every request. See Authentication for details.

Extract Text

POST /api/pdf/extract-text

Extracts text content from a PDF document and returns it as a JSON response. All pages are processed by default; use the pages parameter to limit extraction to specific pages. Text is extracted page by page and preserves reading order. For scanned documents, consider running OCR first.

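| Name | Type | Required | Description |
| --- | --- | --- | --- |
| file | File | Yes | The PDF file to extract text from. |
| pages | String | No | Pages to extract text from, as a range or a comma-separated list (e.g., "1-5" or "1,3,5"). Defaults to all pages. |
| password | String | No | Password to open the PDF, if it is password-protected. |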
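
The pages value can be built from a list of page numbers. The following is a minimal sketch, not part of the API: `format_pages` is a hypothetical helper, it assumes page numbers start at 1 (as the pageNumber field in the response suggests), and it assumes a single page number on its own is accepted. It emits only the two forms shown in the table, a range or a comma-separated list.

```python
def format_pages(pages: list[int]) -> str:
    """Format page numbers as a value for the `pages` form field.

    Hypothetical helper. Consecutive pages become a single range ("1-5");
    anything else becomes a comma-separated list ("1,3,5").
    """
    if not pages:
        raise ValueError("at least one page number is required")
    unique = sorted(set(pages))
    # Assumption: pages are numbered from 1, matching `pageNumber` in the response.
    if unique[0] < 1:
        raise ValueError("page numbers start at 1")
    is_consecutive = unique[-1] - unique[0] == len(unique) - 1
    if len(unique) > 1 and is_consecutive:
        return f"{unique[0]}-{unique[-1]}"
    return ",".join(str(page) for page in unique)
```

For example, `format_pages([1, 2, 3])` yields the same `1-3` value used in the example request for this endpoint.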

Response format

{
    "totalPages": 5,
    "pages": [
        {
            "pageNumber": 1,
            "text": "This is the extracted text from page 1..."
        },
        {
            "pageNumber": 2,
            "text": "This is the extracted text from page 2..."
        }
    ]
}
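
A response in this shape can be flattened into a single string on the client side. The sketch below is one possible way to do that; `pages_to_text` is a hypothetical helper name, and joining pages with a blank line is an arbitrary choice. It sorts by pageNumber defensively, since the ordering of the pages array is not stated here.

```python
import json


def pages_to_text(response_body: str) -> str:
    """Join the per-page text of an extract-text response into one string."""
    data = json.loads(response_body)
    # Sort explicitly rather than relying on the order of the "pages" array.
    pages = sorted(data["pages"], key=lambda page: page["pageNumber"])
    # Separate pages with a blank line (an arbitrary, client-side choice).
    return "\n\n".join(page["text"] for page in pages)
```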

Example request

curl -X POST https://pdf.mapsoft.com/api/pdf/extract-text \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "X-Api-Secret: YOUR_API_SECRET" \
  -F "file=@document.pdf" \
  -F "pages=1-3"

Extract Embedded Files

POST /api/pdf/extract-embedded-files

Extracts all file attachments embedded within a PDF document. Returns a ZIP archive containing the extracted files. If the PDF contains no embedded files, a 404 response is returned.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| file | File | Yes | The PDF file to extract embedded files from. |
| outputFileName | String | No | Custom name for the output ZIP file. |
| password | String | No | Password to open the PDF, if it is password-protected. |

Example request

curl -X POST https://pdf.mapsoft.com/api/pdf/extract-embedded-files \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "X-Api-Secret: YOUR_API_SECRET" \
  -F "file=@portfolio.pdf" \
  -o attachments.zip
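
On the client side, the two documented outcomes (a ZIP archive, or a 404 when the PDF has no embedded files) can be handled in one place. The sketch below assumes a successful response uses status 200, which is not stated here, and `read_attachments` is a hypothetical helper name. It unpacks the archive in memory instead of writing it to disk.

```python
import io
import zipfile


def read_attachments(status: int, body: bytes) -> dict[str, bytes]:
    """Map attachment file names to their contents for one API response."""
    if status == 404:
        # Documented case: the PDF contains no embedded files.
        return {}
    if status != 200:  # assumption: success is reported as 200
        raise RuntimeError(f"unexpected HTTP status {status}")
    with zipfile.ZipFile(io.BytesIO(body)) as archive:
        return {
            info.filename: archive.read(info)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries, keep files only
        }
```

Treating every 404 as "no attachments" is a simplification; a mistyped URL also produces a 404, so a production client may want to inspect the response body as well.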

Read Metadata

POST /api/pdf/read-metadata

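Reads and returns the metadata properties of a PDF document, including title, author, subject, keywords, creator, producer, and creation and modification dates. The response also reports file-level properties: page count, PDF version, file size, and whether the document is encrypted or linearized.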

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| file | File | Yes | The PDF file to read metadata from. |
| password | String | No | Password to open the PDF, if it is password-protected. |

Response format

{
    "title": "Annual Report 2025",
    "author": "John Smith",
    "subject": "Company financial report",
    "keywords": "finance, annual report, 2025",
    "creator": "Microsoft Word",
    "producer": "Mapsoft PDF Hub",
    "creationDate": "2025-01-15T10:30:00Z",
    "modificationDate": "2025-06-20T14:22:00Z",
    "pageCount": 42,
    "pdfVersion": "1.7",
    "fileSize": 2457600,
    "isEncrypted": false,
    "isLinearized": true
}
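
Several of these fields are easier to work with once converted to native types. The following sketch shows one way to do that; `parse_timestamp` and `summarize_metadata` are hypothetical helpers. It assumes that fileSize is expressed in bytes and that keywords are comma-separated, as in the sample above; neither is stated explicitly here.

```python
import json
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "2025-01-15T10:30:00Z"."""
    # Python versions before 3.11 reject a trailing "Z" in fromisoformat(),
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def summarize_metadata(response_body: str) -> dict:
    """Convert selected read-metadata fields to native Python types."""
    meta = json.loads(response_body)
    keywords = meta.get("keywords") or ""
    return {
        "title": meta.get("title"),
        # Assumption: keywords are comma-separated, as in the sample response.
        "keywords": [k.strip() for k in keywords.split(",") if k.strip()],
        "created": parse_timestamp(meta["creationDate"]),
        "modified": parse_timestamp(meta["modificationDate"]),
        # Assumption: fileSize is a byte count.
        "size_mib": meta["fileSize"] / (1024 * 1024),
    }
```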

Example request

curl -X POST https://pdf.mapsoft.com/api/pdf/read-metadata \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "X-Api-Secret: YOUR_API_SECRET" \
  -F "file=@document.pdf"

Update Metadata

POST /api/pdf/update-metadata

Updates the metadata properties of a PDF document. Only the fields you include in the request will be modified; all other metadata is preserved. Returns the updated PDF file.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| file | File | Yes | The PDF file to update metadata on. |
| title | String | No | New document title. |
| author | String | No | New document author. |
| subject | String | No | New document subject. |
| keywords | String | No | New document keywords (comma-separated). |
| creator | String | No | New creator application name. |
| outputFileName | String | No | Custom name for the output PDF file. |
| password | String | No | Password to open the PDF, if it is password-protected. |

Example request

curl -X POST https://pdf.mapsoft.com/api/pdf/update-metadata \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "X-Api-Secret: YOUR_API_SECRET" \
  -F "file=@document.pdf" \
  -F "title=Updated Document Title" \
  -F "author=Jane Doe" \
  -F "keywords=updated, metadata, example" \
  -o updated-metadata.pdf
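
Because only the fields present in the request are modified, a client should leave out anything it does not intend to change instead of sending empty values. A minimal sketch of that idea follows; `metadata_form_fields` is a hypothetical helper, and joining keywords with a comma and a space simply mirrors the example request above.

```python
from typing import Optional


def metadata_form_fields(*, title: Optional[str] = None,
                         author: Optional[str] = None,
                         subject: Optional[str] = None,
                         keywords: Optional[list[str]] = None,
                         creator: Optional[str] = None) -> dict[str, str]:
    """Build the text form fields for an update-metadata request.

    Arguments left as None are omitted entirely, so the corresponding
    metadata properties are preserved by the endpoint.
    """
    candidates = {
        "title": title,
        "author": author,
        "subject": subject,
        # The keywords parameter is a single comma-separated string.
        "keywords": ", ".join(keywords) if keywords is not None else None,
        "creator": creator,
    }
    return {name: value for name, value in candidates.items() if value is not None}
```

Calling `metadata_form_fields(title="Updated Document Title", author="Jane Doe")` produces only the title and author fields, so subject, keywords, and creator are left as they are in the document.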