Extraction Endpoints
Extract text, embedded files, and metadata from PDF documents via the REST API.
Overview
The extraction endpoints pull data out of PDF documents: all text content, embedded file attachments, and document metadata. These three operations leave the source document unchanged. A fourth endpoint updates metadata properties and returns a modified copy of the PDF. The text and read-metadata endpoints return JSON, embedded-file extraction returns a ZIP archive, and update-metadata returns the updated PDF file.
Include the X-Api-Key and X-Api-Secret headers on every request. See Authentication for details.
Extract Text
POST /api/pdf/extract-text
Extracts all text content from a PDF document and returns it as JSON. Text is extracted page by page, in reading order. Scanned documents that lack a text layer yield no text, so run OCR on them first.
| Name | Type | Required | Description |
|---|---|---|---|
| file | File | Yes | The PDF file to extract text from. |
| pages | String | No | Pages to extract text from, as a range (e.g., `1-5`) or a comma-separated list (e.g., `1,3,5`). Defaults to all pages. |
| password | String | No | Password to open the PDF, if it is password-protected. |
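The `pages` value can be built from a list of page numbers rather than typed by hand. The sketch below is a minimal illustration: `build_pages_spec` is a hypothetical helper, not part of the API, and it emits only the two forms documented above (a single range or a comma-separated list). Whether the server accepts a single bare page number or mixed forms is an assumption not confirmed here.

```python
def build_pages_spec(pages):
    """Turn page numbers into a `pages` string such as "1-5" or "1,3,5".

    Hypothetical helper: it only produces the two forms shown in the
    parameter table (one range, or a comma-separated list).
    """
    ordered = sorted(set(pages))
    if not ordered:
        raise ValueError("at least one page number is required")
    if ordered[0] < 1:
        raise ValueError("page numbers start at 1")

    first, last = ordered[0], ordered[-1]
    # A contiguous run of two or more pages collapses to "first-last".
    if len(ordered) > 1 and last - first + 1 == len(ordered):
        return f"{first}-{last}"
    # Anything else is sent as an explicit comma-separated list.
    return ",".join(str(page) for page in ordered)
```

The result can be passed as the `pages` form field, for example `-F "pages=1-5"` in a curl request.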
Response format
{
"totalPages": 5,
"pages": [
{
"pageNumber": 1,
"text": "This is the extracted text from page 1..."
},
{
"pageNumber": 2,
"text": "This is the extracted text from page 2..."
}
]
}
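A client will often want the extracted text as a single string. The following sketch assumes only the response shape shown above (`pages` entries with `pageNumber` and `text`); the function name `join_page_text` and the blank-line separator are illustrative choices, not part of the API.

```python
import json

# The sample response from this section, used here as test input.
SAMPLE_RESPONSE = """
{
  "totalPages": 5,
  "pages": [
    {"pageNumber": 1, "text": "This is the extracted text from page 1..."},
    {"pageNumber": 2, "text": "This is the extracted text from page 2..."}
  ]
}
"""


def join_page_text(response_body, separator="\n\n"):
    """Concatenate per-page text from an extract-text JSON response."""
    payload = json.loads(response_body)
    # Sort defensively so the output follows page order even if the
    # array arrives in a different order.
    pages = sorted(payload["pages"], key=lambda page: page["pageNumber"])
    return separator.join(page["text"] for page in pages)


if __name__ == "__main__":
    print(join_page_text(SAMPLE_RESPONSE))
```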
Example request
curl -X POST https://pdf.mapsoft.com/api/pdf/extract-text \
-H "X-Api-Key: YOUR_API_KEY" \
-H "X-Api-Secret: YOUR_API_SECRET" \
-F "file=@document.pdf" \
-F "pages=1-3"
Extract Embedded Files
POST /api/pdf/extract-embedded-files
Extracts all file attachments embedded within a PDF document. Returns a ZIP archive containing the extracted files. If the PDF contains no embedded files, a 404 response is returned.
| Name | Type | Required | Description |
|---|---|---|---|
| file | File | Yes | The PDF file to extract embedded files from. |
| outputFileName | String | No | Custom name for the output ZIP file. |
| password | String | No | Password to open the PDF, if it is password-protected. |
Example request
curl -X POST https://pdf.mapsoft.com/api/pdf/extract-embedded-files \
-H "X-Api-Key: YOUR_API_KEY" \
-H "X-Api-Secret: YOUR_API_SECRET" \
-F "file=@portfolio.pdf" \
-o attachments.zip
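Because a PDF without attachments produces a 404 rather than an empty archive, a client should check the status code before opening the body as a ZIP. The sketch below shows one way to do that with Python's standard `zipfile` module. The function name is hypothetical, and treating any 2xx status as success is an assumption; the documentation above only specifies the 404 case.

```python
import io
import zipfile


def attachments_from_response(status_code, body):
    """Return (file name, size in bytes) pairs for a ZIP response body.

    Hypothetical helper: `status_code` and `body` come from whatever
    HTTP client made the request.
    """
    if status_code == 404:
        # Documented behaviour: the PDF has no embedded files.
        return []
    if not 200 <= status_code < 300:
        # Assumption: every other non-2xx status is an error.
        raise RuntimeError(f"unexpected status {status_code}")

    # Read the archive from memory without writing it to disk.
    with zipfile.ZipFile(io.BytesIO(body)) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()
        ]
```

To save the attachments instead of listing them, `archive.extractall(target_dir)` can be called inside the same `with` block.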
Read Metadata
POST /api/pdf/read-metadata
Reads and returns the metadata properties of a PDF document, including title, author, subject, keywords, creation date, modification date, and producer information.
| Name | Type | Required | Description |
|---|---|---|---|
| file | File | Yes | The PDF file to read metadata from. |
| password | String | No | Password to open the PDF, if it is password-protected. |
Response format
{
"title": "Annual Report 2025",
"author": "John Smith",
"subject": "Company financial report",
"keywords": "finance, annual report, 2025",
"creator": "Microsoft Word",
"producer": "Mapsoft PDF Hub",
"creationDate": "2025-01-15T10:30:00Z",
"modificationDate": "2025-06-20T14:22:00Z",
"pageCount": 42,
"pdfVersion": "1.7",
"fileSize": 2457600,
"isEncrypted": false,
"isLinearized": true
}
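The metadata response carries dates as ISO 8601 strings and keywords as one comma-separated string, so a client usually converts them before use. The sketch below is one possible approach using only the standard library. `summarize_metadata` is a hypothetical name, and reading `fileSize` as a byte count is an assumption based on the sample value.

```python
import json
from datetime import datetime


def parse_timestamp(value):
    """Parse a timestamp such as "2025-01-15T10:30:00Z"."""
    # datetime.fromisoformat() accepts a trailing "Z" only on Python 3.11+,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_metadata(response_body):
    """Convert selected read-metadata fields into native Python values."""
    meta = json.loads(response_body)
    created = parse_timestamp(meta["creationDate"])
    modified = parse_timestamp(meta["modificationDate"])
    return {
        "title": meta["title"],
        # Split the comma-separated keywords and trim surrounding spaces.
        "keywords": [k.strip() for k in meta["keywords"].split(",") if k.strip()],
        # Assumption: fileSize is expressed in bytes.
        "size_mib": round(meta["fileSize"] / (1024 * 1024), 2),
        "days_between_edits": (modified - created).days,
    }
```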
Example request
curl -X POST https://pdf.mapsoft.com/api/pdf/read-metadata \
-H "X-Api-Key: YOUR_API_KEY" \
-H "X-Api-Secret: YOUR_API_SECRET" \
-F "file=@document.pdf"
Update Metadata
POST /api/pdf/update-metadata
Updates the metadata properties of a PDF document. Only the fields you include in the request will be modified; all other metadata is preserved. Returns the updated PDF file.
| Name | Type | Required | Description |
|---|---|---|---|
| file | File | Yes | The PDF file to update metadata on. |
| title | String | No | New document title. |
| author | String | No | New document author. |
| subject | String | No | New document subject. |
| keywords | String | No | New document keywords (comma-separated). |
| creator | String | No | New creator application name. |
| outputFileName | String | No | Custom name for the output PDF file. |
| password | String | No | Password to open the PDF, if it is password-protected. |
Example request
curl -X POST https://pdf.mapsoft.com/api/pdf/update-metadata \
-H "X-Api-Key: YOUR_API_KEY" \
-H "X-Api-Secret: YOUR_API_SECRET" \
-F "file=@document.pdf" \
-F "title=Updated Document Title" \
-F "author=Jane Doe" \
-F "keywords=updated, metadata, example" \
-o updated-metadata.pdf
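Since only the fields included in the request are modified, a client should leave out any field it does not intend to change instead of sending a placeholder. The sketch below illustrates that idea. `build_update_fields` is a hypothetical helper, and joining keyword lists with a comma and a space follows the example request above rather than any stated requirement.

```python
# Metadata fields accepted by update-metadata, per the parameter table.
METADATA_FIELDS = ("title", "author", "subject", "keywords", "creator")


def build_update_fields(**changes):
    """Build the form fields for an update-metadata request.

    Fields set to None are omitted so the server preserves their
    current values.
    """
    unknown = set(changes) - set(METADATA_FIELDS)
    if unknown:
        raise ValueError(f"unsupported metadata fields: {sorted(unknown)}")

    fields = {}
    for name, value in changes.items():
        if value is None:
            continue  # not included, so left untouched
        if name == "keywords" and not isinstance(value, str):
            # Accept a list of keywords and send it comma-separated.
            value = ", ".join(value)
        fields[name] = value
    return fields
```

The resulting dictionary maps directly onto the `-F "name=value"` form fields of the curl example, alongside the `file` upload.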