📚 API Documentation

Submit PDF for PageIndex Computation

Endpoint (POST): https://api.vectify.ai/pageindex
Description: Starts converting a PDF document into a hierarchical tree structure. The call returns a task identifier (task_id) immediately, and processing continues in the background.

Request Body:

file (binary, required): PDF file to be processed

Optional Parameters:

model: OpenAI model to use (default: "gpt-4o-2024-11-20")
toc_check_page_num: Number of initial pages to check for table of contents (default: 20)
max_page_num_each_node: Max pages allowed for each node (default: 10)
max_token_num_each_node: Max tokens allowed for each node (default: 20000)
if_add_node_id: Include a node ID for each node ("yes" / "no") (default: "yes")
if_add_node_text: Include node text for each node ("yes" / "no") (default: "no")
if_add_node_summary: Include a summary for each node ("yes" / "no") (default: "no")
if_add_doc_description: Include a description for the document ("yes" / "no") (default: "yes")
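
The optional parameters above are sent as form fields alongside the file. The following is a minimal sketch of a helper that assembles them and rejects misspelled names before the request is sent. `build_options` is a hypothetical name, and accepting Python booleans for the "yes" / "no" flags is a convenience assumed here, not part of the API.

```python
# Hypothetical helper, not part of the API: builds the form fields for the
# optional parameters. Names and "yes"/"no" values come from the list above.
FLAG_PARAMS = {
    "if_add_node_id",
    "if_add_node_text",
    "if_add_node_summary",
    "if_add_doc_description",
}
VALUE_PARAMS = {
    "model",
    "toc_check_page_num",
    "max_page_num_each_node",
    "max_token_num_each_node",
}


def build_options(**options):
    """Return a dict suitable for the `data=` argument of requests.post."""
    data = {}
    for name, value in options.items():
        if name in FLAG_PARAMS:
            # Assumed convenience: map True/False onto the API's "yes"/"no".
            if isinstance(value, bool):
                value = "yes" if value else "no"
            if value not in ("yes", "no"):
                raise ValueError(f"{name} must be 'yes' or 'no'")
            data[name] = value
        elif name in VALUE_PARAMS:
            data[name] = value
        else:
            raise ValueError(f"unknown parameter: {name}")
    return data
```

The returned dict can be passed as `data=build_options(toc_check_page_num=15, if_add_node_summary=True)` in the submit request.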

Example Request:

import requests

# Upload the PDF as multipart form data; the response body contains the task_id.
with open('./2023-annual-report.pdf', 'rb') as file:
    response = requests.post(
        "https://api.vectify.ai/pageindex",
        headers={'api_key': 'YOUR_API_KEY_HERE'},
        files={'file': file}
    )

Example Request with Optional Parameters:

import requests

# Optional parameters are sent as form fields next to the file.
with open('./2023-annual-report.pdf', 'rb') as file:
    response = requests.post(
        "https://api.vectify.ai/pageindex",
        headers={'api_key': 'YOUR_API_KEY_HERE'},
        files={'file': file},
        data={
            "toc_check_page_num": 15,
            "max_page_num_each_node": 8,
        }
    )

See here for the example PDF document.

Example Response:

{
  "task_id": "abc123def456"
}

Check Status and Retrieve Results

Endpoint (POST): https://api.vectify.ai/pageindex/status
Description: Checks computation status and retrieves results once processing is complete.

Request Body:

task_id (string, required): Task ID from submit response

Computation Status:

The status returned from the endpoint indicates the progress of PDF processing tasks:

queued: Task is queued and waiting to begin processing
processing: Task is currently being processed
completed: Task processing is complete; results are ready
failed: Task processing encountered an error
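
Because a task moves through these states over time, a client typically polls the status endpoint until it sees `completed` or `failed`. The sketch below shows one way to do that. `wait_for_result`, the 5-second interval, and the 60-attempt limit are assumptions chosen for illustration; the API does not prescribe a polling schedule. The function takes a `fetch` callable so that it does not depend on any particular HTTP client.

```python
import time


def wait_for_result(fetch, interval=5.0, max_attempts=60):
    """Poll `fetch` until the task completes or fails.

    `fetch` is any zero-argument callable that returns the decoded status
    response as a dict. `interval` and `max_attempts` are assumed defaults.
    """
    for _ in range(max_attempts):
        body = fetch()
        status = body.get("status")
        if status == "completed":
            return body["result"]
        if status == "failed":
            raise RuntimeError("PageIndex task failed")
        if status not in ("queued", "processing"):
            raise ValueError(f"unexpected status: {status!r}")
        # Still queued or processing: wait before asking again.
        time.sleep(interval)
    raise TimeoutError(f"task not finished after {max_attempts} polls")
```

With `requests`, `fetch` could be a small function that posts `{"task_id": ...}` to `https://api.vectify.ai/pageindex/status` with the `api_key` header and returns `response.json()`.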

Example Request:

import requests

# The task_id is sent as a JSON body, not as form data.
response = requests.post(
    "https://api.vectify.ai/pageindex/status",
    headers={'api_key': 'YOUR_API_KEY_HERE'},
    json={"task_id": "abc123def456"}
)

Example Response (Processing):

{
  "task_id": "abc123def456",
  "status": "processing"
}

Example Response (Completed):

{
  "task_id": "abc123def456",
  "status": "completed",
  "result": [
    ...
    {
        "title": "Financial Stability",
        "node_id": "0006",
        "start_index": 21,
        "end_index": 22,
        "summary": "The Federal Reserve maintains financial stability by...",
        "child_nodes": [
            {
                "title": "Monitoring Financial Vulnerabilities",
                "node_id": "0007",
                "start_index": 22,
                "end_index": 28,
                "summary": "The Federal Reserve's monitoring focuses on..."
            },
            {
                "title": "Domestic and International Cooperation and Coordination",
                "node_id": "0008",
                "start_index": 28,
                "end_index": 31,
                "summary": "In 2023, the Federal Reserve collaborated internationally..."
            }
        ]
    }
    ...
  ]
}

See here for a complete example output structure generated by PageIndex from the above example PDF document.
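
The `result` field is a list of nodes, each of which may hold further nodes under `child_nodes`. A minimal sketch of walking that tree depth-first follows. `iter_nodes` is a hypothetical helper, and the sample data reuses only the fields from the example response above; treating a missing `child_nodes` key as "no children" is an assumption based on that example.

```python
def iter_nodes(nodes, depth=0):
    """Yield (depth, node) for every node, parents before their children."""
    for node in nodes:
        yield depth, node
        # Leaf nodes in the example response have no "child_nodes" key.
        yield from iter_nodes(node.get("child_nodes", []), depth + 1)


# Sample data trimmed from the example response (summaries omitted).
sample_result = [
    {
        "title": "Financial Stability",
        "node_id": "0006",
        "start_index": 21,
        "end_index": 22,
        "child_nodes": [
            {
                "title": "Monitoring Financial Vulnerabilities",
                "node_id": "0007",
                "start_index": 22,
                "end_index": 28,
            },
            {
                "title": "Domestic and International Cooperation and Coordination",
                "node_id": "0008",
                "start_index": 28,
                "end_index": 31,
            },
        ],
    }
]

# Build an indented outline with node IDs and page ranges.
outline = [
    f"{'  ' * depth}{node['node_id']} {node['title']} "
    f"(pages {node['start_index']}-{node['end_index']})"
    for depth, node in iter_nodes(sample_result)
]
```

The same traversal can be used to look up a node by `node_id` or to collect the page ranges needed for retrieval.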

⚠️ API Response Codes

200: Request successful
400: Bad request due to missing/invalid parameters (Resolution: Check request parameters)
401: Unauthorized; invalid or missing API key (Resolution: Ensure API key is correct)
404: Task or PDF file not found (Resolution: Verify task_id and PDF path)
413: File size too large (Resolution: Use smaller file or contact support)
500: Internal server error (Resolution: Retry later; if persistent, contact support)
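
The codes above can be turned into explicit errors on the client side. The sketch below is one possible mapping. `PageIndexError` and `check_status_code` are hypothetical names, and marking only 500 as retryable is an assumption drawn from the "Retry later" resolution in the list.

```python
# Resolutions copied from the response code list above.
RESOLUTIONS = {
    400: "Check request parameters",
    401: "Ensure API key is correct",
    404: "Verify task_id and PDF path",
    413: "Use smaller file or contact support",
    500: "Retry later; if persistent, contact support",
}


class PageIndexError(Exception):
    """Hypothetical error type carrying the status code and suggested fix."""

    def __init__(self, status_code):
        self.status_code = status_code
        self.resolution = RESOLUTIONS.get(status_code, "Unexpected status code")
        # Assumption: only a 500 is worth retrying automatically.
        self.retryable = status_code == 500
        super().__init__(f"HTTP {status_code}: {self.resolution}")


def check_status_code(status_code):
    """Return None on 200; raise PageIndexError for any other code."""
    if status_code != 200:
        raise PageIndexError(status_code)
```

A caller would pass `response.status_code` to `check_status_code` before reading the JSON body, and retry only when the raised error has `retryable` set.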