Quickstart: PageIndex API

To get started with PageIndex, first obtain your 🔑 API key; every request passes it in the api_key header.

The PageIndex API consists of two main components:

  • PageIndex Tree Generation: Upload a document to generate a PageIndex tree index.
  • PageIndex Retrieval: Submit a query to retrieve relevant content from an indexed document.

This page gives a brief introduction to the API, with Python examples showing each request and the response fields it returns.


🌲 PageIndex Tree Generation

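Use this API to upload a document and generate its PageIndex tree structure. Processing is asynchronous: the submit endpoint returns a doc_id, which you then send to the status endpoint until the status is "completed".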

Endpoints

  • Submit: https://api.vectify.ai/pageindex/submit (POST)
  • Status: https://api.vectify.ai/pageindex/status (POST)

Example (Python)

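import time
import requests

API_KEY = "YOUR_API_KEY_HERE"

# Submit document for PageIndex tree generation
with open('./2023-annual-report.pdf', 'rb') as file:
    submit_response = requests.post(
        "https://api.vectify.ai/pageindex/submit",
        headers={"api_key": API_KEY},
        files={"file": file}
    )
submit_response.raise_for_status()  # stop early on HTTP errors (e.g. invalid key)
doc_id = submit_response.json()["doc_id"]

# Generation runs in the background: poll the status endpoint (up to ~5 minutes)
for _ in range(60):
    status_response = requests.post(
        "https://api.vectify.ai/pageindex/status",
        headers={"api_key": API_KEY},
        json={"doc_id": doc_id}
    )
    status_response.raise_for_status()
    status_data = status_response.json()
    if status_data["status"] == "completed":
        print("PageIndex Tree Structure:", status_data["result"])
        break
    time.sleep(5)  # wait before checking again
else:
    raise TimeoutError(f"Tree for {doc_id} not ready; last status: {status_data['status']}")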
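Tree generation for a long document can take a while, so a reusable polling helper with a timeout may be handy. A minimal sketch, assuming any status other than "completed" means the job is still running (other status values are not documented on this page); wait_for_completion is a hypothetical helper, and the exponential backoff and default limits are illustrative choices, not API requirements.

```python
import time
from typing import Any, Callable, Dict


def wait_for_completion(
    fetch_status: Callable[[], Dict[str, Any]],
    initial_delay: float = 2.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status() until it reports "completed", backing off exponentially.

    fetch_status: zero-argument callable returning the parsed JSON body of a
    PageIndex status endpoint. Any other status is treated as "still running"
    (an assumption). The timeout counts only time spent sleeping.
    """
    delay = initial_delay
    waited = 0.0
    while True:
        status_data = fetch_status()
        if status_data.get("status") == "completed":
            return status_data
        if waited >= timeout:
            raise TimeoutError(
                f"not completed after ~{waited:.0f}s of waiting; "
                f"last status: {status_data.get('status')!r}"
            )
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # double the wait, capped at max_delay


# Hypothetical wiring to the tree status endpoint:
#   final = wait_for_completion(lambda: requests.post(
#       "https://api.vectify.ai/pageindex/status",
#       headers={"api_key": "YOUR_API_KEY_HERE"},
#       json={"doc_id": doc_id},
#   ).json())
#   print(final["result"])
```

The sleep function is injectable so the helper can be exercised without real waiting; the same helper also works for the retrieval status endpoint by changing the URL and payload.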

🔎 PageIndex Retrieval

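Use this API to retrieve content relevant to a query from a document. It requires the doc_id of a document whose PageIndex tree generation has completed. The workflow mirrors tree generation: the submit endpoint returns a retrieval_id, which you send to the status endpoint until the status is "completed".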

Endpoints

  • Submit: https://api.vectify.ai/pageindex/retrieval/submit (POST)
  • Status: https://api.vectify.ai/pageindex/retrieval/status (POST)

Example (Python)

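import time
import requests

API_KEY = "YOUR_API_KEY_HERE"

# Submit retrieval query (doc_id must refer to a completed PageIndex tree)
submit_response = requests.post(
    "https://api.vectify.ai/pageindex/retrieval/submit",
    headers={"api_key": API_KEY},
    json={
        "doc_id": "YOUR_PAGEINDEX_DOC_ID",
        "query": "What are the main risk factors mentioned in the filing?"
    }
)
submit_response.raise_for_status()
retrieval_id = submit_response.json()["retrieval_id"]

# Poll the retrieval status until the result is ready (up to ~5 minutes)
for _ in range(60):
    status_response = requests.post(
        "https://api.vectify.ai/pageindex/retrieval/status",
        headers={"api_key": API_KEY},
        json={"retrieval_id": retrieval_id}
    )
    status_response.raise_for_status()
    status_data = status_response.json()
    if status_data["status"] == "completed":
        print("Retrieved Content:", status_data["retrieved_nodes"])
        break
    time.sleep(5)  # wait before checking again
else:
    raise TimeoutError(f"Retrieval {retrieval_id} not ready; last status: {status_data['status']}")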
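Because retrieval only works on a completed tree, it can help to confirm the document's status before submitting a query. A minimal sketch, assuming the status and retrieval endpoints behave as in the examples on this page; submit_retrieval_if_ready is a hypothetical helper, and post is any function that behaves like requests.post (passing requests.post itself should work, and a stub makes the logic easy to test offline).

```python
from typing import Any, Callable

TREE_STATUS_URL = "https://api.vectify.ai/pageindex/status"
RETRIEVAL_SUBMIT_URL = "https://api.vectify.ai/pageindex/retrieval/submit"


def submit_retrieval_if_ready(
    post: Callable[..., Any], api_key: str, doc_id: str, query: str
) -> str:
    """Submit a retrieval query only after confirming the doc's tree is completed.

    post must accept (url, headers=..., json=...) and return an object with a
    .json() method, as requests.post does. Returns the retrieval_id.
    """
    headers = {"api_key": api_key}

    # Step 1: retrieval needs a completed PageIndex tree, so check the doc first
    tree_status = post(TREE_STATUS_URL, headers=headers, json={"doc_id": doc_id}).json()
    if tree_status.get("status") != "completed":
        raise RuntimeError(
            f"PageIndex tree for {doc_id!r} is not ready "
            f"(status: {tree_status.get('status')!r})"
        )

    # Step 2: the tree is ready, so submit the query and hand back its retrieval_id
    submitted = post(
        RETRIEVAL_SUBMIT_URL,
        headers=headers,
        json={"doc_id": doc_id, "query": query},
    ).json()
    return submitted["retrieval_id"]
```

The returned retrieval_id can then be polled on the retrieval status endpoint exactly as in the retrieval example.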

👉 See the full API Reference for optional parameters, error codes, and integration guides.

📝 Notes

  • Accepts PDF files only (more formats coming soon)
  • Retrieval depends on a completed PageIndex computation and requires a valid doc_id
  • Continuous updates: better parsing, broader format support, and database export coming soon
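Since only PDF files are accepted, a cheap local check before uploading can catch the wrong file early. A minimal sketch that looks for the %PDF- signature at the start of the file; looks_like_pdf is a hypothetical helper, and the API's own validation rules are not described here, so a file that passes this check may still be rejected.

```python
from pathlib import Path

# PDF files normally begin with this signature (e.g. b"%PDF-1.7")
PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(path: str) -> bool:
    """Return True if path is an existing file whose first bytes are %PDF-.

    A quick pre-upload sanity check only: it does not parse or validate
    the rest of the file.
    """
    file_path = Path(path)
    if not file_path.is_file():
        return False
    with file_path.open("rb") as f:
        return f.read(len(PDF_SIGNATURE)) == PDF_SIGNATURE
```

For example, call looks_like_pdf('./2023-annual-report.pdf') before opening the file for upload in the tree generation example.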
