Quickstart: PageIndex API
Get started with PageIndex by first getting your 🔑 API key.
The PageIndex API consists of two main components:
- PageIndex Tree Generation: Upload a document to generate a PageIndex tree index.
- PageIndex Retrieval: Ask a query to retrieve relevant content from a document.
This document provides a brief introduction to using the API and includes example response formats.
🌲 PageIndex Tree Generation
Use this API to extract and return a PageIndex structure from a document.
Endpoints
- Submit: https://api.vectify.ai/pageindex/submit (POST)
- Status: https://api.vectify.ai/pageindex/status (POST)
Example (Python)
import requests

# Submit document for PageIndex tree generation
with open("./2023-annual-report.pdf", "rb") as file:
    submit_response = requests.post(
        "https://api.vectify.ai/pageindex/submit",
        headers={"api_key": "YOUR_API_KEY_HERE"},
        files={"file": file},
    )
doc_id = submit_response.json()["doc_id"]

# Check processing status and retrieve result
status_response = requests.post(
    "https://api.vectify.ai/pageindex/status",
    headers={"api_key": "YOUR_API_KEY_HERE"},
    json={"doc_id": doc_id},
)
status_data = status_response.json()
if status_data["status"] == "completed":
    print("PageIndex Tree Structure:", status_data["result"])
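The example above checks the status only once, so a document that is still being processed prints nothing. A minimal polling sketch is shown below. The helper name `wait_for_completion`, the interval, and the timeout are illustrative assumptions, not part of the PageIndex API. This page only documents the "completed" status, so the sketch treats every other value as "not finished yet" and gives up after the timeout.

```python
import time


def wait_for_completion(fetch_status, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch_status() until it reports "completed" or the timeout elapses.

    fetch_status: zero-argument callable returning the parsed JSON dict of a
    status response (hypothetical helper, supplied by the caller).
    interval / timeout: seconds; illustrative defaults, not API requirements.
    """
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_status()
        # "completed" is the only status value shown in this quickstart.
        if data.get("status") == "completed":
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"not completed after {timeout} s; last status: {data.get('status')!r}"
            )
        sleep(interval)
```

With the variables from the example above, this could be called as `wait_for_completion(lambda: requests.post("https://api.vectify.ai/pageindex/status", headers={"api_key": "YOUR_API_KEY_HERE"}, json={"doc_id": doc_id}).json())`. Passing the request as a callable keeps the loop reusable for the retrieval status endpoint as well.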
🔎 PageIndex Retrieval
Use this API to retrieve relevant content from a document. This requires the doc_id of a document whose PageIndex tree generation has completed.
Endpoints
- Submit: https://api.vectify.ai/pageindex/retrieval/submit (POST)
- Status: https://api.vectify.ai/pageindex/retrieval/status (POST)
Example (Python)
import requests

# Submit retrieval query
submit_response = requests.post(
    "https://api.vectify.ai/pageindex/retrieval/submit",
    headers={"api_key": "YOUR_API_KEY_HERE"},
    json={
        "doc_id": "YOUR_PAGEINDEX_DOC_ID",
        "query": "What are the main risk factors mentioned in the filing?",
    },
)
retrieval_id = submit_response.json()["retrieval_id"]

# Check status and retrieve result
status_response = requests.post(
    "https://api.vectify.ai/pageindex/retrieval/status",
    headers={"api_key": "YOUR_API_KEY_HERE"},
    json={"retrieval_id": retrieval_id},
)
status_data = status_response.json()
if status_data["status"] == "completed":
    print("Retrieved Content:", status_data["retrieved_nodes"])
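Because retrieval only works on a document whose tree generation has finished, it can help to check the tree status before submitting a query. The sketch below is one possible way to do that. The function names `submit_retrieval_if_ready` and `make_post_json` are hypothetical, and the choice to raise `RuntimeError` is an assumption; how the API itself responds to a query on an unfinished document is not covered on this page.

```python
BASE_URL = "https://api.vectify.ai/pageindex"


def submit_retrieval_if_ready(post_json, doc_id, query):
    """Submit a retrieval query only if the document's tree is completed.

    post_json(url, payload) -> dict is a caller-supplied transport function
    (hypothetical; see make_post_json below for a requests-based version).
    Returns the retrieval_id from the submit response.
    """
    tree = post_json(f"{BASE_URL}/status", {"doc_id": doc_id})
    if tree.get("status") != "completed":
        raise RuntimeError(
            f"document {doc_id!r} is not ready; status: {tree.get('status')!r}"
        )
    submitted = post_json(
        f"{BASE_URL}/retrieval/submit",
        {"doc_id": doc_id, "query": query},
    )
    return submitted["retrieval_id"]


def make_post_json(api_key):
    """Build a post_json function backed by the requests library."""
    # Imported here so the logic above can be exercised without the dependency.
    import requests

    def post_json(url, payload):
        response = requests.post(
            url, headers={"api_key": api_key}, json=payload, timeout=30
        )
        response.raise_for_status()  # surface HTTP errors instead of a KeyError later
        return response.json()

    return post_json
```

A call might look like `submit_retrieval_if_ready(make_post_json("YOUR_API_KEY_HERE"), "YOUR_PAGEINDEX_DOC_ID", "What are the main risk factors mentioned in the filing?")`. The returned `retrieval_id` is then passed to the retrieval status endpoint as in the example above.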
👉 See the full API Reference for optional parameters, error codes, and integration guides.
📝 Notes
- Accepts PDF files only (more formats coming soon)
- Retrieval depends on a completed PageIndex computation and requires a valid doc_id
- Continuous updates: better parsing, broader format support, and database export coming soon