Upload and Process
Overview
The Upload and Process Document endpoint uploads a document to the KnowledgeVerse AI platform and processes it through multiple preprocessing steps including format validation, OCR (Optical Character Recognition), intelligent chunking, and vector embedding. This makes the document searchable and ready for knowledge extraction.
Use this endpoint to:
- Upload documents from local storage or an AWS S3 bucket.
- Process documents with automatic format detection and OCR.
- Generate vector embeddings for semantic search capabilities.
- Enable AI-powered knowledge extraction from documents.
- Build document ingestion pipelines for your applications.
Endpoint Details
- Method: POST
- Endpoint: /api/doc/process_doc
- Base URL: https://api.k-v.ai
- Authentication: Access key (required, sent in the access-key header)
Processing Pipeline
- Format validation and detection
- OCR for scanned/image-based documents
- Intelligent chunking for optimal retrieval
- Vector embedding
Request Specification
Method 1: Upload from Local File
Content-Type: multipart/form-data
Headers
| Header | Type | Required | Description |
|---|---|---|---|
| access-key | string | Yes | Your unique access key, generated from the platform UI |
Form Data
| Field | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Document file to upload (PDF, DOCX, PPTX) |
Supported File Formats
- PDF (.pdf)
- Microsoft Word (.docx)
- Microsoft PowerPoint (.pptx)
Document Limits
- Maximum page count: 100 pages per document
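Unsupported formats are rejected with 400 Bad Request, so a client can screen files before uploading them. The following is a minimal Python sketch, not part of the platform. The helper name content_type_for and the MIME type mapping are illustrative assumptions. It checks only the file extension. Enforcing the 100-page limit on the client would require a format-specific library and is not shown.

```python
from pathlib import Path

# Extensions from "Supported File Formats"; the MIME types are the standard ones
# for these formats (an assumption, since the API only requires a "file" field)
SUPPORTED_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def content_type_for(path: str) -> str:
    """Hypothetical pre-upload check: return the MIME type or raise ValueError."""
    ext = Path(path).suffix.lower()  # case-insensitive match on the extension
    if ext not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported file format: {ext or '(no extension)'}")
    return SUPPORTED_TYPES[ext]


print(content_type_for("/path/to/your/document.pdf"))  # application/pdf
```

Raising before the request avoids spending a round trip on a file the endpoint would reject anyway.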
Method 2: Upload from AWS S3 Bucket
Content-Type: application/json
Headers
| Header | Type | Required | Description |
|---|---|---|---|
| access-key | string | Yes | Your unique access key, generated from the platform UI |
| Content-Type | string | Yes | Must be application/json |
Request Body
```json
{
  "s3_uri": "s3://bucket-name/path/to/document.pdf",
  "aws_access_key_id": "YOUR_AWS_ACCESS_KEY",
  "aws_secret_access_key": "YOUR_AWS_SECRET_KEY",
  "aws_region": "us-east-1"
}
```
Body Fields
| Field | Type | Required | Description |
|---|---|---|---|
| s3_uri | string | Yes | Full S3 URI in format: s3://bucket-name/path/to/file |
| aws_access_key_id | string | Yes | AWS IAM access key with S3 read permissions |
| aws_secret_access_key | string | Yes | AWS IAM secret access key |
| aws_region | string | Yes | AWS region where the S3 bucket is located (e.g., us-east-1, ap-south-1) |
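The S3 request body can be assembled and sanity-checked locally before it is sent. This minimal sketch assumes only the s3://bucket-name/path/to/file shape described in the table. build_s3_body is a hypothetical helper, and it does not check whether the credentials or bucket permissions are valid (problems there surface as 403 Forbidden).

```python
import json
from urllib.parse import urlparse


def build_s3_body(s3_uri: str, access_key_id: str, secret_access_key: str, region: str) -> str:
    """Hypothetical helper: validate the S3 URI shape and return the JSON request body."""
    parsed = urlparse(s3_uri)
    # Expect s3://bucket-name/path/to/file: scheme "s3", a bucket, and a non-empty object key
    if parsed.scheme != "s3" or not parsed.netloc or parsed.path in ("", "/"):
        raise ValueError(f"Expected s3://bucket-name/path/to/file, got {s3_uri!r}")
    return json.dumps({
        "s3_uri": s3_uri,
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
        "aws_region": region,
    })
```

The returned string can be passed as the request body together with the Content-Type: application/json header.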
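AWS IAM Permissions Required
The AWS credentials you supply must allow reading the document object, for example with the following policy (replace your-bucket-name with your bucket):
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}
```
Response Specification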
Success Response (200 OK)
```json
{
  "data": {
    "doc_process_id": "c71a96e072b581f1108dbc5f7a93cd54",
    "transactions_utilised": 117
  },
  "message": "Document Processed Successfully"
}
```
Response Fields
| Field | Type | Description |
|---|---|---|
| data | object | Processing result object |
| data.doc_process_id | string | Unique identifier for the processed document (32-character hex string) |
| data.transactions_utilised | integer | Number of transactions consumed during processing |
| message | string | Human-readable response message |
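A minimal sketch of reading the success response in Python, assuming the body matches the example above. parse_process_doc_response is a hypothetical helper name. The format check mirrors the documented 32-character hex form of doc_process_id and accepts either letter case as a precaution.

```python
import json
import re
from typing import Tuple

# Documented format: 32-character hex string (either case accepted as a precaution)
DOC_ID_PATTERN = re.compile(r"[0-9a-fA-F]{32}")


def parse_process_doc_response(response_text: str) -> Tuple[str, int]:
    """Hypothetical helper: return (doc_process_id, transactions_utilised)."""
    data = json.loads(response_text)["data"]
    doc_id = data["doc_process_id"]
    if not DOC_ID_PATTERN.fullmatch(doc_id):
        raise ValueError(f"Unexpected doc_process_id format: {doc_id!r}")
    return doc_id, int(data["transactions_utilised"])
```

Persist the returned doc_process_id; it is the handle for later searches, queries, and deletions.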
Transaction Cost Factors
Transaction usage varies based on:
- Document size: Larger documents consume more transactions
- Page count: More pages require more processing
- OCR requirement: Scanned/image-based documents cost more
- File type complexity: Complex layouts (tables, charts) increase cost
Important Notes
- doc_process_id: Store this identifier to perform searches, queries, and deletions on this document. You can also retrieve this information using the "List Documents" API.
- Asynchronous Processing: Document processing happens asynchronously. Use the "List Documents" API to check processing status.
- Security: AWS credentials are used only for the upload operation and are not stored.
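Because processing is asynchronous, a client typically polls until the document is ready. This section does not document the List Documents request or its status fields, so the sketch below takes a hypothetical is_processed callable that you would implement against that API. The function name wait_until_processed and the interval and timeout defaults are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_until_processed(
    doc_process_id: str,
    is_processed: Callable[[str], bool],  # hypothetical wrapper around the "List Documents" API
    interval_s: float = 5.0,
    timeout_s: float = 300.0,
) -> bool:
    """Poll until is_processed(doc_process_id) is True; return False if the timeout is reached."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_processed(doc_process_id):
            return True
        # Stop if the next sleep would carry us past the deadline
        if time.monotonic() + interval_s >= deadline:
            return False
        time.sleep(interval_s)
```

Keeping the status check behind a callable means the polling loop does not depend on the exact List Documents response format.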
Error Responses
401 Unauthorized
```json
{
  "data": {},
  "message": "Invalid or missing access key"
}
```
Cause: Missing or invalid access-key header.
400 Bad Request
```json
{
  "data": {},
  "message": "Unsupported file"
}
```
Cause: Unsupported file format.
403 Forbidden
```json
{
  "data": {},
  "message": "AWS credentials invalid or insufficient permissions"
}
```
Cause: Invalid AWS credentials or insufficient S3 bucket permissions.
413 Payload Too Large
```json
{
  "data": {},
  "message": "File size exceeds maximum limit of 100 pages"
}
```
Cause: The document exceeds the 100-page limit.
500 Internal Server Error
```json
{
  "data": {},
  "message": "Something went wrong"
}
```
Cause: Server-side processing error or database connectivity issue.
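All of the error bodies above share the same shape (data plus message), so a client can handle them in one place. This is a minimal sketch: ProcessDocError and check_response are hypothetical names, and the cause strings are copied from this section rather than returned by the API.

```python
import json

# Causes documented in "Error Responses", keyed by HTTP status code
DOCUMENTED_CAUSES = {
    400: "Unsupported file format.",
    401: "Missing or invalid access-key header.",
    403: "Invalid AWS credentials or insufficient S3 bucket permissions.",
    413: "The document exceeds the 100-page limit.",
    500: "Server-side processing error or database connectivity issue.",
}


class ProcessDocError(Exception):
    """Hypothetical exception carrying the status code, API message, and documented cause."""

    def __init__(self, status: int, message: str):
        self.status = status
        self.message = message
        self.cause = DOCUMENTED_CAUSES.get(status, "Undocumented status code.")
        super().__init__(f"{status}: {message} ({self.cause})")


def check_response(status: int, response_text: str) -> dict:
    """Return the parsed body for 200 OK; otherwise raise ProcessDocError."""
    if status == 200:
        return json.loads(response_text)
    try:
        # Error bodies are expected to be {"data": {}, "message": "..."}
        message = json.loads(response_text).get("message", response_text)
    except (json.JSONDecodeError, AttributeError):
        # Non-JSON or unexpected JSON (e.g. from a proxy): fall back to the raw text
        message = response_text
    raise ProcessDocError(status, message)
```

With requests, call check_response(response.status_code, response.text) after each request.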
Code Snippets
Method 1: Upload from Local File
```bash
curl --location 'https://api.k-v.ai/api/doc/process_doc' \
--header 'access-key: YOUR_ACCESS_KEY' \
--form 'file=@"/path/to/your/document.pdf"'
```
```python
import requests

url = "https://api.k-v.ai/api/doc/process_doc"
headers = {
    "access-key": "YOUR_ACCESS_KEY"
}

# Open the file in binary mode; the with-block closes it once the request is sent
with open("/path/to/your/document.pdf", "rb") as f:
    files = {"file": ("document.pdf", f, "application/octet-stream")}
    response = requests.post(url, headers=headers, files=files)

print(response.status_code, response.text)
```
```javascript
const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');

const data = new FormData();
data.append('file', fs.createReadStream('/path/to/your/document.pdf'));

const config = {
  method: 'post',
  maxBodyLength: Infinity,
  url: 'https://api.k-v.ai/api/doc/process_doc',
  headers: {
    'access-key': 'YOUR_ACCESS_KEY',
    ...data.getHeaders() // multipart Content-Type including the boundary
  },
  data: data
};

axios.request(config)
  .then((response) => {
    console.log(JSON.stringify(response.data));
  })
  .catch((error) => {
    console.log(error);
  });
```
Method 2: Upload from AWS S3
```bash
curl --location 'https://api.k-v.ai/api/doc/process_doc' \
--header 'access-key: YOUR_ACCESS_KEY' \
--header 'Content-Type: application/json' \
--data '{
  "s3_uri": "s3://your-bucket/path/to/document.pdf",
  "aws_access_key_id": "YOUR_AWS_ACCESS_KEY",
  "aws_secret_access_key": "YOUR_AWS_SECRET_KEY",
  "aws_region": "YOUR_AWS_REGION"
}'
```
```python
import requests

url = "https://api.k-v.ai/api/doc/process_doc"
payload = {
    "s3_uri": "s3://your-bucket/path/to/document.pdf",
    "aws_access_key_id": "YOUR_AWS_ACCESS_KEY",
    "aws_secret_access_key": "YOUR_AWS_SECRET_KEY",
    "aws_region": "YOUR_AWS_REGION"
}
headers = {
    "access-key": "YOUR_ACCESS_KEY",
    "Content-Type": "application/json"
}

# json= serializes the payload dict into the JSON request body
response = requests.post(url, headers=headers, json=payload)
print(response.status_code, response.text)
```
```javascript
const axios = require('axios');

const data = JSON.stringify({
  "s3_uri": "s3://your-bucket/path/to/document.pdf",
  "aws_access_key_id": "YOUR_AWS_ACCESS_KEY",
  "aws_secret_access_key": "YOUR_AWS_SECRET_KEY",
  "aws_region": "YOUR_AWS_REGION"
});

const config = {
  method: 'post',
  maxBodyLength: Infinity,
  url: 'https://api.k-v.ai/api/doc/process_doc',
  headers: {
    'access-key': 'YOUR_ACCESS_KEY',
    'Content-Type': 'application/json'
  },
  data: data
};

axios.request(config)
  .then((response) => {
    console.log(JSON.stringify(response.data));
  })
  .catch((error) => {
    console.log(error);
  });
```
Data Retention
- Uploaded documents are retained for 100 days from the upload date.
- Processed documents remain searchable until deleted or expired.
- Failed uploads are not counted toward storage quota.
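A small sketch for tracking when a document reaches the 100-day retention limit, assuming the limit counts calendar days from the upload date (the exact cut-off time is not specified here). retention_expiry is a hypothetical helper.

```python
from datetime import date, timedelta

RETENTION_DAYS = 100  # from the Data Retention section


def retention_expiry(upload_date: date) -> date:
    """Hypothetical helper: date the 100-day retention window ends, counting from upload_date."""
    return upload_date + timedelta(days=RETENTION_DAYS)


print(retention_expiry(date(2024, 1, 1)))  # 2024-04-10
```

Storing this date next to each doc_process_id makes it easy to re-upload or clean up documents before they expire.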
Next Steps
After uploading your document:
- Check Processing Status: Use the "List Documents" API to monitor status.
- Use Knowledge Agents: Query your documents with AI-powered agents to extract insights, answer questions, and run intelligent search.
- Delete Documents: Use the Delete Document API to remove a document when it is no longer needed.
Need Help? Contact support at support@k-v.ai