Upload and Process

Overview

The Upload and Process Document endpoint uploads a document to the KnowledgeVerse AI platform and runs it through a preprocessing pipeline: format validation, OCR (Optical Character Recognition), intelligent chunking, and vector embedding. Once processed, the document is searchable and ready for knowledge extraction.

Use this endpoint to:

Upload documents from local storage or an AWS S3 bucket.
Process documents with automatic format detection and OCR.
Generate vector embeddings for semantic search capabilities.
Enable AI-powered knowledge extraction from documents.
Build document ingestion pipelines for your applications.

Endpoint Details

Method: POST
Endpoint: /api/doc/process_doc
Base URL: https://api.k-v.ai
Authentication: Access Key (Required)

Processing Pipeline

Format validation and detection
OCR for scanned/image-based documents
Intelligent chunking for optimal retrieval
Vector embedding

Request Specification

Method 1: Upload from Local File

Content-Type: multipart/form-data

Headers

Header	Type	Required	Description
access-key	string	Yes	Your unique access-key generated from the platform UI

Form Data

Field	Type	Required	Description
file	file	Yes	Document file to upload (PDF, DOCX, PPTX)

Supported File Formats

PDF (.pdf)
Microsoft Word (.docx)
Microsoft PowerPoint (.pptx)
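
A client can reject unsupported files before spending a request on them. The following is a minimal sketch, assuming a check on the file extension alone is good enough as a pre-filter; the helper name `is_supported_format` is hypothetical and not part of the API, and the server's own format validation remains authoritative.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".pptx"}


def is_supported_format(path: str) -> bool:
    """Return True if the file name ends in a supported extension.

    Hypothetical client-side helper: it only inspects the final
    extension (case-insensitively) and never opens the file.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

Because only the last suffix is inspected, a name such as `archive.pdf.zip` is rejected while `report.PDF` is accepted.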

Document Limits

Maximum page count: 100 pages per document

Method 2: Upload from AWS S3 Bucket

Content-Type: application/json

Headers

Header	Type	Required	Description
access-key	string	Yes	Your unique access-key generated from the platform UI
Content-Type	string	Yes	Must be `application/json`

Request Body

{
  "s3_uri": "s3://bucket-name/path/to/document.pdf",
  "aws_access_key_id": "YOUR_AWS_ACCESS_KEY",
  "aws_secret_access_key": "YOUR_AWS_SECRET_KEY",
  "aws_region": "us-east-1"
}

Body Fields

Field	Type	Required	Description
s3_uri	string	Yes	Full S3 URI in format: `s3://bucket-name/path/to/file`
aws_access_key_id	string	Yes	AWS IAM access key with S3 read permissions
aws_secret_access_key	string	Yes	AWS IAM secret access key
aws_region	string	Yes	AWS region where the S3 bucket is located (e.g., `us-east-1`, `ap-south-1`)
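
The four body fields can be assembled and sanity-checked before the request is sent. This is a sketch only: `build_s3_payload` is a hypothetical helper, and the check that the URI has both a bucket and an object key is an assumption drawn from the `s3://bucket-name/path/to/file` format described above, not a documented server rule.

```python
from urllib.parse import urlparse


def build_s3_payload(s3_uri: str, aws_access_key_id: str,
                     aws_secret_access_key: str, aws_region: str) -> dict:
    """Build the JSON body for an S3 upload request.

    Hypothetical helper. Raises ValueError when the URI does not look
    like s3://bucket-name/path/to/file or a required field is empty.
    """
    parsed = urlparse(s3_uri)
    # Require the s3 scheme, a bucket name, and a non-empty object key.
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"Expected s3://bucket-name/path/to/file, got: {s3_uri!r}")

    payload = {
        "s3_uri": s3_uri,
        "aws_access_key_id": aws_access_key_id,
        "aws_secret_access_key": aws_secret_access_key,
        "aws_region": aws_region,
    }
    # All four fields are marked as required in the table above.
    missing = [name for name, value in payload.items() if not value]
    if missing:
        raise ValueError(f"Missing required fields: {', '.join(missing)}")
    return payload
```

The returned dictionary can be passed to `requests.post(..., json=payload)`, which also sets the `Content-Type: application/json` header.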

AWS IAM Permissions Required

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}
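
The policy above can also be generated per bucket, which helps when several buckets feed the same pipeline. The sketch below is illustrative: `read_only_policy` is a hypothetical helper, and the optional `prefix` argument, which narrows access to one key prefix, is an addition that the policy above does not include.

```python
import json


def read_only_policy(bucket: str, prefix: str = "") -> dict:
    """Return an IAM policy document granting s3:GetObject on a bucket.

    Hypothetical helper. With the default empty prefix the result
    matches the policy shown above; a prefix such as "uploads/"
    restricts read access to objects under that prefix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON, ready to paste into the IAM console.
    print(json.dumps(read_only_policy("your-bucket-name"), indent=2))
```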

Response Specification

Success Response (200 OK)

{
  "data": {
    "doc_process_id": "c71a96e072b581f1108dbc5f7a93cd54",
    "transactions_utilised": 117
  },
  "message": "Document Processed Successfully"
}

Response Fields

Field	Type	Description
data	object	Processing result object
data.doc_process_id	string	Unique identifier for the processed document (32-character hex string)
data.transactions_utilised	integer	Number of transactions consumed during processing
message	string	Human-readable response message
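
A caller will usually want to pull `doc_process_id` and `transactions_utilised` out of the response body and fail loudly if either is missing. The following is a minimal sketch based on the fields in the table above; `extract_result` is a hypothetical helper, and accepting uppercase hex digits is an assumption, since the sample ID is lowercase.

```python
import re
from typing import Tuple

# The table above describes doc_process_id as a 32-character hex string.
_HEX32 = re.compile(r"[0-9a-fA-F]{32}")


def extract_result(body: dict) -> Tuple[str, int]:
    """Return (doc_process_id, transactions_utilised) from a success body.

    Hypothetical helper. Raises ValueError if either field is missing
    or does not have the documented shape.
    """
    data = body.get("data") or {}
    doc_id = data.get("doc_process_id")
    used = data.get("transactions_utilised")
    if not isinstance(doc_id, str) or not _HEX32.fullmatch(doc_id):
        raise ValueError(f"Unexpected doc_process_id: {doc_id!r}")
    if not isinstance(used, int):
        raise ValueError(f"Unexpected transactions_utilised: {used!r}")
    return doc_id, used
```

Applied to the sample success response, this returns the ID `c71a96e072b581f1108dbc5f7a93cd54` and the transaction count `117`.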

Transaction Cost Factors

Transaction usage varies based on:

Document size: Larger documents consume more transactions
Page count: More pages require more processing
OCR requirement: Scanned/image-based documents cost more
File type complexity: Complex layouts (tables, charts) increase cost

Important Notes

doc_process_id: Store this identifier to perform searches, queries, and deletions on this document. You can also retrieve this information using the "List Documents" API.
Asynchronous Processing: Document processing happens asynchronously. Use the "List Documents" API to check processing status.
Security: AWS credentials are used only for the upload operation and are not stored.

Error Responses

401 Unauthorized

{
  "data": {},
  "message": "Invalid or missing access key"
}

Cause: Missing or invalid access-key header.

400 Bad Request

{
  "data": {},
  "message": "Unsupported file"
}

Cause: Unsupported file format.

403 Forbidden

{
  "data": {},
  "message": "AWS credentials invalid or insufficient permissions"
}

Cause: Invalid AWS credentials or insufficient S3 bucket permissions.

413 Payload Too Large

{
  "data": {},
  "message": "File size exceeds maximum limit of 100 pages"
}

Cause: Document exceeds the 100-page limit.

500 Internal Server Error

{
  "data": {},
  "message": "Something went wrong"
}

Cause: Server-side processing error or database connectivity issue.
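
The status codes and causes listed above can be folded into one exception type so that calling code handles failures in a single place. This is a sketch under assumptions: `ProcessDocError` and `raise_for_process_doc_error` are hypothetical names, and any status code not listed above is simply reported as undocumented.

```python
# Causes copied from the error descriptions above, keyed by HTTP status.
ERROR_CAUSES = {
    400: "Unsupported file format",
    401: "Missing or invalid access-key header",
    403: "Invalid AWS credentials or insufficient S3 bucket permissions",
    413: "Document exceeds the 100-page limit",
    500: "Server-side processing error or database connectivity issue",
}


class ProcessDocError(Exception):
    """Hypothetical exception carrying the status, API message, and cause."""

    def __init__(self, status_code: int, message: str, cause: str):
        super().__init__(f"{status_code}: {message} ({cause})")
        self.status_code = status_code
        self.message = message
        self.cause = cause


def raise_for_process_doc_error(status_code: int, body: dict) -> None:
    """Raise ProcessDocError for any non-200 response; return None on 200."""
    if status_code == 200:
        return
    cause = ERROR_CAUSES.get(status_code, "Undocumented status code")
    raise ProcessDocError(status_code, body.get("message", ""), cause)
```

With the `requests` library, a call might look like `raise_for_process_doc_error(response.status_code, response.json())`, assuming the error body is valid JSON as in the examples above.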

Code Snippets

Method 1: Upload from Local File

curl --location 'https://api.k-v.ai/api/doc/process_doc' \
--header 'access-key: YOUR_ACCESS_KEY' \
--form 'file=@"/path/to/your/document.pdf"'

import requests

url = "https://api.k-v.ai/api/doc/process_doc"

headers = {
  'access-key': 'YOUR_ACCESS_KEY'
}

# Open the file in a context manager so the handle is closed after the upload.
with open('/path/to/your/document.pdf', 'rb') as f:
    files = [
      ('file', ('document.pdf', f, 'application/octet-stream'))
    ]
    # requests sets the multipart/form-data Content-Type (with boundary) itself.
    response = requests.post(url, headers=headers, files=files)

print(response.text)

const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');

let data = new FormData();
data.append('file', fs.createReadStream('/path/to/your/document.pdf'));

let config = {
  method: 'post',
  maxBodyLength: Infinity,
  url: 'https://api.k-v.ai/api/doc/process_doc',
  headers: {
    'access-key': 'YOUR_ACCESS_KEY',
    ...data.getHeaders()
  },
  data: data
};

axios.request(config)
.then((response) => {
  console.log(JSON.stringify(response.data));
})
.catch((error) => {
  console.log(error);
});

Method 2: Upload from AWS S3

curl --location 'https://api.k-v.ai/api/doc/process_doc' \
--header 'access-key: YOUR_ACCESS_KEY' \
--header 'Content-Type: application/json' \
--data '{
  "s3_uri": "s3://your-bucket/path/to/document.pdf",
  "aws_access_key_id": "YOUR_AWS_ACCESS_KEY",
  "aws_secret_access_key": "YOUR_AWS_SECRET_KEY",
  "aws_region": "YOUR_AWS_REGION"
}'

import requests
import json

url = "https://api.k-v.ai/api/doc/process_doc"

payload = json.dumps({
  "s3_uri": "s3://your-bucket/path/to/document.pdf",
  "aws_access_key_id": "YOUR_AWS_ACCESS_KEY",
  "aws_secret_access_key": "YOUR_AWS_SECRET_KEY",
  "aws_region": "YOUR_AWS_REGION"
})
headers = {
  'access-key': 'YOUR_ACCESS_KEY',
  'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)

const axios = require('axios');

let data = JSON.stringify({
  "s3_uri": "s3://your-bucket/path/to/document.pdf",
  "aws_access_key_id": "YOUR_AWS_ACCESS_KEY",
  "aws_secret_access_key": "YOUR_AWS_SECRET_KEY",
  "aws_region": "YOUR_AWS_REGION"
});

let config = {
  method: 'post',
  maxBodyLength: Infinity,
  url: 'https://api.k-v.ai/api/doc/process_doc',
  headers: {
    'access-key': 'YOUR_ACCESS_KEY',
    'Content-Type': 'application/json'
  },
  data: data
};

axios.request(config)
.then((response) => {
  console.log(JSON.stringify(response.data));
})
.catch((error) => {
  console.log(error);
});

Data Retention

Uploaded documents are retained for 100 days from the upload date.
Processed documents remain searchable until deleted or expired.
Failed uploads are not counted toward storage quota.
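
To plan re-uploads or cleanups, the end of the retention window can be computed from the upload date. The sketch below assumes whole calendar days counted from the upload date; the exact cutoff time and time zone are not specified here, and `retention_expiry` is a hypothetical helper.

```python
from datetime import date, timedelta

# Retention period stated above.
RETENTION_DAYS = 100


def retention_expiry(upload_date: date) -> date:
    """Return the date on which the 100-day retention window ends.

    Hypothetical helper; treats the window as whole calendar days.
    """
    return upload_date + timedelta(days=RETENTION_DAYS)
```

For example, a document uploaded on 2024-01-01 would reach the end of its window on 2024-04-10 under this assumption.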

Next Steps

After uploading your document:

Check Processing Status: Use the "List Documents" API to monitor status.
Use Knowledge Agents: Query your documents with AI-powered agents to extract insights, answer questions, and run intelligent searches across your documents.
Delete Documents: Use the Delete Document API when a document is no longer needed.

Need Help? Contact support at support@k-v.ai
