Document Management

Upload and Process

Overview

The Upload and Process Document endpoint uploads a document to the KnowledgeVerse AI platform and runs it through a preprocessing pipeline: format validation, OCR (Optical Character Recognition), intelligent chunking, and vector embedding. Once processed, the document is searchable and ready for knowledge extraction.

Use this endpoint to:

  • Upload documents from local storage or an AWS S3 bucket.
  • Process documents with automatic format detection and OCR.
  • Generate vector embeddings for semantic search capabilities.
  • Enable AI-powered knowledge extraction from documents.
  • Build document ingestion pipelines for your applications.

Endpoint Details

Method: POST
Endpoint: /api/doc/process_doc
Base URL: https://api.k-v.ai
Authentication: Access Key (Required)

Processing Pipeline

  • Format validation and detection
  • OCR for scanned/image-based documents
  • Intelligent chunking for optimal retrieval
  • Vector embedding

Request Specification

Method 1: Upload from Local File

Content-Type: multipart/form-data

Headers

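| Header     | Type   | Required | Description                                            |
|------------|--------|----------|--------------------------------------------------------|
| access-key | string | Yes      | Your unique access-key generated from the platform UI |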

Form Data

| Field | Type | Required | Description                                |
|-------|------|----------|--------------------------------------------|
| file  | file | Yes      | Document file to upload (PDF, DOCX, PPTX) |

Supported File Formats

  • PDF (.pdf)
  • Microsoft Word (.docx)
  • Microsoft PowerPoint (.pptx)

Document Limits

  • Maximum page count: 100 pages per document
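
A client can reject unsupported files before spending an upload on them. The following is a minimal sketch that checks only the file extension against the list above; the function name `is_supported_document` is hypothetical, and the check does not cover the 100-page limit, which the server still enforces.

```python
from pathlib import Path

# Extensions listed under "Supported File Formats" above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".pptx"}


def is_supported_document(path: str) -> bool:
    """Return True if the file name ends with a supported extension (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

Only the final suffix is inspected, so a name such as `archive.pdf.zip` is rejected.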

Method 2: Upload from AWS S3 Bucket

Content-Type: application/json

Headers

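| Header       | Type   | Required | Description                                            |
|--------------|--------|----------|--------------------------------------------------------|
| access-key   | string | Yes      | Your unique access-key generated from the platform UI |
| Content-Type | string | Yes      | Must be application/json                               |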

Request Body

{
  "s3_uri": "s3://bucket-name/path/to/document.pdf",
  "aws_access_key_id": "YOUR_AWS_ACCESS_KEY",
  "aws_secret_access_key": "YOUR_AWS_SECRET_KEY",
  "aws_region": "us-east-1"
}

Body Fields

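| Field                 | Type   | Required | Description                                                              |
|-----------------------|--------|----------|--------------------------------------------------------------------------|
| s3_uri                | string | Yes      | Full S3 URI in format: s3://bucket-name/path/to/file                     |
| aws_access_key_id     | string | Yes      | AWS IAM access key with S3 read permissions                              |
| aws_secret_access_key | string | Yes      | AWS IAM secret access key                                                |
| aws_region            | string | Yes      | AWS region where the S3 bucket is located (e.g., us-east-1, ap-south-1) |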
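
To keep AWS credentials out of source code, the request body can be assembled from environment variables. The sketch below is one way to do that; `build_s3_payload` and `S3_URI_PATTERN` are hypothetical names, the URI check is an illustrative client-side guard rather than the platform's own validation, and it assumes the credentials are exported as `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, the names the AWS CLI and SDKs use.

```python
import os
import re

# Matches s3://bucket-name/path/to/file. Illustrative check only (assumption).
S3_URI_PATTERN = re.compile(r"^s3://[^/\s]+/\S+$")


def build_s3_payload(s3_uri: str, aws_region: str, env=os.environ) -> dict:
    """Build the JSON body for an S3 upload, reading AWS credentials from env."""
    if not S3_URI_PATTERN.match(s3_uri):
        raise ValueError(f"Expected s3://bucket-name/path/to/file, got: {s3_uri!r}")
    return {
        "s3_uri": s3_uri,
        # A missing variable raises KeyError, which surfaces misconfiguration early.
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        "aws_region": aws_region,
    }
```

The returned dictionary has exactly the four fields listed in the table above and can be sent as the JSON body of the request.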

AWS IAM Permissions Required

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}
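
If documents live under a single key prefix, the same policy can be narrowed to that prefix so the credentials expose less of the bucket. The helper below is a sketch of generating such a policy document; `read_only_policy` and the `prefix` parameter are hypothetical, and with an empty prefix the output matches the policy shown above.

```python
import json


def read_only_policy(bucket: str, prefix: str = "") -> str:
    """Return an IAM policy document granting s3:GetObject on a bucket or key prefix."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # An empty prefix covers every object in the bucket.
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

For example, `read_only_policy("your-bucket-name", "uploads/")` limits read access to objects whose keys start with `uploads/`.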

Response Specification

Success Response (200 OK)

{
  "data": {
    "doc_process_id": "c71a96e072b581f1108dbc5f7a93cd54",
    "transactions_utilised": 117
  },
  "message": "Document Processed Successfully"
}

Response Fields

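| Field                      | Type    | Description                                                             |
|----------------------------|---------|-------------------------------------------------------------------------|
| data                       | object  | Processing result object                                                |
| data.doc_process_id        | string  | Unique identifier for the processed document (32-character hex string) |
| data.transactions_utilised | integer | Number of transactions consumed during processing                       |
| message                    | string  | Human-readable response message                                         |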
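
The response body can be turned into a typed value before the identifier is stored. The following sketch assumes the body has already been decoded from JSON into a dictionary; `ProcessResult` and `parse_process_response` are hypothetical names, and the pattern accepts upper- or lowercase hex because the table above specifies only "32-character hex string".

```python
import re
from dataclasses import dataclass

# 32 hexadecimal characters, as described for data.doc_process_id.
DOC_ID_PATTERN = re.compile(r"^[0-9a-fA-F]{32}$")


@dataclass(frozen=True)
class ProcessResult:
    doc_process_id: str
    transactions_utilised: int
    message: str


def parse_process_response(body: dict) -> ProcessResult:
    """Extract and sanity-check the fields of a success response."""
    data = body.get("data") or {}
    doc_id = str(data.get("doc_process_id") or "")
    if not DOC_ID_PATTERN.match(doc_id):
        raise ValueError(f"Unexpected doc_process_id: {doc_id!r}")
    return ProcessResult(
        doc_process_id=doc_id,
        transactions_utilised=int(data["transactions_utilised"]),
        message=str(body.get("message", "")),
    )
```

Rejecting a malformed identifier at this point keeps an empty or truncated value from being stored and used in later calls.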

Transaction Cost Factors

Transaction usage varies based on:

  • Document size: Larger documents consume more transactions
  • Page count: More pages require more processing
  • OCR requirement: Scanned/image-based documents cost more
  • File type complexity: Complex layouts (tables, charts) increase cost

Important Notes

  • doc_process_id: Store this identifier to perform searches, queries, and deletions on this document. You can also retrieve this information using the "List Documents" API.
  • Asynchronous Processing: Document processing happens asynchronously. Use the "List Documents" API to check processing status.
  • Security: AWS credentials are used only for the upload operation and are not stored.
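
Because processing is asynchronous, a client typically polls until the document is ready. This page does not describe the request or response format of the "List Documents" API, so the sketch below leaves that call to a caller-supplied function: `wait_until` and its `is_done` parameter are hypothetical, and `is_done` is assumed to wrap whatever status check your integration performs.

```python
import time
from typing import Callable


def wait_until(
    is_done: Callable[[], bool],
    interval_s: float = 5.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call is_done() until it returns True or timeout_s elapses.

    Returns True if is_done() succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_done():
            return True
        if time.monotonic() >= deadline:
            return False
        # Pause between checks to avoid hammering the status endpoint.
        sleep(interval_s)
```

The `sleep` parameter exists so the loop can be exercised in tests without real delays; the interval and timeout defaults are placeholders, not platform recommendations.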

Error Responses

401 Unauthorized

{
  "data": {},
  "message": "Invalid or missing access key"
}

Cause: Missing or invalid access-key header.

400 Bad Request

{
  "data": {},
  "message": "Unsupported file"
}

Cause: Unsupported file format.

403 Forbidden

{
  "data": {},
  "message": "AWS credentials invalid or insufficient permissions"
}

Cause: Invalid AWS credentials or insufficient S3 bucket permissions.

413 Payload Too Large

{
  "data": {},
  "message": "File size exceeds maximum limit of 100 pages"
}

Cause: The document exceeds the 100-page limit.

500 Internal Server Error

{
  "data": {},
  "message": "Something went wrong"
}

Cause: Server-side processing error or database connectivity issue.
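
The error responses above share one shape, so a client can translate them into a single exception type. This is a minimal sketch, assuming the HTTP status code and the decoded JSON body are already available; `ProcessDocError`, `ERROR_CAUSES`, and `raise_for_error` are hypothetical names, and the cause strings are paraphrased from this section.

```python
# Causes from the "Error Responses" section, keyed by HTTP status code.
ERROR_CAUSES = {
    400: "Unsupported file format.",
    401: "Missing or invalid access-key header.",
    403: "Invalid AWS credentials or insufficient S3 bucket permissions.",
    413: "Document exceeds the 100-page limit.",
    500: "Server-side processing error or database connectivity issue.",
}


class ProcessDocError(Exception):
    """Raised for a non-success response from the process_doc endpoint."""

    def __init__(self, status_code: int, message: str, cause: str):
        super().__init__(f"{status_code}: {message} ({cause})")
        self.status_code = status_code
        self.message = message
        self.cause = cause


def raise_for_error(status_code: int, body: dict) -> None:
    """Do nothing for 2xx responses; otherwise raise ProcessDocError."""
    if 200 <= status_code < 300:
        return
    cause = ERROR_CAUSES.get(status_code, "Undocumented status code.")
    raise ProcessDocError(status_code, str(body.get("message", "")), cause)
```

Callers can then branch on `status_code`, for example to fix credentials on 401 or 403 rather than retrying.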


Code Snippets

Method 1: Upload from Local File

cURL

curl --location 'https://api.k-v.ai/api/doc/process_doc' \
--header 'access-key: YOUR_ACCESS_KEY' \
--form 'file=@"/path/to/your/document.pdf"'

Python

import requests

url = "https://api.k-v.ai/api/doc/process_doc"
headers = {"access-key": "YOUR_ACCESS_KEY"}

# Open the file in binary mode; the context manager closes it after the upload.
with open("/path/to/your/document.pdf", "rb") as f:
    # requests builds the multipart/form-data body and sets its Content-Type header.
    files = {"file": ("document.pdf", f, "application/octet-stream")}
    response = requests.post(url, headers=headers, files=files)

print(response.text)

JavaScript (Node.js)

const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');

// Stream the file from disk into a multipart/form-data body.
const data = new FormData();
data.append('file', fs.createReadStream('/path/to/your/document.pdf'));

const config = {
  method: 'post',
  maxBodyLength: Infinity,
  url: 'https://api.k-v.ai/api/doc/process_doc',
  headers: {
    'access-key': 'YOUR_ACCESS_KEY',
    // Adds the multipart Content-Type header, including the boundary.
    ...data.getHeaders()
  },
  data: data
};

axios.request(config)
  .then((response) => {
    console.log(JSON.stringify(response.data));
  })
  .catch((error) => {
    console.error(error);
  });

Method 2: Upload from AWS S3

cURL

curl --location 'https://api.k-v.ai/api/doc/process_doc' \
--header 'access-key: YOUR_ACCESS_KEY' \
--header 'Content-Type: application/json' \
--data '{
  "s3_uri": "s3://your-bucket/path/to/document.pdf",
  "aws_access_key_id": "YOUR_AWS_ACCESS_KEY",
  "aws_secret_access_key": "YOUR_AWS_SECRET_KEY",
  "aws_region": "YOUR_AWS_REGION"
}'

Python

import requests

url = "https://api.k-v.ai/api/doc/process_doc"

payload = {
    "s3_uri": "s3://your-bucket/path/to/document.pdf",
    "aws_access_key_id": "YOUR_AWS_ACCESS_KEY",
    "aws_secret_access_key": "YOUR_AWS_SECRET_KEY",
    "aws_region": "YOUR_AWS_REGION",
}
headers = {"access-key": "YOUR_ACCESS_KEY"}

# json= serializes the payload and sets Content-Type: application/json.
response = requests.post(url, headers=headers, json=payload)

print(response.text)

JavaScript (Node.js)

const axios = require('axios');

// Request body with the S3 location and the AWS credentials used to read it.
const data = JSON.stringify({
  "s3_uri": "s3://your-bucket/path/to/document.pdf",
  "aws_access_key_id": "YOUR_AWS_ACCESS_KEY",
  "aws_secret_access_key": "YOUR_AWS_SECRET_KEY",
  "aws_region": "YOUR_AWS_REGION"
});

const config = {
  method: 'post',
  maxBodyLength: Infinity,
  url: 'https://api.k-v.ai/api/doc/process_doc',
  headers: {
    'access-key': 'YOUR_ACCESS_KEY',
    'Content-Type': 'application/json'
  },
  data: data
};

axios.request(config)
  .then((response) => {
    console.log(JSON.stringify(response.data));
  })
  .catch((error) => {
    console.error(error);
  });

Data Retention

  • Uploaded documents are retained for 100 days from upload date.
  • Processed documents remain searchable until deleted or expired.
  • Failed uploads are not counted toward storage quota.
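
The 100-day retention window can be tracked client-side so documents are re-uploaded or cleaned up before they expire. The sketch below does the date arithmetic only; `retention_expiry` and `days_remaining` are hypothetical names, and it assumes the window is counted in whole calendar days from the upload date, which this page does not spell out.

```python
from datetime import date, timedelta

# Retention period stated in "Data Retention" above.
RETENTION_DAYS = 100


def retention_expiry(upload_date: date) -> date:
    """Return the date 100 days after the upload date."""
    return upload_date + timedelta(days=RETENTION_DAYS)


def days_remaining(upload_date: date, today: date) -> int:
    """Return the number of days left before expiry, never below zero."""
    return max((retention_expiry(upload_date) - today).days, 0)
```

Passing `today` explicitly, instead of calling `date.today()` inside the function, keeps the calculation deterministic and easy to test.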

Next Steps

After uploading your document:

  • Check Processing Status: Use the "List Documents" API to monitor status.
  • Use Knowledge Agents: Query your documents using AI-powered agents to extract insights, answer questions, and run intelligent searches across your documents.
  • Delete Documents: Use the Delete Document API when no longer needed.

Need Help? Contact support at support@k-v.ai
