KV API Documentation

Overview

The Data Extractor endpoint is a knowledge agent that uses an AI model to extract structured data from documents. Specify the entity types you want (organizations, people, dates, amounts, etc.), and the selected model identifies and extracts the matching values from each document.

Use this endpoint to:

Extract structured data from documents
Identify and extract specific entities (organizations, people, dates, financial data, etc.)
Automate data entry from contracts, invoices, and legal documents
Build document processing pipelines with custom entity extraction
Parse multiple documents simultaneously for batch processing

Endpoint Details

Method: POST
Endpoint: /api/agent/data_extractor
Base URL: https://api.k-v.ai
Authentication: Access Key (Required)

Request Specification

Headers

Header	Type	Required	Description
access-key	string	Yes	Your unique access-key generated from the platform UI
Content-Type	string	Yes	Must be application/json

Request Body

{
    "doc_process_ids": [
        "264dfa262b748d15ccbeaada89430c68"
    ],
    "entity_list": [
        "Organisations",
        "People"
    ],
    "model_data": {
        "model_name": "gpt-5.1",
        "api_key": ""
    }
}

Body Fields

Field	Type	Required	Description
doc_process_ids	array	Yes	Array of document IDs to extract entities from (obtained from Upload or List Documents APIs)
entity_list	array	No	List of entity types to extract (e.g., "People", "Organisations", "Dates", "Amounts"). Leave empty to skip extraction
model_data	object	Yes	AI model configuration
model_data.model_name	string	Yes	AI model to use for extraction (see supported models below)
model_data.api_key	string	No	Your own LLM API key (leave empty to use the platform's default keys)

Supported AI Models

Model Name	Provider
gpt-5.1	OpenAI
gpt-5-mini	OpenAI
claude-sonnet-4-5-20250929	Anthropic
gemini/gemini-2.5-flash-lite	Google
gemini/gemini-2.5-pro	Google
gemini/gemini-3-pro-preview	Google
mistral/mistral-small-latest	Mistral AI
mistral/mistral-medium-latest	Mistral AI
llama3.1-70b	Meta
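
The table above can be mirrored on the client so that a mistyped model_name is caught before the request is sent. The following is a minimal sketch, not part of the KV SDK: SUPPORTED_MODELS and build_request_body are hypothetical names, and the set only copies the rows of the table, so the server remains the source of truth.

```python
# Hypothetical client-side helper; these names are illustrative, not part of the KV SDK.
SUPPORTED_MODELS = {
    "gpt-5.1",
    "gpt-5-mini",
    "claude-sonnet-4-5-20250929",
    "gemini/gemini-2.5-flash-lite",
    "gemini/gemini-2.5-pro",
    "gemini/gemini-3-pro-preview",
    "mistral/mistral-small-latest",
    "mistral/mistral-medium-latest",
    "llama3.1-70b",
}


def build_request_body(doc_process_ids, entity_list, model_name, api_key=""):
    """Return a request body dict, rejecting inputs the API is documented to refuse."""
    if not doc_process_ids:
        raise ValueError("doc_process_ids must contain at least one document ID")
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(f"Unsupported model_name: {model_name!r}")
    return {
        "doc_process_ids": list(doc_process_ids),
        "entity_list": list(entity_list),
        # An empty api_key asks the platform to use its default keys.
        "model_data": {"model_name": model_name, "api_key": api_key},
    }


body = build_request_body(
    ["264dfa262b748d15ccbeaada89430c68"], ["Organisations", "People"], "gpt-5.1"
)
```

The returned dict can be passed to requests as json=body or serialized with json.dumps, as in the request examples in this document.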

Using Your Own LLM API Keys

Platform Keys (Default)

{
    "model_data": {
        "model_name": "gpt-5.1",
        "api_key": ""
    }
}

Your Own Keys

{
    "model_data": {
        "model_name": "gpt-5.1",
        "api_key": "sk-your-openai-api-key-here"
    }
}
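
To avoid hard-coding a provider key in source files, the model_data object can be built from an environment variable. This is a sketch under assumptions: model_data_from_env is a hypothetical helper, and OPENAI_API_KEY is only an example variable name, not one the platform requires.

```python
import os


def model_data_from_env(model_name, env_var="OPENAI_API_KEY"):
    """Build model_data using the key stored in env_var (an assumed variable name).

    If the variable is not set, api_key is left empty, which per the
    documentation makes the platform fall back to its default keys.
    """
    return {"model_name": model_name, "api_key": os.environ.get(env_var, "")}
```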

Response Specification

Success Response (200 OK)

{
    "data": {
        "entities": [
            {
                "file_name": "Settlement Agreement (1).pdf",
                "doc_hash": "264dfa262b748d15ccbeaada89430c68",
                "entity_table": [
                    {
                        "entity_type": "Organisations",
                        "values": [
                            "Widget Corporation (Defendant)",
                            "Dewey, Cheatum & Howe LLP",
                            "ABC Software Corporation (Plaintiff)",
                            "Propel Software Corporation (Plaintiff)"
                        ]
                    },
                    {
                        "entity_type": "People",
                        "values": [
                            "Joe Average (CEO of ABC Software Corporation, Plaintiff)",
                            "James Smith, Esq. (Attorney for Plaintiff)"
                        ]
                    }
                ],
                "tokens": {
                    "input": 1208,
                    "output": 78,
                    "total": 1286
                }
            }
        ],
        "tokens": {
            "input": 1208,
            "output": 78,
            "total": 1286
        }
    },
    "message": "Entities extracted successfully"
}

Success Response - Empty Entity List

When entity_list is empty or not provided:

{
    "data": {
        "entities": [
            {
                "file_name": "Settlement Agreement (1).pdf",
                "doc_hash": "264dfa262b748d15ccbeaada89430c68",
                "entity_table": [],
                "tokens": {
                    "input": 0,
                    "output": 0,
                    "total": 0
                }
            }
        ],
        "tokens": {
            "input": 0,
            "output": 0,
            "total": 0
        }
    },
    "message": "Entities extracted successfully"
}

Note: No entities are extracted when entity_list is empty, and no tokens are consumed.

Response Fields

Field	Type	Description
data	object	Response data object
data.entities	array	Array of extracted entity results per document
data.entities[].file_name	string	Original filename of the document
data.entities[].doc_hash	string	Document process ID
data.entities[].entity_table	array	Array of entity types and their extracted values
data.entities[].entity_table[].entity_type	string	Type of entity extracted (matches entity_list values)
data.entities[].entity_table[].values	array	List of extracted values for this entity type
data.entities[].tokens	object	Token usage for this specific document
data.entities[].tokens.input	integer	Input tokens consumed
data.entities[].tokens.output	integer	Output tokens generated
data.entities[].tokens.total	integer	Total tokens used for this document
data.tokens	object	Total token usage across all documents
data.tokens.input	integer	Total input tokens consumed
data.tokens.output	integer	Total output tokens generated
data.tokens.total	integer	Total tokens used for all documents
message	string	Human-readable response message
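
The fields above can be flattened into a lookup keyed by document. The sketch below assumes only the response shape shown in this section; entities_by_document is a hypothetical helper name, and the sample dict is a trimmed copy of the success response.

```python
def entities_by_document(response_json):
    """Map each doc_hash to {entity_type: [values]} from a success response."""
    result = {}
    for doc in response_json.get("data", {}).get("entities", []):
        # entity_table is empty when entity_list was empty or omitted.
        table = {row["entity_type"]: row["values"] for row in doc.get("entity_table", [])}
        result[doc["doc_hash"]] = table
    return result


# Trimmed copy of the documented success response.
sample = {
    "data": {
        "entities": [
            {
                "file_name": "Settlement Agreement (1).pdf",
                "doc_hash": "264dfa262b748d15ccbeaada89430c68",
                "entity_table": [
                    {"entity_type": "People", "values": ["James Smith, Esq. (Attorney for Plaintiff)"]}
                ],
                "tokens": {"input": 1208, "output": 78, "total": 1286},
            }
        ],
        "tokens": {"input": 1208, "output": 78, "total": 1286},
    },
    "message": "Entities extracted successfully",
}

extracted = entities_by_document(sample)
```

Keying by doc_hash rather than file_name avoids collisions when two uploaded documents share a filename.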

Understanding Token Usage

Per-Document Tokens

Each entities[] object contains a tokens field that reports the tokens consumed while processing that individual document.

Aggregate Tokens

The data.tokens object is the sum of token usage across all documents in the request. Use it for cost calculation and monitoring.
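
The relationship between the two levels can be checked in code. This is a minimal sketch: sum_document_tokens and estimate_cost are hypothetical helpers, and the per-1,000-token prices are placeholders to be replaced with your provider's actual rates.

```python
def sum_document_tokens(response_json):
    """Add up the per-document token counts found under data.entities[].tokens."""
    totals = {"input": 0, "output": 0, "total": 0}
    for doc in response_json["data"]["entities"]:
        for key in totals:
            totals[key] += doc["tokens"][key]
    return totals


def estimate_cost(tokens, input_price_per_1k, output_price_per_1k):
    """Estimate cost from a tokens object; both prices are caller-supplied placeholders."""
    return (
        tokens["input"] / 1000 * input_price_per_1k
        + tokens["output"] / 1000 * output_price_per_1k
    )


# Token figures copied from the documented success response.
response_json = {
    "data": {
        "entities": [{"tokens": {"input": 1208, "output": 78, "total": 1286}}],
        "tokens": {"input": 1208, "output": 78, "total": 1286},
    }
}

per_document_sum = sum_document_tokens(response_json)
matches_aggregate = per_document_sum == response_json["data"]["tokens"]
```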

Code Snippets

cURL

curl --location 'https://api.k-v.ai/api/agent/data_extractor' \
--header 'access-key: YOUR_ACCESS_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "doc_process_ids": [
        "264dfa262b748d15ccbeaada89430c68"
    ],
    "entity_list": [
        "Organisations",
        "People"
    ],
    "model_data": {
        "model_name": "gpt-5.1",
        "api_key": ""
    }
}'

Python (requests)

import requests
import json

url = "https://api.k-v.ai/api/agent/data_extractor"

payload = json.dumps({
  "doc_process_ids": [
    "264dfa262b748d15ccbeaada89430c68"
  ],
  "entity_list": [
    "Organisations",
    "People"
  ],
  "model_data": {
    "model_name": "gpt-5.1",
    "api_key": ""
  }
})
headers = {
  'access-key': 'YOUR_ACCESS_KEY',
  'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)

Node.js (axios)

const axios = require('axios');
let data = JSON.stringify({
  "doc_process_ids": [
    "264dfa262b748d15ccbeaada89430c68"
  ],
  "entity_list": [
    "Organisations",
    "People"
  ],
  "model_data": {
    "model_name": "gpt-5.1",
    "api_key": ""
  }
});

let config = {
  method: 'post',
  maxBodyLength: Infinity,
  url: 'https://api.k-v.ai/api/agent/data_extractor',
  headers: { 
    'access-key': 'YOUR_ACCESS_KEY', 
    'Content-Type': 'application/json'
  },
  data : data
};

axios.request(config)
.then((response) => {
  console.log(JSON.stringify(response.data));
})
.catch((error) => {
  console.log(error);
});

Python SDK (async)

from kv_platform_sdk import AsyncClient
from kv_platform_sdk.models.extractor import DataExtractorRequest
from kv_platform_sdk.models.model_data import ModelData
import asyncio

ACCESS_KEY = "YOUR_ACCESS_KEY"

async def main():
    async with AsyncClient(access_key=ACCESS_KEY, timeout=10 * 60) as client:
        request = DataExtractorRequest(
            doc_process_ids=["b16be8415ed216509bbcd4c04dd05b9e"],
            entity_list=["Organisations", "People"],
            model_data=ModelData(model_name="gpt-5.1"),
        )

        result = await client.data_extractor.extract(request)

        print(result)


asyncio.run(main())

Python SDK (sync)

from kv_platform_sdk import Client
from kv_platform_sdk.models.extractor import DataExtractorRequest
from kv_platform_sdk.models.model_data import ModelData

ACCESS_KEY = "YOUR_ACCESS_KEY"

client = Client(access_key=ACCESS_KEY, timeout=10 * 60)

request = DataExtractorRequest(
    doc_process_ids=["b16be8415ed216509bbcd4c04dd05b9e"],
    entity_list=["Organisations", "People"],
    model_data=ModelData(model_name="gpt-5.1"),
)

result = client.data_extractor.extract(request)

print(result)

client.close()

Error Responses

401 Unauthorized

{
    "data": {},
    "message": "Invalid or missing access key"
}

Cause: Missing or invalid access-key header.

422 Unprocessable Entity - Invalid Model Name

{
    "detail": [
        {
            "type": "literal_error",
            "loc": [
                "body",
                "model_data",
                "model_name"
            ],
            "msg": "Input should be 'gpt-5-chat-latest', 'gpt-5.1', 'gpt-5-mini', 'claude-sonnet-4-5-20250929', 'gemini/gemini-2.5-flash-lite', 'mistral/mistral-small-latest', 'mistral/mistral-medium-latest', 'gemini/gemini-2.5-pro', 'gemini/gemini-3-pro-preview' or 'llama3.1-70b'",
            "input": "",
            "ctx": {
                "expected": "'gpt-5-chat-latest', 'gpt-5.1', 'gpt-5-mini', 'claude-sonnet-4-5-20250929', 'gemini/gemini-2.5-flash-lite', 'mistral/mistral-small-latest', 'mistral/mistral-medium-latest', 'gemini/gemini-2.5-pro', 'gemini/gemini-3-pro-preview' or 'llama3.1-70b'"
            }
        }
    ]
}

Cause: Invalid or unsupported model_name in model_data. See supported models list above.

400 Bad Request - Invalid Document IDs

{
    "data": {},
    "message": "Invalid docs selected"
}

Cause: Missing or invalid doc_process_ids.

500 Internal Server Error - Invalid LLM API Key

{
    "data": {},
    "message": "litellm.AuthenticationError: AuthenticationError: OpenAIException - Incorrect API key provided: tyrdfuih**uhf7. You can find your API key at https://platform.openai.com/account/api-keys."
}

Cause: The api_key provided in model_data is invalid or expired. Verify your LLM provider API key.

500 Internal Server Error - General Error

{
    "data": {},
    "message": "Something went wrong"
}

Causes:

LLM service temporarily unavailable
Server-side processing error
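
The error bodies above come in two shapes: 422 responses carry a detail list, while the others carry a message string. A client can normalize both into a single line for logging. This is a sketch that assumes only the shapes shown in this section; describe_error is a hypothetical helper name.

```python
def describe_error(status_code, body):
    """Return a one-line description for a non-200 response body (a parsed JSON dict)."""
    if "detail" in body:
        # Validation errors list each failing field under "detail".
        parts = []
        for item in body["detail"]:
            location = ".".join(str(part) for part in item.get("loc", []))
            parts.append(f"{location}: {item.get('msg', '')}")
        return f"{status_code} validation error: " + "; ".join(parts)
    # All other documented errors use a top-level "message" string.
    return f"{status_code}: {body.get('message', 'Unknown error')}"
```

With requests, this could be called as describe_error(response.status_code, response.json()) whenever response.status_code is not 200.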

Important Notes

Document Processing: Only extract from documents with status: "processed"
Entity Specificity: More specific entity names yield better extraction accuracy
Empty Entity List: Returns 200 OK with empty entity_table and zero token usage
Token Costs: Monitor tokens.total for cost management

Next Steps

After extracting entities:

Validate Results: Review extracted values for accuracy
Automate Workflows: Integrate into document processing pipelines
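
As one example of the validation step, the sample values in this document sometimes carry a trailing parenthetical note, such as "Widget Corporation (Defendant)". The sketch below separates the name from that note for easier review. It assumes this pattern only because it appears in the sample response; the API does not guarantee the format, and split_annotation is a hypothetical helper.

```python
import re

# Matches "Name (note)" where the note is the last parenthesized group in the string.
_ANNOTATED = re.compile(r"^(?P<name>.*?)\s*\((?P<note>[^()]*)\)\s*$")


def split_annotation(value):
    """Split 'Name (note)' into (name, note); return (value, None) if there is no note."""
    match = _ANNOTATED.match(value)
    if match:
        return match.group("name"), match.group("note")
    return value.strip(), None
```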

Need Help? Contact support at support@k-v.ai
