Classification API

The Classification API classifies documents asynchronously. It exposes two REST interfaces, one for document upload and one for result polling. A third interface, POST /training, accepts training samples.

1. Overview

Base URL for all requests:
https://56kkv8d9n7.execute-api.eu-central-1.amazonaws.com/PROD/

2. POST /document

Uploads a document for classification. The response includes a job id, which you then use to poll for the result via GET /class/<JOB_ID>.

2.1 Request Header

content-type: HTTP content type 

supported values: "multipart/form-data"

required: yes

customer-id: part of your credentials which you receive upon registration for the Classification API

required: yes

x-api-key: part of your credentials which you receive upon registration for the Classification API

required: yes

2.2 Request Parameter

The request body includes the following list of form parameters:

document: file containing your invoice or contract document; limited to 10 pages and a file size of 4 MB.

supported file types: pdf (single-page and multi-page)

required: yes

useEmbeddedText: use the text embedded in the document to skip the OCR step and thereby improve request performance; only applicable to pdf files

supported values: [ "true" | "false" ]

required: no

default: "false"

getHocr: return the document’s content in hOCR format (in addition to plain text)

supported values: [ "true" | "false" ]

required: no

default: "false"
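
The headers from 2.1 and the form parameters above can be combined into one request. The following is a minimal sketch using only the Python standard library; it builds the request but does not send it. The names build_upload_request and MAX_BYTES are hypothetical, and reading "4 MB" as 4 × 1024 × 1024 bytes is an assumption, not something the API documents.

```python
import uuid
import urllib.request

BASE_URL = "https://56kkv8d9n7.execute-api.eu-central-1.amazonaws.com/PROD/"
MAX_BYTES = 4 * 1024 * 1024  # assumption: "4 MB" interpreted as 4 MiB


def build_upload_request(pdf_bytes, filename, api_key, customer_id,
                         use_embedded_text=False, get_hocr=False):
    """Build (but do not send) a multipart/form-data POST /document request."""
    if len(pdf_bytes) > MAX_BYTES:
        raise ValueError("document exceeds the 4 MB limit")
    boundary = uuid.uuid4().hex
    # Optional flags are sent as the strings "true" / "false".
    fields = {
        "useEmbeddedText": "true" if use_embedded_text else "false",
        "getHocr": "true" if get_hocr else "false",
    }
    parts = []
    for name, value in fields.items():
        parts.append(
            (f'--{boundary}\r\n'
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f'{value}\r\n').encode()
        )
    # The file part carries the raw PDF bytes.
    parts.append(
        (f'--{boundary}\r\n'
         f'Content-Disposition: form-data; name="document"; '
         f'filename="{filename}"\r\n'
         f'Content-Type: application/pdf\r\n\r\n').encode()
        + pdf_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    headers = {
        "content-type": f"multipart/form-data; boundary={boundary}",
        "x-api-key": api_key,
        "customer-id": customer_id,
    }
    return urllib.request.Request(BASE_URL + "document", data=b"".join(parts),
                                  headers=headers, method="POST")
```

Passing the returned object to urllib.request.urlopen would send it; an HTTP client library that builds multipart bodies itself would make most of this code unnecessary.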

2.3 Response HTTP Status

200: Document uploaded successfully.

400: Bad request. Missing or invalid input parameter.

401: Authorization failed. Operation not allowed.

403: Authorization failed due to invalid credentials.

415: Unsupported file format.

429: Too many requests. The overall number of requests to all REST endpoints of the Entity Extraction API must not exceed 100 requests/s for each customer.
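
The API does not document how a client should recover from a 429 response. One common approach is to retry with exponentially growing pauses. The sketch below is an assumption about client behaviour rather than a documented requirement; the function names and the default values (0.5 s base, 8 s cap) are hypothetical.

```python
from typing import List


def should_retry(status: int) -> bool:
    """Only 429 (rate limit exceeded) is treated as retryable in this sketch."""
    return status == 429


def backoff_delays(max_retries: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Seconds to wait before each retry: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]
```

A caller would sleep for each delay in turn and give up once the list is exhausted.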

2.4 Response Header

content-type: HTTP content type

supported values: "application/json"

2.5 Response Body

jobId: Job id used for polling the resulting document class from REST interface /class

type: String

uploadFile: Description of the uploaded file

type: Map<String, Object>

object properties:

  • name: "size"

    type: Integer

  • name: "mime"

    type: String

  • name: "name"

    type: String

2.6 Example

Request

POST /document
 
headers {"x-api-key": <YOUR_API_KEY>, "customer-id": <YOUR_CUSTOMER_ID>}
body {"document": <YOUR_DOCUMENT_FILE>}

 

Response

{
    "jobId": "229cdae0162805414755d5ee7eed216bc975738c",
    "uploadFile": {
        "size": 30393,
        "mime": "application/pdf",
        "name": "Demo.pdf"
    }
}
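
A response like the one above can be read into a small typed structure. This is a minimal sketch; the class name UploadResult and the function parse_upload_response are hypothetical, while the field names come from section 2.5.

```python
import json
from dataclasses import dataclass


@dataclass
class UploadResult:
    job_id: str   # value of "jobId", needed for GET /class/<JOB_ID>
    size: int     # uploadFile.size
    mime: str     # uploadFile.mime
    name: str     # uploadFile.name


def parse_upload_response(body: str) -> UploadResult:
    """Parse the JSON body returned by POST /document."""
    data = json.loads(body)
    info = data["uploadFile"]
    return UploadResult(job_id=data["jobId"], size=int(info["size"]),
                        mime=info["mime"], name=info["name"])
```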

 

3. GET /class/<JOB_ID>

Polls for the classification result of an uploaded document, using the job id returned by POST /document.

3.1 Request Header

content-type: HTTP content type 

supported values: "application/json"

required: yes

customer-id: part of your credentials which you receive upon registration for the Classification API

required: yes

x-api-key: part of your credentials which you receive upon registration for the Classification API

required: yes

3.2 Path Variable

JOB_ID: the job id returned in the response to POST /document

3.3 Response HTTP Status

200: Document classification successful.

202: Job processing not finished yet.

400: Bad request. Missing or invalid input parameter.

401: Authorization failed. Operation not allowed.

403: Authorization failed due to invalid credentials.

422: File cannot be processed.

429: Too many requests. The overall number of requests to all REST endpoints of the Entity Extraction API must not exceed 100 requests/s for each customer.
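
The status codes above suggest a simple polling loop: keep asking while the answer is 202, stop on 200, and treat everything else as a failure. The sketch below takes the HTTP call as a parameter so the loop itself stays independent of any client library. The function name, the 1-second interval, and the 30-attempt limit are assumptions; the API does not recommend a polling frequency.

```python
import time
from typing import Callable, Dict, Tuple


def poll_for_class(fetch: Callable[[], Tuple[int, Dict]],
                   interval: float = 1.0,
                   max_attempts: int = 30,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch() until the job finishes; return the "doc-class" value.

    fetch must perform GET /class/<JOB_ID> and return (status, parsed JSON body).
    """
    for _ in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body["doc-class"]   # classification finished
        if status == 202:
            sleep(interval)            # job still processing, ask again later
            continue
        # 400, 401, 403, 422, 429 and any other status end the loop here
        raise RuntimeError(f"polling failed with HTTP {status}")
    raise TimeoutError(f"job not finished after {max_attempts} attempts")
```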

3.4 Response Header

content-type: HTTP content type

supported values: "application/json"

3.5 Response Body

doc-class: document class (domain) of the input document

supported values: [ "INVOICE_DE" | "INVOICE_EN" | "CONTRACT_DE" | "CONTRACT_EN" ]
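
Each supported value combines a document type and a language code. A client that needs the two parts separately could split the value as sketched below; the function name split_doc_class is hypothetical, and lower-casing the parts is a presentation choice, not something the API prescribes.

```python
from typing import Tuple

SUPPORTED_CLASSES = ("INVOICE_DE", "INVOICE_EN", "CONTRACT_DE", "CONTRACT_EN")


def split_doc_class(doc_class: str) -> Tuple[str, str]:
    """Split a doc-class value such as "INVOICE_DE" into (type, language)."""
    if doc_class not in SUPPORTED_CLASSES:
        raise ValueError(f"unexpected doc-class: {doc_class!r}")
    kind, language = doc_class.rsplit("_", 1)
    return kind.lower(), language.lower()
```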

3.6 Example

Request

GET /class/229cdae0162805414755d5ee7eed216bc975738c
 
headers {"x-api-key": <YOUR_API_KEY>, "customer-id": <YOUR_CUSTOMER_ID>}

 

Response

{
    "doc-class": "INVOICE_DE"
}

 

4. POST /training

Uploads training samples used to train the classification model.

4.1 Request Header

content-type: HTTP content type 

supported values: "multipart/form-data"

required: yes

customer-id: part of your credentials which you receive upon registration for the Classification API

required: yes

x-api-key: part of your credentials which you receive upon registration for the Classification API

required: yes

4.2 Request Parameter

The request body includes the following list of form parameters:

documentClass: document class id of the training document

required: yes

language: language id of the training document

supported values: [ "en" | "de" ]

required: yes

document: training document file

supported file types: [ pdf | tiff | jpg ]

required: yes

text: plain text from training document

required: yes

4.3 Response HTTP Status

200: Training data submitted successfully.

204: Training data empty.

400: Bad request. Missing or invalid input parameter.

401: Authorization failed. Operation not allowed.

403: Authorization failed due to invalid credentials.

429: Too many requests. The overall number of requests to all REST endpoints of the Entity Extraction API must not exceed 100 requests/s for each customer.

4.4 Response Header

content-type: HTTP content type

supported values: "application/json"

4.5 Response Body

errorMsg: error description; null on success

4.6 Example

Request

POST /training
headers {"x-api-key": <YOUR_API_KEY>, "customer-id": <YOUR_CUSTOMER_ID>}
body {
  "documentClass": "INVOICE",
  "language": "de",
  "text": "MEDIA MARKT E-BUSINESS GMBH “(\n\nWANKELSTRASSE 5 -\n\n85046 INGOLSTADT\n\nTel.: 0841/6344545\n\nE-Mail: ONLINESHOP@MEDIAMARKT.DE\n\nRechnungsadresse Rechnung Nr. 458001350\n\nDaniel Winter Rechnungsdatum 10.08.2017\n\nRudolf-Harbig-Weg 26\n\n48149 Münster Kunden-Nr. 3050789\n\nFällig Am 31.08.2017\n\nRechnung Betrag €88,05\n\nMenge Beschreibung Einzelpreis Gesamtpreis\n\n1 PIXMA MX475 A4 MFP INJEKT (P) 69,00 69,00\n\n1 Versandkosten 4,99 4,99\nSumme Netto 73,99\nMwSt. 190% 14,06\n\n",
  "document": <YOUR_DOCUMENT_FILE>
}
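
Because documentClass, language, and text are all required, it can help to check them on the client before uploading. The sketch below validates only the text fields and returns them as a dictionary; the document file would be attached as a separate multipart part. The function name training_fields is hypothetical, and the checks reflect only what section 4.2 states.

```python
from typing import Dict

SUPPORTED_LANGUAGES = ("en", "de")


def training_fields(document_class: str, language: str, text: str) -> Dict[str, str]:
    """Validate and return the text form fields for POST /training."""
    if not document_class:
        raise ValueError("documentClass is required")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"language must be one of {SUPPORTED_LANGUAGES}")
    if not text:
        raise ValueError("text is required")
    return {"documentClass": document_class, "language": language, "text": text}
```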

 

5. Supported Document Classes

The Classification API currently supports the following document classes and languages:

Document Class Id    Document Class       Document Language
INVOICE_EN           invoice document     English
INVOICE_DE           invoice document     German
CONTRACT_EN          contract document    English
CONTRACT_DE          contract document    German

 
