Classification API

The Classification API is an asynchronous API for document classification. It is built around two REST interfaces: one for document upload and one for polling the result.

1.Overview

Base URL for all requests:
https://56kkv8d9n7.execute-api.eu-central-1.amazonaws.com/PROD/

2.POST /document

Uploads your document for classification. The response includes a job id, which you then use to poll for the result through REST interface GET /class/

2.1.Request Header

content-type: HTTP content type 

supported values: “multipart/form-data”

required: yes

customer-id: part of your credentials which you receive upon registration for the Classification API

required: yes

x-api-key: part of your credentials which you receive upon registration for the Classification API

required: yes

2.2.Request Parameter

The request body includes the following list of form parameters:

document: file containing your invoice or contract document; documents are limited to 10 pages and, for direct upload, a file size of 4 MB (larger files require the uploadId parameter).

supported file types: pdf (single and multi page)

required: yes

useEmbeddedText : use embedded document text to skip OCR step and, hence, improve request performance; only applicable for pdf files

supported values: [ “true” | “false” ]

required: no

default: “false”

getHocr : return the document’s content in hOCR format (in addition to plain text)

supported values: [ “true” | “false” ]

required: no

default: “false”

uploadId: upload id returned by endpoint GET /uploadurl; required when processing document files larger than 4 MB. Each upload id may be used only once.

required: no

2.3.Response HTTP Status

200: Document uploaded successfully.

400: Bad request. Missing or invalid input parameter.

401: Authorization failed. Operation not allowed.

403: Authorization failed due to invalid credentials.

415: Unsupported file format.

429: Too many requests. The overall number of requests to all REST endpoints of the Classification API must not exceed 100 requests/s for each customer.
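To stay under the documented limit of 100 requests/s, a client can throttle itself before each call. The following is a minimal sketch of a sliding-window limiter, not part of the API; the class name `RequestThrottle` and its parameters are illustrative assumptions, and it is not thread-safe.

```python
import time
from collections import deque


class RequestThrottle:
    """Sliding-window limiter: at most max_calls per `period` seconds.

    Hypothetical client-side helper; the API itself only documents the
    100 requests/s limit and the 429 status.
    """

    def __init__(self, max_calls=100, period=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # send times of recent requests

    def _drop_expired(self, now):
        # Forget requests that have left the current window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()

    def wait(self):
        """Block until another request may be sent, then record it."""
        now = self._clock()
        self._drop_expired(now)
        if len(self._stamps) >= self.max_calls:
            # Sleep until the oldest recorded request leaves the window.
            self._sleep(self.period - (now - self._stamps[0]))
            now = self._clock()
            self._drop_expired(now)
        self._stamps.append(now)
```

Calling `throttle.wait()` immediately before every request to any endpoint keeps all calls for one customer inside the shared budget.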

2.4.Response Header

content-type: HTTP content type

supported values: “application/json”

2.5.Response Body

jobId: Job id used for polling the resulting document class from REST interface GET /class

type: String

uploadFile: Description of the uploaded file

type: Map

object properties:

  • name: "size"
    type: Integer
  • name: "mime"
    type: String
  • name: "name"
    type: String

2.6.Example

Request

POST /document
 
headers {"x-api-key": , "customer-id": }
body {"document": }

Response

{
    "jobId": "229cdae0162805414755d5ee7eed216bc975738c",
    "uploadFile": {
        "size": 30393,
        "mime": "application/pdf",
        "name": "Demo.pdf"
    }
}
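As an illustration of the request described above, the following sketch builds (but does not send) a POST /document request using only the Python standard library. The helper names `encode_multipart` and `build_upload_request` are hypothetical, the credentials are placeholders supplied by the caller, and the base URL is the one from the Overview section.

```python
import uuid
import urllib.request

BASE_URL = "https://56kkv8d9n7.execute-api.eu-central-1.amazonaws.com/PROD/"


def encode_multipart(fields, file_field, filename, file_bytes, file_type):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        head = (f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n")
        parts.append(head.encode("utf-8"))
    file_head = (f"--{boundary}\r\n"
                 f'Content-Disposition: form-data; name="{file_field}"; '
                 f'filename="{filename}"\r\n'
                 f"Content-Type: {file_type}\r\n\r\n")
    parts.append(file_head.encode("utf-8"))
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_upload_request(api_key, customer_id, pdf_bytes, filename,
                         use_embedded_text=False, get_hocr=False):
    """Build the POST /document request without sending it."""
    fields = {
        # The API documents these flags as the strings "true" / "false".
        "useEmbeddedText": "true" if use_embedded_text else "false",
        "getHocr": "true" if get_hocr else "false",
    }
    body, content_type = encode_multipart(
        fields, "document", filename, pdf_bytes, "application/pdf")
    headers = {
        "content-type": content_type,
        "x-api-key": api_key,
        "customer-id": customer_id,
    }
    return urllib.request.Request(
        BASE_URL + "document", data=body, headers=headers, method="POST")
```

The returned request can be passed to `urllib.request.urlopen`; on a 200 response, the `jobId` field of the JSON body is the value to use when polling GET /class/.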

3.GET /class/

Polls for the classification result of an uploaded document, using the job id returned in the response of REST interface POST /document.

3.1.Request Header

content-type: HTTP content type 

supported values: “application/json”

required: yes

customer-id: part of your credentials which you receive upon registration for the Classification API

required: yes

x-api-key: part of your credentials which you receive upon registration for the Classification API

required: yes

3.2.Path Variable

JOB_ID: the job id is received from the response of the call to REST interface POST /document

3.3.Response HTTP Status

200: Document classification successful.

202: Job processing not finished yet.

400: Bad request. Missing or invalid input parameter.

401: Authorization failed. Operation not allowed.

403: Authorization failed due to invalid credentials.

422: File cannot be processed.

429: Too many requests. The overall number of requests to all REST endpoints of the Classification API must not exceed 100 requests/s for each customer.
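The status codes above suggest a simple polling loop: return on 200, wait and retry on 202, back off on 429, and stop on anything else. The following is a minimal sketch under stated assumptions: `fetch` is a hypothetical caller-supplied function that performs GET /class/ for a job id and returns the HTTP status with the parsed JSON body, and the interval, attempt limit, and doubled wait on 429 are illustrative values, not documented ones.

```python
import time


class ClassificationError(Exception):
    """Raised when the job fails or polling gives up."""


def poll_for_class(fetch, job_id, interval=2.0, max_attempts=30,
                   sleep=time.sleep):
    """Poll until the classification result is available.

    `fetch(job_id)` must return (http_status, parsed_json_body_or_None).
    """
    for _ in range(max_attempts):
        status, body = fetch(job_id)
        if status == 200:
            # Classification finished: the body carries the document class.
            return body["doc-class"]
        if status == 202:
            # Job still running: wait before asking again.
            sleep(interval)
        elif status == 429:
            # Rate limit hit: wait longer before the next attempt.
            sleep(interval * 2)
        else:
            # 400, 401, 403, 422 and anything unexpected are not retried.
            raise ClassificationError(
                f"job {job_id} failed with HTTP {status}")
    raise ClassificationError(
        f"job {job_id} not finished after {max_attempts} attempts")
```

Passing `fetch` and `sleep` in as parameters keeps the loop independent of any particular HTTP library and makes it easy to exercise without network access.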

3.4.Response Header

content-type: HTTP content type

supported values: “application/json”

3.5.Response Body

doc-class: domain of the input document

supported values: [ “INVOICE_DE” | “INVOICE_EN” | “CONTRACT_DE” | “CONTRACT_EN” ]

3.6.Example

Request

GET /class/229cdae0162805414755d5ee7eed216bc975738c
 
headers {"x-api-key": , "customer-id": }

Response

{
    "doc-class": "INVOICE_DE"
}

4.POST /training

Used to upload training samples for training of the classification model.

4.1.Request Header

content-type: HTTP content type 

supported values: “multipart/form-data”

required: yes

customer-id: part of your credentials which you receive upon registration for the Classification API

required: yes

x-api-key: part of your credentials which you receive upon registration for the Classification API

required: yes

4.2.Request Parameter

The request body includes the following list of form parameters:

documentClass: document class id of the training document

required: yes

language: language id of the training document

supported values: [ “en” | “de” ]

required: yes

document: training document file

supported file types: [ pdf | tiff | jpg ]

required: yes

text: plain text from training document

required: yes

4.3.Response HTTP Status

200: Training data submitted successfully.

204: Training data empty.

400: Bad request. Missing or invalid input parameter.

401: Authorization failed. Operation not allowed.

403: Authorization failed due to invalid credentials.

429: Too many requests. The overall number of requests to all REST endpoints of the Classification API must not exceed 100 requests/s for each customer.

4.4.Response Header

content-type: HTTP content type

supported values: “application/json”

4.5.Response Body

errorMsg: error description; null on success

4.6.Example

Request

POST /training
headers {"x-api-key": , "customer-id": }
body {
  "documentClass": "INVOICE",
  "language": "de",
  "text": "MEDIA MARKT E-BUSINESS GMBH “(\n\nWANKELSTRASSE 5 -\n\n85046 INGOLSTADT\n\nTel.: 0841/6344545\n\nE-Mail: ONLINESHOP@MEDIAMARKT.DE\n\nRechnungsadresse Rechnung Nr. 458001350\n\nDaniel Winter Rechnungsdatum 10.08.2017\n\nRudolf-Harbig-Weg 26\n\n48149 Münster Kunden-Nr. 3050789\n\nFällig Am 31.08.2017\n\nRechnung Betrag €88,05\n\nMenge Beschreibung Einzelpreis Gesamtpreis\n\n1 PIXMA MX475 A4 MFP INJEKT (P) 69,00 69,00\n\n1 Versandkosten 4,99 4,99\nSumme Netto 73,99\nMwSt. 190% 14,06\n\n",
  "document": 
}
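Because every form parameter of POST /training is required and two of them have a fixed set of values, a client can check a sample before sending it. The following is a small sketch; the function name `build_training_fields` is hypothetical, and judging the file type from the file name extension is an assumption made for illustration.

```python
import os

SUPPORTED_LANGUAGES = {"en", "de"}
# Extensions taken from the documented file types [ pdf | tiff | jpg ].
SUPPORTED_EXTENSIONS = {".pdf", ".tiff", ".jpg"}


def build_training_fields(document_class, language, text, filename):
    """Validate one training sample and return its text form fields.

    The file itself is sent separately as the "document" form part.
    """
    if not document_class:
        raise ValueError("documentClass is required")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension!r}")
    if not text:
        raise ValueError("text is required")
    return {
        "documentClass": document_class,
        "language": language,
        "text": text,
    }
```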

 

5.GET /uploadurl

Used to request a URL for the upload of large document files (> 4 MB). After uploading your document you must use endpoint POST /document to start document processing.

5.1.Request Header

content-type: HTTP content type 

supported values: “application/json”

required: yes

customer-id: part of your credentials which you receive upon registration for the Classification API

required: yes

x-api-key: part of your credentials which you receive upon registration for the Classification API

required: yes

5.2.Response HTTP Status

200: Upload URL generated successfully.

401: Authorization failed. Operation not allowed.

403: Authorization failed due to invalid credentials.

5.3.Response Header

content-type: HTTP content type

supported values: “application/json”

5.4.Response Body

uploadUrl: URL for uploading your document file using HTTP REST. Each URL must only be used once.

uploadId: upload id that must be passed as parameter ‘uploadId’ in POST /document endpoint for processing your uploaded document file

5.5.Example

Request

GET /uploadurl
 
headers {"x-api-key": , "customer-id": }

Response

{
    "uploadUrl": ,
    "uploadId": "ac9e77e0-13cd-4d81-a4f1-9ba88b52899d"
}
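The large-file flow described in this section can be sketched as three steps: request an upload URL, upload the file to it, then call POST /document with the returned upload id. The sketch below keeps the HTTP calls behind three hypothetical caller-supplied hooks, because the text above does not state which HTTP method the upload URL expects. Two further assumptions are labeled in the code: "4 MB" is read as 4 * 1024 * 1024 bytes, and the follow-up POST /document carries only `uploadId` without the file.

```python
# Assumption: the documented "4 MB" limit is interpreted as 4 MiB.
MAX_DIRECT_UPLOAD_BYTES = 4 * 1024 * 1024


def submit_document(pdf_bytes, get_upload_url, upload_file, post_document):
    """Submit a document and return its job id.

    Hypothetical transport hooks supplied by the caller:
      get_upload_url() -> dict with "uploadUrl" and "uploadId"
      upload_file(url, data) -> None
      post_document(form_fields, file_bytes_or_None) -> dict with "jobId"
    """
    if len(pdf_bytes) <= MAX_DIRECT_UPLOAD_BYTES:
        # Small file: send it directly in the POST /document form.
        return post_document({}, pdf_bytes)["jobId"]

    # Large file: fetch a fresh URL and id for every file, since each
    # upload URL and upload id may be used only once.
    ticket = get_upload_url()
    upload_file(ticket["uploadUrl"], pdf_bytes)
    # Assumption: the file is not repeated here, only referenced by id.
    return post_document({"uploadId": ticket["uploadId"]}, None)["jobId"]
```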

6.Supported Document Classes

The Classification API currently supports the following document classes and languages:

Document Class Id    Document Class       Document Language
INVOICE_EN           invoice document     English
INVOICE_DE           invoice document     German
CONTRACT_EN          contract document    English
CONTRACT_DE          contract document    German
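On the client side, the table above can be kept as a lookup from the `doc-class` value to its document class and language. A minimal sketch, with the names `DocClass`, `DOC_CLASSES`, and `describe_doc_class` chosen for illustration only:

```python
from typing import NamedTuple


class DocClass(NamedTuple):
    kind: str       # document class, e.g. "invoice"
    language: str   # document language, e.g. "German"


# Mirrors the table of supported document classes.
DOC_CLASSES = {
    "INVOICE_EN": DocClass("invoice", "English"),
    "INVOICE_DE": DocClass("invoice", "German"),
    "CONTRACT_EN": DocClass("contract", "English"),
    "CONTRACT_DE": DocClass("contract", "German"),
}


def describe_doc_class(class_id):
    """Return the DocClass for a doc-class id, or raise ValueError."""
    try:
        return DOC_CLASSES[class_id]
    except KeyError:
        raise ValueError(
            f"unsupported document class id: {class_id!r}") from None
```

Rejecting unknown ids explicitly means a newly introduced class surfaces as a clear error instead of being silently mishandled.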
