Entity Extraction API

The Entity Extraction API is an asynchronous REST API for extracting entities from invoice and contract documents. A document is uploaded through one REST interface and the results are polled through a second one.

1. Overview

 

 

Base URL for all requests:
https://uaz3xro0r4.execute-api.eu-central-1.amazonaws.com/PROD/

2. POST /document

Uploads a document for entity extraction. The document can be supplied either as a file or as OCR data (plain text plus hOCR) from a previous OCR run. The response includes a job id, which is used to poll for results via GET /entities/<JOB_ID>.

2.1 Request Header

content-type: HTTP content type

supported values: "multipart/form-data; boundary=<SOME_BOUNDARY_STRING>"

required: yes

customer-id: part of your credentials which you receive upon registration for the Entity Extraction API

required: yes

x-api-key: part of your credentials which you receive upon registration for the Entity Extraction API

required: yes

2.2 Request Parameter

The request body includes the following list of form parameters:

document : file containing your document; documents must be limited to a file size of 4 MB; entity extraction is limited to the first 10 pages of the document

supported file types: pdf (single and multi page), tiff (single page and multi page, supported compressions: none, adobe_deflate, ccitt group 3 or 4, lzw) and jpg

required: yes, unless parameters "text" and "hocr" are used

Parameter "document" and parameters "text"/"hocr" are mutually exclusive!

text : OCR text of your document, e.g., resulting from previous OCR

required: yes, unless parameter "document" is used; requires parameter "hocr"

hocr : hOCR data of your document, e.g., resulting from previous OCR

required: yes, unless parameter "document" is used; requires parameter "text"

Using parameters "text" and "hocr" skips the OCR step of the Entity Extraction API and therefore significantly improves request performance!

language : language used for character recognition (OCR)

supported values: [ "en" | "de" | "en+de" ]

required: no

default: "en+de"

documentClass : domain of your document; determines the entity types extracted by the Entity Extraction API

supported values: [ "invoice" | "contract" ]

required: no

default: determined automatically

useEmbeddedText : use the text embedded in the document to skip the OCR step and thereby improve request performance; only applicable to pdf files uploaded via parameter "document"

supported values: [ "true" | "false" ]

required: no

default: "false"

getHocr : return the document’s content in hOCR format (in addition to plain text)

supported values: [ "true" | "false" ]

required: no

default: "false"

callbackUrl : callback URL to which the Entity Extraction API sends an HTTP POST request once document processing has finished; the callback request includes the job id, which can be used to call GET /entities/<JOB_ID> for the extraction results

required: no

 

Example Callback

POST <CALLBACK_URL> 

headers {"content-type": "application/json"}
body {"jobId": "229cdae0162805414755d5ee7eed216bc975738c"}
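
A receiver for this callback only needs to read the jobId from the JSON body. The following is a minimal sketch and not part of the API: the function name job_id_from_callback and the lower-cased header dictionary are assumptions made for illustration, and the web framework that would call it is left out.

```python
import json


def job_id_from_callback(headers, body):
    """Return the jobId from a callback POST, or None if the payload is unusable.

    `headers` is assumed to be a dict with lower-cased header names;
    `body` is the raw request body (bytes or str).
    """
    if not headers.get("content-type", "").startswith("application/json"):
        return None
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        return None
    job_id = payload.get("jobId") if isinstance(payload, dict) else None
    # Only accept a non-empty string as a job id.
    return job_id if isinstance(job_id, str) and job_id else None
```

The returned job id can then be passed to GET /entities/<JOB_ID> instead of polling on a timer.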

2.3 Response HTTP Status

200: Document uploaded successfully.

400: Bad request. Missing or invalid input parameter.

401: Authorization failed. Operation not allowed.

403: Authorization failed due to invalid credentials.

415: Unsupported file format.

429: Too many requests. The overall number of requests to all REST endpoints of the Entity Extraction API must not exceed 100 requests/s for each customer.
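
Some 400 responses can presumably be avoided by checking the parameter rules from section 2.2 before uploading. The helper below is a hypothetical client-side sketch (validate_upload_fields is not part of the API). It only encodes the rules stated in this document, and the server may apply further checks.

```python
def validate_upload_fields(fields):
    """Return a list of problems found in a POST /document form-field dict."""
    problems = []
    has_document = "document" in fields
    has_text = "text" in fields
    has_hocr = "hocr" in fields

    if has_document and (has_text or has_hocr):
        problems.append("'document' excludes 'text' and 'hocr'")
    if has_text != has_hocr:
        problems.append("'text' and 'hocr' must be supplied together")
    if not (has_document or has_text or has_hocr):
        problems.append("either 'document' or 'text' plus 'hocr' is required")
    # Optional parameters: check only the documented value sets.
    if fields.get("language", "en+de") not in {"en", "de", "en+de"}:
        problems.append("unsupported 'language'")
    if "documentClass" in fields and fields["documentClass"] not in {"invoice", "contract"}:
        problems.append("unsupported 'documentClass'")
    if fields.get("useEmbeddedText") == "true" and not has_document:
        problems.append("'useEmbeddedText' only applies when 'document' is used")
    return problems
```

An empty list means the field combination is consistent with the documented rules.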

2.4 Response Header

content-type: HTTP content type

supported values: "application/json"

2.5 Response Body

jobId: Job id used for polling the resulting entities from REST interface /entities

type: String

uploadFile: Description of the uploaded file

type: Map<String, Object>

object properties:

  • name: "size"

    type: Integer

  • name: "mime"

    type: String

  • name: "name"

    type: String

2.6 Example

Request

POST /document

headers {"x-api-key": <YOUR_API_KEY>, "customer-id": <YOUR_CUSTOMER_ID>}
body {"document": <YOUR_DOCUMENT_FILE>}

 

Response

{
  "jobId": "229cdae0162805414755d5ee7eed216bc975738c",
  "uploadFile": {
    "size": 30393,
    "mime": "application/pdf",
    "name": "Demo.pdf"
  }
}
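
The request above is shown in shorthand. As a rough sketch of what the multipart/form-data encoding could look like with the Python standard library, the following builds the request without sending it. The function name build_upload_request and the mime_type default are assumptions for illustration, and no file-size or page-count check is performed.

```python
import uuid
import urllib.request

BASE_URL = "https://uaz3xro0r4.execute-api.eu-central-1.amazonaws.com/PROD/"


def build_upload_request(file_bytes, filename, api_key, customer_id,
                         fields=None, mime_type="application/pdf"):
    """Build (but do not send) a multipart/form-data POST /document request."""
    boundary = uuid.uuid4().hex
    chunks = []
    # Optional plain form parameters, e.g. {"language": "de"}.
    for name, value in (fields or {}).items():
        chunks.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n'
             f"\r\n{value}\r\n").encode("utf-8")
        )
    # The file part; mime_type is an assumption, adjust it for tiff or jpg.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="document"; filename="{filename}"\r\n'
        f"Content-Type: {mime_type}\r\n\r\n"
    )
    chunks.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        BASE_URL + "document",
        data=b"".join(chunks),
        method="POST",
        headers={
            "content-type": f"multipart/form-data; boundary={boundary}",
            "customer-id": customer_id,
            "x-api-key": api_key,
        },
    )
```

Passing the returned object to urllib.request.urlopen and decoding the JSON response would then yield the jobId.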

 

3. GET /entities/<JOB_ID>

Polls for the results of processing an uploaded document, using the job id returned in the POST /document response.

 

The response includes the resulting entities, which are organised into ungrouped and grouped entities depending on the entity type:

  • Ungrouped entities (field 'entities') consist of an entity name and a list of 0..n entity values, sorted by decreasing probability, i.e., the first value is the most likely result.

    Each entity value includes the following attributes:

    • originalValue: OCR value that was read from the document

    • value: normalized value for specific entity types, e.g., for currency, '€' is replaced by 'EUR'

    • confidence: float value in the range 0..1 which denotes the probability that an entity value is valid; the Entity Extraction API may propose multiple values for each entity type, and these can include both valid and invalid values

    • verified: boolean flag which denotes that an entity value is valid with respect to a dedicated set of high-level validation rules that the Entity Extraction API applies to each entity type, e.g., invoice amounts must be parsable as floating point values

  • Grouped entities (field 'groups') consist of a group name and an unsorted list of 0..n group entities.

    Currently supported group types:

    • taxRates: includes entities 'invoice_taxRateGroup_taxRate', 'invoice_taxRateGroup_taxAmount' and 'invoice_taxRateGroup_netAmount'

    • items: includes entities 'item_group_quantity', 'item_group_singleNetAmount' and 'item_group_totalNetAmount'

    Each group entity includes the following attributes:

    • members: a tuple of entity values (see above) that are related to each other; each group type is assigned a static set of entity types; in a particular group entity, each entity type is included at most once and may be missing due to suboptimal extraction results

    • verified: boolean flag which denotes that a group entity is consistent with respect to a dedicated set of high-level validation rules that the Entity Extraction API applies to each group type, e.g., 'taxRate' * 'netAmount' = 'taxAmount'
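
The group rule 'taxRate' * 'netAmount' = 'taxAmount' can be re-checked on the client. The sketch below is only an illustration of that one rule and does not reproduce the API's actual validation. It assumes the normalized tax rate is a percentage, uses a hypothetical rounding tolerance of 0.01, and the function name tax_group_is_consistent is invented here.

```python
from decimal import Decimal, InvalidOperation


def tax_group_is_consistent(members, tolerance=Decimal("0.01")):
    """Check taxRate * netAmount = taxAmount for one 'taxRates' group entity.

    Returns None when a member is missing or its value cannot be parsed,
    because the rule cannot be evaluated in that case.
    """
    try:
        rate = Decimal(members["invoice_taxRateGroup_taxRate"]["value"])
        net = Decimal(members["invoice_taxRateGroup_netAmount"]["value"])
        tax = Decimal(members["invoice_taxRateGroup_taxAmount"]["value"])
    except (KeyError, TypeError, InvalidOperation):
        return None
    # Assumption: the rate is a percentage, e.g. "19.0" means 19 %.
    expected = net * rate / Decimal(100)
    return abs(expected - tax) <= tolerance
```

Decimal is used instead of float so that amounts such as "14.06" are compared without binary rounding noise.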

3.1 Request Header

content-type: HTTP content type

supported values: "application/json"

required: yes

customer-id: part of your credentials which you receive upon registration for the Entity Extraction API

required: yes

x-api-key: part of your credentials which you receive upon registration for the Entity Extraction API

required: yes

3.2 Path Variable

JOB_ID : the job id is received from the response of the call to REST interface /document

3.3 Response HTTP Status

200: Entity extraction successful.

202: Job processing not finished yet.

204: No entities found.

400: Bad request. Missing or invalid input parameter.

401: Authorization failed. Operation not allowed.

403: Authorization failed due to invalid credentials.

422: File cannot be processed.

429: Too many requests. The overall number of requests to all REST endpoints of the Entity Extraction API must not exceed 100 requests/s for each customer.
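
These status codes suggest a simple polling loop: keep asking while the job is still processing (202), stop on 200 or 204, and treat other codes as errors. The following is a sketch under assumptions: poll_entities, its interval and attempt limits, and the caller-supplied fetch function are all hypothetical, and retrying on 429 is a design choice rather than documented behaviour.

```python
import time


def poll_entities(fetch, job_id, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Poll GET /entities/<JOB_ID> until the job leaves the 202 state.

    `fetch(job_id)` must perform the HTTP request and return a tuple
    (status_code, parsed_json_or_None).
    """
    for _ in range(max_attempts):
        status, body = fetch(job_id)
        if status == 200:
            return body
        if status == 204:
            return {}  # finished, but no entities were found
        if status in (202, 429):
            sleep(interval)  # still processing or rate limited: wait and retry
            continue
        raise RuntimeError(f"GET /entities/{job_id} failed with HTTP {status}")
    raise TimeoutError(f"job {job_id} not finished after {max_attempts} attempts")
```

Keeping the HTTP call behind fetch makes the loop easy to test and lets the caller decide how requests are sent.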

3.4 Response Header

content-type: HTTP content type

supported values: "application/json"

3.5 Response Body

documentClass: domain of the input document

supported values: [ "INVOICE_DE" | "INVOICE_EN" | "CONTRACT_DE" | "CONTRACT_EN" ]

entities: entities extracted from the input document (see the description of ungrouped entities in section 3)

groups: entity groups containing entity tuples from the input document (see the description of grouped entities in section 3)

supported groups: [ "items" | "taxRates" ]

filename: name of the input document file

text: OCR text read from the input document

hocr: document's content in hOCR format; enabled via input parameter "getHocr"

errorMsg: error description; null on success

3.6 Example

Request

GET /entities/229cdae0162805414755d5ee7eed216bc975738c
headers {"x-api-key": <YOUR_API_KEY>, "customer-id": <YOUR_CUSTOMER_ID>}

 

Response

{
    "documentClass": "INVOICE_DE",
    "errorMsg": null,
    "entities": {
        "vendor_city": [
            {
                "value": "INGOLSTADT",
                "originalValue": "INGOLSTADT",
                "confidence": 0.98210305,
                "verified": null
            }
        ],
        "vendor_zip": [
            {
                "value": "85046",
                "originalValue": "85046",
                "confidence": 0.8447278,
                "verified": true
            }
        ],
        "vendor_vatNumber": [],
        "vendor_iban": [],
        "recipient_street": [
            {
                "value": "Rudolf-Harbig-Weg 26",
                "originalValue": "Rudolf-Harbig-Weg 26",
                "confidence": 0.9971908,
                "verified": null
            }
        ],
        "invoice_invoiceNumber": [
            {
                "value": "458001350",
                "originalValue": "458001350",
                "confidence": null,
                "verified": null
            }
        ],
        "invoice_orderNumber": [],
        "recipient_accountNumber": [],
        "invoice_taxRateGroup_taxAmount": [
            {
                "value": "14.06",
                "originalValue": "14,06",
                "confidence": null,
                "verified": true
            }
        ],
        "vendor_taxIdNumber": [],
        "recipient_city": [
            {
                "value": "Münster",
                "originalValue": "Münster",
                "confidence": 0.9588283,
                "verified": null
            }
        ],
        "recipient_zip": [
            {
                "value": "48149",
                "originalValue": "48149",
                "confidence": 0.9821135,
                "verified": true
            }
        ],
        "invoice_taxRateGroup_netAmount": [],
        "invoice_deliveryNumber": [],
        "recipient_company": [],
        "vendor_bic": [],
        "vendor_name": [
            {
                "value": "MEDIA MARKT E-BUSINESS GMBH",
                "originalValue": "MEDIA MARKT E-BUSINESS GMBH",
                "confidence": null,
                "verified": null
            }
        ],
        "invoice_dueDate": [],
        "invoice_invoiceCurrency": [
            {
                "value": "EUR",
                "originalValue": "€",
                "confidence": null,
                "verified": true
            }
        ],
        "vendor_bankName": [],
        "invoice_deliveryDate": [],
        "invoice_invoiceDate": [
            {
                "value": "10.08.2017",
                "originalValue": "10.08.2017",
                "confidence": 0.9015204,
                "verified": true
            },
            {
                "value": "31.08.2017",
                "originalValue": "31.08.2017",
                "confidence": 0.8834753,
                "verified": true
            }
        ],
        "invoice_taxRateGroup_taxRate": [
            {
                "value": "190",
                "originalValue": "190%",
                "confidence": null,
                "verified": null
            }
        ],
        "vendor_street": [
            {
                "value": "WANKELSTRASSE 5",
                "originalValue": "WANKELSTRASSE 5",
                "confidence": null,
                "verified": null
            }
        ],
        "invoice_invoiceGrossAmount": [
            {
                "value": "88.05",
                "originalValue": "88,05",
                "confidence": 0.46333426,
                "verified": true
            },
            {
                "value": "73.99",
                "originalValue": "73,99",
                "confidence": 0.45646423,
                "verified": false
            },
            {
                "value": "4.99",
                "originalValue": "4,99",
                "confidence": null,
                "verified": false
            }
        ]
    },
    "groups": {
        "taxRates": [
            {
                "members": {
                    "invoice_taxRateGroup_taxAmount": {
                        "value": "14.06",
                        "originalValue": "14,06",
                        "confidence": null,
                        "verified": true
                    },
                    "invoice_taxRateGroup_taxRate": {
                        "value": "190",
                        "originalValue": "190%",
                        "confidence": null,
                        "verified": null
                    }
                },
                "verified": null
            }
        ],
        "items": []
    },
    "text": "MEDIA MARKT E-BUSINESS GMBH “(\n\nWANKELSTRASSE 5 -\n\n85046 INGOLSTADT\n\nTel.: 0841/6344545\n\nE-Mail: ONLINESHOP@MEDIAMARKT.DE\n\nRechnungsadresse Rechnung Nr. 458001350\n\nDaniel Winter Rechnungsdatum 10.08.2017\n\nRudolf-Harbig-Weg 26\n\n48149 Münster Kunden-Nr. 3050789\n\nFällig Am 31.08.2017\n\nRechnung Betrag €88,05\n\nMenge Beschreibung Einzelpreis Gesamtpreis\n\n1 PIXMA MX475 A4 MFP INJEKT (P) 69,00 69,00\n\n1 Versandkosten 4,99 4,99\nSumme Netto 73,99\nMwSt. 190% 14,06\n\n",
    "filename": "Demo.pdf",
    "hocr": null
}
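
Because entity values are listed by decreasing probability, a client that wants one value per entity can simply take the first list element. The helper below is a minimal sketch of that idea. The name best_values and the require_verified option are assumptions for illustration, not part of the API.

```python
def best_values(response, require_verified=False):
    """Map each entity name to its most likely normalized value.

    Values are listed by decreasing probability, so the first entry wins.
    With require_verified=True, entries whose 'verified' flag is not true
    are dropped before the first entry is taken.
    """
    result = {}
    for name, candidates in response.get("entities", {}).items():
        if require_verified:
            candidates = [c for c in candidates if c.get("verified") is True]
        if candidates:
            result[name] = candidates[0]["value"]
    return result
```

Entities with an empty list, such as "vendor_iban" in the response above, are left out of the result.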

 

4. POST /jobs/query

Queries the state of multiple jobs in a single request.

4.1 Request Header

content-type: HTTP content type

supported values: "application/json"

required: yes

customer-id: part of your credentials which you receive upon registration for the Entity Extraction API

required: yes

x-api-key: part of your credentials which you receive upon registration for the Entity Extraction API

required: yes

4.2 Request Parameter

The request body is a single JSON object with the following fields:

jobIds : list of job ids received from calling POST /document

type: list

required: yes

4.3 Response HTTP Status

200: Query finished successfully.

400: Bad request. Missing or invalid input parameter.

401: Authorization failed. Operation not allowed.

403: Authorization failed due to invalid credentials.

429: Too many requests. The overall number of requests to all REST endpoints of the Entity Extraction API must not exceed 100 requests/s for each customer.

4.4 Response Header

content-type: HTTP content type

supported values: "application/json"

4.5 Response Body

jobs: map containing job ids from request as keys and job states as values

type: Map

supported job states: [ "PROCESSING" | "FINISHED" | "UNKNOWN" ]

4.6 Example

Request

POST /jobs/query
headers {"x-api-key": <YOUR_API_KEY>, "customer-id": <YOUR_CUSTOMER_ID>}
body {
  "jobIds": [
    "229cdae0162805414755d5ee7eed216bc975738c",
    "ab4dcb09e196acd6d859f571b97d94d8dc7fae57",
    "1848c3ed29f03ade5cf29c31d7b6dc0665c4d836"
  ]
}

 

Response

{
  "jobs": {
    "229cdae0162805414755d5ee7eed216bc975738c": "FINISHED",
    "ab4dcb09e196acd6d859f571b97d94d8dc7fae57": "PROCESSING",
    "1848c3ed29f03ade5cf29c31d7b6dc0665c4d836": "UNKNOWN"
  }
}
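
A client that tracks many uploads might build the request body and sort the answer by state as sketched below. Both function names are hypothetical, and the code only handles the JSON shapes shown in this section.

```python
import json


def build_jobs_query(job_ids):
    """Serialize the JSON body for POST /jobs/query."""
    return json.dumps({"jobIds": list(job_ids)}).encode("utf-8")


def jobs_by_state(response):
    """Group job ids from a /jobs/query response by their state."""
    grouped = {"PROCESSING": [], "FINISHED": [], "UNKNOWN": []}
    for job_id, state in response.get("jobs", {}).items():
        # setdefault keeps the sketch tolerant of states not listed here.
        grouped.setdefault(state, []).append(job_id)
    return grouped
```

The ids under "FINISHED" are the ones worth fetching with GET /entities/<JOB_ID>.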

 

5. POST /training

Used to upload training samples for training of the extraction models.

5.1 Request Header

content-type: HTTP content type

supported values: [ "application/json" | "multipart/form-data" ]

required: yes

customer-id: part of your credentials which you receive upon registration for the Entity Extraction API

required: yes

x-api-key: part of your credentials which you receive upon registration for the Entity Extraction API

required: yes

5.2 Request Parameter

The request body is a single JSON object with the following fields:

documentClass : document class id of the training document

required: yes

supported values: [ "INVOICE" | "CONTRACT" ]

language : language id of the training document

required: yes

supported values: [ "en" | "de" ]

text : plain text from the training document

required: yes

document : document file

supported file types: [ pdf | tiff | jpg ]

required: no

Using parameter "document" requires content-type "multipart/form-data". If parameter "document" is omitted, the content-type must be "application/json".

entities : entities from the training document

required: yes

Must only include the supported entities listed in 6. Supported Entity Types. Entities may be omitted or may have empty values (i.e., an empty array). Entity values may include all attributes described in 3. GET /entities/<JOB_ID>:

  • originalValue: value used for training
  • value: value used for training if attribute "originalValue" is empty or omitted; ignored otherwise
  • verified: must be "true" or omitted, otherwise this entity value is ignored for training
  • confidence: attribute is ignored for training

groups : group entities from the training document

required: no

supported groups: [ "items" | "taxRates" ]

Must only include the supported entities listed in 6. Supported Entity Types. Entities may be omitted or may have empty values (i.e., an empty array). Entity values are organised as in field "entities". All entities can instead be put into field "entities" without affecting the training result, so there is no need to use field "groups"!
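
The attribute rules for training values can be read as a small decision procedure. The sketch below is one interpretation, not the service's implementation. The function names are invented, and an explicit null in "verified" is treated as "not true" (and therefore ignored), which is an assumption the text does not settle.

```python
def training_value(entity_value):
    """Return the string that would be used for training, or None if ignored."""
    # A present 'verified' flag must be true; an omitted flag is accepted.
    if "verified" in entity_value and entity_value["verified"] is not True:
        return None
    # originalValue takes precedence; value is only the fallback.
    return entity_value.get("originalValue") or entity_value.get("value") or None


def to_training_entities(entities):
    """Reduce an 'entities' map to the values that count for training."""
    result = {}
    for name, values in entities.items():
        kept = [{"originalValue": v} for v in map(training_value, values) if v]
        if kept:
            result[name] = kept
    return result
```

Running to_training_entities over a manually corrected extraction result gives a compact "entities" field for the POST /training body.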

5.3 Response HTTP Status

200: Training data submitted successfully.

204: Training data empty.

400: Bad request. Missing or invalid input parameter.

401: Authorization failed. Operation not allowed.

403: Authorization failed due to invalid credentials.

429: Too many requests. The overall number of requests to all REST endpoints of the Entity Extraction API must not exceed 100 requests/s for each customer.

5.4 Response Header

content-type: HTTP content type

supported values: "application/json"

5.5 Response Body

errorMsg: error description; null on success

5.6 Example

Request

POST /training
headers {"x-api-key": <YOUR_API_KEY>, "customer-id": <YOUR_CUSTOMER_ID>}
body {
  "documentClass": "INVOICE",
  "language": "de",
  "text": "MEDIA MARKT E-BUSINESS GMBH “(\n\nWANKELSTRASSE 5 -\n\n85046 INGOLSTADT\n\nTel.: 0841/6344545\n\nE-Mail: ONLINESHOP@MEDIAMARKT.DE\n\nRechnungsadresse Rechnung Nr. 458001350\n\nDaniel Winter Rechnungsdatum 10.08.2017\n\nRudolf-Harbig-Weg 26\n\n48149 Münster Kunden-Nr. 3050789\n\nFällig Am 31.08.2017\n\nRechnung Betrag €88,05\n\nMenge Beschreibung Einzelpreis Gesamtpreis\n\n1 PIXMA MX475 A4 MFP INJEKT (P) 69,00 69,00\n\n1 Versandkosten 4,99 4,99\nSumme Netto 73,99\nMwSt. 190% 14,06\n\n",
  "entities": {
    "vendor_city": [
      {
        "value": "INGOLSTADT"
      }
    ],
    "vendor_zip": [
      {
        "value": "85046"
      }
    ],
    "recipient_street": [
      {
        "value": "Rudolf-Harbig-Weg 26"
      }
    ],
    "invoice_invoiceNumber": [
      {
        "value": "458001350"
      }
    ],
    "invoice_taxRateGroup_taxAmount": [
      {
        "value": "14,06"
      }
    ],
    "recipient_city": [
      {
        "value": "Münster"
      }
    ],
    "recipient_zip": [
      {
        "value": "48149"
      }
    ],
    "vendor_name": [
      {
        "value": "MEDIA MARKT E-BUSINESS GMBH"
      }
    ],
    "invoice_invoiceCurrency": [
      {
        "value": "€"
      }
    ],
    "invoice_invoiceDate": [
      {
        "value": "10.08.2017"
      }
    ],
    "invoice_taxRateGroup_taxRate": [
      {
        "value": "19,0"
      }
    ],
    "vendor_street": [
      {
        "value": "WANKELSTRASSE 5"
      }
    ],
    "invoice_invoiceGrossAmount": [
      {
        "value": "88,05"
      }
    ]
  }
}

 

6. Supported Entity Types

6.1 Invoice Entities

For invoice documents, the Entity Extraction API provides a default set of entities that are extracted. The Buildsimple team may add additional entities to the default entity set for invoice documents in future releases.

6.1.1 Invoice

The following entities are located in field 'entities' of the response from GET /entities/<JOB_ID>:

Entity name                 Description
invoice_invoiceDate         invoice date
invoice_invoiceNumber       invoice number
invoice_orderNumber         order number
invoice_deliveryDate        delivery date
invoice_invoiceCurrency     invoice currency
invoice_invoiceGrossAmount  invoice gross amount
invoice_dueDate             due date
invoice_deliveryNumber      delivery number

6.1.2 Vendor

The following entities are located in field 'entities' of the response from GET /entities/<JOB_ID>:

Entity name         Description
vendor_name         vendor name
vendor_street       vendor street name and house number
vendor_zip          vendor zip code
vendor_city         vendor city
vendor_bankName     name of the vendor's bank
vendor_iban         vendor IBAN
vendor_bic          vendor BIC
vendor_taxIdNumber  vendor tax id
vendor_vatNumber    vendor VAT number

6.1.3 Recipient

The following entities are located in field 'entities' of the response from GET /entities/<JOB_ID>:

Entity name        Description
recipient_company  name of the recipient's company
recipient_street   recipient street and house number
recipient_zip      recipient zip code
recipient_city     recipient city

6.1.4 Invoice Items

The following entities are located in field 'members' within list groups['items'] of the response from GET /entities/<JOB_ID>. Groups may be incomplete, i.e., contain only 1-5 entities.

Entity name                 Description
item_group_quantity         quantity of an invoice item (grouped by invoice item)
item_group_singleNetAmount  single net amount (grouped by invoice item)
item_group_totalNetAmount   total net amount (grouped by invoice item)
item_group_description      description of invoice item (grouped by invoice item)
item_group_materialNumber   material number (grouped by invoice item)
item_group_taxRate          tax rate applied to invoice item (grouped by invoice item)

6.1.5 Tax Rates

The following entities are located in field 'members' within list groups['taxRates'] of the response from GET /entities/<JOB_ID>. Groups may be incomplete, i.e., contain only 1-2 entities.

Entity name                     Description
invoice_taxRateGroup_taxRate    tax rate
invoice_taxRateGroup_netAmount  total net amount (grouped by tax rate)
invoice_taxRateGroup_taxAmount  total tax amount (grouped by tax rate)

6.2 Contract Entities

For contract documents, the Entity Extraction API provides a default set of entities that are extracted. The Buildsimple team may add additional entities to the default entity set for contract documents in future releases.

6.2.1 Contractor

The following entities are located in field 'entities' of the response from GET /entities/<JOB_ID>:

Entity name         Description
contractor_name     contractor name
contractor_street   contractor street and house number
contractor_zip      contractor zip code
contractor_city     contractor city
contractor_contact  contact person on contractor side

6.2.2 Contractee

The following entities are located in field 'entities' of the response from GET /entities/<JOB_ID>:

Entity name         Description
contractor_name     contractor name
contractor_street   contractor street and house number
contractor_zip      contractor zip code
contractor_city     contractor city
contractor_contact  contact person on contractor side

6.2.3 Contract

The following entities are located in field 'entities' of the response from GET /entities/<JOB_ID>:

Entity name          Description
contract_number      contract number
contract_date        contract date
contract_begin_date  contract begin date
contract_end_date    contract end date
contract_period      duration of contract
contract_object      object of contract
contract_volume      volume of contract
contract_currency    currency of contract volume

7. Swagger API

The following listing provides a Swagger description of the Entity Extraction API in YAML:

 

swagger: '2.0'
info:
  description: Extract entities from your documents.
  version: '1.0.0'
  title: ISR Entity Extraction API
host: 'jhguolkp91.execute-api.eu-central-1.amazonaws.com'
basePath: /QA
schemes:
  - https
paths:
  /document:
    post:
      summary: Upload document for entity extraction
      operationId: uploadUsingPOST
      consumes:
        - multipart/form-data
      produces:
        - application/json
      parameters:
        - name: customer-id
          in: header
          description: customer-id
          required: true
          type: string
        - name: x-api-key
          in: header
          description: x-api-key
          required: true
          type: string
        - name: document
          in: formData
          description: document
          required: true
          type: file
        - name: language
          in: formData
          description: language
          required: false
          type: string
          default: en+de
        - name: documentClass
          in: formData
          description: documentClass
          required: false
          type: string
        - name: getHocr
          in: formData
          description: getHocr
          required: false
          type: boolean
          default: false
      responses:
        '200':
          description: OK
          schema:
            $ref: '#/definitions/UploadResponse'
        '400':
          description: Bad request. Missing or invalid input parameter.
        '401':
          description: Authorization failed. Operation not allowed.
        '403':
          description: Authorization failed due to invalid credentials.
        '415':
          description: Unsupported file format.
        '429':
          description: Usage limit exceeded.
        '500':
          description: Internal server error during processing.
        '503':
          description: Required service unavailable.
  /entities/{job-id}:
    get:
      summary: Get entity results
      operationId: getResultUsingGET
      consumes:
        - application/json
      produces:
        - application/json
      parameters:
        - name: customer-id
          in: header
          description: customer-id
          required: true
          type: string
        - name: x-api-key
          in: header
          description: x-api-key
          required: true
          type: string
        - name: job-id
          in: path
          description: job-id
          required: true
          type: string
      responses:
        '200':
          description: OK
          schema:
            $ref: '#/definitions/ExtractResponse'
        '202':
          description: Job processing not finished yet
        '204':
          description: No entities found
        '400':
          description: Bad request. Missing or invalid input parameter.
        '401':
          description: Authorization failed. Operation not allowed.
        '403':
          description: Authorization failed due to invalid credentials.
        '415':
          description: Unsupported file format.
        '429':
          description: Usage limit exceeded.
        '500':
          description: Internal server error during processing.
        '503':
          description: Required service unavailable.
definitions:
  UploadResponse:
    type: object
    properties:
      uploadFile:
        type: object
        additionalProperties:
          type: object
          properties:
            size:
              type: integer
            mime:
              type: string
            name:
              type: string
      jobId:
        type: string
  ExtractResponse:
    type: object
    properties:
      filename:
        type: string
      documentClass:
        type: string
      entities:
        type: object
        additionalProperties:
          type: array
          items:
            $ref: '#/definitions/ExtractEntity'
      groups:
        type: object
        additionalProperties:
          type: array
          items:
            $ref: '#/definitions/GroupEntity'
      hocr:
        type: string
      text:
        type: string
      errorMsg:
        type: string
  GroupEntity:
    type: object
    properties:
      members:
        type: object
        additionalProperties:
          $ref: '#/definitions/ExtractEntity'
      verified:
        type: boolean
  ExtractEntity:
    type: object
    properties:
      confidence:
        type: number
        format: float
      originalValue:
        type: string
      value:
        type: string
      verified:
        type: boolean
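
The ExtractEntity and GroupEntity definitions above map naturally onto typed records on the client side. The following is a rough sketch, assuming the decoded JSON contains only the four documented attributes (an unknown attribute would raise a TypeError). The function parse_group_entity is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ExtractEntity:
    confidence: Optional[float] = None
    originalValue: Optional[str] = None
    value: Optional[str] = None
    verified: Optional[bool] = None


@dataclass
class GroupEntity:
    members: Dict[str, ExtractEntity] = field(default_factory=dict)
    verified: Optional[bool] = None


def parse_group_entity(raw):
    """Convert one decoded JSON group entity into typed records."""
    members = {
        name: ExtractEntity(**attrs)
        for name, attrs in raw.get("members", {}).items()
    }
    return GroupEntity(members=members, verified=raw.get("verified"))
```

All attributes are optional here because the example response in section 3.6 contains null for both "confidence" and "verified".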