Entity Extraction API Getting Started

After you have successfully registered for the Entity Extraction API, read the following information to test the extraction of your documents.

1.Overview

Please make sure to check our API Documentation for further information on URL endpoints, request parameters and result values!

2.Document Trainer

Used to upload your document for entity extraction. Document data can be uploaded either as a file or as OCR data, e.g., from a previous OCR run. The response includes a job ID, which is used to poll for results via the REST interface /entities/.

2.1.Trainer

The Document Trainer is our demonstration and training tool for the Entity Extraction API. It is the easiest way to test your own documents with the Entity Extraction API and to visualize the extraction results. Testing your documents with the Document Trainer is free of charge.

2.2.Welcome Mail

In addition to the registration mail from the Entity Extraction API, you will receive a welcome mail from the Document Trainer. Since using the Document Trainer is optional, you need to activate your Document Trainer account explicitly by following the activation link in that mail.

2.3.First Login

After following the activation link, you are automatically signed in to your Document Trainer account and are prompted to enter your password and to accept the usage terms and the data processing agreement of the Document Trainer.

2.4.Upload Document

The next screen shows your user data. To test the extraction of your own document, select 'Train documents' from the menu.

The next screen shows the inbox, which should initially be empty. Click on the blue area in the middle or the red button in the bottom right corner to upload your document.

During the processing of your document, you’ll notice a progress bar which disappears after the extraction is finished. To see the detailed extraction results for your document, click on the inbox entry for your document. 

Feel free to upload further documents to evaluate the extraction results across several of your documents.

3.cURL

You can use cURL to send requests to the Entity Extraction API from the command line.

3.1.Upload Document

curl -X POST \
  -H "customer-id: " \
  -H "x-api-key: " \
  -F 'document=@/path/to/invoice.pdf' \
  https://uaz3xro0r4.execute-api.eu-central-1.amazonaws.com/PROD/document
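
If you prefer to script the upload, the same request can be assembled with the Python standard library. The following is only a minimal sketch: the function name `build_upload_request` is hypothetical, the part content type `application/octet-stream` is an assumption, and the form field name `document` and the URL are taken from the cURL example above. The request is built but not sent.

```python
import urllib.request
import uuid

# Upload endpoint taken from the cURL example above.
UPLOAD_URL = "https://uaz3xro0r4.execute-api.eu-central-1.amazonaws.com/PROD/document"


def build_upload_request(filename, content, customer_id, api_key, url=UPLOAD_URL):
    """Build (but do not send) a multipart/form-data POST for one document.

    Hypothetical helper; `content` is the raw file content as bytes.
    """
    # Random boundary; assumed not to occur inside the file content.
    boundary = uuid.uuid4().hex
    disposition = (
        # "document" is the form field name used by the cURL example.
        f'Content-Disposition: form-data; name="document"; filename="{filename}"\r\n'
    )
    body = b"".join([
        f"--{boundary}\r\n".encode("ascii"),
        disposition.encode("utf-8"),
        # Assumption: a generic binary content type for the file part.
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode("ascii"),
    ])
    headers = {
        "customer-id": customer_id,
        "x-api-key": api_key,
        # The boundary must be announced in the Content-Type header.
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send the request. The job ID then has to be read from the response as described in the API documentation.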

3.2.Receive Result

curl -X GET \
  -H "customer-id: " \
  -H "x-api-key: " \
  https://uaz3xro0r4.execute-api.eu-central-1.amazonaws.com/PROD/entities/
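
Because the result is collected by polling with the job ID, a small polling loop can be convenient. The sketch below makes several assumptions: the names `build_result_request` and `poll` are hypothetical, and the rule for deciding when a result is ready is deliberately left to a caller-supplied `is_ready` function, since this guide does not specify how a pending job is reported.

```python
import time
import urllib.request

# Base URL taken from the cURL example above.
BASE_URL = "https://uaz3xro0r4.execute-api.eu-central-1.amazonaws.com/PROD"


def build_result_request(job_id, customer_id, api_key, base_url=BASE_URL):
    """Build (but do not send) the GET request for /entities/<job ID>."""
    return urllib.request.Request(
        url=f"{base_url}/entities/{job_id}",
        headers={"customer-id": customer_id, "x-api-key": api_key},
        method="GET",
    )


def poll(fetch, is_ready, attempts=10, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch() until is_ready(response) is true or the attempts run out.

    `fetch` and `is_ready` are supplied by the caller; the attempt count and
    the delay are illustrative defaults, not values recommended by the API.
    """
    for attempt in range(attempts):
        response = fetch()
        if is_ready(response):
            return response
        if attempt < attempts - 1:
            sleep(delay_seconds)  # wait before asking again
    raise TimeoutError(f"result not ready after {attempts} attempts")
```

Keeping `fetch` and `is_ready` as parameters means the loop can be tried out without network access and adapted once you know how the API reports an unfinished job.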

4.Postman

You can use Postman to send requests to the Entity Extraction API via the Postman App:

4.1.Upload Document

First, you need to create a POST request and enter the URL of the Entity Extraction API endpoint for document upload. You can find the base URL for the Entity Extraction API in the API documentation. The endpoint for document upload has the suffix /document.

In the request header, you need to provide your custom user credentials, which you received after registration for the Entity Extraction API.

In the request body, you need to specify the request content-type as 'form-data' and provide your document as input type 'File'.

After sending the request, you will find the JOB-ID of your request in the response.

4.2.Receive Result

In order to collect the extraction result for your document, you need to create a new GET request and enter the URL of the Entity Extraction API endpoint for receiving results (/entities/) and append your JOB-ID to the query URL.

In the request header, you need to provide the same user credentials as for the upload document request. 


FAQs

Which requests to the Entity Extraction API are charged and which are not?

The following requests are charged:

  • Once the 50 free requests have been used up, each request to endpoint POST /document/ that is processed successfully, i.e., the request to GET /entities/ with the corresponding job ID returns HTTP status 2xx

The following requests are not charged:

  • First 50 requests to endpoint POST /document/ which are processed successfully (HTTP response 2xx from GET /entities/)
  • All requests to endpoint POST /document/ which are not processed successfully (HTTP response 4xx or 5xx from POST /document/ or GET /entities/)
  • All requests to endpoint GET /entities/
  • All documents uploaded with the Document Trainer
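
To make the charging rules concrete, here is a small sketch that counts the chargeable uploads from a list of request outcomes. The function name `count_charged_uploads` and the tuple layout are hypothetical; uploads made with the Document Trainer are assumed to be left out of the list because they are never charged.

```python
FREE_REQUESTS = 50  # number of free successful uploads stated above


def is_success(status):
    """True for an HTTP 2xx status code."""
    return status is not None and 200 <= status < 300


def count_charged_uploads(uploads):
    """Count chargeable uploads.

    `uploads` is a list of (post_status, entities_status) pairs, one per
    POST /document/ request; entities_status is None if no result was fetched.
    """
    successful = sum(
        1
        for post_status, entities_status in uploads
        if is_success(post_status) and is_success(entities_status)
    )
    # Only successful uploads beyond the free quota are charged.
    return max(0, successful - FREE_REQUESTS)
```

For example, 60 successfully processed uploads plus 5 failed ones would yield 10 chargeable requests under these rules.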

Why do I receive HTTP status "400 Bad Request" for POST /document/?

While there are various reasons for receiving a 400 response from POST /document/, the most common causes are related to the content type of the request. The request must use the content type "multipart/form-data" and provide a suitable boundary. Hence, the request header "Content-Type" must include a definition of the boundary string, e.g.,

content-type: multipart/form-data; boundary=------------------------eae50bb16c861b9d
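
Before sending a request, you can check whether your Content-Type header actually carries a boundary. The sketch below uses the header parser from Python's standard `email` package; the helper name `multipart_boundary` is hypothetical, and the check only looks at the header, not at whether the request body really uses that boundary.

```python
from email.message import Message


def multipart_boundary(content_type_header):
    """Return the boundary of a multipart/form-data header, or None.

    None means the header is not multipart/form-data or lacks a boundary,
    which is the situation described above as a common cause of status 400.
    """
    message = Message()
    message["Content-Type"] = content_type_header
    if message.get_content_type() != "multipart/form-data":
        return None
    return message.get_boundary()  # None when no boundary parameter is set
```

Most HTTP libraries set this header for you when they build the multipart body, so a missing boundary often indicates that the header was set manually.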

Why do I receive exception message "SslHandshake failed" for every request?

The Entity Extraction API requires the client to enable Server Name Indication (SNI) for its requests. If SNI is disabled, the connection attempt will be rejected due to a failed TLS/SSL handshake. Most common HTTP libraries support SNI.
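
As an illustration of what enabling SNI looks like in client code, here is a sketch using Python's standard `ssl` module, where the server name is sent when `server_hostname` is passed to `wrap_socket`. The helper names `make_tls_context` and `open_tls_connection` are hypothetical, and other languages and libraries expose SNI differently.

```python
import socket
import ssl


def make_tls_context():
    """Create a default TLS context, failing early if SNI is unavailable."""
    if not ssl.HAS_SNI:
        raise RuntimeError("this Python build lacks SNI support")
    return ssl.create_default_context()


def open_tls_connection(host, port=443, timeout=10.0):
    """Open a TLS connection that sends `host` as the SNI server name."""
    context = make_tls_context()
    raw_socket = socket.create_connection((host, port), timeout=timeout)
    try:
        # Passing server_hostname is what puts the SNI extension
        # into the TLS handshake.
        return context.wrap_socket(raw_socket, server_hostname=host)
    except Exception:
        raw_socket.close()
        raise
```

Higher-level clients such as `urllib.request` perform this step internally, so a manual setup like this is usually needed only when working with raw sockets.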

Why do I receive HTTP status "422 Unprocessable Entity" for certain documents?

While the Entity Extraction API supports various file types for input documents (e.g., pdf, tiff), it is also important to consider further file properties:

  • Encrypted files cannot be processed by the Entity Extraction API
  • For optimal extraction results we recommend a minimum of 300 DPI
  • Supported compression formats for each file type are listed in the API documentation

If the cause of HTTP status 422 remains unclear, we suggest opening a ticket in the Buildsimple Community for further investigation.

Why do I receive HTTP status "429 Too Many Requests"?

The Entity Extraction API restricts the number of requests for each customer to 100 requests per second. This limit refers to the sum of requests to all endpoints of the Entity Extraction API. If this limit is exceeded by any request, HTTP status 429 is returned and the corresponding request should be retried by the customer.

If you need a higher limit, please do not hesitate to contact us.
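
A retry on HTTP status 429 can be sketched as follows. The exponential backoff between attempts is an assumption on our part (the API only asks that the request be retried), and the function name `send_with_retry` as well as the default values are hypothetical.

```python
import time


def send_with_retry(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send() and retry while it reports HTTP status 429.

    `send` is a caller-supplied function returning (status, body).
    The delay doubles after each rate-limited attempt (assumed strategy).
    """
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            break  # success or a different error: stop retrying
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return status, body
```

If the last attempt is still rate limited, the function returns the 429 status so that the caller can decide how to proceed.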

Where do I find further help and assistance?

We would love to help you get started with the Entity Extraction API. You can ask your questions in the Buildsimple Service Desk or schedule a personal call with our support team.
