Entity Extraction API Getting Started
After you have successfully registered for the Entity Extraction API, read the following information to test the extraction of your documents.
The endpoint POST /document is used to upload your document for entity extraction. Document data can be uploaded either as a file or as OCR data, e.g., from a previous OCR run. The response includes a job ID, which is used to poll for results via the REST endpoint /entities/.
The Document Trainer is our demonstration and training tool for the Entity Extraction API. It is the easiest way to test your own documents with the API and to visualize the extraction results. Testing your documents with the Document Trainer is free of charge.
In addition to the registration mail from the Entity Extraction API, you receive a welcome mail from the Document Trainer. Since use of the Document Trainer is optional, you need to activate your Document Trainer account explicitly by following the activation link in that mail.
After following the activation link, you are automatically signed in to your Document Trainer account and are prompted to enter your password and to accept the usage terms and the data processing agreement of the Document Trainer.
The next screen shows your user data. To test the extraction of your own document, select "Train documents" from the menu.
The next screen shows the inbox, which should initially be empty. Click on the blue area in the middle or the red button in the bottom right corner to upload your document.
During the processing of your document, you’ll notice a progress bar which disappears after the extraction is finished. To see the detailed extraction results for your document, click on the inbox entry for your document.
Feel free to upload further documents to evaluate the extraction results across several of your documents.
You can use cURL to send requests to the Entity Extraction API via command line interface.
curl -X POST -H "customer-id: <CUSTOMER-ID>" -H "x-api-key: <API-KEY>" -F 'document=@/path/to/invoice.pdf' https://uaz3xro0r4.execute-api.eu-central-1.amazonaws.com/PROD/document
curl -X GET -H "customer-id: <CUSTOMER-ID>" -H "x-api-key: <API-KEY>" https://uaz3xro0r4.execute-api.eu-central-1.amazonaws.com/PROD/entities/<JOB-ID>
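The polling step can also be scripted. The following is a minimal Python sketch using only the standard library. The base URL and header names are taken from the cURL commands above; the function names, the retry count, and the delay are hypothetical. It also assumes that a finished job is signalled by HTTP status 200 and that any other status means "try again later", which you should verify against the API documentation.

```python
import time
import urllib.error
import urllib.request

# Base URL copied from the cURL commands above.
BASE_URL = "https://uaz3xro0r4.execute-api.eu-central-1.amazonaws.com/PROD"


def build_result_request(job_id, customer_id, api_key, base_url=BASE_URL):
    """Build the GET /entities/<job-id> request with both credential headers."""
    return urllib.request.Request(
        f"{base_url}/entities/{job_id}",
        headers={"customer-id": customer_id, "x-api-key": api_key},
        method="GET",
    )


def fetch_once(request):
    """Send the request once and return (status, body) without raising on 4xx/5xx."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        return error.code, error.read()


def poll_result(fetch, attempts=10, delay=2.0, sleep=time.sleep):
    """Call fetch() until it reports status 200, then return the raw body.

    Assumption: status 200 means the result is ready; anything else is retried.
    """
    for attempt in range(attempts):
        status, body = fetch()
        if status == 200:
            return body
        if attempt < attempts - 1:
            sleep(delay)
    raise TimeoutError(f"no result after {attempts} attempts")
```

A call could look like `poll_result(lambda: fetch_once(build_result_request(job_id, customer_id, api_key)))`, where the three arguments are your own values.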
You can use Postman to send requests to the Entity Extraction API via the Postman App:
First, you need to create a POST request and enter the URL of the Entity Extraction API endpoint for document upload. You can find the base URL for the Entity Extraction API in the API documentation. The endpoint for document upload has the suffix /document.
In the request header, you need to provide your custom user credentials, which you received after registration for the Entity Extraction API.
In the request body, you need to specify the request content-type as "form-data" and provide your document as input type "File".
After sending the request, you will find the JOB-ID of your request in the response.
In order to collect the extraction result for your document, you need to create a new GET request and enter the URL of the Entity Extraction API endpoint for receiving results (/entities/) and append your JOB-ID to the query URL.
In the request header, you need to provide the same user credentials as for the upload document request.
Which requests to the Entity Extraction API are charged and which are not?
The following requests are charged:
- After the first 50 free requests, each request to endpoint POST /document/ that is processed successfully, i.e., for which the request to GET /entities/ with the corresponding job ID returns HTTP status 2xx
The following requests are not charged:
- The first 50 requests to endpoint POST /document/ that are processed successfully (HTTP response 2xx from GET /entities/)
- All requests to endpoint POST /document/ that are not processed successfully (HTTP response 4xx or 5xx from POST /document/ or GET /entities/)
- All requests to endpoint GET /entities/
- All documents uploaded with the Document Trainer
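The charging rules above boil down to simple arithmetic, sketched below in Python. The function name is hypothetical, and the sketch only counts requests. It makes no assumption about prices or about whether the free quota ever resets.

```python
FREE_REQUESTS = 50  # free quota stated in the list above


def charged_requests(successful_uploads, free_quota=FREE_REQUESTS):
    """Return how many successfully processed uploads are charged.

    Only uploads whose result could be fetched with HTTP 2xx count as
    successful; failed uploads, GET /entities/ calls, and Document Trainer
    uploads must not be included in successful_uploads.
    """
    if successful_uploads < 0:
        raise ValueError("successful_uploads must not be negative")
    return max(0, successful_uploads - free_quota)
```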
Why do I receive HTTP status "400 Bad request" for POST /document/ ?
While there are various reasons for receiving a "400 Bad request" response from POST /document/, the most common causes are related to the content-type of the request. The request must provide content-type "multipart/form-data" and a suitable boundary. Hence, the request header "Content-Type" must include a definition of the boundary string, e.g., Content-Type: multipart/form-data; boundary=<boundary-string>
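If your HTTP client does not build multipart requests for you, the boundary has to be generated and used consistently in both the header and the body. The following Python sketch shows one way to do this with the standard library only. The form field name "document" and the header names come from the cURL example; the function names and the default MIME type "application/pdf" are assumptions for illustration.

```python
import urllib.request
import uuid


def encode_multipart(field_name, filename, payload, content_type="application/pdf"):
    """Return (Content-Type header value, body bytes) for a single file part.

    Simplification: filename must not contain double quotes or line breaks.
    """
    boundary = uuid.uuid4().hex  # random string that will not occur in the payload
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    # The same boundary appears in the header and around the part in the body.
    return f"multipart/form-data; boundary={boundary}", head + payload + tail


def build_upload_request(url, filename, payload, customer_id, api_key):
    """Build the POST request for the upload endpoint without sending it."""
    header_value, body = encode_multipart("document", filename, payload)
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": header_value,
            "customer-id": customer_id,
            "x-api-key": api_key,
        },
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` sends it. A request whose header names a boundary that does not match the body is a typical cause of the 400 response described above.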
Why do I receive exception message "SslHandshake failed" for every request?
The Entity Extraction API requires the client to enable Server Name Indication (SNI) for its requests. If SNI is disabled, the connection attempt is rejected due to a failed TLS/SSL handshake. Most common HTTP libraries support SNI.
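To illustrate what enabling SNI means at the socket level, here is a small Python sketch using the standard `ssl` module. High-level HTTP libraries normally do this for you; the function names are hypothetical. The key point is the `server_hostname` argument, which is what causes the host name to be sent in the TLS handshake.

```python
import socket
import ssl


def sni_available():
    """Report whether the local Python/OpenSSL build supports SNI."""
    return ssl.HAS_SNI


def make_context():
    """Create a client context with certificate and host name checks enabled."""
    return ssl.create_default_context()


def open_tls_connection(host, port=443, timeout=10.0):
    """Open a TLS connection that announces `host` via SNI."""
    context = make_context()
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        # Passing server_hostname adds the SNI extension to the handshake.
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

If `sni_available()` returns False, or if a client wraps the socket without a server host name, a handshake failure like the one described above is the expected outcome.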
Why do I receive HTTP status "422 Unprocessable entity" for certain documents?
- Encrypted files cannot be processed by the Entity Extraction API
- For optimal extraction results we recommend a minimum of 300 DPI
- Supported compression formats for each file type are listed in the API documentation
If the cause of HTTP status 422 remains unclear, we suggest opening a ticket in the Buildsimple Community for further investigation.
Why do I receive HTTP status "429 Too Many Requests"?
The Entity Extraction API restricts the number of requests for each customer to 100 requests per second. This limit refers to the sum of requests to all endpoints of the Entity Extraction API. If this limit is exceeded by any request, HTTP status 429 is returned and the corresponding request should be retried by the customer.
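A retry on HTTP status 429 can be sketched as follows in Python. The exponential backoff, the attempt limit, and the function name are assumptions chosen for illustration; the API itself only states that the request should be retried. The `send` argument is any callable of your own that performs one request and returns a `(status, body)` pair.

```python
import time


def send_with_retry(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a status other than 429.

    Waits base_delay, then twice as long after each further 429. Returns the
    last (status, body) pair, which is still a 429 if all attempts failed.
    """
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body
```

Backing off between attempts keeps the retries themselves from adding to the request rate that triggered the limit.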
If you need a higher limit, please do not hesitate to contact us.