Overview

The Sight API offers advanced text recognition for PDFs and images. It can handle human handwriting, including cursive, with accuracy levels comparable to Google Cloud Vision and Amazon Textract.

You can try out the API via a graphical web interface by signing up for an account and going to the Sight dashboard.

Without writing any code

You can recognize text in documents without writing any code. A command-line program takes the files you want processed and writes the recognized text to a file you specify.

For developers

We provide official, single-function libraries for the Sight API:

Additionally, there are community-maintained, unofficial libraries:

All other users make HTTP requests containing base64-encoded files; the recognized text is returned as JSON that looks like this:

    {
      "RecognizedText": [
        {
          "Text": "Invoice",
          "Confidence": 0.22863210084975458,
          "TopLeftX": 395,
          "TopLeftY": 35,
          "TopRightX": 449,
          "TopRightY": 35,
          "BottomLeftX": 395,
          "BottomLeftY": 47,
          "BottomRightX": 449,
          "BottomRightY": 47
        },
        ...
      ]
    }

The Endpoint

Make a POST request to https://siftrics.com/api/sight/ with JSON that looks like this:

    {
      "files": [
        {
          "mimeType": "application/pdf",
          "base64File": fileContentsEncodedAsBase64String
        },
        ...
      ]
    }

Additionally, an API key must be provided by setting the "Authorization" header to

    "Basic API_KEY_HERE"

Other valid mimeTypes are: "image/bmp", "image/gif", "image/png", "image/jpeg", and "image/jpg".
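
As a rough illustration of the request described above, here is a minimal Python sketch that builds the POST request using only the standard library. The function name build_request and the "Content-Type" header are assumptions made for this example; the field names, the endpoint URL, and the "Authorization" format come from the description above.

```python
import base64
import json
import urllib.request

SIGHT_ENDPOINT = "https://siftrics.com/api/sight/"


def build_request(api_key, files):
    """Build (but do not send) a POST request for the Sight API.

    `files` is a list of (mime_type, raw_bytes) pairs, e.g.
    [("application/pdf", pdf_bytes)]. This helper is hypothetical,
    not part of any official library.
    """
    payload = {
        "files": [
            {
                "mimeType": mime_type,
                # The API expects the file contents as a base64 string.
                "base64File": base64.b64encode(data).decode("ascii"),
            }
            for mime_type, data in files
        ]
    }
    return urllib.request.Request(
        SIGHT_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Basic " + api_key,
            # Assumption: the endpoint accepts a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending the request (requires a real API key and network access):
#   with urllib.request.urlopen(build_request(key, files)) as resp:
#       result = json.load(resp)
```

Building the request separately from sending it makes the payload easy to inspect before any pages are billed.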

The Response

The format of the JSON response depends on whether one page or multiple pages were requested to be processed. The unit that matters is the page, not the file, because a single PDF document can contain multiple pages.

Response for One-Page Requests

If exactly one page was requested to be processed, then the returned JSON looks like this:

    {
      "RecognizedText": [
        {
          "Text": "Invoice",
          "Confidence": 0.22863210084975458,
          "TopLeftX": 395,
          "TopLeftY": 35,
          "TopRightX": 449,
          "TopRightY": 35,
          "BottomLeftX": 395,
          "BottomLeftY": 47,
          "BottomRightX": 449,
          "BottomRightY": 47
        },
        ...
      ]
    }
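
To show how a one-page response like the one above might be consumed, here is a small Python sketch that turns each entry into its text, its confidence, and an axis-aligned rectangle enclosing the four corners. The function name and the min_confidence threshold are assumptions made for this example, not part of the API.

```python
def summarize_recognized_text(response, min_confidence=0.0):
    """Return a list of (text, confidence, (left, top, right, bottom)).

    `response` is the decoded JSON of a one-page response.
    `min_confidence` is a hypothetical filter added for illustration.
    """
    corners = ("TopLeft", "TopRight", "BottomLeft", "BottomRight")
    results = []
    for item in response.get("RecognizedText", []):
        if item["Confidence"] < min_confidence:
            continue
        xs = [item[c + "X"] for c in corners]
        ys = [item[c + "Y"] for c in corners]
        # Smallest upright rectangle that contains all four corners.
        box = (min(xs), min(ys), max(xs), max(ys))
        results.append((item["Text"], item["Confidence"], box))
    return results
```

The four corners are reported separately, so a box need not be upright; collapsing it to an enclosing rectangle is a simplification that suits cropping or highlighting.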

Optionally, you can force the Sight API to spawn an asynchronous job to process the single page. To do this, set the top-level boolean field "DoAsync" to true in your POST request. If "DoAsync" is set to true, or if multiple pages are requested to be processed in one POST request, then the endpoint returns a URL to be polled periodically by the user. Details are described in the following section.

Response for Multiple-Page Requests

If multiple pages were requested to be processed, or if the top-level boolean field "DoAsync" is set to true, then a URL is returned which must be periodically polled with a GET request:

    { "PollingURL": "https://siftrics.com/api/sight/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" }

Periodically send a GET request to that URL, with the "Authorization" header set as in your original POST request. It is recommended to send a request once every 1,000 milliseconds. The returned JSON looks like this:

    {
      "Pages": [
        {
          "Error": "",
          "FileIndex": 0,
          "PageNumber": 1,
          "NumberOfPagesInFile": 3,
          "RecognizedText": [ ... ]
        },
        ...
      ]
    }

If "Error" is not an empty string, then there was an error processing that page.

"FileIndex", "PageNumber", and "NumberOfPagesInFile" can be used to keep track of which pages have been seen.

"FileIndex" is the index of this file in the original request's "files" array. This value is always valid.

"PageNumber" and "NumberOfPagesInFile" are always valid, unless there was an error processing the file, in which case these values are set to -1.

The results of any given page are only returned once; after a page has been seen in a response, it will not be seen again. It is up to the API user to keep track of which pages have been seen and stop polling after all have been seen. Exactly 60 seconds after all pages are available to a user, the polling URL is destroyed and becomes unavailable.
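
The bookkeeping described above can be sketched in Python as follows. This is one possible approach, not the official client logic. The function names, the max_polls limit, and the injectable fetch and sleep parameters are assumptions made for this example. It also assumes that a response with no new pages carries an empty or missing "Pages" array, and that a file-level error (page numbers set to -1) means no further pages will arrive for that file.

```python
import json
import time
import urllib.request


def fetch_pages(polling_url, api_key):
    """GET the polling URL once and return the list of newly available pages."""
    req = urllib.request.Request(
        polling_url, headers={"Authorization": "Basic " + api_key}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("Pages") or []


def poll_until_done(fetch, num_files, interval_seconds=1.0,
                    sleep=time.sleep, max_polls=600):
    """Call `fetch()` repeatedly until every page of every file has been seen.

    `fetch` is any zero-argument callable returning a list of page objects,
    e.g. lambda: fetch_pages(url, key). `num_files` is the length of the
    "files" array in the original POST request.
    """
    pages = []
    seen = {i: set() for i in range(num_files)}  # page numbers seen, per file
    expected = {}  # FileIndex -> total pages, known after the first page arrives
    for _ in range(max_polls):
        for page in fetch():
            pages.append(page)
            idx = page["FileIndex"]
            if page["NumberOfPagesInFile"] == -1:
                # Assumption: a file-level error ends that file's results.
                expected[idx] = 0
            else:
                expected[idx] = page["NumberOfPagesInFile"]
                seen[idx].add(page["PageNumber"])
        if all(i in expected and len(seen[i]) >= expected[i]
               for i in range(num_files)):
            return pages
        sleep(interval_seconds)
    raise TimeoutError("polling did not complete within max_polls attempts")
```

Because each page is returned only once, the sketch accumulates pages as they arrive rather than expecting a final response to contain everything.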

Word-level Bounding Boxes

At the top level of the JSON payload in your POST request, if you set the field "makeSentences" to false, then word-level bounding boxes are returned, instead of sentence-level bounding boxes:

    {
      "files": [ ... ],
      "makeSentences": false
    }

If this field is omitted or set to true, then sentence-level bounding boxes are returned.

Auto-Rotate (BETA)

The Sight API can rotate and return images so that the majority of recognized text is upright. This feature is part of the "Advanced" feature set, so it is billed at 4 pages per page — approximately $2.00 per 1,000 pages.

Rotated images are returned as base64-encoded JPEG images in the string field "Base64Image" of each "Page" object.

To enable this behavior, set the top-level boolean field "doAutoRotate" to true in your POST request:

    {
      "files": [ ... ],
      "doAutoRotate": true
    }
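
As an illustration of handling the returned images, the following Python sketch decodes the "Base64Image" field of each page object and writes it to a .jpg file. The function name, the output file naming scheme, and the assumption that the page objects come from the "Pages" array of a polling response are all choices made for this example.

```python
import base64
import os


def save_rotated_images(pages, out_dir):
    """Write each page's rotated JPEG to `out_dir` and return the file paths.

    `pages` is a list of page objects; pages without a "Base64Image"
    field (for example, when auto-rotate was not requested) are skipped.
    """
    paths = []
    for page in pages:
        encoded = page.get("Base64Image")
        if not encoded:
            continue
        # Hypothetical naming scheme based on the page's position.
        name = "file{}-page{}.jpg".format(page["FileIndex"], page["PageNumber"])
        path = os.path.join(out_dir, name)
        with open(path, "wb") as f:
            f.write(base64.b64decode(encoded))
        paths.append(path)
    return paths
```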

Why are my bounding boxes rotated 90 degrees? EXIF Orientation and How to Deal with It

Some images, particularly .jpeg images, use the EXIF data format. This data format contains a metadata field indicating the orientation of an image, i.e., whether the image should be rotated 90 degrees, rotated 180 degrees, flipped horizontally, etc., when viewing it in an image viewer.

This means that when you view such an image in Chrome, Firefox, Safari, or the stock Windows and Mac image viewer applications, it will appear upright, despite the fact that the underlying pixels of the image are encoded in a different orientation.

If you find your bounding boxes are rotated or flipped relative to your image, it is because the image decoder you are using to load images in your program obeys EXIF orientation, but the Sight API ignores it (or vice versa).

Many popular imaging libraries ignore EXIF orientation by default, but not all of them do. You should determine whether your image decoder obeys EXIF orientation and tell the Sight API to do the same thing. You can tell the Sight API to obey the EXIF orientation by setting the top-level field "doExifRotate" to true:

    {
      "files": [ ... ],
      "doExifRotate": true
    }

By default, the Sight API ignores EXIF orientation.

Note that the EXIF "orientation" field is not always correct. Sometimes, the field says an image should be rotated 90 degrees, even though it is already upright, due to faulty camera sensors. If you just want all of your images rotated so that the text appears upright, take a look at the Auto-Rotate feature, which automatically rotates images so they are upright and returns them alongside recognized text and bounding boxes.
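
The top-level request options described in the sections above can be assembled with a small helper such as the Python sketch below. The function and its parameter names are hypothetical. The JSON field names are spelled as they appear in this document. Whether all of these options can be combined in a single request is an assumption, not something stated here.

```python
def build_payload(files, word_level=False, auto_rotate=False,
                  exif_rotate=False, force_async=False):
    """Return the JSON-serializable body for a POST to the Sight API.

    `files` is a list of {"mimeType": ..., "base64File": ...} objects.
    Options left at their defaults are omitted from the payload.
    """
    payload = {"files": files}
    if word_level:
        # Word-level instead of sentence-level bounding boxes.
        payload["makeSentences"] = False
    if auto_rotate:
        payload["doAutoRotate"] = True
    if exif_rotate:
        payload["doExifRotate"] = True
    if force_async:
        # Forces a polling URL even for one-page requests.
        payload["DoAsync"] = True
    return payload
```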