De-Skewing Documents with Hydra

Published September 30, 2020

Hydra is a documents-to-database automation service. Given one example document, Hydra can process any version of that document, whether it is an upside-down scan or a skewed photograph. Hydra extracts text, images, and tables and returns them in a format ready for database insertion.

For example, here's a sideways scan of an invoice that was physically cut in half with scissors:

image of sideways invoice

Hydra has always been able to extract text from documents like this. Here's Hydra's result:

"RecognizedText": {
  "Address": "456 Viking Lane\nOslo, MN 23456",
  "Customer": "Nordic Airways, Inc.",
  "Date Issued": "2019/12/19",
  "Expiry Date": "2020/01/19",
  "Purchased Items": [
    {
      "Item": "In-flight entertainment system",
      "Price": "$500",
      "Quantity": "500",
      "Total": "$250,000"
    },
    {
      "Item": "Cockpit Displays",
      "Price": "$2,000",
      "Quantity": "20",
      "Total": "$40,000"
    }
  ],
  "Total Amount Owed": "$290,000"
}
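
To illustrate what "ready for database insertion" can look like in practice, here is a minimal sketch that flattens the line items from the result above into rows and loads them into an in-memory SQLite table. The table name, column layout, and helper functions (parse_money, to_rows, totals_are_consistent) are assumptions made for this example and are not part of Hydra; only the field names come from the result shown above.

```python
import sqlite3
from decimal import Decimal

# A subset of the "RecognizedText" object shown above, as a Python dict.
recognized = {
    "Customer": "Nordic Airways, Inc.",
    "Date Issued": "2019/12/19",
    "Purchased Items": [
        {"Item": "In-flight entertainment system", "Price": "$500",
         "Quantity": "500", "Total": "$250,000"},
        {"Item": "Cockpit Displays", "Price": "$2,000",
         "Quantity": "20", "Total": "$40,000"},
    ],
    "Total Amount Owed": "$290,000",
}


def parse_money(text):
    """Turn a string such as "$2,000" into Decimal("2000")."""
    return Decimal(text.replace("$", "").replace(",", ""))


def to_rows(result):
    """Flatten the purchased items into one tuple per invoice line."""
    rows = []
    for item in result["Purchased Items"]:
        rows.append((
            result["Customer"],
            item["Item"],
            int(item["Quantity"]),
            str(parse_money(item["Price"])),   # stored as text to stay exact
            str(parse_money(item["Total"])),
        ))
    return rows


def totals_are_consistent(result):
    """Check that the line totals add up to the invoice total."""
    line_sum = sum(parse_money(i["Total"]) for i in result["Purchased Items"])
    return line_sum == parse_money(result["Total Amount Owed"])


rows = to_rows(recognized)

# Hypothetical schema, chosen only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice_lines "
    "(customer TEXT, item TEXT, quantity INTEGER, price TEXT, total TEXT)"
)
conn.executemany("INSERT INTO invoice_lines VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()
```

Running a sanity check such as totals_are_consistent before inserting is one way to catch a misread digit early; whether that check belongs in your pipeline depends on how your documents are structured.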

What Hydra hasn't been able to do, until recently, is return the de-skewed image. Now users can set the flag "returnTransformedImages" to true when sending documents to Hydra, and Hydra will send back the de-skewed images. Here is the image above, de-skewed:

image of de-skewed sideways invoice
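
As a rough sketch of how the flag described above might be set, the snippet below builds a JSON request body with "returnTransformedImages" set to true. Only the flag name comes from this post. The endpoint URL, the "document" field, the base64 encoding, and the build_request helper are all assumptions made for illustration, so check Hydra's API documentation for the real request format. The request is constructed but deliberately not sent.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint; replace with the URL from Hydra's documentation.
HYDRA_URL = "https://example.invalid/hydra/process"


def build_request(image_bytes, return_transformed_images=True):
    """Build (but do not send) a POST request carrying one document."""
    payload = {
        # Hypothetical field name and encoding for the uploaded document.
        "document": base64.b64encode(image_bytes).decode("ascii"),
        # Flag named in this post: ask for the de-skewed images back.
        "returnTransformedImages": return_transformed_images,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        HYDRA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(b"fake image bytes")

# Decode the body again to confirm what would go over the wire.
sent = json.loads(request.data)
```

Passing the request to urllib.request.urlopen would send it. The shape of the response, including where the de-skewed images appear, is not described in this post, so the sketch stops before that step.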

If you want to learn more about our Documents-to-Database service, here's a friendly link. If you're already completely convinced, here's a link to sign up. On a related note, if you just want a cheap alternative to high-quality OCR services like Google Cloud Vision and Amazon Textract, you may be interested in our text recognition service. Good day!