Solving the "Offset Problem" in Zonal Text Recognition

Published February 21, 2020

Zonal text recognition is when users draw bounding boxes on a document, and text is then extracted from those same bounding boxes across a series of documents. Its foremost problem is small differences in the position, scale, and rotation of those documents.
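
To make the idea concrete, here is a minimal sketch of the zonal part on its own. It assumes a scan is just a grid of pixel values and that each zone is a named `(left, top, width, height)` box drawn once on a template. The names `crop` and `extract_zones` and the field names are hypothetical, the OCR step itself is left out, and this is not how Documents-to-Database is implemented.

```python
from typing import Dict, List, Tuple

# Hypothetical types: a box is (left, top, width, height) in template pixels,
# and a grayscale scan is a list of rows of pixel values.
Box = Tuple[int, int, int, int]
Image = List[List[int]]


def crop(image: Image, box: Box) -> Image:
    """Return the pixels inside `box`, clipped to the image bounds."""
    left, top, width, height = box
    rows = image[max(top, 0):max(top + height, 0)]
    return [row[max(left, 0):max(left + width, 0)] for row in rows]


def extract_zones(image: Image, zones: Dict[str, Box]) -> Dict[str, Image]:
    """Crop every named zone from one scan; each crop would then go to OCR."""
    return {name: crop(image, box) for name, box in zones.items()}


# Zones are drawn once on a template and reused, unchanged, for every scan.
zones = {"cat_name": (1, 0, 2, 1), "owner": (0, 2, 3, 1)}
scan = [[10 * r + c for c in range(4)] for r in range(3)]
print(extract_zones(scan, zones))  # {'cat_name': [[1, 2]], 'owner': [[20, 21, 22]]}
```

Because the boxes are fixed in template coordinates, this only works if every scan lines up with the template, which is exactly the assumption that breaks in practice.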

For example, here I am drawing the bounding boxes on the MIT Cat Registration form:

Of course, text recognition is satisfactory on this document, since I drew the bounding boxes in the right places by hand. But what if we try to use these bounding boxes on another scan? Say, this one:

The bounding boxes I drew don't line up perfectly, since the document was in a slightly different place when I scanned it. Here's the relevant part of the new document, overlaid on the original document:
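
One way to put a number on "don't line up perfectly" is intersection-over-union (IoU) between the box as drawn and where the field actually sits in the new scan. The sketch below assumes the whole page moved by a pure translation `(dx, dy)`; the box size and shift are made-up illustrative values, not measurements from the scans in this post.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, top, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes (1.0 = perfect overlap)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def shift(box: Box, dx: float, dy: float) -> Box:
    """Where a field lands if the whole page is translated by (dx, dy)."""
    x, y, w, h = box
    return (x + dx, y + dy, w, h)


# Illustrative numbers: a 300x40 px field, page shifted 12 px right and 25 px down.
drawn = (100, 200, 300, 40)
actual = shift(drawn, 12, 25)
print(round(iou(drawn, actual), 2))  # 0.22
```

A 25-pixel shift is small relative to a whole page, but on a 40-pixel-tall field it leaves only 15 pixels of vertical overlap, so most of the text falls outside the drawn box.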

Well, after a lot of research and hard work, we developed some secret sauce to solve this problem. Our Documents-to-Database service is now scale, rotation, and offset invariant! After magically aligning the new document with the old one, text recognition is pretty satisfactory:
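
The alignment method itself isn't described here, but being scale, rotation, and offset invariant means that once the relationship between the template and a new scan is known, each drawn box can be moved to the right place. A common model for that relationship is a similarity transform, p' = s * R(theta) * p + t (uniform scale s, rotation theta, translation t). The sketch below assumes that model and an already-known transform; `similarity` and `map_box` are hypothetical helper names, and how the transform is found is not shown.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, width, height)
Transform = Callable[[Point], Point]


def similarity(scale: float, angle_deg: float, tx: float, ty: float) -> Transform:
    """Map template points to scan points: p' = scale * R(angle) * p + (tx, ty).

    Uses the standard rotation matrix; with image y pointing down, a positive
    angle appears clockwise on screen."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)

    def apply(p: Point) -> Point:
        x, y = p
        return (scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty)

    return apply


def map_box(box: Box, transform: Transform) -> Box:
    """Move a template box into a new scan; returns the axis-aligned box
    that encloses the four transformed corners."""
    left, top, w, h = box
    corners = [(left, top), (left + w, top), (left, top + h), (left + w, top + h)]
    moved = [transform(p) for p in corners]
    xs = [x for x, _ in moved]
    ys = [y for _, y in moved]
    return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))


# A scan printed at half size and offset by (10, 20): the box shrinks and moves.
print(map_box((100, 200, 300, 40), similarity(0.5, 0.0, 10.0, 20.0)))
# (60.0, 120.0, 150.0, 20.0)
```

When the angle is non-zero, the enclosing axis-aligned box slightly over-crops; warping the whole scan back onto the template instead lets the original boxes be reused unchanged.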

Now, I know there are bound to be many detractors at this point in the blog post, thinking "Those documents weren't even that misaligned to begin with!" So here is a new, very misaligned scan, overlaid on the original document:

Not only is it slightly rotated and offset, it's even a completely different size! A closer look at the relevant part:

Not even close! But after a bit of spooky linear algebra, all is calm on the seas of MIT Cat Registration form collection:
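
The post doesn't say which linear algebra is involved, so here is one standard example rather than the service's actual method: fitting a similarity transform to matched point pairs (the same landmark found on the template and on the new scan) by least squares. Treating points as complex numbers z = x + iy turns the transform into w = a*z + b, where |a| is the scale and arg(a) the rotation, and the best fit has a closed form: a = sum(conj(z_i - z_mean) * (w_i - w_mean)) / sum(|z_i - z_mean|^2), b = w_mean - a*z_mean. The sketch assumes the point pairs are already given and correct; in practice they would come from feature matching plus an outlier-robust step such as RANSAC (OpenCV's `estimateAffinePartial2D` fits this same four-parameter model with RANSAC). `fit_similarity` is a hypothetical name.

```python
import cmath
import math
from typing import List, Tuple

Point = Tuple[float, float]


def fit_similarity(template_pts: List[Point], scan_pts: List[Point]) -> Tuple[float, float, float, float]:
    """Least-squares fit of scan ~= scale * R(angle) * template + (tx, ty).

    Points become complex numbers z = x + iy, so the transform is w = a*z + b
    with a = scale * e^(i*angle). Returns (scale, angle_deg, tx, ty)."""
    if len(template_pts) != len(scan_pts) or len(template_pts) < 2:
        raise ValueError("need at least two matched point pairs")
    z = [complex(x, y) for x, y in template_pts]
    w = [complex(x, y) for x, y in scan_pts]
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    u = [zi - z_mean for zi in z]  # centered template points
    v = [wi - w_mean for wi in w]  # centered scan points
    denom = sum(abs(ui) ** 2 for ui in u)
    if denom == 0:
        raise ValueError("template points must not all coincide")
    a = sum(ui.conjugate() * vi for ui, vi in zip(u, v)) / denom
    b = w_mean - a * z_mean
    return abs(a), math.degrees(cmath.phase(a)), b.real, b.imag


# Usage: build scan points from a known transform, then recover it.
scale, angle, tx, ty = 1.2, 3.0, 15.0, -8.0
c, s = math.cos(math.radians(angle)), math.sin(math.radians(angle))
template = [(0.0, 0.0), (100.0, 0.0), (0.0, 50.0), (100.0, 50.0)]
scan = [(scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty) for x, y in template]
print(fit_similarity(template, scan))  # approximately (1.2, 3.0, 15.0, -8.0)
```

With noisy real-world matches the fit is no longer exact, which is why a robust estimator is usually wrapped around this least-squares step.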

If you want to learn more about our Documents-to-Database service, here's a friendly link. If you're already completely convinced, here's a link to sign up. Tangentially related, if you just want a cheap alternative to high-quality OCR services like Google Cloud Vision and Amazon Textract, then you may be interested in our text recognition service. Good day!