Project Oxford: Computer Vision API pros & cons


Microsoft's project, code-named Oxford, has a number of interesting APIs. Let's dig into one of them: OCR.

The Computer Vision API provides OCR conversion from images to text. It can be tested from here. It is really fast, but it requires some understanding of the subject. For example, I have a coloured image with some text:

[image: snip_20150923092422]

Here is the result:

I don -t need 
l. Google 
my wife knows

Hmm, the last line is lost. But if you apply a charcoal transformation to remove some of the noise:

[image: snip_20150923092751]

Here is the result:

I don-t need 
Go, gle 
my wife knows 
everything

Much better: the last line is picked up, although "Google" is still misread. Microsoft uses the same OCR engine in OneNote, and it has the same problem there.
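
The charcoal step above can be scripted. Here is a minimal sketch, assuming ImageMagick is the tool doing the transformation (the post does not say which tool was used) and that its classic `convert` command is on the PATH. The function names and the radius of 2 are my own illustrative choices, not values taken from the experiment above.

```python
import shutil
import subprocess


def charcoal_command(src, dst, radius=2):
    """Build an ImageMagick command line that applies the charcoal effect."""
    # "convert" is the classic ImageMagick entry point (assumption: newer
    # installs may expose it as "magick" instead).
    # radius=2 is a guess; tune it per image.
    return ["convert", src, "-charcoal", str(radius), dst]


def preprocess(src, dst, radius=2):
    """Run the command if ImageMagick is installed; return True if it ran."""
    cmd = charcoal_command(src, dst, radius)
    if shutil.which(cmd[0]) is None:
        return False  # ImageMagick not found on PATH, nothing was done
    subprocess.run(cmd, check=True)
    return True
```

Calling `preprocess("photo.png", "photo_charcoal.png")` writes the filtered copy, which is then the file to send to the OCR service. Whether charcoal helps probably depends on the image, so it is worth comparing results with and without it.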

Pros:

  • Simple
  • Fast
  • Supports many languages and auto-detection (?)
  • REST based
  • Really interesting package of APIs and features
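
To give an idea of what "REST based" means in practice, here is a rough sketch of preparing an OCR request with only the Python standard library. Treat the details as assumptions to check against the current documentation: the endpoint URL is passed in by the caller, the `Ocp-Apim-Subscription-Key` header and the `language=unk` value for auto-detection are what I believe the service expects, and `build_ocr_request` is a hypothetical helper name.

```python
import urllib.parse
import urllib.request


def build_ocr_request(endpoint, subscription_key, image_bytes, language="unk"):
    """Prepare (but do not send) a POST request carrying raw image bytes."""
    # Assumption: "unk" asks the service to detect the language itself.
    query = urllib.parse.urlencode({"language": language})
    req = urllib.request.Request(
        f"{endpoint}?{query}",
        data=image_bytes,
        method="POST",
    )
    # Assumption: the subscription key travels in this header.
    req.add_header("Ocp-Apim-Subscription-Key", subscription_key)
    req.add_header("Content-Type", "application/octet-stream")
    return req
```

Sending it is then a matter of `urllib.request.urlopen(req)` and reading the JSON body of the response. The request is built separately from the call so that it can be inspected without a subscription.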

Cons:

  • Requires an Azure subscription, which means it is not free
  • Requires file preparation before upload
  • Does not support vector images (SVG)
  • Does not understand image orientation

[image: snip_20150923094654]

  • Does not support photo EXIF attributes. This is a key point, if you know what I mean
  • Recognition is not as good or as customizable as what you can get with free tools like Tesseract-OCR, ImageMagick, autotrace and Inkscape.

					

About Andrew Butenko
