Monday, February 8, 2010

OCR to PDF

When you use Optical Character Recognition (OCR) to turn documents into searchable PDFs, you want to choose the appropriate recognition engine. First, though, it helps to go over the different types of PDF:

- Image - this is just a picture of the page, with no text layer, so it cannot be searched.

- Text or Normal - the page content is stored as real text; this is normally what you get when you use Adobe Acrobat Distiller.

- Image with Hidden Text - this is the standard in PDF OCR; it keeps a "pristine" image of the page, with all the OCR text in an invisible layer behind it.

The image with hidden text PDF is a great OCR output format because it lets you search your PDFs with hit highlighting. So if you are using a document capture application, or plan on scanning to SharePoint and using the Adobe iFilter for searching, image with hidden text is the best format for OCR / PDF.
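To make the three PDF types a bit more concrete, here is a rough Python sketch that guesses which type a single page is by looking at its decoded content stream. It relies on the PDF text rendering mode 3 ("3 Tr"), which draws text invisibly and is how hidden OCR text is commonly stored. The function name `classify_page` and the sample strings are hypothetical, made up for illustration, and the checks are simple heuristics, not how any particular OCR product works.

```python
import re

# Heuristic patterns over a decoded (uncompressed) page content stream.
# "/Name Do" paints an XObject, which on a scanned page is usually the image.
PAINTS_XOBJECT = re.compile(r"/\S+\s+Do\b")
# Tj and TJ are the common text-showing operators.
SHOWS_TEXT = re.compile(r"\b(Tj|TJ)\b")
# "3 Tr" selects text rendering mode 3: text is neither filled nor stroked.
INVISIBLE_MODE = re.compile(r"(?<![\d.])3\s+Tr\b")


def classify_page(content: str) -> str:
    """Guess the PDF type of one page from its decoded content stream.

    Hypothetical helper: a heuristic sketch, not a full PDF parser.
    """
    has_image = bool(PAINTS_XOBJECT.search(content))
    has_text = bool(SHOWS_TEXT.search(content))
    has_hidden = has_text and bool(INVISIBLE_MODE.search(content))

    if has_image and has_hidden:
        return "image with hidden text"
    if has_text:
        return "text"
    if has_image:
        return "image"
    return "empty"


if __name__ == "__main__":
    # Made-up content streams, one per PDF type described above.
    scanned = "q 612 0 0 792 0 0 cm /Im0 Do Q"
    distilled = "BT /F1 12 Tf 72 720 Td (Hello) Tj ET"
    ocr = scanned + " BT 3 Tr /F1 12 Tf 72 720 Td (Hello) Tj ET"
    for name, stream in [("scanned", scanned), ("distilled", distilled), ("ocr", ocr)]:
        print(name, "->", classify_page(stream))
```

In a real file the content streams are usually compressed, so you would need a PDF library to decode them before running a check like this.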
