When you use Optical Character Recognition (OCR) to turn documents into searchable PDFs, you want to choose the appropriate recognition engine. First, though, it helps to explain the different types of PDF:
- Image - this is just a picture of the page, with no text layer.
- Text or Normal - this is normally what is created when you use Adobe Acrobat Distiller. The page content is real text rather than a picture.
- Image with Hidden Text - this is the standard in PDF OCR. It provides a "pristine" image, with all the OCR text in an invisible layer behind it.
The image with hidden text PDF is a great OCR output format because it lets you search your PDFs with hit highlighting while still showing the original scan. So if you are using a document capture application, or plan on scanning to SharePoint and using the Adobe iFilter for searching, image with hidden text is the best format for OCR / PDF.
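To make the three types a little more concrete, here is a rough Python sketch that guesses which type a single page is by looking at its content stream. It assumes you already have the page's content stream as decoded, uncompressed text (real PDFs usually compress it, so a PDF library would have to do that part). The function name `classify_page` and the sample strings are made up for illustration. The operators it looks for come from the PDF specification: `Do` paints an XObject (assumed here to be the scanned image), `Tj` and `TJ` show text, and `3 Tr` sets the invisible text rendering mode that OCR tools typically use for the hidden layer.

```python
def classify_page(content: str) -> str:
    """Guess the PDF type of one page from its decoded content stream.

    Hypothetical helper: a crude whitespace tokenizer, so operator-like
    words inside text strings could fool it. Treat it as a sketch only.
    """
    tokens = content.split()

    # "Do" paints an XObject; we assume that XObject is the page image.
    paints_image = "Do" in tokens

    # "Tj" and "TJ" are the common text-showing operators.
    shows_text = any(t in ("Tj", "TJ") for t in tokens)

    # "3 Tr" selects text rendering mode 3 (invisible text).
    invisible_text = any(
        a == "3" and b == "Tr" for a, b in zip(tokens, tokens[1:])
    )

    if paints_image and shows_text and invisible_text:
        return "image with hidden text"
    if shows_text:
        return "text"
    if paints_image:
        return "image"
    return "unknown"


# Made-up content streams, one for each type described above.
scan_only = "q 612 0 0 792 0 0 cm /Im0 Do Q"
normal = "BT /F1 12 Tf 72 720 Td (Hello) Tj ET"
ocr_output = scan_only + " BT 3 Tr /F1 12 Tf 72 720 Td (Hello) Tj ET"

for name, stream in [("scan", scan_only), ("normal", normal), ("ocr", ocr_output)]:
    print(name, "->", classify_page(stream))
```

A page that mixes visible text with a picture (a normal document with a figure, say) comes back as "text" here, which matches the idea that the hidden-text type is specifically a scan with an invisible OCR layer.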