OCR and PDF/A - What it means.
So what exactly is PDF/A, and why does it matter to me? The Portable Document Format (PDF) has long been a simple, pervasive format for sharing documents, especially in the scanning, document capture and ECM industry. But for long-term storage and archiving, many organizations chose TIFF because of concerns over the viability of PDF for long-term digital preservation of electronic documents. Enter the PDF/A standard, whose goal is to eliminate any feature that would inhibit long-term archiving. PDF/A is a standardized (ISO 19005) subset of the PDF format that prohibits features unsuitable for archiving, such as font linking (as opposed to font embedding) and encryption, and that standardizes viewing requirements, font embedding, color management and the handling of embedded comments and annotations. Below are some of the compatibility elements:
- Executable code (such as JavaScript) is forbidden
- Color must be specified in a device-independent way
- All fonts must be embedded (and must be legally embeddable)
- No encryption
- No audio or video
- Metadata must be standards-based (XMP)
- Digital signatures are allowed based on standards
- Embedded files are allowed only in later parts of the standard (PDF/A-2 allows embedded PDF/A files; PDF/A-3 allows files of any type)
- External content references are not allowed
- Compression is restricted (for example, LZW is not allowed)
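A few of the rules above can be spot-checked with a short script. The Python sketch below is only a rough heuristic, not a validator: it scans the raw bytes of a file for the `pdfaid:part` and `pdfaid:conformance` entries that a PDF/A file declares in its XMP metadata, and for a handful of markers that usually indicate forbidden features. The function name `inspect_pdf_bytes` and the marker list are illustrative assumptions, and a plain byte scan can miss anything stored in compressed streams.

```python
import re

# Heuristic sketch only: a byte scan, not a real PDF/A validator.
# XMP may store these values as elements (<pdfaid:part>1</pdfaid:part>)
# or as attributes (pdfaid:part="1"), so both forms are matched.
PDFA_PART = re.compile(rb"pdfaid:part(?:>|\s*=\s*[\x22\x27])(\d)")
PDFA_CONF = re.compile(rb"pdfaid:conformance(?:>|\s*=\s*[\x22\x27])([A-Za-z]+)")

# Illustrative (assumed, incomplete) list of markers for forbidden features.
FORBIDDEN_MARKERS = {
    b"/Encrypt": "encryption",
    b"/JavaScript": "executable code (JavaScript)",
    b"/Movie": "video",
    b"/Sound": "audio",
}


def inspect_pdf_bytes(data: bytes) -> dict:
    """Report the PDF/A level a file claims and any suspicious markers."""
    part = PDFA_PART.search(data)
    conf = PDFA_CONF.search(data)

    claimed = None
    if part:
        claimed = "PDF/A-" + part.group(1).decode("ascii")
        if conf:
            claimed += conf.group(1).decode("ascii").lower()

    # Markers hidden inside compressed object streams will not be found.
    warnings = [label for marker, label in FORBIDDEN_MARKERS.items()
                if marker in data]
    return {"claimed": claimed, "warnings": warnings}
```

To try it on a scanned document, read the file in binary mode and pass the bytes to `inspect_pdf_bytes`. Keep in mind that a file can claim PDF/A conformance without actually meeting it, so a dedicated PDF/A validator is still needed before relying on an archive.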
So why does all this matter? If you are archiving files for the long run, this standard is designed to ensure that you will still be able to open, view and read your archived content years from now. Most document scanning and capture solutions support this output type, and using it can prevent long-term issues in your Records Center.
Here are some great references: