Introduction
What is Optical Character Recognition?
Optical character recognition (OCR) is the process of extracting text
from images. Images can result from scanning paper documents or
opening image files. Images do not have editable text characters; they
have many tiny dots (pixels) that together form character shapes.
These present a picture of the text on a page.
During OCR, OmniPage Pro analyzes the character shapes in an
image and determines which character each shape represents, producing
editable text. In other words, the OCR program ‘reads’ the page.
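As a rough illustration of what analyzing character shapes can mean, the sketch below treats each character as a small grid of pixels and picks the known shape that differs from it in the fewest pixels. This is a toy example only, not a description of how OmniPage Pro works internally; the bitmaps and the names TEMPLATES, pixel_distance, recognize and read_line are all invented for this illustration.

```python
# Toy illustration only: NOT how OmniPage Pro works internally.
# Each glyph is a 5-row by 3-column bitmap: '#' is a dark pixel, '.' a light one.
# TEMPLATES and every function name here are hypothetical.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def pixel_distance(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def recognize(glyph):
    """Return the template character whose bitmap differs least from glyph."""
    return min(TEMPLATES, key=lambda ch: pixel_distance(glyph, TEMPLATES[ch]))


def read_line(glyphs):
    """Recognize a sequence of glyph bitmaps and join the results into text."""
    return "".join(recognize(g) for g in glyphs)


# A slightly noisy "T": one stray dark pixel in the third row.
noisy_t = ["###", ".#.", ".##", ".#.", ".#."]
print(recognize(noisy_t))  # prints "T"
```

The stray pixel does not change the result, because the noisy shape is still closer to the "T" template than to any other. Real OCR engines use far more sophisticated analysis, but the principle of mapping pixel shapes to characters is the same.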
After OCR, you can export the recognized text to a variety of word-
processing, desktop publishing, and spreadsheet applications.
Beyond OCR
In addition to text, OmniPage Pro X can retain the following elements
in a document after OCR for display and export.
• Graphics
Photos, logos and drawings are examples of graphics. The program
cannot recognize handwriting, but signatures can be saved as graphics.
• Text formatting
Font types, sizes, and styles (such as bold or italic) are examples of
character formatting. Indents, tabs, margins and line spacing are
examples of paragraph formatting.
• Page formatting
Column structure, paragraph spacing, and placement of graphics are
examples of page formatting.
The elements that are retained depend on settings you select before
OCR and on the capabilities of the saving format you choose. See
chapter 4, Settings, for more information.