Chapter 9
Understanding OCR
Caere’s OmniPage products represent the leading edge in page- and text-
recognition technology. OmniPage Professional and OmniPage can
recognize virtually any scanned page, separate text from graphics, and
convert almost any printed material to text files for your favorite word-
processor, spreadsheet, or database applications.
OmniPage Professional’s speed and accuracy are at the forefront of current
technology. The integration of 3D OCR, the Language Analyst, True Page
output formatting, training, and 24-bit color image editing makes
OmniPage Professional’s power and productivity unbeatable.
How OCR Works
OCR is optical character recognition: the process of transferring text from
printed pages into an editable computer file — without retyping.
A scanner does more than a copy machine does. Rather than simply
transferring an image into your computer, it translates a page into data by
dividing the scanned image into millions of dots or bits (usually from
40,000 to 90,000 per square inch). It then assigns a value to each dot,
depending upon whether it is inked, partially inked, or blank.
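The per-square-inch figures follow from the scanner's resolution: sampling 200 dots per inch in each direction gives 200 x 200 = 40,000 dots per square inch, and 300 dots per inch gives 90,000. The short Python sketch below works through that arithmetic and shows one way a dot might be given a value. The function names, the letter-size page, and the brightness thresholds are illustrative assumptions, not a description of how any particular scanner or OmniPage works.

```python
def dots_per_square_inch(dpi):
    """Dots in one square inch at a given linear resolution (dots per inch)."""
    return dpi * dpi


def dots_on_page(dpi, width_in=8.5, height_in=11.0):
    """Total dots for a whole page; the letter-size default is an assumption."""
    return round(width_in * dpi) * round(height_in * dpi)


def classify_dot(brightness):
    """Map a brightness reading (0 = black, 255 = white) to a label.

    The two thresholds are hypothetical, chosen only for illustration.
    """
    if brightness < 85:
        return "inked"
    if brightness < 170:
        return "partially inked"
    return "blank"


for dpi in (200, 300):
    print(dpi, dots_per_square_inch(dpi), dots_on_page(dpi))
```

Under these assumptions a letter-size page comes to several million dots at either resolution, which is where the "millions of dots" figure comes from.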
The composite document stored in your computer is the map of these dots,
or a bitmap. Your computer sees this data not as editable text, but as one
bitmapped image, editable only with image-editing tools.
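As a small illustration of what a bitmap holds, the Python sketch below stores a hand-made 5 x 5 grid of dots and prints it. A person looking at the printout sees the letter T, but the data itself is only rows of dot values; nothing in it says "T". The grid, the one-or-zero values, and the function names are assumptions made up for this example.

```python
# A tiny hand-made bitmap: 1 = inked dot, 0 = blank dot.
BITMAP = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]


def render(bitmap):
    """Draw the bitmap as text, using '#' for inked dots and '.' for blank ones."""
    return "\n".join(
        "".join("#" if dot else "." for dot in row) for row in bitmap
    )


def count_inked(bitmap):
    """Count the inked dots; about all a program can say without recognition."""
    return sum(sum(row) for row in bitmap)


print(render(BITMAP))
print("inked dots:", count_inked(BITMAP))
```

A program can count or repaint these dots, as an image editor does, but it cannot search for or retype the letter until a recognition step has identified it.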
OCR is the process of translating this image into editable text. Text
characters are represented by assigning a code to each letter, number, or
symbol, corresponding to the keys on the keyboard. There are a variety of
different code sets in use, but the most common code set is the ASCII
(American Standard Code for Information Interchange) table of character
equivalents. ASCII is generally recognized as the universal code for small
computers. Almost every program that makes use of text and/or numbers
understands ASCII.
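The idea of a character code can be seen directly in most programming languages. The Python sketch below converts a short piece of text to its ASCII codes and back; the helper names are made up for this example, and the sketch says nothing about how OmniPage stores its output internally.

```python
def to_ascii_codes(text):
    """Return the ASCII code for each character in the text."""
    return list(text.encode("ascii"))


def from_ascii_codes(codes):
    """Rebuild the text from a list of ASCII codes."""
    return bytes(codes).decode("ascii")


codes = to_ascii_codes("OCR")
print(codes)                    # [79, 67, 82]
print(from_ascii_codes(codes))  # OCR
```

Unlike a bitmap, these three numbers are all a word processor needs to display, search, or edit the word.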