Chapter 2

What Is Optical Character Recognition (OCR)?
Optical character recognition (OCR) is the process of turning an image into computer-editable text. An image is an electronic picture of text, such as a scanned paper document or an electronic fax file. Images do not contain editable text characters. Instead, they consist of many tiny dots (pixels) that together form a picture of text.
During OCR, OmniPage Web analyzes an image and identifies the characters it contains to produce editable text. After OCR, you can convert the resulting text to HTML format using OmniPage Web's outlining feature.
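This manual does not describe how OmniPage Web recognizes characters internally. To make the idea of "pixels in, editable text out" concrete, the following Python sketch uses naive template matching, a deliberately simple technique chosen only for illustration: each character image is compared pixel by pixel against a few stored patterns, and the best match wins. The names `TEMPLATES`, `recognize`, and `recognize_line`, and the 3x5 letter shapes, are hypothetical.

```python
from typing import Dict, List

# Hypothetical illustration, not OmniPage Web's algorithm.
# An image holds only pixels: '1' is a dark dot, '0' is background.
Bitmap = List[str]

# Hypothetical 3x5 pixel templates for a few capital letters.
TEMPLATES: Dict[str, Bitmap] = {
    "I": ["111", "010", "010", "010", "111"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}


def pixel_agreement(a: Bitmap, b: Bitmap) -> int:
    """Count pixel positions where two same-sized bitmaps agree."""
    return sum(
        pa == pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def recognize(glyph: Bitmap) -> str:
    """Return the character whose template best matches the glyph."""
    return max(TEMPLATES, key=lambda ch: pixel_agreement(glyph, TEMPLATES[ch]))


def recognize_line(glyphs: List[Bitmap]) -> str:
    """Convert already-segmented glyph images into an editable string."""
    return "".join(recognize(g) for g in glyphs)


# A "T" with one dark pixel missing, as might happen in a faint scan.
noisy_t = ["111", "010", "010", "000", "010"]
text = recognize_line([TEMPLATES["L"], TEMPLATES["I"], noisy_t])
print(text)  # prints "LIT"
```

The damaged "T" still agrees with the "T" template on 14 of its 15 pixels, more than with any other template, so it is read correctly. Real OCR engines must also locate and separate the characters on the page and cope with many fonts and sizes, which this sketch skips.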
What Is Outlining?
Outlining is the process of examining the structure of a document, detecting the elements of the original document (called objects in OmniPage Web), and creating hypertext links.
During outlining, OmniPage Web can recognize and outline the following objects in the original document:
• Headline (the title of the document)
• Headings (levels 1 to 6)
• Body text
• Captions
• Tables
• Graphics
• Headers and footers
• URLs and e-mail addresses
• Cross-references
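How OmniPage Web writes its HTML is not specified here. As a rough illustration of how outlined objects can map onto HTML, the following Python sketch (an assumption, not the product's implementation) emits one element per object: the headline becomes the page title, heading levels 1 to 6 become `<h1>` to `<h6>`, and URLs and e-mail addresses inside the text become hypertext links. The `OutlineObject` class, its `kind` values, and the link patterns are hypothetical, and tables, graphics, headers and footers, and cross-references are left out for brevity.

```python
import html
import re
from dataclasses import dataclass
from typing import List

# Hypothetical sketch, not OmniPage Web's implementation.


@dataclass
class OutlineObject:
    kind: str       # hypothetical kinds: "headline", "heading", "body", "caption"
    text: str
    level: int = 0  # heading level (1-6); ignored for other kinds


# Deliberately simplified patterns for URLs and e-mail addresses.
# The URL pattern refuses to end on trailing punctuation such as "." or ",".
LINK_RE = re.compile(
    r"(https?://[^\s<]*[^\s<.,;:)])|([\w.+-]+@[\w-]+(?:\.[\w-]+)+)"
)


def link_text(text: str) -> str:
    """Escape text for HTML and turn URLs and e-mail addresses into links."""
    out, pos = [], 0
    for m in LINK_RE.finditer(text):
        out.append(html.escape(text[pos:m.start()]))
        target = m.group(0)
        href = target if m.group(1) else "mailto:" + target
        out.append(f'<a href="{html.escape(href)}">{html.escape(target)}</a>')
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


def to_html(objects: List[OutlineObject]) -> str:
    """Emit one HTML element per outline object, in document order."""
    lines = []
    for obj in objects:
        if obj.kind == "headline":
            lines.append(f"<title>{html.escape(obj.text)}</title>")
        elif obj.kind == "heading":
            level = min(max(obj.level, 1), 6)  # HTML defines only h1-h6
            lines.append(f"<h{level}>{link_text(obj.text)}</h{level}>")
        elif obj.kind == "caption":
            lines.append(f'<p class="caption">{link_text(obj.text)}</p>')
        else:  # treat any other object as body text
            lines.append(f"<p>{link_text(obj.text)}</p>")
    return "\n".join(lines)


doc = [
    OutlineObject("headline", "Product Notes"),
    OutlineObject("heading", "Contact", level=2),
    OutlineObject("body", "Write to info@example.com or visit https://example.com."),
]
print(to_html(doc))
```

Running it prints a `<title>`, an `<h2>`, and a paragraph in which `info@example.com` links to `mailto:info@example.com` and the URL links to itself. Heading levels outside 1 to 6 are clamped because HTML has no `<h7>` or `<h0>`.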
Once outlining is complete, the document outline appears in outline view, where you can do the following:
• Filter which objects appear in outline view
• Change the hierarchy of the objects