
ADOBE ACROBAT 8 STANDARD
User Guide
2 Choose Document > OCR Text Recognition > Recognize Text Using OCR.
3 In the Recognize Text dialog box, select an option under Pages.
4 (Optional) Click Edit to open the Recognize Text - Settings dialog box, and select the options you want to use.
Recognize Text - Settings
Optical Character Recognition (OCR) software enables you to search, correct, and copy the text in a scanned PDF.
If you do not apply OCR when you create a PDF by scanning a paper document, you can apply OCR to the PDF later,
provided that the document was scanned at a resolution of 72 ppi or higher.
OCR can be run on image PDF files that contain headers, footers, or Bates numbers.
Primary OCR Language
Specifies the language for the OCR engine to use to identify the characters.
PDF Output Style
Determines the type of PDF to be produced. All options require an input resolution of 72 ppi or
higher (recommended). All formats apply OCR and font and page recognition to the text images and convert them
to normal text.
• Searchable Image
Ensures that text is searchable and selectable. This option keeps the original image, deskews it
as needed, and places an invisible text layer over it. The selection for Downsample Images in this same dialog box
determines whether or not the image will be downsampled and to what extent.
• Searchable Image (Exact)
Ensures that text is searchable and selectable. This option keeps the original image and
places an invisible text layer over it. Recommended for cases requiring maximum fidelity to the original image.
• Formatted Text & Graphics
Reconstructs the original page using recognized text, fonts, and graphic elements. The
accuracy of the results depends on the scanning resolution and other factors. You may need to review and correct
the OCR text in the new PDF page after scanning.
Note: The Formatted Text & Graphics option is available for only some languages.
Black-and-white scanning at 300 ppi produces the best text for conversion. At 150 ppi, OCR accuracy is slightly lower,
and more font-recognition errors occur. For text printed on colored paper, try increasing the brightness and contrast
by about 10%. If your scanner has color-filtering capability, consider using a filter or lamp that drops out the background
color.
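The resolution guidance above can be summarized as a small lookup. The following is a minimal Python sketch, not part of Acrobat: the function name `classify_scan_resolution` is hypothetical, and the treatment of values between the stated figures (72, 150, and 300 ppi) is an assumption made for illustration.

```python
def classify_scan_resolution(ppi: int) -> str:
    """Map a scan resolution (ppi) to the guidance given in the text.

    Only 72, 150, and 300 ppi are named in the guide; how the ranges
    between them are grouped here is an assumption.
    """
    if ppi < 72:
        return "below minimum"   # OCR requires 72 ppi or higher
    if ppi < 150:
        return "minimum met"     # allowed, but below the figures discussed
    if ppi < 300:
        return "slightly lower"  # slightly lower accuracy, more font errors
    return "best"                # 300 ppi black-and-white is recommended


for ppi in (60, 72, 150, 300):
    print(ppi, classify_scan_resolution(ppi))
```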
Downsample Images
Decreases the number of pixels in color, grayscale, and monochrome images after OCR is
complete. Choose the degree of downsampling that you want to apply. Higher-numbered options do less
downsampling, producing higher-resolution PDFs.
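To see why the degree of downsampling matters, it helps to work out the pixel counts involved. The sketch below is plain arithmetic in Python, not an Acrobat API: the helper names and the example values (a letter-size page scanned at 300 ppi and downsampled to 150 ppi) are assumptions chosen for illustration, and the actual choices offered in the dialog box may differ.

```python
def pixel_dimensions(width_in: float, height_in: float, ppi: int) -> tuple[int, int]:
    """Pixel width and height of a page image at a given resolution."""
    return round(width_in * ppi), round(height_in * ppi)


def remaining_pixel_fraction(source_ppi: int, target_ppi: int) -> float:
    """Fraction of the original pixels left after downsampling.

    Resolution scales both dimensions, so the pixel count scales
    with the square of the ratio.
    """
    return (target_ppi / source_ppi) ** 2


# Hypothetical example: letter-size page (8.5 x 11 in).
print(pixel_dimensions(8.5, 11, 300))      # (2550, 3300)
print(pixel_dimensions(8.5, 11, 150))      # (1275, 1650)
print(remaining_pixel_fraction(300, 150))  # 0.25
```

Halving the resolution leaves one quarter of the pixels, which is why options that do less downsampling produce larger, higher-resolution PDFs.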
Correct OCR text in PDFs
When you scan to Formatted Text & Graphics output, Acrobat analyzes bitmaps of text and substitutes words and
characters for those bitmap areas. If the ideal substitution is uncertain, Acrobat marks the word as suspect. Suspects
appear in the PDF as the original bitmap of the word, but the text is included on an invisible layer behind the bitmap
of the word. This makes the word searchable even though it is displayed as a bitmap. You can accept these suspects
as they are, or you can use the TouchUp Text tool to correct them.
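The behavior of suspects can be pictured with a small model. This is an illustrative Python sketch only and does not describe how Acrobat is implemented: the `RecognizedWord` type, the numeric confidence score, and the 0.8 threshold are all hypothetical stand-ins for "the ideal substitution is uncertain".

```python
from dataclasses import dataclass


@dataclass
class RecognizedWord:
    text: str          # best-guess text from OCR
    confidence: float  # hypothetical certainty score between 0 and 1


SUSPECT_THRESHOLD = 0.8  # assumed cutoff, for illustration only


def is_suspect(word: RecognizedWord) -> bool:
    """A word is a suspect when its substitution is uncertain."""
    return word.confidence < SUSPECT_THRESHOLD


def display_mode(word: RecognizedWord) -> str:
    """Suspects are shown as the original bitmap; other words as text."""
    return "bitmap" if is_suspect(word) else "text"


def search(words: list[RecognizedWord], query: str) -> list[int]:
    """Search uses the recognized text, so suspects are still found."""
    return [i for i, w in enumerate(words) if query.lower() in w.text.lower()]


words = [RecognizedWord("Invoice", 0.98), RecognizedWord("t0tal", 0.41)]
print([display_mode(w) for w in words])  # ['text', 'bitmap']
print(search(words, "t0tal"))            # [1]
```

The second word is displayed as a bitmap yet still matches a search for its recognized text, which mirrors the invisible text layer described above. Correcting the suspect amounts to replacing that underlying text.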
Note: If you try to select text in a scanned PDF that does not have OCR applied, or try to perform a Read Out Loud
operation on an image file, Acrobat asks if you want to run OCR. If you click OK, the Recognize Text dialog box opens
and you can select options, which are described in detail under the previous topic.
1 Do one of the following:
• Choose Document > OCR Text Recognition > Find All OCR Suspects. All suspect words on the page are enclosed
in boxes. Click any suspect word to show the suspect text in the Find Element dialog box.