
156
Chapter 8 Verity Spider
Note
Do not run more than one Verity Spider process in persistent mode. Because
Verity Spider is a resource-intensive process, run it in persistent mode only
with an interval of less than one day. For time intervals greater than twelve
hours, use some form of scheduling, such as cron jobs on UNIX or the AT command
on Windows NT Server.
-preferred
Syntax: -preferred exp_1 [exp_n] ...
Type: Web crawling only
Specifies a list of hosts or domains to prefer when retrieving documents for
viewing. You can use wildcard expressions, where the asterisk ( * ) matches text
strings and the question mark ( ? ) matches single characters. To use regular
expressions, also specify the -regexp option. Use this option when you leave
duplicate detection enabled and do not specify -nodupdetect.
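The wildcard rules above can be illustrated with a minimal Python sketch. It is not Verity Spider code: it uses Python's standard fnmatch module, which treats * and ? the same way but also gives square brackets a special meaning, so Verity Spider's actual matching may differ. The function name host_is_preferred and the host names are hypothetical.

```python
from fnmatch import fnmatchcase


def host_is_preferred(host, patterns):
    """Return True if the host matches any wildcard pattern.

    Assumption: matching is case-insensitive, so both sides are lowercased.
    """
    host = host.lower()
    return any(fnmatchcase(host, pattern.lower()) for pattern in patterns)


# Hypothetical patterns of the kind that could follow -preferred.
patterns = ["*.example.com", "www?.example.org"]

print(host_is_preferred("docs.example.com", patterns))   # * matches "docs"
print(host_is_preferred("www1.example.org", patterns))   # ? matches "1"
print(host_is_preferred("www12.example.org", patterns))  # ? cannot match "12"
```

In this sketch the asterisk stands for a string of any length, while the question mark stands for exactly one character, which is why the third host is rejected.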
When indexing, you may encounter a non-preferred host first. In that case, its
documents are parsed, their links are followed, and the documents are stored as
candidates. When duplicates are later encountered on another server that is
preferred, the duplicate documents from the non-preferred server are skipped.
When documents are requested for viewing, they are retrieved from the preferred
server.
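The candidate-and-replace behavior described above can be sketched as follows. This is only an illustration under stated assumptions, not the Verity Spider implementation: how duplicates are actually detected is not covered here, so each document carries a hypothetical content_key that is equal for duplicates, and the names is_preferred and choose_copies and all URLs are made up.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def is_preferred(url, patterns):
    """True if the URL's host matches one of the preferred wildcard patterns."""
    host = urlsplit(url).hostname or ""
    return any(fnmatchcase(host, pattern) for pattern in patterns)


def choose_copies(documents, patterns):
    """Pick one URL per document, favoring preferred hosts.

    documents: (url, content_key) pairs in crawl order; content_key is a
    hypothetical stand-in for whatever duplicate detection compares.
    """
    kept = {}
    for url, key in documents:
        current = kept.get(key)
        if current is None:
            kept[key] = url  # first sighting: stored as a candidate
        elif not is_preferred(current, patterns) and is_preferred(url, patterns):
            kept[key] = url  # preferred duplicate replaces the candidate
        # any other duplicate is skipped
    return kept


crawl = [
    ("http://mirror.example.net/a.html", "doc-a"),  # non-preferred host seen first
    ("http://www.example.com/a.html", "doc-a"),     # duplicate on a preferred host
    ("http://mirror.example.net/b.html", "doc-b"),  # no duplicate elsewhere
]
result = choose_copies(crawl, ["*.example.com"])
```

Here doc-a ends up pointing at the preferred server even though the mirror was crawled first, while doc-b keeps its only known location.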
On Windows NT, enclose the argument in double quotes to protect special
characters such as the asterisk ( * ). On UNIX, use single quotes. Quotes are
required only when you run the indexing job from a command line. They are not
necessary within a command file (-cmdfile).
See Also
-regexp
-prefixmap
Syntax: -prefixmap path_and_filename
Type: File system only
Specifies a control file (simple ASCII text) that maps file system paths to Web
aliases. In conjunction with -abspath, this option is typically used to create a
URL field that is the Web equivalent of a file system path. File system indexing
is faster than Web crawling over the network. If you use -prefixmap to replace
the file system path with the Web URL, relative hyperlinks in the HTML pages are
kept intact when viewed through Information Server.
The format for the control file is:
src_field src_prefix dest_field dest_prefix
If you use backslashes, you must double them so they are properly escaped. For
example:
C:\\test\\docs\\path
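To make the four-column format and the backslash doubling concrete, here is a minimal Python sketch of how such a control-file line could be read and applied. It is an assumption-laden illustration, not Verity Spider's parser: the field names SRC_PATH and URL, the Web prefixes, and the function names are hypothetical, columns are assumed to be separated by whitespace, and paths containing spaces are not handled.

```python
def parse_prefixmap_line(line):
    """Split one control-file line into its four columns.

    Assumption: doubled backslashes in the file stand for single backslashes.
    """
    src_field, src_prefix, dest_field, dest_prefix = line.split()

    def unescape(text):
        return text.replace("\\\\", "\\")

    return src_field, unescape(src_prefix), dest_field, unescape(dest_prefix)


def apply_prefixmap(record, rule):
    """Return a copy of record with dest_field derived from src_field.

    The source prefix is swapped for the destination prefix; the rest of the
    value is copied unchanged.
    """
    src_field, src_prefix, dest_field, dest_prefix = rule
    value = record.get(src_field, "")
    mapped = dict(record)
    if value.startswith(src_prefix):
        mapped[dest_field] = dest_prefix + value[len(src_prefix):]
    return mapped


# Windows-style line: the doubled backslashes collapse to single ones.
win_rule = parse_prefixmap_line(
    r"SRC_PATH C:\\test\\docs\\path URL http://web.example.com/docs"
)

# UNIX-style line applied to a hypothetical document record.
unix_rule = parse_prefixmap_line("SRC_PATH /usr/pub/docs URL http://web.example.com/docs")
doc = apply_prefixmap({"SRC_PATH": "/usr/pub/docs/guide/index.html"}, unix_rule)
```

With the UNIX-style rule, the record gains a URL value of http://web.example.com/docs/guide/index.html, so relative links under guide/ resolve as they would on the Web server.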