Chapter 9: Indexing Collections with Verity Spider
Web standard support
Verity Spider supports key web standards used by Internet and intranet sites. Standard HREF
links and frame pointers are recognized, so Verity Spider can navigate through them.
Redirected pages are followed so that the real underlying document is indexed. Verity Spider
adheres to the robots exclusion standard specified in robots.txt files, so that administrators can
maintain friendly visits to remote websites. The HTTP Basic Authentication mechanism is supported
so that password-protected sites can be indexed.
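The robots exclusion check described above can be illustrated with a minimal sketch. It is not Verity Spider code: it uses Python's standard urllib.robotparser module, and the agent name example-spider, the sample rules, and the URLs are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed_urls(robots_txt, agent, urls):
    """Return only the URLs that the robots.txt rules allow this agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]

candidates = [
    "http://example.com/index.html",
    "http://example.com/private/report.html",
]
print(allowed_urls(ROBOTS_TXT, "example-spider", candidates))
```

In this sketch, the URL under /private/ is filtered out before any request is made, which is the behavior that keeps visits to remote sites friendly.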
Restart capability
When an indexing job fails, or for some reason Verity Spider cannot index a significant number or
type of URLs, you can now restart the indexing job to update the collection. Only those URLs
that were not successfully indexed previously are processed.
State maintenance through a persistent store
Verity Spider stores the state of gathered and indexed URLs in a persistent store, which lets it
track progress for the purposes of gracefully and efficiently restarting halted indexing jobs.
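The following sketch shows one way a persistent store can support restarts. It is an illustration only, assuming a SQLite table as the store; the table layout, the status values, and the function names are hypothetical and do not describe the Verity Spider store format.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) the URL state store. Use a file path to persist across runs."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS urls (url TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    return db

def record(db, url, status):
    """Save the latest state ('indexed' or 'failed') for a URL."""
    db.execute(
        "INSERT OR REPLACE INTO urls (url, status) VALUES (?, ?)", (url, status)
    )
    db.commit()

def pending(db):
    """Return the URLs that were not successfully indexed."""
    rows = db.execute("SELECT url FROM urls WHERE status != 'indexed' ORDER BY url")
    return [row[0] for row in rows]

def run_job(db, urls, index_fn):
    """Index each URL and record the outcome so that a later restart can skip successes."""
    for url in urls:
        try:
            index_fn(url)
            record(db, url, "indexed")
        except Exception:
            record(db, url, "failed")
```

To restart a halted or partly failed job in this sketch, call run_job again with pending(db) as the URL list. URLs already marked as indexed are not processed again.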
Performance
Spidering performance is greatly improved over previous versions, because of low memory
requirements, flow control, and the help of multithreading and efficient Domain Name System
(DNS) lookups.
Flow control
When indexing websites, Verity Spider distributes requests to web servers in a round-robin
manner. This means that one URL is fetched from each web server in turn. With flow control, a
faster website can finish before a slower one. Verity Spider optimizes indexing on every web
server.
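The round-robin ordering can be sketched as follows. This is a simplified illustration, not the Verity Spider scheduler: it groups URLs by host and takes one URL from each host in turn. The function name and the sample hosts are assumptions.

```python
from collections import OrderedDict, deque
from urllib.parse import urlsplit

def round_robin(urls):
    """Order URLs so that one URL is taken from each web server in turn."""
    queues = OrderedDict()
    for url in urls:
        queues.setdefault(urlsplit(url).netloc, deque()).append(url)

    order = []
    while queues:
        # Visit each host once per pass; drop a host when its queue is empty.
        for host in list(queues):
            order.append(queues[host].popleft())
            if not queues[host]:
                del queues[host]
    return order

urls = [
    "http://a.example/1",
    "http://a.example/2",
    "http://a.example/3",
    "http://b.example/1",
]
print(round_robin(urls))
```

In this example, b.example has only one URL, so it drops out after the first pass while a.example continues.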
Verity Spider adjusts the number of connections per server depending on the download
bandwidth. When the download bandwidth from a web server falls below a certain value, Verity
Spider automatically scales back the number of connections to that web server. There will always
be at least one connection to a web server. When the download bandwidth increases to an
acceptable level, Verity Spider reallocates connections (per the value of the -connections option,
which is 4 by default). You can turn off flow control with the -noflowctrl option.
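The connection scaling rule can be sketched as a small function. The bandwidth threshold, the step size of one connection, and the function name are assumptions made for illustration; this guide does not state the values that Verity Spider uses. Only the default of 4 connections and the minimum of one connection come from the description above.

```python
def adjust_connections(current, bandwidth_kbps, threshold_kbps, max_connections=4):
    """Return the number of connections to keep open to one web server.

    max_connections mirrors the -connections default of 4.
    threshold_kbps is a hypothetical cutoff; the real value is not documented here.
    """
    if bandwidth_kbps < threshold_kbps:
        # Scale back, but always keep at least one connection.
        return max(1, current - 1)
    # Bandwidth is acceptable again: reallocate up to the configured limit.
    return max_connections
```

For example, with an assumed threshold of 50, a server delivering 10 drops from 4 connections to 3, and a server that is already at 1 connection stays at 1.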
Multithreading
Verity Spider separates the gathering and indexing jobs into multiple threads for concurrency.
Additionally, Verity Spider can create concurrent connections to web servers for fetching
documents, and run concurrent indexing threads for maximum utilization. This translates to an
overall improvement in throughput.
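The split between gathering and indexing threads can be illustrated with a minimal producer and consumer sketch. It is not Verity Spider code. The fetch and index callables, the worker counts, and the function name are assumptions; a queue connects the two stages.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

def crawl(urls, fetch, index, fetch_workers=4, index_workers=2):
    """Fetch documents on one set of threads and index them on another."""
    fetched = queue.Queue()
    results = []
    lock = threading.Lock()

    def indexer():
        # Indexing thread: consume fetched documents until a stop marker arrives.
        while True:
            item = fetched.get()
            if item is None:
                return
            url, body = item
            result = index(url, body)
            with lock:
                results.append(result)

    threads = [threading.Thread(target=indexer) for _ in range(index_workers)]
    for thread in threads:
        thread.start()

    def gather(url):
        # Gathering thread: fetch one document and hand it to the indexers.
        fetched.put((url, fetch(url)))

    with ThreadPoolExecutor(max_workers=fetch_workers) as pool:
        list(pool.map(gather, urls))

    # One stop marker per indexing thread, then wait for them to finish.
    for _ in threads:
        fetched.put(None)
    for thread in threads:
        thread.join()
    return results
```

Because the indexing threads run while documents are still being fetched, the order of the results is not guaranteed. Sort them if a stable order is needed.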