Re: How to keep a maintained index with crawled data

2011-01-20 Thread Erlend Garåsen
Thanks Jack! I will give it a try, even though I finally have a Nutch configuration that does exactly what I want it to do (except keeping an eye on updated and deleted documents).

Erlend

On 19.01.11 16.52, Jack Krupansky wrote:
Take a look at Apache ManifoldCF (incubating, close to 0.1 re

Re: How to keep a maintained index with crawled data

2011-01-19 Thread Jack Krupansky
Take a look at Apache ManifoldCF (incubating, close to 0.1 release): http://incubator.apache.org/connectors/ In addition to a fairly sophisticated general web crawler that maintains the state of crawled web pages, it has a file system crawler and crawlers for a variety of document repositories
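The point about maintaining the state of crawled pages is what makes updated and deleted documents tractable. The sketch below shows only the general idea. It is not ManifoldCF's API or implementation. A crawler keeps a stored fingerprint per URL and compares it with the latest fetch to decide what to add, update or delete in the index. The names `fingerprint`, `CrawlDiff` and `diff_crawl` are hypothetical. The sketch also assumes that each run re-fetches the whole site, so any URL missing from the new fetch is treated as deleted.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(content: bytes) -> str:
    """Hash page content so that changes can be detected cheaply between crawls."""
    return hashlib.sha256(content).hexdigest()


@dataclass
class CrawlDiff:
    """URLs grouped by the index operation they would need (hypothetical type)."""
    added: list = field(default_factory=list)
    updated: list = field(default_factory=list)
    deleted: list = field(default_factory=list)


def diff_crawl(previous: dict, fetched: dict):
    """Compare stored state (url -> fingerprint) with a new fetch (url -> content bytes).

    Hypothetical helper. Assumption: the crawl covered the whole site, so a URL
    absent from `fetched` is considered deleted. A real crawler would rather
    confirm this (e.g. via an HTTP 404/410) before removing it from the index.
    Returns the diff plus the new state to persist for the next run.
    """
    diff = CrawlDiff()
    new_state = {}
    for url, content in fetched.items():
        fp = fingerprint(content)
        new_state[url] = fp
        if url not in previous:
            diff.added.append(url)      # never seen before: add to index
        elif previous[url] != fp:
            diff.updated.append(url)    # content changed: re-index
    for url in previous:
        if url not in fetched:
            diff.deleted.append(url)    # gone since last crawl: delete from index
    return diff, new_state


if __name__ == "__main__":
    old_state = {
        "http://example.com/a": fingerprint(b"old text"),
        "http://example.com/b": fingerprint(b"unchanged"),
        "http://example.com/gone": fingerprint(b"removed page"),
    }
    fetched = {
        "http://example.com/a": b"new text",
        "http://example.com/b": b"unchanged",
        "http://example.com/c": b"brand new page",
    }
    diff, state = diff_crawl(old_state, fetched)
    print(diff)
```

Storing only a hash per URL keeps the state small. A crawler could also use HTTP `Last-Modified`/`ETag` headers to skip re-downloading unchanged pages altogether.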