Take a look at the Nutch project. It is a powerful crawler that can integrate
with Solr.

http://nutch.apache.org/
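
If the files live on a plain file share rather than behind a web server, questions 1 and 2 can also be approached directly against Solr. What follows is only a rough sketch under several assumptions, not a tested setup: the base URL is a placeholder, the `/update/extract` handler (Solr Cell) has to be enabled in your solrconfig.xml, the field name `content` is hypothetical and depends on your schema, and using the file path as the document id is just one possible choice. The sketch only walks the folder tree and builds the request URLs; sending each file's bytes to the extract URL is left out.

```python
import os
from urllib.parse import urlencode

# Placeholder base URL (assumption); point it at your own Solr instance.
SOLR_URL = "http://localhost:8983/solr"


def iter_files(root):
    """Yield the path of every file under root, descending into all subfolders."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


def extract_url(path, solr_url=SOLR_URL):
    """Build the URL a file's bytes would be POSTed to for text extraction.

    Assumes the /update/extract handler is enabled; the file path is used
    as the document id, which is an illustrative choice only.
    """
    return f"{solr_url}/update/extract?" + urlencode({"literal.id": path})


def highlight_query_url(user_query, field="content", solr_url=SOLR_URL):
    """Build a search URL that asks Solr for highlighted snippets.

    'field' is a hypothetical field name; use the one your schema stores
    the extracted text in.
    """
    params = {"q": user_query, "wt": "json", "hl": "true", "hl.fl": field}
    return f"{solr_url}/select?" + urlencode(params)
```

A web front end would call something like `highlight_query_url("annual report")`, fetch the resulting URL, and read the matching documents plus the `highlighting` section of the JSON response. Question 3 (picking up changes automatically) is not covered here; re-running the walk on a schedule is one simple option.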

> Hi Solr users,
> 
> I hope you can help.  We are migrating our intranet web site management
> system to Windows 2008 and need a replacement for Index Server to do the
> text searching.  I am trying to establish whether Lucene and Solr are a
> feasible replacement, but I cannot find the answers to these questions:
> 
> 1. Can Solr be set up to recursively index a folder containing a large,
> indeterminate, and variable number of subfolders, containing files of
> all types:  XML, HTML, PDF, DOC, spreadsheets, PowerPoint presentations,
> text files etc.?  If so, how?
> 2. Can Solr be queried over the web and return a list of files that match a
> search query entered by a user, and also return the abstracts for these
> files, as well as 'hit highlighting'?  If so, how?
> 3. Can Solr be run as a service (like Index Server) that automatically
> detects changes to the files within the indexed folder and updates the
> index? If so, how?
> 
> Thanks for your help
> 
> Cathy Hemsley
