Nutch can crawl the file system as well. Nutch 1.x can also provide search itself, but in Nutch 2.x this is delegated to Solr. So Solr can provide the search, and Nutch can feed Solr with content from your intranet.
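Since the pages are not necessarily linked to each other, one way to make sure nothing is missed is to seed the crawl with every file in the folder. The following is only a rough sketch under assumptions, not a tested Nutch setup: the function names and paths are made up for illustration, and it assumes Nutch is configured to fetch file: URLs (file protocol plugin enabled and the URL filter not rejecting file:). It walks a folder recursively and writes one file:// URL per line, which is the shape of a plain-text seed list.

```python
import os
from pathlib import Path


def file_seed_urls(root):
    """Return sorted file:// URLs for every regular file under root, recursively."""
    root_path = Path(root).resolve()  # as_uri() needs an absolute path
    urls = []
    for dirpath, _dirnames, filenames in os.walk(root_path):
        for name in filenames:
            # as_uri() also percent-encodes spaces and other special characters
            urls.append((Path(dirpath) / name).as_uri())
    return sorted(urls)


def write_seed_list(root, seed_file):
    """Write one URL per line to seed_file and return the number of URLs written."""
    urls = file_seed_urls(root)
    Path(seed_file).write_text("".join(u + "\n" for u in urls), encoding="utf-8")
    return len(urls)
```

For example, write_seed_list("/path/to/intranet", "urls/seed.txt") would list every file below that (hypothetical) folder, whether or not anything links to it. Re-running it on a schedule is one simple way to pick up newly published files.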

On Friday 14 January 2011 13:17:52 Cathy Hemsley wrote:
> Hi,
> Thanks for suggesting this.
> However, I'm not sure a 'crawler' will work: as the various pages are not
> necessarily linked (it's complicated: basically our intranet is a dynamic
> and managed collection of independently published web sites, and users
> found information using categorisation and/or text searching), so we need
> something that will index all the files in a given folder, rather than
> follow links like a crawler. Can Nutch do this? As well as the other
> requirements below?
> Regards
> Cathy
>
> On 14 January 2011 12:09, Markus Jelsma <markus.jel...@openindex.io> wrote:
> > Please visit the Nutch project. It is a powerful crawler and can
> > integrate with Solr.
> >
> > http://nutch.apache.org/
> >
> > > Hi Solr users,
> > >
> > > I hope you can help. We are migrating our intranet web site management
> > > system to Windows 2008 and need a replacement for Index Server to do
> > > the text searching. I am trying to establish if Lucene and Solr is a
> > > feasible replacement, but I cannot find the answers to these questions:
> > >
> > > 1. Can Solr be set up to recursively index a folder containing an
> > > indeterminate and variable large number of subfolders, containing files
> > > of all types: XML, HTML, PDF, DOC, spreadsheets, powerpoint
> > > presentations, text files etc. If so, how?
> > > 2. Can Solr be queried over the web and return a list of files that
> > > match a search query entered by a user, and also return the abstracts
> > > for these files, as well as 'hit highlighting'. If so, how?
> > > 3. Can Solr be run as a service (like Index Server) that automatically
> > > detects changes to the files within the indexed folder and updates the
> > > index? If so, how?
> > >
> > > Thanks for your help
> > >
> > > Cathy Hemsley

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350