Take a look at Apache ManifoldCF for crawling enterprise repositories such
as SharePoint (as well as lighter-weight web crawling and file system
crawling).
http://manifoldcf.apache.org/en_US/index.html
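On the Tika side of the pipeline described in the question below, Solr can run Tika extraction itself through its extracting request handler (Solr Cell). What follows is only a minimal sketch, not a definitive setup: the base URL, the handler being enabled at /update/extract, and the helper names build_extract_request and index_file are all assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: a local Solr instance with the extracting request handler
# (Solr Cell, backed by Tika) enabled at /update/extract.
SOLR_URL = "http://localhost:8983/solr"


def build_extract_request(doc_id, data,
                          content_type="application/octet-stream",
                          solr_url=SOLR_URL):
    """Build a POST that sends raw document bytes to Solr for extraction.

    literal.id sets the document's unique key; commit=true makes the
    document searchable right away (convenient for testing only).
    """
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    url = f"{solr_url}/update/extract?{params}"
    return Request(url, data=data,
                   headers={"Content-Type": content_type},
                   method="POST")


def index_file(path, doc_id):
    """Hypothetical helper: read a file and submit it to a running Solr."""
    with open(path, "rb") as f:
        req = build_extract_request(doc_id, f.read())
    with urlopen(req) as resp:
        return resp.status
```

A crawler such as ManifoldCF or Nutch would normally feed documents to Solr for you; a sketch like this is mainly useful for trying a handful of PDF or Office files by hand before committing to a crawler.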
-- Jack Krupansky
-----Original Message-----
From: Venky Naganathan
Sent: Thursday, October 18, 2012 2:21 PM
To: solr-user@lucene.apache.org
Subject: Building an enterprise quality search engine using Apache Solr
Hello,
Can someone please advise me on the following?
1) I am considering building an enterprise search engine that indexes
different types of documents:
- Text, Microsoft formats (including Outlook email), PDF, SharePoint,
Wikipedia, etc.
As I understand it, using Apache Solr, Apache Nutch (for crawling), and
Apache Tika (for document formats), I should be able to implement a crawler
and an indexer/searcher with support for numerous formats. Is this correct?
Do I need any other special packages for SharePoint and Wikipedia?
2) How much development effort, in person-months, is required to
accomplish the above?
3) Does anyone have experience building an enterprise search engine using
Solr? How does the quality of the search results compare to that of other
popular engines?
Thank you very much for your advice. I can be reached at venky2...@gmail.com
-Venky