Take a look at Apache ManifoldCF for crawling enterprise repositories such as SharePoint (as well as lighter-weight web crawling and file system crawling).

http://manifoldcf.apache.org/en_US/index.html

-- Jack Krupansky

-----Original Message-----
From: Venky Naganathan
Sent: Thursday, October 18, 2012 2:21 PM
To: solr-user@lucene.apache.org
Subject: Building an enterprise quality search engine using Apache Solr

Hello,

Could someone please advise me on the questions below?

1) I am considering building an enterprise search engine that indexes
different types of documents:
  - Text, Microsoft formats (including Outlook email), PDF, SharePoint,
Wikipedia, etc.
  As I understand it, by using Apache Solr, Apache Nutch (for crawling), and
Apache Tika (for document formats), I should be able to implement a crawler
and an indexer/searcher with support for numerous formats. Is this correct?
Do I need any other special packages for SharePoint and Wikipedia?

2) How much development effort, in person-months, is required to
accomplish the above?

3) Does anyone have experience building an enterprise search engine using
Solr? How does the quality of the search results compare to that of other
popular engines?

Thank you very much for your advice. I can be reached at venky2...@gmail.com

-Venky
