Thanks saurish.
My office *intranet* is a SharePoint website. When I crawl it using Nutch, I get an "Unauthorized access(404)" error. This website uses an NTLM realm. I read in a Nutch JIRA issue that SharePoint can be accessed using Nutch.

Nutch has the following properties in nutch-default.xml:

http.proxy.host (should it be the intranet site path?)
http.proxy.port
http.proxy.username (should this contain the domain too?)
http.proxy.password
http.proxy.realm (should it be the domain of my desktop machine, the one I use to log in to my machine? Using the same domain/username I can access the intranet from a browser.)

Nutch also has an "httpclient-auth" XML file for supplying authentication credentials.

What should I provide for these properties in nutch-site.xml? And what should the values in the httpclient-auth.xml file be?

Regards,
Rashmi

On Mon, Jan 27, 2014 at 3:57 PM, saurish <srinivas.oruga...@gmail.com> wrote:

> Hi,
>
> Looks like there is support for SharePoint as well as Windows Share in
> ManifoldCF.
>
> Yes, you can crawl folders with Nutch (at least I have worked on a Windows PC
> with a local file folder).
>
> Nutch 1.7 and Solr 4.5.1 have worked for me.
>
> Regards,
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Fwd-Search-Engine-Framework-decision-tp4113584p4113677.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Rashmi
Be the change that you want to see in this world!
www.minnal.zor.org
disha.resolve.at
www.artofliving.org