Thanks, saurish.

My office intranet is a SharePoint website. When I crawl it using
Nutch, I get an "Unauthorized access(404)" error. The website uses an
NTLM realm for authentication.

I read in a Nutch JIRA issue that SharePoint can be crawled using
Nutch. Nutch has the properties below in nutch-default.xml.

http.proxy.host (should it be the intranet site path?)
http.proxy.port
http.proxy.username  (should this contain the domain too?)
http.proxy.password
http.proxy.realm (should it be the domain of my desktop machine, the one I
log in with? Using the same domain/username I can access the intranet from
a browser.)


Also, Nutch has an "httpclient-auth" XML file for supplying credentials
for authentication.

What values do I provide for these properties in nutch-site.xml?


And what should the values in the httpclient-auth.xml file be?
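
For reference, this is roughly how I understand the layout of
httpclient-auth.xml from the Nutch documentation for the
protocol-httpclient plugin. It is only a sketch: the host, port, domain,
username, and password are placeholders I made up, and putting the Windows
domain in the realm attribute is my assumption, which is part of what I am
asking about.

```xml
<?xml version="1.0"?>
<!-- Sketch of conf/httpclient-auth.xml; every value below is a placeholder. -->
<auth-configuration>
  <!-- Hypothetical account, not real credentials. -->
  <credentials username="my-username" password="my-password">
    <!-- Hypothetical host and port of the SharePoint site. Assumption: for
         NTLM, the realm attribute carries the Windows domain name. -->
    <authscope host="intranet.example.com" port="80"
               realm="MYDOMAIN" scheme="NTLM"/>
  </credentials>
</auth-configuration>
```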



Regards,
Rashmi


On Mon, Jan 27, 2014 at 3:57 PM, saurish <srinivas.oruga...@gmail.com> wrote:

> Hi,
>
> Looks like there is support for Sharepoint as well as Windows Share in
> ManifoldCF.
>
> Yes, you can crawl folders with Nutch (at least I have done so on a Windows PC
> with a local file folder).
>
> Nutch 1.7 and Solr 4.5.1 have worked for me.
>
> Regards,
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Fwd-Search-Engine-Framework-decision-tp4113584p4113677.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Rashmi
Be the change that you want to see in this world!
www.minnal.zor.org
disha.resolve.at
www.artofliving.org
