You could run the HTML import through Tika (see the Solr tutorial on the Solr 
website).  The job that runs Tika would need the username and password for the 
site being indexed, but Solr itself would not.  (You might have to write a small 
script to fetch the page using curl, wget, or Nutch.)
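
As a rough sketch of the kind of script I mean, something like the following 
Python could fetch the protected page and hand it to Solr.  Several things here 
are assumptions on my part and not from your setup: that the site uses HTTP 
Basic authentication, that Solr runs at http://localhost:8983/solr with the 
Tika-based extracting handler enabled at /update/extract, and all of the 
function names and the document id, which are made up for illustration.

```python
import base64
import urllib.parse
import urllib.request


def build_fetch_request(url, user, password):
    """Build a GET request carrying an HTTP Basic Authorization header.

    Assumes the protected site accepts HTTP Basic auth (hypothetical).
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def build_extract_request(solr_base, doc_id, body, content_type):
    """Build a POST request that sends the raw page bytes to Solr.

    Assumes the extracting (Tika) handler is mapped to /update/extract.
    No credentials are sent to Solr; only the fetching step needed them.
    """
    params = urllib.parse.urlencode({"literal.id": doc_id, "commit": "true"})
    return urllib.request.Request(
        f"{solr_base}/update/extract?{params}",
        data=body,
        headers={"Content-Type": content_type},
    )


def fetch_and_index(page_url, user, password, solr_base, doc_id):
    """Fetch the protected page, then post its content to Solr."""
    with urllib.request.urlopen(build_fetch_request(page_url, user, password)) as resp:
        body = resp.read()
        content_type = resp.headers.get("Content-Type", "application/octet-stream")
    post = build_extract_request(solr_base, doc_id, body, content_type)
    with urllib.request.urlopen(post) as resp:
        return resp.status


# Hypothetical usage (needs a live site and a running Solr):
# fetch_and_index("http://www.blablabla.com/xyz", "user", "secret",
#                 "http://localhost:8983/solr", "xyz-page")
```

The point of splitting it this way is that the password lives only in the 
fetching job; Solr just receives the page content.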

Users could then search the index so created, without having access to the 
actual web site, which I think is what you are asking.

But beware:  Depending on what and how you index, the index itself may end up 
revealing information from the protected site that you did not intend to reveal.

-----Original Message-----
From: deniz [mailto:denizdurmu...@gmail.com] 
Sent: Wednesday, August 24, 2011 4:38 AM
To: solr-user@lucene.apache.org
Subject: how to deal with URLDatasource which needs authorization?

Hi all,

I am trying to index a page that basically returns an XML file, but I don't
want it to be accessible to anyone else. The page will check for
authorization, such as a username and password.

For example, the page that returns the XML is:

www.blablabla.com/xyz

I would like to index the data from there, but I don't want anyone else to
access it.

So how do I add authorization information to Solr in order to let it
index the data?

-----
Smart, but doesn't study... If he studied, he could do it...
