You could run the HTML import through Tika (see the Solr tutorial on the Solr website). The job that runs Tika would need the username and password for the site being indexed, but Solr itself would not. (You might have to write a small script to fetch the HTML page using curl, wget, or Nutch.)
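As a rough illustration of the "little script" idea, here is a minimal Python sketch. It makes several assumptions that are not stated in this thread: that the site uses HTTP Basic authentication, that the page is reachable over plain http, that Solr runs at localhost:8983, and that the Tika-backed extract handler is enabled at /update/extract. The credentials and the helper names (build_fetch_request, build_solr_request, fetch_and_index) are made up for the example. Only the script holds the password; Solr just receives the fetched bytes.

```python
import base64
import urllib.parse
import urllib.request

# All values below are placeholders/assumptions, not details from this thread.
SOURCE_URL = "http://www.blablabla.com/xyz"   # page from the question; http scheme assumed
USERNAME = "indexer"                          # hypothetical credentials
PASSWORD = "secret"
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"  # assumed Solr location


def build_fetch_request(url, username, password):
    """Build a GET request carrying an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def build_solr_request(solr_url, doc_id, body, content_type="text/html"):
    """Build a POST request that hands the fetched bytes to Solr's extract handler."""
    # literal.id sets the document id; commit=true makes the document searchable at once.
    params = urllib.parse.urlencode({"literal.id": doc_id, "commit": "true"})
    return urllib.request.Request(
        f"{solr_url}?{params}",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def fetch_and_index():
    """Fetch the protected page, then post it to Solr; returns Solr's HTTP status."""
    fetch = build_fetch_request(SOURCE_URL, USERNAME, PASSWORD)
    with urllib.request.urlopen(fetch) as resp:
        body = resp.read()
    post = build_solr_request(SOLR_EXTRACT_URL, SOURCE_URL, body)
    with urllib.request.urlopen(post) as resp:
        return resp.status


if __name__ == "__main__":
    print(fetch_and_index())
```

If the page returns XML rather than HTML, passing a different content_type (for example "application/xml") to build_solr_request is probably the only change needed, though I have not tried it against this particular site.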
Users could then search the index so created, without having access to the actual web site, which I think is what you are asking. But beware: Depending on what / how you index, you may end up revealing information that you did not intend to reveal in the index.

-----Original Message-----
From: deniz [mailto:denizdurmu...@gmail.com]
Sent: Wednesday, August 24, 2011 4:38 AM
To: solr-user@lucene.apache.org
Subject: how to deal with URLDatasource which needs authorization?

hi all

i am trying to index a page which basically returns an xml file. But i dont want it to be accessible for anyone else... the page will basically check for authorization like username and password...

e.g the page which return is this : www.blablabla.com/xyz

i would like to index the data from here, but i dont want anyone else to access it. so what to do for adding authorization information to solr, order to let it index the data

-----
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-deal-with-URLDatasource-which-needs-authorization-tp3280515p3280515.html
Sent from the Solr - User mailing list archive at Nabble.com.