Hi, I have not tried this, but could you not read the URL on the client side and pass it to SolrJ as a ContentStream?

  ContentStream urlStream = new ContentStreamBase.URLStream(new URL("http://my.site/file.html"));
  req.addContentStream(urlStream);

--
Jan Høydahl - search architect
Cominvent AS - www.cominvent.com

On 4. feb. 2010, at 10.47, dhamu wrote:

>
> Hi,
> I am newbie to solr and exploring solr last few days.
> I am using solr cell with tika for parsing, indexing and searching
> Posting the rich text documents via Solrj.
> My actual requirement is instead of using local documents(pdf, doc & docx),
> i want to use webpages(urls for eg..,(http://www.apache.org)).
>
> eg..,
> req.addFile(new File("docs/mailing_lists.html"));
> instead
> req.url(new urlconnection("http://www.apache.org")
> anything like the above is there in solrj.
>
> Actually i am using curl for testing. it works fine
>
> curl
> "http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true"
> -F "stream.url=http://wiki.apache.org/solr/SolrConfigXml"
>
> but i am in need to use otherthan curl.
> Below code works fine for local document indexing and searching. But instead
> i want to post urls.
>
> here is my code.,
>
> String url = "http://localhost:8983/solr";
> SolrServer server = new CommonsHttpSolrServer(url);
> ContentStreamUpdateRequest req = new ContentStreamUpdateRequest(
> "/update/extract");
> req.addFile(new File("docs/mailing_lists.html"));
> req.setParam("literal.id", "index1");
> req.setParam("uprefix", "attr_");
> req.setParam("fmap.content", "attr_content");
> req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
> NamedList result = server.request(req);
> assertNotNull("Couldn't upload index.pdf", result);
> QueryResponse rsp = server.query(new SolrQuery("*:*"));
> Assert.assertEquals(1, rsp.getResults().getNumFound());
>
> any suggestion or answer will be appreciated.
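
As an alternative to fetching the page on the client, the curl command quoted in the original message could probably be reproduced from plain Java by building the same /update/extract request with a stream.url parameter. The following is only a sketch, not tested against a live Solr: the class name ExtractRequestUrl and the method build are made up for illustration, and it assumes that Solr accepts stream.url as an ordinary query parameter and that remote streaming is enabled in solrconfig.xml (which the working curl call suggests).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of SolrJ): builds the same request the
// quoted curl command sends, with stream.url passed as a query parameter.
final class ExtractRequestUrl {

    private ExtractRequestUrl() {}

    // solrBase: e.g. "http://localhost:8983/solr"
    // pageUrl:  the web page Solr itself should fetch and extract
    // docId:    value for literal.id
    static String build(String solrBase, String pageUrl, String docId) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("literal.id", docId);
        params.put("uprefix", "attr_");
        params.put("fmap.content", "attr_content");
        params.put("commit", "true");
        // Assumption: remote streaming is enabled on the Solr side.
        params.put("stream.url", pageUrl);

        StringBuilder sb = new StringBuilder(solrBase).append("/update/extract");
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep)
              .append(encode(e.getKey()))
              .append('=')
              .append(encode(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    // URL-encodes one query component as UTF-8.
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always supported", ex);
        }
    }
}
```

Calling ExtractRequestUrl.build("http://localhost:8983/solr", "http://wiki.apache.org/solr/SolrConfigXml", "doc1") gives a URL that can be opened with any HTTP client. With this approach Solr downloads the page itself, so the Solr host needs network access to it. With the ContentStream approach, the client fetches the page and uploads its content.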