Re: TIKA INTEGRATION PERFORMANCE

2011-06-06 Thread Tomás Fernández Löbbe
On Mon, Jun 6, 2011 at 1:47 PM, Naveen Gupta wrote:
> Hi Tomas,
>
> 1. Regarding SolrInputDocument:
>
> We are not using the Java client; we are using the PHP Solr client, and I am not sure how to wrap content in a SolrInputDocument from the PHP client. In this case, we need the Tika-related jars to avail

Re: TIKA INTEGRATION PERFORMANCE

2011-06-06 Thread Naveen Gupta
Hi Tomas, 1. Regarding SolrInputDocument: We are not using the Java client; we are using the PHP Solr client, and I am not sure how to wrap content in a SolrInputDocument from the PHP client. In this case, we need the Tika-related jars to obtain metadata such as content. We certainly don't want to handle al

Re: TIKA INTEGRATION PERFORMANCE

2011-06-06 Thread Tomás Fernández Löbbe
1. About the commit strategy: all the ExtractingRequestHandler (the request handler that uses Tika to extract content from the input file) will do is extract the content of your file and add it to a SolrInputDocument. The commit strategy should not change because of this, compared to other documents yo
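As a rough illustration of the point about commits, here is a minimal Python sketch that posts several files to the ExtractingRequestHandler without committing on each request, then issues a single commit for the whole batch. The base URL `http://localhost:8983/solr`, the `/update/extract` path, and the helper names `extract_url` and `index_attachments` are assumptions made for this example, not details taken from the thread; adjust them to your own setup.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: Solr base URL and handler path; change to match your install.
SOLR_URL = "http://localhost:8983/solr"


def extract_url(doc_id, commit=False):
    """Build the ExtractingRequestHandler URL for one document.

    literal.id sets the document's id field; commit is left false so
    that indexing a file does not force a commit on every request.
    """
    params = {"literal.id": doc_id, "commit": "true" if commit else "false"}
    return SOLR_URL + "/update/extract?" + urlencode(params)


def index_attachments(attachments):
    """Index (doc_id, path) pairs, then commit once for the whole batch."""
    for doc_id, path in attachments:
        with open(path, "rb") as fh:
            body = fh.read()
        req = Request(
            extract_url(doc_id),
            data=body,
            headers={"Content-Type": "application/octet-stream"},
        )
        urlopen(req).close()

    # Single explicit commit, the same as for any other kind of document.
    commit = Request(
        SOLR_URL + "/update",
        data=b"<commit/>",
        headers={"Content-Type": "text/xml"},
    )
    urlopen(commit).close()
```

The same request shape can be sent from PHP through curl; the only point being illustrated is that extraction and commit are separate concerns.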

TIKA INTEGRATION PERFORMANCE

2011-06-05 Thread Naveen Gupta
Hi, Since we are on PHP, we are using solphp to make curl-based calls. My concern is that each user may have 20-40 attachments that need to be indexed each day, and we are targeting around 500-1000 users daily. Right now, if you see, we http://loca
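To put the figures above in perspective, the following small Python sketch works out the implied daily volume. It assumes every user sends the stated number of attachments and that load is spread evenly over 24 hours, which real traffic will not be; the function and variable names are made up for this illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_documents(users, attachments_per_user):
    """Total attachments to index per day."""
    return users * attachments_per_user


# Lower and upper bounds from the figures quoted in the message.
low = daily_documents(500, 20)
high = daily_documents(1000, 40)

# Average request rate if the load were perfectly uniform (an assumption).
low_rate = low / SECONDS_PER_DAY
high_rate = high / SECONDS_PER_DAY

print(f"{low} to {high} documents per day")
print(f"{low_rate:.2f} to {high_rate:.2f} extract requests per second on average")
```

Under these assumptions the estimate is 10,000 to 40,000 documents per day, which averages out to well under one extract request per second.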