Re: Query/Delete performance difference between straight HTTP and SolrJ

Shawn Heisey Thu, 27 Oct 2011 16:52:01 -0700

On 10/27/2011 5:56 AM, Michael Sokolov wrote:

From everything you've said, it certainly sounds like a low-level I/O problem in the client, not a server slowdown of any sort. Maybe Perl is using the same connection over and over (keep-alive) and Java is not. I really don't know. One thing I've heard is that StreamingUpdateSolrServer (I think that's what it's called) can give better throughput for large request batches. If you're not using that, you may be having problems with closing and re-opening connections?

Although I can't claim to know for sure, I'm fairly sure that the simple LWP classes I'm using don't do keepalive unless you specifically configure the user agent to do so. I'll look into it some more.

The StreamingUpdateSolrServer documentation says that it is only recommended for use with the /update handler, not for queries. I'm not having a problem with the deletes themselves; they go pretty fast. It's all of the queries before each delete that are relatively slow, and doing those queries really adds up. With multithreading, the program handles all the shards at once, but it can still only query for a limited number of values at a time due to maxBooleanClauses. Now I'm checking and deleting 1000 values at a time, on all shards simultaneously. I use CommonsHttpSolrServer, and each of those objects is created only once, when the program first starts up.
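The check-then-delete loop described above can be sketched roughly as follows. This is a minimal illustration, not the actual build program: the `ShardClient` interface, the `checkAndDelete` method, and the field name used in the usage note are hypothetical stand-ins (in practice each shard would be backed by its own long-lived CommonsHttpSolrServer). It assumes the shard's maxBooleanClauses is at Solr's default of 1024, which is why a batch size of 1000 keeps each OR query under the limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchedDeleteSketch {
    // Assumes the default maxBooleanClauses (1024); 1000 values per query stays under it.
    static final int BATCH_SIZE = 1000;

    /** Split values into consecutive chunks of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> values, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < values.size(); i += batchSize) {
            batches.add(values.subList(i, Math.min(i + batchSize, values.size())));
        }
        return batches;
    }

    /** Build "field:(v1 OR v2 ...)", which produces one boolean clause per value. */
    static String buildOrQuery(String field, List<String> values) {
        return field + ":(" + String.join(" OR ", values) + ")";
    }

    /** Hypothetical per-shard client; in SolrJ this would wrap one CommonsHttpSolrServer. */
    interface ShardClient {
        List<String> findExisting(String query); // the (slow) query step
        void deleteByQuery(String query);        // the (fast) delete step
    }

    /** One thread per shard; each thread walks every batch. Returns values deleted per shard. */
    static List<Integer> checkAndDelete(List<ShardClient> shards, String field,
                                        List<String> values) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (ShardClient shard : shards) {
                futures.add(pool.submit(() -> {
                    int deleted = 0;
                    for (List<String> batch : partition(values, BATCH_SIZE)) {
                        // Query first, and only delete the values that actually exist on this shard.
                        List<String> found = shard.findExisting(buildOrQuery(field, batch));
                        if (!found.isEmpty()) {
                            shard.deleteByQuery(buildOrQuery(field, found));
                            deleted += found.size();
                        }
                    }
                    return deleted;
                }));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // wait for every shard to finish
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, 2500 values split into batches of 1000, 1000 and 500, so each shard issues three queries, and `buildOrQuery("did", List.of("1", "2"))` (with "did" as a made-up field name) yields `did:(1 OR 2)`. The point of the sketch is that total time is dominated by the number of sequential query round-trips per shard, which is where client-side HTTP overhead would show up.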


I figure there are three possibilities:

1) A glaring inefficiency in CommonsHttpSolrServer queries as compared to a straight HTTP POST request.
2) The compartmentalization provided by the virtual machine architecture creates an odd synergy that is not present when there are only two Solr instances on physical machines instead of eight of them (seven shards plus a search broker) on virtual machines.
3) The extra physical memory on the servers with virtualization is providing a larger disk-cache performance boost than the other servers gain from running without virtualization.

Only the first of those possible problems is something that can be determined or fixed without migrating the other servers to my new system. I'm having one other problem with the new build program. I haven't figured out exactly what that problem is, so I am very reluctant to switch everything over. So far it seems to be related to the MySQL JDBC connector or my attempt at threading, not Solr.

I mentioned that the hardware is identical except for memory. That's not quite true: the servers accessed by the Java program are better. One of them has a slightly faster CPU than its counterpart with virtualization, and they all have 1TB hard drives as opposed to the mixed 500GB & 750GB drives in the other servers. All of the servers are Dell 2950s with six-drive RAID10 arrays.
