On 10/27/2011 5:56 AM, Michael Sokolov wrote:
From everything you've said, it certainly sounds like a low-level I/O problem in the client, not a server slowdown of any sort. Maybe Perl is using the same connection over and over (keep-alive) and Java is not. I really don't know. One thing I've heard is that StreamingUpdateSolrServer (I think that's what it's called) can give better throughput for large request batches. If you're not using that, you may be having problems w/closing and re-opening connections?

I can't claim to know for certain, but I'm fairly sure that the simple LWP classes I'm using don't do keepalive unless you specifically configure the user agent to do so. I'll look into it some more.

The StreamingUpdateSolrServer documentation recommends using it only with the /update handler, not for queries. I'm not having a problem with the deletes themselves; they go pretty fast. It's all of the queries before each delete that are relatively slow, and doing those queries really adds up. With multithreading, it does all the shards at once, but it can still only query for a limited number of values at a time due to maxBooleanClauses. Right now I'm checking and deleting 1000 values at a time, on all shards simultaneously. I use CommonsHttpSolrServer, and each of those objects is created only once, when the program first starts up.
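A minimal sketch of that kind of batching, in plain Java with no SolrJ calls. The class and method names and the field name "id" are placeholders for illustration, not taken from the actual build program, and the values are assumed to be simple tokens (such as numeric IDs) that need no query escaping. A batch of 1000 stays under Solr's default maxBooleanClauses of 1024.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a list of values into batches and builds
// one boolean OR query per batch.
class DeleteQueryBatcher {
    // Assumed batch size: the 1000 values per request described above.
    static final int BATCH_SIZE = 1000;

    // Split the values into consecutive batches of at most batchSize items.
    static List<List<String>> partition(List<String> values, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < values.size(); i += batchSize) {
            int end = Math.min(i + batchSize, values.size());
            batches.add(new ArrayList<>(values.subList(i, end)));
        }
        return batches;
    }

    // Build a query such as id:(1 OR 2 OR 3). Each value becomes one
    // boolean clause, which is what maxBooleanClauses limits.
    static String buildQuery(String field, List<String> values) {
        return field + ":(" + String.join(" OR ", values) + ")";
    }
}
```

Each query string produced this way would then be sent to every shard, and the matching documents deleted, before moving on to the next batch.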

I figure there are three possibilities:

1) A glaring inefficiency in CommonsHttpSolrServer queries as compared to a straight HTTP POST request.
2) The compartmentalization provided by the virtual machine architecture creates an odd synergy that is not present when there are only two Solr instances on physical machines instead of eight of them (seven shards plus a search broker) on virtual machines.
3) The extra physical memory on the servers with virtualization is granting more of a disk-cache-related performance improvement than the lack of virtualization on the others.
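One way to test the first possibility would be to time a straight HTTP POST from Java against the same query sent through CommonsHttpSolrServer. The following is only a sketch using the JDK's HttpURLConnection; the class and method names are made up for illustration, and the select URL has to be supplied by the caller. It sends the standard q and rows parameters as a form-encoded body and measures the time to read the whole response.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical timing helper for a raw form-encoded POST to a select handler.
class RawPostTimer {
    // Build the form-encoded request body from the query and row count.
    static String formBody(String query, int rows) {
        try {
            return "q=" + URLEncoder.encode(query, "UTF-8") + "&rows=" + rows;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    // POST the query to selectUrl (supplied by the caller) and return the
    // elapsed time in milliseconds, including reading the full response.
    static long timePostMillis(String selectUrl, String query, int rows) throws IOException {
        byte[] body = formBody(query, rows).getBytes("UTF-8");
        long start = System.nanoTime();
        HttpURLConnection conn = (HttpURLConnection) new URL(selectUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        try (InputStream in = conn.getInputStream()) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // Drain the response so the timing covers the full transfer.
            }
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }
}
```

Running the same batch query through both paths several times, and discarding the first run to let caches warm up, should show whether the client library itself accounts for the difference.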

Only the first of those possible problems is something that can be determined or fixed without migrating the other servers to my new system. I'm having one other problem with the new build program. I haven't figured out exactly what that problem is, so I am very reluctant to switch everything over. So far it seems to be related to the MySQL JDBC connector or my attempt at threading, not Solr.

I mentioned that the hardware is identical except for memory. That's not quite true - the servers accessed by the Java program are better. One of them has a slightly faster CPU than its counterpart with virtualization, and they all have 1TB hard drives as opposed to the mixed 500GB and 750GB drives in the other servers. All of the servers are Dell 2950s with six-drive RAID10 arrays.

