I've got two build systems for my Solr index that I wrote. The first one is in Perl and uses GET/POST requests via HTTP, the second is in Java using SolrJ. I've noticed a performance discrepancy when processing every one of my delete records, currently about 25000 of them. It takes about 5 seconds in Perl and a minute or more via SolrJ. In the perl system, I do a full delete like this once an hour. The performance impact of doing it once an hour in the SolrJ version has forced me to do it only once per day. The normal delete process in both cases looks for new records and deletes just those. It happens every two minutes in the Perl program and every minute in the Java program.

What might be causing the problem, and what information can I collect to help with the diagnosis? Here's a start:

In both systems, the way this is done is by breaking the list of deleted IDs into smaller chunks, doing a query for that chunk, and if results are found, issuing a delete for the same query. This entire process is completed sequentially on all seven shards. In the Perl system, it's done using HTTP POST calls, the query using /select and the delete using /update and deleteByQuery. The query looks like what's below, only a lot longer. The did field is a tlong:

did:(281472047+OR+281472023+OR+281472022+OR+281472021+OR+281472020+OR+281472019
+OR+281472018+OR+281472017+OR+281472016+OR+276514457+OR+281472031+OR ... )

In the perl system, I limit each query to 1024 values, my maxBooleanClauses. In the SolrJ system, I have limited it to an even 1000. By adding detailed logging to the Java program, I have determined that it is the query part that's slow, not the delete part. Each chunk of 1000 takes a few seconds to return results, most of which will be zero. In the Perl program a commit is executed for each shard, but waitSearcher and waitFlush are set to false. In the SolrJ program, the commit happens later in the code and is not counted.

There are some differences in the Solr implementation that each build system talks to. The Java program talks to a pair of Solr 3.4.0 servers running CentOS 6 (ext4). The Perl program talks to a pair of Xen/CentOS 5 (ext3) machines (identical hardware) that each host a set of CentOS 5 virtual machines running Solr 3.2.0. On CentOS 6, each server houses three of the large shards in separate cores, one of them also hosts a seventh smaller shard. On the VM environment, the distribution is the same, except that each shard lives in a virtual machine with its own Solr instance.

The CentOS 6 machines have 32GB of RAM, the physical hosts for the VM environment have recently been upgraded to 64GB and each VM's reservation increased, but I can tell you for sure that even when they only had 32GB, it still only took a few seconds for the full delete to occur in the Perl program.

Logfiles below. Something to note: At 3:33:42, the idx_delete table was trimmed from over 40000 entries to just under 25000 - so the Perl code handled a lot more entries, but did it MUCH faster. I've changed the company name to REDACTED in the SolrJ logs, but otherwise left them alone.

LOG: Thu Oct 20 03:03:01 2011: /------====== Start 03:03, wday:4, yday:292
LOG: Thu Oct 20 03:03:01 2011: There are 40170 entries in idx_delete
DBG: Thu Oct 20 03:03:02 2011: MAX(id) is 12998586
LOG: Thu Oct 20 03:03:02 2011: This run - 40170 entries in idx_delete
LOG: Thu Oct 20 03:03:02 2011: Retrieved 40170 entries for deletion
LOG: Thu Oct 20 03:03:03 2011: Shard 0: skipped
DBG: Thu Oct 20 03:03:03 2011: Shard 1: delete ok
LOG: Thu Oct 20 03:03:03 2011: Shard 1: committed
DBG: Thu Oct 20 03:03:04 2011: Shard 2: delete ok
LOG: Thu Oct 20 03:03:04 2011: Shard 2: committed
DBG: Thu Oct 20 03:03:05 2011: Shard 3: delete ok
LOG: Thu Oct 20 03:03:05 2011: Shard 3: committed
LOG: Thu Oct 20 03:03:06 2011: Shard 4: skipped
LOG: Thu Oct 20 03:03:07 2011: Shard 5: skipped
LOG: Thu Oct 20 03:03:07 2011: Shard inc: skipped
LOG: Thu Oct 20 03:03:10 2011: Erased flatDelete file
LOG: Thu Oct 20 03:03:10 2011: Wrote 40170 entries to flatDelete
LOG: Thu Oct 20 03:03:10 2011: \------======  End  03:03

Oct 20, 2011 3:34:00 AM com.REDACTED.idxbuild.Main updateChain
INFO: /---- Running update on chain a
Oct 20, 2011 3:34:00 AM com.REDACTED.idxbuild.solr.IdxChain buildQuery
INFO: chain.a: buildQuery: SELECT * FROM idx_delete WHERE (id > 0 AND id <= 12998597)
Oct 20, 2011 3:34:02 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:05 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:07 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:09 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:11 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:14 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:16 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:18 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:20 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:23 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:25 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:27 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:29 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:32 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:34 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:36 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:38 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:40 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:43 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:45 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:47 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:50 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:52 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:55 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:57 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 920 docs

Thanks,
Shawn

Reply via email to