I've got two build systems for my Solr index that I wrote. The first
one is in Perl and uses GET/POST requests via HTTP, the second is in
Java using SolrJ. I've noticed a performance discrepancy when
processing every one of my delete records, currently about 25000 of
them. It takes about 5 seconds in Perl and a minute or more via SolrJ.
In the perl system, I do a full delete like this once an hour. The
performance impact of doing it once an hour in the SolrJ version has
forced me to do it only once per day. The normal delete process in both
cases looks for new records and deletes just those. It happens every
two minutes in the Perl program and every minute in the Java program.
What might be causing the problem, and what information can I collect to
help with the diagnosis? Here's a start:
In both systems, the way this is done is by breaking the list of deleted
IDs into smaller chunks, doing a query for that chunk, and if results
are found, issuing a delete for the same query. This entire process is
completed sequentially on all seven shards. In the Perl system, it's
done using HTTP POST calls, the query using /select and the delete using
/update and deleteByQuery. The query looks like what's below, only a lot
longer. The did field is a tlong:
did:(281472047+OR+281472023+OR+281472022+OR+281472021+OR+281472020+OR+281472019
+OR+281472018+OR+281472017+OR+281472016+OR+276514457+OR+281472031+OR ... )
In the perl system, I limit each query to 1024 values, my
maxBooleanClauses. In the SolrJ system, I have limited it to an even
1000. By adding detailed logging to the Java program, I have determined
that it is the query part that's slow, not the delete part. Each chunk
of 1000 takes a few seconds to return results, most of which will be
zero. In the Perl program a commit is executed for each shard, but
waitSearcher and waitFlush are set to false. In the SolrJ program, the
commit happens later in the code and is not counted.
There are some differences in the Solr implementation that each build
system talks to. The Java program talks to a pair of Solr 3.4.0 servers
running CentOS 6 (ext4). The Perl program talks to a pair of Xen/CentOS
5 (ext3) machines (identical hardware) that each host a set of CentOS 5
virtual machines running Solr 3.2.0. On CentOS 6, each server houses
three of the large shards in separate cores, one of them also hosts a
seventh smaller shard. On the VM environment, the distribution is the
same, except that each shard lives in a virtual machine with its own
Solr instance.
The CentOS 6 machines have 32GB of RAM, the physical hosts for the VM
environment have recently been upgraded to 64GB and each VM's
reservation increased, but I can tell you for sure that even when they
only had 32GB, it still only took a few seconds for the full delete to
occur in the Perl program.
Logfiles below. Something to note: At 3:33:42, the idx_delete table was
trimmed from over 40000 entries to just under 25000 - so the Perl code
handled a lot more entries, but did it MUCH faster. I've changed the
company name to REDACTED in the SolrJ logs, but otherwise left them alone.
LOG: Thu Oct 20 03:03:01 2011: /------====== Start 03:03, wday:4, yday:292
LOG: Thu Oct 20 03:03:01 2011: There are 40170 entries in idx_delete
DBG: Thu Oct 20 03:03:02 2011: MAX(id) is 12998586
LOG: Thu Oct 20 03:03:02 2011: This run - 40170 entries in idx_delete
LOG: Thu Oct 20 03:03:02 2011: Retrieved 40170 entries for deletion
LOG: Thu Oct 20 03:03:03 2011: Shard 0: skipped
DBG: Thu Oct 20 03:03:03 2011: Shard 1: delete ok
LOG: Thu Oct 20 03:03:03 2011: Shard 1: committed
DBG: Thu Oct 20 03:03:04 2011: Shard 2: delete ok
LOG: Thu Oct 20 03:03:04 2011: Shard 2: committed
DBG: Thu Oct 20 03:03:05 2011: Shard 3: delete ok
LOG: Thu Oct 20 03:03:05 2011: Shard 3: committed
LOG: Thu Oct 20 03:03:06 2011: Shard 4: skipped
LOG: Thu Oct 20 03:03:07 2011: Shard 5: skipped
LOG: Thu Oct 20 03:03:07 2011: Shard inc: skipped
LOG: Thu Oct 20 03:03:10 2011: Erased flatDelete file
LOG: Thu Oct 20 03:03:10 2011: Wrote 40170 entries to flatDelete
LOG: Thu Oct 20 03:03:10 2011: \------====== End 03:03
Oct 20, 2011 3:34:00 AM com.REDACTED.idxbuild.Main updateChain
INFO: /---- Running update on chain a
Oct 20, 2011 3:34:00 AM com.REDACTED.idxbuild.solr.IdxChain buildQuery
INFO: chain.a: buildQuery: SELECT * FROM idx_delete WHERE (id > 0 AND id
<= 12998597)
Oct 20, 2011 3:34:02 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:05 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:07 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:09 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:11 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:14 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:16 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:18 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:20 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:23 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:25 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:27 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:29 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:32 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:34 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:36 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:38 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:40 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:43 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:45 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:47 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:50 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:52 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:55 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 1000 docs
Oct 20, 2011 3:34:57 AM com.REDACTED.idxbuild.REDACTEDChain doDelete
INFO: chain.a: deleted up to 920 docs
Thanks,
Shawn