Hi Bharath,
I'm no expert, but we had some major problems because of deleteByQuery
(DBQ for short).
We ended up replacing all of our DBQs with deletes by id.

My suggestion is that if you don't really need it, don't use it.
Especially in your case: since you already know the set of ids, it is
redundant to query for them.
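
To make the delete-by-id approach concrete, here is a minimal sketch of splitting a known list of ids into fixed-size batches and handing each batch to a delete call. The class name BatchDelete and the sink callback are hypothetical names made up for this illustration. Against a real cluster the sink would wrap SolrJ's SolrClient.deleteById(List<String> ids, int commitWithinMs), the same call Bharath already uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration only; not part of SolrJ.
final class BatchDelete {
    private BatchDelete() {}

    /** Splits ids into consecutive sublists of at most batchSize elements. */
    static List<List<String>> partition(List<String> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }

    /**
     * Passes each batch to the sink and returns the number of batches sent.
     * The sink stands in for the real delete call, which is an assumption
     * of this sketch (for example, a wrapper around SolrClient.deleteById).
     */
    static int deleteInBatches(List<String> ids, int batchSize,
                               Consumer<List<String>> sink) {
        List<List<String>> batches = partition(ids, batchSize);
        for (List<String> batch : batches) {
            sink.accept(batch);
        }
        return batches.size();
    }
}
```

With a batch size of 200, the 100000 ids from the original question would go out as 500 delete-by-id requests, and no query is ever built. Note that SolrClient.deleteById throws checked exceptions (SolrServerException, IOException), so a real sink would need to catch or rethrow them.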

I don't know how CDCR works, but we have a replication factor of 2 on our
SolrCloud cluster.
Since Solr 5.x, DBQs have been getting stuck for a long while on the
replicas, blocking all updates.
It appears that on the replica side there is an overhead of reordering and
executing the same DBQ over and over again, for consistency reasons.
The replica ends up buffering many delete-by-queries, which blocks all updates.
In addition, there's another defect related to DBQ slowness: LUCENE-7049

On Tue, Aug 9, 2016 at 7:14 AM, Bharath Kumar <bharath.mvku...@gmail.com>
wrote:

> Hi All,
>
> We are using SOLR 6.1 and i wanted to know which is better to use -
> deleteById or deleteByQuery?
>
> We have a program which deletes 100000 documents every 5 minutes from the
> SOLR and we do it in a batch of 200 to delete those documents. For that we
> now use deleteById(List<String> ids, 10000) to delete.
> I wanted to know if we change it to deleteByQuery(query, 10000) where the
> query is like this - (id:1 OR id:2 OR id:3 OR id:4). Will this have a
> performance impact?
>
> We use SOLR cloud with 3 SOLR nodes in the cluster and also we have a
> similar setup on the target site and we use Cross Data Center Replication
> to replicate from main site.
>
> Can you please let me know if using deleteByQuery will have any impact? I
> see it opens real time searcher on all the nodes in cluster.
>
> --
> Thanks & Regards,
> Bharath MV Kumar
>
> "Life is short, enjoy every moment of it"
>
