Yeah, basically ConcurrentUpdateSolrClient is a shortcut to getting
multi-threaded bulk API updates out of the single-threaded, single-update API.
The downsides to this are: It is not cloud aware - you have to point it at
a server, you have to add special code to see if there are any errors, you
do
bq. But don't forget a final client.add(list) after the while-loop ;-)
Ha! But only "if (list.size() > 0)"
And then there was the memorable time I forgot the "list.clear()" when
I sent the batch and wondered why my indexing progress got slower and
slower...
Not to mention the time I re-used the
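
The three pitfalls mentioned above (the final add after the loop, the empty-list check, and the forgotten list.clear()) can be folded into one small helper. What follows is only a minimal sketch in plain Java, not SolrJ code: BatchingIndexer is a hypothetical name, and the Consumer sink is an assumed stand-in for whatever actually sends the batch, such as a call to client.add(list).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: buffers items and hands them to a sink in batches. */
final class BatchingIndexer<T> {
    private final int batchSize;
    // Assumed stand-in for the real send call, e.g. list -> client.add(list).
    private final Consumer<List<T>> sink;
    private final List<T> buffer;

    BatchingIndexer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.buffer = new ArrayList<>(batchSize);
    }

    /** Buffers one item and sends the batch once it is full. */
    void add(T doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered; does nothing when the buffer is empty. */
    void flush() {
        if (buffer.isEmpty()) {
            return; // the "only if list.size() > 0" check
        }
        // Pass a copy so the sink may keep the list after we clear ours.
        sink.accept(new ArrayList<>(buffer));
        buffer.clear(); // without this, every batch would resend old docs
    }
}
```

Call flush() once after the indexing loop ends so the last partial batch is not lost.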
On 5/15/2018 12:12 AM, Bernd Fehling wrote:
OK, I have the CloudSolrClient with SolrJ now running but it seems
a bit slower compared to ConcurrentUpdateSolrClient.
This was not expected.
The logs show that CloudSolrClient sends the docs only to the leaders.
So the only advantage of CloudSolrClien
On 15.05.2018 at 14:33, Erick Erickson wrote:
You might find this useful:
https://lucidworks.com/2015/10/05/really-batch-updates-solr-2/
I have seen that already and can confirm it.
From my observations of a 3x3 cluster with 3 servers and my hardware:
- have at least 6 CPUs on each server
You might find this useful:
https://lucidworks.com/2015/10/05/really-batch-updates-solr-2/
One tricky bit: Assuming docs have a random distribution amongst
shards, you should batch so at least 100 docs go to each _shard_. You
can see from the link that the speedup is mostly going from 1 to 100.
S
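
As a rough illustration of the per-shard rule of thumb above: if documents really are spread evenly across shards (an assumption; actual routing only approximates this), the client-side batch has to be the per-shard target multiplied by the number of shards. The class and method names below are hypothetical.

```java
/** Hypothetical helper for sizing client-side batches in a sharded collection. */
final class ShardBatchSizing {
    private ShardBatchSizing() {}

    /**
     * Smallest client-side batch that gives each shard about
     * minDocsPerShard documents, assuming an even spread across shards.
     */
    static int minBatchSize(int numShards, int minDocsPerShard) {
        if (numShards < 1 || minDocsPerShard < 1) {
            throw new IllegalArgumentException("arguments must be >= 1");
        }
        return Math.multiplyExact(numShards, minDocsPerShard);
    }
}
```

For example, with 3 shards and a target of 100 docs per shard, the client would batch at least 300 docs per add call.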
Hi Erick,
yes indeed, batching solved it.
I used ConcurrentUpdateSolrClient with queue size of 1 but
CloudSolrClient doesn't have this feature.
I have built my own queue now.
Ah!!! So I obviously use default NRT but actually don't need it because
I don't have any NRT data to index. A latency of se
What did you do to solve your performance problem?
Batching updates is one thing that helps performance.
bq. I thought that only the leaders are under load
until any commit and then replicate to the other replicas.
True if (and only if) you're using PULL or TLOG replicas.
When using the default
Thanks, solved, performance is good now.
Regards,
Bernd
On 15.05.2018 at 08:12, Bernd Fehling wrote:
OK, I have the CloudSolrClient with SolrJ now running but it seems
a bit slower compared to ConcurrentUpdateSolrClient.
This was not expected.
The logs show that CloudSolrClient sends the docs o
OK, I have the CloudSolrClient with SolrJ now running but it seems
a bit slower compared to ConcurrentUpdateSolrClient.
This was not expected.
The logs show that CloudSolrClient sends the docs only to the leaders.
So the only advantage of CloudSolrClient is that it is "Cloud aware"?
With Concurre
It's been a while since I've been in this deeply, but it should be
something like:
sendUpdateOnlyToShardLeaders will select the leaders for each shard as the
load balanced targets for update. The updates may not go to the *right*
leader, but only the leaders will be chosen, followers (non-leader
r
You may not need to deal with any of this.
The default CloudSolrClient call creates a new LBHttpSolrClient for
you. So unless you're doing something custom with any LBHttpSolrClient
you create, you don't need to create one yourself.
Second, the default for CloudSolrClient.add() is to take the lis
Hi list,
while going from single core master/slave to cloud multi core/node
with leader/replica I want to change my SolrJ loading, because
ConcurrentUpdateSolrClient isn't cloud aware and has performance
impacts.
I want to use CloudSolrClient with LBHttpSolrClient and updates
should only go to sha