Ian:
Thanks much for the writeup! It's always good to have real-world documentation!
Best,
Erick
On Fri, Nov 7, 2014 at 8:26 AM, Shawn Heisey wrote:
> On 11/7/2014 7:17 AM, Ian Rose wrote:
>> *tl;dr: *Routing updates to a random Solr node (and then letting it forward
>> the docs to where they need to go) is very expensive, more than I
>> expected. ...
On 11/7/2014 7:17 AM, Ian Rose wrote:
> *tl;dr: *Routing updates to a random Solr node (and then letting it forward
> the docs to where they need to go) is very expensive, more than I
> expected. Using a "smart" router that uses the cluster config to route
> documents directly to their shard resulted in ...
Hi again, all -
Since several people were kind enough to jump in to offer advice on this
thread, I wanted to follow up in case anyone finds this useful in the
future.
*tl;dr:* Routing updates to a random Solr node (and then letting it forward
the docs to where they need to go) is very expensive, more than I expected.
Using a "smart" router that uses the cluster config to route documents
directly to their shard resulted in ...
bq: but it should be more or less a constant factor no matter how many
Solr nodes you are using, right?
Not really. You've stated that you're not driving Solr very hard in
your tests. Therefore you're waiting on I/O. Therefore your tests just
aren't going to scale linearly with the number of shards.
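Shawn's point about I/O wait can be sketched with a toy throughput model. This is my own illustration, not from the thread, and the constants are invented; it only shows the shape of the effect: a fixed per-request I/O wait caps how much adding shards can help synchronous clients.

```python
# Rough throughput model: with a fixed pool of synchronous client threads,
# each update spends t_io seconds waiting on I/O (disk/network) no matter
# how many shards exist, plus t_cpu/num_shards of parallelizable work.
# Illustrative only -- the constants are made up, not measured.

def throughput(num_clients, num_shards, t_io=0.010, t_cpu=0.004):
    """Docs/sec for synchronous clients (Little's law: N = X * R)."""
    latency = t_io + t_cpu / num_shards  # seconds per request
    return num_clients / latency

# Doubling shards barely helps once t_io dominates:
one = throughput(num_clients=40, num_shards=1)
two = throughput(num_clients=40, num_shards=2)
print(round(two / one, 2))  # -> 1.17, well under the 2.0x "linear" hope
```

The takeaway matches Shawn's observation: when the test client is waiting on I/O, shard count is not the limiting term in the latency, so throughput cannot scale with it.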
On 11/1/2014 9:52 AM, Ian Rose wrote:
> Just to make sure I am thinking about this right: batching will certainly
> make a big difference in performance, but it should be more or less a
> constant factor no matter how many Solr nodes you are using, right? Right
> now in my load tests, I'm not actually that concerned about the absolute ...
Erick,
Just to make sure I am thinking about this right: batching will certainly
make a big difference in performance, but it should be more or less a
constant factor no matter how many Solr nodes you are using, right? Right
now in my load tests, I'm not actually that concerned about the absolute ...
Yes, I was inadvertently sending them to a replica. When I sent them to the
leader, the leader reported (1000 adds) and the replica reported only 1 add
per document. So, it looks like the leader forwards the batched jobs
individually to the replicas.
On Fri, Oct 31, 2014 at 3:26 PM, Erick Erickson wrote:
Internally, the docs are batched up into smaller buckets (10 as I
remember) and forwarded to the correct shard leader. I suspect that's
what you're seeing.
Erick
On Fri, Oct 31, 2014 at 12:20 PM, Peter Keegan wrote:
> Regarding batch indexing:
> When I send batches of 1000 docs to a standalone Solr server, the log file ...
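Erick's description above can be simulated in a few lines. This is a sketch, assuming the sub-batch size of 10 he recalls; the stdlib hash is only a stand-in for Solr's real document router. It shows why one 1000-doc batch surfaces as many small "(N adds)" log lines on the leaders:

```python
# Simulate the forwarding step: the receiving node partitions the incoming
# batch by target shard and forwards each shard's docs in small sub-batches
# (~10 docs each, per Erick's recollection). Each forwarded sub-batch is
# what the leader's LogUpdateProcessor logs as one "(N adds)" line.
from collections import defaultdict

def forward(doc_ids, num_shards, bucket=10):
    by_shard = defaultdict(list)
    for d in doc_ids:
        by_shard[hash(str(d)) % num_shards].append(d)  # stand-in router
    # chunk each shard's docs into sub-batches, as the forwarder would
    return [docs[i:i + bucket]
            for docs in by_shard.values()
            for i in range(0, len(docs), bucket)]

batches = forward(range(1000), num_shards=3)
print(max(len(b) for b in batches))  # -> 10: no sub-batch exceeds the bucket
```

So a client-side batch of 1000 reaches each leader as dozens of small adds, which is consistent with Peter seeing "(12 adds)"-sized log entries rather than "(1000 adds)".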
Regarding batch indexing:
When I send batches of 1000 docs to a standalone Solr server, the log file
reports "(1000 adds)" in LogUpdateProcessor. But when I send them to the
leader of a replicated index, the leader log file reports much smaller
numbers, usually "(12 adds)". Why do the batches appear ...
NP, just making sure.
I suspect you'll get lots more bang for the buck, and
results much more closely matching your expectations if
1> you batch up a bunch of docs at once rather than
sending them one at a time. That's probably the easiest
thing to try. Sending docs one at a time is something of ...
Hi Erick -
Thanks for the detailed response and apologies for my confusing
terminology. I should have said "WPS" (writes per second) instead of QPS
but I didn't want to introduce a weird new acronym since QPS is well
known. Clearly a bad decision on my part. To clarify: I am doing
*only* writes.
I'm really confused:
bq: I am not issuing any queries, only writes (document inserts)
bq: It's clear that once the load test client has ~40 simulated users
bq: A cluster of 3 shards over 3 Solr nodes *should* support
a higher QPS than 2 shards over 2 Solr nodes, right
QPS is usually used to mean ...
Thanks for the suggestions so far, all.
1) We are not using SolrJ on the client (not using Java at all) but I am
working on writing a "smart" router so that we can always send to the
correct node. I am certainly curious to see how that changes things.
Nonetheless, even with the overhead of extra routing ...
Your indexing client, if written in SolrJ, should use CloudSolrServer
which is, in Matt's terms "leader aware". It divides up the
documents to be indexed into packets where each doc in
the packet belongs on the same shard, and then sends the packet
to the shard leader. This avoids a lot of re-routing ...
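A "leader aware" router of the kind Erick describes (what CloudSolrServer does) can be sketched as a hash-range lookup. Illustrative only: Solr's compositeId router actually hashes the document id with MurmurHash3 and maps it into per-shard hash ranges; crc32 below is just a deterministic stdlib stand-in for the hash.

```python
# Leader-aware routing sketch: split the 32-bit hash space into contiguous
# ranges, one per shard, and send each doc straight to that shard's leader.
# crc32 stands in for Solr's real MurmurHash3-based compositeId hashing.
import zlib

def shard_ranges(num_shards, bits=32):
    step = (1 << bits) // num_shards
    return [(i * step, (i + 1) * step - 1) for i in range(num_shards)]

def route(doc_id, ranges):
    h = zlib.crc32(doc_id.encode("utf-8"))  # stand-in for murmur3
    for shard, (lo, hi) in enumerate(ranges):
        if lo <= h <= hi:
            return shard
    return len(ranges) - 1  # hash at the very top of the range

ranges = shard_ranges(3)
print(route("doc-42", ranges) in (0, 1, 2))  # -> True
```

With this lookup done client-side, every doc goes directly to the node that would own it anyway, which is what removes the forwarding hop Matt measured a 3-5x difference from.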
On 10/30/2014 2:56 PM, Ian Rose wrote:
> I think this is true only for actual queries, right? I am not issuing
> any queries, only writes (document inserts). In the case of writes,
> increasing the number of shards should increase my throughput (in
> ops/sec) more or less linearly, right?
No, that ...
If you are issuing writes to shard non-leaders, then there is a large overhead
for the eventual redirect to the leader. I noticed a 3-5 times performance
increase by making my write client leader aware.
On Oct 30, 2014, at 2:56 PM, Ian Rose wrote:
>>
>> If you want to increase QPS, you should not be increasing numShards. ...
>
> If you want to increase QPS, you should not be increasing numShards.
> You need to increase replicationFactor. When your numShards matches the
> number of servers, every single server will be doing part of the work
> for every query.
I think this is true only for actual queries, right? I am not issuing ...
On 10/30/2014 2:23 PM, Ian Rose wrote:
> My methodology is as follows.
> 1. Start up K Solr servers.
> 2. Remove all existing collections.
> 3. Create N collections, with numShards=K for each.
> 4. Start load testing. Every minute, print the number of successful
> updates and the number of failed updates. ...
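The per-interval reporting loop in step 4 might be sketched like this. Everything here is a stand-in: send_update is a hypothetical placeholder for the real client call, and the failure rate is simulated, not measured.

```python
# Load-test reporting loop: fire updates as fast as possible and record
# (successes, failures) once per interval. send_update is a placeholder
# for the real HTTP/SolrJ call; ~1% failures are simulated.
import random
import time

def send_update(doc):
    return random.random() > 0.01  # simulated ~1% failure rate

def load_test(duration_s=1.0, report_every_s=0.25):
    ok = fail = 0
    reports = []
    start = time.monotonic()
    next_report = start + report_every_s
    doc = 0
    while time.monotonic() - start < duration_s:
        if send_update({"id": str(doc)}):
            ok += 1
        else:
            fail += 1
        doc += 1
        if time.monotonic() >= next_report:
            reports.append((ok, fail))  # one report line per interval
            ok = fail = 0
            next_report += report_every_s
    return reports

reports = load_test()
print(len(reports) >= 1)  # -> True
```

Comparing these per-interval counts across runs with different K (servers) and numShards is exactly the kind of data the rest of the thread is interpreting.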
Howdy all -
The short version is: We are not seeing Solr Cloud performance scale (even
close to) linearly as we add nodes. Can anyone suggest good diagnostics for
finding scaling bottlenecks? Are there known 'gotchas' that make Solr Cloud
fail to scale?
In detail:
We have used Solr (in non-Cloud ...