Deepak,

A better test of multi-user support might be to vary the queries and try to simulate a realistic 'working set' of search data.
I've made this same performance analysis mistake with the search index of www.indexengines.com, which I developed (in part). It is somewhat different from Lucene inside, though. What we cared a lot about were these things:

- If a query was done warm, e.g. with results cached in memory, response time should be very fast.
- If a query was done cold, e.g. with results read from disk, response time should still be acceptable.
- If a lot of different queries were done, ones we think simulate the real behavior of N users, the memory usage of the cache should be acceptable, e.g. the cache should get warm and there should be few cache misses.

This last test was key - if we had designed our caching properly, then the queries of X users would fit in Y memory, and we would be able to develop a simple understanding of that with our target users.

Generating that realistic amount of query behavior for X users is hard. Using real search logs from your previous search product is a good idea. For instance, if you look at the top 1000 queries performed by your users over a particular period of time, you can say that some percentage of user queries were covered by those top 1000 queries, e.g. 90%. Then, maybe you measure your queries per second (QPS) over that same period. Now, if you randomly sample those top 1000 queries while generating the same QPS with exponentially distributed inter-arrival times, you can say that you have covered 90% of your real traffic. Your queries are much more randomly distributed than real traffic, but that's OK, because what you want to know is whether it all fits in cache memory, and the effect of # of CPUs, amount of memory, number of cluster nodes, sharding, and replication on the response time and such.

Depending on your user community, the top 1000 queries may not be enough to hit 90%; they may only hit 70%. Maybe you also need to look at the rate of "advanced search" versus "search", or account for queries that drive business intelligence reports. It really depends on your use case.
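To make that concrete, here is a minimal sketch of that kind of load generator: it samples a file of top queries and fires them at a target QPS with exponentially distributed inter-arrival times. The file name, the QPS value, and the hand-off at the end are placeholders, not anything from your benchmark - wire it up to whatever query client you already use.

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Random;

// Minimal sketch of a query-load generator: sample a top-N query log at a
// target QPS, with exponentially distributed inter-arrival times.
public class QueryLoadSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder file: one query string per line, e.g. your top 1000 queries.
        List<String> topQueries = Files.readAllLines(Paths.get("top-1000-queries.txt"));
        double targetQps = 25.0; // placeholder: the QPS measured from your real logs
        Random rnd = new Random();

        while (true) {
            // Exponential inter-arrival time with mean 1/QPS seconds.
            double waitSeconds = -Math.log(1.0 - rnd.nextDouble()) / targetQps;
            Thread.sleep((long) (waitSeconds * 1000));

            // Sample uniformly from the top-N set.
            String q = topQueries.get(rnd.nextInt(topQueries.size()));

            // Hand the query off to the benchmark client and record its response time.
            System.out.println("would run query: " + q);
        }
    }
}

Running something like this against the test cluster while you vary CPUs, memory, node count, sharding, and replication is then mostly a matter of collecting the response times it records.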
I wish I'd had the cloud available to test performance with - we were really naïve and did all this testing on our own bare metal because, well, we thought our stuff relied on that.

I recommend you read the first couple of chapters of Raj Jain's The Art of Computer Systems Performance Analysis. It's a great book even if you totally skip the later chapters on queueing system analysis, and just think about what and how to test.

Hope this helps,

-Dan

-----Original Message-----
From: Deepak Goel [mailto:deic...@gmail.com]
Sent: Friday, March 16, 2018 4:22 PM
To: solr-user@lucene.apache.org
Subject: Re: Some performance questions....

On Sat, Mar 17, 2018 at 1:06 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 3/16/2018 7:38 AM, Deepak Goel wrote:
> > I did a performance study of Solr a while back. And I found that it
> > does not scale beyond a particular point on a single machine (could
> > be due to the way its coded). Hence multiple instances might make sense.
> >
> > https://docs.google.com/document/d/1kUqEcZl3NhOo6SLklo5Icg3fMnn9OtLY_lwnc6wbXus/edit?usp=sharing
>
> How did you *use* that code that you've shown? That is not apparent
> (at least to me) from the document.
>
> If every usage of the SolrJ code went through ALL of the code you've
> shown, then it's not done well. It appears that you're creating and
> closing a client object with every query. This will be VERY inefficient.
>
> The client object should be created during an initialization step, and
> then passed to the benchmark step to be used there. One client object
> can be used by many threads.

I wanted to test the maximum number of connections Solr can handle
concurrently. Also, I would have to implement connection pooling of the
client-object connections rather than a single connection thread.

However, a single client object with thousands of queries coming in would
surely become a bottleneck. I can test this scenario too.

> Very likely the ES client works the same,
> but you'd need to ask them to be sure.
>
> That code seems to be doing an identical query on every run. If
> that's what's happening, it's not a good indicator of performance.
> Running the same query over and over will show better performance than
> you can expect from a real-world query load.
>
> What evidence do you see that Solr isn't scaling like you expect?

The problem is that the max throughput I can get on the machine is around
28 tps, even though I increase the load further and only 65% of the CPU is
utilised (there is still 35% which is not being used). This clearly
indicates the software is the problem, as there are enough hardware
resources.

Also, very soon I will have a Linux environment with me, so I can conduct
the test in the document on Linux too (for the users interested in Linux
and not Windows).

> Thanks,
> Shawn
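As a rough illustration of the shared-client approach Shawn describes above - one SolrJ client built during initialization and reused by many query threads, instead of a client created and closed per query - a minimal sketch might look like the following. The Solr URL, collection name, query string, thread count, and request count are placeholders, not values from Deepak's benchmark.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

// Sketch: one SolrJ client, created once and shared by a pool of query threads.
public class SharedClientSketch {
    public static void main(String[] args) throws Exception {
        SolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycollection").build();
        ExecutorService pool = Executors.newFixedThreadPool(16);

        for (int i = 0; i < 1000; i++) {
            pool.submit(() -> {
                try {
                    long start = System.nanoTime();
                    QueryResponse rsp = client.query(new SolrQuery("*:*"));
                    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                    System.out.println(rsp.getResults().getNumFound()
                            + " docs in " + elapsedMs + " ms");
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
        client.close();
    }
}

HttpSolrClient is thread-safe, so sharing one instance like this avoids per-query connection setup and teardown; whether a single client ever becomes a bottleneck at very high thread counts is exactly the kind of thing such a benchmark can measure.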