Deepak,

A better test of multi-user support might be to vary the queries and try to simulate a realistic 'working set' of search data.
I've made this same performance analysis mistake with the search index of www.indexengines.com, which I developed (in part). It is somewhat different from Lucene inside, though. What we cared a lot about were these things:

- If a query was done warm, e.g. with results cached in memory, response time should be very fast.
- If a query was done cold, e.g. with results read from disk, response time should still be acceptable.
- If a lot of different queries were done, ones we think simulate the real behavior of N users, the memory usage of the cache should be acceptable, e.g. the cache should get warm and there should be few cache misses.

This last test was key - if we had designed our caching properly, then the queries of X users would fit in Y memory, and we would be able to develop a simple understanding of that with our target users.

Generating that realistic amount of query behavior for X users is hard. Using real search logs from your previous search product is a good idea. For instance, if you look at the top 1000 queries performed by your users over a particular period of time, you can say that some percentage of user queries were covered by those top 1000 queries, e.g. 90%. Then, maybe you measure your queries per second (QPS) over that same period. Now, if you randomly sample those top 1000 queries while generating the same QPS with exponentially distributed inter-arrival times, you can say that you have covered 90% of your real traffic. Your queries are much more randomly distributed than real traffic, but that's OK, because what you want to know is whether it all fits in cache memory, and the effect of # of CPUs, amount of memory, number of cluster nodes, sharding, and replication on the response time and such.

Depending on your user community, the top 1000 queries may not be enough to hit 90%; they may only hit 70%. Maybe you also need to look at the rate of "advanced search" versus "search", or account for queries that drive business intelligence reports. It really depends on your use case.
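To make that concrete, here is a minimal sketch of that kind of load generator: it samples a file of top queries and fires them at a target QPS with exponentially distributed inter-arrival times. The file name, the QPS value, and the hand-off at the end are placeholders, not anything from your benchmark - wire it up to whatever query client you already use.

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Random;

// Minimal sketch of a query-load generator: sample a top-N query log at a
// target QPS, with exponentially distributed inter-arrival times.
public class QueryLoadSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder file: one query string per line, e.g. your top 1000 queries.
        List<String> topQueries = Files.readAllLines(Paths.get("top-1000-queries.txt"));
        double targetQps = 25.0; // placeholder: the QPS measured from your real logs
        Random rnd = new Random();

        while (true) {
            // Exponential inter-arrival time with mean 1/QPS seconds.
            double waitSeconds = -Math.log(1.0 - rnd.nextDouble()) / targetQps;
            Thread.sleep((long) (waitSeconds * 1000));

            // Sample uniformly from the top-N set.
            String q = topQueries.get(rnd.nextInt(topQueries.size()));

            // Hand the query off to the benchmark client and record its response time.
            System.out.println("would run query: " + q);
        }
    }
}

Running something like this against the test cluster while you vary CPUs, memory, node count, sharding, and replication is then mostly a matter of collecting the response times it records.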
I wish I'd had the cloud available to test performance with - we were really naïve and did all this testing on our own bare metal because, well, we thought our stuff relied on that.

I recommend you read the first couple of chapters of Raj Jain's The Art of Computer Systems Performance Analysis. It's a great book even if you totally skip the later chapters on queueing system analysis, and just think about what and how to test.

Hope this helps,

-Dan

-----Original Message-----
From: Deepak Goel [mailto:deic...@gmail.com]
Sent: Friday, March 16, 2018 4:22 PM
To: solr-user@lucene.apache.org
Subject: Re: Some performance questions....

On Sat, Mar 17, 2018 at 1:06 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 3/16/2018 7:38 AM, Deepak Goel wrote:
> > I did a performance study of Solr a while back. And I found that it
> > does not scale beyond a particular point on a single machine (could
> > be due to the way its coded). Hence multiple instances might make sense.
> >
> > https://docs.google.com/document/d/1kUqEcZl3NhOo6SLklo5Icg3fMnn9OtLY_lwnc6wbXus/edit?usp=sharing
>
> How did you *use* that code that you've shown? That is not apparent
> (at least to me) from the document.
>
> If every usage of the SolrJ code went through ALL of the code you've
> shown, then it's not done well. It appears that you're creating and
> closing a client object with every query. This will be VERY inefficient.
>
> The client object should be created during an initialization step, and
> then passed to the benchmark step to be used there. One client object
> can be used by many threads.

I wanted to test the maximum number of connections Solr can handle
concurrently. Also, I would have to implement connection pooling of the
client-object connections rather than a single connection thread.

However, a single client object with thousands of queries coming in would
surely become a bottleneck. I can test this scenario too.

> Very likely the ES client works the same,
> but you'd need to ask them to be sure.
>
> That code seems to be doing an identical query on every run. If
> that's what's happening, it's not a good indicator of performance.
> Running the same query over and over will show better performance than
> you can expect from a real-world query load.
>
> What evidence do you see that Solr isn't scaling like you expect?

The problem is that the max throughput I can get on the machine is around
28 tps, even though I increase the load further and only 65% of the CPU is
utilised (there is still 35% which is not being used). This clearly
indicates the software is the problem, as there are enough hardware
resources.

Also, very soon I will have a Linux environment with me, so I can conduct
the test in the document on Linux too (for the users interested in Linux
and not Windows).

> Thanks,
> Shawn
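As a rough illustration of the shared-client approach Shawn describes above - one SolrJ client built during initialization and reused by many query threads, instead of a client created and closed per query - a minimal sketch might look like the following. The Solr URL, collection name, query string, thread count, and request count are placeholders, not values from Deepak's benchmark.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

// Sketch: one SolrJ client, created once and shared by a pool of query threads.
public class SharedClientSketch {
    public static void main(String[] args) throws Exception {
        SolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycollection").build();
        ExecutorService pool = Executors.newFixedThreadPool(16);

        for (int i = 0; i < 1000; i++) {
            pool.submit(() -> {
                try {
                    long start = System.nanoTime();
                    QueryResponse rsp = client.query(new SolrQuery("*:*"));
                    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                    System.out.println(rsp.getResults().getNumFound()
                            + " docs in " + elapsedMs + " ms");
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
        client.close();
    }
}

HttpSolrClient is thread-safe, so sharing one instance like this avoids per-query connection setup and teardown; whether a single client ever becomes a bottleneck at very high thread counts is exactly the kind of thing such a benchmark can measure.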