12 shards with a 28GB heap and a 90GB index each means that you need at least 336GB for the heaps (assuming you're using all of that, which may easily be the case considering the way the GC handles memory) and roughly 1TB for the index. Even granting that you don't need your entire index in RAM, the problem as I see it is that you don't have enough RAM for your index + heap. Assuming your machine has 370GB of RAM, there are only 34GB left for your index, and 1TB/34GB means that you can only have about 1/30 of your entire index in RAM. I would advise you to check the swap activity on the machine and see if it correlates with the bad performance you're seeing. One important thing to note is that a significant part of your index needs to be in RAM (especially if you're not using SSDs) in order to achieve good performance.
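To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python, using only the numbers from this thread (treat it as an estimate, not a measurement):

    # Rough RAM budget for the setup described in this thread.
    SHARDS = 12
    HEAP_PER_SHARD_GB = 28     # heap (-Xmx) of each Solr node
    INDEX_PER_SHARD_GB = 90    # on-disk index size of each shard
    TOTAL_RAM_GB = 370         # physical RAM of the single machine

    heap_total = SHARDS * HEAP_PER_SHARD_GB    # 336 GB
    index_total = SHARDS * INDEX_PER_SHARD_GB  # 1080 GB, ~1 TB
    page_cache = TOTAL_RAM_GB - heap_total     # 34 GB left for the OS page cache

    print(f"heap: {heap_total} GB, index: {index_total} GB, page cache: {page_cache} GB")
    # Roughly 3%, i.e. about 1/30 of the index can be resident at any time.
    print(f"cacheable fraction of the index: {page_cache / index_total:.1%}")

To check the swap activity on Linux, run vmstat 1 while the query executes and watch the si/so columns; sustained non-zero values there would correlate swapping with the slow queries.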
*As mentioned above this is a big machine with 370+ GB of RAM and Solr (12 nodes total) is assigned 336 GB. The rest is still good for other system activities.*

The RAM that remains after you subtract the heap usage should be reserved for the index (not only for other system activities).

*Also the CPU utilization goes up to 400% in a few of the nodes:*

You said that only one machine is used, so I assume that the 400% CPU is for a single process (one Solr node), right? This seems impossible if you are sure that only one query runs at a time and no indexing is performed. The best thing to do is to dump stack traces of the Solr nodes during the query and check what the threads are doing (a minimal sketch follows at the end of the quoted thread below).

Jim

2015-11-02 10:38 GMT+01:00 Modassar Ather <modather1...@gmail.com>:

> Just to add one more point that one external Zookeeper instance is also
> running on this particular machine.
>
> Regards,
> Modassar
>
> On Mon, Nov 2, 2015 at 2:34 PM, Modassar Ather <modather1...@gmail.com>
> wrote:
>
> > Hi Toke,
> > Thanks for your response. My comments in-line.
> >
> > That is 12 machines, running a shard each?
> > No! This is a single big machine with 12 shards on it.
> >
> > What is the total amount of physical memory on each machine?
> > Around 370 GB on the single machine.
> >
> > Well, se* probably expands to a great deal of documents, but a huge bump
> > in memory utilization and 3 minutes+ sounds strange.
> >
> > - What are your normal query times?
> > A few simple queries return within a couple of seconds. But the more
> > complex queries with proximity and wildcards have taken more than 3-4
> > minutes, and sometimes queries have timed out where the timeout is set
> > to 5 minutes.
> > - How many hits do you get from 'network se*'?
> > More than a million records.
> > - How many results do you return (the rows-parameter)?
> > It is the default, 10. Grouping is enabled on a field.
> > - If you issue a query without wildcards, but with approximately the
> > same amount of hits as 'network se*', how long does it take?
> > A query resulting in around half a million records returns within a
> > couple of seconds.
> >
> > That is strange, yes. Have you checked the logs to see if something
> > unexpected is going on while you test?
> > Have not seen anything in particular. Will try to check again.
> >
> > If you are using spinning drives and only have 32GB of RAM in total in
> > each machine, you are probably struggling just to keep things running.
> > As mentioned above this is a big machine with 370+ GB of RAM and Solr
> > (12 nodes total) is assigned 336 GB. The rest is still good for other
> > system activities.
> >
> > Thanks,
> > Modassar
> >
> > On Mon, Nov 2, 2015 at 1:30 PM, Toke Eskildsen <t...@statsbiblioteket.dk>
> > wrote:
> >
> >> On Mon, 2015-11-02 at 12:00 +0530, Modassar Ather wrote:
> >> > I have a setup of 12 shard cluster started with 28gb memory each on a
> >> > single server. There are no replica. The size of index is around 90gb
> >> > on each shard. The Solr version is 5.2.1.
> >>
> >> That is 12 machines, running a shard each?
> >>
> >> What is the total amount of physical memory on each machine?
> >>
> >> > When I query "network se*", the memory utilization goes upto 24-26 gb
> >> > and the query takes around 3+ minutes to execute. Also the CPU
> >> > utilization goes upto 400% in few of the nodes.
> >>
> >> Well, se* probably expands to a great deal of documents, but a huge bump
> >> in memory utilization and 3 minutes+ sounds strange.
> >>
> >> - What are your normal query times?
> >> - How many hits do you get from 'network se*'?
> >> - How many results do you return (the rows-parameter)?
> >> - If you issue a query without wildcards, but with approximately the
> >> same amount of hits as 'network se*', how long does it take?
> >>
> >> > Why the CPU utilization is so high and more than one core is used.
> >> > As far as I understand querying is single threaded.
> >>
> >> That is strange, yes. Have you checked the logs to see if something
> >> unexpected is going on while you test?
> >>
> >> > How can I disable replication (as it is implicitly enabled)
> >> > permanently as in our case we are not using it but can see warnings
> >> > related to leader election?
> >>
> >> If you are using spinning drives and only have 32GB of RAM in total in
> >> each machine, you are probably struggling just to keep things running.
> >>
> >>
> >> - Toke Eskildsen, State and University Library, Denmark
> >>
> >>
> >
>
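For the stack dumps mentioned above, here is a minimal sketch in Python. It assumes jstack (shipped with the JDK) is on the PATH; the PIDs are placeholders, so find the real Solr process ids with jps -l first:

    # Minimal sketch: snapshot thread dumps of each Solr node while the
    # slow query runs. jstack ships with the JDK; the PIDs below are
    # hypothetical placeholders (find the real ones with `jps -l`).
    import subprocess
    import time

    solr_pids = [12001, 12002]  # hypothetical PIDs, one per Solr node

    for snapshot in range(3):   # a few snapshots, a few seconds apart
        for pid in solr_pids:
            out = subprocess.run(["jstack", "-l", str(pid)],
                                 capture_output=True, text=True).stdout
            with open(f"jstack-{pid}-{snapshot}.txt", "w") as fh:
                fh.write(out)
        time.sleep(5)

Comparing a few consecutive snapshots shows whether the query threads are stuck in the same frames (e.g. wildcard expansion) or whether other threads are doing the work.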