bq: The slowness is happening for file_collection. Though it has 3 shards,
documents are present in only 2 of them: shard1 has 150M docs, shard2 has
330M docs, and shard3 is empty.

Well, this collection is terribly balanced. Putting 330M docs on a single
shard is pushing the limits; the only time I've seen that many docs on a
shard, particularly with 25G of RAM, the records were very small. My guess is
that you will find the queries you send to that shard substantially slower
than those sent to the 150M shard, although 150M could also be pushing your
limits. You can measure this by sending the query directly to a specific
core, something like:

solr/files_shard1_replica1/query?q=(your query here)&distrib=false

My bet is that the QTime will be significantly different between the two shards.
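
For example (just a sketch; the host, port, and exact replica core names below
are assumptions, so check the Cores screen in the admin UI for the real ones):

curl "http://localhost:8983/solr/files_shard1_replica1/query?q=(your query here)&distrib=false"
curl "http://localhost:8983/solr/files_shard2_replica1/query?q=(your query here)&distrib=false"

distrib=false keeps each request on that single core rather than fanning out
across the collection, so the QTime values in the two responses are directly
comparable.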

It also sounds like you're using implicit routing, where you control which
shard each file goes to; it's easy to end up with unbalanced shards that way.
Why did you decide to do it this way? There are valid reasons, but...
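
If you don't need explicit control over shard placement, the default
compositeId router hashes the uniqueKey and spreads documents roughly evenly.
A sketch of what that looks like at collection-creation time (the collection
and configset names here are just placeholders):

http://localhost:8983/solr/admin/collections?action=CREATE&name=new_file_collection&numShards=3&replicationFactor=2&router.name=compositeId&collection.configName=(your configset)

With compositeId routing you are very unlikely to end up with one shard
holding 330M docs while another sits empty.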

In short, my guess is that you've simply overloaded your shard with 330M
docs. It's not at all clear that even 150M will give you satisfactory
performance. Have you stress tested your servers? Here's the long form of
sizing:

https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
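
If you haven't stress tested yet, even a rough first pass helps: replay a
representative query in a loop against a single core and watch QTime and
system load. A sketch (URL and query are placeholders, as above):

for i in $(seq 1 100); do
  curl -s "http://localhost:8983/solr/files_shard2_replica1/query?q=(your query here)&distrib=false&wt=json" \
    | grep -o '"QTime":[0-9]*'
done

If QTime climbs or the machine starts swapping well before you reach your
target query rate, the hardware is undersized for that document count.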

Best,
Erick

On Mon, Mar 14, 2016 at 7:05 AM, Susheel Kumar <susheel2...@gmail.com> wrote:
> For each of the solr machines/shards you have.  Thanks.
>
> On Mon, Mar 14, 2016 at 10:04 AM, Susheel Kumar <susheel2...@gmail.com>
> wrote:
>
>> Hello Anil,
>>
>> Can you go to the Solr Admin Panel -> Dashboard and share all 4 memory
>> parameters under System, or share a snapshot of it?
>>
>> Thanks,
>> Susheel
>>
>> On Mon, Mar 14, 2016 at 5:36 AM, Anil <anilk...@gmail.com> wrote:
>>
>>> HI Toke and Jack,
>>>
>>> Please find the details below.
>>>
>>> * How large are your 3 shards in bytes? (total index across replicas)
>>>   -- *146G. I am using CDH (Cloudera); not sure how to check the index
>>>   size of each collection on each shard.*
>>> * What storage system do you use (local SSD, local spinning drives, remote
>>> storage...)? *Local (HDFS) spinning drives.*
>>> * How much physical memory does your system have? *We have 15 data nodes,
>>> with multiple services installed on each data node (252 GB RAM per data
>>> node). 25 GB RAM is allocated for the Solr service.*
>>> * How much memory is free for disk cache? *I could not find this.*
>>> * How many concurrent queries do you issue? *Very few; I don't see any
>>> concurrent queries to this file_collection for now.*
>>> * Do you update while you search? *Yes, but very little.*
>>> * What does a full query (rows, faceting, grouping, highlighting,
>>> everything) look like? *For the file_collection: rows = 100, highlights =
>>> false, no facets, expand = false.*
>>> * How many documents does a typical query match (hitcount)? *It varies
>>> with each file. I have a sort on an int field to order commands in the
>>> query.*
>>>
>>> We have two sets of collections on the Solr cluster (17 data nodes):
>>>
>>> 1. main_collection - a collection created per year. Each collection uses 8
>>> shards and 2 replicas, e.g. main_collection_2016, main_collection_2015, etc.
>>>
>>> 2. file_collection (where files containing commands are indexed) - a
>>> collection created per 2 years. It uses 3 shards and 2 replicas, e.g.
>>> file_collection_2014, file_collection_2016.
>>>
>>> The slowness is happening for file_collection. Though it has 3 shards,
>>> documents are present in only 2 of them: shard1 has 150M docs, shard2 has
>>> 330M docs, and shard3 is empty.
>>>
>>> main_collection looks good.
>>>
>>> Please let me know if you need any additional details.
>>>
>>> Regards,
>>> Anil
>>>
>>>
>>> On 13 March 2016 at 21:48, Anil <anilk...@gmail.com> wrote:
>>>
>>> > Thanks Toke and Jack.
>>> >
>>> > Jack,
>>> >
>>> > Yes. it is 480 million :)
>>> >
>>> > I will share the additional details soon. thanks.
>>> >
>>> >
>>> > Regards,
>>> > Anil
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On 13 March 2016 at 21:06, Jack Krupansky <jack.krupan...@gmail.com>
>>> > wrote:
>>> >
>>> >> (We should have a wiki/doc page for the "usual list of suspects" when
>>> >> queries are/appear slow, rather than needing to repeat the same
>>> >> mantra(s) for every inquiry on this topic.)
>>> >>
>>> >>
>>> >> -- Jack Krupansky
>>> >>
>>> >> On Sun, Mar 13, 2016 at 11:29 AM, Toke Eskildsen <t...@statsbiblioteket.dk>
>>> >> wrote:
>>> >>
>>> >> > Anil <anilk...@gmail.com> wrote:
>>> >> > > I have indexed data (commands from files) with 10 fields, and 3 of
>>> >> > > them are text fields. The collection is created with 3 shards and 2
>>> >> > > replicas. I have used document routing as well.
>>> >> >
>>> >> > > Currently collection holds 47,80,01,405 records.
>>> >> >
>>> >> > ...480 million, right? Funny digit grouping in India.
>>> >> >
>>> >> > > A text search against a text field is taking around 5 sec. The Solr
>>> >> > > query is just an AND of two terms, with fl of 7 fields:
>>> >> >
>>> >> > > fileId:"file unique id" AND command_text:(system login)
>>> >> >
>>> >> > While not an impressive response time, it might just be that your
>>> >> > hardware is not enough to handle that amount of documents. The usual
>>> >> > culprit is IO speed, so chances are you have a system with spinning
>>> >> > drives and not enough RAM: switch to SSD and/or add more RAM.
>>> >> >
>>> >> > To give better advice, we need more information.
>>> >> >
>>> >> > * How large are your 3 shards in bytes?
>>> >> > * What storage system do you use (local SSD, local spinning drives,
>>> >> > remote storage...)?
>>> >> > * How much physical memory does your system have?
>>> >> > * How much memory is free for disk cache?
>>> >> > * How many concurrent queries do you issue?
>>> >> > * Do you update while you search?
>>> >> > * What does a full query (rows, faceting, grouping, highlighting,
>>> >> > everything) look like?
>>> >> > * How many documents does a typical query match (hitcount)?
>>> >> >
>>> >> > - Toke Eskildsen
>>> >> >
>>> >>
>>> >
>>> >
>>>
>>
>>
