Usually I just let the compositeId router do its thing and only go for
custom routing when the default proves inadequate.
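
For reference, the default compositeId router hashes the uniqueKey to pick
a shard, and you can still co-locate related docs without custom routing by
prefixing the key. A sketch (FILE123 and doc456 are made-up values):

  id = FILE123!doc456

Everything sharing the FILE123! prefix hashes to the same shard, which is
usually enough for expand/collapse without hand-assigning shards.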

Note: your 480M documents may very well be too many for three shards!
You really have to test....

Erick


On Mon, Mar 14, 2016 at 10:04 AM, Anil <anilk...@gmail.com> wrote:
> Hi Erick,
> In the meantime, do you recommend an effective shard distribution method?
>
> Regards,
> Anil
>
> On 14 March 2016 at 22:30, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> Try shards.info=true, but pinging the shard directly is the most certain.
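>>
>> For example (collection/core names here are just placeholders), something
>> like:
>>
>>   solr/file_collection_2014/select?q=*:*&shards.info=true
>>
>> shows per-shard timing in the response, and
>>
>>   solr/file_collection_2014_shard1_replica1/select?q=*:*&distrib=false
>>
>> hits one core in isolation.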
>>
>>
>> Best,
>> Erick
>>
>> On Mon, Mar 14, 2016 at 9:48 AM, Anil <anilk...@gmail.com> wrote:
>> > Hi Erick,
>> >
>> > We have used document routing to balance shard load and for
>> > expand/collapse. It is mainly used for main_collection, which holds
>> > one-to-many relationship records. In file_collection it is used only
>> > for load distribution.
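>> >
>> > For context, our expand/collapse queries look roughly like this
>> > (groupId is just a placeholder for our grouping field):
>> >
>> >   fq={!collapse field=groupId}&expand=true
>> >
>> > which is why related documents need to be routed to the same shard.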
>> >
>> > 25 GB is allocated for the entire Solr service. Each machine hosts
>> > shards for some of the collections.
>> >
>> > We have not stress-tested our servers, at least not for the Solr
>> > service. I have read the link you shared and will act on it. Thanks
>> > for sharing.
>> >
>> > I have checked the other collections, where the index size is at most
>> > 90 GB with at most 5M documents. But for this particular
>> > file_collection_2014, I see a total index size across replicas of 147 GB.
>> >
>> > Can we get any hints if we run the query with debugQuery=true? What is
>> > an effective way of distributing load? Please advise.
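>> >
>> > For example, something like (the fileId value is made up):
>> >
>> >   solr/file_collection_2014/select?q=fileId:"F123" AND command_text:(system login)&debugQuery=true
>> >
>> > and then looking at the timing section of the debug output?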
>> >
>> > Regards,
>> > Anil
>> >
>> > On 14 March 2016 at 20:32, Erick Erickson <erickerick...@gmail.com> wrote:
>> >
>> >> bq: The slowness is happening for file_collection. Though it has 3
>> >> shards, documents are available in only 2: shard1 has 150M docs,
>> >> shard2 has 330M docs, and shard3 is empty.
>> >>
>> >> Well, this collection is terribly balanced. Putting 330M docs on a
>> >> single shard is pushing the limits; the only time I've seen that many
>> >> docs on a shard, particularly with 25 GB of RAM, they were very small
>> >> records. My guess is that you will find the queries you send to that
>> >> shard substantially slower than on the 150M shard, although 150M could
>> >> also be pushing your limits. You can measure this by sending the query
>> >> to the specific core, something like
>> >>
>> >> solr/files_shard1_replica1/query?q=(your query here)&distrib=false
>> >>
>> >> My bet is that your QTime will be significantly different between the
>> >> two shards.
>> >>
>> >> It also sounds like you're using implicit routing, where you control
>> >> where the files go; it's easy to end up with unbalanced shards in that
>> >> case. Why did you decide to do it this way? There are valid reasons,
>> >> but...
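>> >>
>> >> For reference, implicit routing is normally what you get from a create
>> >> call along these lines (names here are placeholders):
>> >>
>> >>   /admin/collections?action=CREATE&name=file_collection_2014
>> >>       &router.name=implicit&shards=shard1,shard2,shard3&router.field=shardName
>> >>
>> >> where each document lands on whatever shard its router.field value (or
>> >> the _route_ parameter) names, so nothing is rebalanced for you.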
>> >>
>> >> In short, my guess is that you've simply overloaded your shard with
>> >> 330M docs. It's not at all clear that even 150M will give you
>> >> satisfactory performance. Have you stress-tested your servers? Here's
>> >> the long form of sizing:
>> >>
>> >> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Mon, Mar 14, 2016 at 7:05 AM, Susheel Kumar <susheel2...@gmail.com>
>> >> wrote:
>> >> > For each of the Solr machines/shards you have. Thanks.
>> >> >
>> >> > On Mon, Mar 14, 2016 at 10:04 AM, Susheel Kumar <susheel2...@gmail.com> wrote:
>> >> >
>> >> >> Hello Anil,
>> >> >>
>> >> >> Can you go to Solr Admin Panel -> Dashboard and share all 4 memory
>> >> >> parameters under System, or share a screenshot?
>> >> >>
>> >> >> Thanks,
>> >> >> Susheel
>> >> >>
>> >> >> On Mon, Mar 14, 2016 at 5:36 AM, Anil <anilk...@gmail.com> wrote:
>> >> >>
>> >> >>> Hi Toke and Jack,
>> >> >>>
>> >> >>> Please find the details below.
>> >> >>>
>> >> >>> * How large are your 3 shards in bytes? (total index across
>> >> >>> replicas)  --  *146 GB. I am using CDH (Cloudera), and I am not
>> >> >>> sure how to check the index size of each collection on each shard.*
>> >> >>> * What storage system do you use (local SSD, local spinning drives,
>> >> >>> remote storage...)?  --  *Local (HDFS) spinning drives.*
>> >> >>> * How much physical memory does your system have?  --  *We have 15
>> >> >>> data nodes, with multiple services installed on each (252 GB RAM
>> >> >>> per data node); 25 GB RAM is allocated for the Solr service.*
>> >> >>> * How much memory is free for disk cache?  --  *I could not find
>> >> >>> out.*
>> >> >>> * How many concurrent queries do you issue?  --  *Very few; I
>> >> >>> don't see any concurrent queries to this file_collection for now.*
>> >> >>> * Do you update while you search?  --  *Yes, but very rarely.*
>> >> >>> * What does a full query (rows, faceting, grouping, highlighting,
>> >> >>> everything) look like?  --  *For the file_collection: rows = 100,
>> >> >>> highlights = false, no facets, expand = false.*
>> >> >>> * How many documents does a typical query match (hitcount)?  --
>> >> >>> *It varies with each file. I sort on an int field to order commands
>> >> >>> in the query.*
>> >> >>>
>> >> >>> We have two sets of collections on the Solr cluster (17 data nodes):
>> >> >>>
>> >> >>> 1. main_collection - a collection is created per year; each uses 8
>> >> >>> shards and 2 replicas, e.g. main_collection_2016,
>> >> >>> main_collection_2015, etc.
>> >> >>>
>> >> >>> 2. file_collection (where files containing commands are indexed) -
>> >> >>> a collection is created per 2 years; it uses 3 shards and 2
>> >> >>> replicas, e.g. file_collection_2014, file_collection_2016.
>> >> >>>
>> >> >>> The slowness is happening for file_collection. Though it has 3
>> >> >>> shards, documents are available in only 2: shard1 has 150M docs,
>> >> >>> shard2 has 330M docs, and shard3 is empty.
>> >> >>>
>> >> >>> main_collection looks good.
>> >> >>>
>> >> >>> Please let me know if you need any additional details.
>> >> >>>
>> >> >>> Regards,
>> >> >>> Anil
>> >> >>>
>> >> >>>
>> >> >>> On 13 March 2016 at 21:48, Anil <anilk...@gmail.com> wrote:
>> >> >>>
>> >> >>> > Thanks Toke and Jack.
>> >> >>> >
>> >> >>> > Jack,
>> >> >>> >
>> >> >>> > Yes. it is 480 million :)
>> >> >>> >
>> >> >>> > I will share the additional details soon. Thanks.
>> >> >>> >
>> >> >>> >
>> >> >>> > Regards,
>> >> >>> > Anil
>> >> >>> >
>> >> >>> >
>> >> >>> >
>> >> >>> >
>> >> >>> >
>> >> >>> > On 13 March 2016 at 21:06, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>> >> >>> >
>> >> >>> >> (We should have a wiki/doc page for the "usual list of
>> >> >>> >> suspects" when queries are, or appear, slow, rather than needing
>> >> >>> >> to repeat the same mantra(s) for every inquiry on this topic.)
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> -- Jack Krupansky
>> >> >>> >>
>> >> >>> >> On Sun, Mar 13, 2016 at 11:29 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
>> >> >>> >>
>> >> >>> >> > Anil <anilk...@gmail.com> wrote:
>> >> >>> >> > > I have indexed data (commands from files) with 10 fields,
>> >> >>> >> > > and 3 of them are text fields. The collection is created
>> >> >>> >> > > with 3 shards and 2 replicas. I have used document routing
>> >> >>> >> > > as well.
>> >> >>> >> >
>> >> >>> >> > > Currently the collection holds 47,80,01,405 records.
>> >> >>> >> >
>> >> >>> >> > ...480 million, right? Funny digit grouping in India.
>> >> >>> >> >
>> >> >>> >> > > Text search against a text field is taking around 5 sec.
>> >> >>> >> > > The Solr query is just an AND of two terms, with fl listing
>> >> >>> >> > > 7 fields:
>> >> >>> >> >
>> >> >>> >> > > fileId:"file unique id" AND command_text:(system login)
>> >> >>> >> >
>> >> >>> >> > While not an impressive response time, it might just be that
>> >> >>> >> > your hardware is not enough to handle that many documents.
>> >> >>> >> > The usual culprit is IO speed, so chances are you have a
>> >> >>> >> > system with spinning drives and not enough RAM: switch to SSD
>> >> >>> >> > and/or add more RAM.
>> >> >>> >> >
>> >> >>> >> > To give better advice, we need more information.
>> >> >>> >> >
>> >> >>> >> > * How large are your 3 shards in bytes?
>> >> >>> >> > * What storage system do you use (local SSD, local spinning
>> >> >>> >> > drives, remote storage...)?
>> >> >>> >> > * How much physical memory does your system have?
>> >> >>> >> > * How much memory is free for disk cache?
>> >> >>> >> > * How many concurrent queries do you issue?
>> >> >>> >> > * Do you update while you search?
>> >> >>> >> > * What does a full query (rows, faceting, grouping,
>> >> >>> >> > highlighting, everything) look like?
>> >> >>> >> > * How many documents does a typical query match (hitcount)?
>> >> >>> >> >
>> >> >>> >> > - Toke Eskildsen
>> >> >>> >> >
>> >> >>> >>
>> >> >>> >
>> >> >>> >
>> >> >>>
>> >> >>
>> >> >>
>> >>
>>
