Usually I just let the compositeId do its thing and only go for custom routing when the default proves inadequate.
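To be concrete, "the default" just means prefixing the unique key with a route key and letting Solr hash it onto a shard. A minimal sketch (the ids and collection name below are made up):

    id = file123!cmd_0001
    id = file123!cmd_0002      <- same "file123!" prefix, so same shard

    # at query time you can optionally target just that shard:
    solr/file_collection_2014/query?q=command_text:(system login)&_route_=file123!

That keeps all of a file's commands together for expand/collapse, while the hashing keeps shards roughly balanced as long as you have many distinct route keys.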
Note: your 480M documents may very well be too many for three shards! You really have to test....

Erick

On Mon, Mar 14, 2016 at 10:04 AM, Anil <anilk...@gmail.com> wrote:
> Hi Erick,
>
> In the meantime, do you recommend any effective shard distribution method?
>
> Regards,
> Anil
>
> On 14 March 2016 at 22:30, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> Try shards.info=true, but pinging the shard directly is the most certain.
>>
>> Best,
>> Erick
>>
>> On Mon, Mar 14, 2016 at 9:48 AM, Anil <anilk...@gmail.com> wrote:
>> > Hi Erick,
>> >
>> > We have used document routing to balance the shard load and for
>> > expand/collapse. It is mainly used for main_collection, which holds
>> > one-to-many relationship records. In file_collection, it is used only
>> > for load distribution.
>> >
>> > 25 GB is allocated for the entire Solr service. Each machine acts as a
>> > shard for some collections.
>> >
>> > We have not stress tested our servers, at least for the Solr service.
>> > I have read the link you shared and will act on it. Thanks for sharing.
>> >
>> > I have checked other collections, where the index size is at most 90 GB
>> > and the document count at most 5M. But for this particular
>> > file_collection_2014, I see a total index size across replicas of 147 GB.
>> >
>> > Can we get any hints if we run the query with debugQuery=true? What is
>> > an effective way of load distribution? Please advise.
>> >
>> > Regards,
>> > Anil
>> >
>> > On 14 March 2016 at 20:32, Erick Erickson <erickerick...@gmail.com> wrote:
>> >
>> >> bq: The slowness is happening for file_collection. though it has 3
>> >> shards, documents are available in 2 shards. shard1 - 150M docs and
>> >> shard2 has 330M docs, shard3 is empty.
>> >>
>> >> Well, this collection is terribly balanced. Putting 330M docs on a
>> >> single shard is pushing the limits; the only time I've seen that many
>> >> docs on a shard, particularly with 25G of RAM, they were very small
>> >> records. My guess is that you will find the queries you send to that
>> >> shard substantially slower than on the 150M shard, although 150M could
>> >> also be pushing your limits. You can measure this by sending the query
>> >> to the specific core, something like
>> >>
>> >> solr/files_shard1_replica1/query?(your query here)&distrib=false
>> >>
>> >> My bet is that your QTime will be significantly different between the
>> >> two shards.
>> >>
>> >> It also sounds like you're using implicit routing, where you control
>> >> where the files go; it's easy to end up with unbalanced shards in that
>> >> case. Why did you decide to do it this way? There are valid reasons,
>> >> but...
>> >>
>> >> In short, my guess is that you've simply overloaded your shard with
>> >> 330M docs. It's not at all clear that even 150M will give you
>> >> satisfactory performance. Have you stress tested your servers? Here's
>> >> the long form of sizing:
>> >>
>> >> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>> >>
>> >> Best,
>> >> Erick
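To make that per-shard check concrete, a minimal sketch of both approaches (hostnames, ports, and core names are placeholders for your deployment):

    # query each core directly, bypassing the distributed fan-out
    curl 'http://host1:8983/solr/file_collection_2014_shard1_replica1/select?q=fileId:"f1"+AND+command_text:(system+login)&rows=0&distrib=false'
    curl 'http://host2:8983/solr/file_collection_2014_shard2_replica1/select?q=fileId:"f1"+AND+command_text:(system+login)&rows=0&distrib=false'

    # or let a normal distributed query report per-shard timings
    curl 'http://host1:8983/solr/file_collection_2014/select?q=fileId:"f1"+AND+command_text:(system+login)&shards.info=true'

Compare the QTime values in the responses; rows=0 keeps the comparison about search time rather than document retrieval.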
>> >>
>> >> On Mon, Mar 14, 2016 at 7:05 AM, Susheel Kumar <susheel2...@gmail.com> wrote:
>> >> > For each of the Solr machines/shards you have. Thanks.
>> >> >
>> >> > On Mon, Mar 14, 2016 at 10:04 AM, Susheel Kumar <susheel2...@gmail.com> wrote:
>> >> >
>> >> >> Hello Anil,
>> >> >>
>> >> >> Can you go to the Solr Admin Panel -> Dashboard and share all four
>> >> >> memory parameters under System, or share a snapshot?
>> >> >>
>> >> >> Thanks,
>> >> >> Susheel
>> >> >>
>> >> >> On Mon, Mar 14, 2016 at 5:36 AM, Anil <anilk...@gmail.com> wrote:
>> >> >>
>> >> >> > Hi Toke and Jack,
>> >> >> >
>> >> >> > Please find the details below.
>> >> >> >
>> >> >> > * How large are your 3 shards in bytes (total index across
>> >> >> >   replicas)? -- 146G. I am using CDH (Cloudera) and am not sure how
>> >> >> >   to check the index size of each collection on each shard.
>> >> >> > * What storage system do you use (local SSD, local spinning drives,
>> >> >> >   remote storage...)? -- Local (HDFS) spinning drives.
>> >> >> > * How much physical memory does your system have? -- We have 15
>> >> >> >   data nodes, with multiple services installed on each data node
>> >> >> >   (252 GB RAM per node). 25 GB RAM is allocated to the Solr service.
>> >> >> > * How much memory is free for disk cache? -- I could not find out.
>> >> >> > * How many concurrent queries do you issue? -- Very few. I don't
>> >> >> >   see any concurrent queries to this file_collection for now.
>> >> >> > * Do you update while you search? -- Yes, but very little.
>> >> >> > * What does a full query (rows, faceting, grouping, highlighting,
>> >> >> >   everything) look like? -- For the file_collection: rows=100,
>> >> >> >   highlighting=false, no facets, expand=false.
>> >> >> > * How many documents does a typical query match (hitcount)? -- It
>> >> >> >   varies with each file. I sort on an int field to order commands
>> >> >> >   in the query.
>> >> >> >
>> >> >> > We have two sets of collections on the Solr cluster (17 data nodes):
>> >> >> >
>> >> >> > 1. main_collection - created per year. Each collection uses 8 shards
>> >> >> >    and 2 replicas, e.g. main_collection_2016, main_collection_2015, etc.
>> >> >> >
>> >> >> > 2. file_collection (where files containing commands are indexed) -
>> >> >> >    created per two years. It uses 3 shards and 2 replicas, e.g.
>> >> >> >    file_collection_2014, file_collection_2016.
>> >> >> >
>> >> >> > The slowness is happening for file_collection. Though it has 3
>> >> >> > shards, documents are present in only 2 of them: shard1 has 150M
>> >> >> > docs, shard2 has 330M docs, and shard3 is empty.
>> >> >> >
>> >> >> > main_collection looks good.
>> >> >> >
>> >> >> > Please let me know if you need any additional details.
>> >> >> >
>> >> >> > Regards,
>> >> >> > Anil
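The per-core index size and document count Anil could not find are reported by Solr's CoreAdmin API; a sketch (host and core names are placeholders):

    curl 'http://host1:8983/solr/admin/cores?action=STATUS&core=file_collection_2014_shard1_replica1&wt=json'
    # look at index.numDocs and index.sizeInBytes in the response

Repeating this for each core gives the per-shard breakdown (e.g. 150M/330M/0) without having to infer it from CDH.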
>> >> >> >
>> >> >> > On 13 March 2016 at 21:48, Anil <anilk...@gmail.com> wrote:
>> >> >> >
>> >> >> >> Thanks Toke and Jack.
>> >> >> >>
>> >> >> >> Jack, yes, it is 480 million :)
>> >> >> >>
>> >> >> >> I will share the additional details soon. Thanks.
>> >> >> >>
>> >> >> >> Regards,
>> >> >> >> Anil
>> >> >> >>
>> >> >> >> On 13 March 2016 at 21:06, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>> >> >> >>
>> >> >> >> > (We should have a wiki/doc page for the "usual list of suspects"
>> >> >> >> > when queries are, or appear, slow, rather than needing to repeat
>> >> >> >> > the same mantra(s) for every inquiry on this topic.)
>> >> >> >> >
>> >> >> >> > -- Jack Krupansky
>> >> >> >> >
>> >> >> >> > On Sun, Mar 13, 2016 at 11:29 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
>> >> >> >> >
>> >> >> >> >> Anil <anilk...@gmail.com> wrote:
>> >> >> >> >> > I have indexed data (commands from files) with 10 fields, 3
>> >> >> >> >> > of them text fields. The collection is created with 3 shards
>> >> >> >> >> > and 2 replicas. I have used document routing as well.
>> >> >> >> >>
>> >> >> >> >> > Currently the collection holds 47,80,01,405 records.
>> >> >> >> >>
>> >> >> >> >> ...480 million, right? Funny digit grouping in India.
>> >> >> >> >>
>> >> >> >> >> > Text search against a text field takes around 5 sec. The Solr
>> >> >> >> >> > query is just an AND of two terms, with fl listing 7 fields:
>> >> >> >> >>
>> >> >> >> >> > fileId:"file unique id" AND command_text:(system login)
>> >> >> >> >>
>> >> >> >> >> While not an impressive response time, it might just be that
>> >> >> >> >> your hardware is not enough to handle that amount of documents.
>> >> >> >> >> The usual culprit is IO speed, so chances are you have a system
>> >> >> >> >> with spinning drives and not enough RAM: switch to SSD and/or
>> >> >> >> >> add more RAM.
>> >> >> >> >>
>> >> >> >> >> To give better advice, we need more information.
>> >> >> >> >>
>> >> >> >> >> * How large are your 3 shards in bytes?
>> >> >> >> >> * What storage system do you use (local SSD, local spinning
>> >> >> >> >>   drives, remote storage...)?
>> >> >> >> >> * How much physical memory does your system have?
>> >> >> >> >> * How much memory is free for disk cache?
>> >> >> >> >> * How many concurrent queries do you issue?
>> >> >> >> >> * Do you update while you search?
>> >> >> >> >> * What does a full query (rows, faceting, grouping,
>> >> >> >> >>   highlighting, everything) look like?
>> >> >> >> >> * How many documents does a typical query match (hitcount)?
>> >> >> >> >>
>> >> >> >> >> - Toke Eskildsen
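Putting Toke's "full query" question and Anil's debugQuery idea together, a diagnostic version of the query above might look like this (host, sort field, and the fl field names are illustrative placeholders; the actual schema was not posted):

    curl 'http://host1:8983/solr/file_collection_2014/select' \
      --data-urlencode 'q=fileId:"file unique id" AND command_text:(system login)' \
      --data-urlencode 'fl=fileId,command_text,field3,field4,field5,field6,field7' \
      --data-urlencode 'sort=command_seq asc' \
      --data-urlencode 'rows=100' \
      --data-urlencode 'debugQuery=true'

The debug section of the response includes a per-component timing breakdown, which helps show whether the 5 seconds go to the query itself or to retrieving the 100 rows.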