Thanks, Erick. I will try that.
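
A minimal sketch of the comparison Erick suggests in the quoted message below: time the same query once with `fl` restricted to the uniqueKey field and once returning every stored field. The host, collection name, and the assumption that the uniqueKey field is called `id` are all hypothetical and need adjusting to the real setup. Repeating one query mostly measures Solr's caches, so a realistic test would cycle through a varied set of queries.

```python
import time
import urllib.parse
import urllib.request
from statistics import median


def select_url(base_url, q, fl, rows=10):
    """Build a Solr /select URL that returns only the fields listed in `fl`."""
    params = urllib.parse.urlencode({"q": q, "fl": fl, "rows": rows, "wt": "json"})
    return f"{base_url}/select?{params}"


def http_fetch(url):
    """Fetch a URL and read the whole response body."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def median_latency_ms(url, fetch=http_fetch, runs=20):
    """Call `fetch(url)` `runs` times and return the median wall-clock time in ms."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch(url)
        timings.append((time.perf_counter() - start) * 1000.0)
    return median(timings)


# Example usage (hypothetical host and collection; "id" is an assumed uniqueKey):
#   base = "http://localhost:8983/solr/mycollection"
#   key_only = median_latency_ms(select_url(base, "*:*", "id"))
#   all_fields = median_latency_ms(select_url(base, "*:*", "*"))
#   print(f"uniqueKey only: {key_only:.1f} ms, all stored fields: {all_fields:.1f} ms")
```

The gap between the two medians gives a rough idea of how much time goes into loading and decompressing stored fields for the returned page of documents.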

On Sun, Feb 16, 2014 at 5:07 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Stored fields are what the Solr DocumentCache in solrconfig.xml
> is all about.
>
> My general feeling is that stored fields are mostly irrelevant for
> search speed, especially if lazy-loading is enabled. The only time
> stored fields come into play is when assembling the final result
> list, i.e. the 10 or 20 documents that you return. That does imply
> disk I/O, and if you have massive fields there's also decompression
> to add to the CPU load.
>
> So, as usual, "it depends". Try measuring where you restrict the returned
> fields to whatever your <uniqueKey> field is for one set of tests, then
> try returning _everything_ for another?
>
> Best,
> Erick
>
>
> On Sun, Feb 16, 2014 at 12:18 PM, Nitin Sharma
> <nitin.sha...@bloomreach.com> wrote:
>
> > Thanks Tri
> >
> > *a. Are your docs distributed evenly across shards: number of docs and
> > size of the shards*
> > >> Yes, the size of all the shards is equal (a negligible delta on the
> > order of KB) and so is the number of docs.
> >
> > *b. Is your test client querying all nodes, or do all the queries go to
> > those 2 busy nodes?*
> > >> Yes, all nodes are receiving exactly the same number of queries.
> >
> > I have one more question. Do stored fields have a significant impact on
> > the performance of Solr queries? Is having 50% of the fields stored (out
> > of 100 fields) significantly worse than having 20% of the fields stored?
> > (significantly == on the order of 100s of milliseconds, assuming all
> > fields are of the same size and type)
> >
> > How are stored fields retrieved in general (always from disk, or loaded
> > into memory on the first query and read from memory from then on)?
> >
> > Thanks
> > Nitin
> >
> >
> > On Fri, Feb 14, 2014 at 11:45 AM, Tri Cao <tm...@me.com> wrote:
> >
> > > 1. Yes, that's the right way to go, well, in theory at least :)
> > > 2. Yes, queries are always fanned out to all shards and will be as
> > > slow as the slowest shard. When I looked into the Solr distributed
> > > querying implementation a few months back, the support for graceful
> > > degradation for things like network failures and slow shards was not
> > > there yet.
> > > 3. I doubt mmap settings would impact your read-only load, and it
> > > seems you can easily fit your index in RAM. You could try to warm the
> > > file cache to make sure with "cat $solr_dir > /dev/null".
> > >
> > > It's odd that only 2 nodes are at 100% in your setup. I would check a
> > > couple of things:
> > > a. Are your docs distributed evenly across shards: number of docs and
> > > size of the shards
> > > b. Is your test client querying all nodes, or do all the queries go to
> > > those 2 busy nodes?
> > >
> > > Regards,
> > > Tri
> > >
> > > On Feb 14, 2014, at 10:52 AM, Nitin Sharma
> > > <nitin.sha...@bloomreach.com> wrote:
> > >
> > > Hello folks
> > >
> > > We are currently using SolrCloud 4.3.1. We have an 8-node SolrCloud
> > > cluster with 32 cores, 60 GB of RAM and SSDs. We are using ZooKeeper
> > > to manage the solrconfig used by our collections.
> > >
> > > We have many collections, and some of them are much larger than the
> > > others. The shards of these big collections are on the order of
> > > gigabytes. We decided to split the bigger collection evenly across all
> > > nodes (8 shards and 2 replicas) with maxNumShards > 1.
> > >
> > > We did a test with a read-only load on one big collection and we still
> > > see only 2 nodes running at 100% CPU while the rest are blazing through
> > > the queries way faster (under 30% CPU), despite all of them being
> > > sharded across all nodes.
> > >
> > > I checked the JVM usage and found that none of the pools have high
> > > utilization (except Survivor space, which is at 100%). The GC cycles
> > > are on the order of ms and are mostly scavenges. Mark and sweep occurs
> > > once every 30 minutes.
> > >
> > > A few questions:
> > >
> > > 1. Sharding all collections (small and large) across all nodes evenly
> > > distributes the load and makes the system characteristics of all
> > > machines similar. Is this the recommended approach?
> > > 2. SolrCloud does a distributed query by default. So if a node is at
> > > 100% CPU, does it slow down the response time for the other nodes
> > > waiting for this query? (Or does it have a timeout if it cannot get a
> > > response from a node within x seconds?)
> > > 3. Our collections use MMapDirectory, but I specifically haven't
> > > enabled anything related to mmap (locked pages under ulimit). Does
> > > that adversely affect performance? Or can it lock pages even without
> > > this?
> > >
> > > Thanks a lot in advance.
> > > Nitin
> > >
> >
> > --
> > - N
>

--
- N