"Just try it" seems like the method we'd all trust the most :) If you want to mod Solr a bit yourself and add some timers around aggregation, you could do that. You could use something like this: * Modify Solr code and add Coda Metrics timers around aggregation code [1] * Use SPM Reporter for Coda Metrics to send the timer info to SPM for graphing [2] * Put together a dashboard in SPM and put on it whichever Solr metrics you want PLUS the widget with your custom metrics [3]
This would let you see both Solr (and server) metrics along with the aggregation timer info side by side and hopefully understand performance of hit list aggregation better.

I'd love to see the results!

[1] http://metrics.codahale.com/
[2] https://github.com/sematext/sematext-metrics-reporter
[3] If you log into SPM as bb...@sematext.com / bbuzz you should be able to see some dashboards with custom metrics, so you'll see what I mean.

Btw. Shalin I just spoke to a company that uses Solr in a SolrCloud-like fashion, and they use this approach where they have an aggregator node in front of N query nodes, too. So people really do use this! :) JIRA?

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Mon, Jun 10, 2013 at 5:54 PM, Tim Vaillancourt <t...@elementspace.com> wrote:
> To answer Otis' question of whether or not this would be useful, the
> trouble is, I don't know! :) It very well could be useful for my use case.
>
> Is there any way to determine the impact of result merging (time spent?
> Etc?) aside from just 'trying it'?
>
> Cheers,
>
> Tim
>
> On 10 June 2013 14:48, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
>
>> I think it would be useful. I know people using ElasticSearch use it
>> relatively often.
>>
>> > Is aggregation expensive enough to warrant a separate box?
>>
>> I think it can get expensive if X in rows=X is highish. We've seen
>> this reported here on the Solr ML before....
>> So to make sorting/merging of N result set from N "data nodes" on this
>> "aggregator node" you may want to get all the CPU you can get and not
>> have the CPU simultaneously also try to handle incoming queries.
>>
>> Otis
>> --
>> Solr & ElasticSearch Support
>> http://sematext.com/
>>
>> On Mon, Jun 10, 2013 at 5:32 AM, Shalin Shekhar Mangar
>> <shalinman...@gmail.com> wrote:
>> > No, there's no such notion in SolrCloud. Each node that is part of a
>> > collection/shard is a replica and will handle indexing/querying. Even
>> > though you can send a request to a node containing a different collection,
>> > the request would just be forwarded to the right node and will be executed
>> > there.
>> >
>> > That being said, do people find such a feature useful? Is aggregation
>> > expensive enough to warrant a separate box? In a distributed search, the
>> > local index is used. One'd would just be adding a couple of extra network
>> > requests if you don't have a local index.
>> >
>> > On Sun, Jun 9, 2013 at 11:18 AM, Otis Gospodnetic <
>> > otis.gospodne...@gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> Is there a notion of a data-node vs. non-data node in SolrCloud?
>> >> Something a la http://www.elasticsearch.org/guide/reference/modules/node/
>> >>
>> >> Thanks,
>> >> Otis
>> >> Solr & ElasticSearch Support
>> >> http://sematext.com/
>> >
>> > --
>> > Regards,
>> > Shalin Shekhar Mangar.