For the initial implementation we could skip the merge piece if that helps
get things done faster. In this scenario the metrics would be gathered
after the parallel operation, so there would be no need for a merge.
Sample syntax:

metrics(parallel(join()))
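
As a rough illustration of the pass-through idea (hypothetical Python, not
Solr's actual Java TupleStream API; the field names and metric keys here
are made up), a metrics wrapper could accumulate aggregations while
emitting the underlying tuples unchanged, then attach its results to the
final EOF tuple:

```python
# Hypothetical sketch of a MetricStream-style wrapper: tuples pass
# through untouched; count/sum aggregations ride out on the EOF tuple.

EOF = {"EOF": True}  # sentinel tuple marking end-of-stream

def metrics(stream, field):
    """Wrap a tuple stream; count and sum `field`, report totals on EOF."""
    count, total = 0, 0.0
    for tup in stream:
        if tup.get("EOF"):
            tup = dict(tup)  # copy so the shared sentinel isn't mutated
            tup[f"count({field})"] = count
            tup[f"sum({field})"] = total
            yield tup
            return
        count += 1
        total += tup[field]
        yield tup  # pass the tuple through unchanged

# usage: wrap any iterable of tuples that ends in an EOF tuple
source = [{"price": 10.0}, {"price": 5.0}, EOF]
out = list(metrics(source, "price"))
# the first two tuples are emitted as-is; out[-1] carries the metrics
```

Because the wrapper never transforms the tuples it emits, several such
wrappers can be stacked over the same stream, each adding its own keys
to the EOF tuple.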


Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Aug 16, 2016 at 1:25 PM, Joel Bernstein <joels...@gmail.com> wrote:

> The concept of a MetricStream was in the early designs but hasn't yet been
> implemented. Now might be a good time to work on the implementation.
>
> The MetricStream wraps a stream and gathers metrics in memory, continuing
> to emit the tuples from the underlying stream. This allows multiple
> MetricStreams to operate over the same stream without transforming the
> stream. Pseudo code for a metric expression syntax is below:
>
> metrics(metrics(search()))
>
> The MetricStream delivers its metrics through the EOF Tuple. So the
> MetricStream simply adds the finished aggregations to the EOF Tuple and
> returns it. If we're going to support parallel metric gathering then we'll
> also need to support the merging of the metrics. Something like this:
>
> metrics(parallel(metrics(join())))
>
> Where the metrics function wrapping the parallel function would need to
> collect the EOF tuples from each worker, merge the metrics, and then
> emit the merged metrics in an EOF Tuple.
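>
> As a rough sketch of that merge step (hypothetical Python, not Solr's
> Java API; additive metrics like counts and sums are assumed here, while
> min/max would merge with min/max instead), combining the per-worker
> EOF tuples could look like:

```python
# Hypothetical sketch: fold the aggregations carried on each worker's
# EOF tuple into one combined EOF tuple by summing matching keys.

def merge_metric_eofs(eof_tuples):
    merged = {"EOF": True}
    for eof in eof_tuples:
        for key, value in eof.items():
            if key == "EOF":
                continue  # skip the end-of-stream marker itself
            merged[key] = merged.get(key, 0) + value
    return merged

# e.g. two workers each report partial counts/sums on their EOF tuples
workers = [
    {"EOF": True, "count(price)": 2, "sum(price)": 15.0},
    {"EOF": True, "count(price)": 3, "sum(price)": 30.0},
]
combined = merge_metric_eofs(workers)
```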
>
> If you think this meets your needs, feel free to create a jira and
> begin a patch, and I can help get it committed.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Aug 16, 2016 at 11:52 AM, Radu Gheorghe <
> radu.gheor...@sematext.com> wrote:
>
>> Hello Solr users :)
>>
>> Right now it seems that if I want to rollup on two different fields
>> with streaming expressions, I would need to do two separate requests.
>> This is too slow for our use case, where we need to do joins before
>> sorting and rolling up (because we'd have to re-do the joins).
>>
>> Since in our case we are actually looking for some not-necessarily
>> accurate facets (top N), the best solution we could come up with was
>> to implement a new stream decorator based on an algorithm like
>> Count-min sketch[1], which would run on the tuples provided by the
>> stream function it wraps. This would have two big wins for us:
>> 1) it would do the facet without needing to sort on the facet field,
>> so we'd potentially save lots of memory
>> 2) because sorting isn't needed, we could do multiple facets in one go
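>>
>> To make the idea concrete, here is a minimal Python sketch of a
>> Count-min sketch (the algorithm cited in [1]); the width and depth
>> values are illustrative, not tuned, and md5 is used only as a cheap
>> family of hash functions:

```python
# Hypothetical sketch of a Count-min sketch: estimates per-key counts
# in fixed memory, with no sorting, at the cost of possible
# overestimation from hash collisions (it never underestimates).
import hashlib

class CountMinSketch:
    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, key):
        # one column index per row, derived from a salted hash of the key
        for row in range(self.depth):
            digest = hashlib.md5(f"{row}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.width

    def add(self, key, count=1):
        for row, col in enumerate(self._columns(key)):
            self.table[row][col] += count

    def estimate(self, key):
        # the minimum cell is the tightest bound: collisions only inflate
        return min(self.table[row][col]
                   for row, col in enumerate(self._columns(key)))

# usage: feed it the facet-field value of each tuple as it streams by
cms = CountMinSketch()
for term in ["a", "a", "b", "a", "c"]:
    cms.add(term)
# cms.estimate("a") is at least the true count of 3
```

Because each sketch is a small fixed-size table, one pass over the
stream can maintain several of them, one per facet field, which is the
"multiple facets in one go" win described above.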
>>
>> That said, I have two (broad) questions:
>> A) is there a better way of doing this? Let's reduce the problem to
>> streaming aggregations, where the assumption is that we have multiple
>> collections where data needs to be joined, and then facet on fields
>> from all collections. But maybe there's a better algorithm, something
>> out of the box or closer to what is offered out of the box?
>> B) whatever the best way is, could we do it in a way that can be
>> contributed back to Solr? Any hints on how to do that? Just another
>> decorator?
>>
>> Thanks and best regards,
>> Radu
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>> [1] https://en.wikipedia.org/wiki/Count%E2%80%93min_sketch
>>
>
>
