Have you looked at Streaming Aggregation, Streaming Expressions, Parallel SQL, etc.?
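
As a rough illustration of the Streaming Expressions route, the sketch below builds a parallel() expression that wraps a search() over the /export handler and partitions tuples across workers by a key, which is close to the "shuffled based on a shard key" idea in the question. The collection names ("logs", "workers"), the field names, and the base URL are hypothetical, and the fields are assumed to have docValues, which /export requires. The run_stream helper is likewise only an assumption about how one might call the /stream endpoint; it buffers the whole response, so a real consumer of very large results would want to read it incrementally.

```python
import json
import urllib.parse
import urllib.request


def build_parallel_expr(collection, worker_collection, query, fields,
                        partition_key, workers):
    """Build a parallel() Streaming Expression around an /export search.

    Tuples are hashed on partition_key so that each worker receives a
    disjoint slice of the result set, sorted by the same key.
    """
    fl = ",".join(fields)
    sort = f"{partition_key} asc"
    search = (
        f'search({collection}, q="{query}", fl="{fl}", sort="{sort}", '
        f'partitionKeys="{partition_key}", qt="/export")'
    )
    return (
        f'parallel({worker_collection}, {search}, '
        f'workers="{workers}", sort="{sort}")'
    )


def run_stream(solr_url, collection, expr):
    """POST the expression to the /stream handler and return the tuples.

    Hypothetical helper: solr_url is something like
    "http://localhost:8983/solr". The final tuple in the response is an
    EOF marker, which is dropped here.
    """
    data = urllib.parse.urlencode({"expr": expr}).encode("utf-8")
    url = f"{solr_url}/{collection}/stream"
    with urllib.request.urlopen(url, data=data) as resp:
        body = json.load(resp)
    docs = body["result-set"]["docs"]
    return [d for d in docs if "EOF" not in d]


if __name__ == "__main__":
    # Only prints the expression; no Solr instance is contacted here.
    print(build_parallel_expr("logs", "workers", "*:*",
                              ["id", "user_s"], "user_s", 4))
```

Note that parallel() merges the worker output back into a single stream for the caller, so whether this matches the "each worker feeds one of our processes" requirement depends on where the pre-processing is meant to run.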

Best,
Erick

On Mon, Jun 12, 2017 at 9:24 AM, Rohit Jain <rohit.j...@esgyn.com> wrote:
> Hi folks,
>
> We have a solution where we would like to connect to SOLR via an API, submit 
> a query, and then pre-process the results before we return the results to our 
> users.  However, in some cases, it is possible that the results being
> returned by SOLR, in a large distributed cluster deployment, are very large.
> In these cases, we would like to set up parallel streams, so that each 
> parallel SOLR worker feeds directly into one of our processes distributed 
> across the cluster.  That way, we can pre-process those results in parallel, 
> before we consolidate (and potentially reduce / aggregate) the results 
> further for the user, who has a single client connection to our solution.  
> Sort of a MapReduce type scenario where our processors are the reducers.  We 
> could consume the results as returned by these SOLR Worker processes, or 
> perhaps have them shuffled based on a shard key, before our processes would 
> receive them.
>
> Any ideas on how this could be done?
>
> Rohit Jain
