Have you looked at Streaming Aggregation, Streaming Expressions, Parallel SQL, etc.?
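What Rohit describes, where each downstream process receives every tuple that shares a shard key, is hash partitioning. In Solr's `parallel` Streaming Expressions, the inner `search` takes a `partitionKeys` parameter that plays this role. The Python below is a minimal, hypothetical model of the idea, not Solr's API. The functions `partition`, `reduce_bucket` and `merge`, the field names `a_s`/`a_i`, the sample docs and the choice of CRC32 as the hash are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Doc = Dict[str, Any]


def partition(docs: Iterable[Doc], key: str, workers: int) -> List[List[Doc]]:
    """Route each doc to exactly one worker by hashing its partition key.

    Hypothetical helper: models the shuffle step, not a Solr call.
    """
    buckets: List[List[Doc]] = [[] for _ in range(workers)]
    for doc in docs:
        # crc32 is stable across processes, unlike Python's salted hash() on str,
        # so every process would agree on which worker owns a given key.
        h = zlib.crc32(str(doc[key]).encode("utf-8"))
        buckets[h % workers].append(doc)
    return buckets


def reduce_bucket(bucket: List[Doc], key: str, field: str) -> Dict[Any, int]:
    """One 'reducer': sum `field` per key within a single worker's bucket."""
    sums: Dict[Any, int] = defaultdict(int)
    for doc in bucket:
        sums[doc[key]] += doc[field]
    return dict(sums)


def merge(partials: Iterable[Dict[Any, int]]) -> Dict[Any, int]:
    """Consolidate reducer outputs for the single client connection."""
    out: Dict[Any, int] = {}
    for p in partials:
        # Keys are disjoint across buckets, so update() never overwrites a sum.
        out.update(p)
    return out


# Usage with made-up sample data.
docs = [
    {"id": "1", "a_s": "red", "a_i": 3},
    {"id": "2", "a_s": "blue", "a_i": 5},
    {"id": "3", "a_s": "red", "a_i": 4},
    {"id": "4", "a_s": "green", "a_i": 1},
]
buckets = partition(docs, "a_s", workers=2)
totals = merge(reduce_bucket(b, "a_s", "a_i") for b in buckets)
print(sorted(totals.items()))  # [('blue', 5), ('green', 1), ('red', 7)]
```

Because the shuffle guarantees that all tuples for a key land on one worker, each reducer's partial result is final for its keys, and the consolidation step is a simple union rather than a second aggregation.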
Best,
Erick

On Mon, Jun 12, 2017 at 9:24 AM, Rohit Jain <rohit.j...@esgyn.com> wrote:
> Hi folks,
>
> We have a solution where we would like to connect to SOLR via an API, submit
> a query, and then pre-process the results before we return them to our
> users. However, in some cases the results returned by SOLR in a large
> distributed cluster deployment may be very large.
> In these cases, we would like to set up parallel streams, so that each
> parallel SOLR worker feeds directly into one of our processes distributed
> across the cluster. That way, we can pre-process those results in parallel
> before we consolidate (and potentially reduce / aggregate) them
> further for the user, who has a single client connection to our solution.
> It is a sort of MapReduce-type scenario where our processors are the reducers.
> We could consume the results as returned by these SOLR worker processes, or
> perhaps have them shuffled based on a shard key before our processes
> receive them.
>
> Any ideas on how this could be done?
>
> Rohit Jain