[ https://issues.apache.org/jira/browse/SOLR-14470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111458#comment-17111458 ]
David Smiley commented on SOLR-14470:
-------------------------------------
Sounds great, but hopefully it can be done in a layered way. "/export" has a
straightforward purpose. Adding aggregations _directly_ to it concerns me;
it would then no longer be a straightforward component.
> Add streaming expressions to /export handler
> --------------------------------------------
>
> Key: SOLR-14470
> URL: https://issues.apache.org/jira/browse/SOLR-14470
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Export Writer, streaming expressions
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Priority: Major
>
> Many streaming scenarios would greatly benefit from the ability to perform
> partial rollups (or other transformations) as early as possible, in order to
> minimize the amount of data that has to be sent from shards to the
> aggregating node.
> This can be implemented as a subset of streaming expressions that processes
> the data directly inside each local {{ExportHandler}} and outputs only the
> records from the resulting stream.
> Conceptually it would be similar to the way Hadoop {{Combiner}} works. As is
> the case with {{Combiner}}, because the input data is processed in batches
> there would be no guarantee that only one record per unique combination of
> sort values would be emitted - in fact, in most cases multiple partial
> aggregations would be emitted. Still, in many scenarios this would allow
> reducing the amount of data to be sent by several orders of magnitude.
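
To make the Combiner analogy in the description concrete, here is a rough
sketch in plain Python. It is not Solr code and does not use the streaming
expressions API; the function names, the field names "k" and "v", and the
batch size are all made up for illustration. It only shows why batch-wise
rollups on a shard can emit more than one partial record per key, and why
the aggregating node still has to merge them.

```python
from itertools import islice


def partial_rollups(records, key, value, batch_size):
    """Shard side (hypothetical): roll up each batch independently.

    A key that straddles a batch boundary is emitted once per batch,
    so the output is a stream of *partial* aggregates.
    """
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        sums = {}
        for rec in batch:
            total, count = sums.get(rec[key], (0, 0))
            sums[rec[key]] = (total + rec[value], count + 1)
        for k, (total, count) in sums.items():
            yield {key: k, "sum": total, "count": count}


def merge_rollups(partials, key):
    """Aggregating node (hypothetical): combine partials into final totals."""
    merged = {}
    for p in partials:
        total, count = merged.get(p[key], (0, 0))
        merged[p[key]] = (total + p["sum"], count + p["count"])
    return merged


# Six sorted records on one shard, processed in batches of four.
docs = [
    {"k": "a", "v": 1}, {"k": "a", "v": 2}, {"k": "a", "v": 3},
    {"k": "b", "v": 10}, {"k": "b", "v": 20}, {"k": "c", "v": 100},
]
partials = list(partial_rollups(docs, "k", "v", batch_size=4))
merged = merge_rollups(partials, "k")
```

In this toy run key "b" falls across the batch boundary, so the shard emits
four partial records for three keys; the merge step on the aggregating node
produces the final per-key sum and count.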
--
This message was sent by Atlassian Jira
(v8.3.4#803005)