[ https://issues.apache.org/jira/browse/SOLR-14470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128347#comment-17128347 ]
ASF subversion and git services commented on SOLR-14470:
--------------------------------------------------------

Commit 107f655a7f256f193ef81bd69658de33549ab0a3 in lucene-solr's branch refs/heads/branch_8x from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=107f655 ]

SOLR-14470: Add streaming expressions to /export handler.

> Add streaming expressions to /export handler
> --------------------------------------------
>
>                 Key: SOLR-14470
>                 URL: https://issues.apache.org/jira/browse/SOLR-14470
>             Project: Solr
>          Issue Type: Improvement
>          Components: Export Writer, streaming expressions
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Major
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Many streaming scenarios would greatly benefit from the ability to perform
> partial rollups (or other transformations) as early as possible, in order to
> minimize the amount of data that has to be sent from shards to the
> aggregating node.
> This can be implemented as a subset of streaming expressions that process the
> data directly inside each local {{ExportHandler}} and output only the
> records from the resulting stream.
> Conceptually it would be similar to the way a Hadoop {{Combiner}} works. As is
> the case with {{Combiner}}, because the input data is processed in batches
> there would be no guarantee that only one record per unique sort value would
> be emitted; in fact, in most cases multiple partial aggregations would be
> emitted. Still, in many scenarios this would allow reducing the amount of
> data to be sent by several orders of magnitude.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
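
The Combiner-style behaviour described in the issue can be illustrated with a minimal Python sketch. This is not Solr's implementation: the function names `partial_rollup` and `final_rollup`, the field names `k` and `v`, the use of a sum as the aggregation, and the batch size are all assumptions made for illustration. It shows why batch-wise processing on a shard may emit more than one partial record per key, and why the aggregating node still has to merge the partials.

```python
from collections import defaultdict
from itertools import islice


def partial_rollup(sorted_docs, key, value, batch_size):
    """Shard side (hypothetical): sum `value` per `key` within each batch only.

    A key that spans a batch boundary yields several partial records,
    mirroring the "no guarantee of one record per key" caveat in the issue.
    """
    it = iter(sorted_docs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        sums = {}
        for doc in batch:
            sums[doc[key]] = sums.get(doc[key], 0) + doc[value]
        for k, total in sums.items():
            yield {key: k, value: total}


def final_rollup(partials, key, value):
    """Aggregating node (hypothetical): merge partial sums from the shards."""
    totals = defaultdict(int)
    for rec in partials:
        totals[rec[key]] += rec[value]
    return dict(totals)


# Five documents on one shard, already sorted by the rollup key.
shard = [
    {"k": "a", "v": 1},
    {"k": "a", "v": 2},
    {"k": "a", "v": 3},
    {"k": "b", "v": 4},
    {"k": "b", "v": 5},
]

partials = list(partial_rollup(shard, "k", "v", batch_size=2))
totals = final_rollup(partials, "k", "v")
```

With a batch size of 2, the shard emits four partial records instead of five documents (two for each key), and the final merge restores the exact per-key totals. With realistic batch sizes and many documents per key, the reduction would be correspondingly larger.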