Hi, I am planning to write a custom aggregator in Solr that uses a probabilistic data structure per shard to accumulate results; after the shard responses are merged, the result will be returned to the user as a single integer.
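For context, the accumulate-per-shard / merge-on-the-aggregator pattern I mean can be sketched with a simple mergeable probabilistic counter. This is only an illustration using linear counting (class name and doc ids are made up); an HLL would merge the same way, just register-wise max instead of bitwise OR:

```java
import java.util.BitSet;

// Toy linear-counting sketch: a bitmap of m slots; adding an item sets the
// slot its hash falls into. Distinct-count estimate = -m * ln(empty / m).
public class LinearCounter {
    private final int m;
    private final BitSet bits;

    public LinearCounter(int m) {
        this.m = m;
        this.bits = new BitSet(m);
    }

    public void add(String item) {
        bits.set(Math.floorMod(item.hashCode(), m));
    }

    // Shard-merge step: OR the bitmaps. The merged sketch is equivalent to
    // one built over the union of both shards' items, so the merge is
    // order-independent and lossless with respect to the estimate.
    public void merge(LinearCounter other) {
        bits.or(other.bits);
    }

    public long estimate() {
        int empty = m - bits.cardinality();
        if (empty == 0) return m; // sketch saturated
        return Math.round(-((double) m) * Math.log((double) empty / m));
    }

    public static void main(String[] args) {
        int m = 1 << 16;
        // Two "shards" with overlapping id ranges: 750 true distinct ids.
        LinearCounter shard1 = new LinearCounter(m);
        LinearCounter shard2 = new LinearCounter(m);
        for (int i = 0; i < 500; i++) shard1.add("doc-" + i);
        for (int i = 250; i < 750; i++) shard2.add("doc-" + i);

        // Aggregator side: merge the per-shard sketches, report one integer.
        LinearCounter merged = new LinearCounter(m);
        merged.merge(shard1);
        merged.merge(shard2);
        System.out.println("merged distinct estimate: " + merged.estimate());
    }
}
```

The useful property for the Solr use case is that only the fixed-size sketch crosses the wire, never the raw values, and the aggregator's merge produces the same estimate as a single sketch over all shards.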
I explored two options for doing this:

1. Solr Analytics API (https://cwiki.apache.org/confluence/display/solr/AnalyticsQuery+API). I can implement a merge strategy and a post filter to perform the aggregation, and I have a working example using this approach. However, I am not sure whether it is OK to pass objects larger than 1 MB in a shard response. Does Solr use JavaBin serialization to optimize gathering data from the shards? The aggregating node would collect these ~1 MB probabilistic data structures from each shard and produce the count that goes into the final response.

2. JSON Facet API (http://yonik.com/json-facet-api/). After looking at https://github.com/apache/lucene-solr/tree/master/solr/core/src/java/org/apache/solr/search/facet, FacetProcessor.java seems very similar to the Analytics API: merging appears to happen in much the same way, with the shard responses carrying objects such as HLL sketches that are then merged.

One key difference is that the Analytics API is based on a PostFilter while the JSON Facet API is based on a ValueSource, but I don't understand the impact of using one or the other. Can someone help me out?
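For reference, here is the shape of what I have working with option 1, following the example on the cwiki page linked above. The AnalyticsQuery / MergeStrategy / DelegatingCollector names come from that page; SketchAnalyticsQuery, SketchCollector, and SketchMergeStrategy are my own placeholder names, and this is only a skeleton, not runnable as-is:

```java
// Per-shard side: the collector sees every matching doc and feeds it into
// a local probabilistic sketch; finish() puts the serialized sketch into
// the shard response.
public class SketchAnalyticsQuery extends AnalyticsQuery {

  public SketchAnalyticsQuery(MergeStrategy mergeStrategy) {
    // Passing a MergeStrategy tells Solr to invoke it on the
    // aggregating node instead of doing the default merge.
    super(mergeStrategy);
  }

  @Override
  public DelegatingCollector getAnalyticsCollector(ResponseBuilder rb,
                                                   IndexSearcher searcher) {
    return new SketchCollector(rb); // hypothetical collector, see above
  }
}

// Aggregator side: pull the serialized sketch out of each shard response,
// union the sketches, and add the final integer count to the response.
class SketchMergeStrategy implements MergeStrategy {
  @Override
  public void merge(ResponseBuilder rb, ShardRequest sreq) {
    // iterate sreq.responses, deserialize each shard's sketch,
    // merge, then rb.rsp.add("count", mergedSketch.estimate());
  }
  @Override public boolean mergesIds() { return false; }
  @Override public boolean handlesMergeFields() { return false; }
  @Override public void handleMergeFields(ResponseBuilder rb,
                                          SolrIndexSearcher searcher) {}
  @Override public int getCost() { return 100; }
}
```

My question is essentially about the size of the object that SketchCollector's finish() puts into the shard response.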