On Mon, Aug 21, 2017 at 6:01 AM, Mikhail Khludnev <m...@apache.org> wrote:
> Hello!
>
> I need to compute a really wide facet on a 30-shard index with roughly 100M
> docs; the facet response is about 100M values and takes 0.5G as a text file.
>
> So far I have experimented with the old facets. They calculate per-shard
> facets fine, but then the node that attempts to merge the 30 responses fails
> with an OOM. That's understandable.
>
> I suppose I'll get pretty much the same with json.facet, or does it scale
> better?
>
> I want to experiment with Streaming Expressions, which I've never used yet.
> I've found the facet() expression and select() with partitionKeys, but they'll
> try to merge facet values in FacetComponent/FacetModule anyway.
> Is there a way to merge per-shard facet responses with Streaming?

Yeah, I think I've mentioned before that this is the way it should be
implemented (per-shard distrib=false facet request merged by streaming
expression).
The JSON Facet "stream" method does stream (i.e. it does not build up the
response all in memory first), but only at the shard level, not at the
distrib/merge level.  That per-shard output could then be fed into a
streaming expression to get exact facets (and streaming facets).  But I
don't think this has been done yet.
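In the meantime, one way to get exact, memory-bounded facet counts today is
to skip the facet components entirely and roll up sorted tuples streamed from
the /export handler.  A sketch (the collection and field names here are made
up for illustration):

```
rollup(
  search(myCollection,
         q="*:*",
         qt="/export",
         fl="category",
         sort="category asc"),
  over="category",
  count(*))
```

Because /export streams documents from each shard sorted by the rollup
field, the count for each value can be emitted as soon as the value changes,
so no node ever has to hold all 100M facet values in memory at once.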

-Yonik
