Re: Solr Streaming Queries Performance Issues [v7.2.1]

Joel Bernstein Fri, 28 Sep 2018 12:22:25 -0700

The facet expression is currently not as expressive as the JSON facet API.
So for very demanding use cases you can create a more highly tuned JSON facet
API call.
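
For illustration, here is a rough sketch of what a JSON facet API request for the facet() expression quoted below might look like. The field names and query come from that expression; the facet name "weeks", the stat names (sum_sales and so on), the helper function names and the base URL are assumptions made for this example, and the request has not been tuned or tested against the collection discussed in this thread.

```python
import json
import urllib.request


def build_facet_request(query="id:953", bucket_field="week", limit=200):
    """Build a JSON Request API body with a terms facet and nested sums.

    The facet name "weeks" and the stat names are illustrative only.
    """
    return {
        "query": query,
        "limit": 0,  # return facet buckets only, no documents
        "facet": {
            "weeks": {
                "type": "terms",
                "field": bucket_field,
                "limit": limit,
                "sort": "index desc",  # order buckets by term value, descending
                "facet": {
                    "sum_sales": "sum(sales)",
                    "sum_amount": "sum(amount)",
                    "sum_days": "sum(days)",
                },
            }
        },
    }


def run_facet_request(base_url, collection, body):
    """POST the body to the collection's /query handler and parse the reply."""
    req = urllib.request.Request(
        f"{base_url}/{collection}/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Print the request body; sending it requires a running Solr instance.
    print(json.dumps(build_facet_request(), indent=2))
```

Against a live server this could be called as run_facet_request("http://localhost:8983/solr", "collection_name", build_facet_request()). Further tuning would go into the terms facet itself.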


The good news is that we are working on this. We are also working on other
expressions that can be wrapped around the facet expression to implement
parallelism and scaling. We hope to have this ready for Solr 8, which is just
around the corner.



Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Sep 28, 2018 at 2:52 PM RAUNAK AGRAWAL <agrawal.rau...@gmail.com>
wrote:

> Thanks a lot, Toke. I will get back to you soon regarding the patch after
> discussing it with the team.
>
> Thanks & Regards
>
>
> On Fri, Sep 28, 2018 at 11:30 AM Toke Eskildsen <t...@kb.dk> wrote:
>
> > RAUNAK AGRAWAL <agrawal.rau...@gmail.com> wrote:
> >
> > > curl http://localhost:8983/solr/collection_name/stream -d
> > > 'expr=facet(collection_name,q="id:953",bucketSorts="week
> > > desc",buckets="week",bucketSizeLimit=200,sum(sales),
> > > sum(amount),sum(days))'
> >
> > Stats on numeric fields then.
> >
> > > Also in my collection, I have almost 10 Billion documents
> > > with many deletions (close to 40%).
> >
> > That is quite a lot of documents, and in this case deletions count, as the
> > internal structures for the deleted documents still need to be iterated.
> > In scale this looks somewhat like our 18 billion document setup, with the
> > addendum that we use quite large segments (900GB).
> >
> > The performance regressions we encountered with Solr 7 led to
> > https://issues.apache.org/jira/browse/LUCENE-8374 which helped a lot
> > (performance testing has not finished). If you have or can easily create a
> > test server where your shard(s) are the same size as your production shards,
> > I'd be happy to port the patch to Solr 7.2.1 to see if it helps. I am
> > looking for independent verification, so it is no bother.
> >
> > > I was planning to run optimise to merge the segments but
> > > spoke to admin team and lucidworks guys and they were
> > > against it saying that it will make a very large segment file.
> >
> > If your bottleneck is the same as ours, the large segment would mean worse
> > performance (with Solr 7).
> >
> > > Is it true that optimise in solr should not be used, as it comes with
> > > other issues?
> >
> > No simple answer there. If you have an index that you update very rarely,
> > it can save memory and processing power. If you have a live index where you
> > add and delete documents, it will probably be a bad idea. One strategy used
> > with time series data is to have old and immutable data in dedicated
> > collections, which can then be optimized.
> >
> > - Toke Eskildsen
> >
>
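
As a small sketch of the time series strategy Toke describes above (old, immutable data in dedicated collections that can then be optimized), the snippet below builds the update-handler URL that would trigger an optimize on one such collection. The collection name "sales_2017", the helper name optimize_url and the base URL are hypothetical; optimize, maxSegments and waitSearcher are standard update handler parameters, but whether optimizing is appropriate depends on the index, as discussed in the thread.

```python
from urllib.parse import urlencode


def optimize_url(base_url, collection, max_segments=1):
    """Build an update-handler URL that requests an optimize (forced merge).

    Intended only for collections that no longer receive updates.
    """
    params = urlencode(
        {
            "optimize": "true",
            "maxSegments": max_segments,
            "waitSearcher": "false",
        }
    )
    return f"{base_url}/{collection}/update?{params}"


if __name__ == "__main__":
    # "sales_2017" is a made-up name for an old, immutable collection.
    print(optimize_url("http://localhost:8983/solr", "sales_2017"))
```

The live collection that still receives adds and deletes would be left alone; only the dedicated historical collections would be passed to this helper.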
