Re: Solr cloud questions

Kojo Fri, 16 Aug 2019 07:39:58 -0700

Ere,
thanks for the advice. I don't have this specific use case, but I am doing
some operations that I think could be risky, since this is the first time I
am using them.


There is a page that groups by one specific attribute of documents
distributed across shards. I am using a composite ID so that grouping works
correctly, but I don't know how well this task performs. The page groups
and lists these attributes like "snippets", and it allows paging.

I am doing some graph queries too, using streaming. As far as I can observe,
these features are not causing the problem I described.

Thank you,
Koji






On Fri, 16 Aug 2019 at 04:34, Ere Maijala <ere.maij...@helsinki.fi>
wrote:

> Does your web application, by any chance, allow deep paging or something
> like that which requires returning rows at the end of a large result
> set? Something like a query where you could have parameters like
> &rows=10&start=1000000 ? That can easily cause OOM with Solr when using
> a sharded index. It would typically require a large number of rows to be
> returned and combined from all shards just to get the few rows to return
> in the correct order.
>
> For the above example with 8 shards, Solr would have to fetch 1 000 010
> rows from each shard. That's over 8 million rows! Even if it's just
> identifiers, that's a lot of memory required for an operation that seems
> so simple on the surface.
>
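
To make the arithmetic above concrete, here is a rough sketch in Python. The
function names are made up for illustration, and it assumes, as described
above, that each shard has to return its top start + rows candidates so the
results can be merged in the correct order.

```python
def rows_fetched_per_shard(start: int, rows: int) -> int:
    """Rows one shard must return so the merged page is correctly ordered."""
    return start + rows


def total_rows_merged(start: int, rows: int, num_shards: int) -> int:
    """Rows that have to be combined across all shards for a single page."""
    return num_shards * rows_fetched_per_shard(start, rows)


# The example from the message: &rows=10&start=1000000 on 8 shards.
per_shard = rows_fetched_per_shard(1_000_000, 10)
total = total_rows_merged(1_000_000, 10, 8)
print(per_shard, total)
```
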
> If this is the case, you'll need to prevent the web application from
> issuing such queries. This may mean something like supporting paging
> only among the first 10 000 results. A typical requirement may also be to
> be able to see the last results of a query, but this can be accomplished
> by allowing sorting in both ascending and descending order.
>
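
One way to enforce such a cap in the web application could look like the
sketch below. The 10 000 limit comes from the suggestion above; the name
safe_paging and the choice to raise an error rather than silently truncate
are assumptions for illustration only, not something Solr provides.

```python
from typing import Tuple

MAX_WINDOW = 10_000  # assumed cap: only the first 10 000 results are pageable


def safe_paging(start: int, rows: int, max_window: int = MAX_WINDOW) -> Tuple[int, int]:
    """Return (start, rows) adjusted so that start + rows <= max_window.

    Raises ValueError when the requested offset is already past the cap.
    """
    start = max(0, start)
    rows = max(0, rows)
    if start >= max_window:
        raise ValueError("paging beyond the first %d results is not allowed" % max_window)
    # Shrink the page if it would cross the cap.
    rows = min(rows, max_window - start)
    return start, rows


print(safe_paging(0, 10))
print(safe_paging(9_995, 10))
```

The returned values would then be passed on as the start and rows request
parameters. To reach the last results of a query, the application would
reverse the sort order instead of raising the offset.
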
> Regards,
> Ere
>
> Kojo wrote on 14.8.2019 at 16:20:
> > Shawn,
> >
> > Only my web application accesses this Solr. At a first look at the HTTP
> > server logs I didn't find anything unusual. Sometimes a very big crawler
> > accesses my servers; this was my first bet.
> >
> > No scheduled cron jobs were running at that time either.
> >
> > I think that I will reconfigure my boxes with two Solr nodes each instead
> > of four and increase the heap to 16GB. This box only runs Solr and has 64GB.
> > Each Solr will use 16GB and the box will still have 32GB for the OS. What
> > do you think?
> >
> > This is a production server, so I will plan the migration.
> >
> > Regards,
> > Koji
> >
> >
> > On Tue, 13 Aug 2019 at 12:58, Shawn Heisey <apa...@elyograg.org>
> > wrote:
> >
> >> On 8/13/2019 9:28 AM, Kojo wrote:
> >>> Here are the last two gc logs:
> >>>
> >>>
> >>> https://send.firefox.com/download/6cc902670aa6f7dd/#Ee568G9vUtyK5zr-nAJoMQ
> >>
> >> Thank you for that.
> >>
> >> Analyzing the 20MB gc log, it actually looks like a pretty healthy system.
> >> That log covers 58 hours of runtime, and everything looks very good to me.
> >>
> >> https://www.dropbox.com/s/yu1pyve1bu9maun/gc-analysis-kojo.png?dl=0
> >>
> >> But the small log shows a different story.  That log only covers a
> >> little more than four minutes.
> >>
> >> https://www.dropbox.com/s/vkxfoihh12brbnr/gc-analysis-kojo2.png?dl=0
> >>
> >> What happened at approximately 10:55:15 PM on the day that the smaller
> >> log was produced?  Whatever happened caused Solr's heap usage to
> >> skyrocket and require more than 6GB.
> >>
> >> Thanks,
> >> Shawn
> >>
> >
>
> --
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland
>
