Ere, thanks for the advice. I don't have this specific use case, but I am doing some operations that I think could be risky, since this is the first time I am using them.
There is a page that groups documents distributed across shards by one specific attribute. I am using a composite ID so that grouping works correctly, but I don't know how well this task performs. The page groups and lists these attributes as "snippets", and it supports paging. I am also running some graph queries using streaming. As far as I can observe, these features are not causing the problem I described.

Thank you,
Koji

On Fri, Aug 16, 2019 at 04:34, Ere Maijala <ere.maij...@helsinki.fi> wrote:

> Does your web application, by any chance, allow deep paging or something
> like that which requires returning rows at the end of a large result
> set? Something like a query where you could have parameters like
> &rows=10&start=1000000 ? That can easily cause OOM with Solr when using
> a sharded index. It would typically require a large number of rows to be
> returned and combined from all shards just to get the few rows to return
> in the correct order.
>
> For the above example with 8 shards, Solr would have to fetch 1 000 010
> rows from each shard. That's over 8 million rows! Even if it's just
> identifiers, that's a lot of memory required for an operation that seems
> so simple on the surface.
>
> If this is the case, you'll need to prevent the web application from
> issuing such queries. This may mean something like supporting paging
> only among the first 10 000 results. A typical requirement may also be
> to see the last results of a query, but this can be accomplished by
> allowing sorting in both ascending and descending order.
>
> Regards,
> Ere
>
> Kojo wrote on 14.8.2019 at 16.20:
> > Shawn,
> >
> > Only my web application accesses this Solr. At first look at the HTTP
> > server logs I didn't find anything different. Sometimes a very big
> > crawler accesses my servers; this was my first bet.
> >
> > No scheduled crons were running at that time either.
> >
> > I think that I will reconfigure my boxes with two Solr nodes each instead
> > of four and increase the heap to 16GB. This box only runs Solr and has
> > 64GB. Each Solr will use 16GB and the box will still have 32GB for the
> > OS. What do you think?
> >
> > This is a production server, so I will plan the migration.
> >
> > Regards,
> > Koji
> >
> >
> > On Tue, Aug 13, 2019 at 12:58, Shawn Heisey <apa...@elyograg.org>
> > wrote:
> >
> >> On 8/13/2019 9:28 AM, Kojo wrote:
> >>> Here are the last two gc logs:
> >>>
> >>> https://send.firefox.com/download/6cc902670aa6f7dd/#Ee568G9vUtyK5zr-nAJoMQ
> >>
> >> Thank you for that.
> >>
> >> The 20MB gc log actually shows a pretty healthy system. That log
> >> covers 58 hours of runtime, and everything looks very good to me.
> >>
> >> https://www.dropbox.com/s/yu1pyve1bu9maun/gc-analysis-kojo.png?dl=0
> >>
> >> But the small log tells a different story. That log only covers a
> >> little more than four minutes.
> >>
> >> https://www.dropbox.com/s/vkxfoihh12brbnr/gc-analysis-kojo2.png?dl=0
> >>
> >> What happened at approximately 10:55:15 PM on the day that the smaller
> >> log was produced? Whatever happened caused Solr's heap usage to
> >> skyrocket and require more than 6GB.
> >>
> >> Thanks,
> >> Shawn
> >>
> >
>
> --
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland
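A rough sketch of the deep-paging arithmetic Ere describes, for anyone who finds this thread later. It assumes, as Ere explains, that each shard must return its own top `start + rows` candidates so the coordinating node can merge them in the correct order, which is why the cost grows with `start` rather than with `rows`. The 10 000 cap is the figure Ere suggests; `MAX_PAGEABLE`, `clamp_paging` and the other function names are hypothetical helpers for the web application, not Solr APIs.

```python
from __future__ import annotations

# Hypothetical cap: only the first 10,000 hits can be paged through.
MAX_PAGEABLE = 10_000


def rows_fetched_per_shard(start: int, rows: int) -> int:
    """Candidates each shard must send back for a start/rows request."""
    return start + rows


def total_rows_fetched(start: int, rows: int, shards: int) -> int:
    """Candidates the coordinating node must collect and merge."""
    return shards * rows_fetched_per_shard(start, rows)


def clamp_paging(start: int, rows: int, cap: int = MAX_PAGEABLE) -> tuple[int, int]:
    """Hypothetical web-app guard: keep start + rows within the cap."""
    rows = max(0, min(rows, cap))
    start = max(0, min(start, cap - rows))
    return start, rows


if __name__ == "__main__":
    # Ere's example: &rows=10&start=1000000 on an index with 8 shards.
    print(rows_fetched_per_shard(1_000_000, 10))  # 1000010 per shard
    print(total_rows_fetched(1_000_000, 10, 8))   # 8000080 in total
    print(clamp_paging(1_000_000, 10))            # (9990, 10)
```

Clamping silently moves the user to a different page, so rejecting such a request with an error may be the friendlier choice. Either way, the last results of a query stay reachable by reversing the sort order, as Ere suggests.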