Mạnh, Shalin,

I tried to reproduce it locally but failed; it is not just a stream of queries and frequent updates/commits. We will temporarily abuse a production machine to run 7.3 and a control machine on 7.2 to rule some things out.
We have plenty of custom plugins, so when I can reproduce it again, we can rule stuff out and hopefully get back to you guys!

Thanks,
Markus

-----Original message-----
> From: Đạt Cao Mạnh <caomanhdat...@gmail.com>
> Sent: Monday 30th April 2018 4:07
> To: solr-user@lucene.apache.org
> Subject: Re: 7.3 appears to leak
>
> Hi Markus,
>
> I tried indexing documents and querying them with queries and filter
> queries, but could not find any leak problems. Can you give us more
> information about the leak?
>
> Thanks!
>
> On Fri, Apr 27, 2018 at 5:11 PM Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
> > Hi Markus,
> >
> > Can you give an idea of what your filter queries look like? Any custom
> > plugins or things we should be aware of? Simply indexing artificial docs,
> > querying and committing doesn't seem to reproduce the issue for me.
> >
> > On Thu, Apr 26, 2018 at 10:13 PM, Markus Jelsma <
> > markus.jel...@openindex.io> wrote:
> >
> > > Hello,
> > >
> > > We just finished upgrading our three separate clusters from 7.2.1 to
> > > 7.3, which went fine, except for our main text search collection: it
> > > appears to leak memory on commit!
> > >
> > > After the initial upgrade we saw the cluster slowly starting to run out
> > > of memory within about an hour and a half. We increased the heap in
> > > case 7.3 just requires more of it, but the heap consumption graph still
> > > grows on each commit. Heap space cannot be reclaimed by forcing the
> > > garbage collector to run; everything just piles up in the OldGen.
> > > Running with this slightly larger heap, the first nodes run out of
> > > memory about two and a half hours after a cluster restart.
> > >
> > > The heap-eating cluster is a 2-shard/3-replica system on separate
> > > nodes. Each replica is about 50 GB in size and holds about 8.5 million
> > > documents. On 7.2.1 it ran fine with just a 2 GB heap.
> > > With 7.3 and a 2.5 GB heap, it will take just a little longer for it
> > > to run out of memory.
> > >
> > > I inspected reports shown by the sampler of VisualVM and spotted one
> > > peculiarity: the number of instances of SortedIntDocSet kept growing on
> > > each commit by about the same amount as the number of cached filter
> > > queries. But this doesn't happen on the logs cluster; SortedIntDocSet
> > > instances are neatly collected there. The number of instances also
> > > matches the number of commits since startup times the cache sizes.
> > >
> > > Our other two clusters don't have this problem. One of them receives
> > > very few commits per day, but the other receives data all the time: it
> > > logs user interactions, so a large amount of data is coming in
> > > continuously. I cannot reproduce it locally by indexing data and
> > > committing all the time; the peak usage in OldGen stays about the same.
> > > But I can reproduce it locally when I introduce queries and filter
> > > queries while indexing pieces of data and committing it.
> > >
> > > So, what is the problem? I dug through the CHANGES.txt of both Lucene
> > > and Solr, but nothing really caught my attention. Does anyone here have
> > > an idea where to look?
> > >
> > > Many thanks,
> > > Markus
>
> > --
> > Regards,
> > Shalin Shekhar Mangar.
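[Editor's note] The observation in the quoted report, that SortedIntDocSet instances grow by roughly the number of cached filter queries per commit, can be sanity-checked with a short back-of-the-envelope sketch. All concrete numbers below are illustrative assumptions, not values from the thread; the only structural fact relied on is that a SortedIntDocSet holds a sorted int[] of matching document ids, so about 4 bytes per matched doc.

```python
# Back-of-the-envelope check: if each commit opens a new searcher whose
# filterCache is never released, retained SortedIntDocSet instances grow by
# roughly the cache size per commit, and retained memory by about 4 bytes
# per matching doc per cached entry.
# All numbers below are illustrative assumptions, not values from the thread.

cache_size = 512        # assumed filterCache maxSize in solrconfig.xml
commits = 150           # e.g. one commit per minute for 2.5 hours
avg_matches = 100_000   # assumed average docs matched per cached filter

retained_docsets = commits * cache_size
retained_bytes = retained_docsets * 4 * avg_matches  # int[] of doc ids

print(retained_docsets)        # 76800 retained DocSet instances
print(retained_bytes / 2**30)  # ~28.6 GiB if nothing is ever collected
```

Even with these modest assumed numbers, retained memory would dwarf a 2.5 GB heap well before the 2.5-hour mark, which is consistent with the OOM timeline described above if old searchers' caches really are being held.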