Hello Anshum,

Good point! We sort on the collection's uniqueKey, our id field, which does 
not have docValues enabled. It could be a contender, but is it the problem? 
I cannot easily test it at this scale.
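
For reference, the kind of CursorMark loop under discussion looks roughly 
like the sketch below. It is not the actual tool. The function name 
iterate_cursor and the fetch_page callable are hypothetical; fetch_page is 
assumed to send the given parameters to a collection's /select handler and 
return the parsed JSON response. Only the cursorMark and nextCursorMark 
handling and the sort on the uniqueKey reflect how Solr cursors work.

```python
def iterate_cursor(fetch_page, rows=1000, sort="id asc", query="*:*"):
    """Yield every matching document by following Solr cursor marks.

    fetch_page is a hypothetical callable: it takes a dict of request
    parameters and returns the parsed JSON response as a dict.
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {
            "q": query,
            "sort": sort,  # the sort must include the uniqueKey field
            "rows": rows,
            "cursorMark": cursor,
        }
        response = fetch_page(params)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        # Solr returns the same mark that was sent once nothing is left.
        if next_cursor == cursor:
            break
        cursor = next_cursor
```

Because the sort field is "id asc" here, every page request sorts on the 
id field, which is why docValues on that field is relevant to the question.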

Thanks,
Markus
 
-----Original message-----
> From:Anshum Gupta <ans...@anshumgupta.net>
> Sent: Monday 26th October 2020 17:00
> To: solr-user@lucene.apache.org
> Subject: Re: Performance issues with CursorMark
> 
> Hey Markus,
> 
> What are you sorting on? Do you have docValues enabled on the sort field?
> 
> On Mon, Oct 26, 2020 at 5:36 AM Markus Jelsma <markus.jel...@openindex.io>
> wrote:
> 
> > Hello,
> >
> > For a long time we have been using a simple Python tool that eases the
> > movement of data between Solr collections. It uses CursorMark to fetch
> > small or large pieces of data. Recently it stopped working when moving data
> > from a production collection to my local machine for testing: the Solr
> > nodes began to run out of memory (OOM).
> >
> > I added 500M to the 3G heap and now it works again, but slowly (240 docs/s)
> > and consuming 3G of the heap just to move 32k docs out of 76m total.
> >
> > Solr 8.6.0 is running with two shards (1 leader+1 replica). Each shard has
> > 38m docs with almost no deletions (0.4%), taking up ~10.6g of disk space.
> > The documents are very small; they are logs of various interactions of
> > users with our main text search engine.
> >
> > I monitored all four nodes with VisualVM during the transfer; all four
> > went up to 3g heap consumption very quickly. After the transfer it took a
> > while for two nodes to (forcefully) release the heap space that was no
> > longer needed for the transfer. The two other nodes, now 17 minutes later,
> > still think they have to hang on to their heap consumption. When I start
> > the same transfer again, the nodes that already have high memory
> > consumption just seem to reuse that, not consuming additional heap. At
> > least the second time it went at 920 docs/s, whereas we are used to
> > transferring these tiny documents at light speed, multiple thousands per
> > second.
> >
> > What is going on? We do not need additional heap: Solr is clearly not
> > asking for more and GC activity is minimal. Why did it become so slow?
> > Regular queries on the collection are still fast, but CursorMarking
> > through even a tiny portion takes time and memory.
> >
> > Many thanks,
> > Markus
> >
> 
> 
> -- 
> Anshum Gupta
> 
