Hello,

We have been using a simple Python tool for a long time that eases moving 
data between Solr collections; it uses CursorMark to fetch small or large 
pieces of data. Recently it stopped working when moving data from a production 
collection to my local machine for testing: the Solr nodes began to run OOM.
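For reference, the core of such a tool is a cursorMark paging loop. Below is a minimal sketch of that loop; the `fetch` callable, the `id` uniqueKey field, and the page size are assumptions, since I don't have the actual tool's code in front of me. In practice `fetch` would issue an HTTP request against `/solr/<collection>/select` and return the parsed JSON response.

```python
# Sketch of CursorMark pagination, assuming a fetch(params) callable that
# returns Solr's JSON select response as a dict (e.g. built on requests).
# The uniqueKey field name "id" is an assumption.

def cursor_docs(fetch, rows=500):
    """Yield every document by walking Solr's cursorMark pagination."""
    cursor = "*"  # Solr requires "*" as the initial cursorMark
    while True:
        resp = fetch({
            "q": "*:*",
            "rows": rows,
            "sort": "id asc",       # cursorMark requires a sort on the uniqueKey
            "cursorMark": cursor,
        })
        for doc in resp["response"]["docs"]:
            yield doc
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:   # Solr signals the end by repeating the mark
            break
        cursor = next_cursor
```

Each page request should be cheap for Solr, since a cursor resumes from the sort position rather than re-scoring and skipping rows the way deep `start=` paging does.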

I added 500M to the 3G heap and now it works again, but slowly (240 docs/s), 
and it costs 3G of the entire heap just to move 32k docs out of 76m total.

Solr 8.6.0 is running with two shards (1 leader + 1 replica each); each shard 
has 38m docs with almost no deletions (0.4%), taking up ~10.6 GB of disk 
space. The documents are very small; they are logs of various user 
interactions with our main text search engine.

I monitored all four nodes with VisualVM during the transfer; all four went up 
to 3 GB heap consumption very quickly. After the transfer it took a while for 
two nodes to (forcefully) release the heap space no longer needed for the 
transfer. The other two nodes, now, 17 minutes later, still seem to think they 
have to hang on to their heap. When I start the same transfer again, the nodes 
that already have high memory consumption just seem to reuse it without 
consuming additional heap. At least the second time it ran at 920 docs/s, 
while we are used to transferring these tiny documents at light speed, 
multiple thousands per second.

What is going on? We do not need additional heap, Solr is clearly not asking 
for more, and GC activity is minimal. Why did it become so slow? Regular 
queries on the collection are still fast, but CursorMarking through even a 
tiny portion of it is taking time and memory.

Many thanks,
Markus
