jpountz commented on PR #11743:
URL: https://github.com/apache/lucene/pull/11743#issuecomment-1236975827

   I like the idea of exploring a combination of the current approach and 
on-disk buffering to flush less often.
   
   For the record, building the graph at flush time has a few other downsides 
that an indexing benchmark does not capture well. Mike mentioned that we use a 
similar amount of memory at flush time (though it's more transient), but there 
is also the stalling logic, which waits until flushing segments + buffered 
segments use 2x the size of the RAM buffer before stalling indexing: 
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/DocumentsWriterFlushControl.java#L286-L294
 Because flushes take a very long time when the graph is built at flush time, 
it's more likely that IndexWriter goes over (up to 2x) the amount of RAM that 
it's allowed to spend on the indexing buffer (which could be surprising on its 
own to users and could cause OOMEs) and that indexing gets stalled (which can 
be surprising to users as well). Maybe getting rid of this downside is worth 
losing a bit of indexing throughput.
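
   To make the 2x threshold concrete, here is a minimal sketch of the stall 
condition as described above. It is not the actual code in 
DocumentsWriterFlushControl; the class `StallSketch` and the methods 
`stallLimitBytes` and `shouldStall` are hypothetical names used only for 
illustration, and the real logic may include additional conditions.

```java
public final class StallSketch {
    private StallSketch() {}

    // Hypothetical helper: the hard limit is assumed to be twice the
    // configured RAM buffer, converted from megabytes to bytes.
    static long stallLimitBytes(double ramBufferSizeMB) {
        return (long) (2 * ramBufferSizeMB * 1024 * 1024);
    }

    // Hypothetical helper: indexing stalls once buffered (active) bytes plus
    // bytes held by segments that are still flushing exceed the limit.
    static boolean shouldStall(long activeBytes, long flushBytes, double ramBufferSizeMB) {
        return activeBytes + flushBytes > stallLimitBytes(ramBufferSizeMB);
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        final double ramBufferMB = 16.0;

        // A slow flush keeps 16 MB pinned while new documents keep arriving.
        long flushing = 16 * mb;

        // 10 MB buffered + 16 MB flushing = 26 MB, under the 32 MB limit.
        System.out.println(shouldStall(10 * mb, flushing, ramBufferMB));

        // 18 MB buffered + 16 MB flushing = 34 MB, over the 32 MB limit.
        System.out.println(shouldStall(18 * mb, flushing, ramBufferMB));
    }
}
```

   The longer a flush takes, the longer the flushing bytes stay counted 
against the limit, so the second case becomes more likely when the graph is 
built at flush time.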
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

