mikemccand commented on issue #10025:
URL: https://github.com/apache/lucene/issues/10025#issuecomment-2604376572

   > > Michael McCandless ([@mikemccand](https://github.com/mikemccand)) 
([migrated from 
JIRA](https://issues.apache.org/jira/browse/LUCENE-8982?focusedCommentId=17223693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17223693))
   > > Yes please!  Feel free to tackle this!  I can help w/ benchmarking.
   > 
   > Just curious.. are there any benchmarking results that can be shared with 
this enabled?
   
   Oh hello, sorry, no, I never managed to do any benchmarking here.  Did you?  
I'd still be curious about the results ... direct IO is an interesting 
low-level optimization (one that [Linus notably is not a fan 
of](https://www.theregister.com/2019/06/21/linus_torvalds_rant)!  Not sure if 
his thinking has changed...), and it's not clear (to me) where it's actually 
helpful in Lucene.
   
   The original theory / use-case for this directory was to ensure merging 
segments would write the newly merged segment straight to the storage device, 
bypassing the OS's write cache, and leaving more free RAM to hold hot pages for 
searching, reducing page faults for searching while heavy merging is going on.
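   
   For illustration, a minimal sketch of what "writing straight to the storage 
device" can look like with plain JDK calls.  It is not Lucene's directory 
implementation: `DirectWriteSketch` and `write` are hypothetical names, and it 
assumes a JDK (10 or later) where `ExtendedOpenOption.DIRECT` is available.  
Direct IO needs block-aligned buffers and transfer sizes, so the last block is 
padded and the file is truncated back to its real length; where the file system 
rejects direct IO, the sketch falls back to an ordinary buffered write.

```java
import com.sun.nio.file.ExtendedOpenOption;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only (not a Lucene class).
class DirectWriteSketch {

    /**
     * Writes data to file, trying direct IO (bypassing the OS page cache) first.
     * Returns true if the direct path was used, false if it fell back to a buffered write.
     */
    static boolean write(Path file, byte[] data) throws IOException {
        try {
            // Direct IO requires memory, file position, and length aligned to the block size.
            int blockSize = Math.toIntExact(
                Files.getFileStore(file.toAbsolutePath().getParent()).getBlockSize());
            ByteBuffer buf = ByteBuffer.allocateDirect(2 * blockSize - 1).alignedSlice(blockSize);
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                    StandardOpenOption.TRUNCATE_EXISTING, ExtendedOpenOption.DIRECT)) {
                for (int off = 0; off < data.length; off += blockSize) {
                    buf.clear();
                    buf.put(data, off, Math.min(blockSize, data.length - off));
                    while (buf.hasRemaining()) {
                        buf.put((byte) 0); // pad the final block up to a full block
                    }
                    buf.flip();
                    while (buf.hasRemaining()) {
                        ch.write(buf);
                    }
                }
                ch.truncate(data.length); // drop the padding again
            }
            return true;
        } catch (IOException | UnsupportedOperationException | IllegalArgumentException e) {
            // Direct IO is not supported here (e.g. some file systems); go through the page cache.
            Files.write(file, data);
            return false;
        }
    }
}
```

   Either way the file ends up with exactly the bytes passed in; the return 
value only reports which path was taken.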
   
   At Amazon Product Search we use Lucene with [near-real-time segment 
replication](https://blog.mikemccandless.com/2017/09/lucenes-near-real-time-segment-index.html)
 to efficiently distribute index updates to many replicas (to scale to high 
QPS) for each shard.  Long ago, we tried fixing that segment replication copy 
to use direct IO, on the same theory that copying in many bytes for new 
segments might evict hot pages used for searching.  But what we found is that 
direct IO caused even more page faults once the replica lit the new segments 
(switched over to them for searching), because the OS now had to page in the 
very cold bytes for the newly copied segments on the synchronous query hot path.
   
   We are now wondering if some sort of [bandwidth cap/budget on merge policy 
or scheduler](https://github.com/apache/lucene/issues/14148) might be a better 
approach.  This is just an anecdote from our production experience, not a 
full/clean A/B benchmark, but at least it's one data point :)
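   
   To make the budget idea concrete, a rough sketch of a bytes-per-second cap 
that a merge writer could consult before each chunk it writes.  This is only an 
assumption about what such a budget could look like, not the design discussed 
in the linked issue and not an existing Lucene API; `MergeBandwidthBudget` and 
`reserve` are hypothetical names.  The clock is passed in explicitly so the 
pacing logic stays deterministic.

```java
// Hypothetical pacing helper, for illustration only (not a Lucene class).
class MergeBandwidthBudget {
    private final long bytesPerSec;
    private long nextFreeNanos; // earliest time at which the next chunk fits the budget

    MergeBandwidthBudget(long bytesPerSec, long startNanos) {
        if (bytesPerSec <= 0) {
            throw new IllegalArgumentException("bytesPerSec must be positive");
        }
        this.bytesPerSec = bytesPerSec;
        this.nextFreeNanos = startNanos;
    }

    /**
     * Accounts for a chunk of the given size requested at nowNanos and returns
     * how many nanoseconds the caller should wait before writing it.
     */
    synchronized long reserve(long bytes, long nowNanos) {
        // Unused budget from idle periods is not carried over.
        long start = Math.max(nextFreeNanos, nowNanos);
        // Time this chunk occupies at the configured rate.
        long costNanos = Math.round(bytes * 1e9 / bytesPerSec);
        nextFreeNanos = start + costNanos;
        return start - nowNanos;
    }
}
```

   A caller would sleep for the returned number of nanoseconds before writing 
each chunk, for example `budget.reserve(chunk.length, System.nanoTime())`.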


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


