[ 
https://issues.apache.org/jira/browse/LUCENE-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568962#comment-17568962
 ] 

Michael McCandless commented on LUCENE-10658:
---------------------------------------------

+1, merges should abort promptly.  But it is indeed only a "best effort" 
mechanism.

I guess Lucene's completion field builds its FSTs during merging and does not 
write any bytes to disk until the large FST is finished?

Maybe there are other parts of Lucene merging that also fail to check promptly 
enough, e.g. maybe when dimensional points are doing a (large) offline sort 
before writing anything to the output files?

Maybe we could instrument {{MergeRateLimiter}} to write a WARNING into 
{{infoStream}} whenever too much time has elapsed between visits to its 
{{maybePause}} API?  We could use that to tease out other places that are 
failing to write bytes frequently enough for abort checking.
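
A rough sketch of that instrumentation idea, written as standalone Java rather than as a patch to {{MergeRateLimiter}}. Everything named here is hypothetical: the {{PauseGapMonitor}} class, its {{onVisit}} method, the threshold, and the {{Consumer<String>}} sink that stands in for {{infoStream}}. The assumption is that the real change would call something like {{onVisit}} at the top of {{maybePause}} and route the message to {{infoStream}} instead.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical helper (not a Lucene class): reports long gaps between abort-check visits. */
final class PauseGapMonitor {
  private final long maxGapNanos;
  private final long maxGapMillis;
  private final LongSupplier nanoClock;       // injectable clock, e.g. System::nanoTime
  private final Consumer<String> warningSink; // stand-in for infoStream
  private long lastVisitNanos;

  PauseGapMonitor(long maxGapMillis, LongSupplier nanoClock, Consumer<String> warningSink) {
    this.maxGapMillis = maxGapMillis;
    this.maxGapNanos = TimeUnit.MILLISECONDS.toNanos(maxGapMillis);
    this.nanoClock = nanoClock;
    this.warningSink = warningSink;
    this.lastVisitNanos = nanoClock.getAsLong(); // the first gap is measured from construction
  }

  /**
   * Records one visit and warns if the time since the previous visit exceeds the threshold.
   * Returns the measured gap in nanoseconds.
   */
  synchronized long onVisit() {
    long now = nanoClock.getAsLong();
    long gapNanos = now - lastVisitNanos;
    lastVisitNanos = now;
    if (gapNanos > maxGapNanos) {
      warningSink.accept(
          "WARNING: "
              + TimeUnit.NANOSECONDS.toMillis(gapNanos)
              + " ms since the previous visit (threshold "
              + maxGapMillis
              + " ms)");
    }
    return gapNanos;
  }
}
```

Constructed as {{new PauseGapMonitor(1000, System::nanoTime, System.err::println)}}, this would print a warning whenever two consecutive visits are more than a second apart. The injectable clock is only there so the gap logic can be exercised without sleeping.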

Lucene used to check for merge abort deep inside {{IndexWriter}} and merging 
code (e.g. merging postings would check periodically, same for doc values, 
etc.), but I think LUCENE-7700 refactored that down to the rate limiter alone, 
which was a nice cleanup / step forward.

> Merges should periodically check for abort
> ------------------------------------------
>
>                 Key: LUCENE-10658
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10658
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 9.3
>            Reporter: Nhat Nguyen
>            Priority: Major
>
> Rolling back an IndexWriter without committing shouldn't take long (i.e., 
> less than several seconds), and Elasticsearch cluster coordination [relies 
> on|https://github.com/elastic/elasticsearch/issues/88055] this assumption. If 
> some merges are taking place, the rollback can take several minutes as merges 
> only check for abort when writing to files via 
> [MergeRateLimiter|https://github.com/apache/lucene/blob/3d7d85f245381f84c46c766119695a8645cde2b8/lucene/core/src/java/org/apache/lucene/index/MergeRateLimiter.java#L117-L119].
> Merging a completion field, for example, can take a long time without 
> touching output files. Another reason merges should periodically check for 
> abort is that their outputs will be discarded.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
