uschindler opened a new issue, #15106:
URL: https://github.com/apache/lucene/issues/15106

   ### Description
   
   One user reported and I was able to explain this behaviour: 
https://github.com/apache/lucene/issues/15068#issuecomment-3210953067
   
   Basically the RefCountedSharedArena throws an `IllegalStateException` when 
the refcounting is wrong. Tis should only happen when something is wrong, like 
you close an arena too often. But due to the official API of Arena, throwing 
IllegalStateException is the worst thing to do, because the documentation 
exactly specifies when this may happen:
   
   > Throws:
   > - 
[IllegalStateException](https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/IllegalStateException.html)
 - if the arena has already been closed
   > - 
[IllegalStateException](https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/IllegalStateException.html)
 - if a segment associated with this arena is being accessed concurrently, e.g. 
by a downcall method handle .... _addition by Uwe:_ it is enough if any code 
actually invokes a method on a memory segment in that shared arena while 
another thread is calling close!
   
   Throwing IllegalStateException is perfectly fine theoretically, if and only 
if the underlying `Scope` is already closed. But we don't check this at all and 
just throw the exception if the close method was called to often.
   
   If now the second documented `IllegalStateException` is thrown by the Arena 
we delegate to, the refcount was already decremented.
   
   Now the "official" way to handle the second case is this spinloop, which is 
implemented in `MemorySegmentIndexInput.close()`:
   
   
https://github.com/apache/lucene/blob/cb16075c5fd5c4bbcb0fab561e185fe2e3f8e939/lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java#L716-L724
   
   This spinloop checks if the arena's scope is still open and if that is still 
the case it repeats the close call on `IllegalStateException` _until_ the 
arena's scope returns closed. This code was negotiated with the OpenJDK list 
and we also have a multithreaded test for it. Of course this test does not use 
RefCountedArenas as it uses a test file with a name which is not grouped.
   
   Normal case:
   - MemorySegmentIndexInput code calls close for first time (inner arenas 
scope still open). If the arena is still open, all is fine and refcount is 
decremented and inner arena is not touched. All fine
   
   What now happens if the inner Arena's close throws that 
`IllegalStateException`: 
   - MemorySegmentIndexInput calls close on the last open file in that shared 
group.
   - The refcount is lowered, no exception thrown
   - **For bad luck, some other thread acceses an IndexInput**
   - The inner Arena's close throws IllegalStateException in the final cleanup
   - MemorySegmentIndexInput (above code) catches exception and enters spinloop
   - MemorySegmentIndexInput repeats close
   - The refcount is now already 0 and therefor the RefCountedSharedArena's 
code throws IllegalStateException (while inner Arena's close is no longer 
called at all).
   - MemorySegmentIndexInput (above code) catches exception and enters spinloop
   - and this repeats forever, the inner arena is never closed
   
   Basically to fix this, the following should be done:
   - The code here should be removed: 
https://github.com/apache/lucene/blob/b1c1770e8052a74fd30e3520e4fd662e2724bd1c/lucene/core/src/java/org/apache/lucene/store/RefCountedSharedArena.java#L106-L108
   - Instead if the refcount reaches 0 it should stay 0 (or go negative, no 
matter).
   - The code should run the cleanup again and therefore call inner Arena's 
close again. If this arena throws IllegalStateException it can be handled by 
the caller.
   
   So basically I'd remove the two parts of the refocunting and change the code 
to trivially do:
   - decrement refcount
   - if refcount>0, do nothing
   - otherwise call inner Arena's close
   
   If for some reason the inner scope is already closed the Arena#close() will 
throw IllegalState anyways, we do not need to duplicate the refcounting!
   
   Severity: The bug is severe, if and only if somebody closes an 
IndexInput/IndexReader while it is used in another thread. This may lead to a 
spinloop if the concurrent acces is activcely accessing a segment..
   
   SearcherMananger and normal Lucene use should not be affected, but the bug 
may still causehalting  the JVM due to a infinite loop when something goes 
wrong on close!
   
   @ChrisHegarty 
   
   ### Version and environment details
   
   Affected versions: Lucene 9.12, 10.x, main branch


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to