[ 
https://issues.apache.org/jira/browse/LUCENE-9791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288316#comment-17288316
 ] 

Paweł Bugalski commented on LUCENE-9791:
----------------------------------------

{quote}Another idea would be to remove the scratch BytesRef: Maybe we can 
simply access the byte[] behind the BytesRefHash and use a standard 
Arrays.equals() with correct offset and count. As this is a private method, I 
see no issue in this.{quote}
This is very tempting. But there is a number of problems with that. 
BytesRefHash is using BytesRefPool as its underlying storage and that means 
that it is not a single continues array but rather a linked list of bytes 
buffers. So we would need to implement the same logic of chaining those buffers 
together as setBytesRef does. Additionally from what I checked other usages of 
setBytesRef are using a resulting scratch BytesRef to do compareTo but they 
also expect the scratch to point to correct term, so we probably could not 
reuse this new method. On the other hand, we need to do comparison only so we 
will get a performance benefit by avoiding unneeded copy operation. However, 
that improvement might be small as setBytesRef is already avoiding creation of 
unnecessary copy if term is not spanning across multiple buffers.  
{quote}Nevertheless: You should never ever add new entries while doing lookups 
at same time. IMHO, a class like this should allow lookups concurrently, once 
all entries were added (like a HashMap: If you have build a HashMap, you can 
concurerntly access the getters/contains - but never ever change it 
again).{quote}
I like this comparison. This is exactly the behaviour I was striving for with 
my patch but I did not find the correct analogy to describe it.
{quote}Maybe another clean way would be some method in BytesRefHash that 
returns a read-only clone.
{quote}
BytesRefHash is not used in to many places so updating its javadoc to 
explicitly state above contract could be a good middle ground.

> Monitor (aka Luwak) has concurrency issues related to BytesRefHash#find
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-9791
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9791
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/other
>    Affects Versions: master (9.0), 8.7, 8.8
>            Reporter: Paweł Bugalski
>            Priority: Major
>         Attachments: LUCENE-9791.patch
>
>
> _org.apache.lucene.monitor.Monitor_ can sometimes *NOT* match a document that 
> should be matched by one of registered queries if match operations are run 
> concurrently from multiple threads. 
> This is because sometimes in a concurrent environment 
> _TermFilteredPresearcher_ might not select a query that could later on match 
> one of documents being matched.
> Internally _TermFilteredPresearcher_ is using a term acceptor: an instance of 
> _org.apache.lucene.monitor.QueryIndex.QueryTermFilter_. _QueryTermFilter_ is 
> correctly initialized under lock and its internal state (a map of 
> _org.apache.lucene.util.BytesRefHash_ instances) is correctly published. 
> Later one when those instances are used concurrently a problem with 
> _org.apache.lucene.util.BytesRefHash#find_ is triggered since it is not 
> thread safe.
> _org.apache.lucene.util.BytesRefHash#find_ internally is using a private 
> _org.apache.lucene.util.BytesRefHash#equals_ method, which is using an 
> instance field _scratch1_ as a temporary buffer to compare its _ByteRef_ 
> parameter with contents of _ByteBlockPool_. This is not thread safe and can 
> cause incorrect answers as well as _ArrayOutOfBoundException_. 
> __
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to