[ 
https://issues.apache.org/jira/browse/LUCENE-9791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288331#comment-17288331
 ] 

Paweł Bugalski commented on LUCENE-9791:
----------------------------------------

{quote}BytesRefHash uses ByteBlockPool as its underlying storage, which means 
that it is not a single contiguous array but rather a linked list of byte 
buffers. So we would need to implement the same logic for chaining those buffers 
together that setBytesRef does.
{quote}
Apparently this is not true for BytesRefHash: it puts its terms into 
ByteBlockPool directly and is careful to keep each term inside a single 
buffer. That makes me a bit scared of the ByteBlockPool/BytesRefHash combo, 
since setBytesRef inside ByteBlockPool apparently relies on its knowledge of 
how BytesRefHash inserts terms in order to read them back.

That said, now that I know the above, the idea of doing a direct comparison 
makes much more sense to me.
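
To make the direct-comparison idea concrete, here is a minimal sketch in plain 
Java. It is not Lucene code: DirectCompareSketch, add, equalsAt, the tiny block 
size and the one-byte length prefix are all hypothetical, chosen only to mimic 
the property discussed above (each term lives whole inside a single buffer). 
The point is that the comparison can work on local variables only, with no 
shared scratch field.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for a block pool; NOT the Lucene implementation. */
public final class DirectCompareSketch {
    // Assumption: tiny blocks so that a few short terms already span two buffers.
    private static final int BLOCK_SIZE = 16;

    private byte[][] blocks = { new byte[BLOCK_SIZE] };
    private int current = 0; // index of the block currently being filled
    private int upto = 0;    // next free offset inside that block

    /** Appends a term (single writer assumed) and returns its global start offset. */
    public int add(byte[] term) {
        if (term.length >= BLOCK_SIZE) {
            throw new IllegalArgumentException("term too long for this sketch");
        }
        if (upto + 1 + term.length > BLOCK_SIZE) {
            // Not enough room left: start a new block so the term is never split.
            blocks = Arrays.copyOf(blocks, blocks.length + 1);
            blocks[++current] = new byte[BLOCK_SIZE];
            upto = 0;
        }
        int start = current * BLOCK_SIZE + upto;
        blocks[current][upto++] = (byte) term.length; // assumed one-byte length prefix
        System.arraycopy(term, 0, blocks[current], upto, term.length);
        upto += term.length;
        return start;
    }

    /** Compares the stored term at 'start' with 'term' using only local state. */
    public boolean equalsAt(int start, byte[] term) {
        byte[] block = blocks[start / BLOCK_SIZE];
        int pos = start % BLOCK_SIZE;
        int length = block[pos];
        // The whole term sits in this one block, so a ranged compare is enough.
        return length == term.length
            && Arrays.equals(block, pos + 1, pos + 1 + length, term, 0, term.length);
    }

    public static void main(String[] args) {
        DirectCompareSketch pool = new DirectCompareSketch();
        int start = pool.add("lucene".getBytes(StandardCharsets.UTF_8));
        System.out.println(pool.equalsAt(start, "lucene".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Because equalsAt writes to no instance field, concurrent readers cannot corrupt 
each other's comparisons. This only holds under the assumption that all add 
calls have finished and the structure has been safely published before readers 
start, which matches the situation described in the issue.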

> Monitor (aka Luwak) has concurrency issues related to BytesRefHash#find
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-9791
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9791
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/other
>    Affects Versions: master (9.0), 8.7, 8.8
>            Reporter: Paweł Bugalski
>            Priority: Major
>         Attachments: LUCENE-9791.patch
>
>
> _org.apache.lucene.monitor.Monitor_ can sometimes *NOT* match a document that 
> should be matched by one of the registered queries if match operations are 
> run concurrently from multiple threads. 
> This is because, in a concurrent environment, _TermFilteredPresearcher_ 
> sometimes fails to select a query that could later match one of the 
> documents being matched.
> Internally _TermFilteredPresearcher_ uses a term acceptor: an instance of 
> _org.apache.lucene.monitor.QueryIndex.QueryTermFilter_. _QueryTermFilter_ is 
> correctly initialized under lock and its internal state (a map of 
> _org.apache.lucene.util.BytesRefHash_ instances) is correctly published. 
> Later on, when those instances are used concurrently, a problem with 
> _org.apache.lucene.util.BytesRefHash#find_ is triggered, since it is not 
> thread safe.
> Internally, _org.apache.lucene.util.BytesRefHash#find_ uses a private 
> _org.apache.lucene.util.BytesRefHash#equals_ method, which uses the 
> instance field _scratch1_ as a temporary buffer to compare its _BytesRef_ 
> parameter with the contents of _ByteBlockPool_. This is not thread safe and 
> can cause incorrect answers as well as _ArrayIndexOutOfBoundsException_. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
