[ https://issues.apache.org/jira/browse/LUCENE-9791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288331#comment-17288331 ]
Paweł Bugalski commented on LUCENE-9791:
----------------------------------------

{quote}BytesRefHash is using ByteBlockPool as its underlying storage, and that means it is not a single contiguous array but rather a linked list of byte buffers. So we would need to implement the same logic of chaining those buffers together as setBytesRef does.
{quote}
Apparently this is not true for BytesRefHash: it writes its terms to ByteBlockPool directly and is careful to keep each term inside a single buffer. That makes me a bit wary of the ByteBlockPool/BytesRefHash combination, since setBytesRef inside ByteBlockPool apparently relies on knowledge of how BytesRefHash lays out the terms it inserts.

That said, now that I know the above, the idea of doing the direct comparison makes much more sense to me.

> Monitor (aka Luwak) has concurrency issues related to BytesRefHash#find
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-9791
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9791
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/other
>    Affects Versions: master (9.0), 8.7, 8.8
>            Reporter: Paweł Bugalski
>            Priority: Major
>         Attachments: LUCENE-9791.patch
>
>
> _org.apache.lucene.monitor.Monitor_ can sometimes *NOT* match a document that
> should be matched by one of the registered queries if match operations are run
> concurrently from multiple threads.
> This is because, in a concurrent environment, _TermFilteredPresearcher_ might
> not select a query that could later match one of the documents being matched.
> Internally, _TermFilteredPresearcher_ uses a term acceptor: an instance of
> _org.apache.lucene.monitor.QueryIndex.QueryTermFilter_. _QueryTermFilter_ is
> correctly initialized under lock and its internal state (a map of
> _org.apache.lucene.util.BytesRefHash_ instances) is correctly published.
> Later on, when those instances are used concurrently, a problem with
> _org.apache.lucene.util.BytesRefHash#find_ is triggered, since it is not
> thread safe.
> _org.apache.lucene.util.BytesRefHash#find_ internally uses a private
> _org.apache.lucene.util.BytesRefHash#equals_ method, which uses the
> instance field _scratch1_ as a temporary buffer to compare its _BytesRef_
> parameter with the contents of _ByteBlockPool_. This is not thread safe and can
> cause incorrect answers as well as _ArrayIndexOutOfBoundsException_.
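
For illustration only, here is a minimal Java sketch of the hazard described in the issue and of the "direct comparison" idea. It does not use Lucene: `TinyTermPool`, its one-byte length prefix, and both method names are hypothetical stand-ins, not the layout or API of _ByteBlockPool_ or _BytesRefHash_. The first comparison routes the term's position through shared instance fields, analogous in spirit to _scratch1_; the second keeps that state in local variables, so concurrent readers cannot disturb each other, assuming the pool is no longer modified and was safely published.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical, simplified term pool; NOT Lucene's ByteBlockPool. */
final class TinyTermPool {
    private final byte[] buffer = new byte[1024];
    private int used = 0;

    // Shared mutable "view" of the term being compared (assumed analogue of scratch1).
    private int scratchOffset;
    private int scratchLength;

    /** Appends a term as [1-byte length][bytes] and returns its start offset. */
    int add(String term) {
        byte[] bytes = term.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > 127 || used + 1 + bytes.length > buffer.length) {
            throw new IllegalArgumentException("term too long or pool full");
        }
        int start = used;
        buffer[used++] = (byte) bytes.length;
        System.arraycopy(bytes, 0, buffer, used, bytes.length);
        used += bytes.length;
        return start;
    }

    /**
     * NOT thread safe: another thread may overwrite scratchOffset/scratchLength
     * between the writes and the reads below, which can give a wrong answer or
     * an out-of-range index.
     */
    boolean equalsViaScratch(int start, byte[] candidate) {
        scratchLength = buffer[start];
        scratchOffset = start + 1;
        return Arrays.equals(buffer, scratchOffset, scratchOffset + scratchLength,
                             candidate, 0, candidate.length);
    }

    /** Uses only local variables, so concurrent readers do not interfere. */
    boolean equalsDirect(int start, byte[] candidate) {
        int length = buffer[start];
        int offset = start + 1;
        return Arrays.equals(buffer, offset, offset + length,
                             candidate, 0, candidate.length);
    }
}
```

In single-threaded use both methods return the same result; the difference only shows up when several threads call the scratch-based variant on the same instance at once.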