[ https://issues.apache.org/jira/browse/LUCENE-9791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288316#comment-17288316 ]
Paweł Bugalski commented on LUCENE-9791: ---------------------------------------- {quote}Another idea would be to remove the scratch BytesRef: Maybe we can simply access the byte[] behind the BytesRefHash and use a standard Arrays.equals() with correct offset and count. As this is a private method, I see no issue in this.{quote} This is very tempting. But there is a number of problems with that. BytesRefHash is using BytesRefPool as its underlying storage and that means that it is not a single continues array but rather a linked list of bytes buffers. So we would need to implement the same logic of chaining those buffers together as setBytesRef does. Additionally from what I checked other usages of setBytesRef are using a resulting scratch BytesRef to do compareTo but they also expect the scratch to point to correct term, so we probably could not reuse this new method. On the other hand, we need to do comparison only so we will get a performance benefit by avoiding unneeded copy operation. However, that improvement might be small as setBytesRef is already avoiding creation of unnecessary copy if term is not spanning across multiple buffers. {quote}Nevertheless: You should never ever add new entries while doing lookups at same time. IMHO, a class like this should allow lookups concurrently, once all entries were added (like a HashMap: If you have build a HashMap, you can concurerntly access the getters/contains - but never ever change it again).{quote} I like this comparison. This is exactly the behaviour I was striving for with my patch but I did not find the correct analogy to describe it. {quote}Maybe another clean way would be some method in BytesRefHash that returns a read-only clone. {quote} BytesRefHash is not used in to many places so updating its javadoc to explicitly state above contract could be a good middle ground. > Monitor (aka Luwak) has concurrency issues related to BytesRefHash#find > ----------------------------------------------------------------------- > > Key: LUCENE-9791 > URL: https://issues.apache.org/jira/browse/LUCENE-9791 > Project: Lucene - Core > Issue Type: Bug > Components: core/other > Affects Versions: master (9.0), 8.7, 8.8 > Reporter: Paweł Bugalski > Priority: Major > Attachments: LUCENE-9791.patch > > > _org.apache.lucene.monitor.Monitor_ can sometimes *NOT* match a document that > should be matched by one of registered queries if match operations are run > concurrently from multiple threads. > This is because sometimes in a concurrent environment > _TermFilteredPresearcher_ might not select a query that could later on match > one of documents being matched. > Internally _TermFilteredPresearcher_ is using a term acceptor: an instance of > _org.apache.lucene.monitor.QueryIndex.QueryTermFilter_. _QueryTermFilter_ is > correctly initialized under lock and its internal state (a map of > _org.apache.lucene.util.BytesRefHash_ instances) is correctly published. > Later one when those instances are used concurrently a problem with > _org.apache.lucene.util.BytesRefHash#find_ is triggered since it is not > thread safe. > _org.apache.lucene.util.BytesRefHash#find_ internally is using a private > _org.apache.lucene.util.BytesRefHash#equals_ method, which is using an > instance field _scratch1_ as a temporary buffer to compare its _ByteRef_ > parameter with contents of _ByteBlockPool_. This is not thread safe and can > cause incorrect answers as well as _ArrayOutOfBoundException_. > __ > -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org