[
https://issues.apache.org/jira/browse/LUCENE-9791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288339#comment-17288339
]
Paweł Bugalski commented on LUCENE-9791:
----------------------------------------
This is what it would look like:
{code:java}
private boolean equals(int id, BytesRef b) {
  final int textStart = bytesStart[id];
  final byte[] bytes = pool.buffers[textStart >> BYTE_BLOCK_SHIFT];
  int pos = textStart & BYTE_BLOCK_MASK;
  final int length;
  final int offset;
  if ((bytes[pos] & 0x80) == 0) {
    // length is 1 byte
    length = bytes[pos];
    offset = pos + 1;
  } else {
    // length is 2 bytes
    length = (bytes[pos] & 0x7f) + ((bytes[pos + 1] & 0xff) << 7);
    offset = pos + 2;
  }
  return Arrays.equals(
      bytes,
      offset,
      offset + length,
      b.bytes,
      b.offset,
      b.offset + b.length);
}
{code}
It works, but it is basically a merge of setBytesRef and BytesRef#bytesEquals,
and the only performance benefit I had anticipated is not present:
BytesRefHash does not allow terms to break across buffer boundaries, so
neither this solution nor the previous one does any allocations.
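
For readers without the Lucene sources at hand, here is a minimal standalone sketch of the same length-prefix decoding and range comparison. It does not use Lucene: the class LengthPrefixedCompare and the helpers write and equalsAt are hypothetical names made up for illustration, and a plain byte[] stands in for one pool buffer. It assumes Java 9 or later for the range-based Arrays.equals.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not part of Lucene.
class LengthPrefixedCompare {

  // Writes `term` at `pos`, preceded by a 1- or 2-byte length prefix,
  // and returns the position just past the written term.
  public static int write(byte[] block, int pos, byte[] term) {
    int length = term.length;
    if (length < 128) {
      // 1-byte prefix: high bit clear
      block[pos++] = (byte) length;
    } else {
      // 2-byte prefix: low 7 bits with the high bit set, then the remaining bits
      block[pos++] = (byte) (0x80 | (length & 0x7f));
      block[pos++] = (byte) ((length >> 7) & 0xff);
    }
    System.arraycopy(term, 0, block, pos, length);
    return pos + length;
  }

  // Compares the term stored at `pos` with a slice of `other`,
  // using only local variables (no shared scratch buffer).
  public static boolean equalsAt(
      byte[] block, int pos, byte[] other, int otherOffset, int otherLength) {
    final int length;
    final int offset;
    if ((block[pos] & 0x80) == 0) {
      length = block[pos];
      offset = pos + 1;
    } else {
      length = (block[pos] & 0x7f) + ((block[pos + 1] & 0xff) << 7);
      offset = pos + 2;
    }
    return Arrays.equals(
        block, offset, offset + length, other, otherOffset, otherOffset + otherLength);
  }

  public static void main(String[] args) {
    byte[] block = new byte[512];
    byte[] shortTerm = "lucene".getBytes(StandardCharsets.UTF_8);
    byte[] longTerm = new byte[200]; // long enough to need the 2-byte prefix
    Arrays.fill(longTerm, (byte) 'x');

    int second = write(block, 0, shortTerm);
    write(block, second, longTerm);

    System.out.println(equalsAt(block, 0, shortTerm, 0, shortTerm.length));
    System.out.println(equalsAt(block, second, longTerm, 0, longTerm.length));
    System.out.println(equalsAt(block, second, shortTerm, 0, shortTerm.length));
  }
}
```

Because every intermediate value lives in a local variable, concurrent callers of equalsAt cannot interfere with each other as long as the block itself is not modified.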
> Monitor (aka Luwak) has concurrency issues related to BytesRefHash#find
> -----------------------------------------------------------------------
>
> Key: LUCENE-9791
> URL: https://issues.apache.org/jira/browse/LUCENE-9791
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/other
> Affects Versions: master (9.0), 8.7, 8.8
> Reporter: Paweł Bugalski
> Priority: Major
> Attachments: LUCENE-9791.patch
>
>
> _org.apache.lucene.monitor.Monitor_ can sometimes *NOT* match a document that
> should be matched by one of the registered queries if match operations are run
> concurrently from multiple threads.
> This is because, in a concurrent environment, _TermFilteredPresearcher_
> might sometimes not select a query that could later match one of the
> documents being matched.
> Internally _TermFilteredPresearcher_ is using a term acceptor: an instance of
> _org.apache.lucene.monitor.QueryIndex.QueryTermFilter_. _QueryTermFilter_ is
> correctly initialized under lock and its internal state (a map of
> _org.apache.lucene.util.BytesRefHash_ instances) is correctly published.
> Later on, when those instances are used concurrently, a problem with
> _org.apache.lucene.util.BytesRefHash#find_ is triggered, since it is not
> thread safe.
> _org.apache.lucene.util.BytesRefHash#find_ internally uses a private
> _org.apache.lucene.util.BytesRefHash#equals_ method, which uses an
> instance field _scratch1_ as a temporary buffer to compare its _BytesRef_
> parameter with the contents of _ByteBlockPool_. This is not thread safe and can
> cause incorrect answers as well as _ArrayIndexOutOfBoundsException_.
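
The failure mode described in the issue can be sketched without threads by writing out one unlucky interleaving by hand. This is a simplified illustration and not Lucene code: SharedScratchDemo, TERMS, load, compareScratch and equalsStateless are hypothetical names, and the static field scratch only plays a role analogous to the scratch1 field mentioned above.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of Lucene.
class SharedScratchDemo {

  // Stored terms standing in for the contents of a byte pool (made-up data).
  static final byte[][] TERMS = {{1, 2, 3}, {9, 9, 9}};

  // Shared mutable state, analogous in role to a scratch instance field.
  static byte[] scratch;

  // Step 1 of the unsafe comparison: point the shared scratch at term `id`.
  public static void load(int id) {
    scratch = TERMS[id];
  }

  // Step 2 of the unsafe comparison: compare whatever the scratch holds now.
  public static boolean compareScratch(byte[] candidate) {
    return Arrays.equals(scratch, candidate);
  }

  // Stateless alternative: everything needed is passed in or local.
  public static boolean equalsStateless(int id, byte[] candidate) {
    return Arrays.equals(TERMS[id], candidate);
  }

  public static void main(String[] args) {
    byte[] query = {1, 2, 3};

    // Hand-written interleaving of two callers, A and B.
    load(0); // A wants to compare against term 0
    load(1); // B overwrites the shared scratch before A has compared
    boolean unsafeResult = compareScratch(query); // A now compares against term 1

    System.out.println("shared scratch: " + unsafeResult); // false, although term 0 equals query
    System.out.println("stateless:      " + equalsStateless(0, query)); // true
  }
}
```

With real threads the interleaving is a matter of timing, which is why the wrong answer shows up only occasionally.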