Re: [PR] Remove patching for doc blocks. [lucene]
gf2121 commented on code in PR #12741: URL: https://github.com/apache/lucene/pull/12741#discussion_r1378509674

## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99PostingsFormat.java:

@@ -0,0 +1,518 @@
+/* [standard Apache License 2.0 header] */
+package org.apache.lucene.codecs.lucene99;
+
+import java.io.IOException;
+import org.apache.lucene.codecs.BlockTermState;
+import org.apache.lucene.codecs.CodecUtil;
+import org.apache.lucene.codecs.FieldsConsumer;
+import org.apache.lucene.codecs.FieldsProducer;
+import org.apache.lucene.codecs.MultiLevelSkipListWriter;
+import org.apache.lucene.codecs.PostingsFormat;
+import org.apache.lucene.codecs.PostingsReaderBase;
+import org.apache.lucene.codecs.PostingsWriterBase;
+import org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsReader;
+import org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter;
+import org.apache.lucene.index.IndexOptions;
+import org.apache.lucene.index.SegmentReadState;
+import org.apache.lucene.index.SegmentWriteState;
+import org.apache.lucene.index.TermState;
+import org.apache.lucene.store.DataOutput;
+import org.apache.lucene.util.IOUtils;
+import org.apache.lucene.util.packed.PackedInts;
+
+/**
+ * Lucene 5.0 postings format, which encodes postings in packed integer blocks for fast decode.

Review Comment:
```suggestion
 * Lucene 9.9 postings format, which encodes postings in packed integer blocks for fast decode.
```

## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene90/Lucene90PostingsFormat.java:

@@ -38,6 +37,8 @@
 /**
  * Lucene 5.0 postings format, which encodes postings in packed integer blocks for fast decode.

Review Comment:
```suggestion
 * Lucene 9.0 postings format, which encodes postings in packed integer blocks for fast decode.
```

## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99SkipReader.java:

@@ -0,0 +1,206 @@
+/* [standard Apache License 2.0 header] */
+package org.apache.lucene.codecs.lucene99;
+
+import java.io.IOException;
+import java.util.Arrays;
+import org.apache.lucene.codecs.MultiLevelSkipListReader;
+import org.apache.lucene.store.IndexInput;
+
+/**
+ * Implements the skip list reader for block postings format that stores positions and payloads.
+ *
+ * Although this skipper uses MultiLevelSkipListReader as an interface, its definition of skip
+ * position will be a little different.
+ *
+ * For example, when skipInterval = blockSize = 3, df = 2*skipInterval = 6,
+ *
+ * 0 1 2 3 4 5
+ * d d d d d d    (posting list)
+ *     ^     ^    (skip point in MultiLeveSkipWriter)
+ *           ^    (skip point in Lucene90SkipWriter)
+ *
+ * In this case, MultiLevelSkipListReader will use the last document as a skip point, while
+ * Lucene90SkipReader should assume no skip point will come.
+ *
+ * If we use the interface directly in Lucene90SkipReader, it may naively try to read another skip
+ * datum after the only skip point is loaded.
+ *
+ * To illustrate this, we can call skipTo(d[5]): since skip point d[3] has a smaller docId, and
+ * numSkipped+blockSize == df, the MultiLevelSkipListReader will assume the skip list isn't
+ * exhausted yet, and will try to load a non-existent skip point.
+ *
+ * Therefore, we'll trim df before passing it to the interface. See trim(int).
+ */
+public class Lucene99SkipReader ext
Re: [PR] Use value-based LRU cache in NodeHash [lucene]
gf2121 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378571480

## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:

@@ -110,25 +110,34 @@ public long add(FSTCompiler.UnCompiledNode nodeIn) throws IOException {
       node = getFallback(nodeIn, hash);
       if (node != 0) {
         // it was already in fallback -- promote to primary
-        primaryTable.set(pos, node);
+        primaryTable.set(pos, node, fallbackTable.getBytes(node));
       } else {
         // not in fallback either -- freeze & add the incoming node
         // freeze & add
-        node = fstCompiler.addNode(nodeIn);
+        FSTCompiler.NodeAndBuffer nodeAndBuffer = fstCompiler.addNode(nodeIn, true);

Review Comment:
Can we use logic like below to avoid changing `FSTCompiler#addNode`?
```
long startAddress = fstCompiler.bytes.getPosition();
node = fstCompiler.addNode(nodeIn);
byte[] bytes = null;
if (node != FST.FINAL_END_NODE && node != FST.NON_FINAL_END_NODE) {
  bytes = new byte[Math.toIntExact(node - startAddress + 1)];
  fstCompiler.bytes.copyBytes(startAddress, bytes, 0, bytes.length);
}
```
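For readers following along, here is a minimal, self-contained sketch of the pattern being suggested: record the write position before freezing a node, then copy exactly the bytes appended since then. The `GrowableBuffer` class is a stand-in invented for illustration; the real code uses `FSTCompiler.bytes` (a `BytesStore`).

```java
import java.util.Arrays;

// Hypothetical stand-in for the FST's append-only byte store (illustration only).
class GrowableBuffer {
  private byte[] data = new byte[16];
  private int position;

  long getPosition() {
    return position;
  }

  void append(byte b) {
    if (position == data.length) {
      data = Arrays.copyOf(data, data.length * 2);
    }
    data[position++] = b;
  }

  // Copy length bytes starting at src into dest, analogous to BytesStore#copyBytes.
  void copyBytes(long src, byte[] dest, int offset, int length) {
    System.arraycopy(data, (int) src, dest, offset, length);
  }
}

class CopyLastNodeSketch {
  public static void main(String[] args) {
    GrowableBuffer bytes = new GrowableBuffer();
    long startAddress = bytes.getPosition(); // remember position before "freezing" a node
    for (int i = 0; i < 5; i++) {
      bytes.append((byte) i); // stand-in for fstCompiler.addNode(nodeIn)
    }
    long node = bytes.getPosition() - 1; // address of the node's last written byte
    byte[] copy = new byte[Math.toIntExact(node - startAddress + 1)];
    bytes.copyBytes(startAddress, copy, 0, copy.length);
    System.out.println(Arrays.toString(copy)); // [0, 1, 2, 3, 4]
  }
}
```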
Re: [PR] Use value-based LRU cache in NodeHash [lucene]
gf2121 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378573421

## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:

@@ -186,119 +193,85 @@ private long hash(FSTCompiler.UnCompiledNode node) {
-  [... removal of the frozen-node hash(long) and nodesEqual(UnCompiledNode, long) methods, quoted in full in the PR ...]

   /** Inner class because it needs access to hash function and FST bytes. */
   private class PagedGrowableHash {
+    public long copiedBytes;
     private PagedGrowableWriter entries;
+    // mapping from FST real address to copiedNodes offsets
+    private PagedGrowableWriter copiedOffsets;
     private long count;
+    // mask for entries
     private long mask;
+    // mask for copiedOffsets
+    private long offsetMask;
+    private final List copiedNodes;

Review Comment:
Can we use a data structure like `BytesRefArray` to be more memory efficient?
Re: [PR] Use value-based LRU cache in NodeHash [lucene]
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378577965

## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:

@@ -110,25 +110,34 @@ public long add(FSTCompiler.UnCompiledNode nodeIn) throws IOException {
   [... same hunk as quoted above ...]

Review Comment:
I was thinking about this too. I can go with it for now; it just means we assume the BytesStore still holds those bytes after adding nodes. That kinda conflicts with the idea of flushing the BytesStore after adding nodes that I had in https://github.com/apache/lucene/pull/12624. But I guess we could flush the BytesStore at the beginning of the method instead.
Re: [PR] Use value-based LRU cache in NodeHash [lucene]
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378599798

## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:

@@ -214,7 +222,13 @@ private long hash(long node) throws IOException {
    * Compares an unfrozen node (UnCompiledNode) with a frozen node at byte location address (long),
    * returning true if they are equal.
    */
-  private boolean nodesEqual(FSTCompiler.UnCompiledNode node, long address) throws IOException {
+  private boolean nodesEqual(
+      PagedGrowableHash table, FSTCompiler.UnCompiledNode node, long address)
+      throws IOException {
+    // nocommit: this is non-optimal, we should have a BytesReader that wraps and read the
+    // ByteBlockPool directly
+    byte[] bytes = table.getBytes(address);

Review Comment:
Ahhh that's right, the hash map must retain the true FST offset since that's what future added nodes must link to!
Re: [PR] Use value-based LRU cache in NodeHash [lucene]
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378601593

## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:

@@ -186,119 +193,85 @@ private long hash(FSTCompiler.UnCompiledNode node) {
   [... same removal of hash(long) / nodesEqual and the new PagedGrowableHash fields as quoted above, ending at `private final List copiedNodes;` ...]

Review Comment:
I was trying to avoid a double byte copy, which is why I used the list. I looked at `BytesRefArray` and it seems it would still have the double-copy problem, like `ByteBlockPool`:

- When we call `append(BytesRef)`, the byte[] is copied into the `ByteBlockPool`. We already had one copy when (1) reading from the BytesStore on adding a node and (2) reading from the fallback table on promotion.
- Similarly, calling `get(BytesRef, int)` also requires a copy from the `ByteBlockPool` into a temporary `byte[]`.
- Using `List<byte[]>`, on promotion we can just share the node `byte[]`: no copy here, and the memory is shared. Thus I think it would be more efficient in both CPU and memory footprint.
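To make the two strategies under discussion concrete, here is a minimal sketch (invented names; real code uses `NodeHash.PagedGrowableHash` and `ByteBlockPool`) contrasting reference-sharing promotion against pool-based promotion:

```java
import java.util.ArrayList;
import java.util.List;

class PromotionCopySketch {
  public static void main(String[] args) {
    byte[] node = {1, 2, 3};

    // Strategy A: List<byte[]> -- promotion shares the reference; zero copies.
    List<byte[]> fallbackList = new ArrayList<>();
    List<byte[]> primaryList = new ArrayList<>();
    fallbackList.add(node);
    primaryList.add(fallbackList.get(0)); // same array object, shared by both tables

    // Strategy B: a flat pool -- promotion needs a temporary copy out of the
    // fallback pool plus a second copy into the primary pool.
    byte[] fallbackPool = {1, 2, 3, 9, 9}; // node lives at offset 0, length 3
    byte[] primaryPool = new byte[16];
    byte[] tmp = new byte[3];
    System.arraycopy(fallbackPool, 0, tmp, 0, 3); // copy #1: pool -> temporary
    System.arraycopy(tmp, 0, primaryPool, 0, 3);  // copy #2: temporary -> pool
    // A pool-to-pool copy API, as the TODOs in this PR note, would remove copy #1.
  }
}
```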
Re: [PR] Use value-based LRU cache in NodeHash [lucene]
mikemccand commented on PR #12738: URL: https://github.com/apache/lucene/pull/12738#issuecomment-1788744551

> But I ended up using a `List` where each item is a node instead of ByteBlockPool due to the following reasons:

Hmm -- this is sizable added RAM overhead per entry. Added array header (16 or 20 bytes), added pointers to these arrays (4 or 8 bytes), tiny objects for GC to crawl, though maybe they mostly die young if the RAM budget is low enough. Yes, we can share some of these `byte[]` with both hashes, but I suspect that's not a net/net win.

> We automatically get the length of the node without any traversing

We are paying 4 bytes per entry for this :) And it seems a shame since FST's node encoding is already self-delimiting, and, as a side effect of looking up that node, we will already have computed that length.

> With ByteBlockPool we have 2 unavoidable double byte-copies: (1) when write from BytesStore to the primary table and (2) when promote an entry from the fallback table to the primary table. In both situations we need to first write into a temporary byte[].

I don't think this added cost is so much. We freeze the node into the true FST appending byte store, then we copy those "last N bytes" over to the primary hash. Eventually it's moved to fallback, and maybe it never gets promoted back (single copy), or maybe it does (+1 copy), but that's "worth it" since we achieve some minimization, i.e. the cost correlates nicely with the win (the whole point of this double LRU hash).
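A rough back-of-the-envelope for the per-entry overhead described above. The figures (16-byte array header, 4-byte compressed reference, 8-byte alignment) are common 64-bit-JVM-with-compressed-oops values, not measured numbers, and the 20-byte node size is an invented example:

```java
class RamOverheadSketch {
  public static void main(String[] args) {
    long payload = 20;      // a typical small frozen node, in bytes (assumed)
    long arrayHeader = 16;  // byte[] object header + length field (assumed)
    long listSlot = 4;      // reference held in the ArrayList's backing Object[]
    long aligned = (payload + arrayHeader + 7) & ~7L; // arrays are 8-byte aligned
    long perNodeCost = aligned + listSlot;
    // Prints: 44 bytes to store 20 bytes of node data
    System.out.println(perNodeCost + " bytes to store " + payload + " bytes of node data");
    // i.e. >2x overhead for a tiny byte[] per node, versus roughly the payload
    // plus amortized offset bookkeeping in a shared ByteBlockPool.
  }
}
```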
Re: [PR] Use value-based LRU cache in NodeHash [lucene]
dungba88 commented on PR #12738: URL: https://github.com/apache/lucene/pull/12738#issuecomment-1788758864

> Eventually it's moved to fallback, and, maybe it never gets promoted back (single copy), or, maybe it does (+1 copy)

There is actually already one copy before this, which is where we read from the BytesStore into the temporary byte[]. So in case it never gets promoted it's 2 copies, and when it is, it's 4 copies (one to read from the fallback table into a temporary byte[] and one to write into the primary table). When it got promoted, we also had one side effect: the bytes can be shared between the primary and fallback tables.

I see there is a tradeoff here. If we don't care too much about CPU then we can use BytesRefArray like @gf2121 suggested.
Re: [PR] Use value-based LRU cache in NodeHash [lucene]
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378650643

## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:

@@ -186,119 +197,85 @@ private long hash(FSTCompiler.UnCompiledNode node) {
   [... same removal of hash(long) / nodesEqual as quoted above ...]

   /** Inner class because it needs access to hash function and FST bytes. */
   private class PagedGrowableHash {
+    public long copiedBytes;
     private PagedGrowableWriter entries;
+    // mapping from FST real address to copiedNodes offsets
+    private PagedGrowableWriter copiedOffsets;
     private long count;
+    // mask for entries
     private long mask;
+    // mask for copiedOffsets
+    private long offsetMask;
+    private final List copiedNodes;

     // 256K blocks, but note that the final block is sized only as needed so it won't use the full
     // block size when just a few elements were written to it
     private static final int BLOCK_SIZE_BYTES = 1 << 18;

     public PagedGrowableHash() {
       entries = new PagedGrowableWriter(16, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
+      copiedOffsets = new PagedGrowableWriter(32, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
       mask = 15;
+      offsetMask = 31;
+      copiedNodes = new ArrayList<>();
     }

     public PagedGrowableHash(long lastNodeAddress, long size) {
       entries =
           new PagedGrowableWriter(
               size, BLOCK_SIZE_BYTES, PackedInts.bitsRequired(lastNodeAddress), PackedInts.COMPACT);
+      copiedOffsets =
+          new PagedGrowableWriter(
+              size * 2,
+              BLOCK_SIZE_BYTES,
+              PackedInts.bitsRequired(lastNodeAddress),
+              PackedInts.COMPACT);
       mask = size - 1;
+      offsetMask = size * 2 - 1;

Review Comment:
Hmm -- why are `entries` and `copiedOffsets` not simply side-by-side values arrays for the hash map? Why is `copiedOffsets` 2X the size? It seems like we could have them precisely match (side by side value array
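A minimal sketch of the side-by-side layout this comment appears to be asking about (all names invented; plain `long[]` instead of `PagedGrowableWriter`, linear probing, no growth): the key array and the value array have the same length, one probe finds the slot, and a single mask serves both.

```java
class ParallelArrayHashSketch {
  long[] fstNodeAddress = new long[16];    // keys; 0 means "empty slot"
  long[] copiedNodeAddress = new long[16]; // values; same length, same slot index
  int mask = 15;                           // one mask for both arrays

  void put(long fstAddress, long copiedOffset) {
    int slot = Long.hashCode(fstAddress) & mask;
    while (fstNodeAddress[slot] != 0) { // linear probing (a real table grows before filling)
      slot = (slot + 1) & mask;
    }
    fstNodeAddress[slot] = fstAddress;
    copiedNodeAddress[slot] = copiedOffset; // side-by-side: one probe serves both arrays
  }

  long getCopiedOffset(long fstAddress) {
    int slot = Long.hashCode(fstAddress) & mask;
    while (fstNodeAddress[slot] != 0) {
      if (fstNodeAddress[slot] == fstAddress) {
        return copiedNodeAddress[slot];
      }
      slot = (slot + 1) & mask;
    }
    return -1; // not found
  }
}
```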
Re: [PR] Remove patching for doc blocks. [lucene]
slow-J commented on PR #12741: URL: https://github.com/apache/lucene/pull/12741#issuecomment-1788786005

> Thanks @slow-J ! I left some minor comments about additional `90` -> `99` refactoring.

Thanks @gf2121, committed all the suggestions.
Re: [PR] Use value-based LRU cache in NodeHash [lucene]
gf2121 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378670256

## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:

@@ -186,119 +193,85 @@ private long hash(FSTCompiler.UnCompiledNode node) {
   [... same removal of hash(long) / nodesEqual and PagedGrowableHash fields as quoted above, ending at `private final List copiedNodes;` ...]

Review Comment:
For reference, you may find a great chart showing the GC difference between `many tiny byte[]` and `big byte[] with offsets` [here](https://github.com/elastic/elasticsearch/pull/21776#issuecomment-262743775) :)
Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]
benwtrent commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1788822945

So, I replicated the jvector benchmark (the lucene part) using the new int8 quantization. Note, this is with `0` fan-out or extra top-k gathered. Since the benchmark on JVector didn't specify any recall, etc., I just did the absolute baseline of `top-100`. I reserved 12GB to heap, thus reducing off-heap memory to at most 30GB.

```
1 thread over 37 segments:
completed 1000 searches in 18411 ms: 54 QPS CPU time=18231ms
checking results
0.777  18.23  1  0  16  100  100  0  1.00  post-filter
```

Since kNN allows the segments to be searched in parallel, I used 8 threads for the query rewrite.

```
8 threads over 37 segments:
completed 1000 searches in 2996 ms: 333 QPS CPU time=218ms
checking results
0.777  0.22  1  0  16  100  100  0  1.00  post-filter
```

I am currently force-merging to a single segment to see what a single graph gives us.

FYI: the data set would require > 40GB of RAM to be held in memory. With int8 quantization, it's down to around 10GB.
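The reported figures line up with simple arithmetic over the quoted wall times, and the memory reduction with 4-byte floats going to 1-byte ints (arithmetic only, not a benchmark):

```java
class BenchmarkArithmetic {
  public static void main(String[] args) {
    System.out.println(1000.0 / 18.411); // ~54.3 QPS: 1000 searches / 18.411 s (1 thread)
    System.out.println(1000.0 / 2.996);  // ~333.8 QPS: 1000 searches / 2.996 s (8 threads)
    // int8 quantization stores 1 byte per dimension instead of a 4-byte float:
    System.out.println(40.0 / 4);        // >40 GB of raw vectors -> ~10 GB quantized
  }
}
```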
Re: [PR] Use value-based LRU cache in NodeHash [lucene]
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378703915

## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:

@@ -186,119 +197,85 @@ private long hash(FSTCompiler.UnCompiledNode node) {
   [... same removal of hash(long) / nodesEqual and PagedGrowableHash constructor changes as quoted above, ending at `offsetMask = size * 2 - 1;` ...]

Review Comment:
The index of `entries` is the hash of the node arcs, while the index of `copiedOffsets` is the hash of the address, hence their positions do not match.
Re: [PR] Use value-based LRU cache in NodeHash [lucene]
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378704913

## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:

@@ -186,119 +197,85 @@ private long hash(FSTCompiler.UnCompiledNode node) {
   [... same removal of hash(long) / nodesEqual and PagedGrowableHash constructor changes as quoted above ...]

       mask = size - 1;
+      offsetMask = size * 2 - 1;
       assert (mask & size) == 0 : "size must be a power-of-2; got size=" + size + " mask=" + mask;
+      copiedNodes = new ArrayList<>();
+    }
+
+    public byte[] getBytes(long node) {
+      long pos = offsetHash(node) & offsetMas
Re: [I] Would SIMD powered sort (on top of Panama) be worth it? [lucene]
mikemccand commented on issue #12399: URL: https://github.com/apache/lucene/issues/12399#issuecomment-1788845861

Some exciting updates here ... Intel continues to improve this "super fast sorting using SIMD instructions" library: https://www.phoronix.com/news/Intel-x86-simd-sort-4.0

And apparently NumPy (fast Python numeric extensions) adopted it, and now also OpenJDK: https://www.phoronix.com/news/Intel-x86-simd-sort-3.0

I wonder which version of OpenJDK we will first see this released in?

So, yeah, we should try to use `Arrays.sort` more in Lucene instead of our own sort impls when possible!
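For context on what "use `Arrays.sort` more" looks like at a call site: assuming the JDK wires the Intel library into its primitive sort paths, as the linked articles suggest, plain primitive-array sorts like the sketch below would benefit automatically, with no Lucene code changes. The data here is invented for illustration.

```java
import java.util.Arrays;

class SortSketch {
  public static void main(String[] args) {
    int[] docIds = {42, 7, 19, 3, 88};
    // A plain primitive-array sort: the call shape a JDK-level SIMD sort
    // intrinsic can accelerate.
    Arrays.sort(docIds);
    System.out.println(Arrays.toString(docIds)); // [3, 7, 19, 42, 88]

    // Range overload, useful for sorting just the filled prefix of a buffer:
    int[] buffer = {5, 1, 4, 0, 0};
    Arrays.sort(buffer, 0, 3); // sorts indices [0, 3) in place -> {1, 4, 5, 0, 0}
    System.out.println(Arrays.toString(buffer));
  }
}
```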
Re: [PR] speedup arm int functions? [lucene]
ChrisHegarty commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-179191

I see reasonable speedups on both x64 and ARM, but sadly see no vectorization in the disassembly. The speed seems to come from the 2x instructions/pipelining (?) per strip-mined loop iteration, rather than from any vectorization (which is not happening). [I included the Panama/Vector output just for comparison purposes.]

Mac M2:

main:
```
VectorUtilBenchmark.binaryDotProductScalar1024  thrpt  5  3.151 ± 0.044  ops/us
VectorUtilBenchmark.binaryDotProductVector1024  thrpt  5  7.112 ± 0.030  ops/us
```

PR branch:
```
VectorUtilBenchmark.binaryDotProductScalar1024  thrpt  5  4.722 ± 0.031  ops/us
VectorUtilBenchmark.binaryDotProductVector1024  thrpt  5  7.127 ± 0.042  ops/us
```

I see a reasonable speedup on Linux / x64.

main:
```
VectorUtilBenchmark.binaryDotProductScalar1024  thrpt  5   2.121 ± 0.010  ops/us
VectorUtilBenchmark.binaryDotProductVector1024  thrpt  5  21.341 ± 0.039  ops/us
```

PR branch:
```
VectorUtilBenchmark.binaryDotProductScalar1024  thrpt  5   2.799 ± 0.022  ops/us
VectorUtilBenchmark.binaryDotProductVector1024  thrpt  5  21.337 ± 0.014  ops/us
```
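To make the pipelining hypothesis concrete, here is a sketch of the 2-way strip-mined scalar loop shape being discussed (not the PR's actual code): two independent accumulators break the loop-carried dependency chain, so a superscalar CPU can keep two multiply-adds in flight per iteration even with no vectorization at all.

```java
class DotProductSketch {
  // 2-way unrolled scalar byte dot product with independent accumulators.
  static int dotProductUnrolled2(byte[] a, byte[] b) {
    int sum0 = 0, sum1 = 0;
    int i = 0;
    int upperBound = a.length & ~1; // largest even bound <= length
    for (; i < upperBound; i += 2) {
      sum0 += a[i] * b[i];         // these two products have no data dependency,
      sum1 += a[i + 1] * b[i + 1]; // so they can execute in parallel pipelines
    }
    for (; i < a.length; i++) { // odd-length tail
      sum0 += a[i] * b[i];
    }
    return sum0 + sum1;
  }
}
```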
Re: [PR] speedup arm int functions? [lucene]
rmuir commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-1788910222

Thank you for the explanation; now it just adds another mystery for me. Your x64 is still an i5-11400? I will look into this more. Yes, on the ARM, the assembly didn't look like what I wanted, but the inputs are `byte` and not `short`, which might be relevant. But performance improved, so that means we should investigate wtf is happening and try to take it to the limit :)
Re: [PR] speedup arm int functions? [lucene]
rmuir commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-1788919960

And what is the compiler's excuse this time? :) No floating point here. I should just be able to write a simple loop and get good performance!
Re: [PR] Use value-based LRU cache in NodeHash [lucene]
mikemccand commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378809307

## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:

@@ -107,28 +121,43 @@ public long add(FSTCompiler.UnCompiledNode nodeIn) throws IOException {
     long node = primaryTable.get(pos);
     if (node == 0) {
       // node is not in primary table; is it in fallback table?
-      node = getFallback(nodeIn, hash);
-      if (node != 0) {
+      NodeAddressAndLength addressAndLength = getFallback(nodeIn, hash);
+      if (addressAndLength != null) {
+        node = addressAndLength.address;
         // it was already in fallback -- promote to primary
-        primaryTable.set(pos, node);
+        // TODO: Copy directly between 2 ByteBlockPool to avoid double-copy
+        primaryTable.set(pos, node, fallbackTable.getBytes(node, addressAndLength.length));
       } else {
         // not in fallback either -- freeze & add the incoming node
+        long startAddress = fstCompiler.bytes.getPosition();
         // freeze & add
         node = fstCompiler.addNode(nodeIn);
         // we use 0 as empty marker in hash table, so it better be impossible to get a frozen node
         // at 0:
-        assert node != 0;
+        assert node != FST.FINAL_END_NODE && node != FST.NON_FINAL_END_NODE;
+        byte[] buf = new byte[Math.toIntExact(node - startAddress + 1)];

Review Comment:
Maybe add `// TODO` to change this to pool-to-pool copy to save the extra copy? But we don't need to do this now ... low priority opto.

## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:

@@ -186,132 +214,99 @@ private long hash(FSTCompiler.UnCompiledNode node) {
   [... same removal of hash(long) / nodesEqual as quoted above ...]

   /** Inner class because it needs access to hash function and FST bytes. */
   private class PagedGrowableHash {
-    private PagedGrowableWriter entries;
+    public long copiedBytes;
+    // storing t
Re: [PR] speedup arm int functions? [lucene]
ChrisHegarty commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-1789007292

> Your x64 is still i5-11400 ? I will look into this more.

Yes.
Re: [PR] speedup arm int functions? [lucene]
ChrisHegarty commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-1789014829

jmh prof output from my Linux x64 Rocket Lake: https://gist.github.com/ChrisHegarty/508bb1857cb50df0d757f711c81fd740
Re: [PR] Use value-based LRU cache in NodeHash [lucene]
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378848941

## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:

@@ -107,28 +121,43 @@ public long add(FSTCompiler.UnCompiledNode nodeIn) throws IOException {
   [... same add() hunk as quoted above, ending at `byte[] buf = new byte[Math.toIntExact(node - startAddress + 1)];` ...]

Review Comment:
I have a TODO in `set()` itself, which covers the case of writing from the BytesStore. But I could move it here instead.

## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:

@@ -186,132 +214,99 @@ private long hash(FSTCompiler.UnCompiledNode node) {
   [... same removal of hash(long) / nodesEqual as quoted above ...]

   /** Inner class because it needs access to hash function and FST bytes. */
   private class PagedGrowableHash {
-    private PagedGrowableWriter entries;
+    public long copiedBytes;
+    // storing the FST node address wh
Re: [PR] Use value-based LRU cache in NodeHash [lucene]
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378841828

## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:

@@ -328,7 +323,128 @@ private void rehash(long lastNodeAddress) throws IOException {
       }
       mask = newMask;
-      entries = newEntries;
+      fstHashAddress = newEntries;
+
+      PagedGrowableWriter newCopiedOffsets =
+          new PagedGrowableWriter(
+              newSize, BLOCK_SIZE_BYTES, PackedInts.bitsRequired(copiedBytes), PackedInts.COMPACT);
+      PagedGrowableWriter newFSTOffsets =
+          new PagedGrowableWriter(
+              newSize,
+              BLOCK_SIZE_BYTES,
+              PackedInts.bitsRequired(lastNodeAddress),
+              PackedInts.COMPACT);
+      for (long idx = 0; idx < fstNodeAddress.size(); idx++) {
+        long address = fstNodeAddress.get(idx);
+        if (address != 0) {
+          long pos = Long.hashCode(address) & newMask;
+          while (true) {
+            if (newFSTOffsets.get(pos) == 0) {
+              newFSTOffsets.set(pos, address);
+              newCopiedOffsets.set(pos, copiedNodeAddress.get(idx));
+              break;
+            }
+
+            pos = (pos + 1) & newMask;
+          }
+        }
+      }
+
+      fstNodeAddress = newFSTOffsets;
+      copiedNodeAddress = newCopiedOffsets;
+    }
+
+    // hash code for a frozen node. this must precisely match the hash computation of an unfrozen
+    // node!
+    private long hash(long node) throws IOException {
+      FST.BytesReader in = getBytesReader(node);
+
+      final int PRIME = 31;
+
+      long h = 0;
+      fstCompiler.fst.readFirstRealTargetArc(node, scratchArc, in);
+      while (true) {
+        h = PRIME * h + scratchArc.label();
+        h = PRIME * h + (int) (scratchArc.target() ^ (scratchArc.target() >> 32));
+        h = PRIME * h + scratchArc.output().hashCode();
+        h = PRIME * h + scratchArc.nextFinalOutput().hashCode();
+        if (scratchArc.isFinal()) {
+          h += 17;
+        }
+        if (scratchArc.isLast()) {
+          break;
+        }
+        fstCompiler.fst.readNextRealArc(scratchArc, in);
+      }
+
+      return h;
+    }
+
+    /**
+     * Compares an unfrozen node (UnCompiledNode) with a frozen node at byte location address
+     * (long), returning the local copiedNodes start address if the two nodes are matched, or -1
+     * otherwise
+     */
+    private int getMatchedNodeLength(FSTCompiler.UnCompiledNode node, long address)

Review Comment:
This is actually renamed from `nodesEqual` (which was removed), so there is no duplication. The old behavior is essentially `getMatchedNodeLength != -1`.
Re: [PR] Use value-based LRU cache in NodeHash [lucene]
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378882326 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,132 +214,99 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; } - // hash code for a frozen node. this must precisely match the hash computation of an unfrozen - // node! - private long hash(long node) throws IOException { -final int PRIME = 31; - -long h = 0; -fstCompiler.fst.readFirstRealTargetArc(node, scratchArc, in); -while (true) { - h = PRIME * h + scratchArc.label(); - h = PRIME * h + (int) (scratchArc.target() ^ (scratchArc.target() >> 32)); - h = PRIME * h + scratchArc.output().hashCode(); - h = PRIME * h + scratchArc.nextFinalOutput().hashCode(); - if (scratchArc.isFinal()) { -h += 17; - } - if (scratchArc.isLast()) { -break; - } - fstCompiler.fst.readNextRealArc(scratchArc, in); -} - -return h; - } - - /** - * Compares an unfrozen node (UnCompiledNode) with a frozen node at byte location address (long), - * returning true if they are equal. - */ - private boolean nodesEqual(FSTCompiler.UnCompiledNode node, long address) throws IOException { -fstCompiler.fst.readFirstRealTargetArc(address, scratchArc, in); - -// fail fast for a node with fixed length arcs -if (scratchArc.bytesPerArc() != 0) { - assert node.numArcs > 0; - // the frozen node uses fixed-with arc encoding (same number of bytes per arc), but may be - // sparse or dense - switch (scratchArc.nodeFlags()) { -case FST.ARCS_FOR_BINARY_SEARCH: - // sparse - if (node.numArcs != scratchArc.numArcs()) { -return false; - } - break; -case FST.ARCS_FOR_DIRECT_ADDRESSING: - // dense -- compare both the number of labels allocated in the array (some of which may - // not actually be arcs), and the number of arcs - if ((node.arcs[node.numArcs - 1].label - node.arcs[0].label + 1) != scratchArc.numArcs() - || node.numArcs != FST.Arc.BitTable.countBits(scratchArc, in)) { -return false; - } - break; -default: - throw new AssertionError("unhandled scratchArc.nodeFlag() " + scratchArc.nodeFlags()); - } -} - -// compare arc by arc to see if there is a difference -for (int arcUpto = 0; arcUpto < node.numArcs; arcUpto++) { - final FSTCompiler.Arc arc = node.arcs[arcUpto]; - if (arc.label != scratchArc.label() - || arc.output.equals(scratchArc.output()) == false - || ((FSTCompiler.CompiledNode) arc.target).node != scratchArc.target() - || arc.nextFinalOutput.equals(scratchArc.nextFinalOutput()) == false - || arc.isFinal != scratchArc.isFinal()) { -return false; - } - - if (scratchArc.isLast()) { -if (arcUpto == node.numArcs - 1) { - return true; -} else { - return false; -} - } - - fstCompiler.fst.readNextRealArc(scratchArc, in); -} - -// unfrozen node has fewer arcs than frozen node - -return false; - } - /** Inner class because it needs access to hash function and FST bytes. */ private class PagedGrowableHash { -private PagedGrowableWriter entries; +public long copiedBytes; +// storing the FST node address where the position is the masked hash of the node arcs Review Comment: > I was thinking we could do this instead with two PagedGrowableWriter, both of which being addressed by the hash(fstNodeAddress) % mask? It's a bit messy as that means we need to know the masked hash of the node when computing the hash. So I put that masked hash (computed using the unfrozen nodes) into the hash function. -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[I] Should we not enlarge PagedGrowableWriter initial bitPerValue on NodeHash.rehash() [lucene]
dungba88 opened a new issue, #12744: URL: https://github.com/apache/lucene/issues/12744 ### Description Spawned from https://github.com/apache/lucene/pull/12738 It seems on [rehash](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java#L310C1-L310C56) we will use an initial bitPerValue set to cover the last node address. Does that mean every value, including the ones with low addresses, will use the same bpv as the high-address nodes? PagedGrowableWriter already enlarges the bpv [automatically](https://github.com/apache/lucene/blob/8fa0de2743e87dd264619632978c8dc68323947a/lucene/core/src/java/org/apache/lucene/util/packed/GrowableWriter.java#L86), so maybe we could always set the initial bpv to a small value (like 8?). Let's see if the total size is different when making the change. ### Version and environment details _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
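For concreteness, a sketch of the two sizing strategies being compared. The constructor arguments mirror the usage shown in the rehash diff above; the page-size constant here is a placeholder, not the real value from NodeHash.
```java
import org.apache.lucene.util.packed.PackedInts;
import org.apache.lucene.util.packed.PagedGrowableWriter;

class RehashSizingSketch {
  static final int BLOCK_SIZE_BYTES = 1 << 27; // placeholder page size

  // Current: the initial bitsPerValue covers the highest node address, so slots
  // that will only ever hold small addresses are allocated at the same width.
  static PagedGrowableWriter sizedForMax(long newSize, long lastNodeAddress) {
    return new PagedGrowableWriter(
        newSize, BLOCK_SIZE_BYTES, PackedInts.bitsRequired(lastNodeAddress), PackedInts.COMPACT);
  }

  // Proposed in this issue: start small (e.g. 8 bits) and rely on the writer
  // growing bitsPerValue automatically as larger addresses are written.
  static PagedGrowableWriter startSmall(long newSize) {
    return new PagedGrowableWriter(newSize, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
  }
}
```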
Re: [I] Use max BPV encoding in postings if doc buffer size less than ForUtil.BLOCK_SIZE [lucene]
easyice commented on issue #12717: URL: https://github.com/apache/lucene/issues/12717#issuecomment-1789106681 Oh.. Group-varint is an interesting encoder, I'd love to try it later in the week -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
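For readers unfamiliar with the scheme, a from-scratch sketch of classic group-varint follows. This is not Lucene code (none existed at the time of this comment) and all names are invented: each group of four ints is prefixed by one tag byte holding four 2-bit lengths, and each value is written in 1-4 little-endian bytes.
```java
// Caller must size `out` for the worst case: 1 + 4*4 = 17 bytes per group.
class GroupVarintSketch {

  // number of bytes needed for the unsigned value, 1..4
  static int numBytes(int v) {
    if ((v & ~0xFF) == 0) return 1;
    if ((v & ~0xFFFF) == 0) return 2;
    if ((v & ~0xFFFFFF) == 0) return 3;
    return 4;
  }

  // encodes 4 ints starting at values[offset], returns the new write position
  static int encodeGroup(int[] values, int offset, byte[] out, int outPos) {
    int tagPos = outPos++; // reserve the tag byte
    int tag = 0;
    for (int i = 0; i < 4; i++) {
      int v = values[offset + i];
      int n = numBytes(v);
      tag |= (n - 1) << (i * 2); // 2 bits per value: length - 1
      for (int b = 0; b < n; b++) {
        out[outPos++] = (byte) (v >>> (8 * b)); // little-endian
      }
    }
    out[tagPos] = (byte) tag;
    return outPos;
  }

  // decodes one group of 4 ints, returns the new read position
  static int decodeGroup(byte[] in, int inPos, int[] values, int offset) {
    int tag = in[inPos++] & 0xFF;
    for (int i = 0; i < 4; i++) {
      int n = ((tag >>> (i * 2)) & 3) + 1;
      int v = 0;
      for (int b = 0; b < n; b++) {
        v |= (in[inPos++] & 0xFF) << (8 * b);
      }
      values[offset + i] = v;
    }
    return inPos;
  }
}
```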
Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]
benwtrent commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1789218896 Something weird is happening. This commit is causing the following failure:
```
./gradlew test --tests TestUnifiedHighlighterTermIntervals.testCustomFieldValueSource -Dtests.seed=9E4091CE5D400BA3
```
I am starting to investigate, but folks involved with this change might find the issue quicker.
```
> java.lang.NoSuchMethodError: 'org.apache.lucene.util.automaton.Automaton org.apache.lucene.util.automaton.Automata.makeStringUnion(java.util.Collection)'
>   at org.apache.lucene.search.uhighlight.CharArrayMatcher.fromTerms(CharArrayMatcher.java:42)
>   at org.apache.lucene.search.uhighlight.MemoryIndexOffsetStrategy.buildCombinedAutomaton(MemoryIndexOffsetStrategy.java:71)
>   at org.apache.lucene.search.uhighlight.MemoryIndexOffsetStrategy.<init>(MemoryIndexOffsetStrategy.java:53)
>   at org.apache.lucene.search.uhighlight.UnifiedHighlighter.getOffsetStrategy(UnifiedHighlighter.java:1238)
>   at org.apache.lucene.search.uhighlight.UnifiedHighlighter.getFieldHighlighter(UnifiedHighlighter.java:1072)
>   at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFieldsAsObjects(UnifiedHighlighter.java:880)
>   at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:814)
>   at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:792)
>   at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlight(UnifiedHighlighter.java:725)
>   at org.apache.lucene.search.uhighlight.TestUnifiedHighlighterTermIntervals.testCustomFieldValueSource(TestUnifiedHighlighterTermIntervals.java:607)
>   at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:580)
>   at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
>   at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
>   at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
>   at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
>   at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
>   at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>   at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>   at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>   at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>   at junit@4.13.1/org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>   at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
>   at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
>   at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
>   at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
>   at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
>   at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
>   at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
>   at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>   at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.rules.StatementAdapter.e
```
Re: [PR] speedup arm int functions? [lucene]
rmuir commented on PR #12743: URL: https://github.com/apache/lucene/pull/12743#issuecomment-1789229323 thanks @ChrisHegarty! I will try to dig into this more. Again I don't see speedups on my x86, so something has changed (maybe architecture) that's allowing more to happen in parallel on your CPU or something. wikichip has been down, so I don't know any answers yet, nor whether this can apply to the vector case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]
gsmiller commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1789264349 @benwtrent is this an issue of the code only being partially rebuilt? The signature for `Automata#makeStringUnion` was changed to accept a more general `Iterable` as opposed to a `Collection` in this change. Let me see if I can reproduce it locally as well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
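A toy model of this failure mode (names borrowed from the stack trace above, bodies invented): widening a parameter type from `Collection` to `Iterable` is source-compatible but not binary-compatible, so stale .class files that still reference the old descriptor throw NoSuchMethodError at runtime.
```java
import java.util.Collection;
import java.util.List;

// Library class after the change: parameter widened from Collection to Iterable.
class Automata {
  static String makeStringUnion(Iterable<String> terms) {
    return String.join("|", terms);
  }
}

class Caller {
  public static void main(String[] args) {
    Collection<String> terms = List.of("a", "b");
    // Compiles fine from source either way. But if Caller.class was compiled
    // against the OLD signature, its call site encodes the descriptor
    // makeStringUnion(Ljava/util/Collection;)... and the JVM throws
    // NoSuchMethodError at runtime until Caller is recompiled.
    System.out.println(Automata.makeStringUnion(terms));
  }
}
```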
Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]
benwtrent commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1789266991 @gsmiller you are 100% correct 🤦 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]
gsmiller commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1789348091 @benwtrent glad it was a simple fix. Sorry it created churn! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] byte to int in TruncateTokenFilterFactory to TruncateTokenFilter [lucene]
asubbu90 commented on issue #12449: URL: https://github.com/apache/lucene/issues/12449#issuecomment-1789370514 Hi @robro612, as you can see, I have already opened PR #12507 for this issue. Do you want more context on it? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]
uschindler commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1789483634 Hi @rmuir, I am fine with both approaches. Let me just create a new PR with some changes for the Constants.java class and a separate pkg-private utility class for looking up JVM args. This would allow us to handle and log a warning if for some reason the jdk.management module is not available. RAMUsageEstimator has that code and I will refactor it to a separate class. I will also move the current lookup of the Client VM via sysprop to Constants.java (a constant like `IS_CLIENT_VM`), so we can refer to it at runtime to figure out if the JVM has C2 available. The guards like security exceptions and all that stuff will be handled there, too. The checks in Panama and in this PR can then all be done centrally, so you do not need the crazy reflective code. I plan to add a `static Optional getVMOption(String name)` which returns an empty optional if the option does not exist or the module system / security manager hides it. After that, the logic from Constants.java can be used here and this PR gets easier. I agree to merge it afterwards, after adding the C2 check. All fine then? I don't want to argue here about the pros and cons of enabling FMA by default. My standpoint is just different. We have a lot of code in Lucene which may run slow if you choose the wrong garbage collector, a buggy Java version, or an unlucky Hotspot/CPU combination. What happens with our scalar code or PFOR if you run it on a Core i3 from 2012?!? We can't prevent all slowdowns. If people see slowdowns on AMD or ARM, they have the "easy option" to disable it by passing Hotspot arguments. This can easily be stated in the documentation of Solr or Elasticsearch. People that care about speed need to tune anyway. My problem here is that you add another layer of optimization settings that conflict with the knobs the end user already has. If you disable FMA by default on ARM or AMD there must be an option to turn it on (e.g., a sysprop). About the benchmarks: There is a slight slowdown and a slight speedup on my Opteron. The stddev is not too large, but yes, it could go in both directions. Unfortunately I cannot easily test it, because this PR blocks me from forcefully enabling FMA. And that's my big problem: that's not future proof. Why do you want to block people still using Lucene 9.9 in 2 years, when maybe all modern-generation AMD CPUs are fast? It is not Lucene's responsibility to question defaults set by the JDK. If FMA is so bad on some CPUs (as you argue), why does Hotspot not disable it by default? Please leave the option open to the user. If Elasticsearch is afraid of slowing down their users, they can add `-XX:-UseFMA` to their standard jvm-options.properties file shipped with their distribution. This allows users to turn it on/off. I agree that we should not use Panama or FMA if C2 is completely disabled. But for everything else we have the correct knobs. This is why I strongly -1 disabling FMA on AMD (and also ARM) by default without the possibility to turn it back on. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
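A rough sketch of what such a `getVMOption` helper might look like (an assumption about the eventual PR, not its actual code): it leans on `com.sun.management.HotSpotDiagnosticMXBean` from the jdk.management module and degrades to an empty Optional when anything is unavailable.
```java
import java.lang.management.ManagementFactory;
import java.util.Optional;

final class VMOptions {
  // Returns the value of a HotSpot VM option (e.g. "UseFMA"), or empty if the
  // option does not exist, jdk.management is missing, a SecurityException is
  // thrown, or we are not running on a HotSpot-compatible VM.
  static Optional<String> getVMOption(String name) {
    try {
      var bean =
          ManagementFactory.getPlatformMXBean(com.sun.management.HotSpotDiagnosticMXBean.class);
      if (bean == null) {
        return Optional.empty();
      }
      return Optional.of(bean.getVMOption(name).getValue());
    } catch (Throwable t) {
      // IllegalArgumentException (unknown option), SecurityException,
      // NoClassDefFoundError (module not resolved), ...
      return Optional.empty();
    }
  }
}
```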
Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]
dweiss commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1789519097 See #12742 - it's a problem with out javac task settings. Running a clean is a workaround. I will provide a fix for this when I get a chance, it's not a critical issue (but it is a bug in how we currently set up task dependencies). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] Fix javac task inputs so that they include modular dependencies #12742 [lucene]
dweiss opened a new pull request, #12745: URL: https://github.com/apache/lucene/pull/12745 Fixes #12742. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] JavaCompile tasks may be in up-to-date state when modular dependencies have changed leading to odd runtime errors [lucene]
dweiss commented on issue #12742: URL: https://github.com/apache/lucene/issues/12742#issuecomment-1789587275 I've provided a PR that fixes this. It is a corner case of us providing custom javac parameters (modular classpath). I am surprised this took so long to be discovered and I apologize - it's an oversight - we did have a dependsOn clause but it's different from task inputs (dependsOn is more of a task graph scheduling hint, inputs is a task _status_ hint). I've checked the PR by running Hoss's repro:
```
git co 11436a848cbcc8302b31ca01e63e57dd35a93b1e
./gradlew clean classes testClasses
git co [head of PR branch]
./gradlew test --tests TestFlattenGraphFilter
```
everything works and the code is recompiled as it should be now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]
jpountz commented on code in PR #12729: URL: https://github.com/apache/lucene/pull/12729#discussion_r1379282330 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -399,41 +281,30 @@ private HnswGraph getGraph(FieldEntry entry) throws IOException { @Override public void close() throws IOException { -IOUtils.close(vectorData, vectorIndex, quantizedVectorData); +IOUtils.close(flatVectorsReader, vectorIndex); } @Override - public OffHeapQuantizedByteVectorValues getQuantizedVectorValues(String field) - throws IOException { -FieldEntry fieldEntry = fields.get(field); -if (fieldEntry == null || fieldEntry.hasQuantizedVectors() == false) { - return null; + public QuantizedByteVectorValues getQuantizedVectorValues(String field) throws IOException { +if (flatVectorsReader instanceof QuantizedVectorsReader) { + return ((QuantizedVectorsReader) flatVectorsReader).getQuantizedVectorValues(field); Review Comment: Bump: this sounds like a good addition to the flat vectors reader? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
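One possible shape for that suggestion (a hypothetical sketch with stand-in types; the real `FlatVectorsReader` abstraction is not shown in this hunk): give the base class a default accessor so the HNSW reader can delegate without the `instanceof` check.
```java
import java.io.Closeable;
import java.io.IOException;

// Stand-in for the real Lucene type, just to make the shape concrete.
interface QuantizedByteVectorValues {}

abstract class FlatVectorsReader implements Closeable {
  // Default for formats that do not quantize: no values for any field.
  public QuantizedByteVectorValues getQuantizedVectorValues(String field) throws IOException {
    return null;
  }
}

class HnswVectorsReaderSketch {
  private final FlatVectorsReader flatVectorsReader;

  HnswVectorsReaderSketch(FlatVectorsReader flatVectorsReader) {
    this.flatVectorsReader = flatVectorsReader;
  }

  // Delegation no longer needs instanceof + cast:
  public QuantizedByteVectorValues getQuantizedVectorValues(String field) throws IOException {
    return flatVectorsReader.getQuantizedVectorValues(field);
  }
}
```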
[I] TestDeletionPolicy.testOpenPriorSnapshot failing [lucene]
benwtrent opened a new issue, #12746: URL: https://github.com/apache/lucene/issues/12746 ### Description TestDeletionPolicy.testOpenPriorSnapshot fails. The assertion fails when assuming we are on a previous commit with more than one leaf. Fails on main and 9.x. Stack trace:
```
org.apache.lucene.index.TestDeletionPolicy > testOpenPriorSnapshot FAILED
java.lang.AssertionError
  at __randomizedtesting.SeedInfo.seed([5B24813C1DE35AA2:32FD48B202239F7A]:0)
  at org.junit.Assert.fail(Assert.java:87)
  at org.junit.Assert.assertTrue(Assert.java:42)
  at org.junit.Assert.assertTrue(Assert.java:53)
  at org.apache.lucene.index.TestDeletionPolicy.testOpenPriorSnapshot(TestDeletionPolicy.java:524)
  at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.base/java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.base/java.lang.Thread.run(Thread.java:1583)
```
### Gradle command to reproduce
```
./gradlew :lucene:core:test --tests "org.apache.lucene.index.TestDeletionPolicy.testOpenPriorSnapshot" -Ptests.jvms=12 -P
```
Re: [I] TestDeletionPolicy.testOpenPriorSnapshot failing [lucene]
benwtrent commented on issue #12746: URL: https://github.com/apache/lucene/issues/12746#issuecomment-1789639897 I verified this doesn't happen in 9.8. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] TestDeletionPolicy.testOpenPriorSnapshot failing [lucene]
jpountz closed issue #12746: TestDeletionPolicy.testOpenPriorSnapshot failing URL: https://github.com/apache/lucene/issues/12746 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] TestDeletionPolicy.testOpenPriorSnapshot failing [lucene]
jpountz commented on issue #12746: URL: https://github.com/apache/lucene/issues/12746#issuecomment-1789651692 Thanks for reporting, I pushed a fix at https://github.com/apache/lucene/commit/66324f763fc7fb0d8e7cd6f334e5438f0171c84e. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Speed up disjunctions by computing estimations of the score of the k-th top hit up-front. [lucene]
jpountz commented on PR #12526: URL: https://github.com/apache/lucene/pull/12526#issuecomment-1789662266 @mikemccand FYI I gave a try at adding some interesting boolean queries to nightly benchmarks at https://github.com/mikemccand/luceneutil/pull/240. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Clean up UnCompiledNode.inputCount [lucene]
dungba88 closed pull request #12735: Clean up UnCompiledNode.inputCount URL: https://github.com/apache/lucene/pull/12735 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Use value-based LRU cache in NodeHash [lucene]
dungba88 commented on code in PR #12738: URL: https://github.com/apache/lucene/pull/12738#discussion_r1378882326 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -186,132 +214,99 @@ private long hash(FSTCompiler.UnCompiledNode node) { return h; } - // hash code for a frozen node. this must precisely match the hash computation of an unfrozen - // node! - private long hash(long node) throws IOException { -final int PRIME = 31; - -long h = 0; -fstCompiler.fst.readFirstRealTargetArc(node, scratchArc, in); -while (true) { - h = PRIME * h + scratchArc.label(); - h = PRIME * h + (int) (scratchArc.target() ^ (scratchArc.target() >> 32)); - h = PRIME * h + scratchArc.output().hashCode(); - h = PRIME * h + scratchArc.nextFinalOutput().hashCode(); - if (scratchArc.isFinal()) { -h += 17; - } - if (scratchArc.isLast()) { -break; - } - fstCompiler.fst.readNextRealArc(scratchArc, in); -} - -return h; - } - - /** - * Compares an unfrozen node (UnCompiledNode) with a frozen node at byte location address (long), - * returning true if they are equal. - */ - private boolean nodesEqual(FSTCompiler.UnCompiledNode node, long address) throws IOException { -fstCompiler.fst.readFirstRealTargetArc(address, scratchArc, in); - -// fail fast for a node with fixed length arcs -if (scratchArc.bytesPerArc() != 0) { - assert node.numArcs > 0; - // the frozen node uses fixed-with arc encoding (same number of bytes per arc), but may be - // sparse or dense - switch (scratchArc.nodeFlags()) { -case FST.ARCS_FOR_BINARY_SEARCH: - // sparse - if (node.numArcs != scratchArc.numArcs()) { -return false; - } - break; -case FST.ARCS_FOR_DIRECT_ADDRESSING: - // dense -- compare both the number of labels allocated in the array (some of which may - // not actually be arcs), and the number of arcs - if ((node.arcs[node.numArcs - 1].label - node.arcs[0].label + 1) != scratchArc.numArcs() - || node.numArcs != FST.Arc.BitTable.countBits(scratchArc, in)) { -return false; - } - break; -default: - throw new AssertionError("unhandled scratchArc.nodeFlag() " + scratchArc.nodeFlags()); - } -} - -// compare arc by arc to see if there is a difference -for (int arcUpto = 0; arcUpto < node.numArcs; arcUpto++) { - final FSTCompiler.Arc arc = node.arcs[arcUpto]; - if (arc.label != scratchArc.label() - || arc.output.equals(scratchArc.output()) == false - || ((FSTCompiler.CompiledNode) arc.target).node != scratchArc.target() - || arc.nextFinalOutput.equals(scratchArc.nextFinalOutput()) == false - || arc.isFinal != scratchArc.isFinal()) { -return false; - } - - if (scratchArc.isLast()) { -if (arcUpto == node.numArcs - 1) { - return true; -} else { - return false; -} - } - - fstCompiler.fst.readNextRealArc(scratchArc, in); -} - -// unfrozen node has fewer arcs than frozen node - -return false; - } - /** Inner class because it needs access to hash function and FST bytes. */ private class PagedGrowableHash { -private PagedGrowableWriter entries; +public long copiedBytes; +// storing the FST node address where the position is the masked hash of the node arcs Review Comment: > I was thinking we could do this instead with two PagedGrowableWriter, both of which being addressed by the hash(fstNodeAddress) % mask? It's a bit tricky as that means we need to know the masked hash of the node when computing the hash. So I put that masked hash (computed using the unfrozen nodes) into the hash function. A bit messy but it removes one hash table. -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
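A toy model of the two-parallel-table layout discussed here (plain arrays stand in for the `PagedGrowableWriter`s, and `Long.hashCode` mirrors the probing in the `rehash` diff above; the real code hashes the node's arcs):
```java
// Toy (power-of-two size, node address 0 = empty slot, load factor < 1 assumed):
// one probed slot index keys both the node-address and the copied-offset table.
class ParallelTablesSketch {
  final long[] fstNodeAddress;    // "key" table: frozen FST node addresses
  final long[] copiedNodeAddress; // "value" table: same slot -> copied-bytes offset
  final long mask;

  ParallelTablesSketch(int sizePowerOfTwo) {
    fstNodeAddress = new long[sizePowerOfTwo];
    copiedNodeAddress = new long[sizePowerOfTwo];
    mask = sizePowerOfTwo - 1;
  }

  void put(long nodeAddress, long copiedOffset) {
    long pos = Long.hashCode(nodeAddress) & mask; // masked hash, as in rehash()
    while (fstNodeAddress[(int) pos] != 0) {
      pos = (pos + 1) & mask; // linear probing keeps both tables' slots aligned
    }
    fstNodeAddress[(int) pos] = nodeAddress;
    copiedNodeAddress[(int) pos] = copiedOffset; // no separate lookup structure needed
  }

  long getCopiedOffset(long nodeAddress) {
    long pos = Long.hashCode(nodeAddress) & mask;
    while (fstNodeAddress[(int) pos] != 0) {
      if (fstNodeAddress[(int) pos] == nodeAddress) {
        return copiedNodeAddress[(int) pos]; // value read from the same slot
      }
      pos = (pos + 1) & mask;
    }
    return -1;
  }
}
```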
Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1789905123 OK, sounds good to me @uschindler. I agree the user should be in control. And I'd like to avoid us adding tons of flags for this stuff unless we need it for testing; that's just my opinion. So I really like respecting JVM flags for stuff like UseFMA too. But the existing UseFMA has bad defaults in the JDK, that is my problem. Sure, you get a little extra precision "for free" if you are doing trivial stuff in a serial loop, but if you are trying to max out performance, it only makes sense for Intel CPUs from what I've seen so far. When trying to max out the throughput on Intel, you need FMA on. When it's AMD or ARM NEON, you need to use mul+add to keep everything busier. And I don't recommend any user TURN OFF UseFMA for any reason, ever. Chances are they are doing more than just using the Lucene library in their application, and this could have disastrous consequences and slowdowns. So I agree with a specific sysprop just for this case. It is a unique situation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
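For context, the two inner-loop shapes being contrasted look roughly like this with the Panama vector API (a sketch assuming a simple non-unrolled loop with no tail handling; Lucene's real kernels unroll and handle tails, and compiling this requires `--add-modules jdk.incubator.vector`):
```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

final class DotProductKernels {
  static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

  // FMA shape: one fused multiply-add per step; per the discussion above this
  // is what maxes out throughput on recent Intel.
  static float dotFma(float[] a, float[] b) {
    FloatVector acc = FloatVector.zero(SPECIES);
    int bound = SPECIES.loopBound(a.length);
    for (int i = 0; i < bound; i += SPECIES.length()) {
      FloatVector va = FloatVector.fromArray(SPECIES, a, i);
      FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
      acc = va.fma(vb, acc); // acc + va * vb, fused
    }
    return acc.reduceLanes(VectorOperators.ADD);
  }

  // mul+add shape: two cheaper ops that can issue on separate execution ports,
  // which is reportedly what keeps AMD / ARM NEON pipelines busier.
  static float dotMulAdd(float[] a, float[] b) {
    FloatVector acc = FloatVector.zero(SPECIES);
    int bound = SPECIES.loopBound(a.length);
    for (int i = 0; i < bound; i += SPECIES.length()) {
      FloatVector va = FloatVector.fromArray(SPECIES, a, i);
      FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
      acc = acc.add(va.mul(vb));
    }
    return acc.reduceLanes(VectorOperators.ADD);
  }
}
```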
[PR] stabilize vectorutil benchmark [lucene]
rmuir opened a new pull request, #12747: URL: https://github.com/apache/lucene/pull/12747 This benchmark is too noisy across forks, which makes comparisons impossible and misleading. Especially for the vectorized float methods on some machines: it is almost useless. I spent some time reducing the noise; unfortunately it results in the benchmark being 2x slower. We can do tricky JVM args, but what is best is to just do many forks. I now see reasonably stable results on Graviton3E with 256-bit ARM SVE (!) and look forward to actually investigating this thing more than just stabilizing this benchmark :)
```
Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.binaryCosineScalar        1024  thrpt   15   0.842 ± 0.001  ops/us
VectorUtilBenchmark.binaryCosineVector        1024  thrpt   15   4.811 ± 0.006  ops/us
VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt   15   2.370 ± 0.001  ops/us
VectorUtilBenchmark.binaryDotProductVector    1024  thrpt   15   8.037 ± 0.001  ops/us
VectorUtilBenchmark.binarySquareScalar        1024  thrpt   15   2.450 ± 0.034  ops/us
VectorUtilBenchmark.binarySquareVector        1024  thrpt   15   6.703 ± 0.010  ops/us
VectorUtilBenchmark.floatCosineScalar         1024  thrpt   15   0.682 ± 0.001  ops/us
VectorUtilBenchmark.floatCosineVector         1024  thrpt   75   5.498 ± 0.004  ops/us
VectorUtilBenchmark.floatDotProductScalar     1024  thrpt   15   2.409 ± 0.013  ops/us
VectorUtilBenchmark.floatDotProductVector     1024  thrpt   75  12.446 ± 0.343  ops/us
VectorUtilBenchmark.floatSquareScalar         1024  thrpt   15   2.172 ± 0.007  ops/us
VectorUtilBenchmark.floatSquareVector         1024  thrpt   75  11.655 ± 0.152  ops/us
```
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
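In JMH terms, the fork setup described above looks something like this (illustrative annotations and a stand-in benchmark body, not the PR's actual file). Note how the Cnt column above follows from it: 3 forks x 5 measurement iterations = 15, and 15 forks x 5 = 75.
```java
import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 5, time = 1)
@Fork(3) // minimum 3 forks everywhere: cross-fork noise shows up in the error column
public class VectorUtilBenchmarkSketch {
  float[] a = new float[1024];
  float[] b = new float[1024];

  @Setup(Level.Trial)
  public void init() {
    Random r = new Random(42);
    for (int i = 0; i < a.length; i++) {
      a[i] = r.nextFloat();
      b[i] = r.nextFloat();
    }
  }

  @Benchmark
  public float floatDotProductScalar() {
    float sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  @Benchmark
  @Fork(15) // the noisy vectorized float methods get many more forks
  public float floatDotProductVector() {
    return floatDotProductScalar(); // stand-in body; the real one calls the Panama kernel
  }
}
```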
Re: [PR] stabilize vectorutil benchmark [lucene]
rmuir commented on PR #12747: URL: https://github.com/apache/lucene/pull/12747#issuecomment-1790041666 And I know it is annoying that it runs slower; I really tried not to be overkill, but some methods here are super simple and super stable and some are unstable. So I gave all methods a minimum of 3 forks for safety: this way, if there is noise between forks, we actually see that (!). Those vector float methods get 15 forks for now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Speed up vectorutil float scalar methods, unroll properly, use fma where possible [lucene]
rmuir commented on PR #12737: URL: https://github.com/apache/lucene/pull/12737#issuecomment-1790097686 ARM Graviton3E (256-bit SVE) Master (well #12747 for sanity):
```
Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   0.682 ± 0.001  ops/us
VectorUtilBenchmark.floatCosineVector        1024  thrpt   75   5.498 ± 0.004  ops/us
VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   2.409 ± 0.013  ops/us
VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  12.446 ± 0.343  ops/us
VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   2.172 ± 0.007  ops/us
VectorUtilBenchmark.floatSquareVector        1024  thrpt   75  11.655 ± 0.152  ops/us
```
Patch
```
Benchmark                                  (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.floatCosineScalar        1024  thrpt   15   1.423 ± 0.001  ops/us
VectorUtilBenchmark.floatCosineVector        1024  thrpt   75   8.882 ± 0.052  ops/us
VectorUtilBenchmark.floatDotProductScalar    1024  thrpt   15   3.751 ± 0.006  ops/us
VectorUtilBenchmark.floatDotProductVector    1024  thrpt   75  13.033 ± 0.056  ops/us
VectorUtilBenchmark.floatSquareScalar        1024  thrpt   15   3.201 ± 0.014  ops/us
VectorUtilBenchmark.floatSquareVector        1024  thrpt   75  12.943 ± 0.250  ops/us
```
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] StringsToAutomaton#build to take List as parameter instead of Collection [lucene]
shubhamvishu commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1790129043 Nice! I like how we discovered a build bug (which was hidden for some time now, I think) and fixed it. Thanks for solving it, @dweiss! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org