Re: [PR] Udpate ReadTask to not rely on search(Query, Collector) [lucene]
javanna commented on code in PR #13602: URL: https://github.com/apache/lucene/pull/13602#discussion_r1696627499 ## lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/SearchWithCollectorTask.java: ## @@ -46,17 +46,17 @@ public boolean withCollector() { } @Override - protected Collector createCollector() throws Exception { -Collector collector = null; + protected CollectorManager createCollectorManager() throws Exception { +CollectorManager collectorManager; if (clnName.equalsIgnoreCase("topScoreDoc") == true) { - collector = TopScoreDocCollector.create(numHits(), Integer.MAX_VALUE); + collectorManager = new TopScoreDocCollectorManager(numHits(), Integer.MAX_VALUE); } else if (clnName.length() > 0) { - collector = Class.forName(clnName).asSubclass(Collector.class).getConstructor().newInstance(); - + collectorManager = + Class.forName(clnName).asSubclass(CollectorManager.class).getConstructor().newInstance(); } else { - collector = super.createCollector(); + collectorManager = super.createCollectorManager(); } -return collector; +return collectorManager; Review Comment: I have now added a changes entry, and replaced collector.class with a new config parameter. We throw error once collector.class is provided, so we fail fast and users know what they have to do. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Udpate ReadTask to not rely on search(Query, Collector) [lucene]
javanna commented on code in PR #13602: URL: https://github.com/apache/lucene/pull/13602#discussion_r1696628573 ## lucene/benchmark/conf/collector.alg: ## @@ -17,11 +17,10 @@ # - # multi val params are iterated by NewRound's, added to reports, start with column name. -# collector.class can be: -#Fully Qualified Class Name of a Collector with a empty constructor -#topScoreDocOrdered - Creates a TopScoreDocCollector that requires in order docs -#topScoreDocUnordered - Like above, but allows out of order -collector.class=coll:topScoreDoc +# collector.manager.class can be: +#Fully Qualified Class Name of a CollectorManager with a empty constructor +#topScoreDoc - Creates a TopScoreDocCollectorManager +collector.manager.class=coll:topScoreDoc Review Comment: This looked like it needed updating following my changes, but I could not tell where these files are loaded, and what I should run to verify that my update works. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Introduce IndexSearcher#searchLeaf(LeafReaderContext, Weight, Collector) method [lucene]
javanna commented on PR #13603: URL: https://github.com/apache/lucene/pull/13603#issuecomment-2257899247 Thanks a lot for your review @gsmiller , much appreciated, and for the reminder around the CHANGES entry. Would you like to review the wording perhaps? Otherwise, this should be good to go. My thinking is that it can be backported to branch_9x as well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Remove the @Deprecated methods from TopScoreDocCollector and TopFieldCollector [lucene]
javanna closed issue #13499: Remove the @Deprecated methods from TopScoreDocCollector and TopFieldCollector URL: https://github.com/apache/lucene/issues/13499 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Removing all deprecated TopScoreDocCollector + TopFieldCollector methods (#create, #createSharedManager) [lucene]
javanna merged PR #13617: URL: https://github.com/apache/lucene/pull/13617 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Remove the @Deprecated methods from TopScoreDocCollector and TopFieldCollector [lucene]
javanna closed issue #13499: Remove the @Deprecated methods from TopScoreDocCollector and TopFieldCollector URL: https://github.com/apache/lucene/issues/13499 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Removing all deprecated TopScoreDocCollector + TopFieldCollector methods (#create, #createSharedManager) [lucene]
javanna commented on PR #13617: URL: https://github.com/apache/lucene/pull/13617#issuecomment-2257918667 Thanks again @slow-J . Heads up, I moved the changes entry to the "API changes" section above after merging, I did not catch this while reviewing :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] SparseFixedBitSet#firstDoc: reduce number of `indices` iterations for a bit set that is not fully built yet. [lucene]
epotyom commented on PR #13559: URL: https://github.com/apache/lucene/pull/13559#issuecomment-2258002016 I've added new `BitSet#firstSetBitInRange` method with maybe naive implementations for both `SparseFixedBitSet` and `FixedBitSet`. I've also changed `BlockJoinSelector` to use it. `luceneutil` benchmark results seem to be neutral, but as far as I understand there are no tasks that trigger `BlockJoinSelector`? I'll run our internal tests to see if it helps, and will add a line to CHANGES if otherwise the change looks good. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Removing all deprecated TopScoreDocCollector + TopFieldCollector methods (#create, #createSharedManager) [lucene]
slow-J commented on PR #13617: URL: https://github.com/apache/lucene/pull/13617#issuecomment-2258011155 My bad, sorry for the mistake, thanks for catching it and the review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Inline skip data into postings lists [lucene]
jpountz commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1696774526 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -0,0 +1,2028 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.codecs.lucene912; + +import static org.apache.lucene.codecs.lucene912.ForUtil.BLOCK_SIZE; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.DOC_CODEC; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.PAY_CODEC; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.POS_CODEC; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.SKIP_TOTAL_SIZE; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.TERMS_CODEC; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.VERSION_CURRENT; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.VERSION_START; + +import java.io.IOException; +import java.util.AbstractList; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.RandomAccess; +import org.apache.lucene.codecs.BlockTermState; +import org.apache.lucene.codecs.CodecUtil; +import org.apache.lucene.codecs.PostingsReaderBase; +import org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.IntBlockTermState; +import org.apache.lucene.index.FieldInfo; +import org.apache.lucene.index.Impact; +import org.apache.lucene.index.Impacts; +import org.apache.lucene.index.ImpactsEnum; +import org.apache.lucene.index.IndexFileNames; +import org.apache.lucene.index.IndexOptions; +import org.apache.lucene.index.PostingsEnum; +import org.apache.lucene.index.SegmentReadState; +import org.apache.lucene.index.SlowImpactsEnum; +import org.apache.lucene.search.DocIdSetIterator; +import org.apache.lucene.store.ByteArrayDataInput; +import org.apache.lucene.store.DataInput; +import org.apache.lucene.store.IndexInput; +import org.apache.lucene.store.ReadAdvice; +import org.apache.lucene.util.ArrayUtil; +import org.apache.lucene.util.BitUtil; +import org.apache.lucene.util.BytesRef; +import org.apache.lucene.util.BytesRefBuilder; +import org.apache.lucene.util.IOUtils; + +/** + * Concrete class that reads docId(maybe frq,pos,offset,payloads) list with postings format. + * + * @lucene.experimental + */ +public final class Lucene912PostingsReader extends PostingsReaderBase { + + private final IndexInput docIn; + private final IndexInput posIn; + private final IndexInput payIn; + + private final int version; + + /** Sole constructor. */ + public Lucene912PostingsReader(SegmentReadState state) throws IOException { +boolean success = false; +IndexInput docIn = null; +IndexInput posIn = null; +IndexInput payIn = null; + +// NOTE: these data files are too costly to verify checksum against all the bytes on open, +// but for now we at least verify proper structure of the checksum footer: which looks +// for FOOTER_MAGIC + algorithmID. This is cheap and can detect some forms of corruption +// such as file truncation. + +String docName = +IndexFileNames.segmentFileName( +state.segmentInfo.name, state.segmentSuffix, Lucene912PostingsFormat.DOC_EXTENSION); +try { + // Postings have a forward-only access pattern, so pass ReadAdvice.NORMAL to perform + // readahead. + docIn = state.directory.openInput(docName, state.context.withReadAdvice(ReadAdvice.NORMAL)); + version = + CodecUtil.checkIndexHeader( + docIn, + DOC_CODEC, + VERSION_START, + VERSION_CURRENT, + state.segmentInfo.getId(), + state.segmentSuffix); + CodecUtil.retrieveChecksum(docIn); + + if (state.fieldInfos.hasProx()) { +String proxName = +IndexFileNames.segmentFileName( +state.segmentInfo.name, state.segmentSuffix, Lucene912PostingsFormat.POS_EXTENSION); +posIn = state.directory.openInput(proxName, state.context); +CodecUtil.checkIndexHeader( +posIn, POS_CODEC, version, v
Re: [PR] Inline skip data into postings lists [lucene]
jpountz commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1696776133 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -0,0 +1,2028 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.codecs.lucene912; + +import static org.apache.lucene.codecs.lucene912.ForUtil.BLOCK_SIZE; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.DOC_CODEC; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.PAY_CODEC; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.POS_CODEC; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.SKIP_TOTAL_SIZE; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.TERMS_CODEC; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.VERSION_CURRENT; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.VERSION_START; + +import java.io.IOException; +import java.util.AbstractList; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.RandomAccess; +import org.apache.lucene.codecs.BlockTermState; +import org.apache.lucene.codecs.CodecUtil; +import org.apache.lucene.codecs.PostingsReaderBase; +import org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.IntBlockTermState; +import org.apache.lucene.index.FieldInfo; +import org.apache.lucene.index.Impact; +import org.apache.lucene.index.Impacts; +import org.apache.lucene.index.ImpactsEnum; +import org.apache.lucene.index.IndexFileNames; +import org.apache.lucene.index.IndexOptions; +import org.apache.lucene.index.PostingsEnum; +import org.apache.lucene.index.SegmentReadState; +import org.apache.lucene.index.SlowImpactsEnum; +import org.apache.lucene.search.DocIdSetIterator; +import org.apache.lucene.store.ByteArrayDataInput; +import org.apache.lucene.store.DataInput; +import org.apache.lucene.store.IndexInput; +import org.apache.lucene.store.ReadAdvice; +import org.apache.lucene.util.ArrayUtil; +import org.apache.lucene.util.BitUtil; +import org.apache.lucene.util.BytesRef; +import org.apache.lucene.util.BytesRefBuilder; +import org.apache.lucene.util.IOUtils; + +/** + * Concrete class that reads docId(maybe frq,pos,offset,payloads) list with postings format. + * + * @lucene.experimental + */ +public final class Lucene912PostingsReader extends PostingsReaderBase { + + private final IndexInput docIn; + private final IndexInput posIn; + private final IndexInput payIn; + + private final int version; + + /** Sole constructor. */ + public Lucene912PostingsReader(SegmentReadState state) throws IOException { +boolean success = false; +IndexInput docIn = null; +IndexInput posIn = null; +IndexInput payIn = null; + +// NOTE: these data files are too costly to verify checksum against all the bytes on open, +// but for now we at least verify proper structure of the checksum footer: which looks +// for FOOTER_MAGIC + algorithmID. This is cheap and can detect some forms of corruption +// such as file truncation. + +String docName = +IndexFileNames.segmentFileName( +state.segmentInfo.name, state.segmentSuffix, Lucene912PostingsFormat.DOC_EXTENSION); +try { + // Postings have a forward-only access pattern, so pass ReadAdvice.NORMAL to perform + // readahead. + docIn = state.directory.openInput(docName, state.context.withReadAdvice(ReadAdvice.NORMAL)); + version = + CodecUtil.checkIndexHeader( + docIn, + DOC_CODEC, + VERSION_START, + VERSION_CURRENT, + state.segmentInfo.getId(), + state.segmentSuffix); + CodecUtil.retrieveChecksum(docIn); + + if (state.fieldInfos.hasProx()) { +String proxName = +IndexFileNames.segmentFileName( +state.segmentInfo.name, state.segmentSuffix, Lucene912PostingsFormat.POS_EXTENSION); +posIn = state.directory.openInput(proxName, state.context); +CodecUtil.checkIndexHeader( +posIn, POS_CODEC, version, v
Re: [PR] Inline skip data into postings lists [lucene]
jpountz commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1696777219 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -0,0 +1,2028 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.codecs.lucene912; + +import static org.apache.lucene.codecs.lucene912.ForUtil.BLOCK_SIZE; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.DOC_CODEC; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.PAY_CODEC; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.POS_CODEC; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.SKIP_TOTAL_SIZE; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.TERMS_CODEC; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.VERSION_CURRENT; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.VERSION_START; + +import java.io.IOException; +import java.util.AbstractList; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.RandomAccess; +import org.apache.lucene.codecs.BlockTermState; +import org.apache.lucene.codecs.CodecUtil; +import org.apache.lucene.codecs.PostingsReaderBase; +import org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.IntBlockTermState; +import org.apache.lucene.index.FieldInfo; +import org.apache.lucene.index.Impact; +import org.apache.lucene.index.Impacts; +import org.apache.lucene.index.ImpactsEnum; +import org.apache.lucene.index.IndexFileNames; +import org.apache.lucene.index.IndexOptions; +import org.apache.lucene.index.PostingsEnum; +import org.apache.lucene.index.SegmentReadState; +import org.apache.lucene.index.SlowImpactsEnum; +import org.apache.lucene.search.DocIdSetIterator; +import org.apache.lucene.store.ByteArrayDataInput; +import org.apache.lucene.store.DataInput; +import org.apache.lucene.store.IndexInput; +import org.apache.lucene.store.ReadAdvice; +import org.apache.lucene.util.ArrayUtil; +import org.apache.lucene.util.BitUtil; +import org.apache.lucene.util.BytesRef; +import org.apache.lucene.util.BytesRefBuilder; +import org.apache.lucene.util.IOUtils; + +/** + * Concrete class that reads docId(maybe frq,pos,offset,payloads) list with postings format. + * + * @lucene.experimental + */ +public final class Lucene912PostingsReader extends PostingsReaderBase { + + private final IndexInput docIn; + private final IndexInput posIn; + private final IndexInput payIn; + + private final int version; + + /** Sole constructor. */ + public Lucene912PostingsReader(SegmentReadState state) throws IOException { +boolean success = false; +IndexInput docIn = null; +IndexInput posIn = null; +IndexInput payIn = null; + +// NOTE: these data files are too costly to verify checksum against all the bytes on open, +// but for now we at least verify proper structure of the checksum footer: which looks +// for FOOTER_MAGIC + algorithmID. This is cheap and can detect some forms of corruption +// such as file truncation. + +String docName = +IndexFileNames.segmentFileName( +state.segmentInfo.name, state.segmentSuffix, Lucene912PostingsFormat.DOC_EXTENSION); +try { + // Postings have a forward-only access pattern, so pass ReadAdvice.NORMAL to perform + // readahead. + docIn = state.directory.openInput(docName, state.context.withReadAdvice(ReadAdvice.NORMAL)); + version = + CodecUtil.checkIndexHeader( + docIn, + DOC_CODEC, + VERSION_START, + VERSION_CURRENT, + state.segmentInfo.getId(), + state.segmentSuffix); + CodecUtil.retrieveChecksum(docIn); + + if (state.fieldInfos.hasProx()) { +String proxName = +IndexFileNames.segmentFileName( +state.segmentInfo.name, state.segmentSuffix, Lucene912PostingsFormat.POS_EXTENSION); +posIn = state.directory.openInput(proxName, state.context); +CodecUtil.checkIndexHeader( +posIn, POS_CODEC, version, v
Re: [PR] Inline skip data into postings lists [lucene]
jpountz commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1696780288 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -0,0 +1,2028 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.codecs.lucene912; + +import static org.apache.lucene.codecs.lucene912.ForUtil.BLOCK_SIZE; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.DOC_CODEC; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.PAY_CODEC; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.POS_CODEC; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.SKIP_TOTAL_SIZE; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.TERMS_CODEC; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.VERSION_CURRENT; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.VERSION_START; + +import java.io.IOException; +import java.util.AbstractList; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.RandomAccess; +import org.apache.lucene.codecs.BlockTermState; +import org.apache.lucene.codecs.CodecUtil; +import org.apache.lucene.codecs.PostingsReaderBase; +import org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.IntBlockTermState; +import org.apache.lucene.index.FieldInfo; +import org.apache.lucene.index.Impact; +import org.apache.lucene.index.Impacts; +import org.apache.lucene.index.ImpactsEnum; +import org.apache.lucene.index.IndexFileNames; +import org.apache.lucene.index.IndexOptions; +import org.apache.lucene.index.PostingsEnum; +import org.apache.lucene.index.SegmentReadState; +import org.apache.lucene.index.SlowImpactsEnum; +import org.apache.lucene.search.DocIdSetIterator; +import org.apache.lucene.store.ByteArrayDataInput; +import org.apache.lucene.store.DataInput; +import org.apache.lucene.store.IndexInput; +import org.apache.lucene.store.ReadAdvice; +import org.apache.lucene.util.ArrayUtil; +import org.apache.lucene.util.BitUtil; +import org.apache.lucene.util.BytesRef; +import org.apache.lucene.util.BytesRefBuilder; +import org.apache.lucene.util.IOUtils; + +/** + * Concrete class that reads docId(maybe frq,pos,offset,payloads) list with postings format. + * + * @lucene.experimental + */ +public final class Lucene912PostingsReader extends PostingsReaderBase { + + private final IndexInput docIn; + private final IndexInput posIn; + private final IndexInput payIn; + + private final int version; + + /** Sole constructor. */ + public Lucene912PostingsReader(SegmentReadState state) throws IOException { +boolean success = false; +IndexInput docIn = null; +IndexInput posIn = null; +IndexInput payIn = null; + +// NOTE: these data files are too costly to verify checksum against all the bytes on open, +// but for now we at least verify proper structure of the checksum footer: which looks +// for FOOTER_MAGIC + algorithmID. This is cheap and can detect some forms of corruption +// such as file truncation. + +String docName = +IndexFileNames.segmentFileName( +state.segmentInfo.name, state.segmentSuffix, Lucene912PostingsFormat.DOC_EXTENSION); +try { + // Postings have a forward-only access pattern, so pass ReadAdvice.NORMAL to perform + // readahead. + docIn = state.directory.openInput(docName, state.context.withReadAdvice(ReadAdvice.NORMAL)); + version = + CodecUtil.checkIndexHeader( + docIn, + DOC_CODEC, + VERSION_START, + VERSION_CURRENT, + state.segmentInfo.getId(), + state.segmentSuffix); + CodecUtil.retrieveChecksum(docIn); + + if (state.fieldInfos.hasProx()) { +String proxName = +IndexFileNames.segmentFileName( +state.segmentInfo.name, state.segmentSuffix, Lucene912PostingsFormat.POS_EXTENSION); +posIn = state.directory.openInput(proxName, state.context); +CodecUtil.checkIndexHeader( +posIn, POS_CODEC, version, v
Re: [PR] Inline skip data into postings lists [lucene]
jpountz commented on code in PR #13585: URL: https://github.com/apache/lucene/pull/13585#discussion_r1696779655 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -0,0 +1,2028 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.codecs.lucene912; + +import static org.apache.lucene.codecs.lucene912.ForUtil.BLOCK_SIZE; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.DOC_CODEC; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.PAY_CODEC; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.POS_CODEC; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.SKIP_TOTAL_SIZE; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.TERMS_CODEC; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.VERSION_CURRENT; +import static org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.VERSION_START; + +import java.io.IOException; +import java.util.AbstractList; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.RandomAccess; +import org.apache.lucene.codecs.BlockTermState; +import org.apache.lucene.codecs.CodecUtil; +import org.apache.lucene.codecs.PostingsReaderBase; +import org.apache.lucene.codecs.lucene912.Lucene912PostingsFormat.IntBlockTermState; +import org.apache.lucene.index.FieldInfo; +import org.apache.lucene.index.Impact; +import org.apache.lucene.index.Impacts; +import org.apache.lucene.index.ImpactsEnum; +import org.apache.lucene.index.IndexFileNames; +import org.apache.lucene.index.IndexOptions; +import org.apache.lucene.index.PostingsEnum; +import org.apache.lucene.index.SegmentReadState; +import org.apache.lucene.index.SlowImpactsEnum; +import org.apache.lucene.search.DocIdSetIterator; +import org.apache.lucene.store.ByteArrayDataInput; +import org.apache.lucene.store.DataInput; +import org.apache.lucene.store.IndexInput; +import org.apache.lucene.store.ReadAdvice; +import org.apache.lucene.util.ArrayUtil; +import org.apache.lucene.util.BitUtil; +import org.apache.lucene.util.BytesRef; +import org.apache.lucene.util.BytesRefBuilder; +import org.apache.lucene.util.IOUtils; + +/** + * Concrete class that reads docId(maybe frq,pos,offset,payloads) list with postings format. + * + * @lucene.experimental + */ +public final class Lucene912PostingsReader extends PostingsReaderBase { + + private final IndexInput docIn; + private final IndexInput posIn; + private final IndexInput payIn; + + private final int version; + + /** Sole constructor. */ + public Lucene912PostingsReader(SegmentReadState state) throws IOException { +boolean success = false; +IndexInput docIn = null; +IndexInput posIn = null; +IndexInput payIn = null; + +// NOTE: these data files are too costly to verify checksum against all the bytes on open, +// but for now we at least verify proper structure of the checksum footer: which looks +// for FOOTER_MAGIC + algorithmID. This is cheap and can detect some forms of corruption +// such as file truncation. + +String docName = +IndexFileNames.segmentFileName( +state.segmentInfo.name, state.segmentSuffix, Lucene912PostingsFormat.DOC_EXTENSION); +try { + // Postings have a forward-only access pattern, so pass ReadAdvice.NORMAL to perform + // readahead. + docIn = state.directory.openInput(docName, state.context.withReadAdvice(ReadAdvice.NORMAL)); + version = + CodecUtil.checkIndexHeader( + docIn, + DOC_CODEC, + VERSION_START, + VERSION_CURRENT, + state.segmentInfo.getId(), + state.segmentSuffix); + CodecUtil.retrieveChecksum(docIn); + + if (state.fieldInfos.hasProx()) { +String proxName = +IndexFileNames.segmentFileName( +state.segmentInfo.name, state.segmentSuffix, Lucene912PostingsFormat.POS_EXTENSION); +posIn = state.directory.openInput(proxName, state.context); +CodecUtil.checkIndexHeader( +posIn, POS_CODEC, version, v
Re: [PR] Inline skip data into postings lists [lucene]
jpountz commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2258120692 Thanks @mikemccand for taking a look at this large PR! I think I applied all your suggestions. The format docs should be up-to-date wrt how skip data is stored, and I did the codec dance (moving Lucene99PostingsFormat and Lucene99Codec to lucene/backward-codecs, making Lucene99PostingsFormat read-only, adding a test-only Lucene99RWPostingsFormat for testing, moving all our special terms dictionary formats to Lucene912PostingsFormat, moving the completion postings format to Lucene912PostingsFormat). Note that I removed `TestLazyProxSkipping`, which assumed separate skip data and postings. I plan on merging soon, as this PR is now in a state where conflicts easily arise and are hard to address. One cannot run luceneutil anymore as luceneutil complains that `Lucene99Codec` is missing, but my last run before doing the codec dance was very similar to the last one shared above. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Release memory for cancelled tasks earlier in TaskExecutor [lucene]
javanna commented on PR #13609: URL: https://github.com/apache/lucene/pull/13609#issuecomment-2258193070 Ok , my hope was that we can somehow test cancellations with a scenario that's closer to reality, but I do see challenges. I also spent quite a bit of time on this and hit different issues that made testing very difficult. Perhaps we could at least write a test that fails without the change you made, around nulling memory? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Release memory for cancelled tasks earlier in TaskExecutor [lucene]
original-brownbear commented on PR #13609: URL: https://github.com/apache/lucene/pull/13609#issuecomment-2258283776 @javanna > Perhaps we could at least write a test that fails without the change you made, around nulling memory? I'm really sorry. I thought about this quite a bit, but outside of actually testing that the callables go out of scope via some `Cleaner` magic I think there's no way to do this, all of the functionality is internal to the task executor class. There's just not really any functional change here, it's just that I made the `Callable`s get GCed quicker and saved a little bit of indirection in general. I think this might be one of these things where we have to trust the code? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Deduplicate bytes for `FieldReader#rootCode` [lucene]
jpountz commented on code in PR #13610: URL: https://github.com/apache/lucene/pull/13610#discussion_r1696951086 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/FieldReader.java: ## @@ -100,6 +99,14 @@ public final class FieldReader extends Terms { w.close(); } */ +BytesRef emptyOutput = metadata.getEmptyOutput(); +if (rootCode.equals(emptyOutput) == false) { + // TODO: this branch is never taken? Review Comment: Nit: Remove the question mark, which suggests we're not sure, while both the code and Mike suggest that they are indeed always the same? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Introduce IndexSearcher#searchLeaf(LeafReaderContext, Weight, Collector) method [lucene]
javanna merged PR #13603: URL: https://github.com/apache/lucene/pull/13603 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Introduce IndexSearcher#searchLeaf(LeafReaderContext, Weight, Collector) method [lucene]
javanna commented on PR #13603: URL: https://github.com/apache/lucene/pull/13603#issuecomment-2258605434 Thank you again for the review @gsmiller ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] [9.x] Introduce IndexSearcher#searchLeaf(LeafReaderContext, Weight, Collector) method [lucene]
javanna opened a new pull request, #13619: URL: https://github.com/apache/lucene/pull/13619 Backport of #13603 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Introduce IndexSearcher#searchLeaf(LeafReaderContext, Weight, Collector) method [lucene]
gsmiller commented on PR #13603: URL: https://github.com/apache/lucene/pull/13603#issuecomment-2258647048 @javanna yeah no problem. All looks good to me. +1 to backporting to 9.12. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] [9.x] Introduce IndexSearcher#searchLeaf(LeafReaderContext, Weight, Collector) method [lucene]
javanna merged PR #13619: URL: https://github.com/apache/lucene/pull/13619 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Inline skip data into postings lists [lucene]
mikemccand commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2258788196 > Note that I removed `TestLazyProxSkipping`, which assumed separate skip data and postings. YAY! > I plan on merging soon, as this PR is now in a state where conflicts easily arise and are hard to address. +1 to merge now. This is an amazing improvement. Thanks @jpountz! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] TestFSTs.testRandomWords reproducible NPE [lucene]
slow-J commented on issue #13174: URL: https://github.com/apache/lucene/issues/13174#issuecomment-2258837473 Should we close this if this issue no longer manifests? I cannot repro on `main` at the newest commit https://github.com/apache/lucene/commit/30c965ea575a5e75c2bf724a340aa690d82f1ec5. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add support for intra-segment search concurrency [lucene]
javanna commented on code in PR #13542: URL: https://github.com/apache/lucene/pull/13542#discussion_r1697317194 ## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ## @@ -328,42 +336,65 @@ protected LeafSlice[] slices(List leaves) { /** Static method to segregate LeafReaderContexts amongst multiple slices */ public static LeafSlice[] slices( List leaves, int maxDocsPerSlice, int maxSegmentsPerSlice) { + +// TODO this is a temporary hack to force testing against multiple leaf reader context slices. +// It must be reverted before merging. +maxDocsPerSlice = 1; +maxSegmentsPerSlice = 1; +// end hack + // Make a copy so we can sort: List sortedLeaves = new ArrayList<>(leaves); // Sort by maxDoc, descending: -Collections.sort( -sortedLeaves, Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc(; +sortedLeaves.sort(Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc(; -final List> groupedLeaves = new ArrayList<>(); -long docSum = 0; -List group = null; +final List> groupedLeafPartitions = new ArrayList<>(); +int currentSliceNumDocs = 0; +List group = null; for (LeafReaderContext ctx : sortedLeaves) { if (ctx.reader().maxDoc() > maxDocsPerSlice) { assert group == null; -groupedLeaves.add(Collections.singletonList(ctx)); +// if the segment does not fit in a single slice, we split it in multiple partitions of Review Comment: Going back to this, especially on @stefanvodita 's comment above, it may make sense to introduce a different way to control the number of slices, instead of using the existing maxDocsPerSlice argument. It is kind of nice that we can apply the existing value and partition segments accordingly, but that can also be surprising and generate way too many partitions perhaps. Good thing to get further feedback on. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Removing all deprecated TopScoreDocCollector + TopFieldCollector methods (#create, #createSharedManager) [lucene]
javanna commented on PR #13617: URL: https://github.com/apache/lucene/pull/13617#issuecomment-2258904454 > My bad, sorry for the mistake, thanks for catching it and the review! no big deal, just a small oversight. Cheers! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add timeout support to AbstractVectorSimilarityQuery [lucene]
kaivalnp commented on code in PR #13285: URL: https://github.com/apache/lucene/pull/13285#discussion_r1697413263 ## lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java: ## @@ -103,16 +114,18 @@ public Explanation explain(LeafReaderContext context, int doc) throws IOExceptio public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOException { LeafReader leafReader = context.reader(); Bits liveDocs = leafReader.getLiveDocs(); -final Scorer vectorSimilarityScorer; + +QueryTimeout queryTimeout = searcher.getTimeout(); +TimeLimitingKnnCollectorManager timeLimitingKnnCollectorManager = Review Comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add timeout support to AbstractVectorSimilarityQuery [lucene]
kaivalnp commented on code in PR #13285: URL: https://github.com/apache/lucene/pull/13285#discussion_r1697415078 ## lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java: ## @@ -143,27 +156,23 @@ protected boolean match(int doc) { } // Perform an approximate search - TopDocs results = approximateSearch(context, acceptDocs, cardinality); + TopDocs results = + approximateSearch(context, acceptDocs, cardinality, timeLimitingKnnCollectorManager); - // If the limit was exhausted - if (results.totalHits.relation == TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO) { -// Return a lazy-loading iterator -vectorSimilarityScorer = -VectorSimilarityScorer.fromAcceptDocs( -this, -boost, -createVectorScorer(context), -new BitSetIterator(acceptDocs, cardinality), -resultSimilarity); - } else if (results.scoreDocs.length == 0) { -return null; - } else { + if (results.totalHits.relation == TotalHits.Relation.EQUAL_TO + // Return partial results only when timeout is met + || (queryTimeout != null && queryTimeout.shouldExit())) { // Return an iterator over the collected results -vectorSimilarityScorer = -VectorSimilarityScorer.fromScoreDocs(this, boost, results.scoreDocs); +return VectorSimilarityScorerSupplier.fromScoreDocs(boost, results.scoreDocs); + } else { Review Comment: Makes sense, I've added tests to check for a filter + non-null timeout -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] SparseFixedBitSet#firstDoc: reduce number of `indices` iterations for a bit set that is not fully built yet. [lucene]
gsmiller commented on code in PR #13559: URL: https://github.com/apache/lucene/pull/13559#discussion_r1697210371 ## lucene/core/src/java/org/apache/lucene/util/BitSet.java: ## @@ -92,6 +92,12 @@ public void clear() { */ public abstract int nextSetBit(int index); + /** + * Returns the index of the first set bit from start (inclusive) until end (inclusive). {@link Review Comment: I would tend to expect `end` to be exclusive in an API like this. (e.g., think of something like `String#substring(from, to)` in Java). WDYT about changing this? ## lucene/core/src/java/org/apache/lucene/util/FixedBitSet.java: ## @@ -291,6 +291,32 @@ public int nextSetBit(int index) { return DocIdSetIterator.NO_MORE_DOCS; } + @Override + public int firstSetBitInRange(int start, int upperBound) { +// Depends on the ghost bits being clear! +assert start >= 0 && start < numBits : "index=" + start + ", numBits=" + numBits; +assert start <= upperBound : "index=" + start + ", upperBound=" + upperBound; +int i = start >> 6; +long word = bits[i] >> start; // skip all the bits to the right of index + +if (word != 0) { + int res = start + Long.numberOfTrailingZeros(word); + return res > upperBound ? DocIdSetIterator.NO_MORE_DOCS : res; +} + +int maxWord = Math.min((upperBound >> 6) + 1, numWords); Review Comment: nit: `maxWord` is slightly confusing naming IMO. I would have expected it to be an _inclusive_ max index but it's _exclusive_. The logic all makes sense to me, but maybe rename to something like `limit` (or change it to inclusive and keep the name)? ## lucene/core/src/java/org/apache/lucene/util/SparseFixedBitSet.java: ## @@ -353,14 +355,47 @@ public int nextSetBit(int i) { final long indexBits = index >>> i64 >>> 1; if (indexBits == 0) { // no more bits are set in the current block of 4096 bits, go to the next one - return firstDoc(i4096 + 1); + return firstDoc(i4096 + 1, indices.length); } // there are still set bits i64 += 1 + Long.numberOfTrailingZeros(indexBits); final long bits = bitArray[o]; return (i64 << 6) | Long.numberOfTrailingZeros(bits); } + @Override + public int firstSetBitInRange(int start, int upperBound) { +assert start < length; +assert upperBound >= start; +final int i4096 = start >>> 12; +final long index = indices[i4096]; +final long[] bitArray = this.bits[i4096]; +int i64 = start >>> 6; +int o = Long.bitCount(index & ((1L << i64) - 1)); Review Comment: nit: since you repeat `1L << i64` a couple times here, maybe introduce `final long i64bit = 1L << i64;` (like we do in `#get`) ## lucene/join/src/java/org/apache/lucene/search/join/BlockJoinSelector.java: ## @@ -64,14 +64,19 @@ public boolean get(int docID) { return false; } -final int firstChild = parents.prevSetBit(docID - 1) + 1; -for (int child = children.nextSetBit(firstChild); -child < docID; -child = children.nextSetBit(child + 1)) { - if (docsWithValue.get(child)) { -return true; +final int lastPotentialChild = docID - 1; + +final int firstPotentialChild = parents.prevSetBit(lastPotentialChild) + 1; +if (firstPotentialChild < docID) { + for (int child = children.firstSetBitInRange(firstPotentialChild, lastPotentialChild); + child != DocIdSetIterator.NO_MORE_DOCS; + child = children.firstSetBitInRange(child + 1, lastPotentialChild)) { +if (docsWithValue.get(child)) { + return true; +} Review Comment: nit: I would find this just slightly more readable, but I don't have a strong preference. Just a small suggestion. ```suggestion assert firstPotentialChild <= docID; if (firstPotentialChild == docID) { // no children return false; } for (int child = children.firstSetBitInRange(firstPotentialChild, lastPotentialChild); child != DocIdSetIterator.NO_MORE_DOCS; child = children.firstSetBitInRange(child + 1, lastPotentialChild)) { if (docsWithValue.get(child)) { return true; } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] TestFSTs.testRandomWords reproducible NPE [lucene]
stefanvodita commented on issue #13174: URL: https://github.com/apache/lucene/issues/13174#issuecomment-2259007249 It's likely that an issue might no longer be reproducible on a more recent commit because an unrelated change would have impacted the random numbers we generate. I wouldn't assume that this issue is fixed now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]
rmuir commented on code in PR #13572: URL: https://github.com/apache/lucene/pull/13572#discussion_r1697452900 ## lucene/core/src/c/dotProduct.c: ## @@ -0,0 +1,143 @@ +// dotProduct.c + +#include +#include + +#ifdef __ARM_ACLE +#include +#endif + +#if (defined(__ARM_FEATURE_SVE) && !defined(__APPLE__)) +#include +/* + * Unrolled and vectorized int8 dotProduct implementation using SVE instructions + * NOTE: Clang 15.0 compiler on Apple M3 Max compiles the code below sucessfully + * with '-march=native+sve' option but throws "Illegal Hardware Instruction" error + * Looks like Apple M3 does not implement SVE and Apple's official documentation + * is not explicit about this or at least I could not find it. + * + */ Review Comment: I think we should remove the ifdef, this does not happen with `-march=native`, correct? The problem is only you try to "force" SVE? afaik, M3 doesnt support it and so `-march=native` should automatically take the neon path. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]
rmuir commented on code in PR #13572: URL: https://github.com/apache/lucene/pull/13572#discussion_r1697455240 ## lucene/core/src/c/dotProduct.h: ## @@ -0,0 +1,4 @@ + +int32_t vdot8s_sve(int8_t* vec1[], int8_t* vec2, int32_t limit); +int32_t vdot8s_neon(int8_t* vec1[], int8_t* vec2[], int32_t limit); +int32_t dot8s(int8_t* a, int8_t* b, int32_t limit); Review Comment: can we fix these prototypes to all be the same? Can we include from the .c file? Maybe also dding `-Wall -Werror` will help keep the code tidy? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]
rmuir commented on code in PR #13572: URL: https://github.com/apache/lucene/pull/13572#discussion_r1697462600 ## lucene/core/src/c/dotProduct.c: ## @@ -0,0 +1,143 @@ +// dotProduct.c + +#include +#include + +#ifdef __ARM_ACLE +#include +#endif + +#if (defined(__ARM_FEATURE_SVE) && !defined(__APPLE__)) +#include +/* + * Unrolled and vectorized int8 dotProduct implementation using SVE instructions + * NOTE: Clang 15.0 compiler on Apple M3 Max compiles the code below sucessfully + * with '-march=native+sve' option but throws "Illegal Hardware Instruction" error + * Looks like Apple M3 does not implement SVE and Apple's official documentation + * is not explicit about this or at least I could not find it. + * + */ +int32_t vdot8s_sve(int8_t *vec1, int8_t *vec2, int32_t limit) { Review Comment: purely cosmetic, but maybe change these all to array syntax, which will be easier for java developers, and transform pointer arithmetic such as `vec1 + i + 2 * vec_length` into `vec1[i + 2 * vec_length]` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]
rmuir commented on code in PR #13572: URL: https://github.com/apache/lucene/pull/13572#discussion_r1697469288 ## lucene/core/src/c/dotProduct.c: ## @@ -0,0 +1,143 @@ +// dotProduct.c + +#include +#include + +#ifdef __ARM_ACLE +#include +#endif + +#if (defined(__ARM_FEATURE_SVE) && !defined(__APPLE__)) +#include +/* + * Unrolled and vectorized int8 dotProduct implementation using SVE instructions + * NOTE: Clang 15.0 compiler on Apple M3 Max compiles the code below sucessfully + * with '-march=native+sve' option but throws "Illegal Hardware Instruction" error + * Looks like Apple M3 does not implement SVE and Apple's official documentation + * is not explicit about this or at least I could not find it. + * + */ +int32_t vdot8s_sve(int8_t *vec1, int8_t *vec2, int32_t limit) { +int32_t result = 0; +int32_t i = 0; +// Vectors of 8-bit signed integers +svint8_t va1, va2, va3, va4; +svint8_t vb1, vb2, vb3, vb4; +// Init accumulators +svint32_t acc1 = svdup_n_s32(0); +svint32_t acc2 = svdup_n_s32(0); +svint32_t acc3 = svdup_n_s32(0); +svint32_t acc4 = svdup_n_s32(0); + +// Number of 8-bits elements in the SVE vector +int32_t vec_length = svcntb(); + +// Manually unroll the loop +for (i = 0; i + 4 * vec_length <= limit; i += 4 * vec_length) { + // Load vectors into the Z registers which can range from 128-bit to 2048-bit wide + // The predicate register - P determines which bytes are active + // svptrue_b8() returns a predictae in which every element is true + // +va1 = svld1_s8(svptrue_b8(), vec1 + i); +vb1 = svld1_s8(svptrue_b8(), vec2 + i); + +va2 = svld1_s8(svptrue_b8(), vec1 + i + vec_length); +vb2 = svld1_s8(svptrue_b8(), vec2 + i + vec_length); + +va3 = svld1_s8(svptrue_b8(), vec1 + i + 2 * vec_length); +vb3 = svld1_s8(svptrue_b8(), vec2 + i + 2 * vec_length); + +va4 = svld1_s8(svptrue_b8(), vec1 + i + 3 * vec_length); +vb4 = svld1_s8(svptrue_b8(), vec2 + i + 3 * vec_length); + + // Dot product using SDOT instruction on Z vectors + acc1 = svdot_s32(acc1, va1, vb1); + acc2 = svdot_s32(acc2, va2, vb2); + acc3 = svdot_s32(acc3, va3, vb3); + acc4 = svdot_s32(acc4, va4, vb4); +} +// Add correspponding active elements in each of the vectors +acc1 = svadd_s32_x(svptrue_b8() , acc1, acc2); +acc3 = svadd_s32_x(svptrue_b8() , acc3, acc4); +acc1 = svadd_s32_x(svptrue_b8(), acc1, acc3); + +// REDUCE: Add every vector element in target and write result to scalar +result = svaddv_s32(svptrue_b8(), acc1); + +// Scalar tail. TODO: Use FMA Review Comment: I think you can remove TODO, since aarch64 "mul" is really "madd", i expect it already emits single instruction. look at assembler if you are curious. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[I] Reproducible unit test failure [lucene]
slow-J opened a new issue, #13620: URL: https://github.com/apache/lucene/issues/13620 ### Description https://jenkins.thetaphi.de/job/Lucene-main-MacOSX/11653/ I can repro with `./gradlew test --tests TestTopDocsCollector.testResultsOrder -Dtests.seed=207A6071B3338CA6` The test passes if I revert https://github.com/apache/lucene/commit/d491dfe1315c80319547ec183be3d7aa902d2e9e This is bug happens I believe due to having 2 collectors with the same indexes and the score of `9.17561f` is never inserted into the pq. I found a fix, pending some tests. ### Version and environment details _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]
rmuir commented on code in PR #13572: URL: https://github.com/apache/lucene/pull/13572#discussion_r1697474113 ## lucene/core/src/c/dotProduct.c: ## @@ -0,0 +1,143 @@ +// dotProduct.c + +#include +#include + +#ifdef __ARM_ACLE +#include +#endif + +#if (defined(__ARM_FEATURE_SVE) && !defined(__APPLE__)) +#include +/* + * Unrolled and vectorized int8 dotProduct implementation using SVE instructions + * NOTE: Clang 15.0 compiler on Apple M3 Max compiles the code below sucessfully + * with '-march=native+sve' option but throws "Illegal Hardware Instruction" error + * Looks like Apple M3 does not implement SVE and Apple's official documentation + * is not explicit about this or at least I could not find it. + * + */ +int32_t vdot8s_sve(int8_t *vec1, int8_t *vec2, int32_t limit) { +int32_t result = 0; +int32_t i = 0; +// Vectors of 8-bit signed integers +svint8_t va1, va2, va3, va4; +svint8_t vb1, vb2, vb3, vb4; +// Init accumulators +svint32_t acc1 = svdup_n_s32(0); +svint32_t acc2 = svdup_n_s32(0); +svint32_t acc3 = svdup_n_s32(0); +svint32_t acc4 = svdup_n_s32(0); + +// Number of 8-bits elements in the SVE vector +int32_t vec_length = svcntb(); + +// Manually unroll the loop +for (i = 0; i + 4 * vec_length <= limit; i += 4 * vec_length) { + // Load vectors into the Z registers which can range from 128-bit to 2048-bit wide + // The predicate register - P determines which bytes are active + // svptrue_b8() returns a predictae in which every element is true + // +va1 = svld1_s8(svptrue_b8(), vec1 + i); +vb1 = svld1_s8(svptrue_b8(), vec2 + i); + +va2 = svld1_s8(svptrue_b8(), vec1 + i + vec_length); +vb2 = svld1_s8(svptrue_b8(), vec2 + i + vec_length); + +va3 = svld1_s8(svptrue_b8(), vec1 + i + 2 * vec_length); +vb3 = svld1_s8(svptrue_b8(), vec2 + i + 2 * vec_length); + +va4 = svld1_s8(svptrue_b8(), vec1 + i + 3 * vec_length); +vb4 = svld1_s8(svptrue_b8(), vec2 + i + 3 * vec_length); + + // Dot product using SDOT instruction on Z vectors + acc1 = svdot_s32(acc1, va1, vb1); + acc2 = svdot_s32(acc2, va2, vb2); + acc3 = svdot_s32(acc3, va3, vb3); + acc4 = svdot_s32(acc4, va4, vb4); +} Review Comment: maybe consider "vector tail", since 4 * vector length can be significant: and vector dimensions may not be exact multiple of that. It limits the worst case processing that "scalar tail" must do. Example from the java vector code: https://github.com/apache/lucene/blob/main/lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java#L153-L158 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] Fix failing unit test - TestTopDocsCollector#testResultsOrder [lucene]
slow-J opened a new pull request, #13621: URL: https://github.com/apache/lucene/pull/13621 Closes #13620 I believe this bug happens due to having 2 collectors with the same indexes and the score of `9.17561f` is never inserted into the pq, failing the assertion. Another potential solution is having a SingletonMyTopDocsCollectorMananger which always returns the same `MyTopDocsCollector`. TEST: ` ./gradlew tidy && ./gradlew test` + `./gradlew test --tests TestTopDocsCollector.testResultsOrder -Dtests.seed=207A6071B3338CA6` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] SparseFixedBitSet#firstDoc: reduce number of `indices` iterations for a bit set that is not fully built yet. [lucene]
epotyom commented on code in PR #13559: URL: https://github.com/apache/lucene/pull/13559#discussion_r1697607067 ## lucene/core/src/java/org/apache/lucene/util/BitSet.java: ## @@ -92,6 +92,12 @@ public void clear() { */ public abstract int nextSetBit(int index); + /** + * Returns the index of the first set bit from start (inclusive) until end (inclusive). {@link Review Comment: Ah, makes sense, will change that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] SparseFixedBitSet#firstDoc: reduce number of `indices` iterations for a bit set that is not fully built yet. [lucene]
epotyom commented on code in PR #13559: URL: https://github.com/apache/lucene/pull/13559#discussion_r1697608653 ## lucene/join/src/java/org/apache/lucene/search/join/BlockJoinSelector.java: ## @@ -64,14 +64,19 @@ public boolean get(int docID) { return false; } -final int firstChild = parents.prevSetBit(docID - 1) + 1; -for (int child = children.nextSetBit(firstChild); -child < docID; -child = children.nextSetBit(child + 1)) { - if (docsWithValue.get(child)) { -return true; +final int lastPotentialChild = docID - 1; + +final int firstPotentialChild = parents.prevSetBit(lastPotentialChild) + 1; +if (firstPotentialChild < docID) { + for (int child = children.firstSetBitInRange(firstPotentialChild, lastPotentialChild); + child != DocIdSetIterator.NO_MORE_DOCS; + child = children.firstSetBitInRange(child + 1, lastPotentialChild)) { +if (docsWithValue.get(child)) { + return true; +} Review Comment: I'll apply the suggestion, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] SparseFixedBitSet#firstDoc: reduce number of `indices` iterations for a bit set that is not fully built yet. [lucene]
epotyom commented on code in PR #13559: URL: https://github.com/apache/lucene/pull/13559#discussion_r1697615867 ## lucene/core/src/java/org/apache/lucene/util/FixedBitSet.java: ## @@ -291,6 +291,32 @@ public int nextSetBit(int index) { return DocIdSetIterator.NO_MORE_DOCS; } + @Override + public int firstSetBitInRange(int start, int upperBound) { +// Depends on the ghost bits being clear! +assert start >= 0 && start < numBits : "index=" + start + ", numBits=" + numBits; +assert start <= upperBound : "index=" + start + ", upperBound=" + upperBound; +int i = start >> 6; +long word = bits[i] >> start; // skip all the bits to the right of index + +if (word != 0) { + int res = start + Long.numberOfTrailingZeros(word); + return res > upperBound ? DocIdSetIterator.NO_MORE_DOCS : res; +} + +int maxWord = Math.min((upperBound >> 6) + 1, numWords); Review Comment: I've ended up removing `Math.min` here and making `upperBound <= bit set size` a requirement for the method - there is similar requirement for `index` parameter. Will push a commit shortly - please let me know what you think. ## lucene/core/src/java/org/apache/lucene/util/SparseFixedBitSet.java: ## @@ -353,14 +355,47 @@ public int nextSetBit(int i) { final long indexBits = index >>> i64 >>> 1; if (indexBits == 0) { // no more bits are set in the current block of 4096 bits, go to the next one - return firstDoc(i4096 + 1); + return firstDoc(i4096 + 1, indices.length); } // there are still set bits i64 += 1 + Long.numberOfTrailingZeros(indexBits); final long bits = bitArray[o]; return (i64 << 6) | Long.numberOfTrailingZeros(bits); } + @Override + public int firstSetBitInRange(int start, int upperBound) { +assert start < length; +assert upperBound >= start; +final int i4096 = start >>> 12; +final long index = indices[i4096]; +final long[] bitArray = this.bits[i4096]; +int i64 = start >>> 6; +int o = Long.bitCount(index & ((1L << i64) - 1)); Review Comment: Make sense! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] SparseFixedBitSet#firstDoc: reduce number of `indices` iterations for a bit set that is not fully built yet. [lucene]
gsmiller commented on code in PR #13559: URL: https://github.com/apache/lucene/pull/13559#discussion_r1697623524 ## lucene/core/src/java/org/apache/lucene/util/FixedBitSet.java: ## @@ -291,6 +291,32 @@ public int nextSetBit(int index) { return DocIdSetIterator.NO_MORE_DOCS; } + @Override + public int firstSetBitInRange(int start, int upperBound) { +// Depends on the ghost bits being clear! +assert start >= 0 && start < numBits : "index=" + start + ", numBits=" + numBits; +assert start <= upperBound : "index=" + start + ", upperBound=" + upperBound; +int i = start >> 6; +long word = bits[i] >> start; // skip all the bits to the right of index + +if (word != 0) { + int res = start + Long.numberOfTrailingZeros(word); + return res > upperBound ? DocIdSetIterator.NO_MORE_DOCS : res; +} + +int maxWord = Math.min((upperBound >> 6) + 1, numWords); Review Comment: Ah, that's even better. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] SparseFixedBitSet#firstDoc: reduce number of `indices` iterations for a bit set that is not fully built yet. [lucene]
epotyom commented on code in PR #13559: URL: https://github.com/apache/lucene/pull/13559#discussion_r1697616026 ## lucene/core/src/java/org/apache/lucene/util/SparseFixedBitSet.java: ## @@ -353,14 +355,47 @@ public int nextSetBit(int i) { final long indexBits = index >>> i64 >>> 1; if (indexBits == 0) { // no more bits are set in the current block of 4096 bits, go to the next one - return firstDoc(i4096 + 1); + return firstDoc(i4096 + 1, indices.length); } // there are still set bits i64 += 1 + Long.numberOfTrailingZeros(indexBits); final long bits = bitArray[o]; return (i64 << 6) | Long.numberOfTrailingZeros(bits); } + @Override + public int firstSetBitInRange(int start, int upperBound) { +assert start < length; +assert upperBound >= start; +final int i4096 = start >>> 12; +final long index = indices[i4096]; +final long[] bitArray = this.bits[i4096]; +int i64 = start >>> 6; +int o = Long.bitCount(index & ((1L << i64) - 1)); Review Comment: Makes sense! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] SparseFixedBitSet#firstDoc: reduce number of `indices` iterations for a bit set that is not fully built yet. [lucene]
gsmiller commented on code in PR #13559: URL: https://github.com/apache/lucene/pull/13559#discussion_r1697630535 ## lucene/core/src/java/org/apache/lucene/util/BitSet.java: ## @@ -92,6 +92,12 @@ public void clear() { */ public abstract int nextSetBit(int index); + /** + * Returns the index of the first set bit from start (inclusive) until end (inclusive). {@link Review Comment: Also... looking at the `#substring` method signature got me thinking about the API name. We if we reuse the `nextSetBit` naming for the new method since it's essentially the same thing but with an early-termination optimization. We could just overload that method signature with `#nextSetBit(int beginIndex, int endIndex)`. We could also later overload `#prevSetBit` in the same way if there's ever a real use-case. Pretty minor suggestion but might be a nice naming convention. Thoughts? ## lucene/core/src/java/org/apache/lucene/util/BitSet.java: ## @@ -92,6 +92,12 @@ public void clear() { */ public abstract int nextSetBit(int index); + /** + * Returns the index of the first set bit from start (inclusive) until end (inclusive). {@link Review Comment: Also... looking at the `#substring` method signature got me thinking about the API name. What if we reuse the `nextSetBit` naming for the new method since it's essentially the same thing but with an early-termination optimization. We could just overload that method signature with `#nextSetBit(int beginIndex, int endIndex)`. We could also later overload `#prevSetBit` in the same way if there's ever a real use-case. Pretty minor suggestion but might be a nice naming convention. Thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] WIP do not merge [lucene]
github-actions[bot] commented on PR #13577: URL: https://github.com/apache/lucene/pull/13577#issuecomment-2259403530 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] KnnFloatVectorQuery misses highest-ranking results that FloatVectorSimilarityQuery retrieves [lucene]
david-sitsky closed issue #13611: KnnFloatVectorQuery misses highest-ranking results that FloatVectorSimilarityQuery retrieves URL: https://github.com/apache/lucene/issues/13611 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org