jpountz commented on code in PR #14910: URL: https://github.com/apache/lucene/pull/14910#discussion_r2213742538
########## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ########## @@ -206,7 +208,8 @@ private static Optional<Module> lookupVectorModule() { "org.apache.lucene.util.VectorUtil", "org.apache.lucene.codecs.lucene103.Lucene103PostingsReader", "org.apache.lucene.codecs.lucene103.PostingIndexInput", - "org.apache.lucene.tests.util.TestSysoutsLimits"); + "org.apache.lucene.tests.util.TestSysoutsLimits", + "org.apache.lucene.util.FixedBitSet"); Review Comment: FWIW we should try to avoid expanding this list. Hopefully as per my other comment, we can move `bitsetToArray` to `VectorUtil` instead so that this list can stay as-is. ########## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ########## @@ -114,6 +114,8 @@ public static VectorizationProvider getInstance() { /** Create a new {@link PostingDecodingUtil} for the given {@link IndexInput}. */ public abstract PostingDecodingUtil newPostingDecodingUtil(IndexInput input) throws IOException; + public abstract BitSetUtil newBitSetUtil(); Review Comment: I would rather have the methods of `BitSetUtil` in this class. `PostingDecodingUtil` is a bit different because it requires state (the `MemorySegment`), which is not the case with `BitSetUtil`. Then we can call `bitsetToArray` from `VectorUtil` and don't need to add a new class that is allowed to call the vector API. ########## lucene/core/src/java24/org/apache/lucene/internal/vectorization/PanamaBitSetUtil.java: ########## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.internal.vectorization; + +import java.util.stream.IntStream; +import jdk.incubator.vector.IntVector; +import jdk.incubator.vector.VectorOperators; +import jdk.incubator.vector.VectorSpecies; +import org.apache.lucene.util.SuppressForbidden; + +public class PanamaBitSetUtil extends BitSetUtil { + + static final PanamaBitSetUtil INSTANCE = new PanamaBitSetUtil(); + + private static final VectorSpecies<Integer> INT_SPECIES = + PanamaVectorConstants.PRERERRED_INT_SPECIES; + private static final int MASK = (1 << INT_SPECIES.length()) - 1; + private static final int[] IDENTITY = IntStream.range(0, Long.SIZE).toArray(); + private static final int[] IDENTITY_MASK = IntStream.range(0, 16).map(i -> 1 << i).toArray(); + + PanamaBitSetUtil() {} + + @Override + int word2Array(long word, int base, int[] docs, int offset) { + offset = intWord2Array((int) word, docs, offset, base); + return intWord2Array((int) (word >>> 32), docs, offset, base + 32); + } + + @SuppressForbidden(reason = "Uses compress only where fast and carefully contained") + private static int intWord2Array(int word, int[] resultArray, int offset, int base) { + IntVector bitMask = IntVector.fromArray(INT_SPECIES, IDENTITY_MASK, 0); + + for (int i = 0; i < Integer.SIZE; i += INT_SPECIES.length()) { + IntVector.fromArray(INT_SPECIES, IDENTITY, i) + .add(base) Review Comment: Should we broadcast `base` to an `IntVector` outside of the loop so that it's only done once rather than once per iteration? ########## lucene/core/src/java24/org/apache/lucene/internal/vectorization/PanamaBitSetUtil.java: ########## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.internal.vectorization; + +import java.util.stream.IntStream; +import jdk.incubator.vector.IntVector; +import jdk.incubator.vector.VectorOperators; +import jdk.incubator.vector.VectorSpecies; +import org.apache.lucene.util.SuppressForbidden; + +public class PanamaBitSetUtil extends BitSetUtil { + + static final PanamaBitSetUtil INSTANCE = new PanamaBitSetUtil(); + + private static final VectorSpecies<Integer> INT_SPECIES = + PanamaVectorConstants.PRERERRED_INT_SPECIES; + private static final int MASK = (1 << INT_SPECIES.length()) - 1; + private static final int[] IDENTITY = IntStream.range(0, Long.SIZE).toArray(); + private static final int[] IDENTITY_MASK = IntStream.range(0, 16).map(i -> 1 << i).toArray(); + + PanamaBitSetUtil() {} + + @Override + int word2Array(long word, int base, int[] docs, int offset) { + offset = intWord2Array((int) word, docs, offset, base); + return intWord2Array((int) (word >>> 32), docs, offset, base + 32); + } + + @SuppressForbidden(reason = "Uses compress only where fast and carefully contained") + private static int intWord2Array(int word, int[] resultArray, int offset, int base) { + IntVector bitMask = IntVector.fromArray(INT_SPECIES, IDENTITY_MASK, 0); + + for (int i = 0; i < Integer.SIZE; i += INT_SPECIES.length()) { + IntVector.fromArray(INT_SPECIES, IDENTITY, i) + .add(base) + .compress(bitMask.and(word).compare(VectorOperators.NE, 0)) + .reinterpretAsInts() Review Comment: Why is this necessary? Doesn't compress() already return an `IntVector`? ########## lucene/core/src/java24/org/apache/lucene/internal/vectorization/PanamaBitSetUtil.java: ########## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.internal.vectorization; + +import java.util.stream.IntStream; +import jdk.incubator.vector.IntVector; +import jdk.incubator.vector.VectorOperators; +import jdk.incubator.vector.VectorSpecies; +import org.apache.lucene.util.SuppressForbidden; + +public class PanamaBitSetUtil extends BitSetUtil { + + static final PanamaBitSetUtil INSTANCE = new PanamaBitSetUtil(); + + private static final VectorSpecies<Integer> INT_SPECIES = + PanamaVectorConstants.PRERERRED_INT_SPECIES; + private static final int MASK = (1 << INT_SPECIES.length()) - 1; + private static final int[] IDENTITY = IntStream.range(0, Long.SIZE).toArray(); + private static final int[] IDENTITY_MASK = IntStream.range(0, 16).map(i -> 1 << i).toArray(); + + PanamaBitSetUtil() {} + + @Override + int word2Array(long word, int base, int[] docs, int offset) { + offset = intWord2Array((int) word, docs, offset, base); + return intWord2Array((int) (word >>> 32), docs, offset, base + 32); + } + + @SuppressForbidden(reason = "Uses compress only where fast and carefully contained") + private static int intWord2Array(int word, int[] resultArray, int offset, int base) { + IntVector bitMask = IntVector.fromArray(INT_SPECIES, IDENTITY_MASK, 0); Review Comment: I wonder if this should be a private static final field? ########## lucene/core/src/java24/org/apache/lucene/internal/vectorization/PanamaBitSetUtil.java: ########## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.internal.vectorization; + +import java.util.stream.IntStream; +import jdk.incubator.vector.IntVector; +import jdk.incubator.vector.VectorOperators; +import jdk.incubator.vector.VectorSpecies; +import org.apache.lucene.util.SuppressForbidden; + +public class PanamaBitSetUtil extends BitSetUtil { + + static final PanamaBitSetUtil INSTANCE = new PanamaBitSetUtil(); + + private static final VectorSpecies<Integer> INT_SPECIES = + PanamaVectorConstants.PRERERRED_INT_SPECIES; + private static final int MASK = (1 << INT_SPECIES.length()) - 1; + private static final int[] IDENTITY = IntStream.range(0, Long.SIZE).toArray(); + private static final int[] IDENTITY_MASK = IntStream.range(0, 16).map(i -> 1 << i).toArray(); + + PanamaBitSetUtil() {} + + @Override + int word2Array(long word, int base, int[] docs, int offset) { + offset = intWord2Array((int) word, docs, offset, base); + return intWord2Array((int) (word >>> 32), docs, offset, base + 32); + } + + @SuppressForbidden(reason = "Uses compress only where fast and carefully contained") + private static int intWord2Array(int word, int[] resultArray, int offset, int base) { + IntVector bitMask = IntVector.fromArray(INT_SPECIES, IDENTITY_MASK, 0); + + for (int i = 0; i < Integer.SIZE; i += INT_SPECIES.length()) { Review Comment: I'm a bit uncomfortable with the underlying assumption that `INT_SPECIES.length()` is a divisor of `Integer.SIZE`. Can we write the code in a way that doesn't make this assumption or add a check somewhere? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org