[jira] [Created] (LUCENE-10623) Error implementation of docValueCount for SortingSortedSetDocValues
Lu Xugang created LUCENE-10623:
--

Summary: Error implementation of docValueCount for SortingSortedSetDocValues
Key: LUCENE-10623
URL: https://issues.apache.org/jira/browse/LUCENE-10623
Project: Lucene - Core
Issue Type: Bug
Reporter: Lu Xugang

The test below fails:

{code:java}
public void testSortOnAddIndicesOrd() throws IOException {
  Directory tmpDir = newDirectory();
  Directory dir = newDirectory();
  IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random()));
  IndexWriter w = new IndexWriter(tmpDir, iwc);

  // doc 0: foo = {b}
  Document doc = new Document();
  doc.add(new SortedSetDocValuesField("foo", new BytesRef("b")));
  w.addDocument(doc);

  // doc 1 reuses the same Document, so foo = {b, a, b, b}: two unique values {a, b}
  doc.add(new SortedSetDocValuesField("foo", new BytesRef("a")));
  doc.add(new SortedSetDocValuesField("foo", new BytesRef("b")));
  doc.add(new SortedSetDocValuesField("foo", new BytesRef("b")));
  w.addDocument(doc);
  w.commit();

  // Sorting on the MIN value puts doc 1 (min "a") before doc 0 (min "b")
  Sort indexSort = new Sort(new SortedSetSortField("foo", false, SortedSetSelector.Type.MIN));
  try (DirectoryReader reader = DirectoryReader.open(tmpDir)) {
    for (LeafReaderContext ctx : reader.leaves()) {
      CodecReader wrap =
          SortingCodecReader.wrap(SlowCodecReaderWrapper.wrap(ctx.reader()), indexSort);
      assertTrue(wrap.toString(), wrap.toString().startsWith("SortingCodecReader("));
      SortingCodecReader sortingCodecReader = (SortingCodecReader) wrap;
      SortedSetDocValues sortedSetDocValues =
          sortingCodecReader
              .getDocValuesReader()
              .getSortedSet(ctx.reader().getFieldInfos().fieldInfo("foo"));
      sortedSetDocValues.nextDoc();
      assertEquals(2, sortedSetDocValues.docValueCount());
      sortedSetDocValues.nextDoc();
      assertEquals(1, sortedSetDocValues.docValueCount());
      assertEquals(DocIdSetIterator.NO_MORE_DOCS, sortedSetDocValues.nextDoc());
    }
  }
  IOUtils.close(w, dir, tmpDir);
}
{code}

-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
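To make the test's expectation concrete, here is a minimal in-memory model of what a sorted view of a SORTED_SET field should report. It is a hypothetical sketch, not Lucene's `SortingSortedSetDocValues` code and not the fix proposed in PR #967: ords are plain `long` arrays (assuming "a" maps to ord 0 and "b" to ord 1), every doc is assumed to have at least one value, and the class and method names are made up for illustration. What it shows is that `docValueCount()` must count the distinct ords of whichever original doc now occupies the current sorted position.

```java
import java.util.Arrays;

/**
 * Hypothetical model of docValueCount() on an index-sorted SORTED_SET view.
 * Not Lucene code. Assumes every document has at least one value.
 */
final class SortedSetCountSketch {

  /** newToOld[newDoc] = oldDoc, ordering docs by their minimum ord (ascending, stable on ties). */
  static int[] sortByMinOrd(long[][] ordsByOldDoc) {
    Integer[] docs = new Integer[ordsByOldDoc.length];
    for (int i = 0; i < docs.length; i++) {
      docs[i] = i;
    }
    // Arrays.sort on objects is stable, so ties keep the original doc order
    Arrays.sort(
        docs,
        (a, b) ->
            Long.compare(
                Arrays.stream(ordsByOldDoc[a]).min().getAsLong(),
                Arrays.stream(ordsByOldDoc[b]).min().getAsLong()));
    return Arrays.stream(docs).mapToInt(Integer::intValue).toArray();
  }

  /**
   * docValueCount() for each position of the sorted view: the number of distinct ords of the
   * original doc that was moved to that position. The count has to follow the doc.
   */
  static int[] docValueCountsInSortedOrder(long[][] ordsByOldDoc) {
    int[] newToOld = sortByMinOrd(ordsByOldDoc);
    int[] counts = new int[newToOld.length];
    for (int newDoc = 0; newDoc < newToOld.length; newDoc++) {
      // SORTED_SET semantics: duplicate values within one doc collapse to one ord
      counts[newDoc] = (int) Arrays.stream(ordsByOldDoc[newToOld[newDoc]]).distinct().count();
    }
    return counts;
  }

  public static void main(String[] args) {
    // Assumed ords for the test's terms: "a" -> 0, "b" -> 1
    long[][] ordsByOldDoc = {
      {1}, // doc 0: b
      {1, 0, 1, 1} // doc 1: b, a, b, b (the Document instance was reused)
    };
    System.out.println(Arrays.toString(docValueCountsInSortedOrder(ordsByOldDoc)));
  }
}
```

Running `main` prints `[2, 1]`, which is what the two `docValueCount()` assertions in the test expect.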
[jira] [Assigned] (LUCENE-10623) Error implementation of docValueCount for SortingSortedSetDocValues
[ https://issues.apache.org/jira/browse/LUCENE-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lu Xugang reassigned LUCENE-10623: -- Assignee: Lu Xugang > Error implementation of docValueCount for SortingSortedSetDocValues > --- > > Key: LUCENE-10623 > URL: https://issues.apache.org/jira/browse/LUCENE-10623 > Project: Lucene - Core > Issue Type: Bug >Reporter: Lu Xugang >Assignee: Lu Xugang >Priority: Major
[GitHub] [lucene] LuXugang opened a new pull request, #967: LUCENE-10623: Error implementation of docValueCount for SortingSortedSetDocValues
LuXugang opened a new pull request, #967: URL: https://github.com/apache/lucene/pull/967 See: https://issues.apache.org/jira/projects/LUCENE/issues/LUCENE-10623 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [lucene] shaie commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities
shaie commented on code in PR #841: URL: https://github.com/apache/lucene/pull/841#discussion_r901119495 ## lucene/facet/src/java/org/apache/lucene/facet/facetset/MatchingFacetSetsCounts.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.lucene.facet.facetset; + +import java.io.IOException; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import org.apache.lucene.document.LongPoint; +import org.apache.lucene.facet.FacetResult; +import org.apache.lucene.facet.Facets; +import org.apache.lucene.facet.FacetsCollector; +import org.apache.lucene.facet.LabelAndValue; +import org.apache.lucene.index.BinaryDocValues; +import org.apache.lucene.index.DocValues; +import org.apache.lucene.search.ConjunctionUtils; +import org.apache.lucene.search.DocIdSetIterator; +import org.apache.lucene.util.BytesRef; + +/** + * Returns the counts for each given {@link FacetSet} + * + * @lucene.experimental + */ +public class MatchingFacetSetsCounts extends Facets { + + private final FacetSetMatcher[] facetSetMatchers; + private final int[] counts; + private final String field; + private final int totCount; + + /** + * Constructs a new instance of matching facet set counts which calculates the countBytes for each + * given facet set matcher. + */ + public MatchingFacetSetsCounts( + String field, FacetsCollector hits, FacetSetMatcher... facetSetMatchers) throws IOException { +if (facetSetMatchers == null || facetSetMatchers.length == 0) { + throw new IllegalArgumentException("facetSetMatchers cannot be null or empty"); +} +if (areFacetSetMatcherDimensionsInconsistent(facetSetMatchers)) { + throw new IllegalArgumentException("All facet set matchers must be the same dimensionality"); +} +this.field = field; +this.facetSetMatchers = facetSetMatchers; +this.counts = new int[facetSetMatchers.length]; +this.totCount = count(field, hits.getMatchingDocs()); + } + + /** Counts from the provided field. 
*/ + private int count(String field, List<FacetsCollector.MatchingDocs> matchingDocs) + throws IOException { + +int totCount = 0; +for (FacetsCollector.MatchingDocs hits : matchingDocs) { + + BinaryDocValues binaryDocValues = DocValues.getBinary(hits.context.reader(), field); + + final DocIdSetIterator it = + ConjunctionUtils.intersectIterators(Arrays.asList(hits.bits.iterator(), binaryDocValues)); + if (it == null) { +continue; + } + + long[] dimValues = null; // dimension values buffer + int expectedNumDims = -1; + for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) { +boolean shouldCountDoc = false; +BytesRef bytesRef = binaryDocValues.binaryValue(); +byte[] packedValue = bytesRef.bytes; +int numDims = (int) LongPoint.decodeDimension(packedValue, 0); +if (expectedNumDims == -1) { + expectedNumDims = numDims; + dimValues = new long[numDims]; +} else { + // Verify that the number of indexed dimensions for all matching documents is the same + // (since we cannot verify that at indexing time). + assert numDims == expectedNumDims + : "Expected (" + + expectedNumDims + + ") dimensions, found (" + + numDims + + ") for doc (" + + doc + + ")"; +} + +for (int start = Long.BYTES; start < bytesRef.length; start += numDims * Long.BYTES) { + LongPoint.unpack(bytesRef, start, dimValues); + for (int j = 0; j < facetSetMatchers.length; j++) { // for each facet set matcher +if (facetSetMatchers[j].matches(dimValues)) { + counts[j]++; + shouldCountDoc = true; +} + } +} +if (shouldCountDoc) { + totCount++; +} + } +} +return totCount; + } + + // TODO: This does not really provide "top children" functionality yet but provides "all + // children". This is being worked on in LUCENE-10550 + @Override + public FacetResult getTopChi
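The counting loop in `count` can be modelled without any Lucene types, which makes its two outputs easier to see: a per-matcher count and a total of documents with at least one matching set. This is a hypothetical sketch, not the PR's code: each doc's value is assumed to be already decoded into a `long[]` whose first element is the number of dimensions, and `Predicate<long[]>` stands in for `FacetSetMatcher`.

```java
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical model of the MatchingFacetSetsCounts counting loop, using decoded arrays instead of
 * BinaryDocValues. Each doc is a flat long[]: numDims, then the values of each of its sets.
 */
final class FacetSetCountingSketch {
  final int[] counts; // per matcher: number of matching sets (not docs), as in the PR's loop
  final int totCount; // number of docs with at least one set matched by any matcher

  FacetSetCountingSketch(List<long[]> docs, List<Predicate<long[]>> matchers) {
    int[] perMatcher = new int[matchers.size()];
    int docsWithMatch = 0;
    for (long[] doc : docs) {
      int numDims = (int) doc[0];
      if (numDims <= 0) {
        throw new IllegalArgumentException("numDims must be positive, got " + numDims);
      }
      long[] dimValues = new long[numDims]; // reusable buffer holding one set
      boolean shouldCountDoc = false;
      for (int start = 1; start < doc.length; start += numDims) {
        System.arraycopy(doc, start, dimValues, 0, numDims);
        for (int j = 0; j < perMatcher.length; j++) {
          if (matchers.get(j).test(dimValues)) {
            perMatcher[j]++;
            shouldCountDoc = true;
          }
        }
      }
      if (shouldCountDoc) {
        docsWithMatch++;
      }
    }
    this.counts = perMatcher;
    this.totCount = docsWithMatch;
  }
}
```

As in the quoted loop, a doc holding two sets that both match the same matcher adds 2 to that matcher's count but only 1 to `totCount`.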
[GitHub] [lucene] shaie commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities
shaie commented on code in PR #841: URL: https://github.com/apache/lucene/pull/841#discussion_r901119754 ## lucene/facet/src/java/org/apache/lucene/facet/facetset/FacetSetsField.java: ## @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.facetset; + +import java.util.Arrays; +import org.apache.lucene.document.BinaryDocValuesField; +import org.apache.lucene.document.LongPoint; +import org.apache.lucene.util.BytesRef; + +/** + * A {@link BinaryDocValuesField} which encodes a list of {@link FacetSet facet sets}. The encoding + * scheme consists of a packed {@code long[]} where the first value denotes the number of dimensions + * in all the sets, followed by each set's values. + * + * @lucene.experimental + */ +public class FacetSetsField extends BinaryDocValuesField { + + /** + * Create a new FacetSets field. + * + * @param name field name + * @param facetSets the {@link FacetSet} to index in that field. All must have the same number of Review Comment: Done
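The layout described in the `FacetSetsField` javadoc (a dimension count followed by every set's values) can be sketched with `java.nio.ByteBuffer`. This is a hypothetical illustration of the layout only: it writes plain big-endian longs, not `LongPoint`'s sortable encoding, so the bytes differ from what the field actually stores, and the class and method names are made up.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the "numDims, then each set's values" layout. Uses plain big-endian
 * ByteBuffer longs, NOT LongPoint's sortable encoding.
 */
final class FacetSetPackingSketch {

  static byte[] pack(long[]... facetSets) {
    if (facetSets == null || facetSets.length == 0) {
      throw new IllegalArgumentException("facetSets cannot be null or empty");
    }
    int numDims = facetSets[0].length;
    if (numDims == 0) {
      throw new IllegalArgumentException("facet sets must have at least one dimension");
    }
    ByteBuffer buf = ByteBuffer.allocate(Long.BYTES * (1 + numDims * facetSets.length));
    buf.putLong(numDims); // header: the dimension count shared by all sets
    for (long[] set : facetSets) {
      if (set.length != numDims) {
        throw new IllegalArgumentException(
            "All facet sets must have the same number of dimensions. Expected "
                + numDims
                + " found "
                + set.length);
      }
      for (long value : set) {
        buf.putLong(value);
      }
    }
    return buf.array();
  }

  static List<long[]> unpack(byte[] packed) {
    ByteBuffer buf = ByteBuffer.wrap(packed);
    int numDims = (int) buf.getLong();
    if (numDims <= 0) {
      throw new IllegalArgumentException("invalid dimension count " + numDims);
    }
    List<long[]> sets = new ArrayList<>();
    while (buf.hasRemaining()) {
      long[] set = new long[numDims];
      for (int d = 0; d < numDims; d++) {
        set[d] = buf.getLong();
      }
      sets.add(set);
    }
    return sets;
  }
}
```

Storing the dimension count once per document, rather than once per set, is what lets a reader step through the sets with a fixed stride of `numDims` values.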
[GitHub] [lucene] shaie commented on pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities
shaie commented on PR #841: URL: https://github.com/apache/lucene/pull/841#issuecomment-1159759023 @gsmiller, @mdmarshmallow I pushed another commit which completes the FacetSets document and adds another check ensuring that all `FacetSet`s given to the `FacetSetsField` are of the same type (there is no sense in indexing different types under the same field). The PR feels ready to me, except for `RangeMatching`: I'm waiting for @mdmarshmallow to comment on whether he agrees to remove it. Once we agree on that, I suggest we do a final round of review and wrap up this PR.
[jira] [Comment Edited] (LUCENE-10622) Prepare complete migration script to GitHub issue from Jira (best effort)
[ https://issues.apache.org/jira/browse/LUCENE-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556087#comment-17556087 ] Tomoko Uchida edited comment on LUCENE-10622 at 6/19/22 4:11 PM: - Skeleton of the migration tool: [https://github.com/mocobeta/sandbox-lucene-10557] 1. download jira issues 2. first pass: create github issues 3. convert jira issues to github issues and resolve links/id mappings 4. second pass: update github issues (and comments) was (Author: tomoko uchida): Skeleton of the migration tool: [https://github.com/mocobeta/sandbox-lucene-10557] 1. download jira issues 2. first pass: create github issues 3. convert jira issues to github issues and resolve links/id mappings 4. update github issues (and comments) > Prepare complete migration script to GitHub issue from Jira (best effort) > - > > Key: LUCENE-10622 > URL: https://issues.apache.org/jira/browse/LUCENE-10622 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > > If we intend to move the history to GitHub, it should be perfect as far as > possible - significantly degraded copies of history are harmful, rather than > helpful for future contributors, I think.
[jira] [Commented] (LUCENE-10622) Prepare complete migration script to GitHub issue from Jira (best effort)
[ https://issues.apache.org/jira/browse/LUCENE-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556087#comment-17556087 ] Tomoko Uchida commented on LUCENE-10622: Skeleton of the migration tool: [https://github.com/mocobeta/sandbox-lucene-10557] 1. download jira issues 2. first pass: create github issues 3. convert jira issues to github issues and resolve links/id mappings 4. update github issues (and comments)
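The split between a "create" pass and a later "update" pass follows from link resolution: a GitHub issue number is only known once that issue exists, so cross-issue references can only be rewritten after every issue has been created. Below is a hypothetical sketch of that id-mapping idea. It is not code from the sandbox-lucene-10557 repository: no GitHub API is called, issue creation is simulated with a counter, and only plain `LUCENE-NNNN` mentions are rewritten to `#number`. `Matcher.appendReplacement(StringBuilder, String)` requires Java 9 or later.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of two-pass link resolution; issue creation is simulated. */
final class TwoPassMigrationSketch {
  private static final Pattern JIRA_KEY = Pattern.compile("\\bLUCENE-\\d+\\b");

  /** Pass 1: "create" one GitHub issue per Jira key and record key -> issue number. */
  static Map<String, Integer> firstPass(List<String> jiraKeys, int firstIssueNumber) {
    Map<String, Integer> mapping = new LinkedHashMap<>();
    int next = firstIssueNumber;
    for (String key : jiraKeys) {
      mapping.put(key, next++);
    }
    return mapping;
  }

  /** Pass 2: rewrite mentions of migrated keys; keys that were not migrated stay as they are. */
  static String rewriteLinks(String body, Map<String, Integer> mapping) {
    Matcher m = JIRA_KEY.matcher(body);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      Integer number = mapping.get(m.group());
      String replacement = number == null ? m.group() : "#" + number;
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    return out.toString();
  }
}
```

Keeping the mapping as a separate value also fits the "extra mapping file of old-issue-new-issue URLs" item from the parent issue's task list, since it could simply be written out after the first pass.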
[GitHub] [lucene] mdmarshmallow commented on pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities
mdmarshmallow commented on PR #841: URL: https://github.com/apache/lucene/pull/841#issuecomment-1159832697 Agree with you on the `RangeMatching` solution, your way is cleaner. I'll remove it.
[GitHub] [lucene] mdmarshmallow commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities
mdmarshmallow commented on code in PR #841: URL: https://github.com/apache/lucene/pull/841#discussion_r901174337 ## lucene/facet/src/java/org/apache/lucene/facet/facetset/FacetSetsField.java: ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.facetset; + +import org.apache.lucene.document.BinaryDocValuesField; +import org.apache.lucene.document.IntPoint; +import org.apache.lucene.util.BytesRef; + +/** + * A {@link BinaryDocValuesField} which encodes a list of {@link FacetSet facet sets}. The encoding + * scheme consists of a packed {@code byte[]} where the first value denotes the number of dimensions + * in all the sets, followed by each set's values. + * + * @lucene.experimental + */ +public class FacetSetsField extends BinaryDocValuesField { + + /** + * Create a new FacetSets field. + * + * @param name field name + * @param facetSets the {@link FacetSet facet sets} to index in that field. All must have the same + * number of dimensions + * @throws IllegalArgumentException if the field name is null or the given facet sets are invalid + */ + public static FacetSetsField create(String name, FacetSet... 
facetSets) { +if (facetSets == null || facetSets.length == 0) { + throw new IllegalArgumentException("FacetSets cannot be null or empty!"); +} + +return new FacetSetsField(name, toPackedValues(facetSets)); + } + + private FacetSetsField(String name, BytesRef value) { +super(name, value); + } + + private static BytesRef toPackedValues(FacetSet... facetSets) { +int numDims = facetSets[0].dims; +Class expectedClass = facetSets[0].getClass(); +byte[] buf = new byte[Integer.BYTES + facetSets[0].sizePackedBytes() * facetSets.length]; +IntPoint.encodeDimension(numDims, buf, 0); +int offset = Integer.BYTES; +for (FacetSet facetSet : facetSets) { + if (facetSet.dims != numDims) { +throw new IllegalArgumentException( +"All FacetSets must have the same number of dimensions. Expected " ++ numDims ++ " found " ++ facetSet.dims); + } + // It doesn't make sense to index facet sets of different types in the same field + if (facetSet.getClass() != expectedClass) { Review Comment: Thoughts on using generics here to enforce this at compile time? ## lucene/facet/src/java/org/apache/lucene/facet/facetset/package-info.java: ## @@ -0,0 +1,19 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +/** Provides FacetSets faceting capabilities. */ Review Comment: Maybe make this slightly more descriptive? "Provides FacetSets faceting capabilities which allow users to facet on high-dimensional field values. See FacetSets.adoc in the docs package for more information on usage." Or something like that. ## lucene/facet/src/java/org/apache/lucene/facet/facetset/RangeFacetSetMatcher.java: ## @@ -0,0 +1,166 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.
[jira] [Created] (LUCENE-10624) Binary Search for Sparse IndexedDISI advanceWithinBlock & advanceExactWithinBlock
Weiming Wu created LUCENE-10624: --- Summary: Binary Search for Sparse IndexedDISI advanceWithinBlock & advanceExactWithinBlock Key: LUCENE-10624 URL: https://issues.apache.org/jira/browse/LUCENE-10624 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Affects Versions: 9.2, 9.1, 9.0 Reporter: Weiming Wu h3. Problem Statement We noticed a DocValues read performance regression with the iterator-based API when upgrading from Lucene 5 to Lucene 9: our latency increased by 50%. The degradation is similar to what's described in https://issues.apache.org/jira/browse/SOLR-9599 By analyzing profiling data, we found that the methods "advanceWithinBlock" and "advanceExactWithinBlock" for Sparse IndexedDISI are slow in Lucene 9 because they look up docs with an O(N) linear scan. h3. Changes Replaced the O(N) lookup in Sparse IndexedDISI "advanceWithinBlock" and "advanceExactWithinBlock" with binary search, which is possible because the docs within a block are stored in ascending order. h3. Test {code:bash} ./gradlew tidy ./gradlew check {code}
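The change can be illustrated on an in-memory array: in a SPARSE block, the (low bits of the) doc IDs are stored in ascending order, so "find the first doc greater than or equal to the target" can use binary search instead of a scan. This is a hypothetical sketch, not the patch from PR #968: Lucene reads these values from an `IndexInput` and must also keep the iterator's current index and doc in sync, whereas here the block is a plain `int[]` and the method names are made up.

```java
/**
 * Hypothetical in-memory sketch of the lookup inside a SPARSE IndexedDISI block. The block is an
 * ascending int[] of doc IDs; the search range is [from, to).
 */
final class SparseBlockSearchSketch {

  /** O(N) scan: first index in [from, to) with docs[i] >= target, or to if there is none. */
  static int linearCeil(int[] docs, int from, int to, int target) {
    for (int i = from; i < to; i++) {
      if (docs[i] >= target) {
        return i;
      }
    }
    return to;
  }

  /** O(log N) equivalent; correct only because docs is sorted in ascending order. */
  static int binaryCeil(int[] docs, int from, int to, int target) {
    int lo = from;
    int hi = to; // invariant: the answer lies in [lo, hi]
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (docs[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** advanceExact-style check: is target itself present in [from, to)? */
  static boolean containsExact(int[] docs, int from, int to, int target) {
    int i = binaryCeil(docs, from, to, target);
    return i < to && docs[i] == target;
  }
}
```

Both searches return the same index for every target, so the swap changes only the cost: roughly log2(N) comparisons instead of up to N per call.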
[GitHub] [lucene] wuwm opened a new pull request, #968: [LUCENE-10624] Binary Search for Sparse IndexedDISI advanceWithinBloc…
wuwm opened a new pull request, #968: URL: https://github.com/apache/lucene/pull/968 ### Description (or a Jira issue link if you have one) https://issues.apache.org/jira/browse/LUCENE-10624
[GitHub] [lucene] shaie commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities
shaie commented on code in PR #841: URL: https://github.com/apache/lucene/pull/841#discussion_r901238006 ## lucene/facet/src/java/org/apache/lucene/facet/facetset/FacetSetsField.java: ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.facetset; + +import org.apache.lucene.document.BinaryDocValuesField; +import org.apache.lucene.document.IntPoint; +import org.apache.lucene.util.BytesRef; + +/** + * A {@link BinaryDocValuesField} which encodes a list of {@link FacetSet facet sets}. The encoding + * scheme consists of a packed {@code byte[]} where the first value denotes the number of dimensions + * in all the sets, followed by each set's values. + * + * @lucene.experimental + */ +public class FacetSetsField extends BinaryDocValuesField { + + /** + * Create a new FacetSets field. + * + * @param name field name + * @param facetSets the {@link FacetSet facet sets} to index in that field. All must have the same + * number of dimensions + * @throws IllegalArgumentException if the field name is null or the given facet sets are invalid + */ + public static FacetSetsField create(String name, FacetSet... 
facetSets) { +if (facetSets == null || facetSets.length == 0) { + throw new IllegalArgumentException("FacetSets cannot be null or empty!"); +} + +return new FacetSetsField(name, toPackedValues(facetSets)); + } + + private FacetSetsField(String name, BytesRef value) { +super(name, value); + } + + private static BytesRef toPackedValues(FacetSet... facetSets) { +int numDims = facetSets[0].dims; +Class expectedClass = facetSets[0].getClass(); +byte[] buf = new byte[Integer.BYTES + facetSets[0].sizePackedBytes() * facetSets.length]; +IntPoint.encodeDimension(numDims, buf, 0); +int offset = Integer.BYTES; +for (FacetSet facetSet : facetSets) { + if (facetSet.dims != numDims) { +throw new IllegalArgumentException( +"All FacetSets must have the same number of dimensions. Expected " ++ numDims ++ " found " ++ facetSet.dims); + } + // It doesn't make sense to index facet sets of different types in the same field + if (facetSet.getClass() != expectedClass) { Review Comment: I'm not sure what we would generify. E.g. you and I explored this with `FacetSet` before, but it complicates things and I'm not sure it would work with e.g. `TemperatureReadingFacetSet` (and the like), which mixes several dimension types. Another thing: I don't want to over-complicate the API for something that is, at the end of the day, just extra safety. I can't see why someone would try to index two different `FacetSet` types in the same field and expect it to work.
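One detail relevant to the generics question: a generic varargs signature such as `<T extends FacetSet> create(String, T...)` would not by itself reject mixed subtypes, because javac infers `T` as the common supertype of the arguments. Only a caller-supplied type witness (e.g. `FacetSetsField.<LongFacetSet>create(...)`) would make mixing a compile error. The sketch below shows this with hypothetical toy classes standing in for `FacetSet` and its subclasses; none of these names are Lucene API, and the runtime check mirrors the one in the quoted diff.

```java
/**
 * Hypothetical toy hierarchy (not Lucene classes) showing that generic varargs do not stop callers
 * from mixing subtypes: for create(a, b) with different subtypes, T is inferred as ToyFacetSet.
 */
abstract class ToyFacetSet {
  final int dims;

  ToyFacetSet(int dims) {
    this.dims = dims;
  }
}

final class LongToyFacetSet extends ToyFacetSet {
  LongToyFacetSet(int dims) {
    super(dims);
  }
}

final class DoubleToyFacetSet extends ToyFacetSet {
  DoubleToyFacetSet(int dims) {
    super(dims);
  }
}

final class GenericsCheckSketch {
  /** Compiles even for mixed subtypes, so the runtime class check is still what enforces it. */
  @SafeVarargs
  static <T extends ToyFacetSet> int create(T... facetSets) {
    if (facetSets.length == 0) {
      throw new IllegalArgumentException("facetSets cannot be empty");
    }
    Class<?> expectedClass = facetSets[0].getClass();
    for (T facetSet : facetSets) {
      if (facetSet.getClass() != expectedClass) {
        throw new IllegalArgumentException(
            "All facet sets must be of the same type. Expected "
                + expectedClass.getSimpleName()
                + " found "
                + facetSet.getClass().getSimpleName());
      }
    }
    return facetSets.length;
  }
}
```

`GenericsCheckSketch.create(new LongToyFacetSet(2), new DoubleToyFacetSet(2))` compiles and then throws at runtime, which is consistent with keeping the simple runtime check rather than adding type parameters to the API.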
[GitHub] [lucene] shaie commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities
shaie commented on code in PR #841: URL: https://github.com/apache/lucene/pull/841#discussion_r901239793 ## lucene/facet/src/java/org/apache/lucene/facet/facetset/RangeFacetSetMatcher.java: ## @@ -0,0 +1,166 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.facet.facetset; + +import java.util.Arrays; +import org.apache.lucene.util.NumericUtils; + +/** + * A {@link FacetSetMatcher} which considers a set as a match if all dimensions fall within the + * given corresponding range. + * + * @lucene.experimental + */ +public class RangeFacetSetMatcher extends FacetSetMatcher { + + private final long[] lowerRanges; + private final long[] upperRanges; + + /** + * Constructs an instance to match facet sets with dimensions that fall within the given ranges. + */ + public RangeFacetSetMatcher(String label, DimRange... 
dimRanges) { +super(label, getDims(dimRanges)); +this.lowerRanges = Arrays.stream(dimRanges).mapToLong(range -> range.min).toArray(); +this.upperRanges = Arrays.stream(dimRanges).mapToLong(range -> range.max).toArray(); + } + + @Override + public boolean matches(long[] dimValues) { +assert dimValues.length == dims +: "Encoded dimensions (dims=" ++ dimValues.length ++ ") is incompatible with range dimensions (dims=" ++ dims ++ ")"; + +for (int i = 0; i < dimValues.length; i++) { + if (dimValues[i] < lowerRanges[i]) { +// Doc's value is too low in this dimension +return false; + } + if (dimValues[i] > upperRanges[i]) { +// Doc's value is too high in this dimension +return false; + } +} +return true; + } + + private static int getDims(DimRange... dimRanges) { +if (dimRanges == null || dimRanges.length == 0) { + throw new IllegalArgumentException("dimRanges cannot be null or empty"); +} +return dimRanges.length; + } + + /** + * Creates a {@link DimRange} for the given min and max long values. This method is also suitable + * for int values. + */ + public static DimRange fromLongs(long min, boolean minInclusive, long max, boolean maxInclusive) { Review Comment: Yeah makes sense to me too! The only bummer is that it makes lines such as `RangeFacetSetMatcher.fromLongs` become `RangeFacetSetMatcher.DimRange.fromLongs`. Should we extract `DimRange` as a top-level class? I'm not too obsessed about it though. ## lucene/facet/src/java/org/apache/lucene/facet/facetset/package-info.java: ## @@ -0,0 +1,19 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** Provides FacetSets faceting capabilities. */ Review Comment: Done
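The matching rule in `RangeFacetSetMatcher` (a set matches only if every dimension lies within its range) is simple once each range is normalized to inclusive bounds. Below is a hypothetical sketch of that idea with `DimRange` nested inside the matcher, as discussed above. The names are illustrative rather than the PR's API. Two behaviors here are my own assumptions, not taken from the PR: exclusive bounds are converted with `+1`/`-1`, and a range that would be empty throws.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of range matching over facet sets (illustrative names, not Lucene's API).
 * Bounds are normalized to inclusive [min, max] once, so matching is two comparisons per dim.
 */
final class RangeMatcherSketch {

  /** Inclusive bounds for one dimension. */
  static final class DimRange {
    final long min;
    final long max;

    private DimRange(long min, long max) {
      this.min = min;
      this.max = max;
    }

    /** Converts exclusive bounds to inclusive ones; rejects ranges that would be empty. */
    static DimRange fromLongs(long min, boolean minInclusive, long max, boolean maxInclusive) {
      if (!minInclusive) {
        if (min == Long.MAX_VALUE) {
          throw new IllegalArgumentException("empty range: exclusive min is Long.MAX_VALUE");
        }
        min++;
      }
      if (!maxInclusive) {
        if (max == Long.MIN_VALUE) {
          throw new IllegalArgumentException("empty range: exclusive max is Long.MIN_VALUE");
        }
        max--;
      }
      if (min > max) {
        throw new IllegalArgumentException("empty range: min > max after normalization");
      }
      return new DimRange(min, max);
    }
  }

  private final long[] lowerRanges;
  private final long[] upperRanges;

  RangeMatcherSketch(DimRange... dimRanges) {
    if (dimRanges == null || dimRanges.length == 0) {
      throw new IllegalArgumentException("dimRanges cannot be null or empty");
    }
    this.lowerRanges = Arrays.stream(dimRanges).mapToLong(r -> r.min).toArray();
    this.upperRanges = Arrays.stream(dimRanges).mapToLong(r -> r.max).toArray();
  }

  boolean matches(long[] dimValues) {
    if (dimValues.length != lowerRanges.length) {
      throw new IllegalArgumentException("expected " + lowerRanges.length + " dims");
    }
    for (int i = 0; i < dimValues.length; i++) {
      if (dimValues[i] < lowerRanges[i] || dimValues[i] > upperRanges[i]) {
        return false; // value too low or too high in this dimension
      }
    }
    return true;
  }
}
```

With `DimRange` nested, callers write `RangeMatcherSketch.DimRange.fromLongs(...)`. That is the extra qualification mentioned in the review comment, and it is the trade-off against extracting `DimRange` as a top-level class.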
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556188#comment-17556188 ]

Tomoko Uchida commented on LUCENE-10557:

For version control, there are two considerations.

1. Fix Version(s)

We have two options: Milestone or Label. One important difference between them is that an issue can have only one milestone but multiple labels. The other difference is that a milestone is special metadata, while labels are just flexible text tags for searching.

I'm personally fine with Milestone: we don't release a bug fix or improvement in multiple versions anyway. We don't have two CHANGES entries for one issue; if we resolve an issue in "10.0.0" and "9.3.0", the CHANGES entry appears only in Lucene 9.3.0's section. If there are other perspectives, could you share your thoughts?

2. Affects Version(s)

45% of unresolved issues have this field. Maybe we could have issue labels such as "affectsVersion:9.3.0". I have never used this metadata field, and I have no problem with omitting it in GitHub. Does anyone have thoughts on it?

--

Aside from versions, I'm not entirely sure how to port the "Priority" field (Blocker, Critical, Major, Minor, Trivial). It's a mandatory field in Jira, but there seem to be no clear standards for setting a priority except for "Blocker". Should we keep it in GitHub as a mandatory label, make it an optional one, or perhaps omit it if developers/committers don't really pay attention to it?

> Migrate to GitHub issue from Jira
> -
>
> Key: LUCENE-10557
> URL: https://issues.apache.org/jira/browse/LUCENE-10557
> Project: Lucene - Core
> Issue Type: Sub-task
> Reporter: Tomoko Uchida
> Assignee: Tomoko Uchida
> Priority: Major
>
> A few (not the majority) Apache projects already use GitHub issues instead of Jira.
> For example:
> Airflow: [https://github.com/apache/airflow/issues]
> BookKeeper: [https://github.com/apache/bookkeeper/issues]
> So I think it would be technically possible for us to move to GitHub issues. I have little knowledge of how to proceed, so I'd like to discuss whether we should migrate and, if so, how to handle the migration smoothly.
> The major tasks would be:
> * (/) Get a consensus about the migration among committers
> * Choose issues that should be moved to GitHub
> ** Discussion thread [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12]
> ** -Conclusion for now: We don't migrate any issues. Only new issues should be opened on GitHub.-
> ** Write a prototype migration script; the decision could be based on it. Things to consider:
> *** version numbers: labels or milestones?
> *** add a comment / prepend a link to the source Jira issue on the GitHub side,
> *** add a comment / prepend a link on the Jira side to the new issue on the GitHub side (for people who access Jira from blogs, mailing list archives and other sources that will have stale links),
> *** convert cross-issue automatic links in comments/descriptions (as suggested by Robert),
> *** strategy to deal with sub-issues (hierarchies),
> *** maybe prefix (or postfix) the issue title on the GitHub side with the original LUCENE-XYZ key so that it is easier to search for a particular issue there?
> *** how to deal with user IDs (author, reporter, commenters)? Do they have to be GitHub users? Will information about people not registered on GitHub be lost?
> *** create an extra mapping file of old-issue-new-issue URLs for any potential future use.
> *** what to do with issue numbers in git/svn commits? These could be rewritten, but that would change the entire git history tree; I don't think this is practical, although it is doable.
> * Establish conventions for issue label/milestone management
> ** Do some experiments on a sandbox repository [https://github.com/mocobeta/sandbox-lucene-10557]
> ** Write documentation for metadata (label/milestone) management
> * Enable GitHub issues on Lucene's repository
> ** Raise an issue on INFRA
> ** (Create an issue-only private repository for sensitive issues if it's needed and allowed)
> ** Set a mail hook to [issues@lucene.apache.org|mailto:issues@lucene.apache.org] (fortunately, the list has a generic name)
> * Set a schedule for migration
> ** Give committers some time to play around with issues/labels/milestones before the actual migration
> ** Make an announcement on the mailing lists
> ** Show a notice when opening a new Jira issue (in the issue template?)
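Several items in the "things to consider" list, together with the version and priority questions in the comment above, are plain data transformations. Below is a minimal Java sketch of how a prototype migration script might apply them to a single issue, under these assumptions: the earliest fix version becomes the single milestone (following the CHANGES argument in the comment), each affects version becomes an `affectsVersion:X` label as suggested, priority becomes an optional `priority:X` label, the title is prefixed with the original key, and `LUCENE-NNNN` references in the description are annotated with links taken from an old-to-new issue-number mapping. All class and method names (`JiraIssueSketch`, `GitHubIssueDraft`, `JiraToGitHubSketch`, `toGitHub`, `rewriteLinks`) are hypothetical, the mapping and repository URL are caller-supplied inputs, and no Jira or GitHub API is called.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical subset of the Jira fields discussed above. */
final class JiraIssueSketch {
  final String key;
  final String summary;
  final String description;
  final List<String> fixVersions;
  final List<String> affectsVersions;
  final String priority; // null if absent

  JiraIssueSketch(String key, String summary, String description,
      List<String> fixVersions, List<String> affectsVersions, String priority) {
    this.key = key;
    this.summary = summary;
    this.description = description;
    this.fixVersions = fixVersions;
    this.affectsVersions = affectsVersions;
    this.priority = priority;
  }
}

/** Hypothetical draft of the GitHub issue a migration script would create. */
final class GitHubIssueDraft {
  final String title;
  final String body;
  final String milestone; // null when the Jira issue has no fix version
  final List<String> labels;

  GitHubIssueDraft(String title, String body, String milestone, List<String> labels) {
    this.title = title;
    this.body = body;
    this.milestone = milestone;
    this.labels = labels;
  }
}

final class JiraToGitHubSketch {
  static final String JIRA_BROWSE = "https://issues.apache.org/jira/browse/";
  private static final Pattern JIRA_KEY = Pattern.compile("\\bLUCENE-\\d+\\b");

  /** Orders numeric versions such as "9.3.0" and "10.0.0" part by part, not as plain strings. */
  static final Comparator<String> VERSION_ORDER =
      (a, b) -> {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        for (int i = 0; i < Math.max(pa.length, pb.length); i++) {
          int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
          int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
          if (x != y) {
            return Integer.compare(x, y);
          }
        }
        return 0;
      };

  /** Appends a GitHub link after each LUCENE-NNNN key present in the old-to-new mapping. */
  static String rewriteLinks(String text, Map<String, Integer> keyToIssueNumber, String repoUrl) {
    Matcher m = JIRA_KEY.matcher(text);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      Integer number = keyToIssueNumber.get(m.group());
      // Keys without a mapping are left untouched rather than guessed.
      String replacement =
          number == null ? m.group() : m.group() + " (" + repoUrl + "/issues/" + number + ")";
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    return out.toString();
  }

  static GitHubIssueDraft toGitHub(
      JiraIssueSketch jira, Map<String, Integer> keyToIssueNumber, String repoUrl) {
    // One milestone per issue: the earliest fix version, where the CHANGES entry would land.
    String milestone = jira.fixVersions.stream().min(VERSION_ORDER).orElse(null);
    List<String> labels = new ArrayList<>();
    for (String v : jira.affectsVersions) {
      labels.add("affectsVersion:" + v);
    }
    if (jira.priority != null) {
      labels.add("priority:" + jira.priority); // hypothetical label scheme; could be dropped
    }
    String title = "[" + jira.key + "] " + jira.summary;
    String body =
        "Migrated from " + JIRA_BROWSE + jira.key + "\n\n"
            + rewriteLinks(jira.description, keyToIssueNumber, repoUrl);
    return new GitHubIssueDraft(title, body, milestone, labels);
  }
}
```

Leaving unmapped keys untouched means the rewrite can run before every issue has a GitHub number and be repeated later; the same key-to-number mapping could also be written out as the extra old-to-new URL file mentioned in the list.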
[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556188#comment-17556188 ]

Tomoko Uchida edited comment on LUCENE-10557 at 6/20/22 5:11 AM:
-

For version control, there are two considerations.

1. Fix Version(s)

We have two options: Milestone or Label. One important difference between them is that an issue can have only one milestone but multiple labels. The other difference is that a milestone is special metadata, while labels are just flexible text tags for searching.

I'm personally fine with Milestone: we don't release a bug fix or improvement in multiple versions anyway. We don't have two CHANGES entries for one issue; if we resolve an issue in "10.0.0" and "9.3.0", the CHANGES entry appears only in Lucene 9.3.0's section. If there are other perspectives, could you share your thoughts?

2. Affects Version(s)

35% of unresolved issues have this field. Maybe we could have issue labels such as "affectsVersion:9.3.0". I have never used this metadata field, and I have no problem with omitting it in GitHub. Does anyone have thoughts on it?

--

Aside from versions, I'm not entirely sure how to port the "Priority" field (Blocker, Critical, Major, Minor, Trivial). It's a mandatory field in Jira, but there seem to be no clear standards for setting a priority except for "Blocker". Should we keep it in GitHub as a mandatory label, make it an optional one, or perhaps omit it if developers/committers don't really pay attention to it?