[jira] [Created] (LUCENE-10623) Error implementation of docValueCount for SortingSortedSetDocValues

2022-06-19 Thread Lu Xugang (Jira)
Lu Xugang created LUCENE-10623:
--

 Summary: Error implementation of docValueCount for 
SortingSortedSetDocValues
 Key: LUCENE-10623
 URL: https://issues.apache.org/jira/browse/LUCENE-10623
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Lu Xugang


The following test fails:

 
{code:java}
  public void testSortOnAddIndicesOrd() throws IOException {
    Directory tmpDir = newDirectory();
    Directory dir = newDirectory();
    IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random()));
    IndexWriter w = new IndexWriter(tmpDir, iwc);

    // First document: a single value "b".
    Document doc;
    doc = new Document();
    doc.add(new SortedSetDocValuesField("foo", new BytesRef("b")));
    w.addDocument(doc);

    // Second document: the same Document instance is reused, so it holds
    // "b", "a", "b", "b", i.e. two unique values.
    doc.add(new SortedSetDocValuesField("foo", new BytesRef("a")));
    doc.add(new SortedSetDocValuesField("foo", new BytesRef("b")));
    doc.add(new SortedSetDocValuesField("foo", new BytesRef("b")));
    w.addDocument(doc);

    w.commit();

    Sort indexSort = new Sort(new SortedSetSortField("foo", false, SortedSetSelector.Type.MIN));
    try (DirectoryReader reader = DirectoryReader.open(tmpDir)) {
      for (LeafReaderContext ctx : reader.leaves()) {
        CodecReader wrap =
            SortingCodecReader.wrap(SlowCodecReaderWrapper.wrap(ctx.reader()), indexSort);
        assertTrue(wrap.toString(), wrap.toString().startsWith("SortingCodecReader("));
        SortingCodecReader sortingCodecReader = (SortingCodecReader) wrap;
        SortedSetDocValues sortedSetDocValues =
            sortingCodecReader
                .getDocValuesReader()
                .getSortedSet(ctx.reader().getFieldInfos().fieldInfo("foo"));
        // After sorting by MIN, the document with values "a" and "b" comes first.
        sortedSetDocValues.nextDoc();
        assertEquals(2, sortedSetDocValues.docValueCount());
        sortedSetDocValues.nextDoc();
        assertEquals(1, sortedSetDocValues.docValueCount());
        assertEquals(DocIdSetIterator.NO_MORE_DOCS, sortedSetDocValues.nextDoc());
      }
    }
    IOUtils.close(w, dir, tmpDir);
  }
{code}
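
For readers less familiar with the API: `SortedSetDocValues` stores each document's values as a set, so duplicate values within one document collapse, and `docValueCount()` is documented to report the number of unique ords for the current document. The following is a minimal sketch in plain Java, not Lucene code; the names `DocValueCountModel` and `expectedCounts` are made up for illustration. It models the two documents from the test, orders them by their minimum value, and computes the counts the assertions expect.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class DocValueCountModel {

  /** Returns the unique-value count of each doc after ordering docs by their minimum value. */
  static List<Integer> expectedCounts(List<List<String>> docs) {
    List<TreeSet<String>> sets = new ArrayList<>();
    for (List<String> values : docs) {
      sets.add(new TreeSet<>(values)); // a set drops duplicates, as sorted-set doc values do
    }
    // Simplified model of an ascending index sort on the field with the MIN selector.
    sets.sort(Comparator.comparing((TreeSet<String> s) -> s.first()));
    List<Integer> counts = new ArrayList<>();
    for (TreeSet<String> set : sets) {
      counts.add(set.size());
    }
    return counts;
  }

  public static void main(String[] args) {
    // Doc 0 holds "b"; doc 1 reuses the same Document and holds "b", "a", "b", "b".
    List<List<String>> docs = List.of(List.of("b"), List.of("b", "a", "b", "b"));
    System.out.println(expectedCounts(docs));
  }
}
```

Under this model the first document after sorting has two unique values and the second has one, which matches the order of the assertions in the test.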


 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-10623) Error implementation of docValueCount for SortingSortedSetDocValues

2022-06-19 Thread Lu Xugang (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lu Xugang reassigned LUCENE-10623:
--

Assignee: Lu Xugang

> Error implementation of docValueCount for SortingSortedSetDocValues
> ---
>
> Key: LUCENE-10623
> URL: https://issues.apache.org/jira/browse/LUCENE-10623
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Lu Xugang
>Assignee: Lu Xugang
>Priority: Major






[GitHub] [lucene] LuXugang opened a new pull request, #967: LUCENE-10623: Error implementation of docValueCount for SortingSortedSetDocValues

2022-06-19 Thread GitBox


LuXugang opened a new pull request, #967:
URL: https://github.com/apache/lucene/pull/967

   See: https://issues.apache.org/jira/projects/LUCENE/issues/LUCENE-10623


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[GitHub] [lucene] shaie commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities

2022-06-19 Thread GitBox


shaie commented on code in PR #841:
URL: https://github.com/apache/lucene/pull/841#discussion_r901119495


##
lucene/facet/src/java/org/apache/lucene/facet/facetset/MatchingFacetSetsCounts.java:
##
@@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.facetset;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import org.apache.lucene.document.LongPoint;
+import org.apache.lucene.facet.FacetResult;
+import org.apache.lucene.facet.Facets;
+import org.apache.lucene.facet.FacetsCollector;
+import org.apache.lucene.facet.LabelAndValue;
+import org.apache.lucene.index.BinaryDocValues;
+import org.apache.lucene.index.DocValues;
+import org.apache.lucene.search.ConjunctionUtils;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.util.BytesRef;
+
+/**
+ * Returns the counts for each given {@link FacetSet}
+ *
+ * @lucene.experimental
+ */
+public class MatchingFacetSetsCounts extends Facets {
+
+  private final FacetSetMatcher[] facetSetMatchers;
+  private final int[] counts;
+  private final String field;
+  private final int totCount;
+
+  /**
+   * Constructs a new instance of matching facet set counts which calculates the countBytes for each
+   * given facet set matcher.
+   */
+  public MatchingFacetSetsCounts(
+      String field, FacetsCollector hits, FacetSetMatcher... facetSetMatchers) throws IOException {
+    if (facetSetMatchers == null || facetSetMatchers.length == 0) {
+      throw new IllegalArgumentException("facetSetMatchers cannot be null or empty");
+    }
+    if (areFacetSetMatcherDimensionsInconsistent(facetSetMatchers)) {
+      throw new IllegalArgumentException("All facet set matchers must be the same dimensionality");
+    }
+    this.field = field;
+    this.facetSetMatchers = facetSetMatchers;
+    this.counts = new int[facetSetMatchers.length];
+    this.totCount = count(field, hits.getMatchingDocs());
+  }
+
+  /** Counts from the provided field. */
+  private int count(String field, List<FacetsCollector.MatchingDocs> matchingDocs)
+      throws IOException {
+
+    int totCount = 0;
+    for (FacetsCollector.MatchingDocs hits : matchingDocs) {
+
+      BinaryDocValues binaryDocValues = DocValues.getBinary(hits.context.reader(), field);
+
+      final DocIdSetIterator it =
+          ConjunctionUtils.intersectIterators(Arrays.asList(hits.bits.iterator(), binaryDocValues));
+      if (it == null) {
+        continue;
+      }
+
+      long[] dimValues = null; // dimension values buffer
+      int expectedNumDims = -1;
+      for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
+        boolean shouldCountDoc = false;
+        BytesRef bytesRef = binaryDocValues.binaryValue();
+        byte[] packedValue = bytesRef.bytes;
+        int numDims = (int) LongPoint.decodeDimension(packedValue, 0);
+        if (expectedNumDims == -1) {
+          expectedNumDims = numDims;
+          dimValues = new long[numDims];
+        } else {
+          // Verify that the number of indexed dimensions for all matching documents is the same
+          // (since we cannot verify that at indexing time).
+          assert numDims == expectedNumDims
+              : "Expected ("
+                  + expectedNumDims
+                  + ") dimensions, found ("
+                  + numDims
+                  + ") for doc ("
+                  + doc
+                  + ")";
+        }
+
+        for (int start = Long.BYTES; start < bytesRef.length; start += numDims * Long.BYTES) {
+          LongPoint.unpack(bytesRef, start, dimValues);
+          for (int j = 0; j < facetSetMatchers.length; j++) { // for each facet set matcher
+            if (facetSetMatchers[j].matches(dimValues)) {
+              counts[j]++;
+              shouldCountDoc = true;
+            }
+          }
+        }
+        if (shouldCountDoc) {
+          totCount++;
+        }
+      }
+    }
+    return totCount;
+  }
+
+  // TODO: This does not really provide "top children" functionality yet but provides "all
+  // children". This is being worked on in LUCENE-10550
+  @Override
+  public FacetResult getTopChi

[GitHub] [lucene] shaie commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities

2022-06-19 Thread GitBox


shaie commented on code in PR #841:
URL: https://github.com/apache/lucene/pull/841#discussion_r901119754


##
lucene/facet/src/java/org/apache/lucene/facet/facetset/FacetSetsField.java:
##
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.facetset;
+
+import java.util.Arrays;
+import org.apache.lucene.document.BinaryDocValuesField;
+import org.apache.lucene.document.LongPoint;
+import org.apache.lucene.util.BytesRef;
+
+/**
+ * A {@link BinaryDocValuesField} which encodes a list of {@link FacetSet facet sets}. The encoding
+ * scheme consists of a packed {@code long[]} where the first value denotes the number of dimensions
+ * in all the sets, followed by each set's values.
+ *
+ * @lucene.experimental
+ */
+public class FacetSetsField extends BinaryDocValuesField {
+
+  /**
+   * Create a new FacetSets field.
+   *
+   * @param name field name
+   * @param facetSets the {@link FacetSet} to index in that field. All must have the same number of

Review Comment:
   Done






[GitHub] [lucene] shaie commented on pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities

2022-06-19 Thread GitBox


shaie commented on PR #841:
URL: https://github.com/apache/lucene/pull/841#issuecomment-1159759023

   @gsmiller, @mdmarshmallow I pushed another commit which completes the 
FacetSets document and adds another check ensuring that all `FacetSet`s given to 
the `FacetSetsField` are actually of the same type (no sense indexing different 
types under the same field). The PR feels ready to me, except for 
`RangeMatching`: I'm waiting for @mdmarshmallow to say whether he agrees to 
remove it. After we agree on that, I suggest that we do a final round of 
review and wrap up this PR.





[jira] [Comment Edited] (LUCENE-10622) Prepare complete migration script to GitHub issue from Jira (best effort)

2022-06-19 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556087#comment-17556087
 ] 

Tomoko Uchida edited comment on LUCENE-10622 at 6/19/22 4:11 PM:
-

Skeleton of the migration tool: 
[https://github.com/mocobeta/sandbox-lucene-10557]
1. download Jira issues
2. first pass: create GitHub issues
3. convert Jira issues to GitHub issues and resolve links/id mappings
4. second pass: update GitHub issues (and comments)
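
One plausible reason for the two passes is that a Jira issue may reference another issue that does not exist on GitHub yet, so links can only be rewritten once every issue has a number. The following minimal Java sketch illustrates that idea. It is not taken from the linked repository, and the names (`TwoPassMigration`, `assignNumbers`, `rewriteLinks`) and the sequential numbering are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoPassMigration {

  private static final Pattern JIRA_KEY = Pattern.compile("LUCENE-\\d+");

  /** First pass: create one placeholder issue per Jira key and remember its number. */
  static Map<String, Integer> assignNumbers(List<String> jiraKeys, int firstNumber) {
    Map<String, Integer> mapping = new LinkedHashMap<>();
    int next = firstNumber;
    for (String key : jiraKeys) {
      mapping.put(key, next++); // stand-in for "create the issue and read back its number"
    }
    return mapping;
  }

  /** Conversion step: replace known Jira keys with "#number"; unknown keys stay as they are. */
  static String rewriteLinks(String body, Map<String, Integer> mapping) {
    Matcher m = JIRA_KEY.matcher(body);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      Integer number = mapping.get(m.group());
      String replacement = number == null ? m.group() : "#" + number;
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    Map<String, Integer> mapping = assignNumbers(List.of("LUCENE-10557", "LUCENE-10622"), 1);
    System.out.println(rewriteLinks("Sub-task of LUCENE-10557, see also LUCENE-1", mapping));
  }
}
```

In this sketch the second pass would push the rewritten bodies back to the already created issues; keys without a mapping are left untouched rather than guessed.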


was (Author: tomoko uchida):
Skeleton of the migration tool: 
[https://github.com/mocobeta/sandbox-lucene-10557]
1. download Jira issues
2. first pass: create GitHub issues
3. convert Jira issues to GitHub issues and resolve links/id mappings
4. update GitHub issues (and comments)

> Prepare complete migration script to GitHub issue from Jira (best effort)
> -
>
> Key: LUCENE-10622
> URL: https://issues.apache.org/jira/browse/LUCENE-10622
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
>
> If we intend to move the history to GitHub, it should be as complete as 
> possible; significantly degraded copies of history are harmful rather than 
> helpful for future contributors, I think.






[jira] [Commented] (LUCENE-10622) Prepare complete migration script to GitHub issue from Jira (best effort)

2022-06-19 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556087#comment-17556087
 ] 

Tomoko Uchida commented on LUCENE-10622:


Skeleton of the migration tool: 
[https://github.com/mocobeta/sandbox-lucene-10557]
1. download Jira issues
2. first pass: create GitHub issues
3. convert Jira issues to GitHub issues and resolve links/id mappings
4. update GitHub issues (and comments)

> Prepare complete migration script to GitHub issue from Jira (best effort)
> -
>
> Key: LUCENE-10622
> URL: https://issues.apache.org/jira/browse/LUCENE-10622
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
>
> If we intend to move the history to GitHub, it should be as complete as 
> possible; significantly degraded copies of history are harmful rather than 
> helpful for future contributors, I think.






[GitHub] [lucene] mdmarshmallow commented on pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities

2022-06-19 Thread GitBox


mdmarshmallow commented on PR #841:
URL: https://github.com/apache/lucene/pull/841#issuecomment-1159832697

   Agree with you on the `RangeMatching` solution; your way is cleaner. I'll 
remove it.





[GitHub] [lucene] mdmarshmallow commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities

2022-06-19 Thread GitBox


mdmarshmallow commented on code in PR #841:
URL: https://github.com/apache/lucene/pull/841#discussion_r901174337


##
lucene/facet/src/java/org/apache/lucene/facet/facetset/FacetSetsField.java:
##
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.facetset;
+
+import org.apache.lucene.document.BinaryDocValuesField;
+import org.apache.lucene.document.IntPoint;
+import org.apache.lucene.util.BytesRef;
+
+/**
+ * A {@link BinaryDocValuesField} which encodes a list of {@link FacetSet facet sets}. The encoding
+ * scheme consists of a packed {@code byte[]} where the first value denotes the number of dimensions
+ * in all the sets, followed by each set's values.
+ *
+ * @lucene.experimental
+ */
+public class FacetSetsField extends BinaryDocValuesField {
+
+  /**
+   * Create a new FacetSets field.
+   *
+   * @param name field name
+   * @param facetSets the {@link FacetSet facet sets} to index in that field. All must have the same
+   * number of dimensions
+   * @throws IllegalArgumentException if the field name is null or the given facet sets are invalid
+   */
+  public static FacetSetsField create(String name, FacetSet... facetSets) {
+    if (facetSets == null || facetSets.length == 0) {
+      throw new IllegalArgumentException("FacetSets cannot be null or empty!");
+    }
+
+    return new FacetSetsField(name, toPackedValues(facetSets));
+  }
+
+  private FacetSetsField(String name, BytesRef value) {
+    super(name, value);
+  }
+
+  private static BytesRef toPackedValues(FacetSet... facetSets) {
+    int numDims = facetSets[0].dims;
+    Class expectedClass = facetSets[0].getClass();
+    byte[] buf = new byte[Integer.BYTES + facetSets[0].sizePackedBytes() * facetSets.length];
+    IntPoint.encodeDimension(numDims, buf, 0);
+    int offset = Integer.BYTES;
+    for (FacetSet facetSet : facetSets) {
+      if (facetSet.dims != numDims) {
+        throw new IllegalArgumentException(
+            "All FacetSets must have the same number of dimensions. Expected "
+                + numDims
+                + " found "
+                + facetSet.dims);
+      }
+      // It doesn't make sense to index facet sets of different types in the same field
+      if (facetSet.getClass() != expectedClass) {

Review Comment:
   Thoughts on using generics here to enforce this at compile time?



##
lucene/facet/src/java/org/apache/lucene/facet/facetset/package-info.java:
##
@@ -0,0 +1,19 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/** Provides FacetSets faceting capabilities. */

Review Comment:
   Maybe make this slightly more descriptive? "Provides FacetSets faceting 
capabilities, which allow users to facet on high-dimensional field values. 
See FacetSets.adoc in the docs package for more information on usage." Or 
something like that.



##
lucene/facet/src/java/org/apache/lucene/facet/facetset/RangeFacetSetMatcher.java:
##
@@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.

[jira] [Created] (LUCENE-10624) Binary Search for Sparse IndexedDISI advanceWithinBlock & advanceExactWithinBlock

2022-06-19 Thread Weiming Wu (Jira)
Weiming Wu created LUCENE-10624:
---

 Summary: Binary Search for Sparse IndexedDISI advanceWithinBlock & 
advanceExactWithinBlock
 Key: LUCENE-10624
 URL: https://issues.apache.org/jira/browse/LUCENE-10624
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Affects Versions: 9.2, 9.1, 9.0
Reporter: Weiming Wu


h3. Problem Statement

We noticed a DocValues read performance regression with the iterative API when 
upgrading from Lucene 5 to Lucene 9: our latency increased by 50%. The 
degradation is similar to what's described in 
https://issues.apache.org/jira/browse/SOLR-9599 


By analyzing profiling data, we found that the methods "advanceWithinBlock" and 
"advanceExactWithinBlock" for Sparse IndexedDISI are slow in Lucene 9 due to 
their O(N) doc lookup algorithm. 
h3. Changes

Replaced the current O(N) lookup in the Sparse IndexedDISI methods 
"advanceWithinBlock" and "advanceExactWithinBlock" with binary search, which is 
possible because the docs are stored in ascending order.
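
The idea can be sketched as follows. This is a simplified model, not the IndexedDISI code: it assumes the doc IDs of a sparse block are already available as a sorted `int[]`, whereas the real implementation reads them from an index input, and `advanceLinear` and `advanceBinary` are hypothetical names. Both methods return the position of the first doc ID greater than or equal to the target, or the block size if there is none.

```java
public class SparseBlockAdvance {

  /** O(N) scan: index of the first doc >= target at or after 'from', or docs.length. */
  static int advanceLinear(int[] docs, int from, int target) {
    int i = from;
    while (i < docs.length && docs[i] < target) {
      i++;
    }
    return i;
  }

  /** O(log N) lower-bound binary search over the same range. */
  static int advanceBinary(int[] docs, int from, int target) {
    int lo = from;
    int hi = docs.length; // exclusive upper bound
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (docs[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  public static void main(String[] args) {
    int[] docs = {3, 8, 15, 42, 77, 100};
    for (int target : new int[] {0, 8, 9, 100, 101}) {
      System.out.println(target + " -> " + advanceBinary(docs, 0, target));
    }
  }
}
```

An exact-match variant would additionally check that the returned position is inside the block and that the doc ID found there equals the target. Whether the binary search is a net win on disk-backed data depends on access patterns, which this in-memory sketch does not model.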
h3. Test
{code:java}
./gradlew tidy
./gradlew check {code}
 






[GitHub] [lucene] wuwm opened a new pull request, #968: [LUCENE-10624] Binary Search for Sparse IndexedDISI advanceWithinBloc…

2022-06-19 Thread GitBox


wuwm opened a new pull request, #968:
URL: https://github.com/apache/lucene/pull/968

   ### Description (or a Jira issue link if you have one)
   https://issues.apache.org/jira/browse/LUCENE-10624





[GitHub] [lucene] shaie commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities

2022-06-19 Thread GitBox


shaie commented on code in PR #841:
URL: https://github.com/apache/lucene/pull/841#discussion_r901238006


##
lucene/facet/src/java/org/apache/lucene/facet/facetset/FacetSetsField.java:
##
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.facetset;
+
+import org.apache.lucene.document.BinaryDocValuesField;
+import org.apache.lucene.document.IntPoint;
+import org.apache.lucene.util.BytesRef;
+
+/**
+ * A {@link BinaryDocValuesField} which encodes a list of {@link FacetSet facet sets}. The encoding
+ * scheme consists of a packed {@code byte[]} where the first value denotes the number of dimensions
+ * in all the sets, followed by each set's values.
+ *
+ * @lucene.experimental
+ */
+public class FacetSetsField extends BinaryDocValuesField {
+
+  /**
+   * Create a new FacetSets field.
+   *
+   * @param name field name
+   * @param facetSets the {@link FacetSet facet sets} to index in that field. All must have the same
+   * number of dimensions
+   * @throws IllegalArgumentException if the field name is null or the given facet sets are invalid
+   */
+  public static FacetSetsField create(String name, FacetSet... facetSets) {
+    if (facetSets == null || facetSets.length == 0) {
+      throw new IllegalArgumentException("FacetSets cannot be null or empty!");
+    }
+
+    return new FacetSetsField(name, toPackedValues(facetSets));
+  }
+
+  private FacetSetsField(String name, BytesRef value) {
+    super(name, value);
+  }
+
+  private static BytesRef toPackedValues(FacetSet... facetSets) {
+    int numDims = facetSets[0].dims;
+    Class expectedClass = facetSets[0].getClass();
+    byte[] buf = new byte[Integer.BYTES + facetSets[0].sizePackedBytes() * facetSets.length];
+    IntPoint.encodeDimension(numDims, buf, 0);
+    int offset = Integer.BYTES;
+    for (FacetSet facetSet : facetSets) {
+      if (facetSet.dims != numDims) {
+        throw new IllegalArgumentException(
+            "All FacetSets must have the same number of dimensions. Expected "
+                + numDims
+                + " found "
+                + facetSet.dims);
+      }
+      // It doesn't make sense to index facet sets of different types in the same field
+      if (facetSet.getClass() != expectedClass) {

Review Comment:
   I'm not sure what we would generify. E.g. you and I explored `FacetSet` 
before, but it complicates things and I'm not sure it would work with e.g. the 
`TemperatureReadingFacetSet` (and the like), which mix several dimension types. 
Another thing: I don't want to over-complicate the API for something that is, 
at the end of the day, just extra safety. I can't see why someone would try to 
index two different `FacetSet` types in the same field and expect it to work.
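
One practical wrinkle with the generics idea, sketched here under assumptions: with a generic varargs factory, the compiler is free to infer the type parameter as the common supertype, so a call that mixes subclasses still compiles and a runtime check is still needed. The classes and method in this sketch (`Base`, `A`, `B`, `create`) are hypothetical stand-ins, not the PR's `FacetSet` hierarchy.

```java
public class GenericVarargsDemo {

  static class Base {}

  static class A extends Base {}

  static class B extends Base {}

  /** Generic factory: T is inferred per call site, possibly as the common supertype Base. */
  @SafeVarargs
  static <T extends Base> int create(T... sets) {
    Class<?> expected = sets[0].getClass();
    for (T set : sets) {
      // Runtime check in the spirit of the one under review.
      if (set.getClass() != expected) {
        throw new IllegalArgumentException("All sets must be of the same type");
      }
    }
    return sets.length;
  }

  public static void main(String[] args) {
    System.out.println(create(new A(), new A())); // homogeneous call
    try {
      create(new A(), new B()); // still compiles: T is inferred as Base
    } catch (IllegalArgumentException e) {
      System.out.println("rejected at runtime: " + e.getMessage());
    }
  }
}
```

A stricter compile-time guarantee would need something like a field class parameterized by the set type, which is the kind of API complication the comment above argues against.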






[GitHub] [lucene] shaie commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities

2022-06-19 Thread GitBox


shaie commented on code in PR #841:
URL: https://github.com/apache/lucene/pull/841#discussion_r901239793


##
lucene/facet/src/java/org/apache/lucene/facet/facetset/RangeFacetSetMatcher.java:
##
@@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.facetset;
+
+import java.util.Arrays;
+import org.apache.lucene.util.NumericUtils;
+
+/**
+ * A {@link FacetSetMatcher} which considers a set as a match if all dimensions fall within the
+ * given corresponding range.
+ *
+ * @lucene.experimental
+ */
+public class RangeFacetSetMatcher extends FacetSetMatcher {
+
+  private final long[] lowerRanges;
+  private final long[] upperRanges;
+
+  /**
+   * Constructs an instance to match facet sets with dimensions that fall within the given ranges.
+   */
+  public RangeFacetSetMatcher(String label, DimRange... dimRanges) {
+    super(label, getDims(dimRanges));
+    this.lowerRanges = Arrays.stream(dimRanges).mapToLong(range -> range.min).toArray();
+    this.upperRanges = Arrays.stream(dimRanges).mapToLong(range -> range.max).toArray();
+  }
+
+  @Override
+  public boolean matches(long[] dimValues) {
+    assert dimValues.length == dims
+        : "Encoded dimensions (dims="
+            + dimValues.length
+            + ") is incompatible with range dimensions (dims="
+            + dims
+            + ")";
+
+    for (int i = 0; i < dimValues.length; i++) {
+      if (dimValues[i] < lowerRanges[i]) {
+        // Doc's value is too low in this dimension
+        return false;
+      }
+      if (dimValues[i] > upperRanges[i]) {
+        // Doc's value is too high in this dimension
+        return false;
+      }
+    }
+    return true;
+  }
+
+  private static int getDims(DimRange... dimRanges) {
+    if (dimRanges == null || dimRanges.length == 0) {
+      throw new IllegalArgumentException("dimRanges cannot be null or empty");
+    }
+    return dimRanges.length;
+  }
+
+  /**
+   * Creates a {@link DimRange} for the given min and max long values. This method is also suitable
+   * for int values.
+   */
+  public static DimRange fromLongs(long min, boolean minInclusive, long max, boolean maxInclusive) {

Review Comment:
   Yeah makes sense to me too! The only bummer is that it makes lines such as 
`RangeFacetSetMatcher.fromLongs` become 
`RangeFacetSetMatcher.DimRange.fromLongs`. Should we extract `DimRange` as a 
top-level class? I'm not too obsessed about it though.
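
For reference, the shape being discussed could look roughly like this if `DimRange` owned the factory method. This is a sketch under assumptions, not the PR's code: the quoted diff stops at the signature of `fromLongs`, so the handling of exclusive bounds (shifting by one and rejecting bounds that would overflow) is a guess at a reasonable behavior, and `matches` is a simplified stand-in for `RangeFacetSetMatcher.matches`.

```java
public class DimRangeSketch {

  /** Inclusive range over one dimension. */
  static final class DimRange {
    final long min;
    final long max;

    DimRange(long min, long max) {
      if (min > max) {
        throw new IllegalArgumentException("min (" + min + ") is greater than max (" + max + ")");
      }
      this.min = min;
      this.max = max;
    }

    /** Normalizes exclusive bounds to inclusive ones (assumed behavior). */
    static DimRange fromLongs(long min, boolean minInclusive, long max, boolean maxInclusive) {
      if (!minInclusive) {
        if (min == Long.MAX_VALUE) {
          throw new IllegalArgumentException("Invalid min input: " + min);
        }
        min++;
      }
      if (!maxInclusive) {
        if (max == Long.MIN_VALUE) {
          throw new IllegalArgumentException("Invalid max input: " + max);
        }
        max--;
      }
      return new DimRange(min, max);
    }
  }

  /** True if every dimension value falls inside its corresponding range. */
  static boolean matches(long[] dimValues, DimRange... ranges) {
    if (dimValues.length != ranges.length) {
      throw new IllegalArgumentException("dimension count mismatch");
    }
    for (int i = 0; i < dimValues.length; i++) {
      if (dimValues[i] < ranges[i].min || dimValues[i] > ranges[i].max) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    DimRange year = DimRange.fromLongs(2010, true, 2020, false); // covers 2010..2019
    DimRange score = DimRange.fromLongs(0, false, 100, true); // covers 1..100
    System.out.println(matches(new long[] {2019, 1}, year, score));
    System.out.println(matches(new long[] {2020, 50}, year, score));
  }
}
```

With the factory on `DimRange` itself, call sites read `DimRange.fromLongs(...)` regardless of whether the class is nested or top-level, at the cost of one more public type if it is extracted.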



##
lucene/facet/src/java/org/apache/lucene/facet/facetset/package-info.java:
##
@@ -0,0 +1,19 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/** Provides FacetSets faceting capabilities. */

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-19 Thread Tomoko Uchida (Jira)


[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556188#comment-17556188 ]

Tomoko Uchida commented on LUCENE-10557:


For version metadata, there are two considerations.

1. Fix Version(s)

We have two options: Milestone or Label. One important difference between them 
is that an issue can have only one milestone but multiple labels. The other 
difference would be that while Milestone is special metadata, labels are just 
flexible text tags for searching. I'm personally fine with Milestone - we don't 
release a bug fix or improvement in multiple versions anyway. We don't have two 
CHANGES entries for one issue; if we resolve an issue in "10.0.0" and "9.3.0", 
the CHANGES entry appears only in Lucene 9.3.0's section. 
If there are other perspectives, please share your thoughts.

2. Affects Version(s)

45% of unresolved issues have this field. Maybe we could have issue labels such 
as "affectsVersion:9.3.0". I have never used this metadata field and I myself 
have no problem with omitting this in GitHub. Is there anyone who has thoughts 
on it?

-- 
Aside from versions, I'm not fully sure about how to port the "Priority" field 
(Blocker, Critical, Major, Minor, Trivial). It's a mandatory field in Jira but 
there seem to be no clear standards on how to set a priority except for "Blocker". 
Should we have this in GitHub as a mandatory label, as an optional one, or 
could we omit it entirely if developers/committers don't really pay attention 
to it?
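
As a rough illustration of the options above, the following sketch maps Jira version fields to GitHub metadata. Everything in it is an assumption rather than an agreed convention: the class and method names are hypothetical, the milestone is taken to be the lowest Fix Version (mirroring where the CHANGES entry lands), and the `priority:` label format is made up for the example. Only the `affectsVersion:9.3.0` label format comes from the comment itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

final class VersionMetadataSketch {

  /** Compares dotted numeric versions such as "9.3.0" and "10.0.0" component by component. */
  static int compareVersions(String a, String b) {
    String[] x = a.split("\\.");
    String[] y = b.split("\\.");
    for (int i = 0; i < Math.max(x.length, y.length); i++) {
      int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
      int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
      if (xi != yi) {
        return Integer.compare(xi, yi);
      }
    }
    return 0;
  }

  /** An issue can carry only one milestone, so pick the lowest Fix Version (assumption). */
  static Optional<String> milestoneFor(List<String> fixVersions) {
    return fixVersions.stream().min(VersionMetadataSketch::compareVersions);
  }

  /** Labels are free-form and unlimited, so each Affects Version becomes its own label. */
  static List<String> labelsFor(List<String> affectsVersions, String priority) {
    List<String> labels = new ArrayList<>();
    for (String version : affectsVersions) {
      labels.add("affectsVersion:" + version);
    }
    if (priority != null && !priority.isEmpty()) {
      // Optional priority label; the "priority:" prefix is hypothetical.
      labels.add("priority:" + priority.toLowerCase(Locale.ROOT));
    }
    return labels;
  }

  public static void main(String[] args) {
    System.out.println(milestoneFor(List.of("10.0.0", "9.3.0")));
    System.out.println(labelsFor(List.of("9.3.0"), "Major"));
  }
}
```

Passing `null` as the priority simply omits the label, which corresponds to the "optional label" option.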

 

> Migrate to GitHub issue from Jira
> -
>
> Key: LUCENE-10557
> URL: https://issues.apache.org/jira/browse/LUCENE-10557
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
>
> A few (not the majority) Apache projects already use the GitHub issue instead 
> of Jira. For example,
> Airflow: [https://github.com/apache/airflow/issues]
> BookKeeper: [https://github.com/apache/bookkeeper/issues]
> So I think it'd be technically possible that we move to GitHub issue. I have 
> little knowledge of how to proceed with it, I'd like to discuss whether we 
> should migrate to it, and if so, how to smoothly handle the migration.
> The major tasks would be:
>  * (/) Get a consensus about the migration among committers
>  * Choose issues that should be moved to GitHub
>  ** Discussion thread 
> [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12]
>  ** -Conclusion for now: We don't migrate any issues. Only new issues should 
> be opened on GitHub.-
>  ** Write a prototype migration script - the decision could be made on that. 
> Things to consider:
>  *** version numbers - labels or milestones?
>  *** add a comment/ prepend a link to the source Jira issue on github side,
>  *** add a comment/ prepend a link on the jira side to the new issue on 
> github side (for people who access jira from blogs, mailing list archives and 
> other sources that will have stale links),
>  *** convert cross-issue automatic links in comments/ descriptions (as 
> suggested by Robert),
>  *** strategy to deal with sub-issues (hierarchies),
>  *** maybe prefix (or postfix) the issue title on github side with the 
> original LUCENE-XYZ key so that it is easier to search for a particular issue 
> there?
>  *** how to deal with user IDs (author, reporter, commenters)? Do they have 
> to be github users? Will information about people not registered on github be 
> lost?
>  *** create an extra mapping file of old-issue-new-issue URLs for any 
> potential future uses. 
>  *** what to do with issue numbers in git/svn commits? These could be 
> rewritten but it'd change the entire git history tree - I don't think this is 
> practical, while doable.
>  * Build the convention for issue label/milestone management
>  ** Do some experiments on a sandbox repository 
> [https://github.com/mocobeta/sandbox-lucene-10557]
>  ** Make documentation for metadata (label/milestone) management 
>  * Enable Github issue on the lucene's repository
>  ** Raise an issue on INFRA
>  ** (Create an issue-only private repository for sensitive issues if it's 
> needed and allowed)
>  ** Set a mail hook to 
> [issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to 
> the general mail group name)
>  * Set a schedule for migration
>  ** Give some time to committers to play around with issues/labels/milestones 
> before the actual migration
>  ** Make an announcement on the mail lists
>  ** Show some text messages when opening a new Jira issue (in issue template?)
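
Two of the "things to consider" above, prefixing the title with the original key and converting cross-issue references, could be prototyped roughly as follows. This is only a sketch under stated assumptions: the class name, the `KEY: title` format, the `#N (LUCENE-N)` replacement format, and the issue numbers in the mapping are all hypothetical, and the mapping stands in for the old-issue-to-new-issue file mentioned in the list.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IssueLinkSketch {
  // Matches Jira keys such as LUCENE-10557 on word boundaries.
  private static final Pattern JIRA_KEY = Pattern.compile("\\bLUCENE-(\\d+)\\b");

  /** Prefixes the GitHub title with the Jira key so the issue stays searchable by key. */
  static String prefixTitle(String jiraKey, String title) {
    return jiraKey + ": " + title;
  }

  /**
   * Rewrites Jira keys found in the mapping to GitHub issue references and keeps the
   * original key alongside. Keys missing from the mapping are left untouched.
   */
  static String convertLinks(String text, Map<String, Integer> jiraToGithub) {
    Matcher matcher = JIRA_KEY.matcher(text);
    StringBuilder out = new StringBuilder();
    int last = 0;
    while (matcher.find()) {
      out.append(text, last, matcher.start());
      String key = matcher.group();
      Integer number = jiraToGithub.get(key);
      out.append(number == null ? key : "#" + number + " (" + key + ")");
      last = matcher.end();
    }
    out.append(text, last, text.length());
    return out.toString();
  }

  public static void main(String[] args) {
    // The GitHub issue number 42 is made up for illustration.
    Map<String, Integer> mapping = Map.of("LUCENE-10557", 42);
    System.out.println(prefixTitle("LUCENE-10557", "Migrate to GitHub issue from Jira"));
    System.out.println(convertLinks("See LUCENE-10557 and LUCENE-1.", mapping));
  }
}
```

Keeping the original key next to the new reference means text copied from blogs or mail archives can still be matched against the migrated issue.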




[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira

2022-06-19 Thread Tomoko Uchida (Jira)


[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556188#comment-17556188 ]

Tomoko Uchida edited comment on LUCENE-10557 at 6/20/22 5:11 AM:
-

For version metadata, there are two considerations.

1. Fix Version(s)

We have two options: Milestone or Label. One important difference between them 
is that an issue can have only one milestone but multiple labels. The other 
difference would be that while Milestone is special metadata, labels are just 
flexible text tags for searching. I'm personally fine with Milestone - we don't 
release a bug fix or improvement in multiple versions anyway. We don't have two 
CHANGES entries for one issue; if we resolve an issue in "10.0.0" and "9.3.0", 
the CHANGES entry appears only in Lucene 9.3.0's section. 
If there are other perspectives, please share your thoughts.

2. Affects Version(s)

35% of unresolved issues have this field. Maybe we could have issue labels such 
as "affectsVersion:9.3.0". I have never used this metadata field and I myself 
have no problem with omitting this in GitHub. Is there anyone who has thoughts 
on it?

-- 
Aside from versions, I'm not fully sure about how to port the "Priority" field 
(Blocker, Critical, Major, Minor, Trivial). It's a mandatory field in Jira but 
there seem to be no clear standards on how to set a priority except for "Blocker". 
Should we have this in GitHub as a mandatory label, as an optional one, or 
could we omit it entirely if developers/committers don't really pay attention 
to it?

 


was (Author: tomoko uchida):
For version metadata, there are two considerations.

1. Fix Version(s)

We have two options: Milestone or Label. One important difference between them 
is that an issue can have only one milestone but multiple labels. The other 
difference would be that while Milestone is special metadata, labels are just 
flexible text tags for searching. I'm personally fine with Milestone - we don't 
release a bug fix or improvement in multiple versions anyway. We don't have two 
CHANGES entries for one issue; if we resolve an issue in "10.0.0" and "9.3.0", 
the CHANGES entry appears only in Lucene 9.3.0's section. 
If there are other perspectives, please share your thoughts.

2. Affects Version(s)

45% of unresolved issues have this field. Maybe we could have issue labels such 
as "affectsVersion:9.3.0". I have never used this metadata field and I myself 
have no problem with omitting this in GitHub. Is there anyone who has thoughts 
on it?

-- 
Aside from versions, I'm not fully sure about how to port the "Priority" field 
(Blocker, Critical, Major, Minor, Trivial). It's a mandatory field in Jira but 
there seem to be no clear standards on how to set a priority except for "Blocker". 
Should we have this in GitHub as a mandatory label, as an optional one, or 
could we omit it entirely if developers/committers don't really pay attention 
to it?

 

> Migrate to GitHub issue from Jira
> -
>
> Key: LUCENE-10557
> URL: https://issues.apache.org/jira/browse/LUCENE-10557
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
>
> A few (not the majority) Apache projects already use the GitHub issue instead 
> of Jira. For example,
> Airflow: [https://github.com/apache/airflow/issues]
> BookKeeper: [https://github.com/apache/bookkeeper/issues]
> So I think it'd be technically possible that we move to GitHub issue. I have 
> little knowledge of how to proceed with it, I'd like to discuss whether we 
> should migrate to it, and if so, how to smoothly handle the migration.
> The major tasks would be:
>  * (/) Get a consensus about the migration among committers
>  * Choose issues that should be moved to GitHub
>  ** Discussion thread 
> [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12]
>  ** -Conclusion for now: We don't migrate any issues. Only new issues should 
> be opened on GitHub.-
>  ** Write a prototype migration script - the decision could be made on that. 
> Things to consider:
>  *** version numbers - labels or milestones?
>  *** add a comment/ prepend a link to the source Jira issue on github side,
>  *** add a comment/ prepend a link on the jira side to the new issue on 
> github side (for people who access jira from blogs, mailing list archives and 
> other sources that will have stale links),
>  *** convert cross-issue automatic links in comments/ descriptions (as 
> suggested by Robert),
>  *** strategy to deal with sub-issues (hierarchies),
>  *** maybe prefix (or postfix) the issue title on github side with the 
> original LUCENE-XYZ key so that it is easier to search for a particular