[jira] [Resolved] (LUCENE-10555) avoid repeated NumericLeafComparator setScorer calls

2022-05-11 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-10555.
---
Fix Version/s: 9.2
   Resolution: Fixed

> avoid repeated NumericLeafComparator setScorer calls
> 
>
> Key: LUCENE-10555
> URL: https://issues.apache.org/jira/browse/LUCENE-10555
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: jianping weng
>Priority: Major
> Fix For: 9.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Elasticsearch uses CancellableBulkScorer to cancel long-running queries 
> quickly by dividing the docs of one segment into many small splits. For every 
> split, collector.setScorer(scorer) is called, which in turn calls 
> NumericLeafComparator#setScorer. As a result, NumericLeafComparator#setScorer 
> is called many times for a single segment. 
> Every time NumericLeafComparator#setScorer is called, 
> NumericLeafComparator#iteratorCost is reset to Scorer.cost, which causes 
> many unnecessary pointValues#intersect calls to find competitive docs.
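
The effect described in the issue can be illustrated with a minimal sketch. It assumes a heavily simplified comparator: `SketchLeafComparator`, its methods, and the `-1` sentinel are hypothetical names chosen for illustration, not the actual Lucene classes or the patch that was merged.

```java
/** Hypothetical, simplified stand-in for a per-segment comparator (not Lucene code). */
final class SketchLeafComparator {
  private long iteratorCost = -1; // -1 means "not initialised for this segment yet"
  private int costInitialisations = 0; // how many times the cost was taken from the scorer

  /** Called once per split by a cancellable bulk scorer, so possibly many times per segment. */
  void setScorer(long scorerCost) {
    if (iteratorCost == -1) {
      // Only the first call for a segment takes the scorer cost; later calls keep
      // the value that competitive-iterator updates may already have lowered.
      iteratorCost = scorerCost;
      costInitialisations++;
    }
  }

  /** Simulates a competitive-iterator update that found a cheaper iterator. */
  void updateCompetitiveIterator(long newCost) {
    if (newCost < iteratorCost) {
      iteratorCost = newCost;
    }
  }

  long iteratorCost() {
    return iteratorCost;
  }

  int costInitialisations() {
    return costInitialisations;
  }
}
```

Without the guard, every `setScorer` call would push the cost back up to the scorer cost and the cheaper iterator would have to be rediscovered for each split.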



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] wjp719 commented on a diff in pull request #780: LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction

2022-05-11 Thread GitBox


wjp719 commented on code in PR #780:
URL: https://github.com/apache/lucene/pull/780#discussion_r869953128


##
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java:
##
@@ -269,11 +276,23 @@ public PointValues.Relation compare(byte[] minPackedValue, byte[] maxPackedValue
           if (estimatedNumberOfMatches >= threshold) {
             // the new range is not selective enough to be worth materializing, it doesn't reduce number
             // of docs at least 8x
+            if (updateCounter > 256) {
+              if (tryUpdateFailCount >= 3) {
+                currentSkipInterval = Math.min(currentSkipInterval * 2, MAX_SKIP_INTERVAL);
+                tryUpdateFailCount = 0;
+              } else {
+                tryUpdateFailCount++;
+              }
+            }

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[GitHub] [lucene] wjp719 commented on pull request #780: LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction

2022-05-11 Thread GitBox


wjp719 commented on PR #780:
URL: https://github.com/apache/lucene/pull/780#issuecomment-1123279942

   > I'm curious about `tryUpdateFailCount`, did you get better results on the 
benchmark with it than without it?
   
   @jpountz yes, with `tryUpdateFailCount`, the 
`asc_sort_with_after_timestamp` case performs better than without it. 
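
For readers following the thread, the back-off that `tryUpdateFailCount` implements in the diff can be sketched in isolation. The control flow mirrors the hunk quoted earlier in the review; the class name `SkipIntervalBackoff`, the method name, the starting interval of 1 and the cap of 32 are assumptions made for this sketch only.

```java
/** Isolated sketch of the back-off from the review diff; constants are assumed values. */
final class SkipIntervalBackoff {
  static final int MAX_SKIP_INTERVAL = 32; // assumed cap, not taken from the PR

  private int currentSkipInterval = 1; // assumed starting interval
  private int tryUpdateFailCount = 0;

  /** Called when an attempted competitive-iterator update was not selective enough. */
  void onUnselectiveUpdate(int updateCounter) {
    if (updateCounter > 256) {
      if (tryUpdateFailCount >= 3) {
        // Several failures in a row: try less often from now on.
        currentSkipInterval = Math.min(currentSkipInterval * 2, MAX_SKIP_INTERVAL);
        tryUpdateFailCount = 0;
      } else {
        tryUpdateFailCount++;
      }
    }
  }

  int currentSkipInterval() {
    return currentSkipInterval;
  }
}
```

With these assumed constants, every fourth failed attempt past the 256-update mark doubles the interval until the cap is reached.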





[GitHub] [lucene] jpountz commented on a diff in pull request #875: LUCENE-10560: Speed up OrdinalMap construction a bit.

2022-05-11 Thread GitBox


jpountz commented on code in PR #875:
URL: https://github.com/apache/lucene/pull/875#discussion_r870001129


##
lucene/core/src/java/org/apache/lucene/index/OrdinalMap.java:
##
@@ -48,10 +49,69 @@ public class OrdinalMap implements Accountable {
   // need it
   // TODO: use more efficient packed ints structures?
 
+  /**
+   * Copy the first 8 bytes of the given term as a comparable unsigned long. In case the term has
+   * less than 8 bytes, missing bytes will be replaced with zeroes. Note that two terms that produce
+   * the same long could still be different due to the fact that missing bytes are replaced with
+   * zeroes, e.g. {@code [1, 0]} and {@code [1]} get mapped to the same long.
+   */
+  static long prefix8ToComparableUnsignedLong(BytesRef term) {
+    // Use Big Endian so that longs are comparable
+    if (term.length >= Long.BYTES) {
+      return (long) BitUtil.VH_BE_LONG.get(term.bytes, term.offset);
+    } else {
+      long l;
+      int offset;
+      if (term.length >= Integer.BYTES) {
+        l = (int) BitUtil.VH_BE_INT.get(term.bytes, term.offset);
+        offset = Integer.BYTES;
+      } else {
+        l = 0;
+        offset = 0;
+      }
+      while (offset < term.length) {
+        l = (l << 8) | Byte.toUnsignedLong(term.bytes[term.offset + offset]);
+        offset++;
+      }
+      l <<= (Long.BYTES - term.length) << 3;
+      return l;
+    }
+  }
+
+  private static int compare(BytesRef termA, long prefix8A, BytesRef termB, long prefix8B) {
+    assert prefix8A == prefix8ToComparableUnsignedLong(termA);

Review Comment:
   The main improvement I can think of would consist of looking up the first 
and last values of the segment to check if all values share a common prefix, 
e.g. the IPv4-mapped IPv6 addresses case. Maybe in the future we could split 
the value space into smaller blocks, or something similar, which would help us 
still handle well cases when many values share a common prefix but not all, 
e.g. a dataset of URLs where many values have the `https://www.` prefix, but 
not all, or a dataset that mixes lots of IPv4-mapped IPv6 addresses with 
regular IPv6 addresses.
   
   Maybe the API could tell us about the min and max term lengths, so that we 
could optimize the fixed-length case (e.g. geonames IDs) in the future a bit.
   
   I don't have many ideas beyond these ones. I tried to review existing 
literature for binary search and sorting string[] keys, which have 
commonalities with what we're doing here since there's a value that's 
potentially going to be compared with several other values, and it looks like 
the main idea consists of identifying shared prefixes so that these bytes 
wouldn't have to be compared over and over again. Maybe something we can try 
out next.
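
The first idea in this comment (checking the first and last values for a shared prefix) can be sketched as follows. It relies on the fact that, in a set of terms sorted in unsigned byte order, every term shares the common prefix of the smallest and largest term. The class `SharedPrefixSketch` and its helpers are hypothetical and operate on plain `byte[]` rather than Lucene's `BytesRef`.

```java
import java.util.Arrays;

/** Hypothetical sketch: skip a prefix shared by all terms of a sorted segment. */
final class SharedPrefixSketch {

  /** Length of the longest common prefix of the smallest and largest term. */
  static int commonPrefixLength(byte[] first, byte[] last) {
    int mismatch = Arrays.mismatch(first, last);
    return mismatch == -1 ? first.length : mismatch; // -1 means the arrays are equal
  }

  /** Compares two terms in unsigned byte order, skipping a prefix known to be shared. */
  static int compareSkippingPrefix(byte[] a, byte[] b, int sharedPrefix) {
    return Arrays.compareUnsigned(a, sharedPrefix, a.length, b, sharedPrefix, b.length);
  }

  /** Builds a 16-byte IPv4-mapped IPv6 address (::ffff:a.b.c.d) as test data. */
  static byte[] ipv4Mapped(int a, int b, int c, int d) {
    byte[] addr = new byte[16];
    addr[10] = (byte) 0xff;
    addr[11] = (byte) 0xff;
    addr[12] = (byte) a;
    addr[13] = (byte) b;
    addr[14] = (byte) c;
    addr[15] = (byte) d;
    return addr;
  }
}
```

For a segment that only contains IPv4-mapped addresses, the shared prefix is 12 bytes long, so each comparison only needs to look at the last 4 bytes.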






[GitHub] [lucene] jpountz commented on a diff in pull request #875: LUCENE-10560: Speed up OrdinalMap construction a bit.

2022-05-11 Thread GitBox


jpountz commented on code in PR #875:
URL: https://github.com/apache/lucene/pull/875#discussion_r870010670


##
lucene/core/src/java/org/apache/lucene/index/OrdinalMap.java:
##
@@ -48,10 +49,69 @@ public class OrdinalMap implements Accountable {
   // need it
   // TODO: use more efficient packed ints structures?
 
+  /**
+   * Copy the first 8 bytes of the given term as a comparable unsigned long. In case the term has
+   * less than 8 bytes, missing bytes will be replaced with zeroes. Note that two terms that produce
+   * the same long could still be different due to the fact that missing bytes are replaced with
+   * zeroes, e.g. {@code [1, 0]} and {@code [1]} get mapped to the same long.
+   */
+  static long prefix8ToComparableUnsignedLong(BytesRef term) {
+    // Use Big Endian so that longs are comparable
+    if (term.length >= Long.BYTES) {
+      return (long) BitUtil.VH_BE_LONG.get(term.bytes, term.offset);
+    } else {
+      long l;
+      int offset;
+      if (term.length >= Integer.BYTES) {
+        l = (int) BitUtil.VH_BE_INT.get(term.bytes, term.offset);
+        offset = Integer.BYTES;
+      } else {
+        l = 0;
+        offset = 0;
+      }
+      while (offset < term.length) {
+        l = (l << 8) | Byte.toUnsignedLong(term.bytes[term.offset + offset]);
+        offset++;
+      }
+      l <<= (Long.BYTES - term.length) << 3;
+      return l;
+    }
+  }
+
+  private static int compare(BytesRef termA, long prefix8A, BytesRef termB, long prefix8B) {
+    assert prefix8A == prefix8ToComparableUnsignedLong(termA);

Review Comment:
   I added a TODO to look into this.
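
For context, the prefix comparison from the hunk above can be illustrated with a self-contained sketch. It uses plain `byte[]` and a simple loop instead of `BytesRef` and the `BitUtil` var handles, and the fallback in `compare` is an assumption about how equal prefixes would be resolved, not the code from the PR.

```java
import java.util.Arrays;

/** Hypothetical, simplified version of the 8-byte prefix comparison. */
final class Prefix8Sketch {

  /** Packs up to the first 8 bytes big-endian, padding missing bytes with zeroes. */
  static long prefix8(byte[] term) {
    int n = Math.min(term.length, Long.BYTES);
    long l = 0;
    for (int i = 0; i < n; i++) {
      l = (l << 8) | (term[i] & 0xFFL);
    }
    // For n == 0 the shift distance is 64, which Java reduces to 0; l is 0 anyway.
    return l << ((Long.BYTES - n) << 3);
  }

  /** Compares prefixes first and only falls back to the full terms on a tie. */
  static int compare(byte[] a, byte[] b) {
    int cmp = Long.compareUnsigned(prefix8(a), prefix8(b));
    if (cmp != 0) {
      return cmp;
    }
    // Equal prefixes are inconclusive, e.g. [1] and [1, 0] map to the same long.
    return Arrays.compareUnsigned(a, b);
  }
}
```

The unsigned comparison matters: a leading byte of `0x80` must sort after `0x7f`, which a signed `long` comparison would get wrong.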






[GitHub] [lucene] mocobeta commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW

2022-05-11 Thread GitBox


mocobeta commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r870023604


##
lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java:
##
@@ -377,10 +377,13 @@ private static class FieldEntry {
   for (int level = 0; level < numLevels; level++) {
 if (level == 0) {
   graphOffsetsByLevel[level] = 0;
+} else if (level == 1) {
+  int numNodesOn0Level = size;

Review Comment:
   minor: `numNodesOnLevel0` might be clearer at first glance (and 
consistent with other parts)?






[GitHub] [lucene] mocobeta commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW

2022-05-11 Thread GitBox


mocobeta commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r870030770


##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##
@@ -43,7 +43,8 @@ public final class HnswGraphBuilder {
   /** Random seed for level generation; public to expose for testing * */
   public static long randSeed = DEFAULT_RAND_SEED;
 
-  private final int maxConn;
+  private final int M; // max number of connections on upper layers
+  private final int maxConn0; // max number of connections on the 0th (last) layer

Review Comment:
   I'd agree with this.
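
The relationship between the two fields in the hunk above, as stated in the PR title (2*maxConn on the last layer), can be sketched like this. `HnswConnSketch` and `maxConnOnLevel` are hypothetical names; only `M` and `maxConn0` come from the diff.

```java
/** Hypothetical sketch of per-level connection limits: 2 * M on level 0, M above it. */
final class HnswConnSketch {
  private final int M; // max number of connections on upper layers
  private final int maxConn0; // max number of connections on the 0th (last) layer

  HnswConnSketch(int M) {
    if (M <= 0) {
      throw new IllegalArgumentException("M must be positive, got " + M);
    }
    this.M = M;
    this.maxConn0 = 2 * M;
  }

  /** Returns the connection limit that applies to nodes on the given level. */
  int maxConnOnLevel(int level) {
    if (level < 0) {
      throw new IllegalArgumentException("level must be non-negative, got " + level);
    }
    return level == 0 ? maxConn0 : M;
  }
}
```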






[GitHub] [lucene] mocobeta commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW

2022-05-11 Thread GitBox


mocobeta commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r870035060


##
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##
@@ -31,6 +31,7 @@
 public final class OnHeapHnswGraph extends HnswGraph {
 
   private final int maxConn;
+  private final int maxConn0;

Review Comment:
   I guess we could have just `M` here too?






[GitHub] [lucene] jpountz merged pull request #780: LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction

2022-05-11 Thread GitBox


jpountz merged PR #780:
URL: https://github.com/apache/lucene/pull/780





[jira] [Commented] (LUCENE-10496) avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction

2022-05-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534783#comment-17534783
 ] 

ASF subversion and git services commented on LUCENE-10496:
--

Commit e49708e01da38c2f3d8ef8ac7e7c9198e26bf867 in lucene's branch 
refs/heads/main from xiaoping
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=e49708e01da ]

LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort 
and search sort are in opposite direction (#780)



> avoid unnecessary attempts to evaluate skipping doc if index sort and search 
> sort are in opposite direction
> ---
>
> Key: LUCENE-10496
> URL: https://issues.apache.org/jira/browse/LUCENE-10496
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: jianping weng
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Users often write docs with index sorting in one direction (asc or desc), but 
> need to search top docs in both directions (asc and desc).
> If the index sort and the search sort are in opposite directions, 
> *NumericLeafComparator* doesn't need to check whether it can skip 
> non-competitive docs inside a segment, because the remaining docs are all competitive. 
>  
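
The reasoning in the description can be checked with a small simulation. It is a sketch under simplifying assumptions (distinct values, a single segment, a descending search sort); `OppositeSortSketch` and `countCompetitive` are hypothetical names and do not correspond to Lucene classes.

```java
import java.util.PriorityQueue;

/** Hypothetical simulation: how many docs are competitive for a descending top-k search. */
final class OppositeSortSketch {

  /** Visits values in doc order and counts the docs that enter the top-k when visited. */
  static int countCompetitive(long[] valuesInDocOrder, int k) {
    // Min-heap: the head is the weakest entry of the current top-k.
    PriorityQueue<Long> topK = new PriorityQueue<>();
    int competitive = 0;
    for (long v : valuesInDocOrder) {
      if (topK.size() < k) {
        topK.add(v);
        competitive++;
      } else if (v > topK.peek()) {
        topK.poll();
        topK.add(v);
        competitive++;
      }
    }
    return competitive;
  }
}
```

When the segment is sorted ascending and the search asks for the largest values, every doc beats the current weakest entry, so there is nothing to skip. When both sorts agree, only the first `k` docs are competitive.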






[jira] [Resolved] (LUCENE-10496) avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction

2022-05-11 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-10496.
---
Fix Version/s: 9.2
   Resolution: Fixed







[jira] [Commented] (LUCENE-10496) avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction

2022-05-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534787#comment-17534787
 ] 

ASF subversion and git services commented on LUCENE-10496:
--

Commit 6a973cfa269b42f4a77f41a70bdab387bfa37bf9 in lucene's branch 
refs/heads/branch_9x from xiaoping
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=6a973cfa269 ]

LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort 
and search sort are in opposite direction (#780)









[jira] [Commented] (LUCENE-10496) avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction

2022-05-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534786#comment-17534786
 ] 

ASF subversion and git services commented on LUCENE-10496:
--

Commit 54595611aefb513f3f47a48a38caf70a4dddc701 in lucene's branch 
refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=54595611aef ]

LUCENE-10496: CHANGES entry.








[jira] [Commented] (LUCENE-10496) avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction

2022-05-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534788#comment-17534788
 ] 

ASF subversion and git services commented on LUCENE-10496:
--

Commit 3b36d85966ebb399d2130cb66cb8b40a72440f85 in lucene's branch 
refs/heads/branch_9x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=3b36d85966e ]

LUCENE-10496: CHANGES entry.








[GitHub] [lucene] jpountz commented on pull request #876: LUCENE-9356: Change test to detect mismatched checksums instead of byte flips.

2022-05-11 Thread GitBox


jpountz commented on PR #876:
URL: https://github.com/apache/lucene/pull/876#issuecomment-1123706148

   I removed the dependency on LineFileDocs. Interestingly, this test caught an 
issue with vectors, which don't close index inputs on all paths. cc 
@mayya-sharipova since there are in-progress PRs that make changes to vectors 
formats





[GitHub] [lucene] jpountz commented on pull request #873: LUCENE-10397: KnnVectorQuery doesn't tie break by doc ID

2022-05-11 Thread GitBox


jpountz commented on PR #873:
URL: https://github.com/apache/lucene/pull/873#issuecomment-1123721194

   Is it possible to somehow encode longs differently in the reverse case, so 
that we don't have to customize the comparison function?
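
One possible reading of this suggestion, sketched under stated assumptions: pack the score and the doc ID into a single `long` so that a plain `Long.compare` orders entries, and flip the score bits when the order is reversed. The class `ScoreDocEncodingSketch` and its methods are hypothetical, doc IDs are assumed to be non-negative, and this is not the encoding used by the PR.

```java
/** Hypothetical encoding of (score, doc) into a long that compares with Long.compare. */
final class ScoreDocEncodingSketch {

  /** Maps a float to an int whose signed order matches the float order. */
  static int floatToSortableInt(float f) {
    int bits = Float.floatToIntBits(f);
    return bits ^ ((bits >> 31) & 0x7fffffff);
  }

  /**
   * Larger result means "better". In normal order a higher score is better, in
   * reversed order a lower score is better. Ties go to the smaller doc ID in both cases.
   */
  static long encode(float score, int doc, boolean reversed) {
    int sortableScore = floatToSortableInt(score);
    if (reversed) {
      sortableScore = ~sortableScore; // bitwise complement reverses signed int order
    }
    // The complemented doc ID in the low 32 bits makes smaller doc IDs compare larger.
    return ((long) sortableScore << 32) | (~doc & 0xFFFFFFFFL);
  }
}
```

With such an encoding the reversed case only changes how values are written, and the queue can keep using the natural ordering of longs.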





[jira] [Commented] (LUCENE-9409) TestAllFilesDetectTruncation failures

2022-05-11 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534878#comment-17534878
 ] 

Robert Muir commented on LUCENE-9409:
-

the test also doesn't account for the case that you might truncate and happen 
to have CODEC_MAGIC bytes at the right place...

> TestAllFilesDetectTruncation failures
> -
>
> Key: LUCENE-9409
> URL: https://issues.apache.org/jira/browse/LUCENE-9409
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The Elastic CI found a seed that reproducibly fails 
> TestAllFilesDetectTruncation.
> https://elasticsearch-ci.elastic.co/job/apache+lucene-solr+nightly+branch_8x/85/console
> This is a consequence of LUCENE-9396: we now check for truncation after 
> creating slices, so in some cases you would get an IndexOutOfBoundsException 
> rather than CorruptIndexException/EOFException if out-of-bounds slices get 
> created.






[GitHub] [lucene] mikemccand closed pull request #861: LUCENE-10551: switch to PUAFIF

2022-05-11 Thread GitBox


mikemccand closed pull request #861: LUCENE-10551: switch to PUAFIF
URL: https://github.com/apache/lucene/pull/861





[GitHub] [lucene] mayya-sharipova commented on pull request #877: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-11 Thread GitBox


mayya-sharipova commented on PR #877:
URL: https://github.com/apache/lucene/pull/877#issuecomment-1123806812

   @LuXugang Thanks for opening this PR. Is this a copy of 
https://github.com/apache/lucene/tree/vectors-disi-direct? I thought we can 
just open a PR of this branch against `main` branch, like 
[this](https://github.com/apache/lucene/compare/vectors-disi-direct?expand=1), 
since we have already approved everything in the `vectors-disi-direct` branch, it 
would be quite easy for us to approve this new PR. WDYT?





[GitHub] [lucene] LuXugang opened a new pull request, #880: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-11 Thread GitBox


LuXugang opened a new pull request, #880:
URL: https://github.com/apache/lucene/pull/880

   follow up of https://github.com/apache/lucene/pull/792 and 
https://github.com/apache/lucene/pull/870





[GitHub] [lucene] LuXugang closed pull request #877: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-11 Thread GitBox


LuXugang closed pull request #877: LUCENE-10502: Use IndexedDISI to store 
docIds and DirectMonotonicWriter/Reader to handle ordToDoc
URL: https://github.com/apache/lucene/pull/877





[GitHub] [lucene] LuXugang commented on pull request #877: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-11 Thread GitBox


LuXugang commented on PR #877:
URL: https://github.com/apache/lucene/pull/877#issuecomment-1123826295

   Thanks @mayya-sharipova, I just learned this new git operation, see 
https://github.com/apache/lucene/pull/880. This PR will be closed.





[jira] [Commented] (LUCENE-10551) LowercaseAsciiCompression should return false when it's unable to compress

2022-05-11 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534927#comment-17534927
 ] 

Uwe Schindler commented on LUCENE-10551:


I think you should also open a bug report in GraalVM.

> LowercaseAsciiCompression should return false when it's unable to compress
> --
>
> Key: LUCENE-10551
> URL: https://issues.apache.org/jira/browse/LUCENE-10551
> Project: Lucene - Core
>  Issue Type: Bug
> Environment: Lucene version 8.11.1
>Reporter: Peixin Li
>Priority: Major
> Attachments: LUCENE-10551-test.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> {code:java}
>  Failed to commit..
> java.lang.IllegalStateException: 10 <> 5 
> cion1cion_desarrollociones_oraclecionesnaturacionesnatura2tedppsa-integrationdemotiontion
>  cloud gen2tion instance - dev1tion instance - 
> testtion-devbtion-instancetion-prdtion-promerication-qation064533tion535217tion697401tion761348tion892818tion_matrationcauto_simmonsintgic_testtioncloudprodictioncloudservicetiongateway10tioninstance-jtsundatamartprd??o
>         at 
> org.apache.lucene.util.compress.LowercaseAsciiCompression.compress(LowercaseAsciiCompression.java:115)
>         at 
> org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlock(BlockTreeTermsWriter.java:834)
>         at 
> org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:628)
>         at 
> org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.pushTerm(BlockTreeTermsWriter.java:947)
>         at 
> org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:912)
>         at 
> org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318)
>         at 
> org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:170)
>         at 
> org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:120)
>         at 
> org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:267)
>         at 
> org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:350)
>         at 
> org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:476)
>         at 
> org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:656)
>         at 
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3364)
>         at 
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3770)
>         at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3728)
>        {code}
> {code:java}
> key=och-live--WorkResource.renewAssignmentToken.ResourceTime[namespace=workflow,
>  resourceGroup=workflow-service-overlay]{availabilityDomain=iad-ad-1, 
> domainId=och-live, host=workflow-service-overlay-01341.node.ad1.us-ashburn-1})
> java.lang.IllegalStateException: 29 <> 16 
> analytics-platform-test/koala/cluster-tool:1.0-20220310151438.492,mesh_istio_examples-bookinfo-details-v1:1.16.2mesh_istio_examples-bookinfo-reviews-v3:1.16.2oce-clamav:1.0.219oce-tesseract:1.0.7oce-traefik:2.5.1oci-opensearch:1.2.4.8.103oda-digital-assistant-control-plane-train-pool-workflow-v6:22.02.14oke-coresvcs-k8s-dns-dnsmasq-nanny-amd64@sha256:41aa9160ceeaf712369ddb660d02e5ec06d1679965e6930351967c8cf5ed62d4oke-coresvcs-k8s-dns-kube-dns-amd64@sha256:2cf34b04106974952996c6ef1313f165ce65b4ad68a3051f51b1b8f91ba5f838oke-coresvcs-k8s-dns-sidecar-amd64@sha256:8a82c7288725cb4de9c7cd8d5a78279208e379f35751539b406077f9a3163dcdoke-coresvcs-node-problem-detector@sha256:9d54df11804a862c54276648702a45a6a0027a9d930a86becd69c34cc84bf510oke-coresvcs-oke-fluentd-lumberjack@sha256:5f3f10b187eb804ce4e84bc3672de1cf318c0f793f00dac01cd7da8beea8f269oke-etcd-operator@sha256:4353a2e5ef02bb0f6b046a8d6219b1af359a2c1141c358ff110e395f29d0bfc8oke-oke-hyperkube-amd64@sha256:3c734f46099400507f938090eb9a874338fa25cde425ac9409df4c885759752foke-public-busybox@sha256:4cee1979ba0bf7db9fc5d28fb7b798ca69ae95a47c5fecf46327720df4ff352doke-public-coredns@sha256:86f8cfc74497f04e181ab2e1d26d2fd8bd46c4b33ce24b55620efcdfcb214670oke-public-coredns@sha256:8cd974302f1f6108f6f31312f8181ae723b514e2022089cdcc3db10666c49228oke-public-etcd@sha256:b751e459bc2a8f079f6730dd8462671b253c7c8b0d0eb47c67888d5091c6bb77oke-public-etcd@sha256:d6a76200a6e9103681bc2cf7fefbcada0dd9372d52cf8964178d846b89959d14oke-public-etcd@sha256:fa056479342b45479ac74c58176ddad43687d5fc295375d705808f9dfb48439aoke-public-kube-proxy@sha256:93b2da69d03413671606e22294c59a69fe404088a5f6e74d6394a8641fdb899boke-public-tiller@sha256:c2eb6e580123622e1bc0ff3becae3a3a71ac36c98a2786d780590197839175e5osms/opcbuild-osms-agent-proxy-java:

[jira] [Created] (LUCENE-10567) Can LongDistanceFeatureQuery benefit from better sampling technique to evaluate iterator for competitive hits

2022-05-11 Thread Mayya Sharipova (Jira)
Mayya Sharipova created LUCENE-10567:


 Summary: Can LongDistanceFeatureQuery benefit from better sampling 
technique to evaluate iterator for competitive hits
 Key: LUCENE-10567
 URL: https://issues.apache.org/jira/browse/LUCENE-10567
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Mayya Sharipova


LUCENE-10496 introduced an improvement in the sampling technique used to 
evaluate whether we can iterate over a subset of points instead of doc values 
in sorting.  The original code was inspired by how 
[LongDistanceFeatureQuery|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/document/LongDistanceFeatureQuery.java#L361]
 computes competitive hits. 

We should investigate if the same improvement in sampling technique can benefit 
 LongDistanceFeatureQuery  as well.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] LuXugang commented on pull request #880: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-11 Thread GitBox


LuXugang commented on PR #880:
URL: https://github.com/apache/lucene/pull/880#issuecomment-1123836523

   Hi @mayya-sharipova, this kind of merge operation is new to me, and I am not 
sure I can add an entry to CHANGES.txt correctly. Could I add the entry after 
this PR is merged, or could you kindly help me out?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mayya-sharipova commented on pull request #880: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-11 Thread GitBox


mayya-sharipova commented on PR #880:
URL: https://github.com/apache/lucene/pull/880#issuecomment-1123863575

   @LuXugang Right,  you can add an entry to CHANGES.txt after this PR is 
merged.





[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW

2022-05-11 Thread GitBox


mayya-sharipova commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r870463506


##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##
@@ -68,42 +69,43 @@ public final class HnswGraphBuilder {
*
* @param vectors the vectors whose relations are represented by the graph - 
must provide a
* different view over those vectors than the one used to add via 
addGraphNode.
-   * @param maxConn the number of connections to make when adding a new graph 
node; roughly speaking
-   * the graph fanout.
+   * @param M the number of connections to make when adding a new graph node; 
roughly speaking the

Review Comment:
   Addressed in d16168f77cbeae581a093daa53113bf897ab8e31






[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW

2022-05-11 Thread GitBox


mayya-sharipova commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r870464398


##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##
@@ -43,7 +43,8 @@ public final class HnswGraphBuilder {
   /** Random seed for level generation; public to expose for testing * */
   public static long randSeed = DEFAULT_RAND_SEED;
 
-  private final int maxConn;
+  private final int M; // max number of connections on upper layers
+  private final int maxConn0; // max number of connections on the 0th (last) 
layer

Review Comment:
   Addressed in d16168f77cbeae581a093daa53113bf897ab8e31
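
   The split between `M` and `maxConn0` discussed in this review can be pictured 
with a small sketch. It assumes, following the PR title, that the 0th (last) 
layer allows twice as many connections as the upper layers. The class and method 
names below are hypothetical and are not taken from HnswGraphBuilder.

```java
/** Hypothetical helper, not part of Lucene: per-level connection limits in an HNSW graph. */
final class HnswConnLimits {
  private final int M;        // max number of connections on upper layers (level >= 1)
  private final int maxConn0; // max number of connections on the 0th (last) layer

  HnswConnLimits(int M) {
    if (M <= 0) {
      throw new IllegalArgumentException("M must be positive, got " + M);
    }
    this.M = M;
    this.maxConn0 = 2 * M; // assumption from the PR title: use 2*maxConn on the last layer
  }

  /** Returns the connection limit that applies to a node on the given level. */
  int maxConnOnLevel(int level) {
    if (level < 0) {
      throw new IllegalArgumentException("level must be >= 0, got " + level);
    }
    return level == 0 ? maxConn0 : M;
  }
}
```

   With `M = 16`, a node on level 0 may keep up to 32 neighbours in this sketch, 
while a node on any upper level keeps at most 16.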






[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW

2022-05-11 Thread GitBox


mayya-sharipova commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r870464781


##
lucene/core/src/test/org/apache/lucene/util/hnsw/TestHnswGraph.java:
##
@@ -256,10 +256,11 @@ public void testSearchWithSelectiveAcceptOrds() throws 
IOException {
 
   public void testSearchWithSkewedAcceptOrds() throws IOException {
 int nDoc = 1000;
+int maxConn = 16;

Review Comment:
   Good suggestion, I've decided to pull out this parameter. Addressed in 
d16168f77cbeae581a093daa53113bf897ab8e31






[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW

2022-05-11 Thread GitBox


mayya-sharipova commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r870464973


##
lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java:
##
@@ -377,10 +377,13 @@ private static class FieldEntry {
   for (int level = 0; level < numLevels; level++) {
 if (level == 0) {
   graphOffsetsByLevel[level] = 0;
+} else if (level == 1) {
+  int numNodesOn0Level = size;

Review Comment:
   Good suggestion! Addressed in d16168f77cbeae581a093daa53113bf897ab8e31






[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW

2022-05-11 Thread GitBox


mayya-sharipova commented on code in PR #872:
URL: https://github.com/apache/lucene/pull/872#discussion_r870466493


##
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##
@@ -31,6 +31,7 @@
 public final class OnHeapHnswGraph extends HnswGraph {
 
   private final int maxConn;
+  private final int maxConn0;

Review Comment:
   Addressed in d16168f77cbeae581a093daa53113bf897ab8e31






[GitHub] [lucene] mayya-sharipova commented on pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW

2022-05-11 Thread GitBox


mayya-sharipova commented on PR #872:
URL: https://github.com/apache/lucene/pull/872#issuecomment-1123937553

   @jtibshirani  @mocobeta Thanks for your review. I've addressed your latest 
comments in d16168f77cbeae581a093daa53113bf897ab8e31.
   
   The plan for merging this PR is as follows: once [other 
changes](https://github.com/apache/lucene/pull/880) to the vectors format are 
merged to `main`, I will update this PR on top of `main`, so that all these 
changes go to the `Lucene92` vector format files instead of the current 
`Lucene91` vector format files.
   





[GitHub] [lucene] mayya-sharipova merged pull request #880: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-11 Thread GitBox


mayya-sharipova merged PR #880:
URL: https://github.com/apache/lucene/pull/880





[jira] [Commented] (LUCENE-10502) Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535028#comment-17535028
 ] 

ASF subversion and git services commented on LUCENE-10502:
--

Commit 6040d1648f6e30107086c1f9a159c1979498fb4e in lucene's branch 
refs/heads/main from Lu Xugang
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=6040d1648f6 ]

LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader 
to handle ordToDoc (#880)

Currently, all docs of all vector fields are fully loaded into memory (for 
sparse cases).
This happens not only when we do vector search, but also when we open an index 
to load meta info for vector readers.

This patch instead uses IndexedDISI to store docIds and 
DirectMonotonicWriter/Reader to handle the ordToDoc mapping. Benefits are 
reduced memory usage and faster loading of meta info for vector readers.
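
To illustrate why a monotonic encoding can shrink the ordToDoc mapping, here is 
a simplified sketch. It is not Lucene's DirectMonotonicWriter/Reader and uses 
no Lucene API. It only shows the general idea of storing small residuals 
around a straight line instead of the raw doc IDs. The class name and layout are 
assumptions made for illustration.

```java
/** Simplified illustration only; Lucene's DirectMonotonicWriter/Reader differ in layout and API. */
final class MonotonicOrdToDoc {
  private final double avgInc;  // average doc ID increase per ordinal
  private final long min;       // smallest residual, subtracted so stored deltas are >= 0
  private final long[] deltas;  // non-negative residuals around the line ord -> avgInc * ord

  MonotonicOrdToDoc(int[] sortedDocIds) {
    int n = sortedDocIds.length;
    if (n == 0) {
      throw new IllegalArgumentException("need at least one doc ID");
    }
    avgInc = n == 1 ? 0 : (double) (sortedDocIds[n - 1] - sortedDocIds[0]) / (n - 1);
    long[] residuals = new long[n];
    long minResidual = Long.MAX_VALUE;
    for (int ord = 0; ord < n; ord++) {
      residuals[ord] = sortedDocIds[ord] - (long) (avgInc * ord);
      minResidual = Math.min(minResidual, residuals[ord]);
    }
    min = minResidual;
    deltas = new long[n];
    for (int ord = 0; ord < n; ord++) {
      deltas[ord] = residuals[ord] - min;
    }
  }

  /** Reconstructs the doc ID for an ordinal from the line plus the stored delta. */
  int ordToDoc(int ord) {
    return (int) (min + (long) (avgInc * ord) + deltas[ord]);
  }

  /** Bits needed per stored delta; small when doc IDs are close to evenly spaced. */
  int bitsPerDelta() {
    long max = 0;
    for (long d : deltas) {
      max = Math.max(max, d);
    }
    return max == 0 ? 0 : 64 - Long.numberOfLeadingZeros(max);
  }
}
```

For the doc IDs 3, 10, 17, 25, 31 the stored deltas in this sketch are 0, 0, 0, 
1, 0, so one bit per entry is enough, whereas the raw values need five bits each.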

> Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle 
> ordToDoc 
> 
>
> Key: LUCENE-10502
> URL: https://issues.apache.org/jira/browse/LUCENE-10502
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 9.1
>Reporter: Lu Xugang
>Priority: Major
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> Since at search phase, vector's all docs of all fields will be fully loaded 
> into memory, could we use IndexedDISI to store docIds and 
> DirectMonotonicWriter/Reader to handle ordToDoc mapping?






[GitHub] [lucene] mayya-sharipova commented on pull request #880: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-11 Thread GitBox


mayya-sharipova commented on PR #880:
URL: https://github.com/apache/lucene/pull/880#issuecomment-1124044401

   @LuXugang I've merged your PR to `main`.
   
   Other things we need to do:
   1) Please create a PR against `main` to add an entry to CHANGES.txt under the 
`Lucene 9.2.0` release.
   2) After 1) is merged, we need to cherry-pick all these changes to the 9.x 
branch.





[GitHub] [lucene] LuXugang opened a new pull request, #881: LUCENE-10502: add entry

2022-05-11 Thread GitBox


LuXugang opened a new pull request, #881:
URL: https://github.com/apache/lucene/pull/881

   Add entry.





[GitHub] [lucene] LuXugang commented on pull request #880: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-11 Thread GitBox


LuXugang commented on PR #880:
URL: https://github.com/apache/lucene/pull/880#issuecomment-1124111322

   @mayya-sharipova , @jtibshirani  Thanks for your reviews! 
   
   >Please create a PR against main to add an entry to CHANGES.txt under the 
Lucene 9.2.0 release.
   
   Addressed in https://github.com/apache/lucene/pull/881.
   





[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #881: LUCENE-10502: add entry

2022-05-11 Thread GitBox


mayya-sharipova commented on code in PR #881:
URL: https://github.com/apache/lucene/pull/881#discussion_r870677297


##
lucene/CHANGES.txt:
##
@@ -149,6 +149,9 @@ Optimizations
   when the search order and the index order are in opposite directions.
   (Jianping Weng)
 
+* LUCENE-10502: Use IndexedDISI to store docIds and 
DirectMonotonicWriter/Reader to handle
+  ordToDoc in HNSW vectors (Mayya Sharipova, Julie Tibshirani, Lu Xugang)

Review Comment:
   You don't need to add our names (mine and Julie's, since we were reviewers), 
and in any case please put your name first as the main contributor.






[jira] [Commented] (LUCENE-10551) LowercaseAsciiCompression should return false when it's unable to compress

2022-05-11 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535093#comment-17535093
 ] 

Michael McCandless commented on LUCENE-10551:
-

+1 to get to the bottom of the GraalVM mis-compilation.

And also +1 if we can find a simple code change that's low risk / performance 
impact to other JVM users and could side-step this bug.

> LowercaseAsciiCompression should return false when it's unable to compress
> --
>
> Key: LUCENE-10551
> URL: https://issues.apache.org/jira/browse/LUCENE-10551
> Project: Lucene - Core
>  Issue Type: Bug
> Environment: Lucene version 8.11.1
>Reporter: Peixin Li
>Priority: Major
> Attachments: LUCENE-10551-test.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> {code:java}
>  Failed to commit..
> java.lang.IllegalStateException: 10 <> 5 
> cion1cion_desarrollociones_oraclecionesnaturacionesnatura2tedppsa-integrationdemotiontion
>  cloud gen2tion instance - dev1tion instance - 
> testtion-devbtion-instancetion-prdtion-promerication-qation064533tion535217tion697401tion761348tion892818tion_matrationcauto_simmonsintgic_testtioncloudprodictioncloudservicetiongateway10tioninstance-jtsundatamartprd??o
>         at 
> org.apache.lucene.util.compress.LowercaseAsciiCompression.compress(LowercaseAsciiCompression.java:115)
>         at 
> org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlock(BlockTreeTermsWriter.java:834)
>         at 
> org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:628)
>         at 
> org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.pushTerm(BlockTreeTermsWriter.java:947)
>         at 
> org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:912)
>         at 
> org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318)
>         at 
> org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:170)
>         at 
> org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:120)
>         at 
> org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:267)
>         at 
> org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:350)
>         at 
> org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:476)
>         at 
> org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:656)
>         at 
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3364)
>         at 
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3770)
>         at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3728)
>        {code}
> {code:java}
> key=och-live--WorkResource.renewAssignmentToken.ResourceTime[namespace=workflow,
>  resourceGroup=workflow-service-overlay]{availabilityDomain=iad-ad-1, 
> domainId=och-live, host=workflow-service-overlay-01341.node.ad1.us-ashburn-1})
> java.lang.IllegalStateException: 29 <> 16 
> analytics-platform-test/koala/cluster-tool:1.0-20220310151438.492,mesh_istio_examples-bookinfo-details-v1:1.16.2mesh_istio_examples-bookinfo-reviews-v3:1.16.2oce-clamav:1.0.219oce-tesseract:1.0.7oce-traefik:2.5.1oci-opensearch:1.2.4.8.103oda-digital-assistant-control-plane-train-pool-workflow-v6:22.02.14oke-coresvcs-k8s-dns-dnsmasq-nanny-amd64@sha256:41aa9160ceeaf712369ddb660d02e5ec06d1679965e6930351967c8cf5ed62d4oke-coresvcs-k8s-dns-kube-dns-amd64@sha256:2cf34b04106974952996c6ef1313f165ce65b4ad68a3051f51b1b8f91ba5f838oke-coresvcs-k8s-dns-sidecar-amd64@sha256:8a82c7288725cb4de9c7cd8d5a78279208e379f35751539b406077f9a3163dcdoke-coresvcs-node-problem-detector@sha256:9d54df11804a862c54276648702a45a6a0027a9d930a86becd69c34cc84bf510oke-coresvcs-oke-fluentd-lumberjack@sha256:5f3f10b187eb804ce4e84bc3672de1cf318c0f793f00dac01cd7da8beea8f269oke-etcd-operator@sha256:4353a2e5ef02bb0f6b046a8d6219b1af359a2c1141c358ff110e395f29d0bfc8oke-oke-hyperkube-amd64@sha256:3c734f46099400507f938090eb9a874338fa25cde425ac9409df4c885759752foke-public-busybox@sha256:4cee1979ba0bf7db9fc5d28fb7b798ca69ae95a47c5fecf46327720df4ff352doke-public-coredns@sha256:86f8cfc74497f04e181ab2e1d26d2fd8bd46c4b33ce24b55620efcdfcb214670oke-public-coredns@sha256:8cd974302f1f6108f6f31312f8181ae723b514e2022089cdcc3db10666c49228oke-public-etcd@sha256:b751e459bc2a8f079f6730dd8462671b253c7c8b0d0eb47c67888d5091c6bb77oke-public-etcd@sha256:d6a76200a6e9103681bc2cf7fefbcada0dd9372d52cf8964178d846b89959d14oke-public-etcd@sha256:fa056479342b45479ac74c58176ddad43687d5fc295375d705808f9dfb48439aoke-public-kube-proxy@sha256:93b2da69d03413671606e22294c59a69fe404088a

[GitHub] [lucene] jtibshirani opened a new pull request, #882: LUCENE-10564: Make sure SparseFixedBitSet#or updates memory usage

2022-05-11 Thread GitBox


jtibshirani opened a new pull request, #882:
URL: https://github.com/apache/lucene/pull/882

   Before, it didn't update the estimated memory usage, so calls to ramBytesUsed
   could be totally off.
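
   As a rough illustration of the bug class, here is a minimal sketch of a 
sparse bit set that allocates blocks lazily and tracks an estimated memory 
footprint. It is not the SparseFixedBitSet implementation. The class name, block 
size and accounting are assumptions. The point is only that `or` must go through 
the same accounting path as `set` whenever it allocates a block.

```java
/** Hypothetical sketch, not Lucene's SparseFixedBitSet: lazily allocated blocks with RAM tracking. */
final class TrackedSparseBits {
  private static final int BLOCK_BITS = 4096;                 // bits covered by one block
  private static final int LONGS_PER_BLOCK = BLOCK_BITS / 64; // longs backing one block

  private final long[][] blocks;
  private long ramBytesUsed; // rough estimate: counts only the allocated block payloads

  TrackedSparseBits(int length) {
    blocks = new long[(length + BLOCK_BITS - 1) / BLOCK_BITS][];
  }

  /** Returns the block, allocating it if needed and accounting for the new memory. */
  private long[] blockFor(int blockIndex) {
    long[] block = blocks[blockIndex];
    if (block == null) {
      block = blocks[blockIndex] = new long[LONGS_PER_BLOCK];
      ramBytesUsed += (long) LONGS_PER_BLOCK * Long.BYTES;
    }
    return block;
  }

  void set(int i) {
    blockFor(i / BLOCK_BITS)[(i % BLOCK_BITS) >>> 6] |= 1L << (i & 63);
  }

  boolean get(int i) {
    long[] block = blocks[i / BLOCK_BITS];
    return block != null && (block[(i % BLOCK_BITS) >>> 6] & (1L << (i & 63))) != 0;
  }

  /** ORs another set into this one; allocations go through blockFor so the estimate stays correct. */
  void or(TrackedSparseBits other) {
    if (other.blocks.length > blocks.length) {
      throw new IllegalArgumentException("other set is larger than this set");
    }
    for (int b = 0; b < other.blocks.length; b++) {
      long[] src = other.blocks[b];
      if (src == null) {
        continue; // nothing set in this block, so nothing to allocate
      }
      long[] dst = blockFor(b);
      for (int j = 0; j < LONGS_PER_BLOCK; j++) {
        dst[j] |= src[j];
      }
    }
  }

  long ramBytesUsed() {
    return ramBytesUsed;
  }
}
```

   If `or` allocated blocks directly instead of calling `blockFor`, the bits 
would still be correct but `ramBytesUsed()` would keep reporting the old value.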
   





[GitHub] [lucene] shahrs87 opened a new pull request, #883: LUCENE-10561 Reduce class/member visibility of all normalizer and stemmer classes

2022-05-11 Thread GitBox


shahrs87 opened a new pull request, #883:
URL: https://github.com/apache/lucene/pull/883

   
   
   
   
   # Description
   
   Please provide a short description of the changes you're making with this 
pull request.
   
   # Solution
   
   Please provide a short description of the approach taken to implement your 
solution.
   
   # Tests
   
   Please describe the tests you've developed or run to confirm this patch 
implements the feature or solves the problem.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my 
code conforms to the standards described there to the best of my ability.
   - [ ] I have given Lucene maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `main` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   





[GitHub] [lucene] shahrs87 commented on pull request #883: LUCENE-10561 Reduce class/member visibility of all normalizer and stemmer classes

2022-05-11 Thread GitBox


shahrs87 commented on PR #883:
URL: https://github.com/apache/lucene/pull/883#issuecomment-1124385828

   ```
apache-lucene % ./gradlew check
   > Task :lucene:analysis:common:generateClassicTokenizerChecksumCheck FAILED
   
   FAILURE: Build failed with an exception.
   
   * Where:
   Script 
'/Users/rushabh.shah/apache-lucene/gradle/generation/regenerate.gradle' line: 
184
   
   * What went wrong:
   Execution failed for task 
':lucene:analysis:common:generateClassicTokenizerChecksumCheck'.
   > Checksums mismatch for derived resources; you might have modified a 
generated resource (regenerate task: generateClassicTokenizer):
 Current:
   
lucene/analysis/common/src/java/org/apache/lucene/analysis/classic/ClassicTokenizerImpl.java=c825a8b8d0d0d893b4914e7161bcd119e7b07b40
 
 Expected:
   
lucene/analysis/common/src/java/org/apache/lucene/analysis/classic/ClassicTokenizerImpl.java=381a9627fd7da6402216e3279cf81a09af222aaf
   ```
   
   When I run `./gradlew check`, I get the above error. It looks like I need 
to update a checksum in some file. Do we have a Gradle target for that?





[jira] [Commented] (LUCENE-10561) Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and PersianNormalizer

2022-05-11 Thread Rushabh Shah (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535139#comment-17535139
 ] 

Rushabh Shah commented on LUCENE-10561:
---

Hi,
This is my first PR, so apologies if I ask dumb questions.
I created a PR for this change. When I run ./gradlew check, it throws the 
error below. 
{noformat}
apache-lucene % ./gradlew check
> Task :lucene:analysis:common:generateClassicTokenizerChecksumCheck FAILED

FAILURE: Build failed with an exception.

* Where:
Script '/Users/rushabh.shah/apache-lucene/gradle/generation/regenerate.gradle' 
line: 184

* What went wrong:
Execution failed for task 
':lucene:analysis:common:generateClassicTokenizerChecksumCheck'.
> Checksums mismatch for derived resources; you might have modified a generated 
> resource (regenerate task: generateClassicTokenizer):
  Current:

lucene/analysis/common/src/java/org/apache/lucene/analysis/classic/ClassicTokenizerImpl.java=c825a8b8d0d0d893b4914e7161bcd119e7b07b40
  
  Expected:

lucene/analysis/common/src/java/org/apache/lucene/analysis/classic/ClassicTokenizerImpl.java=381a9627fd7da6402216e3279cf81a09af222aaf

{noformat}
I understand that I need to update the checksum somewhere, but I don't know 
whether we have a script that does that or whether we have to compute it 
manually. Please advise.
[~rcmuir] [~tomoko]

> Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and 
> PersianNormalizer
> 
>
> Key: LUCENE-10561
> URL: https://issues.apache.org/jira/browse/LUCENE-10561
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Tomoko Uchida
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is a spin-off of [LUCENE-10312].
> Constants and methods in those classes are exposed to the outside packages; 
> we should be able to limit the visibility to {{private}} or, at least to 
> {{package private}}.
> This change breaks backward compatibility so should be applied to the main 
> branch (10.0) only, and a MIGRATE entry may be needed.
> Also, they seem unchanged since 2008, we could refactor them to embrace newer 
> Java APIs as we did in [https://github.com/apache/lucene/pull/540]. 






[GitHub] [lucene] LuXugang commented on a diff in pull request #881: LUCENE-10502: add entry

2022-05-11 Thread GitBox


LuXugang commented on code in PR #881:
URL: https://github.com/apache/lucene/pull/881#discussion_r870839712


##
lucene/CHANGES.txt:
##
@@ -149,6 +149,9 @@ Optimizations
   when the search order and the index order are in opposite directions.
   (Jianping Weng)
 
+* LUCENE-10502: Use IndexedDISI to store docIds and 
DirectMonotonicWriter/Reader to handle
+  ordToDoc in HNSW vectors (Mayya Sharipova, Julie Tibshirani, Lu Xugang)

Review Comment:
   Got it.






[GitHub] [lucene] LuXugang commented on a diff in pull request #881: LUCENE-10502: add entry

2022-05-11 Thread GitBox


LuXugang commented on code in PR #881:
URL: https://github.com/apache/lucene/pull/881#discussion_r870839898


##
lucene/CHANGES.txt:
##
@@ -149,6 +149,9 @@ Optimizations
   when the search order and the index order are in opposite directions.
   (Jianping Weng)
 
+* LUCENE-10502: Use IndexedDISI to store docIds and 
DirectMonotonicWriter/Reader to handle
+  ordToDoc in HNSW vectors (Mayya Sharipova, Julie Tibshirani, Lu Xugang)

Review Comment:
   @mayya-sharipova Done.






[GitHub] [lucene] mayya-sharipova merged pull request #881: LUCENE-10502: add entry

2022-05-11 Thread GitBox


mayya-sharipova merged PR #881:
URL: https://github.com/apache/lucene/pull/881





[jira] [Commented] (LUCENE-10502) Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc

2022-05-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535804#comment-17535804
 ] 

ASF subversion and git services commented on LUCENE-10502:
--

Commit a06460a5380234fbb3335b058e86f8fa64e277d4 in lucene's branch 
refs/heads/main from Lu Xugang
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a06460a5380 ]

LUCENE-10502: add changes entry (#881)



> Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle 
> ordToDoc 
> 
>
> Key: LUCENE-10502
> URL: https://issues.apache.org/jira/browse/LUCENE-10502
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 9.1
>Reporter: Lu Xugang
>Priority: Major
>  Time Spent: 15h
>  Remaining Estimate: 0h
>
> Since at search phase, vector's all docs of all fields will be fully loaded 
> into memory, could we use IndexedDISI to store docIds and 
> DirectMonotonicWriter/Reader to handle ordToDoc mapping?



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10561) Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and PersianNormalizer

2022-05-11 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535830#comment-17535830
 ] 

Robert Muir commented on LUCENE-10561:
--

Some of the tokenizers are auto-generated. For example, ClassicTokenizerImpl is 
generated by the "jflex" tool from the ClassicTokenizer.jflex source.
The build is failing simply because the wrong file was changed.
Personally, I recommend leaving ClassicTokenizerImpl and similar classes out of 
this issue, because these particular generated tokenizers are very complicated. 
It is hard to reduce the member/class visibility due to the way the code 
generation works. 

> Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and 
> PersianNormalizer
> 
>
> Key: LUCENE-10561
> URL: https://issues.apache.org/jira/browse/LUCENE-10561
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Tomoko Uchida
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is a spin-off of [LUCENE-10312].
> Constants and methods in those classes are exposed to outside packages; 
> we should be able to limit the visibility to {{private}} or, at least, to 
> {{package private}}.
> This change breaks backward compatibility, so it should be applied to the main 
> branch (10.0) only, and a MIGRATE entry may be needed.
> Also, they seem unchanged since 2008; we could refactor them to embrace newer 
> Java APIs as we did in [https://github.com/apache/lucene/pull/540]. 






[GitHub] [lucene] mocobeta commented on pull request #883: LUCENE-10561 Reduce class/member visibility of all normalizer and stemmer classes

2022-05-11 Thread GitBox


mocobeta commented on PR #883:
URL: https://github.com/apache/lucene/pull/883#issuecomment-1124499151

   Hi @shahrs87.
   First of all, thanks for the great PR! As for the failed checksum check for 
`ClassicTokenizerImpl`, please refer to Robert's comment in Jira - I think you 
can omit `ClassicTokenizerImpl` for now and we can improve it in another 
issue/PR.





[GitHub] [lucene] shahrs87 commented on pull request #883: LUCENE-10561 Reduce class/member visibility of all normalizer and stemmer classes

2022-05-11 Thread GitBox


shahrs87 commented on PR #883:
URL: https://github.com/apache/lucene/pull/883#issuecomment-1124521767

   @mocobeta Thank you for the reply. I have removed the classes that are 
generated via jflex. I also ran all the workflow tests mentioned in the 
Contributors guide, and they all seem to pass. Can you please review the PR? 
Thank you.





[jira] [Commented] (LUCENE-10561) Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and PersianNormalizer

2022-05-11 Thread Rushabh Shah (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535858#comment-17535858
 ] 

Rushabh Shah commented on LUCENE-10561:
---

Thank you [~rcmuir]  for the comment. I have updated the PR as per your 
suggestions. Please review.




[jira] [Commented] (LUCENE-10561) Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and PersianNormalizer

2022-05-11 Thread Rushabh Shah (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535859#comment-17535859
 ] 

Rushabh Shah commented on LUCENE-10561:
---

[~rcmuir]  Also can you please assign the Jira to me.




[jira] [Commented] (LUCENE-10561) Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and PersianNormalizer

2022-05-11 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535862#comment-17535862
 ] 

Robert Muir commented on LUCENE-10561:
--

I think I didn't explain my suggestion well enough. When I said "make entire 
classes package private", I meant that, for example, ArabicNormalizer does not 
need to be a public class anymore.

I think this is easier than modifying many individual constants. Some of the 
constants modified in the PR are not related to stemmers/normalizers, and 
changing them may cause problems. 

Generally speaking, public constants are harmless, so I don't think there is 
much benefit in hiding them.

But entire classes that need not be public are worth fixing, because they 
clutter up the javadoc and the API for no good reason.
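
A minimal sketch of the idea, assuming a hypothetical ExampleNormalizer 
rather than the real ArabicNormalizer: dropping the "public" modifier from 
the class hides it from other packages (and from the public API surface), 
even if its constants stay public.

```java
import java.lang.reflect.Modifier;

// Hypothetical stand-in for a normalizer. Without "public", the class is
// visible only inside its own package.
final class ExampleNormalizer {
  // Still declared public, but unreachable from other packages because the
  // enclosing class cannot be named there.
  public static final char TATWEEL = '\u0640';

  // Removes every TATWEEL from the first len chars; returns the new length.
  int normalize(char[] s, int len) {
    int out = 0;
    for (int i = 0; i < len; i++) {
      if (s[i] != TATWEEL) {
        s[out++] = s[i];
      }
    }
    return out;
  }
}

// Hypothetical public entry point that uses the hidden helper.
public class VisibilitySketch {
  public static String normalize(String input) {
    char[] buf = input.toCharArray();
    int len = new ExampleNormalizer().normalize(buf, buf.length);
    return new String(buf, 0, len);
  }

  // Reports whether a class carries the "public" modifier.
  public static boolean isPublic(Class<?> cls) {
    return Modifier.isPublic(cls.getModifiers());
  }

  public static void main(String[] args) {
    System.out.println(isPublic(ExampleNormalizer.class));
    System.out.println(isPublic(VisibilitySketch.class));
    System.out.println(normalize("a\u0640b"));
  }
}
```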




[jira] [Commented] (LUCENE-10561) Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and PersianNormalizer

2022-05-11 Thread Rushabh Shah (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535865#comment-17535865
 ] 

Rushabh Shah commented on LUCENE-10561:
---

Thank you [~rcmuir]  for the clarification. One followup question. Should I 
change only the classes that are Normalizer or Stemmer in this PR and not touch 
other classes ?




[jira] [Commented] (LUCENE-10561) Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and PersianNormalizer

2022-05-11 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535866#comment-17535866
 ] 

Tomoko Uchida commented on LUCENE-10561:


{quote}Also can you please assign the Jira to me.
{quote}
I don't think we can assign issues to external contributors. Don't worry about 
it; it's not that important. A CHANGES entry is sufficient for us to track who 
made the change.

 




[jira] [Commented] (LUCENE-10561) Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and PersianNormalizer

2022-05-11 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535870#comment-17535870
 ] 

Tomoko Uchida commented on LUCENE-10561:


{quote}Generally speaking, public constants are harmless, so I don't think 
there is much benefit in hiding them.
{quote}
There are some public static constants whose type is an array ({{char[][]}}). I 
thought it'd be safe to make them private, but is making the entire class 
package-private sufficient?
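
For context, a small sketch of why array-typed constants are a concern. The 
class and field names here are hypothetical, not the actual Lucene fields: 
"final" fixes only the reference, so any code that can see the field can 
still overwrite its elements.

```java
public class ArrayConstantSketch {
  // Hypothetical constant, shaped like the char[][] fields under discussion.
  // "final" only fixes the reference; the array contents remain writable.
  public static final char[][] SUFFIXES = {{'a', 'b'}, {'c'}};

  // A defensive alternative: keep the array private and hand out deep copies.
  private static final char[][] HIDDEN_SUFFIXES = {{'a', 'b'}, {'c'}};

  public static char[][] suffixes() {
    char[][] copy = new char[HIDDEN_SUFFIXES.length][];
    for (int i = 0; i < copy.length; i++) {
      copy[i] = HIDDEN_SUFFIXES[i].clone();
    }
    return copy;
  }

  public static void main(String[] args) {
    // Any code that can see the public field can change it for everyone.
    SUFFIXES[0][0] = 'x';
    System.out.println(SUFFIXES[0][0]);

    // Mutating a copy leaves the private array untouched.
    suffixes()[0][0] = 'x';
    System.out.println(suffixes()[0][0]);
  }
}
```

If the declaring class is itself package-private, code in other packages 
normally cannot name the class and so cannot reach the field through it; 
code inside the package still can.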




[jira] [Commented] (LUCENE-10561) Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and PersianNormalizer

2022-05-11 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535872#comment-17535872
 ] 

Tomoko Uchida commented on LUCENE-10561:


{quote}One followup question. Should I change only the classes that are 
Normalizer or Stemmer in this PR and not touch other classes ?
{quote}
I'd agree with that. It'd be better not to touch Tokenizers/TokenFilters in 
this issue. That is a separate issue, and changing them would break users' 
code, so we'd need to be careful about it.




[jira] [Commented] (LUCENE-10561) Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and PersianNormalizer

2022-05-11 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535874#comment-17535874
 ] 

Tomoko Uchida commented on LUCENE-10561:


(So... we need to switch between Jira and GitHub even for such a relatively 
small issue, and the conversation is scattered over both of them. To me, this 
is good motivation for LUCENE-10557.)




[GitHub] [lucene] zacharymorn merged pull request #833: LUCENE-10411: Add NN vectors support to ExitableDirectoryReader

2022-05-11 Thread GitBox


zacharymorn merged PR #833:
URL: https://github.com/apache/lucene/pull/833





[jira] [Commented] (LUCENE-10411) Add NN vectors support to ExitableDirectoryReader

2022-05-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535875#comment-17535875
 ] 

ASF subversion and git services commented on LUCENE-10411:
--

Commit 96036bca9f667edbdc528bfe95eeb2795526e9fa in lucene's branch 
refs/heads/main from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=96036bca9f6 ]

LUCENE-10411: Add NN vectors support to ExitableDirectoryReader (#833)



> Add NN vectors support to ExitableDirectoryReader
> -
>
> Key: LUCENE-10411
> URL: https://issues.apache.org/jira/browse/LUCENE-10411
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> This is currently unsupported.






[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira?

2022-05-11 Thread Shad Storhaug (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535898#comment-17535898
 ] 

Shad Storhaug commented on LUCENE-10557:


The Lucene.NET project switched to GitHub issues 3 years ago, and we were able 
to get far more contributor participation than on JIRA, but that may just be 
mostly a .NET thing. We held a vote and all participants were unanimous on the 
GitHub Issues migration.

It was pretty straightforward for INFRA to do. We made a script to migrate the 
open issues to GitHub via API, but there were a couple of issues we had that 
you may need to find a workaround for:

# The issues were not added to GitHub in chronological order.
# Since INFRA wouldn't give us permission to the API, we gave the script to 
them and the issues were added to GitHub with the INFRA tech as the owner 
instead of the original person who submitted the issue.

There is a walkthrough of the process here: 
https://gist.github.com/jonmagic/5282384165e0f86ef105#start-an-issue-import%22

We didn't disable JIRA support, but we didn't really have to because we 
couldn't get users to participate on it, anyway. But that is something you will 
also need to take into consideration.
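
One possible workaround for the ordering problem, sketched under stated 
assumptions: sort the exported issues by creation time before handing them 
to whatever import script is used. The ExportedIssue type below is 
hypothetical and is not a Jira or GitHub API type; whether submitting in 
this order is enough to preserve it on the GitHub side is also an 
assumption.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ImportOrderSketch {
  // Hypothetical minimal view of an exported issue.
  static final class ExportedIssue {
    final String key;
    final Instant created;

    ExportedIssue(String key, Instant created) {
      this.key = key;
      this.created = created;
    }
  }

  // Returns a new list ordered oldest-first, leaving the input untouched.
  static List<ExportedIssue> inCreationOrder(List<ExportedIssue> issues) {
    List<ExportedIssue> sorted = new ArrayList<>(issues);
    sorted.sort(Comparator.comparing((ExportedIssue issue) -> issue.created));
    return sorted;
  }

  public static void main(String[] args) {
    List<ExportedIssue> exported = Arrays.asList(
        new ExportedIssue("EXAMPLE-3", Instant.parse("2021-03-01T00:00:00Z")),
        new ExportedIssue("EXAMPLE-1", Instant.parse("2019-01-15T00:00:00Z")),
        new ExportedIssue("EXAMPLE-2", Instant.parse("2020-07-30T00:00:00Z")));
    for (ExportedIssue issue : inCreationOrder(exported)) {
      System.out.println(issue.key);
    }
  }
}
```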

> Migrate to GitHub issue from Jira?
> --
>
> Key: LUCENE-10557
> URL: https://issues.apache.org/jira/browse/LUCENE-10557
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Priority: Major
>
> A few (not the majority) Apache projects already use GitHub issues instead 
> of Jira. For example,
> Airflow: [https://github.com/apache/airflow/issues]
> BookKeeper: [https://github.com/apache/bookkeeper/issues]
> So I think it'd be technically possible for us to move to GitHub issues. I 
> have little knowledge of how to proceed, so I'd like to discuss whether we 
> should migrate and, if so, how to handle the migration smoothly.
> The major tasks would be:
>  * Get a consensus about the migration among committers
>  * Enable GitHub issues on the Lucene repository (currently, they are 
> disabled on it)
>  * Build the conventions or rules for issue label/milestone management
>  * Choose issues that should be moved to GitHub (I think too old or obsolete 
> issues can remain in Jira.)






[GitHub] [lucene] jpountz commented on a diff in pull request #882: LUCENE-10564: Make sure SparseFixedBitSet#or updates memory usage

2022-05-11 Thread GitBox


jpountz commented on code in PR #882:
URL: https://github.com/apache/lucene/pull/882#discussion_r871000644


##
lucene/core/src/test/org/apache/lucene/util/TestSparseFixedBitSet.java:
##
@@ -71,4 +71,27 @@ public void testApproximateCardinalityOnDenseSet() {
     }
     assertEquals(numDocs, set.approximateCardinality());
   }
+
+  public void testRamBytesUsed() throws IOException {
+    int size = 1000 + random().nextInt(1);
+    BitSet original = new SparseFixedBitSet(size);
+    for (int i = 0; i < 3; i++) {
+      original.set(random().nextInt(size));
+    }
+    assertTrue(original.ramBytesUsed() > 0);
+
+    BitSet copy = copyOf(original, size);
+    BitSet otherBitSet = new SparseFixedBitSet(size);
+    int interval = 10 + random().nextInt(100);
+    for (int i = 0; i < size; i += interval) {
+      otherBitSet.set(i);
+    }
+    copy.or(new BitSetIterator(otherBitSet, size));
+    assertTrue(copy.ramBytesUsed() > original.ramBytesUsed());
+
+    copy = copyOf(original, size);
+    copy.or(DocIdSetIterator.all(size));
+    assertTrue(copy.ramBytesUsed() > original.ramBytesUsed());
+    assertTrue(copy.ramBytesUsed() > size / Byte.SIZE);

Review Comment:
   Would it also make sense to test that copying a SparseFixedBitSet via 
`SparseFixedBitSet#or` gives an instance that has the same `ramBytesUsed`?






[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira?

2022-05-11 Thread Shad Storhaug (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535898#comment-17535898
 ] 

Shad Storhaug edited comment on LUCENE-10557 at 5/12/22 6:40 AM:
-

The Lucene.NET project switched to GitHub issues 3 years ago, and we were able 
to get far more contributor participation than on JIRA, but that may just be 
mostly a .NET thing. We held a vote and all participants were unanimous on the 
GitHub Issues migration.

It was pretty straightforward for INFRA to do. We made a script to migrate the 
open issues to GitHub via API, but there were a couple of issues we had that 
you may need to find a workaround for:

# The issues were not added to GitHub in chronological order.
# Since INFRA wouldn't give us permission to the API, we gave the script to 
them and the issues were added to GitHub with the INFRA tech as the owner 
instead of the original person who submitted the issue.

There is a walkthrough of the process here: 
https://gist.github.com/jonmagic/5282384165e0f86ef105#start-an-issue-import%22
The INFRA ticket for this is: https://issues.apache.org/jira/browse/INFRA-20118

We didn't disable JIRA support, but we didn't really have to because we 
couldn't get users to participate on it, anyway. But that is something you will 
also need to take into consideration.

