[jira] [Resolved] (LUCENE-10555) avoid repeated NumericLeafComparator setScorer calls
[ https://issues.apache.org/jira/browse/LUCENE-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-10555. --- Fix Version/s: 9.2 Resolution: Fixed > avoid repeated NumericLeafComparator setScorer calls > > > Key: LUCENE-10555 > URL: https://issues.apache.org/jira/browse/LUCENE-10555 > Project: Lucene - Core > Issue Type: Improvement >Reporter: jianping weng >Priority: Major > Fix For: 9.2 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Elasticsearch uses CancellableBulkScorer to cancel long-running queries quickly by splitting the docs of each segment into many small ranges. For every range, collector.setScorer(scorer) is called, which in turn calls NumericLeafComparator#setScorer, so NumericLeafComparator#setScorer ends up being called many times per segment. Every call resets NumericLeafComparator#iteratorCost to the Scorer's cost, which triggers many unnecessary pointValues#intersect calls to find competitive docs. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
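A minimal sketch of one way to avoid the repeated resets described above: take the scorer's cost only on the first `setScorer` call for a leaf, so that later calls (one per bulk-scoring range) keep any cost the competitive iterator has already narrowed. `LeafComparatorSketch` is hypothetical, not Lucene's `NumericLeafComparator`, and the scorer is modeled as a `LongSupplier` that returns its cost.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical stand-in for a per-leaf numeric comparator (not Lucene's NumericLeafComparator).
 * The scorer is modeled as a LongSupplier returning the scorer's cost.
 */
final class LeafComparatorSketch {
  private long iteratorCost = -1; // -1 means "not initialized for this leaf yet"
  private int costInitializations = 0;

  void setScorer(LongSupplier scorerCost) {
    // Only the first call for this leaf takes the scorer's cost; later calls, e.g. one per
    // bulk-scoring range, keep whatever cost the competitive iterator has narrowed it to.
    if (iteratorCost == -1) {
      iteratorCost = scorerCost.getAsLong();
      costInitializations++;
    }
  }

  /** Simulates the competitive iterator having been narrowed to fewer candidate docs. */
  void onCompetitiveIteratorUpdated(long newCost) {
    iteratorCost = newCost;
  }

  long iteratorCost() {
    return iteratorCost;
  }

  int costInitializations() {
    return costInitializations;
  }
}
```

With this guard, a second `setScorer` call on the same leaf leaves a narrowed cost untouched instead of resetting it to the full scorer cost.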
[GitHub] [lucene] wjp719 commented on a diff in pull request #780: LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction
wjp719 commented on code in PR #780: URL: https://github.com/apache/lucene/pull/780#discussion_r869953128
## lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java: ##
@@ -269,11 +276,23 @@ public PointValues.Relation compare(byte[] minPackedValue, byte[] maxPackedValue
         if (estimatedNumberOfMatches >= threshold) {
           // the new range is not selective enough to be worth materializing, it doesn't reduce number
           // of docs at least 8x
+          if (updateCounter > 256) {
+            if (tryUpdateFailCount >= 3) {
+              currentSkipInterval = Math.min(currentSkipInterval * 2, MAX_SKIP_INTERVAL);
+              tryUpdateFailCount = 0;
+            } else {
+              tryUpdateFailCount++;
+            }
+          }
Review Comment: done
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] wjp719 commented on pull request #780: LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction
wjp719 commented on PR #780: URL: https://github.com/apache/lucene/pull/780#issuecomment-1123279942 > I'm curious about `tryUpdateFailCount`, did you get better results on the benchmark with it than without it? @jpountz yes, with `tryUpdateFailCount`, the `asc_sort_with_after_timestamp` case performs better than without it.
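The back-off in the NumericComparator diff from PR #780 can be exercised in isolation. In this hypothetical sketch, `MIN_SKIP_INTERVAL`, the value of `MAX_SKIP_INTERVAL`, and the place where `updateCounter` is incremented are assumptions; only the branch structure is taken from the diff.

```java
/**
 * Hypothetical sketch of the skip-interval back-off from the NumericComparator diff.
 * MIN_SKIP_INTERVAL, MAX_SKIP_INTERVAL and where updateCounter is incremented are assumptions.
 */
final class SkipIntervalBackoff {
  static final int MIN_SKIP_INTERVAL = 32; // assumed starting interval
  static final int MAX_SKIP_INTERVAL = 8192; // assumed cap

  private int currentSkipInterval = MIN_SKIP_INTERVAL;
  private int updateCounter = 0; // assumed: counts every attempt to update the iterator
  private int tryUpdateFailCount = 0;

  /** Records one attempt to narrow the competitive iterator. */
  void onUpdateAttempt(boolean selectiveEnough) {
    updateCounter++;
    // Same branch structure as the diff: only back off after 256 attempts, and then only
    // double the interval on every fourth attempt that was not selective enough.
    if (!selectiveEnough && updateCounter > 256) {
      if (tryUpdateFailCount >= 3) {
        currentSkipInterval = Math.min(currentSkipInterval * 2, MAX_SKIP_INTERVAL);
        tryUpdateFailCount = 0;
      } else {
        tryUpdateFailCount++;
      }
    }
  }

  int currentSkipInterval() {
    return currentSkipInterval;
  }
}
```

With these assumed constants, the first 256 attempts never back off; after that the interval doubles on every fourth failed attempt until it reaches the cap. The fail counter makes the back-off gentler than doubling on every failure.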
[GitHub] [lucene] jpountz commented on a diff in pull request #875: LUCENE-10560: Speed up OrdinalMap construction a bit.
jpountz commented on code in PR #875: URL: https://github.com/apache/lucene/pull/875#discussion_r870001129
## lucene/core/src/java/org/apache/lucene/index/OrdinalMap.java: ##
@@ -48,10 +49,69 @@ public class OrdinalMap implements Accountable {
   // need it
   // TODO: use more efficient packed ints structures?
+  /**
+   * Copy the first 8 bytes of the given term as a comparable unsigned long. In case the term has
+   * less than 8 bytes, missing bytes will be replaced with zeroes. Note that two terms that produce
+   * the same long could still be different due to the fact that missing bytes are replaced with
+   * zeroes, e.g. {@code [1, 0]} and {@code [1]} get mapped to the same long.
+   */
+  static long prefix8ToComparableUnsignedLong(BytesRef term) {
+    // Use Big Endian so that longs are comparable
+    if (term.length >= Long.BYTES) {
+      return (long) BitUtil.VH_BE_LONG.get(term.bytes, term.offset);
+    } else {
+      long l;
+      int offset;
+      if (term.length >= Integer.BYTES) {
+        l = (int) BitUtil.VH_BE_INT.get(term.bytes, term.offset);
+        offset = Integer.BYTES;
+      } else {
+        l = 0;
+        offset = 0;
+      }
+      while (offset < term.length) {
+        l = (l << 8) | Byte.toUnsignedLong(term.bytes[term.offset + offset]);
+        offset++;
+      }
+      l <<= (Long.BYTES - term.length) << 3;
+      return l;
+    }
+  }
+
+  private static int compare(BytesRef termA, long prefix8A, BytesRef termB, long prefix8B) {
+    assert prefix8A == prefix8ToComparableUnsignedLong(termA);
Review Comment: The main improvement I can think of would consist of looking up the first and last values of the segment to check whether all values share a common prefix, e.g. the IPv4-mapped IPv6 addresses case. Maybe in the future we could split the value space into smaller blocks, or something similar, which would still let us handle cases well where many values share a common prefix but not all of them, e.g. a dataset of URLs where many but not all values have the `https://www.` prefix, or a dataset that mixes lots of IPv4-mapped IPv6 addresses with regular IPv6 addresses.
Maybe the API could tell us the min and max term lengths, so that we could optimize the fixed-length case (e.g. geonames IDs) a bit in the future. I don't have many ideas beyond these. I tried to review the existing literature on binary search and on sorting string[] keys, which have commonalities with what we're doing here since a value is potentially compared with several other values, and it looks like the main idea consists of identifying shared prefixes so that these bytes don't have to be compared over and over again. Maybe something we can try out next.
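The OrdinalMap diff caches the first 8 bytes of each term as a big-endian long so that most comparisons become a single unsigned long comparison, and the review comment suggests also looking at the first and last values of a segment to find a shared prefix. A minimal, self-contained sketch of both ideas on plain `byte[]` terms (the class and method names are hypothetical, and it uses a simple byte loop rather than Lucene's `BitUtil` VarHandles):

```java
import java.util.Arrays;

/** Hypothetical sketch on plain byte[] terms; not Lucene's OrdinalMap code. */
final class PrefixCompareSketch {

  /** First 8 bytes as a big-endian long; shorter terms are padded with zero bytes on the right. */
  static long prefix8(byte[] term) {
    long l = 0;
    for (int i = 0; i < Long.BYTES; i++) {
      int b = i < term.length ? term[i] & 0xFF : 0;
      l = (l << 8) | b;
    }
    return l;
  }

  /** Unsigned lexicographic comparison that only looks at the full bytes when prefixes tie. */
  static int compare(byte[] a, long prefixA, byte[] b, long prefixB) {
    int cmp = Long.compareUnsigned(prefixA, prefixB);
    if (cmp != 0) {
      return cmp;
    }
    // Equal prefixes can still hide different terms, e.g. {1, 0} and {1}.
    return Arrays.compareUnsigned(a, b);
  }

  /**
   * Length of the prefix shared by the smallest and largest term of a sorted segment. Every
   * term that sorts between them starts with this prefix.
   */
  static int commonPrefixLength(byte[] min, byte[] max) {
    int mismatch = Arrays.mismatch(min, max);
    return mismatch == -1 ? min.length : mismatch;
  }
}
```

Since every term between a segment's min and max must start with their common prefix, that length could be computed once per segment and those bytes skipped in later comparisons, which is the kind of shared-prefix optimization the comment mentions.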
[GitHub] [lucene] jpountz commented on a diff in pull request #875: LUCENE-10560: Speed up OrdinalMap construction a bit.
jpountz commented on code in PR #875: URL: https://github.com/apache/lucene/pull/875#discussion_r870010670 ## lucene/core/src/java/org/apache/lucene/index/OrdinalMap.java: ## Review Comment: I added a TODO to look into this.
[GitHub] [lucene] mocobeta commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
mocobeta commented on code in PR #872: URL: https://github.com/apache/lucene/pull/872#discussion_r870023604
## lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java: ##
@@ -377,10 +377,13 @@ private static class FieldEntry {
         for (int level = 0; level < numLevels; level++) {
           if (level == 0) {
             graphOffsetsByLevel[level] = 0;
+          } else if (level == 1) {
+            int numNodesOn0Level = size;
Review Comment: minor: `numNodesOnLevel0` might be clearer at first glance (and consistent with other parts)?
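The `level == 1` special case in this diff is needed because, with this PR, level 0 uses a larger connection limit than the upper levels. A sketch of how per-level graph offsets could be accumulated, assuming (this layout is an assumption, not the documented Lucene91 format) that each node on a level stores a neighbor count followed by `maxConn` int slots for that level, with `maxConn0 = 2 * M` on level 0. `GraphOffsetsSketch` is hypothetical.

```java
/** Hypothetical sketch; the per-node layout below is an assumption, not Lucene91's exact format. */
final class GraphOffsetsSketch {
  /**
   * @param nodesPerLevel number of nodes on each level; level 0 holds every vector
   * @param M max connections on upper levels; level 0 is assumed to allow 2 * M
   * @return byte offset at which each level's neighbor lists start
   */
  static long[] graphOffsetsByLevel(int[] nodesPerLevel, int M) {
    int maxConn0 = 2 * M;
    long[] offsets = new long[nodesPerLevel.length];
    for (int level = 1; level < nodesPerLevel.length; level++) {
      int prevMaxConn = (level - 1 == 0) ? maxConn0 : M;
      // Assumed: each node on the previous level stores a neighbor count plus prevMaxConn slots.
      long prevLevelBytes = (long) nodesPerLevel[level - 1] * (1 + prevMaxConn) * Integer.BYTES;
      offsets[level] = offsets[level - 1] + prevLevelBytes;
    }
    return offsets;
  }
}
```

Only the level-1 offset depends on `maxConn0`; every later level adds `M`-sized entries, which is why the diff singles out level 1.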
[GitHub] [lucene] mocobeta commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
mocobeta commented on code in PR #872: URL: https://github.com/apache/lucene/pull/872#discussion_r870030770
## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ##
@@ -43,7 +43,8 @@ public final class HnswGraphBuilder {
   /** Random seed for level generation; public to expose for testing * */
   public static long randSeed = DEFAULT_RAND_SEED;
-  private final int maxConn;
+  private final int M; // max number of connections on upper layers
+  private final int maxConn0; // max number of connections on the 0th (last) layer
Review Comment: I'd agree with this.
[GitHub] [lucene] mocobeta commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
mocobeta commented on code in PR #872: URL: https://github.com/apache/lucene/pull/872#discussion_r870035060
## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ##
@@ -31,6 +31,7 @@ public final class OnHeapHnswGraph extends HnswGraph {
   private final int maxConn;
+  private final int maxConn0;
Review Comment: I guess we could have just `M` here too?
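Storing only `M`, as suggested in this review comment, works as long as the level-0 limit is always derived from it (the PR title says the last layer uses 2*maxConn). A hypothetical sketch of that derivation plus a simplified pruning step that keeps the highest-scoring candidates; real HNSW construction uses a diversity heuristic when selecting neighbors, which this sketch does not implement.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: keep only M and derive the level-0 limit from it. */
final class LevelConnectionLimits {
  static final class Neighbor {
    final int node;
    final float score;

    Neighbor(int node, float score) {
      this.node = node;
      this.score = score;
    }
  }

  private final int M;

  LevelConnectionLimits(int M) {
    if (M <= 0) {
      throw new IllegalArgumentException("M must be positive: " + M);
    }
    this.M = M;
  }

  /** Level 0 holds every node and allows 2 * M connections; upper levels allow M. */
  int maxConn(int level) {
    return level == 0 ? 2 * M : M;
  }

  /**
   * Keeps at most maxConn(level) candidates, highest score first. This is a simplification:
   * HNSW's neighbor selection also considers diversity, which is not modeled here.
   */
  List<Neighbor> prune(List<Neighbor> candidates, int level) {
    List<Neighbor> sorted = new ArrayList<>(candidates);
    sorted.sort(Comparator.comparingDouble((Neighbor n) -> n.score).reversed());
    return new ArrayList<>(sorted.subList(0, Math.min(sorted.size(), maxConn(level))));
  }
}
```

Deriving the limit in one place avoids keeping two fields (`maxConn` and `maxConn0`) that must stay consistent.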
[GitHub] [lucene] jpountz merged pull request #780: LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction
jpountz merged PR #780: URL: https://github.com/apache/lucene/pull/780
[jira] [Commented] (LUCENE-10496) avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction
[ https://issues.apache.org/jira/browse/LUCENE-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534783#comment-17534783 ] ASF subversion and git services commented on LUCENE-10496: -- Commit e49708e01da38c2f3d8ef8ac7e7c9198e26bf867 in lucene's branch refs/heads/main from xiaoping [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=e49708e01da ] LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction (#780) > avoid unnecessary attempts to evaluate skipping doc if index sort and search > sort are in opposite direction > --- > > Key: LUCENE-10496 > URL: https://issues.apache.org/jira/browse/LUCENE-10496 > Project: Lucene - Core > Issue Type: New Feature >Reporter: jianping weng >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > Users often index docs with an index sort in one direction (asc or desc) but need to search for top docs in both directions (asc and desc). If the index sort and the search sort are in opposite directions, *NumericLeafComparator* doesn't need to check whether it can skip non-competitive docs inside a segment, because the remaining docs are all competitive.
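The reasoning in the issue can be checked with a small simulation: when docs are visited in doc-ID order and the index is sorted ascending, a descending top-k search finds that every later doc beats the current bottom entry, so checking for skippable docs is wasted work. The class below is a hypothetical illustration, not Lucene code; ties are treated as non-competitive.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical simulation of a top-k numeric sort; not Lucene code. */
final class OppositeSortSketch {
  /**
   * Visits values in doc-ID order and counts how many docs, after the top-k queue is full,
   * still beat the queue's bottom entry. Ties are treated as non-competitive.
   */
  static int competitiveAfterFull(long[] valuesInDocOrder, int k, boolean descending) {
    if (k <= 0) {
      throw new IllegalArgumentException("k must be positive: " + k);
    }
    // For a descending sort the bottom is the smallest kept value (min-heap), otherwise the largest.
    PriorityQueue<Long> queue;
    if (descending) {
      queue = new PriorityQueue<>();
    } else {
      queue = new PriorityQueue<>(Comparator.reverseOrder());
    }
    int competitive = 0;
    for (long value : valuesInDocOrder) {
      if (queue.size() < k) {
        queue.add(value);
        continue;
      }
      long bottom = queue.peek();
      boolean beatsBottom = descending ? value > bottom : value < bottom;
      if (beatsBottom) {
        competitive++;
        queue.poll();
        queue.add(value);
      }
    }
    return competitive;
  }
}
```

For 100 docs sorted ascending and k = 10, a descending search finds all 90 remaining docs competitive, while an ascending search finds none, which is the case where skipping pays off.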
[jira] [Resolved] (LUCENE-10496) avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction
[ https://issues.apache.org/jira/browse/LUCENE-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-10496. --- Fix Version/s: 9.2 Resolution: Fixed
[jira] [Commented] (LUCENE-10496) avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction
[ https://issues.apache.org/jira/browse/LUCENE-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534787#comment-17534787 ] ASF subversion and git services commented on LUCENE-10496: -- Commit 6a973cfa269b42f4a77f41a70bdab387bfa37bf9 in lucene's branch refs/heads/branch_9x from xiaoping [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=6a973cfa269 ] LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction (#780)
[jira] [Commented] (LUCENE-10496) avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction
[ https://issues.apache.org/jira/browse/LUCENE-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534786#comment-17534786 ] ASF subversion and git services commented on LUCENE-10496: -- Commit 54595611aefb513f3f47a48a38caf70a4dddc701 in lucene's branch refs/heads/main from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=54595611aef ] LUCENE-10496: CHANGES entry.
[jira] [Commented] (LUCENE-10496) avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction
[ https://issues.apache.org/jira/browse/LUCENE-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534788#comment-17534788 ] ASF subversion and git services commented on LUCENE-10496: -- Commit 3b36d85966ebb399d2130cb66cb8b40a72440f85 in lucene's branch refs/heads/branch_9x from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=3b36d85966e ] LUCENE-10496: CHANGES entry.
[GitHub] [lucene] jpountz commented on pull request #876: LUCENE-9356: Change test to detect mismatched checksums instead of byte flips.
jpountz commented on PR #876: URL: https://github.com/apache/lucene/pull/876#issuecomment-1123706148 I removed the dependency on LineFileDocs. Interestingly, this test caught an issue with vectors, which don't close index inputs on all paths. cc @mayya-sharipova since there are in-progress PRs that make changes to vector formats
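One way to read this change is that the test should assert that corruption is caught through a checksum mismatch, rather than depending on exactly which bytes were flipped. Lucene files end with a footer carrying a CRC32 checksum; the sketch below mimics that idea with `java.util.zip.CRC32` and a made-up 8-byte trailer. The trailer layout and `ChecksumSketch` are assumptions for illustration, not Lucene's footer format.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

/** Hypothetical sketch of checksum-based corruption detection; not Lucene's CodecUtil. */
final class ChecksumSketch {
  static long crc32(byte[] data, int off, int len) {
    CRC32 crc = new CRC32();
    crc.update(data, off, len);
    return crc.getValue();
  }

  /** Appends an 8-byte big-endian CRC32 of the body, loosely like a file footer. */
  static byte[] withChecksum(byte[] body) {
    byte[] out = Arrays.copyOf(body, body.length + Long.BYTES);
    long checksum = crc32(body, 0, body.length);
    for (int i = 0; i < Long.BYTES; i++) {
      out[body.length + i] = (byte) (checksum >>> (8 * (Long.BYTES - 1 - i)));
    }
    return out;
  }

  /** Recomputes the checksum of the body and compares it with the stored trailer. */
  static boolean checksumMatches(byte[] file) {
    if (file.length < Long.BYTES) {
      return false;
    }
    int bodyLength = file.length - Long.BYTES;
    long stored = 0;
    for (int i = 0; i < Long.BYTES; i++) {
      stored = (stored << 8) | (file[bodyLength + i] & 0xFFL);
    }
    return stored == crc32(file, 0, bodyLength);
  }
}
```

CRC32 detects every single-bit flip, so a test built on this check only needs to assert that verification fails, whatever byte was corrupted.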
[GitHub] [lucene] jpountz commented on pull request #873: LUCENE-10397: KnnVectorQuery doesn't tie break by doc ID
jpountz commented on PR #873: URL: https://github.com/apache/lucene/pull/873#issuecomment-1123721194 Is it possible to somehow encode longs differently in the reverse case, so that we don't have to customize the comparison function?
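A hypothetical sketch of the encoding idea raised here: pack the score and doc ID into one long so that plain `Long.compare` yields the desired order in both directions, by flipping the score bits in the reverse case instead of customizing the comparator. The float-to-sortable-int step uses the same bit manipulation as Lucene's `NumericUtils.floatToSortableInt`, but `ScoreDocEncoding` and its methods are made up for this example, and doc IDs are assumed non-negative.

```java
/**
 * Hypothetical sketch (not Lucene's NeighborQueue): a larger long means a better entry, in both
 * the normal (higher score is better) and the reversed (lower score is better) case. Ties on
 * score prefer the smaller doc ID. Doc IDs are assumed to be non-negative.
 */
final class ScoreDocEncoding {
  /** Order-preserving float-to-int mapping (NaN aside); the mapping is its own inverse. */
  static int sortableInt(float score) {
    int bits = Float.floatToIntBits(score);
    return bits ^ ((bits >> 31) & 0x7fffffff);
  }

  static long encode(float score, int doc, boolean reversed) {
    // ~x reverses signed int order, so flipping the score bits turns "lower is better"
    // into "larger long is better" without a custom comparator.
    int high = reversed ? ~sortableInt(score) : sortableInt(score);
    // ~doc maps a smaller doc ID to a larger unsigned low word, so ties favor smaller docs.
    long low = (~doc) & 0xFFFFFFFFL;
    return ((long) high << 32) | low;
  }

  static int decodeDoc(long encoded) {
    return ~(int) encoded;
  }

  static float decodeScore(long encoded, boolean reversed) {
    int high = (int) (encoded >> 32);
    int sortable = reversed ? ~high : high;
    return Float.intBitsToFloat(sortable ^ ((sortable >> 31) & 0x7fffffff));
  }
}
```

The long compares the high word as a signed int and the low word as unsigned, so the score decides first and the doc ID only breaks ties.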
[jira] [Commented] (LUCENE-9409) TestAllFilesDetectTruncation failures
[ https://issues.apache.org/jira/browse/LUCENE-9409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534878#comment-17534878 ] Robert Muir commented on LUCENE-9409: - the test also doesn't account for the case that you might truncate and happen to have CODEC_MAGIC bytes at the right place... > TestAllFilesDetectTruncation failures > - > > Key: LUCENE-9409 > URL: https://issues.apache.org/jira/browse/LUCENE-9409 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > > The Elastic CI found a seed that reproducibly fails > TestAllFilesDetectTruncation. > https://elasticsearch-ci.elastic.co/job/apache+lucene-solr+nightly+branch_8x/85/console > This is a consequence of LUCENE-9396: we now check for truncation after > creating slices, so in some cases you would get an IndexOutOfBoundsException > rather than CorruptIndexException/EOFException if out-of-bounds slices get > created.
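The issue description says truncation is now checked after slices are created, so an out-of-bounds slice can surface as an `IndexOutOfBoundsException` instead of `CorruptIndexException`/`EOFException`. A minimal sketch of validating bounds before slicing, using a plain `ByteBuffer` as a stand-in for an index input; `SliceBoundsSketch` and its method are hypothetical, not Lucene's `IndexInput` API.

```java
import java.io.EOFException;
import java.nio.ByteBuffer;

/** Hypothetical sketch; a ByteBuffer stands in for an index input. Not Lucene's API. */
final class SliceBoundsSketch {
  /**
   * Validates the requested range before slicing, so a truncated file surfaces as an
   * EOFException instead of an IndexOutOfBoundsException thrown by the slicing code.
   */
  static ByteBuffer slice(ByteBuffer file, long offset, long length) throws EOFException {
    if (offset < 0 || length < 0 || offset + length > file.limit()) {
      throw new EOFException(
          "slice(offset=" + offset + ", length=" + length
              + ") is out of bounds for a file of length " + file.limit());
    }
    ByteBuffer view = file.duplicate();
    view.position((int) offset);
    view.limit((int) (offset + length));
    return view.slice();
  }
}
```

Checking the range up front keeps the failure mode consistent for truncated files, which is what a truncation-detection test relies on.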
[GitHub] [lucene] mikemccand closed pull request #861: LUCENE-10551: switch to PUAFIF
mikemccand closed pull request #861: LUCENE-10551: switch to PUAFIF URL: https://github.com/apache/lucene/pull/861
[GitHub] [lucene] mayya-sharipova commented on pull request #877: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
mayya-sharipova commented on PR #877: URL: https://github.com/apache/lucene/pull/877#issuecomment-1123806812 @LuXugang Thanks for opening this PR. Is this a copy of https://github.com/apache/lucene/tree/vectors-disi-direct? I thought we could just open a PR from this branch against the `main` branch, like [this](https://github.com/apache/lucene/compare/vectors-disi-direct?expand=1). Since we have already approved everything in the `vectors-disi-direct` branch, it would be quite easy for us to approve this new PR. WDYT?
[GitHub] [lucene] LuXugang opened a new pull request, #880: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
LuXugang opened a new pull request, #880: URL: https://github.com/apache/lucene/pull/880 follow up of https://github.com/apache/lucene/pull/792 and https://github.com/apache/lucene/pull/870
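The PR title mentions using DirectMonotonicWriter/Reader for the ordToDoc mapping. Because the doc IDs of docs that have a vector, listed in ord order, form an increasing sequence, they compress well by storing a straight line plus small per-entry corrections. A simplified, single-block sketch of that general idea; it is not Lucene's on-disk format, it keeps the corrections in a plain `long[]` instead of bit-packing them, and `MonotonicOrdToDoc` is hypothetical.

```java
/**
 * Hypothetical single-block sketch of monotonic encoding for an ord-to-doc mapping. Stores a
 * line (first value + average increment) and a small non-negative correction per ord. Not
 * Lucene's DirectMonotonicWriter format; real encoders bit-pack the corrections.
 */
final class MonotonicOrdToDoc {
  private final long first;
  private final float avgInc;
  private final long minDelta;
  private final long[] corrections; // a real encoder would pack these with bitsPerValue bits each
  final int bitsPerValue;

  MonotonicOrdToDoc(int[] docsInOrdOrder) {
    int n = docsInOrdOrder.length;
    first = n == 0 ? 0 : docsInOrdOrder[0];
    avgInc = n > 1 ? (float) (docsInOrdOrder[n - 1] - docsInOrdOrder[0]) / (n - 1) : 0f;
    long[] deltas = new long[n];
    long min = Long.MAX_VALUE;
    for (int ord = 0; ord < n; ord++) {
      deltas[ord] = docsInOrdOrder[ord] - expected(ord);
      min = Math.min(min, deltas[ord]);
    }
    minDelta = n == 0 ? 0 : min;
    corrections = new long[n];
    long max = 0;
    for (int ord = 0; ord < n; ord++) {
      corrections[ord] = deltas[ord] - minDelta; // shift so every correction is >= 0
      max = Math.max(max, corrections[ord]);
    }
    bitsPerValue = 64 - Long.numberOfLeadingZeros(max);
  }

  /** Value predicted by the line for this ord; must be computed identically when decoding. */
  private long expected(int ord) {
    return first + (long) (avgInc * ord);
  }

  int ordToDoc(int ord) {
    return (int) (expected(ord) + minDelta + corrections[ord]);
  }
}
```

For doc IDs {3, 7, 8, 15, 20, 21, 30}, the corrections fit in 3 bits each, versus the 5 bits needed for the largest raw doc ID.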
[GitHub] [lucene] LuXugang closed pull request #877: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
LuXugang closed pull request #877: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc URL: https://github.com/apache/lucene/pull/877
[GitHub] [lucene] LuXugang commented on pull request #877: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
LuXugang commented on PR #877: URL: https://github.com/apache/lucene/pull/877#issuecomment-1123826295 Thanks @mayya-sharipova, I just learned this new git operation; see https://github.com/apache/lucene/pull/880. This PR will be closed.
[jira] [Commented] (LUCENE-10551) LowercaseAsciiCompression should return false when it's unable to compress
[ https://issues.apache.org/jira/browse/LUCENE-10551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534927#comment-17534927 ] Uwe Schindler commented on LUCENE-10551: I think you should also open a bug report in GraalVM. > LowercaseAsciiCompression should return false when it's unable to compress > -- > > Key: LUCENE-10551 > URL: https://issues.apache.org/jira/browse/LUCENE-10551 > Project: Lucene - Core > Issue Type: Bug > Environment: Lucene version 8.11.1 >Reporter: Peixin Li >Priority: Major > Attachments: LUCENE-10551-test.patch > > Time Spent: 2h 40m > Remaining Estimate: 0h > > {code:java} > Failed to commit.. > java.lang.IllegalStateException: 10 <> 5 > cion1cion_desarrollociones_oraclecionesnaturacionesnatura2tedppsa-integrationdemotiontion > cloud gen2tion instance - dev1tion instance - > testtion-devbtion-instancetion-prdtion-promerication-qation064533tion535217tion697401tion761348tion892818tion_matrationcauto_simmonsintgic_testtioncloudprodictioncloudservicetiongateway10tioninstance-jtsundatamartprd??o > at > org.apache.lucene.util.compress.LowercaseAsciiCompression.compress(LowercaseAsciiCompression.java:115) > at > org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlock(BlockTreeTermsWriter.java:834) > at > org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:628) > at > org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.pushTerm(BlockTreeTermsWriter.java:947) > at > org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:912) > at > org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318) > at > org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:170) > at > org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:120) > at > 
org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:267) > at > org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:350) > at > org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:476) > at > org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:656) > at > org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3364) > at > org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3770) > at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3728) > {code} > {code:java} > key=och-live--WorkResource.renewAssignmentToken.ResourceTime[namespace=workflow, > resourceGroup=workflow-service-overlay]{availabilityDomain=iad-ad-1, > domainId=och-live, host=workflow-service-overlay-01341.node.ad1.us-ashburn-1}) > java.lang.IllegalStateException: 29 <> 16 > analytics-platform-test/koala/cluster-tool:1.0-20220310151438.492,mesh_istio_examples-bookinfo-details-v1:1.16.2mesh_istio_examples-bookinfo-reviews-v3:1.16.2oce-clamav:1.0.219oce-tesseract:1.0.7oce-traefik:2.5.1oci-opensearch:1.2.4.8.103oda-digital-assistant-control-plane-train-pool-workflow-v6:22.02.14oke-coresvcs-k8s-dns-dnsmasq-nanny-amd64@sha256:41aa9160ceeaf712369ddb660d02e5ec06d1679965e6930351967c8cf5ed62d4oke-coresvcs-k8s-dns-kube-dns-amd64@sha256:2cf34b04106974952996c6ef1313f165ce65b4ad68a3051f51b1b8f91ba5f838oke-coresvcs-k8s-dns-sidecar-amd64@sha256:8a82c7288725cb4de9c7cd8d5a78279208e379f35751539b406077f9a3163dcdoke-coresvcs-node-problem-detector@sha256:9d54df11804a862c54276648702a45a6a0027a9d930a86becd69c34cc84bf510oke-coresvcs-oke-fluentd-lumberjack@sha256:5f3f10b187eb804ce4e84bc3672de1cf318c0f793f00dac01cd7da8beea8f269oke-etcd-operator@sha256:4353a2e5ef02bb0f6b046a8d6219b1af359a2c1141c358ff110e395f29d0bfc8oke-oke-hyperkube-amd64@sha256:3c734f46099400507f938090eb9a874338fa25cde425ac9409df4c885759752foke-public-busybox@sha256:4cee1979ba0bf7db9fc5d28fb7b798ca69ae95a47c5fecf4632772
0df4ff352doke-public-coredns@sha256:86f8cfc74497f04e181ab2e1d26d2fd8bd46c4b33ce24b55620efcdfcb214670oke-public-coredns@sha256:8cd974302f1f6108f6f31312f8181ae723b514e2022089cdcc3db10666c49228oke-public-etcd@sha256:b751e459bc2a8f079f6730dd8462671b253c7c8b0d0eb47c67888d5091c6bb77oke-public-etcd@sha256:d6a76200a6e9103681bc2cf7fefbcada0dd9372d52cf8964178d846b89959d14oke-public-etcd@sha256:fa056479342b45479ac74c58176ddad43687d5fc295375d705808f9dfb48439aoke-public-kube-proxy@sha256:93b2da69d03413671606e22294c59a69fe404088a5f6e74d6394a8641fdb899boke-public-tiller@sha256:c2eb6e580123622e1bc0ff3becae3a3a71ac36c98a2786d780590197839175e5osms/opcbuild-osms-agent-proxy-java:
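LUCENE-10551 asks that `LowercaseAsciiCompression` report that it cannot compress by returning false, instead of failing with an `IllegalStateException` like the ones in the traces above. The hypothetical compressor below only illustrates that contract. It packs bytes in the range 0x40..0x7F (which includes lowercase ASCII letters) into 6-bit codes and returns false when the input contains other bytes or would not get smaller. It is not Lucene's algorithm.

```java
/**
 * Hypothetical compressor illustrating a "return false instead of throwing" contract.
 * Packs bytes from 0x40..0x7F as 6-bit codes, four input bytes per three output bytes.
 * This is NOT Lucene's LowercaseAsciiCompression algorithm.
 */
final class SixBitCompressionSketch {
  static int compressedLength(int len) {
    return 3 * ((len + 3) / 4);
  }

  /**
   * Compresses in[0..len) into out, which must hold compressedLength(len) bytes. Returns false,
   * leaving out unspecified, when the input cannot be represented or would not get smaller.
   */
  static boolean compress(byte[] in, int len, byte[] out) {
    if (compressedLength(len) >= len) {
      return false; // no space saving for very short inputs
    }
    for (int i = 0; i < len; i++) {
      if ((in[i] & 0xC0) != 0x40) {
        return false; // byte outside 0x40..0x7F has no 6-bit code in this scheme
      }
    }
    int o = 0;
    for (int i = 0; i < len; i += 4) {
      int v = 0;
      for (int j = 0; j < 4; j++) {
        int code = i + j < len ? in[i + j] & 0x3F : 0; // pad the last group with zeros
        v = (v << 6) | code;
      }
      out[o++] = (byte) (v >>> 16);
      out[o++] = (byte) (v >>> 8);
      out[o++] = (byte) v;
    }
    return true;
  }

  static byte[] decompress(byte[] in, int len) {
    byte[] out = new byte[len];
    for (int i = 0, o = 0; i < len; i += 4, o += 3) {
      int v = ((in[o] & 0xFF) << 16) | ((in[o + 1] & 0xFF) << 8) | (in[o + 2] & 0xFF);
      for (int j = 0; j < 4 && i + j < len; j++) {
        out[i + j] = (byte) (0x40 | ((v >>> (18 - 6 * j)) & 0x3F));
      }
    }
    return out;
  }
}
```

A caller can then fall back to storing the bytes uncompressed whenever `compress` returns false, instead of aborting the flush.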
[jira] [Created] (LUCENE-10567) Can LongDistanceFeatureQuery benefit from better sampling technique to evaluate iterator for competitive hits
Mayya Sharipova created LUCENE-10567: Summary: Can LongDistanceFeatureQuery benefit from better sampling technique to evaluate iterator for competitive hits Key: LUCENE-10567 URL: https://issues.apache.org/jira/browse/LUCENE-10567 Project: Lucene - Core Issue Type: Improvement Reporter: Mayya Sharipova LUCENE-10496 introduced an improvement in the sampling technique used to evaluate whether we can iterate over a subset of points instead of doc values when sorting. The original code was inspired by how [LongDistanceFeatureQuery|https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/document/LongDistanceFeatureQuery.java#L361] computes competitive hits. We should investigate whether the same improvement in sampling technique can benefit LongDistanceFeatureQuery as well.
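As a rough illustration of how a distance-feature query can derive competitive hits: with the documented score `weight * pivotDistance / (pivotDistance + distance)`, a minimum competitive score bounds the distance, which in turn bounds the range of values worth visiting. The sketch also includes the "reduce the number of docs at least 8x" rule from the NumericComparator diff earlier in this thread. All names are hypothetical; this is not the code behind the linked line.

```java
/** Hypothetical sketch; names and structure are illustrative, not Lucene's implementation. */
final class DistanceFeatureSketch {
  /** score = boost * pivotDistance / (pivotDistance + distance); assumes no overflow. */
  static double score(float boost, long pivotDistance, long origin, long value) {
    long distance = Math.abs(value - origin);
    return boost * pivotDistance / (double) (pivotDistance + distance);
  }

  /** Largest distance whose score can still reach minScore (the formula solved for distance). */
  static double maxCompetitiveDistance(float boost, long pivotDistance, float minScore) {
    if (minScore <= 0) {
      return Double.POSITIVE_INFINITY; // every doc is competitive
    }
    return pivotDistance * ((double) boost / minScore - 1);
  }

  /** Competitive value range [min, max] around origin, saturating at the long bounds. */
  static long[] competitiveRange(long origin, double maxDistance) {
    // Clamped at 0: a superset of the competitive docs is always safe to visit.
    long d = (long) Math.max(0, Math.floor(maxDistance));
    long min = origin < Long.MIN_VALUE + d ? Long.MIN_VALUE : origin - d;
    long max = origin > Long.MAX_VALUE - d ? Long.MAX_VALUE : origin + d;
    return new long[] {min, max};
  }

  /** Only switch to the narrower iterator if it cuts the candidate docs by at least 8x. */
  static boolean worthUpdating(long estimatedMatches, long currentCost) {
    return estimatedMatches < currentCost / 8;
  }
}
```

Any sampling improvement, such as the back-off discussed for LUCENE-10496, would govern how often `competitiveRange` and `worthUpdating` are re-evaluated, not the range computation itself.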
[GitHub] [lucene] LuXugang commented on pull request #880: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
LuXugang commented on PR #880: URL: https://github.com/apache/lucene/pull/880#issuecomment-1123836523 Hi @mayya-sharipova, this merge process is new to me and I am not sure I can add an entry to CHANGES.txt correctly. Could I add the entry after this PR is merged, or would you be kind enough to help me out?
[GitHub] [lucene] mayya-sharipova commented on pull request #880: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
mayya-sharipova commented on PR #880: URL: https://github.com/apache/lucene/pull/880#issuecomment-1123863575 @LuXugang Right, you can add an entry to CHANGES.txt after this PR is merged.
[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
mayya-sharipova commented on code in PR #872: URL: https://github.com/apache/lucene/pull/872#discussion_r870463506 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -68,42 +69,43 @@ public final class HnswGraphBuilder { * * @param vectors the vectors whose relations are represented by the graph - must provide a * different view over those vectors than the one used to add via addGraphNode. - * @param maxConn the number of connections to make when adding a new graph node; roughly speaking - * the graph fanout. + * @param M the number of connections to make when adding a new graph node; roughly speaking the Review Comment: Addressed in d16168f77cbeae581a093daa53113bf897ab8e31
[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
mayya-sharipova commented on code in PR #872: URL: https://github.com/apache/lucene/pull/872#discussion_r870464398 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -43,7 +43,8 @@ public final class HnswGraphBuilder { /** Random seed for level generation; public to expose for testing * */ public static long randSeed = DEFAULT_RAND_SEED; - private final int maxConn; + private final int M; // max number of connections on upper layers + private final int maxConn0; // max number of connections on the 0th (last) layer Review Comment: Addressed in d16168f77cbeae581a093daa53113bf897ab8e31
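The field split above separates two limits: `M` for the upper layers and `maxConn0` for layer 0. Going by the PR title ("Use 2*maxConn for last layer in HNSW"), the intent is `maxConn0 = 2 * M`, which matches the M_max0 = 2·M setting recommended in the original HNSW paper. A minimal sketch of a per-level lookup follows; the class and method names are hypothetical and not taken from `HnswGraphBuilder`.

```java
/** Hypothetical helper: neighbor-list capacity per HNSW level, with layer 0 getting 2 * M. */
final class HnswConnLimits {
  private final int m;        // max connections on the upper layers
  private final int maxConn0; // max connections on layer 0, which holds every node

  HnswConnLimits(int m) {
    if (m <= 0) {
      throw new IllegalArgumentException("M must be positive, got " + m);
    }
    this.m = m;
    this.maxConn0 = 2 * m;
  }

  /** Maximum number of neighbors a node may keep on the given level. */
  int maxConnOnLevel(int level) {
    return level == 0 ? maxConn0 : m;
  }
}
```

For example, `new HnswConnLimits(16)` allows 32 neighbors per node on level 0 and 16 on every level above it.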
[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
mayya-sharipova commented on code in PR #872: URL: https://github.com/apache/lucene/pull/872#discussion_r870464781 ## lucene/core/src/test/org/apache/lucene/util/hnsw/TestHnswGraph.java: ## @@ -256,10 +256,11 @@ public void testSearchWithSelectiveAcceptOrds() throws IOException { public void testSearchWithSkewedAcceptOrds() throws IOException { int nDoc = 1000; +int maxConn = 16; Review Comment: Good suggestion, I've decided to pull out this parameter. Addressed in d16168f77cbeae581a093daa53113bf897ab8e31
[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
mayya-sharipova commented on code in PR #872: URL: https://github.com/apache/lucene/pull/872#discussion_r870464973 ## lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java: ## @@ -377,10 +377,13 @@ private static class FieldEntry { for (int level = 0; level < numLevels; level++) { if (level == 0) { graphOffsetsByLevel[level] = 0; +} else if (level == 1) { + int numNodesOn0Level = size; Review Comment: Good suggestion! Addressed in d16168f77cbeae581a093daa53113bf897ab8e31
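The `level == 1` branch in `FieldEntry` appears to be needed because level 0 now uses a different connection limit from the levels above it, so its nodes occupy differently sized slots. Below is a hypothetical sketch of that offset arithmetic. It assumes, without taking this from the `Lucene91` format, that each node's neighbor list on a level is stored in a fixed slot of one count plus `maxConn` neighbor IDs (4-byte ints each), that level 0 contains all nodes, and that `maxConn0 = 2 * M`.

```java
/** Hypothetical offset computation; the slot layout is an assumption, not Lucene's format. */
final class GraphOffsetsSketch {
  /**
   * @param nodesPerLevel number of nodes on each level; nodesPerLevel[0] is the total size
   * @param m max connections on upper levels (level 0 uses 2 * m)
   * @return byte offset at which each level's neighbor data starts
   */
  static long[] graphOffsetsByLevel(int[] nodesPerLevel, int m) {
    int maxConn0 = 2 * m;
    long[] offsets = new long[nodesPerLevel.length];
    for (int level = 0; level < nodesPerLevel.length; level++) {
      if (level == 0) {
        offsets[level] = 0;
      } else {
        // The previous level's slot size depends on its own connection limit:
        // only the step from level 0 to level 1 uses maxConn0.
        int prevMaxConn = (level == 1) ? maxConn0 : m;
        long prevSlotBytes = (1L + prevMaxConn) * Integer.BYTES; // count + neighbor IDs
        offsets[level] = offsets[level - 1] + nodesPerLevel[level - 1] * prevSlotBytes;
      }
    }
    return offsets;
  }
}
```

With 10 nodes on level 0, 3 on level 1, 1 on level 2 and `m = 16`, this gives offsets of 0, 1320 (10 slots of 33 ints) and 1524 (plus 3 slots of 17 ints).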
[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
mayya-sharipova commented on code in PR #872: URL: https://github.com/apache/lucene/pull/872#discussion_r870466493 ## lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java: ## @@ -31,6 +31,7 @@ public final class OnHeapHnswGraph extends HnswGraph { private final int maxConn; + private final int maxConn0; Review Comment: Addressed in d16168f77cbeae581a093daa53113bf897ab8e31
[GitHub] [lucene] mayya-sharipova commented on pull request #872: LUCENE-10527 Use 2*maxConn for last layer in HNSW
mayya-sharipova commented on PR #872: URL: https://github.com/apache/lucene/pull/872#issuecomment-1123937553 @jtibshirani @mocobeta Thanks for your review. I've addressed your latest comments in d16168f77cbeae581a093daa53113bf897ab8e31. The plan for merging this PR is as follows: once [other changes](https://github.com/apache/lucene/pull/880) to the vector format are merged to `main`, I will update this PR on top of `main` so that all these changes go to the `Lucene92` vector format files instead of the current `Lucene91` vector format files.
[GitHub] [lucene] mayya-sharipova merged pull request #880: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
mayya-sharipova merged PR #880: URL: https://github.com/apache/lucene/pull/880
[jira] [Commented] (LUCENE-10502) Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
[ https://issues.apache.org/jira/browse/LUCENE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535028#comment-17535028 ] ASF subversion and git services commented on LUCENE-10502: -- Commit 6040d1648f6e30107086c1f9a159c1979498fb4e in lucene's branch refs/heads/main from Lu Xugang [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=6040d1648f6 ] LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc (#880) Currently, in sparse cases, all docs of all vector fields are fully loaded into memory. This happens not only during vector search, but also when an index is opened and vector readers load their meta info. This patch instead uses IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle the ordToDoc mapping. The benefits are reduced memory usage and faster loading of meta info for vector readers. > Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle > ordToDoc > > > Key: LUCENE-10502 > URL: https://issues.apache.org/jira/browse/LUCENE-10502 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 9.1 >Reporter: Lu Xugang >Priority: Major > Time Spent: 13h 50m > Remaining Estimate: 0h > > Since all docs of all vector fields are fully loaded > into memory at search time, could we use IndexedDISI to store docIds and > DirectMonotonicWriter/Reader to handle the ordToDoc mapping?
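To see why the ordToDoc mapping suits a monotonic encoding: vector ordinals are dense (0 to size-1), and, assuming they are assigned in increasing doc ID order, the ord-to-doc sequence is strictly increasing, so the gaps between consecutive values stay small. The plain-array sketch below uses hypothetical names. It is neither `IndexedDISI` nor `DirectMonotonicWriter`/`Reader`, which, per the commit message above, take over this job in a more compact form.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration (not Lucene's API): in a sparse vector field only some docs
 * have a vector. The array index is the vector ordinal and the value is the doc ID.
 */
final class OrdToDocSketch {
  private final int[] docsWithVector; // strictly increasing doc IDs

  OrdToDocSketch(int[] sortedDocIds) {
    for (int i = 1; i < sortedDocIds.length; i++) {
      if (sortedDocIds[i] <= sortedDocIds[i - 1]) {
        throw new IllegalArgumentException("doc IDs must be strictly increasing");
      }
    }
    this.docsWithVector = sortedDocIds.clone();
  }

  /** Ordinal to doc ID: a direct array lookup. */
  int ordToDoc(int ord) {
    return docsWithVector[ord];
  }

  /** Doc ID to ordinal via binary search, or -1 if the doc has no vector. */
  int docToOrd(int doc) {
    int idx = Arrays.binarySearch(docsWithVector, doc);
    return idx >= 0 ? idx : -1;
  }

  /** Gaps between consecutive doc IDs: small, positive values that compress well. */
  int[] deltas() {
    int[] out = new int[docsWithVector.length];
    int prev = 0;
    for (int i = 0; i < docsWithVector.length; i++) {
      out[i] = docsWithVector[i] - prev;
      prev = docsWithVector[i];
    }
    return out;
  }
}
```

For docs `{3, 7, 8, 20}`, ordinal 2 maps to doc 8, doc 20 maps to ordinal 3, doc 5 has no vector, and the deltas are `[3, 4, 1, 12]`.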
[GitHub] [lucene] mayya-sharipova commented on pull request #880: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
mayya-sharipova commented on PR #880: URL: https://github.com/apache/lucene/pull/880#issuecomment-1124044401 @LuXugang I've merged your PR to `main`. Other things we need to do: 1) Please create a PR against `main` to add an entry to CHANGES.txt under the `Lucene 9.2.0` release. 2) After 1) is merged, we need to cherry-pick all these changes to the 9.x branch.
[GitHub] [lucene] LuXugang opened a new pull request, #881: LUCENE-10502: add entry
LuXugang opened a new pull request, #881: URL: https://github.com/apache/lucene/pull/881 Add entry.
[GitHub] [lucene] LuXugang commented on pull request #880: LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
LuXugang commented on PR #880: URL: https://github.com/apache/lucene/pull/880#issuecomment-1124111322 @mayya-sharipova, @jtibshirani Thanks for your reviews! > Please create a PR against main to add an entry to CHANGES.txt under the Lucene 9.2.0 release. Addressed in https://github.com/apache/lucene/pull/881.
[GitHub] [lucene] mayya-sharipova commented on a diff in pull request #881: LUCENE-10502: add entry
mayya-sharipova commented on code in PR #881: URL: https://github.com/apache/lucene/pull/881#discussion_r870677297 ## lucene/CHANGES.txt: ## @@ -149,6 +149,9 @@ Optimizations when the search order and the index order are in opposite directions. (Jianping Weng) +* LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle + ordToDoc in HNSW vectors (Mayya Sharipova, Julie Tibshirani, Lu Xugang) Review Comment: You don't need to add our names (mine and Julie's, as we were reviewers), and in any case please put your name first as the main contributor.
[jira] [Commented] (LUCENE-10551) LowercaseAsciiCompression should return false when it's unable to compress
[ https://issues.apache.org/jira/browse/LUCENE-10551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535093#comment-17535093 ] Michael McCandless commented on LUCENE-10551: - +1 to get to the bottom of the GraalVM mis-compilation. And also +1 if we can find a simple code change that is low-risk, has little performance impact for other JVM users, and side-steps this bug. > LowercaseAsciiCompression should return false when it's unable to compress > -- > > Key: LUCENE-10551 > URL: https://issues.apache.org/jira/browse/LUCENE-10551 > Project: Lucene - Core > Issue Type: Bug > Environment: Lucene version 8.11.1 >Reporter: Peixin Li >Priority: Major > Attachments: LUCENE-10551-test.patch > > Time Spent: 2h 40m > Remaining Estimate: 0h > > {code:java} > Failed to commit.. > java.lang.IllegalStateException: 10 <> 5 > cion1cion_desarrollociones_oraclecionesnaturacionesnatura2tedppsa-integrationdemotiontion > cloud gen2tion instance - dev1tion instance - > testtion-devbtion-instancetion-prdtion-promerication-qation064533tion535217tion697401tion761348tion892818tion_matrationcauto_simmonsintgic_testtioncloudprodictioncloudservicetiongateway10tioninstance-jtsundatamartprd??o > at > org.apache.lucene.util.compress.LowercaseAsciiCompression.compress(LowercaseAsciiCompression.java:115) > at > org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlock(BlockTreeTermsWriter.java:834) > at > org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:628) > at > org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.pushTerm(BlockTreeTermsWriter.java:947) > at > org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:912) > at > org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:318) > at >
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:170) > at > org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:120) > at > org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:267) > at > org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:350) > at > org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:476) > at > org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:656) > at > org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3364) > at > org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3770) > at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3728) > {code} > {code:java} > key=och-live--WorkResource.renewAssignmentToken.ResourceTime[namespace=workflow, > resourceGroup=workflow-service-overlay]{availabilityDomain=iad-ad-1, > domainId=och-live, host=workflow-service-overlay-01341.node.ad1.us-ashburn-1}) > java.lang.IllegalStateException: 29 <> 16 > 
analytics-platform-test/koala/cluster-tool:1.0-20220310151438.492,mesh_istio_examples-bookinfo-details-v1:1.16.2mesh_istio_examples-bookinfo-reviews-v3:1.16.2oce-clamav:1.0.219oce-tesseract:1.0.7oce-traefik:2.5.1oci-opensearch:1.2.4.8.103oda-digital-assistant-control-plane-train-pool-workflow-v6:22.02.14oke-coresvcs-k8s-dns-dnsmasq-nanny-amd64@sha256:41aa9160ceeaf712369ddb660d02e5ec06d1679965e6930351967c8cf5ed62d4oke-coresvcs-k8s-dns-kube-dns-amd64@sha256:2cf34b04106974952996c6ef1313f165ce65b4ad68a3051f51b1b8f91ba5f838oke-coresvcs-k8s-dns-sidecar-amd64@sha256:8a82c7288725cb4de9c7cd8d5a78279208e379f35751539b406077f9a3163dcdoke-coresvcs-node-problem-detector@sha256:9d54df11804a862c54276648702a45a6a0027a9d930a86becd69c34cc84bf510oke-coresvcs-oke-fluentd-lumberjack@sha256:5f3f10b187eb804ce4e84bc3672de1cf318c0f793f00dac01cd7da8beea8f269oke-etcd-operator@sha256:4353a2e5ef02bb0f6b046a8d6219b1af359a2c1141c358ff110e395f29d0bfc8oke-oke-hyperkube-amd64@sha256:3c734f46099400507f938090eb9a874338fa25cde425ac9409df4c885759752foke-public-busybox@sha256:4cee1979ba0bf7db9fc5d28fb7b798ca69ae95a47c5fecf46327720df4ff352doke-public-coredns@sha256:86f8cfc74497f04e181ab2e1d26d2fd8bd46c4b33ce24b55620efcdfcb214670oke-public-coredns@sha256:8cd974302f1f6108f6f31312f8181ae723b514e2022089cdcc3db10666c49228oke-public-etcd@sha256:b751e459bc2a8f079f6730dd8462671b253c7c8b0d0eb47c67888d5091c6bb77oke-public-etcd@sha256:d6a76200a6e9103681bc2cf7fefbcada0dd9372d52cf8964178d846b89959d14oke-public-etcd@sha256:fa056479342b45479ac74c58176ddad43687d5fc295375d705808f9dfb48439aoke-public-kube-proxy@sha256:93b2da69d03413671606e22294c59a69fe404088a
[GitHub] [lucene] jtibshirani opened a new pull request, #882: LUCENE-10564: Make sure SparseFixedBitSet#or updates memory usage
jtibshirani opened a new pull request, #882: URL: https://github.com/apache/lucene/pull/882 Before, it didn't update the estimated memory usage, so calls to ramBytesUsed could be totally off.
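The bug class described in this PR is easy to reproduce in miniature: when memory usage is tracked incrementally, every code path that allocates storage has to update the running estimate, including bulk operations like `or`. The toy sparse bit set below is a hypothetical illustration, not Lucene's `SparseFixedBitSet`. It allocates 64-bit words lazily and charges an assumed 8 bytes per word (map overhead is deliberately ignored).

```java
import java.util.HashMap;
import java.util.Map;

/** Toy sparse bit set (hypothetical, NOT Lucene's SparseFixedBitSet). */
final class ToySparseBitSet {
  private static final long BYTES_PER_WORD = Long.BYTES; // assumed cost of one allocated word
  private final Map<Integer, Long> words = new HashMap<>(); // word index -> 64 bits
  private long ramBytesUsed = 0;

  void set(int index) {
    if (index < 0) {
      throw new IndexOutOfBoundsException("negative index: " + index);
    }
    int wordIndex = index >>> 6;
    Long prev = words.get(wordIndex);
    if (prev == null) {
      ramBytesUsed += BYTES_PER_WORD; // a new word is allocated
      prev = 0L;
    }
    words.put(wordIndex, prev | (1L << index)); // Java masks the shift to index & 63
  }

  boolean get(int index) {
    Long w = words.get(index >>> 6);
    return w != null && (w & (1L << index)) != 0;
  }

  /** ORs in all bits of other. Words allocated here must be counted too. */
  void or(ToySparseBitSet other) {
    for (Map.Entry<Integer, Long> e : other.words.entrySet()) {
      Long prev = words.get(e.getKey());
      if (prev == null) {
        ramBytesUsed += BYTES_PER_WORD; // omitting this makes ramBytesUsed() drift low
        prev = 0L;
      }
      words.put(e.getKey(), prev | e.getValue());
    }
  }

  long ramBytesUsed() {
    return ramBytesUsed;
  }
}
```

After `a.set(1)`, `b.set(100)`, `b.set(200)` and `a.or(b)`, set `a` holds three words (indices 0, 1 and 3), so its estimate is 24 bytes rather than the 8 it would report if `or` skipped the accounting.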
[GitHub] [lucene] shahrs87 opened a new pull request, #883: LUCENE-10561 Reduce class/member visibility of all normalizer and stemmer classes
shahrs87 opened a new pull request, #883: URL: https://github.com/apache/lucene/pull/883 # Description Please provide a short description of the changes you're making with this pull request. # Solution Please provide a short description of the approach taken to implement your solution. # Tests Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem. # Checklist Please review the following and check all that apply: - [ ] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability. - [ ] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [ ] I have developed this patch against the `main` branch. - [ ] I have run `./gradlew check`. - [ ] I have added tests for my changes.
[GitHub] [lucene] shahrs87 commented on pull request #883: LUCENE-10561 Reduce class/member visibility of all normalizer and stemmer classes
shahrs87 commented on PR #883: URL: https://github.com/apache/lucene/pull/883#issuecomment-1124385828 ``` apache-lucene % ./gradlew check > Task :lucene:analysis:common:generateClassicTokenizerChecksumCheck FAILED FAILURE: Build failed with an exception. * Where: Script '/Users/rushabh.shah/apache-lucene/gradle/generation/regenerate.gradle' line: 184 * What went wrong: Execution failed for task ':lucene:analysis:common:generateClassicTokenizerChecksumCheck'. > Checksums mismatch for derived resources; you might have modified a generated resource (regenerate task: generateClassicTokenizer): Current: lucene/analysis/common/src/java/org/apache/lucene/analysis/classic/ClassicTokenizerImpl.java=c825a8b8d0d0d893b4914e7161bcd119e7b07b40 Expected: lucene/analysis/common/src/java/org/apache/lucene/analysis/classic/ClassicTokenizerImpl.java=381a9627fd7da6402216e3279cf81a09af222aaf ``` When I run `./gradlew check`, I get the above error. It looks like I need to update a checksum in some file. Is there a Gradle task that does that?
[jira] [Commented] (LUCENE-10561) Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and PersianNormalizer
[ https://issues.apache.org/jira/browse/LUCENE-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535139#comment-17535139 ] Rushabh Shah commented on LUCENE-10561: --- Hi, this is my first PR, so apologies if I ask dumb questions. I created a PR for this change. When I run ./gradlew check, it throws the error below. {noformat} apache-lucene % ./gradlew check > Task :lucene:analysis:common:generateClassicTokenizerChecksumCheck FAILED FAILURE: Build failed with an exception. * Where: Script '/Users/rushabh.shah/apache-lucene/gradle/generation/regenerate.gradle' line: 184 * What went wrong: Execution failed for task ':lucene:analysis:common:generateClassicTokenizerChecksumCheck'. > Checksums mismatch for derived resources; you might have modified a generated > resource (regenerate task: generateClassicTokenizer): Current: lucene/analysis/common/src/java/org/apache/lucene/analysis/classic/ClassicTokenizerImpl.java=c825a8b8d0d0d893b4914e7161bcd119e7b07b40 Expected: lucene/analysis/common/src/java/org/apache/lucene/analysis/classic/ClassicTokenizerImpl.java=381a9627fd7da6402216e3279cf81a09af222aaf {noformat} I understand that I need to update a checksum somewhere, but I don't know whether there is a script that does that or whether we have to compute it manually. Please advise. [~rcmuir] [~tomoko] > Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and > PersianNormalizer > > > Key: LUCENE-10561 > URL: https://issues.apache.org/jira/browse/LUCENE-10561 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Tomoko Uchida >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > This is a spin-off of [LUCENE-10312]. > Constants and methods in those classes are exposed to the outside packages; > we should be able to limit the visibility to {{private}} or, at least to > {{package private}}.
> This change breaks backward compatibility so should be applied to the main > branch (10.0) only, and a MIGRATE entry may be needed. > Also, they seem unchanged since 2008, we could refactor them to embrace newer > Java APIs as we did in [https://github.com/apache/lucene/pull/540].
[GitHub] [lucene] LuXugang commented on a diff in pull request #881: LUCENE-10502: add entry
LuXugang commented on code in PR #881: URL: https://github.com/apache/lucene/pull/881#discussion_r870839712 ## lucene/CHANGES.txt: ## @@ -149,6 +149,9 @@ Optimizations when the search order and the index order are in opposite directions. (Jianping Weng) +* LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle + ordToDoc in HNSW vectors (Mayya Sharipova, Julie Tibshirani, Lu Xugang) Review Comment: Got it.
[GitHub] [lucene] LuXugang commented on a diff in pull request #881: LUCENE-10502: add entry
LuXugang commented on code in PR #881: URL: https://github.com/apache/lucene/pull/881#discussion_r870839898 ## lucene/CHANGES.txt: ## @@ -149,6 +149,9 @@ Optimizations when the search order and the index order are in opposite directions. (Jianping Weng) +* LUCENE-10502: Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle + ordToDoc in HNSW vectors (Mayya Sharipova, Julie Tibshirani, Lu Xugang) Review Comment: @mayya-sharipova Done.
[GitHub] [lucene] mayya-sharipova merged pull request #881: LUCENE-10502: add entry
mayya-sharipova merged PR #881: URL: https://github.com/apache/lucene/pull/881
[jira] [Commented] (LUCENE-10502) Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle ordToDoc
[ https://issues.apache.org/jira/browse/LUCENE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535804#comment-17535804 ] ASF subversion and git services commented on LUCENE-10502: -- Commit a06460a5380234fbb3335b058e86f8fa64e277d4 in lucene's branch refs/heads/main from Lu Xugang [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a06460a5380 ] LUCENE-10502: add changes entry (#881) > Use IndexedDISI to store docIds and DirectMonotonicWriter/Reader to handle > ordToDoc > > > Key: LUCENE-10502 > URL: https://issues.apache.org/jira/browse/LUCENE-10502 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: 9.1 >Reporter: Lu Xugang >Priority: Major > Time Spent: 15h > Remaining Estimate: 0h > > Since all docs of all vector fields are fully loaded > into memory at search time, could we use IndexedDISI to store docIds and > DirectMonotonicWriter/Reader to handle the ordToDoc mapping?
[jira] [Commented] (LUCENE-10561) Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and PersianNormalizer
[ https://issues.apache.org/jira/browse/LUCENE-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535830#comment-17535830 ] Robert Muir commented on LUCENE-10561: -- Some of the tokenizers are auto-generated. For example, ClassicTokenizerImpl is generated by the "jflex" tool from the ClassicTokenizer.jflex source. The build is failing simply because the generated file was edited directly. Personally, I recommend leaving ClassicTokenizerImpl and similar classes out of this issue, because these particular generated tokenizers are very complicated, and the way the code generation works makes it hard to reduce their member/class visibility. > Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and > PersianNormalizer > > > Key: LUCENE-10561 > URL: https://issues.apache.org/jira/browse/LUCENE-10561 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Tomoko Uchida >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > This is a spin-off of [LUCENE-10312]. > Constants and methods in those classes are exposed to the outside packages; > we should be able to limit the visibility to {{private}} or, at least to > {{package private}}. > This change breaks backward compatibility so should be applied to the main > branch (10.0) only, and a MIGRATE entry may be needed. > Also, they seem unchanged since 2008, we could refactor them to embrace newer > Java APIs as we did in [https://github.com/apache/lucene/pull/540].
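The failure in this thread comes from a check that compares each generated file against a checksum recorded when the file was last regenerated. Hand-editing `ClassicTokenizerImpl.java` changes its digest, so the comparison fails. Below is a minimal sketch of that idea in plain Java. It assumes SHA-1, which is an inference from the 40-hex-digit values in the error output, and its class and method names are hypothetical rather than part of Lucene's `regenerate.gradle`.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of a "derived resource" checksum check (SHA-1 assumed). */
final class GeneratedFileCheck {
  /** Lowercase hex SHA-1 of the given bytes. */
  static String sha1Hex(byte[] data) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-1").digest(data);
      StringBuilder sb = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        sb.append(String.format("%02x", b & 0xff));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to provide SHA-1, so this should not happen.
      throw new IllegalStateException(e);
    }
  }

  /** True if the file contents still match the checksum recorded at generation time. */
  static boolean matchesRecorded(byte[] currentContents, String expectedHex) {
    return sha1Hex(currentContents).equals(expectedHex);
  }
}
```

Under this model the fix is to change the `.jflex` source and regenerate, which also re-records the checksum, or, as suggested here, to leave the generated class untouched.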
[GitHub] [lucene] mocobeta commented on pull request #883: LUCENE-10561 Reduce class/member visibility of all normalizer and stemmer classes
mocobeta commented on PR #883: URL: https://github.com/apache/lucene/pull/883#issuecomment-1124499151 Hi @shahrs87. First of all, thanks for the great PR! As for the failed checksum check for `ClassicTokenizerImpl`, please refer to Robert's comment in Jira - I think you can omit `ClassicTokenizerImpl` for now and we can improve it in another issue/PR.
[GitHub] [lucene] shahrs87 commented on pull request #883: LUCENE-10561 Reduce class/member visibility of all normalizer and stemmer classes
shahrs87 commented on PR #883: URL: https://github.com/apache/lucene/pull/883#issuecomment-1124521767 @mocobeta Thank you for the reply. I have removed the classes that are generated via JFlex. I also ran all the workflow tests mentioned in the contributors guide, and they all seem to pass. Can you please review the PR? Thank you.
[jira] [Commented] (LUCENE-10561) Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and PersianNormalizer
[ https://issues.apache.org/jira/browse/LUCENE-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535858#comment-17535858 ] Rushabh Shah commented on LUCENE-10561: --- Thank you [~rcmuir] for the comment. I have updated the PR as per your suggestions. Please review. > Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and > PersianNormalizer > > > Key: LUCENE-10561 > URL: https://issues.apache.org/jira/browse/LUCENE-10561 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Tomoko Uchida >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > This is a spin-off of [LUCENE-10312]. > Constants and methods in those classes are exposed to the outside packages; > we should be able to limit the visibility to {{private}} or, at least to > {{package private}}. > This change breaks backward compatibility so should be applied to the main > branch (10.0) only, and a MIGRATE entry may be needed. > Also, they seem unchanged since 2008, we could refactor them to embrace newer > Java APIs as we did in [https://github.com/apache/lucene/pull/540].
[jira] [Commented] (LUCENE-10561) Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and PersianNormalizer
[ https://issues.apache.org/jira/browse/LUCENE-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535859#comment-17535859 ] Rushabh Shah commented on LUCENE-10561: --- [~rcmuir] Also, could you please assign the Jira issue to me?
[jira] [Commented] (LUCENE-10561) Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and PersianNormalizer
[ https://issues.apache.org/jira/browse/LUCENE-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535862#comment-17535862 ] Robert Muir commented on LUCENE-10561: -- I think I didn't explain my suggestion well enough. When I said "make entire classes package private", I meant that, for example, ArabicNormalizer does not need to be a public class anymore. I think this is easier than modifying many individual constants. Some of the constants modified in the PR are not related to stemmers/normalizers and may cause problems. Generally speaking, public constants are harmless, so I don't think there is much benefit in hiding them. But entire classes that need not be public are worth fixing, because they clutter up the javadocs and the API for no good reason.
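To illustrate the "make entire classes package private" suggestion, here is a minimal Java sketch of a package-private helper class sitting behind a public entry point in the same package. The names ExampleNormalizer and ExampleNormalizationFilter are hypothetical, and the character folding is illustrative only (loosely modeled on Arabic alef/tatweel normalization), not Lucene's actual ArabicNormalizer code. Because the helper class has no access modifier, none of its members, not even its public constants, can be referenced from outside the package, and it does not appear in the public javadocs.

```java
// Sketch only: hypothetical names, not Lucene classes. Both classes share one
// file here for brevity; in a real package the helper would live in its own file.

/** Package-private helper: no access modifier on the class itself. */
final class ExampleNormalizer {
  // These can stay "public": since the enclosing class is package-private,
  // code outside the package cannot name the class and so cannot reach them.
  public static final char ALEF = '\u0627';
  public static final char ALEF_MADDA = '\u0622';
  public static final char ALEF_HAMZA_ABOVE = '\u0623';
  public static final char ALEF_HAMZA_BELOW = '\u0625';
  public static final char TATWEEL = '\u0640';

  /** Normalizes s[0..len) in place and returns the new length. */
  int normalize(char[] s, int len) {
    int out = 0;
    for (int i = 0; i < len; i++) {
      char c = s[i];
      switch (c) {
        case ALEF_MADDA:
        case ALEF_HAMZA_ABOVE:
        case ALEF_HAMZA_BELOW:
          s[out++] = ALEF; // fold alef variants to bare alef
          break;
        case TATWEEL:
          break; // drop the elongation character
        default:
          s[out++] = c;
      }
    }
    return out;
  }
}

/** The public entry point: the only type users of the package can see. */
public final class ExampleNormalizationFilter {
  private final ExampleNormalizer normalizer = new ExampleNormalizer();

  public String apply(String input) {
    char[] buf = input.toCharArray();
    int len = normalizer.normalize(buf, buf.length);
    return new String(buf, 0, len);
  }

  public static void main(String[] args) {
    System.out.println(new ExampleNormalizationFilter().apply("\u0623\u0640\u0628"));
  }
}
```

The design point is that visibility is decided once, at the class level, instead of constant by constant.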
[jira] [Commented] (LUCENE-10561) Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and PersianNormalizer
[ https://issues.apache.org/jira/browse/LUCENE-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535865#comment-17535865 ] Rushabh Shah commented on LUCENE-10561: --- Thank you [~rcmuir] for the clarification. One follow-up question: should I change only the Normalizer and Stemmer classes in this PR, and not touch other classes?
[jira] [Commented] (LUCENE-10561) Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and PersianNormalizer
[ https://issues.apache.org/jira/browse/LUCENE-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535866#comment-17535866 ] Tomoko Uchida commented on LUCENE-10561: {quote}Also can you please assign the Jira to me. {quote} I don't think we can assign issues to external contributors. Don't worry about it; it's not that important. A CHANGES entry is sufficient for us to track who made the change.
[jira] [Commented] (LUCENE-10561) Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and PersianNormalizer
[ https://issues.apache.org/jira/browse/LUCENE-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535870#comment-17535870 ] Tomoko Uchida commented on LUCENE-10561: {quote}Generally speaking, public constants are harmless, so I don't think there is much benefit in hiding them. {quote} There are some public static constants whose type is an array ({{char[][]}}). I thought it'd be safe to make them private, but is making the entire class package-private sufficient?
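On the {{char[][]}} question: in Java, `final` on an array field only fixes the reference, so any code that can see the field can still overwrite the array's elements. Making the enclosing class package-private limits that to code in the same package, but does not make the data immutable. The minimal sketch below contrasts the two styles; the class names and the suffix values are hypothetical and chosen only for illustration.

```java
// Sketch only: hypothetical classes, all in one file (default package).

/** Old style: the reference is final, but the contents are not. */
final class ExposedSuffixes {
  public static final char[][] SUFFIXES = {"ha".toCharArray(), "an".toCharArray()};
}

/** Guarded style: the array stays private; callers only ever see copies. */
final class GuardedSuffixes {
  private static final char[][] SUFFIXES = {"ha".toCharArray(), "an".toCharArray()};

  /** Returns a deep copy, so a caller's writes never reach the shared table. */
  static char[][] suffixes() {
    char[][] copy = new char[SUFFIXES.length][];
    for (int i = 0; i < SUFFIXES.length; i++) {
      copy[i] = SUFFIXES[i].clone();
    }
    return copy;
  }

  /** Checks a word against the private table directly, without copying. */
  static boolean endsWithSuffix(String word) {
    for (char[] suffix : SUFFIXES) {
      if (word.endsWith(new String(suffix))) {
        return true;
      }
    }
    return false;
  }
}

public final class ArrayConstantSketch {
  public static void main(String[] args) {
    // Compiles and runs: "final" does not prevent writing to array elements.
    ExposedSuffixes.SUFFIXES[0][0] = 'X';
    System.out.println(new String(ExposedSuffixes.SUFFIXES[0])); // prints "Xa"

    // Writing to the returned copy leaves the guarded table intact.
    char[][] copy = GuardedSuffixes.suffixes();
    copy[0][0] = 'X';
    System.out.println(GuardedSuffixes.endsWithSuffix("kitabha")); // prints "true"
  }
}
```

Whether the extra copy is worth it in a stemmer's hot path is a separate trade-off; keeping the table private and only reading it internally, as `endsWithSuffix` does, avoids copying altogether.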
[jira] [Commented] (LUCENE-10561) Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and PersianNormalizer
[ https://issues.apache.org/jira/browse/LUCENE-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535872#comment-17535872 ] Tomoko Uchida commented on LUCENE-10561: {quote}One followup question. Should I change only the classes that are Normalizer or Stemmer in this PR and not touch other classes ? {quote} I agree with that. It'd be better not to touch Tokenizers/TokenFilters in this issue. That's a separate issue, and changing them would break users' code, so we'd need to be careful about it.
[jira] [Commented] (LUCENE-10561) Reduce class/member visibility of ArabicStemmer, ArabicNormalizer, and PersianNormalizer
[ https://issues.apache.org/jira/browse/LUCENE-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535874#comment-17535874 ] Tomoko Uchida commented on LUCENE-10561: (So... we need to switch between Jira and GitHub even for such a relatively small issue, and the conversation is scattered across both of them. To me, this is good motivation for LUCENE-10557.)
[GitHub] [lucene] zacharymorn merged pull request #833: LUCENE-10411: Add NN vectors support to ExitableDirectoryReader
zacharymorn merged PR #833: URL: https://github.com/apache/lucene/pull/833
[jira] [Commented] (LUCENE-10411) Add NN vectors support to ExitableDirectoryReader
[ https://issues.apache.org/jira/browse/LUCENE-10411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535875#comment-17535875 ] ASF subversion and git services commented on LUCENE-10411: -- Commit 96036bca9f667edbdc528bfe95eeb2795526e9fa in lucene's branch refs/heads/main from zacharymorn [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=96036bca9f6 ] LUCENE-10411: Add NN vectors support to ExitableDirectoryReader (#833) > Add NN vectors support to ExitableDirectoryReader > - > > Key: LUCENE-10411 > URL: https://issues.apache.org/jira/browse/LUCENE-10411 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Time Spent: 3h > Remaining Estimate: 0h > > This is currently unsupported.
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira?
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535898#comment-17535898 ] Shad Storhaug commented on LUCENE-10557: The Lucene.NET project switched to GitHub issues 3 years ago, and we were able to get far more contributor participation than on JIRA, but that may mostly be a .NET thing. We held a vote, and the participants were unanimous in favor of the GitHub Issues migration. It was pretty straightforward for INFRA to do. We made a script to migrate the open issues to GitHub via the API, but we ran into a couple of problems that you may need to find a workaround for: # The issues were not added to GitHub in chronological order. # Since INFRA wouldn't give us permission to use the API, we gave the script to them, and the issues were added to GitHub with the INFRA tech as the owner instead of the original person who submitted the issue. There is a walkthrough of the process here: https://gist.github.com/jonmagic/5282384165e0f86ef105#start-an-issue-import%22 We didn't disable JIRA support, but we didn't really have to, because we couldn't get users to participate on it anyway. But that is something you will also need to take into consideration. > Migrate to GitHub issue from Jira? > -- > > Key: LUCENE-10557 > URL: https://issues.apache.org/jira/browse/LUCENE-10557 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Tomoko Uchida >Priority: Major > > A few (not the majority) Apache projects already use the GitHub issue instead > of Jira. For example, > Airflow: [https://github.com/apache/airflow/issues] > BookKeeper: [https://github.com/apache/bookkeeper/issues] > So I think it'd be technically possible that we move to GitHub issue. I have > little knowledge of how to proceed with it, I'd like to discuss whether we > should migrate to it, and if so, how to smoothly handle the migration. 
> The major tasks would be: > * Get a consensus about the migration among committers > * Enable Github issue on the lucene's repository (currently, it is disabled > on it) > * Build the convention or rules for issue label/milestone management > * Choose issues that should be moved to GitHub (I think too old or obsolete > issues can remain Jira.)
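For the first problem Shad describes (issues not arriving in chronological order), one possible mitigation, sketched here as an assumption and not as what the Lucene.NET script actually did, is to sort the exported issues by creation time before submitting them one at a time. The ExportedIssue type, the PROJ-n keys, and the timestamps are all hypothetical. Sorting alone may not be enough: if the import API processes requests asynchronously, the script would presumably also need to wait for each import to finish before sending the next one.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public final class ImportOrderSketch {

  /** Hypothetical minimal view of one exported Jira issue. */
  static final class ExportedIssue {
    final String key;      // e.g. "PROJ-12" (made-up key format for the sketch)
    final Instant created; // creation timestamp taken from the export

    ExportedIssue(String key, Instant created) {
      this.key = key;
      this.created = created;
    }

    /** Numeric part of the key, used as a tie-breaker (so 9 sorts before 10). */
    int number() {
      return Integer.parseInt(key.substring(key.lastIndexOf('-') + 1));
    }
  }

  /** Returns a copy of the issues sorted oldest-first, ties broken by key number. */
  static List<ExportedIssue> importOrder(List<ExportedIssue> issues) {
    List<ExportedIssue> sorted = new ArrayList<>(issues);
    sorted.sort(
        Comparator.comparing((ExportedIssue i) -> i.created)
            .thenComparingInt(ExportedIssue::number));
    return sorted;
  }

  public static void main(String[] args) {
    List<ExportedIssue> exported = List.of(
        new ExportedIssue("PROJ-300", Instant.parse("2020-01-05T00:00:00Z")),
        new ExportedIssue("PROJ-12", Instant.parse("2011-03-01T00:00:00Z")),
        new ExportedIssue("PROJ-150", Instant.parse("2015-07-20T00:00:00Z")));
    for (ExportedIssue issue : importOrder(exported)) {
      // A real script would submit the issue here and wait for its import to
      // complete before moving on; this sketch only prints the planned order.
      System.out.println(issue.key);
    }
  }
}
```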
[GitHub] [lucene] jpountz commented on a diff in pull request #882: LUCENE-10564: Make sure SparseFixedBitSet#or updates memory usage
jpountz commented on code in PR #882: URL: https://github.com/apache/lucene/pull/882#discussion_r871000644
## lucene/core/src/test/org/apache/lucene/util/TestSparseFixedBitSet.java: ##
@@ -71,4 +71,27 @@ public void testApproximateCardinalityOnDenseSet() {
     }
     assertEquals(numDocs, set.approximateCardinality());
   }
+
+  public void testRamBytesUsed() throws IOException {
+    int size = 1000 + random().nextInt(1);
+    BitSet original = new SparseFixedBitSet(size);
+    for (int i = 0; i < 3; i++) {
+      original.set(random().nextInt(size));
+    }
+    assertTrue(original.ramBytesUsed() > 0);
+
+    BitSet copy = copyOf(original, size);
+    BitSet otherBitSet = new SparseFixedBitSet(size);
+    int interval = 10 + random().nextInt(100);
+    for (int i = 0; i < size; i += interval) {
+      otherBitSet.set(i);
+    }
+    copy.or(new BitSetIterator(otherBitSet, size));
+    assertTrue(copy.ramBytesUsed() > original.ramBytesUsed());
+
+    copy = copyOf(original, size);
+    copy.or(DocIdSetIterator.all(size));
+    assertTrue(copy.ramBytesUsed() > original.ramBytesUsed());
+    assertTrue(copy.ramBytesUsed() > size / Byte.SIZE);
Review Comment: Would it also make sense to test that copying a SparseFixedBitSet via `SparseFixedBitSet#or` gives an instance that has the same `ramBytesUsed`?
[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira?
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535898#comment-17535898 ] Shad Storhaug edited comment on LUCENE-10557 at 5/12/22 6:40 AM: - The Lucene.NET project switched to GitHub issues 3 years ago, and we were able to get far more contributor participation than on JIRA, but that may mostly be a .NET thing. We held a vote, and the participants were unanimous in favor of the GitHub Issues migration. It was pretty straightforward for INFRA to do. We made a script to migrate the open issues to GitHub via the API, but we ran into a couple of problems that you may need to find a workaround for: # The issues were not added to GitHub in chronological order. # Since INFRA wouldn't give us permission to use the API, we gave the script to them, and the issues were added to GitHub with the INFRA tech as the owner instead of the original person who submitted the issue. There is a walkthrough of the process here: https://gist.github.com/jonmagic/5282384165e0f86ef105#start-an-issue-import%22 The INFRA ticket for this is: https://issues.apache.org/jira/browse/INFRA-20118 We didn't disable JIRA support, but we didn't really have to, because we couldn't get users to participate on it anyway. But that is something you will also need to take into consideration. was (Author: nightowl888): The Lucene.NET project switched to GitHub issues 3 years ago, and we were able to get far more contributor participation than on JIRA, but that may mostly be a .NET thing. We held a vote, and the participants were unanimous in favor of the GitHub Issues migration. It was pretty straightforward for INFRA to do. We made a script to migrate the open issues to GitHub via the API, but we ran into a couple of problems that you may need to find a workaround for: # The issues were not added to GitHub in chronological order.
# Since INFRA wouldn't give us permission to use the API, we gave the script to them, and the issues were added to GitHub with the INFRA tech as the owner instead of the original person who submitted the issue. There is a walkthrough of the process here: https://gist.github.com/jonmagic/5282384165e0f86ef105#start-an-issue-import%22 We didn't disable JIRA support, but we didn't really have to, because we couldn't get users to participate on it anyway. But that is something you will also need to take into consideration.