[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)
uschindler commented on PR #912: URL: https://github.com/apache/lucene/pull/912#issuecomment-1140776739 Thanks. Here we only need to compile a specific exact version (19). So for the MR-JAR task this should be fine with auto-discovery. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)
uschindler commented on PR #912: URL: https://github.com/apache/lucene/pull/912#issuecomment-1140807127 Thanks @mocobeta, I just need to remember to re-enable this in September when this gets merged (then we can also use auto-provisioning of JDK 19).
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira?
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543849#comment-17543849 ] Jan Høydahl commented on LUCENE-10557: -- Thanks for the thorough research! I propose to skip the bulk migration of Jira issues and instead bulk-comment on all open Jira issues, prompting the reporter or assignee to take whatever action they see fit. Some will choose to migrate, others will finalize the feature in Jira, quite a few will probably be closed as Won't Do, and some will just remain open/stale/don't care. No history will be lost: we can still refer and link to historic Jira issues as long as the ASF keeps Jira alive. > Migrate to GitHub issue from Jira? > Key: LUCENE-10557 > URL: https://issues.apache.org/jira/browse/LUCENE-10557 > Project: Lucene - Core > Issue Type: Sub-task > Reporter: Tomoko Uchida > Priority: Major > A few (not the majority) Apache projects already use GitHub issues instead of Jira. For example, Airflow: [https://github.com/apache/airflow/issues] BookKeeper: [https://github.com/apache/bookkeeper/issues] So I think it would be technically possible for us to move to GitHub issues. I have little knowledge of how to proceed with it; I'd like to discuss whether we should migrate to it, and if so, how to smoothly handle the migration. The major tasks would be: * Get a consensus about the migration among committers * Enable GitHub issues on Lucene's repository (currently, they are disabled) * Build the conventions or rules for issue label/milestone management * Choose issues that should be moved to GitHub (I think too old or obsolete issues can remain in Jira.) -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] janhoy merged pull request #2661: SOLR-16213 Upgrade Jackson to version 2.13.3
janhoy merged PR #2661: URL: https://github.com/apache/lucene-solr/pull/2661
[jira] [Commented] (LUCENE-10574) Remove O(n^2) from TieredMergePolicy or change defaults to one that doesn't do this
[ https://issues.apache.org/jira/browse/LUCENE-10574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543852#comment-17543852 ] ASF subversion and git services commented on LUCENE-10574: -- Commit 318177af83efc99b6c05412cc8ef0ade15c92f6c in lucene's branch refs/heads/main from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=318177af83e ] LUCENE-10574: Fix TestTieredMergePolicy's expectations about the segment count. > Remove O(n^2) from TieredMergePolicy or change defaults to one that doesn't do this > Key: LUCENE-10574 > URL: https://issues.apache.org/jira/browse/LUCENE-10574 > Project: Lucene - Core > Issue Type: Bug > Reporter: Robert Muir > Priority: Major > Fix For: 9.3 > Time Spent: 2h 10m > Remaining Estimate: 0h > Remove the {{floorSegmentBytes}} parameter, or change Lucene's default to a merge policy that doesn't merge in an O(n^2) way. I have the feeling it might have to be the latter, as folks seem really wed to this crazy O(n^2) behavior.
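To make the O(n^2) concern concrete, here is a toy cost model in Java. It is a sketch under simplifying assumptions, not TieredMergePolicy's actual selection logic: every flush produces a 1-unit segment and a merge rewrites every unit of its output. `appendMergeCost` models a policy that keeps folding each new small segment into one growing segment (as can happen when all segments below a floor size are treated as equally small), while `logMergeCost` only merges two segments of equal size. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class MergeCostSketch {
  /**
   * Toy policy A: each new 1-unit segment is merged into the one growing
   * segment, so every merge rewrites everything flushed so far.
   */
  static long appendMergeCost(int flushes) {
    long segmentSize = 0; // size of the single accumulated segment
    long written = 0;     // total units rewritten by merges
    for (int i = 0; i < flushes; i++) {
      if (segmentSize == 0) {
        segmentSize = 1;        // first flush: nothing to merge with yet
      } else {
        segmentSize += 1;       // merge the new 1-unit segment in
        written += segmentSize; // the merge writes its whole output
      }
    }
    return written;
  }

  /** Toy policy B: only merge two segments of equal size (binary-counter style). */
  static long logMergeCost(int flushes) {
    List<Long> segments = new ArrayList<>();
    long written = 0;
    for (int i = 0; i < flushes; i++) {
      segments.add(1L);
      int n = segments.size();
      // Keep merging the two newest segments while they have equal size.
      while (n >= 2 && segments.get(n - 1).equals(segments.get(n - 2))) {
        long merged = segments.remove(n - 1) + segments.remove(n - 2);
        segments.add(merged);
        written += merged;
        n = segments.size();
      }
    }
    return written;
  }

  public static void main(String[] args) {
    for (int flushes : new int[] {16, 256, 4096}) {
      System.out.println(flushes + " flushes: append=" + appendMergeCost(flushes)
          + " log=" + logMergeCost(flushes));
    }
  }
}
```

Under this model, 4096 flushes cost 4096 * 4097 / 2 - 1 = 8,390,655 rewritten units with the first policy versus 4096 * 12 = 49,152 with the second, which is the quadratic versus n log n gap the issue describes.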
[jira] [Commented] (LUCENE-10574) Remove O(n^2) from TieredMergePolicy or change defaults to one that doesn't do this
[ https://issues.apache.org/jira/browse/LUCENE-10574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543851#comment-17543851 ] ASF subversion and git services commented on LUCENE-10574: -- Commit 4b63460d2de17a2a90ef560f5ca9035b7ad1fa08 in lucene's branch refs/heads/branch_9x from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=4b63460d2de ] LUCENE-10574: Fix TestTieredMergePolicy's expectations about the segment count.
[jira] [Commented] (LUCENE-10151) Add timeout support to IndexSearcher
[ https://issues.apache.org/jira/browse/LUCENE-10151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543885#comment-17543885 ] Deepika Sharma commented on LUCENE-10151: - I have raised a PR for the `BulkScorer` approach. Please take a look and let me know your thoughts. > Add timeout support to IndexSearcher > Key: LUCENE-10151 > URL: https://issues.apache.org/jira/browse/LUCENE-10151 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search > Reporter: Greg Miller > Priority: Minor > I'd like to explore adding optional "timeout" capabilities to {{IndexSearcher}}. This would enable users to (optionally) specify a maximum time budget for search execution. If the search "times out", partial results would be available. This idea originated on the dev list (thanks [~jpountz] for the suggestion). Thread for reference: [http://mail-archives.apache.org/mod_mbox/lucene-dev/202110.mbox/%3CCAL8PwkZdNGmYJopPjeXYK%3DF7rvLkWon91UEXVxMM4MeeJ3UHxQ%40mail.gmail.com%3E] A couple of things to watch out for with this change: # We want to make sure it's robust to a two-phase query evaluation scenario where the "approximate" step matches a large number of candidates but the "confirmation" step matches very few (or none). This is a particularly tricky case. # We want to make sure the {{TotalHits#Relation}} reported by {{TopDocs}} is {{GREATER_THAN_OR_EQUAL_TO}} if the query times out. # We want to make sure it plays nicely with the {{LRUCache}}, since it iterates the query to pre-populate a {{BitSet}} when caching. That step shouldn't be allowed to overrun the timeout. The proper way to handle this probably needs some thought.
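A window-based timeout of the kind the `BulkScorer` approach suggests can be sketched roughly as follows: score documents in windows of doc IDs and check a deadline between windows, so the check still runs when a window's candidates are all rejected by the confirmation step, and report whether the result is partial so a caller could set {{GREATER_THAN_OR_EQUAL_TO}}. This is plain Java under stated assumptions, not Lucene's API: `WindowScorer` is a hypothetical stand-in for a bulk scorer that scores the doc ID range `[min, max)` and returns the next doc to score, and the clock is injected so the behavior is testable.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a bulk scorer: scores docs in [min, max), returns the next doc to score. */
interface WindowScorer {
  int NO_MORE_DOCS = Integer.MAX_VALUE;

  int score(int min, int max);
}

final class TimeLimitedScoring {
  /** nextDoc: where scoring stopped; timedOut: true if hit counts are only a lower bound. */
  record Result(int nextDoc, boolean timedOut) {}

  static Result scoreWithDeadline(
      WindowScorer scorer, LongSupplier clockNanos, long deadlineNanos, int initialWindow) {
    int window = initialWindow;
    int min = 0;
    while (min != WindowScorer.NO_MORE_DOCS) {
      // Check the clock once per window rather than once per match, so a window
      // whose candidates are all rejected still counts toward the time budget.
      if (clockNanos.getAsLong() > deadlineNanos) {
        return new Result(min, true);
      }
      int max = (int) Math.min((long) min + window, WindowScorer.NO_MORE_DOCS);
      min = scorer.score(min, max);
      // Grow the window geometrically so clock reads stay cheap on long queries.
      window = (int) Math.min(window * 2L, Integer.MAX_VALUE);
    }
    return new Result(min, false);
  }
}
```

With a real searcher, the caller would translate `timedOut` into the reported total-hits relation; the interaction with query caching (the third point above) is not modeled here.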
[jira] [Comment Edited] (LUCENE-10151) Add timeout support to IndexSearcher
[ https://issues.apache.org/jira/browse/LUCENE-10151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543885#comment-17543885 ] Deepika Sharma edited comment on LUCENE-10151 at 5/30/22 11:08 AM: --- I have raised a PR for the {{BulkScorer}} approach. Please take a look and let me know your thoughts.
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira?
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543911#comment-17543911 ] Tomoko Uchida commented on LUCENE-10557: Hi [~janhoy], thanks for your suggestion. "Migrate no issues and start from fresh" is definitely an option; on the other hand, many issues that are worth revisiting may remain in Jira forever, and I feel a bit sorry for them. If unresolved issues are on GitHub (even just the title and description) from the start, possible contributors will be able to browse and search them. However, I think we can discuss it later.
[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira?
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543911#comment-17543911 ] Tomoko Uchida edited comment on LUCENE-10557 at 5/30/22 11:52 AM: -- Hi [~janhoy], thanks for your suggestion. "Migrate no issues and make a fresh start" is definitely an option; on the other hand, many issues that are worth revisiting may remain in Jira forever, and I feel a bit sorry for them. If unresolved issues are on GitHub (even just the title and description) from the start, possible contributors will be able to browse and search them. However, I think we can discuss it later.
[GitHub] [lucene] shaie commented on pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities
shaie commented on PR #841: URL: https://github.com/apache/lucene/pull/841#issuecomment-1141152098 Hey @mdmarshmallow, I think this is a great and very useful feature. I also believe that in general it will be good to accompany these changes with a demo `main()` in the `demo` package, but that can wait a bit until we have a solid API. I've added to this PR an `.adoc` with a few example use cases. IMO it will be useful to keep it around as documentation of this feature, modifying it of course per the feedback we receive. If for some reason we decide that this document is redundant or will be hard to maintain and we'd rather stick with javadocs, I don't mind deleting it in the end. For now I think it's a convenient place to document our thoughts, examples and APIs. I used the term `FacetSets` to denote "a set of values that go together". Other names may include `Tuple`, `Group` etc. I know naming is the hardest part :). In my mind I'm also thinking about an API like:
```
doc.add(new FacetSetsField(
    "actorAwards",
    // A Thriller for which this actor received a Best Actor Oscar award in 2022
    new FacetSet(ord("Oscar"), ord("Best Actor"), ord("Thriller"), 2022),
    // A Drama for which this actor received a Best Supporting Actor Emmy award in 2005
    new FacetSet(ord("Emmy"), ord("Best Supporting Actor"), ord("Drama"), 2005)
));
```
Yes, it could be just a sugar API on top of `HyperRectangle`, but from a faceting perspective it might make more sense and be more consistent with the other faceting APIs (`RangeFacets`, `SSDVFacetField` etc.). I'd love to receive feedback on the use cases. I can also add to the document a more-than-pseudocode-like example which will include the indexing and aggregation API, so we have something more concrete to discuss?
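The matching semantics behind such a `FacetSetsField` can be sketched in plain Java: a document matches a range if any one of its facet sets falls inside per-dimension bounds, which is what separates grouped values from independent multi-valued fields. Everything here is hypothetical (`FacetSetSketch`, `Box`, and plain `long` ordinals standing in for `ord(...)`). It is not the API proposed in the PR.

```java
import java.util.List;

final class FacetSetSketch {
  /** Inclusive per-dimension bounds: a hypothetical stand-in for a hyperrectangle. */
  record Box(long[] min, long[] max) {
    boolean contains(long[] facetSet) {
      if (facetSet.length != min.length) {
        return false; // different dimensionality never matches
      }
      for (int d = 0; d < facetSet.length; d++) {
        if (facetSet[d] < min[d] || facetSet[d] > max[d]) {
          return false;
        }
      }
      return true;
    }
  }

  /** A document matches if ANY single facet set lies entirely inside the box. */
  static boolean docMatches(List<long[]> docFacetSets, Box box) {
    for (long[] facetSet : docFacetSets) {
      if (box.contains(facetSet)) {
        return true;
      }
    }
    return false;
  }

  /** Number of documents matching the box, i.e. the count a facet bucket would report. */
  static int count(List<List<long[]>> docs, Box box) {
    int matches = 0;
    for (List<long[]> doc : docs) {
      if (docMatches(doc, box)) {
        matches++;
      }
    }
    return matches;
  }
}
```

For example, with an actor document holding the two facet sets from the comment, a box for Oscar awards in 2005 does not match, because the Oscar and the year 2005 belong to different facet sets.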
[GitHub] [lucene] alessandrobenedetti commented on pull request #926: VectorSimilarityFunction reverse removal
alessandrobenedetti commented on PR #926: URL: https://github.com/apache/lucene/pull/926#issuecomment-1141164970 @mayya-sharipova, @msokolov I found out why the original tests are now failing. Tie-breaking by Lucene doc ID (smaller ID wins) doesn't actually work right now for the regular similarities, nor for Euclidean after my changes. The reason is the way we encode the heap value from the node ID and the score: `private long encode(int node, float score) { return order.apply((((long) NumericUtils.floatToSortableInt(score)) << 32) | node); }` With this encoding, a higher node ID wins. The tests for cosine and dot product are slightly different from the Euclidean ones and don't check tie-breaking. I am taking a look right now to fix it.
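The tie-breaking problem can be reproduced without Lucene. The sketch below uses a local stand-in for `NumericUtils.floatToSortableInt` (the same sign-flipping bit trick, assumed equivalent for this purpose) and ignores the `order.apply` step, so a larger encoded value means "better". With the node ID in the low 32 bits as quoted, equal scores favor the larger ID; storing the bitwise complement of the node ID is one possible way to make the smaller ID win. It is an illustration, not necessarily the fix adopted in the PR.

```java
final class NeighborEncodingSketch {
  /** Local stand-in for NumericUtils.floatToSortableInt: int order matches float order. */
  static int sortableInt(float f) {
    int bits = Float.floatToIntBits(f);
    return bits ^ ((bits >> 31) & 0x7fffffff); // flip magnitude bits of negative floats
  }

  /** Encoding as quoted (minus order.apply): score in high 32 bits, node ID in low 32 bits. */
  static long encodeAsQuoted(int node, float score) {
    return (((long) sortableInt(score)) << 32) | node; // assumes node >= 0
  }

  /** Alternative: complement the node ID so that, for equal scores, a SMALLER ID encodes LARGER. */
  static long encodeSmallerIdWins(int node, float score) {
    return (((long) sortableInt(score)) << 32) | (0xFFFFFFFFL & ~node);
  }

  /** Recovers the node ID from encodeSmallerIdWins. */
  static int decodeNodeSmallerIdWins(long encoded) {
    return ~(int) encoded;
  }

  public static void main(String[] args) {
    // Equal scores: the quoted encoding ranks node 7 above node 3.
    System.out.println(encodeAsQuoted(7, 0.5f) > encodeAsQuoted(3, 0.5f));
    // Equal scores: the complemented encoding ranks node 3 above node 7.
    System.out.println(encodeSmallerIdWins(3, 0.5f) > encodeSmallerIdWins(7, 0.5f));
  }
}
```

The score still dominates because it occupies the high 32 bits; the node ID only decides between entries whose scores encode identically.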
[jira] [Created] (LUCENE-10597) Move globalMaxScore to MaxScoreCache from ImpactsDISI?
Tomoko Uchida created LUCENE-10597: -- Summary: Move globalMaxScore to MaxScoreCache from ImpactsDISI? Key: LUCENE-10597 URL: https://issues.apache.org/jira/browse/LUCENE-10597 Project: Lucene - Core Issue Type: Improvement Components: core/search Reporter: Tomoko Uchida All max score calculations are done in {{MaxScoreCache}} except for {{globalMaxScore}}, which resides in {{ImpactsDISI}}. Perhaps it would be clearer to keep this global max score value in {{MaxScoreCache}}?
[GitHub] [lucene] mocobeta opened a new pull request, #931: LUCENE-10597: move globalMaxScore to MaxScoreCache
mocobeta opened a new pull request, #931: URL: https://github.com/apache/lucene/pull/931 ### Description See https://issues.apache.org/jira/browse/LUCENE-10597
[GitHub] [lucene] mocobeta commented on a diff in pull request #931: LUCENE-10597: move globalMaxScore to MaxScoreCache
mocobeta commented on code in PR #931: URL: https://github.com/apache/lucene/pull/931#discussion_r884855095 ## lucene/core/src/java/org/apache/lucene/search/MaxScoreCache.java: ## @@ -80,6 +82,9 @@ int getLevel(int upTo) throws IOException { /** Return the maximum score for the given {@code level}. */ float getMaxScoreForLevel(int level) throws IOException { +if (level < 0) { Review Comment: The original expression was `level == -1`. "Less than zero" might be better here, since `level` is otherwise assumed to be greater than or equal to zero; I have no strong opinion on it, though.
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira?
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543968#comment-17543968 ] Tomoko Uchida commented on LUCENE-10557: One feature I couldn't find in GitHub issues that Jira naturally provides is the parent-child issue hierarchy. With Jira we can have an umbrella issue and tie any number of sub-issues to it, but we cannot in GitHub. I imagine "GitHub Project" is used for this kind of issue grouping (I haven't used it), though I think it'd be too much for us.
[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira?
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543968#comment-17543968 ] Tomoko Uchida edited comment on LUCENE-10557 at 5/30/22 2:24 PM: - One feature I couldn't find in GitHub issues that Jira naturally provides is the parent-child issue hierarchy. With Jira we can have an umbrella issue and tie any number of sub-issues to it, but we cannot with GitHub issues. I imagine "GitHub Project" is used for this kind of issue grouping (I haven't used it), though I think it'd be too much for us.
[GitHub] [lucene] alessandrobenedetti commented on a diff in pull request #926: VectorSimilarityFunction reverse removal
alessandrobenedetti commented on code in PR #926: URL: https://github.com/apache/lucene/pull/926#discussion_r884908511 ## lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java: ## @@ -193,25 +208,40 @@ public void testAdvanceShallow() throws IOException { } try (IndexReader reader = DirectoryReader.open(d)) { IndexSearcher searcher = new IndexSearcher(reader); -KnnVectorQuery query = new KnnVectorQuery("field", new float[] {2, 3}, 3); +KnnVectorQuery query = new KnnVectorQuery("field", new float[] {0.5f, 1}, 3); Review Comment: I have reverted this and proposed a possible solution. In my opinion, the node ID and score encoding in the heap was broken in such edge cases for the majority of similarity functions. The Euclidean tests were green because of the reversal!
[GitHub] [lucene] alessandrobenedetti commented on a diff in pull request #926: VectorSimilarityFunction reverse removal
alessandrobenedetti commented on code in PR #926: URL: https://github.com/apache/lucene/pull/926#discussion_r884909389 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborQueue.java: ## @@ -56,9 +56,9 @@ long apply(long v) { // Whether the search stopped early because it reached the visited nodes limit private boolean incomplete; - public NeighborQueue(int initialSize, boolean reversed) { + public NeighborQueue(int initialSize, boolean descOrder) { Review Comment: fixed!
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira?
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543987#comment-17543987 ] Dawid Weiss commented on LUCENE-10557: -- I don't think this is a problem. You just create a description with a bullet list and reference the related issues; they do show up in mentions. I think this is sufficient.
[jira] [Commented] (LUCENE-10188) Give SortedSetDocValues a docValueCount()?
[ https://issues.apache.org/jira/browse/LUCENE-10188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543989#comment-17543989 ] Lu Xugang commented on LUCENE-10188: Hi [~jpountz], the implementation of SortedNumericDocValues#docValueCount() returns the number of values indexed per document, and duplicate values are included in the count. Does SortedSetDocValues#docValueCount() have the same semantics? Or do you mean it should only count the number of unique values (ords)? > Give SortedSetDocValues a docValueCount()? > Key: LUCENE-10188 > URL: https://issues.apache.org/jira/browse/LUCENE-10188 > Project: Lucene - Core > Issue Type: Wish > Reporter: Adrien Grand > Priority: Minor > Fix For: 10.0 (main), 9.2 > Theoretically, SortedSetDocValues gives codecs more options with regard to how SORTED_SET doc values could store ords. However, in practice we currently always store counts. Maybe giving SORTED_SET doc values an API that is closer to the API of SORTED_NUMERIC doc values would be a better trade-off?
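The two candidate semantics can be modeled in plain Java; this is not Lucene's API, and the class and method names are hypothetical. SORTED_NUMERIC keeps every value a document was indexed with, duplicates included, whereas SORTED_SET exposes each distinct term once per document, so for the values `a, a, b` the two interpretations of "count" differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

final class DocValueCountSketch {
  /** SORTED_NUMERIC-style view of one document: values sorted, duplicates kept. */
  static List<Long> sortedNumeric(long... values) {
    return Arrays.stream(values).sorted().boxed().collect(Collectors.toList());
  }

  /**
   * SORTED_SET-style view of one document: distinct terms in sorted order. In a real
   * segment, ords are positions in the segment-wide sorted term dictionary.
   */
  static List<String> sortedSet(String... terms) {
    return List.copyOf(new TreeSet<>(Arrays.asList(terms)));
  }

  /** Interpretation 1: count every indexed value, duplicates included. */
  static int countAllValues(String... terms) {
    return terms.length;
  }

  /** Interpretation 2: count distinct values (ords) only. */
  static int countUniqueOrds(String... terms) {
    return sortedSet(terms).size();
  }

  public static void main(String[] args) {
    System.out.println(sortedNumeric(7, 5, 5));   // [5, 5, 7]
    System.out.println(sortedSet("a", "a", "b")); // [a, b]
  }
}
```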
[jira] [Created] (LUCENE-10598) SortedSetDocValues#docValueCount() should be always greater than zero
Lu Xugang created LUCENE-10598: -- Summary: SortedSetDocValues#docValueCount() should be always greater than zero Key: LUCENE-10598 URL: https://issues.apache.org/jira/browse/LUCENE-10598 Project: Lucene - Core Issue Type: Bug Reporter: Lu Xugang This test fails:
{code:java}
// Runs inside a LuceneTestCase subclass (newDirectory() comes from LuceneTestCase).
public void testDocValueCount() throws IOException {
  try (Directory d = newDirectory()) {
    try (IndexWriter w = new IndexWriter(d, new IndexWriterConfig())) {
      for (int j = 0; j < 1; j++) {
        Document doc = new Document();
        doc.add(new SortedSetDocValuesField("field", new BytesRef("a")));
        doc.add(new SortedSetDocValuesField("field", new BytesRef("a")));
        doc.add(new SortedSetDocValuesField("field", new BytesRef("b")));
        w.addDocument(doc);
      }
    }
    try (IndexReader reader = DirectoryReader.open(d)) {
      assertEquals(1, reader.leaves().size());
      for (LeafReaderContext leaf : reader.leaves()) {
        SortedSetDocValues docValues = leaf.reader().getSortedSetDocValues("field");
        for (int doc1 = docValues.nextDoc();
            doc1 != DocIdSetIterator.NO_MORE_DOCS;
            doc1 = docValues.nextDoc()) {
          assert docValues.docValueCount() > 0;
        }
      }
    }
  }
}
{code}
[jira] [Commented] (LUCENE-10598) SortedSetDocValues#docValueCount() should be always greater than zero
[ https://issues.apache.org/jira/browse/LUCENE-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543994#comment-17543994 ] Lu Xugang commented on LUCENE-10598: Once [~jpountz] confirms the question raised in LUCENE-10188, I will fix it.
[jira] [Comment Edited] (LUCENE-10188) Give SortedSetDocValues a docValueCount()?
[ https://issues.apache.org/jira/browse/LUCENE-10188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543989#comment-17543989 ] Lu Xugang edited comment on LUCENE-10188 at 5/30/22 3:53 PM: Hi [~jpountz], SortedNumericDocValues#docValueCount() is implemented as the number of values indexed for a document, with duplicate values included in the count. Does SortedSetDocValues#docValueCount() have the same semantics (currently both methods have the same javadoc description)? Or do you mean that it should only count the number of unique values (ords)?
[jira] [Created] (LUCENE-10599) Improve LogMergePolicy's handling of maxMergeSize
Adrien Grand created LUCENE-10599: - Summary: Improve LogMergePolicy's handling of maxMergeSize Key: LUCENE-10599 URL: https://issues.apache.org/jira/browse/LUCENE-10599 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand LogMergePolicy excludes segments whose size is greater than or equal to maxMergeSize from merging. Since a segment whose size is maxMergeSize-1 is still considered for merging, segments will effectively reach a size somewhere between maxMergeSize and mergeFactor*maxMergeSize before they are no longer considered for merging. At least, this is what I thought. When LogMergePolicy ignores a segment that is too large for merging, it also ignores the other segments in the same window of mergeFactor segments if they are on the same tier. So segments might actually reach a size somewhere between maxMergeSize / mergeFactor^0.75 and maxMergeSize * mergeFactor before they are no longer considered for merging. Assuming a merge factor of 10 and a max merge size of 1,000, this means that segments will reach their maximum size somewhere between 178 and 10,000. This range seems too large and makes maxMergeSize hard to reason about. Specifically, if you have ten 999-docs segments, LogDocMergePolicy will happily merge them into a single 9,990-docs segment. However, if you have one 1,000-docs segment and nine 180-docs segments, the 180-docs segments will not get merged with any other segment, even if you keep adding segments to the index. I propose to change this behavior so that when a large segment is encountered, we skip only the segments that are too large instead of the entire window of mergeFactor segments.
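The numbers in LUCENE-10599 can be checked with a short Java sketch. It reproduces the bounds from the issue and compares a deliberately simplified model of the current window-skipping behavior with the proposed one. The class and method names are hypothetical, and the model assumes all segments sit on one tier and are scanned in fixed, non-overlapping windows of mergeFactor segments; it is not LogMergePolicy's actual findMerges logic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model for illustration; not LogMergePolicy's implementation.
public class LogMergeWindowSketch {

  // Lower bound quoted in the issue: maxMergeSize / mergeFactor^0.75.
  static double lowerBound(double maxMergeSize, int mergeFactor) {
    return maxMergeSize / Math.pow(mergeFactor, 0.75);
  }

  // Upper bound quoted in the issue: maxMergeSize * mergeFactor.
  static double upperBound(double maxMergeSize, int mergeFactor) {
    return maxMergeSize * mergeFactor;
  }

  // Simplified current behavior: scan the (assumed single) tier in non-overlapping
  // windows of mergeFactor segments and skip a whole window if any segment is too large.
  static List<List<Long>> currentMerges(List<Long> tier, int mergeFactor, long maxMergeSize) {
    List<List<Long>> merges = new ArrayList<>();
    for (int start = 0; start + mergeFactor <= tier.size(); start += mergeFactor) {
      List<Long> window = tier.subList(start, start + mergeFactor);
      boolean anyTooLarge = window.stream().anyMatch(size -> size >= maxMergeSize);
      if (!anyTooLarge) {
        merges.add(new ArrayList<>(window));
      }
    }
    return merges;
  }

  // Simplified proposed behavior: drop only the too-large segments, then form windows.
  static List<List<Long>> proposedMerges(List<Long> tier, int mergeFactor, long maxMergeSize) {
    List<Long> eligible = new ArrayList<>();
    for (long size : tier) {
      if (size < maxMergeSize) {
        eligible.add(size);
      }
    }
    return currentMerges(eligible, mergeFactor, maxMergeSize);
  }

  public static void main(String[] args) {
    System.out.printf("bounds: %.0f to %.0f%n", lowerBound(1000, 10), upperBound(1000, 10));
    List<Long> tier = new ArrayList<>();
    tier.add(1000L); // one 1,000-docs segment
    tier.addAll(Collections.nCopies(10, 180L)); // ten 180-docs segments
    System.out.println("current merges:  " + currentMerges(tier, 10, 1000));
    System.out.println("proposed merges: " + proposedMerges(tier, 10, 1000));
  }
}
```

With one 1,000-docs segment followed by ten 180-docs segments, the simplified current model schedules no merge at all, while the proposed model merges the ten small segments. The example uses ten rather than nine small segments so that a full window of mergeFactor eligible segments exists.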
[GitHub] [lucene] jtibshirani merged pull request #910: LUCENE-10582: Fix merging of CollectionStatistics in CombinedFieldQuery
jtibshirani merged PR #910: URL: https://github.com/apache/lucene/pull/910
[jira] [Commented] (LUCENE-10582) CombinedFieldQuery fails with distributed field statistics
[ https://issues.apache.org/jira/browse/LUCENE-10582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544026#comment-17544026 ] ASF subversion and git services commented on LUCENE-10582: -- Commit e319a5223cac757f4d9c7a80d3b0587370f8aa5f in lucene's branch refs/heads/main from Yannick Welsch [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=e319a5223ca ] LUCENE-10582: Fix merging of CollectionStatistics in CombinedFieldQuery (#910) CombinedFieldQuery does not properly combine overridden collection statistics, resulting in an IllegalArgumentException during searches. > CombinedFieldQuery fails with distributed field statistics > -- > > Key: LUCENE-10582 > URL: https://issues.apache.org/jira/browse/LUCENE-10582 > Project: Lucene - Core > Issue Type: Bug > Components: modules/sandbox >Reporter: Yannick Welsch >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > CombinedFieldQuery does not properly combine distributed collection > statistics, resulting in an IllegalArgumentException during searches. > Originally surfaced in this Elasticsearch issue: > https://github.com/elastic/elasticsearch/issues/82817
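The LUCENE-10582 commit message says the failure was an IllegalArgumentException from combining overridden (for example, distributed) collection statistics. As a rough illustration of how that class of error arises, the sketch below models per-field statistics with consistency checks similar in spirit to those Lucene's CollectionStatistics constructor enforces (docCount no larger than maxDoc, sumDocFreq at least docCount, sumTotalTermFreq at least sumDocFreq). The `Stats` record, the `merge` helper, and its max/sum combination rule are hypothetical and are not CombinedFieldQuery's actual code. The point is only that merging every statistic from the same source keeps the checks satisfied, while mixing a local maxDoc with a larger distributed docCount does not.

```java
// Hypothetical model of field-level collection statistics; not Lucene's CollectionStatistics.
public class CollectionStatsSketch {

  record Stats(long maxDoc, long docCount, long sumTotalTermFreq, long sumDocFreq) {
    // Consistency checks modeled on (not copied from) Lucene's CollectionStatistics.
    Stats {
      if (maxDoc <= 0) {
        throw new IllegalArgumentException("maxDoc must be positive");
      }
      if (docCount <= 0 || docCount > maxDoc) {
        throw new IllegalArgumentException("docCount must be in [1, maxDoc]");
      }
      if (sumDocFreq < docCount) {
        throw new IllegalArgumentException("sumDocFreq must be >= docCount");
      }
      if (sumTotalTermFreq < sumDocFreq) {
        throw new IllegalArgumentException("sumTotalTermFreq must be >= sumDocFreq");
      }
    }
  }

  // Assumed merge rule: max of the per-field counts, sum of the term frequencies.
  // All inputs come from the same source, so the merged result stays consistent.
  static Stats merge(Stats... perField) {
    long maxDoc = 0, docCount = 0, sumTotalTermFreq = 0, sumDocFreq = 0;
    for (Stats s : perField) {
      maxDoc = Math.max(maxDoc, s.maxDoc());
      docCount = Math.max(docCount, s.docCount());
      sumDocFreq = Math.max(sumDocFreq, s.sumDocFreq());
      sumTotalTermFreq += s.sumTotalTermFreq();
    }
    return new Stats(maxDoc, docCount, sumTotalTermFreq, sumDocFreq);
  }
}
```

For instance, pairing a shard-local maxDoc of 100 with a distributed docCount of 900 trips the docCount check immediately.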
[jira] [Resolved] (LUCENE-10582) CombinedFieldQuery fails with distributed field statistics
[ https://issues.apache.org/jira/browse/LUCENE-10582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julie Tibshirani resolved LUCENE-10582. --- Fix Version/s: 9.3 Resolution: Fixed
[jira] [Commented] (LUCENE-10582) CombinedFieldQuery fails with distributed field statistics
[ https://issues.apache.org/jira/browse/LUCENE-10582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544028#comment-17544028 ] ASF subversion and git services commented on LUCENE-10582: -- Commit 823df23ae122992ab2e91a0a35980cc7dacff8fe in lucene's branch refs/heads/branch_9x from Yannick Welsch [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=823df23ae12 ] LUCENE-10582: Fix merging of CollectionStatistics in CombinedFieldQuery (#910) CombinedFieldQuery does not properly combine overridden collection statistics, resulting in an IllegalArgumentException during searches.
[jira] [Commented] (LUCENE-10188) Give SortedSetDocValues a docValueCount()?
[ https://issues.apache.org/jira/browse/LUCENE-10188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544043#comment-17544043 ] Adrien Grand commented on LUCENE-10188: --- The value count on SortedSetDocValues should be the number of unique ords.
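To illustrate the difference using the document from the LUCENE-10598 test (values "a", "a", "b"), here is a small model of the two counting semantics. It is plain Java over lists of strings, not Lucene code, and it relies on the assumption that ords correspond one-to-one to distinct terms, so counting distinct values gives the number of unique ords.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of the two counting semantics; not Lucene code.
public class DocValueCountSemantics {

  // SORTED_NUMERIC-style count: every indexed value counts, duplicates included.
  static int sortedNumericStyleCount(List<String> docValues) {
    return docValues.size();
  }

  // SORTED_SET-style count: the number of unique ords. Since each distinct term
  // has exactly one ord, counting distinct values yields the same number.
  static int sortedSetStyleCount(List<String> docValues) {
    return new TreeSet<>(docValues).size();
  }

  public static void main(String[] args) {
    List<String> doc = List.of("a", "a", "b");
    System.out.println(sortedNumericStyleCount(doc)); // 3
    System.out.println(sortedSetStyleCount(doc)); // 2
  }
}
```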
[jira] [Commented] (LUCENE-10598) SortedSetDocValues#docValueCount() should be always greater than zero
[ https://issues.apache.org/jira/browse/LUCENE-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544045#comment-17544045 ] Adrien Grand commented on LUCENE-10598: --- Good catch, it's indeed broken.
[GitHub] [lucene] LuXugang closed pull request #928: LUCENE-10594: Duplicated value should be involve in SortedSetDocValues#docValueCount() in MemoryIndex
LuXugang closed pull request #928: LUCENE-10594: Duplicated value should be involve in SortedSetDocValues#docValueCount() in MemoryIndex URL: https://github.com/apache/lucene/pull/928
[jira] [Resolved] (LUCENE-10594) Duplicated value should be involve in SortedSetDocValues#docValueCount() in MemoryIndex
[ https://issues.apache.org/jira/browse/LUCENE-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lu Xugang resolved LUCENE-10594. Resolution: Not A Problem > Duplicated value should be involve in SortedSetDocValues#docValueCount() in > MemoryIndex > --- > > Key: LUCENE-10594 > URL: https://issues.apache.org/jira/browse/LUCENE-10594 > Project: Lucene - Core > Issue Type: Bug >Reporter: Lu Xugang >Priority: Trivial > Time Spent: 20m > Remaining Estimate: 0h > > Based on LUCENE-10188, SortedSetDocValues#docValueCount() in MemoryIndex should keep the same semantics, in which duplicated values are also included in the count. > Because `dvBytesRefHashValuesSet` only records the number of unique values, an additional `count` is needed.
[GitHub] [lucene] mayya-sharipova commented on pull request #926: VectorSimilarityFunction reverse removal
mayya-sharipova commented on PR #926: URL: https://github.com/apache/lucene/pull/926#issuecomment-1141499461 > The tie-breaking by Lucene docId (smaller id wins) doesn't actually work right now (for the regular similarities) and not for Euclidean after my changes. > The node ID and score encoding in the HEAP was broken for such edge cases in my opinion for the majority of similarity functions. Euclidean tests were green because of the reverse! @alessandrobenedetti Thanks for your investigation. I now understand why the tests were failing (e.g. advanceShallow): they were not completely correct to begin with (e.g. using COSINE similarity instead of Euclidean in this test would make the returned doc IDs different). > I have reverted this and proposed a possible solution. The proposed solution looks good to me.
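The tie-breaking rule quoted in the PR #926 discussion (on equal scores, the smaller Lucene doc ID wins) is easy to state as a comparator. The minimal sketch below uses a hypothetical `Hit` record and a plain sort rather than the packed node-ID/score heap encoding discussed in the PR.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TieBreakSketch {

  // Hypothetical search hit: a doc ID and a similarity score where higher is better.
  record Hit(int doc, float score) {}

  // Higher score first; on equal scores, the smaller doc ID wins.
  static final Comparator<Hit> BEST_FIRST =
      Comparator.comparingDouble((Hit h) -> h.score()).reversed().thenComparingInt(Hit::doc);

  // Returns the k best hits according to BEST_FIRST.
  static List<Hit> topK(List<Hit> hits, int k) {
    List<Hit> sorted = new ArrayList<>(hits);
    sorted.sort(BEST_FIRST);
    return sorted.subList(0, Math.min(k, sorted.size()));
  }
}
```

Calling `.reversed()` on the whole comparator would also flip the doc-ID tie-break, so a "reversed" similarity has to invert only its score order; that is one way tie-breaking can silently go wrong.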
[jira] [Commented] (LUCENE-10598) SortedSetDocValues#docValueCount() should be always greater than zero
[ https://issues.apache.org/jira/browse/LUCENE-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544079#comment-17544079 ] Robert Muir commented on LUCENE-10598: -- I think we should validate this count in CheckIndex, similar to the checks it does for SortedNumericDocValues#docValueCount().
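A CheckIndex-style check along the lines suggested for LUCENE-10598 could verify, for each document, that docValueCount() is positive and matches the number of unique ords actually returned. The sketch below runs against a hypothetical `DocOrds` interface standing in for SortedSetDocValues, uses -1 as the end-of-ords marker (an assumption of this sketch), and throws RuntimeException on failure; it is not CheckIndex's actual code.

```java
// Hypothetical CheckIndex-style validation; DocOrds stands in for SortedSetDocValues.
public class SortedSetCountCheck {

  interface DocOrds {
    int docValueCount();

    // Next ord of the current document, or -1 when exhausted (assumed sentinel).
    long nextOrd();
  }

  // Checks one document: at least one value, strictly increasing (hence unique)
  // ords, and docValueCount() equal to the number of ords actually returned.
  static void checkDoc(int doc, DocOrds ords) {
    int count = ords.docValueCount();
    if (count <= 0) {
      throw new RuntimeException("docValueCount()=" + count + " for doc " + doc + " must be > 0");
    }
    long previous = -1;
    int seen = 0;
    for (long ord = ords.nextOrd(); ord != -1; ord = ords.nextOrd()) {
      if (ord <= previous) {
        throw new RuntimeException("ords out of order for doc " + doc);
      }
      previous = ord;
      seen++;
    }
    if (seen != count) {
      throw new RuntimeException(
          "docValueCount()=" + count + " but " + seen + " ords for doc " + doc);
    }
  }

  // Array-backed DocOrds for trying the check out.
  static DocOrds of(int reportedCount, long... ords) {
    return new DocOrds() {
      int upto = 0;

      public int docValueCount() {
        return reportedCount;
      }

      public long nextOrd() {
        return upto < ords.length ? ords[upto++] : -1;
      }
    };
  }
}
```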
[GitHub] [lucene] jtibshirani commented on pull request #926: VectorSimilarityFunction reverse removal
jtibshirani commented on PR #926: URL: https://github.com/apache/lucene/pull/926#issuecomment-1141529401 Thanks @alessandrobenedetti for working on this nice simplification! One thing I wanted to double-check -- for Euclidean distance we're now performing an extra division compared to before. I don't think this will have any significant impact on performance, and shouldn't affect the unrolling optimization added in LUCENE-10453, but I'm not 100% sure 🤔 Our nightly benchmarks only test `VectorSimilarityFunction.DOT_PRODUCT` so we wouldn't be able to catch a difference.
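To show where an extra division for Euclidean distance can come from, the sketch below contrasts raw squared Euclidean distance (lower is better, which is what a "reversed" function ranks by) with a higher-is-better score. It assumes the score is 1 / (1 + squaredDistance); treat that formula, and the class and method names, as assumptions of this sketch rather than the exact code in PR #926.

```java
public class EuclideanScoreSketch {

  // Squared Euclidean distance: lower means more similar.
  static float squareDistance(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  // Assumed non-reversed score: one extra division turns "lower is better" into
  // "higher is better", preserves the ranking, and bounds the score to (0, 1].
  static float score(float[] a, float[] b) {
    return 1f / (1f + squareDistance(a, b));
  }
}
```

Because the mapping is strictly decreasing in the distance, the ranking is unchanged; the cost is one division per comparison.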
[GitHub] [lucene] LuXugang commented on pull request #928: LUCENE-10594: Duplicated value should be involve in SortedSetDocValues#docValueCount() in MemoryIndex
LuXugang commented on PR #928: URL: https://github.com/apache/lucene/pull/928#issuecomment-1141622393 According to the explanation in [LUCENE-10188](https://issues.apache.org/jira/browse/LUCENE-10188), this issue is not a problem, so I am closing this PR.
[jira] [Commented] (LUCENE-10594) Duplicated value should be involve in SortedSetDocValues#docValueCount() in MemoryIndex
[ https://issues.apache.org/jira/browse/LUCENE-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544105#comment-17544105 ] Lu Xugang commented on LUCENE-10594: According to the explanation in LUCENE-10188, this issue is not a problem.
[jira] [Comment Edited] (LUCENE-10596) Remove unused parameter in #getOrAddPerField
[ https://issues.apache.org/jira/browse/LUCENE-10596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543754#comment-17543754 ] tangdh edited comment on LUCENE-10596 at 5/31/22 5:36 AM: -- I have raised a [PR|https://github.com/apache/lucene/pull/930],:D [~mikemccand] was (Author: JIRAUSER290177): I have raised a [PR|https://github.com/apache/lucene/pull/930],:D > Remove unused parameter in #getOrAddPerField > > > Key: LUCENE-10596 > URL: https://issues.apache.org/jira/browse/LUCENE-10596 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Affects Versions: 9.2 >Reporter: tangdh >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > I noticed that the parameter fieldType is no longer used in the method > getOrAddPerField(indexingChain.java:773), do we need to remove it? If so, I'd > be happy to raise a PR
[jira] [Commented] (LUCENE-10596) Remove unused parameter in #getOrAddPerField
[ https://issues.apache.org/jira/browse/LUCENE-10596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544156#comment-17544156 ] ASF subversion and git services commented on LUCENE-10596: -- Commit 40e9e5a00d51d82ab4344565b3b8d8712c8029d6 in lucene's branch refs/heads/main from tang donghai [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=40e9e5a00d5 ] LUCENE-10596: Remove unused parameter in #getOrAddPerField (#930)
[GitHub] [lucene] mocobeta merged pull request #930: LUCENE-10596: Remove unused parameter in #getOrAddPerField
mocobeta merged PR #930: URL: https://github.com/apache/lucene/pull/930
[jira] [Commented] (LUCENE-10596) Remove unused parameter in #getOrAddPerField
[ https://issues.apache.org/jira/browse/LUCENE-10596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544157#comment-17544157 ] ASF subversion and git services commented on LUCENE-10596: -- Commit 67e9f33f5aba34fde1b7c9232bd5a884e524420a in lucene's branch refs/heads/branch_9x from tang donghai [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=67e9f33f5ab ] LUCENE-10596: Remove unused parameter in #getOrAddPerField (#930)
[GitHub] [lucene] kaivalnp opened a new pull request, #932: LUCENE-10559: Add Prefilter Option to KnnGraphTester
kaivalnp opened a new pull request, #932: URL: https://github.com/apache/lucene/pull/932
### Description
Link to [Jira](https://issues.apache.org/jira/browse/LUCENE-10559)
### Solution
Added `prefilter` and `filterSelectivity` arguments to KnnGraphTester so that pre-filtering and post-filtering benchmarks can be compared.
`filterSelectivity` expresses the selectivity of a filter as the proportion of documents that pass it; the passing documents are selected at random. We store them in a FixedBitSet and use it both to calculate the true KNN results and in the HNSW search.
In the post-filter case, we over-select `topK / filterSelectivity` results so that the number of final hits is close to the requested `topK`.
In the pre-filter case, we wrap the FixedBitSet in a query and pass it as the prefilter argument to KnnVectorQuery.
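The benchmark setup described in PR #932 can be sketched in a few lines. This is not KnnGraphTester's code: `java.util.BitSet` stands in for Lucene's FixedBitSet, the method names are hypothetical, and rounding `topK / filterSelectivity` up is an assumption. It shows a random filter with a given selectivity, the over-selection for post-filtering, and the post-filter step itself.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Random;

// Hypothetical helpers; java.util.BitSet stands in for Lucene's FixedBitSet.
public class FilterSelectivitySketch {

  // Each doc passes the filter independently with probability `selectivity`.
  static BitSet randomFilter(int numDocs, double selectivity, long seed) {
    Random random = new Random(seed);
    BitSet filter = new BitSet(numDocs);
    for (int doc = 0; doc < numDocs; doc++) {
      if (random.nextDouble() < selectivity) {
        filter.set(doc);
      }
    }
    return filter;
  }

  // Post-filtering: request more candidates than topK so that roughly topK
  // survive the filter. Rounding up is an assumption of this sketch.
  static int overSelect(int topK, double selectivity) {
    return (int) Math.ceil(topK / selectivity);
  }

  // Keep, in rank order, the candidates that pass the filter, up to topK of them.
  static List<Integer> postFilter(int[] rankedDocs, BitSet filter, int topK) {
    List<Integer> hits = new ArrayList<>();
    for (int doc : rankedDocs) {
      if (hits.size() == topK) {
        break;
      }
      if (filter.get(doc)) {
        hits.add(doc);
      }
    }
    return hits;
  }
}
```

For example, with `topK` = 10 and a selectivity of 0.25, the post-filter run would request 40 candidates. Pre-filtering (wrapping the bit set in a query passed to KnnVectorQuery) is not modeled here.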
[jira] [Resolved] (LUCENE-10596) Remove unused parameter in #getOrAddPerField
[ https://issues.apache.org/jira/browse/LUCENE-10596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomoko Uchida resolved LUCENE-10596. Fix Version/s: 10.0 (main) 9.3 Resolution: Fixed