date:20220530

[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

2022-05-30 Thread GitBox



uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1140776739

   Thanks. Here we only need to compile a specific exact version (19). So for 
the MR-JAR task this should be fine with auto-discovery.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] uschindler commented on pull request #912: MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23)

2022-05-30 Thread GitBox



uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1140807127

   Thanks @mocobeta,
   I just need to remember to re-enable this in September when this gets merged 
(then we can also use auto-provisioning of JDK 19).



[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira?

2022-05-30 Thread Jira



[ 
https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543849#comment-17543849
 ] 

Jan Høydahl commented on LUCENE-10557:
--

Thanks for the thorough research! I propose to skip bulk migration of Jira issues 
and instead bulk comment on all open JIRAs, prompting the reporter or assignee 
to take whatever action they see fit. Some will choose to migrate, others will 
finalize the feature in JIRA, quite a few will probably be closed as won't-do, 
and some will just remain open/stale/don't care. No history will be lost; we 
can still refer and link to historic Jira issues, as long as ASF keeps Jira 
alive.

> Migrate to GitHub issue from Jira?
> --
>
> Key: LUCENE-10557
> URL: https://issues.apache.org/jira/browse/LUCENE-10557
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Priority: Major
>
> A few (not the majority) Apache projects already use the GitHub issue instead 
> of Jira. For example,
> Airflow: [https://github.com/apache/airflow/issues]
> BookKeeper: [https://github.com/apache/bookkeeper/issues]
> So I think it'd be technically possible that we move to GitHub issue. I have 
> little knowledge of how to proceed with it, I'd like to discuss whether we 
> should migrate to it, and if so, how to smoothly handle the migration.
> The major tasks would be:
>  * Get a consensus about the migration among committers
>  * Enable Github issue on the lucene's repository (currently, it is disabled 
> on it)
>  * Build the convention or rules for issue label/milestone management
>  * Choose issues that should be moved to GitHub (I think too old or obsolete 
> issues can remain Jira.)



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [lucene-solr] janhoy merged pull request #2661: SOLR-16213 Upgrade Jackson to version 2.13.3

2022-05-30 Thread GitBox



janhoy merged PR #2661:
URL: https://github.com/apache/lucene-solr/pull/2661



[jira] [Commented] (LUCENE-10574) Remove O(n^2) from TieredMergePolicy or change defaults to one that doesn't do this

2022-05-30 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543852#comment-17543852
 ] 

ASF subversion and git services commented on LUCENE-10574:
--

Commit 318177af83efc99b6c05412cc8ef0ade15c92f6c in lucene's branch 
refs/heads/main from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=318177af83e ]

LUCENE-10574: Fix TestTieredMergePolicy's expectations about the segment count.


> Remove O(n^2) from TieredMergePolicy or change defaults to one that doesn't 
> do this
> ---
>
> Key: LUCENE-10574
> URL: https://issues.apache.org/jira/browse/LUCENE-10574
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>Priority: Major
> Fix For: 9.3
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Remove {{floorSegmentBytes}} parameter, or change lucene's default to a merge 
> policy that doesn't merge in an O(n^2) way.
> I have the feeling it might have to be the latter, as folks seem really wed 
> to this crazy O(n^2) behavior.




[jira] [Commented] (LUCENE-10574) Remove O(n^2) from TieredMergePolicy or change defaults to one that doesn't do this

2022-05-30 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543851#comment-17543851
 ] 

ASF subversion and git services commented on LUCENE-10574:
--

Commit 4b63460d2de17a2a90ef560f5ca9035b7ad1fa08 in lucene's branch 
refs/heads/branch_9x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=4b63460d2de ]

LUCENE-10574: Fix TestTieredMergePolicy's expectations about the segment count.



[jira] [Commented] (LUCENE-10151) Add timeout support to IndexSearcher

2022-05-30 Thread Deepika Sharma (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543885#comment-17543885
 ] 

Deepika Sharma commented on LUCENE-10151:
-

I have raised a PR for the `BulkScorer` approach. Please take a look and let me 
know your thoughts.

> Add timeout support to IndexSearcher
> 
>
> Key: LUCENE-10151
> URL: https://issues.apache.org/jira/browse/LUCENE-10151
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Greg Miller
>Priority: Minor
>
> I'd like to explore adding optional "timeout" capabilities to 
> {{IndexSearcher}}. This would enable users to (optionally) specify a maximum 
> time budget for search execution. If the search "times out", partial results 
> would be available.
> This idea originated on the dev list (thanks [~jpountz] for the suggestion). 
> Thread for reference: 
> [http://mail-archives.apache.org/mod_mbox/lucene-dev/202110.mbox/%3CCAL8PwkZdNGmYJopPjeXYK%3DF7rvLkWon91UEXVxMM4MeeJ3UHxQ%40mail.gmail.com%3E]
>  
> A couple things to watch out for with this change:
>  # We want to make sure it's robust to a two-phase query evaluation scenario 
> where the "approximate" step matches a large number of candidates but the 
> "confirmation" step matches very few (or none). This is a particularly tricky 
> case.
>  # We want to make sure the {{TotalHits#Relation}} reported by {{TopDocs}} is 
> {{GREATER_THAN_OR_EQUAL_TO}} if the query times out
>  # We want to make sure it plays nice with the {{LRUCache}} since it iterates 
> the query to pre-populate a {{BitSet}} when caching. That step shouldn't be 
> allowed to overrun the timeout. The proper way to handle this probably needs 
> some thought.
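
The first two cautions in the description can be pictured with a minimal sketch. It is plain Java and only an illustration: `TimeBudgetSketch`, `countWithBudget`, `confirm`, and the injected clock are hypothetical names, not Lucene APIs, and the `Relation` enum merely borrows the two constant names mentioned above. The point it tries to show is that the deadline is checked per candidate rather than per confirmed hit, so a query whose "confirmation" step rejects almost everything can still time out, and that a timed-out count is reported as a lower bound.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch of a time-budgeted match loop; none of these names are Lucene APIs. */
class TimeBudgetSketch {
  enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

  /** Hit count plus how it relates to the true number of matches. */
  static final class Result {
    final int hits;
    final Relation relation;

    Result(int hits, Relation relation) {
      this.hits = hits;
      this.relation = relation;
    }
  }

  /** Stand-in for the expensive "confirmation" step of a two-phase query (assumption). */
  static boolean confirm(int candidate) {
    return candidate % 3 == 0;
  }

  static Result countWithBudget(int[] candidates, long budget, LongSupplier clock) {
    long deadline = clock.getAsLong() + budget;
    int hits = 0;
    for (int candidate : candidates) {
      // Check the clock for every candidate, even ones that will not be confirmed.
      if (clock.getAsLong() - deadline > 0) {
        // Timed out: the hits seen so far are only a lower bound.
        return new Result(hits, Relation.GREATER_THAN_OR_EQUAL_TO);
      }
      if (confirm(candidate)) {
        hits++;
      }
    }
    return new Result(hits, Relation.EQUAL_TO);
  }
}
```

Passing the clock in as a `LongSupplier` (for example `System::nanoTime`) is a design choice made here only so the behavior can be exercised deterministically.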




[jira] [Comment Edited] (LUCENE-10151) Add timeout support to IndexSearcher

2022-05-30 Thread Deepika Sharma (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543885#comment-17543885
 ] 

Deepika Sharma edited comment on LUCENE-10151 at 5/30/22 11:08 AM:
---

I have raised a PR for the {{BulkScorer}} approach. Please take a look and let 
me know your thoughts.


was (Author: JIRAUSER288832):
I have raised a PR for the `BulkScorer` approach. Please take a look and let me 
know your thoughts.


[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira?

2022-05-30 Thread Tomoko Uchida (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543911#comment-17543911
 ] 

Tomoko Uchida commented on LUCENE-10557:


Hi [~janhoy], thanks for your suggestion.

"Migrate no issues and start from fresh" is definitely an option, on the other 
hand, many issues that are worth revisiting may remain Jira forever - I feel a 
bit sorry for them. If unresolved issues are on GitHub (even just the title and 
description) from the start, possible contributors will be able to 
browse/search them. However, I think we can discuss it later.


[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira?

2022-05-30 Thread Tomoko Uchida (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543911#comment-17543911
 ] 

Tomoko Uchida edited comment on LUCENE-10557 at 5/30/22 11:52 AM:
--

Hi [~janhoy], thanks for your suggestion.

"Migrate no issues and make a fresh start" is definitely an option, on the 
other hand, many issues that are worth revisiting may remain Jira forever - I 
feel a bit sorry for them. If unresolved issues are on GitHub (even just the 
title and description) from the start, possible contributors will be able to 
browse/search them. However, I think we can discuss it later.


was (Author: tomoko uchida):
Hi [~janhoy], thanks for your suggestion.

"Migrate no issues and start from fresh" is definitely an option, on the other 
hand, many issues that are worth revisiting may remain Jira forever - I feel a 
bit sorry for them. If unresolved issues are on GitHub (even just the title and 
description) from the start, possible contributors will be able to 
browse/search them. However, I think we can discuss it later.


[GitHub] [lucene] shaie commented on pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities

2022-05-30 Thread GitBox



shaie commented on PR #841:
URL: https://github.com/apache/lucene/pull/841#issuecomment-1141152098

   Hey @mdmarshmallow I think this is a great and very useful feature. I also 
believe that in general it will be good to accompany these changes with a demo 
`main()` in the `demo` package, but it can wait a bit until we have a solid 
API.  I've added to this PR an `.adoc` with a few example use cases. IMO it will 
be useful to keep it around as documentation of this feature, modifying it of 
course per the feedback we receive. If for some reason we decide that this 
document is redundant or will be hard to maintain and we want to stick with 
javadocs, I don't mind if we delete it in the end. For now I think it's a 
convenient place to document our thoughts, examples and APIs.
   
   I used the term `FacetSets` to denote "a set of values that go together". 
Other names may include `Tuple`, `Group` etc. I know naming is the hardest part 
:). In my mind I'm also thinking about an API like:
   
   ```
   doc.add(new FacetSetsField(
       "actorAwards",
       // A Thriller for which this actor received a Best Actor Oscar award in 2022
       new FacetSet(ord("Oscar"), ord("Best Actor"), ord("Thriller"), 2022),
       // A Drama for which this actor received a Best Supporting Actor Emmy award in 2005
       new FacetSet(ord("Emmy"), ord("Best Supporting Actor"), ord("Drama"), 2005)
   ));
   ```
   
   Yes, it could be just a sugar API on top of `HyperRectangle`, but from a 
faceting perspective it might make more sense and be more consistent with the 
other faceting APIs (`RangeFacets`, `SSDVFacetField` etc.). I'd love to receive 
feedback on the use cases. I can also add to the document a 
more-than-pseudocode-like example which will include the indexing and 
aggregation API, so we have something more concrete to discuss.



[GitHub] [lucene] alessandrobenedetti commented on pull request #926: VectorSimilarityFunction reverse removal

2022-05-30 Thread GitBox



alessandrobenedetti commented on PR #926:
URL: https://github.com/apache/lucene/pull/926#issuecomment-1141164970

   @mayya-sharipova, @msokolov I found out the reason the original tests are 
now failing.
   The tie-breaking by Lucene docId (smaller id wins) doesn't actually work 
right now for the regular similarities, nor for Euclidean after my changes.
   
   The reason is the way we encode the heap value from the nodeId and the score:
   
   private long encode(int node, float score) {
     return order.apply((((long) NumericUtils.floatToSortableInt(score)) << 32) | node);
   }
 
   With this encoding, a higher node ID wins.
   
   Tests for cosine and dot-product are slightly different from the Euclidean 
one and don't check tie-breaking.
   I am taking a look right now to fix it.
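
   To make the tie-breaking observation concrete, here is a minimal standalone sketch of the same bit layout (score in the upper 32 bits, node ID in the lower 32 bits). It is not the actual `NeighborQueue` code: the class name is made up, the `order.apply` step is left out, and `floatToSortableInt` is a local stand-in written to mirror what `NumericUtils.floatToSortableInt` is understood to do.

```java
/** Illustrative sketch of the heap encoding discussed above; not the real NeighborQueue. */
class NeighborEncodingSketch {
  // Local stand-in (assumption) for NumericUtils.floatToSortableInt:
  // maps floats to ints whose signed order matches the float order.
  static int floatToSortableInt(float value) {
    int bits = Float.floatToIntBits(value);
    return bits ^ ((bits >> 31) & 0x7fffffff);
  }

  // Score in the upper 32 bits, node ID in the lower 32 bits (non-negative node IDs assumed).
  static long encode(int node, float score) {
    return (((long) floatToSortableInt(score)) << 32) | node;
  }

  public static void main(String[] args) {
    long lowNode = encode(3, 0.5f);
    long highNode = encode(7, 0.5f);
    // Equal scores: the larger node ID yields the larger encoded value.
    System.out.println(highNode > lowNode); // prints true
  }
}
```

   Because the node ID sits in the low bits, any comparison that prefers larger encoded values also prefers the larger node ID whenever two scores are equal, which is the opposite of "smaller id wins".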


[jira] [Created] (LUCENE-10597) Move globalMaxScore to MaxScoreCache from ImpactsDISI?

2022-05-30 Thread Tomoko Uchida (Jira)

Tomoko Uchida created LUCENE-10597:
--

 Summary: Move globalMaxScore to MaxScoreCache from ImpactsDISI?
 Key: LUCENE-10597
 URL: https://issues.apache.org/jira/browse/LUCENE-10597
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Tomoko Uchida


All max score calculations are done in {{MaxScoreCache}} except for 
{{globalMaxScore}}, which resides in {{ImpactsDISI}}.

Perhaps it would be clearer to have this global max score value in 
{{MaxScoreCache}}?




[GitHub] [lucene] mocobeta opened a new pull request, #931: LUCENE-10597: move globalMaxScore to MaxScoreCache

2022-05-30 Thread GitBox



mocobeta opened a new pull request, #931:
URL: https://github.com/apache/lucene/pull/931

   ### Description (or a Jira issue link if you have one)
   See https://issues.apache.org/jira/browse/LUCENE-10597


[GitHub] [lucene] mocobeta commented on a diff in pull request #931: LUCENE-10597: move globalMaxScore to MaxScoreCache

2022-05-30 Thread GitBox



mocobeta commented on code in PR #931:
URL: https://github.com/apache/lucene/pull/931#discussion_r884855095


##
lucene/core/src/java/org/apache/lucene/search/MaxScoreCache.java:
##
@@ -80,6 +82,9 @@ int getLevel(int upTo) throws IOException {
 
   /** Return the maximum score for the given {@code level}. */
   float getMaxScoreForLevel(int level) throws IOException {
+if (level < 0) {

Review Comment:
   The original expression was `level == -1`. "Smaller than zero" might be 
better here since there is an assumption that `level` is greater than or equal 
to zero; I have no strong opinion on it though. 




[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira?

2022-05-30 Thread Tomoko Uchida (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543968#comment-17543968
 ] 

Tomoko Uchida commented on LUCENE-10557:


One feature I couldn't find out in GitHub issue that is naturally equipped in 
Jira is the parent-child issue hierarchy. We can have an umbrella issue and tie 
any number of sub-issues to it with Jira but cannot in GitHub; I imagine 
"GitHub Project" is used for such issue grouping purpose (haven't used it) 
though, it'd be too much for us I think.


[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira?

2022-05-30 Thread Tomoko Uchida (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543968#comment-17543968
 ] 

Tomoko Uchida edited comment on LUCENE-10557 at 5/30/22 2:24 PM:
-

One feature I couldn't find out in GitHub issue that is naturally equipped in 
Jira is the parent-child issue hierarchy. We can have an umbrella issue and tie 
any number of sub-issues to it with Jira but cannot with GitHub issue; I 
imagine "GitHub Project" is used for such issue grouping purpose (haven't used 
it) though, it'd be too much for us I think.


was (Author: tomoko uchida):
One feature I couldn't find out in GitHub issue that is naturally equipped in 
Jira is the parent-child issue hierarchy. We can have an umbrella issue and tie 
any number of sub-issues to it with Jira but cannot in GitHub; I imagine 
"GitHub Project" is used for such issue grouping purpose (haven't used it) 
though, it'd be too much for us I think.


[GitHub] [lucene] alessandrobenedetti commented on a diff in pull request #926: VectorSimilarityFunction reverse removal

2022-05-30 Thread GitBox



alessandrobenedetti commented on code in PR #926:
URL: https://github.com/apache/lucene/pull/926#discussion_r884908511


##
lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java:
##
@@ -193,25 +208,40 @@ public void testAdvanceShallow() throws IOException {
   }
   try (IndexReader reader = DirectoryReader.open(d)) {
 IndexSearcher searcher = new IndexSearcher(reader);
-KnnVectorQuery query = new KnnVectorQuery("field", new float[] {2, 3}, 3);
+KnnVectorQuery query = new KnnVectorQuery("field", new float[] {0.5f, 1}, 3);

Review Comment:
   I have reverted this and proposed a possible solution.
   In my opinion, the node ID and score encoding in the heap was broken for such 
edge cases for the majority of similarity functions.
   Euclidean tests were green only because of the reverse!




[GitHub] [lucene] alessandrobenedetti commented on a diff in pull request #926: VectorSimilarityFunction reverse removal

2022-05-30 Thread GitBox



alessandrobenedetti commented on code in PR #926:
URL: https://github.com/apache/lucene/pull/926#discussion_r884909389


##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborQueue.java:
##
@@ -56,9 +56,9 @@ long apply(long v) {
   // Whether the search stopped early because it reached the visited nodes limit
   private boolean incomplete;
 
-  public NeighborQueue(int initialSize, boolean reversed) {
+  public NeighborQueue(int initialSize, boolean descOrder) {

Review Comment:
   fixed!




[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira?

2022-05-30 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543987#comment-17543987
 ] 

Dawid Weiss commented on LUCENE-10557:
--

I don't think this is a problem. You just create a description with a bullet 
list and reference related issues - they do show up in mentions, I think this 
is sufficient.


[jira] [Commented] (LUCENE-10188) Give SortedSetDocValues a docValueCount()?

2022-05-30 Thread Lu Xugang (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543989#comment-17543989
 ] 

Lu Xugang commented on LUCENE-10188:


Hi [~jpountz], the implementation of SortedNumericDocValues#docValueCount() 
returns the number of values indexed per doc, and duplicate values are included 
in the count.

Does SortedSetDocValues#docValueCount() have the same semantics? Or do you mean 
it should only count the number of unique values (ords)?
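
The two possible readings of the question can be illustrated with a small sketch in plain Java collections. It does not use Lucene at all; `DocValueCountSketch` and both method names are made up for illustration, and the list simply stands for the values added to one document's field.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Illustration only: two ways to count the values of one document. */
class DocValueCountSketch {
  // Counts every indexed value, duplicates included.
  static int countAllValues(List<String> values) {
    return values.size();
  }

  // Counts distinct values only, as a per-document set of ords would.
  static int countUniqueValues(List<String> values) {
    return new TreeSet<>(values).size();
  }

  public static void main(String[] args) {
    List<String> values = Arrays.asList("a", "a", "b");
    System.out.println(countAllValues(values));    // prints 3
    System.out.println(countUniqueValues(values)); // prints 2
  }
}
```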

> Give SortedSetDocValues a docValueCount()?
> --
>
> Key: LUCENE-10188
> URL: https://issues.apache.org/jira/browse/LUCENE-10188
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 10.0 (main), 9.2
>
>
> Theoretically SortedSetDocValues gives more options to codecs with regard to 
> how SORTED_SET doc values could store ords. However in practice we currently 
> always store counts. Maybe giving SORTED_SET doc values an API that is closer 
> to the API of SORTED_NUMERIC doc values would be a better trade-off?




[jira] [Created] (LUCENE-10598) SortedSetDocValues#docValueCount() should be always greater than zero

2022-05-30 Thread Lu Xugang (Jira)

Lu Xugang created LUCENE-10598:
--

 Summary: SortedSetDocValues#docValueCount() should be always 
greater than zero
 Key: LUCENE-10598
 URL: https://issues.apache.org/jira/browse/LUCENE-10598
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Lu Xugang


This test fails.

{code:java}
public void testDocValueCount() throws IOException {
  try (Directory d = newDirectory()) {
    try (IndexWriter w = new IndexWriter(d, new IndexWriterConfig())) {
      for (int j = 0; j < 1; j++) {
        Document doc = new Document();
        // "a" is added twice on purpose, so the doc has a duplicate value.
        doc.add(new SortedSetDocValuesField("field", new BytesRef("a")));
        doc.add(new SortedSetDocValuesField("field", new BytesRef("a")));
        doc.add(new SortedSetDocValuesField("field", new BytesRef("b")));
        w.addDocument(doc);
      }
    }
    try (IndexReader reader = DirectoryReader.open(d)) {
      assertEquals(1, reader.leaves().size());
      for (LeafReaderContext leaf : reader.leaves()) {
        SortedSetDocValues docValues = leaf.reader().getSortedSetDocValues("field");
        for (int doc1 = docValues.nextDoc();
            doc1 != DocIdSetIterator.NO_MORE_DOCS;
            doc1 = docValues.nextDoc()) {
          assert docValues.docValueCount() > 0;
        }
      }
    }
  }
}
{code}







[jira] [Commented] (LUCENE-10598) SortedSetDocValues#docValueCount() should be always greater than zero

2022-05-30 Thread Lu Xugang (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543994#comment-17543994
 ] 

Lu Xugang commented on LUCENE-10598:


Once [~jpountz] confirms the question raised in LUCENE-10188, I will fix it.

> SortedSetDocValues#docValueCount() should be always greater than zero
> -
>
> Key: LUCENE-10598
> URL: https://issues.apache.org/jira/browse/LUCENE-10598
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Lu Xugang
>Priority: Major
>
> This test runs failed.
> {code:java}
>   public void testDocValueCount() throws IOException {
>   try (Directory d = newDirectory()) {
> try (IndexWriter w = new IndexWriter(d, new IndexWriterConfig())) {
>   for (int j = 0; j < 1; j++) {
> Document doc = new Document();
> doc.add(new SortedSetDocValuesField("field", new BytesRef("a")));
> doc.add(new SortedSetDocValuesField("field", new BytesRef("a")));
> doc.add(new SortedSetDocValuesField("field", new BytesRef("b")));
> w.addDocument(doc);
>   }
> }
> try (IndexReader reader = DirectoryReader.open(d)) {
>   assertEquals(1, reader.leaves().size());
>   for (LeafReaderContext leaf : reader.leaves()) {
> SortedSetDocValues docValues= 
> leaf.reader().getSortedSetDocValues("field") ;
> for (int doc1 = docValues.nextDoc(); doc1 != 
> DocIdSetIterator.NO_MORE_DOCS; doc1 = docValues.nextDoc()) {
>   assert docValues.docValueCount() > 0;
> }
>   }
> }
> }
>   }
> {code}




[jira] [Comment Edited] (LUCENE-10188) Give SortedSetDocValues a docValueCount()?

2022-05-30 Thread Lu Xugang (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543989#comment-17543989
 ] 

Lu Xugang edited comment on LUCENE-10188 at 5/30/22 3:53 PM:
-

Hi [~jpountz], the implementation of SortedNumericDocValues#docValueCount() 
returns the number of values indexed per doc, and duplicate values are also 
counted.

Does SortedSetDocValues#docValueCount() have the same semantics (currently they 
have the same javadoc description)? Or do you mean that it should only count 
the number of unique values (ords)?


was (Author: chrislu):
Hi, [~jpountz] ,  implementation of SortedNumericDocValues#docValueCount() 
means numbers of values indexed per doc, duplicated value was also participated 
in counting.

Does SortedSetDocValues#docValueCount() has the same semantic ？ Or maybe you 
means only calculate the number of unique values(ord)?

> Give SortedSetDocValues a docValueCount()?
> --
>
> Key: LUCENE-10188
> URL: https://issues.apache.org/jira/browse/LUCENE-10188
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 10.0 (main), 9.2
>
>
> Theoretically SortedSetDocValues gives more options to codecs with regard to 
> how SORTED_SET doc values could store ords. However in practice we currently 
> always store counts. Maybe giving SORTED_SET doc values an API that is closer 
> to the API of SORTED_NUMERIC doc values would be a better trade-off?




[jira] [Created] (LUCENE-10599) Improve LogMergePolicy's handling of maxMergeSize

2022-05-30 Thread Adrien Grand (Jira)

Adrien Grand created LUCENE-10599:
-

 Summary: Improve LogMergePolicy's handling of maxMergeSize
 Key: LUCENE-10599
 URL: https://issues.apache.org/jira/browse/LUCENE-10599
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand


LogMergePolicy excludes from merging segments whose size is greater than or 
equal to maxMergeSize. Since a segment whose size is maxMergeSize-1 is still 
considered for merging, segments will effectively reach a size somewhere 
between maxMergeSize and mergeFactor*maxMergeSize before they are not 
considered for merging anymore.

At least this is what I thought. When LogMergePolicy ignores a segment that is 
too large for merging, it also ignores other segments that are in the same 
window of mergeFactor segments for merging if they are on the same tier. So 
actually segments might reach a size that is somewhere between maxMergeSize / 
mergeFactor^0.75 and maxMergeSize * mergeFactor before they are not considered 
for merging anymore.

Assuming a merge factor of 10 and a max merge size of 1,000, this means that 
segments will reach their maximum size somewhere between 178 and 10,000. This 
range is too large and makes maxMergeSize too hard to reason about.

Specifically, if you have 10 999-docs segments, then LogDocMergePolicy will 
happily merge them into a single 9,990-docs segment. However, if you have one 
1,000-docs segment and 9 180-docs segments, then the 180-docs segments will not 
get merged with any other segment, even if you keep adding segments to the index.

I propose to change this behavior so that when a large segment is encountered, 
then we wouldn't skip the entire window of mergeFactor segments, but just the 
segments that are too large.
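
To make the arithmetic above concrete, here is a minimal sketch that recomputes the two bounds quoted in the description. The class and method names are hypothetical and are not part of LogMergePolicy; only the formulas maxMergeSize / mergeFactor^0.75 and maxMergeSize * mergeFactor come from the text.

```java
// Hypothetical helper, not Lucene code: it only recomputes the bounds
// stated in the issue description.
class MergeSizeBounds {

  // Lower bound from the description: maxMergeSize / mergeFactor^0.75,
  // rounded to the nearest whole number.
  static long lowerBound(long maxMergeSize, int mergeFactor) {
    return Math.round(maxMergeSize / Math.pow(mergeFactor, 0.75));
  }

  // Upper bound from the description: maxMergeSize * mergeFactor.
  static long upperBound(long maxMergeSize, int mergeFactor) {
    return maxMergeSize * mergeFactor;
  }

  public static void main(String[] args) {
    long max = 1_000;
    int factor = 10;
    System.out.println(lowerBound(max, factor) + " .. " + upperBound(max, factor));
  }
}
```

With a merge factor of 10 and a max merge size of 1,000 this reproduces the 178 to 10,000 range mentioned above.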




[GitHub] [lucene] jtibshirani merged pull request #910: LUCENE-10582: Fix merging of CollectionStatistics in CombinedFieldQuery

2022-05-30 Thread GitBox



jtibshirani merged PR #910:
URL: https://github.com/apache/lucene/pull/910



[jira] [Commented] (LUCENE-10582) CombinedFieldQuery fails with distributed field statistics

2022-05-30 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544026#comment-17544026
 ] 

ASF subversion and git services commented on LUCENE-10582:
--

Commit e319a5223cac757f4d9c7a80d3b0587370f8aa5f in lucene's branch 
refs/heads/main from Yannick Welsch
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=e319a5223ca ]

LUCENE-10582: Fix merging of CollectionStatistics in CombinedFieldQuery (#910)

CombinedFieldQuery does not properly combine overridden collection statistics, 
resulting in an IllegalArgumentException during searches.

> CombinedFieldQuery fails with distributed field statistics
> --
>
> Key: LUCENE-10582
> URL: https://issues.apache.org/jira/browse/LUCENE-10582
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/sandbox
>Reporter: Yannick Welsch
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> CombinedFieldQuery does not properly combine distributed collection 
> statistics, resulting in an IllegalArgumentException during searches.
> Originally surfaced in this Elasticsearch issue: 
> https://github.com/elastic/elasticsearch/issues/82817




[jira] [Resolved] (LUCENE-10582) CombinedFieldQuery fails with distributed field statistics

2022-05-30 Thread Julie Tibshirani (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julie Tibshirani resolved LUCENE-10582.
---
Fix Version/s: 9.3
   Resolution: Fixed

> CombinedFieldQuery fails with distributed field statistics
> --
>
> Key: LUCENE-10582
> URL: https://issues.apache.org/jira/browse/LUCENE-10582
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/sandbox
>Reporter: Yannick Welsch
>Priority: Minor
> Fix For: 9.3
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> CombinedFieldQuery does not properly combine distributed collection 
> statistics, resulting in an IllegalArgumentException during searches.
> Originally surfaced in this Elasticsearch issue: 
> https://github.com/elastic/elasticsearch/issues/82817




[jira] [Commented] (LUCENE-10582) CombinedFieldQuery fails with distributed field statistics

2022-05-30 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544028#comment-17544028
 ] 

ASF subversion and git services commented on LUCENE-10582:
--

Commit 823df23ae122992ab2e91a0a35980cc7dacff8fe in lucene's branch 
refs/heads/branch_9x from Yannick Welsch
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=823df23ae12 ]

LUCENE-10582: Fix merging of CollectionStatistics in CombinedFieldQuery (#910)

CombinedFieldQuery does not properly combine overridden collection statistics, 
resulting in an IllegalArgumentException during searches.


> CombinedFieldQuery fails with distributed field statistics
> --
>
> Key: LUCENE-10582
> URL: https://issues.apache.org/jira/browse/LUCENE-10582
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/sandbox
>Reporter: Yannick Welsch
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> CombinedFieldQuery does not properly combine distributed collection 
> statistics, resulting in an IllegalArgumentException during searches.
> Originally surfaced in this Elasticsearch issue: 
> https://github.com/elastic/elasticsearch/issues/82817




[jira] [Commented] (LUCENE-10188) Give SortedSetDocValues a docValueCount()?

2022-05-30 Thread Adrien Grand (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544043#comment-17544043
 ] 

Adrien Grand commented on LUCENE-10188:
---

The value count on SortedSetDocValues should be the number of unique ords.
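
As a rough illustration of the two counting semantics discussed in this issue, the sketch below uses plain Java collections rather than the Lucene API. The class and method names are made up for the example. It takes the values a, a, b from the LUCENE-10598 test: counting every indexed value (what the earlier comment describes for SortedNumericDocValues) versus counting unique values (ords).

```java
import java.util.List;
import java.util.TreeSet;

// Illustration only: plain collections stand in for doc values.
class DocValueCountSketch {

  // Counts every indexed value, duplicates included.
  static int countWithDuplicates(List<String> values) {
    return values.size();
  }

  // Counts unique values only, which is what a set of ords would hold.
  static int countUnique(List<String> values) {
    return new TreeSet<>(values).size();
  }

  public static void main(String[] args) {
    List<String> values = List.of("a", "a", "b");
    System.out.println(countWithDuplicates(values)); // 3
    System.out.println(countUnique(values)); // 2
  }
}
```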

> Give SortedSetDocValues a docValueCount()?
> --
>
> Key: LUCENE-10188
> URL: https://issues.apache.org/jira/browse/LUCENE-10188
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 10.0 (main), 9.2
>
>
> Theoretically SortedSetDocValues gives more options to codecs with regard to 
> how SORTED_SET doc values could store ords. However in practice we currently 
> always store counts. Maybe giving SORTED_SET doc values an API that is closer 
> to the API of SORTED_NUMERIC doc values would be a better trade-off?




[jira] [Commented] (LUCENE-10598) SortedSetDocValues#docValueCount() should be always greater than zero

2022-05-30 Thread Adrien Grand (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544045#comment-17544045
 ] 

Adrien Grand commented on LUCENE-10598:
---

Good catch, it's indeed broken.

> SortedSetDocValues#docValueCount() should be always greater than zero
> -
>
> Key: LUCENE-10598
> URL: https://issues.apache.org/jira/browse/LUCENE-10598
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Lu Xugang
>Priority: Major
>
> This test runs failed.
> {code:java}
>   public void testDocValueCount() throws IOException {
>   try (Directory d = newDirectory()) {
> try (IndexWriter w = new IndexWriter(d, new IndexWriterConfig())) {
>   for (int j = 0; j < 1; j++) {
> Document doc = new Document();
> doc.add(new SortedSetDocValuesField("field", new BytesRef("a")));
> doc.add(new SortedSetDocValuesField("field", new BytesRef("a")));
> doc.add(new SortedSetDocValuesField("field", new BytesRef("b")));
> w.addDocument(doc);
>   }
> }
> try (IndexReader reader = DirectoryReader.open(d)) {
>   assertEquals(1, reader.leaves().size());
>   for (LeafReaderContext leaf : reader.leaves()) {
> SortedSetDocValues docValues= 
> leaf.reader().getSortedSetDocValues("field") ;
> for (int doc1 = docValues.nextDoc(); doc1 != 
> DocIdSetIterator.NO_MORE_DOCS; doc1 = docValues.nextDoc()) {
>   assert docValues.docValueCount() > 0;
> }
>   }
> }
> }
>   }
> {code}




[GitHub] [lucene] LuXugang closed pull request #928: LUCENE-10594: Duplicated value should be involve in SortedSetDocValues#docValueCount() in MemoryIndex

2022-05-30 Thread GitBox



LuXugang closed pull request #928: LUCENE-10594: Duplicated value should be 
involve in SortedSetDocValues#docValueCount() in MemoryIndex
URL: https://github.com/apache/lucene/pull/928



[jira] [Resolved] (LUCENE-10594) Duplicated value should be involve in SortedSetDocValues#docValueCount() in MemoryIndex

2022-05-30 Thread Lu Xugang (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lu Xugang resolved LUCENE-10594.

Resolution: Not A Problem

> Duplicated value should be involve in SortedSetDocValues#docValueCount() in 
> MemoryIndex
> ---
>
> Key: LUCENE-10594
> URL: https://issues.apache.org/jira/browse/LUCENE-10594
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Lu Xugang
>Priority: Trivial
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Base on LUCENE-10188 , SortedSetDocValues#docValueCount() in MemoryIndex 
> should keep the same semantic that duplicated values should also be involve 
> in calculating count.
> Due to `dvBytesRefHashValuesSet` only record number of unique values, so a 
> additional `count` is needed.




[GitHub] [lucene] mayya-sharipova commented on pull request #926: VectorSimilarityFunction reverse removal

2022-05-30 Thread GitBox



mayya-sharipova commented on PR #926:
URL: https://github.com/apache/lucene/pull/926#issuecomment-1141499461

   > The tie-breaking by Lucene docId (smaller id wins) doesn't actually work 
right now(for the regular similarities) and not for Euclidean after my changes.
   
   > The node ID and score encoding in the HEAP was broken for such edge cases 
in my opinion for the majority of similarity functions.
   Euclidean tests were green because of the reverse!
   
   @alessandrobenedetti Thanks for your investigation. I understand now why 
the tests were failing (e.g. advanceShallow): they were originally not 
completely correct (e.g. using COSINE similarity instead of Euclidean in this 
test would make the returned doc IDs different).
   
   > I have reverted this and proposed a possible solution.
   
   The proposed solution looks good to me.



[jira] [Commented] (LUCENE-10598) SortedSetDocValues#docValueCount() should be always greater than zero

2022-05-30 Thread Robert Muir (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544079#comment-17544079
 ] 

Robert Muir commented on LUCENE-10598:
--

I think we should validate this count in CheckIndex, similar to the checks it 
does for SortedNumericDocValues docValueCount

> SortedSetDocValues#docValueCount() should be always greater than zero
> -
>
> Key: LUCENE-10598
> URL: https://issues.apache.org/jira/browse/LUCENE-10598
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Lu Xugang
>Priority: Major
>
> This test runs failed.
> {code:java}
>   public void testDocValueCount() throws IOException {
>   try (Directory d = newDirectory()) {
> try (IndexWriter w = new IndexWriter(d, new IndexWriterConfig())) {
>   for (int j = 0; j < 1; j++) {
> Document doc = new Document();
> doc.add(new SortedSetDocValuesField("field", new BytesRef("a")));
> doc.add(new SortedSetDocValuesField("field", new BytesRef("a")));
> doc.add(new SortedSetDocValuesField("field", new BytesRef("b")));
> w.addDocument(doc);
>   }
> }
> try (IndexReader reader = DirectoryReader.open(d)) {
>   assertEquals(1, reader.leaves().size());
>   for (LeafReaderContext leaf : reader.leaves()) {
> SortedSetDocValues docValues= 
> leaf.reader().getSortedSetDocValues("field") ;
> for (int doc1 = docValues.nextDoc(); doc1 != 
> DocIdSetIterator.NO_MORE_DOCS; doc1 = docValues.nextDoc()) {
>   assert docValues.docValueCount() > 0;
> }
>   }
> }
> }
>   }
> {code}




[GitHub] [lucene] jtibshirani commented on pull request #926: VectorSimilarityFunction reverse removal

2022-05-30 Thread GitBox



jtibshirani commented on PR #926:
URL: https://github.com/apache/lucene/pull/926#issuecomment-1141529401

   Thanks @alessandrobenedetti for working on this nice simplification! One 
thing I wanted to double-check -- for Euclidean distance we're now performing 
an extra division compared to before. I don't think this will have any 
significant impact on performance, and shouldn't affect the unrolling 
optimization added in LUCENE-10453, but I'm not 100% sure 🤔  Our nightly 
benchmarks only test `VectorSimilarityFunction.DOT_PRODUCT` so we wouldn't be 
able to catch a difference.
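
   For readers following along, here is a rough sketch of where an extra division could come from. It assumes (this is not stated in this message) that the Euclidean score is computed as `1 / (1 + squaredDistance)` so that a higher score means a closer vector. The class and method names are hypothetical and this is not the Lucene implementation.

```java
// Sketch under the assumption above; not Lucene's VectorSimilarityFunction.
class EuclideanScoreSketch {

  // Plain squared Euclidean distance between two vectors of equal length.
  static float squareDistance(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  // Assumed mapping from distance to score: one division per comparison,
  // and closer vectors get a higher score.
  static float score(float[] a, float[] b) {
    return 1f / (1f + squareDistance(a, b));
  }

  public static void main(String[] args) {
    float[] query = {0f, 0f};
    System.out.println(score(query, new float[] {0f, 0f}));
    System.out.println(score(query, new float[] {1f, 1f}));
  }
}
```

   The division happens once per vector comparison, outside the distance loop, which is consistent with the expectation above that loop unrolling is unaffected.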



[GitHub] [lucene] LuXugang commented on pull request #928: LUCENE-10594: Duplicated value should be involve in SortedSetDocValues#docValueCount() in MemoryIndex

2022-05-30 Thread GitBox



LuXugang commented on PR #928:
URL: https://github.com/apache/lucene/pull/928#issuecomment-1141622393

   According to the explanation in 
[LUCENE-10188](https://issues.apache.org/jira/browse/LUCENE-10188), this is 
not a problem, so I am closing this PR.



[jira] [Commented] (LUCENE-10594) Duplicated value should be involve in SortedSetDocValues#docValueCount() in MemoryIndex

2022-05-30 Thread Lu Xugang (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544105#comment-17544105
 ] 

Lu Xugang commented on LUCENE-10594:


According to the explanation in LUCENE-10188, this issue is not a problem.

> Duplicated value should be involve in SortedSetDocValues#docValueCount() in 
> MemoryIndex
> ---
>
> Key: LUCENE-10594
> URL: https://issues.apache.org/jira/browse/LUCENE-10594
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Lu Xugang
>Priority: Trivial
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Base on LUCENE-10188 , SortedSetDocValues#docValueCount() in MemoryIndex 
> should keep the same semantic that duplicated values should also be involve 
> in calculating count.
> Due to `dvBytesRefHashValuesSet` only record number of unique values, so a 
> additional `count` is needed.




[jira] [Comment Edited] (LUCENE-10596) Remove unused parameter in #getOrAddPerField

2022-05-30 Thread tangdh (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17543754#comment-17543754
 ] 

tangdh edited comment on LUCENE-10596 at 5/31/22 5:36 AM:
--

I have raised a [PR|https://github.com/apache/lucene/pull/930] :D [~mikemccand] 


was (Author: JIRAUSER290177):
I have raised a [PR|https://github.com/apache/lucene/pull/930]，:D

> Remove unused parameter in #getOrAddPerField
> 
>
> Key: LUCENE-10596
> URL: https://issues.apache.org/jira/browse/LUCENE-10596
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 9.2
>Reporter: tangdh
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I noticed that the parameter fieldType is no longer used in the method 
> getOrAddPerField(indexingChain.java:773), do we need to remove it? If so, I'd 
> be happy to raise a PR




[jira] [Commented] (LUCENE-10596) Remove unused parameter in #getOrAddPerField

2022-05-30 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544156#comment-17544156
 ] 

ASF subversion and git services commented on LUCENE-10596:
--

Commit 40e9e5a00d51d82ab4344565b3b8d8712c8029d6 in lucene's branch 
refs/heads/main from tang donghai
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=40e9e5a00d5 ]

LUCENE-10596: Remove unused parameter in #getOrAddPerField (#930)



> Remove unused parameter in #getOrAddPerField
> 
>
> Key: LUCENE-10596
> URL: https://issues.apache.org/jira/browse/LUCENE-10596
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 9.2
>Reporter: tangdh
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I noticed that the parameter fieldType is no longer used in the method 
> getOrAddPerField(indexingChain.java:773), do we need to remove it? If so, I'd 
> be happy to raise a PR




[GitHub] [lucene] mocobeta merged pull request #930: LUCENE-10596: Remove unused parameter in #getOrAddPerField

2022-05-30 Thread GitBox



mocobeta merged PR #930:
URL: https://github.com/apache/lucene/pull/930



[jira] [Commented] (LUCENE-10596) Remove unused parameter in #getOrAddPerField

2022-05-30 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544157#comment-17544157
 ] 

ASF subversion and git services commented on LUCENE-10596:
--

Commit 67e9f33f5aba34fde1b7c9232bd5a884e524420a in lucene's branch 
refs/heads/branch_9x from tang donghai
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=67e9f33f5ab ]

LUCENE-10596: Remove unused parameter in #getOrAddPerField (#930)



> Remove unused parameter in #getOrAddPerField
> 
>
> Key: LUCENE-10596
> URL: https://issues.apache.org/jira/browse/LUCENE-10596
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 9.2
>Reporter: tangdh
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I noticed that the parameter fieldType is no longer used in the method 
> getOrAddPerField(indexingChain.java:773), do we need to remove it? If so, I'd 
> be happy to raise a PR




[GitHub] [lucene] kaivalnp opened a new pull request, #932: LUCENE-10559: Add Prefilter Option to KnnGraphTester

2022-05-30 Thread GitBox



kaivalnp opened a new pull request, #932:
URL: https://github.com/apache/lucene/pull/932

   ### Description
   Link to [Jira](https://issues.apache.org/jira/browse/LUCENE-10559)
   
   ### Solution
   
   Added `prefilter` and `filterSelectivity` arguments to KnnGraphTester to be 
able to compare pre- and post-filtering benchmarks.
   
   `filterSelectivity` expresses the selectivity of a filter as the proportion 
of docs that pass it; the passing docs are selected at random. We store these 
in a FixedBitSet and use it to calculate the true KNN as well as in the HNSW 
search.
   
   In the post-filter case, we over-select results as `topK / filterSelectivity` 
to get a final hit count close to the requested `topK`.
   In the pre-filter case, we wrap the FixedBitSet in a query and pass it as the 
prefilter argument to KnnVectorQuery.
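
   The two ideas above can be sketched roughly as follows. This is only an illustration: it uses `java.util.BitSet` as a stand-in for Lucene's FixedBitSet, the class and method names are hypothetical, and rounding the over-selected count up is an assumption rather than what the PR necessarily does.

```java
import java.util.BitSet;
import java.util.Random;

// Illustration only; not the KnnGraphTester code.
class FilterSelectivitySketch {

  // Marks each doc as passing the filter with probability `selectivity`.
  static BitSet randomFilter(int numDocs, double selectivity, long seed) {
    Random random = new Random(seed);
    BitSet bits = new BitSet(numDocs);
    for (int doc = 0; doc < numDocs; doc++) {
      if (random.nextDouble() < selectivity) {
        bits.set(doc);
      }
    }
    return bits;
  }

  // Post-filtering: ask for more hits up front so that roughly topK
  // survive after the filter is applied (rounding up is an assumption).
  static int overSelectedTopK(int topK, double selectivity) {
    return (int) Math.ceil(topK / selectivity);
  }

  public static void main(String[] args) {
    BitSet filter = randomFilter(1_000, 0.25, 42L);
    System.out.println("passing docs: " + filter.cardinality());
    System.out.println("post-filter topK: " + overSelectedTopK(10, 0.25));
  }
}
```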



[jira] [Resolved] (LUCENE-10596) Remove unused parameter in #getOrAddPerField

2022-05-30 Thread Tomoko Uchida (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomoko Uchida resolved LUCENE-10596.

Fix Version/s: 10.0 (main)
   9.3
   Resolution: Fixed

> Remove unused parameter in #getOrAddPerField
> 
>
> Key: LUCENE-10596
> URL: https://issues.apache.org/jira/browse/LUCENE-10596
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Affects Versions: 9.2
>Reporter: tangdh
>Priority: Minor
> Fix For: 10.0 (main), 9.3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I noticed that the parameter fieldType is no longer used in the method 
> getOrAddPerField(indexingChain.java:773), do we need to remove it? If so, I'd 
> be happy to raise a PR



