date:20220523

[jira] [Created] (LUCENE-10588) Make Luke launching code faster

2022-05-23 Thread Tomoko Uchida (Jira)

Tomoko Uchida created LUCENE-10588:
--

 Summary: Make Luke launching code faster
 Key: LUCENE-10588
 URL: https://issues.apache.org/jira/browse/LUCENE-10588
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Tomoko Uchida


Starting Luke can take multiple seconds because it renders all GUI components 
at launch. It may be possible to make it faster (sub-second) by 
lazily rendering panels, which avoids loading too many classes at startup.

This is typically an issue for CI jobs, but a quicker launch would also be 
good for humans.
https://github.com/apache/lucene/pull/917
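
For illustration only, here is a minimal sketch of the lazy-construction idea in plain Java. It is not Luke's code: the `LazyPanelsDemo` and `Lazy` names and the string "panels" are hypothetical stand-ins for real Swing components, and the factory lambda stands in for the expensive class loading and rendering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class LazyPanelsDemo {
  /** Hypothetical holder that builds an expensive component on first access only. */
  static final class Lazy<T> {
    private final Supplier<T> factory;
    private T value;
    private boolean built;

    Lazy(Supplier<T> factory) {
      this.factory = factory;
    }

    /** Builds the component on the first call and returns the cached instance afterwards. */
    synchronized T get() {
      if (!built) {
        value = factory.get();
        built = true;
      }
      return value;
    }

    synchronized boolean isBuilt() {
      return built;
    }
  }

  /** Records which hypothetical panels have been constructed, in order. */
  static final List<String> constructed = new ArrayList<>();

  static Lazy<String> panel(String name) {
    return new Lazy<>(() -> {
      constructed.add(name); // stands in for expensive class loading and rendering
      return name + " panel";
    });
  }

  public static void main(String[] args) {
    Lazy<String> overview = panel("Overview");
    Lazy<String> documents = panel("Documents");
    // Nothing has been built at "launch" time.
    System.out.println(constructed); // prints []
    // Only the panel that is actually opened gets constructed.
    System.out.println(overview.get()); // prints Overview panel
    System.out.println(constructed); // prints [Overview]
    System.out.println(documents.isBuilt()); // prints false
  }
}
```

The trade-off in a sketch like this is that the construction cost moves from startup to the first time each panel is opened.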



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] mocobeta commented on pull request #917: LUCENE-10531: Disable distribution test (gui test) on windows.

2022-05-23 Thread GitBox



mocobeta commented on PR #917:
URL: https://github.com/apache/lucene/pull/917#issuecomment-1134306924

   > One change I think we could try is to run the forked command with a higher priority
   
   Thanks for your suggestion. We could try this workaround, but I feel 
   it'd be better to keep it disabled and try to solve the slowness of launching 
   the app. 
   https://issues.apache.org/jira/browse/LUCENE-10588
   
   It's great to know we can run the GUI test with a (virtual) display with 
   GitHub Actions.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [lucene] shaie merged pull request #919: Update dev-docs

2022-05-23 Thread GitBox



shaie merged PR #919:
URL: https://github.com/apache/lucene/pull/919



[GitHub] [lucene] romseygeek commented on pull request #909: LUCENE-10370: pass proper classpath/module arguments for forking jvms from within tests

2022-05-23 Thread GitBox



romseygeek commented on PR #909:
URL: https://github.com/apache/lucene/pull/909#issuecomment-1134360770

   Hiya, this is making the elasticsearch CI cross; all builds are failing with 
this message:
   ```
   * Where:
   08:09:37 Script '/var/lib/jenkins/workspace/apache+lucene+main/gradle/java/modules.gradle' line: 215
   08:09:37 
   08:09:37 * What went wrong:
   08:09:37 Execution failed for task ':lucene:core.tests:test'.
   08:09:37 > java.nio.file.NoSuchFileException: /var/lib/jenkins/workspace/apache+lucene+main/lucene/core.tests/build/tmp/test/jvm-forking.properties
   ```
   
   I think the problem is the `core.tests` in the middle there, which should 
   instead be `core/tests`, but I'm not sure whether that's something wrong with our 
   Jenkins environment or a bug in the Gradle logic that constructs 
   the path.



[GitHub] [lucene] romseygeek commented on pull request #909: LUCENE-10370: pass proper classpath/module arguments for forking jvms from within tests

2022-05-23 Thread GitBox



romseygeek commented on PR #909:
URL: https://github.com/apache/lucene/pull/909#issuecomment-1134364141

   Aha, and it's failing locally for me as well.  I'll see if I can work out 
where the issue is!



[GitHub] [lucene] dweiss commented on pull request #909: LUCENE-10370: pass proper classpath/module arguments for forking jvms from within tests

2022-05-23 Thread GitBox



dweiss commented on PR #909:
URL: https://github.com/apache/lucene/pull/909#issuecomment-1134368823

   Hi Alan. Is the Lucene build failing for you locally?



[GitHub] [lucene] romseygeek commented on pull request #909: LUCENE-10370: pass proper classpath/module arguments for forking jvms from within tests

2022-05-23 Thread GitBox



romseygeek commented on PR #909:
URL: https://github.com/apache/lucene/pull/909#issuecomment-1134370810

   Hey Dawid, yes I get local failures if I run `./gradlew clean check`.
   
   ```
   * Where:
   Script '/Users/romseygeek/projects/lucene/gradle/java/modules.gradle' line: 215
   
   * What went wrong:
   Execution failed for task ':lucene:backward-codecs:test'.
   > java.nio.file.NoSuchFileException: /Users/romseygeek/projects/lucene/lucene/backward-codecs/build/tmp/test/jvm-forking.properties
   ```
   
   I think possibly the problem is that it's not creating the 
`jvm-forking.properties` file before trying to write to it?



[GitHub] [lucene] dweiss commented on pull request #909: LUCENE-10370: pass proper classpath/module arguments for forking jvms from within tests

2022-05-23 Thread GitBox



dweiss commented on PR #909:
URL: https://github.com/apache/lucene/pull/909#issuecomment-1134373557

   No. I think it's the parent path that is missing here.



[GitHub] [lucene] dweiss commented on pull request #909: LUCENE-10370: pass proper classpath/module arguments for forking jvms from within tests

2022-05-23 Thread GitBox



dweiss commented on PR #909:
URL: https://github.com/apache/lucene/pull/909#issuecomment-1134374008

   Try `Files.createDirectories(forkProperties.toPath().getParent());`



[GitHub] [lucene] dweiss commented on pull request #909: LUCENE-10370: pass proper classpath/module arguments for forking jvms from within tests

2022-05-23 Thread GitBox



dweiss commented on PR #909:
URL: https://github.com/apache/lucene/pull/909#issuecomment-1134375202

   I know why you're getting it. clean executes after configuration and wipes 
the temporary task directory for test. We'll have to recreate it properly. I'll 
commit a fix soon.



[GitHub] [lucene] romseygeek commented on pull request #909: LUCENE-10370: pass proper classpath/module arguments for forking jvms from within tests

2022-05-23 Thread GitBox



romseygeek commented on PR #909:
URL: https://github.com/apache/lucene/pull/909#issuecomment-1134378224

   Can confirm that adding the `createDirectories` line before the 
`writeString` fixes the problem.  Thanks!
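
   For reference, the pattern discussed in this thread can be sketched in plain Java as below. This is a minimal, self-contained illustration, not the actual change to the Lucene build: the `ForkPropertiesWriter` class, its method names, the temporary directory and the property content are all hypothetical. Only `Files.createDirectories` and `Files.writeString` come from the discussion.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ForkPropertiesWriter {
  /** Writes content to the file, creating any missing parent directories first. */
  static void write(Path file, String content) throws IOException {
    Path parent = file.getParent();
    if (parent != null) {
      // Does nothing if the directory already exists; recreates it if it was wiped.
      Files.createDirectories(parent);
    }
    Files.writeString(file, content, StandardCharsets.UTF_8);
  }

  /** Demo helper: writes into a nested path that does not exist yet and reads it back. */
  static String writeAndReadBack(String content) {
    try {
      Path root = Files.createTempDirectory("fork-demo");
      Path props = root.resolve("build").resolve("tmp").resolve("test")
          .resolve("jvm-forking.properties");
      write(props, content);
      return Files.readString(props, StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.print(writeAndReadBack("example.key=example.value\n"));
  }
}
```

   Without the `createDirectories` call, `Files.writeString` throws `NoSuchFileException` when the parent directory is missing, which matches the failure reported above.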



[GitHub] [lucene] uschindler commented on pull request #909: LUCENE-10370: pass proper classpath/module arguments for forking jvms from within tests

2022-05-23 Thread GitBox



uschindler commented on PR #909:
URL: https://github.com/apache/lucene/pull/909#issuecomment-1134378329

   Yes, your Jenkins seems to call `gradlew clean test`, too, so this is 
   failing.
   
   On ASF and Policeman Jenkins we do not do this, so it passes. The Jenkins 
   instances I manage have the "reset git repository to clean checkout" 
   feature enabled, so when a job starts it has a completely clean git checkout 
   and `gradlew clean` is not needed.



[jira] [Commented] (LUCENE-10574) Remove O(n^2) from TieredMergePolicy or change defaults to one that doesn't do this

2022-05-23 Thread Adrien Grand (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540835#comment-17540835
 ] 

Adrien Grand commented on LUCENE-10574:
---

The stored fields benchmark aimed at reproducing a pathological case, but I 
don't think this case is uncommon.

The only thing you need to be affected by O(n^2) merges is to flush segments 
that are significantly smaller than the default floor segment size of 
TieredMergePolicy (2MB). We almost never see this in our benchmarks because our 
indexing logic always tries to max out indexing speed, so even with the default 
RAM buffer size of 16MB, the smallest segments in the index would be above 2MB.

However, in the real world, where there are frequent reopens, this wouldn't be 
unlikely. For instance, if your documents require ~100 bytes of disk space each 
in the index, and your indexing/refresh rates trigger the creation of segments of 
~100 documents each, then you'll end up with ~10kB flush segments and hit 
pathological merges.
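
As a rough check of the arithmetic above, the following sketch compares the example flush segment size with the 2MB floor. The class and method names are hypothetical, and the only inputs are the figures quoted in this comment (~100 bytes per document, ~100 documents per flush, a 2MB floor taken here as 2 * 1024 * 1024 bytes).

```java
final class FlushSegmentSize {
  /** Default floor segment size of TieredMergePolicy cited above (2MB). */
  static final long FLOOR_SEGMENT_BYTES = 2L * 1024 * 1024;

  /** Approximate size of one flush segment. */
  static long flushSegmentBytes(long bytesPerDoc, long docsPerFlush) {
    return bytesPerDoc * docsPerFlush;
  }

  /** Number of flush segments whose combined size reaches the floor (rounded up). */
  static long segmentsToReachFloor(long flushBytes) {
    return (FLOOR_SEGMENT_BYTES + flushBytes - 1) / flushBytes;
  }

  public static void main(String[] args) {
    long flushBytes = flushSegmentBytes(100, 100);
    System.out.println(flushBytes); // prints 10000 (~10kB)
    System.out.println(segmentsToReachFloor(flushBytes)); // prints 210
  }
}
```

In this example, roughly 210 flush segments fit under the floor size, so such segments are far below the 2MB threshold.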

> Remove O(n^2) from TieredMergePolicy or change defaults to one that doesn't 
> do this
> ---
>
> Key: LUCENE-10574
> URL: https://issues.apache.org/jira/browse/LUCENE-10574
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>Priority: Major
> Fix For: 9.3
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Remove {{floorSegmentBytes}} parameter, or change lucene's default to a merge 
> policy that doesn't merge in an O(n^2) way.
> I have the feeling it might have to be the latter, as folks seem really wed 
> to this crazy O(n^2) behavior.




[jira] [Commented] (LUCENE-10370) Fix classpath/module path of tests forking their own Java (TestNRTReplication)

2022-05-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540836#comment-17540836
 ] 

ASF subversion and git services commented on LUCENE-10370:
--

Commit 5b92002fed3ca316e98c822c1afdccd30f00feb7 in lucene's branch 
refs/heads/main from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=5b92002fed3 ]

LUCENE-10370: recreate temporary location in case it's wiped by a clean.


> Fix classpath/module path of tests forking their own Java (TestNRTReplication)
> --
>
> Key: LUCENE-10370
> URL: https://issues.apache.org/jira/browse/LUCENE-10370
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 9.3
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> TestNRTReplication fails because it assumes classpath can just be copied to a 
> sub-process - this is no longer the case.
> PR at:
> https://github.com/apache/lucene/pull/909




[jira] [Commented] (LUCENE-10370) Fix classpath/module path of tests forking their own Java (TestNRTReplication)

2022-05-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540837#comment-17540837
 ] 

ASF subversion and git services commented on LUCENE-10370:
--

Commit fa411e053f690a9f3087c5112150d7b08477aa73 in lucene's branch 
refs/heads/branch_9x from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=fa411e053f6 ]

LUCENE-10370: recreate temporary location in case it's wiped by a clean.


> Fix classpath/module path of tests forking their own Java (TestNRTReplication)
> --
>
> Key: LUCENE-10370
> URL: https://issues.apache.org/jira/browse/LUCENE-10370
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 9.3
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> TestNRTReplication fails because it assumes classpath can just be copied to a 
> sub-process - this is no longer the case.
> PR at:
> https://github.com/apache/lucene/pull/909




[GitHub] [lucene] dweiss commented on pull request #909: LUCENE-10370: pass proper classpath/module arguments for forking jvms from within tests

2022-05-23 Thread GitBox



dweiss commented on PR #909:
URL: https://github.com/apache/lucene/pull/909#issuecomment-1134379491

   I've committed a fix.



[GitHub] [lucene] romseygeek commented on pull request #909: LUCENE-10370: pass proper classpath/module arguments for forking jvms from within tests

2022-05-23 Thread GitBox



romseygeek commented on PR #909:
URL: https://github.com/apache/lucene/pull/909#issuecomment-1134383958

   Thanks Dawid!



[jira] [Created] (LUCENE-10589) Fix corner case in TestKnnVectorQuery.testRandomWithFilter

2022-05-23 Thread Tomoko Uchida (Jira)

Tomoko Uchida created LUCENE-10589:
--

 Summary: Fix corner case in TestKnnVectorQuery.testRandomWithFilter
 Key: LUCENE-10589
 URL: https://issues.apache.org/jira/browse/LUCENE-10589
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Tomoko Uchida


{{TestKnnVectorQuery.testRandomWithFilter}} can fail with 
java.lang.UnsupportedOperationException.

Reproducible command
{code:java}
./gradlew test --tests TestKnnVectorQuery.testRandomWithFilter -Dtests.seed=1DA39B92702DAC45 -Dtests.multiplier=3
{code}
{code:java}
org.apache.lucene.search.TestKnnVectorQuery > testRandomWithFilter FAILED
java.lang.UnsupportedOperationException: exact search is not supported
    at __randomizedtesting.SeedInfo.seed([1DA39B92702DAC45:6BEAC2197AD96AE0]:0)
    at org.apache.lucene.search.TestKnnVectorQuery$ThrowingKnnVectorQuery.exactSearch(TestKnnVectorQuery.java:715)
    at org.apache.lucene.search.KnnVectorQuery.searchLeaf(KnnVectorQuery.java:151)
    at org.apache.lucene.search.KnnVectorQuery.rewrite(KnnVectorQuery.java:108)
    at org.apache.lucene.search.ConstantScoreQuery.rewrite(ConstantScoreQuery.java:44)
    at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:789)
    at org.apache.lucene.tests.search.AssertingIndexSearcher.rewrite(AssertingIndexSearcher.java:69)
    at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:803)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:685)
    at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:667)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:584)
    at org.apache.lucene.search.TestKnnVectorQuery.testRandomWithFilter(TestKnnVectorQuery.java:556)
{code}
In some edge cases (depending on the random seed), the condition at 
[KnnVectorQuery.java#147|https://github.com/apache/lucene/blob/fe9d26178d033f585c08a5e86708063ac0ec0c9e/lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java#L147]
 evaluates to false, and then {{exactSearch()}} is called.

Could the upper bound of [the test range query 
(filter)|https://github.com/apache/lucene/blob/fe9d26178d033f585c08a5e86708063ac0ec0c9e/lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java#L554]
 be 200 (the max value of the "tag" field + 1) instead of lower + 150, to make 
it "unrestrictive"?




[GitHub] [lucene] mocobeta opened a new pull request, #920: LUCENE-10589: increase upper bound of test range query to the maximum value + 1

2022-05-23 Thread GitBox



mocobeta opened a new pull request, #920:
URL: https://github.com/apache/lucene/pull/920

   This is a small tweak for `TestKnnVectorQuery.testRandomWithFilter()`.
   
   See https://issues.apache.org/jira/browse/LUCENE-10589.
   
   On main:
   ```
   ./gradlew test --tests TestKnnVectorQuery.testRandomWithFilter -Dtests.seed=1DA39B92702DAC45 -Dtests.multiplier=3
   
   org.apache.lucene.search.TestKnnVectorQuery > testRandomWithFilter FAILED
   java.lang.UnsupportedOperationException: exact search is not supported
       at __randomizedtesting.SeedInfo.seed([1DA39B92702DAC45:6BEAC2197AD96AE0]:0)
       at org.apache.lucene.search.TestKnnVectorQuery$ThrowingKnnVectorQuery.exactSearch(TestKnnVectorQuery.java:715)
   ```
   
   With this patch:
   ```
   ./gradlew test --tests TestKnnVectorQuery.testRandomWithFilter -Dtests.seed=1DA39B92702DAC45 -Dtests.multiplier=3
   
   :lucene:core:test (SUCCESS): 1 test(s)
   ```



[jira] [Commented] (LUCENE-10589) Fix corner case in TestKnnVectorQuery.testRandomWithFilter

2022-05-23 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540935#comment-17540935
 ] 

Dawid Weiss commented on LUCENE-10589:
--

I don't know anything about this area of the code, but thank you for following up on 
Jenkins failures, [~tomoko]!

> Fix corner case in TestKnnVectorQuery.testRandomWithFilter
> --
>
> Key: LUCENE-10589
> URL: https://issues.apache.org/jira/browse/LUCENE-10589
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Tomoko Uchida
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




[GitHub] [lucene] mocobeta commented on pull request #920: LUCENE-10589: increase upper bound of test range query to the maximum value + 1

2022-05-23 Thread GitBox



mocobeta commented on PR #920:
URL: https://github.com/apache/lucene/pull/920#issuecomment-1134641142

   It looks like the test can be tweaked so that it doesn't fall into the corner case, 
   but I'm not fully sure this is correct. Is there a better way to fix it?



[GitHub] [lucene] jpountz opened a new pull request, #921: LUCENE-10078: Enable merge-on-refresh by default.

2022-05-23 Thread GitBox



jpountz opened a new pull request, #921:
URL: https://github.com/apache/lucene/pull/921

   This gives implementations of `findFullFlushMerges` to `LogMergePolicy` and
   `TieredMergePolicy` and enables merge-on-refresh with a default timeout of
   500ms.
   
   The idea behind the 500ms default is that it felt both high enough to have
   time to run merges of small segments, and low enough that the freshness of
   the data wouldn't look badly affected for users who have high refresh rates
   (e.g. refreshing every second).
   
   For `findFullFlushMerges`, `LogMergePolicy` looks at tail segments to see if
   it can find at least `mergeFactor` flush segments below the min segment
   size, and `TieredMergePolicy` looks for a merge that has at least
   `segmentsPerTier` segments where the largest segment of the merge is a flush
   segment and below the floor size.
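
   To make the `LogMergePolicy` description more concrete, here is a rough, standalone sketch of a tail scan of that kind. It is not the code in this PR and uses no Lucene classes: `FullFlushMergeSketch`, `Segment` and `findTailMerge` are hypothetical names, and the rule that the scan stops at the first segment that is not a small flush segment is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class FullFlushMergeSketch {
  /** Hypothetical stand-in for a segment; not a Lucene class. */
  static final class Segment {
    final String name;
    final long sizeBytes;
    final boolean fromFlush;

    Segment(String name, long sizeBytes, boolean fromFlush) {
      this.name = name;
      this.sizeBytes = sizeBytes;
      this.fromFlush = fromFlush;
    }
  }

  /**
   * Collects the run of flush segments below minSegmentBytes at the tail of the list.
   * Returns that run (in index order) if it has at least mergeFactor segments,
   * otherwise an empty list.
   */
  static List<Segment> findTailMerge(
      List<Segment> segments, int mergeFactor, long minSegmentBytes) {
    int start = segments.size();
    // Walk backwards while the tail still consists of small flush segments.
    while (start > 0) {
      Segment s = segments.get(start - 1);
      if (!s.fromFlush || s.sizeBytes >= minSegmentBytes) {
        break;
      }
      start--;
    }
    List<Segment> tail = new ArrayList<>(segments.subList(start, segments.size()));
    return tail.size() >= mergeFactor ? tail : new ArrayList<>();
  }

  public static void main(String[] args) {
    List<Segment> segments = List.of(
        new Segment("_0", 50_000_000L, false),
        new Segment("_1", 10_000L, true),
        new Segment("_2", 12_000L, true),
        new Segment("_3", 9_000L, true));
    System.out.println(findTailMerge(segments, 3, 1_000_000L).size()); // prints 3
    System.out.println(findTailMerge(segments, 4, 1_000_000L).size()); // prints 0
  }
}
```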



[jira] [Commented] (LUCENE-10589) Fix corner case in TestKnnVectorQuery.testRandomWithFilter

2022-05-23 Thread Tomoko Uchida (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540950#comment-17540950
 ] 

Tomoko Uchida commented on LUCENE-10589:


You’re welcome - debugging this was a good chance for me to follow and play 
around with the code.

> Fix corner case in TestKnnVectorQuery.testRandomWithFilter
> --
>
> Key: LUCENE-10589
> URL: https://issues.apache.org/jira/browse/LUCENE-10589
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Tomoko Uchida
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




[GitHub] [lucene] rmuir merged pull request #901: remove commented-out/obselete AwaitsFix

2022-05-23 Thread GitBox



rmuir merged PR #901:
URL: https://github.com/apache/lucene/pull/901



[jira] [Commented] (LUCENE-10229) Match offsets should be consistent for fields with positions and fields with offsets

2022-05-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540955#comment-17540955
 ] 

ASF subversion and git services commented on LUCENE-10229:
--

Commit c86f9b2d8c1ccdb85a33b64ace70a1b1d3a4e2d4 in lucene's branch 
refs/heads/main from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=c86f9b2d8c1 ]

remove commented-out/obselete AwaitsFix (#901)

* remove commented-out/obselete AwaitsFix

All of these issues are fixed, but the AwaitsFix annotation is still there, 
just commented out. This causes confusion and makes it harder to keep an 
eye/review the AwaitsFix tests, e.g. false positives when running 'git grep 
AwaitsFix'

* Remove @AwaitsFix from TestMatchRegionRetriever. The problem has been fixed 
in LUCENE-10229.

Co-authored-by: Dawid Weiss 

> Match offsets should be consistent for fields with positions and fields with 
> offsets
> 
>
> Key: LUCENE-10229
> URL: https://issues.apache.org/jira/browse/LUCENE-10229
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 9.2
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> This is a follow-up of LUCENE-10223 in which it was discovered that fields with
> offsets don't highlight some more complex interval queries properly.  Alan says:
> {quote}
> It's because it returns the position of the inner match, but the offsets of 
> the outer.  And so if you're re-analyzing and retrieving offsets by looking 
> at the positions, you get the 'right' thing.  It's not obvious to me what the 
> correct response is here, but thinking about it the current behaviour is kind 
> of the worst of both worlds, and perhaps we should change it so that you get 
> offsets of the inner match as standard, and then the outer match is returned 
> as part of the sub matches.
> {quote}
> Intervals are nicely separated into "basic intervals" and "filters" which 
> restrict some other source of intervals, here is the original documentation:
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/intervals/package-info.java#L29-L50
> My experience from an extended period of using interval queries in a frontend 
> where they're highlighted is that filters are restrictions that should not be 
> highlighted - it's the source intervals that people care about. Filters are 
> what you remove or where you give proper context to source intervals.
> The test code contributed in LUCENE-10223 contains numerous query-highlight 
> examples (on fields with positions) where this intuition is demonstrated on 
> all kinds of interval functions:
> https://github.com/apache/lucene/blob/main/lucene/highlighter/src/test/org/apache/lucene/search/matchhighlight/TestMatchHighlighter.java#L335-L542
> This issue is about making the internals work consistently for fields with 
> positions and fields with offsets.




[jira] [Commented] (LUCENE-10229) Match offsets should be consistent for fields with positions and fields with offsets

2022-05-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540961#comment-17540961
 ] 

ASF subversion and git services commented on LUCENE-10229:
--

Commit 6edc8a4cff5fc6bb2aca8847d8edd2d6eb01ec13 in lucene's branch 
refs/heads/branch_9x from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=6edc8a4cff5 ]

remove commented-out/obsolete AwaitsFix (#901)

* remove commented-out/obsolete AwaitsFix

All of these issues are fixed, but the AwaitsFix annotation is still there, 
just commented out. This causes confusion and makes it harder to keep an 
eye/review the AwaitsFix tests, e.g. false positives when running 'git grep 
AwaitsFix'

* Remove @AwaitsFix from TestMatchRegionRetriever. The problem has been fixed 
in LUCENE-10229.

Co-authored-by: Dawid Weiss 





[jira] [Commented] (LUCENE-10078) Enable merge-on-refresh by default?

2022-05-23 Thread Adrien Grand (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540979#comment-17540979
 ] 

Adrien Grand commented on LUCENE-10078:
---

We discussed this in the context of the O(n^2) merging that 
{{floorSegmentSize}} introduces (LUCENE-10574), so I took a stab at this issue. 
The goal is for users to fully benefit from the trade-off we're making: creating 
unbalanced merges for the sake of having fewer segments to deal with at search 
time.

> Enable merge-on-refresh by default?
> ---
>
> Key: LUCENE-10078
> URL: https://issues.apache.org/jira/browse/LUCENE-10078
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is a spinoff from the discussion in LUCENE-10073.
> The newish merge-on-refresh ([crazy origin 
> story|https://blog.mikemccandless.com/2021/03/open-source-collaboration-or-how-we.html])
>  feature is a powerful way to reduce searched segment counts, especially 
> helpful for applications using many indexing threads.  Such usage will write 
> many tiny segments on each refresh, which could quickly be merged up during 
> the {{refresh}} operation.
> We would have to implement a default for {{findFullFlushMerges}} 
> (LUCENE-10064 is open for this), and then we would need 
> {{IndexWriterConfig.getMaxFullFlushMergeWaitMillis}} to return a non-zero 
> value by default (this issue).




[GitHub] [lucene] mocobeta commented on pull request #918: LUCENE-10586: Minor cleanup for local variables in BlockTreeTermsReader

2022-05-23 Thread GitBox



mocobeta commented on PR #918:
URL: https://github.com/apache/lucene/pull/918#issuecomment-1134773996

   Thanks @mikemccand for confirming this.
   I'll keep this open for a few more days for others to review it.



[jira] [Commented] (LUCENE-10586) Minor refactoring in Lucene90BlockTreeTermsReader local variables: metaIn, indexMetaIn, termsMetaIn

2022-05-23 Thread Adrien Grand (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540996#comment-17540996
 ] 

Adrien Grand commented on LUCENE-10586:
---

+1

The reason is historical indeed. The earlier version of this class, 
Lucene40BlockTreeTermsReader, used to record metadata interleaved with the 
actual data. At some point, we moved metadata to a dedicated file, so that we 
could verify checksums upon opening the segment, so this required assigning 
`indexMetaIn` and `termsMetaIn` to either the data files or the metadata file 
depending on the version. It's good we can clean this up now that we're always 
reading metadata from the metadata file!
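
The aliasing described in the issue can be illustrated with a small stand-in. The sketch below uses `java.nio.ByteBuffer` in place of Lucene's `IndexInput` (an assumption made to keep it self-contained; the class name `AliasedInputSketch` is hypothetical). It shows that three variables assigned from one another share a single read position, whereas an explicit `duplicate()` keeps its own.

```java
import java.nio.ByteBuffer;

// Illustrative only: ByteBuffer stands in for IndexInput here.
class AliasedInputSketch {

  /** Returns {position of the shared buffer, position of the duplicate} after two reads. */
  static int[] positionsAfterReads() {
    ByteBuffer metaIn = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
    ByteBuffer indexMetaIn = metaIn; // alias: same object, no copy
    ByteBuffer termsMetaIn = metaIn; // alias: same object, no copy
    ByteBuffer independent = metaIn.duplicate(); // shares bytes, but has its own position

    indexMetaIn.get(); // advances the one shared position
    termsMetaIn.get(); // advances it again

    return new int[] {metaIn.position(), independent.position()};
  }

  public static void main(String[] args) {
    int[] positions = positionsAfterReads();
    System.out.println("shared=" + positions[0] + " duplicate=" + positions[1]);
  }
}
```

Because all three names point at one object, collapsing them into a single variable should not change behavior; it only removes the suggestion that they might be independent streams.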

> Minor refactoring in Lucene90BlockTreeTermsReader local variables: metaIn, 
> indexMetaIn, termsMetaIn
> ---
>
> Key: LUCENE-10586
> URL: https://issues.apache.org/jira/browse/LUCENE-10586
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Tomoko Uchida
>Priority: Trivial
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Those three local variables refer to the same {{IndexInput}} object (no 
> clone() is called).
> {code}
> indexMetaIn = termsMetaIn = metaIn;
> {code}
> I'm not sure but maybe there are some historical reasons. I wonder if it 
> would be better to have only one reference for the underlying {{IndexInput}} 
> object to make it a little easy to follow the code.




[GitHub] [lucene] msokolov commented on pull request #920: LUCENE-10589: increase upper bound of test range query to the maximum value + 1

2022-05-23 Thread GitBox



msokolov commented on PR #920:
URL: https://github.com/apache/lucene/pull/920#issuecomment-1134805154

   I suspect what's happening is that RandomIndexWriter is causing some very small 
segment to be written, and within that segment the query *is* highly selective, 
causing us to fall back to a brute-force scan. I would probably fix this by using a 
more "normal" IndexWriter and always indexing a single segment?



[jira] [Commented] (LUCENE-8519) MultiDocValues.getNormValues should not call getMergedFieldInfos

2022-05-23 Thread Rushabh Shah (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-8519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541000#comment-17541000
 ] 

Rushabh Shah commented on LUCENE-8519:
--

[~dsmiley] Thank you for the review and the merge. 

> MultiDocValues.getNormValues should not call getMergedFieldInfos
> 
>
> Key: LUCENE-8519
> URL: https://issues.apache.org/jira/browse/LUCENE-8519
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Priority: Minor
> Fix For: 9.3
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> {{MultiDocValues.getNormValues}} should not call {{getMergedFieldInfos}} 
> because it's a needless expense.  getNormValues simply wants to know if each 
> LeafReader that has this field has norms as well; that's all.




[jira] [Created] (LUCENE-10590) Indexing all zero vectors leads to heat death of the universe

2022-05-23 Thread Michael Sokolov (Jira)

Michael Sokolov created LUCENE-10590:


 Summary: Indexing all zero vectors leads to heat death of the 
universe
 Key: LUCENE-10590
 URL: https://issues.apache.org/jira/browse/LUCENE-10590
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael Sokolov


By accident while testing something else, I ran a luceneutil test indexing 1M 
100d vectors where all the vectors were all zeroes. This caused indexing to 
take a very long time (~40x normal - it did eventually complete) and the search 
performance was similarly bad.  We should not degrade by orders of magnitude 
with even the worst data though.

I'm not entirely sure what the issue is, but perhaps as long as we keep finding 
hits that are "better" we keep exploring the graph, where better means (score, 
-docid) >= (lowest score, -docid). If that's right and all docs have the same 
score, then we probably need to either switch to > (but this could lead to 
poorer recall in normal cases) or introduce some kind of minimum score 
threshold?
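
To make the hypothesis above concrete, here is a minimal sketch of a best-first search over a toy ring graph. It is not Lucene's HNSW implementation: the class `TieBreakSearchSketch`, the method `countVisited`, and the ring topology are all assumptions made for illustration. With identical scores, accepting ties (`>=`) ends up scoring every node, while a strict comparison (`>`) stops soon after the top-K queue fills.

```java
import java.util.PriorityQueue;

// Illustrative only: a simplified best-first search, not Lucene's graph searcher.
class TieBreakSearchSketch {

  /** Counts how many nodes are scored before the search stops. */
  static int countVisited(float[] scores, int degree, int topK, boolean acceptTies) {
    int n = scores.length;
    boolean[] visited = new boolean[n];
    // Candidates still to expand, best score first.
    PriorityQueue<Integer> candidates =
        new PriorityQueue<>((a, b) -> Float.compare(scores[b], scores[a]));
    // Current top-K results, worst score first so it can be evicted.
    PriorityQueue<Integer> results =
        new PriorityQueue<>((a, b) -> Float.compare(scores[a], scores[b]));

    visited[0] = true;
    candidates.add(0);
    results.add(0);
    int visitedCount = 1;

    while (!candidates.isEmpty()) {
      int current = candidates.poll();
      if (results.size() >= topK && scores[current] < scores[results.peek()]) {
        break; // the best remaining candidate cannot improve the results
      }
      // Toy topology: node i links to the next `degree` nodes on a ring.
      for (int step = 1; step <= degree; step++) {
        int neighbor = (current + step) % n;
        if (visited[neighbor]) {
          continue;
        }
        visited[neighbor] = true;
        visitedCount++;
        boolean full = results.size() >= topK;
        float lowest = scores[results.peek()];
        boolean better = acceptTies ? scores[neighbor] >= lowest : scores[neighbor] > lowest;
        if (!full || better) {
          candidates.add(neighbor);
          results.add(neighbor);
          if (results.size() > topK) {
            results.poll(); // evict the current worst result
          }
        }
      }
    }
    return visitedCount;
  }

  public static void main(String[] args) {
    float[] allEqual = new float[1000]; // every node has the same score
    System.out.println("ties accepted: " + countVisited(allEqual, 4, 10, true));
    System.out.println("strict only:   " + countVisited(allEqual, 4, 10, false));
  }
}
```

In this toy setup, accepting ties touches all 1000 nodes, while the strict comparison touches only a few dozen. As noted above, the strict comparison could cost recall on normal data, so this only illustrates the suspected mechanism and is not a proposed fix.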




[GitHub] [lucene] rmuir commented on a diff in pull request #916: Refine contribution guide and pull request template

2022-05-23 Thread GitBox



rmuir commented on code in PR #916:
URL: https://github.com/apache/lucene/pull/916#discussion_r879605545


##
CONTRIBUTING.md:
##
@@ -78,8 +78,11 @@ Please be patient. Committers are busy people too. If no one 
responds to your pa
 
 Please refer to [GitHub's 
documentation](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests)
 for an explanation of how to create a pull request.
 
+You should open a pull request against the `main` branch. It is also 
recommended to give Lucene maintainers 
[access](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to your PR branch.

Review Comment:
   Is this "access" step required anymore? Isn't it the default in github these 
days?




[GitHub] [lucene] mocobeta commented on pull request #920: LUCENE-10589: increase upper bound of test range query to the maximum value + 1

2022-05-23 Thread GitBox



mocobeta commented on PR #920:
URL: https://github.com/apache/lucene/pull/920#issuecomment-1134837438

   According to the javadocs of the test, using a randomly skewed index with 
RandomIndexWriter is an intentional choice I think? 
   ```
   /** Tests with random vectors and a random filter. Uses RandomIndexWriter. */
   ```



[GitHub] [lucene] msokolov commented on pull request #920: LUCENE-10589: increase upper bound of test range query to the maximum value + 1

2022-05-23 Thread GitBox



msokolov commented on PR #920:
URL: https://github.com/apache/lucene/pull/920#issuecomment-1134842970

   Yeah, I'm just not sure that all the implications are desirable for this 
test. For example if we have a segment with 5 docs, their "tags" might all be > 
the threshold of the filter?
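
   A rough sketch of that concern, assuming the filter keeps documents whose tag is at or below a threshold. The rule "scan exhaustively when the number of matches is at most k" is a hypothetical stand-in for whatever the query really does, and all names here are made up for illustration:

```java
// Illustrative only: shows how per-segment selectivity can differ from index-wide selectivity.
class SegmentSelectivitySketch {

  /** Number of docs in one segment whose tag passes the assumed filter (tag <= threshold). */
  static int matchesInSegment(int[] tags, int threshold) {
    int count = 0;
    for (int tag : tags) {
      if (tag <= threshold) {
        count++;
      }
    }
    return count;
  }

  /** Hypothetical rule: fall back to an exhaustive scan when few docs match. */
  static boolean wouldScanExhaustively(int[] tags, int threshold, int k) {
    return matchesInSegment(tags, threshold) <= k;
  }

  public static void main(String[] args) {
    int[] wholeIndex = new int[100];
    for (int i = 0; i < wholeIndex.length; i++) {
      wholeIndex[i] = i; // tags 0..99
    }
    int[] tinySegment = {60, 70, 80, 90, 95}; // a 5-doc segment, all tags above 49

    System.out.println("whole index exhaustive? " + wouldScanExhaustively(wholeIndex, 49, 10));
    System.out.println("tiny segment exhaustive? " + wouldScanExhaustively(tinySegment, 49, 10));
  }
}
```

   Under these assumptions the filter matches half of the index overall, yet nothing in the tiny segment, so only the tiny segment would take the exhaustive path.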



[GitHub] [lucene] mocobeta commented on a diff in pull request #916: Refine contribution guide and pull request template

2022-05-23 Thread GitBox



mocobeta commented on code in PR #916:
URL: https://github.com/apache/lucene/pull/916#discussion_r879634562


##
CONTRIBUTING.md:
##
@@ -78,8 +78,11 @@ Please be patient. Committers are busy people too. If no one 
responds to your pa
 
 Please refer to [GitHub's 
documentation](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests)
 for an explanation of how to create a pull request.
 
+You should open a pull request against the `main` branch. It is also 
recommended to give Lucene maintainers 
[access](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to your PR branch.

Review Comment:
   For sure. Since that is already the default, the request to give access is 
no longer needed.




[GitHub] [lucene] shahrs87 commented on a diff in pull request #897: LUCENE-10266 Move nearest-neighbor search on points to core

2022-05-23 Thread GitBox



shahrs87 commented on code in PR #897:
URL: https://github.com/apache/lucene/pull/897#discussion_r879640507


##
lucene/core/src/java/org/apache/lucene/document/LatLonPoint.java:
##
@@ -362,4 +377,72 @@ public static Query newDistanceFeatureQuery(
 }
 return query;
   }
+
+  /**
+   * Finds the {@code n} nearest indexed points to the provided point, 
according to Haversine
+   * distance.
+   *
+   * This is functionally equivalent to running {@link MatchAllDocsQuery} 
with a {@link
+   * LatLonDocValuesField#newDistanceSort}, but is far more efficient since it 
takes advantage of
+   * properties the indexed BKD tree. Currently this only works with {@link 
Lucene90PointsFormat}

Review Comment:
   Removed in the latest revision.




[GitHub] [lucene] shahrs87 commented on a diff in pull request #897: LUCENE-10266 Move nearest-neighbor search on points to core

2022-05-23 Thread GitBox



shahrs87 commented on code in PR #897:
URL: https://github.com/apache/lucene/pull/897#discussion_r879640718


##
lucene/core/src/java/org/apache/lucene/search/NearestNeighbor.java:
##
@@ -31,12 +31,8 @@
 import org.apache.lucene.util.Bits;
 import org.apache.lucene.util.SloppyMath;
 
-/**
- * KNN search on top of 2D lat/lon indexed points.
- *
- * @lucene.experimental
- */
-class NearestNeighbor {
+/** KNN search on top of 2D lat/lon indexed points. */
+public class NearestNeighbor {

Review Comment:
   Yes. Changed in the latest revision.




[GitHub] [lucene] shahrs87 commented on pull request #897: LUCENE-10266 Move nearest-neighbor search on points to core

2022-05-23 Thread GitBox



shahrs87 commented on PR #897:
URL: https://github.com/apache/lucene/pull/897#issuecomment-1134875992

   @jpountz  Thank you for the feedback. I have addressed your comments in the 
latest revision. I have one question: for the CHANGES.txt entry, should this 
change go in the `API changes` section or in `Other`? Please advise.



[jira] [Commented] (LUCENE-10590) Indexing all zero vectors leads to heat death of the universe

2022-05-23 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541100#comment-17541100
 ] 

Dawid Weiss commented on LUCENE-10590:
--

Love the title, [~sokolov]. Very Douglas-y Adams-y.

> Indexing all zero vectors leads to heat death of the universe
> -
>
> Key: LUCENE-10590
> URL: https://issues.apache.org/jira/browse/LUCENE-10590
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael Sokolov
>Priority: Major
>
> By accident while testing something else, I ran a luceneutil test indexing 1M 
> 100d vectors where all the vectors were all zeroes. This caused indexing to 
> take a very long time (~40x normal - it did eventually complete) and the 
> search performance was similarly bad.  We should not degrade by orders of 
> magnitude with even the worst data though.
> I'm not entirely sure what the issue is, but perhaps as long as we keep 
> finding hits that are "better" we keep exploring the graph, where better 
> means (score, -docid) >= (lowest score, -docid). If that's right and all docs 
> have the same score, then we probably need to either switch to > (but this 
> could lead to poorer recall in normal cases) or introduce some kind of 
> minimum score threshold?




[GitHub] [lucene] Yuti-G commented on a diff in pull request #915: LUCENE-10585: Scrub copy/paste code in the facets module and attempt to simplify a bit

2022-05-23 Thread GitBox



Yuti-G commented on code in PR #915:
URL: https://github.com/apache/lucene/pull/915#discussion_r879786369


##
lucene/facet/src/java/org/apache/lucene/facet/sortedset/AbstractSortedSetDocValueFacetCounts.java:
##
@@ -0,0 +1,333 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.sortedset;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.PrimitiveIterator;
+import org.apache.lucene.facet.FacetResult;
+import org.apache.lucene.facet.Facets;
+import org.apache.lucene.facet.FacetsConfig;
+import org.apache.lucene.facet.FacetsConfig.DimConfig;
+import org.apache.lucene.facet.LabelAndValue;
+import org.apache.lucene.facet.TopOrdAndIntQueue;
+import org.apache.lucene.facet.sortedset.SortedSetDocValuesReaderState.DimTree;
+import org.apache.lucene.facet.sortedset.SortedSetDocValuesReaderState.OrdRange;
+import org.apache.lucene.index.SortedSetDocValues;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.PriorityQueue;
+
+/** Base class for SSDV faceting implementations. */
+abstract class AbstractSortedSetDocValueFacetCounts extends Facets {
+
+  private static final Comparator<FacetResult> FACET_RESULT_COMPARATOR =
+  new Comparator<>() {
+@Override
+public int compare(FacetResult a, FacetResult b) {
+  if (a.value.intValue() > b.value.intValue()) {
+return -1;
+  } else if (b.value.intValue() > a.value.intValue()) {
+return 1;
+  } else {
+return a.dim.compareTo(b.dim);
+  }
+}
+  };
+
+  final SortedSetDocValuesReaderState state;
+  final FacetsConfig stateConfig;
+  final SortedSetDocValues dv;
+  final String field;
+
+  AbstractSortedSetDocValueFacetCounts(SortedSetDocValuesReaderState state) 
throws IOException {
+this.state = state;
+this.field = state.getField();
+this.stateConfig = state.getFacetsConfig();
+this.dv = state.getDocValues();
+  }
+
+  @Override
+  public FacetResult getTopChildren(int topN, String dim, String... path) 
throws IOException {
+validateTopN(topN);
+TopChildrenForPath topChildrenForPath = getTopChildrenForPath(topN, dim, 
path);
+return createFacetResult(topChildrenForPath, dim, path);
+  }
+
+  @Override
+  public Number getSpecificValue(String dim, String... path) throws 
IOException {
+if (path.length != 1) {
+  throw new IllegalArgumentException("path must be length=1");
+}
+int ord = (int) dv.lookupTerm(new BytesRef(FacetsConfig.pathToString(dim, 
path)));
+if (ord < 0) {
+  return -1;
+}
+
+return getCount(ord);
+  }
+
+  @Override
+  public List<FacetResult> getAllDims(int topN) throws IOException {
+validateTopN(topN);
+List<FacetResult> results = new ArrayList<>();
+for (String dim : state.getDims()) {
+  TopChildrenForPath topChildrenForPath = getTopChildrenForPath(topN, dim);
+  FacetResult facetResult = createFacetResult(topChildrenForPath, dim);
+  if (facetResult != null) {
+results.add(facetResult);
+  }
+}
+
+// Sort by highest count:
+results.sort(FACET_RESULT_COMPARATOR);
+return results;
+  }
+
+  @Override
+  public List<FacetResult> getTopDims(int topNDims, int topNChildren) throws IOException {
+validateTopN(topNDims);
+validateTopN(topNChildren);
+
+// Creates priority queue to store top dimensions and sort by their 
aggregated values/hits and
+// string values.
+PriorityQueue<DimValue> pq =
+new PriorityQueue<>(topNDims) {
+  @Override
+  protected boolean lessThan(DimValue a, DimValue b) {
+if (a.value > b.value) {
+  return false;
+} else if (a.value < b.value) {
+  return true;
+} else {
+  return a.dim.compareTo(b.dim) > 0;
+}
+  }
+};
+
+// Keep track of intermediate results, if we compute them, so we can reuse them later:
+Map<String, TopChildrenForPath> intermediateResults = null;
+
+for (String dim : state.getDims()) {
+  DimConfig dimConfig = stateConfig.getDimConfig(dim);
+  int d

[GitHub] [lucene] Yuti-G commented on pull request #915: LUCENE-10585: Scrub copy/paste code in the facets module and attempt to simplify a bit

2022-05-23 Thread GitBox



Yuti-G commented on PR #915:
URL: https://github.com/apache/lucene/pull/915#issuecomment-1135050190

   Hi @gsmiller, thanks for making so many improvements to the code; it looks 
great to me! I also ran the facet benchmarks and did not observe much 
difference from the main branch. I added getTopDims to the benchmarks, but that 
PR hasn't been merged yet, so the attached results are from my local run. Thanks!
   
   Main:
   https://user-images.githubusercontent.com/4710/169889777-cf059966-a38d-49b8-8699-e8ff5172967c.png
   
   pr/915:
   https://user-images.githubusercontent.com/4710/169890719-ec235cb4-4f49-47f9-9ca0-a1f822073bf5.png
   
   
   



[GitHub] [lucene] gsmiller commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities

2022-05-23 Thread GitBox



gsmiller commented on code in PR #841:
URL: https://github.com/apache/lucene/pull/841#discussion_r879866385


##
lucene/facet/src/java/org/apache/lucene/facet/hyperrectangle/HyperRectangleFacetCounts.java:
##
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.hyperrectangle;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import org.apache.lucene.document.LongPoint;
+import org.apache.lucene.facet.FacetResult;
+import org.apache.lucene.facet.Facets;
+import org.apache.lucene.facet.FacetsCollector;
+import org.apache.lucene.facet.LabelAndValue;
+import org.apache.lucene.index.BinaryDocValues;
+import org.apache.lucene.index.DocValues;
+import org.apache.lucene.search.DocIdSetIterator;
+
+/** Get counts given a list of HyperRectangles (which must be of the same 
type) */
+public class HyperRectangleFacetCounts extends Facets {
+  /** Hyper rectangles passed to constructor. */
+  protected final HyperRectangle[] hyperRectangles;
+
+  /** Counts, initialized in subclass. */
+  protected final int[] counts;
+
+  /** Our field name. */
+  protected final String field;
+
+  /** Number of dimensions for field */
+  protected final int dims;
+
+  /** Total number of hits. */
+  protected int totCount;
+
+  /**
+   * Create HyperRectangleFacetCounts using this
+   *
+   * @param field Field name
+   * @param hits Hits to facet on
+   * @param hyperRectangles List of hyper rectangle facets
+   * @throws IOException If there is a problem reading the field
+   */
+  public HyperRectangleFacetCounts(
+      String field, FacetsCollector hits, HyperRectangle... hyperRectangles) throws IOException {
+    assert hyperRectangles.length > 0 : "Hyper rectangle ranges cannot be empty";
+    assert areHyperRectangleDimsConsistent(hyperRectangles)
+        : "All hyper rectangles must be the same dimensionality";
+    this.field = field;
+    this.hyperRectangles = hyperRectangles;
+    this.dims = hyperRectangles[0].dims;
+    this.counts = new int[hyperRectangles.length];
+    count(field, hits.getMatchingDocs());
+  }
+
+  private boolean areHyperRectangleDimsConsistent(HyperRectangle[] hyperRectangles) {
+    int dims = hyperRectangles[0].dims;
+    return Arrays.stream(hyperRectangles).allMatch(hyperRectangle -> hyperRectangle.dims == dims);
+  }
+
+  /** Counts from the provided field. */
+  private void count(String field, List<FacetsCollector.MatchingDocs> matchingDocs)
+      throws IOException {
+
+    for (int i = 0; i < matchingDocs.size(); i++) {
+
+      FacetsCollector.MatchingDocs hits = matchingDocs.get(i);
+
+      BinaryDocValues binaryDocValues = DocValues.getBinary(hits.context.reader(), field);
+
+      final DocIdSetIterator it = hits.bits.iterator();
+      if (it == null) {
+        continue;
+      }

Review Comment:
   Yeah, this convenience is nice. It also might optimize a little internally 
by figuring out what to lead with, etc. for doing the conjunction. So 
definitely nice to use.



##
lucene/facet/src/java/org/apache/lucene/facet/hyperrectangle/HyperRectangleFacetCounts.java:
##
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.hyperrectangle;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import org.apache.lucene.document.LongPoint;
+import org.apac

[GitHub] [lucene] gsmiller commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities

2022-05-23 Thread GitBox



gsmiller commented on code in PR #841:
URL: https://github.com/apache/lucene/pull/841#discussion_r879869751


##
lucene/facet/src/java/org/apache/lucene/facet/hyperrectangle/LongPointFacetField.java:
##
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.hyperrectangle;
+
+import org.apache.lucene.document.BinaryDocValuesField;
+import org.apache.lucene.document.LongPoint;
+
+/** Packs an array of longs into a {@link BinaryDocValuesField} */
+public class LongPointFacetField extends BinaryDocValuesField {

Review Comment:
   Full transparency: Marc and I had a discussion about this offline so I 
wanted to circle back here with a suggestion I made to him so it's fully out in 
the open and we can carry a conversation forward with the community.
   
   While I initially suggested adding this as a sub-class of 
`BinaryRangeDocValuesField` (similar to what `LongRangeDocValuesField` does), I 
wonder if the right thing would be to actually formalize a new doc values 
format type. If we're building faceting, and potentially "slow range query" 
support on top of these, it seems like formalizing the format encoding might be 
the right thing to do. I'd be really curious what the community thinks of this 
though, and recommended that Marc start that discussion. I'm personally leaning 
towards formalizing the format, and maybe even having single-valued and 
multi-valued versions (analogous to `(Sorted)NumericDocValues`).




[GitHub] [lucene] gsmiller commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities

2022-05-23 Thread GitBox



gsmiller commented on code in PR #841:
URL: https://github.com/apache/lucene/pull/841#discussion_r879870847


##
lucene/facet/src/java/org/apache/lucene/facet/hyperrectangle/HyperRectangle.java:
##
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.hyperrectangle;
+
+/** Holds the name and the number of dims for a HyperRectangle */
+public abstract class HyperRectangle {

Review Comment:
   Does `HyperRectangle` itself actually need to be part of the public API 
though? Users certainly need the definitions for `Long/DoubleHyperRectangle` 
but do they need the `HyperRectangle` definition itself? Like would they need a 
generic reference to `HyperRectangle`? I'm not sure?




[GitHub] [lucene] gsmiller commented on a diff in pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities

2022-05-23 Thread GitBox



gsmiller commented on code in PR #841:
URL: https://github.com/apache/lucene/pull/841#discussion_r879871471


##
lucene/facet/src/java/org/apache/lucene/facet/hyperrectangle/HyperRectangleFacetCounts.java:
##
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.hyperrectangle;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import org.apache.lucene.document.LongPoint;
+import org.apache.lucene.facet.FacetResult;
+import org.apache.lucene.facet.Facets;
+import org.apache.lucene.facet.FacetsCollector;
+import org.apache.lucene.facet.LabelAndValue;
+import org.apache.lucene.index.BinaryDocValues;
+import org.apache.lucene.index.DocValues;
+import org.apache.lucene.search.DocIdSetIterator;
+
+/** Get counts given a list of HyperRectangles (which must be of the same type) */
+public class HyperRectangleFacetCounts extends Facets {
+  /** Hyper rectangles passed to constructor. */
+  protected final HyperRectangle[] hyperRectangles;
+
+  /** Counts, initialized in subclass. */
+  protected final int[] counts;
+
+  /** Our field name. */
+  protected final String field;
+
+  /** Number of dimensions for field */
+  protected final int dims;
+
+  /** Total number of hits. */
+  protected int totCount;

Review Comment:
   That makes sense. I think leaving it `private` until there's a need is good.




[jira] [Commented] (LUCENE-10590) Indexing all zero vectors leads to heat death of the universe

2022-05-23 Thread Michael Sokolov (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541180#comment-17541180
 ] 

Michael Sokolov commented on LUCENE-10590:
--

> Love the title, Michael Sokolov. Very Douglas-y Adams-y.
 :starry eyes:

So I wrote a unit test, wrapped the `RandomVectorValues.vectorValue(int)` 
method to see where it was being called, and fiddled around with 
`BoundsChecker` to see what would happen if we swapped its `<` with a `<=`, and 
what I found is that in the existing situation, indeed, we crawl over the 
entire graph every time we insert a node, because every node looks like a 
viable candidate (we only exclude nodes whose scores are `<` the current least 
score (or `>` for the inverse scoring functions)). But ... if we change to 
using `<=` (resp. `>=`) then the cost shifts over to 
`HnswGraphBuilder.findWorstNonDiverse` since there we early terminate in the 
opposite way.

Anyway, that isn't very clear, but the point is that these boundary conditions 
are sensitive to the equality case (where everything is equally distant from 
everything else), and they explode in different directions! Basically, what we 
need to do is bias them to give up when scores are exactly equal. Possibly 
BoundsChecker should get a new parameter (open/closed).
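
A minimal sketch of the open/closed idea, for illustration only. `TieAwareBound` and its methods are hypothetical names and do not reproduce Lucene's `BoundsChecker`; the sketch also assumes higher scores are better (the inverse scoring functions would flip the comparisons). With `rejectTies = false` (the `<` behavior) every equally scored candidate stays viable, while `rejectTies = true` (the `<=` behavior) drops ties.

```java
/** Hypothetical stand-in for a score bound; NOT Lucene's BoundsChecker. */
final class TieAwareBound {
  // true  = "closed" bound: a score equal to the bound is rejected (<=)
  // false = "open" bound: only strictly worse scores are rejected (<)
  private final boolean rejectTies;
  private float bound = Float.NEGATIVE_INFINITY;

  TieAwareBound(boolean rejectTies) {
    this.rejectTies = rejectTies;
  }

  /** Records the current least competitive score among the collected results. */
  void update(float leastScore) {
    bound = leastScore;
  }

  /** Returns true when a candidate with this score should not be explored. */
  boolean shouldSkip(float score) {
    return rejectTies ? score <= bound : score < bound;
  }

  /** Counts how many of the given candidate scores would still be explored. */
  static int countViable(float[] scores, float leastScore, boolean rejectTies) {
    TieAwareBound checker = new TieAwareBound(rejectTies);
    checker.update(leastScore);
    int viable = 0;
    for (float score : scores) {
      if (!checker.shouldSkip(score)) {
        viable++;
      }
    }
    return viable;
  }
}
```

When all scores are identical, as with all-zero vectors, the open bound keeps every candidate and the closed bound keeps none, which is the sensitivity described above.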

> Indexing all zero vectors leads to heat death of the universe
> -
>
> Key: LUCENE-10590
> URL: https://issues.apache.org/jira/browse/LUCENE-10590
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael Sokolov
>Priority: Major
>
> By accident while testing something else, I ran a luceneutil test indexing 1M 
> 100d vectors where all the vectors were all zeroes. This caused indexing to 
> take a very long time (~40x normal - it did eventually complete) and the 
> search performance was similarly bad.  We should not degrade by orders of 
> magnitude with even the worst data though.
> I'm not entirely sure what the issue is, but perhaps as long as we keep 
> finding hits that are "better" we keep exploring the graph, where better 
> means (score, -docid) >= (lowest score, -docid). If that's right and all docs 
> have the same score, then we probably need to either switch to > (but this 
> could lead to poorer recall in normal cases) or introduce some kind of 
> minimum score threshold?




[GitHub] [lucene] jtibshirani commented on pull request #920: LUCENE-10589: increase upper bound of test range query to the maximum value + 1

2022-05-23 Thread GitBox



jtibshirani commented on PR #920:
URL: https://github.com/apache/lucene/pull/920#issuecomment-1135164651

   Thank you @mocobeta for looking into this! I don't think the failure is 
caused by having multiple segments, since we make sure to force merge to one 
segment before starting the searches. Stepping through what happens, it looks 
like we just hit a really unlucky query + data combination where it takes more 
than 150 steps to conclude the search.
   
   Your proposed fix makes sense to me. Another option is to decrease `k` to 
make the search more restrictive (currently it's set to 5, I think 1 would work 
instead).



[jira] [Commented] (LUCENE-10590) Indexing all zero vectors leads to heat death of the universe

2022-05-23 Thread Julie Tibshirani (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541185#comment-17541185
 ] 

Julie Tibshirani commented on LUCENE-10590:
---

I don't have a deep understanding of what's happening, but wanted to share this 
discussion from hnswlib: 
[https://github.com/nmslib/hnswlib/issues/263#issuecomment-739549454]. It looks 
like HNSW can really fall apart if there are a lot of duplicate vectors. The 
duplicates all link to each other, creating a highly disconnected graph. I've 
often seen libraries recommend that users deduplicate vectors before indexing 
them 
([https://github.com/facebookresearch/faiss/wiki/FAQ#searching-duplicate-vectors-is-slow]).
 I guess indexing all zero vectors is an extreme version of this!
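
As a rough sketch of the "deduplicate before indexing" advice: the class and method names below are hypothetical and not part of Lucene, hnswlib, or faiss. It only detects exact (bitwise) duplicates, keeps the first occurrence of each vector, and ignores how the dropped duplicates would be mapped back to their documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: drops exact duplicate vectors, keeping first occurrences. */
final class VectorDedup {

  /** Wraps a float[] so it can be used as a hash key with value semantics. */
  private static final class Key {
    private final float[] values;

    Key(float[] values) {
      this.values = values;
    }

    @Override
    public boolean equals(Object other) {
      return other instanceof Key && Arrays.equals(values, ((Key) other).values);
    }

    @Override
    public int hashCode() {
      return Arrays.hashCode(values);
    }
  }

  /** Returns the input vectors in order, with exact duplicates removed. */
  static List<float[]> deduplicate(List<float[]> vectors) {
    Set<Key> seen = new HashSet<>();
    List<float[]> unique = new ArrayList<>();
    for (float[] vector : vectors) {
      if (seen.add(new Key(vector))) {
        unique.add(vector);
      }
    }
    return unique;
  }
}
```

For the all-zero case in this issue, such a pass would collapse the whole input to a single vector before any graph is built.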

> Indexing all zero vectors leads to heat death of the universe
> -
>
> Key: LUCENE-10590
> URL: https://issues.apache.org/jira/browse/LUCENE-10590
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael Sokolov
>Priority: Major
>
> By accident while testing something else, I ran a luceneutil test indexing 1M 
> 100d vectors where all the vectors were all zeroes. This caused indexing to 
> take a very long time (~40x normal - it did eventually complete) and the 
> search performance was similarly bad.  We should not degrade by orders of 
> magnitude with even the worst data though.
> I'm not entirely sure what the issue is, but perhaps as long as we keep 
> finding hits that are "better" we keep exploring the graph, where better 
> means (score, -docid) >= (lowest score, -docid). If that's right and all docs 
> have the same score, then we probably need to either switch to > (but this 
> could lead to poorer recall in normal cases) or introduce some kind of 
> minimum score threshold?




[jira] [Commented] (LUCENE-10590) Indexing all zero vectors leads to heat death of the universe

2022-05-23 Thread Michael Sokolov (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541193#comment-17541193
 ] 

Michael Sokolov commented on LUCENE-10590:
--

Thanks Julie, this is definitely the same problem. I fiddled around with bounds 
checking but it's not so obvious how to fix this. I wonder if we can impose a 
default visitedLimit to avoid this kind of runaway explosion. Duplicate 
detection sounds challenging.
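
To illustrate what a visit budget buys, here is a simplified sketch and not Lucene's HNSW search: `BoundedTraversal` is a hypothetical name, the graph is a plain adjacency array, and the traversal is breadth-first rather than score-ordered. The only point is that the work is capped by `visitedLimit` even when every node looks like a viable candidate.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;

/** Hypothetical sketch of a traversal that stops once a visit budget is spent. */
final class BoundedTraversal {

  /** Returns the number of nodes visited, which never exceeds visitedLimit. */
  static int visit(int[][] neighbors, int entryPoint, int visitedLimit) {
    if (visitedLimit <= 0 || neighbors.length == 0) {
      return 0;
    }
    BitSet visited = new BitSet(neighbors.length);
    Deque<Integer> queue = new ArrayDeque<>();
    visited.set(entryPoint);
    queue.add(entryPoint);
    int count = 1;
    while (!queue.isEmpty()) {
      int node = queue.poll();
      for (int next : neighbors[node]) {
        if (visited.get(next)) {
          continue;
        }
        if (count >= visitedLimit) {
          return count; // budget exhausted: give up instead of crawling the whole graph
        }
        visited.set(next);
        count++;
        queue.add(next);
      }
    }
    return count;
  }
}
```

A cap like this bounds the cost but does not fix graph quality, so it would be a safety net rather than a cure for the duplicate-vector case.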

> Indexing all zero vectors leads to heat death of the universe
> -
>
> Key: LUCENE-10590
> URL: https://issues.apache.org/jira/browse/LUCENE-10590
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael Sokolov
>Priority: Major
>
> By accident while testing something else, I ran a luceneutil test indexing 1M 
> 100d vectors where all the vectors were all zeroes. This caused indexing to 
> take a very long time (~40x normal - it did eventually complete) and the 
> search performance was similarly bad.  We should not degrade by orders of 
> magnitude with even the worst data though.
> I'm not entirely sure what the issue is, but perhaps as long as we keep 
> finding hits that are "better" we keep exploring the graph, where better 
> means (score, -docid) >= (lowest score, -docid). If that's right and all docs 
> have the same score, then we probably need to either switch to > (but this 
> could lead to poorer recall in normal cases) or introduce some kind of 
> minimum score threshold?




[jira] [Comment Edited] (LUCENE-10590) Indexing all zero vectors leads to heat death of the universe

2022-05-23 Thread Julie Tibshirani (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541185#comment-17541185
 ] 

Julie Tibshirani edited comment on LUCENE-10590 at 5/23/22 10:26 PM:
-

I don't have a deep understanding of what's happening, but wanted to share this 
discussion from hnswlib: 
[https://github.com/nmslib/hnswlib/issues/263#issuecomment-739549454]. It looks 
like HNSW can really fall apart if there are a lot of duplicate vectors. The 
duplicates all link to each other, creating a highly disconnected graph. I've 
often seen libraries recommend that users deduplicate vectors before indexing 
them 
([https://github.com/facebookresearch/faiss/wiki/FAQ#searching-duplicate-vectors-is-slow]).
 I guess indexing all zero vectors is an extreme version of this!


was (Author: julietibs):
I don't have a deep understanding of what's happening, but wanted to share this 
discussion from hnswlib: 
[https://github.com/nmslib/hnswlib/issues/263#issuecomment-739549454]. It looks 
like HNSW can really fall apart if there are a lot of duplicate vectors. The 
duplicates all link to each other, creating a highly disconnected graph. I've 
often seen libraries recommend that users deduplicate vectors before indexing 
them 
([https://github.com/facebookresearch/faiss/wiki/FAQ#searching-duplicate-vectors-is-slow).]
 I guess indexing all zero vectors is an extreme version of this!

> Indexing all zero vectors leads to heat death of the universe
> -
>
> Key: LUCENE-10590
> URL: https://issues.apache.org/jira/browse/LUCENE-10590
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael Sokolov
>Priority: Major
>
> By accident while testing something else, I ran a luceneutil test indexing 1M 
> 100d vectors where all the vectors were all zeroes. This caused indexing to 
> take a very long time (~40x normal - it did eventually complete) and the 
> search performance was similarly bad.  We should not degrade by orders of 
> magnitude with even the worst data though.
> I'm not entirely sure what the issue is, but perhaps as long as we keep 
> finding hits that are "better" we keep exploring the graph, where better 
> means (score, -docid) >= (lowest score, -docid). If that's right and all docs 
> have the same score, then we probably need to either switch to > (but this 
> could lead to poorer recall in normal cases) or introduce some kind of 
> minimum score threshold?




[jira] [Comment Edited] (LUCENE-10590) Indexing all zero vectors leads to heat death of the universe

2022-05-23 Thread Julie Tibshirani (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541185#comment-17541185
 ] 

Julie Tibshirani edited comment on LUCENE-10590 at 5/23/22 10:26 PM:
-

I don't have a deep understanding of what's happening, but wanted to share this 
discussion from hnswlib: 
[https://github.com/nmslib/hnswlib/issues/263#issuecomment-739549454]. It looks 
like HNSW can really fall apart if there are a lot of duplicate vectors. The 
duplicates all link to each other, creating a highly disconnected graph. I've 
often seen libraries recommend that users deduplicate vectors before indexing 
them 
([https://github.com/facebookresearch/faiss/wiki/FAQ#searching-duplicate-vectors-is-slow).]
 I guess indexing all zero vectors is an extreme version of this!


was (Author: julietibs):
I don't have a deep understanding of what's happening, but wanted to share this 
discussion from hnswlib: 
[https://github.com/nmslib/hnswlib/issues/263#issuecomment-739549454.] It looks 
like HNSW can really fall apart if there are a lot of duplicate vectors. The 
duplicates all link to each other, creating a highly disconnected graph. I've 
often seen libraries recommend that users deduplicate vectors before indexing 
them 
([https://github.com/facebookresearch/faiss/wiki/FAQ#searching-duplicate-vectors-is-slow).]
 I guess indexing all zero vectors is an extreme version of this!

> Indexing all zero vectors leads to heat death of the universe
> -
>
> Key: LUCENE-10590
> URL: https://issues.apache.org/jira/browse/LUCENE-10590
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael Sokolov
>Priority: Major
>
> By accident while testing something else, I ran a luceneutil test indexing 1M 
> 100d vectors where all the vectors were all zeroes. This caused indexing to 
> take a very long time (~40x normal - it did eventually complete) and the 
> search performance was similarly bad.  We should not degrade by orders of 
> magnitude with even the worst data though.
> I'm not entirely sure what the issue is, but perhaps as long as we keep 
> finding hits that are "better" we keep exploring the graph, where better 
> means (score, -docid) >= (lowest score, -docid). If that's right and all docs 
> have the same score, then we probably need to either switch to > (but this 
> could lead to poorer recall in normal cases) or introduce some kind of 
> minimum score threshold?




[GitHub] [lucene] mocobeta commented on a diff in pull request #916: Refine contribution guide and pull request template

2022-05-23 Thread GitBox



mocobeta commented on code in PR #916:
URL: https://github.com/apache/lucene/pull/916#discussion_r879964992


##
CONTRIBUTING.md:
##
@@ -78,8 +78,11 @@ Please be patient. Committers are busy people too. If no one 
responds to your pa
 
 Please refer to [GitHub's 
documentation](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests)
 for an explanation of how to create a pull request.
 
+You should open a pull request against the `main` branch. It is also 
recommended to give Lucene maintainers 
[access](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to your PR branch.

Review Comment:
   ```suggestion
   You should open a pull request against the `main` branch. 
   ```




[GitHub] [lucene] mocobeta commented on a diff in pull request #916: Refine contribution guide and pull request template

2022-05-23 Thread GitBox



mocobeta commented on code in PR #916:
URL: https://github.com/apache/lucene/pull/916#discussion_r879967339


##
CONTRIBUTING.md:
##
@@ -78,8 +78,11 @@ Please be patient. Committers are busy people too. If no one 
responds to your pa
 
 Please refer to [GitHub's 
documentation](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests)
 for an explanation of how to create a pull request.
 
+You should open a pull request against the `main` branch. 

Review Comment:
   ```suggestion
   You should open a pull request against the `main` branch. Committers will 
backport it to the maintenance branches once the change is merged into `main` 
(as far as it is possible).
   ```




[jira] [Updated] (LUCENE-10590) Indexing all zero vectors leads to heat death of the universe

2022-05-23 Thread Lu Xugang (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lu Xugang updated LUCENE-10590:
---
Attachment: image.png

> Indexing all zero vectors leads to heat death of the universe
> -
>
> Key: LUCENE-10590
> URL: https://issues.apache.org/jira/browse/LUCENE-10590
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael Sokolov
>Priority: Major
> Attachments: image.png
>
>
> By accident while testing something else, I ran a luceneutil test indexing 1M 
> 100d vectors where all the vectors were all zeroes. This caused indexing to 
> take a very long time (~40x normal - it did eventually complete) and the 
> search performance was similarly bad.  We should not degrade by orders of 
> magnitude with even the worst data though.
> I'm not entirely sure what the issue is, but perhaps as long as we keep 
> finding hits that are "better" we keep exploring the graph, where better 
> means (score, -docid) >= (lowest score, -docid). If that's right and all docs 
> have the same score, then we probably need to either switch to > (but this 
> could lead to poorer recall in normal cases) or introduce some kind of 
> minimum score threshold?




[jira] [Updated] (LUCENE-10590) Indexing all zero vectors leads to heat death of the universe

2022-05-23 Thread Lu Xugang (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lu Xugang updated LUCENE-10590:
---
Attachment: (was: image.png)

> Indexing all zero vectors leads to heat death of the universe
> -
>
> Key: LUCENE-10590
> URL: https://issues.apache.org/jira/browse/LUCENE-10590
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael Sokolov
>Priority: Major
>
> By accident while testing something else, I ran a luceneutil test indexing 1M 
> 100d vectors where all the vectors were all zeroes. This caused indexing to 
> take a very long time (~40x normal - it did eventually complete) and the 
> search performance was similarly bad.  We should not degrade by orders of 
> magnitude with even the worst data though.
> I'm not entirely sure what the issue is, but perhaps as long as we keep 
> finding hits that are "better" we keep exploring the graph, where better 
> means (score, -docid) >= (lowest score, -docid). If that's right and all docs 
> have the same score, then we probably need to either switch to > (but this 
> could lead to poorer recall in normal cases) or introduce some kind of 
> minimum score threshold?




[jira] [Commented] (LUCENE-10559) Add preFilter/postFilter options to KnnGraphTester

2022-05-23 Thread Kaival Parikh (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541282#comment-17541282
 ] 

Kaival Parikh commented on LUCENE-10559:


The graph construction parameters were:

docs = path_to_vec_file, ndoc = 100, dim = 100, fanout = 0, maxConn = 150, 
beamWidthIndex = 300

All these were the same for search time, with additional:

search = path_to_query_file, niter = 1000, selectivity = (as required, 0.01 ~ 
0.8), prefilter (as required)

 

Also, you were right about the search vectors: there was an overlap with the 
training set. I created a fresh query file excluding trained vectors and re-ran 
the utility:
||selectivity||effective topK||post-filter recall||post-filter time||pre-filter recall||pre-filter time||
|0.8|125|0.965|1.57|0.976|1.61|
|0.6|166|0.959|2.07|0.981|2.00|
|0.4|250|0.962|2.71|0.986|2.65|
|0.2|500|0.958|4.80|0.992|4.51|
|0.1|1000|0.954|8.61|0.994|7.74|
|0.01|1|0.971|58.78|1.000|9.44|

The recall and time seem to be in the same range as before. The high recall for 
selective queries (selectivity = 0.01, prefilter, recall = 1.000) may be due to 
performing an exact search when the visited-nodes limit is reached.
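
For reference, the "effective topK" column for selectivities 0.8 through 0.1 is consistent with dividing a base topK by the selectivity and truncating. The base value of 100 is inferred from the table (125 x 0.8 = 100) and is not stated in the comment; the helper below is a hypothetical sketch of that arithmetic, not code from KnnGraphTester.

```java
/** Hypothetical helper showing the over-selection arithmetic for post-filtering. */
final class OverSelection {

  /**
   * Number of hits to request before filtering so that roughly topK survive
   * a filter that keeps the given fraction of documents.
   */
  static int effectiveTopK(int topK, double selectivity) {
    if (topK <= 0) {
      throw new IllegalArgumentException("topK must be positive");
    }
    if (!(selectivity > 0.0 && selectivity <= 1.0)) {
      throw new IllegalArgumentException("selectivity must be in (0, 1]");
    }
    // Truncating division, which matches the 166 reported for selectivity 0.6.
    return (int) (topK / selectivity);
  }
}
```

This shows why post-filter time grows as the filter gets more selective: the number of hits that must be collected scales with 1 / selectivity.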

> Add preFilter/postFilter options to KnnGraphTester
> --
>
> Key: LUCENE-10559
> URL: https://issues.apache.org/jira/browse/LUCENE-10559
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Major
>
> We want to be able to test the efficacy of pre-filtering in KnnVectorQuery: 
> if you (say) want the top K nearest neighbors subject to a constraint Q, are 
> you better off over-selecting (say 2K) top hits and *then* filtering 
> (post-filtering), or incorporating the filtering into the query 
> (pre-filtering). How does it depend on the selectivity of the filter?
> I think we can get a reasonable testbed by generating a uniform random filter 
> with some selectivity (that is consistent and repeatable). Possibly we'd also 
> want to try filters that are correlated with index order, but it seems they'd 
> be unlikely to be correlated with vector values in a way that the graph 
> structure would notice, so random is a pretty good starting point for this.




[GitHub] [lucene] mocobeta commented on pull request #916: Refine contribution guide and pull request template

2022-05-23 Thread GitBox



mocobeta commented on PR #916:
URL: https://github.com/apache/lucene/pull/916#issuecomment-1135442476

   Thank you @rmuir for reviewing.
   
   I'd merge this - let's restart from a blank sheet (all necessary information 
should be written in the contribution guide), but if anyone has suggestions to 
make the pull request template helpful for committers/contributors, please make 
another issue/PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] mocobeta merged pull request #916: Refine contribution guide and pull request template

2022-05-23 Thread GitBox



mocobeta merged PR #916:
URL: https://github.com/apache/lucene/pull/916



[GitHub] [lucene] mocobeta merged pull request #918: LUCENE-10586: Minor cleanup for local variables in BlockTreeTermsReader

2022-05-23 Thread GitBox



mocobeta merged PR #918:
URL: https://github.com/apache/lucene/pull/918



[jira] [Commented] (LUCENE-10586) Minor refactoring in Lucene90BlockTreeTermsReader local variables: metaIn, indexMetaIn, termsMetaIn

2022-05-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541290#comment-17541290
 ] 

ASF subversion and git services commented on LUCENE-10586:
--

Commit f5c1f11a2afeb685d919a47904d525f076c90fda in lucene's branch 
refs/heads/main from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=f5c1f11a2af ]

LUCENE-10586: Minor cleanup local variables in BlockTreeTermsReader (#918)



> Minor refactoring in Lucene90BlockTreeTermsReader local variables: metaIn, 
> indexMetaIn, termsMetaIn
> ---
>
> Key: LUCENE-10586
> URL: https://issues.apache.org/jira/browse/LUCENE-10586
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Tomoko Uchida
>Priority: Trivial
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Those three local variables refer to the same {{IndexInput}} object (no 
> clone() is called).
> {code}
> indexMetaIn = termsMetaIn = metaIn;
> {code}
> I'm not sure, but maybe there are some historical reasons. I wonder if it 
> would be better to have only one reference for the underlying {{IndexInput}} 
> object to make the code a little easier to follow.
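
To illustrate why three names for one object can be collapsed into a single
reference, here is a minimal sketch of the aliasing involved. It uses
java.io.ByteArrayInputStream as a stand-in for IndexInput (an assumption, so
the example runs without Lucene); the class name AliasSketch and the byte
values are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.util.Arrays;

// Hypothetical demo class; ByteArrayInputStream stands in for IndexInput.
final class AliasSketch {

  /** Reads one byte through each of three names bound to the same stream. */
  static int[] readThroughAliases() {
    ByteArrayInputStream metaIn = new ByteArrayInputStream(new byte[] {10, 20, 30});
    ByteArrayInputStream indexMetaIn;
    ByteArrayInputStream termsMetaIn;
    // Chained assignment copies the reference, not the object: no clone is made.
    indexMetaIn = termsMetaIn = metaIn;

    // Each read advances the single shared position.
    int first = indexMetaIn.read();
    int second = termsMetaIn.read();
    int third = metaIn.read();
    return new int[] {first, second, third};
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(readThroughAliases())); // prints [10, 20, 30]
  }
}
```

Since all three names share one read position, replacing them with a single
variable does not change behavior in this sketch; it only removes the
impression that three independent inputs exist.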




[jira] [Commented] (LUCENE-10586) Minor refactoring in Lucene90BlockTreeTermsReader local variables: metaIn, indexMetaIn, termsMetaIn

2022-05-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541292#comment-17541292
 ] 

ASF subversion and git services commented on LUCENE-10586:
--

Commit 2cd9eb13262a0598c6f3c0409103121e72256772 in lucene's branch 
refs/heads/branch_9x from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=2cd9eb13262 ]

LUCENE-10586: Minor cleanup local variables in BlockTreeTermsReader (#918)





