[GitHub] [lucene] wjp719 commented on a diff in pull request #780: LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction

2022-04-05 Thread GitBox


wjp719 commented on code in PR #780:
URL: https://github.com/apache/lucene/pull/780#discussion_r842427168


##
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java:
##
@@ -94,6 +95,7 @@ public void disableSkipping() {
 private long iteratorCost;
 private int maxDocVisited = -1;
 private int updateCounter = 0;
+private boolean disableSegmentInternalSkip = false;

Review Comment:
   Ok, we should keep skipping, it help more than the skipping overhead



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on a diff in pull request #780: LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction

2022-04-05 Thread GitBox


jpountz commented on code in PR #780:
URL: https://github.com/apache/lucene/pull/780#discussion_r842434529


##
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java:
##
@@ -94,6 +95,7 @@ public void disableSkipping() {
 private long iteratorCost;
 private int maxDocVisited = -1;
 private int updateCounter = 0;
+private boolean disableSegmentInternalSkip = false;

Review Comment:
   OK, then my suggestion would be to close this PR if that works for you too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] wjp719 closed pull request #780: LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction

2022-04-05 Thread GitBox


wjp719 closed pull request #780: LUCENE-10496: avoid unnecessary attempts to 
evaluate skipping doc if index sort and search sort are in opposite direction
URL: https://github.com/apache/lucene/pull/780


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] wjp719 commented on a diff in pull request #780: LUCENE-10496: avoid unnecessary attempts to evaluate skipping doc if index sort and search sort are in opposite direction

2022-04-05 Thread GitBox


wjp719 commented on code in PR #780:
URL: https://github.com/apache/lucene/pull/780#discussion_r842438315


##
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java:
##
@@ -94,6 +95,7 @@ public void disableSkipping() {
 private long iteratorCost;
 private int maxDocVisited = -1;
 private int updateCounter = 0;
+private boolean disableSegmentInternalSkip = false;

Review Comment:
   I will close this pr. thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz merged pull request #731: LUCENE-10456: Implement Weight#count for MultiRangeQuery

2022-04-05 Thread GitBox


jpountz merged PR #731:
URL: https://github.com/apache/lucene/pull/731


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10456) Implement Weight#count for MultiRangeQuery

2022-04-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517272#comment-17517272
 ] 

ASF subversion and git services commented on LUCENE-10456:
--

Commit 898ec1659d07d4ea953b8eaf6332d1a036b151e4 in lucene's branch 
refs/heads/main from xiaoping
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=898ec1659d0 ]

LUCENE-10456: Implement Weight#count for MultiRangeQuery (#731)



> Implement Weight#count for MultiRangeQuery
> --
>
> Key: LUCENE-10456
> URL: https://issues.apache.org/jira/browse/LUCENE-10456
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/search
>Reporter: jianping weng
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> for one dimension MultiRangeQuery, we can firstly   merge overlapping ranges 
> to have some unconnected ranges, then the doc count of this multiRangeQuery 
> is the sum of each doc count of these unconnected range. For each unconnected 
> range, we can take advantage of PointRangeQuery count() method.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-10456) Implement Weight#count for MultiRangeQuery

2022-04-05 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-10456.
---
Fix Version/s: 9.2
   Resolution: Fixed

> Implement Weight#count for MultiRangeQuery
> --
>
> Key: LUCENE-10456
> URL: https://issues.apache.org/jira/browse/LUCENE-10456
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/search
>Reporter: jianping weng
>Priority: Major
> Fix For: 9.2
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> for one dimension MultiRangeQuery, we can firstly   merge overlapping ranges 
> to have some unconnected ranges, then the doc count of this multiRangeQuery 
> is the sum of each doc count of these unconnected range. For each unconnected 
> range, we can take advantage of PointRangeQuery count() method.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] pquentin opened a new pull request, #784: LUCENE-10085: Fix flaky test by always deleting docs

2022-04-05 Thread GitBox


pquentin opened a new pull request, #784:
URL: https://github.com/apache/lucene/pull/784

   As noticed by @jtibshirani in 
https://issues.apache.org/jira/browse/LUCENE-10085?focusedCommentId=17516122&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17516122


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] zacharymorn commented on a diff in pull request #767: LUCENE-10436: Deprecate DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery with FieldExistsQuery

2022-04-05 Thread GitBox


zacharymorn commented on code in PR #767:
URL: https://github.com/apache/lucene/pull/767#discussion_r842463582


##
lucene/core/src/test/org/apache/lucene/search/TestFieldExistsQuery.java:
##
@@ -65,20 +65,21 @@ public void testDocValuesRewriteWithTermsPresent() throws 
IOException {
 dir.close();
   }
 
-  public void testDocValuesRewriteWithPointValuesPresent() throws IOException {
+  public void testDocValuesRewriteWithDocValuesPresent() throws IOException {
 Directory dir = newDirectory();
 RandomIndexWriter iw = new RandomIndexWriter(random(), dir);
 final int numDocs = atLeast(100);
 for (int i = 0; i < numDocs; ++i) {
   Document doc = new Document();
-  doc.add(new BinaryPoint("dim", new byte[4], new byte[4]));
+  doc.add(new DoubleDocValuesField("f", 2.0));
+  doc.add(new StringField("f", random().nextBoolean() ? "yes" : "no", 
Store.NO));

Review Comment:
   Updated in 
https://github.com/apache/lucene/pull/767/commits/61fedf349fd56d0a49cbb076bdbc7f13919d30f1
 .



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] zacharymorn commented on pull request #767: LUCENE-10436: Deprecate DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery with FieldExistsQuery

2022-04-05 Thread GitBox


zacharymorn commented on PR #767:
URL: https://github.com/apache/lucene/pull/767#issuecomment-1088372398

   > Thank you!
   
   No problem, thanks @jpountz for all the reviews and suggestions as well! 
I'll open a follow-up PR after merging this as you suggested to remove the 
deprecated classes for `main`.  
   
   @mikemccand, please let me know if you have further feedback for this PR as 
well. I plan to merge this in the new few days.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10456) Implement Weight#count for MultiRangeQuery

2022-04-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517290#comment-17517290
 ] 

ASF subversion and git services commented on LUCENE-10456:
--

Commit c4da9adcfeb35426d056d1b2df20d42e6d7fa127 in lucene's branch 
refs/heads/branch_9x from xiaoping
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=c4da9adcfeb ]

LUCENE-10456: Implement Weight#count for MultiRangeQuery (#731)


> Implement Weight#count for MultiRangeQuery
> --
>
> Key: LUCENE-10456
> URL: https://issues.apache.org/jira/browse/LUCENE-10456
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/search
>Reporter: jianping weng
>Priority: Major
> Fix For: 9.2
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> for one dimension MultiRangeQuery, we can firstly   merge overlapping ranges 
> to have some unconnected ranges, then the doc count of this multiRangeQuery 
> is the sum of each doc count of these unconnected range. For each unconnected 
> range, we can take advantage of PointRangeQuery count() method.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz merged pull request #761: LUCENE-10002: replace TopFieldCollector usages in tests with collector manager

2022-04-05 Thread GitBox


jpountz merged PR #761:
URL: https://github.com/apache/lucene/pull/761


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] javanna opened a new pull request, #785: LUCENE-10002: move MemoryIndex to search(Query, CollectorManager)

2022-04-05 Thread GitBox


javanna opened a new pull request, #785:
URL: https://github.com/apache/lucene/pull/785

   MemoryIndex exposes a search method that returns the score of the matching 
doc if it does match, otherwise 0.0f. It internally uses a custom collector and 
calls IndexSearcher#search(Query, Collector). We can easily replace that with a 
collector manager, that does not need to handle concurrency as the memory index 
will only ever hold a single document.
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my 
code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Lucene maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9625) Benchmark KNN search with ann-benchmarks

2022-04-05 Thread Balmukund Mandal (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517325#comment-17517325
 ] 

Balmukund Mandal commented on LUCENE-9625:
--

[~sokolov] 

As you've mentioned that jars should be copied but not very clear that which 
all Lucene jars we've to copy?

> Benchmark KNN search with ann-benchmarks
> 
>
> Key: LUCENE-9625
> URL: https://issues.apache.org/jira/browse/LUCENE-9625
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In addition to benchmarking with luceneutil, it would be good to be able to 
> make use of ann-benchmarks, which is publishing results from many approximate 
> knn algorithms, including the hnsw implementation from its authors. We don't 
> expect to challenge the performance of these native code libraries, however 
> it would be good to know just how far off we are.
> I started looking into this and posted a fork of ann-benchmarks that uses 
> KnnGraphTester  class to run these: 
> https://github.com/msokolov/ann-benchmarks. It's still a WIP; you have to 
> manually copy jars and the KnnGraphTester.class to the test host machine 
> rather than downloading from a distribution. KnnGraphTester needs some 
> modifications in order to support this process - this issue is mostly about 
> that.
> One thing I noticed is that some of the index builds with higher fanout 
> (efConstruction) settings time out at 2h (on an AWS c5 instance), so this is 
> concerning and I'll open a separate issue for trying to improve that.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] LuXugang commented on pull request #649: LUCENE-10408 Better encoding of doc Ids in vectors

2022-04-05 Thread GitBox


LuXugang commented on PR #649:
URL: https://github.com/apache/lucene/pull/649#issuecomment-1088480302

   > I wonder if it would be a better trade-off to keep ints uncompressed, but 
read them from disk directly instead of loading giant arrays in memory? Or 
possibly switch to something like DirectMonotonicReader if it doesn't slow down 
searches.
   
   @jpountz, could we use IndexedDISI to store docIds and  
DirectMonotonicWriter to store `ordToDoc` mapping like doc values did.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-10499) eliminate unnecessary copy data overhead when grow array size

2022-04-05 Thread jianping weng (Jira)
jianping weng created LUCENE-10499:
--

 Summary: eliminate unnecessary copy data overhead when grow array 
size
 Key: LUCENE-10499
 URL: https://issues.apache.org/jira/browse/LUCENE-10499
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: jianping weng


ArrayUtil#grow will copy origin array data to the new array after growing size 
in default, but in many places, the new array needn't the origin data. we 
should eliminate the extra copy data overhead



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] wjp719 opened a new pull request, #786: LUCENE-10499: eliminate unnecessary copy data overhead when growing array size

2022-04-05 Thread GitBox


wjp719 opened a new pull request, #786:
URL: https://github.com/apache/lucene/pull/786

   ArrayUtil#grow will copy origin array data to the new array after growing 
size in default, but in many places, the new array needn't the origin data. 
This PR eliminates the extra copy data overhead.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my 
code conforms to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Lucene maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `main` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10499) reduce unnecessary copy data overhead when grow array size

2022-04-05 Thread jianping weng (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jianping weng updated LUCENE-10499:
---
Summary: reduce unnecessary copy data overhead when grow array size  (was: 
eliminate unnecessary copy data overhead when grow array size)

> reduce unnecessary copy data overhead when grow array size
> --
>
> Key: LUCENE-10499
> URL: https://issues.apache.org/jira/browse/LUCENE-10499
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: jianping weng
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ArrayUtil#grow will copy origin array data to the new array after growing 
> size in default, but in many places, the new array needn't the origin data. 
> we should eliminate the extra copy data overhead



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10499) reduce unnecessary copy data overhead when grow array size

2022-04-05 Thread jianping weng (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jianping weng updated LUCENE-10499:
---
Description: ArrayUtil#grow will copy origin array data to the new array 
after growing size in default, but in many places, the new array needn't the 
origin data. we should reduce the extra copy data overhead  (was: 
ArrayUtil#grow will copy origin array data to the new array after growing size 
in default, but in many places, the new array needn't the origin data. we 
should eliminate the extra copy data overhead)

> reduce unnecessary copy data overhead when grow array size
> --
>
> Key: LUCENE-10499
> URL: https://issues.apache.org/jira/browse/LUCENE-10499
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: jianping weng
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ArrayUtil#grow will copy origin array data to the new array after growing 
> size in default, but in many places, the new array needn't the origin data. 
> we should reduce the extra copy data overhead



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #786: LUCENE-10499: reduce unnecessary copy data overhead when growing array size

2022-04-05 Thread GitBox


rmuir commented on PR #786:
URL: https://github.com/apache/lucene/pull/786#issuecomment-1088581156

   Can we instead add a new method to `ArrayUtil`, with a different name than 
`grow`, that doesn't copy data? 
   
   I don't think we should add a boolean parameter to grow that gives it 
radically different behavior.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mocobeta commented on pull request #783: LUCENE-10497: add a base Token class to analysis-common

2022-04-05 Thread GitBox


mocobeta commented on PR #783:
URL: https://github.com/apache/lucene/pull/783#issuecomment-1088582275

   I checked that this preserves the current behavior. I'd merge this (only in 
main) now and will refine it in the following PRs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mocobeta merged pull request #783: LUCENE-10497: add a base Token class to analysis-common

2022-04-05 Thread GitBox


mocobeta merged PR #783:
URL: https://github.com/apache/lucene/pull/783


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10497) Unify "Token" interface in Kuromoji and Nori

2022-04-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517386#comment-17517386
 ] 

ASF subversion and git services commented on LUCENE-10497:
--

Commit bb4a0dc19ba911ac488e3ea7d6fcb3b17a38832f in lucene's branch 
refs/heads/main from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=bb4a0dc19ba ]

LUCENE-10497: Add a base Token class to analysis-common (for kuromoji and nori) 
(#783)



> Unify "Token" interface in Kuromoji and Nori
> 
>
> Key: LUCENE-10497
> URL: https://issues.apache.org/jira/browse/LUCENE-10497
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Tomoko Uchida
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> "Token" classes are the output of the viterbi algorithm in kuromoji and nori, 
> and its implementation is slightly different between them (Nori's Token class 
> looks more expressive or extendable).
> To proceed [LUCENE-10493], I think we need the unified "Token" class or at 
> least a common interface for it. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] wjp719 commented on pull request #786: LUCENE-10499: reduce unnecessary copy data overhead when growing array size

2022-04-05 Thread GitBox


wjp719 commented on PR #786:
URL: https://github.com/apache/lucene/pull/786#issuecomment-1088618967

   > Can we instead add a new method to `ArrayUtil`, with a different name than 
`grow`, that doesn't copy data?
   > 
   > I don't think we should add a boolean parameter to grow that gives it 
radically different behavior.
   
   @rmuir Hi, I add ArrayUtil#growSizeOnly method that only grow size and not 
copy array data, please review again, thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on pull request #649: LUCENE-10408 Better encoding of doc Ids in vectors

2022-04-05 Thread GitBox


jpountz commented on PR #649:
URL: https://github.com/apache/lucene/pull/649#issuecomment-1088641170

   Yes, something like that sounds like a good fit to store the ordToDoc 
mapping indeed :+1:


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7522) Make the Lucene jar an OSGi bundle

2022-04-05 Thread Alexander Kurtakov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-7522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517425#comment-17517425
 ] 

Alexander Kurtakov commented on LUCENE-7522:


Is there hope for this one?

> Make the Lucene jar an OSGi bundle
> --
>
> Key: LUCENE-7522
> URL: https://issues.apache.org/jira/browse/LUCENE-7522
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: general/build
>Affects Versions: 6.2.1
>Reporter: Michal Hlavac
>Priority: Major
>
> Add support for OSGi. LUCENE-1344 added this feature to previous versions, 
> but now lucene jars are not OSGi bundles. There are OSGi bundles from 
> Servicemix, but I think lucene should add this feature.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] javanna opened a new pull request, #787: LUCENE-10002: replace more usages of search(Query, Collector) in tests

2022-04-05 Thread GitBox


javanna opened a new pull request, #787:
URL: https://github.com/apache/lucene/pull/787

   This commit replaces more usages of search(Query, Collector) with calling 
the corresponding search(Query, CollectorManager) instead. This round focuses 
on tests that implement custom collector, that need a corresponding collector 
manager.
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my 
code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Lucene maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] javanna commented on a diff in pull request #787: LUCENE-10002: replace more usages of search(Query, Collector) in tests

2022-04-05 Thread GitBox


javanna commented on code in PR #787:
URL: https://github.com/apache/lucene/pull/787#discussion_r842783772


##
lucene/core/src/test/org/apache/lucene/search/TestMultiCollector.java:
##
@@ -99,7 +100,7 @@ public void testCollectionTerminatedExceptionHandling() 
throws IOException {
   }
   final IndexReader reader = w.getReader();
   w.close();
-  final IndexSearcher searcher = newSearcher(reader);
+  final IndexSearcher searcher = newSearcher(reader, true, true, false);

Review Comment:
   In this case, it did not feel necessary to write a collector manager able to 
handle concurrency. I went for executing the test from a single threaded 
searcher instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] javanna commented on a diff in pull request #787: LUCENE-10002: replace more usages of search(Query, Collector) in tests

2022-04-05 Thread GitBox


javanna commented on code in PR #787:
URL: https://github.com/apache/lucene/pull/787#discussion_r842785447


##
lucene/core/src/test/org/apache/lucene/search/TestTotalHitCountCollector.java:
##
@@ -38,10 +38,10 @@ public void testBasics() throws Exception {
 IndexReader reader = writer.getReader();
 writer.close();
 
-IndexSearcher searcher = newSearcher(reader);
-TotalHitCountCollector c = new TotalHitCountCollector();
-searcher.search(new MatchAllDocsQuery(), c);
-assertEquals(5, c.getTotalHits());
+IndexSearcher searcher = newSearcher(reader, true, true, 
random().nextBoolean());
+TotalHitCountCollectorManager collectorManager = new 
TotalHitCountCollectorManager();
+int totalHits = searcher.search(new MatchAllDocsQuery(), collectorManager);
+assertEquals(5, totalHits);

Review Comment:
   here, instead of having a separate test class for the collector manager, I 
merged the two and used a collector manager in the test class for the collector.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] javanna commented on a diff in pull request #787: LUCENE-10002: replace more usages of search(Query, Collector) in tests

2022-04-05 Thread GitBox


javanna commented on code in PR #787:
URL: https://github.com/apache/lucene/pull/787#discussion_r842786871


##
lucene/queryparser/src/test/org/apache/lucene/queryparser/surround/query/BooleanQueryTestFacade.java:
##
@@ -121,14 +125,24 @@ public void doTest() throws Exception {
 Query query = lq.makeLuceneQueryField(fieldName, qf);
 /* if (verbose) System.out.println("Lucene: " + query.toString()); */
 
-TestCollector tc = new TestCollector();
-IndexReader reader = DirectoryReader.open(dBase.getDb());
-IndexSearcher searcher = new IndexSearcher(reader);
-try {
-  searcher.search(query, tc);
-} finally {
-  reader.close();
+SetOnce collector = new SetOnce<>();
+try (IndexReader reader = DirectoryReader.open(dBase.getDb())) {
+  IndexSearcher searcher = new IndexSearcher(reader);

Review Comment:
   no executor is set to the searcher, hence it does not seem worthwhile to 
write a collector manager that handles concurrency. I am even wondering if 
enforcing  single thread through the setonce is worthwhile



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9625) Benchmark KNN search with ann-benchmarks

2022-04-05 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517471#comment-17517471
 ] 

Michael Sokolov commented on LUCENE-9625:
-

> As you've mentioned that jars should be copied but not very clear that which 
> all Lucene jars we've to copy?

I think lucene-core should be enough. What have you tried?

> Benchmark KNN search with ann-benchmarks
> 
>
> Key: LUCENE-9625
> URL: https://issues.apache.org/jira/browse/LUCENE-9625
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In addition to benchmarking with luceneutil, it would be good to be able to 
> make use of ann-benchmarks, which is publishing results from many approximate 
> knn algorithms, including the hnsw implementation from its authors. We don't 
> expect to challenge the performance of these native code libraries, however 
> it would be good to know just how far off we are.
> I started looking into this and posted a fork of ann-benchmarks that uses 
> KnnGraphTester  class to run these: 
> https://github.com/msokolov/ann-benchmarks. It's still a WIP; you have to 
> manually copy jars and the KnnGraphTester.class to the test host machine 
> rather than downloading from a distribution. KnnGraphTester needs some 
> modifications in order to support this process - this issue is mostly about 
> that.
> One thing I noticed is that some of the index builds with higher fanout 
> (efConstruction) settings time out at 2h (on an AWS c5 instance), so this is 
> concerning and I'll open a separate issue for trying to improve that.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9625) Benchmark KNN search with ann-benchmarks

2022-04-05 Thread Balmukund Mandal (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517488#comment-17517488
 ] 

Balmukund Mandal commented on LUCENE-9625:
--

Thank you Michael for your response. Even, i tried the same but getting below 
error. Also, i copied jar and KnnGraphTester.class files into lib and classes 
folder which are created inside algorithms folder.

Error: Unable to initialize main class 
org.apache.lucene.util.hnsw.KnnGraphTester
Caused by: java.lang.NoClassDefFoundError: 
org/apache/lucene/util/hnsw/KnnGraphTester$1

> Benchmark KNN search with ann-benchmarks
> 
>
> Key: LUCENE-9625
> URL: https://issues.apache.org/jira/browse/LUCENE-9625
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In addition to benchmarking with luceneutil, it would be good to be able to 
> make use of ann-benchmarks, which is publishing results from many approximate 
> knn algorithms, including the hnsw implementation from its authors. We don't 
> expect to challenge the performance of these native code libraries, however 
> it would be good to know just how far off we are.
> I started looking into this and posted a fork of ann-benchmarks that uses 
> KnnGraphTester  class to run these: 
> https://github.com/msokolov/ann-benchmarks. It's still a WIP; you have to 
> manually copy jars and the KnnGraphTester.class to the test host machine 
> rather than downloading from a distribution. KnnGraphTester needs some 
> modifications in order to support this process - this issue is mostly about 
> that.
> One thing I noticed is that some of the index builds with higher fanout 
> (efConstruction) settings time out at 2h (on an AWS c5 instance), so this is 
> concerning and I'll open a separate issue for trying to improve that.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9625) Benchmark KNN search with ann-benchmarks

2022-04-05 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517495#comment-17517495
 ] 

Michael Sokolov commented on LUCENE-9625:
-

So, clearly it's not finding KnnGraphTester. Probably it needs to go in the 
proper folder structure (org/apache/lucene/util/KnnGraphTester.class) relative 
to the classpath root. TBH I don't remember the way files are set up in this 
ann-benchmarks environment, but I hope that helps you figure it out!

> Benchmark KNN search with ann-benchmarks
> 
>
> Key: LUCENE-9625
> URL: https://issues.apache.org/jira/browse/LUCENE-9625
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In addition to benchmarking with luceneutil, it would be good to be able to 
> make use of ann-benchmarks, which is publishing results from many approximate 
> knn algorithms, including the hnsw implementation from its authors. We don't 
> expect to challenge the performance of these native code libraries, however 
> it would be good to know just how far off we are.
> I started looking into this and posted a fork of ann-benchmarks that uses 
> KnnGraphTester  class to run these: 
> https://github.com/msokolov/ann-benchmarks. It's still a WIP; you have to 
> manually copy jars and the KnnGraphTester.class to the test host machine 
> rather than downloading from a distribution. KnnGraphTester needs some 
> modifications in order to support this process - this issue is mostly about 
> that.
> One thing I noticed is that some of the index builds with higher fanout 
> (efConstruction) settings time out at 2h (on an AWS c5 instance), so this is 
> concerning and I'll open a separate issue for trying to improve that.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] iverase commented on pull request #756: LUCENE-10470: [Tessellator] Prevent bridges that introduce collinear edges

2022-04-05 Thread GitBox


iverase commented on PR #756:
URL: https://github.com/apache/lucene/pull/756#issuecomment-1088873741

   Thanks @yinux! your dataset is very interesting because it seems your 
geometries have many parallel edges which are challenging the tessellator.
   
   In order to fix that I went back to the PR that changed the behaviour of the 
tessellator. In that PR we added the capability of labelling the edges of the 
triangles to know if they belong to the original polygon or they have been 
introduced by the tessellation process.  One side effect is that the method 
`filterPoints` which removes collinear points is more picky now because it can 
only remove collinear points when they have the same label (belonging to the 
original polygon or not).
   
   In this case the issue was a collinear edge where one of the edges was 
returning over the previous edge and had a point on top of that edge. We can 
actually remove that collinearity as well so we end up with only one edge. 
   
   I added that change in line 1306 and now all test are passing. I run this 
version over my dataset (over 400 million polygons) and all passed too. 
   
   Let me know if you find more issues.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller merged pull request #751: LUCENE-10467: Throws IllegalArgumentException for getAllDims and getTopChildren if topN <= 0

2022-04-05 Thread GitBox


gsmiller merged PR #751:
URL: https://github.com/apache/lucene/pull/751


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10467) Throws IllegalArgumentException for getAllDims and getTopChildren if topN <= 0

2022-04-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517542#comment-17517542
 ] 

ASF subversion and git services commented on LUCENE-10467:
--

Commit 6b82e600a8f8e2dcfd182bcaf58a3434e9a05b3f in lucene's branch 
refs/heads/main from Yuting Gan
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=6b82e600a8f ]

LUCENE-10467: Throws IllegalArgumentException for getAllDims and getTopChildren 
if topN <= 0 (#751)



> Throws IllegalArgumentException for getAllDims and getTopChildren if topN <= 0
> --
>
> Key: LUCENE-10467
> URL: https://issues.apache.org/jira/browse/LUCENE-10467
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Yuting Gan
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, there are different behaviors from subclass that implements  and 
> overrides getAllDims and getTopChildren when passing in an invalid TopN 
> parameter (topN <= 0). Some overridden implementations throw a 
> NullPointerException, some throw an IllegalArgumentException, and others 
> throw no exception.
> It would provide a better user experience by consistently throwing an 
> IllegalArgumentException when requesting topN <= 0 for these two 
> functionalities across all implementations.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10467) Throws IllegalArgumentException for getAllDims and getTopChildren if topN <= 0

2022-04-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517545#comment-17517545
 ] 

ASF subversion and git services commented on LUCENE-10467:
--

Commit a071180a806d1bb7ae11ae30a07e43e452bea810 in lucene's branch 
refs/heads/main from Greg Miller
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a071180a806 ]

Add CHANGES entry for LUCENE-10467


> Throws IllegalArgumentException for getAllDims and getTopChildren if topN <= 0
> --
>
> Key: LUCENE-10467
> URL: https://issues.apache.org/jira/browse/LUCENE-10467
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Yuting Gan
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, there are different behaviors from subclass that implements  and 
> overrides getAllDims and getTopChildren when passing in an invalid TopN 
> parameter (topN <= 0). Some overridden implementations throw a 
> NullPointerException, some throw an IllegalArgumentException, and others 
> throw no exception.
> It would provide a better user experience by consistently throwing an 
> IllegalArgumentException when requesting topN <= 0 for these two 
> functionalities across all implementations.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10467) Throws IllegalArgumentException for getAllDims and getTopChildren if topN <= 0

2022-04-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517557#comment-17517557
 ] 

ASF subversion and git services commented on LUCENE-10467:
--

Commit d1bd59f6bc6effe04f802e0d9a8f88fc7920f2cd in lucene's branch 
refs/heads/branch_9x from Yuting Gan
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d1bd59f6bc6 ]

LUCENE-10467: Throws IllegalArgumentException for getAllDims and getTopChildren 
if topN <= 0


> Throws IllegalArgumentException for getAllDims and getTopChildren if topN <= 0
> --
>
> Key: LUCENE-10467
> URL: https://issues.apache.org/jira/browse/LUCENE-10467
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Yuting Gan
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, there are different behaviors from subclass that implements  and 
> overrides getAllDims and getTopChildren when passing in an invalid TopN 
> parameter (topN <= 0). Some overridden implementations throw a 
> NullPointerException, some throw an IllegalArgumentException, and others 
> throw no exception.
> It would provide a better user experience by consistently throwing an 
> IllegalArgumentException when requesting topN <= 0 for these two 
> functionalities across all implementations.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-10467) Throws IllegalArgumentException for getAllDims and getTopChildren if topN <= 0

2022-04-05 Thread Greg Miller (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Miller resolved LUCENE-10467.
--
Fix Version/s: 9.2
   Resolution: Fixed

Merged and backported. Thanks [~yutinggan]!

> Throws IllegalArgumentException for getAllDims and getTopChildren if topN <= 0
> --
>
> Key: LUCENE-10467
> URL: https://issues.apache.org/jira/browse/LUCENE-10467
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Yuting Gan
>Priority: Minor
> Fix For: 9.2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, there are different behaviors from subclass that implements  and 
> overrides getAllDims and getTopChildren when passing in an invalid TopN 
> parameter (topN <= 0). Some overridden implementations throw a 
> NullPointerException, some throw an IllegalArgumentException, and others 
> throw no exception.
> It would provide a better user experience by consistently throwing an 
> IllegalArgumentException when requesting topN <= 0 for these two 
> functionalities across all implementations.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller commented on a diff in pull request #778: LUCENE-10495: Fix bug in TaxonomyFacets

2022-04-05 Thread GitBox


gsmiller commented on code in PR #778:
URL: https://github.com/apache/lucene/pull/778#discussion_r843051837


##
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/TaxonomyFacets.java:
##
@@ -109,7 +109,7 @@ public boolean childrenLoaded() {
* @lucene.experimental
*/
   public boolean siblingsLoaded() {
-return children != null;
+return siblings != null;

Review Comment:
   Oh, that does look like a silly bug. Can you create a unit test that 
actually demonstrates the bug and then validates the fix? This looks super 
obvious but it's a nice opportunity to make our testing a little better to 
avoid a future regression.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jtibshirani merged pull request #784: LUCENE-10085: Fix flaky test by always deleting docs

2022-04-05 Thread GitBox


jtibshirani merged PR #784:
URL: https://github.com/apache/lucene/pull/784


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10085) Implement Weight#count on DocValuesFieldExistsQuery

2022-04-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517595#comment-17517595
 ] 

ASF subversion and git services commented on LUCENE-10085:
--

Commit 6062ba0b3b80085f07fc66b32a3f32dc32a78acd in lucene's branch 
refs/heads/main from Quentin Pradet
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=6062ba0b3b8 ]

LUCENE-10085: Fix rare failure in TestDocValuesFieldExistsQuery (#784)

In rare cases, this test could delete all documents and cause a failure.

> Implement Weight#count on DocValuesFieldExistsQuery
> ---
>
> Key: LUCENE-10085
> URL: https://issues.apache.org/jira/browse/LUCENE-10085
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 9.1
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Now that we require all documents to use the same features (LUCENE-9334) we 
> could implement {{Weight#count}} to return docCount if either terms or points 
> are indexed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10085) Implement Weight#count on DocValuesFieldExistsQuery

2022-04-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517607#comment-17517607
 ] 

ASF subversion and git services commented on LUCENE-10085:
--

Commit be1c4850441f7cff07f5ce20c9ed4ad203cabb21 in lucene's branch 
refs/heads/branch_9x from Quentin Pradet
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=be1c4850441 ]

LUCENE-10085: Fix rare failure in TestDocValuesFieldExistsQuery (#784)

In rare cases, this test could delete all documents and cause a failure.

> Implement Weight#count on DocValuesFieldExistsQuery
> ---
>
> Key: LUCENE-10085
> URL: https://issues.apache.org/jira/browse/LUCENE-10085
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 9.1
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Now that we require all documents to use the same features (LUCENE-9334) we 
> could implement {{Weight#count}} to return docCount if either terms or points 
> are indexed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-10500) StringValueFacetCounts relies on sequential collection

2022-04-05 Thread Luca Cavanna (Jira)
Luca Cavanna created LUCENE-10500:
-

 Summary: StringValueFacetCounts relies on sequential collection
 Key: LUCENE-10500
 URL: https://issues.apache.org/jira/browse/LUCENE-10500
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Luca Cavanna


We recently moved some of the facets tests to use IndexSearcher#search(Query, 
CollectorManager) providing a FacetsCollectorManager instead of a 
FacetsCollector. Whenever newIndexSearcher(IndexReader) is used in tests, 
concurrent search may now be exercised while it was not before.

This caused some build failures on TestStringValueFacetCounts:

{code:java}
java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
at 
__randomizedtesting.SeedInfo.seed([ED8BF8281FCE5C02:9FC7DD27AEAEEA71]:0)
at 
org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.util.packed.Packed64.get(Packed64.java:81)
at 
org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.OrdinalMap$2.get(OrdinalMap.java:346)
at 
org.apache.lucene.facet.StringValueFacetCounts.countOneSegment(StringValueFacetCounts.java:440)
at 
org.apache.lucene.facet.StringValueFacetCounts.count(StringValueFacetCounts.java:295)
at 
org.apache.lucene.facet.StringValueFacetCounts.(StringValueFacetCounts.java:123)
at 
org.apache.lucene.facet.TestStringValueFacetCounts.checkFacetResult(TestStringValueFacetCounts.java:349)
at 
org.apache.lucene.facet.TestStringValueFacetCounts.testRandom(TestStringValueFacetCounts.java:325)
{code}

This looks like a real bug, as StringValueFacetCounts#countOneSegment is called 
once providing the index of the current loop instead of the ordinal taken from 
the matching hits that we are analyzing. That works fine with single threaded 
collection as we will go sequentially and the two indices will always be the 
same. With multi-threaded search, the order of the returned matching hits (one 
per segment) is not deterministic.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] javanna opened a new pull request, #788: LUCENE-10500: StringValueFacetCounts to not rely on sequential collection

2022-04-05 Thread GitBox


javanna opened a new pull request, #788:
URL: https://github.com/apache/lucene/pull/788

   StringValueFacetCounts should use the segment ordinal instead of the current 
index when looping through the matching hits, as when search is multi-threaded 
the order of the matching hits (one per segment) is not deterministic.
   
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my 
code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Lucene maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz merged pull request #788: LUCENE-10500: StringValueFacetCounts to not rely on sequential collection

2022-04-05 Thread GitBox


jpountz merged PR #788:
URL: https://github.com/apache/lucene/pull/788


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10500) StringValueFacetCounts relies on sequential collection

2022-04-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517693#comment-17517693
 ] 

ASF subversion and git services commented on LUCENE-10500:
--

Commit 796a19b457691249b0f3c2fc837f82a378d1d2c6 in lucene's branch 
refs/heads/main from Luca Cavanna
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=796a19b4576 ]

LUCENE-10500: StringValueFacetCounts to not rely on sequential collection (#788)

StringValueFacetCounts should use the segment ordinal instead of the current 
index when looping through the matching hits, as when search is multi-threaded 
the order of the matching hits (one per segment) is not deterministic.

> StringValueFacetCounts relies on sequential collection
> --
>
> Key: LUCENE-10500
> URL: https://issues.apache.org/jira/browse/LUCENE-10500
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Luca Cavanna
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We recently moved some of the facets tests to use IndexSearcher#search(Query, 
> CollectorManager) providing a FacetsCollectorManager instead of a 
> FacetsCollector. Whenever newIndexSearcher(IndexReader) is used in tests, 
> concurrent search may now be exercised while it was not before.
> This caused some build failures on TestStringValueFacetCounts:
> {code:java}
> java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
>   at 
> __randomizedtesting.SeedInfo.seed([ED8BF8281FCE5C02:9FC7DD27AEAEEA71]:0)
>   at 
> org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.util.packed.Packed64.get(Packed64.java:81)
>   at 
> org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.OrdinalMap$2.get(OrdinalMap.java:346)
>   at 
> org.apache.lucene.facet.StringValueFacetCounts.countOneSegment(StringValueFacetCounts.java:440)
>   at 
> org.apache.lucene.facet.StringValueFacetCounts.count(StringValueFacetCounts.java:295)
>   at 
> org.apache.lucene.facet.StringValueFacetCounts.(StringValueFacetCounts.java:123)
>   at 
> org.apache.lucene.facet.TestStringValueFacetCounts.checkFacetResult(TestStringValueFacetCounts.java:349)
>   at 
> org.apache.lucene.facet.TestStringValueFacetCounts.testRandom(TestStringValueFacetCounts.java:325)
> {code}
> This looks like a real bug, as StringValueFacetCounts#countOneSegment is 
> called once providing the index of the current loop instead of the ordinal 
> taken from the matching hits that we are analyzing. That works fine with 
> single threaded collection as we will go sequentially and the two indices 
> will always be the same. With multi-threaded search, the order of the 
> returned matching hits (one per segment) is not deterministic.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10500) StringValueFacetCounts relies on sequential collection

2022-04-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517694#comment-17517694
 ] 

ASF subversion and git services commented on LUCENE-10500:
--

Commit d2b3141e361636a36646b3a45fc322abecd2e957 in lucene's branch 
refs/heads/branch_9x from Luca Cavanna
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d2b3141e361 ]

LUCENE-10500: StringValueFacetCounts to not rely on sequential collection (#788)

StringValueFacetCounts should use the segment ordinal instead of the current 
index when looping through the matching hits, as when search is multi-threaded 
the order of the matching hits (one per segment) is not deterministic.

> StringValueFacetCounts relies on sequential collection
> --
>
> Key: LUCENE-10500
> URL: https://issues.apache.org/jira/browse/LUCENE-10500
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Luca Cavanna
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We recently moved some of the facets tests to use IndexSearcher#search(Query, 
> CollectorManager) providing a FacetsCollectorManager instead of a 
> FacetsCollector. Whenever newIndexSearcher(IndexReader) is used in tests, 
> concurrent search may now be exercised while it was not before.
> This caused some build failures on TestStringValueFacetCounts:
> {code:java}
> java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
>   at 
> __randomizedtesting.SeedInfo.seed([ED8BF8281FCE5C02:9FC7DD27AEAEEA71]:0)
>   at 
> org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.util.packed.Packed64.get(Packed64.java:81)
>   at 
> org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.OrdinalMap$2.get(OrdinalMap.java:346)
>   at 
> org.apache.lucene.facet.StringValueFacetCounts.countOneSegment(StringValueFacetCounts.java:440)
>   at 
> org.apache.lucene.facet.StringValueFacetCounts.count(StringValueFacetCounts.java:295)
>   at 
> org.apache.lucene.facet.StringValueFacetCounts.(StringValueFacetCounts.java:123)
>   at 
> org.apache.lucene.facet.TestStringValueFacetCounts.checkFacetResult(TestStringValueFacetCounts.java:349)
>   at 
> org.apache.lucene.facet.TestStringValueFacetCounts.testRandom(TestStringValueFacetCounts.java:325)
> {code}
> This looks like a real bug, as StringValueFacetCounts#countOneSegment is 
> called once providing the index of the current loop instead of the ordinal 
> taken from the matching hits that we are analyzing. That works fine with 
> single threaded collection as we will go sequentially and the two indices 
> will always be the same. With multi-threaded search, the order of the 
> returned matching hits (one per segment) is not deterministic.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jtibshirani opened a new pull request, #789: Add release wizard step around build failures

2022-04-05 Thread GitBox


jtibshirani opened a new pull request, #789:
URL: https://github.com/apache/lucene/pull/789

   This PR proposes adding a preparation step to look at
   bui...@lucene.apache.org and address recurring failures. This helps make 
sure we
   catch and fix known bugs before spinning the release candidate. It also 
prevents
   flaky tests from failing during the release vote (which adds confusion).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jtibshirani commented on pull request #789: Add release wizard step around build failures

2022-04-05 Thread GitBox


jtibshirani commented on PR #789:
URL: https://github.com/apache/lucene/pull/789#issuecomment-1089323655

   This was inspired by my experience releasing Lucene 9.1, where addressing 
build failures early would've helped save us time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] zhaih commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide

2022-04-05 Thread GitBox


zhaih commented on code in PR #762:
URL: https://github.com/apache/lucene/pull/762#discussion_r843265236


##
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java:
##
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import java.io.IOException;
+import java.nio.file.Path;
+import org.apache.lucene.facet.FacetTestCase;
+import org.apache.lucene.facet.FacetsCollector;
+import org.apache.lucene.facet.FacetsConfig;
+import org.apache.lucene.facet.taxonomy.FacetLabel;
+import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.util.IOUtils;
+
+public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase {
+
+  /**
+   * Tests the behavior of the {@link AlwaysRefreshDirectoryTaxonomyReader} by 
testing if the
+   * associated {@link SearcherTaxonomyManager} can successfully refresh and 
serve queries if the
+   * underlying taxonomy index is changed to an older checkpoint. Ideally, 
each checkpoint should be
+   * self-sufficient and should allow serving search queries when {@link
+   * SearcherTaxonomyManager#maybeRefresh()} is called.
+   *
+   * It does not check whether the private taxoArrays were actually 
recreated or no. We are
+   * (correctly) hiding away that complexity away from the user.
+   */
+  public void testAlwaysRefreshDirectoryTaxonomyReader() throws IOException {

Review Comment:
   Thank you for adding the test case, it looks awesome!



##
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/AlwaysRefreshDirectoryTaxonomyReader.java:
##
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import java.io.IOException;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.IOUtils;
+
+/**
+ * A modified DirectoryTaxonomyReader that always recreates a new {@link
+ * AlwaysRefreshDirectoryTaxonomyReader} instance when {@link
+ * AlwaysRefreshDirectoryTaxonomyReader#doOpenIfChanged()} is called. This 
enables us to easily go
+ * forward or backward in time by re-computing the ordinal space during each 
refresh.

Review Comment:
   Maybe add a note/warn here saying this will result in an always 
`O(#facet_label)` taxonomy array construction time when refresh?



##
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java:
##
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR 

[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d

2022-04-05 Thread GitBox


gautamworah96 commented on code in PR #762:
URL: https://github.com/apache/lucene/pull/762#discussion_r843271841


##
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java:
##
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import java.io.IOException;
+import org.apache.lucene.facet.FacetTestCase;
+import org.apache.lucene.facet.taxonomy.FacetLabel;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.IOUtils;
+import org.junit.Test;
+
+public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase {
+
+  @Test
+  /**
+   * Tests the expert constructors in {@link DirectoryTaxonomyReader} and 
checks the {@link
+   * DirectoryTaxonomyReader#getInternalIndexReader()} API. Also demonstrates 
the need for the
+   * constructor and the API.
+   *
+   * It does not check whether the private taxoArrays were actually 
recreated or no. We are
+   * (correctly) hiding away that complexity away from the user.
+   */
+  public void testAlwaysRefreshDirectoryTaxonomyReader() throws IOException {
+Directory dir = newDirectory();
+DirectoryTaxonomyWriter tw = new DirectoryTaxonomyWriter(dir);
+tw.addCategory(new FacetLabel("a"));
+tw.commit();
+
+DirectoryTaxonomyReader dtr1 = new 
AlwaysRefreshDirectoryTaxonomyReader(dir);
+tw.addCategory(new FacetLabel("b"));
+tw.commit();
+DirectoryTaxonomyReader dtr2 = dtr1.doOpenIfChanged();

Review Comment:
   > his is not really testing the unique capability of AlwaysRefreshDTR
   
   Yes and that was the cause of my hesitation in the first comment (I had also 
mentioned this in the first commit javadoc). The next commit adds the correct 
test case where DTR fails to pass the use case.
   
   > maybe by creating a method returning taxoArray == null?
   
   That was another way but I did not want to add another API to just be able 
to demonstrate a rare/expert use case. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d

2022-04-05 Thread GitBox


gautamworah96 commented on code in PR #762:
URL: https://github.com/apache/lucene/pull/762#discussion_r843272080


##
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java:
##
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import java.io.IOException;
+import org.apache.lucene.facet.FacetTestCase;
+import org.apache.lucene.facet.taxonomy.FacetLabel;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.IOUtils;
+import org.junit.Test;
+
+public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase {

Review Comment:
   The next commit does that exactly
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d

2022-04-05 Thread GitBox


gautamworah96 commented on code in PR #762:
URL: https://github.com/apache/lucene/pull/762#discussion_r843272398


##
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyReader.java:
##
@@ -78,10 +78,20 @@
   private volatile TaxonomyIndexArrays taxoArrays;
 
   /**
-   * Called only from {@link #doOpenIfChanged()}. If the taxonomy has been 
recreated, you should
-   * pass {@code null} as the caches and parent/children arrays.
+   * Expert: Use this method to explicitly force the {@link 
DirectoryTaxonomyReader} to use specific
+   * parent/children arrays and caches.
+   *
+   * Called from {@link #doOpenIfChanged()}. If the taxonomy has been 
recreated, you should pass
+   * {@code null} as the caches and parent/children arrays.
+   *
+   * @param indexReader An indexReader that is opened in the desired Directory
+   * @param taxoWriter The {@link DirectoryTaxonomyWriter} from which to 
obtain newly added
+   * categories, in real-time.
+   * @param ordinalCache a FacetLabel to Integer ordinal mapping if it already 
exists
+   * @param categoryCache an ordinal to FacetLabel mapping if it already exists
+   * @param taxoArrays taxonomy arrays that store the parent, siblings, 
children information
*/
-  DirectoryTaxonomyReader(
+  public DirectoryTaxonomyReader(

Review Comment:
   Changed it to protected



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d

2022-04-05 Thread GitBox


gautamworah96 commented on code in PR #762:
URL: https://github.com/apache/lucene/pull/762#discussion_r843275500


##
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java:
##
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import java.io.IOException;
+import java.nio.file.Path;
+import org.apache.lucene.facet.FacetTestCase;
+import org.apache.lucene.facet.FacetsCollector;
+import org.apache.lucene.facet.FacetsConfig;
+import org.apache.lucene.facet.taxonomy.FacetLabel;
+import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.util.IOUtils;
+
+public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase {
+
+  /**
+   * Tests the behavior of the {@link AlwaysRefreshDirectoryTaxonomyReader} by 
testing if the
+   * associated {@link SearcherTaxonomyManager} can successfully refresh and 
serve queries if the
+   * underlying taxonomy index is changed to an older checkpoint. Ideally, 
each checkpoint should be
+   * self-sufficient and should allow serving search queries when {@link
+   * SearcherTaxonomyManager#maybeRefresh()} is called.
+   *
+   * It does not check whether the private taxoArrays were actually 
recreated or no. We are
+   * (correctly) hiding away that complexity away from the user.
+   */
+  public void testAlwaysRefreshDirectoryTaxonomyReader() throws IOException {
+final Path taxoPath1 = createTempDir("dir1");
+final Directory dir1 = newFSDirectory(taxoPath1);
+final DirectoryTaxonomyWriter tw1 =
+new DirectoryTaxonomyWriter(dir1, IndexWriterConfig.OpenMode.CREATE);
+tw1.addCategory(new FacetLabel("a"));
+tw1.commit(); // commit1
+
+final Path taxoPath2 = createTempDir("commit1");
+final Directory commit1 = newFSDirectory(taxoPath2);
+// copy all index files from dir1
+for (String file : dir1.listAll()) {
+  commit1.copyFrom(dir1, file, file, IOContext.READ);
+}
+
+tw1.addCategory(new FacetLabel("b"));
+tw1.commit(); // commit2
+tw1.close();
+
+final DirectoryReader dr1 = DirectoryReader.open(dir1);
+// using a DirectoryTaxonomyReader here will cause the test to fail and 
throw a AIOOB exception

Review Comment:
   Is there a way in Java to send a class, exception type as a param to this 
test case and then check the error thrown? So for DTR we could "actually" 
verify that an AIOOB is thrown instead of me mentioning it here. 
   
   
   Something like
   
```
   // ClassParam is sent as a param to the test case, once as DTR and next time 
as AlwaysRefreshDTR
   final DirectoryTaxonomyReader dtr1 = new ClassParam(dir1);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d

2022-04-05 Thread GitBox


gautamworah96 commented on code in PR #762:
URL: https://github.com/apache/lucene/pull/762#discussion_r843275500


##
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java:
##
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import java.io.IOException;
+import java.nio.file.Path;
+import org.apache.lucene.facet.FacetTestCase;
+import org.apache.lucene.facet.FacetsCollector;
+import org.apache.lucene.facet.FacetsConfig;
+import org.apache.lucene.facet.taxonomy.FacetLabel;
+import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.util.IOUtils;
+
+public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase {
+
+  /**
+   * Tests the behavior of the {@link AlwaysRefreshDirectoryTaxonomyReader} by 
testing if the
+   * associated {@link SearcherTaxonomyManager} can successfully refresh and 
serve queries if the
+   * underlying taxonomy index is changed to an older checkpoint. Ideally, 
each checkpoint should be
+   * self-sufficient and should allow serving search queries when {@link
+   * SearcherTaxonomyManager#maybeRefresh()} is called.
+   *
+   * It does not check whether the private taxoArrays were actually 
recreated or no. We are
+   * (correctly) hiding away that complexity away from the user.
+   */
+  public void testAlwaysRefreshDirectoryTaxonomyReader() throws IOException {
+final Path taxoPath1 = createTempDir("dir1");
+final Directory dir1 = newFSDirectory(taxoPath1);
+final DirectoryTaxonomyWriter tw1 =
+new DirectoryTaxonomyWriter(dir1, IndexWriterConfig.OpenMode.CREATE);
+tw1.addCategory(new FacetLabel("a"));
+tw1.commit(); // commit1
+
+final Path taxoPath2 = createTempDir("commit1");
+final Directory commit1 = newFSDirectory(taxoPath2);
+// copy all index files from dir1
+for (String file : dir1.listAll()) {
+  commit1.copyFrom(dir1, file, file, IOContext.READ);
+}
+
+tw1.addCategory(new FacetLabel("b"));
+tw1.commit(); // commit2
+tw1.close();
+
+final DirectoryReader dr1 = DirectoryReader.open(dir1);
+// using a DirectoryTaxonomyReader here will cause the test to fail and 
throw a AIOOB exception

Review Comment:
   Is there a way in Java to send a class, exception type as a param to this 
test case and then check the error thrown? So for DTR we could "actually" 
verify that an AIOOB is thrown instead of me mentioning it here. 
   
   
   Something like
   
```
   // DTRClass is sent as a param to the test case, once as DTR and next time 
as AlwaysRefreshDTR
   final DirectoryTaxonomyReader dtr1 = new DTRClass(dir1);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d

2022-04-05 Thread GitBox


gautamworah96 commented on code in PR #762:
URL: https://github.com/apache/lucene/pull/762#discussion_r843272398


##
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyReader.java:
##
@@ -78,10 +78,20 @@
   private volatile TaxonomyIndexArrays taxoArrays;
 
   /**
-   * Called only from {@link #doOpenIfChanged()}. If the taxonomy has been 
recreated, you should
-   * pass {@code null} as the caches and parent/children arrays.
+   * Expert: Use this method to explicitly force the {@link 
DirectoryTaxonomyReader} to use specific
+   * parent/children arrays and caches.
+   *
+   * Called from {@link #doOpenIfChanged()}. If the taxonomy has been 
recreated, you should pass
+   * {@code null} as the caches and parent/children arrays.
+   *
+   * @param indexReader An indexReader that is opened in the desired Directory
+   * @param taxoWriter The {@link DirectoryTaxonomyWriter} from which to 
obtain newly added
+   * categories, in real-time.
+   * @param ordinalCache a FacetLabel to Integer ordinal mapping if it already 
exists
+   * @param categoryCache an ordinal to FacetLabel mapping if it already exists
+   * @param taxoArrays taxonomy arrays that store the parent, siblings, 
children information
*/
-  DirectoryTaxonomyReader(
+  public DirectoryTaxonomyReader(

Review Comment:
   Changed it to protected to enable subclasses to be able to customize their 
usage. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-10501) StackOverflow when RegExp encounters a very large string

2022-04-05 Thread Kartik Ganesh (Jira)
Kartik Ganesh created LUCENE-10501:
--

 Summary: StackOverflow when RegExp encounters a very large string
 Key: LUCENE-10501
 URL: https://issues.apache.org/jira/browse/LUCENE-10501
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/queryparser
Affects Versions: 9.1
Reporter: Kartik Ganesh


When RegExp encounters a very large string, it hits a Stack Overflow exception 
when parsing it.

Simple program to repro:

{{$ ls}}
{{RegExpTest.java       lucene-core-9.1.0.jar}}
{{$ cat RegExpTest.java}}
{{class RegExpTest {}}
{{    public static void main(String[] args) {}}
{{        StringBuilder strBuilder = new StringBuilder();}}
{{        for (int i = 0; i < 5; i++) {}}
{{            strBuilder.append("a");}}
{{        }}}
{{        try {}}
{{            new 
org.apache.lucene.util.automaton.RegExp(strBuilder.toString());}}
{{        } catch (StackOverflowError e) {}}
{{            System.out.println("Stack overflow");}}
{{            System.exit(-1);}}
{{        }}}
{{        System.out.println("Success");}}
{{    }}}
{{}}}
{{$ javac -cp './lucene-core-9.1.0.jar:.' RegExpTest.java}}
{{$ java -cp './lucene-core-9.1.0.jar:.' RegExpTest}}
{{Stack overflow}}
{{$ java -Xss1G -cp './lucene-core-9.1.0.jar:.' RegExpTest}}
{{Success}}

Based on https://issues.apache.org/jira/browse/LUCENE-6156 , this appears to be 
due to the recursive parsing implementation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gsmiller commented on pull request #788: LUCENE-10500: StringValueFacetCounts to not rely on sequential collection

2022-04-05 Thread GitBox


gsmiller commented on PR #788:
URL: https://github.com/apache/lucene/pull/788#issuecomment-1089423815

   Oh no. Good find and thanks for fixing! (and apologies for slipping that bug 
in)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10501) StackOverflow when RegExp encounters a very large string

2022-04-05 Thread Kartik Ganesh (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kartik Ganesh updated LUCENE-10501:
---
Description: 
When RegExp encounters a very large string, it hits a Stack Overflow exception 
when parsing it.

Simple program to repro:
{noformat}
$ ls
RegExpTest.java   lucene-core-9.1.0.jar
$ cat RegExpTest.java
class RegExpTest {
public static void main(String[] args) {
StringBuilder strBuilder = new StringBuilder();
for (int i = 0; i < 5; i++) {
strBuilder.append("a");
}
try {
new org.apache.lucene.util.automaton.RegExp(strBuilder.toString());
} catch (StackOverflowError e) {
System.out.println("Stack overflow");
System.exit(-1);
}
System.out.println("Success");
}
}
$ javac -cp './lucene-core-9.1.0.jar:.' RegExpTest.java
$ java -cp './lucene-core-9.1.0.jar:.' RegExpTest
Stack overflow
$ java -Xss1G -cp './lucene-core-9.1.0.jar:.' RegExpTest
Success{noformat}
Based on https://issues.apache.org/jira/browse/LUCENE-6156 , this appears to be 
due to the recursive parsing implementation.

  was:
When RegExp encounters a very large string, it hits a Stack Overflow exception 
when parsing it.

Simple program to repro:

{{$ ls}}
{{RegExpTest.java       lucene-core-9.1.0.jar}}
{{$ cat RegExpTest.java}}
{{class RegExpTest {}}
{{    public static void main(String[] args) {}}
{{        StringBuilder strBuilder = new StringBuilder();}}
{{        for (int i = 0; i < 5; i++) {}}
{{            strBuilder.append("a");}}
{{        }}}
{{        try {}}
{{            new 
org.apache.lucene.util.automaton.RegExp(strBuilder.toString());}}
{{        } catch (StackOverflowError e) {}}
{{            System.out.println("Stack overflow");}}
{{            System.exit(-1);}}
{{        }}}
{{        System.out.println("Success");}}
{{    }}}
{{}}}
{{$ javac -cp './lucene-core-9.1.0.jar:.' RegExpTest.java}}
{{$ java -cp './lucene-core-9.1.0.jar:.' RegExpTest}}
{{Stack overflow}}
{{$ java -Xss1G -cp './lucene-core-9.1.0.jar:.' RegExpTest}}
{{Success}}

Based on https://issues.apache.org/jira/browse/LUCENE-6156 , this appears to be 
due to the recursive parsing implementation.


> StackOverflow when RegExp encounters a very large string
> 
>
> Key: LUCENE-10501
> URL: https://issues.apache.org/jira/browse/LUCENE-10501
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/queryparser
>Affects Versions: 9.1
>Reporter: Kartik Ganesh
>Priority: Major
>
> When RegExp encounters a very large string, it hits a Stack Overflow 
> exception when parsing it.
> Simple program to repro:
> {noformat}
> $ ls
> RegExpTest.java   lucene-core-9.1.0.jar
> $ cat RegExpTest.java
> class RegExpTest {
> public static void main(String[] args) {
> StringBuilder strBuilder = new StringBuilder();
> for (int i = 0; i < 5; i++) {
> strBuilder.append("a");
> }
> try {
> new 
> org.apache.lucene.util.automaton.RegExp(strBuilder.toString());
> } catch (StackOverflowError e) {
> System.out.println("Stack overflow");
> System.exit(-1);
> }
> System.out.println("Success");
> }
> }
> $ javac -cp './lucene-core-9.1.0.jar:.' RegExpTest.java
> $ java -cp './lucene-core-9.1.0.jar:.' RegExpTest
> Stack overflow
> $ java -Xss1G -cp './lucene-core-9.1.0.jar:.' RegExpTest
> Success{noformat}
> Based on https://issues.apache.org/jira/browse/LUCENE-6156 , this appears to 
> be due to the recursive parsing implementation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d

2022-04-05 Thread GitBox


gautamworah96 commented on code in PR #762:
URL: https://github.com/apache/lucene/pull/762#discussion_r843312709


##
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java:
##
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import java.io.IOException;
+import java.nio.file.Path;
+import org.apache.lucene.facet.FacetTestCase;
+import org.apache.lucene.facet.FacetsCollector;
+import org.apache.lucene.facet.FacetsConfig;
+import org.apache.lucene.facet.taxonomy.FacetLabel;
+import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.util.IOUtils;
+
+public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase {
+
+  /**
+   * Tests the behavior of the {@link AlwaysRefreshDirectoryTaxonomyReader} by 
testing if the
+   * associated {@link SearcherTaxonomyManager} can successfully refresh and 
serve queries if the
+   * underlying taxonomy index is changed to an older checkpoint. Ideally, 
each checkpoint should be
+   * self-sufficient and should allow serving search queries when {@link
+   * SearcherTaxonomyManager#maybeRefresh()} is called.
+   *
+   * It does not check whether the private taxoArrays were actually 
recreated or no. We are
+   * (correctly) hiding away that complexity away from the user.
+   */
+  public void testAlwaysRefreshDirectoryTaxonomyReader() throws IOException {
+final Path taxoPath1 = createTempDir("dir1");
+final Directory dir1 = newFSDirectory(taxoPath1);
+final DirectoryTaxonomyWriter tw1 =
+new DirectoryTaxonomyWriter(dir1, IndexWriterConfig.OpenMode.CREATE);
+tw1.addCategory(new FacetLabel("a"));
+tw1.commit(); // commit1
+
+final Path taxoPath2 = createTempDir("commit1");
+final Directory commit1 = newFSDirectory(taxoPath2);
+// copy all index files from dir1
+for (String file : dir1.listAll()) {
+  commit1.copyFrom(dir1, file, file, IOContext.READ);
+}
+
+tw1.addCategory(new FacetLabel("b"));
+tw1.commit(); // commit2
+tw1.close();
+
+final DirectoryReader dr1 = DirectoryReader.open(dir1);
+// using a DirectoryTaxonomyReader here will cause the test to fail and 
throw a AIOOB exception

Review Comment:
   I am also okay with just plainly duplicating all the code and creating a new 
test case. It is important that we have a test case that tests if an AIOOB is 
thrown. What do other people think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10482) Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide

2022-04-05 Thread Gautam Worah (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Worah updated LUCENE-10482:
--
Status: Patch Available  (was: Open)

> Allow users to create their own DirectoryTaxonomyReaders with empty 
> taxoArrays instead of letting the taxoEpoch decide
> --
>
> Key: LUCENE-10482
> URL: https://issues.apache.org/jira/browse/LUCENE-10482
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 9.1
>Reporter: Gautam Worah
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> I was experimenting with the taxonomy index and {{DirectoryTaxonomyReaders}} 
> in my day job where we were trying to replace the index underneath a reader 
> asynchronously and then call the {{doOpenIfChanged}} call on it.
> It turns out that the taxonomy index uses its own index based counter (the 
> {{{}taxonomyIndexEpoch{}}}) to determine if the index was opened in write 
> mode after the last time it was written and if not, it directly tries to 
> reuse the previous {{taxoArrays}} it had created. This logic fails in a 
> scenario where both the old and new index were opened just once but the index 
> itself is completely different in both the cases.
> In such a case, it would be good to give the user the flexibility to inform 
> the DTR to recreate its {{{}taxoArrays{}}}, {{ordinalCache}} and 
> {{{}categoryCache{}}} (not refreshing these arrays causes it to fail in 
> various ways). Luckily, such a constructor already exists! But it is private 
> today! The idea here is to allow subclasses of DTR to use this constructor.
> Curious to see what other folks think about this idea. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] zhaih commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide

2022-04-05 Thread GitBox


zhaih commented on code in PR #762:
URL: https://github.com/apache/lucene/pull/762#discussion_r843318155


##
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java:
##
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import java.io.IOException;
+import java.nio.file.Path;
+import org.apache.lucene.facet.FacetTestCase;
+import org.apache.lucene.facet.FacetsCollector;
+import org.apache.lucene.facet.FacetsConfig;
+import org.apache.lucene.facet.taxonomy.FacetLabel;
+import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.util.IOUtils;
+
+public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase {
+
+  /**
+   * Tests the behavior of the {@link AlwaysRefreshDirectoryTaxonomyReader} by 
testing if the
+   * associated {@link SearcherTaxonomyManager} can successfully refresh and 
serve queries if the
+   * underlying taxonomy index is changed to an older checkpoint. Ideally, 
each checkpoint should be
+   * self-sufficient and should allow serving search queries when {@link
+   * SearcherTaxonomyManager#maybeRefresh()} is called.
+   *
+   * It does not check whether the private taxoArrays were actually 
recreated or no. We are
+   * (correctly) hiding away that complexity away from the user.
+   */
+  public void testAlwaysRefreshDirectoryTaxonomyReader() throws IOException {
+final Path taxoPath1 = createTempDir("dir1");
+final Directory dir1 = newFSDirectory(taxoPath1);
+final DirectoryTaxonomyWriter tw1 =
+new DirectoryTaxonomyWriter(dir1, IndexWriterConfig.OpenMode.CREATE);
+tw1.addCategory(new FacetLabel("a"));
+tw1.commit(); // commit1
+
+final Path taxoPath2 = createTempDir("commit1");
+final Directory commit1 = newFSDirectory(taxoPath2);
+// copy all index files from dir1
+for (String file : dir1.listAll()) {
+  commit1.copyFrom(dir1, file, file, IOContext.READ);
+}
+
+tw1.addCategory(new FacetLabel("b"));
+tw1.commit(); // commit2
+tw1.close();
+
+final DirectoryReader dr1 = DirectoryReader.open(dir1);
+// using a DirectoryTaxonomyReader here will cause the test to fail and 
throw a AIOOB exception

Review Comment:
   I believe you can use `expectThrows`?
   
https://lucene.apache.org/core/7_3_1/test-framework/org/apache/lucene/util/LuceneTestCase.html#expectThrows-java.lang.Class-org.apache.lucene.util.LuceneTestCase.ThrowingRunnable-



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-10292) AnalyzingInfixSuggester thread safety: lookup() fails during (re)build()

2022-04-05 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated LUCENE-10292:

Attachment: LUCENE-10292.patch
  Assignee: Chris M. Hostetter
Status: Open  (was: Open)

{quote}seems like it should be possible/better to change 
{{AnalyzingInfixSuggester.build()}} so that the {{searcherMgr}} is only 
repalced *after* we build the new index (and/or stop using a new 
{{IndexWriter}} on every {{AnalyzingInfixSuggester.build()}} call and just do a 
{{writer.deleteAll()}} instead?)
{quote}

Digging into the code a bit more i realized:
* Using {{writer.deleteAll()}} isn't really a viable "fix" because (by default) 
the writer doesn't exist (or is closed after {{build()}} 
* Another aspect of the existing impl is that if thread T1 enters the 
{{lookup()}} method and makes it past the {{if (searcherMgr == null)}} check, 
but then another thread T2 calls {{build()}} before the T1 reaches the 
synchronized block, then T1 will be blocked for the entire durration of the 
{{build()}} ... which is certainly less then ideal.

I worked up a patch with a test case to try and  demonstrate the problem, as 
well as a fix -- note that as written the test case won't always  "fail" 
cleaning with the existing impl, it can also deadlock because of how I'm using 
a Semaphore to "slow" down the {{build()}} -- triggering the blocked lookup 
thread situation described above.

The basic idea of the fix s to decouple the synchronization/locking needed for 
changing the writer from the synchronization/locking needed to change the 
SearchManager.  and replace the SearchManager only once the {{build()}} is 
complete. 

 I originally tried to replace the "R/W" locking of SearcherManager with an 
AtomicReference so we wouldn't need to have any synchronization blocks in 
{{lookup()}} at all; but I couldn't figure out a "safe" way to do that w/o ref 
counting the SearcherManager itself (to ensure that no build/add calls close 
the SearcherManager that {{lookup()}} gets a ref to before it has a change to 
{{acquire()}} on it (thank you {{testRandomNRT}} for catching that for me)

Feedback on this approach (or suggestions for better ways to solve this) welcome

> AnalyzingInfixSuggester thread safety: lookup() fails during (re)build()
> 
>
> Key: LUCENE-10292
> URL: https://issues.apache.org/jira/browse/LUCENE-10292
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: LUCENE-10292.patch
>
>
> I'm filing this based on anecdotal information from a Solr user w/o 
> experiencing it first hand (and I don't have a test case to demonstrate it) 
> but based on a reading of the code the underlying problem seems self 
> evident...
> With all other Lookup implementations I've examined, it is possible to call 
> {{lookup()}} regardless of whether another thread is concurrently calling 
> {{build()}} – in all cases I've seen, it is even possible to call 
> {{lookup()}} even if {{build()}} has never been called: the result is just an 
> "empty" {{List}} 
> Typically this is works because the {{build()}} method uses temporary 
> datastructures until it's "build logic" is complete, at which point it 
> atomically replaces the datastructures used by the {{lookup()}} method.   In 
> the case of {{AnalyzingInfixSuggester}} however, the {{build()}} method 
> starts by closing & null'ing out the {{protected SearcherManager 
> searcherMgr}} (which it only populates again once it's completed building up 
> it's index) and then the lookup method starts with...
> {code:java}
> if (searcherMgr == null) {
>   throw new IllegalStateException("suggester was not built");
> }
> {code}
> ... meaning it is unsafe to call {{AnalyzingInfixSuggester.lookup()}} in any 
> situation where another thread may be calling 
> {{AnalyzingInfixSuggester.build()}}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d

2022-04-05 Thread GitBox


gautamworah96 commented on code in PR #762:
URL: https://github.com/apache/lucene/pull/762#discussion_r843363466


##
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java:
##
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import java.io.IOException;
+import java.nio.file.Path;
+import org.apache.lucene.facet.FacetTestCase;
+import org.apache.lucene.facet.FacetsCollector;
+import org.apache.lucene.facet.FacetsConfig;
+import org.apache.lucene.facet.taxonomy.FacetLabel;
+import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.util.IOUtils;
+
+public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase {
+
+  /**
+   * Tests the behavior of the {@link AlwaysRefreshDirectoryTaxonomyReader} by 
testing if the
+   * associated {@link SearcherTaxonomyManager} can successfully refresh and 
serve queries if the
+   * underlying taxonomy index is changed to an older checkpoint. Ideally, 
each checkpoint should be
+   * self-sufficient and should allow serving search queries when {@link
+   * SearcherTaxonomyManager#maybeRefresh()} is called.
+   *
+   * It does not check whether the private taxoArrays were actually 
recreated or no. We are
+   * (correctly) hiding away that complexity away from the user.
+   */
+  public void testAlwaysRefreshDirectoryTaxonomyReader() throws IOException {
+final Path taxoPath1 = createTempDir("dir1");
+final Directory dir1 = newFSDirectory(taxoPath1);
+final DirectoryTaxonomyWriter tw1 =
+new DirectoryTaxonomyWriter(dir1, IndexWriterConfig.OpenMode.CREATE);
+tw1.addCategory(new FacetLabel("a"));
+tw1.commit(); // commit1
+
+final Path taxoPath2 = createTempDir("commit1");
+final Directory commit1 = newFSDirectory(taxoPath2);
+// copy all index files from dir1
+for (String file : dir1.listAll()) {
+  commit1.copyFrom(dir1, file, file, IOContext.READ);
+}
+
+tw1.addCategory(new FacetLabel("b"));
+tw1.commit(); // commit2
+tw1.close();
+
+final DirectoryReader dr1 = DirectoryReader.open(dir1);
+// using a DirectoryTaxonomyReader here will cause the test to fail and 
throw a AIOOB exception

Review Comment:
I meant something like
   
   ```
   void testCase(DTRClass, exceptionType) {
 // dir1 is created here
 final DirectoryTaxonomyReader dtr1 = new DTRClass(dir1);
 // again some logic here
 if (DTRClass instanceOf AlwaysRefreshDTR) {
   mgr.maybeRefresh();
 } else {
   expectThrows(AIOOB, mgr.maybeRefresh());
 }
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gautamworah96 commented on a diff in pull request #762: LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch d

2022-04-05 Thread GitBox


gautamworah96 commented on code in PR #762:
URL: https://github.com/apache/lucene/pull/762#discussion_r843363466


##
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestAlwaysRefreshDirectoryTaxonomyReader.java:
##
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy.directory;
+
+import java.io.IOException;
+import java.nio.file.Path;
+import org.apache.lucene.facet.FacetTestCase;
+import org.apache.lucene.facet.FacetsCollector;
+import org.apache.lucene.facet.FacetsConfig;
+import org.apache.lucene.facet.taxonomy.FacetLabel;
+import org.apache.lucene.facet.taxonomy.SearcherTaxonomyManager;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.util.IOUtils;
+
+public class TestAlwaysRefreshDirectoryTaxonomyReader extends FacetTestCase {
+
+  /**
+   * Tests the behavior of the {@link AlwaysRefreshDirectoryTaxonomyReader} by 
testing if the
+   * associated {@link SearcherTaxonomyManager} can successfully refresh and 
serve queries if the
+   * underlying taxonomy index is changed to an older checkpoint. Ideally, 
each checkpoint should be
+   * self-sufficient and should allow serving search queries when {@link
+   * SearcherTaxonomyManager#maybeRefresh()} is called.
+   *
+   * It does not check whether the private taxoArrays were actually 
recreated or no. We are
+   * (correctly) hiding away that complexity away from the user.
+   */
+  public void testAlwaysRefreshDirectoryTaxonomyReader() throws IOException {
+final Path taxoPath1 = createTempDir("dir1");
+final Directory dir1 = newFSDirectory(taxoPath1);
+final DirectoryTaxonomyWriter tw1 =
+new DirectoryTaxonomyWriter(dir1, IndexWriterConfig.OpenMode.CREATE);
+tw1.addCategory(new FacetLabel("a"));
+tw1.commit(); // commit1
+
+final Path taxoPath2 = createTempDir("commit1");
+final Directory commit1 = newFSDirectory(taxoPath2);
+// copy all index files from dir1
+for (String file : dir1.listAll()) {
+  commit1.copyFrom(dir1, file, file, IOContext.READ);
+}
+
+tw1.addCategory(new FacetLabel("b"));
+tw1.commit(); // commit2
+tw1.close();
+
+final DirectoryReader dr1 = DirectoryReader.open(dir1);
+// using a DirectoryTaxonomyReader here will cause the test to fail and 
throw a AIOOB exception

Review Comment:
   Something like
   
   ```
   void testCase(DTRClass, exceptionType) {
 // dir1 is created here
 final DirectoryTaxonomyReader dtr1 = new DTRClass(dir1);
 // again some logic here
 if (DTRClass instanceOf AlwaysRefreshDTR) {
   mgr.maybeRefresh();
 } else {
   expectThrows(AIOOB, mgr.maybeRefresh());
 }
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9625) Benchmark KNN search with ann-benchmarks

2022-04-05 Thread Balmukund Mandal (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517825#comment-17517825
 ] 

Balmukund Mandal commented on LUCENE-9625:
--

Thank you Michael, it works. I think its easy to include only two jars i.e. 
core and test  rather including KnnGraphTester.class which requires proper 
directories . Hence, little modification in luceneknn.py file and read all from 
lib/* only.

> Benchmark KNN search with ann-benchmarks
> 
>
> Key: LUCENE-9625
> URL: https://issues.apache.org/jira/browse/LUCENE-9625
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In addition to benchmarking with luceneutil, it would be good to be able to 
> make use of ann-benchmarks, which is publishing results from many approximate 
> knn algorithms, including the hnsw implementation from its authors. We don't 
> expect to challenge the performance of these native code libraries, however 
> it would be good to know just how far off we are.
> I started looking into this and posted a fork of ann-benchmarks that uses 
> KnnGraphTester  class to run these: 
> https://github.com/msokolov/ann-benchmarks. It's still a WIP; you have to 
> manually copy jars and the KnnGraphTester.class to the test host machine 
> rather than downloading from a distribution. KnnGraphTester needs some 
> modifications in order to support this process - this issue is mostly about 
> that.
> One thing I noticed is that some of the index builds with higher fanout 
> (efConstruction) settings time out at 2h (on an AWS c5 instance), so this is 
> concerning and I'll open a separate issue for trying to improve that.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] zacharymorn merged pull request #767: LUCENE-10436: Deprecate DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery with FieldExistsQuery

2022-04-05 Thread GitBox


zacharymorn merged PR #767:
URL: https://github.com/apache/lucene/pull/767


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-10436) Combine DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery into a single FieldExistsQuery?

2022-04-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517860#comment-17517860
 ] 

ASF subversion and git services commented on LUCENE-10436:
--

Commit 91e29405d8e4017e40e613323d6ef8f598167588 in lucene's branch 
refs/heads/main from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=91e29405d8e ]

LUCENE-10436: Deprecate DocValuesFieldExistsQuery, NormsFieldExistsQuery and 
KnnVectorFieldExistsQuery with FieldExistsQuery (#767)



> Combine DocValuesFieldExistsQuery, NormsFieldExistsQuery and 
> KnnVectorFieldExistsQuery into a single FieldExistsQuery?
> --
>
> Key: LUCENE-10436
> URL: https://issues.apache.org/jira/browse/LUCENE-10436
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Now that we require consistency across data structures, we could merge 
> DocValuesFieldExistsQuery, NormsFieldExistsQuery and 
> KnnVectorFieldExistsQuery together into a FieldExistsQuery that would require 
> that the field indexes either norms, doc values or vectors?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] zacharymorn opened a new pull request, #790: LUCENE-10436: Remove deprecated DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery

2022-04-05 Thread GitBox


zacharymorn opened a new pull request, #790:
URL: https://github.com/apache/lucene/pull/790

   # Description
   
   Remove deprecated DocValuesFieldExistsQuery, NormsFieldExistsQuery and 
KnnVectorFieldExistsQuery
   
   # Tests
   
   Passed existing tests.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my 
code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Lucene maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] zacharymorn commented on pull request #790: LUCENE-10436: Remove deprecated DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery

2022-04-05 Thread GitBox


zacharymorn commented on PR #790:
URL: https://github.com/apache/lucene/pull/790#issuecomment-1089864280

   For the change entry, I assume this should go into version `10.0.0`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on a diff in pull request #790: LUCENE-10436: Remove deprecated DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery

2022-04-05 Thread GitBox


jpountz commented on code in PR #790:
URL: https://github.com/apache/lucene/pull/790#discussion_r843530787


##
lucene/core/src/java/org/apache/lucene/search/UsageTrackingQueryCachingPolicy.java:
##
@@ -58,12 +58,6 @@ private static boolean shouldNeverCache(Query query) {
   return true;
 }
 
-if (query instanceof DocValuesFieldExistsQuery) {
-  // We do not bother caching DocValuesFieldExistsQuery queries since they 
are already plenty
-  // fast.
-  return true;
-}

Review Comment:
   We should have a similar instanceof check for `FieldExistsQuery`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on pull request #790: LUCENE-10436: Remove deprecated DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery

2022-04-05 Thread GitBox


jpountz commented on PR #790:
URL: https://github.com/apache/lucene/pull/790#issuecomment-1089886965

   > For the change entry, I assume this should go into version 10.0.0?
   
   Yes, we need a CHANGES entry under 10.0.0 and a new entry in 
`lucene/MIGRATE.txt` that recommends replacing `DocValueFieldExistsQuery` and 
others with `FieldExistsQuery`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org