[GitHub] [lucene] jpountz commented on pull request #12051: Fix wrong assertion in TestBooleanQuery.testQueryMatchesCount

2023-01-01 Thread GitBox


jpountz commented on PR #12051:
URL: https://github.com/apache/lucene/pull/12051#issuecomment-1368388218

   Thanks for catching this. Would it also work if we fixed indexing to 
sometimes index other values, e.g. replacing `if (random().nextBoolean()) {` 
with `if (i != 3 && random().nextBoolean()) {` and force-merged before opening 
a reader?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz opened a new pull request, #12053: Allow reusing indexed binary fields.

2023-01-01 Thread GitBox


jpountz opened a new pull request, #12053:
URL: https://github.com/apache/lucene/pull/12053

   Today Lucene allows creating indexed binary fields, e.g. via 
`StringField(String, BytesRef, Field.Store)`, but not reusing them: calling 
`setBytesValue` on a `StringField` throws.
   
   This commit removes the check that prevents reusing fields with binary 
values. I considered an alternative that consisted of failing if calling 
`setBytesValue` on a field that is indexed and tokenized, but we currently 
don't have such checks e.g. on numeric values, so it did not feel consistent.
   
   Doing this change would help improve the [nightly benchmarks for the NYC 
taxis 
dataset](http://people.apache.org/~mikemccand/lucenebench/sparseResults.html) 
by doing the String -> UTF-8 conversion only once for keywords, instead of once 
for the `StringField` and once for the `SortedDocValuesField`, while still 
reusing fields.
   





[GitHub] [lucene] jpountz opened a new pull request, #12054: Introduce a new `KeywordField`.

2023-01-01 Thread GitBox


jpountz opened a new pull request, #12054:
URL: https://github.com/apache/lucene/pull/12054

   `KeywordField` is a combination of `StringField` and 
`SortedSetDocValuesField`, similarly to how `LongField` is a combination of 
`LongPoint` and `SortedNumericDocValuesField`. This makes it easier for users 
to create fields that can be used for filtering, sorting and faceting.





[GitHub] [lucene] jpountz opened a new pull request, #12055: Better skipping for multi-term queries with a FILTER rewrite.

2023-01-01 Thread GitBox


jpountz opened a new pull request, #12055:
URL: https://github.com/apache/lucene/pull/12055

   Currently multi-term queries with a filter rewrite internally rewrite to a 
disjunction if 16 terms or fewer match the query. Otherwise postings lists of 
matching terms are collected into a `DocIdSetBuilder`. This change replaces the 
latter with a mixed approach where a disjunction is created between the 16 
terms that have the highest document frequency and an iterator produced from 
the `DocIdSetBuilder` that collects all other terms. On fields that have a 
zipfian distribution, it's quite likely that no high-frequency terms make it to 
the `DocIdSetBuilder`. This provides two main benefits:
- Queries are less likely to allocate a FixedBitSet of size `maxDoc`.
- Queries are better at skipping or early terminating. On the other hand, 
queries that need to consume most or all matching documents may get a slowdown.
   
   The slowdown is unfortunate, but my gut feeling is that this change still 
has more pros than cons.
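
A minimal sketch of the split described above, using only the JDK rather than Lucene's classes. It assumes each matching term is identified by its index into a hypothetical `docFreqs` array; the names `TermSplitSketch`, `Split` and `split` are invented for illustration, and `k` plays the role of the 16-term cutoff. The `others` list stands in for the terms that would be collected into the `DocIdSetBuilder`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TermSplitSketch {
  /** Result of the split; both lists hold term indices (hypothetical type). */
  public static final class Split {
    public final List<Integer> highFrequency; // candidates for the disjunction
    public final List<Integer> others;        // candidates for up-front collection

    Split(List<Integer> highFrequency, List<Integer> others) {
      this.highFrequency = highFrequency;
      this.others = others;
    }
  }

  /** Keeps the k terms with the highest document frequency, routes the rest to others. */
  public static Split split(int[] docFreqs, int k) {
    Comparator<Integer> byDocFreq = Comparator.comparingInt((Integer i) -> docFreqs[i]);
    // Min-heap: the head is always the weakest of the current top k.
    PriorityQueue<Integer> top = new PriorityQueue<>(byDocFreq);
    List<Integer> others = new ArrayList<>();
    for (int term = 0; term < docFreqs.length; term++) {
      top.add(term);
      if (top.size() > k) {
        others.add(top.poll()); // evict the lowest-frequency term seen so far
      }
    }
    List<Integer> high = new ArrayList<>(top);
    high.sort(byDocFreq.reversed()); // highest document frequency first
    return new Split(high, others);
  }
}
```

With `docFreqs = {5, 100, 1, 50, 7}` and `k = 2`, the terms at indices 1 and 3 end up in `highFrequency` and the other three in `others`. The bounded heap keeps selection at O(n log k), which matters when very many terms match.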
   





[GitHub] [lucene] jpountz commented on pull request #12055: Better skipping for multi-term queries with a FILTER rewrite.

2023-01-01 Thread GitBox


jpountz commented on PR #12055:
URL: https://github.com/apache/lucene/pull/12055#issuecomment-1368422269

   Here is what luceneutil gives on wikimedium10m:
   
   ```
                          Task  QPS baseline   StdDev  QPS my_modified_version   StdDev               Pct diff  p-value
          BrowseDateTaxoFacets         43.81   (3.9%)                    42.63  (12.5%)   -2.7% ( -18% -  14%)  0.359
                  OrHighNotLow        397.64   (9.3%)                   387.31   (7.0%)   -2.6% ( -17% -  15%)  0.320
     BrowseDayOfYearTaxoFacets         44.21   (4.4%)                    43.10  (12.8%)   -2.5% ( -18% -  15%)  0.406
                  OrHighNotMed        439.76   (8.8%)                   431.26   (6.6%)   -1.9% ( -15% -  14%)  0.432
                 OrHighNotHigh        349.59   (8.0%)                   342.97   (5.8%)   -1.9% ( -14% -  12%)  0.391
         BrowseMonthTaxoFacets         29.26   (8.8%)                    28.75  (12.3%)   -1.7% ( -21% -  21%)  0.609
                 OrNotHighHigh        359.69   (6.8%)                   353.47   (5.4%)   -1.7% ( -13% -  11%)  0.374
                       MedTerm        741.89   (6.7%)                   729.74   (6.8%)   -1.6% ( -14% -  12%)  0.442
                   AndHighHigh        104.97   (5.6%)                   103.30   (5.7%)   -1.6% ( -12% -  10%)  0.373
                      HighTerm        509.66   (7.1%)                   501.79   (7.3%)   -1.5% ( -14% -  13%)  0.498
                    OrHighHigh         46.45   (4.3%)                    45.79   (3.3%)   -1.4% (  -8% -   6%)  0.240
                       LowTerm        972.50   (7.7%)                   959.89   (6.7%)   -1.3% ( -14% -  14%)  0.570
             HighTermTitleSort        174.75   (6.6%)                   172.71   (5.5%)   -1.2% ( -12% -  11%)  0.544
                    AndHighLow       1288.70   (2.7%)                  1274.94   (3.1%)   -1.1% (  -6% -   4%)  0.247
                  OrNotHighMed        456.13   (4.5%)                   452.07   (3.9%)   -0.9% (  -8% -   7%)  0.504
             HighTermMonthSort       3799.69   (6.2%)                  3765.99   (4.8%)   -0.9% ( -11% -  10%)  0.613
         BrowseMonthSSDVFacets         21.87   (9.8%)                    21.67  (10.7%)   -0.9% ( -19% -  21%)  0.786
                    HighPhrase         93.80   (7.4%)                    92.97   (6.1%)   -0.9% ( -13% -  13%)  0.680
               LowSloppyPhrase         59.38   (3.5%)                    58.90   (4.2%)   -0.8% (  -8% -   7%)  0.513
                        Fuzzy2         49.64   (1.6%)                    49.26   (2.7%)   -0.8% (  -4% -   3%)  0.268
                        Fuzzy1        108.72   (1.5%)                   107.94   (1.6%)   -0.7% (  -3% -   2%)  0.148
                   LowSpanNear        157.46   (4.0%)                   156.35   (4.0%)   -0.7% (  -8% -   7%)  0.577
   BrowseRandomLabelSSDVFacets         14.99   (5.9%)                    14.88   (5.9%)   -0.7% ( -11% -  11%)  0.712
      AndHighHighDayTaxoFacets          6.15   (6.0%)                     6.11   (5.6%)   -0.6% ( -11% -  11%)  0.743
                    AndHighMed        206.86   (5.1%)                   205.71   (5.7%)   -0.6% ( -10% -  10%)  0.745
                     OrHighMed        178.33   (3.7%)                   177.55   (3.7%)   -0.4% (  -7% -   7%)  0.709
                   MedSpanNear         55.68   (3.1%)                    55.48   (3.3%)   -0.4% (  -6% -   6%)  0.713
                  HighSpanNear         14.27   (3.5%)                    14.23   (3.1%)   -0.3% (  -6% -   6%)  0.780
                       Respell        106.00   (1.8%)                   105.77   (1.6%)   -0.2% (  -3% -   3%)  0.695
          HighTermTitleBDVSort         18.32   (3.7%)                    18.30   (5.7%)   -0.1% (  -9% -   9%)  0.927
                      PKLookup        235.19   (3.3%)                   234.99   (3.8%)   -0.1% (  -6% -   7%)  0.939
               MedSloppyPhrase         19.51   (3.8%)                    19.49   (4.0%)   -0.1% (  -7% -   8%)  0.957
           MedIntervalsOrdered         72.80   (5.6%)                    72.76   (5.1%)   -0.1% ( -10% -  11%)  0.971
                     OrHighLow        711.52   (2.3%)                   712.77   (2.6%)    0.2% (  -4% -   5%)  0.823
                     LowPhrase         37.03   (5.3%)                    37.10   (4.7%)    0.2% (  -9% -  10%)  0.908
                     MedPhrase        147.57   (5.0%)                   147.86   (4.2%)    0.2% (  -8% -   9%)  0.893
   BrowseRandomLabelTaxoFacets         35.14  (10.5%)                    35.21  (12.7%)    0.2% ( -20% -  26%)  0.955
         HighTermDayOfYearSort        394.78   (5.3%)                   395.67   (2.7%)    0.2% (  -7% -   8%)  0.865
                    TermDTSort        129.27   (3.6%)                   129.82   (3.7%)    0.4% (  -6% -   7%)  0.711
       AndHighMedDayTaxoFacets        157.89   (2.1%)                   158.59   (1.8%)    0.4% (  -3% -   4%)  0.475
                  OrNotHighLow       1014.83   (4.5%)                  1020.03   (4.1%)    0.5% (  -7% -
   ```

[GitHub] [lucene] jpountz commented on pull request #12055: Better skipping for multi-term queries with a FILTER rewrite.

2023-01-01 Thread GitBox


jpountz commented on PR #12055:
URL: https://github.com/apache/lucene/pull/12055#issuecomment-1368423502

   For the record, the reason why we're seeing a speedup here is because prefix 
and wildcard queries produce constant scores, so the query can early terminate 
once 1,000 hits have been collected. Before the change, we would always create 
a bitset of all matches, and that would force evaluating the query against the 
entire doc ID space up-front. Evaluation is more lazy now, with only 
low-frequency postings being evaluated up-front and high-frequency postings 
being evaluated lazily.
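
The effect can be illustrated with a toy model that uses only the JDK. Postings lists are assumed to be sorted `int` arrays, and `eagerCost` and `lazyCost` are hypothetical helpers that count how many postings each strategy touches. The sketch deliberately leaves out the up-front evaluation of low-frequency postings that the change still performs.

```java
import java.util.BitSet;
import java.util.PriorityQueue;

class EarlyTerminationSketch {
  /** Eager: materialize every match into a bit set; returns the postings touched. */
  public static long eagerCost(int[][] postings, int maxDoc) {
    BitSet bits = new BitSet(maxDoc);
    long touched = 0;
    for (int[] list : postings) {
      for (int doc : list) {
        bits.set(doc);
        touched++;
      }
    }
    return touched;
  }

  /** Lazy: merge lists in doc ID order and stop after `limit` distinct hits. */
  public static long lazyCost(int[][] postings, int limit) {
    // Heap entries are {listIndex, position}, ordered by the doc ID they point at.
    PriorityQueue<int[]> heap =
        new PriorityQueue<>(
            (a, b) -> Integer.compare(postings[a[0]][a[1]], postings[b[0]][b[1]]));
    long touched = 0;
    for (int i = 0; i < postings.length; i++) {
      if (postings[i].length > 0) {
        heap.add(new int[] {i, 0});
        touched++;
      }
    }
    int hits = 0;
    int lastDoc = -1;
    while (!heap.isEmpty() && hits < limit) {
      int[] top = heap.poll();
      int doc = postings[top[0]][top[1]];
      if (doc != lastDoc) { // the same doc may appear in several lists
        hits++;
        lastDoc = doc;
      }
      if (top[1] + 1 < postings[top[0]].length) {
        heap.add(new int[] {top[0], top[1] + 1});
        touched++;
      }
    }
    return touched;
  }
}
```

For two interleaved lists of 1,000 postings each, the eager path touches all 2,000 postings, while the lazy path with a limit of 10 hits touches 12.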





[GitHub] [lucene] msokolov merged pull request #12047: fix typo analysis-kuromoji

2023-01-01 Thread GitBox


msokolov merged PR #12047:
URL: https://github.com/apache/lucene/pull/12047





[GitHub] [lucene] twosom commented on pull request #12047: fix typo analysis-kuromoji

2023-01-01 Thread GitBox


twosom commented on PR #12047:
URL: https://github.com/apache/lucene/pull/12047#issuecomment-1368474848

   @msokolov 
   Thanks~!
   and Happy New Year!👻





[GitHub] [lucene] msokolov commented on a diff in pull request #12029: introduce support in KnnVectorQuery for getters/setters

2023-01-01 Thread GitBox


msokolov commented on code in PR #12029:
URL: https://github.com/apache/lucene/pull/12029#discussion_r1059769467


##
lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java:
##
@@ -33,6 +33,7 @@
 import org.apache.lucene.store.Directory;
 import org.apache.lucene.util.TestVectorUtil;
 import org.apache.lucene.util.VectorUtil;
+import org.junit.Assert;

Review Comment:
   OK, tiny nit here - but LuceneTestCase inherits from Assert so we don't need 
to import and can just use the assertions directly without qualification. 






[GitHub] [lucene] msokolov commented on pull request #12048: Move HNSW parameters to the HnswGraphBuilder class

2023-01-01 Thread GitBox


msokolov commented on PR #12048:
URL: https://github.com/apache/lucene/pull/12048#issuecomment-1368477965

   Sorry, I don't see this being any better than the current situation; aside 
from tests, the parameters are only used in HnswVectorsFormat where they are 
currently defined, so I think we should leave them there.





[GitHub] [lucene] msokolov commented on issue #11354: Reuse HNSW graphs when merging segments? [LUCENE-10318]

2023-01-01 Thread GitBox


msokolov commented on issue #11354:
URL: https://github.com/apache/lucene/issues/11354#issuecomment-1368479497

   Hi Jack, thanks for persisting and returning to this. I haven't had a chance 
to review the PR yet; just looking at the results here, I have a few questions. 
First, it looks to me as if we see some very nice improvement for the larger 
graphs, preserve the same recall, and changes to QPS are probably noise. I 
guess the assumption is we are producing similar results with less work? Just 
so we can understand these results a little better, could you document how you 
arrived at them? What dataset did you use? How did you measure the times and 
recall (was it using KnnGraphTester? luceneutil? some other benchmarking 
tool?). I'd also be curious to see the numbers and sizes of the segments in the 
results: I assume they would be unchanged from Control to Test, but it would be 
nice to be able to verify. Thanks again!





[GitHub] [lucene] rmuir commented on pull request #12053: Allow reusing indexed binary fields.

2023-01-01 Thread GitBox


rmuir commented on PR #12053:
URL: https://github.com/apache/lucene/pull/12053#issuecomment-1368481511

   > I considered an alternative that consisted of failing if calling 
`setBytesValue` on a field that is indexed and tokenized
   
   Can we just do this instead?
   
   I think an important point here is that you shouldn't be calling 
`setBytesValue` if it is tokenized (TokenStream in use). You need Reader/String.
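
A minimal sketch of the kind of check being asked for, written against a hypothetical, simplified `SketchField` class rather than Lucene's `Field`. The `indexed` and `tokenized` flags and the exception message are assumptions made for illustration only.

```java
class BinaryFieldGuardSketch {
  /** Hypothetical, simplified field; not Lucene's Field class. */
  public static final class SketchField {
    private final boolean indexed;
    private final boolean tokenized;
    private byte[] bytesValue;

    public SketchField(boolean indexed, boolean tokenized) {
      this.indexed = indexed;
      this.tokenized = tokenized;
    }

    /** Allows reuse with a new binary value unless the field is indexed and tokenized. */
    public void setBytesValue(byte[] value) {
      if (indexed && tokenized) {
        // A tokenized field is fed through an analyzer, which needs text, not bytes.
        throw new IllegalArgumentException(
            "cannot set a binary value on an indexed and tokenized field");
      }
      this.bytesValue = value;
    }

    public byte[] bytesValue() {
      return bytesValue;
    }
  }
}
```

A keyword-like field (`indexed = true`, `tokenized = false`) accepts the new value, while a text-like field (`indexed = true`, `tokenized = true`) fails fast instead of silently ignoring the call.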





[GitHub] [lucene] rmuir commented on pull request #12053: Allow reusing indexed binary fields.

2023-01-01 Thread GitBox


rmuir commented on PR #12053:
URL: https://github.com/apache/lucene/pull/12053#issuecomment-1368481827

   and yeah, you don't have such checks on numeric values, but numeric values 
don't have TokenStream tokenization. Being consistent with them makes no sense, 
that isn't what this is about.
   
   otherwise, if we can't agree here, let's just keep the restriction.





[GitHub] [lucene] rmuir commented on pull request #12053: Allow reusing indexed binary fields.

2023-01-01 Thread GitBox


rmuir commented on PR #12053:
URL: https://github.com/apache/lucene/pull/12053#issuecomment-1368482600

   the fact that the tests pass with this change is really upsetting too. we 
should at least add checks for the type of luser moments we want to prevent, 
e.g. calling `setBytesValue` on a fucking TextField, etc. If we don't add these 
checks then users are going to invoke these methods and... nothing will happen 
at all... or something that isn't what they want.





[GitHub] [lucene] zhaih commented on pull request #12051: Fix wrong assertion in TestBooleanQuery.testQueryMatchesCount

2023-01-01 Thread GitBox


zhaih commented on PR #12051:
URL: https://github.com/apache/lucene/pull/12051#issuecomment-1368503412

   Yeah it should work unless we later come up with some way to quickly pull 
out count in that situation as well. 
   
   But I think the assertion here may not be necessary because I see you have 
already added a specific test covering more comprehensive situations where 
boolean weights should or should not return -1. The assertion here seems to have 
been introduced when the `Weight#count` API was first added, and should be 
removed IMO now that we have a non-default impl?





[GitHub] [lucene] rmuir opened a new pull request, #12056: Update to error-prone 2.17

2023-01-01 Thread GitBox


rmuir opened a new pull request, #12056:
URL: https://github.com/apache/lucene/pull/12056

   I investigated each of the new checks, nothing really interesting except an 
incorrect javadoc link (discovered manually) linking to Object.finalize()





[GitHub] [lucene] rmuir opened a new issue, #12057: Forbidden-apis "built-in" signatures don't appear to be working?

2023-01-01 Thread GitBox


rmuir opened a new issue, #12057:
URL: https://github.com/apache/lucene/issues/12057

   ### Description
   
   I was looking at new error-prone checks in #12056 and one fails on 
Object.finalize
   
   Because the method is in the built-in JDK deprecated list (e.g. 
https://github.com/policeman-tools/forbidden-apis/blob/main/src/main/resources/de/thetaphi/forbiddenapis/signatures/jdk-deprecated-11.txt#L195),
 I would expect the check to fail if i override finalize.
   
   If I give `lucene/core/src/test/org/apache/lucene/TestDemo.java` a finalizer 
method, nothing fails. It makes me worried the built-in signatures lists aren't 
being applied somehow? Maybe the gradle task matching logic in the 
forbidden-apis config is buggy? not sure what is going on. cc: @uschindler 
   
   ### Version and environment details
   
   _No response_





[GitHub] [lucene] rmuir commented on issue #12057: Forbidden-apis "built-in" signatures don't appear to be working?

2023-01-01 Thread GitBox


rmuir commented on issue #12057:
URL: https://github.com/apache/lucene/issues/12057#issuecomment-1368524586

   Here's how to reproduce: apply this patch, then run `gradlew check -x test`. 
I would expect the build to fail, because we added a deprecated finalizer. 
Maybe forbidden doesn't fail because we don't actually call Object.finalize()? 
This method is a little special in that overriding it is enough to be bad. 
Maybe we should try to fix javac or ECJ to fail on deprecated usages instead?
   
   
   ```
   diff --git a/lucene/core/src/test/org/apache/lucene/TestDemo.java b/lucene/core/src/test/org/apache/lucene/TestDemo.java
   index 6c608e1d0b1..8bcbdc813ee 100644
   --- a/lucene/core/src/test/org/apache/lucene/TestDemo.java
   +++ b/lucene/core/src/test/org/apache/lucene/TestDemo.java
   @@ -46,6 +46,11 @@ import org.apache.lucene.util.IOUtils;
     */
    public class TestDemo extends LuceneTestCase {
    
   +  @Override
   +  protected void finalize() {
   +    System.out.println("YOLO");
   +  }
   +
      public void testDemo() throws IOException {
        String longTerm =
            "longtermlongtermlongtermlongtermlongtermlongtermlongtermlong"
   ```
   





[GitHub] [lucene] rmuir commented on issue #12057: Forbidden-apis "built-in" signatures don't appear to be working?

2023-01-01 Thread GitBox


rmuir commented on issue #12057:
URL: https://github.com/apache/lucene/issues/12057#issuecomment-1368527335

   Confirmed that's the issue, if i add a `super.finalize()` call to my 
finalizer, then forbidden fails. I will edit the issue.
   
   So we may need to use a different tool (javac, ecj) to ban finalizers. 
worst-case we just enable the error-prone check for them, but I try to avoid 
using error-prone if something simpler will do it.
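
To make the override-versus-call distinction concrete, here is a small runtime sketch using plain reflection. It is not one of the build-time tools discussed in this thread, only an illustration that a class can declare a finalizer without ever calling `Object.finalize()`. The class and method names are hypothetical.

```java
import java.lang.reflect.Method;

class FinalizerCheckSketch {
  /** Returns true if the class itself declares a no-arg finalize() method. */
  public static boolean declaresFinalizer(Class<?> clazz) {
    try {
      Method m = clazz.getDeclaredMethod("finalize");
      return m != null;
    } catch (NoSuchMethodException e) {
      return false; // inherited finalizers are not reported by getDeclaredMethod
    }
  }

  /** Overrides finalize() but never calls super.finalize(). */
  public static class WithFinalizer {
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() {
      // intentionally empty: declaring the override is the point of this sketch
    }
  }

  /** No finalizer declared here. */
  public static class Plain {}
}
```

`declaresFinalizer(WithFinalizer.class)` is true even though the class contains no call to the deprecated method, while `declaresFinalizer(Plain.class)` is false. Note that `Object.class` itself also declares `finalize()`, so a real check would need to exclude it.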





[GitHub] [lucene] rmuir commented on issue #12057: ban finalizers in the build somehow (worst-case: use error-prone)

2023-01-01 Thread GitBox


rmuir commented on issue #12057:
URL: https://github.com/apache/lucene/issues/12057#issuecomment-1368539424

   Currently there is no good way with ECJ/javac, unless we fail on all 
deprecations, which is very noisy at the moment. We can probably do it better 
with ECJ if we enable all their deprecation options, and clean up codebase 
(e.g. ensure tests calling deprecated stuff are also themselves deprecated)
   
   * org.eclipse.jdt.core.compiler.problem.deprecation
   * org.eclipse.jdt.core.compiler.problem.deprecationInDeprecatedCode
   * 
org.eclipse.jdt.core.compiler.problem.deprecationWhenOverridingDeprecatedMethod
   
   But looking at the code, that's gonna require quite a bit of work, I just 
wanted to make some progress here and prevent finalizers from slipping in.
   
   Unfortunately the error-prone checker doesn't fail on this case either yet: 
https://github.com/google/error-prone/pull/3652





[GitHub] [lucene] uschindler opened a new pull request, #12058: Fix detection of Hotspot in TestRamUsageEstimator so it works with OpenJ9 that has the bean, but without properties

2023-01-01 Thread GitBox


uschindler opened a new pull request, #12058:
URL: https://github.com/apache/lucene/pull/12058

   This improves the test, which fails with OpenJ9 VMs, due to the following 
problem:
   - OpenJ9 returns the HotspotMXBean, but it is empty and has no properties, 
so we can't detect compressed pointers, which the test requires.
   - The assumption now uses the compilation bean to detect this in tests.





[GitHub] [lucene] uschindler commented on pull request #12058: Fix detection of Hotspot in TestRamUsageEstimator so it works with OpenJ9 that has the bean, but without properties

2023-01-01 Thread GitBox


uschindler commented on PR #12058:
URL: https://github.com/apache/lucene/pull/12058#issuecomment-1368556378

   Thanks Robert!





[GitHub] [lucene] uschindler merged pull request #12058: Fix detection of Hotspot in TestRamUsageEstimator so it works with OpenJ9 that has the bean, but without properties

2023-01-01 Thread GitBox


uschindler merged PR #12058:
URL: https://github.com/apache/lucene/pull/12058





[GitHub] [lucene] rmuir commented on a diff in pull request #12055: Better skipping for multi-term queries with a FILTER rewrite.

2023-01-01 Thread GitBox


rmuir commented on code in PR #12055:
URL: https://github.com/apache/lucene/pull/12055#discussion_r1059804843


##
lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java:
##
@@ -183,23 +182,31 @@ private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOException {
           }
           Query q = new ConstantScoreQuery(bq.build());
           final Weight weight = searcher.rewrite(q).createWeight(searcher, scoreMode, score());
-          return new WeightOrDocIdSet(weight);
+          return new WeightOrDocIdSetIterator(weight);
         }
 
         // Too many terms: go back to the terms we already collected and start building the bit set
-        DocIdSetBuilder builder = new DocIdSetBuilder(context.reader().maxDoc(), terms);
+        PriorityQueue<PostingsEnum> highFrequencyTerms =
+            new PriorityQueue<PostingsEnum>(collectedTerms.size()) {
+              @Override
+              protected boolean lessThan(PostingsEnum a, PostingsEnum b) {
+                return a.cost() < b.cost();
Review Comment:
   `pq.insertWithOverflow` uses `!lessThan()` in its code. So I'm worried about 
this PQ behaving stupidly on ties with the same `docFreq`.
   
   Is there a simple tiebreaker we can use (even synthetic such as `int 
termId`) so that such ties don't enter the PQ? I'm just concerned about 
"collect remaining terms" piece for cases where there are jazillions of terms. 
should also allow the IO to be a bit more sequential in such cases, rather than 
constantly replacing top of PQ with more ties?
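
A JDK-only sketch of the tie concern. It models a bounded queue that replaces its head unless the new element is strictly less than it, which is the behavior described in this comment; it is not Lucene's `PriorityQueue`. `Entry`, `termId` and `countReplacements` are hypothetical names, and `termId` is the synthetic tiebreaker suggested above.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class TieBreakSketch {
  /** Hypothetical entry: a cost plus a synthetic ordinal used only to break ties. */
  static final class Entry {
    final long cost;
    final int termId;

    Entry(long cost, int termId) {
      this.cost = cost;
      this.termId = termId;
    }
  }

  /** Counts head replacements when offering all costs into a queue bounded to maxSize. */
  public static int countReplacements(long[] costs, int maxSize, boolean tieBreak) {
    Comparator<Entry> order = Comparator.comparingLong((Entry e) -> e.cost);
    if (tieBreak) {
      // Among equal costs the entry seen first ranks higher, so a later tie
      // compares as strictly less than the head and is rejected.
      order = order.thenComparing(Comparator.comparingInt((Entry e) -> e.termId).reversed());
    }
    PriorityQueue<Entry> heap = new PriorityQueue<>(order);
    int replacements = 0;
    for (int i = 0; i < costs.length; i++) {
      Entry e = new Entry(costs[i], i);
      if (heap.size() < maxSize) {
        heap.add(e);
      } else if (order.compare(e, heap.peek()) >= 0) { // i.e. !lessThan(e, head)
        heap.poll();
        heap.add(e);
        replacements++;
      }
    }
    return replacements;
  }
}
```

With six terms of identical cost and a queue of size 2, the version without a tiebreaker replaces the head four times, while the version with the tiebreaker never does. Entries with a strictly higher cost still displace the head in both variants.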






[GitHub] [lucene] rmuir commented on a diff in pull request #12055: Better skipping for multi-term queries with a FILTER rewrite.

2023-01-01 Thread GitBox


rmuir commented on code in PR #12055:
URL: https://github.com/apache/lucene/pull/12055#discussion_r1059807197


##
lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java:
##
@@ -183,23 +182,31 @@ private WeightOrDocIdSet rewrite(LeafReaderContext context) throws IOException {
           }
           Query q = new ConstantScoreQuery(bq.build());
           final Weight weight = searcher.rewrite(q).createWeight(searcher, scoreMode, score());
-          return new WeightOrDocIdSet(weight);
+          return new WeightOrDocIdSetIterator(weight);
         }
 
         // Too many terms: go back to the terms we already collected and start building the bit set
-        DocIdSetBuilder builder = new DocIdSetBuilder(context.reader().maxDoc(), terms);
+        PriorityQueue<PostingsEnum> highFrequencyTerms =
+            new PriorityQueue<PostingsEnum>(collectedTerms.size()) {
+              @Override
+              protected boolean lessThan(PostingsEnum a, PostingsEnum b) {
+                return a.cost() < b.cost();
+              }
+            };
+        DocIdSetBuilder otherTerms = new DocIdSetBuilder(context.reader().maxDoc(), terms);
         if (collectedTerms.isEmpty() == false) {
           TermsEnum termsEnum2 = terms.iterator();
           for (TermAndState t : collectedTerms) {
             termsEnum2.seekExact(t.term, t.state);
-            docs = termsEnum2.postings(docs, PostingsEnum.NONE);
-            builder.add(docs);
+            PostingsEnum postings = termsEnum2.postings(null, PostingsEnum.NONE);
+            highFrequencyTerms.add(postings);
Review Comment:
   Rather than blindly adding every term to the PQ, should we have a constant 
minimum `cost` threshold (e.g. 256, 1024, whatever) for a term to even be 
considered, and otherwise send it directly to `otherTerms`? The skipping stuff 
isn't going to be useful for the long tail of low-cost terms (the majority, if 
we are thinking Zipf). Ideally we wouldn't waste our time on a term unless it 
has skip data. And we want to be careful about the performance of these queries 
when there are jazillions of jazillions of matching low-frequency terms.
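
   A rough sketch of the thresholding idea above, not the PR's code. It uses a hypothetical `TermCost` record in place of `PostingsEnum`, `java.util.PriorityQueue` rather than Lucene's own class, and an assumed `MIN_COST` of 256. Terms below the cutoff never touch the queue, and the queue is bounded so the cheapest candidate is evicted into the "other" bucket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class CostThresholdSketch {
  /** Hypothetical stand-in for a term's postings; only the cost estimate matters here. */
  record TermCost(String term, long cost) {}

  /** Result of the split: a few expensive terms kept apart, everything else merged. */
  record Split(List<TermCost> highFrequency, List<TermCost> others) {}

  // Assumed cutoff; the review comment only suggests "256, 1024, whatever".
  static final long MIN_COST = 256;

  static Split partition(List<TermCost> terms, int maxHighFrequency) {
    // Min-heap on cost, so the cheapest retained term is the first one evicted.
    PriorityQueue<TermCost> pq =
        new PriorityQueue<>((a, b) -> Long.compare(a.cost(), b.cost()));
    List<TermCost> others = new ArrayList<>();
    for (TermCost t : terms) {
      if (t.cost() < MIN_COST || maxHighFrequency <= 0) {
        others.add(t); // long tail: goes straight to the merged set
        continue;
      }
      pq.add(t);
      if (pq.size() > maxHighFrequency) {
        others.add(pq.poll()); // queue is full: demote the cheapest candidate
      }
    }
    List<TermCost> high = new ArrayList<>(pq);
    high.sort((a, b) -> Long.compare(b.cost(), a.cost())); // most expensive first
    return new Split(high, others);
  }
}
```

   With this shape, the per-term overhead for the long tail is a single comparison, and the queue size stays bounded no matter how many terms match.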






[GitHub] [lucene] rmuir commented on a diff in pull request #12055: Better skipping for multi-term queries with a FILTER rewrite.

2023-01-01 Thread GitBox


rmuir commented on code in PR #12055:
URL: https://github.com/apache/lucene/pull/12055#discussion_r1059807649


##
lucene/core/src/java/org/apache/lucene/search/MultiTermQueryConstantScoreWrapper.java:
##
@@ -183,23 +182,31 @@ private WeightOrDocIdSet rewrite(LeafReaderContext 
context) throws IOException {
   }
   Query q = new ConstantScoreQuery(bq.build());
   final Weight weight = searcher.rewrite(q).createWeight(searcher, 
scoreMode, score());
-  return new WeightOrDocIdSet(weight);
+  return new WeightOrDocIdSetIterator(weight);
 }
 
 // Too many terms: go back to the terms we already collected and start 
building the bit set
-DocIdSetBuilder builder = new 
DocIdSetBuilder(context.reader().maxDoc(), terms);
+PriorityQueue<PostingsEnum> highFrequencyTerms =
+new PriorityQueue<PostingsEnum>(collectedTerms.size()) {
+  @Override
+  protected boolean lessThan(PostingsEnum a, PostingsEnum b) {
+return a.cost() < b.cost();
+  }
+};
+DocIdSetBuilder otherTerms = new 
DocIdSetBuilder(context.reader().maxDoc(), terms);
 if (collectedTerms.isEmpty() == false) {
   TermsEnum termsEnum2 = terms.iterator();
   for (TermAndState t : collectedTerms) {
 termsEnum2.seekExact(t.term, t.state);
-docs = termsEnum2.postings(docs, PostingsEnum.NONE);
-builder.add(docs);
+PostingsEnum postings = termsEnum2.postings(null, 
PostingsEnum.NONE);
+highFrequencyTerms.add(postings);
   }
 }
 
-// Then keep filling the bit set with remaining terms
+// Then collect remaining terms
+PostingsEnum postings = null;
 do {
-  docs = termsEnum.postings(docs, PostingsEnum.NONE);
+  postings = termsEnum.postings(postings, PostingsEnum.NONE);

Review Comment:
   I don't understand how this is safe at all: we are reusing PostingsEnum 
instances yet also stuffing them into a priority queue.
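
   To illustrate the hazard in general terms, here is a minimal sketch that does not use Lucene at all. `Cursor` and `open` are hypothetical stand-ins for a reusable enum and a factory that takes a `reuse` argument. Whether the PR actually retains a reused instance cannot be told from the quoted hunk alone. The sketch only shows why retaining a reference and then passing it back for reuse is a problem.

```java
import java.util.ArrayList;
import java.util.List;

final class ReuseAliasingSketch {
  /** Hypothetical mutable cursor, standing in for a reusable enum-style object. */
  static final class Cursor {
    String term;
  }

  /** Mimics a "reuse if possible" factory: it may return the instance it was given. */
  static Cursor open(String term, Cursor reuse) {
    Cursor c = (reuse != null) ? reuse : new Cursor();
    c.term = term; // repositions the instance in place
    return c;
  }

  /** Unsafe: the same instance is retained repeatedly and keeps being repositioned. */
  static List<Cursor> collectWithReuse(List<String> terms) {
    List<Cursor> retained = new ArrayList<>();
    Cursor cursor = null;
    for (String t : terms) {
      cursor = open(t, cursor);
      retained.add(cursor); // every entry aliases one object
    }
    return retained;
  }

  /** Safe: passing null forces a fresh instance for each retained cursor. */
  static List<Cursor> collectWithoutReuse(List<String> terms) {
    List<Cursor> retained = new ArrayList<>();
    for (String t : terms) {
      retained.add(open(t, null));
    }
    return retained;
  }
}
```

   In the unsafe variant every retained entry ends up positioned on the last term, which is why anything stored in a queue needs its own instance.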






[GitHub] [lucene] uschindler closed issue #8485: TestIndexWriterOnError.testCheckpoint fails on IBM J9 [LUCENE-7432]

2023-01-01 Thread GitBox


uschindler closed issue #8485: TestIndexWriterOnError.testCheckpoint fails on 
IBM J9 [LUCENE-7432]
URL: https://github.com/apache/lucene/issues/8485





[GitHub] [lucene] uschindler commented on issue #7580: Reproducible fieldcache AIOOBE only on J9 [LUCENE-6522]

2023-01-01 Thread GitBox


uschindler commented on issue #7580:
URL: https://github.com/apache/lucene/issues/7580#issuecomment-1368575244

   This seems fixed now; the test no longer fails.





[GitHub] [lucene] uschindler closed issue #7580: Reproducible fieldcache AIOOBE only on J9 [LUCENE-6522]

2023-01-01 Thread GitBox


uschindler closed issue #7580: Reproducible fieldcache AIOOBE only on J9 
[LUCENE-6522]
URL: https://github.com/apache/lucene/issues/7580





[GitHub] [lucene] uschindler closed issue #7579: org.apache.xerces.util is a protected pkg on IBM J9 [LUCENE-6521]

2023-01-01 Thread GitBox


uschindler closed issue #7579: org.apache.xerces.util is a protected pkg on IBM 
J9 [LUCENE-6521]
URL: https://github.com/apache/lucene/issues/7579





[GitHub] [lucene] uschindler commented on issue #7579: org.apache.xerces.util is a protected pkg on IBM J9 [LUCENE-6521]

2023-01-01 Thread GitBox


uschindler commented on issue #7579:
URL: https://github.com/apache/lucene/issues/7579#issuecomment-1368575466

   This is fixed in J9, as it now uses the OpenJDK class library.





[GitHub] [lucene] uschindler closed issue #7575: mockfilesystem tests fail with IBM jdk [LUCENE-6517]

2023-01-01 Thread GitBox


uschindler closed issue #7575: mockfilesystem tests fail with IBM jdk 
[LUCENE-6517]
URL: https://github.com/apache/lucene/issues/7575





[GitHub] [lucene] uschindler commented on issue #7575: mockfilesystem tests fail with IBM jdk [LUCENE-6517]

2023-01-01 Thread GitBox


uschindler commented on issue #7575:
URL: https://github.com/apache/lucene/issues/7575#issuecomment-1368575612

   This should no longer be an issue, as OpenJ9 uses the OpenJDK class library 
now.





[GitHub] [lucene] uschindler commented on issue #7614: TestQueryTemplateManager always fails on J9 [LUCENE-6556]

2023-01-01 Thread GitBox


uschindler commented on issue #7614:
URL: https://github.com/apache/lucene/issues/7614#issuecomment-1368575794

   This is no longer an issue: all tests pass because OpenJ9 now uses the 
OpenJDK class library rather than Harmony.





[GitHub] [lucene] uschindler closed issue #7614: TestQueryTemplateManager always fails on J9 [LUCENE-6556]

2023-01-01 Thread GitBox


uschindler closed issue #7614: TestQueryTemplateManager always fails on J9 
[LUCENE-6556]
URL: https://github.com/apache/lucene/issues/7614





[GitHub] [lucene] uschindler commented on issue #7580: Reproducible fieldcache AIOOBE only on J9 [LUCENE-6522]

2023-01-01 Thread GitBox


uschindler commented on issue #7580:
URL: https://github.com/apache/lucene/issues/7580#issuecomment-1368576005

   In addition, Lucene has no fieldcache anymore.





[GitHub] [lucene] uschindler closed issue #5001: TestNRTManager hangs with IBM JRE [LUCENE-3928]

2023-01-01 Thread GitBox


uschindler closed issue #5001: TestNRTManager hangs with IBM JRE [LUCENE-3928]
URL: https://github.com/apache/lucene/issues/5001





[GitHub] [lucene] uschindler commented on issue #5001: TestNRTManager hangs with IBM JRE [LUCENE-3928]

2023-01-01 Thread GitBox


uschindler commented on issue #5001:
URL: https://github.com/apache/lucene/issues/5001#issuecomment-1368577762

   This test now passes with IBM Semeru / OpenJ9.





[GitHub] [lucene] jasirkt commented on issue #11701: Deadlock in AnalysisSPILoader [LUCENE-10665]

2023-01-01 Thread GitBox


jasirkt commented on issue #11701:
URL: https://github.com/apache/lucene/issues/11701#issuecomment-1368693920

   > In which version did you see this?
   
   9.1.0
   
   Thanks for fixing. It works now!

