Re: [PR] Disable sort optimization when tracking all docs [lucene]

2025-04-25 Thread via GitHub


github-actions[bot] commented on PR #14395:
URL: https://github.com/apache/lucene/pull/14395#issuecomment-2831654167

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]

2025-04-25 Thread via GitHub


jpountz commented on PR #14511:
URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830429994

   I hope you don't mind, I updated this PR title and description to better 
reflect the change.





Re: [PR] Fix leadCost calculation in BooleanScorerSupplier.requiredBulkScorer [lucene]

2025-04-25 Thread via GitHub


jpountz commented on code in PR #14543:
URL: https://github.com/apache/lucene/pull/14543#discussion_r2060222310


##
lucene/core/src/test/org/apache/lucene/search/TestBoolean2ScorerSupplier.java:
##
@@ -315,6 +318,9 @@ public void testDisjunctionLeadCost() throws IOException {
     new BooleanScorerSupplier(
             new FakeWeight(), subs, RandomPicks.randomFrom(random(), ScoreMode.values()), 0, 100)
         .get(100); // triggers assertions as a side-effect
+    new BooleanScorerSupplier(
+            new FakeWeight(), subs, RandomPicks.randomFrom(random(), ScoreMode.values()), 0, 100)
+        .bulkScorer(); // triggers assertions as a side-effect

Review Comment:
   Thanks, I had only run lucky seeds that had not exercised 
ScoreMode.TOP_SCORES, which triggers different logic for producing a bulk 
scorer (MaxScoreBulkScorer instead of BooleanScorer). This is a real failure.
   
   I decided to relax assertions a bit instead of refactoring 
BooleanScorerSupplier too much, since all cases when the lead cost is greater 
than or equal to the cost of a clause are practically equivalent and mean that 
this clause is leading iteration. This also helped simplify tests a bit.
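
The relaxation described above can be sketched as follows. This is an illustration only, not the code from the PR: `LeadCostCheck` and `equivalentLeadCost` are hypothetical names, and the rule they encode (any lead cost greater than or equal to the clause's own cost means the clause leads iteration) is taken from the comment above.

```java
public class LeadCostCheck {
    /**
     * Hypothetical helper, not part of Lucene: returns true when two lead
     * costs are interchangeable for a clause with the given cost.
     */
    public static boolean equivalentLeadCost(long clauseCost, long expected, long actual) {
        boolean expectedLeads = expected >= clauseCost;
        boolean actualLeads = actual >= clauseCost;
        if (expectedLeads && actualLeads) {
            // The clause leads iteration either way, so the exact value does not matter.
            return true;
        }
        // Otherwise the exact lead cost is significant and must match.
        return expected == actual;
    }

    public static void main(String[] args) {
        // Values from the failure reported elsewhere in this thread: cost=42,
        // expected lead cost 54, actual lead cost Long.MAX_VALUE.
        System.out.println(equivalentLeadCost(42, 54, Long.MAX_VALUE)); // prints "true"
        System.out.println(equivalentLeadCost(42, 10, 20)); // prints "false"
    }
}
```

Under this assumed rule, the reported mismatch between 54 and 9223372036854775807 (`Long.MAX_VALUE`) would no longer count as a failure, because both values are at least the clause cost of 42.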






Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]

2025-04-25 Thread via GitHub


jpountz commented on PR #14511:
URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830415622

   This sounds safe enough for 10.2.1 for me. Can you move the CHANGES entry to 
10.2.1 then? cc @ChrisHegarty 





Re: [PR] add RawTFSimilarity class [lucene]

2025-04-25 Thread via GitHub


cpoerschke commented on PR #13749:
URL: https://github.com/apache/lucene/pull/13749#issuecomment-2830589655

   > Your reference to `DelimitedTermFrequencyTokenFilter` suggests that the 
freq here is more a feature than an actual frequency of a term in a doc. From 
an API perspective, this would make me want to expose it via an IndexableField 
sub class, with a query factory, a bit like `FeatureQuery` but for integer 
values?
   
   (belatedly) thanks for this mention! yes, the value is a feature of a term 
in a doc, and actually originally non-integer. (still work in progress) 
https://github.com/apache/solr/pull/3318 documentation now includes both a 
`RawTFSimilarity` and a `FeatureQuery` section.





Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]

2025-04-25 Thread via GitHub


ChrisHegarty commented on PR #14511:
URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830590012

   > @ChrisHegarty @jpountz Moved the change log to 10.2.1
   
   Eh! I think you moved it to 10.2.0, rather than 10.2.1.





[PR] [Backport] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]

2025-04-25 Thread via GitHub


gf2121 opened a new pull request, #14557:
URL: https://github.com/apache/lucene/pull/14557

   Backport https://github.com/apache/lucene/pull/14511 to branch_10x





Re: [I] [Bug] Lead cost in boolean conjunction queries can be miscalculated [lucene]

2025-04-25 Thread via GitHub


ChrisHegarty closed issue #14542: [Bug] Lead cost in boolean conjunction 
queries can be miscalculated
URL: https://github.com/apache/lucene/issues/14542





Re: [PR] Fix leadCost calculation in BooleanScorerSupplier.requiredBulkScorer [lucene]

2025-04-25 Thread via GitHub


ChrisHegarty merged PR #14543:
URL: https://github.com/apache/lucene/pull/14543





Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]

2025-04-25 Thread via GitHub


gf2121 commented on PR #14511:
URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830734669

   To be clear, I raised https://github.com/apache/lucene/pull/14557 and 
https://github.com/apache/lucene/pull/14558 for backporting. I plan to merge 
this now if no one objects.





Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]

2025-04-25 Thread via GitHub


gf2121 commented on PR #14511:
URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830738175

   @expani could you resolve the conflicts so that I can merge?





Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-25 Thread via GitHub


thecoop commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2059828062


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {
+    Map<Class<? extends IOContext.FileOpenHint>, List<IOContext.FileOpenHint>> hintClasses =
+        context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass));
+
+    // there should only be one of FileType, FileData, DataAccess
+    List<IOContext.FileOpenHint> fileTypes =
+        hintClasses.getOrDefault(FileTypeHint.class, List.of());
+    if (fileTypes.size() > 1) {
+      throw new IllegalArgumentException("Multiple file type hints specified: " + fileTypes);
+    }
+    List<IOContext.FileOpenHint> fileData =
+        hintClasses.getOrDefault(FileDataHint.class, List.of());
+    if (fileData.size() > 1) {
+      throw new IllegalArgumentException("Multiple file data hints specified: " + fileData);
+    }
+    List<IOContext.FileOpenHint> dataAccess =
+        hintClasses.getOrDefault(DataAccessHint.class, List.of());
+    if (dataAccess.size() > 1) {
+      throw new IllegalArgumentException("Multiple data access hints specified: " + dataAccess);
+    }
+  }
+
+  protected ReadAdvice toReadAdvice(IOContext context) {

Review Comment:
   I've added an override function to `MMapDirectory` in eb80be0a7c so the 
`ReadAdvice` can be explicitly specified on a per-file basis
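
The validation in the hunk above groups hints by class and rejects any category that appears more than once. A minimal self-contained sketch of that idea follows. The `FileOpenHint`, `FileTypeHint` and `DataAccessHint` types here are hypothetical stand-ins declared only for the example (including their constants), not Lucene's real classes, and the loop over all categories is a simplification of the three explicit checks in the diff.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class HintValidation {
    /** Hypothetical stand-ins for the hint types named in the diff. */
    public interface FileOpenHint {}
    public enum FileTypeHint implements FileOpenHint { DATA, INDEX }
    public enum DataAccessHint implements FileOpenHint { RANDOM, SEQUENTIAL }

    /** Throws IllegalArgumentException if any hint category appears more than once. */
    public static void validate(List<FileOpenHint> hints) {
        // Group the hints by their concrete class, one bucket per category.
        Map<Class<?>, List<FileOpenHint>> byClass =
            hints.stream().collect(Collectors.groupingBy(FileOpenHint::getClass));
        for (Map.Entry<Class<?>, List<FileOpenHint>> e : byClass.entrySet()) {
            if (e.getValue().size() > 1) {
                throw new IllegalArgumentException(
                    "Multiple " + e.getKey().getSimpleName() + " hints specified: " + e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        // One hint per category is accepted.
        validate(List.of(FileTypeHint.DATA, DataAccessHint.RANDOM));
        try {
            // Two hints of the same category are rejected.
            validate(List.of(DataAccessHint.RANDOM, DataAccessHint.SEQUENTIAL));
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

Grouping by `getClass` only works in this sketch because each category is a plain enum. Enum constants with their own bodies would report a different class and would need a different grouping key.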






Re: [I] Examine the affects of MADV_RANDOM when MGLRU is enabled in Linux kernel [lucene]

2025-04-25 Thread via GitHub


mikemccand commented on issue #14408:
URL: https://github.com/apache/lucene/issues/14408#issuecomment-2830109470

   > > The Linux change targets both MGLRU and normal LRU. The impact is more 
pronounced in MGLRU, as page reclamation is more aggressive there. However, the 
semantic change for this advice is the same in both cases. In the latest 
kernels, using `MADV_RANDOM` does not mark the page as accessed, regardless of 
whether MGLRU is in use. That's a big shift of semantic for our default read 
advice.
   > 
   > Easy argument to change the default to `NORMAL`.
   
   +1 to go back to `NORMAL` as default, until we can better understand the 
regressions we (OpenSearch users, Elasticsearch users, and Amazon product 
search (my team)) are  seeing with `MADV_RANDOM`.
   
   I think `MADV_RANDOM` can also be harmful for "hot" (index expected to 
mostly fit in RAM) use cases.
   
   For our service (Amazon product search), which is mostly hot, we had to 
hard-override back to `IOContext.DEFAULT` for `.vec` and `.veq` (quantized 
vectors) in a hackity way (subclass `MMapDirectory` to insert shim (that 
rewrites the `IOContext`) into `openInput` -- oooh as @jpountz describes at 
https://github.com/apache/lucene/issues/14348#issuecomment-2730966937, except 
opposite), in some cases (lighting a new commit point during NRT replication) 
where we had to turn off `MMapDirectory.setPreload`.
   
   At Lucene's defaults (`MADV_RANDOM` for the KNN vector files) we saw 
horribly slow warmup of our searchers ... basically, paging in all those 
vectors one at a time as "real" queries visited the HNSW graph was crazy slow 
(many minutes) even on crazy fast infra (AWS), whereas letting the OS do its 
default "thing" (bulk readahead of N pages when a page miss happens?) was much 
quicker.  Much less  "page fault amplification".
   
   Benchmarks in luceneutil also hit this -- minutes and minutes of swapping in 
the HNSW graph (without `.setPreload`) from a fast local SSD, but I think 
luceneutil is still using Lucene's `IOContext` defaults here. 
   
   Actually, if we `MADV_RANDOM` and `.setPreload` to load `.vec`, what is the 
effect?  Does the preloading still work (OS caches/touches all pages, and does 
mark them as accessed (so they stay cached), despite the `MADV_RANDOM`)?  Is it 
much slower to preload when you `MADV_RANDOM` (though presumably it is 
sequentially bringing pages in)?
   
   > AFAIK https://github.com/apache/lucene/issues/14422 is working on fixing 
that "real problem".
   
   +1 to work towards this more general fix.  But, sheesh, it looks so 
complicated, depending on hot vs cold use case, preloading or not, which part 
of the Lucene index (KNN, terms, postings), Linux kernel versions, ... in the 
meantime I think we should revert back to `NORMAL`/`DEFAULT` as Lucene's 
default...
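
The per-file override described above (forcing the default advice for `.vec` and `.veq` files) comes down to a decision keyed on the file extension. The sketch below models only that decision. It is not Lucene code: `ReadAdviceOverride`, `Advice` and `adviceFor` are hypothetical names, and a real shim would sit inside a `MMapDirectory` subclass and rewrite the `IOContext` passed to `openInput`.

```java
public class ReadAdviceOverride {
    /** Hypothetical stand-in for a read-advice setting; not Lucene's enum. */
    public enum Advice { NORMAL, RANDOM }

    /**
     * Forces NORMAL for vector data files (.vec) and quantized vector files
     * (.veq); every other file keeps the advice that was requested.
     */
    public static Advice adviceFor(String fileName, Advice requested) {
        if (fileName.endsWith(".vec") || fileName.endsWith(".veq")) {
            return Advice.NORMAL;
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(adviceFor("_0.vec", Advice.RANDOM)); // prints "NORMAL"
        System.out.println(adviceFor("_0.doc", Advice.RANDOM)); // prints "RANDOM"
    }
}
```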
   
   





Re: [I] Examine the affects of MADV_RANDOM when MGLRU is enabled in Linux kernel [lucene]

2025-04-25 Thread via GitHub


rmuir commented on issue #14408:
URL: https://github.com/apache/lucene/issues/14408#issuecomment-2830144654

   > +1 to work towards this more general fix. But, sheesh, it looks so 
complicated, depending on hot vs cold use case, preloading or not, which part 
of the Lucene index (KNN, terms, postings), Linux kernel versions, ... in the 
meantime I think we should revert back to `NORMAL`/`DEFAULT` as Lucene's 
default...
   
   Yes, my argument is that it is complicated, and lucene needs to get out of 
the business of it.
   
   We can flip-flop on this setting over and over again, and each time some 
users will experience regressions and others will get happy.
   
   I don't want to see this, it is just more of the same. Lucene needs to get 
out of the business of doing it.
   
   But if you want a quick fix: I'd be in favor of a PR that removes all 
madvise/preloading/otherwise from lucene. Let it be the "user's decision" on 
this shit, the different use-cases and platforms are too different, lucene 
cannot have "defaults" in java code and pretend like that's gonna work well 
across all these various use-cases and platforms. It does not work. And for 
users that care and want this stuff, hopefully it gets easier for them once 
#14422 lands.
   
   This way it's not constantly regressing back and forth because of settings 
flip-flopping.





Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]

2025-04-25 Thread via GitHub


expani commented on PR #14511:
URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830167911

   Addressed comments.
   I want to backport these to `9.12.x` and `10.2.x` as well. Will open 
separate PRs for the same. 
   






Re: [I] Smoke tester requiring Python 3.12+ [lucene]

2025-04-25 Thread via GitHub


stefanvodita commented on issue #14556:
URL: https://github.com/apache/lucene/issues/14556#issuecomment-2830293555

   I see we have 3.12 
[configured](https://github.com/apache/lucene/blob/92d79d47cbd238137ec136f6947c0c9e86003ce0/dev-tools/scripts/pyproject.toml#L2)
 and at least for me that's alright. But do we have a way to warn users of the 
scripts that they need 3.12?





Re: [I] Smoke tester requiring Python 3.12+ [lucene]

2025-04-25 Thread via GitHub


rmuir commented on issue #14556:
URL: https://github.com/apache/lucene/issues/14556#issuecomment-2830316620

   I think there is a way, there is even some existing logic to do it (I 
suspect it has the wrong version set). Additionally, I know existing logic uses 
an outdated method to check the python version: because I disabled the linter 
violation around that.
   
   I've also a concern that for an old python it may just fail at `pip install` 
phase, and never hit such a custom check. All it takes is for a library to drop 
support, and culturally it seems python devs are fairly aggressive on that.
   
   We should definitely improve here though!





Re: [PR] Fix leadCost calculation in BooleanScorerSupplier.requiredBulkScorer [lucene]

2025-04-25 Thread via GitHub


benwtrent commented on code in PR #14543:
URL: https://github.com/apache/lucene/pull/14543#discussion_r2060181757


##
lucene/core/src/java/org/apache/lucene/search/BooleanScorerSupplier.java:
##
@@ -78,11 +86,7 @@ private long computeCost() {
   return minRequiredCost.getAsLong();
 } else {
   final Collection<ScorerSupplier> optionalScorers = subs.get(Occur.SHOULD);

Review Comment:
   This is unused now. CI is mad :)
   ```suggestion
   ```






Re: [PR] Fix leadCost calculation in BooleanScorerSupplier.requiredBulkScorer [lucene]

2025-04-25 Thread via GitHub


benwtrent commented on code in PR #14543:
URL: https://github.com/apache/lucene/pull/14543#discussion_r2060183537


##
lucene/core/src/test/org/apache/lucene/search/TestBoolean2ScorerSupplier.java:
##
@@ -315,6 +318,9 @@ public void testDisjunctionLeadCost() throws IOException {
     new BooleanScorerSupplier(
             new FakeWeight(), subs, RandomPicks.randomFrom(random(), ScoreMode.values()), 0, 100)
         .get(100); // triggers assertions as a side-effect
+    new BooleanScorerSupplier(
+            new FakeWeight(), subs, RandomPicks.randomFrom(random(), ScoreMode.values()), 0, 100)
+        .bulkScorer(); // triggers assertions as a side-effect

Review Comment:
   ```
   TestBoolean2ScorerSupplier > testDisjunctionLeadCost FAILED
   java.lang.AssertionError: FakeLazyScorer(cost=42,leadCost=54) expected:<54> but was:<9223372036854775807>
       at __randomizedtesting.SeedInfo.seed([59321C032B75088A:CB8AA4FC1AF92C0A]:0)
       at org.junit.Assert.fail(Assert.java:89)
       at org.junit.Assert.failNotEquals(Assert.java:835)
       at org.junit.Assert.assertEquals(Assert.java:647)
       at org.apache.lucene.search.TestBoolean2ScorerSupplier$FakeScorerSupplier.get(TestBoolean2ScorerSupplier.java:108)
       at org.apache.lucene.search.BooleanScorerSupplier.optionalBulkScorer(BooleanScorerSupplier.java:294)
       at org.apache.lucene.search.BooleanScorerSupplier.booleanScorer(BooleanScorerSupplier.java:216)
       at org.apache.lucene.search.BooleanScorerSupplier.bulkScorer(BooleanScorerSupplier.java:178)
       at org.apache.lucene.search.TestBoolean2ScorerSupplier.testDisjunctionLeadCost(TestBoolean2ScorerSupplier.java:323)
       at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
   ```
   
   Is failing?
   
   ```
   gradlew test --tests TestBoolean2ScorerSupplier.testDisjunctionLeadCost -Dtests.seed=59321C032B75088A -Dtests.locale=yrl-VE -Dtests.timezone=Etc/GMT-14 -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   ```






Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]

2025-04-25 Thread via GitHub


expani commented on PR #14511:
URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830552012

   >I hope you don't mind, I updated this PR title and description to better 
reflect the change.
   
   Not at all. Thanks for taking the time to explain the different pieces of 
this code. 
   
   It was really fun debugging this and would definitely love to visit this 
part of the code again. 
   
   





Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]

2025-04-25 Thread via GitHub


expani commented on PR #14511:
URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830554353

   @ChrisHegarty @jpountz Moved the change log to 10.2.1





Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]

2025-04-25 Thread via GitHub


ChrisHegarty commented on PR #14511:
URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830633097

   > This sounds safe enough for 10.2.1 for me. Can you move the CHANGES entry 
to 10.2.1 then? cc @ChrisHegarty
   
   What am I missing?  This is not applicable to 10.2.1, since the only changed 
file is Lucene103PostingsReader.java which is not present in 10.2 !
   





Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]

2025-04-25 Thread via GitHub


expani commented on PR #14511:
URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830623186

   Oops hadn't rebased with main. Fixed it now.





Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]

2025-04-25 Thread via GitHub


expani commented on PR #14511:
URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830648760

   >What am I missing? This is not applicable to 10.2.1, since the only changed 
file is Lucene103PostingsReader.java which is not present in 10.2 ! Did the 
rebase mess something up ?
   
   I had updated 103PostingsReader as the initial plan was not to backport. 
   
   Updated 101PostingsReader which is used in 10.2.1 
   
   Should I also raise against some other branch as well ? 





Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]

2025-04-25 Thread via GitHub


gf2121 commented on PR #14511:
URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830669459

   > Lucene103PostingsReader.java which is not present in 10.2
   
   Yes, we have not backported `Lucene103PostingsReader`, see 
https://github.com/apache/lucene/pull/14333#issuecomment-2824644842. I think we 
will need to make the same change to `Lucene101PostingsReader` if we want to 
include this in 10.2.1.





Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]

2025-04-25 Thread via GitHub


expani commented on PR #14511:
URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830685389

   I made the same change in `Lucene101PostingsReader` as well. 





Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]

2025-04-25 Thread via GitHub


gf2121 merged PR #14511:
URL: https://github.com/apache/lucene/pull/14511





Re: [I] Term Query is slower post Lucene 9.12 for fields with IndexOptions.DOCS [lucene]

2025-04-25 Thread via GitHub


gf2121 closed issue #14445: Term Query is slower post Lucene 9.12 for fields 
with IndexOptions.DOCS
URL: https://github.com/apache/lucene/issues/14445





Re: [PR] [Backport] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]

2025-04-25 Thread via GitHub


gf2121 merged PR #14558:
URL: https://github.com/apache/lucene/pull/14558





[PR] [Backport] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]

2025-04-25 Thread via GitHub


gf2121 opened a new pull request, #14558:
URL: https://github.com/apache/lucene/pull/14558

   Backport https://github.com/apache/lucene/pull/14511 to branch_10_2





Re: [PR] [Backport] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]

2025-04-25 Thread via GitHub


gf2121 merged PR #14557:
URL: https://github.com/apache/lucene/pull/14557





Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]

2025-04-25 Thread via GitHub


expani commented on code in PR #14511:
URL: https://github.com/apache/lucene/pull/14511#discussion_r2059790054


##
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java:
##
@@ -66,13 +67,20 @@
 public final class Lucene103PostingsReader extends PostingsReaderBase {
 
   static final VectorizationProvider VECTORIZATION_PROVIDER = VectorizationProvider.getInstance();
+
   // Dummy impacts, composed of the maximum possible term frequency and the lowest possible
   // (unsigned) norm value. This is typically used on tail blocks, which don't actually record
-  // impacts as the storage overhead would not be worth any query evaluation speedup, since there's
+  // impacts as the storage overhead would not be worth any query evaluation speedup, since
+  // there's
   // less than 128 docs left to evaluate anyway.
   private static final List<Impact> DUMMY_IMPACTS =
       Collections.singletonList(new Impact(Integer.MAX_VALUE, 1L));
 
+  // We stopped storing a placeholder impact with freq=1 for fields with IndexOptions.DOCS after
+  // 9.12.0
+  private static final List<Impact> NON_COMPETITIVE_IMPACTS =
+      Collections.singletonList(new Impact(1, 1L));

Review Comment:
   Done






Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]

2025-04-25 Thread via GitHub


expani commented on code in PR #14511:
URL: https://github.com/apache/lucene/pull/14511#discussion_r2059808880


##
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java:
##
@@ -1286,14 +1298,11 @@ public long cost() {
 
   @Override
   public int numLevels() {
-    return indexHasFreq == false || level1LastDocID == NO_MORE_DOCS ? 1 : 2;
+    return level1LastDocID == NO_MORE_DOCS ? 1 : 2;

Review Comment:
   I made these changes to bring back the same behavior as in 9.11.1.
   
   Without these changes, it doesn't read the skip data and [exits here](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/ImpactsDISI.java#L85) right after the min competitive score is set.
   
   In 9.11.1, by contrast, it reads the skip data [by entering this part of ImpactsDISI](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/ImpactsDISI.java#L88-L97), which @msfroh described in the mail chain:
   
   ```
   It was fast because (once the collector has filled its priority queue), we'd check the (constant) impacts to find the first block that's strictly better than the min competitive score. Since all scores are equal, that would quickly skip to the end.
   ```
   
   That said, not keeping this achieves the same result. Should I add a TODO to remove this later, after we fix the scorers you mentioned?
   
   Also, since we are not reading the impacts, can this give incorrect results when norms are enabled?
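
The skipping described in the quoted mail can be sketched roughly as follows. This is a simplified, self-contained model and not Lucene's `ImpactsDISI`: the class `BlockSkipSketch`, the method `firstCompetitiveBlock`, and the array of per-block score upper bounds are hypothetical names introduced only for illustration.

```java
// Hypothetical sketch, not Lucene code. Postings are modelled as a sequence of
// blocks, each carrying an upper bound on the score of any document it holds.
final class BlockSkipSketch {

  /**
   * Returns the index of the first block whose score upper bound is strictly
   * greater than minCompetitiveScore, or blockMaxScores.length if there is none.
   */
  static int firstCompetitiveBlock(float[] blockMaxScores, float minCompetitiveScore) {
    for (int i = 0; i < blockMaxScores.length; i++) {
      if (blockMaxScores[i] > minCompetitiveScore) {
        return i;
      }
    }
    return blockMaxScores.length; // no block can beat the threshold
  }

  public static void main(String[] args) {
    // With constant scores, every block has the same upper bound.
    float[] constantBounds = {1.0f, 1.0f, 1.0f, 1.0f};
    // Once the priority queue is full of hits scoring 1.0, nothing is strictly better.
    System.out.println(firstCompetitiveBlock(constantBounds, 1.0f)); // prints 4
  }
}
```

Under these assumptions, a single pass over the block bounds is enough to jump past all remaining blocks, which matches the "quickly skip to the end" behavior described in the quote.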






Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]

2025-04-25 Thread via GitHub


expani commented on code in PR #14511:
URL: https://github.com/apache/lucene/pull/14511#discussion_r2059897538


##
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java:
##
@@ -1286,14 +1298,11 @@ public long cost() {
 
   @Override
   public int numLevels() {
-    return indexHasFreq == false || level1LastDocID == NO_MORE_DOCS ? 1 : 2;
+    return level1LastDocID == NO_MORE_DOCS ? 1 : 2;
   }
 
   @Override
   public int getDocIdUpTo(int level) {
-    if (indexHasFreq == false) {
-      return NO_MORE_DOCS;
-    }

Review Comment:
   Done



##
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java:
##
@@ -1309,8 +1318,9 @@ public List<Impact> getImpacts(int level) {
       if (level == 1) {
         return readImpacts(level1SerializedImpacts, level1Impacts);
       }
+      return DUMMY_IMPACTS;
     }
-    return DUMMY_IMPACTS;
+    return NON_COMPETITIVE_IMPACTS;

Review Comment:
   Done






Re: [PR] Fix leadCost calculation in BooleanScorerSupplier.requiredBulkScorer [lucene]

2025-04-25 Thread via GitHub


jpountz commented on PR #14543:
URL: https://github.com/apache/lucene/pull/14543#issuecomment-2829978102

   @peteralfonsi I pushed tests to your branch so that this change has a chance 
to make it to 10.2. I hope you don't mind. We already had good tests for 
`ScorerSupplier#scorer`, I just extended them to cover 
`ScorerSupplier#bulkScorer` too.





Re: [PR] Logic for collecting Histogram efficiently using Point Trees [lucene]

2025-04-25 Thread via GitHub


stefanvodita merged PR #14439:
URL: https://github.com/apache/lucene/pull/14439





[I] Smoke tester requiring Python 3.12+ [lucene]

2025-04-25 Thread via GitHub


stefanvodita opened a new issue, #14556:
URL: https://github.com/apache/lucene/issues/14556

   #14326 added a [line in scriptutil](https://github.com/apache/lucene/blob/92d79d47cbd238137ec136f6947c0c9e86003ce0/dev-tools/scripts/scriptutil.py#L26) that imports `override` from `typing`, which was [introduced in Python 3.12](https://github.com/python/cpython/issues/101561).
   
   Running the smoke tester with Python 3.11 errors out like so:
   ```
   Traceback (most recent call last):
     File "/local/home/voditas/ws/open/lucene/dev-tools/scripts/smokeTestRelease.py", line 40, in <module>
       import scriptutil
     File "/local/home/voditas/ws/open/lucene/dev-tools/scripts/scriptutil.py", line 26, in <module>
       from typing import Self, override
   ImportError: cannot import name 'override' from 'typing' (/usr/local/lib/python3.11/typing.py)
   ```
   
   Maybe we can make it clearer which Python version is required?





Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]

2025-04-25 Thread via GitHub


expani commented on PR #14511:
URL: https://github.com/apache/lucene/pull/14511#issuecomment-2829961538

   Added the changes entry.
   
   >undo the new line in SlowImpactsEnum?
   
   ```
   ./gradlew tidy
   ./gradlew spotlessApply
   ./gradlew spotlessJavaApply
   ```
   
   All of these commands seem to remove it when I add it back.
   Is there another Gradle task?





Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]

2025-04-25 Thread via GitHub


jpountz commented on code in PR #14511:
URL: https://github.com/apache/lucene/pull/14511#discussion_r2059900940


##
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java:
##
@@ -1286,14 +1298,11 @@ public long cost() {
 
   @Override
   public int numLevels() {
-    return indexHasFreq == false || level1LastDocID == NO_MORE_DOCS ? 1 : 2;
+    return level1LastDocID == NO_MORE_DOCS ? 1 : 2;

Review Comment:
   I understand why this change helps, but this problem is not unique to term queries on fields indexed with IndexOptions.DOCS: `ConstantScoreScorer` (used by many queries) causes the same problem. I'd rather fix the root cause than merge this workaround that we may forget to remove later.
   
   > Also, can this give incorrect results when norms are enabled ? Since, we are not reading the impacts.
   
   Impacts help compute upper bounds of the score over ranges of doc IDs. Since scores are required to not increase when the norm increases, a score computed with norm=1 will always be greater than or equal to a score computed with any other norm value. So this is correct; it may just return a score upper bound that is greater than the actual best score from the block.
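
The upper-bound argument above can be illustrated with a small sketch. It uses a BM25-like formula in which the norm is treated directly as a document length; the class `ImpactBoundSketch`, its constants, and the methods `score` and `boundHolds` are assumptions made for illustration and are not Lucene APIs.

```java
// Hypothetical sketch, not Lucene code: a BM25-like score that never increases
// when the norm (treated here as the document length) grows.
final class ImpactBoundSketch {
  // Illustrative constants; the values are assumptions, not taken from the PR.
  static final double K1 = 1.2;
  static final double B = 0.75;
  static final double AVG_LENGTH = 10.0;

  /** Score of a document with the given term frequency and norm. */
  static double score(int freq, long norm) {
    double lengthFactor = K1 * (1 - B + B * norm / AVG_LENGTH);
    return freq / (freq + lengthFactor);
  }

  /** Checks that score(freq, 1) bounds score(freq, norm) for every norm in [1, maxNorm]. */
  static boolean boundHolds(int freq, long maxNorm) {
    double upperBound = score(freq, 1L);
    for (long norm = 1; norm <= maxNorm; norm++) {
      if (score(freq, norm) > upperBound) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // With freq fixed at 1, the single impact (freq=1, norm=1) bounds every
    // score in this model, whatever the actual norms are.
    System.out.println(boundHolds(1, 255)); // prints true
  }
}
```

In this model the bound may be loose for documents with large norms, which corresponds to the remark that the returned upper bound can exceed the actual best score of the block.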






Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]

2025-04-25 Thread via GitHub


expani commented on code in PR #14511:
URL: https://github.com/apache/lucene/pull/14511#discussion_r2059942355


##
lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java:
##
@@ -1286,14 +1298,11 @@ public long cost() {
 
   @Override
   public int numLevels() {
-    return indexHasFreq == false || level1LastDocID == NO_MORE_DOCS ? 1 : 2;
+    return level1LastDocID == NO_MORE_DOCS ? 1 : 2;

Review Comment:
   > Since scores are required to not increase when the norm increases
   
   Wasn't aware of this. Makes sense now. 






Re: [PR] Fix leadCost calculation in BooleanScorerSupplier.requiredBulkScorer [lucene]

2025-04-25 Thread via GitHub


jpountz commented on PR #14543:
URL: https://github.com/apache/lucene/pull/14543#issuecomment-2829992135

   For reference, the new tests found a similar bug with disjunctive queries 
that configure a minimum number of matching clauses, so I fixed it too.





Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]

2025-04-25 Thread via GitHub


thecoop commented on code in PR #14482:
URL: https://github.com/apache/lucene/pull/14482#discussion_r2059809065


##
lucene/core/src/java/org/apache/lucene/store/Directory.java:
##
@@ -79,6 +83,31 @@ public abstract class Directory implements Closeable {
*/
   public abstract long fileLength(String name) throws IOException;
 
+  protected void validateIOContext(IOContext context) {
+    Map<Class<? extends IOContext.FileOpenHint>, List<IOContext.FileOpenHint>> hintClasses =
+        context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass));
+
+    // there should only be one of FileType, FileData, DataAccess
+    List<IOContext.FileOpenHint> fileTypes =
+        hintClasses.getOrDefault(FileTypeHint.class, List.of());
+    if (fileTypes.size() > 1) {
+      throw new IllegalArgumentException("Multiple file type hints specified: " + fileTypes);
+    }
+    List<IOContext.FileOpenHint> fileData = hintClasses.getOrDefault(FileDataHint.class, List.of());
+    if (fileData.size() > 1) {
+      throw new IllegalArgumentException("Multiple file data hints specified: " + fileData);
+    }
+    List<IOContext.FileOpenHint> dataAccess =
+        hintClasses.getOrDefault(DataAccessHint.class, List.of());
+    if (dataAccess.size() > 1) {
+      throw new IllegalArgumentException("Multiple data access hints specified: " + dataAccess);
+    }
+  }
+
+  protected ReadAdvice toReadAdvice(IOContext context) {

Review Comment:
   I've been looking at pushing `ReadAdvice` into `MMapDirectory` completely. The complication is `SerialIOCountingDirectory`, which uses `ReadAdvice` to infer readahead. It is probably best to look at that in more detail in a later PR; changing that will let us remove `ReadAdvice` from `Directory` completely.






Re: [PR] Allow docID == NO_MORE_DOCS for asserting leaf reader [lucene]

2025-04-25 Thread via GitHub


gf2121 merged PR #14555:
URL: https://github.com/apache/lucene/pull/14555





Re: [PR] Fix leadCost calculation in BooleanScorerSupplier.requiredBulkScorer [lucene]

2025-04-25 Thread via GitHub


peteralfonsi commented on PR #14543:
URL: https://github.com/apache/lucene/pull/14543#issuecomment-2830980718

   @jpountz Thanks for the help with the tests - didn't realize 10.2 was coming 
soon. 





Re: [PR] Enhancing the Turkish stop word list with additional common words [lucene]

2025-04-25 Thread via GitHub


stefanvodita commented on code in PR #14549:
URL: https://github.com/apache/lucene/pull/14549#discussion_r2060613797


##
lucene/analysis/common/src/resources/org/apache/lucene/analysis/tr/stopwords.txt:
##
@@ -171,42 +372,108 @@ siz
 sizden
 sizi
 sizin
-şey
-şeyden
-şeyi
-şeyler
-şöyle
-şu
-şuna
-şunda
-şundan
-şunları
-şunu
+sonra
+sonradan
+sonraları
+sonunda
+tabii
+tam
+tamam
+tamamen
+tamamıyla
 tarafından
+tek
 trilyon
 tüm
-üç
-üzere
 var
 vardı
+vasıtasıyla
 ve
+velev
+velhasıl
+velhasılıkelam
 veya
+veyahut
 ya
+yahut
+yakinen
+yakında
+yakından
+yakınlarda
+yalnız
+yalnızca
 yani
 yapacak
-yapılan
-yapılması
-yapıyor
 yapmak
 yaptı
+yaptıkları
 yaptığı
 yaptığını
-yaptıkları
+yapılan
+yapılması
+yapıyor
 yedi
+yeniden
+yenilerde
 yerine
 yetmiş
 yine
 yirmi
+yok
 yoksa
+yoluyla
 yüz
+yüzünden
+zarfında
 zaten
+zati
+zira
+çabuk

Review Comment:
   I'm curious - are ç, ö, ü, ş normally sorted to the end of the alphabet and 
in this order?






Re: [PR] Enhancing the Turkish stop word list with additional common words [lucene]

2025-04-25 Thread via GitHub


bahadirborasahin commented on code in PR #14549:
URL: https://github.com/apache/lucene/pull/14549#discussion_r2060660639


##
lucene/analysis/common/src/resources/org/apache/lucene/analysis/tr/stopwords.txt:
##
@@ -171,42 +372,108 @@ siz
 sizden
 sizi
 sizin
-şey
-şeyden
-şeyi
-şeyler
-şöyle
-şu
-şuna
-şunda
-şundan
-şunları
-şunu
+sonra
+sonradan
+sonraları
+sonunda
+tabii
+tam
+tamam
+tamamen
+tamamıyla
 tarafından
+tek
 trilyon
 tüm
-üç
-üzere
 var
 vardı
+vasıtasıyla
 ve
+velev
+velhasıl
+velhasılıkelam
 veya
+veyahut
 ya
+yahut
+yakinen
+yakında
+yakından
+yakınlarda
+yalnız
+yalnızca
 yani
 yapacak
-yapılan
-yapılması
-yapıyor
 yapmak
 yaptı
+yaptıkları
 yaptığı
 yaptığını
-yaptıkları
+yapılan
+yapılması
+yapıyor
 yedi
+yeniden
+yenilerde
 yerine
 yetmiş
 yine
 yirmi
+yok
 yoksa
+yoluyla
 yüz
+yüzünden
+zarfında
 zaten
+zati
+zira
+çabuk

Review Comment:
   I am not sure if this has any performance implications for Lucene, but the answer is no.
   
   In Turkish alphabetical order, the letters ç, ğ, ö, ş, and ü are placed immediately after their non-diacritical counterparts (c, g, o, s, u), and ı comes immediately before i, rather than at the end of the alphabet. The standard Turkish alphabet order is:
   
   a, b, c, ç, d, e, f, g, ğ, h, ı, i, j, k, l, m, n, o, ö, p, r, s, ş, t, u, ü, v, y, z
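
As a rough illustration of the difference, the sketch below orders words by the alphabet listed above rather than by Unicode code point. The order seen in the diff (ç, ö, ü, ş after z) appears consistent with code point order, since those letters are U+00E7, U+00F6, U+00FC and U+015F. The class `TurkishOrderSketch` and its members are hypothetical and not part of the PR; the sketch assumes lowercase input.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not part of the PR: orders lowercase words by the
// 29-letter Turkish alphabet instead of by Unicode code point.
final class TurkishOrderSketch {
  // The alphabet in the order listed above. Non-ASCII letters are written as
  // Unicode escapes so the source compiles regardless of file encoding:
  // c-cedilla, soft g, dotless i, o-umlaut, s-cedilla, u-umlaut.
  private static final String ALPHABET =
      "abc\u00e7defg\u011fh\u0131ijklmno\u00f6prs\u015ftu\u00fcvyz";

  // Position of a letter in the alphabet; unknown characters sort after all letters.
  private static int rank(char c) {
    int index = ALPHABET.indexOf(c);
    return index >= 0 ? index : ALPHABET.length() + c;
  }

  static final Comparator<String> TURKISH_ORDER =
      (a, b) -> {
        int shared = Math.min(a.length(), b.length());
        for (int i = 0; i < shared; i++) {
          int cmp = Integer.compare(rank(a.charAt(i)), rank(b.charAt(i)));
          if (cmp != 0) {
            return cmp;
          }
        }
        return Integer.compare(a.length(), b.length()); // a prefix sorts first
      };

  /** Returns a sorted copy of the given words, leaving the input untouched. */
  static List<String> sorted(List<String> words) {
    List<String> copy = new ArrayList<>(words);
    copy.sort(TURKISH_ORDER);
    return copy;
  }

  public static void main(String[] args) {
    // Words taken from the diff above.
    List<String> words =
        List.of("zira", "\u00e7abuk", "sizin", "\u015fey", "t\u00fcm", "\u00fc\u00e7");
    System.out.println(sorted(words));
  }
}
```

With the words zira, çabuk, sizin, şey, tüm and üç from the diff, this comparator yields çabuk, sizin, şey, tüm, üç, zira, whereas plain `String` comparison places çabuk, üç and şey after zira.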


