Re: [PR] Disable sort optimization when tracking all docs [lucene]
github-actions[bot] commented on PR #14395: URL: https://github.com/apache/lucene/pull/14395#issuecomment-2831654167 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]
jpountz commented on PR #14511: URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830429994 I hope you don't mind, I updated this PR title and description to better reflect the change.
Re: [PR] Fix leadCost calculation in BooleanScorerSupplier.requiredBulkScorer [lucene]
jpountz commented on code in PR #14543: URL: https://github.com/apache/lucene/pull/14543#discussion_r2060222310 ## lucene/core/src/test/org/apache/lucene/search/TestBoolean2ScorerSupplier.java: ## @@ -315,6 +318,9 @@ public void testDisjunctionLeadCost() throws IOException { new BooleanScorerSupplier( new FakeWeight(), subs, RandomPicks.randomFrom(random(), ScoreMode.values()), 0, 100) .get(100); // triggers assertions as a side-effect +new BooleanScorerSupplier( +new FakeWeight(), subs, RandomPicks.randomFrom(random(), ScoreMode.values()), 0, 100) +.bulkScorer(); // triggers assertions as a side-effect Review Comment: Thanks, I had only run lucky seeds that had not exercised ScoreMode.TOP_SCORES, which triggers different logic for producing a bulk scorer (MaxScoreBulkScorer instead of BooleanScorer). This is a real failure. I decided to relax assertions a bit instead of refactoring BooleanScorerSupplier too much, since all cases when the lead cost is greater than or equal to the cost of a clause are practically equivalent and mean that this clause is leading iteration. This also helped simplify tests a bit.
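The relaxed assertion described in this comment can be sketched as a standalone predicate. This is a minimal sketch and not the actual test code in `TestBoolean2ScorerSupplier`. The class and method names are hypothetical. The sketch assumes this rule: once the expected lead cost reaches the clause's own cost, any lead cost at or above that cost is equivalent; otherwise the lead cost must match exactly.

```java
/** Hypothetical helper illustrating a relaxed lead-cost assertion (not Lucene's test code). */
final class LeadCostCheck {

  private LeadCostCheck() {}

  /**
   * Returns true if actualLeadCost is acceptable for a clause of cost clauseCost.
   * Assumption: any lead cost >= the clause cost means "this clause leads iteration", so all
   * such values are treated as equivalent; below that threshold the exact value is required.
   */
  static boolean leadCostAcceptable(long clauseCost, long expectedLeadCost, long actualLeadCost) {
    if (expectedLeadCost >= clauseCost) {
      return actualLeadCost >= clauseCost;
    }
    return actualLeadCost == expectedLeadCost;
  }

  public static void main(String[] args) {
    // Values from the failing seed reported in the review: cost=42, expected leadCost=54,
    // actual leadCost=Long.MAX_VALUE. A strict equality check fails; the relaxed one passes.
    System.out.println(leadCostAcceptable(42, 54, Long.MAX_VALUE));
    // A non-leading clause (expected lead cost below its own cost) still needs an exact match.
    System.out.println(leadCostAcceptable(42, 30, 31));
  }
}
```

Treating every lead cost at or above the clause cost as equivalent is what lets the `TOP_SCORES` path (which may pass `Long.MAX_VALUE`) and the other paths share one assertion.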
Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]
jpountz commented on PR #14511: URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830415622 This sounds safe enough for 10.2.1 for me. Can you move the CHANGES entry to 10.2.1 then? cc @ChrisHegarty
Re: [PR] add RawTFSimilarity class [lucene]
cpoerschke commented on PR #13749: URL: https://github.com/apache/lucene/pull/13749#issuecomment-2830589655 > Your reference to `DelimitedTermFrequencyTokenFilter` suggests that the freq here is more a feature than an actual frequency of a term in a doc. From an API perspective, this would make me want to expose it via an IndexableField subclass, with a query factory, a bit like `FeatureQuery` but for integer values? (Belatedly) thanks for this mention! Yes, the value is a feature of a term in a doc, and it was actually non-integer originally. The (still work-in-progress) documentation in https://github.com/apache/solr/pull/3318 now includes both a `RawTFSimilarity` and a `FeatureQuery` section.
Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]
ChrisHegarty commented on PR #14511: URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830590012 > @ChrisHegarty @jpountz Moved the change log to 10.2.1 Eh! I think you moved it to 10.2.0, rather than 10.2.1.
[PR] [Backport] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]
gf2121 opened a new pull request, #14557: URL: https://github.com/apache/lucene/pull/14557 Backport https://github.com/apache/lucene/pull/14511 to branch_10x
Re: [I] [Bug] Lead cost in boolean conjunction queries can be miscalculated [lucene]
ChrisHegarty closed issue #14542: [Bug] Lead cost in boolean conjunction queries can be miscalculated URL: https://github.com/apache/lucene/issues/14542
Re: [PR] Fix leadCost calculation in BooleanScorerSupplier.requiredBulkScorer [lucene]
ChrisHegarty merged PR #14543: URL: https://github.com/apache/lucene/pull/14543
Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]
gf2121 commented on PR #14511: URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830734669 To be clear, I raised https://github.com/apache/lucene/pull/14557 and https://github.com/apache/lucene/pull/14558 for backporting. I plan to merge this now if no one objects.
Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]
gf2121 commented on PR #14511: URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830738175 @expani could you resolve the conflicts so that I can merge?
Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]
thecoop commented on code in PR #14482: URL: https://github.com/apache/lucene/pull/14482#discussion_r2059828062 ## lucene/core/src/java/org/apache/lucene/store/Directory.java: ## @@ -79,6 +83,31 @@ public abstract class Directory implements Closeable { */ public abstract long fileLength(String name) throws IOException; + protected void validateIOContext(IOContext context) { +Map, List> hintClasses = + context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass)); + +// there should only be one of FileType, FileData, DataAccess +List fileTypes = +hintClasses.getOrDefault(FileTypeHint.class, List.of()); +if (fileTypes.size() > 1) { + throw new IllegalArgumentException("Multiple file type hints specified: " + fileTypes); +} +List fileData = hintClasses.getOrDefault(FileDataHint.class, List.of()); +if (fileData.size() > 1) { + throw new IllegalArgumentException("Multiple file data hints specified: " + fileData); +} +List dataAccess = +hintClasses.getOrDefault(DataAccessHint.class, List.of()); +if (dataAccess.size() > 1) { + throw new IllegalArgumentException("Multiple data access hints specified: " + dataAccess); +} + } + + protected ReadAdvice toReadAdvice(IOContext context) { Review Comment: I've added an override function to `MMapDirectory` in eb80be0a7c so the `ReadAdvice` can be explicitly specified on a per-file basis
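The `validateIOContext` logic in this diff groups hints by class and rejects more than one hint of each kind. It can be sketched without Lucene on the classpath. The `FileOpenHint` interface and the three enums below are hypothetical stand-ins named after the types in the diff. Their constants are invented for illustration and are not the ones defined in the PR.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Standalone sketch of "at most one hint per kind" validation; all types are hypothetical. */
final class HintValidation {

  private HintValidation() {}

  interface FileOpenHint {}

  // Hypothetical hint kinds; the constant names are illustrative only.
  enum FileTypeHint implements FileOpenHint { INDEX, DATA }

  enum FileDataHint implements FileOpenHint { POSTINGS, KNN_VECTORS }

  enum DataAccessHint implements FileOpenHint { SEQUENTIAL, RANDOM }

  // Each of these kinds may appear at most once among a context's hints.
  private static final List<Class<?>> SINGLE_VALUED_KINDS =
      List.of(FileTypeHint.class, FileDataHint.class, DataAccessHint.class);

  static void validateHints(List<FileOpenHint> hints) {
    // Group hints by runtime class, as the diff does with Collectors.groupingBy.
    Map<Class<?>, List<FileOpenHint>> byClass =
        hints.stream().collect(Collectors.groupingBy(h -> h.getClass()));

    // Reject any kind that occurs more than once.
    for (Class<?> kind : SINGLE_VALUED_KINDS) {
      List<FileOpenHint> ofKind = byClass.getOrDefault(kind, List.of());
      if (ofKind.size() > 1) {
        throw new IllegalArgumentException(
            "Multiple " + kind.getSimpleName() + " hints specified: " + ofKind);
      }
    }
  }

  public static void main(String[] args) {
    validateHints(List.of(FileTypeHint.INDEX, DataAccessHint.RANDOM)); // one hint per kind: OK
    try {
      validateHints(List.of(DataAccessHint.RANDOM, DataAccessHint.SEQUENTIAL));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Grouping by `getClass()` works here only because these enum constants have no constant-specific bodies. A constant declared with its own body is an instance of an anonymous subclass, so grouping would then need to key on the enum type itself, for example via `Enum.getDeclaringClass()`.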
Re: [I] Examine the affects of MADV_RANDOM when MGLRU is enabled in Linux kernel [lucene]
mikemccand commented on issue #14408: URL: https://github.com/apache/lucene/issues/14408#issuecomment-2830109470 > > The Linux change targets both MGLRU and normal LRU. The impact is more pronounced in MGLRU, as page reclamation is more aggressive there. However, the semantic change for this advice is the same in both cases. In the latest kernels, using `MADV_RANDOM` does not mark the page as accessed, regardless of whether MGLRU is in use. That's a big shift in semantics for our default read advice. > > Easy argument to change the default to `NORMAL`. +1 to go back to `NORMAL` as the default until we can better understand the regressions we (OpenSearch users, Elasticsearch users, and Amazon product search (my team)) are seeing with `MADV_RANDOM`. I think `MADV_RANDOM` can also be harmful for "hot" (index expected to mostly fit in RAM) use cases. For our service (Amazon product search), which is mostly hot, we had to hard-override back to `IOContext.DEFAULT` for `.vec` and `.veq` (quantized vectors) in a hackity way (subclass `MMapDirectory` to insert shim (that rewrites the `IOContext`) into `openInput` -- oooh as @jpountz describes at https://github.com/apache/lucene/issues/14348#issuecomment-2730966937, except opposite), in some cases (lighting a new commit point during NRT replication) where we had to turn off `MMapDirectory.setPreload`. At Lucene's defaults (`MADV_RANDOM` for the KNN vector files) we saw horribly slow warmup of our searchers ... basically, paging in all those vectors one at a time as "real" queries visited the HNSW graph was crazy slow (many minutes) even on crazy fast infra (AWS), whereas letting the OS do its default "thing" (bulk readahead of N pages when a page miss happens?) was much quicker. Much less "page fault amplification". Benchmarks in luceneutil also hit this -- minutes and minutes of swapping in the HNSW graph (without `.setPreload`) from a fast local SSD, but I think luceneutil is still using Lucene's `IOContext` defaults here.
Actually, if we `MADV_RANDOM` and `.setPreload` to load `.vec`, what is the effect? Does the preloading still work (OS caches/touches all pages, and does mark them as accessed (so they stay cached), despite the `MADV_RANDOM`)? Is it much slower to preload when you `MADV_RANDOM` (though presumably it is sequentially bringing pages in)? > AFAIK https://github.com/apache/lucene/issues/14422 is working on fixing that "real problem". +1 to work towards this more general fix. But, sheesh, it looks so complicated, depending on hot vs cold use case, preloading or not, which part of the Lucene index (KNN, terms, postings), Linux kernel versions, ... in the meantime I think we should revert back to `NORMAL`/`DEFAULT` as Lucene's default...
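The per-file override described in this comment forces `.vec` and `.veq` back to the OS default read-ahead while other files keep Lucene's default advice. It comes down to a decision on the file name. In Lucene, that decision would sit inside a `MMapDirectory` subclass that rewrites the `IOContext` before delegating to `openInput`. The sketch below models only the decision itself. It uses a hypothetical `Advice` enum and `adviceFor` method rather than Lucene's `ReadAdvice`/`IOContext` types, so it runs without Lucene.

```java
import java.util.Set;

/** Hypothetical sketch of choosing read advice per file extension (no Lucene types used). */
final class PerFileAdvice {

  private PerFileAdvice() {}

  // Hypothetical stand-in for madvise-style access hints.
  enum Advice { NORMAL, RANDOM, SEQUENTIAL }

  // Extensions that, per the comment above, warmed up far faster with the OS default
  // read-ahead: raw (.vec) and quantized (.veq) vector data.
  private static final Set<String> FORCE_NORMAL_EXTENSIONS = Set.of("vec", "veq");

  /** Returns the text after the last '.', or "" if the name has no extension. */
  static String extension(String fileName) {
    int dot = fileName.lastIndexOf('.');
    return dot < 0 ? "" : fileName.substring(dot + 1);
  }

  /** Returns NORMAL for vector files and otherwise keeps the advice the caller proposed. */
  static Advice adviceFor(String fileName, Advice proposed) {
    return FORCE_NORMAL_EXTENSIONS.contains(extension(fileName)) ? Advice.NORMAL : proposed;
  }

  public static void main(String[] args) {
    System.out.println(adviceFor("_0.vec", Advice.RANDOM));
    System.out.println(adviceFor("_0.tim", Advice.RANDOM));
  }
}
```

An allow-list keyed on extension keeps the override narrow, so terms, postings, and other files keep whatever default Lucene ships. As the thread notes, the right choice varies by workload, preloading, and kernel version, so this is the shape of a workaround rather than a recommendation.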
Re: [I] Examine the affects of MADV_RANDOM when MGLRU is enabled in Linux kernel [lucene]
rmuir commented on issue #14408: URL: https://github.com/apache/lucene/issues/14408#issuecomment-2830144654 > +1 to work towards this more general fix. But, sheesh, it looks so complicated, depending on hot vs cold use case, preloading or not, which part of the Lucene index (KNN, terms, postings), Linux kernel versions, ... in the meantime I think we should revert back to `NORMAL`/`DEFAULT` as Lucene's default... Yes, my argument is that it is complicated, and Lucene needs to get out of the business of it. We can flip-flop on this setting over and over again, and each time some users will experience regressions and others will be happy. I don't want to see this; it is just more of the same. Lucene needs to get out of the business of doing it. But if you want a quick fix: I'd be in favor of a PR that removes all madvise/preloading/otherwise from Lucene. Let it be the "user's decision" on this shit; the different use cases and platforms are too different, and Lucene cannot have "defaults" in Java code and pretend like that's gonna work well across all these various use cases and platforms. It does not work. And for users that care and want this stuff, hopefully it gets easier for them once #14422 lands. This way it's not constantly regressing back and forth because of settings flip-flopping.
Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]
expani commented on PR #14511: URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830167911 Addressed the comments. I want to backport these changes to `9.12.x` and `10.2.x` as well, and will open separate PRs for them.
Re: [I] Smoke tester requiring Python 3.12+ [lucene]
stefanvodita commented on issue #14556: URL: https://github.com/apache/lucene/issues/14556#issuecomment-2830293555 I see we have 3.12 [configured](https://github.com/apache/lucene/blob/92d79d47cbd238137ec136f6947c0c9e86003ce0/dev-tools/scripts/pyproject.toml#L2) and at least for me that's alright. But do we have a way to warn users of the scripts that they need 3.12?
Re: [I] Smoke tester requiring Python 3.12+ [lucene]
rmuir commented on issue #14556: URL: https://github.com/apache/lucene/issues/14556#issuecomment-2830316620 I think there is a way; there is even some existing logic to do it (I suspect it has the wrong version set). Additionally, I know the existing logic uses an outdated method to check the Python version, because I disabled the linter violation around that. I also have a concern that with an old Python it may just fail at the `pip install` phase and never hit such a custom check. All it takes is for a library to drop support, and culturally Python devs seem fairly aggressive about that. We should definitely improve here though!
Re: [PR] Fix leadCost calculation in BooleanScorerSupplier.requiredBulkScorer [lucene]
benwtrent commented on code in PR #14543: URL: https://github.com/apache/lucene/pull/14543#discussion_r2060181757 ## lucene/core/src/java/org/apache/lucene/search/BooleanScorerSupplier.java: ## @@ -78,11 +86,7 @@ private long computeCost() { return minRequiredCost.getAsLong(); } else { final Collection optionalScorers = subs.get(Occur.SHOULD); Review Comment: This is unused now. CI is mad :) ```suggestion ```
Re: [PR] Fix leadCost calculation in BooleanScorerSupplier.requiredBulkScorer [lucene]
benwtrent commented on code in PR #14543: URL: https://github.com/apache/lucene/pull/14543#discussion_r2060183537 ## lucene/core/src/test/org/apache/lucene/search/TestBoolean2ScorerSupplier.java: ## @@ -315,6 +318,9 @@ public void testDisjunctionLeadCost() throws IOException { new BooleanScorerSupplier( new FakeWeight(), subs, RandomPicks.randomFrom(random(), ScoreMode.values()), 0, 100) .get(100); // triggers assertions as a side-effect +new BooleanScorerSupplier( +new FakeWeight(), subs, RandomPicks.randomFrom(random(), ScoreMode.values()), 0, 100) +.bulkScorer(); // triggers assertions as a side-effect Review Comment:
```
TestBoolean2ScorerSupplier > testDisjunctionLeadCost FAILED
    java.lang.AssertionError: FakeLazyScorer(cost=42,leadCost=54) expected:<54> but was:<9223372036854775807>
        at __randomizedtesting.SeedInfo.seed([59321C032B75088A:CB8AA4FC1AF92C0A]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:647)
        at org.apache.lucene.search.TestBoolean2ScorerSupplier$FakeScorerSupplier.get(TestBoolean2ScorerSupplier.java:108)
        at org.apache.lucene.search.BooleanScorerSupplier.optionalBulkScorer(BooleanScorerSupplier.java:294)
        at org.apache.lucene.search.BooleanScorerSupplier.booleanScorer(BooleanScorerSupplier.java:216)
        at org.apache.lucene.search.BooleanScorerSupplier.bulkScorer(BooleanScorerSupplier.java:178)
        at org.apache.lucene.search.TestBoolean2ScorerSupplier.testDisjunctionLeadCost(TestBoolean2ScorerSupplier.java:323)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
```
This is failing? Reproduce line:
```
gradlew test --tests TestBoolean2ScorerSupplier.testDisjunctionLeadCost -Dtests.seed=59321C032B75088A -Dtests.locale=yrl-VE -Dtests.timezone=Etc/GMT-14 -Dtests.asserts=true -Dtests.file.encoding=UTF-8
```
Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]
expani commented on PR #14511: URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830552012 > I hope you don't mind, I updated this PR title and description to better reflect the change. Not at all. Thanks for taking the time to explain the different pieces of this code. It was really fun debugging this, and I would definitely love to revisit this part of the code.
Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]
expani commented on PR #14511: URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830554353 @ChrisHegarty @jpountz Moved the change log to 10.2.1
Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]
ChrisHegarty commented on PR #14511: URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830633097 > This sounds safe enough for 10.2.1 for me. Can you move the CHANGES entry to 10.2.1 then? cc @ChrisHegarty What am I missing? This is not applicable to 10.2.1, since the only changed file is Lucene103PostingsReader.java which is not present in 10.2 !
Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]
expani commented on PR #14511: URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830623186 Oops, I hadn't rebased with main. Fixed it now.
Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]
expani commented on PR #14511: URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830648760 > What am I missing? This is not applicable to 10.2.1, since the only changed file is Lucene103PostingsReader.java which is not present in 10.2 ! Did the rebase mess something up? I had updated 103PostingsReader, as the initial plan was not to backport. I have now updated 101PostingsReader, which is used in 10.2.1. Should I also raise a PR against some other branch?
Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]
gf2121 commented on PR #14511: URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830669459 > Lucene103PostingsReader.java which is not present in 10.2 Yes, we have not backported `Lucene103PostingReader`; see https://github.com/apache/lucene/pull/14333#issuecomment-2824644842. I think we will need to make the same change to `Lucene101PostingReader` if we want to include this in 10.2.1.
Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]
expani commented on PR #14511: URL: https://github.com/apache/lucene/pull/14511#issuecomment-2830685389 I made the same change in `Lucene101PostingsReader` as well.
Re: [PR] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]
gf2121 merged PR #14511: URL: https://github.com/apache/lucene/pull/14511
Re: [I] Term Query is slower post Lucene 9.12 for fields with IndexOptions.DOCS [lucene]
gf2121 closed issue #14445: Term Query is slower post Lucene 9.12 for fields with IndexOptions.DOCS URL: https://github.com/apache/lucene/issues/14445
Re: [PR] [Backport] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]
gf2121 merged PR #14558: URL: https://github.com/apache/lucene/pull/14558
[PR] [Backport] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]
gf2121 opened a new pull request, #14558: URL: https://github.com/apache/lucene/pull/14558 Backport https://github.com/apache/lucene/pull/14511 to branch_10_2
Re: [PR] [Backport] Provide better impacts for fields indexed with IndexOptions.DOCS [lucene]
gf2121 merged PR #14557: URL: https://github.com/apache/lucene/pull/14557
Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]
expani commented on code in PR #14511: URL: https://github.com/apache/lucene/pull/14511#discussion_r2059790054 ## lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java: ## @@ -66,13 +67,20 @@ public final class Lucene103PostingsReader extends PostingsReaderBase { static final VectorizationProvider VECTORIZATION_PROVIDER = VectorizationProvider.getInstance(); + // Dummy impacts, composed of the maximum possible term frequency and the lowest possible // (unsigned) norm value. This is typically used on tail blocks, which don't actually record - // impacts as the storage overhead would not be worth any query evaluation speedup, since there's + // impacts as the storage overhead would not be worth any query evaluation speedup, since + // there's // less than 128 docs left to evaluate anyway. private static final List DUMMY_IMPACTS = Collections.singletonList(new Impact(Integer.MAX_VALUE, 1L)); + // We stopped storing a placeholder impact with freq=1 for fields with IndexOptions.DOCS after + // 9.12.0 + private static final List NON_COMPETITIVE_IMPACTS = + Collections.singletonList(new Impact(1, 1L)); Review Comment: Done
Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]
expani commented on code in PR #14511: URL: https://github.com/apache/lucene/pull/14511#discussion_r2059808880 ## lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java: ## @@ -1286,14 +1298,11 @@ public long cost() { @Override public int numLevels() { -return indexHasFreq == false || level1LastDocID == NO_MORE_DOCS ? 1 : 2; +return level1LastDocID == NO_MORE_DOCS ? 1 : 2; Review Comment: I had made these changes to bring back the same behavior as 9.11.1. Without these changes, it doesn't read the skip data [and exits here](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/ImpactsDISI.java#L85) right after the min competitive score is set, whereas in 9.11.1 it reads the skip data [by entering this part of ImpactsDISI](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/ImpactsDISI.java#L88-L97), which @msfroh described in the mail chain: ``` It was fast because (once the collector has filled its priority queue), we'd check the (constant) impacts to find the first block that's strictly better than the min competitive score. Since all scores are equal, that would quickly skip to the end. ``` That said, not keeping this change achieves the same result. Should I add a TODO to remove this later, after we fix the scorers you mentioned? Also, can this give incorrect results when norms are enabled, since we are not reading the impacts?
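The skipping behavior described in the quote can be modeled in a few lines. This is a simplified, hypothetical sketch, not Lucene's `ImpactsDISI`: `BlockSkippingSketch`, `Block` and `advanceTarget` are illustrative names, each block exposes only its last doc ID and a precomputed maximum score, and a block is kept only if that maximum is strictly greater than the current bottom score. With constant per-block maxima equal to the bottom score, every block is skipped and the iterator jumps straight to the end.

```java
import java.util.List;

/** Hypothetical, simplified model of impact-based block skipping (not Lucene's ImpactsDISI). */
final class BlockSkippingSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** A block of postings ending at lastDocId whose best possible score is maxScore. */
  record Block(int lastDocId, float maxScore) {}

  /**
   * Returns the first doc ID >= target that lies in a block whose maxScore is strictly
   * greater than bottomScore, or NO_MORE_DOCS if no remaining block qualifies.
   */
  static int advanceTarget(List<Block> blocks, int target, float bottomScore) {
    for (Block block : blocks) {
      if (block.lastDocId() < target) {
        continue; // block lies entirely before target
      }
      if (block.maxScore() > bottomScore) {
        return target; // this block may still produce a competitive hit
      }
      if (block.lastDocId() == NO_MORE_DOCS) {
        break; // avoid overflow; nothing follows the last block
      }
      target = block.lastDocId() + 1; // skip the whole non-competitive block
    }
    return NO_MORE_DOCS;
  }
}
```

The linear scan over an in-memory list is only a simplification; it illustrates why constant impacts let the iterator skip to the end once the collector's queue is full.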
Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]
thecoop commented on code in PR #14482: URL: https://github.com/apache/lucene/pull/14482#discussion_r2059809065 ## lucene/core/src/java/org/apache/lucene/store/Directory.java: ## @@ -79,6 +83,31 @@ public abstract class Directory implements Closeable { */ public abstract long fileLength(String name) throws IOException; + protected void validateIOContext(IOContext context) { +Map, List> hintClasses = + context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass)); + +// there should only be one of FileType, FileData, DataAccess +List fileTypes = +hintClasses.getOrDefault(FileTypeHint.class, List.of()); +if (fileTypes.size() > 1) { + throw new IllegalArgumentException("Multiple file type hints specified: " + fileTypes); +} +List fileData = hintClasses.getOrDefault(FileDataHint.class, List.of()); +if (fileData.size() > 1) { + throw new IllegalArgumentException("Multiple file data hints specified: " + fileData); +} +List dataAccess = +hintClasses.getOrDefault(DataAccessHint.class, List.of()); +if (dataAccess.size() > 1) { + throw new IllegalArgumentException("Multiple data access hints specified: " + dataAccess); +} + } + + protected ReadAdvice toReadAdvice(IOContext context) { Review Comment: I've been looking at trying to push `ReadAdvice` into `MMapDirectory` completely - the complication is `SerialIOCountingDirectory`, which uses `ReadAdvice` to infer readahead. Maybe best to look at that in more detail in a later PR.
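The `validateIOContext` method in the diff groups hints by class and rejects more than one hint per category. The same check can be expressed as a single loop over the grouped map. This is a hypothetical sketch: `FileOpenHint` and the three record types below only reuse names from the diff as illustrative stand-ins, they are not Lucene's real classes, and the error message format differs from the diff's.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of "at most one hint per category" validation. The hint types are
 * illustrative stand-ins, not Lucene's real IOContext.FileOpenHint implementations.
 */
final class HintValidationSketch {
  interface FileOpenHint {}

  record FileTypeHint(String type) implements FileOpenHint {}

  record FileDataHint(String data) implements FileOpenHint {}

  record DataAccessHint(String access) implements FileOpenHint {}

  static void validate(List<FileOpenHint> hints) {
    // Records are final, so getClass() identifies the hint category exactly.
    Function<FileOpenHint, Class<?>> category = FileOpenHint::getClass;
    Map<Class<?>, List<FileOpenHint>> byCategory =
        hints.stream().collect(Collectors.groupingBy(category));
    for (Map.Entry<Class<?>, List<FileOpenHint>> entry : byCategory.entrySet()) {
      if (entry.getValue().size() > 1) {
        throw new IllegalArgumentException(
            "Multiple " + entry.getKey().getSimpleName() + " hints specified: " + entry.getValue());
      }
    }
  }
}
```

Grouping by `getClass()` only works when each category is a single concrete class. If a category were an enum whose constants had bodies, or a class with subclasses, each subclass would land in its own group and duplicates could slip through.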
Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]
expani commented on code in PR #14511: URL: https://github.com/apache/lucene/pull/14511#discussion_r2059897538 ## lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java: ## @@ -1286,14 +1298,11 @@ public long cost() { @Override public int numLevels() { -return indexHasFreq == false || level1LastDocID == NO_MORE_DOCS ? 1 : 2; +return level1LastDocID == NO_MORE_DOCS ? 1 : 2; } @Override public int getDocIdUpTo(int level) { -if (indexHasFreq == false) { - return NO_MORE_DOCS; -} Review Comment: Done ## lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java: ## @@ -1309,8 +1318,9 @@ public List getImpacts(int level) { if (level == 1) { return readImpacts(level1SerializedImpacts, level1Impacts); } + return DUMMY_IMPACTS; } -return DUMMY_IMPACTS; +return NON_COMPETITIVE_IMPACTS; Review Comment: Done
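The `getImpacts` hunk above changes which placeholder is returned when no impacts were read. A hypothetical sketch of that fallback, not the actual `Lucene103PostingsReader` code: `Impact` here is a stand-in record, and the `stored` parameter (null when no impacts were recorded for the level) is an assumption made to keep the example self-contained. Fields that index frequencies keep the optimistic `DUMMY_IMPACTS`, while DOCS-only fields get a freq=1 placeholder, presumably so that score upper bounds are no longer inflated by `Integer.MAX_VALUE`.

```java
import java.util.List;

/**
 * Hypothetical model of the impacts fallback discussed in this PR; Impact is a stand-in
 * record and this is not the actual Lucene103PostingsReader code.
 */
final class ImpactsFallbackSketch {
  record Impact(int freq, long norm) {}

  // Most optimistic placeholder: maximum frequency with the lowest (unsigned) norm.
  static final List<Impact> DUMMY_IMPACTS = List.of(new Impact(Integer.MAX_VALUE, 1L));

  // DOCS-only fields report freq=1 for every document, so freq=1 is already an upper bound.
  static final List<Impact> NON_COMPETITIVE_IMPACTS = List.of(new Impact(1, 1L));

  /**
   * Returns the impacts to expose for one skip level.
   *
   * @param indexHasFreq whether the field indexes term frequencies
   * @param stored the impacts recorded for this level, or null when none were written
   */
  static List<Impact> getImpacts(boolean indexHasFreq, List<Impact> stored) {
    if (indexHasFreq) {
      return stored != null ? stored : DUMMY_IMPACTS;
    }
    return NON_COMPETITIVE_IMPACTS;
  }
}
```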
Re: [PR] Fix leadCost calculation in BooleanScorerSupplier.requiredBulkScorer [lucene]
jpountz commented on PR #14543: URL: https://github.com/apache/lucene/pull/14543#issuecomment-2829978102 @peteralfonsi I pushed tests to your branch so that this change has a chance to make it into 10.2. I hope you don't mind. We already had good tests for `ScorerSupplier#scorer`; I just extended them to cover `ScorerSupplier#bulkScorer` too.
Re: [PR] Logic for collecting Histogram efficiently using Point Trees [lucene]
stefanvodita merged PR #14439: URL: https://github.com/apache/lucene/pull/14439
[I] Smoke tester requiring Python 3.12+ [lucene]
stefanvodita opened a new issue, #14556: URL: https://github.com/apache/lucene/issues/14556 #14326 added a [line in scriptutil](https://github.com/apache/lucene/blob/92d79d47cbd238137ec136f6947c0c9e86003ce0/dev-tools/scripts/scriptutil.py#L26) that imports `override` from `typing`, which was [introduced in Python 3.12](https://github.com/python/cpython/issues/101561). Running the smoke tester with Python 3.11 errors out like so: ``` Traceback (most recent call last): File "/local/home/voditas/ws/open/lucene/dev-tools/scripts/smokeTestRelease.py", line 40, in import scriptutil File "/local/home/voditas/ws/open/lucene/dev-tools/scripts/scriptutil.py", line 26, in from typing import Self, override ImportError: cannot import name 'override' from 'typing' (/usr/local/lib/python3.11/typing.py) ``` Maybe we can make it clearer which Python version is required?
Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]
expani commented on PR #14511: URL: https://github.com/apache/lucene/pull/14511#issuecomment-2829961538 Added the changes entry. > undo the new line in SlowImpactsEnum? ``` ./gradlew tidy ./gradlew spotlessApply ./gradlew spotlessJavaApply ``` All of these commands seem to remove it when I add it back. Is there another Gradle task?
Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]
jpountz commented on code in PR #14511: URL: https://github.com/apache/lucene/pull/14511#discussion_r2059900940 ## lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java: ## @@ -1286,14 +1298,11 @@ public long cost() { @Override public int numLevels() { -return indexHasFreq == false || level1LastDocID == NO_MORE_DOCS ? 1 : 2; +return level1LastDocID == NO_MORE_DOCS ? 1 : 2; Review Comment: I understand why this change helps, but this problem is not unique to term queries on fields indexed with IndexOptions.DOCS: `ConstantScoreScorer` (used by many queries) causes the same problem. I'd rather fix the root cause than merge this workaround, which we may forget to remove later. > Also, can this give incorrect results when norms are enabled ? Since, we are not reading the impacts. Impacts help compute upper bounds of the score over ranges of doc IDs. Since scores are required to not increase when the norm increases, a score computed with norm=1 will always be greater than or equal to a score computed with any other norm value. So this is correct; it may just return a score upper bound that is greater than the actual best score from the block.
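The norm argument can be checked numerically. The sketch below uses a toy, BM25-shaped scoring function with arbitrary constants; `NormBoundSketch` is a hypothetical name, and it treats the norm directly as a field length rather than decoding Lucene's encoded norms. Because the score never increases as the norm grows, `score(freq, 1)` bounds every `score(freq, norm)` with `norm >= 1`, so an impact with norm=1 is a safe, if possibly loose, upper bound.

```java
/**
 * Toy BM25-shaped scorer used only to illustrate the norm argument. The constants are
 * arbitrary and the norm is treated as a raw field length (an assumption; Lucene encodes norms).
 */
final class NormBoundSketch {
  static final float K1 = 1.2f;
  static final float B = 0.75f;
  static final float AVG_LENGTH = 10f;

  /** Non-increasing in norm: a longer field yields a larger denominator. */
  static float score(int freq, long norm) {
    float lengthFactor = K1 * (1 - B + B * norm / AVG_LENGTH);
    return freq / (freq + lengthFactor);
  }

  /** Checks that score(freq, 1) is at least score(freq, norm) for every norm in [1, maxNorm]. */
  static boolean normOneIsUpperBound(int freq, long maxNorm) {
    float bound = score(freq, 1L);
    for (long norm = 1; norm <= maxNorm; norm++) {
      if (score(freq, norm) > bound) {
        return false;
      }
    }
    return true;
  }
}
```

For a DOCS-only field every document has freq=1, so in this toy model the bound from an `(freq=1, norm=1)` impact is `score(1, 1)`, which is never below the real score of any document in the block.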
Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]
expani commented on code in PR #14511: URL: https://github.com/apache/lucene/pull/14511#discussion_r2059942355 ## lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java: ## @@ -1286,14 +1298,11 @@ public long cost() { @Override public int numLevels() { -return indexHasFreq == false || level1LastDocID == NO_MORE_DOCS ? 1 : 2; +return level1LastDocID == NO_MORE_DOCS ? 1 : 2; Review Comment: > Since scores are required to not increase when the norm increases Wasn't aware of this. Makes sense now.
Re: [PR] Fix leadCost calculation in BooleanScorerSupplier.requiredBulkScorer [lucene]
jpountz commented on PR #14543: URL: https://github.com/apache/lucene/pull/14543#issuecomment-2829992135 For reference, the new tests found a similar bug with disjunctive queries that configure a minimum number of matching clauses, so I fixed it too.
Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]
thecoop commented on code in PR #14482: URL: https://github.com/apache/lucene/pull/14482#discussion_r2059809065 ## lucene/core/src/java/org/apache/lucene/store/Directory.java: ## @@ -79,6 +83,31 @@ public abstract class Directory implements Closeable { */ public abstract long fileLength(String name) throws IOException; + protected void validateIOContext(IOContext context) { +Map, List> hintClasses = + context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass)); + +// there should only be one of FileType, FileData, DataAccess +List fileTypes = +hintClasses.getOrDefault(FileTypeHint.class, List.of()); +if (fileTypes.size() > 1) { + throw new IllegalArgumentException("Multiple file type hints specified: " + fileTypes); +} +List fileData = hintClasses.getOrDefault(FileDataHint.class, List.of()); +if (fileData.size() > 1) { + throw new IllegalArgumentException("Multiple file data hints specified: " + fileData); +} +List dataAccess = +hintClasses.getOrDefault(DataAccessHint.class, List.of()); +if (dataAccess.size() > 1) { + throw new IllegalArgumentException("Multiple data access hints specified: " + dataAccess); +} + } + + protected ReadAdvice toReadAdvice(IOContext context) { Review Comment: I've been looking at trying to push `ReadAdvice` into `MMapDirectory` completely - the complication is `SerialIOCountingDirectory`, which uses `ReadAdvice` to infer readahead. Maybe best to look at that in more detail in a later PR. Changing that will let us remove `ReadAdvice` from `Directory` completely.
Re: [PR] Allow docID == NO_MORE_DOCS for asserting leaf reader [lucene]
gf2121 merged PR #14555: URL: https://github.com/apache/lucene/pull/14555
Re: [PR] Fix leadCost calculation in BooleanScorerSupplier.requiredBulkScorer [lucene]
peteralfonsi commented on PR #14543: URL: https://github.com/apache/lucene/pull/14543#issuecomment-2830980718 @jpountz Thanks for the help with the tests - didn't realize 10.2 was coming soon.
Re: [PR] Enhancing the Turkish stop word list with additional common words [lucene]
stefanvodita commented on code in PR #14549: URL: https://github.com/apache/lucene/pull/14549#discussion_r2060613797 ## lucene/analysis/common/src/resources/org/apache/lucene/analysis/tr/stopwords.txt: ## @@ -171,42 +372,108 @@ siz sizden sizi sizin -şey -şeyden -şeyi -şeyler -şöyle -şu -şuna -şunda -şundan -şunları -şunu +sonra +sonradan +sonraları +sonunda +tabii +tam +tamam +tamamen +tamamıyla tarafından +tek trilyon tüm -üç -üzere var vardı +vasıtasıyla ve +velev +velhasıl +velhasılıkelam veya +veyahut ya +yahut +yakinen +yakında +yakından +yakınlarda +yalnız +yalnızca yani yapacak -yapılan -yapılması -yapıyor yapmak yaptı +yaptıkları yaptığı yaptığını -yaptıkları +yapılan +yapılması +yapıyor yedi +yeniden +yenilerde yerine yetmiş yine yirmi +yok yoksa +yoluyla yüz +yüzünden +zarfında zaten +zati +zira +çabuk Review Comment: I'm curious - are ç, ö, ü, ş normally sorted to the end of the alphabet and in this order?
Re: [PR] Enhancing the Turkish stop word list with additional common words [lucene]
bahadirborasahin commented on code in PR #14549: URL: https://github.com/apache/lucene/pull/14549#discussion_r2060660639 ## lucene/analysis/common/src/resources/org/apache/lucene/analysis/tr/stopwords.txt: ## @@ -171,42 +372,108 @@ siz sizden sizi sizin -şey -şeyden -şeyi -şeyler -şöyle -şu -şuna -şunda -şundan -şunları -şunu +sonra +sonradan +sonraları +sonunda +tabii +tam +tamam +tamamen +tamamıyla tarafından +tek trilyon tüm -üç -üzere var vardı +vasıtasıyla ve +velev +velhasıl +velhasılıkelam veya +veyahut ya +yahut +yakinen +yakında +yakından +yakınlarda +yalnız +yalnızca yani yapacak -yapılan -yapılması -yapıyor yapmak yaptı +yaptıkları yaptığı yaptığını -yaptıkları +yapılan +yapılması +yapıyor yedi +yeniden +yenilerde yerine yetmiş yine yirmi +yok yoksa +yoluyla yüz +yüzünden +zarfında zaten +zati +zira +çabuk Review Comment: I am not sure if this has any performance implications for Lucene, but the answer is no. In Turkish alphabetical order, the letters ç, ğ, ö, ş, and ü are placed directly after c, g, o, s, and u, and dotless ı comes directly before i, rather than any of them being at the end of the alphabet. The standard Turkish alphabet order is: a, b, c, ç, d, e, f, g, ğ, h, ı, i, j, k, l, m, n, o, ö, p, r, s, ş, t, u, ü, v, y, z.
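One plausible explanation for the order the reviewer noticed is that the list was sorted by plain `String` comparison, which orders by UTF-16 code unit: ç (U+00E7), ö (U+00F6), ü (U+00FC) and ş (U+015F) all sort after `z`, in exactly that order. This is an inference, not something confirmed in the PR. The hypothetical `TurkishSortSketch` below contrasts that ordering with a Turkish-locale `java.text.Collator`, which on a JDK that ships Turkish collation rules places ç right after c.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Compares plain String ordering with locale-aware ordering for Turkish words. */
final class TurkishSortSketch {
  /** String.compareTo orders by UTF-16 code unit, so letters like U+00E7 sort after 'z'. */
  static List<String> codeUnitOrder(List<String> words) {
    List<String> sorted = new ArrayList<>(words);
    sorted.sort(Comparator.naturalOrder());
    return sorted;
  }

  /** A Turkish-locale Collator orders letters by the Turkish alphabet instead. */
  static List<String> turkishOrder(List<String> words) {
    Collator collator = Collator.getInstance(Locale.forLanguageTag("tr"));
    List<String> sorted = new ArrayList<>(words);
    sorted.sort(collator);
    return sorted;
  }
}
```

For example, sorting `zira`, `çabuk`, `dahi`, `biz` by code unit yields `biz, dahi, zira, çabuk`, while the Turkish collator yields `biz, çabuk, dahi, zira`.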