Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-07-10 Thread via GitHub
jpountz merged PR #13359: URL: https://github.com/apache/lucene/pull/13359 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-07-08 Thread via GitHub
github-actions[bot] commented on PR #13359: URL: https://github.com/apache/lucene/pull/13359#issuecomment-2215694348 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-06-24 Thread via GitHub
jpountz commented on PR #13359: URL: https://github.com/apache/lucene/pull/13359#issuecomment-2186240846 I will merge soon if there are no objections.

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-06-18 Thread via GitHub
jpountz commented on PR #13359: URL: https://github.com/apache/lucene/pull/13359#issuecomment-2176590274 I pushed a new approach. Instead of `prepareSeekExact` returning `void`, it now returns a `Supplier` and forbids calling any other method on `TermsEnum` until the `Supplier` has been con
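
The two-phase contract described in this comment can be illustrated with a minimal, self-contained sketch. It does not use Lucene: `ToyTermsEnum`, its fields, and `totalDocFreq` are hypothetical stand-ins, and `java.util.function.Supplier<Boolean>` is an assumption made for illustration (the snippet above is truncated, so the exact return type in the PR is not shown here). The sketch only demonstrates the idea that `prepareSeekExact` starts the I/O, that other calls are rejected until the returned supplier has been consumed, and that a caller can issue all prefetches before completing any seek.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

final class PrepareSeekSketch {

  /** Hypothetical stand-in for a per-segment terms enum; NOT Lucene's TermsEnum. */
  static final class ToyTermsEnum {
    private final TreeMap<String, Integer> terms; // term -> document frequency
    private final List<String> log;               // shared trace of simulated I/O steps
    private boolean pending;                      // true while a supplier is outstanding
    private String current;                       // term the enum is positioned on, or null

    ToyTermsEnum(Map<String, Integer> terms, List<String> log) {
      this.terms = new TreeMap<>(terms);
      this.log = log;
    }

    /** Phase 1: record a (simulated) prefetch and hand back the deferred seek. */
    Supplier<Boolean> prepareSeekExact(String term) {
      checkNotPending();
      log.add("prefetch:" + term);
      pending = true;
      return () -> {
        // Phase 2: the actual seek, ideally once the prefetched data has arrived.
        pending = false;
        log.add("seek:" + term);
        boolean found = terms.containsKey(term);
        current = found ? term : null;
        return found;
      };
    }

    int docFreq() {
      checkNotPending();
      if (current == null) {
        throw new IllegalStateException("enum is not positioned on a term");
      }
      return terms.get(current);
    }

    private void checkNotPending() {
      if (pending) {
        throw new IllegalStateException("consume the pending Supplier first");
      }
    }
  }

  /** Issues every prefetch first, then completes the seeks and sums the frequencies. */
  static int totalDocFreq(List<ToyTermsEnum> segments, String term) {
    List<Supplier<Boolean>> pendingSeeks = new ArrayList<>();
    for (ToyTermsEnum segment : segments) {
      pendingSeeks.add(segment.prepareSeekExact(term));
    }
    int total = 0;
    for (int i = 0; i < segments.size(); i++) {
      if (pendingSeeks.get(i).get()) {
        total += segments.get(i).docFreq();
      }
    }
    return total;
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    List<ToyTermsEnum> segments = List.of(
        new ToyTermsEnum(Map.of("lucene", 3, "search", 1), log),
        new ToyTermsEnum(Map.of("lucene", 2), log));
    System.out.println(totalDocFreq(segments, "lucene"));
    System.out.println(log);
  }
}
```

In this toy model the trace shows both `prefetch:` entries ahead of any `seek:` entry, which is the ordering that would let real reads for several segments overlap instead of being served one page fault at a time.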

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-06-17 Thread via GitHub
github-actions[bot] commented on PR #13359: URL: https://github.com/apache/lucene/pull/13359#issuecomment-2174672285 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-06-03 Thread via GitHub
mikemccand commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1624448703 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,30 @@ private boolean setEOF() { return true; }

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-06-02 Thread via GitHub
vsop-479 commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1623732526 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,30 @@ private boolean setEOF() { return true; }

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-27 Thread via GitHub
jpountz commented on PR #13359: URL: https://github.com/apache/lucene/pull/13359#issuecomment-2132902792 Now that #13408 has been merged, I could update the benchmark to simply call IndexSearcher#search. ```java import java.io.IOException; import java.io.UncheckedIOE

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-24 Thread via GitHub
mikemccand commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1613219485 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,30 @@ private boolean setEOF() { return true; }

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-24 Thread via GitHub
mikemccand commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1613174241 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,30 @@ private boolean setEOF() { return true; }

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-23 Thread via GitHub
vsop-479 commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1611094333 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,30 @@ private boolean setEOF() { return true; }

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
vsop-479 commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1611094333 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,30 @@ private boolean setEOF() { return true; }

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
mikemccand commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609713219 ## lucene/core/src/java/org/apache/lucene/index/TermsEnum.java: ## @@ -61,6 +62,21 @@ public enum SeekStatus { */ public abstract boolean seekExact(BytesRef

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
mikemccand commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609895554 ## lucene/core/src/java/org/apache/lucene/index/TermsEnum.java: ## @@ -61,6 +62,21 @@ public enum SeekStatus { */ public abstract boolean seekExact(BytesRef

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
jpountz commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609879874 ## lucene/core/src/java/org/apache/lucene/index/TermsEnum.java: ## @@ -61,6 +62,21 @@ public enum SeekStatus { */ public abstract boolean seekExact(BytesRef tex

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
jpountz commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609878850 ## lucene/core/src/java/org/apache/lucene/index/TermsEnum.java: ## @@ -61,6 +62,21 @@ public enum SeekStatus { */ public abstract boolean seekExact(BytesRef tex

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
jpountz commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609839123 ## lucene/core/src/java/org/apache/lucene/search/TermQuery.java: ## @@ -150,7 +170,12 @@ public Scorer get(long leadCost) throws IOException { @Override

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
jpountz commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609830164 ## lucene/core/src/java/org/apache/lucene/search/BlendedTermQuery.java: ## @@ -19,6 +19,7 @@ import java.io.IOException; import java.util.Arrays; import java.util.L

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
jpountz commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609822485 ## lucene/core/src/java/org/apache/lucene/index/TermsEnum.java: ## @@ -61,6 +62,21 @@ public enum SeekStatus { */ public abstract boolean seekExact(BytesRef tex

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
jpountz commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609822235 ## lucene/core/src/java/org/apache/lucene/index/TermsEnum.java: ## @@ -61,6 +62,21 @@ public enum SeekStatus { */ public abstract boolean seekExact(BytesRef tex

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
jpountz commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609821173 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -3754,13 +3754,17 @@ public static Status.TermVectorStatus testTermVectors( Ter

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
mikemccand commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609703874 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,30 @@ private boolean setEOF() { return true; }

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
mikemccand commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609702603 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,30 @@ private boolean setEOF() { return true; }

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
mikemccand commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609700636 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,30 @@ private boolean setEOF() { return true; }

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
jpountz commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609634889 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,30 @@ private boolean setEOF() { return true; }

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-22 Thread via GitHub
mikemccand commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1609620348 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,30 @@ private boolean setEOF() { return true; }

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-21 Thread via GitHub
jpountz commented on PR #13359: URL: https://github.com/apache/lucene/pull/13359#issuecomment-2122746505 It creates a 50GB terms dictionary while my machine only has ~28GB of RAM for the page cache, so many terms dictionary lookups result in page faults.
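
As a rough back-of-the-envelope reading of those numbers: assuming lookups are spread uniformly over the terms dictionary and the whole page cache is available to it (both are assumptions, not statements from the thread), the expected share of lookups that miss the page cache is approximately

```latex
\[
  1 - \frac{28\ \text{GB}}{50\ \text{GB}} = 1 - 0.56 = 0.44
\]
```

That is, on the order of 44% of lookups would touch a page that is not resident. Real access patterns are rarely uniform, so this is only an order-of-magnitude estimate.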

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-21 Thread via GitHub
mikemccand commented on PR #13359: URL: https://github.com/apache/lucene/pull/13359#issuecomment-2122733760
> But I created a benchmark that starts looking like running a Lucene query that is encouraging

Was this with a forced-cold index?

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-17 Thread via GitHub
rmuir commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1604776531 ## lucene/core/src/java/org/apache/lucene/index/TermsEnum.java: ## @@ -61,6 +62,15 @@ public enum SeekStatus { */ public abstract boolean seekExact(BytesRef text)

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-15 Thread via GitHub
jpountz commented on PR #13359: URL: https://github.com/apache/lucene/pull/13359#issuecomment-2112625165 I iterated a bit on this change: - `TermsEnum#prepareSeekExact` is introduced, which only prefetches data which is later going to be needed by `TermsEnum#seekExact`. - `TermStates

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-13 Thread via GitHub
jpountz commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1598128358 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,31 @@ private boolean setEOF() { return true; }

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-11 Thread via GitHub
rmuir commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1597522761 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,31 @@ private boolean setEOF() { return true; } +

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-11 Thread via GitHub
jpountz commented on PR #13359: URL: https://github.com/apache/lucene/pull/13359#issuecomment-2105658311 This is a draft as I need to do more work on tests and making sure that this new method cannot corrupt the state of the `SegmentTermsEnum`. But I created a benchmark that start