Re: [PR] Use Collections.addAll() instead of manual array copy [lucene]
dweiss commented on code in PR #12977: URL: https://github.com/apache/lucene/pull/12977#discussion_r1441428909 ## lucene/misc/src/java/org/apache/lucene/misc/index/IndexSplitter.java: ## @@ -67,18 +66,10 @@ public static void main(String[] args) throws Exception { if (args[1].equals("-l")) { is.listSegments(); } else if (args[1].equals("-d")) { - List segs = new ArrayList<>(); - for (int x = 2; x < args.length; x++) { -segs.add(args[x]); - } - is.remove(segs.toArray(new String[0])); + is.remove(Arrays.copyOfRange(args, 2, args.length)); Review Comment: This passes validation checks only because the method is suppressed from checks - it uses the sysouts. Arrays.copyOfRange is on the forbidden APIs list, it should be replaced with: java.util.Arrays#copyOfRange(**) @ Prefer using ArrayUtil as Arrays#copyOfRange fills zeros for bad bounds -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Remove unnecessary fields loop from extractWeightedSpanTerms() [lucene]
dweiss commented on PR #12965: URL: https://github.com/apache/lucene/pull/12965#issuecomment-1876672741 Yep, I think this early termination condition makes sense. Could you add it, please? Thank you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Nightly benchmark regression for term dict queries [lucene]
gf2121 closed issue #12659: Nightly benchmark regression for term dict queries URL: https://github.com/apache/lucene/issues/12659 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[I] Test an FST bytes store that re-reverses (reads bytes forward) on-the-fly [lucene]
mikemccand opened a new issue, #12992: URL: https://github.com/apache/lucene/issues/12992 ### Description At read-time the FST apis must read bytes in reverse, which is perverse and unnatural for all stacks in modern CPUs / IO devices that do read-ahead optimizations for forward reading. It's quite complex to change the underlying FST format to become fundamentally forward only. It'd require rewriting node addresses, which may then take different numbers of `vInt` bytes, causing more renumbering, etc. A simpler first step might be, at FST `freeze()` time (when the FST is done being compiled), reverse all bytes in the underlying storage (on disk or on heap), and at read time, pretend to the caller that they are still reading backwards, yet actually read forwards. We could even do this separately for each store, e.g. start by testing on-heap read-time double reversal, or maybe start with on-disk where the OS's readahead optimizations may matter more. It should be relatively trivial to implement yet hard to think about, and we could see if it helps performance. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Optimize FST on-heap BytesReader [lucene]
mikemccand commented on code in PR #12879: URL: https://github.com/apache/lucene/pull/12879#discussion_r1441712407 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -56,14 +66,59 @@ public long ramBytesUsed() { public void freeze() { frozen = true; -// this operation are costly, so we want to compute it once and cache -dataInput = dataOutput.toDataInput(); +// this operation is costly, so we want to compute it once and cache +this.byteBuffers = dataOutput.toWriteableBufferList(); } @Override public FST.BytesReader getReverseBytesReader() { -assert dataInput != null; // freeze() must be called first -return new ReverseRandomAccessReader(dataInput); +assert byteBuffers != null; // freeze() must be called first +if (byteBuffers.size() == 1) { + // use a faster implementation for single-block case + return new ReverseBytesReader(byteBuffers.get(0).array()); Review Comment: Can we `assert hasArray()` before this? `.array()` is optional (only valid for `DirectByteBuffer` I think). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Optimize FST on-heap BytesReader [lucene]
mikemccand commented on code in PR #12879: URL: https://github.com/apache/lucene/pull/12879#discussion_r1441713259 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -56,14 +66,59 @@ public long ramBytesUsed() { public void freeze() { frozen = true; -// this operation are costly, so we want to compute it once and cache -dataInput = dataOutput.toDataInput(); +// this operation is costly, so we want to compute it once and cache +this.byteBuffers = dataOutput.toWriteableBufferList(); } @Override public FST.BytesReader getReverseBytesReader() { -assert dataInput != null; // freeze() must be called first -return new ReverseRandomAccessReader(dataInput); +assert byteBuffers != null; // freeze() must be called first +if (byteBuffers.size() == 1) { + // use a faster implementation for single-block case + return new ReverseBytesReader(byteBuffers.get(0).array()); +} +return new FST.BytesReader() { Review Comment: Hmm I'm still worried about test coverage of this. Conditionals like this are dangerous for Lucene, since our tests only test tiny FSTs, this code path would be rarely/never executed, likely even by our nightly benchmarks. Ideally we would find a way to randomize the page size in many tests, or at least, one test? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Initial impl of MMapDirectory for Java 22 [lucene]
uschindler commented on PR #12706: URL: https://github.com/apache/lucene/pull/12706#issuecomment-1877096268 No no no, I am not stale! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Optimize FST on-heap BytesReader [lucene]
dungba88 commented on code in PR #12879: URL: https://github.com/apache/lucene/pull/12879#discussion_r1441917076 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -56,14 +66,59 @@ public long ramBytesUsed() { public void freeze() { frozen = true; -// this operation are costly, so we want to compute it once and cache -dataInput = dataOutput.toDataInput(); +// this operation is costly, so we want to compute it once and cache +this.byteBuffers = dataOutput.toWriteableBufferList(); } @Override public FST.BytesReader getReverseBytesReader() { -assert dataInput != null; // freeze() must be called first -return new ReverseRandomAccessReader(dataInput); +assert byteBuffers != null; // freeze() must be called first +if (byteBuffers.size() == 1) { + // use a faster implementation for single-block case + return new ReverseBytesReader(byteBuffers.get(0).array()); +} +return new FST.BytesReader() { Review Comment: Sorry I meant TestFSTs.testRealTerms. The default block size is 32KB and the FST is 55-80KB, thus 2-3 blocks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Optimize FST on-heap BytesReader [lucene]
dungba88 commented on code in PR #12879: URL: https://github.com/apache/lucene/pull/12879#discussion_r1441927006 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -56,14 +66,59 @@ public long ramBytesUsed() { public void freeze() { frozen = true; -// this operation are costly, so we want to compute it once and cache -dataInput = dataOutput.toDataInput(); +// this operation is costly, so we want to compute it once and cache +this.byteBuffers = dataOutput.toWriteableBufferList(); } @Override public FST.BytesReader getReverseBytesReader() { -assert dataInput != null; // freeze() must be called first -return new ReverseRandomAccessReader(dataInput); +assert byteBuffers != null; // freeze() must be called first +if (byteBuffers.size() == 1) { + // use a faster implementation for single-block case + return new ReverseBytesReader(byteBuffers.get(0).array()); Review Comment: Since we call toWriteableBufferList the array should be accessible. I think if the array is somehow not accessible HeapByteBuffer (the one being used) would throw an exception. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Optimize FST on-heap BytesReader [lucene]
dungba88 commented on code in PR #12879: URL: https://github.com/apache/lucene/pull/12879#discussion_r1441927006 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -56,14 +66,59 @@ public long ramBytesUsed() { public void freeze() { frozen = true; -// this operation are costly, so we want to compute it once and cache -dataInput = dataOutput.toDataInput(); +// this operation is costly, so we want to compute it once and cache +this.byteBuffers = dataOutput.toWriteableBufferList(); } @Override public FST.BytesReader getReverseBytesReader() { -assert dataInput != null; // freeze() must be called first -return new ReverseRandomAccessReader(dataInput); +assert byteBuffers != null; // freeze() must be called first +if (byteBuffers.size() == 1) { + // use a faster implementation for single-block case + return new ReverseBytesReader(byteBuffers.get(0).array()); Review Comment: Since we call toWriteableBufferList the array should be accessible. I think if the array is somehow not accessible HeapByteBuffer (the one being used) would throw an exception and thus would be caught by tests -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Optimize FST on-heap BytesReader [lucene]
mikemccand commented on code in PR #12879: URL: https://github.com/apache/lucene/pull/12879#discussion_r1441950879 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -56,14 +66,59 @@ public long ramBytesUsed() { public void freeze() { frozen = true; -// this operation are costly, so we want to compute it once and cache -dataInput = dataOutput.toDataInput(); +// this operation is costly, so we want to compute it once and cache +this.byteBuffers = dataOutput.toWriteableBufferList(); } @Override public FST.BytesReader getReverseBytesReader() { -assert dataInput != null; // freeze() must be called first -return new ReverseRandomAccessReader(dataInput); +assert byteBuffers != null; // freeze() must be called first +if (byteBuffers.size() == 1) { + // use a faster implementation for single-block case + return new ReverseBytesReader(byteBuffers.get(0).array()); Review Comment: Yeah I realize the code is safe but it takes some thinking to confirm it -- maybe add the assert with your awesome above explanation? So future code readers know why this `ByteBuffer` is for sure backed by an array. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] [WIP] LUCENE-10002: Deprecate FacetsCollector#search helper methods as they internally use IndexSearcher#search(Query, Collector) API [lucene]
mikemccand commented on PR #12890: URL: https://github.com/apache/lucene/pull/12890#issuecomment-1877386385 Yeah I agree it's OK to deprecate without replacement, but maybe in the deprecated javadocs (and in `MIGRATE.txt` for 10.0) add a short explanation about using `MultiCollectorManager` for this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Port PR management bot from Apache Beam [lucene]
uschindler commented on issue #12796: URL: https://github.com/apache/lucene/issues/12796#issuecomment-1877414041 As said in the mailing list thread: We should maybe exclude "draft" PRs from this. I regularily start them for stuff that I won't merge soon. One example is #12706 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Introduce workflow for stale PRs [lucene]
uschindler commented on code in PR #12813: URL: https://github.com/apache/lucene/pull/12813#discussion_r1442004702 ## .github/workflows/stale.yml: ## @@ -22,6 +22,7 @@ jobs: with: repo-token: ${{ secrets.GITHUB_TOKEN }} days-before-pr-stale: 14 +exempt-draft-pr: true Review Comment: thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Introduce workflow for stale PRs [lucene]
stefanvodita commented on PR #12813: URL: https://github.com/apache/lucene/pull/12813#issuecomment-1877494977 I ended up checking more of the configurable parameters after looking for the one to exclude draft PRs. Three things to note: 1. I have to configure a few more parameters to handle issues correctly and to not close PRs when they get too stale, which seems to be the default (scary!). 2. There is a `debug-only` mode we can enable, which will make it so the worflow runs without applying any changes. We can read its logs to see if it's behaving the way we want it to. If it is, we can disable debug mode and run it for real. 3. We can configure how many operations the workflow will do to avoid hitting the rate limit. The default is `30`, which is probably a good place to start. As a consequence, we would also avoid having most of the open PRs labeled stale in one go. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Introduce workflow for stale PRs [lucene]
stefanvodita commented on PR #12813: URL: https://github.com/apache/lucene/pull/12813#issuecomment-1877528163 I did some testing with the parameters I mentioned and pushed a new revision. In the tests on my fork, a stale PR took 5 operations to process and a non-stale PR took 1. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add support for index sorting with document blocks [lucene]
s1monw commented on code in PR #12829: URL: https://github.com/apache/lucene/pull/12829#discussion_r1442110320 ## lucene/core/src/java/org/apache/lucene/index/FieldInfos.java: ## @@ -437,6 +488,33 @@ private void verifySoftDeletedFieldName(String fieldName, boolean isSoftDeletesF } } +private void verifyParentFieldName(String fieldName, boolean isParentField) { + if (isParentField) { +if (parentFieldName == null) { + throw new IllegalArgumentException( + "this index has [" + + fieldName + + "] as parent document field already but parent document field is not configured in IWC"); +} else if (fieldName.equals(parentFieldName) == false) { + throw new IllegalArgumentException( + "cannot configure [" + + parentFieldName + + "] as parent document field ; this index uses [" + + fieldName Review Comment: I think this is confusing from it's name. The `parentFieldName` comes from IWC and fieldname is the incoming field. I think it's right the way it is, maybe we need to do some rewording altogether but when you look at the tests I think the error message makes sense? I would like to find a way that users don't have to specify these fieldnames altogether but that should be a follow up -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] Remove unnecessary comment [lucene]
andrross opened a new pull request, #12993: URL: https://github.com/apache/lucene/pull/12993 ### Description A [previous iteration][1] of this code used an AtomicInteger and required this comment. The committed version uses a self-documenting boolean and the comment is not needed. [1]: https://github.com/apache/lucene/pull/12689/commits/5c847a0a3b2312414c8a4623d288d8fe96d52ff8 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Remove unnecessary comment [lucene]
andrross commented on PR #12993: URL: https://github.com/apache/lucene/pull/12993#issuecomment-1877693422 FYI @javanna -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Use Collections.addAll() instead of manual array copy [lucene]
dweiss commented on code in PR #12977: URL: https://github.com/apache/lucene/pull/12977#discussion_r1442253173 ## lucene/misc/src/java/org/apache/lucene/misc/index/IndexSplitter.java: ## @@ -67,18 +66,10 @@ public static void main(String[] args) throws Exception { if (args[1].equals("-l")) { is.listSegments(); } else if (args[1].equals("-d")) { - List segs = new ArrayList<>(); - for (int x = 2; x < args.length; x++) { -segs.add(args[x]); - } - is.remove(segs.toArray(new String[0])); + is.remove(Arrays.copyOfRange(args, 2, args.length)); Review Comment: I allowed myself to correct this one and push it to your branch. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Use Collections.addAll() instead of manual array copy and misc. code cleanups [lucene]
dweiss merged PR #12977: URL: https://github.com/apache/lucene/pull/12977 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Make Lucene90 postings format to write FST off heap [lucene]
dungba88 commented on PR #12985: URL: https://github.com/apache/lucene/pull/12985#issuecomment-1877952318 @mikemccand I'm wondering if there is already some benchmark that can show the RAM saved by this change -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Taxonomy facets: can we change massive `int[]` for parent/child/sibling tree to paged/block `int[]` to reduce RAM pressure? [lucene]
msfroh commented on issue #12989: URL: https://github.com/apache/lucene/issues/12989#issuecomment-1877979164 If nobody else is working on this, I think I'd like to take it! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Optimize FST on-heap BytesReader [lucene]
dungba88 commented on code in PR #12879: URL: https://github.com/apache/lucene/pull/12879#discussion_r1442419722 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -56,14 +66,59 @@ public long ramBytesUsed() { public void freeze() { frozen = true; -// this operation are costly, so we want to compute it once and cache -dataInput = dataOutput.toDataInput(); +// this operation is costly, so we want to compute it once and cache +this.byteBuffers = dataOutput.toWriteableBufferList(); } @Override public FST.BytesReader getReverseBytesReader() { -assert dataInput != null; // freeze() must be called first -return new ReverseRandomAccessReader(dataInput); +assert byteBuffers != null; // freeze() must be called first +if (byteBuffers.size() == 1) { + // use a faster implementation for single-block case + return new ReverseBytesReader(byteBuffers.get(0).array()); Review Comment: That's a good idea. I've added the assertion. I decided to put it right after the ByteBuffer list is created (in `freeze()`) and assert all ByteBuffer in the list. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Taxonomy facets: can we change massive `int[]` for parent/child/sibling tree to paged/block `int[]` to reduce RAM pressure? [lucene]
stefanvodita commented on issue #12989: URL: https://github.com/apache/lucene/issues/12989#issuecomment-1878171579 @msfroh - I was looking into this as well and had some thoughts about how to do it. We could replace [`ParallelTaxonomyArrays`](https://github.com/apache/lucene/blob/7b8aece125aabff2823626d5b939abf4747f63a7/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/ParallelTaxonomyArrays.java) with a new interface that offers three operations for each of the arrays: ``` interface ChunkedParallelTaxonomyArrays { /* Record new entry. */ public void appendParent(int parent); /* Retrieve this ordinal's parent. From the user's perspective, this is like an array look-up. */ public int getParent(int ord); /* There are some places where we need to know how many parents exist in total. */ public int sizeParents(); // Same for children and siblings ... } ``` To implement this inteface, we could use an [`IntBlockPool`](https://github.com/apache/lucene/blob/7b8aece125aabff2823626d5b939abf4747f63a7/lucene/core/src/java/org/apache/lucene/util/IntBlockPool.java). We would allocate new int buffers in the block pool as needed and preserve the block pool across [`DirectoryTaxonomyReader`](https://github.com/apache/lucene/blob/7b8aece125aabff2823626d5b939abf4747f63a7/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyReader.java) refreshes. There are definitely some disadvantages with the block pool idea: 1. We're preserving a mutable data-structure across taxonomy refreshes. There is [precedent](https://github.com/apache/lucene/blob/7b8aece125aabff2823626d5b939abf4747f63a7/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyReader.java#L172) though, with the caches in `DirectoryTaxonomyReader`. 2. We would be slightly overallocating by having the last buffer in the pool not be completely used, but I think this is a good trade-off to take for the increased efficiency and simplicity. What do you think, did you have something else in mind? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] Clean up unused code & variables [lucene]
dungba88 opened a new pull request, #12994: URL: https://github.com/apache/lucene/pull/12994 ### Description Clean up unused code & variables in FSTCompiler -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org