Re: [PR] Optimize FST on-heap BytesReader [lucene]

2024-01-05 Thread via GitHub
mikemccand commented on code in PR #12879: URL: https://github.com/apache/lucene/pull/12879#discussion_r1442807365 ## lucene/core/src/java/org/apache/lucene/util/fst/ReadWriteDataOutput.java: ## @@ -56,14 +66,59 @@ public long ramBytesUsed() { public void freeze() { fr

Re: [PR] Add support for index sorting with document blocks [lucene]

2024-01-05 Thread via GitHub
mikemccand commented on code in PR #12829: URL: https://github.com/apache/lucene/pull/12829#discussion_r1442838857 ## lucene/core/src/test/org/apache/lucene/index/TestAddIndexes.java: ## @@ -1937,4 +1937,97 @@ public void setMergeInfo(SegmentCommitInfo info) { targetDir.clo

Re: [PR] Improve vector search speed by using FixedBitSet [lucene]

2024-01-05 Thread via GitHub
benwtrent closed pull request #12789: Improve vector search speed by using FixedBitSet URL: https://github.com/apache/lucene/pull/12789

Re: [I] Taxonomy facets: can we change massive `int[]` for parent/child/sibling tree to paged/block `int[]` to reduce RAM pressure? [lucene]

2024-01-05 Thread via GitHub
msfroh commented on issue #12989: URL: https://github.com/apache/lucene/issues/12989#issuecomment-1879227454 > What do you think, did you have something else in mind? Oh -- I didn't have anything in mind. I just saw the issue and thought, "Hey, I could figure out how to do that!" Soun

Re: [I] Taxonomy facets: can we change massive `int[]` for parent/child/sibling tree to paged/block `int[]` to reduce RAM pressure? [lucene]

2024-01-05 Thread via GitHub
stefanvodita commented on issue #12989: URL: https://github.com/apache/lucene/issues/12989#issuecomment-1879330131 I'd be happy to work together on it! If we go the route I was proposing, there's a non-trivial amount of work to do: 1. Create the new interface for taxonomy arrays and use i

Re: [I] Taxonomy facets: can we change massive `int[]` for parent/child/sibling tree to paged/block `int[]` to reduce RAM pressure? [lucene]

2024-01-05 Thread via GitHub
msfroh commented on issue #12989: URL: https://github.com/apache/lucene/issues/12989#issuecomment-1879403796 I took a look and I think we might be able to do it a little easier: ``` public abstract class ParallelTaxonomyArrays { public class ChunkedArray { private final

[PR] Split taxonomy arrays across chunks [lucene]

2024-01-05 Thread via GitHub
msfroh opened a new pull request, #12995: URL: https://github.com/apache/lucene/pull/12995 ### Description Taxonomy ordinals are added in an append-only way. Instead of reallocating a single big array when loading new taxonomy ordinals and copying all the values from th
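
The idea in the description above (append-only values held in chunks so that growth never copies existing data) can be illustrated with a minimal sketch. This is not the code from the pull request: the class name `ChunkedIntArraySketch`, the `chunkBits` parameter, and all method names are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical append-only int array stored in fixed-size chunks.
 * Growing it allocates one new chunk instead of reallocating and
 * copying a single large array. Illustration only, not Lucene's API.
 */
final class ChunkedIntArraySketch {
  private final int chunkBits; // log2 of the chunk size (assumed parameter)
  private final int chunkMask; // chunkSize - 1, used to find the offset in a chunk
  private final List<int[]> chunks = new ArrayList<>();
  private int size; // number of values appended so far

  ChunkedIntArraySketch(int chunkBits) {
    if (chunkBits < 1 || chunkBits > 30) {
      throw new IllegalArgumentException("chunkBits must be in [1, 30]");
    }
    this.chunkBits = chunkBits;
    this.chunkMask = (1 << chunkBits) - 1;
  }

  /** Appends a value; existing chunks are never copied or resized. */
  void append(int value) {
    int offset = size & chunkMask;
    if (offset == 0) {
      // The previous chunk is full (or this is the first value): add a chunk.
      chunks.add(new int[1 << chunkBits]);
    }
    chunks.get(size >>> chunkBits)[offset] = value;
    size++;
  }

  /** Returns the value at the given index. */
  int get(int index) {
    if (index < 0 || index >= size) {
      throw new IndexOutOfBoundsException("index " + index + ", size " + size);
    }
    return chunks.get(index >>> chunkBits)[index & chunkMask];
  }

  int length() {
    return size;
  }

  int chunkCount() {
    return chunks.size();
  }

  public static void main(String[] args) {
    ChunkedIntArraySketch parents = new ChunkedIntArraySketch(2); // 4 ints per chunk
    for (int ordinal = 0; ordinal < 10; ordinal++) {
      parents.append(ordinal * 3);
    }
    System.out.println("length=" + parents.length() + ", chunks=" + parents.chunkCount());
  }
}
```

With a power-of-two chunk size, a lookup costs one shift and one mask on top of the array read. The trade-off is that callers can no longer be handed a single flat `int[]`.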

Re: [I] Taxonomy facets: can we change massive `int[]` for parent/child/sibling tree to paged/block `int[]` to reduce RAM pressure? [lucene]

2024-01-05 Thread via GitHub
msfroh commented on issue #12989: URL: https://github.com/apache/lucene/issues/12989#issuecomment-1879502787 I ended up running with that idea (sort of) and implemented this: https://github.com/apache/lucene/pull/12995 The unit tests pass, but I don't think any of them allocate more t