mikemccand commented on issue #12542: URL: https://github.com/apache/lucene/issues/12542#issuecomment-1712472912
> We seem to create a PagedGrowableWriter with [page size 128 MB here](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java#L34), meaning even when building a small FST, we are allocating at least 128 MB pages?

OK, this was really freaking me out overnight (allocating a 128 MB array even for building the tiniest of FSTs), so I dug deeper, and it is a false alarm!

It turns out that [`PagedGrowableWriter`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/packed/PagedGrowableWriter.java), via its [parent class `AbstractPagedMutable`, will allocate a "just big enough" final page](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/packed/AbstractPagedMutable.java#L57) instead of the full 128 MB page size, and it reallocates whenever the `NodeHash` resizes to a larger array. There is also some sneaky power-of-2 mod trickery that ensures that final page, no matter how many times the hash is rehashed, is always sized to exactly a power of 2, plus a [real `if` statement to enforce it](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/packed/PackedInts.java#L867-L869). Phew!

I'll open a separate tiny PR to address the wrong `bitsRequired` during rehash; that's just a smallish performance bug when building biggish FSTs.
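To make the last-page arithmetic concrete, here is a minimal, self-contained Java sketch. It is not Lucene's `AbstractPagedMutable` code. The class and method names (`PageSizingSketch`, `numPages`, `lastPageSize`, `checkPowerOfTwo`) are hypothetical. The page size of `1 << 27` is an assumption standing in for the 128 MB page linked above. The loop also assumes the hash table starts at 16 slots and doubles on each rehash. Because the page size is a power of two, `size mod pageSize` can be computed as `size & (pageSize - 1)`. When `size` is itself a power of two, that remainder is also a power of two, and a remainder of 0 means a full page.

```java
// A minimal sketch of "just big enough" last-page sizing for a paged array.
// All names are hypothetical; this is NOT Lucene's AbstractPagedMutable implementation.
public class PageSizingSketch {

    // Reject page sizes that are not powers of two, so the mask trick below is valid.
    static int checkPowerOfTwo(int pageSize) {
        if (pageSize <= 0 || (pageSize & (pageSize - 1)) != 0) {
            throw new IllegalArgumentException("pageSize must be a power of two, got " + pageSize);
        }
        return pageSize;
    }

    // Number of pages needed to hold `size` values (ceiling division).
    static long numPages(long size, int pageSize) {
        return (size + pageSize - 1) / pageSize;
    }

    // Size of the final page: size mod pageSize, computed with a mask because
    // pageSize is a power of two. A remainder of 0 means the last page is full.
    static long lastPageSize(long size, int pageSize) {
        long mask = pageSize - 1L;
        long rem = size & mask;
        return rem == 0 ? pageSize : rem;
    }

    static boolean isPowerOfTwo(long x) {
        return x > 0 && (x & (x - 1)) == 0;
    }

    public static void main(String[] args) {
        int pageSize = checkPowerOfTwo(1 << 27); // assumed page size (hypothetical)
        // Assumption: the hash table starts at 16 slots and doubles on every rehash.
        for (long size = 16; size <= (1L << 30); size <<= 1) {
            long pages = numPages(size, pageSize);
            long last = lastPageSize(size, pageSize);
            System.out.printf("size=%d pages=%d lastPage=%d powerOfTwo=%b%n",
                    size, pages, last, isPowerOfTwo(last));
        }
    }
}
```

Under these assumptions, every table size from 16 up to `1 << 27` fits in a single page of exactly that size. Larger tables use only full pages. So a tiny table never triggers a full-size page allocation.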