mikemccand commented on issue #12542: URL: https://github.com/apache/lucene/issues/12542#issuecomment-1711608900
Digging into this a bit, I think I found some silly performance bugs in our current FST impl:

* We seem to create a `PagedGrowableWriter` with [page size 128 MB here](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java#L34), meaning even when building a small FST, we are allocating at least 128 MB pages?
* When we rehash, we create a new `PagedGrowableWriter` with a too-small estimated `bitsRequired`, [since we pass `count` (the number of nodes in the hash) instead of the most recently added `long node`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java#L182). We are actually storing the `long node` values, so it really should be `node`, not `count`. The effect is that we make `PagedGrowableWriter` work harder than necessary: it has to reallocate as soon as we store the next `node` that doesn't fit in that `bitsRequired`.

I'll try to get the LRU hash working, but if that takes too long, we should separately fix these performance bugs (if I'm right that these are really bugs!).
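To make the `count` vs. `node` point concrete, here is a minimal standalone sketch, not the actual `NodeHash` code. The `bitsRequired` helper is an assumed stand-in for the usual "bits needed to store this maximum value" computation, and the `count` and `node` values are made up for illustration.

```java
/** Standalone sketch; not Lucene code. All values below are hypothetical. */
final class BitsRequiredSketch {

  /** Assumed stand-in: number of bits needed to store any value in [0, maxValue]. */
  static int bitsRequired(long maxValue) {
    if (maxValue < 0) {
      throw new IllegalArgumentException("maxValue must be non-negative (got: " + maxValue + ")");
    }
    return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
  }

  public static void main(String[] args) {
    long count = 1_000;     // hypothetical: number of nodes currently in the hash
    long node = 5_000_000;  // hypothetical: address of the most recently added node

    int bitsFromCount = bitsRequired(count); // what sizing by count would give
    int bitsFromNode = bitsRequired(node);   // what the stored values actually need

    System.out.println("bitsRequired(count) = " + bitsFromCount);
    System.out.println("bitsRequired(node)  = " + bitsFromNode);
    // If the slots are sized from count, the first large node value does not fit
    // and the writer has to grow to a wider bit width.
    System.out.println("node fits in count-sized slots: " + (node < (1L << bitsFromCount)));
  }
}
```

With these made-up numbers, sizing by `count` gives 10 bits while the stored `node` value needs 23, so a writer sized from `count` would have to widen almost immediately after the rehash.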