mikemccand commented on issue #12542:
URL: https://github.com/apache/lucene/issues/12542#issuecomment-1711608900
Digging into this a bit, I think I found some silly performance bugs in our
current FST impl:
* We seem to create a `PagedGrowableWriter` with [page size 128 MB
here](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java#L34),
meaning even when building a small FST, we are allocating at least 128 MB
pages?
* When we rehash, we create a new `PagedGrowableWriter` with a too-small
estimated `bitsRequired`, [since we pass `count` (the number of nodes in the
hash) instead of the most recently added `long
node`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java#L182).
We are actually storing the `long node` values, so it really should be `node`,
not `count`. The effect is that we make `PagedGrowableWriter` work harder
than necessary, reallocating when we store the next `node` that doesn't fit in
that `bitsRequired`.
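
To make the second point concrete, here is a rough, standalone sketch of how far apart the two estimates can be. It does not use Lucene at all: the `bitsRequired` helper is my own stand-in for a bit-width estimate (fewest bits that can hold a value), and the `count` and `node` numbers are made up for illustration, not measured from a real FST build.

```java
public class BitsRequiredSketch {

  // Hypothetical stand-in for a bit-width estimate: the fewest bits
  // needed to store maxValue, with a floor of 1 bit.
  public static int bitsRequired(long maxValue) {
    if (maxValue < 0) {
      throw new IllegalArgumentException("maxValue must be non-negative, got " + maxValue);
    }
    return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
  }

  public static void main(String[] args) {
    // Assumed values: node addresses grow with the FST's size in bytes,
    // so they are typically much larger than the number of hashed nodes.
    long count = 1_000L;     // number of nodes currently in the hash
    long node = 5_000_000L;  // most recently added node address

    System.out.println("bits if sized from count: " + bitsRequired(count));
    System.out.println("bits if sized from node:  " + bitsRequired(node));
  }
}
```

If the writer is sized from `count`, every stored value wider than that estimate forces it to grow again, which is the extra reallocation work described above.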
I'll try to get the LRU hash working, but if that takes too long, we should
separately fix these performance bugs (if I'm right that these are really
bugs!).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]