mikemccand commented on issue #12542:
URL: https://github.com/apache/lucene/issues/12542#issuecomment-1711608900

   Digging into this a bit, I think I found some silly performance bugs in our current FST impl:
     * We seem to create a `PagedGrowableWriter` with [page size 128 MB here](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java#L34), meaning even when building a small FST, we are allocating at least 128 MB pages?
     * When we rehash, we create a new `PagedGrowableWriter` with a too-small estimated `bitsRequired`, [since we pass `count` (the number of nodes in the hash) instead of the most recently added `long node`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java#L182). We are actually storing the `long node` values, so it really should be `node`, not `count`. The effect is that `PagedGrowableWriter` has to work harder than necessary, reallocating when we store the next `node` that doesn't fit in that `bitsRequired`.
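
To make the second point concrete, here is a rough sketch of the width gap between sizing from `count` and sizing from `node`. The `bitsRequired` helper is a local stand-in written for this example (assumed to behave like a typical "bits needed for a non-negative value" computation, not copied from Lucene), and the `count` and `node` values are made up, assuming node addresses run well ahead of the node count.

```java
public class BitsRequiredSketch {

    // Local stand-in (an assumption for this sketch, not Lucene code):
    // number of bits needed to store a non-negative value, at least 1.
    static int bitsRequired(long maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be non-negative: " + maxValue);
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    public static void main(String[] args) {
        long count = 1_000L;     // hypothetical number of nodes in the hash
        long node = 5_000_000L;  // hypothetical address of the most recently added node

        // Sizing from count underestimates the width of the values actually stored.
        System.out.println("bits if sized from count: " + bitsRequired(count)); // 10
        System.out.println("bits if sized from node:  " + bitsRequired(node));  // 23
    }
}
```

With these made-up numbers, a writer sized from `count` would start at 10 bits per value and have to grow as soon as a 23-bit `node` is stored, which is the extra reallocation work described above.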
   
   I'll try to get the LRU hash working, but if that takes too long, we should 
separately fix these performance bugs (if I'm right that these are really 
bugs!).

