Re: [I] Can FST read bytes forward? [lucene]

via GitHub Mon, 13 Nov 2023 04:35:36 -0800


mikemccand commented on issue #12355:
URL: https://github.com/apache/lucene/issues/12355#issuecomment-1808084756


   +1 to finding a way to reverse the bytes at compilation time.
   
   The reversal of bytes during FST compilation is so hard to think about!  It 
happens because the FST is logically append-only and sort of grows backwards 
(from the suffixes, inwards onto the prefixes), so the newly written nodes 
always point backwards to the already written nodes (appended to the growing 
`byte[]`, or, soon, `DataOutput`).
   
   Logically we ought to be able to write all the bytes backwards and then 
reverse them, but then, when resolving absolute or relative node addresses at 
FST read time, we'd need to re-reverse those addresses.  Or, we could try to 
rewrite the embedded node address references during/after reversal so we don't 
need to re-reverse on each node read?  The pointers will necessarily be 
different (take a different number of bytes after reversal), since small node 
addresses would become big node addresses and take more bytes to encode as 
absolute addresses.  It might even make the FST larger, since the common 
suffixes today have smallish/earlyish node addresses.  This is similar to what 
`pack` used to do (actually rewrite addresses), and it was hairy.
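   
   To make the size concern concrete, here is a rough sketch.  It assumes 
addresses are stored with a 7-bits-per-byte variable-length encoding in the 
style of `DataOutput.writeVLong`; whether a given FST version encodes its 
targets exactly this way is an assumption here.  The class name, the helper 
name and the 1,000,000-byte FST size are all made up for illustration.

```java
final class AddressSizeSketch {

    // Bytes needed for a non-negative value at 7 payload bits per byte
    // (the vInt/vLong style of variable-length encoding).
    static int vLongLength(long value) {
        int length = 1;
        while ((value & ~0x7FL) != 0) {
            value >>>= 7;
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        long totalBytes = 1_000_000L; // hypothetical FST size, for illustration only
        long early = 100L;            // a common-suffix node written near the start

        // After reversing the whole array, byte i ends up at totalBytes - 1 - i.
        long flipped = totalBytes - 1 - early;

        System.out.println("before reversal: " + vLongLength(early) + " byte(s)");
        System.out.println("after reversal:  " + vLongLength(flipped) + " byte(s)");
    }
}
```

   A heavily shared suffix node is referenced many times, so each of those 
references would pay the extra bytes if the addresses were rewritten as 
absolute.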
   
   So maybe for starters we do the simple "reverse the `byte[]` after writing 
all of it" and then "re-reverse addresses on decode"?  I wonder if Tantivy FST 
has some sort of post-write-reverse step?  Or does it always do cache-unfriendly 
backwards reads during FST traversal?
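   
   A toy sketch of that "reverse after writing, re-reverse addresses on 
decode" idea, using a made-up format rather than the real FST encoding: each 
node is two bytes, a one-byte absolute target address followed by a label, and 
a node's address is the index of its label byte.  Every name here, the `END` 
marker, and the limit of 127 ASCII labels are assumptions for illustration 
only.

```java
import java.io.ByteArrayOutputStream;

final class ReversedFstSketch {

    static final int END = 0xFF; // hypothetical "no target" marker

    // Append-only toy compiler: writes the suffix first, so every node
    // points backwards to an already written node. Assumes ASCII labels
    // and at most 127 of them, so addresses fit in one byte below END.
    static byte[] compile(String word) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int target = END;
        for (int i = word.length() - 1; i >= 0; i--) {
            out.write(target);         // address of the node for the next label
            out.write(word.charAt(i)); // label byte
            target = out.size() - 1;   // this node's address = index of its label
        }
        return out.toByteArray();
    }

    // Reader for the bytes as written: starts at the end and walks backwards.
    static String readBackward(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        if (bytes.length == 0) return "";
        int pos = bytes.length - 1; // root is the last node written
        while (true) {
            sb.append((char) (bytes[pos] & 0xFF));
            int target = bytes[pos - 1] & 0xFF;
            if (target == END) return sb.toString();
            pos = target;
        }
    }

    // Post-write step: reverse the whole array once.
    static byte[] reversed(byte[] bytes) {
        byte[] r = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            r[bytes.length - 1 - i] = bytes[i];
        }
        return r;
    }

    // Reader for the reversed bytes: reads each node forwards, but must
    // re-reverse every stored address, since they were written pre-reversal.
    static String readForward(byte[] r) {
        StringBuilder sb = new StringBuilder();
        if (r.length == 0) return "";
        int pos = 0; // the root moved from index n - 1 to index 0
        while (true) {
            sb.append((char) (r[pos] & 0xFF));
            int target = r[pos + 1] & 0xFF;
            if (target == END) return sb.toString();
            pos = r.length - 1 - target; // re-reverse the old address
        }
    }

    public static void main(String[] args) {
        byte[] written = compile("cat");
        System.out.println(readBackward(written));
        System.out.println(readForward(reversed(written)));
    }
}
```

   Both readers recover the same word; the forward reader pays one subtraction 
per followed arc, and the stored addresses keep their original (small) 
encodings.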


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


