mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1351949356
########## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ########## @@ -21,19 +21,18 @@ import java.util.List; import org.apache.lucene.store.DataInput; import org.apache.lucene.store.DataOutput; -import org.apache.lucene.util.Accountable; import org.apache.lucene.util.RamUsageEstimator; // TODO: merge with PagedBytes, except PagedBytes doesn't // let you read while writing which FST needs -class BytesStore extends DataOutput implements Accountable { +class BytesStore extends FSTWriter { Review Comment: > Besides `reverse` most of the methods here also needs to write/update arbitrarily previously written bytes (within the current frontier input), such as writeBytes/writeByte/copyBytes/truncate. I think these "modify prior bytes" are only called from `fst.addNode`, where that (complex!!) method is basically using the tail of the `BytesStore` as a scratch area? I.e. it writes some bytes first, and might go back and shuffle the bytes around, depending on whether it's "direct addressing" or "binary search"? Once `FST.addNode` completes, those written bytes are never altered? > Moreover, there is also methods which requires reading of previously written bytes that can't simply fit in the DataOutput (getReverseBytesReader e.g). Hmm `getReverseBytesReader` is indeed a problem. I wonder how the [Tantivy FST impl](https://blog.burntsushi.net/transducers/) deals with this? If we take the `FSTWriter` approach, are you thinking that we could make an impl of this class based e.g. on `FileChannel` directly (supports absolute positional reads, reading from a still-appending file), bypassing Lucene's `Directory` abstraction entirely? That is not a great solution (we try to have all IO go through `Directory`), but, perhaps as an intermediate state, for users directly creating massive FSTs, it's acceptable. But that'd mean we could not fix Lucene by default to do all of its FST compilation off-heap... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org