mikemccand commented on code in PR #12624:
URL: https://github.com/apache/lucene/pull/12624#discussion_r1351949356


##########
lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java:
##########
@@ -21,19 +21,18 @@
 import java.util.List;
 import org.apache.lucene.store.DataInput;
 import org.apache.lucene.store.DataOutput;
-import org.apache.lucene.util.Accountable;
 import org.apache.lucene.util.RamUsageEstimator;
 
 // TODO: merge with PagedBytes, except PagedBytes doesn't
 // let you read while writing which FST needs
 
-class BytesStore extends DataOutput implements Accountable {
+class BytesStore extends FSTWriter {

Review Comment:
   > Besides `reverse` most of the methods here also needs to write/update 
arbitrarily previously written bytes (within the current frontier input), such 
as writeBytes/writeByte/copyBytes/truncate.
   
   I think these "modify prior bytes" are only called from `fst.addNode`, where 
that (complex!!) method is basically using the tail of the `BytesStore` as a 
scratch area?  I.e. it writes some bytes first, and might go back and shuffle 
the bytes around, depending on whether it's "direct addressing" or "binary 
search"?  Once `FST.addNode` completes, those written bytes are never altered?
   
   > Moreover, there is also methods which requires reading of previously 
written bytes that can't simply fit in the DataOutput (getReverseBytesReader 
e.g).
   
   Hmm `getReverseBytesReader` is indeed a problem.  I wonder how the [Tantivy 
FST impl](https://blog.burntsushi.net/transducers/) deals with this?
   
   If we take the `FSTWriter` approach, are you thinking that we could make an 
impl of this class based e.g. on `FileChannel` directly (supports absolute 
positional reads, reading from a still-appending file), bypassing Lucene's 
`Directory` abstraction entirely?  That is not a great solution (we try to have 
all IO go through `Directory`), but, perhaps as an intermediate state, for 
users directly creating massive FSTs, it's acceptable.  But that'd mean we 
could not fix Lucene by default to do all of its FST compilation off-heap...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to