rmuir commented on issue #14422:
URL: https://github.com/apache/lucene/issues/14422#issuecomment-2767743265

   > [@rmuir](https://github.com/rmuir) I'm curious if you can expand a bit 
more on what you have in mind, what you are describing sounds to me like how 
things are today where `ReadAdvice.RANDOM` is the context and the `MADV_RANDOM` 
flag to `madvise` is the implementation. So I'm sure I'm missing something.
   
   I think today the context is trying too hard to direct the implementation. 
   For reference, the current structure looks like this:
   
   ```java
   public enum Context {
     MERGE,
     FLUSH,
     DEFAULT
   };
   
   public enum ReadAdvice {
     NORMAL,
     RANDOM,
     SEQUENTIAL,
     RANDOM_PRELOAD
   }
   
   // dictating implementation: let the Directory decide this
   public static final IOContext DEFAULT = new IOContext(Constants.DEFAULT_READADVICE);
   // dictating implementation: let the Directory decide this
   public static final IOContext READONCE = new IOContext(ReadAdvice.SEQUENTIAL);
   
   // a jazillion IOContext ctors taking different types of Objects, all of which
   // set various defaults sneakily instead of letting Directory decide.
   ```
   
   If I wanted to baby-step this, I'd walk the whole thing back to an `int`, as
   a start. I definitely think a codec should be able to OR together multiple
   different flags, and that the directory should be able to make use of them in a
   generic way. Feel free to translate this into EnumSets or whatever Java does :)
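   
   A rough sketch of what OR-able flags could look like, in both the `int` form and the EnumSet form. The `FileHint` enum and its methods are made up for illustration (the constant names are just the examples from this comment); nothing here is an existing Lucene API.
   
   ```java
   import java.util.EnumSet;
   import java.util.Set;
   
   // Hypothetical hint names taken from the examples in this comment; not a Lucene API.
   enum FileHint {
     METADATA_FILE, DATA_FILE, INDEX_FILE, // general category
     POSTINGS, STORED_FIELDS, VECTORS;     // purpose
   
     // Bit used when the hints are packed into a plain int.
     int mask() {
       return 1 << ordinal();
     }
   
     // Pack a set of hints into an int (the "walk it back to an int" form).
     static int toBits(Set<FileHint> hints) {
       int bits = 0;
       for (FileHint h : hints) {
         bits |= h.mask();
       }
       return bits;
     }
   
     // Unpack an int into the type-safe EnumSet form.
     static EnumSet<FileHint> fromBits(int bits) {
       EnumSet<FileHint> hints = EnumSet.noneOf(FileHint.class);
       for (FileHint h : values()) {
         if ((bits & h.mask()) != 0) {
           hints.add(h);
         }
       }
       return hints;
     }
   }
   
   class FileHintDemo {
     public static void main(String[] args) {
       // Codec side: OR together whatever applies to the file being opened.
       int bits = FileHint.DATA_FILE.mask() | FileHint.STORED_FIELDS.mask();
       // Directory side: the same information as an EnumSet.
       EnumSet<FileHint> hints = FileHint.fromBits(bits);
       System.out.println(hints); // prints [DATA_FILE, STORED_FIELDS]
     }
   }
   ```
   
   Either representation carries the same information; the EnumSet form just trades the raw bit twiddling for type safety.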
   
   This is just brainstorming and not properly thought out, so these are just
   examples:
   
   The codec should be able to specify that the file is one of, say,
   `METADATA_FILE`, `DATA_FILE`, `INDEX_FILE` (these might be bad names, but think
   slurped-metadata vs stored-fields-data vs stored-fields-index). That lets the
   Directory make decisions based on the general category of a file (or CFS
   range?), without hardcoded file extensions, etc. Ideally it's type-safe(ish),
   which is an improvement over matching "*.fdt" or other hacks.
   
   The codec should be able to specify the purpose of the file, say `POSTINGS`,
   `STORED_FIELDS`, `VECTORS`. Similar reasons to the above: give the "context";
   file extensions are not the solution. I don't even see any reason to pass
   `.class` names of formats here, that's also limiting. We should be able to tell
   the directory it's `TERMS`, even though that's sorta an impl-detail of
   PostingsFormat and not a first-class codec API citizen. It is just a flag,
   let's be practical.
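   
   To make the category/purpose idea concrete, here is a minimal sketch of a directory-side decision that looks only at the declared category and purpose, never at the file name. `FileCategory`, `FilePurpose`, `ExamplePreloadPolicy`, and the preload rule itself are all hypothetical, invented for this example.
   
   ```java
   // Hypothetical names from this comment; none of this exists in Lucene.
   enum FileCategory { METADATA_FILE, DATA_FILE, INDEX_FILE }
   
   enum FilePurpose { POSTINGS, TERMS, STORED_FIELDS, VECTORS }
   
   // Stand-in for the policy a Directory implementation might own.
   final class ExamplePreloadPolicy {
     private ExamplePreloadPolicy() {}
   
     // Decide from the declared category and purpose, not from "*.fdt"-style matching.
     static boolean shouldPreload(FileCategory category, FilePurpose purpose) {
       if (category == FileCategory.METADATA_FILE) {
         return true; // small and slurped once, whatever the purpose
       }
       // Made-up rule: keep the terms index hot, leave bulk data alone.
       return category == FileCategory.INDEX_FILE && purpose == FilePurpose.TERMS;
     }
   }
   
   class PreloadDemo {
     public static void main(String[] args) {
       System.out.println(
           ExamplePreloadPolicy.shouldPreload(FileCategory.INDEX_FILE, FilePurpose.TERMS)); // true
       System.out.println(
           ExamplePreloadPolicy.shouldPreload(FileCategory.DATA_FILE, FilePurpose.STORED_FIELDS)); // false
     }
   }
   ```
   
   A different directory could apply a completely different rule to the same two inputs, and the codec would not need to change.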
   
   Separately, the codec should maybe be able to specify stuff such as whether
   the file is accessed `SEQUENTIAL` or `RANDOM`: and that's just expressing what
   may happen, unrelated and disconnected from read-ahead or other possible
   implementations. But I've given this piece no thought. I think the first
   problem is that the Directory doesn't even have the "basic context" such as a
   file's general category and purpose. And the Directory should be in control of
   all "defaults": push those out of Codec, IOContext, etc.

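   One way to picture an access-pattern hint that stays disconnected from the implementation: the codec states what it expects, and each directory maps that to whatever it can actually do, including its own default. This is only a sketch. `Advice` is a local stand-in mirroring the `ReadAdvice` values quoted above, and `AccessPattern` (with its `UNSPECIFIED` value) and `DirectoryPolicies` are hypothetical.
   
   ```java
   // Local stand-in mirroring the ReadAdvice values quoted above; not the Lucene enum itself.
   enum Advice { NORMAL, RANDOM, SEQUENTIAL, RANDOM_PRELOAD }
   
   // Hypothetical: what the codec expects to happen, nothing about how to serve it.
   enum AccessPattern { UNSPECIFIED, SEQUENTIAL, RANDOM }
   
   final class DirectoryPolicies {
     private DirectoryPolicies() {}
   
     // An mmap-style directory might translate the hint into madvise-like advice.
     static Advice mmapStyle(AccessPattern expected) {
       switch (expected) {
         case SEQUENTIAL:
           return Advice.SEQUENTIAL;
         case RANDOM:
           return Advice.RANDOM;
         default:
           return Advice.NORMAL; // the directory's own default, not the codec's
       }
     }
   
     // A directory with no read-ahead control can ignore the hint entirely.
     static Advice bufferedStyle(AccessPattern expected) {
       return Advice.NORMAL;
     }
   }
   
   class AccessPatternDemo {
     public static void main(String[] args) {
       // Same hint from the codec, different outcome per directory.
       System.out.println(DirectoryPolicies.mmapStyle(AccessPattern.RANDOM));     // RANDOM
       System.out.println(DirectoryPolicies.bufferedStyle(AccessPattern.RANDOM)); // NORMAL
     }
   }
   ```
   
   The point of the split is that the default lives in the directory: when the codec says nothing, the directory still picks, and nothing in the hint forces a particular mechanism.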

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

