rmuir commented on issue #14422: URL: https://github.com/apache/lucene/issues/14422#issuecomment-2767743265
> [@rmuir](https://github.com/rmuir) I'm curious if you can expand a bit more on what you have in mind, what you are describing sounds to me like how things are today where `ReadAdvice.RANDOM` is the context and the `MADV_RANDOM` flag to `madvise` is the implementation. So I'm sure I'm missing something.

I think today the context is trying too hard to direct the implementation. For reference, the current structure looks like this:

```java
public enum Context { MERGE, FLUSH, DEFAULT }

public enum ReadAdvice { NORMAL, RANDOM, SEQUENTIAL, RANDOM_PRELOAD }

// dictating implementation: let the Directory decide this
public static final IOContext DEFAULT = new IOContext(Constants.DEFAULT_READADVICE);

// dictating implementation: let the Directory decide this
public static final IOContext READONCE = new IOContext(ReadAdvice.SEQUENTIAL);

// a jazillion IOContext ctors taking different types of Objects, all of which
// set various defaults sneakily instead of letting Directory decide.
```

If I wanted to baby-step this, I'd walk the whole thing back to an `int` as a start. I definitely think a codec should be able to OR together multiple different flags, and that the Directory should be able to make use of them in a generic way. Feel free to translate this into `EnumSet`s or whatever Java does :)

This is just brainstorming and not properly thought out, so these are just examples:

- codec should be able to specify that the file is one of, say, `METADATA_FILE`, `DATA_FILE`, `INDEX_FILE` (these might be bad names, but think slurped metadata vs. stored-fields data vs. stored-fields index). This allows the Directory to make decisions based upon the general category of a file (or CFS range?), without hardcoded file extensions, etc. Ideally type-safe(ish), which is an improvement over matching `*.fdt` or other hacks.
- codec should be able to specify the purpose of the file, say `POSTINGS`, `STORED_FIELDS`, `VECTORS`.
  Similar reasons to the above: to give the Directory "context", file extensions are not the solution. I don't even see any reason to pass `.class` names of formats here; that's also limiting. We should be able to tell the Directory it's `TERMS`, even though that's sorta an impl detail of PostingsFormat and not a first-class codec API citizen. It is just a flag, let's be practical.
- separately, codec should maybe be able to specify stuff such as whether the file is accessed `SEQUENTIAL` or `RANDOM`: that's just expressing what may happen, unrelated to and disconnected from read-ahead or other possible implementations.

But I've given this piece no thought. I think the first problem is that the Directory doesn't even have the "basic context", such as a file's general category and purpose. And the Directory should be in control of all "defaults": push those out of Codec, IOContext, etc.
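The "OR together flags" idea can be made concrete with a minimal Java sketch. It assumes the hints are plain `int` bit constants, matching the baby-step proposal. `FileHints`, `ReadStrategy`, and `HintPolicy` are hypothetical names that do not exist in Lucene. The flag names are the brainstorm examples from this comment.

In this sketch the codec only describes the file: its category, its purpose, and its expected access pattern. The Directory-side policy is the only place that turns those hints into an implementation choice. The specific mapping shown is arbitrary and only illustrative.

```java
// Hypothetical hint flags a codec could OR together when opening a file.
// None of these constants exist in Lucene; names come from the brainstorm above.
final class FileHints {
  private FileHints() {}

  // General category of the file.
  static final int METADATA_FILE = 1 << 0;
  static final int DATA_FILE = 1 << 1;
  static final int INDEX_FILE = 1 << 2;

  // Purpose of the file (TERMS is just a flag, even if it is a PostingsFormat detail).
  static final int POSTINGS = 1 << 3;
  static final int TERMS = 1 << 4;
  static final int STORED_FIELDS = 1 << 5;
  static final int VECTORS = 1 << 6;

  // Expected access pattern: describes what may happen, not how to read ahead.
  static final int SEQUENTIAL = 1 << 7;
  static final int RANDOM = 1 << 8;

  // True if the given flag bit is set in the combined hints.
  static boolean has(int hints, int flag) {
    return (hints & flag) != 0;
  }
}

// Hypothetical implementation choices; only the Directory picks among these.
enum ReadStrategy { READ_AHEAD, NO_READ_AHEAD, PRELOAD, OS_DEFAULT }

// Hypothetical Directory-side policy: every default lives here,
// not in the codec or in IOContext.
final class HintPolicy {
  private HintPolicy() {}

  static ReadStrategy choose(int hints) {
    if (FileHints.has(hints, FileHints.METADATA_FILE)) {
      return ReadStrategy.READ_AHEAD; // small files that are slurped once
    }
    if (FileHints.has(hints, FileHints.VECTORS) && FileHints.has(hints, FileHints.RANDOM)) {
      return ReadStrategy.PRELOAD; // arbitrary example of a purpose-specific choice
    }
    if (FileHints.has(hints, FileHints.RANDOM)) {
      return ReadStrategy.NO_READ_AHEAD;
    }
    if (FileHints.has(hints, FileHints.SEQUENTIAL)) {
      return ReadStrategy.READ_AHEAD;
    }
    return ReadStrategy.OS_DEFAULT; // no hints: leave it to the OS
  }
}
```

For example, a stored-fields data file opened with `FileHints.DATA_FILE | FileHints.STORED_FIELDS | FileHints.RANDOM` gets `NO_READ_AHEAD` from this policy. Because the mapping lives in one place, a different Directory could treat the same hints differently, or ignore them entirely, without any codec change.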