[PR] Use `IOContext#RANDOM` when appropriate. [lucene]

via GitHub Tue, 26 Mar 2024 09:31:39 -0700


jpountz opened a new pull request, #13222:
URL: https://github.com/apache/lucene/pull/13222


   This switches the following files to `IOContext.RANDOM`:
    - Stored fields data file.
    - Term vectors data file.
    - HNSW graph.
    - Temporary file storing vectors at merge time that we use to construct the 
merged HNSW graph.
    - Vector data files, including quantized data files.
   
   I hesitated to use `IOContext.RANDOM` on terms, since they have a random 
access pattern when running term queries, but a more sequential access pattern 
when running multi-term queries. I erred on the conservative side and did not 
switch them to `IOContext.RANDOM` for now.
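
The per-file choice described above can be summarized as a small lookup table. The sketch below is only an illustration: `AccessHintSketch`, `Hint`, and `FileKind` are hypothetical names and are not part of Lucene's API. It models which files this change moves to a random-access hint and which, such as terms, stay on the default.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical model of the decisions in this description.
// None of these types exist in Lucene; they only mirror the prose.
final class AccessHintSketch {

    // Stand-in for the access-pattern hint passed when opening a file.
    enum Hint { DEFAULT, RANDOM }

    // The kinds of files mentioned in the description.
    enum FileKind {
        STORED_FIELDS_DATA,
        TERM_VECTORS_DATA,
        HNSW_GRAPH,
        MERGE_TEMP_VECTORS,
        VECTOR_DATA, // includes quantized data files
        TERMS
    }

    private static final Map<FileKind, Hint> HINTS = new EnumMap<>(FileKind.class);

    static {
        // Files the description lists as switched to a random-access hint.
        HINTS.put(FileKind.STORED_FIELDS_DATA, Hint.RANDOM);
        HINTS.put(FileKind.TERM_VECTORS_DATA, Hint.RANDOM);
        HINTS.put(FileKind.HNSW_GRAPH, Hint.RANDOM);
        HINTS.put(FileKind.MERGE_TEMP_VECTORS, Hint.RANDOM);
        HINTS.put(FileKind.VECTOR_DATA, Hint.RANDOM);
        // TERMS is deliberately left out: term queries are random, multi-term
        // queries are more sequential, so it keeps the default for now.
    }

    static Hint hintFor(FileKind kind) {
        return HINTS.getOrDefault(kind, Hint.DEFAULT);
    }

    public static void main(String[] args) {
        for (FileKind kind : FileKind.values()) {
            System.out.println(kind + " -> " + hintFor(kind));
        }
    }
}
```

In the actual change the hint is the `IOContext` passed when the codec opens each file; the table form here is just a compact way to read the list.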
   
   For simplicity, I'm only touching the current codec, not previous codecs. 
There are also some known issues:
    - These files will keep using a `RANDOM` `IOContext` at merge time, even 
though merging reads them sequentially. We probably need some way for merge 
instances to get an updated `IOContext`. We have the same problem with 
`IOContext#LOAD` today.
    - With quantized vectors, raw vectors don't have a random access pattern, 
but it was challenging to give raw vectors a sequential access pattern when 
there are quantized vectors and a random access pattern otherwise. So they 
assume a random access pattern all the time.
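
To make the second known issue concrete, here is a minimal sketch contrasting the policy that would be ideal for raw vectors with the one this change settles on. `RawVectorHintSketch` and its methods are hypothetical names used only for illustration and do not correspond to Lucene code.

```java
// Hypothetical illustration of the raw-vector trade-off described above.
final class RawVectorHintSketch {

    enum Hint { SEQUENTIAL, RANDOM }

    // The policy that was challenging to implement: raw vectors are only
    // accessed randomly when there are no quantized vectors.
    static Hint idealHint(boolean hasQuantizedVectors) {
        return hasQuantizedVectors ? Hint.SEQUENTIAL : Hint.RANDOM;
    }

    // The policy in this change: raw vectors always assume random access,
    // whether or not quantized vectors exist.
    static Hint currentHint(boolean hasQuantizedVectors) {
        return Hint.RANDOM;
    }

    public static void main(String[] args) {
        for (boolean quantized : new boolean[] {false, true}) {
            System.out.println("quantized=" + quantized
                + " ideal=" + idealHint(quantized)
                + " current=" + currentHint(quantized));
        }
    }
}
```

The two policies only differ when quantized vectors are present, which is the case where the raw vector file gets a random-access hint it does not need.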
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


