rmuir commented on pull request #20: URL: https://github.com/apache/lucene/pull/20#issuecomment-801576257
These docs were already helpful to me in confirming what I thought I understood of the code about how sparsity and such is currently handled. Looking at the docs this way is an easy way to think about what is happening, separate from the code, and it lets people have ideas without going through the code.

For example, looking at what you describe here, I think there might be a simple optimization for the dense case (a "fixed" schema where vectors are present for every doc): if `the number of documents having values for this field` == `maxdoc`, we can omit writing the next item (`the docids of documents having vectors, in order`) completely. That would save some disk space, let us make the array null to save memory (4 bytes per document), and avoid the `Arrays.binarySearch` in `advance()`.

But I will look at the code to try to really confirm that and play with it. Thanks again.
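The dense-case idea above could be sketched roughly like this (a minimal illustration with hypothetical names, not Lucene's actual reader API): when every document has a vector, the docid array can be dropped entirely and `advance()` collapses to the identity, skipping the binary search.

```java
import java.util.Arrays;

class DenseDocIdSketch {
    private final int maxDoc;
    private final int[] docIds; // null in the dense case

    DenseDocIdSketch(int maxDoc, int[] docIds) {
        this.maxDoc = maxDoc;
        // Dense case: every doc in [0, maxDoc) has a vector, so storing
        // the docids is redundant; keep null and save 4 bytes per doc.
        this.docIds = (docIds != null && docIds.length == maxDoc) ? null : docIds;
    }

    /** Returns the first docid >= target that has a vector, or -1 if none. */
    int advance(int target) {
        if (docIds == null) {
            // dense: docids are exactly 0..maxDoc-1, no lookup needed
            return target < maxDoc ? target : -1;
        }
        // sparse: fall back to binary search over the stored docids
        int idx = Arrays.binarySearch(docIds, target);
        if (idx < 0) {
            idx = -idx - 1; // insertion point of the first docid >= target
        }
        return idx < docIds.length ? docIds[idx] : -1;
    }
}
```

This is only a sketch of the shape of the optimization; whether the real reader can take this branch depends on how the field count and `maxdoc` are exposed at read time.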