rmuir commented on pull request #20: URL: https://github.com/apache/lucene/pull/20#issuecomment-801576257
These docs were already helpful to me in confirming what I thought I understood of the code about how sparsity and such is currently handled. Looking at the docs this way is an easy way to think about what is happening, separate from the code, and it lets people have ideas without going through the code.

For example, looking at what you describe here, I think there might be a simple optimization for the dense case (a "fixed" schema where vectors are present for every doc): if `the number of documents having values for this field` == `maxdoc`, we can omit writing the next item (`the docids of documents having vectors, in order`) completely. That would save some disk space, let us make the array null to save memory (4 bytes per document), and avoid the `Arrays.binarySearch` in `advance()`.

But I will look at the code to try to really confirm that and play with it. Thanks again.
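The dense-case idea above could be sketched roughly like this (a minimal illustration with hypothetical names, not Lucene's actual reader API): when every document has a vector, the docid array can be dropped entirely and `advance()` collapses to the identity, skipping the binary search.

```java
import java.util.Arrays;

class DenseDocIdSketch {
    private final int maxDoc;
    private final int[] docIds; // null in the dense case

    DenseDocIdSketch(int maxDoc, int[] docIds) {
        this.maxDoc = maxDoc;
        // Dense case: every doc in [0, maxDoc) has a vector, so storing
        // the docids is redundant; keep null and save 4 bytes per doc.
        this.docIds = (docIds != null && docIds.length == maxDoc) ? null : docIds;
    }

    /** Returns the first docid >= target that has a vector, or -1 if none. */
    int advance(int target) {
        if (docIds == null) {
            // dense: docids are exactly 0..maxDoc-1, no lookup needed
            return target < maxDoc ? target : -1;
        }
        // sparse: fall back to binary search over the stored docids
        int idx = Arrays.binarySearch(docIds, target);
        if (idx < 0) {
            idx = -idx - 1; // insertion point of the first docid >= target
        }
        return idx < docIds.length ? docIds[idx] : -1;
    }
}
```

This is only a sketch of the shape of the optimization; whether the real reader can take this branch depends on how the field count and `maxdoc` are exposed at read time.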