benwtrent commented on PR #11860: URL: https://github.com/apache/lucene/pull/11860#issuecomment-1318779275
@rmuir @jpountz I have the following local changes (I will clean them up and push them to the PR soon). The change seems to improve queries per second compared to the unpacked codec in some cases and leaves it unchanged in others.

The local changes are:

- Delta-encode the neighbor doc IDs
- VInt-serialize them
- Keep track of memory offsets and store them in the vex file as well

Here are the vex file size improvements over the unpacked values (more than 80% reduction in some cases!). It makes sense that the larger the `M`, the bigger the impact.

| packed_vex_mb_size | vex_mb_size | packed_index_build_time | index_build_time | params | dataset | percent_reduction |
|--------------------|-------------|-------------------------|------------------|--------|---------|-------------------|
| 79.9 | 161.6 | 767 | 784 | {'M': 16, 'efConstruction': 100} | glove-100-angular | 50.55693069 |
| 108.4 | 464.1 | 1138 | 1225 | {'M': 48, 'efConstruction': 100} | glove-100-angular | 76.64296488 |
| 2.3 | 8.2 | 36 | 36 | {'M': 16, 'efConstruction': 100} | mnist-784-euclidean | 71.95121951 |
| 2.4 | 23.5 | 36 | 36 | {'M': 48, 'efConstruction': 100} | mnist-784-euclidean | 89.78723404 |
| 66.1 | 392.2 | 501 | 572 | {'M': 48, 'efConstruction': 100} | sift-128-euclidean | 83.1463539 |
| 59.7 | 136.6 | 449 | 516 | {'M': 16, 'efConstruction': 100} | sift-128-euclidean | 56.29575403 |
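
To make the three changes concrete, here is a minimal, self-contained sketch of the general technique. It is not the code from this PR: the class `DeltaVIntSketch`, its method names, and the record layout (a count followed by gaps, with neighbors assumed sorted in ascending order) are all hypothetical and chosen only for illustration. The variable-length integer scheme is the usual 7-bits-per-byte encoding.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: delta + VInt encoding of sorted neighbor IDs, plus per-node offsets.
final class DeltaVIntSketch {

  // Write a non-negative int using 7 payload bits per byte;
  // the high bit is set on every byte except the last one.
  static void writeVInt(ByteArrayOutputStream out, int value) {
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
  }

  // Read one VInt starting at pos[0] and advance pos[0] past it.
  static int readVInt(byte[] bytes, int[] pos) {
    int value = 0;
    int shift = 0;
    while (true) {
      byte b = bytes[pos[0]++];
      value |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return value;
      }
      shift += 7;
    }
  }

  // Assumed record layout: neighbor count, then the gap from the previous ID
  // (the first gap is measured from 0). Input must be sorted ascending.
  static byte[] encode(int[] sortedNeighbors) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeVInt(out, sortedNeighbors.length);
    int previous = 0;
    for (int id : sortedNeighbors) {
      writeVInt(out, id - previous);
      previous = id;
    }
    return out.toByteArray();
  }

  // Decode one record that starts at the given byte offset.
  static int[] decode(byte[] bytes, int offset) {
    int[] pos = {offset};
    int count = readVInt(bytes, pos);
    int[] result = new int[count];
    int previous = 0;
    for (int i = 0; i < count; i++) {
      previous += readVInt(bytes, pos);
      result[i] = previous;
    }
    return result;
  }

  // Records are now variable-length, so a node's record can no longer be found
  // by multiplying its ordinal by a fixed size; remember each start offset instead.
  static long[] writeAll(int[][] neighborsPerNode, ByteArrayOutputStream out) {
    long[] offsets = new long[neighborsPerNode.length];
    for (int node = 0; node < neighborsPerNode.length; node++) {
      offsets[node] = out.size();
      byte[] record = encode(neighborsPerNode[node]);
      out.write(record, 0, record.length);
    }
    return offsets;
  }
}
```

With this layout, the neighbor list `{1000, 1003, 1010, 1200}` takes 7 bytes instead of the 16 bytes needed for four fixed-width 4-byte ints. A larger `M` means more neighbors per node, and with sorted IDs that tends to mean smaller gaps between them, which is consistent with the larger reductions reported for `M = 48`.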