Re: [PR] Avoid NPEx if the end of the stream has been reached without reading any characters [lucene]
pzygielo commented on PR #12611: URL: https://github.com/apache/lucene/pull/12611#issuecomment-1751670565 May I ask for review, please? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Avoid NPEx if the end of the stream has been reached without reading any characters [lucene]
pzygielo commented on PR #12611: URL: https://github.com/apache/lucene/pull/12611#issuecomment-1751671154 @kaivalnp @gf2121
Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]
mikemccand commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1751739864 SPANN is another option? https://www.researchgate.net/publication/356282356_SPANN_Highly-efficient_Billion-scale_Approximate_Nearest_Neighbor_Search
Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]
mikemccand commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1751740152 (listening to @jbellis talk at Community over Code).
Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]
mikemccand commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1751743673 Or perhaps we "just" make a Lucene Codec component (KnnVectorsFormat) that wraps jvector? (https://github.com/jbellis/jvector)
[PR] Write MSB VLong for better outputs sharing in block tree index [lucene]
gf2121 opened a new pull request, #12631: URL: https://github.com/apache/lucene/pull/12631 closes https://github.com/apache/lucene/issues/12620
Re: [PR] Write MSB VLong for better outputs sharing in block tree index [lucene]
gf2121 commented on PR #12631: URL: https://github.com/apache/lucene/pull/12631#issuecomment-1751891562

With this change, the total size of `tip` is reduced by ~14% for `wikimediumall`.

| file | baseline | candidate | diff |
| -- | -- | -- | -- |
| _32_Lucene90_0.tip | 5166889 | 4385234 | -15.13% |
| _65_Lucene90_0.tip | 5192450 | 4402679 | -15.21% |
| _98_Lucene90_0.tip | 704 | 4721674 | -15.01% |
| _cb_Lucene90_0.tip | 5591898 | 4761395 | -14.85% |
| _fe_Lucene90_0.tip | 5549684 | 4718536 | -14.98% |
| _fp_Lucene90_0.tip | 776845 | 716283 | -7.80% |
| _g0_Lucene90_0.tip | 775054 | 715176 | -7.73% |
| _gb_Lucene90_0.tip | 745250 | 685730 | -7.99% |
| _gm_Lucene90_0.tip | 738168 | 678853 | -8.04% |
| _gx_Lucene90_0.tip | 880015 | 809570 | -8.00% |
| _gy_Lucene90_0.tip | 151737 | 142043 | -6.39% |
| _gz_Lucene90_0.tip | 133664 | 125057 | -6.44% |
| _h0_Lucene90_0.tip | 116959 | 109360 | -6.50% |
| _h1_Lucene90_0.tip | 120897 | 112915 | -6.60% |
| _h2_Lucene90_0.tip | 111570 | 104185 | -6.62% |
| sum | 31606784 | 27188690 | -13.98% |
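The idea of writing a VLong most-significant byte first can be sketched as below. This is a minimal illustration, not the code from the PR: the class `MsbVLongSketch` and its method names are hypothetical, and the byte layout actually used in the block tree index may differ. It compares the usual LSB-first variable-length encoding with an MSB-first one and counts how many leading bytes two numerically close values have in common, which is presumably what makes more output bytes shareable.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch; names and byte layout are assumptions, not Lucene's actual code.
final class MsbVLongSketch {

    // LSB-first VLong: low 7 bits come first, the high bit marks "more bytes follow".
    static byte[] encodeLsbFirst(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // MSB-first VLong: the same 7-bit groups, emitted from the most significant group down.
    static byte[] encodeMsbFirst(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        int bits = 64 - Long.numberOfLeadingZeros(value); // 0 when value == 0
        int groups = Math.max(1, (bits + 6) / 7);
        byte[] out = new byte[groups];
        for (int i = 0; i < groups; i++) {
            int shift = 7 * (groups - 1 - i);
            int b = (int) ((value >>> shift) & 0x7F);
            if (i < groups - 1) {
                b |= 0x80; // continuation bit on every byte except the last
            }
            out[i] = (byte) b;
        }
        return out;
    }

    // Number of leading bytes that two encodings have in common.
    static int commonPrefixLength(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        long x = 1_000_000L;
        long y = 1_000_001L;
        System.out.println("LSB-first shared prefix bytes: "
            + commonPrefixLength(encodeLsbFirst(x), encodeLsbFirst(y)));
        System.out.println("MSB-first shared prefix bytes: "
            + commonPrefixLength(encodeMsbFirst(x), encodeMsbFirst(y)));
    }
}
```

With LSB-first encoding, two values that differ only in their low bits already differ in the first byte, so no prefix is shared. With MSB-first encoding the difference moves to the last byte.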
[PR] Speedup integer functions for 256bit+ vectors [lucene]
rmuir opened a new pull request, #12632: URL: https://github.com/apache/lucene/pull/12632

We can get these functions closer to optimal by just directly converting to 32-bits + `vpmulld`. See https://stackoverflow.com/a/69848057 for the motivation.

You can reproduce my results with `java -jar target/vectorbench.jar -p size=1024 Binary`. See https://github.com/rmuir/vectorbench for instructions. There is a README now! I don't touch java that much so it makes it easier on me.

Skylake (256-bit):

```
Benchmark                                   (size)   Mode  Cnt  Score   Error   Units
BinaryCosineBenchmark.cosineDistanceNew       1024  thrpt    5  3.252 ± 1.457  ops/us
BinaryCosineBenchmark.cosineDistanceNewNew    1024  thrpt    5  3.746 ± 0.069  ops/us
BinaryDotProductBenchmark.dotProductNew       1024  thrpt    5  7.080 ± 0.121  ops/us
BinaryDotProductBenchmark.dotProductNewNew    1024  thrpt    5  8.329 ± 0.288  ops/us
BinarySquareBenchmark.squareDistanceNew       1024  thrpt    5  6.208 ± 0.800  ops/us
BinarySquareBenchmark.squareDistanceNewNew    1024  thrpt    5  7.285 ± 0.629  ops/us
```

I'd appreciate it if someone could test AVX-512. This codepath only impacts 256bit+ vectors so it won't change anything for your mac with 128bit vectors. I will look into that one again separately: I know how to speed it up, but I don't want things ugly. I like this change because it makes the code a little simpler.
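For readers unfamiliar with the three functions being benchmarked, here is a plain scalar sketch of what they compute on `byte[]` vectors. It is not the vectorized code from the PR: the class `ByteVectorScalarSketch` and its method names are hypothetical. The one thing it shares with the description above is that each byte is widened to a 32-bit `int` before multiplying (Java does this promotion automatically), so products and sums do not overflow the 8-bit inputs.

```java
// Hypothetical scalar reference; not the Panama/vectorized implementation from the PR.
final class ByteVectorScalarSketch {

    private static void checkSameLength(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector lengths differ");
        }
    }

    // Sum of element-wise products; bytes are promoted to int before the multiply.
    static int dotProduct(byte[] a, byte[] b) {
        checkSameLength(a, b);
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Sum of squared element-wise differences.
    static int squareDistance(byte[] a, byte[] b) {
        checkSameLength(a, b);
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            int diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    static float cosine(byte[] a, byte[] b) {
        checkSameLength(a, b);
        int dot = 0;
        int normA = 0;
        int normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float) (dot / Math.sqrt((double) normA * (double) normB));
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {4, 5, 6};
        System.out.println("dotProduct     = " + dotProduct(a, b));
        System.out.println("squareDistance = " + squareDistance(a, b));
        System.out.println("cosine         = " + cosine(a, b));
    }
}
```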
Re: [I] Make `byte[]` vector comparisons faster! (if possible) [lucene]
rmuir commented on issue #12621: URL: https://github.com/apache/lucene/issues/12621#issuecomment-1751903073 @benwtrent I looked into this more and eked a bit more out: https://github.com/apache/lucene/pull/12632
Re: [PR] Speedup integer functions for 256bit+ vectors [lucene]
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751926396

I did manage to get a little bit more out of the arm chip. I will look at the other 2 functions there too...

```
Benchmark                                   (size)   Mode  Cnt  Score   Error   Units
BinaryDotProductBenchmark.dotProductNew       1024  thrpt    5  6.135 ± 0.008  ops/us
BinaryDotProductBenchmark.dotProductNewNew    1024  thrpt    5  7.197 ± 0.028  ops/us
```
Re: [PR] Speedup integer functions for 256bit+ vectors [lucene]
gf2121 commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751934272

FYI I ran the benchmark on the [latest benchmark commit](https://github.com/rmuir/vectorbench/commit/ef7e089a75a883d809145d2686e6a4dc1915c106) with a linux-x86-64 server that supports AVX-512.

```
Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
BinaryCosineBenchmark.cosineDistanceNew       1024  thrpt    5   5.637 ± 0.003  ops/us
BinaryCosineBenchmark.cosineDistanceNewNew    1024  thrpt    5   4.942 ± 0.009  ops/us
BinaryCosineBenchmark.cosineDistanceOld       1024  thrpt    5   0.848 ± 0.001  ops/us
BinaryDotProductBenchmark.dotProductNew       1024  thrpt    5  11.717 ± 0.013  ops/us
BinaryDotProductBenchmark.dotProductNewNew    1024  thrpt    5   9.623 ± 0.050  ops/us
BinaryDotProductBenchmark.dotProductOld       1024  thrpt    5   1.953 ± 0.005  ops/us
BinarySquareBenchmark.squareDistanceNew       1024  thrpt    5   8.407 ± 0.020  ops/us
BinarySquareBenchmark.squareDistanceNewNew    1024  thrpt    5   9.057 ± 0.045  ops/us
BinarySquareBenchmark.squareDistanceOld       1024  thrpt    5   1.651 ± 0.001  ops/us
```
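To read the AVX-512 numbers above as relative changes, a small sketch like the following can help. It assumes that `New` is the existing vectorized code and `NewNew` is the variant proposed in the PR, which the thread implies but does not state. The class `ThroughputDeltaSketch` and the method `percentChange` are hypothetical names. The scores are copied from the table above, and since throughput is in ops/us, a negative change means the candidate is slower.

```java
// Hypothetical helper for comparing two throughput scores; higher ops/us is better.
final class ThroughputDeltaSketch {

    // Relative change of candidate versus baseline, in percent.
    static double percentChange(double baseline, double candidate) {
        if (baseline == 0.0) {
            throw new IllegalArgumentException("baseline must be non-zero");
        }
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Scores (ops/us) taken from the AVX-512 run: "New" versus "NewNew".
        System.out.printf("cosine:         %+.1f%%%n", percentChange(5.637, 4.942));
        System.out.printf("dotProduct:     %+.1f%%%n", percentChange(11.717, 9.623));
        System.out.printf("squareDistance: %+.1f%%%n", percentChange(8.407, 9.057));
    }
}
```

On these figures the cosine and dot-product variants come out roughly 12% and 18% slower on AVX-512, while square distance is about 8% faster.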
Re: [PR] Speedup integer functions for 256bit+ vectors [lucene]
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751934622 thanks for running. I will just revert it then and get folks to test arm changes. i don't want to hurt avx 512...
Re: [PR] Speedup integer functions for 256bit+ vectors [lucene]
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751938382 ok i reverted the 256-bit changes from here, and from the vectorbench, but kept the 128 bit ones for ppl to test on macs. Now this issue does the opposite of what it says, i will edit it...
Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751939374 I don't know how to do the same tricks for the BinarySquare one due to the subtraction. So I'm done for now. I think given the reports from @gf2121 the 256/512-bit experiment was a loss :(
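One possible reading of "due to the subtraction" is a value-range issue. This is an assumption; the comment does not spell out the reason. The product of two signed bytes always fits in a signed 16-bit value, so a narrower multiply is conceivable for dot product. The difference of two bytes, however, needs 9 bits, and its square can exceed the signed 16-bit range. The hypothetical `ByteRangeSketch` class below checks both ranges by brute force over all byte pairs.

```java
// Hypothetical brute-force range check; illustrates arithmetic only, not the PR's code.
final class ByteRangeSketch {

    // Largest value of a * b over all pairs of signed bytes.
    static int maxProduct() {
        int max = Integer.MIN_VALUE;
        for (int a = Byte.MIN_VALUE; a <= Byte.MAX_VALUE; a++) {
            for (int b = Byte.MIN_VALUE; b <= Byte.MAX_VALUE; b++) {
                max = Math.max(max, a * b);
            }
        }
        return max;
    }

    // Smallest value of a * b over all pairs of signed bytes.
    static int minProduct() {
        int min = Integer.MAX_VALUE;
        for (int a = Byte.MIN_VALUE; a <= Byte.MAX_VALUE; a++) {
            for (int b = Byte.MIN_VALUE; b <= Byte.MAX_VALUE; b++) {
                min = Math.min(min, a * b);
            }
        }
        return min;
    }

    // Largest value of (a - b)^2 over all pairs of signed bytes.
    static int maxSquaredDifference() {
        int max = 0;
        for (int a = Byte.MIN_VALUE; a <= Byte.MAX_VALUE; a++) {
            for (int b = Byte.MIN_VALUE; b <= Byte.MAX_VALUE; b++) {
                int diff = a - b;
                max = Math.max(max, diff * diff);
            }
        }
        return max;
    }

    static boolean fitsInShort(int value) {
        return value >= Short.MIN_VALUE && value <= Short.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println("product range fits in 16 bits: "
            + (fitsInShort(minProduct()) && fitsInShort(maxProduct())));
        System.out.println("squared difference fits in 16 bits: "
            + fitsInShort(maxSquaredDifference()));
    }
}
```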