Re: [PR] Avoid NPEx if the end of the stream has been reached without reading any characters [lucene]

2023-10-07 Thread via GitHub
pzygielo commented on PR #12611: URL: https://github.com/apache/lucene/pull/12611#issuecomment-1751670565 May I ask for review, please? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

Re: [PR] Avoid NPEx if the end of the stream has been reached without reading any characters [lucene]

2023-10-07 Thread via GitHub
pzygielo commented on PR #12611: URL: https://github.com/apache/lucene/pull/12611#issuecomment-1751671154 @kaivalnp @gf2121

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-10-07 Thread via GitHub
mikemccand commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1751739864 SPANN is another option? https://www.researchgate.net/publication/356282356_SPANN_Highly-efficient_Billion-scale_Approximate_Nearest_Neighbor_Search

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-10-07 Thread via GitHub
mikemccand commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1751740152 (listening to @jbellis talk at Community over Code).

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-10-07 Thread via GitHub
mikemccand commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1751743673 Or perhaps we "just" make a Lucene Codec component (KnnVectorsFormat) that wraps jvector? (https://github.com/jbellis/jvector)

[PR] Write MSB VLong for better outputs sharing in block tree index [lucene]

2023-10-07 Thread via GitHub
gf2121 opened a new pull request, #12631: URL: https://github.com/apache/lucene/pull/12631 closes https://github.com/apache/lucene/issues/12620

Re: [PR] Write MSB VLong for better outputs sharing in block tree index [lucene]

2023-10-07 Thread via GitHub
gf2121 commented on PR #12631: URL: https://github.com/apache/lucene/pull/12631#issuecomment-1751891562 With this change, the total size of `tip` is reduced by ~14% for `wikimediumall`. https://bytedance.feishu.cn/sheets/HSetsPqDrhicnet5lWrcOXMtnRc

[PR] Speedup integer functions for 256bit+ vectors [lucene]

2023-10-07 Thread via GitHub
rmuir opened a new pull request, #12632: URL: https://github.com/apache/lucene/pull/12632 We can get these functions closer to optimal by just directly converting to 32-bits + `vpmulld`. See https://stackoverflow.com/a/69848057 for the motivation. You can reproduce my results

Re: [I] Make `byte[]` vector comparisons faster! (if possible) [lucene]

2023-10-07 Thread via GitHub
rmuir commented on issue #12621: URL: https://github.com/apache/lucene/issues/12621#issuecomment-1751903073 @benwtrent I looked into this more and eked a bit more out: https://github.com/apache/lucene/pull/12632

Re: [PR] Speedup integer functions for 256bit+ vectors [lucene]

2023-10-07 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751926396 I did manage to get a little bit more out of the arm chip. I will look at the other 2 functions there too... ``` Benchmark (size) Mode Cnt Score

Re: [PR] Speedup integer functions for 256bit+ vectors [lucene]

2023-10-07 Thread via GitHub
gf2121 commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751934272 FYI, I ran the benchmark on [latest benchmark commit](https://github.com/rmuir/vectorbench/commit/ef7e089a75a883d809145d2686e6a4dc1915c106) with a linux-x86-64 server that supports AVX-512

Re: [PR] Speedup integer functions for 256bit+ vectors [lucene]

2023-10-07 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751934622 thanks for running. I will just revert it then and get folks to test arm changes. i don't want to hurt avx 512...

Re: [PR] Speedup integer functions for 256bit+ vectors [lucene]

2023-10-07 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751938382 ok i reverted the 256-bit changes from here, and from the vectorbench, but kept the 128 bit ones for ppl to test on macs. Now this issue does the opposite of what it says, i will edit it..

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-07 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751939374 I don't know how to do the same tricks for the BinarySquare one due to the subtraction. So I'm done for now. I think given the reports from @gf2121 the 256/512-bit experiment was a