Re: [PR] Avoid NPEx if the end of the stream has been reached without reading any characters [lucene]

2023-10-07 Thread via GitHub


pzygielo commented on PR #12611:
URL: https://github.com/apache/lucene/pull/12611#issuecomment-1751670565

   May I ask for review, please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Avoid NPEx if the end of the stream has been reached without reading any characters [lucene]

2023-10-07 Thread via GitHub


pzygielo commented on PR #12611:
URL: https://github.com/apache/lucene/pull/12611#issuecomment-1751671154

   @kaivalnp @gf2121





Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-10-07 Thread via GitHub


mikemccand commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1751739864

   SPANN is another option?
   
   
https://www.researchgate.net/publication/356282356_SPANN_Highly-efficient_Billion-scale_Approximate_Nearest_Neighbor_Search





Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-10-07 Thread via GitHub


mikemccand commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1751740152

   (listening to @jbellis's talk at Community over Code).





Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-10-07 Thread via GitHub


mikemccand commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1751743673

   Or perhaps we "just" make a Lucene Codec component (KnnVectorsFormat) that 
wraps jvector?  (https://github.com/jbellis/jvector)





[PR] Write MSB VLong for better outputs sharing in block tree index [lucene]

2023-10-07 Thread via GitHub


gf2121 opened a new pull request, #12631:
URL: https://github.com/apache/lucene/pull/12631

   closes https://github.com/apache/lucene/issues/12620





Re: [PR] Write MSB VLong for better outputs sharing in block tree index [lucene]

2023-10-07 Thread via GitHub


gf2121 commented on PR #12631:
URL: https://github.com/apache/lucene/pull/12631#issuecomment-1751891562

   With this change, the total size of `tip` is reduced by ~14% for `wikimediumall`.
   
   https://bytedance.feishu.cn/sheets/HSetsPqDrhicnet5lWrcOXMtnRc
   
   file | baseline | candidate | diff
   -- | -- | -- | --
   _32_Lucene90_0.tip | 5166889 | 4385234 | -15.13%
   _65_Lucene90_0.tip | 5192450 | 4402679 | -15.21%
   _98_Lucene90_0.tip | 5555704 | 4721674 | -15.01%
   _cb_Lucene90_0.tip | 5591898 | 4761395 | -14.85%
   _fe_Lucene90_0.tip | 5549684 | 4718536 | -14.98%
   _fp_Lucene90_0.tip | 776845 | 716283 | -7.80%
   _g0_Lucene90_0.tip | 775054 | 715176 | -7.73%
   _gb_Lucene90_0.tip | 745250 | 685730 | -7.99%
   _gm_Lucene90_0.tip | 738168 | 678853 | -8.04%
   _gx_Lucene90_0.tip | 880015 | 809570 | -8.00%
   _gy_Lucene90_0.tip | 151737 | 142043 | -6.39%
   _gz_Lucene90_0.tip | 133664 | 125057 | -6.44%
   _h0_Lucene90_0.tip | 116959 | 109360 | -6.50%
   _h1_Lucene90_0.tip | 120897 | 112915 | -6.60%
   _h2_Lucene90_0.tip | 111570 | 104185 | -6.62%
   sum | 31606784 | 27188690 | -13.98%
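
For readers unfamiliar with the encoding named in the PR title, here is a minimal sketch (not the PR's code) that contrasts the usual LSB-first variable-length long layout with an MSB-first layout. The class and method names are hypothetical. It assumes the index stores its outputs (for example file pointers) in a structure that shares common leading bytes, so two nearby values share more when their high-order bits are written first.

```java
import java.io.ByteArrayOutputStream;

final class MsbVLongSketch {

    // LSB-first VLong: low 7 bits first, high bit of each byte marks "more bytes follow".
    static byte[] writeLsbVLong(long v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
        return out.toByteArray();
    }

    // Hypothetical MSB-first variant: most significant 7-bit group first.
    static byte[] writeMsbVLong(long v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int bits = 64 - Long.numberOfLeadingZeros(v | 1L); // at least 1 bit
        int groups = (bits + 6) / 7;                       // number of 7-bit groups
        for (int i = groups - 1; i > 0; i--) {
            out.write((int) (((v >>> (i * 7)) & 0x7F) | 0x80));
        }
        out.write((int) (v & 0x7F)); // last byte has the continuation bit clear
        return out.toByteArray();
    }

    // Decoder matching writeMsbVLong.
    static long readMsbVLong(byte[] bytes) {
        long v = 0;
        for (byte b : bytes) {
            v = (v << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) {
                break;
            }
        }
        return v;
    }

    // Number of leading bytes two encodings have in common.
    static int commonPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        long x = 1_000_000L;
        long y = 1_000_050L;
        System.out.println("LSB-first shared bytes: "
            + commonPrefix(writeLsbVLong(x), writeLsbVLong(y)));
        System.out.println("MSB-first shared bytes: "
            + commonPrefix(writeMsbVLong(x), writeMsbVLong(y)));
    }
}
```

For the two nearby values in `main`, the MSB-first encodings share their first two bytes, while the LSB-first encodings differ from the first byte on.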
   
   





[PR] Speedup integer functions for 256bit+ vectors [lucene]

2023-10-07 Thread via GitHub


rmuir opened a new pull request, #12632:
URL: https://github.com/apache/lucene/pull/12632

   We can get these functions closer to optimal by just directly converting to 
32-bits + `vpmulld`. 
   
   See https://stackoverflow.com/a/69848057 for the motivation.
   
   You can reproduce my results with a `java -jar target/vectorbench.jar -p 
size=1024 Binary`. See https://github.com/rmuir/vectorbench for instructions. 
There is a README now! I don't touch java that much so it makes it easier on me.
   
   Skylake (256-bit):
   
   ```
   Benchmark                                   (size)   Mode  Cnt  Score   Error   Units
   BinaryCosineBenchmark.cosineDistanceNew       1024  thrpt    5  3.252 ± 1.457  ops/us
   BinaryCosineBenchmark.cosineDistanceNewNew    1024  thrpt    5  3.746 ± 0.069  ops/us
   BinaryDotProductBenchmark.dotProductNew       1024  thrpt    5  7.080 ± 0.121  ops/us
   BinaryDotProductBenchmark.dotProductNewNew    1024  thrpt    5  8.329 ± 0.288  ops/us
   BinarySquareBenchmark.squareDistanceNew       1024  thrpt    5  6.208 ± 0.800  ops/us
   BinarySquareBenchmark.squareDistanceNewNew    1024  thrpt    5  7.285 ± 0.629  ops/us
   ```
   
   I'd appreciate it if someone could test AVX-512. This codepath only impacts 
256bit+ vectors so it won't change anything for your mac with 128bit vectors. I 
will look into that one again separately: I know how to speed it up, but I 
don't want things ugly. I like this change because it makes the code a little 
simpler.
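
As a rough illustration of "directly converting to 32-bits" before multiplying, here is a scalar model of the per-lane arithmetic for a `byte[]` dot product. It is not the PR's code and uses no SIMD. The class and method names are hypothetical, and the 16-bit variant is only an assumed baseline for comparison: the product of two bytes is at most 128 * 128 = 16384 in magnitude, so it fits in 16 bits and could be widened afterwards.

```java
final class WidenThenMultiplySketch {

    // Assumed baseline: multiply in a 16-bit lane, then widen to 32 bits to accumulate.
    static int dotViaShort(byte[] a, byte[] b) {
        int acc = 0;
        for (int i = 0; i < a.length; i++) {
            short prod = (short) (a[i] * b[i]); // |prod| <= 16384, no 16-bit overflow
            acc += prod;                        // widened to int on accumulation
        }
        return acc;
    }

    // Direct variant: widen each byte to 32 bits first, then do a 32-bit multiply
    // (the scalar counterpart of a packed 32-bit multiply such as vpmulld).
    static int dotViaInt(byte[] a, byte[] b) {
        int acc = 0;
        for (int i = 0; i < a.length; i++) {
            int x = a[i]; // sign-extending byte -> int
            int y = b[i];
            acc += x * y;
        }
        return acc;
    }

    public static void main(String[] args) {
        byte[] a = {-128, 127, 5, -7};
        byte[] b = {-128, 127, -3, 2};
        System.out.println(dotViaShort(a, b));
        System.out.println(dotViaInt(a, b));
    }
}
```

Both methods return the same value for any input. Which one is faster when vectorized depends on the CPU, which is what the benchmark measures.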





Re: [I] Make `byte[]` vector comparisons faster! (if possible) [lucene]

2023-10-07 Thread via GitHub


rmuir commented on issue #12621:
URL: https://github.com/apache/lucene/issues/12621#issuecomment-1751903073

   @benwtrent I looked into this more and eked a bit more out: 
https://github.com/apache/lucene/pull/12632





Re: [PR] Speedup integer functions for 256bit+ vectors [lucene]

2023-10-07 Thread via GitHub


rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751926396

   I did manage to get a little bit more out of the arm chip. I will look at 
the other 2 functions there too...
   ```
   Benchmark                                   (size)   Mode  Cnt  Score   Error   Units
   BinaryDotProductBenchmark.dotProductNew       1024  thrpt    5  6.135 ± 0.008  ops/us
   BinaryDotProductBenchmark.dotProductNewNew    1024  thrpt    5  7.197 ± 0.028  ops/us
   ```





Re: [PR] Speedup integer functions for 256bit+ vectors [lucene]

2023-10-07 Thread via GitHub


gf2121 commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751934272

   FYI, I ran the benchmark on the [latest benchmark commit](https://github.com/rmuir/vectorbench/commit/ef7e089a75a883d809145d2686e6a4dc1915c106)
 on a linux-x86-64 server that supports AVX-512.
   
   ```
   Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
   BinaryCosineBenchmark.cosineDistanceNew       1024  thrpt    5   5.637 ± 0.003  ops/us
   BinaryCosineBenchmark.cosineDistanceNewNew    1024  thrpt    5   4.942 ± 0.009  ops/us
   BinaryCosineBenchmark.cosineDistanceOld       1024  thrpt    5   0.848 ± 0.001  ops/us
   BinaryDotProductBenchmark.dotProductNew       1024  thrpt    5  11.717 ± 0.013  ops/us
   BinaryDotProductBenchmark.dotProductNewNew    1024  thrpt    5   9.623 ± 0.050  ops/us
   BinaryDotProductBenchmark.dotProductOld       1024  thrpt    5   1.953 ± 0.005  ops/us
   BinarySquareBenchmark.squareDistanceNew       1024  thrpt    5   8.407 ± 0.020  ops/us
   BinarySquareBenchmark.squareDistanceNewNew    1024  thrpt    5   9.057 ± 0.045  ops/us
   BinarySquareBenchmark.squareDistanceOld       1024  thrpt    5   1.651 ± 0.001  ops/us
   ```
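
To read the scores above as relative changes, the small sketch below computes the percentage difference in throughput between each `New` and `NewNew` pair. It assumes `New` is the current implementation and `NewNew` is the candidate from this PR, which is an interpretation of the benchmark names. The class name is hypothetical.

```java
final class RelativeChangeSketch {

    // Percentage change in throughput going from `before` to `after` (ops/us).
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        String[] names = {"cosineDistance", "dotProduct", "squareDistance"};
        double[] before = {5.637, 11.717, 8.407}; // *New scores from the table above
        double[] after = {4.942, 9.623, 9.057};   // *NewNew scores from the table above
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%s: %+.1f%%%n", names[i], percentChange(before[i], after[i]));
        }
    }
}
```

Under that assumption, throughput drops by roughly 12% for cosine and 18% for dot product on this machine, and rises by roughly 8% for square distance.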





Re: [PR] Speedup integer functions for 256bit+ vectors [lucene]

2023-10-07 Thread via GitHub


rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751934622

   thanks for running. I will just revert it then and get folks to test arm 
changes. i don't want to hurt avx 512...





Re: [PR] Speedup integer functions for 256bit+ vectors [lucene]

2023-10-07 Thread via GitHub


rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751938382

   ok i reverted the 256-bit changes from here, and from the vectorbench, but 
kept the 128 bit ones for ppl to test on macs. Now this issue does the opposite 
of what it says, i will edit it...





Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-07 Thread via GitHub


rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1751939374

   I don't know how to do the same tricks for the BinarySquare one due to the 
subtraction.
   
   So I'm done for now. I think given the reports from @gf2121 the 256/512-bit 
experiment was a loss :(

