[GitHub] [lucene] tang-hi opened a new issue, #12396: Make ForUtil Vectorized

via GitHub Sun, 25 Jun 2023 09:50:32 -0700


tang-hi opened a new issue, #12396:
URL: https://github.com/apache/lucene/issues/12396


   ### Description
   
   Since the Vector API was introduced into Lucene via #12311, I have found it
to be an interesting tool, so I tried using it to rewrite
[ForUtil.java](lucene/core/src/java/org/apache/lucene/codecs/lucene90/ForUtil.java).
As a first step, I ported
[simdcomp](https://github.com/lemire/simdcomp/blob/0c900190bbc0970f4342f2ea778564c95ca725cf/src/simdbitpacking.c)
and implemented pack and unpack for bits-per-value widths of 8 or less. The
benchmark results are below.
   
   | Benchmark | Mode | Cnt | Score | Error | Units |
   |-----------|------|-----|-------|-------|-------|
   | VectorizeBenchmark.decode1 | thrpt | 15 | 58293945.447 | ± 1524549.587 | ops/s |
   | VectorizeBenchmark.decode2 | thrpt | 15 | 55598229.538 | ± 4516920.135 | ops/s |
   | VectorizeBenchmark.decode3 | thrpt | 15 | 57163871.965 | ± 1566246.490 | ops/s |
   | VectorizeBenchmark.decode4 | thrpt | 15 | 55128874.528 | ± 4752397.170 | ops/s |
   | VectorizeBenchmark.decode5 | thrpt | 15 | 53822335.729 | ± 4599217.489 | ops/s |
   | VectorizeBenchmark.decode6 | thrpt | 15 | 48155246.120 | ± 7519360.551 | ops/s |
   | VectorizeBenchmark.decode7 | thrpt | 15 | 50253799.192 | ± 820075.648 | ops/s |
   | VectorizeBenchmark.decode8 | thrpt | 15 | 68849728.856 | ± 1818468.973 | ops/s |
   | VectorizeBenchmark.encode1 | thrpt | 15 | 33998510.772 | ± 2924618.992 | ops/s |
   | VectorizeBenchmark.encode2 | thrpt | 15 | 43238190.552 | ± 810373.966 | ops/s |
   | VectorizeBenchmark.encode3 | thrpt | 15 | 36613553.485 | ± 483115.838 | ops/s |
   | VectorizeBenchmark.encode4 | thrpt | 15 | 45675726.831 | ± 1081153.655 | ops/s |
   | VectorizeBenchmark.encode5 | thrpt | 15 | 33591855.278 | ± 1084009.112 | ops/s |
   | VectorizeBenchmark.encode6 | thrpt | 15 | 36110726.127 | ± 767075.709 | ops/s |
   | VectorizeBenchmark.encode7 | thrpt | 15 | 34754339.379 | ± 275025.123 | ops/s |
   | VectorizeBenchmark.encode8 | thrpt | 15 | 55075742.358 | ± 991165.320 | ops/s |
   | VectorizeBenchmark.vectorizedDecode1 | thrpt | 15 | 43878020.796 | ± 7148545.623 | ops/s |
   | VectorizeBenchmark.vectorizedDecode2 | thrpt | 15 | 103091446.773 | ± 44115190.011 | ops/s |
   | VectorizeBenchmark.vectorizedDecode3 | thrpt | 15 | 83168059.373 | ± 24930903.852 | ops/s |
   | VectorizeBenchmark.vectorizedDecode4 | thrpt | 15 | 63156089.355 | ± 15039408.293 | ops/s |
   | VectorizeBenchmark.vectorizedDecode5 | thrpt | 15 | 96567546.695 | ± 37142784.493 | ops/s |
   | VectorizeBenchmark.vectorizedDecode6 | thrpt | 15 | 73897063.180 | ± 11549757.437 | ops/s |
   | VectorizeBenchmark.vectorizedDecode7 | thrpt | 15 | 79716185.567 | ± 29990852.039 | ops/s |
   | VectorizeBenchmark.vectorizedDecode8 | thrpt | 15 | 92621676.617 | ± 29702056.667 | ops/s |
   | VectorizeBenchmark.vectorizedEncode1 | thrpt | 15 | 51140300.852 | ± 139758.385 | ops/s |
   | VectorizeBenchmark.vectorizedEncode2 | thrpt | 15 | 82646100.574 | ± 1289600.954 | ops/s |
   | VectorizeBenchmark.vectorizedEncode3 | thrpt | 15 | 88124485.953 | ± 742170.198 | ops/s |
   | VectorizeBenchmark.vectorizedEncode4 | thrpt | 15 | 91029285.467 | ± 5594858.437 | ops/s |
   | VectorizeBenchmark.vectorizedEncode5 | thrpt | 15 | 96843051.648 | ± 8024430.836 | ops/s |
   | VectorizeBenchmark.vectorizedEncode6 | thrpt | 15 | 98596724.128 | ± 10068466.227 | ops/s |
   | VectorizeBenchmark.vectorizedEncode7 | thrpt | 15 | 85885746.715 | ± 6031740.563 | ops/s |
   | VectorizeBenchmark.vectorizedEncode8 | thrpt | 15 | 117139889.194 | ± 8721517.095 | ops/s |
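
For readers unfamiliar with what "pack" and "unpack" mean here, the following is a minimal scalar sketch of fixed-width bit packing for widths of 1 to 8 bits. It is an illustration only: the class and method names (`BitPackSketch`, `pack`, `unpack`) are hypothetical, the byte layout is a simple back-to-back one chosen for clarity, and it is not the layout used by simdcomp or by Lucene's `ForUtil`, nor the code that was benchmarked above.

```java
import java.util.Arrays;

/** Scalar reference for fixed-width bit packing (hypothetical names and layout). */
final class BitPackSketch {

  /** Packs each value's low bitsPerValue bits back to back, lowest bits first. */
  static byte[] pack(int[] values, int bitsPerValue) {
    if (bitsPerValue < 1 || bitsPerValue > 8) {
      throw new IllegalArgumentException("bitsPerValue must be in [1, 8]");
    }
    byte[] out = new byte[(values.length * bitsPerValue + 7) / 8];
    int mask = (1 << bitsPerValue) - 1;
    int bitPos = 0;
    for (int v : values) {
      int bits = v & mask;
      int byteIndex = bitPos >>> 3; // byte that holds the first bit of this value
      int shift = bitPos & 7;       // bit offset inside that byte
      out[byteIndex] |= (byte) (bits << shift);
      if (shift + bitsPerValue > 8) {
        // The value straddles a byte boundary: spill the high bits into the next byte.
        out[byteIndex + 1] |= (byte) (bits >>> (8 - shift));
      }
      bitPos += bitsPerValue;
    }
    return out;
  }

  /** Reverses pack(): reads count values of bitsPerValue bits each. */
  static int[] unpack(byte[] packed, int count, int bitsPerValue) {
    if (bitsPerValue < 1 || bitsPerValue > 8) {
      throw new IllegalArgumentException("bitsPerValue must be in [1, 8]");
    }
    int[] out = new int[count];
    int mask = (1 << bitsPerValue) - 1;
    int bitPos = 0;
    for (int i = 0; i < count; i++) {
      int byteIndex = bitPos >>> 3;
      int shift = bitPos & 7;
      int bits = (packed[byteIndex] & 0xFF) >>> shift;
      if (shift + bitsPerValue > 8) {
        bits |= (packed[byteIndex + 1] & 0xFF) << (8 - shift);
      }
      out[i] = bits & mask;
      bitPos += bitsPerValue;
    }
    return out;
  }

  public static void main(String[] args) {
    int[] values = {1, 0, 5, 7, 3, 2, 6, 4};
    byte[] packed = pack(values, 3);
    System.out.println(packed.length + " bytes: " + Arrays.toString(unpack(packed, values.length, 3)));
  }
}
```

A vectorized version does the same shifts and masks on many values per instruction, which is why the choice of layout matters: a layout that keeps values from crossing lane boundaries is easier to vectorize than the back-to-back layout sketched here.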
   
   However, I noticed that the compression format used in
[ForUtil.java](lucene/core/src/java/org/apache/lucene/codecs/lucene90/ForUtil.java)
is different. It already uses some tricks to speed things up, such as
SIMD-like processing. I therefore tried to vectorize it while keeping the
compression format unchanged. The results are below.
   
   | Benchmark | Mode | Cnt | Score | Error | Units |
   |-----------|------|-----|-------|-------|-------|
   | Benchmark.encode1 | thrpt | 15 | 38017254.040 | ± 3905466.628 | ops/s |
   | Benchmark.encode2 | thrpt | 15 | 45170109.395 | ± 1539203.478 | ops/s |
   | Benchmark.encode3 | thrpt | 15 | 38757256.653 | ± 1044709.221 | ops/s |
   | Benchmark.encode4 | thrpt | 15 | 49307206.891 | ± 799168.007 | ops/s |
   | Benchmark.encode5 | thrpt | 15 | 35130626.548 | ± 792210.817 | ops/s |
   | Benchmark.encode6 | thrpt | 15 | 38326892.073 | ± 981865.963 | ops/s |
   | Benchmark.encode7 | thrpt | 15 | 37372342.721 | ± 1177478.683 | ops/s |
   | Benchmark.encode8 | thrpt | 15 | 60757390.416 | ± 458876.638 | ops/s |
   | Benchmark.vectorizedEncode1 | thrpt | 15 | 56413094.655 | ± 435917.201 | ops/s |
   | Benchmark.vectorizedEncode2 | thrpt | 15 | 88770400.646 | ± 11183716.176 | ops/s |
   | Benchmark.vectorizedEncode3 | thrpt | 15 | 39932842.378 | ± 2366921.921 | ops/s |
   | Benchmark.vectorizedEncode4 | thrpt | 15 | 85888128.739 | ± 5499354.172 | ops/s |
   | Benchmark.vectorizedEncode5 | thrpt | 15 | 34402027.732 | ± 1414839.159 | ops/s |
   | Benchmark.vectorizedEncode6 | thrpt | 15 | 35794303.501 | ± 782940.005 | ops/s |
   | Benchmark.vectorizedEncode7 | thrpt | 15 | 33845690.180 | ± 2586648.353 | ops/s |
   | Benchmark.vectorizedEncode8 | thrpt | 15 | 97914288.675 | ± 8971857.035 | ops/s |
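
To make the "SIMD-like" idea concrete, here is a minimal sketch of the general technique of treating one `long` as eight 8-bit lanes, so that a single shift, OR, or mask acts on eight values at once. It assumes 4-bit inputs and uses hypothetical names (`SwarSketch`, `toLanes`, `fromLanes`, `encode4`, `decode4`, `LOW_NIBBLES`). It illustrates the general trick only and is not a copy of Lucene's `ForUtil` code or of its on-disk format.

```java
import java.util.Arrays;

/** Lanes-in-a-long illustration (hypothetical names; not Lucene's actual layout). */
final class SwarSketch {

  /** Selects the low 4 bits of each of the eight 8-bit lanes. */
  static final long LOW_NIBBLES = 0x0F0F0F0F0F0F0F0FL;

  /** Places eight values into the eight 8-bit lanes of one long, first value in the top lane. */
  static long toLanes(int[] values, int offset) {
    long lanes = 0;
    for (int i = 0; i < 8; i++) {
      lanes = (lanes << 8) | (values[offset + i] & 0xFFL);
    }
    return lanes;
  }

  /** Reverses toLanes(): writes the eight lanes back out as eight ints. */
  static void fromLanes(long lanes, int[] out, int offset) {
    for (int i = 7; i >= 0; i--) {
      out[offset + i] = (int) (lanes & 0xFF);
      lanes >>>= 8;
    }
  }

  /** Packs sixteen 4-bit values into one long; the shift and OR each touch all eight lanes. */
  static long encode4(int[] values) {
    long first = toLanes(values, 0) & LOW_NIBBLES;  // values 0..7, one per lane
    long second = toLanes(values, 8) & LOW_NIBBLES; // values 8..15, one per lane
    return (first << 4) | second;                   // high nibble: first, low nibble: second
  }

  /** Unpacks sixteen 4-bit values; each mask/shift again handles eight values at once. */
  static int[] decode4(long packed) {
    int[] out = new int[16];
    fromLanes((packed >>> 4) & LOW_NIBBLES, out, 0);
    fromLanes(packed & LOW_NIBBLES, out, 8);
    return out;
  }

  public static void main(String[] args) {
    int[] values = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
    long packed = encode4(values);
    System.out.println(Long.toHexString(packed) + " -> " + Arrays.toString(decode4(packed)));
  }
}
```

Because this style already processes several values per scalar instruction, an explicitly vectorized rewrite that has to keep the same format may gain less than one free to pick a layout suited to wider registers.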
   
   So far I have only implemented bits-per-value widths of 1, 2, 4, and 8.
Would it be possible to change the compression format? Also, are there any
best practices for integrating vectorized code into Lucene? Any suggestions
would be appreciated.
   
   For now, the work lives in my own
[repo](https://github.com/tang-hi/forutil). The code is still in a rough
state and lacks documentation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


