mikemccand commented on PR #12633:
URL: https://github.com/apache/lucene/pull/12633#issuecomment-1751999625

   Here are the results from running `test_all_sizes.py` then 
`results_to_md.py`:
   
   |NodeHash size|FST size (MB)|RAM (MB)|FST build time (sec)|
   |-------------|--------|--------|----------------|
   |0|577.4|0.0|35.2|
   |4|586.5|0.0|43.2|
   |8|587.0|0.0|46.4|
   |16|585.2|0.0|44.8|
   |32|582.0|0.0|45.9|
   |64|578.8|0.0|45.4|
   |128|573.0|0.0|45.9|
   |256|563.6|0.0|46.1|
   |512|551.2|0.0|45.4|
   |1024|537.5|0.0|45.7|
   |2048|523.4|0.0|46.0|
   |4096|509.5|0.1|45.6|
   |8192|495.8|0.1|45.2|
   |16384|481.8|0.2|46.3|
   |32768|461.1|0.5|45.2|
   |65536|447.2|1.0|45.7|
   |131072|432.4|2.0|46.3|
   |262144|418.6|4.0|46.3|
   |524288|402.4|8.0|46.9|
   |1048576|391.0|16.0|50.0|
   |2097152|380.8|32.0|55.2|
   |4194304|371.4|64.0|58.3|
   |8388608|362.5|128.0|59.9|
   |16777216|356.1|256.0|59.3|
   |33554432|351.4|512.0|57.3|
   |67108864|350.2|1024.0|52.6|
   |134217728|350.2|2048.0|49.2|
   |268435456|350.2|4096.0|48.4|
   |536870912|350.2|8192.0|46.9|
   |1073741824|350.2|16384.0|44.5|
   
   One WTF (wow that's funny) is why a `NodeHash` size of 0 (no suffix sharing) 
creates a smaller FST (577.4 MB) than the tiny `NodeHash` sizes 4 through 64: 
FST size should never grow as the `NodeHash` gets larger, since the `NodeHash` 
only enables sharing of suffixes.  Maybe something about the loss of locality 
of the FST suffix nodes causes more bytes to be spent referring to them later?  
Confusing.
   
   Another observation is that it takes quite a lot of RAM to bring the final 
FST size close to its optimal / minimal size (350.2 MB): with 64 MB of RAM the 
FST is still 371.4 MB, and the minimum is only reached at 1024 MB.
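
   The RAM/size tradeoff can be read off the table mechanically. This small helper (the class name `NodeHashTradeoff` and the tolerance thresholds are illustrative choices, not part of the benchmark scripts) copies the rows above and finds the smallest `NodeHash` size whose FST is within a given fraction of the minimal size.

```java
import java.util.Locale;

// Rows copied from the benchmark table: {NodeHash size, FST size (MB), RAM (MB)}.
// Hypothetical helper for reading the table; not part of the benchmark scripts.
class NodeHashTradeoff {
  static final double[][] ROWS = {
    {0, 577.4, 0.0}, {4, 586.5, 0.0}, {8, 587.0, 0.0},
    {16, 585.2, 0.0}, {32, 582.0, 0.0}, {64, 578.8, 0.0},
    {128, 573.0, 0.0}, {256, 563.6, 0.0}, {512, 551.2, 0.0},
    {1024, 537.5, 0.0}, {2048, 523.4, 0.0}, {4096, 509.5, 0.1},
    {8192, 495.8, 0.1}, {16384, 481.8, 0.2}, {32768, 461.1, 0.5},
    {65536, 447.2, 1.0}, {131072, 432.4, 2.0}, {262144, 418.6, 4.0},
    {524288, 402.4, 8.0}, {1048576, 391.0, 16.0}, {2097152, 380.8, 32.0},
    {4194304, 371.4, 64.0}, {8388608, 362.5, 128.0}, {16777216, 356.1, 256.0},
    {33554432, 351.4, 512.0}, {67108864, 350.2, 1024.0}, {134217728, 350.2, 2048.0},
    {268435456, 350.2, 4096.0}, {536870912, 350.2, 8192.0}, {1073741824, 350.2, 16384.0},
  };

  /** Smallest FST size in the table. */
  static double minFstMb() {
    double min = Double.MAX_VALUE;
    for (double[] row : ROWS) {
      min = Math.min(min, row[1]);
    }
    return min;
  }

  /** First (smallest NodeHash) row whose FST is within `tolerance` of the minimum, or null. */
  static double[] smallestWithin(double tolerance) {
    double limit = minFstMb() * (1 + tolerance);
    for (double[] row : ROWS) { // rows are sorted by ascending NodeHash size
      if (row[1] <= limit) {
        return row;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    for (double tolerance : new double[] {0.10, 0.05, 0.01, 0.0}) {
      double[] row = smallestWithin(tolerance);
      System.out.printf(Locale.ROOT, "within %4.1f%%: NodeHash size %d -> FST %.1f MB, RAM %.1f MB%n",
          tolerance * 100, (long) row[0], row[1], row[2]);
    }
  }
}
```

   On these numbers, staying within 10% of the minimum needs 32 MB of RAM (size 2097152), within 5% needs 128 MB, within 1% needs 512 MB, and hitting 350.2 MB exactly needs 1024 MB.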
   
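   It's also curious how the FST build time changes with a larger `NodeHash`: 
it jumps from 35.2 sec (size 0) to ~45 sec for small sizes, climbs to ~60 sec 
for sizes 4194304 to 16777216, then falls back to ~45 sec at the largest sizes. 
Maybe the increase is just the added cost of maintaining/cycling the 
double-barrel hash (and promoting entries from the "old" to the "new" barrel)?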
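
   As a way to picture where that cost could come from, here is a simplified, hypothetical double-barrel hash built on `java.util.HashMap`; the class `DoubleBarrelHash` and its methods are assumptions for illustration and do not reproduce Lucene's actual `NodeHash`. Inserts go into the "new" barrel; when it fills, the "old" barrel is dropped and the barrels rotate, and a hit in the "old" barrel is moved into the "new" one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a bounded "double barrel" hash; NOT Lucene's NodeHash.
// Inserts go into the "new" barrel. When it is full, the old barrel is dropped,
// the new barrel becomes old, and an empty new barrel is started. A hit in the old
// barrel is promoted (moved) into the new barrel so recently used entries survive
// the next cycle. Total size stays <= 2 * barrelCapacity. Null values are not
// supported (null means "absent").
class DoubleBarrelHash<K, V> {
  private final int barrelCapacity;
  private Map<K, V> fresh = new HashMap<>();
  private Map<K, V> old = new HashMap<>();
  private long promotions;
  private long cycles;

  DoubleBarrelHash(int barrelCapacity) {
    if (barrelCapacity <= 0) {
      throw new IllegalArgumentException("barrelCapacity must be positive");
    }
    this.barrelCapacity = barrelCapacity;
  }

  /** Looks up a key, promoting it from the old barrel to the new one on a hit. */
  V get(K key) {
    V value = fresh.get(key);
    if (value == null) {
      value = old.get(key);
      if (value != null) {
        promotions++;
        put(key, value); // extra work: move the entry into the new barrel
      }
    }
    return value;
  }

  void put(K key, V value) {
    if (fresh.size() >= barrelCapacity && !fresh.containsKey(key)) {
      old = fresh; // the previous old barrel is discarded here
      fresh = new HashMap<>();
      cycles++;
    }
    old.remove(key); // keep each key in at most one barrel
    fresh.put(key, value);
  }

  int size() { return fresh.size() + old.size(); }

  long promotions() { return promotions; }

  long cycles() { return cycles; }

  public static void main(String[] args) {
    DoubleBarrelHash<String, Integer> hash = new DoubleBarrelHash<>(2);
    hash.put("a", 1);
    hash.put("b", 2);
    hash.put("c", 3); // new barrel full: rotate, {a, b} become the old barrel
    hash.get("a"); // hit in the old barrel: promoted
    hash.put("d", 4); // rotate again: "b" was never promoted, so it is dropped
    System.out.println("b present: " + (hash.get("b") != null));
    System.out.println("promotions=" + hash.promotions() + ", cycles=" + hash.cycles());
  }
}
```

   In this model each promotion is an extra insert and each rotation allocates a fresh barrel, so a hash that rotates often does more work per lookup; with a very large capacity the barrels rarely fill, which would be one plausible reason the cost drops again at the largest sizes.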
   
   I will try soonish to post a similar table from `main` (unbounded 
`NodeHash`), comparing this approach with tuning the god-like knobs that 
control RAM usage during FST compilation there.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
