mayya-sharipova commented on PR #11743:
URL: https://github.com/apache/lucene/pull/11743#issuecomment-1236414077

   @msokolov Thanks for your feedback.
   
   I ran a similar analysis, comparing the 9.3 branch against this change on 
`glove-100-angular` with 1 million documents, M: 16, efConstruction: 100.
   
   **Results with 9.3:**
   
   IndexingChain::ramBytesUsed() reports 497089200 bytes, or **497 MB**
   
   **Results with current change:**
   
   IndexingChain::ramBytesUsed() reports memory vectors: 497075904 bytes; memory 
graph: 379073392 bytes; so total: **876 MB**
   
   
   So you are right: much more memory is used during indexing. The graph needs 
at least 16 * M * number_nodes bytes:
   -  2 * M neighbours for each node on the lowest level * 8 bytes (4 bytes for 
the neighbour node number + 4 bytes for the neighbour score)
   - so indeed, if the indexing memory buffer is set lower than that, we would 
end up with many more segments, which is not desirable.
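
As a rough sanity check of the estimate above, here is a minimal Python sketch that plugs in the numbers from this run (M = 16, 1 million nodes). The function and variable names are made up for illustration and are not Lucene APIs; the byte counts are the ones reported above.

```python
# Lower bound on graph memory, per the estimate above: each node keeps up to
# 2 * M neighbours on the lowest level, and each neighbour costs
# 4 bytes (node number) + 4 bytes (score).
M = 16                     # max connections, from this benchmark run
NUM_NODES = 1_000_000      # documents in glove-100-angular for this run
BYTES_PER_NEIGHBOUR = 4 + 4


def graph_lower_bound_bytes(m: int, num_nodes: int) -> int:
    """Hypothetical helper: 2 * M * 8 bytes per node = 16 * M * num_nodes."""
    return 2 * m * BYTES_PER_NEIGHBOUR * num_nodes


lower_bound = graph_lower_bound_bytes(M, NUM_NODES)

# Values reported by ramBytesUsed() with the current change.
reported_vectors = 497_075_904
reported_graph = 379_073_392
total = reported_vectors + reported_graph

print(lower_bound)  # 256000000
print(total)        # 876149296
```

The reported graph size (about 379 MB) is above the 256 MB lower bound, which is expected since the estimate only counts the lowest level and ignores any other per-node overhead.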
   
   
   ----
   > I wonder if we should consider rolling back the "build graph during 
indexing" change? It seems to make indexing take > 10% longer and of course 
requires more RAM, which will tend to make more and smaller segments; not a 
desirable outcome.
   
   Thanks for the suggestion. I will discuss this with our team on Tuesday and 
get back to you.
   
   One thing I wonder about: we did not observe a longer total indexing time 
(combined indexing + refresh time). Did the combined indexing + refresh time 
become larger for you?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

