Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

via GitHub Wed, 22 May 2024 09:44:15 -0700


jpountz commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2125281483


   This is an interesting idea!
   
   You do not mention it explicitly in the issue description, but presumably 
this only makes sense if an index sort is configured; otherwise merges may 
break the clustering that you are trying to create in the first place?
   
   > The DocumentWriterThreadPool will now maintain a [distinct pool of 
DWPTs](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java#L47)
 for each possible outcome.
   
   I'm a bit uncomfortable with this approach. It is so heavy that it wouldn't 
perform much better than maintaining a separate `IndexWriter` per group? I 
wonder if we could do something within a single DWPT pool, e.g. could we use 
rendezvous hashing to optimistically try to reuse the same DWPT for the same 
group as often as possible, but only on a best-effort basis, not trading 
concurrency or creating more DWPTs than indexing concurrency requires?
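
To make the rendezvous hashing idea concrete, here is a minimal sketch, assuming the pool can be modeled as a fixed number of slots that are each either free or busy. `RendezvousSlotPicker`, `weight` and `pick` are hypothetical names for illustration, not Lucene APIs, and the real `DocumentsWriterPerThreadPool` would need locking and DWPT lifecycle handling on top of this. For a given group, every slot gets a pseudo-random weight and the free slot with the highest weight wins, so the same group tends to land on the same slot whenever that slot is available.

```java
import java.util.BitSet;

/** Hypothetical sketch, not Lucene code: picks a free slot for a group via rendezvous hashing. */
final class RendezvousSlotPicker {
  private final int numSlots;

  RendezvousSlotPicker(int numSlots) {
    if (numSlots <= 0) {
      throw new IllegalArgumentException("numSlots must be positive");
    }
    this.numSlots = numSlots;
  }

  /** Mixes the group hash and the slot id into a pseudo-random weight (SplitMix64-style finalizer). */
  static long weight(int groupHash, int slot) {
    long z = (((long) groupHash) << 32) ^ (slot & 0xFFFFFFFFL);
    z += 0x9E3779B97F4A7C15L;
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
    return z ^ (z >>> 31);
  }

  /**
   * Returns the free slot with the highest weight for this group, or -1 if every slot is busy.
   * Busy slots are skipped, so affinity is best-effort and never blocks on a preferred slot.
   */
  int pick(Object group, BitSet busy) {
    int groupHash = group.hashCode();
    int best = -1;
    long bestWeight = 0;
    for (int slot = busy.nextClearBit(0); slot < numSlots; slot = busy.nextClearBit(slot + 1)) {
      long w = weight(groupHash, slot);
      if (best == -1 || w > bestWeight) {
        best = slot;
        bestWeight = w;
      }
    }
    return best;
  }
}
```

In this sketch a caller that gets -1 back would fall back to whatever the pool does today when no DWPT is free, so the number of slots stays bounded by indexing concurrency and not by the number of groups.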


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

