RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2126884759

   Thanks Mike and Adrian for the feedback.
   
   > You do not mention it explicitly in the issue description, but presumably 
this only makes sense if an index sort is configured, otherwise merges may 
break the clustering that you are trying to create in the first place?
   
   Not exactly. As mentioned, to ensure that the grouping invariant is 
maintained even during segment merges, we are introducing a new merge policy 
that acts as a decorator over the existing TieredMergePolicy. During a segment 
merge, this policy would categorize segments by the outcome of their grouping 
function and then merge only segments within the same category, thus 
preserving the grouping criteria throughout the merge process.
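   
   A rough sketch of the decorator idea, in plain Java with no Lucene types. 
`Segment`, `GroupedMergeSketch` and `mergeAllIfMany` are hypothetical names 
used only for illustration; the delegate stands in for whatever the wrapped 
policy would select and is not the actual TieredMergePolicy logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a segment; this is not a Lucene class.
final class Segment {
    final String name;
    final String group; // outcome of the grouping function, e.g. "2xx"

    Segment(String name, String group) {
        this.name = name;
        this.group = group;
    }
}

final class GroupedMergeSketch {
    // Partition segments by group so the delegate only ever sees one group at a time.
    static Map<String, List<Segment>> partitionByGroup(List<Segment> segments) {
        Map<String, List<Segment>> byGroup = new LinkedHashMap<>();
        for (Segment s : segments) {
            byGroup.computeIfAbsent(s.group, g -> new ArrayList<>()).add(s);
        }
        return byGroup;
    }

    // Toy delegate (assumption): merge everything it is given if there are at least 2 segments.
    static List<List<Segment>> mergeAllIfMany(List<Segment> candidates) {
        List<List<Segment>> merges = new ArrayList<>();
        if (candidates.size() >= 2) {
            merges.add(new ArrayList<>(candidates));
        }
        return merges;
    }

    // Decorator: run the delegate once per group and concatenate the proposed merges,
    // so no proposed merge ever spans two groups.
    static List<List<Segment>> findMerges(
            List<Segment> segments,
            Function<List<Segment>, List<List<Segment>>> delegate) {
        List<List<Segment>> merges = new ArrayList<>();
        for (List<Segment> group : partitionByGroup(segments).values()) {
            merges.addAll(delegate.apply(group));
        }
        return merges;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
            new Segment("_0", "2xx"), new Segment("_1", "4xx"),
            new Segment("_2", "2xx"), new Segment("_3", "4xx"));
        for (List<Segment> merge : findMerges(segments, GroupedMergeSketch::mergeAllIfMany)) {
            StringBuilder sb = new StringBuilder(merge.get(0).group + ":");
            for (Segment s : merge) {
                sb.append(' ').append(s.name);
            }
            System.out.println(sb);
        }
    }
}
```

   Because the delegate is invoked separately for each category, the grouping 
invariant holds regardless of how the wrapped policy chooses merges within a 
category.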
   
   > I wonder if we could do something within a single DWPT pool, e.g. could we 
use rendez-vous hashing to optimistically try to reuse the same DWPT for the 
same group as often as possible, but only on a best-effort basis, not trading 
concurrency or creating more DWPTs than indexing concurrency requires?
   
   I believe that even if we use a single DWPT pool with rendezvous hashing to 
distribute DWPTs, we would end up creating the same number of DWPTs as we would 
with separate DWPT pools for different groups. Consider an example where we are 
grouping logs by status code for an index, and 8 concurrent indexing threads 
are indexing 2xx status code logs. This will create 8 DWPTs. Now 4 threads 
start indexing 4xx status code logs concurrently; this will require 4 extra 
DWPTs if we want to maintain status-code-based grouping. Instead of creating 
new DWPTs, we could try to reuse 4 of the existing DWPTs created for 2xx status 
code logs on a best-effort basis. But this would again mix 4xx status code logs 
with 2xx status code logs, defeating the purpose of grouping logs by status 
code. Let me know if my understanding is correct.
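
   To make the best-effort behaviour concrete, here is a small sketch of 
rendezvous (highest-random-weight) selection over the DWPTs that are currently 
free. It is plain Java, not Lucene code: `RendezvousSketch`, `score` and `pick` 
are hypothetical names, and the hash mixing is an arbitrary choice. When the 
preferred DWPT for a group is busy, the selection falls back to another DWPT, 
which is where documents from different groups could end up mixed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RendezvousSketch {
    // Stable pseudo-random score for a (group, DWPT id) pair; any well-mixed hash would do.
    static long score(String group, int dwptId) {
        long h = group.hashCode() * 31L + dwptId;
        h ^= (h >>> 33);
        h *= 0xff51afd7ed558ccdL;
        h ^= (h >>> 33);
        return h;
    }

    // Pick the free DWPT with the highest score for this group.
    static int pick(String group, List<Integer> freeDwpts) {
        if (freeDwpts.isEmpty()) {
            throw new IllegalArgumentException("no free DWPT");
        }
        int best = freeDwpts.get(0);
        long bestScore = score(group, best);
        for (int id : freeDwpts) {
            long s = score(group, id);
            if (s > bestScore) {
                best = id;
                bestScore = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Integer> free = new ArrayList<>(Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7));
        int preferred = pick("4xx", free);
        System.out.println("preferred DWPT for 4xx: " + preferred);

        // Simulate the preferred DWPT being busy with another thread.
        free.remove(Integer.valueOf(preferred));
        System.out.println("fallback DWPT for 4xx: " + pick("4xx", free));
    }
}
```

   The fallback DWPT may already hold buffered documents from another group, so 
under this scheme the grouping is only as strict as DWPT availability allows.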


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

